* [PATCH gmem 0/6] gmem fix-ups and interfaces for populating gmem pages
@ 2024-03-29 21:24 Michael Roth
  2024-03-29 21:24 ` [PATCH gmem 1/6] KVM: guest_memfd: Fix stub for kvm_gmem_get_uninit_pfn() Michael Roth
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Michael Roth @ 2024-03-29 21:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, Paolo Bonzini, Sean Christopherson, Isaku Yamahata,
	Xu Yilun, Binbin Wu, Xiaoyao Li, Matthew Wilcox

These patches are based on top of:

  https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=kvm-coco-queue

and are also available from:
  
  https://github.com/AMDESE/linux/commits/kvm-gmem-common-v1/
  

Overview
--------

This is a small collection of patches that addresses some review comments
from Paolo's series:

  [PATCH 00/21] TDX/SNP part 1 of n, for 6.9
  https://lore.kernel.org/lkml/20240227232100.478238-1-pbonzini@redhat.com/

and also introduces some flags and interfaces that might be relevant to
the scope of that series.

I'm posting these together initially as they comprise what may be the last
batch of SNP dependencies that are potentially relevant to TDX and a common
CoCo tree, but they can be split out or moved back into the SNP series,
whichever is deemed preferable.


Patch Layout
------------

1-3: These are smaller fix-ups to address various review comments pertaining
     to the gmem hooks that were originally part of the SNP hypervisor
     postings. In each case they can be potentially squashed into the
     corresponding patches in kvm-coco-queue if that's preferable.

4-5: These introduce an AS_INACCESSIBLE flag that prevents unexpected
     accesses to hole-punched gmem pages before invalidation hooks have had
     a chance to make them safely accessible to the host again.

6:   This implements an interface that was proposed by Sean during this[1]
     discussion regarding SNP_LAUNCH_UPDATE and discussed in more detail
     during the PUCK session "Finalizing internal guest_memfd APIs for
     SNP/TDX". It is not verbatim what was discussed, but is hopefully a
     reasonable starting point to handle use-cases like SNP_LAUNCH_UPDATE.
     It may also avoid the need to export kvm_gmem_get_uninit_pfn() as an
     external interface if SNP_LAUNCH_UPDATE is still the only
     known/planned user.


Thanks!

[1] https://lore.kernel.org/lkml/Zb1yv67h6gkYqqv9@google.com/


----------------------------------------------------------------
Michael Roth (6):
      KVM: guest_memfd: Fix stub for kvm_gmem_get_uninit_pfn()
      KVM: guest_memfd: Only call kvm_arch_gmem_prepare hook if necessary
      KVM: x86: Pass private/shared fault indicator to gmem_validate_fault
      mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory
      KVM: guest_memfd: Use AS_INACCESSIBLE when creating guest_memfd inode
      KVM: guest_memfd: Add interface for populating gmem pages with user data

 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/mmu/mmu.c          |  3 ++-
 include/linux/kvm_host.h        | 45 +++++++++++++++++++++++++++++++++++-
 include/linux/pagemap.h         |  1 +
 mm/truncate.c                   |  3 ++-
 virt/kvm/guest_memfd.c          | 51 +++++++++++++++++++++++++++++++++++++++++
 6 files changed, 102 insertions(+), 4 deletions(-)




* [PATCH gmem 1/6] KVM: guest_memfd: Fix stub for kvm_gmem_get_uninit_pfn()
  2024-03-29 21:24 [PATCH gmem 0/6] gmem fix-ups and interfaces for populating gmem pages Michael Roth
@ 2024-03-29 21:24 ` Michael Roth
  2024-03-29 21:24 ` [PATCH gmem 2/6] KVM: guest_memfd: Only call kvm_arch_gmem_prepare hook if necessary Michael Roth
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Michael Roth @ 2024-03-29 21:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, Paolo Bonzini, Sean Christopherson, Isaku Yamahata,
	Xu Yilun, Binbin Wu, Xiaoyao Li

The stub needs to return an integer in order to avoid breakage when
CONFIG_KVM_PRIVATE_MEM is not set. Add the missing return value.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 include/linux/kvm_host.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 73638779974a..2f5074eff958 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2459,7 +2459,9 @@ static inline int kvm_gmem_get_uninit_pfn(struct kvm *kvm,
 static inline int kvm_gmem_undo_get_pfn(struct kvm *kvm,
 				        struct kvm_memory_slot *slot, gfn_t gfn,
 				        int order)
-{}
+{
+	return -EIO;
+}
 #endif /* CONFIG_KVM_PRIVATE_MEM */
 
 #ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
-- 
2.25.1



* [PATCH gmem 2/6] KVM: guest_memfd: Only call kvm_arch_gmem_prepare hook if necessary
  2024-03-29 21:24 [PATCH gmem 0/6] gmem fix-ups and interfaces for populating gmem pages Michael Roth
  2024-03-29 21:24 ` [PATCH gmem 1/6] KVM: guest_memfd: Fix stub for kvm_gmem_get_uninit_pfn() Michael Roth
@ 2024-03-29 21:24 ` Michael Roth
  2024-04-01  5:06   ` Binbin Wu
  2024-03-29 21:24 ` [PATCH gmem 3/6] KVM: x86: Pass private/shared fault indicator to gmem_validate_fault Michael Roth
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Michael Roth @ 2024-03-29 21:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, Paolo Bonzini, Sean Christopherson, Isaku Yamahata,
	Xu Yilun, Binbin Wu, Xiaoyao Li, Isaku Yamahata

It has been reported that the internal workings of
kvm_gmem_prepare_folio() incur noticeable overhead for large guests
even for platforms where kvm_arch_gmem_prepare() is a no-op.

Provide a new kvm_arch_gmem_prepare_needed() hook so that architectures
that set CONFIG_HAVE_KVM_GMEM_PREPARE can still opt out of issuing the
kvm_arch_gmem_prepare() callback if the particular KVM instance doesn't
require any sort of special preparation of its gmem pages prior to use.
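
As a rough illustration only (not part of this patch), an architecture-side
override of the new hook might look something like the following; the
VM-type check is a hypothetical stand-in for whatever per-VM state the
architecture actually tracks:

	/* Hypothetical arch override of the __weak default added below. */
	bool kvm_arch_gmem_prepare_needed(struct kvm *kvm)
	{
		/* e.g. only SNP-style guests need RMP preparation of gmem pages */
		return kvm->arch.vm_type == KVM_X86_SNP_VM;
	}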

Link: https://lore.kernel.org/lkml/20240228202906.GB10568@ls.amr.corp.intel.com/
Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/guest_memfd.c   | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2f5074eff958..5b8308b5e4af 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2466,6 +2466,7 @@ static inline int kvm_gmem_undo_get_pfn(struct kvm *kvm,
 
 #ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
 int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
+bool kvm_arch_gmem_prepare_needed(struct kvm *kvm);
 #endif
 
 #ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 74e19170af8a..4ce0056d1149 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -13,6 +13,13 @@ struct kvm_gmem {
 	struct list_head entry;
 };
 
+#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
+bool __weak kvm_arch_gmem_prepare_needed(struct kvm *kvm)
+{
+	return false;
+}
+#endif
+
 static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct folio *folio)
 {
 #ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
@@ -27,6 +34,9 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
 		gfn_t gfn;
 		int rc;
 
+		if (!kvm_arch_gmem_prepare_needed(kvm))
+			continue;
+
 		slot = xa_load(&gmem->bindings, index);
 		if (!slot)
 			continue;
-- 
2.25.1



* [PATCH gmem 3/6] KVM: x86: Pass private/shared fault indicator to gmem_validate_fault
  2024-03-29 21:24 [PATCH gmem 0/6] gmem fix-ups and interfaces for populating gmem pages Michael Roth
  2024-03-29 21:24 ` [PATCH gmem 1/6] KVM: guest_memfd: Fix stub for kvm_gmem_get_uninit_pfn() Michael Roth
  2024-03-29 21:24 ` [PATCH gmem 2/6] KVM: guest_memfd: Only call kvm_arch_gmem_prepare hook if necessary Michael Roth
@ 2024-03-29 21:24 ` Michael Roth
  2024-03-29 21:24 ` [PATCH gmem 4/6] mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory Michael Roth
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Michael Roth @ 2024-03-29 21:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, Paolo Bonzini, Sean Christopherson, Isaku Yamahata,
	Xu Yilun, Binbin Wu, Xiaoyao Li

TDX has a use for the gmem_validate_fault hook as well, but in its case it
needs an indication of whether or not the fault was private. Go ahead and
plumb that information through.
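
As a purely hypothetical sketch (not part of this patch), a vendor
implementation of the hook could consume the new indicator along these
lines; the function name and policy below are placeholders:

	static int vendor_gmem_validate_fault(struct kvm *kvm, kvm_pfn_t pfn,
					      gfn_t gfn, bool is_private,
					      u8 *max_level)
	{
		/* e.g. refuse to service shared faults from gmem-backed ranges */
		if (!is_private)
			return -EFAULT;

		/* vendor-specific checks on pfn/gfn, possibly lowering *max_level */
		return 0;
	}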

Link: https://lore.kernel.org/lkml/35bc4582-8a03-413b-be0e-4cc419715772@linux.intel.com/
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm_host.h | 3 ++-
 arch/x86/kvm/mmu/mmu.c          | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 16fff18ef2e5..90dc0ae9311a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1815,7 +1815,8 @@ struct kvm_x86_ops {
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 	void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
-	int (*gmem_validate_fault)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
+	int (*gmem_validate_fault)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, bool is_private,
+				   u8 *max_level);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8d7ee18fe524..0049d49aa913 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4340,7 +4340,8 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
 
 	r = static_call(kvm_x86_gmem_validate_fault)(vcpu->kvm, fault->pfn,
-						     fault->gfn, &fault->max_level);
+						     fault->gfn, fault->is_private,
+						     &fault->max_level);
 	if (r) {
 		kvm_release_pfn_clean(fault->pfn);
 		return r;
-- 
2.25.1



* [PATCH gmem 4/6] mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory
  2024-03-29 21:24 [PATCH gmem 0/6] gmem fix-ups and interfaces for populating gmem pages Michael Roth
                   ` (2 preceding siblings ...)
  2024-03-29 21:24 ` [PATCH gmem 3/6] KVM: x86: Pass private/shared fault indicator to gmem_validate_fault Michael Roth
@ 2024-03-29 21:24 ` Michael Roth
  2024-04-15 13:19   ` Vlastimil Babka
  2024-03-29 21:24 ` [PATCH gmem 5/6] KVM: guest_memfd: Use AS_INACCESSIBLE when creating guest_memfd inode Michael Roth
  2024-03-29 21:24 ` [PATCH gmem 6/6] KVM: guest_memfd: Add interface for populating gmem pages with user data Michael Roth
  5 siblings, 1 reply; 12+ messages in thread
From: Michael Roth @ 2024-03-29 21:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, Paolo Bonzini, Sean Christopherson, Isaku Yamahata,
	Xu Yilun, Binbin Wu, Xiaoyao Li, Matthew Wilcox

filemap users like guest_memfd may use page cache pages to
allocate/manage memory that is only intended to be accessed by guests
via hardware protections like encryption. Writes to memory of this sort
in common paths like truncation may cause unexpected behavior, such as
writing garbage instead of zeros when attempting to zero pages, or
worse, triggering hardware protections that are considered fatal as far
as the kernel is concerned.

Introduce a new address_space flag, AS_INACCESSIBLE, and use this
initially to prevent zeroing of pages during truncation, with the
understanding that it is up to the owner of the mapping to handle this
specially if needed.

Link: https://lore.kernel.org/lkml/ZR9LYhpxTaTk6PJX@google.com/
Cc: Matthew Wilcox <willy@infradead.org>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 include/linux/pagemap.h | 1 +
 mm/truncate.c           | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e8ac0b32f84d..a7c3f43d1d22 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -207,6 +207,7 @@ enum mapping_flags {
 	AS_STABLE_WRITES,	/* must wait for writeback before modifying
 				   folio contents */
 	AS_UNMOVABLE,		/* The mapping cannot be moved, ever */
+	AS_INACCESSIBLE,	/* Do not attempt direct R/W access to the mapping */
 };
 
 /**
diff --git a/mm/truncate.c b/mm/truncate.c
index 725b150e47ac..c501338c7ebd 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -233,7 +233,8 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
 	 * doing a complex calculation here, and then doing the zeroing
 	 * anyway if the page split fails.
 	 */
-	folio_zero_range(folio, offset, length);
+	if (!(folio->mapping->flags & AS_INACCESSIBLE))
+		folio_zero_range(folio, offset, length);
 
 	if (folio_has_private(folio))
 		folio_invalidate(folio, offset, length);
-- 
2.25.1



* [PATCH gmem 5/6] KVM: guest_memfd: Use AS_INACCESSIBLE when creating guest_memfd inode
  2024-03-29 21:24 [PATCH gmem 0/6] gmem fix-ups and interfaces for populating gmem pages Michael Roth
                   ` (3 preceding siblings ...)
  2024-03-29 21:24 ` [PATCH gmem 4/6] mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory Michael Roth
@ 2024-03-29 21:24 ` Michael Roth
  2024-04-15 13:21   ` Vlastimil Babka
  2024-03-29 21:24 ` [PATCH gmem 6/6] KVM: guest_memfd: Add interface for populating gmem pages with user data Michael Roth
  5 siblings, 1 reply; 12+ messages in thread
From: Michael Roth @ 2024-03-29 21:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, Paolo Bonzini, Sean Christopherson, Isaku Yamahata,
	Xu Yilun, Binbin Wu, Xiaoyao Li

truncate_inode_pages_range() may attempt to zero pages before truncating
them, and this will occur before arch-specific invalidations can be
triggered via the .invalidate_folio/.free_folio hooks of kvm_gmem_aops. For
AMD SEV-SNP this would result in an RMP #PF being generated by the
hardware, which is currently treated as fatal (and even if specifically
allowed for, would not result in anything other than garbage being
written to guest pages due to encryption). On Intel TDX this would also
result in undesirable behavior.

Set the AS_INACCESSIBLE flag to prevent the MM from attempting
unexpected accesses of this sort during operations like truncation.

This may also in some cases yield a decent performance improvement for
guest_memfd userspace implementations that hole-punch ranges immediately
after private->shared conversions via KVM_SET_MEMORY_ATTRIBUTES, since
the current implementation of truncate_inode_pages_range() always ends
up zeroing an entire 4K range if it is backed by a 2M folio.

Link: https://lore.kernel.org/lkml/ZR9LYhpxTaTk6PJX@google.com/
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 virt/kvm/guest_memfd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 4ce0056d1149..3668a5f1d82b 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -428,6 +428,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	inode->i_private = (void *)(unsigned long)flags;
 	inode->i_op = &kvm_gmem_iops;
 	inode->i_mapping->a_ops = &kvm_gmem_aops;
+	inode->i_mapping->flags |= AS_INACCESSIBLE;
 	inode->i_mode |= S_IFREG;
 	inode->i_size = size;
 	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
-- 
2.25.1



* [PATCH gmem 6/6] KVM: guest_memfd: Add interface for populating gmem pages with user data
  2024-03-29 21:24 [PATCH gmem 0/6] gmem fix-ups and interfaces for populating gmem pages Michael Roth
                   ` (4 preceding siblings ...)
  2024-03-29 21:24 ` [PATCH gmem 5/6] KVM: guest_memfd: Use AS_INACCESSIBLE when creating guest_memfd inode Michael Roth
@ 2024-03-29 21:24 ` Michael Roth
  2024-04-15 13:36   ` Vlastimil Babka
  5 siblings, 1 reply; 12+ messages in thread
From: Michael Roth @ 2024-03-29 21:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, Paolo Bonzini, Sean Christopherson, Isaku Yamahata,
	Xu Yilun, Binbin Wu, Xiaoyao Li

During guest run-time, kvm_arch_gmem_prepare() is issued as needed to
prepare newly-allocated gmem pages prior to mapping them into the guest.
In the case of SEV-SNP, this mainly involves setting the pages to
private in the RMP table.

However, for the GPA ranges comprising the initial guest payload, which
are encrypted/measured prior to starting the guest, the gmem pages need
to be accessed prior to setting them to private in the RMP table so they
can be initialized with the userspace-provided data. Additionally, an
SNP firmware call is needed afterward to encrypt them in-place and
measure the contents into the guest's launch digest.

While it is possible to bypass the kvm_arch_gmem_prepare() hooks so that
this handling can be done in an open-coded/vendor-specific manner, this
may expose more gmem-internal state/dependencies to external callers
than necessary. Try to avoid this by implementing an interface that
tries to handle as much of the common functionality inside gmem as
possible, while also making it generic enough to potentially be
usable/extensible for use-cases beyond just SEV-SNP.
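
As a hedged sketch of the intended usage (not part of this patch), a caller
along the lines of SNP_LAUNCH_UPDATE might drive the interface roughly as
follows; the function names here are placeholders:

	static int example_post_populate(struct kvm *kvm, struct kvm_memory_slot *slot,
					 gfn_t gfn, kvm_pfn_t pfn, void __user *src,
					 int order, void *opaque)
	{
		/* e.g. issue a firmware call to encrypt/measure the page in place */
		return 0;
	}

	static int example_populate_range(struct kvm *kvm, struct kvm_memory_slot *slot,
					  gfn_t gfn, void __user *src, int npages)
	{
		struct kvm_gmem_populate_args args = {
			.gfn = gfn,
			.src = src,
			.npages = npages,
			.do_memcpy = true,	/* copy user data into gmem before the callback */
			.post_populate = example_post_populate,
		};
		int ret;

		/* hold slots_lock per the kvm_gmem_populate() kerneldoc below */
		mutex_lock(&kvm->slots_lock);
		ret = kvm_gmem_populate(kvm, slot, &args);
		mutex_unlock(&kvm->slots_lock);

		return ret;
	}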

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 include/linux/kvm_host.h | 40 ++++++++++++++++++++++++++++++++++++++++
 virt/kvm/guest_memfd.c   | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5b8308b5e4af..8a75787090f3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2473,4 +2473,44 @@ bool kvm_arch_gmem_prepare_needed(struct kvm *kvm);
 void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
 #endif
 
+/**
+ * kvm_gmem_populate_args - kvm_gmem_populate() argument structure
+ *
+ * @gfn: starting GFN to be populated
+ * @src: userspace-provided buffer containing data to copy into GFN range
+ * @npages: number of pages to copy from userspace-buffer
+ * @do_memcpy: whether to do a direct memcpy of the data prior to issuing
+ *             the post-populate callback
+ * @post_populate: callback to issue for each gmem page that backs the GPA
+ *                 range (which will be filled with corresponding contents from
+ *                 @src if @do_memcpy was set)
+ * @opaque: opaque data to pass to @post_populate callback
+ */
+struct kvm_gmem_populate_args {
+	gfn_t gfn;
+	void __user *src;
+	int npages;
+	bool do_memcpy;
+	int (*post_populate)(struct kvm *kvm, struct kvm_memory_slot *slot,
+			     gfn_t gfn, kvm_pfn_t pfn, void __user *src, int order,
+			     void *opaque);
+	void *opaque;
+};
+
+/**
+ * kvm_gmem_populate() - Populate/prepare a GPA range with guest data
+ *
+ * @kvm: KVM instance
+ * @slot: slot containing the GPA range being prepared
+ * @args: argument structure
+ *
+ * This is primarily intended for cases where a gmem-backed GPA range needs
+ * to be initialized with userspace-provided data prior to being mapped into
+ * the guest as a private page. This should be called with the slots->lock
+ * held so that caller-enforced invariants regarding the expected memory
+ * attributes of the GPA range do not race with KVM_SET_MEMORY_ATTRIBUTES.
+ */
+int kvm_gmem_populate(struct kvm *kvm, struct kvm_memory_slot *slot,
+		      struct kvm_gmem_populate_args *args);
+
 #endif
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 3668a5f1d82b..3e3c4b7fff3b 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -643,3 +643,43 @@ int kvm_gmem_undo_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	return r;
 }
 EXPORT_SYMBOL_GPL(kvm_gmem_undo_get_pfn);
+
+int kvm_gmem_populate(struct kvm *kvm, struct kvm_memory_slot *slot,
+		      struct kvm_gmem_populate_args *args)
+{
+	int ret, max_order, i;
+
+	for (i = 0; i < args->npages; i += (1 << max_order)) {
+		void __user *src = args->src + i * PAGE_SIZE;
+		gfn_t gfn = args->gfn + i;
+		kvm_pfn_t pfn;
+
+		ret = __kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &max_order, false);
+		if (ret)
+			break;
+
+		if (!IS_ALIGNED(gfn, (1 << max_order)) ||
+		    (args->npages - i) < (1 << max_order))
+			max_order = 0;
+
+		if (args->do_memcpy && args->src) {
+			ret = copy_from_user(pfn_to_kaddr(pfn), src, (1 << max_order) * PAGE_SIZE);
+			if (ret)
+				goto e_release;
+		}
+
+		if (args->post_populate) {
+			ret = args->post_populate(kvm, slot, gfn, pfn, src, max_order,
+						  args->opaque);
+			if (ret)
+				goto e_release;
+		}
+e_release:
+		put_page(pfn_to_page(pfn));
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_gmem_populate);
-- 
2.25.1



* Re: [PATCH gmem 2/6] KVM: guest_memfd: Only call kvm_arch_gmem_prepare hook if necessary
  2024-03-29 21:24 ` [PATCH gmem 2/6] KVM: guest_memfd: Only call kvm_arch_gmem_prepare hook if necessary Michael Roth
@ 2024-04-01  5:06   ` Binbin Wu
  2024-04-02 21:50     ` Isaku Yamahata
  0 siblings, 1 reply; 12+ messages in thread
From: Binbin Wu @ 2024-04-01  5:06 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, Paolo Bonzini, Sean Christopherson, Isaku Yamahata,
	Xu Yilun, Xiaoyao Li, Isaku Yamahata



On 3/30/2024 5:24 AM, Michael Roth wrote:
> It has been reported that the internal workings of
> kvm_gmem_prepare_folio() incur noticeable overhead for large guests
> even for platforms where kvm_arch_gmem_prepare() is a no-op.
>
> Provide a new kvm_arch_gmem_prepare_needed() hook so that architectures
> that set CONFIG_HAVE_KVM_GMEM_PREPARE can still opt out of issuing the
> kvm_arch_gmem_prepare() callback

Just wondering which part has the big impact on performance: the issuing
of the kvm_arch_gmem_prepare() callback itself, or the preparation code
for the kvm_arch_gmem_prepare()?


> if the particular KVM instance doesn't
> require any sort of special preparation of its gmem pages prior to use.
>
> Link: https://lore.kernel.org/lkml/20240228202906.GB10568@ls.amr.corp.intel.com/
> Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>   include/linux/kvm_host.h |  1 +
>   virt/kvm/guest_memfd.c   | 10 ++++++++++
>   2 files changed, 11 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 2f5074eff958..5b8308b5e4af 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2466,6 +2466,7 @@ static inline int kvm_gmem_undo_get_pfn(struct kvm *kvm,
>   
>   #ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
>   int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
> +bool kvm_arch_gmem_prepare_needed(struct kvm *kvm);
>   #endif
>   
>   #ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 74e19170af8a..4ce0056d1149 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -13,6 +13,13 @@ struct kvm_gmem {
>   	struct list_head entry;
>   };
>   
> +#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
> +bool __weak kvm_arch_gmem_prepare_needed(struct kvm *kvm)
> +{
> +	return false;
> +}
> +#endif
> +
>   static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct folio *folio)
>   {
>   #ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
> @@ -27,6 +34,9 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
>   		gfn_t gfn;
>   		int rc;
>   
> +		if (!kvm_arch_gmem_prepare_needed(kvm))
> +			continue;

Can multiple gmems (if any) bound to the same inode's address space 
belong to different kvm instances?
If not, just return here?

> +
>   		slot = xa_load(&gmem->bindings, index);
>   		if (!slot)
>   			continue;



* Re: [PATCH gmem 2/6] KVM: guest_memfd: Only call kvm_arch_gmem_prepare hook if necessary
  2024-04-01  5:06   ` Binbin Wu
@ 2024-04-02 21:50     ` Isaku Yamahata
  0 siblings, 0 replies; 12+ messages in thread
From: Isaku Yamahata @ 2024-04-02 21:50 UTC (permalink / raw)
  To: Binbin Wu
  Cc: Michael Roth, kvm, linux-coco, Paolo Bonzini,
	Sean Christopherson, Isaku Yamahata, Xu Yilun, Xiaoyao Li,
	Isaku Yamahata

On Mon, Apr 01, 2024 at 01:06:07PM +0800,
Binbin Wu <binbin.wu@linux.intel.com> wrote:

> 
> 
> On 3/30/2024 5:24 AM, Michael Roth wrote:
> > It has been reported that the internal workings of
> > kvm_gmem_prepare_folio() incur noticeable overhead for large guests
> > even for platforms where kvm_arch_gmem_prepare() is a no-op.
> > 
> > Provide a new kvm_arch_gmem_prepare_needed() hook so that architectures
> > that set CONFIG_HAVE_KVM_GMEM_PREPARE can still opt out of issuing the
> > kvm_arch_gmem_prepare() callback
> 
> Just wondering which part has the big impact on performance: the issuing
> of the kvm_arch_gmem_prepare() callback itself, or the preparation code
> for the kvm_arch_gmem_prepare()?

I'm fine without this patch for now because this is an optimization.
-- 
Isaku Yamahata <isaku.yamahata@intel.com>


* Re: [PATCH gmem 4/6] mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory
  2024-03-29 21:24 ` [PATCH gmem 4/6] mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory Michael Roth
@ 2024-04-15 13:19   ` Vlastimil Babka
  0 siblings, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2024-04-15 13:19 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, Paolo Bonzini, Sean Christopherson, Isaku Yamahata,
	Xu Yilun, Binbin Wu, Xiaoyao Li, Matthew Wilcox

On 3/29/24 10:24 PM, Michael Roth wrote:
> filemap users like guest_memfd may use page cache pages to
> allocate/manage memory that is only intended to be accessed by guests
> via hardware protections like encryption. Writes to memory of this sort
> in common paths like truncation may cause unexpected behavior, such as
> writing garbage instead of zeros when attempting to zero pages, or
> worse, triggering hardware protections that are considered fatal as far
> as the kernel is concerned.
> 
> Introduce a new address_space flag, AS_INACCESSIBLE, and use this
> initially to prevent zeroing of pages during truncation, with the
> understanding that it is up to the owner of the mapping to handle this
> specially if needed.
> 
> Link: https://lore.kernel.org/lkml/ZR9LYhpxTaTk6PJX@google.com/
> Cc: Matthew Wilcox <willy@infradead.org>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>

Hm, on one hand it seems like a rather blunt solution to a fairly specific
issue, and on the other hand I'm not sure whether there are other places
(not yet triggered) that should now take the flag into account to keep its
promise. But as long as it gets the job done, and can be replaced later
with something better...

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/linux/pagemap.h | 1 +
>  mm/truncate.c           | 3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index e8ac0b32f84d..a7c3f43d1d22 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -207,6 +207,7 @@ enum mapping_flags {
>  	AS_STABLE_WRITES,	/* must wait for writeback before modifying
>  				   folio contents */
>  	AS_UNMOVABLE,		/* The mapping cannot be moved, ever */
> +	AS_INACCESSIBLE,	/* Do not attempt direct R/W access to the mapping */
>  };
>  
>  /**
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 725b150e47ac..c501338c7ebd 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -233,7 +233,8 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>  	 * doing a complex calculation here, and then doing the zeroing
>  	 * anyway if the page split fails.
>  	 */
> -	folio_zero_range(folio, offset, length);
> +	if (!(folio->mapping->flags & AS_INACCESSIBLE))
> +		folio_zero_range(folio, offset, length);
>  
>  	if (folio_has_private(folio))
>  		folio_invalidate(folio, offset, length);



* Re: [PATCH gmem 5/6] KVM: guest_memfd: Use AS_INACCESSIBLE when creating guest_memfd inode
  2024-03-29 21:24 ` [PATCH gmem 5/6] KVM: guest_memfd: Use AS_INACCESSIBLE when creating guest_memfd inode Michael Roth
@ 2024-04-15 13:21   ` Vlastimil Babka
  0 siblings, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2024-04-15 13:21 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, Paolo Bonzini, Sean Christopherson, Isaku Yamahata,
	Xu Yilun, Binbin Wu, Xiaoyao Li

On 3/29/24 10:24 PM, Michael Roth wrote:
> truncate_inode_pages_range() may attempt to zero pages before truncating
> them, and this will occur before arch-specific invalidations can be
> triggered via the .invalidate_folio/.free_folio hooks of kvm_gmem_aops. For
> AMD SEV-SNP this would result in an RMP #PF being generated by the
> hardware, which is currently treated as fatal (and even if specifically
> allowed for, would not result in anything other than garbage being
> written to guest pages due to encryption). On Intel TDX this would also
> result in undesirable behavior.
> 
> Set the AS_INACCESSIBLE flag to prevent the MM from attempting
> unexpected accesses of this sort during operations like truncation.
> 
> This may also in some cases yield a decent performance improvement for
> guest_memfd userspace implementations that hole-punch ranges immediately
> after private->shared conversions via KVM_SET_MEMORY_ATTRIBUTES, since
> the current implementation of truncate_inode_pages_range() always ends
> up zeroing an entire 4K range if it is backed by a 2M folio.
> 
> Link: https://lore.kernel.org/lkml/ZR9LYhpxTaTk6PJX@google.com/
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  virt/kvm/guest_memfd.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 4ce0056d1149..3668a5f1d82b 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -428,6 +428,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
>  	inode->i_private = (void *)(unsigned long)flags;
>  	inode->i_op = &kvm_gmem_iops;
>  	inode->i_mapping->a_ops = &kvm_gmem_aops;
> +	inode->i_mapping->flags |= AS_INACCESSIBLE;
>  	inode->i_mode |= S_IFREG;
>  	inode->i_size = size;
>  	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);



* Re: [PATCH gmem 6/6] KVM: guest_memfd: Add interface for populating gmem pages with user data
  2024-03-29 21:24 ` [PATCH gmem 6/6] KVM: guest_memfd: Add interface for populating gmem pages with user data Michael Roth
@ 2024-04-15 13:36   ` Vlastimil Babka
  0 siblings, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2024-04-15 13:36 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, Paolo Bonzini, Sean Christopherson, Isaku Yamahata,
	Xu Yilun, Binbin Wu, Xiaoyao Li

On 3/29/24 10:24 PM, Michael Roth wrote:
> During guest run-time, kvm_arch_gmem_prepare() is issued as needed to
> prepare newly-allocated gmem pages prior to mapping them into the guest.
> In the case of SEV-SNP, this mainly involves setting the pages to
> private in the RMP table.
> 
> However, for the GPA ranges comprising the initial guest payload, which
> are encrypted/measured prior to starting the guest, the gmem pages need
> to be accessed prior to setting them to private in the RMP table so they
> can be initialized with the userspace-provided data. Additionally, an
> SNP firmware call is needed afterward to encrypt them in-place and
> measure the contents into the guest's launch digest.
> 
> While it is possible to bypass the kvm_arch_gmem_prepare() hooks so that
> this handling can be done in an open-coded/vendor-specific manner, this
> may expose more gmem-internal state/dependencies to external callers
> than necessary. Try to avoid this by implementing an interface that
> tries to handle as much of the common functionality inside gmem as
> possible, while also making it generic enough to potentially be
> usable/extensible for use-cases beyond just SEV-SNP.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  include/linux/kvm_host.h | 40 ++++++++++++++++++++++++++++++++++++++++
>  virt/kvm/guest_memfd.c   | 40 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 80 insertions(+)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 5b8308b5e4af..8a75787090f3 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2473,4 +2473,44 @@ bool kvm_arch_gmem_prepare_needed(struct kvm *kvm);
>  void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
>  #endif
>  
> +/**
> + * kvm_gmem_populate_args - kvm_gmem_populate() argument structure
> + *
> + * @gfn: starting GFN to be populated
> + * @src: userspace-provided buffer containing data to copy into GFN range
> + * @npages: number of pages to copy from userspace-buffer
> + * @do_memcpy: whether to do a direct memcpy of the data prior to issuing
> + *             the post-populate callback
> + * @post_populate: callback to issue for each gmem page that backs the GPA
> + *                 range (which will be filled with corresponding contents from
> + *                 @src if @do_memcpy was set)
> + * @opaque: opaque data to pass to @post_populate callback
> + */
> +struct kvm_gmem_populate_args {
> +	gfn_t gfn;
> +	void __user *src;
> +	int npages;
> +	bool do_memcpy;
> +	int (*post_populate)(struct kvm *kvm, struct kvm_memory_slot *slot,
> +			     gfn_t gfn, kvm_pfn_t pfn, void __user *src, int order,
> +			     void *opaque);
> +	void *opaque;
> +};
> +
> +/**
> + * kvm_gmem_populate() - Populate/prepare a GPA range with guest data
> + *
> + * @kvm: KVM instance
> + * @slot: slot containing the GPA range being prepared
> + * @args: argument structure
> + *
> + * This is primarily intended for cases where a gmem-backed GPA range needs
> + * to be initialized with userspace-provided data prior to being mapped into
> + * the guest as a private page. This should be called with the slots->lock
> + * held so that caller-enforced invariants regarding the expected memory
> + * attributes of the GPA range do not race with KVM_SET_MEMORY_ATTRIBUTES.
> + */
> +int kvm_gmem_populate(struct kvm *kvm, struct kvm_memory_slot *slot,
> +		      struct kvm_gmem_populate_args *args);
> +
>  #endif
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 3668a5f1d82b..3e3c4b7fff3b 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -643,3 +643,43 @@ int kvm_gmem_undo_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>  	return r;
>  }
>  EXPORT_SYMBOL_GPL(kvm_gmem_undo_get_pfn);
> +
> +int kvm_gmem_populate(struct kvm *kvm, struct kvm_memory_slot *slot,
> +		      struct kvm_gmem_populate_args *args)
> +{
> +	int ret, max_order, i;
> +
> +	for (i = 0; i < args->npages; i += (1 << max_order)) {
> +		void __user *src = args->src + i * PAGE_SIZE;
> +		gfn_t gfn = args->gfn + i;
> +		kvm_pfn_t pfn;
> +
> +		ret = __kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &max_order, false);
> +		if (ret)
> +			break;
> +
> +		if (!IS_ALIGNED(gfn, (1 << max_order)) ||
> +		    (args->npages - i) < (1 << max_order))
> +			max_order = 0;
> +
> +		if (args->do_memcpy && args->src) {
> +			ret = copy_from_user(pfn_to_kaddr(pfn), src, (1 << max_order) * PAGE_SIZE);
> +			if (ret)
> +				goto e_release;
> +		}
> +
> +		if (args->post_populate) {
> +			ret = args->post_populate(kvm, slot, gfn, pfn, src, max_order,
> +						  args->opaque);
> +			if (ret)
> +				goto e_release;

This if (ret) goto seems unnecessary; was there more code before the label
in a previous version?

With that we could also change this block to "if (!ret &&
args->post_populate)" and remove the first goto and the label.
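
Something like this (untested sketch, assuming nothing else ever needs to
run between the callback and the page release):

		if (args->do_memcpy && args->src)
			ret = copy_from_user(pfn_to_kaddr(pfn), src,
					     (1 << max_order) * PAGE_SIZE);

		if (!ret && args->post_populate)
			ret = args->post_populate(kvm, slot, gfn, pfn, src,
						  max_order, args->opaque);

		put_page(pfn_to_page(pfn));
		if (ret)
			break;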

> +		}
> +e_release:
> +		put_page(pfn_to_page(pfn));
> +		if (ret)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(kvm_gmem_populate);

