linux-coco.lists.linux.dev archive mirror
* [PATCH 0/8] KVM: gmem: Adding hooks for SEV and TDX
@ 2023-08-15 17:18 isaku.yamahata
  2023-08-15 17:18 ` [PATCH 1/8] KVM: gmem: Make kvm_gmem_bind return EBADF on wrong fd isaku.yamahata
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: isaku.yamahata @ 2023-08-15 17:18 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

This patch series adds KVM guest memfd (gmem) hooks so that SEV and TDX can
share a common code base, along with several minor fixes.  On top of this
series, TDX KVM can defer page clearing so that it does not have to run
under the mmu_lock.

Isaku Yamahata (6):
  KVM: gmem: Make kvm_gmem_bind return EBADF on wrong fd
  KVM: gmem: removed duplicated kvm_gmem_init()
  KVM: gmem: Fix kvm_gmem_issue_arch_invalidate()
  KVM: gmem: protect kvm_mmu_invalidate_end()
  KVM: gmem: Avoid race with kvm_gmem_release and mmu notifier
  RFC: KVM: gmem: Guarantee the order of destruction

Michael Roth (2):
  KVM: gmem, x86: Add gmem hook for initializing private memory
  KVM: gmem, x86: Add gmem hook for invalidating private memory

 arch/x86/include/asm/kvm-x86-ops.h |  2 ++
 arch/x86/include/asm/kvm_host.h    |  4 +++
 arch/x86/kvm/mmu/mmu.c             | 12 ++++++--
 arch/x86/kvm/x86.c                 |  6 ++++
 include/linux/kvm_host.h           | 27 ++++++++++++++++++
 virt/kvm/guest_mem.c               | 46 ++++++++++++++++++++++++++++--
 virt/kvm/kvm_main.c                | 46 ++++++++++++++++++++++++++++--
 7 files changed, 136 insertions(+), 7 deletions(-)


base-commit: 89b6a7b873d72280e85976bbb8fe4998b2ababa8
-- 
2.25.1



* [PATCH 1/8] KVM: gmem: Make kvm_gmem_bind return EBADF on wrong fd
  2023-08-15 17:18 [PATCH 0/8] KVM: gmem: Adding hooks for SEV and TDX isaku.yamahata
@ 2023-08-15 17:18 ` isaku.yamahata
  2023-08-15 17:18 ` [PATCH 2/8] KVM: gmem: removed duplicated kvm_gmem_init() isaku.yamahata
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: isaku.yamahata @ 2023-08-15 17:18 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

When fget() fails in kvm_gmem_bind(), return -EBADF instead of -EINVAL:
EBADF is the conventional errno for a file descriptor that does not refer
to an open file.
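
For reference, the resulting error semantics at the top of kvm_gmem_bind()
look roughly like this (a sketch based on the hunk below; the errno used on
the err path is not shown in this patch, so -EINVAL there is an assumption):

	file = fget(fd);
	if (!file)
		return -EBADF;	/* fd is not an open file descriptor */

	if (file->f_op != &kvm_gmem_fops)
		goto err;	/* open file, but not a gmem file: presumably -EINVAL */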

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 virt/kvm/guest_mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index db644f7fa48b..c81d2bb9ae93 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -479,7 +479,7 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
 
 	file = fget(fd);
 	if (!file)
-		return -EINVAL;
+		return -EBADF;
 
 	if (file->f_op != &kvm_gmem_fops)
 		goto err;
-- 
2.25.1



* [PATCH 2/8] KVM: gmem: removed duplicated kvm_gmem_init()
  2023-08-15 17:18 [PATCH 0/8] KVM: gmem: Adding hooks for SEV and TDX isaku.yamahata
  2023-08-15 17:18 ` [PATCH 1/8] KVM: gmem: Make kvm_gmem_bind return EBADF on wrong fd isaku.yamahata
@ 2023-08-15 17:18 ` isaku.yamahata
  2023-08-15 17:18 ` [PATCH 3/8] KVM: gmem: Fix kvm_gmem_issue_arch_invalidate() isaku.yamahata
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: isaku.yamahata @ 2023-08-15 17:18 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba, Xiaoyao Li

From: Isaku Yamahata <isaku.yamahata@intel.com>

Remove a duplicated call to kvm_gmem_init() from kvm_init().

Reported-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 virt/kvm/kvm_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ee331cf8ba54..8bfeb615fc4d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6308,7 +6308,6 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	kvm_preempt_ops.sched_out = kvm_sched_out;
 
 	kvm_init_debug();
-	kvm_gmem_init();
 
 	r = kvm_vfio_ops_init();
 	if (WARN_ON_ONCE(r))
-- 
2.25.1



* [PATCH 3/8] KVM: gmem: Fix kvm_gmem_issue_arch_invalidate()
  2023-08-15 17:18 [PATCH 0/8] KVM: gmem: Adding hooks for SEV and TDX isaku.yamahata
  2023-08-15 17:18 ` [PATCH 1/8] KVM: gmem: Make kvm_gmem_bind return EBADF on wrong fd isaku.yamahata
  2023-08-15 17:18 ` [PATCH 2/8] KVM: gmem: removed duplicated kvm_gmem_init() isaku.yamahata
@ 2023-08-15 17:18 ` isaku.yamahata
  2023-08-18 22:33   ` Sean Christopherson
  2023-08-15 17:18 ` [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end() isaku.yamahata
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: isaku.yamahata @ 2023-08-15 17:18 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

__filemap_get_folio() can return an error pointer, not just NULL, and
kvm_gmem_get_huge_folio() propagates it.  Check the returned folio with
IS_ERR_OR_NULL() instead of a plain NULL test.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 virt/kvm/guest_mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index c81d2bb9ae93..ed03f1d12172 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -53,7 +53,7 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 	struct folio *folio;
 
 	folio = kvm_gmem_get_huge_folio(inode, index);
-	if (!folio) {
+	if (IS_ERR_OR_NULL(folio)) {
 		folio = filemap_grab_folio(inode->i_mapping, index);
 		if (!folio)
 			return NULL;
-- 
2.25.1



* [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end()
  2023-08-15 17:18 [PATCH 0/8] KVM: gmem: Adding hooks for SEV and TDX isaku.yamahata
                   ` (2 preceding siblings ...)
  2023-08-15 17:18 ` [PATCH 3/8] KVM: gmem: Fix kvm_gmem_issue_arch_invalidate() isaku.yamahata
@ 2023-08-15 17:18 ` isaku.yamahata
  2023-08-16 20:28   ` Jarkko Sakkinen
  2023-08-18 17:55   ` Sean Christopherson
  2023-08-15 17:18 ` [PATCH 5/8] KVM: gmem, x86: Add gmem hook for initializing private memory isaku.yamahata
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 26+ messages in thread
From: isaku.yamahata @ 2023-08-15 17:18 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

kvm_mmu_invalidate_end() updates struct kvm::mmu_invalidate_in_progress,
which is protected by kvm::mmu_lock.  Call kvm_mmu_invalidate_end() before
dropping mmu_lock, not after the unlock.
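
For reference, the resulting callback ordering in __kvm_handle_hva_range()
and kvm_handle_gfn_range() is roughly the following (a simplified sketch,
not the verbatim code):

	KVM_MMU_LOCK(kvm);
	if (!IS_KVM_NULL_FN(range->on_lock))
		range->on_lock(kvm);	/* e.g. kvm_mmu_invalidate_begin() */

	ret |= range->handler(kvm, &gfn_range);

	if (!IS_KVM_NULL_FN(range->before_unlock))
		range->before_unlock(kvm);	/* e.g. kvm_mmu_invalidate_end(), still under mmu_lock */
	KVM_MMU_UNLOCK(kvm);
	if (!IS_KVM_NULL_FN(range->on_unlock))
		range->on_unlock(kvm);	/* e.g. kvm_arch_guest_memory_reclaimed(), after mmu_lock is dropped */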

Fixes: 8e9009ca6d14 ("KVM: Introduce per-page memory attributes")
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 virt/kvm/kvm_main.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8bfeb615fc4d..49380cd62367 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -535,6 +535,7 @@ struct kvm_mmu_notifier_range {
 	} arg;
 	gfn_handler_t handler;
 	on_lock_fn_t on_lock;
+	on_unlock_fn_t before_unlock;
 	on_unlock_fn_t on_unlock;
 	bool flush_on_ret;
 	bool may_block;
@@ -629,6 +630,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
 		kvm_flush_remote_tlbs(kvm);
 
 	if (locked) {
+		if (!IS_KVM_NULL_FN(range->before_unlock))
+			range->before_unlock(kvm);
 		KVM_MMU_UNLOCK(kvm);
 		if (!IS_KVM_NULL_FN(range->on_unlock))
 			range->on_unlock(kvm);
@@ -653,6 +656,7 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
 		.arg.pte	= pte,
 		.handler	= handler,
 		.on_lock	= (void *)kvm_null_fn,
+		.before_unlock	= (void *)kvm_null_fn,
 		.on_unlock	= (void *)kvm_null_fn,
 		.flush_on_ret	= true,
 		.may_block	= false,
@@ -672,6 +676,7 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn
 		.end		= end,
 		.handler	= handler,
 		.on_lock	= (void *)kvm_null_fn,
+		.before_unlock	= (void *)kvm_null_fn,
 		.on_unlock	= (void *)kvm_null_fn,
 		.flush_on_ret	= false,
 		.may_block	= false,
@@ -776,6 +781,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 		.end		= range->end,
 		.handler	= kvm_mmu_unmap_gfn_range,
 		.on_lock	= kvm_mmu_invalidate_begin,
+		.before_unlock	= (void *)kvm_null_fn,
 		.on_unlock	= kvm_arch_guest_memory_reclaimed,
 		.flush_on_ret	= true,
 		.may_block	= mmu_notifier_range_blockable(range),
@@ -815,6 +821,8 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 
 void kvm_mmu_invalidate_end(struct kvm *kvm)
 {
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
 	/*
 	 * This sequence increase will notify the kvm page fault that
 	 * the page that is going to be mapped in the spte could have
@@ -846,6 +854,7 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
 		.end		= range->end,
 		.handler	= (void *)kvm_null_fn,
 		.on_lock	= kvm_mmu_invalidate_end,
+		.before_unlock	= (void *)kvm_null_fn,
 		.on_unlock	= (void *)kvm_null_fn,
 		.flush_on_ret	= false,
 		.may_block	= mmu_notifier_range_blockable(range),
@@ -2433,6 +2442,8 @@ static __always_inline void kvm_handle_gfn_range(struct kvm *kvm,
 		kvm_flush_remote_tlbs(kvm);
 
 	if (locked) {
+		if (!IS_KVM_NULL_FN(range->before_unlock))
+			range->before_unlock(kvm);
 		KVM_MMU_UNLOCK(kvm);
 		if (!IS_KVM_NULL_FN(range->on_unlock))
 			range->on_unlock(kvm);
@@ -2447,6 +2458,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, unsigned long attributes,
 		.end = end,
 		.handler = kvm_mmu_unmap_gfn_range,
 		.on_lock = kvm_mmu_invalidate_begin,
+		.before_unlock	= (void *)kvm_null_fn,
 		.on_unlock = (void *)kvm_null_fn,
 		.flush_on_ret = true,
 		.may_block = true,
@@ -2457,7 +2469,8 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, unsigned long attributes,
 		.arg.attributes = attributes,
 		.handler = kvm_arch_post_set_memory_attributes,
 		.on_lock = (void *)kvm_null_fn,
-		.on_unlock = kvm_mmu_invalidate_end,
+		.before_unlock = kvm_mmu_invalidate_end,
+		.on_unlock = (void *)kvm_null_fn,
 		.may_block = true,
 	};
 	unsigned long i;
-- 
2.25.1



* [PATCH 5/8] KVM: gmem, x86: Add gmem hook for initializing private memory
  2023-08-15 17:18 [PATCH 0/8] KVM: gmem: Adding hooks for SEV and TDX isaku.yamahata
                   ` (3 preceding siblings ...)
  2023-08-15 17:18 ` [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end() isaku.yamahata
@ 2023-08-15 17:18 ` isaku.yamahata
  2023-08-16 20:30   ` Jarkko Sakkinen
  2023-08-15 17:18 ` [PATCH 6/8] KVM: gmem, x86: Add gmem hook for invalidating " isaku.yamahata
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: isaku.yamahata @ 2023-08-15 17:18 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Michael Roth <michael.roth@amd.com>

All gmem pages are expected to be 'private' as defined by a particular
arch/platform. Platforms like SEV-SNP require additional operations to
move these pages into a private state, so implement a hook that can be
used to prepare this memory prior to mapping it into a guest.

In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
2MB mapping in the guest's nested page table depends on whether or not
any subpages within the range have already been initialized as private
in the RMP table, so this hook will also be used by the KVM MMU to clamp
the maximum mapping size accordingly.
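
As a rough illustration only (hypothetical, not the actual SEV-SNP
implementation; snp_rmp_range_is_mixed() and snp_make_private() are made-up
placeholder helpers), a vendor backend could implement the hook along these
lines:

	static int snp_gmem_prepare(struct kvm *kvm, struct kvm_memory_slot *slot,
				    kvm_pfn_t pfn, gfn_t gfn, u8 *max_level)
	{
		/*
		 * A 2MB private mapping needs a consistent RMP state across
		 * the whole 2MB range; if the sub-pages are in a mixed
		 * private/shared state, clamp this fault to a 4KB mapping.
		 */
		if (*max_level > PG_LEVEL_4K &&
		    snp_rmp_range_is_mixed(ALIGN_DOWN(pfn, PTRS_PER_PMD),
					   PTRS_PER_PMD))
			*max_level = PG_LEVEL_4K;

		/* Transition the page to the private state in the RMP table. */
		return snp_make_private(pfn, gfn, *max_level);
	}

The vendor module would set .gmem_prepare in kvm_x86_ops; since the op is
declared KVM_X86_OP_OPTIONAL_RET0, platforms that don't need it fall back to
a stub that returns 0 and leaves max_level untouched.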

Signed-off-by: Michael Roth <michael.roth@amd.com>
Link: https://lore.kernel.org/r/20230612042559.375660-2-michael.roth@amd.com

---
Changes v2 -> v3:
- Newly added
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  3 +++
 arch/x86/kvm/mmu/mmu.c             | 12 ++++++++++--
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 13bc212cd4bc..439ba4beb5af 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -133,6 +133,7 @@ KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bbefd79b7950..2bc42f2887fa 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1732,6 +1732,9 @@ struct kvm_x86_ops {
 	 * Returns vCPU specific APICv inhibit reasons
 	 */
 	unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
+
+	int (*gmem_prepare)(struct kvm *kvm, struct kvm_memory_slot *slot,
+			    kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 05943ccb55a4..06900b01b8f0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4352,6 +4352,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 				   struct kvm_page_fault *fault)
 {
 	int max_order, r;
+	u8 max_level;
 
 	if (!kvm_slot_can_be_private(fault->slot))
 		return kvm_do_memory_fault_exit(vcpu, fault);
@@ -4361,8 +4362,15 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 	if (r)
 		return r;
 
-	fault->max_level = min(kvm_max_level_for_order(max_order),
-			       fault->max_level);
+	max_level = kvm_max_level_for_order(max_order);
+	r = static_call(kvm_x86_gmem_prepare)(vcpu->kvm, fault->slot, fault->pfn,
+					      fault->gfn, &max_level);
+	if (r) {
+		kvm_release_pfn_clean(fault->pfn);
+		return r;
+	}
+
+	fault->max_level = min(max_level, fault->max_level);
 	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
 	return RET_PF_CONTINUE;
 }
-- 
2.25.1



* [PATCH 6/8] KVM: gmem, x86: Add gmem hook for invalidating private memory
  2023-08-15 17:18 [PATCH 0/8] KVM: gmem: Adding hooks for SEV and TDX isaku.yamahata
                   ` (4 preceding siblings ...)
  2023-08-15 17:18 ` [PATCH 5/8] KVM: gmem, x86: Add gmem hook for initializing private memory isaku.yamahata
@ 2023-08-15 17:18 ` isaku.yamahata
  2023-08-16  0:42   ` kernel test robot
                     ` (2 more replies)
  2023-08-15 17:18 ` [PATCH 7/8] KVM: gmem: Avoid race with kvm_gmem_release and mmu notifier isaku.yamahata
                   ` (2 subsequent siblings)
  8 siblings, 3 replies; 26+ messages in thread
From: isaku.yamahata @ 2023-08-15 17:18 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Michael Roth <michael.roth@amd.com>

TODO: add a CONFIG option that can be used to completely skip the arch
invalidation loop and avoid __weak references for arch/platforms that
don't need an additional invalidation hook.

In some cases, like with SEV-SNP, guest memory needs to be updated in a
platform-specific manner before it can be safely freed back to the host.
Add hooks to wire up handling of this sort when freeing memory in
response to FALLOC_FL_PUNCH_HOLE operations.

Also issue invalidations of all allocated pages when releasing the gmem
file so that the pages are not left in an unusable state when they get
freed back to the host.
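
As a rough illustration only (hypothetical, not the actual SEV-SNP code;
snp_reclaim_pfn() is a made-up placeholder for whatever per-page reclaim
primitive the platform provides), a backend hook could look like:

	static void snp_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start,
					kvm_pfn_t end)
	{
		kvm_pfn_t pfn;

		/*
		 * Return each page to a host-owned state so that it is safe
		 * to free it back to the page allocator.
		 */
		for (pfn = start; pfn < end; pfn++)
			snp_reclaim_pfn(pfn);
	}

The vendor module would set .gmem_invalidate in kvm_x86_ops; the op is
declared KVM_X86_OP_OPTIONAL and invoked via static_call_cond(), so
platforms that don't need it can simply leave it NULL.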

Signed-off-by: Michael Roth <michael.roth@amd.com>
Link: https://lore.kernel.org/r/20230612042559.375660-3-michael.roth@amd.com

---
Changes v4 -> v5:
- Fix compile issue by adding static inline when gmem is disabled

Changes v2 -> v3:
- Newly added
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/x86.c                 |  6 +++++
 include/linux/kvm_host.h           |  3 +++
 virt/kvm/guest_mem.c               | 42 ++++++++++++++++++++++++++++++
 5 files changed, 53 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 439ba4beb5af..48f043de2ec0 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -134,6 +134,7 @@ KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
+KVM_X86_OP_OPTIONAL(gmem_invalidate)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2bc42f2887fa..17e78f9f2d17 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1735,6 +1735,7 @@ struct kvm_x86_ops {
 
 	int (*gmem_prepare)(struct kvm *kvm, struct kvm_memory_slot *slot,
 			    kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
+	void (*gmem_invalidate)(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index de195ad83ec0..b54818d02cb1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13274,6 +13274,12 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
 
+#ifdef CONFIG_KVM_PRIVATE_MEM
+void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
+{
+	static_call_cond(kvm_x86_gmem_invalidate)(kvm, start, end);
+}
+#endif
 
 int kvm_spec_ctrl_test_value(u64 value)
 {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 091bc89ae805..349b0bf81fa5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2358,6 +2358,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 #ifdef CONFIG_KVM_PRIVATE_MEM
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 			      gfn_t gfn, kvm_pfn_t *pfn, int *max_order);
+void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);
 #else
 static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 				   struct kvm_memory_slot *slot, gfn_t gfn,
@@ -2366,6 +2367,8 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 	KVM_BUG_ON(1, kvm);
 	return -EIO;
 }
+
+static inline void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end) { }
 #endif /* CONFIG_KVM_PRIVATE_MEM */
 
 #endif
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index ed03f1d12172..342d2938716a 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -127,6 +127,46 @@ static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
 	KVM_MMU_UNLOCK(kvm);
 }
 
+void __weak kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
+{
+}
+
+/* Handle arch-specific hooks needed before releasing guarded pages. */
+static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct file *file,
+					   pgoff_t start, pgoff_t end)
+{
+	pgoff_t file_end = i_size_read(file_inode(file)) >> PAGE_SHIFT;
+	pgoff_t index = start;
+
+	end = min(end, file_end);
+
+	while (index < end) {
+		struct folio *folio;
+		unsigned int order;
+		struct page *page;
+		kvm_pfn_t pfn;
+
+		folio = __filemap_get_folio(file->f_mapping, index,
+					    FGP_LOCK, 0);
+		if (!folio) {
+			index++;
+			continue;
+		}
+
+		page = folio_file_page(folio, index);
+		pfn = page_to_pfn(page);
+		order = folio_order(folio);
+
+		kvm_arch_gmem_invalidate(kvm, pfn, pfn + min((1ul << order), end - index));
+
+		index = folio_next_index(folio);
+		folio_unlock(folio);
+		folio_put(folio);
+
+		cond_resched();
+	}
+}
+
 static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 {
 	struct list_head *gmem_list = &inode->i_mapping->private_list;
@@ -143,6 +183,7 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 	list_for_each_entry(gmem, gmem_list, entry)
 		kvm_gmem_invalidate_begin(gmem, start, end);
 
+	kvm_gmem_issue_arch_invalidate(kvm, file, start, end);
 	truncate_inode_pages_range(inode->i_mapping, offset, offset + len - 1);
 
 	list_for_each_entry(gmem, gmem_list, entry)
@@ -253,6 +294,7 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
 	 * memory, as its lifetime is associated with the inode, not the file.
 	 */
 	kvm_gmem_invalidate_begin(gmem, 0, -1ul);
+	kvm_gmem_issue_arch_invalidate(gmem->kvm, file, 0, -1ul);
 	kvm_gmem_invalidate_end(gmem, 0, -1ul);
 
 	list_del(&gmem->entry);
-- 
2.25.1



* [PATCH 7/8] KVM: gmem: Avoid race with kvm_gmem_release and mmu notifier
  2023-08-15 17:18 [PATCH 0/8] KVM: gmem: Adding hooks for SEV and TDX isaku.yamahata
                   ` (5 preceding siblings ...)
  2023-08-15 17:18 ` [PATCH 6/8] KVM: gmem, x86: Add gmem hook for invalidating " isaku.yamahata
@ 2023-08-15 17:18 ` isaku.yamahata
  2023-08-18 18:15   ` Sean Christopherson
  2023-08-15 17:18 ` [PATCH 8/8] RFC: KVM: gmem: Guarantee the order of destruction isaku.yamahata
  2023-08-18 23:14 ` [PATCH 0/8] KVM: gmem: Adding hooks for SEV and TDX Sean Christopherson
  8 siblings, 1 reply; 26+ messages in thread
From: isaku.yamahata @ 2023-08-15 17:18 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

Take slots_lock around kvm_flush_shadow_all().  On process exit,
kvm_gmem_release() (via fput()) and kvm_mmu_notifier_release() (via
mmput()) can run concurrently, because vhost (/dev/vhost_{net,vsock}) can
delay the mmu_notifier release, kvm_mmu_notifier_release(), to its kernel
thread.  Vhost uses get_task_mm() and mmput() so that its kernel thread can
access the process memory, and that mmput() can be deferred until after the
files have been closed.

As a result, kvm_flush_shadow_all() and kvm_gmem_release() can run at the
same time.  With TDX KVM, releasing the HKID in kvm_flush_shadow_all() can
race with releasing private memory in kvm_gmem_release().  Take slots_lock
in kvm_mmu_notifier_release() to serialize the two.
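
Roughly, the racing paths on process exit look like this (a simplified
sketch based on the description above):

	exiting task                            vhost kernel thread
	------------                            -------------------
	exit_files()
	  fput(gmem file)
	    kvm_gmem_release()
	      kvm_gmem_issue_arch_invalidate()
	                                        mmput()   <- deferred by vhost
	                                          kvm_mmu_notifier_release()
	                                            kvm_flush_shadow_all()

With this patch, kvm_flush_shadow_all() runs under kvm->slots_lock, which
serializes it against the gmem release path (this assumes kvm_gmem_release()
already runs under slots_lock, as in the gmem series this applies to).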

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 virt/kvm/kvm_main.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 49380cd62367..4855d0b7a859 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -927,9 +927,16 @@ static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
 	int idx;
 
+	/*
+	 * Avoid racing with kvm_gmem_release().
+	 * This function is called via the mmu_notifier release callback;
+	 * kvm_gmem_release() is called via fput() on process exit.
+	 */
+	mutex_lock(&kvm->slots_lock);
 	idx = srcu_read_lock(&kvm->srcu);
 	kvm_flush_shadow_all(kvm);
 	srcu_read_unlock(&kvm->srcu, idx);
+	mutex_unlock(&kvm->slots_lock);
 }
 
 static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
-- 
2.25.1



* [PATCH 8/8] RFC: KVM: gmem: Guarantee the order of destruction
  2023-08-15 17:18 [PATCH 0/8] KVM: gmem: Adding hooks for SEV and TDX isaku.yamahata
                   ` (6 preceding siblings ...)
  2023-08-15 17:18 ` [PATCH 7/8] KVM: gmem: Avoid race with kvm_gmem_release and mmu notifier isaku.yamahata
@ 2023-08-15 17:18 ` isaku.yamahata
  2023-08-18 23:14 ` [PATCH 0/8] KVM: gmem: Adding hooks for SEV and TDX Sean Christopherson
  8 siblings, 0 replies; 26+ messages in thread
From: isaku.yamahata @ 2023-08-15 17:18 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

Call kvm_flush_shadow_all() before the kvm gmem file is released on guest
destruction.

The current gmem code doesn't guarantee the destruction order between the
kvm gmem file and kvm_mmu_notifier_release(), which calls
kvm_flush_shadow_all().  When destroying a TD guest, it is more efficient
to call kvm_flush_shadow_all() before kvm_gmem_issue_arch_invalidate() runs
on the release of its struct file, because kvm_flush_shadow_all() releases
the guest's host key ID (HKID).  Once the HKID is released, the TDX module
no longer has to validate the consistency of the Secure-EPT structure.

One way to enforce the order would be to have struct kvm hold a reference
to the kvm gmem file.  However, the current gmem implementation already has
the kvm gmem file reference struct kvm, so a reference from struct kvm to
the gmem file would create a circular reference.  Use
kvm_mmu_notifier_release() to break the cycle.
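
The resulting reference relationships are roughly (sketch):

	struct kvm  -- memslot->file (added by this patch) -->  gmem file
	gmem file   -- gmem->kvm (existing) ----------------->  struct kvm

kvm_mmu_notifier_release() drops the memslot->file references after
kvm_flush_shadow_all() has run, which breaks the cycle and lets the gmem
file release proceed and drop its reference on struct kvm.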

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 include/linux/kvm_host.h | 24 ++++++++++++++++++++++++
 virt/kvm/kvm_main.c      | 23 ++++++++++++++++++++++-
 2 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 349b0bf81fa5..d717945702a8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -594,6 +594,10 @@ struct kvm_memory_slot {
 	u16 as_id;
 
 #ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
+	struct file *file;
+#endif
+	/* Private guest_mem */
 	struct {
 		struct file __rcu *file;
 		pgoff_t pgoff;
@@ -601,6 +605,26 @@ struct kvm_memory_slot {
 #endif
 };
 
+static inline int kvm_memslot_gmem_fget(struct kvm_memory_slot *memslot, int fd)
+{
+#if defined(CONFIG_KVM_PRIVATE_MEM) && defined(CONFIG_KVM_GENERIC_MMU_NOTIFIER)
+	memslot->file = fget(fd);
+	if (!memslot->file)
+		return -EBADF;
+#endif
+	return 0;
+}
+
+static inline void kvm_memslot_gmem_fput(struct kvm_memory_slot *memslot)
+{
+#if defined(CONFIG_KVM_PRIVATE_MEM) && defined(CONFIG_KVM_GENERIC_MMU_NOTIFIER)
+	if (memslot->file) {
+		fput(memslot->file);
+		memslot->file = NULL;
+	}
+#endif
+}
+
 static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot *slot)
 {
 	return slot && (slot->flags & KVM_MEM_PRIVATE);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4855d0b7a859..35bc3b64b7e4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -926,6 +926,7 @@ static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
 {
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
 	int idx;
+	int i;
 
 	/*
 	 * Avoid racing with kvm_gmem_release().
@@ -936,6 +937,18 @@ static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
 	idx = srcu_read_lock(&kvm->srcu);
 	kvm_flush_shadow_all(kvm);
 	srcu_read_unlock(&kvm->srcu, idx);
+
+	/* Break circular reference count: kvm->gmem, gmem->kvm. */
+	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
+		struct kvm_memslots *slots = __kvm_memslots(kvm, i);
+		struct kvm_memory_slot *memslot;
+		struct hlist_node *tmp;
+		int bkt;
+
+		hash_for_each_safe(slots->id_hash, bkt, tmp, memslot, id_node[slots->node_idx])
+			kvm_memslot_gmem_fput(memslot);
+	}
+
 	mutex_unlock(&kvm->slots_lock);
 }
 
@@ -1008,8 +1021,10 @@ static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
 /* This does not remove the slot from struct kvm_memslots data structures */
 static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
-	if (slot->flags & KVM_MEM_PRIVATE)
+	if (slot->flags & KVM_MEM_PRIVATE) {
 		kvm_gmem_unbind(slot);
+		kvm_memslot_gmem_fput(slot);
+	}
 
 	kvm_destroy_dirty_bitmap(slot);
 
@@ -1734,6 +1749,8 @@ static void kvm_commit_memory_region(struct kvm *kvm,
 		if (old->dirty_bitmap && !new->dirty_bitmap)
 			kvm_destroy_dirty_bitmap(old);
 
+		kvm_memslot_gmem_fput(old);
+
 		/*
 		 * The final quirk.  Free the detached, old slot, but only its
 		 * memory, not any metadata.  Metadata, including arch specific
@@ -2088,6 +2105,9 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	new->flags = mem->flags;
 	new->userspace_addr = mem->userspace_addr;
 	if (mem->flags & KVM_MEM_PRIVATE) {
+		r = kvm_memslot_gmem_fget(new, mem->gmem_fd);
+		if (r)
+			goto out;
 		r = kvm_gmem_bind(kvm, new, mem->gmem_fd, mem->gmem_offset);
 		if (r)
 			goto out;
@@ -2103,6 +2123,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	if (mem->flags & KVM_MEM_PRIVATE)
 		kvm_gmem_unbind(new);
 out:
+	kvm_memslot_gmem_fput(new);
 	kfree(new);
 	return r;
 }
-- 
2.25.1



* Re: [PATCH 6/8] KVM: gmem, x86: Add gmem hook for invalidating private memory
  2023-08-15 17:18 ` [PATCH 6/8] KVM: gmem, x86: Add gmem hook for invalidating " isaku.yamahata
@ 2023-08-16  0:42   ` kernel test robot
  2023-08-16 20:37   ` Isaku Yamahata
  2023-10-10  9:17   ` Xu Yilun
  2 siblings, 0 replies; 26+ messages in thread
From: kernel test robot @ 2023-08-16  0:42 UTC (permalink / raw)
  To: isaku.yamahata, kvm, linux-kernel
  Cc: oe-kbuild-all, isaku.yamahata, isaku.yamahata, Michael Roth,
	Paolo Bonzini, Sean Christopherson, erdemaktas, Sagi Shahar,
	David Matlack, Kai Huang, Zhi Wang, chen.bo, linux-coco,
	Chao Peng, Ackerley Tng, Vishal Annapurve, Yuan Yao,
	Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

Hi,

kernel test robot noticed the following build errors:

[auto build test ERROR on 89b6a7b873d72280e85976bbb8fe4998b2ababa8]

url:    https://github.com/intel-lab-lkp/linux/commits/isaku-yamahata-intel-com/KVM-gmem-Make-kvm_gmem_bind-return-EBADF-on-wrong-fd/20230816-012315
base:   89b6a7b873d72280e85976bbb8fe4998b2ababa8
patch link:    https://lore.kernel.org/r/8c9f0470ba6e5dc122f3f4e37c4dcfb6fb97b184.1692119201.git.isaku.yamahata%40intel.com
patch subject: [PATCH 6/8] KVM: gmem, x86: Add gmem hook for invalidating private memory
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20230816/202308160801.jwbys3HI-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce: (https://download.01.org/0day-ci/archive/20230816/202308160801.jwbys3HI-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308160801.jwbys3HI-lkp@intel.com/

All errors (new ones prefixed by >>):

   arch/x86/kvm/../../../virt/kvm/guest_mem.c: In function 'kvm_gmem_punch_hole':
>> arch/x86/kvm/../../../virt/kvm/guest_mem.c:186:40: error: 'kvm' undeclared (first use in this function)
     186 |         kvm_gmem_issue_arch_invalidate(kvm, file, start, end);
         |                                        ^~~
   arch/x86/kvm/../../../virt/kvm/guest_mem.c:186:40: note: each undeclared identifier is reported only once for each function it appears in
>> arch/x86/kvm/../../../virt/kvm/guest_mem.c:186:45: error: 'file' undeclared (first use in this function)
     186 |         kvm_gmem_issue_arch_invalidate(kvm, file, start, end);
         |                                             ^~~~


vim +/kvm +186 arch/x86/kvm/../../../virt/kvm/guest_mem.c

   169	
   170	static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
   171	{
   172		struct list_head *gmem_list = &inode->i_mapping->private_list;
   173		pgoff_t start = offset >> PAGE_SHIFT;
   174		pgoff_t end = (offset + len) >> PAGE_SHIFT;
   175		struct kvm_gmem *gmem;
   176	
   177		/*
   178		 * Bindings must stable across invalidation to ensure the start+end
   179		 * are balanced.
   180		 */
   181		filemap_invalidate_lock(inode->i_mapping);
   182	
   183		list_for_each_entry(gmem, gmem_list, entry)
   184			kvm_gmem_invalidate_begin(gmem, start, end);
   185	
 > 186		kvm_gmem_issue_arch_invalidate(kvm, file, start, end);
   187		truncate_inode_pages_range(inode->i_mapping, offset, offset + len - 1);
   188	
   189		list_for_each_entry(gmem, gmem_list, entry)
   190			kvm_gmem_invalidate_end(gmem, start, end);
   191	
   192		filemap_invalidate_unlock(inode->i_mapping);
   193	
   194		return 0;
   195	}
   196	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end()
  2023-08-15 17:18 ` [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end() isaku.yamahata
@ 2023-08-16 20:28   ` Jarkko Sakkinen
  2023-08-18 17:55   ` Sean Christopherson
  1 sibling, 0 replies; 26+ messages in thread
From: Jarkko Sakkinen @ 2023-08-16 20:28 UTC (permalink / raw)
  To: isaku.yamahata, kvm, linux-kernel
  Cc: isaku.yamahata, Michael Roth, Paolo Bonzini, Sean Christopherson,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

On Tue Aug 15, 2023 at 8:18 PM EEST,  wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> kvm_mmu_invalidate_end() updates struct kvm::mmu_invalidate_in_progress,
> which is protected by kvm::mmu_lock.  Call kvm_mmu_invalidate_end() before
> dropping mmu_lock, not after the unlock.
>
> Fixes: 8e9009ca6d14 ("KVM: Introduce per-page memory attributes")
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>  virt/kvm/kvm_main.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8bfeb615fc4d..49380cd62367 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -535,6 +535,7 @@ struct kvm_mmu_notifier_range {
>  	} arg;
>  	gfn_handler_t handler;
>  	on_lock_fn_t on_lock;
> +	on_unlock_fn_t before_unlock;
>  	on_unlock_fn_t on_unlock;
>  	bool flush_on_ret;
>  	bool may_block;
> @@ -629,6 +630,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
>  		kvm_flush_remote_tlbs(kvm);
>  
>  	if (locked) {
> +		if (!IS_KVM_NULL_FN(range->before_unlock))
> +			range->before_unlock(kvm);
>  		KVM_MMU_UNLOCK(kvm);
>  		if (!IS_KVM_NULL_FN(range->on_unlock))
>  			range->on_unlock(kvm);
> @@ -653,6 +656,7 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
>  		.arg.pte	= pte,
>  		.handler	= handler,
>  		.on_lock	= (void *)kvm_null_fn,
> +		.before_unlock	= (void *)kvm_null_fn,
>  		.on_unlock	= (void *)kvm_null_fn,
>  		.flush_on_ret	= true,
>  		.may_block	= false,
> @@ -672,6 +676,7 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn
>  		.end		= end,
>  		.handler	= handler,
>  		.on_lock	= (void *)kvm_null_fn,
> +		.before_unlock	= (void *)kvm_null_fn,
>  		.on_unlock	= (void *)kvm_null_fn,
>  		.flush_on_ret	= false,
>  		.may_block	= false,
> @@ -776,6 +781,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
>  		.end		= range->end,
>  		.handler	= kvm_mmu_unmap_gfn_range,
>  		.on_lock	= kvm_mmu_invalidate_begin,
> +		.before_unlock	= (void *)kvm_null_fn,
>  		.on_unlock	= kvm_arch_guest_memory_reclaimed,
>  		.flush_on_ret	= true,
>  		.may_block	= mmu_notifier_range_blockable(range),
> @@ -815,6 +821,8 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
>  
>  void kvm_mmu_invalidate_end(struct kvm *kvm)
>  {
> +	lockdep_assert_held_write(&kvm->mmu_lock);
> +
>  	/*
>  	 * This sequence increase will notify the kvm page fault that
>  	 * the page that is going to be mapped in the spte could have
> @@ -846,6 +854,7 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
>  		.end		= range->end,
>  		.handler	= (void *)kvm_null_fn,
>  		.on_lock	= kvm_mmu_invalidate_end,
> +		.before_unlock	= (void *)kvm_null_fn,
>  		.on_unlock	= (void *)kvm_null_fn,
>  		.flush_on_ret	= false,
>  		.may_block	= mmu_notifier_range_blockable(range),
> @@ -2433,6 +2442,8 @@ static __always_inline void kvm_handle_gfn_range(struct kvm *kvm,
>  		kvm_flush_remote_tlbs(kvm);
>  
>  	if (locked) {
> +		if (!IS_KVM_NULL_FN(range->before_unlock))
> +			range->before_unlock(kvm);
>  		KVM_MMU_UNLOCK(kvm);
>  		if (!IS_KVM_NULL_FN(range->on_unlock))
>  			range->on_unlock(kvm);
> @@ -2447,6 +2458,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, unsigned long attributes,
>  		.end = end,
>  		.handler = kvm_mmu_unmap_gfn_range,
>  		.on_lock = kvm_mmu_invalidate_begin,
> +		.before_unlock	= (void *)kvm_null_fn,
>  		.on_unlock = (void *)kvm_null_fn,
>  		.flush_on_ret = true,
>  		.may_block = true,
> @@ -2457,7 +2469,8 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, unsigned long attributes,
>  		.arg.attributes = attributes,
>  		.handler = kvm_arch_post_set_memory_attributes,
>  		.on_lock = (void *)kvm_null_fn,
> -		.on_unlock = kvm_mmu_invalidate_end,
> +		.before_unlock = kvm_mmu_invalidate_end,
> +		.on_unlock = (void *)kvm_null_fn,
>  		.may_block = true,
>  	};
>  	unsigned long i;
> -- 
> 2.25.1

Acked-by: Jarkko Sakkinen <jarkko@kernel.org>

BR, Jarkko


* Re: [PATCH 5/8] KVM: gmem, x86: Add gmem hook for initializing private memory
  2023-08-15 17:18 ` [PATCH 5/8] KVM: gmem, x86: Add gmem hook for initializing private memory isaku.yamahata
@ 2023-08-16 20:30   ` Jarkko Sakkinen
  0 siblings, 0 replies; 26+ messages in thread
From: Jarkko Sakkinen @ 2023-08-16 20:30 UTC (permalink / raw)
  To: isaku.yamahata, kvm, linux-kernel
  Cc: isaku.yamahata, Michael Roth, Paolo Bonzini, Sean Christopherson,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

On Tue Aug 15, 2023 at 8:18 PM EEST,  wrote:
> From: Michael Roth <michael.roth@amd.com>
>
> All gmem pages are expected to be 'private' as defined by a particular
> arch/platform. Platforms like SEV-SNP require additional operations to
> move these pages into a private state, so implement a hook that can be
> used to prepare this memory prior to mapping it into a guest.
>
> In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
> 2MB mapping in the guest's nested page table depends on whether or not
> any subpages within the range have already been initialized as private
> in the RMP table, so this hook will also be used by the KVM MMU to clamp
> the maximum mapping size accordingly.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Link: https://lore.kernel.org/r/20230612042559.375660-2-michael.roth@amd.com
>
> ---
> Changes v2 -> v3:
> - Newly added
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  3 +++
>  arch/x86/kvm/mmu/mmu.c             | 12 ++++++++++--
>  3 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 13bc212cd4bc..439ba4beb5af 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -133,6 +133,7 @@ KVM_X86_OP(msr_filter_changed)
>  KVM_X86_OP(complete_emulated_msr)
>  KVM_X86_OP(vcpu_deliver_sipi_vector)
>  KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> +KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
>  
>  #undef KVM_X86_OP
>  #undef KVM_X86_OP_OPTIONAL
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index bbefd79b7950..2bc42f2887fa 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1732,6 +1732,9 @@ struct kvm_x86_ops {
>  	 * Returns vCPU specific APICv inhibit reasons
>  	 */
>  	unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
> +
> +	int (*gmem_prepare)(struct kvm *kvm, struct kvm_memory_slot *slot,
> +			    kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
>  };
>  
>  struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 05943ccb55a4..06900b01b8f0 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4352,6 +4352,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
>  				   struct kvm_page_fault *fault)
>  {
>  	int max_order, r;
> +	u8 max_level;
>  
>  	if (!kvm_slot_can_be_private(fault->slot))
>  		return kvm_do_memory_fault_exit(vcpu, fault);
> @@ -4361,8 +4362,15 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
>  	if (r)
>  		return r;
>  
> -	fault->max_level = min(kvm_max_level_for_order(max_order),
> -			       fault->max_level);
> +	max_level = kvm_max_level_for_order(max_order);
> +	r = static_call(kvm_x86_gmem_prepare)(vcpu->kvm, fault->slot, fault->pfn,
> +					      fault->gfn, &max_level);
> +	if (r) {
> +		kvm_release_pfn_clean(fault->pfn);
> +		return r;
> +	}
> +
> +	fault->max_level = min(max_level, fault->max_level);
>  	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
>  	return RET_PF_CONTINUE;
>  }
> -- 
> 2.25.1

Acked-by: Jarkko Sakkinen <jarkko@kernel.org>

BR, Jarkko


* Re: [PATCH 6/8] KVM: gmem, x86: Add gmem hook for invalidating private memory
  2023-08-15 17:18 ` [PATCH 6/8] KVM: gmem, x86: Add gmem hook for invalidating " isaku.yamahata
  2023-08-16  0:42   ` kernel test robot
@ 2023-08-16 20:37   ` Isaku Yamahata
  2023-10-10  9:17   ` Xu Yilun
  2 siblings, 0 replies; 26+ messages in thread
From: Isaku Yamahata @ 2023-08-16 20:37 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: kvm, linux-kernel, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

On Tue, Aug 15, 2023 at 10:18:53AM -0700,
isaku.yamahata@intel.com wrote:

> From: Michael Roth <michael.roth@amd.com>
> 
> TODO: add a CONFIG option that can be used to completely skip the arch
> invalidation loop and avoid __weak references for arch/platforms that
> don't need an additional invalidation hook.
> 
> In some cases, like with SEV-SNP, guest memory needs to be updated in a
> platform-specific manner before it can be safely freed back to the host.
> Add hooks to wire up handling of this sort when freeing memory in
> response to FALLOC_FL_PUNCH_HOLE operations.
> 
> Also issue invalidations of all allocated pages when releasing the gmem
> file so that the pages are not left in an unusable state when they get
> freed back to the host.
> 
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Link: https://lore.kernel.org/r/20230612042559.375660-3-michael.roth@amd.com

Somehow I posted the old version.  Here is the updated one.  The change is
that kvm_gmem_issue_arch_invalidate() now takes a struct inode argument
instead of a struct file.


From e14483943e2ab6d8a0e4d00ea903509595847aa9 Mon Sep 17 00:00:00 2001
Message-Id: <e14483943e2ab6d8a0e4d00ea903509595847aa9.1692218085.git.isaku.yamahata@intel.com>
In-Reply-To: <cover.1692218085.git.isaku.yamahata@intel.com>
References: <cover.1692218085.git.isaku.yamahata@intel.com>
From: Michael Roth <michael.roth@amd.com>
Date: Sun, 11 Jun 2023 23:25:10 -0500
Subject: [PATCH 6/8] KVM: gmem, x86: Add gmem hook for invalidating private
 memory

TODO: add a CONFIG option that can be used to completely skip the arch
invalidation loop and avoid __weak references for arch/platforms that
don't need an additional invalidation hook.

In some cases, like with SEV-SNP, guest memory needs to be updated in a
platform-specific manner before it can be safely freed back to the host.
Add hooks to wire up handling of this sort when freeing memory in
response to FALLOC_FL_PUNCH_HOLE operations.

Also issue invalidations of all allocated pages when releasing the gmem
file so that the pages are not left in an unusable state when they get
freed back to the host.

Signed-off-by: Michael Roth <michael.roth@amd.com>
Link: https://lore.kernel.org/r/20230612042559.375660-3-michael.roth@amd.com

---
Changes:
- Use struct inode instead of struct file.

Changes v4 -> v5:
- Fix compile issue by adding static inline when gmem is disabled

Changes v2 -> v3:
- Newly added

KVM: guest_mem: fix kvm_gmem_issue_arch_invalidate()

Now kvm_gmem_issue_arch_invalidate() takes struct inode instead of
struct file.  Adjust argument.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/x86.c                 |  6 +++++
 include/linux/kvm_host.h           |  3 +++
 virt/kvm/guest_mem.c               | 42 ++++++++++++++++++++++++++++++
 5 files changed, 53 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 439ba4beb5af..48f043de2ec0 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -134,6 +134,7 @@ KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
+KVM_X86_OP_OPTIONAL(gmem_invalidate)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2bc42f2887fa..17e78f9f2d17 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1735,6 +1735,7 @@ struct kvm_x86_ops {
 
 	int (*gmem_prepare)(struct kvm *kvm, struct kvm_memory_slot *slot,
 			    kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
+	void (*gmem_invalidate)(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index de195ad83ec0..b54818d02cb1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13274,6 +13274,12 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
 
+#ifdef CONFIG_KVM_PRIVATE_MEM
+void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
+{
+	static_call_cond(kvm_x86_gmem_invalidate)(kvm, start, end);
+}
+#endif
 
 int kvm_spec_ctrl_test_value(u64 value)
 {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 091bc89ae805..349b0bf81fa5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2358,6 +2358,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 #ifdef CONFIG_KVM_PRIVATE_MEM
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 			      gfn_t gfn, kvm_pfn_t *pfn, int *max_order);
+void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);
 #else
 static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 				   struct kvm_memory_slot *slot, gfn_t gfn,
@@ -2366,6 +2367,8 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 	KVM_BUG_ON(1, kvm);
 	return -EIO;
 }
+
+static inline void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end) { }
 #endif /* CONFIG_KVM_PRIVATE_MEM */
 
 #endif
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index ed03f1d12172..13d6dab08f87 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -127,6 +127,46 @@ static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
 	KVM_MMU_UNLOCK(kvm);
 }
 
+void __weak kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
+{
+}
+
+/* Handle arch-specific hooks needed before releasing guarded pages. */
+static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct inode *inode,
+					   pgoff_t start, pgoff_t end)
+{
+	pgoff_t file_end = i_size_read(inode) >> PAGE_SHIFT;
+	pgoff_t index = start;
+
+	end = min(end, file_end);
+
+	while (index < end) {
+		struct folio *folio;
+		unsigned int order;
+		struct page *page;
+		kvm_pfn_t pfn;
+
+		folio = __filemap_get_folio(inode->i_mapping, index,
+					    FGP_LOCK, 0);
+		if (!folio) {
+			index++;
+			continue;
+		}
+
+		page = folio_file_page(folio, index);
+		pfn = page_to_pfn(page);
+		order = folio_order(folio);
+
+		kvm_arch_gmem_invalidate(kvm, pfn, pfn + min((1ul << order), end - index));
+
+		index = folio_next_index(folio);
+		folio_unlock(folio);
+		folio_put(folio);
+
+		cond_resched();
+	}
+}
+
 static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 {
 	struct list_head *gmem_list = &inode->i_mapping->private_list;
@@ -143,6 +183,7 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 	list_for_each_entry(gmem, gmem_list, entry)
 		kvm_gmem_invalidate_begin(gmem, start, end);
 
+	kvm_gmem_issue_arch_invalidate(gmem->kvm, inode, start, end);
 	truncate_inode_pages_range(inode->i_mapping, offset, offset + len - 1);
 
 	list_for_each_entry(gmem, gmem_list, entry)
@@ -253,6 +294,7 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
 	 * memory, as its lifetime is associated with the inode, not the file.
 	 */
 	kvm_gmem_invalidate_begin(gmem, 0, -1ul);
+	kvm_gmem_issue_arch_invalidate(gmem->kvm, file_inode(file), 0, -1ul);
 	kvm_gmem_invalidate_end(gmem, 0, -1ul);
 
 	list_del(&gmem->entry);
-- 
2.25.1
-- 
Isaku Yamahata <isaku.yamahata@linux.intel.com>


* Re: [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end()
  2023-08-15 17:18 ` [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end() isaku.yamahata
  2023-08-16 20:28   ` Jarkko Sakkinen
@ 2023-08-18 17:55   ` Sean Christopherson
  2023-08-18 20:32     ` Kalra, Ashish
  1 sibling, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2023-08-18 17:55 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: kvm, linux-kernel, isaku.yamahata, Michael Roth, Paolo Bonzini,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Tue, Aug 15, 2023, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> kvm_mmu_invalidate_end() updates struct kvm::mmu_invalidate_in_progress,
> which is protected by kvm::mmu_lock.  Call kvm_mmu_invalidate_end() before
> dropping mmu_lock, not after the unlock.
> 
> Fixes: 8e9009ca6d14 ("KVM: Introduce per-page memory attributes")

This Fixes tag is wrong.  It won't matter in the long run, but it makes my
life that much harder.

> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>  virt/kvm/kvm_main.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8bfeb615fc4d..49380cd62367 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -535,6 +535,7 @@ struct kvm_mmu_notifier_range {
>  	} arg;
>  	gfn_handler_t handler;
>  	on_lock_fn_t on_lock;
> +	on_unlock_fn_t before_unlock;
>  	on_unlock_fn_t on_unlock;

Ugh, shame on my past me.  Having on_lock and on_unlock be asymmetrical with respect
to the lock is nasty.

I would much rather we either (a) be explicit, e.g. before_(un)lock and after_(un)lock,
or (b) have just on_(un)lock, make them symmetrical, and handle the SEV mess a
different way.

The SEV hook doesn't actually care about running immediately after unlock, it just
wants to know if there was an overlapping memslot.  It can run after SRCU is dropped,
because even if we make the behavior more precise (right now it blasts WBINVD),
just having a reference to memslots isn't sufficient, the code needs to guarantee
memslots are *stable*.  And that is already guaranteed by the notifier code, i.e.
the SEV code could just reacquire SRCU.

And that will hold true for any possible stuff that wants to do something *after*
dropping mmu_lock.

One idea would be to use a struct to return a tuple of the actual return value
along with whether or not mmu_lock was acquired, i.e. if there was overlap.  The
only really gross part is squeezing in a meaningful name.  Abusing a #define is
one way to make the code somewhat readable...

I think this has my vote, especially if we can figure out a way to keep the
line lengths reasonable without a gnarly #define hack (I really don't want to
split the return onto a separate line).

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 95c621e05b5a..ec45510549bf 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -541,6 +541,11 @@ struct kvm_mmu_notifier_range {
        bool may_block;
 };
 
+struct kvm_mmu_notifier_return {
+       bool ret;
+       bool locked;
+};
+
 /*
  * Use a dedicated stub instead of NULL to indicate that there is no callback
  * function/handler.  The compiler technically can't guarantee that a real
@@ -560,10 +565,15 @@ static void kvm_null_fn(void)
             node;                                                           \
             node = interval_tree_iter_next(node, start, last))      \
 
-static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
-                                                 const struct kvm_mmu_notifier_range *range)
+#define kvm_mn_ret_t __always_inline struct kvm_mmu_notifier_return
+
+static kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
+                                          const struct kvm_mmu_notifier_range *range)
 {
-       bool ret = false, locked = false;
+       struct kvm_mmu_notifier_return r = {
+               .ret = false,
+               .locked = false,
+       };
        struct kvm_gfn_range gfn_range;
        struct kvm_memory_slot *slot;
        struct kvm_memslots *slots;
@@ -574,12 +584,12 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
        BUILD_BUG_ON(sizeof(gfn_range.arg) != sizeof(range->arg));
 
        if (WARN_ON_ONCE(range->end <= range->start))
-               return 0;
+               return r;
 
        /* A null handler is allowed if and only if on_lock() is provided. */
        if (WARN_ON_ONCE(IS_KVM_NULL_FN(range->on_lock) &&
                         IS_KVM_NULL_FN(range->handler)))
-               return 0;
+               return r;
 
        idx = srcu_read_lock(&kvm->srcu);
 
@@ -613,8 +623,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
                        gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot);
                        gfn_range.slot = slot;
 
-                       if (!locked) {
-                               locked = true;
+                       if (!r.locked) {
+                               r.locked = true;
                                KVM_MMU_LOCK(kvm);
                                if (!IS_KVM_NULL_FN(range->on_lock))
                                        range->on_lock(kvm);
@@ -622,33 +632,28 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
                                if (IS_KVM_NULL_FN(range->handler))
                                        break;
                        }
-                       ret |= range->handler(kvm, &gfn_range);
+                       r.ret |= range->handler(kvm, &gfn_range);
                }
        }
 
-       if (range->flush_on_ret && ret)
+       if (range->flush_on_ret && r.ret)
                kvm_flush_remote_tlbs(kvm);
 
-       if (locked) {
+       if (r.locked) {
                KVM_MMU_UNLOCK(kvm);
                if (!IS_KVM_NULL_FN(range->on_unlock))
                        range->on_unlock(kvm);
        }
 
-       if (range->reclaim_on_ret && ret)
-               kvm_arch_guest_memory_reclaimed(kvm);
-
        srcu_read_unlock(&kvm->srcu, idx);
 
-       /* The notifiers are averse to booleans. :-( */
-       return (int)ret;
+       return r;
 }
 
...skipping...
        const struct kvm_mmu_notifier_range range = {
@@ -780,7 +785,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
                .end            = range->end,
                .handler        = kvm_mmu_unmap_gfn_range,
                .on_lock        = kvm_mmu_invalidate_begin,
-               .on_unlock      = kvm_null_fn,
+               .on_unlock      = (void *)kvm_null_fn,
                .flush_on_ret   = true,
                .reclaim_on_ret = true,
                .may_block      = mmu_notifier_range_blockable(range),
@@ -813,7 +818,8 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
        gfn_to_pfn_cache_invalidate_start(kvm, range->start, range->end,
                                          hva_range.may_block);
 
-       __kvm_handle_hva_range(kvm, &hva_range);
+       if (__kvm_handle_hva_range(kvm, &hva_range).locked)
+               kvm_arch_guest_memory_reclaimed(kvm);
 
        return 0;
 }
@@ -881,7 +887,7 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
 {
        trace_kvm_age_hva(start, end);
 
-       return kvm_handle_hva_range(mn, start, end, __pte(0), kvm_age_gfn);
+       return kvm_handle_hva_range(mn, start, end, __pte(0), kvm_age_gfn).ret;
 }
 
 static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
@@ -904,7 +910,7 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
         * cadence. If we find this inaccurate, we might come up with a
         * more sophisticated heuristic later.
         */
-       return kvm_handle_hva_range_no_flush(mn, start, end, kvm_age_gfn);
+       return kvm_handle_hva_range_no_flush(mn, start, end, kvm_age_gfn).ret;
 }
 
 static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
@@ -914,7 +920,7 @@ static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
        trace_kvm_test_age_hva(address);
 
        return kvm_handle_hva_range_no_flush(mn, address, address + 1,
-                                            kvm_test_age_gfn);
+                                            kvm_test_age_gfn).ret;
 }
 
 static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
@@ -2400,12 +2406,14 @@ static u64 kvm_supported_mem_attributes(struct kvm *kvm)
 static __always_inline void kvm_handle_gfn_range(struct kvm *kvm,
                                                 struct kvm_mmu_notifier_range *range)
 {
+       struct kvm_mmu_notifier_return r = {
+               .ret = false,
+               .locked = false,
+       };
        struct kvm_gfn_range gfn_range;
        struct kvm_memory_slot *slot;
        struct kvm_memslots *slots;
        struct kvm_memslot_iter iter;
-       bool locked = false;
-       bool ret = false;
        int i;
 
        gfn_range.arg.raw = range->arg.raw;
@@ -2423,21 +2431,21 @@ static __always_inline void kvm_handle_gfn_range(struct kvm *kvm,
                        if (gfn_range.start >= gfn_range.end)
                                continue;
 
-                       if (!locked) {
-                               locked = true;
+                       if (!r.locked) {
+                               r.locked = true;
                                KVM_MMU_LOCK(kvm);
                                if (!IS_KVM_NULL_FN(range->on_lock))
                                        range->on_lock(kvm);
                        }
 
-                       ret |= range->handler(kvm, &gfn_range);
+                       r.ret |= range->handler(kvm, &gfn_range);
                }
        }
 
-       if (range->flush_on_ret && ret)
+       if (range->flush_on_ret && r.ret)
                kvm_flush_remote_tlbs(kvm);
 
-       if (locked) {
+       if (r.locked) {
                KVM_MMU_UNLOCK(kvm);
                if (!IS_KVM_NULL_FN(range->on_unlock))
                        range->on_unlock(kvm);


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/8] KVM: gmem: Avoid race with kvm_gmem_release and mmu notifier
  2023-08-15 17:18 ` [PATCH 7/8] KVM: gmem: Avoid race with kvm_gmem_release and mmu notifier isaku.yamahata
@ 2023-08-18 18:15   ` Sean Christopherson
  0 siblings, 0 replies; 26+ messages in thread
From: Sean Christopherson @ 2023-08-18 18:15 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: kvm, linux-kernel, isaku.yamahata, Michael Roth, Paolo Bonzini,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Tue, Aug 15, 2023, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Add slots_lock around kvm_flush_shadow_all().  kvm_gmem_release() via
> fput() and kvm_mmu_notifier_release() via mmput() can be called
> simultaneously on process exit because vhost (/dev/vhost_{net,vsock}) can
> delay the release of the mmu_notifier, i.e. kvm_mmu_notifier_release(), to
> its kernel thread.  Vhost uses get_task_mm() and mmput() from that kernel
> thread to access process memory, and the mmput() can be deferred until
> after the file is closed.
> 
> kvm_flush_shadow_all() and kvm_gmem_release() can be called simultaneously.

KVM shouldn't reclaim memory on file release, it should instead do that on the
inode being "evicted": https://lore.kernel.org/all/ZLGiEfJZTyl7M8mS@google.com

> With TDX KVM, HKID releasing by kvm_flush_shadow_all() and private memory
> releasing by kvm_gmem_release() can race.  Add slots_lock to
> kvm_mmu_notifier_release().

No, the right answer is to not release the HKID until the VM is destroyed.  gmem
has a reference to its associated kvm instance, and so that will naturally ensure
all memory encrypted with the HKID is freed before the HKID is released.
kvm_flush_shadow_all() should only tear down page tables, it shouldn't be freeing
guest_memfd memory.

Then patches 6-8 go away.
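
FWIW, the ordering that makes that work, in sketch form (not code from this
series; tdx_vm_free_hkid() is a made-up name and the bodies are trimmed to the
parts that matter for ordering):

static int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
{
	/* ... set up the gmem file/inode ... */
	kvm_get_kvm(kvm);		/* the gmem file pins the VM */
	return 0;
}

static int kvm_gmem_release(struct inode *inode, struct file *file)
{
	struct kvm_gmem *gmem = file->private_data;

	/* ... zap SPTEs and free every gmem folio; the HKID is still valid ... */

	kvm_put_kvm(gmem->kvm);		/* may drop the last VM reference */
	return 0;
}

void kvm_arch_destroy_vm(struct kvm *kvm)
{
	/* Runs strictly after kvm_gmem_release(), so releasing the HKID is safe. */
	tdx_vm_free_hkid(kvm);		/* hypothetical TDX helper */

	/* ... the rest of the existing teardown ... */
}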

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end()
  2023-08-18 17:55   ` Sean Christopherson
@ 2023-08-18 20:32     ` Kalra, Ashish
  2023-08-18 22:44       ` Sean Christopherson
  0 siblings, 1 reply; 26+ messages in thread
From: Kalra, Ashish @ 2023-08-18 20:32 UTC (permalink / raw)
  To: Sean Christopherson, isaku.yamahata
  Cc: kvm, linux-kernel, isaku.yamahata, Michael Roth, Paolo Bonzini,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba


On 8/18/2023 12:55 PM, Sean Christopherson wrote:
> On Tue, Aug 15, 2023, isaku.yamahata@intel.com wrote:
>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>
>> kvm_mmu_invalidate_end() updates struct kvm::mmu_invalidate_in_progress
>> and it's protected by kvm::mmu_lock.  call kvm_mmu_invalidate_end() before
>> unlocking it. Not after the unlock.
>>
>> Fixes: 8e9009ca6d14 ("KVM: Introduce per-page memory attributes")
> 
> This fixes is wrong.  It won't matter in the long run, but it makes my life that
> much harder.
> 
>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>> ---
>>   virt/kvm/kvm_main.c | 15 ++++++++++++++-
>>   1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 8bfeb615fc4d..49380cd62367 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -535,6 +535,7 @@ struct kvm_mmu_notifier_range {
>>   	} arg;
>>   	gfn_handler_t handler;
>>   	on_lock_fn_t on_lock;
>> +	on_unlock_fn_t before_unlock;
>>   	on_unlock_fn_t on_unlock;
> 
> Ugh, shame on my past me.  Having on_lock and on_unlock be asymmetrical with respect
> to the lock is nasty.
> 
> I would much rather we either (a) be explicit, e.g. before_(un)lock and after_(un)lock,
> or (b) have just on_(un)lock, make them symmetrical, and handle the SEV mess a
> different way.
> 
> The SEV hook doesn't actually care about running immediately after unlock, it just
> wants to know if there was an overlapping memslot.  It can run after SRCU is dropped,
> because even if we make the behavior more precise (right now it blasts WBINVD),
> just having a reference to memslots isn't sufficient, the code needs to guarantee
> memslots are *stable*.  And that is already guaranteed by the notifier code, i.e.
> the SEV code could just reacquire SRCU.

On a separate note here, the SEV hook blasting WBINVD is still causing 
serious performance degradation issues with SNP triggered via 
AutoNUMA/numad/KSM, etc. With reference to previous discussions related 
to it, we have plans to replace WBINVD with CLFLUSHOPT.

Pasting your previous thoughts on the same:

For SNP guests, KVM should use CLFLUSHOPT and not WBINVD.
That will slow down the SNP guest itself, but it should eliminate the 
noisy neighbor problems.

In theory, KVM could do the same for SEV/SEV-ES guests, but that's 
subtly quite difficult, because in order to use CLFLUSHOPT, the kernel 
needs a valid VA=>PA mapping.
Because mmu_notifier_invalidate_range_start() calls aren't fully 
serialized, KVM would encounter situations where there is no valid 
mapping for the userspace VA.
KVM could ignore those, but IIRC when Mingwei and I last looked at this, 
we weren't super confident that KVM wouldn't miss edge cases.

Using KVM's SPTEs to get the PA isn't a great option, as that would 
require KVM to flush whenever a leaf SPTE were zapped, i.e. even when 
_KVM_ initiates the zap.

UPM is supposed to make this easier because the notifier should be able
to provide the PFN(s) being unmapped and then use the direct map to
flush.  I don't think the proposed series actually provides the PFN, but
it should not be difficult to add.
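
As a rough illustration of that per-PFN flush (purely a sketch;
sev_flush_reclaimed_pages() is a made-up name and, as noted above, the
PFN-passing plumbing does not exist yet), flushing via the direct map could
look like:

static void sev_flush_reclaimed_pages(kvm_pfn_t pfn, unsigned long npages)
{
	unsigned long i;

	if (!boot_cpu_has(X86_FEATURE_CLFLUSHOPT)) {
		wbinvd_on_all_cpus();
		return;
	}

	for (i = 0; i < npages; i++) {
		/* Direct-map VA of the page being reclaimed. */
		void *va = page_address(pfn_to_page(pfn + i));

		/* clflush_cache_range() uses CLFLUSHOPT when it's available. */
		clflush_cache_range(va, PAGE_SIZE);
	}
}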

Thanks,
Ashish


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/8] KVM: gmem: Fix kvm_gmem_issue_arch_invalidate()
  2023-08-15 17:18 ` [PATCH 3/8] KVM: gmem: Fix kvm_gmem_issue_arch_invalidate() isaku.yamahata
@ 2023-08-18 22:33   ` Sean Christopherson
  0 siblings, 0 replies; 26+ messages in thread
From: Sean Christopherson @ 2023-08-18 22:33 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: kvm, linux-kernel, isaku.yamahata, Michael Roth, Paolo Bonzini,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Tue, Aug 15, 2023, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> __filemap_get_folio() can return error.  Use IS_ERR_OR_NULL.
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>

*sigh*

(a) This patch is ordered incorrectly, it belongs after patch 6.

(b) Don't post broken patches with fixup in the same damn series.

This is not hard, these are basic rules of the road.

Sorry for being grumpy, but I spent way too much time figuring out what on earth
was going on, for something that should have required exactly _zero_ effort on
my end.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end()
  2023-08-18 20:32     ` Kalra, Ashish
@ 2023-08-18 22:44       ` Sean Christopherson
  2023-08-19  2:08         ` Mingwei Zhang
  0 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2023-08-18 22:44 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: isaku.yamahata, kvm, linux-kernel, isaku.yamahata, Michael Roth,
	Paolo Bonzini, erdemaktas, Sagi Shahar, David Matlack, Kai Huang,
	Zhi Wang, chen.bo, linux-coco, Chao Peng, Ackerley Tng,
	Vishal Annapurve, Yuan Yao, Jarkko Sakkinen, Xu Yilun,
	Quentin Perret, wei.w.wang, Fuad Tabba, Mingwei Zhang

+Mingwei to correct me if I'm wrong

On Fri, Aug 18, 2023, Ashish Kalra wrote:
> 
> On 8/18/2023 12:55 PM, Sean Christopherson wrote:
> > On Tue, Aug 15, 2023, isaku.yamahata@intel.com wrote:
> > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > > 
> > > kvm_mmu_invalidate_end() updates struct kvm::mmu_invalidate_in_progress
> > > and it's protected by kvm::mmu_lock.  call kvm_mmu_invalidate_end() before
> > > unlocking it. Not after the unlock.
> > > 
> > > Fixes: 8e9009ca6d14 ("KVM: Introduce per-page memory attributes")
> > 
> > This fixes is wrong.  It won't matter in the long run, but it makes my life that
> > much harder.
> > 
> > > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > > ---
> > >   virt/kvm/kvm_main.c | 15 ++++++++++++++-
> > >   1 file changed, 14 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 8bfeb615fc4d..49380cd62367 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -535,6 +535,7 @@ struct kvm_mmu_notifier_range {
> > >   	} arg;
> > >   	gfn_handler_t handler;
> > >   	on_lock_fn_t on_lock;
> > > +	on_unlock_fn_t before_unlock;
> > >   	on_unlock_fn_t on_unlock;
> > 
> > Ugh, shame on my past me.  Having on_lock and on_unlock be asymmetrical with respect
> > to the lock is nasty.
> > 
> > I would much rather we either (a) be explicit, e.g. before_(un)lock and after_(un)lock,
> > or (b) have just on_(un)lock, make them symetrical, and handle the SEV mess a
> > different way.
> > 
> > The SEV hook doesn't actually care about running immediately after unlock, it just
> > wants to know if there was an overlapping memslot.  It can run after SRCU is dropped,
> > because even if we make the behavior more precise (right now it blasts WBINVD),
> > just having a reference to memslots isn't sufficient, the code needs to guarantee
> > memslots are *stable*.  And that is already guaranteed by the notifier code, i.e.
> > the SEV code could just reacquire SRCU.
> 
> On a separate note here, the SEV hook blasting WBINVD is still causing
> serious performance degradation issues with SNP triggered via
> AutoNUMA/numad/KSM, etc. With reference to previous discussions related to
> it, we have plans to replace WBINVD with CLFLUSHOPT.

Isn't the flush unnecessary when freeing shared memory?  My recollection is that
the problematic scenario is when encrypted memory is freed back to the host,
because KVM already flushes when mapping potentially encrypted memory into the
guest.

With SNP+guest_memfd, private/encrypted memory should be unreachable via the
hva-based mmu_notifiers.  gmem should have full control of the page lifecycles,
i.e. can get the kernel virtual address as appropriate, and so SNP shouldn't
need the nuclear option.

E.g. something like this?

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 07756b7348ae..1c6828ae391d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2328,7 +2328,7 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)
 
 void sev_guest_memory_reclaimed(struct kvm *kvm)
 {
-       if (!sev_guest(kvm))
+       if (!sev_guest(kvm) || sev_snp_guest(kvm))
                return;
 
        wbinvd_on_all_cpus();

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/8] KVM: gmem: Adding hooks for SEV and TDX
  2023-08-15 17:18 [PATCH 0/8] KVM: gmem: Adding hooks for SEV and TDX isaku.yamahata
                   ` (7 preceding siblings ...)
  2023-08-15 17:18 ` [PATCH 8/8] RFC: KVM: gmem: Guarantee the order of destruction isaku.yamahata
@ 2023-08-18 23:14 ` Sean Christopherson
  8 siblings, 0 replies; 26+ messages in thread
From: Sean Christopherson @ 2023-08-18 23:14 UTC (permalink / raw)
  To: Sean Christopherson, kvm, linux-kernel, isaku.yamahata
  Cc: isaku.yamahata, Michael Roth, Paolo Bonzini, erdemaktas,
	Sagi Shahar, David Matlack, Kai Huang, Zhi Wang, chen.bo,
	linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve, Yuan Yao,
	Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Tue, 15 Aug 2023 10:18:47 -0700, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> This patch series for KVM guest memfd is to have common code base for SEV and
> TDX.  Several minor fixes.  Based on this patch series, TDX KVM can defer page
> clearing without mmu lock.
> 
> Isaku Yamahata (6):
>   KVM: gmem: Make kvm_gmem_bind return EBADF on wrong fd
>   KVM: gmem: removed duplicated kvm_gmem_init()
>   KVM: gmem: Fix kvm_gmem_issue_arch_invalidate()
>   KVM: gmem: protect kvm_mmu_invalidate_end()
>   KVM: gmem: Avoid race with kvm_gmem_release and mmu notifier
>   RFC: KVM: gmem: Guarantee the order of destruction
> 
> [...]

Applied patches 1 and 2 to kvm-x86 guest_memfd.  I'll post the alternative
approach for fixing the unlocking bug next week (need to test, and I'm out of
time this week).

Regarding the initialize/invalidate hooks, I resurrected the discussion from
the previous version[*], I'd like to bottom out on a solution in that thread
before applying anything.

[*] https://lore.kernel.org/all/ZN%2FwY53TF2aOZtLu@google.com

[1/8] KVM: gmem: Make kvm_gmem_bind return EBADF on wrong fd
      https://github.com/kvm-x86/linux/commit/07ac04fbefce
[2/8] KVM: gmem: removed duplicated kvm_gmem_init()
      https://github.com/kvm-x86/linux/commit/9ab46d91d5ea

--
https://github.com/kvm-x86/linux/tree/next
https://github.com/kvm-x86/linux/tree/fixes

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end()
  2023-08-18 22:44       ` Sean Christopherson
@ 2023-08-19  2:08         ` Mingwei Zhang
  2023-08-21 14:42           ` Sean Christopherson
  2023-08-21 21:44           ` Kalra, Ashish
  0 siblings, 2 replies; 26+ messages in thread
From: Mingwei Zhang @ 2023-08-19  2:08 UTC (permalink / raw)
  To: Sean Christopherson, Jacky Li
  Cc: Ashish Kalra, isaku.yamahata, kvm, linux-kernel, isaku.yamahata,
	Michael Roth, Paolo Bonzini, erdemaktas, Sagi Shahar,
	David Matlack, Kai Huang, Zhi Wang, chen.bo, linux-coco,
	Chao Peng, Ackerley Tng, Vishal Annapurve, Yuan Yao,
	Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

+Jacky Li

On Fri, Aug 18, 2023 at 3:45 PM Sean Christopherson <seanjc@google.com> wrote:
>
> +Mingwei to correct me if I'm wrong
>
> On Fri, Aug 18, 2023, Ashish Kalra wrote:
> >
> > On 8/18/2023 12:55 PM, Sean Christopherson wrote:
> > > On Tue, Aug 15, 2023, isaku.yamahata@intel.com wrote:
> > > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > > >
> > > > kvm_mmu_invalidate_end() updates struct kvm::mmu_invalidate_in_progress
> > > > and it's protected by kvm::mmu_lock.  call kvm_mmu_invalidate_end() before
> > > > unlocking it. Not after the unlock.
> > > >
> > > > Fixes: 8e9009ca6d14 ("KVM: Introduce per-page memory attributes")
> > >
> > > This fixes is wrong.  It won't matter in the long run, but it makes my life that
> > > much harder.
> > >
> > > > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > > > ---
> > > >   virt/kvm/kvm_main.c | 15 ++++++++++++++-
> > > >   1 file changed, 14 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > index 8bfeb615fc4d..49380cd62367 100644
> > > > --- a/virt/kvm/kvm_main.c
> > > > +++ b/virt/kvm/kvm_main.c
> > > > @@ -535,6 +535,7 @@ struct kvm_mmu_notifier_range {
> > > >           } arg;
> > > >           gfn_handler_t handler;
> > > >           on_lock_fn_t on_lock;
> > > > + on_unlock_fn_t before_unlock;
> > > >           on_unlock_fn_t on_unlock;
> > >
> > > Ugh, shame on my past me.  Having on_lock and on_unlock be asymmetrical with respect
> > > to the lock is nasty.
> > >
> > > I would much rather we either (a) be explicit, e.g. before_(un)lock and after_(un)lock,
> > > or (b) have just on_(un)lock, make them symetrical, and handle the SEV mess a
> > > different way.
> > >
> > > The SEV hook doesn't actually care about running immediately after unlock, it just
> > > wants to know if there was an overlapping memslot.  It can run after SRCU is dropped,
> > > because even if we make the behavior more precise (right now it blasts WBINVD),
> > > just having a reference to memslots isn't sufficient, the code needs to guarantee
> > > memslots are *stable*.  And that is already guaranteed by the notifier code, i.e.
> > > the SEV code could just reacquire SRCU.
> >
> > On a separate note here, the SEV hook blasting WBINVD is still causing
> > serious performance degradation issues with SNP triggered via
> > AutoNUMA/numad/KSM, etc. With reference to previous discussions related to
> > it, we have plans to replace WBINVD with CLFLUSHOPT.
>
> Isn't the flush unnecessary when freeing shared memory?  My recollection is that
> the problematic scenario is when encrypted memory is freed back to the host,
> because KVM already flushes when potentially encrypted mapping memory into the
> guest.
>
> With SNP+guest_memfd, private/encrypted memory should be unreachabled via the
> hva-based mmu_notifiers.  gmem should have full control of the page lifecycles,
> i.e. can get the kernel virtual address as appropriated, and so it SNP shouldn't
> need the nuclear option.
>
> E.g. something like this?
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 07756b7348ae..1c6828ae391d 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2328,7 +2328,7 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)
>
>  void sev_guest_memory_reclaimed(struct kvm *kvm)
>  {
> -       if (!sev_guest(kvm))
> +       if (!sev_guest(kvm) || sev_snp_guest(kvm))
>                 return;
>
>         wbinvd_on_all_cpus();

I hope this is the final solution :)

So, short answer: no.

SNP+guest_memfd prevents untrusted host user space from directly
modifying the data, which is good enough for CVE-2022-0171, but there
is no guarantee that the host kernel couldn't, in some scenarios, access
the data and leave dirty cache lines behind. In fact, AFAIK, an SNP VM
does not track whether each page was previously shared, does it? If a
page was previously shared and was written by the host kernel or devices
before it was converted to private, nothing tracks that, and the dirty
cache lines are still there!

So, to avoid corner cases like the above, it seems we currently have to
retain the property of flushing the cache when a guest memory mapping
leaves the KVM NPT.

Of course, this is fundamentally because SME_COHERENT only applies to
CPU cores, not to DMA. If SME_COHERENT were complete, flushing would no
longer be needed. Alternatively, we need extra bookkeeping for KVM to
know whether each page has dirty cache lines. Another alternative is to
filter on the mmu_notifier reason, which is the part that I am planning
to take on. Thoughts?
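
To sketch what I mean by filtering (illustrative only; the helper is made up,
and whether MMU_NOTIFY_CLEAR can be skipped safely is exactly the open
question), kvm_mmu_notifier_invalidate_range_start() could consult something
like this before treating an invalidation as a reclaim:

static bool mmu_notifier_event_may_reclaim(const struct mmu_notifier_range *range)
{
	switch (range->event) {
	case MMU_NOTIFY_PROTECTION_VMA:		/* e.g. change_prot_numa() */
	case MMU_NOTIFY_PROTECTION_PAGE:	/* e.g. KSM write_protect_page() */
	case MMU_NOTIFY_SOFT_DIRTY:
		/* PTEs are modified, but the pages are not freed. */
		return false;
	default:
		/*
		 * MMU_NOTIFY_UNMAP, MMU_NOTIFY_CLEAR, etc. can free pages
		 * back to the host, so be conservative and flush.
		 */
		return true;
	}
}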

Thanks.
-Mingwei

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end()
  2023-08-19  2:08         ` Mingwei Zhang
@ 2023-08-21 14:42           ` Sean Christopherson
  2023-08-21 21:44           ` Kalra, Ashish
  1 sibling, 0 replies; 26+ messages in thread
From: Sean Christopherson @ 2023-08-21 14:42 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Jacky Li, Ashish Kalra, isaku.yamahata, kvm, linux-kernel,
	isaku.yamahata, Michael Roth, Paolo Bonzini, erdemaktas,
	Sagi Shahar, David Matlack, Kai Huang, Zhi Wang, chen.bo,
	linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve, Yuan Yao,
	Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Fri, Aug 18, 2023, Mingwei Zhang wrote:
> +Jacky Li
> 
> On Fri, Aug 18, 2023 at 3:45 PM Sean Christopherson <seanjc@google.com> wrote:
> > > On a separate note here, the SEV hook blasting WBINVD is still causing
> > > serious performance degradation issues with SNP triggered via
> > > AutoNUMA/numad/KSM, etc. With reference to previous discussions related to
> > > it, we have plans to replace WBINVD with CLFLUSHOPT.
> >
> > Isn't the flush unnecessary when freeing shared memory?  My recollection is that
> > the problematic scenario is when encrypted memory is freed back to the host,
> > because KVM already flushes when potentially encrypted mapping memory into the
> > guest.
> >
> > With SNP+guest_memfd, private/encrypted memory should be unreachabled via the
> > hva-based mmu_notifiers.  gmem should have full control of the page lifecycles,
> > i.e. can get the kernel virtual address as appropriated, and so it SNP shouldn't
> > need the nuclear option.
> >
> > E.g. something like this?
> >
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 07756b7348ae..1c6828ae391d 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -2328,7 +2328,7 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)
> >
> >  void sev_guest_memory_reclaimed(struct kvm *kvm)
> >  {
> > -       if (!sev_guest(kvm))
> > +       if (!sev_guest(kvm) || sev_snp_guest(kvm))
> >                 return;
> >
> >         wbinvd_on_all_cpus();
> 
> I hope this is the final solution :)
> 
> So, short answer: no.
> 
> SNP+guest_memfd prevent untrusted host user space from directly
> modifying the data, this is good enough for CVE-2022-0171, but there
> is no such guarantee that the host kernel in some scenarios could
> access the data and generate dirty caches. In fact, AFAIC, SNP VM does
> not track whether each page is previously shared, isn't it? If a page
> was previously shared and was written by the host kernel or devices
> before it was changed to private. No one tracks it and dirty caches
> are there!

There's an unstated assumption that KVM will do CLFLUSHOPT (if necessary) for
SEV-* guests when allocating into guest_memfd().

> So, to avoid any corner case situations like the above, it seems
> currently we have to retain the property: flushing the cache when the
> guest memory mapping leaves KVM NPT.

What I'm saying is that for guests whose private memory is backed by guest_memfd(),
which is all SNP guests, it should be impossible for memory that is reachable via
mmu_notifiers to be mapped in KVM's MMU as private.  So yes, KVM needs to flush
when memory is freed from guest_memfd(), but not for memory that is reclaimed by
mmu_notifiers, i.e. not for sev_guest_memory_reclaimed().
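
In sketch form (illustrative only, not code from this series; names and
signatures are approximate), the split would be:

void sev_guest_memory_reclaimed(struct kvm *kvm)
{
	/* For SNP, hva-reachable memory is shared-only; nothing to flush. */
	if (!sev_guest(kvm) || sev_snp_guest(kvm))
		return;

	wbinvd_on_all_cpus();
}

/*
 * Invoked when gmem actually frees private pages (truncate/release).
 * Assumes CLFLUSHOPT; fall back to WBINVD if it's not supported.
 */
void sev_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
{
	kvm_pfn_t pfn;

	for (pfn = start; pfn < end; pfn++)
		clflush_cache_range(page_address(pfn_to_page(pfn)), PAGE_SIZE);
}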

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end()
  2023-08-19  2:08         ` Mingwei Zhang
  2023-08-21 14:42           ` Sean Christopherson
@ 2023-08-21 21:44           ` Kalra, Ashish
  2023-08-22 22:30             ` Kalra, Ashish
  2023-08-22 23:17             ` Sean Christopherson
  1 sibling, 2 replies; 26+ messages in thread
From: Kalra, Ashish @ 2023-08-21 21:44 UTC (permalink / raw)
  To: Mingwei Zhang, Sean Christopherson, Jacky Li
  Cc: isaku.yamahata, kvm, linux-kernel, isaku.yamahata, Michael Roth,
	Paolo Bonzini, erdemaktas, Sagi Shahar, David Matlack, Kai Huang,
	Zhi Wang, chen.bo, linux-coco, Chao Peng, Ackerley Tng,
	Vishal Annapurve, Yuan Yao, Jarkko Sakkinen, Xu Yilun,
	Quentin Perret, wei.w.wang, Fuad Tabba

Hello Mingwei & Sean,

On 8/18/2023 9:08 PM, Mingwei Zhang wrote:
> +Jacky Li
> 
> On Fri, Aug 18, 2023 at 3:45 PM Sean Christopherson <seanjc@google.com> wrote:
>>
>> +Mingwei to correct me if I'm wrong
>>
>> On Fri, Aug 18, 2023, Ashish Kalra wrote:
>>>
>>> On 8/18/2023 12:55 PM, Sean Christopherson wrote:
>>>> On Tue, Aug 15, 2023, isaku.yamahata@intel.com wrote:
>>>>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>>>>
>>>>> kvm_mmu_invalidate_end() updates struct kvm::mmu_invalidate_in_progress
>>>>> and it's protected by kvm::mmu_lock.  call kvm_mmu_invalidate_end() before
>>>>> unlocking it. Not after the unlock.
>>>>>
>>>>> Fixes: 8e9009ca6d14 ("KVM: Introduce per-page memory attributes")
>>>>
>>>> This fixes is wrong.  It won't matter in the long run, but it makes my life that
>>>> much harder.
>>>>
>>>>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>>>>> ---
>>>>>    virt/kvm/kvm_main.c | 15 ++++++++++++++-
>>>>>    1 file changed, 14 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>>> index 8bfeb615fc4d..49380cd62367 100644
>>>>> --- a/virt/kvm/kvm_main.c
>>>>> +++ b/virt/kvm/kvm_main.c
>>>>> @@ -535,6 +535,7 @@ struct kvm_mmu_notifier_range {
>>>>>            } arg;
>>>>>            gfn_handler_t handler;
>>>>>            on_lock_fn_t on_lock;
>>>>> + on_unlock_fn_t before_unlock;
>>>>>            on_unlock_fn_t on_unlock;
>>>>
>>>> Ugh, shame on my past me.  Having on_lock and on_unlock be asymmetrical with respect
>>>> to the lock is nasty.
>>>>
>>>> I would much rather we either (a) be explicit, e.g. before_(un)lock and after_(un)lock,
>>>> or (b) have just on_(un)lock, make them symetrical, and handle the SEV mess a
>>>> different way.
>>>>
>>>> The SEV hook doesn't actually care about running immediately after unlock, it just
>>>> wants to know if there was an overlapping memslot.  It can run after SRCU is dropped,
>>>> because even if we make the behavior more precise (right now it blasts WBINVD),
>>>> just having a reference to memslots isn't sufficient, the code needs to guarantee
>>>> memslots are *stable*.  And that is already guaranteed by the notifier code, i.e.
>>>> the SEV code could just reacquire SRCU.
>>>
>>> On a separate note here, the SEV hook blasting WBINVD is still causing
>>> serious performance degradation issues with SNP triggered via
>>> AutoNUMA/numad/KSM, etc. With reference to previous discussions related to
>>> it, we have plans to replace WBINVD with CLFLUSHOPT.
>>
>> Isn't the flush unnecessary when freeing shared memory?  My recollection is that
>> the problematic scenario is when encrypted memory is freed back to the host,
>> because KVM already flushes when potentially encrypted mapping memory into the
>> guest.
>>
>> With SNP+guest_memfd, private/encrypted memory should be unreachabled via the
>> hva-based mmu_notifiers.  gmem should have full control of the page lifecycles,
>> i.e. can get the kernel virtual address as appropriated, and so it SNP shouldn't
>> need the nuclear option.
>>
>> E.g. something like this?
>>
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 07756b7348ae..1c6828ae391d 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -2328,7 +2328,7 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)
>>
>>   void sev_guest_memory_reclaimed(struct kvm *kvm)
>>   {
>> -       if (!sev_guest(kvm))
>> +       if (!sev_guest(kvm) || sev_snp_guest(kvm))
>>                  return;
>>
>>          wbinvd_on_all_cpus();
> 
> I hope this is the final solution :)
> 
> So, short answer: no.
> 
> SNP+guest_memfd prevent untrusted host user space from directly
> modifying the data, this is good enough for CVE-2022-0171, but there
> is no such guarantee that the host kernel in some scenarios could
> access the data and generate dirty caches. In fact, AFAIC, SNP VM does
> not track whether each page is previously shared, isn't it? If a page
> was previously shared and was written by the host kernel or devices
> before it was changed to private. No one tracks it and dirty caches
> are there!
> 
> So, to avoid any corner case situations like the above, it seems
> currently we have to retain the property: flushing the cache when the
> guest memory mapping leaves KVM NPT.
> 
> Of course, this is fundamentally because SME_COHERENT only applies to
> CPU cores, but not DMA. If SME_COHERENT is complete, flushing is no
> longer needed. Alternatively, we need extra bookkeeping for KVM to
> know whether each page has dirty cache lines. Another alternative is
> to filter mmu_notifier reasons, which is the part that I am planning
> to take. thoughts?
> 

Now running SNP+guest_memfd with discard=both option enabled:

# bpftrace -e 'kprobe:sev_guest_memory_reclaimed {@[kstack]=count()}'
Attaching 1 probe...
^C

@[
     sev_guest_memory_reclaimed+5
     kvm_mmu_notifier_release+60
     __mmu_notifier_release+128
     exit_mmap+657
     __mmput+72
     mmput+49
     do_exit+752
     do_group_exit+57
     get_signal+2486
     arch_do_signal_or_restart+51
     exit_to_user_mode_prepare+257
     syscall_exit_to_user_mode+42
     do_syscall_64+109
     entry_SYSCALL_64_after_hwframe+114
]: 1
@[
     sev_guest_memory_reclaimed+5
     kvm_mmu_notifier_invalidate_range_start+869
     __mmu_notifier_invalidate_range_start+152
     change_protection+4628
     change_prot_numa+93
     task_numa_work+588
     task_work_run+108
     exit_to_user_mode_prepare+337
     syscall_exit_to_user_mode+42
     do_syscall_64+109
     entry_SYSCALL_64_after_hwframe+114
]: 2
@[
     sev_guest_memory_reclaimed+5
     kvm_mmu_notifier_invalidate_range_start+869
     __mmu_notifier_invalidate_range_start+152
     change_protection+4628
     change_prot_numa+93
     task_numa_work+588
     task_work_run+108
     xfer_to_guest_mode_handle_work+228
     kvm_arch_vcpu_ioctl_run+1572
     kvm_vcpu_ioctl+671
     __x64_sys_ioctl+153
     do_syscall_64+96
     entry_SYSCALL_64_after_hwframe+114
]: 2
@[
     sev_guest_memory_reclaimed+5
     kvm_set_memslot+740
     __kvm_set_memory_region.part.0+411
     kvm_set_memory_region+89
     kvm_vm_ioctl+1482
     __x64_sys_ioctl+153
     do_syscall_64+96
     entry_SYSCALL_64_after_hwframe+114
]: 104
@[
     sev_guest_memory_reclaimed+5
     kvm_mmu_notifier_invalidate_range_start+869
     __mmu_notifier_invalidate_range_start+152
     zap_page_range_single+384
     unmap_mapping_range+279
     shmem_fallocate+932
     vfs_fallocate+345
     __x64_sys_fallocate+71
     do_syscall_64+96
     entry_SYSCALL_64_after_hwframe+114
]: 5465
@[
     sev_guest_memory_reclaimed+5
     kvm_mmu_notifier_invalidate_range_start+869
     __mmu_notifier_invalidate_range_start+152
     zap_page_range_single+384
     madvise_vma_behavior+1967
     madvise_walk_vmas+190
     do_madvise.part.0+264
     __x64_sys_madvise+98
     do_syscall_64+96
     entry_SYSCALL_64_after_hwframe+114
]: 69677

The maximum hits are seen with shmem_fallocate and madvise, which we
believe are in response to shared<->private GHCB page-state-change
requests. discard=both handles discard for both private and shared
memory, so freeing shared memory via fallocate(shared_memfd,
FALLOC_FL_PUNCH_HOLE, ...) triggers the notifiers when freeing shared
pages after the guest converts a GPA to private.

Now, since with SNP+guest_memfd guest private memory is no longer mapped
in the host, I added a generic fix (instead of Sean's proposed patch of
checking for an SNP guest inside sev_guest_memory_reclaimed()):

--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -593,6 +593,9 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
                         unsigned long hva_start, hva_end;
 
                         slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]);
+                       if (kvm_slot_can_be_private(slot)) {
+                               continue;
+                       }
                         hva_start = max(range->start, slot->userspace_addr);
                         hva_end = min(range->end, slot->userspace_addr +
                                                   (slot->npages << PAGE_SHIFT));

With this fix added, the traces are as follows:

# bpftrace -e 'kprobe:sev_guest_memory_reclaimed {@[kstack]=count()}'
Attaching 1 probe...
^C

@[
     sev_guest_memory_reclaimed+5
     kvm_mmu_notifier_invalidate_range_start+812
     __mmu_notifier_invalidate_range_start+152
     change_protection+4628
     change_prot_numa+93
     task_numa_work+588
     task_work_run+108
     exit_to_user_mode_prepare+337
     syscall_exit_to_user_mode+42
     do_syscall_64+109
     entry_SYSCALL_64_after_hwframe+114
]: 1
@[
     sev_guest_memory_reclaimed+5
     kvm_mmu_notifier_release+60
     __mmu_notifier_release+128
     exit_mmap+657
     __mmput+72
     mmput+49
     do_exit+752
     do_group_exit+57
     get_signal+2486
     arch_do_signal_or_restart+51
     exit_to_user_mode_prepare+257
     syscall_exit_to_user_mode+42
     do_syscall_64+109
     entry_SYSCALL_64_after_hwframe+114
]: 1
@[
     sev_guest_memory_reclaimed+5
     kvm_mmu_notifier_invalidate_range_start+812
     __mmu_notifier_invalidate_range_start+152
     change_protection+4628
     change_prot_numa+93
     task_numa_work+588
     task_work_run+108
     xfer_to_guest_mode_handle_work+228
     kvm_arch_vcpu_ioctl_run+1572
     kvm_vcpu_ioctl+671
     __x64_sys_ioctl+153
     do_syscall_64+96
     entry_SYSCALL_64_after_hwframe+114
]:
@[
     sev_guest_memory_reclaimed+5
     kvm_set_memslot+740
     __kvm_set_memory_region.part.0+411
     kvm_set_memory_region+89
     kvm_vm_ioctl+1482
     __x64_sys_ioctl+153
     do_syscall_64+96
     entry_SYSCALL_64_after_hwframe+114
]: 104
#

As expected, the SEV hook is not invoked for the guest private memory 
pages (no more invalidation from shmem_fallocate() + madvise()).

Isn't it better to skip invoking the KVM MMU invalidation notifier when
the invalidated range belongs to guest private memory?

 > In fact, AFAIC, SNP VM does
 > not track whether each page is previously shared, isn't it? If a page
 > was previously shared and was written by the host kernel or devices
 > before it was changed to private. No one tracks it and dirty caches
 > are there!

The skipped invalidation here covers the case Mingwei mentioned above,
where pages are converted from private->shared and the subsequent freeing
of the shared pages triggers the invalidation.

But then why are we concerned about this?  I thought our concern was the
case where the dirty cache lines contain encrypted guest data.

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end()
  2023-08-21 21:44           ` Kalra, Ashish
@ 2023-08-22 22:30             ` Kalra, Ashish
  2023-08-22 23:17             ` Sean Christopherson
  1 sibling, 0 replies; 26+ messages in thread
From: Kalra, Ashish @ 2023-08-22 22:30 UTC (permalink / raw)
  To: Mingwei Zhang, Sean Christopherson, Jacky Li
  Cc: isaku.yamahata, kvm, linux-kernel, isaku.yamahata, Michael Roth,
	Paolo Bonzini, erdemaktas, Sagi Shahar, David Matlack, Kai Huang,
	Zhi Wang, chen.bo, linux-coco, Chao Peng, Ackerley Tng,
	Vishal Annapurve, Yuan Yao, Jarkko Sakkinen, Xu Yilun,
	Quentin Perret, wei.w.wang, Fuad Tabba



On 8/21/2023 4:44 PM, Kalra, Ashish wrote:
> Hello Mingwei & Sean,
> 
> On 8/18/2023 9:08 PM, Mingwei Zhang wrote:
>> +Jacky Li
>>
>> On Fri, Aug 18, 2023 at 3:45 PM Sean Christopherson 
>> <seanjc@google.com> wrote:
>>>
>>> +Mingwei to correct me if I'm wrong
>>>
>>> On Fri, Aug 18, 2023, Ashish Kalra wrote:
>>>>
>>>> On 8/18/2023 12:55 PM, Sean Christopherson wrote:
>>>>> On Tue, Aug 15, 2023, isaku.yamahata@intel.com wrote:
>>>>>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>>>>>
>>>>>> kvm_mmu_invalidate_end() updates struct 
>>>>>> kvm::mmu_invalidate_in_progress
>>>>>> and it's protected by kvm::mmu_lock.  call 
>>>>>> kvm_mmu_invalidate_end() before
>>>>>> unlocking it. Not after the unlock.
>>>>>>
>>>>>> Fixes: 8e9009ca6d14 ("KVM: Introduce per-page memory attributes")
>>>>>
>>>>> This fixes is wrong.  It won't matter in the long run, but it makes 
>>>>> my life that
>>>>> much harder.
>>>>>
>>>>>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>>>>>> ---
>>>>>>    virt/kvm/kvm_main.c | 15 ++++++++++++++-
>>>>>>    1 file changed, 14 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>>>> index 8bfeb615fc4d..49380cd62367 100644
>>>>>> --- a/virt/kvm/kvm_main.c
>>>>>> +++ b/virt/kvm/kvm_main.c
>>>>>> @@ -535,6 +535,7 @@ struct kvm_mmu_notifier_range {
>>>>>>            } arg;
>>>>>>            gfn_handler_t handler;
>>>>>>            on_lock_fn_t on_lock;
>>>>>> + on_unlock_fn_t before_unlock;
>>>>>>            on_unlock_fn_t on_unlock;
>>>>>
>>>>> Ugh, shame on my past me.  Having on_lock and on_unlock be 
>>>>> asymmetrical with respect
>>>>> to the lock is nasty.
>>>>>
>>>>> I would much rather we either (a) be explicit, e.g. before_(un)lock 
>>>>> and after_(un)lock,
>>>>> or (b) have just on_(un)lock, make them symetrical, and handle the 
>>>>> SEV mess a
>>>>> different way.
>>>>>
>>>>> The SEV hook doesn't actually care about running immediately after 
>>>>> unlock, it just
>>>>> wants to know if there was an overlapping memslot.  It can run 
>>>>> after SRCU is dropped,
>>>>> because even if we make the behavior more precise (right now it 
>>>>> blasts WBINVD),
>>>>> just having a reference to memslots isn't sufficient, the code 
>>>>> needs to guarantee
>>>>> memslots are *stable*.  And that is already guaranteed by the 
>>>>> notifier code, i.e.
>>>>> the SEV code could just reacquire SRCU.
>>>>
>>>> On a separate note here, the SEV hook blasting WBINVD is still causing
>>>> serious performance degradation issues with SNP triggered via
>>>> AutoNUMA/numad/KSM, etc. With reference to previous discussions 
>>>> related to
>>>> it, we have plans to replace WBINVD with CLFLUSHOPT.
>>>
>>> Isn't the flush unnecessary when freeing shared memory?  My 
>>> recollection is that
>>> the problematic scenario is when encrypted memory is freed back to 
>>> the host,
>>> because KVM already flushes when potentially encrypted mapping memory 
>>> into the
>>> guest.
>>>
>>> With SNP+guest_memfd, private/encrypted memory should be unreachabled 
>>> via the
>>> hva-based mmu_notifiers.  gmem should have full control of the page 
>>> lifecycles,
>>> i.e. can get the kernel virtual address as appropriated, and so it 
>>> SNP shouldn't
>>> need the nuclear option.
>>>
>>> E.g. something like this?
>>>
>>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>>> index 07756b7348ae..1c6828ae391d 100644
>>> --- a/arch/x86/kvm/svm/sev.c
>>> +++ b/arch/x86/kvm/svm/sev.c
>>> @@ -2328,7 +2328,7 @@ static void sev_flush_encrypted_page(struct 
>>> kvm_vcpu *vcpu, void *va)
>>>
>>>   void sev_guest_memory_reclaimed(struct kvm *kvm)
>>>   {
>>> -       if (!sev_guest(kvm))
>>> +       if (!sev_guest(kvm) || sev_snp_guest(kvm))
>>>                  return;
>>>
>>>          wbinvd_on_all_cpus();
>>
>> I hope this is the final solution :)
>>
>> So, short answer: no.
>>
>> SNP+guest_memfd prevent untrusted host user space from directly
>> modifying the data, this is good enough for CVE-2022-0171, but there
>> is no such guarantee that the host kernel in some scenarios could
>> access the data and generate dirty caches. In fact, AFAIC, SNP VM does
>> not track whether each page is previously shared, isn't it? If a page
>> was previously shared and was written by the host kernel or devices
>> before it was changed to private. No one tracks it and dirty caches
>> are there!
>>
>> So, to avoid any corner case situations like the above, it seems
>> currently we have to retain the property: flushing the cache when the
>> guest memory mapping leaves KVM NPT.
>>
>> Of course, this is fundamentally because SME_COHERENT only applies to
>> CPU cores, but not DMA. If SME_COHERENT is complete, flushing is no
>> longer needed. Alternatively, we need extra bookkeeping for KVM to
>> know whether each page has dirty cache lines. Another alternative is
>> to filter mmu_notifier reasons, which is the part that I am planning
>> to take. thoughts?
>>

Additionally, looking at MMU notifier event filtering and the various
code paths (of interest) from which the MMU invalidation notifier gets
invoked:

For NUMA load balancing during #PF handling, try_to_migrate_one()
invokes the MMU invalidation notifier with the MMU_NOTIFY_CLEAR event.
Looking at the KSM code paths, try_to_merge_one_page() ->
write_protect_page() and try_to_merge_one_page() -> replace_page() also
invoke the MMU invalidation notifier with the MMU_NOTIFY_CLEAR event.

Now, I remember from previous discussions that the CLEAR event is
overloaded, used for page zapping, madvise, etc., so I don't think we
will be able to filter *out* this event, and it is this event that is
triggering most of the performance issues we are observing.

So considering what Sean mentioned earlier:

 > What I'm saying is that for guests whose private memory is backed by
 > guest_memfd(), which is all SNP guests, it should be impossible for
 > memory that is reachable via mmu_notifiers to be mapped in KVM's MMU as
 > private.  So yes, KVM needs to flush when memory is freed from
 > guest_memfd(), but not for memory that is reclaimed by mmu_notifiers,
 > i.e. not for sev_guest_memory_reclaimed().

I think the right solution for SNP guests should be:

 >>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
 >>> index 07756b7348ae..1c6828ae391d 100644
 >>> --- a/arch/x86/kvm/svm/sev.c
 >>> +++ b/arch/x86/kvm/svm/sev.c
 >>> @@ -2328,7 +2328,7 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)
 >>>
 >>>   void sev_guest_memory_reclaimed(struct kvm *kvm)
 >>>   {
 >>> -       if (!sev_guest(kvm))
 >>> +       if (!sev_guest(kvm) || sev_snp_guest(kvm))
 >>>                  return;
 >>>
 >>>          wbinvd_on_all_cpus();

Thoughts?

Thanks,
Ashish

> 
> Now running SNP+guest_memfd with discard=both option enabled:
> 
> # bpftrace -e 'kprobe:sev_guest_memory_reclaimed {@[kstack]=count()}'
> Attaching 1 probe...
> ^C
> 
> @[
>      sev_guest_memory_reclaimed+5
>      kvm_mmu_notifier_release+60
>      __mmu_notifier_release+128
>      exit_mmap+657
>      __mmput+72
>      mmput+49
>      do_exit+752
>      do_group_exit+57
>      get_signal+2486
>      arch_do_signal_or_restart+51
>      exit_to_user_mode_prepare+257
>      syscall_exit_to_user_mode+42
>      do_syscall_64+109
>      entry_SYSCALL_64_after_hwframe+114
> ]: 1
> @[
>      sev_guest_memory_reclaimed+5
>      kvm_mmu_notifier_invalidate_range_start+869
>      __mmu_notifier_invalidate_range_start+152
>      change_protection+4628
>      change_prot_numa+93
>      task_numa_work+588
>      task_work_run+108
>      exit_to_user_mode_prepare+337
>      syscall_exit_to_user_mode+42
>      do_syscall_64+109
>      entry_SYSCALL_64_after_hwframe+114
> ]: 2
> @[
>      sev_guest_memory_reclaimed+5
>      kvm_mmu_notifier_invalidate_range_start+869
>      __mmu_notifier_invalidate_range_start+152
>      change_protection+4628
>      change_prot_numa+93
>      task_numa_work+588
>      task_work_run+108
>      xfer_to_guest_mode_handle_work+228
>      kvm_arch_vcpu_ioctl_run+1572
>      kvm_vcpu_ioctl+671
>      __x64_sys_ioctl+153
>      do_syscall_64+96
>      entry_SYSCALL_64_after_hwframe+114
> ]: 2
> @[
>      sev_guest_memory_reclaimed+5
>      kvm_set_memslot+740
>      __kvm_set_memory_region.part.0+411
>      kvm_set_memory_region+89
>      kvm_vm_ioctl+1482
>      __x64_sys_ioctl+153
>      do_syscall_64+96
>      entry_SYSCALL_64_after_hwframe+114
> ]: 104
> @[
>      sev_guest_memory_reclaimed+5
>      kvm_mmu_notifier_invalidate_range_start+869
>      __mmu_notifier_invalidate_range_start+152
>      zap_page_range_single+384
>      unmap_mapping_range+279
>      shmem_fallocate+932
>      vfs_fallocate+345
>      __x64_sys_fallocate+71
>      do_syscall_64+96
>      entry_SYSCALL_64_after_hwframe+114
> ]: 5465
> @[
>      sev_guest_memory_reclaimed+5
>      kvm_mmu_notifier_invalidate_range_start+869
>      __mmu_notifier_invalidate_range_start+152
>      zap_page_range_single+384
>      madvise_vma_behavior+1967
>      madvise_walk_vmas+190
>      do_madvise.part.0+264
>      __x64_sys_madvise+98
>      do_syscall_64+96
>      entry_SYSCALL_64_after_hwframe+114
> ]: 69677
> 
> The maximum hits are seen with shmem_fallocate and madvise, which we 
> believe are response to shared<->private
> GHCB page-state-chage requests. discard=both handles discard both for
> private and shared memory, so freeing shared memory
> via fallocate(shared_memfd, FALLOC_FL_PUNCH_HOLE, ...) would trigger the
> notifiers when freeing shared pages after guest converts a GPA to
> private.
> 
> Now, as with SNP+guest_memfd, guest private memory is not mapped in host 
> anymore, so i added a generic fix (instead of Sean's proposed patch of 
> checking for SNP guest inside sev_guest_memory_reclaimed()):
> 
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -593,6 +593,9 @@ static __always_inline int 
> __kvm_handle_hva_range(struct kvm *kvm,
>                          unsigned long hva_start, hva_end;
> 
>                          slot = container_of(node, struct 
> kvm_memory_slot, hva_node[slots->node_idx]);
> +                       if (kvm_slot_can_be_private(slot)) {
> +                               continue;
> +                       }
>                          hva_start = max(range->start, 
> slot->userspace_addr);
>                          hva_end = min(range->end, slot->userspace_addr +
>                                                    (slot->npages << 
> PAGE_SHIFT));
> 
> With this fix added, the traces are as follows:
> 
> # bpftrace -e 'kprobe:sev_guest_memory_reclaimed {@[kstack]=count()}'
> Attaching 1 probe...
> ^C
> 
> @[
>      sev_guest_memory_reclaimed+5
>      kvm_mmu_notifier_invalidate_range_start+812
>      __mmu_notifier_invalidate_range_start+152
>      change_protection+4628
>      change_prot_numa+93
>      task_numa_work+588
>      task_work_run+108
>      exit_to_user_mode_prepare+337
>      syscall_exit_to_user_mode+42
>      do_syscall_64+109
>      entry_SYSCALL_64_after_hwframe+114
> ]: 1
> @[
>      sev_guest_memory_reclaimed+5
>      kvm_mmu_notifier_release+60
>      __mmu_notifier_release+128
>      exit_mmap+657
>      __mmput+72
>      mmput+49
>      do_exit+752
>      do_group_exit+57
>      get_signal+2486
>      arch_do_signal_or_restart+51
>      exit_to_user_mode_prepare+257
>      syscall_exit_to_user_mode+42
>      do_syscall_64+109
>      entry_SYSCALL_64_after_hwframe+114
> ]: 1
> @[
>      sev_guest_memory_reclaimed+5
>      kvm_mmu_notifier_invalidate_range_start+812
>      __mmu_notifier_invalidate_range_start+152
>      change_protection+4628
>      change_prot_numa+93
>      task_numa_work+588
>      task_work_run+108
>      xfer_to_guest_mode_handle_work+228
>      kvm_arch_vcpu_ioctl_run+1572
>      kvm_vcpu_ioctl+671
>      __x64_sys_ioctl+153
>      do_syscall_64+96
>      entry_SYSCALL_64_after_hwframe+114
> ]:
> @[
>      sev_guest_memory_reclaimed+5
>      kvm_set_memslot+740
>      __kvm_set_memory_region.part.0+411
>      kvm_set_memory_region+89
>      kvm_vm_ioctl+1482
>      __x64_sys_ioctl+153
>      do_syscall_64+96
>      entry_SYSCALL_64_after_hwframe+114
> ]: 104
> #
> 
> As expected, the SEV hook is not invoked for the guest private memory 
> pages (no more invalidation from shmem_fallocate() + madvise()).
> 
> Isn't it better to skip invoking the KVM MMU invalidation notifier when 
> the invalidated range belongs to guest private memory ?
> 
>  > In fact, AFAIC, SNP VM does
>  > not track whether each page is previously shared, isn't it? If a page
>  > was previously shared and was written by the host kernel or devices
>  > before it was changed to private. No one tracks it and dirty caches
>  > are there!
> 
> The skipped invalidation here covered the case Mingwei mentioned above, 
> where the pages are changed from private->shared and subsequent freeing 
> of shared pages triggered the invalidation.
> 
> But, then why are we concerned about this, i thought we have concerns 
> about the case where the dirty cache lines contain encrypted guest data ?
> 
> Thanks,
> Ashish

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end()
  2023-08-21 21:44           ` Kalra, Ashish
  2023-08-22 22:30             ` Kalra, Ashish
@ 2023-08-22 23:17             ` Sean Christopherson
  2023-08-31 16:50               ` Kalra, Ashish
  1 sibling, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2023-08-22 23:17 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Mingwei Zhang, Jacky Li, isaku.yamahata, kvm, linux-kernel,
	isaku.yamahata, Michael Roth, Paolo Bonzini, erdemaktas,
	Sagi Shahar, David Matlack, Kai Huang, Zhi Wang, chen.bo,
	linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve, Yuan Yao,
	Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Mon, Aug 21, 2023, Ashish Kalra wrote:
> Hello Mingwei & Sean,
> 
> On 8/18/2023 9:08 PM, Mingwei Zhang wrote:
> The maximum hits are seen with shmem_fallocate and madvise, which we believe
> are response to shared<->private
> GHCB page-state-chage requests. discard=both handles discard both for
> private and shared memory, so freeing shared memory
> via fallocate(shared_memfd, FALLOC_FL_PUNCH_HOLE, ...) would trigger the
> notifiers when freeing shared pages after guest converts a GPA to
> private.
> 
> Now, as with SNP+guest_memfd, guest private memory is not mapped in host
> anymore, so i added a generic fix (instead of Sean's proposed patch of
> checking for SNP guest inside sev_guest_memory_reclaimed()):
> 
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -593,6 +593,9 @@ static __always_inline int __kvm_handle_hva_range(struct
> kvm *kvm,
>                         unsigned long hva_start, hva_end;
> 
>                         slot = container_of(node, struct kvm_memory_slot,
> hva_node[slots->node_idx]);
> +                       if (kvm_slot_can_be_private(slot)) {
> +                               continue;
> +                       }
>                         hva_start = max(range->start, slot->userspace_addr);
>                         hva_end = min(range->end, slot->userspace_addr +
>                                                   (slot->npages <<
> PAGE_SHIFT));

...

> As expected, the SEV hook is not invoked for the guest private memory pages
> (no more invalidation from shmem_fallocate() + madvise()).
> 
> Isn't it better to skip invoking the KVM MMU invalidation notifier when the
> invalidated range belongs to guest private memory ?

Oooh, you're running into problems where KVM blasts both the private and shared
mappings even though invalidations from the mmu_notifier are shared-only by
definition.

The answer is "yes", but simply skipping slots that _can_ be private is wrong,
as KVM still needs to zap any shared mappings.  I have a plan[*], but I completely
spaced on incorporating the idea into the gmem RFC.  I'll add that to the "list
of todos for merging gmem", which I need to get sent out asap.

https://lore.kernel.org/all/ZJX0hk+KpQP0KUyB@google.com
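
Roughly (the flag name is made up and this isn't the actual proposal), the
hva-based path would tag its invalidations as shared-only, mirroring the
hva_range initialization earlier in the thread, and the handlers/arch hooks
would then skip private, gmem-backed ranges entirely:

	const struct kvm_mmu_notifier_range hva_range = {
		.start		= range->start,
		.end		= range->end,
		.handler	= kvm_mmu_unmap_gfn_range,
		.on_lock	= kvm_mmu_invalidate_begin,
		.flush_on_ret	= true,
		.may_block	= mmu_notifier_range_blockable(range),
		.only_shared	= true,		/* hypothetical: shared-only invalidation */
	};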

> > In fact, AFAIC, SNP VM does not track whether each page is previously
> > shared, isn't it? If a page was previously shared and was written by the
> > host kernel or devices before it was changed to private. No one tracks it
> > and dirty caches are there!
> 
> The skipped invalidation here covered the case Mingwei mentioned above,
> where the pages are changed from private->shared and subsequent freeing of
> shared pages triggered the invalidation.
> 
> But, then why are we concerned about this, i thought we have concerns about
> the case where the dirty cache lines contain encrypted guest data ?

Yes, that's my understanding as well (assuming by "this" you mean the case where
the CPU cache has dirty lines for _shared_ addresses).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end()
  2023-08-22 23:17             ` Sean Christopherson
@ 2023-08-31 16:50               ` Kalra, Ashish
  0 siblings, 0 replies; 26+ messages in thread
From: Kalra, Ashish @ 2023-08-31 16:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Mingwei Zhang, Jacky Li, isaku.yamahata, kvm, linux-kernel,
	isaku.yamahata, Michael Roth, Paolo Bonzini, erdemaktas,
	Sagi Shahar, David Matlack, Kai Huang, Zhi Wang, chen.bo,
	linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve, Yuan Yao,
	Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

Hello Sean,

On 8/22/2023 6:17 PM, Sean Christopherson wrote:
> On Mon, Aug 21, 2023, Ashish Kalra wrote:
>> Hello Mingwei & Sean,
>>
>> On 8/18/2023 9:08 PM, Mingwei Zhang wrote:
>> The maximum hits are seen with shmem_fallocate and madvise, which we believe
>> are in response to shared<->private
>> GHCB page-state-change requests. discard=both handles discard for both
>> private and shared memory, so freeing shared memory
>> via fallocate(shared_memfd, FALLOC_FL_PUNCH_HOLE, ...) would trigger the
>> notifiers when freeing shared pages after the guest converts a GPA to
>> private.
>>
>> Now, as with SNP+guest_memfd, guest private memory is not mapped in host
>> anymore, so I added a generic fix (instead of Sean's proposed patch of
>> checking for SNP guest inside sev_guest_memory_reclaimed()):
>>
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -593,6 +593,9 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
>>                          unsigned long hva_start, hva_end;
>>
>>                          slot = container_of(node, struct kvm_memory_slot,
>>                                              hva_node[slots->node_idx]);
>> +                       if (kvm_slot_can_be_private(slot)) {
>> +                               continue;
>> +                       }
>>                          hva_start = max(range->start, slot->userspace_addr);
>>                          hva_end = min(range->end, slot->userspace_addr +
>>                                                    (slot->npages << PAGE_SHIFT));
> 
> ...
> 
>> As expected, the SEV hook is not invoked for the guest private memory pages
>> (no more invalidation from shmem_fallocate() + madvise()).
>>
>> Isn't it better to skip invoking the KVM MMU invalidation notifier when the
>> invalidated range belongs to guest private memory?
> 
> Oooh, you're running into problems where KVM blasts both the private and shared
> mappings even though invalidations from the mmu_notifier are shared-only by
> definition.
> 
> The answer is "yes", but simply skipping slots that _can_ be private is wrong,
> as KVM still needs to zap any shared mappings.  I have a plan[*], but I completely
> spaced on incorporating the idea into the gmem RFC.  I'll add that to the "list
> of todos for merging gmem", which I need to get sent out asap.
> 
> https://lore.kernel.org/all/ZJX0hk+KpQP0KUyB@google.com

Looking at your gmem TODO's post, i don't see anything specific for this 
support:

https://lore.kernel.org/kvm/ZOjpIL0SFH+E3Dj4@google.com/

Thanks,
Ashish

> 
>>> In fact, AFAIK, an SNP VM does not track whether each page was previously
>>> shared, does it? If a page was previously shared and was written by the
>>> host kernel or devices before it was changed to private, no one tracks
>>> that and the dirty cache lines are still there!
>>
>> The skipped invalidation here covered the case Mingwei mentioned above,
>> where the pages are changed from private->shared and subsequent freeing of
>> shared pages triggered the invalidation.
>>
>> But then why are we concerned about this? I thought the concern was about
>> the case where the dirty cache lines contain encrypted guest data?
> 
> Yes, that's my understanding as well (assuming by "this" you mean the case where
> the CPU cache has dirty lines for _shared_ addresses).
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/8] KVM: gmem, x86: Add gmem hook for invalidating private memory
  2023-08-15 17:18 ` [PATCH 6/8] KVM: gmem, x86: Add gmem hook for invalidating " isaku.yamahata
  2023-08-16  0:42   ` kernel test robot
  2023-08-16 20:37   ` Isaku Yamahata
@ 2023-10-10  9:17   ` Xu Yilun
  2 siblings, 0 replies; 26+ messages in thread
From: Xu Yilun @ 2023-10-10  9:17 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: kvm, linux-kernel, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Quentin Perret, wei.w.wang, Fuad Tabba

On 2023-08-15 at 10:18:53 -0700, isaku.yamahata@intel.com wrote:
> From: Michael Roth <michael.roth@amd.com>
> 
> TODO: add a CONFIG option that can be used to completely skip the arch
> invalidation loop and avoid __weak references for arch/platforms that
> don't need an additional invalidation hook.
> 
> In some cases, like with SEV-SNP, guest memory needs to be updated in a
> platform-specific manner before it can be safely freed back to the host.
> Add hooks to wire up handling of this sort when freeing memory in
> response to FALLOC_FL_PUNCH_HOLE operations.
> 
> Also issue invalidations of all allocated pages when releasing the gmem
> file so that the pages are not left in an unusable state when they get
> freed back to the host.
> 
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Link: https://lore.kernel.org/r/20230612042559.375660-3-michael.roth@amd.com
>

[...]
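
For context, the path being patched is reached when the VMM discards a range
of gmem-backed guest memory; a minimal userspace sketch (fd, offset and length
names are made up, the flag combination is the standard one for hole punching):

  /* Punching a hole in the guest_memfd is what ends up invoking the
   * arch invalidation hook before the pages are freed back to the host. */
  fallocate(gmem_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, len);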
 
> +/* Handle arch-specific hooks needed before releasing guarded pages. */
> +static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct file *file,
> +					   pgoff_t start, pgoff_t end)
> +{
> +	pgoff_t file_end = i_size_read(file_inode(file)) >> PAGE_SHIFT;
> +	pgoff_t index = start;
> +
> +	end = min(end, file_end);
> +
> +	while (index < end) {
> +		struct folio *folio;
> +		unsigned int order;
> +		struct page *page;
> +		kvm_pfn_t pfn;
> +
> +		folio = __filemap_get_folio(file->f_mapping, index,
> +					    FGP_LOCK, 0);
> +		if (!folio) {
> +			index++;
> +			continue;
> +		}
> +
> +		page = folio_file_page(folio, index);
> +		pfn = page_to_pfn(page);
> +		order = folio_order(folio);
> +
> +		kvm_arch_gmem_invalidate(kvm, pfn, pfn + min((1ul << order), end - index));

Observed an issue here.

The page for index may not be the first page of the folio, in which case
the range [pfn, pfn + (1ul << order)) extends into the next folio. Part of
those pages then get invalidated again when the loop moves on to the next
folio.

On TDX, this causes TDH_PHYMEM_PAGE_WBINVD to fail.
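
A concrete example with made-up numbers: suppose an order-9 folio spans
pfns [0x1000, 0x1200) and the current index maps to pfn 0x1064, i.e. 100
pages into the folio.  Assuming end is further out, the call covers

  [pfn, pfn + (1ul << order))  ==  [0x1064, 0x1264)

which overshoots the folio end at 0x1200 by 100 pages, so the first 100
pfns of the next folio are invalidated here and then again on the next
loop iteration.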

> +
> +		index = folio_next_index(folio);
> +		folio_unlock(folio);
> +		folio_put(folio);
> +
> +		cond_resched();
> +	}
> +}

My fix would be:

diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index e629782d73d5..3665003c3746 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -155,7 +155,7 @@ static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct inode *inode,

        while (index < end) {
                struct folio *folio;
-               unsigned int order;
+               pgoff_t ntails;
                struct page *page;
                kvm_pfn_t pfn;

@@ -168,9 +168,9 @@ static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct inode *inode,

                page = folio_file_page(folio, index);
                pfn = page_to_pfn(page);
-               order = folio_order(folio);
+               ntails = folio_nr_pages(folio) - folio_page_idx(folio, page);

-               kvm_arch_gmem_invalidate(kvm, pfn, pfn + min((1ul << order), end - index));
+               kvm_arch_gmem_invalidate(kvm, pfn, pfn + min(ntails, end - index));

                index = folio_next_index(folio);
                folio_unlock(folio);

Thanks,
Yilun

^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2023-10-10  9:18 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-15 17:18 [PATCH 0/8] KVM: gmem: Adding hooks for SEV and TDX isaku.yamahata
2023-08-15 17:18 ` [PATCH 1/8] KVM: gmem: Make kvm_gmem_bind return EBADF on wrong fd isaku.yamahata
2023-08-15 17:18 ` [PATCH 2/8] KVM: gmem: removed duplicated kvm_gmem_init() isaku.yamahata
2023-08-15 17:18 ` [PATCH 3/8] KVM: gmem: Fix kvm_gmem_issue_arch_invalidate() isaku.yamahata
2023-08-18 22:33   ` Sean Christopherson
2023-08-15 17:18 ` [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end() isaku.yamahata
2023-08-16 20:28   ` Jarkko Sakkinen
2023-08-18 17:55   ` Sean Christopherson
2023-08-18 20:32     ` Kalra, Ashish
2023-08-18 22:44       ` Sean Christopherson
2023-08-19  2:08         ` Mingwei Zhang
2023-08-21 14:42           ` Sean Christopherson
2023-08-21 21:44           ` Kalra, Ashish
2023-08-22 22:30             ` Kalra, Ashish
2023-08-22 23:17             ` Sean Christopherson
2023-08-31 16:50               ` Kalra, Ashish
2023-08-15 17:18 ` [PATCH 5/8] KVM: gmem, x86: Add gmem hook for initializing private memory isaku.yamahata
2023-08-16 20:30   ` Jarkko Sakkinen
2023-08-15 17:18 ` [PATCH 6/8] KVM: gmem, x86: Add gmem hook for invalidating " isaku.yamahata
2023-08-16  0:42   ` kernel test robot
2023-08-16 20:37   ` Isaku Yamahata
2023-10-10  9:17   ` Xu Yilun
2023-08-15 17:18 ` [PATCH 7/8] KVM: gmem: Avoid race with kvm_gmem_release and mmu notifier isaku.yamahata
2023-08-18 18:15   ` Sean Christopherson
2023-08-15 17:18 ` [PATCH 8/8] RFC: KVM: gmem: Guarantee the order of destruction isaku.yamahata
2023-08-18 23:14 ` [PATCH 0/8] KVM: gmem: Adding hooks for SEV and TDX Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).