linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] KVM: x86/mmu: Zap invalid TDP MMU roots when unmapping
@ 2021-12-15  1:15 Sean Christopherson
  2021-12-15  1:15 ` [PATCH 1/4] KVM: x86/mmu: Use common TDP MMU zap helper for MMU notifier unmap hook Sean Christopherson
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Sean Christopherson @ 2021-12-15  1:15 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Patches 01-03 implement a bug fix by ensuring KVM zaps both valid and
invalid roots when unmapping a gfn range (including the magic "all" range).
Failure to zap invalid roots means KVM doesn't honor the mmu_notifier's
requirement that all references are dropped.

set_nx_huge_pages() is the most blatant offender, as it doesn't elevate
mm_users and so a VM's entire mm can be released, but the same underlying
bug exists for any "unmap" command from the mmu_notifier in combination
with a memslot update.  E.g. if KVM is deleting a memslot, and a
mmu_notifier hook acquires mmu_lock while it's dropped by
kvm_mmu_zap_all_fast(), the mmu_notifier hook will see the to-be-deleted
memslot but won't zap entries from the invalid roots.

Patch 04 is cleanup to reuse the common iterator for walking _only_
invalid roots.

Sean Christopherson (4):
  KVM: x86/mmu: Use common TDP MMU zap helper for MMU notifier unmap
    hook
  KVM: x86/mmu: Move "invalid" check out of kvm_tdp_mmu_get_root()
  KVM: x86/mmu: Zap _all_ roots when unmapping gfn range in TDP MMU
  KVM: x86/mmu: Use common iterator for walking invalid TDP MMU roots

 arch/x86/kvm/mmu/tdp_mmu.c | 116 +++++++++++++++++--------------------
 arch/x86/kvm/mmu/tdp_mmu.h |   3 -
 2 files changed, 53 insertions(+), 66 deletions(-)

-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/4] KVM: x86/mmu: Use common TDP MMU zap helper for MMU notifier unmap hook
  2021-12-15  1:15 [PATCH 0/4] KVM: x86/mmu: Zap invalid TDP MMU roots when unmapping Sean Christopherson
@ 2021-12-15  1:15 ` Sean Christopherson
  2021-12-15  1:15 ` [PATCH 2/4] KVM: x86/mmu: Move "invalid" check out of kvm_tdp_mmu_get_root() Sean Christopherson
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Sean Christopherson @ 2021-12-15  1:15 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Use the common TDP MMU zap helper when handling an MMU notifier unmap
event, the two flows are semantically identical.  Consolidate the code in
preparation for a future bug fix, as both kvm_tdp_mmu_unmap_gfn_range()
and __kvm_tdp_mmu_zap_gfn_range() are guilty of not zapping SPTEs in
invalid roots.

No functional change intended.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7b1bc816b7c3..d320b56d5cd7 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1032,13 +1032,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
 				 bool flush)
 {
-	struct kvm_mmu_page *root;
-
-	for_each_tdp_mmu_root_yield_safe(kvm, root, range->slot->as_id, false)
-		flush = zap_gfn_range(kvm, root, range->start, range->end,
-				      range->may_block, flush, false);
-
-	return flush;
+	return __kvm_tdp_mmu_zap_gfn_range(kvm, range->slot->as_id, range->start,
+					   range->end, range->may_block, flush);
 }
 
 typedef bool (*tdp_handler_t)(struct kvm *kvm, struct tdp_iter *iter,
-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/4] KVM: x86/mmu: Move "invalid" check out of kvm_tdp_mmu_get_root()
  2021-12-15  1:15 [PATCH 0/4] KVM: x86/mmu: Zap invalid TDP MMU roots when unmapping Sean Christopherson
  2021-12-15  1:15 ` [PATCH 1/4] KVM: x86/mmu: Use common TDP MMU zap helper for MMU notifier unmap hook Sean Christopherson
@ 2021-12-15  1:15 ` Sean Christopherson
  2021-12-15  1:15 ` [PATCH 3/4] KVM: x86/mmu: Zap _all_ roots when unmapping gfn range in TDP MMU Sean Christopherson
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Sean Christopherson @ 2021-12-15  1:15 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Move the check for an invalid root out of kvm_tdp_mmu_get_root() and into
the one place it actually matters, tdp_mmu_next_root(), as the other user
already has an implicit validity check.  A future bug fix will need to
get references to invalid roots to honor mmu_notifier requests; there's
no point in forcing what will be a common path to open code getting a
reference to a root.

No functional change intended.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 12 ++++++++++--
 arch/x86/kvm/mmu/tdp_mmu.h |  3 ---
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index d320b56d5cd7..200001190fcf 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -121,9 +121,14 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 		next_root = list_first_or_null_rcu(&kvm->arch.tdp_mmu_roots,
 						   typeof(*next_root), link);
 
-	while (next_root && !kvm_tdp_mmu_get_root(kvm, next_root))
+	while (next_root) {
+		if (!next_root->role.invalid &&
+		    kvm_tdp_mmu_get_root(kvm, next_root))
+			break;
+
 		next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots,
 				&next_root->link, typeof(*next_root), link);
+	}
 
 	rcu_read_unlock();
 
@@ -200,7 +205,10 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
 
 	role = page_role_for_level(vcpu, vcpu->arch.mmu->shadow_root_level);
 
-	/* Check for an existing root before allocating a new one. */
+	/*
+	 * Check for an existing root before allocating a new one.  Note, the
+	 * role check prevents consuming an invalid root.
+	 */
 	for_each_tdp_mmu_root(kvm, root, kvm_mmu_role_as_id(role)) {
 		if (root->role.word == role.word &&
 		    kvm_tdp_mmu_get_root(kvm, root))
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 3899004a5d91..08c917511fed 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -10,9 +10,6 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu);
 __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm *kvm,
 						     struct kvm_mmu_page *root)
 {
-	if (root->role.invalid)
-		return false;
-
 	return refcount_inc_not_zero(&root->tdp_mmu_root_count);
 }
 
-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/4] KVM: x86/mmu: Zap _all_ roots when unmapping gfn range in TDP MMU
  2021-12-15  1:15 [PATCH 0/4] KVM: x86/mmu: Zap invalid TDP MMU roots when unmapping Sean Christopherson
  2021-12-15  1:15 ` [PATCH 1/4] KVM: x86/mmu: Use common TDP MMU zap helper for MMU notifier unmap hook Sean Christopherson
  2021-12-15  1:15 ` [PATCH 2/4] KVM: x86/mmu: Move "invalid" check out of kvm_tdp_mmu_get_root() Sean Christopherson
@ 2021-12-15  1:15 ` Sean Christopherson
  2021-12-15  1:15 ` [PATCH 4/4] KVM: x86/mmu: Use common iterator for walking invalid TDP MMU roots Sean Christopherson
  2021-12-15 17:20 ` [PATCH 0/4] KVM: x86/mmu: Zap invalid TDP MMU roots when unmapping Paolo Bonzini
  4 siblings, 0 replies; 8+ messages in thread
From: Sean Christopherson @ 2021-12-15  1:15 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Zap both valid and invalid roots when zapping/unmapping a gfn range, as
KVM must ensure it holds no references to the freed page after returning
from the unmap operation.  Most notably, the TDP MMU doesn't zap invalid
roots in mmu_notifier callbacks.  This leads to use-after-free and other
issues if the mmu_notifier runs to completion while an invalid root
zapper yields as KVM fails to honor the requirement that there must be
_no_ references to the page after the mmu_notifier returns.

The bug is most easily reproduced by hacking KVM to cause a collision
between set_nx_huge_pages() and kvm_mmu_notifier_release(), but the bug
exists between kvm_mmu_notifier_invalidate_range_start() and memslot
updates as well.  Invalidating a root ensures pages aren't accessible by
the guest, and KVM won't read or write page data itself, but KVM will
trigger e.g. kvm_set_pfn_dirty() when zapping SPTEs, and thus completing
a zap of an invalid root _after_ the mmu_notifier returns is fatal.

  WARNING: CPU: 24 PID: 1496 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:173 [kvm]
  RIP: 0010:kvm_is_zone_device_pfn+0x96/0xa0 [kvm]
  Call Trace:
   <TASK>
   kvm_set_pfn_dirty+0xa8/0xe0 [kvm]
   __handle_changed_spte+0x2ab/0x5e0 [kvm]
   __handle_changed_spte+0x2ab/0x5e0 [kvm]
   __handle_changed_spte+0x2ab/0x5e0 [kvm]
   zap_gfn_range+0x1f3/0x310 [kvm]
   kvm_tdp_mmu_zap_invalidated_roots+0x50/0x90 [kvm]
   kvm_mmu_zap_all_fast+0x177/0x1a0 [kvm]
   set_nx_huge_pages+0xb4/0x190 [kvm]
   param_attr_store+0x70/0x100
   module_attr_store+0x19/0x30
   kernfs_fop_write_iter+0x119/0x1b0
   new_sync_write+0x11c/0x1b0
   vfs_write+0x1cc/0x270
   ksys_write+0x5f/0xe0
   do_syscall_64+0x38/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   </TASK>

Fixes: b7cccd397f31 ("KVM: x86/mmu: Fast invalidation for TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 39 +++++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 200001190fcf..577985fa001d 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -99,15 +99,18 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
 }
 
 /*
- * Finds the next valid root after root (or the first valid root if root
- * is NULL), takes a reference on it, and returns that next root. If root
- * is not NULL, this thread should have already taken a reference on it, and
- * that reference will be dropped. If no valid root is found, this
- * function will return NULL.
+ * Returns the next root after @prev_root (or the first root if @prev_root is
+ * NULL).  A reference to the returned root is acquired, and the reference to
+ * @prev_root is released (the caller obviously must hold a reference to
+ * @prev_root if it's non-NULL).
+ *
+ * If @only_valid is true, invalid roots are skipped.
+ *
+ * Returns NULL if the end of tdp_mmu_roots was reached.
  */
 static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 					      struct kvm_mmu_page *prev_root,
-					      bool shared)
+					      bool shared, bool only_valid)
 {
 	struct kvm_mmu_page *next_root;
 
@@ -122,7 +125,7 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 						   typeof(*next_root), link);
 
 	while (next_root) {
-		if (!next_root->role.invalid &&
+		if ((!only_valid || !next_root->role.invalid) &&
 		    kvm_tdp_mmu_get_root(kvm, next_root))
 			break;
 
@@ -148,13 +151,19 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
  * mode. In the unlikely event that this thread must free a root, the lock
  * will be temporarily dropped and reacquired in write mode.
  */
-#define for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared)	\
-	for (_root = tdp_mmu_next_root(_kvm, NULL, _shared);		\
-	     _root;							\
-	     _root = tdp_mmu_next_root(_kvm, _root, _shared))		\
-		if (kvm_mmu_page_as_id(_root) != _as_id) {		\
+#define __for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared, _only_valid)\
+	for (_root = tdp_mmu_next_root(_kvm, NULL, _shared, _only_valid);	\
+	     _root;								\
+	     _root = tdp_mmu_next_root(_kvm, _root, _shared, _only_valid))	\
+		if (kvm_mmu_page_as_id(_root) != _as_id) {			\
 		} else
 
+#define for_each_valid_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared)	\
+	__for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared, true)
+
+#define for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared)		\
+	__for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared, false)
+
 #define for_each_tdp_mmu_root(_kvm, _root, _as_id)				\
 	list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link,		\
 				lockdep_is_held_type(&kvm->mmu_lock, 0) ||	\
@@ -1224,7 +1233,7 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm,
 
 	lockdep_assert_held_read(&kvm->mmu_lock);
 
-	for_each_tdp_mmu_root_yield_safe(kvm, root, slot->as_id, true)
+	for_each_valid_tdp_mmu_root_yield_safe(kvm, root, slot->as_id, true)
 		spte_set |= wrprot_gfn_range(kvm, root, slot->base_gfn,
 			     slot->base_gfn + slot->npages, min_level);
 
@@ -1294,7 +1303,7 @@ bool kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm,
 
 	lockdep_assert_held_read(&kvm->mmu_lock);
 
-	for_each_tdp_mmu_root_yield_safe(kvm, root, slot->as_id, true)
+	for_each_valid_tdp_mmu_root_yield_safe(kvm, root, slot->as_id, true)
 		spte_set |= clear_dirty_gfn_range(kvm, root, slot->base_gfn,
 				slot->base_gfn + slot->npages);
 
@@ -1419,7 +1428,7 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
 
 	lockdep_assert_held_read(&kvm->mmu_lock);
 
-	for_each_tdp_mmu_root_yield_safe(kvm, root, slot->as_id, true)
+	for_each_valid_tdp_mmu_root_yield_safe(kvm, root, slot->as_id, true)
 		zap_collapsible_spte_range(kvm, root, slot);
 }
 
-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4/4] KVM: x86/mmu: Use common iterator for walking invalid TDP MMU roots
  2021-12-15  1:15 [PATCH 0/4] KVM: x86/mmu: Zap invalid TDP MMU roots when unmapping Sean Christopherson
                   ` (2 preceding siblings ...)
  2021-12-15  1:15 ` [PATCH 3/4] KVM: x86/mmu: Zap _all_ roots when unmapping gfn range in TDP MMU Sean Christopherson
@ 2021-12-15  1:15 ` Sean Christopherson
  2021-12-15 17:20 ` [PATCH 0/4] KVM: x86/mmu: Zap invalid TDP MMU roots when unmapping Paolo Bonzini
  4 siblings, 0 replies; 8+ messages in thread
From: Sean Christopherson @ 2021-12-15  1:15 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Now that tdp_mmu_next_root() can process both valid and invalid roots,
extend it to be able to process _only_ invalid roots, add yet another
iterator macro for walking invalid roots, and use the new macro in
kvm_tdp_mmu_zap_invalidated_roots().

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 76 ++++++++++++++------------------------
 1 file changed, 27 insertions(+), 49 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 577985fa001d..b6f7ba057f65 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -98,22 +98,34 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
 	call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback);
 }
 
+enum tdp_mmu_roots_iter_type {
+	ALL_ROOTS = -1,
+	VALID_ROOTS = 0,
+	INVALID_ROOTS = 1,
+};
+
 /*
  * Returns the next root after @prev_root (or the first root if @prev_root is
  * NULL).  A reference to the returned root is acquired, and the reference to
  * @prev_root is released (the caller obviously must hold a reference to
  * @prev_root if it's non-NULL).
  *
- * If @only_valid is true, invalid roots are skipped.
+ * If @type is not ALL_ROOTS, (in)valid roots are skipped accordingly.
  *
  * Returns NULL if the end of tdp_mmu_roots was reached.
  */
 static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 					      struct kvm_mmu_page *prev_root,
-					      bool shared, bool only_valid)
+					      bool shared,
+					      enum tdp_mmu_roots_iter_type type)
 {
 	struct kvm_mmu_page *next_root;
 
+	kvm_lockdep_assert_mmu_lock_held(kvm, shared);
+
+	/* Ensure correctness for the below comparison against role.invalid. */
+	BUILD_BUG_ON(!!VALID_ROOTS || !INVALID_ROOTS);
+
 	rcu_read_lock();
 
 	if (prev_root)
@@ -125,7 +137,7 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 						   typeof(*next_root), link);
 
 	while (next_root) {
-		if ((!only_valid || !next_root->role.invalid) &&
+		if ((type == ALL_ROOTS || (type == !!next_root->role.invalid)) &&
 		    kvm_tdp_mmu_get_root(kvm, next_root))
 			break;
 
@@ -151,18 +163,21 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
  * mode. In the unlikely event that this thread must free a root, the lock
  * will be temporarily dropped and reacquired in write mode.
  */
-#define __for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared, _only_valid)\
-	for (_root = tdp_mmu_next_root(_kvm, NULL, _shared, _only_valid);	\
+#define __for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared, _type) \
+	for (_root = tdp_mmu_next_root(_kvm, NULL, _shared, _type);		\
 	     _root;								\
-	     _root = tdp_mmu_next_root(_kvm, _root, _shared, _only_valid))	\
-		if (kvm_mmu_page_as_id(_root) != _as_id) {			\
+	     _root = tdp_mmu_next_root(_kvm, _root, _shared, _type))		\
+		if (_as_id > 0 && kvm_mmu_page_as_id(_root) != _as_id) {	\
 		} else
 
+#define for_each_invalid_tdp_mmu_root_yield_safe(_kvm, _root)			\
+	__for_each_tdp_mmu_root_yield_safe(_kvm, _root, -1, true, INVALID_ROOTS)
+
 #define for_each_valid_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared)	\
-	__for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared, true)
+	__for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared, VALID_ROOTS)
 
 #define for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared)		\
-	__for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared, false)
+	__for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared, ALL_ROOTS)
 
 #define for_each_tdp_mmu_root(_kvm, _root, _as_id)				\
 	list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link,		\
@@ -811,28 +826,6 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm)
 		kvm_flush_remote_tlbs(kvm);
 }
 
-static struct kvm_mmu_page *next_invalidated_root(struct kvm *kvm,
-						  struct kvm_mmu_page *prev_root)
-{
-	struct kvm_mmu_page *next_root;
-
-	if (prev_root)
-		next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots,
-						  &prev_root->link,
-						  typeof(*prev_root), link);
-	else
-		next_root = list_first_or_null_rcu(&kvm->arch.tdp_mmu_roots,
-						   typeof(*next_root), link);
-
-	while (next_root && !(next_root->role.invalid &&
-			      refcount_read(&next_root->tdp_mmu_root_count)))
-		next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots,
-						  &next_root->link,
-						  typeof(*next_root), link);
-
-	return next_root;
-}
-
 /*
  * Since kvm_tdp_mmu_zap_all_fast has acquired a reference to each
  * invalidated root, they will not be freed until this function drops the
@@ -843,36 +836,21 @@ static struct kvm_mmu_page *next_invalidated_root(struct kvm *kvm,
  */
 void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
 {
-	struct kvm_mmu_page *next_root;
 	struct kvm_mmu_page *root;
 	bool flush = false;
 
 	lockdep_assert_held_read(&kvm->mmu_lock);
 
-	rcu_read_lock();
-
-	root = next_invalidated_root(kvm, NULL);
-
-	while (root) {
-		next_root = next_invalidated_root(kvm, root);
-
-		rcu_read_unlock();
-
+	for_each_invalid_tdp_mmu_root_yield_safe(kvm, root) {
 		flush = zap_gfn_range(kvm, root, 0, -1ull, true, flush, true);
 
 		/*
-		 * Put the reference acquired in
-		 * kvm_tdp_mmu_invalidate_roots
+		 * Put the reference acquired in kvm_tdp_mmu_invalidate_roots().
+		 * Note, the iterator holds its own reference.
 		 */
 		kvm_tdp_mmu_put_root(kvm, root, true);
-
-		root = next_root;
-
-		rcu_read_lock();
 	}
 
-	rcu_read_unlock();
-
 	if (flush)
 		kvm_flush_remote_tlbs(kvm);
 }
-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/4] KVM: x86/mmu: Zap invalid TDP MMU roots when unmapping
  2021-12-15  1:15 [PATCH 0/4] KVM: x86/mmu: Zap invalid TDP MMU roots when unmapping Sean Christopherson
                   ` (3 preceding siblings ...)
  2021-12-15  1:15 ` [PATCH 4/4] KVM: x86/mmu: Use common iterator for walking invalid TDP MMU roots Sean Christopherson
@ 2021-12-15 17:20 ` Paolo Bonzini
  2021-12-15 20:41   ` Sean Christopherson
  4 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2021-12-15 17:20 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon

On 12/15/21 02:15, Sean Christopherson wrote:
> Patches 01-03 implement a bug fix by ensuring KVM zaps both valid and
> invalid roots when unmapping a gfn range (including the magic "all" range).
> Failure to zap invalid roots means KVM doesn't honor the mmu_notifier's
> requirement that all references are dropped.
> 
> set_nx_huge_pages() is the most blatant offender, as it doesn't elevate
> mm_users and so a VM's entire mm can be released, but the same underlying
> bug exists for any "unmap" command from the mmu_notifier in combination
> with a memslot update.  E.g. if KVM is deleting a memslot, and a
> mmu_notifier hook acquires mmu_lock while it's dropped by
> kvm_mmu_zap_all_fast(), the mmu_notifier hook will see the to-be-deleted
> memslot but won't zap entries from the invalid roots.
> 
> Patch 04 is cleanup to reuse the common iterator for walking _only_
> invalid roots.
> 
> Sean Christopherson (4):
>    KVM: x86/mmu: Use common TDP MMU zap helper for MMU notifier unmap
>      hook
>    KVM: x86/mmu: Move "invalid" check out of kvm_tdp_mmu_get_root()
>    KVM: x86/mmu: Zap _all_ roots when unmapping gfn range in TDP MMU
>    KVM: x86/mmu: Use common iterator for walking invalid TDP MMU roots
> 
>   arch/x86/kvm/mmu/tdp_mmu.c | 116 +++++++++++++++++--------------------
>   arch/x86/kvm/mmu/tdp_mmu.h |   3 -
>   2 files changed, 53 insertions(+), 66 deletions(-)
> 

Queued 1-3 for 5.16 and 4 for 5.17.

Paolo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/4] KVM: x86/mmu: Zap invalid TDP MMU roots when unmapping
  2021-12-15 17:20 ` [PATCH 0/4] KVM: x86/mmu: Zap invalid TDP MMU roots when unmapping Paolo Bonzini
@ 2021-12-15 20:41   ` Sean Christopherson
  2021-12-16  0:44     ` Sean Christopherson
  0 siblings, 1 reply; 8+ messages in thread
From: Sean Christopherson @ 2021-12-15 20:41 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon

On Wed, Dec 15, 2021, Paolo Bonzini wrote:
> On 12/15/21 02:15, Sean Christopherson wrote:
> > Patches 01-03 implement a bug fix by ensuring KVM zaps both valid and
> > invalid roots when unmapping a gfn range (including the magic "all" range).
> > Failure to zap invalid roots means KVM doesn't honor the mmu_notifier's
> > requirement that all references are dropped.
> > 
> > set_nx_huge_pages() is the most blatant offender, as it doesn't elevate
> > mm_users and so a VM's entire mm can be released, but the same underlying
> > bug exists for any "unmap" command from the mmu_notifier in combination
> > with a memslot update.  E.g. if KVM is deleting a memslot, and a
> > mmu_notifier hook acquires mmu_lock while it's dropped by
> > kvm_mmu_zap_all_fast(), the mmu_notifier hook will see the to-be-deleted
> > memslot but won't zap entries from the invalid roots.
> > 
> > Patch 04 is cleanup to reuse the common iterator for walking _only_
> > invalid roots.
> > 
> > Sean Christopherson (4):
> >    KVM: x86/mmu: Use common TDP MMU zap helper for MMU notifier unmap
> >      hook
> >    KVM: x86/mmu: Move "invalid" check out of kvm_tdp_mmu_get_root()
> >    KVM: x86/mmu: Zap _all_ roots when unmapping gfn range in TDP MMU
> >    KVM: x86/mmu: Use common iterator for walking invalid TDP MMU roots
> > 
> >   arch/x86/kvm/mmu/tdp_mmu.c | 116 +++++++++++++++++--------------------
> >   arch/x86/kvm/mmu/tdp_mmu.h |   3 -
> >   2 files changed, 53 insertions(+), 66 deletions(-)
> > 
> 
> Queued 1-3 for 5.16 and 4 for 5.17.

Actually, can you please unqueue patch 4?  I think we can actually kill off
kvm_tdp_mmu_zap_invalidated_roots() entirely.  I don't know if that code will be
ready for 5.17, but if it is then this patch is unnecesary.  And if not, it
shouldn't be difficult to re-queue this a bit later.

Thanks!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/4] KVM: x86/mmu: Zap invalid TDP MMU roots when unmapping
  2021-12-15 20:41   ` Sean Christopherson
@ 2021-12-16  0:44     ` Sean Christopherson
  0 siblings, 0 replies; 8+ messages in thread
From: Sean Christopherson @ 2021-12-16  0:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon

On Wed, Dec 15, 2021, Sean Christopherson wrote:
> On Wed, Dec 15, 2021, Paolo Bonzini wrote:
> > On 12/15/21 02:15, Sean Christopherson wrote:
> > > Patches 01-03 implement a bug fix by ensuring KVM zaps both valid and
> > > invalid roots when unmapping a gfn range (including the magic "all" range).
> > > Failure to zap invalid roots means KVM doesn't honor the mmu_notifier's
> > > requirement that all references are dropped.
> > > 
> > > set_nx_huge_pages() is the most blatant offender, as it doesn't elevate
> > > mm_users and so a VM's entire mm can be released, but the same underlying
> > > bug exists for any "unmap" command from the mmu_notifier in combination
> > > with a memslot update.  E.g. if KVM is deleting a memslot, and a
> > > mmu_notifier hook acquires mmu_lock while it's dropped by
> > > kvm_mmu_zap_all_fast(), the mmu_notifier hook will see the to-be-deleted
> > > memslot but won't zap entries from the invalid roots.
> > > 
> > > Patch 04 is cleanup to reuse the common iterator for walking _only_
> > > invalid roots.
> > > 
> > > Sean Christopherson (4):
> > >    KVM: x86/mmu: Use common TDP MMU zap helper for MMU notifier unmap
> > >      hook
> > >    KVM: x86/mmu: Move "invalid" check out of kvm_tdp_mmu_get_root()
> > >    KVM: x86/mmu: Zap _all_ roots when unmapping gfn range in TDP MMU
> > >    KVM: x86/mmu: Use common iterator for walking invalid TDP MMU roots
> > > 
> > >   arch/x86/kvm/mmu/tdp_mmu.c | 116 +++++++++++++++++--------------------
> > >   arch/x86/kvm/mmu/tdp_mmu.h |   3 -
> > >   2 files changed, 53 insertions(+), 66 deletions(-)
> > > 
> > 
> > Queued 1-3 for 5.16 and 4 for 5.17.
> 
> Actually, can you please unqueue patch 4?  I think we can actually kill off
> kvm_tdp_mmu_zap_invalidated_roots() entirely.  I don't know if that code will be
> ready for 5.17, but if it is then this patch is unnecesary.  And if not, it
> shouldn't be difficult to re-queue this a bit later.

Cancel that request, the comment above kvm_tdp_mmu_zap_invalidated_roots() lies,
as do the changelogs for commits b7cccd397f31 ("KVM: x86/mmu: Fast invalidation
for TDP MMU") and 4c6654bd160d ("KVM: x86/mmu: Tear down roots before
kvm_mmu_zap_all_fast returns"), and the fact that they are even separate commits.

KVM _must_ zap invalid roots before returning from kvm_mmu_zap_all_fast(), because
when it's called from kvm_mmu_invalidate_zap_pages_in_memslot(), KVM is relying on
it to fully remove all references to the memslot.  Once the memslot is gone, KVM's
mmu_notifier hooks will be unable to find the stale references as the hva=>gfn
translation is done via the memslots.   If userspace unmaps a range after deleting
a memslot, KVM will fail to zap in response to the mmu_notifier due to not finding
a memslot corresponding to the notifier's range, which leads to another variation
of the splat I've come to know and love.

  WARNING: CPU: 33 PID: 44927 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:173
  RIP: 0010:kvm_is_zone_device_pfn+0x96/0xa0 [kvm]
   kvm_set_pfn_dirty+0xa8/0xe0 [kvm]
   __handle_changed_spte+0x2f7/0x5b0 [kvm]
   __handle_changed_spte+0x2f7/0x5b0 [kvm]
   __tdp_mmu_set_spte+0x64/0x170 [kvm]
   tdp_mmu_zap_root+0x1f5/0x220 [kvm]
   kvm_tdp_mmu_zap_all+0x47/0x60 [kvm]
   kvm_mmu_zap_all+0xf0/0x100 [kvm]
   kvm_mmu_notifier_release+0x2b/0x60 [kvm]
   mmu_notifier_unregister+0x48/0xe0
   kvm_destroy_vm+0x129/0x2a0 [kvm]
   kvm_vm_release+0x1d/0x30 [kvm]
   __fput+0x82/0x240
   task_work_run+0x5c/0x90
   exit_to_user_mode_prepare+0x114/0x120
   syscall_exit_to_user_mode+0x1d/0x40
   do_syscall_64+0x48/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

I'll include a patch in my flush+zap rework series to reword that comment, because
it is very, very misleading.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-12-16  0:45 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-15  1:15 [PATCH 0/4] KVM: x86/mmu: Zap invalid TDP MMU roots when unmapping Sean Christopherson
2021-12-15  1:15 ` [PATCH 1/4] KVM: x86/mmu: Use common TDP MMU zap helper for MMU notifier unmap hook Sean Christopherson
2021-12-15  1:15 ` [PATCH 2/4] KVM: x86/mmu: Move "invalid" check out of kvm_tdp_mmu_get_root() Sean Christopherson
2021-12-15  1:15 ` [PATCH 3/4] KVM: x86/mmu: Zap _all_ roots when unmapping gfn range in TDP MMU Sean Christopherson
2021-12-15  1:15 ` [PATCH 4/4] KVM: x86/mmu: Use common iterator for walking invalid TDP MMU roots Sean Christopherson
2021-12-15 17:20 ` [PATCH 0/4] KVM: x86/mmu: Zap invalid TDP MMU roots when unmapping Paolo Bonzini
2021-12-15 20:41   ` Sean Christopherson
2021-12-16  0:44     ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).