* [RFC PATCH v3 0/2] Deprecate BUG() in pte_list_remove() in shadow mmu
@ 2022-11-28  0:20 Mingwei Zhang
  2022-11-28  0:20 ` [RFC PATCH v3 1/2] KVM: x86/mmu: plumb struct kvm all the way to pte_list_remove() Mingwei Zhang
  2022-11-28  0:20 ` [RFC PATCH v3 2/2] KVM: x86/mmu: replace BUG() with KVM_BUG() in shadow mmu Mingwei Zhang
  0 siblings, 2 replies; 5+ messages in thread
From: Mingwei Zhang @ 2022-11-28  0:20 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: H. Peter Anvin, kvm, linux-kernel, Mingwei Zhang,
	Nagareddy Reddy, Jim Mattson, David Matlack

Deprecate BUG() in pte_list_remove() in shadow mmu to avoid crashing a
physical machine. There are several reasons and motivations to do so:

MMU bugs are difficult to discover due to various race conditions and
corner cases, and are thus extremely hard to debug. The situation gets much
worse when a bug triggers a host shutdown: a host machine crash may
eliminate everything, including potential clues for debugging.

From a cloud computing service perspective, BUG() or BUG_ON() is probably no
longer appropriate, as host reliability is the top priority. Crashing the
physical machine is almost never a good option, as it eliminates innocent
VMs and causes a service outage on a larger scale. Even worse, if an attacker
can reliably trigger this code by diverting the control flow or corrupting
memory, then this becomes a VM-of-death attack. This is a huge attack vector
for cloud providers, as the death of a single host machine is not the end
of the story. Without manual intervention, a failed cloud job may be
dispatched to other hosts and continue crashing them until all of them are
dead.

For the above reasons, we propose replacing BUG() in
pte_list_remove() with KVM_BUG() to crash only the VM itself.
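To make the difference concrete, here is a minimal user-space sketch of the proposed semantics. It assumes a simplified model: `struct mock_kvm`, `mock_kvm_vm_bugged()` and `mock_kvm_bug()` are hypothetical stand-ins, not kernel code. The real KVM_BUG() in include/linux/kvm_host.h is a macro built on WARN_ONCE() that also stops the VM's vCPUs; this sketch only models "evaluate the condition, warn, mark this VM as bugged, return the condition", and it warns once per VM rather than once per call site.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct kvm: only the fields this sketch needs. */
struct mock_kvm {
	bool vm_bugged;		/* set once this VM has hit a KVM bug */
	int warnings;		/* number of warnings actually emitted */
};

/* Hypothetical stand-in for kvm_vm_bugged(): mark only this VM as dead. */
static void mock_kvm_vm_bugged(struct mock_kvm *kvm)
{
	kvm->vm_bugged = true;
}

/*
 * KVM_BUG()-like helper: on the first hit for this VM, print a warning and
 * bug the VM; always hand the condition back so the caller can bail out.
 * Unlike BUG(), nothing here stops the host.
 */
static bool mock_kvm_bug(bool cond, struct mock_kvm *kvm, const char *fmt, ...)
{
	if (cond && !kvm->vm_bugged) {
		va_list ap;

		va_start(ap, fmt);
		vfprintf(stderr, fmt, ap);
		va_end(ap);
		fputc('\n', stderr);
		kvm->warnings++;
		mock_kvm_vm_bugged(kvm);
	}
	return cond;
}
```

Calling `mock_kvm_bug(true, &vm, ...)` marks only `vm` as bugged and returns true, so the caller can return early while every other VM on the host keeps running.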


v2 -> v3:
 - plumb @kvm all the way to pte_list_remove() [seanjc, pbonzini]

v1 -> v2:
 - compile-test the code.
 - fill KVM_BUG() with kvm_get_running_vcpu()->kvm

rfc v2:
 - https://lore.kernel.org/all/20221124003505.424617-1-mizhang@google.com/

rfc v1:
 - https://lore.kernel.org/all/20221123231206.274392-1-mizhang@google.com/


Mingwei Zhang (2):
  KVM: x86/mmu: plumb struct kvm all the way to pte_list_remove()
  KVM: x86/mmu: replace BUG() with KVM_BUG() in shadow mmu

 arch/x86/kvm/mmu/mmu.c | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

-- 
2.38.1.584.g0f3c55d4c2-goog



* [RFC PATCH v3 1/2] KVM: x86/mmu: plumb struct kvm all the way to pte_list_remove()
  2022-11-28  0:20 [RFC PATCH v3 0/2] Deprecate BUG() in pte_list_remove() in shadow mmu Mingwei Zhang
@ 2022-11-28  0:20 ` Mingwei Zhang
  2022-11-28  0:20 ` [RFC PATCH v3 2/2] KVM: x86/mmu: replace BUG() with KVM_BUG() in shadow mmu Mingwei Zhang
  1 sibling, 0 replies; 5+ messages in thread
From: Mingwei Zhang @ 2022-11-28  0:20 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: H. Peter Anvin, kvm, linux-kernel, Mingwei Zhang,
	Nagareddy Reddy, Jim Mattson, David Matlack

Plumb struct kvm all the way to pte_list_remove() to allow the usage of
KVM_BUG() and/or KVM_BUG_ON(). This is the preparation step to deprecate the
usage of BUG() in pte_list_remove() in shadow mmu.

Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4736d7849c60..b5a44b8f5f7b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -947,7 +947,8 @@ pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
 	mmu_free_pte_list_desc(desc);
 }
 
-static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
+static void pte_list_remove(struct kvm *kvm, u64 *spte,
+			    struct kvm_rmap_head *rmap_head)
 {
 	struct pte_list_desc *desc;
 	struct pte_list_desc *prev_desc;
@@ -987,7 +988,7 @@ static void kvm_zap_one_rmap_spte(struct kvm *kvm,
 				  struct kvm_rmap_head *rmap_head, u64 *sptep)
 {
 	mmu_spte_clear_track_bits(kvm, sptep);
-	pte_list_remove(sptep, rmap_head);
+	pte_list_remove(kvm, sptep, rmap_head);
 }
 
 /* Return true if at least one SPTE was zapped, false otherwise */
@@ -1077,7 +1078,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 	slot = __gfn_to_memslot(slots, gfn);
 	rmap_head = gfn_to_rmap(gfn, sp->role.level, slot);
 
-	pte_list_remove(spte, rmap_head);
+	pte_list_remove(kvm, spte, rmap_head);
 }
 
 /*
@@ -1730,16 +1731,16 @@ static void mmu_page_add_parent_pte(struct kvm_mmu_memory_cache *cache,
 	pte_list_add(cache, parent_pte, &sp->parent_ptes);
 }
 
-static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp,
+static void mmu_page_remove_parent_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 				       u64 *parent_pte)
 {
-	pte_list_remove(parent_pte, &sp->parent_ptes);
+	pte_list_remove(kvm, parent_pte, &sp->parent_ptes);
 }
 
-static void drop_parent_pte(struct kvm_mmu_page *sp,
+static void drop_parent_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 			    u64 *parent_pte)
 {
-	mmu_page_remove_parent_pte(sp, parent_pte);
+	mmu_page_remove_parent_pte(kvm, sp, parent_pte);
 	mmu_spte_clear_no_track(parent_pte);
 }
 
@@ -2382,7 +2383,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		if (child->role.access == direct_access)
 			return;
 
-		drop_parent_pte(child, sptep);
+		drop_parent_pte(vcpu->kvm, child, sptep);
 		kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn, 1);
 	}
 }
@@ -2400,7 +2401,7 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 			drop_spte(kvm, spte);
 		} else {
 			child = spte_to_child_sp(pte);
-			drop_parent_pte(child, spte);
+			drop_parent_pte(kvm, child, spte);
 
 			/*
 			 * Recursively zap nested TDP SPs, parentless SPs are
@@ -2431,13 +2432,13 @@ static int kvm_mmu_page_unlink_children(struct kvm *kvm,
 	return zapped;
 }
 
-static void kvm_mmu_unlink_parents(struct kvm_mmu_page *sp)
+static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
 
 	while ((sptep = rmap_get_first(&sp->parent_ptes, &iter)))
-		drop_parent_pte(sp, sptep);
+		drop_parent_pte(kvm, sp, sptep);
 }
 
 static int mmu_zap_unsync_children(struct kvm *kvm,
@@ -2475,7 +2476,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 	++kvm->stat.mmu_shadow_zapped;
 	*nr_zapped = mmu_zap_unsync_children(kvm, sp, invalid_list);
 	*nr_zapped += kvm_mmu_page_unlink_children(kvm, sp, invalid_list);
-	kvm_mmu_unlink_parents(sp);
+	kvm_mmu_unlink_parents(kvm, sp);
 
 	/* Zapping children means active_mmu_pages has become unstable. */
 	list_unstable = *nr_zapped;
@@ -2839,7 +2840,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
 			u64 pte = *sptep;
 
 			child = spte_to_child_sp(pte);
-			drop_parent_pte(child, sptep);
+			drop_parent_pte(vcpu->kvm, child, sptep);
 			flush = true;
 		} else if (pfn != spte_to_pfn(*sptep)) {
 			pgprintk("hfn old %llx new %llx\n",
-- 
2.38.1.584.g0f3c55d4c2-goog



* [RFC PATCH v3 2/2] KVM: x86/mmu: replace BUG() with KVM_BUG() in shadow mmu
  2022-11-28  0:20 [RFC PATCH v3 0/2] Deprecate BUG() in pte_list_remove() in shadow mmu Mingwei Zhang
  2022-11-28  0:20 ` [RFC PATCH v3 1/2] KVM: x86/mmu: plumb struct kvm all the way to pte_list_remove() Mingwei Zhang
@ 2022-11-28  0:20 ` Mingwei Zhang
  2022-11-28 17:46   ` Sean Christopherson
  1 sibling, 1 reply; 5+ messages in thread
From: Mingwei Zhang @ 2022-11-28  0:20 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: H. Peter Anvin, kvm, linux-kernel, Mingwei Zhang,
	Nagareddy Reddy, Jim Mattson, David Matlack

Replace BUG() in pte_list_remove() with KVM_BUG() to avoid crashing the
host. MMU bugs are difficult to discover due to various race conditions and
corner cases, and are thus extremely hard to debug. The situation gets much
worse when a bug triggers a host shutdown: a host machine crash
eliminates everything, including potential clues for debugging.

From a cloud computing service perspective, BUG() or BUG_ON() is probably no
longer appropriate, as host reliability is the top priority. Crashing the
physical machine is almost never a good option, as it eliminates innocent
VMs and causes a service outage on a larger scale. Even worse, if an attacker
can reliably trigger this code by diverting the control flow or corrupting
memory, then this becomes a VM-of-death attack. This is a huge attack vector
for cloud providers, as the death of a single host machine is not the end
of the story. Without manual intervention, a failed cloud job may be
dispatched to other hosts and continue crashing them until all of them are
dead.

Because of the above reasons, shrink the scope of the crash to the target VM
only. KVM_BUG() and KVM_BUG_ON() require a valid struct kvm, which requires
extra plumbing. Avoid it in this version by just using
kvm_get_running_vcpu()->kvm instead.

Cc: Nagareddy Reddy <nspreddy@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b5a44b8f5f7b..e132d82ab4c0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -956,12 +956,12 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte,
 
 	if (!rmap_head->val) {
 		pr_err("%s: %p 0->BUG\n", __func__, spte);
-		BUG();
+		KVM_BUG(true, kvm, "");
 	} else if (!(rmap_head->val & 1)) {
 		rmap_printk("%p 1->0\n", spte);
 		if ((u64 *)rmap_head->val != spte) {
 			pr_err("%s:  %p 1->BUG\n", __func__, spte);
-			BUG();
+			KVM_BUG(true, kvm, "");
 		}
 		rmap_head->val = 0;
 	} else {
@@ -980,7 +980,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte,
 			desc = desc->more;
 		}
 		pr_err("%s: %p many->many\n", __func__, spte);
-		BUG();
+		KVM_BUG(true, kvm, "");
 	}
 }
 
-- 
2.38.1.584.g0f3c55d4c2-goog



* Re: [RFC PATCH v3 2/2] KVM: x86/mmu: replace BUG() with KVM_BUG() in shadow mmu
  2022-11-28  0:20 ` [RFC PATCH v3 2/2] KVM: x86/mmu: replace BUG() with KVM_BUG() in shadow mmu Mingwei Zhang
@ 2022-11-28 17:46   ` Sean Christopherson
  2022-11-28 19:26     ` Mingwei Zhang
  0 siblings, 1 reply; 5+ messages in thread
From: Sean Christopherson @ 2022-11-28 17:46 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, H. Peter Anvin, kvm, linux-kernel,
	Nagareddy Reddy, Jim Mattson, David Matlack

On Mon, Nov 28, 2022, Mingwei Zhang wrote:
> Replace BUG() in pte_list_remove() with KVM_BUG() to avoid crashing the
> host. MMU bugs are difficult to discover due to various race conditions and
> corner cases, and are thus extremely hard to debug. The situation gets much
> worse when a bug triggers a host shutdown: a host machine crash
> eliminates everything, including potential clues for debugging.
> 
> From a cloud computing service perspective, BUG() or BUG_ON() is probably no
> longer appropriate, as host reliability is the top priority.

I don't think we need to bring "cloud computing" into this.  Linus has made it
clear over and over and over that BUG() / BUG_ON() need to be avoided unless
the alternative is worse.  E.g. the BUG() in __handle_changed_spte() is warranted
because the alternative is silent corruption of guest data.

> Crashing the physical machine is almost never a good option, as it eliminates
> innocent VMs and causes a service outage on a larger scale. Even worse, if an
> attacker can reliably trigger this code by diverting the control flow or
> corrupting memory,

Or if there's a KVM bug, which is waaaaay more likely.

> then this becomes a VM-of-death attack.

This is true of any BUG(), and really of any unexpected fault while holding a
spinlock, e.g. NULL pointer derefs in the MMU are almost always fatal as well.

> This is a huge attack vector for cloud providers, as the death of a single
> host machine is not the end of the story. Without manual intervention, a
> failed cloud job may be dispatched to other hosts and continue crashing them
> until all of them are dead.
> 
> Because of the above reasons, shrink the scope of the crash to the target VM
> only. KVM_BUG() and KVM_BUG_ON() require a valid struct kvm, which requires
> extra plumbing. Avoid it in this version by just using
> kvm_get_running_vcpu()->kvm instead.

Stale comment.

> Cc: Nagareddy Reddy <nspreddy@google.com>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: David Matlack <dmatlack@google.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index b5a44b8f5f7b..e132d82ab4c0 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -956,12 +956,12 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte,
>  
>  	if (!rmap_head->val) {
>  		pr_err("%s: %p 0->BUG\n", __func__, spte);

These probably need to be ratelimited (or "once").  Bugging the VM will prevent
doing anything useful with the VM, but KVM still needs to destroy the VM, which
means zapping SPTEs and purging the rmaps.  Theoretically, there could be thousands
of broken rmaps.
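A rough user-space sketch of why the messages should be limited, assuming a VM whose rmaps are all broken is being torn down. The counters and `report_*()` helpers are hypothetical; in-kernel, the equivalent would be pr_err_once() or pr_err_ratelimited() rather than a hand-rolled flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counters standing in for lines written to the kernel log. */
static int log_lines_unlimited;
static int log_lines_once;

/* Unconditional report, like the pr_err() calls in pte_list_remove(). */
static void report_unlimited(int rmap_idx)
{
	(void)rmap_idx;		/* a real message would print the SPTE */
	log_lines_unlimited++;
}

/* "Once" report: only the first broken rmap in this teardown is logged. */
static void report_once(bool *already_reported, int rmap_idx)
{
	if (*already_reported)
		return;
	*already_reported = true;
	fprintf(stderr, "broken rmap #%d (further reports suppressed)\n", rmap_idx);
	log_lines_once++;
}

/* Simulate destroying a VM in which every rmap is broken. */
static void teardown_broken_vm(int nr_broken_rmaps)
{
	bool reported = false;

	for (int i = 0; i < nr_broken_rmaps; i++) {
		report_unlimited(i);
		report_once(&reported, i);
	}
}
```

With 5000 broken rmaps, `teardown_broken_vm(5000)` produces 5000 unconditional log lines but a single "once" line.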

> -		BUG();
> +		KVM_BUG(true, kvm, "");

If you don't want to provide a message, use KVM_BUG_ON(), not an empty message.
Though my vote would be to fold the existing pr_err() messages into KVM_BUG(),
which would make the WARN much more helpful and would address the pr_err() issue
above.  The __func__ printing can also go away in that case because the stack
trace will provide all the necessary info.  The only reason not to drop the
pr_err() entirely is if a ratelimited message is helpful for debugging failures
that occur in production, which I doubt is true.

And rather than pass "true", wrap the actual check with the KVM_BUG().

>  	} else if (!(rmap_head->val & 1)) {
>  		rmap_printk("%p 1->0\n", spte);
>  		if ((u64 *)rmap_head->val != spte) {
>  			pr_err("%s:  %p 1->BUG\n", __func__, spte);
> -			BUG();
> +			KVM_BUG(true, kvm, "");

KVM needs to return here, otherwise KVM is knowingly writing a garbage pointer,
e.g. will corrupt memory or trigger a fault.
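Combining the two suggestions above (wrap the check itself in KVM_BUG() and return when it fires), a minimal user-space sketch of the single-entry path might look like this. It assumes a simplified rmap head where 0 means empty and a value with bit 0 clear is the lone SPTE pointer; descriptor lists are not modeled, and `struct mock_kvm`, `struct mock_rmap_head`, `mock_kvm_bug()` and `mock_pte_list_remove()` are hypothetical names, not KVM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t u64;

/* Hypothetical stand-in for struct kvm. */
struct mock_kvm {
	bool vm_bugged;
};

/*
 * Simplified rmap head: 0 means empty; a value with bit 0 clear is the
 * address of the single SPTE. Bit 0 set (descriptor list) is not modeled.
 */
struct mock_rmap_head {
	uintptr_t val;
};

/* Hypothetical KVM_BUG()-like helper: bug the VM and hand back @cond. */
static bool mock_kvm_bug(bool cond, struct mock_kvm *kvm, const char *msg)
{
	if (cond && !kvm->vm_bugged) {
		fprintf(stderr, "%s\n", msg);
		kvm->vm_bugged = true;
	}
	return cond;
}

static void mock_pte_list_remove(struct mock_kvm *kvm, u64 *spte,
				 struct mock_rmap_head *rmap_head)
{
	/* The check lives inside the helper, and a hit means "stop here". */
	if (mock_kvm_bug(!rmap_head->val, kvm, "rmap is empty"))
		return;

	if (!(rmap_head->val & 1)) {
		/* Returning leaves the mismatched entry untouched. */
		if (mock_kvm_bug((u64 *)rmap_head->val != spte, kvm,
				 "single rmap doesn't match"))
			return;

		rmap_head->val = 0;
	}

	/* Descriptor lists (bit 0 set) are not modeled in this sketch. */
}
```

Removing the wrong SPTE bugs only that VM and leaves the rmap head as it was; removing the right one clears it; removing it a second time hits the "empty" check.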

>  		}
>  		rmap_head->val = 0;
>  	} else {

Something like this?

---
 arch/x86/kvm/mmu/mmu.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b5a44b8f5f7b..12790ccb8731 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -954,15 +954,16 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte,
 	struct pte_list_desc *prev_desc;
 	int i;
 
-	if (!rmap_head->val) {
-		pr_err("%s: %p 0->BUG\n", __func__, spte);
-		BUG();
-	} else if (!(rmap_head->val & 1)) {
+	if (KVM_BUG(!rmap_head->val, kvm, "rmap for %p is empty", spte))
+		return;
+
+	if (!(rmap_head->val & 1)) {
 		rmap_printk("%p 1->0\n", spte);
-		if ((u64 *)rmap_head->val != spte) {
-			pr_err("%s:  %p 1->BUG\n", __func__, spte);
-			BUG();
-		}
+
+		if (KVM_BUG((u64 *)rmap_head->val != spte, kvm,
+			    "single rmap for %p doesn't match", spte))
+			return;
+
 		rmap_head->val = 0;
 	} else {
 		rmap_printk("%p many->many\n", spte);
@@ -979,8 +980,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte,
 			prev_desc = desc;
 			desc = desc->more;
 		}
-		pr_err("%s: %p many->many\n", __func__, spte);
-		BUG();
+		KVM_BUG(true, kvm, "no rmap for %p (many->many)", spte);
 	}
 }
 

base-commit: d74237e747db7f9f27e821e6683d58185e846378
-- 



* Re: [RFC PATCH v3 2/2] KVM: x86/mmu: replace BUG() with KVM_BUG() in shadow mmu
  2022-11-28 17:46   ` Sean Christopherson
@ 2022-11-28 19:26     ` Mingwei Zhang
  0 siblings, 0 replies; 5+ messages in thread
From: Mingwei Zhang @ 2022-11-28 19:26 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, H. Peter Anvin, kvm, linux-kernel,
	Nagareddy Reddy, Jim Mattson, David Matlack

> Something like this?

Makes sense; will update that in the next version.
