All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Mingwei Zhang <mizhang@google.com>
Cc: David Matlack <dmatlack@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Nagareddy Reddy <nspreddy@google.com>,
	Jim Mattson <jmattson@google.com>
Subject: Re: [RFC PATCH v4 0/2] Deprecate BUG() in pte_list_remove() in shadow mmu
Date: Mon, 12 Dec 2022 16:45:59 +0000	[thread overview]
Message-ID: <Y5dax8XJV0F5adUw@google.com> (raw)
In-Reply-To: <Y5avm5VXpRt263wQ@google.com>

On Mon, Dec 12, 2022, Mingwei Zhang wrote:
> On Fri, Dec 09, 2022, David Matlack wrote:
> > On Tue, Nov 29, 2022 at 07:12:35PM +0000, Mingwei Zhang wrote:
> > > Deprecate BUG() in pte_list_remove() in shadow mmu to avoid crashing a
> > > physical machine. There are several reasons and motivations to do so:
> > > 
> > > MMU bug is difficult to discover due to various racing conditions and
> > > corner cases and thus it extremely hard to debug. The situation gets much
> > > worse when it triggers the shutdown of a host. Host machine crash might
> > > eliminates everything including the potential clues for debugging.
> > > 
> > > From cloud computing service perspective, BUG() or BUG_ON() is probably no
> > > longer appropriate as the host reliability is top priority. Crashing the
> > > physical machine is almost never a good option as it eliminates innocent
> > > VMs and cause service outage in a larger scope. Even worse, if attacker can
> > > reliably triggers this code by diverting the control flow or corrupting the
> > > memory, then this becomes vm-of-death attack. This is a huge attack vector
> > > to cloud providers, as the death of one single host machine is not the end
> > > of the story. Without manual interferences, a failed cloud job may be
> > > dispatched to other hosts and continue host crashes until all of them are
> > > dead.
> > 
> > My only concern with using KVM_BUG() is whether the machine can keep
> > running correctly after this warning has been hit. In other words, are
> > we sure the damage is contained to just this VM?

Hmm, good point.  The counter-argument is that KVM doesn't BUG() in get_mmio_spte()
when a non-MMIO SPTE has reserved bits set, and as we've seen internally in multiple
splats where the reserved bits appear to be set by stack overflow, that has a much,
much higher probability of being a symptom of data corruption.

That said, that's more of a reason to change get_mmio_spte() than it is to ignore
potential data corruption in this case.  KVM arguably should kill the VM if
get_mmio_spte() fails too.

What about explicitly treating both get_mmio_spte() and this as potential data
corruption?  E.g. something like this, and then use the DATA_CORRUPTION variant
in pte_list_remove()?

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2055e04b8f89..1cb69c6d186b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4075,6 +4075,7 @@ static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
                        pr_err("------ spte = 0x%llx level = %d, rsvd bits = 0x%llx",
                               sptes[level], level,
                               get_rsvd_bits(rsvd_check, sptes[level], level));
+               KVM_BUG_ON_DATA_CORRUPTION(reserved, vcpu->kvm);
        }
 
        return reserved;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f16c4689322b..5c4a06f66f46 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -863,6 +863,17 @@ static inline void kvm_vm_bugged(struct kvm *kvm)
        unlikely(__ret);                                        \
 })
 
+#define KVM_BUG_ON_DATA_CORRUPTION(cond, kvm)                  \
+({                                                             \
+       int __ret = (cond);                                     \
+                                                               \
+       if (IS_ENABLED(CONFIG_BUG_ON_DATA_CORRUPTION))          \
+               BUG_ON(__ret);                                  \
+       else if (WARN_ON_ONCE(__ret && !(kvm)->vm_bugged))      \
+               kvm_vm_bugged(kvm);                             \
+       unlikely(__ret);                                        \
+})
+
 static inline void kvm_vcpu_srcu_read_lock(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_PROVE_RCU


> > If, for example, the KVM_BUG() was triggered by a use-after-free, then
> > there might be corrupted memory floating around in the machine.
> > 
> 
> David,
> 
> Your concern is quite reasonable. But given that both rmap and spte are
> pointers/data structures managed by individual VMs, i.e., none of them
> are global pointers, use-after-free is unlikely happening on cross-VM
> cases.

Being per-VM allocations doesn't change the behavior/impact of use-after-free.
E.g. if there is no rmap found for a SPTE then there's a non-zero chance KVM has
previously zapped the SPTE and freed the memory the SPTE pointed at, and thus KVM
might be reading/writing memory that is now owned by something else in the kernel.

> Even if there is, then shuting down those corrupted VMs is feasible
> here, since pte_list_remove() basically does the checking.

But the damage may already be done.  And the KVM_REQ_VM_DEAD request wont't be
recognized until the next vcpu_enter_enter_guest(), e.g. it won't prevent vCPUs
(or even this vCPU) from processing more 

  reply	other threads:[~2022-12-12 16:46 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-29 19:12 [RFC PATCH v4 0/2] Deprecate BUG() in pte_list_remove() in shadow mmu Mingwei Zhang
2022-11-29 19:12 ` [RFC PATCH v4 1/2] KVM: x86/mmu: plumb struct kvm all the way to pte_list_remove() Mingwei Zhang
2022-11-29 19:12 ` [RFC PATCH v4 2/2] KVM: x86/mmu: replace BUG() with KVM_BUG() in shadow mmu Mingwei Zhang
2022-12-06 23:06 ` [RFC PATCH v4 0/2] Deprecate BUG() in pte_list_remove() " Mingwei Zhang
2022-12-09 21:28 ` David Matlack
2022-12-12  4:35   ` Mingwei Zhang
2022-12-12 16:45     ` Sean Christopherson [this message]
2022-12-13  0:09       ` David Matlack
2022-12-13  1:39         ` Mingwei Zhang
2022-12-13  4:12           ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y5dax8XJV0F5adUw@google.com \
    --to=seanjc@google.com \
    --cc=dmatlack@google.com \
    --cc=hpa@zytor.com \
    --cc=jmattson@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mizhang@google.com \
    --cc=nspreddy@google.com \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.