Re: [PATCH] KVM: x86/mmu: Add capability to zap only sptes for the affected memslot

From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Xiong Zhang <xiong.y.zhang@intel.com>,
	Wayne Boyer <wayne.boyer@intel.com>,
	Zhenyu Wang <zhenyuw@linux.intel.com>,
	Jun Nakajima <jun.nakajima@intel.com>
Subject: Re: [PATCH] KVM: x86/mmu: Add capability to zap only sptes for the affected memslot
Date: Thu, 9 Jul 2020 14:12:53 -0700
Message-ID: <20200709211253.GW24919@linux.intel.com>
In-Reply-To: <51637a13-f23b-8b76-c93a-76346b4cc982@redhat.com>

On Wed, Jul 08, 2020 at 06:08:24PM +0200, Paolo Bonzini wrote:
> On 03/07/20 04:50, Sean Christopherson wrote:
> > Introduce a new capability, KVM_CAP_MEMSLOT_ZAP_CONTROL, to allow
> > userspace to control the memslot zapping behavior on a per-VM basis.
> > x86's default behavior is to zap all SPTEs, including the root shadow
> > page, across all memslots.  While effective, the nuke and pave approach
> > isn't exactly performant, especially for large VMs and/or VMs that
> > heavily utilize RO memslots for MMIO devices, e.g. option ROMs.
> > 
> > On a vanilla VM with 6GB of RAM, the targeted zap reduces the number of
> > EPT violations during boot by ~14% with THP enabled in the host, and by
> > ~7% with THP disabled in the host.  On a heavily customized VM with 32GB
> > of RAM and a significant amount of memslot zapping, this can reduce the
> > number of EPT violations by 50% during guest boot, and improve boot time
> > by as much as 25%.
> > 
> > Keep the current x86 memslot zapping behavior as the default, as there's
> > an unresolved bug that pops up when zapping only the affected memslot,
> > and the exact conditions that trigger the bug are not fully known.  See
> > https://patchwork.kernel.org/patch/10798453 for details.
> > 
> > Implement the capability as a set of flags so that other architectures
> > might be able to use the capability without having to conform to x86's
> > semantics.
> 
> It's bad that we have no clue what's causing the bad behavior, but I
> don't think it's wise to have a bug that is known to happen when you
> enable the capability. :/

I don't necessarily disagree, but at the same time it's entirely possible
it's a QEMU bug.  If the bad behavior doesn't occur with other VMMs, those
other VMMs shouldn't be penalized because we can't figure out what QEMU is
getting wrong.

Even if this is a kernel bug, I'm fairly confident at this point that it's
not a KVM bug.  Or rather, if it's a KVM "bug", then there's a fundamental
dependency in memslot management that needs to be rooted out and documented.

And we're kind of in a catch-22: it'll be extremely difficult to narrow down
exactly who is breaking what without being able to easily test the optimized
zapping with other VMMs and/or setups.