Re: A question of TDP unloading.

From: Sean Christopherson <seanjc@google.com>
To: Yu Zhang <yu.c.zhang@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Ben Gardon <bgardon@google.com>,
	kvm@vger.kernel.org
Subject: Re: A question of TDP unloading.
Date: Wed, 28 Jul 2021 18:37:38 +0000	[thread overview]
Message-ID: <YQGj8gj7fpWDdLg5@google.com> (raw)
In-Reply-To: <20210728065605.e4ql2hzrj5fkngux@linux.intel.com>

On Wed, Jul 28, 2021, Yu Zhang wrote:
> Thanks a lot for your reply, Sean.
> 
> On Tue, Jul 27, 2021 at 06:07:35PM +0000, Sean Christopherson wrote:
> > On Wed, Jul 28, 2021, Yu Zhang wrote:
> > > Hi all,
> > > 
> > >   I'd like to ask a question about kvm_reset_context(): is there any
> > >   reason that we must alway unload TDP root in kvm_mmu_reset_context()?
> > 
> > The short answer is that mmu_role is changing, thus a new root shadow page is
> > needed.
> 
> I saw the mmu_role is recalculated, but I have not figured out how this
> change would affect TDP. May I ask a favor to give an example? Thanks!
> 
> I realized that if we only recalculate the mmu role, but do not unload
> the TDP root(e.g., when guest efer.nx flips), base role of the SPs will
> be inconsistent with the mmu context. But I do not understand why this
> shall affect TDP. 

The SPTEs themselves are not affected if the base mmu_role doesn't change; note,
this holds true for shadow paging, too.  What changes is all of the kvm_mmu
knowledge about how to walk the guest PTEs, e.g. if a guest toggles CR4.SMAP,
then KVM needs to recalculate the #PF permissions for guest accesses so that
emulating instructions at CPL=0 does the right thing.

As for EFER.NX and CR0.WP, they are in the base page role because they need to
be there for shadow paging, e.g. if the guest toggles EFER.NX, then the reserved
bit and executable permissions change, and reusing shadow paging for the old
EFER.NX could result in missed reserved #PF and/or incorrect executable #PF
behavior.

For simplicitly, it's far, far eaiser to reuse the same page role struct for
TDP paging (both legacy and TDP MMUs) and shadow paging.

However, I think we can safely ignore NX, WP, SMEP, and SMAP in direct shadow
pages, which would allow reusing a TDP root across changes.  This is only a baby
step (assuming it even works), as further changes to set_cr0/cr4/efer would be
needed to fully realize the optimizations, e.g. to avoid complete teardown if
the root_count hits zero.

I'll put this on my todo list, I've been looking for an excuse to update the
cr0/cr4/efer flows anyways :-).  If it works, the changes should be relatively
minor, if it works...

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a8cdfd8d45c4..700664fe163e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2077,8 +2077,20 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
        role = vcpu->arch.mmu->mmu_role.base;
        role.level = level;
        role.direct = direct;
-       if (role.direct)
+       if (role.direct) {
                role.gpte_is_8_bytes = true;
+
+               /*
+                * Guest PTE permissions do not impact SPTE permissions for
+                * direct MMUs.  Either there are no guest PTEs (CR0.PG=0) or
+                * guest PTE permissions are enforced by the CPU (TDP enabled).
+                */
+               WARN_ON_ONCE(access != ACC_ALL);
+               role.efer_nx = 0;
+               role.cr0_wp = 0;
+               role.smep_andnot_wp = 0;
+               role.smap_andnot_wp = 0;
+       }
        role.access = access;
        if (!direct_mmu && vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL) {
                quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));