Re: [PATCH 2/3] KVM: x86/mmu: Fix pf_fixed count in tdp_mmu_map_handle_target_level()

From: Sean Christopherson <seanjc@google.com>
To: Kai Huang <kai.huang@intel.com>
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, bgardon@google.com,
	vkuznets@redhat.com, wanpengli@tencent.com, jmattson@google.com,
	joro@8bytes.org
Subject: Re: [PATCH 2/3] KVM: x86/mmu: Fix pf_fixed count in tdp_mmu_map_handle_target_level()
Date: Wed, 5 May 2021 17:16:25 +0000
Message-ID: <YJLS6cUghgktsMNJ@google.com>
In-Reply-To: <YJLH4Iyz4imfY0c2@google.com>

On Wed, May 05, 2021, Sean Christopherson wrote:
> On Wed, May 05, 2021, Kai Huang wrote:
> > Currently pf_fixed is increased even when the page fault requires emulation
> > or the fault is spurious.  Fix by only increasing it when the return value is
> > RET_PF_FIXED.
> > 
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 1cad4c9f7c34..debe8c3ec844 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -942,7 +942,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
> >  				       rcu_dereference(iter->sptep));
> >  	}
> >  
> > -	if (!prefault)
> > +	if (!prefault && ret == RET_PF_FIXED)
> >  		vcpu->stat.pf_fixed++;
>
> For RET_PF_EMULATE, I could go either way.  On one hand, KVM is installing a
> translation that accelerates future emulated MMIO, so it's kinda sorta fixing
> the page fault.  On the other hand, future emulated MMIO still page faults, it
> just gets handled without going through the full page fault handler.

Hrm, the other RET_PF_EMULATE case is when KVM creates a _new_ SPTE to handle a
page fault, but installs a read-only SPTE on a write fault because the page is
marked for write access tracking, e.g. for non-leaf guest page tables.  Bumping
pf_fixed is arguably correct in that case since KVM did fault in a page and from
the guest's perspective the page fault was fixed, it's just that "fixing" the
fault involved a bit of instruction emulation.

> If we do decide to omit RET_PF_EMULATE, it should be a separate patch and should
> be done for all flavors of MMU.  For this patch, the correct code is:
> 
> 	if (ret != RET_PF_SPURIOUS)
> 		vcpu->stat.pf_fixed++;
> 
> which works because "ret" cannot be RET_PF_RETRY.
> 
> Looking through the other code, KVM also fails to bump stat.pf_fixed in the fast
> page fault case.  So, if we decide to fix/adjust RET_PF_EMULATE, I think it would
> make sense to handle stat.pf_fixed in a common location.
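
To make the difference between the two accounting policies concrete, here is a
minimal standalone sketch, not kernel code.  The enum mirrors the RET_PF_* names
used in this thread, but its numeric values, struct vcpu_stat_model, and all
three helper functions are assumptions made up for illustration.

```c
#include <stdbool.h>

/* Illustrative stand-ins for KVM's RET_PF_* codes; the values are assumptions. */
enum ret_pf {
	RET_PF_RETRY,
	RET_PF_EMULATE,
	RET_PF_INVALID,
	RET_PF_FIXED,
	RET_PF_SPURIOUS,
};

/* Hypothetical stand-in for the vcpu->stat counters. */
struct vcpu_stat_model {
	unsigned long pf_fixed;
};

/* Policy from the posted patch: count only non-prefault RET_PF_FIXED. */
static bool count_fixed_only(enum ret_pf ret, bool prefault)
{
	return !prefault && ret == RET_PF_FIXED;
}

/* Policy suggested in the review: count everything except spurious faults. */
static bool count_unless_spurious(enum ret_pf ret)
{
	return ret != RET_PF_SPURIOUS;
}

/* Apply the review's policy to the model counter. */
static void account_fault(struct vcpu_stat_model *stat, enum ret_pf ret)
{
	if (count_unless_spurious(ret))
		stat->pf_fixed++;
}
```

The two policies only disagree on RET_PF_EMULATE (and on prefaults): the posted
patch skips the counter there, while the "ret != RET_PF_SPURIOUS" variant still
bumps it.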

Blech.  My original thought was to move the stat.pf_fixed logic all the way out
to kvm_mmu_do_page_fault(), but that would do the wrong thing if pf_fixed is
bumped on RET_PF_EMULATE and page_fault_handle_page_track() returns RET_PF_EMULATE.
That fast path handles the case where the guest gets a !WRITABLE page fault on
a PRESENT SPTE that KVM is write tracking.  *sigh*.

I'm leaning towards making RET_PF_EMULATE a modifier instead of a standalone
type, which would allow more precise pf_fixed adjustments and would also let us
clean up the EMULTYPE_ALLOW_RETRY_PF logic, which has a rather gross check for
detecting MMIO page faults.
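
One possible shape for the "modifier" idea, purely as a sketch: the PF_RES_* and
PF_MOD_EMULATE names, the bit layout, and the helpers below are all hypothetical
and do not exist in KVM.  The base result says what happened to the SPTE, and
the emulate bit is OR'd on top when the instruction still has to be emulated.

```c
#include <stdbool.h>

/*
 * Hypothetical encoding: the low bits hold the base result, one higher bit
 * is a modifier.  None of these names or values come from the kernel.
 */
enum {
	PF_RES_NONE     = 0,	/* no SPTE was installed or changed */
	PF_RES_FIXED    = 1,	/* a SPTE was installed or fixed up */
	PF_RES_SPURIOUS = 2,	/* the SPTE was already correct */
	PF_RES_MASK     = 0x0f,
	PF_MOD_EMULATE  = 0x10,	/* the faulting instruction must be emulated */
};

/* Strip the modifier bits and return the base result. */
static int pf_base(int ret)
{
	return ret & PF_RES_MASK;
}

/* True if the caller still has to emulate the faulting instruction. */
static bool pf_needs_emulation(int ret)
{
	return (ret & PF_MOD_EMULATE) != 0;
}

/* pf_fixed would depend only on the base result, not on the modifier. */
static bool pf_counts_as_fixed(int ret)
{
	return pf_base(ret) == PF_RES_FIXED;
}
```

Under these assumptions, a new read-only SPTE for a write-tracked page would be
PF_RES_FIXED | PF_MOD_EMULATE and bump pf_fixed, while the
page_fault_handle_page_track() fast path would be PF_RES_NONE | PF_MOD_EMULATE
and would not.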

> The legacy MMU also prefetches on RET_PF_EMULATE, which isn't technically wrong,
> but it's pretty much guaranteed to be a waste of time since prefetching only
> installs SPTEs if there is a backing memslot and PFN.
> 
> Kai, if it's ok with you, I'll fold the above "ret != RET_PF_SPURIOUS" change
> into a separate mini-series to address the other issues I pointed out.  I was
> hoping I could paste patches for them inline and let you carry them, but moving
> stat.pf_fixed handling to a common location requires additional code shuffling
> because of async page faults :-/

Cancel that idea.  Given the twisty mess of RET_PF_EMULATE, it's probably best for
you to just send a new version of your patch to make the TDP MMU pf_fixed behavior
match the legacy MMU.  It doesn't make sense to hold up a trivial fix just so I
can scratch a refactoring itch :-)