All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ben Gardon <bgardon@google.com>
To: Kai Huang <kai.huang@intel.com>
Cc: kvm <kvm@vger.kernel.org>, Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>
Subject: Re: [PATCH] KVM: x86/mmu: Fix some return value error in kvm_tdp_mmu_map()
Date: Tue, 4 May 2021 09:45:55 -0700	[thread overview]
Message-ID: <CANgfPd-3a1a4se--+M6fCnfXP0kbbxqpKrv18JVA3UFcxrZ_3g@mail.gmail.com> (raw)
In-Reply-To: <e5814ecab90a3df04ea862dd31927a8f9275af77.camel@intel.com>

On Mon, May 3, 2021 at 4:32 PM Kai Huang <kai.huang@intel.com> wrote:
>
> On Mon, 2021-05-03 at 10:07 -0700, Ben Gardon wrote:
> > On Fri, Apr 30, 2021 at 9:03 AM Kai Huang <kai.huang@intel.com> wrote:
> > >
> > > There are couple of issues in current tdp_mmu_map_handle_target_level()
> > > regarding to return value which reflects page fault handler's behavior
> > > -- whether it truely fixed page fault, or fault was suprious, or fault
> > > requires emulation, etc:
> > >
> > > 1) Currently tdp_mmu_map_handle_target_level() return 0, which is
> > >    RET_PF_RETRY, when page fault is actually fixed.  This makes
> > >    kvm_tdp_mmu_map() also return RET_PF_RETRY in this case, instead of
> > >    RET_PF_FIXED.
> >
> > Ooof that was an oversight. Thank you for catching that.
>
> Thanks for reviewing.
>
> >
> > >
> > > 2) When page fault is spurious, tdp_mmu_map_handle_target_level()
> > >    currently doesn't return immediately.  This is not correct, since it
> > >    may, for instance, lead to double emulation for a single instruction.
> >
> > Could you please add an example of what would be required for this to
> > happen? What effect would it have?
> > I don't doubt you're correct on this point, just having a hard time
> > pinpointing where the issue is.
>
> Hmm.. After reading your reply, I think I wasn't thinking correctly about the emulation
> part :)
>
> I was thinking the case that two threads simultaneously write to video ram (which is write
> protected and requires emulation) which has been swapped out, in which case one thread
> will succeed with setting up the mapping, and the other will get atomic exchange failure.
> Since both threads are trying to setup the same mapping, I thought in this case for the
> second thread (that gets atomic exchange failure) should just give up. But reading code
> again, and with your reply, I think the right behavior is, actually both threads need to
> do the emulation, because this is the correct behavior.'
>
> That being said, I still believe that for spurious fault, we should return immediately
> (otherwise it is not spurious fault). But I now also believe the spurious fault check in
> existing code happens too early -- it has to be after write protection emulation check.
> And I just checked the mmu_set_spte() code, if I read correctly, it exactly puts spurious
> fault check after write protection emulation check.
>
> Does this make sense?

Yeah, that makes sense. Having to move the emulation check after the
cmpxchg always felt a little weird to me Though I still think it makes
sense since the cmpxchg can fail.

>
> If this looks good to you, I guess it would be better to split this patch into smaller
> patches (for instance, one patch to handle case 1), and one to handle spurious fault
> change)?

That sounds good to me. That would definitely make it easier to review.

>
> >
> > >
> > > 3) One case of spurious fault is missing: when iter->old_spte is not
> > >    REMOVED_SPTE, but still tdp_mmu_set_spte_atomic() fails on atomic
> > >    exchange. This case means the page fault has already been handled by
> > >    another thread, and RET_PF_SPURIOUS should be returned. Currently
> > >    this case is not distinguished with iter->old_spte == REMOVED_SPTE
> > >    case, and RET_PF_RETRY is returned.
> >
> > See comment on this point in the code below.
> >
> > >
> > > Fix 1) by initializing ret to RET_PF_FIXED at beginning. Fix 2) & 3) by
> > > explicitly adding is_removed_spte() check at beginning, and return
> > > RET_PF_RETRY when it is true.  For other two cases (old spte equals to
> > > new spte, and tdp_mmu_set_spte_atomic() fails), return RET_PF_SPURIOUS
> > > immediately.
> > >
> > > Fixes: bb18842e2111 ("kvm: x86/mmu: Add TDP MMU PF handler")
> > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > ---
> > >  arch/x86/kvm/mmu/tdp_mmu.c | 12 +++++++-----
> > >  1 file changed, 7 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > > index 84ee1a76a79d..a4dc7c9a4ebb 100644
> > > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > > @@ -905,9 +905,12 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
> > >                                           kvm_pfn_t pfn, bool prefault)
> > >  {
> > >         u64 new_spte;
> > > -       int ret = 0;
> > > +       int ret = RET_PF_FIXED;
> > >         int make_spte_ret = 0;
> > >
> > > +       if (is_removed_spte(iter->old_spte))
> > > +               return RET_PF_RETRY;
> > > +
> > >         if (unlikely(is_noslot_pfn(pfn)))
> > >                 new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
> > >         else
> > > @@ -916,10 +919,9 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
> > >                                          map_writable, !shadow_accessed_mask,
> > >                                          &new_spte);
> > >
> > > -       if (new_spte == iter->old_spte)
> > > -               ret = RET_PF_SPURIOUS;
> > > -       else if (!tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte))
> > > -               return RET_PF_RETRY;
> > > +       if (new_spte == iter->old_spte ||
> > > +                       !tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte))
> > > +               return RET_PF_SPURIOUS;
> >
> >
> > I'm not sure this is quite right. In mmu_set_spte, I see the following comment:
> >
> > /*
> > * The fault is fully spurious if and only if the new SPTE and old SPTE
> > * are identical, and emulation is not required.
> > */
> >
> > Based on that comment, I think the existing code is correct. Further,
> > if the cmpxchg fails, we have no guarantee that the value that was
> > inserted instead resolved the page fault. For example, if two threads
> > try to fault in a large page, but one access is a write and the other
> > an instruction fetch, the thread with the write might lose the race to
> > install its leaf SPTE to the instruction fetch thread installing a
> > non-leaf SPTE for NX hugepages. In that case the fault might not be
> > spurious and a retry could be needed.
>
> Right. Thanks for educating me :)
>
> >
> > We could do what the fast PF handler does and check
> > is_access_allowed(error_code, spte) with whatever value we lost the
> > cmpxchg to, but I don't know if that's worth doing or not. There's not
> > really much control flow difference between the two return values, as
> > far as I can tell.
> >
> > It looks like we might also be incorrectly incrementing pf_fixed on a
> > spurious PF.
>
> Yes I also think for spurious fault we should return immediately.
>
> >
> > It might be more interesting and accurate to introduce a separate
> > return value for cmpxchg failures, just to see how often vCPUs
> > actually collide like that.
>
> It might be too complicated I guess, and probably won't worth doing that (and given my
> double emulation is wrong).

Maybe something as simple as a stat would actually be a better way to
track that. No hurry to add such a stat though.

>
> >
>

  reply	other threads:[~2021-05-04 16:46 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-30 16:01 [PATCH] KVM: x86/mmu: Fix some return value error in kvm_tdp_mmu_map() Kai Huang
2021-05-03 17:07 ` Ben Gardon
2021-05-03 23:32   ` Kai Huang
2021-05-04 16:45     ` Ben Gardon [this message]
2021-05-04 21:54       ` Kai Huang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CANgfPd-3a1a4se--+M6fCnfXP0kbbxqpKrv18JVA3UFcxrZ_3g@mail.gmail.com \
    --to=bgardon@google.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kai.huang@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.