linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Maxim Levitsky <mlevitsk@redhat.com>
To: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org,
	"open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)" 
	<linux-kernel@vger.kernel.org>, Jim Mattson <jmattson@google.com>,
	Joerg Roedel <joro@8bytes.org>, Borislav Petkov <bp@alien8.de>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)"
	<x86@kernel.org>, Ben Gardon <bgardon@google.com>
Subject: Re: [PATCH v2 8/8] KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in use
Date: Tue, 27 Jul 2021 16:05:48 +0300	[thread overview]
Message-ID: <714b56eb83e94aca19e35a8c258e6f28edc0a60d.camel@redhat.com> (raw)
In-Reply-To: <YPnBxHwMJkTSBHfC@google.com>

On Thu, 2021-07-22 at 19:06 +0000, Sean Christopherson wrote:
> +Ben
> 
> On Thu, Jul 22, 2021, Maxim Levitsky wrote:
> > On Mon, 2021-07-19 at 18:49 +0000, Sean Christopherson wrote:
> > > On Sun, Jul 18, 2021, Maxim Levitsky wrote:
> > > > I am more inclined to fix this by just tracking if we hold the srcu
> > > > lock on each VCPU manually, just as we track the srcu index anyway,
> > > > and then kvm_request_apicv_update can use this to drop the srcu
> > > > lock when needed.
> > > 
> > > The entire approach of dynamically adding/removing the memslot seems doomed to
> > > failure, and is likely responsible for the performance issues with AVIC, e.g. a
> > > single vCPU temporarily inhibiting AVIC will zap all SPTEs _twice_; on disable
> > > and again on re-enable.
> > > 
> > > Rather than pile on more gunk, what about special casing the APIC access page
> > > memslot in try_async_pf()?  E.g. zap the GFN in avic_update_access_page() when
> > > disabling (and bounce through kvm_{inc,dec}_notifier_count()), and have the page
> > > fault path skip directly to MMIO emulation without caching the MMIO info.  It'd
> > > also give us a good excuse to rename try_async_pf() :-)
> > > 
> > > If lack of MMIO caching is a performance problem, an alternative solution would
> > > be to allow caching but add a helper to zap the MMIO SPTE and request all vCPUs to
> > > clear their cache.
> > > 
> > > It's all a bit gross, especially hijacking the mmu_notifier path, but IMO it'd be
> > > less awful than the current memslot+SRCU mess.
> > 
> > Hi!
> > 
> > I am testing your approach and it actually works very well! I can't seem to break it.
> > 
> > Could you explain why do I need to do something with kvm_{inc,dec}_notifier_count()) ?
> 
> Glad you asked, there's one more change needed.  kvm_zap_gfn_range() currently
> takes mmu_lock for read, but it needs to take mmu_lock for write for this case
> (more way below).
> 
> The existing users, update_mtrr() and kvm_post_set_cr0(), are a bit sketchy.  The
> whole thing is a grey area because KVM is trying to ensure it honors the guest's
> UC memtype for non-coherent DMA, but the inputs (CR0 and MTRRs) are per-vCPU,
> i.e. for it to work correctly, the guest has to ensure all running vCPUs do the
> same transition.  So in practice there's likely no observable bug, but it also
> means that taking mmu_lock for read is likely pointless, because for things to
> work the guest has to serialize all running vCPUs.
> 
> Ben, any objection to taking mmu_lock for write in kvm_zap_gfn_range()?  It would
> effectively revert commit 6103bc074048 ("KVM: x86/mmu: Allow zap gfn range to
> operate under the mmu read lock"); see attached patch.  And we could even bump
> the notifier count in that helper, e.g. on top of the attached:
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index b607e8763aa2..7174058e982b 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5568,6 +5568,8 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
> 
>         write_lock(&kvm->mmu_lock);
> 
> +       kvm_inc_notifier_count(kvm, gfn_start, gfn_end);
> +
>         if (kvm_memslots_have_rmaps(kvm)) {
>                 for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>                         slots = __kvm_memslots(kvm, i);
> @@ -5598,6 +5600,8 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
>         if (flush)
>                 kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
> 
> +       kvm_dec_notifier_count(kvm, gfn_start, gfn_end);
> +
>         write_unlock(&kvm->mmu_lock);
>  }
> 

I understand what you mean now. I thought that I need to change to code of the
kvm_inc_notifier_count/kvm_dec_notifier_count.




> 
> 
> 
> Back to Maxim's original question...
> 
> Elevating mmu_notifier_count and bumping mmu_notifier_seq will will handle the case
> where APICv is being disabled while a different vCPU is concurrently faulting in a
> new mapping for the APIC page.  E.g. it handles this race:
> 
>  vCPU0                                 vCPU1
>                                        apic_access_memslot_enabled = true;
>  			               #NPF on APIC
> 			               apic_access_memslot_enabled==true, proceed with #NPF
>  apic_access_memslot_enabled = false 
>  kvm_zap_gfn_range(APIC);
>                                        __direct_map(APIC)
> 
>  mov [APIC], 0 <-- succeeds, but KVM wants to intercept to emulate

I understand this now. I guess this can't happen with original memslot disable
which I guess has the needed locking and flushing to avoid this.
(I didnt' study the code in depth thought)

> 
> 
> 
> The elevated mmu_notifier_count and/or changed mmu_notifier_seq will cause vCPU1
> to bail and resume the guest without fixing the #NPF.  After acquiring mmu_lock,
> vCPU1 will see the elevated mmu_notifier_count (if kvm_zap_gfn_range() is about
> to be called, or just finised) and/or a modified mmu_notifier_seq (after the
> count was decremented).
> 
> This is why kvm_zap_gfn_range() needs to take mmu_lock for write.  If it's allowed
> to run in parallel with the page fault handler, there's no guarantee that the
> correct apic_access_memslot_enabled will be observed.

I understand now.

So, Paolo, Ben Gardon, what do you think. Do you think this approach is feasable?
Do you agree to revert the usage of the read lock?

I will post a new series using this approach very soon, since I already have
msot of the code done.

Best regards,
	Maxim Levitsky

> 
> 	if (is_tdp_mmu_fault)
> 		read_lock(&vcpu->kvm->mmu_lock);
> 	else
> 		write_lock(&vcpu->kvm->mmu_lock);
> 
> 	if (!is_noslot_pfn(pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva)) <--- look here!
> 		goto out_unlock;
> 
> 	if (is_tdp_mmu_fault)
> 		r = kvm_tdp_mmu_map(vcpu, gpa, error_code, map_writable, max_level,
> 				    pfn, prefault);
> 	else
> 		r = __direct_map(vcpu, gpa, error_code, map_writable, max_level, pfn,
> 				 prefault, is_tdp);



  reply	other threads:[~2021-07-27 13:06 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-13 14:20 [PATCH v2 0/8] My AVIC patch queue Maxim Levitsky
2021-07-13 14:20 ` [PATCH v2 1/8] KVM: SVM: svm_set_vintr don't warn if AVIC is active but is about to be deactivated Maxim Levitsky
2021-07-13 14:20 ` [PATCH v2 2/8] KVM: SVM: tweak warning about enabled AVIC on nested entry Maxim Levitsky
2021-07-13 14:20 ` [PATCH v2 3/8] KVM: SVM: use vmcb01 in svm_refresh_apicv_exec_ctrl Maxim Levitsky
2021-07-13 14:20 ` [PATCH v2 4/8] KVM: x86: APICv: drop immediate APICv disablement on current vCPU Maxim Levitsky
2021-07-13 14:20 ` [PATCH v2 5/8] KVM: x86: APICv: fix race in kvm_request_apicv_update on SVM Maxim Levitsky
2021-07-26 22:34   ` Paolo Bonzini
2021-07-27 13:22     ` Maxim Levitsky
2021-07-13 14:20 ` [PATCH v2 6/8] KVM: SVM: add warning for mistmatch between AVIC state and AVIC access page state Maxim Levitsky
2021-07-13 14:20 ` [PATCH v2 7/8] KVM: SVM: call avic_vcpu_load/avic_vcpu_put when enabling/disabling AVIC Maxim Levitsky
2021-07-13 14:20 ` [PATCH v2 8/8] KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in use Maxim Levitsky
2021-07-18 12:13   ` Maxim Levitsky
2021-07-19  7:47     ` Vitaly Kuznetsov
2021-07-19  9:00       ` Maxim Levitsky
2021-07-19  9:23         ` Vitaly Kuznetsov
2021-07-19  9:58           ` Maxim Levitsky
2021-07-19 18:49     ` Sean Christopherson
2021-07-20  9:40       ` Maxim Levitsky
2021-07-22  9:12       ` KVM's support for non default APIC base Maxim Levitsky
2021-08-02  9:20         ` Maxim Levitsky
2021-08-06 21:55         ` Sean Christopherson
2021-08-09  9:40           ` Maxim Levitsky
2021-08-09 15:57             ` Sean Christopherson
2021-08-09 16:47             ` Jim Mattson
2021-08-10 20:42               ` Maxim Levitsky
2021-07-22 17:35       ` [PATCH v2 8/8] KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in use Maxim Levitsky
2021-07-22 19:06         ` Sean Christopherson
2021-07-27 13:05           ` Maxim Levitsky [this message]
2021-07-27 17:48             ` Ben Gardon
2021-07-27 18:17               ` Sean Christopherson
2021-07-29 14:10                 ` Maxim Levitsky
2021-07-26 17:24 ` [PATCH v2 0/8] My AVIC patch queue Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=714b56eb83e94aca19e35a8c258e6f28edc0a60d.camel@redhat.com \
    --to=mlevitsk@redhat.com \
    --cc=bgardon@google.com \
    --cc=bp@alien8.de \
    --cc=hpa@zytor.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).