From: Maxim Levitsky <mlevitsk@redhat.com>
To: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org,
	"open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)" 
	<linux-kernel@vger.kernel.org>, Jim Mattson <jmattson@google.com>,
	Joerg Roedel <joro@8bytes.org>, Borislav Petkov <bp@alien8.de>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)"
	<x86@kernel.org>, Ben Gardon <bgardon@google.com>
Subject: Re: [PATCH v2 8/8] KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in use
Date: Tue, 27 Jul 2021 16:05:48 +0300
Message-ID: <714b56eb83e94aca19e35a8c258e6f28edc0a60d.camel@redhat.com>
In-Reply-To: <YPnBxHwMJkTSBHfC@google.com>

On Thu, 2021-07-22 at 19:06 +0000, Sean Christopherson wrote:
> +Ben
> 
> On Thu, Jul 22, 2021, Maxim Levitsky wrote:
> > On Mon, 2021-07-19 at 18:49 +0000, Sean Christopherson wrote:
> > > On Sun, Jul 18, 2021, Maxim Levitsky wrote:
> > > > I am more inclined to fix this by just tracking if we hold the srcu
> > > > lock on each VCPU manually, just as we track the srcu index anyway,
> > > > and then kvm_request_apicv_update can use this to drop the srcu
> > > > lock when needed.
> > > 
> > > The entire approach of dynamically adding/removing the memslot seems doomed to
> > > failure, and is likely responsible for the performance issues with AVIC, e.g. a
> > > single vCPU temporarily inhibiting AVIC will zap all SPTEs _twice_: on disable
> > > and again on re-enable.
> > > 
> > > Rather than pile on more gunk, what about special casing the APIC access page
> > > memslot in try_async_pf()?  E.g. zap the GFN in avic_update_access_page() when
> > > disabling (and bounce through kvm_{inc,dec}_notifier_count()), and have the page
> > > fault path skip directly to MMIO emulation without caching the MMIO info.  It'd
> > > also give us a good excuse to rename try_async_pf() :-)
> > > 
> > > If lack of MMIO caching is a performance problem, an alternative solution would
> > > be to allow caching but add a helper to zap the MMIO SPTE and request all vCPUs to
> > > clear their cache.
> > > 
> > > It's all a bit gross, especially hijacking the mmu_notifier path, but IMO it'd be
> > > less awful than the current memslot+SRCU mess.
> > 
> > Hi!
> > 
> > I am testing your approach and it actually works very well! I can't seem to break it.
> > 
> > Could you explain why I need to do something with kvm_{inc,dec}_notifier_count()?
> 
> Glad you asked; there's one more change needed.  kvm_zap_gfn_range() currently
> takes mmu_lock for read, but it needs to take mmu_lock for write for this case
> (more on that below).
> 
> The existing users, update_mtrr() and kvm_post_set_cr0(), are a bit sketchy.  The
> whole thing is a grey area because KVM is trying to ensure it honors the guest's
> UC memtype for non-coherent DMA, but the inputs (CR0 and MTRRs) are per-vCPU,
> i.e. for it to work correctly, the guest has to ensure all running vCPUs do the
> same transition.  So in practice there's likely no observable bug, but it also
> means that taking mmu_lock for read is likely pointless, because for things to
> work the guest has to serialize all running vCPUs.
> 
> Ben, any objection to taking mmu_lock for write in kvm_zap_gfn_range()?  It would
> effectively revert commit 6103bc074048 ("KVM: x86/mmu: Allow zap gfn range to
> operate under the mmu read lock"); see attached patch.  And we could even bump
> the notifier count in that helper, e.g. on top of the attached:
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index b607e8763aa2..7174058e982b 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5568,6 +5568,8 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
> 
>         write_lock(&kvm->mmu_lock);
> 
> +       kvm_inc_notifier_count(kvm, gfn_start, gfn_end);
> +
>         if (kvm_memslots_have_rmaps(kvm)) {
>                 for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>                         slots = __kvm_memslots(kvm, i);
> @@ -5598,6 +5600,8 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
>         if (flush)
>                 kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
> 
> +       kvm_dec_notifier_count(kvm, gfn_start, gfn_end);
> +
>         write_unlock(&kvm->mmu_lock);
>  }
> 

I understand what you mean now. I thought I needed to change the code of
kvm_inc_notifier_count()/kvm_dec_notifier_count() themselves.
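
If I read virt/kvm/kvm_main.c correctly, the generic helpers already do
all the bookkeeping, roughly along these lines (paraphrased from memory,
so the exact code may differ):

void kvm_inc_notifier_count(struct kvm *kvm, unsigned long start,
			    unsigned long end)
{
	/*
	 * Flag an in-progress invalidation so that concurrent page
	 * faults touching [start, end) bail instead of installing a
	 * possibly stale SPTE.
	 */
	kvm->mmu_notifier_count++;
	if (likely(kvm->mmu_notifier_count == 1)) {
		kvm->mmu_notifier_range_start = start;
		kvm->mmu_notifier_range_end = end;
	} else {
		/* Overlapping invalidations: track the union. */
		kvm->mmu_notifier_range_start =
			min(kvm->mmu_notifier_range_start, start);
		kvm->mmu_notifier_range_end =
			max(kvm->mmu_notifier_range_end, end);
	}
}

void kvm_dec_notifier_count(struct kvm *kvm, unsigned long start,
			    unsigned long end)
{
	/*
	 * Bump the sequence before dropping the count, so that a fault
	 * which sampled mmu_notifier_seq before the zap is guaranteed
	 * to retry.
	 */
	kvm->mmu_notifier_seq++;
	smp_wmb();
	kvm->mmu_notifier_count--;
}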

> 
> Back to Maxim's original question...
> 
> Elevating mmu_notifier_count and bumping mmu_notifier_seq will handle the case
> where APICv is being disabled while a different vCPU is concurrently faulting in a
> new mapping for the APIC page.  E.g. it handles this race:
> 
>  vCPU0                                 vCPU1
>                                        apic_access_memslot_enabled = true;
>  			               #NPF on APIC
> 			               apic_access_memslot_enabled==true, proceed with #NPF
>  apic_access_memslot_enabled = false 
>  kvm_zap_gfn_range(APIC);
>                                        __direct_map(APIC)
> 
>  mov [APIC], 0 <-- succeeds, but KVM wants to intercept to emulate

I understand this now. I guess this race can't happen with the original
memslot-disable approach, which I assume has the locking and flushing
needed to avoid it (I didn't study that code in depth, though).
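
Just to check my understanding, the special case in the page fault path
would look roughly like this (only a sketch: the helper name and exact
hook point are made up, the real code would live in or near
try_async_pf()):

/*
 * Hypothetical helper for the #NPF path: when APICv is inhibited,
 * skip directly to MMIO emulation for the APIC access page and
 * don't cache the MMIO info in an SPTE.
 */
static bool handle_inhibited_apic_access(struct kvm_vcpu *vcpu,
					 gfn_t gfn, int *r)
{
	if (gfn != gpa_to_gfn(APIC_DEFAULT_PHYS_BASE))
		return false;

	if (vcpu->kvm->arch.apic_access_memslot_enabled)
		return false;

	/* Emulate the access instead of installing an MMIO SPTE. */
	*r = RET_PF_EMULATE;
	return true;
}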

> 
> The elevated mmu_notifier_count and/or changed mmu_notifier_seq will cause vCPU1
> to bail and resume the guest without fixing the #NPF.  After acquiring mmu_lock,
> vCPU1 will see the elevated mmu_notifier_count (if kvm_zap_gfn_range() is about
> to be called, or just finished) and/or a modified mmu_notifier_seq (after the
> count was decremented).
> 
> This is why kvm_zap_gfn_range() needs to take mmu_lock for write.  If it's allowed
> to run in parallel with the page fault handler, there's no guarantee that the
> correct apic_access_memslot_enabled will be observed.

I understand now.
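
(For reference, the check in question is mmu_notifier_retry_hva(), which
as far as I can tell boils down to roughly this, paraphrased from
include/linux/kvm_host.h:)

static inline int mmu_notifier_retry_hva(struct kvm *kvm,
					 unsigned long mmu_seq,
					 unsigned long hva)
{
	lockdep_assert_held(&kvm->mmu_lock);

	/* Retry if an invalidation covering this hva is in flight... */
	if (unlikely(kvm->mmu_notifier_count) &&
	    hva >= kvm->mmu_notifier_range_start &&
	    hva < kvm->mmu_notifier_range_end)
		return 1;

	/* ...or if one completed after mmu_notifier_seq was sampled. */
	if (kvm->mmu_notifier_seq != mmu_seq)
		return 1;

	return 0;
}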

So, Paolo, Ben Gardon, what do you think? Is this approach feasible?
Do you agree with reverting the use of the read lock in kvm_zap_gfn_range()?

I will post a new series using this approach very soon, since I already
have most of the code done.

Best regards,
	Maxim Levitsky

> 
> 	if (is_tdp_mmu_fault)
> 		read_lock(&vcpu->kvm->mmu_lock);
> 	else
> 		write_lock(&vcpu->kvm->mmu_lock);
> 
> 	if (!is_noslot_pfn(pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva)) <--- look here!
> 		goto out_unlock;
> 
> 	if (is_tdp_mmu_fault)
> 		r = kvm_tdp_mmu_map(vcpu, gpa, error_code, map_writable, max_level,
> 				    pfn, prefault);
> 	else
> 		r = __direct_map(vcpu, gpa, error_code, map_writable, max_level, pfn,
> 				 prefault, is_tdp);


