linux-kernel.vger.kernel.org archive mirror
From: Sean Christopherson <seanjc@google.com>
To: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/2] KVM: x86: Fix and cleanup for recent AVIC changes
Date: Fri, 15 Oct 2021 16:15:37 +0000
Message-ID: <YWmpKTk/7MOCzm15@google.com>
In-Reply-To: <ebf038b7b242dd19aba1e4adb6f4ef2701c53748.camel@redhat.com>

On Tue, Oct 12, 2021, Maxim Levitsky wrote:
> On Mon, 2021-10-11 at 16:58 +0000, Sean Christopherson wrote:
> > Argh, I forgot the memslot is still there, so the access won't be treated as MMIO
> > and thus won't end up in the MMIO cache.
> > 
> > So I agree that the code is functionally ok, but I'd still prefer to switch to
> > kvm_vcpu_apicv_active() so that this code is coherent with respect to the APICv
> > status at the time the fault occurred.
> > 
> > My objection to using kvm_apicv_activated() is that the result is completely
> > non-deterministic with respect to the vCPU's APICv status at the time of the
> > fault.  It works because of all the other mechanisms that are in place, e.g.
> > elevating the MMU notifier count, but the fact that the result is non-deterministic
> > means that using the per-vCPU status is also functionally ok.
> 
> The problem is that it is just not correct to use local AVIC enable state
> to determine if we want to populate the SPTE or just jump to the emulation.
> 
> 
> For example, assuming that the AVIC is now enabled on all vCPUs,
> we can have this scenario:
> 
>     vCPU0                                   vCPU1
>     =====                                   =====
> 
> - disable AVIC
> - VMRUN
>                                         - #NPT on AVIC MMIO access
>                                         - *stuck on something prior to the page fault code*
> - enable AVIC
> - VMRUN
>                                         - *still stuck on something prior to the page fault code*
> 
> - disable AVIC:
> 
>   - raise KVM_REQ_APICV_UPDATE request
> 					
>   - set global avic state to disable
> 
>   - zap the SPTE (does nothing, doesn't race
> 	with anything either)
> 
>   - handle KVM_REQ_APICV_UPDATE -
>     - disable vCPU0 AVIC
> 
> - VMRUN
>                                         - *still stuck on something prior to the page fault code*
> 
>                                                             ...
>                                                             ...
>                                                             ...
> 
>                                         - now vCPU1 finally starts running the page fault code.
> 
>                                         - vCPU1 AVIC is still enabled 
>                                           (because vCPU1 never handled KVM_REQ_APICV_UPDATE),
>                                           so the page fault code will populate the SPTE.

But vCPU1 won't install the SPTE if it loses the race to acquire mmu_lock, because
kvm_zap_gfn_range() bumps the notifier sequence and so vCPU1 will retry the fault.
If vCPU1 wins the race, i.e. sees the same sequence number, then the zap is
guaranteed to find the newly-installed SPTE.
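
For illustration only, here is a single-threaded userspace model of that
sequence check.  Everything prefixed toy_ is made up for this sketch and is
not KVM code.  It assumes the zap and the install each run atomically under
"mmu_lock", ignores the in-progress notifier count, and orders the steps by
hand instead of using real threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration, not KVM's real types. */
struct toy_mmu {
	unsigned long notifier_seq;	/* bumped by every zap */
	bool spte_present;		/* is the APIC access page mapped? */
};

enum toy_ret { TOY_PF_RETRY, TOY_PF_FIXED };

/* Fault path, step 1: snapshot the sequence before taking "mmu_lock". */
static unsigned long toy_fault_begin(const struct toy_mmu *mmu)
{
	return mmu->notifier_seq;
}

/* Zap path, modeled as running entirely under "mmu_lock". */
static void toy_zap(struct toy_mmu *mmu)
{
	mmu->notifier_seq++;
	mmu->spte_present = false;
}

/* Fault path, step 2: under "mmu_lock", install only if no zap intervened. */
static enum toy_ret toy_fault_finish(struct toy_mmu *mmu, unsigned long seq)
{
	/* The snapshot can never be ahead of the counter (wraparound ignored). */
	assert(seq <= mmu->notifier_seq);

	if (mmu->notifier_seq != seq)
		return TOY_PF_RETRY;

	mmu->spte_present = true;
	return TOY_PF_FIXED;
}

/* vCPU1 loses the race: the zap runs between the snapshot and the install. */
static bool toy_fault_loses_race(void)
{
	struct toy_mmu mmu = { 0, false };
	unsigned long seq = toy_fault_begin(&mmu);

	toy_zap(&mmu);
	return toy_fault_finish(&mmu, seq) == TOY_PF_RETRY && !mmu.spte_present;
}

/* vCPU1 wins the race: it installs first, then the zap finds the SPTE. */
static bool toy_fault_wins_race(void)
{
	struct toy_mmu mmu = { 0, false };
	unsigned long seq = toy_fault_begin(&mmu);
	enum toy_ret r = toy_fault_finish(&mmu, seq);

	toy_zap(&mmu);
	return r == TOY_PF_FIXED && !mmu.spte_present;
}
```

Both toy_fault_loses_race() and toy_fault_wins_race() end with no SPTE
present, which is the point: in this model a stale SPTE cannot survive the
zap in either ordering.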

And IMO, retrying is the desired behavior.  Installing a SPTE based on the global
state works, but it's all kinds of weird to knowingly take an action that directly
contradicts the current vCPU state.

FWIW, I had gone so far as to type this up to handle the situation you described
before remembering the sequence interaction.

		/*
		 * If the APIC access page exists but is disabled, go directly
		 * to emulation without caching the MMIO access or creating an
		 * MMIO SPTE.  That way the cache doesn't need to be purged
		 * when the AVIC is re-enabled.
		 */
		if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT) {
			if (!kvm_vcpu_apicv_active(vcpu)) {
				*r = RET_PF_EMULATE;
				return true;
			}

			/*
			 * Retry the fault if an APICv update is pending, as
			 * the kvm_zap_gfn_range() when APICv becomes inhibited
			 * may have already occurred, in which case installing
			 * a SPTE would be incorrect.
			 */
			if (kvm_test_request(KVM_REQ_APICV_UPDATE, vcpu)) {
				*r = RET_PF_RETRY;
				return true;
			}
		}

>                                         - handle KVM_REQ_APICV_UPDATE
>                                            - finally disable vCPU1 AVIC
> 
>                                         - VMRUN (vCPU1 AVIC disabled, SPTE populated)
> 
> 					                 ***boom***
