linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/4] KVM: x86: APICv cleanups
@ 2021-10-22  0:49 Sean Christopherson
  2021-10-22  0:49 ` [PATCH v2 1/4] KVM: x86/mmu: Use vCPU's APICv status when handling APIC_ACCESS memslot Sean Christopherson
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Sean Christopherson @ 2021-10-22  0:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Maxim Levitsky

APICv cleanups and a dissertation on handling concurrent APIC access page
faults and APICv inhibit updates.

I've tested this but haven't hammered the AVIC stuff, I'd appreciate it if
someone with the Hyper-V setup can beat on the AVIC toggling.

Sean Christopherson (4):
  KVM: x86/mmu: Use vCPU's APICv status when handling APIC_ACCESS
    memslot
  KVM: x86: Move SVM's APICv sanity check to common x86
  KVM: x86: Move apicv_active flag from vCPU to in-kernel local APIC
  KVM: x86: Use rw_semaphore for APICv lock to allow vCPU parallelism

 arch/x86/include/asm/kvm_host.h |  3 +-
 arch/x86/kvm/hyperv.c           |  4 +--
 arch/x86/kvm/lapic.c            | 46 ++++++++++---------------
 arch/x86/kvm/lapic.h            |  5 +--
 arch/x86/kvm/mmu/mmu.c          | 29 ++++++++++++++--
 arch/x86/kvm/svm/avic.c         |  2 +-
 arch/x86/kvm/svm/svm.c          |  2 --
 arch/x86/kvm/vmx/vmx.c          |  4 +--
 arch/x86/kvm/x86.c              | 59 ++++++++++++++++++++++-----------
 9 files changed, 93 insertions(+), 61 deletions(-)

-- 
2.33.0.1079.g6e70778dc9-goog


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/4] KVM: x86/mmu: Use vCPU's APICv status when handling APIC_ACCESS memslot
  2021-10-22  0:49 [PATCH v2 0/4] KVM: x86: APICv cleanups Sean Christopherson
@ 2021-10-22  0:49 ` Sean Christopherson
  2021-10-22  0:49 ` [PATCH v2 2/4] KVM: x86: Move SVM's APICv sanity check to common x86 Sean Christopherson
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2021-10-22  0:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Maxim Levitsky

Query the vCPU's APICv state, not the overall VM's state, when handling
a page fault that hit the APIC Access Page memslot.  While the vCPU
state can be stale with respect to the overall VM APICv state, it's at
least coherent with respect to the page fault being handled, i.e. it
reflects the vCPU's APICv enablement at the time of the fault.

Using the VM's state is not functionally broken, but neither does it
provide any meaningful advantages, and arguably retrying the faulting
instruction, as will happen if the vCPU state is stale (see below), is
semantically the desired behavior as it aligns with existing KVM MMU
behavior where a change in VM state invalidates the context for handling
a page fault.

The vCPU state can be stale if a different vCPU (or other actor) toggles
APICv state after the page fault exit from the guest, in which case KVM
will attempt to handle the page fault prior to servicing the pending
KVM_REQ_APICV_UPDATE.  However, by design such a race is benign in all
scenarios.

If APICv is globally enabled but locally disabled, the page fault handler
will go straight to emulation (emulation of the local APIC is ok even if
APIC virtualization is enabled while the guest is running).

If APICv is globally disabled but locally enabled, there are two possible
scenarios, and in each scenario the final outcome is correct: KVM blocks
all vCPUs until SPTEs for the APIC Access Page have been zapped.

  (1) the page fault acquires mmu_lock before the APICv update calls
      kvm_zap_gfn_range().  The page fault will install a "bad" SPTE, but
      that SPTE will never be consumed as (a) no vCPUs can be running in
      the guest due to the kick from KVM_REQ_APICV_UPDATE, and (b) the
      "bad" SPTE is guaranteed to be zapped by kvm_zap_gfn_range() before
      any vCPU re-enters the guest.  Because KVM_REQ_APICV_UPDATE is
      raised before the VM state is updated, all vCPUs will block on
      apicv_update_lock when attempting to service the request until the
      lock is dropped by the update flow, _after_ the SPTE is zapped.

  (2) the APICv update kvm_zap_gfn_range() acquires mmu_lock before the
      page fault.  The page fault handler will bail due to the change in
      mmu_notifier_seq made by kvm_zap_gfn_range().

Arguably, KVM should not (attempt to) install a SPTE based on state that
is knowingly outdated, but using the per-VM state suffers a similar flaw:
without checking for a pending KVM_REQ_APICV_UPDATE, KVM will either
knowingly install a SPTE that will get zapped (page fault wins the race)
or see the installation rejected (kvm_zap_gfn_range() wins the race).  However,
adding a check for KVM_REQ_APICV_UPDATE is extremely undesirable as it
implies the check is meaningful/sufficient, which is not the case since
the page fault handler doesn't hold mmu_lock when it checks APICv status,
i.e. the mmu_notifier_seq mechanism is required for correctness.
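
For reference, a condensed sketch of the update-side ordering that the
above scenarios rely on (this is not the literal code: locking and the
inhibit bookkeeping are elided, and "new" and "gfn" are placeholders for
the new inhibit mask and the APIC access page's gfn):

  /* Called from the update flow with apicv_update_lock held. */
  /* Kick and wait for all vCPUs _before_ touching the per-VM state. */
  kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
  kvm->arch.apicv_inhibit_reasons = new;
  if (new) {
  	/*
  	 * Zap SPTEs for the APIC access page.  vCPUs servicing the
  	 * request are blocked on apicv_update_lock, so none can
  	 * re-enter the guest until after the zap completes.
  	 */
  	kvm_zap_gfn_range(kvm, gfn, gfn + 1);
  }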

Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f9f228963088..6530d874f5b7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3911,10 +3911,35 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		 * If the APIC access page exists but is disabled, go directly
 		 * to emulation without caching the MMIO access or creating a
 		 * MMIO SPTE.  That way the cache doesn't need to be purged
-		 * when the AVIC is re-enabled.
+		 * when the AVIC is re-enabled.  Note, the vCPU's APICv state
+		 * may be stale if an APICv update is in-progress, but KVM is
+		 * guaranteed to operate correctly in all scenarios:
+		 *
+		 * 1.  APICv is globally enabled but locally disabled.  This
+		 *     vCPU will go straight to emulation without touching the
+		 *     MMIO cache or installing a SPTE.
+		 *
+		 * 2a. APICv is globally disabled but locally enabled, and this
+		 *     vCPU acquires mmu_lock before the APICv update calls
+		 *     kvm_zap_gfn_range().  This vCPU will install a bad SPTE,
+		 *     but the SPTE will never be consumed as (a) no vCPUs can
+		 *     be running due to the kick from KVM_REQ_APICV_UPDATE,
+		 *     and (b) the "bad" SPTE is guaranteed to be zapped by
+		 *     kvm_zap_gfn_range() before any vCPU re-enters the guest.
+		 *     Because KVM_REQ_APICV_UPDATE is raised before the VM
+		 *     state is updated, vCPUs will block on apicv_update_lock
+		 *     when attempting to service the request until the lock is
+		 *     dropped by the update flow, _after_ the SPTE is zapped.
+		 *
+		 * 2b. APICv is globally disabled but locally enabled, and the
+		 *     APICv update kvm_zap_gfn_range() acquires mmu_lock before
+		 *     this vCPU.  This vCPU will bail from the page fault
+		 *     handler due to kvm_zap_gfn_range() bumping mmu_notifier_seq,
+		 *     and will retry the faulting instruction after servicing
+		 *     the KVM_REQ_APICV_UPDATE request.
 		 */
 		if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT &&
-		    !kvm_apicv_activated(vcpu->kvm)) {
+		    !kvm_vcpu_apicv_active(vcpu)) {
 			*r = RET_PF_EMULATE;
 			return true;
 		}
-- 
2.33.0.1079.g6e70778dc9-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/4] KVM: x86: Move SVM's APICv sanity check to common x86
  2021-10-22  0:49 [PATCH v2 0/4] KVM: x86: APICv cleanups Sean Christopherson
  2021-10-22  0:49 ` [PATCH v2 1/4] KVM: x86/mmu: Use vCPU's APICv status when handling APIC_ACCESS memslot Sean Christopherson
@ 2021-10-22  0:49 ` Sean Christopherson
  2021-10-22  0:49 ` [PATCH v2 3/4] KVM: x86: Move apicv_active flag from vCPU to in-kernel local APIC Sean Christopherson
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2021-10-22  0:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Maxim Levitsky

Move SVM's assertion that the vCPU's APICv state is consistent with its VM's
state out of svm_vcpu_run() and into x86's common inner run loop.  The
assertion and underlying logic are not unique to SVM; it's just that SVM
has more inhibiting conditions and thus is more likely to run headfirst
into any KVM bugs.

Add relevant comments to document exactly why the update path has unusual
ordering between the update and the kick, why said ordering is safe, and
the basic rules behind the assertion in the run loop.

Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c |  2 --
 arch/x86/kvm/x86.c     | 20 ++++++++++++++++++++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cee4915d2ce3..a2a4e9b42750 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3864,8 +3864,6 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	pre_svm_run(vcpu);
 
-	WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
-
 	sync_lapic_to_cr8(vcpu);
 
 	if (unlikely(svm->asid != svm->vmcb->control.asid)) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f55654158836..8b9c06a6b2a3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9445,6 +9445,18 @@ void __kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
 
 	if (!!old != !!new) {
 		trace_kvm_apicv_update_request(activate, bit);
+		/*
+		 * Kick all vCPUs before setting apicv_inhibit_reasons to avoid
+		 * false positives in the sanity check WARN in svm_vcpu_run().
+		 * This task will wait for all vCPUs to ack the kick IRQ before
+		 * updating apicv_inhibit_reasons, and all other vCPUs will
+		 * block on acquiring apicv_update_lock so that vCPUs can't
+		 * redo svm_vcpu_run() without seeing the new inhibit state.
+		 *
+		 * Note, holding apicv_update_lock and taking it in the read
+		 * side (handling the request) also prevents other vCPUs from
+		 * servicing the request with a stale apicv_inhibit_reasons.
+		 */
 		kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
 		kvm->arch.apicv_inhibit_reasons = new;
 		if (new) {
@@ -9779,6 +9791,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	}
 
 	for (;;) {
+		/*
+		 * Assert that vCPU vs. VM APICv state is consistent.  An APICv
+		 * update must kick and wait for all vCPUs before toggling the
+		 * per-VM state, and responding vCPUs must wait for the update
+		 * to complete before servicing KVM_REQ_APICV_UPDATE.
+		 */
+		WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
+
 		exit_fastpath = static_call(kvm_x86_run)(vcpu);
 		if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
 			break;
-- 
2.33.0.1079.g6e70778dc9-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 3/4] KVM: x86: Move apicv_active flag from vCPU to in-kernel local APIC
  2021-10-22  0:49 [PATCH v2 0/4] KVM: x86: APICv cleanups Sean Christopherson
  2021-10-22  0:49 ` [PATCH v2 1/4] KVM: x86/mmu: Use vCPU's APICv status when handling APIC_ACCESS memslot Sean Christopherson
  2021-10-22  0:49 ` [PATCH v2 2/4] KVM: x86: Move SVM's APICv sanity check to common x86 Sean Christopherson
@ 2021-10-22  0:49 ` Sean Christopherson
  2021-10-22  0:49 ` [PATCH v2 4/4] KVM: x86: Use rw_semaphore for APICv lock to allow vCPU parallelism Sean Christopherson
  2021-10-22 10:12 ` [PATCH v2 0/4] KVM: x86: APICv cleanups Paolo Bonzini
  4 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2021-10-22  0:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Maxim Levitsky

Move apicv_active from kvm_vcpu_arch to kvm_lapic as it's valid if and only
if the local APIC is emulated/virtualized by KVM.  This cleans up a handful
of ugly flows where KVM "blindly" dereferences vcpu->apic after checking
apicv_active.  This also resolves the discrepancy where existence of the
local APIC (in KVM) is checked by kvm_vcpu_apicv_active(), but rarely in
flows that open code access apicv_active.

Opportunistically convert kvm_vcpu_apicv_active() to use lapic_in_kernel()
to elide the conditional branch when all vCPUs have an in-kernel APIC.
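
As a rough sketch of why that elides the branch (this approximates how
lapic_in_kernel() and the kvm_has_noapic_vcpu static key interact; it is
not a verbatim copy of the helper):

  static inline bool lapic_in_kernel(struct kvm_vcpu *vcpu)
  {
  	/*
  	 * The static key is enabled only if at least one vCPU was
  	 * created without an in-kernel APIC, so the common case
  	 * compiles down to an unconditional "return true".
  	 */
  	if (static_branch_unlikely(&kvm_has_noapic_vcpu))
  		return vcpu->arch.apic;
  	return true;
  }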

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/lapic.c            | 46 +++++++++++++--------------------
 arch/x86/kvm/lapic.h            |  5 ++--
 arch/x86/kvm/svm/avic.c         |  2 +-
 arch/x86/kvm/vmx/vmx.c          |  4 +--
 arch/x86/kvm/x86.c              | 27 +++++++++----------
 6 files changed, 36 insertions(+), 49 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d41699e89d1f..f64eef86391d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -629,7 +629,6 @@ struct kvm_vcpu_arch {
 	u64 efer;
 	u64 apic_base;
 	struct kvm_lapic *apic;    /* kernel irqchip context */
-	bool apicv_active;
 	bool load_eoi_exitmap_pending;
 	DECLARE_BITMAP(ioapic_handled_vectors, 256);
 	unsigned long apic_attention;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 76fb00921203..1a2cf9c27d52 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -484,14 +484,10 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic)
 
 static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
 {
-	struct kvm_vcpu *vcpu;
-
-	vcpu = apic->vcpu;
-
-	if (unlikely(vcpu->arch.apicv_active)) {
+	if (unlikely(apic->apicv_active)) {
 		/* need to update RVI */
 		kvm_lapic_clear_vector(vec, apic->regs + APIC_IRR);
-		static_call(kvm_x86_hwapic_irr_update)(vcpu,
+		static_call(kvm_x86_hwapic_irr_update)(apic->vcpu,
 				apic_find_highest_irr(apic));
 	} else {
 		apic->irr_pending = false;
@@ -509,20 +505,16 @@ EXPORT_SYMBOL_GPL(kvm_apic_clear_irr);
 
 static inline void apic_set_isr(int vec, struct kvm_lapic *apic)
 {
-	struct kvm_vcpu *vcpu;
-
 	if (__apic_test_and_set_vector(vec, apic->regs + APIC_ISR))
 		return;
 
-	vcpu = apic->vcpu;
-
 	/*
 	 * With APIC virtualization enabled, all caching is disabled
 	 * because the processor can modify ISR under the hood.  Instead
 	 * just set SVI.
 	 */
-	if (unlikely(vcpu->arch.apicv_active))
-		static_call(kvm_x86_hwapic_isr_update)(vcpu, vec);
+	if (unlikely(apic->apicv_active))
+		static_call(kvm_x86_hwapic_isr_update)(apic->vcpu, vec);
 	else {
 		++apic->isr_count;
 		BUG_ON(apic->isr_count > MAX_APIC_VECTOR);
@@ -556,12 +548,9 @@ static inline int apic_find_highest_isr(struct kvm_lapic *apic)
 
 static inline void apic_clear_isr(int vec, struct kvm_lapic *apic)
 {
-	struct kvm_vcpu *vcpu;
 	if (!__apic_test_and_clear_vector(vec, apic->regs + APIC_ISR))
 		return;
 
-	vcpu = apic->vcpu;
-
 	/*
 	 * We do get here for APIC virtualization enabled if the guest
 	 * uses the Hyper-V APIC enlightenment.  In this case we may need
@@ -569,8 +558,8 @@ static inline void apic_clear_isr(int vec, struct kvm_lapic *apic)
 	 * on the other hand isr_count and highest_isr_cache are unused
 	 * and must be left alone.
 	 */
-	if (unlikely(vcpu->arch.apicv_active))
-		static_call(kvm_x86_hwapic_isr_update)(vcpu,
+	if (unlikely(apic->apicv_active))
+		static_call(kvm_x86_hwapic_isr_update)(apic->vcpu,
 						apic_find_highest_isr(apic));
 	else {
 		--apic->isr_count;
@@ -707,7 +696,7 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
 static int apic_has_interrupt_for_ppr(struct kvm_lapic *apic, u32 ppr)
 {
 	int highest_irr;
-	if (apic->vcpu->arch.apicv_active)
+	if (apic->apicv_active)
 		highest_irr = static_call(kvm_x86_sync_pir_to_irr)(apic->vcpu);
 	else
 		highest_irr = apic_find_highest_irr(apic);
@@ -1541,7 +1530,7 @@ static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
 		int vec = reg & APIC_VECTOR_MASK;
 		void *bitmap = apic->regs + APIC_ISR;
 
-		if (vcpu->arch.apicv_active)
+		if (apic->apicv_active)
 			bitmap = apic->regs + APIC_IRR;
 
 		if (apic_test_vector(vec, bitmap))
@@ -1658,7 +1647,7 @@ static void apic_timer_expired(struct kvm_lapic *apic, bool from_timer_fn)
 	if (apic_lvtt_tscdeadline(apic) || ktimer->hv_timer_in_use)
 		ktimer->expired_tscdeadline = ktimer->tscdeadline;
 
-	if (!from_timer_fn && vcpu->arch.apicv_active) {
+	if (!from_timer_fn && apic->apicv_active) {
 		WARN_ON(kvm_get_running_vcpu() != vcpu);
 		kvm_apic_inject_pending_timer_irqs(apic);
 		return;
@@ -2303,11 +2292,9 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
 		pr_warn_once("APIC base relocation is unsupported by KVM");
 }
 
-void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
+void kvm_apic_update_apicv(struct kvm_lapic *apic)
 {
-	struct kvm_lapic *apic = vcpu->arch.apic;
-
-	if (vcpu->arch.apicv_active) {
+	if (apic->apicv_active) {
 		/* irr_pending is always true when apicv is activated. */
 		apic->irr_pending = true;
 		apic->isr_count = 1;
@@ -2367,14 +2354,14 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 		kvm_lapic_set_reg(apic, APIC_ISR + 0x10 * i, 0);
 		kvm_lapic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
 	}
-	kvm_apic_update_apicv(vcpu);
+	kvm_apic_update_apicv(apic);
 	apic->highest_isr_cache = -1;
 	update_divide_count(apic);
 	atomic_set(&apic->lapic_timer.pending, 0);
 
 	vcpu->arch.pv_eoi.msr_val = 0;
 	apic_update_ppr(apic);
-	if (vcpu->arch.apicv_active) {
+	if (apic->apicv_active) {
 		static_call(kvm_x86_apicv_post_state_restore)(vcpu);
 		static_call(kvm_x86_hwapic_irr_update)(vcpu, -1);
 		static_call(kvm_x86_hwapic_isr_update)(vcpu, -1);
@@ -2484,6 +2471,9 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
 	static_branch_inc(&apic_sw_disabled.key); /* sw disabled at reset */
 	kvm_iodevice_init(&apic->dev, &apic_mmio_ops);
 
+	if (kvm_apicv_activated(vcpu->kvm))
+		apic->apicv_active = true;
+
 	return 0;
 nomem_free_apic:
 	kfree(apic);
@@ -2632,9 +2622,9 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
 	update_divide_count(apic);
 	__start_apic_timer(apic, APIC_TMCCT);
 	kvm_lapic_set_reg(apic, APIC_TMCCT, 0);
-	kvm_apic_update_apicv(vcpu);
+	kvm_apic_update_apicv(apic);
 	apic->highest_isr_cache = -1;
-	if (vcpu->arch.apicv_active) {
+	if (apic->apicv_active) {
 		static_call(kvm_x86_apicv_post_state_restore)(vcpu);
 		static_call(kvm_x86_hwapic_irr_update)(vcpu,
 				apic_find_highest_irr(apic));
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index d7c25d0c1354..053815507921 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -49,6 +49,7 @@ struct kvm_lapic {
 	struct kvm_timer lapic_timer;
 	u32 divide_count;
 	struct kvm_vcpu *vcpu;
+	bool apicv_active;
 	bool sw_enabled;
 	bool irr_pending;
 	bool lvt0_in_nmi_mode;
@@ -98,7 +99,7 @@ void kvm_apic_update_ppr(struct kvm_vcpu *vcpu);
 int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
 		     struct dest_map *dest_map);
 int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
-void kvm_apic_update_apicv(struct kvm_vcpu *vcpu);
+void kvm_apic_update_apicv(struct kvm_lapic *apic);
 
 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map);
@@ -212,7 +213,7 @@ static inline int apic_x2apic_mode(struct kvm_lapic *apic)
 
 static inline bool kvm_vcpu_apicv_active(struct kvm_vcpu *vcpu)
 {
-	return vcpu->arch.apic && vcpu->arch.apicv_active;
+	return lapic_in_kernel(vcpu) && vcpu->arch.apic->apicv_active;
 }
 
 static inline bool kvm_apic_has_events(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 8052d92069e0..e2da28fc5072 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -668,7 +668,7 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 
 int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 {
-	if (!vcpu->arch.apicv_active)
+	if (!kvm_vcpu_apicv_active(vcpu))
 		return -1;
 
 	kvm_lapic_set_irr(vec, vcpu->arch.apic);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2f677e72d864..014ce6ad8c9c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4004,7 +4004,7 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
 	if (!r)
 		return 0;
 
-	if (!vcpu->arch.apicv_active)
+	if (!kvm_vcpu_apicv_active(vcpu))
 		return -1;
 
 	if (pi_test_and_set_pir(vector, &vmx->pi_desc))
@@ -6316,7 +6316,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 	int max_irr;
 	bool max_irr_updated;
 
-	if (KVM_BUG_ON(!vcpu->arch.apicv_active, vcpu->kvm))
+	if (KVM_BUG_ON(!kvm_vcpu_apicv_active(vcpu), vcpu->kvm))
 		return -EIO;
 
 	if (pi_test_on(&vmx->pi_desc)) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8b9c06a6b2a3..03103b69e8a6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4361,7 +4361,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
 				    struct kvm_lapic_state *s)
 {
-	if (vcpu->arch.apicv_active)
+	if (kvm_vcpu_apicv_active(vcpu))
 		static_call(kvm_x86_sync_pir_to_irr)(vcpu);
 
 	return kvm_apic_get_state(vcpu, s);
@@ -8945,10 +8945,7 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu)
 	if (!kvm_x86_ops.update_cr8_intercept)
 		return;
 
-	if (!lapic_in_kernel(vcpu))
-		return;
-
-	if (vcpu->arch.apicv_active)
+	if (!kvm_vcpu_apicv_active(vcpu))
 		return;
 
 	if (!vcpu->arch.apic->vapic_addr)
@@ -9399,6 +9396,7 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
 
 void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 {
+	struct kvm_lapic *apic = vcpu->arch.apic;
 	bool activate;
 
 	if (!lapic_in_kernel(vcpu))
@@ -9407,11 +9405,11 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 	mutex_lock(&vcpu->kvm->arch.apicv_update_lock);
 
 	activate = kvm_apicv_activated(vcpu->kvm);
-	if (vcpu->arch.apicv_active == activate)
+	if (apic->apicv_active == activate)
 		goto out;
 
-	vcpu->arch.apicv_active = activate;
-	kvm_apic_update_apicv(vcpu);
+	apic->apicv_active = activate;
+	kvm_apic_update_apicv(apic);
 	static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);
 
 	/*
@@ -9420,7 +9418,7 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 	 * still active when the interrupt got accepted. Make sure
 	 * inject_pending_event() is called to check for that.
 	 */
-	if (!vcpu->arch.apicv_active)
+	if (!activate)
 		kvm_make_request(KVM_REQ_EVENT, vcpu);
 
 out:
@@ -9486,7 +9484,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 	if (irqchip_split(vcpu->kvm))
 		kvm_scan_ioapic_routes(vcpu, vcpu->arch.ioapic_handled_vectors);
 	else {
-		if (vcpu->arch.apicv_active)
+		if (kvm_vcpu_apicv_active(vcpu))
 			static_call(kvm_x86_sync_pir_to_irr)(vcpu);
 		if (ioapic_in_kernel(vcpu->kvm))
 			kvm_ioapic_scan_entry(vcpu, vcpu->arch.ioapic_handled_vectors);
@@ -9758,7 +9756,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 * This handles the case where a posted interrupt was
 	 * notified with kvm_vcpu_kick.
 	 */
-	if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active)
+	if (kvm_lapic_enabled(vcpu) && kvm_vcpu_apicv_active(vcpu))
 		static_call(kvm_x86_sync_pir_to_irr)(vcpu);
 
 	if (kvm_vcpu_exit_request(vcpu)) {
@@ -9808,7 +9806,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			break;
 		}
 
-		if (vcpu->arch.apicv_active)
+		if (kvm_vcpu_apicv_active(vcpu))
 			static_call(kvm_x86_sync_pir_to_irr)(vcpu);
         }
 
@@ -10824,8 +10822,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 		r = kvm_create_lapic(vcpu, lapic_timer_advance_ns);
 		if (r < 0)
 			goto fail_mmu_destroy;
-		if (kvm_apicv_activated(vcpu->kvm))
-			vcpu->arch.apicv_active = true;
 	} else
 		static_branch_inc(&kvm_has_noapic_vcpu);
 
@@ -11821,7 +11817,8 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 
 bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu)
 {
-	if (vcpu->arch.apicv_active && static_call(kvm_x86_dy_apicv_has_pending_interrupt)(vcpu))
+	if (kvm_vcpu_apicv_active(vcpu) &&
+	    static_call(kvm_x86_dy_apicv_has_pending_interrupt)(vcpu))
 		return true;
 
 	return false;
-- 
2.33.0.1079.g6e70778dc9-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 4/4] KVM: x86: Use rw_semaphore for APICv lock to allow vCPU parallelism
  2021-10-22  0:49 [PATCH v2 0/4] KVM: x86: APICv cleanups Sean Christopherson
                   ` (2 preceding siblings ...)
  2021-10-22  0:49 ` [PATCH v2 3/4] KVM: x86: Move apicv_active flag from vCPU to in-kernel local APIC Sean Christopherson
@ 2021-10-22  0:49 ` Sean Christopherson
  2021-10-22 10:12 ` [PATCH v2 0/4] KVM: x86: APICv cleanups Paolo Bonzini
  4 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2021-10-22  0:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Maxim Levitsky

Use a rw_semaphore instead of a mutex to coordinate APICv updates so that
vCPUs responding to requests can take the lock for read and run in
parallel.  Using a mutex forces serialization of vCPUs even though
kvm_vcpu_update_apicv() only touches data local to that vCPU or is
protected by a different lock, e.g. SVM's ir_list_lock.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/hyperv.c           |  4 ++--
 arch/x86/kvm/x86.c              | 12 +++++++-----
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f64eef86391d..b61c03cda2b4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1070,7 +1070,7 @@ struct kvm_arch {
 	atomic_t apic_map_dirty;
 
 	/* Protects apic_access_memslot_enabled and apicv_inhibit_reasons */
-	struct mutex apicv_update_lock;
+	struct rw_semaphore apicv_update_lock;
 
 	bool apic_access_memslot_enabled;
 	unsigned long apicv_inhibit_reasons;
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 6f11cda2bfa4..4f15c0165c05 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -112,7 +112,7 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
 	if (!!auto_eoi_old == !!auto_eoi_new)
 		return;
 
-	mutex_lock(&vcpu->kvm->arch.apicv_update_lock);
+	down_write(&vcpu->kvm->arch.apicv_update_lock);
 
 	if (auto_eoi_new)
 		hv->synic_auto_eoi_used++;
@@ -123,7 +123,7 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
 				   !hv->synic_auto_eoi_used,
 				   APICV_INHIBIT_REASON_HYPERV);
 
-	mutex_unlock(&vcpu->kvm->arch.apicv_update_lock);
+	up_write(&vcpu->kvm->arch.apicv_update_lock);
 }
 
 static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, int sint,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 03103b69e8a6..77f6dc3aa4ee 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8742,7 +8742,7 @@ EXPORT_SYMBOL_GPL(kvm_apicv_activated);
 
 static void kvm_apicv_init(struct kvm *kvm)
 {
-	mutex_init(&kvm->arch.apicv_update_lock);
+	init_rwsem(&kvm->arch.apicv_update_lock);
 
 	if (enable_apicv)
 		clear_bit(APICV_INHIBIT_REASON_DISABLE,
@@ -9402,7 +9402,7 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 	if (!lapic_in_kernel(vcpu))
 		return;
 
-	mutex_lock(&vcpu->kvm->arch.apicv_update_lock);
+	down_read(&vcpu->kvm->arch.apicv_update_lock);
 
 	activate = kvm_apicv_activated(vcpu->kvm);
 	if (apic->apicv_active == activate)
@@ -9422,7 +9422,7 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 		kvm_make_request(KVM_REQ_EVENT, vcpu);
 
 out:
-	mutex_unlock(&vcpu->kvm->arch.apicv_update_lock);
+	up_read(&vcpu->kvm->arch.apicv_update_lock);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_update_apicv);
 
@@ -9430,6 +9430,8 @@ void __kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
 {
 	unsigned long old, new;
 
+	lockdep_assert_held_write(&kvm->arch.apicv_update_lock);
+
 	if (!kvm_x86_ops.check_apicv_inhibit_reasons ||
 	    !static_call(kvm_x86_check_apicv_inhibit_reasons)(bit))
 		return;
@@ -9468,9 +9470,9 @@ EXPORT_SYMBOL_GPL(__kvm_request_apicv_update);
 
 void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
 {
-	mutex_lock(&kvm->arch.apicv_update_lock);
+	down_write(&kvm->arch.apicv_update_lock);
 	__kvm_request_apicv_update(kvm, activate, bit);
-	mutex_unlock(&kvm->arch.apicv_update_lock);
+	up_write(&kvm->arch.apicv_update_lock);
 }
 EXPORT_SYMBOL_GPL(kvm_request_apicv_update);
 
-- 
2.33.0.1079.g6e70778dc9-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/4] KVM: x86: APICv cleanups
  2021-10-22  0:49 [PATCH v2 0/4] KVM: x86: APICv cleanups Sean Christopherson
                   ` (3 preceding siblings ...)
  2021-10-22  0:49 ` [PATCH v2 4/4] KVM: x86: Use rw_semaphore for APICv lock to allow vCPU parallelism Sean Christopherson
@ 2021-10-22 10:12 ` Paolo Bonzini
  2021-10-22 14:56   ` Maxim Levitsky
  4 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2021-10-22 10:12 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Maxim Levitsky

On 22/10/21 02:49, Sean Christopherson wrote:
> APICv cleanups and a dissertation on handling concurrent APIC access page
> faults and APICv inhibit updates.
> 
> I've tested this but haven't hammered the AVIC stuff, I'd appreciate it if
> someone with the Hyper-V setup can beat on the AVIC toggling.
> 
> Sean Christopherson (4):
>    KVM: x86/mmu: Use vCPU's APICv status when handling APIC_ACCESS
>      memslot
>    KVM: x86: Move SVM's APICv sanity check to common x86
>    KVM: x86: Move apicv_active flag from vCPU to in-kernel local APIC
>    KVM: x86: Use rw_semaphore for APICv lock to allow vCPU parallelism
> 
>   arch/x86/include/asm/kvm_host.h |  3 +-
>   arch/x86/kvm/hyperv.c           |  4 +--
>   arch/x86/kvm/lapic.c            | 46 ++++++++++---------------
>   arch/x86/kvm/lapic.h            |  5 +--
>   arch/x86/kvm/mmu/mmu.c          | 29 ++++++++++++++--
>   arch/x86/kvm/svm/avic.c         |  2 +-
>   arch/x86/kvm/svm/svm.c          |  2 --
>   arch/x86/kvm/vmx/vmx.c          |  4 +--
>   arch/x86/kvm/x86.c              | 59 ++++++++++++++++++++++-----------
>   9 files changed, 93 insertions(+), 61 deletions(-)
> 

Queued, thanks.  I only made small edits to the comment in patch
1, to make it very slightly shorter.

	 * 2a. APICv is globally disabled but locally enabled, and this
	 *     vCPU acquires mmu_lock before __kvm_request_apicv_update
	 *     calls kvm_zap_gfn_range().  This vCPU will install a stale
	 *     SPTE, but no one will consume it as (a) no vCPUs can be
	 *     running due to the kick from KVM_REQ_APICV_UPDATE, and
	 *     (b) because KVM_REQ_APICV_UPDATE is raised before the VM
	 *     state is updated, vCPUs attempting to service the request
	 *     will block on apicv_update_lock.  The update flow will
	 *     then zap the SPTE and release the lock.

Paolo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/4] KVM: x86: APICv cleanups
  2021-10-22 10:12 ` [PATCH v2 0/4] KVM: x86: APICv cleanups Paolo Bonzini
@ 2021-10-22 14:56   ` Maxim Levitsky
  2021-10-22 16:43     ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Maxim Levitsky @ 2021-10-22 14:56 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel

On Fri, 2021-10-22 at 12:12 +0200, Paolo Bonzini wrote:
> On 22/10/21 02:49, Sean Christopherson wrote:
> > APICv cleanups and a dissertation on handling concurrent APIC access page
> > faults and APICv inhibit updates.
> > 
> > I've tested this but haven't hammered the AVIC stuff, I'd appreciate it if
> > someone with the Hyper-V setup can beat on the AVIC toggling.
> > 
> > Sean Christopherson (4):
> >    KVM: x86/mmu: Use vCPU's APICv status when handling APIC_ACCESS
> >      memslot
> >    KVM: x86: Move SVM's APICv sanity check to common x86
> >    KVM: x86: Move apicv_active flag from vCPU to in-kernel local APIC
> >    KVM: x86: Use rw_semaphore for APICv lock to allow vCPU parallelism
> > 
> >   arch/x86/include/asm/kvm_host.h |  3 +-
> >   arch/x86/kvm/hyperv.c           |  4 +--
> >   arch/x86/kvm/lapic.c            | 46 ++++++++++---------------
> >   arch/x86/kvm/lapic.h            |  5 +--
> >   arch/x86/kvm/mmu/mmu.c          | 29 ++++++++++++++--
> >   arch/x86/kvm/svm/avic.c         |  2 +-
> >   arch/x86/kvm/svm/svm.c          |  2 --
> >   arch/x86/kvm/vmx/vmx.c          |  4 +--
> >   arch/x86/kvm/x86.c              | 59 ++++++++++++++++++++++-----------
> >   9 files changed, 93 insertions(+), 61 deletions(-)
> > 
> 
> Queued, thanks.  I only made small edits to the comment in patch
> 1, to make it very slightly shorter.
> 
> 	 * 2a. APICv is globally disabled but locally enabled, and this
> 	 *     vCPU acquires mmu_lock before __kvm_request_apicv_update
> 	 *     calls kvm_zap_gfn_range().  This vCPU will install a stale
> 	 *     SPTE, but no one will consume it as (a) no vCPUs can be
> 	 *     running due to the kick from KVM_REQ_APICV_UPDATE, and
> 	 *     (b) because KVM_REQ_APICV_UPDATE is raised before the VM
> 	 *     state is updated, vCPUs attempting to service the request
> 	 *     will block on apicv_update_lock.  The update flow will
> 	 *     then zap the SPTE and release the lock.
> 
> Paolo
> 

Hi Paolo and Sean!

Could you explain to me why the scenario I described in my reply to the
previous version of patch 1 is not correct?

This is the scenario I was worried about:



    vCPU0                                   vCPU1
    =====                                   =====

- disable AVIC
- VMRUN
                                        - #NPT on AVIC MMIO access
                                        - *stuck on something prior to the page fault code*
- enable AVIC
- VMRUN
                                        - *still stuck on something prior to the page fault code*

- disable AVIC:

  - raise KVM_REQ_APICV_UPDATE request
                                        
  - set global avic state to disable

  - zap the SPTE (does nothing, doesn't race
        with anything either)

  - handle KVM_REQ_APICV_UPDATE -
    - disable vCPU0 AVIC

- VMRUN
                                        - *still stuck on something prior to the page fault code*

                                                            ...
                                                            ...
                                                            ...

                                        - now vCPU1 finally starts running the page fault code.

                                        - vCPU1 AVIC is still enabled 
                                          (because vCPU1 never handled KVM_REQ_APICV_UPDATE),
                                          so the page fault code will populate the SPTE.
                                          

                                        - handle KVM_REQ_APICV_UPDATE
                                           - finally disable vCPU1 AVIC

                                        - VMRUN (vCPU1 AVIC disabled, SPTE populated)

                                                         ***boom***



Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/4] KVM: x86: APICv cleanups
  2021-10-22 14:56   ` Maxim Levitsky
@ 2021-10-22 16:43     ` Paolo Bonzini
  2021-10-25 14:35       ` Sean Christopherson
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2021-10-22 16:43 UTC (permalink / raw)
  To: Maxim Levitsky, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel

On 22/10/21 16:56, Maxim Levitsky wrote:
>      vCPU0                                   vCPU1
>      =====                                   =====
> 
> - disable AVIC
>                                          - #NPT on AVIC MMIO access

vCPU1 is now OUTSIDE_GUEST_CODE

>                                          -*stuck on something prior to the page fault code*
> - enable AVIC
>                                          -*still stuck on something prior to the page fault code*
> - disable AVIC:
>    - raise KVM_REQ_APICV_UPDATE request

kvm_make_all_cpus_request does not wait for vCPU1

>    - zap the SPTE (does nothing, doesn't race
>         with anything either)

vCPU0 writes mmu_notifier_seq here

>                                          - now vCPU1 finally starts running the page fault code.

vCPU1 reads mmu_notifier_seq here

So yeah, I think you're right.

The VM value doesn't have this problem, because it is always stored 
before mmu_notifier_seq is incremented (on the write side) and loaded 
after the mmu_notifier_seq read (on the page fault side).  Therefore, if 
vCPU1 sees a stale per-VM apicv_active flag, it is always going to see a 
stale mmu_notifier_seq.

With the per-vCPU flag, instead, the flag is written by 
kvm_vcpu_update_apicv, and that can be long after mmu_notifier_seq is 
incremented.
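
For reference, a condensed sketch of the fault-side pattern being relied
on (helper names are approximate and details are elided; this is not the
literal mmu.c code):

	mmu_seq = vcpu->kvm->mmu_notifier_seq;	/* snapshot the seq... */
	smp_rmb();			/* ...before reading APICv/memslot state */
	...
	write_lock(&vcpu->kvm->mmu_lock);
	if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
		goto out_unlock;	/* a concurrent zap bumped the seq: retry */
	/* otherwise install the SPTE */

The APICv/memslot state is read between the snapshot and the retry
check, which is what the ordering argument above depends on.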

Paolo

>                                          - vCPU1 AVIC is still enabled
>                                            (because vCPU1 never handled KVM_REQ_APICV_UPDATE),
>                                            so the page fault code will populate the SPTE.
> 
>                                          - handle KVM_REQ_APICV_UPDATE
>                                             - finally disable vCPU1 AVIC
> 
>                                          - VMRUN (vCPU1 AVIC disabled, SPTE populated)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/4] KVM: x86: APICv cleanups
  2021-10-22 16:43     ` Paolo Bonzini
@ 2021-10-25 14:35       ` Sean Christopherson
  2021-10-25 15:21         ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2021-10-25 14:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Maxim Levitsky, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

On Fri, Oct 22, 2021, Paolo Bonzini wrote:
> So yeah, I think you're right.

Yep.  The alternative would be to explicitly check for a pending APICv update.
I don't have a strong opinion, I dislike both options equally :-)

Want me to type up a v3 comment?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/4] KVM: x86: APICv cleanups
  2021-10-25 14:35       ` Sean Christopherson
@ 2021-10-25 15:21         ` Paolo Bonzini
  2021-10-25 15:59           ` Sean Christopherson
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2021-10-25 15:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Maxim Levitsky, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

On 25/10/21 16:35, Sean Christopherson wrote:
>> So yeah, I think you're right.
> Yep.  The alternative would be to explicitly check for a pending APICv update.
> I don't have a strong opinion, I dislike both options equally:-)

No, checking for the update is worse and with this example, I can now 
point my finger on why I preferred the VM check even before: because 
even though the page fault path runs in vCPU context and uses a 
vCPU-specific role, overall the page tables are still per-VM.

Therefore it makes sense for the page fault path to synchronize with 
whoever updates the flag and zaps the page, and not with the KVM_REQ_* 
handler of the same vCPU.

(Here goes the usual shameless plug of my lockless programming articles 
on LWN---I think you're old enough to vaguely remember Jerry 
Pournelle---and in particular the first one at 
https://lwn.net/Articles/844224/).

> Want me to type up a v3 comment?

Yes, please do.

Paolo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/4] KVM: x86: APICv cleanups
  2021-10-25 15:21         ` Paolo Bonzini
@ 2021-10-25 15:59           ` Sean Christopherson
  2021-10-25 16:05             ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2021-10-25 15:59 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Maxim Levitsky, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

On Mon, Oct 25, 2021, Paolo Bonzini wrote:
> On 25/10/21 16:35, Sean Christopherson wrote:
> > > So yeah, I think you're right.
> > Yep.  The alternative would be to explicitly check for a pending APICv update.
> > I don't have a strong opinion, I dislike both options equally:-)
> 
> No, checking for the update is worse and with this example, I can now point
> my finger on why I preferred the VM check even before: because even though
> the page fault path runs in vCPU context and uses a vCPU-specific role,
> overall the page tables are still per-VM.

Arguably the lack of incorporation into the page role is the underlying bug, and
all the shenanigans with synchronizing updates are just workarounds for that bug.
I.e. page tables are never strictly per-VM, they're per-role, but we fudge it in
this case because we don't want to take on the overhead of maintaining two sets
of page tables to handle APICv.
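
Purely as a hypothetical illustration of what "incorporation into the
page role" would mean (not something being proposed here): an APICv bit
in the MMU page role, so that APICv-on and APICv-off page tables would
be tracked as separate roots, e.g.

	union kvm_mmu_page_role {
		u32 word;
		struct {
			unsigned level:4;
			unsigned direct:1;
			/* ... existing role bits ... */
			unsigned apicv_active:1;	/* hypothetical new bit */
		};
	};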

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/4] KVM: x86: APICv cleanups
  2021-10-25 15:59           ` Sean Christopherson
@ 2021-10-25 16:05             ` Paolo Bonzini
  0 siblings, 0 replies; 12+ messages in thread
From: Paolo Bonzini @ 2021-10-25 16:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Maxim Levitsky, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

On 25/10/21 17:59, Sean Christopherson wrote:
>> No, checking for the update is worse and with this example, I can now point
>> my finger on why I preferred the VM check even before: because even though
>> the page fault path runs in vCPU context and uses a vCPU-specific role,
>> overall the page tables are still per-VM.
> Arguably the lack of incorporation into the page role is the underlying bug, and
> all the shenanigans with synchronizing updates are just workarounds for that bug.
> I.e. page tables are never strictly per-VM, they're per-role, but we fudge it in
> this case because we don't want to take on the overhead of maintaining two sets
> of page tables to handle APICv.

Yes, that makes sense as well:

- you can have simpler code by using the vCPU state, but then 
correctness requires that the APICv state be part of the vCPU-specific 
MMU state.  That is, of the role.

- if you don't want to do that, because you want to maintain one set of 
page tables only, the price to pay is the synchronization shenanigans, 
both those involving apicv_update mutex^Wrwsem (which ensure no one uses 
the old state) and those involving kvm_faultin_pfn/kvm_zap_gfn_range (to 
ensure the one state used by the MMU is the correct one).

So it's a pick your poison situation.

Paolo


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread

Thread overview: 12+ messages
2021-10-22  0:49 [PATCH v2 0/4] KVM: x86: APICv cleanups Sean Christopherson
2021-10-22  0:49 ` [PATCH v2 1/4] KVM: x86/mmu: Use vCPU's APICv status when handling APIC_ACCESS memslot Sean Christopherson
2021-10-22  0:49 ` [PATCH v2 2/4] KVM: x86: Move SVM's APICv sanity check to common x86 Sean Christopherson
2021-10-22  0:49 ` [PATCH v2 3/4] KVM: x86: Move apicv_active flag from vCPU to in-kernel local APIC Sean Christopherson
2021-10-22  0:49 ` [PATCH v2 4/4] KVM: x86: Use rw_semaphore for APICv lock to allow vCPU parallelism Sean Christopherson
2021-10-22 10:12 ` [PATCH v2 0/4] KVM: x86: APICv cleanups Paolo Bonzini
2021-10-22 14:56   ` Maxim Levitsky
2021-10-22 16:43     ` Paolo Bonzini
2021-10-25 14:35       ` Sean Christopherson
2021-10-25 15:21         ` Paolo Bonzini
2021-10-25 15:59           ` Sean Christopherson
2021-10-25 16:05             ` Paolo Bonzini
