linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups
@ 2022-10-01  0:58 Sean Christopherson
  2022-10-01  0:58 ` [PATCH v4 01/32] KVM: x86: Blindly get current x2APIC reg value on "nodecode write" traps Sean Christopherson
                   ` (32 more replies)
  0 siblings, 33 replies; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

The first half or so patches fix semi-urgent, real-world relevant APICv
and AVIC bugs.

The second half fixes a variety of AVIC and optimized APIC map bugs
where KVM doesn't play nice with various edge cases that are
architecturally legal(ish), but are unlikely to occur in most real world
scenarios

I have tested this heavily with KUT, but I haven't booted Windows and
don't have access to x2AVIC, so as usual, additional testing would be
much appreciated.

v4:
  - Fix more bugs! [Alejandro]
  - Delete APIC memslot to inhibit xAVIC acceleration when x2APIC is
    enabled on AMD/SVM instead of using a "partial" inihbit. [Maxim]

v3:
  - https://lore.kernel.org/all/20220920233134.940511-1-seanjc@google.com
  - Collect reviews. [Paolo]
  - Drop "partial" x2APIC inhibit and instead delete the memslot.
    [Maxim, Suravee]
  - Skip logical mode updates for x2APIC, which just reuses the
    phys_map with some clever logic. [Suravee]
  - Add a fix for "nodecode write" traps. [Alejandro]

v2:
  - https://lore.kernel.org/all/20220903002254.2411750-1-seanjc@google.com
  - Collect reviews. [Li, Maxim]
  - Disable only MMIO access when x2APIC is enabled (instead of disabling
    all of AVIC). [Maxim]
  - Inhibit AVIC when logical IDs are aliased. [Maxim]
  - Tweak name of set_virtual_apic_mode() hook. [Maxim]
  - Straight up revert logical ID fastpath mess. [Maxim]
  - Reword changelog about skipping vCPU during logical setup. [Maxim]
  - Fix LDR updates on AVIC. [Maxim?]
  - Fix a nasty ISR caching bug.
  - Flush TLB when activating AVIC.

v1: https://lore.kernel.org/all/20220831003506.4117148-1-seanjc@google.com

Sean Christopherson (31):
  KVM: x86: Blindly get current x2APIC reg value on "nodecode write"
    traps
  KVM: x86: Purge "highest ISR" cache when updating APICv state
  KVM: SVM: Flush the "current" TLB when activating AVIC
  KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid
    target
  KVM: x86: Don't inhibit APICv/AVIC on xAPIC ID "change" if APIC is
    disabled
  KVM: x86: Track xAPIC ID only on userspace SET, _after_ vAPIC is
    updated
  KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to
    32-bit ID
  KVM: SVM: Don't put/load AVIC when setting virtual APIC mode
  KVM: x86: Handle APICv updates for APIC "mode" changes via request
  KVM: x86: Move APIC access page helper to common x86 code
  KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled
  KVM: SVM: Replace "avic_mode" enum with "x2avic_enabled" boolean
  KVM: SVM: Compute dest based on sender's x2APIC status for AVIC kick
  Revert "KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when
    possible"
  KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpatch
  KVM: SVM: Add helper to perform final AVIC "kick" of single vCPU
  KVM: x86: Explicitly skip optimized logical map setup if vCPU's LDR==0
  KVM: x86: Explicitly track all possibilities for APIC map's logical
    modes
  KVM: x86: Skip redundant x2APIC logical mode optimized cluster setup
  KVM: x86: Disable APIC logical map if logical ID covers multiple MDAs
  KVM: x86: Disable APIC logical map if vCPUs are aliased in logical
    mode
  KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs
  KVM: x86: Inhibit APICv/AVIC if the optimized physical map is disabled
  KVM: SVM: Inhibit AVIC if vCPUs are aliased in logical mode
  KVM: SVM: Always update local APIC on writes to logical dest register
  KVM: SVM: Update svm->ldr_reg cache even if LDR is "bad"
  KVM: SVM: Require logical ID to be power-of-2 for AVIC entry
  KVM: SVM: Handle multiple logical targets in AVIC kick fastpath
  KVM: SVM: Ignore writes to Remote Read Data on AVIC write traps
  Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on
    a running vcpu"
  KVM: x86: Track required APICv inhibits with variable, not callback

Suravee Suthikulpanit (1):
  KVM: SVM: Fix x2APIC Logical ID calculation for
    avic_kick_target_vcpus_fast

 Documentation/virt/kvm/x86/errata.rst |  11 +
 arch/x86/include/asm/kvm-x86-ops.h    |   1 -
 arch/x86/include/asm/kvm_host.h       |  55 +++-
 arch/x86/kvm/lapic.c                  | 238 +++++++++++++---
 arch/x86/kvm/lapic.h                  |   2 +
 arch/x86/kvm/svm/avic.c               | 375 ++++++++++++--------------
 arch/x86/kvm/svm/nested.c             |   2 +-
 arch/x86/kvm/svm/svm.c                |   6 +-
 arch/x86/kvm/svm/svm.h                |  28 +-
 arch/x86/kvm/vmx/vmx.c                |  58 +---
 arch/x86/kvm/x86.c                    |  23 +-
 11 files changed, 477 insertions(+), 322 deletions(-)


base-commit: c59fb127583869350256656b7ed848c398bef879
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v4 01/32] KVM: x86: Blindly get current x2APIC reg value on "nodecode write" traps
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
@ 2022-10-01  0:58 ` Sean Christopherson
  2022-12-08 21:47   ` Maxim Levitsky
  2022-10-01  0:58 ` [PATCH v4 02/32] KVM: x86: Purge "highest ISR" cache when updating APICv state Sean Christopherson
                   ` (31 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

When emulating a x2APIC write in response to an APICv/AVIC trap, get the
the written value from the vAPIC page without checking that reads are
allowed for the target register.  AVIC can generate trap-like VM-Exits on
writes to EOI, and so KVM needs to get the written value from the backing
page without running afoul of EOI's write-only behavior.

Alternatively, EOI could be special cased to always write '0', e.g. so
that the sanity check could be preserved, but x2APIC on AMD is actually
supposed to disallow non-zero writes (not emulated by KVM), and the
sanity check was a byproduct of how the KVM code was written, i.e. wasn't
added to guard against anything in particular.

Fixes: 70c8327c11c6 ("KVM: x86: Bug the VM if an accelerated x2APIC trap occurs on a "bad" reg")
Fixes: 1bd9dfec9fd4 ("KVM: x86: Do not block APIC write for non ICR registers")
Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index d7639d126e6c..05d079fc2c66 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2284,23 +2284,18 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	u64 val;
 
-	if (apic_x2apic_mode(apic)) {
-		if (KVM_BUG_ON(kvm_lapic_msr_read(apic, offset, &val), vcpu->kvm))
-			return;
-	} else {
-		val = kvm_lapic_get_reg(apic, offset);
-	}
-
 	/*
 	 * ICR is a single 64-bit register when x2APIC is enabled.  For legacy
 	 * xAPIC, ICR writes need to go down the common (slightly slower) path
 	 * to get the upper half from ICR2.
 	 */
 	if (apic_x2apic_mode(apic) && offset == APIC_ICR) {
+		val = kvm_lapic_get_reg64(apic, APIC_ICR);
 		kvm_apic_send_ipi(apic, (u32)val, (u32)(val >> 32));
 		trace_kvm_apic_write(APIC_ICR, val);
 	} else {
 		/* TODO: optimize to just emulate side effect w/o one more write */
+		val = kvm_lapic_get_reg(apic, offset);
 		kvm_lapic_reg_write(apic, offset, (u32)val);
 	}
 }
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 02/32] KVM: x86: Purge "highest ISR" cache when updating APICv state
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
  2022-10-01  0:58 ` [PATCH v4 01/32] KVM: x86: Blindly get current x2APIC reg value on "nodecode write" traps Sean Christopherson
@ 2022-10-01  0:58 ` Sean Christopherson
  2022-12-08 21:47   ` Maxim Levitsky
  2022-10-01  0:58 ` [PATCH v4 03/32] KVM: SVM: Flush the "current" TLB when activating AVIC Sean Christopherson
                   ` (30 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Purge the "highest ISR" cache when updating APICv state on a vCPU.  The
cache must not be used when APICv is active as hardware may emulate EOIs
(and other operations) without exiting to KVM.

This fixes a bug where KVM will effectively block IRQs in perpetuity due
to the "highest ISR" never getting reset if APICv is activated on a vCPU
while an IRQ is in-service.  Hardware emulates the EOI and KVM never gets
a chance to update its cache.

Fixes: b26a695a1d78 ("kvm: lapic: Introduce APICv update helper function")
Cc: stable@vger.kernel.org
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 05d079fc2c66..5de1c7aa1ce9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2424,6 +2424,7 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
 		 */
 		apic->isr_count = count_vectors(apic->regs + APIC_ISR);
 	}
+	apic->highest_isr_cache = -1;
 }
 EXPORT_SYMBOL_GPL(kvm_apic_update_apicv);
 
@@ -2480,7 +2481,6 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 		kvm_lapic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
 	}
 	kvm_apic_update_apicv(vcpu);
-	apic->highest_isr_cache = -1;
 	update_divide_count(apic);
 	atomic_set(&apic->lapic_timer.pending, 0);
 
@@ -2767,7 +2767,6 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
 	__start_apic_timer(apic, APIC_TMCCT);
 	kvm_lapic_set_reg(apic, APIC_TMCCT, 0);
 	kvm_apic_update_apicv(vcpu);
-	apic->highest_isr_cache = -1;
 	if (apic->apicv_active) {
 		static_call_cond(kvm_x86_apicv_post_state_restore)(vcpu);
 		static_call_cond(kvm_x86_hwapic_irr_update)(vcpu, apic_find_highest_irr(apic));
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 03/32] KVM: SVM: Flush the "current" TLB when activating AVIC
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
  2022-10-01  0:58 ` [PATCH v4 01/32] KVM: x86: Blindly get current x2APIC reg value on "nodecode write" traps Sean Christopherson
  2022-10-01  0:58 ` [PATCH v4 02/32] KVM: x86: Purge "highest ISR" cache when updating APICv state Sean Christopherson
@ 2022-10-01  0:58 ` Sean Christopherson
       [not found]   ` <b9f336f17eec6bfbb8429700e0f135d19813c576.camel@redhat.com>
  2022-10-01  0:58 ` [PATCH v4 04/32] KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid target Sean Christopherson
                   ` (29 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Flush the TLB when activating AVIC as the CPU can insert into the TLB
while AVIC is "locally" disabled.  KVM doesn't treat "APIC hardware
disabled" as VM-wide AVIC inhibition, and so when a vCPU has its APIC
hardware disabled, AVIC is not guaranteed to be inhibited.  As a result,
KVM may create a valid NPT mapping for the APIC base, which the CPU can
cache as a non-AVIC translation.

Note, Intel handles this in vmx_set_virtual_apic_mode().

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 6919dee69f18..712330b80891 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -86,6 +86,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
 		/* Disabling MSR intercept for x2APIC registers */
 		svm_set_x2apic_msr_interception(svm, false);
 	} else {
+		/*
+		 * Flush the TLB, the guest may have inserted a non-APIC
+		 * mapping into the TLB while AVIC was disabled.
+		 */
+		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, &svm->vcpu);
+
 		/* For xAVIC and hybrid-xAVIC modes */
 		vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID;
 		/* Enabling MSR intercept for x2APIC registers */
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 04/32] KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid target
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (2 preceding siblings ...)
  2022-10-01  0:58 ` [PATCH v4 03/32] KVM: SVM: Flush the "current" TLB when activating AVIC Sean Christopherson
@ 2022-10-01  0:58 ` Sean Christopherson
  2022-10-01  0:58 ` [PATCH v4 05/32] KVM: x86: Don't inhibit APICv/AVIC on xAPIC ID "change" if APIC is disabled Sean Christopherson
                   ` (28 subsequent siblings)
  32 siblings, 0 replies; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Emulate ICR writes on AVIC IPI failures due to invalid targets using the
same logic as failures due to invalid types.  AVIC acceleration fails if
_any_ of the targets are invalid, and crucially VM-Exits before sending
IPIs to targets that _are_ valid.  In logical mode, the destination is a
bitmap, i.e. a single IPI can target multiple logical IDs.  Doing nothing
causes KVM to drop IPIs if at least one target is valid and at least one
target is invalid.

Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
Cc: stable@vger.kernel.org
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 712330b80891..3b2c88b168ba 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -502,14 +502,18 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
 	trace_kvm_avic_incomplete_ipi(vcpu->vcpu_id, icrh, icrl, id, index);
 
 	switch (id) {
+	case AVIC_IPI_FAILURE_INVALID_TARGET:
 	case AVIC_IPI_FAILURE_INVALID_INT_TYPE:
 		/*
 		 * Emulate IPIs that are not handled by AVIC hardware, which
-		 * only virtualizes Fixed, Edge-Triggered INTRs.  The exit is
-		 * a trap, e.g. ICR holds the correct value and RIP has been
-		 * advanced, KVM is responsible only for emulating the IPI.
-		 * Sadly, hardware may sometimes leave the BUSY flag set, in
-		 * which case KVM needs to emulate the ICR write as well in
+		 * only virtualizes Fixed, Edge-Triggered INTRs, and falls over
+		 * if _any_ targets are invalid, e.g. if the logical mode mask
+		 * is a superset of running vCPUs.
+		 *
+		 * The exit is a trap, e.g. ICR holds the correct value and RIP
+		 * has been advanced, KVM is responsible only for emulating the
+		 * IPI.  Sadly, hardware may sometimes leave the BUSY flag set,
+		 * in which case KVM needs to emulate the ICR write as well in
 		 * order to clear the BUSY flag.
 		 */
 		if (icrl & APIC_ICR_BUSY)
@@ -525,8 +529,6 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
 		 */
 		avic_kick_target_vcpus(vcpu->kvm, apic, icrl, icrh, index);
 		break;
-	case AVIC_IPI_FAILURE_INVALID_TARGET:
-		break;
 	case AVIC_IPI_FAILURE_INVALID_BACKING_PAGE:
 		WARN_ONCE(1, "Invalid backing page\n");
 		break;
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 05/32] KVM: x86: Don't inhibit APICv/AVIC on xAPIC ID "change" if APIC is disabled
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (3 preceding siblings ...)
  2022-10-01  0:58 ` [PATCH v4 04/32] KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid target Sean Christopherson
@ 2022-10-01  0:58 ` Sean Christopherson
  2022-12-08 21:53   ` Maxim Levitsky
  2022-10-01  0:58 ` [PATCH v4 06/32] KVM: x86: Track xAPIC ID only on userspace SET, _after_ vAPIC is updated Sean Christopherson
                   ` (27 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Don't inhibit APICv/AVIC due to an xAPIC ID mismatch if the APIC is
hardware disabled.  The ID cannot be consumed while the APIC is disabled,
and the ID is guaranteed to be set back to the vcpu_id when the APIC is
hardware enabled (architectural behavior correctly emulated by KVM).

Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5de1c7aa1ce9..67260f7ce43a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2072,6 +2072,9 @@ static void kvm_lapic_xapic_id_updated(struct kvm_lapic *apic)
 {
 	struct kvm *kvm = apic->vcpu->kvm;
 
+	if (!kvm_apic_hw_enabled(apic))
+		return;
+
 	if (KVM_BUG_ON(apic_x2apic_mode(apic), kvm))
 		return;
 
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 06/32] KVM: x86: Track xAPIC ID only on userspace SET, _after_ vAPIC is updated
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (4 preceding siblings ...)
  2022-10-01  0:58 ` [PATCH v4 05/32] KVM: x86: Don't inhibit APICv/AVIC on xAPIC ID "change" if APIC is disabled Sean Christopherson
@ 2022-10-01  0:58 ` Sean Christopherson
  2022-12-08 21:53   ` Maxim Levitsky
  2022-10-01  0:58 ` [PATCH v4 07/32] KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID Sean Christopherson
                   ` (26 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Track potential changes to a vCPU's xAPIC ID only for KVM_SET_LAPIC, i.e.
not for KVM_GET_LAPIC, and process the update after the incoming state
provided by userspace is copied to KVM's in-kernel vAPIC.  The latter bug
is the most problematic issue, as processing the update before KVM's
vAPIC is actually updated can result in false positives, e.g. due to the
APIC holding an x2APIC ID (wrong format), and false negatives, e.g. due
to KVM failing to detect an xAPIC ID "mismatch".

Processing an "update" in KVM_GET_LAPIC is likely a benign bug now that
the update helper ignores mismatches, but prior to that fix, invoking
KVM_GET_LAPIC while the APIC is disabled could effectively cause KVM to
consume stale state, e.g. if the APIC were in x2APIC mode before being
hardware disabled.

Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base")
Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 67260f7ce43a..251856ba0750 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2720,8 +2720,6 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
 			icr = __kvm_lapic_get_reg64(s->regs, APIC_ICR);
 			__kvm_lapic_set_reg(s->regs, APIC_ICR2, icr >> 32);
 		}
-	} else {
-		kvm_lapic_xapic_id_updated(vcpu->arch.apic);
 	}
 
 	return 0;
@@ -2757,6 +2755,9 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
 	}
 	memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));
 
+	if (!apic_x2apic_mode(vcpu->arch.apic))
+		kvm_lapic_xapic_id_updated(vcpu->arch.apic);
+
 	atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
 	kvm_recalculate_apic_map(vcpu->kvm);
 	kvm_apic_set_version(vcpu);
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 07/32] KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (5 preceding siblings ...)
  2022-10-01  0:58 ` [PATCH v4 06/32] KVM: x86: Track xAPIC ID only on userspace SET, _after_ vAPIC is updated Sean Christopherson
@ 2022-10-01  0:58 ` Sean Christopherson
  2022-12-08 21:53   ` Maxim Levitsky
  2022-10-01  0:58 ` [PATCH v4 08/32] KVM: SVM: Don't put/load AVIC when setting virtual APIC mode Sean Christopherson
                   ` (25 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Truncate the vcpu_id, a.k.a. x2APIC ID, to an 8-bit value when comparing
it against the xAPIC ID to avoid false positives (sort of) on systems
with >255 CPUs, i.e. with IDs that don't fit into a u8.  The intent of
APIC_ID_MODIFIED is to inhibit APICv/AVIC when the xAPIC is changed from
it's original value,

The mismatch isn't technically a false positive, as architecturally the
xAPIC IDs do end up being aliased in this scenario, and neither APICv
nor AVIC correctly handles IPI virtualization when there is aliasing.
However, KVM already deliberately does not honor the aliasing behavior
that results when an x2APIC ID gets truncated to an xAPIC ID.  I.e. the
resulting APICv/AVIC behavior is aligned with KVM's existing behavior
when KVM's x2APIC hotplug hack is effectively enabled.

If/when KVM provides a way to disable the hotplug hack, APICv/AVIC can
piggyback whatever logic disables the optimized APIC map (which is what
provides the hotplug hack), i.e. so that KVM's optimized map and APIC
virtualization yield the same behavior.

For now, fix the immediate problem of APIC virtualization being disabled
for large VMs, which is a much more pressing issue than ensuring KVM
honors architectural behavior for APIC ID aliasing.

Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base")
Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 251856ba0750..2503f162eb95 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2078,7 +2078,12 @@ static void kvm_lapic_xapic_id_updated(struct kvm_lapic *apic)
 	if (KVM_BUG_ON(apic_x2apic_mode(apic), kvm))
 		return;
 
-	if (kvm_xapic_id(apic) == apic->vcpu->vcpu_id)
+	/*
+	 * Deliberately truncate the vCPU ID when detecting a modified APIC ID
+	 * to avoid false positives if the vCPU ID, i.e. x2APIC ID, is a 32-bit
+	 * value.
+	 */
+	if (kvm_xapic_id(apic) == (u8)apic->vcpu->vcpu_id)
 		return;
 
 	kvm_set_apicv_inhibit(apic->vcpu->kvm, APICV_INHIBIT_REASON_APIC_ID_MODIFIED);
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 08/32] KVM: SVM: Don't put/load AVIC when setting virtual APIC mode
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (6 preceding siblings ...)
  2022-10-01  0:58 ` [PATCH v4 07/32] KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID Sean Christopherson
@ 2022-10-01  0:58 ` Sean Christopherson
  2022-12-08 21:53   ` Maxim Levitsky
  2022-10-01  0:58 ` [PATCH v4 09/32] KVM: x86: Handle APICv updates for APIC "mode" changes via request Sean Christopherson
                   ` (24 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Move the VMCB updates from avic_refresh_apicv_exec_ctrl() into
avic_set_virtual_apic_mode() and invert the dependency being said
functions to avoid calling avic_vcpu_{load,put}() and
avic_set_pi_irte_mode() when "only" setting the virtual APIC mode.

avic_set_virtual_apic_mode() is invoked from common x86 with preemption
enabled, which makes avic_vcpu_{load,put}() unhappy.  Luckily, calling
those and updating IRTE stuff is unnecessary as the only reason
avic_set_virtual_apic_mode() is called is to handle transitions between
xAPIC and x2APIC that don't also toggle APICv activation.  And if
activation doesn't change, there's no need to fiddle with the physical
APIC ID table or update IRTE.

The "full" refresh is guaranteed to be called if activation changes in
this case as the only call to the "set" path is:

	kvm_vcpu_update_apicv(vcpu);
	static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu);

and kvm_vcpu_update_apicv() invokes the refresh if activation changes:

	if (apic->apicv_active == activate)
		goto out;

	apic->apicv_active = activate;
	kvm_apic_update_apicv(vcpu);
	static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);

Rename the helper to reflect that it is also called during "refresh".

  WARNING: CPU: 183 PID: 49186 at arch/x86/kvm/svm/avic.c:1081 avic_vcpu_put+0xde/0xf0 [kvm_amd]
  CPU: 183 PID: 49186 Comm: stable Tainted: G           O       6.0.0-smp--fcddbca45f0a-sink #34
  Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 10.48.0 01/27/2022
  RIP: 0010:avic_vcpu_put+0xde/0xf0 [kvm_amd]
   avic_refresh_apicv_exec_ctrl+0x142/0x1c0 [kvm_amd]
   avic_set_virtual_apic_mode+0x5a/0x70 [kvm_amd]
   kvm_lapic_set_base+0x149/0x1a0 [kvm]
   kvm_set_apic_base+0x8f/0xd0 [kvm]
   kvm_set_msr_common+0xa3a/0xdc0 [kvm]
   svm_set_msr+0x364/0x6b0 [kvm_amd]
   __kvm_set_msr+0xb8/0x1c0 [kvm]
   kvm_emulate_wrmsr+0x58/0x1d0 [kvm]
   msr_interception+0x1c/0x30 [kvm_amd]
   svm_invoke_exit_handler+0x31/0x100 [kvm_amd]
   svm_handle_exit+0xfc/0x160 [kvm_amd]
   vcpu_enter_guest+0x21bb/0x23e0 [kvm]
   vcpu_run+0x92/0x450 [kvm]
   kvm_arch_vcpu_ioctl_run+0x43e/0x6e0 [kvm]
   kvm_vcpu_ioctl+0x559/0x620 [kvm]

Fixes: 05c4fe8c1bd9 ("KVM: SVM: Refresh AVIC configuration when changing APIC mode")
Cc: stable@vger.kernel.org
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 31 +++++++++++++++----------------
 arch/x86/kvm/svm/svm.c  |  2 +-
 arch/x86/kvm/svm/svm.h  |  2 +-
 3 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 3b2c88b168ba..97ad0661f963 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -747,18 +747,6 @@ void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 	avic_handle_ldr_update(vcpu);
 }
 
-void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
-{
-	if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
-		return;
-
-	if (kvm_get_apic_mode(vcpu) == LAPIC_MODE_INVALID) {
-		WARN_ONCE(true, "Invalid local APIC state (vcpu_id=%d)", vcpu->vcpu_id);
-		return;
-	}
-	avic_refresh_apicv_exec_ctrl(vcpu);
-}
-
 static int avic_set_pi_irte_mode(struct kvm_vcpu *vcpu, bool activate)
 {
 	int ret = 0;
@@ -1100,17 +1088,18 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
 	WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
 }
 
-
-void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
+void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb *vmcb = svm->vmcb01.ptr;
-	bool activated = kvm_vcpu_apicv_active(vcpu);
+
+	if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
+		return;
 
 	if (!enable_apicv)
 		return;
 
-	if (activated) {
+	if (kvm_vcpu_apicv_active(vcpu)) {
 		/**
 		 * During AVIC temporary deactivation, guest could update
 		 * APIC ID, DFR and LDR registers, which would not be trapped
@@ -1124,6 +1113,16 @@ void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
 		avic_deactivate_vmcb(svm);
 	}
 	vmcb_mark_dirty(vmcb, VMCB_AVIC);
+}
+
+void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
+{
+	bool activated = kvm_vcpu_apicv_active(vcpu);
+
+	if (!enable_apicv)
+		return;
+
+	avic_refresh_virtual_apic_mode(vcpu);
 
 	if (activated)
 		avic_vcpu_load(vcpu, vcpu->cpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 58f0077d9357..afae97ea9a06 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4798,7 +4798,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.enable_nmi_window = svm_enable_nmi_window,
 	.enable_irq_window = svm_enable_irq_window,
 	.update_cr8_intercept = svm_update_cr8_intercept,
-	.set_virtual_apic_mode = avic_set_virtual_apic_mode,
+	.set_virtual_apic_mode = avic_refresh_virtual_apic_mode,
 	.refresh_apicv_exec_ctrl = avic_refresh_apicv_exec_ctrl,
 	.check_apicv_inhibit_reasons = avic_check_apicv_inhibit_reasons,
 	.apicv_post_state_restore = avic_apicv_post_state_restore,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6a7686bf6900..7a95f50e80e7 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -646,7 +646,7 @@ void avic_vcpu_blocking(struct kvm_vcpu *vcpu);
 void avic_vcpu_unblocking(struct kvm_vcpu *vcpu);
 void avic_ring_doorbell(struct kvm_vcpu *vcpu);
 unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu);
-void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
+void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
 
 
 /* sev.c */
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 09/32] KVM: x86: Handle APICv updates for APIC "mode" changes via request
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (7 preceding siblings ...)
  2022-10-01  0:58 ` [PATCH v4 08/32] KVM: SVM: Don't put/load AVIC when setting virtual APIC mode Sean Christopherson
@ 2022-10-01  0:58 ` Sean Christopherson
  2022-12-08 21:54   ` Maxim Levitsky
  2022-10-01  0:58 ` [PATCH v4 10/32] KVM: x86: Move APIC access page helper to common x86 code Sean Christopherson
                   ` (23 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Use KVM_REQ_UPDATE_APICV to react to APIC "mode" changes, i.e. to handle
the APIC being hardware enabled/disabled and/or x2APIC being toggled.
There is no need to immediately update APICv state, the only requirement
is that APICv be updating prior to the next VM-Enter.

Making a request will allow piggybacking KVM_REQ_UPDATE_APICV to "inhibit"
the APICv memslot when x2APIC is enabled.  Doing that directly from
kvm_lapic_set_base() isn't feasible as KVM's SRCU must not be held when
modifying memslots (to avoid deadlock), and may or may not be held when
kvm_lapic_set_base() is called, i.e. KVM can't do the right thing without
tracking that is rightly buried behind CONFIG_PROVE_RCU=y.

Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 2503f162eb95..316b61b56cca 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2401,7 +2401,7 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
 		kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
 
 	if ((old_value ^ value) & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) {
-		kvm_vcpu_update_apicv(vcpu);
+		kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
 		static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu);
 	}
 
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 10/32] KVM: x86: Move APIC access page helper to common x86 code
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (8 preceding siblings ...)
  2022-10-01  0:58 ` [PATCH v4 09/32] KVM: x86: Handle APICv updates for APIC "mode" changes via request Sean Christopherson
@ 2022-10-01  0:58 ` Sean Christopherson
  2022-12-08 21:55   ` Maxim Levitsky
  2022-10-01  0:58 ` [PATCH v4 11/32] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled Sean Christopherson
                   ` (22 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Move the APIC access page allocation helper function to common x86 code,
the allocation routine is virtually identical between APICv (VMX) and
AVIC (SVM).  Keep APICv's gfn_to_page() + put_page() sequence, which
verifies that a backing page can be allocated, i.e. that the system isn't
under heavy memory pressure.  Forcing the backing page to be populated
isn't strictly necessary, but skipping the effective prefetch only delays
the inevitable.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.c    | 35 +++++++++++++++++++++++++++++++++++
 arch/x86/kvm/lapic.h    |  1 +
 arch/x86/kvm/svm/avic.c | 41 +++++++----------------------------------
 arch/x86/kvm/vmx/vmx.c  | 35 +----------------------------------
 4 files changed, 44 insertions(+), 68 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 316b61b56cca..80e8b1cc6dc2 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2436,6 +2436,41 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_apic_update_apicv);
 
+int kvm_alloc_apic_access_page(struct kvm *kvm)
+{
+	struct page *page;
+	void __user *hva;
+	int ret = 0;
+
+	mutex_lock(&kvm->slots_lock);
+	if (kvm->arch.apic_access_memslot_enabled)
+		goto out;
+
+	hva = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
+				      APIC_DEFAULT_PHYS_BASE, PAGE_SIZE);
+	if (IS_ERR(hva)) {
+		ret = PTR_ERR(hva);
+		goto out;
+	}
+
+	page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+	if (is_error_page(page)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	/*
+	 * Do not pin the page in memory, so that memory hot-unplug
+	 * is able to migrate it.
+	 */
+	put_page(page);
+	kvm->arch.apic_access_memslot_enabled = true;
+out:
+	mutex_unlock(&kvm->slots_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_alloc_apic_access_page);
+
 void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index a5ac4a5a5179..0587a8282cb3 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -112,6 +112,7 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
 		     struct dest_map *dest_map);
 int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
 void kvm_apic_update_apicv(struct kvm_vcpu *vcpu);
+int kvm_alloc_apic_access_page(struct kvm *kvm);
 
 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map);
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 97ad0661f963..ec28ba4c5f1b 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -256,39 +256,6 @@ static u64 *avic_get_physical_id_entry(struct kvm_vcpu *vcpu,
 	return &avic_physical_id_table[index];
 }
 
-/*
- * Note:
- * AVIC hardware walks the nested page table to check permissions,
- * but does not use the SPA address specified in the leaf page
- * table entry since it uses  address in the AVIC_BACKING_PAGE pointer
- * field of the VMCB. Therefore, we set up the
- * APIC_ACCESS_PAGE_PRIVATE_MEMSLOT (4KB) here.
- */
-static int avic_alloc_access_page(struct kvm *kvm)
-{
-	void __user *ret;
-	int r = 0;
-
-	mutex_lock(&kvm->slots_lock);
-
-	if (kvm->arch.apic_access_memslot_enabled)
-		goto out;
-
-	ret = __x86_set_memory_region(kvm,
-				      APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
-				      APIC_DEFAULT_PHYS_BASE,
-				      PAGE_SIZE);
-	if (IS_ERR(ret)) {
-		r = PTR_ERR(ret);
-		goto out;
-	}
-
-	kvm->arch.apic_access_memslot_enabled = true;
-out:
-	mutex_unlock(&kvm->slots_lock);
-	return r;
-}
-
 static int avic_init_backing_page(struct kvm_vcpu *vcpu)
 {
 	u64 *entry, new_entry;
@@ -305,7 +272,13 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
 	if (kvm_apicv_activated(vcpu->kvm)) {
 		int ret;
 
-		ret = avic_alloc_access_page(vcpu->kvm);
+		/*
+		 * Note, AVIC hardware walks the nested page table to check
+		 * permissions, but does not use the SPA address specified in
+		 * the leaf SPTE since it uses address in the AVIC_BACKING_PAGE
+		 * pointer field of the VMCB.
+		 */
+		ret = kvm_alloc_apic_access_page(vcpu->kvm);
 		if (ret)
 			return ret;
 	}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9dba04b6b019..974d9a366d5d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3771,39 +3771,6 @@ static void seg_setup(int seg)
 	vmcs_write32(sf->ar_bytes, ar);
 }
 
-static int alloc_apic_access_page(struct kvm *kvm)
-{
-	struct page *page;
-	void __user *hva;
-	int ret = 0;
-
-	mutex_lock(&kvm->slots_lock);
-	if (kvm->arch.apic_access_memslot_enabled)
-		goto out;
-	hva = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
-				      APIC_DEFAULT_PHYS_BASE, PAGE_SIZE);
-	if (IS_ERR(hva)) {
-		ret = PTR_ERR(hva);
-		goto out;
-	}
-
-	page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
-	if (is_error_page(page)) {
-		ret = -EFAULT;
-		goto out;
-	}
-
-	/*
-	 * Do not pin the page in memory, so that memory hot-unplug
-	 * is able to migrate it.
-	 */
-	put_page(page);
-	kvm->arch.apic_access_memslot_enabled = true;
-out:
-	mutex_unlock(&kvm->slots_lock);
-	return ret;
-}
-
 int allocate_vpid(void)
 {
 	int vpid;
@@ -7348,7 +7315,7 @@ static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 	vmx->loaded_vmcs = &vmx->vmcs01;
 
 	if (cpu_need_virtualize_apic_accesses(vcpu)) {
-		err = alloc_apic_access_page(vcpu->kvm);
+		err = kvm_alloc_apic_access_page(vcpu->kvm);
 		if (err)
 			goto free_vmcs;
 	}
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 11/32] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (9 preceding siblings ...)
  2022-10-01  0:58 ` [PATCH v4 10/32] KVM: x86: Move APIC access page helper to common x86 code Sean Christopherson
@ 2022-10-01  0:58 ` Sean Christopherson
  2022-12-08 21:56   ` Maxim Levitsky
  2022-10-01  0:58 ` [PATCH v4 12/32] KVM: SVM: Replace "avic_mode" enum with "x2avic_enabled" boolean Sean Christopherson
                   ` (21 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Free the APIC access page memslot if any vCPU enables x2APIC and SVM's
AVIC is enabled to prevent accesses to the virtual APIC on vCPUs with
x2APIC enabled.  On AMD, due to its "hybrid" mode where AVIC is enabled
when x2APIC is enabled even without x2AVIC support, keeping the APIC
access page memslot results in the guest being able to access the virtual
APIC page as x2APIC is fully emulated by KVM.  I.e. hardware isn't aware
that the guest is operating in x2APIC mode.

Exempt nested SVM's update of APICv state from new logic as x2APIC can't
be toggled on VM-Exit.  In practice, invoking the x2APIC logic should be
harmless precisely because it should be a glorified nop, but play it
safe to avoid latent bugs, e.g. with dropping the vCPU's SRCU lock.

Intel doesn't suffer from the same issue as APICv has fully independent
VMCS controls for xAPIC vs. x2APIC virtualization.  Technically, KVM
should provide bus error semantics and not memory semantics for the APIC
page when x2APIC is enabled, but KVM already provides memory semantics in
other scenarios, e.g. if APICv/AVIC is enabled and the APIC is hardware
disabled (via APIC_BASE MSR).

Reserve an inhibit bit so that common code can detect whether or not the
"x2APIC inhibit" applies, but use a dedicated flag to track the inhibit
so that it doesn't need to be stripped from apicv_inhibit_reasons (since
it's not a "full" inhibit).

Note, checking apic_access_memslot_enabled without taking locks relies
it being set during vCPU creation (before kvm_vcpu_reset()).  vCPUs can
race to set the inhibit and delete the memslot, i.e. can get false
positives, but can't get false negatives as apic_access_memslot_enabled
can't be toggled "on" once any vCPU reaches KVM_RUN.

Opportunistically drop the "can" while updating avic_activate_vmcb()'s
comment, i.e. to state that KVM _does_ support the hybrid mode.  Move
the "Note:" down a line to conform to preferred kernel/KVM multi-line
comment style.

Opportunistically update the apicv_update_lock comment, as it isn't
actually used to protect apic_access_memslot_enabled (it's protected by
slots_lock).

Fixes: 0e311d33bfbe ("KVM: SVM: Introduce hybrid-AVIC mode")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 20 +++++++++++++----
 arch/x86/kvm/lapic.c            | 38 ++++++++++++++++++++++++++++++++-
 arch/x86/kvm/lapic.h            |  1 +
 arch/x86/kvm/svm/avic.c         | 15 +++++++------
 arch/x86/kvm/svm/nested.c       |  2 +-
 arch/x86/kvm/x86.c              | 16 ++++++++++++--
 6 files changed, 77 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d40206b16d6c..062758135c86 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1139,6 +1139,17 @@ enum kvm_apicv_inhibit {
 	 * AVIC is disabled because SEV doesn't support it.
 	 */
 	APICV_INHIBIT_REASON_SEV,
+
+	/*
+	 * Due to sharing page tables across vCPUs, the xAPIC memslot must be
+	 * deleted if any vCPU has x2APIC enabled as SVM doesn't provide fully
+	 * independent controls for AVIC vs. x2AVIC, and also because SVM
+	 * supports a "hybrid" AVIC mode for CPUs that support AVIC but not
+	 * x2AVIC.  Note, this isn't a "full" inhibit and is tracked separately.
+	 * AVIC can still be activated, but KVM must not create SPTEs for the
+	 * APIC base.  For simplicity, this is sticky.
+	 */
+	APICV_INHIBIT_REASON_X2APIC,
 };
 
 struct kvm_arch {
@@ -1176,10 +1187,11 @@ struct kvm_arch {
 	struct kvm_apic_map __rcu *apic_map;
 	atomic_t apic_map_dirty;
 
-	/* Protects apic_access_memslot_enabled and apicv_inhibit_reasons */
-	struct rw_semaphore apicv_update_lock;
-
 	bool apic_access_memslot_enabled;
+	bool apic_access_memslot_inhibited;
+
+	/* Protects apicv_inhibit_reasons */
+	struct rw_semaphore apicv_update_lock;
 	unsigned long apicv_inhibit_reasons;
 
 	gpa_t wall_clock;
@@ -1912,7 +1924,7 @@ gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
 
 bool kvm_apicv_activated(struct kvm *kvm);
 bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu);
-void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
+void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
 void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
 				      enum kvm_apicv_inhibit reason, bool set);
 void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 80e8b1cc6dc2..42b61469674d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2443,7 +2443,8 @@ int kvm_alloc_apic_access_page(struct kvm *kvm)
 	int ret = 0;
 
 	mutex_lock(&kvm->slots_lock);
-	if (kvm->arch.apic_access_memslot_enabled)
+	if (kvm->arch.apic_access_memslot_enabled ||
+	    kvm->arch.apic_access_memslot_inhibited)
 		goto out;
 
 	hva = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
@@ -2471,6 +2472,41 @@ int kvm_alloc_apic_access_page(struct kvm *kvm)
 }
 EXPORT_SYMBOL_GPL(kvm_alloc_apic_access_page);
 
+void kvm_inhibit_apic_access_page(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+
+	if (!kvm->arch.apic_access_memslot_enabled)
+		return;
+
+	kvm_vcpu_srcu_read_unlock(vcpu);
+
+	mutex_lock(&kvm->slots_lock);
+
+	if (kvm->arch.apic_access_memslot_enabled) {
+		__x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, 0, 0);
+		/*
+		 * Clear "enabled" after the memslot is deleted so that a
+		 * different vCPU doesn't get a false negative when checking
+		 * the flag out of slots_lock.  No additional memory barrier is
+		 * needed as modifying memslots requires waiting other vCPUs to
+		 * drop SRCU (see above), and false positives are ok as the
+		 * flag is rechecked after acquiring slots_lock.
+		 */
+		kvm->arch.apic_access_memslot_enabled = false;
+
+		/*
+		 * Mark the memslot as inhibited to prevent reallocating the
+		 * memslot during vCPU creation, e.g. if a vCPU is hotplugged.
+		 */
+		kvm->arch.apic_access_memslot_inhibited = true;
+	}
+
+	mutex_unlock(&kvm->slots_lock);
+
+	kvm_vcpu_srcu_read_lock(vcpu);
+}
+
 void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 0587a8282cb3..a318609bb050 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -113,6 +113,7 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
 int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
 void kvm_apic_update_apicv(struct kvm_vcpu *vcpu);
 int kvm_alloc_apic_access_page(struct kvm *kvm);
+void kvm_inhibit_apic_access_page(struct kvm_vcpu *vcpu);
 
 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map);
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index ec28ba4c5f1b..535e35edce1d 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -72,12 +72,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
 
 	vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
 
-	/* Note:
-	 * KVM can support hybrid-AVIC mode, where KVM emulates x2APIC
-	 * MSR accesses, while interrupt injection to a running vCPU
-	 * can be achieved using AVIC doorbell. The AVIC hardware still
-	 * accelerate MMIO accesses, but this does not cause any harm
-	 * as the guest is not supposed to access xAPIC mmio when uses x2APIC.
+	/*
+	 * Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR
+	 * accesses, while interrupt injection to a running vCPU can be
+	 * achieved using AVIC doorbell.  KVM disables the APIC access page
+	 * (deletes the memslot) if any vCPU has x2APIC enabled, thus enabling
+	 * AVIC in hybrid mode activates only the doorbell mechanism.
 	 */
 	if (apic_x2apic_mode(svm->vcpu.arch.apic) &&
 	    avic_mode == AVIC_MODE_X2) {
@@ -975,7 +975,8 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
 			  BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
 			  BIT(APICV_INHIBIT_REASON_SEV)      |
 			  BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
-			  BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED);
+			  BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
+			  BIT(APICV_INHIBIT_REASON_X2APIC);
 
 	return supported & BIT(reason);
 }
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 4c620999d230..8d5e00a7ef84 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1084,7 +1084,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	 * to benefit from it right away.
 	 */
 	if (kvm_apicv_activated(vcpu->kvm))
-		kvm_vcpu_update_apicv(vcpu);
+		__kvm_vcpu_update_apicv(vcpu);
 
 	return 0;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eb9d2c23fb04..a20002924eb4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10251,7 +10251,7 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
 	kvm_make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
 }
 
-void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
+void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	bool activate;
@@ -10286,7 +10286,19 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 	preempt_enable();
 	up_read(&vcpu->kvm->arch.apicv_update_lock);
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_update_apicv);
+EXPORT_SYMBOL_GPL(__kvm_vcpu_update_apicv);
+
+static void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
+{
+	if (!lapic_in_kernel(vcpu))
+		return;
+
+	if (apic_x2apic_mode(vcpu->arch.apic) &&
+	    static_call(kvm_x86_check_apicv_inhibit_reasons)(APICV_INHIBIT_REASON_X2APIC))
+		kvm_inhibit_apic_access_page(vcpu);
+
+	__kvm_vcpu_update_apicv(vcpu);
+}
 
 void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
 				      enum kvm_apicv_inhibit reason, bool set)
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 12/32] KVM: SVM: Replace "avic_mode" enum with "x2avic_enabled" boolean
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (10 preceding siblings ...)
  2022-10-01  0:58 ` [PATCH v4 11/32] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled Sean Christopherson
@ 2022-10-01  0:58 ` Sean Christopherson
  2022-10-01  0:58 ` [PATCH v4 13/32] KVM: SVM: Compute dest based on sender's x2APIC status for AVIC kick Sean Christopherson
                   ` (20 subsequent siblings)
  32 siblings, 0 replies; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Replace the "avic_mode" enum with a single bool to track whether or not
x2AVIC is enabled.  KVM already has "apicv_enabled" that tracks if any
flavor of AVIC is enabled, i.e. AVIC_MODE_NONE and AVIC_MODE_X1 are
redundant and unnecessary noise.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 46 +++++++++++++++++++----------------------
 arch/x86/kvm/svm/svm.c  |  2 +-
 arch/x86/kvm/svm/svm.h  |  9 +-------
 3 files changed, 23 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 535e35edce1d..84beef0edae3 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -53,7 +53,7 @@ static DEFINE_HASHTABLE(svm_vm_data_hash, SVM_VM_DATA_HASH_BITS);
 static u32 next_vm_id = 0;
 static bool next_vm_id_wrapped = 0;
 static DEFINE_SPINLOCK(svm_vm_data_hash_lock);
-enum avic_modes avic_mode;
+bool x2avic_enabled;
 
 /*
  * This is a wrapper of struct amd_iommu_ir_data.
@@ -79,8 +79,7 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
 	 * (deletes the memslot) if any vCPU has x2APIC enabled, thus enabling
 	 * AVIC in hybrid mode activates only the doorbell mechanism.
 	 */
-	if (apic_x2apic_mode(svm->vcpu.arch.apic) &&
-	    avic_mode == AVIC_MODE_X2) {
+	if (x2avic_enabled && apic_x2apic_mode(svm->vcpu.arch.apic)) {
 		vmcb->control.int_ctl |= X2APIC_MODE_MASK;
 		vmcb->control.avic_physical_id |= X2AVIC_MAX_PHYSICAL_ID;
 		/* Disabling MSR intercept for x2APIC registers */
@@ -247,8 +246,8 @@ static u64 *avic_get_physical_id_entry(struct kvm_vcpu *vcpu,
 	u64 *avic_physical_id_table;
 	struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);
 
-	if ((avic_mode == AVIC_MODE_X1 && index > AVIC_MAX_PHYSICAL_ID) ||
-	    (avic_mode == AVIC_MODE_X2 && index > X2AVIC_MAX_PHYSICAL_ID))
+	if ((!x2avic_enabled && index > AVIC_MAX_PHYSICAL_ID) ||
+	    (index > X2AVIC_MAX_PHYSICAL_ID))
 		return NULL;
 
 	avic_physical_id_table = page_address(kvm_svm->avic_physical_id_table_page);
@@ -262,8 +261,8 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
 	int id = vcpu->vcpu_id;
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	if ((avic_mode == AVIC_MODE_X1 && id > AVIC_MAX_PHYSICAL_ID) ||
-	    (avic_mode == AVIC_MODE_X2 && id > X2AVIC_MAX_PHYSICAL_ID))
+	if ((!x2avic_enabled && id > AVIC_MAX_PHYSICAL_ID) ||
+	    (id > X2AVIC_MAX_PHYSICAL_ID))
 		return -EINVAL;
 
 	if (!vcpu->arch.apic->regs)
@@ -1067,10 +1066,7 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb *vmcb = svm->vmcb01.ptr;
 
-	if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
-		return;
-
-	if (!enable_apicv)
+	if (!lapic_in_kernel(vcpu) || !enable_apicv)
 		return;
 
 	if (kvm_vcpu_apicv_active(vcpu)) {
@@ -1146,32 +1142,32 @@ bool avic_hardware_setup(struct kvm_x86_ops *x86_ops)
 	if (!npt_enabled)
 		return false;
 
+	/* AVIC is a prerequisite for x2AVIC. */
+	if (!boot_cpu_has(X86_FEATURE_AVIC) && !force_avic) {
+		if (boot_cpu_has(X86_FEATURE_X2AVIC)) {
+			pr_warn(FW_BUG "Cannot support x2AVIC due to AVIC is disabled");
+			pr_warn(FW_BUG "Try enable AVIC using force_avic option");
+		}
+		return false;
+	}
+
 	if (boot_cpu_has(X86_FEATURE_AVIC)) {
-		avic_mode = AVIC_MODE_X1;
 		pr_info("AVIC enabled\n");
 	} else if (force_avic) {
 		/*
 		 * Some older systems does not advertise AVIC support.
 		 * See Revision Guide for specific AMD processor for more detail.
 		 */
-		avic_mode = AVIC_MODE_X1;
 		pr_warn("AVIC is not supported in CPUID but force enabled");
 		pr_warn("Your system might crash and burn");
 	}
 
 	/* AVIC is a prerequisite for x2AVIC. */
-	if (boot_cpu_has(X86_FEATURE_X2AVIC)) {
-		if (avic_mode == AVIC_MODE_X1) {
-			avic_mode = AVIC_MODE_X2;
-			pr_info("x2AVIC enabled\n");
-		} else {
-			pr_warn(FW_BUG "Cannot support x2AVIC due to AVIC is disabled");
-			pr_warn(FW_BUG "Try enable AVIC using force_avic option");
-		}
-	}
+	x2avic_enabled = boot_cpu_has(X86_FEATURE_X2AVIC);
+	if (x2avic_enabled)
+		pr_info("x2AVIC enabled\n");
 
-	if (avic_mode != AVIC_MODE_NONE)
-		amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);
+	amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);
 
-	return !!avic_mode;
+	return true;
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index afae97ea9a06..37fe7bcf8496 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -819,7 +819,7 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
 	if (intercept == svm->x2avic_msrs_intercepted)
 		return;
 
-	if (avic_mode != AVIC_MODE_X2 ||
+	if (!x2avic_enabled ||
 	    !apic_x2apic_mode(svm->vcpu.arch.apic))
 		return;
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 7a95f50e80e7..29c334a932c3 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -35,14 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
 extern bool npt_enabled;
 extern int vgif;
 extern bool intercept_smi;
-
-enum avic_modes {
-	AVIC_MODE_NONE = 0,
-	AVIC_MODE_X1,
-	AVIC_MODE_X2,
-};
-
-extern enum avic_modes avic_mode;
+extern bool x2avic_enabled;
 
 /*
  * Clean bits in VMCB.
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 13/32] KVM: SVM: Compute dest based on sender's x2APIC status for AVIC kick
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (11 preceding siblings ...)
  2022-10-01  0:58 ` [PATCH v4 12/32] KVM: SVM: Replace "avic_mode" enum with "x2avic_enabled" boolean Sean Christopherson
@ 2022-10-01  0:58 ` Sean Christopherson
  2022-10-01  0:58 ` [PATCH v4 14/32] KVM: SVM: Fix x2APIC Logical ID calculation for avic_kick_target_vcpus_fast Sean Christopherson
                   ` (19 subsequent siblings)
  32 siblings, 0 replies; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Compute the destination from ICRH using the sender's x2APIC status, not
each (potential) target's x2APIC status.

Fixes: c514d3a348ac ("KVM: SVM: Update avic_kick_target_vcpus to support 32-bit APIC ID")
Cc: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 84beef0edae3..e9aab8ecce83 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -429,6 +429,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
 static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
 				   u32 icrl, u32 icrh, u32 index)
 {
+	u32 dest = apic_x2apic_mode(source) ? icrh : GET_XAPIC_DEST_FIELD(icrh);
 	unsigned long i;
 	struct kvm_vcpu *vcpu;
 
@@ -444,13 +445,6 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
 	 * since entered the guest will have processed pending IRQs at VMRUN.
 	 */
 	kvm_for_each_vcpu(i, vcpu, kvm) {
-		u32 dest;
-
-		if (apic_x2apic_mode(vcpu->arch.apic))
-			dest = icrh;
-		else
-			dest = GET_XAPIC_DEST_FIELD(icrh);
-
 		if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
 					dest, icrl & APIC_DEST_MASK)) {
 			vcpu->arch.apic->irr_pending = true;
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 14/32] KVM: SVM: Fix x2APIC Logical ID calculation for avic_kick_target_vcpus_fast
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (12 preceding siblings ...)
  2022-10-01  0:58 ` [PATCH v4 13/32] KVM: SVM: Compute dest based on sender's x2APIC status for AVIC kick Sean Christopherson
@ 2022-10-01  0:58 ` Sean Christopherson
  2022-10-01  0:58 ` [PATCH v4 15/32] Revert "KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible" Sean Christopherson
                   ` (18 subsequent siblings)
  32 siblings, 0 replies; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

For X2APIC ID in cluster mode, the logical ID is bit [15:0].

Fixes: 603ccef42ce9 ("KVM: x86: SVM: fix avic_kick_target_vcpus_fast")
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index e9aab8ecce83..e35e9363e7ff 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -356,7 +356,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
 
 		if (apic_x2apic_mode(source)) {
 			/* 16 bit dest mask, 16 bit cluster id */
-			bitmap = dest & 0xFFFF0000;
+			bitmap = dest & 0xFFFF;
 			cluster = (dest >> 16) << 4;
 		} else if (kvm_lapic_get_reg(source, APIC_DFR) == APIC_DFR_FLAT) {
 			/* 8 bit dest mask*/
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 15/32] Revert "KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible"
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (13 preceding siblings ...)
  2022-10-01  0:58 ` [PATCH v4 14/32] KVM: SVM: Fix x2APIC Logical ID calculation for avic_kick_target_vcpus_fast Sean Christopherson
@ 2022-10-01  0:58 ` Sean Christopherson
  2022-12-08 21:56   ` Maxim Levitsky
  2022-10-01  0:58 ` [PATCH v4 16/32] KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpatch Sean Christopherson
                   ` (17 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Due to a likely mismerge of patches, KVM ended up with a superfluous
commit to "enable" AVIC's fast path for x2AVIC mode.  Even worse, the
superfluous commit has several bugs and creates a nasty local shadow
variable.

Rather than fix the bugs piece-by-piece[*] to achieve the same end
result, revert the patch wholesale.

Opportunistically add a comment documenting the x2AVIC dependencies.

This reverts commit 8c9e639da435874fb845c4d296ce55664071ea7a.

[*] https://lore.kernel.org/all/YxEP7ZBRIuFWhnYJ@google.com

Fixes: 8c9e639da435 ("KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible")
Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index e35e9363e7ff..605c36569ddf 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -378,7 +378,17 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
 
 		logid_index = cluster + __ffs(bitmap);
 
-		if (!apic_x2apic_mode(source)) {
+		if (apic_x2apic_mode(source)) {
+			/*
+			 * For x2APIC, the logical APIC ID is a read-only value
+			 * that is derived from the x2APIC ID, thus the x2APIC
+			 * ID can be found by reversing the calculation (done
+			 * above).  Note, bits 31:20 of the x2APIC ID are not
+			 * propagated to the logical ID, but KVM limits the
+			 * x2APIC ID limited to KVM_MAX_VCPU_IDS.
+			 */
+			l1_physical_id = logid_index;
+		} else {
 			u32 *avic_logical_id_table =
 				page_address(kvm_svm->avic_logical_id_table_page);
 
@@ -393,23 +403,6 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
 
 			l1_physical_id = logid_entry &
 					 AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
-		} else {
-			/*
-			 * For x2APIC logical mode, cannot leverage the index.
-			 * Instead, calculate physical ID from logical ID in ICRH.
-			 */
-			int cluster = (icrh & 0xffff0000) >> 16;
-			int apic = ffs(icrh & 0xffff) - 1;
-
-			/*
-			 * If the x2APIC logical ID sub-field (i.e. icrh[15:0])
-			 * contains anything but a single bit, we cannot use the
-			 * fast path, because it is limited to a single vCPU.
-			 */
-			if (apic < 0 || icrh != (1 << apic))
-				return -EINVAL;
-
-			l1_physical_id = (cluster << 4) + apic;
 		}
 	}
 
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 16/32] KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpatch
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (14 preceding siblings ...)
  2022-10-01  0:58 ` [PATCH v4 15/32] Revert "KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible" Sean Christopherson
@ 2022-10-01  0:58 ` Sean Christopherson
  2022-10-01  0:59 ` [PATCH v4 17/32] KVM: SVM: Add helper to perform final AVIC "kick" of single vCPU Sean Christopherson
                   ` (16 subsequent siblings)
  32 siblings, 0 replies; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Document that AVIC is inhibited if any vCPU's APIC ID diverges from its
vCPU ID, i.e. that there's no need to check for a destination match in
the AVIC kick fast path.

Opportunistically tweak comments to remove "guest bug", as that suggests
KVM is punting on error handling, which is not the case.  Targeting a
non-existent vCPU or no vCPUs _may_ be a guest software bug, but whether
or not it's a guest bug is irrelevant.  Such behavior is architecturally
legal and thus needs to faithfully emulated by KVM (and it is).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 605c36569ddf..40a1ea21074d 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -368,8 +368,8 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
 			cluster = (dest >> 4) << 2;
 		}
 
+		/* Nothing to do if there are no destinations in the cluster. */
 		if (unlikely(!bitmap))
-			/* guest bug: nobody to send the logical interrupt to */
 			return 0;
 
 		if (!is_power_of_2(bitmap))
@@ -397,7 +397,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
 			if (WARN_ON_ONCE(index != logid_index))
 				return -EINVAL;
 
-			/* guest bug: non existing/reserved logical destination */
+			/* Nothing to do if the logical destination is invalid. */
 			if (unlikely(!(logid_entry & AVIC_LOGICAL_ID_ENTRY_VALID_MASK)))
 				return 0;
 
@@ -406,9 +406,13 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
 		}
 	}
 
+	/*
+	 * KVM inhibits AVIC if any vCPU ID diverges from the vCPUs APIC ID,
+	 * i.e. APIC ID == vCPU ID.  Once again, nothing to do if the target
+	 * vCPU doesn't exist.
+	 */
 	target_vcpu = kvm_get_vcpu_by_id(kvm, l1_physical_id);
 	if (unlikely(!target_vcpu))
-		/* guest bug: non existing vCPU is a target of this IPI*/
 		return 0;
 
 	target_vcpu->arch.apic->irr_pending = true;
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 17/32] KVM: SVM: Add helper to perform final AVIC "kick" of single vCPU
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (15 preceding siblings ...)
  2022-10-01  0:58 ` [PATCH v4 16/32] KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpatch Sean Christopherson
@ 2022-10-01  0:59 ` Sean Christopherson
  2022-10-01  0:59 ` [PATCH v4 18/32] KVM: x86: Explicitly skip optimized logical map setup if vCPU's LDR==0 Sean Christopherson
                   ` (15 subsequent siblings)
  32 siblings, 0 replies; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Add a helper to perform the final kick, two instances of the ICR decoding
is one too many.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 40a1ea21074d..dd0e41d454a7 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -317,6 +317,16 @@ void avic_ring_doorbell(struct kvm_vcpu *vcpu)
 	put_cpu();
 }
 
+
+static void avic_kick_vcpu(struct kvm_vcpu *vcpu, u32 icrl)
+{
+	vcpu->arch.apic->irr_pending = true;
+	svm_complete_interrupt_delivery(vcpu,
+					icrl & APIC_MODE_MASK,
+					icrl & APIC_INT_LEVELTRIG,
+					icrl & APIC_VECTOR_MASK);
+}
+
 /*
  * A fast-path version of avic_kick_target_vcpus(), which attempts to match
  * destination APIC ID to vCPU without looping through all vCPUs.
@@ -415,11 +425,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
 	if (unlikely(!target_vcpu))
 		return 0;
 
-	target_vcpu->arch.apic->irr_pending = true;
-	svm_complete_interrupt_delivery(target_vcpu,
-					icrl & APIC_MODE_MASK,
-					icrl & APIC_INT_LEVELTRIG,
-					icrl & APIC_VECTOR_MASK);
+	avic_kick_vcpu(target_vcpu, icrl);
 	return 0;
 }
 
@@ -443,13 +449,8 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
 	 */
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
-					dest, icrl & APIC_DEST_MASK)) {
-			vcpu->arch.apic->irr_pending = true;
-			svm_complete_interrupt_delivery(vcpu,
-							icrl & APIC_MODE_MASK,
-							icrl & APIC_INT_LEVELTRIG,
-							icrl & APIC_VECTOR_MASK);
-		}
+					dest, icrl & APIC_DEST_MASK))
+			avic_kick_vcpu(vcpu, icrl);
 	}
 }
 
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 18/32] KVM: x86: Explicitly skip optimized logical map setup if vCPU's LDR==0
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (16 preceding siblings ...)
  2022-10-01  0:59 ` [PATCH v4 17/32] KVM: SVM: Add helper to perform final AVIC "kick" of single vCPU Sean Christopherson
@ 2022-10-01  0:59 ` Sean Christopherson
  2022-12-08 21:56   ` Maxim Levitsky
  2022-10-01  0:59 ` [PATCH v4 19/32] KVM: x86: Explicitly track all possibilities for APIC map's logical modes Sean Christopherson
                   ` (14 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Explicitly skip the optimized map setup if the vCPU's LDR is '0', i.e. if
the vCPU will never respond to logical mode interrupts.  KVM already
skips setup in this case, but relies on kvm_apic_map_get_logical_dest()
to generate mask==0.  KVM still needs the mask=0 check as a non-zero LDR
can yield mask==0 depending on the mode, but explicitly handling the LDR
will make it simpler to clean up the logical mode tracking in the future.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 42b61469674d..cef8b202490b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -286,10 +286,12 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 			continue;
 
 		ldr = kvm_lapic_get_reg(apic, APIC_LDR);
+		if (!ldr)
+			continue;
 
 		if (apic_x2apic_mode(apic)) {
 			new->mode |= KVM_APIC_MODE_X2APIC;
-		} else if (ldr) {
+		} else {
 			ldr = GET_APIC_LOGICAL_ID(ldr);
 			if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
 				new->mode |= KVM_APIC_MODE_XAPIC_FLAT;
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 19/32] KVM: x86: Explicitly track all possibilities for APIC map's logical modes
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (17 preceding siblings ...)
  2022-10-01  0:59 ` [PATCH v4 18/32] KVM: x86: Explicitly skip optimized logical map setup if vCPU's LDR==0 Sean Christopherson
@ 2022-10-01  0:59 ` Sean Christopherson
  2022-12-08 21:57   ` Maxim Levitsky
  2022-10-01  0:59 ` [PATCH v4 20/32] KVM: x86: Skip redundant x2APIC logical mode optimized cluster setup Sean Christopherson
                   ` (13 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Track all possibilities for the optimized APIC map's logical modes
instead of overloading the pseudo-bitmap and treating any "unknown" value
as "invalid".

As documented by the now-stale comment above the mode values, the values
did have meaning when the optimized map was originally added.  That
dependent logical was removed by commit e45115b62f9a ("KVM: x86: use
physical LAPIC array for logical x2APIC"), but the obfuscated behavior
and its comment were left behind.

Opportunistically rename "mode" to "logical_mode", partly to make it
clear that the "disabled" case applies only to the logical map, but also
to prove that there is no lurking code that expects "mode" to be a bitmap.

Functionally, this is a glorified nop.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 21 +++++++++++---------
 arch/x86/kvm/lapic.c            | 35 +++++++++++++++++++++++++--------
 2 files changed, 39 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 062758135c86..ac28bbfbf0e3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -962,19 +962,22 @@ struct kvm_arch_memory_slot {
 };
 
 /*
- * We use as the mode the number of bits allocated in the LDR for the
- * logical processor ID.  It happens that these are all powers of two.
- * This makes it is very easy to detect cases where the APICs are
- * configured for multiple modes; in that case, we cannot use the map and
- * hence cannot use kvm_irq_delivery_to_apic_fast either.
+ * Track the mode of the optimized logical map, as the rules for decoding the
+ * destination vary per mode.  Enabling the optimized logical map requires all
+ * software-enabled local APIs to be in the same mode, each addressable APIC to
+ * be mapped to only one MDA, and each MDA to map to at most one APIC.
  */
-#define KVM_APIC_MODE_XAPIC_CLUSTER          4
-#define KVM_APIC_MODE_XAPIC_FLAT             8
-#define KVM_APIC_MODE_X2APIC                16
+enum kvm_apic_logical_mode {
+	KVM_APIC_MODE_SW_DISABLED,
+	KVM_APIC_MODE_XAPIC_CLUSTER,
+	KVM_APIC_MODE_XAPIC_FLAT,
+	KVM_APIC_MODE_X2APIC,
+	KVM_APIC_MODE_MAP_DISABLED,
+};
 
 struct kvm_apic_map {
 	struct rcu_head rcu;
-	u8 mode;
+	enum kvm_apic_logical_mode logical_mode;
 	u32 max_apic_id;
 	union {
 		struct kvm_lapic *xapic_flat_map[8];
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index cef8b202490b..9989893fef69 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -168,7 +168,12 @@ static bool kvm_use_posted_timer_interrupt(struct kvm_vcpu *vcpu)
 
 static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
 		u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
-	switch (map->mode) {
+	switch (map->logical_mode) {
+	case KVM_APIC_MODE_SW_DISABLED:
+		/* Arbitrarily use the flat map so that @cluster isn't NULL. */
+		*cluster = map->xapic_flat_map;
+		*mask = 0;
+		return true;
 	case KVM_APIC_MODE_X2APIC: {
 		u32 offset = (dest_id >> 16) * 16;
 		u32 max_apic_id = map->max_apic_id;
@@ -193,8 +198,10 @@ static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
 		*cluster = map->xapic_cluster_map[(dest_id >> 4) & 0xf];
 		*mask = dest_id & 0xf;
 		return true;
+	case KVM_APIC_MODE_MAP_DISABLED:
+		return false;
 	default:
-		/* Not optimized. */
+		WARN_ON_ONCE(1);
 		return false;
 	}
 }
@@ -256,10 +263,12 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 		goto out;
 
 	new->max_apic_id = max_id;
+	new->logical_mode = KVM_APIC_MODE_SW_DISABLED;
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		struct kvm_lapic *apic = vcpu->arch.apic;
 		struct kvm_lapic **cluster;
+		enum kvm_apic_logical_mode logical_mode;
 		u16 mask;
 		u32 ldr;
 		u8 xapic_id;
@@ -282,7 +291,8 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 		if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
 			new->phys_map[xapic_id] = apic;
 
-		if (!kvm_apic_sw_enabled(apic))
+		if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED ||
+		    !kvm_apic_sw_enabled(apic))
 			continue;
 
 		ldr = kvm_lapic_get_reg(apic, APIC_LDR);
@@ -290,17 +300,26 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 			continue;
 
 		if (apic_x2apic_mode(apic)) {
-			new->mode |= KVM_APIC_MODE_X2APIC;
+			logical_mode = KVM_APIC_MODE_X2APIC;
 		} else {
 			ldr = GET_APIC_LOGICAL_ID(ldr);
 			if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
-				new->mode |= KVM_APIC_MODE_XAPIC_FLAT;
+				logical_mode = KVM_APIC_MODE_XAPIC_FLAT;
 			else
-				new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
+				logical_mode = KVM_APIC_MODE_XAPIC_CLUSTER;
 		}
+		if (new->logical_mode != KVM_APIC_MODE_SW_DISABLED &&
+		    new->logical_mode != logical_mode) {
+			new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
+			continue;
+		}
+		new->logical_mode = logical_mode;
 
-		if (!kvm_apic_map_get_logical_dest(new, ldr, &cluster, &mask))
+		if (WARN_ON_ONCE(!kvm_apic_map_get_logical_dest(new, ldr,
+								&cluster, &mask))) {
+			new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
 			continue;
+		}
 
 		if (mask)
 			cluster[ffs(mask) - 1] = apic;
@@ -953,7 +972,7 @@ static bool kvm_apic_is_broadcast_dest(struct kvm *kvm, struct kvm_lapic **src,
 {
 	if (kvm->arch.x2apic_broadcast_quirk_disabled) {
 		if ((irq->dest_id == APIC_BROADCAST &&
-				map->mode != KVM_APIC_MODE_X2APIC))
+		     map->logical_mode != KVM_APIC_MODE_X2APIC))
 			return true;
 		if (irq->dest_id == X2APIC_BROADCAST)
 			return true;
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 20/32] KVM: x86: Skip redundant x2APIC logical mode optimized cluster setup
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (18 preceding siblings ...)
  2022-10-01  0:59 ` [PATCH v4 19/32] KVM: x86: Explicitly track all possibilities for APIC map's logical modes Sean Christopherson
@ 2022-10-01  0:59 ` Sean Christopherson
  2022-12-08 21:57   ` Maxim Levitsky
  2022-10-01  0:59 ` [PATCH v4 21/32] KVM: x86: Disable APIC logical map if logical ID covers multiple MDAs Sean Christopherson
                   ` (12 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Skip the optimized cluster[] setup for x2APIC logical mode, as KVM reuses
the optimized map's phys_map[] and doesn't actually need to insert the
target apic into the cluster[].  The LDR is derived from the x2APIC ID,
and both are read-only in KVM, thus the vCPU's cluster[ldr] is guaranteed
to be the same entry as the vCPU's phys_map[x2apic_id] entry.

Skipping the unnecessary setup will allow a future fix for aliased xAPIC
logical IDs to simply require that cluster[ldr] is non-NULL, i.e. won't
have to special case x2APIC.

Alternatively, the future check could allow "cluster[ldr] == apic", but
that ends up being terribly confusing because cluster[ldr] is only set
at the very end, i.e. it's only possible due to x2APIC's shenanigans.

Another alternative would be to send x2APIC down a separate path _after_
the calculation and then assert that all of the above, but the resulting
code is rather messy, and it's arguably unnecessary since asserting that
the actual LDR matches the expected LDR means that simply testing that
interrupts are delivered correctly provides the same guarantees.

Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9989893fef69..fca8bbb44375 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -166,6 +166,11 @@ static bool kvm_use_posted_timer_interrupt(struct kvm_vcpu *vcpu)
 	return kvm_can_post_timer_interrupt(vcpu) && vcpu->mode == IN_GUEST_MODE;
 }
 
+static inline u32 kvm_apic_calc_x2apic_ldr(u32 id)
+{
+	return ((id >> 4) << 16) | (1 << (id & 0xf));
+}
+
 static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
 		u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
 	switch (map->logical_mode) {
@@ -315,6 +320,18 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 		}
 		new->logical_mode = logical_mode;
 
+		/*
+		 * In x2APIC mode, the LDR is read-only and derived directly
+		 * from the x2APIC ID, thus is guaranteed to be addressable.
+		 * KVM reuses kvm_apic_map.phys_map to optimize logical mode
+		 * x2APIC interrupts by reversing the LDR calculation to get
+		 * cluster of APICs, i.e. no additional work is required.
+		 */
+		if (apic_x2apic_mode(apic)) {
+			WARN_ON_ONCE(ldr != kvm_apic_calc_x2apic_ldr(x2apic_id));
+			continue;
+		}
+
 		if (WARN_ON_ONCE(!kvm_apic_map_get_logical_dest(new, ldr,
 								&cluster, &mask))) {
 			new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
@@ -381,11 +398,6 @@ static inline void kvm_apic_set_dfr(struct kvm_lapic *apic, u32 val)
 	atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
 }
 
-static inline u32 kvm_apic_calc_x2apic_ldr(u32 id)
-{
-	return ((id >> 4) << 16) | (1 << (id & 0xf));
-}
-
 static inline void kvm_apic_set_x2apic_id(struct kvm_lapic *apic, u32 id)
 {
 	u32 ldr = kvm_apic_calc_x2apic_ldr(id);
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 21/32] KVM: x86: Disable APIC logical map if logical ID covers multiple MDAs
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (19 preceding siblings ...)
  2022-10-01  0:59 ` [PATCH v4 20/32] KVM: x86: Skip redundant x2APIC logical mode optimized cluster setup Sean Christopherson
@ 2022-10-01  0:59 ` Sean Christopherson
  2022-10-01  0:59 ` [PATCH v4 22/32] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode Sean Christopherson
                   ` (11 subsequent siblings)
  32 siblings, 0 replies; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Disable the optimized APIC logical map if a logical ID covers multiple
MDAs, i.e. if a vCPU has multiple bits set in its ID.  In logical mode,
events match if "ID & MDA != 0", i.e. creating an entry for only the
first bit can cause interrupts to be missed.

Note, creating an entry for every bit is also wrong as KVM would generate
IPIs for every matching bit.  It would be possible to teach KVM to play
nice with this edge case, but it is very much an edge case and probably
not used in any real world OS, i.e. it's not worth optimizing.

Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/lapic.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index fca8bbb44375..7fd55e24247c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -338,8 +338,14 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 			continue;
 		}
 
-		if (mask)
-			cluster[ffs(mask) - 1] = apic;
+		if (!mask)
+			continue;
+
+		if (!is_power_of_2(mask)) {
+			new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
+			continue;
+		}
+		cluster[ffs(mask) - 1] = apic;
 	}
 out:
 	old = rcu_dereference_protected(kvm->arch.apic_map,
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 22/32] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (20 preceding siblings ...)
  2022-10-01  0:59 ` [PATCH v4 21/32] KVM: x86: Disable APIC logical map if logical ID covers multiple MDAs Sean Christopherson
@ 2022-10-01  0:59 ` Sean Christopherson
  2022-10-01  0:59 ` [PATCH v4 23/32] KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs Sean Christopherson
                   ` (10 subsequent siblings)
  32 siblings, 0 replies; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Disable the optimized APIC logical map if multiple vCPUs are aliased to
the same logical ID.  Architecturally, all CPUs whose logical ID matches
the MDA are supposed to receive the interrupt; overwriting existing map
entries can result in missed IPIs.

Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/lapic.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 7fd55e24247c..14f03e654de4 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -341,11 +341,12 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 		if (!mask)
 			continue;
 
-		if (!is_power_of_2(mask)) {
+		ldr = ffs(mask) - 1;
+		if (!is_power_of_2(mask) || cluster[ldr]) {
 			new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
 			continue;
 		}
-		cluster[ffs(mask) - 1] = apic;
+		cluster[ldr] = apic;
 	}
 out:
 	old = rcu_dereference_protected(kvm->arch.apic_map,
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 23/32] KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (21 preceding siblings ...)
  2022-10-01  0:59 ` [PATCH v4 22/32] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode Sean Christopherson
@ 2022-10-01  0:59 ` Sean Christopherson
  2022-12-08 21:58   ` Maxim Levitsky
  2022-10-01  0:59 ` [PATCH v4 24/32] KVM: x86: Inhibit APICv/AVIC if the optimized physical map is disabled Sean Christopherson
                   ` (9 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Apply KVM's hotplug hack if and only if userspace has enabled 32-bit IDs
for x2APIC.  If 32-bit IDs are not enabled, disable the optimized map to
honor x86 architectural behavior if multiple vCPUs shared a physical APIC
ID.  As called out in the changelog that added the hack, all CPUs whose
(possibly truncated) APIC ID matches the target are supposed to receive
the IPI.

  KVM intentionally differs from real hardware, because real hardware
  (Knights Landing) does just "x2apic_id & 0xff" to decide whether to
  accept the interrupt in xAPIC mode and it can deliver one interrupt to
  more than one physical destination, e.g. 0x123 to 0x123 and 0x23.

Applying the hack even when x2APIC is not fully enabled means KVM doesn't
correctly handle scenarios where the guest has aliased xAPIC IDs across
multiple vCPUs, as only the vCPU with the lowest vCPU ID will receive any
interrupts.  It's extremely unlikely any real world guest aliase APIC IDs,
or even modifies APIC IDs, but KVM's behavior is arbitrary, e.g. the
lowest vCPU ID "wins" regardless of which vCPU is "aliasing" and which
vCPU is "normal".

Furthermore, the hack is _not_ guaranteed to work!  The hack works if and
only if the optimized APIC map is successfully allocated.  If the map
allocation fails (unlikely), KVM will fall back to its unoptimized
behavior, which _does_ honor the architectural behavior.

Pivot on 32-bit x2APIC IDs being enabled as that is required to take
advantage of the hotplug hack (see kvm_apic_state_fixup()), i.e. won't
break existing setups unless they are way, way off in the weeds.

And an entry in KVM's errata to document the hack.  Alternatively, KVM
could provide an actual x2APIC quirk and document the hack that way, but
there's unlikely to ever be a use case for disabling the quirk.  Go the
errata route to avoid having to validate a quirk no one cares about.

Fixes: 5bd5db385b3e ("KVM: x86: allow hotplug of VCPU with APIC ID over 0xff")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/x86/errata.rst | 11 ++++++
 arch/x86/kvm/lapic.c                  | 50 ++++++++++++++++++++++-----
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/Documentation/virt/kvm/x86/errata.rst b/Documentation/virt/kvm/x86/errata.rst
index 410e0aa63493..49a05f24747b 100644
--- a/Documentation/virt/kvm/x86/errata.rst
+++ b/Documentation/virt/kvm/x86/errata.rst
@@ -37,3 +37,14 @@ Nested virtualization features
 ------------------------------
 
 TBD
+
+x2APIC
+------
+When KVM_X2APIC_API_USE_32BIT_IDS is enabled, KVM activates a hack/quirk that
+allows sending events to a single vCPU using its x2APIC ID even if the target
+vCPU has legacy xAPIC enabled, e.g. to bring up hotplugged vCPUs via INIT-SIPI
+on VMs with > 255 vCPUs.  A side effect of the quirk is that, if multiple vCPUs
+have the same physical APIC ID, KVM will deliver events targeting that APIC ID
+only to the vCPU with the lowest vCPU ID.  If KVM_X2APIC_API_USE_32BIT_IDS is
+not enabled, KVM follows x86 architecture when processing interrupts (all vCPUs
+matching the target APIC ID receive the interrupt).
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 14f03e654de4..340c2d3e781b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -274,10 +274,10 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 		struct kvm_lapic *apic = vcpu->arch.apic;
 		struct kvm_lapic **cluster;
 		enum kvm_apic_logical_mode logical_mode;
+		u32 x2apic_id, physical_id;
 		u16 mask;
 		u32 ldr;
 		u8 xapic_id;
-		u32 x2apic_id;
 
 		if (!kvm_apic_present(vcpu))
 			continue;
@@ -285,16 +285,48 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 		xapic_id = kvm_xapic_id(apic);
 		x2apic_id = kvm_x2apic_id(apic);
 
-		/* Hotplug hack: see kvm_apic_match_physical_addr(), ... */
-		if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
-				x2apic_id <= new->max_apic_id)
-			new->phys_map[x2apic_id] = apic;
 		/*
-		 * ... xAPIC ID of VCPUs with APIC ID > 0xff will wrap-around,
-		 * prevent them from masking VCPUs with APIC ID <= 0xff.
+		 * Apply KVM's hotplug hack if userspace has enable 32-bit APIC
+		 * IDs.  Allow sending events to vCPUs by their x2APIC ID even
+		 * if the target vCPU is in legacy xAPIC mode, and silently
+		 * ignore aliased xAPIC IDs (the x2APIC ID is truncated to 8
+		 * bits, causing IDs > 0xff to wrap and collide).
+		 *
+		 * Honor the architectural (and KVM's non-optimized) behavior
+		 * if userspace has not enabled 32-bit x2APIC IDs.  Each APIC
+		 * is supposed to process messages independently.  If multiple
+		 * vCPUs have the same effective APIC ID, e.g. due to the
+		 * x2APIC wrap or because the guest manually modified its xAPIC
+		 * IDs, events targeting that ID are supposed to be recognized
+		 * by all vCPUs with said ID.
 		 */
-		if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
-			new->phys_map[xapic_id] = apic;
+		if (kvm->arch.x2apic_format) {
+			/* See also kvm_apic_match_physical_addr(). */
+			if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
+			    x2apic_id <= new->max_apic_id)
+				new->phys_map[x2apic_id] = apic;
+
+			if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
+				new->phys_map[xapic_id] = apic;
+		} else {
+			/*
+			 * Disable the optimized map if the physical APIC ID is
+			 * already mapped, i.e. is aliased to multiple vCPUs.
+			 * The optimized map requires a strict 1:1 mapping
+			 * between IDs and vCPUs.
+			 */
+			if (apic_x2apic_mode(apic))
+				physical_id = x2apic_id;
+			else
+				physical_id = xapic_id;
+
+			if (new->phys_map[physical_id]) {
+				kvfree(new);
+				new = NULL;
+				goto out;
+			}
+			new->phys_map[physical_id] = apic;
+		}
 
 		if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED ||
 		    !kvm_apic_sw_enabled(apic))
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 24/32] KVM: x86: Inhibit APICv/AVIC if the optimized physical map is disabled
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (22 preceding siblings ...)
  2022-10-01  0:59 ` [PATCH v4 23/32] KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs Sean Christopherson
@ 2022-10-01  0:59 ` Sean Christopherson
  2022-12-08 21:58   ` Maxim Levitsky
  2022-10-01  0:59 ` [PATCH v4 25/32] KVM: SVM: Inhibit AVIC if vCPUs are aliased in logical mode Sean Christopherson
                   ` (8 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Inhibit APICv/AVIC if the optimized physical map is disabled so that KVM
KVM provides consistent APIC behavior if xAPIC IDs are aliased due to
vcpu_id being truncated and the x2APIC hotplug hack isn't enabled.  If
the hotplug hack is disabled, events that are emulated by KVM will follow
architectural behavior (all matching vCPUs receive events, even if the
"match" is due to truncation), whereas APICv and AVIC will deliver events
only to the first matching vCPU, i.e. the vCPU that matches without
truncation.

Note, the "extra" inhibit is needed because  KVM deliberately ignores
mismatches due to truncation when applying the APIC_ID_MODIFIED inhibit
so that large VMs (>255 vCPUs) can run with APICv/AVIC.

Fixes: TDB
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  6 ++++++
 arch/x86/kvm/lapic.c            | 13 ++++++++++++-
 arch/x86/kvm/svm/avic.c         |  1 +
 arch/x86/kvm/vmx/vmx.c          |  1 +
 4 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ac28bbfbf0e3..171e38b94c89 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1104,6 +1104,12 @@ enum kvm_apicv_inhibit {
 	 */
 	APICV_INHIBIT_REASON_BLOCKIRQ,
 
+	/*
+	 * APICv is disabled because not all vCPUs have a 1:1 mapping between
+	 * APIC ID and vCPU, _and_ KVM is not applying its x2APIC hotplug hack.
+	 */
+	APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED,
+
 	/*
 	 * For simplicity, the APIC acceleration is inhibited
 	 * first time either APIC ID or APIC base are changed by the guest
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 340c2d3e781b..f6f328d36ae2 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -381,6 +381,16 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 		cluster[ldr] = apic;
 	}
 out:
+	/*
+	 * The optimized map is effectively KVM's internal version of APICv,
+	 * and all unwanted aliasing that results in disabling the optimized
+	 * map also applies to APICv.
+	 */
+	if (!new)
+		kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED);
+	else
+		kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED);
+
 	old = rcu_dereference_protected(kvm->arch.apic_map,
 			lockdep_is_held(&kvm->arch.apic_map_lock));
 	rcu_assign_pointer(kvm->arch.apic_map, new);
@@ -2153,7 +2163,8 @@ static void kvm_lapic_xapic_id_updated(struct kvm_lapic *apic)
 	/*
 	 * Deliberately truncate the vCPU ID when detecting a modified APIC ID
 	 * to avoid false positives if the vCPU ID, i.e. x2APIC ID, is a 32-bit
-	 * value.
+	 * value.  If the wrap/truncation results in unwatned aliasing, APICv
+	 * will be inhibited as part of updating KVM's optimized APIC maps.
 	 */
 	if (kvm_xapic_id(apic) == (u8)apic->vcpu->vcpu_id)
 		return;
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index dd0e41d454a7..2908adc79ea6 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -965,6 +965,7 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
 			  BIT(APICV_INHIBIT_REASON_PIT_REINJ) |
 			  BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
 			  BIT(APICV_INHIBIT_REASON_SEV)      |
+			  BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |
 			  BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
 			  BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
 			  BIT(APICV_INHIBIT_REASON_X2APIC);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 974d9a366d5d..5920166d7260 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7955,6 +7955,7 @@ static bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
 			  BIT(APICV_INHIBIT_REASON_ABSENT) |
 			  BIT(APICV_INHIBIT_REASON_HYPERV) |
 			  BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
+			  BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |
 			  BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
 			  BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED);
 
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 25/32] KVM: SVM: Inhibit AVIC if vCPUs are aliased in logical mode
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (23 preceding siblings ...)
  2022-10-01  0:59 ` [PATCH v4 24/32] KVM: x86: Inhibit APICv/AVIC if the optimized physical map is disabled Sean Christopherson
@ 2022-10-01  0:59 ` Sean Christopherson
  2022-12-08 21:58   ` Maxim Levitsky
  2022-10-01  0:59 ` [PATCH v4 26/32] KVM: SVM: Always update local APIC on writes to logical dest register Sean Christopherson
                   ` (7 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Inhibit SVM's AVIC if multiple vCPUs are aliased to the same logical ID.
Architecturally, all CPUs whose logical ID matches the MDA are supposed
to receive the interrupt; overwriting existing entries in AVIC's
logical=>physical map can result in missed IPIs.

Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 6 ++++++
 arch/x86/kvm/lapic.c            | 5 +++++
 arch/x86/kvm/svm/avic.c         | 3 ++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 171e38b94c89..4fd06965c773 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1159,6 +1159,12 @@ enum kvm_apicv_inhibit {
 	 * APIC base.  For simplicity, this is sticky.
 	 */
 	APICV_INHIBIT_REASON_X2APIC,
+
+	/*
+	 * AVIC is disabled because not all vCPUs with a valid LDR have a 1:1
+	 * mapping between logical ID and vCPU.
+	 */
+	APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED,
 };
 
 struct kvm_arch {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index f6f328d36ae2..9b3af49d2524 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -391,6 +391,11 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 	else
 		kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED);
 
+	if (!new || new->logical_mode == KVM_APIC_MODE_MAP_DISABLED)
+		kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);
+	else
+		kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);
+
 	old = rcu_dereference_protected(kvm->arch.apic_map,
 			lockdep_is_held(&kvm->arch.apic_map_lock));
 	rcu_assign_pointer(kvm->arch.apic_map, new);
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 2908adc79ea6..27d5abc15a91 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -968,7 +968,8 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
 			  BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |
 			  BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
 			  BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
-			  BIT(APICV_INHIBIT_REASON_X2APIC);
+			  BIT(APICV_INHIBIT_REASON_X2APIC) |
+			  BIT(APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);
 
 	return supported & BIT(reason);
 }
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 26/32] KVM: SVM: Always update local APIC on writes to logical dest register
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (24 preceding siblings ...)
  2022-10-01  0:59 ` [PATCH v4 25/32] KVM: SVM: Inhibit AVIC if vCPUs are aliased in logical mode Sean Christopherson
@ 2022-10-01  0:59 ` Sean Christopherson
  2022-12-08 21:58   ` Maxim Levitsky
  2022-10-01  0:59 ` [PATCH v4 27/32] KVM: SVM: Update svm->ldr_reg cache even if LDR is "bad" Sean Christopherson
                   ` (6 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Update the vCPU's local (virtual) APIC on LDR writes even if the write
"fails".  The APIC needs to recalc the optimized logical map even if the
LDR is invalid or zero, e.g. if the guest clears its LDR, the optimized
map will be left as is and the vCPU will receive interrupts using its
old LDR.

Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 27d5abc15a91..2b640c73f447 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -573,7 +573,7 @@ static void avic_invalidate_logical_id_entry(struct kvm_vcpu *vcpu)
 		clear_bit(AVIC_LOGICAL_ID_ENTRY_VALID_BIT, (unsigned long *)entry);
 }
 
-static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)
+static void avic_handle_ldr_update(struct kvm_vcpu *vcpu)
 {
 	int ret = 0;
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -582,10 +582,10 @@ static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)
 
 	/* AVIC does not support LDR update for x2APIC */
 	if (apic_x2apic_mode(vcpu->arch.apic))
-		return 0;
+		return;
 
 	if (ldr == svm->ldr_reg)
-		return 0;
+		return;
 
 	avic_invalidate_logical_id_entry(vcpu);
 
@@ -594,8 +594,6 @@ static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)
 
 	if (!ret)
 		svm->ldr_reg = ldr;
-
-	return ret;
 }
 
 static void avic_handle_dfr_update(struct kvm_vcpu *vcpu)
@@ -617,8 +615,7 @@ static int avic_unaccel_trap_write(struct kvm_vcpu *vcpu)
 
 	switch (offset) {
 	case APIC_LDR:
-		if (avic_handle_ldr_update(vcpu))
-			return 0;
+		avic_handle_ldr_update(vcpu);
 		break;
 	case APIC_DFR:
 		avic_handle_dfr_update(vcpu);
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 27/32] KVM: SVM: Update svm->ldr_reg cache even if LDR is "bad"
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (25 preceding siblings ...)
  2022-10-01  0:59 ` [PATCH v4 26/32] KVM: SVM: Always update local APIC on writes to logical dest register Sean Christopherson
@ 2022-10-01  0:59 ` Sean Christopherson
  2022-12-08 21:59   ` Maxim Levitsky
  2022-10-01  0:59 ` [PATCH v4 28/32] KVM: SVM: Require logical ID to be power-of-2 for AVIC entry Sean Christopherson
                   ` (5 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Update SVM's cache of the LDR even if the new value is "bad".  Leaving
stale information in the cache can result in KVM missing updates and/or
invalidating the wrong entry, e.g. if avic_invalidate_logical_id_entry()
is triggered after a different vCPU has "claimed" the old LDR.

Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 2b640c73f447..4b6fc9d64f4d 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -539,23 +539,24 @@ static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat)
 	return &logical_apic_id_table[index];
 }
 
-static int avic_ldr_write(struct kvm_vcpu *vcpu, u8 g_physical_id, u32 ldr)
+static void avic_ldr_write(struct kvm_vcpu *vcpu, u8 g_physical_id, u32 ldr)
 {
 	bool flat;
 	u32 *entry, new_entry;
 
+	if (!ldr)
+		return;
+
 	flat = kvm_lapic_get_reg(vcpu->arch.apic, APIC_DFR) == APIC_DFR_FLAT;
 	entry = avic_get_logical_id_entry(vcpu, ldr, flat);
 	if (!entry)
-		return -EINVAL;
+		return;
 
 	new_entry = READ_ONCE(*entry);
 	new_entry &= ~AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
 	new_entry |= (g_physical_id & AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK);
 	new_entry |= AVIC_LOGICAL_ID_ENTRY_VALID_MASK;
 	WRITE_ONCE(*entry, new_entry);
-
-	return 0;
 }
 
 static void avic_invalidate_logical_id_entry(struct kvm_vcpu *vcpu)
@@ -575,7 +576,6 @@ static void avic_invalidate_logical_id_entry(struct kvm_vcpu *vcpu)
 
 static void avic_handle_ldr_update(struct kvm_vcpu *vcpu)
 {
-	int ret = 0;
 	struct vcpu_svm *svm = to_svm(vcpu);
 	u32 ldr = kvm_lapic_get_reg(vcpu->arch.apic, APIC_LDR);
 	u32 id = kvm_xapic_id(vcpu->arch.apic);
@@ -589,11 +589,8 @@ static void avic_handle_ldr_update(struct kvm_vcpu *vcpu)
 
 	avic_invalidate_logical_id_entry(vcpu);
 
-	if (ldr)
-		ret = avic_ldr_write(vcpu, id, ldr);
-
-	if (!ret)
-		svm->ldr_reg = ldr;
+	svm->ldr_reg = ldr;
+	avic_ldr_write(vcpu, id, ldr);
 }
 
 static void avic_handle_dfr_update(struct kvm_vcpu *vcpu)
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 28/32] KVM: SVM: Require logical ID to be power-of-2 for AVIC entry
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (26 preceding siblings ...)
  2022-10-01  0:59 ` [PATCH v4 27/32] KVM: SVM: Update svm->ldr_reg cache even if LDR is "bad" Sean Christopherson
@ 2022-10-01  0:59 ` Sean Christopherson
  2022-12-08 22:00   ` Maxim Levitsky
  2022-10-01  0:59 ` [PATCH v4 29/32] KVM: SVM: Handle multiple logical targets in AVIC kick fastpath Sean Christopherson
                   ` (4 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Do not modify AVIC's logical ID table if the logical ID portion of the
LDR is not a power-of-2, i.e. if the LDR has multiple bits set.  Taking
only the first bit means that KVM will fail to match MDAs that intersect
with "higher" bits in the "ID"

The "ID" acts as a bitmap, but is referred to as an ID because theres an
implicit, unenforced "requirement" that software only set one bit.  This
edge case is arguably out-of-spec behavior, but KVM cleanly handles it
in all other cases, e.g. the optimized logical map (and AVIC!) is also
disabled in this scenario.

Refactor the code to consolidate the checks, and so that the code looks
more like avic_kick_target_vcpus_fast().

Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 4b6fc9d64f4d..a9e4e09f83fc 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -513,26 +513,26 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu)
 static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat)
 {
 	struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);
-	int index;
 	u32 *logical_apic_id_table;
-	int dlid = GET_APIC_LOGICAL_ID(ldr);
+	u32 cluster, index;
 
-	if (!dlid)
-		return NULL;
+	ldr = GET_APIC_LOGICAL_ID(ldr);
 
-	if (flat) { /* flat */
-		index = ffs(dlid) - 1;
-		if (index > 7)
+	if (flat) {
+		cluster = 0;
+	} else {
+		cluster = (ldr >> 4) << 2;
+		if (cluster >= 0xf)
 			return NULL;
-	} else { /* cluster */
-		int cluster = (dlid & 0xf0) >> 4;
-		int apic = ffs(dlid & 0x0f) - 1;
-
-		if ((apic < 0) || (apic > 7) ||
-		    (cluster >= 0xf))
-			return NULL;
-		index = (cluster << 2) + apic;
+		ldr &= 0xf;
 	}
+	if (!ldr || !is_power_of_2(ldr))
+		return NULL;
+
+	index = __ffs(ldr);
+	if (WARN_ON_ONCE(index > 7))
+		return NULL;
+	index += (cluster << 2);
 
 	logical_apic_id_table = (u32 *) page_address(kvm_svm->avic_logical_id_table_page);
 
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 29/32] KVM: SVM: Handle multiple logical targets in AVIC kick fastpath
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (27 preceding siblings ...)
  2022-10-01  0:59 ` [PATCH v4 28/32] KVM: SVM: Require logical ID to be power-of-2 for AVIC entry Sean Christopherson
@ 2022-10-01  0:59 ` Sean Christopherson
  2022-12-08 22:00   ` Maxim Levitsky
  2022-10-01  0:59 ` [PATCH v4 30/32] KVM: SVM: Ignore writes to Remote Read Data on AVIC write traps Sean Christopherson
                   ` (3 subsequent siblings)
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Iterate over all target logical IDs in the AVIC kick fastpath instead of
bailing if there is more than one target.  Now that KVM inhibits AVIC if
vCPUs aren't mapped 1:1 with logical IDs, each bit in the destination is
guaranteed to match to at most one vCPU, i.e. iterating over the bitmap
is guaranteed to kick each valid target exactly once.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 112 ++++++++++++++++++++++------------------
 1 file changed, 63 insertions(+), 49 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index a9e4e09f83fc..17e64b056e4e 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -327,6 +327,50 @@ static void avic_kick_vcpu(struct kvm_vcpu *vcpu, u32 icrl)
 					icrl & APIC_VECTOR_MASK);
 }
 
+static void avic_kick_vcpu_by_physical_id(struct kvm *kvm, u32 physical_id,
+					  u32 icrl)
+{
+	/*
+	 * KVM inhibits AVIC if any vCPU ID diverges from the vCPUs APIC ID,
+	 * i.e. APIC ID == vCPU ID.
+	 */
+	struct kvm_vcpu *target_vcpu = kvm_get_vcpu_by_id(kvm, physical_id);
+
+	/* Once again, nothing to do if the target vCPU doesn't exist. */
+	if (unlikely(!target_vcpu))
+		return;
+
+	avic_kick_vcpu(target_vcpu, icrl);
+}
+
+static void avic_kick_vcpu_by_logical_id(struct kvm *kvm, u32 *avic_logical_id_table,
+					 u32 logid_index, u32 icrl)
+{
+	u32 physical_id;
+
+	if (avic_logical_id_table) {
+		u32 logid_entry = avic_logical_id_table[logid_index];
+
+		/* Nothing to do if the logical destination is invalid. */
+		if (unlikely(!(logid_entry & AVIC_LOGICAL_ID_ENTRY_VALID_MASK)))
+			return;
+
+		physical_id = logid_entry &
+			      AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
+	} else {
+		/*
+		 * For x2APIC, the logical APIC ID is a read-only value that is
+		 * derived from the x2APIC ID, thus the x2APIC ID can be found
+		 * by reversing the calculation (stored in logid_index).  Note,
+		 * bits 31:20 of the x2APIC ID aren't propagated to the logical
+		 * ID, but KVM limits the x2APIC ID limited to KVM_MAX_VCPU_IDS.
+		 */
+		physical_id = logid_index;
+	}
+
+	avic_kick_vcpu_by_physical_id(kvm, physical_id, icrl);
+}
+
 /*
  * A fast-path version of avic_kick_target_vcpus(), which attempts to match
  * destination APIC ID to vCPU without looping through all vCPUs.
@@ -334,11 +378,10 @@ static void avic_kick_vcpu(struct kvm_vcpu *vcpu, u32 icrl)
 static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source,
 				       u32 icrl, u32 icrh, u32 index)
 {
-	u32 l1_physical_id, dest;
-	struct kvm_vcpu *target_vcpu;
 	int dest_mode = icrl & APIC_DEST_MASK;
 	int shorthand = icrl & APIC_SHORT_MASK;
 	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
+	u32 dest;
 
 	if (shorthand != APIC_DEST_NOSHORT)
 		return -EINVAL;
@@ -355,14 +398,14 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
 		if (!apic_x2apic_mode(source) && dest == APIC_BROADCAST)
 			return -EINVAL;
 
-		l1_physical_id = dest;
-
-		if (WARN_ON_ONCE(l1_physical_id != index))
+		if (WARN_ON_ONCE(dest != index))
 			return -EINVAL;
 
+		avic_kick_vcpu_by_physical_id(kvm, dest, icrl);
 	} else {
-		u32 bitmap, cluster;
-		int logid_index;
+		u32 *avic_logical_id_table;
+		unsigned long bitmap, i;
+		u32 cluster;
 
 		if (apic_x2apic_mode(source)) {
 			/* 16 bit dest mask, 16 bit cluster id */
@@ -382,50 +425,21 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
 		if (unlikely(!bitmap))
 			return 0;
 
-		if (!is_power_of_2(bitmap))
-			/* multiple logical destinations, use slow path */
-			return -EINVAL;
-
-		logid_index = cluster + __ffs(bitmap);
-
-		if (apic_x2apic_mode(source)) {
-			/*
-			 * For x2APIC, the logical APIC ID is a read-only value
-			 * that is derived from the x2APIC ID, thus the x2APIC
-			 * ID can be found by reversing the calculation (done
-			 * above).  Note, bits 31:20 of the x2APIC ID are not
-			 * propagated to the logical ID, but KVM limits the
-			 * x2APIC ID limited to KVM_MAX_VCPU_IDS.
-			 */
-			l1_physical_id = logid_index;
-		} else {
-			u32 *avic_logical_id_table =
-				page_address(kvm_svm->avic_logical_id_table_page);
-
-			u32 logid_entry = avic_logical_id_table[logid_index];
-
-			if (WARN_ON_ONCE(index != logid_index))
-				return -EINVAL;
-
-			/* Nothing to do if the logical destination is invalid. */
-			if (unlikely(!(logid_entry & AVIC_LOGICAL_ID_ENTRY_VALID_MASK)))
-				return 0;
-
-			l1_physical_id = logid_entry &
-					 AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
-		}
+		if (apic_x2apic_mode(source))
+			avic_logical_id_table = NULL;
+		else
+			avic_logical_id_table = page_address(kvm_svm->avic_logical_id_table_page);
+
+		/*
+		 * AVIC is inhibited if vCPUs aren't mapped 1:1 with logical
+		 * IDs, thus each bit in the destination is guaranteed to map
+		 * to at most one vCPU.
+		 */
+		for_each_set_bit(i, &bitmap, 16)
+			avic_kick_vcpu_by_logical_id(kvm, avic_logical_id_table,
+						     cluster + i, icrl);
 	}
 
-	/*
-	 * KVM inhibits AVIC if any vCPU ID diverges from the vCPUs APIC ID,
-	 * i.e. APIC ID == vCPU ID.  Once again, nothing to do if the target
-	 * vCPU doesn't exist.
-	 */
-	target_vcpu = kvm_get_vcpu_by_id(kvm, l1_physical_id);
-	if (unlikely(!target_vcpu))
-		return 0;
-
-	avic_kick_vcpu(target_vcpu, icrl);
 	return 0;
 }
 
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 30/32] KVM: SVM: Ignore writes to Remote Read Data on AVIC write traps
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (28 preceding siblings ...)
  2022-10-01  0:59 ` [PATCH v4 29/32] KVM: SVM: Handle multiple logical targets in AVIC kick fastpath Sean Christopherson
@ 2022-10-01  0:59 ` Sean Christopherson
  2022-10-01  0:59 ` [PATCH v4 31/32] Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on a running vcpu" Sean Christopherson
                   ` (2 subsequent siblings)
  32 siblings, 0 replies; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Drop writes to APIC_RRR, a.k.a. Remote Read Data Register, on AVIC
unaccelerated write traps.  The register is read-only and isn't emulated
by KVM.  Sending the register through kvm_apic_write_nodecode() will
result in screaming when x2APIC is enabled due to the unexpected failure
to retrieve the MSR (KVM expects that only "legal" accesses will trap).

Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 17e64b056e4e..953b1fd14b6d 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -631,6 +631,9 @@ static int avic_unaccel_trap_write(struct kvm_vcpu *vcpu)
 	case APIC_DFR:
 		avic_handle_dfr_update(vcpu);
 		break;
+	case APIC_RRR:
+		/* Ignore writes to Read Remote Data, it's read-only. */
+		return 1;
 	default:
 		break;
 	}
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 31/32] Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on a running vcpu"
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (29 preceding siblings ...)
  2022-10-01  0:59 ` [PATCH v4 30/32] KVM: SVM: Ignore writes to Remote Read Data on AVIC write traps Sean Christopherson
@ 2022-10-01  0:59 ` Sean Christopherson
  2022-12-08 22:01   ` Maxim Levitsky
  2022-10-01  0:59 ` [PATCH v4 32/32] KVM: x86: Track required APICv inhibits with variable, not callback Sean Christopherson
  2022-12-27 11:22 ` [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Paolo Bonzini
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Turns out that some warnings exist for good reasons.  Restore the warning
in avic_vcpu_load() that guards against calling avic_vcpu_load() on a
running vCPU now that KVM avoids doing so when switching between x2APIC
and xAPIC.  The entire point of the WARN is to highlight that KVM should
not be reloading an AVIC.

Opportunistically convert the WARN_ON() to WARN_ON_ONCE() to avoid
spamming the kernel if it does fire.

This reverts commit c0caeee65af3944b7b8abbf566e7cc1fae15c775.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 953b1fd14b6d..35b0ef877e53 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -1038,6 +1038,7 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		return;
 
 	entry = READ_ONCE(*(svm->avic_physical_id_cache));
+	WARN_ON_ONCE(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
 
 	entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
 	entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 32/32] KVM: x86: Track required APICv inhibits with variable, not callback
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (30 preceding siblings ...)
  2022-10-01  0:59 ` [PATCH v4 31/32] Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on a running vcpu" Sean Christopherson
@ 2022-10-01  0:59 ` Sean Christopherson
  2022-12-08 22:03   ` Maxim Levitsky
  2022-12-27 11:22 ` [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Paolo Bonzini
  32 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-10-01  0:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Track the per-vendor required APICv inhibits with a variable instead of
calling into vendor code every time KVM wants to query the set of
required inhibits.  The required inhibits are a property of the vendor's
virtualization architecture, i.e. are 100% static.

Using a variable allows the compiler to inline the check, e.g. generate
a single-uop TEST+Jcc, and thus eliminates any desire to avoid checking
inhibits for performance reasons.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 -
 arch/x86/include/asm/kvm_host.h    |  2 +-
 arch/x86/kvm/svm/avic.c            | 20 --------------------
 arch/x86/kvm/svm/svm.c             |  2 +-
 arch/x86/kvm/svm/svm.h             | 17 ++++++++++++++++-
 arch/x86/kvm/vmx/vmx.c             | 24 +++++++++++-------------
 arch/x86/kvm/x86.c                 |  9 +++++++--
 7 files changed, 36 insertions(+), 39 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 82ba4a564e58..533d0e804e5a 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -76,7 +76,6 @@ KVM_X86_OP(set_nmi_mask)
 KVM_X86_OP(enable_nmi_window)
 KVM_X86_OP(enable_irq_window)
 KVM_X86_OP_OPTIONAL(update_cr8_intercept)
-KVM_X86_OP(check_apicv_inhibit_reasons)
 KVM_X86_OP(refresh_apicv_exec_ctrl)
 KVM_X86_OP_OPTIONAL(hwapic_irr_update)
 KVM_X86_OP_OPTIONAL(hwapic_isr_update)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4fd06965c773..41d1c684a1e1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1566,7 +1566,7 @@ struct kvm_x86_ops {
 	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
 	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
 	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
-	bool (*check_apicv_inhibit_reasons)(enum kvm_apicv_inhibit reason);
+	const unsigned long required_apicv_inhibits;
 	void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
 	void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
 	void (*hwapic_isr_update)(int isr);
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 35b0ef877e53..5b25944274b5 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -966,26 +966,6 @@ int avic_pi_update_irte(struct kvm *kvm, unsigned int host_irq,
 	return ret;
 }
 
-bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
-{
-	ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
-			  BIT(APICV_INHIBIT_REASON_ABSENT) |
-			  BIT(APICV_INHIBIT_REASON_HYPERV) |
-			  BIT(APICV_INHIBIT_REASON_NESTED) |
-			  BIT(APICV_INHIBIT_REASON_IRQWIN) |
-			  BIT(APICV_INHIBIT_REASON_PIT_REINJ) |
-			  BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
-			  BIT(APICV_INHIBIT_REASON_SEV)      |
-			  BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |
-			  BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
-			  BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
-			  BIT(APICV_INHIBIT_REASON_X2APIC) |
-			  BIT(APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);
-
-	return supported & BIT(reason);
-}
-
-
 static inline int
 avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r)
 {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 37fe7bcf8496..6bca3f9f39e6 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4800,8 +4800,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.update_cr8_intercept = svm_update_cr8_intercept,
 	.set_virtual_apic_mode = avic_refresh_virtual_apic_mode,
 	.refresh_apicv_exec_ctrl = avic_refresh_apicv_exec_ctrl,
-	.check_apicv_inhibit_reasons = avic_check_apicv_inhibit_reasons,
 	.apicv_post_state_restore = avic_apicv_post_state_restore,
+	.required_apicv_inhibits = AVIC_REQUIRED_APICV_INHIBITS,
 
 	.get_exit_info = svm_get_exit_info,
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 29c334a932c3..dfde85782da5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -619,6 +619,22 @@ void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);
 extern struct kvm_x86_nested_ops svm_nested_ops;
 
 /* avic.c */
+#define AVIC_REQUIRED_APICV_INHIBITS			\
+(							\
+	BIT(APICV_INHIBIT_REASON_DISABLE) |		\
+	BIT(APICV_INHIBIT_REASON_ABSENT) |		\
+	BIT(APICV_INHIBIT_REASON_HYPERV) |		\
+	BIT(APICV_INHIBIT_REASON_NESTED) |		\
+	BIT(APICV_INHIBIT_REASON_IRQWIN) |		\
+	BIT(APICV_INHIBIT_REASON_PIT_REINJ) |		\
+	BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |		\
+	BIT(APICV_INHIBIT_REASON_SEV)      |		\
+	BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |	\
+	BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |	\
+	BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |	\
+	BIT(APICV_INHIBIT_REASON_X2APIC) |		\
+	BIT(APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED)	\
+)
 
 bool avic_hardware_setup(struct kvm_x86_ops *ops);
 int avic_ga_log_notifier(u32 ga_tag);
@@ -632,7 +648,6 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void avic_vcpu_put(struct kvm_vcpu *vcpu);
 void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu);
 void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu);
-bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason);
 int avic_pi_update_irte(struct kvm *kvm, unsigned int host_irq,
 			uint32_t guest_irq, bool set);
 void avic_vcpu_blocking(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5920166d7260..54bc447ba87e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7949,18 +7949,16 @@ static void vmx_hardware_unsetup(void)
 	free_kvm_area();
 }
 
-static bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
-{
-	ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
-			  BIT(APICV_INHIBIT_REASON_ABSENT) |
-			  BIT(APICV_INHIBIT_REASON_HYPERV) |
-			  BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
-			  BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |
-			  BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
-			  BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED);
-
-	return supported & BIT(reason);
-}
+#define VMX_REQUIRED_APICV_INHIBITS			\
+(							\
+	BIT(APICV_INHIBIT_REASON_DISABLE)|		\
+	BIT(APICV_INHIBIT_REASON_ABSENT) |		\
+	BIT(APICV_INHIBIT_REASON_HYPERV) |		\
+	BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |		\
+	BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |	\
+	BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |	\
+	BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED)	\
+)
 
 static void vmx_vm_destroy(struct kvm *kvm)
 {
@@ -8044,7 +8042,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl,
 	.load_eoi_exitmap = vmx_load_eoi_exitmap,
 	.apicv_post_state_restore = vmx_apicv_post_state_restore,
-	.check_apicv_inhibit_reasons = vmx_check_apicv_inhibit_reasons,
+	.required_apicv_inhibits = VMX_REQUIRED_APICV_INHIBITS,
 	.hwapic_irr_update = vmx_hwapic_irr_update,
 	.hwapic_isr_update = vmx_hwapic_isr_update,
 	.guest_apic_has_interrupt = vmx_guest_apic_has_interrupt,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a20002924eb4..b307637d9200 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10251,6 +10251,11 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
 	kvm_make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
 }
 
+static bool kvm_is_required_apicv_inhibit(enum kvm_apicv_inhibit reason)
+{
+	return kvm_x86_ops.required_apicv_inhibits & BIT(reason);
+}
+
 void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
@@ -10294,7 +10299,7 @@ static void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 		return;
 
 	if (apic_x2apic_mode(vcpu->arch.apic) &&
-	    static_call(kvm_x86_check_apicv_inhibit_reasons)(APICV_INHIBIT_REASON_X2APIC))
+	    kvm_is_required_apicv_inhibit(APICV_INHIBIT_REASON_X2APIC))
 		kvm_inhibit_apic_access_page(vcpu);
 
 	__kvm_vcpu_update_apicv(vcpu);
@@ -10307,7 +10312,7 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
 
 	lockdep_assert_held_write(&kvm->arch.apicv_update_lock);
 
-	if (!static_call(kvm_x86_check_apicv_inhibit_reasons)(reason))
+	if (!kvm_is_required_apicv_inhibit(reason))
 		return;
 
 	old = new = kvm->arch.apicv_inhibit_reasons;
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 01/32] KVM: x86: Blindly get current x2APIC reg value on "nodecode write" traps
  2022-10-01  0:58 ` [PATCH v4 01/32] KVM: x86: Blindly get current x2APIC reg value on "nodecode write" traps Sean Christopherson
@ 2022-12-08 21:47   ` Maxim Levitsky
  0 siblings, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:47 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> When emulating a x2APIC write in response to an APICv/AVIC trap, get the
> the written value from the vAPIC page without checking that reads are
> allowed for the target register.  AVIC can generate trap-like VM-Exits on
> writes to EOI, and so KVM needs to get the written value from the backing
> page without running afoul of EOI's write-only behavior.
> 
> Alternatively, EOI could be special cased to always write '0', e.g. so
> that the sanity check could be preserved, but x2APIC on AMD is actually
> supposed to disallow non-zero writes (not emulated by KVM), and the
> sanity check was a byproduct of how the KVM code was written, i.e. wasn't
> added to guard against anything in particular.
> 
> Fixes: 70c8327c11c6 ("KVM: x86: Bug the VM if an accelerated x2APIC trap occurs on a "bad" reg")
> Fixes: 1bd9dfec9fd4 ("KVM: x86: Do not block APIC write for non ICR registers")
> Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/lapic.c | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index d7639d126e6c..05d079fc2c66 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2284,23 +2284,18 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>  	u64 val;
>  
> -	if (apic_x2apic_mode(apic)) {
> -		if (KVM_BUG_ON(kvm_lapic_msr_read(apic, offset, &val), vcpu->kvm))
> -			return;
> -	} else {
> -		val = kvm_lapic_get_reg(apic, offset);
> -	}
> -
>  	/*
>  	 * ICR is a single 64-bit register when x2APIC is enabled.  For legacy
>  	 * xAPIC, ICR writes need to go down the common (slightly slower) path
>  	 * to get the upper half from ICR2.
>  	 */
>  	if (apic_x2apic_mode(apic) && offset == APIC_ICR) {
> +		val = kvm_lapic_get_reg64(apic, APIC_ICR);
>  		kvm_apic_send_ipi(apic, (u32)val, (u32)(val >> 32));
>  		trace_kvm_apic_write(APIC_ICR, val);
>  	} else {
>  		/* TODO: optimize to just emulate side effect w/o one more write */
> +		val = kvm_lapic_get_reg(apic, offset);
>  		kvm_lapic_reg_write(apic, offset, (u32)val);
>  	}
>  }

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 02/32] KVM: x86: Purge "highest ISR" cache when updating APICv state
  2022-10-01  0:58 ` [PATCH v4 02/32] KVM: x86: Purge "highest ISR" cache when updating APICv state Sean Christopherson
@ 2022-12-08 21:47   ` Maxim Levitsky
  0 siblings, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:47 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> Purge the "highest ISR" cache when updating APICv state on a vCPU.  The
> cache must not be used when APICv is active as hardware may emulate EOIs
> (and other operations) without exiting to KVM.
> 
> This fixes a bug where KVM will effectively block IRQs in perpetuity due
> to the "highest ISR" never getting reset if APICv is activated on a vCPU
> while an IRQ is in-service.  Hardware emulates the EOI and KVM never gets
> a chance to update its cache.
> 
> Fixes: b26a695a1d78 ("kvm: lapic: Introduce APICv update helper function")
> Cc: stable@vger.kernel.org
> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Cc: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/lapic.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 05d079fc2c66..5de1c7aa1ce9 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2424,6 +2424,7 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
>  		 */
>  		apic->isr_count = count_vectors(apic->regs + APIC_ISR);
>  	}
> +	apic->highest_isr_cache = -1;
>  }
>  EXPORT_SYMBOL_GPL(kvm_apic_update_apicv);
>  
> @@ -2480,7 +2481,6 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
>  		kvm_lapic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
>  	}
>  	kvm_apic_update_apicv(vcpu);
> -	apic->highest_isr_cache = -1;
>  	update_divide_count(apic);
>  	atomic_set(&apic->lapic_timer.pending, 0);
>  
> @@ -2767,7 +2767,6 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
>  	__start_apic_timer(apic, APIC_TMCCT);
>  	kvm_lapic_set_reg(apic, APIC_TMCCT, 0);
>  	kvm_apic_update_apicv(vcpu);
> -	apic->highest_isr_cache = -1;
>  	if (apic->apicv_active) {
>  		static_call_cond(kvm_x86_apicv_post_state_restore)(vcpu);
>  		static_call_cond(kvm_x86_hwapic_irr_update)(vcpu, apic_find_highest_irr(apic));

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 03/32] KVM: SVM: Flush the "current" TLB when activating AVIC
       [not found]   ` <b9f336f17eec6bfbb8429700e0f135d19813c576.camel@redhat.com>
@ 2022-12-08 21:52     ` Maxim Levitsky
  2022-12-09  0:40       ` Sean Christopherson
  2022-12-08 22:02     ` Maxim Levitsky
  1 sibling, 1 reply; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Wed, 2022-12-07 at 18:02 +0200, Maxim Levitsky wrote:
On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> Flush the TLB when activating AVIC as the CPU can insert into the TLB
> while AVIC is "locally" disabled.  KVM doesn't treat "APIC hardware
> disabled" as VM-wide AVIC inhibition, and so when a vCPU has its APIC
> hardware disabled, AVIC is not guaranteed to be inhibited.  As a result,
> KVM may create a valid NPT mapping for the APIC base, which the CPU can
> cache as a non-AVIC translation.
> 
> Note, Intel handles this in vmx_set_virtual_apic_mode().
> 
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/avic.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 6919dee69f18..712330b80891 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -86,6 +86,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
>                 /* Disabling MSR intercept for x2APIC registers */
>                 svm_set_x2apic_msr_interception(svm, false);
>         } else {
> +               /*
> +                * Flush the TLB, the guest may have inserted a non-APIC
> +                * mapping into the TLB while AVIC was disabled.
> +                */
> +               kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, &svm->vcpu);
> +
>                 /* For xAVIC and hybrid-xAVIC modes */
>                 vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID;
>                 /* Enabling MSR intercept for x2APIC registers */


I agree, that if guest disables APIC on a vCPU, this will lead to call to kvm_apic_update_apicv which will
disable AVIC, but if other vCPUs don't disable it, the AVIC's private memslot will still be mapped and
guest could read/write it from this vCPU, and its TLB mapping needs to be invalidated if/when APIC is re-enabled.
 
However I think that this adds an unnecessarily (at least in the future) performance penalty to AVIC nesting coexistence:
 
L1's AVIC is inhibited on each nested VM entry, and uninhibited on each nested VM exit, but while nested the guest
can't really access it as it has its own NPT.
 
With this patch KVM will invalidate L1's TLB on each nested VM exit. KVM sadly already does this but this can be fixed
(its another thing on my TODO list)
 
Note that APICv doesn't have this issue, it is not inhibited on nested VM entry/exit, thus this code is not performance
sensitive for APICv.
 
 
I somewhat vote again, as I said before to disable the APICv/AVIC memslot, if any of vCPUs have APICv/AVIC hardware disabled,
because it is also more correct from an x86 perspective. I do wonder how often is the usage of having "extra" cpus but not using them, and thus having their APIC in disabled state.
KVM does support adding new vCPUs on the fly, so this shouldn't be needed, and APICv inhibit in this case is just a perf regression.

Or at least do this only when APIC does back from hardware disabled state to enabled.

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 05/32] KVM: x86: Don't inhibit APICv/AVIC on xAPIC ID "change" if APIC is disabled
  2022-10-01  0:58 ` [PATCH v4 05/32] KVM: x86: Don't inhibit APICv/AVIC on xAPIC ID "change" if APIC is disabled Sean Christopherson
@ 2022-12-08 21:53   ` Maxim Levitsky
  0 siblings, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:53 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> Don't inhibit APICv/AVIC due to an xAPIC ID mismatch if the APIC is
> hardware disabled.  The ID cannot be consumed while the APIC is disabled,
> and the ID is guaranteed to be set back to the vcpu_id when the APIC is
> hardware enabled (architectural behavior correctly emulated by KVM).
> 
> Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/lapic.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 5de1c7aa1ce9..67260f7ce43a 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2072,6 +2072,9 @@ static void kvm_lapic_xapic_id_updated(struct kvm_lapic *apic)
>  {
>  	struct kvm *kvm = apic->vcpu->kvm;
>  
> +	if (!kvm_apic_hw_enabled(apic))
> +		return;
> +
>  	if (KVM_BUG_ON(apic_x2apic_mode(apic), kvm))
>  		return;
>  
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 06/32] KVM: x86: Track xAPIC ID only on userspace SET, _after_ vAPIC is updated
  2022-10-01  0:58 ` [PATCH v4 06/32] KVM: x86: Track xAPIC ID only on userspace SET, _after_ vAPIC is updated Sean Christopherson
@ 2022-12-08 21:53   ` Maxim Levitsky
  0 siblings, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:53 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> Track potential changes to a vCPU's xAPIC ID only for KVM_SET_LAPIC, i.e.
> not for KVM_GET_LAPIC, and process the update after the incoming state
> provided by userspace is copied to KVM's in-kernel vAPIC.  The latter bug
> is the most problematic issue, as processing the update before KVM's
> vAPIC is actually updated can result in false positives, e.g. due to the
> APIC holding an x2APIC ID (wrong format), and false negatives, e.g. due
> to KVM failing to detect an xAPIC ID "mismatch".
> 
> Processing an "update" in KVM_GET_LAPIC is likely a benign bug now that
> the update helper ignores mismatches, but prior to that fix, invoking
> KVM_GET_LAPIC while the APIC is disabled could effectively cause KVM to
> consume stale state, e.g. if the APIC were in x2APIC mode before being
> hardware disabled.
> 
> Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base")
> Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/lapic.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 67260f7ce43a..251856ba0750 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2720,8 +2720,6 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
>  			icr = __kvm_lapic_get_reg64(s->regs, APIC_ICR);
>  			__kvm_lapic_set_reg(s->regs, APIC_ICR2, icr >> 32);
>  		}
> -	} else {
> -		kvm_lapic_xapic_id_updated(vcpu->arch.apic);
>  	}
>  
>  	return 0;
> @@ -2757,6 +2755,9 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
>  	}
>  	memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));
>  
> +	if (!apic_x2apic_mode(vcpu->arch.apic))
> +		kvm_lapic_xapic_id_updated(vcpu->arch.apic);
> +
>  	atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
>  	kvm_recalculate_apic_map(vcpu->kvm);
>  	kvm_apic_set_version(vcpu);


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 07/32] KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID
  2022-10-01  0:58 ` [PATCH v4 07/32] KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID Sean Christopherson
@ 2022-12-08 21:53   ` Maxim Levitsky
  0 siblings, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:53 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> Truncate the vcpu_id, a.k.a. x2APIC ID, to an 8-bit value when comparing
> it against the xAPIC ID to avoid false positives (sort of) on systems
> with >255 CPUs, i.e. with IDs that don't fit into a u8.  The intent of
> APIC_ID_MODIFIED is to inhibit APICv/AVIC when the xAPIC is changed from
> it's original value,
> 
> The mismatch isn't technically a false positive, as architecturally the
> xAPIC IDs do end up being aliased in this scenario, and neither APICv
> nor AVIC correctly handles IPI virtualization when there is aliasing.
> However, KVM already deliberately does not honor the aliasing behavior
> that results when an x2APIC ID gets truncated to an xAPIC ID.  I.e. the
> resulting APICv/AVIC behavior is aligned with KVM's existing behavior
> when KVM's x2APIC hotplug hack is effectively enabled.
> 
> If/when KVM provides a way to disable the hotplug hack, APICv/AVIC can
> piggyback whatever logic disables the optimized APIC map (which is what
> provides the hotplug hack), i.e. so that KVM's optimized map and APIC
> virtualization yield the same behavior.
> 
> For now, fix the immediate problem of APIC virtualization being disabled
> for large VMs, which is a much more pressing issue than ensuring KVM
> honors architectural behavior for APIC ID aliasing.
> 
> Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base")
> Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Cc: Maxim Levitsky <mlevitsk@redhat.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/lapic.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 251856ba0750..2503f162eb95 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2078,7 +2078,12 @@ static void kvm_lapic_xapic_id_updated(struct kvm_lapic *apic)
>  	if (KVM_BUG_ON(apic_x2apic_mode(apic), kvm))
>  		return;
>  
> -	if (kvm_xapic_id(apic) == apic->vcpu->vcpu_id)
> +	/*
> +	 * Deliberately truncate the vCPU ID when detecting a modified APIC ID
> +	 * to avoid false positives if the vCPU ID, i.e. x2APIC ID, is a 32-bit
> +	 * value.
> +	 */
> +	if (kvm_xapic_id(apic) == (u8)apic->vcpu->vcpu_id)
>  		return;
>  
>  	kvm_set_apicv_inhibit(apic->vcpu->kvm, APICV_INHIBIT_REASON_APIC_ID_MODIFIED);

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 08/32] KVM: SVM: Don't put/load AVIC when setting virtual APIC mode
  2022-10-01  0:58 ` [PATCH v4 08/32] KVM: SVM: Don't put/load AVIC when setting virtual APIC mode Sean Christopherson
@ 2022-12-08 21:53   ` Maxim Levitsky
  0 siblings, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:53 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> Move the VMCB updates from avic_refresh_apicv_exec_ctrl() into
> avic_set_virtual_apic_mode() and invert the dependency being said
> functions to avoid calling avic_vcpu_{load,put}() and
> avic_set_pi_irte_mode() when "only" setting the virtual APIC mode.
> 
> avic_set_virtual_apic_mode() is invoked from common x86 with preemption
> enabled, which makes avic_vcpu_{load,put}() unhappy.  Luckily, calling
> those and updating IRTE stuff is unnecessary as the only reason
> avic_set_virtual_apic_mode() is called is to handle transitions between
> xAPIC and x2APIC that don't also toggle APICv activation.  And if
> activation doesn't change, there's no need to fiddle with the physical
> APIC ID table or update IRTE.
> 
> The "full" refresh is guaranteed to be called if activation changes in
> this case as the only call to the "set" path is:
> 
> 	kvm_vcpu_update_apicv(vcpu);
> 	static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu);
> 
> and kvm_vcpu_update_apicv() invokes the refresh if activation changes:
> 
> 	if (apic->apicv_active == activate)
> 		goto out;
> 
> 	apic->apicv_active = activate;
> 	kvm_apic_update_apicv(vcpu);
> 	static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);
> 
> Rename the helper to reflect that it is also called during "refresh".
> 
>   WARNING: CPU: 183 PID: 49186 at arch/x86/kvm/svm/avic.c:1081 avic_vcpu_put+0xde/0xf0 [kvm_amd]
>   CPU: 183 PID: 49186 Comm: stable Tainted: G           O       6.0.0-smp--fcddbca45f0a-sink #34
>   Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 10.48.0 01/27/2022
>   RIP: 0010:avic_vcpu_put+0xde/0xf0 [kvm_amd]
>    avic_refresh_apicv_exec_ctrl+0x142/0x1c0 [kvm_amd]
>    avic_set_virtual_apic_mode+0x5a/0x70 [kvm_amd]
>    kvm_lapic_set_base+0x149/0x1a0 [kvm]
>    kvm_set_apic_base+0x8f/0xd0 [kvm]
>    kvm_set_msr_common+0xa3a/0xdc0 [kvm]
>    svm_set_msr+0x364/0x6b0 [kvm_amd]
>    __kvm_set_msr+0xb8/0x1c0 [kvm]
>    kvm_emulate_wrmsr+0x58/0x1d0 [kvm]
>    msr_interception+0x1c/0x30 [kvm_amd]
>    svm_invoke_exit_handler+0x31/0x100 [kvm_amd]
>    svm_handle_exit+0xfc/0x160 [kvm_amd]
>    vcpu_enter_guest+0x21bb/0x23e0 [kvm]
>    vcpu_run+0x92/0x450 [kvm]
>    kvm_arch_vcpu_ioctl_run+0x43e/0x6e0 [kvm]
>    kvm_vcpu_ioctl+0x559/0x620 [kvm]
> 
> Fixes: 05c4fe8c1bd9 ("KVM: SVM: Refresh AVIC configuration when changing APIC mode")
> Cc: stable@vger.kernel.org
> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Cc: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/avic.c | 31 +++++++++++++++----------------
>  arch/x86/kvm/svm/svm.c  |  2 +-
>  arch/x86/kvm/svm/svm.h  |  2 +-
>  3 files changed, 17 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 3b2c88b168ba..97ad0661f963 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -747,18 +747,6 @@ void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
>  	avic_handle_ldr_update(vcpu);
>  }
>  
> -void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
> -{
> -	if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
> -		return;
> -
> -	if (kvm_get_apic_mode(vcpu) == LAPIC_MODE_INVALID) {
> -		WARN_ONCE(true, "Invalid local APIC state (vcpu_id=%d)", vcpu->vcpu_id);
> -		return;
> -	}
> -	avic_refresh_apicv_exec_ctrl(vcpu);
> -}
> -
>  static int avic_set_pi_irte_mode(struct kvm_vcpu *vcpu, bool activate)
>  {
>  	int ret = 0;
> @@ -1100,17 +1088,18 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
>  	WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
>  }
>  
> -
> -void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
> +void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  	struct vmcb *vmcb = svm->vmcb01.ptr;
> -	bool activated = kvm_vcpu_apicv_active(vcpu);
> +
> +	if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
> +		return;
>  
>  	if (!enable_apicv)
>  		return;
>  
> -	if (activated) {
> +	if (kvm_vcpu_apicv_active(vcpu)) {
>  		/**
>  		 * During AVIC temporary deactivation, guest could update
>  		 * APIC ID, DFR and LDR registers, which would not be trapped
> @@ -1124,6 +1113,16 @@ void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
>  		avic_deactivate_vmcb(svm);
>  	}
>  	vmcb_mark_dirty(vmcb, VMCB_AVIC);
> +}
> +
> +void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
> +{
> +	bool activated = kvm_vcpu_apicv_active(vcpu);
> +
> +	if (!enable_apicv)
> +		return;
> +
> +	avic_refresh_virtual_apic_mode(vcpu);
>  
>  	if (activated)
>  		avic_vcpu_load(vcpu, vcpu->cpu);
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 58f0077d9357..afae97ea9a06 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4798,7 +4798,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.enable_nmi_window = svm_enable_nmi_window,
>  	.enable_irq_window = svm_enable_irq_window,
>  	.update_cr8_intercept = svm_update_cr8_intercept,
> -	.set_virtual_apic_mode = avic_set_virtual_apic_mode,
> +	.set_virtual_apic_mode = avic_refresh_virtual_apic_mode,
>  	.refresh_apicv_exec_ctrl = avic_refresh_apicv_exec_ctrl,
>  	.check_apicv_inhibit_reasons = avic_check_apicv_inhibit_reasons,
>  	.apicv_post_state_restore = avic_apicv_post_state_restore,
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 6a7686bf6900..7a95f50e80e7 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -646,7 +646,7 @@ void avic_vcpu_blocking(struct kvm_vcpu *vcpu);
>  void avic_vcpu_unblocking(struct kvm_vcpu *vcpu);
>  void avic_ring_doorbell(struct kvm_vcpu *vcpu);
>  unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu);
> -void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
> +void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
>  
>  
>  /* sev.c */

Makes sense. I missed that during x2avic review. That warning about preemption is a good one.


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 09/32] KVM: x86: Handle APICv updates for APIC "mode" changes via request
  2022-10-01  0:58 ` [PATCH v4 09/32] KVM: x86: Handle APICv updates for APIC "mode" changes via request Sean Christopherson
@ 2022-12-08 21:54   ` Maxim Levitsky
  0 siblings, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:54 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> Use KVM_REQ_UPDATE_APICV to react to APIC "mode" changes, i.e. to handle
> the APIC being hardware enabled/disabled and/or x2APIC being toggled.
> There is no need to immediately update APICv state, the only requirement
> is that APICv be updating prior to the next VM-Enter.
> 
> Making a request will allow piggybacking KVM_REQ_UPDATE_APICV to "inhibit"
> the APICv memslot when x2APIC is enabled.  Doing that directly from
> kvm_lapic_set_base() isn't feasible as KVM's SRCU must not be held when
> modifying memslots (to avoid deadlock), and may or may not be held when
> kvm_lapic_set_base() is called, i.e. KVM can't do the right thing without
> tracking that is rightly buried behind CONFIG_PROVE_RCU=y.
> 
> Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/lapic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 2503f162eb95..316b61b56cca 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2401,7 +2401,7 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
>  		kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
>  
>  	if ((old_value ^ value) & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) {
> -		kvm_vcpu_update_apicv(vcpu);
> +		kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
>  		static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu);
>  	}
>  

Makes sense.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 10/32] KVM: x86: Move APIC access page helper to common x86 code
  2022-10-01  0:58 ` [PATCH v4 10/32] KVM: x86: Move APIC access page helper to common x86 code Sean Christopherson
@ 2022-12-08 21:55   ` Maxim Levitsky
  0 siblings, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> Move the APIC access page allocation helper function to common x86 code,
> the allocation routine is virtually identical between APICv (VMX) and
> AVIC (SVM).  Keep APICv's gfn_to_page() + put_page() sequence, which
> verifies that a backing page can be allocated, i.e. that the system isn't
> under heavy memory pressure.  Forcing the backing page to be populated
> isn't strictly necessary, but skipping the effective prefetch only delays
> the inevitable.


Just a note on the way we deal with that dummy page differs between APICv and AVIC
a bit:


On APICv we need physical address of the dummy page, since APICv 
has physical address of this page in VMCB, and when guest "accesses"
only then APICv specific handling is invoked.

Since we don't want to pin it, we have a mmu notifier telling us where
it is currently and we update the vmcb each time it moves.

If the page is swapped out, mmu notifier will call us, and we will
pretty much swap it back in on next VM entry, in 'vmx_set_apic_access_page_addr'
I hope that this code doesn't have races, I never looked at depth at it.


On AVIC on the other hand we don't force the page to be mapped at all, since
AVIC stores in the vmcb the virtual address of the page and only checkes that
'something' is mapped there.

If we get nothing mapped there, AVIC will not intercept, we will get normal
NPT fault and hopefully swap that page in.


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/lapic.c    | 35 +++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/lapic.h    |  1 +
>  arch/x86/kvm/svm/avic.c | 41 +++++++----------------------------------
>  arch/x86/kvm/vmx/vmx.c  | 35 +----------------------------------
>  4 files changed, 44 insertions(+), 68 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 316b61b56cca..80e8b1cc6dc2 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2436,6 +2436,41 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_apic_update_apicv);
>  
> +int kvm_alloc_apic_access_page(struct kvm *kvm)
> +{
> +	struct page *page;
> +	void __user *hva;
> +	int ret = 0;
> +
> +	mutex_lock(&kvm->slots_lock);
> +	if (kvm->arch.apic_access_memslot_enabled)
> +		goto out;
> +
> +	hva = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
> +				      APIC_DEFAULT_PHYS_BASE, PAGE_SIZE);
> +	if (IS_ERR(hva)) {
> +		ret = PTR_ERR(hva);
> +		goto out;
> +	}
> +
> +	page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
> +	if (is_error_page(page)) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	/*
> +	 * Do not pin the page in memory, so that memory hot-unplug
> +	 * is able to migrate it.
> +	 */
> +	put_page(page);




> +	kvm->arch.apic_access_memslot_enabled = true;
> +out:
> +	mutex_unlock(&kvm->slots_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(kvm_alloc_apic_access_page);
> +
>  void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index a5ac4a5a5179..0587a8282cb3 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -112,6 +112,7 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
>  		     struct dest_map *dest_map);
>  int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
>  void kvm_apic_update_apicv(struct kvm_vcpu *vcpu);
> +int kvm_alloc_apic_access_page(struct kvm *kvm);
>  
>  bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
>  		struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map);
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 97ad0661f963..ec28ba4c5f1b 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -256,39 +256,6 @@ static u64 *avic_get_physical_id_entry(struct kvm_vcpu *vcpu,
>  	return &avic_physical_id_table[index];
>  }
>  
> -/*
> - * Note:
> - * AVIC hardware walks the nested page table to check permissions,
> - * but does not use the SPA address specified in the leaf page
> - * table entry since it uses  address in the AVIC_BACKING_PAGE pointer
> - * field of the VMCB. Therefore, we set up the
> - * APIC_ACCESS_PAGE_PRIVATE_MEMSLOT (4KB) here.
> - */
> -static int avic_alloc_access_page(struct kvm *kvm)
> -{
> -	void __user *ret;
> -	int r = 0;
> -
> -	mutex_lock(&kvm->slots_lock);
> -
> -	if (kvm->arch.apic_access_memslot_enabled)
> -		goto out;
> -
> -	ret = __x86_set_memory_region(kvm,
> -				      APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
> -				      APIC_DEFAULT_PHYS_BASE,
> -				      PAGE_SIZE);
> -	if (IS_ERR(ret)) {
> -		r = PTR_ERR(ret);
> -		goto out;
> -	}
> -
> -	kvm->arch.apic_access_memslot_enabled = true;
> -out:
> -	mutex_unlock(&kvm->slots_lock);
> -	return r;
> -}
> -
>  static int avic_init_backing_page(struct kvm_vcpu *vcpu)
>  {
>  	u64 *entry, new_entry;
> @@ -305,7 +272,13 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
>  	if (kvm_apicv_activated(vcpu->kvm)) {
>  		int ret;
>  
> -		ret = avic_alloc_access_page(vcpu->kvm);
> +		/*
> +		 * Note, AVIC hardware walks the nested page table to check
> +		 * permissions, but does not use the SPA address specified in
> +		 * the leaf SPTE since it uses address in the AVIC_BACKING_PAGE
> +		 * pointer field of the VMCB.
> +		 */
> +		ret = kvm_alloc_apic_access_page(vcpu->kvm);
>  		if (ret)
>  			return ret;
>  	}
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 9dba04b6b019..974d9a366d5d 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -3771,39 +3771,6 @@ static void seg_setup(int seg)
>  	vmcs_write32(sf->ar_bytes, ar);
>  }
>  
> -static int alloc_apic_access_page(struct kvm *kvm)
> -{
> -	struct page *page;
> -	void __user *hva;
> -	int ret = 0;
> -
> -	mutex_lock(&kvm->slots_lock);
> -	if (kvm->arch.apic_access_memslot_enabled)
> -		goto out;
> -	hva = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
> -				      APIC_DEFAULT_PHYS_BASE, PAGE_SIZE);
> -	if (IS_ERR(hva)) {
> -		ret = PTR_ERR(hva);
> -		goto out;
> -	}
> -
> -	page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
> -	if (is_error_page(page)) {
> -		ret = -EFAULT;
> -		goto out;
> -	}
> -
> -	/*
> -	 * Do not pin the page in memory, so that memory hot-unplug
> -	 * is able to migrate it.
> -	 */
> -	put_page(page);
> -	kvm->arch.apic_access_memslot_enabled = true;
> -out:
> -	mutex_unlock(&kvm->slots_lock);
> -	return ret;
> -}
> -
>  int allocate_vpid(void)
>  {
>  	int vpid;
> @@ -7348,7 +7315,7 @@ static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
>  	vmx->loaded_vmcs = &vmx->vmcs01;
>  
>  	if (cpu_need_virtualize_apic_accesses(vcpu)) {
> -		err = alloc_apic_access_page(vcpu->kvm);
> +		err = kvm_alloc_apic_access_page(vcpu->kvm);
>  		if (err)
>  			goto free_vmcs;
>  	}



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 11/32] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled
  2022-10-01  0:58 ` [PATCH v4 11/32] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled Sean Christopherson
@ 2022-12-08 21:56   ` Maxim Levitsky
  2022-12-16 19:03     ` Sean Christopherson
  0 siblings, 1 reply; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:56 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> Free the APIC access page memslot if any vCPU enables x2APIC and SVM's
> AVIC is enabled to prevent accesses to the virtual APIC on vCPUs with
> x2APIC enabled.  On AMD, due to its "hybrid" mode where AVIC is enabled
> when x2APIC is enabled even without x2AVIC support, keeping the APIC
> access page memslot results in the guest being able to access the virtual
> APIC page as x2APIC is fully emulated by KVM.  I.e. hardware isn't aware
> that the guest is operating in x2APIC mode.
> 
> Exempt nested SVM's update of APICv state from new logic as x2APIC can't
> be toggled on VM-Exit.  In practice, invoking the x2APIC logic should be
> harmless precisely because it should be a glorified nop, but play it
> safe to avoid latent bugs, e.g. with dropping the vCPU's SRCU lock.
> 
> Intel doesn't suffer from the same issue as APICv has fully independent
> VMCS controls for xAPIC vs. x2APIC virtualization.  Technically, KVM
> should provide bus error semantics and not memory semantics for the APIC
> page when x2APIC is enabled, but KVM already provides memory semantics in
> other scenarios, e.g. if APICv/AVIC is enabled and the APIC is hardware
> disabled (via APIC_BASE MSR).
> 
> Reserve an inhibit bit so that common code can detect whether or not the
> "x2APIC inhibit" applies, but use a dedicated flag to track the inhibit
> so that it doesn't need to be stripped from apicv_inhibit_reasons (since
> it's not a "full" inhibit).
> 
> Note, checking apic_access_memslot_enabled without taking locks relies
> it being set during vCPU creation (before kvm_vcpu_reset()).  vCPUs can
> race to set the inhibit and delete the memslot, i.e. can get false
> positives, but can't get false negatives as apic_access_memslot_enabled
> can't be toggled "on" once any vCPU reaches KVM_RUN.
> 
> Opportunistically drop the "can" while updating avic_activate_vmcb()'s
> comment, i.e. to state that KVM _does_ support the hybrid mode.  Move
> the "Note:" down a line to conform to preferred kernel/KVM multi-line
> comment style.
> 
> Opportunistically update the apicv_update_lock comment, as it isn't
> actually used to protect apic_access_memslot_enabled (it's protected by
> slots_lock).
> 
> Fixes: 0e311d33bfbe ("KVM: SVM: Introduce hybrid-AVIC mode")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 20 +++++++++++++----
>  arch/x86/kvm/lapic.c            | 38 ++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/lapic.h            |  1 +
>  arch/x86/kvm/svm/avic.c         | 15 +++++++------
>  arch/x86/kvm/svm/nested.c       |  2 +-
>  arch/x86/kvm/x86.c              | 16 ++++++++++++--
>  6 files changed, 77 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d40206b16d6c..062758135c86 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1139,6 +1139,17 @@ enum kvm_apicv_inhibit {
>  	 * AVIC is disabled because SEV doesn't support it.
>  	 */
>  	APICV_INHIBIT_REASON_SEV,
> +
> +	/*
> +	 * Due to sharing page tables across vCPUs, the xAPIC memslot must be
> +	 * deleted if any vCPU has x2APIC enabled as SVM doesn't provide fully
> +	 * independent controls for AVIC vs. x2AVIC, and also because SVM
> +	 * supports a "hybrid" AVIC mode for CPUs that support AVIC but not
> +	 * x2AVIC.  Note, this isn't a "full" inhibit and is tracked separately.
> +	 * AVIC can still be activated, but KVM must not create SPTEs for the
> +	 * APIC base.  For simplicity, this is sticky.
> +	 */
> +	APICV_INHIBIT_REASON_X2APIC,

I still don't understand why do you want this to be an inhibit bit.

Now this 'inhibit' is not even set/clear.

I prefer to just have a boolean 'is_avic' or, '.needs_x2apic_memslot_inhibition'
in the vendor ops, and check it in 'kvm_vcpu_update_apicv' with the above comment on top of it.

need_x2apic_memslot_inhibition can even be set to false when x2avic is supported at the initalization time,
because then AVIC behaves just like APICv (when x2avic bit is enabled, AVIC mmio is no longer decoded).



static void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 {
        if (!lapic_in_kernel(vcpu))
		return;

	/
	 * Due to sharing page tables across vCPUs, the xAPIC memslot must be
	 * deleted if any vCPU has x2APIC enabled as SVM without X2AVIC supoprt 
         * doesn't provide fully independent controls for AVIC vs. x2AVIC.
	 * For simplicity, this is sticky.
	/
 
 	if (apic_x2apic_mode(vcpu->arch.apic) && kvm_x86_ops.needs_x2apic_memslot_inhibition)
 		kvm_inhibit_apic_access_page(vcpu);
 
 	__kvm_vcpu_update_apicv(vcpu);
 }


With this fixed:

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


>  };
>  
>  struct kvm_arch {
> @@ -1176,10 +1187,11 @@ struct kvm_arch {
>  	struct kvm_apic_map __rcu *apic_map;
>  	atomic_t apic_map_dirty;
>  
> -	/* Protects apic_access_memslot_enabled and apicv_inhibit_reasons */
> -	struct rw_semaphore apicv_update_lock;
> -
>  	bool apic_access_memslot_enabled;
> +	bool apic_access_memslot_inhibited;
> +
> +	/* Protects apicv_inhibit_reasons */
> +	struct rw_semaphore apicv_update_lock;
>  	unsigned long apicv_inhibit_reasons;
>  
>  	gpa_t wall_clock;
> @@ -1912,7 +1924,7 @@ gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
>  
>  bool kvm_apicv_activated(struct kvm *kvm);
>  bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu);
> -void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
> +void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
>  void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
>  				      enum kvm_apicv_inhibit reason, bool set);
>  void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 80e8b1cc6dc2..42b61469674d 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2443,7 +2443,8 @@ int kvm_alloc_apic_access_page(struct kvm *kvm)
>  	int ret = 0;
>  
>  	mutex_lock(&kvm->slots_lock);
> -	if (kvm->arch.apic_access_memslot_enabled)
> +	if (kvm->arch.apic_access_memslot_enabled ||
> +	    kvm->arch.apic_access_memslot_inhibited)
>  		goto out;
>  
>  	hva = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
> @@ -2471,6 +2472,41 @@ int kvm_alloc_apic_access_page(struct kvm *kvm)
>  }
>  EXPORT_SYMBOL_GPL(kvm_alloc_apic_access_page);
>  
> +void kvm_inhibit_apic_access_page(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +
> +	if (!kvm->arch.apic_access_memslot_enabled)
> +		return;
> +
> +	kvm_vcpu_srcu_read_unlock(vcpu);
> +
> +	mutex_lock(&kvm->slots_lock);
> +
> +	if (kvm->arch.apic_access_memslot_enabled) {
> +		__x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, 0, 0);
> +		/*
> +		 * Clear "enabled" after the memslot is deleted so that a
> +		 * different vCPU doesn't get a false negative when checking
> +		 * the flag out of slots_lock.  No additional memory barrier is
> +		 * needed as modifying memslots requires waiting other vCPUs to
> +		 * drop SRCU (see above), and false positives are ok as the
> +		 * flag is rechecked after acquiring slots_lock.kvm_vcpu_update_apicv
> +		 */
> +		kvm->arch.apic_access_memslot_enabled = false;
> +
> +		/*
> +		 * Mark the memslot as inhibited to prevent reallocating the
> +		 * memslot during vCPU creation, e.g. if a vCPU is hotplugged.
> +		 */
> +		kvm->arch.apic_access_memslot_inhibited = true;
> +	}
> +
> +	mutex_unlock(&kvm->slots_lock);
> +
> +	kvm_vcpu_srcu_read_lock(vcpu);
> +}
> +
>  void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 0587a8282cb3..a318609bb050 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -113,6 +113,7 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
>  int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
>  void kvm_apic_update_apicv(struct kvm_vcpu *vcpu);
>  int kvm_alloc_apic_access_page(struct kvm *kvm);
> +void kvm_inhibit_apic_access_page(struct kvm_vcpu *vcpu);
>  
>  bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
>  		struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map);
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index ec28ba4c5f1b..535e35edce1d 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -72,12 +72,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
>  
>  	vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
>  
> -	/* Note:
> -	 * KVM can support hybrid-AVIC mode, where KVM emulates x2APIC
> -	 * MSR accesses, while interrupt injection to a running vCPU
> -	 * can be achieved using AVIC doorbell. The AVIC hardware still
> -	 * accelerate MMIO accesses, but this does not cause any harm
> -	 * as the guest is not supposed to access xAPIC mmio when uses x2APIC.
> +	/*
> +	 * Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR
> +	 * accesses, while interrupt injection to a running vCPU can be
> +	 * achieved using AVIC doorbell.  KVM disables the APIC access page
> +	 * (deletes the memslot) if any vCPU has x2APIC enabled, thus enabling
> +	 * AVIC in hybrid mode activates only the doorbell mechanism.
>  	 */
>  	if (apic_x2apic_mode(svm->vcpu.arch.apic) &&
>  	    avic_mode == AVIC_MODE_X2) {
> @@ -975,7 +975,8 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
>  			  BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
>  			  BIT(APICV_INHIBIT_REASON_SEV)      |
>  			  BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
> -			  BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED);
> +			  BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
> +			  BIT(APICV_INHIBIT_REASON_X2APIC);
>  
>  	return supported & BIT(reason);
>  }
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 4c620999d230..8d5e00a7ef84 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1084,7 +1084,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>  	 * to benefit from it right away.
>  	 */
>  	if (kvm_apicv_activated(vcpu->kvm))
> -		kvm_vcpu_update_apicv(vcpu);
> +		__kvm_vcpu_update_apicv(vcpu);
>  
>  	return 0;
>  }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index eb9d2c23fb04..a20002924eb4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10251,7 +10251,7 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
>  	kvm_make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
>  }
>  
> -void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
> +void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>  	bool activate;
> @@ -10286,7 +10286,19 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
>  	preempt_enable();
>  	up_read(&vcpu->kvm->arch.apicv_update_lock);
>  }
> -EXPORT_SYMBOL_GPL(kvm_vcpu_update_apicv);
> +EXPORT_SYMBOL_GPL(__kvm_vcpu_update_apicv);
> +
> +static void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
> +{
> +	if (!lapic_in_kernel(vcpu))
> +		return;
> +
> +	if (apic_x2apic_mode(vcpu->arch.apic) &&
> +	    static_call(kvm_x86_check_apicv_inhibit_reasons)(APICV_INHIBIT_REASON_X2APIC))
> +		kvm_inhibit_apic_access_page(vcpu);
> +
> +	__kvm_vcpu_update_apicv(vcpu);
> +}
>  
>  void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
>  				      enum kvm_apicv_inhibit reason, bool set)



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 15/32] Revert "KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible"
  2022-10-01  0:58 ` [PATCH v4 15/32] Revert "KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible" Sean Christopherson
@ 2022-12-08 21:56   ` Maxim Levitsky
  0 siblings, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:56 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> Due to a likely mismerge of patches, KVM ended up with a superfluous
> commit to "enable" AVIC's fast path for x2AVIC mode.  Even worse, the
> superfluous commit has several bugs and creates a nasty local shadow
> variable.
> 
> Rather than fix the bugs piece-by-piece[*] to achieve the same end
> result, revert the patch wholesale.
> 
> Opportunistically add a comment documenting the x2AVIC dependencies.
> 
> This reverts commit 8c9e639da435874fb845c4d296ce55664071ea7a.
> 
> [*] https://lore.kernel.org/all/YxEP7ZBRIuFWhnYJ@google.com
> 
> Fixes: 8c9e639da435 ("KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible")
> Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/avic.c | 29 +++++++++++------------------
>  1 file changed, 11 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index e35e9363e7ff..605c36569ddf 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -378,7 +378,17 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
>  
>  		logid_index = cluster + __ffs(bitmap);
>  
> -		if (!apic_x2apic_mode(source)) {
> +		if (apic_x2apic_mode(source)) {
> +			/*
> +			 * For x2APIC, the logical APIC ID is a read-only value
> +			 * that is derived from the x2APIC ID, thus the x2APIC
> +			 * ID can be found by reversing the calculation (done
> +			 * above).  Note, bits 31:20 of the x2APIC ID are not
> +			 * propagated to the logical ID, but KVM limits the
> +			 * x2APIC ID limited to KVM_MAX_VCPU_IDS.
> +			 */
> +			l1_physical_id = logid_index;
> +		} else {
>  			u32 *avic_logical_id_table =
>  				page_address(kvm_svm->avic_logical_id_table_page);
>  
> @@ -393,23 +403,6 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
>  
>  			l1_physical_id = logid_entry &
>  					 AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
> -		} else {
> -			/*
> -			 * For x2APIC logical mode, cannot leverage the index.
> -			 * Instead, calculate physical ID from logical ID in ICRH.
> -			 */
> -			int cluster = (icrh & 0xffff0000) >> 16;
> -			int apic = ffs(icrh & 0xffff) - 1;
> -
> -			/*
> -			 * If the x2APIC logical ID sub-field (i.e. icrh[15:0])
> -			 * contains anything but a single bit, we cannot use the
> -			 * fast path, because it is limited to a single vCPU.
> -			 */
> -			if (apic < 0 || icrh != (1 << apic))
> -				return -EINVAL;
> -
> -			l1_physical_id = (cluster << 4) + apic;
>  		}
>  	}
>  


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 18/32] KVM: x86: Explicitly skip optimized logical map setup if vCPU's LDR==0
  2022-10-01  0:59 ` [PATCH v4 18/32] KVM: x86: Explicitly skip optimized logical map setup if vCPU's LDR==0 Sean Christopherson
@ 2022-12-08 21:56   ` Maxim Levitsky
  0 siblings, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:56 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> Explicitly skip the optimized map setup if the vCPU's LDR is '0', i.e. if
> the vCPU will never respond to logical mode interrupts.  KVM already
> skips setup in this case, but relies on kvm_apic_map_get_logical_dest()
> to generate mask==0.  KVM still needs the mask=0 check as a non-zero LDR
> can yield mask==0 depending on the mode, but explicitly handling the LDR
> will make it simpler to clean up the logical mode tracking in the future.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/lapic.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 42b61469674d..cef8b202490b 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -286,10 +286,12 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
>  			continue;
>  
>  		ldr = kvm_lapic_get_reg(apic, APIC_LDR);
> +		if (!ldr)
> +			continue;
>  
>  		if (apic_x2apic_mode(apic)) {
>  			new->mode |= KVM_APIC_MODE_X2APIC;
> -		} else if (ldr) {
> +		} else {
>  			ldr = GET_APIC_LOGICAL_ID(ldr);
>  			if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
>  				new->mode |= KVM_APIC_MODE_XAPIC_FLAT;

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 19/32] KVM: x86: Explicitly track all possibilities for APIC map's logical modes
  2022-10-01  0:59 ` [PATCH v4 19/32] KVM: x86: Explicitly track all possibilities for APIC map's logical modes Sean Christopherson
@ 2022-12-08 21:57   ` Maxim Levitsky
  2022-12-16 18:39     ` Sean Christopherson
  0 siblings, 1 reply; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> Track all possibilities for the optimized APIC map's logical modes
> instead of overloading the pseudo-bitmap and treating any "unknown" value
> as "invalid".
> 
> As documented by the now-stale comment above the mode values, the values
> did have meaning when the optimized map was originally added.  That
> dependent logical was removed by commit e45115b62f9a ("KVM: x86: use
> physical LAPIC array for logical x2APIC"), but the obfuscated behavior
> and its comment were left behind.
> 
> Opportunistically rename "mode" to "logical_mode", partly to make it
> clear that the "disabled" case applies only to the logical map, but also
> to prove that there is no lurking code that expects "mode" to be a bitmap.
> 
> Functionally, this is a glorified nop.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 21 +++++++++++---------
>  arch/x86/kvm/lapic.c            | 35 +++++++++++++++++++++++++--------
>  2 files changed, 39 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 062758135c86..ac28bbfbf0e3 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -962,19 +962,22 @@ struct kvm_arch_memory_slot {
>  };
>  
>  /*
> - * We use as the mode the number of bits allocated in the LDR for the
> - * logical processor ID.  It happens that these are all powers of two.
> - * This makes it is very easy to detect cases where the APICs are
> - * configured for multiple modes; in that case, we cannot use the map and
> - * hence cannot use kvm_irq_delivery_to_apic_fast either.
> + * Track the mode of the optimized logical map, as the rules for decoding the
> + * destination vary per mode.  Enabling the optimized logical map requires all
> + * software-enabled local APIs to be in the same mode, each addressable APIC to
> + * be mapped to only one MDA, and each MDA to map to at most one APIC.
>   */
> -#define KVM_APIC_MODE_XAPIC_CLUSTER          4
> -#define KVM_APIC_MODE_XAPIC_FLAT             8
> -#define KVM_APIC_MODE_X2APIC                16
> +enum kvm_apic_logical_mode {

It would be nice to have short comment about each of the modes, like

/* All local APICs are disabled */
> +	KVM_APIC_MODE_SW_DISABLED,
/* All enabled local APICs are in XAPIC mode using cluster logical addresssing */
> +	KVM_APIC_MODE_XAPIC_CLUSTER,
/* All enabled local APICs are in XAPIC mode using flat logical addresssing */
> +	KVM_APIC_MODE_XAPIC_FLAT,
/* All enabled local APICs are in X2APIC mode */
> +	KVM_APIC_MODE_X2APIC,
/* 
   Due to differencies in mode between enabled local APICs and/or other corner cases, 
   the optimized logical map disabled 
*/
> +	KVM_APIC_MODE_MAP_DISABLED,
> +};
>  
>  struct kvm_apic_map {
>  	struct rcu_head rcu;
> -	u8 mode;
> +	enum kvm_apic_logical_mode logical_mode;
>  	u32 max_apic_id;
>  	union {
>  		struct kvm_lapic *xapic_flat_map[8];
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index cef8b202490b..9989893fef69 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -168,7 +168,12 @@ static bool kvm_use_posted_timer_interrupt(struct kvm_vcpu *vcpu)
>  
>  static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
>  		u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
> -	switch (map->mode) {
> +	switch (map->logical_mode) {
> +	case KVM_APIC_MODE_SW_DISABLED:
> +		/* Arbitrarily use the flat map so that @cluster isn't NULL. */
> +		*cluster = map->xapic_flat_map;
> +		*mask = 0;
> +		return true;
>  	case KVM_APIC_MODE_X2APIC: {
>  		u32 offset = (dest_id >> 16) * 16;
>  		u32 max_apic_id = map->max_apic_id;
> @@ -193,8 +198,10 @@ static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
>  		*cluster = map->xapic_cluster_map[(dest_id >> 4) & 0xf];
>  		*mask = dest_id & 0xf;
>  		return true;
> +	case KVM_APIC_MODE_MAP_DISABLED:
> +		return false;
>  	default:
> -		/* Not optimized. */
> +		WARN_ON_ONCE(1);
>  		return false;
>  	}
>  }
> @@ -256,10 +263,12 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
>  		goto out;
>  
>  	new->max_apic_id = max_id;
> +	new->logical_mode = KVM_APIC_MODE_SW_DISABLED;
>  
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>  		struct kvm_lapic *apic = vcpu->arch.apic;
>  		struct kvm_lapic **cluster;
> +		enum kvm_apic_logical_mode logical_mode;
>  		u16 mask;
>  		u32 ldr;
>  		u8 xapic_id;
> @@ -282,7 +291,8 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
>  		if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
>  			new->phys_map[xapic_id] = apic;
>  
> -		if (!kvm_apic_sw_enabled(apic))
> +		if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED ||
> +		    !kvm_apic_sw_enabled(apic))
>  			continue;
Very minor nitpick: it feels to me that code that updates the logical mode of the
map, might be better to be in a function, or in 'if', like

		if (new->logical_mode != KVM_APIC_MODE_MAP_DISABLED) {

			/* Disabled local APICs don't affect the logical map */
			if (!kvm_apic_sw_enabled(apic))
				continue;
			....
		}
>  
>  		ldr = kvm_lapic_get_reg(apic, APIC_LDR);
> @@ -290,17 +300,26 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
>  			continue;
>  
>  		if (apic_x2apic_mode(apic)) {
> -			new->mode |= KVM_APIC_MODE_X2APIC;
> +			logical_mode = KVM_APIC_MODE_X2APIC;
>  		} else {
>  			ldr = GET_APIC_LOGICAL_ID(ldr);
>  			if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
> -				new->mode |= KVM_APIC_MODE_XAPIC_FLAT;
> +				logical_mode = KVM_APIC_MODE_XAPIC_FLAT;
>  			else
> -				new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
> +				logical_mode = KVM_APIC_MODE_XAPIC_CLUSTER;
>  		}
> +		if (new->logical_mode != KVM_APIC_MODE_SW_DISABLED &&
> +		    new->logical_mode != logical_mode) {
> +			new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
> +			continue;
> +		}
> +		new->logical_mode = logical_mode;

How about:

		if (new->logical_mode == KVM_APIC_MODE_SW_DISABLED)
			new->logical_mode = logical_mode;

		if (new->logical_mode != logical_mode) {
			new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
			continue;
		}
		
>  
> -		if (!kvm_apic_map_get_logical_dest(new, ldr, &cluster, &mask))
> +		if (WARN_ON_ONCE(!kvm_apic_map_get_logical_dest(new, ldr,
> +								&cluster, &mask))) {
> +			new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
>  			continue;
> +		}
>  
>  		if (mask)
>  			cluster[ffs(mask) - 1] = apic;
> @@ -953,7 +972,7 @@ static bool kvm_apic_is_broadcast_dest(struct kvm *kvm, struct kvm_lapic **src,
>  {
>  	if (kvm->arch.x2apic_broadcast_quirk_disabled) {
>  		if ((irq->dest_id == APIC_BROADCAST &&
> -				map->mode != KVM_APIC_MODE_X2APIC))
> +		     map->logical_mode != KVM_APIC_MODE_X2APIC))
>  			return true;
>  		if (irq->dest_id == X2APIC_BROADCAST)
>  			return true;


Functionality wise the code looks OK to me.

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 20/32] KVM: x86: Skip redundant x2APIC logical mode optimized cluster setup
  2022-10-01  0:59 ` [PATCH v4 20/32] KVM: x86: Skip redundant x2APIC logical mode optimized cluster setup Sean Christopherson
@ 2022-12-08 21:57   ` Maxim Levitsky
  0 siblings, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> Skip the optimized cluster[] setup for x2APIC logical mode, as KVM reuses
> the optimized map's phys_map[] and doesn't actually need to insert the
> target apic into the cluster[].  The LDR is derived from the x2APIC ID,
> and both are read-only in KVM, thus the vCPU's cluster[ldr] is guaranteed
> to be the same entry as the vCPU's phys_map[x2apic_id] entry.
> 
> Skipping the unnecessary setup will allow a future fix for aliased xAPIC
> logical IDs to simply require that cluster[ldr] is non-NULL, i.e. won't
> have to special case x2APIC.

> Alternatively, the future check could allow "cluster[ldr] == apic", but
> that ends up being terribly confusing because cluster[ldr] is only set
> at the very end, i.e. it's only possible due to x2APIC's shenanigans.
> 
> Another alternative would be to send x2APIC down a separate path _after_
> the calculation and then assert that all of the above, but the resulting
> code is rather messy, and it's arguably unnecessary since asserting that
> the actual LDR matches the expected LDR means that simply testing that
> interrupts are delivered correctly provides the same guarantees.


Yes, I think it makes sense to not touch the cluser map in x2apic mode.
The map even explicitly named as for xapic (xapic_flat_map, xapic_cluster_map)

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


> 
> Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/lapic.c | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 9989893fef69..fca8bbb44375 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -166,6 +166,11 @@ static bool kvm_use_posted_timer_interrupt(struct kvm_vcpu *vcpu)
>  	return kvm_can_post_timer_interrupt(vcpu) && vcpu->mode == IN_GUEST_MODE;
>  }
>  
> +static inline u32 kvm_apic_calc_x2apic_ldr(u32 id)
> +{
> +	return ((id >> 4) << 16) | (1 << (id & 0xf));
> +}
> +
>  static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
>  		u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
>  	switch (map->logical_mode) {
> @@ -315,6 +320,18 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
>  		}
>  		new->logical_mode = logical_mode;
>  
> +		/*
> +		 * In x2APIC mode, the LDR is read-only and derived directly
> +		 * from the x2APIC ID, thus is guaranteed to be addressable.
> +		 * KVM reuses kvm_apic_map.phys_map to optimize logical mode
> +		 * x2APIC interrupts by reversing the LDR calculation to get
> +		 * cluster of APICs, i.e. no additional work is required.
> +		 */
> +		if (apic_x2apic_mode(apic)) {
> +			WARN_ON_ONCE(ldr != kvm_apic_calc_x2apic_ldr(x2apic_id));
> +			continue;
> +		}
> +
>  		if (WARN_ON_ONCE(!kvm_apic_map_get_logical_dest(new, ldr,
>  								&cluster, &mask))) {
>  			new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
> @@ -381,11 +398,6 @@ static inline void kvm_apic_set_dfr(struct kvm_lapic *apic, u32 val)
>  	atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
>  }
>  
> -static inline u32 kvm_apic_calc_x2apic_ldr(u32 id)
> -{
> -	return ((id >> 4) << 16) | (1 << (id & 0xf));
> -}
> -
>  static inline void kvm_apic_set_x2apic_id(struct kvm_lapic *apic, u32 id)
>  {
>  	u32 ldr = kvm_apic_calc_x2apic_ldr(id);



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 23/32] KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs
  2022-10-01  0:59 ` [PATCH v4 23/32] KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs Sean Christopherson
@ 2022-12-08 21:58   ` Maxim Levitsky
  0 siblings, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> Apply KVM's hotplug hack if and only if userspace has enabled 32-bit IDs
> for x2APIC.  If 32-bit IDs are not enabled, disable the optimized map to
> honor x86 architectural behavior if multiple vCPUs shared a physical APIC
> ID.  As called out in the changelog that added the hack, all CPUs whose
> (possibly truncated) APIC ID matches the target are supposed to receive
> the IPI.
> 
>   KVM intentionally differs from real hardware, because real hardware
>   (Knights Landing) does just "x2apic_id & 0xff" to decide whether to
>   accept the interrupt in xAPIC mode and it can deliver one interrupt to
>   more than one physical destination, e.g. 0x123 to 0x123 and 0x23.
> 
> Applying the hack even when x2APIC is not fully enabled means KVM doesn't
> correctly handle scenarios where the guest has aliased xAPIC IDs across
> multiple vCPUs, as only the vCPU with the lowest vCPU ID will receive any
> interrupts.  It's extremely unlikely any real world guest aliase APIC IDs,
> or even modifies APIC IDs, but KVM's behavior is arbitrary, e.g. the
> lowest vCPU ID "wins" regardless of which vCPU is "aliasing" and which
> vCPU is "normal".
> 
> Furthermore, the hack is _not_ guaranteed to work!  The hack works if and
> only if the optimized APIC map is successfully allocated.  If the map
> allocation fails (unlikely), KVM will fall back to its unoptimized
> behavior, which _does_ honor the architectural behavior.
> 
> Pivot on 32-bit x2APIC IDs being enabled as that is required to take
> advantage of the hotplug hack (see kvm_apic_state_fixup()), i.e. won't
> break existing setups unless they are way, way off in the weeds.
> 
> And an entry in KVM's errata to document the hack.  Alternatively, KVM
> could provide an actual x2APIC quirk and document the hack that way, but
> there's unlikely to ever be a use case for disabling the quirk.  Go the
> errata route to avoid having to validate a quirk no one cares about.
> 
> Fixes: 5bd5db385b3e ("KVM: x86: allow hotplug of VCPU with APIC ID over 0xff")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  Documentation/virt/kvm/x86/errata.rst | 11 ++++++
>  arch/x86/kvm/lapic.c                  | 50 ++++++++++++++++++++++-----
>  2 files changed, 52 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/x86/errata.rst b/Documentation/virt/kvm/x86/errata.rst
> index 410e0aa63493..49a05f24747b 100644
> --- a/Documentation/virt/kvm/x86/errata.rst
> +++ b/Documentation/virt/kvm/x86/errata.rst
> @@ -37,3 +37,14 @@ Nested virtualization features
>  ------------------------------
>  
>  TBD
> +
> +x2APIC
> +------
> +When KVM_X2APIC_API_USE_32BIT_IDS is enabled, KVM activates a hack/quirk that
> +allows sending events to a single vCPU using its x2APIC ID even if the target
> +vCPU has legacy xAPIC enabled, e.g. to bring up hotplugged vCPUs via INIT-SIPI
> +on VMs with > 255 vCPUs.  A side effect of the quirk is that, if multiple vCPUs
> +have the same physical APIC ID, KVM will deliver events targeting that APIC ID
> +only to the vCPU with the lowest vCPU ID.  If KVM_X2APIC_API_USE_32BIT_IDS is
> +not enabled, KVM follows x86 architecture when processing interrupts (all vCPUs
> +matching the target APIC ID receive the interrupt).

> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 14f03e654de4..340c2d3e781b 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -274,10 +274,10 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
>  		struct kvm_lapic *apic = vcpu->arch.apic;
>  		struct kvm_lapic **cluster;
>  		enum kvm_apic_logical_mode logical_mode;
> +		u32 x2apic_id, physical_id;
>  		u16 mask;
>  		u32 ldr;
>  		u8 xapic_id;
> -		u32 x2apic_id;
>  
>  		if (!kvm_apic_present(vcpu))
>  			continue;
> @@ -285,16 +285,48 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
>  		xapic_id = kvm_xapic_id(apic);
>  		x2apic_id = kvm_x2apic_id(apic);
>  
> -		/* Hotplug hack: see kvm_apic_match_physical_addr(), ... */
> -		if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
> -				x2apic_id <= new->max_apic_id)
> -			new->phys_map[x2apic_id] = apic;
>  		/*
> -		 * ... xAPIC ID of VCPUs with APIC ID > 0xff will wrap-around,
> -		 * prevent them from masking VCPUs with APIC ID <= 0xff.
> +		 * Apply KVM's hotplug hack if userspace has enable 32-bit APIC
> +		 * IDs.  Allow sending events to vCPUs by their x2APIC ID even
> +		 * if the target vCPU is in legacy xAPIC mode, and silently
> +		 * ignore aliased xAPIC IDs (the x2APIC ID is truncated to 8
> +		 * bits, causing IDs > 0xff to wrap and collide).
> +		 *
> +		 * Honor the architectural (and KVM's non-optimized) behavior
> +		 * if userspace has not enabled 32-bit x2APIC IDs.  Each APIC
> +		 * is supposed to process messages independently.  If multiple
> +		 * vCPUs have the same effective APIC ID, e.g. due to the
> +		 * x2APIC wrap or because the guest manually modified its xAPIC
> +		 * IDs, events targeting that ID are supposed to be recognized
> +		 * by all vCPUs with said ID.
>  		 */
> -		if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
> -			new->phys_map[xapic_id] = apic;
> +		if (kvm->arch.x2apic_format) {
> +			/* See also kvm_apic_match_physical_addr(). */
> +			if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
> +			    x2apic_id <= new->max_apic_id)
> +				new->phys_map[x2apic_id] = apic;
> +
> +			if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
> +				new->phys_map[xapic_id] = apic;
> +		} else {
> +			/*
> +			 * Disable the optimized map if the physical APIC ID is
> +			 * already mapped, i.e. is aliased to multiple vCPUs.
> +			 * The optimized map requires a strict 1:1 mapping
> +			 * between IDs and vCPUs.
> +			 */
> +			if (apic_x2apic_mode(apic))
> +				physical_id = x2apic_id;
> +			else
> +				physical_id = xapic_id;
> +
> +			if (new->phys_map[physical_id]) {
> +				kvfree(new);
> +				new = NULL;
> +				goto out;
> +			}
> +			new->phys_map[physical_id] = apic;
> +		}
>  
>  		if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED ||
>  		    !kvm_apic_sw_enabled(apic))


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 24/32] KVM: x86: Inhibit APICv/AVIC if the optimized physical map is disabled
  2022-10-01  0:59 ` [PATCH v4 24/32] KVM: x86: Inhibit APICv/AVIC if the optimized physical map is disabled Sean Christopherson
@ 2022-12-08 21:58   ` Maxim Levitsky
  2022-12-09  0:56     ` Sean Christopherson
  0 siblings, 1 reply; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> Inhibit APICv/AVIC if the optimized physical map is disabled so that KVM
> KVM provides consistent APIC behavior if xAPIC IDs are aliased due to
> vcpu_id being truncated and the x2APIC hotplug hack isn't enabled.  If
> the hotplug hack is disabled, events that are emulated by KVM will follow
> architectural behavior (all matching vCPUs receive events, even if the
> "match" is due to truncation), whereas APICv and AVIC will deliver events
> only to the first matching vCPU, i.e. the vCPU that matches without
> truncation.
> 
> Note, the "extra" inhibit is needed because  KVM deliberately ignores
> mismatches due to truncation when applying the APIC_ID_MODIFIED inhibit
> so that large VMs (>255 vCPUs) can run with APICv/AVIC.
> 
> Fixes: TDB
Do you mean Trade & Development Bank of Mongolia ? ;-)


> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  6 ++++++
>  arch/x86/kvm/lapic.c            | 13 ++++++++++++-
>  arch/x86/kvm/svm/avic.c         |  1 +
>  arch/x86/kvm/vmx/vmx.c          |  1 +
>  4 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ac28bbfbf0e3..171e38b94c89 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1104,6 +1104,12 @@ enum kvm_apicv_inhibit {
>  	 */
>  	APICV_INHIBIT_REASON_BLOCKIRQ,
>  
> +	/*
> +	 * APICv is disabled because not all vCPUs have a 1:1 mapping between
> +	 * APIC ID and vCPU, _and_ KVM is not applying its x2APIC hotplug hack.
> +	 */
> +	APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED,
> +
>  	/*
>  	 * For simplicity, the APIC acceleration is inhibited
>  	 * first time either APIC ID or APIC base are changed by the guest
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 340c2d3e781b..f6f328d36ae2 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -381,6 +381,16 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
>  		cluster[ldr] = apic;
>  	}
>  out:
> +	/*
> +	 * The optimized map is effectively KVM's internal version of APICv,
Nitpick: APICv/AVIC?
> +	 * and all unwanted aliasing that results in disabling the optimized
> +	 * map also applies to APICv.
> +	 */
> +	if (!new)
> +		kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED);
> +	else
> +		kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED);
> +
>  	old = rcu_dereference_protected(kvm->arch.apic_map,
>  			lockdep_is_held(&kvm->arch.apic_map_lock));
>  	rcu_assign_pointer(kvm->arch.apic_map, new);
> @@ -2153,7 +2163,8 @@ static void kvm_lapic_xapic_id_updated(struct kvm_lapic *apic)
>  	/*
>  	 * Deliberately truncate the vCPU ID when detecting a modified APIC ID
>  	 * to avoid false positives if the vCPU ID, i.e. x2APIC ID, is a 32-bit
> -	 * value.
> +	 * value.  If the wrap/truncation results in unwatned aliasing, APICv
                                                       ^^ typo
> +	 * will be inhibited as part of updating KVM's optimized APIC maps.
>  	 */
>  	if (kvm_xapic_id(apic) == (u8)apic->vcpu->vcpu_id)
>  		return;
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index dd0e41d454a7..2908adc79ea6 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -965,6 +965,7 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
>  			  BIT(APICV_INHIBIT_REASON_PIT_REINJ) |
>  			  BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
>  			  BIT(APICV_INHIBIT_REASON_SEV)      |
> +			  BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |
>  			  BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
>  			  BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
>  			  BIT(APICV_INHIBIT_REASON_X2APIC);
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 974d9a366d5d..5920166d7260 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7955,6 +7955,7 @@ static bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
>  			  BIT(APICV_INHIBIT_REASON_ABSENT) |
>  			  BIT(APICV_INHIBIT_REASON_HYPERV) |
>  			  BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
> +			  BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |
>  			  BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
>  			  BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED);
>  


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>


Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 25/32] KVM: SVM: Inhibit AVIC if vCPUs are aliased in logical mode
  2022-10-01  0:59 ` [PATCH v4 25/32] KVM: SVM: Inhibit AVIC if vCPUs are aliased in logical mode Sean Christopherson
@ 2022-12-08 21:58   ` Maxim Levitsky
  0 siblings, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> Inhibit SVM's AVIC if multiple vCPUs are aliased to the same logical ID.
> Architecturally, all CPUs whose logical ID matches the MDA are supposed
> to receive the interrupt; overwriting existing entries in AVIC's
> logical=>physical map can result in missed IPIs.
> 
> Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 6 ++++++
>  arch/x86/kvm/lapic.c            | 5 +++++
>  arch/x86/kvm/svm/avic.c         | 3 ++-
>  3 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 171e38b94c89..4fd06965c773 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1159,6 +1159,12 @@ enum kvm_apicv_inhibit {
>  	 * APIC base.  For simplicity, this is sticky.
>  	 */
>  	APICV_INHIBIT_REASON_X2APIC,
> +
> +	/*
> +	 * AVIC is disabled because not all vCPUs with a valid LDR have a 1:1
> +	 * mapping between logical ID and vCPU.
> +	 */
> +	APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED,
>  };
>  
>  struct kvm_arch {
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index f6f328d36ae2..9b3af49d2524 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -391,6 +391,11 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
>  	else
>  		kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED);
>  
> +	if (!new || new->logical_mode == KVM_APIC_MODE_MAP_DISABLED)
> +		kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);
> +	else
> +		kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);
> +
>  	old = rcu_dereference_protected(kvm->arch.apic_map,
>  			lockdep_is_held(&kvm->arch.apic_map_lock));
>  	rcu_assign_pointer(kvm->arch.apic_map, new);
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 2908adc79ea6..27d5abc15a91 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -968,7 +968,8 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
>  			  BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |
>  			  BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
>  			  BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
> -			  BIT(APICV_INHIBIT_REASON_X2APIC);
> +			  BIT(APICV_INHIBIT_REASON_X2APIC) |
> +			  BIT(APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);
>  
>  	return supported & BIT(reason);
>  }

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 26/32] KVM: SVM: Always update local APIC on writes to logical dest register
  2022-10-01  0:59 ` [PATCH v4 26/32] KVM: SVM: Always update local APIC on writes to logical dest register Sean Christopherson
@ 2022-12-08 21:58   ` Maxim Levitsky
  0 siblings, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> Update the vCPU's local (virtual) APIC on LDR writes even if the write
> "fails".  The APIC needs to recalc the optimized logical map even if the
> LDR is invalid or zero, e.g. if the guest clears its LDR, the optimized
> map will be left as is and the vCPU will receive interrupts using its
> old LDR.
> 
> Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/avic.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 27d5abc15a91..2b640c73f447 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -573,7 +573,7 @@ static void avic_invalidate_logical_id_entry(struct kvm_vcpu *vcpu)
>  		clear_bit(AVIC_LOGICAL_ID_ENTRY_VALID_BIT, (unsigned long *)entry);
>  }
>  
> -static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)
> +static void avic_handle_ldr_update(struct kvm_vcpu *vcpu)
>  {
>  	int ret = 0;
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -582,10 +582,10 @@ static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)
>  
>  	/* AVIC does not support LDR update for x2APIC */
>  	if (apic_x2apic_mode(vcpu->arch.apic))
> -		return 0;
> +		return;
>  
>  	if (ldr == svm->ldr_reg)
> -		return 0;
> +		return;
>  
>  	avic_invalidate_logical_id_entry(vcpu);
>  
> @@ -594,8 +594,6 @@ static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)
>  
>  	if (!ret)
>  		svm->ldr_reg = ldr;
> -
> -	return ret;
>  }
>  
>  static void avic_handle_dfr_update(struct kvm_vcpu *vcpu)
> @@ -617,8 +615,7 @@ static int avic_unaccel_trap_write(struct kvm_vcpu *vcpu)
>  
>  	switch (offset) {
>  	case APIC_LDR:
> -		if (avic_handle_ldr_update(vcpu))
> -			return 0;
> +		avic_handle_ldr_update(vcpu);
>  		break;
>  	case APIC_DFR:
>  		avic_handle_dfr_update(vcpu);


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 27/32] KVM: SVM: Update svm->ldr_reg cache even if LDR is "bad"
  2022-10-01  0:59 ` [PATCH v4 27/32] KVM: SVM: Update svm->ldr_reg cache even if LDR is "bad" Sean Christopherson
@ 2022-12-08 21:59   ` Maxim Levitsky
  2022-12-09  0:49     ` Sean Christopherson
  0 siblings, 1 reply; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 21:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> Update SVM's cache of the LDR even if the new value is "bad".  Leaving
> stale information in the cache can result in KVM missing updates and/or
> invalidating the wrong entry, e.g. if avic_invalidate_logical_id_entry()
> is triggered after a different vCPU has "claimed" the old LDR.
> 
> Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/avic.c | 17 +++++++----------
>  1 file changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 2b640c73f447..4b6fc9d64f4d 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -539,23 +539,24 @@ static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat)
>  	return &logical_apic_id_table[index];
>  }
>  
> -static int avic_ldr_write(struct kvm_vcpu *vcpu, u8 g_physical_id, u32 ldr)
> +static void avic_ldr_write(struct kvm_vcpu *vcpu, u8 g_physical_id, u32 ldr)
>  {
>  	bool flat;
>  	u32 *entry, new_entry;
>  
> +	if (!ldr)
> +		return;
> +
The avic_get_logical_id_entry already returns NULL in this case, so I don't think
that this is needed.

>  	flat = kvm_lapic_get_reg(vcpu->arch.apic, APIC_DFR) == APIC_DFR_FLAT;
>  	entry = avic_get_logical_id_entry(vcpu, ldr, flat);
>  	if (!entry)
> -		return -EINVAL;
> +		return;
>  
>  	new_entry = READ_ONCE(*entry);
>  	new_entry &= ~AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
>  	new_entry |= (g_physical_id & AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK);
>  	new_entry |= AVIC_LOGICAL_ID_ENTRY_VALID_MASK;
>  	WRITE_ONCE(*entry, new_entry);
> -
> -	return 0;
>  }
>  
>  static void avic_invalidate_logical_id_entry(struct kvm_vcpu *vcpu)
> @@ -575,7 +576,6 @@ static void avic_invalidate_logical_id_entry(struct kvm_vcpu *vcpu)
>  
>  static void avic_handle_ldr_update(struct kvm_vcpu *vcpu)
>  {
> -	int ret = 0;
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  	u32 ldr = kvm_lapic_get_reg(vcpu->arch.apic, APIC_LDR);
>  	u32 id = kvm_xapic_id(vcpu->arch.apic);
> @@ -589,11 +589,8 @@ static void avic_handle_ldr_update(struct kvm_vcpu *vcpu)
>  
>  	avic_invalidate_logical_id_entry(vcpu);
>  
> -	if (ldr)
> -		ret = avic_ldr_write(vcpu, id, ldr);
> -
> -	if (!ret)
> -		svm->ldr_reg = ldr;
> +	svm->ldr_reg = ldr;
> +	avic_ldr_write(vcpu, id, ldr);
>  }
>  
>  static void avic_handle_dfr_update(struct kvm_vcpu *vcpu)


With the nitpick fixed:

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>


Best regards,
	Maxim Levitsky <mlevitsk@redhat.com>




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 28/32] KVM: SVM: Require logical ID to be power-of-2 for AVIC entry
  2022-10-01  0:59 ` [PATCH v4 28/32] KVM: SVM: Require logical ID to be power-of-2 for AVIC entry Sean Christopherson
@ 2022-12-08 22:00   ` Maxim Levitsky
  2022-12-29  8:27     ` mlevitsk
  0 siblings, 1 reply; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 22:00 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> Do not modify AVIC's logical ID table if the logical ID portion of the
> LDR is not a power-of-2, i.e. if the LDR has multiple bits set.  Taking
> only the first bit means that KVM will fail to match MDAs that intersect
> with "higher" bits in the "ID"
> 
> The "ID" acts as a bitmap, but is referred to as an ID because theres an
> implicit, unenforced "requirement" that software only set one bit.  This
> edge case is arguably out-of-spec behavior, but KVM cleanly handles it
> in all other cases, e.g. the optimized logical map (and AVIC!) is also
> disabled in this scenario.
> 
> Refactor the code to consolidate the checks, and so that the code looks
> more like avic_kick_target_vcpus_fast().
> 
> Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Cc: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/avic.c | 30 +++++++++++++++---------------
>  1 file changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 4b6fc9d64f4d..a9e4e09f83fc 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -513,26 +513,26 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu)
>  static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat)
>  {
>  	struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);
> -	int index;
>  	u32 *logical_apic_id_table;
> -	int dlid = GET_APIC_LOGICAL_ID(ldr);
> +	u32 cluster, index;
>  
> -	if (!dlid)
> -		return NULL;
> +	ldr = GET_APIC_LOGICAL_ID(ldr);
>  
> -	if (flat) { /* flat */
> -		index = ffs(dlid) - 1;
> -		if (index > 7)
> +	if (flat) {
> +		cluster = 0;
> +	} else {
> +		cluster = (ldr >> 4) << 2;
> +		if (cluster >= 0xf)
>  			return NULL;
> -	} else { /* cluster */
> -		int cluster = (dlid & 0xf0) >> 4;
> -		int apic = ffs(dlid & 0x0f) - 1;
> -
> -		if ((apic < 0) || (apic > 7) ||
> -		    (cluster >= 0xf))
> -			return NULL;
> -		index = (cluster << 2) + apic;
> +		ldr &= 0xf;
>  	}
> +	if (!ldr || !is_power_of_2(ldr))
> +		return NULL;
> +
> +	index = __ffs(ldr);
> +	if (WARN_ON_ONCE(index > 7))
> +		return NULL;
> +	index += (cluster << 2);
>  
>  	logical_apic_id_table = (u32 *) page_address(kvm_svm->avic_logical_id_table_page);
>  


Looks good.

For future refactoring, I also suggest to rename this function to 'avic_get_logical_id_table_entry'
to stress the fact that it gets a pointer to the AVIC's data structure.

Same for 'avic_get_physical_id_entry'

And also while at it : the 'svm->avic_physical_id_cache', is a very misleading name,

It should be svm->avic_physical_id_table_entry_ptr with a comment explaining that
is is the pointer to physid table entry.


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>



Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 29/32] KVM: SVM: Handle multiple logical targets in AVIC kick fastpath
  2022-10-01  0:59 ` [PATCH v4 29/32] KVM: SVM: Handle multiple logical targets in AVIC kick fastpath Sean Christopherson
@ 2022-12-08 22:00   ` Maxim Levitsky
  0 siblings, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 22:00 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> Iterate over all target logical IDs in the AVIC kick fastpath instead of
> bailing if there is more than one target.  Now that KVM inhibits AVIC if
> vCPUs aren't mapped 1:1 with logical IDs, each bit in the destination is
> guaranteed to match to at most one vCPU, i.e. iterating over the bitmap
> is guaranteed to kick each valid target exactly once.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/avic.c | 112 ++++++++++++++++++++++------------------
>  1 file changed, 63 insertions(+), 49 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index a9e4e09f83fc..17e64b056e4e 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -327,6 +327,50 @@ static void avic_kick_vcpu(struct kvm_vcpu *vcpu, u32 icrl)
>  					icrl & APIC_VECTOR_MASK);
>  }
>  
> +static void avic_kick_vcpu_by_physical_id(struct kvm *kvm, u32 physical_id,
> +					  u32 icrl)
> +{
> +	/*
> +	 * KVM inhibits AVIC if any vCPU ID diverges from the vCPUs APIC ID,
> +	 * i.e. APIC ID == vCPU ID.
> +	 */
> +	struct kvm_vcpu *target_vcpu = kvm_get_vcpu_by_id(kvm, physical_id);
> +
> +	/* Once again, nothing to do if the target vCPU doesn't exist. */
> +	if (unlikely(!target_vcpu))
> +		return;
> +
> +	avic_kick_vcpu(target_vcpu, icrl);
> +}
> +
> +static void avic_kick_vcpu_by_logical_id(struct kvm *kvm, u32 *avic_logical_id_table,
> +					 u32 logid_index, u32 icrl)
> +{
> +	u32 physical_id;
> +
> +	if (avic_logical_id_table) {
> +		u32 logid_entry = avic_logical_id_table[logid_index];
> +
> +		/* Nothing to do if the logical destination is invalid. */
> +		if (unlikely(!(logid_entry & AVIC_LOGICAL_ID_ENTRY_VALID_MASK)))
> +			return;
> +
> +		physical_id = logid_entry &
> +			      AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
> +	} else {
> +		/*
> +		 * For x2APIC, the logical APIC ID is a read-only value that is
> +		 * derived from the x2APIC ID, thus the x2APIC ID can be found
> +		 * by reversing the calculation (stored in logid_index).  Note,
> +		 * bits 31:20 of the x2APIC ID aren't propagated to the logical
> +		 * ID, but KVM limits the x2APIC ID limited to KVM_MAX_VCPU_IDS.
> +		 */
> +		physical_id = logid_index;
> +	}
> +
> +	avic_kick_vcpu_by_physical_id(kvm, physical_id, icrl);
> +}
> +
>  /*
>   * A fast-path version of avic_kick_target_vcpus(), which attempts to match
>   * destination APIC ID to vCPU without looping through all vCPUs.
> @@ -334,11 +378,10 @@ static void avic_kick_vcpu(struct kvm_vcpu *vcpu, u32 icrl)
>  static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source,
>  				       u32 icrl, u32 icrh, u32 index)
>  {
> -	u32 l1_physical_id, dest;
> -	struct kvm_vcpu *target_vcpu;
>  	int dest_mode = icrl & APIC_DEST_MASK;
>  	int shorthand = icrl & APIC_SHORT_MASK;
>  	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
> +	u32 dest;
>  
>  	if (shorthand != APIC_DEST_NOSHORT)
>  		return -EINVAL;
> @@ -355,14 +398,14 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
>  		if (!apic_x2apic_mode(source) && dest == APIC_BROADCAST)
>  			return -EINVAL;
>  
> -		l1_physical_id = dest;
> -
> -		if (WARN_ON_ONCE(l1_physical_id != index))
> +		if (WARN_ON_ONCE(dest != index))
>  			return -EINVAL;
>  
> +		avic_kick_vcpu_by_physical_id(kvm, dest, icrl);
>  	} else {
> -		u32 bitmap, cluster;
> -		int logid_index;
> +		u32 *avic_logical_id_table;
> +		unsigned long bitmap, i;
> +		u32 cluster;
>  
>  		if (apic_x2apic_mode(source)) {
>  			/* 16 bit dest mask, 16 bit cluster id */
> @@ -382,50 +425,21 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
>  		if (unlikely(!bitmap))
>  			return 0;
>  
> -		if (!is_power_of_2(bitmap))
> -			/* multiple logical destinations, use slow path */
> -			return -EINVAL;
> -
> -		logid_index = cluster + __ffs(bitmap);
> -
> -		if (apic_x2apic_mode(source)) {
> -			/*
> -			 * For x2APIC, the logical APIC ID is a read-only value
> -			 * that is derived from the x2APIC ID, thus the x2APIC
> -			 * ID can be found by reversing the calculation (done
> -			 * above).  Note, bits 31:20 of the x2APIC ID are not
> -			 * propagated to the logical ID, but KVM limits the
> -			 * x2APIC ID limited to KVM_MAX_VCPU_IDS.
> -			 */
> -			l1_physical_id = logid_index;
> -		} else {
> -			u32 *avic_logical_id_table =
> -				page_address(kvm_svm->avic_logical_id_table_page);
> -
> -			u32 logid_entry = avic_logical_id_table[logid_index];
> -
> -			if (WARN_ON_ONCE(index != logid_index))
> -				return -EINVAL;
> -
> -			/* Nothing to do if the logical destination is invalid. */
> -			if (unlikely(!(logid_entry & AVIC_LOGICAL_ID_ENTRY_VALID_MASK)))
> -				return 0;
> -
> -			l1_physical_id = logid_entry &
> -					 AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
> -		}
> +		if (apic_x2apic_mode(source))
> +			avic_logical_id_table = NULL;
> +		else
> +			avic_logical_id_table = page_address(kvm_svm->avic_logical_id_table_page);
> +
> +		/*
> +		 * AVIC is inhibited if vCPUs aren't mapped 1:1 with logical
> +		 * IDs, thus each bit in the destination is guaranteed to map
> +		 * to at most one vCPU.
> +		 */
> +		for_each_set_bit(i, &bitmap, 16)
> +			avic_kick_vcpu_by_logical_id(kvm, avic_logical_id_table,
> +						     cluster + i, icrl);
>  	}
>  
> -	/*
> -	 * KVM inhibits AVIC if any vCPU ID diverges from the vCPUs APIC ID,
> -	 * i.e. APIC ID == vCPU ID.  Once again, nothing to do if the target
> -	 * vCPU doesn't exist.
> -	 */
> -	target_vcpu = kvm_get_vcpu_by_id(kvm, l1_physical_id);
> -	if (unlikely(!target_vcpu))
> -		return 0;
> -
> -	avic_kick_vcpu(target_vcpu, icrl);
>  	return 0;
>  }
>  


Looks OK, but I might have missed something.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>


Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 31/32] Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on a running vcpu"
  2022-10-01  0:59 ` [PATCH v4 31/32] Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on a running vcpu" Sean Christopherson
@ 2022-12-08 22:01   ` Maxim Levitsky
  0 siblings, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 22:01 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> Turns out that some warnings exist for good reasons.  Restore the warning
> in avic_vcpu_load() that guards against calling avic_vcpu_load() on a
> running vCPU now that KVM avoids doing so when switching between x2APIC
> and xAPIC.  The entire point of the WARN is to highlight that KVM should
> not be reloading an AVIC.
> 
> Opportunistically convert the WARN_ON() to WARN_ON_ONCE() to avoid
> spamming the kernel if it does fire.
> 
> This reverts commit c0caeee65af3944b7b8abbf566e7cc1fae15c775.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/avic.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 953b1fd14b6d..35b0ef877e53 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -1038,6 +1038,7 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  		return;
>  
>  	entry = READ_ONCE(*(svm->avic_physical_id_cache));
> +	WARN_ON_ONCE(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
>  
>  	entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
>  	entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>


Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 03/32] KVM: SVM: Flush the "current" TLB when activating AVIC
       [not found]   ` <b9f336f17eec6bfbb8429700e0f135d19813c576.camel@redhat.com>
  2022-12-08 21:52     ` Maxim Levitsky
@ 2022-12-08 22:02     ` Maxim Levitsky
  1 sibling, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 22:02 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Wed, 2022-12-07 at 18:02 +0200, Maxim Levitsky wrote:
On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> Flush the TLB when activating AVIC as the CPU can insert into the TLB
> while AVIC is "locally" disabled.  KVM doesn't treat "APIC hardware
> disabled" as VM-wide AVIC inhibition, and so when a vCPU has its APIC
> hardware disabled, AVIC is not guaranteed to be inhibited.  As a result,
> KVM may create a valid NPT mapping for the APIC base, which the CPU can
> cache as a non-AVIC translation.
> 
> Note, Intel handles this in vmx_set_virtual_apic_mode().
> 
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/avic.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 6919dee69f18..712330b80891 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -86,6 +86,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
>                 /* Disabling MSR intercept for x2APIC registers */
>                 svm_set_x2apic_msr_interception(svm, false);
>         } else {
> +               /*
> +                * Flush the TLB, the guest may have inserted a non-APIC
> +                * mapping into the TLB while AVIC was disabled.
> +                */
> +               kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, &svm->vcpu);
> +
>                 /* For xAVIC and hybrid-xAVIC modes */
>                 vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID;
>                 /* Enabling MSR intercept for x2APIC registers */

I agree, that if guest disables APIC on a vCPU, this will lead to call to kvm_apic_update_apicv which will
disable AVIC, but if other vCPUs don't disable it, the AVIC's private memslot will still be mapped and
guest could read/write it from this vCPU, and its TLB mapping needs to be invalidated if/when APIC is re-enabled.
 
However I think that this adds an unnecessarily (at least in the future) performance penalty to AVIC nesting coexistence:
 
L1's AVIC is inhibited on each nested VM entry, and uninhibited on each nested VM exit, but while nested the guest
can't really access it as it has its own NPT.
 
With this patch KVM will invalidate L1's TLB on each nested VM exit. KVM sadly already does this but this can be fixed
(its another thing on my TODO list)
 
Note that APICv doesn't have this issue, it is not inhibited on nested VM entry/exit, thus this code is not performance
sensitive for APICv.
 
 
I somewhat vote again, as I said before to disable the APICv/AVIC memslot, if any of vCPUs have APICv/AVIC hardware disabled,
because it is also more correct from an x86 perspective. I do wonder how often is the usage of having "extra" cpus but not using them, and thus having their APIC in disabled state.
KVM does support adding new vCPUs on the fly, so this shouldn't be needed, and APICv inhibit in this case is just a perf regression.

Or at least do this only when APIC does back from hardware disabled state to enabled.

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 32/32] KVM: x86: Track required APICv inhibits with variable, not callback
  2022-10-01  0:59 ` [PATCH v4 32/32] KVM: x86: Track required APICv inhibits with variable, not callback Sean Christopherson
@ 2022-12-08 22:03   ` Maxim Levitsky
  0 siblings, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2022-12-08 22:03 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> Track the per-vendor required APICv inhibits with a variable instead of
> calling into vendor code every time KVM wants to query the set of
> required inhibits.  The required inhibits are a property of the vendor's
> virtualization architecture, i.e. are 100% static.
> 
> Using a variable allows the compiler to inline the check, e.g. generate
> a single-uop TEST+Jcc, and thus eliminates any desire to avoid checking
> inhibits for performance reasons.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  1 -
>  arch/x86/include/asm/kvm_host.h    |  2 +-
>  arch/x86/kvm/svm/avic.c            | 20 --------------------
>  arch/x86/kvm/svm/svm.c             |  2 +-
>  arch/x86/kvm/svm/svm.h             | 17 ++++++++++++++++-
>  arch/x86/kvm/vmx/vmx.c             | 24 +++++++++++-------------
>  arch/x86/kvm/x86.c                 |  9 +++++++--
>  7 files changed, 36 insertions(+), 39 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 82ba4a564e58..533d0e804e5a 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -76,7 +76,6 @@ KVM_X86_OP(set_nmi_mask)
>  KVM_X86_OP(enable_nmi_window)
>  KVM_X86_OP(enable_irq_window)
>  KVM_X86_OP_OPTIONAL(update_cr8_intercept)
> -KVM_X86_OP(check_apicv_inhibit_reasons)
>  KVM_X86_OP(refresh_apicv_exec_ctrl)
>  KVM_X86_OP_OPTIONAL(hwapic_irr_update)
>  KVM_X86_OP_OPTIONAL(hwapic_isr_update)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 4fd06965c773..41d1c684a1e1 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1566,7 +1566,7 @@ struct kvm_x86_ops {
>  	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
>  	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
>  	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
> -	bool (*check_apicv_inhibit_reasons)(enum kvm_apicv_inhibit reason);
> +	const unsigned long required_apicv_inhibits;
>  	void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
>  	void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
>  	void (*hwapic_isr_update)(int isr);
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 35b0ef877e53..5b25944274b5 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -966,26 +966,6 @@ int avic_pi_update_irte(struct kvm *kvm, unsigned int host_irq,
>  	return ret;
>  }
>  
> -bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
> -{
> -	ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
> -			  BIT(APICV_INHIBIT_REASON_ABSENT) |
> -			  BIT(APICV_INHIBIT_REASON_HYPERV) |
> -			  BIT(APICV_INHIBIT_REASON_NESTED) |
> -			  BIT(APICV_INHIBIT_REASON_IRQWIN) |
> -			  BIT(APICV_INHIBIT_REASON_PIT_REINJ) |
> -			  BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
> -			  BIT(APICV_INHIBIT_REASON_SEV)      |
> -			  BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |
> -			  BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
> -			  BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
> -			  BIT(APICV_INHIBIT_REASON_X2APIC) |
> -			  BIT(APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);
> -
> -	return supported & BIT(reason);
> -}
> -
> -
>  static inline int
>  avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r)
>  {
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 37fe7bcf8496..6bca3f9f39e6 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4800,8 +4800,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.update_cr8_intercept = svm_update_cr8_intercept,
>  	.set_virtual_apic_mode = avic_refresh_virtual_apic_mode,
>  	.refresh_apicv_exec_ctrl = avic_refresh_apicv_exec_ctrl,
> -	.check_apicv_inhibit_reasons = avic_check_apicv_inhibit_reasons,
>  	.apicv_post_state_restore = avic_apicv_post_state_restore,
> +	.required_apicv_inhibits = AVIC_REQUIRED_APICV_INHIBITS,
>  
>  	.get_exit_info = svm_get_exit_info,
>  
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 29c334a932c3..dfde85782da5 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -619,6 +619,22 @@ void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);
>  extern struct kvm_x86_nested_ops svm_nested_ops;
>  
>  /* avic.c */
> +#define AVIC_REQUIRED_APICV_INHIBITS			\
> +(							\
> +	BIT(APICV_INHIBIT_REASON_DISABLE) |		\
> +	BIT(APICV_INHIBIT_REASON_ABSENT) |		\
> +	BIT(APICV_INHIBIT_REASON_HYPERV) |		\
> +	BIT(APICV_INHIBIT_REASON_NESTED) |		\
> +	BIT(APICV_INHIBIT_REASON_IRQWIN) |		\
> +	BIT(APICV_INHIBIT_REASON_PIT_REINJ) |		\
> +	BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |		\
> +	BIT(APICV_INHIBIT_REASON_SEV)      |		\
> +	BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |	\
> +	BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |	\
> +	BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |	\
> +	BIT(APICV_INHIBIT_REASON_X2APIC) |		\
> +	BIT(APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED)	\
> +)
>  
>  bool avic_hardware_setup(struct kvm_x86_ops *ops);
>  int avic_ga_log_notifier(u32 ga_tag);
> @@ -632,7 +648,6 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
>  void avic_vcpu_put(struct kvm_vcpu *vcpu);
>  void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu);
>  void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu);
> -bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason);
>  int avic_pi_update_irte(struct kvm *kvm, unsigned int host_irq,
>  			uint32_t guest_irq, bool set);
>  void avic_vcpu_blocking(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 5920166d7260..54bc447ba87e 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7949,18 +7949,16 @@ static void vmx_hardware_unsetup(void)
>  	free_kvm_area();
>  }
>  
> -static bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
> -{
> -	ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
> -			  BIT(APICV_INHIBIT_REASON_ABSENT) |
> -			  BIT(APICV_INHIBIT_REASON_HYPERV) |
> -			  BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
> -			  BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |
> -			  BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
> -			  BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED);
> -
> -	return supported & BIT(reason);
> -}
> +#define VMX_REQUIRED_APICV_INHIBITS			\
> +(							\
> +	BIT(APICV_INHIBIT_REASON_DISABLE)|		\
> +	BIT(APICV_INHIBIT_REASON_ABSENT) |		\
> +	BIT(APICV_INHIBIT_REASON_HYPERV) |		\
> +	BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |		\
> +	BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |	\
> +	BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |	\
> +	BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED)	\
> +)
>  
>  static void vmx_vm_destroy(struct kvm *kvm)
>  {
> @@ -8044,7 +8042,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
>  	.refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl,
>  	.load_eoi_exitmap = vmx_load_eoi_exitmap,
>  	.apicv_post_state_restore = vmx_apicv_post_state_restore,
> -	.check_apicv_inhibit_reasons = vmx_check_apicv_inhibit_reasons,
> +	.required_apicv_inhibits = VMX_REQUIRED_APICV_INHIBITS,
>  	.hwapic_irr_update = vmx_hwapic_irr_update,
>  	.hwapic_isr_update = vmx_hwapic_isr_update,
>  	.guest_apic_has_interrupt = vmx_guest_apic_has_interrupt,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a20002924eb4..b307637d9200 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10251,6 +10251,11 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
>  	kvm_make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
>  }
>  
> +static bool kvm_is_required_apicv_inhibit(enum kvm_apicv_inhibit reason)
> +{
> +	return kvm_x86_ops.required_apicv_inhibits & BIT(reason);
> +}
> +
>  void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
> @@ -10294,7 +10299,7 @@ static void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
>  		return;
>  
>  	if (apic_x2apic_mode(vcpu->arch.apic) &&
> -	    static_call(kvm_x86_check_apicv_inhibit_reasons)(APICV_INHIBIT_REASON_X2APIC))
> +	    kvm_is_required_apicv_inhibit(APICV_INHIBIT_REASON_X2APIC))
>  		kvm_inhibit_apic_access_page(vcpu);
>  
>  	__kvm_vcpu_update_apicv(vcpu);
> @@ -10307,7 +10312,7 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
>  
>  	lockdep_assert_held_write(&kvm->arch.apicv_update_lock);
>  
> -	if (!static_call(kvm_x86_check_apicv_inhibit_reasons)(reason))
> +	if (!kvm_is_required_apicv_inhibit(reason))
>  		return;
>  
>  	old = new = kvm->arch.apicv_inhibit_reasons;



Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 03/32] KVM: SVM: Flush the "current" TLB when activating AVIC
  2022-12-08 21:52     ` Maxim Levitsky
@ 2022-12-09  0:40       ` Sean Christopherson
  0 siblings, 0 replies; 72+ messages in thread
From: Sean Christopherson @ 2022-12-09  0:40 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, kvm, linux-kernel, Alejandro Jimenez,
	Suravee Suthikulpanit, Li RongQing

On Thu, Dec 08, 2022, Maxim Levitsky wrote:
> On Wed, 2022-12-07 at 18:02 +0200, Maxim Levitsky wrote:
> On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -86,6 +86,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
> >                 /* Disabling MSR intercept for x2APIC registers */
> >                 svm_set_x2apic_msr_interception(svm, false);
> >         } else {
> > +               /*
> > +                * Flush the TLB, the guest may have inserted a non-APIC
> > +                * mapping into the TLB while AVIC was disabled.
> > +                */
> > +               kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, &svm->vcpu);
> > +
> >                 /* For xAVIC and hybrid-xAVIC modes */
> >                 vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID;
> >                 /* Enabling MSR intercept for x2APIC registers */
> 
> 
> I agree, that if guest disables APIC on a vCPU, this will lead to call to
> kvm_apic_update_apicv which will disable AVIC, but if other vCPUs don't
> disable it, the AVIC's private memslot will still be mapped and guest could
> read/write it from this vCPU, and its TLB mapping needs to be invalidated
> if/when APIC is re-enabled.
>  
> However I think that this adds an unnecessarily (at least in the future)
> performance penalty to AVIC nesting coexistence:
>  
> L1's AVIC is inhibited on each nested VM entry, and uninhibited on each
> nested VM exit, but while nested the guest can't really access it as it has
> its own NPT.
>  
> With this patch KVM will invalidate L1's TLB on each nested VM exit. KVM
> sadly already does this but this can be fixed (its another thing on my TODO
> list)
>  
> Note that APICv doesn't have this issue, it is not inhibited on nested VM
> entry/exit, thus this code is not performance sensitive for APICv.
>  
>  
> I somewhat vote again, as I said before to disable the APICv/AVIC memslot, if
> any of vCPUs have APICv/AVIC hardware disabled, because it is also more
> correct from an x86 perspective. I do wonder how often is the usage of having
> "extra" cpus but not using them, and thus having their APIC in disabled
> state.

There are legimate scenarios where a kernel might want to disable the APIC on
select CPUs, e.g. to offline SMT siblings in BIOS.  Whether or not doing that in
a VM makes sense is debatable, but we really have no way of knowing if there are
existing guests that selectively disable APICs.

> KVM does support adding new vCPUs on the fly, so this shouldn't be needed,
> and APICv inhibit in this case is just a perf regression.

Heh, "just" a perf regression.  Inhibiting APICv would definitely be a perf regression
that people care about, e.g. see the very recent bug fixes:

https://lore.kernel.org/all/20221116205123.18737-1-gedwards@ddn.com
https://lore.kernel.org/all/1669984574-32692-1-git-send-email-yuanzhaoxiong@baidu.com

Conceptually, I like the idea of inhibiting the APICv memslot if a vCPU has its
APIC hardware disabled.  But practically speaking, because KVM has allowed that
scenario for years, I don't think we should introduce such an inhibit and risk
regressing guests.

> Or at least do this only when APIC does back from hardware disabled state to
> enabled.

I have no objection to fine tuning this in follow-up, but for this bug fix I'd
much prefer to go with this minimal change.  The nested SVM TLB flushing issue
extends far beyond this one case, i.e. needs a non-trivial overhaul and an audit
of pretty every piece of SVM code that can interact with TLBs.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 27/32] KVM: SVM: Update svm->ldr_reg cache even if LDR is "bad"
  2022-12-08 21:59   ` Maxim Levitsky
@ 2022-12-09  0:49     ` Sean Christopherson
  0 siblings, 0 replies; 72+ messages in thread
From: Sean Christopherson @ 2022-12-09  0:49 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, kvm, linux-kernel, Alejandro Jimenez,
	Suravee Suthikulpanit, Li RongQing

On Thu, Dec 08, 2022, Maxim Levitsky wrote:
> On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> > Update SVM's cache of the LDR even if the new value is "bad".  Leaving
> > stale information in the cache can result in KVM missing updates and/or
> > invalidating the wrong entry, e.g. if avic_invalidate_logical_id_entry()
> > is triggered after a different vCPU has "claimed" the old LDR.
> > 
> > Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kvm/svm/avic.c | 17 +++++++----------
> >  1 file changed, 7 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index 2b640c73f447..4b6fc9d64f4d 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -539,23 +539,24 @@ static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat)
> >  	return &logical_apic_id_table[index];
> >  }
> >  
> > -static int avic_ldr_write(struct kvm_vcpu *vcpu, u8 g_physical_id, u32 ldr)
> > +static void avic_ldr_write(struct kvm_vcpu *vcpu, u8 g_physical_id, u32 ldr)
> >  {
> >  	bool flat;
> >  	u32 *entry, new_entry;
> >  
> > +	if (!ldr)
> > +		return;
> > +
> The avic_get_logical_id_entry already returns NULL in this case, so I don't think
> that this is needed.

Hmm, yeah, it's not strictly needed.  I think I kept it because of the explicit
check that previously existed in avic_handle_ldr_update().  Part of me likes the
explicit check, but avic_invalidate_logical_id_entry() doesn't manually guard
against ldr==0, to I agree it's better to drop this for consistency.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 24/32] KVM: x86: Inhibit APICv/AVIC if the optimized physical map is disabled
  2022-12-08 21:58   ` Maxim Levitsky
@ 2022-12-09  0:56     ` Sean Christopherson
  0 siblings, 0 replies; 72+ messages in thread
From: Sean Christopherson @ 2022-12-09  0:56 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, kvm, linux-kernel, Alejandro Jimenez,
	Suravee Suthikulpanit, Li RongQing

On Thu, Dec 08, 2022, Maxim Levitsky wrote:
> On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> > Inhibit APICv/AVIC if the optimized physical map is disabled so that KVM
> > KVM provides consistent APIC behavior if xAPIC IDs are aliased due to
> > vcpu_id being truncated and the x2APIC hotplug hack isn't enabled.  If
> > the hotplug hack is disabled, events that are emulated by KVM will follow
> > architectural behavior (all matching vCPUs receive events, even if the
> > "match" is due to truncation), whereas APICv and AVIC will deliver events
> > only to the first matching vCPU, i.e. the vCPU that matches without
> > truncation.
> > 
> > Note, the "extra" inhibit is needed because  KVM deliberately ignores
> > mismatches due to truncation when applying the APIC_ID_MODIFIED inhibit
> > so that large VMs (>255 vCPUs) can run with APICv/AVIC.
> > 
> > Fixes: TDB
> Do you mean Trade & Development Bank of Mongolia ? ;-)

Heh, this fixes a patch/commit earlier in the series, hence the placeholder.

> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  6 ++++++
> >  arch/x86/kvm/lapic.c            | 13 ++++++++++++-
> >  arch/x86/kvm/svm/avic.c         |  1 +
> >  arch/x86/kvm/vmx/vmx.c          |  1 +
> >  4 files changed, 20 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index ac28bbfbf0e3..171e38b94c89 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1104,6 +1104,12 @@ enum kvm_apicv_inhibit {
> >  	 */
> >  	APICV_INHIBIT_REASON_BLOCKIRQ,
> >  
> > +	/*
> > +	 * APICv is disabled because not all vCPUs have a 1:1 mapping between
> > +	 * APIC ID and vCPU, _and_ KVM is not applying its x2APIC hotplug hack.
> > +	 */
> > +	APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED,
> > +
> >  	/*
> >  	 * For simplicity, the APIC acceleration is inhibited
> >  	 * first time either APIC ID or APIC base are changed by the guest
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 340c2d3e781b..f6f328d36ae2 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -381,6 +381,16 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> >  		cluster[ldr] = apic;
> >  	}
> >  out:
> > +	/*
> > +	 * The optimized map is effectively KVM's internal version of APICv,
> Nitpick: APICv/AVIC?

Hmm.  I think my preference for common code that doesn't need to differentiate
between Intel and AMD (or AVIC vs. x2AVIC) is to use just APICv.  "APICv" is a
mostly a KVM term, e.g. Intel uses "APIC-virtualization" as an umbrella term for
a massive pile of features, so I think bending the term to cover Intel+AMD is ok?

Typing APICv/AVIC everywhere will get tedious and creates its own flavor of
confusion, e.g. the inhibits all use APICV.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 19/32] KVM: x86: Explicitly track all possibilities for APIC map's logical modes
  2022-12-08 21:57   ` Maxim Levitsky
@ 2022-12-16 18:39     ` Sean Christopherson
  2022-12-16 23:34       ` Sean Christopherson
  0 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-12-16 18:39 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, kvm, linux-kernel, Alejandro Jimenez,
	Suravee Suthikulpanit, Li RongQing

On Thu, Dec 08, 2022, Maxim Levitsky wrote:
> On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> > -#define KVM_APIC_MODE_XAPIC_CLUSTER          4
> > -#define KVM_APIC_MODE_XAPIC_FLAT             8
> > -#define KVM_APIC_MODE_X2APIC                16
> > +enum kvm_apic_logical_mode {
> 
> It would be nice to have short comment about each of the modes, like
> 
> /* All local APICs are disabled */
> > +	KVM_APIC_MODE_SW_DISABLED,
> /* All enabled local APICs are in XAPIC mode using cluster logical addresssing */
> > +	KVM_APIC_MODE_XAPIC_CLUSTER,
> /* All enabled local APICs are in XAPIC mode using flat logical addresssing */
> > +	KVM_APIC_MODE_XAPIC_FLAT,
> /* All enabled local APICs are in X2APIC mode */
> > +	KVM_APIC_MODE_X2APIC,
> /* 
>    Due to differencies in mode between enabled local APICs and/or other corner cases, 
>    the optimized logical map disabled 
> */

I'll add that.

> > +	KVM_APIC_MODE_MAP_DISABLED,
> > +};
> >  
> >  struct kvm_apic_map {
> >  	struct rcu_head rcu;
> > -	u8 mode;
> > +	enum kvm_apic_logical_mode logical_mode;
> >  	u32 max_apic_id;
> >  	union {
> >  		struct kvm_lapic *xapic_flat_map[8];
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index cef8b202490b..9989893fef69 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -168,7 +168,12 @@ static bool kvm_use_posted_timer_interrupt(struct kvm_vcpu *vcpu)
> >  
> >  static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
> >  		u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
> > -	switch (map->mode) {
> > +	switch (map->logical_mode) {
> > +	case KVM_APIC_MODE_SW_DISABLED:
> > +		/* Arbitrarily use the flat map so that @cluster isn't NULL. */
> > +		*cluster = map->xapic_flat_map;
> > +		*mask = 0;
> > +		return true;
> >  	case KVM_APIC_MODE_X2APIC: {
> >  		u32 offset = (dest_id >> 16) * 16;
> >  		u32 max_apic_id = map->max_apic_id;
> > @@ -193,8 +198,10 @@ static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
> >  		*cluster = map->xapic_cluster_map[(dest_id >> 4) & 0xf];
> >  		*mask = dest_id & 0xf;
> >  		return true;
> > +	case KVM_APIC_MODE_MAP_DISABLED:
> > +		return false;
> >  	default:
> > -		/* Not optimized. */
> > +		WARN_ON_ONCE(1);
> >  		return false;
> >  	}
> >  }
> > @@ -256,10 +263,12 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> >  		goto out;
> >  
> >  	new->max_apic_id = max_id;
> > +	new->logical_mode = KVM_APIC_MODE_SW_DISABLED;
> >  
> >  	kvm_for_each_vcpu(i, vcpu, kvm) {
> >  		struct kvm_lapic *apic = vcpu->arch.apic;
> >  		struct kvm_lapic **cluster;
> > +		enum kvm_apic_logical_mode logical_mode;
> >  		u16 mask;
> >  		u32 ldr;
> >  		u8 xapic_id;
> > @@ -282,7 +291,8 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> >  		if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
> >  			new->phys_map[xapic_id] = apic;
> >  
> > -		if (!kvm_apic_sw_enabled(apic))
> > +		if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED ||
> > +		    !kvm_apic_sw_enabled(apic))
> >  			continue;
> Very minor nitpick: it feels to me that code that updates the logical mode of the
> map, might be better to be in a function, or in 'if', like

An if-statement would be rough due to the indentation.  A function works well
though, especially if both the physical and logical chunks are put into helpers.
E.g. the patch at the bottom (with other fixup for this patch) yields:

	new->max_apic_id = max_id;
	new->logical_mode = KVM_APIC_MODE_SW_DISABLED;

	kvm_for_each_vcpu(i, vcpu, kvm) {
		if (!kvm_apic_present(vcpu))
			continue;

		if (kvm_recalculate_phys_map(new, vcpu, &xapic_id_mismatch)) {
			kvfree(new);
			new = NULL;
			goto out;
		}

		kvm_recalculate_logical_map(new, vcpu);
	}

I'll tack that patch on at the end of the series if it looks ok.

> 		if (new->logical_mode != KVM_APIC_MODE_MAP_DISABLED) {
> 
> 			/* Disabled local APICs don't affect the logical map */
> 			if (!kvm_apic_sw_enabled(apic))
> 				continue;
> 			....
> 		}
> >  
> >  		ldr = kvm_lapic_get_reg(apic, APIC_LDR);
> > @@ -290,17 +300,26 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> >  			continue;
> >  
> >  		if (apic_x2apic_mode(apic)) {
> > -			new->mode |= KVM_APIC_MODE_X2APIC;
> > +			logical_mode = KVM_APIC_MODE_X2APIC;
> >  		} else {
> >  			ldr = GET_APIC_LOGICAL_ID(ldr);
> >  			if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
> > -				new->mode |= KVM_APIC_MODE_XAPIC_FLAT;
> > +				logical_mode = KVM_APIC_MODE_XAPIC_FLAT;
> >  			else
> > -				new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
> > +				logical_mode = KVM_APIC_MODE_XAPIC_CLUSTER;
> >  		}
> > +		if (new->logical_mode != KVM_APIC_MODE_SW_DISABLED &&
> > +		    new->logical_mode != logical_mode) {
> > +			new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
> > +			continue;
> > +		}
> > +		new->logical_mode = logical_mode;
> 
> How about:
> 
> 		if (new->logical_mode == KVM_APIC_MODE_SW_DISABLED)
> 			new->logical_mode = logical_mode;
> 
> 		if (new->logical_mode != logical_mode) {
> 			new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
> 			continue;
> 		}

This instead?  So that it's a bit more obvious that the SW_DISABLED case is
exclusive with the mismatched mode path.

		if (new->logical_mode == KVM_APIC_MODE_SW_DISABLED) {
			new->logical_mode = logical_mode;
		} else if (new->logical_mode != logical_mode) {
			new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
			continue;
		}

---
 arch/x86/kvm/lapic.c | 245 +++++++++++++++++++++++--------------------
 1 file changed, 133 insertions(+), 112 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5224d73cd84a..d017f49c5048 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -218,6 +218,134 @@ static void kvm_apic_map_free(struct rcu_head *rcu)
 	kvfree(map);
 }
 
+static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
+				    struct kvm_vcpu *vcpu,
+				    bool *xapic_id_mismatch)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	u32 x2apic_id = kvm_x2apic_id(apic);
+	u32 xapic_id = kvm_xapic_id(apic);
+	u32 physical_id;
+
+	/*
+	 * Deliberately truncate the vCPU ID when detecting a mismatched APIC
+	 * ID to avoid false positives if the vCPU ID, i.e. x2APIC ID, is a
+	 * 32-bit value.  Any unwanted aliasing due to truncation results will
+	 * be detected below.
+	 */
+	if (!apic_x2apic_mode(apic) && xapic_id != (u8)vcpu->vcpu_id)
+		*xapic_id_mismatch = true;
+
+	/*
+	 * Apply KVM's hotplug hack if userspace has enable 32-bit APIC IDs.
+	 * Allow sending events to vCPUs by their x2APIC ID even if the target
+	 * vCPU is in legacy xAPIC mode, and silently ignore aliased xAPIC IDs
+	 * (the x2APIC ID is truncated to 8 bits, causing IDs > 0xff to wrap
+	 * and collide).
+	 *
+	 * Honor the architectural (and KVM's non-optimized) behavior if
+	 * userspace has not enabled 32-bit x2APIC IDs.  Each APIC is supposed
+	 * to process messages independently.  If multiple vCPUs have the same
+	 * effective APIC ID, e.g. due to the x2APIC wrap or because the guest
+	 * manually modified its xAPIC IDs, events targeting that ID are
+	 * supposed to be recognized by all vCPUs with said ID.
+	 */
+	if (vcpu->kvm->arch.x2apic_format) {
+		/* See also kvm_apic_match_physical_addr(). */
+		if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
+			x2apic_id <= new->max_apic_id)
+			new->phys_map[x2apic_id] = apic;
+
+		if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
+			new->phys_map[xapic_id] = apic;
+	} else {
+		/*
+		 * Disable the optimized map if the physical APIC ID is already
+		 * mapped, i.e. is aliased to multiple vCPUs.  The optimized
+		 * map requires a strict 1:1 mapping between IDs and vCPUs.
+		 */
+		if (apic_x2apic_mode(apic))
+			physical_id = x2apic_id;
+		else
+			physical_id = xapic_id;
+
+		if (new->phys_map[physical_id])
+			return -EINVAL;
+
+		new->phys_map[physical_id] = apic;
+	}
+
+	return 0;
+}
+
+static void kvm_recalculate_logical_map(struct kvm_apic_map *new,
+					struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	enum kvm_apic_logical_mode logical_mode;
+	struct kvm_lapic **cluster;
+	u16 mask;
+	u32 ldr;
+
+	if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED)
+		return;
+
+	if (kvm_apic_sw_enabled(apic))
+		return;
+
+	ldr = kvm_lapic_get_reg(apic, APIC_LDR);
+	if (!ldr)
+		return;
+
+	if (apic_x2apic_mode(apic)) {
+		logical_mode = KVM_APIC_MODE_X2APIC;
+	} else {
+		ldr = GET_APIC_LOGICAL_ID(ldr);
+		if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
+			logical_mode = KVM_APIC_MODE_XAPIC_FLAT;
+		else
+			logical_mode = KVM_APIC_MODE_XAPIC_CLUSTER;
+	}
+
+	/*
+	 * To optimize logical mode delivery, all software-enabled APICs must
+	 * be configured for the same mode.
+	 */
+	if (new->logical_mode == KVM_APIC_MODE_SW_DISABLED) {
+		new->logical_mode = logical_mode;
+	} else if (new->logical_mode != logical_mode) {
+		new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
+		return;
+	}
+
+	/*
+	 * In x2APIC mode, the LDR is read-only and derived directly from the
+	 * x2APIC ID, thus is guaranteed to be addressable.  KVM reuses
+	 * kvm_apic_map.phys_map to optimize logical mode x2APIC interrupts by
+	 * reversing the LDR calculation to get cluster of APICs, i.e. no
+	 * additional work is required.
+	 */
+	if (apic_x2apic_mode(apic)) {
+		WARN_ON_ONCE(ldr != kvm_apic_calc_x2apic_ldr(kvm_x2apic_id(apic)));
+		return;
+	}
+
+	if (WARN_ON_ONCE(!kvm_apic_map_get_logical_dest(new, ldr,
+							&cluster, &mask))) {
+		new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
+		return;
+	}
+
+	if (!mask)
+		return;
+
+	ldr = ffs(mask) - 1;
+	if (!is_power_of_2(mask) || cluster[ldr])
+		new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
+	else
+		cluster[ldr] = apic;
+}
+
 /*
  * CLEAN -> DIRTY and UPDATE_IN_PROGRESS -> DIRTY changes happen without a lock.
  *
@@ -272,123 +400,16 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 	new->logical_mode = KVM_APIC_MODE_SW_DISABLED;
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
-		struct kvm_lapic *apic = vcpu->arch.apic;
-		struct kvm_lapic **cluster;
-		enum kvm_apic_logical_mode logical_mode;
-		u32 x2apic_id, physical_id;
-		u16 mask;
-		u32 ldr;
-		u8 xapic_id;
-
 		if (!kvm_apic_present(vcpu))
 			continue;
 
-		xapic_id = kvm_xapic_id(apic);
-		x2apic_id = kvm_x2apic_id(apic);
-
-		/*
-		 * Deliberately truncate the vCPU ID when detecting a mismatched
-		 * APIC ID to avoid false positives if the vCPU ID, i.e. x2APIC
-		 * ID, is a 32-bit value.  Any unwanted aliasing due to
-		 * truncation results will be detected below.
-		 */
-		if (!apic_x2apic_mode(apic) && xapic_id != (u8)vcpu->vcpu_id)
-			xapic_id_mismatch = true;
-
-		/*
-		 * Apply KVM's hotplug hack if userspace has enable 32-bit APIC
-		 * IDs.  Allow sending events to vCPUs by their x2APIC ID even
-		 * if the target vCPU is in legacy xAPIC mode, and silently
-		 * ignore aliased xAPIC IDs (the x2APIC ID is truncated to 8
-		 * bits, causing IDs > 0xff to wrap and collide).
-		 *
-		 * Honor the architectural (and KVM's non-optimized) behavior
-		 * if userspace has not enabled 32-bit x2APIC IDs.  Each APIC
-		 * is supposed to process messages independently.  If multiple
-		 * vCPUs have the same effective APIC ID, e.g. due to the
-		 * x2APIC wrap or because the guest manually modified its xAPIC
-		 * IDs, events targeting that ID are supposed to be recognized
-		 * by all vCPUs with said ID.
-		 */
-		if (kvm->arch.x2apic_format) {
-			/* See also kvm_apic_match_physical_addr(). */
-			if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
-			    x2apic_id <= new->max_apic_id)
-				new->phys_map[x2apic_id] = apic;
-
-			if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
-				new->phys_map[xapic_id] = apic;
-		} else {
-			/*
-			 * Disable the optimized map if the physical APIC ID is
-			 * already mapped, i.e. is aliased to multiple vCPUs.
-			 * The optimized map requires a strict 1:1 mapping
-			 * between IDs and vCPUs.
-			 */
-			if (apic_x2apic_mode(apic))
-				physical_id = x2apic_id;
-			else
-				physical_id = xapic_id;
-
-			if (new->phys_map[physical_id]) {
-				kvfree(new);
-				new = NULL;
-				goto out;
-			}
-			new->phys_map[physical_id] = apic;
+		if (kvm_recalculate_phys_map(new, vcpu, &xapic_id_mismatch)) {
+			kvfree(new);
+			new = NULL;
+			goto out;
 		}
 
-		if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED ||
-		    !kvm_apic_sw_enabled(apic))
-			continue;
-
-		ldr = kvm_lapic_get_reg(apic, APIC_LDR);
-		if (!ldr)
-			continue;
-
-		if (apic_x2apic_mode(apic)) {
-			logical_mode = KVM_APIC_MODE_X2APIC;
-		} else {
-			ldr = GET_APIC_LOGICAL_ID(ldr);
-			if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
-				logical_mode = KVM_APIC_MODE_XAPIC_FLAT;
-			else
-				logical_mode = KVM_APIC_MODE_XAPIC_CLUSTER;
-		}
-		if (new->logical_mode != KVM_APIC_MODE_SW_DISABLED &&
-		    new->logical_mode != logical_mode) {
-			new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
-			continue;
-		}
-		new->logical_mode = logical_mode;
-
-		/*
-		 * In x2APIC mode, the LDR is read-only and derived directly
-		 * from the x2APIC ID, thus is guaranteed to be addressable.
-		 * KVM reuses kvm_apic_map.phys_map to optimize logical mode
-		 * x2APIC interrupts by reversing the LDR calculation to get
-		 * cluster of APICs, i.e. no additional work is required.
-		 */
-		if (apic_x2apic_mode(apic)) {
-			WARN_ON_ONCE(ldr != kvm_apic_calc_x2apic_ldr(x2apic_id));
-			continue;
-		}
-
-		if (WARN_ON_ONCE(!kvm_apic_map_get_logical_dest(new, ldr,
-								&cluster, &mask))) {
-			new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
-			continue;
-		}
-
-		if (!mask)
-			continue;
-
-		ldr = ffs(mask) - 1;
-		if (!is_power_of_2(mask) || cluster[ldr]) {
-			new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
-			continue;
-		}
-		cluster[ldr] = apic;
+		kvm_recalculate_logical_map(new, vcpu);
 	}
 out:
 	/*

base-commit: 46d887a39567c4d1517518e4bdfcc10f34b5d6ce
-- 


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 11/32] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled
  2022-12-08 21:56   ` Maxim Levitsky
@ 2022-12-16 19:03     ` Sean Christopherson
  2022-12-16 19:40       ` Sean Christopherson
  0 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-12-16 19:03 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, kvm, linux-kernel, Alejandro Jimenez,
	Suravee Suthikulpanit, Li RongQing

On Thu, Dec 08, 2022, Maxim Levitsky wrote:
> On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index d40206b16d6c..062758135c86 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1139,6 +1139,17 @@ enum kvm_apicv_inhibit {
> >  	 * AVIC is disabled because SEV doesn't support it.
> >  	 */
> >  	APICV_INHIBIT_REASON_SEV,
> > +
> > +	/*
> > +	 * Due to sharing page tables across vCPUs, the xAPIC memslot must be
> > +	 * deleted if any vCPU has x2APIC enabled as SVM doesn't provide fully
> > +	 * independent controls for AVIC vs. x2AVIC, and also because SVM
> > +	 * supports a "hybrid" AVIC mode for CPUs that support AVIC but not
> > +	 * x2AVIC.  Note, this isn't a "full" inhibit and is tracked separately.
> > +	 * AVIC can still be activated, but KVM must not create SPTEs for the
> > +	 * APIC base.  For simplicity, this is sticky.
> > +	 */
> > +	APICV_INHIBIT_REASON_X2APIC,
> 
> I still don't understand why do you want this to be an inhibit bit.

Because in my mental model, it's an inhibit, but with special properties.  But I
totally get why that's confusing.

> Now this 'inhibit' is not even set/clear.
> 
> I prefer to just have a boolean 'is_avic' or,
> '.needs_x2apic_memslot_inhibition' in the vendor ops, and check it in
> 'kvm_vcpu_update_apicv' with the above comment on top of it.
> 
> need_x2apic_memslot_inhibition can even be set to false when x2avic is
> supported at the initalization time, because then AVIC behaves just like
> APICv (when x2avic bit is enabled, AVIC mmio is no longer decoded).

Oh, so SVM does effectively have independent controls, it's only the "hybrid" mode
that's affected?  In that case, how about this?

	/*
	 * Due to sharing page tables across vCPUs, the xAPIC memslot must be
	 * deleted if any vCPU has x2APIC enabled and hardware doesn't support
	 * x2APIC virtualization.  E.g. some AMD CPUs support AVIC but not
	 * x2AVIC.  KVM still allows enabling AVIC in this case so that KVM can
	 * the AVIC doorbell to inject interrupts to running vCPUs, but KVM
	 * mustn't create SPTEs for the APIC base as the vCPU would incorrectly
	 * be able to access the vAPIC page via MMIO despite being in x2APIC
	 * mode.  For simplicity, inhibiting the APIC access page is sticky.
	 */
	if (apic_x2apic_mode(vcpu->arch.apic) &&
	    !kvm_x86_ops.has_hardware_x2apic_virtualization)
		kvm_inhibit_apic_access_page(vcpu)

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 11/32] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled
  2022-12-16 19:03     ` Sean Christopherson
@ 2022-12-16 19:40       ` Sean Christopherson
  2022-12-27 11:25         ` Paolo Bonzini
  0 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-12-16 19:40 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, kvm, linux-kernel, Alejandro Jimenez,
	Suravee Suthikulpanit, Li RongQing

On Fri, Dec 16, 2022, Sean Christopherson wrote:
> On Thu, Dec 08, 2022, Maxim Levitsky wrote:
> > I prefer to just have a boolean 'is_avic' or,
> > '.needs_x2apic_memslot_inhibition' in the vendor ops, and check it in
> > 'kvm_vcpu_update_apicv' with the above comment on top of it.
> > 
> > need_x2apic_memslot_inhibition can even be set to false when x2avic is
> > supported at the initalization time, because then AVIC behaves just like
> > APICv (when x2avic bit is enabled, AVIC mmio is no longer decoded).
> 
> Oh, so SVM does effectively have independent controls, it's only the "hybrid" mode
> that's affected?  In that case, how about this?
> 
> 	/*
> 	 * Due to sharing page tables across vCPUs, the xAPIC memslot must be
> 	 * deleted if any vCPU has x2APIC enabled and hardware doesn't support
> 	 * x2APIC virtualization.  E.g. some AMD CPUs support AVIC but not
> 	 * x2AVIC.  KVM still allows enabling AVIC in this case so that KVM can
> 	 * the AVIC doorbell to inject interrupts to running vCPUs, but KVM
> 	 * mustn't create SPTEs for the APIC base as the vCPU would incorrectly
> 	 * be able to access the vAPIC page via MMIO despite being in x2APIC
> 	 * mode.  For simplicity, inhibiting the APIC access page is sticky.
> 	 */
> 	if (apic_x2apic_mode(vcpu->arch.apic) &&
> 	    !kvm_x86_ops.has_hardware_x2apic_virtualization)

Hrm, that's not quite right either since it's obviously possible to have an Intel
CPU that supports APICv but not x2APIC virtualization.  And in that case KVM
doesn't need to inhibit the memslot, e.g. if not all vCPUs are in x2APIC.

I was hoping to have a name that communicate _why_ the memslot needs to be
inhibited, but it's turning out to be really hard to come up with a name that's
descriptive without being ridiculously verbose.  The best I've come up with is:

	allow_apicv_in_x2apic_without_x2apic_virtualization

It's heinous, but I'm inclined to go with it unless someone has a better idea.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 19/32] KVM: x86: Explicitly track all possibilities for APIC map's logical modes
  2022-12-16 18:39     ` Sean Christopherson
@ 2022-12-16 23:34       ` Sean Christopherson
  2022-12-27 11:30         ` Paolo Bonzini
  0 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2022-12-16 23:34 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, kvm, linux-kernel, Alejandro Jimenez,
	Suravee Suthikulpanit, Li RongQing

On Fri, Dec 16, 2022, Sean Christopherson wrote:
> On Thu, Dec 08, 2022, Maxim Levitsky wrote:

...

> > > @@ -282,7 +291,8 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> > >  		if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
> > >  			new->phys_map[xapic_id] = apic;
> > >  
> > > -		if (!kvm_apic_sw_enabled(apic))
> > > +		if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED ||
> > > +		    !kvm_apic_sw_enabled(apic))
> > >  			continue;
> > Very minor nitpick: it feels to me that code that updates the logical mode of the
> > map, might be better to be in a function, or in 'if', like
> 
> An if-statement would be rough due to the indentation.  A function works well
> though, especially if both the physical and logical chunks are put into helpers.
> E.g. the patch at the bottom (with other fixup for this patch) yields:
> 
> 	new->max_apic_id = max_id;
> 	new->logical_mode = KVM_APIC_MODE_SW_DISABLED;
> 
> 	kvm_for_each_vcpu(i, vcpu, kvm) {
> 		if (!kvm_apic_present(vcpu))
> 			continue;
> 
> 		if (kvm_recalculate_phys_map(new, vcpu, &xapic_id_mismatch)) {
> 			kvfree(new);
> 			new = NULL;
> 			goto out;
> 		}
> 
> 		kvm_recalculate_logical_map(new, vcpu);
> 	}
> 
> I'll tack that patch on at the end of the series if it looks ok.

...

> +static void kvm_recalculate_logical_map(struct kvm_apic_map *new,
> +					struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_lapic *apic = vcpu->arch.apic;
> +	enum kvm_apic_logical_mode logical_mode;
> +	struct kvm_lapic **cluster;
> +	u16 mask;
> +	u32 ldr;
> +
> +	if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED)
> +		return;
> +
> +	if (kvm_apic_sw_enabled(apic))

"minor" detail, this is inverted.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups
  2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
                   ` (31 preceding siblings ...)
  2022-10-01  0:59 ` [PATCH v4 32/32] KVM: x86: Track required APICv inhibits with variable, not callback Sean Christopherson
@ 2022-12-27 11:22 ` Paolo Bonzini
  32 siblings, 0 replies; 72+ messages in thread
From: Paolo Bonzini @ 2022-12-27 11:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit,
	Maxim Levitsky, Li RongQing

Queued, thanks.  I think I'll do a pass on the commit messages, but
I'm technically on vacation and I'm not sure when I will have time
so I am pushing everything to kvm/queue in the meanwhile.

Paolo



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 11/32] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled
  2022-12-16 19:40       ` Sean Christopherson
@ 2022-12-27 11:25         ` Paolo Bonzini
  2023-01-03 16:30           ` Sean Christopherson
  0 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2022-12-27 11:25 UTC (permalink / raw)
  To: Sean Christopherson, Maxim Levitsky
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On 12/16/22 20:40, Sean Christopherson wrote:
> I was hoping to have a name that communicate_why_  the memslot needs to be
> inhibited, but it's turning out to be really hard to come up with a name that's
> descriptive without being ridiculously verbose.  The best I've come up with is:
> 
> 	allow_apicv_in_x2apic_without_x2apic_virtualization
> 
> It's heinous, but I'm inclined to go with it unless someone has a better idea.

Can any of you provide a patch to squash on top of this one (or on top 
of kvm/queue, as you prefer)?

Paolo


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 19/32] KVM: x86: Explicitly track all possibilities for APIC map's logical modes
  2022-12-16 23:34       ` Sean Christopherson
@ 2022-12-27 11:30         ` Paolo Bonzini
  0 siblings, 0 replies; 72+ messages in thread
From: Paolo Bonzini @ 2022-12-27 11:30 UTC (permalink / raw)
  To: Sean Christopherson, Maxim Levitsky
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On 12/17/22 00:34, Sean Christopherson wrote:
> On Fri, Dec 16, 2022, Sean Christopherson wrote:
>> On Thu, Dec 08, 2022, Maxim Levitsky wrote:
> 
> ...
> 
>>>> @@ -282,7 +291,8 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
>>>>   		if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
>>>>   			new->phys_map[xapic_id] = apic;
>>>>   
>>>> -		if (!kvm_apic_sw_enabled(apic))
>>>> +		if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED ||
>>>> +		    !kvm_apic_sw_enabled(apic))
>>>>   			continue;
>>> Very minor nitpick: it feels to me that code that updates the logical mode of the
>>> map, might be better to be in a function, or in 'if', like
>>
>> An if-statement would be rough due to the indentation.  A function works well
>> though, especially if both the physical and logical chunks are put into helpers.
>> E.g. the patch at the bottom (with other fixup for this patch) yields:
>>
>> 	new->max_apic_id = max_id;
>> 	new->logical_mode = KVM_APIC_MODE_SW_DISABLED;
>>
>> 	kvm_for_each_vcpu(i, vcpu, kvm) {
>> 		if (!kvm_apic_present(vcpu))
>> 			continue;
>>
>> 		if (kvm_recalculate_phys_map(new, vcpu, &xapic_id_mismatch)) {
>> 			kvfree(new);
>> 			new = NULL;
>> 			goto out;
>> 		}
>>
>> 		kvm_recalculate_logical_map(new, vcpu);
>> 	}
>>
>> I'll tack that patch on at the end of the series if it looks ok.

Yes, please send as a follow up.

Paolo


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 28/32] KVM: SVM: Require logical ID to be power-of-2 for AVIC entry
  2022-12-08 22:00   ` Maxim Levitsky
@ 2022-12-29  8:27     ` mlevitsk
  2023-01-04 10:08       ` Maxim Levitsky
  2023-01-04 18:02       ` Sean Christopherson
  0 siblings, 2 replies; 72+ messages in thread
From: mlevitsk @ 2022-12-29  8:27 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Fri, 2022-12-09 at 00:00 +0200, Maxim Levitsky wrote:
> On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> > Do not modify AVIC's logical ID table if the logical ID portion of the
> > LDR is not a power-of-2, i.e. if the LDR has multiple bits set.  Taking
> > only the first bit means that KVM will fail to match MDAs that intersect
> > with "higher" bits in the "ID"
> > 
> > The "ID" acts as a bitmap, but is referred to as an ID because theres an
> > implicit, unenforced "requirement" that software only set one bit.  This
> > edge case is arguably out-of-spec behavior, but KVM cleanly handles it
> > in all other cases, e.g. the optimized logical map (and AVIC!) is also
> > disabled in this scenario.
> > 
> > Refactor the code to consolidate the checks, and so that the code looks
> > more like avic_kick_target_vcpus_fast().
> > 
> > Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
> > Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> > Cc: Maxim Levitsky <mlevitsk@redhat.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kvm/svm/avic.c | 30 +++++++++++++++---------------
> >  1 file changed, 15 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index 4b6fc9d64f4d..a9e4e09f83fc 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -513,26 +513,26 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu)
> >  static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat)
> >  {
> >  	struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);
> > -	int index;
> >  	u32 *logical_apic_id_table;
> > -	int dlid = GET_APIC_LOGICAL_ID(ldr);
> > +	u32 cluster, index;
> >  
> > -	if (!dlid)
> > -		return NULL;
> > +	ldr = GET_APIC_LOGICAL_ID(ldr);
> >  
> > -	if (flat) { /* flat */
> > -		index = ffs(dlid) - 1;
> > -		if (index > 7)
> > +	if (flat) {
> > +		cluster = 0;
> > +	} else {
> > +		cluster = (ldr >> 4) << 2;
> > +		if (cluster >= 0xf)
> >  			return NULL;
> > -	} else { /* cluster */
> > -		int cluster = (dlid & 0xf0) >> 4;
> > -		int apic = ffs(dlid & 0x0f) - 1;
> > -
> > -		if ((apic < 0) || (apic > 7) ||
> > -		    (cluster >= 0xf))
> > -			return NULL;
> > -		index = (cluster << 2) + apic;
> > +		ldr &= 0xf;
> >  	}
> > +	if (!ldr || !is_power_of_2(ldr))
> > +		return NULL;
> > +
> > +	index = __ffs(ldr);
> > +	if (WARN_ON_ONCE(index > 7))
> > +		return NULL;
> > +	index += (cluster << 2);
> >  
> >  	logical_apic_id_table = (u32 *) page_address(kvm_svm->avic_logical_id_table_page);
> >  
> 
> Looks good.

I hate to say it but this patch has a bug:

We have both 'cluster = (ldr >> 4) << 2' and then 'index += (cluster << 2)'

One of the shifts has to go.

Best regards,
	Maxim Levitsky


> 
> For future refactoring, I also suggest to rename this function to 'avic_get_logical_id_table_entry'
> to stress the fact that it gets a pointer to the AVIC's data structure.
> 
> Same for 'avic_get_physical_id_entry'
> 
> And also while at it : the 'svm->avic_physical_id_cache', is a very misleading name,
> 
> It should be svm->avic_physical_id_table_entry_ptr with a comment explaining that
> is is the pointer to physid table entry.
> 
> 
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> 
> 
> 
> Best regards,
> 	Maxim Levitsky



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 11/32] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled
  2022-12-27 11:25         ` Paolo Bonzini
@ 2023-01-03 16:30           ` Sean Christopherson
  0 siblings, 0 replies; 72+ messages in thread
From: Sean Christopherson @ 2023-01-03 16:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Maxim Levitsky, kvm, linux-kernel, Alejandro Jimenez,
	Suravee Suthikulpanit, Li RongQing

On Tue, Dec 27, 2022, Paolo Bonzini wrote:
> On 12/16/22 20:40, Sean Christopherson wrote:
> > I was hoping to have a name that communicate_why_  the memslot needs to be
> > inhibited, but it's turning out to be really hard to come up with a name that's
> > descriptive without being ridiculously verbose.  The best I've come up with is:
> > 
> > 	allow_apicv_in_x2apic_without_x2apic_virtualization
> > 
> > It's heinous, but I'm inclined to go with it unless someone has a better idea.
> 
> Can any of you provide a patch to squash on top of this one (or on top of
> kvm/queue, as you prefer)?

I'll send patches, though it might take a day or three.  I had a reworked version
of this series prepped and tested before vacation kicked in, but lost power before
I could send my traditional pre-vacation patch bomb :-/

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 28/32] KVM: SVM: Require logical ID to be power-of-2 for AVIC entry
  2022-12-29  8:27     ` mlevitsk
@ 2023-01-04 10:08       ` Maxim Levitsky
  2023-01-04 18:02       ` Sean Christopherson
  1 sibling, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2023-01-04 10:08 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Alejandro Jimenez, Suravee Suthikulpanit, Li RongQing

On Thu, 2022-12-29 at 10:27 +0200, mlevitsk@redhat.com wrote:
> On Fri, 2022-12-09 at 00:00 +0200, Maxim Levitsky wrote:
> > On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> > > Do not modify AVIC's logical ID table if the logical ID portion of the
> > > LDR is not a power-of-2, i.e. if the LDR has multiple bits set.  Taking
> > > only the first bit means that KVM will fail to match MDAs that intersect
> > > with "higher" bits in the "ID"
> > > 
> > > The "ID" acts as a bitmap, but is referred to as an ID because theres an
> > > implicit, unenforced "requirement" that software only set one bit.  This
> > > edge case is arguably out-of-spec behavior, but KVM cleanly handles it
> > > in all other cases, e.g. the optimized logical map (and AVIC!) is also
> > > disabled in this scenario.
> > > 
> > > Refactor the code to consolidate the checks, and so that the code looks
> > > more like avic_kick_target_vcpus_fast().
> > > 
> > > Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
> > > Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> > > Cc: Maxim Levitsky <mlevitsk@redhat.com>
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > >  arch/x86/kvm/svm/avic.c | 30 +++++++++++++++---------------
> > >  1 file changed, 15 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > > index 4b6fc9d64f4d..a9e4e09f83fc 100644
> > > --- a/arch/x86/kvm/svm/avic.c
> > > +++ b/arch/x86/kvm/svm/avic.c
> > > @@ -513,26 +513,26 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu)
> > >  static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat)
> > >  {
> > >  	struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);
> > > -	int index;
> > >  	u32 *logical_apic_id_table;
> > > -	int dlid = GET_APIC_LOGICAL_ID(ldr);
> > > +	u32 cluster, index;
> > >  
> > > -	if (!dlid)
> > > -		return NULL;
> > > +	ldr = GET_APIC_LOGICAL_ID(ldr);
> > >  
> > > -	if (flat) { /* flat */
> > > -		index = ffs(dlid) - 1;
> > > -		if (index > 7)
> > > +	if (flat) {
> > > +		cluster = 0;
> > > +	} else {
> > > +		cluster = (ldr >> 4) << 2;
> > > +		if (cluster >= 0xf)
> > >  			return NULL;
> > > -	} else { /* cluster */
> > > -		int cluster = (dlid & 0xf0) >> 4;
> > > -		int apic = ffs(dlid & 0x0f) - 1;
> > > -
> > > -		if ((apic < 0) || (apic > 7) ||
> > > -		    (cluster >= 0xf))
> > > -			return NULL;
> > > -		index = (cluster << 2) + apic;
> > > +		ldr &= 0xf;
> > >  	}
> > > +	if (!ldr || !is_power_of_2(ldr))
> > > +		return NULL;
> > > +
> > > +	index = __ffs(ldr);
> > > +	if (WARN_ON_ONCE(index > 7))
> > > +		return NULL;
> > > +	index += (cluster << 2);
> > >  
> > >  	logical_apic_id_table = (u32 *) page_address(kvm_svm->avic_logical_id_table_page);
> > >  
> > 
> > Looks good.
> 
> I hate to say it but this patch has a bug:
> 
> We have both 'cluster = (ldr >> 4) << 2' and then 'index += (cluster << 2)'
> 
> One of the shifts has to go.


Sean, please don't forget to fix this isssue in the next patch series.
Best regards,
	Maxim Levitsky
> 
> Best regards,
> 	Maxim Levitsky
> 
> 
> > For future refactoring, I also suggest to rename this function to 'avic_get_logical_id_table_entry'
> > to stress the fact that it gets a pointer to the AVIC's data structure.
> > 
> > Same for 'avic_get_physical_id_entry'
> > 
> > And also while at it : the 'svm->avic_physical_id_cache', is a very misleading name,
> > 
> > It should be svm->avic_physical_id_table_entry_ptr with a comment explaining that
> > is is the pointer to physid table entry.
> > 
> > 
> > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> > 
> > 
> > 
> > Best regards,
> > 	Maxim Levitsky
> 
> 



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 28/32] KVM: SVM: Require logical ID to be power-of-2 for AVIC entry
  2022-12-29  8:27     ` mlevitsk
  2023-01-04 10:08       ` Maxim Levitsky
@ 2023-01-04 18:02       ` Sean Christopherson
  2023-01-04 18:34         ` Maxim Levitsky
  1 sibling, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2023-01-04 18:02 UTC (permalink / raw)
  To: mlevitsk
  Cc: Paolo Bonzini, kvm, linux-kernel, Alejandro Jimenez,
	Suravee Suthikulpanit, Li RongQing

On Thu, Dec 29, 2022, mlevitsk@redhat.com wrote:
> On Fri, 2022-12-09 at 00:00 +0200, Maxim Levitsky wrote:
> > On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> > > +	} else {
> > > +		cluster = (ldr >> 4) << 2;
> > > +		if (cluster >= 0xf)
> > >  			return NULL;
> > > -	} else { /* cluster */
> > > -		int cluster = (dlid & 0xf0) >> 4;
> > > -		int apic = ffs(dlid & 0x0f) - 1;
> > > -
> > > -		if ((apic < 0) || (apic > 7) ||
> > > -		    (cluster >= 0xf))
> > > -			return NULL;
> > > -		index = (cluster << 2) + apic;
> > > +		ldr &= 0xf;
> > >  	}
> > > +	if (!ldr || !is_power_of_2(ldr))
> > > +		return NULL;
> > > +
> > > +	index = __ffs(ldr);
> > > +	if (WARN_ON_ONCE(index > 7))
> > > +		return NULL;
> > > +	index += (cluster << 2);
> > >  
> > >  	logical_apic_id_table = (u32 *) page_address(kvm_svm->avic_logical_id_table_page);
> > >  
> > 
> > Looks good.
> 
> I hate to say it but this patch has a bug:
> 
> We have both 'cluster = (ldr >> 4) << 2' and then 'index += (cluster << 2)'
> 
> One of the shifts has to go.

The first shift is wrong.  The "cluster >= 0xf" check needs to be done on the actual
cluster.  The "<< 2", a.k.a. "* 4", is specific to indexing the AVIC table.

Thanks!

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 28/32] KVM: SVM: Require logical ID to be power-of-2 for AVIC entry
  2023-01-04 18:02       ` Sean Christopherson
@ 2023-01-04 18:34         ` Maxim Levitsky
  0 siblings, 0 replies; 72+ messages in thread
From: Maxim Levitsky @ 2023-01-04 18:34 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Alejandro Jimenez,
	Suravee Suthikulpanit, Li RongQing

On Wed, 2023-01-04 at 18:02 +0000, Sean Christopherson wrote:
> On Thu, Dec 29, 2022, mlevitsk@redhat.com wrote:
> > On Fri, 2022-12-09 at 00:00 +0200, Maxim Levitsky wrote:
> > > On Sat, 2022-10-01 at 00:59 +0000, Sean Christopherson wrote:
> > > > +	} else {
> > > > +		cluster = (ldr >> 4) << 2;
> > > > +		if (cluster >= 0xf)
> > > >  			return NULL;
> > > > -	} else { /* cluster */
> > > > -		int cluster = (dlid & 0xf0) >> 4;
> > > > -		int apic = ffs(dlid & 0x0f) - 1;
> > > > -
> > > > -		if ((apic < 0) || (apic > 7) ||
> > > > -		    (cluster >= 0xf))
> > > > -			return NULL;
> > > > -		index = (cluster << 2) + apic;
> > > > +		ldr &= 0xf;
> > > >  	}
> > > > +	if (!ldr || !is_power_of_2(ldr))
> > > > +		return NULL;
> > > > +
> > > > +	index = __ffs(ldr);
> > > > +	if (WARN_ON_ONCE(index > 7))
> > > > +		return NULL;
> > > > +	index += (cluster << 2);
> > > >  
> > > >  	logical_apic_id_table = (u32 *) page_address(kvm_svm->avic_logical_id_table_page);
> > > >  
> > > 
> > > Looks good.
> > 
> > I hate to say it but this patch has a bug:
> > 
> > We have both 'cluster = (ldr >> 4) << 2' and then 'index += (cluster << 2)'
> > 
> > One of the shifts has to go.
> 
> The first shift is wrong.  The "cluster >= 0xf" check needs to be done on the actual
> cluster.  The "<< 2", a.k.a. "* 4", is specific to indexing the AVIC table.
> 
> Thanks!
> 

Yep, agree.

(I mean technically you can remove the second shift and do the check before doing the first shift, that
is why I told you that one of the shifts has to go)

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 72+ messages in thread

end of thread, other threads:[~2023-01-04 18:36 UTC | newest]

Thread overview: 72+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-01  0:58 [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
2022-10-01  0:58 ` [PATCH v4 01/32] KVM: x86: Blindly get current x2APIC reg value on "nodecode write" traps Sean Christopherson
2022-12-08 21:47   ` Maxim Levitsky
2022-10-01  0:58 ` [PATCH v4 02/32] KVM: x86: Purge "highest ISR" cache when updating APICv state Sean Christopherson
2022-12-08 21:47   ` Maxim Levitsky
2022-10-01  0:58 ` [PATCH v4 03/32] KVM: SVM: Flush the "current" TLB when activating AVIC Sean Christopherson
     [not found]   ` <b9f336f17eec6bfbb8429700e0f135d19813c576.camel@redhat.com>
2022-12-08 21:52     ` Maxim Levitsky
2022-12-09  0:40       ` Sean Christopherson
2022-12-08 22:02     ` Maxim Levitsky
2022-10-01  0:58 ` [PATCH v4 04/32] KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid target Sean Christopherson
2022-10-01  0:58 ` [PATCH v4 05/32] KVM: x86: Don't inhibit APICv/AVIC on xAPIC ID "change" if APIC is disabled Sean Christopherson
2022-12-08 21:53   ` Maxim Levitsky
2022-10-01  0:58 ` [PATCH v4 06/32] KVM: x86: Track xAPIC ID only on userspace SET, _after_ vAPIC is updated Sean Christopherson
2022-12-08 21:53   ` Maxim Levitsky
2022-10-01  0:58 ` [PATCH v4 07/32] KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID Sean Christopherson
2022-12-08 21:53   ` Maxim Levitsky
2022-10-01  0:58 ` [PATCH v4 08/32] KVM: SVM: Don't put/load AVIC when setting virtual APIC mode Sean Christopherson
2022-12-08 21:53   ` Maxim Levitsky
2022-10-01  0:58 ` [PATCH v4 09/32] KVM: x86: Handle APICv updates for APIC "mode" changes via request Sean Christopherson
2022-12-08 21:54   ` Maxim Levitsky
2022-10-01  0:58 ` [PATCH v4 10/32] KVM: x86: Move APIC access page helper to common x86 code Sean Christopherson
2022-12-08 21:55   ` Maxim Levitsky
2022-10-01  0:58 ` [PATCH v4 11/32] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled Sean Christopherson
2022-12-08 21:56   ` Maxim Levitsky
2022-12-16 19:03     ` Sean Christopherson
2022-12-16 19:40       ` Sean Christopherson
2022-12-27 11:25         ` Paolo Bonzini
2023-01-03 16:30           ` Sean Christopherson
2022-10-01  0:58 ` [PATCH v4 12/32] KVM: SVM: Replace "avic_mode" enum with "x2avic_enabled" boolean Sean Christopherson
2022-10-01  0:58 ` [PATCH v4 13/32] KVM: SVM: Compute dest based on sender's x2APIC status for AVIC kick Sean Christopherson
2022-10-01  0:58 ` [PATCH v4 14/32] KVM: SVM: Fix x2APIC Logical ID calculation for avic_kick_target_vcpus_fast Sean Christopherson
2022-10-01  0:58 ` [PATCH v4 15/32] Revert "KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible" Sean Christopherson
2022-12-08 21:56   ` Maxim Levitsky
2022-10-01  0:58 ` [PATCH v4 16/32] KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpatch Sean Christopherson
2022-10-01  0:59 ` [PATCH v4 17/32] KVM: SVM: Add helper to perform final AVIC "kick" of single vCPU Sean Christopherson
2022-10-01  0:59 ` [PATCH v4 18/32] KVM: x86: Explicitly skip optimized logical map setup if vCPU's LDR==0 Sean Christopherson
2022-12-08 21:56   ` Maxim Levitsky
2022-10-01  0:59 ` [PATCH v4 19/32] KVM: x86: Explicitly track all possibilities for APIC map's logical modes Sean Christopherson
2022-12-08 21:57   ` Maxim Levitsky
2022-12-16 18:39     ` Sean Christopherson
2022-12-16 23:34       ` Sean Christopherson
2022-12-27 11:30         ` Paolo Bonzini
2022-10-01  0:59 ` [PATCH v4 20/32] KVM: x86: Skip redundant x2APIC logical mode optimized cluster setup Sean Christopherson
2022-12-08 21:57   ` Maxim Levitsky
2022-10-01  0:59 ` [PATCH v4 21/32] KVM: x86: Disable APIC logical map if logical ID covers multiple MDAs Sean Christopherson
2022-10-01  0:59 ` [PATCH v4 22/32] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode Sean Christopherson
2022-10-01  0:59 ` [PATCH v4 23/32] KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs Sean Christopherson
2022-12-08 21:58   ` Maxim Levitsky
2022-10-01  0:59 ` [PATCH v4 24/32] KVM: x86: Inhibit APICv/AVIC if the optimized physical map is disabled Sean Christopherson
2022-12-08 21:58   ` Maxim Levitsky
2022-12-09  0:56     ` Sean Christopherson
2022-10-01  0:59 ` [PATCH v4 25/32] KVM: SVM: Inhibit AVIC if vCPUs are aliased in logical mode Sean Christopherson
2022-12-08 21:58   ` Maxim Levitsky
2022-10-01  0:59 ` [PATCH v4 26/32] KVM: SVM: Always update local APIC on writes to logical dest register Sean Christopherson
2022-12-08 21:58   ` Maxim Levitsky
2022-10-01  0:59 ` [PATCH v4 27/32] KVM: SVM: Update svm->ldr_reg cache even if LDR is "bad" Sean Christopherson
2022-12-08 21:59   ` Maxim Levitsky
2022-12-09  0:49     ` Sean Christopherson
2022-10-01  0:59 ` [PATCH v4 28/32] KVM: SVM: Require logical ID to be power-of-2 for AVIC entry Sean Christopherson
2022-12-08 22:00   ` Maxim Levitsky
2022-12-29  8:27     ` mlevitsk
2023-01-04 10:08       ` Maxim Levitsky
2023-01-04 18:02       ` Sean Christopherson
2023-01-04 18:34         ` Maxim Levitsky
2022-10-01  0:59 ` [PATCH v4 29/32] KVM: SVM: Handle multiple logical targets in AVIC kick fastpath Sean Christopherson
2022-12-08 22:00   ` Maxim Levitsky
2022-10-01  0:59 ` [PATCH v4 30/32] KVM: SVM: Ignore writes to Remote Read Data on AVIC write traps Sean Christopherson
2022-10-01  0:59 ` [PATCH v4 31/32] Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on a running vcpu" Sean Christopherson
2022-12-08 22:01   ` Maxim Levitsky
2022-10-01  0:59 ` [PATCH v4 32/32] KVM: x86: Track required APICv inhibits with variable, not callback Sean Christopherson
2022-12-08 22:03   ` Maxim Levitsky
2022-12-27 11:22 ` [PATCH v4 00/32] KVM: x86: AVIC and local APIC fixes+cleanups Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).