All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Alejandro Jimenez <alejandro.j.jimenez@oracle.com>,
	Maxim Levitsky <mlevitsk@redhat.com>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	Li RongQing <lirongqing@baidu.com>,
	Greg Edwards <gedwards@ddn.com>
Subject: [PATCH v5 22/33] KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs
Date: Fri,  6 Jan 2023 01:12:55 +0000	[thread overview]
Message-ID: <20230106011306.85230-23-seanjc@google.com> (raw)
In-Reply-To: <20230106011306.85230-1-seanjc@google.com>

Apply KVM's hotplug hack if and only if userspace has enabled 32-bit IDs
for x2APIC.  If 32-bit IDs are not enabled, disable the optimized map to
honor x86 architectural behavior if multiple vCPUs shared a physical APIC
ID.  As called out in the changelog that added the hack, all CPUs whose
(possibly truncated) APIC ID matches the target are supposed to receive
the IPI.

  KVM intentionally differs from real hardware, because real hardware
  (Knights Landing) does just "x2apic_id & 0xff" to decide whether to
  accept the interrupt in xAPIC mode and it can deliver one interrupt to
  more than one physical destination, e.g. 0x123 to 0x123 and 0x23.

Applying the hack even when x2APIC is not fully enabled means KVM doesn't
correctly handle scenarios where the guest has aliased xAPIC IDs across
multiple vCPUs, as only the vCPU with the lowest vCPU ID will receive any
interrupts.  It's extremely unlikely any real world guest aliases APIC
IDs, or even modifies APIC IDs, but KVM's behavior is arbitrary, e.g. the
lowest vCPU ID "wins" regardless of which vCPU is "aliasing" and which
vCPU is "normal".

Furthermore, the hack is _not_ guaranteed to work!  The hack works if and
only if the optimized APIC map is successfully allocated.  If the map
allocation fails (unlikely), KVM will fall back to its unoptimized
behavior, which _does_ honor the architectural behavior.

Pivot on 32-bit x2APIC IDs being enabled as that is required to take
advantage of the hotplug hack (see kvm_apic_state_fixup()), i.e. won't
break existing setups unless they are way, way off in the weeds.

And an entry in KVM's errata to document the hack.  Alternatively, KVM
could provide an actual x2APIC quirk and document the hack that way, but
there's unlikely to ever be a use case for disabling the quirk.  Go the
errata route to avoid having to validate a quirk no one cares about.

Fixes: 5bd5db385b3e ("KVM: x86: allow hotplug of VCPU with APIC ID over 0xff")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/x86/errata.rst | 11 ++++++
 arch/x86/kvm/lapic.c                  | 50 ++++++++++++++++++++++-----
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/Documentation/virt/kvm/x86/errata.rst b/Documentation/virt/kvm/x86/errata.rst
index 410e0aa63493..49a05f24747b 100644
--- a/Documentation/virt/kvm/x86/errata.rst
+++ b/Documentation/virt/kvm/x86/errata.rst
@@ -37,3 +37,14 @@ Nested virtualization features
 ------------------------------
 
 TBD
+
+x2APIC
+------
+When KVM_X2APIC_API_USE_32BIT_IDS is enabled, KVM activates a hack/quirk that
+allows sending events to a single vCPU using its x2APIC ID even if the target
+vCPU has legacy xAPIC enabled, e.g. to bring up hotplugged vCPUs via INIT-SIPI
+on VMs with > 255 vCPUs.  A side effect of the quirk is that, if multiple vCPUs
+have the same physical APIC ID, KVM will deliver events targeting that APIC ID
+only to the vCPU with the lowest vCPU ID.  If KVM_X2APIC_API_USE_32BIT_IDS is
+not enabled, KVM follows x86 architecture when processing interrupts (all vCPUs
+matching the target APIC ID receive the interrupt).
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9c0554bae3b1..e9f258de91bd 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -274,10 +274,10 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 		struct kvm_lapic *apic = vcpu->arch.apic;
 		struct kvm_lapic **cluster;
 		enum kvm_apic_logical_mode logical_mode;
+		u32 x2apic_id, physical_id;
 		u16 mask;
 		u32 ldr;
 		u8 xapic_id;
-		u32 x2apic_id;
 
 		if (!kvm_apic_present(vcpu))
 			continue;
@@ -285,16 +285,48 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 		xapic_id = kvm_xapic_id(apic);
 		x2apic_id = kvm_x2apic_id(apic);
 
-		/* Hotplug hack: see kvm_apic_match_physical_addr(), ... */
-		if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
-				x2apic_id <= new->max_apic_id)
-			new->phys_map[x2apic_id] = apic;
 		/*
-		 * ... xAPIC ID of VCPUs with APIC ID > 0xff will wrap-around,
-		 * prevent them from masking VCPUs with APIC ID <= 0xff.
+		 * Apply KVM's hotplug hack if userspace has enable 32-bit APIC
+		 * IDs.  Allow sending events to vCPUs by their x2APIC ID even
+		 * if the target vCPU is in legacy xAPIC mode, and silently
+		 * ignore aliased xAPIC IDs (the x2APIC ID is truncated to 8
+		 * bits, causing IDs > 0xff to wrap and collide).
+		 *
+		 * Honor the architectural (and KVM's non-optimized) behavior
+		 * if userspace has not enabled 32-bit x2APIC IDs.  Each APIC
+		 * is supposed to process messages independently.  If multiple
+		 * vCPUs have the same effective APIC ID, e.g. due to the
+		 * x2APIC wrap or because the guest manually modified its xAPIC
+		 * IDs, events targeting that ID are supposed to be recognized
+		 * by all vCPUs with said ID.
 		 */
-		if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
-			new->phys_map[xapic_id] = apic;
+		if (kvm->arch.x2apic_format) {
+			/* See also kvm_apic_match_physical_addr(). */
+			if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
+			    x2apic_id <= new->max_apic_id)
+				new->phys_map[x2apic_id] = apic;
+
+			if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
+				new->phys_map[xapic_id] = apic;
+		} else {
+			/*
+			 * Disable the optimized map if the physical APIC ID is
+			 * already mapped, i.e. is aliased to multiple vCPUs.
+			 * The optimized map requires a strict 1:1 mapping
+			 * between IDs and vCPUs.
+			 */
+			if (apic_x2apic_mode(apic))
+				physical_id = x2apic_id;
+			else
+				physical_id = xapic_id;
+
+			if (new->phys_map[physical_id]) {
+				kvfree(new);
+				new = NULL;
+				goto out;
+			}
+			new->phys_map[physical_id] = apic;
+		}
 
 		if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED ||
 		    !kvm_apic_sw_enabled(apic))
-- 
2.39.0.314.g84b9a713c41-goog


  parent reply	other threads:[~2023-01-06  1:16 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-06  1:12 [PATCH v5 00/33] KVM: x86: AVIC and local APIC fixes+cleanups Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 01/33] KVM: x86: Blindly get current x2APIC reg value on "nodecode write" traps Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 02/33] KVM: x86: Purge "highest ISR" cache when updating APICv state Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 03/33] KVM: SVM: Flush the "current" TLB when activating AVIC Sean Christopherson
2023-01-08 14:25   ` Maxim Levitsky
2023-01-06  1:12 ` [PATCH v5 04/33] KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid target Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 05/33] KVM: x86: Don't inhibit APICv/AVIC on xAPIC ID "change" if APIC is disabled Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 06/33] KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 07/33] KVM: SVM: Don't put/load AVIC when setting virtual APIC mode Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 08/33] KVM: x86: Handle APICv updates for APIC "mode" changes via request Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 09/33] KVM: x86: Move APIC access page helper to common x86 code Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 10/33] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled Sean Christopherson
2023-01-08 14:30   ` Maxim Levitsky
2023-01-06  1:12 ` [PATCH v5 11/33] KVM: SVM: Replace "avic_mode" enum with "x2avic_enabled" boolean Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 12/33] KVM: SVM: Compute dest based on sender's x2APIC status for AVIC kick Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 13/33] KVM: SVM: Fix x2APIC Logical ID calculation for avic_kick_target_vcpus_fast Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 14/33] Revert "KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible" Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 15/33] KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpatch Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 16/33] KVM: SVM: Add helper to perform final AVIC "kick" of single vCPU Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 17/33] KVM: x86: Explicitly skip optimized logical map setup if vCPU's LDR==0 Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 18/33] KVM: x86: Explicitly track all possibilities for APIC map's logical modes Sean Christopherson
2023-01-08 15:14   ` Maxim Levitsky
2023-01-06  1:12 ` [PATCH v5 19/33] KVM: x86: Skip redundant x2APIC logical mode optimized cluster setup Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 20/33] KVM: x86: Disable APIC logical map if logical ID covers multiple MDAs Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 21/33] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode Sean Christopherson
2023-01-06  1:12 ` Sean Christopherson [this message]
2023-01-06  1:12 ` [PATCH v5 23/33] KVM: x86: Inhibit APICv/AVIC if the optimized physical map is disabled Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 24/33] KVM: SVM: Inhibit AVIC if vCPUs are aliased in logical mode Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 25/33] KVM: SVM: Always update local APIC on writes to logical dest register Sean Christopherson
2023-01-06  1:12 ` [PATCH v5 26/33] KVM: SVM: Update svm->ldr_reg cache even if LDR is "bad" Sean Christopherson
2023-01-06  1:13 ` [PATCH v5 27/33] KVM: SVM: Require logical ID to be power-of-2 for AVIC entry Sean Christopherson
2023-01-08 15:20   ` Maxim Levitsky
2023-01-06  1:13 ` [PATCH v5 28/33] KVM: SVM: Handle multiple logical targets in AVIC kick fastpath Sean Christopherson
2023-01-06  1:13 ` [PATCH v5 29/33] KVM: SVM: Ignore writes to Remote Read Data on AVIC write traps Sean Christopherson
2023-01-06  1:13 ` [PATCH v5 30/33] Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on a running vcpu" Sean Christopherson
2023-01-06  1:13 ` [PATCH v5 31/33] KVM: x86: Track required APICv inhibits with variable, not callback Sean Christopherson
2023-01-06  1:13 ` [PATCH v5 32/33] KVM: x86: Allow APICv APIC ID inhibit to be cleared Sean Christopherson
2023-01-08 15:25   ` Maxim Levitsky
2023-01-06  1:13 ` [PATCH v5 33/33] KVM: x86: Add helpers to recalc physical vs. logical optimized APIC maps Sean Christopherson
2023-01-08 15:32   ` Maxim Levitsky
2023-02-15 20:25 ` [PATCH v5 00/33] KVM: x86: AVIC and local APIC fixes+cleanups Suthikulpanit, Suravee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230106011306.85230-23-seanjc@google.com \
    --to=seanjc@google.com \
    --cc=alejandro.j.jimenez@oracle.com \
    --cc=gedwards@ddn.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lirongqing@baidu.com \
    --cc=mlevitsk@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=suravee.suthikulpanit@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.