All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] KVM: x86: Out-of-bounds access in kvm_recalculate_phys_map()
@ 2023-05-26 23:50 Sean Christopherson
  2023-05-26 23:50 ` [PATCH v2 1/3] KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds Sean Christopherson
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Sean Christopherson @ 2023-05-26 23:50 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Michal Luczaj

v2 of Michal's fix for a TOCTOU bug in kvm_recalculate_phys_map().   Not
fully tested (will do that next week), though I did confirm the reworked
selftest can hit the bug.  Posting a bit prematurely as I have a long weekend
and I don't want Michal to do any duplicate work.

In Michal's words...

kvm_recalculate_apic_map() creates the APIC map iterating over the list of
vCPUs twice. First to find the max APIC ID and allocate a max-sized buffer,
then again, calling kvm_recalculate_phys_map() for each vCPU. This opens a
race window: value of max APIC ID can increase _after_ the buffer was
allocated.

Michal Luczaj (1):
  KVM: selftests: Add test for race in kvm_recalculate_apic_map()

Sean Christopherson (2):
  KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is
    out-of-bounds
  KVM: x86: Retry APIC optimized map recalc if vCPU is added/enabled

 arch/x86/kvm/lapic.c                          | 49 ++++++++++--
 tools/testing/selftests/kvm/Makefile          |  1 +
 .../kvm/x86_64/recalc_apic_map_race.c         | 76 +++++++++++++++++++
 3 files changed, 120 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/recalc_apic_map_race.c


base-commit: 39428f6ea9eace95011681628717062ff7f5eb5f
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/3] KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds
  2023-05-26 23:50 [PATCH v2 0/3] KVM: x86: Out-of-bounds access in kvm_recalculate_phys_map() Sean Christopherson
@ 2023-05-26 23:50 ` Sean Christopherson
  2023-05-26 23:50 ` [PATCH v2 2/3] KVM: x86: Retry APIC optimized map recalc if vCPU is added/enabled Sean Christopherson
  2023-05-26 23:50 ` [PATCH v2 3/3] KVM: selftests: Add test for race in kvm_recalculate_apic_map() Sean Christopherson
  2 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2023-05-26 23:50 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Michal Luczaj

Bail from kvm_recalculate_phys_map() and disable the optimized map if the
target vCPU's x2APIC ID is out-of-bounds, i.e. if the vCPU was added
and/or enabled its local APIC after the map was allocated.  This fixes an
out-of-bounds access bug in the !x2apic_format path where KVM would write
beyond the end of phys_map.

Check the x2APIC ID regardless of whether or not x2APIC is enabled,
as KVM's hardcodes x2APIC ID to be the vCPU ID, i.e. it can't change, and
the map allocation in kvm_recalculate_apic_map() doesn't check for x2APIC
being enabled, i.e. the check won't get false postivies.

Note, this also affects the x2apic_format path, which previously just
ignored the "x2apic_id > new->max_apic_id" case.  That too is arguably a
bug fix, as ignoring the vCPU meant that KVM would not send interrupts to
the vCPU until the next map recalculation.  In practice, that "bug" is
likely benign as a newly present vCPU/APIC would immediately trigger a
recalc.  But, there's no functional downside to disabling the map, and
a future patch will gracefully handle the -E2BIG case by retrying instead
of simply disabling the optimized map.

Opportunistically add a sanity check on the xAPIC ID size, along with a
comment explaining why the xAPIC ID is guaranteed to be "good".

Reported-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e542cf285b51..3c300a196bdf 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -228,6 +228,23 @@ static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
 	u32 xapic_id = kvm_xapic_id(apic);
 	u32 physical_id;
 
+	/*
+	 * For simplicity, KVM always allocates enough space for all possible
+	 * xAPIC IDs.  Yell, but don't kill the VM, as KVM can continue on
+	 * without the optimized map.
+	 */
+	if (WARN_ON_ONCE(xapic_id > new->max_apic_id))
+		return -EINVAL;
+
+	/*
+	 * Bail if a vCPU was added and/or enabled its APIC between allocating
+	 * the map and doing the actual calculations for the map.  Note, KVM
+	 * hardcodes the x2APIC ID to vcpu_id, i.e. there's no TOCTOU bug if
+	 * the compiler decides to reload x2apic_id after this check.
+	 */
+	if (x2apic_id > new->max_apic_id)
+		return -E2BIG;
+
 	/*
 	 * Deliberately truncate the vCPU ID when detecting a mismatched APIC
 	 * ID to avoid false positives if the vCPU ID, i.e. x2APIC ID, is a
@@ -253,8 +270,7 @@ static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
 	 */
 	if (vcpu->kvm->arch.x2apic_format) {
 		/* See also kvm_apic_match_physical_addr(). */
-		if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
-			x2apic_id <= new->max_apic_id)
+		if (apic_x2apic_mode(apic) || x2apic_id > 0xff)
 			new->phys_map[x2apic_id] = apic;
 
 		if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/3] KVM: x86: Retry APIC optimized map recalc if vCPU is added/enabled
  2023-05-26 23:50 [PATCH v2 0/3] KVM: x86: Out-of-bounds access in kvm_recalculate_phys_map() Sean Christopherson
  2023-05-26 23:50 ` [PATCH v2 1/3] KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds Sean Christopherson
@ 2023-05-26 23:50 ` Sean Christopherson
  2023-05-26 23:50 ` [PATCH v2 3/3] KVM: selftests: Add test for race in kvm_recalculate_apic_map() Sean Christopherson
  2 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2023-05-26 23:50 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Michal Luczaj

Retry the optimized APIC map recalculation if an APIC-enabled vCPU shows
up between allocating the map and filling in the map data.  Conditionally
reschedule before retrying even though the number of vCPUs that can be
created is bounded by KVM.  Retrying a few thousand times isn't so slow
as to be hugely problematic, but it's not blazing fast either.

Reset xapic_id_mistach on each retry as a vCPU could change its xAPIC ID
between loops, but do NOT reset max_id.  The map size also factors in
whether or not a vCPU's local APIC is hardware-enabled, i.e. userspace
and/or the guest can theoretically keep KVM retrying indefinitely.  The
only downside is that KVM will allocate more memory than is strictly
necessary if the vCPU with the highest x2APIC ID disabled its APIC while
the recalculation was in-progress.

Refresh kvm->arch.apic_map_dirty to opportunistically change it from
DIRTY => UPDATE_IN_PROGRESS to avoid an unnecessary recalc from a
different task, i.e. if another task is waiting to attempt an update
(which is likely since a retry happens if and only if an update is
required).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 3c300a196bdf..cadeaba25e65 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -381,7 +381,8 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 	struct kvm_vcpu *vcpu;
 	unsigned long i;
 	u32 max_id = 255; /* enough space for any xAPIC ID */
-	bool xapic_id_mismatch = false;
+	bool xapic_id_mismatch;
+	int r;
 
 	/* Read kvm->arch.apic_map_dirty before kvm->arch.apic_map.  */
 	if (atomic_read_acquire(&kvm->arch.apic_map_dirty) == CLEAN)
@@ -391,9 +392,14 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 		  "Dirty APIC map without an in-kernel local APIC");
 
 	mutex_lock(&kvm->arch.apic_map_lock);
+
+retry:
 	/*
-	 * Read kvm->arch.apic_map_dirty before kvm->arch.apic_map
-	 * (if clean) or the APIC registers (if dirty).
+	 * Read kvm->arch.apic_map_dirty before kvm->arch.apic_map (if clean)
+	 * or the APIC registers (if dirty).  Note, on retry the map may have
+	 * not yet been marked dirty by whatever task changed a vCPU's x2APIC
+	 * ID, i.e. the map may still show up as in-progress.  In that case
+	 * this task still needs to retry and copmlete its calculation.
 	 */
 	if (atomic_cmpxchg_acquire(&kvm->arch.apic_map_dirty,
 				   DIRTY, UPDATE_IN_PROGRESS) == CLEAN) {
@@ -402,6 +408,15 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 		return;
 	}
 
+	/*
+	 * Reset the mismatch flag between attempts so that KVM does the right
+	 * thing if a vCPU changes its xAPIC ID, but do NOT reset max_id, i.e.
+	 * keep max_id strictly increasing.  Disallowing max_id from shrinking
+	 * ensures KVM won't get stuck in an infinite loop, e.g. if the vCPU
+	 * with the highest x2APIC ID is toggling its APIC on and off.
+	 */
+	xapic_id_mismatch = false;
+
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		if (kvm_apic_present(vcpu))
 			max_id = max(max_id, kvm_x2apic_id(vcpu->arch.apic));
@@ -420,9 +435,15 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 		if (!kvm_apic_present(vcpu))
 			continue;
 
-		if (kvm_recalculate_phys_map(new, vcpu, &xapic_id_mismatch)) {
+		r = kvm_recalculate_phys_map(new, vcpu, &xapic_id_mismatch);
+		if (r) {
 			kvfree(new);
 			new = NULL;
+			if (r == -E2BIG) {
+				cond_resched();
+				goto retry;
+			}
+
 			goto out;
 		}
 
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 3/3] KVM: selftests: Add test for race in kvm_recalculate_apic_map()
  2023-05-26 23:50 [PATCH v2 0/3] KVM: x86: Out-of-bounds access in kvm_recalculate_phys_map() Sean Christopherson
  2023-05-26 23:50 ` [PATCH v2 1/3] KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds Sean Christopherson
  2023-05-26 23:50 ` [PATCH v2 2/3] KVM: x86: Retry APIC optimized map recalc if vCPU is added/enabled Sean Christopherson
@ 2023-05-26 23:50 ` Sean Christopherson
  2023-06-01 23:11   ` Sean Christopherson
  2 siblings, 1 reply; 6+ messages in thread
From: Sean Christopherson @ 2023-05-26 23:50 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Michal Luczaj

From: Michal Luczaj <mhal@rbox.co>

Keep switching between LAPIC_MODE_X2APIC and LAPIC_MODE_DISABLED during
APIC map construction to hunt for TOCTOU bugs in KVM.  KVM's optimized map
recalc makes multiple passes over the list of vCPUs, and the calculations
ignore vCPU's whose APIC is hardware-disabled, i.e. there's a window where
toggling LAPIC_MODE_DISABLED is quite interesting.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/Makefile          |  1 +
 .../kvm/x86_64/recalc_apic_map_race.c         | 76 +++++++++++++++++++
 2 files changed, 77 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/recalc_apic_map_race.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 7a5ff646e7e7..c9b8f16fb23f 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -116,6 +116,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/sev_migrate_tests
 TEST_GEN_PROGS_x86_64 += x86_64/amx_test
 TEST_GEN_PROGS_x86_64 += x86_64/max_vcpuid_cap_test
 TEST_GEN_PROGS_x86_64 += x86_64/triple_fault_event_test
+TEST_GEN_PROGS_x86_64 += x86_64/recalc_apic_map_race
 TEST_GEN_PROGS_x86_64 += access_tracking_perf_test
 TEST_GEN_PROGS_x86_64 += demand_paging_test
 TEST_GEN_PROGS_x86_64 += dirty_log_test
diff --git a/tools/testing/selftests/kvm/x86_64/recalc_apic_map_race.c b/tools/testing/selftests/kvm/x86_64/recalc_apic_map_race.c
new file mode 100644
index 000000000000..09516548c11a
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/recalc_apic_map_race.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * recalc_apic_map_race
+ *
+ * Test for a race condition in kvm_recalculate_apic_map().
+ */
+
+#include <sys/ioctl.h>
+#include <pthread.h>
+#include <time.h>
+
+#include "processor.h"
+#include "test_util.h"
+#include "kvm_util.h"
+#include "apic.h"
+
+#define TIMEOUT		5	/* seconds */
+
+#define LAPIC_DISABLED	0
+#define LAPIC_X2APIC	(MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)
+#define MAX_XAPIC_ID	0xff
+
+static void *race(void *arg)
+{
+	struct kvm_lapic_state lapic = {};
+	struct kvm_vcpu *vcpu = arg;
+
+	while (1) {
+		/* Trigger kvm_recalculate_apic_map(). */
+		vcpu_ioctl(vcpu, KVM_SET_LAPIC, &lapic);
+		pthread_testcancel();
+	}
+
+	return NULL;
+}
+
+int main(void)
+{
+	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
+	struct kvm_vcpu *vcpuN;
+	struct kvm_vm *vm;
+	pthread_t thread;
+	time_t t;
+	int i;
+
+	kvm_static_assert(KVM_MAX_VCPUS > MAX_XAPIC_ID);
+
+	/*
+	 * Creating the max number of vCPUs supported by selftests so that KVM
+	 * has decent amount of work to do when recalculating the map, i.e. to
+	 * make the problematic window large enough to hit.
+	 */
+	vm = vm_create_with_vcpus(KVM_MAX_VCPUS, NULL, vcpus);
+
+	/*
+	 * Enable x2APIC on all vCPUs so that KVM doesn't bail from the recalc
+	 * due to vCPUs having aliased xAPIC IDs (truncated to 8 bits).
+	 */
+	for (i = 0; i < KVM_MAX_VCPUS; i++)
+		vcpu_set_msr(vcpus[i], MSR_IA32_APICBASE, LAPIC_X2APIC);
+
+	ASSERT_EQ(pthread_create(&thread, NULL, race, vcpus[0]), 0);
+
+	vcpuN = vcpus[KVM_MAX_VCPUS - 1];
+	for (t = time(NULL) + TIMEOUT; time(NULL) < t;) {
+		vcpu_set_msr(vcpuN, MSR_IA32_APICBASE, LAPIC_X2APIC);
+		vcpu_set_msr(vcpuN, MSR_IA32_APICBASE, LAPIC_DISABLED);
+	}
+
+	ASSERT_EQ(pthread_cancel(thread), 0);
+	ASSERT_EQ(pthread_join(thread, NULL), 0);
+
+	kvm_vm_release(vm);
+
+	return 0;
+}
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 3/3] KVM: selftests: Add test for race in kvm_recalculate_apic_map()
  2023-05-26 23:50 ` [PATCH v2 3/3] KVM: selftests: Add test for race in kvm_recalculate_apic_map() Sean Christopherson
@ 2023-06-01 23:11   ` Sean Christopherson
  2023-06-01 23:40     ` Sean Christopherson
  0 siblings, 1 reply; 6+ messages in thread
From: Sean Christopherson @ 2023-06-01 23:11 UTC (permalink / raw)
  To: Paolo Bonzini, kvm, linux-kernel, Michal Luczaj

On Fri, May 26, 2023, Sean Christopherson wrote:
> From: Michal Luczaj <mhal@rbox.co>
> 
> Keep switching between LAPIC_MODE_X2APIC and LAPIC_MODE_DISABLED during
> APIC map construction to hunt for TOCTOU bugs in KVM.  KVM's optimized map
> recalc makes multiple passes over the list of vCPUs, and the calculations
> ignore vCPU's whose APIC is hardware-disabled, i.e. there's a window where
> toggling LAPIC_MODE_DISABLED is quite interesting.
> 
> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  tools/testing/selftests/kvm/Makefile          |  1 +
>  .../kvm/x86_64/recalc_apic_map_race.c         | 76 +++++++++++++++++++
>  2 files changed, 77 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/x86_64/recalc_apic_map_race.c

Since there's another bug+test related to kvm_recalculate_apic_map()[*], I think
it makes sense to name this recalc_apic_map_test, and then fold the LDR test into
this one.  The LDR test is tiny enough that I don't think it's worth a separate
binary, even though I generally prefer to keep the selftests small.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 3/3] KVM: selftests: Add test for race in kvm_recalculate_apic_map()
  2023-06-01 23:11   ` Sean Christopherson
@ 2023-06-01 23:40     ` Sean Christopherson
  0 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2023-06-01 23:40 UTC (permalink / raw)
  To: Paolo Bonzini, kvm, linux-kernel, Michal Luczaj

On Thu, Jun 01, 2023, Sean Christopherson wrote:
> On Fri, May 26, 2023, Sean Christopherson wrote:
> > From: Michal Luczaj <mhal@rbox.co>
> > 
> > Keep switching between LAPIC_MODE_X2APIC and LAPIC_MODE_DISABLED during
> > APIC map construction to hunt for TOCTOU bugs in KVM.  KVM's optimized map
> > recalc makes multiple passes over the list of vCPUs, and the calculations
> > ignore vCPU's whose APIC is hardware-disabled, i.e. there's a window where
> > toggling LAPIC_MODE_DISABLED is quite interesting.
> > 
> > Signed-off-by: Michal Luczaj <mhal@rbox.co>
> > Co-developed-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  tools/testing/selftests/kvm/Makefile          |  1 +
> >  .../kvm/x86_64/recalc_apic_map_race.c         | 76 +++++++++++++++++++
> >  2 files changed, 77 insertions(+)
> >  create mode 100644 tools/testing/selftests/kvm/x86_64/recalc_apic_map_race.c
> 
> Since there's another bug+test related to kvm_recalculate_apic_map()[*], I think
> it makes sense to name this recalc_apic_map_test, and then fold the LDR test into
> this one.  The LDR test is tiny enough that I don't think it's worth a separate
> binary, even though I generally prefer to keep the selftests small.

Actually, the x2APIC test is a better fit in xapic_state_test.  I'll probably
still rename this to match (almost) every other selftest name.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-06-01 23:40 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-26 23:50 [PATCH v2 0/3] KVM: x86: Out-of-bounds access in kvm_recalculate_phys_map() Sean Christopherson
2023-05-26 23:50 ` [PATCH v2 1/3] KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds Sean Christopherson
2023-05-26 23:50 ` [PATCH v2 2/3] KVM: x86: Retry APIC optimized map recalc if vCPU is added/enabled Sean Christopherson
2023-05-26 23:50 ` [PATCH v2 3/3] KVM: selftests: Add test for race in kvm_recalculate_apic_map() Sean Christopherson
2023-06-01 23:11   ` Sean Christopherson
2023-06-01 23:40     ` Sean Christopherson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.