* [PATCH v2 0/4] Fix a race between posted interrupt delivery and migration in a nested VM
@ 2022-08-28 22:25 Mingwei Zhang
  2022-08-28 22:25 ` [PATCH v2 1/4] KVM: x86: move the event handling of KVM_REQ_GET_VMCS12_PAGES into a common function Mingwei Zhang
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Mingwei Zhang @ 2022-08-28 22:25 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Maxim Levitsky, Vitaly Kuznetsov,
	Oliver Upton, Mingwei Zhang, Jim Mattson

This patch set aims to fix a race condition between posted interrupt
delivery and migration for a nested VM. In particular, it demonstrates that
when a nested vCPU is halted and then migrated, it loses a posted interrupt
sent by another vCPU in the same VM.

Changelog:

v1 -> v2:
 - Replace the original vmcs12 bug fix patch with one that processes the
   nested state pages request in a common function [paolo].
 - Update the commit messages [seanjc, oupton].
 - Remove vcpu_run_interruptable(), use __vcpu_run() instead [seanjc].
 - Fix format issue in prepare_posted_intr_desc() [seanjc].
 - Rebase to kvm/queue.

v1 link:
 - https://lore.kernel.org/lkml/20220802230718.1891356-6-mizhang@google.com/t/


Jim Mattson (1):
  KVM: selftests: Test if posted interrupt delivery races with migration

Mingwei Zhang (3):
  KVM: x86: move the event handling of KVM_REQ_GET_VMCS12_PAGES into a
    common function
  KVM: selftests: Save/restore vAPIC state in migration tests
  KVM: selftests: Add support for posted interrupt handling in L2

 arch/x86/kvm/x86.c                            |  29 +-
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/kvm_util_base.h     |  10 +
 .../selftests/kvm/include/x86_64/processor.h  |   1 +
 .../selftests/kvm/include/x86_64/vmx.h        |  10 +
 .../selftests/kvm/lib/x86_64/processor.c      |   2 +
 tools/testing/selftests/kvm/lib/x86_64/vmx.c  |  14 +
 .../kvm/x86_64/vmx_migrate_pi_pending.c       | 291 ++++++++++++++++++
 9 files changed, 353 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_migrate_pi_pending.c


base-commit: 372d07084593dc7a399bf9bee815711b1fb1bcf2
-- 
2.37.2.672.g94769d06f0-goog



* [PATCH v2 1/4] KVM: x86: move the event handling of KVM_REQ_GET_VMCS12_PAGES into a common function
  2022-08-28 22:25 [PATCH v2 0/4] Fix a race between posted interrupt delivery and migration in a nested VM Mingwei Zhang
@ 2022-08-28 22:25 ` Mingwei Zhang
  2022-08-29 16:09   ` Sean Christopherson
  2022-08-28 22:25 ` [PATCH v2 2/4] KVM: selftests: Save/restore vAPIC state in migration tests Mingwei Zhang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Mingwei Zhang @ 2022-08-28 22:25 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Maxim Levitsky, Vitaly Kuznetsov,
	Oliver Upton, Mingwei Zhang, Jim Mattson

Create a common function to handle KVM requests in the vcpu_run loop. KVM
implicitly assumes that the virtual APIC page is present and mapped into
the kernel address space when executing vmx_guest_apic_has_interrupt().
However, with demand paging KVM breaks this assumption, as the
KVM_REQ_GET_VMCS12_PAGES event isn't processed before entering vcpu_block().

Fix this by getting vmcs12 pages before inspecting the guest's APIC page.
Because of this fix, the event handling code of
KVM_REQ_GET_NESTED_STATE_PAGES becomes a common code path for both
vcpu_enter_guest() and vcpu_block(). Thus, put this code snippet into a
common helper function to avoid code duplication.

Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Originally-by: Oliver Upton <oupton@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 arch/x86/kvm/x86.c | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d7374d768296..3dcaac8f0584 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10261,12 +10261,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			r = -EIO;
 			goto out;
 		}
-		if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
-			if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
-				r = 0;
-				goto out;
-			}
-		}
 		if (kvm_check_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
 			kvm_mmu_free_obsolete_roots(vcpu);
 		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
@@ -10666,6 +10660,23 @@ static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
 		!vcpu->arch.apf.halted);
 }
 
+static int kvm_vcpu_handle_common_requests(struct kvm_vcpu *vcpu)
+{
+	if (kvm_request_pending(vcpu)) {
+		/*
+		 * Get the vmcs12 pages before checking for interrupts that
+		 * might unblock the guest if L1 is using virtual-interrupt
+		 * delivery.
+		 */
+		if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
+			if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu)))
+				return 0;
+		}
+	}
+
+	return 1;
+}
+
 /* Called within kvm->srcu read side.  */
 static int vcpu_run(struct kvm_vcpu *vcpu)
 {
@@ -10681,6 +10692,12 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 		 * this point can start executing an instruction.
 		 */
 		vcpu->arch.at_instruction_boundary = false;
+
+		/* Process common request regardless of vcpu state. */
+		r = kvm_vcpu_handle_common_requests(vcpu);
+		if (r <= 0)
+			break;
+
 		if (kvm_vcpu_running(vcpu)) {
 			r = vcpu_enter_guest(vcpu);
 		} else {
-- 
2.37.2.672.g94769d06f0-goog



* [PATCH v2 2/4] KVM: selftests: Save/restore vAPIC state in migration tests
  2022-08-28 22:25 [PATCH v2 0/4] Fix a race between posted interrupt delivery and migration in a nested VM Mingwei Zhang
  2022-08-28 22:25 ` [PATCH v2 1/4] KVM: x86: move the event handling of KVM_REQ_GET_VMCS12_PAGES into a common function Mingwei Zhang
@ 2022-08-28 22:25 ` Mingwei Zhang
  2022-08-28 22:25 ` [PATCH v2 3/4] KVM: selftests: Add support for posted interrupt handling in L2 Mingwei Zhang
  2022-08-28 22:25 ` [PATCH v2 4/4] KVM: selftests: Test if posted interrupt delivery races with migration Mingwei Zhang
  3 siblings, 0 replies; 15+ messages in thread
From: Mingwei Zhang @ 2022-08-28 22:25 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Maxim Levitsky, Vitaly Kuznetsov,
	Oliver Upton, Mingwei Zhang, Jim Mattson

Save/restore vAPIC state as part of vCPU save/load so that it is preserved
across VM (copyless) migration. This will allow testing that posted
interrupts are properly handled across VM migration.
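
For illustration, the intended flow in a migration test looks roughly like
this (a sketch using the helpers touched by this series; error handling and
the register sanity checks are omitted):

	struct kvm_x86_state *state;

	/* Source side: capture vCPU state, now including the vAPIC state. */
	state = vcpu_save_state(vcpu);		/* calls vcpu_apic_get() */
	kvm_vm_release(vm);

	/* Destination side: recreate the VM and restore the saved state. */
	vcpu = vm_recreate_with_one_vcpu(vm);
	vcpu_load_state(vcpu, state);		/* calls vcpu_apic_set() */
	kvm_x86_state_cleanup(state);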

Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 tools/testing/selftests/kvm/include/kvm_util_base.h    | 10 ++++++++++
 tools/testing/selftests/kvm/include/x86_64/processor.h |  1 +
 tools/testing/selftests/kvm/lib/x86_64/processor.c     |  2 ++
 3 files changed, 13 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 24fde97f6121..ac883b8eab57 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -457,6 +457,16 @@ static inline void vcpu_fpu_set(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
 	vcpu_ioctl(vcpu, KVM_SET_FPU, fpu);
 }
 
+static inline void vcpu_apic_get(struct kvm_vcpu *vcpu, struct kvm_lapic_state *apic)
+{
+	vcpu_ioctl(vcpu, KVM_GET_LAPIC, apic);
+}
+
+static inline void vcpu_apic_set(struct kvm_vcpu *vcpu, struct kvm_lapic_state *apic)
+{
+	vcpu_ioctl(vcpu, KVM_SET_LAPIC, apic);
+}
+
 static inline int __vcpu_get_reg(struct kvm_vcpu *vcpu, uint64_t id, void *addr)
 {
 	struct kvm_one_reg reg = { .id = id, .addr = (uint64_t)addr };
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 0cbc71b7af50..102a56a60652 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -225,6 +225,7 @@ struct kvm_x86_state {
 		struct kvm_nested_state nested;
 		char nested_[16384];
 	};
+	struct kvm_lapic_state apic;
 	struct kvm_msrs msrs;
 };
 
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 2e6e61bbe81b..e22b4f0e24f1 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -980,6 +980,7 @@ struct kvm_x86_state *vcpu_save_state(struct kvm_vcpu *vcpu)
 	vcpu_msrs_get(vcpu, &state->msrs);
 
 	vcpu_debugregs_get(vcpu, &state->debugregs);
+	vcpu_apic_get(vcpu, &state->apic);
 
 	return state;
 }
@@ -997,6 +998,7 @@ void vcpu_load_state(struct kvm_vcpu *vcpu, struct kvm_x86_state *state)
 	vcpu_mp_state_set(vcpu, &state->mp_state);
 	vcpu_debugregs_set(vcpu, &state->debugregs);
 	vcpu_regs_set(vcpu, &state->regs);
+	vcpu_apic_set(vcpu, &state->apic);
 
 	if (state->nested.size)
 		vcpu_nested_state_set(vcpu, &state->nested);
-- 
2.37.2.672.g94769d06f0-goog



* [PATCH v2 3/4] KVM: selftests: Add support for posted interrupt handling in L2
  2022-08-28 22:25 [PATCH v2 0/4] Fix a race between posted interrupt delivery and migration in a nested VM Mingwei Zhang
  2022-08-28 22:25 ` [PATCH v2 1/4] KVM: x86: move the event handling of KVM_REQ_GET_VMCS12_PAGES into a common function Mingwei Zhang
  2022-08-28 22:25 ` [PATCH v2 2/4] KVM: selftests: Save/restore vAPIC state in migration tests Mingwei Zhang
@ 2022-08-28 22:25 ` Mingwei Zhang
  2022-08-29 16:19   ` Sean Christopherson
  2022-08-28 22:25 ` [PATCH v2 4/4] KVM: selftests: Test if posted interrupt delivery races with migration Mingwei Zhang
  3 siblings, 1 reply; 15+ messages in thread
From: Mingwei Zhang @ 2022-08-28 22:25 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Maxim Levitsky, Vitaly Kuznetsov,
	Oliver Upton, Mingwei Zhang, Jim Mattson

Add support for posted interrupt handling in L2. This is done by adding
the needed data structures to vmx_pages and APIs that allow an L2 guest
to receive posted interrupts.

Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 tools/testing/selftests/kvm/include/x86_64/vmx.h | 10 ++++++++++
 tools/testing/selftests/kvm/lib/x86_64/vmx.c     | 14 ++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index 99fa1410964c..69784fc71bce 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -577,6 +577,14 @@ struct vmx_pages {
 	void *apic_access_hva;
 	uint64_t apic_access_gpa;
 	void *apic_access;
+
+	void *virtual_apic_hva;
+	uint64_t virtual_apic_gpa;
+	void *virtual_apic;
+
+	void *posted_intr_desc_hva;
+	uint64_t posted_intr_desc_gpa;
+	void *posted_intr_desc;
 };
 
 union vmx_basic {
@@ -620,5 +628,7 @@ void nested_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
 void prepare_eptp(struct vmx_pages *vmx, struct kvm_vm *vm,
 		  uint32_t eptp_memslot);
 void prepare_virtualize_apic_accesses(struct vmx_pages *vmx, struct kvm_vm *vm);
+void prepare_virtual_apic(struct vmx_pages *vmx, struct kvm_vm *vm);
+void prepare_posted_intr_desc(struct vmx_pages *vmx, struct kvm_vm *vm);
 
 #endif /* SELFTEST_KVM_VMX_H */
diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index 80a568c439b8..47ae419d7eb1 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -556,3 +556,17 @@ void prepare_virtualize_apic_accesses(struct vmx_pages *vmx, struct kvm_vm *vm)
 	vmx->apic_access_hva = addr_gva2hva(vm, (uintptr_t)vmx->apic_access);
 	vmx->apic_access_gpa = addr_gva2gpa(vm, (uintptr_t)vmx->apic_access);
 }
+
+void prepare_virtual_apic(struct vmx_pages *vmx, struct kvm_vm *vm)
+{
+	vmx->virtual_apic = (void *)vm_vaddr_alloc_page(vm);
+	vmx->virtual_apic_hva = addr_gva2hva(vm, (uintptr_t)vmx->virtual_apic);
+	vmx->virtual_apic_gpa = addr_gva2gpa(vm, (uintptr_t)vmx->virtual_apic);
+}
+
+void prepare_posted_intr_desc(struct vmx_pages *vmx, struct kvm_vm *vm)
+{
+	vmx->posted_intr_desc = (void *)vm_vaddr_alloc_page(vm);
+	vmx->posted_intr_desc_hva = addr_gva2hva(vm, vmx->posted_intr_desc);
+	vmx->posted_intr_desc_gpa = addr_gva2gpa(vm, vmx->posted_intr_desc);
+}
-- 
2.37.2.672.g94769d06f0-goog



* [PATCH v2 4/4] KVM: selftests: Test if posted interrupt delivery races with migration
  2022-08-28 22:25 [PATCH v2 0/4] Fix a race between posted interrupt delivery and migration in a nested VM Mingwei Zhang
                   ` (2 preceding siblings ...)
  2022-08-28 22:25 ` [PATCH v2 3/4] KVM: selftests: Add support for posted interrupt handling in L2 Mingwei Zhang
@ 2022-08-28 22:25 ` Mingwei Zhang
  2022-08-29 17:21   ` Sean Christopherson
  3 siblings, 1 reply; 15+ messages in thread
From: Mingwei Zhang @ 2022-08-28 22:25 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Maxim Levitsky, Vitaly Kuznetsov,
	Oliver Upton, Mingwei Zhang, Jim Mattson

From: Jim Mattson <jmattson@google.com>

Test if posted interrupt delivery races with migration. Add a selftest
to demonstrate a race condition between migration and posted interrupt
delivery for a nested VM. This race causes the loss of a posted interrupt
for a nested vCPU after migration and triggers a warning on unpatched
kernels.

The selftest demonstrates that if an L2 vCPU is in a halted state before
migration, then after migration it is not able to receive a posted
interrupt from another vCPU within the same VM.

The fundamental problem is deeply buried in the kernel logic, where
vcpu_block() directly checks vmcs12-related mappings before having a
valid vmcs12 ready.  Because of that, it fails to process the posted
interrupt and triggers the warning in vmx_guest_apic_has_interrupt():

static bool vmx_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
{
	...
	if (WARN_ON_ONCE(!is_guest_mode(vcpu)) ||
		!nested_cpu_has_vid(get_vmcs12(vcpu)) ||
		WARN_ON_ONCE(!vmx->nested.virtual_apic_map.gfn)) <= HERE
		return false;
	...
}

Signed-off-by: Jim Mattson <jmattson@google.com>
Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../kvm/x86_64/vmx_migrate_pi_pending.c       | 291 ++++++++++++++++++
 3 files changed, 293 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_migrate_pi_pending.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index d625a3f83780..749b2be5b23c 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -51,6 +51,7 @@
 /x86_64/vmx_exception_with_invalid_guest_state
 /x86_64/vmx_invalid_nested_guest_state
 /x86_64/vmx_msrs_test
+/x86_64/vmx_migrate_pi_pending
 /x86_64/vmx_preemption_timer_test
 /x86_64/vmx_set_nested_state_test
 /x86_64/vmx_tsc_adjust_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 4c122f1b1737..5e830c65c068 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -110,6 +110,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_dirty_log_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_exception_with_invalid_guest_state
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_msrs_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_invalid_nested_guest_state
+TEST_GEN_PROGS_x86_64 += x86_64/vmx_migrate_pi_pending
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_migrate_pi_pending.c b/tools/testing/selftests/kvm/x86_64/vmx_migrate_pi_pending.c
new file mode 100644
index 000000000000..6f1a9f284754
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/vmx_migrate_pi_pending.c
@@ -0,0 +1,291 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * vmx_migrate_pi_pending
+ *
+ * Copyright (C) 2022, Google, LLC.
+ *
+ * Deliver a nested posted interrupt between migration and the first
+ * KVM_RUN on the target.
+ */
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "vmx.h"
+
+#include <string.h>
+#include <sys/ioctl.h>
+#include <linux/bitmap.h>
+#include <pthread.h>
+#include <signal.h>
+
+#include "kselftest.h"
+
+#define VCPU_ID0 0
+#define VCPU_ID1 1
+#define PI_ON_BIT 256
+#define PI_NV    0x42
+#define L2_INTR  0x71
+
+enum {
+	PORT_L0_EXIT = 0x2000,
+};
+
+/* The virtual machine object. */
+struct vmx_pages *vmx;
+
+static struct kvm_vm *vm;
+static struct kvm_vcpu *vcpu0;
+static struct kvm_vcpu *vcpu1;
+bool vcpu0_can_run = true;
+bool vcpu0_running;
+bool pi_executed;
+pthread_t pthread_cpu0;
+pthread_t pthread_cpu1;
+
+static void vcpu0_ipi_handler(struct ex_regs *regs)
+{
+	asm volatile("inb %%dx, %%al"
+		     : : [port] "d" (PORT_L0_EXIT) : "rax");
+	asm volatile("vmcall");
+}
+
+static void l2_vcpu0_guest_code(void)
+{
+	asm volatile("cli");
+	asm volatile("sti; nop; hlt");
+}
+
+static void l1_vcpu0_guest_code(struct vmx_pages *vmx_pages)
+{
+#define L2_GUEST_STACK_SIZE 64
+	unsigned long l2_vcpu0_guest_stack[L2_GUEST_STACK_SIZE];
+	uint32_t control;
+
+	x2apic_enable();
+
+	GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
+	GUEST_ASSERT(load_vmcs(vmx_pages));
+
+	/* Prepare the VMCS for L2 execution. */
+	prepare_vmcs(vmx_pages, l2_vcpu0_guest_code,
+		     &l2_vcpu0_guest_stack[L2_GUEST_STACK_SIZE]);
+	control = vmreadz(PIN_BASED_VM_EXEC_CONTROL);
+	control |= (PIN_BASED_EXT_INTR_MASK |
+		    PIN_BASED_POSTED_INTR);
+	vmwrite(PIN_BASED_VM_EXEC_CONTROL, control);
+	control = vmreadz(CPU_BASED_VM_EXEC_CONTROL);
+	control |= (CPU_BASED_TPR_SHADOW |
+		    CPU_BASED_ACTIVATE_SECONDARY_CONTROLS);
+	vmwrite(CPU_BASED_VM_EXEC_CONTROL, control);
+	control = vmreadz(SECONDARY_VM_EXEC_CONTROL);
+	control |= (SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
+		    SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
+	vmwrite(SECONDARY_VM_EXEC_CONTROL, control);
+	control = vmreadz(VM_EXIT_CONTROLS);
+	control |= VM_EXIT_ACK_INTR_ON_EXIT;
+	vmwrite(VM_EXIT_CONTROLS, control);
+	vmwrite(VIRTUAL_APIC_PAGE_ADDR, vmx_pages->virtual_apic_gpa);
+	vmwrite(POSTED_INTR_DESC_ADDR, vmx_pages->posted_intr_desc_gpa);
+	vmwrite(POSTED_INTR_NV, PI_NV);
+
+	GUEST_ASSERT(!vmlaunch());
+	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
+	GUEST_ASSERT(!test_bit(PI_ON_BIT, (void *)vmx_pages->posted_intr_desc));
+	GUEST_DONE();
+}
+
+static void post_intr(u8 vector, void *pi_desc)
+{
+	set_bit(vector, pi_desc);
+	set_bit(PI_ON_BIT, pi_desc);
+}
+
+static void l1_vcpu1_guest_code(void *vcpu0_pi_desc)
+{
+	post_intr(L2_INTR, vcpu0_pi_desc);
+	x2apic_enable();
+	x2apic_write_reg(APIC_ICR, ((u64)VCPU_ID0 << 32) |
+			 APIC_DEST_PHYSICAL | APIC_DM_FIXED | PI_NV);
+	GUEST_DONE();
+}
+
+static void save_restore_vm(struct kvm_vm *vm)
+{
+	struct kvm_regs regs1 = {}, regs2 = {};
+	struct kvm_x86_state *state;
+
+	state = vcpu_save_state(vcpu0);
+	vcpu_regs_get(vcpu0, &regs1);
+
+	kvm_vm_release(vm);
+
+	/* Restore state in a new VM.  */
+	vcpu0 = vm_recreate_with_one_vcpu(vm);
+	vcpu_load_state(vcpu0, state);
+	kvm_x86_state_cleanup(state);
+
+	vcpu_regs_get(vcpu0, &regs2);
+	TEST_ASSERT(!memcmp(&regs1, &regs2, sizeof(regs2)),
+		    "vcpu0: Unexpected register values after vcpu_load_state; rdi: %lx rsi: %lx",
+		    (ulong) regs2.rdi, (ulong) regs2.rsi);
+}
+
+void *create_and_run_vcpu1(void *arg)
+{
+	struct ucall uc;
+	struct kvm_run *run;
+	struct kvm_mp_state vcpu0_mp_state;
+
+	pthread_cpu1 = pthread_self();
+
+	/* Keep trying to kick out vcpu0 until it is in halted state. */
+	for (;;) {
+		WRITE_ONCE(vcpu0_can_run, true);
+		sleep(1);
+		WRITE_ONCE(vcpu0_can_run, false);
+		pthread_kill(pthread_cpu0, SIGUSR1);
+		printf("vcpu1: Sent SIGUSR1 to vcpu0\n");
+
+		while (READ_ONCE(vcpu0_running))
+			;
+
+		vcpu_mp_state_get(vcpu0, &vcpu0_mp_state);
+		if (vcpu0_mp_state.mp_state == KVM_MP_STATE_HALTED)
+			break;
+	}
+
+	printf("vcpu1: Kicked out vcpu0 and ensured vcpu0 is halted\n");
+
+	/* Use save_restore_vm() to simulate a VM migration. */
+	save_restore_vm(vm);
+
+	printf("vcpu1: Finished save and restore vm.\n");
+	vcpu1 = vm_vcpu_add(vm, VCPU_ID1, l1_vcpu1_guest_code);
+	vcpu_args_set(vcpu1, 1, vmx->posted_intr_desc);
+
+	/* Start an L1 in vcpu1 and send a posted interrupt to halted L2 in vcpu0. */
+	for (;;) {
+		run = vcpu1->run;
+		vcpu_run(vcpu1);
+
+		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+			    "vcpu1: Got exit_reason other than KVM_EXIT_IO: %u (%s)\n",
+			    run->exit_reason,
+			    exit_reason_str(run->exit_reason));
+
+		switch (get_ucall(vcpu1, &uc)) {
+		case UCALL_ABORT:
+			TEST_FAIL("%s", (const char *)uc.args[0]);
+			/* NOT REACHED */
+		case UCALL_DONE:
+			printf("vcpu1: Successfully sent a posted interrupt to vcpu0\n");
+			goto done;
+		default:
+			TEST_FAIL("vcpu1: Unknown ucall %lu", uc.cmd);
+		}
+	}
+
+done:
+	/*
+	 * Allow vcpu0 to resume execution from L0 userspace and check whether
+	 * the posted interrupt gets executed.
+	 */
+	WRITE_ONCE(vcpu0_can_run, true);
+	sleep(1);
+	TEST_ASSERT(READ_ONCE(pi_executed),
+		    "vcpu0 did not execute the posted interrupt.\n");
+
+	return NULL;
+}
+
+void sig_handler(int signum, siginfo_t *info, void *context)
+{
+	TEST_ASSERT(pthread_self() == pthread_cpu0,
+		    "Incorrect receiver of the signal, expected pthread_cpu0: "
+		    "%lu, but got: %lu\n", pthread_cpu0, pthread_self());
+	printf("vcpu0: Execute sighandler for signal: %d\n", signum);
+}
+
+int main(int argc, char *argv[])
+{
+	vm_vaddr_t vmx_pages_gva;
+	struct sigaction sig_action;
+	struct sigaction old_action;
+
+	memset(&sig_action, 0, sizeof(sig_action));
+	sig_action.sa_sigaction = sig_handler;
+	sig_action.sa_flags = SA_RESTART | SA_SIGINFO;
+	sigemptyset(&sig_action.sa_mask);
+	sigaction(SIGUSR1, &sig_action, &old_action);
+
+	pthread_cpu0 = pthread_self();
+	printf("vcpu0: Finished setting up signal handler for SIGUSR1\n");
+
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_NESTED_STATE));
+	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));
+
+	vm = vm_create_with_one_vcpu(&vcpu0, (void *)l1_vcpu0_guest_code);
+
+	vm_init_descriptor_tables(vm);
+	vcpu_init_descriptor_tables(vcpu0);
+	vm_install_exception_handler(vm, L2_INTR, vcpu0_ipi_handler);
+
+	/* Allocate VMX pages and shared descriptors (vmx_pages). */
+	vmx = vcpu_alloc_vmx(vm, &vmx_pages_gva);
+	prepare_virtual_apic(vmx, vm);
+	prepare_posted_intr_desc(vmx, vm);
+	vcpu_args_set(vcpu0, 1, vmx_pages_gva);
+
+	pthread_create(&pthread_cpu1, NULL, create_and_run_vcpu1, NULL);
+
+	for (;;) {
+		struct kvm_run *run = vcpu0->run;
+		struct ucall uc;
+		int rc;
+
+		while (!READ_ONCE(vcpu0_can_run))
+			;
+
+		WRITE_ONCE(vcpu0_running, true);
+
+		rc = __vcpu_run(vcpu0);
+
+		vcpu0->run->immediate_exit = 0;
+
+		/*
+		 * When vCPU is kicked out by a signal, ensure a consistent vCPU
+		 * state to prepare for migration before setting the
+		 * vcpu_running flag to false.
+		 */
+		if (rc == -1 && run->exit_reason == KVM_EXIT_INTR) {
+			vcpu_run_complete_io(vcpu0);
+
+			WRITE_ONCE(vcpu0_running, false);
+
+			continue;
+		}
+
+		WRITE_ONCE(vcpu0_running, false);
+
+		if (run->io.port == PORT_L0_EXIT) {
+			printf("vcpu0: Executed the posted interrupt\n");
+			WRITE_ONCE(pi_executed, true);
+			continue;
+		}
+
+		switch (get_ucall(vcpu0, &uc)) {
+		case UCALL_ABORT:
+			TEST_FAIL("%s", (const char *)uc.args[0]);
+			/* NOT REACHED */
+		case UCALL_DONE:
+			goto done;
+		default:
+			TEST_FAIL("vcpu0: Unknown ucall %lu", uc.cmd);
+		}
+	}
+
+done:
+	kvm_vm_free(vm);
+	return 0;
+}
-- 
2.37.2.672.g94769d06f0-goog



* Re: [PATCH v2 1/4] KVM: x86: move the event handling of KVM_REQ_GET_VMCS12_PAGES into a common function
  2022-08-28 22:25 ` [PATCH v2 1/4] KVM: x86: move the event handling of KVM_REQ_GET_VMCS12_PAGES into a common function Mingwei Zhang
@ 2022-08-29 16:09   ` Sean Christopherson
  2022-09-07  0:01     ` Mingwei Zhang
  2022-09-07  2:50     ` Yuan Yao
  0 siblings, 2 replies; 15+ messages in thread
From: Sean Christopherson @ 2022-08-29 16:09 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, kvm, linux-kernel, Maxim Levitsky,
	Vitaly Kuznetsov, Oliver Upton, Jim Mattson

On Sun, Aug 28, 2022, Mingwei Zhang wrote:
> Create a common function to handle KVM requests in the vcpu_run loop. KVM
> implicitly assumes that the virtual APIC page is present and mapped into
> the kernel address space when executing vmx_guest_apic_has_interrupt().
> However, with demand paging KVM breaks this assumption, as the
> KVM_REQ_GET_VMCS12_PAGES event isn't processed before entering vcpu_block().

KVM_REQ_GET_VMCS12_PAGES doesn't exist upstream.

> Fix this by getting vmcs12 pages before inspecting the guest's APIC page.
> Because of this fix, the event handling code of
> KVM_REQ_GET_NESTED_STATE_PAGES becomes a common code path for both
> vcpu_enter_guest() and vcpu_block(). Thus, put this code snippet into a
> common helper function to avoid code duplication.
> 
> Cc: Maxim Levitsky <mlevitsk@redhat.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Originally-by: Oliver Upton <oupton@google.com>
> Signed-off-by: Oliver Upton <oupton@google.com>

If you drop someone as author, then their SOB also needs to be jettisoned.

> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> ---
>  arch/x86/kvm/x86.c | 29 +++++++++++++++++++++++------
>  1 file changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d7374d768296..3dcaac8f0584 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10261,12 +10261,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  			r = -EIO;
>  			goto out;
>  		}
> -		if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
> -			if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
> -				r = 0;
> -				goto out;
> -			}
> -		}
>  		if (kvm_check_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
>  			kvm_mmu_free_obsolete_roots(vcpu);
>  		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
> @@ -10666,6 +10660,23 @@ static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
>  		!vcpu->arch.apf.halted);
>  }
>  
> +static int kvm_vcpu_handle_common_requests(struct kvm_vcpu *vcpu)
> +{
> +	if (kvm_request_pending(vcpu)) {

Probably going to be a moot point, but write this as

	if (!kvm_request_pending(vcpu))
		return 1;

to reduce indentation.

> +		/*
> +		 * Get the vmcs12 pages before checking for interrupts that
> +		 * might unblock the guest if L1 is using virtual-interrupt
> +		 * delivery.
> +		 */
> +		if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
> +			if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu)))

Similarly

	if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu) &&
	    unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu)))
		return 0;

though I can see the argument for fully isolating each request.  But again, likely
a moot point.

> +				return 0;
> +		}
> +	}
> +
> +	return 1;
> +}
> +
>  /* Called within kvm->srcu read side.  */
>  static int vcpu_run(struct kvm_vcpu *vcpu)
>  {
> @@ -10681,6 +10692,12 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>  		 * this point can start executing an instruction.
>  		 */
>  		vcpu->arch.at_instruction_boundary = false;
> +
> +		/* Process common request regardless of vcpu state. */
> +		r = kvm_vcpu_handle_common_requests(vcpu);

IMO this is subtly a dangerous hook.  It implies that both vcpu_enter_guest()
and vcpu_block() correctly handle requests becoming pending after the "common"
check, but that's not actually the case.  If a request _needs_ to be handled
before vcpu_block(), then ideally it should be explicitly queried in
kvm_vcpu_check_block().  KVM_REQ_GET_NESTED_STATE_PAGES doesn't have issues because
it's only ever set from the vCPU itself.

Following that train of thought, KVM_REQ_GET_NESTED_STATE_PAGES really shouldn't
even be a request.  Aha!  And we can do that in a way that would magically fix this
bug, and would ensure we don't leave a trap for future us.

KVM already provides KVM_REQ_UNBLOCK to prevent blocking the vCPU without actually
waking the vCPU, i.e. to kick the vCPU back into the vcpu_run() loop.  The request
is provided specifically for scenarios like this where KVM needs to do work before
blocking.

Normally I'd say we should do this over multiple patches so that the "blocking"
bug is fixed before doing the rework/cleanup, but I'm ok if we want to skip straight
to the rework since we're obviously carrying an internal patch and no one else is
likely to need the fix.  But I also wouldn't object to including an intermediate
patch to fix the bug so that there's a better paper trail.

E.g. as a very partial conversion:

---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/vmx/nested.c       |  2 +-
 arch/x86/kvm/x86.c              | 12 ++++++++++++
 arch/x86/kvm/x86.h              | 10 ++++++++++
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9345303c8c6d..bfca37419783 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -939,6 +939,8 @@ struct kvm_vcpu_arch {
 	 */
 	bool pdptrs_from_userspace;

+	bool nested_get_pages_pending;
+
 #if IS_ENABLED(CONFIG_HYPERV)
 	hpa_t hv_root_tdp;
 #endif
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ddd4367d4826..e83b145c3a35 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3446,7 +3446,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 		 * to nested_get_vmcs12_pages before the next VM-entry.  The MSRs
 		 * have already been set at vmentry time and should not be reset.
 		 */
-		kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
+		kvm_nested_get_pages_set_pending(vcpu);
 	}

 	/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c0e3e7915a3a..0a7601ebffc6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9650,6 +9650,12 @@ int kvm_check_nested_events(struct kvm_vcpu *vcpu)
 	return kvm_x86_ops.nested_ops->check_events(vcpu);
 }

+static int kvm_get_nested_state_pages(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.nested_get_pages_pending = false;
+	return kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu);
+}
+
 static void kvm_inject_exception(struct kvm_vcpu *vcpu)
 {
 	trace_kvm_inj_exception(vcpu->arch.exception.nr,
@@ -10700,6 +10706,12 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 		if (kvm_cpu_has_pending_timer(vcpu))
 			kvm_inject_pending_timer_irqs(vcpu);

+		if (vcpu->arch.nested_get_pages_pending) {
+			r = kvm_get_nested_state_pages(vcpu);
+			if (r <= 0)
+				break;
+		}
+
 		if (dm_request_for_irq_injection(vcpu) &&
 			kvm_vcpu_ready_for_interrupt_injection(vcpu)) {
 			r = 0;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 1926d2cb8e79..e35aac39dc73 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -481,4 +481,14 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
 			 unsigned int port, void *data,  unsigned int count,
 			 int in);

+static inline void kvm_nested_get_pages_set_pending(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Here is a comment explaining why KVM needs to prevent the vCPU from
+	 * blocking until the vCPU's nested pages have been loaded.
+	 */
+	vcpu->arch.nested_get_pages_pending = true;
+	kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
+}
+
 #endif

base-commit: 14a47a98151834c5bd2f6d8d592b01108a3f882a
--


* Re: [PATCH v2 3/4] KVM: selftests: Add support for posted interrupt handling in L2
  2022-08-28 22:25 ` [PATCH v2 3/4] KVM: selftests: Add support for posted interrupt handling in L2 Mingwei Zhang
@ 2022-08-29 16:19   ` Sean Christopherson
  2022-09-07  0:02     ` Mingwei Zhang
  0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2022-08-29 16:19 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, kvm, linux-kernel, Maxim Levitsky,
	Vitaly Kuznetsov, Oliver Upton, Jim Mattson

On Sun, Aug 28, 2022, Mingwei Zhang wrote:
> > Add support for posted interrupt handling in L2. This is done by adding
> > the needed data structures to vmx_pages and APIs that allow an L2 guest
> > to receive posted interrupts.
> 
> Cc: Jim Mattson <jmattson@google.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> ---
>  tools/testing/selftests/kvm/include/x86_64/vmx.h | 10 ++++++++++
>  tools/testing/selftests/kvm/lib/x86_64/vmx.c     | 14 ++++++++++++++
>  2 files changed, 24 insertions(+)
> 
> diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
> index 99fa1410964c..69784fc71bce 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
> @@ -577,6 +577,14 @@ struct vmx_pages {
>  	void *apic_access_hva;
>  	uint64_t apic_access_gpa;
>  	void *apic_access;
> +
> +	void *virtual_apic_hva;
> +	uint64_t virtual_apic_gpa;
> +	void *virtual_apic;
> +
> +	void *posted_intr_desc_hva;
> +	uint64_t posted_intr_desc_gpa;
> +	void *posted_intr_desc;

Can you add a prep patch to dedup the absurd amount of copy-paste code related to
vmx_pages?

I.e. take this and give all the other triplets the same treatment.

---
 tools/testing/selftests/kvm/include/x86_64/vmx.h | 10 +++++++---
 tools/testing/selftests/kvm/lib/x86_64/vmx.c     | 15 ++++++++++-----
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index 99fa1410964c..ecc66d65acc1 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -537,10 +537,14 @@ static inline uint32_t vmcs_revision(void)
 	return rdmsr(MSR_IA32_VMX_BASIC);
 }

+struct vmx_page {
+	void *gva;
+	uint64_t gpa;
+	void *hva;
+};
+
 struct vmx_pages {
-	void *vmxon_hva;
-	uint64_t vmxon_gpa;
-	void *vmxon;
+	struct vmx_page vmxon;

 	void *vmcs_hva;
 	uint64_t vmcs_gpa;
diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index 80a568c439b8..e4eeab85741a 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -58,6 +58,13 @@ int vcpu_enable_evmcs(struct kvm_vcpu *vcpu)
 	return evmcs_ver;
 }

+static void vcpu_alloc_vmx_page(struct kvm_vm *vm, struct vmx_page *page)
+{
+	page->gva = (void *)vm_vaddr_alloc_page(vm);
+	page->hva = addr_gva2hva(vm, (uintptr_t)page->gva);
+	page->gpa = addr_gva2gpa(vm, (uintptr_t)page->gva);
+}
+
 /* Allocate memory regions for nested VMX tests.
  *
  * Input Args:
@@ -76,9 +83,7 @@ vcpu_alloc_vmx(struct kvm_vm *vm, vm_vaddr_t *p_vmx_gva)
 	struct vmx_pages *vmx = addr_gva2hva(vm, vmx_gva);

 	/* Setup of a region of guest memory for the vmxon region. */
-	vmx->vmxon = (void *)vm_vaddr_alloc_page(vm);
-	vmx->vmxon_hva = addr_gva2hva(vm, (uintptr_t)vmx->vmxon);
-	vmx->vmxon_gpa = addr_gva2gpa(vm, (uintptr_t)vmx->vmxon);
+	vcpu_alloc_vmx_page(vm, &vmx->vmxon);

 	/* Setup of a region of guest memory for a vmcs. */
 	vmx->vmcs = (void *)vm_vaddr_alloc_page(vm);
@@ -160,8 +165,8 @@ bool prepare_for_vmx_operation(struct vmx_pages *vmx)
 		wrmsr(MSR_IA32_FEAT_CTL, feature_control | required);

 	/* Enter VMX root operation. */
-	*(uint32_t *)(vmx->vmxon) = vmcs_revision();
-	if (vmxon(vmx->vmxon_gpa))
+	*(uint32_t *)(vmx->vmxon.gva) = vmcs_revision();
+	if (vmxon(vmx->vmxon.gpa))
 		return false;

 	return true;

base-commit: 372d07084593dc7a399bf9bee815711b1fb1bcf2
--



* Re: [PATCH v2 4/4] KVM: selftests: Test if posted interrupt delivery races with migration
  2022-08-28 22:25 ` [PATCH v2 4/4] KVM: selftests: Test if posted interrupt delivery races with migration Mingwei Zhang
@ 2022-08-29 17:21   ` Sean Christopherson
  0 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2022-08-29 17:21 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, kvm, linux-kernel, Maxim Levitsky,
	Vitaly Kuznetsov, Oliver Upton, Jim Mattson

On Sun, Aug 28, 2022, Mingwei Zhang wrote:
> From: Jim Mattson <jmattson@google.com>
> 
> Test if posted interrupt delivery races with migration. Add a selftest
> to demonstrate a race condition between migration and posted interrupt
> delivery for a nested VM. This race causes the loss of a posted interrupt
> for a nested vCPU after migration and triggers a warning on unpatched
> kernels.
> 
> The selftest demonstrates that if an L2 vCPU is in a halted state before
> migration, then after migration it is not able to receive a posted
> interrupt from another vCPU within the same VM.

For tests, try to phrase the changelog in terms of what architectural behavior is
being tested, as opposed to stating what exact KVM bug is being targeted.  It's
definitely helpful to call out the KVM bug, but do that _after_ explaining the
test itself.  The problem with talking about a specific KVM bug is that the
description goes stale once the bug is fixed, whereas the architectural
behavior being tested does not.

IIUC, this can be:

  Add a test to verify that a posted interrupt wakes the target vCPU from
  a halted state if the VM is migrated while the vCPU is halted.

and then something like this to call out the known KVM bug:

  This test exposes a bug where KVM checks the vAPIC page before loading
  nested pages when the vCPU is blocking.

> The fundamental problem is deeply buried in the kernel logic, where
> vcpu_block() directly checks vmcs12-related mappings before having a
> valid vmcs12 ready.  Because of that, it fails to process the posted
> interrupt and triggers the warning in vmx_guest_apic_has_interrupt():
> 
> static bool vmx_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
> {
> 	...
> 	if (WARN_ON_ONCE(!is_guest_mode(vcpu)) ||
> 		!nested_cpu_has_vid(get_vmcs12(vcpu)) ||
> 		WARN_ON_ONCE(!vmx->nested.virtual_apic_map.gfn)) <= HERE
> 		return false;
> 	...
> }
> +static void vcpu0_ipi_handler(struct ex_regs *regs)

This needs to clarify that it's an L2 handler, that info is _extremely_ important
to understanding this sequence.  And even if that info were superfluous, it's still
a good habit to use consistent namespacing.

> +{
> +	asm volatile("inb %%dx, %%al"
> +		     : : [port] "d" (PORT_L0_EXIT) : "rax");

Can't this use GUEST_SYNC()?

> +	asm volatile("vmcall");

A comment would be helpful, but not as necessary if this is vcpu0_l2_ipi_handler().

> +}
> +
> +static void l2_vcpu0_guest_code(void)

I have a slight preference for

	vcpu0_l2_ipi_handler()
	vcpu0_l2_guest_code()

	vcpu1_l1_guest_code()

because the "vcpu0 vs. vcpu1" is the broader scope, and then "l1 vs. l2" further
clarifies the exact scope of the function.

> +{
> +	asm volatile("cli");
> +	asm volatile("sti; nop; hlt");

What is this code trying to do?  Assuming the intent is to ensure the posted IRQ
arrives after "hlt", this needs to be:

	GUEST_ASSERT(!irqs_enabled());
	asm volatile("sti; hlt");

because if interrupts are enabled when l2_vcpu0_guest_code() starts running, then
the IRQ can arrive before CLI.  And the "nop" needs to go because the nop will
consume the STI shadow, i.e. the IRQ can arrive before the "hlt".  irqs_enabled()
needs to be defined, but it's fairly straightforward; something like this:

static __always_inline bool irqs_enabled(void)
{
	unsigned long flags;

	asm volatile("pushf ; pop %0"
		     : "=rm" (flags)
		     :
		     : "memory");

	return flags & X86_EFLAGS_IF;
}

If my assumption is wrong, then this needs a very verbose comment.

> +static void post_intr(u8 vector, void *pi_desc)
> +{
> +       set_bit(vector, pi_desc);
> +       set_bit(PI_ON_BIT, pi_desc);
> +}
> +
> +static void l1_vcpu1_guest_code(void *vcpu0_pi_desc)
> +{
> +       post_intr(L2_INTR, vcpu0_pi_desc);

Open code post_intr() here.  I would expect a "post_intr()" helper to actually do
the notification.  Separating the two things confused me.

> +       x2apic_enable();

Why take a dependency on x2APIC?  Either way, enable x2APIC, then post the
interrupt.  Again, it's weird splitting "set info in PI descriptor" from the
notification.
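
Something like this, perhaps (a sketch built from the test's own code; the
hardcoded destination vCPU is inherited from the test, and the caller is
assumed to have enabled x2APIC first):

static void post_intr(u8 vector, void *pi_desc)
{
	/* Mark the vector pending and set the outstanding-notification bit. */
	set_bit(vector, pi_desc);
	set_bit(PI_ON_BIT, pi_desc);

	/* Send the notification vector to actually post the interrupt. */
	x2apic_write_reg(APIC_ICR, ((u64)VCPU_ID0 << 32) |
			 APIC_DEST_PHYSICAL | APIC_DM_FIXED | PI_NV);
}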

> +       x2apic_write_reg(APIC_ICR, ((u64)VCPU_ID0 << 32) |
> +                        APIC_DEST_PHYSICAL | APIC_DM_FIXED | PI_NV);
> +       GUEST_DONE();
> +}

...

> +void *create_and_run_vcpu1(void *arg)
> +{
> +	struct ucall uc;
> +	struct kvm_run *run;
> +	struct kvm_mp_state vcpu0_mp_state;
> +
> +	pthread_cpu1 = pthread_self();
> +
> +	/* Keep trying to kick out vcpu0 until it is in halted state. */
> +	for (;;) {
> +		WRITE_ONCE(vcpu0_can_run, true);
> +		sleep(1);
> +		WRITE_ONCE(vcpu0_can_run, false);
> +		pthread_kill(pthread_cpu0, SIGUSR1);
> +		printf("vcpu1: Sent SIGUSR1 to vcpu0\n");

Use pr_debug(), this has the potential to spam the console.  And then probably use
pr_info() for the other printfs so that they can be turned off via QUIET.

> +
> +		while (READ_ONCE(vcpu0_running))
> +			;

vcpu0_running is unnecessary.  KVM needs to acquire vcpu->mutex to do KVM_GET_MP_STATE,
i.e. vcpu_mp_state_get() will block until KVM_RUN completes.  Nothing guarantees
vcpu0_running will be set from main() before this gets to the while-loop, so
vcpu_mp_state_get() racing with KVM_RUN is already possible.

> +
> +		vcpu_mp_state_get(vcpu0, &vcpu0_mp_state);
> +		if (vcpu0_mp_state.mp_state == KVM_MP_STATE_HALTED)
> +			break;
> +	}
> +
> +	printf("vcpu1: Kicked out vcpu0 and ensured vcpu0 is halted\n");
> +
> +	/* Use save_restore_vm() to simulate a VM migration. */
> +	save_restore_vm(vm);
> +
> +	printf("vcpu1: Finished save and restore vm.\n");

Uber nit, be consistent on whether or not the test uses punctuation.

> +	vcpu1 = vm_vcpu_add(vm, VCPU_ID1, l1_vcpu1_guest_code);
> +	vcpu_args_set(vcpu1, 1, vmx->posted_intr_desc);
> +
> +	/* Start an L1 in vcpu1 and send a posted interrupt to halted L2 in vcpu0. */
> +	for (;;) {
> +		run = vcpu1->run;
> +		vcpu_run(vcpu1);
> +
> +		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
> +			    "vcpu1: Got exit_reason other than KVM_EXIT_IO: %u (%s)\n",
> +			    run->exit_reason,
> +			    exit_reason_str(run->exit_reason));
> +
> +		switch (get_ucall(vcpu1, &uc)) {
> +		case UCALL_ABORT:
> +			TEST_FAIL("%s", (const char *)uc.args[0]);
> +			/* NOT REACHED */

			REPORT_GUEST_ASSERT(...)

> +		case UCALL_DONE:
> +			printf("vcpu1: Successfully sent a posted interrupt to vcpu0\n");
> +			goto done;
> +		default:
> +			TEST_FAIL("vcpu1: Unknown ucall %lu", uc.cmd);
> +		}
> +	}
> +
> +done:
> +	/*
> +	 * Allow vcpu0 to resume execution from L0 userspace and check whether
> +	 * the posted interrupt gets executed.
> +	 */
> +	WRITE_ONCE(vcpu0_can_run, true);
> +	sleep(1);

What guarantees that sleep(1) is sufficient for the vCPU to get back into the
guest?  This might be a good candidate for a semaphore.
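
E.g., roughly (a sketch assuming a POSIX semaphore shared by the two threads;
the names and exact synchronization points are illustrative):

#include <semaphore.h>

static sem_t vcpu0_reentered;	/* sem_init(&vcpu0_reentered, 0, 0) in main() */

/* In vcpu0's run loop, after __vcpu_run() returns: */
sem_post(&vcpu0_reentered);

/* In vcpu1's thread, instead of sleep(1): */
WRITE_ONCE(vcpu0_can_run, true);
sem_wait(&vcpu0_reentered);	/* block until vcpu0 has run again */
TEST_ASSERT(READ_ONCE(pi_executed),
	    "vcpu0 did not execute the posted interrupt");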

> +	TEST_ASSERT(READ_ONCE(pi_executed),
> +		    "vcpu0 did not execute the posted interrupt.\n");
> +
> +	return NULL;
> +}

...

> +       TEST_REQUIRE(kvm_has_cap(KVM_CAP_NESTED_STATE));
> +       TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));

This also requires APICv / posted interrupts, and x2APIC if that hardcoded behavior
is kept.
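
E.g., something like this (a sketch; X86_FEATURE_X2APIC exists in the
selftests headers, whereas an APICv/posted-interrupt check would have to
consult the VMX capability MSRs):

	TEST_REQUIRE(kvm_has_cap(KVM_CAP_NESTED_STATE));
	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));
	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_X2APIC));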

> +	for (;;) {
> +		struct kvm_run *run = vcpu0->run;
> +		struct ucall uc;
> +		int rc;
> +
> +		while (!READ_ONCE(vcpu0_can_run))
> +			;
> +
> +		WRITE_ONCE(vcpu0_running, true);
> +
> +		rc = __vcpu_run(vcpu0);
> +
> +		vcpu0->run->immediate_exit = 0;

Why?  vcpu_run_complete_io() is the only thing that sets immediate_exit, and it
clears the flag after doing __vcpu_run().

> +
> +		/*
> +		 * When vCPU is kicked out by a signal, ensure a consistent vCPU
> +		 * state to prepare for migration before setting the
> +		 * vcpu_running flag to false.
> +		 */
> +		if (rc == -1 && run->exit_reason == KVM_EXIT_INTR) {
> +			vcpu_run_complete_io(vcpu0);
> +
> +			WRITE_ONCE(vcpu0_running, false);
> +
> +			continue;
> +		}
> +
> +		WRITE_ONCE(vcpu0_running, false);
> +
> +		if (run->io.port == PORT_L0_EXIT) {

One of the motivations for using GUEST_SYNC() instead of PORT_L0_EXIT is that
this test could (very theoretically) get a false pass if GUEST_DONE() is reached
before PORT_L0_EXIT is encountered.
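
Roughly, the GUEST_SYNC() variant could look like this (a sketch; it assumes
an L2 GUEST_SYNC() reaches the host the same way the other ucalls in this
test do):

	/* In vcpu0's L2 IPI handler, instead of the magic port read: */
	GUEST_SYNC(1);

	/* In the host run loop: */
	switch (get_ucall(vcpu0, &uc)) {
	case UCALL_SYNC:
		printf("vcpu0: Executed the posted interrupt\n");
		WRITE_ONCE(pi_executed, true);
		continue;
	...
	}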

> +			printf("vcpu0: Executed the posted interrupt\n");
> +			WRITE_ONCE(pi_executed, true);
> +			continue;
> +		}
> +
> +		switch (get_ucall(vcpu0, &uc)) {
> +		case UCALL_ABORT:
> +			TEST_FAIL("%s", (const char *)uc.args[0]);

			REPORT_GUEST_ASSERT(...)

> +			/* NOT REACHED */
> +		case UCALL_DONE:
> +			goto done;

Just break?

> +		default:
> +			TEST_FAIL("vcpu0: Unknown ucall %lu", uc.cmd);
> +		}
> +	}
> +
> +done:
> +	kvm_vm_free(vm);
> +	return 0;
> +}
> -- 
> 2.37.2.672.g94769d06f0-goog
> 


* Re: [PATCH v2 1/4] KVM: x86: move the event handling of KVM_REQ_GET_VMCS12_PAGES into a common function
  2022-08-29 16:09   ` Sean Christopherson
@ 2022-09-07  0:01     ` Mingwei Zhang
  2022-09-07  2:50     ` Yuan Yao
  1 sibling, 0 replies; 15+ messages in thread
From: Mingwei Zhang @ 2022-09-07  0:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Maxim Levitsky,
	Vitaly Kuznetsov, Oliver Upton, Jim Mattson

On Mon, Aug 29, 2022, Sean Christopherson wrote:
> On Sun, Aug 28, 2022, Mingwei Zhang wrote:
> > Create a common function to handle KVM requests in the vcpu_run loop. KVM
> > implicitly assumes that the virtual APIC page is present and mapped into
> > the kernel address space when executing vmx_guest_apic_has_interrupt().
> > However, with demand paging KVM breaks this assumption, as the
> > KVM_REQ_GET_VMCS12_PAGES event isn't processed before entering vcpu_block().
> 
> KVM_REQ_GET_VMCS12_PAGES doesn't exist upstream.

ack.
> 
> > Fix this by getting vmcs12 pages before inspecting the guest's APIC page.
> > Because of this fix, the event handling code of
> > KVM_REQ_GET_NESTED_STATE_PAGES becomes a common code path for both
> > vcpu_enter_guest() and vcpu_block(). Thus, put this code snippet into a
> > common helper function to avoid code duplication.
> > 
> > Cc: Maxim Levitsky <mlevitsk@redhat.com>
> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> > Originally-by: Oliver Upton <oupton@google.com>
> > Signed-off-by: Oliver Upton <oupton@google.com>
> 
> If you drop someone as author, then their SOB also needs to be jettisoned.
> 

ack.

> > Signed-off-by: Mingwei Zhang <mizhang@google.com>
> > ---
> >  arch/x86/kvm/x86.c | 29 +++++++++++++++++++++++------
> >  1 file changed, 23 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index d7374d768296..3dcaac8f0584 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -10261,12 +10261,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >  			r = -EIO;
> >  			goto out;
> >  		}
> > -		if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
> > -			if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
> > -				r = 0;
> > -				goto out;
> > -			}
> > -		}
> >  		if (kvm_check_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
> >  			kvm_mmu_free_obsolete_roots(vcpu);
> >  		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
> > @@ -10666,6 +10660,23 @@ static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
> >  		!vcpu->arch.apf.halted);
> >  }
> >  
> > +static int kvm_vcpu_handle_common_requests(struct kvm_vcpu *vcpu)
> > +{
> > +	if (kvm_request_pending(vcpu)) {
> 
> Probably going to be a moot point, but write this as
> 
> 	if (!kvm_request_pending(vcpu))
> 		return 1;
> 
> to reduce indentation.
> 
> > +		/*
> > +		 * Get the vmcs12 pages before checking for interrupts that
> > +		 * might unblock the guest if L1 is using virtual-interrupt
> > +		 * delivery.
> > +		 */
> > +		if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
> > +			if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu)))
> 
> Similarly
> 
> 	if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu) &&
> 	    unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu)))
> 		return 0;
> 
> though I can see the argument for fully isolating each request.  But again, likely
> a moot point.
> 
> > +				return 0;
> > +		}
> > +	}
> > +
> > +	return 1;
> > +}
> > +
> >  /* Called within kvm->srcu read side.  */
> >  static int vcpu_run(struct kvm_vcpu *vcpu)
> >  {
> > @@ -10681,6 +10692,12 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
> >  		 * this point can start executing an instruction.
> >  		 */
> >  		vcpu->arch.at_instruction_boundary = false;
> > +
> > +		/* Process common request regardless of vcpu state. */
> > +		r = kvm_vcpu_handle_common_requests(vcpu);
> 
> IMO this is subtly a dangerous hook.  It implies that both vcpu_enter_guest()
> and vcpu_block() correctly handle requests becoming pending after the "common"
> check, but that's not actually the case.  If a request _needs_ to be handled
> before vcpu_block(), then ideally it should be explicitly queried in
> kvm_vcpu_check_block().  KVM_REQ_GET_NESTED_STATE_PAGES doesn't have issues because
> it's only ever set from the vCPU itself.
> 
> Following that train of thought, KVM_REQ_GET_NESTED_STATE_PAGES really shouldn't
> even be a request.  Aha!  And we can do that in a way that would magically fix this
> bug, and would ensure we don't leave a trap for future us.
> 
> KVM already provides KVM_REQ_UNBLOCK to prevent blocking the vCPU without actually
> waking the vCPU, i.e. to kick the vCPU back into the vcpu_run() loop.  The request
> is provided specifically for scenarios like this where KVM needs to do work before
> blocking.
> 

Hmm, I think this won't work. The warning happens at this call trace
(although the dynamic trace does not show the full stack trace in the
source code):

WARN_ON_ONCE(!vmx->nested.virtual_apic_map.gfn))
vmx_guest_apic_has_interrupt()
kvm_guest_apic_has_interrupt()
kvm_vcpu_has_events()
kvm_arch_vcpu_runnable()
kvm_vcpu_check_block()

If you go to kvm_vcpu_check_block(), the check of KVM_REQ_UNBLOCK comes
after the check of kvm_arch_vcpu_runnable(). So, with the diff you pointed
out, we will still see the warning.

Maybe what we can do is reorder the kvm_check_request(KVM_REQ_UNBLOCK, vcpu)
check to the beginning of kvm_vcpu_check_block()? But I am not sure.
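
Roughly, the reordering would look like this (a sketch against a simplified
kvm_vcpu_check_block(); the exact upstream shape and return values may
differ):

static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
{
	/*
	 * Handle KVM_REQ_UNBLOCK first so that pending work, e.g. mapping
	 * the virtual APIC page, is done before the arch code inspects
	 * state that depends on that work.
	 */
	if (kvm_check_request(KVM_REQ_UNBLOCK, vcpu))
		return -EINTR;

	if (kvm_arch_vcpu_runnable(vcpu))
		return -EINTR;
	...
}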

Thanks.
-Mingwei
> Normally I'd say we should do this over multiple patches so that the "blocking"
> bug is fixed before doing the rework/cleanup, but I'm ok if we want to skip straight
> to the rework since we're obviously carrying an internal patch and no one else is
> likely to need the fix.  But I also wouldn't object to including an intermediate
> patch to fix the bug so that there's a better paper trail.
> 
> E.g. as a very partial conversion:
> 
> ---
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/vmx/nested.c       |  2 +-
>  arch/x86/kvm/x86.c              | 12 ++++++++++++
>  arch/x86/kvm/x86.h              | 10 ++++++++++
>  4 files changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9345303c8c6d..bfca37419783 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -939,6 +939,8 @@ struct kvm_vcpu_arch {
>  	 */
>  	bool pdptrs_from_userspace;
> 
> +	bool nested_get_pages_pending;
> +
>  #if IS_ENABLED(CONFIG_HYPERV)
>  	hpa_t hv_root_tdp;
>  #endif
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index ddd4367d4826..e83b145c3a35 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3446,7 +3446,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
>  		 * to nested_get_vmcs12_pages before the next VM-entry.  The MSRs
>  		 * have already been set at vmentry time and should not be reset.
>  		 */
> -		kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
> +		kvm_nested_get_pages_set_pending(vcpu);
>  	}
> 
>  	/*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c0e3e7915a3a..0a7601ebffc6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9650,6 +9650,12 @@ int kvm_check_nested_events(struct kvm_vcpu *vcpu)
>  	return kvm_x86_ops.nested_ops->check_events(vcpu);
>  }
> 
> +static int kvm_get_nested_state_pages(struct kvm_vcpu *vcpu)
> +{
> +	vcpu->arch.nested_get_pages_pending = false;
> +	return kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu);
> +}
> +
>  static void kvm_inject_exception(struct kvm_vcpu *vcpu)
>  {
>  	trace_kvm_inj_exception(vcpu->arch.exception.nr,
> @@ -10700,6 +10706,12 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>  		if (kvm_cpu_has_pending_timer(vcpu))
>  			kvm_inject_pending_timer_irqs(vcpu);
> 
> +		if (vcpu->arch.nested_get_pages_pending) {
> +			r = kvm_get_nested_state_pages(vcpu);
> +			if (r <= 0)
> +				break;
> +		}
> +
>  		if (dm_request_for_irq_injection(vcpu) &&
>  			kvm_vcpu_ready_for_interrupt_injection(vcpu)) {
>  			r = 0;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 1926d2cb8e79..e35aac39dc73 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -481,4 +481,14 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
>  			 unsigned int port, void *data,  unsigned int count,
>  			 int in);
> 
> +static inline void kvm_nested_get_pages_set_pending(struct kvm_vcpu *vcpu)
> +{
> +	/*
> +	 * Here is a comment explaining why KVM needs to prevent the vCPU from
> +	 * blocking until the vCPU's nested pages have been loaded.
> +	 */
> +	vcpu->arch.nested_get_pages_pending = true;
> +	kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
> +}
> +
>  #endif
> 
> base-commit: 14a47a98151834c5bd2f6d8d592b01108a3f882a
> --


* Re: [PATCH v2 3/4] KVM: selftests: Add support for posted interrupt handling in L2
  2022-08-29 16:19   ` Sean Christopherson
@ 2022-09-07  0:02     ` Mingwei Zhang
  0 siblings, 0 replies; 15+ messages in thread
From: Mingwei Zhang @ 2022-09-07  0:02 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Maxim Levitsky,
	Vitaly Kuznetsov, Oliver Upton, Jim Mattson

On Mon, Aug 29, 2022, Sean Christopherson wrote:
> On Sun, Aug 28, 2022, Mingwei Zhang wrote:
> > Add support for posted interrupt handling in L2. This is done by adding
> > the needed data structures to vmx_pages and APIs that allow an L2 guest
> > to receive posted interrupts.
> > 
> > Cc: Jim Mattson <jmattson@google.com>
> > Signed-off-by: Mingwei Zhang <mizhang@google.com>
> > ---
> >  tools/testing/selftests/kvm/include/x86_64/vmx.h | 10 ++++++++++
> >  tools/testing/selftests/kvm/lib/x86_64/vmx.c     | 14 ++++++++++++++
> >  2 files changed, 24 insertions(+)
> > 
> > diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
> > index 99fa1410964c..69784fc71bce 100644
> > --- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
> > +++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
> > @@ -577,6 +577,14 @@ struct vmx_pages {
> >  	void *apic_access_hva;
> >  	uint64_t apic_access_gpa;
> >  	void *apic_access;
> > +
> > +	void *virtual_apic_hva;
> > +	uint64_t virtual_apic_gpa;
> > +	void *virtual_apic;
> > +
> > +	void *posted_intr_desc_hva;
> > +	uint64_t posted_intr_desc_gpa;
> > +	void *posted_intr_desc;
> 
> Can you add a prep patch to dedup the absurd amount of copy-paste code related to
> vmx_pages?
> 
> I.e. take this and give all the other triplets the same treatment.

Will do.

> 
> ---
>  tools/testing/selftests/kvm/include/x86_64/vmx.h | 10 +++++++---
>  tools/testing/selftests/kvm/lib/x86_64/vmx.c     | 15 ++++++++++-----
>  2 files changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
> index 99fa1410964c..ecc66d65acc1 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
> @@ -537,10 +537,14 @@ static inline uint32_t vmcs_revision(void)
>  	return rdmsr(MSR_IA32_VMX_BASIC);
>  }
> 
> +struct vmx_page {
> +	void *gva;
> +	uint64_t gpa;
> +	void *hva;
> +};
> +
>  struct vmx_pages {
> -	void *vmxon_hva;
> -	uint64_t vmxon_gpa;
> -	void *vmxon;
> +	struct vmx_page vmxon;
> 
>  	void *vmcs_hva;
>  	uint64_t vmcs_gpa;
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
> index 80a568c439b8..e4eeab85741a 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
> @@ -58,6 +58,13 @@ int vcpu_enable_evmcs(struct kvm_vcpu *vcpu)
>  	return evmcs_ver;
>  }
> 
> +static void vcpu_alloc_vmx_page(struct kvm_vm *vm, struct vmx_page *page)
> +{
> +	page->gva = (void *)vm_vaddr_alloc_page(vm);
> +	page->hva = addr_gva2hva(vm, (uintptr_t)page->gva);
> +	page->gpa = addr_gva2gpa(vm, (uintptr_t)page->gva);
> +}
> +
>  /* Allocate memory regions for nested VMX tests.
>   *
>   * Input Args:
> @@ -76,9 +83,7 @@ vcpu_alloc_vmx(struct kvm_vm *vm, vm_vaddr_t *p_vmx_gva)
>  	struct vmx_pages *vmx = addr_gva2hva(vm, vmx_gva);
> 
>  	/* Setup of a region of guest memory for the vmxon region. */
> -	vmx->vmxon = (void *)vm_vaddr_alloc_page(vm);
> -	vmx->vmxon_hva = addr_gva2hva(vm, (uintptr_t)vmx->vmxon);
> -	vmx->vmxon_gpa = addr_gva2gpa(vm, (uintptr_t)vmx->vmxon);
> +	vcpu_alloc_vmx_page(vm, &vmx->vmxon);
> 
>  	/* Setup of a region of guest memory for a vmcs. */
>  	vmx->vmcs = (void *)vm_vaddr_alloc_page(vm);
> @@ -160,8 +165,8 @@ bool prepare_for_vmx_operation(struct vmx_pages *vmx)
>  		wrmsr(MSR_IA32_FEAT_CTL, feature_control | required);
> 
>  	/* Enter VMX root operation. */
> -	*(uint32_t *)(vmx->vmxon) = vmcs_revision();
> -	if (vmxon(vmx->vmxon_gpa))
> +	*(uint32_t *)(vmx->vmxon.gva) = vmcs_revision();
> +	if (vmxon(vmx->vmxon.gpa))
>  		return false;
> 
>  	return true;
> 
> base-commit: 372d07084593dc7a399bf9bee815711b1fb1bcf2
> --
> 
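
For illustration, the new posted-interrupt triplets from this patch would
then collapse the same way. A rough sketch against the struct vmx_page /
vcpu_alloc_vmx_page() helpers proposed above (names taken from the diff,
not compile-tested):

	struct vmx_pages {
		struct vmx_page vmxon;
		struct vmx_page vmcs;
		/* ... the remaining regions get the same treatment ... */
		struct vmx_page virtual_apic;
		struct vmx_page posted_intr_desc;
	};

	/* In vcpu_alloc_vmx(), each former triplet becomes a single call: */
	vcpu_alloc_vmx_page(vm, &vmx->virtual_apic);
	vcpu_alloc_vmx_page(vm, &vmx->posted_intr_desc);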

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/4] KVM: x86: move the event handling of KVM_REQ_GET_VMCS12_PAGES into a common function
  2022-08-29 16:09   ` Sean Christopherson
  2022-09-07  0:01     ` Mingwei Zhang
@ 2022-09-07  2:50     ` Yuan Yao
  2022-09-07  4:26       ` Mingwei Zhang
  1 sibling, 1 reply; 15+ messages in thread
From: Yuan Yao @ 2022-09-07  2:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Mingwei Zhang, Paolo Bonzini, kvm, linux-kernel, Maxim Levitsky,
	Vitaly Kuznetsov, Oliver Upton, Jim Mattson

On Mon, Aug 29, 2022 at 04:09:33PM +0000, Sean Christopherson wrote:
> On Sun, Aug 28, 2022, Mingwei Zhang wrote:
> > Create a common function to handle KVM requests in the vcpu_run loop. KVM
> > implicitly assumes the virtual APIC page is present and mapped into the
> > kernel address space when executing vmx_guest_apic_has_interrupts().
> > However, with demand paging KVM breaks that assumption, as the
> > KVM_REQ_GET_VMCS12_PAGES event isn't assessed before entering vcpu_block.
>
> KVM_REQ_GET_VMCS12_PAGES doesn't exist upstream.
>
> > Fix this by getting vmcs12 pages before inspecting the guest's APIC page.
> > Because of this fix, the event handling code of
> > KVM_REQ_GET_NESTED_STATE_PAGES becomes a common code path for both
> > vcpu_enter_guest() and vcpu_block(). Thus, put this code snippet into a
> > common helper function to avoid code duplication.
> >
> > Cc: Maxim Levitsky <mlevitsk@redhat.com>
> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> > Originally-by: Oliver Upton <oupton@google.com>
> > Signed-off-by: Oliver Upton <oupton@google.com>
>
> If you drop someone as author, then their SOB also needs to be jettisoned.
>
> > Signed-off-by: Mingwei Zhang <mizhang@google.com>
> > ---
> >  arch/x86/kvm/x86.c | 29 +++++++++++++++++++++++------
> >  1 file changed, 23 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index d7374d768296..3dcaac8f0584 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -10261,12 +10261,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >  			r = -EIO;
> >  			goto out;
> >  		}
> > -		if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
> > -			if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
> > -				r = 0;
> > -				goto out;
> > -			}
> > -		}
> >  		if (kvm_check_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
> >  			kvm_mmu_free_obsolete_roots(vcpu);
> >  		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
> > @@ -10666,6 +10660,23 @@ static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
> >  		!vcpu->arch.apf.halted);
> >  }
> >
> > +static int kvm_vcpu_handle_common_requests(struct kvm_vcpu *vcpu)
> > +{
> > +	if (kvm_request_pending(vcpu)) {
>
> Probably going to be a moot point, but write this as
>
> 	if (!kvm_request_pending(vcpu))
> 		return 1;
>
> to reduce indentation.
>
> > +		/*
> > +		 * Get the vmcs12 pages before checking for interrupts that
> > +		 * might unblock the guest if L1 is using virtual-interrupt
> > +		 * delivery.
> > +		 */
> > +		if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
> > +			if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu)))
>
> Similarly
>
> 	if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu) &&
> 	    unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu)))
> 		return 0;
>
> though I can see the argument for fully isolating each request.  But again, likely
> a moot point.
>
> > +				return 0;
> > +		}
> > +	}
> > +
> > +	return 1;
> > +}
> > +
> >  /* Called within kvm->srcu read side.  */
> >  static int vcpu_run(struct kvm_vcpu *vcpu)
> >  {
> > @@ -10681,6 +10692,12 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
> >  		 * this point can start executing an instruction.
> >  		 */
> >  		vcpu->arch.at_instruction_boundary = false;
> > +
> > +		/* Process common request regardless of vcpu state. */
> > +		r = kvm_vcpu_handle_common_requests(vcpu);
>
> IMO this is a subtly dangerous hook.  It implies that both vcpu_enter_guest()
> and vcpu_block() correctly handle requests becoming pending after the "common"
> check, but that's not actually the case.  If a request _needs_ to be handled
> before vcpu_block(), then ideally it should be explicitly queried in
> kvm_vcpu_check_block().  KVM_REQ_GET_NESTED_STATE_PAGES doesn't have issues because
> it's only ever set from the vCPU itself.
>
> Following that train of thought, KVM_REQ_GET_NESTED_STATE_PAGES really shouldn't
> even be a request.  Aha!  And we can do that in a way that would magically fix this
> bug, and would ensure we don't leave a trap for future us.
>
> KVM already provides KVM_REQ_UNBLOCK to prevent blocking the vCPU without actually
> waking the vCPU, i.e. to kick the vCPU back into the vcpu_run() loop.  The request
> is provided specifically for scenarios like this where KVM needs to do work before
> blocking.
>
> Normally I'd say we should do this over multiple patches so that the "blocking"
> bug is fixed before doing the rework/cleanup, but I'm ok if we want to skip straight
> to the rework since we're obviously carrying an internal patch and no one else is
> likely to need the fix.  But I also wouldn't object to including an intermediate
> patch to fix the bug so that there's a better paper trail.
>
> E.g. as a very partial conversion:
>
> ---
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/vmx/nested.c       |  2 +-
>  arch/x86/kvm/x86.c              | 12 ++++++++++++
>  arch/x86/kvm/x86.h              | 10 ++++++++++
>  4 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9345303c8c6d..bfca37419783 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -939,6 +939,8 @@ struct kvm_vcpu_arch {
>  	 */
>  	bool pdptrs_from_userspace;
>
> +	bool nested_get_pages_pending;
> +
>  #if IS_ENABLED(CONFIG_HYPERV)
>  	hpa_t hv_root_tdp;
>  #endif
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index ddd4367d4826..e83b145c3a35 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3446,7 +3446,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
>  		 * to nested_get_vmcs12_pages before the next VM-entry.  The MSRs
>  		 * have already been set at vmentry time and should not be reset.
>  		 */
> -		kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
> +		kvm_nested_get_pages_set_pending(vcpu);
>  	}
>
>  	/*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c0e3e7915a3a..0a7601ebffc6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9650,6 +9650,12 @@ int kvm_check_nested_events(struct kvm_vcpu *vcpu)
>  	return kvm_x86_ops.nested_ops->check_events(vcpu);
>  }
>
> +static int kvm_get_nested_state_pages(struct kvm_vcpu *vcpu)
> +{
> +	vcpu->arch.nested_get_pages_pending = false;
> +	return kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu);
> +}
> +
>  static void kvm_inject_exception(struct kvm_vcpu *vcpu)
>  {
>  	trace_kvm_inj_exception(vcpu->arch.exception.nr,
> @@ -10700,6 +10706,12 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>  		if (kvm_cpu_has_pending_timer(vcpu))
>  			kvm_inject_pending_timer_irqs(vcpu);
>
> +		if (vcpu->arch.nested_get_pages_pending) {
> +			r = kvm_get_nested_state_pages(vcpu);
> +			if (r <= 0)
> +				break;
> +		}
> +

Will this lead to skipping get_nested_state_pages for the first L2
VM-entry in every L2 running iteration? Because with the above changes,
KVM_REQ_GET_NESTED_STATE_PAGES is not set in
nested_vmx_enter_non_root_mode() and
vcpu->arch.nested_get_pages_pending is not checked in
vcpu_enter_guest().

>  		if (dm_request_for_irq_injection(vcpu) &&
>  			kvm_vcpu_ready_for_interrupt_injection(vcpu)) {
>  			r = 0;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 1926d2cb8e79..e35aac39dc73 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -481,4 +481,14 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
>  			 unsigned int port, void *data,  unsigned int count,
>  			 int in);
>
> +static inline void kvm_nested_get_pages_set_pending(struct kvm_vcpu *vcpu)
> +{
> +	/*
> +	 * Don't let the vCPU block until its nested-state pages are loaded;
> +	 * checking for wake events against a stale virtual APIC mapping
> +	 * could miss a posted interrupt.
> +	 */
> +	vcpu->arch.nested_get_pages_pending = true;
> +	kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
> +}
> +
>  #endif
>
> base-commit: 14a47a98151834c5bd2f6d8d592b01108a3f882a
> --

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/4] KVM: x86: move the event handling of KVM_REQ_GET_VMCS12_PAGES into a common function
  2022-09-07  2:50     ` Yuan Yao
@ 2022-09-07  4:26       ` Mingwei Zhang
  2022-09-07  5:35         ` Yuan Yao
  0 siblings, 1 reply; 15+ messages in thread
From: Mingwei Zhang @ 2022-09-07  4:26 UTC (permalink / raw)
  To: Yuan Yao
  Cc: Sean Christopherson, Paolo Bonzini, kvm, LKML, Maxim Levitsky,
	Vitaly Kuznetsov, Oliver Upton, Jim Mattson

> > @@ -10700,6 +10706,12 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
> >               if (kvm_cpu_has_pending_timer(vcpu))
> >                       kvm_inject_pending_timer_irqs(vcpu);
> >
> > +             if (vcpu->arch.nested_get_pages_pending) {
> > +                     r = kvm_get_nested_state_pages(vcpu);
> > +                     if (r <= 0)
> > +                             break;
> > +             }
> > +
>
> Will this lead to skipping get_nested_state_pages for the first L2
> VM-entry in every L2 running iteration? Because with the above changes,
> KVM_REQ_GET_NESTED_STATE_PAGES is not set in
> nested_vmx_enter_non_root_mode() and
> vcpu->arch.nested_get_pages_pending is not checked in
> vcpu_enter_guest().
>
Good catch. I think the diff won't work when the vCPU is runnable. It
only tries to catch the vCPU-block case. Even for that case, the check
of KVM_REQ_UNBLOCK is way too late. Ah, kvm_vcpu_check_block() is
called by kvm_vcpu_block(), which is called by vcpu_block(). The
warning is triggered at the very beginning of vcpu_block(), i.e.
within kvm_arch_vcpu_runnable(). So please ignore the trace in my
previous email.
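
Roughly, the call chain in question (a sketch from the description
above, using the upstream function names):

	vcpu_run()
	  vcpu_block()
	    kvm_arch_vcpu_runnable()   <- warning fires here, before blocking
	    kvm_vcpu_block()
	      kvm_vcpu_check_block()   <- KVM_REQ_UNBLOCK only checked here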

In addition, my minor pushback is that
vcpu->arch.nested_get_pages_pending seems to be just another
KVM_REQ_GET_NESTED_STATE_PAGES.

Thanks.
-Mingwei

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/4] KVM: x86: move the event handling of KVM_REQ_GET_VMCS12_PAGES into a common function
  2022-09-07  4:26       ` Mingwei Zhang
@ 2022-09-07  5:35         ` Yuan Yao
  2022-09-07 15:48           ` Sean Christopherson
  0 siblings, 1 reply; 15+ messages in thread
From: Yuan Yao @ 2022-09-07  5:35 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Sean Christopherson, Paolo Bonzini, kvm, LKML, Maxim Levitsky,
	Vitaly Kuznetsov, Oliver Upton, Jim Mattson

On Tue, Sep 06, 2022 at 09:26:33PM -0700, Mingwei Zhang wrote:
> > > @@ -10700,6 +10706,12 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
> > >               if (kvm_cpu_has_pending_timer(vcpu))
> > >                       kvm_inject_pending_timer_irqs(vcpu);
> > >
> > > +             if (vcpu->arch.nested_get_pages_pending) {
> > > +                     r = kvm_get_nested_state_pages(vcpu);
> > > +                     if (r <= 0)
> > > +                             break;
> > > +             }
> > > +
> >
> > Will this lead to skipping get_nested_state_pages for the first L2
> > VM-entry in every L2 running iteration? Because with the above changes,
> > KVM_REQ_GET_NESTED_STATE_PAGES is not set in
> > nested_vmx_enter_non_root_mode() and
> > vcpu->arch.nested_get_pages_pending is not checked in
> > vcpu_enter_guest().
> >
> Good catch. I think the diff won't work when the vCPU is runnable. It
> only tries to catch the vCPU-block case. Even for that case, the check
> of KVM_REQ_UNBLOCK is way too late. Ah, kvm_vcpu_check_block() is
> called by kvm_vcpu_block(), which is called by vcpu_block(). The
> warning is triggered at the very beginning of vcpu_block(), i.e.
> within kvm_arch_vcpu_runnable(). So please ignore the trace in my
> previous email.
>
> In addition, my minor pushback is that
> vcpu->arch.nested_get_pages_pending seems to be just another
> KVM_REQ_GET_NESTED_STATE_PAGES.

Yeah, but at the concept level it's not a REQ mask that lives in
vcpu->requests and can be checked by e.g. kvm_request_pending().
It's necessary to check vcpu->arch.nested_get_pages_pending in
vcpu_enter_guest() if Sean's idea is to replace
KVM_REQ_GET_NESTED_STATE_PAGES with nested_get_pages_pending.
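
Something like the following in vcpu_enter_guest() would cover that
path (just a sketch to illustrate the point, reusing the
kvm_get_nested_state_pages() helper from the diff above):

	/*
	 * Load the nested-state pages before the first VM-entry after
	 * KVM_SET_NESTED_STATE, mirroring the check in vcpu_run().
	 */
	if (vcpu->arch.nested_get_pages_pending) {
		if (unlikely(!kvm_get_nested_state_pages(vcpu))) {
			r = 0;
			goto out;
		}
	}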

>
> Thanks.
> -Mingwei

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/4] KVM: x86: move the event handling of KVM_REQ_GET_VMCS12_PAGES into a common function
  2022-09-07  5:35         ` Yuan Yao
@ 2022-09-07 15:48           ` Sean Christopherson
  2022-09-07 15:56             ` Sean Christopherson
  0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2022-09-07 15:48 UTC (permalink / raw)
  To: Yuan Yao
  Cc: Mingwei Zhang, Paolo Bonzini, kvm, LKML, Maxim Levitsky,
	Vitaly Kuznetsov, Oliver Upton, Jim Mattson

On Wed, Sep 07, 2022, Yuan Yao wrote:
> On Tue, Sep 06, 2022 at 09:26:33PM -0700, Mingwei Zhang wrote:
> > > > @@ -10700,6 +10706,12 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
> > > >               if (kvm_cpu_has_pending_timer(vcpu))
> > > >                       kvm_inject_pending_timer_irqs(vcpu);
> > > >
> > > > +             if (vcpu->arch.nested_get_pages_pending) {
> > > > +                     r = kvm_get_nested_state_pages(vcpu);
> > > > +                     if (r <= 0)
> > > > +                             break;
> > > > +             }
> > > > +
> > >
> > > Will this lead to skipping get_nested_state_pages for the first L2
> > > VM-entry in every L2 running iteration? Because with the above changes,
> > > KVM_REQ_GET_NESTED_STATE_PAGES is not set in
> > > nested_vmx_enter_non_root_mode() and
> > > vcpu->arch.nested_get_pages_pending is not checked in
> > > vcpu_enter_guest().
> > >
> > Good catch. I think the diff won't work when the vCPU is runnable.

It works, but it's inefficient if the request comes from KVM_SET_NESTED_STATE.
The pending KVM_REQ_UNBLOCK that comes with the flag will prevent actually running
the guest.  Specifically, this chunk of code will detect the pending request and
bail out of vcpu_enter_guest().

	if (kvm_vcpu_exit_request(vcpu)) {
		vcpu->mode = OUTSIDE_GUEST_MODE;
		smp_wmb();
		local_irq_enable();
		preempt_enable();
		kvm_vcpu_srcu_read_lock(vcpu);
		r = 1;
		goto cancel_injection;
	}

But the inefficiency is a non-issue since "true" emulation of VM-Enter will flow
through this path (the VMRESUME/VMLAUNCH/VMRUN exit handler runs at the end of
vcpu_enter_guest()).

> > It only tries to catch the vcpu block case. Even for the vcpu block case,
> > the check of KVM_REQ_UNBLOCK is way too late. Ah, kvm_vcpu_check_block() is
> > called by kvm_vcpu_block() which is called by vcpu_block(). The warning is
> > triggered at the very beginning of vcpu_block(), i.e., within
> > kvm_arch_vcpu_runnable(). So, please ignore the trace in my previous email.
> >
> > In addition, my minor push back for that is
> > vcpu->arch.nested_get_pages_pending seems to be another
> > KVM_REQ_GET_NESTED_STATE_PAGES.
> 
> Yeah, but at the concept level it's not a REQ mask that lives in
> vcpu->requests and can be checked by e.g. kvm_request_pending().
> It's necessary to check vcpu->arch.nested_get_pages_pending in
> vcpu_enter_guest() if Sean's idea is to replace
> KVM_REQ_GET_NESTED_STATE_PAGES with nested_get_pages_pending.

Yes, the key is that it's not a request.  Requests have implicit properties:
e.g. as above, they effectively prevent running the vCPU until the request goes
away, they can be pended from other vCPUs, etc...  And the property that is most
relevant to this bug: except for special cases, requests only need to be
serviced before running the vCPU.

And the number of requests is limited due to them being stored in a bitmap.  x86
still has plenty of room due to kvm_vcpu.requests being a u64, but it's still
preferable to avoid using a request unless absolutely necessary.

For this case, since using a request isn't strictly needed and using a request
would require special casing that request, my strong preference is to not use a
request.

So yes, my idea is to "just" replace the request with a flag, but there are
quite a few subtle implications in not using a request.
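
To make the distinction concrete, the two mechanisms side by side
(condensed from the diffs earlier in this thread):

	/*
	 * A request is a sticky bit in vcpu->requests; KVM won't actually
	 * enter the guest until it has been serviced and cleared.
	 */
	kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);

	/*
	 * The flag is plain per-vCPU state; KVM_REQ_UNBLOCK only kicks the
	 * vCPU out of blocking, and vcpu_run() checks the flag itself.
	 */
	vcpu->arch.nested_get_pages_pending = true;
	kvm_make_request(KVM_REQ_UNBLOCK, vcpu);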

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/4] KVM: x86: move the event handling of KVM_REQ_GET_VMCS12_PAGES into a common function
  2022-09-07 15:48           ` Sean Christopherson
@ 2022-09-07 15:56             ` Sean Christopherson
  0 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2022-09-07 15:56 UTC (permalink / raw)
  To: Yuan Yao
  Cc: Mingwei Zhang, Paolo Bonzini, kvm, LKML, Maxim Levitsky,
	Vitaly Kuznetsov, Oliver Upton, Jim Mattson

On Wed, Sep 07, 2022, Sean Christopherson wrote:
> On Wed, Sep 07, 2022, Yuan Yao wrote:
> > On Tue, Sep 06, 2022 at 09:26:33PM -0700, Mingwei Zhang wrote:
> > > > > @@ -10700,6 +10706,12 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
> > > > >               if (kvm_cpu_has_pending_timer(vcpu))
> > > > >                       kvm_inject_pending_timer_irqs(vcpu);
> > > > >
> > > > > +             if (vcpu->arch.nested_get_pages_pending) {
> > > > > +                     r = kvm_get_nested_state_pages(vcpu);
> > > > > +                     if (r <= 0)
> > > > > +                             break;
> > > > > +             }
> > > > > +
> > > >
> > > > Will this leads to skip the get_nested_state_pages for L2 first time
> > > > vmentry in every L2 running iteration ? Because with above changes
> > > > KVM_REQ_GET_NESTED_STATE_PAGES is not set in
> > > > nested_vmx_enter_non_root_mode() and
> > > > vcpu->arch.nested_get_pages_pending is not checked in
> > > > vcpu_enter_guest().
> > > >
> > > Good catch. I think the diff won't work when the vCPU is runnable.
> 
> It works, but it's inefficient if the request comes from KVM_SET_NESTED_STATE.
> The pending KVM_REQ_UNBLOCK that comes with the flag will prevent actually running
> the guest.  Specifically, this chunk of code will detect the pending request and
> bail out of vcpu_enter_guest().
> 
> 	if (kvm_vcpu_exit_request(vcpu)) {
> 		vcpu->mode = OUTSIDE_GUEST_MODE;
> 		smp_wmb();
> 		local_irq_enable();
> 		preempt_enable();
> 		kvm_vcpu_srcu_read_lock(vcpu);
> 		r = 1;
> 		goto cancel_injection;
> 	}
> 
> But the inefficiency is a non-issue since "true" emulation of VM-Enter will flow
> through this path (the VMRESUME/VMLAUNCH/VMRUN exit handler runs at the end of
> vcpu_enter_guest()).

Actually, nested VM-Enter doesn't use this path at all.  The above holds true for
emulated RSM, but that's largely a moot point since RSM isn't exactly a hot path.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-09-07 15:56 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-28 22:25 [PATCH v2 0/4] Fix a race between posted interrupt delivery and migration in a nested VM Mingwei Zhang
2022-08-28 22:25 ` [PATCH v2 1/4] KVM: x86: move the event handling of KVM_REQ_GET_VMCS12_PAGES into a common function Mingwei Zhang
2022-08-29 16:09   ` Sean Christopherson
2022-09-07  0:01     ` Mingwei Zhang
2022-09-07  2:50     ` Yuan Yao
2022-09-07  4:26       ` Mingwei Zhang
2022-09-07  5:35         ` Yuan Yao
2022-09-07 15:48           ` Sean Christopherson
2022-09-07 15:56             ` Sean Christopherson
2022-08-28 22:25 ` [PATCH v2 2/4] KVM: selftests: Save/restore vAPIC state in migration tests Mingwei Zhang
2022-08-28 22:25 ` [PATCH v2 3/4] KVM: selftests: Add support for posted interrupt handling in L2 Mingwei Zhang
2022-08-29 16:19   ` Sean Christopherson
2022-09-07  0:02     ` Mingwei Zhang
2022-08-28 22:25 ` [PATCH v2 4/4] KVM: selftests: Test if posted interrupt delivery race with migration Mingwei Zhang
2022-08-29 17:21   ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).