All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue
@ 2021-05-20 23:03 Jim Mattson
  2021-05-20 23:03 ` [PATCH 01/12] KVM: x86: Remove guest mode check from kvm_check_nested_events Jim Mattson
                   ` (15 more replies)
  0 siblings, 16 replies; 35+ messages in thread
From: Jim Mattson @ 2021-05-20 23:03 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: Jim Mattson

When the VMCS12 posted interrupt descriptor isn't backed by an L1
memslot, kvm will launch vmcs02 with a stale posted interrupt
descriptor. Before commit 6beb7bd52e48 ("kvm: nVMX: Refactor
nested_get_vmcs12_pages()"), kvm would have silently disabled the
VMCS02 "process posted interrupts" VM-execution control. Both
behaviors are wrong, though the use-after-free is more egregious.

Empirical tests on actual hardware reveal that a posted interrupt
descriptor without any backing memory/device has PCI bus error
semantics (reads return all 1's and writes are discarded). However,
kvm can't tell an unbacked address from an MMIO address. Normally, kvm
would ask userspace for an MMIO completion, but that's complicated for
a posted interrupt descriptor access. There are already a number of
cases where kvm bails out to userspace with KVM_INTERNAL_ERROR via
kvm_handle_memory_failure, so that seems like the best route to take.

It would be relatively easy to invoke kvm_handle_memory_failure at
emulated VM-entry, but that approach would break existing
kvm-unit-tests. Moreover, the issue doesn't really come up until the
vCPU--in virtualized VMX non-root operation--received the posted
interrupt notification vector indicated in its VMCS12.

Sadly, it's really hard to arrange for an exit to userspace from
vmx_complete_nested_posted_interrupt, which is where kvm actually
needs to access the unbacked PID. Initially, I added a new kvm request
for a userspace exit on the next guest entry, but Sean hated that
approach. Based on his suggestion, I added the plumbing to get back
out to userspace in the event of an error in
vmx_complete_nested_posted_interrupt. This works in the case of an
unbacked PID, but it doesn't work quite as well in the case of an
unbacked virtual APIC page (another case where kvm was happy to just
silently ignore the problem and attempt to muddle its way through.) In
that case, this series is an incremental improvement, but it's not a
complete fix.

Jim Mattson (12):
  KVM: x86: Remove guest mode check from kvm_check_nested_events
  KVM: x86: Wake up a vCPU when kvm_check_nested_events fails
  KVM: nVMX: Add a return code to vmx_complete_nested_posted_interrupt
  KVM: x86: Add a return code to inject_pending_event
  KVM: x86: Add a return code to kvm_apic_accept_events
  KVM: nVMX: Fail on MMIO completion for nested posted interrupts
  KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't
    mappable
  KVM: selftests: Move APIC definitions into a separate file
  KVM: selftests: Hoist APIC functions out of individual tests
  KVM: selftests: Introduce x2APIC register manipulation functions
  KVM: selftests: Introduce prepare_tpr_shadow
  KVM: selftests: Add a test of an unbacked nested PI descriptor

 arch/x86/kvm/lapic.c                          |  11 +-
 arch/x86/kvm/lapic.h                          |   2 +-
 arch/x86/kvm/vmx/nested.c                     |  31 ++-
 arch/x86/kvm/x86.c                            |  56 ++--
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   3 +-
 .../selftests/kvm/include/x86_64/apic.h       |  91 +++++++
 .../selftests/kvm/include/x86_64/processor.h  |  49 +---
 .../selftests/kvm/include/x86_64/vmx.h        |   6 +
 tools/testing/selftests/kvm/lib/x86_64/apic.c |  45 ++++
 tools/testing/selftests/kvm/lib/x86_64/vmx.c  |   8 +
 .../testing/selftests/kvm/x86_64/evmcs_test.c |  11 +-
 .../selftests/kvm/x86_64/set_boot_cpu_id.c    |   6 +-
 tools/testing/selftests/kvm/x86_64/smm_test.c |   4 +-
 .../selftests/kvm/x86_64/vmx_pi_mmio_test.c   | 252 ++++++++++++++++++
 .../selftests/kvm/x86_64/xapic_ipi_test.c     |  59 +---
 16 files changed, 488 insertions(+), 147 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/x86_64/apic.h
 create mode 100644 tools/testing/selftests/kvm/lib/x86_64/apic.c
 create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_pi_mmio_test.c

-- 
2.31.1.818.g46aad6cb9e-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 01/12] KVM: x86: Remove guest mode check from kvm_check_nested_events
  2021-05-20 23:03 [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Jim Mattson
@ 2021-05-20 23:03 ` Jim Mattson
  2021-05-20 23:03 ` [PATCH 02/12] KVM: x86: Wake up a vCPU when kvm_check_nested_events fails Jim Mattson
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 35+ messages in thread
From: Jim Mattson @ 2021-05-20 23:03 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: Jim Mattson

A survey of the callsites reveals that they all ensure the vCPU is in
guest mode before calling kvm_check_nested_events. Remove this dead
code so that the only negative value this function returns (at the
moment) is -EBUSY.

Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/x86.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bbc4e04e67ad..d517460db413 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8532,9 +8532,6 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu)
 
 int kvm_check_nested_events(struct kvm_vcpu *vcpu)
 {
-	if (WARN_ON_ONCE(!is_guest_mode(vcpu)))
-		return -EIO;
-
 	if (kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu)) {
 		kvm_x86_ops.nested_ops->triple_fault(vcpu);
 		return 1;
-- 
2.31.1.818.g46aad6cb9e-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 02/12] KVM: x86: Wake up a vCPU when kvm_check_nested_events fails
  2021-05-20 23:03 [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Jim Mattson
  2021-05-20 23:03 ` [PATCH 01/12] KVM: x86: Remove guest mode check from kvm_check_nested_events Jim Mattson
@ 2021-05-20 23:03 ` Jim Mattson
  2021-05-24 15:43   ` Paolo Bonzini
  2021-05-20 23:03 ` [PATCH 03/12] KVM: nVMX: Add a return code to vmx_complete_nested_posted_interrupt Jim Mattson
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 35+ messages in thread
From: Jim Mattson @ 2021-05-20 23:03 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: Jim Mattson, Oliver Upton

At present, there are two reasons why kvm_check_nested_events may
return a non-zero value:

1) we just emulated a shutdown VM-exit from L2 to L1.
2) we need to perform an immediate VM-exit from vmcs02.

In either case, transition the vCPU to "running."

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
---
 arch/x86/kvm/x86.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d517460db413..d3fea8ea3628 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9468,8 +9468,8 @@ static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
 
 static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
 {
-	if (is_guest_mode(vcpu))
-		kvm_check_nested_events(vcpu);
+	if (is_guest_mode(vcpu) && kvm_check_nested_events(vcpu))
+		return true;
 
 	return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
 		!vcpu->arch.apf.halted);
-- 
2.31.1.818.g46aad6cb9e-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 03/12] KVM: nVMX: Add a return code to vmx_complete_nested_posted_interrupt
  2021-05-20 23:03 [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Jim Mattson
  2021-05-20 23:03 ` [PATCH 01/12] KVM: x86: Remove guest mode check from kvm_check_nested_events Jim Mattson
  2021-05-20 23:03 ` [PATCH 02/12] KVM: x86: Wake up a vCPU when kvm_check_nested_events fails Jim Mattson
@ 2021-05-20 23:03 ` Jim Mattson
  2021-05-20 23:03 ` [PATCH 04/12] KVM: x86: Add a return code to inject_pending_event Jim Mattson
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 35+ messages in thread
From: Jim Mattson @ 2021-05-20 23:03 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: Jim Mattson, Sean Christopherson, Oliver Upton

No functional change intended.

Signed-off-by: Jim Mattson <jmattson@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
---
 arch/x86/kvm/vmx/nested.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6058a65a6ede..7646e6e561ad 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3682,7 +3682,7 @@ void nested_mark_vmcs12_pages_dirty(struct kvm_vcpu *vcpu)
 	}
 }
 
-static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
+static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	int max_irr;
@@ -3690,17 +3690,17 @@ static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
 	u16 status;
 
 	if (!vmx->nested.pi_desc || !vmx->nested.pi_pending)
-		return;
+		return 0;
 
 	vmx->nested.pi_pending = false;
 	if (!pi_test_and_clear_on(vmx->nested.pi_desc))
-		return;
+		return 0;
 
 	max_irr = find_last_bit((unsigned long *)vmx->nested.pi_desc->pir, 256);
 	if (max_irr != 256) {
 		vapic_page = vmx->nested.virtual_apic_map.hva;
 		if (!vapic_page)
-			return;
+			return 0;
 
 		__kvm_apic_update_irr(vmx->nested.pi_desc->pir,
 			vapic_page, &max_irr);
@@ -3713,6 +3713,7 @@ static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
 	}
 
 	nested_mark_vmcs12_pages_dirty(vcpu);
+	return 0;
 }
 
 static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
@@ -3887,8 +3888,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	}
 
 no_vmexit:
-	vmx_complete_nested_posted_interrupt(vcpu);
-	return 0;
+	return vmx_complete_nested_posted_interrupt(vcpu);
 }
 
 static u32 vmx_get_preemption_timer_value(struct kvm_vcpu *vcpu)
-- 
2.31.1.818.g46aad6cb9e-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 04/12] KVM: x86: Add a return code to inject_pending_event
  2021-05-20 23:03 [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Jim Mattson
                   ` (2 preceding siblings ...)
  2021-05-20 23:03 ` [PATCH 03/12] KVM: nVMX: Add a return code to vmx_complete_nested_posted_interrupt Jim Mattson
@ 2021-05-20 23:03 ` Jim Mattson
  2021-05-20 23:03 ` [PATCH 05/12] KVM: x86: Add a return code to kvm_apic_accept_events Jim Mattson
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 35+ messages in thread
From: Jim Mattson @ 2021-05-20 23:03 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: Jim Mattson, Sean Christopherson

No functional change intended. At present, 'r' will always be -EBUSY
on a control transfer to the 'out' label.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/x86.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d3fea8ea3628..eb35536f8d06 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8547,7 +8547,7 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
 	static_call(kvm_x86_queue_exception)(vcpu);
 }
 
-static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
+static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 {
 	int r;
 	bool can_inject = true;
@@ -8594,7 +8594,7 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
 	if (is_guest_mode(vcpu)) {
 		r = kvm_check_nested_events(vcpu);
 		if (r < 0)
-			goto busy;
+			goto out;
 	}
 
 	/* try to inject new event if pending */
@@ -8636,7 +8636,7 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
 	if (vcpu->arch.smi_pending) {
 		r = can_inject ? static_call(kvm_x86_smi_allowed)(vcpu, true) : -EBUSY;
 		if (r < 0)
-			goto busy;
+			goto out;
 		if (r) {
 			vcpu->arch.smi_pending = false;
 			++vcpu->arch.smi_count;
@@ -8649,7 +8649,7 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
 	if (vcpu->arch.nmi_pending) {
 		r = can_inject ? static_call(kvm_x86_nmi_allowed)(vcpu, true) : -EBUSY;
 		if (r < 0)
-			goto busy;
+			goto out;
 		if (r) {
 			--vcpu->arch.nmi_pending;
 			vcpu->arch.nmi_injected = true;
@@ -8664,7 +8664,7 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
 	if (kvm_cpu_has_injectable_intr(vcpu)) {
 		r = can_inject ? static_call(kvm_x86_interrupt_allowed)(vcpu, true) : -EBUSY;
 		if (r < 0)
-			goto busy;
+			goto out;
 		if (r) {
 			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu), false);
 			static_call(kvm_x86_set_irq)(vcpu);
@@ -8680,11 +8680,14 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
 		*req_immediate_exit = true;
 
 	WARN_ON(vcpu->arch.exception.pending);
-	return;
+	return 0;
 
-busy:
-	*req_immediate_exit = true;
-	return;
+out:
+	if (r == -EBUSY) {
+		*req_immediate_exit = true;
+		r = 0;
+	}
+	return r;
 }
 
 static void process_nmi(struct kvm_vcpu *vcpu)
@@ -9245,7 +9248,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			goto out;
 		}
 
-		inject_pending_event(vcpu, &req_immediate_exit);
+		r = inject_pending_event(vcpu, &req_immediate_exit);
+		if (r < 0) {
+			r = 0;
+			goto out;
+		}
 		if (req_int_win)
 			static_call(kvm_x86_enable_irq_window)(vcpu);
 
-- 
2.31.1.818.g46aad6cb9e-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 05/12] KVM: x86: Add a return code to kvm_apic_accept_events
  2021-05-20 23:03 [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Jim Mattson
                   ` (3 preceding siblings ...)
  2021-05-20 23:03 ` [PATCH 04/12] KVM: x86: Add a return code to inject_pending_event Jim Mattson
@ 2021-05-20 23:03 ` Jim Mattson
  2021-05-25 19:24   ` Reiji Watanabe
  2021-05-20 23:03 ` [PATCH 06/12] KVM: nVMX: Fail on MMIO completion for nested posted interrupts Jim Mattson
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 35+ messages in thread
From: Jim Mattson @ 2021-05-20 23:03 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: Jim Mattson, Sean Christopherson

No functional change intended. At present, the only negative value
returned by kvm_check_nested_events is -EBUSY.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/lapic.c | 11 ++++++-----
 arch/x86/kvm/lapic.h |  2 +-
 arch/x86/kvm/x86.c   | 22 ++++++++++++++++++----
 3 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c0ebef560bd1..f747864ff323 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2856,7 +2856,7 @@ int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data, unsigned long len)
 	return kvm_gfn_to_hva_cache_init(vcpu->kvm, ghc, addr, new_len);
 }
 
-void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
+int kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	u8 sipi_vector;
@@ -2864,7 +2864,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 	unsigned long pe;
 
 	if (!lapic_in_kernel(vcpu))
-		return;
+		return 0;
 
 	/*
 	 * Read pending events before calling the check_events
@@ -2872,12 +2872,12 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 	 */
 	pe = smp_load_acquire(&apic->pending_events);
 	if (!pe)
-		return;
+		return 0;
 
 	if (is_guest_mode(vcpu)) {
 		r = kvm_check_nested_events(vcpu);
 		if (r < 0)
-			return;
+			return r == -EBUSY ? 0 : r;
 		/*
 		 * If an event has happened and caused a vmexit,
 		 * we know INITs are latched and therefore
@@ -2898,7 +2898,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 		WARN_ON_ONCE(vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED);
 		if (test_bit(KVM_APIC_SIPI, &pe))
 			clear_bit(KVM_APIC_SIPI, &apic->pending_events);
-		return;
+		return 0;
 	}
 
 	if (test_bit(KVM_APIC_INIT, &pe)) {
@@ -2919,6 +2919,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 			vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 		}
 	}
+	return 0;
 }
 
 void kvm_lapic_exit(void)
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 997c45a5963a..d7c25d0c1354 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -76,7 +76,7 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu);
 int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
 int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
-void kvm_apic_accept_events(struct kvm_vcpu *vcpu);
+int kvm_apic_accept_events(struct kvm_vcpu *vcpu);
 void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event);
 u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
 void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eb35536f8d06..9258fee4f272 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9242,7 +9242,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
 	    kvm_xen_has_interrupt(vcpu)) {
 		++vcpu->stat.req_event;
-		kvm_apic_accept_events(vcpu);
+		r = kvm_apic_accept_events(vcpu);
+		if (r < 0) {
+			r = 0;
+			goto out;
+		}
 		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
 			r = 1;
 			goto out;
@@ -9454,7 +9458,8 @@ static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
 			return 1;
 	}
 
-	kvm_apic_accept_events(vcpu);
+	if (kvm_apic_accept_events(vcpu) < 0)
+		return 0;
 	switch(vcpu->arch.mp_state) {
 	case KVM_MP_STATE_HALTED:
 	case KVM_MP_STATE_AP_RESET_HOLD:
@@ -9678,7 +9683,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 			goto out;
 		}
 		kvm_vcpu_block(vcpu);
-		kvm_apic_accept_events(vcpu);
+		if (kvm_apic_accept_events(vcpu) < 0) {
+			r = 0;
+			goto out;
+		}
 		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 		r = -EAGAIN;
 		if (signal_pending(current)) {
@@ -9880,11 +9888,16 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
+	int r = 0;
+
 	vcpu_load(vcpu);
 	if (kvm_mpx_supported())
 		kvm_load_guest_fpu(vcpu);
 
-	kvm_apic_accept_events(vcpu);
+	r = kvm_apic_accept_events(vcpu);
+	if (r < 0)
+		goto out;
+
 	if ((vcpu->arch.mp_state == KVM_MP_STATE_HALTED ||
 	     vcpu->arch.mp_state == KVM_MP_STATE_AP_RESET_HOLD) &&
 	    vcpu->arch.pv.pv_unhalted)
@@ -9892,6 +9905,7 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 	else
 		mp_state->mp_state = vcpu->arch.mp_state;
 
+out:
 	if (kvm_mpx_supported())
 		kvm_put_guest_fpu(vcpu);
 	vcpu_put(vcpu);
-- 
2.31.1.818.g46aad6cb9e-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 06/12] KVM: nVMX: Fail on MMIO completion for nested posted interrupts
  2021-05-20 23:03 [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Jim Mattson
                   ` (4 preceding siblings ...)
  2021-05-20 23:03 ` [PATCH 05/12] KVM: x86: Add a return code to kvm_apic_accept_events Jim Mattson
@ 2021-05-20 23:03 ` Jim Mattson
  2021-05-20 23:03 ` [PATCH 07/12] KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable Jim Mattson
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 35+ messages in thread
From: Jim Mattson @ 2021-05-20 23:03 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: Jim Mattson

When the kernel has no mapping for the vmcs02 virtual APIC page,
userspace MMIO completion is necessary to process nested posted
interrupts. This is not a configuration that KVM supports. Rather than
silently ignoring the problem, try to exit to userspace with
KVM_INTERNAL_ERROR.

Note that the event that triggers this error is consumed as a
side-effect of a call to kvm_check_nested_events. On some paths
(notably through kvm_vcpu_check_block), the error is dropped. In any
case, this is an incremental improvement over always ignoring the
error.

Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/vmx/nested.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7646e6e561ad..706c31821362 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3700,7 +3700,7 @@ static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
 	if (max_irr != 256) {
 		vapic_page = vmx->nested.virtual_apic_map.hva;
 		if (!vapic_page)
-			return 0;
+			goto mmio_needed;
 
 		__kvm_apic_update_irr(vmx->nested.pi_desc->pir,
 			vapic_page, &max_irr);
@@ -3714,6 +3714,10 @@ static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
 
 	nested_mark_vmcs12_pages_dirty(vcpu);
 	return 0;
+
+mmio_needed:
+	kvm_handle_memory_failure(vcpu, X86EMUL_IO_NEEDED, NULL);
+	return -ENXIO;
 }
 
 static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
-- 
2.31.1.818.g46aad6cb9e-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 07/12] KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable
  2021-05-20 23:03 [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Jim Mattson
                   ` (5 preceding siblings ...)
  2021-05-20 23:03 ` [PATCH 06/12] KVM: nVMX: Fail on MMIO completion for nested posted interrupts Jim Mattson
@ 2021-05-20 23:03 ` Jim Mattson
  2021-05-24 23:21   ` Sean Christopherson
  2021-05-20 23:03 ` [PATCH 08/12] KVM: selftests: Move APIC definitions into a separate file Jim Mattson
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 35+ messages in thread
From: Jim Mattson @ 2021-05-20 23:03 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: Jim Mattson

Don't allow posted interrupts to modify a stale posted interrupt
descriptor (including the initial value of 0).

Empirical tests on real hardware reveal that a posted interrupt
descriptor referencing an unbacked address has PCI bus error semantics
(reads as all 1's; writes are ignored). However, kvm can't distinguish
unbacked addresses from device-backed (MMIO) addresses, so it should
really ask userspace for an MMIO completion. That's overly
complicated, so just punt with KVM_INTERNAL_ERROR.

Don't return the error until the posted interrupt descriptor is
actually accessed. We don't want to break the existing kvm-unit-tests
that assume they can launch an L2 VM with a posted interrupt
descriptor that references MMIO space in L1.

Fixes: 6beb7bd52e48 ("kvm: nVMX: Refactor nested_get_vmcs12_pages()")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/vmx/nested.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 706c31821362..defd42201bb4 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3175,6 +3175,15 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
 				offset_in_page(vmcs12->posted_intr_desc_addr));
 			vmcs_write64(POSTED_INTR_DESC_ADDR,
 				     pfn_to_hpa(map->pfn) + offset_in_page(vmcs12->posted_intr_desc_addr));
+		} else {
+			/*
+			 * Defer the KVM_INTERNAL_ERROR exit until
+			 * someone tries to trigger posted interrupt
+			 * processing on this vCPU, to avoid breaking
+			 * existing kvm-unit-tests.
+			 */
+			vmx->nested.pi_desc = NULL;
+			pin_controls_clearbit(vmx, PIN_BASED_POSTED_INTR);
 		}
 	}
 	if (nested_vmx_prepare_msr_bitmap(vcpu, vmcs12))
@@ -3689,10 +3698,14 @@ static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
 	void *vapic_page;
 	u16 status;
 
-	if (!vmx->nested.pi_desc || !vmx->nested.pi_pending)
+	if (!vmx->nested.pi_pending)
 		return 0;
 
+	if (!vmx->nested.pi_desc)
+		goto mmio_needed;
+
 	vmx->nested.pi_pending = false;
+
 	if (!pi_test_and_clear_on(vmx->nested.pi_desc))
 		return 0;
 
-- 
2.31.1.818.g46aad6cb9e-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 08/12] KVM: selftests: Move APIC definitions into a separate file
  2021-05-20 23:03 [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Jim Mattson
                   ` (6 preceding siblings ...)
  2021-05-20 23:03 ` [PATCH 07/12] KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable Jim Mattson
@ 2021-05-20 23:03 ` Jim Mattson
  2021-05-20 23:03 ` [PATCH 09/12] KVM: selftests: Hoist APIC functions out of individual tests Jim Mattson
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 35+ messages in thread
From: Jim Mattson @ 2021-05-20 23:03 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: Jim Mattson, Oliver Upton

Processor.h is a hodgepodge of definitions. Though the local APIC is
technically built into the CPU these days, move the APIC definitions
into a new header file: apic.h.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
---
 .../selftests/kvm/include/x86_64/apic.h       | 58 +++++++++++++++++++
 .../selftests/kvm/include/x86_64/processor.h  | 47 ---------------
 .../selftests/kvm/include/x86_64/vmx.h        |  1 +
 3 files changed, 59 insertions(+), 47 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/x86_64/apic.h

diff --git a/tools/testing/selftests/kvm/include/x86_64/apic.h b/tools/testing/selftests/kvm/include/x86_64/apic.h
new file mode 100644
index 000000000000..0d0e35c8866b
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/x86_64/apic.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * tools/testing/selftests/kvm/include/x86_64/apic.h
+ *
+ * Copyright (C) 2021, Google LLC.
+ */
+
+#ifndef SELFTEST_KVM_APIC_H
+#define SELFTEST_KVM_APIC_H
+
+#define APIC_DEFAULT_GPA		0xfee00000ULL
+
+/* APIC base address MSR and fields */
+#define MSR_IA32_APICBASE		0x0000001b
+#define MSR_IA32_APICBASE_BSP		(1<<8)
+#define MSR_IA32_APICBASE_EXTD		(1<<10)
+#define MSR_IA32_APICBASE_ENABLE	(1<<11)
+#define MSR_IA32_APICBASE_BASE		(0xfffff<<12)
+#define		GET_APIC_BASE(x)	(((x) >> 12) << 12)
+
+#define APIC_BASE_MSR	0x800
+#define X2APIC_ENABLE	(1UL << 10)
+#define	APIC_ID		0x20
+#define	APIC_LVR	0x30
+#define		GET_APIC_ID_FIELD(x)	(((x) >> 24) & 0xFF)
+#define	APIC_TASKPRI	0x80
+#define	APIC_PROCPRI	0xA0
+#define	APIC_EOI	0xB0
+#define	APIC_SPIV	0xF0
+#define		APIC_SPIV_FOCUS_DISABLED	(1 << 9)
+#define		APIC_SPIV_APIC_ENABLED		(1 << 8)
+#define	APIC_ICR	0x300
+#define		APIC_DEST_SELF		0x40000
+#define		APIC_DEST_ALLINC	0x80000
+#define		APIC_DEST_ALLBUT	0xC0000
+#define		APIC_ICR_RR_MASK	0x30000
+#define		APIC_ICR_RR_INVALID	0x00000
+#define		APIC_ICR_RR_INPROG	0x10000
+#define		APIC_ICR_RR_VALID	0x20000
+#define		APIC_INT_LEVELTRIG	0x08000
+#define		APIC_INT_ASSERT		0x04000
+#define		APIC_ICR_BUSY		0x01000
+#define		APIC_DEST_LOGICAL	0x00800
+#define		APIC_DEST_PHYSICAL	0x00000
+#define		APIC_DM_FIXED		0x00000
+#define		APIC_DM_FIXED_MASK	0x00700
+#define		APIC_DM_LOWEST		0x00100
+#define		APIC_DM_SMI		0x00200
+#define		APIC_DM_REMRD		0x00300
+#define		APIC_DM_NMI		0x00400
+#define		APIC_DM_INIT		0x00500
+#define		APIC_DM_STARTUP		0x00600
+#define		APIC_DM_EXTINT		0x00700
+#define		APIC_VECTOR_MASK	0x000FF
+#define	APIC_ICR2	0x310
+#define		SET_APIC_DEST_FIELD(x)	((x) << 24)
+
+#endif /* SELFTEST_KVM_APIC_H */
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 0b30b4e15c38..a4729d9032ce 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -425,53 +425,6 @@ struct kvm_cpuid2 *vcpu_get_supported_hv_cpuid(struct kvm_vm *vm, uint32_t vcpui
 #define X86_CR0_CD          (1UL<<30) /* Cache Disable */
 #define X86_CR0_PG          (1UL<<31) /* Paging */
 
-#define APIC_DEFAULT_GPA		0xfee00000ULL
-
-/* APIC base address MSR and fields */
-#define MSR_IA32_APICBASE		0x0000001b
-#define MSR_IA32_APICBASE_BSP		(1<<8)
-#define MSR_IA32_APICBASE_EXTD		(1<<10)
-#define MSR_IA32_APICBASE_ENABLE	(1<<11)
-#define MSR_IA32_APICBASE_BASE		(0xfffff<<12)
-#define		GET_APIC_BASE(x)	(((x) >> 12) << 12)
-
-#define APIC_BASE_MSR	0x800
-#define X2APIC_ENABLE	(1UL << 10)
-#define	APIC_ID		0x20
-#define	APIC_LVR	0x30
-#define		GET_APIC_ID_FIELD(x)	(((x) >> 24) & 0xFF)
-#define	APIC_TASKPRI	0x80
-#define	APIC_PROCPRI	0xA0
-#define	APIC_EOI	0xB0
-#define	APIC_SPIV	0xF0
-#define		APIC_SPIV_FOCUS_DISABLED	(1 << 9)
-#define		APIC_SPIV_APIC_ENABLED		(1 << 8)
-#define	APIC_ICR	0x300
-#define		APIC_DEST_SELF		0x40000
-#define		APIC_DEST_ALLINC	0x80000
-#define		APIC_DEST_ALLBUT	0xC0000
-#define		APIC_ICR_RR_MASK	0x30000
-#define		APIC_ICR_RR_INVALID	0x00000
-#define		APIC_ICR_RR_INPROG	0x10000
-#define		APIC_ICR_RR_VALID	0x20000
-#define		APIC_INT_LEVELTRIG	0x08000
-#define		APIC_INT_ASSERT		0x04000
-#define		APIC_ICR_BUSY		0x01000
-#define		APIC_DEST_LOGICAL	0x00800
-#define		APIC_DEST_PHYSICAL	0x00000
-#define		APIC_DM_FIXED		0x00000
-#define		APIC_DM_FIXED_MASK	0x00700
-#define		APIC_DM_LOWEST		0x00100
-#define		APIC_DM_SMI		0x00200
-#define		APIC_DM_REMRD		0x00300
-#define		APIC_DM_NMI		0x00400
-#define		APIC_DM_INIT		0x00500
-#define		APIC_DM_STARTUP		0x00600
-#define		APIC_DM_EXTINT		0x00700
-#define		APIC_VECTOR_MASK	0x000FF
-#define	APIC_ICR2	0x310
-#define		SET_APIC_DEST_FIELD(x)	((x) << 24)
-
 /* VMX_EPT_VPID_CAP bits */
 #define VMX_EPT_VPID_CAP_AD_BITS       (1ULL << 21)
 
diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index 65eb1079a161..516c81d86353 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -10,6 +10,7 @@
 
 #include <stdint.h>
 #include "processor.h"
+#include "apic.h"
 
 /*
  * Definitions of Primary Processor-Based VM-Execution Controls.
-- 
2.31.1.818.g46aad6cb9e-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 09/12] KVM: selftests: Hoist APIC functions out of individual tests
  2021-05-20 23:03 [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Jim Mattson
                   ` (7 preceding siblings ...)
  2021-05-20 23:03 ` [PATCH 08/12] KVM: selftests: Move APIC definitions into a separate file Jim Mattson
@ 2021-05-20 23:03 ` Jim Mattson
  2021-05-20 23:03 ` [PATCH 10/12] KVM: selftests: Introduce x2APIC register manipulation functions Jim Mattson
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 35+ messages in thread
From: Jim Mattson @ 2021-05-20 23:03 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: Jim Mattson, Oliver Upton

Move the APIC functions into the library to encourage code reuse and
to avoid unintended deviations.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
---
 tools/testing/selftests/kvm/Makefile          |  2 +-
 .../selftests/kvm/include/x86_64/apic.h       | 23 ++++++++
 .../selftests/kvm/include/x86_64/processor.h  |  2 +
 tools/testing/selftests/kvm/lib/x86_64/apic.c | 46 +++++++++++++++
 .../testing/selftests/kvm/x86_64/evmcs_test.c | 11 +---
 .../selftests/kvm/x86_64/set_boot_cpu_id.c    |  6 +-
 .../selftests/kvm/x86_64/xapic_ipi_test.c     | 59 +++----------------
 7 files changed, 83 insertions(+), 66 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/lib/x86_64/apic.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index e439d027939d..af102e03e698 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -34,7 +34,7 @@ ifeq ($(ARCH),s390)
 endif
 
 LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/sparsebit.c lib/test_util.c lib/guest_modes.c lib/perf_test_util.c
-LIBKVM_x86_64 = lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c lib/x86_64/handlers.S
+LIBKVM_x86_64 = lib/x86_64/apic.c lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c lib/x86_64/handlers.S
 LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c
 LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c lib/s390x/diag318_test_handler.c
 
diff --git a/tools/testing/selftests/kvm/include/x86_64/apic.h b/tools/testing/selftests/kvm/include/x86_64/apic.h
index 0d0e35c8866b..e5a9fe040a6c 100644
--- a/tools/testing/selftests/kvm/include/x86_64/apic.h
+++ b/tools/testing/selftests/kvm/include/x86_64/apic.h
@@ -8,6 +8,10 @@
 #ifndef SELFTEST_KVM_APIC_H
 #define SELFTEST_KVM_APIC_H
 
+#include <stdint.h>
+
+#include "processor.h"
+
 #define APIC_DEFAULT_GPA		0xfee00000ULL
 
 /* APIC base address MSR and fields */
@@ -55,4 +59,23 @@
 #define	APIC_ICR2	0x310
 #define		SET_APIC_DEST_FIELD(x)	((x) << 24)
 
+void apic_disable(void);
+void xapic_enable(void);
+void x2apic_enable(void);
+
+static inline uint32_t get_bsp_flag(void)
+{
+	return rdmsr(MSR_IA32_APICBASE) & MSR_IA32_APICBASE_BSP;
+}
+
+static inline uint32_t xapic_read_reg(unsigned int reg)
+{
+	return ((volatile uint32_t *)APIC_DEFAULT_GPA)[reg >> 2];
+}
+
+static inline void xapic_write_reg(unsigned int reg, uint32_t val)
+{
+	((volatile uint32_t *)APIC_DEFAULT_GPA)[reg >> 2] = val;
+}
+
 #endif /* SELFTEST_KVM_APIC_H */
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index a4729d9032ce..9a5b47d2d5d6 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -13,6 +13,8 @@
 
 #include <asm/msr-index.h>
 
+#include "../kvm_util.h"
+
 #define X86_EFLAGS_FIXED	 (1u << 1)
 
 #define X86_CR4_VME		(1ul << 0)
diff --git a/tools/testing/selftests/kvm/lib/x86_64/apic.c b/tools/testing/selftests/kvm/lib/x86_64/apic.c
new file mode 100644
index 000000000000..31f318ac67ba
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/x86_64/apic.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * tools/testing/selftests/kvm/lib/x86_64/processor.c
+ *
+ * Copyright (C) 2021, Google LLC.
+ */
+
+#include "apic.h"
+
+void apic_disable(void)
+{
+	wrmsr(MSR_IA32_APICBASE,
+	      rdmsr(MSR_IA32_APICBASE) &
+		~(MSR_IA32_APICBASE_ENABLE | MSR_IA32_APICBASE_EXTD));
+}
+
+void xapic_enable(void)
+{
+	uint64_t val = rdmsr(MSR_IA32_APICBASE);
+
+	/* Per SDM: to enable xAPIC when in x2APIC must first disable APIC */
+	if (val & MSR_IA32_APICBASE_EXTD) {
+		apic_disable();
+		wrmsr(MSR_IA32_APICBASE,
+		      rdmsr(MSR_IA32_APICBASE) | MSR_IA32_APICBASE_ENABLE);
+	} else if (!(val & MSR_IA32_APICBASE_ENABLE)) {
+		wrmsr(MSR_IA32_APICBASE, val | MSR_IA32_APICBASE_ENABLE);
+	}
+
+	/*
+	 * Per SDM: reset value of spurious interrupt vector register has the
+	 * APIC software enabled bit=0. It must be enabled in addition to the
+	 * enable bit in the MSR.
+	 */
+	val = xapic_read_reg(APIC_SPIV) | APIC_SPIV_APIC_ENABLED;
+	xapic_write_reg(APIC_SPIV, val);
+}
+
+void x2apic_enable(void)
+{
+	uint32_t spiv_reg = APIC_BASE_MSR + (APIC_SPIV >> 4);
+
+	wrmsr(MSR_IA32_APICBASE, rdmsr(MSR_IA32_APICBASE) |
+	      MSR_IA32_APICBASE_ENABLE | MSR_IA32_APICBASE_EXTD);
+	wrmsr(spiv_reg, rdmsr(spiv_reg) | APIC_SPIV_APIC_ENABLED);
+}
diff --git a/tools/testing/selftests/kvm/x86_64/evmcs_test.c b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
index 63096cea26c6..d058d9e428c6 100644
--- a/tools/testing/selftests/kvm/x86_64/evmcs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
@@ -22,15 +22,6 @@
 
 static int ud_count;
 
-void enable_x2apic(void)
-{
-	uint32_t spiv_reg = APIC_BASE_MSR + (APIC_SPIV >> 4);
-
-	wrmsr(MSR_IA32_APICBASE, rdmsr(MSR_IA32_APICBASE) |
-	      MSR_IA32_APICBASE_ENABLE | MSR_IA32_APICBASE_EXTD);
-	wrmsr(spiv_reg, rdmsr(spiv_reg) | APIC_SPIV_APIC_ENABLED);
-}
-
 static void guest_ud_handler(struct ex_regs *regs)
 {
 	ud_count++;
@@ -59,7 +50,7 @@ void guest_code(struct vmx_pages *vmx_pages)
 #define L2_GUEST_STACK_SIZE 64
 	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
 
-	enable_x2apic();
+	x2apic_enable();
 
 	GUEST_SYNC(1);
 	GUEST_SYNC(2);
diff --git a/tools/testing/selftests/kvm/x86_64/set_boot_cpu_id.c b/tools/testing/selftests/kvm/x86_64/set_boot_cpu_id.c
index 12c558fc8074..5f8dd74d415f 100644
--- a/tools/testing/selftests/kvm/x86_64/set_boot_cpu_id.c
+++ b/tools/testing/selftests/kvm/x86_64/set_boot_cpu_id.c
@@ -14,16 +14,12 @@
 #include "test_util.h"
 #include "kvm_util.h"
 #include "processor.h"
+#include "apic.h"
 
 #define N_VCPU 2
 #define VCPU_ID0 0
 #define VCPU_ID1 1
 
-static uint32_t get_bsp_flag(void)
-{
-	return rdmsr(MSR_IA32_APICBASE) & MSR_IA32_APICBASE_BSP;
-}
-
 static void guest_bsp_vcpu(void *arg)
 {
 	GUEST_SYNC(1);
diff --git a/tools/testing/selftests/kvm/x86_64/xapic_ipi_test.c b/tools/testing/selftests/kvm/x86_64/xapic_ipi_test.c
index 2f964cdc273c..21b22718a9db 100644
--- a/tools/testing/selftests/kvm/x86_64/xapic_ipi_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xapic_ipi_test.c
@@ -42,8 +42,6 @@
 #define HALTER_VCPU_ID 0
 #define SENDER_VCPU_ID 1
 
-volatile uint32_t *apic_base = (volatile uint32_t *)APIC_DEFAULT_GPA;
-
 /*
  * Vector for IPI from sender vCPU to halting vCPU.
  * Value is arbitrary and was chosen for the alternating bit pattern. Any
@@ -86,45 +84,6 @@ struct thread_params {
 	uint64_t *pipis_rcvd; /* host address of ipis_rcvd global */
 };
 
-uint32_t read_apic_reg(uint reg)
-{
-	return apic_base[reg >> 2];
-}
-
-void write_apic_reg(uint reg, uint32_t val)
-{
-	apic_base[reg >> 2] = val;
-}
-
-void disable_apic(void)
-{
-	wrmsr(MSR_IA32_APICBASE,
-	      rdmsr(MSR_IA32_APICBASE) &
-		~(MSR_IA32_APICBASE_ENABLE | MSR_IA32_APICBASE_EXTD));
-}
-
-void enable_xapic(void)
-{
-	uint64_t val = rdmsr(MSR_IA32_APICBASE);
-
-	/* Per SDM: to enable xAPIC when in x2APIC must first disable APIC */
-	if (val & MSR_IA32_APICBASE_EXTD) {
-		disable_apic();
-		wrmsr(MSR_IA32_APICBASE,
-		      rdmsr(MSR_IA32_APICBASE) | MSR_IA32_APICBASE_ENABLE);
-	} else if (!(val & MSR_IA32_APICBASE_ENABLE)) {
-		wrmsr(MSR_IA32_APICBASE, val | MSR_IA32_APICBASE_ENABLE);
-	}
-
-	/*
-	 * Per SDM: reset value of spurious interrupt vector register has the
-	 * APIC software enabled bit=0. It must be enabled in addition to the
-	 * enable bit in the MSR.
-	 */
-	val = read_apic_reg(APIC_SPIV) | APIC_SPIV_APIC_ENABLED;
-	write_apic_reg(APIC_SPIV, val);
-}
-
 void verify_apic_base_addr(void)
 {
 	uint64_t msr = rdmsr(MSR_IA32_APICBASE);
@@ -136,10 +95,10 @@ void verify_apic_base_addr(void)
 static void halter_guest_code(struct test_data_page *data)
 {
 	verify_apic_base_addr();
-	enable_xapic();
+	xapic_enable();
 
-	data->halter_apic_id = GET_APIC_ID_FIELD(read_apic_reg(APIC_ID));
-	data->halter_lvr = read_apic_reg(APIC_LVR);
+	data->halter_apic_id = GET_APIC_ID_FIELD(xapic_read_reg(APIC_ID));
+	data->halter_lvr = xapic_read_reg(APIC_LVR);
 
 	/*
 	 * Loop forever HLTing and recording halts & wakes. Disable interrupts
@@ -150,8 +109,8 @@ static void halter_guest_code(struct test_data_page *data)
 	 * TPR and PPR for diagnostic purposes in case the test fails.
 	 */
 	for (;;) {
-		data->halter_tpr = read_apic_reg(APIC_TASKPRI);
-		data->halter_ppr = read_apic_reg(APIC_PROCPRI);
+		data->halter_tpr = xapic_read_reg(APIC_TASKPRI);
+		data->halter_ppr = xapic_read_reg(APIC_PROCPRI);
 		data->hlt_count++;
 		asm volatile("sti; hlt; cli");
 		data->wake_count++;
@@ -166,7 +125,7 @@ static void halter_guest_code(struct test_data_page *data)
 static void guest_ipi_handler(struct ex_regs *regs)
 {
 	ipis_rcvd++;
-	write_apic_reg(APIC_EOI, 77);
+	xapic_write_reg(APIC_EOI, 77);
 }
 
 static void sender_guest_code(struct test_data_page *data)
@@ -179,7 +138,7 @@ static void sender_guest_code(struct test_data_page *data)
 	uint64_t tsc_start;
 
 	verify_apic_base_addr();
-	enable_xapic();
+	xapic_enable();
 
 	/*
 	 * Init interrupt command register for sending IPIs
@@ -206,8 +165,8 @@ static void sender_guest_code(struct test_data_page *data)
 		 * First IPI can be sent unconditionally because halter vCPU
 		 * starts earlier.
 		 */
-		write_apic_reg(APIC_ICR2, icr2_val);
-		write_apic_reg(APIC_ICR, icr_val);
+		xapic_write_reg(APIC_ICR2, icr2_val);
+		xapic_write_reg(APIC_ICR, icr_val);
 		data->ipis_sent++;
 
 		/*
-- 
2.31.1.818.g46aad6cb9e-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 10/12] KVM: selftests: Introduce x2APIC register manipulation functions
  2021-05-20 23:03 [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Jim Mattson
                   ` (8 preceding siblings ...)
  2021-05-20 23:03 ` [PATCH 09/12] KVM: selftests: Hoist APIC functions out of individual tests Jim Mattson
@ 2021-05-20 23:03 ` Jim Mattson
  2021-05-20 23:03 ` [PATCH 11/12] KVM: selftests: Introduce prepare_tpr_shadow Jim Mattson
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 35+ messages in thread
From: Jim Mattson @ 2021-05-20 23:03 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: Jim Mattson, Oliver Upton

Standardize reads and writes of the x2APIC MSRs.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
---
 tools/testing/selftests/kvm/include/x86_64/apic.h | 10 ++++++++++
 tools/testing/selftests/kvm/lib/x86_64/apic.c     |  5 ++---
 tools/testing/selftests/kvm/x86_64/smm_test.c     |  4 ++--
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/apic.h b/tools/testing/selftests/kvm/include/x86_64/apic.h
index e5a9fe040a6c..0be4757f1f20 100644
--- a/tools/testing/selftests/kvm/include/x86_64/apic.h
+++ b/tools/testing/selftests/kvm/include/x86_64/apic.h
@@ -78,4 +78,14 @@ static inline void xapic_write_reg(unsigned int reg, uint32_t val)
 	((volatile uint32_t *)APIC_DEFAULT_GPA)[reg >> 2] = val;
 }
 
+static inline uint64_t x2apic_read_reg(unsigned int reg)
+{
+	return rdmsr(APIC_BASE_MSR + (reg >> 4));
+}
+
+static inline void x2apic_write_reg(unsigned int reg, uint64_t value)
+{
+	wrmsr(APIC_BASE_MSR + (reg >> 4), value);
+}
+
 #endif /* SELFTEST_KVM_APIC_H */
diff --git a/tools/testing/selftests/kvm/lib/x86_64/apic.c b/tools/testing/selftests/kvm/lib/x86_64/apic.c
index 31f318ac67ba..7168e25c194e 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/apic.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/apic.c
@@ -38,9 +38,8 @@ void xapic_enable(void)
 
 void x2apic_enable(void)
 {
-	uint32_t spiv_reg = APIC_BASE_MSR + (APIC_SPIV >> 4);
-
 	wrmsr(MSR_IA32_APICBASE, rdmsr(MSR_IA32_APICBASE) |
 	      MSR_IA32_APICBASE_ENABLE | MSR_IA32_APICBASE_EXTD);
-	wrmsr(spiv_reg, rdmsr(spiv_reg) | APIC_SPIV_APIC_ENABLED);
+	x2apic_write_reg(APIC_SPIV,
+			 x2apic_read_reg(APIC_SPIV) | APIC_SPIV_APIC_ENABLED);
 }
diff --git a/tools/testing/selftests/kvm/x86_64/smm_test.c b/tools/testing/selftests/kvm/x86_64/smm_test.c
index 613c42c5a9b8..c1f831803ad2 100644
--- a/tools/testing/selftests/kvm/x86_64/smm_test.c
+++ b/tools/testing/selftests/kvm/x86_64/smm_test.c
@@ -55,8 +55,8 @@ static inline void sync_with_host(uint64_t phase)
 
 void self_smi(void)
 {
-	wrmsr(APIC_BASE_MSR + (APIC_ICR >> 4),
-	      APIC_DEST_SELF | APIC_INT_ASSERT | APIC_DM_SMI);
+	x2apic_write_reg(APIC_ICR,
+			 APIC_DEST_SELF | APIC_INT_ASSERT | APIC_DM_SMI);
 }
 
 void guest_code(void *arg)
-- 
2.31.1.818.g46aad6cb9e-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 11/12] KVM: selftests: Introduce prepare_tpr_shadow
  2021-05-20 23:03 [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Jim Mattson
                   ` (9 preceding siblings ...)
  2021-05-20 23:03 ` [PATCH 10/12] KVM: selftests: Introduce x2APIC register manipulation functions Jim Mattson
@ 2021-05-20 23:03 ` Jim Mattson
  2021-05-20 23:03 ` [PATCH 12/12] KVM: selftests: Add a test of an unbacked nested PI descriptor Jim Mattson
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 35+ messages in thread
From: Jim Mattson @ 2021-05-20 23:03 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: Jim Mattson, Oliver Upton

Add support for yet another page to hang from the VMCS12 for nested
VMX testing: the virtual APIC page. This page is necessary for a
VMCS12 to be launched with the "use TPR shadow" VM-execution control
set (except in some oddball circumstances permitted by KVM).

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
---
 tools/testing/selftests/kvm/include/x86_64/vmx.h | 5 +++++
 tools/testing/selftests/kvm/lib/x86_64/vmx.c     | 8 ++++++++
 2 files changed, 13 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index 516c81d86353..83ccb096b966 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -574,6 +574,10 @@ struct vmx_pages {
 	void *apic_access_hva;
 	uint64_t apic_access_gpa;
 	void *apic_access;
+
+	void *virtual_apic_hva;
+	uint64_t virtual_apic_gpa;
+	void *virtual_apic;
 };
 
 union vmx_basic {
@@ -618,5 +622,6 @@ void prepare_eptp(struct vmx_pages *vmx, struct kvm_vm *vm,
 		  uint32_t eptp_memslot);
 void prepare_virtualize_apic_accesses(struct vmx_pages *vmx, struct kvm_vm *vm,
 				      uint32_t eptp_memslot);
+void prepare_tpr_shadow(struct vmx_pages *vmx, struct kvm_vm *vm);
 
 #endif /* SELFTEST_KVM_VMX_H */
diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index 2448b30e8efa..1023760d1bf7 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -551,3 +551,11 @@ void prepare_virtualize_apic_accesses(struct vmx_pages *vmx, struct kvm_vm *vm,
 	vmx->apic_access_hva = addr_gva2hva(vm, (uintptr_t)vmx->apic_access);
 	vmx->apic_access_gpa = addr_gva2gpa(vm, (uintptr_t)vmx->apic_access);
 }
+
+void prepare_tpr_shadow(struct vmx_pages *vmx, struct kvm_vm *vm)
+{
+	vmx->virtual_apic = (void *)vm_vaddr_alloc(vm, getpagesize(),
+						  0x10000, 0, 0);
+	vmx->virtual_apic_hva = addr_gva2hva(vm, (uintptr_t)vmx->virtual_apic);
+	vmx->virtual_apic_gpa = addr_gva2gpa(vm, (uintptr_t)vmx->virtual_apic);
+}
-- 
2.31.1.818.g46aad6cb9e-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 12/12] KVM: selftests: Add a test of an unbacked nested PI descriptor
  2021-05-20 23:03 [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Jim Mattson
                   ` (10 preceding siblings ...)
  2021-05-20 23:03 ` [PATCH 11/12] KVM: selftests: Introduce prepare_tpr_shadow Jim Mattson
@ 2021-05-20 23:03 ` Jim Mattson
  2021-05-21  0:58 ` [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Sean Christopherson
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 35+ messages in thread
From: Jim Mattson @ 2021-05-20 23:03 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: Jim Mattson, Oliver Upton

Add a regression test for the unsupported configuration of a VMCS12
posted interrupt descriptor that has no backing memory in L1. KVM
should exit to userspace with KVM_INTERNAL_ERROR rather than just
silently doing something wrong.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/x86_64/vmx_pi_mmio_test.c   | 252 ++++++++++++++++++
 3 files changed, 254 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_pi_mmio_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index bd83158e0e0b..f813761ac0a1 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -34,6 +34,7 @@
 /x86_64/xen_vmcall_test
 /x86_64/xss_msr_test
 /x86_64/vmx_pmu_msrs_test
+/x86_64/vmx_pi_mmio_test
 /demand_paging_test
 /dirty_log_test
 /dirty_log_perf_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index af102e03e698..fef0992f04c9 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -67,6 +67,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/tsc_msrs_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_pmu_msrs_test
 TEST_GEN_PROGS_x86_64 += x86_64/xen_shinfo_test
 TEST_GEN_PROGS_x86_64 += x86_64/xen_vmcall_test
+TEST_GEN_PROGS_x86_64 += x86_64/vmx_pi_mmio_test
 TEST_GEN_PROGS_x86_64 += demand_paging_test
 TEST_GEN_PROGS_x86_64 += dirty_log_test
 TEST_GEN_PROGS_x86_64 += dirty_log_perf_test
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_pi_mmio_test.c b/tools/testing/selftests/kvm/x86_64/vmx_pi_mmio_test.c
new file mode 100644
index 000000000000..2246899d8988
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/vmx_pi_mmio_test.c
@@ -0,0 +1,252 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * vmx_pi_mmio_test
+ *
+ * Copyright (C) 2021, Google LLC.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ *
+ * Test that an L2 vCPU can be launched with an unbacked posted
+ * interrupt descriptor, but that any attempt to send that vCPU its
+ * posted interrupt notification vector will result in an exit to
+ * userspace with KVM_INTERNAL_ERROR.
+ *
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <pthread.h>
+#include <inttypes.h>
+#include <unistd.h>
+
+#include "kvm_util.h"
+#include "processor.h"
+#include "test_util.h"
+#include "vmx.h"
+
+#include "kselftest.h"
+
+#define RECEIVER_VCPU_ID	0
+#define SENDER_VCPU_ID		1
+
+#define L2_GUEST_STACK_SIZE	64
+
+#define TIMEOUT_SECS		10
+
+#define L1_PI_VECTOR		33
+
+static struct kvm_vm *vm;
+
+static bool l2_active;
+
+static void l2_guest_code(void)
+{
+	l2_active = true;
+	__asm__ __volatile__("hlt");
+	/* NOT REACHED */
+}
+
+static void l1_receiver_code(struct vmx_pages *vmx_pages,
+			     unsigned long high_gpa)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	uint32_t control;
+
+	x2apic_enable();
+
+	GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
+	GUEST_ASSERT(load_vmcs(vmx_pages));
+
+	prepare_vmcs(vmx_pages, l2_guest_code,
+		     &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+	control = vmreadz(PIN_BASED_VM_EXEC_CONTROL);
+	control |= PIN_BASED_EXT_INTR_MASK |
+		PIN_BASED_POSTED_INTR;
+	vmwrite(PIN_BASED_VM_EXEC_CONTROL, control);
+
+	control = vmreadz(CPU_BASED_VM_EXEC_CONTROL);
+	control |= CPU_BASED_TPR_SHADOW |
+		CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
+	vmwrite(CPU_BASED_VM_EXEC_CONTROL, control);
+
+	control = vmreadz(SECONDARY_VM_EXEC_CONTROL);
+	control |= SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
+	vmwrite(SECONDARY_VM_EXEC_CONTROL, control);
+
+	control = vmreadz(VM_EXIT_CONTROLS);
+	control |= VM_EXIT_ACK_INTR_ON_EXIT;
+	vmwrite(VM_EXIT_CONTROLS, control);
+
+	vmwrite(VIRTUAL_APIC_PAGE_ADDR, vmx_pages->virtual_apic_gpa);
+	vmwrite(POSTED_INTR_NV, L1_PI_VECTOR);
+	vmwrite(POSTED_INTR_DESC_ADDR, high_gpa);
+
+	GUEST_ASSERT(!vmlaunch());
+	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
+
+	GUEST_DONE();
+}
+
+static void l1_sender_code(void *arg)
+{
+	x2apic_enable();
+
+	x2apic_write_reg(APIC_ICR,
+			 APIC_INT_ASSERT | APIC_DEST_PHYSICAL |
+			 APIC_DM_FIXED | L1_PI_VECTOR |
+			 ((uint64_t)RECEIVER_VCPU_ID << 32));
+
+	GUEST_DONE();
+}
+
+static bool vcpu_run_loop(int vcpu_id)
+{
+	volatile struct kvm_run *run = vcpu_state(vm, vcpu_id);
+	bool done = false;
+	struct ucall uc;
+
+	while (!done) {
+		vcpu_run(vm, vcpu_id);
+
+		if (run->exit_reason != KVM_EXIT_IO)
+			break;
+
+		switch (get_ucall(vm, vcpu_id, &uc)) {
+		case UCALL_ABORT:
+			TEST_FAIL("vCPU  %d: %s at %s:%ld", vcpu_id,
+				  (const char *)uc.args[0], __FILE__,
+				  uc.args[1]);
+			/* NOT REACHED */
+		case UCALL_SYNC:
+			break;
+		case UCALL_DONE:
+			done = true;
+			break;
+		default:
+			TEST_FAIL("vCPU %d: Unknown ucall %lu",
+				  vcpu_id, uc.cmd);
+			/* NOT REACHED */
+		}
+	}
+
+	return done;
+}
+
+static void *receiver(void *arg)
+{
+	volatile struct kvm_run *run = vcpu_state(vm, RECEIVER_VCPU_ID);
+	unsigned long high_gpa = *(unsigned long *)arg;
+	vm_vaddr_t vmx_pages_gva;
+	struct vmx_pages *vmx;
+	bool success;
+
+	vmx = vcpu_alloc_vmx(vm, &vmx_pages_gva);
+	prepare_tpr_shadow(vmx, vm);
+	vcpu_args_set(vm, RECEIVER_VCPU_ID, 2, vmx_pages_gva, high_gpa);
+
+	success = vcpu_run_loop(RECEIVER_VCPU_ID);
+	TEST_ASSERT(!success, "Receiver didn't fail as expected.\n");
+	TEST_ASSERT(run->exit_reason ==
+		    KVM_EXIT_INTERNAL_ERROR,
+		    "Exit reason isn't KVM_EXIT_INTERNAL_ERROR: %u (%s).\n",
+		    run->exit_reason,
+		    exit_reason_str(run->exit_reason));
+	TEST_ASSERT(run->internal.suberror ==
+		    KVM_INTERNAL_ERROR_EMULATION,
+		    "Internal suberror isn't KVM_INTERNAL_ERROR_EMULATION: %u.\n",
+		    run->internal.suberror);
+
+	return NULL;
+}
+
+static void sender(void)
+{
+	volatile struct kvm_run *run = vcpu_state(vm, SENDER_VCPU_ID);
+	bool success;
+
+	success = vcpu_run_loop(SENDER_VCPU_ID);
+	TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+		    "Sender didn't exit with KVM_EXIT_IO: %u (%s).\n",
+		    run->exit_reason,
+		    exit_reason_str(run->exit_reason));
+	TEST_ASSERT(success, "Sender didn't complete successfully.\n");
+}
+
+void check_constraints(void)
+{
+	uint64_t msr;
+
+	nested_vmx_check_supported();
+
+	msr = kvm_get_feature_msr(MSR_IA32_VMX_PINBASED_CTLS) >> 32;
+	if (!(msr & PIN_BASED_EXT_INTR_MASK)) {
+		print_skip("Cannot enable \"external-interrupt exiting\"");
+		exit(KSFT_SKIP);
+	}
+	if (!(msr & PIN_BASED_POSTED_INTR)) {
+		print_skip("Cannot enable \"process posted interrupts\"");
+		exit(KSFT_SKIP);
+	}
+
+	msr = kvm_get_feature_msr(MSR_IA32_VMX_PROCBASED_CTLS) >> 32;
+	if (!(msr & CPU_BASED_TPR_SHADOW)) {
+		print_skip("Cannot enable \"use TPR shadow\"");
+		exit(KSFT_SKIP);
+	}
+	if (!(msr & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS)) {
+		print_skip("Cannot enable \"activate secondary controls\"");
+		exit(KSFT_SKIP);
+	}
+
+	msr = kvm_get_feature_msr(MSR_IA32_VMX_PROCBASED_CTLS2) >> 32;
+	if (!(msr & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY)) {
+		print_skip("Cannot enable \"virtual-interrupt delivery\"");
+		exit(KSFT_SKIP);
+	}
+
+	msr = kvm_get_feature_msr(MSR_IA32_VMX_EXIT_CTLS) >> 32;
+	if (!(msr & VM_EXIT_ACK_INTR_ON_EXIT)) {
+		print_skip("Cannot enable \"acknowledge interrupt on exit\"");
+		exit(KSFT_SKIP);
+	}
+}
+
+int main(int argc, char *argv[])
+{
+	unsigned int paddr_width;
+	unsigned int vaddr_width;
+	unsigned long high_gpa;
+	pthread_t thread;
+	bool *l2_active_hva;
+	int r;
+
+	kvm_get_cpu_address_width(&paddr_width, &vaddr_width);
+	high_gpa = (1ul << paddr_width) - getpagesize();
+	if ((unsigned long)DEFAULT_GUEST_PHY_PAGES * getpagesize() > high_gpa) {
+		print_skip("No unbacked physical page available");
+		exit(KSFT_SKIP);
+	}
+
+	check_constraints();
+
+	vm = vm_create_default(RECEIVER_VCPU_ID, 0, (void *)l1_receiver_code);
+	vm_vcpu_add_default(vm, SENDER_VCPU_ID, (void *)l1_sender_code);
+	vcpu_set_cpuid(vm, SENDER_VCPU_ID, kvm_get_supported_cpuid());
+
+	r = pthread_create(&thread, NULL, receiver, &high_gpa);
+	TEST_ASSERT(r == 0,
+		    "pthread_create failed errno=%d", errno);
+
+	alarm(TIMEOUT_SECS);
+	l2_active_hva = (bool *)addr_gva2hva(vm, (vm_vaddr_t)&l2_active);
+	while (!*l2_active_hva)
+		pthread_yield();
+
+	sender();
+
+	r = pthread_join(thread, NULL);
+	TEST_ASSERT(r == 0, "pthread_join failed with errno=%d", r);
+
+	kvm_vm_free(vm);
+
+	return 0;
+}
-- 
2.31.1.818.g46aad6cb9e-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue
  2021-05-20 23:03 [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Jim Mattson
                   ` (11 preceding siblings ...)
  2021-05-20 23:03 ` [PATCH 12/12] KVM: selftests: Add a test of an unbacked nested PI descriptor Jim Mattson
@ 2021-05-21  0:58 ` Sean Christopherson
  2021-05-21 12:04 ` Jim Mattson
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 35+ messages in thread
From: Sean Christopherson @ 2021-05-21  0:58 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm, pbonzini

On Thu, May 20, 2021, Jim Mattson wrote:
> Initially, I added a new kvm request for a userspace exit on the next guest
> entry, but Sean hated that approach.

Hey!  You could at least say _why_ I hated it. :-D

My argument is that the only way to guarantee that vcpu->run->exit_reason isn't
crushed between making the request and servicing the request is by guaranteeing
that KVM makes it back to the run loop immediately after making the request.
Since the only way to accomplish that is by forcing a return up the stack, at
that point we can simply return the "request" directly.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue
  2021-05-20 23:03 [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Jim Mattson
                   ` (12 preceding siblings ...)
  2021-05-21  0:58 ` [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Sean Christopherson
@ 2021-05-21 12:04 ` Jim Mattson
  2021-05-24 15:50 ` Paolo Bonzini
  2021-05-24 16:46 ` Paolo Bonzini
  15 siblings, 0 replies; 35+ messages in thread
From: Jim Mattson @ 2021-05-21 12:04 UTC (permalink / raw)
  To: kvm list, Paolo Bonzini

On Thu, May 20, 2021 at 4:03 PM Jim Mattson <jmattson@google.com> wrote:
>
> When the VMCS12 posted interrupt descriptor isn't backed by an L1
> memslot, kvm will launch vmcs02 with a stale posted interrupt
> descriptor. Before commit 6beb7bd52e48 ("kvm: nVMX: Refactor
> nested_get_vmcs12_pages()"), kvm would have silently disabled the
> VMCS02 "process posted interrupts" VM-execution control. Both
> behaviors are wrong, though the use-after-free is more egregious.

Oops. Prior to the referenced commit, kvm would have forced a vmcs02
VM-entry failure by loading an illegal value into its posted interrupt
descriptor field. Though better than clearing the "process posted
interrupts" VM-execution control, that's still wrong.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 02/12] KVM: x86: Wake up a vCPU when kvm_check_nested_events fails
  2021-05-20 23:03 ` [PATCH 02/12] KVM: x86: Wake up a vCPU when kvm_check_nested_events fails Jim Mattson
@ 2021-05-24 15:43   ` Paolo Bonzini
  2021-05-24 16:39     ` Jim Mattson
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2021-05-24 15:43 UTC (permalink / raw)
  To: Jim Mattson, kvm; +Cc: Oliver Upton

On 21/05/21 01:03, Jim Mattson wrote:
> At present, there are two reasons why kvm_check_nested_events may
> return a non-zero value:
> 
> 1) we just emulated a shutdown VM-exit from L2 to L1.
> 2) we need to perform an immediate VM-exit from vmcs02.
> 
> In either case, transition the vCPU to "running."
> 
> Signed-off-by: Jim Mattson <jmattson@google.com>
> Reviewed-by: Oliver Upton <oupton@google.com>
> ---
>   arch/x86/kvm/x86.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d517460db413..d3fea8ea3628 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9468,8 +9468,8 @@ static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
>   
>   static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
>   {
> -	if (is_guest_mode(vcpu))
> -		kvm_check_nested_events(vcpu);
> +	if (is_guest_mode(vcpu) && kvm_check_nested_events(vcpu))
> +		return true;

That doesn't make the vCPU running.  You still need to go through 
vcpu_block, which would properly update the vCPU's mp_state.

What is this patch fixing?

Paolo

>   	return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>   		!vcpu->arch.apf.halted);
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue
  2021-05-20 23:03 [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Jim Mattson
                   ` (13 preceding siblings ...)
  2021-05-21 12:04 ` Jim Mattson
@ 2021-05-24 15:50 ` Paolo Bonzini
  2021-05-24 16:46 ` Paolo Bonzini
  15 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2021-05-24 15:50 UTC (permalink / raw)
  To: Jim Mattson, kvm

On 21/05/21 01:03, Jim Mattson wrote:
> When the VMCS12 posted interrupt descriptor isn't backed by an L1
> memslot, kvm will launch vmcs02 with a stale posted interrupt
> descriptor. Before commit 6beb7bd52e48 ("kvm: nVMX: Refactor
> nested_get_vmcs12_pages()"), kvm would have silently disabled the
> VMCS02 "process posted interrupts" VM-execution control. Both
> behaviors are wrong, though the use-after-free is more egregious.
> 
> Empirical tests on actual hardware reveal that a posted interrupt
> descriptor without any backing memory/device has PCI bus error
> semantics (reads return all 1's and writes are discarded). However,
> kvm can't tell an unbacked address from an MMIO address. Normally, kvm
> would ask userspace for an MMIO completion, but that's complicated for
> a posted interrupt descriptor access. There are already a number of
> cases where kvm bails out to userspace with KVM_INTERNAL_ERROR via
> kvm_handle_memory_failure, so that seems like the best route to take.
> 
> It would be relatively easy to invoke kvm_handle_memory_failure at
> emulated VM-entry, but that approach would break existing
> kvm-unit-tests. Moreover, the issue doesn't really come up until the
> vCPU--in virtualized VMX non-root operation--received the posted
> interrupt notification vector indicated in its VMCS12.
> 
> Sadly, it's really hard to arrange for an exit to userspace from
> vmx_complete_nested_posted_interrupt, which is where kvm actually
> needs to access the unbacked PID. Initially, I added a new kvm request
> for a userspace exit on the next guest entry, but Sean hated that
> approach. Based on his suggestion, I added the plumbing to get back
> out to userspace in the event of an error in
> vmx_complete_nested_posted_interrupt. This works in the case of an
> unbacked PID, but it doesn't work quite as well in the case of an
> unbacked virtual APIC page (another case where kvm was happy to just
> silently ignore the problem and attempt to muddle its way through.) In
> that case, this series is an incremental improvement, but it's not a
> complete fix.
> 
> Jim Mattson (12):
>    KVM: x86: Remove guest mode check from kvm_check_nested_events
>    KVM: x86: Wake up a vCPU when kvm_check_nested_events fails
>    KVM: nVMX: Add a return code to vmx_complete_nested_posted_interrupt
>    KVM: x86: Add a return code to inject_pending_event
>    KVM: x86: Add a return code to kvm_apic_accept_events
>    KVM: nVMX: Fail on MMIO completion for nested posted interrupts
>    KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't
>      mappable
>    KVM: selftests: Move APIC definitions into a separate file
>    KVM: selftests: Hoist APIC functions out of individual tests
>    KVM: selftests: Introduce x2APIC register manipulation functions
>    KVM: selftests: Introduce prepare_tpr_shadow
>    KVM: selftests: Add a test of an unbacked nested PI descriptor

Patch 2 is the only one that seems wrong to me, and I am actually not 
sure why it is part of this series.  It seems to me that it overlaps 
"kvm: x86: move srcu lock out of kvm_vcpu_check_block​", for which both 
Sean and I had some discussions on how to remove the side effects that 
kvm_check_nested_events has on kvm_vcpu_is_runn{able,ing}.

Otherwise looks good, and I've already queued patches 8-10.  Holler if I 
should just apply patches 1 and 3-12.

Paolo


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 02/12] KVM: x86: Wake up a vCPU when kvm_check_nested_events fails
  2021-05-24 15:43   ` Paolo Bonzini
@ 2021-05-24 16:39     ` Jim Mattson
  2021-05-24 16:43       ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Jim Mattson @ 2021-05-24 16:39 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm list, Oliver Upton

Without this patch, the accompanying selftest never wakes up from HLT
in L2. If you can get the selftest to work without this patch, feel
free to drop it.

On Mon, May 24, 2021 at 8:43 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 21/05/21 01:03, Jim Mattson wrote:
> > At present, there are two reasons why kvm_check_nested_events may
> > return a non-zero value:
> >
> > 1) we just emulated a shutdown VM-exit from L2 to L1.
> > 2) we need to perform an immediate VM-exit from vmcs02.
> >
> > In either case, transition the vCPU to "running."
> >
> > Signed-off-by: Jim Mattson <jmattson@google.com>
> > Reviewed-by: Oliver Upton <oupton@google.com>
> > ---
> >   arch/x86/kvm/x86.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index d517460db413..d3fea8ea3628 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -9468,8 +9468,8 @@ static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
> >
> >   static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
> >   {
> > -     if (is_guest_mode(vcpu))
> > -             kvm_check_nested_events(vcpu);
> > +     if (is_guest_mode(vcpu) && kvm_check_nested_events(vcpu))
> > +             return true;
>
> That doesn't make the vCPU running.  You still need to go through
> vcpu_block, which would properly update the vCPU's mp_state.
>
> What is this patch fixing?
>
> Paolo
>
> >       return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
> >               !vcpu->arch.apf.halted);
> >
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 02/12] KVM: x86: Wake up a vCPU when kvm_check_nested_events fails
  2021-05-24 16:39     ` Jim Mattson
@ 2021-05-24 16:43       ` Paolo Bonzini
  2021-05-24 17:10         ` Jim Mattson
  2021-05-24 23:10         ` Sean Christopherson
  0 siblings, 2 replies; 35+ messages in thread
From: Paolo Bonzini @ 2021-05-24 16:43 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, Oliver Upton

On 24/05/21 18:39, Jim Mattson wrote:
> Without this patch, the accompanying selftest never wakes up from HLT
> in L2. If you can get the selftest to work without this patch, feel
> free to drop it.

Ok, that's a pretty good reason.  I'll try to debug it.

Paolo

> On Mon, May 24, 2021 at 8:43 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 21/05/21 01:03, Jim Mattson wrote:
>>> At present, there are two reasons why kvm_check_nested_events may
>>> return a non-zero value:
>>>
>>> 1) we just emulated a shutdown VM-exit from L2 to L1.
>>> 2) we need to perform an immediate VM-exit from vmcs02.
>>>
>>> In either case, transition the vCPU to "running."
>>>
>>> Signed-off-by: Jim Mattson <jmattson@google.com>
>>> Reviewed-by: Oliver Upton <oupton@google.com>
>>> ---
>>>    arch/x86/kvm/x86.c | 4 ++--
>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index d517460db413..d3fea8ea3628 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -9468,8 +9468,8 @@ static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
>>>
>>>    static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
>>>    {
>>> -     if (is_guest_mode(vcpu))
>>> -             kvm_check_nested_events(vcpu);
>>> +     if (is_guest_mode(vcpu) && kvm_check_nested_events(vcpu))
>>> +             return true;
>>
>> That doesn't make the vCPU running.  You still need to go through
>> vcpu_block, which would properly update the vCPU's mp_state.
>>
>> What is this patch fixing?
>>
>> Paolo
>>
>>>        return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>>>                !vcpu->arch.apf.halted);
>>>
>>
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue
  2021-05-20 23:03 [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Jim Mattson
                   ` (14 preceding siblings ...)
  2021-05-24 15:50 ` Paolo Bonzini
@ 2021-05-24 16:46 ` Paolo Bonzini
  15 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2021-05-24 16:46 UTC (permalink / raw)
  To: Jim Mattson, kvm

On 21/05/21 01:03, Jim Mattson wrote:
> When the VMCS12 posted interrupt descriptor isn't backed by an L1
> memslot, kvm will launch vmcs02 with a stale posted interrupt
> descriptor. Before commit 6beb7bd52e48 ("kvm: nVMX: Refactor
> nested_get_vmcs12_pages()"), kvm would have silently disabled the
> VMCS02 "process posted interrupts" VM-execution control. Both
> behaviors are wrong, though the use-after-free is more egregious.
> 
> Empirical tests on actual hardware reveal that a posted interrupt
> descriptor without any backing memory/device has PCI bus error
> semantics (reads return all 1's and writes are discarded). However,
> kvm can't tell an unbacked address from an MMIO address. Normally, kvm
> would ask userspace for an MMIO completion, but that's complicated for
> a posted interrupt descriptor access. There are already a number of
> cases where kvm bails out to userspace with KVM_INTERNAL_ERROR via
> kvm_handle_memory_failure, so that seems like the best route to take.
> 
> It would be relatively easy to invoke kvm_handle_memory_failure at
> emulated VM-entry, but that approach would break existing
> kvm-unit-tests. Moreover, the issue doesn't really come up until the
> vCPU--in virtualized VMX non-root operation--received the posted
> interrupt notification vector indicated in its VMCS12.
> 
> Sadly, it's really hard to arrange for an exit to userspace from
> vmx_complete_nested_posted_interrupt, which is where kvm actually
> needs to access the unbacked PID. Initially, I added a new kvm request
> for a userspace exit on the next guest entry, but Sean hated that
> approach. Based on his suggestion, I added the plumbing to get back
> out to userspace in the event of an error in
> vmx_complete_nested_posted_interrupt. This works in the case of an
> unbacked PID, but it doesn't work quite as well in the case of an
> unbacked virtual APIC page (another case where kvm was happy to just
> silently ignore the problem and attempt to muddle its way through.) In
> that case, this series is an incremental improvement, but it's not a
> complete fix.
> 
> Jim Mattson (12):
>    KVM: x86: Remove guest mode check from kvm_check_nested_events
>    KVM: x86: Wake up a vCPU when kvm_check_nested_events fails
>    KVM: nVMX: Add a return code to vmx_complete_nested_posted_interrupt
>    KVM: x86: Add a return code to inject_pending_event
>    KVM: x86: Add a return code to kvm_apic_accept_events
>    KVM: nVMX: Fail on MMIO completion for nested posted interrupts
>    KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't
>      mappable
>    KVM: selftests: Move APIC definitions into a separate file
>    KVM: selftests: Hoist APIC functions out of individual tests
>    KVM: selftests: Introduce x2APIC register manipulation functions
>    KVM: selftests: Introduce prepare_tpr_shadow
>    KVM: selftests: Add a test of an unbacked nested PI descriptor
> 
>   arch/x86/kvm/lapic.c                          |  11 +-
>   arch/x86/kvm/lapic.h                          |   2 +-
>   arch/x86/kvm/vmx/nested.c                     |  31 ++-
>   arch/x86/kvm/x86.c                            |  56 ++--
>   tools/testing/selftests/kvm/.gitignore        |   1 +
>   tools/testing/selftests/kvm/Makefile          |   3 +-
>   .../selftests/kvm/include/x86_64/apic.h       |  91 +++++++
>   .../selftests/kvm/include/x86_64/processor.h  |  49 +---
>   .../selftests/kvm/include/x86_64/vmx.h        |   6 +
>   tools/testing/selftests/kvm/lib/x86_64/apic.c |  45 ++++
>   tools/testing/selftests/kvm/lib/x86_64/vmx.c  |   8 +
>   .../testing/selftests/kvm/x86_64/evmcs_test.c |  11 +-
>   .../selftests/kvm/x86_64/set_boot_cpu_id.c    |   6 +-
>   tools/testing/selftests/kvm/x86_64/smm_test.c |   4 +-
>   .../selftests/kvm/x86_64/vmx_pi_mmio_test.c   | 252 ++++++++++++++++++
>   .../selftests/kvm/x86_64/xapic_ipi_test.c     |  59 +---
>   16 files changed, 488 insertions(+), 147 deletions(-)
>   create mode 100644 tools/testing/selftests/kvm/include/x86_64/apic.h
>   create mode 100644 tools/testing/selftests/kvm/lib/x86_64/apic.c
>   create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_pi_mmio_test.c
> 

Pending further debugging of the selftest I've queued 
1-3-4-5-6-7-8-9-10.  Thanks,

Paolo


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 02/12] KVM: x86: Wake up a vCPU when kvm_check_nested_events fails
  2021-05-24 16:43       ` Paolo Bonzini
@ 2021-05-24 17:10         ` Jim Mattson
  2021-05-24 23:10         ` Sean Christopherson
  1 sibling, 0 replies; 35+ messages in thread
From: Jim Mattson @ 2021-05-24 17:10 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm list, Oliver Upton

On Mon, May 24, 2021 at 9:43 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 24/05/21 18:39, Jim Mattson wrote:
> > Without this patch, the accompanying selftest never wakes up from HLT
> > in L2. If you can get the selftest to work without this patch, feel
> > free to drop it.
>
> Ok, that's a pretty good reason.  I'll try to debug it.
>
Note that it works (without this patch) if you change the HLT to an
infinite loop.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 02/12] KVM: x86: Wake up a vCPU when kvm_check_nested_events fails
  2021-05-24 16:43       ` Paolo Bonzini
  2021-05-24 17:10         ` Jim Mattson
@ 2021-05-24 23:10         ` Sean Christopherson
  2021-05-24 23:23           ` Jim Mattson
  1 sibling, 1 reply; 35+ messages in thread
From: Sean Christopherson @ 2021-05-24 23:10 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Jim Mattson, kvm list, Oliver Upton

On Mon, May 24, 2021, Paolo Bonzini wrote:
> On 24/05/21 18:39, Jim Mattson wrote:
> > Without this patch, the accompanying selftest never wakes up from HLT
> > in L2. If you can get the selftest to work without this patch, feel
> > free to drop it.
> 
> Ok, that's a pretty good reason.  I'll try to debug it.

I don't think there's any debug necessary, the hack of unconditionally calling
kvm_check_nested_events() in kvm_vcpu_running() handles the case where a pending
event/exception causes nested VM-Exit, but doesn't handle the case where KVM
can't immediately service the nested VM-Exit.  Because the event is never
serviced (doesn't cause a VM-Exit) and doesn't show up as a pending event,
kvm_vcpu_has_events() and thus kvm_arch_vcpu_runnable() will never become true,
i.e. the vCPU gets stuck in L2 HLT.

Until Jim's selftest, that was limited to the "request immediate exit" case,
which meant that to hit the bug a VM-Exiting event needed to collide with nested
VM-Enter that also put L2 into HLT state without a different pending wake event.

Jim's selftest adds a more direct path in the form of the -EXNIO when the PI
descriptor hits a memslot hole.

The proper fix is what we discussed in the other thread; get kvm_vcpu_has_events()
to return true if there's a nested event pending.

If I'm right, this hack-a-fix should go to stable.  Eww...

> > On Mon, May 24, 2021 at 8:43 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > 
> > > On 21/05/21 01:03, Jim Mattson wrote:
> > > > At present, there are two reasons why kvm_check_nested_events may
> > > > return a non-zero value:
> > > > 
> > > > 1) we just emulated a shutdown VM-exit from L2 to L1.
> > > > 2) we need to perform an immediate VM-exit from vmcs02.
> > > > 
> > > > In either case, transition the vCPU to "running."
> > > > 
> > > > Signed-off-by: Jim Mattson <jmattson@google.com>
> > > > Reviewed-by: Oliver Upton <oupton@google.com>
> > > > ---
> > > >    arch/x86/kvm/x86.c | 4 ++--
> > > >    1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index d517460db413..d3fea8ea3628 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -9468,8 +9468,8 @@ static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
> > > > 
> > > >    static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
> > > >    {
> > > > -     if (is_guest_mode(vcpu))
> > > > -             kvm_check_nested_events(vcpu);
> > > > +     if (is_guest_mode(vcpu) && kvm_check_nested_events(vcpu))
> > > > +             return true;
> > > 
> > > That doesn't make the vCPU running.  You still need to go through
> > > vcpu_block, which would properly update the vCPU's mp_state.
> > > 
> > > What is this patch fixing?
> > > 
> > > Paolo
> > > 
> > > >        return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
> > > >                !vcpu->arch.apf.halted);
> > > > 
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 07/12] KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable
  2021-05-20 23:03 ` [PATCH 07/12] KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable Jim Mattson
@ 2021-05-24 23:21   ` Sean Christopherson
  2021-05-24 23:27     ` Jim Mattson
  0 siblings, 1 reply; 35+ messages in thread
From: Sean Christopherson @ 2021-05-24 23:21 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm, pbonzini

On Thu, May 20, 2021, Jim Mattson wrote:
> Don't allow posted interrupts to modify a stale posted interrupt
> descriptor (including the initial value of 0).
> 
> Empirical tests on real hardware reveal that a posted interrupt
> descriptor referencing an unbacked address has PCI bus error semantics
> (reads as all 1's; writes are ignored). However, kvm can't distinguish
> unbacked addresses from device-backed (MMIO) addresses, so it should
> really ask userspace for an MMIO completion. That's overly
> complicated, so just punt with KVM_INTERNAL_ERROR.
> 
> Don't return the error until the posted interrupt descriptor is
> actually accessed. We don't want to break the existing kvm-unit-tests
> that assume they can launch an L2 VM with a posted interrupt
> descriptor that references MMIO space in L1.
> 
> Fixes: 6beb7bd52e48 ("kvm: nVMX: Refactor nested_get_vmcs12_pages()")
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 706c31821362..defd42201bb4 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3175,6 +3175,15 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
>  				offset_in_page(vmcs12->posted_intr_desc_addr));
>  			vmcs_write64(POSTED_INTR_DESC_ADDR,
>  				     pfn_to_hpa(map->pfn) + offset_in_page(vmcs12->posted_intr_desc_addr));
> +		} else {
> +			/*
> +			 * Defer the KVM_INTERNAL_ERROR exit until
> +			 * someone tries to trigger posted interrupt
> +			 * processing on this vCPU, to avoid breaking
> +			 * existing kvm-unit-tests.

Run the lines out to 80 chars.  Also, can we change the comment to tie it to
CPU behavior in someway?  A few years down the road, "existing kvm-unit-tests"
may not have any relevant meaning, and it's not like kvm-unit-tests is bug free
either.  E.g. something like

			/*
			 * Defer the KVM_INTERNAL_ERROR exit until posted
			 * interrupt processing actually occurs on this vCPU.
			 * Until that happens, the descriptor is not accessed,
			 * and userspace can technically rely on that behavior.
			 */ 

> +			 */
> +			vmx->nested.pi_desc = NULL;
> +			pin_controls_clearbit(vmx, PIN_BASED_POSTED_INTR);
>  		}
>  	}
>  	if (nested_vmx_prepare_msr_bitmap(vcpu, vmcs12))
> @@ -3689,10 +3698,14 @@ static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
>  	void *vapic_page;
>  	u16 status;
>  
> -	if (!vmx->nested.pi_desc || !vmx->nested.pi_pending)
> +	if (!vmx->nested.pi_pending)
>  		return 0;
>  
> +	if (!vmx->nested.pi_desc)
> +		goto mmio_needed;
> +
>  	vmx->nested.pi_pending = false;
> +
>  	if (!pi_test_and_clear_on(vmx->nested.pi_desc))
>  		return 0;
>  
> -- 
> 2.31.1.818.g46aad6cb9e-goog
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 02/12] KVM: x86: Wake up a vCPU when kvm_check_nested_events fails
  2021-05-24 23:10         ` Sean Christopherson
@ 2021-05-24 23:23           ` Jim Mattson
  2021-05-24 23:24             ` Sean Christopherson
  0 siblings, 1 reply; 35+ messages in thread
From: Jim Mattson @ 2021-05-24 23:23 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm list, Oliver Upton

On Mon, May 24, 2021 at 4:10 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, May 24, 2021, Paolo Bonzini wrote:
> > On 24/05/21 18:39, Jim Mattson wrote:
> > > Without this patch, the accompanying selftest never wakes up from HLT
> > > in L2. If you can get the selftest to work without this patch, feel
> > > free to drop it.
> >
> > Ok, that's a pretty good reason.  I'll try to debug it.
>
> I don't think there's any debug necessary, the hack of unconditionally calling
> kvm_check_nested_events() in kvm_vcpu_running() ...

We don't unconditionally call kvm_check_nested_events() in
kvm_vcpu_running(). We still call kvm_check_nested_events() only when
is_guest_mode(vcpu). The only change introduced in this patch is that
we stop ignoring the result.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 02/12] KVM: x86: Wake up a vCPU when kvm_check_nested_events fails
  2021-05-24 23:23           ` Jim Mattson
@ 2021-05-24 23:24             ` Sean Christopherson
  2021-05-24 23:29               ` Jim Mattson
  0 siblings, 1 reply; 35+ messages in thread
From: Sean Christopherson @ 2021-05-24 23:24 UTC (permalink / raw)
  To: Jim Mattson; +Cc: Paolo Bonzini, kvm list, Oliver Upton

On Mon, May 24, 2021, Jim Mattson wrote:
> On Mon, May 24, 2021 at 4:10 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Mon, May 24, 2021, Paolo Bonzini wrote:
> > > On 24/05/21 18:39, Jim Mattson wrote:
> > > > Without this patch, the accompanying selftest never wakes up from HLT
> > > > in L2. If you can get the selftest to work without this patch, feel
> > > > free to drop it.
> > >
> > > Ok, that's a pretty good reason.  I'll try to debug it.
> >
> > I don't think there's any debug necessary, the hack of unconditionally calling
> > kvm_check_nested_events() in kvm_vcpu_running() ...
> 
> We don't unconditionally call kvm_check_nested_events() in
> kvm_vcpu_running(). We still call kvm_check_nested_events() only when
> is_guest_mode(vcpu). The only change introduced in this patch is that
> we stop ignoring the result.

Doh, sorry, bad use of "unconditionally".  I meant "unconditionally when in L2". :-)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 07/12] KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable
  2021-05-24 23:21   ` Sean Christopherson
@ 2021-05-24 23:27     ` Jim Mattson
  2021-05-24 23:45       ` Sean Christopherson
  0 siblings, 1 reply; 35+ messages in thread
From: Jim Mattson @ 2021-05-24 23:27 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm list, Paolo Bonzini

On Mon, May 24, 2021 at 4:22 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, May 20, 2021, Jim Mattson wrote:
> > Don't allow posted interrupts to modify a stale posted interrupt
> > descriptor (including the initial value of 0).
> >
> > Empirical tests on real hardware reveal that a posted interrupt
> > descriptor referencing an unbacked address has PCI bus error semantics
> > (reads as all 1's; writes are ignored). However, kvm can't distinguish
> > unbacked addresses from device-backed (MMIO) addresses, so it should
> > really ask userspace for an MMIO completion. That's overly
> > complicated, so just punt with KVM_INTERNAL_ERROR.
> >
> > Don't return the error until the posted interrupt descriptor is
> > actually accessed. We don't want to break the existing kvm-unit-tests
> > that assume they can launch an L2 VM with a posted interrupt
> > descriptor that references MMIO space in L1.
> >
> > Fixes: 6beb7bd52e48 ("kvm: nVMX: Refactor nested_get_vmcs12_pages()")
> > Signed-off-by: Jim Mattson <jmattson@google.com>
> > ---
> >  arch/x86/kvm/vmx/nested.c | 15 ++++++++++++++-
> >  1 file changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 706c31821362..defd42201bb4 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -3175,6 +3175,15 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
> >                               offset_in_page(vmcs12->posted_intr_desc_addr));
> >                       vmcs_write64(POSTED_INTR_DESC_ADDR,
> >                                    pfn_to_hpa(map->pfn) + offset_in_page(vmcs12->posted_intr_desc_addr));
> > +             } else {
> > +                     /*
> > +                      * Defer the KVM_INTERNAL_ERROR exit until
> > +                      * someone tries to trigger posted interrupt
> > +                      * processing on this vCPU, to avoid breaking
> > +                      * existing kvm-unit-tests.
>
> Run the lines out to 80 chars.  Also, can we change the comment to tie it to
> CPU behavior in someway?  A few years down the road, "existing kvm-unit-tests"
> may not have any relevant meaning, and it's not like kvm-unit-tests is bug free
> either.  E.g. something like
>
>                         /*
>                          * Defer the KVM_INTERNAL_ERROR exit until posted
>                          * interrupt processing actually occurs on this vCPU.
>                          * Until that happens, the descriptor is not accessed,
>                          * and userspace can technically rely on that behavior.
>                          */
Okay...except for the fact that kvm will rather gratuitously process
posted interrupts in situations where hardware won't. That makes it
difficult to tie this to hardware behavior.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 02/12] KVM: x86: Wake up a vCPU when kvm_check_nested_events fails
  2021-05-24 23:24             ` Sean Christopherson
@ 2021-05-24 23:29               ` Jim Mattson
  2021-05-24 23:34                 ` Sean Christopherson
  0 siblings, 1 reply; 35+ messages in thread
From: Jim Mattson @ 2021-05-24 23:29 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm list, Oliver Upton

On Mon, May 24, 2021 at 4:24 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, May 24, 2021, Jim Mattson wrote:
> > On Mon, May 24, 2021 at 4:10 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Mon, May 24, 2021, Paolo Bonzini wrote:
> > > > On 24/05/21 18:39, Jim Mattson wrote:
> > > > > Without this patch, the accompanying selftest never wakes up from HLT
> > > > > in L2. If you can get the selftest to work without this patch, feel
> > > > > free to drop it.
> > > >
> > > > Ok, that's a pretty good reason.  I'll try to debug it.
> > >
> > > I don't think there's any debug necessary, the hack of unconditionally calling
> > > kvm_check_nested_events() in kvm_vcpu_running() ...
> >
> > We don't unconditionally call kvm_check_nested_events() in
> > kvm_vcpu_running(). We still call kvm_check_nested_events() only when
> > is_guest_mode(vcpu). The only change introduced in this patch is that
> > we stop ignoring the result.
>
> Doh, sorry, bad use of "unconditionally".  I meant "unconditionally when in L2". :-)

Again, the conditions under which we call kvm_check_nested_events are
unchanged. The only "hack" here is the hack of not ignoring the return
value.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 02/12] KVM: x86: Wake up a vCPU when kvm_check_nested_events fails
  2021-05-24 23:29               ` Jim Mattson
@ 2021-05-24 23:34                 ` Sean Christopherson
  0 siblings, 0 replies; 35+ messages in thread
From: Sean Christopherson @ 2021-05-24 23:34 UTC (permalink / raw)
  To: Jim Mattson; +Cc: Paolo Bonzini, kvm list, Oliver Upton

On Mon, May 24, 2021, Jim Mattson wrote:
> On Mon, May 24, 2021 at 4:24 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Mon, May 24, 2021, Jim Mattson wrote:
> > > On Mon, May 24, 2021 at 4:10 PM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Mon, May 24, 2021, Paolo Bonzini wrote:
> > > > > On 24/05/21 18:39, Jim Mattson wrote:
> > > > > > Without this patch, the accompanying selftest never wakes up from HLT
> > > > > > in L2. If you can get the selftest to work without this patch, feel
> > > > > > free to drop it.
> > > > >
> > > > > Ok, that's a pretty good reason.  I'll try to debug it.
> > > >
> > > > I don't think there's any debug necessary, the hack of unconditionally calling
> > > > kvm_check_nested_events() in kvm_vcpu_running() ...
> > >
> > > We don't unconditionally call kvm_check_nested_events() in
> > > kvm_vcpu_running(). We still call kvm_check_nested_events() only when
> > > is_guest_mode(vcpu). The only change introduced in this patch is that
> > > we stop ignoring the result.
> >
> > Doh, sorry, bad use of "unconditionally".  I meant "unconditionally when in L2". :-)
> 
> Again, the conditions under which we call kvm_check_nested_events are
> unchanged. The only "hack" here is the hack of not ignoring the return
> value.

I don't disagree, all I'm saying is that the existing code is a hack, and this
doesn't fix/cleanse that hack.  I agree that this patch is a good intermediate
change.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 07/12] KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable
  2021-05-24 23:27     ` Jim Mattson
@ 2021-05-24 23:45       ` Sean Christopherson
  2021-05-25  0:03         ` Jim Mattson
  0 siblings, 1 reply; 35+ messages in thread
From: Sean Christopherson @ 2021-05-24 23:45 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, Paolo Bonzini

On Mon, May 24, 2021, Jim Mattson wrote:
> On Mon, May 24, 2021 at 4:22 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, May 20, 2021, Jim Mattson wrote:
> > > Don't allow posted interrupts to modify a stale posted interrupt
> > > descriptor (including the initial value of 0).
> > >
> > > Empirical tests on real hardware reveal that a posted interrupt
> > > descriptor referencing an unbacked address has PCI bus error semantics
> > > (reads as all 1's; writes are ignored). However, kvm can't distinguish
> > > unbacked addresses from device-backed (MMIO) addresses, so it should
> > > really ask userspace for an MMIO completion. That's overly
> > > complicated, so just punt with KVM_INTERNAL_ERROR.
> > >
> > > Don't return the error until the posted interrupt descriptor is
> > > actually accessed. We don't want to break the existing kvm-unit-tests
> > > that assume they can launch an L2 VM with a posted interrupt
> > > descriptor that references MMIO space in L1.
> > >
> > > Fixes: 6beb7bd52e48 ("kvm: nVMX: Refactor nested_get_vmcs12_pages()")
> > > Signed-off-by: Jim Mattson <jmattson@google.com>
> > > ---
> > >  arch/x86/kvm/vmx/nested.c | 15 ++++++++++++++-
> > >  1 file changed, 14 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > index 706c31821362..defd42201bb4 100644
> > > --- a/arch/x86/kvm/vmx/nested.c
> > > +++ b/arch/x86/kvm/vmx/nested.c
> > > @@ -3175,6 +3175,15 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
> > >                               offset_in_page(vmcs12->posted_intr_desc_addr));
> > >                       vmcs_write64(POSTED_INTR_DESC_ADDR,
> > >                                    pfn_to_hpa(map->pfn) + offset_in_page(vmcs12->posted_intr_desc_addr));
> > > +             } else {
> > > +                     /*
> > > +                      * Defer the KVM_INTERNAL_ERROR exit until
> > > +                      * someone tries to trigger posted interrupt
> > > +                      * processing on this vCPU, to avoid breaking
> > > +                      * existing kvm-unit-tests.
> >
> > Run the lines out to 80 chars.  Also, can we change the comment to tie it to
> > CPU behavior in someway?  A few years down the road, "existing kvm-unit-tests"
> > may not have any relevant meaning, and it's not like kvm-unit-tests is bug free
> > either.  E.g. something like
> >
> >                         /*
> >                          * Defer the KVM_INTERNAL_ERROR exit until posted
> >                          * interrupt processing actually occurs on this vCPU.
> >                          * Until that happens, the descriptor is not accessed,
> >                          * and userspace can technically rely on that behavior.
> >                          */
> Okay...except for the fact that kvm will rather gratuitously process
> posted interrupts in situations where hardware won't. That makes it
> difficult to tie this to hardware behavior.

Hrm, true, but we can at say that KVM won't bail if there's zero chance of posted
interrupts being processed.  I hope?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 07/12] KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable
  2021-05-24 23:45       ` Sean Christopherson
@ 2021-05-25  0:03         ` Jim Mattson
  2021-05-25  0:11           ` Sean Christopherson
  0 siblings, 1 reply; 35+ messages in thread
From: Jim Mattson @ 2021-05-25  0:03 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm list, Paolo Bonzini

On Mon, May 24, 2021 at 4:45 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, May 24, 2021, Jim Mattson wrote:
> > On Mon, May 24, 2021 at 4:22 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Thu, May 20, 2021, Jim Mattson wrote:
> > > > Don't allow posted interrupts to modify a stale posted interrupt
> > > > descriptor (including the initial value of 0).
> > > >
> > > > Empirical tests on real hardware reveal that a posted interrupt
> > > > descriptor referencing an unbacked address has PCI bus error semantics
> > > > (reads as all 1's; writes are ignored). However, kvm can't distinguish
> > > > unbacked addresses from device-backed (MMIO) addresses, so it should
> > > > really ask userspace for an MMIO completion. That's overly
> > > > complicated, so just punt with KVM_INTERNAL_ERROR.
> > > >
> > > > Don't return the error until the posted interrupt descriptor is
> > > > actually accessed. We don't want to break the existing kvm-unit-tests
> > > > that assume they can launch an L2 VM with a posted interrupt
> > > > descriptor that references MMIO space in L1.
> > > >
> > > > Fixes: 6beb7bd52e48 ("kvm: nVMX: Refactor nested_get_vmcs12_pages()")
> > > > Signed-off-by: Jim Mattson <jmattson@google.com>
> > > > ---
> > > >  arch/x86/kvm/vmx/nested.c | 15 ++++++++++++++-
> > > >  1 file changed, 14 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > > index 706c31821362..defd42201bb4 100644
> > > > --- a/arch/x86/kvm/vmx/nested.c
> > > > +++ b/arch/x86/kvm/vmx/nested.c
> > > > @@ -3175,6 +3175,15 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
> > > >                               offset_in_page(vmcs12->posted_intr_desc_addr));
> > > >                       vmcs_write64(POSTED_INTR_DESC_ADDR,
> > > >                                    pfn_to_hpa(map->pfn) + offset_in_page(vmcs12->posted_intr_desc_addr));
> > > > +             } else {
> > > > +                     /*
> > > > +                      * Defer the KVM_INTERNAL_ERROR exit until
> > > > +                      * someone tries to trigger posted interrupt
> > > > +                      * processing on this vCPU, to avoid breaking
> > > > +                      * existing kvm-unit-tests.
> > >
> > > Run the lines out to 80 chars.  Also, can we change the comment to tie it to
> > > CPU behavior in someway?  A few years down the road, "existing kvm-unit-tests"
> > > may not have any relevant meaning, and it's not like kvm-unit-tests is bug free
> > > either.  E.g. something like
> > >
> > >                         /*
> > >                          * Defer the KVM_INTERNAL_ERROR exit until posted
> > >                          * interrupt processing actually occurs on this vCPU.
> > >                          * Until that happens, the descriptor is not accessed,
> > >                          * and userspace can technically rely on that behavior.
> > >                          */
> > Okay...except for the fact that kvm will rather gratuitously process
> > posted interrupts in situations where hardware won't. That makes it
> > difficult to tie this to hardware behavior.
>
> Hrm, true, but we can at say that KVM won't bail if there's zero chance of posted
> interrupts being processed.  I hope?
Zero chance in KVM, or zero chance on hardware?

For instance, set TPR high enough to block the posted interrupt vector
from being delivered, and there is zero chance of posted interrupts
being processed by hardware. However, if another L1 vCPU sends that
vector by IPI, there is a 100% chance that KVM will bail, because it
ignores TPR for processing posted interrupts.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 07/12] KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable
  2021-05-25  0:03         ` Jim Mattson
@ 2021-05-25  0:11           ` Sean Christopherson
  2021-05-25  0:15             ` Jim Mattson
  0 siblings, 1 reply; 35+ messages in thread
From: Sean Christopherson @ 2021-05-25  0:11 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, Paolo Bonzini

On Mon, May 24, 2021, Jim Mattson wrote:
> On Mon, May 24, 2021 at 4:45 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Mon, May 24, 2021, Jim Mattson wrote:
> > > On Mon, May 24, 2021 at 4:22 PM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Thu, May 20, 2021, Jim Mattson wrote:
> > > > > Don't allow posted interrupts to modify a stale posted interrupt
> > > > > descriptor (including the initial value of 0).
> > > > >
> > > > > Empirical tests on real hardware reveal that a posted interrupt
> > > > > descriptor referencing an unbacked address has PCI bus error semantics
> > > > > (reads as all 1's; writes are ignored). However, kvm can't distinguish
> > > > > unbacked addresses from device-backed (MMIO) addresses, so it should
> > > > > really ask userspace for an MMIO completion. That's overly
> > > > > complicated, so just punt with KVM_INTERNAL_ERROR.
> > > > >
> > > > > Don't return the error until the posted interrupt descriptor is
> > > > > actually accessed. We don't want to break the existing kvm-unit-tests
> > > > > that assume they can launch an L2 VM with a posted interrupt
> > > > > descriptor that references MMIO space in L1.
> > > > >
> > > > > Fixes: 6beb7bd52e48 ("kvm: nVMX: Refactor nested_get_vmcs12_pages()")
> > > > > Signed-off-by: Jim Mattson <jmattson@google.com>
> > > > > ---
> > > > >  arch/x86/kvm/vmx/nested.c | 15 ++++++++++++++-
> > > > >  1 file changed, 14 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > > > index 706c31821362..defd42201bb4 100644
> > > > > --- a/arch/x86/kvm/vmx/nested.c
> > > > > +++ b/arch/x86/kvm/vmx/nested.c
> > > > > @@ -3175,6 +3175,15 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
> > > > >                               offset_in_page(vmcs12->posted_intr_desc_addr));
> > > > >                       vmcs_write64(POSTED_INTR_DESC_ADDR,
> > > > >                                    pfn_to_hpa(map->pfn) + offset_in_page(vmcs12->posted_intr_desc_addr));
> > > > > +             } else {
> > > > > +                     /*
> > > > > +                      * Defer the KVM_INTERNAL_ERROR exit until
> > > > > +                      * someone tries to trigger posted interrupt
> > > > > +                      * processing on this vCPU, to avoid breaking
> > > > > +                      * existing kvm-unit-tests.
> > > >
> > > > Run the lines out to 80 chars.  Also, can we change the comment to tie it to
> > > > CPU behavior in someway?  A few years down the road, "existing kvm-unit-tests"
> > > > may not have any relevant meaning, and it's not like kvm-unit-tests is bug free
> > > > either.  E.g. something like
> > > >
> > > >                         /*
> > > >                          * Defer the KVM_INTERNAL_ERROR exit until posted
> > > >                          * interrupt processing actually occurs on this vCPU.
> > > >                          * Until that happens, the descriptor is not accessed,
> > > >                          * and userspace can technically rely on that behavior.
> > > >                          */
> > > Okay...except for the fact that kvm will rather gratuitously process
> > > posted interrupts in situations where hardware won't. That makes it
> > > difficult to tie this to hardware behavior.
> >
> > Hrm, true, but we can at say that KVM won't bail if there's zero chance of posted
> > interrupts being processed.  I hope?
> Zero chance in KVM, or zero chance on hardware?

I was hoping hardware...

> For instance, set TPR high enough to block the posted interrupt vector
> from being delivered, and there is zero chance of posted interrupts
> being processed by hardware. However, if another L1 vCPU sends that
> vector by IPI, there is a 100% chance that KVM will bail, because it
> ignores TPR for processing posted interrupts.

Can we instead word it along the lines of:

  Defer the KVM_INTERNAL_EXIT until KVM actually attempts to consume the posted
  interrupt descriptor on behalf of the vCPU.  Note, KVM may process posted
  interrupts when it architecturally should not.  Bugs aside, userspace can at
  least rely on KVM to not process posted interrupts if there is no (posted?)
  interrupt activity whatsoever.
				 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 07/12] KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable
  2021-05-25  0:11           ` Sean Christopherson
@ 2021-05-25  0:15             ` Jim Mattson
  2021-05-25  0:57               ` Sean Christopherson
  0 siblings, 1 reply; 35+ messages in thread
From: Jim Mattson @ 2021-05-25  0:15 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm list, Paolo Bonzini

On Mon, May 24, 2021 at 5:11 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Can we instead word it along the lines of:
>
>   Defer the KVM_INTERNAL_EXIT until KVM actually attempts to consume the posted
>   interrupt descriptor on behalf of the vCPU.  Note, KVM may process posted
>   interrupts when it architecturally should not.  Bugs aside, userspace can at
>   least rely on KVM to not process posted interrupts if there is no (posted?)
>   interrupt activity whatsoever.

How about:

Defer the KVM_INTERNAL_EXIT until KVM tries to access the contents of
the VMCS12 posted interrupt descriptor. (Note that KVM may do this
when it should not, per the architectural specification.)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 07/12] KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable
  2021-05-25  0:15             ` Jim Mattson
@ 2021-05-25  0:57               ` Sean Christopherson
  0 siblings, 0 replies; 35+ messages in thread
From: Sean Christopherson @ 2021-05-25  0:57 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, Paolo Bonzini

On Mon, May 24, 2021, Jim Mattson wrote:
> On Mon, May 24, 2021 at 5:11 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > Can we instead word it along the lines of:
> >
> >   Defer the KVM_INTERNAL_EXIT until KVM actually attempts to consume the posted
> >   interrupt descriptor on behalf of the vCPU.  Note, KVM may process posted
> >   interrupts when it architecturally should not.  Bugs aside, userspace can at
> >   least rely on KVM to not process posted interrupts if there is no (posted?)
> >   interrupt activity whatsoever.
> 
> How about:
> 
> Defer the KVM_INTERNAL_EXIT until KVM tries to access the contents of
> the VMCS12 posted interrupt descriptor. (Note that KVM may do this
> when it should not, per the architectural specification.)

Works for me!

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 05/12] KVM: x86: Add a return code to kvm_apic_accept_events
  2021-05-20 23:03 ` [PATCH 05/12] KVM: x86: Add a return code to kvm_apic_accept_events Jim Mattson
@ 2021-05-25 19:24   ` Reiji Watanabe
  2021-05-25 20:35     ` Jim Mattson
  0 siblings, 1 reply; 35+ messages in thread
From: Reiji Watanabe @ 2021-05-25 19:24 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm, Paolo Bonzini, Sean Christopherson

> @@ -9880,11 +9888,16 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
>  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>                                     struct kvm_mp_state *mp_state)
>  {
> +       int r = 0;
> +
>         vcpu_load(vcpu);
>         if (kvm_mpx_supported())
>                 kvm_load_guest_fpu(vcpu);
>
> -       kvm_apic_accept_events(vcpu);
> +       r = kvm_apic_accept_events(vcpu);
> +       if (r < 0)
> +               goto out;
> +
>         if ((vcpu->arch.mp_state == KVM_MP_STATE_HALTED ||
>              vcpu->arch.mp_state == KVM_MP_STATE_AP_RESET_HOLD) &&
>             vcpu->arch.pv.pv_unhalted)
> @@ -9892,6 +9905,7 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>         else
>                 mp_state->mp_state = vcpu->arch.mp_state;
>
> +out:
>         if (kvm_mpx_supported())
>                 kvm_put_guest_fpu(vcpu);
>         vcpu_put(vcpu);

With the change, if the return value from kvm_apic_accept_events()
is < 0, kvm_arch_vcpu_ioctl_get_mpstate(), which is called from
KVM_GET_MP_STATE ioctl, doesn't set mp_state returning 0 (success).
It leads KVM_GET_MP_STATE ioctl to return an undefined mp_state for
the success case.

Thanks,
Reiji

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 05/12] KVM: x86: Add a return code to kvm_apic_accept_events
  2021-05-25 19:24   ` Reiji Watanabe
@ 2021-05-25 20:35     ` Jim Mattson
  0 siblings, 0 replies; 35+ messages in thread
From: Jim Mattson @ 2021-05-25 20:35 UTC (permalink / raw)
  To: Reiji Watanabe; +Cc: kvm list, Paolo Bonzini, Sean Christopherson

On Tue, May 25, 2021 at 12:24 PM Reiji Watanabe <reijiw@google.com> wrote:
>
> > @@ -9880,11 +9888,16 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
> >  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
> >                                     struct kvm_mp_state *mp_state)
> >  {
> > +       int r = 0;
> > +
> >         vcpu_load(vcpu);
> >         if (kvm_mpx_supported())
> >                 kvm_load_guest_fpu(vcpu);
> >
> > -       kvm_apic_accept_events(vcpu);
> > +       r = kvm_apic_accept_events(vcpu);
> > +       if (r < 0)
> > +               goto out;
> > +
> >         if ((vcpu->arch.mp_state == KVM_MP_STATE_HALTED ||
> >              vcpu->arch.mp_state == KVM_MP_STATE_AP_RESET_HOLD) &&
> >             vcpu->arch.pv.pv_unhalted)
> > @@ -9892,6 +9905,7 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
> >         else
> >                 mp_state->mp_state = vcpu->arch.mp_state;
> >
> > +out:
> >         if (kvm_mpx_supported())
> >                 kvm_put_guest_fpu(vcpu);
> >         vcpu_put(vcpu);
>
> With the change, if the return value from kvm_apic_accept_events()
> is < 0, kvm_arch_vcpu_ioctl_get_mpstate(), which is called from
> KVM_GET_MP_STATE ioctl, doesn't set mp_state returning 0 (success).
> It leads KVM_GET_MP_STATE ioctl to return an undefined mp_state for
> the success case.

Yikes! I think I intended to return 'r' when it is less than 0 (e.g.
the -ENXIO I introduce later in the series). However, I'm not quite
sure what to do with values of r > 0. I'll look into it and send out
v2.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2021-05-25 20:35 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-20 23:03 [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Jim Mattson
2021-05-20 23:03 ` [PATCH 01/12] KVM: x86: Remove guest mode check from kvm_check_nested_events Jim Mattson
2021-05-20 23:03 ` [PATCH 02/12] KVM: x86: Wake up a vCPU when kvm_check_nested_events fails Jim Mattson
2021-05-24 15:43   ` Paolo Bonzini
2021-05-24 16:39     ` Jim Mattson
2021-05-24 16:43       ` Paolo Bonzini
2021-05-24 17:10         ` Jim Mattson
2021-05-24 23:10         ` Sean Christopherson
2021-05-24 23:23           ` Jim Mattson
2021-05-24 23:24             ` Sean Christopherson
2021-05-24 23:29               ` Jim Mattson
2021-05-24 23:34                 ` Sean Christopherson
2021-05-20 23:03 ` [PATCH 03/12] KVM: nVMX: Add a return code to vmx_complete_nested_posted_interrupt Jim Mattson
2021-05-20 23:03 ` [PATCH 04/12] KVM: x86: Add a return code to inject_pending_event Jim Mattson
2021-05-20 23:03 ` [PATCH 05/12] KVM: x86: Add a return code to kvm_apic_accept_events Jim Mattson
2021-05-25 19:24   ` Reiji Watanabe
2021-05-25 20:35     ` Jim Mattson
2021-05-20 23:03 ` [PATCH 06/12] KVM: nVMX: Fail on MMIO completion for nested posted interrupts Jim Mattson
2021-05-20 23:03 ` [PATCH 07/12] KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable Jim Mattson
2021-05-24 23:21   ` Sean Christopherson
2021-05-24 23:27     ` Jim Mattson
2021-05-24 23:45       ` Sean Christopherson
2021-05-25  0:03         ` Jim Mattson
2021-05-25  0:11           ` Sean Christopherson
2021-05-25  0:15             ` Jim Mattson
2021-05-25  0:57               ` Sean Christopherson
2021-05-20 23:03 ` [PATCH 08/12] KVM: selftests: Move APIC definitions into a separate file Jim Mattson
2021-05-20 23:03 ` [PATCH 09/12] KVM: selftests: Hoist APIC functions out of individual tests Jim Mattson
2021-05-20 23:03 ` [PATCH 10/12] KVM: selftests: Introduce x2APIC register manipulation functions Jim Mattson
2021-05-20 23:03 ` [PATCH 11/12] KVM: selftests: Introduce prepare_tpr_shadow Jim Mattson
2021-05-20 23:03 ` [PATCH 12/12] KVM: selftests: Add a test of an unbacked nested PI descriptor Jim Mattson
2021-05-21  0:58 ` [PATCH 00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue Sean Christopherson
2021-05-21 12:04 ` Jim Mattson
2021-05-24 15:50 ` Paolo Bonzini
2021-05-24 16:46 ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.