* [PATCH v4 00/12] KVM: x86: never write to memory from kvm_vcpu_check_block
@ 2022-09-21  0:31 Sean Christopherson
  2022-09-21  0:31 ` [PATCH v4 01/12] KVM: x86: make vendor code check for all nested events Sean Christopherson
                   ` (11 more replies)
  0 siblings, 12 replies; 23+ messages in thread
From: Sean Christopherson @ 2022-09-21  0:31 UTC (permalink / raw)
  To: Paolo Bonzini, Marc Zyngier, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Sean Christopherson
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Oliver Upton,
	Atish Patra, David Hildenbrand, kvm, linux-arm-kernel, kvmarm,
	linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
	Maxim Levitsky

Non-x86 folks, there's nothing interesting to see here, y'all got pulled
in because removing KVM_REQ_UNHALT requires deleting kvm_clear_request()
from arch code.

Note, this is based on:

	https://github.com/sean-jc/linux.git tags/kvm-x86-6.1-1

to pre-resolve conflicts with the event/exception cleanups in there.


In Paolo's words...

The following backtrace:

[ 1355.807187]  kvm_vcpu_map+0x159/0x190 [kvm]
[ 1355.807628]  nested_svm_vmexit+0x4c/0x7f0 [kvm_amd]
[ 1355.808036]  ? kvm_vcpu_block+0x54/0xa0 [kvm]
[ 1355.808450]  svm_check_nested_events+0x97/0x390 [kvm_amd]
[ 1355.808920]  kvm_check_nested_events+0x1c/0x40 [kvm] 
[ 1355.809396]  kvm_arch_vcpu_runnable+0x4e/0x190 [kvm]
[ 1355.809892]  kvm_vcpu_check_block+0x4f/0x100 [kvm]
[ 1355.811259]  kvm_vcpu_block+0x6b/0xa0 [kvm] 

can occur due to kmap being called in non-sleepable (!TASK_RUNNING) context.
The fix is to extend kvm_x86_ops->nested_ops.hv_timer_pending() to cover
all events not already checked in kvm_arch_vcpu_is_runnable(), and then
get rid of the annoying (and wrong) call to kvm_check_nested_events()
from kvm_vcpu_check_block().
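
For reference, the problematic shape in miniature (a condensed
paraphrase of kvm_vcpu_block() and the backtrace above, not the
verbatim kernel code):

	for (;;) {
		set_current_state(TASK_INTERRUPTIBLE);

		/*
		 * On SVM, this check can reach kvm_arch_vcpu_runnable() ->
		 * kvm_check_nested_events() -> nested_svm_vmexit() ->
		 * kvm_vcpu_map(), which may sleep -- hence the splat above.
		 */
		if (kvm_vcpu_check_block(vcpu) < 0)
			break;

		schedule();
	}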

Beware, this is not a complete fix, because kvm_guest_apic_has_interrupt()
might still _read_ memory from non-sleepable context.  The fix here is
probably to make kvm_arch_vcpu_is_runnable() return -EAGAIN, and in that
case do a round of kvm_vcpu_check_block() polling in sleepable context.
Nevertheless, it is a good start as it pushes the vmexit into vcpu_block().
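
To illustrate the -EAGAIN idea above (purely hypothetical, nothing in
this series implements it, and kvm_arch_vcpu_is_runnable() does not
return -EAGAIN today):

	ret = kvm_arch_vcpu_is_runnable(vcpu);
	if (ret == -EAGAIN) {
		/* Runnability is ambiguous: poll again while sleepable. */
		__set_current_state(TASK_RUNNING);
		ret = kvm_vcpu_check_block(vcpu);
	}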

The series also does a small cleanup pass on kvm_vcpu_check_block(),
removing KVM_REQ_UNHALT in favor of simply calling kvm_arch_vcpu_runnable()
again.  Now that kvm_check_nested_events() is not called anymore by
kvm_arch_vcpu_runnable(), it is much easier to see that KVM will never
consume the event that caused kvm_vcpu_has_events() to return true,
and therefore it is safe to evaluate it again.
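
Concretely, in x86's vcpu_block() the halt-exit check goes from

	if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
		return 1;

to

	if (!kvm_arch_vcpu_runnable(vcpu))
		return 1;

(see patch 11), after which patch 12 deletes the request entirely.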

The alternative of propagating the return value of
kvm_arch_vcpu_runnable() up to kvm_vcpu_{block,halt}() is inferior
because it does not quite get right the edge cases where the vCPU becomes
runnable right before schedule() or right after kvm_vcpu_check_block().
While these edge cases are unlikely to truly matter in practice, it is
also pointless to get them "wrong".

v4:
  - Make event request if INIT/SIPI is pending when GIF=>1 (SVM) and
    on nested VM-Enter (VMX).
  - Make an event request at VMXOFF iff it's necessary.
  - Keep the INIT/SIPI pending vs. blocked checks separate (for the
    above nSVM/nVMX fixes).
  - Check the result of kvm_check_nested_events() in vcpu_block().
  - Rename INIT/SIPI helpers (hopefully we'll eventually rename all of
    the related collateral, e.g. "pending_events" is so misleading).
  - Drop pending INIT/SIPI snapshot to avoid creating weird, conflicting
    code when kvm_check_nested_events() is called by vcpu_block().

v3:
  - https://lore.kernel.org/all/20220822170659.2527086-1-pbonzini@redhat.com
  - do not propagate the return value of kvm_arch_vcpu_runnable() up to
    kvm_vcpu_{block,halt}()
  - move and reformat the comment in vcpu_block()
  - move KVM_REQ_UNHALT removal last

Paolo Bonzini (5):
  KVM: x86: make vendor code check for all nested events
  KVM: x86: lapic does not have to process INIT if it is blocked
  KVM: x86: never write to memory from kvm_vcpu_check_block()
  KVM: mips, x86: do not rely on KVM_REQ_UNHALT
  KVM: remove KVM_REQ_UNHALT

Sean Christopherson (7):
  KVM: nVMX: Make an event request when pending an MTF nested VM-Exit
  KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowed
  KVM: x86: Rename kvm_apic_has_events() to make it INIT/SIPI specific
  KVM: SVM: Make an event request if INIT or SIPI is pending when GIF is
    set
  KVM: nVMX: Make an event request if INIT or SIPI is pending on
    VM-Enter
  KVM: nVMX: Make event request on VMXOFF iff INIT/SIPI is pending
  KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested
    events

 Documentation/virt/kvm/vcpu-requests.rst | 28 +--------------
 arch/arm64/kvm/arm.c                     |  1 -
 arch/mips/kvm/emulate.c                  |  6 ++--
 arch/powerpc/kvm/book3s_pr.c             |  1 -
 arch/powerpc/kvm/book3s_pr_papr.c        |  1 -
 arch/powerpc/kvm/booke.c                 |  1 -
 arch/powerpc/kvm/powerpc.c               |  1 -
 arch/riscv/kvm/vcpu_insn.c               |  1 -
 arch/s390/kvm/kvm-s390.c                 |  2 --
 arch/x86/include/asm/kvm_host.h          |  2 +-
 arch/x86/kvm/lapic.c                     | 38 ++++++--------------
 arch/x86/kvm/lapic.h                     |  9 ++++-
 arch/x86/kvm/svm/svm.c                   |  3 +-
 arch/x86/kvm/vmx/nested.c                | 33 +++++++++--------
 arch/x86/kvm/vmx/vmx.c                   |  6 ++--
 arch/x86/kvm/x86.c                       | 46 +++++++++++++++---------
 arch/x86/kvm/x86.h                       |  5 ---
 arch/x86/kvm/xen.c                       |  1 -
 include/linux/kvm_host.h                 |  3 +-
 virt/kvm/kvm_main.c                      |  4 +--
 20 files changed, 79 insertions(+), 113 deletions(-)


base-commit: 5df50a4a9b60afba4dd2be76d0f0fb8ae8c9beab
-- 
2.37.3.968.ga6b4b080e4-goog


* [PATCH v4 01/12] KVM: x86: make vendor code check for all nested events
  2022-09-21  0:31 [PATCH v4 00/12] KVM: x86: never write to memory from kvm_vcpu_check_block Sean Christopherson
@ 2022-09-21  0:31 ` Sean Christopherson
  2022-09-21  0:31 ` [PATCH v4 02/12] KVM: nVMX: Make an event request when pending an MTF nested VM-Exit Sean Christopherson
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Sean Christopherson @ 2022-09-21  0:31 UTC (permalink / raw)
  To: Paolo Bonzini, Marc Zyngier, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Sean Christopherson
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Oliver Upton,
	Atish Patra, David Hildenbrand, kvm, linux-arm-kernel, kvmarm,
	linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
	Maxim Levitsky

From: Paolo Bonzini <pbonzini@redhat.com>

Interrupts, NMIs etc. sent while in guest mode are already handled
properly by the *_interrupt_allowed callbacks, but there are other
events specific to guest mode that can make a vCPU runnable.

In the case of VMX there are two: the preemption timer and the
monitor trap flag (MTF).  The VMX preemption timer is already special
cased via the hv_timer_pending callback, but the purpose of the callback
can easily be extended to MTF, or in fact to any other event that can
occur only in guest mode.

Rename the callback and add an MTF check; kvm_arch_vcpu_runnable()
now can return true if an MTF is pending, without relying on
kvm_vcpu_running()'s call to kvm_check_nested_events().  Until that call
is removed, however, the patch introduces no functional change.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 arch/x86/kvm/vmx/nested.c       | 8 +++++++-
 arch/x86/kvm/x86.c              | 8 ++++----
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ee940c4c0f64..c03590d1c5e1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1649,7 +1649,7 @@ struct kvm_x86_nested_ops {
 	bool (*is_exception_vmexit)(struct kvm_vcpu *vcpu, u8 vector,
 				    u32 error_code);
 	int (*check_events)(struct kvm_vcpu *vcpu);
-	bool (*hv_timer_pending)(struct kvm_vcpu *vcpu);
+	bool (*has_events)(struct kvm_vcpu *vcpu);
 	void (*triple_fault)(struct kvm_vcpu *vcpu);
 	int (*get_state)(struct kvm_vcpu *vcpu,
 			 struct kvm_nested_state __user *user_kvm_nested_state,
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4da0558943ce..85318d803f4f 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3929,6 +3929,12 @@ static bool nested_vmx_preemption_timer_pending(struct kvm_vcpu *vcpu)
 	       to_vmx(vcpu)->nested.preemption_timer_expired;
 }
 
+static bool vmx_has_nested_events(struct kvm_vcpu *vcpu)
+{
+	return nested_vmx_preemption_timer_pending(vcpu) ||
+	       to_vmx(vcpu)->nested.mtf_pending;
+}
+
 /*
  * Per the Intel SDM's table "Priority Among Concurrent Events", with minor
  * edits to fill in missing examples, e.g. #DB due to split-lock accesses,
@@ -6971,7 +6977,7 @@ struct kvm_x86_nested_ops vmx_nested_ops = {
 	.leave_nested = vmx_leave_nested,
 	.is_exception_vmexit = nested_vmx_is_exception_vmexit,
 	.check_events = vmx_check_nested_events,
-	.hv_timer_pending = nested_vmx_preemption_timer_pending,
+	.has_events = vmx_has_nested_events,
 	.triple_fault = nested_vmx_triple_fault,
 	.get_state = vmx_get_nested_state,
 	.set_state = vmx_set_nested_state,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5b8328cb6c14..e1a25e46dbf7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9966,8 +9966,8 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
 	}
 
 	if (is_guest_mode(vcpu) &&
-	    kvm_x86_ops.nested_ops->hv_timer_pending &&
-	    kvm_x86_ops.nested_ops->hv_timer_pending(vcpu))
+	    kvm_x86_ops.nested_ops->has_events &&
+	    kvm_x86_ops.nested_ops->has_events(vcpu))
 		*req_immediate_exit = true;
 
 	WARN_ON(kvm_is_exception_pending(vcpu));
@@ -12792,8 +12792,8 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 		return true;
 
 	if (is_guest_mode(vcpu) &&
-	    kvm_x86_ops.nested_ops->hv_timer_pending &&
-	    kvm_x86_ops.nested_ops->hv_timer_pending(vcpu))
+	    kvm_x86_ops.nested_ops->has_events &&
+	    kvm_x86_ops.nested_ops->has_events(vcpu))
 		return true;
 
 	if (kvm_xen_has_pending_events(vcpu))
-- 
2.37.3.968.ga6b4b080e4-goog


* [PATCH v4 02/12] KVM: nVMX: Make an event request when pending an MTF nested VM-Exit
  2022-09-21  0:31 [PATCH v4 00/12] KVM: x86: never write to memory from kvm_vcpu_check_block Sean Christopherson
  2022-09-21  0:31 ` [PATCH v4 01/12] KVM: x86: make vendor code check for all nested events Sean Christopherson
@ 2022-09-21  0:31 ` Sean Christopherson
  2022-09-21  0:31 ` [PATCH v4 03/12] KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowed Sean Christopherson
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Sean Christopherson @ 2022-09-21  0:31 UTC (permalink / raw)
  To: Paolo Bonzini, Marc Zyngier, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Sean Christopherson
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Oliver Upton,
	Atish Patra, David Hildenbrand, kvm, linux-arm-kernel, kvmarm,
	linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
	Maxim Levitsky

Set KVM_REQ_EVENT when MTF becomes pending to ensure that KVM will run
through inject_pending_event() and thus vmx_check_nested_events() prior
to re-entering the guest.

MTF currently works by virtue of KVM's hack that calls
kvm_check_nested_events() from kvm_vcpu_running(), but that hack will
be removed in the near future.  Until that call is removed, the patch
introduces no real functional change.

Fixes: 5ef8acbdd687 ("KVM: nVMX: Emulate MTF when performing instruction emulation")
Cc: stable@vger.kernel.org
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 3 +++
 arch/x86/kvm/vmx/vmx.c    | 6 ++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 85318d803f4f..3a080051a4ec 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6632,6 +6632,9 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
 	if (ret)
 		goto error_guest_mode;
 
+	if (vmx->nested.mtf_pending)
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+
 	return 0;
 
 error_guest_mode:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 94c314dc2393..9dba04b6b019 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1665,10 +1665,12 @@ static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu)
 	    (!vcpu->arch.exception.pending ||
 	     vcpu->arch.exception.vector == DB_VECTOR) &&
 	    (!vcpu->arch.exception_vmexit.pending ||
-	     vcpu->arch.exception_vmexit.vector == DB_VECTOR))
+	     vcpu->arch.exception_vmexit.vector == DB_VECTOR)) {
 		vmx->nested.mtf_pending = true;
-	else
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+	} else {
 		vmx->nested.mtf_pending = false;
+	}
 }
 
 static int vmx_skip_emulated_instruction(struct kvm_vcpu *vcpu)
-- 
2.37.3.968.ga6b4b080e4-goog


* [PATCH v4 03/12] KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowed
  2022-09-21  0:31 [PATCH v4 00/12] KVM: x86: never write to memory from kvm_vcpu_check_block Sean Christopherson
  2022-09-21  0:31 ` [PATCH v4 01/12] KVM: x86: make vendor code check for all nested events Sean Christopherson
  2022-09-21  0:31 ` [PATCH v4 02/12] KVM: nVMX: Make an event request when pending an MTF nested VM-Exit Sean Christopherson
@ 2022-09-21  0:31 ` Sean Christopherson
  2022-09-21  0:31 ` [PATCH v4 04/12] KVM: x86: Rename kvm_apic_has_events() to make it INIT/SIPI specific Sean Christopherson
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Sean Christopherson @ 2022-09-21  0:31 UTC (permalink / raw)
  To: Paolo Bonzini, Marc Zyngier, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Sean Christopherson
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Oliver Upton,
	Atish Patra, David Hildenbrand, kvm, linux-arm-kernel, kvmarm,
	linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
	Maxim Levitsky

Rename and invert kvm_vcpu_latch_init() to kvm_apic_init_sipi_allowed()
so as to match the behavior of {interrupt,nmi,smi}_allowed(), and expose
the helper so that it can be used by kvm_vcpu_has_events() to determine
whether or not an INIT or SIPI is pending _and_ can be taken immediately.

Opportunistically replaced usage of the "latch" terminology with "blocked"
and/or "allowed", again to align with KVM's terminology used for all other
event types.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.c | 4 ++--
 arch/x86/kvm/lapic.h | 7 +++++++
 arch/x86/kvm/x86.c   | 9 +++++----
 arch/x86/kvm/x86.h   | 5 -----
 4 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9dda989a1cf0..2bd90effc653 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -3051,14 +3051,14 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 	}
 
 	/*
-	 * INITs are latched while CPU is in specific states
+	 * INITs are blocked while CPU is in specific states
 	 * (SMM, VMX root mode, SVM with GIF=0).
 	 * Because a CPU cannot be in these states immediately
 	 * after it has processed an INIT signal (and thus in
 	 * KVM_MP_STATE_INIT_RECEIVED state), just eat SIPIs
 	 * and leave the INIT pending.
 	 */
-	if (kvm_vcpu_latch_init(vcpu)) {
+	if (!kvm_apic_init_sipi_allowed(vcpu)) {
 		WARN_ON_ONCE(vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED);
 		if (test_bit(KVM_APIC_SIPI, &pe))
 			clear_bit(KVM_APIC_SIPI, &apic->pending_events);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 117a46df5cc1..c3ce6b0b1ea3 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -7,6 +7,7 @@
 #include <linux/kvm_host.h>
 
 #include "hyperv.h"
+#include "kvm_cache_regs.h"
 
 #define KVM_APIC_INIT		0
 #define KVM_APIC_SIPI		1
@@ -228,6 +229,12 @@ static inline bool kvm_apic_has_events(struct kvm_vcpu *vcpu)
 	return lapic_in_kernel(vcpu) && vcpu->arch.apic->pending_events;
 }
 
+static inline bool kvm_apic_init_sipi_allowed(struct kvm_vcpu *vcpu)
+{
+	return !is_smm(vcpu) &&
+	       !static_call(kvm_x86_apic_init_signal_blocked)(vcpu);
+}
+
 static inline bool kvm_lowest_prio_delivery(struct kvm_lapic_irq *irq)
 {
 	return (irq->delivery_mode == APIC_DM_LOWEST ||
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e1a25e46dbf7..59be7b16b92f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11293,11 +11293,12 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 		goto out;
 
 	/*
-	 * KVM_MP_STATE_INIT_RECEIVED means the processor is in
-	 * INIT state; latched init should be reported using
-	 * KVM_SET_VCPU_EVENTS, so reject it here.
+	 * Pending INITs are reported using KVM_SET_VCPU_EVENTS, disallow
+	 * forcing the guest into INIT/SIPI if those events are supposed to be
+	 * blocked.  KVM prioritizes SMI over INIT, so reject INIT/SIPI state
+	 * if an SMI is pending as well.
 	 */
-	if ((kvm_vcpu_latch_init(vcpu) || vcpu->arch.smi_pending) &&
+	if ((!kvm_apic_init_sipi_allowed(vcpu) || vcpu->arch.smi_pending) &&
 	    (mp_state->mp_state == KVM_MP_STATE_SIPI_RECEIVED ||
 	     mp_state->mp_state == KVM_MP_STATE_INIT_RECEIVED))
 		goto out;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index a784ff90740b..829d3134c1eb 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -275,11 +275,6 @@ static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
 	return !(kvm->arch.disabled_quirks & quirk);
 }
 
-static inline bool kvm_vcpu_latch_init(struct kvm_vcpu *vcpu)
-{
-	return is_smm(vcpu) || static_call(kvm_x86_apic_init_signal_blocked)(vcpu);
-}
-
 void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
 
 u64 get_kvmclock_ns(struct kvm *kvm);
-- 
2.37.3.968.ga6b4b080e4-goog


* [PATCH v4 04/12] KVM: x86: Rename kvm_apic_has_events() to make it INIT/SIPI specific
  2022-09-21  0:31 [PATCH v4 00/12] KVM: x86: never write to memory from kvm_vcpu_check_block Sean Christopherson
                   ` (2 preceding siblings ...)
  2022-09-21  0:31 ` [PATCH v4 03/12] KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowed Sean Christopherson
@ 2022-09-21  0:31 ` Sean Christopherson
  2022-09-21  0:31 ` [PATCH v4 05/12] KVM: x86: lapic does not have to process INIT if it is blocked Sean Christopherson
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Sean Christopherson @ 2022-09-21  0:31 UTC (permalink / raw)
  To: Paolo Bonzini, Marc Zyngier, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Sean Christopherson
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Oliver Upton,
	Atish Patra, David Hildenbrand, kvm, linux-arm-kernel, kvmarm,
	linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
	Maxim Levitsky

Rename kvm_apic_has_events() to kvm_apic_has_pending_init_or_sipi() so
that it's more obvious that "events" really just means "INIT or SIPI".

Opportunistically clean up a weirdly worded comment that referenced
kvm_apic_has_events() instead of kvm_apic_accept_events().

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.h | 2 +-
 arch/x86/kvm/x86.c   | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index c3ce6b0b1ea3..a5ac4a5a5179 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -224,7 +224,7 @@ static inline bool kvm_vcpu_apicv_active(struct kvm_vcpu *vcpu)
 	return lapic_in_kernel(vcpu) && vcpu->arch.apic->apicv_active;
 }
 
-static inline bool kvm_apic_has_events(struct kvm_vcpu *vcpu)
+static inline bool kvm_apic_has_pending_init_or_sipi(struct kvm_vcpu *vcpu)
 {
 	return lapic_in_kernel(vcpu) && vcpu->arch.apic->pending_events;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 59be7b16b92f..16a24dd28f26 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11920,8 +11920,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 		struct fpstate *fpstate = vcpu->arch.guest_fpu.fpstate;
 
 		/*
-		 * To avoid have the INIT path from kvm_apic_has_events() that be
-		 * called with loaded FPU and does not let userspace fix the state.
+		 * All paths that lead to INIT are required to load the guest's
+		 * FPU state (because most paths are buried in KVM_RUN).
 		 */
 		if (init_event)
 			kvm_put_guest_fpu(vcpu);
@@ -12765,7 +12765,7 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 	if (!list_empty_careful(&vcpu->async_pf.done))
 		return true;
 
-	if (kvm_apic_has_events(vcpu))
+	if (kvm_apic_has_pending_init_or_sipi(vcpu))
 		return true;
 
 	if (vcpu->arch.pv.pv_unhalted)
-- 
2.37.3.968.ga6b4b080e4-goog


* [PATCH v4 05/12] KVM: x86: lapic does not have to process INIT if it is blocked
  2022-09-21  0:31 [PATCH v4 00/12] KVM: x86: never write to memory from kvm_vcpu_check_block Sean Christopherson
                   ` (3 preceding siblings ...)
  2022-09-21  0:31 ` [PATCH v4 04/12] KVM: x86: Rename kvm_apic_has_events() to make it INIT/SIPI specific Sean Christopherson
@ 2022-09-21  0:31 ` Sean Christopherson
  2022-09-21  0:31 ` [PATCH v4 06/12] KVM: SVM: Make an event request if INIT or SIPI is pending when GIF is set Sean Christopherson
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Sean Christopherson @ 2022-09-21  0:31 UTC (permalink / raw)
  To: Paolo Bonzini, Marc Zyngier, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Sean Christopherson
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Oliver Upton,
	Atish Patra, David Hildenbrand, kvm, linux-arm-kernel, kvmarm,
	linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
	Maxim Levitsky

From: Paolo Bonzini <pbonzini@redhat.com>

Do not return true from kvm_vcpu_has_events() if the vCPU isn't going to
immediately process a pending INIT/SIPI.  INIT/SIPI shouldn't be treated
as wake events if they are blocked.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[sean: rebase onto refactored INIT/SIPI helpers, massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 16a24dd28f26..dcc675d4e44b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12765,7 +12765,8 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 	if (!list_empty_careful(&vcpu->async_pf.done))
 		return true;
 
-	if (kvm_apic_has_pending_init_or_sipi(vcpu))
+	if (kvm_apic_has_pending_init_or_sipi(vcpu) &&
+	    kvm_apic_init_sipi_allowed(vcpu))
 		return true;
 
 	if (vcpu->arch.pv.pv_unhalted)
-- 
2.37.3.968.ga6b4b080e4-goog


* [PATCH v4 06/12] KVM: SVM: Make an event request if INIT or SIPI is pending when GIF is set
  2022-09-21  0:31 [PATCH v4 00/12] KVM: x86: never write to memory from kvm_vcpu_check_block Sean Christopherson
                   ` (4 preceding siblings ...)
  2022-09-21  0:31 ` [PATCH v4 05/12] KVM: x86: lapic does not have to process INIT if it is blocked Sean Christopherson
@ 2022-09-21  0:31 ` Sean Christopherson
  2022-09-21  0:31 ` [PATCH v4 07/12] KVM: nVMX: Make an event request if INIT or SIPI is pending on VM-Enter Sean Christopherson
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Sean Christopherson @ 2022-09-21  0:31 UTC (permalink / raw)
  To: Paolo Bonzini, Marc Zyngier, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Sean Christopherson
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Oliver Upton,
	Atish Patra, David Hildenbrand, kvm, linux-arm-kernel, kvmarm,
	linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
	Maxim Levitsky

Set KVM_REQ_EVENT if INIT or SIPI is pending when the guest enables GIF.
INIT in particular is blocked when GIF=0 and needs to be processed when
GIF is toggled to '1'.  This bug has been masked by (a) KVM calling
->check_nested_events() in the core run loop and (b) hypervisors toggling
GIF from 0=>1 only when entering guest mode (L1 entering L2).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index dd599afc85f5..58f0077d9357 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2339,7 +2339,8 @@ void svm_set_gif(struct vcpu_svm *svm, bool value)
 		enable_gif(svm);
 		if (svm->vcpu.arch.smi_pending ||
 		    svm->vcpu.arch.nmi_pending ||
-		    kvm_cpu_has_injectable_intr(&svm->vcpu))
+		    kvm_cpu_has_injectable_intr(&svm->vcpu) ||
+		    kvm_apic_has_pending_init_or_sipi(&svm->vcpu))
 			kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
 	} else {
 		disable_gif(svm);
-- 
2.37.3.968.ga6b4b080e4-goog


* [PATCH v4 07/12] KVM: nVMX: Make an event request if INIT or SIPI is pending on VM-Enter
  2022-09-21  0:31 [PATCH v4 00/12] KVM: x86: never write to memory from kvm_vcpu_check_block Sean Christopherson
                   ` (5 preceding siblings ...)
  2022-09-21  0:31 ` [PATCH v4 06/12] KVM: SVM: Make an event request if INIT or SIPI is pending when GIF is set Sean Christopherson
@ 2022-09-21  0:31 ` Sean Christopherson
  2022-09-21  0:31 ` [PATCH v4 08/12] KVM: nVMX: Make event request on VMXOFF iff INIT/SIPI is pending Sean Christopherson
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Sean Christopherson @ 2022-09-21  0:31 UTC (permalink / raw)
  To: Paolo Bonzini, Marc Zyngier, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Sean Christopherson
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Oliver Upton,
	Atish Patra, David Hildenbrand, kvm, linux-arm-kernel, kvmarm,
	linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
	Maxim Levitsky

Evaluate interrupts, i.e. set KVM_REQ_EVENT, if INIT or SIPI is pending
when emulating nested VM-Enter.  INIT is blocked while the CPU is in VMX
root mode, but not in VMX non-root mode, i.e. INIT becomes unblocked on
VM-Enter.
This bug has been masked by KVM calling ->check_nested_events() in the
core run loop, but that hack will be fixed in the near future.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 3a080051a4ec..5922531f6c52 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3377,6 +3377,8 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 		(CPU_BASED_INTR_WINDOW_EXITING | CPU_BASED_NMI_WINDOW_EXITING);
 	if (likely(!evaluate_pending_interrupts) && kvm_vcpu_apicv_active(vcpu))
 		evaluate_pending_interrupts |= vmx_has_apicv_interrupt(vcpu);
+	if (!evaluate_pending_interrupts)
+		evaluate_pending_interrupts |= kvm_apic_has_pending_init_or_sipi(vcpu);
 
 	if (!vmx->nested.nested_run_pending ||
 	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
@@ -3457,18 +3459,10 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 	}
 
 	/*
-	 * If L1 had a pending IRQ/NMI until it executed
-	 * VMLAUNCH/VMRESUME which wasn't delivered because it was
-	 * disallowed (e.g. interrupts disabled), L0 needs to
-	 * evaluate if this pending event should cause an exit from L2
-	 * to L1 or delivered directly to L2 (e.g. In case L1 don't
-	 * intercept EXTERNAL_INTERRUPT).
-	 *
-	 * Usually this would be handled by the processor noticing an
-	 * IRQ/NMI window request, or checking RVI during evaluation of
-	 * pending virtual interrupts.  However, this setting was done
-	 * on VMCS01 and now VMCS02 is active instead. Thus, we force L0
-	 * to perform pending event evaluation by requesting a KVM_REQ_EVENT.
+	 * Re-evaluate pending events if L1 had a pending IRQ/NMI/INIT/SIPI
+	 * when it executed VMLAUNCH/VMRESUME, as entering non-root mode can
+	 * effectively unblock various events, e.g. INIT/SIPI cause VM-Exit
+	 * unconditionally.
 	 */
 	if (unlikely(evaluate_pending_interrupts))
 		kvm_make_request(KVM_REQ_EVENT, vcpu);
-- 
2.37.3.968.ga6b4b080e4-goog


* [PATCH v4 08/12] KVM: nVMX: Make event request on VMXOFF iff INIT/SIPI is pending
  2022-09-21  0:31 [PATCH v4 00/12] KVM: x86: never write to memory from kvm_vcpu_check_block Sean Christopherson
                   ` (6 preceding siblings ...)
  2022-09-21  0:31 ` [PATCH v4 07/12] KVM: nVMX: Make an event request if INIT or SIPI is pending on VM-Enter Sean Christopherson
@ 2022-09-21  0:31 ` Sean Christopherson
  2022-09-21  0:31 ` [PATCH v4 09/12] KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested events Sean Christopherson
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Sean Christopherson @ 2022-09-21  0:31 UTC (permalink / raw)
  To: Paolo Bonzini, Marc Zyngier, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Sean Christopherson
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Oliver Upton,
	Atish Patra, David Hildenbrand, kvm, linux-arm-kernel, kvmarm,
	linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
	Maxim Levitsky

Explicitly check for a pending INIT/SIPI event when emulating VMXOFF
instead of blindly making an event request.  There's obviously no need
to evaluate events if none are pending.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 5922531f6c52..8f67a9c4a287 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5193,8 +5193,8 @@ static int handle_vmxoff(struct kvm_vcpu *vcpu)
 
 	free_nested(vcpu);
 
-	/* Process a latched INIT during time CPU was in VMX operation */
-	kvm_make_request(KVM_REQ_EVENT, vcpu);
+	if (kvm_apic_has_pending_init_or_sipi(vcpu))
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
 
 	return nested_vmx_succeed(vcpu);
 }
-- 
2.37.3.968.ga6b4b080e4-goog


* [PATCH v4 09/12] KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested events
  2022-09-21  0:31 [PATCH v4 00/12] KVM: x86: never write to memory from kvm_vcpu_check_block Sean Christopherson
                   ` (7 preceding siblings ...)
  2022-09-21  0:31 ` [PATCH v4 08/12] KVM: nVMX: Make event request on VMXOFF iff INIT/SIPI is pending Sean Christopherson
@ 2022-09-21  0:31 ` Sean Christopherson
  2022-09-21  0:31 ` [PATCH v4 10/12] KVM: x86: never write to memory from kvm_vcpu_check_block() Sean Christopherson
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Sean Christopherson @ 2022-09-21  0:31 UTC (permalink / raw)
  To: Paolo Bonzini, Marc Zyngier, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Sean Christopherson
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Oliver Upton,
	Atish Patra, David Hildenbrand, kvm, linux-arm-kernel, kvmarm,
	linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
	Maxim Levitsky

Don't snapshot pending INIT/SIPI events prior to checking nested events;
architecturally there's nothing wrong with KVM processing (dropping) a
SIPI that is received immediately after synthesizing a VM-Exit.  Taking
and consuming the snapshot makes the flow way more subtle than it needs
to be, e.g. nVMX consumes/clears events that trigger VM-Exit (INIT/SIPI),
and so at first glance it appears that KVM is double-dipping on pending
INITs and SIPIs.  But that's not the case because INIT is blocked
unconditionally in VMX root mode the CPU cannot be in wait-for_SIPI after
VM-Exit, i.e. the paths that truly consume the snapshot are unreachable
if apic->pending_events is modified by kvm_check_nested_events().

nSVM is a similar story as GIF is cleared by the CPU on VM-Exit; INIT is
blocked regardless of whether or not it was pending prior to VM-Exit.

Drop the snapshot logic so that a future fix doesn't create weirdness
when kvm_vcpu_running()'s call to kvm_check_nested_events() is moved to
vcpu_block().  In that case, kvm_check_nested_events() will be called
immediately before kvm_apic_accept_events(), which raises the obvious
question of why that change doesn't break the snapshot logic.

Note, there is a subtle functional change.  Previously, KVM would clear
pending SIPIs if and only if SIPI was pending prior to VM-Exit, whereas now
KVM clears pending SIPIs unconditionally if INIT+SIPI are blocked.  The
latter is architecturally allowed, as SIPI is ignored if the CPU is not
in wait-for-SIPI mode (arguably, KVM should be even more aggressive in
dropping SIPIs).  It is software's responsibility to ensure the SIPI is
delivered, i.e. software shouldn't be firing INIT-SIPI at a CPU until
it knows with 100% certainty that the target CPU isn't in VMX root mode.

Furthermore, the existing code is extra weird as SIPIs that arrive after
VM-Exit _are_ dropped if there also happened to be a pending SIPI before
VM-Exit.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.c | 36 ++++++++++--------------------------
 1 file changed, 10 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 2bd90effc653..d7639d126e6c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -3025,17 +3025,8 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	u8 sipi_vector;
 	int r;
-	unsigned long pe;
 
-	if (!lapic_in_kernel(vcpu))
-		return 0;
-
-	/*
-	 * Read pending events before calling the check_events
-	 * callback.
-	 */
-	pe = smp_load_acquire(&apic->pending_events);
-	if (!pe)
+	if (!kvm_apic_has_pending_init_or_sipi(vcpu))
 		return 0;
 
 	if (is_guest_mode(vcpu)) {
@@ -3043,38 +3034,31 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 		if (r < 0)
 			return r == -EBUSY ? 0 : r;
 		/*
-		 * If an event has happened and caused a vmexit,
-		 * we know INITs are latched and therefore
-		 * we will not incorrectly deliver an APIC
-		 * event instead of a vmexit.
+		 * Continue processing INIT/SIPI even if a nested VM-Exit
+		 * occurred, e.g. pending SIPIs should be dropped if INIT+SIPI
+		 * are blocked as a result of transitioning to VMX root mode.
 		 */
 	}
 
 	/*
-	 * INITs are blocked while CPU is in specific states
-	 * (SMM, VMX root mode, SVM with GIF=0).
-	 * Because a CPU cannot be in these states immediately
-	 * after it has processed an INIT signal (and thus in
-	 * KVM_MP_STATE_INIT_RECEIVED state), just eat SIPIs
-	 * and leave the INIT pending.
+	 * INITs are blocked while CPU is in specific states (SMM, VMX root
+	 * mode, SVM with GIF=0), while SIPIs are dropped if the CPU isn't in
+	 * wait-for-SIPI (WFS).
 	 */
 	if (!kvm_apic_init_sipi_allowed(vcpu)) {
 		WARN_ON_ONCE(vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED);
-		if (test_bit(KVM_APIC_SIPI, &pe))
-			clear_bit(KVM_APIC_SIPI, &apic->pending_events);
+		clear_bit(KVM_APIC_SIPI, &apic->pending_events);
 		return 0;
 	}
 
-	if (test_bit(KVM_APIC_INIT, &pe)) {
-		clear_bit(KVM_APIC_INIT, &apic->pending_events);
+	if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
 		kvm_vcpu_reset(vcpu, true);
 		if (kvm_vcpu_is_bsp(apic->vcpu))
 			vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 		else
 			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
 	}
-	if (test_bit(KVM_APIC_SIPI, &pe)) {
-		clear_bit(KVM_APIC_SIPI, &apic->pending_events);
+	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events)) {
 		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
 			/* evaluate pending_events before reading the vector */
 			smp_rmb();
-- 
2.37.3.968.ga6b4b080e4-goog


* [PATCH v4 10/12] KVM: x86: never write to memory from kvm_vcpu_check_block()
  2022-09-21  0:31 [PATCH v4 00/12] KVM: x86: never write to memory from kvm_vcpu_check_block Sean Christopherson
                   ` (8 preceding siblings ...)
  2022-09-21  0:31 ` [PATCH v4 09/12] KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested events Sean Christopherson
@ 2022-09-21  0:31 ` Sean Christopherson
  2023-12-07  1:03   ` Jim Mattson
  2022-09-21  0:32 ` [PATCH v4 11/12] KVM: mips, x86: do not rely on KVM_REQ_UNHALT Sean Christopherson
  2022-09-21  0:32 ` [PATCH v4 12/12] KVM: remove KVM_REQ_UNHALT Sean Christopherson
  11 siblings, 1 reply; 23+ messages in thread
From: Sean Christopherson @ 2022-09-21  0:31 UTC (permalink / raw)
  To: Paolo Bonzini, Marc Zyngier, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Sean Christopherson
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Oliver Upton,
	Atish Patra, David Hildenbrand, kvm, linux-arm-kernel, kvmarm,
	linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
	Maxim Levitsky

From: Paolo Bonzini <pbonzini@redhat.com>

kvm_vcpu_check_block() is called while not in TASK_RUNNING, and therefore
it cannot sleep.  Writing to guest memory is therefore forbidden, but it
can happen on AMD processors if kvm_check_nested_events() causes a vmexit.

Fortunately, all events that are caught by kvm_check_nested_events() are
also recognized by kvm_vcpu_has_events() through vendor callbacks such as
kvm_x86_interrupt_allowed() or kvm_x86_ops.nested_ops->has_events(), so
remove the call and postpone the actual processing to vcpu_block().

Opportunistically honor the return of kvm_check_nested_events().  KVM
punted on the check in kvm_vcpu_running() because the only error path is
if vmx_complete_nested_posted_interrupt() fails, in which case KVM exits
to userspace with "internal error" i.e. the VM is likely dead anyways so
it wasn't worth overloading the return of kvm_vcpu_running().

Add the check mostly so that KVM is consistent with itself; the return of
the call via kvm_apic_accept_events()=>kvm_check_nested_events() that
immediately follows _is_ checked.

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[sean: check and handle return of kvm_check_nested_events()]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dcc675d4e44b..8aeacbc2bff9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10815,6 +10815,17 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
 			return 1;
 	}
 
+	/*
+	 * Evaluate nested events before exiting the halted state.  This allows
+	 * the halt state to be recorded properly in the VMCS12's activity
+	 * state field (AMD does not have a similar field and a VM-Exit always
+	 * causes a spurious wakeup from HLT).
+	 */
+	if (is_guest_mode(vcpu)) {
+		if (kvm_check_nested_events(vcpu) < 0)
+			return 0;
+	}
+
 	if (kvm_apic_accept_events(vcpu) < 0)
 		return 0;
 	switch(vcpu->arch.mp_state) {
@@ -10837,9 +10848,6 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
 
 static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
 {
-	if (is_guest_mode(vcpu))
-		kvm_check_nested_events(vcpu);
-
 	return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
 		!vcpu->arch.apf.halted);
 }
-- 
2.37.3.968.ga6b4b080e4-goog


* [PATCH v4 11/12] KVM: mips, x86: do not rely on KVM_REQ_UNHALT
  2022-09-21  0:31 [PATCH v4 00/12] KVM: x86: never write to memory from kvm_vcpu_check_block Sean Christopherson
                   ` (9 preceding siblings ...)
  2022-09-21  0:31 ` [PATCH v4 10/12] KVM: x86: never write to memory from kvm_vcpu_check_block() Sean Christopherson
@ 2022-09-21  0:32 ` Sean Christopherson
  2022-09-22 13:17   ` Philippe Mathieu-Daudé
  2022-09-21  0:32 ` [PATCH v4 12/12] KVM: remove KVM_REQ_UNHALT Sean Christopherson
  11 siblings, 1 reply; 23+ messages in thread
From: Sean Christopherson @ 2022-09-21  0:32 UTC (permalink / raw)
  To: Paolo Bonzini, Marc Zyngier, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Sean Christopherson
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Oliver Upton,
	Atish Patra, David Hildenbrand, kvm, linux-arm-kernel, kvmarm,
	linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
	Maxim Levitsky

From: Paolo Bonzini <pbonzini@redhat.com>

KVM_REQ_UNHALT is a weird request that simply reports the value of
kvm_arch_vcpu_runnable() on exit from kvm_vcpu_halt().  Only
MIPS and x86 are looking at it, the others just clear it.  Check
the state of the vCPU directly so that the request is handled
as a nop on all architectures.

No functional change intended, except for corner cases where an
event arrives immediately after a signal becomes pending or after
another similar host-side event.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/mips/kvm/emulate.c | 7 +++----
 arch/x86/kvm/x86.c      | 9 ++++++++-
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/mips/kvm/emulate.c b/arch/mips/kvm/emulate.c
index b494d8d39290..1d7c56defe93 100644
--- a/arch/mips/kvm/emulate.c
+++ b/arch/mips/kvm/emulate.c
@@ -955,13 +955,12 @@ enum emulation_result kvm_mips_emul_wait(struct kvm_vcpu *vcpu)
 		kvm_vcpu_halt(vcpu);
 
 		/*
-		 * We we are runnable, then definitely go off to user space to
+		 * We are runnable, then definitely go off to user space to
 		 * check if any I/O interrupts are pending.
 		 */
-		if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
-			kvm_clear_request(KVM_REQ_UNHALT, vcpu);
+		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
+		if (kvm_arch_vcpu_runnable(vcpu))
 			vcpu->run->exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN;
-		}
 	}
 
 	return EMULATE_DONE;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8aeacbc2bff9..8b1ff7e91ecb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10811,7 +10811,14 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
 		if (hv_timer)
 			kvm_lapic_switch_to_hv_timer(vcpu);
 
-		if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
+		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
+
+		/*
+		 * If the vCPU is not runnable, a signal or another host event
+		 * of some kind is pending; service it without changing the
+		 * vCPU's activity state.
+		 */
+		if (!kvm_arch_vcpu_runnable(vcpu))
 			return 1;
 	}
 
-- 
2.37.3.968.ga6b4b080e4-goog


* [PATCH v4 12/12] KVM: remove KVM_REQ_UNHALT
  2022-09-21  0:31 [PATCH v4 00/12] KVM: x86: never write to memory from kvm_vcpu_check_block Sean Christopherson
                   ` (10 preceding siblings ...)
  2022-09-21  0:32 ` [PATCH v4 11/12] KVM: mips, x86: do not rely on KVM_REQ_UNHALT Sean Christopherson
@ 2022-09-21  0:32 ` Sean Christopherson
  2022-09-22 14:52   ` Marc Zyngier
  11 siblings, 1 reply; 23+ messages in thread
From: Sean Christopherson @ 2022-09-21  0:32 UTC (permalink / raw)
  To: Paolo Bonzini, Marc Zyngier, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Sean Christopherson
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Oliver Upton,
	Atish Patra, David Hildenbrand, kvm, linux-arm-kernel, kvmarm,
	linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
	Maxim Levitsky

From: Paolo Bonzini <pbonzini@redhat.com>

KVM_REQ_UNHALT is now unnecessary because it is replaced by the return
value of kvm_vcpu_block/kvm_vcpu_halt.  Remove it.

No functional change intended.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/vcpu-requests.rst | 28 +-----------------------
 arch/arm64/kvm/arm.c                     |  1 -
 arch/mips/kvm/emulate.c                  |  1 -
 arch/powerpc/kvm/book3s_pr.c             |  1 -
 arch/powerpc/kvm/book3s_pr_papr.c        |  1 -
 arch/powerpc/kvm/booke.c                 |  1 -
 arch/powerpc/kvm/powerpc.c               |  1 -
 arch/riscv/kvm/vcpu_insn.c               |  1 -
 arch/s390/kvm/kvm-s390.c                 |  2 --
 arch/x86/kvm/x86.c                       |  3 ---
 arch/x86/kvm/xen.c                       |  1 -
 include/linux/kvm_host.h                 |  3 +--
 virt/kvm/kvm_main.c                      |  4 +---
 13 files changed, 3 insertions(+), 45 deletions(-)

diff --git a/Documentation/virt/kvm/vcpu-requests.rst b/Documentation/virt/kvm/vcpu-requests.rst
index 31f62b64e07b..87f04c1fa53d 100644
--- a/Documentation/virt/kvm/vcpu-requests.rst
+++ b/Documentation/virt/kvm/vcpu-requests.rst
@@ -97,7 +97,7 @@ VCPU requests are simply bit indices of the ``vcpu->requests`` bitmap.
 This means general bitops, like those documented in [atomic-ops]_ could
 also be used, e.g. ::
 
-  clear_bit(KVM_REQ_UNHALT & KVM_REQUEST_MASK, &vcpu->requests);
+  clear_bit(KVM_REQ_UNBLOCK & KVM_REQUEST_MASK, &vcpu->requests);
 
 However, VCPU request users should refrain from doing so, as it would
 break the abstraction.  The first 8 bits are reserved for architecture
@@ -126,17 +126,6 @@ KVM_REQ_UNBLOCK
   or in order to update the interrupt routing and ensure that assigned
   devices will wake up the vCPU.
 
-KVM_REQ_UNHALT
-
-  This request may be made from the KVM common function kvm_vcpu_block(),
-  which is used to emulate an instruction that causes a CPU to halt until
-  one of an architectural specific set of events and/or interrupts is
-  received (determined by checking kvm_arch_vcpu_runnable()).  When that
-  event or interrupt arrives kvm_vcpu_block() makes the request.  This is
-  in contrast to when kvm_vcpu_block() returns due to any other reason,
-  such as a pending signal, which does not indicate the VCPU's halt
-  emulation should stop, and therefore does not make the request.
-
 KVM_REQ_OUTSIDE_GUEST_MODE
 
   This "request" ensures the target vCPU has exited guest mode prior to the
@@ -297,21 +286,6 @@ architecture dependent.  kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
 to check if it should awaken.  One reason to do so is to provide
 architectures a function where requests may be checked if necessary.
 
-Clearing Requests
------------------
-
-Generally it only makes sense for the receiving VCPU thread to clear a
-request.  However, in some circumstances, such as when the requesting
-thread and the receiving VCPU thread are executed serially, such as when
-they are the same thread, or when they are using some form of concurrency
-control to temporarily execute synchronously, then it's possible to know
-that the request may be cleared immediately, rather than waiting for the
-receiving VCPU thread to handle the request in VCPU RUN.  The only current
-examples of this are kvm_vcpu_block() calls made by VCPUs to block
-themselves.  A possible side-effect of that call is to make the
-KVM_REQ_UNHALT request, which may then be cleared immediately when the
-VCPU returns from the call.
-
 References
 ==========
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2ff0ef62abad..4f949b64fdc9 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -666,7 +666,6 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
 
 	kvm_vcpu_halt(vcpu);
 	vcpu_clear_flag(vcpu, IN_WFIT);
-	kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 
 	preempt_disable();
 	vgic_v4_load(vcpu);
diff --git a/arch/mips/kvm/emulate.c b/arch/mips/kvm/emulate.c
index 1d7c56defe93..edaec93a1a1f 100644
--- a/arch/mips/kvm/emulate.c
+++ b/arch/mips/kvm/emulate.c
@@ -958,7 +958,6 @@ enum emulation_result kvm_mips_emul_wait(struct kvm_vcpu *vcpu)
 		 * We are runnable, then definitely go off to user space to
 		 * check if any I/O interrupts are pending.
 		 */
-		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 		if (kvm_arch_vcpu_runnable(vcpu))
 			vcpu->run->exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN;
 	}
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index d6abed6e51e6..9fc4dd8f66eb 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -499,7 +499,6 @@ static void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr)
 	if (msr & MSR_POW) {
 		if (!vcpu->arch.pending_exceptions) {
 			kvm_vcpu_halt(vcpu);
-			kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 			vcpu->stat.generic.halt_wakeup++;
 
 			/* Unset POW bit after we woke up */
diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index a1f2978b2a86..b2c89e850d7a 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -393,7 +393,6 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
 	case H_CEDE:
 		kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
 		kvm_vcpu_halt(vcpu);
-		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 		vcpu->stat.generic.halt_wakeup++;
 		return EMULATE_DONE;
 	case H_LOGICAL_CI_LOAD:
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 06c5830a93f9..7b4920e9fd26 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -719,7 +719,6 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.shared->msr & MSR_WE) {
 		local_irq_enable();
 		kvm_vcpu_halt(vcpu);
-		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 		hard_irq_disable();
 
 		kvmppc_set_exit_type(vcpu, EMULATED_MTMSRWE_EXITS);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index fb1490761c87..ec9c1e3c2ff4 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -239,7 +239,6 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
 	case EV_HCALL_TOKEN(EV_IDLE):
 		r = EV_SUCCESS;
 		kvm_vcpu_halt(vcpu);
-		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 		break;
 	default:
 		r = EV_UNIMPLEMENTED;
diff --git a/arch/riscv/kvm/vcpu_insn.c b/arch/riscv/kvm/vcpu_insn.c
index 7eb90a47b571..0bb52761a3f7 100644
--- a/arch/riscv/kvm/vcpu_insn.c
+++ b/arch/riscv/kvm/vcpu_insn.c
@@ -191,7 +191,6 @@ void kvm_riscv_vcpu_wfi(struct kvm_vcpu *vcpu)
 		kvm_vcpu_srcu_read_unlock(vcpu);
 		kvm_vcpu_halt(vcpu);
 		kvm_vcpu_srcu_read_lock(vcpu);
-		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 	}
 }
 
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index edfd4bbd0cba..aa39ea4582bd 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4343,8 +4343,6 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
 		goto retry;
 	}
 
-	/* nothing to do, just clear the request */
-	kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 	/* we left the vsie handler, nothing to do, just clear the request */
 	kvm_clear_request(KVM_REQ_VSIE_RESTART, vcpu);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8b1ff7e91ecb..8927f37a07c9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10811,8 +10811,6 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
 		if (hv_timer)
 			kvm_lapic_switch_to_hv_timer(vcpu);
 
-		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
-
 		/*
 		 * If the vCPU is not runnable, a signal or another host event
 		 * of some kind is pending; service it without changing the
@@ -11032,7 +11030,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 			r = 0;
 			goto out;
 		}
-		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 		r = -EAGAIN;
 		if (signal_pending(current)) {
 			r = -EINTR;
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 280cb5dc7341..93c628d3e3a9 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1065,7 +1065,6 @@ static bool kvm_xen_schedop_poll(struct kvm_vcpu *vcpu, bool longmode,
 			del_timer(&vcpu->arch.xen.poll_timer);
 
 		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
-		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 	}
 
 	vcpu->arch.xen.poll_evtchn = 0;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 04c7e5f2f727..32f259fa5801 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -151,12 +151,11 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQUEST_NO_ACTION      BIT(10)
 /*
  * Architecture-independent vcpu->requests bit members
- * Bits 4-7 are reserved for more arch-independent bits.
+ * Bits 3-7 are reserved for more arch-independent bits.
  */
 #define KVM_REQ_TLB_FLUSH         (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_UNBLOCK           2
-#define KVM_REQ_UNHALT            3
 #define KVM_REQUEST_ARCH_BASE     8
 
 /*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dcf47da44844..26383e63d9dd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3409,10 +3409,8 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
 	int ret = -EINTR;
 	int idx = srcu_read_lock(&vcpu->kvm->srcu);
 
-	if (kvm_arch_vcpu_runnable(vcpu)) {
-		kvm_make_request(KVM_REQ_UNHALT, vcpu);
+	if (kvm_arch_vcpu_runnable(vcpu))
 		goto out;
-	}
 	if (kvm_cpu_has_pending_timer(vcpu))
 		goto out;
 	if (signal_pending(current))
-- 
2.37.3.968.ga6b4b080e4-goog


* Re: [PATCH v4 11/12] KVM: mips, x86: do not rely on KVM_REQ_UNHALT
  2022-09-21  0:32 ` [PATCH v4 11/12] KVM: mips, x86: do not rely on KVM_REQ_UNHALT Sean Christopherson
@ 2022-09-22 13:17   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 23+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-09-22 13:17 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Marc Zyngier, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Oliver Upton,
	Atish Patra, David Hildenbrand, kvm, Linux ARM, kvmarm,
	open list:BROADCOM NVRAM DRIVER, linuxppc-dev, kvm-riscv,
	linux-riscv, open list, Maxim Levitsky

On Wed, Sep 21, 2022 at 2:34 AM Sean Christopherson <seanjc@google.com> wrote:
>
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> KVM_REQ_UNHALT is a weird request that simply reports the value of
> kvm_arch_vcpu_runnable() on exit from kvm_vcpu_halt().  Only
> MIPS and x86 are looking at it, the others just clear it.  Check
> the state of the vCPU directly so that the request is handled
> as a nop on all architectures.
>
> No functional change intended, except for corner cases where an
> event arrives immediately after a signal becomes pending or after
> another similar host-side event.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/mips/kvm/emulate.c | 7 +++----
>  arch/x86/kvm/x86.c      | 9 ++++++++-
>  2 files changed, 11 insertions(+), 5 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


* Re: [PATCH v4 12/12] KVM: remove KVM_REQ_UNHALT
  2022-09-21  0:32 ` [PATCH v4 12/12] KVM: remove KVM_REQ_UNHALT Sean Christopherson
@ 2022-09-22 14:52   ` Marc Zyngier
  0 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2022-09-22 14:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Oliver Upton, Atish Patra, David Hildenbrand,
	kvm, linux-arm-kernel, kvmarm, linux-mips, linuxppc-dev,
	kvm-riscv, linux-riscv, linux-kernel, Maxim Levitsky

On Wed, 21 Sep 2022 01:32:01 +0100,
Sean Christopherson <seanjc@google.com> wrote:
> 
> From: Paolo Bonzini <pbonzini@redhat.com>
> 
> KVM_REQ_UNHALT is now unnecessary because it is replaced by the return
> value of kvm_vcpu_block/kvm_vcpu_halt.  Remove it.
> 
> No functional change intended.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Acked-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Without deviation from the norm, progress is not possible.


* [PATCH v4 10/12] KVM: x86: never write to memory from kvm_vcpu_check_block()
  2022-09-21  0:31 ` [PATCH v4 10/12] KVM: x86: never write to memory from kvm_vcpu_check_block() Sean Christopherson
@ 2023-12-07  1:03   ` Jim Mattson
  2023-12-07 16:21     ` Sean Christopherson
  0 siblings, 1 reply; 23+ messages in thread
From: Jim Mattson @ 2023-12-07  1:03 UTC (permalink / raw)
  To: seanjc
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup, aou, atishp,
	borntraeger, chenhuacai, david, frankja, imbrenda, james.morse,
	kvm-riscv, kvm, kvmarm, linux-arm-kernel, linux-kernel,
	linux-mips, linux-riscv, linuxppc-dev, maz, mlevitsk,
	oliver.upton, palmer, paul.walmsley, pbonzini, suzuki.poulose

kvm_vcpu_check_block() is called while not in TASK_RUNNING, and therefore
it cannot sleep.  Writing to guest memory is therefore forbidden, but it
can happen on AMD processors if kvm_check_nested_events() causes a vmexit.

Fortunately, all events that are caught by kvm_check_nested_events() are
also recognized by kvm_vcpu_has_events() through vendor callbacks such as
kvm_x86_interrupt_allowed() or kvm_x86_ops.nested_ops->has_events(), so
remove the call and postpone the actual processing to vcpu_block().

Opportunistically honor the return of kvm_check_nested_events().  KVM
punted on the check in kvm_vcpu_running() because the only error path is
if vmx_complete_nested_posted_interrupt() fails, in which case KVM exits
to userspace with "internal error" i.e. the VM is likely dead anyways so
it wasn't worth overloading the return of kvm_vcpu_running().

Add the check mostly so that KVM is consistent with itself; the return of
the call via kvm_apic_accept_events()=>kvm_check_nested_events() that
immediately follows _is_ checked.

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[sean: check and handle return of kvm_check_nested_events()]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dcc675d4e44b..8aeacbc2bff9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10815,6 +10815,17 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
 			return 1;
 	}
 
+	/*
+	 * Evaluate nested events before exiting the halted state.  This allows
+	 * the halt state to be recorded properly in the VMCS12's activity
+	 * state field (AMD does not have a similar field and a VM-Exit always
+	 * causes a spurious wakeup from HLT).
+	 */
+	if (is_guest_mode(vcpu)) {
+		if (kvm_check_nested_events(vcpu) < 0)
+			return 0;
+	}
+
 	if (kvm_apic_accept_events(vcpu) < 0)
 		return 0;
 	switch(vcpu->arch.mp_state) {
@@ -10837,9 +10848,6 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
 
 static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
 {
-	if (is_guest_mode(vcpu))
-		kvm_check_nested_events(vcpu);
-
 	return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
 		!vcpu->arch.apf.halted);
 }

This commit breaks delivery of a (virtualized) posted interrupt from
an L1 vCPU to a halted L2 vCPU.

Looking back at commit e6c67d8cf117 ("KVM: nVMX: Wake blocked vCPU in
guest-mode if pending interrupt in virtual APICv"), Liran wrote:

    Note that this also handles the case of nested posted-interrupt by the
    fact RVI is updated in vmx_complete_nested_posted_interrupt() which is
    called from kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() ->
    kvm_vcpu_running() -> vmx_check_nested_events() ->
    vmx_complete_nested_posted_interrupt().

Clearly, that is no longer the case.


* Re: [PATCH v4 10/12] KVM: x86: never write to memory from kvm_vcpu_check_block()
  2023-12-07  1:03   ` Jim Mattson
@ 2023-12-07 16:21     ` Sean Christopherson
  2023-12-10 22:52       ` Jim Mattson
  0 siblings, 1 reply; 23+ messages in thread
From: Sean Christopherson @ 2023-12-07 16:21 UTC (permalink / raw)
  To: Jim Mattson
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup, aou, atishp,
	borntraeger, chenhuacai, david, frankja, imbrenda, james.morse,
	kvm-riscv, kvm, kvmarm, linux-arm-kernel, linux-kernel,
	linux-mips, linux-riscv, linuxppc-dev, maz, mlevitsk,
	oliver.upton, palmer, paul.walmsley, pbonzini, suzuki.poulose

On Wed, Dec 06, 2023, Jim Mattson wrote:
> kvm_vcpu_check_block() is called while not in TASK_RUNNING, and therefore
> it cannot sleep.  Writing to guest memory is therefore forbidden, but it
> can happen on AMD processors if kvm_check_nested_events() causes a vmexit.
> 
> Fortunately, all events that are caught by kvm_check_nested_events() are
> also recognized by kvm_vcpu_has_events() through vendor callbacks such as
> kvm_x86_interrupt_allowed() or kvm_x86_ops.nested_ops->has_events(), so
> remove the call and postpone the actual processing to vcpu_block().
> 
> Opportunistically honor the return of kvm_check_nested_events().  KVM
> punted on the check in kvm_vcpu_running() because the only error path is
> if vmx_complete_nested_posted_interrupt() fails, in which case KVM exits
> to userspace with "internal error" i.e. the VM is likely dead anyways so
> it wasn't worth overloading the return of kvm_vcpu_running().
> 
> Add the check mostly so that KVM is consistent with itself; the return of
> the call via kvm_apic_accept_events()=>kvm_check_nested_events() that
> immediately follows _is_ checked.
> 
> Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> [sean: check and handle return of kvm_check_nested_events()]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/x86.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index dcc675d4e44b..8aeacbc2bff9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10815,6 +10815,17 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
>  			return 1;
>  	}
>  
> +	/*
> +	 * Evaluate nested events before exiting the halted state.  This allows
> +	 * the halt state to be recorded properly in the VMCS12's activity
> +	 * state field (AMD does not have a similar field and a VM-Exit always
> +	 * causes a spurious wakeup from HLT).
> +	 */
> +	if (is_guest_mode(vcpu)) {
> +		if (kvm_check_nested_events(vcpu) < 0)
> +			return 0;
> +	}
> +
>  	if (kvm_apic_accept_events(vcpu) < 0)
>  		return 0;
>  	switch(vcpu->arch.mp_state) {
> @@ -10837,9 +10848,6 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
>  
>  static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
>  {
> -	if (is_guest_mode(vcpu))
> -		kvm_check_nested_events(vcpu);
> -
>  	return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>  		!vcpu->arch.apf.halted);
>  }
> 
> This commit breaks delivery of a (virtualized) posted interrupt from
> an L1 vCPU to a halted L2 vCPU.
> 
> Looking back at commit e6c67d8cf117 ("KVM: nVMX: Wake blocked vCPU in
> guest-mode if pending interrupt in virtual APICv"), Liran wrote:
> 
>     Note that this also handles the case of nested posted-interrupt by the
>     fact RVI is updated in vmx_complete_nested_posted_interrupt() which is
>     called from kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() ->
>     kvm_vcpu_running() -> vmx_check_nested_events() ->
>     vmx_complete_nested_posted_interrupt().
> 
> Clearly, that is no longer the case.

Doh.  We got the less obvious cases and missed the obvious one.

Ugh, and we also missed a related mess in kvm_guest_apic_has_interrupt().  That
thing should really be folded into vmx_has_nested_events().

Good gravy.  And vmx_interrupt_blocked() does the wrong thing because that
specifically checks if L1 interrupts are blocked.

Compile tested only, and definitely needs to be chunked into multiple patches,
but I think something like this mess?

---
 arch/x86/include/asm/kvm-x86-ops.h |  1 -
 arch/x86/include/asm/kvm_host.h    |  1 -
 arch/x86/kvm/lapic.c               | 20 ++---------------
 arch/x86/kvm/lapic.h               | 12 ++++++++++
 arch/x86/kvm/vmx/nested.c          | 36 ++++++++++++++++++++++++++++--
 arch/x86/kvm/vmx/vmx.c             | 34 ++++++++--------------------
 arch/x86/kvm/vmx/vmx.h             |  1 +
 arch/x86/kvm/x86.c                 | 10 +--------
 8 files changed, 59 insertions(+), 56 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 378ed944b849..6f81774c1dd0 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -85,7 +85,6 @@ KVM_X86_OP_OPTIONAL(update_cr8_intercept)
 KVM_X86_OP(refresh_apicv_exec_ctrl)
 KVM_X86_OP_OPTIONAL(hwapic_irr_update)
 KVM_X86_OP_OPTIONAL(hwapic_isr_update)
-KVM_X86_OP_OPTIONAL_RET0(guest_apic_has_interrupt)
 KVM_X86_OP_OPTIONAL(load_eoi_exitmap)
 KVM_X86_OP_OPTIONAL(set_virtual_apic_mode)
 KVM_X86_OP_OPTIONAL(set_apic_access_page_addr)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c8c7e2475a18..fc1466035a8c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1685,7 +1685,6 @@ struct kvm_x86_ops {
 	void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
 	void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
 	void (*hwapic_isr_update)(int isr);
-	bool (*guest_apic_has_interrupt)(struct kvm_vcpu *vcpu);
 	void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
 	void (*set_virtual_apic_mode)(struct kvm_vcpu *vcpu);
 	void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 245b20973cae..6d74c42accdf 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -55,7 +55,6 @@
 #define APIC_VERSION			0x14UL
 #define LAPIC_MMIO_LENGTH		(1 << 12)
 /* followed define is not in apicdef.h */
-#define MAX_APIC_VECTOR			256
 #define APIC_VECTORS_PER_REG		32
 
 static bool lapic_timer_advance_dynamic __read_mostly;
@@ -619,21 +618,6 @@ static const unsigned int apic_lvt_mask[KVM_APIC_MAX_NR_LVT_ENTRIES] = {
 	[LVT_CMCI] = LVT_MASK | APIC_MODE_MASK
 };
 
-static int find_highest_vector(void *bitmap)
-{
-	int vec;
-	u32 *reg;
-
-	for (vec = MAX_APIC_VECTOR - APIC_VECTORS_PER_REG;
-	     vec >= 0; vec -= APIC_VECTORS_PER_REG) {
-		reg = bitmap + REG_POS(vec);
-		if (*reg)
-			return __fls(*reg) + vec;
-	}
-
-	return -1;
-}
-
 static u8 count_vectors(void *bitmap)
 {
 	int vec;
@@ -697,7 +681,7 @@ EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
 
 static inline int apic_search_irr(struct kvm_lapic *apic)
 {
-	return find_highest_vector(apic->regs + APIC_IRR);
+	return kvm_apic_find_highest_vector(apic->regs + APIC_IRR);
 }
 
 static inline int apic_find_highest_irr(struct kvm_lapic *apic)
@@ -775,7 +759,7 @@ static inline int apic_find_highest_isr(struct kvm_lapic *apic)
 	if (likely(apic->highest_isr_cache != -1))
 		return apic->highest_isr_cache;
 
-	result = find_highest_vector(apic->regs + APIC_ISR);
+	result = kvm_apic_find_highest_vector(apic->regs + APIC_ISR);
 	ASSERT(result == -1 || result >= 16);
 
 	return result;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 0a0ea4b5dd8c..4239bf329748 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -12,6 +12,8 @@
 #define KVM_APIC_INIT		0
 #define KVM_APIC_SIPI		1
 
+#define MAX_APIC_VECTOR			256
+
 #define APIC_SHORT_MASK			0xc0000
 #define APIC_DEST_NOSHORT		0x0
 #define APIC_DEST_MASK			0x800
@@ -151,6 +153,16 @@ u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic);
 #define VEC_POS(v) ((v) & (32 - 1))
 #define REG_POS(v) (((v) >> 5) << 4)
 
+static inline int kvm_apic_find_highest_vector(unsigned long *bitmap)
+{
+	unsigned long bit = find_last_bit(bitmap, MAX_APIC_VECTOR);
+
+	if (bit == MAX_APIC_VECTOR)
+		return -1;
+
+	return bit;
+}
+
 static inline void kvm_lapic_clear_vector(int vec, void *bitmap)
 {
 	clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4ba46e1b29d2..28b3c1c0830f 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3964,8 +3964,40 @@ static bool nested_vmx_preemption_timer_pending(struct kvm_vcpu *vcpu)
 
 static bool vmx_has_nested_events(struct kvm_vcpu *vcpu)
 {
-	return nested_vmx_preemption_timer_pending(vcpu) ||
-	       to_vmx(vcpu)->nested.mtf_pending;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	void *vapic = vmx->nested.virtual_apic_map.hva;
+	int max_irr, vppr;
+
+	if (nested_vmx_preemption_timer_pending(vcpu) ||
+	    vmx->nested.mtf_pending)
+		return true;
+
+	if (!nested_cpu_has_vid(get_vmcs12(vcpu)) ||
+	    __vmx_interrupt_blocked(vcpu))
+		return false;
+
+	if (!vapic)
+		return false;
+
+	vppr = *((u32 *)(vapic + APIC_PROCPRI));
+
+	max_irr = vmx_get_rvi();
+	if ((max_irr & 0xf0) > (vppr & 0xf0))
+		return true;
+
+	max_irr = kvm_apic_find_highest_vector(vapic + APIC_IRR);
+	if (max_irr > 0 && (max_irr & 0xf0) > (vppr & 0xf0))
+		return true;
+
+	if (vmx->nested.pi_desc && pi_test_on(vmx->nested.pi_desc)) {
+		void *pir = vmx->nested.pi_desc->pir;
+
+		max_irr = kvm_apic_find_highest_vector(pir);
+		if (max_irr > 0 && (max_irr & 0xf0) > (vppr & 0xf0))
+			return true;
+	}
+
+	return false;
 }
 
 /*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d30df9b3fe3e..be8ab49e9965 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4107,26 +4107,6 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
 	}
 }
 
-static bool vmx_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	void *vapic_page;
-	u32 vppr;
-	int rvi;
-
-	if (WARN_ON_ONCE(!is_guest_mode(vcpu)) ||
-		!nested_cpu_has_vid(get_vmcs12(vcpu)) ||
-		WARN_ON_ONCE(!vmx->nested.virtual_apic_map.gfn))
-		return false;
-
-	rvi = vmx_get_rvi();
-
-	vapic_page = vmx->nested.virtual_apic_map.hva;
-	vppr = *((u32 *)(vapic_page + APIC_PROCPRI));
-
-	return ((rvi & 0xf0) > (vppr & 0xf0));
-}
-
 static void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -5040,16 +5020,21 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 	return !vmx_nmi_blocked(vcpu);
 }
 
-bool vmx_interrupt_blocked(struct kvm_vcpu *vcpu)
+bool __vmx_interrupt_blocked(struct kvm_vcpu *vcpu)
 {
-	if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
-		return false;
-
 	return !(vmx_get_rflags(vcpu) & X86_EFLAGS_IF) ||
 	       (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
 		(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
 }
 
+bool vmx_interrupt_blocked(struct kvm_vcpu *vcpu)
+{
+	if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
+		return false;
+
+	return __vmx_interrupt_blocked(vcpu);
+}
+
 static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 {
 	if (to_vmx(vcpu)->nested.nested_run_pending)
@@ -8335,7 +8320,6 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.required_apicv_inhibits = VMX_REQUIRED_APICV_INHIBITS,
 	.hwapic_irr_update = vmx_hwapic_irr_update,
 	.hwapic_isr_update = vmx_hwapic_isr_update,
-	.guest_apic_has_interrupt = vmx_guest_apic_has_interrupt,
 	.sync_pir_to_irr = vmx_sync_pir_to_irr,
 	.deliver_interrupt = vmx_deliver_interrupt,
 	.dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 45cee1a8bc0a..4097afed4425 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -400,6 +400,7 @@ u64 construct_eptp(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level);
 bool vmx_guest_inject_ac(struct kvm_vcpu *vcpu);
 void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu);
 bool vmx_nmi_blocked(struct kvm_vcpu *vcpu);
+bool __vmx_interrupt_blocked(struct kvm_vcpu *vcpu);
 bool vmx_interrupt_blocked(struct kvm_vcpu *vcpu);
 bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu);
 void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1983947b8965..b9e599a4b487 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12974,12 +12974,6 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 		kvm_arch_free_memslot(kvm, old);
 }
 
-static inline bool kvm_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
-{
-	return (is_guest_mode(vcpu) &&
-		static_call(kvm_x86_guest_apic_has_interrupt)(vcpu));
-}
-
 static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 {
 	if (!list_empty_careful(&vcpu->async_pf.done))
@@ -13010,9 +13004,7 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 	if (kvm_test_request(KVM_REQ_PMI, vcpu))
 		return true;
 
-	if (kvm_arch_interrupt_allowed(vcpu) &&
-	    (kvm_cpu_has_interrupt(vcpu) ||
-	    kvm_guest_apic_has_interrupt(vcpu)))
+	if (kvm_arch_interrupt_allowed(vcpu) && kvm_cpu_has_interrupt(vcpu))
 		return true;
 
 	if (kvm_hv_has_stimer_pending(vcpu))

base-commit: 1ab097653e4dd8d23272d028a61352c23486fd4a
-- 



* Re: [PATCH v4 10/12] KVM: x86: never write to memory from kvm_vcpu_check_block()
  2023-12-07 16:21     ` Sean Christopherson
@ 2023-12-10 22:52       ` Jim Mattson
  2023-12-12 15:28         ` Sean Christopherson
  0 siblings, 1 reply; 23+ messages in thread
From: Jim Mattson @ 2023-12-10 22:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: alexandru.elisei, anup, aou, atishp, borntraeger, chenhuacai,
	david, frankja, imbrenda, james.morse, kvm-riscv, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linux-riscv,
	linuxppc-dev, maz, mlevitsk, oliver.upton, palmer, paul.walmsley,
	pbonzini, suzuki.poulose

On Thu, Dec 7, 2023 at 8:21 AM Sean Christopherson <seanjc@google.com> wrote:
> Doh.  We got the less obvious cases and missed the obvious one.
>
> Ugh, and we also missed a related mess in kvm_guest_apic_has_interrupt().  That
> thing should really be folded into vmx_has_nested_events().
>
> Good gravy.  And vmx_interrupt_blocked() does the wrong thing because that
> specifically checks if L1 interrupts are blocked.
>
> Compile tested only, and definitely needs to be chunked into multiple patches,
> but I think something like this mess?

The proposed patch does not fix the problem. In fact, it messes things
up so much that I don't get any test results back.

Google has an internal K-U-T test that demonstrates the problem. I
will post it soon.


* Re: [PATCH v4 10/12] KVM: x86: never write to memory from kvm_vcpu_check_block()
  2023-12-10 22:52       ` Jim Mattson
@ 2023-12-12 15:28         ` Sean Christopherson
  2023-12-13 22:25           ` Maxim Levitsky
  0 siblings, 1 reply; 23+ messages in thread
From: Sean Christopherson @ 2023-12-12 15:28 UTC (permalink / raw)
  To: Jim Mattson
  Cc: alexandru.elisei, anup, aou, atishp, borntraeger, chenhuacai,
	david, frankja, imbrenda, james.morse, kvm-riscv, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linux-riscv,
	linuxppc-dev, maz, mlevitsk, oliver.upton, palmer, paul.walmsley,
	pbonzini, suzuki.poulose

On Sun, Dec 10, 2023, Jim Mattson wrote:
> On Thu, Dec 7, 2023 at 8:21 AM Sean Christopherson <seanjc@google.com> wrote:
> > Doh.  We got the less obvious cases and missed the obvious one.
> >
> > Ugh, and we also missed a related mess in kvm_guest_apic_has_interrupt().  That
> > thing should really be folded into vmx_has_nested_events().
> >
> > Good gravy.  And vmx_interrupt_blocked() does the wrong thing because that
> > specifically checks if L1 interrupts are blocked.
> >
> > Compile tested only, and definitely needs to be chunked into multiple patches,
> > but I think something like this mess?
> 
> The proposed patch does not fix the problem. In fact, it messes things
> up so much that I don't get any test results back.

Drat.

> Google has an internal K-U-T test that demonstrates the problem. I
> will post it soon.

Received, I'll dig in soonish, though "soonish" might unfortunately mean
2024.


* Re: [PATCH v4 10/12] KVM: x86: never write to memory from kvm_vcpu_check_block()
  2023-12-12 15:28         ` Sean Christopherson
@ 2023-12-13 22:25           ` Maxim Levitsky
  2023-12-13 22:31             ` Jim Mattson
  2023-12-13 22:59             ` Sean Christopherson
  0 siblings, 2 replies; 23+ messages in thread
From: Maxim Levitsky @ 2023-12-13 22:25 UTC (permalink / raw)
  To: Sean Christopherson, Jim Mattson
  Cc: alexandru.elisei, anup, aou, atishp, borntraeger, chenhuacai,
	david, frankja, imbrenda, james.morse, kvm-riscv, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linux-riscv,
	linuxppc-dev, maz, oliver.upton, palmer, paul.walmsley, pbonzini,
	suzuki.poulose

On Tue, 2023-12-12 at 07:28 -0800, Sean Christopherson wrote:
> On Sun, Dec 10, 2023, Jim Mattson wrote:
> > On Thu, Dec 7, 2023 at 8:21 AM Sean Christopherson <seanjc@google.com> wrote:
> > > Doh.  We got the less obvious cases and missed the obvious one.
> > > 
> > > Ugh, and we also missed a related mess in kvm_guest_apic_has_interrupt().  That
> > > thing should really be folded into vmx_has_nested_events().
> > > 
> > > Good gravy.  And vmx_interrupt_blocked() does the wrong thing because that
> > > specifically checks if L1 interrupts are blocked.
> > > 
> > > Compile tested only, and definitely needs to be chunked into multiple patches,
> > > but I think something like this mess?
> > 
> > The proposed patch does not fix the problem. In fact, it messes things
> > up so much that I don't get any test results back.
> 
> Drat.
> 
> > Google has an internal K-U-T test that demonstrates the problem. I
> > will post it soon.
> 
> Received, I'll dig in soonish, though "soonish" might unfortunately mean
> 2024.
> 

Hi,

So this is what I think:


KVM does have kvm_guest_apic_has_interrupt() for this exact purpose,
to check if nested APICv has a pending interrupt before halting.


However the problem is bigger - with APICv we have in essence 2 pending interrupt
bitmaps - the PIR and the IRR, and to know if the guest has a pending interrupt
one has in theory to copy PIR to IRR, then see if the max is larger than the current PPR.

Since we don't want to write to guest memory, and the IRR here resides in the guest memory,
I guess we have to do a 'dry-run' version of 'vmx_complete_nested_posted_interrupt' and call
it from kvm_guest_apic_has_interrupt().
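
Roughly something like this (completely untested, and the helper name
is made up; the fields are the real nVMX ones):

	/*
	 * Hypothetical dry run: detect a deliverable L2 interrupt
	 * without actually moving PIR bits into the IRR in guest
	 * memory.  The virtual-APIC page is already mapped via
	 * kvm_vcpu_map(), so the reads cannot fault.  A full dry run
	 * would also consider what is already in the vIRR/RVI.
	 */
	static bool nested_vmx_pi_has_deliverable_intr(struct kvm_vcpu *vcpu)
	{
		struct vcpu_vmx *vmx = to_vmx(vcpu);
		void *vapic = vmx->nested.virtual_apic_map.hva;
		unsigned long vec;
		u32 vppr;

		if (!vapic || !vmx->nested.pi_desc ||
		    !pi_test_on(vmx->nested.pi_desc))
			return false;

		vppr = *(u32 *)(vapic + APIC_PROCPRI);

		/* Highest vector in the PIR, as if it had been copied to the IRR. */
		vec = find_last_bit((unsigned long *)vmx->nested.pi_desc->pir, 256);
		if (vec == 256)
			return false;

		return (vec & 0xf0) > (vppr & 0xf0);
	}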

What do you think? I can prepare a patch for this.

Can you share a reproducer or write a new one that can be shared?

Best regards,
	Maxim Levitsky



* Re: [PATCH v4 10/12] KVM: x86: never write to memory from kvm_vcpu_check_block()
  2023-12-13 22:25           ` Maxim Levitsky
@ 2023-12-13 22:31             ` Jim Mattson
  2023-12-13 22:44               ` Maxim Levitsky
  2023-12-13 22:59             ` Sean Christopherson
  1 sibling, 1 reply; 23+ messages in thread
From: Jim Mattson @ 2023-12-13 22:31 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Sean Christopherson, alexandru.elisei, anup, aou, atishp,
	borntraeger, chenhuacai, david, frankja, imbrenda, james.morse,
	kvm-riscv, kvm, linux-arm-kernel, linux-kernel, linux-mips,
	linux-riscv, linuxppc-dev, maz, oliver.upton, palmer,
	paul.walmsley, pbonzini, suzuki.poulose

On Wed, Dec 13, 2023 at 2:25 PM Maxim Levitsky <mlevitsk@redhat.com> wrote:
>
> On Tue, 2023-12-12 at 07:28 -0800, Sean Christopherson wrote:
> > On Sun, Dec 10, 2023, Jim Mattson wrote:
> > > On Thu, Dec 7, 2023 at 8:21 AM Sean Christopherson <seanjc@google.com> wrote:
> > > > Doh.  We got the less obvious cases and missed the obvious one.
> > > >
> > > > Ugh, and we also missed a related mess in kvm_guest_apic_has_interrupt().  That
> > > > thing should really be folded into vmx_has_nested_events().
> > > >
> > > > Good gravy.  And vmx_interrupt_blocked() does the wrong thing because that
> > > > specifically checks if L1 interrupts are blocked.
> > > >
> > > > Compile tested only, and definitely needs to be chunked into multiple patches,
> > > > but I think something like this mess?
> > >
> > > The proposed patch does not fix the problem. In fact, it messes things
> > > up so much that I don't get any test results back.
> >
> > Drat.
> >
> > > Google has an internal K-U-T test that demonstrates the problem. I
> > > will post it soon.
> >
> > Received, I'll dig in soonish, though "soonish" might unfortunately mean
> > 2024.
> >
>
> Hi,
>
> So this is what I think:
>
>
> KVM does have kvm_guest_apic_has_interrupt() for this exact purpose,
> to check if nested APICv has a pending interrupt before halting.
>
>
> However the problem is bigger - with APICv we have in essence 2 pending interrupt
> bitmaps - the PIR and the IRR, and to know if the guest has a pending interrupt
> > one has in theory to copy PIR to IRR, then see if the max is larger than the current PPR.
>
> Since we don't want to write to guest memory, and the IRR here resides in the guest memory,
> I guess we have to do a 'dry-run' version of 'vmx_complete_nested_posted_interrupt' and call
> > it from kvm_guest_apic_has_interrupt().
>
> What do you think? I can prepare a patch for this.
>
> Can you share a reproducer or write a new one that can be shared?

See https://lore.kernel.org/kvm/20231211185552.3856862-1-jmattson@google.com/.

> Best regards,
>         Maxim Levitsky
>


* Re: [PATCH v4 10/12] KVM: x86: never write to memory from kvm_vcpu_check_block()
  2023-12-13 22:31             ` Jim Mattson
@ 2023-12-13 22:44               ` Maxim Levitsky
  0 siblings, 0 replies; 23+ messages in thread
From: Maxim Levitsky @ 2023-12-13 22:44 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Sean Christopherson, alexandru.elisei, anup, aou, atishp,
	borntraeger, chenhuacai, david, frankja, imbrenda, james.morse,
	kvm-riscv, kvm, linux-arm-kernel, linux-kernel, linux-mips,
	linux-riscv, linuxppc-dev, maz, oliver.upton, palmer,
	paul.walmsley, pbonzini, suzuki.poulose

On Wed, 2023-12-13 at 14:31 -0800, Jim Mattson wrote:
> On Wed, Dec 13, 2023 at 2:25 PM Maxim Levitsky <mlevitsk@redhat.com> wrote:
> > On Tue, 2023-12-12 at 07:28 -0800, Sean Christopherson wrote:
> > > On Sun, Dec 10, 2023, Jim Mattson wrote:
> > > > On Thu, Dec 7, 2023 at 8:21 AM Sean Christopherson <seanjc@google.com> wrote:
> > > > > Doh.  We got the less obvious cases and missed the obvious one.
> > > > > 
> > > > > Ugh, and we also missed a related mess in kvm_guest_apic_has_interrupt().  That
> > > > > thing should really be folded into vmx_has_nested_events().
> > > > > 
> > > > > Good gravy.  And vmx_interrupt_blocked() does the wrong thing because that
> > > > > specifically checks if L1 interrupts are blocked.
> > > > > 
> > > > > Compile tested only, and definitely needs to be chunked into multiple patches,
> > > > > but I think something like this mess?
> > > > 
> > > > The proposed patch does not fix the problem. In fact, it messes things
> > > > up so much that I don't get any test results back.
> > > 
> > > Drat.
> > > 
> > > > Google has an internal K-U-T test that demonstrates the problem. I
> > > > will post it soon.
> > > 
> > > Received, I'll dig in soonish, though "soonish" might unfortunately mean
> > > 2024.
> > > 
> > 
> > Hi,
> > 
> > So this is what I think:
> > 
> > 
> > KVM does have kvm_guest_apic_has_interrupt() for this exact purpose,
> > to check if nested APICv has a pending interrupt before halting.
> > 
> > 
> > However the problem is bigger - with APICv we have in essence 2 pending interrupt
> > bitmaps - the PIR and the IRR, and to know if the guest has a pending interrupt
> > one has in theory to copy PIR to IRR, then see if the max is larger than the current PPR.
> > 
> > Since we don't want to write to guest memory, and the IRR here resides in the guest memory,
> > I guess we have to do a 'dry-run' version of 'vmx_complete_nested_posted_interrupt' and call
> > it from kvm_guest_apic_has_interrupt().
> > 
> > What do you think? I can prepare a patch for this.
> > 
> > Can you share a reproducer or write a new one that can be shared?
> 
> See https://lore.kernel.org/kvm/20231211185552.3856862-1-jmattson@google.com/.

Thank you!

Best regards,
	Maxim Levitsky

> 
> > Best regards,
> >         Maxim Levitsky
> > 




* Re: [PATCH v4 10/12] KVM: x86: never write to memory from kvm_vcpu_check_block()
  2023-12-13 22:25           ` Maxim Levitsky
  2023-12-13 22:31             ` Jim Mattson
@ 2023-12-13 22:59             ` Sean Christopherson
  1 sibling, 0 replies; 23+ messages in thread
From: Sean Christopherson @ 2023-12-13 22:59 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Jim Mattson, alexandru.elisei, anup, aou, atishp, borntraeger,
	chenhuacai, david, frankja, imbrenda, james.morse, kvm-riscv,
	kvm, linux-arm-kernel, linux-kernel, linux-mips, linux-riscv,
	linuxppc-dev, maz, oliver.upton, palmer, paul.walmsley, pbonzini,
	suzuki.poulose

On Thu, Dec 14, 2023, Maxim Levitsky wrote:
> On Tue, 2023-12-12 at 07:28 -0800, Sean Christopherson wrote:
> > On Sun, Dec 10, 2023, Jim Mattson wrote:
> > > On Thu, Dec 7, 2023 at 8:21 AM Sean Christopherson <seanjc@google.com> wrote:
> > > > Doh.  We got the less obvious cases and missed the obvious one.
> > > > 
> > > > Ugh, and we also missed a related mess in kvm_guest_apic_has_interrupt().  That
> > > > thing should really be folded into vmx_has_nested_events().
> > > > 
> > > > Good gravy.  And vmx_interrupt_blocked() does the wrong thing because that
> > > > specifically checks if L1 interrupts are blocked.
> > > > 
> > > > Compile tested only, and definitely needs to be chunked into multiple patches,
> > > > but I think something like this mess?
> > > 
> > > The proposed patch does not fix the problem. In fact, it messes things
> > > up so much that I don't get any test results back.
> > 
> > Drat.
> > 
> > > Google has an internal K-U-T test that demonstrates the problem. I
> > > will post it soon.
> > 
> > Received, I'll dig in soonish, though "soonish" might unfortunately mean
> > 2024.
> > 
> 
> Hi,
> 
> So this is what I think:
> 
> KVM does have kvm_guest_apic_has_interrupt() for this exact purpose,
> to check if nested APICv has a pending interrupt before halting.

For all intents and purposes, so was nested_ops->has_events().  I don't see
any reason to have two APIs that do the same thing, and the call to
kvm_guest_apic_has_interrupt() is wrong in that it doesn't verify that IRQs are
enabled for _L2_.  That's why my preference is to fold the two together.
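
Concretely, the missing piece is the L2 variant of the interrupt
blocking check, i.e. this bit from the sketch earlier in the thread
(__vmx_interrupt_blocked() being the helper that sketch adds):

	/*
	 * Ignore nested_exit_on_intr(), which is about L1; ask instead
	 * whether L2 itself can take the interrupt right now.
	 */
	if (!nested_cpu_has_vid(get_vmcs12(vcpu)) ||
	    __vmx_interrupt_blocked(vcpu))
		return false;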

> However the problem is bigger - with APICv we have in essence 2 pending
> interrupt bitmaps - the PIR and the IRR, and to know if the guest has a
> pending interrupt one has in theory to copy PIR to IRR, then see if the max
> is larger than the current PPR.

Yeah, this is what my untested hack-a-patch tried to do.

> Since we don't want to write to guest memory,

The changelog is misleading/wrong.  Writing guest memory is ok; what isn't safe
is blocking or sleeping, i.e. KVM must not trigger a host page fault due to
accessing a page that's been swapped out.  Read vs. write doesn't matter.

So KVM can safely read and write guest memory so long as it is already mapped by
kvm_vcpu_map() (or I suppose if we wrapped an access with pagefault_disable(),
but I can't think of a sane reason to do that).  E.g. nVMX can access a vCPU's
PID mapping, but synthesizing a nested VM-Exit will cause explosions on nSVM.
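
E.g. something in this vein is fine even while blocking (illustrative
sketch only, not code from the series):

	struct kvm_host_map map;
	u32 val;

	/* May sleep, so do the map from a safe context, e.g. at VM-Enter. */
	if (kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map))
		return -EFAULT;

	/* Never faults: map.hva is a kernel mapping of the held page. */
	val = *(u32 *)(map.hva + offset_in_page(gpa));

	kvm_vcpu_unmap(vcpu, &map, false);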

> and the IRR here resides in the guest memory, I guess we have to do a
> 'dry-run' version of 'vmx_complete_nested_posted_interrupt' and call it from
> kvm_guest_apic_has_interrupt().

nested_ops->has_events() is the much better fit, e.g. the naming won't get weird
and we can gate the whole thing on is_guest_mode().  Though we probably need a
wrapper to handle any commonalities between nVMX and nSVM.
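
E.g. something like this for the wrapper (rough sketch, name invented):

	/*
	 * Hypothetical common entry point: do the is_guest_mode()
	 * gating and anything shared between nVMX and nSVM here, and
	 * leave only the truly L2-unique events to the vendor callback.
	 */
	static bool kvm_nested_has_events(struct kvm_vcpu *vcpu)
	{
		return is_guest_mode(vcpu) &&
		       kvm_x86_ops.nested_ops->has_events &&
		       kvm_x86_ops.nested_ops->has_events(vcpu);
	}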

> What do you think? I can prepare a patch for this.

As above, this is what I tried to do, sort of.  Though it's obviously broken.  We
don't need a full dry-run because KVM only needs to detect events that are unique
to L2, e.g. nVMX's preemption timer, MTF, and pending virtual interrupts (hmm,
I suspect nSVM's vNMI is broken too).  Things like INIT and SMI don't require
nested virtualization awareness because the event itself is tracked for the vCPU
as a whole.



Thread overview: 23+ messages
2022-09-21  0:31 [PATCH v4 00/12] KVM: x86: never write to memory from kvm_vcpu_check_block Sean Christopherson
2022-09-21  0:31 ` [PATCH v4 01/12] KVM: x86: make vendor code check for all nested events Sean Christopherson
2022-09-21  0:31 ` [PATCH v4 02/12] KVM: nVMX: Make an event request when pending an MTF nested VM-Exit Sean Christopherson
2022-09-21  0:31 ` [PATCH v4 03/12] KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowed Sean Christopherson
2022-09-21  0:31 ` [PATCH v4 04/12] KVM: x86: Rename kvm_apic_has_events() to make it INIT/SIPI specific Sean Christopherson
2022-09-21  0:31 ` [PATCH v4 05/12] KVM: x86: lapic does not have to process INIT if it is blocked Sean Christopherson
2022-09-21  0:31 ` [PATCH v4 06/12] KVM: SVM: Make an event request if INIT or SIPI is pending when GIF is set Sean Christopherson
2022-09-21  0:31 ` [PATCH v4 07/12] KVM: nVMX: Make an event request if INIT or SIPI is pending on VM-Enter Sean Christopherson
2022-09-21  0:31 ` [PATCH v4 08/12] KVM: nVMX: Make event request on VMXOFF iff INIT/SIPI is pending Sean Christopherson
2022-09-21  0:31 ` [PATCH v4 09/12] KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested events Sean Christopherson
2022-09-21  0:31 ` [PATCH v4 10/12] KVM: x86: never write to memory from kvm_vcpu_check_block() Sean Christopherson
2023-12-07  1:03   ` Jim Mattson
2023-12-07 16:21     ` Sean Christopherson
2023-12-10 22:52       ` Jim Mattson
2023-12-12 15:28         ` Sean Christopherson
2023-12-13 22:25           ` Maxim Levitsky
2023-12-13 22:31             ` Jim Mattson
2023-12-13 22:44               ` Maxim Levitsky
2023-12-13 22:59             ` Sean Christopherson
2022-09-21  0:32 ` [PATCH v4 11/12] KVM: mips, x86: do not rely on KVM_REQ_UNHALT Sean Christopherson
2022-09-22 13:17   ` Philippe Mathieu-Daudé
2022-09-21  0:32 ` [PATCH v4 12/12] KVM: remove KVM_REQ_UNHALT Sean Christopherson
2022-09-22 14:52   ` Marc Zyngier
