* [PATCH 0/7] KVM: VMX: Handle NMI VM-Exits in noinstr section
@ 2022-12-13  6:09 Sean Christopherson
From: Sean Christopherson @ 2022-12-13  6:09 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Peter Zijlstra, Andy Lutomirski, Thomas Gleixner

Move NMI VM-Exit handling into vmx_vcpu_enter_exit() to fix a (mostly
benign?) bug where NMIs can be unblocked prior to servicing the NMI that
triggered the VM-Exit, e.g. if instrumentation triggers a fault and thus
an IRET.  I deliberately didn't tag any of these for stable@ as the odds
of me screwing something up or of a backport going sideways seem higher
than out-of-order NMIs causing major problems.

The bulk of this series is just getting various helpers/paths ready for
noinstr usage.

I kept the use of a direct call to a dedicated entry point for NMIs
(doubled down really).  AFAICT, there are no issues with the direct call
in the current code, and I don't know enough about FRED to know if using
INT $2 would be better or worse, i.e. less churn seemed like the way to
go.  And if reverting to INT $2 in the future is desirable, splitting NMI
and IRQ handling makes it quite easy to do so as all the relevant code
that needs to be ripped out is isolated.
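
The end result (condensed from patch 7; the comment is mine) is that the NMI
is serviced from within the noinstr region, via a direct CALL to a dedicated
entry point:

	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
	    is_nmi(vmx_get_intr_info(vcpu))) {
		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
		vmx_do_nmi_irqoff();	/* direct CALL, not INT $2 */
		kvm_after_interrupt(vcpu);
	}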

Sean Christopherson (7):
  KVM: x86: Make vmx_get_exit_qual() and vmx_get_intr_info()
    noinstr-friendly
  KVM: VMX: Allow VM-Fail path of VMREAD helper to be instrumented
  KVM: VMX: Always inline eVMCS read/write helpers
  KVM: VMX: Always inline to_vmx() and to_kvm_vmx()
  x86/entry: KVM: Use dedicated VMX NMI entry for 32-bit kernels too
  KVM: VMX: Provide separate subroutines for invoking NMI vs. IRQ
    handlers
  KVM: VMX: Handle NMI VM-Exits in noinstr region

 arch/x86/include/asm/idtentry.h | 16 +++-----
 arch/x86/kernel/nmi.c           |  8 ++--
 arch/x86/kvm/kvm_cache_regs.h   | 12 ++++++
 arch/x86/kvm/vmx/hyperv.h       | 20 ++++-----
 arch/x86/kvm/vmx/vmcs.h         |  4 +-
 arch/x86/kvm/vmx/vmenter.S      | 72 ++++++++++++++++++---------------
 arch/x86/kvm/vmx/vmx.c          | 55 +++++++++++++------------
 arch/x86/kvm/vmx/vmx.h          | 18 ++++-----
 arch/x86/kvm/vmx/vmx_ops.h      |  2 +
 arch/x86/kvm/x86.h              |  6 +--
 10 files changed, 117 insertions(+), 96 deletions(-)


base-commit: 208f1c64e255fe3a29083880818e010ebdf585c6
-- 
2.39.0.rc1.256.g54fd8350bd-goog


* [PATCH 1/7] KVM: x86: Make vmx_get_exit_qual() and vmx_get_intr_info() noinstr-friendly
From: Sean Christopherson @ 2022-12-13  6:09 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Peter Zijlstra, Andy Lutomirski, Thomas Gleixner

Add an extra special noinstr-friendly helper to test+mark a "register"
available and use it when caching vmcs.EXIT_QUALIFICATION and
vmcs.VM_EXIT_INTR_INFO.  Make the caching helpers __always_inline too so
that they can be used in noinstr functions.

A future fix will move VMX's handling of NMI exits into the noinstr
vmx_vcpu_enter_exit() so that the NMI is processed before any kind of
instrumentation can trigger a fault and thus IRET, i.e. so that KVM
doesn't invoke the NMI handler with NMIs enabled.
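
For reference, the generic bitop being avoided looks roughly like this (per
include/asm-generic/bitops/instrumented-non-atomic.h); the instrument_write()
call is what's disallowed in noinstr code:

	static __always_inline bool
	___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
	{
		instrument_write(addr + BIT_WORD(nr), sizeof(long));
		return arch___test_and_set_bit(nr, addr);
	}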

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/kvm_cache_regs.h | 12 ++++++++++++
 arch/x86/kvm/vmx/vmx.h        | 14 ++++++--------
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index c09174f73a34..4c91f626c058 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -75,6 +75,18 @@ static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
 	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
 }
 
+/*
+ * kvm_register_test_and_mark_available() is a special snowflake that uses an
+ * arch bitop directly to avoid the explicit instrumentation that comes with
+ * the generic bitops.  This allows code that cannot be instrumented (noinstr
+ * functions), e.g. the low level VM-Enter/VM-Exit paths, to cache registers.
+ */
+static __always_inline bool kvm_register_test_and_mark_available(struct kvm_vcpu *vcpu,
+								 enum kvm_reg reg)
+{
+	return arch___test_and_set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
+}
+
 /*
  * The "raw" register helpers are only for cases where the full 64 bits of a
  * register are read/written irrespective of current vCPU mode.  In other words,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index a3da84f4ea45..bb720a2f11ab 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -669,25 +669,23 @@ void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu);
 int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu);
 void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu);
 
-static inline unsigned long vmx_get_exit_qual(struct kvm_vcpu *vcpu)
+static __always_inline unsigned long vmx_get_exit_qual(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
-	if (!kvm_register_is_available(vcpu, VCPU_EXREG_EXIT_INFO_1)) {
-		kvm_register_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_1);
+	if (!kvm_register_test_and_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_1))
 		vmx->exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
-	}
+
 	return vmx->exit_qualification;
 }
 
-static inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu)
+static __always_inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
-	if (!kvm_register_is_available(vcpu, VCPU_EXREG_EXIT_INFO_2)) {
-		kvm_register_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_2);
+	if (!kvm_register_test_and_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_2))
 		vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-	}
+
 	return vmx->exit_intr_info;
 }
 
-- 
2.39.0.rc1.256.g54fd8350bd-goog


* [PATCH 2/7] KVM: VMX: Allow VM-Fail path of VMREAD helper to be instrumented
From: Sean Christopherson @ 2022-12-13  6:09 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Peter Zijlstra, Andy Lutomirski, Thomas Gleixner

Allow instrumentation in the VM-Fail path of __vmcs_readl() so that the
helper can be used in noinstr functions, e.g. to get the exit reason in
vmx_vcpu_enter_exit() in order to handle NMI VM-Exits in the noinstr
section.  While allowing instrumentation isn't technically safe, KVM has
much bigger problems if VMREAD fails in a noinstr section.

Note, all other VMX instructions also allow instrumentation in their
VM-Fail paths for similar reasons; VMREAD was simply omitted by commit
3ebccdf373c2 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text")
because VMREAD wasn't used in a noinstr section at the time.
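
For reference, a sketch of the overall pattern (the function and flag are
made up); instrumentation_begin()/end() annotate the failure path as
deliberately instrumentable so that objtool doesn't complain:

	static noinstr int noinstr_helper(bool failed)
	{
		/* no tracing, KASAN, WARN, printk, etc. allowed here */
		if (likely(!failed))
			return 0;

		instrumentation_begin();
		WARN_ONCE(1, "noinstr_helper failed\n");
		instrumentation_end();
		return -EIO;
	}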

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmx_ops.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx_ops.h b/arch/x86/kvm/vmx/vmx_ops.h
index 842dc898c972..5838489e719b 100644
--- a/arch/x86/kvm/vmx/vmx_ops.h
+++ b/arch/x86/kvm/vmx/vmx_ops.h
@@ -100,8 +100,10 @@ static __always_inline unsigned long __vmcs_readl(unsigned long field)
 	return value;
 
 do_fail:
+	instrumentation_begin();
 	WARN_ONCE(1, "kvm: vmread failed: field=%lx\n", field);
 	pr_warn_ratelimited("kvm: vmread failed: field=%lx\n", field);
+	instrumentation_end();
 	return 0;
 
 do_exception:
-- 
2.39.0.rc1.256.g54fd8350bd-goog


* [PATCH 3/7] KVM: VMX: Always inline eVMCS read/write helpers
From: Sean Christopherson @ 2022-12-13  6:09 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Peter Zijlstra, Andy Lutomirski, Thomas Gleixner

Tag all evmcs_{read,write}() helpers __always_inline so that they can be
freely used in noinstr sections, e.g. to get the VM-Exit reason in
vmx_vcpu_enter_exit() (in a future patch).  For consistency and to avoid
more spot fixes in the future, e.g. see commit 010050a86393 ("x86/kvm:
Always inline evmcs_write64()"), tag all accessors even though
evmcs_read32() is the only anticipated use case in the near future.  In
practice, non-KASAN builds are all but guaranteed to inline the helpers
anyway.

  vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x107: call to evmcs_read32()
                               leaves .noinstr.text section
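
Note, "inline" is merely a hint, which is why the above fires on KASAN
builds; a trivial illustration with hypothetical helpers:

	static inline u32 foo(void)		/* may be emitted out-of-line */
	{
		return 0;
	}

	static __always_inline u32 bar(void)	/* folded into every caller */
	{
		return 0;
	}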

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/hyperv.h | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx/hyperv.h b/arch/x86/kvm/vmx/hyperv.h
index 571e7929d14e..3f4049e4f35d 100644
--- a/arch/x86/kvm/vmx/hyperv.h
+++ b/arch/x86/kvm/vmx/hyperv.h
@@ -136,7 +136,7 @@ static __always_inline void evmcs_write64(unsigned long field, u64 value)
 	current_evmcs->hv_clean_fields &= ~clean_field;
 }
 
-static inline void evmcs_write32(unsigned long field, u32 value)
+static __always_inline void evmcs_write32(unsigned long field, u32 value)
 {
 	u16 clean_field;
 	int offset = get_evmcs_offset(field, &clean_field);
@@ -148,7 +148,7 @@ static inline void evmcs_write32(unsigned long field, u32 value)
 	current_evmcs->hv_clean_fields &= ~clean_field;
 }
 
-static inline void evmcs_write16(unsigned long field, u16 value)
+static __always_inline void evmcs_write16(unsigned long field, u16 value)
 {
 	u16 clean_field;
 	int offset = get_evmcs_offset(field, &clean_field);
@@ -160,7 +160,7 @@ static inline void evmcs_write16(unsigned long field, u16 value)
 	current_evmcs->hv_clean_fields &= ~clean_field;
 }
 
-static inline u64 evmcs_read64(unsigned long field)
+static __always_inline u64 evmcs_read64(unsigned long field)
 {
 	int offset = get_evmcs_offset(field, NULL);
 
@@ -170,7 +170,7 @@ static inline u64 evmcs_read64(unsigned long field)
 	return *(u64 *)((char *)current_evmcs + offset);
 }
 
-static inline u32 evmcs_read32(unsigned long field)
+static __always_inline u32 evmcs_read32(unsigned long field)
 {
 	int offset = get_evmcs_offset(field, NULL);
 
@@ -180,7 +180,7 @@ static inline u32 evmcs_read32(unsigned long field)
 	return *(u32 *)((char *)current_evmcs + offset);
 }
 
-static inline u16 evmcs_read16(unsigned long field)
+static __always_inline u16 evmcs_read16(unsigned long field)
 {
 	int offset = get_evmcs_offset(field, NULL);
 
@@ -213,11 +213,11 @@ static inline void evmcs_load(u64 phys_addr)
 
 #else /* !IS_ENABLED(CONFIG_HYPERV) */
 static __always_inline void evmcs_write64(unsigned long field, u64 value) {}
-static inline void evmcs_write32(unsigned long field, u32 value) {}
-static inline void evmcs_write16(unsigned long field, u16 value) {}
-static inline u64 evmcs_read64(unsigned long field) { return 0; }
-static inline u32 evmcs_read32(unsigned long field) { return 0; }
-static inline u16 evmcs_read16(unsigned long field) { return 0; }
+static __always_inline void evmcs_write32(unsigned long field, u32 value) {}
+static __always_inline void evmcs_write16(unsigned long field, u16 value) {}
+static __always_inline u64 evmcs_read64(unsigned long field) { return 0; }
+static __always_inline u32 evmcs_read32(unsigned long field) { return 0; }
+static __always_inline u16 evmcs_read16(unsigned long field) { return 0; }
 static inline void evmcs_load(u64 phys_addr) {}
 static inline void evmcs_touch_msr_bitmap(void) {}
 #endif /* IS_ENABLED(CONFIG_HYPERV) */
-- 
2.39.0.rc1.256.g54fd8350bd-goog


* [PATCH 4/7] KVM: VMX: Always inline to_vmx() and to_kvm_vmx()
From: Sean Christopherson @ 2022-12-13  6:09 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Peter Zijlstra, Andy Lutomirski, Thomas Gleixner

Tag to_vmx() and to_kvm_vmx() __always_inline as they both just reflect
the passed-in pointer (the embedded struct is the first field in the
container), and drop the @vmx param from vmx_vcpu_enter_exit(), which
likely existed purely to make noinstr validation happy.

Amusingly, when the compiler decides to not inline the helpers, e.g. for
KASAN builds, to_vmx() and to_kvm_vmx() may end up pointing at the same
symbol, which generates very confusing objtool warnings.  E.g. the use of
to_vmx() in a future patch led to objtool complaining about to_kvm_vmx(),
and only once all use of to_kvm_vmx() was commented out did to_vmx() pop
up in the objtool report.

  vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x160: call to to_kvm_vmx()
                               leaves .noinstr.text section
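
For illustration, container_of() on a struct's first member boils down to a
pure pointer cast (hypothetical types):

	struct outer {
		struct inner in;	/* first member, offsetof() == 0 */
		unsigned long other;
	};

	static __always_inline struct outer *to_outer(struct inner *in)
	{
		return container_of(in, struct outer, in); /* == (struct outer *)in */
	}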

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 5 +++--
 arch/x86/kvm/vmx/vmx.h | 4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fe5615fd8295..e2c96f204b82 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7096,9 +7096,10 @@ static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
 }
 
 static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
-					struct vcpu_vmx *vmx,
 					unsigned long flags)
 {
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
 	guest_state_enter_irqoff();
 
 	/* L1D Flush includes CPU buffer clear to mitigate MDS */
@@ -7216,7 +7217,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	kvm_wait_lapic_expire(vcpu);
 
 	/* The actual VMENTER/EXIT is in the .noinstr.text section. */
-	vmx_vcpu_enter_exit(vcpu, vmx, __vmx_vcpu_run_flags(vmx));
+	vmx_vcpu_enter_exit(vcpu, __vmx_vcpu_run_flags(vmx));
 
 	/* All fields are clean at this point */
 	if (static_branch_unlikely(&enable_evmcs)) {
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index bb720a2f11ab..2acdc54bc34b 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -640,12 +640,12 @@ BUILD_CONTROLS_SHADOW(tertiary_exec, TERTIARY_VM_EXEC_CONTROL, 64)
 				(1 << VCPU_EXREG_EXIT_INFO_1) | \
 				(1 << VCPU_EXREG_EXIT_INFO_2))
 
-static inline struct kvm_vmx *to_kvm_vmx(struct kvm *kvm)
+static __always_inline struct kvm_vmx *to_kvm_vmx(struct kvm *kvm)
 {
 	return container_of(kvm, struct kvm_vmx, kvm);
 }
 
-static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
+static __always_inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
 {
 	return container_of(vcpu, struct vcpu_vmx, vcpu);
 }
-- 
2.39.0.rc1.256.g54fd8350bd-goog


* [PATCH 5/7] x86/entry: KVM: Use dedicated VMX NMI entry for 32-bit kernels too
From: Sean Christopherson @ 2022-12-13  6:09 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Peter Zijlstra, Andy Lutomirski, Thomas Gleixner

Use a dedicated entry for invoking the NMI handler from KVM VMX's VM-Exit
path for 32-bit even though using a dedicated entry for 32-bit isn't
strictly necessary.  Exposing a single symbol will allow KVM to reference
the entry point in assembly code without having to resort to more #ifdefs
(or #defines).  idtentry.h is intended to be included from asm files only
once, and so simply including idtentry.h in KVM assembly isn't an option.

Bypassing the ESP fixup and CR3 switching in the standard NMI entry code
is safe as KVM always handles NMIs that occur in the guest on a kernel
stack, with a kernel CR3.
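
Without a single symbol, the upcoming usage in KVM assembly would need
something like this hypothetical ifdeffery:

	#ifdef CONFIG_X86_64
		call	asm_exc_nmi_noist
	#else
		call	asm_exc_nmi
	#endif

versus an unconditional "call asm_exc_nmi_kvm_vmx".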

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/idtentry.h | 16 ++++++----------
 arch/x86/kernel/nmi.c           |  8 ++++----
 arch/x86/kvm/vmx/vmx.c          |  4 ++--
 3 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 72184b0b2219..b241af4ce9b4 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -582,18 +582,14 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC,	xenpv_exc_machine_check);
 
 /* NMI */
 
-#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
+#if IS_ENABLED(CONFIG_KVM_INTEL)
 /*
- * Special NOIST entry point for VMX which invokes this on the kernel
- * stack. asm_exc_nmi() requires an IST to work correctly vs. the NMI
- * 'executing' marker.
- *
- * On 32bit this just uses the regular NMI entry point because 32-bit does
- * not have ISTs.
+ * Special entry point for VMX which invokes this on the kernel stack, even for
+ * 64-bit, i.e. without using an IST.  asm_exc_nmi() requires an IST to work
+ * correctly vs. the NMI 'executing' marker.  Used for 32-bit kernels as well
+ * to avoid more ifdeffery.
  */
-DECLARE_IDTENTRY(X86_TRAP_NMI,		exc_nmi_noist);
-#else
-#define asm_exc_nmi_noist		asm_exc_nmi
+DECLARE_IDTENTRY(X86_TRAP_NMI,		exc_nmi_kvm_vmx);
 #endif
 
 DECLARE_IDTENTRY_NMI(X86_TRAP_NMI,	exc_nmi);
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index cec0bfa3bc04..e37faba95bb5 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -527,14 +527,14 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
 		mds_user_clear_cpu_buffers();
 }
 
-#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
-DEFINE_IDTENTRY_RAW(exc_nmi_noist)
+#if IS_ENABLED(CONFIG_KVM_INTEL)
+DEFINE_IDTENTRY_RAW(exc_nmi_kvm_vmx)
 {
 	exc_nmi(regs);
 }
-#endif
 #if IS_MODULE(CONFIG_KVM_INTEL)
-EXPORT_SYMBOL_GPL(asm_exc_nmi_noist);
+EXPORT_SYMBOL_GPL(asm_exc_nmi_kvm_vmx);
+#endif
 #endif
 
 void stop_nmi(void)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e2c96f204b82..7ace22ee240d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6791,7 +6791,7 @@ void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
 static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
 					unsigned long entry)
 {
-	bool is_nmi = entry == (unsigned long)asm_exc_nmi_noist;
+	bool is_nmi = entry == (unsigned long)asm_exc_nmi_kvm_vmx;
 
 	kvm_before_interrupt(vcpu, is_nmi ? KVM_HANDLING_NMI : KVM_HANDLING_IRQ);
 	vmx_do_interrupt_nmi_irqoff(entry);
@@ -6820,7 +6820,7 @@ static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
 
 static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 {
-	const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
+	const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_kvm_vmx;
 	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
 
 	/* if exit due to PF check for async PF */
-- 
2.39.0.rc1.256.g54fd8350bd-goog


* [PATCH 6/7] KVM: VMX: Provide separate subroutines for invoking NMI vs. IRQ handlers
From: Sean Christopherson @ 2022-12-13  6:09 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Peter Zijlstra, Andy Lutomirski, Thomas Gleixner

Split the asm subroutines for handling NMIs versus IRQs that occur in the
guest so that the NMI handler can be called from a noinstr section.  As a
bonus, the NMI path doesn't need an indirect branch.
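
For reference, the synthetic hardware-style frame built by
VMX_DO_EVENT_IRQOFF for 64-bit (highest address first), which is what allows
the callee to return via IRET and thus unblock NMIs:

	SS	= __KERNEL_DS
	RSP	= pre-alignment stack pointer (preserved in RBP)
	RFLAGS	  (from PUSHF)
	CS	= __KERNEL_CS
	RIP	  (pushed by the CALL into the handler)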

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmenter.S | 70 +++++++++++++++++++++-----------------
 arch/x86/kvm/vmx/vmx.c     | 26 ++++++--------
 2 files changed, 50 insertions(+), 46 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 766c6b3ef5ed..9d987e7e48c4 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -31,6 +31,39 @@
 #define VCPU_R15	__VCPU_REGS_R15 * WORD_SIZE
 #endif
 
+.macro VMX_DO_EVENT_IRQOFF call_insn call_target
+	/*
+	 * Unconditionally create a stack frame, getting the correct RSP on the
+	 * stack (for x86-64) would take two instructions anyways, and RBP can
+	 * be used to restore RSP to make objtool happy (see below).
+	 */
+	push %_ASM_BP
+	mov %_ASM_SP, %_ASM_BP
+
+#ifdef CONFIG_X86_64
+	/*
+	 * Align RSP to a 16-byte boundary (to emulate CPU behavior) before
+	 * creating the synthetic interrupt stack frame for the IRQ/NMI.
+	 */
+	and  $-16, %rsp
+	push $__KERNEL_DS
+	push %rbp
+#endif
+	pushf
+	push $__KERNEL_CS
+	\call_insn \call_target
+
+	/*
+	 * "Restore" RSP from RBP, even though IRET has already unwound RSP to
+	 * the correct value.  objtool doesn't know the callee will IRET and,
+	 * without the explicit restore, thinks the stack is getting walloped.
+	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
+	 */
+	mov %_ASM_BP, %_ASM_SP
+	pop %_ASM_BP
+	RET
+.endm
+
 .section .noinstr.text, "ax"
 
 /**
@@ -320,35 +353,10 @@ SYM_FUNC_START(vmread_error_trampoline)
 SYM_FUNC_END(vmread_error_trampoline)
 #endif
 
-SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
-	/*
-	 * Unconditionally create a stack frame, getting the correct RSP on the
-	 * stack (for x86-64) would take two instructions anyways, and RBP can
-	 * be used to restore RSP to make objtool happy (see below).
-	 */
-	push %_ASM_BP
-	mov %_ASM_SP, %_ASM_BP
+SYM_FUNC_START(vmx_do_nmi_irqoff)
+	VMX_DO_EVENT_IRQOFF call asm_exc_nmi_kvm_vmx
+SYM_FUNC_END(vmx_do_nmi_irqoff)
 
-#ifdef CONFIG_X86_64
-	/*
-	 * Align RSP to a 16-byte boundary (to emulate CPU behavior) before
-	 * creating the synthetic interrupt stack frame for the IRQ/NMI.
-	 */
-	and  $-16, %rsp
-	push $__KERNEL_DS
-	push %rbp
-#endif
-	pushf
-	push $__KERNEL_CS
-	CALL_NOSPEC _ASM_ARG1
-
-	/*
-	 * "Restore" RSP from RBP, even though IRET has already unwound RSP to
-	 * the correct value.  objtool doesn't know the callee will IRET and,
-	 * without the explicit restore, thinks the stack is getting walloped.
-	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
-	 */
-	mov %_ASM_BP, %_ASM_SP
-	pop %_ASM_BP
-	RET
-SYM_FUNC_END(vmx_do_interrupt_nmi_irqoff)
+SYM_FUNC_START(vmx_do_interrupt_irqoff)
+	VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1
+SYM_FUNC_END(vmx_do_interrupt_irqoff)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 7ace22ee240d..c242e2591896 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6786,17 +6786,8 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 	memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir));
 }
 
-void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
-
-static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
-					unsigned long entry)
-{
-	bool is_nmi = entry == (unsigned long)asm_exc_nmi_kvm_vmx;
-
-	kvm_before_interrupt(vcpu, is_nmi ? KVM_HANDLING_NMI : KVM_HANDLING_IRQ);
-	vmx_do_interrupt_nmi_irqoff(entry);
-	kvm_after_interrupt(vcpu);
-}
+void vmx_do_interrupt_irqoff(unsigned long entry);
+void vmx_do_nmi_irqoff(void);
 
 static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
 {
@@ -6820,7 +6811,6 @@ static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
 
 static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 {
-	const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_kvm_vmx;
 	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
 
 	/* if exit due to PF check for async PF */
@@ -6833,8 +6823,11 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 	else if (is_machine_check(intr_info))
 		kvm_machine_check();
 	/* We need to handle NMIs before interrupts are enabled */
-	else if (is_nmi(intr_info))
-		handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry);
+	else if (is_nmi(intr_info)) {
+		kvm_before_interrupt(&vmx->vcpu, KVM_HANDLING_NMI);
+		vmx_do_nmi_irqoff();
+		kvm_after_interrupt(&vmx->vcpu);
+	}
 }
 
 static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
@@ -6847,7 +6840,10 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
 	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
 		return;
 
-	handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
+	kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
+	vmx_do_interrupt_irqoff(gate_offset(desc));
+	kvm_after_interrupt(vcpu);
+
 	vcpu->arch.at_instruction_boundary = true;
 }
 
-- 
2.39.0.rc1.256.g54fd8350bd-goog


* [PATCH 7/7] KVM: VMX: Handle NMI VM-Exits in noinstr region
From: Sean Christopherson @ 2022-12-13  6:09 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Peter Zijlstra, Andy Lutomirski, Thomas Gleixner

Move VMX's handling of NMI VM-Exits into vmx_vcpu_enter_exit() so that
the NMI is handled prior to leaving the safety of noinstr.  Handling the
NMI after leaving noinstr exposes the kernel to potential ordering
problems as an instrumentation-induced fault, e.g. #DB, #BP, #PF, etc.
will unblock NMIs when IRETing back to the faulting instruction.
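
Concretely, the bad ordering being closed off (sketch):

	1. NMI arrives in guest    => VM-Exit, hardware blocks NMIs
	2. KVM leaves noinstr      => instrumentation can run
	3. instrumentation faults  => #DB/#BP/#PF, etc.
	4. fault handler IRETs     => NMIs unblocked
	5. second NMI delivered    => before the first NMI was serviced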

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmcs.h    |  4 ++--
 arch/x86/kvm/vmx/vmenter.S |  8 ++++----
 arch/x86/kvm/vmx/vmx.c     | 34 +++++++++++++++++++++-------------
 arch/x86/kvm/x86.h         |  6 +++---
 4 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h
index ac290a44a693..7c1996b433e2 100644
--- a/arch/x86/kvm/vmx/vmcs.h
+++ b/arch/x86/kvm/vmx/vmcs.h
@@ -75,7 +75,7 @@ struct loaded_vmcs {
 	struct vmcs_controls_shadow controls_shadow;
 };
 
-static inline bool is_intr_type(u32 intr_info, u32 type)
+static __always_inline bool is_intr_type(u32 intr_info, u32 type)
 {
 	const u32 mask = INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK;
 
@@ -146,7 +146,7 @@ static inline bool is_icebp(u32 intr_info)
 	return is_intr_type(intr_info, INTR_TYPE_PRIV_SW_EXCEPTION);
 }
 
-static inline bool is_nmi(u32 intr_info)
+static __always_inline bool is_nmi(u32 intr_info)
 {
 	return is_intr_type(intr_info, INTR_TYPE_NMI_INTR);
 }
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 9d987e7e48c4..059243085211 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -299,6 +299,10 @@ SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL)
 
 SYM_FUNC_END(__vmx_vcpu_run)
 
+SYM_FUNC_START(vmx_do_nmi_irqoff)
+	VMX_DO_EVENT_IRQOFF call asm_exc_nmi_kvm_vmx
+SYM_FUNC_END(vmx_do_nmi_irqoff)
+
 
 .section .text, "ax"
 
@@ -353,10 +357,6 @@ SYM_FUNC_START(vmread_error_trampoline)
 SYM_FUNC_END(vmread_error_trampoline)
 #endif
 
-SYM_FUNC_START(vmx_do_nmi_irqoff)
-	VMX_DO_EVENT_IRQOFF call asm_exc_nmi_kvm_vmx
-SYM_FUNC_END(vmx_do_nmi_irqoff)
-
 SYM_FUNC_START(vmx_do_interrupt_irqoff)
 	VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1
 SYM_FUNC_END(vmx_do_interrupt_irqoff)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c242e2591896..b03020ca1840 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5095,8 +5095,13 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 	vect_info = vmx->idt_vectoring_info;
 	intr_info = vmx_get_intr_info(vcpu);
 
+	/*
+	 * Machine checks are handled by handle_exception_irqoff(), or by
+	 * vmx_vcpu_run() if a #MC occurs on VM-Entry.  NMIs are handled by
+	 * vmx_vcpu_enter_exit().
+	 */
 	if (is_machine_check(intr_info) || is_nmi(intr_info))
-		return 1; /* handled by handle_exception_nmi_irqoff() */
+		return 1;
 
 	/*
 	 * Queue the exception here instead of in handle_nm_fault_irqoff().
@@ -6809,7 +6814,7 @@ static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
 		rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
 }
 
-static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
+static void handle_exception_irqoff(struct vcpu_vmx *vmx)
 {
 	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
 
@@ -6822,12 +6827,6 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 	/* Handle machine checks before interrupts are enabled */
 	else if (is_machine_check(intr_info))
 		kvm_machine_check();
-	/* We need to handle NMIs before interrupts are enabled */
-	else if (is_nmi(intr_info)) {
-		kvm_before_interrupt(&vmx->vcpu, KVM_HANDLING_NMI);
-		vmx_do_nmi_irqoff();
-		kvm_after_interrupt(&vmx->vcpu);
-	}
 }
 
 static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
@@ -6857,7 +6856,7 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 	if (vmx->exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT)
 		handle_external_interrupt_irqoff(vcpu);
 	else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI)
-		handle_exception_nmi_irqoff(vmx);
+		handle_exception_irqoff(vmx);
 }
 
 /*
@@ -7119,6 +7118,18 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 
 	vmx_enable_fb_clear(vmx);
 
+	if (unlikely(vmx->fail))
+		vmx->exit_reason.full = 0xdead;
+	else
+		vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
+
+	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
+	    is_nmi(vmx_get_intr_info(vcpu))) {
+		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
+		vmx_do_nmi_irqoff();
+		kvm_after_interrupt(vcpu);
+	}
+
 	guest_state_exit_irqoff();
 }
 
@@ -7260,12 +7271,9 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	vmx->idt_vectoring_info = 0;
 
-	if (unlikely(vmx->fail)) {
-		vmx->exit_reason.full = 0xdead;
+	if (unlikely(vmx->fail))
 		return EXIT_FASTPATH_NONE;
-	}
 
-	vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
 	if (unlikely((u16)vmx->exit_reason.basic == EXIT_REASON_MCE_DURING_VMENTRY))
 		kvm_machine_check();
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 9de72586f406..44d1827f0a30 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -382,13 +382,13 @@ enum kvm_intr_type {
 	KVM_HANDLING_NMI,
 };
 
-static inline void kvm_before_interrupt(struct kvm_vcpu *vcpu,
-					enum kvm_intr_type intr)
+static __always_inline void kvm_before_interrupt(struct kvm_vcpu *vcpu,
+						 enum kvm_intr_type intr)
 {
 	WRITE_ONCE(vcpu->arch.handling_intr_from_guest, (u8)intr);
 }
 
-static inline void kvm_after_interrupt(struct kvm_vcpu *vcpu)
+static __always_inline void kvm_after_interrupt(struct kvm_vcpu *vcpu)
 {
 	WRITE_ONCE(vcpu->arch.handling_intr_from_guest, 0);
 }
-- 
2.39.0.rc1.256.g54fd8350bd-goog


* Re: [PATCH 5/7] x86/entry: KVM: Use dedicated VMX NMI entry for 32-bit kernels too
From: Lai Jiangshan @ 2022-12-14  8:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Peter Zijlstra,
	Andy Lutomirski, Thomas Gleixner

On Tue, Dec 13, 2022 at 2:11 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Use a dedicated entry for invoking the NMI handler from KVM VMX's VM-Exit
> path for 32-bit even though using a dedicated entry for 32-bit isn't
> strictly necessary.  Exposing a single symbol will allow KVM to reference
> the entry point in assembly code without having to resort to more #ifdefs
> (or #defines).  idtentry.h is intended to be included from asm files only
> once, and so simply including idtentry.h in KVM assembly isn't an option.
>
> Bypassing the ESP fixup and CR3 switching in the standard NMI entry code
> is safe as KVM always handles NMIs that occur in the guest on a kernel
> stack, with a kernel CR3.
>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/idtentry.h | 16 ++++++----------
>  arch/x86/kernel/nmi.c           |  8 ++++----
>  arch/x86/kvm/vmx/vmx.c          |  4 ++--
>  3 files changed, 12 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
> index 72184b0b2219..b241af4ce9b4 100644
> --- a/arch/x86/include/asm/idtentry.h
> +++ b/arch/x86/include/asm/idtentry.h
> @@ -582,18 +582,14 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC, xenpv_exc_machine_check);
>
>  /* NMI */
>
> -#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
> +#if IS_ENABLED(CONFIG_KVM_INTEL)
>  /*
> - * Special NOIST entry point for VMX which invokes this on the kernel
> - * stack. asm_exc_nmi() requires an IST to work correctly vs. the NMI
> - * 'executing' marker.
> - *
> - * On 32bit this just uses the regular NMI entry point because 32-bit does
> - * not have ISTs.
> + * Special entry point for VMX which invokes this on the kernel stack, even for
> + * 64-bit, i.e. without using an IST.  asm_exc_nmi() requires an IST to work
> + * correctly vs. the NMI 'executing' marker.  Used for 32-bit kernels as well
> + * to avoid more ifdeffery.
>   */
> -DECLARE_IDTENTRY(X86_TRAP_NMI,         exc_nmi_noist);
> -#else
> -#define asm_exc_nmi_noist              asm_exc_nmi
> +DECLARE_IDTENTRY(X86_TRAP_NMI,         exc_nmi_kvm_vmx);

Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>

* RE: [PATCH 0/7] KVM: VMX: Handle NMI VM-Exits in noinstr section
From: Li, Xin3 @ 2022-12-14 17:09 UTC (permalink / raw)
  To: Christopherson,, Sean, Paolo Bonzini
  Cc: kvm, linux-kernel, Peter Zijlstra, Lutomirski, Andy, Thomas Gleixner

> I kept the use of a direct call to a dedicated entry point for NMIs
> (doubled down really).  AFAICT, there are no issues with the direct call
> in the current code, and I don't know enough about FRED to know if using
> INT $2 would be better or worse, i.e. less churn seemed like the way to
> go.  And if reverting to INT $2 in the future is desirable, splitting NMI
> and IRQ handling makes it quite easy to do so as all the relevant code
> that needs to be ripped out is isolated.

Thanks for making this change.

There is no big difference between "int $2" and calling the NMI handler explicitly.

Xin

* RE: [PATCH 6/7] KVM: VMX: Provide separate subroutines for invoking NMI vs. IRQ handlers
From: Li, Xin3 @ 2022-12-14 21:23 UTC (permalink / raw)
  To: Christopherson,, Sean, Paolo Bonzini
  Cc: kvm, linux-kernel, Peter Zijlstra, Lutomirski, Andy, Thomas Gleixner

> +
> +	/*
> +	 * "Restore" RSP from RBP, even though IRET has already unwound RSP to
> +	 * the correct value.  objtool doesn't know the callee will IRET and,
> +	 * without the explicit restore, thinks the stack is getting walloped.
> +	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
> +	 */
> +	mov %_ASM_BP, %_ASM_SP
> +	pop %_ASM_BP
> +	RET

For NMI, after this RET instruction, we continue to block NMIs. IRET instead?

> +.endm
> +
>  .section .noinstr.text, "ax"


* Re: [PATCH 6/7] KVM: VMX: Provide separate subroutines for invoking NMI vs. IRQ handlers
From: Sean Christopherson @ 2022-12-15  0:26 UTC (permalink / raw)
  To: Li, Xin3
  Cc: Paolo Bonzini, kvm, linux-kernel, Peter Zijlstra, Lutomirski,
	Andy, Thomas Gleixner

On Wed, Dec 14, 2022, Li, Xin3 wrote:
> > +
> > +	/*
> > +	 * "Restore" RSP from RBP, even though IRET has already unwound RSP to
> > +	 * the correct value.  objtool doesn't know the callee will IRET and,
> > +	 * without the explicit restore, thinks the stack is getting walloped.
> > +	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
> > +	 */
> > +	mov %_ASM_BP, %_ASM_SP
> > +	pop %_ASM_BP
> > +	RET
> 
> For NMI, after this RET instruction, we continue to block NMIs. IRET instead?

No, IRET has already been done by the kernel-provided handler, e.g. asm_exc_nmi_kvm_vmx()
by way of error_return().  That's why the CALL above (that got snipped) is preceded
by the creation of a synthetic NMI/IRQ stack frame: the target returns from the CALL
via IRET, not RET.

* RE: [PATCH 6/7] KVM: VMX: Provide separate subroutines for invoking NMI vs. IRQ handlers
From: Li, Xin3 @ 2022-12-15  3:06 UTC (permalink / raw)
  To: Christopherson,, Sean
  Cc: Paolo Bonzini, kvm, linux-kernel, Peter Zijlstra, Lutomirski,
	Andy, Thomas Gleixner

> > > +	 * "Restore" RSP from RBP, even though IRET has already unwound RSP to
> > > +	 * the correct value.  objtool doesn't know the callee will IRET and,
> > > +	 * without the explicit restore, thinks the stack is getting walloped.
> > > +	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
> > > +	 */
> > > +	mov %_ASM_BP, %_ASM_SP
> > > +	pop %_ASM_BP
> > > +	RET
> >
> > For NMI, after this RET instruction, we continue to block NMIs. IRET
> > instead?
> 
> No, IRET has already been done by the kernel-provided handler, e.g.
> asm_exc_nmi_kvm_vmx() by way of error_return().  That's why the CALL above
> (that got snipped) is preceded by the creation of a synthetic NMI/IRQ stack
> frame: the target returns from the CALL via IRET, not RET.


You're right, this assembly makes another call into the asm entry point, which
returns here with IRET.

* RE: [PATCH 6/7] KVM: VMX: Provide separate subroutines for invoking NMI vs. IRQ handlers
From: Li, Xin3 @ 2022-12-15  5:18 UTC (permalink / raw)
  To: Li, Xin3, Christopherson,, Sean
  Cc: Paolo Bonzini, kvm, linux-kernel, Peter Zijlstra, Lutomirski,
	Andy, Thomas Gleixner

> > > > +	 * "Restore" RSP from RBP, even though IRET has already unwound RSP to
> > > > +	 * the correct value.  objtool doesn't know the callee will IRET and,
> > > > +	 * without the explicit restore, thinks the stack is getting walloped.
> > > > +	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
> > > > +	 */
> > > > +	mov %_ASM_BP, %_ASM_SP
> > > > +	pop %_ASM_BP
> > > > +	RET
> > >
> > > For NMI, after this RET instruction, we continue to block NMIs. IRET
> > > instead?
> >
> > No, IRET has already been done by the kernel-provided handler, e.g.
> > asm_exc_nmi_kvm_vmx() by way of error_return().  That's why the CALL above
> > (that got snipped) is preceded by the creation of a synthetic NMI/IRQ
> > stack frame: the target returns from the CALL via IRET, not RET.
> 
> 
> You're right, this assembly makes another call into the asm entry point, which
> returns here with IRET.

Like IRET for IDT, we _need_ ERETS for FRED to unblock NMIs ASAP.  However,
a FRED stack frame carries much more information than an IDT stack frame,
so it's complicated to create a FRED stack frame by hand in assembly.

So I prefer "INT $2" for FRED now.

I will post the FRED patch set soon; let's sync up on this afterwards.

Xin

* RE: [PATCH 0/7] KVM: VMX: Handle NMI VM-Exits in noinstr section
From: Li, Xin3 @ 2023-01-18 19:14 UTC (permalink / raw)
  To: Christopherson,, Sean, Paolo Bonzini
  Cc: kvm, linux-kernel, Peter Zijlstra, Lutomirski, Andy, Thomas Gleixner

Sean,

Is this merged into the x86 KVM tree?

Thanks!
    Xin

> -----Original Message-----
> From: Sean Christopherson <seanjc@google.com>
> Subject: [PATCH 0/7] KVM: VMX: Handle NMI VM-Exits in noinstr section
>
> [...]


* Re: [PATCH 0/7] KVM: VMX: Handle NMI VM-Exits in noinstr section
From: Sean Christopherson @ 2023-01-18 20:38 UTC (permalink / raw)
  To: Li, Xin3
  Cc: Paolo Bonzini, kvm, linux-kernel, Peter Zijlstra, Lutomirski,
	Andy, Thomas Gleixner

On Wed, Jan 18, 2023, Li, Xin3 wrote:
> Sean,
> 
> Is this merged into x86 KVM tree?

No, I want reviews for the KVM patches before merging, and need acks for the
non-KVM changes.

* RE: [PATCH 0/7] KVM: VMX: Handle NMI VM-Exits in noinstr section
From: Li, Xin3 @ 2023-01-19  1:54 UTC (permalink / raw)
  To: Christopherson,, Sean
  Cc: Paolo Bonzini, kvm, linux-kernel, Peter Zijlstra, Lutomirski,
	Andy, Thomas Gleixner


> > Is this merged into x86 KVM tree?
> 
> No, I want reviews for the KVM patches before merging, and need acks for the
> non-KVM changes.

I guess you want Peter Zijlstra, or some other x86 maintainers, to ack it.

* Re: [PATCH 7/7] KVM: VMX: Handle NMI VM-Exits in noinstr region
From: Peter Zijlstra @ 2023-01-19  9:49 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Andy Lutomirski, Thomas Gleixner

On Tue, Dec 13, 2022 at 06:09:12AM +0000, Sean Christopherson wrote:

> @@ -7119,6 +7118,18 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>  
>  	vmx_enable_fb_clear(vmx);
>  
> +	if (unlikely(vmx->fail))
> +		vmx->exit_reason.full = 0xdead;
> +	else
> +		vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
> +
> +	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
> +	    is_nmi(vmx_get_intr_info(vcpu))) {
> +		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> +		vmx_do_nmi_irqoff();
> +		kvm_after_interrupt(vcpu);
> +	}
> +
>  	guest_state_exit_irqoff();
>  }

I think we're going to have to sprinkle __always_inline on the
kvm_{before,after}_interrupt() things (or I missed the earlier patches
doing this already); sometimes compilers are just weird.

* Re: [PATCH 0/7] KVM: VMX: Handle NMI VM-Exits in noinstr section
From: Peter Zijlstra @ 2023-01-19  9:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Andy Lutomirski, Thomas Gleixner

On Tue, Dec 13, 2022 at 06:09:05AM +0000, Sean Christopherson wrote:

> Sean Christopherson (7):
>   KVM: x86: Make vmx_get_exit_qual() and vmx_get_intr_info()
>     noinstr-friendly
>   KVM: VMX: Allow VM-Fail path of VMREAD helper to be instrumented
>   KVM: VMX: Always inline eVMCS read/write helpers
>   KVM: VMX: Always inline to_vmx() and to_kvm_vmx()
>   x86/entry: KVM: Use dedicated VMX NMI entry for 32-bit kernels too
>   KVM: VMX: Provide separate subroutines for invoking NMI vs. IRQ
>     handlers
>   KVM: VMX: Handle NMI VM-Exits in noinstr region

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

* Re: [PATCH 7/7] KVM: VMX: Handle NMI VM-Exits in noinstr region
From: Sean Christopherson @ 2023-01-19 15:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paolo Bonzini, kvm, linux-kernel, Andy Lutomirski, Thomas Gleixner

On Thu, Jan 19, 2023, Peter Zijlstra wrote:
> On Tue, Dec 13, 2022 at 06:09:12AM +0000, Sean Christopherson wrote:
> 
> > @@ -7119,6 +7118,18 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> >  
> >  	vmx_enable_fb_clear(vmx);
> >  
> > +	if (unlikely(vmx->fail))
> > +		vmx->exit_reason.full = 0xdead;
> > +	else
> > +		vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
> > +
> > +	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
> > +	    is_nmi(vmx_get_intr_info(vcpu))) {
> > +		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> > +		vmx_do_nmi_irqoff();
> > +		kvm_after_interrupt(vcpu);
> > +	}
> > +
> >  	guest_state_exit_irqoff();
> >  }
> 
> I think we're going to have to sprinkle __always_inline on the
> kvm_{before,after}_interrupt() things (or I missed the earlier patches
> doing this already), sometimes compilers are just weird.

It's in this patch, just lurking at the bottom.

> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 9de72586f406..44d1827f0a30 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -382,13 +382,13 @@ enum kvm_intr_type {
>         KVM_HANDLING_NMI,
>  };
>
> -static inline void kvm_before_interrupt(struct kvm_vcpu *vcpu,
> -                                       enum kvm_intr_type intr)
> +static __always_inline void kvm_before_interrupt(struct kvm_vcpu *vcpu,
> +                                                enum kvm_intr_type intr)
>  {
>         WRITE_ONCE(vcpu->arch.handling_intr_from_guest, (u8)intr);
>  }
>
> -static inline void kvm_after_interrupt(struct kvm_vcpu *vcpu)
> +static __always_inline void kvm_after_interrupt(struct kvm_vcpu *vcpu)
>  {
>         WRITE_ONCE(vcpu->arch.handling_intr_from_guest, 0);
>  }

* Re: [PATCH 7/7] KVM: VMX: Handle NMI VM-Exits in noinstr region
From: Peter Zijlstra @ 2023-01-19 15:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Andy Lutomirski, Thomas Gleixner

On Thu, Jan 19, 2023 at 03:39:55PM +0000, Sean Christopherson wrote:

> It's in this patch, just lurking at the bottom.

Ha!, so much for reading ..

* Re: [PATCH 0/7] KVM: VMX: Handle NMI VM-Exits in noinstr section
From: Sean Christopherson @ 2023-01-28  0:07 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Peter Zijlstra, Andy Lutomirski, Thomas Gleixner

On Tue, 13 Dec 2022 06:09:05 +0000, Sean Christopherson wrote:
> Move NMI VM-Exit handling into vmx_vcpu_enter_exit() to fix a (mostly
> benign?) bug where NMIs can be unblocked prior to servicing the NMI that
> triggered the VM-Exit, e.g. if instrumentation triggers a fault and thus
> an IRET.  I deliberately didn't tag any of these for stable@ as the odds
> of me screwing something up or of a backport going sideways seems higher
> than out-of-order NMIs causing major problems.
> 
> [...]

Applied to kvm-x86 vmx, thanks!

[1/7] KVM: x86: Make vmx_get_exit_qual() and vmx_get_intr_info() noinstr-friendly
      https://github.com/kvm-x86/linux/commit/fc9465be8aad
[2/7] KVM: VMX: Allow VM-Fail path of VMREAD helper to be instrumented
      https://github.com/kvm-x86/linux/commit/8578f59657c5
[3/7] KVM: VMX: Always inline eVMCS read/write helpers
      https://github.com/kvm-x86/linux/commit/11633f69506d
[4/7] KVM: VMX: Always inline to_vmx() and to_kvm_vmx()
      https://github.com/kvm-x86/linux/commit/432727f1cb6e
[5/7] x86/entry: KVM: Use dedicated VMX NMI entry for 32-bit kernels too
      https://github.com/kvm-x86/linux/commit/54a3b70a75dc
[6/7] KVM: VMX: Provide separate subroutines for invoking NMI vs. IRQ handlers
      https://github.com/kvm-x86/linux/commit/4f76e86f7e0d
[7/7] KVM: VMX: Handle NMI VM-Exits in noinstr region
      https://github.com/kvm-x86/linux/commit/11df586d774f

--
https://github.com/kvm-x86/linux/tree/next
https://github.com/kvm-x86/linux/tree/fixes

* Re: [PATCH 7/7] KVM: VMX: Handle NMI VM-Exits in noinstr region
From: Like Xu @ 2023-08-24  6:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, Peter Zijlstra, Andy Lutomirski,
	Thomas Gleixner, Paolo Bonzini

On 13/12/2022 2:09 pm, Sean Christopherson wrote:
> Move VMX's handling of NMI VM-Exits into vmx_vcpu_enter_exit() so that
> the NMI is handled prior to leaving the safety of noinstr.  Handling the
> NMI after leaving noinstr exposes the kernel to potential ordering
> problems as an instrumentation-induced fault, e.g. #DB, #BP, #PF, etc.
> will unblock NMIs when IRETing back to the faulting instruction.
> 
> Reported-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> [...]
> +		vmx->exit_reason.full = 0xdead;
> +	else
> +		vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
> +
> +	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
> +	    is_nmi(vmx_get_intr_info(vcpu))) {
> +		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> +		vmx_do_nmi_irqoff();
> +		kvm_after_interrupt(vcpu);
> +	}

This part of the change brings at least two types of regression:

(1) A simple reproducible case is to collect PMIs inside the guest (e.g. via
perf), where the number of guest PMIs (a type of NMI) ends up very sparse.
Further, this part of the change doesn't call asm_exc_nmi_kvm_vmx as expected
and instead injects the PMI into the guest.

This is because vmx_get_intr_info() depends on this line:
	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
which is executed only after is_nmi(vmx_get_intr_info(vcpu)).

As a result, vmx_get_intr_info() always returns the stale value cached from
the previous VM-Exit instead of the latest value from
vmcs_read32(VM_EXIT_INTR_INFO).
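
For reference, here is a simplified sketch of the lazy-caching pattern
(based on the helpers added by patch 1 of this series; treat the exact
names and the register enum as illustrative of the mechanism):

static __always_inline bool kvm_register_test_and_mark_available(struct kvm_vcpu *vcpu,
								 enum kvm_reg reg)
{
	return arch___test_and_set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
}

static __always_inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	/* Only re-read the VMCS field if the cached value was invalidated. */
	if (!kvm_register_test_and_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_2))
		vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);

	return vmx->exit_intr_info;
}

So if the "available" bit is still set from the previous VM-Exit, the
vmcs_read32() is skipped entirely and the stale exit_intr_info is returned.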

(2) Even worse, when control flow is transferred to the host NMI handler, the
context needed by the handler has not yet been properly restored:

- update_debugctlmsr(vmx->host_debugctlmsr);
- pt_guest_exit();
- kvm_load_host_xsave_state(); // restore Arch LBR for the host NMI handler

(3) In addition, trace_kvm_exit() should ideally appear before the host NMI
trace logs, which makes the traces easier to follow.

A proposed fix is to delay vmx_do_nmi_irqoff() a little, rather than reverting the move:

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e6849f780dba..1f29b7f22da7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7230,13 +7230,6 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
  	else
  		vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);

-	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
-	    is_nmi(vmx_get_intr_info(vcpu))) {
-		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
-		vmx_do_nmi_irqoff();
-		kvm_after_interrupt(vcpu);
-	}
-
  	guest_state_exit_irqoff();
  }

@@ -7389,6 +7382,13 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)

  	trace_kvm_exit(vcpu, KVM_ISA_VMX);

+	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
+	    is_nmi(vmx_get_intr_info(vcpu))) {
+		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
+		vmx_do_nmi_irqoff();
+		kvm_after_interrupt(vcpu);
+	}
+
  	if (unlikely(vmx->exit_reason.failed_vmentry))
  		return EXIT_FASTPATH_NONE;

Please help confirm this change; at the very least, it fixes a lot of vPMI tests.

> +
>   	guest_state_exit_irqoff();
>   }
>   
> @@ -7260,12 +7271,9 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>   
>   	vmx->idt_vectoring_info = 0;
>   
> -	if (unlikely(vmx->fail)) {
> -		vmx->exit_reason.full = 0xdead;
> +	if (unlikely(vmx->fail))
>   		return EXIT_FASTPATH_NONE;
> -	}
>   
> -	vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
>   	if (unlikely((u16)vmx->exit_reason.basic == EXIT_REASON_MCE_DURING_VMENTRY))
>   		kvm_machine_check();
>   
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 9de72586f406..44d1827f0a30 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -382,13 +382,13 @@ enum kvm_intr_type {
>   	KVM_HANDLING_NMI,
>   };
>   
> -static inline void kvm_before_interrupt(struct kvm_vcpu *vcpu,
> -					enum kvm_intr_type intr)
> +static __always_inline void kvm_before_interrupt(struct kvm_vcpu *vcpu,
> +						 enum kvm_intr_type intr)
>   {
>   	WRITE_ONCE(vcpu->arch.handling_intr_from_guest, (u8)intr);
>   }
>   
> -static inline void kvm_after_interrupt(struct kvm_vcpu *vcpu)
> +static __always_inline void kvm_after_interrupt(struct kvm_vcpu *vcpu)
>   {
>   	WRITE_ONCE(vcpu->arch.handling_intr_from_guest, 0);
>   }

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 7/7] KVM: VMX: Handle NMI VM-Exits in noinstr region
  2023-08-24  6:57   ` Like Xu
@ 2023-08-24 14:16     ` Sean Christopherson
  2023-08-24 14:26       ` Sean Christopherson
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2023-08-24 14:16 UTC (permalink / raw)
  To: Like Xu
  Cc: kvm, linux-kernel, Peter Zijlstra, Andy Lutomirski,
	Thomas Gleixner, Paolo Bonzini

On Thu, Aug 24, 2023, Like Xu wrote:
> On 13/12/2022 2:09 pm, Sean Christopherson wrote:
> > Move VMX's handling of NMI VM-Exits into vmx_vcpu_enter_exit() so that
> > the NMI is handled prior to leaving the safety of noinstr.  Handling the
> > NMI after leaving noinstr exposes the kernel to potential ordering
> > problems as an instrumentation-induced fault, e.g. #DB, #BP, #PF, etc.
> > will unblock NMIs when IRETing back to the faulting instruction.
> (3) In addition, trace_kvm_exit() should ideally appear before the host NMI
> trace logs, which makes the traces easier to follow.

Ideally, yes, but tracepoints are not remotely noinstr friendly.

> A proposed fix is to delay vmx_do_nmi_irqoff() a little, rather than reverting the move:
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index e6849f780dba..1f29b7f22da7 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7230,13 +7230,6 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>  	else
>  		vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
> 
> -	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
> -	    is_nmi(vmx_get_intr_info(vcpu))) {
> -		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> -		vmx_do_nmi_irqoff();
> -		kvm_after_interrupt(vcpu);
> -	}
> -
>  	guest_state_exit_irqoff();
>  }
> 
> @@ -7389,6 +7382,13 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
> 
>  	trace_kvm_exit(vcpu, KVM_ISA_VMX);
> 
> +	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
> +	    is_nmi(vmx_get_intr_info(vcpu))) {
> +		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> +		vmx_do_nmi_irqoff();
> +		kvm_after_interrupt(vcpu);
> +	}

No, the whole point of doing NMI handling in vmx_vcpu_enter_exit() is so that NMIs
are serviced before instrumentation is enabled.

I think the below is sufficient (untested at this point).  Not quite minimal, e.g.
I'm pretty sure there's (currently) no need to snapshot IDT_VECTORING_INFO_FIELD
so early, but I can't think of any reason to wait.

--
From: Sean Christopherson <seanjc@google.com>
Date: Thu, 24 Aug 2023 06:49:36 -0700
Subject: [PATCH] KVM: VMX: Refresh available regs and IDT vectoring info
 before NMI handling

Reset the mask of available "registers" and refresh the IDT vectoring
info snapshot in vmx_vcpu_enter_exit(), before KVM potentially handles
an NMI VM-Exit.  One of the "registers" that KVM VMX lazily loads is the
vmcs.VM_EXIT_INTR_INFO field, which holds the vector+type on "exception
or NMI" VM-Exits, i.e. is needed to identify NMIs.  Clearing the available
registers bitmask after handling NMIs results in KVM querying info from
the last VM-Exit that read vmcs.VM_EXIT_INTR_INFO, and leads to both
missed NMIs and spurious NMIs from the guest's perspective.

Opportunistically grab vmcs.IDT_VECTORING_INFO_FIELD early in the VM-Exit
path too, e.g. to guard against similar consumption of stale data.  The
field is read on every "normal" VM-Exit, and there's no point in delaying
the inevitable.

Reported-by: Like Xu <like.xu.linux@gmail.com>
Fixes: 11df586d774f ("KVM: VMX: Handle NMI VM-Exits in noinstr region")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e6849f780dba..d2b78ab7a9f2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7222,13 +7222,20 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 				   flags);
 
 	vcpu->arch.cr2 = native_read_cr2();
+	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
+
+	vmx->idt_vectoring_info = 0;
 
 	vmx_enable_fb_clear(vmx);
 
-	if (unlikely(vmx->fail))
+	if (unlikely(vmx->fail)) {
 		vmx->exit_reason.full = 0xdead;
-	else
-		vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
+		goto out;
+	}
+
+	vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
+	if (likely(!vmx->exit_reason.failed_vmentry))
+		vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
 
 	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
 	    is_nmi(vmx_get_intr_info(vcpu))) {
@@ -7237,6 +7244,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 		kvm_after_interrupt(vcpu);
 	}
 
+out:
 	guest_state_exit_irqoff();
 }
 
@@ -7358,8 +7366,6 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	loadsegment(es, __USER_DS);
 #endif
 
-	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
-
 	pt_guest_exit(vmx);
 
 	kvm_load_host_xsave_state(vcpu);
@@ -7376,17 +7382,12 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 		vmx->nested.nested_run_pending = 0;
 	}
 
-	vmx->idt_vectoring_info = 0;
-
 	if (unlikely(vmx->fail))
 		return EXIT_FASTPATH_NONE;
 
 	if (unlikely((u16)vmx->exit_reason.basic == EXIT_REASON_MCE_DURING_VMENTRY))
 		kvm_machine_check();
 
-	if (likely(!vmx->exit_reason.failed_vmentry))
-		vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
-
 	trace_kvm_exit(vcpu, KVM_ISA_VMX);
 
 	if (unlikely(vmx->exit_reason.failed_vmentry))

base-commit: fff2e47e6c3b8050ca26656693caa857e3a8b740
-- 


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 7/7] KVM: VMX: Handle NMI VM-Exits in noinstr region
  2023-08-24 14:16     ` Sean Christopherson
@ 2023-08-24 14:26       ` Sean Christopherson
  0 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2023-08-24 14:26 UTC (permalink / raw)
  To: Like Xu
  Cc: kvm, linux-kernel, Peter Zijlstra, Andy Lutomirski,
	Thomas Gleixner, Paolo Bonzini

On Thu, Aug 24, 2023, Sean Christopherson wrote:
> On Thu, Aug 24, 2023, Like Xu wrote:
> > @@ -7389,6 +7382,13 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
> > 
> >  	trace_kvm_exit(vcpu, KVM_ISA_VMX);
> > 
> > +	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
> > +	    is_nmi(vmx_get_intr_info(vcpu))) {
> > +		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> > +		vmx_do_nmi_irqoff();
> > +		kvm_after_interrupt(vcpu);
> > +	}
> 
> No, the whole point of doing NMI handling in vmx_vcpu_enter_exit() is so that NMIs
> are serviced before instrumentation is enabled.
> 
> I think the below is sufficient (untested at this point).  Not quite minimal, e.g.
> I'm pretty sure there's (currently) no need to snapshot IDT_VECTORING_INFO_FIELD
> so early, but I can't think of any reason to wait.
> 
> --
> From: Sean Christopherson <seanjc@google.com>
> Date: Thu, 24 Aug 2023 06:49:36 -0700
> Subject: [PATCH] KVM: VMX: Refresh available regs and IDT vectoring info
>  before NMI handling
> 
> Reset the mask of available "registers" and refresh the IDT vectoring
> info snapshot in vmx_vcpu_enter_exit(), before KVM potentially handles
> an NMI VM-Exit.  One of the "registers" that KVM VMX lazily loads is the
> vmcs.VM_EXIT_INTR_INFO field, which holds the vector+type on "exception
> or NMI" VM-Exits, i.e. is needed to identify NMIs.  Clearing the available
> registers bitmask after handling NMIs results in KVM querying info from
> the last VM-Exit that read vmcs.VM_EXIT_INTR_INFO, and leads to both
> missed NMIs and spurious NMIs from the guest's perspective.

Oof, it's not just from the guest's perspective; NMIs that are destined for host
consumption will suffer the same fate.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-08-24 14:27 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-13  6:09 [PATCH 0/7] KVM: VMX: Handle NMI VM-Exits in noinstr section Sean Christopherson
2022-12-13  6:09 ` [PATCH 1/7] KVM: x86: Make vmx_get_exit_qual() and vmx_get_intr_info() noinstr-friendly Sean Christopherson
2022-12-13  6:09 ` [PATCH 2/7] KVM: VMX: Allow VM-Fail path of VMREAD helper to be instrumented Sean Christopherson
2022-12-13  6:09 ` [PATCH 3/7] KVM: VMX: Always inline eVMCS read/write helpers Sean Christopherson
2022-12-13  6:09 ` [PATCH 4/7] KVM: VMX: Always inline to_vmx() and to_kvm_vmx() Sean Christopherson
2022-12-13  6:09 ` [PATCH 5/7] x86/entry: KVM: Use dedicated VMX NMI entry for 32-bit kernels too Sean Christopherson
2022-12-14  8:05   ` Lai Jiangshan
2022-12-13  6:09 ` [PATCH 6/7] KVM: VMX: Provide separate subroutines for invoking NMI vs. IRQ handlers Sean Christopherson
2022-12-14 21:23   ` Li, Xin3
2022-12-15  0:26     ` Sean Christopherson
2022-12-15  3:06       ` Li, Xin3
2022-12-15  5:18         ` Li, Xin3
2022-12-13  6:09 ` [PATCH 7/7] KVM: VMX: Handle NMI VM-Exits in noinstr region Sean Christopherson
2023-01-19  9:49   ` Peter Zijlstra
2023-01-19 15:39     ` Sean Christopherson
2023-01-19 15:52       ` Peter Zijlstra
2023-08-24  6:57   ` Like Xu
2023-08-24 14:16     ` Sean Christopherson
2023-08-24 14:26       ` Sean Christopherson
2022-12-14 17:09 ` [PATCH 0/7] KVM: VMX: Handle NMI VM-Exits in noinstr section Li, Xin3
2023-01-18 19:14 ` Li, Xin3
2023-01-18 20:38   ` Sean Christopherson
2023-01-19  1:54     ` Li, Xin3
2023-01-19  9:50 ` Peter Zijlstra
2023-01-28  0:07 ` Sean Christopherson
