All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] x86/traps,VMX: implement software based NMI/IRQ dispatch for VMX NMI/IRQ reinjection
@ 2022-11-10  5:53 Xin Li
  2022-11-10  5:53 ` [PATCH 1/6] x86/traps: let common_interrupt() handle IRQ_MOVE_CLEANUP_VECTOR Xin Li
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Xin Li @ 2022-11-10  5:53 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: tglx, mingo, bp, dave.hansen, hpa, seanjc, pbonzini, kevin.tian

Upon receiving an external interrupt, KVM VMX reinjects it through
calling the interrupt handler in its IDT descriptor on the current
kernel stack, which essentially uses the IDT as an interrupt dispatch
table.

However the IDT is one of the lowest level critical data structures
between a x86 CPU and the Linux kernel, we should avoid using it
*directly* whenever possible, espeically in a software defined manner.

On x86, external interrupts are divided into the following groups
  1) system interrupts
  2) external device interrupts
With the IDT, system interrupts are dispatched through the IDT
directly, while external device interrupts are all routed to the
external interrupt dispatch function common_interrupt(), which
dispatches external device interrupts through a per-CPU external
interrupt dispatch table vector_irq.

Implement software based NMI/IRQ dispatch for VMX NMI/IRQ reinjection
to eliminate dispatching external interrupts through the IDT with adding
a system interrupt handler table for dispatching a system interrupt
to its corresponding handler directly. Thus a software based dispatch
function will be:

  void external_interrupt(struct pt_regs *regs, u8 vector)
  {
    if (is_system_interrupt(vector))
      system_interrupt_handler_table[vector_to_sysvec(vector)](regs);
    else /* external device interrupt */
      common_interrupt(regs, vector);
  }

And the software dispatch approach nukes a bunch of assembly.

What's more, with the Intel FRED (Flexible Return and Event Delivery)
architecture, IDT, the hardware based event dispatch table, is gone,
and the Linux kernel needs to dispatch events to their handlers with
vector to handler mappings, the dispatch function external_interrupt()
is also needed.

H. Peter Anvin (Intel) (1):
  x86/traps: let common_interrupt() handle IRQ_MOVE_CLEANUP_VECTOR

Xin Li (5):
  x86/traps: add a system interrupt table for system interrupt dispatch
  x86/traps: add install_system_interrupt_handler()
  x86/traps: add external_interrupt() to dispatch external interrupts
  KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection
  x86/traps: remove unused NMI entry exc_nmi_noist()

 arch/x86/include/asm/idtentry.h  |  15 -----
 arch/x86/include/asm/traps.h     |  12 ++++
 arch/x86/kernel/cpu/acrn.c       |   7 +-
 arch/x86/kernel/cpu/mshyperv.c   |  22 ++++---
 arch/x86/kernel/irq.c            |   4 ++
 arch/x86/kernel/kvm.c            |   4 +-
 arch/x86/kernel/nmi.c            |  10 ---
 arch/x86/kernel/traps.c          | 107 +++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmenter.S       |  33 ----------
 arch/x86/kvm/vmx/vmx.c           |  19 ++----
 drivers/xen/events/events_base.c |   5 +-
 11 files changed, 156 insertions(+), 82 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/6] x86/traps: let common_interrupt() handle IRQ_MOVE_CLEANUP_VECTOR
  2022-11-10  5:53 [PATCH 0/6] x86/traps,VMX: implement software based NMI/IRQ dispatch for VMX NMI/IRQ reinjection Xin Li
@ 2022-11-10  5:53 ` Xin Li
  2022-11-10  5:53 ` [PATCH 2/6] x86/traps: add a system interrupt table for system interrupt dispatch Xin Li
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Xin Li @ 2022-11-10  5:53 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: tglx, mingo, bp, dave.hansen, hpa, seanjc, pbonzini, kevin.tian

From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

IRQ_MOVE_CLEANUP_VECTOR is the only one of the system IRQ vectors that
is *below* FIRST_SYSTEM_VECTOR. It is a slow path, so just push it
into common_interrupt() just before the spurios interrupt handling.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/kernel/irq.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 766ffe3ba313..7e125fff45ab 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -248,6 +248,10 @@ DEFINE_IDTENTRY_IRQ(common_interrupt)
 	desc = __this_cpu_read(vector_irq[vector]);
 	if (likely(!IS_ERR_OR_NULL(desc))) {
 		handle_irq(desc, regs);
+#ifdef CONFIG_SMP
+	} else if (vector == IRQ_MOVE_CLEANUP_VECTOR) {
+		sysvec_irq_move_cleanup(regs);
+#endif
 	} else {
 		ack_APIC_irq();
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/6] x86/traps: add a system interrupt table for system interrupt dispatch
  2022-11-10  5:53 [PATCH 0/6] x86/traps,VMX: implement software based NMI/IRQ dispatch for VMX NMI/IRQ reinjection Xin Li
  2022-11-10  5:53 ` [PATCH 1/6] x86/traps: let common_interrupt() handle IRQ_MOVE_CLEANUP_VECTOR Xin Li
@ 2022-11-10  5:53 ` Xin Li
  2022-11-10  5:53 ` [PATCH 3/6] x86/traps: add install_system_interrupt_handler() Xin Li
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Xin Li @ 2022-11-10  5:53 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: tglx, mingo, bp, dave.hansen, hpa, seanjc, pbonzini, kevin.tian

Upon receiving an external interrupt, KVM VMX reinjects it through
calling the interrupt handler in its IDT descriptor on the current
kernel stack, which essentially uses the IDT as an interrupt dispatch
table.

However the IDT is one of the lowest level critical data structures
between a x86 CPU and the Linux kernel, we should avoid using it
*directly* whenever possible, espeically in a software defined manner.

On x86, external interrupts are divided into the following groups
  1) system interrupts
  2) external device interrupts
With the IDT, system interrupts are dispatched through the IDT
directly, while external device interrupts are all routed to the
external interrupt dispatch function common_interrupt(), which
dispatches external device interrupts through a per-CPU external
interrupt dispatch table vector_irq.

To eliminate dispatching external interrupts through the IDT, add
a system interrupt handler table for dispatching a system interrupt
to its corresponding handler directly. Thus a software based dispatch
function will be:

  void external_interrupt(struct pt_regs *regs, u8 vector)
  {
    if (is_system_interrupt(vector))
      system_interrupt_handler_table[vector_to_sysvec(vector)](regs);
    else /* external device interrupt */
      common_interrupt(regs, vector);
  }

What's more, with the Intel FRED (Flexible Return and Event Delivery)
architecture, IDT, the hardware based event dispatch table, is gone,
and the Linux kernel needs to dispatch events to their handlers with
vector to handler mappings, the dispatch function external_interrupt()
is also needed.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/include/asm/traps.h |  8 ++++++
 arch/x86/kernel/traps.c      | 55 ++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 47ecfff2c83d..3dc63d753bda 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -47,4 +47,12 @@ void __noreturn handle_stack_overflow(struct pt_regs *regs,
 				      struct stack_info *info);
 #endif
 
+/*
+ * How system interrupt handlers are called.
+ */
+#define DECLARE_SYSTEM_INTERRUPT_HANDLER(f)			\
+	void f (struct pt_regs *regs __maybe_unused,		\
+		unsigned long vector __maybe_unused)
+typedef DECLARE_SYSTEM_INTERRUPT_HANDLER((*system_interrupt_handler));
+
 #endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 178015a820f0..95dd917ef9ad 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1444,6 +1444,61 @@ DEFINE_IDTENTRY_SW(iret_error)
 }
 #endif
 
+#define SYSV(x,y) [(x) - FIRST_SYSTEM_VECTOR] = (system_interrupt_handler)y
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wcast-function-type"
+
+/*
+ * The initializer spurious_interrupt() has two arguments of types struct
+ * pt_regs * and unsigned long, and the system interrupt handlers with
+ * prefix sysvec_ are all defined with either DEFINE_IDTENTRY_SYSVEC or
+ * DEFINE_IDTENTRY_SYSVEC_SIMPLE, both with only one argument of type
+ * struct pt_regs *. Because all handlers only declare and require a subset
+ * of the arguments provided by the full system_interrupt_handler prototype,
+ * the function type cast is safe here.
+ */
+const system_interrupt_handler system_interrupt_handler_table[NR_SYSTEM_VECTORS] = {
+	[0 ... NR_SYSTEM_VECTORS-1]		= spurious_interrupt,
+#ifdef CONFIG_SMP
+	SYSV(RESCHEDULE_VECTOR,			sysvec_reschedule_ipi),
+	SYSV(CALL_FUNCTION_VECTOR,		sysvec_call_function),
+	SYSV(CALL_FUNCTION_SINGLE_VECTOR,	sysvec_call_function_single),
+	SYSV(REBOOT_VECTOR,			sysvec_reboot),
+#endif
+
+#ifdef CONFIG_X86_THERMAL_VECTOR
+	SYSV(THERMAL_APIC_VECTOR,		sysvec_thermal),
+#endif
+
+#ifdef CONFIG_X86_MCE_THRESHOLD
+	SYSV(THRESHOLD_APIC_VECTOR,		sysvec_threshold),
+#endif
+
+#ifdef CONFIG_X86_MCE_AMD
+	SYSV(DEFERRED_ERROR_VECTOR,		sysvec_deferred_error),
+#endif
+
+#ifdef CONFIG_X86_LOCAL_APIC
+	SYSV(LOCAL_TIMER_VECTOR,		sysvec_apic_timer_interrupt),
+	SYSV(X86_PLATFORM_IPI_VECTOR,		sysvec_x86_platform_ipi),
+# ifdef CONFIG_HAVE_KVM
+	SYSV(POSTED_INTR_VECTOR,		sysvec_kvm_posted_intr_ipi),
+	SYSV(POSTED_INTR_WAKEUP_VECTOR,		sysvec_kvm_posted_intr_wakeup_ipi),
+	SYSV(POSTED_INTR_NESTED_VECTOR,		sysvec_kvm_posted_intr_nested_ipi),
+# endif
+# ifdef CONFIG_IRQ_WORK
+	SYSV(IRQ_WORK_VECTOR,			sysvec_irq_work),
+# endif
+	SYSV(SPURIOUS_APIC_VECTOR,		sysvec_spurious_apic_interrupt),
+	SYSV(ERROR_APIC_VECTOR,			sysvec_error_interrupt),
+#endif
+};
+
+#pragma GCC diagnostic pop
+
+#undef SYSV
+
 void __init trap_init(void)
 {
 	/* Init cpu_entry_area before IST entries are set up */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/6] x86/traps: add install_system_interrupt_handler()
  2022-11-10  5:53 [PATCH 0/6] x86/traps,VMX: implement software based NMI/IRQ dispatch for VMX NMI/IRQ reinjection Xin Li
  2022-11-10  5:53 ` [PATCH 1/6] x86/traps: let common_interrupt() handle IRQ_MOVE_CLEANUP_VECTOR Xin Li
  2022-11-10  5:53 ` [PATCH 2/6] x86/traps: add a system interrupt table for system interrupt dispatch Xin Li
@ 2022-11-10  5:53 ` Xin Li
  2022-11-10  5:53 ` [PATCH 4/6] x86/traps: add external_interrupt() to dispatch external interrupts Xin Li
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Xin Li @ 2022-11-10  5:53 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: tglx, mingo, bp, dave.hansen, hpa, seanjc, pbonzini, kevin.tian

Some kernel components install system interrupt handlers into the IDT,
and we need to do the same for system_interrupt_handler_table. A new
function install_system_interrupt_handler() is added to install a system
interrupt handler into both the IDT and system_interrupt_handler_table.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/include/asm/traps.h     |  2 ++
 arch/x86/kernel/cpu/acrn.c       |  7 +++++--
 arch/x86/kernel/cpu/mshyperv.c   | 22 ++++++++++++++--------
 arch/x86/kernel/kvm.c            |  4 +++-
 arch/x86/kernel/traps.c          | 10 +++++++++-
 drivers/xen/events/events_base.c |  5 ++++-
 6 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 3dc63d753bda..89c4233e19db 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -41,6 +41,8 @@ void math_emulate(struct math_emu_info *);
 
 bool fault_in_kernel_space(unsigned long address);
 
+void install_system_interrupt_handler(unsigned int n, const void *asm_addr, const void *addr);
+
 #ifdef CONFIG_VMAP_STACK
 void __noreturn handle_stack_overflow(struct pt_regs *regs,
 				      unsigned long fault_address,
diff --git a/arch/x86/kernel/cpu/acrn.c b/arch/x86/kernel/cpu/acrn.c
index 485441b7f030..9351bf183a9e 100644
--- a/arch/x86/kernel/cpu/acrn.c
+++ b/arch/x86/kernel/cpu/acrn.c
@@ -18,6 +18,7 @@
 #include <asm/hypervisor.h>
 #include <asm/idtentry.h>
 #include <asm/irq_regs.h>
+#include <asm/traps.h>
 
 static u32 __init acrn_detect(void)
 {
@@ -26,8 +27,10 @@ static u32 __init acrn_detect(void)
 
 static void __init acrn_init_platform(void)
 {
-	/* Setup the IDT for ACRN hypervisor callback */
-	alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_acrn_hv_callback);
+	/* Install system interrupt handler for ACRN hypervisor callback */
+	install_system_interrupt_handler(HYPERVISOR_CALLBACK_VECTOR,
+					 asm_sysvec_acrn_hv_callback,
+					 sysvec_acrn_hv_callback);
 
 	x86_platform.calibrate_tsc = acrn_get_tsc_khz;
 	x86_platform.calibrate_cpu = acrn_get_tsc_khz;
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 831613959a92..144b4a622188 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -29,6 +29,7 @@
 #include <asm/i8259.h>
 #include <asm/apic.h>
 #include <asm/timer.h>
+#include <asm/traps.h>
 #include <asm/reboot.h>
 #include <asm/nmi.h>
 #include <clocksource/hyperv_timer.h>
@@ -415,19 +416,24 @@ static void __init ms_hyperv_init_platform(void)
 	 */
 	x86_platform.apic_post_init = hyperv_init;
 	hyperv_setup_mmu_ops();
-	/* Setup the IDT for hypervisor callback */
-	alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_hyperv_callback);
 
-	/* Setup the IDT for reenlightenment notifications */
+	/* Install system interrupt handler for hypervisor callback */
+	install_system_interrupt_handler(HYPERVISOR_CALLBACK_VECTOR,
+					 asm_sysvec_hyperv_callback,
+					 sysvec_hyperv_callback);
+
+	/* Install system interrupt handler for reenlightenment notifications */
 	if (ms_hyperv.features & HV_ACCESS_REENLIGHTENMENT) {
-		alloc_intr_gate(HYPERV_REENLIGHTENMENT_VECTOR,
-				asm_sysvec_hyperv_reenlightenment);
+		install_system_interrupt_handler(HYPERV_REENLIGHTENMENT_VECTOR,
+						 asm_sysvec_hyperv_reenlightenment,
+						 sysvec_hyperv_reenlightenment);
 	}
 
-	/* Setup the IDT for stimer0 */
+	/* Install system interrupt handler for stimer0 */
 	if (ms_hyperv.misc_features & HV_STIMER_DIRECT_MODE_AVAILABLE) {
-		alloc_intr_gate(HYPERV_STIMER0_VECTOR,
-				asm_sysvec_hyperv_stimer0);
+		install_system_interrupt_handler(HYPERV_STIMER0_VECTOR,
+						 asm_sysvec_hyperv_stimer0,
+						 sysvec_hyperv_stimer0);
 	}
 
 # ifdef CONFIG_SMP
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index d4e48b4a438b..b7388ed2a980 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -835,7 +835,9 @@ static void __init kvm_guest_init(void)
 
 	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT) && kvmapf) {
 		static_branch_enable(&kvm_async_pf_enabled);
-		alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_kvm_asyncpf_interrupt);
+		install_system_interrupt_handler(HYPERVISOR_CALLBACK_VECTOR,
+						 asm_sysvec_kvm_asyncpf_interrupt,
+						 sysvec_kvm_asyncpf_interrupt);
 	}
 
 #ifdef CONFIG_SMP
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 95dd917ef9ad..9c7826e588bc 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1458,7 +1458,7 @@ DEFINE_IDTENTRY_SW(iret_error)
  * of the arguments provided by the full system_interrupt_handler prototype,
  * the function type cast is safe here.
  */
-const system_interrupt_handler system_interrupt_handler_table[NR_SYSTEM_VECTORS] = {
+system_interrupt_handler system_interrupt_handler_table[NR_SYSTEM_VECTORS] = {
 	[0 ... NR_SYSTEM_VECTORS-1]		= spurious_interrupt,
 #ifdef CONFIG_SMP
 	SYSV(RESCHEDULE_VECTOR,			sysvec_reschedule_ipi),
@@ -1499,6 +1499,14 @@ const system_interrupt_handler system_interrupt_handler_table[NR_SYSTEM_VECTORS]
 
 #undef SYSV
 
+void __init install_system_interrupt_handler(unsigned int n, const void *asm_addr, const void *addr)
+{
+	BUG_ON(n < FIRST_SYSTEM_VECTOR);
+
+	system_interrupt_handler_table[n - FIRST_SYSTEM_VECTOR] = (system_interrupt_handler)addr;
+	alloc_intr_gate(n, asm_addr);
+}
+
 void __init trap_init(void)
 {
 	/* Init cpu_entry_area before IST entries are set up */
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index c443f04aaad7..1a9eaf417acc 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -45,6 +45,7 @@
 #include <asm/irq.h>
 #include <asm/io_apic.h>
 #include <asm/i8259.h>
+#include <asm/traps.h>
 #include <asm/xen/cpuid.h>
 #include <asm/xen/pci.h>
 #endif
@@ -2246,7 +2247,9 @@ static __init void xen_alloc_callback_vector(void)
 		return;
 
 	pr_info("Xen HVM callback vector for event delivery is enabled\n");
-	alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_xen_hvm_callback);
+	install_system_interrupt_handler(HYPERVISOR_CALLBACK_VECTOR,
+					 asm_sysvec_xen_hvm_callback,
+					 sysvec_xen_hvm_callback);
 }
 #else
 void xen_setup_callback_vector(void) {}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/6] x86/traps: add external_interrupt() to dispatch external interrupts
  2022-11-10  5:53 [PATCH 0/6] x86/traps,VMX: implement software based NMI/IRQ dispatch for VMX NMI/IRQ reinjection Xin Li
                   ` (2 preceding siblings ...)
  2022-11-10  5:53 ` [PATCH 3/6] x86/traps: add install_system_interrupt_handler() Xin Li
@ 2022-11-10  5:53 ` Xin Li
  2022-11-10  5:53 ` [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection Xin Li
  2022-11-10  5:53 ` [PATCH 6/6] x86/traps: remove unused NMI entry exc_nmi_noist() Xin Li
  5 siblings, 0 replies; 18+ messages in thread
From: Xin Li @ 2022-11-10  5:53 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: tglx, mingo, bp, dave.hansen, hpa, seanjc, pbonzini, kevin.tian

Add external_interrupt() to dispatch external interrupts to their
handlers. If an external interrupt is a system interrupt, dipatch
it through system_interrupt_handler_table, otherwise call into
common_interrupt().

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/kernel/traps.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9c7826e588bc..c1eb3bd335ce 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1507,6 +1507,27 @@ void __init install_system_interrupt_handler(unsigned int n, const void *asm_add
 	alloc_intr_gate(n, asm_addr);
 }
 
+/*
+ * External interrupt dispatch function.
+ *
+ * Until/unless common_interrupt() can be taught to deal with the
+ * special system vectors, split the dispatch.
+ *
+ * Note: common_interrupt() already deals with IRQ_MOVE_CLEANUP_VECTOR.
+ */
+__visible noinstr void external_interrupt(struct pt_regs *regs,
+					  unsigned int vector)
+{
+	unsigned int sysvec = vector - FIRST_SYSTEM_VECTOR;
+
+	BUG_ON(vector < FIRST_EXTERNAL_VECTOR);
+
+	if (sysvec < NR_SYSTEM_VECTORS)
+		return system_interrupt_handler_table[sysvec](regs, vector);
+
+	common_interrupt(regs, vector);
+}
+
 void __init trap_init(void)
 {
 	/* Init cpu_entry_area before IST entries are set up */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection
  2022-11-10  5:53 [PATCH 0/6] x86/traps,VMX: implement software based NMI/IRQ dispatch for VMX NMI/IRQ reinjection Xin Li
                   ` (3 preceding siblings ...)
  2022-11-10  5:53 ` [PATCH 4/6] x86/traps: add external_interrupt() to dispatch external interrupts Xin Li
@ 2022-11-10  5:53 ` Xin Li
  2022-11-10 15:59   ` Sean Christopherson
  2022-11-10  5:53 ` [PATCH 6/6] x86/traps: remove unused NMI entry exc_nmi_noist() Xin Li
  5 siblings, 1 reply; 18+ messages in thread
From: Xin Li @ 2022-11-10  5:53 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: tglx, mingo, bp, dave.hansen, hpa, seanjc, pbonzini, kevin.tian

To eliminate dispatching NMI/IRQ through the IDT, add
kvm_vmx_reinject_nmi_irq(), which calls external_interrupt()
for IRQ reinjection.

Lastly replace calling a NMI/IRQ handler in an IDT descriptor
with calling kvm_vmx_reinject_nmi_irq().

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/include/asm/traps.h |  2 ++
 arch/x86/kernel/traps.c      | 23 +++++++++++++++++++++++
 arch/x86/kvm/vmx/vmenter.S   | 33 ---------------------------------
 arch/x86/kvm/vmx/vmx.c       | 19 +++++++------------
 4 files changed, 32 insertions(+), 45 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 89c4233e19db..4c56a8d31762 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -57,4 +57,6 @@ void __noreturn handle_stack_overflow(struct pt_regs *regs,
 		unsigned long vector __maybe_unused)
 typedef DECLARE_SYSTEM_INTERRUPT_HANDLER((*system_interrupt_handler));
 
+void kvm_vmx_reinject_nmi_irq(u32 vector);
+
 #endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c1eb3bd335ce..9abf91534b13 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1528,6 +1528,29 @@ __visible noinstr void external_interrupt(struct pt_regs *regs,
 	common_interrupt(regs, vector);
 }
 
+#if IS_ENABLED(CONFIG_KVM_INTEL)
+/*
+ * KVM VMX reinjects NMI/IRQ on its current stack, it's a sync
+ * call thus the values in the pt_regs structure are not used in
+ * executing NMI/IRQ handlers, except cs.RPL and flags.IF, which
+ * are both always 0 in the VMX NMI/IRQ reinjection context. Thus
+ * we simply allocate a zeroed pt_regs structure on current stack
+ * to call external_interrupt().
+ */
+void kvm_vmx_reinject_nmi_irq(u32 vector)
+{
+	struct pt_regs irq_regs;
+
+	memset(&irq_regs, 0, sizeof(irq_regs));
+
+	if (vector == NMI_VECTOR)
+		return exc_nmi(&irq_regs);
+
+	external_interrupt(&irq_regs, vector);
+}
+EXPORT_SYMBOL_GPL(kvm_vmx_reinject_nmi_irq);
+#endif
+
 void __init trap_init(void)
 {
 	/* Init cpu_entry_area before IST entries are set up */
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 8477d8bdd69c..0c1608b329cd 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -317,36 +317,3 @@ SYM_FUNC_START(vmread_error_trampoline)
 
 	RET
 SYM_FUNC_END(vmread_error_trampoline)
-
-SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
-	/*
-	 * Unconditionally create a stack frame, getting the correct RSP on the
-	 * stack (for x86-64) would take two instructions anyways, and RBP can
-	 * be used to restore RSP to make objtool happy (see below).
-	 */
-	push %_ASM_BP
-	mov %_ASM_SP, %_ASM_BP
-
-#ifdef CONFIG_X86_64
-	/*
-	 * Align RSP to a 16-byte boundary (to emulate CPU behavior) before
-	 * creating the synthetic interrupt stack frame for the IRQ/NMI.
-	 */
-	and  $-16, %rsp
-	push $__KERNEL_DS
-	push %rbp
-#endif
-	pushf
-	push $__KERNEL_CS
-	CALL_NOSPEC _ASM_ARG1
-
-	/*
-	 * "Restore" RSP from RBP, even though IRET has already unwound RSP to
-	 * the correct value.  objtool doesn't know the callee will IRET and,
-	 * without the explicit restore, thinks the stack is getting walloped.
-	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
-	 */
-	mov %_ASM_BP, %_ASM_SP
-	pop %_ASM_BP
-	RET
-SYM_FUNC_END(vmx_do_interrupt_nmi_irqoff)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 63247c57c72c..b457e4888468 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -46,6 +46,7 @@
 #include <asm/mshyperv.h>
 #include <asm/mwait.h>
 #include <asm/spec-ctrl.h>
+#include <asm/traps.h>
 #include <asm/virtext.h>
 #include <asm/vmx.h>
 
@@ -6758,15 +6759,11 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 	memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir));
 }
 
-void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
-
-static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
-					unsigned long entry)
+static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 vector)
 {
-	bool is_nmi = entry == (unsigned long)asm_exc_nmi_noist;
-
-	kvm_before_interrupt(vcpu, is_nmi ? KVM_HANDLING_NMI : KVM_HANDLING_IRQ);
-	vmx_do_interrupt_nmi_irqoff(entry);
+	kvm_before_interrupt(vcpu, vector == NMI_VECTOR ?
+				   KVM_HANDLING_NMI : KVM_HANDLING_IRQ);
+	kvm_vmx_reinject_nmi_irq(vector);
 	kvm_after_interrupt(vcpu);
 }
 
@@ -6792,7 +6789,6 @@ static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
 
 static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 {
-	const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
 	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
 
 	/* if exit due to PF check for async PF */
@@ -6806,20 +6802,19 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 		kvm_machine_check();
 	/* We need to handle NMIs before interrupts are enabled */
 	else if (is_nmi(intr_info))
-		handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry);
+		handle_interrupt_nmi_irqoff(&vmx->vcpu, NMI_VECTOR);
 }
 
 static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
 {
 	u32 intr_info = vmx_get_intr_info(vcpu);
 	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
-	gate_desc *desc = (gate_desc *)host_idt_base + vector;
 
 	if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm,
 	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
 		return;
 
-	handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
+	handle_interrupt_nmi_irqoff(vcpu, vector);
 	vcpu->arch.at_instruction_boundary = true;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/6] x86/traps: remove unused NMI entry exc_nmi_noist()
  2022-11-10  5:53 [PATCH 0/6] x86/traps,VMX: implement software based NMI/IRQ dispatch for VMX NMI/IRQ reinjection Xin Li
                   ` (4 preceding siblings ...)
  2022-11-10  5:53 ` [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection Xin Li
@ 2022-11-10  5:53 ` Xin Li
  5 siblings, 0 replies; 18+ messages in thread
From: Xin Li @ 2022-11-10  5:53 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: tglx, mingo, bp, dave.hansen, hpa, seanjc, pbonzini, kevin.tian

After the introduction of kvm_vmx_reinject_irq(), exc_nmi_noist()
is no longer needed, thus remove it.

Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/include/asm/idtentry.h | 15 ---------------
 arch/x86/kernel/nmi.c           | 10 ----------
 2 files changed, 25 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 72184b0b2219..da28ac17c57a 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -581,21 +581,6 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC,	xenpv_exc_machine_check);
 #endif
 
 /* NMI */
-
-#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
-/*
- * Special NOIST entry point for VMX which invokes this on the kernel
- * stack. asm_exc_nmi() requires an IST to work correctly vs. the NMI
- * 'executing' marker.
- *
- * On 32bit this just uses the regular NMI entry point because 32-bit does
- * not have ISTs.
- */
-DECLARE_IDTENTRY(X86_TRAP_NMI,		exc_nmi_noist);
-#else
-#define asm_exc_nmi_noist		asm_exc_nmi
-#endif
-
 DECLARE_IDTENTRY_NMI(X86_TRAP_NMI,	exc_nmi);
 #ifdef CONFIG_XEN_PV
 DECLARE_IDTENTRY_RAW(X86_TRAP_NMI,	xenpv_exc_nmi);
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index cec0bfa3bc04..816bb59a4ba4 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -527,16 +527,6 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
 		mds_user_clear_cpu_buffers();
 }
 
-#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
-DEFINE_IDTENTRY_RAW(exc_nmi_noist)
-{
-	exc_nmi(regs);
-}
-#endif
-#if IS_MODULE(CONFIG_KVM_INTEL)
-EXPORT_SYMBOL_GPL(asm_exc_nmi_noist);
-#endif
-
 void stop_nmi(void)
 {
 	ignore_nmis++;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection
  2022-11-10  5:53 ` [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection Xin Li
@ 2022-11-10 15:59   ` Sean Christopherson
  2022-11-10 20:40     ` Li, Xin3
  0 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2022-11-10 15:59 UTC (permalink / raw)
  To: Xin Li
  Cc: linux-kernel, x86, kvm, tglx, mingo, bp, dave.hansen, hpa,
	pbonzini, kevin.tian

On Wed, Nov 09, 2022, Xin Li wrote:
> +#if IS_ENABLED(CONFIG_KVM_INTEL)
> +/*
> + * KVM VMX reinjects NMI/IRQ on its current stack, it's a sync

Please use a verb other than "reinject".  There is no event injection of any kind,
KVM is simply making a function call.  KVM already uses "inject" and "reinject"
for KVM where KVM is is literally injecting events into the guest.

The "kvm_vmx" part is also weird IMO.  The function is in x86's traps/exceptions
namespace, not the KVM VMX namespace.

Maybe exc_raise_nmi_or_irq()?

> + * call thus the values in the pt_regs structure are not used in
> + * executing NMI/IRQ handlers,

Won't this break stack traces to some extent?

> +static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 vector)
>  {
> -	bool is_nmi = entry == (unsigned long)asm_exc_nmi_noist;
> -
> -	kvm_before_interrupt(vcpu, is_nmi ? KVM_HANDLING_NMI : KVM_HANDLING_IRQ);
> -	vmx_do_interrupt_nmi_irqoff(entry);
> +	kvm_before_interrupt(vcpu, vector == NMI_VECTOR ?
> +				   KVM_HANDLING_NMI : KVM_HANDLING_IRQ);
> +	kvm_vmx_reinject_nmi_irq(vector);

This is where I strongly object to kvm_vmx_reinject_nmi_irq().  This looks like
KVM is reinjecting the event into the guest, which is all kinds of confusing.

>  	kvm_after_interrupt(vcpu);
>  }

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection
  2022-11-10 15:59   ` Sean Christopherson
@ 2022-11-10 20:40     ` Li, Xin3
  2022-11-10 20:53       ` Sean Christopherson
  0 siblings, 1 reply; 18+ messages in thread
From: Li, Xin3 @ 2022-11-10 20:40 UTC (permalink / raw)
  To: Christopherson,, Sean
  Cc: linux-kernel, x86, kvm, tglx, mingo, bp, dave.hansen, hpa,
	pbonzini, Tian, Kevin

> > +#if IS_ENABLED(CONFIG_KVM_INTEL)
> > +/*
> > + * KVM VMX reinjects NMI/IRQ on its current stack, it's a sync
> 
> Please use a verb other than "reinject".  There is no event injection of any kind,
> KVM is simply making a function call.  KVM already uses "inject" and "reinject"
> for KVM where KVM is is literally injecting events into the guest.
> 
> The "kvm_vmx" part is also weird IMO.  The function is in x86's
> traps/exceptions namespace, not the KVM VMX namespace.

right, "kvm_vmx" doesn't look good per your explanation.

> 
> Maybe exc_raise_nmi_or_irq()?

It's good for me.

> 
> > + * call thus the values in the pt_regs structure are not used in
> > + * executing NMI/IRQ handlers,
> 
> Won't this break stack traces to some extent?
> 

The pt_regs structure, and its IP/CS, is NOT part of the call stack, thus
I don't see a problem. No?

> > +static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32
> > +vector)
> >  {
> > -	bool is_nmi = entry == (unsigned long)asm_exc_nmi_noist;
> > -
> > -	kvm_before_interrupt(vcpu, is_nmi ? KVM_HANDLING_NMI :
> KVM_HANDLING_IRQ);
> > -	vmx_do_interrupt_nmi_irqoff(entry);
> > +	kvm_before_interrupt(vcpu, vector == NMI_VECTOR ?
> > +				   KVM_HANDLING_NMI :
> KVM_HANDLING_IRQ);
> > +	kvm_vmx_reinject_nmi_irq(vector);
> 
> This is where I strongly object to kvm_vmx_reinject_nmi_irq().  This looks like
> KVM is reinjecting the event into the guest, which is all kinds of confusing.
> 
> >  	kvm_after_interrupt(vcpu);
> >  }

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection
  2022-11-10 20:40     ` Li, Xin3
@ 2022-11-10 20:53       ` Sean Christopherson
  2022-11-11  9:19         ` Peter Zijlstra
  2022-11-14  2:18         ` Li, Xin3
  0 siblings, 2 replies; 18+ messages in thread
From: Sean Christopherson @ 2022-11-10 20:53 UTC (permalink / raw)
  To: Li, Xin3
  Cc: linux-kernel, x86, kvm, tglx, mingo, bp, dave.hansen, hpa,
	pbonzini, Tian, Kevin

On Thu, Nov 10, 2022, Li, Xin3 wrote:
> > > +#if IS_ENABLED(CONFIG_KVM_INTEL)
> > > +/*
> > > + * KVM VMX reinjects NMI/IRQ on its current stack, it's a sync
> > 
> > Please use a verb other than "reinject".  There is no event injection of any kind,
> > KVM is simply making a function call.  KVM already uses "inject" and "reinject"
> > for KVM where KVM is is literally injecting events into the guest.
> > 
> > The "kvm_vmx" part is also weird IMO.  The function is in x86's
> > traps/exceptions namespace, not the KVM VMX namespace.
> 
> right, "kvm_vmx" doesn't look good per your explanation.
> 
> > 
> > Maybe exc_raise_nmi_or_irq()?
> 
> It's good for me.
> 
> > 
> > > + * call thus the values in the pt_regs structure are not used in
> > > + * executing NMI/IRQ handlers,
> > 
> > Won't this break stack traces to some extent?
> > 
> 
> The pt_regs structure, and its IP/CS, is NOT part of the call stack, thus
> I don't see a problem. No?

  bool nmi_cpu_backtrace(struct pt_regs *regs)
  {
  	int cpu = smp_processor_id();
  	unsigned long flags;
  
  	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
  		/*
  		 * Allow nested NMI backtraces while serializing
  		 * against other CPUs.
  		 */
  		printk_cpu_sync_get_irqsave(flags);
  		if (!READ_ONCE(backtrace_idle) && regs && cpu_in_idle(instruction_pointer(regs))) {
  			pr_warn("NMI backtrace for cpu %d skipped: idling at %pS\n",
  				cpu, (void *)instruction_pointer(regs));
  		} else {
  			pr_warn("NMI backtrace for cpu %d\n", cpu);
  			if (regs)
  				show_regs(regs); <============================== HERE!!!
  			else
  				dump_stack();
  		}
  		printk_cpu_sync_put_irqrestore(flags);
  		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
  		return true;
  	}
  
  	return false;
  }

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection
  2022-11-10 20:53       ` Sean Christopherson
@ 2022-11-11  9:19         ` Peter Zijlstra
  2022-11-11  9:29           ` H. Peter Anvin
  2022-11-14  2:18         ` Li, Xin3
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2022-11-11  9:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Li, Xin3, linux-kernel, x86, kvm, tglx, mingo, bp, dave.hansen,
	hpa, pbonzini, Tian, Kevin

On Thu, Nov 10, 2022 at 08:53:09PM +0000, Sean Christopherson wrote:
> On Thu, Nov 10, 2022, Li, Xin3 wrote:

> > > > + * call thus the values in the pt_regs structure are not used in
> > > > + * executing NMI/IRQ handlers,
> > > 
> > > Won't this break stack traces to some extent?
> > > 
> > 
> > The pt_regs structure, and its IP/CS, is NOT part of the call stack, thus
> > I don't see a problem. No?

I'm not sure what Xin3 is trying to say, but NMI/IRQ handers use pt_regs
a *LOT*. pt_regs *MUST* be correct.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection
  2022-11-11  9:19         ` Peter Zijlstra
@ 2022-11-11  9:29           ` H. Peter Anvin
  2022-11-11 10:45             ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: H. Peter Anvin @ 2022-11-11  9:29 UTC (permalink / raw)
  To: Peter Zijlstra, Sean Christopherson
  Cc: Li, Xin3, linux-kernel, x86, kvm, tglx, mingo, bp, dave.hansen,
	pbonzini, Tian, Kevin

On November 11, 2022 1:19:23 AM PST, Peter Zijlstra <peterz@infradead.org> wrote:
>On Thu, Nov 10, 2022 at 08:53:09PM +0000, Sean Christopherson wrote:
>> On Thu, Nov 10, 2022, Li, Xin3 wrote:
>
>> > > > + * call thus the values in the pt_regs structure are not used in
>> > > > + * executing NMI/IRQ handlers,
>> > > 
>> > > Won't this break stack traces to some extent?
>> > > 
>> > 
>> > The pt_regs structure, and its IP/CS, is NOT part of the call stack, thus
>> > I don't see a problem. No?
>
>I'm not sure what Xin3 is trying to say, but NMI/IRQ handers use pt_regs
>a *LOT*. pt_regs *MUST* be correct.

What is "correct" in this context? Could you describe what aspects of the register image you rely on, and what you expect them to be? Currently KVM basically stuff random data into pt_regs; this at least makes it explicitly zero.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection
  2022-11-11  9:29           ` H. Peter Anvin
@ 2022-11-11 10:45             ` Peter Zijlstra
  2022-11-11 11:57               ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2022-11-11 10:45 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Sean Christopherson, Li, Xin3, linux-kernel, x86, kvm, tglx,
	mingo, bp, dave.hansen, pbonzini, Tian, Kevin

On Fri, Nov 11, 2022 at 01:29:35AM -0800, H. Peter Anvin wrote:
> On November 11, 2022 1:19:23 AM PST, Peter Zijlstra <peterz@infradead.org> wrote:
> >On Thu, Nov 10, 2022 at 08:53:09PM +0000, Sean Christopherson wrote:
> >> On Thu, Nov 10, 2022, Li, Xin3 wrote:
> >
> >> > > > + * call thus the values in the pt_regs structure are not used in
> >> > > > + * executing NMI/IRQ handlers,
> >> > > 
> >> > > Won't this break stack traces to some extent?
> >> > > 
> >> > 
> >> > The pt_regs structure, and its IP/CS, is NOT part of the call stack, thus
> >> > I don't see a problem. No?
> >
> >I'm not sure what Xin3 is trying to say, but NMI/IRQ handers use pt_regs
> >a *LOT*. pt_regs *MUST* be correct.
> 
> What is "correct" in this context? 

I don't know since I don't really speak virt, but I could image the
regset that would match the vmrun (or whatever intel decided to call
that again) instruction.

> Could you describe what aspects of
> the register image you rely on, and what you expect them to be?

We rely on CS,IP,FLAGS,SS,SP to be coherent and usable at the very least
(must be able to start an unwind from it). But things like perf (NMI)
access *all* of them and possibly copy them out to userspace. Perf can
also try and use the segment registers in order to try and establish a
linear address.

Some exceptions (#GP) access whatever is needed to fully decode and
emulate the instruction (IOPL,UMIP,etc..) including the segment
registers.

> Currently KVM basically stuff random data into pt_regs; this at least
> makes it explicitly zero.

:-( Both is broken. Once again proving to me that virt is a bunch of
duck-tape at best.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection
  2022-11-11 10:45             ` Peter Zijlstra
@ 2022-11-11 11:57               ` Paolo Bonzini
  2022-11-11 12:10                 ` Peter Zijlstra
  2022-11-14  4:10                 ` Li, Xin3
  0 siblings, 2 replies; 18+ messages in thread
From: Paolo Bonzini @ 2022-11-11 11:57 UTC (permalink / raw)
  To: Peter Zijlstra, H. Peter Anvin
  Cc: Sean Christopherson, Li, Xin3, linux-kernel, x86, kvm, tglx,
	mingo, bp, dave.hansen, Tian, Kevin

On 11/11/22 11:45, Peter Zijlstra wrote:
>> What is "correct" in this context?
>
> I don't know since I don't really speak virt, but I could image the
> regset that would match the vmrun (or whatever intel decided to call
> that again) instruction.

Right now it is not exactly that but close.  The RIP is somewhere in 
vmx_do_interrupt_nmi_irqoff; CS/SS are correct (i.e. it's not like they 
point to guest values!) and other registers including RSP and RFLAGS are 
consistent with the RIP.

>> Currently KVM basically stuff random data into pt_regs; this at least
>> makes it explicitly zero.
>
> 🙁 Both is broken. Once again proving to me that virt is a bunch of
> duck-tape at best.

Except currently it is not random.  At least I'm not alone in sometimes 
thinking I understand stuff when I actually don't.

Zero is just wrong, I agree.  Xin, if you don't want to poke at the IDT 
you need to build the pt_regs and pass them to the function you add in 
patch 5.  In order to make it like Peter wants it, then:

1) for the flags just use X86_EFLAGS_FIXED

2) for CS/SS segment selectors use __KERNEL_CS and __KERNEL_DS

3) the RIP/RSP can be found respectively in (unsigned long)vmx_vmexit 
vmx->loaded_vmcs->host_state.rsp

4) the other registers can be taken from vcpu->arch.regs

But I am not sure it's an improvement.  It may be okay to zero the 
registers, but setting CS/RIP/SS/RSP/RFLAGS to the actual processor 
values in vmx_do_interrupt_nmi_irqoff is safer.

I'm not sure who would set orig_rax, but I haven't looked too closely. 
Perhaps that's not a problem, but if so it has to be documented in the 
commit message.

Paolo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection
  2022-11-11 11:57               ` Paolo Bonzini
@ 2022-11-11 12:10                 ` Peter Zijlstra
  2022-11-11 12:17                   ` Paolo Bonzini
  2022-11-14  4:10                 ` Li, Xin3
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2022-11-11 12:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: H. Peter Anvin, Sean Christopherson, Li, Xin3, linux-kernel, x86,
	kvm, tglx, mingo, bp, dave.hansen, Tian, Kevin

On Fri, Nov 11, 2022 at 12:57:58PM +0100, Paolo Bonzini wrote:
> On 11/11/22 11:45, Peter Zijlstra wrote:
> > > What is "correct" in this context?
> > 
> > I don't know since I don't really speak virt, but I could image the
> > regset that would match the vmrun (or whatever intel decided to call
> > that again) instruction.
> 
> Right now it is not exactly that but close.  The RIP is somewhere in
> vmx_do_interrupt_nmi_irqoff; CS/SS are correct (i.e. it's not like they
> point to guest values!) and other registers including RSP and RFLAGS are
> consistent with the RIP.

*phew*, that sounds a *lot* better than 'random'. And yes, that should
do.

Another thing; these patches appear to be about system vectors and
everything, but what I understand from Andrew is that VMX is only screwy
vs NMI, not regular interrupts/exceptions, so where does that come from?

SVM specifically fixed the NMI wonkyness with their Global Interrupt
flag thingy.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection
  2022-11-11 12:10                 ` Peter Zijlstra
@ 2022-11-11 12:17                   ` Paolo Bonzini
  0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2022-11-11 12:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Sean Christopherson, Li, Xin3, linux-kernel, x86,
	kvm, tglx, mingo, bp, dave.hansen, Tian, Kevin


On 11/11/22 13:10, Peter Zijlstra wrote:
> *phew*, that sounds a*lot*  better than 'random'. And yes, that should
> do.
> 
> Another thing; these patches appear to be about system vectors and
> everything, but what I understand from Andrew is that VMX is only screwy
> vs NMI, not regular interrupts/exceptions, so where does that come from?

Exceptions are fine, for interrupts it's optional in theory but in 
practice you have to invoke them manually just like NMIs (I had replied 
on this in the other thread).

Paolo

> SVM specifically fixed the NMI wonkyness with their Global Interrupt
> flag thingy.
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection
  2022-11-10 20:53       ` Sean Christopherson
  2022-11-11  9:19         ` Peter Zijlstra
@ 2022-11-14  2:18         ` Li, Xin3
  1 sibling, 0 replies; 18+ messages in thread
From: Li, Xin3 @ 2022-11-14  2:18 UTC (permalink / raw)
  To: Christopherson,, Sean
  Cc: linux-kernel, x86, kvm, tglx, mingo, bp, dave.hansen, hpa,
	pbonzini, Tian, Kevin

> > > > + * call thus the values in the pt_regs structure are not used in
> > > > + * executing NMI/IRQ handlers,
> > >
> > > Won't this break stack traces to some extent?
> > >
> >
> > The pt_regs structure, and its IP/CS, is NOT part of the call stack,
> > thus I don't see a problem. No?
> 
>   bool nmi_cpu_backtrace(struct pt_regs *regs)
>   {
>   	int cpu = smp_processor_id();
>   	unsigned long flags;
> 
>   	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
>   		/*
>   		 * Allow nested NMI backtraces while serializing
>   		 * against other CPUs.
>   		 */
>   		printk_cpu_sync_get_irqsave(flags);
>   		if (!READ_ONCE(backtrace_idle) && regs &&
> cpu_in_idle(instruction_pointer(regs))) {
>   			pr_warn("NMI backtrace for cpu %d skipped: idling at
> %pS\n",
>   				cpu, (void *)instruction_pointer(regs));
>   		} else {
>   			pr_warn("NMI backtrace for cpu %d\n", cpu);
>   			if (regs)
>   				show_regs(regs);
> <============================== HERE!!!
>   			else
>   				dump_stack();
>   		}
>   		printk_cpu_sync_put_irqrestore(flags);
>   		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
>   		return true;
>   	}
> 
>   	return false;
>   }

Right, this is an example in which pt_regs's usage gets broken with my patch.

However a bigger problem emerges, how NMI handlers should get called when VMX
is running. If we could address it, we will probably be okay with pt_regs's
usage.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection
  2022-11-11 11:57               ` Paolo Bonzini
  2022-11-11 12:10                 ` Peter Zijlstra
@ 2022-11-14  4:10                 ` Li, Xin3
  1 sibling, 0 replies; 18+ messages in thread
From: Li, Xin3 @ 2022-11-14  4:10 UTC (permalink / raw)
  To: Paolo Bonzini, Peter Zijlstra, H. Peter Anvin
  Cc: Christopherson,,
	Sean, linux-kernel, x86, kvm, tglx, mingo, bp, dave.hansen, Tian,
	Kevin



> -----Original Message-----
> From: Paolo Bonzini <pbonzini@redhat.com>
> Sent: Friday, November 11, 2022 3:58 AM
> To: Peter Zijlstra <peterz@infradead.org>; H. Peter Anvin <hpa@zytor.com>
> Cc: Christopherson,, Sean <seanjc@google.com>; Li, Xin3 <xin3.li@intel.com>;
> linux-kernel@vger.kernnel.org; x86@kernel.org; kvm@vger.kernel.org;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de;
> dave.hansen@linux.intel.com; Tian, Kevin <kevin.tian@intel.com>
> Subject: Re: [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for
> NMI/IRQ reinjection
> 
> On 11/11/22 11:45, Peter Zijlstra wrote:
> >> What is "correct" in this context?
> >
> > I don't know since I don't really speak virt, but I could image the
> > regset that would match the vmrun (or whatever intel decided to call
> > that again) instruction.
> 
> Right now it is not exactly that but close.  The RIP is somewhere in
> vmx_do_interrupt_nmi_irqoff; CS/SS are correct (i.e. it's not like they point to
> guest values!) and other registers including RSP and RFLAGS are consistent with
> the RIP.
> 
> >> Currently KVM basically stuff random data into pt_regs; this at least
> >> makes it explicitly zero.
> >
> > 🙁 Both is broken. Once again proving to me that virt is a bunch of
> > duck-tape at best.
> 
> Except currently it is not random.  At least I'm not alone in sometimes thinking I
> understand stuff when I actually don't.
> 
> Zero is just wrong, I agree.  Xin, if you don't want to poke at the IDT you need
> to build the pt_regs and pass them to the function you add in patch 5.  In order
> to make it like Peter wants it, then:
> 
> 1) for the flags just use X86_EFLAGS_FIXED
> 
> 2) for CS/SS segment selectors use __KERNEL_CS and __KERNEL_DS
> 
> 3) the RIP/RSP can be found respectively in (unsigned long)vmx_vmexit
> vmx->loaded_vmcs->host_state.rsp
> 
> 4) the other registers can be taken from vcpu->arch.regs
> 
> But I am not sure it's an improvement.  It may be okay to zero the registers, but
> setting CS/RIP/SS/RSP/RFLAGS to the actual processor values in
> vmx_do_interrupt_nmi_irqoff is safer.
> 
> I'm not sure who would set orig_rax, but I haven't looked too closely.
> Perhaps that's not a problem, but if so it has to be documented in the commit
> message.

No IRQ handler uses a pt_regs structure as an argument, while NMI handlers do.

For NMI based profiling, what it wants maybe is more of the guest RIP?
I might make it over complicated by mixing host/guest profiling.

> Paolo

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-11-14  4:10 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-10  5:53 [PATCH 0/6] x86/traps,VMX: implement software based NMI/IRQ dispatch for VMX NMI/IRQ reinjection Xin Li
2022-11-10  5:53 ` [PATCH 1/6] x86/traps: let common_interrupt() handle IRQ_MOVE_CLEANUP_VECTOR Xin Li
2022-11-10  5:53 ` [PATCH 2/6] x86/traps: add a system interrupt table for system interrupt dispatch Xin Li
2022-11-10  5:53 ` [PATCH 3/6] x86/traps: add install_system_interrupt_handler() Xin Li
2022-11-10  5:53 ` [PATCH 4/6] x86/traps: add external_interrupt() to dispatch external interrupts Xin Li
2022-11-10  5:53 ` [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection Xin Li
2022-11-10 15:59   ` Sean Christopherson
2022-11-10 20:40     ` Li, Xin3
2022-11-10 20:53       ` Sean Christopherson
2022-11-11  9:19         ` Peter Zijlstra
2022-11-11  9:29           ` H. Peter Anvin
2022-11-11 10:45             ` Peter Zijlstra
2022-11-11 11:57               ` Paolo Bonzini
2022-11-11 12:10                 ` Peter Zijlstra
2022-11-11 12:17                   ` Paolo Bonzini
2022-11-14  4:10                 ` Li, Xin3
2022-11-14  2:18         ` Li, Xin3
2022-11-10  5:53 ` [PATCH 6/6] x86/traps: remove unused NMI entry exc_nmi_noist() Xin Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.