LKML Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack
@ 2021-04-26 23:09 Lai Jiangshan
  2021-04-26 23:09 ` [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi Lai Jiangshan
                   ` (4 more replies)
  0 siblings, 5 replies; 38+ messages in thread
From: Lai Jiangshan @ 2021-04-26 23:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Thomas Gleixner, Paolo Bonzini,
	Sean Christopherson, Steven Rostedt, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	Josh Poimboeuf, Uros Bizjak, Maxim Levitsky

From: Lai Jiangshan <laijs@linux.alibaba.com>

In VMX, the NMI handler needs to be invoked after NMI VM-Exit.

Before commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via
indirect call instead of INTn"), the work was done by INTn ("int $2").

But the INTn microcode is relatively expensive, and INTn doesn't set the
NMI-blocked flag required by the Linux kernel NMI entry, so the commit
reworked NMI VM-Exit handling to invoke the kernel handler via a
function call.  Moving away from INTn is very reasonable.

Yet some details were missed.  After the said commit was applied, the NMI
entry pointer is fetched from the IDT and called from the kernel stack.
But the NMI entry installed in the IDT is asm_exc_nmi(), which expects
to be invoked on the IST stack, as the hardware would do, and it relies
on the "NMI executing" variable at a fixed location on that stack to
work correctly.  When it is unexpectedly called from the kernel stack,
the RSP-located "NMI executing" variable lands on the kernel stack
instead, where it is "uninitialized" garbage that can cause the NMI
entry to take the wrong path.

While fixing the problem for KVM, I found that the same problem may
exist in the early boot stage, where the IST is not yet set up.
asm_exc_nmi() must not be used at this stage for the same reason:
the RSP-located "NMI executing" variable.

For both cases, we should use asm_noist_exc_nmi(), which is introduced
in patch 1 by renaming the existing asm_xenpv_exc_nmi() and which is
safe to invoke on the kernel stack.

Link: https://lore.kernel.org/lkml/20200915191505.10355-3-sean.j.christopherson@intel.com/

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: kvm@vger.kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>

Lai Jiangshan (4):
  x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
  x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage
  KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  KVM/VMX: fold handle_interrupt_nmi_irqoff() into its solo caller

 arch/x86/include/asm/idtentry.h |  4 +---
 arch/x86/kernel/idt.c           |  8 +++++++-
 arch/x86/kernel/nmi.c           | 12 ++++++++++++
 arch/x86/kvm/vmx/vmx.c          | 27 ++++++++++++++-------------
 arch/x86/xen/enlighten_pv.c     |  9 +++------
 arch/x86/xen/xen-asm.S          |  2 +-
 6 files changed, 38 insertions(+), 24 deletions(-)

-- 
2.19.1.6.gb485710b


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
  2021-04-26 23:09 [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Lai Jiangshan
@ 2021-04-26 23:09 ` Lai Jiangshan
  2021-04-28 21:27   ` Steven Rostedt
                     ` (2 more replies)
  2021-04-26 23:09 ` [PATCH 2/4] x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage Lai Jiangshan
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 38+ messages in thread
From: Lai Jiangshan @ 2021-04-26 23:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Thomas Gleixner, Paolo Bonzini,
	Sean Christopherson, Steven Rostedt, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	Josh Poimboeuf, Uros Bizjak, Maxim Levitsky, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Boris Ostrovsky,
	Juergen Gross, Stefano Stabellini, Peter Zijlstra,
	Alexandre Chartre, Joerg Roedel, Jian Cai, xen-devel

From: Lai Jiangshan <laijs@linux.alibaba.com>

No functional change intended.  Just rename it and
move it to arch/x86/kernel/nmi.c so that we can resue it later in
next patch for early NMI and kvm.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: kvm@vger.kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/include/asm/idtentry.h | 2 +-
 arch/x86/kernel/nmi.c           | 8 ++++++++
 arch/x86/xen/enlighten_pv.c     | 9 +++------
 arch/x86/xen/xen-asm.S          | 2 +-
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index e35e342673c7..5b11d2ddbb5c 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -590,7 +590,7 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC,	xenpv_exc_machine_check);
 /* NMI */
 DECLARE_IDTENTRY_NMI(X86_TRAP_NMI,	exc_nmi);
 #ifdef CONFIG_XEN_PV
-DECLARE_IDTENTRY_RAW(X86_TRAP_NMI,	xenpv_exc_nmi);
+DECLARE_IDTENTRY_RAW(X86_TRAP_NMI,	noist_exc_nmi);
 #endif
 
 /* #DB */
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index bf250a339655..2b907a76d0a1 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -524,6 +524,14 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
 		mds_user_clear_cpu_buffers();
 }
 
+#ifdef CONFIG_XEN_PV
+DEFINE_IDTENTRY_RAW(noist_exc_nmi)
+{
+	/* On Xen PV, NMI doesn't use IST.  The C part is the same as native. */
+	exc_nmi(regs);
+}
+#endif
+
 void stop_nmi(void)
 {
 	ignore_nmis++;
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 4f18cd9eacd8..5efbdb0905b7 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -565,12 +565,6 @@ static void xen_write_ldt_entry(struct desc_struct *dt, int entrynum,
 
 void noist_exc_debug(struct pt_regs *regs);
 
-DEFINE_IDTENTRY_RAW(xenpv_exc_nmi)
-{
-	/* On Xen PV, NMI doesn't use IST.  The C part is the same as native. */
-	exc_nmi(regs);
-}
-
 DEFINE_IDTENTRY_RAW_ERRORCODE(xenpv_exc_double_fault)
 {
 	/* On Xen PV, DF doesn't use IST.  The C part is the same as native. */
@@ -626,6 +620,9 @@ struct trap_array_entry {
 	.xen		= xen_asm_xenpv_##func,		\
 	.ist_okay	= ist_ok }
 
+/* Alias to make TRAP_ENTRY_REDIR() happy for nmi */
+#define xen_asm_xenpv_exc_nmi	xen_asm_noist_exc_nmi
+
 static struct trap_array_entry trap_array[] = {
 	TRAP_ENTRY_REDIR(exc_debug,			true  ),
 	TRAP_ENTRY_REDIR(exc_double_fault,		true  ),
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 1e626444712b..12e7cbbb2a8d 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -130,7 +130,7 @@ _ASM_NOKPROBE(xen_\name)
 xen_pv_trap asm_exc_divide_error
 xen_pv_trap asm_xenpv_exc_debug
 xen_pv_trap asm_exc_int3
-xen_pv_trap asm_xenpv_exc_nmi
+xen_pv_trap asm_noist_exc_nmi
 xen_pv_trap asm_exc_overflow
 xen_pv_trap asm_exc_bounds
 xen_pv_trap asm_exc_invalid_op
-- 
2.19.1.6.gb485710b


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 2/4] x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage
  2021-04-26 23:09 [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Lai Jiangshan
  2021-04-26 23:09 ` [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi Lai Jiangshan
@ 2021-04-26 23:09 ` Lai Jiangshan
  2021-04-28 21:30   ` Steven Rostedt
  2021-05-03 20:13   ` Thomas Gleixner
  2021-04-26 23:09 ` [PATCH 3/4] " Lai Jiangshan
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 38+ messages in thread
From: Lai Jiangshan @ 2021-04-26 23:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Thomas Gleixner, Paolo Bonzini,
	Sean Christopherson, Steven Rostedt, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Josh Poimboeuf, Uros Bizjak, Maxim Levitsky, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Peter Zijlstra,
	Alexandre Chartre, Juergen Gross, Joerg Roedel, Jian Cai

From: Lai Jiangshan <laijs@linux.alibaba.com>

While the other exception entries that use IST stacks can also be used
on the kernel stack, asm_exc_nmi() cannot be used on the kernel stack
because it relies on the RSP-located "NMI executing" variable, which is
expected to be at a fixed location on the NMI IST stack.  When it is
unexpectedly called from the kernel stack, the RSP-located "NMI executing"
variable is also on the kernel stack, is "uninitialized", and can cause
the NMI entry to take the wrong path.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/include/asm/idtentry.h | 2 --
 arch/x86/kernel/idt.c           | 8 +++++++-
 arch/x86/kernel/nmi.c           | 7 ++++---
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 5b11d2ddbb5c..0831c0da5957 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -589,9 +589,7 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC,	xenpv_exc_machine_check);
 
 /* NMI */
 DECLARE_IDTENTRY_NMI(X86_TRAP_NMI,	exc_nmi);
-#ifdef CONFIG_XEN_PV
 DECLARE_IDTENTRY_RAW(X86_TRAP_NMI,	noist_exc_nmi);
-#endif
 
 /* #DB */
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index d552f177eca0..c75409633f16 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -71,10 +71,16 @@ static const __initconst struct idt_data early_idts[] = {
  * cpu_init() is invoked. Interrupt stacks cannot be used at that point and
  * the traps which use them are reinitialized with IST after cpu_init() has
  * set up TSS.
+ *
+ * While the other exception entries that use IST stacks can also be
+ * used on the kernel stack, asm_exc_nmi() cannot be used on the kernel
+ * stack because it relies on the RSP-located "NMI executing" variable,
+ * which is expected to be at a fixed location on the NMI IST stack.
+ * In the early boot stage, asm_noist_exc_nmi() is used for NMI.
  */
 static const __initconst struct idt_data def_idts[] = {
 	INTG(X86_TRAP_DE,		asm_exc_divide_error),
-	INTG(X86_TRAP_NMI,		asm_exc_nmi),
+	INTG(X86_TRAP_NMI,		asm_noist_exc_nmi),
 	INTG(X86_TRAP_BR,		asm_exc_bounds),
 	INTG(X86_TRAP_UD,		asm_exc_invalid_op),
 	INTG(X86_TRAP_NM,		asm_exc_device_not_available),
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 2b907a76d0a1..2fb1fd59d714 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -524,13 +524,14 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
 		mds_user_clear_cpu_buffers();
 }
 
-#ifdef CONFIG_XEN_PV
 DEFINE_IDTENTRY_RAW(noist_exc_nmi)
 {
-	/* On Xen PV, NMI doesn't use IST.  The C part is the same as native. */
+	/*
+	 * On Xen PV and early booting stage, NMI doesn't use IST.
+	 * The C part is the same as native.
+	 */
 	exc_nmi(regs);
 }
-#endif
 
 void stop_nmi(void)
 {
-- 
2.19.1.6.gb485710b


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-04-26 23:09 [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Lai Jiangshan
  2021-04-26 23:09 ` [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi Lai Jiangshan
  2021-04-26 23:09 ` [PATCH 2/4] x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage Lai Jiangshan
@ 2021-04-26 23:09 ` Lai Jiangshan
  2021-04-30  2:46   ` Lai Jiangshan
                     ` (2 more replies)
  2021-04-26 23:09 ` [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller Lai Jiangshan
  2021-04-30  7:14 ` [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Paolo Bonzini
  4 siblings, 3 replies; 38+ messages in thread
From: Lai Jiangshan @ 2021-04-26 23:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Thomas Gleixner, Paolo Bonzini,
	Sean Christopherson, Steven Rostedt, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	Josh Poimboeuf, Uros Bizjak, Maxim Levitsky, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Peter Zijlstra

From: Lai Jiangshan <laijs@linux.alibaba.com>

In VMX, the NMI handler needs to be invoked after NMI VM-Exit.

Before commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via
indirect call instead of INTn"), the work was done by INTn ("int $2").

But the INTn microcode is relatively expensive, and INTn doesn't set the
NMI-blocked flag required by the Linux kernel NMI entry, so the commit
reworked NMI VM-Exit handling to invoke the kernel handler via a
function call.  Moving away from INTn is very reasonable.

Yet some details were missed.  After the said commit was applied, the NMI
entry pointer is fetched from the IDT and called from the kernel stack.
But the NMI entry installed in the IDT is asm_exc_nmi(), which expects
to be invoked on the IST stack, as the hardware would do, and it relies
on the "NMI executing" variable at a fixed location on that stack to
work correctly.  When it is unexpectedly called from the kernel stack,
the RSP-located "NMI executing" variable lands on the kernel stack
instead, where it is "uninitialized" garbage that can cause the NMI
entry to take the wrong path.

So we should not use the NMI entry installed in the IDT.  Rather, we
should use the NMI entry that is allowed to run on the kernel stack:
asm_noist_exc_nmi(), which is also used for Xen PV and early booting.

Link: https://lore.kernel.org/lkml/20200915191505.10355-3-sean.j.christopherson@intel.com/
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: kvm@vger.kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/kernel/nmi.c  | 3 +++
 arch/x86/kvm/vmx/vmx.c | 8 ++++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 2fb1fd59d714..919f0400d931 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -528,10 +528,13 @@ DEFINE_IDTENTRY_RAW(noist_exc_nmi)
 {
 	/*
 	 * On Xen PV and early booting stage, NMI doesn't use IST.
+	 * And when it is manually called from VMX NMI VM-Exit handler,
+	 * it doesn't use IST either.
 	 * The C part is the same as native.
 	 */
 	exc_nmi(regs);
 }
+EXPORT_SYMBOL_GPL(asm_noist_exc_nmi);
 
 void stop_nmi(void)
 {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bcbf0d2139e9..96e59d912637 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -36,6 +36,7 @@
 #include <asm/debugreg.h>
 #include <asm/desc.h>
 #include <asm/fpu/internal.h>
+#include <asm/idtentry.h>
 #include <asm/io.h>
 #include <asm/irq_remapping.h>
 #include <asm/kexec.h>
@@ -6416,8 +6417,11 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 	else if (is_machine_check(intr_info))
 		kvm_machine_check();
 	/* We need to handle NMIs before interrupts are enabled */
-	else if (is_nmi(intr_info))
-		handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
+	else if (is_nmi(intr_info)) {
+		kvm_before_interrupt(&vmx->vcpu);
+		vmx_do_interrupt_nmi_irqoff((unsigned long)asm_noist_exc_nmi);
+		kvm_after_interrupt(&vmx->vcpu);
+	}
 }
 
 static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
-- 
2.19.1.6.gb485710b


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller
  2021-04-26 23:09 [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Lai Jiangshan
                   ` (2 preceding siblings ...)
  2021-04-26 23:09 ` [PATCH 3/4] " Lai Jiangshan
@ 2021-04-26 23:09 ` Lai Jiangshan
  2021-04-30  9:03   ` Thomas Gleixner
  2021-04-30  7:14 ` [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Paolo Bonzini
  4 siblings, 1 reply; 38+ messages in thread
From: Lai Jiangshan @ 2021-04-26 23:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Sean Christopherson, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, kvm

From: Lai Jiangshan <laijs@linux.alibaba.com>

The function handle_interrupt_nmi_irqoff() now has a single caller and
no longer handles NMIs, so its name is outdated.

Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/kvm/vmx/vmx.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 96e59d912637..92c22211203e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6396,16 +6396,6 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 
 void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
 
-static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
-{
-	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
-	gate_desc *desc = (gate_desc *)host_idt_base + vector;
-
-	kvm_before_interrupt(vcpu);
-	vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
-	kvm_after_interrupt(vcpu);
-}
-
 static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 {
 	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
@@ -6427,12 +6417,19 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
 {
 	u32 intr_info = vmx_get_intr_info(vcpu);
+	unsigned int vector;
+	gate_desc *desc;
 
 	if (WARN_ONCE(!is_external_intr(intr_info),
 	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
 		return;
 
-	handle_interrupt_nmi_irqoff(vcpu, intr_info);
+	vector = intr_info & INTR_INFO_VECTOR_MASK;
+	desc = (gate_desc *)host_idt_base + vector;
+
+	kvm_before_interrupt(vcpu);
+	vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
+	kvm_after_interrupt(vcpu);
 }
 
 static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
-- 
2.19.1.6.gb485710b


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
  2021-04-26 23:09 ` [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi Lai Jiangshan
@ 2021-04-28 21:27   ` Steven Rostedt
  2021-04-30  7:15     ` Paolo Bonzini
  2021-05-03 19:05   ` Thomas Gleixner
  2021-05-10  7:59   ` Juergen Gross
  2 siblings, 1 reply; 38+ messages in thread
From: Steven Rostedt @ 2021-04-28 21:27 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Lai Jiangshan, Thomas Gleixner, Paolo Bonzini,
	Sean Christopherson, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	Josh Poimboeuf, Uros Bizjak, Maxim Levitsky, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Boris Ostrovsky,
	Juergen Gross, Stefano Stabellini, Peter Zijlstra,
	Alexandre Chartre, Joerg Roedel, Jian Cai, xen-devel

On Tue, 27 Apr 2021 07:09:46 +0800
Lai Jiangshan <jiangshanlai@gmail.com> wrote:

> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> There is no any functionality change intended.  Just rename it and
> move it to arch/x86/kernel/nmi.c so that we can resue it later in
> next patch for early NMI and kvm.

Nit, but in change logs, please avoid stating "next patch" as searching git
history (via git blame or whatever) there is no such thing as "next patch".

Just state: "so that we can reuse it for early NMI and KVM."

I also just noticed the typo in "resue". Or maybe both NMI and KVM should
be sued again ;-)

-- Steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/4] x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage
  2021-04-26 23:09 ` [PATCH 2/4] x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage Lai Jiangshan
@ 2021-04-28 21:30   ` Steven Rostedt
  2021-05-03 20:13   ` Thomas Gleixner
  1 sibling, 0 replies; 38+ messages in thread
From: Steven Rostedt @ 2021-04-28 21:30 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Lai Jiangshan, Thomas Gleixner, Paolo Bonzini,
	Sean Christopherson, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Josh Poimboeuf, Uros Bizjak, Maxim Levitsky, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Peter Zijlstra,
	Alexandre Chartre, Juergen Gross, Joerg Roedel, Jian Cai

On Tue, 27 Apr 2021 07:09:47 +0800
Lai Jiangshan <jiangshanlai@gmail.com> wrote:

> While the other entries for the exceptions which use Interrupt stacks can
> be also used on the kernel stack, asm_exc_nmi() can not be used on the
> kernel stack for it relies on the RSP-located "NMI executing" variable
> which expects to on a fixed location in the NMI IST stack.  When it is
> unexpectedly called from the kernel stack, the RSP-located "NMI executing"
> variable is also on the kernel stack and is "uninitialized" and can cause
> the NMI entry to run in the wrong way.

Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-04-26 23:09 ` [PATCH 3/4] " Lai Jiangshan
@ 2021-04-30  2:46   ` Lai Jiangshan
  2021-05-03 19:37   ` Thomas Gleixner
  2021-05-03 20:02   ` Thomas Gleixner
  2 siblings, 0 replies; 38+ messages in thread
From: Lai Jiangshan @ 2021-04-30  2:46 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Thomas Gleixner, Paolo Bonzini, Sean Christopherson,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, Josh Poimboeuf,
	Uros Bizjak, Maxim Levitsky, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Peter Zijlstra



On 2021/4/27 07:09, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> In VMX, the NMI handler needs to be invoked after NMI VM-Exit.
> 
> Before the commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via
> indirect call instead of INTn"), the work is done by INTn ("int $2").
> 
> But INTn microcode is relatively expensive, so the commit reworked
> NMI VM-Exit handling to invoke the kernel handler by function call.
> And INTn doesn't set the NMI blocked flag required by the linux kernel
> NMI entry.  So moving away from INTn are very reasonable.
> 
> Yet some details were missed.  After the said commit applied, the NMI
> entry pointer is fetched from the IDT table and called from the kernel
> stack.  But the NMI entry pointer installed on the IDT table is
> asm_exc_nmi() which expects to be invoked on the IST stack by the ISA.
> And it relies on the "NMI executing" variable on the IST stack to work
> correctly.  When it is unexpectedly called from the kernel stack, the
> RSP-located "NMI executing" variable is also on the kernel stack and
> is "uninitialized" and can cause the NMI entry to run in the wrong way.
> 
> So we should not used the NMI entry installed on the IDT table.  Rather,
> we should use the NMI entry allowed to be used on the kernel stack which
> is asm_noist_exc_nmi() which is also used for XENPV and early booting.
> 

The problem can be demonstrated with the following test patch.

1) The test patch applies without conflict before this patch 3,
    and it shows the problem: the NMI is missed in this case.

2) To verify this patch 3, you need to manually port the same logic
    on top of it; doing so shows that the problem is fixed.

3) The single added line in vmenter.S just emulates the situation where
    "uninitialized" garbage on the kernel stack happens to be 1 and happens
    to sit at the same location as the RSP-located "NMI executing" variable.


diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 3a6461694fc2..32096049c2a2 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -316,6 +316,7 @@ SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
  #endif
  	pushf
  	push $__KERNEL_CS
+	movq $1, -24(%rsp) // "NMI executing": 1 = nested, non-1 = not-nested
  	CALL_NOSPEC _ASM_ARG1

  	/*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bcbf0d2139e9..9509d2edd50d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6416,8 +6416,12 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
  	else if (is_machine_check(intr_info))
  		kvm_machine_check();
  	/* We need to handle NMIs before interrupts are enabled */
-	else if (is_nmi(intr_info))
+	else if (is_nmi(intr_info)) {
+		unsigned long count = this_cpu_read(irq_stat.__nmi_count);
  		handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
+		if (count == this_cpu_read(irq_stat.__nmi_count))
+			pr_info("kvm nmi miss\n");
+	}
  }

  static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack
  2021-04-26 23:09 [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Lai Jiangshan
                   ` (3 preceding siblings ...)
  2021-04-26 23:09 ` [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller Lai Jiangshan
@ 2021-04-30  7:14 ` Paolo Bonzini
  2021-05-03 14:36   ` Thomas Gleixner
  4 siblings, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2021-04-30  7:14 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Thomas Gleixner, Sean Christopherson,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, Josh Poimboeuf,
	Uros Bizjak, Maxim Levitsky

On 27/04/21 01:09, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> In VMX, the NMI handler needs to be invoked after NMI VM-Exit.
> 
> Before the commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via
> indirect call instead of INTn"), the work is done by INTn ("int $2").
> 
> But INTn microcode is relatively expensive, so the commit reworked
> NMI VM-Exit handling to invoke the kernel handler by function call.
> And INTn doesn't set the NMI blocked flag required by the linux kernel
> NMI entry.  So moving away from INTn are very reasonable.
> 
> Yet some details were missed.  After the said commit applied, the NMI
> entry pointer is fetched from the IDT table and called from the kernel
> stack.  But the NMI entry pointer installed on the IDT table is
> asm_exc_nmi() which expects to be invoked on the IST stack by the ISA.
> And it relies on the "NMI executing" variable on the IST stack to work
> correctly.  When it is unexpectedly called from the kernel stack, the
> RSP-located "NMI executing" variable is also on the kernel stack and
> is "uninitialized" and can cause the NMI entry to run in the wrong way.
> 
> During fixing the problem for KVM, I found that there might be the same
> problem for early booting stage where the IST is not set up. asm_exc_nmi()
> is not allowed to be used in this stage for the same reason about
> the RSP-located "NMI executing" variable.
> 
> For both cases, we should use asm_noist_exc_nmi() which is introduced
> in the patch 1 via renaming from an existing asm_xenpv_exc_nmi() and
> which is safe on the kernel stack.
> 
> https://lore.kernel.org/lkml/20200915191505.10355-3-sean.j.christopherson@intel.com/

For the KVM part,

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Thanks,

Paolo

> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: kvm@vger.kernel.org
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Uros Bizjak <ubizjak@gmail.com>
> Cc: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> Lai Jiangshan (4):
>    x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
>    x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage
>    KVM/VMX: Invoke NMI non-IST entry instead of IST entry
>    KVM/VMX: fold handle_interrupt_nmi_irqoff() into its solo caller
> 
>   arch/x86/include/asm/idtentry.h |  4 +---
>   arch/x86/kernel/idt.c           |  8 +++++++-
>   arch/x86/kernel/nmi.c           | 12 ++++++++++++
>   arch/x86/kvm/vmx/vmx.c          | 27 ++++++++++++++-------------
>   arch/x86/xen/enlighten_pv.c     |  9 +++------
>   arch/x86/xen/xen-asm.S          |  2 +-
>   6 files changed, 38 insertions(+), 24 deletions(-)
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
  2021-04-28 21:27   ` Steven Rostedt
@ 2021-04-30  7:15     ` Paolo Bonzini
  2021-04-30 12:05       ` Steven Rostedt
  0 siblings, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2021-04-30  7:15 UTC (permalink / raw)
  To: Steven Rostedt, Lai Jiangshan
  Cc: linux-kernel, Lai Jiangshan, Thomas Gleixner,
	Sean Christopherson, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	Josh Poimboeuf, Uros Bizjak, Maxim Levitsky, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Boris Ostrovsky,
	Juergen Gross, Stefano Stabellini, Peter Zijlstra,
	Alexandre Chartre, Joerg Roedel, Jian Cai, xen-devel

On 28/04/21 23:27, Steven Rostedt wrote:
> On Tue, 27 Apr 2021 07:09:46 +0800
> Lai Jiangshan <jiangshanlai@gmail.com> wrote:
> 
>> From: Lai Jiangshan <laijs@linux.alibaba.com>
>>
>> There is no any functionality change intended.  Just rename it and
>> move it to arch/x86/kernel/nmi.c so that we can resue it later in
>> next patch for early NMI and kvm.
> 
> Nit, but in change logs, please avoid stating "next patch" as searching git
> history (via git blame or whatever) there is no such thing as "next patch".

Interesting, I use next patch(es) relatively often, though you're right 
that something like "in preparation for" works just as well.  Yes, it's 
the previous in "git log", but you get what it's meant in practice. :)

Paolo

> Just state: "so that we can reuse it for early NMI and KVM."
> 
> I also just noticed the typo in "resue". Or maybe both NMI and KVM should
> be sued again ;-)
> 
> -- Steve
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller
  2021-04-26 23:09 ` [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller Lai Jiangshan
@ 2021-04-30  9:03   ` Thomas Gleixner
  2021-04-30  9:06     ` Paolo Bonzini
  0 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2021-04-30  9:03 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Sean Christopherson, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, kvm

Lai,

On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
>  	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
> @@ -6427,12 +6417,19 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
>  static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
>  {
>  	u32 intr_info = vmx_get_intr_info(vcpu);
> +	unsigned int vector;
> +	gate_desc *desc;
>  
>  	if (WARN_ONCE(!is_external_intr(intr_info),
>  	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
>  		return;
>  
> -	handle_interrupt_nmi_irqoff(vcpu, intr_info);
> +	vector = intr_info & INTR_INFO_VECTOR_MASK;
> +	desc = (gate_desc *)host_idt_base + vector;
> +
> +	kvm_before_interrupt(vcpu);
> +	vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
> +	kvm_after_interrupt(vcpu);

So the previous patch does:

+               kvm_before_interrupt(&vmx->vcpu);
+               vmx_do_interrupt_nmi_irqoff((unsigned long)asm_noist_exc_nmi);
+               kvm_after_interrupt(&vmx->vcpu);

What is this idt gate descriptor dance for in this code?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller
  2021-04-30  9:03   ` Thomas Gleixner
@ 2021-04-30  9:06     ` Paolo Bonzini
  2021-04-30 23:28       ` Thomas Gleixner
  0 siblings, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2021-04-30  9:06 UTC (permalink / raw)
  To: Thomas Gleixner, Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, kvm

On 30/04/21 11:03, Thomas Gleixner wrote:
> Lai,
> 
> On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
>>   	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
>> @@ -6427,12 +6417,19 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
>>   static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
>>   {
>>   	u32 intr_info = vmx_get_intr_info(vcpu);
>> +	unsigned int vector;
>> +	gate_desc *desc;
>>   
>>   	if (WARN_ONCE(!is_external_intr(intr_info),
>>   	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
>>   		return;
>>   
>> -	handle_interrupt_nmi_irqoff(vcpu, intr_info);
>> +	vector = intr_info & INTR_INFO_VECTOR_MASK;
>> +	desc = (gate_desc *)host_idt_base + vector;
>> +
>> +	kvm_before_interrupt(vcpu);
>> +	vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
>> +	kvm_after_interrupt(vcpu);
> 
> So the previous patch does:
> 
> +               kvm_before_interrupt(&vmx->vcpu);
> +               vmx_do_interrupt_nmi_irqoff((unsigned long)asm_noist_exc_nmi);
> +               kvm_after_interrupt(&vmx->vcpu);
> 
> What is this idt gate descriptor dance for in this code?

NMIs are sent through a different vmexit code (the same one as 
exceptions).  This one is for interrupts.

Paolo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
  2021-04-30  7:15     ` Paolo Bonzini
@ 2021-04-30 12:05       ` Steven Rostedt
  0 siblings, 0 replies; 38+ messages in thread
From: Steven Rostedt @ 2021-04-30 12:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Lai Jiangshan, linux-kernel, Lai Jiangshan, Thomas Gleixner,
	Sean Christopherson, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	Josh Poimboeuf, Uros Bizjak, Maxim Levitsky, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Boris Ostrovsky,
	Juergen Gross, Stefano Stabellini, Peter Zijlstra,
	Alexandre Chartre, Joerg Roedel, Jian Cai, xen-devel

On Fri, 30 Apr 2021 09:15:51 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> > Nit, but in change logs, please avoid stating "next patch" as searching git
> > history (via git blame or whatever) there is no such thing as "next patch".  
> 
> Interesting, I use next patch(es) relatively often, though you're right 
> that something like "in preparation for" works just as well.  Yes, it's 
> the previous one in "git log", but you get what is meant in practice. :)

It's not always the previous in a git log. Git log sorts by time, and
if an unrelated commit was created in between those two patches, it
will be in between them.

-- Steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller
  2021-04-30  9:06     ` Paolo Bonzini
@ 2021-04-30 23:28       ` Thomas Gleixner
  0 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2021-04-30 23:28 UTC (permalink / raw)
  To: Paolo Bonzini, Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, kvm

On Fri, Apr 30 2021 at 11:06, Paolo Bonzini wrote:

> On 30/04/21 11:03, Thomas Gleixner wrote:
>> Lai,
>> 
>> On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
>>>   	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
>>> @@ -6427,12 +6417,19 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
>>>   static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
>>>   {
>>>   	u32 intr_info = vmx_get_intr_info(vcpu);
>>> +	unsigned int vector;
>>> +	gate_desc *desc;
>>>   
>>>   	if (WARN_ONCE(!is_external_intr(intr_info),
>>>   	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
>>>   		return;
>>>   
>>> -	handle_interrupt_nmi_irqoff(vcpu, intr_info);
>>> +	vector = intr_info & INTR_INFO_VECTOR_MASK;
>>> +	desc = (gate_desc *)host_idt_base + vector;
>>> +
>>> +	kvm_before_interrupt(vcpu);
>>> +	vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
>>> +	kvm_after_interrupt(vcpu);
>> 
>> So the previous patch does:
>> 
>> +               kvm_before_interrupt(&vmx->vcpu);
>> +               vmx_do_interrupt_nmi_irqoff((unsigned long)asm_noist_exc_nmi);
>> +               kvm_after_interrupt(&vmx->vcpu);
>> 
>> What is this idt gate descriptor dance for in this code?
>
> NMIs are sent through a different vmexit code (the same one as 
> exceptions).  This one is for interrupts.

Duh. Yes. The ability to read is clearly an advantage...

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack
  2021-04-30  7:14 ` [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Paolo Bonzini
@ 2021-05-03 14:36   ` Thomas Gleixner
  0 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2021-05-03 14:36 UTC (permalink / raw)
  To: Paolo Bonzini, Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Sean Christopherson, Steven Rostedt, Andi Kleen,
	Andy Lutomirski, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, Josh Poimboeuf, Uros Bizjak, Maxim Levitsky

On Fri, Apr 30 2021 at 09:14, Paolo Bonzini wrote:
> On 27/04/21 01:09, Lai Jiangshan wrote:
>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Thanks Paolo. I'm working through it now...

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
  2021-04-26 23:09 ` [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi Lai Jiangshan
  2021-04-28 21:27   ` Steven Rostedt
@ 2021-05-03 19:05   ` Thomas Gleixner
  2021-05-03 19:41     ` Thomas Gleixner
  2021-05-10  7:59   ` Juergen Gross
  2 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2021-05-03 19:05 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, Josh Poimboeuf,
	Uros Bizjak, Maxim Levitsky, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Peter Zijlstra, Alexandre Chartre,
	Joerg Roedel, Jian Cai, xen-devel

On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
>
> There is no functionality change intended.  Just rename it and
> move it to arch/x86/kernel/nmi.c so that we can resue it later in
> next patch for early NMI and kvm.

'Reuse it later' is not really a proper explanation of why this change is
necessary.

Also this can be simplified by using aliasing which keeps the name
spaces intact.

Thanks,

        tglx
---       

--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -135,6 +135,9 @@ static __always_inline void __##func(str
 #define DEFINE_IDTENTRY_RAW(func)					\
 __visible noinstr void func(struct pt_regs *regs)
 
+#define DEFINE_IDTENTRY_RAW_ALIAS(alias, func)				\
+__visible noinstr void func(struct pt_regs *regs) __alias(alias)
+
 /**
  * DECLARE_IDTENTRY_RAW_ERRORCODE - Declare functions for raw IDT entry points
  *				    Error code pushed by hardware
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -524,6 +524,8 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
 		mds_user_clear_cpu_buffers();
 }
 
+DEFINE_IDTENTRY_RAW_ALIAS(exc_nmi, xenpv_exc_nmi);
+
 void stop_nmi(void)
 {
 	ignore_nmis++;
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -565,12 +565,6 @@ static void xen_write_ldt_entry(struct d
 
 void noist_exc_debug(struct pt_regs *regs);
 
-DEFINE_IDTENTRY_RAW(xenpv_exc_nmi)
-{
-	/* On Xen PV, NMI doesn't use IST.  The C part is the same as native. */
-	exc_nmi(regs);
-}
-
 DEFINE_IDTENTRY_RAW_ERRORCODE(xenpv_exc_double_fault)
 {
 	/* On Xen PV, DF doesn't use IST.  The C part is the same as native. */

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-04-26 23:09 ` [PATCH 3/4] " Lai Jiangshan
  2021-04-30  2:46   ` Lai Jiangshan
@ 2021-05-03 19:37   ` Thomas Gleixner
  2021-05-03 20:02   ` Thomas Gleixner
  2 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2021-05-03 19:37 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, Josh Poimboeuf,
	Uros Bizjak, Maxim Levitsky, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Peter Zijlstra

On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
>
> In VMX, the NMI handler needs to be invoked after NMI VM-Exit.
>
> Before the commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via
> indirect call instead of INTn"), the work is done by INTn ("int $2").
>
> But INTn microcode is relatively expensive, so the commit reworked
> NMI VM-Exit handling to invoke the kernel handler by function call.
> And INTn doesn't set the NMI blocked flag required by the linux kernel
> NMI entry.  So moving away from INTn is very reasonable.
>
> Yet some details were missed.  After the said commit was applied, the NMI
> entry pointer is fetched from the IDT table and called from the kernel
> stack.  But the NMI entry pointer installed on the IDT table is
> asm_exc_nmi() which expects to be invoked on the IST stack by the ISA.
> And it relies on the "NMI executing" variable on the IST stack to work
> correctly.  When it is unexpectedly called from the kernel stack, the
> RSP-located "NMI executing" variable is also on the kernel stack and
> is "uninitialized" and can cause the NMI entry to run in the wrong way.
>
> So we should not use the NMI entry installed on the IDT table.  Rather,
> we should use asm_noist_exc_nmi(), the NMI entry that is allowed to be
> used on the kernel stack and is also used for XENPV and early booting.

It's not used by XENPV. XENPV only uses the C entry point, but the ASM
entry is separate.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
  2021-05-03 19:05   ` Thomas Gleixner
@ 2021-05-03 19:41     ` Thomas Gleixner
  0 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2021-05-03 19:41 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, Josh Poimboeuf,
	Uros Bizjak, Maxim Levitsky, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Peter Zijlstra, Alexandre Chartre,
	Joerg Roedel, Jian Cai, xen-devel

On Mon, May 03 2021 at 21:05, Thomas Gleixner wrote:

> On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
>> From: Lai Jiangshan <laijs@linux.alibaba.com>
>>
>> There is no any functionality change intended.  Just rename it and
>> move it to arch/x86/kernel/nmi.c so that we can resue it later in
>> next patch for early NMI and kvm.
>
> 'Reuse it later' is not really a proper explanation of why this change is
> necessary.
>
> Also this can be simplified by using aliasing which keeps the name
> spaces intact.

Aside of that this is not required to be part of a fixes series which
needs to be backported.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-04-26 23:09 ` [PATCH 3/4] " Lai Jiangshan
  2021-04-30  2:46   ` Lai Jiangshan
  2021-05-03 19:37   ` Thomas Gleixner
@ 2021-05-03 20:02   ` Thomas Gleixner
  2021-05-04  8:10     ` Paolo Bonzini
  2 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2021-05-03 20:02 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, Josh Poimboeuf,
	Uros Bizjak, Maxim Levitsky, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Peter Zijlstra

On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index bcbf0d2139e9..96e59d912637 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -36,6 +36,7 @@
>  #include <asm/debugreg.h>
>  #include <asm/desc.h>
>  #include <asm/fpu/internal.h>
> +#include <asm/idtentry.h>
>  #include <asm/io.h>
>  #include <asm/irq_remapping.h>
>  #include <asm/kexec.h>
> @@ -6416,8 +6417,11 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
>  	else if (is_machine_check(intr_info))
>  		kvm_machine_check();
>  	/* We need to handle NMIs before interrupts are enabled */
> -	else if (is_nmi(intr_info))
> -		handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
> +	else if (is_nmi(intr_info)) {

Lacks curly braces for all of the above conditions according to coding style.

> +		kvm_before_interrupt(&vmx->vcpu);
> +		vmx_do_interrupt_nmi_irqoff((unsigned long)asm_noist_exc_nmi);
> +		kvm_after_interrupt(&vmx->vcpu);
> +	}

but this and the next patch are not really needed. The below avoids the
extra kvm_before/after() dance in both places. Hmm?

Thanks,

        tglx
---
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -526,6 +526,10 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
 
 DEFINE_IDTENTRY_RAW_ALIAS(exc_nmi, exc_nmi_noist);
 
+#if IS_MODULE(CONFIG_KVM_INTEL)
+EXPORT_SYMBOL_GPL(asm_exc_nmi_noist);
+#endif
+
 void stop_nmi(void)
 {
 	ignore_nmis++;
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -36,6 +36,7 @@
 #include <asm/debugreg.h>
 #include <asm/desc.h>
 #include <asm/fpu/internal.h>
+#include <asm/idtentry.h>
 #include <asm/io.h>
 #include <asm/irq_remapping.h>
 #include <asm/kexec.h>
@@ -6395,18 +6396,17 @@ static void vmx_apicv_post_state_restore
 
 void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
 
-static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
+static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
+					unsigned long entry)
 {
-	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
-	gate_desc *desc = (gate_desc *)host_idt_base + vector;
-
 	kvm_before_interrupt(vcpu);
-	vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
+	vmx_do_interrupt_nmi_irqoff(entry);
 	kvm_after_interrupt(vcpu);
 }
 
 static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 {
+	const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
 	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
 
 	/* if exit due to PF check for async PF */
@@ -6417,18 +6417,20 @@ static void handle_exception_nmi_irqoff(
 		kvm_machine_check();
 	/* We need to handle NMIs before interrupts are enabled */
 	else if (is_nmi(intr_info))
-		handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
+		handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry);
 }
 
 static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
 {
 	u32 intr_info = vmx_get_intr_info(vcpu);
+	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
+	gate_desc *desc = (gate_desc *)host_idt_base + vector;
 
 	if (WARN_ONCE(!is_external_intr(intr_info),
 	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
 		return;
 
-	handle_interrupt_nmi_irqoff(vcpu, intr_info);
+	handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
 }
 
 static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/4] x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage
  2021-04-26 23:09 ` [PATCH 2/4] x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage Lai Jiangshan
  2021-04-28 21:30   ` Steven Rostedt
@ 2021-05-03 20:13   ` Thomas Gleixner
  2021-05-03 20:24     ` Thomas Gleixner
  1 sibling, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2021-05-03 20:13 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Josh Poimboeuf,
	Uros Bizjak, Maxim Levitsky, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Peter Zijlstra, Alexandre Chartre, Juergen Gross,
	Joerg Roedel, Jian Cai

On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
> + *
> + * While the other entries for the exceptions which use Interrupt stacks can
> + * be also used on the kernel stack, asm_exc_nmi() can not be used on the
> + * kernel stack for it relies on the RSP-located "NMI executing" variable
> + * which expects to be at a fixed location in the NMI IST stack.  For early
> + * booting stage, asm_noist_exc_nmi() is used for NMI.
>   */
>  static const __initconst struct idt_data def_idts[] = {
>  	INTG(X86_TRAP_DE,		asm_exc_divide_error),
> -	INTG(X86_TRAP_NMI,		asm_exc_nmi),
> +	INTG(X86_TRAP_NMI,		asm_noist_exc_nmi),

Actually this is a x86_64 only problem. The 32bit variant is fine, but
for consistency there is no problem to have that extra entry point on
32bit as well.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/4] x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage
  2021-05-03 20:13   ` Thomas Gleixner
@ 2021-05-03 20:24     ` Thomas Gleixner
  2021-05-03 21:45       ` Thomas Gleixner
  0 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2021-05-03 20:24 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Josh Poimboeuf,
	Uros Bizjak, Maxim Levitsky, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Peter Zijlstra, Alexandre Chartre, Juergen Gross,
	Joerg Roedel, Jian Cai

On Mon, May 03 2021 at 22:13, Thomas Gleixner wrote:

> On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
>> + *
>> + * While the other entries for the exceptions which use Interrupt stacks can
>> + * be also used on the kernel stack, asm_exc_nmi() can not be used on the
>> + * kernel stack for it relies on the RSP-located "NMI executing" variable
>> + * which expects to be at a fixed location in the NMI IST stack.  For early
>> + * booting stage, asm_noist_exc_nmi() is used for NMI.
>>   */
>>  static const __initconst struct idt_data def_idts[] = {
>>  	INTG(X86_TRAP_DE,		asm_exc_divide_error),
>> -	INTG(X86_TRAP_NMI,		asm_exc_nmi),
>> +	INTG(X86_TRAP_NMI,		asm_noist_exc_nmi),
>
> Actually this is a x86_64 only problem. The 32bit variant is fine, but
> for consistency there is no problem to have that extra entry point on
> 32bit as well.

Bah, no. This patch breaks 32bit because on 32bit nothing sets the entry
to asm_exc_nmi() later on.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/4] x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage
  2021-05-03 20:24     ` Thomas Gleixner
@ 2021-05-03 21:45       ` Thomas Gleixner
  2021-05-04 12:43         ` Thomas Gleixner
  0 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2021-05-03 21:45 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Josh Poimboeuf,
	Uros Bizjak, Maxim Levitsky, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Peter Zijlstra, Alexandre Chartre, Juergen Gross,
	Joerg Roedel, Jian Cai

On Mon, May 03 2021 at 22:24, Thomas Gleixner wrote:

> On Mon, May 03 2021 at 22:13, Thomas Gleixner wrote:
>
>> On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
>>> + *
>>> + * While the other entries for the exceptions which use Interrupt stacks can
>>> + * be also used on the kernel stack, asm_exc_nmi() can not be used on the
>>> + * kernel stack for it relies on the RSP-located "NMI executing" variable
>>> + * which expects to be at a fixed location in the NMI IST stack.  For early
>>> + * booting stage, asm_noist_exc_nmi() is used for NMI.
>>>   */
>>>  static const __initconst struct idt_data def_idts[] = {
>>>  	INTG(X86_TRAP_DE,		asm_exc_divide_error),
>>> -	INTG(X86_TRAP_NMI,		asm_exc_nmi),
>>> +	INTG(X86_TRAP_NMI,		asm_noist_exc_nmi),
>>
>> Actually this is a x86_64 only problem. The 32bit variant is fine, but
>> for consistency there is no problem to have that extra entry point on
>> 32bit as well.
>
> Bah, no. This patch breaks 32bit because on 32bit nothing sets the entry
> to asm_exc_nmi() later on.

Sigh. Finding a fixes tag for this is complicated.

The problem was introduced in 4.14 with b70543a0b2b6 ("x86/idt: Move
regular trap init to tables").

Before that trap_init() installed an IST gate right away, but looking
deeper this was broken forever because there is a hen and egg problem.

ISTs only work after TSS is initialized and the ordering here is:

trap_init()
  init_idt()
  cpu_init()
    init_tss()

So the original code had a race window between init_idt() and
init_tss(). Any IST-using exception in that window goes south because
TSS is not initialized.

b70543a0b2b6 traded the above with that NMI issue. All other
exceptions are fine...

I'll think about it tomorrow some more...


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-05-03 20:02   ` Thomas Gleixner
@ 2021-05-04  8:10     ` Paolo Bonzini
  0 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2021-05-04  8:10 UTC (permalink / raw)
  To: Thomas Gleixner, Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Sean Christopherson, Steven Rostedt, Andi Kleen,
	Andy Lutomirski, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, Josh Poimboeuf, Uros Bizjak, Maxim Levitsky,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
	Peter Zijlstra

On 03/05/21 22:02, Thomas Gleixner wrote:
> but this and the next patch are not really needed. The below avoids the
> extra kvm_before/after() dance in both places. Hmm?

Sure, that's good as well.

Paolo

> Thanks,
> 
>          tglx
> ---
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -526,6 +526,10 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
>   
>   DEFINE_IDTENTRY_RAW_ALIAS(exc_nmi, exc_nmi_noist);
>   
> +#if IS_MODULE(CONFIG_KVM_INTEL)
> +EXPORT_SYMBOL_GPL(asm_exc_nmi_noist);
> +#endif
> +
>   void stop_nmi(void)
>   {
>   	ignore_nmis++;
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -36,6 +36,7 @@
>   #include <asm/debugreg.h>
>   #include <asm/desc.h>
>   #include <asm/fpu/internal.h>
> +#include <asm/idtentry.h>
>   #include <asm/io.h>
>   #include <asm/irq_remapping.h>
>   #include <asm/kexec.h>
> @@ -6395,18 +6396,17 @@ static void vmx_apicv_post_state_restore
>   
>   void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
>   
> -static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
> +static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
> +					unsigned long entry)
>   {
> -	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
> -	gate_desc *desc = (gate_desc *)host_idt_base + vector;
> -
>   	kvm_before_interrupt(vcpu);
> -	vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
> +	vmx_do_interrupt_nmi_irqoff(entry);
>   	kvm_after_interrupt(vcpu);
>   }
>   
>   static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
>   {
> +	const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
>   	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
>   
>   	/* if exit due to PF check for async PF */
> @@ -6417,18 +6417,20 @@ static void handle_exception_nmi_irqoff(
>   		kvm_machine_check();
>   	/* We need to handle NMIs before interrupts are enabled */
>   	else if (is_nmi(intr_info))
> -		handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
> +		handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry);
>   }
>   
>   static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
>   {
>   	u32 intr_info = vmx_get_intr_info(vcpu);
> +	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
> +	gate_desc *desc = (gate_desc *)host_idt_base + vector;
>   
>   	if (WARN_ONCE(!is_external_intr(intr_info),
>   	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
>   		return;
>   
> -	handle_interrupt_nmi_irqoff(vcpu, intr_info);
> +	handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
>   }
>   
>   static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/4] x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage
  2021-05-03 21:45       ` Thomas Gleixner
@ 2021-05-04 12:43         ` Thomas Gleixner
  2021-05-04 19:50           ` [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry Thomas Gleixner
  0 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2021-05-04 12:43 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Josh Poimboeuf,
	Uros Bizjak, Maxim Levitsky, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Peter Zijlstra, Alexandre Chartre, Juergen Gross,
	Joerg Roedel, Jian Cai

On Mon, May 03 2021 at 23:45, Thomas Gleixner wrote:
> The problem was introduced in 4.14 with b70543a0b2b6 ("x86/idt: Move
> regular trap init to tables").
>
> Before that trap_init() installed an IST gate right away, but looking
> deeper this was broken forever because there is a hen and egg problem.
>
> ISTs only work after TSS is initialized and the ordering here is:
>
> trap_init()
>   init_idt()
>   cpu_init()
>     init_tss()
>
> So the original code had a race window between init_idt() and
> init_tss(). Any IST-using exception in that window goes south because
> TSS is not initialized.
>
> b70543a0b2b6 traded the above with that NMI issue. All other
> exceptions are fine...
>
> I'll think about it tomorrow some more...

It does not really matter which way around we do it. Even if we do that
noist dance, any NMI hitting _before_ init_idt() is still going to lala
land. So having this tiny step in between is more or less cosmetic.

And just for completeness' sake, I don't see a reason why we have to set
up the idt gates _before_ the TSS muck, i.e. before cpu_init().

The only thing cpu_init() needs working that is not installed in the
early_idt is #GP, because some cpu init code uses rd/wrmsrl_safe(). But
that's pretty much all of it.

So this wants a proper cleanup, not papering it over with an extra step,
and I don't see a reason why any of this should be backported, simply
because it does not matter at all whether the early idt, which only
populates a few essential gates, is active for a bit longer.

So what we need is a solution for that KVM wreckage but that can be
stand alone.

Thanks,

        tglx




^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-05-04 12:43         ` Thomas Gleixner
@ 2021-05-04 19:50           ` Thomas Gleixner
  2021-05-04 21:05             ` Maxim Levitsky
  2021-05-06 12:14             ` [tip: x86/urgent] " tip-bot2 for Lai Jiangshan
  0 siblings, 2 replies; 38+ messages in thread
From: Thomas Gleixner @ 2021-05-04 19:50 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Josh Poimboeuf,
	Uros Bizjak, Maxim Levitsky, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Peter Zijlstra, Alexandre Chartre, Juergen Gross,
	Joerg Roedel, Jian Cai

From: Lai Jiangshan <laijs@linux.alibaba.com>

In VMX, the host NMI handler needs to be invoked after NMI VM-Exit.
Before commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect
call instead of INTn"), this was done by INTn ("int $2"). But INTn
microcode is relatively expensive, so the commit reworked NMI VM-Exit
handling to invoke the kernel handler by function call.

But this missed a detail. The NMI entry point for direct invocation is
fetched from the IDT table and called on the kernel stack.  But on 64-bit
the NMI entry installed in the IDT expects to be invoked on the IST stack.
It relies on the "NMI executing" variable on the IST stack to work
correctly, which is at a fixed position in the IST stack.  When the entry
point is unexpectedly called on the kernel stack, the RSP-addressed "NMI
executing" variable is obviously also on the kernel stack and is
"uninitialized" and can cause the NMI entry code to run in the wrong way.

Provide a non-ist entry point for VMX which shares the C-function with
the regular NMI entry and invoke the new asm entry point instead.

On 32-bit this just maps to the regular NMI entry point as 32-bit has no
ISTs and is not affected.

[ tglx: Made it independent for backporting, massaged changelog ]

Fixes: 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---

Note: That's the minimal fix which needs to be backported and the other
      stuff is cleanup material on top for 5.14.

---
 arch/x86/include/asm/idtentry.h |   15 +++++++++++++++
 arch/x86/kernel/nmi.c           |   10 ++++++++++
 arch/x86/kvm/vmx/vmx.c          |   16 +++++++++-------
 3 files changed, 34 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -588,6 +588,21 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC,	xenpv_
 #endif
 
 /* NMI */
+
+#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
+/*
+ * Special NOIST entry point for VMX which invokes this on the kernel
+ * stack. asm_exc_nmi() requires an IST to work correctly vs. the NMI
+ * 'executing' marker.
+ *
+ * On 32bit this just uses the regular NMI entry point because 32-bit does
+ * not have ISTs.
+ */
+DECLARE_IDTENTRY(X86_TRAP_NMI,		exc_nmi_noist);
+#else
+#define asm_exc_nmi_noist		asm_exc_nmi
+#endif
+
 DECLARE_IDTENTRY_NMI(X86_TRAP_NMI,	exc_nmi);
 #ifdef CONFIG_XEN_PV
 DECLARE_IDTENTRY_RAW(X86_TRAP_NMI,	xenpv_exc_nmi);
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -524,6 +524,16 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
 		mds_user_clear_cpu_buffers();
 }
 
+#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
+DEFINE_IDTENTRY_RAW(exc_nmi_noist)
+{
+	exc_nmi(regs);
+}
+#endif
+#if IS_MODULE(CONFIG_KVM_INTEL)
+EXPORT_SYMBOL_GPL(asm_exc_nmi_noist);
+#endif
+
 void stop_nmi(void)
 {
 	ignore_nmis++;
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -36,6 +36,7 @@
 #include <asm/debugreg.h>
 #include <asm/desc.h>
 #include <asm/fpu/internal.h>
+#include <asm/idtentry.h>
 #include <asm/io.h>
 #include <asm/irq_remapping.h>
 #include <asm/kexec.h>
@@ -6415,18 +6416,17 @@ static void vmx_apicv_post_state_restore
 
 void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
 
-static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
+static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
+					unsigned long entry)
 {
-	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
-	gate_desc *desc = (gate_desc *)host_idt_base + vector;
-
 	kvm_before_interrupt(vcpu);
-	vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
+	vmx_do_interrupt_nmi_irqoff(entry);
 	kvm_after_interrupt(vcpu);
 }
 
 static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 {
+	const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
 	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
 
 	/* if exit due to PF check for async PF */
@@ -6437,18 +6437,20 @@ static void handle_exception_nmi_irqoff(
 		kvm_machine_check();
 	/* We need to handle NMIs before interrupts are enabled */
 	else if (is_nmi(intr_info))
-		handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
+		handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry);
 }
 
 static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
 {
 	u32 intr_info = vmx_get_intr_info(vcpu);
+	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
+	gate_desc *desc = (gate_desc *)host_idt_base + vector;
 
 	if (WARN_ONCE(!is_external_intr(intr_info),
 	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
 		return;
 
-	handle_interrupt_nmi_irqoff(vcpu, intr_info);
+	handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
 }
 
 static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-05-04 19:50           ` [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry Thomas Gleixner
@ 2021-05-04 21:05             ` Maxim Levitsky
  2021-05-04 21:12               ` Paolo Bonzini
  2021-05-06 12:14             ` [tip: x86/urgent] " tip-bot2 for Lai Jiangshan
  1 sibling, 1 reply; 38+ messages in thread
From: Maxim Levitsky @ 2021-05-04 21:05 UTC (permalink / raw)
  To: Thomas Gleixner, Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Josh Poimboeuf,
	Uros Bizjak, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
	Peter Zijlstra, Alexandre Chartre, Juergen Gross, Joerg Roedel,
	Jian Cai

On Tue, 2021-05-04 at 21:50 +0200, Thomas Gleixner wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> In VMX, the host NMI handler needs to be invoked after NMI VM-Exit.
> Before commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect
> call instead of INTn"), this was done by INTn ("int $2"). But INTn
> microcode is relatively expensive, so the commit reworked NMI VM-Exit
> handling to invoke the kernel handler by function call.
> 
> But this missed a detail. The NMI entry point for direct invocation is
> fetched from the IDT table and called on the kernel stack.  But on 64-bit
> the NMI entry installed in the IDT expects to be invoked on the IST stack.
> It relies on the "NMI executing" variable on the IST stack to work
> correctly, which is at a fixed position in the IST stack.  When the entry
> point is unexpectedly called on the kernel stack, the RSP-addressed "NMI
> executing" variable is obviously also on the kernel stack and is
> "uninitialized" and can cause the NMI entry code to run in the wrong way.
> 
> Provide a non-ist entry point for VMX which shares the C-function with
> the regular NMI entry and invoke the new asm entry point instead.

I haven't followed this closely, so this was probably already discussed,
but anyway, do I understand correctly that if another NMI arrives while
the NMI handler invoked from VMX code is running, the two invocations
won't share a stack (one uses the IST and the other doesn't) and thus
the prevention of NMI nesting won't be activated?

Does this mean that we still rely on hardware NMI masking to be activated?

Or in other words, that is, we still can't have an IRET between VM exit and 
the entry to the NMI handler?

Best regards,
	Maxim Levitsky

> 
> On 32-bit this just maps to the regular NMI entry point as 32-bit has no
> ISTs and is not affected.
> 
> [ tglx: Made it independent for backporting, massaged changelog ]
> 
> Fixes: 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn")
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> ---
> 
> Note: That's the minimal fix which needs to be backported and the other
>       stuff is cleanup material on top for 5.14.
> 
> ---
>  arch/x86/include/asm/idtentry.h |   15 +++++++++++++++
>  arch/x86/kernel/nmi.c           |   10 ++++++++++
>  arch/x86/kvm/vmx/vmx.c          |   16 +++++++++-------
>  3 files changed, 34 insertions(+), 7 deletions(-)
> 
> --- a/arch/x86/include/asm/idtentry.h
> +++ b/arch/x86/include/asm/idtentry.h
> @@ -588,6 +588,21 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC,	xenpv_
>  #endif
>  
>  /* NMI */
> +
> +#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
> +/*
> + * Special NOIST entry point for VMX which invokes this on the kernel
> + * stack. asm_exc_nmi() requires an IST to work correctly vs. the NMI
> + * 'executing' marker.
> + *
> + * On 32bit this just uses the regular NMI entry point because 32-bit does
> + * not have ISTs.
> + */
> +DECLARE_IDTENTRY(X86_TRAP_NMI,		exc_nmi_noist);
> +#else
> +#define asm_exc_nmi_noist		asm_exc_nmi
> +#endif
> +
>  DECLARE_IDTENTRY_NMI(X86_TRAP_NMI,	exc_nmi);
>  #ifdef CONFIG_XEN_PV
>  DECLARE_IDTENTRY_RAW(X86_TRAP_NMI,	xenpv_exc_nmi);
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -524,6 +524,16 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
>  		mds_user_clear_cpu_buffers();
>  }
>  
> +#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
> +DEFINE_IDTENTRY_RAW(exc_nmi_noist)
> +{
> +	exc_nmi(regs);
> +}
> +#endif
> +#if IS_MODULE(CONFIG_KVM_INTEL)
> +EXPORT_SYMBOL_GPL(asm_exc_nmi_noist);
> +#endif
> +
>  void stop_nmi(void)
>  {
>  	ignore_nmis++;
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -36,6 +36,7 @@
>  #include <asm/debugreg.h>
>  #include <asm/desc.h>
>  #include <asm/fpu/internal.h>
> +#include <asm/idtentry.h>
>  #include <asm/io.h>
>  #include <asm/irq_remapping.h>
>  #include <asm/kexec.h>
> @@ -6415,18 +6416,17 @@ static void vmx_apicv_post_state_restore
>  
>  void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
>  
> -static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
> +static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
> +					unsigned long entry)
>  {
> -	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
> -	gate_desc *desc = (gate_desc *)host_idt_base + vector;
> -
>  	kvm_before_interrupt(vcpu);
> -	vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
> +	vmx_do_interrupt_nmi_irqoff(entry);
>  	kvm_after_interrupt(vcpu);
>  }
>  
>  static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
>  {
> +	const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
>  	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
>  
>  	/* if exit due to PF check for async PF */
> @@ -6437,18 +6437,20 @@ static void handle_exception_nmi_irqoff(
>  		kvm_machine_check();
>  	/* We need to handle NMIs before interrupts are enabled */
>  	else if (is_nmi(intr_info))
> -		handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
> +		handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry);
>  }
>  
>  static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
>  {
>  	u32 intr_info = vmx_get_intr_info(vcpu);
> +	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
> +	gate_desc *desc = (gate_desc *)host_idt_base + vector;
>  
>  	if (WARN_ONCE(!is_external_intr(intr_info),
>  	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
>  		return;
>  
> -	handle_interrupt_nmi_irqoff(vcpu, intr_info);
> +	handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
>  }
>  
>  static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
> 




* Re: [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-05-04 21:05             ` Maxim Levitsky
@ 2021-05-04 21:12               ` Paolo Bonzini
  2021-05-04 21:21                 ` Sean Christopherson
  0 siblings, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2021-05-04 21:12 UTC (permalink / raw)
  To: Maxim Levitsky, Thomas Gleixner, Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Sean Christopherson, Steven Rostedt, Andi Kleen,
	Andy Lutomirski, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Josh Poimboeuf, Uros Bizjak, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Peter Zijlstra,
	Alexandre Chartre, Juergen Gross, Joerg Roedel, Jian Cai

On 04/05/21 23:05, Maxim Levitsky wrote:
> Does this mean that we still rely on hardware NMI masking to be activated?

No, the NMI code already handles reentrancy at both the assembly and C 
levels.

> Or in other words, that is we still can't have an IRET between VM exit and
> the entry to the NMI handler?

No, because NMIs are not masked on VM exit.  This in fact makes things 
potentially messy; unlike with AMD's CLGI/STGI, only MSRs and other 
things that Intel thought can be restored atomically with the VM exit.

Paolo



* Re: [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-05-04 21:12               ` Paolo Bonzini
@ 2021-05-04 21:21                 ` Sean Christopherson
  2021-05-04 21:23                   ` Andy Lutomirski
  0 siblings, 1 reply; 38+ messages in thread
From: Sean Christopherson @ 2021-05-04 21:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Maxim Levitsky, Thomas Gleixner, Lai Jiangshan, linux-kernel,
	Lai Jiangshan, Steven Rostedt, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Josh Poimboeuf, Uros Bizjak, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Peter Zijlstra, Alexandre Chartre, Juergen Gross,
	Joerg Roedel, Jian Cai

On Tue, May 04, 2021, Paolo Bonzini wrote:
> On 04/05/21 23:05, Maxim Levitsky wrote:
> > Does this mean that we still rely on hardware NMI masking to be activated?
> 
> No, the NMI code already handles reentrancy at both the assembly and C
> levels.
> 
> > Or in other words, that is we still can't have an IRET between VM exit and
> > the entry to the NMI handler?
> 
> No, because NMIs are not masked on VM exit.  This in fact makes things
> potentially messy; unlike with AMD's CLGI/STGI, only MSRs and other things
> that Intel thought of can be restored atomically with the VM exit.

FWIW, NMIs are masked if the VM-Exit was due to an NMI.


* Re: [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-05-04 21:21                 ` Sean Christopherson
@ 2021-05-04 21:23                   ` Andy Lutomirski
  2021-05-04 21:25                     ` Paolo Bonzini
  2021-05-05  1:07                     ` Lai Jiangshan
  0 siblings, 2 replies; 38+ messages in thread
From: Andy Lutomirski @ 2021-05-04 21:23 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Maxim Levitsky, Thomas Gleixner, Lai Jiangshan,
	linux-kernel, Lai Jiangshan, Steven Rostedt, Andi Kleen,
	Andy Lutomirski, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Josh Poimboeuf, Uros Bizjak, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Peter Zijlstra,
	Alexandre Chartre, Juergen Gross, Joerg Roedel, Jian Cai


> On May 4, 2021, at 2:21 PM, Sean Christopherson <seanjc@google.com> wrote:
> 
> On Tue, May 04, 2021, Paolo Bonzini wrote:
>>> On 04/05/21 23:05, Maxim Levitsky wrote:
>>> Does this mean that we still rely on hardware NMI masking to be activated?
>> 
>> No, the NMI code already handles reentrancy at both the assembly and C
>> levels.
>> 
>>> Or in other words, that is we still can't have an IRET between VM exit and
>>> the entry to the NMI handler?
>> 
>> No, because NMIs are not masked on VM exit.  This in fact makes things
>> potentially messy; unlike with AMD's CLGI/STGI, only MSRs and other things
>> that Intel thought of can be restored atomically with the VM exit.
> 
> FWIW, NMIs are masked if the VM-Exit was due to an NMI.

Then this whole change is busted, since nothing will unmask NMIs. Revert it?


* Re: [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-05-04 21:23                   ` Andy Lutomirski
@ 2021-05-04 21:25                     ` Paolo Bonzini
  2021-05-04 21:51                       ` Sean Christopherson
  2021-05-05  1:07                     ` Lai Jiangshan
  1 sibling, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2021-05-04 21:25 UTC (permalink / raw)
  To: Andy Lutomirski, Sean Christopherson
  Cc: Maxim Levitsky, Thomas Gleixner, Lai Jiangshan, linux-kernel,
	Lai Jiangshan, Steven Rostedt, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Josh Poimboeuf, Uros Bizjak, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Peter Zijlstra, Alexandre Chartre, Juergen Gross,
	Joerg Roedel, Jian Cai

On 04/05/21 23:23, Andy Lutomirski wrote:
>> On May 4, 2021, at 2:21 PM, Sean Christopherson <seanjc@google.com> wrote:
>> FWIW, NMIs are masked if the VM-Exit was due to an NMI.

Huh, indeed:  "An NMI causes subsequent NMIs to be blocked, but only 
after the VM exit completes".

> Then this whole change is busted, since nothing will unmask NMIs. Revert it?

Looks like the easiest way out indeed.

Paolo



* Re: [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-05-04 21:25                     ` Paolo Bonzini
@ 2021-05-04 21:51                       ` Sean Christopherson
  2021-05-04 21:56                         ` Paolo Bonzini
  0 siblings, 1 reply; 38+ messages in thread
From: Sean Christopherson @ 2021-05-04 21:51 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Andy Lutomirski, Maxim Levitsky, Thomas Gleixner, Lai Jiangshan,
	linux-kernel, Lai Jiangshan, Steven Rostedt, Andi Kleen,
	Andy Lutomirski, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Josh Poimboeuf, Uros Bizjak, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Peter Zijlstra,
	Alexandre Chartre, Juergen Gross, Joerg Roedel, Jian Cai

On Tue, May 04, 2021, Paolo Bonzini wrote:
> On 04/05/21 23:23, Andy Lutomirski wrote:
> > > On May 4, 2021, at 2:21 PM, Sean Christopherson <seanjc@google.com> wrote:
> > > FWIW, NMIs are masked if the VM-Exit was due to an NMI.
> 
> Huh, indeed:  "An NMI causes subsequent NMIs to be blocked, but only after
> the VM exit completes".
> 
> > Then this whole change is busted, since nothing will unmask NMIs. Revert it?

SMI?  #MC?  :-)

> Looks like the easiest way out indeed.

I've no objection to reverting to intn, but what does reverting versus handling
NMI on the kernel stack have to do with NMIs being blocked on VM-Exit due to NMI?
I'm struggling mightily to connect the dots.


* Re: [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-05-04 21:51                       ` Sean Christopherson
@ 2021-05-04 21:56                         ` Paolo Bonzini
  2021-05-05  0:00                           ` Thomas Gleixner
  0 siblings, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2021-05-04 21:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Andy Lutomirski, Maxim Levitsky, Thomas Gleixner, Lai Jiangshan,
	linux-kernel, Lai Jiangshan, Steven Rostedt, Andi Kleen,
	Andy Lutomirski, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Josh Poimboeuf, Uros Bizjak, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Peter Zijlstra,
	Alexandre Chartre, Juergen Gross, Joerg Roedel, Jian Cai

On 04/05/21 23:51, Sean Christopherson wrote:
> On Tue, May 04, 2021, Paolo Bonzini wrote:
>> On 04/05/21 23:23, Andy Lutomirski wrote:
>>>> On May 4, 2021, at 2:21 PM, Sean Christopherson <seanjc@google.com> wrote:
>>>> FWIW, NMIs are masked if the VM-Exit was due to an NMI.
>>
>> Huh, indeed:  "An NMI causes subsequent NMIs to be blocked, but only after
>> the VM exit completes".
>>
>>> Then this whole change is busted, since nothing will unmask NMIs. Revert it?
>> Looks like the easiest way out indeed.
> 
> I've no objection to reverting to intn, but what does reverting versus handling
> NMI on the kernel stack have to do with NMIs being blocked on VM-Exit due to NMI?
> I'm struggling mightily to connect the dots.

Nah, you're right: vmx_do_interrupt_nmi_irqoff will not call the handler 
directly, rather it calls the IDT entrypoint which *will* do an IRET and 
unmask NMIs.  I trusted Andy too much on this one. :)

Thomas's posted patch ("[PATCH] KVM/VMX: Invoke NMI non-IST entry 
instead of IST entry") looks good.

Paolo



* Re: [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-05-04 21:56                         ` Paolo Bonzini
@ 2021-05-05  0:00                           ` Thomas Gleixner
  2021-05-05 15:44                             ` Lai Jiangshan
  0 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2021-05-05  0:00 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: Andy Lutomirski, Maxim Levitsky, Lai Jiangshan, linux-kernel,
	Lai Jiangshan, Steven Rostedt, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Josh Poimboeuf, Uros Bizjak, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Peter Zijlstra, Alexandre Chartre, Juergen Gross,
	Joerg Roedel, Jian Cai

On Tue, May 04 2021 at 23:56, Paolo Bonzini wrote:
> On 04/05/21 23:51, Sean Christopherson wrote:
>> On Tue, May 04, 2021, Paolo Bonzini wrote:
>>> On 04/05/21 23:23, Andy Lutomirski wrote:
>>>>> On May 4, 2021, at 2:21 PM, Sean Christopherson <seanjc@google.com> wrote:
>>>>> FWIW, NMIs are masked if the VM-Exit was due to an NMI.
>>>
>>> Huh, indeed:  "An NMI causes subsequent NMIs to be blocked, but only after
>>> the VM exit completes".
>>>
>>>> Then this whole change is busted, since nothing will unmask NMIs. Revert it?
>>> Looks like the easiest way out indeed.
>> 
>> I've no objection to reverting to intn, but what does reverting versus handling
>> NMI on the kernel stack have to do with NMIs being blocked on VM-Exit due to NMI?
>> I'm struggling mightily to connect the dots.
>
> Nah, you're right: vmx_do_interrupt_nmi_irqoff will not call the handler 
> directly, rather it calls the IDT entrypoint which *will* do an IRET and 
> unmask NMIs.  I trusted Andy too much on this one. :)
>
> Thomas's posted patch ("[PATCH] KVM/VMX: Invoke NMI non-IST entry 
> instead of IST entry") looks good.

Well, looks good is one thing.

It would be more helpful if someone would actually review and/or test it.

Thanks,

        tglx


* Re: [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-05-04 21:23                   ` Andy Lutomirski
  2021-05-04 21:25                     ` Paolo Bonzini
@ 2021-05-05  1:07                     ` Lai Jiangshan
  2021-05-05  1:11                       ` Andy Lutomirski
  1 sibling, 1 reply; 38+ messages in thread
From: Lai Jiangshan @ 2021-05-05  1:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Sean Christopherson, Paolo Bonzini, Maxim Levitsky,
	Thomas Gleixner, LKML, Lai Jiangshan, Steven Rostedt, Andi Kleen,
	Andy Lutomirski, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Josh Poimboeuf, Uros Bizjak, Ingo Molnar,
	Borislav Petkov, X86 ML, H. Peter Anvin, Peter Zijlstra,
	Alexandre Chartre, Juergen Gross, Joerg Roedel, Jian Cai

On Wed, May 5, 2021 at 5:23 AM Andy Lutomirski <luto@amacapital.net> wrote:
>
>
> > On May 4, 2021, at 2:21 PM, Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Tue, May 04, 2021, Paolo Bonzini wrote:
> >>> On 04/05/21 23:05, Maxim Levitsky wrote:
> >>> Does this mean that we still rely on hardware NMI masking to be activated?
> >>
> >> No, the NMI code already handles reentrancy at both the assembly and C
> >> levels.
> >>
> >>> Or in other words, that is we still can't have an IRET between VM exit and
> >>> the entry to the NMI handler?
> >>
> >> No, because NMIs are not masked on VM exit.  This in fact makes things
> >> potentially messy; unlike with AMD's CLGI/STGI, only MSRs and other things
> >> that Intel thought of can be restored atomically with the VM exit.
> >
> > FWIW, NMIs are masked if the VM-Exit was due to an NMI.
>
> Then this whole change is busted, since nothing will unmask NMIs. Revert it?

There is code between VMEXIT and handle_exception_nmi_irqoff() in
which exceptions can be taken.

A possible #DB or #BP can happen in this gap, and the IRET of the
#DB/#BP handler will unmask NMIs.

Another way to fix this is to change the VMX code to call the NMI
handler immediately after VMEXIT, before leaving the "nostr" section.

Reverting it can't fix the problem.


* Re: [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-05-05  1:07                     ` Lai Jiangshan
@ 2021-05-05  1:11                       ` Andy Lutomirski
  0 siblings, 0 replies; 38+ messages in thread
From: Andy Lutomirski @ 2021-05-05  1:11 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Sean Christopherson, Paolo Bonzini, Maxim Levitsky,
	Thomas Gleixner, LKML, Lai Jiangshan, Steven Rostedt, Andi Kleen,
	Andy Lutomirski, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Josh Poimboeuf, Uros Bizjak, Ingo Molnar,
	Borislav Petkov, X86 ML, H. Peter Anvin, Peter Zijlstra,
	Alexandre Chartre, Juergen Gross, Joerg Roedel, Jian Cai



> On May 4, 2021, at 6:08 PM, Lai Jiangshan <jiangshanlai@gmail.com> wrote:
> 
> On Wed, May 5, 2021 at 5:23 AM Andy Lutomirski <luto@amacapital.net> wrote:
>> 
>> 
>>>> On May 4, 2021, at 2:21 PM, Sean Christopherson <seanjc@google.com> wrote:
>>> 
>>> On Tue, May 04, 2021, Paolo Bonzini wrote:
>>>>> On 04/05/21 23:05, Maxim Levitsky wrote:
>>>>> Does this mean that we still rely on hardware NMI masking to be activated?
>>>> 
>>>> No, the NMI code already handles reentrancy at both the assembly and C
>>>> levels.
>>>> 
>>>>> Or in other words, that is we still can't have an IRET between VM exit and
>>>>> the entry to the NMI handler?
>>>> 
>>>> No, because NMIs are not masked on VM exit.  This in fact makes things
>>>> potentially messy; unlike with AMD's CLGI/STGI, only MSRs and other things
>>>> that Intel thought of can be restored atomically with the VM exit.
>>> 
>>> FWIW, NMIs are masked if the VM-Exit was due to an NMI.
>> 
>> Then this whole change is busted, since nothing will unmask NMIs. Revert it?
> 
> There is code between VMEXIT and handle_exception_nmi_irqoff() in
> which exceptions can be taken.
> 
> A possible #DB or #BP can happen in this gap, and the IRET of the
> #DB/#BP handler will unmask NMIs.
> 
> Another way to fix this is to change the VMX code to call the NMI
> handler immediately after VMEXIT, before leaving the "nostr" section.
> 
> Reverting it can't fix the problem.

I was indeed wrong, and the helper properly unmasks NMIs. So all should be well.

I will contemplate how this all interacts with FRED.


* Re: [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-05-05  0:00                           ` Thomas Gleixner
@ 2021-05-05 15:44                             ` Lai Jiangshan
  0 siblings, 0 replies; 38+ messages in thread
From: Lai Jiangshan @ 2021-05-05 15:44 UTC (permalink / raw)
  To: Thomas Gleixner, Paolo Bonzini, Sean Christopherson
  Cc: Andy Lutomirski, Maxim Levitsky, Lai Jiangshan, linux-kernel,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Josh Poimboeuf,
	Uros Bizjak, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
	Peter Zijlstra, Alexandre Chartre, Juergen Gross, Joerg Roedel,
	Jian Cai



On 2021/5/5 08:00, Thomas Gleixner wrote:
> On Tue, May 04 2021 at 23:56, Paolo Bonzini wrote:
>> On 04/05/21 23:51, Sean Christopherson wrote:
>>> On Tue, May 04, 2021, Paolo Bonzini wrote:
>>>> On 04/05/21 23:23, Andy Lutomirski wrote:
>>>>>> On May 4, 2021, at 2:21 PM, Sean Christopherson <seanjc@google.com> wrote:
>>>>>> FWIW, NMIs are masked if the VM-Exit was due to an NMI.
>>>>
>>>> Huh, indeed:  "An NMI causes subsequent NMIs to be blocked, but only after
>>>> the VM exit completes".
>>>>
>>>>> Then this whole change is busted, since nothing will unmask NMIs. Revert it?
>>>> Looks like the easiest way out indeed.
>>>
>>> I've no objection to reverting to intn, but what does reverting versus handling
>>> NMI on the kernel stack have to do with NMIs being blocked on VM-Exit due to NMI?
>>> I'm struggling mightily to connect the dots.
>>
>> Nah, you're right: vmx_do_interrupt_nmi_irqoff will not call the handler
>> directly, rather it calls the IDT entrypoint which *will* do an IRET and
>> unmask NMIs.  I trusted Andy too much on this one. :)
>>
>> Thomas's posted patch ("[PATCH] KVM/VMX: Invoke NMI non-IST entry
>> instead of IST entry") looks good.
> 
> Well, looks good is one thing.
> 
> It would be more helpful if someone would actually review and/or test it.
> 
> Thanks,
> 
>          tglx
> 

I tested it with the following testing-patch applied; it shows that the
problem is fixed.

The single line added to vmenter.S by the testing-patch emulates the
situation where "uninitialized" garbage on the kernel stack happens to
be 1 and happens to be at the same location as the RSP-located
"NMI executing" variable.


First round:
# apply the testing-patch
# perf record events of a vm which does kbuild inside
# dmesg shows that there are the same number of "kvm nmi" and "kvm nmi miss"
It shows that the problem exists with regard to the invocation of the NMI
handler.

Second round:
# apply the fix from tglx
# apply the testing-patch
# perf record events of a vm which does kbuild inside
# dmesg shows that there are some "kvm nmi" but no "kvm nmi miss".
It shows that the problem is fixed.


diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 3a6461694fc2..32096049c2a2 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -316,6 +316,7 @@ SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
  #endif
  	pushf
  	push $__KERNEL_CS
+	movq $1, -24(%rsp) // "NMI executing": 1 = nested, non-1 = not-nested
  	CALL_NOSPEC _ASM_ARG1

  	/*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8586eca349a9..eefd22d22fce 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6439,8 +6439,17 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)

  	if (vmx->exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT)
  		handle_external_interrupt_irqoff(vcpu);
-	else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI)
+	else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI) {
+		unsigned long count = this_cpu_read(irq_stat.__nmi_count);
+
  		handle_exception_nmi_irqoff(vmx);
+
+		if (is_nmi(vmx_get_intr_info(&vmx->vcpu))) {
+			pr_info("kvm nmi\n");
+			if (count == this_cpu_read(irq_stat.__nmi_count))
+				pr_info("kvm nmi miss\n");
+		}
+	}
  }

  /*



* [tip: x86/urgent] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-05-04 19:50           ` [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry Thomas Gleixner
  2021-05-04 21:05             ` Maxim Levitsky
@ 2021-05-06 12:14             ` tip-bot2 for Lai Jiangshan
  1 sibling, 0 replies; 38+ messages in thread
From: tip-bot2 for Lai Jiangshan @ 2021-05-06 12:14 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Lai Jiangshan, Thomas Gleixner, stable, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     a217a6593cec8b315d4c2f344bae33660b39b703
Gitweb:        https://git.kernel.org/tip/a217a6593cec8b315d4c2f344bae33660b39b703
Author:        Lai Jiangshan <laijs@linux.alibaba.com>
AuthorDate:    Tue, 04 May 2021 21:50:14 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 05 May 2021 22:54:10 +02:00

KVM/VMX: Invoke NMI non-IST entry instead of IST entry

In VMX, the host NMI handler needs to be invoked after NMI VM-Exit.
Before commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect
call instead of INTn"), this was done by INTn ("int $2"). But INTn
microcode is relatively expensive, so the commit reworked NMI VM-Exit
handling to invoke the kernel handler by function call.

But this missed a detail. The NMI entry point for direct invocation is
fetched from the IDT table and called on the kernel stack.  But on 64-bit
the NMI entry installed in the IDT expects to be invoked on the IST stack.
It relies on the "NMI executing" variable on the IST stack to work
correctly, which is at a fixed position in the IST stack.  When the entry
point is unexpectedly called on the kernel stack, the RSP-addressed "NMI
executing" variable is obviously also on the kernel stack and is
"uninitialized" and can cause the NMI entry code to run in the wrong way.

Provide a non-ist entry point for VMX which shares the C-function with
the regular NMI entry and invoke the new asm entry point instead.

On 32-bit this just maps to the regular NMI entry point as 32-bit has no
ISTs and is not affected.

[ tglx: Made it independent for backporting, massaged changelog ]

Fixes: 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Lai Jiangshan <laijs@linux.alibaba.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87r1imi8i1.ffs@nanos.tec.linutronix.de

---
 arch/x86/include/asm/idtentry.h | 15 +++++++++++++++
 arch/x86/kernel/nmi.c           | 10 ++++++++++
 arch/x86/kvm/vmx/vmx.c          | 16 +++++++++-------
 3 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index e35e342..73d45b0 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -588,6 +588,21 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC,	xenpv_exc_machine_check);
 #endif
 
 /* NMI */
+
+#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
+/*
+ * Special NOIST entry point for VMX which invokes this on the kernel
+ * stack. asm_exc_nmi() requires an IST to work correctly vs. the NMI
+ * 'executing' marker.
+ *
+ * On 32bit this just uses the regular NMI entry point because 32-bit does
+ * not have ISTs.
+ */
+DECLARE_IDTENTRY(X86_TRAP_NMI,		exc_nmi_noist);
+#else
+#define asm_exc_nmi_noist		asm_exc_nmi
+#endif
+
 DECLARE_IDTENTRY_NMI(X86_TRAP_NMI,	exc_nmi);
 #ifdef CONFIG_XEN_PV
 DECLARE_IDTENTRY_RAW(X86_TRAP_NMI,	xenpv_exc_nmi);
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index bf250a3..2ef961c 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -524,6 +524,16 @@ nmi_restart:
 		mds_user_clear_cpu_buffers();
 }
 
+#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
+DEFINE_IDTENTRY_RAW(exc_nmi_noist)
+{
+	exc_nmi(regs);
+}
+#endif
+#if IS_MODULE(CONFIG_KVM_INTEL)
+EXPORT_SYMBOL_GPL(asm_exc_nmi_noist);
+#endif
+
 void stop_nmi(void)
 {
 	ignore_nmis++;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cbe0cda..b21d751 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -36,6 +36,7 @@
 #include <asm/debugreg.h>
 #include <asm/desc.h>
 #include <asm/fpu/internal.h>
+#include <asm/idtentry.h>
 #include <asm/io.h>
 #include <asm/irq_remapping.h>
 #include <asm/kexec.h>
@@ -6415,18 +6416,17 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 
 void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
 
-static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
+static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
+					unsigned long entry)
 {
-	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
-	gate_desc *desc = (gate_desc *)host_idt_base + vector;
-
 	kvm_before_interrupt(vcpu);
-	vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
+	vmx_do_interrupt_nmi_irqoff(entry);
 	kvm_after_interrupt(vcpu);
 }
 
 static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 {
+	const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
 	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
 
 	/* if exit due to PF check for async PF */
@@ -6437,18 +6437,20 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 		kvm_machine_check();
 	/* We need to handle NMIs before interrupts are enabled */
 	else if (is_nmi(intr_info))
-		handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
+		handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry);
 }
 
 static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
 {
 	u32 intr_info = vmx_get_intr_info(vcpu);
+	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
+	gate_desc *desc = (gate_desc *)host_idt_base + vector;
 
 	if (WARN_ONCE(!is_external_intr(intr_info),
 	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
 		return;
 
-	handle_interrupt_nmi_irqoff(vcpu, intr_info);
+	handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
 }
 
 static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)

* Re: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
  2021-04-26 23:09 ` [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi Lai Jiangshan
  2021-04-28 21:27   ` Steven Rostedt
  2021-05-03 19:05   ` Thomas Gleixner
@ 2021-05-10  7:59   ` Juergen Gross
  2 siblings, 0 replies; 38+ messages in thread
From: Juergen Gross @ 2021-05-10  7:59 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Thomas Gleixner, Paolo Bonzini,
	Sean Christopherson, Steven Rostedt, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	Josh Poimboeuf, Uros Bizjak, Maxim Levitsky, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Boris Ostrovsky,
	Stefano Stabellini, Peter Zijlstra, Alexandre Chartre,
	Joerg Roedel, Jian Cai, xen-devel

On 27.04.21 01:09, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> No functional change intended.  Just rename it and move it to
> arch/x86/kernel/nmi.c so that it can be reused later in the next
> patch for early NMI handling and KVM.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: kvm@vger.kernel.org
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Uros Bizjak <ubizjak@gmail.com>
> Cc: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>

Acked-by: Juergen Gross <jgross@suse.com>


Juergen

Thread overview: 38+ messages
2021-04-26 23:09 [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Lai Jiangshan
2021-04-26 23:09 ` [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi Lai Jiangshan
2021-04-28 21:27   ` Steven Rostedt
2021-04-30  7:15     ` Paolo Bonzini
2021-04-30 12:05       ` Steven Rostedt
2021-05-03 19:05   ` Thomas Gleixner
2021-05-03 19:41     ` Thomas Gleixner
2021-05-10  7:59   ` Juergen Gross
2021-04-26 23:09 ` [PATCH 2/4] x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage Lai Jiangshan
2021-04-28 21:30   ` Steven Rostedt
2021-05-03 20:13   ` Thomas Gleixner
2021-05-03 20:24     ` Thomas Gleixner
2021-05-03 21:45       ` Thomas Gleixner
2021-05-04 12:43         ` Thomas Gleixner
2021-05-04 19:50           ` [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry Thomas Gleixner
2021-05-04 21:05             ` Maxim Levitsky
2021-05-04 21:12               ` Paolo Bonzini
2021-05-04 21:21                 ` Sean Christopherson
2021-05-04 21:23                   ` Andy Lutomirski
2021-05-04 21:25                     ` Paolo Bonzini
2021-05-04 21:51                       ` Sean Christopherson
2021-05-04 21:56                         ` Paolo Bonzini
2021-05-05  0:00                           ` Thomas Gleixner
2021-05-05 15:44                             ` Lai Jiangshan
2021-05-05  1:07                     ` Lai Jiangshan
2021-05-05  1:11                       ` Andy Lutomirski
2021-05-06 12:14             ` [tip: x86/urgent] " tip-bot2 for Lai Jiangshan
2021-04-26 23:09 ` [PATCH 3/4] " Lai Jiangshan
2021-04-30  2:46   ` Lai Jiangshan
2021-05-03 19:37   ` Thomas Gleixner
2021-05-03 20:02   ` Thomas Gleixner
2021-05-04  8:10     ` Paolo Bonzini
2021-04-26 23:09 ` [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller Lai Jiangshan
2021-04-30  9:03   ` Thomas Gleixner
2021-04-30  9:06     ` Paolo Bonzini
2021-04-30 23:28       ` Thomas Gleixner
2021-04-30  7:14 ` [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Paolo Bonzini
2021-05-03 14:36   ` Thomas Gleixner
