kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack
@ 2021-04-26 23:09 Lai Jiangshan
  2021-04-26 23:09 ` [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi Lai Jiangshan
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Lai Jiangshan @ 2021-04-26 23:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Thomas Gleixner, Paolo Bonzini,
	Sean Christopherson, Steven Rostedt, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	Josh Poimboeuf, Uros Bizjak, Maxim Levitsky

From: Lai Jiangshan <laijs@linux.alibaba.com>

In VMX, the NMI handler needs to be invoked after NMI VM-Exit.

Before the commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via
indirect call instead of INTn"), the work is done by INTn ("int $2").

But INTn microcode is relatively expensive, so the commit reworked
NMI VM-Exit handling to invoke the kernel handler by function call.
And INTn doesn't set the NMI blocked flag required by the linux kernel
NMI entry.  So moving away from INTn are very reasonable.

Yet some details were missed.  After the said commit applied, the NMI
entry pointer is fetched from the IDT table and called from the kernel
stack.  But the NMI entry pointer installed on the IDT table is
asm_exc_nmi() which expects to be invoked on the IST stack by the ISA.
And it relies on the "NMI executing" variable on the IST stack to work
correctly.  When it is unexpectedly called from the kernel stack, the
RSP-located "NMI executing" variable is also on the kernel stack and
is "uninitialized" and can cause the NMI entry to run in the wrong way.

During fixing the problem for KVM, I found that there might be the same
problem for early booting stage where the IST is not set up. asm_exc_nmi()
is not allowed to be used in this stage for the same reason about
the RSP-located "NMI executing" variable.

For both cases, we should use asm_noist_exc_nmi() which is introduced
in the patch 1 via renaming from an existing asm_xenpv_exc_nmi() and
which is safe on the kernel stack.

https://lore.kernel.org/lkml/20200915191505.10355-3-sean.j.christopherson@intel.com/

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: kvm@vger.kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>

Lai Jiangshan (4):
  x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
  x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage
  KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  KVM/VMX: fold handle_interrupt_nmi_irqoff() into its solo caller

 arch/x86/include/asm/idtentry.h |  4 +---
 arch/x86/kernel/idt.c           |  8 +++++++-
 arch/x86/kernel/nmi.c           | 12 ++++++++++++
 arch/x86/kvm/vmx/vmx.c          | 27 ++++++++++++++-------------
 arch/x86/xen/enlighten_pv.c     |  9 +++------
 arch/x86/xen/xen-asm.S          |  2 +-
 6 files changed, 38 insertions(+), 24 deletions(-)

-- 
2.19.1.6.gb485710b


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
  2021-04-26 23:09 [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Lai Jiangshan
@ 2021-04-26 23:09 ` Lai Jiangshan
  2021-04-28 21:27   ` Steven Rostedt
                     ` (2 more replies)
  2021-04-26 23:09 ` [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry Lai Jiangshan
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 19+ messages in thread
From: Lai Jiangshan @ 2021-04-26 23:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Thomas Gleixner, Paolo Bonzini,
	Sean Christopherson, Steven Rostedt, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	Josh Poimboeuf, Uros Bizjak, Maxim Levitsky, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Boris Ostrovsky,
	Juergen Gross, Stefano Stabellini, Peter Zijlstra,
	Alexandre Chartre, Joerg Roedel, Jian Cai, xen-devel

From: Lai Jiangshan <laijs@linux.alibaba.com>

There is no any functionality change intended.  Just rename it and
move it to arch/x86/kernel/nmi.c so that we can resue it later in
next patch for early NMI and kvm.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: kvm@vger.kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/include/asm/idtentry.h | 2 +-
 arch/x86/kernel/nmi.c           | 8 ++++++++
 arch/x86/xen/enlighten_pv.c     | 9 +++------
 arch/x86/xen/xen-asm.S          | 2 +-
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index e35e342673c7..5b11d2ddbb5c 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -590,7 +590,7 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC,	xenpv_exc_machine_check);
 /* NMI */
 DECLARE_IDTENTRY_NMI(X86_TRAP_NMI,	exc_nmi);
 #ifdef CONFIG_XEN_PV
-DECLARE_IDTENTRY_RAW(X86_TRAP_NMI,	xenpv_exc_nmi);
+DECLARE_IDTENTRY_RAW(X86_TRAP_NMI,	noist_exc_nmi);
 #endif
 
 /* #DB */
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index bf250a339655..2b907a76d0a1 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -524,6 +524,14 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
 		mds_user_clear_cpu_buffers();
 }
 
+#ifdef CONFIG_XEN_PV
+DEFINE_IDTENTRY_RAW(noist_exc_nmi)
+{
+	/* On Xen PV, NMI doesn't use IST.  The C part is the same as native. */
+	exc_nmi(regs);
+}
+#endif
+
 void stop_nmi(void)
 {
 	ignore_nmis++;
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 4f18cd9eacd8..5efbdb0905b7 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -565,12 +565,6 @@ static void xen_write_ldt_entry(struct desc_struct *dt, int entrynum,
 
 void noist_exc_debug(struct pt_regs *regs);
 
-DEFINE_IDTENTRY_RAW(xenpv_exc_nmi)
-{
-	/* On Xen PV, NMI doesn't use IST.  The C part is the same as native. */
-	exc_nmi(regs);
-}
-
 DEFINE_IDTENTRY_RAW_ERRORCODE(xenpv_exc_double_fault)
 {
 	/* On Xen PV, DF doesn't use IST.  The C part is the same as native. */
@@ -626,6 +620,9 @@ struct trap_array_entry {
 	.xen		= xen_asm_xenpv_##func,		\
 	.ist_okay	= ist_ok }
 
+/* Alias to make TRAP_ENTRY_REDIR() happy for nmi */
+#define xen_asm_xenpv_exc_nmi	xen_asm_noist_exc_nmi
+
 static struct trap_array_entry trap_array[] = {
 	TRAP_ENTRY_REDIR(exc_debug,			true  ),
 	TRAP_ENTRY_REDIR(exc_double_fault,		true  ),
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 1e626444712b..12e7cbbb2a8d 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -130,7 +130,7 @@ _ASM_NOKPROBE(xen_\name)
 xen_pv_trap asm_exc_divide_error
 xen_pv_trap asm_xenpv_exc_debug
 xen_pv_trap asm_exc_int3
-xen_pv_trap asm_xenpv_exc_nmi
+xen_pv_trap asm_noist_exc_nmi
 xen_pv_trap asm_exc_overflow
 xen_pv_trap asm_exc_bounds
 xen_pv_trap asm_exc_invalid_op
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-04-26 23:09 [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Lai Jiangshan
  2021-04-26 23:09 ` [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi Lai Jiangshan
@ 2021-04-26 23:09 ` Lai Jiangshan
  2021-04-30  2:46   ` Lai Jiangshan
                     ` (2 more replies)
  2021-04-26 23:09 ` [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller Lai Jiangshan
  2021-04-30  7:14 ` [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Paolo Bonzini
  3 siblings, 3 replies; 19+ messages in thread
From: Lai Jiangshan @ 2021-04-26 23:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Thomas Gleixner, Paolo Bonzini,
	Sean Christopherson, Steven Rostedt, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	Josh Poimboeuf, Uros Bizjak, Maxim Levitsky, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Peter Zijlstra

From: Lai Jiangshan <laijs@linux.alibaba.com>

In VMX, the NMI handler needs to be invoked after NMI VM-Exit.

Before the commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via
indirect call instead of INTn"), the work is done by INTn ("int $2").

But INTn microcode is relatively expensive, so the commit reworked
NMI VM-Exit handling to invoke the kernel handler by function call.
And INTn doesn't set the NMI blocked flag required by the linux kernel
NMI entry.  So moving away from INTn are very reasonable.

Yet some details were missed.  After the said commit applied, the NMI
entry pointer is fetched from the IDT table and called from the kernel
stack.  But the NMI entry pointer installed on the IDT table is
asm_exc_nmi() which expects to be invoked on the IST stack by the ISA.
And it relies on the "NMI executing" variable on the IST stack to work
correctly.  When it is unexpectedly called from the kernel stack, the
RSP-located "NMI executing" variable is also on the kernel stack and
is "uninitialized" and can cause the NMI entry to run in the wrong way.

So we should not used the NMI entry installed on the IDT table.  Rather,
we should use the NMI entry allowed to be used on the kernel stack which
is asm_noist_exc_nmi() which is also used for XENPV and early booting.

Link: https://lore.kernel.org/lkml/20200915191505.10355-3-sean.j.christopherson@intel.com/
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: kvm@vger.kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/kernel/nmi.c  | 3 +++
 arch/x86/kvm/vmx/vmx.c | 8 ++++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 2fb1fd59d714..919f0400d931 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -528,10 +528,13 @@ DEFINE_IDTENTRY_RAW(noist_exc_nmi)
 {
 	/*
 	 * On Xen PV and early booting stage, NMI doesn't use IST.
+	 * And when it is manually called from VMX NMI VM-Exit handler,
+	 * it doesn't use IST either.
 	 * The C part is the same as native.
 	 */
 	exc_nmi(regs);
 }
+EXPORT_SYMBOL_GPL(asm_noist_exc_nmi);
 
 void stop_nmi(void)
 {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bcbf0d2139e9..96e59d912637 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -36,6 +36,7 @@
 #include <asm/debugreg.h>
 #include <asm/desc.h>
 #include <asm/fpu/internal.h>
+#include <asm/idtentry.h>
 #include <asm/io.h>
 #include <asm/irq_remapping.h>
 #include <asm/kexec.h>
@@ -6416,8 +6417,11 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 	else if (is_machine_check(intr_info))
 		kvm_machine_check();
 	/* We need to handle NMIs before interrupts are enabled */
-	else if (is_nmi(intr_info))
-		handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
+	else if (is_nmi(intr_info)) {
+		kvm_before_interrupt(&vmx->vcpu);
+		vmx_do_interrupt_nmi_irqoff((unsigned long)asm_noist_exc_nmi);
+		kvm_after_interrupt(&vmx->vcpu);
+	}
 }
 
 static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller
  2021-04-26 23:09 [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Lai Jiangshan
  2021-04-26 23:09 ` [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi Lai Jiangshan
  2021-04-26 23:09 ` [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry Lai Jiangshan
@ 2021-04-26 23:09 ` Lai Jiangshan
  2021-04-30  9:03   ` Thomas Gleixner
  2021-04-30  7:14 ` [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Paolo Bonzini
  3 siblings, 1 reply; 19+ messages in thread
From: Lai Jiangshan @ 2021-04-26 23:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Sean Christopherson, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, kvm

From: Lai Jiangshan <laijs@linux.alibaba.com>

The function handle_interrupt_nmi_irqoff() is called only once and
it doesn't handle for NMI, so its name is outdated.

Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/kvm/vmx/vmx.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 96e59d912637..92c22211203e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6396,16 +6396,6 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 
 void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
 
-static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
-{
-	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
-	gate_desc *desc = (gate_desc *)host_idt_base + vector;
-
-	kvm_before_interrupt(vcpu);
-	vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
-	kvm_after_interrupt(vcpu);
-}
-
 static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 {
 	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
@@ -6427,12 +6417,19 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
 {
 	u32 intr_info = vmx_get_intr_info(vcpu);
+	unsigned int vector;
+	gate_desc *desc;
 
 	if (WARN_ONCE(!is_external_intr(intr_info),
 	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
 		return;
 
-	handle_interrupt_nmi_irqoff(vcpu, intr_info);
+	vector = intr_info & INTR_INFO_VECTOR_MASK;
+	desc = (gate_desc *)host_idt_base + vector;
+
+	kvm_before_interrupt(vcpu);
+	vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
+	kvm_after_interrupt(vcpu);
 }
 
 static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
  2021-04-26 23:09 ` [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi Lai Jiangshan
@ 2021-04-28 21:27   ` Steven Rostedt
  2021-04-30  7:15     ` Paolo Bonzini
  2021-05-03 19:05   ` Thomas Gleixner
  2021-05-10  7:59   ` Juergen Gross
  2 siblings, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2021-04-28 21:27 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Lai Jiangshan, Thomas Gleixner, Paolo Bonzini,
	Sean Christopherson, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	Josh Poimboeuf, Uros Bizjak, Maxim Levitsky, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Boris Ostrovsky,
	Juergen Gross, Stefano Stabellini, Peter Zijlstra,
	Alexandre Chartre, Joerg Roedel, Jian Cai, xen-devel

On Tue, 27 Apr 2021 07:09:46 +0800
Lai Jiangshan <jiangshanlai@gmail.com> wrote:

> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> There is no any functionality change intended.  Just rename it and
> move it to arch/x86/kernel/nmi.c so that we can resue it later in
> next patch for early NMI and kvm.

Nit, but in change logs, please avoid stating "next patch" as searching git
history (via git blame or whatever) there is no such thing as "next patch".

Just state: "so that we can reuse it for early NMI and KVM."

I also just noticed the typo in "resue". Or maybe both NMI and KVM should
be sued again ;-)

-- Steve

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-04-26 23:09 ` [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry Lai Jiangshan
@ 2021-04-30  2:46   ` Lai Jiangshan
  2021-05-03 19:37   ` Thomas Gleixner
  2021-05-03 20:02   ` Thomas Gleixner
  2 siblings, 0 replies; 19+ messages in thread
From: Lai Jiangshan @ 2021-04-30  2:46 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Thomas Gleixner, Paolo Bonzini, Sean Christopherson,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, Josh Poimboeuf,
	Uros Bizjak, Maxim Levitsky, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Peter Zijlstra



On 2021/4/27 07:09, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> In VMX, the NMI handler needs to be invoked after NMI VM-Exit.
> 
> Before the commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via
> indirect call instead of INTn"), the work is done by INTn ("int $2").
> 
> But INTn microcode is relatively expensive, so the commit reworked
> NMI VM-Exit handling to invoke the kernel handler by function call.
> And INTn doesn't set the NMI blocked flag required by the linux kernel
> NMI entry.  So moving away from INTn are very reasonable.
> 
> Yet some details were missed.  After the said commit applied, the NMI
> entry pointer is fetched from the IDT table and called from the kernel
> stack.  But the NMI entry pointer installed on the IDT table is
> asm_exc_nmi() which expects to be invoked on the IST stack by the ISA.
> And it relies on the "NMI executing" variable on the IST stack to work
> correctly.  When it is unexpectedly called from the kernel stack, the
> RSP-located "NMI executing" variable is also on the kernel stack and
> is "uninitialized" and can cause the NMI entry to run in the wrong way.
> 
> So we should not used the NMI entry installed on the IDT table.  Rather,
> we should use the NMI entry allowed to be used on the kernel stack which
> is asm_noist_exc_nmi() which is also used for XENPV and early booting.
> 

The problem can be tested by the following testing-patch.

1) the testing-patch can be applied without conflict before this patch 3.
    And it shows the problem that the NMI is missed in the case.

2) you need to manually copy the same logic of this testing-patch to verify
    this patch 3. And it shows that the problem is fixed.

3) the only one line of code in vmenter.S just emulates the situation that
    a "uninitialized" garbage in the kernel stack happens to be 1 and it happens
    to be at the same location of the RSP-located "NMI executing" variable.


diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 3a6461694fc2..32096049c2a2 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -316,6 +316,7 @@ SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
  #endif
  	pushf
  	push $__KERNEL_CS
+	movq $1, -24(%rsp) // "NMI executing": 1 = nested, non-1 = not-nested
  	CALL_NOSPEC _ASM_ARG1

  	/*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bcbf0d2139e9..9509d2edd50d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6416,8 +6416,12 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
  	else if (is_machine_check(intr_info))
  		kvm_machine_check();
  	/* We need to handle NMIs before interrupts are enabled */
-	else if (is_nmi(intr_info))
+	else if (is_nmi(intr_info)) {
+		unsigned long count = this_cpu_read(irq_stat.__nmi_count);
  		handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
+		if (count == this_cpu_read(irq_stat.__nmi_count))
+			pr_info("kvm nmi miss\n");
+	}
  }

  static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack
  2021-04-26 23:09 [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Lai Jiangshan
                   ` (2 preceding siblings ...)
  2021-04-26 23:09 ` [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller Lai Jiangshan
@ 2021-04-30  7:14 ` Paolo Bonzini
  2021-05-03 14:36   ` Thomas Gleixner
  3 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2021-04-30  7:14 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Thomas Gleixner, Sean Christopherson,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, Josh Poimboeuf,
	Uros Bizjak, Maxim Levitsky

On 27/04/21 01:09, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> In VMX, the NMI handler needs to be invoked after NMI VM-Exit.
> 
> Before the commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via
> indirect call instead of INTn"), the work is done by INTn ("int $2").
> 
> But INTn microcode is relatively expensive, so the commit reworked
> NMI VM-Exit handling to invoke the kernel handler by function call.
> And INTn doesn't set the NMI blocked flag required by the linux kernel
> NMI entry.  So moving away from INTn are very reasonable.
> 
> Yet some details were missed.  After the said commit applied, the NMI
> entry pointer is fetched from the IDT table and called from the kernel
> stack.  But the NMI entry pointer installed on the IDT table is
> asm_exc_nmi() which expects to be invoked on the IST stack by the ISA.
> And it relies on the "NMI executing" variable on the IST stack to work
> correctly.  When it is unexpectedly called from the kernel stack, the
> RSP-located "NMI executing" variable is also on the kernel stack and
> is "uninitialized" and can cause the NMI entry to run in the wrong way.
> 
> During fixing the problem for KVM, I found that there might be the same
> problem for early booting stage where the IST is not set up. asm_exc_nmi()
> is not allowed to be used in this stage for the same reason about
> the RSP-located "NMI executing" variable.
> 
> For both cases, we should use asm_noist_exc_nmi() which is introduced
> in the patch 1 via renaming from an existing asm_xenpv_exc_nmi() and
> which is safe on the kernel stack.
> 
> https://lore.kernel.org/lkml/20200915191505.10355-3-sean.j.christopherson@intel.com/

For the KVM part,

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Thanks,

Paolo

> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: kvm@vger.kernel.org
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Uros Bizjak <ubizjak@gmail.com>
> Cc: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> Lai Jiangshan (4):
>    x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
>    x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage
>    KVM/VMX: Invoke NMI non-IST entry instead of IST entry
>    KVM/VMX: fold handle_interrupt_nmi_irqoff() into its solo caller
> 
>   arch/x86/include/asm/idtentry.h |  4 +---
>   arch/x86/kernel/idt.c           |  8 +++++++-
>   arch/x86/kernel/nmi.c           | 12 ++++++++++++
>   arch/x86/kvm/vmx/vmx.c          | 27 ++++++++++++++-------------
>   arch/x86/xen/enlighten_pv.c     |  9 +++------
>   arch/x86/xen/xen-asm.S          |  2 +-
>   6 files changed, 38 insertions(+), 24 deletions(-)
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
  2021-04-28 21:27   ` Steven Rostedt
@ 2021-04-30  7:15     ` Paolo Bonzini
  2021-04-30 12:05       ` Steven Rostedt
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2021-04-30  7:15 UTC (permalink / raw)
  To: Steven Rostedt, Lai Jiangshan
  Cc: linux-kernel, Lai Jiangshan, Thomas Gleixner,
	Sean Christopherson, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	Josh Poimboeuf, Uros Bizjak, Maxim Levitsky, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Boris Ostrovsky,
	Juergen Gross, Stefano Stabellini, Peter Zijlstra,
	Alexandre Chartre, Joerg Roedel, Jian Cai, xen-devel

On 28/04/21 23:27, Steven Rostedt wrote:
> On Tue, 27 Apr 2021 07:09:46 +0800
> Lai Jiangshan <jiangshanlai@gmail.com> wrote:
> 
>> From: Lai Jiangshan <laijs@linux.alibaba.com>
>>
>> There is no any functionality change intended.  Just rename it and
>> move it to arch/x86/kernel/nmi.c so that we can resue it later in
>> next patch for early NMI and kvm.
> 
> Nit, but in change logs, please avoid stating "next patch" as searching git
> history (via git blame or whatever) there is no such thing as "next patch".

Interesting, I use next patch(es) relatively often, though you're right 
that something like "in preparation for" works just as well.  Yes, it's 
the previous in "git log", but you get what it's meant in practice. :)

Paolo

> Just state: "so that we can reuse it for early NMI and KVM."
> 
> I also just noticed the typo in "resue". Or maybe both NMI and KVM should
> be sued again ;-)
> 
> -- Steve
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller
  2021-04-26 23:09 ` [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller Lai Jiangshan
@ 2021-04-30  9:03   ` Thomas Gleixner
  2021-04-30  9:06     ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2021-04-30  9:03 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Sean Christopherson, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, kvm

Lai,

On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
>  	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
> @@ -6427,12 +6417,19 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
>  static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
>  {
>  	u32 intr_info = vmx_get_intr_info(vcpu);
> +	unsigned int vector;
> +	gate_desc *desc;
>  
>  	if (WARN_ONCE(!is_external_intr(intr_info),
>  	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
>  		return;
>  
> -	handle_interrupt_nmi_irqoff(vcpu, intr_info);
> +	vector = intr_info & INTR_INFO_VECTOR_MASK;
> +	desc = (gate_desc *)host_idt_base + vector;
> +
> +	kvm_before_interrupt(vcpu);
> +	vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
> +	kvm_after_interrupt(vcpu);

So the previous patch does:

+               kvm_before_interrupt(&vmx->vcpu);
+               vmx_do_interrupt_nmi_irqoff((unsigned long)asm_noist_exc_nmi);
+               kvm_after_interrupt(&vmx->vcpu);

What is this idt gate descriptor dance for in this code?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller
  2021-04-30  9:03   ` Thomas Gleixner
@ 2021-04-30  9:06     ` Paolo Bonzini
  2021-04-30 23:28       ` Thomas Gleixner
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2021-04-30  9:06 UTC (permalink / raw)
  To: Thomas Gleixner, Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, kvm

On 30/04/21 11:03, Thomas Gleixner wrote:
> Lai,
> 
> On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
>>   	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
>> @@ -6427,12 +6417,19 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
>>   static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
>>   {
>>   	u32 intr_info = vmx_get_intr_info(vcpu);
>> +	unsigned int vector;
>> +	gate_desc *desc;
>>   
>>   	if (WARN_ONCE(!is_external_intr(intr_info),
>>   	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
>>   		return;
>>   
>> -	handle_interrupt_nmi_irqoff(vcpu, intr_info);
>> +	vector = intr_info & INTR_INFO_VECTOR_MASK;
>> +	desc = (gate_desc *)host_idt_base + vector;
>> +
>> +	kvm_before_interrupt(vcpu);
>> +	vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
>> +	kvm_after_interrupt(vcpu);
> 
> So the previous patch does:
> 
> +               kvm_before_interrupt(&vmx->vcpu);
> +               vmx_do_interrupt_nmi_irqoff((unsigned long)asm_noist_exc_nmi);
> +               kvm_after_interrupt(&vmx->vcpu);
> 
> What is this idt gate descriptor dance for in this code?

NMIs are sent through a different vmexit code (the same one as 
exceptions).  This one is for interrupts.

Paolo


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
  2021-04-30  7:15     ` Paolo Bonzini
@ 2021-04-30 12:05       ` Steven Rostedt
  0 siblings, 0 replies; 19+ messages in thread
From: Steven Rostedt @ 2021-04-30 12:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Lai Jiangshan, linux-kernel, Lai Jiangshan, Thomas Gleixner,
	Sean Christopherson, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	Josh Poimboeuf, Uros Bizjak, Maxim Levitsky, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Boris Ostrovsky,
	Juergen Gross, Stefano Stabellini, Peter Zijlstra,
	Alexandre Chartre, Joerg Roedel, Jian Cai, xen-devel

On Fri, 30 Apr 2021 09:15:51 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> > Nit, but in change logs, please avoid stating "next patch" as searching git
> > history (via git blame or whatever) there is no such thing as "next patch".  
> 
> Interesting, I use next patch(es) relatively often, though you're right 
> that something like "in preparation for" works just as well.  Yes, it's 
> the previous in "git log", but you get what it's meant in practice. :)

It's not always the previous in a git log. Git log sorts by time, and
if an unrelated commit was created in between those two patches, it
will be in between them.

-- Steve

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller
  2021-04-30  9:06     ` Paolo Bonzini
@ 2021-04-30 23:28       ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2021-04-30 23:28 UTC (permalink / raw)
  To: Paolo Bonzini, Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, kvm

On Fri, Apr 30 2021 at 11:06, Paolo Bonzini wrote:

> On 30/04/21 11:03, Thomas Gleixner wrote:
>> Lai,
>> 
>> On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
>>>   	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
>>> @@ -6427,12 +6417,19 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
>>>   static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
>>>   {
>>>   	u32 intr_info = vmx_get_intr_info(vcpu);
>>> +	unsigned int vector;
>>> +	gate_desc *desc;
>>>   
>>>   	if (WARN_ONCE(!is_external_intr(intr_info),
>>>   	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
>>>   		return;
>>>   
>>> -	handle_interrupt_nmi_irqoff(vcpu, intr_info);
>>> +	vector = intr_info & INTR_INFO_VECTOR_MASK;
>>> +	desc = (gate_desc *)host_idt_base + vector;
>>> +
>>> +	kvm_before_interrupt(vcpu);
>>> +	vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
>>> +	kvm_after_interrupt(vcpu);
>> 
>> So the previous patch does:
>> 
>> +               kvm_before_interrupt(&vmx->vcpu);
>> +               vmx_do_interrupt_nmi_irqoff((unsigned long)asm_noist_exc_nmi);
>> +               kvm_after_interrupt(&vmx->vcpu);
>> 
>> What is this idt gate descriptor dance for in this code?
>
> NMIs are sent through a different vmexit code (the same one as 
> exceptions).  This one is for interrupts.

Duh. Yes. The ability to read is clearly an advantage...

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack
  2021-04-30  7:14 ` [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Paolo Bonzini
@ 2021-05-03 14:36   ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2021-05-03 14:36 UTC (permalink / raw)
  To: Paolo Bonzini, Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Sean Christopherson, Steven Rostedt, Andi Kleen,
	Andy Lutomirski, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, Josh Poimboeuf, Uros Bizjak, Maxim Levitsky

On Fri, Apr 30 2021 at 09:14, Paolo Bonzini wrote:
> On 27/04/21 01:09, Lai Jiangshan wrote:
>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Thanks Paolo. I'm working through it now...

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
  2021-04-26 23:09 ` [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi Lai Jiangshan
  2021-04-28 21:27   ` Steven Rostedt
@ 2021-05-03 19:05   ` Thomas Gleixner
  2021-05-03 19:41     ` Thomas Gleixner
  2021-05-10  7:59   ` Juergen Gross
  2 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2021-05-03 19:05 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, Josh Poimboeuf,
	Uros Bizjak, Maxim Levitsky, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Peter Zijlstra, Alexandre Chartre,
	Joerg Roedel, Jian Cai, xen-devel

On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
>
> There is no any functionality change intended.  Just rename it and
> move it to arch/x86/kernel/nmi.c so that we can resue it later in
> next patch for early NMI and kvm.

'Reuse it later' is not really a proper explanation why this change it
necessary.

Also this can be simplified by using aliasing which keeps the name
spaces intact.

Thanks,

        tglx
---       

--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -135,6 +135,9 @@ static __always_inline void __##func(str
 #define DEFINE_IDTENTRY_RAW(func)					\
 __visible noinstr void func(struct pt_regs *regs)
 
+#define DEFINE_IDTENTRY_RAW_ALIAS(alias, func)				\
+__visible noinstr void func(struct pt_regs *regs) __alias(alias)
+
 /**
  * DECLARE_IDTENTRY_RAW_ERRORCODE - Declare functions for raw IDT entry points
  *				    Error code pushed by hardware
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -524,6 +524,8 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
 		mds_user_clear_cpu_buffers();
 }
 
+DEFINE_IDTENTRY_RAW_ALIAS(exc_nmi, xenpv_exc_nmi);
+
 void stop_nmi(void)
 {
 	ignore_nmis++;
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -565,12 +565,6 @@ static void xen_write_ldt_entry(struct d
 
 void noist_exc_debug(struct pt_regs *regs);
 
-DEFINE_IDTENTRY_RAW(xenpv_exc_nmi)
-{
-	/* On Xen PV, NMI doesn't use IST.  The C part is the same as native. */
-	exc_nmi(regs);
-}
-
 DEFINE_IDTENTRY_RAW_ERRORCODE(xenpv_exc_double_fault)
 {
 	/* On Xen PV, DF doesn't use IST.  The C part is the same as native. */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-04-26 23:09 ` [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry Lai Jiangshan
  2021-04-30  2:46   ` Lai Jiangshan
@ 2021-05-03 19:37   ` Thomas Gleixner
  2021-05-03 20:02   ` Thomas Gleixner
  2 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2021-05-03 19:37 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, Josh Poimboeuf,
	Uros Bizjak, Maxim Levitsky, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Peter Zijlstra

On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
>
> In VMX, the NMI handler needs to be invoked after NMI VM-Exit.
>
> Before the commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via
> indirect call instead of INTn"), the work is done by INTn ("int $2").
>
> But INTn microcode is relatively expensive, so the commit reworked
> NMI VM-Exit handling to invoke the kernel handler by function call.
> And INTn doesn't set the NMI blocked flag required by the linux kernel
> NMI entry.  So moving away from INTn are very reasonable.
>
> Yet some details were missed.  After the said commit applied, the NMI
> entry pointer is fetched from the IDT table and called from the kernel
> stack.  But the NMI entry pointer installed on the IDT table is
> asm_exc_nmi() which expects to be invoked on the IST stack by the ISA.
> And it relies on the "NMI executing" variable on the IST stack to work
> correctly.  When it is unexpectedly called from the kernel stack, the
> RSP-located "NMI executing" variable is also on the kernel stack and
> is "uninitialized" and can cause the NMI entry to run in the wrong way.
>
> So we should not used the NMI entry installed on the IDT table.  Rather,
> we should use the NMI entry allowed to be used on the kernel stack which
> is asm_noist_exc_nmi() which is also used for XENPV and early booting.

It's not used by XENPV. XENPV only uses the C entry point, but the ASM
entry is separate.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
  2021-05-03 19:05   ` Thomas Gleixner
@ 2021-05-03 19:41     ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2021-05-03 19:41 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, Josh Poimboeuf,
	Uros Bizjak, Maxim Levitsky, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Peter Zijlstra, Alexandre Chartre,
	Joerg Roedel, Jian Cai, xen-devel

On Mon, May 03 2021 at 21:05, Thomas Gleixner wrote:

> On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
>> From: Lai Jiangshan <laijs@linux.alibaba.com>
>>
>> There is no any functionality change intended.  Just rename it and
>> move it to arch/x86/kernel/nmi.c so that we can resue it later in
>> next patch for early NMI and kvm.
>
> 'Reuse it later' is not really a proper explanation why this change it
> necessary.
>
> Also this can be simplified by using aliasing which keeps the name
> spaces intact.

Aside of that this is not required to be part of a fixes series which
needs to be backported.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-04-26 23:09 ` [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry Lai Jiangshan
  2021-04-30  2:46   ` Lai Jiangshan
  2021-05-03 19:37   ` Thomas Gleixner
@ 2021-05-03 20:02   ` Thomas Gleixner
  2021-05-04  8:10     ` Paolo Bonzini
  2 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2021-05-03 20:02 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Steven Rostedt, Andi Kleen, Andy Lutomirski, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, Josh Poimboeuf,
	Uros Bizjak, Maxim Levitsky, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Peter Zijlstra

On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index bcbf0d2139e9..96e59d912637 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -36,6 +36,7 @@
>  #include <asm/debugreg.h>
>  #include <asm/desc.h>
>  #include <asm/fpu/internal.h>
> +#include <asm/idtentry.h>
>  #include <asm/io.h>
>  #include <asm/irq_remapping.h>
>  #include <asm/kexec.h>
> @@ -6416,8 +6417,11 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
>  	else if (is_machine_check(intr_info))
>  		kvm_machine_check();
>  	/* We need to handle NMIs before interrupts are enabled */
> -	else if (is_nmi(intr_info))
> -		handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
> +	else if (is_nmi(intr_info)) {

Lacks curly braces for all of the above conditions according to coding style.

> +		kvm_before_interrupt(&vmx->vcpu);
> +		vmx_do_interrupt_nmi_irqoff((unsigned long)asm_noist_exc_nmi);
> +		kvm_after_interrupt(&vmx->vcpu);
> +	}

but this and the next patch are not really needed. The below avoids the
extra kvm_before/after() dance in both places. Hmm?

Thanks,

        tglx
---
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -526,6 +526,10 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
 
 DEFINE_IDTENTRY_RAW_ALIAS(exc_nmi, exc_nmi_noist);
 
+#if IS_MODULE(CONFIG_KVM_INTEL)
+EXPORT_SYMBOL_GPL(asm_exc_nmi_noist);
+#endif
+
 void stop_nmi(void)
 {
 	ignore_nmis++;
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -36,6 +36,7 @@
 #include <asm/debugreg.h>
 #include <asm/desc.h>
 #include <asm/fpu/internal.h>
+#include <asm/idtentry.h>
 #include <asm/io.h>
 #include <asm/irq_remapping.h>
 #include <asm/kexec.h>
@@ -6395,18 +6396,17 @@ static void vmx_apicv_post_state_restore
 
 void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
 
-static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
+static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
+					unsigned long entry)
 {
-	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
-	gate_desc *desc = (gate_desc *)host_idt_base + vector;
-
 	kvm_before_interrupt(vcpu);
-	vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
+	vmx_do_interrupt_nmi_irqoff(entry);
 	kvm_after_interrupt(vcpu);
 }
 
 static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 {
+	const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
 	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
 
 	/* if exit due to PF check for async PF */
@@ -6417,18 +6417,20 @@ static void handle_exception_nmi_irqoff(
 		kvm_machine_check();
 	/* We need to handle NMIs before interrupts are enabled */
 	else if (is_nmi(intr_info))
-		handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
+		handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry);
 }
 
 static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
 {
 	u32 intr_info = vmx_get_intr_info(vcpu);
+	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
+	gate_desc *desc = (gate_desc *)host_idt_base + vector;
 
 	if (WARN_ONCE(!is_external_intr(intr_info),
 	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
 		return;
 
-	handle_interrupt_nmi_irqoff(vcpu, intr_info);
+	handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
 }
 
 static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
  2021-05-03 20:02   ` Thomas Gleixner
@ 2021-05-04  8:10     ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2021-05-04  8:10 UTC (permalink / raw)
  To: Thomas Gleixner, Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Sean Christopherson, Steven Rostedt, Andi Kleen,
	Andy Lutomirski, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, Josh Poimboeuf, Uros Bizjak, Maxim Levitsky,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
	Peter Zijlstra

On 03/05/21 22:02, Thomas Gleixner wrote:
> but this and the next patch are not really needed. The below avoids the
> extra kvm_before/after() dance in both places. Hmm?

Sure, that's good as well.

Paolo

> Thanks,
> 
>          tglx
> ---
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -526,6 +526,10 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
>   
>   DEFINE_IDTENTRY_RAW_ALIAS(exc_nmi, exc_nmi_noist);
>   
> +#if IS_MODULE(CONFIG_KVM_INTEL)
> +EXPORT_SYMBOL_GPL(asm_exc_nmi_noist);
> +#endif
> +
>   void stop_nmi(void)
>   {
>   	ignore_nmis++;
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -36,6 +36,7 @@
>   #include <asm/debugreg.h>
>   #include <asm/desc.h>
>   #include <asm/fpu/internal.h>
> +#include <asm/idtentry.h>
>   #include <asm/io.h>
>   #include <asm/irq_remapping.h>
>   #include <asm/kexec.h>
> @@ -6395,18 +6396,17 @@ static void vmx_apicv_post_state_restore
>   
>   void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
>   
> -static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
> +static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
> +					unsigned long entry)
>   {
> -	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
> -	gate_desc *desc = (gate_desc *)host_idt_base + vector;
> -
>   	kvm_before_interrupt(vcpu);
> -	vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
> +	vmx_do_interrupt_nmi_irqoff(entry);
>   	kvm_after_interrupt(vcpu);
>   }
>   
>   static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
>   {
> +	const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
>   	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
>   
>   	/* if exit due to PF check for async PF */
> @@ -6417,18 +6417,20 @@ static void handle_exception_nmi_irqoff(
>   		kvm_machine_check();
>   	/* We need to handle NMIs before interrupts are enabled */
>   	else if (is_nmi(intr_info))
> -		handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
> +		handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry);
>   }
>   
>   static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
>   {
>   	u32 intr_info = vmx_get_intr_info(vcpu);
> +	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
> +	gate_desc *desc = (gate_desc *)host_idt_base + vector;
>   
>   	if (WARN_ONCE(!is_external_intr(intr_info),
>   	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
>   		return;
>   
> -	handle_interrupt_nmi_irqoff(vcpu, intr_info);
> +	handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
>   }
>   
>   static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
  2021-04-26 23:09 ` [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi Lai Jiangshan
  2021-04-28 21:27   ` Steven Rostedt
  2021-05-03 19:05   ` Thomas Gleixner
@ 2021-05-10  7:59   ` Juergen Gross
  2 siblings, 0 replies; 19+ messages in thread
From: Juergen Gross @ 2021-05-10  7:59 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Thomas Gleixner, Paolo Bonzini,
	Sean Christopherson, Steven Rostedt, Andi Kleen, Andy Lutomirski,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	Josh Poimboeuf, Uros Bizjak, Maxim Levitsky, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Boris Ostrovsky,
	Stefano Stabellini, Peter Zijlstra, Alexandre Chartre,
	Joerg Roedel, Jian Cai, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 969 bytes --]

On 27.04.21 01:09, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> There is no any functionality change intended.  Just rename it and
> move it to arch/x86/kernel/nmi.c so that we can resue it later in
> next patch for early NMI and kvm.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: kvm@vger.kernel.org
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Uros Bizjak <ubizjak@gmail.com>
> Cc: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>

Acked-by: Juergen Gross <jgross@suse.com>


Juergen

[-- Attachment #1.1.2: OpenPGP_0xB0DE9DD628BF132F.asc --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-05-10  7:59 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-26 23:09 [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Lai Jiangshan
2021-04-26 23:09 ` [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi Lai Jiangshan
2021-04-28 21:27   ` Steven Rostedt
2021-04-30  7:15     ` Paolo Bonzini
2021-04-30 12:05       ` Steven Rostedt
2021-05-03 19:05   ` Thomas Gleixner
2021-05-03 19:41     ` Thomas Gleixner
2021-05-10  7:59   ` Juergen Gross
2021-04-26 23:09 ` [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry Lai Jiangshan
2021-04-30  2:46   ` Lai Jiangshan
2021-05-03 19:37   ` Thomas Gleixner
2021-05-03 20:02   ` Thomas Gleixner
2021-05-04  8:10     ` Paolo Bonzini
2021-04-26 23:09 ` [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller Lai Jiangshan
2021-04-30  9:03   ` Thomas Gleixner
2021-04-30  9:06     ` Paolo Bonzini
2021-04-30 23:28       ` Thomas Gleixner
2021-04-30  7:14 ` [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack Paolo Bonzini
2021-05-03 14:36   ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).