linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/4] x86/entry/nmi: solidify userspace NMI entry
@ 2021-06-01  6:52 Lai Jiangshan
  2021-06-01  6:52 ` [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack Lai Jiangshan
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Lai Jiangshan @ 2021-06-01  6:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Steven Rostedt, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

The current kernel has no code to prevent data breakpoints from being
placed on the thread stack.  If there is a data breakpoint on the top
area of the thread stack, problems can arise.

For example, when an NMI hits userspace in this setting, the entry code
copies the exception frame from the NMI stack to the thread stack; the
copy triggers #DB, and after the #DB is handled, the not-yet-copied
portion on the NMI stack is in danger of corruption because NMIs are
unmasked by the #DB handler's IRET.

A similar problem occurs when #DB hits userspace with a data breakpoint
on the thread stack.  We will fix that case for #DB as well, once we
agree on the NMI problem and on the way to fix it.

The fix for NMI is to switch to the entry stack before switching to the
thread stack.  It also paves the way to using the idtentry_body macro
for NMI, since the huge refactoring of the entry code made the idtentry
macros really low level.

Lai Jiangshan (4):
  x86/entry/nmi: Switch to the entry stack before switching to the
    thread stack
  x86/entry/nmi: Use normal idtentry macro for NMI from userspace
  x86/entry: Remove parameter rdx from macro PUSH_AND_CLEAR_REGS and
    PUSH_REGS
  x86/entry/nmi: unmask NMIs on userspace NMI when entry debugging

Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>

 arch/x86/entry/calling.h      |  8 ++--
 arch/x86/entry/entry_64.S     | 82 +++++++++++++++++------------------
 arch/x86/kernel/asm-offsets.c |  1 +
 3 files changed, 44 insertions(+), 47 deletions(-)

-- 
2.19.1.6.gb485710b


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack
  2021-06-01  6:52 [RFC PATCH 0/4] x86/entry/nmi: solidify userspace NMI entry Lai Jiangshan
@ 2021-06-01  6:52 ` Lai Jiangshan
  2021-06-01 17:05   ` Steven Rostedt
  2021-06-19 22:51   ` Thomas Gleixner
  2021-06-01  6:52 ` [RFC PATCH 2/4] x86/entry/nmi: Use normal idtentry macro for NMI from userspace Lai Jiangshan
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 17+ messages in thread
From: Lai Jiangshan @ 2021-06-01  6:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Steven Rostedt, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Juergen Gross,
	Peter Zijlstra (Intel),
	Al Viro, Arvind Sankar

From: Lai Jiangshan <laijs@linux.alibaba.com>

The current kernel has no code to prevent data breakpoints from being
placed on the thread stack.  If there is a data breakpoint on the top
area of the thread stack, problems can arise.

For example, when an NMI hits userspace in this setting, the entry code
copies the exception frame from the NMI stack to the thread stack; the
copy triggers #DB, and after the #DB is handled, the not-yet-copied
portion on the NMI stack is in danger of corruption because NMIs are
unmasked by the #DB handler's IRET.

Stashing the exception frame on the entry stack before touching the
thread stack fixes the problem.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S     | 22 ++++++++++++++++++++++
 arch/x86/kernel/asm-offsets.c |  1 +
 2 files changed, 23 insertions(+)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a5f02d03c585..4190e668f346 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1121,8 +1121,30 @@ SYM_CODE_START(asm_exc_nmi)
 	 *
 	 * We also must not push anything to the stack before switching
 	 * stacks lest we corrupt the "NMI executing" variable.
+	 *
+	 * Before switching to the thread stack, it switches to the entry
+	 * stack first lest there is any data breakpoint in the thread
+	 * stack and the iret of #DB will cause NMI unmasked before
+	 * finishing switching.
 	 */
 
+	/* Switch stack to entry stack */
+	movq	%rsp, %rdx
+	addq	$(+6*8			/* to NMI stack top */		\
+		  -EXCEPTION_STKSZ	/* to NMI stack bottom */	\
+		  -CPU_ENTRY_AREA_nmi_stack /* to entry area */		\
+		  +CPU_ENTRY_AREA_entry_stack /* to entry stack bottom */\
+		  +SIZEOF_entry_stack	/* to entry stack top */	\
+		), %rsp
+
+	/* Stash exception frame and %rdx to entry stack */
+	pushq	5*8(%rdx)	/* pt_regs->ss */
+	pushq	4*8(%rdx)	/* pt_regs->rsp */
+	pushq	3*8(%rdx)	/* pt_regs->flags */
+	pushq	2*8(%rdx)	/* pt_regs->cs */
+	pushq	1*8(%rdx)	/* pt_regs->rip */
+	pushq	0*8(%rdx)	/* %rdx */
+
 	swapgs
 	cld
 	FENCE_SWAPGS_USER_ENTRY
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index ecd3fd6993d1..dfafa0c7e887 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -88,6 +88,7 @@ static void __used common(void)
 	OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
 	DEFINE(SIZEOF_entry_stack, sizeof(struct entry_stack));
 	DEFINE(MASK_entry_stack, (~(sizeof(struct entry_stack) - 1)));
+	OFFSET(CPU_ENTRY_AREA_nmi_stack, cpu_entry_area, estacks.NMI_stack);
 
 	/* Offset for fields in tss_struct */
 	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 2/4] x86/entry/nmi: Use normal idtentry macro for NMI from userspace
  2021-06-01  6:52 [RFC PATCH 0/4] x86/entry/nmi: solidify userspace NMI entry Lai Jiangshan
  2021-06-01  6:52 ` [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack Lai Jiangshan
@ 2021-06-01  6:52 ` Lai Jiangshan
  2021-06-03 17:36   ` Andy Lutomirski
  2021-06-01  6:52 ` [RFC PATCH 3/4] x86/entry: Remove parameter rdx from macro PUSH_AND_CLEAR_REGS and PUSH_REGS Lai Jiangshan
  2021-06-01  6:52 ` [RFC PATCH 4/4] x86/entry/nmi: unmask NMIs on userspace NMI when entry debugging Lai Jiangshan
  3 siblings, 1 reply; 17+ messages in thread
From: Lai Jiangshan @ 2021-06-01  6:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Steven Rostedt, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

Before tglx's huge refactoring of the entry code, high-level code was
called from ASM code, including the idtentry exit path, which might
re-enable IRQs, reschedule, or deliver signals, among other work; that
made the normal entry path unsuitable for userspace NMI entry.  So when
commit 9b6e6a8334d56 ("x86/nmi/64: Switch stacks on userspace NMI
entry") added special code for userspace NMI entry, it did not use the
normal entry code.

After the said refactoring, the high-level code was moved into C, and
the idtentry macros are now really low level and fit for userspace NMI
entry once it has switched to the entry stack, so this patch uses the
idtentry_body macro for NMI from userspace.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 42 ++++++---------------------------------
 1 file changed, 6 insertions(+), 36 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 4190e668f346..f54e06139d4b 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1113,11 +1113,7 @@ SYM_CODE_START(asm_exc_nmi)
 	jz	.Lnmi_from_kernel
 
 	/*
-	 * NMI from user mode.  We need to run on the thread stack, but we
-	 * can't go through the normal entry paths: NMIs are masked, and
-	 * we don't want to enable interrupts, because then we'll end
-	 * up in an awkward situation in which IRQs are on but NMIs
-	 * are off.
+	 * NMI from user mode.  We need to run on the thread stack.
 	 *
 	 * We also must not push anything to the stack before switching
 	 * stacks lest we corrupt the "NMI executing" variable.
@@ -1137,46 +1133,20 @@ SYM_CODE_START(asm_exc_nmi)
 		  +SIZEOF_entry_stack	/* to entry stack top */	\
 		), %rsp
 
-	/* Stash exception frame and %rdx to entry stack */
+	/* Stash exception frame and restore %rdx */
 	pushq	5*8(%rdx)	/* pt_regs->ss */
 	pushq	4*8(%rdx)	/* pt_regs->rsp */
 	pushq	3*8(%rdx)	/* pt_regs->flags */
 	pushq	2*8(%rdx)	/* pt_regs->cs */
 	pushq	1*8(%rdx)	/* pt_regs->rip */
-	pushq	0*8(%rdx)	/* %rdx */
-
-	swapgs
-	cld
-	FENCE_SWAPGS_USER_ENTRY
-	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
-	movq	%rsp, %rdx
-	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
-	UNWIND_HINT_IRET_REGS base=%rdx offset=8
-	pushq	5*8(%rdx)	/* pt_regs->ss */
-	pushq	4*8(%rdx)	/* pt_regs->rsp */
-	pushq	3*8(%rdx)	/* pt_regs->flags */
-	pushq	2*8(%rdx)	/* pt_regs->cs */
-	pushq	1*8(%rdx)	/* pt_regs->rip */
-	UNWIND_HINT_IRET_REGS
-	pushq   $-1		/* pt_regs->orig_ax */
-	PUSH_AND_CLEAR_REGS rdx=(%rdx)
-	ENCODE_FRAME_POINTER
+	movq	0*8(%rdx), %rdx	/* %rdx */
 
 	/*
 	 * At this point we no longer need to worry about stack damage
-	 * due to nesting -- we're on the normal thread stack and we're
-	 * done with the NMI stack.
-	 */
-
-	movq	%rsp, %rdi
-	movq	$-1, %rsi
-	call	exc_nmi
-
-	/*
-	 * Return back to user mode.  We must *not* do the normal exit
-	 * work, because we don't want to enable interrupts.
+	 * due to nesting -- we're done with the NMI stack.
 	 */
-	jmp	swapgs_restore_regs_and_return_to_usermode
+	pushq	$-1		/* pt_regs->orig_ax */
+	idtentry_body exc_nmi has_error_code=0
 
 .Lnmi_from_kernel:
 	/*
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 3/4] x86/entry: Remove parameter rdx from macro PUSH_AND_CLEAR_REGS and PUSH_REGS
  2021-06-01  6:52 [RFC PATCH 0/4] x86/entry/nmi: solidify userspace NMI entry Lai Jiangshan
  2021-06-01  6:52 ` [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack Lai Jiangshan
  2021-06-01  6:52 ` [RFC PATCH 2/4] x86/entry/nmi: Use normal idtentry macro for NMI from userspace Lai Jiangshan
@ 2021-06-01  6:52 ` Lai Jiangshan
  2021-06-01  6:52 ` [RFC PATCH 4/4] x86/entry/nmi: unmask NMIs on userspace NMI when entry debugging Lai Jiangshan
  3 siblings, 0 replies; 17+ messages in thread
From: Lai Jiangshan @ 2021-06-01  6:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Steven Rostedt, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

No caller uses the rdx parameter, so remove it.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/calling.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index a4c061fb7c6e..d63fcc09e722 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -63,7 +63,7 @@ For 32-bit we have the following conventions - kernel is built with
  * for assembly code:
  */
 
-.macro PUSH_REGS rdx=%rdx rax=%rax save_ret=0
+.macro PUSH_REGS rax=%rax save_ret=0
 	.if \save_ret
 	pushq	%rsi		/* pt_regs->si */
 	movq	8(%rsp), %rsi	/* temporarily store the return address in %rsi */
@@ -72,7 +72,7 @@ For 32-bit we have the following conventions - kernel is built with
 	pushq   %rdi		/* pt_regs->di */
 	pushq   %rsi		/* pt_regs->si */
 	.endif
-	pushq	\rdx		/* pt_regs->dx */
+	pushq	%rdx		/* pt_regs->dx */
 	pushq   %rcx		/* pt_regs->cx */
 	pushq   \rax		/* pt_regs->ax */
 	pushq   %r8		/* pt_regs->r8 */
@@ -114,8 +114,8 @@ For 32-bit we have the following conventions - kernel is built with
 
 .endm
 
-.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0
-	PUSH_REGS rdx=\rdx, rax=\rax, save_ret=\save_ret
+.macro PUSH_AND_CLEAR_REGS rax=%rax save_ret=0
+	PUSH_REGS rax=\rax, save_ret=\save_ret
 	CLEAR_REGS
 .endm
 
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 4/4] x86/entry/nmi: unmask NMIs on userspace NMI when entry debugging
  2021-06-01  6:52 [RFC PATCH 0/4] x86/entry/nmi: solidify userspace NMI entry Lai Jiangshan
                   ` (2 preceding siblings ...)
  2021-06-01  6:52 ` [RFC PATCH 3/4] x86/entry: Remove parameter rdx from macro PUSH_AND_CLEAR_REGS and PUSH_REGS Lai Jiangshan
@ 2021-06-01  6:52 ` Lai Jiangshan
  2021-06-03 17:38   ` Andy Lutomirski
  3 siblings, 1 reply; 17+ messages in thread
From: Lai Jiangshan @ 2021-06-01  6:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Steven Rostedt, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f54e06139d4b..309e63f4f391 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1055,6 +1055,24 @@ SYM_CODE_START_LOCAL(error_return)
 	jmp	swapgs_restore_regs_and_return_to_usermode
 SYM_CODE_END(error_return)
 
+.macro debug_entry_unmask_NMIs
+#ifdef CONFIG_DEBUG_ENTRY
+	/*
+	 * For ease of testing, unmask NMIs right away.  Disabled by
+	 * default because IRET is very expensive.
+	 */
+	pushq	$0		/* SS */
+	pushq	%rsp		/* RSP (minus 8 because of the previous push) */
+	addq	$8, (%rsp)	/* Fix up RSP */
+	pushfq			/* RFLAGS */
+	pushq	$__KERNEL_CS	/* CS */
+	pushq	$1f		/* RIP */
+	iretq			/* continues with NMI unmasked */
+	UNWIND_HINT_IRET_REGS
+1:
+#endif
+.endm
+
 /*
  * Runs on exception stack.  Xen PV does not go through this path at all,
  * so we can use real assembly here.
@@ -1145,6 +1163,7 @@ SYM_CODE_START(asm_exc_nmi)
 	 * At this point we no longer need to worry about stack damage
 	 * due to nesting -- we're done with the NMI stack.
 	 */
+	debug_entry_unmask_NMIs
 	pushq	$-1		/* pt_regs->orig_ax */
 	idtentry_body exc_nmi has_error_code=0
 
@@ -1286,22 +1305,7 @@ first_nmi:
 	UNWIND_HINT_IRET_REGS
 
 	/* Everything up to here is safe from nested NMIs */
-
-#ifdef CONFIG_DEBUG_ENTRY
-	/*
-	 * For ease of testing, unmask NMIs right away.  Disabled by
-	 * default because IRET is very expensive.
-	 */
-	pushq	$0		/* SS */
-	pushq	%rsp		/* RSP (minus 8 because of the previous push) */
-	addq	$8, (%rsp)	/* Fix up RSP */
-	pushfq			/* RFLAGS */
-	pushq	$__KERNEL_CS	/* CS */
-	pushq	$1f		/* RIP */
-	iretq			/* continues at repeat_nmi below */
-	UNWIND_HINT_IRET_REGS
-1:
-#endif
+	debug_entry_unmask_NMIs
 
 repeat_nmi:
 	/*
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack
  2021-06-01  6:52 ` [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack Lai Jiangshan
@ 2021-06-01 17:05   ` Steven Rostedt
  2021-06-02  0:09     ` Lai Jiangshan
  2021-06-02  0:16     ` Lai Jiangshan
  2021-06-19 22:51   ` Thomas Gleixner
  1 sibling, 2 replies; 17+ messages in thread
From: Steven Rostedt @ 2021-06-01 17:05 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Juergen Gross,
	Peter Zijlstra (Intel),
	Al Viro, Arvind Sankar

On Tue,  1 Jun 2021 14:52:14 +0800
Lai Jiangshan <jiangshanlai@gmail.com> wrote:

> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> Current kernel has no code to enforce data breakpoint not on the thread
> stack.  If there is any data breakpoint on the top area of the thread
> stack, there might be problem.
> 
> For example, when NMI hits on userspace in this setting, the code copies
> the exception frame from the NMI stack to the thread stack and it will
> cause #DB and after #DB is handled, the not yet copied portion on the
> NMI stack is in danger of corruption because the NMI is unmasked.
> 
> Stashing the exception frame on the entry stack before touching the
> entry stack can fix the problem.
> 
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> ---
>  arch/x86/entry/entry_64.S     | 22 ++++++++++++++++++++++
>  arch/x86/kernel/asm-offsets.c |  1 +
>  2 files changed, 23 insertions(+)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index a5f02d03c585..4190e668f346 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1121,8 +1121,30 @@ SYM_CODE_START(asm_exc_nmi)
>  	 *
>  	 * We also must not push anything to the stack before switching
>  	 * stacks lest we corrupt the "NMI executing" variable.
> +	 *
> +	 * Before switching to the thread stack, it switches to the entry
> +	 * stack first lest there is any data breakpoint in the thread
> +	 * stack and the iret of #DB will cause NMI unmasked before
> +	 * finishing switching.
>  	 */
>  
> +	/* Switch stack to entry stack */
> +	movq	%rsp, %rdx
> +	addq	$(+6*8			/* to NMI stack top */		\
> +		  -EXCEPTION_STKSZ	/* to NMI stack bottom */	\
> +		  -CPU_ENTRY_AREA_nmi_stack /* to entry area */		\

Just so that I understand this correctly. This "entry area" is not part
of the NMI stack, but just at the bottom of it? That is, this part of
the stack will never be touched by an NMI coming in from kernel space,
correct?

-- Steve


> +		  +CPU_ENTRY_AREA_entry_stack /* to entry stack bottom */\
> +		  +SIZEOF_entry_stack	/* to entry stack top */	\
> +		), %rsp
> +
> +	/* Stash exception frame and %rdx to entry stack */
> +	pushq	5*8(%rdx)	/* pt_regs->ss */
> +	pushq	4*8(%rdx)	/* pt_regs->rsp */
> +	pushq	3*8(%rdx)	/* pt_regs->flags */
> +	pushq	2*8(%rdx)	/* pt_regs->cs */
> +	pushq	1*8(%rdx)	/* pt_regs->rip */
> +	pushq	0*8(%rdx)	/* %rdx */
> +
>  	swapgs
>  	cld
>  	FENCE_SWAPGS_USER_ENTRY
> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
> index ecd3fd6993d1..dfafa0c7e887 100644
> --- a/arch/x86/kernel/asm-offsets.c
> +++ b/arch/x86/kernel/asm-offsets.c
> @@ -88,6 +88,7 @@ static void __used common(void)
>  	OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
>  	DEFINE(SIZEOF_entry_stack, sizeof(struct entry_stack));
>  	DEFINE(MASK_entry_stack, (~(sizeof(struct entry_stack) - 1)));
> +	OFFSET(CPU_ENTRY_AREA_nmi_stack, cpu_entry_area, estacks.NMI_stack);
>  
>  	/* Offset for fields in tss_struct */
>  	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack
  2021-06-01 17:05   ` Steven Rostedt
@ 2021-06-02  0:09     ` Lai Jiangshan
  2021-06-02  0:16     ` Lai Jiangshan
  1 sibling, 0 replies; 17+ messages in thread
From: Lai Jiangshan @ 2021-06-02  0:09 UTC (permalink / raw)
  To: Steven Rostedt, Lai Jiangshan
  Cc: linux-kernel, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Juergen Gross,
	Peter Zijlstra (Intel),
	Al Viro, Arvind Sankar



On 2021/6/2 01:05, Steven Rostedt wrote:
> On Tue,  1 Jun 2021 14:52:14 +0800
> Lai Jiangshan <jiangshanlai@gmail.com> wrote:
> 
>> From: Lai Jiangshan <laijs@linux.alibaba.com>
>>
>> Current kernel has no code to enforce data breakpoint not on the thread
>> stack.  If there is any data breakpoint on the top area of the thread
>> stack, there might be problem.
>>
>> For example, when NMI hits on userspace in this setting, the code copies
>> the exception frame from the NMI stack to the thread stack and it will
>> cause #DB and after #DB is handled, the not yet copied portion on the
>> NMI stack is in danger of corruption because the NMI is unmasked.
>>
>> Stashing the exception frame on the entry stack before touching the
>> entry stack can fix the problem.
>>
>> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
>> ---
>>   arch/x86/entry/entry_64.S     | 22 ++++++++++++++++++++++
>>   arch/x86/kernel/asm-offsets.c |  1 +
>>   2 files changed, 23 insertions(+)
>>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index a5f02d03c585..4190e668f346 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -1121,8 +1121,30 @@ SYM_CODE_START(asm_exc_nmi)
>>   	 *
>>   	 * We also must not push anything to the stack before switching
>>   	 * stacks lest we corrupt the "NMI executing" variable.
>> +	 *
>> +	 * Before switching to the thread stack, it switches to the entry
>> +	 * stack first lest there is any data breakpoint in the thread
>> +	 * stack and the iret of #DB will cause NMI unmasked before
>> +	 * finishing switching.
>>   	 */
>>   
>> +	/* Switch stack to entry stack */
>> +	movq	%rsp, %rdx
>> +	addq	$(+6*8			/* to NMI stack top */		\
>> +		  -EXCEPTION_STKSZ	/* to NMI stack bottom */	\
>> +		  -CPU_ENTRY_AREA_nmi_stack /* to entry area */		\
> 
> Just so that I understand this correctly. This "entry area" is not part
> of the NMI stack, but just at the bottom of it? That is, this part of
> the stack will never be touched by an NMI coming in from kernel space,
> correct?

This "entry area" is a pointer to the current CPU's struct cpu_entry_area.

This instruction points %rsp at the top of the entry/trampoline stack,
which is not touched by an NMI coming in from kernel space.

> 
> -- Steve
> 
> 
>> +		  +CPU_ENTRY_AREA_entry_stack /* to entry stack bottom */\
>> +		  +SIZEOF_entry_stack	/* to entry stack top */	\
>> +		), %rsp
>> +
>> +	/* Stash exception frame and %rdx to entry stack */
>> +	pushq	5*8(%rdx)	/* pt_regs->ss */
>> +	pushq	4*8(%rdx)	/* pt_regs->rsp */
>> +	pushq	3*8(%rdx)	/* pt_regs->flags */
>> +	pushq	2*8(%rdx)	/* pt_regs->cs */
>> +	pushq	1*8(%rdx)	/* pt_regs->rip */
>> +	pushq	0*8(%rdx)	/* %rdx */
>> +
>>   	swapgs
>>   	cld
>>   	FENCE_SWAPGS_USER_ENTRY
>> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
>> index ecd3fd6993d1..dfafa0c7e887 100644
>> --- a/arch/x86/kernel/asm-offsets.c
>> +++ b/arch/x86/kernel/asm-offsets.c
>> @@ -88,6 +88,7 @@ static void __used common(void)
>>   	OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
>>   	DEFINE(SIZEOF_entry_stack, sizeof(struct entry_stack));
>>   	DEFINE(MASK_entry_stack, (~(sizeof(struct entry_stack) - 1)));
>> +	OFFSET(CPU_ENTRY_AREA_nmi_stack, cpu_entry_area, estacks.NMI_stack);
>>   
>>   	/* Offset for fields in tss_struct */
>>   	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack
  2021-06-01 17:05   ` Steven Rostedt
  2021-06-02  0:09     ` Lai Jiangshan
@ 2021-06-02  0:16     ` Lai Jiangshan
  1 sibling, 0 replies; 17+ messages in thread
From: Lai Jiangshan @ 2021-06-02  0:16 UTC (permalink / raw)
  To: Steven Rostedt, Lai Jiangshan
  Cc: linux-kernel, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Juergen Gross,
	Peter Zijlstra (Intel),
	Al Viro, Arvind Sankar



On 2021/6/2 01:05, Steven Rostedt wrote:
> On Tue,  1 Jun 2021 14:52:14 +0800
> Lai Jiangshan <jiangshanlai@gmail.com> wrote:
> 
>> From: Lai Jiangshan <laijs@linux.alibaba.com>
>>
>> Current kernel has no code to enforce data breakpoint not on the thread
>> stack.  If there is any data breakpoint on the top area of the thread
>> stack, there might be problem.
>>
>> For example, when NMI hits on userspace in this setting, the code copies
>> the exception frame from the NMI stack to the thread stack and it will
>> cause #DB and after #DB is handled, the not yet copied portion on the
>> NMI stack is in danger of corruption because the NMI is unmasked.
>>
>> Stashing the exception frame on the entry stack before touching the
>> entry stack can fix the problem.
>>
>> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
>> ---
>>   arch/x86/entry/entry_64.S     | 22 ++++++++++++++++++++++
>>   arch/x86/kernel/asm-offsets.c |  1 +
>>   2 files changed, 23 insertions(+)
>>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index a5f02d03c585..4190e668f346 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -1121,8 +1121,30 @@ SYM_CODE_START(asm_exc_nmi)
>>   	 *
>>   	 * We also must not push anything to the stack before switching
>>   	 * stacks lest we corrupt the "NMI executing" variable.
>> +	 *
>> +	 * Before switching to the thread stack, it switches to the entry
>> +	 * stack first lest there is any data breakpoint in the thread
>> +	 * stack and the iret of #DB will cause NMI unmasked before
>> +	 * finishing switching.
>>   	 */
>>   
>> +	/* Switch stack to entry stack */
>> +	movq	%rsp, %rdx
>> +	addq	$(+6*8			/* to NMI stack top */		\
>> +		  -EXCEPTION_STKSZ	/* to NMI stack bottom */	\
>> +		  -CPU_ENTRY_AREA_nmi_stack /* to entry area */		\
> 
> Just so that I understand this correctly. This "entry area" is not part
> of the NMI stack, but just at the bottom of it? That is, this part of
> the stack will never be touched by an NMI coming in from kernel space,
> correct?

The NMI stack, the exception stacks, the entry stack, the TSS, and the
GDT are all part of this "entry area" (struct cpu_entry_area).

> 
> -- Steve
> 
> 
>> +		  +CPU_ENTRY_AREA_entry_stack /* to entry stack bottom */\
>> +		  +SIZEOF_entry_stack	/* to entry stack top */	\
>> +		), %rsp
>> +
>> +	/* Stash exception frame and %rdx to entry stack */
>> +	pushq	5*8(%rdx)	/* pt_regs->ss */
>> +	pushq	4*8(%rdx)	/* pt_regs->rsp */
>> +	pushq	3*8(%rdx)	/* pt_regs->flags */
>> +	pushq	2*8(%rdx)	/* pt_regs->cs */
>> +	pushq	1*8(%rdx)	/* pt_regs->rip */
>> +	pushq	0*8(%rdx)	/* %rdx */
>> +
>>   	swapgs
>>   	cld
>>   	FENCE_SWAPGS_USER_ENTRY
>> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
>> index ecd3fd6993d1..dfafa0c7e887 100644
>> --- a/arch/x86/kernel/asm-offsets.c
>> +++ b/arch/x86/kernel/asm-offsets.c
>> @@ -88,6 +88,7 @@ static void __used common(void)
>>   	OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
>>   	DEFINE(SIZEOF_entry_stack, sizeof(struct entry_stack));
>>   	DEFINE(MASK_entry_stack, (~(sizeof(struct entry_stack) - 1)));
>> +	OFFSET(CPU_ENTRY_AREA_nmi_stack, cpu_entry_area, estacks.NMI_stack);
>>   
>>   	/* Offset for fields in tss_struct */
>>   	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 2/4] x86/entry/nmi: Use normal idtentry macro for NMI from userspace
  2021-06-01  6:52 ` [RFC PATCH 2/4] x86/entry/nmi: Use normal idtentry macro for NMI from userspace Lai Jiangshan
@ 2021-06-03 17:36   ` Andy Lutomirski
  0 siblings, 0 replies; 17+ messages in thread
From: Andy Lutomirski @ 2021-06-03 17:36 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Steven Rostedt, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin

On 5/31/21 11:52 PM, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> Before tglx made huge refactor on entry code, high level code is called
> from ASM code including idtentry exit path which might reopen IRQ,
> reschedule, do signal among other works and made normal entry path not
> suitable for userspace NMI entry.  So when the commit 9b6e6a8334d56
> ("x86/nmi/64: Switch stacks on userspace NMI entry") added special code
> for userspace NMI entry, it didn't use normal entry code.
> 
> After the said refactor on entry code, high level code was moved into
> C code, and the idtentry macros are really low level and fit for
> userspace NMI entry after it switches to entry stack, so this
> patch uses idtentry_body macro for NMI from userspace.
> 
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> ---
>  arch/x86/entry/entry_64.S | 42 ++++++---------------------------------
>  1 file changed, 6 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 4190e668f346..f54e06139d4b 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1113,11 +1113,7 @@ SYM_CODE_START(asm_exc_nmi)
>  	jz	.Lnmi_from_kernel
>  
>  	/*
> -	 * NMI from user mode.  We need to run on the thread stack, but we
> -	 * can't go through the normal entry paths: NMIs are masked, and
> -	 * we don't want to enable interrupts, because then we'll end
> -	 * up in an awkward situation in which IRQs are on but NMIs
> -	 * are off.
> +	 * NMI from user mode.  We need to run on the thread stack.

This comment is IMO still important, but I think you're right that it no
longer matters in the asm.  Could you relocate the comment to the
appropriate place in the C code so that a future cleanup doesn't mess up
the C path?

Thanks,
Andy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 4/4] x86/entry/nmi: unmask NMIs on userspace NMI when entry debugging
  2021-06-01  6:52 ` [RFC PATCH 4/4] x86/entry/nmi: unmask NMIs on userspace NMI when entry debugging Lai Jiangshan
@ 2021-06-03 17:38   ` Andy Lutomirski
  0 siblings, 0 replies; 17+ messages in thread
From: Andy Lutomirski @ 2021-06-03 17:38 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Steven Rostedt, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin

On 5/31/21 11:52 PM, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>

Why?

> 
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> ---
>  arch/x86/entry/entry_64.S | 36 ++++++++++++++++++++----------------
>  1 file changed, 20 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index f54e06139d4b..309e63f4f391 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1055,6 +1055,24 @@ SYM_CODE_START_LOCAL(error_return)
>  	jmp	swapgs_restore_regs_and_return_to_usermode
>  SYM_CODE_END(error_return)
>  
> +.macro debug_entry_unmask_NMIs
> +#ifdef CONFIG_DEBUG_ENTRY
> +	/*
> +	 * For ease of testing, unmask NMIs right away.  Disabled by
> +	 * default because IRET is very expensive.
> +	 */
> +	pushq	$0		/* SS */
> +	pushq	%rsp		/* RSP (minus 8 because of the previous push) */
> +	addq	$8, (%rsp)	/* Fix up RSP */
> +	pushfq			/* RFLAGS */
> +	pushq	$__KERNEL_CS	/* CS */
> +	pushq	$1f		/* RIP */
> +	iretq			/* continues with NMI unmasked */
> +	UNWIND_HINT_IRET_REGS
> +1:
> +#endif
> +.endm
> +
>  /*
>   * Runs on exception stack.  Xen PV does not go through this path at all,
>   * so we can use real assembly here.
> @@ -1145,6 +1163,7 @@ SYM_CODE_START(asm_exc_nmi)
>  	 * At this point we no longer need to worry about stack damage
>  	 * due to nesting -- we're done with the NMI stack.
>  	 */
> +	debug_entry_unmask_NMIs
>  	pushq	$-1		/* pt_regs->orig_ax */
>  	idtentry_body exc_nmi has_error_code=0
>  
> @@ -1286,22 +1305,7 @@ first_nmi:
>  	UNWIND_HINT_IRET_REGS
>  
>  	/* Everything up to here is safe from nested NMIs */
> -
> -#ifdef CONFIG_DEBUG_ENTRY
> -	/*
> -	 * For ease of testing, unmask NMIs right away.  Disabled by
> -	 * default because IRET is very expensive.
> -	 */
> -	pushq	$0		/* SS */
> -	pushq	%rsp		/* RSP (minus 8 because of the previous push) */
> -	addq	$8, (%rsp)	/* Fix up RSP */
> -	pushfq			/* RFLAGS */
> -	pushq	$__KERNEL_CS	/* CS */
> -	pushq	$1f		/* RIP */
> -	iretq			/* continues at repeat_nmi below */
> -	UNWIND_HINT_IRET_REGS
> -1:
> -#endif
> +	debug_entry_unmask_NMIs
>  
>  repeat_nmi:
>  	/*
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack
  2021-06-01  6:52 ` [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack Lai Jiangshan
  2021-06-01 17:05   ` Steven Rostedt
@ 2021-06-19 22:51   ` Thomas Gleixner
  2021-06-20  3:13     ` Andy Lutomirski
  1 sibling, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2021-06-19 22:51 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Steven Rostedt, Lai Jiangshan, Andy Lutomirski, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Juergen Gross,
	Peter Zijlstra (Intel),
	Al Viro, Arvind Sankar

On Tue, Jun 01 2021 at 14:52, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
>
> The current kernel has no code to prevent data breakpoints on the thread
> stack.  If there is a data breakpoint on the top area of the thread
> stack, there may be problems.

And because the kernel does not prevent data breakpoints on the thread
stack we need to do more complicated things in the already horrible
entry code instead of just doing the obvious and preventing data
breakpoints on the thread stack?

Confused.

Thanks,

        tglx


* Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack
  2021-06-19 22:51   ` Thomas Gleixner
@ 2021-06-20  3:13     ` Andy Lutomirski
  2021-06-20 11:23       ` Thomas Gleixner
  2021-06-25 10:40       ` Peter Zijlstra
  0 siblings, 2 replies; 17+ messages in thread
From: Andy Lutomirski @ 2021-06-20  3:13 UTC (permalink / raw)
  To: Thomas Gleixner, Lai Jiangshan, Linux Kernel Mailing List
  Cc: Steven Rostedt, Lai Jiangshan, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, H. Peter Anvin, Juergen Gross,
	Peter Zijlstra (Intel),
	Al Viro, Arvind Sankar



On Sat, Jun 19, 2021, at 3:51 PM, Thomas Gleixner wrote:
> On Tue, Jun 01 2021 at 14:52, Lai Jiangshan wrote:
> > From: Lai Jiangshan <laijs@linux.alibaba.com>
> >
> > The current kernel has no code to prevent data breakpoints on the thread
> > stack.  If there is a data breakpoint on the top area of the thread
> > stack, there may be problems.
> 
> And because the kernel does not prevent data breakpoints on the thread
> stack we need to do more complicated things in the already horrible
> entry code instead of just doing the obvious and preventing data
> breakpoints on the thread stack?

Preventing breakpoints on the thread stack is a bit messy: it’s possible for a breakpoint to be set before the address in question is allocated for the thread stack.

None of this is NMI-specific. #DB itself has the same problem.  We could plausibly solve it differently by disarming breakpoints in the entry asm before switching stacks. I’m not sure how much I like that approach.

> 
> Confused.
> 
> Thanks,
> 
>         tglx
> 


* Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack
  2021-06-20  3:13     ` Andy Lutomirski
@ 2021-06-20 11:23       ` Thomas Gleixner
  2021-06-25 10:40       ` Peter Zijlstra
  1 sibling, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2021-06-20 11:23 UTC (permalink / raw)
  To: Andy Lutomirski, Lai Jiangshan, Linux Kernel Mailing List
  Cc: Steven Rostedt, Lai Jiangshan, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, H. Peter Anvin, Juergen Gross,
	Peter Zijlstra (Intel),
	Al Viro, Arvind Sankar

On Sat, Jun 19 2021 at 20:13, Andy Lutomirski wrote:
> On Sat, Jun 19, 2021, at 3:51 PM, Thomas Gleixner wrote:
>> On Tue, Jun 01 2021 at 14:52, Lai Jiangshan wrote:
>> > From: Lai Jiangshan <laijs@linux.alibaba.com>
>> >
>> > The current kernel has no code to prevent data breakpoints on the thread
>> > stack.  If there is a data breakpoint on the top area of the thread
>> > stack, there may be problems.
>> 
>> And because the kernel does not prevent data breakpoints on the thread
>> stack we need to do more complicated things in the already horrible
>> entry code instead of just doing the obvious and preventing data
>> breakpoints on the thread stack?
>
> Preventing breakpoints on the thread stack is a bit messy: it’s
> possible for a breakpoint to be set before the address in question is
> allocated for the thread stack.

Bah.

> None of this is NMI-specific. #DB itself has the same problem.

Oh well.

> We could plausibly solve it differently by disarming breakpoints in
> the entry asm before switching stacks. I’m not sure how much I like
> that approach.

That's ugly, and TBH a breakpoint on the thread stack is in some sense a
violation of noinstr. I'd rather see them prevented completely, but yes,
that would have to be expanded to pretty much any variable which is
touched in noinstr sections. What a mess.

Thanks,

        tglx




* Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack
  2021-06-20  3:13     ` Andy Lutomirski
  2021-06-20 11:23       ` Thomas Gleixner
@ 2021-06-25 10:40       ` Peter Zijlstra
  2021-06-25 11:00         ` Peter Zijlstra
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-06-25 10:40 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Lai Jiangshan, Linux Kernel Mailing List,
	Steven Rostedt, Lai Jiangshan, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, H. Peter Anvin, Juergen Gross, Al Viro,
	Arvind Sankar

On Sat, Jun 19, 2021 at 08:13:15PM -0700, Andy Lutomirski wrote:
> 
> 
> On Sat, Jun 19, 2021, at 3:51 PM, Thomas Gleixner wrote:
> > On Tue, Jun 01 2021 at 14:52, Lai Jiangshan wrote:
> > > From: Lai Jiangshan <laijs@linux.alibaba.com>
> > >
> > > The current kernel has no code to prevent data breakpoints on the thread
> > > stack.  If there is a data breakpoint on the top area of the thread
> > > stack, there may be problems.
> > 
> > And because the kernel does not prevent data breakpoints on the thread
> > stack we need to do more complicated things in the already horrible
> > entry code instead of just doing the obvious and preventing data
> > breakpoints on the thread stack?
> 
> Preventing breakpoints on the thread stack is a bit messy: it’s
> possible for a breakpoint to be set before the address in question is
> allocated for the thread stack.

How about we call into C from the entry stack and have the from-user
stack swizzle there. The from-kernel entries land on the ISTs and those
are already excluded.

> None of this is NMI-specific. #DB itself has the same problem.  We
> could plausibly solve it differently by disarming breakpoints in the
> entry asm before switching stacks. I’m not sure how much I like that
> approach.

I'm not sure I see how, from-user #DB already doesn't clear DR7, and if
we recurse, we'll get a from-kernel trap, which will land on the IST,
which is excluded, and then we clear DR7 there.

IST and entry stack are excluded, the only problem we have is thread
stack, and that can be solved by calling into C from the entry stack.

I should put teaching objtool about .data references from .noinstr.text
and .entry.text higher on the todo list I suppose ...


* Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack
  2021-06-25 10:40       ` Peter Zijlstra
@ 2021-06-25 11:00         ` Peter Zijlstra
  2021-06-26  7:03           ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-06-25 11:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Lai Jiangshan, Linux Kernel Mailing List,
	Steven Rostedt, Lai Jiangshan, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, H. Peter Anvin, Juergen Gross, Al Viro,
	Arvind Sankar

On Fri, Jun 25, 2021 at 12:40:53PM +0200, Peter Zijlstra wrote:
> On Sat, Jun 19, 2021 at 08:13:15PM -0700, Andy Lutomirski wrote:
> > 
> > 
> > On Sat, Jun 19, 2021, at 3:51 PM, Thomas Gleixner wrote:
> > > On Tue, Jun 01 2021 at 14:52, Lai Jiangshan wrote:
> > > > From: Lai Jiangshan <laijs@linux.alibaba.com>
> > > >
> > > > The current kernel has no code to prevent data breakpoints on the thread
> > > > stack.  If there is a data breakpoint on the top area of the thread
> > > > stack, there may be problems.
> > > 
> > > And because the kernel does not prevent data breakpoints on the thread
> > > stack we need to do more complicated things in the already horrible
> > > entry code instead of just doing the obvious and preventing data
> > > breakpoints on the thread stack?
> > 
> > Preventing breakpoints on the thread stack is a bit messy: it’s
> > possible for a breakpoint to be set before the address in question is
> > allocated for the thread stack.
> 
> How about we call into C from the entry stack and have the from-user
> stack swizzle there. The from-kernel entries land on the ISTs and those
> are already excluded.
> 
> > None of this is NMI-specific. #DB itself has the same problem.  We
> > could plausibly solve it differently by disarming breakpoints in the
> > entry asm before switching stacks. I’m not sure how much I like that
> > approach.
> 
> I'm not sure I see how, from-user #DB already doesn't clear DR7, and if
> we recurse, we'll get a from-kernel trap, which will land on the IST,
> which is excluded, and then we clear DR7 there.
> 
> IST and entry stack are excluded, the only problem we have is thread
> stack, and that can be solved by calling into C from the entry stack.
> 
> I should put teaching objtool about .data references from .noinstr.text
> and .entry.text higher on the todo list I suppose ...

Also, I think we can run the from-user exceptions on the entry stack,
without ever switching to the kernel stack, except for #PF, which is
magical and schedules.

Same for SYSCALL, leave switching to the thread stack until C, somewhere
late, right before we'd enable IRQs or something.


* Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack
  2021-06-25 11:00         ` Peter Zijlstra
@ 2021-06-26  7:03           ` Thomas Gleixner
  2021-06-26  8:28             ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2021-06-26  7:03 UTC (permalink / raw)
  To: Peter Zijlstra, Andy Lutomirski
  Cc: Lai Jiangshan, Linux Kernel Mailing List, Steven Rostedt,
	Lai Jiangshan, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, H. Peter Anvin, Juergen Gross, Al Viro,
	Arvind Sankar

On Fri, Jun 25 2021 at 13:00, Peter Zijlstra wrote:
> On Fri, Jun 25, 2021 at 12:40:53PM +0200, Peter Zijlstra wrote:
>> On Sat, Jun 19, 2021 at 08:13:15PM -0700, Andy Lutomirski wrote:
>> > 
>> > 
>> > On Sat, Jun 19, 2021, at 3:51 PM, Thomas Gleixner wrote:
>> > > On Tue, Jun 01 2021 at 14:52, Lai Jiangshan wrote:
>> > > > From: Lai Jiangshan <laijs@linux.alibaba.com>
>> > > >
>> > > > The current kernel has no code to prevent data breakpoints on the thread
>> > > > stack.  If there is a data breakpoint on the top area of the thread
>> > > > stack, there may be problems.
>> > > 
>> > > And because the kernel does not prevent data breakpoints on the thread
>> > > stack we need to do more complicated things in the already horrible
>> > > entry code instead of just doing the obvious and preventing data
>> > > breakpoints on the thread stack?
>> > 
>> > Preventing breakpoints on the thread stack is a bit messy: it’s
>> > possible for a breakpoint to be set before the address in question is
>> > allocated for the thread stack.
>> 
>> How about we call into C from the entry stack and have the from-user
>> stack swizzle there. The from-kernel entries land on the ISTs and those
>> are already excluded.
>> 
>> > None of this is NMI-specific. #DB itself has the same problem.  We
>> > could plausibly solve it differently by disarming breakpoints in the
>> > entry asm before switching stacks. I’m not sure how much I like that
>> > approach.
>> 
>> I'm not sure I see how, from-user #DB already doesn't clear DR7, and if
>> we recurse, we'll get a from-kernel trap, which will land on the IST,
>> which is excluded, and then we clear DR7 there.
>> 
>> IST and entry stack are excluded, the only problem we have is thread
>> stack, and that can be solved by calling into C from the entry stack.
>> 
>> I should put teaching objtool about .data references from .noinstr.text
>> and .entry.text higher on the todo list I suppose ...
>
> Also, I think we can run the from-user exceptions on the entry stack,
> without ever switching to the kernel stack, except for #PF, which is
> magical and schedules.

No. Pretty much any exception coming from user space can schedule, and
even if it does not schedule voluntarily it can be preempted.

Thanks,

        tglx



* Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack
  2021-06-26  7:03           ` Thomas Gleixner
@ 2021-06-26  8:28             ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-06-26  8:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Lai Jiangshan, Linux Kernel Mailing List,
	Steven Rostedt, Lai Jiangshan, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, H. Peter Anvin, Juergen Gross, Al Viro,
	Arvind Sankar

On Sat, Jun 26, 2021 at 09:03:23AM +0200, Thomas Gleixner wrote:
> On Fri, Jun 25 2021 at 13:00, Peter Zijlstra wrote:
> > On Fri, Jun 25, 2021 at 12:40:53PM +0200, Peter Zijlstra wrote:
> >> On Sat, Jun 19, 2021 at 08:13:15PM -0700, Andy Lutomirski wrote:
> >> > 
> >> > 
> >> > On Sat, Jun 19, 2021, at 3:51 PM, Thomas Gleixner wrote:
> >> > > On Tue, Jun 01 2021 at 14:52, Lai Jiangshan wrote:
> >> > > > From: Lai Jiangshan <laijs@linux.alibaba.com>
> >> > > >
> >> > > > The current kernel has no code to prevent data breakpoints on the thread
> >> > > > stack.  If there is a data breakpoint on the top area of the thread
> >> > > > stack, there may be problems.
> >> > > 
> >> > > And because the kernel does not prevent data breakpoints on the thread
> >> > > stack we need to do more complicated things in the already horrible
> >> > > entry code instead of just doing the obvious and preventing data
> >> > > breakpoints on the thread stack?
> >> > 
> >> > Preventing breakpoints on the thread stack is a bit messy: it’s
> >> > possible for a breakpoint to be set before the address in question is
> >> > allocated for the thread stack.
> >> 
> >> How about we call into C from the entry stack and have the from-user
> >> stack swizzle there. The from-kernel entries land on the ISTs and those
> >> are already excluded.
> >> 
> >> > None of this is NMI-specific. #DB itself has the same problem.  We
> >> > could plausibly solve it differently by disarming breakpoints in the
> >> > entry asm before switching stacks. I’m not sure how much I like that
> >> > approach.
> >> 
> >> I'm not sure I see how, from-user #DB already doesn't clear DR7, and if
> >> we recurse, we'll get a from-kernel trap, which will land on the IST,
> >> which is excluded, and then we clear DR7 there.
> >> 
> >> IST and entry stack are excluded, the only problem we have is thread
> >> stack, and that can be solved by calling into C from the entry stack.
> >> 
> >> I should put teaching objtool about .data references from .noinstr.text
> >> and .entry.text higher on the todo list I suppose ...
> >
> > Also, I think we can run the from-user exceptions on the entry stack,
> > without ever switching to the kernel stack, except for #PF, which is
> > magical and schedules.
> 
> No. Pretty much any exception coming from user space can schedule, and
> even if it does not schedule voluntarily it can be preempted.

Won't most of them have IRQs disabled throughout? In any case, I think
we should only switch to the task stack right around the time we're
ready to enable IRQs just like for syscall/#PF, not earlier.


end of thread

Thread overview: 17+ messages
2021-06-01  6:52 [RFC PATCH 0/4] x86/entry/nmi: solidify userspace NMI entry Lai Jiangshan
2021-06-01  6:52 ` [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack Lai Jiangshan
2021-06-01 17:05   ` Steven Rostedt
2021-06-02  0:09     ` Lai Jiangshan
2021-06-02  0:16     ` Lai Jiangshan
2021-06-19 22:51   ` Thomas Gleixner
2021-06-20  3:13     ` Andy Lutomirski
2021-06-20 11:23       ` Thomas Gleixner
2021-06-25 10:40       ` Peter Zijlstra
2021-06-25 11:00         ` Peter Zijlstra
2021-06-26  7:03           ` Thomas Gleixner
2021-06-26  8:28             ` Peter Zijlstra
2021-06-01  6:52 ` [RFC PATCH 2/4] x86/entry/nmi: Use normal idtentry macro for NMI from userspace Lai Jiangshan
2021-06-03 17:36   ` Andy Lutomirski
2021-06-01  6:52 ` [RFC PATCH 3/4] x86/entry: Remove parameter rdx from macro PUSH_AND_CLEAR_REGS and PUSH_REGS Lai Jiangshan
2021-06-01  6:52 ` [RFC PATCH 4/4] x86/entry/nmi: unmask NMIs on userspace NMI when entry debugging Lai Jiangshan
2021-06-03 17:38   ` Andy Lutomirski
