linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/18] Pile o' entry/exit/sp0 changes
@ 2017-10-26  8:26 Andy Lutomirski
  2017-10-26  8:26 ` [PATCH 01/18] x86/asm/64: Remove the restore_c_regs_and_iret label Andy Lutomirski
                   ` (18 more replies)
  0 siblings, 19 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

Hi all-

Here's the first "real" version of this series.  It's based on Linus'
tree from a couple days ago.

Changes from the old RFC version:
 - Rebase
 - Add Juergen's patch
 - Add some assertions
 - Cleanups

Andy Lutomirski (17):
  x86/asm/64: Remove the restore_c_regs_and_iret label
  x86/asm/64: Split the iret-to-user and iret-to-kernel paths
  x86/asm/64: Move SWAPGS into the common iret-to-usermode path
  x86/asm/64: Simplify reg restore code in the standard IRET paths
  x86/asm/64: Shrink paranoid_exit_restore and make labels local
  x86/asm/64: Use pop instead of movq in syscall_return_via_sysret
  x86/asm/64: Merge the fast and slow SYSRET paths
  x86/asm/64: De-Xen-ify our NMI code
  x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of
    native_load_sp0()
  x86/asm/64: Pass sp0 directly to load_sp0()
  x86/asm: Add task_top_of_stack() to find the top of a task's stack
  x86/xen/64: Clean up SP code in cpu_initialize_context()
  x86/boot/64: Stop initializing TSS.sp0 at boot
  x86/asm/64: Remove all remaining direct thread_struct::sp0 reads
  x86/boot/32: Fix cpu_current_top_of_stack initialization at boot
  x86/asm/64: Remove thread_struct::sp0
  x86/traps: Use a new on_thread_stack() helper to clean up an assertion

Juergen Gross (1):
  xen: add xen nmi trap entry

 arch/x86/entry/calling.h              |  21 ++++++
 arch/x86/entry/entry_64.S             | 120 +++++++++++++++++++---------------
 arch/x86/entry/entry_64_compat.S      |   3 +-
 arch/x86/include/asm/compat.h         |   1 +
 arch/x86/include/asm/paravirt.h       |   5 +-
 arch/x86/include/asm/paravirt_types.h |   2 +-
 arch/x86/include/asm/processor.h      |  57 ++++++++--------
 arch/x86/include/asm/switch_to.h      |  23 +++++++
 arch/x86/include/asm/traps.h          |   2 +-
 arch/x86/kernel/cpu/common.c          |  12 +++-
 arch/x86/kernel/head_64.S             |   2 +-
 arch/x86/kernel/process.c             |   3 +-
 arch/x86/kernel/process_32.c          |   3 +-
 arch/x86/kernel/process_64.c          |   5 +-
 arch/x86/kernel/smpboot.c             |   3 +-
 arch/x86/kernel/traps.c               |   3 +-
 arch/x86/kernel/vm86_32.c             |  20 +++---
 arch/x86/xen/enlighten_pv.c           |   9 ++-
 arch/x86/xen/smp_pv.c                 |  17 ++++-
 arch/x86/xen/xen-asm_64.S             |   2 +-
 20 files changed, 195 insertions(+), 118 deletions(-)

-- 
2.13.6

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 01/18] x86/asm/64: Remove the restore_c_regs_and_iret label
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
@ 2017-10-26  8:26 ` Andy Lutomirski
  2017-10-26 15:07   ` Borislav Petkov
  2017-11-10  6:08   ` [01/18] " kemi
  2017-10-26  8:26 ` [PATCH 02/18] x86/asm/64: Split the iret-to-user and iret-to-kernel paths Andy Lutomirski
                   ` (17 subsequent siblings)
  18 siblings, 2 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

The only user was the 64-bit opportunistic SYSRET failure path, and
that path didn't really need it.  This change makes the
opportunistic SYSRET code a bit more straightforward and gets rid of
the label.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 49167258d587..afe1f403fa0e 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -245,7 +245,6 @@ entry_SYSCALL64_slow_path:
 	call	do_syscall_64		/* returns with IRQs disabled */
 
 return_from_SYSCALL_64:
-	RESTORE_EXTRA_REGS
 	TRACE_IRQS_IRETQ		/* we're about to change IF */
 
 	/*
@@ -314,6 +313,7 @@ return_from_SYSCALL_64:
 	 */
 syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
+	RESTORE_EXTRA_REGS
 	RESTORE_C_REGS_EXCEPT_RCX_R11
 	movq	RSP(%rsp), %rsp
 	UNWIND_HINT_EMPTY
@@ -321,7 +321,7 @@ syscall_return_via_sysret:
 
 opportunistic_sysret_failed:
 	SWAPGS
-	jmp	restore_c_regs_and_iret
+	jmp	restore_regs_and_iret
 END(entry_SYSCALL_64)
 
 ENTRY(stub_ptregs_64)
@@ -638,7 +638,6 @@ retint_kernel:
  */
 GLOBAL(restore_regs_and_iret)
 	RESTORE_EXTRA_REGS
-restore_c_regs_and_iret:
 	RESTORE_C_REGS
 	REMOVE_PT_GPREGS_FROM_STACK 8
 	INTERRUPT_RETURN
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 02/18] x86/asm/64: Split the iret-to-user and iret-to-kernel paths
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
  2017-10-26  8:26 ` [PATCH 01/18] x86/asm/64: Remove the restore_c_regs_and_iret label Andy Lutomirski
@ 2017-10-26  8:26 ` Andy Lutomirski
  2017-10-27 18:05   ` Dave Hansen
  2017-10-27 20:04   ` Borislav Petkov
  2017-10-26  8:26 ` [PATCH 03/18] x86/asm/64: Move SWAPGS into the common iret-to-usermode path Andy Lutomirski
                   ` (16 subsequent siblings)
  18 siblings, 2 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

These code paths will diverge soon.
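
For reference, here is a rough C rendering of what the new
CONFIG_DEBUG_ENTRY assertions check (a sketch only, not part of the
patch; check_iret_target, regs and returning_to_user are just
illustrative names).  The low two bits of the saved CS selector are
the privilege level of the context being returned to:

	/* Sketch: C equivalent of the new testl $3, CS(%rsp) sanity checks. */
	static void check_iret_target(struct pt_regs *regs, bool returning_to_user)
	{
		if (returning_to_user)
			BUG_ON((regs->cs & 3) == 0);	/* must not IRET to ring 0 here */
		else
			BUG_ON((regs->cs & 3) != 0);	/* must not IRET to user mode here */
	}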

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S        | 32 +++++++++++++++++++++++---------
 arch/x86/entry/entry_64_compat.S |  2 +-
 arch/x86/kernel/head_64.S        |  2 +-
 3 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index afe1f403fa0e..493e5e234d36 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -321,7 +321,7 @@ syscall_return_via_sysret:
 
 opportunistic_sysret_failed:
 	SWAPGS
-	jmp	restore_regs_and_iret
+	jmp	restore_regs_and_return_to_usermode
 END(entry_SYSCALL_64)
 
 ENTRY(stub_ptregs_64)
@@ -423,7 +423,7 @@ ENTRY(ret_from_fork)
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
 	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
 	SWAPGS
-	jmp	restore_regs_and_iret
+	jmp	restore_regs_and_return_to_usermode
 
 1:
 	/* kernel thread */
@@ -612,7 +612,19 @@ GLOBAL(retint_user)
 	call	prepare_exit_to_usermode
 	TRACE_IRQS_IRETQ
 	SWAPGS
-	jmp	restore_regs_and_iret
+
+GLOBAL(restore_regs_and_return_to_usermode)
+#ifdef CONFIG_DEBUG_ENTRY
+	testl	$3, CS(%rsp)
+	jnz	1f
+	ud2
+1:
+#endif
+	RESTORE_EXTRA_REGS
+	RESTORE_C_REGS
+	REMOVE_PT_GPREGS_FROM_STACK 8
+	INTERRUPT_RETURN
+
 
 /* Returning to kernel space */
 retint_kernel:
@@ -632,11 +644,13 @@ retint_kernel:
 	 */
 	TRACE_IRQS_IRETQ
 
-/*
- * At this label, code paths which return to kernel and to user,
- * which come from interrupts/exception and from syscalls, merge.
- */
-GLOBAL(restore_regs_and_iret)
+GLOBAL(restore_regs_and_return_to_kernel)
+#ifdef CONFIG_DEBUG_ENTRY
+	testl	$3, CS(%rsp)
+	jz	1f
+	ud2
+1:
+#endif
 	RESTORE_EXTRA_REGS
 	RESTORE_C_REGS
 	REMOVE_PT_GPREGS_FROM_STACK 8
@@ -1327,7 +1341,7 @@ ENTRY(nmi)
 	 * work, because we don't want to enable interrupts.
 	 */
 	SWAPGS
-	jmp	restore_regs_and_iret
+	jmp	restore_regs_and_return_to_usermode
 
 .Lnmi_from_kernel:
 	/*
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index e26c25ca7756..9ca014a99968 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -337,7 +337,7 @@ ENTRY(entry_INT80_compat)
 	/* Go back to user mode. */
 	TRACE_IRQS_ON
 	SWAPGS
-	jmp	restore_regs_and_iret
+	jmp	restore_regs_and_return_to_usermode
 END(entry_INT80_compat)
 
 ENTRY(stub32_clone)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 513cbb012ecc..0b01a105251b 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -326,7 +326,7 @@ early_idt_handler_common:
 
 20:
 	decl early_recursion_flag(%rip)
-	jmp restore_regs_and_iret
+	jmp restore_regs_and_return_to_kernel
 ENDPROC(early_idt_handler_common)
 
 	__INITDATA
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 03/18] x86/asm/64: Move SWAPGS into the common iret-to-usermode path
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
  2017-10-26  8:26 ` [PATCH 01/18] x86/asm/64: Remove the restore_c_regs_and_iret label Andy Lutomirski
  2017-10-26  8:26 ` [PATCH 02/18] x86/asm/64: Split the iret-to-user and iret-to-kernel paths Andy Lutomirski
@ 2017-10-26  8:26 ` Andy Lutomirski
  2017-10-26 13:52   ` Brian Gerst
  2017-10-27 18:08   ` Dave Hansen
  2017-10-26  8:26 ` [PATCH 04/18] x86/asm/64: Simplify reg restore code in the standard IRET paths Andy Lutomirski
                   ` (15 subsequent siblings)
  18 siblings, 2 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

All of the code paths that ended up doing IRET to usermode did
SWAPGS immediately beforehand.  Move the SWAPGS into the common
code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S        | 26 ++++++++++----------------
 arch/x86/entry/entry_64_compat.S |  3 +--
 2 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 493e5e234d36..1909a4e42b81 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -254,7 +254,7 @@ return_from_SYSCALL_64:
 	movq	RCX(%rsp), %rcx
 	movq	RIP(%rsp), %r11
 	cmpq	%rcx, %r11			/* RCX == RIP */
-	jne	opportunistic_sysret_failed
+	jne	swapgs_restore_regs_and_return_to_usermode
 
 	/*
 	 * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
@@ -272,14 +272,14 @@ return_from_SYSCALL_64:
 
 	/* If this changed %rcx, it was not canonical */
 	cmpq	%rcx, %r11
-	jne	opportunistic_sysret_failed
+	jne	swapgs_restore_regs_and_return_to_usermode
 
 	cmpq	$__USER_CS, CS(%rsp)		/* CS must match SYSRET */
-	jne	opportunistic_sysret_failed
+	jne	swapgs_restore_regs_and_return_to_usermode
 
 	movq	R11(%rsp), %r11
 	cmpq	%r11, EFLAGS(%rsp)		/* R11 == RFLAGS */
-	jne	opportunistic_sysret_failed
+	jne	swapgs_restore_regs_and_return_to_usermode
 
 	/*
 	 * SYSCALL clears RF when it saves RFLAGS in R11 and SYSRET cannot
@@ -300,12 +300,12 @@ return_from_SYSCALL_64:
 	 * would never get past 'stuck_here'.
 	 */
 	testq	$(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
-	jnz	opportunistic_sysret_failed
+	jnz	swapgs_restore_regs_and_return_to_usermode
 
 	/* nothing to check for RSP */
 
 	cmpq	$__USER_DS, SS(%rsp)		/* SS must match SYSRET */
-	jne	opportunistic_sysret_failed
+	jne	swapgs_restore_regs_and_return_to_usermode
 
 	/*
 	 * We win! This label is here just for ease of understanding
@@ -318,10 +318,6 @@ syscall_return_via_sysret:
 	movq	RSP(%rsp), %rsp
 	UNWIND_HINT_EMPTY
 	USERGS_SYSRET64
-
-opportunistic_sysret_failed:
-	SWAPGS
-	jmp	restore_regs_and_return_to_usermode
 END(entry_SYSCALL_64)
 
 ENTRY(stub_ptregs_64)
@@ -422,8 +418,7 @@ ENTRY(ret_from_fork)
 	movq	%rsp, %rdi
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
 	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
-	SWAPGS
-	jmp	restore_regs_and_return_to_usermode
+	jmp	swapgs_restore_regs_and_return_to_usermode
 
 1:
 	/* kernel thread */
@@ -611,15 +606,15 @@ GLOBAL(retint_user)
 	mov	%rsp,%rdi
 	call	prepare_exit_to_usermode
 	TRACE_IRQS_IRETQ
-	SWAPGS
 
-GLOBAL(restore_regs_and_return_to_usermode)
+GLOBAL(swapgs_restore_regs_and_return_to_usermode)
 #ifdef CONFIG_DEBUG_ENTRY
 	testl	$3, CS(%rsp)
 	jnz	1f
 	ud2
 1:
 #endif
+	SWAPGS
 	RESTORE_EXTRA_REGS
 	RESTORE_C_REGS
 	REMOVE_PT_GPREGS_FROM_STACK 8
@@ -1340,8 +1335,7 @@ ENTRY(nmi)
 	 * Return back to user mode.  We must *not* do the normal exit
 	 * work, because we don't want to enable interrupts.
 	 */
-	SWAPGS
-	jmp	restore_regs_and_return_to_usermode
+	jmp	swapgs_restore_regs_and_return_to_usermode
 
 .Lnmi_from_kernel:
 	/*
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 9ca014a99968..932b96ce1b06 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -336,8 +336,7 @@ ENTRY(entry_INT80_compat)
 
 	/* Go back to user mode. */
 	TRACE_IRQS_ON
-	SWAPGS
-	jmp	restore_regs_and_return_to_usermode
+	jmp	swapgs_restore_regs_and_return_to_usermode
 END(entry_INT80_compat)
 
 ENTRY(stub32_clone)
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 04/18] x86/asm/64: Simplify reg restore code in the standard IRET paths
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (2 preceding siblings ...)
  2017-10-26  8:26 ` [PATCH 03/18] x86/asm/64: Move SWAPGS into the common iret-to-usermode path Andy Lutomirski
@ 2017-10-26  8:26 ` Andy Lutomirski
  2017-10-26  8:26 ` [PATCH 05/18] x86/asm/64: Shrink paranoid_exit_restore and make labels local Andy Lutomirski
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

The old code restored all the registers with movq instead of pop.
In theory, this was done because some CPUs have higher movq
throughput, but any gain there would be tiny and is almost certainly
outweighed by the higher text size.

This saves 96 bytes of text.
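
(Rough arithmetic, not taken from the patch: a movq load from an
%rsp-relative slot encodes in about 5 bytes, whereas popq is 1 byte
for the legacy registers and 2 bytes for r8-r15, so switching all 15
GP-register restores from movq to pop shrinks each of the two IRET
exit paths by several dozen bytes -- which is roughly where the 96
bytes come from.)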

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/calling.h  | 21 +++++++++++++++++++++
 arch/x86/entry/entry_64.S | 12 ++++++------
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 640aafebdc00..0b9dd8123701 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -151,6 +151,27 @@ For 32-bit we have the following conventions - kernel is built with
 	UNWIND_HINT_REGS offset=\offset extra=0
 	.endm
 
+	.macro POP_EXTRA_REGS
+	popq %r15
+	popq %r14
+	popq %r13
+	popq %r12
+	popq %rbp
+	popq %rbx
+	.endm
+
+	.macro POP_C_REGS
+	popq %r11
+	popq %r10
+	popq %r9
+	popq %r8
+	popq %rax
+	popq %rcx
+	popq %rdx
+	popq %rsi
+	popq %rdi
+	.endm
+
 	.macro RESTORE_C_REGS_HELPER rstor_rax=1, rstor_rcx=1, rstor_r11=1, rstor_r8910=1, rstor_rdx=1
 	.if \rstor_r11
 	movq 6*8(%rsp), %r11
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 1909a4e42b81..4ad40067162a 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -615,9 +615,9 @@ GLOBAL(swapgs_restore_regs_and_return_to_usermode)
 1:
 #endif
 	SWAPGS
-	RESTORE_EXTRA_REGS
-	RESTORE_C_REGS
-	REMOVE_PT_GPREGS_FROM_STACK 8
+	POP_EXTRA_REGS
+	POP_C_REGS
+	addq	$8, %rsp
 	INTERRUPT_RETURN
 
 
@@ -646,9 +646,9 @@ GLOBAL(restore_regs_and_return_to_kernel)
 	ud2
 1:
 #endif
-	RESTORE_EXTRA_REGS
-	RESTORE_C_REGS
-	REMOVE_PT_GPREGS_FROM_STACK 8
+	POP_EXTRA_REGS
+	POP_C_REGS
+	addq	$8, %rsp
 	INTERRUPT_RETURN
 
 ENTRY(native_iret)
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 05/18] x86/asm/64: Shrink paranoid_exit_restore and make labels local
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (3 preceding siblings ...)
  2017-10-26  8:26 ` [PATCH 04/18] x86/asm/64: Simplify reg restore code in the standard IRET paths Andy Lutomirski
@ 2017-10-26  8:26 ` Andy Lutomirski
  2017-10-27 20:07   ` Borislav Petkov
  2017-10-26  8:26 ` [PATCH 06/18] x86/asm/64: Use pop instead of movq in syscall_return_via_sysret Andy Lutomirski
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

paranoid_exit_restore was a copy of
restore_regs_and_return_to_kernel.  Merge them and make the
paranoid_exit internal labels local.

Keeping .Lparanoid_exit makes the code a bit shorter because it
allows a 2-byte jnz instead of a 5-byte jnz.

Saves 96 bytes of text.

(This is still a bit suboptimal in a non-CONFIG_TRACE_IRQFLAGS
 kernel, but fixing that would make the code rather messy.)

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 4ad40067162a..d6404a613df4 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1119,17 +1119,14 @@ ENTRY(paranoid_exit)
 	DISABLE_INTERRUPTS(CLBR_ANY)
 	TRACE_IRQS_OFF_DEBUG
 	testl	%ebx, %ebx			/* swapgs needed? */
-	jnz	paranoid_exit_no_swapgs
+	jnz	.Lparanoid_exit_no_swapgs
 	TRACE_IRQS_IRETQ
 	SWAPGS_UNSAFE_STACK
-	jmp	paranoid_exit_restore
-paranoid_exit_no_swapgs:
+	jmp	.Lparanoid_exit_restore
+.Lparanoid_exit_no_swapgs:
 	TRACE_IRQS_IRETQ_DEBUG
-paranoid_exit_restore:
-	RESTORE_EXTRA_REGS
-	RESTORE_C_REGS
-	REMOVE_PT_GPREGS_FROM_STACK 8
-	INTERRUPT_RETURN
+.Lparanoid_exit_restore:
+	jmp restore_regs_and_return_to_kernel
 END(paranoid_exit)
 
 /*
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 06/18] x86/asm/64: Use pop instead of movq in syscall_return_via_sysret
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (4 preceding siblings ...)
  2017-10-26  8:26 ` [PATCH 05/18] x86/asm/64: Shrink paranoid_exit_restore and make labels local Andy Lutomirski
@ 2017-10-26  8:26 ` Andy Lutomirski
  2017-10-27 20:11   ` Borislav Petkov
  2017-10-26  8:26 ` [PATCH 07/18] x86/asm/64: Merge the fast and slow SYSRET paths Andy Lutomirski
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

Saves 64 bytes.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index d6404a613df4..9dafafa3e0ec 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -313,10 +313,18 @@ return_from_SYSCALL_64:
 	 */
 syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
-	RESTORE_EXTRA_REGS
-	RESTORE_C_REGS_EXCEPT_RCX_R11
-	movq	RSP(%rsp), %rsp
 	UNWIND_HINT_EMPTY
+	POP_EXTRA_REGS
+	popq	%rsi	/* skip r11 */
+	popq	%r10
+	popq	%r9
+	popq	%r8
+	popq	%rax
+	popq	%rsi	/* skip rcx */
+	popq	%rdx
+	popq	%rsi
+	popq	%rdi
+	movq	RSP-ORIG_RAX(%rsp), %rsp
 	USERGS_SYSRET64
 END(entry_SYSCALL_64)
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 07/18] x86/asm/64: Merge the fast and slow SYSRET paths
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (5 preceding siblings ...)
  2017-10-26  8:26 ` [PATCH 06/18] x86/asm/64: Use pop instead of movq in syscall_return_via_sysret Andy Lutomirski
@ 2017-10-26  8:26 ` Andy Lutomirski
  2017-10-27 20:11   ` Borislav Petkov
  2017-11-01 17:25   ` Brian Gerst
  2017-10-26  8:26 ` [PATCH 08/18] xen: add xen nmi trap entry Andy Lutomirski
                   ` (11 subsequent siblings)
  18 siblings, 2 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

They did almost the same thing.  Remove a bunch of pointless
instructions (mostly hidden in macros) and reduce cognitive load by
merging them.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9dafafa3e0ec..c855ee91a3a5 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -220,10 +220,9 @@ entry_SYSCALL_64_fastpath:
 	TRACE_IRQS_ON		/* user mode is traced as IRQs on */
 	movq	RIP(%rsp), %rcx
 	movq	EFLAGS(%rsp), %r11
-	RESTORE_C_REGS_EXCEPT_RCX_R11
-	movq	RSP(%rsp), %rsp
+	addq	$6*8, %rsp	/* skip extra regs -- they were preserved */
 	UNWIND_HINT_EMPTY
-	USERGS_SYSRET64
+	jmp	.Lpop_c_regs_except_rcx_r11_and_sysret
 
 1:
 	/*
@@ -315,6 +314,7 @@ syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
 	UNWIND_HINT_EMPTY
 	POP_EXTRA_REGS
+.Lpop_c_regs_except_rcx_r11_and_sysret:
 	popq	%rsi	/* skip r11 */
 	popq	%r10
 	popq	%r9
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 08/18] xen: add xen nmi trap entry
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (6 preceding siblings ...)
  2017-10-26  8:26 ` [PATCH 07/18] x86/asm/64: Merge the fast and slow SYSRET paths Andy Lutomirski
@ 2017-10-26  8:26 ` Andy Lutomirski
  2017-10-26  8:26 ` [PATCH 09/18] x86/asm/64: De-Xen-ify our NMI code Andy Lutomirski
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Juergen Gross, Andy Lutomirski

From: Juergen Gross <jgross@suse.com>

Instead of trying to execute any NMI via the bare metal's NMI trap
handler, use a Xen-specific one for PV domains, as we already do for
e.g. debug traps.  Since in a PV domain the NMI is handled via the
normal kernel stack, this is the correct thing to do.

This will enable us to get rid of the very fragile and questionable
dependencies between the bare-metal NMI handler and Xen assumptions
that are believed to be broken anyway.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S    | 2 +-
 arch/x86/include/asm/traps.h | 2 +-
 arch/x86/xen/enlighten_pv.c  | 2 +-
 arch/x86/xen/xen-asm_64.S    | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index c855ee91a3a5..bf650fad94ce 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1074,6 +1074,7 @@ idtentry int3			do_int3			has_error_code=0	paranoid=1 shift_ist=DEBUG_STACK
 idtentry stack_segment		do_stack_segment	has_error_code=1
 
 #ifdef CONFIG_XEN
+idtentry xennmi			do_nmi			has_error_code=0
 idtentry xendebug		do_debug		has_error_code=0
 idtentry xenint3		do_int3			has_error_code=0
 #endif
@@ -1236,7 +1237,6 @@ ENTRY(error_exit)
 END(error_exit)
 
 /* Runs on exception stack */
-/* XXX: broken on Xen PV */
 ENTRY(nmi)
 	UNWIND_HINT_IRET_REGS
 	/*
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 5545f6459bf5..26f1a88e42da 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -37,9 +37,9 @@ asmlinkage void simd_coprocessor_error(void);
 
 #if defined(CONFIG_X86_64) && defined(CONFIG_XEN_PV)
 asmlinkage void xen_divide_error(void);
+asmlinkage void xen_xennmi(void);
 asmlinkage void xen_xendebug(void);
 asmlinkage void xen_xenint3(void);
-asmlinkage void xen_nmi(void);
 asmlinkage void xen_overflow(void);
 asmlinkage void xen_bounds(void);
 asmlinkage void xen_invalid_op(void);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 69b9deff7e5c..8da4eff19c2a 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -600,7 +600,7 @@ static struct trap_array_entry trap_array[] = {
 #ifdef CONFIG_X86_MCE
 	{ machine_check,               xen_machine_check,               true },
 #endif
-	{ nmi,                         xen_nmi,                         true },
+	{ nmi,                         xen_xennmi,                      true },
 	{ overflow,                    xen_overflow,                    false },
 #ifdef CONFIG_IA32_EMULATION
 	{ entry_INT80_compat,          xen_entry_INT80_compat,          false },
diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index dae2cc33afb5..286ecc198562 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -29,7 +29,7 @@ xen_pv_trap debug
 xen_pv_trap xendebug
 xen_pv_trap int3
 xen_pv_trap xenint3
-xen_pv_trap nmi
+xen_pv_trap xennmi
 xen_pv_trap overflow
 xen_pv_trap bounds
 xen_pv_trap invalid_op
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 09/18] x86/asm/64: De-Xen-ify our NMI code
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (7 preceding siblings ...)
  2017-10-26  8:26 ` [PATCH 08/18] xen: add xen nmi trap entry Andy Lutomirski
@ 2017-10-26  8:26 ` Andy Lutomirski
  2017-10-26  8:41   ` Juergen Gross
  2017-10-27 20:11   ` Borislav Petkov
  2017-10-26  8:26 ` [PATCH 10/18] x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of native_load_sp0() Andy Lutomirski
                   ` (9 subsequent siblings)
  18 siblings, 2 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski, Juergen Gross, Boris Ostrovsky

Xen PV is fundamentally incompatible with our fancy NMI code: it
doesn't use IST at all, and Xen entries clobber two stack slots
below the hardware frame.

Drop Xen PV support from our NMI code entirely.

Cc: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index bf650fad94ce..b46c726fa750 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1236,9 +1236,13 @@ ENTRY(error_exit)
 	jmp	retint_user
 END(error_exit)
 
-/* Runs on exception stack */
+/*
+ * Runs on exception stack.  Xen PV does not go through this path at all,
+ * so we can use real assembly here.
+ */
 ENTRY(nmi)
 	UNWIND_HINT_IRET_REGS
+
 	/*
 	 * We allow breakpoints in NMIs. If a breakpoint occurs, then
 	 * the iretq it performs will take us out of NMI context.
@@ -1296,7 +1300,7 @@ ENTRY(nmi)
 	 * stacks lest we corrupt the "NMI executing" variable.
 	 */
 
-	SWAPGS_UNSAFE_STACK
+	swapgs
 	cld
 	movq	%rsp, %rdx
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
@@ -1461,7 +1465,7 @@ nested_nmi_out:
 	popq	%rdx
 
 	/* We are returning to kernel mode, so this cannot result in a fault. */
-	INTERRUPT_RETURN
+	iretq
 
 first_nmi:
 	/* Restore rdx. */
@@ -1492,7 +1496,7 @@ first_nmi:
 	pushfq			/* RFLAGS */
 	pushq	$__KERNEL_CS	/* CS */
 	pushq	$1f		/* RIP */
-	INTERRUPT_RETURN	/* continues at repeat_nmi below */
+	iretq			/* continues at repeat_nmi below */
 	UNWIND_HINT_IRET_REGS
 1:
 #endif
@@ -1564,20 +1568,22 @@ nmi_restore:
 	/*
 	 * Clear "NMI executing".  Set DF first so that we can easily
 	 * distinguish the remaining code between here and IRET from
-	 * the SYSCALL entry and exit paths.  On a native kernel, we
-	 * could just inspect RIP, but, on paravirt kernels,
-	 * INTERRUPT_RETURN can translate into a jump into a
-	 * hypercall page.
+	 * the SYSCALL entry and exit paths.
+	 *
+	 * We arguably should just inspect RIP instead, but I (Andy) wrote
+	 * this code when I had the misapprehension that Xen PV supported
+	 * NMIs, and Xen PV would break that approach.
 	 */
 	std
 	movq	$0, 5*8(%rsp)		/* clear "NMI executing" */
 
 	/*
-	 * INTERRUPT_RETURN reads the "iret" frame and exits the NMI
-	 * stack in a single instruction.  We are returning to kernel
-	 * mode, so this cannot result in a fault.
+	 * iretq reads the "iret" frame and exits the NMI stack in a
+	 * single instruction.  We are returning to kernel mode, so this
+	 * cannot result in a fault.  Similarly, we don't need to worry
+	 * about espfix64 on the way back to kernel mode.
 	 */
-	INTERRUPT_RETURN
+	iretq
 END(nmi)
 
 ENTRY(ignore_sysret)
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 10/18] x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of native_load_sp0()
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (8 preceding siblings ...)
  2017-10-26  8:26 ` [PATCH 09/18] x86/asm/64: De-Xen-ify our NMI code Andy Lutomirski
@ 2017-10-26  8:26 ` Andy Lutomirski
  2017-10-26 18:00   ` Brian Gerst
                     ` (2 more replies)
  2017-10-26  8:26 ` [PATCH 11/18] x86/asm/64: Pass sp0 directly to load_sp0() Andy Lutomirski
                   ` (8 subsequent siblings)
  18 siblings, 3 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

This causes the MSR_IA32_SYSENTER_CS write to move out of the
paravirt hook.  This shouldn't affect Xen PV: Xen already ignores
MSR_IA32_SYSENTER_CS writes.  In any event, Xen doesn't support
vm86() in a useful way.

Note to any potential backporters: This patch won't break lguest, as
lguest didn't have any SYSENTER support at all.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/processor.h |  7 -------
 arch/x86/include/asm/switch_to.h | 11 +++++++++++
 arch/x86/kernel/process_32.c     |  1 +
 arch/x86/kernel/process_64.c     |  2 +-
 arch/x86/kernel/vm86_32.c        |  6 +++++-
 5 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index b390ff76e58f..0167e3e35a57 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -520,13 +520,6 @@ static inline void
 native_load_sp0(struct tss_struct *tss, struct thread_struct *thread)
 {
 	tss->x86_tss.sp0 = thread->sp0;
-#ifdef CONFIG_X86_32
-	/* Only happens when SEP is enabled, no need to test "SEP"arately: */
-	if (unlikely(tss->x86_tss.ss1 != thread->sysenter_cs)) {
-		tss->x86_tss.ss1 = thread->sysenter_cs;
-		wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0);
-	}
-#endif
 }
 
 static inline void native_swapgs(void)
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index fcc5cd387fd1..f3fa19925ae1 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -72,4 +72,15 @@ do {									\
 	((last) = __switch_to_asm((prev), (next)));			\
 } while (0)
 
+#ifdef CONFIG_X86_32
+static inline void refresh_sysenter_cs(struct thread_struct *thread)
+{
+	/* Only happens when SEP is enabled, no need to test "SEP"arately: */
+	if (unlikely(this_cpu_read(cpu_tss.x86_tss.ss1) == thread->sysenter_cs))
+		return;
+
+	this_cpu_write(cpu_tss.x86_tss.ss1, thread->sysenter_cs);
+	wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0);
+}
+#endif
+
 #endif /* _ASM_X86_SWITCH_TO_H */
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 11966251cd42..84d6c9f554d0 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -287,6 +287,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 * current_thread_info().
 	 */
 	load_sp0(tss, next);
+	refresh_sysenter_cs(next);  /* in case prev or next is vm86 */
 	this_cpu_write(cpu_current_top_of_stack,
 		       (unsigned long)task_stack_page(next_p) +
 		       THREAD_SIZE);
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 302e7b2572d1..a6ff6d1a0110 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -464,7 +464,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 */
 	this_cpu_write(current_task, next_p);
 
-	/* Reload esp0 and ss1.  This changes current_thread_info(). */
+	/* Reload sp0. */
 	load_sp0(tss, next);
 
 	/*
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 7924a5356c8a..5bc1c3ab6287 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -54,6 +54,7 @@
 #include <asm/irq.h>
 #include <asm/traps.h>
 #include <asm/vm86.h>
+#include <asm/switch_to.h>
 
 /*
  * Known problems:
@@ -149,6 +150,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 	tsk->thread.sp0 = vm86->saved_sp0;
 	tsk->thread.sysenter_cs = __KERNEL_CS;
 	load_sp0(tss, &tsk->thread);
+	refresh_sysenter_cs(&tsk->thread);
 	vm86->saved_sp0 = 0;
 	put_cpu();
 
@@ -368,8 +370,10 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 	/* make room for real-mode segments */
 	tsk->thread.sp0 += 16;
 
-	if (static_cpu_has(X86_FEATURE_SEP))
+	if (static_cpu_has(X86_FEATURE_SEP)) {
 		tsk->thread.sysenter_cs = 0;
+		refresh_sysenter_cs(&tsk->thread);
+	}
 
 	load_sp0(tss, &tsk->thread);
 	put_cpu();
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 11/18] x86/asm/64: Pass sp0 directly to load_sp0()
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (9 preceding siblings ...)
  2017-10-26  8:26 ` [PATCH 10/18] x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of native_load_sp0() Andy Lutomirski
@ 2017-10-26  8:26 ` Andy Lutomirski
  2017-10-27 20:12   ` Borislav Petkov
  2017-10-26  8:26 ` [PATCH 12/18] x86/asm: Add task_top_of_stack() to find the top of a task's stack Andy Lutomirski
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

load_sp0() had an odd signature:

void load_sp0(struct tss_struct *tss, struct thread_struct *thread);

Simplify it to:

void load_sp0(unsigned long sp0);
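
For illustration, a typical call site changes like this (sketch taken
from the cpu_init() hunk below):

	load_sp0(t, &current->thread);		/* old */
	load_sp0(current->thread.sp0);		/* new */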

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/paravirt.h       |  5 ++---
 arch/x86/include/asm/paravirt_types.h |  2 +-
 arch/x86/include/asm/processor.h      |  9 ++++-----
 arch/x86/kernel/cpu/common.c          |  4 ++--
 arch/x86/kernel/process_32.c          |  2 +-
 arch/x86/kernel/process_64.c          |  2 +-
 arch/x86/kernel/vm86_32.c             | 14 ++++++--------
 arch/x86/xen/enlighten_pv.c           |  7 +++----
 8 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 12deec722cf0..43d4f90edebc 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -15,10 +15,9 @@
 #include <linux/cpumask.h>
 #include <asm/frame.h>
 
-static inline void load_sp0(struct tss_struct *tss,
-			     struct thread_struct *thread)
+static inline void load_sp0(unsigned long sp0)
 {
-	PVOP_VCALL2(pv_cpu_ops.load_sp0, tss, thread);
+	PVOP_VCALL1(pv_cpu_ops.load_sp0, sp0);
 }
 
 /* The paravirtualized CPUID instruction. */
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 280d94c36dad..a916788ac478 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -133,7 +133,7 @@ struct pv_cpu_ops {
 	void (*alloc_ldt)(struct desc_struct *ldt, unsigned entries);
 	void (*free_ldt)(struct desc_struct *ldt, unsigned entries);
 
-	void (*load_sp0)(struct tss_struct *tss, struct thread_struct *t);
+	void (*load_sp0)(unsigned long sp0);
 
 	void (*set_iopl_mask)(unsigned mask);
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 0167e3e35a57..064b84722166 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -517,9 +517,9 @@ static inline void native_set_iopl_mask(unsigned mask)
 }
 
 static inline void
-native_load_sp0(struct tss_struct *tss, struct thread_struct *thread)
+native_load_sp0(unsigned long sp0)
 {
-	tss->x86_tss.sp0 = thread->sp0;
+	this_cpu_write(cpu_tss.x86_tss.sp0, sp0);
 }
 
 static inline void native_swapgs(void)
@@ -544,10 +544,9 @@ static inline unsigned long current_top_of_stack(void)
 #else
 #define __cpuid			native_cpuid
 
-static inline void load_sp0(struct tss_struct *tss,
-			    struct thread_struct *thread)
+static inline void load_sp0(unsigned long sp0)
 {
-	native_load_sp0(tss, thread);
+	native_load_sp0(sp0);
 }
 
 #define set_iopl_mask native_set_iopl_mask
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c9176bae7fd8..079648bd85ed 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1572,7 +1572,7 @@ void cpu_init(void)
 	initialize_tlbstate_and_flush();
 	enter_lazy_tlb(&init_mm, me);
 
-	load_sp0(t, &current->thread);
+	load_sp0(current->thread.sp0);
 	set_tss_desc(cpu, t);
 	load_TR_desc();
 	load_mm_ldt(&init_mm);
@@ -1627,7 +1627,7 @@ void cpu_init(void)
 	initialize_tlbstate_and_flush();
 	enter_lazy_tlb(&init_mm, curr);
 
-	load_sp0(t, thread);
+	load_sp0(thread->sp0);
 	set_tss_desc(cpu, t);
 	load_TR_desc();
 	load_mm_ldt(&init_mm);
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 84d6c9f554d0..8bdf9aaa92a0 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -286,7 +286,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 * Reload esp0 and cpu_current_top_of_stack.  This changes
 	 * current_thread_info().
 	 */
-	load_sp0(tss, next);
+	load_sp0(next->sp0);
 	refresh_sysenter_cs(next);  /* in case prev or next is vm86 */
 	this_cpu_write(cpu_current_top_of_stack,
 		       (unsigned long)task_stack_page(next_p) +
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index a6ff6d1a0110..2124304fb77a 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -465,7 +465,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(current_task, next_p);
 
 	/* Reload sp0. */
-	load_sp0(tss, next);
+	load_sp0(next->sp0);
 
 	/*
 	 * Now maybe reload the debug registers and handle I/O bitmaps
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 5bc1c3ab6287..0f1d92cd20ad 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -94,7 +94,6 @@
 
 void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 {
-	struct tss_struct *tss;
 	struct task_struct *tsk = current;
 	struct vm86plus_struct __user *user;
 	struct vm86 *vm86 = current->thread.vm86;
@@ -146,13 +145,13 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 		do_exit(SIGSEGV);
 	}
 
-	tss = &per_cpu(cpu_tss, get_cpu());
+	preempt_disable();
 	tsk->thread.sp0 = vm86->saved_sp0;
 	tsk->thread.sysenter_cs = __KERNEL_CS;
-	load_sp0(tss, &tsk->thread);
+	load_sp0(tsk->thread.sp0);
 	refresh_sysenter_cs(&tsk->thread);
 	vm86->saved_sp0 = 0;
-	put_cpu();
+	preempt_enable();
 
 	memcpy(&regs->pt, &vm86->regs32, sizeof(struct pt_regs));
 
@@ -238,7 +237,6 @@ SYSCALL_DEFINE2(vm86, unsigned long, cmd, unsigned long, arg)
 
 static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 {
-	struct tss_struct *tss;
 	struct task_struct *tsk = current;
 	struct vm86 *vm86 = tsk->thread.vm86;
 	struct kernel_vm86_regs vm86regs;
@@ -366,8 +364,8 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 	vm86->saved_sp0 = tsk->thread.sp0;
 	lazy_save_gs(vm86->regs32.gs);
 
-	tss = &per_cpu(cpu_tss, get_cpu());
 	/* make room for real-mode segments */
+	preempt_disable();
 	tsk->thread.sp0 += 16;
 
 	if (static_cpu_has(X86_FEATURE_SEP)) {
@@ -375,8 +373,8 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 		refresh_sysenter_cs(&tsk->thread);
 	}
 
-	load_sp0(tss, &tsk->thread);
-	put_cpu();
+	load_sp0(tsk->thread.sp0);
+	preempt_enable();
 
 	if (vm86->flags & VM86_SCREEN_BITMAP)
 		mark_screen_rdonly(tsk->mm);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 8da4eff19c2a..e7b213047724 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -810,15 +810,14 @@ static void __init xen_write_gdt_entry_boot(struct desc_struct *dt, int entry,
 	}
 }
 
-static void xen_load_sp0(struct tss_struct *tss,
-			 struct thread_struct *thread)
+static void xen_load_sp0(unsigned long sp0)
 {
 	struct multicall_space mcs;
 
 	mcs = xen_mc_entry(0);
-	MULTI_stack_switch(mcs.mc, __KERNEL_DS, thread->sp0);
+	MULTI_stack_switch(mcs.mc, __KERNEL_DS, sp0);
 	xen_mc_issue(PARAVIRT_LAZY_CPU);
-	tss->x86_tss.sp0 = thread->sp0;
+	this_cpu_write(cpu_tss.x86_tss.sp0, sp0);
 }
 
 void xen_set_iopl_mask(unsigned mask)
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 12/18] x86/asm: Add task_top_of_stack() to find the top of a task's stack
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (10 preceding siblings ...)
  2017-10-26  8:26 ` [PATCH 11/18] x86/asm/64: Pass sp0 directly to load_sp0() Andy Lutomirski
@ 2017-10-26  8:26 ` Andy Lutomirski
  2017-10-26  8:26 ` [PATCH 13/18] x86/xen/64: Clean up SP code in cpu_initialize_context() Andy Lutomirski
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

This will let us get rid of a few places that hardcode accesses to
thread.sp0.
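
For reference (a sketch, not part of the patch; "task" and "top" are
just illustrative names): given the existing task_pt_regs()
definition, the new helper evaluates to the address just above the
task's pt_regs:

	unsigned long top = task_top_of_stack(task);
	/*
	 * == (unsigned long)(task_pt_regs(task) + 1)
	 * == (unsigned long)task_stack_page(task) + THREAD_SIZE on x86_64,
	 *    where TOP_OF_KERNEL_STACK_PADDING is 0.
	 */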

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/processor.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 064b84722166..ad59cec14239 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -795,6 +795,8 @@ static inline void spin_lock_prefetch(const void *x)
 #define TOP_OF_INIT_STACK ((unsigned long)&init_stack + sizeof(init_stack) - \
 			   TOP_OF_KERNEL_STACK_PADDING)
 
+#define task_top_of_stack(task) ((unsigned long)(task_pt_regs(task) + 1))
+
 #ifdef CONFIG_X86_32
 /*
  * User space process size: 3GB (default).
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 13/18] x86/xen/64: Clean up SP code in cpu_initialize_context()
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (11 preceding siblings ...)
  2017-10-26  8:26 ` [PATCH 12/18] x86/asm: Add task_top_of_stack() to find the top of a task's stack Andy Lutomirski
@ 2017-10-26  8:26 ` Andy Lutomirski
  2017-10-26  8:43   ` Juergen Gross
  2017-10-26  8:26 ` [PATCH 14/18] x86/boot/64: Stop initializing TSS.sp0 at boot Andy Lutomirski
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski, Juergen Gross, Boris Ostrovsky

I'm removing thread_struct::sp0, and Xen's usage of it is slightly
dubious and unnecessary.  Use appropriate helpers instead.

While we're at it, reorder the code slightly to make it more obvious
what's going on.

Cc: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/xen/smp_pv.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 51471408fdd1..8c0e047d0b80 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -13,6 +13,7 @@
  * single-threaded.
  */
 #include <linux/sched.h>
+#include <linux/sched/task_stack.h>
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/smp.h>
@@ -293,12 +294,19 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 #endif
 	memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
+	/*
+	 * Bring up the CPU in cpu_bringup_and_idle() with the stack
+	 * pointing just below where pt_regs would be if it were a normal
+	 * kernel entry.
+	 */
 	ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
 	ctxt->flags = VGCF_IN_KERNEL;
 	ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
 	ctxt->user_regs.ds = __USER_DS;
 	ctxt->user_regs.es = __USER_DS;
 	ctxt->user_regs.ss = __KERNEL_DS;
+	ctxt->user_regs.cs = __KERNEL_CS;
+	ctxt->user_regs.esp = (unsigned long)task_pt_regs(idle);
 
 	xen_copy_trap_info(ctxt->trap_ctxt);
 
@@ -313,8 +321,13 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 	ctxt->gdt_frames[0] = gdt_mfn;
 	ctxt->gdt_ents      = GDT_ENTRIES;
 
+	/*
+	 * Set SS:SP that Xen will use when entering guest kernel mode
+	 * from guest user mode.  Subsequent calls to load_sp0() can
+	 * change this value.
+	 */
 	ctxt->kernel_ss = __KERNEL_DS;
-	ctxt->kernel_sp = idle->thread.sp0;
+	ctxt->kernel_sp = task_top_of_stack(idle);
 
 #ifdef CONFIG_X86_32
 	ctxt->event_callback_cs     = __KERNEL_CS;
@@ -326,10 +339,8 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 		(unsigned long)xen_hypervisor_callback;
 	ctxt->failsafe_callback_eip =
 		(unsigned long)xen_failsafe_callback;
-	ctxt->user_regs.cs = __KERNEL_CS;
 	per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
 
-	ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
 	ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_gfn(swapper_pg_dir));
 	if (HYPERVISOR_vcpu_op(VCPUOP_initialise, xen_vcpu_nr(cpu), ctxt))
 		BUG();
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 14/18] x86/boot/64: Stop initializing TSS.sp0 at boot
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (12 preceding siblings ...)
  2017-10-26  8:26 ` [PATCH 13/18] x86/xen/64: Clean up SP code in cpu_initialize_context() Andy Lutomirski
@ 2017-10-26  8:26 ` Andy Lutomirski
  2017-10-27 18:21   ` Dave Hansen
  2017-10-26  8:26 ` [PATCH 15/18] x86/asm/64: Remove all remaining direct thread_struct::sp0 reads Andy Lutomirski
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

In my quest to get rid of thread_struct::sp0, I want to clean up or
remove all of its readers.  Two of them are in cpu_init() (32-bit and
64-bit), and they aren't needed.  This is because we never enter
userspace at all on the threads that CPUs are initialized in.

Poison the initial TSS.sp0 and stop initializing it on CPU init.
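
As an observation (not part of the patch): on 64-bit the poison value
used below is a non-canonical, misaligned address, so any stray use of
it as a stack pointer should fault immediately:

	(1UL << (BITS_PER_LONG - 1)) + 1	/* == 0x8000000000000001 on x86_64 */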

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/cpu/common.c | 12 ++++++++++--
 arch/x86/kernel/process.c    |  3 ++-
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 079648bd85ed..adc02cb351e0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1572,9 +1572,13 @@ void cpu_init(void)
 	initialize_tlbstate_and_flush();
 	enter_lazy_tlb(&init_mm, me);
 
-	load_sp0(current->thread.sp0);
+	/*
+	 * Initialize the TSS.  Don't bother initializing sp0, as the initial
+	 * task never enters user mode.
+	 */
 	set_tss_desc(cpu, t);
 	load_TR_desc();
+
 	load_mm_ldt(&init_mm);
 
 	clear_all_debug_regs();
@@ -1627,9 +1631,13 @@ void cpu_init(void)
 	initialize_tlbstate_and_flush();
 	enter_lazy_tlb(&init_mm, curr);
 
-	load_sp0(thread->sp0);
+	/*
+	 * Initialize the TSS.  Don't bother initializing sp0, as the initial
+	 * task never enters user mode.
+	 */
 	set_tss_desc(cpu, t);
 	load_TR_desc();
+
 	load_mm_ldt(&init_mm);
 
 	t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index bd6b85fac666..7ece9d4764fb 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -48,7 +48,8 @@
  */
 __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
 	.x86_tss = {
-		.sp0 = TOP_OF_INIT_STACK,
+		/* Initialize sp0 to a value that is definitely invalid. */
+		.sp0 = (1UL << (BITS_PER_LONG-1)) + 1,
 #ifdef CONFIG_X86_32
 		.ss0 = __KERNEL_DS,
 		.ss1 = __KERNEL_CS,
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 15/18] x86/asm/64: Remove all remaining direct thread_struct::sp0 reads
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (13 preceding siblings ...)
  2017-10-26  8:26 ` [PATCH 14/18] x86/boot/64: Stop initializing TSS.sp0 at boot Andy Lutomirski
@ 2017-10-26  8:26 ` Andy Lutomirski
  2017-11-01 10:08   ` Borislav Petkov
  2017-10-26  8:26 ` [PATCH 16/18] x86/boot/32: Fix cpu_current_top_of_stack initialization at boot Andy Lutomirski
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

The only remaining readers are in context switch code and vm86(), and
they all just want to update TSS.sp0 to match the current task.
Replace them all with a new helper, update_sp0().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/switch_to.h | 6 ++++++
 arch/x86/kernel/process_32.c     | 2 +-
 arch/x86/kernel/process_64.c     | 2 +-
 arch/x86/kernel/vm86_32.c        | 4 ++--
 4 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index f3fa19925ae1..d9bb491ba45c 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -83,4 +83,10 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread)
 }
 #endif
 
+/* This is used when switching tasks or entering/exiting vm86 mode. */
+static inline void update_sp0(struct task_struct *task)
+{
+	load_sp0(task->thread.sp0);
+}
+
 #endif /* _ASM_X86_SWITCH_TO_H */
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 8bdf9aaa92a0..070bdaaa7c86 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -286,7 +286,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 * Reload esp0 and cpu_current_top_of_stack.  This changes
 	 * current_thread_info().
 	 */
-	load_sp0(next->sp0);
+	update_sp0(next_p);
 	refresh_sysenter_cs(next);  /* in case prev or next is vm86 */
 	this_cpu_write(cpu_current_top_of_stack,
 		       (unsigned long)task_stack_page(next_p) +
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 2124304fb77a..45e380958392 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -465,7 +465,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(current_task, next_p);
 
 	/* Reload sp0. */
-	load_sp0(next->sp0);
+	update_sp0(next_p);
 
 	/*
 	 * Now maybe reload the debug registers and handle I/O bitmaps
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 0f1d92cd20ad..a7b44c75c642 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -148,7 +148,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 	preempt_disable();
 	tsk->thread.sp0 = vm86->saved_sp0;
 	tsk->thread.sysenter_cs = __KERNEL_CS;
-	load_sp0(tsk->thread.sp0);
+	update_sp0(tsk);
 	refresh_sysenter_cs(&tsk->thread);
 	vm86->saved_sp0 = 0;
 	preempt_enable();
@@ -373,7 +373,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 		refresh_sysenter_cs(&tsk->thread);
 	}
 
-	load_sp0(tsk->thread.sp0);
+	update_sp0(tsk);
 	preempt_enable();
 
 	if (vm86->flags & VM86_SCREEN_BITMAP)
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 16/18] x86/boot/32: Fix cpu_current_top_of_stack initialization at boot
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (14 preceding siblings ...)
  2017-10-26  8:26 ` [PATCH 15/18] x86/asm/64: Remove all remaining direct thread_struct::sp0 reads Andy Lutomirski
@ 2017-10-26  8:26 ` Andy Lutomirski
  2017-11-01 10:18   ` Borislav Petkov
  2017-10-26  8:26 ` [PATCH 17/18] x86/asm/64: Remove thread_struct::sp0 Andy Lutomirski
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

cpu_current_top_of_stack's initialization forgot about
TOP_OF_KERNEL_STACK_PADDING.  This bug didn't matter because the
idle threads never enter user mode.
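
For reference (a sketch, not from the patch): task_top_of_stack(idle)
expands to

	(unsigned long)task_stack_page(idle) + THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING

on 32-bit, i.e. the old value minus the forgotten padding.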

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/smpboot.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ad59edd84de7..06c18fe1c09e 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -961,8 +961,7 @@ void common_cpu_up(unsigned int cpu, struct task_struct *idle)
 #ifdef CONFIG_X86_32
 	/* Stack for startup_32 can be just as for start_secondary onwards */
 	irq_ctx_init(cpu);
-	per_cpu(cpu_current_top_of_stack, cpu) =
-		(unsigned long)task_stack_page(idle) + THREAD_SIZE;
+	per_cpu(cpu_current_top_of_stack, cpu) = task_top_of_stack(idle);
 #else
 	initial_gs = per_cpu_offset(cpu);
 #endif
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 17/18] x86/asm/64: Remove thread_struct::sp0
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (15 preceding siblings ...)
  2017-10-26  8:26 ` [PATCH 16/18] x86/boot/32: Fix cpu_current_top_of_stack initialization at boot Andy Lutomirski
@ 2017-10-26  8:26 ` Andy Lutomirski
  2017-10-28  0:52   ` Brian Gerst
  2017-11-01 10:23   ` Borislav Petkov
  2017-10-26  8:26 ` [PATCH 18/18] x86/traps: Use a new on_thread_stack() helper to clean up an assertion Andy Lutomirski
  2017-10-26 15:55 ` [PATCH 00/18] Pile o' entry/exit/sp0 changes Linus Torvalds
  18 siblings, 2 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

On x86_64, we can easily calculate sp0 when needed instead of
storing it in thread_struct.

On x86_32, a similar cleanup would be possible, but it would require
cleaning up the vm86 code first, and that can wait for a later
cleanup series.
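
For illustration (a sketch, not part of the patch; "task" is just an
illustrative variable): with sp0 gone from thread_struct, the value
loaded into TSS.sp0 on x86_64 is simply computed from the task's
stack:

	load_sp0(task_top_of_stack(task));	/* top of the task's kernel stack */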

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/compat.h    |  1 +
 arch/x86/include/asm/processor.h | 33 +++++++++++++++------------------
 arch/x86/include/asm/switch_to.h |  6 ++++++
 arch/x86/kernel/process_64.c     |  1 -
 4 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index 5343c19814b3..948b6d8ec46f 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -6,6 +6,7 @@
  */
 #include <linux/types.h>
 #include <linux/sched.h>
+#include <linux/sched/task_stack.h>
 #include <asm/processor.h>
 #include <asm/user32.h>
 #include <asm/unistd.h>
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index ad59cec14239..562c575d8bc3 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -430,7 +430,9 @@ typedef struct {
 struct thread_struct {
 	/* Cached TLS descriptors: */
 	struct desc_struct	tls_array[GDT_ENTRY_TLS_ENTRIES];
+#ifdef CONFIG_X86_32
 	unsigned long		sp0;
+#endif
 	unsigned long		sp;
 #ifdef CONFIG_X86_32
 	unsigned long		sysenter_cs;
@@ -797,6 +799,13 @@ static inline void spin_lock_prefetch(const void *x)
 
 #define task_top_of_stack(task) ((unsigned long)(task_pt_regs(task) + 1))
 
+#define task_pt_regs(task) \
+({									\
+	unsigned long __ptr = (unsigned long)task_stack_page(task);	\
+	__ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING;		\
+	((struct pt_regs *)__ptr) - 1;					\
+})
+
 #ifdef CONFIG_X86_32
 /*
  * User space process size: 3GB (default).
@@ -816,23 +825,6 @@ static inline void spin_lock_prefetch(const void *x)
 	.addr_limit		= KERNEL_DS,				  \
 }
 
-/*
- * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack.
- * This is necessary to guarantee that the entire "struct pt_regs"
- * is accessible even if the CPU haven't stored the SS/ESP registers
- * on the stack (interrupt gate does not save these registers
- * when switching to the same priv ring).
- * Therefore beware: accessing the ss/esp fields of the
- * "struct pt_regs" is possible, but they may contain the
- * completely wrong values.
- */
-#define task_pt_regs(task) \
-({									\
-	unsigned long __ptr = (unsigned long)task_stack_page(task);	\
-	__ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING;		\
-	((struct pt_regs *)__ptr) - 1;					\
-})
-
 #define KSTK_ESP(task)		(task_pt_regs(task)->sp)
 
 #else
@@ -865,12 +857,17 @@ static inline void spin_lock_prefetch(const void *x)
 #define STACK_TOP		TASK_SIZE_LOW
 #define STACK_TOP_MAX		TASK_SIZE_MAX
 
+#ifdef CONFIG_X86_32
 #define INIT_THREAD  {						\
 	.sp0			= TOP_OF_INIT_STACK,		\
 	.addr_limit		= KERNEL_DS,			\
 }
+#else
+#define INIT_THREAD  {						\
+	.addr_limit		= KERNEL_DS,			\
+}
+#endif
 
-#define task_pt_regs(tsk)	((struct pt_regs *)(tsk)->thread.sp0 - 1)
 extern unsigned long KSTK_ESP(struct task_struct *task);
 
 #endif /* CONFIG_X86_64 */
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index d9bb491ba45c..c557b7526cc7 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_SWITCH_TO_H
 #define _ASM_X86_SWITCH_TO_H
 
+#include <linux/sched/task_stack.h>
+
 struct task_struct; /* one of the stranger aspects of C forward declarations */
 
 struct task_struct *__switch_to_asm(struct task_struct *prev,
@@ -86,7 +88,11 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread)
 /* This is used when switching tasks or entering/exiting vm86 mode. */
 static inline void update_sp0(struct task_struct *task)
 {
+#ifdef CONFIG_X86_32
 	load_sp0(task->thread.sp0);
+#else
+	load_sp0(task_top_of_stack(task));
+#endif
 }
 
 #endif /* _ASM_X86_SWITCH_TO_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 45e380958392..eeeb34f85c25 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -274,7 +274,6 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	struct inactive_task_frame *frame;
 	struct task_struct *me = current;
 
-	p->thread.sp0 = (unsigned long)task_stack_page(p) + THREAD_SIZE;
 	childregs = task_pt_regs(p);
 	fork_frame = container_of(childregs, struct fork_frame, regs);
 	frame = &fork_frame->frame;
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 18/18] x86/traps: Use a new on_thread_stack() helper to clean up an assertion
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (16 preceding siblings ...)
  2017-10-26  8:26 ` [PATCH 17/18] x86/asm/64: Remove thread_struct::sp0 Andy Lutomirski
@ 2017-10-26  8:26 ` Andy Lutomirski
  2017-11-01 10:31   ` Borislav Petkov
  2017-10-26 15:55 ` [PATCH 00/18] Pile o' entry/exit/sp0 changes Linus Torvalds
  18 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-26  8:26 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/processor.h | 6 ++++++
 arch/x86/kernel/traps.c          | 3 +--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 562c575d8bc3..26fc33231cbd 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -541,6 +541,12 @@ static inline unsigned long current_top_of_stack(void)
 #endif
 }
 
+static inline bool on_thread_stack(void)
+{
+	return (unsigned long)(current_top_of_stack() -
+			       current_stack_pointer) < THREAD_SIZE;
+}
+
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
 #else
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 67db4f43309e..42a9c4458f5d 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -141,8 +141,7 @@ void ist_begin_non_atomic(struct pt_regs *regs)
 	 * will catch asm bugs and any attempt to use ist_preempt_enable
 	 * from double_fault.
 	 */
-	BUG_ON((unsigned long)(current_top_of_stack() -
-			       current_stack_pointer) >= THREAD_SIZE);
+	BUG_ON(!on_thread_stack());
 
 	preempt_enable_no_resched();
 }
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 09/18] x86/asm/64: De-Xen-ify our NMI code
  2017-10-26  8:26 ` [PATCH 09/18] x86/asm/64: De-Xen-ify our NMI code Andy Lutomirski
@ 2017-10-26  8:41   ` Juergen Gross
  2017-10-27 20:11   ` Borislav Petkov
  1 sibling, 0 replies; 57+ messages in thread
From: Juergen Gross @ 2017-10-26  8:41 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Boris Ostrovsky

On 26/10/17 10:26, Andy Lutomirski wrote:
> Xen PV is fundamentally incompatible with our fancy NMI code: it
> doesn't use IST at all, and Xen entries clobber two stack slots
> below the hardware frame.
> 
> Drop Xen PV support from our NMI code entirely.
> 
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>

Acked-by: Juergen Gross <jgross@suse.com>


Juergen

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 13/18] x86/xen/64: Clean up SP code in cpu_initialize_context()
  2017-10-26  8:26 ` [PATCH 13/18] x86/xen/64: Clean up SP code in cpu_initialize_context() Andy Lutomirski
@ 2017-10-26  8:43   ` Juergen Gross
  0 siblings, 0 replies; 57+ messages in thread
From: Juergen Gross @ 2017-10-26  8:43 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Boris Ostrovsky

On 26/10/17 10:26, Andy Lutomirski wrote:
> I'm removing thread_struct::sp0, and Xen's usage of it is slightly
> dubious and unnecessary.  Use appropriate helpers instead.
> 
> While we're at it, reorder the code slightly to make it more obvious
> what's going on.
> 
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/18] x86/asm/64: Move SWAPGS into the common iret-to-usermode path
  2017-10-26  8:26 ` [PATCH 03/18] x86/asm/64: Move SWAPGS into the common iret-to-usermode path Andy Lutomirski
@ 2017-10-26 13:52   ` Brian Gerst
  2017-10-26 14:13     ` Dave Hansen
  2017-10-27 18:08   ` Dave Hansen
  1 sibling, 1 reply; 57+ messages in thread
From: Brian Gerst @ 2017-10-26 13:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, Borislav Petkov, linux-kernel, Dave Hansen, Linus Torvalds

On Thu, Oct 26, 2017 at 4:26 AM, Andy Lutomirski <luto@kernel.org> wrote:
> All of the code paths that ended up doing IRET to usermode did
> SWAPGS immediately beforehand.  Move the SWAPGS into the common
> code.
>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>

> +GLOBAL(swapgs_restore_regs_and_return_to_usermode)

Is adding swapgs to the label really necessary?  It's redundant with
the usermode part.  I'm noticing a trend towards absurdly verbose
labels lately.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/18] x86/asm/64: Move SWAPGS into the common iret-to-usermode path
  2017-10-26 13:52   ` Brian Gerst
@ 2017-10-26 14:13     ` Dave Hansen
  2017-10-26 14:28       ` Borislav Petkov
  0 siblings, 1 reply; 57+ messages in thread
From: Dave Hansen @ 2017-10-26 14:13 UTC (permalink / raw)
  To: Brian Gerst, Andy Lutomirski
  Cc: X86 ML, Borislav Petkov, linux-kernel, Linus Torvalds

On 10/26/2017 06:52 AM, Brian Gerst wrote:
> On Thu, Oct 26, 2017 at 4:26 AM, Andy Lutomirski <luto@kernel.org> wrote:
>> All of the code paths that ended up doing IRET to usermode did
>> SWAPGS immediately beforehand.  Move the SWAPGS into the common
>> code.
>>
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> 
>> +GLOBAL(swapgs_restore_regs_and_return_to_usermode)
> 
> Is adding swapgs to the label really necessary?  It's redundant with
> the usermode part.  I'm noticing a trend towards absurdly verbose
> labels lately.

I kinda appreciate the verbosity.  When I see a jump to such a label, I
know explicitly what the jumper has to do.  While it's possible that
every "return to usermode" spot in the kernel does a swapgs, it's not
patently obvious to me that every last one does that in every last
situation.

There are also places where we do register restoring after swapgs and
then return to userspace.  It's nice to know what mode we are in and
what we are supposed to do just from the label.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/18] x86/asm/64: Move SWAPGS into the common iret-to-usermode path
  2017-10-26 14:13     ` Dave Hansen
@ 2017-10-26 14:28       ` Borislav Petkov
  2017-10-27 15:44         ` Andy Lutomirski
  0 siblings, 1 reply; 57+ messages in thread
From: Borislav Petkov @ 2017-10-26 14:28 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Brian Gerst, Andy Lutomirski, X86 ML, linux-kernel, Linus Torvalds

On Thu, Oct 26, 2017 at 07:13:32AM -0700, Dave Hansen wrote:
> I kinda appreciate the verbosity.  When I see a jump to such a label, I

Sorry but this and other labels' names are just ridiculously long.
swapgs_restore_regs_return or swapgs_restore or somesuch are perfectly
fine.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/18] x86/asm/64: Remove the restore_c_regs_and_iret label
  2017-10-26  8:26 ` [PATCH 01/18] x86/asm/64: Remove the restore_c_regs_and_iret label Andy Lutomirski
@ 2017-10-26 15:07   ` Borislav Petkov
  2017-11-10  6:08   ` [01/18] " kemi
  1 sibling, 0 replies; 57+ messages in thread
From: Borislav Petkov @ 2017-10-26 15:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds

On Thu, Oct 26, 2017 at 01:26:33AM -0700, Andy Lutomirski wrote:
> The only user was the 64-bit opportunistic SYSRET failure path, and
> that path didn't really need it.  This change makes the
> opportunistic SYSRET code a bit more straightforward and gets rid of
> the label.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/entry/entry_64.S | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 00/18] Pile o' entry/exit/sp0 changes
  2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (17 preceding siblings ...)
  2017-10-26  8:26 ` [PATCH 18/18] x86/traps: Use a new on_thread_stack() helper to clean up an assertion Andy Lutomirski
@ 2017-10-26 15:55 ` Linus Torvalds
  2017-10-28  7:31   ` Andy Lutomirski
  18 siblings, 1 reply; 57+ messages in thread
From: Linus Torvalds @ 2017-10-26 15:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen



On Thu, 26 Oct 2017, Andy Lutomirski wrote:
> 
> Here's the first "real" version of this series.  It's based on Linus'
> tree from a couple days ago.

Not seeing any problems in this series. I was hoping it would remove more 
lines of code than it adds, and it's the reverse, but that seems to be 
mostly the new push/pop stuff. I'm taking it we couldn't just replace the 
'mov' instructions in the save/restore macros..

I *do* ask that when we have code like this:

        POP_EXTRA_REGS
        POP_C_REGS
        addq    $8, %rsp

that "addq $8" would have a comment about what stack entry we're skipping.

I *think* it's 'orig_eax', but wouldn't it be nice to say so?

            Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/18] x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of native_load_sp0()
  2017-10-26  8:26 ` [PATCH 10/18] x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of native_load_sp0() Andy Lutomirski
@ 2017-10-26 18:00   ` Brian Gerst
  2017-10-27 13:51   ` Thomas Gleixner
  2017-10-27 20:11   ` Borislav Petkov
  2 siblings, 0 replies; 57+ messages in thread
From: Brian Gerst @ 2017-10-26 18:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, Borislav Petkov, linux-kernel, Dave Hansen, Linus Torvalds

On Thu, Oct 26, 2017 at 4:26 AM, Andy Lutomirski <luto@kernel.org> wrote:
> This causees the MSR_IA32_SYSENTER_CS write to move out of the
> paravirt hook.  This shouldn't affect Xen PV: Xen already ignores
> MSR_IA32_SYSENTER_ESP writes.  In any event, Xen doesn't support
> vm86() in a useful way.
>
> Note to any potential backporters: This patch won't break lguest, as
> lguest didn't have any SYSENTER support at all.
>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/include/asm/processor.h |  7 -------
>  arch/x86/include/asm/switch_to.h | 11 +++++++++++
>  arch/x86/kernel/process_32.c     |  1 +
>  arch/x86/kernel/process_64.c     |  2 +-
>  arch/x86/kernel/vm86_32.c        |  6 +++++-
>  5 files changed, 18 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index b390ff76e58f..0167e3e35a57 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -520,13 +520,6 @@ static inline void
>  native_load_sp0(struct tss_struct *tss, struct thread_struct *thread)
>  {
>         tss->x86_tss.sp0 = thread->sp0;
> -#ifdef CONFIG_X86_32
> -       /* Only happens when SEP is enabled, no need to test "SEP"arately: */
> -       if (unlikely(tss->x86_tss.ss1 != thread->sysenter_cs)) {
> -               tss->x86_tss.ss1 = thread->sysenter_cs;
> -               wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0);
> -       }
> -#endif
>  }
>
>  static inline void native_swapgs(void)
> diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
> index fcc5cd387fd1..f3fa19925ae1 100644
> --- a/arch/x86/include/asm/switch_to.h
> +++ b/arch/x86/include/asm/switch_to.h
> @@ -72,4 +72,15 @@ do {                                                                 \
>         ((last) = __switch_to_asm((prev), (next)));                     \
>  } while (0)
>
> +#ifdef CONFIG_X86_32
> +static inline void refresh_sysenter_cs(struct thread_struct *thread)
> +{
> +       /* Only happens when SEP is enabled, no need to test "SEP"arately: */
> +       if (unlikely(this_cpu_read(cpu_tss.x86_tss.ss1) == thread->sysenter_cs))
> +               return;
> +
> +       this_cpu_write(cpu_tss.x86_tss.ss1, thread->sysenter_cs);
> +}
> +#endif
> +
>  #endif /* _ASM_X86_SWITCH_TO_H */
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index 11966251cd42..84d6c9f554d0 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -287,6 +287,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>          * current_thread_info().
>          */
>         load_sp0(tss, next);
> +       refresh_sysenter_cs(next);  /* in case prev or next is vm86 */
>         this_cpu_write(cpu_current_top_of_stack,
>                        (unsigned long)task_stack_page(next_p) +
>                        THREAD_SIZE);
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index 302e7b2572d1..a6ff6d1a0110 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -464,7 +464,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>          */
>         this_cpu_write(current_task, next_p);
>
> -       /* Reload esp0 and ss1.  This changes current_thread_info(). */
> +       /* Reload sp0. */
>         load_sp0(tss, next);
>
>         /*
> diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
> index 7924a5356c8a..5bc1c3ab6287 100644
> --- a/arch/x86/kernel/vm86_32.c
> +++ b/arch/x86/kernel/vm86_32.c
> @@ -54,6 +54,7 @@
>  #include <asm/irq.h>
>  #include <asm/traps.h>
>  #include <asm/vm86.h>
> +#include <asm/switch_to.h>
>
>  /*
>   * Known problems:
> @@ -149,6 +150,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
>         tsk->thread.sp0 = vm86->saved_sp0;
>         tsk->thread.sysenter_cs = __KERNEL_CS;
>         load_sp0(tss, &tsk->thread);
> +       refresh_sysenter_cs(&tsk->thread);
>         vm86->saved_sp0 = 0;
>         put_cpu();
>
> @@ -368,8 +370,10 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
>         /* make room for real-mode segments */
>         tsk->thread.sp0 += 16;
>
> -       if (static_cpu_has(X86_FEATURE_SEP))
> +       if (static_cpu_has(X86_FEATURE_SEP)) {
>                 tsk->thread.sysenter_cs = 0;
> +               refresh_sysenter_cs(&tsk->thread);
> +       }
>
>         load_sp0(tss, &tsk->thread);
>         put_cpu();

The MSR update is missing in the new version.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/18] x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of native_load_sp0()
  2017-10-26  8:26 ` [PATCH 10/18] x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of native_load_sp0() Andy Lutomirski
  2017-10-26 18:00   ` Brian Gerst
@ 2017-10-27 13:51   ` Thomas Gleixner
  2017-10-27 15:50     ` Andy Lutomirski
  2017-10-27 20:11   ` Borislav Petkov
  2 siblings, 1 reply; 57+ messages in thread
From: Thomas Gleixner @ 2017-10-27 13:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds

On Thu, 26 Oct 2017, Andy Lutomirski wrote:
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index b390ff76e58f..0167e3e35a57 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -520,13 +520,6 @@ static inline void
>  native_load_sp0(struct tss_struct *tss, struct thread_struct *thread)
>  {
>  	tss->x86_tss.sp0 = thread->sp0;
> -#ifdef CONFIG_X86_32
> -	/* Only happens when SEP is enabled, no need to test "SEP"arately: */
> -	if (unlikely(tss->x86_tss.ss1 != thread->sysenter_cs)) {
> -		tss->x86_tss.ss1 = thread->sysenter_cs;
> -		wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0);
> -	}
> -#endif
>  }
>  
>  static inline void native_swapgs(void)
> diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
> index fcc5cd387fd1..f3fa19925ae1 100644
> --- a/arch/x86/include/asm/switch_to.h
> +++ b/arch/x86/include/asm/switch_to.h
> @@ -72,4 +72,15 @@ do {									\
>  	((last) = __switch_to_asm((prev), (next)));			\
>  } while (0)
>  
> +#ifdef CONFIG_X86_32
> +static inline void refresh_sysenter_cs(struct thread_struct *thread)
> +{
> +	/* Only happens when SEP is enabled, no need to test "SEP"arately: */
> +	if (unlikely(this_cpu_read(cpu_tss.x86_tss.ss1) == thread->sysenter_cs))
> +		return;
> +
> +	this_cpu_write(cpu_tss.x86_tss.ss1, thread->sysenter_cs);

You lost the wrmsr() on the way... Ideally you move the code unmodified
first and then do the this_cpu_ change on top.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/18] x86/asm/64: Move SWAPGS into the common iret-to-usermode path
  2017-10-26 14:28       ` Borislav Petkov
@ 2017-10-27 15:44         ` Andy Lutomirski
  0 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-27 15:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Brian Gerst, Andy Lutomirski, X86 ML, linux-kernel,
	Linus Torvalds

On Thu, Oct 26, 2017 at 7:28 AM, Borislav Petkov <bp@suse.de> wrote:
> On Thu, Oct 26, 2017 at 07:13:32AM -0700, Dave Hansen wrote:
>> I kinda appreciate the verbosity.  When I see a jump to such a label, I
>
> Sorry but this and other labels' names are just ridiculously long.
> swapgs_restore_regs_return or swapgs_restore or somesuch are perfectly
> fine.
>

Hmm.  I'm kind of tempted to rename it to exit_to_usermode.  After
all, it's the thing you call after prepare_exit_to_usermode, and its
preconditions are rather nontrivial and are quite likely to become even
less trivial in the future.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/18] x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of native_load_sp0()
  2017-10-27 13:51   ` Thomas Gleixner
@ 2017-10-27 15:50     ` Andy Lutomirski
  0 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-27 15:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel,
	Brian Gerst, Dave Hansen, Linus Torvalds

On Fri, Oct 27, 2017 at 6:51 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Thu, 26 Oct 2017, Andy Lutomirski wrote:
>> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
>> index b390ff76e58f..0167e3e35a57 100644
>> --- a/arch/x86/include/asm/processor.h
>> +++ b/arch/x86/include/asm/processor.h
>> @@ -520,13 +520,6 @@ static inline void
>>  native_load_sp0(struct tss_struct *tss, struct thread_struct *thread)
>>  {
>>       tss->x86_tss.sp0 = thread->sp0;
>> -#ifdef CONFIG_X86_32
>> -     /* Only happens when SEP is enabled, no need to test "SEP"arately: */
>> -     if (unlikely(tss->x86_tss.ss1 != thread->sysenter_cs)) {
>> -             tss->x86_tss.ss1 = thread->sysenter_cs;
>> -             wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0);
>> -     }
>> -#endif
>>  }
>>
>>  static inline void native_swapgs(void)
>> diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
>> index fcc5cd387fd1..f3fa19925ae1 100644
>> --- a/arch/x86/include/asm/switch_to.h
>> +++ b/arch/x86/include/asm/switch_to.h
>> @@ -72,4 +72,15 @@ do {                                                                       \
>>       ((last) = __switch_to_asm((prev), (next)));                     \
>>  } while (0)
>>
>> +#ifdef CONFIG_X86_32
>> +static inline void refresh_sysenter_cs(struct thread_struct *thread)
>> +{
>> +     /* Only happens when SEP is enabled, no need to test "SEP"arately: */
>> +     if (unlikely(this_cpu_read(cpu_tss.x86_tss.ss1) == thread->sysenter_cs))
>> +             return;
>> +
>> +     this_cpu_write(cpu_tss.x86_tss.ss1, thread->sysenter_cs);
>
> You lost the wrmsr() on the way... Ideally you move the code unmodified
> first and then do the this_cpu_ change on top.

I don't think that would work.  The old location of that code had a
local variable "tss", and the new location doesn't.  I could make the
nonsensical one-line change as a preparatory patch, but I don't think
that really makes what's changing any clearer.


Anyway, I fixed it.

>
> Thanks,
>
>         tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 02/18] x86/asm/64: Split the iret-to-user and iret-to-kernel paths
  2017-10-26  8:26 ` [PATCH 02/18] x86/asm/64: Split the iret-to-user and iret-to-kernel paths Andy Lutomirski
@ 2017-10-27 18:05   ` Dave Hansen
  2017-11-01 11:34     ` Andy Lutomirski
  2017-10-27 20:04   ` Borislav Petkov
  1 sibling, 1 reply; 57+ messages in thread
From: Dave Hansen @ 2017-10-27 18:05 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Linus Torvalds

On 10/26/2017 01:26 AM, Andy Lutomirski wrote:
> +GLOBAL(restore_regs_and_return_to_usermode)
> +#ifdef CONFIG_DEBUG_ENTRY
> +	testl	$3, CS(%rsp)
> +	jnz	1f
> +	ud2

A nit from the mere mortals in the audience: Could we start commenting
or make a constant for the user segment bits in CS?
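
Something like this, say (just a sketch; the mask name is made up for
illustration, not an existing define):

	#define CS_RPL_USER_BITS	3	/* low two selector bits = RPL; 0 kernel, 3 user */

	testl	$CS_RPL_USER_BITS, CS(%rsp)	/* user CS? */
	jnz	1f
	ud2
1: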

Also, it would be nice to explain what's going on here.  Maybe:

/*
 * We think we are returning to the kernel.  Check the
 * registers we are about to restore and if we appear to
 * be returning to userspace, do something that will cause
 * a fault and hopefully an oops report.
 */

Otherwise, I really like this change.  It's really hard to figure out
what the context is in the entry assembly in a lot of cases.  It's a
place where code reuse actually makes things harder to follow.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/18] x86/asm/64: Move SWAPGS into the common iret-to-usermode path
  2017-10-26  8:26 ` [PATCH 03/18] x86/asm/64: Move SWAPGS into the common iret-to-usermode path Andy Lutomirski
  2017-10-26 13:52   ` Brian Gerst
@ 2017-10-27 18:08   ` Dave Hansen
  1 sibling, 0 replies; 57+ messages in thread
From: Dave Hansen @ 2017-10-27 18:08 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Linus Torvalds

On 10/26/2017 01:26 AM, Andy Lutomirski wrote:
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 493e5e234d36..1909a4e42b81 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -254,7 +254,7 @@ return_from_SYSCALL_64:
>  	movq	RCX(%rsp), %rcx
>  	movq	RIP(%rsp), %r11
>  	cmpq	%rcx, %r11			/* RCX == RIP */
> -	jne	opportunistic_sysret_failed
> +	jne	swapgs_restore_regs_and_return_to_usermode

Could we just leave the "opportunistic_sysret_failed" label and put it
at the same location as "swapgs_restore_regs_and_return_to_usermode"?
It's kinda nice to have the failure paths spelled out.

Just reading this, I have no idea if "RCX == RIP" is good or bad and
whether we're proceeding with the sysret or giving up on it.
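
If a match is the "go ahead with SYSRET" case (SYSRET does reload RIP from
RCX), then even a one-line comment on the compare would go a long way, e.g.:

	movq	RCX(%rsp), %rcx
	movq	RIP(%rsp), %r11
	cmpq	%rcx, %r11	/* SYSRET returns to RCX; must match saved RIP, else fall back to IRET */
	jne	swapgs_restore_regs_and_return_to_usermode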

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 14/18] x86/boot/64: Stop initializing TSS.sp0 at boot
  2017-10-26  8:26 ` [PATCH 14/18] x86/boot/64: Stop initializing TSS.sp0 at boot Andy Lutomirski
@ 2017-10-27 18:21   ` Dave Hansen
  2017-10-28  8:10     ` Andy Lutomirski
  0 siblings, 1 reply; 57+ messages in thread
From: Dave Hansen @ 2017-10-27 18:21 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Linus Torvalds

On 10/26/2017 01:26 AM, Andy Lutomirski wrote:
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -48,7 +48,8 @@
>   */
>  __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
>  	.x86_tss = {
> -		.sp0 = TOP_OF_INIT_STACK,
> +		/* Initialize sp0 to a value that is definitely invalid. */
> +		.sp0 = (1UL << (BITS_PER_LONG-1)) + 1,

This confused me at first: How does this not poison the init task's stack?

Should the comment maybe say something like:

	The hardware only uses .sp0 (or sp1 or sp2 for that matter) when
	doing ring transitions.  Since the init task never runs anything
	other than ring 0, it has no need for a valid value here.
	Poison it.

to clarify what's going on?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 02/18] x86/asm/64: Split the iret-to-user and iret-to-kernel paths
  2017-10-26  8:26 ` [PATCH 02/18] x86/asm/64: Split the iret-to-user and iret-to-kernel paths Andy Lutomirski
  2017-10-27 18:05   ` Dave Hansen
@ 2017-10-27 20:04   ` Borislav Petkov
  2017-11-01 11:32     ` Andy Lutomirski
  1 sibling, 1 reply; 57+ messages in thread
From: Borislav Petkov @ 2017-10-27 20:04 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds

On Thu, Oct 26, 2017 at 01:26:34AM -0700, Andy Lutomirski wrote:
> These code paths will diverge soon.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/entry/entry_64.S        | 32 +++++++++++++++++++++++---------
>  arch/x86/entry/entry_64_compat.S |  2 +-
>  arch/x86/kernel/head_64.S        |  2 +-
>  3 files changed, 25 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index afe1f403fa0e..493e5e234d36 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -321,7 +321,7 @@ syscall_return_via_sysret:
>  
>  opportunistic_sysret_failed:
>  	SWAPGS
> -	jmp	restore_regs_and_iret
> +	jmp	restore_regs_and_return_to_usermode

rstor_regs_ret_user

still sounds pretty ok to me and it is shorter and it is using the
*RSTOR spelling the respective FPU insns use, so should be recognizable. :-)

And the other one could be

rstor_regs_ret_kernel

of course.

>  END(entry_SYSCALL_64)
>  
>  ENTRY(stub_ptregs_64)
> @@ -423,7 +423,7 @@ ENTRY(ret_from_fork)
>  	call	syscall_return_slowpath	/* returns with IRQs disabled */
>  	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
>  	SWAPGS
> -	jmp	restore_regs_and_iret
> +	jmp	restore_regs_and_return_to_usermode
>  
>  1:
>  	/* kernel thread */
> @@ -612,7 +612,19 @@ GLOBAL(retint_user)
>  	call	prepare_exit_to_usermode
>  	TRACE_IRQS_IRETQ
>  	SWAPGS
> -	jmp	restore_regs_and_iret
> +
> +GLOBAL(restore_regs_and_return_to_usermode)
> +#ifdef CONFIG_DEBUG_ENTRY
> +	testl	$3, CS(%rsp)

testb

ditto for the other spot.

> +	jnz	1f
> +	ud2
> +1:
> +#endif

Why aren't we exploding here unconditionally? I mean, we don't want to
return with RPL != 3 CS for whatever reason...

> +	RESTORE_EXTRA_REGS
> +	RESTORE_C_REGS
> +	REMOVE_PT_GPREGS_FROM_STACK 8
> +	INTERRUPT_RETURN
> +
>  
>  /* Returning to kernel space */
>  retint_kernel:
> @@ -632,11 +644,13 @@ retint_kernel:
>  	 */
>  	TRACE_IRQS_IRETQ
>  
> -/*
> - * At this label, code paths which return to kernel and to user,
> - * which come from interrupts/exception and from syscalls, merge.
> - */
> -GLOBAL(restore_regs_and_iret)
> +GLOBAL(restore_regs_and_return_to_kernel)
> +#ifdef CONFIG_DEBUG_ENTRY
> +	testl	$3, CS(%rsp)
> +	jz	1f
> +	ud2
> +1:
> +#endif

I guess here too, as a sanity-check. TEST and Jcc should be very cheap
compared to everything else jumping-through-hoops we'll be doing :-\

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/18] x86/asm/64: Shrink paranoid_exit_restore and make labels local
  2017-10-26  8:26 ` [PATCH 05/18] x86/asm/64: Shrink paranoid_exit_restore and make labels local Andy Lutomirski
@ 2017-10-27 20:07   ` Borislav Petkov
  0 siblings, 0 replies; 57+ messages in thread
From: Borislav Petkov @ 2017-10-27 20:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds

On Thu, Oct 26, 2017 at 01:26:37AM -0700, Andy Lutomirski wrote:
> paranoid_exit_restore was a copy of
> restore_regs_and_return_to_kernel.  Merge them and make the
> paranoid_exit internal labels local.
> 
> Keeping .Lparanoid_exit makes the code a bit shorter because it
> allows a 2-byte jnz instead of a 5-byte jnz.
> 
> Saves 96 bytes of text.
> 
> (This is still a bit suboptimal in a non-CONFIG_TRACE_IRQFLAGS
>  kernel, but fixing that would make the code rather messy.)
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/entry/entry_64.S | 13 +++++--------
>  1 file changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 4ad40067162a..d6404a613df4 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1119,17 +1119,14 @@ ENTRY(paranoid_exit)
>  	DISABLE_INTERRUPTS(CLBR_ANY)
>  	TRACE_IRQS_OFF_DEBUG
>  	testl	%ebx, %ebx			/* swapgs needed? */
> -	jnz	paranoid_exit_no_swapgs
> +	jnz	.Lparanoid_exit_no_swapgs
>  	TRACE_IRQS_IRETQ
>  	SWAPGS_UNSAFE_STACK
> -	jmp	paranoid_exit_restore
> -paranoid_exit_no_swapgs:
> +	jmp	.Lparanoid_exit_restore

Kill that paranoid_exit_restore too and save everybody a jump:

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 1b9e8079ca3a..b88e17e120fa 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1122,10 +1122,9 @@ ENTRY(paranoid_exit)
 	jnz	.Lparanoid_exit_no_swapgs
 	TRACE_IRQS_IRETQ
 	SWAPGS_UNSAFE_STACK
-	jmp	.Lparanoid_exit_restore
+	jmp restore_regs_and_return_to_kernel
 .Lparanoid_exit_no_swapgs:
 	TRACE_IRQS_IRETQ_DEBUG
-.Lparanoid_exit_restore:
 	jmp restore_regs_and_return_to_kernel
 END(paranoid_exit)
 

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 06/18] x86/asm/64: Use pop instead of movq in syscall_return_via_sysret
  2017-10-26  8:26 ` [PATCH 06/18] x86/asm/64: Use pop instead of movq in syscall_return_via_sysret Andy Lutomirski
@ 2017-10-27 20:11   ` Borislav Petkov
  0 siblings, 0 replies; 57+ messages in thread
From: Borislav Petkov @ 2017-10-27 20:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds

On Thu, Oct 26, 2017 at 01:26:38AM -0700, Andy Lutomirski wrote:
> Saves 64 bytes.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/entry/entry_64.S | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index d6404a613df4..9dafafa3e0ec 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -313,10 +313,18 @@ return_from_SYSCALL_64:
>  	 */
>  syscall_return_via_sysret:
>  	/* rcx and r11 are already restored (see code above) */
> -	RESTORE_EXTRA_REGS
> -	RESTORE_C_REGS_EXCEPT_RCX_R11
> -	movq	RSP(%rsp), %rsp
>  	UNWIND_HINT_EMPTY
> +	POP_EXTRA_REGS
> +	popq	%rsi	/* skip r11 */
> +	popq	%r10
> +	popq	%r9
> +	popq	%r8
> +	popq	%rax
> +	popq	%rsi	/* skip rcx */
> +	popq	%rdx
> +	popq	%rsi
> +	popq	%rdi
> +	movq	RSP-ORIG_RAX(%rsp), %rsp
>  	USERGS_SYSRET64
>  END(entry_SYSCALL_64)
>  
> -- 

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/18] x86/asm/64: Merge the fast and slow SYSRET paths
  2017-10-26  8:26 ` [PATCH 07/18] x86/asm/64: Merge the fast and slow SYSRET paths Andy Lutomirski
@ 2017-10-27 20:11   ` Borislav Petkov
  2017-11-01 11:29     ` Andy Lutomirski
  2017-11-01 17:25   ` Brian Gerst
  1 sibling, 1 reply; 57+ messages in thread
From: Borislav Petkov @ 2017-10-27 20:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds

On Thu, Oct 26, 2017 at 01:26:39AM -0700, Andy Lutomirski wrote:
> They did almost the same thing.  Remove a bunch of pointless
> instructions (mostly hidden in macros) and reduce cognitive load by
> merging them.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/entry/entry_64.S | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 9dafafa3e0ec..c855ee91a3a5 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -220,10 +220,9 @@ entry_SYSCALL_64_fastpath:
>  	TRACE_IRQS_ON		/* user mode is traced as IRQs on */
>  	movq	RIP(%rsp), %rcx
>  	movq	EFLAGS(%rsp), %r11
> -	RESTORE_C_REGS_EXCEPT_RCX_R11
> -	movq	RSP(%rsp), %rsp
> +	addq	$6*8, %rsp	/* skip extra regs -- they were preserved */
>  	UNWIND_HINT_EMPTY
> -	USERGS_SYSRET64
> +	jmp	.Lpop_c_regs_except_rcx_r11_and_sysret
>  
>  1:
>  	/*
> @@ -315,6 +314,7 @@ syscall_return_via_sysret:
>  	/* rcx and r11 are already restored (see code above) */
>  	UNWIND_HINT_EMPTY
>  	POP_EXTRA_REGS
> +.Lpop_c_regs_except_rcx_r11_and_sysret:

.Lpop_regs_sysret I guess.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 09/18] x86/asm/64: De-Xen-ify our NMI code
  2017-10-26  8:26 ` [PATCH 09/18] x86/asm/64: De-Xen-ify our NMI code Andy Lutomirski
  2017-10-26  8:41   ` Juergen Gross
@ 2017-10-27 20:11   ` Borislav Petkov
  1 sibling, 0 replies; 57+ messages in thread
From: Borislav Petkov @ 2017-10-27 20:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds,
	Juergen Gross, Boris Ostrovsky

On Thu, Oct 26, 2017 at 01:26:41AM -0700, Andy Lutomirski wrote:
> Xen PV is fundamentally incompatible with our fancy NMI code: it
> doesn't use IST at all, and Xen entries clobber two stack slots
> below the hardware frame.
> 
> Drop Xen PV support from our NMI code entirely.
> 
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/entry/entry_64.S | 30 ++++++++++++++++++------------
>  1 file changed, 18 insertions(+), 12 deletions(-)

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/18] x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of native_load_sp0()
  2017-10-26  8:26 ` [PATCH 10/18] x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of native_load_sp0() Andy Lutomirski
  2017-10-26 18:00   ` Brian Gerst
  2017-10-27 13:51   ` Thomas Gleixner
@ 2017-10-27 20:11   ` Borislav Petkov
  2 siblings, 0 replies; 57+ messages in thread
From: Borislav Petkov @ 2017-10-27 20:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds

On Thu, Oct 26, 2017 at 01:26:42AM -0700, Andy Lutomirski wrote:
> This causees the MSR_IA32_SYSENTER_CS write to move out of the

Spellcheck pls.

> paravirt hook.  This shouldn't affect Xen PV: Xen already ignores
> MSR_IA32_SYSENTER_ESP writes.  In any event, Xen doesn't support
> vm86() in a useful way.
> 
> Note to any potential backporters: This patch won't break lguest, as
> lguest didn't have any SYSENTER support at all.

...

> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index 11966251cd42..84d6c9f554d0 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -287,6 +287,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>  	 * current_thread_info().
>  	 */
>  	load_sp0(tss, next);
> +	refresh_sysenter_cs(next);  /* in case prev or next is vm86 */

Ewww, side comments.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 11/18] x86/asm/64: Pass sp0 directly to load_sp0()
  2017-10-26  8:26 ` [PATCH 11/18] x86/asm/64: Pass sp0 directly to load_sp0() Andy Lutomirski
@ 2017-10-27 20:12   ` Borislav Petkov
  0 siblings, 0 replies; 57+ messages in thread
From: Borislav Petkov @ 2017-10-27 20:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds

On Thu, Oct 26, 2017 at 01:26:43AM -0700, Andy Lutomirski wrote:
> load_sp0() had an odd signature:
> 
> void load_sp0(struct tss_struct *tss, struct thread_struct *thread);
> 
> Simplify it to:
> 
> void load_sp0(unsigned long sp0);
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/include/asm/paravirt.h       |  5 ++---
>  arch/x86/include/asm/paravirt_types.h |  2 +-
>  arch/x86/include/asm/processor.h      |  9 ++++-----
>  arch/x86/kernel/cpu/common.c          |  4 ++--
>  arch/x86/kernel/process_32.c          |  2 +-
>  arch/x86/kernel/process_64.c          |  2 +-
>  arch/x86/kernel/vm86_32.c             | 14 ++++++--------
>  arch/x86/xen/enlighten_pv.c           |  7 +++----
>  8 files changed, 20 insertions(+), 25 deletions(-)

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 17/18] x86/asm/64: Remove thread_struct::sp0
  2017-10-26  8:26 ` [PATCH 17/18] x86/asm/64: Remove thread_struct::sp0 Andy Lutomirski
@ 2017-10-28  0:52   ` Brian Gerst
  2017-11-01 10:23   ` Borislav Petkov
  1 sibling, 0 replies; 57+ messages in thread
From: Brian Gerst @ 2017-10-28  0:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, Borislav Petkov, linux-kernel, Dave Hansen, Linus Torvalds

On Thu, Oct 26, 2017 at 4:26 AM, Andy Lutomirski <luto@kernel.org> wrote:
> On x86_64, we can easily calculate sp0 when needed instead of
> storing it in thread_struct.
>
> On x86_32, a similar cleanup would be possible, but it would require
> cleaning up the vm86 code first, and that can wait for a later
> cleanup series.
>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/include/asm/compat.h    |  1 +
>  arch/x86/include/asm/processor.h | 33 +++++++++++++++------------------
>  arch/x86/include/asm/switch_to.h |  6 ++++++
>  arch/x86/kernel/process_64.c     |  1 -
>  4 files changed, 22 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
> index 5343c19814b3..948b6d8ec46f 100644
> --- a/arch/x86/include/asm/compat.h
> +++ b/arch/x86/include/asm/compat.h
> @@ -6,6 +6,7 @@
>   */
>  #include <linux/types.h>
>  #include <linux/sched.h>
> +#include <linux/sched/task_stack.h>
>  #include <asm/processor.h>
>  #include <asm/user32.h>
>  #include <asm/unistd.h>
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index ad59cec14239..562c575d8bc3 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -430,7 +430,9 @@ typedef struct {
>  struct thread_struct {
>         /* Cached TLS descriptors: */
>         struct desc_struct      tls_array[GDT_ENTRY_TLS_ENTRIES];
> +#ifdef CONFIG_X86_32
>         unsigned long           sp0;
> +#endif
>         unsigned long           sp;
>  #ifdef CONFIG_X86_32
>         unsigned long           sysenter_cs;
> @@ -797,6 +799,13 @@ static inline void spin_lock_prefetch(const void *x)
>
>  #define task_top_of_stack(task) ((unsigned long)(task_pt_regs(task) + 1))
>
> +#define task_pt_regs(task) \
> +({                                                                     \
> +       unsigned long __ptr = (unsigned long)task_stack_page(task);     \
> +       __ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING;             \
> +       ((struct pt_regs *)__ptr) - 1;                                  \
> +})
> +
>  #ifdef CONFIG_X86_32
>  /*
>   * User space process size: 3GB (default).
> @@ -816,23 +825,6 @@ static inline void spin_lock_prefetch(const void *x)
>         .addr_limit             = KERNEL_DS,                              \
>  }
>
> -/*
> - * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack.
> - * This is necessary to guarantee that the entire "struct pt_regs"
> - * is accessible even if the CPU haven't stored the SS/ESP registers
> - * on the stack (interrupt gate does not save these registers
> - * when switching to the same priv ring).
> - * Therefore beware: accessing the ss/esp fields of the
> - * "struct pt_regs" is possible, but they may contain the
> - * completely wrong values.
> - */
> -#define task_pt_regs(task) \
> -({                                                                     \
> -       unsigned long __ptr = (unsigned long)task_stack_page(task);     \
> -       __ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING;             \
> -       ((struct pt_regs *)__ptr) - 1;                                  \
> -})
> -
>  #define KSTK_ESP(task)         (task_pt_regs(task)->sp)
>
>  #else
> @@ -865,12 +857,17 @@ static inline void spin_lock_prefetch(const void *x)
>  #define STACK_TOP              TASK_SIZE_LOW
>  #define STACK_TOP_MAX          TASK_SIZE_MAX
>
> +#ifdef CONFIG_X86_32
>  #define INIT_THREAD  {                                         \
>         .sp0                    = TOP_OF_INIT_STACK,            \
>         .addr_limit             = KERNEL_DS,                    \
>  }
> +#else
> +#define INIT_THREAD  {                                         \
> +       .addr_limit             = KERNEL_DS,                    \
> +}
> +#endif

There is already a separate INIT_THREAD for 32-bit.  Just delete the sp0 member.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 00/18] Pile o' entry/exit/sp0 changes
  2017-10-26 15:55 ` [PATCH 00/18] Pile o' entry/exit/sp0 changes Linus Torvalds
@ 2017-10-28  7:31   ` Andy Lutomirski
  0 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-28  7:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel,
	Brian Gerst, Dave Hansen

On Thu, Oct 26, 2017 at 8:55 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Thu, 26 Oct 2017, Andy Lutomirski wrote:
>>
>> Here's the first "real" version of this series.  It's based on Linus'
>> tree from a couple days ago.
>
> Not seeing any problems in this series. I was hoping it would remove more
> lines of code than it adds, and it's the reverse, but that seems to be
> mostly the new push/pop stuff. I'm taking it we couldn't just replace the
> 'mov' instructions in the save/restore macros..

The problem is that I didn't quite finish the job.  v2 gets rid of the
RESTORE macros entirely.  I'll deal with SAVE some other time.

>
> I *do* ask that when we have code like this:
>
>         POP_EXTRA_REGS
>         POP_C_REGS
>         addq    $8, %rsp
>
> that "addq $8" would have a comment about what stack entry we're skipping.
>
> I *think* it's 'orig_eax', but wouldn't it be nice to say so?
>

Done.  (It was orig_ax.)
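
That spot now looks something like this:

        POP_EXTRA_REGS
        POP_C_REGS
        addq    $8, %rsp        /* skip regs->orig_ax */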

>             Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 14/18] x86/boot/64: Stop initializing TSS.sp0 at boot
  2017-10-27 18:21   ` Dave Hansen
@ 2017-10-28  8:10     ` Andy Lutomirski
  0 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-10-28  8:10 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel,
	Brian Gerst, Linus Torvalds

On Fri, Oct 27, 2017 at 11:21 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> On 10/26/2017 01:26 AM, Andy Lutomirski wrote:
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>> @@ -48,7 +48,8 @@
>>   */
>>  __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
>>       .x86_tss = {
>> -             .sp0 = TOP_OF_INIT_STACK,
>> +             /* Initialize sp0 to a value that is definitely invalid. */
>> +             .sp0 = (1UL << (BITS_PER_LONG-1)) + 1,
>
> This confused me at first: How does this not poison the init task's stack?
>
> Should the comment maybe say something like:
>
>         The hardware only uses .sp0 (or sp1 or sp2 for that matter) when
>         doing ring transitions.  Since the init task never runs anything
>         other than ring 0, it has no need for a valid value here.
>         Poison it.
>
> to clarify what's going on?

Done.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 15/18] x86/asm/64: Remove all remaining direct thread_struct::sp0 reads
  2017-10-26  8:26 ` [PATCH 15/18] x86/asm/64: Remove all remaining direct thread_struct::sp0 reads Andy Lutomirski
@ 2017-11-01 10:08   ` Borislav Petkov
  0 siblings, 0 replies; 57+ messages in thread
From: Borislav Petkov @ 2017-11-01 10:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds

On Thu, Oct 26, 2017 at 01:26:47AM -0700, Andy Lutomirski wrote:
> The only remaining readers are in context switch code or vm86(), and
> they all just want to update TSS.sp0 to match the current task.
> Replace them all with a new helper update_sp0().
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/include/asm/switch_to.h | 6 ++++++
>  arch/x86/kernel/process_32.c     | 2 +-
>  arch/x86/kernel/process_64.c     | 2 +-
>  arch/x86/kernel/vm86_32.c        | 4 ++--
>  4 files changed, 10 insertions(+), 4 deletions(-)

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 16/18] x86/boot/32: Fix cpu_current_top_of_stack initialization at boot
  2017-10-26  8:26 ` [PATCH 16/18] x86/boot/32: Fix cpu_current_top_of_stack initialization at boot Andy Lutomirski
@ 2017-11-01 10:18   ` Borislav Petkov
  0 siblings, 0 replies; 57+ messages in thread
From: Borislav Petkov @ 2017-11-01 10:18 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds

On Thu, Oct 26, 2017 at 01:26:48AM -0700, Andy Lutomirski wrote:
> cpu_current_top_of_stack's initialization forgot about
> TOP_OF_KERNEL_STACK_PADDING.  This bug didn't matter because the
> idle threads never enter user mode.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/kernel/smpboot.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index ad59edd84de7..06c18fe1c09e 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -961,8 +961,7 @@ void common_cpu_up(unsigned int cpu, struct task_struct *idle)
>  #ifdef CONFIG_X86_32
>  	/* Stack for startup_32 can be just as for start_secondary onwards */
>  	irq_ctx_init(cpu);
> -	per_cpu(cpu_current_top_of_stack, cpu) =
> -		(unsigned long)task_stack_page(idle) + THREAD_SIZE;
> +	per_cpu(cpu_current_top_of_stack, cpu) = task_top_of_stack(idle);
>  #else
>  	initial_gs = per_cpu_offset(cpu);
>  #endif
> -- 

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 17/18] x86/asm/64: Remove thread_struct::sp0
  2017-10-26  8:26 ` [PATCH 17/18] x86/asm/64: Remove thread_struct::sp0 Andy Lutomirski
  2017-10-28  0:52   ` Brian Gerst
@ 2017-11-01 10:23   ` Borislav Petkov
  2017-11-01 10:40     ` Andy Lutomirski
  1 sibling, 1 reply; 57+ messages in thread
From: Borislav Petkov @ 2017-11-01 10:23 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds

On Thu, Oct 26, 2017 at 01:26:49AM -0700, Andy Lutomirski wrote:
> On x86_64, we can easily calculate sp0 when needed instead of
> storing it in thread_struct.
> 
> On x86_32, a similar cleanup would be possible, but it would require
> cleaning up the vm86 code first, and that can wait for a later
> cleanup series.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>

...

> @@ -816,23 +825,6 @@ static inline void spin_lock_prefetch(const void *x)
>  	.addr_limit		= KERNEL_DS,				  \
>  }
>  
> -/*
> - * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack.
> - * This is necessary to guarantee that the entire "struct pt_regs"
> - * is accessible even if the CPU haven't stored the SS/ESP registers
> - * on the stack (interrupt gate does not save these registers
> - * when switching to the same priv ring).
> - * Therefore beware: accessing the ss/esp fields of the
> - * "struct pt_regs" is possible, but they may contain the
> - * completely wrong values.
> - */

Why are we removing the explanation of TOP_OF_KERNEL_STACK_PADDING? Is
that going to change later?

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 18/18] x86/traps: Use a new on_thread_stack() helper to clean up an assertion
  2017-10-26  8:26 ` [PATCH 18/18] x86/traps: Use a new on_thread_stack() helper to clean up an assertion Andy Lutomirski
@ 2017-11-01 10:31   ` Borislav Petkov
  2017-11-01 10:45     ` Andy Lutomirski
  0 siblings, 1 reply; 57+ messages in thread
From: Borislav Petkov @ 2017-11-01 10:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds

On Thu, Oct 26, 2017 at 01:26:50AM -0700, Andy Lutomirski wrote:
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/include/asm/processor.h | 6 ++++++
>  arch/x86/kernel/traps.c          | 3 +--
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 562c575d8bc3..26fc33231cbd 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -541,6 +541,12 @@ static inline unsigned long current_top_of_stack(void)
>  #endif
>  }
>  
> +static inline bool on_thread_stack(void)
> +{
> +	return (unsigned long)(current_top_of_stack() -
> +			       current_stack_pointer) < THREAD_SIZE;
> +}
> +
>  #ifdef CONFIG_PARAVIRT
>  #include <asm/paravirt.h>
>  #else
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 67db4f43309e..42a9c4458f5d 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -141,8 +141,7 @@ void ist_begin_non_atomic(struct pt_regs *regs)
>  	 * will catch asm bugs and any attempt to use ist_preempt_enable
>  	 * from double_fault.
>  	 */
> -	BUG_ON((unsigned long)(current_top_of_stack() -
> -			       current_stack_pointer) >= THREAD_SIZE);
> +	BUG_ON(!on_thread_stack());
>  
>  	preempt_enable_no_resched();
>  }
> -- 

Having a commit message is always better. Other than that:

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 17/18] x86/asm/64: Remove thread_struct::sp0
  2017-11-01 10:23   ` Borislav Petkov
@ 2017-11-01 10:40     ` Andy Lutomirski
  0 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-11-01 10:40 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds

On Wed, Nov 1, 2017 at 3:23 AM, Borislav Petkov <bp@suse.de> wrote:
> On Thu, Oct 26, 2017 at 01:26:49AM -0700, Andy Lutomirski wrote:
>> On x86_64, we can easily calculate sp0 when needed instead of
>> storing it in thread_struct.
>>
>> On x86_32, a similar cleanup would be possible, but it would require
>> cleaning up the vm86 code first, and that can wait for a later
>> cleanup series.
>>
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>
> ...
>
>> @@ -816,23 +825,6 @@ static inline void spin_lock_prefetch(const void *x)
>>       .addr_limit             = KERNEL_DS,                              \
>>  }
>>
>> -/*
>> - * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack.
>> - * This is necessary to guarantee that the entire "struct pt_regs"
>> - * is accessible even if the CPU haven't stored the SS/ESP registers
>> - * on the stack (interrupt gate does not save these registers
>> - * when switching to the same priv ring).
>> - * Therefore beware: accessing the ss/esp fields of the
>> - * "struct pt_regs" is possible, but they may contain the
>> - * completely wrong values.
>> - */
>
> Why are we removing the explanation of TOP_OF_KERNEL_STACK_PADDING? Is
> that going to change later?

There's another nearly identical explanatory comment, so it seemed redundant.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 18/18] x86/traps: Use a new on_thread_stack() helper to clean up an assertion
  2017-11-01 10:31   ` Borislav Petkov
@ 2017-11-01 10:45     ` Andy Lutomirski
  0 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-11-01 10:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds

I added a commit message :)

On Wed, Nov 1, 2017 at 3:31 AM, Borislav Petkov <bp@suse.de> wrote:
> On Thu, Oct 26, 2017 at 01:26:50AM -0700, Andy Lutomirski wrote:
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> ---
>>  arch/x86/include/asm/processor.h | 6 ++++++
>>  arch/x86/kernel/traps.c          | 3 +--
>>  2 files changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
>> index 562c575d8bc3..26fc33231cbd 100644
>> --- a/arch/x86/include/asm/processor.h
>> +++ b/arch/x86/include/asm/processor.h
>> @@ -541,6 +541,12 @@ static inline unsigned long current_top_of_stack(void)
>>  #endif
>>  }
>>
>> +static inline bool on_thread_stack(void)
>> +{
>> +     return (unsigned long)(current_top_of_stack() -
>> +                            current_stack_pointer) < THREAD_SIZE;
>> +}
>> +
>>  #ifdef CONFIG_PARAVIRT
>>  #include <asm/paravirt.h>
>>  #else
>> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
>> index 67db4f43309e..42a9c4458f5d 100644
>> --- a/arch/x86/kernel/traps.c
>> +++ b/arch/x86/kernel/traps.c
>> @@ -141,8 +141,7 @@ void ist_begin_non_atomic(struct pt_regs *regs)
>>        * will catch asm bugs and any attempt to use ist_preempt_enable
>>        * from double_fault.
>>        */
>> -     BUG_ON((unsigned long)(current_top_of_stack() -
>> -                            current_stack_pointer) >= THREAD_SIZE);
>> +     BUG_ON(!on_thread_stack());
>>
>>       preempt_enable_no_resched();
>>  }
>> --
>
> Having a commit message is always better. Other than that:
>
> Reviewed-by: Borislav Petkov <bp@suse.de>
>
> --
> Regards/Gruss,
>     Boris.
>
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
> --

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/18] x86/asm/64: Merge the fast and slow SYSRET paths
  2017-10-27 20:11   ` Borislav Petkov
@ 2017-11-01 11:29     ` Andy Lutomirski
  2017-11-01 12:35       ` Borislav Petkov
  0 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2017-11-01 11:29 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds

On Fri, Oct 27, 2017 at 1:11 PM, Borislav Petkov <bp@suse.de> wrote:
> On Thu, Oct 26, 2017 at 01:26:39AM -0700, Andy Lutomirski wrote:
>> They did almost the same thing.  Remove a bunch of pointless
>> instructions (mostly hidden in macros) and reduce cognitive load by
>> merging them.
>>
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> ---
>>  arch/x86/entry/entry_64.S | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index 9dafafa3e0ec..c855ee91a3a5 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -220,10 +220,9 @@ entry_SYSCALL_64_fastpath:
>>       TRACE_IRQS_ON           /* user mode is traced as IRQs on */
>>       movq    RIP(%rsp), %rcx
>>       movq    EFLAGS(%rsp), %r11
>> -     RESTORE_C_REGS_EXCEPT_RCX_R11
>> -     movq    RSP(%rsp), %rsp
>> +     addq    $6*8, %rsp      /* skip extra regs -- they were preserved */
>>       UNWIND_HINT_EMPTY
>> -     USERGS_SYSRET64
>> +     jmp     .Lpop_c_regs_except_rcx_r11_and_sysret
>>
>>  1:
>>       /*
>> @@ -315,6 +314,7 @@ syscall_return_via_sysret:
>>       /* rcx and r11 are already restored (see code above) */
>>       UNWIND_HINT_EMPTY
>>       POP_EXTRA_REGS
>> +.Lpop_c_regs_except_rcx_r11_and_sysret:
>
> .Lpop_regs_sysret I guess.
>

I'm inclined to leave it.  I first wrote it without the long name and
then I had to re-read the code to make sure I got the register state
right.  The long name serves as documentation that we need rcx and r11
to have special contents.  There's only one place that jumps there and
it's in the same function.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 02/18] x86/asm/64: Split the iret-to-user and iret-to-kernel paths
  2017-10-27 20:04   ` Borislav Petkov
@ 2017-11-01 11:32     ` Andy Lutomirski
  0 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-11-01 11:32 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds

On Fri, Oct 27, 2017 at 1:04 PM, Borislav Petkov <bp@suse.de> wrote:
> On Thu, Oct 26, 2017 at 01:26:34AM -0700, Andy Lutomirski wrote:
>> These code paths will diverge soon.
>>
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> ---
>>  arch/x86/entry/entry_64.S        | 32 +++++++++++++++++++++++---------
>>  arch/x86/entry/entry_64_compat.S |  2 +-
>>  arch/x86/kernel/head_64.S        |  2 +-
>>  3 files changed, 25 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index afe1f403fa0e..493e5e234d36 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -321,7 +321,7 @@ syscall_return_via_sysret:
>>
>>  opportunistic_sysret_failed:
>>       SWAPGS
>> -     jmp     restore_regs_and_iret
>> +     jmp     restore_regs_and_return_to_usermode
>
> rstor_regs_ret_user
>
> still sounds pretty OK to me, and it is shorter and uses the *RSTOR
> spelling the respective FPU insns use, so it should be recognizable. :-)
>
> And the other one could be
>
> rstor_regs_ret_kernel
>
> of course.
>
>>  END(entry_SYSCALL_64)
>>
>>  ENTRY(stub_ptregs_64)
>> @@ -423,7 +423,7 @@ ENTRY(ret_from_fork)
>>       call    syscall_return_slowpath /* returns with IRQs disabled */
>>       TRACE_IRQS_ON                   /* user mode is traced as IRQS on */
>>       SWAPGS
>> -     jmp     restore_regs_and_iret
>> +     jmp     restore_regs_and_return_to_usermode
>>
>>  1:
>>       /* kernel thread */
>> @@ -612,7 +612,19 @@ GLOBAL(retint_user)
>>       call    prepare_exit_to_usermode
>>       TRACE_IRQS_IRETQ
>>       SWAPGS
>> -     jmp     restore_regs_and_iret
>> +
>> +GLOBAL(restore_regs_and_return_to_usermode)
>> +#ifdef CONFIG_DEBUG_ENTRY
>> +     testl   $3, CS(%rsp)
>
> testb
>
> ditto for the other spot.
>
>> +     jnz     1f
>> +     ud2
>> +1:
>> +#endif
>
> Why aren't we exploding here unconditionally? I mean, we don't want to
> return with RPL != 3 CS for whatever reason...

Performance?  I'll admit that both of these paths are extremely slow
no matter what, but it's nice to get a real dump if we screw it up.

I'll leave it as is for v2.  Feel free to argue more in v2 :)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 02/18] x86/asm/64: Split the iret-to-user and iret-to-kernel paths
  2017-10-27 18:05   ` Dave Hansen
@ 2017-11-01 11:34     ` Andy Lutomirski
  0 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-11-01 11:34 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel,
	Brian Gerst, Linus Torvalds

On Fri, Oct 27, 2017 at 11:05 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> On 10/26/2017 01:26 AM, Andy Lutomirski wrote:
>> +GLOBAL(restore_regs_and_return_to_usermode)
>> +#ifdef CONFIG_DEBUG_ENTRY
>> +     testl   $3, CS(%rsp)
>> +     jnz     1f
>> +     ud2
>
> A nit from the mere mortals in the audience: Could we start commenting
> or make a constant for the user segment bits in CS?

Yeah.  We have such a define, but it's not currently usable from asm.
Also, we can't do the obvious:

testl $SEGMENT_RPL_MASK, ...
jump_if_not_equal_to_KERNEL_RPL

because that makes no sense in asm :(
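
(In C terms, the check that the DEBUG_ENTRY assembly open-codes is roughly
the sketch below.  SEGMENT_RPL_MASK and USER_RPL are the existing
<asm/segment.h> constants; the helper name is made up for illustration.)

#include <linux/types.h>	/* bool */
#include <asm/ptrace.h>		/* struct pt_regs */
#include <asm/segment.h>	/* SEGMENT_RPL_MASK, USER_RPL */

/* Sketch: the RPL bits of the saved CS selector say whether this frame
 * returns to user mode (RPL 3) or to the kernel (RPL 0). */
static inline bool frame_returns_to_user(struct pt_regs *regs)
{
	return (regs->cs & SEGMENT_RPL_MASK) == USER_RPL;
}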

>
> Also, it would be nice to explain what's going on here.  Maybe:
>
> /*
>  * We think we are returning to the kernel.  Check the
>  * registers we are about to restore and if we appear to
>  * be returning to userspace, do something that will cause
>  * a fault and hopefully an oops report.
>  */
>
> Otherwise, I really like this change.  It's really hard to figure out
> what the context is in the entry assembly in a lot of cases.  It's a
> place where code reuse actually makes things harder to follow.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/18] x86/asm/64: Merge the fast and slow SYSRET paths
  2017-11-01 11:29     ` Andy Lutomirski
@ 2017-11-01 12:35       ` Borislav Petkov
  2017-11-01 17:26         ` Thomas Gleixner
  0 siblings, 1 reply; 57+ messages in thread
From: Borislav Petkov @ 2017-11-01 12:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds

On Wed, Nov 01, 2017 at 04:29:21AM -0700, Andy Lutomirski wrote:
> I'm inclined to leave it.  I first wrote it without the long name and
> then I had to re-read the code to make sure I got the register state
> right.  The long name serves as documentation that we need rcx and r11
> to have special contents.  There's only one place that jumps there and
> it's in the same function.

Frankly, I'd prefer a sane label name and a good comment over an
insanely long label name which is not as explanatory as a normal
English sentence or two.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/18] x86/asm/64: Merge the fast and slow SYSRET paths
  2017-10-26  8:26 ` [PATCH 07/18] x86/asm/64: Merge the fast and slow SYSRET paths Andy Lutomirski
  2017-10-27 20:11   ` Borislav Petkov
@ 2017-11-01 17:25   ` Brian Gerst
  2017-11-01 20:35     ` Andy Lutomirski
  1 sibling, 1 reply; 57+ messages in thread
From: Brian Gerst @ 2017-11-01 17:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, Borislav Petkov, linux-kernel, Dave Hansen, Linus Torvalds

On Thu, Oct 26, 2017 at 4:26 AM, Andy Lutomirski <luto@kernel.org> wrote:
> They did almost the same thing.  Remove a bunch of pointless
> instructions (mostly hidden in macros) and reduce cognitive load by
> merging them.
>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/entry/entry_64.S | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 9dafafa3e0ec..c855ee91a3a5 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -220,10 +220,9 @@ entry_SYSCALL_64_fastpath:
>         TRACE_IRQS_ON           /* user mode is traced as IRQs on */
>         movq    RIP(%rsp), %rcx
>         movq    EFLAGS(%rsp), %r11
> -       RESTORE_C_REGS_EXCEPT_RCX_R11
> -       movq    RSP(%rsp), %rsp
> +       addq    $6*8, %rsp      /* skip extra regs -- they were preserved */
>         UNWIND_HINT_EMPTY
> -       USERGS_SYSRET64
> +       jmp     .Lpop_c_regs_except_rcx_r11_and_sysret
>
>  1:
>         /*
> @@ -315,6 +314,7 @@ syscall_return_via_sysret:
>         /* rcx and r11 are already restored (see code above) */
>         UNWIND_HINT_EMPTY
>         POP_EXTRA_REGS
> +.Lpop_c_regs_except_rcx_r11_and_sysret:
>         popq    %rsi    /* skip r11 */
>         popq    %r10
>         popq    %r9

Wouldn't it be more logical to keep the SYSRET path at the end of the
fast path (reverse of what you are doing here)?  That way the fast
path falls through without jumping.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/18] x86/asm/64: Merge the fast and slow SYSRET paths
  2017-11-01 12:35       ` Borislav Petkov
@ 2017-11-01 17:26         ` Thomas Gleixner
  0 siblings, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2017-11-01 17:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds


On Wed, 1 Nov 2017, Borislav Petkov wrote:

> On Wed, Nov 01, 2017 at 04:29:21AM -0700, Andy Lutomirski wrote:
> > I'm inclined to leave it.  I first wrote it without the long name and
> > then I had to re-read the code to make sure I got the register state
> > right.  The long name serves as documentation that we need rcx and r11
> > to have special contents.  There's only one place that jumps there and
> > it's in the same function.
> 
> Frankly, I'd prefer a sane label name and a good comment over an
> insanely long label name which is not as explanatory as a normal
> English sentence or two.

Comments are always appreciated, but in that code really expressive labels
make it way better to follow and prevent accidental wreckage.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/18] x86/asm/64: Merge the fast and slow SYSRET paths
  2017-11-01 17:25   ` Brian Gerst
@ 2017-11-01 20:35     ` Andy Lutomirski
  0 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2017-11-01 20:35 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel,
	Dave Hansen, Linus Torvalds

On Wed, Nov 1, 2017 at 10:25 AM, Brian Gerst <brgerst@gmail.com> wrote:
> On Thu, Oct 26, 2017 at 4:26 AM, Andy Lutomirski <luto@kernel.org> wrote:
>> They did almost the same thing.  Remove a bunch of pointless
>> instructions (mostly hidden in macros) and reduce cognitive load by
>> merging them.
>>
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> ---
>>  arch/x86/entry/entry_64.S | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index 9dafafa3e0ec..c855ee91a3a5 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -220,10 +220,9 @@ entry_SYSCALL_64_fastpath:
>>         TRACE_IRQS_ON           /* user mode is traced as IRQs on */
>>         movq    RIP(%rsp), %rcx
>>         movq    EFLAGS(%rsp), %r11
>> -       RESTORE_C_REGS_EXCEPT_RCX_R11
>> -       movq    RSP(%rsp), %rsp
>> +       addq    $6*8, %rsp      /* skip extra regs -- they were preserved */
>>         UNWIND_HINT_EMPTY
>> -       USERGS_SYSRET64
>> +       jmp     .Lpop_c_regs_except_rcx_r11_and_sysret
>>
>>  1:
>>         /*
>> @@ -315,6 +314,7 @@ syscall_return_via_sysret:
>>         /* rcx and r11 are already restored (see code above) */
>>         UNWIND_HINT_EMPTY
>>         POP_EXTRA_REGS
>> +.Lpop_c_regs_except_rcx_r11_and_sysret:
>>         popq    %rsi    /* skip r11 */
>>         popq    %r10
>>         popq    %r9
>
> Wouldn't it be more logical to keep the SYSRET path at the end of the
> fast path (reverse of what you are doing here)?  That way the fast
> path falls through without jumping.

Ugh, probably.  I think I'll save that for v3, though.  I want to keep
this moving.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [01/18] x86/asm/64: Remove the restore_c_regs_and_iret label
  2017-10-26  8:26 ` [PATCH 01/18] x86/asm/64: Remove the restore_c_regs_and_iret label Andy Lutomirski
  2017-10-26 15:07   ` Borislav Petkov
@ 2017-11-10  6:08   ` kemi
  1 sibling, 0 replies; 57+ messages in thread
From: kemi @ 2017-11-10  6:08 UTC (permalink / raw)
  To: Andrew Lutomirski, X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds

Some performance regressions/improvements were reported by LKP-tools for this
patch series when tested on an Intel Atom processor, so I'm posting the data
here for your reference.

Branch:x86/entry_consolidation
Commit id:
     base:50da9d439392fdd91601d36e7f05728265bff262
     head:69af865668fdb86a95e4e948b1f48b2689d60b73
Benchmark suite:will-it-scale
Download link:https://github.com/antonblanchard/will-it-scale/tree/master/tests
Metrics:
     will-it-scale.per_process_ops=processes/nr_cpu
     will-it-scale.per_thread_ops=threads/nr_cpu
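
(For reading the tables below: the "change" column is simply the relative
delta between the base and head columns.  A trivial standalone sketch, with
one value from the eventfd1 row plugged in:)

#include <stdio.h>

/* Relative change of head vs. base, in percent. */
static double pct_change(double base, double head)
{
	return (head - base) / base * 100.0;
}

int main(void)
{
	/* eventfd1, will-it-scale.per_process_ops */
	printf("%+.1f%%\n", pct_change(1505677, 1416132));	/* prints -5.9% */
	return 0;
}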

tbox:lkp-avoton3(nr_cpu=8,memory=16G)
CPU: Intel(R) Atom(TM) CPU  C2750  @ 2.40GHz
Performance regressions with the will-it-scale benchmark suite:
testcase        base            change          head            metric                   
eventfd1        1505677         -5.9%           1416132         will-it-scale.per_process_ops
                1352716         -3.0%           1311943         will-it-scale.per_thread_ops
lseek2          7306698         -4.3%           6991473         will-it-scale.per_process_ops
                4906388         -3.6%           4730531         will-it-scale.per_thread_ops
lseek1          7355365         -4.2%           7046224         will-it-scale.per_process_ops
                4928961         -3.7%           4748791         will-it-scale.per_thread_ops
getppid1        8479806         -4.1%           8129026         will-it-scale.per_process_ops
                8515252         -4.1%           8162076         will-it-scale.per_thread_ops
lock1           1054249         -3.2%           1020895         will-it-scale.per_process_ops
                989145          -2.6%           963578          will-it-scale.per_thread_ops
dup1            2675825         -3.0%           2596257         will-it-scale.per_process_ops
futex3          4986520         -2.8%           4846640         will-it-scale.per_process_ops
                5009388         -2.7%           4875126         will-it-scale.per_thread_ops
futex4          3932936         -2.0%           3854240         will-it-scale.per_process_ops
                3950138         -2.0%           3872615         will-it-scale.per_thread_ops
futex1          2941886         -1.8%           2888912         will-it-scale.per_process_ops
futex2          2500203         -1.6%           2461065         will-it-scale.per_process_ops
                1534692         -2.3%           1499532         will-it-scale.per_thread_ops
malloc1         61314           -1.0%           60725           will-it-scale.per_process_ops
                19996           -1.5%           19688           will-it-scale.per_thread_ops

Performance improvements with the will-it-scale benchmark suite:
testcase        base            change          head            metric                   
context_switch1 176376          +1.6%           179152          will-it-scale.per_process_ops
                180703          +1.9%           184209          will-it-scale.per_thread_ops
page_fault2     179716          +2.5%           184272          will-it-scale.per_process_ops
                146890          +2.8%           150989          will-it-scale.per_thread_ops
page_fault3     666953          +3.7%           691735          will-it-scale.per_process_ops
                464641          +5.0%           487952          will-it-scale.per_thread_ops
unix1           483094          +4.4%           504201          will-it-scale.per_process_ops
                450055          +7.5%           483637          will-it-scale.per_thread_ops
read2           575887          +5.0%           604440          will-it-scale.per_process_ops
                500319          +5.2%           526361          will-it-scale.per_thread_ops
poll1           4614597         +5.4%           4864022         will-it-scale.per_process_ops
                3981551         +5.8%           4213409         will-it-scale.per_thread_ops
pwrite2         383344          +5.7%           405151          will-it-scale.per_process_ops
                367006          +5.0%           385209          will-it-scale.per_thread_ops
sched_yield     3011191         +6.0%           3191710         will-it-scale.per_process_ops
                3024171         +6.1%           3208197         will-it-scale.per_thread_ops
pipe1           755487          +6.2%           802622          will-it-scale.per_process_ops
                705136          +8.8%           766950          will-it-scale.per_thread_ops
pwrite3         422850          +6.6%           450660          will-it-scale.per_process_ops
                413370          +3.7%           428704          will-it-scale.per_thread_ops
readseek1       972102          +6.7%           1036852         will-it-scale.per_process_ops
                844877          +6.6%           900686          will-it-scale.per_thread_ops
pwrite1         981310          +6.8%           1047809         will-it-scale.per_process_ops
                944421          +5.7%           998472          will-it-scale.per_thread_ops
pread2          444743          +6.9%           475332          will-it-scale.per_process_ops
                430299          +6.1%           456718          will-it-scale.per_thread_ops
writeseek1      849520          +7.0%           908672          will-it-scale.per_process_ops
                746978          +9.3%           816372          will-it-scale.per_thread_ops
pread3          1108949         +7.2%           1189021         will-it-scale.per_process_ops
                1088521         +5.5%           1148522         will-it-scale.per_thread_ops
mmap1           207314          +7.3%           222442          will-it-scale.per_process_ops
                82533           +6.9%           88199           will-it-scale.per_thread_ops
writeseek3      377973          +7.4%           405853          will-it-scale.per_process_ops
                333156          +11.4%          371100          will-it-scale.per_thread_ops
open2           266217          +7.6%           286335          will-it-scale.per_process_ops
                208208          +6.6%           222052          will-it-scale.per_thread_ops
unlink2         54774           +7.7%           59013           will-it-scale.per_process_ops
                53792           +7.0%           57584           will-it-scale.per_thread_ops
poll2           257458          +8.0%           278072          will-it-scale.per_process_ops
                153400          +8.4%           166256          will-it-scale.per_thread_ops
posix_semaphore1 19898603        +8.3%           21552049        will-it-scale.per_process_ops
                19797092        +8.4%           21458395        will-it-scale.per_thread_ops
pthread_mutex2  35871102        +8.4%           38868017        will-it-scale.per_process_ops
                21506625        +8.4%           23312550        will-it-scale.per_thread_ops
mmap2           154242          +8.5%           167348          will-it-scale.per_process_ops
                62234           +7.4%           66841           will-it-scale.per_thread_ops
unlink1         31487           +9.3%           34404           will-it-scale.per_process_ops
                31607           +8.5%           34285           will-it-scale.per_thread_ops
open1           280301          +9.9%           307995          will-it-scale.per_process_ops
                213863          +7.8%           230585          will-it-scale.per_thread_ops
signal1         355247          +11.2%          394875          will-it-scale.per_process_ops
                176973          +9.7%           194160          will-it-scale.per_thread_ops

============================================================================================
Branch:x86/entry_consolidation
Commit id:
     base:50da9d439392fdd91601d36e7f05728265bff262
     head:69af865668fdb86a95e4e948b1f48b2689d60b73
Benchmark suite:unixbench
Download link:https://github.com/kdlucas/byte-unixbench.git

tbox:lkp-avoton2(nr_cpu=8,memory=16G)
CPU: Intel(R) Atom(TM) CPU  C2750  @ 2.40GHz
Performance regressions with the unixbench benchmark suite:
testcase        base            change          head            metric                   
syscall         1206            -4.2%           1155            unixbench.score          
pipe            4851            -1.5%           4779            unixbench.score          
execl           498.83          -1.2%           492.90          unixbench.score          

Performance improvements with the unixbench benchmark suite:
testcase        base            change          head            metric                   
fsdisk          2150            +2.7%           2208            unixbench.score

=============================================================================================

On 2017-10-26 16:26, Andrew Lutomirski wrote:
> The only user was the 64-bit opportunistic SYSRET failure path, and
> that path didn't really need it.  This change makes the
> opportunistic SYSRET code a bit more straightforward and gets rid of
> the label.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> Reviewed-by: Borislav Petkov <bp@suse.de>
> ---
>  arch/x86/entry/entry_64.S | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 49167258d587..afe1f403fa0e 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -245,7 +245,6 @@ entry_SYSCALL64_slow_path:
>  	call	do_syscall_64		/* returns with IRQs disabled */
>  
>  return_from_SYSCALL_64:
> -	RESTORE_EXTRA_REGS
>  	TRACE_IRQS_IRETQ		/* we're about to change IF */
>  
>  	/*
> @@ -314,6 +313,7 @@ return_from_SYSCALL_64:
>  	 */
>  syscall_return_via_sysret:
>  	/* rcx and r11 are already restored (see code above) */
> +	RESTORE_EXTRA_REGS
>  	RESTORE_C_REGS_EXCEPT_RCX_R11
>  	movq	RSP(%rsp), %rsp
>  	UNWIND_HINT_EMPTY
> @@ -321,7 +321,7 @@ syscall_return_via_sysret:
>  
>  opportunistic_sysret_failed:
>  	SWAPGS
> -	jmp	restore_c_regs_and_iret
> +	jmp	restore_regs_and_iret
>  END(entry_SYSCALL_64)
>  
>  ENTRY(stub_ptregs_64)
> @@ -638,7 +638,6 @@ retint_kernel:
>   */
>  GLOBAL(restore_regs_and_iret)
>  	RESTORE_EXTRA_REGS
> -restore_c_regs_and_iret:
>  	RESTORE_C_REGS
>  	REMOVE_PT_GPREGS_FROM_STACK 8
>  	INTERRUPT_RETURN
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2017-11-10  6:09 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-26  8:26 [PATCH 00/18] Pile o' entry/exit/sp0 changes Andy Lutomirski
2017-10-26  8:26 ` [PATCH 01/18] x86/asm/64: Remove the restore_c_regs_and_iret label Andy Lutomirski
2017-10-26 15:07   ` Borislav Petkov
2017-11-10  6:08   ` [01/18] " kemi
2017-10-26  8:26 ` [PATCH 02/18] x86/asm/64: Split the iret-to-user and iret-to-kernel paths Andy Lutomirski
2017-10-27 18:05   ` Dave Hansen
2017-11-01 11:34     ` Andy Lutomirski
2017-10-27 20:04   ` Borislav Petkov
2017-11-01 11:32     ` Andy Lutomirski
2017-10-26  8:26 ` [PATCH 03/18] x86/asm/64: Move SWAPGS into the common iret-to-usermode path Andy Lutomirski
2017-10-26 13:52   ` Brian Gerst
2017-10-26 14:13     ` Dave Hansen
2017-10-26 14:28       ` Borislav Petkov
2017-10-27 15:44         ` Andy Lutomirski
2017-10-27 18:08   ` Dave Hansen
2017-10-26  8:26 ` [PATCH 04/18] x86/asm/64: Simplify reg restore code in the standard IRET paths Andy Lutomirski
2017-10-26  8:26 ` [PATCH 05/18] x86/asm/64: Shrink paranoid_exit_restore and make labels local Andy Lutomirski
2017-10-27 20:07   ` Borislav Petkov
2017-10-26  8:26 ` [PATCH 06/18] x86/asm/64: Use pop instead of movq in syscall_return_via_sysret Andy Lutomirski
2017-10-27 20:11   ` Borislav Petkov
2017-10-26  8:26 ` [PATCH 07/18] x86/asm/64: Merge the fast and slow SYSRET paths Andy Lutomirski
2017-10-27 20:11   ` Borislav Petkov
2017-11-01 11:29     ` Andy Lutomirski
2017-11-01 12:35       ` Borislav Petkov
2017-11-01 17:26         ` Thomas Gleixner
2017-11-01 17:25   ` Brian Gerst
2017-11-01 20:35     ` Andy Lutomirski
2017-10-26  8:26 ` [PATCH 08/18] xen: add xen nmi trap entry Andy Lutomirski
2017-10-26  8:26 ` [PATCH 09/18] x86/asm/64: De-Xen-ify our NMI code Andy Lutomirski
2017-10-26  8:41   ` Juergen Gross
2017-10-27 20:11   ` Borislav Petkov
2017-10-26  8:26 ` [PATCH 10/18] x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of native_load_sp0() Andy Lutomirski
2017-10-26 18:00   ` Brian Gerst
2017-10-27 13:51   ` Thomas Gleixner
2017-10-27 15:50     ` Andy Lutomirski
2017-10-27 20:11   ` Borislav Petkov
2017-10-26  8:26 ` [PATCH 11/18] x86/asm/64: Pass sp0 directly to load_sp0() Andy Lutomirski
2017-10-27 20:12   ` Borislav Petkov
2017-10-26  8:26 ` [PATCH 12/18] x86/asm: Add task_top_of_stack() to find the top of a task's stack Andy Lutomirski
2017-10-26  8:26 ` [PATCH 13/18] x86/xen/64: Clean up SP code in cpu_initialize_context() Andy Lutomirski
2017-10-26  8:43   ` Juergen Gross
2017-10-26  8:26 ` [PATCH 14/18] x86/boot/64: Stop initializing TSS.sp0 at boot Andy Lutomirski
2017-10-27 18:21   ` Dave Hansen
2017-10-28  8:10     ` Andy Lutomirski
2017-10-26  8:26 ` [PATCH 15/18] x86/asm/64: Remove all remaining direct thread_struct::sp0 reads Andy Lutomirski
2017-11-01 10:08   ` Borislav Petkov
2017-10-26  8:26 ` [PATCH 16/18] x86/boot/32: Fix cpu_current_top_of_stack initialization at boot Andy Lutomirski
2017-11-01 10:18   ` Borislav Petkov
2017-10-26  8:26 ` [PATCH 17/18] x86/asm/64: Remove thread_struct::sp0 Andy Lutomirski
2017-10-28  0:52   ` Brian Gerst
2017-11-01 10:23   ` Borislav Petkov
2017-11-01 10:40     ` Andy Lutomirski
2017-10-26  8:26 ` [PATCH 18/18] x86/traps: Use a new on_thread_stack() helper to clean up an assertion Andy Lutomirski
2017-11-01 10:31   ` Borislav Petkov
2017-11-01 10:45     ` Andy Lutomirski
2017-10-26 15:55 ` [PATCH 00/18] Pile o' entry/exit/sp0 changes Linus Torvalds
2017-10-28  7:31   ` Andy Lutomirski
