linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/20] Pile o' entry/exit/sp0 changes
@ 2017-11-02  7:58 Andy Lutomirski
  2017-11-02  7:58 ` [PATCH v2 01/20] x86/asm/64: Remove the restore_c_regs_and_iret label Andy Lutomirski
                   ` (19 more replies)
  0 siblings, 20 replies; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:58 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

Hi all-

Here we go again!  I haven't done most of the requested label
renaming, nor have I done Brian's reordering of the SYSCALL return
code, mainly because I want to keep the ball rolling.  As for label
renaming, I'd rather let everyone finish arguing and then rename any
labels at the end.  I personally don't mind longish asm labels.

Changes from v1:
 - Comment improvements.
 - Lots of Reviewed-bys added.
 - Fix vm86 bug.
 - Completely remove RESTORE_..._REGS_... (Linus) -- adds two patches.

Changes from the old RFC version:
 - Rebase
 - Add Juergen's patch
 - Add some assertions
 - Cleanups
 
Andy Lutomirski (19):
  x86/asm/64: Remove the restore_c_regs_and_iret label
  x86/asm/64: Split the iret-to-user and iret-to-kernel paths
  x86/asm/64: Move SWAPGS into the common iret-to-usermode path
  x86/asm/64: Simplify reg restore code in the standard IRET paths
  x86/asm/64: Shrink paranoid_exit_restore and make labels local
  x86/asm/64: Use pop instead of movq in syscall_return_via_sysret
  x86/asm/64: Merge the fast and slow SYSRET paths
  x86/entry/64: Use POP instead of MOV to restore regs on NMI return
  x86/entry/64: Remove the RESTORE_..._REGS infrastructure
  x86/asm/64: De-Xen-ify our NMI code
  x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of
    native_load_sp0()
  x86/asm/64: Pass sp0 directly to load_sp0()
  x86/asm: Add task_top_of_stack() to find the top of a task's stack
  x86/xen/64: Clean up SP code in cpu_initialize_context()
  x86/boot/64: Stop initializing TSS.sp0 at boot
  x86/asm/64: Remove all remaining direct thread_struct::sp0 reads
  x86/boot/32: Fix cpu_current_top_of_stack initialization at boot
  x86/asm/64: Remove thread_struct::sp0
  x86/traps: Use a new on_thread_stack() helper to clean up an assertion

Juergen Gross (1):
  xen: add xen nmi trap entry

 arch/x86/entry/calling.h              |  69 +++++------------
 arch/x86/entry/entry_64.S             | 139 ++++++++++++++++++++--------------
 arch/x86/entry/entry_64_compat.S      |   3 +-
 arch/x86/include/asm/compat.h         |   1 +
 arch/x86/include/asm/paravirt.h       |   5 +-
 arch/x86/include/asm/paravirt_types.h |   2 +-
 arch/x86/include/asm/processor.h      |  52 +++++--------
 arch/x86/include/asm/switch_to.h      |  24 ++++++
 arch/x86/include/asm/traps.h          |   2 +-
 arch/x86/kernel/cpu/common.c          |  12 ++-
 arch/x86/kernel/head_64.S             |   2 +-
 arch/x86/kernel/process.c             |   8 +-
 arch/x86/kernel/process_32.c          |   6 +-
 arch/x86/kernel/process_64.c          |   5 +-
 arch/x86/kernel/smpboot.c             |   3 +-
 arch/x86/kernel/traps.c               |   3 +-
 arch/x86/kernel/vm86_32.c             |  20 ++---
 arch/x86/xen/enlighten_pv.c           |   9 +--
 arch/x86/xen/smp_pv.c                 |  17 ++++-
 arch/x86/xen/xen-asm_64.S             |   2 +-
 20 files changed, 208 insertions(+), 176 deletions(-)

-- 
2.13.6

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v2 01/20] x86/asm/64: Remove the restore_c_regs_and_iret label
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
@ 2017-11-02  7:58 ` Andy Lutomirski
  2017-11-02 10:49   ` [tip:x86/asm] x86/entry/64: " tip-bot for Andy Lutomirski
  2017-11-02  7:58 ` [PATCH v2 02/20] x86/asm/64: Split the iret-to-user and iret-to-kernel paths Andy Lutomirski
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:58 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

The only user was the 64-bit opportunistic SYSRET failure path, and
that path didn't really need it.  This change makes the
opportunistic SYSRET code a bit more straightforward and gets rid of
the label.

Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 49167258d587..afe1f403fa0e 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -245,7 +245,6 @@ entry_SYSCALL64_slow_path:
 	call	do_syscall_64		/* returns with IRQs disabled */
 
 return_from_SYSCALL_64:
-	RESTORE_EXTRA_REGS
 	TRACE_IRQS_IRETQ		/* we're about to change IF */
 
 	/*
@@ -314,6 +313,7 @@ return_from_SYSCALL_64:
 	 */
 syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
+	RESTORE_EXTRA_REGS
 	RESTORE_C_REGS_EXCEPT_RCX_R11
 	movq	RSP(%rsp), %rsp
 	UNWIND_HINT_EMPTY
@@ -321,7 +321,7 @@ syscall_return_via_sysret:
 
 opportunistic_sysret_failed:
 	SWAPGS
-	jmp	restore_c_regs_and_iret
+	jmp	restore_regs_and_iret
 END(entry_SYSCALL_64)
 
 ENTRY(stub_ptregs_64)
@@ -638,7 +638,6 @@ retint_kernel:
  */
 GLOBAL(restore_regs_and_iret)
 	RESTORE_EXTRA_REGS
-restore_c_regs_and_iret:
 	RESTORE_C_REGS
 	REMOVE_PT_GPREGS_FROM_STACK 8
 	INTERRUPT_RETURN
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 02/20] x86/asm/64: Split the iret-to-user and iret-to-kernel paths
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
  2017-11-02  7:58 ` [PATCH v2 01/20] x86/asm/64: Remove the restore_c_regs_and_iret label Andy Lutomirski
@ 2017-11-02  7:58 ` Andy Lutomirski
  2017-11-02 10:49   ` [tip:x86/asm] x86/entry/64: Split the IRET-to-user and IRET-to-kernel paths tip-bot for Andy Lutomirski
  2017-11-02 10:50   ` [PATCH v2 02/20] x86/asm/64: Split the iret-to-user and iret-to-kernel paths Borislav Petkov
  2017-11-02  7:59 ` [PATCH v2 03/20] x86/asm/64: Move SWAPGS into the common iret-to-usermode path Andy Lutomirski
                   ` (17 subsequent siblings)
  19 siblings, 2 replies; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:58 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

These code paths will diverge soon.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S        | 34 +++++++++++++++++++++++++---------
 arch/x86/entry/entry_64_compat.S |  2 +-
 arch/x86/kernel/head_64.S        |  2 +-
 3 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index afe1f403fa0e..07fe816f0d28 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -321,7 +321,7 @@ syscall_return_via_sysret:
 
 opportunistic_sysret_failed:
 	SWAPGS
-	jmp	restore_regs_and_iret
+	jmp	restore_regs_and_return_to_usermode
 END(entry_SYSCALL_64)
 
 ENTRY(stub_ptregs_64)
@@ -423,7 +423,7 @@ ENTRY(ret_from_fork)
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
 	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
 	SWAPGS
-	jmp	restore_regs_and_iret
+	jmp	restore_regs_and_return_to_usermode
 
 1:
 	/* kernel thread */
@@ -612,7 +612,20 @@ GLOBAL(retint_user)
 	call	prepare_exit_to_usermode
 	TRACE_IRQS_IRETQ
 	SWAPGS
-	jmp	restore_regs_and_iret
+
+GLOBAL(restore_regs_and_return_to_usermode)
+#ifdef CONFIG_DEBUG_ENTRY
+	/* Assert that pt_regs indicates user mode. */
+	testl	$3, CS(%rsp)
+	jnz	1f
+	ud2
+1:
+#endif
+	RESTORE_EXTRA_REGS
+	RESTORE_C_REGS
+	REMOVE_PT_GPREGS_FROM_STACK 8
+	INTERRUPT_RETURN
+
 
 /* Returning to kernel space */
 retint_kernel:
@@ -632,11 +645,14 @@ retint_kernel:
 	 */
 	TRACE_IRQS_IRETQ
 
-/*
- * At this label, code paths which return to kernel and to user,
- * which come from interrupts/exception and from syscalls, merge.
- */
-GLOBAL(restore_regs_and_iret)
+GLOBAL(restore_regs_and_return_to_kernel)
+#ifdef CONFIG_DEBUG_ENTRY
+	/* Assert that pt_regs indicates kernel mode. */
+	testl	$3, CS(%rsp)
+	jz	1f
+	ud2
+1:
+#endif
 	RESTORE_EXTRA_REGS
 	RESTORE_C_REGS
 	REMOVE_PT_GPREGS_FROM_STACK 8
@@ -1327,7 +1343,7 @@ ENTRY(nmi)
 	 * work, because we don't want to enable interrupts.
 	 */
 	SWAPGS
-	jmp	restore_regs_and_iret
+	jmp	restore_regs_and_return_to_usermode
 
 .Lnmi_from_kernel:
 	/*
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index e26c25ca7756..9ca014a99968 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -337,7 +337,7 @@ ENTRY(entry_INT80_compat)
 	/* Go back to user mode. */
 	TRACE_IRQS_ON
 	SWAPGS
-	jmp	restore_regs_and_iret
+	jmp	restore_regs_and_return_to_usermode
 END(entry_INT80_compat)
 
 ENTRY(stub32_clone)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 513cbb012ecc..0b01a105251b 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -326,7 +326,7 @@ early_idt_handler_common:
 
 20:
 	decl early_recursion_flag(%rip)
-	jmp restore_regs_and_iret
+	jmp restore_regs_and_return_to_kernel
 ENDPROC(early_idt_handler_common)
 
 	__INITDATA
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 03/20] x86/asm/64: Move SWAPGS into the common iret-to-usermode path
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
  2017-11-02  7:58 ` [PATCH v2 01/20] x86/asm/64: Remove the restore_c_regs_and_iret label Andy Lutomirski
  2017-11-02  7:58 ` [PATCH v2 02/20] x86/asm/64: Split the iret-to-user and iret-to-kernel paths Andy Lutomirski
@ 2017-11-02  7:59 ` Andy Lutomirski
  2017-11-02 10:49   ` [tip:x86/asm] x86/entry/64: Move SWAPGS into the common IRET-to-usermode path tip-bot for Andy Lutomirski
  2017-11-02  7:59 ` [PATCH v2 04/20] x86/asm/64: Simplify reg restore code in the standard IRET paths Andy Lutomirski
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:59 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

All of the code paths that ended up doing IRET to usermode did
SWAPGS immediately beforehand.  Move the SWAPGS into the common
code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S        | 32 ++++++++++++++------------------
 arch/x86/entry/entry_64_compat.S |  3 +--
 2 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 07fe816f0d28..9bfa34c3b6b5 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -249,12 +249,14 @@ return_from_SYSCALL_64:
 
 	/*
 	 * Try to use SYSRET instead of IRET if we're returning to
-	 * a completely clean 64-bit userspace context.
+	 * a completely clean 64-bit userspace context.  If we're not,
+	 * go to the slow exit path.
 	 */
 	movq	RCX(%rsp), %rcx
 	movq	RIP(%rsp), %r11
-	cmpq	%rcx, %r11			/* RCX == RIP */
-	jne	opportunistic_sysret_failed
+
+	cmpq	%rcx, %r11	/* SYSRET requires RCX == RIP */
+	jne	swapgs_restore_regs_and_return_to_usermode
 
 	/*
 	 * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
@@ -272,14 +274,14 @@ return_from_SYSCALL_64:
 
 	/* If this changed %rcx, it was not canonical */
 	cmpq	%rcx, %r11
-	jne	opportunistic_sysret_failed
+	jne	swapgs_restore_regs_and_return_to_usermode
 
 	cmpq	$__USER_CS, CS(%rsp)		/* CS must match SYSRET */
-	jne	opportunistic_sysret_failed
+	jne	swapgs_restore_regs_and_return_to_usermode
 
 	movq	R11(%rsp), %r11
 	cmpq	%r11, EFLAGS(%rsp)		/* R11 == RFLAGS */
-	jne	opportunistic_sysret_failed
+	jne	swapgs_restore_regs_and_return_to_usermode
 
 	/*
 	 * SYSCALL clears RF when it saves RFLAGS in R11 and SYSRET cannot
@@ -300,12 +302,12 @@ return_from_SYSCALL_64:
 	 * would never get past 'stuck_here'.
 	 */
 	testq	$(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
-	jnz	opportunistic_sysret_failed
+	jnz	swapgs_restore_regs_and_return_to_usermode
 
 	/* nothing to check for RSP */
 
 	cmpq	$__USER_DS, SS(%rsp)		/* SS must match SYSRET */
-	jne	opportunistic_sysret_failed
+	jne	swapgs_restore_regs_and_return_to_usermode
 
 	/*
 	 * We win! This label is here just for ease of understanding
@@ -318,10 +320,6 @@ syscall_return_via_sysret:
 	movq	RSP(%rsp), %rsp
 	UNWIND_HINT_EMPTY
 	USERGS_SYSRET64
-
-opportunistic_sysret_failed:
-	SWAPGS
-	jmp	restore_regs_and_return_to_usermode
 END(entry_SYSCALL_64)
 
 ENTRY(stub_ptregs_64)
@@ -422,8 +420,7 @@ ENTRY(ret_from_fork)
 	movq	%rsp, %rdi
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
 	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
-	SWAPGS
-	jmp	restore_regs_and_return_to_usermode
+	jmp	swapgs_restore_regs_and_return_to_usermode
 
 1:
 	/* kernel thread */
@@ -611,9 +608,8 @@ GLOBAL(retint_user)
 	mov	%rsp,%rdi
 	call	prepare_exit_to_usermode
 	TRACE_IRQS_IRETQ
-	SWAPGS
 
-GLOBAL(restore_regs_and_return_to_usermode)
+GLOBAL(swapgs_restore_regs_and_return_to_usermode)
 #ifdef CONFIG_DEBUG_ENTRY
 	/* Assert that pt_regs indicates user mode. */
 	testl	$3, CS(%rsp)
@@ -621,6 +617,7 @@ GLOBAL(restore_regs_and_return_to_usermode)
 	ud2
 1:
 #endif
+	SWAPGS
 	RESTORE_EXTRA_REGS
 	RESTORE_C_REGS
 	REMOVE_PT_GPREGS_FROM_STACK 8
@@ -1342,8 +1339,7 @@ ENTRY(nmi)
 	 * Return back to user mode.  We must *not* do the normal exit
 	 * work, because we don't want to enable interrupts.
 	 */
-	SWAPGS
-	jmp	restore_regs_and_return_to_usermode
+	jmp	swapgs_restore_regs_and_return_to_usermode
 
 .Lnmi_from_kernel:
 	/*
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 9ca014a99968..932b96ce1b06 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -336,8 +336,7 @@ ENTRY(entry_INT80_compat)
 
 	/* Go back to user mode. */
 	TRACE_IRQS_ON
-	SWAPGS
-	jmp	restore_regs_and_return_to_usermode
+	jmp	swapgs_restore_regs_and_return_to_usermode
 END(entry_INT80_compat)
 
 ENTRY(stub32_clone)
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 04/20] x86/asm/64: Simplify reg restore code in the standard IRET paths
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (2 preceding siblings ...)
  2017-11-02  7:59 ` [PATCH v2 03/20] x86/asm/64: Move SWAPGS into the common iret-to-usermode path Andy Lutomirski
@ 2017-11-02  7:59 ` Andy Lutomirski
  2017-11-02 10:50   ` [tip:x86/asm] x86/entry/64: " tip-bot for Andy Lutomirski
  2017-11-02  7:59 ` [PATCH v2 05/20] x86/asm/64: Shrink paranoid_exit_restore and make labels local Andy Lutomirski
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:59 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

The old code restored all the registers with movq instead of pop.
In theory, this was done because some CPUs have higher movq
throughput, but any gain there would be tiny and is almost certainly
outweighed by the higher text size.

This saves 96 bytes of text.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/calling.h  | 21 +++++++++++++++++++++
 arch/x86/entry/entry_64.S | 12 ++++++------
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 640aafebdc00..0b9dd8123701 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -151,6 +151,27 @@ For 32-bit we have the following conventions - kernel is built with
 	UNWIND_HINT_REGS offset=\offset extra=0
 	.endm
 
+	.macro POP_EXTRA_REGS
+	popq %r15
+	popq %r14
+	popq %r13
+	popq %r12
+	popq %rbp
+	popq %rbx
+	.endm
+
+	.macro POP_C_REGS
+	popq %r11
+	popq %r10
+	popq %r9
+	popq %r8
+	popq %rax
+	popq %rcx
+	popq %rdx
+	popq %rsi
+	popq %rdi
+	.endm
+
 	.macro RESTORE_C_REGS_HELPER rstor_rax=1, rstor_rcx=1, rstor_r11=1, rstor_r8910=1, rstor_rdx=1
 	.if \rstor_r11
 	movq 6*8(%rsp), %r11
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9bfa34c3b6b5..36a2a8b6ec99 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -618,9 +618,9 @@ GLOBAL(swapgs_restore_regs_and_return_to_usermode)
 1:
 #endif
 	SWAPGS
-	RESTORE_EXTRA_REGS
-	RESTORE_C_REGS
-	REMOVE_PT_GPREGS_FROM_STACK 8
+	POP_EXTRA_REGS
+	POP_C_REGS
+	addq	$8, %rsp	/* skip regs->orig_ax */
 	INTERRUPT_RETURN
 
 
@@ -650,9 +650,9 @@ GLOBAL(restore_regs_and_return_to_kernel)
 	ud2
 1:
 #endif
-	RESTORE_EXTRA_REGS
-	RESTORE_C_REGS
-	REMOVE_PT_GPREGS_FROM_STACK 8
+	POP_EXTRA_REGS
+	POP_C_REGS
+	addq	$8, %rsp	/* skip regs->orig_ax */
 	INTERRUPT_RETURN
 
 ENTRY(native_iret)
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 05/20] x86/asm/64: Shrink paranoid_exit_restore and make labels local
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (3 preceding siblings ...)
  2017-11-02  7:59 ` [PATCH v2 04/20] x86/asm/64: Simplify reg restore code in the standard IRET paths Andy Lutomirski
@ 2017-11-02  7:59 ` Andy Lutomirski
  2017-11-02 10:50   ` [tip:x86/asm] x86/entry/64: " tip-bot for Andy Lutomirski
  2017-11-02  7:59 ` [PATCH v2 06/20] x86/asm/64: Use pop instead of movq in syscall_return_via_sysret Andy Lutomirski
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:59 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

paranoid_exit_restore was a copy of
restore_regs_and_return_to_kernel.  Merge them and make the
paranoid_exit internal labels local.

Keeping .Lparanoid_exit makes the code a bit shorter because it
allows a 2-byte jnz instead of a 5-byte jnz.

Saves 96 bytes of text.

(This is still a bit suboptimal in a non-CONFIG_TRACE_IRQFLAGS
 kernel, but fixing that would make the code rather messy.)

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 36a2a8b6ec99..e70303258daf 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1123,17 +1123,14 @@ ENTRY(paranoid_exit)
 	DISABLE_INTERRUPTS(CLBR_ANY)
 	TRACE_IRQS_OFF_DEBUG
 	testl	%ebx, %ebx			/* swapgs needed? */
-	jnz	paranoid_exit_no_swapgs
+	jnz	.Lparanoid_exit_no_swapgs
 	TRACE_IRQS_IRETQ
 	SWAPGS_UNSAFE_STACK
-	jmp	paranoid_exit_restore
-paranoid_exit_no_swapgs:
+	jmp	.Lparanoid_exit_restore
+.Lparanoid_exit_no_swapgs:
 	TRACE_IRQS_IRETQ_DEBUG
-paranoid_exit_restore:
-	RESTORE_EXTRA_REGS
-	RESTORE_C_REGS
-	REMOVE_PT_GPREGS_FROM_STACK 8
-	INTERRUPT_RETURN
+.Lparanoid_exit_restore:
+	jmp restore_regs_and_return_to_kernel
 END(paranoid_exit)
 
 /*
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 06/20] x86/asm/64: Use pop instead of movq in syscall_return_via_sysret
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (4 preceding siblings ...)
  2017-11-02  7:59 ` [PATCH v2 05/20] x86/asm/64: Shrink paranoid_exit_restore and make labels local Andy Lutomirski
@ 2017-11-02  7:59 ` Andy Lutomirski
  2017-11-02 10:51   ` [tip:x86/asm] x86/entry/64: " tip-bot for Andy Lutomirski
  2017-11-02  7:59 ` [PATCH v2 07/20] x86/asm/64: Merge the fast and slow SYSRET paths Andy Lutomirski
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:59 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

Saves 64 bytes.

Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index e70303258daf..86fdce00e682 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -315,10 +315,18 @@ return_from_SYSCALL_64:
 	 */
 syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
-	RESTORE_EXTRA_REGS
-	RESTORE_C_REGS_EXCEPT_RCX_R11
-	movq	RSP(%rsp), %rsp
 	UNWIND_HINT_EMPTY
+	POP_EXTRA_REGS
+	popq	%rsi	/* skip r11 */
+	popq	%r10
+	popq	%r9
+	popq	%r8
+	popq	%rax
+	popq	%rsi	/* skip rcx */
+	popq	%rdx
+	popq	%rsi
+	popq	%rdi
+	movq	RSP-ORIG_RAX(%rsp), %rsp
 	USERGS_SYSRET64
 END(entry_SYSCALL_64)
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 07/20] x86/asm/64: Merge the fast and slow SYSRET paths
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (5 preceding siblings ...)
  2017-11-02  7:59 ` [PATCH v2 06/20] x86/asm/64: Use pop instead of movq in syscall_return_via_sysret Andy Lutomirski
@ 2017-11-02  7:59 ` Andy Lutomirski
  2017-11-02 10:51   ` [tip:x86/asm] x86/entry/64: " tip-bot for Andy Lutomirski
  2017-11-02  7:59 ` [PATCH v2 08/20] x86/entry/64: Use POP instead of MOV to restore regs on NMI return Andy Lutomirski
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:59 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

They did almost the same thing.  Remove a bunch of pointless
instructions (mostly hidden in macros) and reduce cognitive load by
merging them.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 86fdce00e682..2e7f4952af94 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -220,10 +220,9 @@ entry_SYSCALL_64_fastpath:
 	TRACE_IRQS_ON		/* user mode is traced as IRQs on */
 	movq	RIP(%rsp), %rcx
 	movq	EFLAGS(%rsp), %r11
-	RESTORE_C_REGS_EXCEPT_RCX_R11
-	movq	RSP(%rsp), %rsp
+	addq	$6*8, %rsp	/* skip extra regs -- they were preserved */
 	UNWIND_HINT_EMPTY
-	USERGS_SYSRET64
+	jmp	.Lpop_c_regs_except_rcx_r11_and_sysret
 
 1:
 	/*
@@ -317,6 +316,7 @@ syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
 	UNWIND_HINT_EMPTY
 	POP_EXTRA_REGS
+.Lpop_c_regs_except_rcx_r11_and_sysret:
 	popq	%rsi	/* skip r11 */
 	popq	%r10
 	popq	%r9
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 08/20] x86/entry/64: Use POP instead of MOV to restore regs on NMI return
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (6 preceding siblings ...)
  2017-11-02  7:59 ` [PATCH v2 07/20] x86/asm/64: Merge the fast and slow SYSRET paths Andy Lutomirski
@ 2017-11-02  7:59 ` Andy Lutomirski
  2017-11-02 10:51   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2017-11-02  7:59 ` [PATCH v2 09/20] x86/entry/64: Remove the RESTORE_..._REGS infrastructure Andy Lutomirski
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:59 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

This gets rid of the last user of the old RESTORE_..._REGS infrastructure.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 2e7f4952af94..f0f842124aa9 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1559,11 +1559,14 @@ end_repeat_nmi:
 nmi_swapgs:
 	SWAPGS_UNSAFE_STACK
 nmi_restore:
-	RESTORE_EXTRA_REGS
-	RESTORE_C_REGS
+	POP_EXTRA_REGS
+	POP_C_REGS
 
-	/* Point RSP at the "iret" frame. */
-	REMOVE_PT_GPREGS_FROM_STACK 6*8
+	/*
+	 * Skip orig_ax and the "outermost" frame to point RSP at the "iret"
+	 * at the "iret" frame.
+	 */
+	addq	$6*8, %rsp
 
 	/*
 	 * Clear "NMI executing".  Set DF first so that we can easily
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 09/20] x86/entry/64: Remove the RESTORE_..._REGS infrastructure
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (7 preceding siblings ...)
  2017-11-02  7:59 ` [PATCH v2 08/20] x86/entry/64: Use POP instead of MOV to restore regs on NMI return Andy Lutomirski
@ 2017-11-02  7:59 ` Andy Lutomirski
  2017-11-02 10:52   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2017-11-02  7:59 ` [PATCH v2 10/20] xen: add xen nmi trap entry Andy Lutomirski
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:59 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

All users of RESTORE_EXTRA_REGS, RESTORE_C_REGS and such, and
REMOVE_PT_GPREGS_FROM_STACK are gone.  Delete the macros.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/calling.h | 52 ------------------------------------------------
 1 file changed, 52 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 0b9dd8123701..1895a685d3dd 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -141,16 +141,6 @@ For 32-bit we have the following conventions - kernel is built with
 	UNWIND_HINT_REGS offset=\offset
 	.endm
 
-	.macro RESTORE_EXTRA_REGS offset=0
-	movq 0*8+\offset(%rsp), %r15
-	movq 1*8+\offset(%rsp), %r14
-	movq 2*8+\offset(%rsp), %r13
-	movq 3*8+\offset(%rsp), %r12
-	movq 4*8+\offset(%rsp), %rbp
-	movq 5*8+\offset(%rsp), %rbx
-	UNWIND_HINT_REGS offset=\offset extra=0
-	.endm
-
 	.macro POP_EXTRA_REGS
 	popq %r15
 	popq %r14
@@ -172,48 +162,6 @@ For 32-bit we have the following conventions - kernel is built with
 	popq %rdi
 	.endm
 
-	.macro RESTORE_C_REGS_HELPER rstor_rax=1, rstor_rcx=1, rstor_r11=1, rstor_r8910=1, rstor_rdx=1
-	.if \rstor_r11
-	movq 6*8(%rsp), %r11
-	.endif
-	.if \rstor_r8910
-	movq 7*8(%rsp), %r10
-	movq 8*8(%rsp), %r9
-	movq 9*8(%rsp), %r8
-	.endif
-	.if \rstor_rax
-	movq 10*8(%rsp), %rax
-	.endif
-	.if \rstor_rcx
-	movq 11*8(%rsp), %rcx
-	.endif
-	.if \rstor_rdx
-	movq 12*8(%rsp), %rdx
-	.endif
-	movq 13*8(%rsp), %rsi
-	movq 14*8(%rsp), %rdi
-	UNWIND_HINT_IRET_REGS offset=16*8
-	.endm
-	.macro RESTORE_C_REGS
-	RESTORE_C_REGS_HELPER 1,1,1,1,1
-	.endm
-	.macro RESTORE_C_REGS_EXCEPT_RAX
-	RESTORE_C_REGS_HELPER 0,1,1,1,1
-	.endm
-	.macro RESTORE_C_REGS_EXCEPT_RCX
-	RESTORE_C_REGS_HELPER 1,0,1,1,1
-	.endm
-	.macro RESTORE_C_REGS_EXCEPT_R11
-	RESTORE_C_REGS_HELPER 1,1,0,1,1
-	.endm
-	.macro RESTORE_C_REGS_EXCEPT_RCX_R11
-	RESTORE_C_REGS_HELPER 1,0,0,1,1
-	.endm
-
-	.macro REMOVE_PT_GPREGS_FROM_STACK addskip=0
-	subq $-(15*8+\addskip), %rsp
-	.endm
-
 	.macro icebp
 	.byte 0xf1
 	.endm
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 10/20] xen: add xen nmi trap entry
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (8 preceding siblings ...)
  2017-11-02  7:59 ` [PATCH v2 09/20] x86/entry/64: Remove the RESTORE_..._REGS infrastructure Andy Lutomirski
@ 2017-11-02  7:59 ` Andy Lutomirski
  2017-11-02 10:52   ` [tip:x86/asm] xen, x86/entry/64: Add xen NMI " tip-bot for Juergen Gross
  2017-11-02  7:59 ` [PATCH v2 11/20] x86/asm/64: De-Xen-ify our NMI code Andy Lutomirski
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:59 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Juergen Gross, Andy Lutomirski

From: Juergen Gross <jgross@suse.com>

Instead of trying to execute any NMI via the bare metal's NMI trap
handler use a Xen specific one for pv domains, like we do for e.g.
debug traps. As in a pv domain the NMI is handled via the normal
kernel stack this is the correct thing to do.

This will enable us to get rid of the very fragile and questionable
dependencies between the bare metal NMI handler and Xen assumptions
believed to be broken anyway.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S    | 2 +-
 arch/x86/include/asm/traps.h | 2 +-
 arch/x86/xen/enlighten_pv.c  | 2 +-
 arch/x86/xen/xen-asm_64.S    | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f0f842124aa9..b4df83177d14 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1078,6 +1078,7 @@ idtentry int3			do_int3			has_error_code=0	paranoid=1 shift_ist=DEBUG_STACK
 idtentry stack_segment		do_stack_segment	has_error_code=1
 
 #ifdef CONFIG_XEN
+idtentry xennmi			do_nmi			has_error_code=0
 idtentry xendebug		do_debug		has_error_code=0
 idtentry xenint3		do_int3			has_error_code=0
 #endif
@@ -1240,7 +1241,6 @@ ENTRY(error_exit)
 END(error_exit)
 
 /* Runs on exception stack */
-/* XXX: broken on Xen PV */
 ENTRY(nmi)
 	UNWIND_HINT_IRET_REGS
 	/*
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 5545f6459bf5..26f1a88e42da 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -37,9 +37,9 @@ asmlinkage void simd_coprocessor_error(void);
 
 #if defined(CONFIG_X86_64) && defined(CONFIG_XEN_PV)
 asmlinkage void xen_divide_error(void);
+asmlinkage void xen_xennmi(void);
 asmlinkage void xen_xendebug(void);
 asmlinkage void xen_xenint3(void);
-asmlinkage void xen_nmi(void);
 asmlinkage void xen_overflow(void);
 asmlinkage void xen_bounds(void);
 asmlinkage void xen_invalid_op(void);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 69b9deff7e5c..8da4eff19c2a 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -600,7 +600,7 @@ static struct trap_array_entry trap_array[] = {
 #ifdef CONFIG_X86_MCE
 	{ machine_check,               xen_machine_check,               true },
 #endif
-	{ nmi,                         xen_nmi,                         true },
+	{ nmi,                         xen_xennmi,                      true },
 	{ overflow,                    xen_overflow,                    false },
 #ifdef CONFIG_IA32_EMULATION
 	{ entry_INT80_compat,          xen_entry_INT80_compat,          false },
diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index dae2cc33afb5..286ecc198562 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -29,7 +29,7 @@ xen_pv_trap debug
 xen_pv_trap xendebug
 xen_pv_trap int3
 xen_pv_trap xenint3
-xen_pv_trap nmi
+xen_pv_trap xennmi
 xen_pv_trap overflow
 xen_pv_trap bounds
 xen_pv_trap invalid_op
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 11/20] x86/asm/64: De-Xen-ify our NMI code
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (9 preceding siblings ...)
  2017-11-02  7:59 ` [PATCH v2 10/20] xen: add xen nmi trap entry Andy Lutomirski
@ 2017-11-02  7:59 ` Andy Lutomirski
  2017-11-02 10:53   ` [tip:x86/asm] x86/entry/64: " tip-bot for Andy Lutomirski
  2017-11-02  7:59 ` [PATCH v2 12/20] x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of native_load_sp0() Andy Lutomirski
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:59 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski, Juergen Gross, Boris Ostrovsky

Xen PV is fundamentally incompatible with our fancy NMI code: it
doesn't use IST at all, and Xen entries clobber two stack slots
below the hardware frame.

Drop Xen PV support from our NMI code entirely.

Cc: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index b4df83177d14..b58fb6335850 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1240,9 +1240,13 @@ ENTRY(error_exit)
 	jmp	retint_user
 END(error_exit)
 
-/* Runs on exception stack */
+/*
+ * Runs on exception stack.  Xen PV does not go through this path at all,
+ * so we can use real assembly here.
+ */
 ENTRY(nmi)
 	UNWIND_HINT_IRET_REGS
+
 	/*
 	 * We allow breakpoints in NMIs. If a breakpoint occurs, then
 	 * the iretq it performs will take us out of NMI context.
@@ -1300,7 +1304,7 @@ ENTRY(nmi)
 	 * stacks lest we corrupt the "NMI executing" variable.
 	 */
 
-	SWAPGS_UNSAFE_STACK
+	swapgs
 	cld
 	movq	%rsp, %rdx
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
@@ -1465,7 +1469,7 @@ nested_nmi_out:
 	popq	%rdx
 
 	/* We are returning to kernel mode, so this cannot result in a fault. */
-	INTERRUPT_RETURN
+	iretq
 
 first_nmi:
 	/* Restore rdx. */
@@ -1496,7 +1500,7 @@ first_nmi:
 	pushfq			/* RFLAGS */
 	pushq	$__KERNEL_CS	/* CS */
 	pushq	$1f		/* RIP */
-	INTERRUPT_RETURN	/* continues at repeat_nmi below */
+	iretq			/* continues at repeat_nmi below */
 	UNWIND_HINT_IRET_REGS
 1:
 #endif
@@ -1571,20 +1575,22 @@ nmi_restore:
 	/*
 	 * Clear "NMI executing".  Set DF first so that we can easily
 	 * distinguish the remaining code between here and IRET from
-	 * the SYSCALL entry and exit paths.  On a native kernel, we
-	 * could just inspect RIP, but, on paravirt kernels,
-	 * INTERRUPT_RETURN can translate into a jump into a
-	 * hypercall page.
+	 * the SYSCALL entry and exit paths.
+	 *
+	 * We arguably should just inspect RIP instead, but I (Andy) wrote
+	 * this code when I had the misapprehension that Xen PV supported
+	 * NMIs, and Xen PV would break that approach.
 	 */
 	std
 	movq	$0, 5*8(%rsp)		/* clear "NMI executing" */
 
 	/*
-	 * INTERRUPT_RETURN reads the "iret" frame and exits the NMI
-	 * stack in a single instruction.  We are returning to kernel
-	 * mode, so this cannot result in a fault.
+	 * iretq reads the "iret" frame and exits the NMI stack in a
+	 * single instruction.  We are returning to kernel mode, so this
+	 * cannot result in a fault.  Similarly, we don't need to worry
+	 * about espfix64 on the way back to kernel mode.
 	 */
-	INTERRUPT_RETURN
+	iretq
 END(nmi)
 
 ENTRY(ignore_sysret)
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 12/20] x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of native_load_sp0()
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (10 preceding siblings ...)
  2017-11-02  7:59 ` [PATCH v2 11/20] x86/asm/64: De-Xen-ify our NMI code Andy Lutomirski
@ 2017-11-02  7:59 ` Andy Lutomirski
  2017-11-02 10:53   ` [tip:x86/asm] x86/entry/32: Pull the " tip-bot for Andy Lutomirski
  2017-11-02  7:59 ` [PATCH v2 13/20] x86/asm/64: Pass sp0 directly to load_sp0() Andy Lutomirski
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:59 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

This causes the MSR_IA32_SYSENTER_CS write to move out of the
paravirt hook.  This shouldn't affect Xen PV: Xen already ignores
MSR_IA32_SYSENTER_ESP writes.  In any event, Xen doesn't support
vm86() in a useful way.

Note to any potential backporters: This patch won't break lguest, as
lguest didn't have any SYSENTER support at all.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/processor.h |  7 -------
 arch/x86/include/asm/switch_to.h | 12 ++++++++++++
 arch/x86/kernel/process_32.c     |  4 +++-
 arch/x86/kernel/process_64.c     |  2 +-
 arch/x86/kernel/vm86_32.c        |  6 +++++-
 5 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index b390ff76e58f..0167e3e35a57 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -520,13 +520,6 @@ static inline void
 native_load_sp0(struct tss_struct *tss, struct thread_struct *thread)
 {
 	tss->x86_tss.sp0 = thread->sp0;
-#ifdef CONFIG_X86_32
-	/* Only happens when SEP is enabled, no need to test "SEP"arately: */
-	if (unlikely(tss->x86_tss.ss1 != thread->sysenter_cs)) {
-		tss->x86_tss.ss1 = thread->sysenter_cs;
-		wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0);
-	}
-#endif
 }
 
 static inline void native_swapgs(void)
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index fcc5cd387fd1..7ae8caffbada 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -72,4 +72,16 @@ do {									\
 	((last) = __switch_to_asm((prev), (next)));			\
 } while (0)
 
+#ifdef CONFIG_X86_32
+static inline void refresh_sysenter_cs(struct thread_struct *thread)
+{
+	/* Only happens when SEP is enabled, no need to test "SEP"arately: */
+	if (unlikely(this_cpu_read(cpu_tss.x86_tss.ss1) == thread->sysenter_cs))
+		return;
+
+	this_cpu_write(cpu_tss.x86_tss.ss1, thread->sysenter_cs);
+	wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0);
+}
+#endif
+
 #endif /* _ASM_X86_SWITCH_TO_H */
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 11966251cd42..0936ed3da6b6 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -284,9 +284,11 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	/*
 	 * Reload esp0 and cpu_current_top_of_stack.  This changes
-	 * current_thread_info().
+	 * current_thread_info().  Refresh the SYSENTER configuration in
+	 * case prev or next is vm86.
 	 */
 	load_sp0(tss, next);
+	refresh_sysenter_cs(next);
 	this_cpu_write(cpu_current_top_of_stack,
 		       (unsigned long)task_stack_page(next_p) +
 		       THREAD_SIZE);
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 302e7b2572d1..a6ff6d1a0110 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -464,7 +464,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 */
 	this_cpu_write(current_task, next_p);
 
-	/* Reload esp0 and ss1.  This changes current_thread_info(). */
+	/* Reload sp0. */
 	load_sp0(tss, next);
 
 	/*
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 7924a5356c8a..5bc1c3ab6287 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -54,6 +54,7 @@
 #include <asm/irq.h>
 #include <asm/traps.h>
 #include <asm/vm86.h>
+#include <asm/switch_to.h>
 
 /*
  * Known problems:
@@ -149,6 +150,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 	tsk->thread.sp0 = vm86->saved_sp0;
 	tsk->thread.sysenter_cs = __KERNEL_CS;
 	load_sp0(tss, &tsk->thread);
+	refresh_sysenter_cs(&tsk->thread);
 	vm86->saved_sp0 = 0;
 	put_cpu();
 
@@ -368,8 +370,10 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 	/* make room for real-mode segments */
 	tsk->thread.sp0 += 16;
 
-	if (static_cpu_has(X86_FEATURE_SEP))
+	if (static_cpu_has(X86_FEATURE_SEP)) {
 		tsk->thread.sysenter_cs = 0;
+		refresh_sysenter_cs(&tsk->thread);
+	}
 
 	load_sp0(tss, &tsk->thread);
 	put_cpu();
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 13/20] x86/asm/64: Pass sp0 directly to load_sp0()
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (11 preceding siblings ...)
  2017-11-02  7:59 ` [PATCH v2 12/20] x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of native_load_sp0() Andy Lutomirski
@ 2017-11-02  7:59 ` Andy Lutomirski
  2017-11-02  9:48   ` Ingo Molnar
  2017-11-02 10:53   ` [tip:x86/asm] x86/entry/64: Pass SP0 " tip-bot for Andy Lutomirski
  2017-11-02  7:59 ` [PATCH v2 14/20] x86/asm: Add task_top_of_stack() to find the top of a task's stack Andy Lutomirski
                   ` (6 subsequent siblings)
  19 siblings, 2 replies; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:59 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

load_sp0() had an odd signature:

void load_sp0(struct tss_struct *tss, struct thread_struct *thread);

Simplify it to:

void load_sp0(unsigned long sp0);

Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/paravirt.h       |  5 ++---
 arch/x86/include/asm/paravirt_types.h |  2 +-
 arch/x86/include/asm/processor.h      |  9 ++++-----
 arch/x86/kernel/cpu/common.c          |  4 ++--
 arch/x86/kernel/process_32.c          |  2 +-
 arch/x86/kernel/process_64.c          |  2 +-
 arch/x86/kernel/vm86_32.c             | 14 ++++++--------
 arch/x86/xen/enlighten_pv.c           |  7 +++----
 8 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 12deec722cf0..43d4f90edebc 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -15,10 +15,9 @@
 #include <linux/cpumask.h>
 #include <asm/frame.h>
 
-static inline void load_sp0(struct tss_struct *tss,
-			     struct thread_struct *thread)
+static inline void load_sp0(unsigned long sp0)
 {
-	PVOP_VCALL2(pv_cpu_ops.load_sp0, tss, thread);
+	PVOP_VCALL1(pv_cpu_ops.load_sp0, sp0);
 }
 
 /* The paravirtualized CPUID instruction. */
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 280d94c36dad..a916788ac478 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -133,7 +133,7 @@ struct pv_cpu_ops {
 	void (*alloc_ldt)(struct desc_struct *ldt, unsigned entries);
 	void (*free_ldt)(struct desc_struct *ldt, unsigned entries);
 
-	void (*load_sp0)(struct tss_struct *tss, struct thread_struct *t);
+	void (*load_sp0)(unsigned long sp0);
 
 	void (*set_iopl_mask)(unsigned mask);
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 0167e3e35a57..064b84722166 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -517,9 +517,9 @@ static inline void native_set_iopl_mask(unsigned mask)
 }
 
 static inline void
-native_load_sp0(struct tss_struct *tss, struct thread_struct *thread)
+native_load_sp0(unsigned long sp0)
 {
-	tss->x86_tss.sp0 = thread->sp0;
+	this_cpu_write(cpu_tss.x86_tss.sp0, sp0);
 }
 
 static inline void native_swapgs(void)
@@ -544,10 +544,9 @@ static inline unsigned long current_top_of_stack(void)
 #else
 #define __cpuid			native_cpuid
 
-static inline void load_sp0(struct tss_struct *tss,
-			    struct thread_struct *thread)
+static inline void load_sp0(unsigned long sp0)
 {
-	native_load_sp0(tss, thread);
+	native_load_sp0(sp0);
 }
 
 #define set_iopl_mask native_set_iopl_mask
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c9176bae7fd8..079648bd85ed 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1572,7 +1572,7 @@ void cpu_init(void)
 	initialize_tlbstate_and_flush();
 	enter_lazy_tlb(&init_mm, me);
 
-	load_sp0(t, &current->thread);
+	load_sp0(current->thread.sp0);
 	set_tss_desc(cpu, t);
 	load_TR_desc();
 	load_mm_ldt(&init_mm);
@@ -1627,7 +1627,7 @@ void cpu_init(void)
 	initialize_tlbstate_and_flush();
 	enter_lazy_tlb(&init_mm, curr);
 
-	load_sp0(t, thread);
+	load_sp0(thread->sp0);
 	set_tss_desc(cpu, t);
 	load_TR_desc();
 	load_mm_ldt(&init_mm);
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 0936ed3da6b6..40b85870e429 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -287,7 +287,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 * current_thread_info().  Refresh the SYSENTER configuration in
 	 * case prev or next is vm86.
 	 */
-	load_sp0(tss, next);
+	load_sp0(next->sp0);
 	refresh_sysenter_cs(next);
 	this_cpu_write(cpu_current_top_of_stack,
 		       (unsigned long)task_stack_page(next_p) +
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index a6ff6d1a0110..2124304fb77a 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -465,7 +465,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(current_task, next_p);
 
 	/* Reload sp0. */
-	load_sp0(tss, next);
+	load_sp0(next->sp0);
 
 	/*
 	 * Now maybe reload the debug registers and handle I/O bitmaps
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 5bc1c3ab6287..0f1d92cd20ad 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -94,7 +94,6 @@
 
 void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 {
-	struct tss_struct *tss;
 	struct task_struct *tsk = current;
 	struct vm86plus_struct __user *user;
 	struct vm86 *vm86 = current->thread.vm86;
@@ -146,13 +145,13 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 		do_exit(SIGSEGV);
 	}
 
-	tss = &per_cpu(cpu_tss, get_cpu());
+	preempt_disable();
 	tsk->thread.sp0 = vm86->saved_sp0;
 	tsk->thread.sysenter_cs = __KERNEL_CS;
-	load_sp0(tss, &tsk->thread);
+	load_sp0(tsk->thread.sp0);
 	refresh_sysenter_cs(&tsk->thread);
 	vm86->saved_sp0 = 0;
-	put_cpu();
+	preempt_enable();
 
 	memcpy(&regs->pt, &vm86->regs32, sizeof(struct pt_regs));
 
@@ -238,7 +237,6 @@ SYSCALL_DEFINE2(vm86, unsigned long, cmd, unsigned long, arg)
 
 static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 {
-	struct tss_struct *tss;
 	struct task_struct *tsk = current;
 	struct vm86 *vm86 = tsk->thread.vm86;
 	struct kernel_vm86_regs vm86regs;
@@ -366,8 +364,8 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 	vm86->saved_sp0 = tsk->thread.sp0;
 	lazy_save_gs(vm86->regs32.gs);
 
-	tss = &per_cpu(cpu_tss, get_cpu());
 	/* make room for real-mode segments */
+	preempt_disable();
 	tsk->thread.sp0 += 16;
 
 	if (static_cpu_has(X86_FEATURE_SEP)) {
@@ -375,8 +373,8 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 		refresh_sysenter_cs(&tsk->thread);
 	}
 
-	load_sp0(tss, &tsk->thread);
-	put_cpu();
+	load_sp0(tsk->thread.sp0);
+	preempt_enable();
 
 	if (vm86->flags & VM86_SCREEN_BITMAP)
 		mark_screen_rdonly(tsk->mm);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 8da4eff19c2a..e7b213047724 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -810,15 +810,14 @@ static void __init xen_write_gdt_entry_boot(struct desc_struct *dt, int entry,
 	}
 }
 
-static void xen_load_sp0(struct tss_struct *tss,
-			 struct thread_struct *thread)
+static void xen_load_sp0(unsigned long sp0)
 {
 	struct multicall_space mcs;
 
 	mcs = xen_mc_entry(0);
-	MULTI_stack_switch(mcs.mc, __KERNEL_DS, thread->sp0);
+	MULTI_stack_switch(mcs.mc, __KERNEL_DS, sp0);
 	xen_mc_issue(PARAVIRT_LAZY_CPU);
-	tss->x86_tss.sp0 = thread->sp0;
+	this_cpu_write(cpu_tss.x86_tss.sp0, sp0);
 }
 
 void xen_set_iopl_mask(unsigned mask)
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 14/20] x86/asm: Add task_top_of_stack() to find the top of a task's stack
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (12 preceding siblings ...)
  2017-11-02  7:59 ` [PATCH v2 13/20] x86/asm/64: Pass sp0 directly to load_sp0() Andy Lutomirski
@ 2017-11-02  7:59 ` Andy Lutomirski
  2017-11-02 10:54   ` [tip:x86/asm] x86/entry: " tip-bot for Andy Lutomirski
  2017-11-02  7:59 ` [PATCH v2 15/20] x86/xen/64: Clean up SP code in cpu_initialize_context() Andy Lutomirski
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:59 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

This will let us get rid of a few places that hardcode accesses to
thread.sp0.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/processor.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 064b84722166..ad59cec14239 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -795,6 +795,8 @@ static inline void spin_lock_prefetch(const void *x)
 #define TOP_OF_INIT_STACK ((unsigned long)&init_stack + sizeof(init_stack) - \
 			   TOP_OF_KERNEL_STACK_PADDING)
 
+#define task_top_of_stack(task) ((unsigned long)(task_pt_regs(task) + 1))
+
 #ifdef CONFIG_X86_32
 /*
  * User space process size: 3GB (default).
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 15/20] x86/xen/64: Clean up SP code in cpu_initialize_context()
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (13 preceding siblings ...)
  2017-11-02  7:59 ` [PATCH v2 14/20] x86/asm: Add task_top_of_stack() to find the top of a task's stack Andy Lutomirski
@ 2017-11-02  7:59 ` Andy Lutomirski
  2017-11-02  9:56   ` Juergen Gross
  2017-11-02 10:54   ` [tip:x86/asm] x86/xen/64, x86/entry/64: " tip-bot for Andy Lutomirski
  2017-11-02  7:59 ` [PATCH v2 16/20] x86/boot/64: Stop initializing TSS.sp0 at boot Andy Lutomirski
                   ` (4 subsequent siblings)
  19 siblings, 2 replies; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:59 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski, Juergen Gross, Boris Ostrovsky

I'm removing thread_struct::sp0, and Xen's usage of it is slightly
dubious and unnecessary.  Use appropriate helpers instead.

While we're at at, reorder the code slightly to make it more obvious
what's going on.

Cc: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/xen/smp_pv.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 51471408fdd1..8c0e047d0b80 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -13,6 +13,7 @@
  * single-threaded.
  */
 #include <linux/sched.h>
+#include <linux/sched/task_stack.h>
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/smp.h>
@@ -293,12 +294,19 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 #endif
 	memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
+	/*
+	 * Bring up the CPU in cpu_bringup_and_idle() with the stack
+	 * pointing just below where pt_regs would be if it were a normal
+	 * kernel entry.
+	 */
 	ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
 	ctxt->flags = VGCF_IN_KERNEL;
 	ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
 	ctxt->user_regs.ds = __USER_DS;
 	ctxt->user_regs.es = __USER_DS;
 	ctxt->user_regs.ss = __KERNEL_DS;
+	ctxt->user_regs.cs = __KERNEL_CS;
+	ctxt->user_regs.esp = (unsigned long)task_pt_regs(idle);
 
 	xen_copy_trap_info(ctxt->trap_ctxt);
 
@@ -313,8 +321,13 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 	ctxt->gdt_frames[0] = gdt_mfn;
 	ctxt->gdt_ents      = GDT_ENTRIES;
 
+	/*
+	 * Set SS:SP that Xen will use when entering guest kernel mode
+	 * from guest user mode.  Subsequent calls to load_sp0() can
+	 * change this value.
+	 */
 	ctxt->kernel_ss = __KERNEL_DS;
-	ctxt->kernel_sp = idle->thread.sp0;
+	ctxt->kernel_sp = task_top_of_stack(idle);
 
 #ifdef CONFIG_X86_32
 	ctxt->event_callback_cs     = __KERNEL_CS;
@@ -326,10 +339,8 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 		(unsigned long)xen_hypervisor_callback;
 	ctxt->failsafe_callback_eip =
 		(unsigned long)xen_failsafe_callback;
-	ctxt->user_regs.cs = __KERNEL_CS;
 	per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
 
-	ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
 	ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_gfn(swapper_pg_dir));
 	if (HYPERVISOR_vcpu_op(VCPUOP_initialise, xen_vcpu_nr(cpu), ctxt))
 		BUG();
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 16/20] x86/boot/64: Stop initializing TSS.sp0 at boot
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (14 preceding siblings ...)
  2017-11-02  7:59 ` [PATCH v2 15/20] x86/xen/64: Clean up SP code in cpu_initialize_context() Andy Lutomirski
@ 2017-11-02  7:59 ` Andy Lutomirski
  2017-11-02 10:55   ` [tip:x86/asm] x86/entry/64: " tip-bot for Andy Lutomirski
  2017-11-02  7:59 ` [PATCH v2 17/20] x86/asm/64: Remove all remaining direct thread_struct::sp0 reads Andy Lutomirski
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:59 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

In my quest to get rid of thread_struct::sp0, I want to clean up or
remove all of its readers.  Two of them are in cpu_init() (32-bit and
64-bit), and they aren't needed.  This is because we never enter
userspace at all on the threads that CPUs are initialized in.

Poison the initial TSS.sp0 and stop initializing it on CPU init.

The comment text mostly comes from Dave Hansen.  Thanks!

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/cpu/common.c | 12 ++++++++++--
 arch/x86/kernel/process.c    |  8 +++++++-
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 079648bd85ed..adc02cb351e0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1572,9 +1572,13 @@ void cpu_init(void)
 	initialize_tlbstate_and_flush();
 	enter_lazy_tlb(&init_mm, me);
 
-	load_sp0(current->thread.sp0);
+	/*
+	 * Initialize the TSS.  Don't bother initializing sp0, as the initial
+	 * task never enters user mode.
+	 */
 	set_tss_desc(cpu, t);
 	load_TR_desc();
+
 	load_mm_ldt(&init_mm);
 
 	clear_all_debug_regs();
@@ -1627,9 +1631,13 @@ void cpu_init(void)
 	initialize_tlbstate_and_flush();
 	enter_lazy_tlb(&init_mm, curr);
 
-	load_sp0(thread->sp0);
+	/*
+	 * Initialize the TSS.  Don't bother initializing sp0, as the initial
+	 * task never enters user mode.
+	 */
 	set_tss_desc(cpu, t);
 	load_TR_desc();
+
 	load_mm_ldt(&init_mm);
 
 	t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index bd6b85fac666..ff8a9acbcf8b 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -48,7 +48,13 @@
  */
 __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
 	.x86_tss = {
-		.sp0 = TOP_OF_INIT_STACK,
+		/*
+		 * .sp0 is only used when entering ring 0 from a lower
+		 * privilege level.  Since the init task never runs anything
+		 * but ring 0 code, there is no need for a valid value here.
+		 * Poison it.
+		 */
+		.sp0 = (1UL << (BITS_PER_LONG-1)) + 1,
 #ifdef CONFIG_X86_32
 		.ss0 = __KERNEL_DS,
 		.ss1 = __KERNEL_CS,
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 17/20] x86/asm/64: Remove all remaining direct thread_struct::sp0 reads
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (15 preceding siblings ...)
  2017-11-02  7:59 ` [PATCH v2 16/20] x86/boot/64: Stop initializing TSS.sp0 at boot Andy Lutomirski
@ 2017-11-02  7:59 ` Andy Lutomirski
  2017-11-02 10:55   ` [tip:x86/asm] x86/entry/64: " tip-bot for Andy Lutomirski
  2017-11-02  7:59 ` [PATCH v2 18/20] x86/boot/32: Fix cpu_current_top_of_stack initialization at boot Andy Lutomirski
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:59 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

The only remaining readers in context switch code or vm86(), and
they all just want to update TSS.sp0 to match the current task.
Replace them all with a new helper update_sp0().

Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/switch_to.h | 6 ++++++
 arch/x86/kernel/process_32.c     | 2 +-
 arch/x86/kernel/process_64.c     | 2 +-
 arch/x86/kernel/vm86_32.c        | 4 ++--
 4 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 7ae8caffbada..54e64d909725 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -84,4 +84,10 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread)
 }
 #endif
 
+/* This is used when switching tasks or entering/exiting vm86 mode. */
+static inline void update_sp0(struct task_struct *task)
+{
+	load_sp0(task->thread.sp0);
+}
+
 #endif /* _ASM_X86_SWITCH_TO_H */
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 40b85870e429..45bf0c5f93e1 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -287,7 +287,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 * current_thread_info().  Refresh the SYSENTER configuration in
 	 * case prev or next is vm86.
 	 */
-	load_sp0(next->sp0);
+	update_sp0(next_p);
 	refresh_sysenter_cs(next);
 	this_cpu_write(cpu_current_top_of_stack,
 		       (unsigned long)task_stack_page(next_p) +
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 2124304fb77a..45e380958392 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -465,7 +465,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(current_task, next_p);
 
 	/* Reload sp0. */
-	load_sp0(next->sp0);
+	update_sp0(next_p);
 
 	/*
 	 * Now maybe reload the debug registers and handle I/O bitmaps
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 0f1d92cd20ad..a7b44c75c642 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -148,7 +148,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 	preempt_disable();
 	tsk->thread.sp0 = vm86->saved_sp0;
 	tsk->thread.sysenter_cs = __KERNEL_CS;
-	load_sp0(tsk->thread.sp0);
+	update_sp0(tsk);
 	refresh_sysenter_cs(&tsk->thread);
 	vm86->saved_sp0 = 0;
 	preempt_enable();
@@ -373,7 +373,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 		refresh_sysenter_cs(&tsk->thread);
 	}
 
-	load_sp0(tsk->thread.sp0);
+	update_sp0(tsk);
 	preempt_enable();
 
 	if (vm86->flags & VM86_SCREEN_BITMAP)
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 18/20] x86/boot/32: Fix cpu_current_top_of_stack initialization at boot
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (16 preceding siblings ...)
  2017-11-02  7:59 ` [PATCH v2 17/20] x86/asm/64: Remove all remaining direct thread_struct::sp0 reads Andy Lutomirski
@ 2017-11-02  7:59 ` Andy Lutomirski
  2017-11-02 10:56   ` [tip:x86/asm] x86/entry/32: " tip-bot for Andy Lutomirski
  2017-11-02  7:59 ` [PATCH v2 19/20] x86/asm/64: Remove thread_struct::sp0 Andy Lutomirski
  2017-11-02  7:59 ` [PATCH v2 20/20] x86/traps: Use a new on_thread_stack() helper to clean up an assertion Andy Lutomirski
  19 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:59 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

cpu_current_top_of_stack's initialization forgot about
TOP_OF_KERNEL_STACK_PADDING.  This bug didn't matter because the
idle threads never enter user mode.

Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/smpboot.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ad59edd84de7..06c18fe1c09e 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -961,8 +961,7 @@ void common_cpu_up(unsigned int cpu, struct task_struct *idle)
 #ifdef CONFIG_X86_32
 	/* Stack for startup_32 can be just as for start_secondary onwards */
 	irq_ctx_init(cpu);
-	per_cpu(cpu_current_top_of_stack, cpu) =
-		(unsigned long)task_stack_page(idle) + THREAD_SIZE;
+	per_cpu(cpu_current_top_of_stack, cpu) = task_top_of_stack(idle);
 #else
 	initial_gs = per_cpu_offset(cpu);
 #endif
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 19/20] x86/asm/64: Remove thread_struct::sp0
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (17 preceding siblings ...)
  2017-11-02  7:59 ` [PATCH v2 18/20] x86/boot/32: Fix cpu_current_top_of_stack initialization at boot Andy Lutomirski
@ 2017-11-02  7:59 ` Andy Lutomirski
  2017-11-02 10:56   ` [tip:x86/asm] x86/entry/64: " tip-bot for Andy Lutomirski
  2017-11-02  7:59 ` [PATCH v2 20/20] x86/traps: Use a new on_thread_stack() helper to clean up an assertion Andy Lutomirski
  19 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:59 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

On x86_64, we can easily calculate sp0 when needed instead of
storing it in thread_struct.

On x86_32, a similar cleanup would be possible, but it would require
cleaning up the vm86 code first, and that can wait for a later
cleanup series.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/compat.h    |  1 +
 arch/x86/include/asm/processor.h | 28 +++++++++-------------------
 arch/x86/include/asm/switch_to.h |  6 ++++++
 arch/x86/kernel/process_64.c     |  1 -
 4 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index 5343c19814b3..948b6d8ec46f 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -6,6 +6,7 @@
  */
 #include <linux/types.h>
 #include <linux/sched.h>
+#include <linux/sched/task_stack.h>
 #include <asm/processor.h>
 #include <asm/user32.h>
 #include <asm/unistd.h>
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index ad59cec14239..ae2ae6d80674 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -430,7 +430,9 @@ typedef struct {
 struct thread_struct {
 	/* Cached TLS descriptors: */
 	struct desc_struct	tls_array[GDT_ENTRY_TLS_ENTRIES];
+#ifdef CONFIG_X86_32
 	unsigned long		sp0;
+#endif
 	unsigned long		sp;
 #ifdef CONFIG_X86_32
 	unsigned long		sysenter_cs;
@@ -797,6 +799,13 @@ static inline void spin_lock_prefetch(const void *x)
 
 #define task_top_of_stack(task) ((unsigned long)(task_pt_regs(task) + 1))
 
+#define task_pt_regs(task) \
+({									\
+	unsigned long __ptr = (unsigned long)task_stack_page(task);	\
+	__ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING;		\
+	((struct pt_regs *)__ptr) - 1;					\
+})
+
 #ifdef CONFIG_X86_32
 /*
  * User space process size: 3GB (default).
@@ -816,23 +825,6 @@ static inline void spin_lock_prefetch(const void *x)
 	.addr_limit		= KERNEL_DS,				  \
 }
 
-/*
- * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack.
- * This is necessary to guarantee that the entire "struct pt_regs"
- * is accessible even if the CPU haven't stored the SS/ESP registers
- * on the stack (interrupt gate does not save these registers
- * when switching to the same priv ring).
- * Therefore beware: accessing the ss/esp fields of the
- * "struct pt_regs" is possible, but they may contain the
- * completely wrong values.
- */
-#define task_pt_regs(task) \
-({									\
-	unsigned long __ptr = (unsigned long)task_stack_page(task);	\
-	__ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING;		\
-	((struct pt_regs *)__ptr) - 1;					\
-})
-
 #define KSTK_ESP(task)		(task_pt_regs(task)->sp)
 
 #else
@@ -866,11 +858,9 @@ static inline void spin_lock_prefetch(const void *x)
 #define STACK_TOP_MAX		TASK_SIZE_MAX
 
 #define INIT_THREAD  {						\
-	.sp0			= TOP_OF_INIT_STACK,		\
 	.addr_limit		= KERNEL_DS,			\
 }
 
-#define task_pt_regs(tsk)	((struct pt_regs *)(tsk)->thread.sp0 - 1)
 extern unsigned long KSTK_ESP(struct task_struct *task);
 
 #endif /* CONFIG_X86_64 */
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 54e64d909725..010cd6e4eafc 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_SWITCH_TO_H
 #define _ASM_X86_SWITCH_TO_H
 
+#include <linux/sched/task_stack.h>
+
 struct task_struct; /* one of the stranger aspects of C forward declarations */
 
 struct task_struct *__switch_to_asm(struct task_struct *prev,
@@ -87,7 +89,11 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread)
 /* This is used when switching tasks or entering/exiting vm86 mode. */
 static inline void update_sp0(struct task_struct *task)
 {
+#ifdef CONFIG_X86_32
 	load_sp0(task->thread.sp0);
+#else
+	load_sp0(task_top_of_stack(task));
+#endif
 }
 
 #endif /* _ASM_X86_SWITCH_TO_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 45e380958392..eeeb34f85c25 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -274,7 +274,6 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	struct inactive_task_frame *frame;
 	struct task_struct *me = current;
 
-	p->thread.sp0 = (unsigned long)task_stack_page(p) + THREAD_SIZE;
 	childregs = task_pt_regs(p);
 	fork_frame = container_of(childregs, struct fork_frame, regs);
 	frame = &fork_frame->frame;
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 20/20] x86/traps: Use a new on_thread_stack() helper to clean up an assertion
  2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
                   ` (18 preceding siblings ...)
  2017-11-02  7:59 ` [PATCH v2 19/20] x86/asm/64: Remove thread_struct::sp0 Andy Lutomirski
@ 2017-11-02  7:59 ` Andy Lutomirski
  2017-11-02 10:56   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  19 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02  7:59 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

Let's keep the stack-related logic together rather than open-coding
a comparison in an assertion in the traps code.

Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/processor.h | 6 ++++++
 arch/x86/kernel/traps.c          | 3 +--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index ae2ae6d80674..f10dae14f951 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -541,6 +541,12 @@ static inline unsigned long current_top_of_stack(void)
 #endif
 }
 
+static inline bool on_thread_stack(void)
+{
+	return (unsigned long)(current_top_of_stack() -
+			       current_stack_pointer) < THREAD_SIZE;
+}
+
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
 #else
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 67db4f43309e..42a9c4458f5d 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -141,8 +141,7 @@ void ist_begin_non_atomic(struct pt_regs *regs)
 	 * will catch asm bugs and any attempt to use ist_preempt_enable
 	 * from double_fault.
 	 */
-	BUG_ON((unsigned long)(current_top_of_stack() -
-			       current_stack_pointer) >= THREAD_SIZE);
+	BUG_ON(!on_thread_stack());
 
 	preempt_enable_no_resched();
 }
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 13/20] x86/asm/64: Pass sp0 directly to load_sp0()
  2017-11-02  7:59 ` [PATCH v2 13/20] x86/asm/64: Pass sp0 directly to load_sp0() Andy Lutomirski
@ 2017-11-02  9:48   ` Ingo Molnar
  2017-11-02  9:53     ` Ingo Molnar
  2017-11-02 10:32     ` Andy Lutomirski
  2017-11-02 10:53   ` [tip:x86/asm] x86/entry/64: Pass SP0 " tip-bot for Andy Lutomirski
  1 sibling, 2 replies; 48+ messages in thread
From: Ingo Molnar @ 2017-11-02  9:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds


* Andy Lutomirski <luto@kernel.org> wrote:

> load_sp0() had an odd signature:
> 
> void load_sp0(struct tss_struct *tss, struct thread_struct *thread);
> 
> Simplify it to:
> 
> void load_sp0(unsigned long sp0);

I also added this to the changelog:

> Also simplify a few get_cpu()/put_cpu() sequences to
> preempt_disable()/preempt_enable().

Plus:

> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1572,7 +1572,7 @@ void cpu_init(void)
>  	initialize_tlbstate_and_flush();
>  	enter_lazy_tlb(&init_mm, me);
>  
> -	load_sp0(t, &current->thread);
> +	load_sp0(current->thread.sp0);
>  	set_tss_desc(cpu, t);
>  	load_TR_desc();
>  	load_mm_ldt(&init_mm);
> @@ -1627,7 +1627,7 @@ void cpu_init(void)
>  	initialize_tlbstate_and_flush();
>  	enter_lazy_tlb(&init_mm, curr);
>  
> -	load_sp0(t, thread);
> +	load_sp0(thread->sp0);
>  	set_tss_desc(cpu, t);
>  	load_TR_desc();
>  	load_mm_ldt(&init_mm);

In the 32-bit path this was the last use of 'thread', making the local variable 
unused - I removed it.

Just curious: did you build/boot-test 32-bit kernels, or should we consider it 
mostly untested?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 13/20] x86/asm/64: Pass sp0 directly to load_sp0()
  2017-11-02  9:48   ` Ingo Molnar
@ 2017-11-02  9:53     ` Ingo Molnar
  2017-11-02 10:32     ` Andy Lutomirski
  1 sibling, 0 replies; 48+ messages in thread
From: Ingo Molnar @ 2017-11-02  9:53 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Andy Lutomirski <luto@kernel.org> wrote:
> 
> > load_sp0() had an odd signature:
> > 
> > void load_sp0(struct tss_struct *tss, struct thread_struct *thread);
> > 
> > Simplify it to:
> > 
> > void load_sp0(unsigned long sp0);
> 
> I also added this to the changelog:
> 
> > Also simplify a few get_cpu()/put_cpu() sequences to
> > preempt_disable()/preempt_enable().
> 
> Plus:
> 
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -1572,7 +1572,7 @@ void cpu_init(void)
> >  	initialize_tlbstate_and_flush();
> >  	enter_lazy_tlb(&init_mm, me);
> >  
> > -	load_sp0(t, &current->thread);
> > +	load_sp0(current->thread.sp0);
> >  	set_tss_desc(cpu, t);
> >  	load_TR_desc();
> >  	load_mm_ldt(&init_mm);
> > @@ -1627,7 +1627,7 @@ void cpu_init(void)
> >  	initialize_tlbstate_and_flush();
> >  	enter_lazy_tlb(&init_mm, curr);
> >  
> > -	load_sp0(t, thread);
> > +	load_sp0(thread->sp0);
> >  	set_tss_desc(cpu, t);
> >  	load_TR_desc();
> >  	load_mm_ldt(&init_mm);
> 
> In the 32-bit path this was the last use of 'thread', making the local variable 
> unused - I removed it.

Correction, it's patch #16 that removes the last reference:

   [PATCH v2 16/20] x86/boot/64: Stop initializing TSS.sp0 at boot

I removed the local variable there.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 15/20] x86/xen/64: Clean up SP code in cpu_initialize_context()
  2017-11-02  7:59 ` [PATCH v2 15/20] x86/xen/64: Clean up SP code in cpu_initialize_context() Andy Lutomirski
@ 2017-11-02  9:56   ` Juergen Gross
  2017-11-02 10:54   ` [tip:x86/asm] x86/xen/64, x86/entry/64: " tip-bot for Andy Lutomirski
  1 sibling, 0 replies; 48+ messages in thread
From: Juergen Gross @ 2017-11-02  9:56 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Boris Ostrovsky

On 02/11/17 08:59, Andy Lutomirski wrote:
> I'm removing thread_struct::sp0, and Xen's usage of it is slightly
> dubious and unnecessary.  Use appropriate helpers instead.
> 
> While we're at at, reorder the code slightly to make it more obvious
> what's going on.
> 
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 13/20] x86/asm/64: Pass sp0 directly to load_sp0()
  2017-11-02  9:48   ` Ingo Molnar
  2017-11-02  9:53     ` Ingo Molnar
@ 2017-11-02 10:32     ` Andy Lutomirski
  1 sibling, 0 replies; 48+ messages in thread
From: Andy Lutomirski @ 2017-11-02 10:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel,
	Brian Gerst, Dave Hansen, Linus Torvalds



> On Nov 2, 2017, at 10:48 AM, Ingo Molnar <mingo@kernel.org> wrote:
> 
> 
> * Andy Lutomirski <luto@kernel.org> wrote:
> 
>> load_sp0() had an odd signature:
>> 
>> void load_sp0(struct tss_struct *tss, struct thread_struct *thread);
>> 
>> Simplify it to:
>> 
>> void load_sp0(unsigned long sp0);
> 
> I also added this to the changelog:
> 
>> Also simplify a few get_cpu()/put_cpu() sequences to
>> preempt_disable()/preempt_enable().
> 
> Plus:
> 
>> --- a/arch/x86/kernel/cpu/common.c
>> +++ b/arch/x86/kernel/cpu/common.c
>> @@ -1572,7 +1572,7 @@ void cpu_init(void)
>>    initialize_tlbstate_and_flush();
>>    enter_lazy_tlb(&init_mm, me);
>> 
>> -    load_sp0(t, &current->thread);
>> +    load_sp0(current->thread.sp0);
>>    set_tss_desc(cpu, t);
>>    load_TR_desc();
>>    load_mm_ldt(&init_mm);
>> @@ -1627,7 +1627,7 @@ void cpu_init(void)
>>    initialize_tlbstate_and_flush();
>>    enter_lazy_tlb(&init_mm, curr);
>> 
>> -    load_sp0(t, thread);
>> +    load_sp0(thread->sp0);
>>    set_tss_desc(cpu, t);
>>    load_TR_desc();
>>    load_mm_ldt(&init_mm);
> 
> In the 32-bit path this was the last use of 'thread', making the local variable 
> unused - I removed it.
> 
> Just curious: did you build/boot-test 32-bit kernels, or should we consider it 
> mostly untested?

I tested it in an earlier version, but I'm away from my real computer, so I haven't tested as well as I should.  It should be run through at least the selftests on 32-bit, 64-bit, and Xen PV.

> 
> Thanks,
> 
>    Ingo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/entry/64: Remove the restore_c_regs_and_iret label
  2017-11-02  7:58 ` [PATCH v2 01/20] x86/asm/64: Remove the restore_c_regs_and_iret label Andy Lutomirski
@ 2017-11-02 10:49   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, hpa, mingo, linux-kernel, dave.hansen, peterz, tglx,
	bpetkov, bp, torvalds, brgerst

Commit-ID:  9da78ba6b47b46428cfdfc0851511ab29c869798
Gitweb:     https://git.kernel.org/tip/9da78ba6b47b46428cfdfc0851511ab29c869798
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:58:58 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:36 +0100

x86/entry/64: Remove the restore_c_regs_and_iret label

The only user was the 64-bit opportunistic SYSRET failure path, and
that path didn't really need it.  This change makes the
opportunistic SYSRET code a bit more straightforward and gets rid of
the label.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/be3006a7ad3326e3458cf1cc55d416252cbe1986.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_64.S | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 846e84a..e8ef83d 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -245,7 +245,6 @@ entry_SYSCALL64_slow_path:
 	call	do_syscall_64		/* returns with IRQs disabled */
 
 return_from_SYSCALL_64:
-	RESTORE_EXTRA_REGS
 	TRACE_IRQS_IRETQ		/* we're about to change IF */
 
 	/*
@@ -314,6 +313,7 @@ return_from_SYSCALL_64:
 	 */
 syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
+	RESTORE_EXTRA_REGS
 	RESTORE_C_REGS_EXCEPT_RCX_R11
 	movq	RSP(%rsp), %rsp
 	UNWIND_HINT_EMPTY
@@ -321,7 +321,7 @@ syscall_return_via_sysret:
 
 opportunistic_sysret_failed:
 	SWAPGS
-	jmp	restore_c_regs_and_iret
+	jmp	restore_regs_and_iret
 END(entry_SYSCALL_64)
 
 ENTRY(stub_ptregs_64)
@@ -638,7 +638,6 @@ retint_kernel:
  */
 GLOBAL(restore_regs_and_iret)
 	RESTORE_EXTRA_REGS
-restore_c_regs_and_iret:
 	RESTORE_C_REGS
 	REMOVE_PT_GPREGS_FROM_STACK 8
 	INTERRUPT_RETURN

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/entry/64: Split the IRET-to-user and IRET-to-kernel paths
  2017-11-02  7:58 ` [PATCH v2 02/20] x86/asm/64: Split the iret-to-user and iret-to-kernel paths Andy Lutomirski
@ 2017-11-02 10:49   ` tip-bot for Andy Lutomirski
  2017-11-02 10:50   ` [PATCH v2 02/20] x86/asm/64: Split the iret-to-user and iret-to-kernel paths Borislav Petkov
  1 sibling, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, torvalds, linux-kernel, hpa, luto, bpetkov, peterz,
	brgerst, dave.hansen, mingo

Commit-ID:  26c4ef9c49d8a0341f6d97ce2cfdd55d1236ed29
Gitweb:     https://git.kernel.org/tip/26c4ef9c49d8a0341f6d97ce2cfdd55d1236ed29
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:58:59 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:37 +0100

x86/entry/64: Split the IRET-to-user and IRET-to-kernel paths

These code paths will diverge soon.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/dccf8c7b3750199b4b30383c812d4e2931811509.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_64.S        | 34 +++++++++++++++++++++++++---------
 arch/x86/entry/entry_64_compat.S |  2 +-
 arch/x86/kernel/head_64.S        |  2 +-
 3 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index e8ef83d..3eeb169 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -321,7 +321,7 @@ syscall_return_via_sysret:
 
 opportunistic_sysret_failed:
 	SWAPGS
-	jmp	restore_regs_and_iret
+	jmp	restore_regs_and_return_to_usermode
 END(entry_SYSCALL_64)
 
 ENTRY(stub_ptregs_64)
@@ -423,7 +423,7 @@ ENTRY(ret_from_fork)
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
 	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
 	SWAPGS
-	jmp	restore_regs_and_iret
+	jmp	restore_regs_and_return_to_usermode
 
 1:
 	/* kernel thread */
@@ -612,7 +612,20 @@ GLOBAL(retint_user)
 	call	prepare_exit_to_usermode
 	TRACE_IRQS_IRETQ
 	SWAPGS
-	jmp	restore_regs_and_iret
+
+GLOBAL(restore_regs_and_return_to_usermode)
+#ifdef CONFIG_DEBUG_ENTRY
+	/* Assert that pt_regs indicates user mode. */
+	testl	$3, CS(%rsp)
+	jnz	1f
+	ud2
+1:
+#endif
+	RESTORE_EXTRA_REGS
+	RESTORE_C_REGS
+	REMOVE_PT_GPREGS_FROM_STACK 8
+	INTERRUPT_RETURN
+
 
 /* Returning to kernel space */
 retint_kernel:
@@ -632,11 +645,14 @@ retint_kernel:
 	 */
 	TRACE_IRQS_IRETQ
 
-/*
- * At this label, code paths which return to kernel and to user,
- * which come from interrupts/exception and from syscalls, merge.
- */
-GLOBAL(restore_regs_and_iret)
+GLOBAL(restore_regs_and_return_to_kernel)
+#ifdef CONFIG_DEBUG_ENTRY
+	/* Assert that pt_regs indicates kernel mode. */
+	testl	$3, CS(%rsp)
+	jz	1f
+	ud2
+1:
+#endif
 	RESTORE_EXTRA_REGS
 	RESTORE_C_REGS
 	REMOVE_PT_GPREGS_FROM_STACK 8
@@ -1327,7 +1343,7 @@ ENTRY(nmi)
 	 * work, because we don't want to enable interrupts.
 	 */
 	SWAPGS
-	jmp	restore_regs_and_iret
+	jmp	restore_regs_and_return_to_usermode
 
 .Lnmi_from_kernel:
 	/*
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index e26c25c..9ca014a 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -337,7 +337,7 @@ ENTRY(entry_INT80_compat)
 	/* Go back to user mode. */
 	TRACE_IRQS_ON
 	SWAPGS
-	jmp	restore_regs_and_iret
+	jmp	restore_regs_and_return_to_usermode
 END(entry_INT80_compat)
 
 ENTRY(stub32_clone)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 189bf42..08f067f 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -326,7 +326,7 @@ early_idt_handler_common:
 
 20:
 	decl early_recursion_flag(%rip)
-	jmp restore_regs_and_iret
+	jmp restore_regs_and_return_to_kernel
 END(early_idt_handler_common)
 
 	__INITDATA

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/entry/64: Move SWAPGS into the common IRET-to-usermode path
  2017-11-02  7:59 ` [PATCH v2 03/20] x86/asm/64: Move SWAPGS into the common iret-to-usermode path Andy Lutomirski
@ 2017-11-02 10:49   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, mingo, torvalds, linux-kernel, bpetkov, peterz, tglx,
	dave.hansen, brgerst, luto

Commit-ID:  8a055d7f411d41755ce30db5bb65b154777c4b78
Gitweb:     https://git.kernel.org/tip/8a055d7f411d41755ce30db5bb65b154777c4b78
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:59:00 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:38 +0100

x86/entry/64: Move SWAPGS into the common IRET-to-usermode path

All of the code paths that ended up doing IRET to usermode did
SWAPGS immediately beforehand.  Move the SWAPGS into the common
code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/27fd6f45b7cd640de38fb9066fd0349bcd11f8e1.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_64.S        | 32 ++++++++++++++------------------
 arch/x86/entry/entry_64_compat.S |  3 +--
 2 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 3eeb169..d6ffdc9 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -249,12 +249,14 @@ return_from_SYSCALL_64:
 
 	/*
 	 * Try to use SYSRET instead of IRET if we're returning to
-	 * a completely clean 64-bit userspace context.
+	 * a completely clean 64-bit userspace context.  If we're not,
+	 * go to the slow exit path.
 	 */
 	movq	RCX(%rsp), %rcx
 	movq	RIP(%rsp), %r11
-	cmpq	%rcx, %r11			/* RCX == RIP */
-	jne	opportunistic_sysret_failed
+
+	cmpq	%rcx, %r11	/* SYSRET requires RCX == RIP */
+	jne	swapgs_restore_regs_and_return_to_usermode
 
 	/*
 	 * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
@@ -272,14 +274,14 @@ return_from_SYSCALL_64:
 
 	/* If this changed %rcx, it was not canonical */
 	cmpq	%rcx, %r11
-	jne	opportunistic_sysret_failed
+	jne	swapgs_restore_regs_and_return_to_usermode
 
 	cmpq	$__USER_CS, CS(%rsp)		/* CS must match SYSRET */
-	jne	opportunistic_sysret_failed
+	jne	swapgs_restore_regs_and_return_to_usermode
 
 	movq	R11(%rsp), %r11
 	cmpq	%r11, EFLAGS(%rsp)		/* R11 == RFLAGS */
-	jne	opportunistic_sysret_failed
+	jne	swapgs_restore_regs_and_return_to_usermode
 
 	/*
 	 * SYSCALL clears RF when it saves RFLAGS in R11 and SYSRET cannot
@@ -300,12 +302,12 @@ return_from_SYSCALL_64:
 	 * would never get past 'stuck_here'.
 	 */
 	testq	$(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
-	jnz	opportunistic_sysret_failed
+	jnz	swapgs_restore_regs_and_return_to_usermode
 
 	/* nothing to check for RSP */
 
 	cmpq	$__USER_DS, SS(%rsp)		/* SS must match SYSRET */
-	jne	opportunistic_sysret_failed
+	jne	swapgs_restore_regs_and_return_to_usermode
 
 	/*
 	 * We win! This label is here just for ease of understanding
@@ -318,10 +320,6 @@ syscall_return_via_sysret:
 	movq	RSP(%rsp), %rsp
 	UNWIND_HINT_EMPTY
 	USERGS_SYSRET64
-
-opportunistic_sysret_failed:
-	SWAPGS
-	jmp	restore_regs_and_return_to_usermode
 END(entry_SYSCALL_64)
 
 ENTRY(stub_ptregs_64)
@@ -422,8 +420,7 @@ ENTRY(ret_from_fork)
 	movq	%rsp, %rdi
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
 	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
-	SWAPGS
-	jmp	restore_regs_and_return_to_usermode
+	jmp	swapgs_restore_regs_and_return_to_usermode
 
 1:
 	/* kernel thread */
@@ -611,9 +608,8 @@ GLOBAL(retint_user)
 	mov	%rsp,%rdi
 	call	prepare_exit_to_usermode
 	TRACE_IRQS_IRETQ
-	SWAPGS
 
-GLOBAL(restore_regs_and_return_to_usermode)
+GLOBAL(swapgs_restore_regs_and_return_to_usermode)
 #ifdef CONFIG_DEBUG_ENTRY
 	/* Assert that pt_regs indicates user mode. */
 	testl	$3, CS(%rsp)
@@ -621,6 +617,7 @@ GLOBAL(restore_regs_and_return_to_usermode)
 	ud2
 1:
 #endif
+	SWAPGS
 	RESTORE_EXTRA_REGS
 	RESTORE_C_REGS
 	REMOVE_PT_GPREGS_FROM_STACK 8
@@ -1342,8 +1339,7 @@ ENTRY(nmi)
 	 * Return back to user mode.  We must *not* do the normal exit
 	 * work, because we don't want to enable interrupts.
 	 */
-	SWAPGS
-	jmp	restore_regs_and_return_to_usermode
+	jmp	swapgs_restore_regs_and_return_to_usermode
 
 .Lnmi_from_kernel:
 	/*
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 9ca014a..932b96c 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -336,8 +336,7 @@ ENTRY(entry_INT80_compat)
 
 	/* Go back to user mode. */
 	TRACE_IRQS_ON
-	SWAPGS
-	jmp	restore_regs_and_return_to_usermode
+	jmp	swapgs_restore_regs_and_return_to_usermode
 END(entry_INT80_compat)
 
 ENTRY(stub32_clone)

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 02/20] x86/asm/64: Split the iret-to-user and iret-to-kernel paths
  2017-11-02  7:58 ` [PATCH v2 02/20] x86/asm/64: Split the iret-to-user and iret-to-kernel paths Andy Lutomirski
  2017-11-02 10:49   ` [tip:x86/asm] x86/entry/64: Split the IRET-to-user and IRET-to-kernel paths tip-bot for Andy Lutomirski
@ 2017-11-02 10:50   ` Borislav Petkov
  2017-11-02 12:09     ` [PATCH] x86/entry/64: Shorten TEST instructions Borislav Petkov
  1 sibling, 1 reply; 48+ messages in thread
From: Borislav Petkov @ 2017-11-02 10:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds

On Thu, Nov 02, 2017 at 12:58:59AM -0700, Andy Lutomirski wrote:
> These code paths will diverge soon.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/entry/entry_64.S        | 34 +++++++++++++++++++++++++---------
>  arch/x86/entry/entry_64_compat.S |  2 +-
>  arch/x86/kernel/head_64.S        |  2 +-
>  3 files changed, 27 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index afe1f403fa0e..07fe816f0d28 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -321,7 +321,7 @@ syscall_return_via_sysret:
>  
>  opportunistic_sysret_failed:
>  	SWAPGS
> -	jmp	restore_regs_and_iret
> +	jmp	restore_regs_and_return_to_usermode
>  END(entry_SYSCALL_64)
>  
>  ENTRY(stub_ptregs_64)
> @@ -423,7 +423,7 @@ ENTRY(ret_from_fork)
>  	call	syscall_return_slowpath	/* returns with IRQs disabled */
>  	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
>  	SWAPGS
> -	jmp	restore_regs_and_iret
> +	jmp	restore_regs_and_return_to_usermode
>  
>  1:
>  	/* kernel thread */
> @@ -612,7 +612,20 @@ GLOBAL(retint_user)
>  	call	prepare_exit_to_usermode
>  	TRACE_IRQS_IRETQ
>  	SWAPGS
> -	jmp	restore_regs_and_iret
> +
> +GLOBAL(restore_regs_and_return_to_usermode)
> +#ifdef CONFIG_DEBUG_ENTRY
> +	/* Assert that pt_regs indicates user mode. */
> +	testl	$3, CS(%rsp)
> +	jnz	1f
> +	ud2
> +1:
> +#endif

Here's me arguing in v2:

If these paths are slow and adding a TEST and a Jcc would give us the
additional sanity-checking, then I don't see any downside to it.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/entry/64: Simplify reg restore code in the standard IRET paths
  2017-11-02  7:59 ` [PATCH v2 04/20] x86/asm/64: Simplify reg restore code in the standard IRET paths Andy Lutomirski
@ 2017-11-02 10:50   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, luto, bpetkov, mingo, tglx, peterz, torvalds, brgerst,
	linux-kernel, dave.hansen

Commit-ID:  e872045bfd9c465a8555bab4b8567d56a4d2d3bb
Gitweb:     https://git.kernel.org/tip/e872045bfd9c465a8555bab4b8567d56a4d2d3bb
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:59:01 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:38 +0100

x86/entry/64: Simplify reg restore code in the standard IRET paths

The old code restored all the registers with movq instead of pop.

In theory, this was done because some CPUs have higher movq
throughput, but any gain there would be tiny and is almost certainly
outweighed by the higher text size.

This saves 96 bytes of text.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ad82520a207ccd851b04ba613f4f752b33ac05f7.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/calling.h  | 21 +++++++++++++++++++++
 arch/x86/entry/entry_64.S | 12 ++++++------
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 640aafe..0b9dd81 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -151,6 +151,27 @@ For 32-bit we have the following conventions - kernel is built with
 	UNWIND_HINT_REGS offset=\offset extra=0
 	.endm
 
+	.macro POP_EXTRA_REGS
+	popq %r15
+	popq %r14
+	popq %r13
+	popq %r12
+	popq %rbp
+	popq %rbx
+	.endm
+
+	.macro POP_C_REGS
+	popq %r11
+	popq %r10
+	popq %r9
+	popq %r8
+	popq %rax
+	popq %rcx
+	popq %rdx
+	popq %rsi
+	popq %rdi
+	.endm
+
 	.macro RESTORE_C_REGS_HELPER rstor_rax=1, rstor_rcx=1, rstor_r11=1, rstor_r8910=1, rstor_rdx=1
 	.if \rstor_r11
 	movq 6*8(%rsp), %r11
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index d6ffdc9..925d562 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -618,9 +618,9 @@ GLOBAL(swapgs_restore_regs_and_return_to_usermode)
 1:
 #endif
 	SWAPGS
-	RESTORE_EXTRA_REGS
-	RESTORE_C_REGS
-	REMOVE_PT_GPREGS_FROM_STACK 8
+	POP_EXTRA_REGS
+	POP_C_REGS
+	addq	$8, %rsp	/* skip regs->orig_ax */
 	INTERRUPT_RETURN
 
 
@@ -650,9 +650,9 @@ GLOBAL(restore_regs_and_return_to_kernel)
 	ud2
 1:
 #endif
-	RESTORE_EXTRA_REGS
-	RESTORE_C_REGS
-	REMOVE_PT_GPREGS_FROM_STACK 8
+	POP_EXTRA_REGS
+	POP_C_REGS
+	addq	$8, %rsp	/* skip regs->orig_ax */
 	INTERRUPT_RETURN
 
 ENTRY(native_iret)

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/entry/64: Shrink paranoid_exit_restore and make labels local
  2017-11-02  7:59 ` [PATCH v2 05/20] x86/asm/64: Shrink paranoid_exit_restore and make labels local Andy Lutomirski
@ 2017-11-02 10:50   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, tglx, linux-kernel, dave.hansen, hpa, bpetkov, brgerst,
	luto, torvalds, mingo

Commit-ID:  e53178328c9b96fbdbc719e78c93b5687ee007c3
Gitweb:     https://git.kernel.org/tip/e53178328c9b96fbdbc719e78c93b5687ee007c3
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:59:02 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:39 +0100

x86/entry/64: Shrink paranoid_exit_restore and make labels local

paranoid_exit_restore was a copy of restore_regs_and_return_to_kernel.
Merge them and make the paranoid_exit internal labels local.

Keeping .Lparanoid_exit makes the code a bit shorter because it
allows a 2-byte jnz instead of a 5-byte jnz.

Saves 96 bytes of text.

( This is still a bit suboptimal in a non-CONFIG_TRACE_IRQFLAGS
  kernel, but fixing that would make the code rather messy. )

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/510d66a1895cda9473c84b1086f0bb974f22de6a.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_64.S | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 925d562..15539644 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1123,17 +1123,14 @@ ENTRY(paranoid_exit)
 	DISABLE_INTERRUPTS(CLBR_ANY)
 	TRACE_IRQS_OFF_DEBUG
 	testl	%ebx, %ebx			/* swapgs needed? */
-	jnz	paranoid_exit_no_swapgs
+	jnz	.Lparanoid_exit_no_swapgs
 	TRACE_IRQS_IRETQ
 	SWAPGS_UNSAFE_STACK
-	jmp	paranoid_exit_restore
-paranoid_exit_no_swapgs:
+	jmp	.Lparanoid_exit_restore
+.Lparanoid_exit_no_swapgs:
 	TRACE_IRQS_IRETQ_DEBUG
-paranoid_exit_restore:
-	RESTORE_EXTRA_REGS
-	RESTORE_C_REGS
-	REMOVE_PT_GPREGS_FROM_STACK 8
-	INTERRUPT_RETURN
+.Lparanoid_exit_restore:
+	jmp restore_regs_and_return_to_kernel
 END(paranoid_exit)
 
 /*

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/entry/64: Use pop instead of movq in syscall_return_via_sysret
  2017-11-02  7:59 ` [PATCH v2 06/20] x86/asm/64: Use pop instead of movq in syscall_return_via_sysret Andy Lutomirski
@ 2017-11-02 10:51   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:51 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, brgerst, torvalds, hpa, luto, bpetkov, linux-kernel,
	peterz, dave.hansen, tglx, bp

Commit-ID:  4fbb39108f972437c44e5ffa781b56635d496826
Gitweb:     https://git.kernel.org/tip/4fbb39108f972437c44e5ffa781b56635d496826
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:59:03 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:39 +0100

x86/entry/64: Use pop instead of movq in syscall_return_via_sysret

Saves 64 bytes.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/6609b7f74ab31c36604ad746e019ea8495aec76c.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_64.S | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 15539644..4f9b446 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -315,10 +315,18 @@ return_from_SYSCALL_64:
 	 */
 syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
-	RESTORE_EXTRA_REGS
-	RESTORE_C_REGS_EXCEPT_RCX_R11
-	movq	RSP(%rsp), %rsp
 	UNWIND_HINT_EMPTY
+	POP_EXTRA_REGS
+	popq	%rsi	/* skip r11 */
+	popq	%r10
+	popq	%r9
+	popq	%r8
+	popq	%rax
+	popq	%rsi	/* skip rcx */
+	popq	%rdx
+	popq	%rsi
+	popq	%rdi
+	movq	RSP-ORIG_RAX(%rsp), %rsp
 	USERGS_SYSRET64
 END(entry_SYSCALL_64)
 

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/entry/64: Merge the fast and slow SYSRET paths
  2017-11-02  7:59 ` [PATCH v2 07/20] x86/asm/64: Merge the fast and slow SYSRET paths Andy Lutomirski
@ 2017-11-02 10:51   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:51 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave.hansen, peterz, hpa, mingo, tglx, luto, torvalds,
	linux-kernel, bpetkov, brgerst

Commit-ID:  a512210643da8082cb44181dba8b18e752bd68f0
Gitweb:     https://git.kernel.org/tip/a512210643da8082cb44181dba8b18e752bd68f0
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:59:04 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:40 +0100

x86/entry/64: Merge the fast and slow SYSRET paths

They did almost the same thing.  Remove a bunch of pointless
instructions (mostly hidden in macros) and reduce cognitive load by
merging them.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1204e20233fcab9130a1ba80b3b1879b5db3fc1f.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_64.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 4f9b446..b5a0ea6 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -220,10 +220,9 @@ entry_SYSCALL_64_fastpath:
 	TRACE_IRQS_ON		/* user mode is traced as IRQs on */
 	movq	RIP(%rsp), %rcx
 	movq	EFLAGS(%rsp), %r11
-	RESTORE_C_REGS_EXCEPT_RCX_R11
-	movq	RSP(%rsp), %rsp
+	addq	$6*8, %rsp	/* skip extra regs -- they were preserved */
 	UNWIND_HINT_EMPTY
-	USERGS_SYSRET64
+	jmp	.Lpop_c_regs_except_rcx_r11_and_sysret
 
 1:
 	/*
@@ -317,6 +316,7 @@ syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
 	UNWIND_HINT_EMPTY
 	POP_EXTRA_REGS
+.Lpop_c_regs_except_rcx_r11_and_sysret:
 	popq	%rsi	/* skip r11 */
 	popq	%r10
 	popq	%r9

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/entry/64: Use POP instead of MOV to restore regs on NMI return
  2017-11-02  7:59 ` [PATCH v2 08/20] x86/entry/64: Use POP instead of MOV to restore regs on NMI return Andy Lutomirski
@ 2017-11-02 10:51   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:51 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, hpa, tglx, torvalds, linux-kernel, luto, brgerst, mingo,
	bpetkov, dave.hansen

Commit-ID:  471ee4832209e986029b9fabdaad57b1eecb856b
Gitweb:     https://git.kernel.org/tip/471ee4832209e986029b9fabdaad57b1eecb856b
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:59:05 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:40 +0100

x86/entry/64: Use POP instead of MOV to restore regs on NMI return

This gets rid of the last user of the old RESTORE_..._REGS infrastructure.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/652a260f17a160789bc6a41d997f98249b73e2ab.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_64.S | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index b5a0ea6..5b2f0bc 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1559,11 +1559,14 @@ end_repeat_nmi:
 nmi_swapgs:
 	SWAPGS_UNSAFE_STACK
 nmi_restore:
-	RESTORE_EXTRA_REGS
-	RESTORE_C_REGS
+	POP_EXTRA_REGS
+	POP_C_REGS
 
-	/* Point RSP at the "iret" frame. */
-	REMOVE_PT_GPREGS_FROM_STACK 6*8
+	/*
+	 * Skip orig_ax and the "outermost" frame to point RSP at the "iret"
+	 * at the "iret" frame.
+	 */
+	addq	$6*8, %rsp
 
 	/*
 	 * Clear "NMI executing".  Set DF first so that we can easily

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/entry/64: Remove the RESTORE_..._REGS infrastructure
  2017-11-02  7:59 ` [PATCH v2 09/20] x86/entry/64: Remove the RESTORE_..._REGS infrastructure Andy Lutomirski
@ 2017-11-02 10:52   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:52 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, brgerst, dave.hansen, luto, tglx, torvalds, hpa,
	linux-kernel, bpetkov, peterz

Commit-ID:  c39858de696f0cc160a544455e8403d663d577e9
Gitweb:     https://git.kernel.org/tip/c39858de696f0cc160a544455e8403d663d577e9
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:59:06 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:41 +0100

x86/entry/64: Remove the RESTORE_..._REGS infrastructure

All users of RESTORE_EXTRA_REGS, RESTORE_C_REGS and such, and
REMOVE_PT_GPREGS_FROM_STACK are gone.  Delete the macros.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c32672f6e47c561893316d48e06c7656b1039a36.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/calling.h | 52 ------------------------------------------------
 1 file changed, 52 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 0b9dd81..1895a68 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -141,16 +141,6 @@ For 32-bit we have the following conventions - kernel is built with
 	UNWIND_HINT_REGS offset=\offset
 	.endm
 
-	.macro RESTORE_EXTRA_REGS offset=0
-	movq 0*8+\offset(%rsp), %r15
-	movq 1*8+\offset(%rsp), %r14
-	movq 2*8+\offset(%rsp), %r13
-	movq 3*8+\offset(%rsp), %r12
-	movq 4*8+\offset(%rsp), %rbp
-	movq 5*8+\offset(%rsp), %rbx
-	UNWIND_HINT_REGS offset=\offset extra=0
-	.endm
-
 	.macro POP_EXTRA_REGS
 	popq %r15
 	popq %r14
@@ -172,48 +162,6 @@ For 32-bit we have the following conventions - kernel is built with
 	popq %rdi
 	.endm
 
-	.macro RESTORE_C_REGS_HELPER rstor_rax=1, rstor_rcx=1, rstor_r11=1, rstor_r8910=1, rstor_rdx=1
-	.if \rstor_r11
-	movq 6*8(%rsp), %r11
-	.endif
-	.if \rstor_r8910
-	movq 7*8(%rsp), %r10
-	movq 8*8(%rsp), %r9
-	movq 9*8(%rsp), %r8
-	.endif
-	.if \rstor_rax
-	movq 10*8(%rsp), %rax
-	.endif
-	.if \rstor_rcx
-	movq 11*8(%rsp), %rcx
-	.endif
-	.if \rstor_rdx
-	movq 12*8(%rsp), %rdx
-	.endif
-	movq 13*8(%rsp), %rsi
-	movq 14*8(%rsp), %rdi
-	UNWIND_HINT_IRET_REGS offset=16*8
-	.endm
-	.macro RESTORE_C_REGS
-	RESTORE_C_REGS_HELPER 1,1,1,1,1
-	.endm
-	.macro RESTORE_C_REGS_EXCEPT_RAX
-	RESTORE_C_REGS_HELPER 0,1,1,1,1
-	.endm
-	.macro RESTORE_C_REGS_EXCEPT_RCX
-	RESTORE_C_REGS_HELPER 1,0,1,1,1
-	.endm
-	.macro RESTORE_C_REGS_EXCEPT_R11
-	RESTORE_C_REGS_HELPER 1,1,0,1,1
-	.endm
-	.macro RESTORE_C_REGS_EXCEPT_RCX_R11
-	RESTORE_C_REGS_HELPER 1,0,0,1,1
-	.endm
-
-	.macro REMOVE_PT_GPREGS_FROM_STACK addskip=0
-	subq $-(15*8+\addskip), %rsp
-	.endm
-
 	.macro icebp
 	.byte 0xf1
 	.endm

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] xen, x86/entry/64: Add xen NMI trap entry
  2017-11-02  7:59 ` [PATCH v2 10/20] xen: add xen nmi trap entry Andy Lutomirski
@ 2017-11-02 10:52   ` tip-bot for Juergen Gross
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Juergen Gross @ 2017-11-02 10:52 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, luto, bpetkov, tglx, dave.hansen, jgross, linux-kernel,
	peterz, brgerst, hpa, torvalds

Commit-ID:  43e4111086a70c78bedb6ad990bee97f17b27a6e
Gitweb:     https://git.kernel.org/tip/43e4111086a70c78bedb6ad990bee97f17b27a6e
Author:     Juergen Gross <jgross@suse.com>
AuthorDate: Thu, 2 Nov 2017 00:59:07 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:42 +0100

xen, x86/entry/64: Add xen NMI trap entry

Instead of trying to execute any NMI via the bare metal's NMI trap
handler use a Xen specific one for PV domains, like we do for e.g.
debug traps. As in a PV domain the NMI is handled via the normal
kernel stack this is the correct thing to do.

This will enable us to get rid of the very fragile and questionable
dependencies between the bare metal NMI handler and Xen assumptions
believed to be broken anyway.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/5baf5c0528d58402441550c5770b98e7961e7680.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_64.S    | 2 +-
 arch/x86/include/asm/traps.h | 2 +-
 arch/x86/xen/enlighten_pv.c  | 2 +-
 arch/x86/xen/xen-asm_64.S    | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 5b2f0bc..a3f76ab 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1078,6 +1078,7 @@ idtentry int3			do_int3			has_error_code=0	paranoid=1 shift_ist=DEBUG_STACK
 idtentry stack_segment		do_stack_segment	has_error_code=1
 
 #ifdef CONFIG_XEN
+idtentry xennmi			do_nmi			has_error_code=0
 idtentry xendebug		do_debug		has_error_code=0
 idtentry xenint3		do_int3			has_error_code=0
 #endif
@@ -1240,7 +1241,6 @@ ENTRY(error_exit)
 END(error_exit)
 
 /* Runs on exception stack */
-/* XXX: broken on Xen PV */
 ENTRY(nmi)
 	UNWIND_HINT_IRET_REGS
 	/*
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index da3c3a3..e76ce80 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -37,9 +37,9 @@ asmlinkage void simd_coprocessor_error(void);
 
 #if defined(CONFIG_X86_64) && defined(CONFIG_XEN_PV)
 asmlinkage void xen_divide_error(void);
+asmlinkage void xen_xennmi(void);
 asmlinkage void xen_xendebug(void);
 asmlinkage void xen_xenint3(void);
-asmlinkage void xen_nmi(void);
 asmlinkage void xen_overflow(void);
 asmlinkage void xen_bounds(void);
 asmlinkage void xen_invalid_op(void);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 69b9def..8da4eff 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -600,7 +600,7 @@ static struct trap_array_entry trap_array[] = {
 #ifdef CONFIG_X86_MCE
 	{ machine_check,               xen_machine_check,               true },
 #endif
-	{ nmi,                         xen_nmi,                         true },
+	{ nmi,                         xen_xennmi,                      true },
 	{ overflow,                    xen_overflow,                    false },
 #ifdef CONFIG_IA32_EMULATION
 	{ entry_INT80_compat,          xen_entry_INT80_compat,          false },
diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index dae2cc3..286ecc1 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -29,7 +29,7 @@ xen_pv_trap debug
 xen_pv_trap xendebug
 xen_pv_trap int3
 xen_pv_trap xenint3
-xen_pv_trap nmi
+xen_pv_trap xennmi
 xen_pv_trap overflow
 xen_pv_trap bounds
 xen_pv_trap invalid_op

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/entry/64: De-Xen-ify our NMI code
  2017-11-02  7:59 ` [PATCH v2 11/20] x86/asm/64: De-Xen-ify our NMI code Andy Lutomirski
@ 2017-11-02 10:53   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: boris.ostrovsky, torvalds, peterz, hpa, bpetkov, bp, mingo,
	brgerst, luto, linux-kernel, dave.hansen, tglx, jgross

Commit-ID:  929bacec21478a72c78e4f29f98fb799bd00105a
Gitweb:     https://git.kernel.org/tip/929bacec21478a72c78e4f29f98fb799bd00105a
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:59:08 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:42 +0100

x86/entry/64: De-Xen-ify our NMI code

Xen PV is fundamentally incompatible with our fancy NMI code: it
doesn't use IST at all, and Xen entries clobber two stack slots
below the hardware frame.

Drop Xen PV support from our NMI code entirely.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/bfbe711b5ae03f672f8848999a8eb2711efc7f98.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_64.S | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a3f76ab..40e9933 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1240,9 +1240,13 @@ ENTRY(error_exit)
 	jmp	retint_user
 END(error_exit)
 
-/* Runs on exception stack */
+/*
+ * Runs on exception stack.  Xen PV does not go through this path at all,
+ * so we can use real assembly here.
+ */
 ENTRY(nmi)
 	UNWIND_HINT_IRET_REGS
+
 	/*
 	 * We allow breakpoints in NMIs. If a breakpoint occurs, then
 	 * the iretq it performs will take us out of NMI context.
@@ -1300,7 +1304,7 @@ ENTRY(nmi)
 	 * stacks lest we corrupt the "NMI executing" variable.
 	 */
 
-	SWAPGS_UNSAFE_STACK
+	swapgs
 	cld
 	movq	%rsp, %rdx
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
@@ -1465,7 +1469,7 @@ nested_nmi_out:
 	popq	%rdx
 
 	/* We are returning to kernel mode, so this cannot result in a fault. */
-	INTERRUPT_RETURN
+	iretq
 
 first_nmi:
 	/* Restore rdx. */
@@ -1496,7 +1500,7 @@ first_nmi:
 	pushfq			/* RFLAGS */
 	pushq	$__KERNEL_CS	/* CS */
 	pushq	$1f		/* RIP */
-	INTERRUPT_RETURN	/* continues at repeat_nmi below */
+	iretq			/* continues at repeat_nmi below */
 	UNWIND_HINT_IRET_REGS
 1:
 #endif
@@ -1571,20 +1575,22 @@ nmi_restore:
 	/*
 	 * Clear "NMI executing".  Set DF first so that we can easily
 	 * distinguish the remaining code between here and IRET from
-	 * the SYSCALL entry and exit paths.  On a native kernel, we
-	 * could just inspect RIP, but, on paravirt kernels,
-	 * INTERRUPT_RETURN can translate into a jump into a
-	 * hypercall page.
+	 * the SYSCALL entry and exit paths.
+	 *
+	 * We arguably should just inspect RIP instead, but I (Andy) wrote
+	 * this code when I had the misapprehension that Xen PV supported
+	 * NMIs, and Xen PV would break that approach.
 	 */
 	std
 	movq	$0, 5*8(%rsp)		/* clear "NMI executing" */
 
 	/*
-	 * INTERRUPT_RETURN reads the "iret" frame and exits the NMI
-	 * stack in a single instruction.  We are returning to kernel
-	 * mode, so this cannot result in a fault.
+	 * iretq reads the "iret" frame and exits the NMI stack in a
+	 * single instruction.  We are returning to kernel mode, so this
+	 * cannot result in a fault.  Similarly, we don't need to worry
+	 * about espfix64 on the way back to kernel mode.
 	 */
-	INTERRUPT_RETURN
+	iretq
 END(nmi)
 
 ENTRY(ignore_sysret)

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/entry/32: Pull the MSR_IA32_SYSENTER_CS update code out of native_load_sp0()
  2017-11-02  7:59 ` [PATCH v2 12/20] x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of native_load_sp0() Andy Lutomirski
@ 2017-11-02 10:53   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: brgerst, linux-kernel, torvalds, bpetkov, tglx, mingo, luto, hpa,
	dave.hansen, peterz

Commit-ID:  bd7dc5a6afac719d8ce4092391eef2c7e83c2a75
Gitweb:     https://git.kernel.org/tip/bd7dc5a6afac719d8ce4092391eef2c7e83c2a75
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:59:09 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:43 +0100

x86/entry/32: Pull the MSR_IA32_SYSENTER_CS update code out of native_load_sp0()

This causes the MSR_IA32_SYSENTER_CS write to move out of the
paravirt callback.  This shouldn't affect Xen PV: Xen already ignores
MSR_IA32_SYSENTER_ESP writes.  In any event, Xen doesn't support
vm86() in a useful way.

Note to any potential backporters: This patch won't break lguest, as
lguest didn't have any SYSENTER support at all.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/75cf09fe03ae778532d0ca6c65aa58e66bc2f90c.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/processor.h |  7 -------
 arch/x86/include/asm/switch_to.h | 12 ++++++++++++
 arch/x86/kernel/process_32.c     |  4 +++-
 arch/x86/kernel/process_64.c     |  2 +-
 arch/x86/kernel/vm86_32.c        |  6 +++++-
 5 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index b390ff7..0167e3e 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -520,13 +520,6 @@ static inline void
 native_load_sp0(struct tss_struct *tss, struct thread_struct *thread)
 {
 	tss->x86_tss.sp0 = thread->sp0;
-#ifdef CONFIG_X86_32
-	/* Only happens when SEP is enabled, no need to test "SEP"arately: */
-	if (unlikely(tss->x86_tss.ss1 != thread->sysenter_cs)) {
-		tss->x86_tss.ss1 = thread->sysenter_cs;
-		wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0);
-	}
-#endif
 }
 
 static inline void native_swapgs(void)
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index fcc5cd3..7ae8caf 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -72,4 +72,16 @@ do {									\
 	((last) = __switch_to_asm((prev), (next)));			\
 } while (0)
 
+#ifdef CONFIG_X86_32
+static inline void refresh_sysenter_cs(struct thread_struct *thread)
+{
+	/* Only happens when SEP is enabled, no need to test "SEP"arately: */
+	if (unlikely(this_cpu_read(cpu_tss.x86_tss.ss1) == thread->sysenter_cs))
+		return;
+
+	this_cpu_write(cpu_tss.x86_tss.ss1, thread->sysenter_cs);
+	wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0);
+}
+#endif
+
 #endif /* _ASM_X86_SWITCH_TO_H */
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 1196625..0936ed3 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -284,9 +284,11 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	/*
 	 * Reload esp0 and cpu_current_top_of_stack.  This changes
-	 * current_thread_info().
+	 * current_thread_info().  Refresh the SYSENTER configuration in
+	 * case prev or next is vm86.
 	 */
 	load_sp0(tss, next);
+	refresh_sysenter_cs(next);
 	this_cpu_write(cpu_current_top_of_stack,
 		       (unsigned long)task_stack_page(next_p) +
 		       THREAD_SIZE);
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 302e7b2..a6ff6d1 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -464,7 +464,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 */
 	this_cpu_write(current_task, next_p);
 
-	/* Reload esp0 and ss1.  This changes current_thread_info(). */
+	/* Reload sp0. */
 	load_sp0(tss, next);
 
 	/*
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 7924a53..5bc1c3a 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -54,6 +54,7 @@
 #include <asm/irq.h>
 #include <asm/traps.h>
 #include <asm/vm86.h>
+#include <asm/switch_to.h>
 
 /*
  * Known problems:
@@ -149,6 +150,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 	tsk->thread.sp0 = vm86->saved_sp0;
 	tsk->thread.sysenter_cs = __KERNEL_CS;
 	load_sp0(tss, &tsk->thread);
+	refresh_sysenter_cs(&tsk->thread);
 	vm86->saved_sp0 = 0;
 	put_cpu();
 
@@ -368,8 +370,10 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 	/* make room for real-mode segments */
 	tsk->thread.sp0 += 16;
 
-	if (static_cpu_has(X86_FEATURE_SEP))
+	if (static_cpu_has(X86_FEATURE_SEP)) {
 		tsk->thread.sysenter_cs = 0;
+		refresh_sysenter_cs(&tsk->thread);
+	}
 
 	load_sp0(tss, &tsk->thread);
 	put_cpu();

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/entry/64: Pass SP0 directly to load_sp0()
  2017-11-02  7:59 ` [PATCH v2 13/20] x86/asm/64: Pass sp0 directly to load_sp0() Andy Lutomirski
  2017-11-02  9:48   ` Ingo Molnar
@ 2017-11-02 10:53   ` tip-bot for Andy Lutomirski
  1 sibling, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: brgerst, bp, dave.hansen, peterz, mingo, luto, tglx, hpa,
	bpetkov, torvalds, linux-kernel

Commit-ID:  da51da189a24bb9b7e2d5a123be096e51a4695a5
Gitweb:     https://git.kernel.org/tip/da51da189a24bb9b7e2d5a123be096e51a4695a5
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:59:10 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:44 +0100

x86/entry/64: Pass SP0 directly to load_sp0()

load_sp0() had an odd signature:

  void load_sp0(struct tss_struct *tss, struct thread_struct *thread);

Simplify it to:

  void load_sp0(unsigned long sp0);

Also simplify a few get_cpu()/put_cpu() sequences to
preempt_disable()/preempt_enable().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2655d8b42ed940aa384fe18ee1129bbbcf730a08.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/paravirt.h       |  5 ++---
 arch/x86/include/asm/paravirt_types.h |  2 +-
 arch/x86/include/asm/processor.h      |  9 ++++-----
 arch/x86/kernel/cpu/common.c          |  4 ++--
 arch/x86/kernel/process_32.c          |  2 +-
 arch/x86/kernel/process_64.c          |  2 +-
 arch/x86/kernel/vm86_32.c             | 14 ++++++--------
 arch/x86/xen/enlighten_pv.c           |  7 +++----
 8 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 12deec7..43d4f90 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -15,10 +15,9 @@
 #include <linux/cpumask.h>
 #include <asm/frame.h>
 
-static inline void load_sp0(struct tss_struct *tss,
-			     struct thread_struct *thread)
+static inline void load_sp0(unsigned long sp0)
 {
-	PVOP_VCALL2(pv_cpu_ops.load_sp0, tss, thread);
+	PVOP_VCALL1(pv_cpu_ops.load_sp0, sp0);
 }
 
 /* The paravirtualized CPUID instruction. */
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 280d94c..a916788 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -133,7 +133,7 @@ struct pv_cpu_ops {
 	void (*alloc_ldt)(struct desc_struct *ldt, unsigned entries);
 	void (*free_ldt)(struct desc_struct *ldt, unsigned entries);
 
-	void (*load_sp0)(struct tss_struct *tss, struct thread_struct *t);
+	void (*load_sp0)(unsigned long sp0);
 
 	void (*set_iopl_mask)(unsigned mask);
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 0167e3e..064b847 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -517,9 +517,9 @@ static inline void native_set_iopl_mask(unsigned mask)
 }
 
 static inline void
-native_load_sp0(struct tss_struct *tss, struct thread_struct *thread)
+native_load_sp0(unsigned long sp0)
 {
-	tss->x86_tss.sp0 = thread->sp0;
+	this_cpu_write(cpu_tss.x86_tss.sp0, sp0);
 }
 
 static inline void native_swapgs(void)
@@ -544,10 +544,9 @@ static inline unsigned long current_top_of_stack(void)
 #else
 #define __cpuid			native_cpuid
 
-static inline void load_sp0(struct tss_struct *tss,
-			    struct thread_struct *thread)
+static inline void load_sp0(unsigned long sp0)
 {
-	native_load_sp0(tss, thread);
+	native_load_sp0(sp0);
 }
 
 #define set_iopl_mask native_set_iopl_mask
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 03bb004..4e7fb9c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1570,7 +1570,7 @@ void cpu_init(void)
 	initialize_tlbstate_and_flush();
 	enter_lazy_tlb(&init_mm, me);
 
-	load_sp0(t, &current->thread);
+	load_sp0(current->thread.sp0);
 	set_tss_desc(cpu, t);
 	load_TR_desc();
 	load_mm_ldt(&init_mm);
@@ -1625,7 +1625,7 @@ void cpu_init(void)
 	initialize_tlbstate_and_flush();
 	enter_lazy_tlb(&init_mm, curr);
 
-	load_sp0(t, thread);
+	load_sp0(thread->sp0);
 	set_tss_desc(cpu, t);
 	load_TR_desc();
 	load_mm_ldt(&init_mm);
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 0936ed3..40b8587 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -287,7 +287,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 * current_thread_info().  Refresh the SYSENTER configuration in
 	 * case prev or next is vm86.
 	 */
-	load_sp0(tss, next);
+	load_sp0(next->sp0);
 	refresh_sysenter_cs(next);
 	this_cpu_write(cpu_current_top_of_stack,
 		       (unsigned long)task_stack_page(next_p) +
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index a6ff6d1..2124304 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -465,7 +465,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(current_task, next_p);
 
 	/* Reload sp0. */
-	load_sp0(tss, next);
+	load_sp0(next->sp0);
 
 	/*
 	 * Now maybe reload the debug registers and handle I/O bitmaps
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 5bc1c3a..0f1d92c 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -94,7 +94,6 @@
 
 void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 {
-	struct tss_struct *tss;
 	struct task_struct *tsk = current;
 	struct vm86plus_struct __user *user;
 	struct vm86 *vm86 = current->thread.vm86;
@@ -146,13 +145,13 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 		do_exit(SIGSEGV);
 	}
 
-	tss = &per_cpu(cpu_tss, get_cpu());
+	preempt_disable();
 	tsk->thread.sp0 = vm86->saved_sp0;
 	tsk->thread.sysenter_cs = __KERNEL_CS;
-	load_sp0(tss, &tsk->thread);
+	load_sp0(tsk->thread.sp0);
 	refresh_sysenter_cs(&tsk->thread);
 	vm86->saved_sp0 = 0;
-	put_cpu();
+	preempt_enable();
 
 	memcpy(&regs->pt, &vm86->regs32, sizeof(struct pt_regs));
 
@@ -238,7 +237,6 @@ SYSCALL_DEFINE2(vm86, unsigned long, cmd, unsigned long, arg)
 
 static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 {
-	struct tss_struct *tss;
 	struct task_struct *tsk = current;
 	struct vm86 *vm86 = tsk->thread.vm86;
 	struct kernel_vm86_regs vm86regs;
@@ -366,8 +364,8 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 	vm86->saved_sp0 = tsk->thread.sp0;
 	lazy_save_gs(vm86->regs32.gs);
 
-	tss = &per_cpu(cpu_tss, get_cpu());
 	/* make room for real-mode segments */
+	preempt_disable();
 	tsk->thread.sp0 += 16;
 
 	if (static_cpu_has(X86_FEATURE_SEP)) {
@@ -375,8 +373,8 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 		refresh_sysenter_cs(&tsk->thread);
 	}
 
-	load_sp0(tss, &tsk->thread);
-	put_cpu();
+	load_sp0(tsk->thread.sp0);
+	preempt_enable();
 
 	if (vm86->flags & VM86_SCREEN_BITMAP)
 		mark_screen_rdonly(tsk->mm);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 8da4eff..e7b2130 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -810,15 +810,14 @@ static void __init xen_write_gdt_entry_boot(struct desc_struct *dt, int entry,
 	}
 }
 
-static void xen_load_sp0(struct tss_struct *tss,
-			 struct thread_struct *thread)
+static void xen_load_sp0(unsigned long sp0)
 {
 	struct multicall_space mcs;
 
 	mcs = xen_mc_entry(0);
-	MULTI_stack_switch(mcs.mc, __KERNEL_DS, thread->sp0);
+	MULTI_stack_switch(mcs.mc, __KERNEL_DS, sp0);
 	xen_mc_issue(PARAVIRT_LAZY_CPU);
-	tss->x86_tss.sp0 = thread->sp0;
+	this_cpu_write(cpu_tss.x86_tss.sp0, sp0);
 }
 
 void xen_set_iopl_mask(unsigned mask)

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/entry: Add task_top_of_stack() to find the top of a task's stack
  2017-11-02  7:59 ` [PATCH v2 14/20] x86/asm: Add task_top_of_stack() to find the top of a task's stack Andy Lutomirski
@ 2017-11-02 10:54   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:54 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bpetkov, brgerst, linux-kernel, peterz, mingo, hpa, luto,
	dave.hansen, tglx, torvalds

Commit-ID:  3500130b84a3cdc5b6796eba1daf178944935efe
Gitweb:     https://git.kernel.org/tip/3500130b84a3cdc5b6796eba1daf178944935efe
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:59:11 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:44 +0100

x86/entry: Add task_top_of_stack() to find the top of a task's stack

This will let us get rid of a few places that hardcode accesses to
thread.sp0.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/b49b3f95a8ff858c40c9b0f5b32be0355324327d.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/processor.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 064b847..ad59cec 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -795,6 +795,8 @@ static inline void spin_lock_prefetch(const void *x)
 #define TOP_OF_INIT_STACK ((unsigned long)&init_stack + sizeof(init_stack) - \
 			   TOP_OF_KERNEL_STACK_PADDING)
 
+#define task_top_of_stack(task) ((unsigned long)(task_pt_regs(task) + 1))
+
 #ifdef CONFIG_X86_32
 /*
  * User space process size: 3GB (default).

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/xen/64, x86/entry/64: Clean up SP code in cpu_initialize_context()
  2017-11-02  7:59 ` [PATCH v2 15/20] x86/xen/64: Clean up SP code in cpu_initialize_context() Andy Lutomirski
  2017-11-02  9:56   ` Juergen Gross
@ 2017-11-02 10:54   ` tip-bot for Andy Lutomirski
  1 sibling, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:54 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, linux-kernel, bpetkov, hpa, mingo, tglx, torvalds, luto,
	brgerst, jgross, boris.ostrovsky, dave.hansen

Commit-ID:  f16b3da1dc936c0f8121741d0a1731bf242f2f56
Gitweb:     https://git.kernel.org/tip/f16b3da1dc936c0f8121741d0a1731bf242f2f56
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:59:12 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:45 +0100

x86/xen/64, x86/entry/64: Clean up SP code in cpu_initialize_context()

I'm removing thread_struct::sp0, and Xen's usage of it is slightly
dubious and unnecessary.  Use appropriate helpers instead.

While we're at at, reorder the code slightly to make it more obvious
what's going on.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/d5b9a3da2b47c68325bd2bbe8f82d9554dee0d0f.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/xen/smp_pv.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 5147140..8c0e047 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -13,6 +13,7 @@
  * single-threaded.
  */
 #include <linux/sched.h>
+#include <linux/sched/task_stack.h>
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/smp.h>
@@ -293,12 +294,19 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 #endif
 	memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
+	/*
+	 * Bring up the CPU in cpu_bringup_and_idle() with the stack
+	 * pointing just below where pt_regs would be if it were a normal
+	 * kernel entry.
+	 */
 	ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
 	ctxt->flags = VGCF_IN_KERNEL;
 	ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
 	ctxt->user_regs.ds = __USER_DS;
 	ctxt->user_regs.es = __USER_DS;
 	ctxt->user_regs.ss = __KERNEL_DS;
+	ctxt->user_regs.cs = __KERNEL_CS;
+	ctxt->user_regs.esp = (unsigned long)task_pt_regs(idle);
 
 	xen_copy_trap_info(ctxt->trap_ctxt);
 
@@ -313,8 +321,13 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 	ctxt->gdt_frames[0] = gdt_mfn;
 	ctxt->gdt_ents      = GDT_ENTRIES;
 
+	/*
+	 * Set SS:SP that Xen will use when entering guest kernel mode
+	 * from guest user mode.  Subsequent calls to load_sp0() can
+	 * change this value.
+	 */
 	ctxt->kernel_ss = __KERNEL_DS;
-	ctxt->kernel_sp = idle->thread.sp0;
+	ctxt->kernel_sp = task_top_of_stack(idle);
 
 #ifdef CONFIG_X86_32
 	ctxt->event_callback_cs     = __KERNEL_CS;
@@ -326,10 +339,8 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 		(unsigned long)xen_hypervisor_callback;
 	ctxt->failsafe_callback_eip =
 		(unsigned long)xen_failsafe_callback;
-	ctxt->user_regs.cs = __KERNEL_CS;
 	per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
 
-	ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
 	ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_gfn(swapper_pg_dir));
 	if (HYPERVISOR_vcpu_op(VCPUOP_initialise, xen_vcpu_nr(cpu), ctxt))
 		BUG();

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/entry/64: Stop initializing TSS.sp0 at boot
  2017-11-02  7:59 ` [PATCH v2 16/20] x86/boot/64: Stop initializing TSS.sp0 at boot Andy Lutomirski
@ 2017-11-02 10:55   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, linux-kernel, hpa, peterz, tglx, luto, bpetkov,
	dave.hansen, torvalds, brgerst

Commit-ID:  20bb83443ea79087b5e5f8dab4e9d80bb9bf7acb
Gitweb:     https://git.kernel.org/tip/20bb83443ea79087b5e5f8dab4e9d80bb9bf7acb
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:59:13 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:46 +0100

x86/entry/64: Stop initializing TSS.sp0 at boot

In my quest to get rid of thread_struct::sp0, I want to clean up or
remove all of its readers.  Two of them are in cpu_init() (32-bit and
64-bit), and they aren't needed.  This is because we never enter
userspace at all on the threads that CPUs are initialized in.

Poison the initial TSS.sp0 and stop initializing it on CPU init.

The comment text mostly comes from Dave Hansen.  Thanks!

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ee4a00540ad28c6cff475fbcc7769a4460acc861.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/common.c | 13 ++++++++++---
 arch/x86/kernel/process.c    |  8 +++++++-
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 4e7fb9c..cdf79ab 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1570,9 +1570,13 @@ void cpu_init(void)
 	initialize_tlbstate_and_flush();
 	enter_lazy_tlb(&init_mm, me);
 
-	load_sp0(current->thread.sp0);
+	/*
+	 * Initialize the TSS.  Don't bother initializing sp0, as the initial
+	 * task never enters user mode.
+	 */
 	set_tss_desc(cpu, t);
 	load_TR_desc();
+
 	load_mm_ldt(&init_mm);
 
 	clear_all_debug_regs();
@@ -1594,7 +1598,6 @@ void cpu_init(void)
 	int cpu = smp_processor_id();
 	struct task_struct *curr = current;
 	struct tss_struct *t = &per_cpu(cpu_tss, cpu);
-	struct thread_struct *thread = &curr->thread;
 
 	wait_for_master_cpu(cpu);
 
@@ -1625,9 +1628,13 @@ void cpu_init(void)
 	initialize_tlbstate_and_flush();
 	enter_lazy_tlb(&init_mm, curr);
 
-	load_sp0(thread->sp0);
+	/*
+	 * Initialize the TSS.  Don't bother initializing sp0, as the initial
+	 * task never enters user mode.
+	 */
 	set_tss_desc(cpu, t);
 	load_TR_desc();
+
 	load_mm_ldt(&init_mm);
 
 	t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index bd6b85f..ff8a9ac 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -48,7 +48,13 @@
  */
 __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
 	.x86_tss = {
-		.sp0 = TOP_OF_INIT_STACK,
+		/*
+		 * .sp0 is only used when entering ring 0 from a lower
+		 * privilege level.  Since the init task never runs anything
+		 * but ring 0 code, there is no need for a valid value here.
+		 * Poison it.
+		 */
+		.sp0 = (1UL << (BITS_PER_LONG-1)) + 1,
 #ifdef CONFIG_X86_32
 		.ss0 = __KERNEL_DS,
 		.ss1 = __KERNEL_CS,

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/entry/64: Remove all remaining direct thread_struct::sp0 reads
  2017-11-02  7:59 ` [PATCH v2 17/20] x86/asm/64: Remove all remaining direct thread_struct::sp0 reads Andy Lutomirski
@ 2017-11-02 10:55   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, dave.hansen, hpa, brgerst, tglx, bpetkov, linux-kernel,
	torvalds, luto, peterz, bp

Commit-ID:  46f5a10a721ce8dce8cc8fe55279b49e1c6b3288
Gitweb:     https://git.kernel.org/tip/46f5a10a721ce8dce8cc8fe55279b49e1c6b3288
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:59:14 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:47 +0100

x86/entry/64: Remove all remaining direct thread_struct::sp0 reads

The only remaining readers in context switch code or vm86(), and
they all just want to update TSS.sp0 to match the current task.
Replace them all with a new helper update_sp0().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2d231687f4ff288c9d9e98d7861b7df374246ac3.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/switch_to.h | 6 ++++++
 arch/x86/kernel/process_32.c     | 2 +-
 arch/x86/kernel/process_64.c     | 2 +-
 arch/x86/kernel/vm86_32.c        | 4 ++--
 4 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 7ae8caf..54e64d9 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -84,4 +84,10 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread)
 }
 #endif
 
+/* This is used when switching tasks or entering/exiting vm86 mode. */
+static inline void update_sp0(struct task_struct *task)
+{
+	load_sp0(task->thread.sp0);
+}
+
 #endif /* _ASM_X86_SWITCH_TO_H */
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 40b8587..45bf0c5 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -287,7 +287,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 * current_thread_info().  Refresh the SYSENTER configuration in
 	 * case prev or next is vm86.
 	 */
-	load_sp0(next->sp0);
+	update_sp0(next_p);
 	refresh_sysenter_cs(next);
 	this_cpu_write(cpu_current_top_of_stack,
 		       (unsigned long)task_stack_page(next_p) +
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 2124304..45e3809 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -465,7 +465,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(current_task, next_p);
 
 	/* Reload sp0. */
-	load_sp0(next->sp0);
+	update_sp0(next_p);
 
 	/*
 	 * Now maybe reload the debug registers and handle I/O bitmaps
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 0f1d92c..a7b44c7 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -148,7 +148,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 	preempt_disable();
 	tsk->thread.sp0 = vm86->saved_sp0;
 	tsk->thread.sysenter_cs = __KERNEL_CS;
-	load_sp0(tsk->thread.sp0);
+	update_sp0(tsk);
 	refresh_sysenter_cs(&tsk->thread);
 	vm86->saved_sp0 = 0;
 	preempt_enable();
@@ -373,7 +373,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 		refresh_sysenter_cs(&tsk->thread);
 	}
 
-	load_sp0(tsk->thread.sp0);
+	update_sp0(tsk);
 	preempt_enable();
 
 	if (vm86->flags & VM86_SCREEN_BITMAP)

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/entry/32: Fix cpu_current_top_of_stack initialization at boot
  2017-11-02  7:59 ` [PATCH v2 18/20] x86/boot/32: Fix cpu_current_top_of_stack initialization at boot Andy Lutomirski
@ 2017-11-02 10:56   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, luto, mingo, dave.hansen, hpa, bp, torvalds, brgerst,
	bpetkov, linux-kernel, tglx

Commit-ID:  cd493a6deb8b78eca280d05f7fa73fd69403ae29
Gitweb:     https://git.kernel.org/tip/cd493a6deb8b78eca280d05f7fa73fd69403ae29
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:59:15 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:47 +0100

x86/entry/32: Fix cpu_current_top_of_stack initialization at boot

cpu_current_top_of_stack's initialization forgot about
TOP_OF_KERNEL_STACK_PADDING.  This bug didn't matter because the
idle threads never enter user mode.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e5e370a7e6e4fddd1c4e4cf619765d96bb874b21.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/smpboot.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ad59edd..06c18fe 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -961,8 +961,7 @@ void common_cpu_up(unsigned int cpu, struct task_struct *idle)
 #ifdef CONFIG_X86_32
 	/* Stack for startup_32 can be just as for start_secondary onwards */
 	irq_ctx_init(cpu);
-	per_cpu(cpu_current_top_of_stack, cpu) =
-		(unsigned long)task_stack_page(idle) + THREAD_SIZE;
+	per_cpu(cpu_current_top_of_stack, cpu) = task_top_of_stack(idle);
 #else
 	initial_gs = per_cpu_offset(cpu);
 #endif

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/entry/64: Remove thread_struct::sp0
  2017-11-02  7:59 ` [PATCH v2 19/20] x86/asm/64: Remove thread_struct::sp0 Andy Lutomirski
@ 2017-11-02 10:56   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bpetkov, mingo, hpa, brgerst, tglx, dave.hansen, torvalds, luto,
	linux-kernel, peterz

Commit-ID:  d375cf1530595e33961a8844192cddab913650e3
Gitweb:     https://git.kernel.org/tip/d375cf1530595e33961a8844192cddab913650e3
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:59:16 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:48 +0100

x86/entry/64: Remove thread_struct::sp0

On x86_64, we can easily calculate sp0 when needed instead of
storing it in thread_struct.

On x86_32, a similar cleanup would be possible, but it would require
cleaning up the vm86 code first, and that can wait for a later
cleanup series.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/719cd9c66c548c4350d98a90f050aee8b17f8919.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/compat.h    |  1 +
 arch/x86/include/asm/processor.h | 28 +++++++++-------------------
 arch/x86/include/asm/switch_to.h |  6 ++++++
 arch/x86/kernel/process_64.c     |  1 -
 4 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index 5343c19..948b6d8 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -6,6 +6,7 @@
  */
 #include <linux/types.h>
 #include <linux/sched.h>
+#include <linux/sched/task_stack.h>
 #include <asm/processor.h>
 #include <asm/user32.h>
 #include <asm/unistd.h>
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index ad59cec..ae2ae6d 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -430,7 +430,9 @@ typedef struct {
 struct thread_struct {
 	/* Cached TLS descriptors: */
 	struct desc_struct	tls_array[GDT_ENTRY_TLS_ENTRIES];
+#ifdef CONFIG_X86_32
 	unsigned long		sp0;
+#endif
 	unsigned long		sp;
 #ifdef CONFIG_X86_32
 	unsigned long		sysenter_cs;
@@ -797,6 +799,13 @@ static inline void spin_lock_prefetch(const void *x)
 
 #define task_top_of_stack(task) ((unsigned long)(task_pt_regs(task) + 1))
 
+#define task_pt_regs(task) \
+({									\
+	unsigned long __ptr = (unsigned long)task_stack_page(task);	\
+	__ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING;		\
+	((struct pt_regs *)__ptr) - 1;					\
+})
+
 #ifdef CONFIG_X86_32
 /*
  * User space process size: 3GB (default).
@@ -816,23 +825,6 @@ static inline void spin_lock_prefetch(const void *x)
 	.addr_limit		= KERNEL_DS,				  \
 }
 
-/*
- * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack.
- * This is necessary to guarantee that the entire "struct pt_regs"
- * is accessible even if the CPU haven't stored the SS/ESP registers
- * on the stack (interrupt gate does not save these registers
- * when switching to the same priv ring).
- * Therefore beware: accessing the ss/esp fields of the
- * "struct pt_regs" is possible, but they may contain the
- * completely wrong values.
- */
-#define task_pt_regs(task) \
-({									\
-	unsigned long __ptr = (unsigned long)task_stack_page(task);	\
-	__ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING;		\
-	((struct pt_regs *)__ptr) - 1;					\
-})
-
 #define KSTK_ESP(task)		(task_pt_regs(task)->sp)
 
 #else
@@ -866,11 +858,9 @@ static inline void spin_lock_prefetch(const void *x)
 #define STACK_TOP_MAX		TASK_SIZE_MAX
 
 #define INIT_THREAD  {						\
-	.sp0			= TOP_OF_INIT_STACK,		\
 	.addr_limit		= KERNEL_DS,			\
 }
 
-#define task_pt_regs(tsk)	((struct pt_regs *)(tsk)->thread.sp0 - 1)
 extern unsigned long KSTK_ESP(struct task_struct *task);
 
 #endif /* CONFIG_X86_64 */
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 54e64d9..010cd6e 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_SWITCH_TO_H
 #define _ASM_X86_SWITCH_TO_H
 
+#include <linux/sched/task_stack.h>
+
 struct task_struct; /* one of the stranger aspects of C forward declarations */
 
 struct task_struct *__switch_to_asm(struct task_struct *prev,
@@ -87,7 +89,11 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread)
 /* This is used when switching tasks or entering/exiting vm86 mode. */
 static inline void update_sp0(struct task_struct *task)
 {
+#ifdef CONFIG_X86_32
 	load_sp0(task->thread.sp0);
+#else
+	load_sp0(task_top_of_stack(task));
+#endif
 }
 
 #endif /* _ASM_X86_SWITCH_TO_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 45e3809..eeeb34f 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -274,7 +274,6 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	struct inactive_task_frame *frame;
 	struct task_struct *me = current;
 
-	p->thread.sp0 = (unsigned long)task_stack_page(p) + THREAD_SIZE;
 	childregs = task_pt_regs(p);
 	fork_frame = container_of(childregs, struct fork_frame, regs);
 	frame = &fork_frame->frame;

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/traps: Use a new on_thread_stack() helper to clean up an assertion
  2017-11-02  7:59 ` [PATCH v2 20/20] x86/traps: Use a new on_thread_stack() helper to clean up an assertion Andy Lutomirski
@ 2017-11-02 10:56   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-11-02 10:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, brgerst, tglx, hpa, torvalds, dave.hansen, linux-kernel,
	mingo, bp, luto, bpetkov

Commit-ID:  3383642c2f9d4f5b4fa37436db4a109a1a10018c
Gitweb:     https://git.kernel.org/tip/3383642c2f9d4f5b4fa37436db4a109a1a10018c
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Nov 2017 00:59:17 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 11:04:49 +0100

x86/traps: Use a new on_thread_stack() helper to clean up an assertion

Let's keep the stack-related logic together rather than open-coding
a comparison in an assertion in the traps code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/856b15bee1f55017b8f79d3758b0d51c48a08cf8.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/processor.h | 6 ++++++
 arch/x86/kernel/traps.c          | 3 +--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index ae2ae6d..f10dae1 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -541,6 +541,12 @@ static inline unsigned long current_top_of_stack(void)
 #endif
 }
 
+static inline bool on_thread_stack(void)
+{
+	return (unsigned long)(current_top_of_stack() -
+			       current_stack_pointer) < THREAD_SIZE;
+}
+
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
 #else
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 67db4f4..42a9c44 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -141,8 +141,7 @@ void ist_begin_non_atomic(struct pt_regs *regs)
 	 * will catch asm bugs and any attempt to use ist_preempt_enable
 	 * from double_fault.
 	 */
-	BUG_ON((unsigned long)(current_top_of_stack() -
-			       current_stack_pointer) >= THREAD_SIZE);
+	BUG_ON(!on_thread_stack());
 
 	preempt_enable_no_resched();
 }

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH] x86/entry/64: Shorten TEST instructions
  2017-11-02 10:50   ` [PATCH v2 02/20] x86/asm/64: Split the iret-to-user and iret-to-kernel paths Borislav Petkov
@ 2017-11-02 12:09     ` Borislav Petkov
  2017-11-02 12:48       ` [tip:x86/asm] " tip-bot for Borislav Petkov
  0 siblings, 1 reply; 48+ messages in thread
From: Borislav Petkov @ 2017-11-02 12:09 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML
  Cc: linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds

On Thu, Nov 02, 2017 at 11:50:18AM +0100, Borislav Petkov wrote:
> If these paths are slow and adding a TEST and a Jcc would give us the
> additional sanity-checking, then I don't see any downside to it.

Damn, that really shows. Almost 900K iterations less.

./lseek1_processes -s 50

before: average:11994233
after : average:11134599

So we'll have to remember to enable CONFIG_DEBUG_ENTRY from time to
time. :-\

Ok, let's then only shorten the TEST insns:

---
From: Borislav Petkov <bp@suse.de>
Date: Thu, 2 Nov 2017 13:00:49 +0100
Subject: [PATCH] x86/entry/64: Shorten TEST instructions

Convert TESTL to TESTB and save 3 bytes per callsite.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/entry/entry_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 40e9933a2d33..84263c79a119 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -620,7 +620,7 @@ GLOBAL(retint_user)
 GLOBAL(swapgs_restore_regs_and_return_to_usermode)
 #ifdef CONFIG_DEBUG_ENTRY
 	/* Assert that pt_regs indicates user mode. */
-	testl	$3, CS(%rsp)
+	testb	$3, CS(%rsp)
 	jnz	1f
 	ud2
 1:
@@ -653,7 +653,7 @@ retint_kernel:
 GLOBAL(restore_regs_and_return_to_kernel)
 #ifdef CONFIG_DEBUG_ENTRY
 	/* Assert that pt_regs indicates kernel mode. */
-	testl	$3, CS(%rsp)
+	testb	$3, CS(%rsp)
 	jz	1f
 	ud2
 1:
-- 
2.13.0

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:x86/asm] x86/entry/64: Shorten TEST instructions
  2017-11-02 12:09     ` [PATCH] x86/entry/64: Shorten TEST instructions Borislav Petkov
@ 2017-11-02 12:48       ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Borislav Petkov @ 2017-11-02 12:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, torvalds, luto, tglx, hpa, peterz, brgerst, dave.hansen,
	linux-kernel, bp

Commit-ID:  1e4c4f610f774df6088d7c065b2dd4d22adba698
Gitweb:     https://git.kernel.org/tip/1e4c4f610f774df6088d7c065b2dd4d22adba698
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Thu, 2 Nov 2017 13:09:26 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Nov 2017 13:45:37 +0100

x86/entry/64: Shorten TEST instructions

Convert TESTL to TESTB and save 3 bytes per callsite.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171102120926.4srwerqrr7g72e2k@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 40e9933..84263c7 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -620,7 +620,7 @@ GLOBAL(retint_user)
 GLOBAL(swapgs_restore_regs_and_return_to_usermode)
 #ifdef CONFIG_DEBUG_ENTRY
 	/* Assert that pt_regs indicates user mode. */
-	testl	$3, CS(%rsp)
+	testb	$3, CS(%rsp)
 	jnz	1f
 	ud2
 1:
@@ -653,7 +653,7 @@ retint_kernel:
 GLOBAL(restore_regs_and_return_to_kernel)
 #ifdef CONFIG_DEBUG_ENTRY
 	/* Assert that pt_regs indicates kernel mode. */
-	testl	$3, CS(%rsp)
+	testb	$3, CS(%rsp)
 	jz	1f
 	ud2
 1:

^ permalink raw reply related	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2017-11-02 12:53 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-02  7:58 [PATCH v2 00/20] Pile o' entry/exit/sp0 changes Andy Lutomirski
2017-11-02  7:58 ` [PATCH v2 01/20] x86/asm/64: Remove the restore_c_regs_and_iret label Andy Lutomirski
2017-11-02 10:49   ` [tip:x86/asm] x86/entry/64: " tip-bot for Andy Lutomirski
2017-11-02  7:58 ` [PATCH v2 02/20] x86/asm/64: Split the iret-to-user and iret-to-kernel paths Andy Lutomirski
2017-11-02 10:49   ` [tip:x86/asm] x86/entry/64: Split the IRET-to-user and IRET-to-kernel paths tip-bot for Andy Lutomirski
2017-11-02 10:50   ` [PATCH v2 02/20] x86/asm/64: Split the iret-to-user and iret-to-kernel paths Borislav Petkov
2017-11-02 12:09     ` [PATCH] x86/entry/64: Shorten TEST instructions Borislav Petkov
2017-11-02 12:48       ` [tip:x86/asm] " tip-bot for Borislav Petkov
2017-11-02  7:59 ` [PATCH v2 03/20] x86/asm/64: Move SWAPGS into the common iret-to-usermode path Andy Lutomirski
2017-11-02 10:49   ` [tip:x86/asm] x86/entry/64: Move SWAPGS into the common IRET-to-usermode path tip-bot for Andy Lutomirski
2017-11-02  7:59 ` [PATCH v2 04/20] x86/asm/64: Simplify reg restore code in the standard IRET paths Andy Lutomirski
2017-11-02 10:50   ` [tip:x86/asm] x86/entry/64: " tip-bot for Andy Lutomirski
2017-11-02  7:59 ` [PATCH v2 05/20] x86/asm/64: Shrink paranoid_exit_restore and make labels local Andy Lutomirski
2017-11-02 10:50   ` [tip:x86/asm] x86/entry/64: " tip-bot for Andy Lutomirski
2017-11-02  7:59 ` [PATCH v2 06/20] x86/asm/64: Use pop instead of movq in syscall_return_via_sysret Andy Lutomirski
2017-11-02 10:51   ` [tip:x86/asm] x86/entry/64: " tip-bot for Andy Lutomirski
2017-11-02  7:59 ` [PATCH v2 07/20] x86/asm/64: Merge the fast and slow SYSRET paths Andy Lutomirski
2017-11-02 10:51   ` [tip:x86/asm] x86/entry/64: " tip-bot for Andy Lutomirski
2017-11-02  7:59 ` [PATCH v2 08/20] x86/entry/64: Use POP instead of MOV to restore regs on NMI return Andy Lutomirski
2017-11-02 10:51   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2017-11-02  7:59 ` [PATCH v2 09/20] x86/entry/64: Remove the RESTORE_..._REGS infrastructure Andy Lutomirski
2017-11-02 10:52   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2017-11-02  7:59 ` [PATCH v2 10/20] xen: add xen nmi trap entry Andy Lutomirski
2017-11-02 10:52   ` [tip:x86/asm] xen, x86/entry/64: Add xen NMI " tip-bot for Juergen Gross
2017-11-02  7:59 ` [PATCH v2 11/20] x86/asm/64: De-Xen-ify our NMI code Andy Lutomirski
2017-11-02 10:53   ` [tip:x86/asm] x86/entry/64: " tip-bot for Andy Lutomirski
2017-11-02  7:59 ` [PATCH v2 12/20] x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of native_load_sp0() Andy Lutomirski
2017-11-02 10:53   ` [tip:x86/asm] x86/entry/32: Pull the " tip-bot for Andy Lutomirski
2017-11-02  7:59 ` [PATCH v2 13/20] x86/asm/64: Pass sp0 directly to load_sp0() Andy Lutomirski
2017-11-02  9:48   ` Ingo Molnar
2017-11-02  9:53     ` Ingo Molnar
2017-11-02 10:32     ` Andy Lutomirski
2017-11-02 10:53   ` [tip:x86/asm] x86/entry/64: Pass SP0 " tip-bot for Andy Lutomirski
2017-11-02  7:59 ` [PATCH v2 14/20] x86/asm: Add task_top_of_stack() to find the top of a task's stack Andy Lutomirski
2017-11-02 10:54   ` [tip:x86/asm] x86/entry: " tip-bot for Andy Lutomirski
2017-11-02  7:59 ` [PATCH v2 15/20] x86/xen/64: Clean up SP code in cpu_initialize_context() Andy Lutomirski
2017-11-02  9:56   ` Juergen Gross
2017-11-02 10:54   ` [tip:x86/asm] x86/xen/64, x86/entry/64: " tip-bot for Andy Lutomirski
2017-11-02  7:59 ` [PATCH v2 16/20] x86/boot/64: Stop initializing TSS.sp0 at boot Andy Lutomirski
2017-11-02 10:55   ` [tip:x86/asm] x86/entry/64: " tip-bot for Andy Lutomirski
2017-11-02  7:59 ` [PATCH v2 17/20] x86/asm/64: Remove all remaining direct thread_struct::sp0 reads Andy Lutomirski
2017-11-02 10:55   ` [tip:x86/asm] x86/entry/64: " tip-bot for Andy Lutomirski
2017-11-02  7:59 ` [PATCH v2 18/20] x86/boot/32: Fix cpu_current_top_of_stack initialization at boot Andy Lutomirski
2017-11-02 10:56   ` [tip:x86/asm] x86/entry/32: " tip-bot for Andy Lutomirski
2017-11-02  7:59 ` [PATCH v2 19/20] x86/asm/64: Remove thread_struct::sp0 Andy Lutomirski
2017-11-02 10:56   ` [tip:x86/asm] x86/entry/64: " tip-bot for Andy Lutomirski
2017-11-02  7:59 ` [PATCH v2 20/20] x86/traps: Use a new on_thread_stack() helper to clean up an assertion Andy Lutomirski
2017-11-02 10:56   ` [tip:x86/asm] " tip-bot for Andy Lutomirski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).