All of lore.kernel.org
* [PATCH v2] x86, ia32entry: Use SYSRETL to return from SYSENTER
@ 2015-04-03  0:12 Andy Lutomirski
  2015-04-03  8:27 ` [tip:x86/asm] x86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER tip-bot for Andy Lutomirski
  0 siblings, 1 reply; 2+ messages in thread
From: Andy Lutomirski @ 2015-04-03  0:12 UTC (permalink / raw)
  To: Linux Kernel Mailing List, X86 ML, Ingo Molnar
  Cc: Denys Vlasenko, Andy Lutomirski, Linus Torvalds, Denys Vlasenko,
	H. Peter Anvin, Borislav Petkov

SYSEXIT is scary on 64-bit kernels -- SYSEXIT must be invoked with
usergs and IRQs on.  That means that we rely on STI to correctly
mask interrupts for one instruction.  This is okay by itself, but
the semantics with respect to NMIs are unclear.

Avoid the whole issue by using SYSRETL instead.  For background,
Intel CPUs don't allow SYSCALL from compat mode, but they do allow
SYSRETL back to compat mode.  Go figure.

To avoid doing too much at once, this doesn't revamp the calling
convention.  We still return with EBP, EDX, and ECX on the user
stack.

Oddly this seems to be 30 cycles or so faster.  Avoiding POPFQ and
STI will account for under half of that, I think, so my best guess
is that Intel just optimizes SYSRET much better than SYSEXIT.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---

I'll send a patch tomorrow to delete the associated paravirt code once
the kbuild bot has had a while to chew on it.
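
For anyone skimming: the hazard being avoided is the one-instruction
window between STI and SYSEXIT.  A rough sketch of the two return
sequences (illustrative only, not the exact kernel code -- see the
diff below for the real thing):

```asm
	/* Old path: IF is set one instruction early, so an NMI
	 * arriving between STI and SYSEXIT observes kernel mode
	 * with usergs -- behavior is poorly specified. */
	sti			/* IRQs enabled after the next insn */
	sysexit			/* NMI here is the dicey window */

	/* New path: SYSRETL loads RFLAGS (including IF) from %r11
	 * atomically with the return, so no such window exists. */
	movl	EFLAGS(%rsp), %r11d	/* user rflags */
	sysretl			/* eip = %ecx, rflags = %r11 */
```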

Changes from RFC:
 - Capitalize references to opcodes (Borislav)
 - Add much better comments (Denys)
 - Drop Cc: stable (Ingo)

 arch/x86/ia32/ia32entry.S | 53 +++++++++++++++++++++++++++++++++++------------
 1 file changed, 40 insertions(+), 13 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 8d01cce7b6b8..0fa5d8750399 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -186,28 +186,55 @@ sysenter_dispatch:
 	testl	$_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
 	jnz	sysexit_audit
 sysexit_from_sys_call:
+	/*
+	 * NB: sysexit is not obviously safe for 64-bit kernels -- an
+	 * NMI between sti and sysexit has poorly specified behavior,
+	 * and an NMI followed by an IRQ with usergs is fatal.  So
+	 * we just pretend we're using sysexit but we really use
+	 * sysretl instead.
+	 *
+	 * This code path is still called sysexit because it pairs
+	 * with sysenter and it uses the sysenter calling convention.
+	 */
 	andl    $~TS_COMPAT,ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
-	/* clear IF, that popfq doesn't enable interrupts early */
-	andl	$~0x200,EFLAGS(%rsp)
-	movl	RIP(%rsp),%edx		/* User %eip */
-	CFI_REGISTER rip,rdx
+	movl	RIP(%rsp),%ecx		/* User %eip */
+	CFI_REGISTER rip,rcx
 	RESTORE_RSI_RDI
-	/* pop everything except ss,rsp,rflags slots */
-	REMOVE_PT_GPREGS_FROM_STACK 3*8
+	xorl	%edx,%edx		/* avoid info leaks */
 	xorq	%r8,%r8
 	xorq	%r9,%r9
 	xorq	%r10,%r10
-	xorq	%r11,%r11
-	popfq_cfi
+	movl	EFLAGS(%rsp),%r11d	/* User eflags */
 	/*CFI_RESTORE rflags*/
-	popq_cfi %rcx				/* User %esp */
-	CFI_REGISTER rsp,rcx
 	TRACE_IRQS_ON
+
 	/*
-	 * 32bit SYSEXIT restores eip from edx, esp from ecx.
-	 * cs and ss are loaded from MSRs.
+	 * SYSRETL works even on Intel CPUs.  Use it in preference to SYSEXIT,
+	 * since it avoids a dicey window with interrupts enabled.
 	 */
-	ENABLE_INTERRUPTS_SYSEXIT32
+	movl	RSP(%rsp),%esp
+
+	/*
+	 * USERGS_SYSRET32 does:
+	 *  gsbase = user's gs base
+	 *  eip = ecx
+	 *  rflags = r11
+	 *  cs = __USER32_CS
+	 *  ss = __USER_DS
+	 *
+	 * The prologue set RIP(%rsp) to VDSO32_SYSENTER_RETURN, which does:
+	 *
+	 *  pop %ebp
+	 *  pop %edx
+	 *  pop %ecx
+	 *
+	 * Therefore, we invoke SYSRETL with edx and r8-r10 zeroed to
+	 * avoid info leaks.  r11 ends up with VDSO32_SYSENTER_RETURN's
+	 * address (already known to user code), and r12-r15 are
+	 * callee-saved and therefore don't contain any interesting
+	 * kernel data.
+	 */
+	USERGS_SYSRET32
 
 	CFI_RESTORE_STATE
 
-- 
2.3.0



* [tip:x86/asm] x86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER
  2015-04-03  0:12 [PATCH v2] x86, ia32entry: Use SYSRETL to return from SYSENTER Andy Lutomirski
@ 2015-04-03  8:27 ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 2+ messages in thread
From: tip-bot for Andy Lutomirski @ 2015-04-03  8:27 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: vda.linux, tglx, luto, brgerst, torvalds, hpa, linux-kernel,
	dvlasenk, bp, luto, mingo

Commit-ID:  4214a16b02971c60960afd675d03544e109e0d75
Gitweb:     http://git.kernel.org/tip/4214a16b02971c60960afd675d03544e109e0d75
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 2 Apr 2015 17:12:12 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 3 Apr 2015 09:14:00 +0200

x86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER

SYSEXIT is scary on 64-bit kernels -- SYSEXIT must be invoked
with usergs and IRQs on.  That means that we rely on STI to
correctly mask interrupts for one instruction.  This is okay by
itself, but the semantics with respect to NMIs are unclear.

Avoid the whole issue by using SYSRETL instead.  For background,
Intel CPUs don't allow SYSCALL from compat mode, but they do
allow SYSRETL back to compat mode.  Go figure.

To avoid doing too much at once, this doesn't revamp the calling
convention.  We still return with EBP, EDX, and ECX on the user
stack.

Oddly this seems to be 30 cycles or so faster.  Avoiding POPFQ
and STI will account for under half of that, I think, so my best
guess is that Intel just optimizes SYSRET much better than
SYSEXIT.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/57a0bf1b5230b2716a64ebe48e9bc1110f7ab433.1428019097.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/ia32/ia32entry.S | 53 +++++++++++++++++++++++++++++++++++------------
 1 file changed, 40 insertions(+), 13 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 8d01cce..5d8f987 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -186,28 +186,55 @@ sysenter_dispatch:
 	testl	$_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
 	jnz	sysexit_audit
 sysexit_from_sys_call:
+	/*
+	 * NB: SYSEXIT is not obviously safe for 64-bit kernels -- an
+	 * NMI between STI and SYSEXIT has poorly specified behavior,
+	 * and an NMI followed by an IRQ with usergs is fatal.  So
+	 * we just pretend we're using SYSEXIT but we really use
+	 * SYSRETL instead.
+	 *
+	 * This code path is still called 'sysexit' because it pairs
+	 * with 'sysenter' and it uses the SYSENTER calling convention.
+	 */
 	andl    $~TS_COMPAT,ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
-	/* clear IF, that popfq doesn't enable interrupts early */
-	andl	$~0x200,EFLAGS(%rsp)
-	movl	RIP(%rsp),%edx		/* User %eip */
-	CFI_REGISTER rip,rdx
+	movl	RIP(%rsp),%ecx		/* User %eip */
+	CFI_REGISTER rip,rcx
 	RESTORE_RSI_RDI
-	/* pop everything except ss,rsp,rflags slots */
-	REMOVE_PT_GPREGS_FROM_STACK 3*8
+	xorl	%edx,%edx		/* avoid info leaks */
 	xorq	%r8,%r8
 	xorq	%r9,%r9
 	xorq	%r10,%r10
-	xorq	%r11,%r11
-	popfq_cfi
+	movl	EFLAGS(%rsp),%r11d	/* User eflags */
 	/*CFI_RESTORE rflags*/
-	popq_cfi %rcx				/* User %esp */
-	CFI_REGISTER rsp,rcx
 	TRACE_IRQS_ON
+
 	/*
-	 * 32bit SYSEXIT restores eip from edx, esp from ecx.
-	 * cs and ss are loaded from MSRs.
+	 * SYSRETL works even on Intel CPUs.  Use it in preference to SYSEXIT,
+	 * since it avoids a dicey window with interrupts enabled.
 	 */
-	ENABLE_INTERRUPTS_SYSEXIT32
+	movl	RSP(%rsp),%esp
+
+	/*
+	 * USERGS_SYSRET32 does:
+	 *  gsbase = user's gs base
+	 *  eip = ecx
+	 *  rflags = r11
+	 *  cs = __USER32_CS
+	 *  ss = __USER_DS
+	 *
+	 * The prologue set RIP(%rsp) to VDSO32_SYSENTER_RETURN, which does:
+	 *
+	 *  pop %ebp
+	 *  pop %edx
+	 *  pop %ecx
+	 *
+	 * Therefore, we invoke SYSRETL with EDX and R8-R10 zeroed to
+	 * avoid info leaks.  R11 ends up with VDSO32_SYSENTER_RETURN's
+	 * address (already known to user code), and R12-R15 are
+	 * callee-saved and therefore don't contain any interesting
+	 * kernel data.
+	 */
+	USERGS_SYSRET32
 
 	CFI_RESTORE_STATE
 

