All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups
@ 2016-02-29  5:28 Andy Lutomirski
  2016-02-29  5:28 ` [PATCH 01/10] selftests/x86: In syscall_nt, test NT|TF as well Andy Lutomirski
                   ` (10 more replies)
  0 siblings, 11 replies; 29+ messages in thread
From: Andy Lutomirski @ 2016-02-29  5:28 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

hpa asked me to get rid of the ASM_CLAC at the beginning of the SYSENTER
path.  Little did he know...

This series makes the observed behavior of SYSENTER wrt flags the same
for all sane flags and kernel bitnesses.  That is, SYSENTER preserves
flags now unless you do a syscall that explicitly changes flags, and
the HW flags that the syscall executes with are sanitized.  This
includes NT, TF, AC and all arithmetic flags.  Prior to this series,
32-bit kernels clobbered TF and the arithmetic flags and behaved
highly erratically if NT was set.  (If IF is cleared by evil userspace
when SYSENTER starts, IF will be set again on return.  There's nothing
the kernel can do about this -- SYSENTER inherently forgets the state
of IF.)

This series speeds up SYSENTER on all kernels by a surprisingly large
amount on Skylake because it eliminates an unconditional CLAC.

While SYSENTER used to handle TF correctly as far as I can tell on
64-bit kernels, the means by which it did so was heavily tangled up in
the ptrace single-step logic.  It now works just like all the other
kernel entries except insofar as do_debug has a simple special case
for it.  Relatedly, the bizarre and poorly explained old fixup in
do_debug is now hidden behind a WARN_ON_ONCE in preparation for
deleting it at some point.

The code that fixed up NMI and #DB early in SYSENTER in 32-bit kernels
used to be both terrifying and incorrect.  (It doesn't appear to have
been exploitably bad, but the reason for that is subtle, and the code
was certainy more fragile than it deserved to me.)  We still need a
special fixup, but it's much simpler now.

While I was doing all this, I also noticed that DR6 and BTF handling
in do_debug was a bit off.  Two of the patches in here try to fix it
up.

Have fun!

tl;dr: Cleanups and sanity fixes here, but no security fixes, and I
don't think anything needs to be backported or put in x86/urgent.

This series applies to the result of merging tip:x86/asm and
tip:x86/urgent.  I've been testing on a somewhat bastardized base,
because tip currently doesn't work on my laptop in 32-bit mode.  (That
bug is fixed in Linus' tree.)

Andy Lutomirski (10):
  selftests/x86: In syscall_nt, test NT|TF as well
  x86/entry/compat: In SYSENTER, sink AC clearing below the existing
    FLAGS test
  x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
  x86/entry/32: Restore FLAGS on SYSEXIT
  x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions
  x86/traps: Clear DR6 early in do_debug and improve the comment
  x86/entry: Vastly simplify SYSENTER TF handling
  x86/entry: Only allocate space for SYSENTER_stack if needed
  x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
  x86/entry/32: Add and check a stack canary for the SYSENTER stack

 arch/x86/entry/entry_32.S                | 182 ++++++++++++++++++-------------
 arch/x86/entry/entry_64_compat.S         |  15 ++-
 arch/x86/include/asm/processor.h         |   5 +-
 arch/x86/include/asm/proto.h             |  15 ++-
 arch/x86/kernel/asm-offsets_32.c         |   5 +
 arch/x86/kernel/process.c                |   3 +
 arch/x86/kernel/traps.c                  |  87 ++++++++++++---
 tools/testing/selftests/x86/syscall_nt.c |  57 ++++++++--
 8 files changed, 263 insertions(+), 106 deletions(-)

-- 
2.5.0

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 01/10] selftests/x86: In syscall_nt, test NT|TF as well
  2016-02-29  5:28 [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
@ 2016-02-29  5:28 ` Andy Lutomirski
  2016-03-02 12:59   ` Borislav Petkov
  2016-02-29  5:28 ` [PATCH 02/10] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test Andy Lutomirski
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Andy Lutomirski @ 2016-02-29  5:28 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

Setting TF prevents fastpath returns in most cases, which causes the
test to fail on 32-bit kernels because 32-bit kernels do not, in
fact, handle NT correctly on SYSENTER entries.

The next patch will fix 32-bit kernels.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 tools/testing/selftests/x86/syscall_nt.c | 57 +++++++++++++++++++++++++++-----
 1 file changed, 49 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/x86/syscall_nt.c b/tools/testing/selftests/x86/syscall_nt.c
index 60c06af4646a..a6ceff86c199 100644
--- a/tools/testing/selftests/x86/syscall_nt.c
+++ b/tools/testing/selftests/x86/syscall_nt.c
@@ -17,6 +17,9 @@
 
 #include <stdio.h>
 #include <unistd.h>
+#include <string.h>
+#include <signal.h>
+#include <err.h>
 #include <sys/syscall.h>
 #include <asm/processor-flags.h>
 
@@ -26,6 +29,8 @@
 # define WIDTH "l"
 #endif
 
+static unsigned int nerrs;
+
 static unsigned long get_eflags(void)
 {
 	unsigned long eflags;
@@ -39,16 +44,52 @@ static void set_eflags(unsigned long eflags)
 		      : : "rm" (eflags) : "flags");
 }
 
-int main()
+static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
+		       int flags)
 {
-	printf("[RUN]\tSet NT and issue a syscall\n");
-	set_eflags(get_eflags() | X86_EFLAGS_NT);
+	struct sigaction sa;
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_sigaction = handler;
+	sa.sa_flags = SA_SIGINFO | flags;
+	sigemptyset(&sa.sa_mask);
+	if (sigaction(sig, &sa, 0))
+		err(1, "sigaction");
+}
+
+static void sigtrap(int sig, siginfo_t *si, void *ctx_void)
+{
+}
+
+static void do_it(unsigned long extraflags)
+{
+	unsigned long flags;
+
+	set_eflags(get_eflags() | extraflags);
 	syscall(SYS_getpid);
-	if (get_eflags() & X86_EFLAGS_NT) {
-		printf("[OK]\tThe syscall worked and NT is still set\n");
-		return 0;
+	flags = get_eflags();
+	if ((flags & extraflags) == extraflags) {
+		printf("[OK]\tThe syscall worked and flags are still set\n");
 	} else {
-		printf("[FAIL]\tThe syscall worked but NT was cleared\n");
-		return 1;
+		printf("[FAIL]\tThe syscall worked but flags were cleared (flags = 0x%lx but expected 0x%lx set)\n",
+		       flags, extraflags);
+		nerrs++;
 	}
 }
+
+int main()
+{
+	printf("[RUN]\tSet NT and issue a syscall\n");
+	do_it(X86_EFLAGS_NT);
+
+	/*
+	 * Now try it again with TF set -- TF forces returns via IRET in all
+	 * cases except non-ptregs-using 64-bit full fast path syscalls.
+	 */
+
+	sethandler(SIGTRAP, sigtrap, 0);
+
+	printf("[RUN]\tSet NT|TF and issue a syscall\n");
+	do_it(X86_EFLAGS_NT | X86_EFLAGS_TF);
+
+	return nerrs == 0 ? 0 : 1;
+}
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 02/10] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
  2016-02-29  5:28 [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
  2016-02-29  5:28 ` [PATCH 01/10] selftests/x86: In syscall_nt, test NT|TF as well Andy Lutomirski
@ 2016-02-29  5:28 ` Andy Lutomirski
  2016-02-29 20:39   ` Borislav Petkov
  2016-02-29  5:28 ` [PATCH 03/10] x86/entry/32: Filter NT and speed up AC filtering in SYSENTER Andy Lutomirski
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Andy Lutomirski @ 2016-02-29  5:28 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

CLAC is slow, and the SYSENTER code already has an unlikely path
that runs if unusual flags are set.  Drop the CLAC and instead rely
on the unlikely path to clear AC.

This seems to save ~24 cycles on my Skylake laptop.  (Hey, Intel,
make this faster please!)

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64_compat.S | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 89bcb4979e7a..7c8e72da7654 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -66,8 +66,6 @@ ENTRY(entry_SYSENTER_compat)
 	 */
 	pushfq				/* pt_regs->flags (except IF = 0) */
 	orl	$X86_EFLAGS_IF, (%rsp)	/* Fix saved flags */
-	ASM_CLAC			/* Clear AC after saving FLAGS */
-
 	pushq	$__USER32_CS		/* pt_regs->cs */
 	xorq    %r8,%r8
 	pushq	%r8			/* pt_regs->ip = 0 (placeholder) */
@@ -90,9 +88,9 @@ ENTRY(entry_SYSENTER_compat)
 	cld
 
 	/*
-	 * Sysenter doesn't filter flags, so we need to clear NT
+	 * Sysenter doesn't filter flags, so we need to clear NT and AC
 	 * ourselves.  To save a few cycles, we can check whether
-	 * NT was set instead of doing an unconditional popfq.
+	 * either was set instead of doing an unconditional popfq.
 	 * This needs to happen before enabling interrupts so that
 	 * we don't get preempted with NT set.
 	 *
@@ -102,7 +100,7 @@ ENTRY(entry_SYSENTER_compat)
 	 * we're keeping that code behind a branch which will predict as
 	 * not-taken and therefore its instructions won't be fetched.
 	 */
-	testl	$X86_EFLAGS_NT, EFLAGS(%rsp)
+	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC, EFLAGS(%rsp)
 	jnz	.Lsysenter_fix_flags
 .Lsysenter_flags_fixed:
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 03/10] x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
  2016-02-29  5:28 [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
  2016-02-29  5:28 ` [PATCH 01/10] selftests/x86: In syscall_nt, test NT|TF as well Andy Lutomirski
  2016-02-29  5:28 ` [PATCH 02/10] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test Andy Lutomirski
@ 2016-02-29  5:28 ` Andy Lutomirski
  2016-03-02 13:24   ` Borislav Petkov
  2016-02-29  5:28 ` [PATCH 04/10] x86/entry/32: Restore FLAGS on SYSEXIT Andy Lutomirski
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Andy Lutomirski @ 2016-02-29  5:28 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

This makes the 32-bit code work just like the 64-bit code.  It should
speed up syscalls on 32-bit kernels on Skylake by something like 20
cycles (by analogy to the 64-bit compat case).

It also cleans up NT just like we do for the 64-bit case.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_32.S | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index ab710eee4308..263ebde6333f 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -294,7 +294,6 @@ sysenter_past_esp:
 	pushl	$__USER_DS		/* pt_regs->ss */
 	pushl	%ebp			/* pt_regs->sp (stashed in bp) */
 	pushfl				/* pt_regs->flags (except IF = 0) */
-	ASM_CLAC			/* Clear AC after saving FLAGS */
 	orl	$X86_EFLAGS_IF, (%esp)	/* Fix IF */
 	pushl	$__USER_CS		/* pt_regs->cs */
 	pushl	$0			/* pt_regs->ip = 0 (placeholder) */
@@ -302,6 +301,23 @@ sysenter_past_esp:
 	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest */
 
 	/*
+	 * Sysenter doesn't filter flags, so we need to clear NT and AC
+	 * ourselves.  To save a few cycles, we can check whether
+	 * either was set instead of doing an unconditional popfq.
+	 * This needs to happen before enabling interrupts so that
+	 * we don't get preempted with NT set.
+	 *
+	 * NB.: .Lsysenter_fix_flags is a label with the code under it moved
+	 * out-of-line as an optimization: NT is unlikely to be set in the
+	 * majority of the cases and instead of polluting the I$ unnecessarily,
+	 * we're keeping that code behind a branch which will predict as
+	 * not-taken and therefore its instructions won't be fetched.
+	 */
+	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC, PT_EFLAGS(%esp)
+	jnz	.Lsysenter_fix_flags
+.Lsysenter_flags_fixed:
+
+	/*
 	 * User mode is traced as though IRQs are on, and SYSENTER
 	 * turned them off.
 	 */
@@ -339,6 +355,11 @@ sysenter_past_esp:
 .popsection
 	_ASM_EXTABLE(1b, 2b)
 	PTGS_TO_GS_EX
+
+.Lsysenter_fix_flags:
+	pushl	$X86_EFLAGS_FIXED
+	popfl
+	jmp	.Lsysenter_flags_fixed
 ENDPROC(entry_SYSENTER_32)
 
 	# system call handler stub
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 04/10] x86/entry/32: Restore FLAGS on SYSEXIT
  2016-02-29  5:28 [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
                   ` (2 preceding siblings ...)
  2016-02-29  5:28 ` [PATCH 03/10] x86/entry/32: Filter NT and speed up AC filtering in SYSENTER Andy Lutomirski
@ 2016-02-29  5:28 ` Andy Lutomirski
  2016-02-29  5:28 ` [PATCH 05/10] x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions Andy Lutomirski
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Andy Lutomirski @ 2016-02-29  5:28 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

We weren't restoring FLAGS at all on SYSEXIT.  Apparently no one cared.

With this patch applied, native kernels should always honor
task_pt_regs()->flags, which opens the door for some sys_iopl
cleanups.  I'll do those as a separate series, though, since getting
it right will involve tweaking some paravirt ops.

(The short version is that, before this patch, sys_iopl, invoked via
 SYSENTER, wasn't guaranteed to ever transfer the updated
 regs->flags, so sys_iopl had to change the hardware flags register
 as well.)

Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_32.S | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 263ebde6333f..ed171f938960 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -343,6 +343,15 @@ sysenter_past_esp:
 	popl	%eax			/* pt_regs->ax */
 
 	/*
+	 * Restore all flags except IF (we restore IF separately because
+	 * STI gives a one-instruction window in which we won't be interrupted,
+	 * whereas POPF does not.
+	 */
+	addl	$PT_EFLAGS-PT_DS, %esp	/* point esp at pt_regs->flags */
+	btr	$X86_EFLAGS_IF_BIT, (%esp)
+	popfl
+
+	/*
 	 * Return back to the vDSO, which will pop ecx and edx.
 	 * Don't bother with DS and ES (they already contain __USER_DS).
 	 */
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 05/10] x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions
  2016-02-29  5:28 [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
                   ` (3 preceding siblings ...)
  2016-02-29  5:28 ` [PATCH 04/10] x86/entry/32: Restore FLAGS on SYSEXIT Andy Lutomirski
@ 2016-02-29  5:28 ` Andy Lutomirski
  2016-02-29  5:28 ` [PATCH 06/10] x86/traps: Clear DR6 early in do_debug and improve the comment Andy Lutomirski
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Andy Lutomirski @ 2016-02-29  5:28 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

The SDM says that debug exceptions clear BTF, and we need to keep
TIF_BLOCKSTEP in sync with BTF.  Clear it unconditionally and improve
the comment.

I suspect that the fact that kmemcheck could cause TIF_BLOCKSTEP not
to be cleared was just an oversight.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/traps.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index dd2c2e66c2e1..19e6cfa501e3 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -598,6 +598,13 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	dr6 &= ~DR6_RESERVED;
 
 	/*
+	 * The SDM says "The processor clears the BTF flag when it
+	 * generates a debug exception."  Clear TIF_BLOCKSTEP to keep
+	 * TIF_BLOCKSTEP in sync with the hardware BTF flag.
+	 */
+	clear_tsk_thread_flag(tsk, TIF_BLOCKSTEP);
+
+	/*
 	 * If dr6 has no reason to give us about the origin of this trap,
 	 * then it's very likely the result of an icebp/int01 trap.
 	 * User wants a sigtrap for that.
@@ -612,11 +619,6 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	/* DR6 may or may not be cleared by the CPU */
 	set_debugreg(0, 6);
 
-	/*
-	 * The processor cleared BTF, so don't mark that we need it set.
-	 */
-	clear_tsk_thread_flag(tsk, TIF_BLOCKSTEP);
-
 	/* Store the virtualized DR6 value */
 	tsk->thread.debugreg6 = dr6;
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 06/10] x86/traps: Clear DR6 early in do_debug and improve the comment
  2016-02-29  5:28 [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
                   ` (4 preceding siblings ...)
  2016-02-29  5:28 ` [PATCH 05/10] x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions Andy Lutomirski
@ 2016-02-29  5:28 ` Andy Lutomirski
  2016-02-29  5:28 ` [PATCH 07/10] x86/entry: Vastly simplify SYSENTER TF handling Andy Lutomirski
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Andy Lutomirski @ 2016-02-29  5:28 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

Leaving any bits set in DR6 on return from a debug exception is
asking for trouble.  Prevent it by writing zero right away and
clarify the comment.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/traps.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 19e6cfa501e3..6dddc220e3ed 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -593,6 +593,18 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	ist_enter(regs);
 
 	get_debugreg(dr6, 6);
+	/*
+	 * The Intel SDM says:
+	 *
+	 *   Certain debug exceptions may clear bits 0-3. The remaining
+	 *   contents of the DR6 register are never cleared by the
+	 *   processor. To avoid confusion in identifying debug
+	 *   exceptions, debug handlers should clear the register before
+	 *   returning to the interrupted task.
+	 *
+	 * Keep it simple: clear DR6 immediately.
+	 */
+	set_debugreg(0, 6);
 
 	/* Filter out all the reserved bits which are preset to 1 */
 	dr6 &= ~DR6_RESERVED;
@@ -616,9 +628,6 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	if ((dr6 & DR_STEP) && kmemcheck_trap(regs))
 		goto exit;
 
-	/* DR6 may or may not be cleared by the CPU */
-	set_debugreg(0, 6);
-
 	/* Store the virtualized DR6 value */
 	tsk->thread.debugreg6 = dr6;
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 07/10] x86/entry: Vastly simplify SYSENTER TF handling
  2016-02-29  5:28 [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
                   ` (5 preceding siblings ...)
  2016-02-29  5:28 ` [PATCH 06/10] x86/traps: Clear DR6 early in do_debug and improve the comment Andy Lutomirski
@ 2016-02-29  5:28 ` Andy Lutomirski
  2016-02-29  5:28 ` [PATCH 08/10] x86/entry: Only allocate space for SYSENTER_stack if needed Andy Lutomirski
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Andy Lutomirski @ 2016-02-29  5:28 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

Due to a blatant design error, SYSENTER doesn't clear TF.  As a result,
if a user does SYSENTER with TF set, we will single-step through the
kernel until something clears TF.  There is absolutely nothing we can
do to prevent this short of turning off SYSENTER [1].

Simplify the handling considerably with two changes:

1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
   add TF to that list of flags to sanitize with no overhead whatsoever.

2. Teach do_debug to ignore single-step traps in the SYSENTER prologue.

That's all we need to do.

Don't get too excited -- our handling is still buggy on 32-bit
kernels.  There's nothing wrong with the SYSENTER code itself, but
the #DB prologue has a clever fixup for traps on the very first
instruction of entry_SYSENTER_32, and the fixup doesn't work quite
correctly.  The next two patches will fix that.

[1] We could probably prevent it by forcing BTF on at all times and
    making sure we clear TF before any branches in the SYSENTER
    code.  Needless to say, this is a bad idea.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_32.S        | 42 ++++++++++++++++++++++----------
 arch/x86/entry/entry_64_compat.S |  9 ++++++-
 arch/x86/include/asm/proto.h     | 15 ++++++++++--
 arch/x86/kernel/traps.c          | 52 +++++++++++++++++++++++++++++++++-------
 4 files changed, 94 insertions(+), 24 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index ed171f938960..752d4f031a18 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -287,7 +287,26 @@ need_resched:
 END(resume_kernel)
 #endif
 
-	# SYSENTER  call handler stub
+GLOBAL(__begin_SYSENTER_singlestep_region)
+/*
+ * All code from here through __end_SYSENTER_singlestep_region is subject
+ * to being single-stepped if a user program sets TF and executes SYSENTER.
+ * There is absolutely nothing that we can do to prevent this from happening
+ * (thanks Intel!).  To keep our handling of this situation as simple as
+ * possible, we handle TF just like AC and NT, except that our #DB handler
+ * will ignore all of the single-step traps generated in this range.
+ */
+
+#ifdef CONFIG_XEN
+/*
+ * Xen doesn't set %esp to be precisely what the normal SYSENTER
+ * entry point expects, so fix it up before using the normal path.
+ */
+ENTRY(xen_sysenter_target)
+	addl	$5*4, %esp			/* remove xen-provided frame */
+	jmp	sysenter_past_esp
+#endif
+
 ENTRY(entry_SYSENTER_32)
 	movl	TSS_sysenter_sp0(%esp), %esp
 sysenter_past_esp:
@@ -301,19 +320,25 @@ sysenter_past_esp:
 	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest */
 
 	/*
-	 * Sysenter doesn't filter flags, so we need to clear NT and AC
-	 * ourselves.  To save a few cycles, we can check whether
+	 * Sysenter doesn't filter flags, so we need to clear NT, AC
+	 * and TF ourselves.  To save a few cycles, we can check whether
 	 * either was set instead of doing an unconditional popfq.
 	 * This needs to happen before enabling interrupts so that
 	 * we don't get preempted with NT set.
 	 *
+	 * If TF is set, we will single-step all the way to here -- do_debug
+	 * will ignore all the traps.  (Yes, this is slow, but so is
+	 * single-stepping in general.  This allows us to avoid having
+	 * a more complicated code to handle the case where a user program
+	 * forces us to single-step through the SYSENTER entry code.)
+	 *
 	 * NB.: .Lsysenter_fix_flags is a label with the code under it moved
 	 * out-of-line as an optimization: NT is unlikely to be set in the
 	 * majority of the cases and instead of polluting the I$ unnecessarily,
 	 * we're keeping that code behind a branch which will predict as
 	 * not-taken and therefore its instructions won't be fetched.
 	 */
-	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC, PT_EFLAGS(%esp)
+	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC|X86_EFLAGS_TF, PT_EFLAGS(%esp)
 	jnz	.Lsysenter_fix_flags
 .Lsysenter_flags_fixed:
 
@@ -369,6 +394,7 @@ sysenter_past_esp:
 	pushl	$X86_EFLAGS_FIXED
 	popfl
 	jmp	.Lsysenter_flags_fixed
+GLOBAL(__end_SYSENTER_singlestep_region)
 ENDPROC(entry_SYSENTER_32)
 
 	# system call handler stub
@@ -662,14 +688,6 @@ ENTRY(spurious_interrupt_bug)
 END(spurious_interrupt_bug)
 
 #ifdef CONFIG_XEN
-/*
- * Xen doesn't set %esp to be precisely what the normal SYSENTER
- * entry point expects, so fix it up before using the normal path.
- */
-ENTRY(xen_sysenter_target)
-	addl	$5*4, %esp			/* remove xen-provided frame */
-	jmp	sysenter_past_esp
-
 ENTRY(xen_hypervisor_callback)
 	pushl	$-1				/* orig_ax = -1 => not a system call */
 	SAVE_ALL
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 7c8e72da7654..6aec75b41b06 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -94,13 +94,19 @@ ENTRY(entry_SYSENTER_compat)
 	 * This needs to happen before enabling interrupts so that
 	 * we don't get preempted with NT set.
 	 *
+	 * If TF is set, we will single-step all the way to here -- do_debug
+	 * will ignore all the traps.  (Yes, this is slow, but so is
+	 * single-stepping in general.  This allows us to avoid having
+	 * a more complicated code to handle the case where a user program
+	 * forces us to single-step through the SYSENTER entry code.)
+	 *
 	 * NB.: .Lsysenter_fix_flags is a label with the code under it moved
 	 * out-of-line as an optimization: NT is unlikely to be set in the
 	 * majority of the cases and instead of polluting the I$ unnecessarily,
 	 * we're keeping that code behind a branch which will predict as
 	 * not-taken and therefore its instructions won't be fetched.
 	 */
-	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC, EFLAGS(%rsp)
+	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC|X86_EFLAGS_TF, EFLAGS(%rsp)
 	jnz	.Lsysenter_fix_flags
 .Lsysenter_flags_fixed:
 
@@ -121,6 +127,7 @@ ENTRY(entry_SYSENTER_compat)
 	pushq	$X86_EFLAGS_FIXED
 	popfq
 	jmp	.Lsysenter_flags_fixed
+GLOBAL(__end_entry_SYSENTER_compat)
 ENDPROC(entry_SYSENTER_compat)
 
 /*
diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index a4a77286cb1d..9b9b30b19441 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -7,12 +7,23 @@
 
 void syscall_init(void);
 
+#ifdef CONFIG_X86_64
 void entry_SYSCALL_64(void);
-void entry_SYSCALL_compat(void);
+#endif
+
+#ifdef CONFIG_X86_32
 void entry_INT80_32(void);
-void entry_INT80_compat(void);
 void entry_SYSENTER_32(void);
+void __begin_SYSENTER_singlestep_region(void);
+void __end_SYSENTER_singlestep_region(void);
+#endif
+
+#ifdef CONFIG_IA32_EMULATION
 void entry_SYSENTER_compat(void);
+void __end_entry_SYSENTER_compat(void);
+void entry_SYSCALL_compat(void);
+void entry_INT80_compat(void);
+#endif
 
 void x86_configure_nx(void);
 void x86_report_nx(void);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 6dddc220e3ed..80928ea78373 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -559,6 +559,29 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
 NOKPROBE_SYMBOL(fixup_bad_iret);
 #endif
 
+static bool is_sysenter_singlestep(struct pt_regs *regs)
+{
+	/*
+	 * We don't try for precision here.  If we're anywhere in the region of
+	 * code that can be single-stepped in the SYSENTER entry path, then
+	 * assume that this is a useless single-step trap due to SYSENTER
+	 * being invoked with TF set.  (We don't know in advance exactly
+	 * which instructions will be hit because BTF could plausibly
+	 * be set.
+	 */
+#ifdef CONFIG_X86_32
+	return (regs->ip - (unsigned long)__begin_SYSENTER_singlestep_region) <
+		(unsigned long)__end_SYSENTER_singlestep_region -
+		(unsigned long)__begin_SYSENTER_singlestep_region;
+#elif defined(CONFIG_IA32_EMULATION)
+	return (regs->ip - (unsigned long)entry_SYSENTER_compat) <
+		(unsigned long)__end_entry_SYSENTER_compat -
+		(unsigned long)entry_SYSENTER_compat;
+#else
+	return false;
+#endif
+}
+
 /*
  * Our handling of the processor debug registers is non-trivial.
  * We do not clear them on entry and exit from the kernel. Therefore
@@ -616,6 +639,18 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	 */
 	clear_tsk_thread_flag(tsk, TIF_BLOCKSTEP);
 
+	if (unlikely(!user_mode(regs) && (dr6 & DR_STEP) &&
+		     is_sysenter_singlestep(regs))) {
+		dr6 &= ~DR_STEP;
+		if (!dr6)
+			goto exit;
+		/*
+		 * else we might have gotten a single-step trap and hit a
+		 * watchpoint at the same time, in which case we should fall
+		 * through and handle the watchpoint.
+		 */
+	}
+
 	/*
 	 * If dr6 has no reason to give us about the origin of this trap,
 	 * then it's very likely the result of an icebp/int01 trap.
@@ -624,7 +659,7 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	if (!dr6 && user_mode(regs))
 		user_icebp = 1;
 
-	/* Catch kmemcheck conditions first of all! */
+	/* Catch kmemcheck conditions! */
 	if ((dr6 & DR_STEP) && kmemcheck_trap(regs))
 		goto exit;
 
@@ -659,14 +694,13 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 		goto exit;
 	}
 
-	/*
-	 * Single-stepping through system calls: ignore any exceptions in
-	 * kernel space, but re-enable TF when returning to user mode.
-	 *
-	 * We already checked v86 mode above, so we can check for kernel mode
-	 * by just checking the CPL of CS.
-	 */
-	if ((dr6 & DR_STEP) && !user_mode(regs)) {
+	if (WARN_ON_ONCE((dr6 & DR_STEP) && !user_mode(regs))) {
+		/*
+		 * Historical junk that used to handle SYSENTER single-stepping.
+		 * This should be unreachable now.  If we survive for a while
+		 * without anyone hitting this warning, we'll turn this into
+		 * an oops.
+		 */
 		tsk->thread.debugreg6 &= ~DR_STEP;
 		set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
 		regs->flags &= ~X86_EFLAGS_TF;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 08/10] x86/entry: Only allocate space for SYSENTER_stack if needed
  2016-02-29  5:28 [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
                   ` (6 preceding siblings ...)
  2016-02-29  5:28 ` [PATCH 07/10] x86/entry: Vastly simplify SYSENTER TF handling Andy Lutomirski
@ 2016-02-29  5:28 ` Andy Lutomirski
  2016-02-29  5:28 ` [PATCH 09/10] x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup Andy Lutomirski
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Andy Lutomirski @ 2016-02-29  5:28 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

The SYSENTER stack is only used on 32-bit kernels.  Remove it in
64-bit kernels.

(We may end up using it down the road on 64-bit kernels.  If so,
 we'll re-enable it for CONFIG_IA32_EMULATION.)

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/processor.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index ecb410310e70..7cd01b71b5bd 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -297,10 +297,12 @@ struct tss_struct {
 	 */
 	unsigned long		io_bitmap[IO_BITMAP_LONGS + 1];
 
+#ifdef CONFIG_X86_32
 	/*
 	 * Space for the temporary SYSENTER stack:
 	 */
 	unsigned long		SYSENTER_stack[64];
+#endif
 
 } ____cacheline_aligned;
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 09/10] x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
  2016-02-29  5:28 [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
                   ` (7 preceding siblings ...)
  2016-02-29  5:28 ` [PATCH 08/10] x86/entry: Only allocate space for SYSENTER_stack if needed Andy Lutomirski
@ 2016-02-29  5:28 ` Andy Lutomirski
  2016-02-29  5:28 ` [PATCH 10/10] x86/entry/32: Add and check a stack canary for the SYSENTER stack Andy Lutomirski
  2016-02-29 18:55 ` [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
  10 siblings, 0 replies; 29+ messages in thread
From: Andy Lutomirski @ 2016-02-29  5:28 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

Right after SYSENTER, we can get a #DB or NMI.  On x86_32, there's no IST,
so the exception handler is invoked on the temporary SYSENTER stack.

Because the SYSENTER stack is very small, we have a fixup to switch
off the stack quickly when this happens.  The old fixup had several issues:

1. It checked the interrupt frame's CS and EIP.  This wasn't
   obviously correct on Xen or if vm86 mode was in use [1].

2. In the NMI handler, it did some frightening digging into the
   stack frame.  I'm not convinced this digging was correct.

3. The fixup didn't switch stacks and then switch back.  Instead, it
   synthesized a brand new stack frame that would redirect the IRET
   back to the SYSENTER code.  That frame was highly questionable.
   For one thing, if NMI nested inside #DB, we would effectively
   abort the #DB prologue, which was probably safe but was
   frightening.  For another, the code used PUSHFL to write the
   FLAGS portion of the frame, which was simply bogus -- by the time
   PUSHFL was called, at least TF, NT, VM, and all of the arithmetic
   flags were clobbered.

Simplify this considerably.  Instead of looking at the saved frame
to see where we came from, check the hardware ESP register against
the SYSENTER stack directly.  Malicious user code cannot spoof the
kernel ESP register, and by moving the check after SAVE_ALL, we can
use normal PER_CPU accesses to find all the relevant addresses.

With this patch applied, the improved syscall_nt_32 test finally
passes on 32-bit kernels.

[1] It isn't obviously correct, but it is nonetheless safe from vm86
    shenanigans as far as I can tell.  A user can't point EIP at
    entry_SYSENTER_32 while in vm86 mode because entry_SYSENTER_32,
    like all kernel addresses, is greater than 0xffff and would thus
    violate the CS segment limit.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_32.S        | 114 ++++++++++++++++++---------------------
 arch/x86/kernel/asm-offsets_32.c |   5 ++
 2 files changed, 56 insertions(+), 63 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 752d4f031a18..99bf636a6eaf 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -987,51 +987,48 @@ error_code:
 	jmp	ret_from_exception
 END(page_fault)
 
-/*
- * Debug traps and NMI can happen at the one SYSENTER instruction
- * that sets up the real kernel stack. Check here, since we can't
- * allow the wrong stack to be used.
- *
- * "TSS_sysenter_sp0+12" is because the NMI/debug handler will have
- * already pushed 3 words if it hits on the sysenter instruction:
- * eflags, cs and eip.
- *
- * We just load the right stack, and push the three (known) values
- * by hand onto the new stack - while updating the return eip past
- * the instruction that would have done it for sysenter.
- */
-.macro FIX_STACK offset ok label
-	cmpw	$__KERNEL_CS, 4(%esp)
-	jne	\ok
-\label:
-	movl	TSS_sysenter_sp0 + \offset(%esp), %esp
-	pushfl
-	pushl	$__KERNEL_CS
-	pushl	$sysenter_past_esp
-.endm
-
 ENTRY(debug)
+	/*
+	 * #DB can happen at the first instruction of
+	 * entry_SYSENTER_32 or in Xen's SYSENTER prologue.  If this
+	 * happens, then we will be running on a very small stack.  We
+	 * need to detect this condition and switch to the thread
+	 * stack before calling any C code at all.
+	 *
+	 * If you edit this code, keep in mind that NMIs can happen in here.
+	 */
 	ASM_CLAC
-	cmpl	$entry_SYSENTER_32, (%esp)
-	jne	debug_stack_correct
-	FIX_STACK 12, debug_stack_correct, debug_esp_fix_insn
-debug_stack_correct:
 	pushl	$-1				# mark this as an int
 	SAVE_ALL
-	TRACE_IRQS_OFF
 	xorl	%edx, %edx			# error code 0
 	movl	%esp, %eax			# pt_regs pointer
+
+	/* Are we currently on the SYSENTER stack? */
+	PER_CPU(cpu_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx)
+	subl	%eax, %ecx		/* ecx = (end of SYENTER_stack) - esp */
+	cmpl	$SIZEOF_SYSENTER_stack, %ecx
+	jb	.Ldebug_from_sysenter_stack
+
+	TRACE_IRQS_OFF
+	call	do_debug
+	jmp	ret_from_exception
+
+.Ldebug_from_sysenter_stack:
+	/* We're on the SYSENTER stack.  Switch off. */
+	movl	%esp, %ebp
+	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
+	TRACE_IRQS_OFF
 	call	do_debug
+	movl	%ebp, %esp
 	jmp	ret_from_exception
 END(debug)
 
 /*
- * NMI is doubly nasty. It can happen _while_ we're handling
- * a debug fault, and the debug fault hasn't yet been able to
- * clear up the stack. So we first check whether we got  an
- * NMI on the sysenter entry path, but after that we need to
- * check whether we got an NMI on the debug path where the debug
- * fault happened on the sysenter path.
+ * NMI is doubly nasty.  It can happen on the first instruction of
+ * entry_SYSENTER_32 (just like #DB), but it can also interrupt the beginning
+ * of the #DB handler even if that #DB in turn hit before entry_SYSENTER_32
+ * switched stacks.  We handle both conditions by simply checking whether we
+ * interrupted kernel code running on the SYSENTER stack.
  */
 ENTRY(nmi)
 	ASM_CLAC
@@ -1042,41 +1039,32 @@ ENTRY(nmi)
 	popl	%eax
 	je	nmi_espfix_stack
 #endif
-	cmpl	$entry_SYSENTER_32, (%esp)
-	je	nmi_stack_fixup
-	pushl	%eax
-	movl	%esp, %eax
-	/*
-	 * Do not access memory above the end of our stack page,
-	 * it might not exist.
-	 */
-	andl	$(THREAD_SIZE-1), %eax
-	cmpl	$(THREAD_SIZE-20), %eax
-	popl	%eax
-	jae	nmi_stack_correct
-	cmpl	$entry_SYSENTER_32, 12(%esp)
-	je	nmi_debug_stack_check
-nmi_stack_correct:
-	pushl	%eax
+
+	pushl	%eax				# pt_regs->orig_ax
 	SAVE_ALL
 	xorl	%edx, %edx			# zero error code
 	movl	%esp, %eax			# pt_regs pointer
+
+	/* Are we currently on the SYSENTER stack? */
+	PER_CPU(cpu_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx)
+	subl	%eax, %ecx		/* ecx = (end of SYENTER_stack) - esp */
+	cmpl	$SIZEOF_SYSENTER_stack, %ecx
+	jb	.Lnmi_from_sysenter_stack
+
+	/* Not on SYSENTER stack. */
 	call	do_nmi
 	jmp	restore_all_notrace
 
-nmi_stack_fixup:
-	FIX_STACK 12, nmi_stack_correct, 1
-	jmp	nmi_stack_correct
-
-nmi_debug_stack_check:
-	cmpw	$__KERNEL_CS, 16(%esp)
-	jne	nmi_stack_correct
-	cmpl	$debug, (%esp)
-	jb	nmi_stack_correct
-	cmpl	$debug_esp_fix_insn, (%esp)
-	ja	nmi_stack_correct
-	FIX_STACK 24, nmi_stack_correct, 1
-	jmp	nmi_stack_correct
+.Lnmi_from_sysenter_stack:
+	/*
+	 * We're on the SYSENTER stack.  Switch off.  No one (not even debug)
+	 * is using the thread stack right now, so it's safe for us to use it.
+	 */
+	movl	%esp, %ebp
+	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
+	call	do_nmi
+	movl	%ebp, %esp
+	jmp	restore_all_notrace
 
 #ifdef CONFIG_X86_ESPFIX32
 nmi_espfix_stack:
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index fdeb0ce07c16..ecdc1d217dc0 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -52,6 +52,11 @@ void foo(void)
 	DEFINE(TSS_sysenter_sp0, offsetof(struct tss_struct, x86_tss.sp0) -
 	       offsetofend(struct tss_struct, SYSENTER_stack));
 
+	/* Offset from cpu_tss to SYSENTER_stack */
+	OFFSET(CPU_TSS_SYSENTER_stack, tss_struct, SYSENTER_stack);
+	/* Size of SYSENTER_stack */
+	DEFINE(SIZEOF_SYSENTER_stack, sizeof(((struct tss_struct *)0)->SYSENTER_stack));
+
 #if defined(CONFIG_LGUEST) || defined(CONFIG_LGUEST_GUEST) || defined(CONFIG_LGUEST_MODULE)
 	BLANK();
 	OFFSET(LGUEST_DATA_irq_enabled, lguest_data, irq_enabled);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 10/10] x86/entry/32: Add and check a stack canary for the SYSENTER stack
  2016-02-29  5:28 [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
                   ` (8 preceding siblings ...)
  2016-02-29  5:28 ` [PATCH 09/10] x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup Andy Lutomirski
@ 2016-02-29  5:28 ` Andy Lutomirski
  2016-02-29 18:55 ` [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
  10 siblings, 0 replies; 29+ messages in thread
From: Andy Lutomirski @ 2016-02-29  5:28 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/processor.h | 3 ++-
 arch/x86/kernel/process.c        | 3 +++
 arch/x86/kernel/traps.c          | 8 ++++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 7cd01b71b5bd..50a6dc871cc0 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -299,8 +299,9 @@ struct tss_struct {
 
 #ifdef CONFIG_X86_32
 	/*
-	 * Space for the temporary SYSENTER stack:
+	 * Space for the temporary SYSENTER stack.
 	 */
+	unsigned long		SYSENTER_stack_canary;
 	unsigned long		SYSENTER_stack[64];
 #endif
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9f7c21c22477..ee9a9792caeb 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -57,6 +57,9 @@ __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
 	  */
 	.io_bitmap		= { [0 ... IO_BITMAP_LONGS] = ~0 },
 #endif
+#ifdef CONFIG_X86_32
+	.SYSENTER_stack_canary	= STACK_END_MAGIC,
+#endif
 };
 EXPORT_PER_CPU_SYMBOL(cpu_tss);
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 80928ea78373..590110119e6a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -713,6 +713,14 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	debug_stack_usage_dec();
 
 exit:
+#if defined(CONFIG_X86_32)
+	/*
+	 * This is the most likely code path that involves non-trivial use
+	 * of the SYSENTER stack.  Check that we haven't overrun it.
+	 */
+	WARN(this_cpu_read(cpu_tss.SYSENTER_stack_canary) != STACK_END_MAGIC,
+	     "Overran or corrupted SYSENTER stack\n");
+#endif
 	ist_exit(regs);
 }
 NOKPROBE_SYMBOL(do_debug);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups
  2016-02-29  5:28 [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
                   ` (9 preceding siblings ...)
  2016-02-29  5:28 ` [PATCH 10/10] x86/entry/32: Add and check a stack canary for the SYSENTER stack Andy Lutomirski
@ 2016-02-29 18:55 ` Andy Lutomirski
  10 siblings, 0 replies; 29+ messages in thread
From: Andy Lutomirski @ 2016-02-29 18:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Borislav Petkov, Oleg Nesterov,
	Andrew Cooper, Brian Gerst

On Sun, Feb 28, 2016 at 9:28 PM, Andy Lutomirski <luto@kernel.org> wrote:
>
> While I was doing all this, I also noticed that DR6 and BTF handling
> in do_debug was a bit off.  Two of the patches in here try to fix it
> up.

It's worth noting that do_debug is really quite screwed up with or
without this patchset applied.  For example:

    /*
     * Let others (NMI) know that the debug stack is in use
     * as we may switch to the interrupt stack.
     */
    debug_stack_usage_inc();

    /* It's safe to allow irq's after DR6 has been saved */
    preempt_disable();
    cond_local_irq_enable(regs);

This has never really been valid.  It should be guarded by an
if(user_mode(regs)).  And we need to kill the die_notifier garbage in
here -- it makes it basically impossible to understand what's going
on.

--Andy

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 02/10] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
  2016-02-29  5:28 ` [PATCH 02/10] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test Andy Lutomirski
@ 2016-02-29 20:39   ` Borislav Petkov
  2016-02-29 20:45     ` Andy Lutomirski
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2016-02-29 20:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Oleg Nesterov, Andrew Cooper, Brian Gerst

On Sun, Feb 28, 2016 at 09:28:47PM -0800, Andy Lutomirski wrote:
> CLAC is slow, and the SYSENTER code already has an unlikely path
> that runs if unusual flags are set.  Drop the CLAC and instead rely
> on the unlikely path to clear AC.
> 
> This seems to save ~24 cycles on my Skylake laptop.  (Hey, Intel,
> make this faster please!)
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/entry/entry_64_compat.S | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
> index 89bcb4979e7a..7c8e72da7654 100644
> --- a/arch/x86/entry/entry_64_compat.S
> +++ b/arch/x86/entry/entry_64_compat.S
> @@ -66,8 +66,6 @@ ENTRY(entry_SYSENTER_compat)
>  	 */
>  	pushfq				/* pt_regs->flags (except IF = 0) */
>  	orl	$X86_EFLAGS_IF, (%rsp)	/* Fix saved flags */
> -	ASM_CLAC			/* Clear AC after saving FLAGS */
> -
>  	pushq	$__USER32_CS		/* pt_regs->cs */
>  	xorq    %r8,%r8
>  	pushq	%r8			/* pt_regs->ip = 0 (placeholder) */
> @@ -90,9 +88,9 @@ ENTRY(entry_SYSENTER_compat)
>  	cld
>  
>  	/*
> -	 * Sysenter doesn't filter flags, so we need to clear NT
> +	 * Sysenter doesn't filter flags, so we need to clear NT and AC
>  	 * ourselves.  To save a few cycles, we can check whether
> -	 * NT was set instead of doing an unconditional popfq.
> +	 * either was set instead of doing an unconditional popfq.
>  	 * This needs to happen before enabling interrupts so that
>  	 * we don't get preempted with NT set.
>  	 *
> @@ -102,7 +100,7 @@ ENTRY(entry_SYSENTER_compat)
>  	 * we're keeping that code behind a branch which will predict as
>  	 * not-taken and therefore its instructions won't be fetched.
>  	 */
> -	testl	$X86_EFLAGS_NT, EFLAGS(%rsp)
> +	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC, EFLAGS(%rsp)
>  	jnz	.Lsysenter_fix_flags
>  .Lsysenter_flags_fixed:

Do I see it correctly that with this change, that .Lsysenter_fix_flags:
is going to be visited each time on SMAP machines and then we can get
rid of it? The reason for it was not to pollute I$ as the comment says
but that happening now anyway...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 02/10] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
  2016-02-29 20:39   ` Borislav Petkov
@ 2016-02-29 20:45     ` Andy Lutomirski
  2016-02-29 22:09       ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Andy Lutomirski @ 2016-02-29 20:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Oleg Nesterov,
	Andrew Cooper, Brian Gerst

On Mon, Feb 29, 2016 at 12:39 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Sun, Feb 28, 2016 at 09:28:47PM -0800, Andy Lutomirski wrote:
>> CLAC is slow, and the SYSENTER code already has an unlikely path
>> that runs if unusual flags are set.  Drop the CLAC and instead rely
>> on the unlikely path to clear AC.
>>
>> This seems to save ~24 cycles on my Skylake laptop.  (Hey, Intel,
>> make this faster please!)
>>
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> ---
>>  arch/x86/entry/entry_64_compat.S | 8 +++-----
>>  1 file changed, 3 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
>> index 89bcb4979e7a..7c8e72da7654 100644
>> --- a/arch/x86/entry/entry_64_compat.S
>> +++ b/arch/x86/entry/entry_64_compat.S
>> @@ -66,8 +66,6 @@ ENTRY(entry_SYSENTER_compat)
>>        */
>>       pushfq                          /* pt_regs->flags (except IF = 0) */
>>       orl     $X86_EFLAGS_IF, (%rsp)  /* Fix saved flags */
>> -     ASM_CLAC                        /* Clear AC after saving FLAGS */
>> -
>>       pushq   $__USER32_CS            /* pt_regs->cs */
>>       xorq    %r8,%r8
>>       pushq   %r8                     /* pt_regs->ip = 0 (placeholder) */
>> @@ -90,9 +88,9 @@ ENTRY(entry_SYSENTER_compat)
>>       cld
>>
>>       /*
>> -      * Sysenter doesn't filter flags, so we need to clear NT
>> +      * Sysenter doesn't filter flags, so we need to clear NT and AC
>>        * ourselves.  To save a few cycles, we can check whether
>> -      * NT was set instead of doing an unconditional popfq.
>> +      * either was set instead of doing an unconditional popfq.
>>        * This needs to happen before enabling interrupts so that
>>        * we don't get preempted with NT set.
>>        *
>> @@ -102,7 +100,7 @@ ENTRY(entry_SYSENTER_compat)
>>        * we're keeping that code behind a branch which will predict as
>>        * not-taken and therefore its instructions won't be fetched.
>>        */
>> -     testl   $X86_EFLAGS_NT, EFLAGS(%rsp)
>> +     testl   $X86_EFLAGS_NT|X86_EFLAGS_AC, EFLAGS(%rsp)
>>       jnz     .Lsysenter_fix_flags
>>  .Lsysenter_flags_fixed:
>
> Do I see it correctly that with this change, that .Lsysenter_fix_flags:
> is going to be visited each time on SMAP machines and then we can get
> rid of it? The reason for it was not to pollute I$ as the comment says
> but that happening now anyway...
>

I don't think so.  Sensible user programs shouldn't set AC in the first place.

--Andy

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 02/10] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
  2016-02-29 20:45     ` Andy Lutomirski
@ 2016-02-29 22:09       ` Borislav Petkov
  2016-02-29 22:33         ` Brian Gerst
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2016-02-29 22:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Oleg Nesterov,
	Andrew Cooper, Brian Gerst

On Mon, Feb 29, 2016 at 12:45:58PM -0800, Andy Lutomirski wrote:
> I don't think so.  Sensible user programs shouldn't set AC in the first place.

Then I'm most likely missing something: so before this patch, we did
unconditionally CLAC thus disallowing kernel access to user pages. Why
don't we need it anymore and need to pay attention only to user rFLAGS?

Especially since we do:

do_fast_syscall_32
|-> __get_user
   |-> __get_user_nocheck
      |-> __uaccess_begin which is stac()

Or are we saying, we don't need that CLAC in the beginning of
entry_SYSENTER_compat() at all because we're going to STAC anyway in
__get_user() ?

Hmmm...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 02/10] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
  2016-02-29 22:09       ` Borislav Petkov
@ 2016-02-29 22:33         ` Brian Gerst
  2016-02-29 22:37           ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Brian Gerst @ 2016-02-29 22:33 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Andy Lutomirski, X86 ML, linux-kernel,
	Oleg Nesterov, Andrew Cooper

On Mon, Feb 29, 2016 at 5:09 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Feb 29, 2016 at 12:45:58PM -0800, Andy Lutomirski wrote:
>> I don't think so.  Sensible user programs shouldn't set AC in the first place.
>
> Then I'm most likely missing something: so before this patch, we did
> unconditionally CLAC thus disallowing kernel access to user pages. Why
> don't we need it anymore and need to pay attention only to user rFLAGS?

SYSENTER doesn't save EFLAGS so we have to fudge it by pushing the
kernel's flags.  Most user programs never set AC, so by adding it to
the test for flags to clear we can avoid the CLAC or POPF in the
common case that it is already clear.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 02/10] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
  2016-02-29 22:33         ` Brian Gerst
@ 2016-02-29 22:37           ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2016-02-29 22:37 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Andy Lutomirski, Andy Lutomirski, X86 ML, linux-kernel,
	Oleg Nesterov, Andrew Cooper

On Mon, Feb 29, 2016 at 05:33:16PM -0500, Brian Gerst wrote:
> SYSENTER doesn't save EFLAGS so we have to fudge it by pushing the
> kernel's flags.  Most user programs never set AC, so by adding it to
> the test for flags to clear we can avoid the CLAC or POPF in the
> common case that it is already clear.

Yeah, Andy just explained it to me on IRC. Thanks guys.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/10] selftests/x86: In syscall_nt, test NT|TF as well
  2016-02-29  5:28 ` [PATCH 01/10] selftests/x86: In syscall_nt, test NT|TF as well Andy Lutomirski
@ 2016-03-02 12:59   ` Borislav Petkov
  2016-03-02 14:01     ` One Thousand Gnomes
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2016-03-02 12:59 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Oleg Nesterov, Andrew Cooper, Brian Gerst

On Sun, Feb 28, 2016 at 09:28:46PM -0800, Andy Lutomirski wrote:
> Setting TF prevents fastpath returns in most cases, which causes the
> test to fail on 32-bit kernels because 32-bit kernels do not, in
> fact, handle NT correctly on SYSENTER entries.
> 
> The next patch will fix 32-bit kernels.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  tools/testing/selftests/x86/syscall_nt.c | 57 +++++++++++++++++++++++++++-----
>  1 file changed, 49 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/testing/selftests/x86/syscall_nt.c b/tools/testing/selftests/x86/syscall_nt.c
> index 60c06af4646a..a6ceff86c199 100644
> --- a/tools/testing/selftests/x86/syscall_nt.c
> +++ b/tools/testing/selftests/x86/syscall_nt.c

...

> +static void do_it(unsigned long extraflags)
> +{
> +	unsigned long flags;
> +
> +	set_eflags(get_eflags() | extraflags);
>  	syscall(SYS_getpid);
> -	if (get_eflags() & X86_EFLAGS_NT) {
> -		printf("[OK]\tThe syscall worked and NT is still set\n");
> -		return 0;
> +	flags = get_eflags();
> +	if ((flags & extraflags) == extraflags) {
> +		printf("[OK]\tThe syscall worked and flags are still set\n");
>  	} else {
> -		printf("[FAIL]\tThe syscall worked but NT was cleared\n");
> -		return 1;
> +		printf("[FAIL]\tThe syscall worked but flags were cleared (flags = 0x%lx but expected 0x%lx set)\n",
> +		       flags, extraflags);
> +		nerrs++;
>  	}
>  }
> +
> +int main()

ERROR: Bad function definition - int main() should probably be int main(void)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 03/10] x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
  2016-02-29  5:28 ` [PATCH 03/10] x86/entry/32: Filter NT and speed up AC filtering in SYSENTER Andy Lutomirski
@ 2016-03-02 13:24   ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2016-03-02 13:24 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Oleg Nesterov, Andrew Cooper, Brian Gerst

On Sun, Feb 28, 2016 at 09:28:48PM -0800, Andy Lutomirski wrote:
> This makes the 32-bit code work just like the 64-bit code.  It should
> speed up syscalls on 32-bit kernels on Skylake by something like 20
> cycles (by analogy to the 64-bit compat case).
> 
> It also cleans up NT just like we do for the 64-bit case.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/entry/entry_32.S | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index ab710eee4308..263ebde6333f 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -294,7 +294,6 @@ sysenter_past_esp:
>  	pushl	$__USER_DS		/* pt_regs->ss */
>  	pushl	%ebp			/* pt_regs->sp (stashed in bp) */
>  	pushfl				/* pt_regs->flags (except IF = 0) */
> -	ASM_CLAC			/* Clear AC after saving FLAGS */
>  	orl	$X86_EFLAGS_IF, (%esp)	/* Fix IF */
>  	pushl	$__USER_CS		/* pt_regs->cs */
>  	pushl	$0			/* pt_regs->ip = 0 (placeholder) */
> @@ -302,6 +301,23 @@ sysenter_past_esp:
>  	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest */
>  
>  	/*
> +	 * Sysenter doesn't filter flags, so we need to clear NT and AC
> +	 * ourselves.  To save a few cycles, we can check whether
> +	 * either was set instead of doing an unconditional popfq.

Let's write insn names capitalized: SYSENTER, POPF, etc... I know,
entry_SYSENTER_compat() doesn't do it either but you could fix that up
in your 2/10 patch, since you're touching that place anyway...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/10] selftests/x86: In syscall_nt, test NT|TF as well
  2016-03-02 12:59   ` Borislav Petkov
@ 2016-03-02 14:01     ` One Thousand Gnomes
  2016-03-02 14:28       ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: One Thousand Gnomes @ 2016-03-02 14:01 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, x86, linux-kernel, Oleg Nesterov, Andrew Cooper,
	Brian Gerst

On Wed, 2 Mar 2016 13:59:52 +0100
Borislav Petkov <bp@alien8.de> wrote:

> On Sun, Feb 28, 2016 at 09:28:46PM -0800, Andy Lutomirski wrote:
> > Setting TF prevents fastpath returns in most cases, which causes the
> > test to fail on 32-bit kernels because 32-bit kernels do not, in
> > fact, handle NT correctly on SYSENTER entries.
> > 
> > The next patch will fix 32-bit kernels.
> > 
> > Signed-off-by: Andy Lutomirski <luto@kernel.org>
> > ---
> >  tools/testing/selftests/x86/syscall_nt.c | 57 +++++++++++++++++++++++++++-----
> >  1 file changed, 49 insertions(+), 8 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/x86/syscall_nt.c b/tools/testing/selftests/x86/syscall_nt.c
> > index 60c06af4646a..a6ceff86c199 100644
> > --- a/tools/testing/selftests/x86/syscall_nt.c
> > +++ b/tools/testing/selftests/x86/syscall_nt.c  
> 
> ...
> 
> > +static void do_it(unsigned long extraflags)
> > +{
> > +	unsigned long flags;
> > +
> > +	set_eflags(get_eflags() | extraflags);
> >  	syscall(SYS_getpid);
> > -	if (get_eflags() & X86_EFLAGS_NT) {
> > -		printf("[OK]\tThe syscall worked and NT is still set\n");
> > -		return 0;
> > +	flags = get_eflags();
> > +	if ((flags & extraflags) == extraflags) {
> > +		printf("[OK]\tThe syscall worked and flags are still set\n");
> >  	} else {
> > -		printf("[FAIL]\tThe syscall worked but NT was cleared\n");
> > -		return 1;
> > +		printf("[FAIL]\tThe syscall worked but flags were cleared (flags = 0x%lx but expected 0x%lx set)\n",
> > +		       flags, extraflags);
> > +		nerrs++;
> >  	}
> >  }
> > +
> > +int main()  
> 
> ERROR: Bad function definition - int main() should probably be int main(void)

int main(void) is wrong as there are passed arguments

int main() is ok (in C89 at least) because it means "there are unknown
arguments"

int main(int argc, char *argv[]) is allowed

int main(void) is not safe on all platforms because some compilers
choose to do the argument cleanup in the return path of the called
function. Having the wrong number of arguments doesn't end well in such
cases. I doubt any Linux platforms do this but we shouldn't be
encouraging bad programming techniques 8)

Alan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/10] selftests/x86: In syscall_nt, test NT|TF as well
  2016-03-02 14:01     ` One Thousand Gnomes
@ 2016-03-02 14:28       ` Borislav Petkov
  2016-03-02 19:03         ` Andy Lutomirski
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2016-03-02 14:28 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Andy Lutomirski, x86, linux-kernel, Oleg Nesterov, Andrew Cooper,
	Brian Gerst

On Wed, Mar 02, 2016 at 02:01:15PM +0000, One Thousand Gnomes wrote:
> int main(void) is wrong as there are passed arguments

Not in this particular case - test doesn't take args.

> int main() is ok (in C89 at least) because it means "there are unknown
> arguments"
> 
> int main(int argc, char *argv[]) is allowed
> 
> int main(void) is not safe on all platforms because some compilers
> choose to do the argument cleanup in the return path of the called
> function. Having the wrong number of arguments doesn't end well in such
> cases. I doubt any Linux platforms do this but we shouldn't be
> encouraging bad programming techniques 8)

There's also the variadic thing. Here's hpa's sermon from a couple of
years ago:

http://thread.gmane.org/gmane.linux.kernel/1268751/focus=1268792

:-))))

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/10] selftests/x86: In syscall_nt, test NT|TF as well
  2016-03-02 14:28       ` Borislav Petkov
@ 2016-03-02 19:03         ` Andy Lutomirski
  2016-03-02 19:30           ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Andy Lutomirski @ 2016-03-02 19:03 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andrew Cooper, Oleg Nesterov, Brian Gerst, linux-kernel,
	One Thousand Gnomes, X86 ML

On Mar 2, 2016 6:29 AM, "Borislav Petkov" <bp@alien8.de> wrote:
>
> On Wed, Mar 02, 2016 at 02:01:15PM +0000, One Thousand Gnomes wrote:
> > int main(void) is wrong as there are passed arguments
>
> Not in this particular case - test doesn't take args.

IIRC the C standard says otherwise. main is special.

Arguably checkpatch should learn about that -- it doesn't matter for
*kernel* code, but this is a user program.  (Also, it's shamelessly
written in C99.  I figure 15 years is long enough...)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/10] selftests/x86: In syscall_nt, test NT|TF as well
  2016-03-02 19:03         ` Andy Lutomirski
@ 2016-03-02 19:30           ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2016-03-02 19:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Cooper, Oleg Nesterov, Brian Gerst, linux-kernel,
	One Thousand Gnomes, X86 ML

On Wed, Mar 02, 2016 at 11:03:44AM -0800, Andy Lutomirski wrote:
> IIRC the C standard says otherwise. main is special.

My C99 draft pdf says either "int main(void)" or "int main(int argc,
char *argv[])".

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups
  2016-03-06 16:16   ` Andy Lutomirski
@ 2016-03-07  8:29     ` Ingo Molnar
  0 siblings, 0 replies; 29+ messages in thread
From: Ingo Molnar @ 2016-03-07  8:29 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Cooper, Oleg Nesterov, Borislav Petkov, Brian Gerst,
	linux-kernel, X86 ML, Linus Torvalds, Thomas Gleixner,
	H. Peter Anvin, Andrew Morton


* Andy Lutomirski <luto@amacapital.net> wrote:

> On Mar 6, 2016 12:22 AM, "Ingo Molnar" <mingo@kernel.org> wrote:
> >
> >
> > * Andy Lutomirski <luto@kernel.org> wrote:
> >
> > > This series applies to the result of merging tip:x86/asm and
> > > tip:x86/urgent.  I've been testing on a somewhat bastardized base,
> > > because tip currently doesn't work on my laptop in 32-bit mode.  (That
> > > bug is fixed in Linus' tree.)
> >
> > Hm, which fix is that?
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=de9e478b9d49f3a0214310d921450cf5bb4a21e6

ah, indeed.

I've merged v4.5-rc7 into tip:x86/asm, which includes this fix, to make the topic 
branch bisectable on SkyLake as well.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups
  2016-03-06  8:22 ` Ingo Molnar
@ 2016-03-06 16:16   ` Andy Lutomirski
  2016-03-07  8:29     ` Ingo Molnar
  0 siblings, 1 reply; 29+ messages in thread
From: Andy Lutomirski @ 2016-03-06 16:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Cooper, Oleg Nesterov, Borislav Petkov, Brian Gerst,
	linux-kernel, X86 ML

On Mar 6, 2016 12:22 AM, "Ingo Molnar" <mingo@kernel.org> wrote:
>
>
> * Andy Lutomirski <luto@kernel.org> wrote:
>
> > This series applies to the result of merging tip:x86/asm and
> > tip:x86/urgent.  I've been testing on a somewhat bastardized base,
> > because tip currently doesn't work on my laptop in 32-bit mode.  (That
> > bug is fixed in Linus' tree.)
>
> Hm, which fix is that?

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=de9e478b9d49f3a0214310d921450cf5bb4a21e6

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups
  2016-03-06  8:31 ` Ingo Molnar
@ 2016-03-06 16:16   ` Andy Lutomirski
  0 siblings, 0 replies; 29+ messages in thread
From: Andy Lutomirski @ 2016-03-06 16:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Cooper, Oleg Nesterov, Borislav Petkov, Brian Gerst,
	linux-kernel, X86 ML, Linus Torvalds

On Mar 6, 2016 12:31 AM, "Ingo Molnar" <mingo@kernel.org> wrote:
>
>
> * Andy Lutomirski <luto@kernel.org> wrote:
>
> > hpa asked me to get rid of the ASM_CLAC at the beginning of the SYSENTER
> > path.  Little did he know...
>
> Btw., before we further change this code, something else I think would be very
> useful. We have countless system call entry points on x86 CPUs, and they are now
> consistently named and are very easy to grep for:
>
>  triton:~/tip> git grep 'ENTRY(entry_' arch/x86/entry/
>  arch/x86/entry/entry_32.S:ENTRY(entry_SYSENTER_32)
>  arch/x86/entry/entry_32.S:ENTRY(entry_INT80_32)
>  arch/x86/entry/entry_64.S:ENTRY(entry_SYSCALL_64)
>  arch/x86/entry/entry_64_compat.S:ENTRY(entry_SYSENTER_compat)
>  arch/x86/entry/entry_64_compat.S:ENTRY(entry_SYSCALL_compat)
>  arch/x86/entry/entry_64_compat.S:ENTRY(entry_INT80_compat)
>
> Furthermore, each entry point has extensive comments, except one important detail:
> none of the comments really explains the circumstances under which the entry
> points are _used_ by user-space.
>
> I'd like to see something like:
>
> arch/x86/entry/entry_64.S:ENTRY(entry_SYSCALL_64)
>
>         *
>         * The 64-bit SYSCALL instruction is used by all modern 64-bit user-space
>         * code to execute most system calls: this instruction is the fastest and
>         * sanest implementation on modern Intel and AMD CPUs.
>         *
>
> ... and we should add similar explanations for all of the 6 entry points, with
> caveats and limitations listed generously.
>
> Especially valuable would be to list eventual 'strange' usages of the various
> syscall instructions, used by rare packages, compatibility layers, emulators,
> embedded libraries, etc. (To the extent we know about them, obviously.)

I'll send a follow-up patch.

--Andy

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups
  2016-03-06  5:52 Andy Lutomirski
  2016-03-06  8:22 ` Ingo Molnar
@ 2016-03-06  8:31 ` Ingo Molnar
  2016-03-06 16:16   ` Andy Lutomirski
  1 sibling, 1 reply; 29+ messages in thread
From: Ingo Molnar @ 2016-03-06  8:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Linus Torvalds


* Andy Lutomirski <luto@kernel.org> wrote:

> hpa asked me to get rid of the ASM_CLAC at the beginning of the SYSENTER
> path.  Little did he know...

Btw., before we further change this code, something else I think would be very 
useful. We have countless system call entry points on x86 CPUs, and they are now 
consistently named and are very easy to grep for:

 triton:~/tip> git grep 'ENTRY(entry_' arch/x86/entry/
 arch/x86/entry/entry_32.S:ENTRY(entry_SYSENTER_32)
 arch/x86/entry/entry_32.S:ENTRY(entry_INT80_32)
 arch/x86/entry/entry_64.S:ENTRY(entry_SYSCALL_64)
 arch/x86/entry/entry_64_compat.S:ENTRY(entry_SYSENTER_compat)
 arch/x86/entry/entry_64_compat.S:ENTRY(entry_SYSCALL_compat)
 arch/x86/entry/entry_64_compat.S:ENTRY(entry_INT80_compat)

Furthermore, each entry point has extensive comments, except one important detail: 
none of the comments really explains the circumstances under which the entry 
points are _used_ by user-space.

I'd like to see something like:

arch/x86/entry/entry_64.S:ENTRY(entry_SYSCALL_64)

	*
	* The 64-bit SYSCALL instruction is used by all modern 64-bit user-space 
	* code to execute most system calls: this instruction is the fastest and 
	* sanest implementation on modern Intel and AMD CPUs.
	*

... and we should add similar explanations for all of the 6 entry points, with 
caveats and limitations listed generously.

Especially valuable would be to list eventual 'strange' usages of the various 
syscall instructions, used by rare packages, compatibility layers, emulators, 
embedded libraries, etc. (To the extent we know about them, obviously.)

I.e. it would be very nice to do a full documentation of our current system call 
usage patterns, as utilized by user-space. Beyond the documentation value this 
will also help people prioritize optimizations between the various entry points - 
which should be optimized more, which entry point matters less, etc.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups
  2016-03-06  5:52 Andy Lutomirski
@ 2016-03-06  8:22 ` Ingo Molnar
  2016-03-06 16:16   ` Andy Lutomirski
  2016-03-06  8:31 ` Ingo Molnar
  1 sibling, 1 reply; 29+ messages in thread
From: Ingo Molnar @ 2016-03-06  8:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst


* Andy Lutomirski <luto@kernel.org> wrote:

> This series applies to the result of merging tip:x86/asm and
> tip:x86/urgent.  I've been testing on a somewhat bastardized base,
> because tip currently doesn't work on my laptop in 32-bit mode.  (That
> bug is fixed in Linus' tree.)

Hm, which fix is that?

Generally, tip:master closely tracks Linus's latest (so it should work as-is), but 
individual sub-trees are only updated when necessary - but a refresh can be done 
anytime if requested.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups
@ 2016-03-06  5:52 Andy Lutomirski
  2016-03-06  8:22 ` Ingo Molnar
  2016-03-06  8:31 ` Ingo Molnar
  0 siblings, 2 replies; 29+ messages in thread
From: Andy Lutomirski @ 2016-03-06  5:52 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

hpa asked me to get rid of the ASM_CLAC at the beginning of the SYSENTER
path.  Little did he know...

This series makes the observed behavior of SYSENTER wrt flags the same
for all sane flags and kernel bitnesses.  That is, SYSENTER preserves
flags now unless you do a syscall that explicitly changes flags, and
the HW flags that the syscall executes with are sanitized.  This
includes NT, TF, AC and all arithmetic flags.  Prior to this series,
32-bit kernels clobbered TF and the arithmetic flags and behaved
highly erratically if NT was set.  (If IF is cleared by evil userspace
when SYSENTER starts, IF will be set again on return.  There's nothing
the kernel can do about this -- SYSENTER inherently forgets the state
of IF.)

This series speeds up SYSENTER on all kernels by a surprisingly large
amount on Skylake because it eliminates an unconditional CLAC.

While SYSENTER used to handle TF correctly as far as I can tell on
64-bit kernels, the means by which it did so was heavily tangled up in
the ptrace single-step logic.  It now works just like all the other
kernel entries except insofar as do_debug has a simple special case
for it.  Relatedly, the bizarre and poorly explained old fixup in
do_debug is now hidden behind a WARN_ON_ONCE in preparation for
deleting it at some point.

The code that fixed up NMI and #DB early in SYSENTER in 32-bit kernels
used to be both terrifying and incorrect.  (It doesn't appear to have
been exploitably bad, but the reason for that is subtle, and the code
was certainy more fragile than it deserved to me.)  We still need a
special fixup, but it's much simpler now.

While I was doing all this, I also noticed that DR6 and BTF handling
in do_debug was a bit off.  Two of the patches in here try to fix it
up.

Have fun!

tl;dr: Cleanups and sanity fixes here, but no security fixes, and I
don't think anything needs to be backported or put in x86/urgent.

This series applies to the result of merging tip:x86/asm and
tip:x86/urgent.  I've been testing on a somewhat bastardized base,
because tip currently doesn't work on my laptop in 32-bit mode.  (That
bug is fixed in Linus' tree.)

Changes from v1:
 - s/Sysenter/SYSENTER in two places (Borislav)
 - int main() -> int main(void) (Borislav)

Andy Lutomirski (10):
  selftests/x86: In syscall_nt, test NT|TF as well
  x86/entry/compat: In SYSENTER, sink AC clearing below the existing
    FLAGS test
  x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
  x86/entry/32: Restore FLAGS on SYSEXIT
  x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions
  x86/traps: Clear DR6 early in do_debug and improve the comment
  x86/entry: Vastly simplify SYSENTER TF handling
  x86/entry: Only allocate space for SYSENTER_stack if needed
  x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
  x86/entry/32: Add and check a stack canary for the SYSENTER stack

 arch/x86/entry/entry_32.S                | 182 ++++++++++++++++++-------------
 arch/x86/entry/entry_64_compat.S         |  15 ++-
 arch/x86/include/asm/processor.h         |   5 +-
 arch/x86/include/asm/proto.h             |  15 ++-
 arch/x86/kernel/asm-offsets_32.c         |   5 +
 arch/x86/kernel/process.c                |   3 +
 arch/x86/kernel/traps.c                  |  87 ++++++++++++---
 tools/testing/selftests/x86/syscall_nt.c |  57 ++++++++--
 8 files changed, 263 insertions(+), 106 deletions(-)

-- 
2.5.0

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-03-07  8:29 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-29  5:28 [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
2016-02-29  5:28 ` [PATCH 01/10] selftests/x86: In syscall_nt, test NT|TF as well Andy Lutomirski
2016-03-02 12:59   ` Borislav Petkov
2016-03-02 14:01     ` One Thousand Gnomes
2016-03-02 14:28       ` Borislav Petkov
2016-03-02 19:03         ` Andy Lutomirski
2016-03-02 19:30           ` Borislav Petkov
2016-02-29  5:28 ` [PATCH 02/10] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test Andy Lutomirski
2016-02-29 20:39   ` Borislav Petkov
2016-02-29 20:45     ` Andy Lutomirski
2016-02-29 22:09       ` Borislav Petkov
2016-02-29 22:33         ` Brian Gerst
2016-02-29 22:37           ` Borislav Petkov
2016-02-29  5:28 ` [PATCH 03/10] x86/entry/32: Filter NT and speed up AC filtering in SYSENTER Andy Lutomirski
2016-03-02 13:24   ` Borislav Petkov
2016-02-29  5:28 ` [PATCH 04/10] x86/entry/32: Restore FLAGS on SYSEXIT Andy Lutomirski
2016-02-29  5:28 ` [PATCH 05/10] x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions Andy Lutomirski
2016-02-29  5:28 ` [PATCH 06/10] x86/traps: Clear DR6 early in do_debug and improve the comment Andy Lutomirski
2016-02-29  5:28 ` [PATCH 07/10] x86/entry: Vastly simplify SYSENTER TF handling Andy Lutomirski
2016-02-29  5:28 ` [PATCH 08/10] x86/entry: Only allocate space for SYSENTER_stack if needed Andy Lutomirski
2016-02-29  5:28 ` [PATCH 09/10] x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup Andy Lutomirski
2016-02-29  5:28 ` [PATCH 10/10] x86/entry/32: Add and check a stack canary for the SYSENTER stack Andy Lutomirski
2016-02-29 18:55 ` [PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
2016-03-06  5:52 Andy Lutomirski
2016-03-06  8:22 ` Ingo Molnar
2016-03-06 16:16   ` Andy Lutomirski
2016-03-07  8:29     ` Ingo Molnar
2016-03-06  8:31 ` Ingo Molnar
2016-03-06 16:16   ` Andy Lutomirski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.