All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/12] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups
@ 2016-03-10  3:00 Andy Lutomirski
  2016-03-10  3:00 ` [PATCH v2 01/12] selftests/x86: In syscall_nt, test NT|TF as well Andy Lutomirski
                   ` (11 more replies)
  0 siblings, 12 replies; 25+ messages in thread
From: Andy Lutomirski @ 2016-03-10  3:00 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

hpa asked me to get rid of the ASM_CLAC at the beginning of the SYSENTER
path.  Little did he know...

This series makes the observed behavior of SYSENTER wrt flags the same
for all sane flags and kernel bitnesses.  That is, SYSENTER preserves
flags now unless you do a syscall that explicitly changes flags, and
the HW flags that the syscall executes with are sanitized.  This
includes NT, TF, AC and all arithmetic flags.  Prior to this series,
32-bit kernels clobbered TF and the arithmetic flags and behaved
highly erratically if NT was set.  (If IF is cleared by evil userspace
when SYSENTER starts, IF will be set again on return.  There's nothing
the kernel can do about this -- SYSENTER inherently forgets the state
of IF.)

This series speeds up SYSENTER on all kernels by a surprisingly large
amount on Skylake because it eliminates an unconditional CLAC.

While SYSENTER used to handle TF correctly as far as I can tell on
64-bit kernels, the means by which it did so was heavily tangled up in
the ptrace single-step logic.  It now works just like all the other
kernel entries except insofar as do_debug has a simple special case
for it.  Relatedly, the bizarre and poorly explained old fixup in
do_debug is now hidden behind a WARN_ON_ONCE in preparation for
deleting it at some point.

The code that fixed up NMI and #DB early in SYSENTER in 32-bit kernels
used to be both terrifying and incorrect.  (It doesn't appear to have
been exploitably bad, but the reason for that is subtle, and the code
was certainy more fragile than it deserved to me.)  We still need a
special fixup, but it's much simpler now.

While I was doing all this, I also noticed that DR6 and BTF handling
in do_debug was a bit off.  Two of the patches in here try to fix it
up.

Have fun!

tl;dr: Cleanups and sanity fixes here, but no security fixes, and I
don't think anything needs to be backported or put in x86/urgent.

This series applies to the result of merging tip:x86/asm and
tip:x86/urgent.  I've been testing on a somewhat bastardized base,
because tip currently doesn't work on my laptop in 32-bit mode.  (That
bug is fixed in Linus' tree.)

Changes from v1:
 - s/Sysenter/SYSENTER in two places (Borislav)
 - int main() -> int main(void) (Borislav)
 - Fix some "SYENTER" typoed
 - Add the last two patches (11 was a missed cleanup and 12 is a respin)
 - Patch 12 differs from the old version in that Ingo's comments are hopefully addressed.

Andy Lutomirski (12):
  selftests/x86: In syscall_nt, test NT|TF as well
  x86/entry/compat: In SYSENTER, sink AC clearing below the existing
    FLAGS test
  x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
  x86/entry/32: Restore FLAGS on SYSEXIT
  x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions
  x86/traps: Clear DR6 early in do_debug and improve the comment
  x86/entry: Vastly simplify SYSENTER TF handling
  x86/entry: Only allocate space for SYSENTER_stack if needed
  x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
  x86/entry/32: Add and check a stack canary for the SYSENTER stack
  x86/entry: Remove TIF_SINGLESTEP entry work
  x86/entry: Improve system call entry comments

 arch/x86/entry/common.c                  |  10 --
 arch/x86/entry/entry_32.S                | 243 +++++++++++++++++++++----------
 arch/x86/entry/entry_64.S                |  10 ++
 arch/x86/entry/entry_64_compat.S         | 100 +++++++++----
 arch/x86/include/asm/processor.h         |   5 +-
 arch/x86/include/asm/proto.h             |  15 +-
 arch/x86/include/asm/thread_info.h       |   2 +-
 arch/x86/kernel/asm-offsets_32.c         |   5 +
 arch/x86/kernel/process.c                |   3 +
 arch/x86/kernel/traps.c                  |  87 ++++++++---
 tools/testing/selftests/x86/syscall_nt.c |  57 +++++++-
 11 files changed, 392 insertions(+), 145 deletions(-)

-- 
2.5.0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 01/12] selftests/x86: In syscall_nt, test NT|TF as well
  2016-03-10  3:00 [PATCH v2 00/12] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
@ 2016-03-10  3:00 ` Andy Lutomirski
  2016-03-10 11:00   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2016-03-10  3:00 ` [PATCH v2 02/12] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test Andy Lutomirski
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2016-03-10  3:00 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

Setting TF prevents fastpath returns in most cases, which causes the
test to fail on 32-bit kernels because 32-bit kernels do not, in
fact, handle NT correctly on SYSENTER entries.

The next patch will fix 32-bit kernels.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 tools/testing/selftests/x86/syscall_nt.c | 57 +++++++++++++++++++++++++++-----
 1 file changed, 49 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/x86/syscall_nt.c b/tools/testing/selftests/x86/syscall_nt.c
index 60c06af4646a..43fcab367fb0 100644
--- a/tools/testing/selftests/x86/syscall_nt.c
+++ b/tools/testing/selftests/x86/syscall_nt.c
@@ -17,6 +17,9 @@
 
 #include <stdio.h>
 #include <unistd.h>
+#include <string.h>
+#include <signal.h>
+#include <err.h>
 #include <sys/syscall.h>
 #include <asm/processor-flags.h>
 
@@ -26,6 +29,8 @@
 # define WIDTH "l"
 #endif
 
+static unsigned int nerrs;
+
 static unsigned long get_eflags(void)
 {
 	unsigned long eflags;
@@ -39,16 +44,52 @@ static void set_eflags(unsigned long eflags)
 		      : : "rm" (eflags) : "flags");
 }
 
-int main()
+static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
+		       int flags)
 {
-	printf("[RUN]\tSet NT and issue a syscall\n");
-	set_eflags(get_eflags() | X86_EFLAGS_NT);
+	struct sigaction sa;
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_sigaction = handler;
+	sa.sa_flags = SA_SIGINFO | flags;
+	sigemptyset(&sa.sa_mask);
+	if (sigaction(sig, &sa, 0))
+		err(1, "sigaction");
+}
+
+static void sigtrap(int sig, siginfo_t *si, void *ctx_void)
+{
+}
+
+static void do_it(unsigned long extraflags)
+{
+	unsigned long flags;
+
+	set_eflags(get_eflags() | extraflags);
 	syscall(SYS_getpid);
-	if (get_eflags() & X86_EFLAGS_NT) {
-		printf("[OK]\tThe syscall worked and NT is still set\n");
-		return 0;
+	flags = get_eflags();
+	if ((flags & extraflags) == extraflags) {
+		printf("[OK]\tThe syscall worked and flags are still set\n");
 	} else {
-		printf("[FAIL]\tThe syscall worked but NT was cleared\n");
-		return 1;
+		printf("[FAIL]\tThe syscall worked but flags were cleared (flags = 0x%lx but expected 0x%lx set)\n",
+		       flags, extraflags);
+		nerrs++;
 	}
 }
+
+int main(void)
+{
+	printf("[RUN]\tSet NT and issue a syscall\n");
+	do_it(X86_EFLAGS_NT);
+
+	/*
+	 * Now try it again with TF set -- TF forces returns via IRET in all
+	 * cases except non-ptregs-using 64-bit full fast path syscalls.
+	 */
+
+	sethandler(SIGTRAP, sigtrap, 0);
+
+	printf("[RUN]\tSet NT|TF and issue a syscall\n");
+	do_it(X86_EFLAGS_NT | X86_EFLAGS_TF);
+
+	return nerrs == 0 ? 0 : 1;
+}
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 02/12] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
  2016-03-10  3:00 [PATCH v2 00/12] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
  2016-03-10  3:00 ` [PATCH v2 01/12] selftests/x86: In syscall_nt, test NT|TF as well Andy Lutomirski
@ 2016-03-10  3:00 ` Andy Lutomirski
  2016-03-10 11:01   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2016-03-10  3:00 ` [PATCH v2 03/12] x86/entry/32: Filter NT and speed up AC filtering in SYSENTER Andy Lutomirski
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2016-03-10  3:00 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

CLAC is slow, and the SYSENTER code already has an unlikely path
that runs if unusual flags are set.  Drop the CLAC and instead rely
on the unlikely path to clear AC.

This seems to save ~24 cycles on my Skylake laptop.  (Hey, Intel,
make this faster please!)

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64_compat.S | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 89bcb4979e7a..8d3728131809 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -66,8 +66,6 @@ ENTRY(entry_SYSENTER_compat)
 	 */
 	pushfq				/* pt_regs->flags (except IF = 0) */
 	orl	$X86_EFLAGS_IF, (%rsp)	/* Fix saved flags */
-	ASM_CLAC			/* Clear AC after saving FLAGS */
-
 	pushq	$__USER32_CS		/* pt_regs->cs */
 	xorq    %r8,%r8
 	pushq	%r8			/* pt_regs->ip = 0 (placeholder) */
@@ -90,9 +88,9 @@ ENTRY(entry_SYSENTER_compat)
 	cld
 
 	/*
-	 * Sysenter doesn't filter flags, so we need to clear NT
+	 * SYSENTER doesn't filter flags, so we need to clear NT and AC
 	 * ourselves.  To save a few cycles, we can check whether
-	 * NT was set instead of doing an unconditional popfq.
+	 * either was set instead of doing an unconditional popfq.
 	 * This needs to happen before enabling interrupts so that
 	 * we don't get preempted with NT set.
 	 *
@@ -102,7 +100,7 @@ ENTRY(entry_SYSENTER_compat)
 	 * we're keeping that code behind a branch which will predict as
 	 * not-taken and therefore its instructions won't be fetched.
 	 */
-	testl	$X86_EFLAGS_NT, EFLAGS(%rsp)
+	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC, EFLAGS(%rsp)
 	jnz	.Lsysenter_fix_flags
 .Lsysenter_flags_fixed:
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 03/12] x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
  2016-03-10  3:00 [PATCH v2 00/12] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
  2016-03-10  3:00 ` [PATCH v2 01/12] selftests/x86: In syscall_nt, test NT|TF as well Andy Lutomirski
  2016-03-10  3:00 ` [PATCH v2 02/12] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test Andy Lutomirski
@ 2016-03-10  3:00 ` Andy Lutomirski
  2016-03-10 11:01   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2016-03-10  3:00 ` [PATCH v2 04/12] x86/entry/32: Restore FLAGS on SYSEXIT Andy Lutomirski
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2016-03-10  3:00 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

This makes the 32-bit code work just like the 64-bit code.  It should
speed up syscalls on 32-bit kernels on Skylake by something like 20
cycles (by analogy to the 64-bit compat case).

It also cleans up NT just like we do for the 64-bit case.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_32.S | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 684d2505b93a..86665f93ff5d 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -294,7 +294,6 @@ sysenter_past_esp:
 	pushl	$__USER_DS		/* pt_regs->ss */
 	pushl	%ebp			/* pt_regs->sp (stashed in bp) */
 	pushfl				/* pt_regs->flags (except IF = 0) */
-	ASM_CLAC			/* Clear AC after saving FLAGS */
 	orl	$X86_EFLAGS_IF, (%esp)	/* Fix IF */
 	pushl	$__USER_CS		/* pt_regs->cs */
 	pushl	$0			/* pt_regs->ip = 0 (placeholder) */
@@ -302,6 +301,23 @@ sysenter_past_esp:
 	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest */
 
 	/*
+	 * SYSENTER doesn't filter flags, so we need to clear NT and AC
+	 * ourselves.  To save a few cycles, we can check whether
+	 * either was set instead of doing an unconditional popfq.
+	 * This needs to happen before enabling interrupts so that
+	 * we don't get preempted with NT set.
+	 *
+	 * NB.: .Lsysenter_fix_flags is a label with the code under it moved
+	 * out-of-line as an optimization: NT is unlikely to be set in the
+	 * majority of the cases and instead of polluting the I$ unnecessarily,
+	 * we're keeping that code behind a branch which will predict as
+	 * not-taken and therefore its instructions won't be fetched.
+	 */
+	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC, PT_EFLAGS(%esp)
+	jnz	.Lsysenter_fix_flags
+.Lsysenter_flags_fixed:
+
+	/*
 	 * User mode is traced as though IRQs are on, and SYSENTER
 	 * turned them off.
 	 */
@@ -339,6 +355,11 @@ sysenter_past_esp:
 .popsection
 	_ASM_EXTABLE(1b, 2b)
 	PTGS_TO_GS_EX
+
+.Lsysenter_fix_flags:
+	pushl	$X86_EFLAGS_FIXED
+	popfl
+	jmp	.Lsysenter_flags_fixed
 ENDPROC(entry_SYSENTER_32)
 
 	# system call handler stub
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 04/12] x86/entry/32: Restore FLAGS on SYSEXIT
  2016-03-10  3:00 [PATCH v2 00/12] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
                   ` (2 preceding siblings ...)
  2016-03-10  3:00 ` [PATCH v2 03/12] x86/entry/32: Filter NT and speed up AC filtering in SYSENTER Andy Lutomirski
@ 2016-03-10  3:00 ` Andy Lutomirski
  2016-03-10 11:01   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2016-03-10  3:00 ` [PATCH v2 05/12] x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions Andy Lutomirski
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2016-03-10  3:00 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

We weren't restoring FLAGS at all on SYSEXIT.  Apparently no one cared.

With this patch applied, native kernels should always honor
task_pt_regs()->flags, which opens the door for some sys_iopl
cleanups.  I'll do those as a separate series, though, since getting
it right will involve tweaking some paravirt ops.

(The short version is that, before this patch, sys_iopl, invoked via
 SYSENTER, wasn't guaranteed to ever transfer the updated
 regs->flags, so sys_iopl had to change the hardware flags register
 as well.)

Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_32.S | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 86665f93ff5d..d1437fdbf597 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -343,6 +343,15 @@ sysenter_past_esp:
 	popl	%eax			/* pt_regs->ax */
 
 	/*
+	 * Restore all flags except IF (we restore IF separately because
+	 * STI gives a one-instruction window in which we won't be interrupted,
+	 * whereas POPF does not.
+	 */
+	addl	$PT_EFLAGS-PT_DS, %esp	/* point esp at pt_regs->flags */
+	btr	$X86_EFLAGS_IF_BIT, (%esp)
+	popfl
+
+	/*
 	 * Return back to the vDSO, which will pop ecx and edx.
 	 * Don't bother with DS and ES (they already contain __USER_DS).
 	 */
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 05/12] x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions
  2016-03-10  3:00 [PATCH v2 00/12] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
                   ` (3 preceding siblings ...)
  2016-03-10  3:00 ` [PATCH v2 04/12] x86/entry/32: Restore FLAGS on SYSEXIT Andy Lutomirski
@ 2016-03-10  3:00 ` Andy Lutomirski
  2016-03-10 11:02   ` [tip:x86/asm] x86/entry/traps: " tip-bot for Andy Lutomirski
  2016-03-10  3:00 ` [PATCH v2 06/12] x86/traps: Clear DR6 early in do_debug and improve the comment Andy Lutomirski
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2016-03-10  3:00 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

The SDM says that debug exceptions clear BTF, and we need to keep
TIF_BLOCKSTEP in sync with BTF.  Clear it unconditionally and improve
the comment.

I suspect that the fact that kmemcheck could cause TIF_BLOCKSTEP not
to be cleared was just an oversight.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/traps.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index dd2c2e66c2e1..19e6cfa501e3 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -598,6 +598,13 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	dr6 &= ~DR6_RESERVED;
 
 	/*
+	 * The SDM says "The processor clears the BTF flag when it
+	 * generates a debug exception."  Clear TIF_BLOCKSTEP to keep
+	 * TIF_BLOCKSTEP in sync with the hardware BTF flag.
+	 */
+	clear_tsk_thread_flag(tsk, TIF_BLOCKSTEP);
+
+	/*
 	 * If dr6 has no reason to give us about the origin of this trap,
 	 * then it's very likely the result of an icebp/int01 trap.
 	 * User wants a sigtrap for that.
@@ -612,11 +619,6 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	/* DR6 may or may not be cleared by the CPU */
 	set_debugreg(0, 6);
 
-	/*
-	 * The processor cleared BTF, so don't mark that we need it set.
-	 */
-	clear_tsk_thread_flag(tsk, TIF_BLOCKSTEP);
-
 	/* Store the virtualized DR6 value */
 	tsk->thread.debugreg6 = dr6;
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 06/12] x86/traps: Clear DR6 early in do_debug and improve the comment
  2016-03-10  3:00 [PATCH v2 00/12] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
                   ` (4 preceding siblings ...)
  2016-03-10  3:00 ` [PATCH v2 05/12] x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions Andy Lutomirski
@ 2016-03-10  3:00 ` Andy Lutomirski
  2016-03-10 11:02   ` [tip:x86/asm] x86/entry/traps: Clear DR6 early in do_debug() " tip-bot for Andy Lutomirski
  2016-03-10  3:00 ` [PATCH v2 07/12] x86/entry: Vastly simplify SYSENTER TF handling Andy Lutomirski
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2016-03-10  3:00 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

Leaving any bits set in DR6 on return from a debug exception is
asking for trouble.  Prevent it by writing zero right away and
clarify the comment.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/traps.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 19e6cfa501e3..6dddc220e3ed 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -593,6 +593,18 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	ist_enter(regs);
 
 	get_debugreg(dr6, 6);
+	/*
+	 * The Intel SDM says:
+	 *
+	 *   Certain debug exceptions may clear bits 0-3. The remaining
+	 *   contents of the DR6 register are never cleared by the
+	 *   processor. To avoid confusion in identifying debug
+	 *   exceptions, debug handlers should clear the register before
+	 *   returning to the interrupted task.
+	 *
+	 * Keep it simple: clear DR6 immediately.
+	 */
+	set_debugreg(0, 6);
 
 	/* Filter out all the reserved bits which are preset to 1 */
 	dr6 &= ~DR6_RESERVED;
@@ -616,9 +628,6 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	if ((dr6 & DR_STEP) && kmemcheck_trap(regs))
 		goto exit;
 
-	/* DR6 may or may not be cleared by the CPU */
-	set_debugreg(0, 6);
-
 	/* Store the virtualized DR6 value */
 	tsk->thread.debugreg6 = dr6;
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 07/12] x86/entry: Vastly simplify SYSENTER TF handling
  2016-03-10  3:00 [PATCH v2 00/12] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
                   ` (5 preceding siblings ...)
  2016-03-10  3:00 ` [PATCH v2 06/12] x86/traps: Clear DR6 early in do_debug and improve the comment Andy Lutomirski
@ 2016-03-10  3:00 ` Andy Lutomirski
  2016-03-10 11:03   ` [tip:x86/asm] x86/entry: Vastly simplify SYSENTER TF (single-step) handling tip-bot for Andy Lutomirski
  2016-03-10  3:00 ` [PATCH v2 08/12] x86/entry: Only allocate space for SYSENTER_stack if needed Andy Lutomirski
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2016-03-10  3:00 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

Due to a blatant design error, SYSENTER doesn't clear TF.  As a result,
if a user does SYSENTER with TF set, we will single-step through the
kernel until something clears TF.  There is absolutely nothing we can
do to prevent this short of turning off SYSENTER [1].

Simplify the handling considerably with two changes:

1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
   add TF to that list of flags to sanitize with no overhead whatsoever.

2. Teach do_debug to ignore single-step traps in the SYSENTER prologue.

That's all we need to do.

Don't get too excited -- our handling is still buggy on 32-bit
kernels.  There's nothing wrong with the SYSENTER code itself, but
the #DB prologue has a clever fixup for traps on the very first
instruction of entry_SYSENTER_32, and the fixup doesn't work quite
correctly.  The next two patches will fix that.

[1] We could probably prevent it by forcing BTF on at all times and
    making sure we clear TF before any branches in the SYSENTER
    code.  Needless to say, this is a bad idea.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_32.S        | 42 ++++++++++++++++++++++----------
 arch/x86/entry/entry_64_compat.S |  9 ++++++-
 arch/x86/include/asm/proto.h     | 15 ++++++++++--
 arch/x86/kernel/traps.c          | 52 +++++++++++++++++++++++++++++++++-------
 4 files changed, 94 insertions(+), 24 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index d1437fdbf597..c227039ce62d 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -287,7 +287,26 @@ need_resched:
 END(resume_kernel)
 #endif
 
-	# SYSENTER  call handler stub
+GLOBAL(__begin_SYSENTER_singlestep_region)
+/*
+ * All code from here through __end_SYSENTER_singlestep_region is subject
+ * to being single-stepped if a user program sets TF and executes SYSENTER.
+ * There is absolutely nothing that we can do to prevent this from happening
+ * (thanks Intel!).  To keep our handling of this situation as simple as
+ * possible, we handle TF just like AC and NT, except that our #DB handler
+ * will ignore all of the single-step traps generated in this range.
+ */
+
+#ifdef CONFIG_XEN
+/*
+ * Xen doesn't set %esp to be precisely what the normal SYSENTER
+ * entry point expects, so fix it up before using the normal path.
+ */
+ENTRY(xen_sysenter_target)
+	addl	$5*4, %esp			/* remove xen-provided frame */
+	jmp	sysenter_past_esp
+#endif
+
 ENTRY(entry_SYSENTER_32)
 	movl	TSS_sysenter_sp0(%esp), %esp
 sysenter_past_esp:
@@ -301,19 +320,25 @@ sysenter_past_esp:
 	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest */
 
 	/*
-	 * SYSENTER doesn't filter flags, so we need to clear NT and AC
-	 * ourselves.  To save a few cycles, we can check whether
+	 * SYSENTER doesn't filter flags, so we need to clear NT, AC
+	 * and TF ourselves.  To save a few cycles, we can check whether
 	 * either was set instead of doing an unconditional popfq.
 	 * This needs to happen before enabling interrupts so that
 	 * we don't get preempted with NT set.
 	 *
+	 * If TF is set, we will single-step all the way to here -- do_debug
+	 * will ignore all the traps.  (Yes, this is slow, but so is
+	 * single-stepping in general.  This allows us to avoid having
+	 * a more complicated code to handle the case where a user program
+	 * forces us to single-step through the SYSENTER entry code.)
+	 *
 	 * NB.: .Lsysenter_fix_flags is a label with the code under it moved
 	 * out-of-line as an optimization: NT is unlikely to be set in the
 	 * majority of the cases and instead of polluting the I$ unnecessarily,
 	 * we're keeping that code behind a branch which will predict as
 	 * not-taken and therefore its instructions won't be fetched.
 	 */
-	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC, PT_EFLAGS(%esp)
+	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC|X86_EFLAGS_TF, PT_EFLAGS(%esp)
 	jnz	.Lsysenter_fix_flags
 .Lsysenter_flags_fixed:
 
@@ -369,6 +394,7 @@ sysenter_past_esp:
 	pushl	$X86_EFLAGS_FIXED
 	popfl
 	jmp	.Lsysenter_flags_fixed
+GLOBAL(__end_SYSENTER_singlestep_region)
 ENDPROC(entry_SYSENTER_32)
 
 	# system call handler stub
@@ -651,14 +677,6 @@ ENTRY(spurious_interrupt_bug)
 END(spurious_interrupt_bug)
 
 #ifdef CONFIG_XEN
-/*
- * Xen doesn't set %esp to be precisely what the normal SYSENTER
- * entry point expects, so fix it up before using the normal path.
- */
-ENTRY(xen_sysenter_target)
-	addl	$5*4, %esp			/* remove xen-provided frame */
-	jmp	sysenter_past_esp
-
 ENTRY(xen_hypervisor_callback)
 	pushl	$-1				/* orig_ax = -1 => not a system call */
 	SAVE_ALL
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 8d3728131809..26469c11796d 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -94,13 +94,19 @@ ENTRY(entry_SYSENTER_compat)
 	 * This needs to happen before enabling interrupts so that
 	 * we don't get preempted with NT set.
 	 *
+	 * If TF is set, we will single-step all the way to here -- do_debug
+	 * will ignore all the traps.  (Yes, this is slow, but so is
+	 * single-stepping in general.  This allows us to avoid having
+	 * a more complicated code to handle the case where a user program
+	 * forces us to single-step through the SYSENTER entry code.)
+	 *
 	 * NB.: .Lsysenter_fix_flags is a label with the code under it moved
 	 * out-of-line as an optimization: NT is unlikely to be set in the
 	 * majority of the cases and instead of polluting the I$ unnecessarily,
 	 * we're keeping that code behind a branch which will predict as
 	 * not-taken and therefore its instructions won't be fetched.
 	 */
-	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC, EFLAGS(%rsp)
+	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC|X86_EFLAGS_TF, EFLAGS(%rsp)
 	jnz	.Lsysenter_fix_flags
 .Lsysenter_flags_fixed:
 
@@ -121,6 +127,7 @@ ENTRY(entry_SYSENTER_compat)
 	pushq	$X86_EFLAGS_FIXED
 	popfq
 	jmp	.Lsysenter_flags_fixed
+GLOBAL(__end_entry_SYSENTER_compat)
 ENDPROC(entry_SYSENTER_compat)
 
 /*
diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index a4a77286cb1d..9b9b30b19441 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -7,12 +7,23 @@
 
 void syscall_init(void);
 
+#ifdef CONFIG_X86_64
 void entry_SYSCALL_64(void);
-void entry_SYSCALL_compat(void);
+#endif
+
+#ifdef CONFIG_X86_32
 void entry_INT80_32(void);
-void entry_INT80_compat(void);
 void entry_SYSENTER_32(void);
+void __begin_SYSENTER_singlestep_region(void);
+void __end_SYSENTER_singlestep_region(void);
+#endif
+
+#ifdef CONFIG_IA32_EMULATION
 void entry_SYSENTER_compat(void);
+void __end_entry_SYSENTER_compat(void);
+void entry_SYSCALL_compat(void);
+void entry_INT80_compat(void);
+#endif
 
 void x86_configure_nx(void);
 void x86_report_nx(void);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 6dddc220e3ed..80928ea78373 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -559,6 +559,29 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
 NOKPROBE_SYMBOL(fixup_bad_iret);
 #endif
 
+static bool is_sysenter_singlestep(struct pt_regs *regs)
+{
+	/*
+	 * We don't try for precision here.  If we're anywhere in the region of
+	 * code that can be single-stepped in the SYSENTER entry path, then
+	 * assume that this is a useless single-step trap due to SYSENTER
+	 * being invoked with TF set.  (We don't know in advance exactly
+	 * which instructions will be hit because BTF could plausibly
+	 * be set.
+	 */
+#ifdef CONFIG_X86_32
+	return (regs->ip - (unsigned long)__begin_SYSENTER_singlestep_region) <
+		(unsigned long)__end_SYSENTER_singlestep_region -
+		(unsigned long)__begin_SYSENTER_singlestep_region;
+#elif defined(CONFIG_IA32_EMULATION)
+	return (regs->ip - (unsigned long)entry_SYSENTER_compat) <
+		(unsigned long)__end_entry_SYSENTER_compat -
+		(unsigned long)entry_SYSENTER_compat;
+#else
+	return false;
+#endif
+}
+
 /*
  * Our handling of the processor debug registers is non-trivial.
  * We do not clear them on entry and exit from the kernel. Therefore
@@ -616,6 +639,18 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	 */
 	clear_tsk_thread_flag(tsk, TIF_BLOCKSTEP);
 
+	if (unlikely(!user_mode(regs) && (dr6 & DR_STEP) &&
+		     is_sysenter_singlestep(regs))) {
+		dr6 &= ~DR_STEP;
+		if (!dr6)
+			goto exit;
+		/*
+		 * else we might have gotten a single-step trap and hit a
+		 * watchpoint at the same time, in which case we should fall
+		 * through and handle the watchpoint.
+		 */
+	}
+
 	/*
 	 * If dr6 has no reason to give us about the origin of this trap,
 	 * then it's very likely the result of an icebp/int01 trap.
@@ -624,7 +659,7 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	if (!dr6 && user_mode(regs))
 		user_icebp = 1;
 
-	/* Catch kmemcheck conditions first of all! */
+	/* Catch kmemcheck conditions! */
 	if ((dr6 & DR_STEP) && kmemcheck_trap(regs))
 		goto exit;
 
@@ -659,14 +694,13 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 		goto exit;
 	}
 
-	/*
-	 * Single-stepping through system calls: ignore any exceptions in
-	 * kernel space, but re-enable TF when returning to user mode.
-	 *
-	 * We already checked v86 mode above, so we can check for kernel mode
-	 * by just checking the CPL of CS.
-	 */
-	if ((dr6 & DR_STEP) && !user_mode(regs)) {
+	if (WARN_ON_ONCE((dr6 & DR_STEP) && !user_mode(regs))) {
+		/*
+		 * Historical junk that used to handle SYSENTER single-stepping.
+		 * This should be unreachable now.  If we survive for a while
+		 * without anyone hitting this warning, we'll turn this into
+		 * an oops.
+		 */
 		tsk->thread.debugreg6 &= ~DR_STEP;
 		set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
 		regs->flags &= ~X86_EFLAGS_TF;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 08/12] x86/entry: Only allocate space for SYSENTER_stack if needed
  2016-03-10  3:00 [PATCH v2 00/12] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
                   ` (6 preceding siblings ...)
  2016-03-10  3:00 ` [PATCH v2 07/12] x86/entry: Vastly simplify SYSENTER TF handling Andy Lutomirski
@ 2016-03-10  3:00 ` Andy Lutomirski
  2016-03-10 11:03   ` [tip:x86/asm] x86/entry: Only allocate space for tss_struct::SYSENTER_stack " tip-bot for Andy Lutomirski
  2016-03-10  3:00 ` [PATCH v2 09/12] x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup Andy Lutomirski
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2016-03-10  3:00 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

The SYSENTER stack is only used on 32-bit kernels.  Remove it in
64-bit kernels.

(We may end up using it down the road on 64-bit kernels.  If so,
 we'll re-enable it for CONFIG_IA32_EMULATION.)

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/processor.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index ecb410310e70..7cd01b71b5bd 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -297,10 +297,12 @@ struct tss_struct {
 	 */
 	unsigned long		io_bitmap[IO_BITMAP_LONGS + 1];
 
+#ifdef CONFIG_X86_32
 	/*
 	 * Space for the temporary SYSENTER stack:
 	 */
 	unsigned long		SYSENTER_stack[64];
+#endif
 
 } ____cacheline_aligned;
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 09/12] x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
  2016-03-10  3:00 [PATCH v2 00/12] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
                   ` (7 preceding siblings ...)
  2016-03-10  3:00 ` [PATCH v2 08/12] x86/entry: Only allocate space for SYSENTER_stack if needed Andy Lutomirski
@ 2016-03-10  3:00 ` Andy Lutomirski
  2016-03-10 11:03   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2016-03-10  3:00 ` [PATCH v2 10/12] x86/entry/32: Add and check a stack canary for the SYSENTER stack Andy Lutomirski
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2016-03-10  3:00 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

Right after SYSENTER, we can get a #DB or NMI.  On x86_32, there's no IST,
so the exception handler is invoked on the temporary SYSENTER stack.

Because the SYSENTER stack is very small, we have a fixup to switch
off the stack quickly when this happens.  The old fixup had several issues:

1. It checked the interrupt frame's CS and EIP.  This wasn't
   obviously correct on Xen or if vm86 mode was in use [1].

2. In the NMI handler, it did some frightening digging into the
   stack frame.  I'm not convinced this digging was correct.

3. The fixup didn't switch stacks and then switch back.  Instead, it
   synthesized a brand new stack frame that would redirect the IRET
   back to the SYSENTER code.  That frame was highly questionable.
   For one thing, if NMI nested inside #DB, we would effectively
   abort the #DB prologue, which was probably safe but was
   frightening.  For another, the code used PUSHFL to write the
   FLAGS portion of the frame, which was simply bogus -- by the time
   PUSHFL was called, at least TF, NT, VM, and all of the arithmetic
   flags were clobbered.

Simplify this considerably.  Instead of looking at the saved frame
to see where we came from, check the hardware ESP register against
the SYSENTER stack directly.  Malicious user code cannot spoof the
kernel ESP register, and by moving the check after SAVE_ALL, we can
use normal PER_CPU accesses to find all the relevant addresses.

With this patch applied, the improved syscall_nt_32 test finally
passes on 32-bit kernels.

[1] It isn't obviously correct, but it is nonetheless safe from vm86
    shenanigans as far as I can tell.  A user can't point EIP at
    entry_SYSENTER_32 while in vm86 mode because entry_SYSENTER_32,
    like all kernel addresses, is greater than 0xffff and would thus
    violate the CS segment limit.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_32.S        | 114 ++++++++++++++++++---------------------
 arch/x86/kernel/asm-offsets_32.c |   5 ++
 2 files changed, 56 insertions(+), 63 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index c227039ce62d..13a7f41c1886 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -976,51 +976,48 @@ error_code:
 	jmp	ret_from_exception
 END(page_fault)
 
-/*
- * Debug traps and NMI can happen at the one SYSENTER instruction
- * that sets up the real kernel stack. Check here, since we can't
- * allow the wrong stack to be used.
- *
- * "TSS_sysenter_sp0+12" is because the NMI/debug handler will have
- * already pushed 3 words if it hits on the sysenter instruction:
- * eflags, cs and eip.
- *
- * We just load the right stack, and push the three (known) values
- * by hand onto the new stack - while updating the return eip past
- * the instruction that would have done it for sysenter.
- */
-.macro FIX_STACK offset ok label
-	cmpw	$__KERNEL_CS, 4(%esp)
-	jne	\ok
-\label:
-	movl	TSS_sysenter_sp0 + \offset(%esp), %esp
-	pushfl
-	pushl	$__KERNEL_CS
-	pushl	$sysenter_past_esp
-.endm
-
 ENTRY(debug)
+	/*
+	 * #DB can happen at the first instruction of
+	 * entry_SYSENTER_32 or in Xen's SYSENTER prologue.  If this
+	 * happens, then we will be running on a very small stack.  We
+	 * need to detect this condition and switch to the thread
+	 * stack before calling any C code at all.
+	 *
+	 * If you edit this code, keep in mind that NMIs can happen in here.
+	 */
 	ASM_CLAC
-	cmpl	$entry_SYSENTER_32, (%esp)
-	jne	debug_stack_correct
-	FIX_STACK 12, debug_stack_correct, debug_esp_fix_insn
-debug_stack_correct:
 	pushl	$-1				# mark this as an int
 	SAVE_ALL
-	TRACE_IRQS_OFF
 	xorl	%edx, %edx			# error code 0
 	movl	%esp, %eax			# pt_regs pointer
+
+	/* Are we currently on the SYSENTER stack? */
+	PER_CPU(cpu_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx)
+	subl	%eax, %ecx	/* ecx = (end of SYSENTER_stack) - esp */
+	cmpl	$SIZEOF_SYSENTER_stack, %ecx
+	jb	.Ldebug_from_sysenter_stack
+
+	TRACE_IRQS_OFF
+	call	do_debug
+	jmp	ret_from_exception
+
+.Ldebug_from_sysenter_stack:
+	/* We're on the SYSENTER stack.  Switch off. */
+	movl	%esp, %ebp
+	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
+	TRACE_IRQS_OFF
 	call	do_debug
+	movl	%ebp, %esp
 	jmp	ret_from_exception
 END(debug)
 
 /*
- * NMI is doubly nasty. It can happen _while_ we're handling
- * a debug fault, and the debug fault hasn't yet been able to
- * clear up the stack. So we first check whether we got  an
- * NMI on the sysenter entry path, but after that we need to
- * check whether we got an NMI on the debug path where the debug
- * fault happened on the sysenter path.
+ * NMI is doubly nasty.  It can happen on the first instruction of
+ * entry_SYSENTER_32 (just like #DB), but it can also interrupt the beginning
+ * of the #DB handler even if that #DB in turn hit before entry_SYSENTER_32
+ * switched stacks.  We handle both conditions by simply checking whether we
+ * interrupted kernel code running on the SYSENTER stack.
  */
 ENTRY(nmi)
 	ASM_CLAC
@@ -1031,41 +1028,32 @@ ENTRY(nmi)
 	popl	%eax
 	je	nmi_espfix_stack
 #endif
-	cmpl	$entry_SYSENTER_32, (%esp)
-	je	nmi_stack_fixup
-	pushl	%eax
-	movl	%esp, %eax
-	/*
-	 * Do not access memory above the end of our stack page,
-	 * it might not exist.
-	 */
-	andl	$(THREAD_SIZE-1), %eax
-	cmpl	$(THREAD_SIZE-20), %eax
-	popl	%eax
-	jae	nmi_stack_correct
-	cmpl	$entry_SYSENTER_32, 12(%esp)
-	je	nmi_debug_stack_check
-nmi_stack_correct:
-	pushl	%eax
+
+	pushl	%eax				# pt_regs->orig_ax
 	SAVE_ALL
 	xorl	%edx, %edx			# zero error code
 	movl	%esp, %eax			# pt_regs pointer
+
+	/* Are we currently on the SYSENTER stack? */
+	PER_CPU(cpu_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx)
+	subl	%eax, %ecx	/* ecx = (end of SYSENTER_stack) - esp */
+	cmpl	$SIZEOF_SYSENTER_stack, %ecx
+	jb	.Lnmi_from_sysenter_stack
+
+	/* Not on SYSENTER stack. */
 	call	do_nmi
 	jmp	restore_all_notrace
 
-nmi_stack_fixup:
-	FIX_STACK 12, nmi_stack_correct, 1
-	jmp	nmi_stack_correct
-
-nmi_debug_stack_check:
-	cmpw	$__KERNEL_CS, 16(%esp)
-	jne	nmi_stack_correct
-	cmpl	$debug, (%esp)
-	jb	nmi_stack_correct
-	cmpl	$debug_esp_fix_insn, (%esp)
-	ja	nmi_stack_correct
-	FIX_STACK 24, nmi_stack_correct, 1
-	jmp	nmi_stack_correct
+.Lnmi_from_sysenter_stack:
+	/*
+	 * We're on the SYSENTER stack.  Switch off.  No one (not even debug)
+	 * is using the thread stack right now, so it's safe for us to use it.
+	 */
+	movl	%esp, %ebp
+	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
+	call	do_nmi
+	movl	%ebp, %esp
+	jmp	restore_all_notrace
 
 #ifdef CONFIG_X86_ESPFIX32
 nmi_espfix_stack:
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index fdeb0ce07c16..ecdc1d217dc0 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -52,6 +52,11 @@ void foo(void)
 	DEFINE(TSS_sysenter_sp0, offsetof(struct tss_struct, x86_tss.sp0) -
 	       offsetofend(struct tss_struct, SYSENTER_stack));
 
+	/* Offset from cpu_tss to SYSENTER_stack */
+	OFFSET(CPU_TSS_SYSENTER_stack, tss_struct, SYSENTER_stack);
+	/* Size of SYSENTER_stack */
+	DEFINE(SIZEOF_SYSENTER_stack, sizeof(((struct tss_struct *)0)->SYSENTER_stack));
+
 #if defined(CONFIG_LGUEST) || defined(CONFIG_LGUEST_GUEST) || defined(CONFIG_LGUEST_MODULE)
 	BLANK();
 	OFFSET(LGUEST_DATA_irq_enabled, lguest_data, irq_enabled);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 10/12] x86/entry/32: Add and check a stack canary for the SYSENTER stack
  2016-03-10  3:00 [PATCH v2 00/12] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
                   ` (8 preceding siblings ...)
  2016-03-10  3:00 ` [PATCH v2 09/12] x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup Andy Lutomirski
@ 2016-03-10  3:00 ` Andy Lutomirski
  2016-03-10 11:04   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2016-03-10  3:00 ` [PATCH v2 11/12] x86/entry: Remove TIF_SINGLESTEP entry work Andy Lutomirski
  2016-03-10  3:00 ` [PATCH v2 12/12] x86/entry: Improve system call entry comments Andy Lutomirski
  11 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2016-03-10  3:00 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

The first instruction of the SYSENTER entry runs on its own tiny
stack.  That stack can be used if a #DB or NMI is delivered before
the SYSENTER prologue switches to a real stack.

We have code in place to prevent us from overflowing the tiny stack.
For added paranoia, add a canary to the stack and check it in
do_debug -- that way, if something goes wrong with the #DB logic,
we'll eventually notice.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/processor.h | 3 ++-
 arch/x86/kernel/process.c        | 3 +++
 arch/x86/kernel/traps.c          | 8 ++++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 7cd01b71b5bd..50a6dc871cc0 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -299,8 +299,9 @@ struct tss_struct {
 
 #ifdef CONFIG_X86_32
 	/*
-	 * Space for the temporary SYSENTER stack:
+	 * Space for the temporary SYSENTER stack.
 	 */
+	unsigned long		SYSENTER_stack_canary;
 	unsigned long		SYSENTER_stack[64];
 #endif
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9f7c21c22477..ee9a9792caeb 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -57,6 +57,9 @@ __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
 	  */
 	.io_bitmap		= { [0 ... IO_BITMAP_LONGS] = ~0 },
 #endif
+#ifdef CONFIG_X86_32
+	.SYSENTER_stack_canary	= STACK_END_MAGIC,
+#endif
 };
 EXPORT_PER_CPU_SYMBOL(cpu_tss);
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 80928ea78373..590110119e6a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -713,6 +713,14 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	debug_stack_usage_dec();
 
 exit:
+#if defined(CONFIG_X86_32)
+	/*
+	 * This is the most likely code path that involves non-trivial use
+	 * of the SYSENTER stack.  Check that we haven't overrun it.
+	 */
+	WARN(this_cpu_read(cpu_tss.SYSENTER_stack_canary) != STACK_END_MAGIC,
+	     "Overran or corrupted SYSENTER stack\n");
+#endif
 	ist_exit(regs);
 }
 NOKPROBE_SYMBOL(do_debug);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 11/12] x86/entry: Remove TIF_SINGLESTEP entry work
  2016-03-10  3:00 [PATCH v2 00/12] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
                   ` (9 preceding siblings ...)
  2016-03-10  3:00 ` [PATCH v2 10/12] x86/entry/32: Add and check a stack canary for the SYSENTER stack Andy Lutomirski
@ 2016-03-10  3:00 ` Andy Lutomirski
  2016-03-10 11:04   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2016-03-10  3:00 ` [PATCH v2 12/12] x86/entry: Improve system call entry comments Andy Lutomirski
  11 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2016-03-10  3:00 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

Now that SYSENTER with TF set puts X86_EFLAGS_TF directly into
regs->flags, we don't need a TIF_SINGLESTEP fixup in the syscall
entry code.  Remove it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/common.c            | 10 ----------
 arch/x86/include/asm/thread_info.h |  2 +-
 2 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index adea12fb7215..e79d93d44ecd 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -163,16 +163,6 @@ long syscall_trace_enter_phase2(struct pt_regs *regs, u32 arch,
 	if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
 		BUG_ON(regs != task_pt_regs(current));
 
-	/*
-	 * If we stepped into a sysenter/syscall insn, it trapped in
-	 * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP.
-	 * If user-mode had set TF itself, then it's still clear from
-	 * do_debug() and we need to set it again to restore the user
-	 * state.  If we entered on the slow path, TF was already set.
-	 */
-	if (work & _TIF_SINGLESTEP)
-		regs->flags |= X86_EFLAGS_TF;
-
 #ifdef CONFIG_SECCOMP
 	/*
 	 * Call seccomp_phase2 before running the other hooks so that
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index fbdb24e61835..82866697fcf1 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -140,7 +140,7 @@ struct thread_info {
  */
 #define _TIF_WORK_SYSCALL_ENTRY	\
 	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_AUDIT |	\
-	 _TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT |	\
+	 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT |	\
 	 _TIF_NOHZ)
 
 /* work to do on any return to user space */
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 12/12] x86/entry: Improve system call entry comments
  2016-03-10  3:00 [PATCH v2 00/12] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
                   ` (10 preceding siblings ...)
  2016-03-10  3:00 ` [PATCH v2 11/12] x86/entry: Remove TIF_SINGLESTEP entry work Andy Lutomirski
@ 2016-03-10  3:00 ` Andy Lutomirski
  2016-03-10 11:05   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  11 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2016-03-10  3:00 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Oleg Nesterov, Andrew Cooper,
	Brian Gerst, Andy Lutomirski

Ingo suggested that the comments should explain when the various
entries are used.  This adds these explanations and improves other
parts of the comments.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_32.S        | 61 +++++++++++++++++++++++++++-
 arch/x86/entry/entry_64.S        | 10 +++++
 arch/x86/entry/entry_64_compat.S | 85 +++++++++++++++++++++++++++-------------
 3 files changed, 128 insertions(+), 28 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 13a7f41c1886..587f3eb8c524 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -307,6 +307,38 @@ ENTRY(xen_sysenter_target)
 	jmp	sysenter_past_esp
 #endif
 
+/*
+ * 32-bit SYSENTER entry.
+ *
+ * 32-bit system calls through the vDSO's __kernel_vsyscall enter here
+ * if X86_FEATURE_SEP is available.  This is the preferred system call
+ * entry on 32-bit systems.
+ *
+ * The SYSENTER instruction, in principle, should *only* occur in the
+ * vDSO.  In practice, a small number of Android devices were shipped
+ * with a copy of Bionic that inlined a SYSENTER instruction.  This
+ * never happened in any of Google's Bionic versions -- it only happened
+ * in a narrow range of Intel-provided versions.
+ *
+ * SYSENTER loads SS, ESP, CS, and EIP from previously programmed MSRs.
+ * IF and VM in RFLAGS are cleared (IOW: interrupts are off).
+ * SYSENTER does not save anything on the stack,
+ * and does not save old EIP (!!!), ESP, or EFLAGS.
+ *
+ * To avoid losing track of EFLAGS.VM (and thus potentially corrupting
+ * user and/or vm86 state), we explicitly disable the SYSENTER
+ * instruction in vm86 mode by reprogramming the MSRs.
+ *
+ * Arguments:
+ * eax  system call number
+ * ebx  arg1
+ * ecx  arg2
+ * edx  arg3
+ * esi  arg4
+ * edi  arg5
+ * ebp  user stack
+ * 0(%ebp) arg6
+ */
 ENTRY(entry_SYSENTER_32)
 	movl	TSS_sysenter_sp0(%esp), %esp
 sysenter_past_esp:
@@ -397,7 +429,34 @@ sysenter_past_esp:
 GLOBAL(__end_SYSENTER_singlestep_region)
 ENDPROC(entry_SYSENTER_32)
 
-	# system call handler stub
+/*
+ * 32-bit legacy system call entry.
+ *
+ * 32-bit x86 Linux system calls traditionally used the INT $0x80
+ * instruction.  INT $0x80 lands here.
+ *
+ * This entry point can be used by any 32-bit perform system calls.
+ * Instances of INT $0x80 can be found inline in various programs and
+ * libraries.  It is also used by the vDSO's __kernel_vsyscall
+ * fallback for hardware that doesn't support a faster entry method.
+ * Restarted 32-bit system calls also fall back to INT $0x80
+ * regardless of what instruction was originally used to do the system
+ * call.  (64-bit programs can use INT $0x80 as well, but they can
+ * only run on 64-bit kernels and therefore land in
+ * entry_INT80_compat.)
+ *
+ * This is considered a slow path.  It is not used by most libc
+ * implementations on modern hardware except during process startup.
+ *
+ * Arguments:
+ * eax  system call number
+ * ebx  arg1
+ * ecx  arg2
+ * edx  arg3
+ * esi  arg4
+ * edi  arg5
+ * ebp  arg6
+ */
 ENTRY(entry_INT80_32)
 	ASM_CLAC
 	pushl	%eax			/* pt_regs->orig_ax */
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 70eadb0ea5fa..858b555e274b 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -103,6 +103,16 @@ ENDPROC(native_usergs_sysret64)
 /*
  * 64-bit SYSCALL instruction entry. Up to 6 arguments in registers.
  *
+ * This is the only entry point used for 64-bit system calls.  The
+ * hardware interface is reasonably well designed and the register to
+ * argument mapping Linux uses fits well with the registers that are
+ * available when SYSCALL is used.
+ *
+ * SYSCALL instructions can be found inlined in libc implementations as
+ * well as some other programs and libraries.  There are also a handful
+ * of SYSCALL instructions in the vDSO used, for example, as a
+ * clock_gettimeofday fallback.
+ *
  * 64-bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11,
  * then loads new ss, cs, and rip from previously programmed MSRs.
  * rflags gets masked by a value from another MSR (so CLD and CLAC
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 26469c11796d..847f2f0c31e5 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -19,12 +19,21 @@
 	.section .entry.text, "ax"
 
 /*
- * 32-bit SYSENTER instruction entry.
+ * 32-bit SYSENTER entry.
  *
- * SYSENTER loads ss, rsp, cs, and rip from previously programmed MSRs.
- * IF and VM in rflags are cleared (IOW: interrupts are off).
+ * 32-bit system calls through the vDSO's __kernel_vsyscall enter here
+ * on 64-bit kernels running on Intel CPUs.
+ *
+ * The SYSENTER instruction, in principle, should *only* occur in the
+ * vDSO.  In practice, a small number of Android devices were shipped
+ * with a copy of Bionic that inlined a SYSENTER instruction.  This
+ * never happened in any of Google's Bionic versions -- it only happened
+ * in a narrow range of Intel-provided versions.
+ *
+ * SYSENTER loads SS, RSP, CS, and RIP from previously programmed MSRs.
+ * IF and VM in RFLAGS are cleared (IOW: interrupts are off).
  * SYSENTER does not save anything on the stack,
- * and does not save old rip (!!!) and rflags.
+ * and does not save old RIP (!!!), RSP, or RFLAGS.
  *
  * Arguments:
  * eax  system call number
@@ -35,10 +44,6 @@
  * edi  arg5
  * ebp  user stack
  * 0(%ebp) arg6
- *
- * This is purely a fast path. For anything complicated we use the int 0x80
- * path below. We set up a complete hardware stack frame to share code
- * with the int 0x80 path.
  */
 ENTRY(entry_SYSENTER_compat)
 	/* Interrupts are off on entry. */
@@ -131,17 +136,38 @@ GLOBAL(__end_entry_SYSENTER_compat)
 ENDPROC(entry_SYSENTER_compat)
 
 /*
- * 32-bit SYSCALL instruction entry.
+ * 32-bit SYSCALL entry.
+ *
+ * 32-bit system calls through the vDSO's __kernel_vsyscall enter here
+ * on 64-bit kernels running on AMD CPUs.
+ *
+ * The SYSCALL instruction, in principle, should *only* occur in the
+ * vDSO.  In practice, it appears that this really is the case.
+ * As evidence:
+ *
+ *  - The calling convention for SYSCALL has changed several times without
+ *    anyone noticing.
+ *
+ *  - Prior to the in-kernel X86_BUG_SYSRET_SS_ATTRS fixup, anything
+ *    user task that did SYSCALL without immediately reloading SS
+ *    would randomly crash.
  *
- * 32-bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11,
- * then loads new ss, cs, and rip from previously programmed MSRs.
- * rflags gets masked by a value from another MSR (so CLD and CLAC
- * are not needed). SYSCALL does not save anything on the stack
- * and does not change rsp.
+ *  - Most programmers do not directly target AMD CPUs, and the 32-bit
+ *    SYSCALL instruction does not exist on Intel CPUs.  Even on AMD
+ *    CPUs, Linux disables the SYSCALL instruction on 32-bit kernels
+ *    because the SYSCALL instruction in legacy/native 32-bit mode (as
+ *    opposed to compat mode) is sufficiently poorly designed as to be
+ *    essentially unusable.
  *
- * Note: rflags saving+masking-with-MSR happens only in Long mode
+ * 32-bit SYSCALL saves RIP to RCX, clears RFLAGS.RF, then saves
+ * RFLAGS to R11, then loads new SS, CS, and RIP from previously
+ * programmed MSRs.  RFLAGS gets masked by a value from another MSR
+ * (so CLD and CLAC are not needed).  SYSCALL does not save anything on
+ * the stack and does not change RSP.
+ *
+ * Note: RFLAGS saving+masking-with-MSR happens only in Long mode
  * (in legacy 32-bit mode, IF, RF and VM bits are cleared and that's it).
- * Don't get confused: rflags saving+masking depends on Long Mode Active bit
+ * Don't get confused: RFLAGS saving+masking depends on Long Mode Active bit
  * (EFER.LMA=1), NOT on bitness of userspace where SYSCALL executes
  * or target CS descriptor's L bit (SYSCALL does not read segment descriptors).
  *
@@ -241,7 +267,21 @@ sysret32_from_system_call:
 END(entry_SYSCALL_compat)
 
 /*
- * Emulated IA32 system calls via int 0x80.
+ * 32-bit legacy system call entry.
+ *
+ * 32-bit x86 Linux system calls traditionally used the INT $0x80
+ * instruction.  INT $0x80 lands here.
+ *
+ * This entry point can be used by 32-bit and 64-bit programs to perform
+ * 32-bit system calls.  Instances of INT $0x80 can be found inline in
+ * various programs and libraries.  It is also used by the vDSO's
+ * __kernel_vsyscall fallback for hardware that doesn't support a faster
+ * entry method.  Restarted 32-bit system calls also fall back to INT
+ * $0x80 regardless of what instruction was originally used to do the
+ * system call.
+ *
+ * This is considered a slow path.  It is not used by most libc
+ * implementations on modern hardware except during process startup.
  *
  * Arguments:
  * eax  system call number
@@ -250,17 +290,8 @@ END(entry_SYSCALL_compat)
  * edx  arg3
  * esi  arg4
  * edi  arg5
- * ebp  arg6	(note: not saved in the stack frame, should not be touched)
- *
- * Notes:
- * Uses the same stack frame as the x86-64 version.
- * All registers except eax must be saved (but ptrace may violate that).
- * Arguments are zero extended. For system calls that want sign extension and
- * take long arguments a wrapper is needed. Most calls can just be called
- * directly.
- * Assumes it is only called from user space and entered with interrupts off.
+ * ebp  arg6
  */
-
 ENTRY(entry_INT80_compat)
 	/*
 	 * Interrupts are off on entry.
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:x86/asm] selftests/x86: In syscall_nt, test NT|TF as well
  2016-03-10  3:00 ` [PATCH v2 01/12] selftests/x86: In syscall_nt, test NT|TF as well Andy Lutomirski
@ 2016-03-10 11:00   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-03-10 11:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: brgerst, andrew.cooper3, oleg, bp, luto, hpa, torvalds, luto,
	peterz, mingo, dvlasenk, tglx, linux-kernel

Commit-ID:  a318beea224d255c38652f88a7ae0d30ac4c23c3
Gitweb:     http://git.kernel.org/tip/a318beea224d255c38652f88a7ae0d30ac4c23c3
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 9 Mar 2016 19:00:24 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Mar 2016 09:48:12 +0100

selftests/x86: In syscall_nt, test NT|TF as well

Setting TF prevents fastpath returns in most cases, which causes the
test to fail on 32-bit kernels because 32-bit kernels do not, in
fact, handle NT correctly on SYSENTER entries.

The next patch will fix 32-bit kernels.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/bd4bb48af6b10c0dc84aec6dbcf487ed25683495.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 tools/testing/selftests/x86/syscall_nt.c | 57 +++++++++++++++++++++++++++-----
 1 file changed, 49 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/x86/syscall_nt.c b/tools/testing/selftests/x86/syscall_nt.c
index 60c06af4..43fcab36 100644
--- a/tools/testing/selftests/x86/syscall_nt.c
+++ b/tools/testing/selftests/x86/syscall_nt.c
@@ -17,6 +17,9 @@
 
 #include <stdio.h>
 #include <unistd.h>
+#include <string.h>
+#include <signal.h>
+#include <err.h>
 #include <sys/syscall.h>
 #include <asm/processor-flags.h>
 
@@ -26,6 +29,8 @@
 # define WIDTH "l"
 #endif
 
+static unsigned int nerrs;
+
 static unsigned long get_eflags(void)
 {
 	unsigned long eflags;
@@ -39,16 +44,52 @@ static void set_eflags(unsigned long eflags)
 		      : : "rm" (eflags) : "flags");
 }
 
-int main()
+static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
+		       int flags)
 {
-	printf("[RUN]\tSet NT and issue a syscall\n");
-	set_eflags(get_eflags() | X86_EFLAGS_NT);
+	struct sigaction sa;
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_sigaction = handler;
+	sa.sa_flags = SA_SIGINFO | flags;
+	sigemptyset(&sa.sa_mask);
+	if (sigaction(sig, &sa, 0))
+		err(1, "sigaction");
+}
+
+static void sigtrap(int sig, siginfo_t *si, void *ctx_void)
+{
+}
+
+static void do_it(unsigned long extraflags)
+{
+	unsigned long flags;
+
+	set_eflags(get_eflags() | extraflags);
 	syscall(SYS_getpid);
-	if (get_eflags() & X86_EFLAGS_NT) {
-		printf("[OK]\tThe syscall worked and NT is still set\n");
-		return 0;
+	flags = get_eflags();
+	if ((flags & extraflags) == extraflags) {
+		printf("[OK]\tThe syscall worked and flags are still set\n");
 	} else {
-		printf("[FAIL]\tThe syscall worked but NT was cleared\n");
-		return 1;
+		printf("[FAIL]\tThe syscall worked but flags were cleared (flags = 0x%lx but expected 0x%lx set)\n",
+		       flags, extraflags);
+		nerrs++;
 	}
 }
+
+int main(void)
+{
+	printf("[RUN]\tSet NT and issue a syscall\n");
+	do_it(X86_EFLAGS_NT);
+
+	/*
+	 * Now try it again with TF set -- TF forces returns via IRET in all
+	 * cases except non-ptregs-using 64-bit full fast path syscalls.
+	 */
+
+	sethandler(SIGTRAP, sigtrap, 0);
+
+	printf("[RUN]\tSet NT|TF and issue a syscall\n");
+	do_it(X86_EFLAGS_NT | X86_EFLAGS_TF);
+
+	return nerrs == 0 ? 0 : 1;
+}

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:x86/asm] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
  2016-03-10  3:00 ` [PATCH v2 02/12] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test Andy Lutomirski
@ 2016-03-10 11:01   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-03-10 11:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, brgerst, oleg, peterz, tglx, bp, luto, andrew.cooper3,
	torvalds, hpa, mingo, dvlasenk, linux-kernel

Commit-ID:  e786041153df6343169373177248abfab5c5ac1b
Gitweb:     http://git.kernel.org/tip/e786041153df6343169373177248abfab5c5ac1b
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 9 Mar 2016 19:00:25 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Mar 2016 09:48:12 +0100

x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test

CLAC is slow, and the SYSENTER code already has an unlikely path
that runs if unusual flags are set.  Drop the CLAC and instead rely
on the unlikely path to clear AC.

This seems to save ~24 cycles on my Skylake laptop.  (Hey, Intel,
make this faster please!)

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/90d6db2189f9add83bc7bddd75a0c19ebbd676b2.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_64_compat.S | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 3c990ee..da5c194 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -66,8 +66,6 @@ ENTRY(entry_SYSENTER_compat)
 	 */
 	pushfq				/* pt_regs->flags (except IF = 0) */
 	orl	$X86_EFLAGS_IF, (%rsp)	/* Fix saved flags */
-	ASM_CLAC			/* Clear AC after saving FLAGS */
-
 	pushq	$__USER32_CS		/* pt_regs->cs */
 	xorq    %r8,%r8
 	pushq	%r8			/* pt_regs->ip = 0 (placeholder) */
@@ -90,9 +88,9 @@ ENTRY(entry_SYSENTER_compat)
 	cld
 
 	/*
-	 * Sysenter doesn't filter flags, so we need to clear NT
+	 * SYSENTER doesn't filter flags, so we need to clear NT and AC
 	 * ourselves.  To save a few cycles, we can check whether
-	 * NT was set instead of doing an unconditional popfq.
+	 * either was set instead of doing an unconditional popfq.
 	 * This needs to happen before enabling interrupts so that
 	 * we don't get preempted with NT set.
 	 *
@@ -102,7 +100,7 @@ ENTRY(entry_SYSENTER_compat)
 	 * we're keeping that code behind a branch which will predict as
 	 * not-taken and therefore its instructions won't be fetched.
 	 */
-	testl	$X86_EFLAGS_NT, EFLAGS(%rsp)
+	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC, EFLAGS(%rsp)
 	jnz	.Lsysenter_fix_flags
 .Lsysenter_flags_fixed:
 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:x86/asm] x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
  2016-03-10  3:00 ` [PATCH v2 03/12] x86/entry/32: Filter NT and speed up AC filtering in SYSENTER Andy Lutomirski
@ 2016-03-10 11:01   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-03-10 11:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: andrew.cooper3, tglx, brgerst, peterz, hpa, torvalds, bp, luto,
	linux-kernel, dvlasenk, mingo, luto, oleg

Commit-ID:  67f590e8d4d718d9bd377b39223f7f69678d6a10
Gitweb:     http://git.kernel.org/tip/67f590e8d4d718d9bd377b39223f7f69678d6a10
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 9 Mar 2016 19:00:26 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Mar 2016 09:48:12 +0100

x86/entry/32: Filter NT and speed up AC filtering in SYSENTER

This makes the 32-bit code work just like the 64-bit code.  It should
speed up syscalls on 32-bit kernels on Skylake by something like 20
cycles (by analogy to the 64-bit compat case).

It also cleans up NT just like we do for the 64-bit case.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/07daef3d44bd1ed62a2c866e143e8df64edb40ee.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_32.S | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index e130272..8daa812 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -294,7 +294,6 @@ sysenter_past_esp:
 	pushl	$__USER_DS		/* pt_regs->ss */
 	pushl	%ebp			/* pt_regs->sp (stashed in bp) */
 	pushfl				/* pt_regs->flags (except IF = 0) */
-	ASM_CLAC			/* Clear AC after saving FLAGS */
 	orl	$X86_EFLAGS_IF, (%esp)	/* Fix IF */
 	pushl	$__USER_CS		/* pt_regs->cs */
 	pushl	$0			/* pt_regs->ip = 0 (placeholder) */
@@ -302,6 +301,23 @@ sysenter_past_esp:
 	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest */
 
 	/*
+	 * SYSENTER doesn't filter flags, so we need to clear NT and AC
+	 * ourselves.  To save a few cycles, we can check whether
+	 * either was set instead of doing an unconditional popfq.
+	 * This needs to happen before enabling interrupts so that
+	 * we don't get preempted with NT set.
+	 *
+	 * NB.: .Lsysenter_fix_flags is a label with the code under it moved
+	 * out-of-line as an optimization: NT is unlikely to be set in the
+	 * majority of the cases and instead of polluting the I$ unnecessarily,
+	 * we're keeping that code behind a branch which will predict as
+	 * not-taken and therefore its instructions won't be fetched.
+	 */
+	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC, PT_EFLAGS(%esp)
+	jnz	.Lsysenter_fix_flags
+.Lsysenter_flags_fixed:
+
+	/*
 	 * User mode is traced as though IRQs are on, and SYSENTER
 	 * turned them off.
 	 */
@@ -339,6 +355,11 @@ sysenter_past_esp:
 .popsection
 	_ASM_EXTABLE(1b, 2b)
 	PTGS_TO_GS_EX
+
+.Lsysenter_fix_flags:
+	pushl	$X86_EFLAGS_FIXED
+	popfl
+	jmp	.Lsysenter_flags_fixed
 ENDPROC(entry_SYSENTER_32)
 
 	# system call handler stub

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:x86/asm] x86/entry/32: Restore FLAGS on SYSEXIT
  2016-03-10  3:00 ` [PATCH v2 04/12] x86/entry/32: Restore FLAGS on SYSEXIT Andy Lutomirski
@ 2016-03-10 11:01   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-03-10 11:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: andrew.cooper3, luto, brgerst, torvalds, linux-kernel, oleg,
	luto, hpa, bp, mingo, tglx, peterz, dvlasenk

Commit-ID:  c2c9b52fab0d0cf993476ed4c34f24da5a1205ae
Gitweb:     http://git.kernel.org/tip/c2c9b52fab0d0cf993476ed4c34f24da5a1205ae
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 9 Mar 2016 19:00:27 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Mar 2016 09:48:12 +0100

x86/entry/32: Restore FLAGS on SYSEXIT

We weren't restoring FLAGS at all on SYSEXIT.  Apparently no one cared.

With this patch applied, native kernels should always honor
task_pt_regs()->flags, which opens the door for some sys_iopl()
cleanups.  I'll do those as a separate series, though, since getting
it right will involve tweaking some paravirt ops.

( The short version is that, before this patch, sys_iopl(), invoked via
  SYSENTER, wasn't guaranteed to ever transfer the updated
  regs->flags, so sys_iopl() had to change the hardware flags register
  as well. )

Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3f98b207472dc9784838eb5ca2b89dcc845ce269.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_32.S | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 8daa812..7610906 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -343,6 +343,15 @@ sysenter_past_esp:
 	popl	%eax			/* pt_regs->ax */
 
 	/*
+	 * Restore all flags except IF. (We restore IF separately because
+	 * STI gives a one-instruction window in which we won't be interrupted,
+	 * whereas POPF does not.)
+	 */
+	addl	$PT_EFLAGS-PT_DS, %esp	/* point esp at pt_regs->flags */
+	btr	$X86_EFLAGS_IF_BIT, (%esp)
+	popfl
+
+	/*
 	 * Return back to the vDSO, which will pop ecx and edx.
 	 * Don't bother with DS and ES (they already contain __USER_DS).
 	 */

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:x86/asm] x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions
  2016-03-10  3:00 ` [PATCH v2 05/12] x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions Andy Lutomirski
@ 2016-03-10 11:02   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-03-10 11:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dvlasenk, brgerst, oleg, luto, andrew.cooper3, hpa, torvalds,
	peterz, mingo, luto, linux-kernel, bp, tglx

Commit-ID:  81edd9f69a6fd214fdbe66b43de6aa1610c84c63
Gitweb:     http://git.kernel.org/tip/81edd9f69a6fd214fdbe66b43de6aa1610c84c63
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 9 Mar 2016 19:00:28 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Mar 2016 09:48:13 +0100

x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions

The SDM says that debug exceptions clear BTF, and we need to keep
TIF_BLOCKSTEP in sync with BTF.  Clear it unconditionally and improve
the comment.

I suspect that the fact that kmemcheck could cause TIF_BLOCKSTEP not
to be cleared was just an oversight.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/fa86e55d196e6dde5b38839595bde2a292c52fdc.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/traps.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 410e8e2..1af5621 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -598,6 +598,13 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	dr6 &= ~DR6_RESERVED;
 
 	/*
+	 * The SDM says "The processor clears the BTF flag when it
+	 * generates a debug exception."  Clear TIF_BLOCKSTEP to keep
+	 * TIF_BLOCKSTEP in sync with the hardware BTF flag.
+	 */
+	clear_tsk_thread_flag(tsk, TIF_BLOCKSTEP);
+
+	/*
 	 * If dr6 has no reason to give us about the origin of this trap,
 	 * then it's very likely the result of an icebp/int01 trap.
 	 * User wants a sigtrap for that.
@@ -612,11 +619,6 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	/* DR6 may or may not be cleared by the CPU */
 	set_debugreg(0, 6);
 
-	/*
-	 * The processor cleared BTF, so don't mark that we need it set.
-	 */
-	clear_tsk_thread_flag(tsk, TIF_BLOCKSTEP);
-
 	/* Store the virtualized DR6 value */
 	tsk->thread.debugreg6 = dr6;
 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:x86/asm] x86/entry/traps: Clear DR6 early in do_debug() and improve the comment
  2016-03-10  3:00 ` [PATCH v2 06/12] x86/traps: Clear DR6 early in do_debug and improve the comment Andy Lutomirski
@ 2016-03-10 11:02   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-03-10 11:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, brgerst, torvalds, dvlasenk, mingo, bp, andrew.cooper3,
	peterz, luto, luto, tglx, oleg, linux-kernel

Commit-ID:  8bb5643686d2898bec181c407651cf84e77d413f
Gitweb:     http://git.kernel.org/tip/8bb5643686d2898bec181c407651cf84e77d413f
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 9 Mar 2016 19:00:29 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Mar 2016 09:48:13 +0100

x86/entry/traps: Clear DR6 early in do_debug() and improve the comment

Leaving any bits set in DR6 on return from a debug exception is
asking for trouble.  Prevent it by writing zero right away and
clarify the comment.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3857676e1be8fb27db4b89bbb1e2052b7f435ff4.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/traps.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 1af5621..7394be8 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -593,6 +593,18 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	ist_enter(regs);
 
 	get_debugreg(dr6, 6);
+	/*
+	 * The Intel SDM says:
+	 *
+	 *   Certain debug exceptions may clear bits 0-3. The remaining
+	 *   contents of the DR6 register are never cleared by the
+	 *   processor. To avoid confusion in identifying debug
+	 *   exceptions, debug handlers should clear the register before
+	 *   returning to the interrupted task.
+	 *
+	 * Keep it simple: clear DR6 immediately.
+	 */
+	set_debugreg(0, 6);
 
 	/* Filter out all the reserved bits which are preset to 1 */
 	dr6 &= ~DR6_RESERVED;
@@ -616,9 +628,6 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	if ((dr6 & DR_STEP) && kmemcheck_trap(regs))
 		goto exit;
 
-	/* DR6 may or may not be cleared by the CPU */
-	set_debugreg(0, 6);
-
 	/* Store the virtualized DR6 value */
 	tsk->thread.debugreg6 = dr6;
 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:x86/asm] x86/entry: Vastly simplify SYSENTER TF (single-step) handling
  2016-03-10  3:00 ` [PATCH v2 07/12] x86/entry: Vastly simplify SYSENTER TF handling Andy Lutomirski
@ 2016-03-10 11:03   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-03-10 11:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dvlasenk, brgerst, bp, mingo, linux-kernel, luto, andrew.cooper3,
	luto, hpa, torvalds, oleg, tglx, peterz

Commit-ID:  f2b375756c839ff655a3e0b45e339f8fbf973093
Gitweb:     http://git.kernel.org/tip/f2b375756c839ff655a3e0b45e339f8fbf973093
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 9 Mar 2016 19:00:30 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Mar 2016 09:48:13 +0100

x86/entry: Vastly simplify SYSENTER TF (single-step) handling

Due to a blatant design error, SYSENTER doesn't clear TF (single-step).

As a result, if a user does SYSENTER with TF set, we will single-step
through the kernel until something clears TF.  There is absolutely
nothing we can do to prevent this short of turning off SYSENTER [1].

Simplify the handling considerably with two changes:

  1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
     add TF to that list of flags to sanitize with no overhead whatsoever.

  2. Teach do_debug() to ignore single-step traps in the SYSENTER prologue.

That's all we need to do.

Don't get too excited -- our handling is still buggy on 32-bit
kernels.  There's nothing wrong with the SYSENTER code itself, but
the #DB prologue has a clever fixup for traps on the very first
instruction of entry_SYSENTER_32, and the fixup doesn't work quite
correctly.  The next two patches will fix that.

[1] We could probably prevent it by forcing BTF on at all times and
    making sure we clear TF before any branches in the SYSENTER
    code.  Needless to say, this is a bad idea.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a30d2ea06fe4b621fe6a9ef911b02c0f38feb6f2.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_32.S        | 42 ++++++++++++++++++++++----------
 arch/x86/entry/entry_64_compat.S |  9 ++++++-
 arch/x86/include/asm/proto.h     | 15 ++++++++++--
 arch/x86/kernel/traps.c          | 52 +++++++++++++++++++++++++++++++++-------
 4 files changed, 94 insertions(+), 24 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 7610906..b2e1d44 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -287,7 +287,26 @@ need_resched:
 END(resume_kernel)
 #endif
 
-	# SYSENTER  call handler stub
+GLOBAL(__begin_SYSENTER_singlestep_region)
+/*
+ * All code from here through __end_SYSENTER_singlestep_region is subject
+ * to being single-stepped if a user program sets TF and executes SYSENTER.
+ * There is absolutely nothing that we can do to prevent this from happening
+ * (thanks Intel!).  To keep our handling of this situation as simple as
+ * possible, we handle TF just like AC and NT, except that our #DB handler
+ * will ignore all of the single-step traps generated in this range.
+ */
+
+#ifdef CONFIG_XEN
+/*
+ * Xen doesn't set %esp to be precisely what the normal SYSENTER
+ * entry point expects, so fix it up before using the normal path.
+ */
+ENTRY(xen_sysenter_target)
+	addl	$5*4, %esp			/* remove xen-provided frame */
+	jmp	sysenter_past_esp
+#endif
+
 ENTRY(entry_SYSENTER_32)
 	movl	TSS_sysenter_sp0(%esp), %esp
 sysenter_past_esp:
@@ -301,19 +320,25 @@ sysenter_past_esp:
 	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest */
 
 	/*
-	 * SYSENTER doesn't filter flags, so we need to clear NT and AC
-	 * ourselves.  To save a few cycles, we can check whether
+	 * SYSENTER doesn't filter flags, so we need to clear NT, AC
+	 * and TF ourselves.  To save a few cycles, we can check whether
 	 * either was set instead of doing an unconditional popfq.
 	 * This needs to happen before enabling interrupts so that
 	 * we don't get preempted with NT set.
 	 *
+	 * If TF is set, we will single-step all the way to here -- do_debug
+	 * will ignore all the traps.  (Yes, this is slow, but so is
+	 * single-stepping in general.  This allows us to avoid having
+	 * a more complicated code to handle the case where a user program
+	 * forces us to single-step through the SYSENTER entry code.)
+	 *
 	 * NB.: .Lsysenter_fix_flags is a label with the code under it moved
 	 * out-of-line as an optimization: NT is unlikely to be set in the
 	 * majority of the cases and instead of polluting the I$ unnecessarily,
 	 * we're keeping that code behind a branch which will predict as
 	 * not-taken and therefore its instructions won't be fetched.
 	 */
-	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC, PT_EFLAGS(%esp)
+	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC|X86_EFLAGS_TF, PT_EFLAGS(%esp)
 	jnz	.Lsysenter_fix_flags
 .Lsysenter_flags_fixed:
 
@@ -369,6 +394,7 @@ sysenter_past_esp:
 	pushl	$X86_EFLAGS_FIXED
 	popfl
 	jmp	.Lsysenter_flags_fixed
+GLOBAL(__end_SYSENTER_singlestep_region)
 ENDPROC(entry_SYSENTER_32)
 
 	# system call handler stub
@@ -651,14 +677,6 @@ ENTRY(spurious_interrupt_bug)
 END(spurious_interrupt_bug)
 
 #ifdef CONFIG_XEN
-/*
- * Xen doesn't set %esp to be precisely what the normal SYSENTER
- * entry point expects, so fix it up before using the normal path.
- */
-ENTRY(xen_sysenter_target)
-	addl	$5*4, %esp			/* remove xen-provided frame */
-	jmp	sysenter_past_esp
-
 ENTRY(xen_hypervisor_callback)
 	pushl	$-1				/* orig_ax = -1 => not a system call */
 	SAVE_ALL
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index da5c194..cbb5e69 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -94,13 +94,19 @@ ENTRY(entry_SYSENTER_compat)
 	 * This needs to happen before enabling interrupts so that
 	 * we don't get preempted with NT set.
 	 *
+	 * If TF is set, we will single-step all the way to here -- do_debug
+	 * will ignore all the traps.  (Yes, this is slow, but so is
+	 * single-stepping in general.  This allows us to avoid having
+	 * a more complicated code to handle the case where a user program
+	 * forces us to single-step through the SYSENTER entry code.)
+	 *
 	 * NB.: .Lsysenter_fix_flags is a label with the code under it moved
 	 * out-of-line as an optimization: NT is unlikely to be set in the
 	 * majority of the cases and instead of polluting the I$ unnecessarily,
 	 * we're keeping that code behind a branch which will predict as
 	 * not-taken and therefore its instructions won't be fetched.
 	 */
-	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC, EFLAGS(%rsp)
+	testl	$X86_EFLAGS_NT|X86_EFLAGS_AC|X86_EFLAGS_TF, EFLAGS(%rsp)
 	jnz	.Lsysenter_fix_flags
 .Lsysenter_flags_fixed:
 
@@ -121,6 +127,7 @@ ENTRY(entry_SYSENTER_compat)
 	pushq	$X86_EFLAGS_FIXED
 	popfq
 	jmp	.Lsysenter_flags_fixed
+GLOBAL(__end_entry_SYSENTER_compat)
 ENDPROC(entry_SYSENTER_compat)
 
 /*
diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index a4a7728..9b9b30b 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -7,12 +7,23 @@
 
 void syscall_init(void);
 
+#ifdef CONFIG_X86_64
 void entry_SYSCALL_64(void);
-void entry_SYSCALL_compat(void);
+#endif
+
+#ifdef CONFIG_X86_32
 void entry_INT80_32(void);
-void entry_INT80_compat(void);
 void entry_SYSENTER_32(void);
+void __begin_SYSENTER_singlestep_region(void);
+void __end_SYSENTER_singlestep_region(void);
+#endif
+
+#ifdef CONFIG_IA32_EMULATION
 void entry_SYSENTER_compat(void);
+void __end_entry_SYSENTER_compat(void);
+void entry_SYSCALL_compat(void);
+void entry_INT80_compat(void);
+#endif
 
 void x86_configure_nx(void);
 void x86_report_nx(void);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 7394be8..b0ddb819 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -559,6 +559,29 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
 NOKPROBE_SYMBOL(fixup_bad_iret);
 #endif
 
+static bool is_sysenter_singlestep(struct pt_regs *regs)
+{
+	/*
+	 * We don't try for precision here.  If we're anywhere in the region of
+	 * code that can be single-stepped in the SYSENTER entry path, then
+	 * assume that this is a useless single-step trap due to SYSENTER
+	 * being invoked with TF set.  (We don't know in advance exactly
+	 * which instructions will be hit because BTF could plausibly
+	 * be set.)
+	 */
+#ifdef CONFIG_X86_32
+	return (regs->ip - (unsigned long)__begin_SYSENTER_singlestep_region) <
+		(unsigned long)__end_SYSENTER_singlestep_region -
+		(unsigned long)__begin_SYSENTER_singlestep_region;
+#elif defined(CONFIG_IA32_EMULATION)
+	return (regs->ip - (unsigned long)entry_SYSENTER_compat) <
+		(unsigned long)__end_entry_SYSENTER_compat -
+		(unsigned long)entry_SYSENTER_compat;
+#else
+	return false;
+#endif
+}
+
 /*
  * Our handling of the processor debug registers is non-trivial.
  * We do not clear them on entry and exit from the kernel. Therefore
@@ -616,6 +639,18 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	 */
 	clear_tsk_thread_flag(tsk, TIF_BLOCKSTEP);
 
+	if (unlikely(!user_mode(regs) && (dr6 & DR_STEP) &&
+		     is_sysenter_singlestep(regs))) {
+		dr6 &= ~DR_STEP;
+		if (!dr6)
+			goto exit;
+		/*
+		 * else we might have gotten a single-step trap and hit a
+		 * watchpoint at the same time, in which case we should fall
+		 * through and handle the watchpoint.
+		 */
+	}
+
 	/*
 	 * If dr6 has no reason to give us about the origin of this trap,
 	 * then it's very likely the result of an icebp/int01 trap.
@@ -624,7 +659,7 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	if (!dr6 && user_mode(regs))
 		user_icebp = 1;
 
-	/* Catch kmemcheck conditions first of all! */
+	/* Catch kmemcheck conditions! */
 	if ((dr6 & DR_STEP) && kmemcheck_trap(regs))
 		goto exit;
 
@@ -659,14 +694,13 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 		goto exit;
 	}
 
-	/*
-	 * Single-stepping through system calls: ignore any exceptions in
-	 * kernel space, but re-enable TF when returning to user mode.
-	 *
-	 * We already checked v86 mode above, so we can check for kernel mode
-	 * by just checking the CPL of CS.
-	 */
-	if ((dr6 & DR_STEP) && !user_mode(regs)) {
+	if (WARN_ON_ONCE((dr6 & DR_STEP) && !user_mode(regs))) {
+		/*
+		 * Historical junk that used to handle SYSENTER single-stepping.
+		 * This should be unreachable now.  If we survive for a while
+		 * without anyone hitting this warning, we'll turn this into
+		 * an oops.
+		 */
 		tsk->thread.debugreg6 &= ~DR_STEP;
 		set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
 		regs->flags &= ~X86_EFLAGS_TF;

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:x86/asm] x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed
  2016-03-10  3:00 ` [PATCH v2 08/12] x86/entry: Only allocate space for SYSENTER_stack if needed Andy Lutomirski
@ 2016-03-10 11:03   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-03-10 11:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, torvalds, tglx, luto, oleg, mingo, bp, luto, hpa,
	peterz, dvlasenk, andrew.cooper3, brgerst

Commit-ID:  6dcc94149d605908a7c0c4cf2085340637aac86d
Gitweb:     http://git.kernel.org/tip/6dcc94149d605908a7c0c4cf2085340637aac86d
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 9 Mar 2016 19:00:31 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Mar 2016 09:48:14 +0100

x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed

The SYSENTER stack is only used on 32-bit kernels.  Remove it on 64-bit kernels.

( We may end up using it down the road on 64-bit kernels. If so,
  we'll re-enable it for CONFIG_IA32_EMULATION. )

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/9dbd18429f9ff61a76b6eda97a9ea20510b9f6ba.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/processor.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index ecb4103..7cd01b7 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -297,10 +297,12 @@ struct tss_struct {
 	 */
 	unsigned long		io_bitmap[IO_BITMAP_LONGS + 1];
 
+#ifdef CONFIG_X86_32
 	/*
 	 * Space for the temporary SYSENTER stack:
 	 */
 	unsigned long		SYSENTER_stack[64];
+#endif
 
 } ____cacheline_aligned;
 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:x86/asm] x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
  2016-03-10  3:00 ` [PATCH v2 09/12] x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup Andy Lutomirski
@ 2016-03-10 11:03   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-03-10 11:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, dvlasenk, brgerst, peterz, luto, tglx, linux-kernel,
	andrew.cooper3, hpa, oleg, bp, mingo, torvalds

Commit-ID:  7536656f08d0c1a3b4c487d00785c5186ec6f533
Gitweb:     http://git.kernel.org/tip/7536656f08d0c1a3b4c487d00785c5186ec6f533
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 9 Mar 2016 19:00:32 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Mar 2016 09:48:14 +0100

x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup

Right after SYSENTER, we can get a #DB or NMI.  On x86_32, there's no IST,
so the exception handler is invoked on the temporary SYSENTER stack.

Because the SYSENTER stack is very small, we have a fixup to switch
off the stack quickly when this happens.  The old fixup had several issues:

 1. It checked the interrupt frame's CS and EIP.  This wasn't
    obviously correct on Xen or if vm86 mode was in use [1].

 2. In the NMI handler, it did some frightening digging into the
    stack frame.  I'm not convinced this digging was correct.

 3. The fixup didn't switch stacks and then switch back.  Instead, it
    synthesized a brand new stack frame that would redirect the IRET
    back to the SYSENTER code.  That frame was highly questionable.
    For one thing, if NMI nested inside #DB, we would effectively
    abort the #DB prologue, which was probably safe but was
    frightening.  For another, the code used PUSHFL to write the
    FLAGS portion of the frame, which was simply bogus -- by the time
    PUSHFL was called, at least TF, NT, VM, and all of the arithmetic
    flags were clobbered.

Simplify this considerably.  Instead of looking at the saved frame
to see where we came from, check the hardware ESP register against
the SYSENTER stack directly.  Malicious user code cannot spoof the
kernel ESP register, and by moving the check after SAVE_ALL, we can
use normal PER_CPU accesses to find all the relevant addresses.

With this patch applied, the improved syscall_nt_32 test finally
passes on 32-bit kernels.

[1] It isn't obviously correct, but it is nonetheless safe from vm86
    shenanigans as far as I can tell.  A user can't point EIP at
    entry_SYSENTER_32 while in vm86 mode because entry_SYSENTER_32,
    like all kernel addresses, is greater than 0xffff and would thus
    violate the CS segment limit.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/b2cdbc037031c07ecf2c40a96069318aec0e7971.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_32.S        | 114 ++++++++++++++++++---------------------
 arch/x86/kernel/asm-offsets_32.c |   5 ++
 2 files changed, 56 insertions(+), 63 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index b2e1d44..7b3ec24 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -976,51 +976,48 @@ error_code:
 	jmp	ret_from_exception
 END(page_fault)
 
-/*
- * Debug traps and NMI can happen at the one SYSENTER instruction
- * that sets up the real kernel stack. Check here, since we can't
- * allow the wrong stack to be used.
- *
- * "TSS_sysenter_sp0+12" is because the NMI/debug handler will have
- * already pushed 3 words if it hits on the sysenter instruction:
- * eflags, cs and eip.
- *
- * We just load the right stack, and push the three (known) values
- * by hand onto the new stack - while updating the return eip past
- * the instruction that would have done it for sysenter.
- */
-.macro FIX_STACK offset ok label
-	cmpw	$__KERNEL_CS, 4(%esp)
-	jne	\ok
-\label:
-	movl	TSS_sysenter_sp0 + \offset(%esp), %esp
-	pushfl
-	pushl	$__KERNEL_CS
-	pushl	$sysenter_past_esp
-.endm
-
 ENTRY(debug)
+	/*
+	 * #DB can happen at the first instruction of
+	 * entry_SYSENTER_32 or in Xen's SYSENTER prologue.  If this
+	 * happens, then we will be running on a very small stack.  We
+	 * need to detect this condition and switch to the thread
+	 * stack before calling any C code at all.
+	 *
+	 * If you edit this code, keep in mind that NMIs can happen in here.
+	 */
 	ASM_CLAC
-	cmpl	$entry_SYSENTER_32, (%esp)
-	jne	debug_stack_correct
-	FIX_STACK 12, debug_stack_correct, debug_esp_fix_insn
-debug_stack_correct:
 	pushl	$-1				# mark this as an int
 	SAVE_ALL
-	TRACE_IRQS_OFF
 	xorl	%edx, %edx			# error code 0
 	movl	%esp, %eax			# pt_regs pointer
+
+	/* Are we currently on the SYSENTER stack? */
+	PER_CPU(cpu_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx)
+	subl	%eax, %ecx	/* ecx = (end of SYSENTER_stack) - esp */
+	cmpl	$SIZEOF_SYSENTER_stack, %ecx
+	jb	.Ldebug_from_sysenter_stack
+
+	TRACE_IRQS_OFF
+	call	do_debug
+	jmp	ret_from_exception
+
+.Ldebug_from_sysenter_stack:
+	/* We're on the SYSENTER stack.  Switch off. */
+	movl	%esp, %ebp
+	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
+	TRACE_IRQS_OFF
 	call	do_debug
+	movl	%ebp, %esp
 	jmp	ret_from_exception
 END(debug)
 
 /*
- * NMI is doubly nasty. It can happen _while_ we're handling
- * a debug fault, and the debug fault hasn't yet been able to
- * clear up the stack. So we first check whether we got  an
- * NMI on the sysenter entry path, but after that we need to
- * check whether we got an NMI on the debug path where the debug
- * fault happened on the sysenter path.
+ * NMI is doubly nasty.  It can happen on the first instruction of
+ * entry_SYSENTER_32 (just like #DB), but it can also interrupt the beginning
+ * of the #DB handler even if that #DB in turn hit before entry_SYSENTER_32
+ * switched stacks.  We handle both conditions by simply checking whether we
+ * interrupted kernel code running on the SYSENTER stack.
  */
 ENTRY(nmi)
 	ASM_CLAC
@@ -1031,41 +1028,32 @@ ENTRY(nmi)
 	popl	%eax
 	je	nmi_espfix_stack
 #endif
-	cmpl	$entry_SYSENTER_32, (%esp)
-	je	nmi_stack_fixup
-	pushl	%eax
-	movl	%esp, %eax
-	/*
-	 * Do not access memory above the end of our stack page,
-	 * it might not exist.
-	 */
-	andl	$(THREAD_SIZE-1), %eax
-	cmpl	$(THREAD_SIZE-20), %eax
-	popl	%eax
-	jae	nmi_stack_correct
-	cmpl	$entry_SYSENTER_32, 12(%esp)
-	je	nmi_debug_stack_check
-nmi_stack_correct:
-	pushl	%eax
+
+	pushl	%eax				# pt_regs->orig_ax
 	SAVE_ALL
 	xorl	%edx, %edx			# zero error code
 	movl	%esp, %eax			# pt_regs pointer
+
+	/* Are we currently on the SYSENTER stack? */
+	PER_CPU(cpu_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx)
+	subl	%eax, %ecx	/* ecx = (end of SYSENTER_stack) - esp */
+	cmpl	$SIZEOF_SYSENTER_stack, %ecx
+	jb	.Lnmi_from_sysenter_stack
+
+	/* Not on SYSENTER stack. */
 	call	do_nmi
 	jmp	restore_all_notrace
 
-nmi_stack_fixup:
-	FIX_STACK 12, nmi_stack_correct, 1
-	jmp	nmi_stack_correct
-
-nmi_debug_stack_check:
-	cmpw	$__KERNEL_CS, 16(%esp)
-	jne	nmi_stack_correct
-	cmpl	$debug, (%esp)
-	jb	nmi_stack_correct
-	cmpl	$debug_esp_fix_insn, (%esp)
-	ja	nmi_stack_correct
-	FIX_STACK 24, nmi_stack_correct, 1
-	jmp	nmi_stack_correct
+.Lnmi_from_sysenter_stack:
+	/*
+	 * We're on the SYSENTER stack.  Switch off.  No one (not even debug)
+	 * is using the thread stack right now, so it's safe for us to use it.
+	 */
+	movl	%esp, %ebp
+	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
+	call	do_nmi
+	movl	%ebp, %esp
+	jmp	restore_all_notrace
 
 #ifdef CONFIG_X86_ESPFIX32
 nmi_espfix_stack:
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index fdeb0ce..ecdc1d2 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -52,6 +52,11 @@ void foo(void)
 	DEFINE(TSS_sysenter_sp0, offsetof(struct tss_struct, x86_tss.sp0) -
 	       offsetofend(struct tss_struct, SYSENTER_stack));
 
+	/* Offset from cpu_tss to SYSENTER_stack */
+	OFFSET(CPU_TSS_SYSENTER_stack, tss_struct, SYSENTER_stack);
+	/* Size of SYSENTER_stack */
+	DEFINE(SIZEOF_SYSENTER_stack, sizeof(((struct tss_struct *)0)->SYSENTER_stack));
+
 #if defined(CONFIG_LGUEST) || defined(CONFIG_LGUEST_GUEST) || defined(CONFIG_LGUEST_MODULE)
 	BLANK();
 	OFFSET(LGUEST_DATA_irq_enabled, lguest_data, irq_enabled);

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:x86/asm] x86/entry/32: Add and check a stack canary for the SYSENTER stack
  2016-03-10  3:00 ` [PATCH v2 10/12] x86/entry/32: Add and check a stack canary for the SYSENTER stack Andy Lutomirski
@ 2016-03-10 11:04   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-03-10 11:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: oleg, peterz, andrew.cooper3, dvlasenk, luto, hpa, luto, mingo,
	bp, linux-kernel, brgerst, torvalds, tglx

Commit-ID:  2a41aa4feb25af3ead60b740c43df80c576efea2
Gitweb:     http://git.kernel.org/tip/2a41aa4feb25af3ead60b740c43df80c576efea2
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 9 Mar 2016 19:00:33 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Mar 2016 09:48:14 +0100

x86/entry/32: Add and check a stack canary for the SYSENTER stack

The first instruction of the SYSENTER entry runs on its own tiny
stack.  That stack can be used if a #DB or NMI is delivered before
the SYSENTER prologue switches to a real stack.

We have code in place to prevent us from overflowing the tiny stack.
For added paranoia, add a canary to the stack and check it in
do_debug() -- that way, if something goes wrong with the #DB logic,
we'll eventually notice.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/6ff9a806f39098b166dc2c41c1db744df5272f29.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/processor.h | 3 ++-
 arch/x86/kernel/process.c        | 3 +++
 arch/x86/kernel/traps.c          | 8 ++++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 7cd01b7..50a6dc8 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -299,8 +299,9 @@ struct tss_struct {
 
 #ifdef CONFIG_X86_32
 	/*
-	 * Space for the temporary SYSENTER stack:
+	 * Space for the temporary SYSENTER stack.
 	 */
+	unsigned long		SYSENTER_stack_canary;
 	unsigned long		SYSENTER_stack[64];
 #endif
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9f7c21c..ee9a979 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -57,6 +57,9 @@ __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
 	  */
 	.io_bitmap		= { [0 ... IO_BITMAP_LONGS] = ~0 },
 #endif
+#ifdef CONFIG_X86_32
+	.SYSENTER_stack_canary	= STACK_END_MAGIC,
+#endif
 };
 EXPORT_PER_CPU_SYMBOL(cpu_tss);
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b0ddb819..49e2e77 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -713,6 +713,14 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	debug_stack_usage_dec();
 
 exit:
+#if defined(CONFIG_X86_32)
+	/*
+	 * This is the most likely code path that involves non-trivial use
+	 * of the SYSENTER stack.  Check that we haven't overrun it.
+	 */
+	WARN(this_cpu_read(cpu_tss.SYSENTER_stack_canary) != STACK_END_MAGIC,
+	     "Overran or corrupted SYSENTER stack\n");
+#endif
 	ist_exit(regs);
 }
 NOKPROBE_SYMBOL(do_debug);

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:x86/asm] x86/entry: Remove TIF_SINGLESTEP entry work
  2016-03-10  3:00 ` [PATCH v2 11/12] x86/entry: Remove TIF_SINGLESTEP entry work Andy Lutomirski
@ 2016-03-10 11:04   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-03-10 11:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dvlasenk, bp, peterz, hpa, luto, torvalds, andrew.cooper3, mingo,
	oleg, linux-kernel, luto, brgerst, tglx

Commit-ID:  392a62549fbb80da48811d04391615a25c39d091
Gitweb:     http://git.kernel.org/tip/392a62549fbb80da48811d04391615a25c39d091
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 9 Mar 2016 19:00:34 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Mar 2016 09:48:14 +0100

x86/entry: Remove TIF_SINGLESTEP entry work

Now that SYSENTER with TF set puts X86_EFLAGS_TF directly into
regs->flags, we don't need a TIF_SINGLESTEP fixup in the syscall
entry code.  Remove it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2d15f24da52dafc9d2f0b8d76f55544f4779c517.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/common.c            | 10 ----------
 arch/x86/include/asm/thread_info.h |  2 +-
 2 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 1a000f5..209ba33 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -172,16 +172,6 @@ long syscall_trace_enter_phase2(struct pt_regs *regs, u32 arch,
 	if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
 		BUG_ON(regs != task_pt_regs(current));
 
-	/*
-	 * If we stepped into a sysenter/syscall insn, it trapped in
-	 * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP.
-	 * If user-mode had set TF itself, then it's still clear from
-	 * do_debug() and we need to set it again to restore the user
-	 * state.  If we entered on the slow path, TF was already set.
-	 */
-	if (work & _TIF_SINGLESTEP)
-		regs->flags |= X86_EFLAGS_TF;
-
 #ifdef CONFIG_SECCOMP
 	/*
 	 * Call seccomp_phase2 before running the other hooks so that
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index c0778fc..f2e2302 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -137,7 +137,7 @@ struct thread_info {
 /* work to do in syscall_trace_enter() */
 #define _TIF_WORK_SYSCALL_ENTRY	\
 	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_AUDIT |	\
-	 _TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT |	\
+	 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT |	\
 	 _TIF_NOHZ)
 
 /* work to do on any return to user space */

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:x86/asm] x86/entry: Improve system call entry comments
  2016-03-10  3:00 ` [PATCH v2 12/12] x86/entry: Improve system call entry comments Andy Lutomirski
@ 2016-03-10 11:05   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-03-10 11:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dvlasenk, hpa, oleg, luto, luto, linux-kernel, andrew.cooper3,
	bp, tglx, peterz, torvalds, mingo, brgerst

Commit-ID:  fda57b2267e12de014069b1596a5438cf76fc7c6
Gitweb:     http://git.kernel.org/tip/fda57b2267e12de014069b1596a5438cf76fc7c6
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 9 Mar 2016 19:00:35 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Mar 2016 09:48:15 +0100

x86/entry: Improve system call entry comments

Ingo suggested that the comments should explain when the various
entries are used.  This adds these explanations and improves other
parts of the comments.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/9524ecef7a295347294300045d08354d6a57c6e7.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_32.S        | 61 +++++++++++++++++++++++++++-
 arch/x86/entry/entry_64.S        | 10 +++++
 arch/x86/entry/entry_64_compat.S | 85 +++++++++++++++++++++++++++-------------
 3 files changed, 128 insertions(+), 28 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 7b3ec24..286efa3 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -307,6 +307,38 @@ ENTRY(xen_sysenter_target)
 	jmp	sysenter_past_esp
 #endif
 
+/*
+ * 32-bit SYSENTER entry.
+ *
+ * 32-bit system calls through the vDSO's __kernel_vsyscall enter here
+ * if X86_FEATURE_SEP is available.  This is the preferred system call
+ * entry on 32-bit systems.
+ *
+ * The SYSENTER instruction, in principle, should *only* occur in the
+ * vDSO.  In practice, a small number of Android devices were shipped
+ * with a copy of Bionic that inlined a SYSENTER instruction.  This
+ * never happened in any of Google's Bionic versions -- it only happened
+ * in a narrow range of Intel-provided versions.
+ *
+ * SYSENTER loads SS, ESP, CS, and EIP from previously programmed MSRs.
+ * IF and VM in RFLAGS are cleared (IOW: interrupts are off).
+ * SYSENTER does not save anything on the stack,
+ * and does not save old EIP (!!!), ESP, or EFLAGS.
+ *
+ * To avoid losing track of EFLAGS.VM (and thus potentially corrupting
+ * user and/or vm86 state), we explicitly disable the SYSENTER
+ * instruction in vm86 mode by reprogramming the MSRs.
+ *
+ * Arguments:
+ * eax  system call number
+ * ebx  arg1
+ * ecx  arg2
+ * edx  arg3
+ * esi  arg4
+ * edi  arg5
+ * ebp  user stack
+ * 0(%ebp) arg6
+ */
 ENTRY(entry_SYSENTER_32)
 	movl	TSS_sysenter_sp0(%esp), %esp
 sysenter_past_esp:
@@ -397,7 +429,34 @@ sysenter_past_esp:
 GLOBAL(__end_SYSENTER_singlestep_region)
 ENDPROC(entry_SYSENTER_32)
 
-	# system call handler stub
+/*
+ * 32-bit legacy system call entry.
+ *
+ * 32-bit x86 Linux system calls traditionally used the INT $0x80
+ * instruction.  INT $0x80 lands here.
+ *
+ * This entry point can be used by any 32-bit perform system calls.
+ * Instances of INT $0x80 can be found inline in various programs and
+ * libraries.  It is also used by the vDSO's __kernel_vsyscall
+ * fallback for hardware that doesn't support a faster entry method.
+ * Restarted 32-bit system calls also fall back to INT $0x80
+ * regardless of what instruction was originally used to do the system
+ * call.  (64-bit programs can use INT $0x80 as well, but they can
+ * only run on 64-bit kernels and therefore land in
+ * entry_INT80_compat.)
+ *
+ * This is considered a slow path.  It is not used by most libc
+ * implementations on modern hardware except during process startup.
+ *
+ * Arguments:
+ * eax  system call number
+ * ebx  arg1
+ * ecx  arg2
+ * edx  arg3
+ * esi  arg4
+ * edi  arg5
+ * ebp  arg6
+ */
 ENTRY(entry_INT80_32)
 	ASM_CLAC
 	pushl	%eax			/* pt_regs->orig_ax */
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 70eadb0..858b555 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -103,6 +103,16 @@ ENDPROC(native_usergs_sysret64)
 /*
  * 64-bit SYSCALL instruction entry. Up to 6 arguments in registers.
  *
+ * This is the only entry point used for 64-bit system calls.  The
+ * hardware interface is reasonably well designed and the register to
+ * argument mapping Linux uses fits well with the registers that are
+ * available when SYSCALL is used.
+ *
+ * SYSCALL instructions can be found inlined in libc implementations as
+ * well as some other programs and libraries.  There are also a handful
+ * of SYSCALL instructions in the vDSO used, for example, as a
+ * clock_gettimeofday fallback.
+ *
  * 64-bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11,
  * then loads new ss, cs, and rip from previously programmed MSRs.
  * rflags gets masked by a value from another MSR (so CLD and CLAC
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index cbb5e69..efe0fe3 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -19,12 +19,21 @@
 	.section .entry.text, "ax"
 
 /*
- * 32-bit SYSENTER instruction entry.
+ * 32-bit SYSENTER entry.
  *
- * SYSENTER loads ss, rsp, cs, and rip from previously programmed MSRs.
- * IF and VM in rflags are cleared (IOW: interrupts are off).
+ * 32-bit system calls through the vDSO's __kernel_vsyscall enter here
+ * on 64-bit kernels running on Intel CPUs.
+ *
+ * The SYSENTER instruction, in principle, should *only* occur in the
+ * vDSO.  In practice, a small number of Android devices were shipped
+ * with a copy of Bionic that inlined a SYSENTER instruction.  This
+ * never happened in any of Google's Bionic versions -- it only happened
+ * in a narrow range of Intel-provided versions.
+ *
+ * SYSENTER loads SS, RSP, CS, and RIP from previously programmed MSRs.
+ * IF and VM in RFLAGS are cleared (IOW: interrupts are off).
  * SYSENTER does not save anything on the stack,
- * and does not save old rip (!!!) and rflags.
+ * and does not save old RIP (!!!), RSP, or RFLAGS.
  *
  * Arguments:
  * eax  system call number
@@ -35,10 +44,6 @@
  * edi  arg5
  * ebp  user stack
  * 0(%ebp) arg6
- *
- * This is purely a fast path. For anything complicated we use the int 0x80
- * path below. We set up a complete hardware stack frame to share code
- * with the int 0x80 path.
  */
 ENTRY(entry_SYSENTER_compat)
 	/* Interrupts are off on entry. */
@@ -131,17 +136,38 @@ GLOBAL(__end_entry_SYSENTER_compat)
 ENDPROC(entry_SYSENTER_compat)
 
 /*
- * 32-bit SYSCALL instruction entry.
+ * 32-bit SYSCALL entry.
+ *
+ * 32-bit system calls through the vDSO's __kernel_vsyscall enter here
+ * on 64-bit kernels running on AMD CPUs.
+ *
+ * The SYSCALL instruction, in principle, should *only* occur in the
+ * vDSO.  In practice, it appears that this really is the case.
+ * As evidence:
+ *
+ *  - The calling convention for SYSCALL has changed several times without
+ *    anyone noticing.
+ *
+ *  - Prior to the in-kernel X86_BUG_SYSRET_SS_ATTRS fixup, anything
+ *    user task that did SYSCALL without immediately reloading SS
+ *    would randomly crash.
  *
- * 32-bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11,
- * then loads new ss, cs, and rip from previously programmed MSRs.
- * rflags gets masked by a value from another MSR (so CLD and CLAC
- * are not needed). SYSCALL does not save anything on the stack
- * and does not change rsp.
+ *  - Most programmers do not directly target AMD CPUs, and the 32-bit
+ *    SYSCALL instruction does not exist on Intel CPUs.  Even on AMD
+ *    CPUs, Linux disables the SYSCALL instruction on 32-bit kernels
+ *    because the SYSCALL instruction in legacy/native 32-bit mode (as
+ *    opposed to compat mode) is sufficiently poorly designed as to be
+ *    essentially unusable.
  *
- * Note: rflags saving+masking-with-MSR happens only in Long mode
+ * 32-bit SYSCALL saves RIP to RCX, clears RFLAGS.RF, then saves
+ * RFLAGS to R11, then loads new SS, CS, and RIP from previously
+ * programmed MSRs.  RFLAGS gets masked by a value from another MSR
+ * (so CLD and CLAC are not needed).  SYSCALL does not save anything on
+ * the stack and does not change RSP.
+ *
+ * Note: RFLAGS saving+masking-with-MSR happens only in Long mode
  * (in legacy 32-bit mode, IF, RF and VM bits are cleared and that's it).
- * Don't get confused: rflags saving+masking depends on Long Mode Active bit
+ * Don't get confused: RFLAGS saving+masking depends on Long Mode Active bit
  * (EFER.LMA=1), NOT on bitness of userspace where SYSCALL executes
  * or target CS descriptor's L bit (SYSCALL does not read segment descriptors).
  *
@@ -241,7 +267,21 @@ sysret32_from_system_call:
 END(entry_SYSCALL_compat)
 
 /*
- * Emulated IA32 system calls via int 0x80.
+ * 32-bit legacy system call entry.
+ *
+ * 32-bit x86 Linux system calls traditionally used the INT $0x80
+ * instruction.  INT $0x80 lands here.
+ *
+ * This entry point can be used by 32-bit and 64-bit programs to perform
+ * 32-bit system calls.  Instances of INT $0x80 can be found inline in
+ * various programs and libraries.  It is also used by the vDSO's
+ * __kernel_vsyscall fallback for hardware that doesn't support a faster
+ * entry method.  Restarted 32-bit system calls also fall back to INT
+ * $0x80 regardless of what instruction was originally used to do the
+ * system call.
+ *
+ * This is considered a slow path.  It is not used by most libc
+ * implementations on modern hardware except during process startup.
  *
  * Arguments:
  * eax  system call number
@@ -250,17 +290,8 @@ END(entry_SYSCALL_compat)
  * edx  arg3
  * esi  arg4
  * edi  arg5
- * ebp  arg6	(note: not saved in the stack frame, should not be touched)
- *
- * Notes:
- * Uses the same stack frame as the x86-64 version.
- * All registers except eax must be saved (but ptrace may violate that).
- * Arguments are zero extended. For system calls that want sign extension and
- * take long arguments a wrapper is needed. Most calls can just be called
- * directly.
- * Assumes it is only called from user space and entered with interrupts off.
+ * ebp  arg6
  */
-
 ENTRY(entry_INT80_compat)
 	/*
 	 * Interrupts are off on entry.

^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2016-03-10 11:06 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-10  3:00 [PATCH v2 00/12] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups Andy Lutomirski
2016-03-10  3:00 ` [PATCH v2 01/12] selftests/x86: In syscall_nt, test NT|TF as well Andy Lutomirski
2016-03-10 11:00   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2016-03-10  3:00 ` [PATCH v2 02/12] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test Andy Lutomirski
2016-03-10 11:01   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2016-03-10  3:00 ` [PATCH v2 03/12] x86/entry/32: Filter NT and speed up AC filtering in SYSENTER Andy Lutomirski
2016-03-10 11:01   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2016-03-10  3:00 ` [PATCH v2 04/12] x86/entry/32: Restore FLAGS on SYSEXIT Andy Lutomirski
2016-03-10 11:01   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2016-03-10  3:00 ` [PATCH v2 05/12] x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions Andy Lutomirski
2016-03-10 11:02   ` [tip:x86/asm] x86/entry/traps: " tip-bot for Andy Lutomirski
2016-03-10  3:00 ` [PATCH v2 06/12] x86/traps: Clear DR6 early in do_debug and improve the comment Andy Lutomirski
2016-03-10 11:02   ` [tip:x86/asm] x86/entry/traps: Clear DR6 early in do_debug() " tip-bot for Andy Lutomirski
2016-03-10  3:00 ` [PATCH v2 07/12] x86/entry: Vastly simplify SYSENTER TF handling Andy Lutomirski
2016-03-10 11:03   ` [tip:x86/asm] x86/entry: Vastly simplify SYSENTER TF (single-step) handling tip-bot for Andy Lutomirski
2016-03-10  3:00 ` [PATCH v2 08/12] x86/entry: Only allocate space for SYSENTER_stack if needed Andy Lutomirski
2016-03-10 11:03   ` [tip:x86/asm] x86/entry: Only allocate space for tss_struct::SYSENTER_stack " tip-bot for Andy Lutomirski
2016-03-10  3:00 ` [PATCH v2 09/12] x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup Andy Lutomirski
2016-03-10 11:03   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2016-03-10  3:00 ` [PATCH v2 10/12] x86/entry/32: Add and check a stack canary for the SYSENTER stack Andy Lutomirski
2016-03-10 11:04   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2016-03-10  3:00 ` [PATCH v2 11/12] x86/entry: Remove TIF_SINGLESTEP entry work Andy Lutomirski
2016-03-10 11:04   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2016-03-10  3:00 ` [PATCH v2 12/12] x86/entry: Improve system call entry comments Andy Lutomirski
2016-03-10 11:05   ` [tip:x86/asm] " tip-bot for Andy Lutomirski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.