* [PATCH v2 00/14] x86: Rewrite exit-to-userspace code
@ 2015-06-18 19:08 Andy Lutomirski
  2015-06-18 19:08 ` [PATCH v2 01/14] uml: Fix do_signal() prototype Andy Lutomirski
                   ` (14 more replies)
  0 siblings, 15 replies; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-18 19:08 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst, paulmck,
	Andy Lutomirski

This is the first big batch of x86 asm-to-C conversion patches.

The exit-to-usermode code is copied in several places and is written
in a nasty combination of asm and C.  It's not at all clear what
it's supposed to do, and the way it's structured makes it very hard
to work with.  For example, it's not even clear why syscall exit
hooks are called only once per syscall right now.  (It seems to be a
side effect of the way that rdi and rdx are handled in the asm loop,
and it seems reliable, but it's still pointlessly complicated.)  The
existing code also makes context tracking overly complicated and
hard to understand.  Finally, it's nearly impossible for anyone to
change what happens on exit to usermode, since the existing code is
so fragile.

I tried to clean it up incrementally, but I decided it was too hard.
Instead, this series just replaces the code.  It seems to work.

Context tracking in particular works very differently now.  The
low-level entry code checks that we're in CONTEXT_USER and switches
to CONTEXT_KERNEL.  The exit code does the reverse.  There is no
need to track what CONTEXT_XYZ state we came from, because we
already know.  Similarly, SCHEDULE_USER is gone, since we can
reschedule if needed by simply calling schedule() from C code.
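
For reference, the entry-side hook that implements this check (added
in patch 6 of this series) is tiny:

	/* Called on entry from user mode with IRQs off. */
	asmlinkage __visible void enter_from_user_mode(void)
	{
		CT_WARN_ON(ct_state() != CONTEXT_USER);	/* must be coming from user mode */
		user_exit();				/* switch to CONTEXT_KERNEL */
	}

The exit side is simply a user_enter() call at the end of
prepare_exit_to_usermode() (patch 7).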

The main things that are missing are that I haven't done the 32-bit
parts (anyone want to help?) and therefore I haven't deleted the old
C code.  I also think this may break UML for trivial reasons.

IRQ context tracking is still duplicated.  We should probably clean
it up by changing the core code to supply something like
irq_enter_we_are_already_in_context_kernel.
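
A minimal sketch of what such a helper might look like (hypothetical;
neither the name nor the implementation exists yet):

	/*
	 * Hypothetical: like irq_enter(), but the low-level entry code
	 * has already switched us to CONTEXT_KERNEL, so skip the RCU
	 * user/idle bookkeeping that rcu_irq_enter() would redo.
	 */
	void irq_enter_we_are_already_in_context_kernel(void)
	{
		__irq_enter();	/* preempt count and irq-time accounting only */
	}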

Changes from v1:
 - Fix bisection failure by squashing the 64-bit native and compat syscall
   conversions together.  The intermediate state didn't build, and fixing
   it isn't worthwhile (the result would be harder to understand).
 - Replace context_tracking_assert_state with CT_WARN_ON and ct_state.
 - The last two patches are new.  I incorrectly thought that we weren't
   ready for them yet on 32-bit kernels, but I was wrong.

Andy Lutomirski (13):
  context_tracking: Add ct_state and CT_WARN_ON
  notifiers: Assert that RCU is watching in notify_die
  x86: Move C entry and exit code to arch/x86/entry/common.c
  x86/traps: Assert that we're in CONTEXT_KERNEL in exception entries
  x86/entry: Add enter_from_user_mode and use it in syscalls
  x86/entry: Add new, comprehensible entry and exit hooks
  x86/entry/64: Really create an error-entry-from-usermode code path
  x86/entry/64: Migrate 64-bit and compat syscalls to new exit hooks
  x86/asm/entry/64: Save all regs on interrupt entry
  x86/asm/entry/64: Simplify irq stack pt_regs handling
  x86/asm/entry/64: Migrate error and interrupt exit work to C
  x86/entry: Remove exception_enter from trap handlers
  x86/entry: Remove SCHEDULE_USER and asm/context-tracking.h

Ingo Molnar (1):
  uml: Fix do_signal() prototype

 arch/um/include/shared/kern_util.h      |   3 +-
 arch/um/kernel/process.c                |   6 +-
 arch/um/kernel/signal.c                 |   8 +-
 arch/um/kernel/tlb.c                    |   2 +-
 arch/um/kernel/trap.c                   |   2 +-
 arch/x86/entry/Makefile                 |   1 +
 arch/x86/entry/common.c                 | 374 ++++++++++++++++++++++++++++++++
 arch/x86/entry/entry_64.S               | 176 ++++-----------
 arch/x86/entry/entry_64_compat.S        |   7 +-
 arch/x86/include/asm/context_tracking.h |  10 -
 arch/x86/include/asm/signal.h           |   1 +
 arch/x86/include/asm/traps.h            |   4 +-
 arch/x86/kernel/cpu/mcheck/mce.c        |   5 +-
 arch/x86/kernel/cpu/mcheck/p5.c         |   5 +-
 arch/x86/kernel/cpu/mcheck/winchip.c    |   4 +-
 arch/x86/kernel/ptrace.c                | 202 +----------------
 arch/x86/kernel/signal.c                |  28 +--
 arch/x86/kernel/traps.c                 |  87 +++-----
 include/linux/context_tracking.h        |  15 ++
 include/linux/context_tracking_state.h  |   1 +
 kernel/notifier.c                       |   2 +
 21 files changed, 485 insertions(+), 458 deletions(-)
 create mode 100644 arch/x86/entry/common.c
 delete mode 100644 arch/x86/include/asm/context_tracking.h

-- 
2.4.3


* [PATCH v2 01/14] uml: Fix do_signal() prototype
  2015-06-18 19:08 [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
@ 2015-06-18 19:08 ` Andy Lutomirski
  2015-06-18 19:08 ` [PATCH v2 02/14] context_tracking: Add ct_state and CT_WARN_ON Andy Lutomirski
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-18 19:08 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst, paulmck,
	Ingo Molnar, Richard Weinberger, Andrew Morton, Andy Lutomirski,
	Denys Vlasenko, H. Peter Anvin, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, Andy Lutomirski

From: Ingo Molnar <mingo@kernel.org>

Once x86 exports its do_signal(), the prototypes will clash.

Fix the clash and also improve the code a bit: remove the unnecessary
kern_do_signal() indirection. This allows interrupt_end() to share
the 'regs' parameter calculation.

Also remove the unused return code to match x86.
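
Concretely, the two declarations that would clash are:

	extern int do_signal(void);			/* UML's current prototype */
	extern void do_signal(struct pt_regs *regs);	/* x86's, exported later in this series */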

Minimally build and boot tested.

Cc: Richard Weinberger <richard.weinberger@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[Adjusted the commit message because I reordered the patch. --Andy]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/um/include/shared/kern_util.h | 3 ++-
 arch/um/kernel/process.c           | 6 ++++--
 arch/um/kernel/signal.c            | 8 +-------
 arch/um/kernel/tlb.c               | 2 +-
 arch/um/kernel/trap.c              | 2 +-
 5 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/um/include/shared/kern_util.h b/arch/um/include/shared/kern_util.h
index 83a91f976330..35ab97e4bb9b 100644
--- a/arch/um/include/shared/kern_util.h
+++ b/arch/um/include/shared/kern_util.h
@@ -22,7 +22,8 @@ extern int kmalloc_ok;
 extern unsigned long alloc_stack(int order, int atomic);
 extern void free_stack(unsigned long stack, int order);
 
-extern int do_signal(void);
+struct pt_regs;
+extern void do_signal(struct pt_regs *regs);
 extern void interrupt_end(void);
 extern void relay_signal(int sig, struct siginfo *si, struct uml_pt_regs *regs);
 
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 68b9119841cd..a6d922672b9f 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -90,12 +90,14 @@ void *__switch_to(struct task_struct *from, struct task_struct *to)
 
 void interrupt_end(void)
 {
+	struct pt_regs *regs = &current->thread.regs;
+
 	if (need_resched())
 		schedule();
 	if (test_thread_flag(TIF_SIGPENDING))
-		do_signal();
+		do_signal(regs);
 	if (test_and_clear_thread_flag(TIF_NOTIFY_RESUME))
-		tracehook_notify_resume(&current->thread.regs);
+		tracehook_notify_resume(regs);
 }
 
 void exit_thread(void)
diff --git a/arch/um/kernel/signal.c b/arch/um/kernel/signal.c
index 4f60e4aad790..57acbd67d85d 100644
--- a/arch/um/kernel/signal.c
+++ b/arch/um/kernel/signal.c
@@ -64,7 +64,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 	signal_setup_done(err, ksig, singlestep);
 }
 
-static int kern_do_signal(struct pt_regs *regs)
+void do_signal(struct pt_regs *regs)
 {
 	struct ksignal ksig;
 	int handled_sig = 0;
@@ -110,10 +110,4 @@ static int kern_do_signal(struct pt_regs *regs)
 	 */
 	if (!handled_sig)
 		restore_saved_sigmask();
-	return handled_sig;
-}
-
-int do_signal(void)
-{
-	return kern_do_signal(&current->thread.regs);
 }
diff --git a/arch/um/kernel/tlb.c b/arch/um/kernel/tlb.c
index f1b3eb14b855..2077248e8a72 100644
--- a/arch/um/kernel/tlb.c
+++ b/arch/um/kernel/tlb.c
@@ -291,7 +291,7 @@ void fix_range_common(struct mm_struct *mm, unsigned long start_addr,
 		/* We are under mmap_sem, release it such that current can terminate */
 		up_write(&current->mm->mmap_sem);
 		force_sig(SIGKILL, current);
-		do_signal();
+		do_signal(&current->thread.regs);
 	}
 }
 
diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
index 8e4daf44e980..481311becb05 100644
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -172,7 +172,7 @@ static void bad_segv(struct faultinfo fi, unsigned long ip)
 void fatal_sigsegv(void)
 {
 	force_sigsegv(SIGSEGV, current);
-	do_signal();
+	do_signal(&current->thread.regs);
 	/*
 	 * This is to tell gcc that we're not returning - do_signal
 	 * can, in general, return, but in this case, it's not, since
-- 
2.4.3


* [PATCH v2 02/14] context_tracking: Add ct_state and CT_WARN_ON
  2015-06-18 19:08 [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
  2015-06-18 19:08 ` [PATCH v2 01/14] uml: Fix do_signal() prototype Andy Lutomirski
@ 2015-06-18 19:08 ` Andy Lutomirski
  2015-06-18 19:08 ` [PATCH v2 03/14] notifiers: Assert that RCU is watching in notify_die Andy Lutomirski
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-18 19:08 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst, paulmck,
	Andy Lutomirski

This will let us sprinkle sanity checks around the kernel without
making too much of a mess.
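
For example, patch 5 of this series adds assertions of the form

	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);

to the exception entry paths.  Since CT_WARN_ON() only warns when
context tracking is enabled, the check is nearly free otherwise.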

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 include/linux/context_tracking.h       | 15 +++++++++++++++
 include/linux/context_tracking_state.h |  1 +
 2 files changed, 16 insertions(+)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 2821838256b4..3f5ac9b86f69 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -57,6 +57,19 @@ static inline void context_tracking_task_switch(struct task_struct *prev,
 	if (context_tracking_is_enabled())
 		__context_tracking_task_switch(prev, next);
 }
+
+/**
+ * ct_state() - return the current context tracking state if known
+ *
+ * Returns the current cpu's context tracking state if context tracking
+ * is enabled.  If context tracking is disabled, returns
+ * CONTEXT_DISABLED.  This should be used primarily for debugging.
+ */
+static inline enum ctx_state ct_state(void)
+{
+	return context_tracking_is_enabled() ?
+		this_cpu_read(context_tracking.state) : CONTEXT_DISABLED;
+}
 #else
 static inline void user_enter(void) { }
 static inline void user_exit(void) { }
@@ -64,8 +77,10 @@ static inline enum ctx_state exception_enter(void) { return 0; }
 static inline void exception_exit(enum ctx_state prev_ctx) { }
 static inline void context_tracking_task_switch(struct task_struct *prev,
 						struct task_struct *next) { }
+static inline enum ctx_state ct_state(void) { return CONTEXT_DISABLED; }
 #endif /* !CONFIG_CONTEXT_TRACKING */
 
+#define CT_WARN_ON(cond) WARN_ON(context_tracking_is_enabled() && (cond))
 
 #ifdef CONFIG_CONTEXT_TRACKING_FORCE
 extern void context_tracking_init(void);
diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h
index 6b7b96a32b75..d4aec2805849 100644
--- a/include/linux/context_tracking_state.h
+++ b/include/linux/context_tracking_state.h
@@ -13,6 +13,7 @@ struct context_tracking {
 	 */
 	bool active;
 	enum ctx_state {
+		CONTEXT_DISABLED = -1,	/* returned by ct_state() if unknown */
 		CONTEXT_KERNEL = 0,
 		CONTEXT_USER,
 		CONTEXT_GUEST,
-- 
2.4.3


* [PATCH v2 03/14] notifiers: Assert that RCU is watching in notify_die
  2015-06-18 19:08 [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
  2015-06-18 19:08 ` [PATCH v2 01/14] uml: Fix do_signal() prototype Andy Lutomirski
  2015-06-18 19:08 ` [PATCH v2 02/14] context_tracking: Add ct_state and CT_WARN_ON Andy Lutomirski
@ 2015-06-18 19:08 ` Andy Lutomirski
  2015-06-22 11:36   ` Borislav Petkov
  2015-06-18 19:08 ` [PATCH v2 04/14] x86: Move C entry and exit code to arch/x86/entry/common.c Andy Lutomirski
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-18 19:08 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst, paulmck,
	Andy Lutomirski

Low-level arch entries often call notify_die, and it's easy for arch
code to fail to exit an RCU quiescent state first.  Assert that
we're not quiescent in notify_die.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 kernel/notifier.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/notifier.c b/kernel/notifier.c
index ae9fc7cc360e..980e4330fb59 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -544,6 +544,8 @@ int notrace notify_die(enum die_val val, const char *str,
 		.signr	= sig,
 
 	};
+	rcu_lockdep_assert(rcu_is_watching(),
+			   "notify_die called but RCU thinks we're quiescent");
 	return atomic_notifier_call_chain(&die_chain, val, &args);
 }
 NOKPROBE_SYMBOL(notify_die);
-- 
2.4.3


* [PATCH v2 04/14] x86: Move C entry and exit code to arch/x86/entry/common.c
  2015-06-18 19:08 [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
                   ` (2 preceding siblings ...)
  2015-06-18 19:08 ` [PATCH v2 03/14] notifiers: Assert that RCU is watching in notify_die Andy Lutomirski
@ 2015-06-18 19:08 ` Andy Lutomirski
  2015-06-18 19:08 ` [PATCH v2 05/14] x86/traps: Assert that we're in CONTEXT_KERNEL in exception entries Andy Lutomirski
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-18 19:08 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst, paulmck,
	Andy Lutomirski

The entry and exit C helpers were confusingly scattered between
ptrace.c and signal.c, even though they aren't specific to ptrace or
signal handling.  Move them together in a new file.

This change just moves code around; it doesn't alter any behavior.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/Makefile       |   1 +
 arch/x86/entry/common.c       | 253 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/signal.h |   1 +
 arch/x86/kernel/ptrace.c      | 202 +--------------------------------
 arch/x86/kernel/signal.c      |  28 +----
 5 files changed, 257 insertions(+), 228 deletions(-)
 create mode 100644 arch/x86/entry/common.c

diff --git a/arch/x86/entry/Makefile b/arch/x86/entry/Makefile
index 7a144971db79..bd55dedd7614 100644
--- a/arch/x86/entry/Makefile
+++ b/arch/x86/entry/Makefile
@@ -2,6 +2,7 @@
 # Makefile for the x86 low level entry code
 #
 obj-y				:= entry_$(BITS).o thunk_$(BITS).o syscall_$(BITS).o
+obj-y				+= common.o
 
 obj-y				+= vdso/
 obj-y				+= vsyscall/
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
new file mode 100644
index 000000000000..348465473e55
--- /dev/null
+++ b/arch/x86/entry/common.c
@@ -0,0 +1,253 @@
+/*
+ * entry.c - C code for kernel entry and exit
+ * Copyright (c) 2015 Andrew Lutomirski
+ * GPL v2
+ *
+ * Based on asm and ptrace code by many authors.  The code here originated
+ * in ptrace.c and signal.c.
+ */
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/mm.h>
+#include <linux/smp.h>
+#include <linux/errno.h>
+#include <linux/ptrace.h>
+#include <linux/tracehook.h>
+#include <linux/audit.h>
+#include <linux/seccomp.h>
+#include <linux/signal.h>
+#include <linux/export.h>
+#include <linux/context_tracking.h>
+#include <linux/user-return-notifier.h>
+#include <linux/uprobes.h>
+
+#include <asm/desc.h>
+#include <asm/traps.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/syscalls.h>
+
+static void do_audit_syscall_entry(struct pt_regs *regs, u32 arch)
+{
+#ifdef CONFIG_X86_64
+	if (arch == AUDIT_ARCH_X86_64) {
+		audit_syscall_entry(regs->orig_ax, regs->di,
+				    regs->si, regs->dx, regs->r10);
+	} else
+#endif
+	{
+		audit_syscall_entry(regs->orig_ax, regs->bx,
+				    regs->cx, regs->dx, regs->si);
+	}
+}
+
+/*
+ * We can return 0 to resume the syscall or anything else to go to phase
+ * 2.  If we resume the syscall, we need to put something appropriate in
+ * regs->orig_ax.
+ *
+ * NB: We don't have full pt_regs here, but regs->orig_ax and regs->ax
+ * are fully functional.
+ *
+ * For phase 2's benefit, our return value is:
+ * 0:			resume the syscall
+ * 1:			go to phase 2; no seccomp phase 2 needed
+ * anything else:	go to phase 2; pass return value to seccomp
+ */
+unsigned long syscall_trace_enter_phase1(struct pt_regs *regs, u32 arch)
+{
+	unsigned long ret = 0;
+	u32 work;
+
+	BUG_ON(regs != task_pt_regs(current));
+
+	work = ACCESS_ONCE(current_thread_info()->flags) &
+		_TIF_WORK_SYSCALL_ENTRY;
+
+	/*
+	 * If TIF_NOHZ is set, we are required to call user_exit() before
+	 * doing anything that could touch RCU.
+	 */
+	if (work & _TIF_NOHZ) {
+		user_exit();
+		work &= ~_TIF_NOHZ;
+	}
+
+#ifdef CONFIG_SECCOMP
+	/*
+	 * Do seccomp first -- it should minimize exposure of other
+	 * code, and keeping seccomp fast is probably more valuable
+	 * than the rest of this.
+	 */
+	if (work & _TIF_SECCOMP) {
+		struct seccomp_data sd;
+
+		sd.arch = arch;
+		sd.nr = regs->orig_ax;
+		sd.instruction_pointer = regs->ip;
+#ifdef CONFIG_X86_64
+		if (arch == AUDIT_ARCH_X86_64) {
+			sd.args[0] = regs->di;
+			sd.args[1] = regs->si;
+			sd.args[2] = regs->dx;
+			sd.args[3] = regs->r10;
+			sd.args[4] = regs->r8;
+			sd.args[5] = regs->r9;
+		} else
+#endif
+		{
+			sd.args[0] = regs->bx;
+			sd.args[1] = regs->cx;
+			sd.args[2] = regs->dx;
+			sd.args[3] = regs->si;
+			sd.args[4] = regs->di;
+			sd.args[5] = regs->bp;
+		}
+
+		BUILD_BUG_ON(SECCOMP_PHASE1_OK != 0);
+		BUILD_BUG_ON(SECCOMP_PHASE1_SKIP != 1);
+
+		ret = seccomp_phase1(&sd);
+		if (ret == SECCOMP_PHASE1_SKIP) {
+			regs->orig_ax = -1;
+			ret = 0;
+		} else if (ret != SECCOMP_PHASE1_OK) {
+			return ret;  /* Go directly to phase 2 */
+		}
+
+		work &= ~_TIF_SECCOMP;
+	}
+#endif
+
+	/* Do our best to finish without phase 2. */
+	if (work == 0)
+		return ret;  /* seccomp and/or nohz only (ret == 0 here) */
+
+#ifdef CONFIG_AUDITSYSCALL
+	if (work == _TIF_SYSCALL_AUDIT) {
+		/*
+		 * If there is no more work to be done except auditing,
+		 * then audit in phase 1.  Phase 2 always audits, so, if
+		 * we audit here, then we can't go on to phase 2.
+		 */
+		do_audit_syscall_entry(regs, arch);
+		return 0;
+	}
+#endif
+
+	return 1;  /* Something is enabled that we can't handle in phase 1 */
+}
+
+/* Returns the syscall nr to run (which should match regs->orig_ax). */
+long syscall_trace_enter_phase2(struct pt_regs *regs, u32 arch,
+				unsigned long phase1_result)
+{
+	long ret = 0;
+	u32 work = ACCESS_ONCE(current_thread_info()->flags) &
+		_TIF_WORK_SYSCALL_ENTRY;
+
+	BUG_ON(regs != task_pt_regs(current));
+
+	/*
+	 * If we stepped into a sysenter/syscall insn, it trapped in
+	 * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP.
+	 * If user-mode had set TF itself, then it's still clear from
+	 * do_debug() and we need to set it again to restore the user
+	 * state.  If we entered on the slow path, TF was already set.
+	 */
+	if (work & _TIF_SINGLESTEP)
+		regs->flags |= X86_EFLAGS_TF;
+
+#ifdef CONFIG_SECCOMP
+	/*
+	 * Call seccomp_phase2 before running the other hooks so that
+	 * they can see any changes made by a seccomp tracer.
+	 */
+	if (phase1_result > 1 && seccomp_phase2(phase1_result)) {
+		/* seccomp failures shouldn't expose any additional code. */
+		return -1;
+	}
+#endif
+
+	if (unlikely(work & _TIF_SYSCALL_EMU))
+		ret = -1L;
+
+	if ((ret || test_thread_flag(TIF_SYSCALL_TRACE)) &&
+	    tracehook_report_syscall_entry(regs))
+		ret = -1L;
+
+	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
+		trace_sys_enter(regs, regs->orig_ax);
+
+	do_audit_syscall_entry(regs, arch);
+
+	return ret ?: regs->orig_ax;
+}
+
+long syscall_trace_enter(struct pt_regs *regs)
+{
+	u32 arch = is_ia32_task() ? AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64;
+	unsigned long phase1_result = syscall_trace_enter_phase1(regs, arch);
+
+	if (phase1_result == 0)
+		return regs->orig_ax;
+	else
+		return syscall_trace_enter_phase2(regs, arch, phase1_result);
+}
+
+void syscall_trace_leave(struct pt_regs *regs)
+{
+	bool step;
+
+	/*
+	 * We may come here right after calling schedule_user()
+	 * or do_notify_resume(), in which case we can be in RCU
+	 * user mode.
+	 */
+	user_exit();
+
+	audit_syscall_exit(regs);
+
+	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
+		trace_sys_exit(regs, regs->ax);
+
+	/*
+	 * If TIF_SYSCALL_EMU is set, we only get here because of
+	 * TIF_SINGLESTEP (i.e. this is PTRACE_SYSEMU_SINGLESTEP).
+	 * We already reported this syscall instruction in
+	 * syscall_trace_enter().
+	 */
+	step = unlikely(test_thread_flag(TIF_SINGLESTEP)) &&
+			!test_thread_flag(TIF_SYSCALL_EMU);
+	if (step || test_thread_flag(TIF_SYSCALL_TRACE))
+		tracehook_report_syscall_exit(regs, step);
+
+	user_enter();
+}
+
+/*
+ * notification of userspace execution resumption
+ * - triggered by the TIF_WORK_MASK flags
+ */
+__visible void
+do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
+{
+	user_exit();
+
+	if (thread_info_flags & _TIF_UPROBE)
+		uprobe_notify_resume(regs);
+
+	/* deal with pending signal delivery */
+	if (thread_info_flags & _TIF_SIGPENDING)
+		do_signal(regs);
+
+	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
+		clear_thread_flag(TIF_NOTIFY_RESUME);
+		tracehook_notify_resume(regs);
+	}
+	if (thread_info_flags & _TIF_USER_RETURN_NOTIFY)
+		fire_user_return_notifiers();
+
+	user_enter();
+}
diff --git a/arch/x86/include/asm/signal.h b/arch/x86/include/asm/signal.h
index 31eab867e6d3..b42408bcf6b5 100644
--- a/arch/x86/include/asm/signal.h
+++ b/arch/x86/include/asm/signal.h
@@ -30,6 +30,7 @@ typedef sigset_t compat_sigset_t;
 #endif /* __ASSEMBLY__ */
 #include <uapi/asm/signal.h>
 #ifndef __ASSEMBLY__
+extern void do_signal(struct pt_regs *regs);
 extern void do_notify_resume(struct pt_regs *, void *, __u32);
 
 #define __ARCH_HAS_SA_RESTORER
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index a7bc79480719..d9417430a4b0 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -37,12 +37,10 @@
 #include <asm/proto.h>
 #include <asm/hw_breakpoint.h>
 #include <asm/traps.h>
+#include <asm/syscall.h>
 
 #include "tls.h"
 
-#define CREATE_TRACE_POINTS
-#include <trace/events/syscalls.h>
-
 enum x86_regset {
 	REGSET_GENERAL,
 	REGSET_FP,
@@ -1434,201 +1432,3 @@ void send_sigtrap(struct task_struct *tsk, struct pt_regs *regs,
 	/* Send us the fake SIGTRAP */
 	force_sig_info(SIGTRAP, &info, tsk);
 }
-
-static void do_audit_syscall_entry(struct pt_regs *regs, u32 arch)
-{
-#ifdef CONFIG_X86_64
-	if (arch == AUDIT_ARCH_X86_64) {
-		audit_syscall_entry(regs->orig_ax, regs->di,
-				    regs->si, regs->dx, regs->r10);
-	} else
-#endif
-	{
-		audit_syscall_entry(regs->orig_ax, regs->bx,
-				    regs->cx, regs->dx, regs->si);
-	}
-}
-
-/*
- * We can return 0 to resume the syscall or anything else to go to phase
- * 2.  If we resume the syscall, we need to put something appropriate in
- * regs->orig_ax.
- *
- * NB: We don't have full pt_regs here, but regs->orig_ax and regs->ax
- * are fully functional.
- *
- * For phase 2's benefit, our return value is:
- * 0:			resume the syscall
- * 1:			go to phase 2; no seccomp phase 2 needed
- * anything else:	go to phase 2; pass return value to seccomp
- */
-unsigned long syscall_trace_enter_phase1(struct pt_regs *regs, u32 arch)
-{
-	unsigned long ret = 0;
-	u32 work;
-
-	BUG_ON(regs != task_pt_regs(current));
-
-	work = ACCESS_ONCE(current_thread_info()->flags) &
-		_TIF_WORK_SYSCALL_ENTRY;
-
-	/*
-	 * If TIF_NOHZ is set, we are required to call user_exit() before
-	 * doing anything that could touch RCU.
-	 */
-	if (work & _TIF_NOHZ) {
-		user_exit();
-		work &= ~_TIF_NOHZ;
-	}
-
-#ifdef CONFIG_SECCOMP
-	/*
-	 * Do seccomp first -- it should minimize exposure of other
-	 * code, and keeping seccomp fast is probably more valuable
-	 * than the rest of this.
-	 */
-	if (work & _TIF_SECCOMP) {
-		struct seccomp_data sd;
-
-		sd.arch = arch;
-		sd.nr = regs->orig_ax;
-		sd.instruction_pointer = regs->ip;
-#ifdef CONFIG_X86_64
-		if (arch == AUDIT_ARCH_X86_64) {
-			sd.args[0] = regs->di;
-			sd.args[1] = regs->si;
-			sd.args[2] = regs->dx;
-			sd.args[3] = regs->r10;
-			sd.args[4] = regs->r8;
-			sd.args[5] = regs->r9;
-		} else
-#endif
-		{
-			sd.args[0] = regs->bx;
-			sd.args[1] = regs->cx;
-			sd.args[2] = regs->dx;
-			sd.args[3] = regs->si;
-			sd.args[4] = regs->di;
-			sd.args[5] = regs->bp;
-		}
-
-		BUILD_BUG_ON(SECCOMP_PHASE1_OK != 0);
-		BUILD_BUG_ON(SECCOMP_PHASE1_SKIP != 1);
-
-		ret = seccomp_phase1(&sd);
-		if (ret == SECCOMP_PHASE1_SKIP) {
-			regs->orig_ax = -1;
-			ret = 0;
-		} else if (ret != SECCOMP_PHASE1_OK) {
-			return ret;  /* Go directly to phase 2 */
-		}
-
-		work &= ~_TIF_SECCOMP;
-	}
-#endif
-
-	/* Do our best to finish without phase 2. */
-	if (work == 0)
-		return ret;  /* seccomp and/or nohz only (ret == 0 here) */
-
-#ifdef CONFIG_AUDITSYSCALL
-	if (work == _TIF_SYSCALL_AUDIT) {
-		/*
-		 * If there is no more work to be done except auditing,
-		 * then audit in phase 1.  Phase 2 always audits, so, if
-		 * we audit here, then we can't go on to phase 2.
-		 */
-		do_audit_syscall_entry(regs, arch);
-		return 0;
-	}
-#endif
-
-	return 1;  /* Something is enabled that we can't handle in phase 1 */
-}
-
-/* Returns the syscall nr to run (which should match regs->orig_ax). */
-long syscall_trace_enter_phase2(struct pt_regs *regs, u32 arch,
-				unsigned long phase1_result)
-{
-	long ret = 0;
-	u32 work = ACCESS_ONCE(current_thread_info()->flags) &
-		_TIF_WORK_SYSCALL_ENTRY;
-
-	BUG_ON(regs != task_pt_regs(current));
-
-	/*
-	 * If we stepped into a sysenter/syscall insn, it trapped in
-	 * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP.
-	 * If user-mode had set TF itself, then it's still clear from
-	 * do_debug() and we need to set it again to restore the user
-	 * state.  If we entered on the slow path, TF was already set.
-	 */
-	if (work & _TIF_SINGLESTEP)
-		regs->flags |= X86_EFLAGS_TF;
-
-#ifdef CONFIG_SECCOMP
-	/*
-	 * Call seccomp_phase2 before running the other hooks so that
-	 * they can see any changes made by a seccomp tracer.
-	 */
-	if (phase1_result > 1 && seccomp_phase2(phase1_result)) {
-		/* seccomp failures shouldn't expose any additional code. */
-		return -1;
-	}
-#endif
-
-	if (unlikely(work & _TIF_SYSCALL_EMU))
-		ret = -1L;
-
-	if ((ret || test_thread_flag(TIF_SYSCALL_TRACE)) &&
-	    tracehook_report_syscall_entry(regs))
-		ret = -1L;
-
-	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
-		trace_sys_enter(regs, regs->orig_ax);
-
-	do_audit_syscall_entry(regs, arch);
-
-	return ret ?: regs->orig_ax;
-}
-
-long syscall_trace_enter(struct pt_regs *regs)
-{
-	u32 arch = is_ia32_task() ? AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64;
-	unsigned long phase1_result = syscall_trace_enter_phase1(regs, arch);
-
-	if (phase1_result == 0)
-		return regs->orig_ax;
-	else
-		return syscall_trace_enter_phase2(regs, arch, phase1_result);
-}
-
-void syscall_trace_leave(struct pt_regs *regs)
-{
-	bool step;
-
-	/*
-	 * We may come here right after calling schedule_user()
-	 * or do_notify_resume(), in which case we can be in RCU
-	 * user mode.
-	 */
-	user_exit();
-
-	audit_syscall_exit(regs);
-
-	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
-		trace_sys_exit(regs, regs->ax);
-
-	/*
-	 * If TIF_SYSCALL_EMU is set, we only get here because of
-	 * TIF_SINGLESTEP (i.e. this is PTRACE_SYSEMU_SINGLESTEP).
-	 * We already reported this syscall instruction in
-	 * syscall_trace_enter().
-	 */
-	step = unlikely(test_thread_flag(TIF_SINGLESTEP)) &&
-			!test_thread_flag(TIF_SYSCALL_EMU);
-	if (step || test_thread_flag(TIF_SYSCALL_TRACE))
-		tracehook_report_syscall_exit(regs, step);
-
-	user_enter();
-}
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 1ea14fd53933..c0ad42a640b9 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -683,7 +683,7 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
  * want to handle. Thus you cannot kill init even with a SIGKILL even by
  * mistake.
  */
-static void do_signal(struct pt_regs *regs)
+void do_signal(struct pt_regs *regs)
 {
 	struct ksignal ksig;
 
@@ -718,32 +718,6 @@ static void do_signal(struct pt_regs *regs)
 	restore_saved_sigmask();
 }
 
-/*
- * notification of userspace execution resumption
- * - triggered by the TIF_WORK_MASK flags
- */
-__visible void
-do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
-{
-	user_exit();
-
-	if (thread_info_flags & _TIF_UPROBE)
-		uprobe_notify_resume(regs);
-
-	/* deal with pending signal delivery */
-	if (thread_info_flags & _TIF_SIGPENDING)
-		do_signal(regs);
-
-	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
-		clear_thread_flag(TIF_NOTIFY_RESUME);
-		tracehook_notify_resume(regs);
-	}
-	if (thread_info_flags & _TIF_USER_RETURN_NOTIFY)
-		fire_user_return_notifiers();
-
-	user_enter();
-}
-
 void signal_fault(struct pt_regs *regs, void __user *frame, char *where)
 {
 	struct task_struct *me = current;
-- 
2.4.3


* [PATCH v2 05/14] x86/traps: Assert that we're in CONTEXT_KERNEL in exception entries
  2015-06-18 19:08 [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
                   ` (3 preceding siblings ...)
  2015-06-18 19:08 ` [PATCH v2 04/14] x86: Move C entry and exit code to arch/x86/entry/common.c Andy Lutomirski
@ 2015-06-18 19:08 ` Andy Lutomirski
  2015-06-18 19:08 ` [PATCH v2 06/14] x86/entry: Add enter_from_user_mode and use it in syscalls Andy Lutomirski
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-18 19:08 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst, paulmck,
	Andy Lutomirski

Other than the super-atomic exception entries, all exception entries
are supposed to switch our context tracking state to CONTEXT_KERNEL.
Assert that they do.  These assertions appear trivial at this point,
as exception_enter is the function responsible for switching
context, but I'm planning on reworking x86's exception context
tracking, and these assertions will help make sure that all of this
code keeps working.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/traps.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index de379366f6d1..3f947488c9c1 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -291,6 +291,8 @@ static void do_error_trap(struct pt_regs *regs, long error_code, char *str,
 	enum ctx_state prev_state = exception_enter();
 	siginfo_t info;
 
+	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
+
 	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) !=
 			NOTIFY_STOP) {
 		conditional_sti(regs);
@@ -377,6 +379,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
 	siginfo_t *info;
 
 	prev_state = exception_enter();
+	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
 	if (notify_die(DIE_TRAP, "bounds", regs, error_code,
 			X86_TRAP_BR, SIGSEGV) == NOTIFY_STOP)
 		goto exit;
@@ -458,6 +461,7 @@ do_general_protection(struct pt_regs *regs, long error_code)
 	enum ctx_state prev_state;
 
 	prev_state = exception_enter();
+	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
 	conditional_sti(regs);
 
 	if (v8086_mode(regs)) {
@@ -515,6 +519,7 @@ dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
 		return;
 
 	prev_state = ist_enter(regs);
+	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
 #ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
 	if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
 				SIGTRAP) == NOTIFY_STOP)
@@ -794,6 +799,7 @@ dotraplinkage void do_coprocessor_error(struct pt_regs *regs, long error_code)
 	enum ctx_state prev_state;
 
 	prev_state = exception_enter();
+	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
 	math_error(regs, error_code, X86_TRAP_MF);
 	exception_exit(prev_state);
 }
@@ -804,6 +810,7 @@ do_simd_coprocessor_error(struct pt_regs *regs, long error_code)
 	enum ctx_state prev_state;
 
 	prev_state = exception_enter();
+	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
 	math_error(regs, error_code, X86_TRAP_XF);
 	exception_exit(prev_state);
 }
@@ -862,6 +869,7 @@ do_device_not_available(struct pt_regs *regs, long error_code)
 	enum ctx_state prev_state;
 
 	prev_state = exception_enter();
+	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
 	BUG_ON(use_eager_fpu());
 
 #ifdef CONFIG_MATH_EMULATION
@@ -891,6 +899,7 @@ dotraplinkage void do_iret_error(struct pt_regs *regs, long error_code)
 	enum ctx_state prev_state;
 
 	prev_state = exception_enter();
+	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
 	local_irq_enable();
 
 	info.si_signo = SIGILL;
-- 
2.4.3


* [PATCH v2 06/14] x86/entry: Add enter_from_user_mode and use it in syscalls
  2015-06-18 19:08 [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
                   ` (4 preceding siblings ...)
  2015-06-18 19:08 ` [PATCH v2 05/14] x86/traps: Assert that we're in CONTEXT_KERNEL in exception entries Andy Lutomirski
@ 2015-06-18 19:08 ` Andy Lutomirski
  2015-06-18 19:08 ` [PATCH v2 07/14] x86/entry: Add new, comprehensible entry and exit hooks Andy Lutomirski
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-18 19:08 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst, paulmck,
	Andy Lutomirski

Changing the x86 context tracking hooks is dangerous because there
are no good checks that we track our context correctly.  Add a
helper to check that we're actually in CONTEXT_USER when we enter
from user mode and wire it up for syscall entries.

Subsequent patches will wire this up for all non-NMI entries as
well.  NMIs are their own special beast and cannot currently switch
overall context tracking state.  Instead, they have their own
special RCU hooks.

This is a tiny speedup if !CONFIG_CONTEXT_TRACKING (removes a
branch) and a tiny slowdown if CONFIG_CONTEXT_TRACKING (adds a layer
of indirection).  Eventually, we should fix up the core context
tracking code to supply a function that does what we want (and can
be much simpler than user_exit), which will enable us to get rid of
the extra call.
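
As a rough sketch (hypothetical -- nothing like this exists yet), the
core code could provide something like:

	/*
	 * Hypothetical: like user_exit(), but skips the enabled check,
	 * for callers that have already tested a TIF flag.
	 */
	static inline void user_exit_unchecked(void)
	{
		context_tracking_exit(CONTEXT_USER);
	}

which the entry code could call without the extra layer.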

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/common.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 348465473e55..8be5bf36a866 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -28,6 +28,15 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/syscalls.h>
 
+#ifdef CONFIG_CONTEXT_TRACKING
+/* Called on entry from user mode with IRQs off. */
+asmlinkage __visible void enter_from_user_mode(void)
+{
+	CT_WARN_ON(ct_state() != CONTEXT_USER);
+	user_exit();
+}
+#endif
+
 static void do_audit_syscall_entry(struct pt_regs *regs, u32 arch)
 {
 #ifdef CONFIG_X86_64
@@ -65,14 +74,16 @@ unsigned long syscall_trace_enter_phase1(struct pt_regs *regs, u32 arch)
 	work = ACCESS_ONCE(current_thread_info()->flags) &
 		_TIF_WORK_SYSCALL_ENTRY;
 
+#ifdef CONFIG_CONTEXT_TRACKING
 	/*
 	 * If TIF_NOHZ is set, we are required to call user_exit() before
 	 * doing anything that could touch RCU.
 	 */
 	if (work & _TIF_NOHZ) {
-		user_exit();
+		enter_from_user_mode();
 		work &= ~_TIF_NOHZ;
 	}
+#endif
 
 #ifdef CONFIG_SECCOMP
 	/*
-- 
2.4.3


* [PATCH v2 07/14] x86/entry: Add new, comprehensible entry and exit hooks
  2015-06-18 19:08 [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
                   ` (5 preceding siblings ...)
  2015-06-18 19:08 ` [PATCH v2 06/14] x86/entry: Add enter_from_user_mode and use it in syscalls Andy Lutomirski
@ 2015-06-18 19:08 ` Andy Lutomirski
  2015-06-18 19:08 ` [PATCH v2 08/14] x86/entry/64: Really create an error-entry-from-usermode code path Andy Lutomirski
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-18 19:08 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst, paulmck,
	Andy Lutomirski

The current entry and exit code is incomprehensible, appears to work
primarily by luck, and is very difficult to incrementally improve.  Add
new code in preparation for simply deleting the old code.

prepare_exit_to_usermode is a new function that will handle all slow
path exits to user mode.  It is called with IRQs disabled and it
leaves us in a state in which it is safe to immediately return to
user mode.  IRQs must not be re-enabled at any point after
prepare_exit_to_usermode returns and user mode is actually entered.
(We can, of course, fail to enter user mode and treat that failure
as a fresh entry to kernel mode.)  All callers of do_notify_resume
will be migrated to call prepare_exit_to_usermode instead;
prepare_exit_to_usermode needs to do everything that
do_notify_resume does, but it also takes care of scheduling and
context tracking.  Unlike do_notify_resume, it does not need to be
called in a loop.

syscall_return_slowpath is exactly what it sounds like.  It will be
called on any syscall exit slow path.  It will replace
syscall_trace_leave, and it calls prepare_exit_to_usermode on the way
out.
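
As an illustration of the calling convention, patch 12 of this series
ends up invoking it from the interrupt-return path like this:

	retint_user:
		mov	%rsp,%rdi
		call	prepare_exit_to_usermode
		TRACE_IRQS_IRETQ
		SWAPGS
		jmp	restore_regs_and_iret

Only the IRQ-tracing annotation, SWAPGS, and the register-restoring
iret sequence run after prepare_exit_to_usermode returns, which
satisfies the IRQs-off requirement above.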

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/common.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 111 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 8be5bf36a866..402569a0ce35 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -207,6 +207,7 @@ long syscall_trace_enter(struct pt_regs *regs)
 		return syscall_trace_enter_phase2(regs, arch, phase1_result);
 }
 
+/* Deprecated. */
 void syscall_trace_leave(struct pt_regs *regs)
 {
 	bool step;
@@ -237,8 +238,117 @@ void syscall_trace_leave(struct pt_regs *regs)
 	user_enter();
 }
 
+static struct thread_info *pt_regs_to_thread_info(struct pt_regs *regs)
+{
+	unsigned long top_of_stack =
+		(unsigned long)(regs + 1) + TOP_OF_KERNEL_STACK_PADDING;
+	return (struct thread_info *)(top_of_stack - THREAD_SIZE);
+}
+
+/* Called with IRQs disabled. */
+asmlinkage __visible void prepare_exit_to_usermode(struct pt_regs *regs)
+{
+	if (WARN_ON(!irqs_disabled()))
+		local_irq_enable();
+
+	/*
+	 * In order to return to user mode, we need to have IRQs off with
+	 * none of _TIF_SIGPENDING, _TIF_NOTIFY_RESUME, _TIF_USER_RETURN_NOTIFY,
+	 * _TIF_UPROBE, or _TIF_NEED_RESCHED set.  Several of these flags
+	 * can be set at any time on preemptable kernels if we have IRQs on,
+	 * so we need to loop.  Disabling preemption wouldn't help: doing the
+	 * work to clear some of the flags can sleep.
+	 */
+	while (true) {
+		u32 cached_flags =
+			READ_ONCE(pt_regs_to_thread_info(regs)->flags);
+
+		if (!(cached_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME |
+				      _TIF_UPROBE | _TIF_NEED_RESCHED)))
+			break;
+
+		/* We have work to do. */
+		local_irq_enable();
+
+		if (cached_flags & _TIF_NEED_RESCHED)
+			schedule();
+
+		if (cached_flags & _TIF_UPROBE)
+			uprobe_notify_resume(regs);
+
+		/* deal with pending signal delivery */
+		if (cached_flags & _TIF_SIGPENDING)
+			do_signal(regs);
+
+		if (cached_flags & _TIF_NOTIFY_RESUME) {
+			clear_thread_flag(TIF_NOTIFY_RESUME);
+			tracehook_notify_resume(regs);
+		}
+
+		if (cached_flags & _TIF_USER_RETURN_NOTIFY)
+			fire_user_return_notifiers();
+
+		/* Disable IRQs and retry */
+		local_irq_disable();
+	}
+
+	user_enter();
+}
+
+/*
+ * Called with IRQs on and fully valid regs.  Returns with IRQs off in a
+ * state such that we can immediately switch to user mode.
+ */
+asmlinkage __visible void syscall_return_slowpath(struct pt_regs *regs)
+{
+	struct thread_info *ti = pt_regs_to_thread_info(regs);
+	u32 cached_flags = READ_ONCE(ti->flags);
+	bool step;
+
+	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
+
+	if (WARN(irqs_disabled(), "syscall %ld left IRQs disabled",
+		 regs->orig_ax))
+		local_irq_enable();
+
+	/*
+	 * First do one-time work.  If these work items are enabled, we
+	 * want to run them exactly once per syscall exit with IRQs on.
+	 */
+	if (cached_flags & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT |
+			    _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT)) {
+		audit_syscall_exit(regs);
+
+		if (cached_flags & _TIF_SYSCALL_TRACEPOINT)
+			trace_sys_exit(regs, regs->ax);
+
+		/*
+		 * If TIF_SYSCALL_EMU is set, we only get here because of
+		 * TIF_SINGLESTEP (i.e. this is PTRACE_SYSEMU_SINGLESTEP).
+		 * We already reported this syscall instruction in
+		 * syscall_trace_enter().
+		 */
+		step = unlikely(
+			(cached_flags & (_TIF_SINGLESTEP | _TIF_SYSCALL_EMU))
+			== _TIF_SINGLESTEP);
+		if (step || cached_flags & _TIF_SYSCALL_TRACE)
+			tracehook_report_syscall_exit(regs, step);
+	}
+
+#ifdef CONFIG_COMPAT
+	/*
+	 * Compat syscalls set TS_COMPAT.  Make sure we clear it before
+	 * returning to user mode.
+	 */
+	ti->status &= ~TS_COMPAT;
+#endif
+
+	local_irq_disable();
+	prepare_exit_to_usermode(regs);
+}
+
 /*
- * notification of userspace execution resumption
+ * Deprecated notification of userspace execution resumption
  * - triggered by the TIF_WORK_MASK flags
  */
 __visible void
-- 
2.4.3


* [PATCH v2 08/14] x86/entry/64: Really create an error-entry-from-usermode code path
  2015-06-18 19:08 [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
                   ` (6 preceding siblings ...)
  2015-06-18 19:08 ` [PATCH v2 07/14] x86/entry: Add new, comprehensible entry and exit hooks Andy Lutomirski
@ 2015-06-18 19:08 ` Andy Lutomirski
  2015-06-18 19:08 ` [PATCH v2 09/14] x86/entry/64: Migrate 64-bit and compat syscalls to new exit hooks Andy Lutomirski
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-18 19:08 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst, paulmck,
	Andy Lutomirski

In 539f51136500 ("x86/asm/entry/64: Disentangle error_entry/exit
gsbase/ebx/usermode code"), I arranged the code slightly wrong --
IRET faults would skip the code path that was intended to execute on
all error entries from user mode.  Fix it up.

This does not fix a bug, but we'll need it, and it slightly shrinks
the code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 3bb2c4302df1..33acc3dcc281 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1145,7 +1145,11 @@ ENTRY(error_entry)
 	testb	$3, CS+8(%rsp)
 	jz	error_kernelspace
 
-	/* We entered from user mode */
+error_entry_from_usermode:
+	/*
+	 * We entered from user mode or we're pretending to have entered
+	 * from user mode due to an IRET fault.
+	 */
 	SWAPGS
 
 error_entry_done:
@@ -1174,8 +1178,7 @@ error_kernelspace:
 	 * gsbase and proceed.  We'll fix up the exception and land in
 	 * gs_change's error handler with kernel gsbase.
 	 */
-	SWAPGS
-	jmp	error_entry_done
+	jmp	error_entry_from_usermode
 
 bstep_iret:
 	/* Fix truncated RIP */
-- 
2.4.3


* [PATCH v2 09/14] x86/entry/64: Migrate 64-bit and compat syscalls to new exit hooks
  2015-06-18 19:08 [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
                   ` (7 preceding siblings ...)
  2015-06-18 19:08 ` [PATCH v2 08/14] x86/entry/64: Really create an error-entry-from-usermode code path Andy Lutomirski
@ 2015-06-18 19:08 ` Andy Lutomirski
  2015-06-18 19:08 ` [PATCH v2 10/14] x86/asm/entry/64: Save all regs on interrupt entry Andy Lutomirski
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-18 19:08 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst, paulmck,
	Andy Lutomirski

These need to be migrated together, as the compat case used to jump
into the middle of the 64-bit exit code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S        | 68 +++++-----------------------------------
 arch/x86/entry/entry_64_compat.S |  7 ++---
 2 files changed, 10 insertions(+), 65 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 33acc3dcc281..a5044d7a9d43 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -229,6 +229,11 @@ entry_SYSCALL_64_fastpath:
 	 */
 	USERGS_SYSRET64
 
+GLOBAL(int_ret_from_sys_call_irqs_off)
+	TRACE_IRQS_ON
+	ENABLE_INTERRUPTS(CLBR_NONE)
+	jmp int_ret_from_sys_call
+
 	/* Do syscall entry tracing */
 tracesys:
 	movq	%rsp, %rdi
@@ -272,69 +277,10 @@ tracesys_phase2:
  * Has correct iret frame.
  */
 GLOBAL(int_ret_from_sys_call)
-	DISABLE_INTERRUPTS(CLBR_NONE)
-int_ret_from_sys_call_irqs_off: /* jumps come here from the irqs-off SYSRET path */
-	TRACE_IRQS_OFF
-	movl	$_TIF_ALLWORK_MASK, %edi
-	/* edi:	mask to check */
-GLOBAL(int_with_check)
-	LOCKDEP_SYS_EXIT_IRQ
-	GET_THREAD_INFO(%rcx)
-	movl	TI_flags(%rcx), %edx
-	andl	%edi, %edx
-	jnz	int_careful
-	andl	$~TS_COMPAT, TI_status(%rcx)
-	jmp	syscall_return
-
-	/*
-	 * Either reschedule or signal or syscall exit tracking needed.
-	 * First do a reschedule test.
-	 * edx:	work, edi: workmask
-	 */
-int_careful:
-	bt	$TIF_NEED_RESCHED, %edx
-	jnc	int_very_careful
-	TRACE_IRQS_ON
-	ENABLE_INTERRUPTS(CLBR_NONE)
-	pushq	%rdi
-	SCHEDULE_USER
-	popq	%rdi
-	DISABLE_INTERRUPTS(CLBR_NONE)
-	TRACE_IRQS_OFF
-	jmp	int_with_check
-
-	/* handle signals and tracing -- both require a full pt_regs */
-int_very_careful:
-	TRACE_IRQS_ON
-	ENABLE_INTERRUPTS(CLBR_NONE)
 	SAVE_EXTRA_REGS
-	/* Check for syscall exit trace */
-	testl	$_TIF_WORK_SYSCALL_EXIT, %edx
-	jz	int_signal
-	pushq	%rdi
-	leaq	8(%rsp), %rdi			/* &ptregs -> arg1 */
-	call	syscall_trace_leave
-	popq	%rdi
-	andl	$~(_TIF_WORK_SYSCALL_EXIT|_TIF_SYSCALL_EMU), %edi
-	jmp	int_restore_rest
-
-int_signal:
-	testl	$_TIF_DO_NOTIFY_MASK, %edx
-	jz	1f
-	movq	%rsp, %rdi			/* &ptregs -> arg1 */
-	xorl	%esi, %esi			/* oldset -> arg2 */
-	call	do_notify_resume
-1:	movl	$_TIF_WORK_MASK, %edi
-int_restore_rest:
+	movq	%rsp, %rdi
+	call	syscall_return_slowpath	/* returns with IRQs disabled */
 	RESTORE_EXTRA_REGS
-	DISABLE_INTERRUPTS(CLBR_NONE)
-	TRACE_IRQS_OFF
-	jmp	int_with_check
-
-syscall_return:
-	/* The IRETQ could re-enable interrupts: */
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	TRACE_IRQS_IRETQ
 
 	/*
 	 * Try to use SYSRET instead of IRET if we're returning to
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index bb187a6a877c..415afa038edf 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -209,10 +209,10 @@ sysexit_from_sys_call:
 	.endm
 
 	.macro auditsys_exit exit
-	testl	$(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
-	jnz	ia32_ret_from_sys_call
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS(CLBR_NONE)
+	testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
+	jnz ia32_ret_from_sys_call
 	movl	%eax, %esi		/* second arg, syscall return value */
 	cmpl	$-MAX_ERRNO, %eax	/* is it an error ? */
 	jbe	1f
@@ -227,11 +227,10 @@ sysexit_from_sys_call:
 	testl	%edi, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
 	jz	\exit
 	xorl	%eax, %eax		/* Do not leak kernel information */
-	movq	%rax, R11(%rsp)
+	jmp	int_ret_from_sys_call_irqs_off
 	movq	%rax, R10(%rsp)
 	movq	%rax, R9(%rsp)
 	movq	%rax, R8(%rsp)
-	jmp	int_with_check
 	.endm
 
 sysenter_auditsys:
-- 
2.4.3


* [PATCH v2 10/14] x86/asm/entry/64: Save all regs on interrupt entry
  2015-06-18 19:08 [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
                   ` (8 preceding siblings ...)
  2015-06-18 19:08 ` [PATCH v2 09/14] x86/entry/64: Migrate 64-bit and compat syscalls to new exit hooks Andy Lutomirski
@ 2015-06-18 19:08 ` Andy Lutomirski
  2015-06-18 19:08 ` [PATCH v2 11/14] x86/asm/entry/64: Simplify irq stack pt_regs handling Andy Lutomirski
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-18 19:08 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst, paulmck,
	Andy Lutomirski

To prepare for the big rewrite of the error and interrupt exit
paths, we will need pt_regs completely filled in.  It's already
completely filled in when error_exit runs, so rearrange interrupt
handling to match it.  This will slow down interrupt handling very
slightly (eight instructions), but the simplification it enables
will be more than worth it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S | 29 +++++++++--------------------
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a5044d7a9d43..43bf5762443c 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -501,21 +501,13 @@ END(irq_entries_start)
 /* 0(%rsp): ~(interrupt number) */
 	.macro interrupt func
 	cld
-	/*
-	 * Since nothing in interrupt handling code touches r12...r15 members
-	 * of "struct pt_regs", and since interrupts can nest, we can save
-	 * four stack slots and simultaneously provide
-	 * an unwind-friendly stack layout by saving "truncated" pt_regs
-	 * exactly up to rbp slot, without these members.
-	 */
-	ALLOC_PT_GPREGS_ON_STACK -RBP
-	SAVE_C_REGS -RBP
-	/* this goes to 0(%rsp) for unwinder, not for saving the value: */
-	SAVE_EXTRA_REGS_RBP -RBP
+	ALLOC_PT_GPREGS_ON_STACK
+	SAVE_C_REGS
+	SAVE_EXTRA_REGS
 
-	leaq	-RBP(%rsp), %rdi		/* arg1 for \func (pointer to pt_regs) */
+	movq	%rsp,%rdi	/* arg1 for \func (pointer to pt_regs) */
 
-	testb	$3, CS-RBP(%rsp)
+	testb	$3, CS(%rsp)
 	jz	1f
 	SWAPGS
 1:
@@ -552,9 +544,7 @@ ret_from_intr:
 	decl	PER_CPU_VAR(irq_count)
 
 	/* Restore saved previous stack */
-	popq	%rsi
-	/* return code expects complete pt_regs - adjust rsp accordingly: */
-	leaq	-RBP(%rsi), %rsp
+	popq	%rsp
 
 	testb	$3, CS(%rsp)
 	jz	retint_kernel
@@ -579,7 +569,7 @@ retint_swapgs:					/* return to user-space */
 	TRACE_IRQS_IRETQ
 
 	SWAPGS
-	jmp	restore_c_regs_and_iret
+	jmp	restore_regs_and_iret
 
 /* Returning to kernel space */
 retint_kernel:
@@ -603,6 +593,8 @@ retint_kernel:
  * At this label, code paths which return to kernel and to user,
  * which come from interrupts/exception and from syscalls, merge.
  */
+restore_regs_and_iret:
+	RESTORE_EXTRA_REGS
 restore_c_regs_and_iret:
 	RESTORE_C_REGS
 	REMOVE_PT_GPREGS_FROM_STACK 8
@@ -673,12 +665,10 @@ retint_signal:
 	jz	retint_swapgs
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS(CLBR_NONE)
-	SAVE_EXTRA_REGS
 	movq	$-1, ORIG_RAX(%rsp)
 	xorl	%esi, %esi			/* oldset */
 	movq	%rsp, %rdi			/* &pt_regs */
 	call	do_notify_resume
-	RESTORE_EXTRA_REGS
 	DISABLE_INTERRUPTS(CLBR_NONE)
 	TRACE_IRQS_OFF
 	GET_THREAD_INFO(%rcx)
@@ -1158,7 +1148,6 @@ END(error_entry)
  */
 ENTRY(error_exit)
 	movl	%ebx, %eax
-	RESTORE_EXTRA_REGS
 	DISABLE_INTERRUPTS(CLBR_NONE)
 	TRACE_IRQS_OFF
 	testl	%eax, %eax
-- 
2.4.3


* [PATCH v2 11/14] x86/asm/entry/64: Simplify irq stack pt_regs handling
  2015-06-18 19:08 [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
                   ` (9 preceding siblings ...)
  2015-06-18 19:08 ` [PATCH v2 10/14] x86/asm/entry/64: Save all regs on interrupt entry Andy Lutomirski
@ 2015-06-18 19:08 ` Andy Lutomirski
  2015-06-18 19:08 ` [PATCH v2 12/14] x86/asm/entry/64: Migrate error and interrupt exit work to C Andy Lutomirski
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-18 19:08 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst, paulmck,
	Andy Lutomirski

There's no need for both rsi and rdi to point to the original stack.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 43bf5762443c..ab8cbf602d19 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -505,8 +505,6 @@ END(irq_entries_start)
 	SAVE_C_REGS
 	SAVE_EXTRA_REGS
 
-	movq	%rsp,%rdi	/* arg1 for \func (pointer to pt_regs) */
-
 	testb	$3, CS(%rsp)
 	jz	1f
 	SWAPGS
@@ -518,14 +516,14 @@ END(irq_entries_start)
 	 * a little cheaper to use a separate counter in the PDA (short of
 	 * moving irq_enter into assembly, which would be too much work)
 	 */
-	movq	%rsp, %rsi
+	movq	%rsp, %rdi
 	incl	PER_CPU_VAR(irq_count)
 	cmovzq	PER_CPU_VAR(irq_stack_ptr), %rsp
-	pushq	%rsi
+	pushq	%rdi
 	/* We entered an interrupt context - irqs are off: */
 	TRACE_IRQS_OFF
 
-	call	\func
+	call	\func	/* rdi points to pt_regs */
 	.endm
 
 	/*
-- 
2.4.3


* [PATCH v2 12/14] x86/asm/entry/64: Migrate error and interrupt exit work to C
  2015-06-18 19:08 [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
                   ` (10 preceding siblings ...)
  2015-06-18 19:08 ` [PATCH v2 11/14] x86/asm/entry/64: Simplify irq stack pt_regs handling Andy Lutomirski
@ 2015-06-18 19:08 ` Andy Lutomirski
  2015-06-18 19:08 ` [PATCH v2 13/14] x86/entry: Remove exception_enter from trap handlers Andy Lutomirski
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-18 19:08 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst, paulmck,
	Andy Lutomirski

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S | 63 +++++++++++++----------------------------------
 1 file changed, 17 insertions(+), 46 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index ab8cbf602d19..9ae8b8ab91fa 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -507,7 +507,16 @@ END(irq_entries_start)
 
 	testb	$3, CS(%rsp)
 	jz	1f
+
+	/*
+	 * IRQ from user mode.  Switch to kernel gsbase and inform context
+	 * tracking that we're in kernel mode.
+	 */
 	SWAPGS
+#ifdef CONFIG_CONTEXT_TRACKING
+	call enter_from_user_mode
+#endif
+
 1:
 	/*
 	 * Save previous stack pointer, optionally switch to interrupt stack.
@@ -546,26 +555,13 @@ ret_from_intr:
 
 	testb	$3, CS(%rsp)
 	jz	retint_kernel
-	/* Interrupt came from user space */
-retint_user:
-	GET_THREAD_INFO(%rcx)
 
-	/* %rcx: thread info. Interrupts are off. */
-retint_with_reschedule:
-	movl	$_TIF_WORK_MASK, %edi
-retint_check:
+	/* Interrupt came from user space */
 	LOCKDEP_SYS_EXIT_IRQ
-	movl	TI_flags(%rcx), %edx
-	andl	%edi, %edx
-	jnz	retint_careful
-
-retint_swapgs:					/* return to user-space */
-	/*
-	 * The iretq could re-enable interrupts:
-	 */
-	DISABLE_INTERRUPTS(CLBR_ANY)
+retint_user:
+	mov	%rsp,%rdi
+	call	prepare_exit_to_usermode
 	TRACE_IRQS_IRETQ
-
 	SWAPGS
 	jmp	restore_regs_and_iret
 
@@ -643,35 +639,6 @@ native_irq_return_ldt:
 	popq	%rax
 	jmp	native_irq_return_iret
 #endif
-
-	/* edi: workmask, edx: work */
-retint_careful:
-	bt	$TIF_NEED_RESCHED, %edx
-	jnc	retint_signal
-	TRACE_IRQS_ON
-	ENABLE_INTERRUPTS(CLBR_NONE)
-	pushq	%rdi
-	SCHEDULE_USER
-	popq	%rdi
-	GET_THREAD_INFO(%rcx)
-	DISABLE_INTERRUPTS(CLBR_NONE)
-	TRACE_IRQS_OFF
-	jmp	retint_check
-
-retint_signal:
-	testl	$_TIF_DO_NOTIFY_MASK, %edx
-	jz	retint_swapgs
-	TRACE_IRQS_ON
-	ENABLE_INTERRUPTS(CLBR_NONE)
-	movq	$-1, ORIG_RAX(%rsp)
-	xorl	%esi, %esi			/* oldset */
-	movq	%rsp, %rdi			/* &pt_regs */
-	call	do_notify_resume
-	DISABLE_INTERRUPTS(CLBR_NONE)
-	TRACE_IRQS_OFF
-	GET_THREAD_INFO(%rcx)
-	jmp	retint_with_reschedule
-
 END(common_interrupt)
 
 /*
@@ -1086,6 +1053,10 @@ error_entry_from_usermode:
 	 */
 	SWAPGS
 
+#ifdef CONFIG_CONTEXT_TRACKING
+	call enter_from_user_mode
+#endif
+
 error_entry_done:
 	TRACE_IRQS_OFF
 	ret
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 13/14] x86/entry: Remove exception_enter from trap handlers
  2015-06-18 19:08 [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
                   ` (11 preceding siblings ...)
  2015-06-18 19:08 ` [PATCH v2 12/14] x86/asm/entry/64: Migrate error and interrupt exit work to C Andy Lutomirski
@ 2015-06-18 19:08 ` Andy Lutomirski
  2015-06-18 19:08 ` [PATCH v2 14/14] x86/entry: Remove SCHEDULE_USER and asm/context-tracking.h Andy Lutomirski
  2015-06-22 19:50 ` [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
  14 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-18 19:08 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst, paulmck,
	Andy Lutomirski

On 64-bit kernels, we don't need it any more: we handle context
tracking directly on entry from user mode and exit to user mode.  On
32-bit kernels, we don't support context tracking at all, so these
hooks had no effect.
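
To illustrate the resulting pattern, here is a sketch with a
hypothetical trap name (mirroring the real conversions in the diff
below):

dotraplinkage void do_example_trap(struct pt_regs *regs, long error_code)
{
	/*
	 * The low-level entry code has already switched us to
	 * CONTEXT_KERNEL, so instead of bracketing the handler with
	 * exception_enter()/exception_exit() we merely assert the state.
	 */
	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);

	/* ... actual trap handling; plain returns, no 'goto exit' ... */
}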

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/traps.h         |  4 +-
 arch/x86/kernel/cpu/mcheck/mce.c     |  5 +--
 arch/x86/kernel/cpu/mcheck/p5.c      |  5 +--
 arch/x86/kernel/cpu/mcheck/winchip.c |  4 +-
 arch/x86/kernel/traps.c              | 78 +++++++++---------------------------
 5 files changed, 27 insertions(+), 69 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index c5380bea2a36..c3496619740a 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -112,8 +112,8 @@ asmlinkage void smp_threshold_interrupt(void);
 asmlinkage void smp_deferred_error_interrupt(void);
 #endif
 
-extern enum ctx_state ist_enter(struct pt_regs *regs);
-extern void ist_exit(struct pt_regs *regs, enum ctx_state prev_state);
+extern void ist_enter(struct pt_regs *regs);
+extern void ist_exit(struct pt_regs *regs);
 extern void ist_begin_non_atomic(struct pt_regs *regs);
 extern void ist_end_non_atomic(void);
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 5b974c97e31e..29cba90eb96d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1026,7 +1026,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 {
 	struct mca_config *cfg = &mca_cfg;
 	struct mce m, *final;
-	enum ctx_state prev_state;
 	int i;
 	int worst = 0;
 	int severity;
@@ -1052,7 +1051,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	int flags = MF_ACTION_REQUIRED;
 	int lmce = 0;
 
-	prev_state = ist_enter(regs);
+	ist_enter(regs);
 
 	this_cpu_inc(mce_exception_count);
 
@@ -1224,7 +1223,7 @@ out:
 	local_irq_disable();
 	ist_end_non_atomic();
 done:
-	ist_exit(regs, prev_state);
+	ist_exit(regs);
 }
 EXPORT_SYMBOL_GPL(do_machine_check);
 
diff --git a/arch/x86/kernel/cpu/mcheck/p5.c b/arch/x86/kernel/cpu/mcheck/p5.c
index 737b0ad4e61a..12402e10aeff 100644
--- a/arch/x86/kernel/cpu/mcheck/p5.c
+++ b/arch/x86/kernel/cpu/mcheck/p5.c
@@ -19,10 +19,9 @@ int mce_p5_enabled __read_mostly;
 /* Machine check handler for Pentium class Intel CPUs: */
 static void pentium_machine_check(struct pt_regs *regs, long error_code)
 {
-	enum ctx_state prev_state;
 	u32 loaddr, hi, lotype;
 
-	prev_state = ist_enter(regs);
+	ist_enter(regs);
 
 	rdmsr(MSR_IA32_P5_MC_ADDR, loaddr, hi);
 	rdmsr(MSR_IA32_P5_MC_TYPE, lotype, hi);
@@ -39,7 +38,7 @@ static void pentium_machine_check(struct pt_regs *regs, long error_code)
 
 	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
 
-	ist_exit(regs, prev_state);
+	ist_exit(regs);
 }
 
 /* Set up machine check reporting for processors with Intel style MCE: */
diff --git a/arch/x86/kernel/cpu/mcheck/winchip.c b/arch/x86/kernel/cpu/mcheck/winchip.c
index 44f138296fbe..01dd8702880b 100644
--- a/arch/x86/kernel/cpu/mcheck/winchip.c
+++ b/arch/x86/kernel/cpu/mcheck/winchip.c
@@ -15,12 +15,12 @@
 /* Machine check handler for WinChip C6: */
 static void winchip_machine_check(struct pt_regs *regs, long error_code)
 {
-	enum ctx_state prev_state = ist_enter(regs);
+	ist_enter(regs);
 
 	printk(KERN_EMERG "CPU0: Machine Check Exception.\n");
 	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
 
-	ist_exit(regs, prev_state);
+	ist_exit(regs);
 }
 
 /* Set up machine check reporting on the Winchip C6 series */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 3f947488c9c1..459f843e5352 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -107,13 +107,10 @@ static inline void preempt_conditional_cli(struct pt_regs *regs)
 	preempt_count_dec();
 }
 
-enum ctx_state ist_enter(struct pt_regs *regs)
+void ist_enter(struct pt_regs *regs)
 {
-	enum ctx_state prev_state;
-
 	if (user_mode(regs)) {
-		/* Other than that, we're just an exception. */
-		prev_state = exception_enter();
+		CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
 	} else {
 		/*
 		 * We might have interrupted pretty much anything.  In
@@ -122,32 +119,25 @@ enum ctx_state ist_enter(struct pt_regs *regs)
 		 * but we need to notify RCU.
 		 */
 		rcu_nmi_enter();
-		prev_state = CONTEXT_KERNEL;  /* the value is irrelevant. */
 	}
 
 	/*
-	 * We are atomic because we're on the IST stack (or we're on x86_32,
-	 * in which case we still shouldn't schedule).
-	 *
-	 * This must be after exception_enter(), because exception_enter()
-	 * won't do anything if in_interrupt() returns true.
+	 * We are atomic because we're on the IST stack; or we're on
+	 * x86_32, in which case we still shouldn't schedule; or we're
+	 * on x86_64 and entered from user mode, in which case we're
+	 * still atomic unless ist_begin_non_atomic is called.
 	 */
 	preempt_count_add(HARDIRQ_OFFSET);
 
 	/* This code is a bit fragile.  Test it. */
 	rcu_lockdep_assert(rcu_is_watching(), "ist_enter didn't work");
-
-	return prev_state;
 }
 
-void ist_exit(struct pt_regs *regs, enum ctx_state prev_state)
+void ist_exit(struct pt_regs *regs)
 {
-	/* Must be before exception_exit. */
 	preempt_count_sub(HARDIRQ_OFFSET);
 
-	if (user_mode(regs))
-		return exception_exit(prev_state);
-	else
+	if (!user_mode(regs))
 		rcu_nmi_exit();
 }
 
@@ -161,7 +151,7 @@ void ist_exit(struct pt_regs *regs, enum ctx_state prev_state)
  * a double fault, it can be safe to schedule.  ist_begin_non_atomic()
  * begins a non-atomic section within an ist_enter()/ist_exit() region.
  * Callers are responsible for enabling interrupts themselves inside
- * the non-atomic section, and callers must call is_end_non_atomic()
+ * the non-atomic section, and callers must call ist_end_non_atomic()
  * before ist_exit().
  */
 void ist_begin_non_atomic(struct pt_regs *regs)
@@ -288,7 +278,6 @@ NOKPROBE_SYMBOL(do_trap);
 static void do_error_trap(struct pt_regs *regs, long error_code, char *str,
 			  unsigned long trapnr, int signr)
 {
-	enum ctx_state prev_state = exception_enter();
 	siginfo_t info;
 
 	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
@@ -299,8 +288,6 @@ static void do_error_trap(struct pt_regs *regs, long error_code, char *str,
 		do_trap(trapnr, signr, str, regs, error_code,
 			fill_trap_info(regs, signr, trapnr, &info));
 	}
-
-	exception_exit(prev_state);
 }
 
 #define DO_ERROR(trapnr, signr, str, name)				\
@@ -352,7 +339,7 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
 	}
 #endif
 
-	ist_enter(regs);  /* Discard prev_state because we won't return. */
+	ist_enter(regs);
 	notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);
 
 	tsk->thread.error_code = error_code;
@@ -374,15 +361,13 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
 {
 	struct task_struct *tsk = current;
 	struct xsave_struct *xsave_buf;
-	enum ctx_state prev_state;
 	struct bndcsr *bndcsr;
 	siginfo_t *info;
 
-	prev_state = exception_enter();
 	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
 	if (notify_die(DIE_TRAP, "bounds", regs, error_code,
 			X86_TRAP_BR, SIGSEGV) == NOTIFY_STOP)
-		goto exit;
+		return;
 	conditional_sti(regs);
 
 	if (!user_mode(regs))
@@ -439,9 +424,8 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
 		die("bounds", regs, error_code);
 	}
 
-exit:
-	exception_exit(prev_state);
 	return;
+
 exit_trap:
 	/*
 	 * This path out is for all the cases where we could not
@@ -451,36 +435,33 @@ exit_trap:
 	 * time..
 	 */
 	do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
-	exception_exit(prev_state);
 }
 
 dotraplinkage void
 do_general_protection(struct pt_regs *regs, long error_code)
 {
 	struct task_struct *tsk;
-	enum ctx_state prev_state;
 
-	prev_state = exception_enter();
 	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
 	conditional_sti(regs);
 
 	if (v8086_mode(regs)) {
 		local_irq_enable();
 		handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
-		goto exit;
+		return;
 	}
 
 	tsk = current;
 	if (!user_mode(regs)) {
 		if (fixup_exception(regs))
-			goto exit;
+			return;
 
 		tsk->thread.error_code = error_code;
 		tsk->thread.trap_nr = X86_TRAP_GP;
 		if (notify_die(DIE_GPF, "general protection fault", regs, error_code,
 			       X86_TRAP_GP, SIGSEGV) != NOTIFY_STOP)
 			die("general protection fault", regs, error_code);
-		goto exit;
+		return;
 	}
 
 	tsk->thread.error_code = error_code;
@@ -496,16 +477,12 @@ do_general_protection(struct pt_regs *regs, long error_code)
 	}
 
 	force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);
-exit:
-	exception_exit(prev_state);
 }
 NOKPROBE_SYMBOL(do_general_protection);
 
 /* May run on IST stack. */
 dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
 {
-	enum ctx_state prev_state;
-
 #ifdef CONFIG_DYNAMIC_FTRACE
 	/*
 	 * ftrace must be first, everything else may cause a recursive crash.
@@ -518,7 +495,7 @@ dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
 	if (poke_int3_handler(regs))
 		return;
 
-	prev_state = ist_enter(regs);
+	ist_enter(regs);
 	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
 #ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
 	if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
@@ -545,7 +522,7 @@ dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
 	preempt_conditional_cli(regs);
 	debug_stack_usage_dec();
 exit:
-	ist_exit(regs, prev_state);
+	ist_exit(regs);
 }
 NOKPROBE_SYMBOL(do_int3);
 
@@ -621,12 +598,11 @@ NOKPROBE_SYMBOL(fixup_bad_iret);
 dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 {
 	struct task_struct *tsk = current;
-	enum ctx_state prev_state;
 	int user_icebp = 0;
 	unsigned long dr6;
 	int si_code;
 
-	prev_state = ist_enter(regs);
+	ist_enter(regs);
 
 	get_debugreg(dr6, 6);
 
@@ -701,7 +677,7 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	debug_stack_usage_dec();
 
 exit:
-	ist_exit(regs, prev_state);
+	ist_exit(regs);
 }
 NOKPROBE_SYMBOL(do_debug);
 
@@ -796,23 +772,15 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
 
 dotraplinkage void do_coprocessor_error(struct pt_regs *regs, long error_code)
 {
-	enum ctx_state prev_state;
-
-	prev_state = exception_enter();
 	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
 	math_error(regs, error_code, X86_TRAP_MF);
-	exception_exit(prev_state);
 }
 
 dotraplinkage void
 do_simd_coprocessor_error(struct pt_regs *regs, long error_code)
 {
-	enum ctx_state prev_state;
-
-	prev_state = exception_enter();
 	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
 	math_error(regs, error_code, X86_TRAP_XF);
-	exception_exit(prev_state);
 }
 
 dotraplinkage void
@@ -866,9 +834,6 @@ EXPORT_SYMBOL_GPL(math_state_restore);
 dotraplinkage void
 do_device_not_available(struct pt_regs *regs, long error_code)
 {
-	enum ctx_state prev_state;
-
-	prev_state = exception_enter();
 	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
 	BUG_ON(use_eager_fpu());
 
@@ -880,7 +845,6 @@ do_device_not_available(struct pt_regs *regs, long error_code)
 
 		info.regs = regs;
 		math_emulate(&info);
-		exception_exit(prev_state);
 		return;
 	}
 #endif
@@ -888,7 +852,6 @@ do_device_not_available(struct pt_regs *regs, long error_code)
 #ifdef CONFIG_X86_32
 	conditional_sti(regs);
 #endif
-	exception_exit(prev_state);
 }
 NOKPROBE_SYMBOL(do_device_not_available);
 
@@ -896,9 +859,7 @@ NOKPROBE_SYMBOL(do_device_not_available);
 dotraplinkage void do_iret_error(struct pt_regs *regs, long error_code)
 {
 	siginfo_t info;
-	enum ctx_state prev_state;
 
-	prev_state = exception_enter();
 	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
 	local_irq_enable();
 
@@ -911,7 +872,6 @@ dotraplinkage void do_iret_error(struct pt_regs *regs, long error_code)
 		do_trap(X86_TRAP_IRET, SIGILL, "iret exception", regs, error_code,
 			&info);
 	}
-	exception_exit(prev_state);
 }
 #endif
 
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 14/14] x86/entry: Remove SCHEDULE_USER and asm/context-tracking.h
  2015-06-18 19:08 [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
                   ` (12 preceding siblings ...)
  2015-06-18 19:08 ` [PATCH v2 13/14] x86/entry: Remove exception_enter from trap handlers Andy Lutomirski
@ 2015-06-18 19:08 ` Andy Lutomirski
  2015-06-22 19:50 ` [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
  14 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-18 19:08 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst, paulmck,
	Andy Lutomirski

SCHEDULE_USER is no longer used, and asm/context_tracking.h contained
nothing else.  Remove the header entirely.
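
The C exit path runs in CONTEXT_KERNEL until the final return to user
mode, so it can reschedule with a plain schedule() call.  A sketch (the
cached_flags test here is the hypothetical exit-loop shape, not literal
code from the series):

	if (cached_flags & _TIF_NEED_RESCHED)
		schedule();	/* no context-tracking fixup needed */

schedule_user()'s job was, roughly, to repair context-tracking state
around schedule() when invoked from the old assembly path; with the
exit work in C, that is unnecessary, so SCHEDULE_USER can go away.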

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S               |  1 -
 arch/x86/include/asm/context_tracking.h | 10 ----------
 2 files changed, 11 deletions(-)
 delete mode 100644 arch/x86/include/asm/context_tracking.h

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9ae8b8ab91fa..25cc9c36ca59 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -33,7 +33,6 @@
 #include <asm/paravirt.h>
 #include <asm/percpu.h>
 #include <asm/asm.h>
-#include <asm/context_tracking.h>
 #include <asm/smap.h>
 #include <asm/pgtable_types.h>
 #include <linux/err.h>
diff --git a/arch/x86/include/asm/context_tracking.h b/arch/x86/include/asm/context_tracking.h
deleted file mode 100644
index 1fe49704b146..000000000000
--- a/arch/x86/include/asm/context_tracking.h
+++ /dev/null
@@ -1,10 +0,0 @@
-#ifndef _ASM_X86_CONTEXT_TRACKING_H
-#define _ASM_X86_CONTEXT_TRACKING_H
-
-#ifdef CONFIG_CONTEXT_TRACKING
-# define SCHEDULE_USER call schedule_user
-#else
-# define SCHEDULE_USER call schedule
-#endif
-
-#endif
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 03/14] notifiers: Assert that RCU is watching in notify_die
  2015-06-18 19:08 ` [PATCH v2 03/14] notifiers: Assert that RCU is watching in notify_die Andy Lutomirski
@ 2015-06-22 11:36   ` Borislav Petkov
  2015-06-22 16:26     ` Andy Lutomirski
  0 siblings, 1 reply; 27+ messages in thread
From: Borislav Petkov @ 2015-06-22 11:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Frédéric Weisbecker, Rik van Riel,
	Oleg Nesterov, Denys Vlasenko, Kees Cook, Brian Gerst, paulmck

On Thu, Jun 18, 2015 at 12:08:35PM -0700, Andy Lutomirski wrote:
> Low-level arch entries often call notify_die, and it's easy for arch
> code to fail to exit an RCU quiescent state first.  Assert that
> we're not quiescent in notify_die.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  kernel/notifier.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/notifier.c b/kernel/notifier.c
> index ae9fc7cc360e..980e4330fb59 100644
> --- a/kernel/notifier.c
> +++ b/kernel/notifier.c
> @@ -544,6 +544,8 @@ int notrace notify_die(enum die_val val, const char *str,
>  		.signr	= sig,
>  
>  	};
> +	rcu_lockdep_assert(rcu_is_watching(),
> +			   "notify_die called but RCU thinks we're quiescent");
>  	return atomic_notifier_call_chain(&die_chain, val, &args);
>  }

Ok, so we're about to die, and we will prepend what would be the more
important splat, the one possibly hinting at the actual problem, with a
lockdep splat.

I think we should do the assertion but make the rcu_lockdep splat come
last; I don't see how to do this easily from all the notify_die()
call sites, though.

Or am I missing something...?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 03/14] notifiers: Assert that RCU is watching in notify_die
  2015-06-22 11:36   ` Borislav Petkov
@ 2015-06-22 16:26     ` Andy Lutomirski
  2015-06-22 16:33       ` Borislav Petkov
  0 siblings, 1 reply; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-22 16:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kees Cook, Paul E. McKenney, linux-kernel, Oleg Nesterov,
	Denys Vlasenko, Brian Gerst, Frédéric Weisbecker,
	X86 ML, Rik van Riel

On Jun 22, 2015 4:37 AM, "Borislav Petkov" <bp@alien8.de> wrote:
>
> On Thu, Jun 18, 2015 at 12:08:35PM -0700, Andy Lutomirski wrote:
> > Low-level arch entries often call notify_die, and it's easy for arch
> > code to fail to exit an RCU quiescent state first.  Assert that
> > we're not quiescent in notify_die.
> >
> > Signed-off-by: Andy Lutomirski <luto@kernel.org>
> > ---
> >  kernel/notifier.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/notifier.c b/kernel/notifier.c
> > index ae9fc7cc360e..980e4330fb59 100644
> > --- a/kernel/notifier.c
> > +++ b/kernel/notifier.c
> > @@ -544,6 +544,8 @@ int notrace notify_die(enum die_val val, const char *str,
> >               .signr  = sig,
> >
> >       };
> > +     rcu_lockdep_assert(rcu_is_watching(),
> > +                        "notify_die called but RCU thinks we're quiescent");
> >       return atomic_notifier_call_chain(&die_chain, val, &args);
> >  }
>
> Ok, we're about to die and we will prepend what would be a more
> important splat possibly hinting at the problem is with a lockdep splat.
>
> I think we should do the assertion and make the rcu_lockdep splat come
> last I but don't see how to do this easily from all the notify_die()
> call sites.
>
> Or am I missing something...?

notify_die is misnamed and has little to do with death.  It's really
just notifying about an exception, and we might end up oopsing,
sending a signal, or neither.

It's unfortunate that context tracking state isn't NMI-safe, forcing
us to differentiate rcu_is_watching (set by rcu_nmi_enter) from
ct_state() != CONTEXT_USER.
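
For illustration, the two different checks this series uses (a sketch,
not literal code from the patches):

	/* NMI-safe: relies only on RCU's own state. */
	rcu_lockdep_assert(rcu_is_watching(),
			   "entered exception with RCU quiescent");

	/* Stronger, but not NMI-safe: checks context-tracking state. */
	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);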

Also, I just realized that this whole series has a minor issue in that
there's a race where an IRQ hits after syscall entry but before
context tracking.  I'll fix it up.  (I think the impact is limited to
a warning.)

--Andy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 03/14] notifiers: Assert that RCU is watching in notify_die
  2015-06-22 16:26     ` Andy Lutomirski
@ 2015-06-22 16:33       ` Borislav Petkov
  2015-06-22 17:03         ` Andy Lutomirski
  2015-06-23  8:56         ` Ingo Molnar
  0 siblings, 2 replies; 27+ messages in thread
From: Borislav Petkov @ 2015-06-22 16:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Paul E. McKenney, linux-kernel, Oleg Nesterov,
	Denys Vlasenko, Brian Gerst, Frédéric Weisbecker,
	X86 ML, Rik van Riel

On Mon, Jun 22, 2015 at 09:26:13AM -0700, Andy Lutomirski wrote:
> notify_die is misnamed and has little to do with death. It's really
> just notifying about an exception, and we might end up oopsing,
> sending a signal, or neither.

But if we oops and wedge solid afterwards, it might happen that only the
first splat comes out on the console, no? And that will be the lockdep
splat, which would be useless for debugging the actual problem...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 03/14] notifiers: Assert that RCU is watching in notify_die
  2015-06-22 16:33       ` Borislav Petkov
@ 2015-06-22 17:03         ` Andy Lutomirski
  2015-06-22 17:24           ` Borislav Petkov
  2015-06-23  8:56         ` Ingo Molnar
  1 sibling, 1 reply; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-22 17:03 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kees Cook, Paul E. McKenney, linux-kernel, Oleg Nesterov,
	Denys Vlasenko, Brian Gerst, Frédéric Weisbecker,
	X86 ML, Rik van Riel

On Mon, Jun 22, 2015 at 9:33 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Jun 22, 2015 at 09:26:13AM -0700, Andy Lutomirski wrote:
>> notify_die is misnamed and has little to do with death. It's really
>> just notifying about an exception, and we might end up oopsing,
>> sending a signal, or neither.
>
> But if we oops and wedge solid afterwards, it might happen that only the
> first splat comes out on the console, no? And that will be the lockdep
> splat which would be useless for debugging the actual problem...
>

The rcu_lockdep_assert should be merely a warning, not a full OOPS.  I
think that, if rcu_lockdep_assert hangs, then we should fix that
rather than avoiding debugging checks.
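
For reference, a from-memory sketch of rcu_lockdep_assert() in kernels
of this era (details may differ, but the point stands: it prints a
lockdep splat via lockdep_rcu_suspicious() and then continues, rather
than oopsing or halting):

	#define rcu_lockdep_assert(c, s)				\
		do {							\
			static bool __section(.data.unlikely) __warned;	\
			if (debug_lockdep_rcu_enabled() &&		\
			    !__warned && !(c)) {			\
				__warned = true;			\
				lockdep_rcu_suspicious(__FILE__,	\
						       __LINE__, s);	\
			}						\
		} while (0)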

--Andy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 03/14] notifiers: Assert that RCU is watching in notify_die
  2015-06-22 17:03         ` Andy Lutomirski
@ 2015-06-22 17:24           ` Borislav Petkov
  2015-06-22 17:37             ` Andy Lutomirski
  0 siblings, 1 reply; 27+ messages in thread
From: Borislav Petkov @ 2015-06-22 17:24 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Paul E. McKenney, linux-kernel, Oleg Nesterov,
	Denys Vlasenko, Brian Gerst, Frédéric Weisbecker,
	X86 ML, Rik van Riel

On Mon, Jun 22, 2015 at 10:03:30AM -0700, Andy Lutomirski wrote:
> The rcu_lockdep_assert should be merely a warning, not a full OOPS.

It is still pretty huge, see below.

> I think that, if rcu_lockdep_assert hangs, then we should fix that
> rather than avoiding debugging checks.

The RCU assertion firing might be unrelated to the oops happening and
could prevent us from seeing the real splat.

[    0.048815] 
[    0.050493] ===============================
[    0.052005] [ INFO: suspicious RCU usage. ]
[    0.056007] 4.1.0-rc8+ #4 Not tainted
[    0.060005] -------------------------------
[    0.064005] arch/x86/kernel/cpu/amd.c:677 BOINK!
[    0.066758] 
[    0.066758] other info that might help us debug this:
[    0.066758] 
[    0.068006] 
[    0.068006] rcu_scheduler_active = 0, debug_locks = 0
[    0.072005] no locks held by swapper/0/0.
[    0.076005] 
[    0.076005] stack backtrace:
[    0.080006] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-rc8+ #4
[    0.083331] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[    0.084021]  0000000000000000 ffffffff81967eb8 ffffffff816709c7 0000000000000000
[    0.092005]  ffffffff81975580 ffffffff81967ee8 ffffffff8109e8cd 0000000000000000
[    0.097227]  ffffffff81a3aec0 ffffffff81cad9c0 ffffffff81cb42c0 ffffffff81967f38
[    0.104005] Call Trace:
[    0.106021]  [<ffffffff816709c7>] dump_stack+0x4f/0x7b
[    0.108007]  [<ffffffff8109e8cd>] lockdep_rcu_suspicious+0xfd/0x130
[    0.112007]  [<ffffffff81017f74>] init_amd+0x34/0x560
[    0.116007]  [<ffffffff810164e2>] identify_cpu+0x242/0x3b0
[    0.119068]  [<ffffffff81c27172>] identify_boot_cpu+0x10/0x7e
[    0.120006]  [<ffffffff81c27214>] check_bugs+0x9/0x2d
[    0.124007]  [<ffffffff81c1fe8e>] start_kernel+0x40e/0x425
[    0.128007]  [<ffffffff81c1f495>] x86_64_start_reservations+0x2a/0x2c
[    0.132009]  [<ffffffff81c1f582>] x86_64_start_kernel+0xeb/0xef

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 03/14] notifiers: Assert that RCU is watching in notify_die
  2015-06-22 17:24           ` Borislav Petkov
@ 2015-06-22 17:37             ` Andy Lutomirski
  2015-06-22 18:15               ` Borislav Petkov
  0 siblings, 1 reply; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-22 17:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kees Cook, Paul E. McKenney, linux-kernel, Oleg Nesterov,
	Denys Vlasenko, Brian Gerst, Frédéric Weisbecker,
	X86 ML, Rik van Riel

On Mon, Jun 22, 2015 at 10:24 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Jun 22, 2015 at 10:03:30AM -0700, Andy Lutomirski wrote:
>> The rcu_lockdep_assert should be merely a warning, not a full OOPS.
>
> It is still pretty huge, see below.
>
>> I think that, if rcu_lockdep_assert hangs, then we should fix that
>> rather than avoiding debugging checks.
>
> The RCU assertion firing might be unrelated to the oops happening and
> could prevent us from seeing the real splat.
>
> [    0.048815]
> [    0.050493] ===============================
> [    0.052005] [ INFO: suspicious RCU usage. ]
> [    0.056007] 4.1.0-rc8+ #4 Not tainted
> [    0.060005] -------------------------------
> [    0.064005] arch/x86/kernel/cpu/amd.c:677 BOINK!
> [    0.066758]
> [    0.066758] other info that might help us debug this:
> [    0.066758]
> [    0.068006]
> [    0.068006] rcu_scheduler_active = 0, debug_locks = 0
> [    0.072005] no locks held by swapper/0/0.
> [    0.076005]
> [    0.076005] stack backtrace:
> [    0.080006] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-rc8+ #4
> [    0.083331] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
> [    0.084021]  0000000000000000 ffffffff81967eb8 ffffffff816709c7 0000000000000000
> [    0.092005]  ffffffff81975580 ffffffff81967ee8 ffffffff8109e8cd 0000000000000000
> [    0.097227]  ffffffff81a3aec0 ffffffff81cad9c0 ffffffff81cb42c0 ffffffff81967f38
> [    0.104005] Call Trace:
> [    0.106021]  [<ffffffff816709c7>] dump_stack+0x4f/0x7b
> [    0.108007]  [<ffffffff8109e8cd>] lockdep_rcu_suspicious+0xfd/0x130
> [    0.112007]  [<ffffffff81017f74>] init_amd+0x34/0x560
> [    0.116007]  [<ffffffff810164e2>] identify_cpu+0x242/0x3b0
> [    0.119068]  [<ffffffff81c27172>] identify_boot_cpu+0x10/0x7e
> [    0.120006]  [<ffffffff81c27214>] check_bugs+0x9/0x2d
> [    0.124007]  [<ffffffff81c1fe8e>] start_kernel+0x40e/0x425
> [    0.128007]  [<ffffffff81c1f495>] x86_64_start_reservations+0x2a/0x2c
> [    0.132009]  [<ffffffff81c1f582>] x86_64_start_kernel+0xeb/0xef

But if we OOPS, we'll OOPS after the lockdep splat and the lockdep
splat will scroll off the screen, right?  Am I missing something here?

notify_die is called before the actual OOPS code is invoked in traps.c.

--Andy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 03/14] notifiers: Assert that RCU is watching in notify_die
  2015-06-22 17:37             ` Andy Lutomirski
@ 2015-06-22 18:15               ` Borislav Petkov
  2015-06-22 19:48                 ` Andy Lutomirski
  0 siblings, 1 reply; 27+ messages in thread
From: Borislav Petkov @ 2015-06-22 18:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Paul E. McKenney, linux-kernel, Oleg Nesterov,
	Denys Vlasenko, Brian Gerst, Frédéric Weisbecker,
	X86 ML, Rik van Riel

On Mon, Jun 22, 2015 at 10:37:39AM -0700, Andy Lutomirski wrote:
> But if we OOPS, we'll OOPS after the lockdep splat and the lockdep
> splat will scroll off the screen, right?  Am I missing something here?

No, you're not.

> notify_die is called before the actual OOPS code is invoked in traps.c.

Yes, and with this assertion, you get to potentially print two
dump_stack()'s back-to-back instead of the one from traps.c.

And if the machine is about to be wedged solid soon anyway, we want to
dump as little (not-so-important) blurb to serial/console as possible. And
in this case, my suspicion is not that the lockdep splat will scroll
off the screen but that we might freeze before we even issue the whole
thing.

That's why I think we should be conservative and make the lockdep splat
come out second, if possible.

Am I making more sense now?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 03/14] notifiers: Assert that RCU is watching in notify_die
  2015-06-22 18:15               ` Borislav Petkov
@ 2015-06-22 19:48                 ` Andy Lutomirski
  0 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-22 19:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kees Cook, Paul E. McKenney, linux-kernel, Oleg Nesterov,
	Denys Vlasenko, Brian Gerst, Frédéric Weisbecker,
	X86 ML, Rik van Riel

On Mon, Jun 22, 2015 at 11:15 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Jun 22, 2015 at 10:37:39AM -0700, Andy Lutomirski wrote:
>> But if we OOPS, we'll OOPS after the lockdep splat and the lockdep
>> splat will scroll off the screen, right?  Am I missing something here?
>
> No, you're not.
>
>> notify_die is called before the actual OOPS code is invoked in traps.c.
>
> Yes, and with this assertion, you get to potentially print two
> dump_stack()'s back-to-back instead of the one from traps.c.
>
> And if the machine is about to be wedged solid soon anyway, we want to
> dump as less (not-so-important) blurb to serial/console as possible. And
> in this case, my suspicion is not that the lockdep splat will scroll
> off the screen but that we might freeze before we even issue the whole
> thing.
>
> That's why I think we should be conservative and make the lockdep splat
> come out second, if possible.

That'll annoy people using regular consoles, though.

I think this scenario isn't that likely.  If we dereference a NULL
pointer, then we really should have RCU watching before we actually
oops in the page fault code.  Similarly, if we take a non-fixed-up GPF,
we should have RCU watching in the early part of
do_general_protection.

I'd be all for skipping the assertion entirely if we're going to OOPS,
but we don't know whether we're actually OOPSing when notify_die is
called.  We could individually instrument everything, or we could just
drop this patch entirely, but it has helped me catch some goofs while
developing all this code.

--Andy

>
> Am I making more sense now?
>
> --
> Regards/Gruss,
>     Boris.
>
> ECO tip #101: Trim your mails when you reply.
> --



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 00/14] x86: Rewrite exit-to-userspace code
  2015-06-18 19:08 [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
                   ` (13 preceding siblings ...)
  2015-06-18 19:08 ` [PATCH v2 14/14] x86/entry: Remove SCHEDULE_USER and asm/context-tracking.h Andy Lutomirski
@ 2015-06-22 19:50 ` Andy Lutomirski
  2015-06-23  5:32   ` Andy Lutomirski
  14 siblings, 1 reply; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-22 19:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Frédéric Weisbecker,
	Rik van Riel, Oleg Nesterov, Denys Vlasenko, Borislav Petkov,
	Kees Cook, Brian Gerst, Paul McKenney

On Thu, Jun 18, 2015 at 12:08 PM, Andy Lutomirski <luto@kernel.org> wrote:
> This is the first big batch of x86 asm-to-C conversion patches.

Ingo, what's the plan for these?  I found a minor design error that's
harmless so far (I think), but I don't know whether I should fix it as
a follow-up or do a v3.

--Andy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 00/14] x86: Rewrite exit-to-userspace code
  2015-06-22 19:50 ` [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
@ 2015-06-23  5:32   ` Andy Lutomirski
  0 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2015-06-23  5:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Frédéric Weisbecker,
	Rik van Riel, Oleg Nesterov, Denys Vlasenko, Borislav Petkov,
	Kees Cook, Brian Gerst, Paul McKenney

On Mon, Jun 22, 2015 at 12:50 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Jun 18, 2015 at 12:08 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> This is the first big batch of x86 asm-to-C conversion patches.
>
> Ingo, what's the plan for these?  I found a minor design error that's
> harmless so far (I think), but I don't know whether I should fix it as
> a follow-up or do a v3.

Never mind.  Don't apply them.  I found a real bug.

--Andy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 03/14] notifiers: Assert that RCU is watching in notify_die
  2015-06-22 16:33       ` Borislav Petkov
  2015-06-22 17:03         ` Andy Lutomirski
@ 2015-06-23  8:56         ` Ingo Molnar
  2015-06-23 11:08           ` Borislav Petkov
  1 sibling, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2015-06-23  8:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Kees Cook, Paul E. McKenney, linux-kernel,
	Oleg Nesterov, Denys Vlasenko, Brian Gerst,
	Frédéric Weisbecker, X86 ML, Rik van Riel


* Borislav Petkov <bp@alien8.de> wrote:

> On Mon, Jun 22, 2015 at 09:26:13AM -0700, Andy Lutomirski wrote:
>
> > notify_die is misnamed and has little to do with death. It's really just 
> > notifying about an exception, and we might end up oopsing, sending a signal, 
> > or neither.
> 
> But if we oops and wedge solid afterwards, it might happen that only the first 
> splat comes out on the console, no? And that will be the lockdep splat which 
> would be useless for debugging the actual problem...

So I think the theory is that crashes do happen, and that any RCU warning only
matters in (usually) small race windows.

So by the time a difficult crash truly happens, exactly in that race window, we'd 
have fixed the RCU warning long ago.

I.e. the placement of the RCU warning isn't really relevant in the long run, as it 
should not trigger.

In the short run it's probably more important to have it first, because if we have 
that RCU race then we don't know whether we can trust anything that happens after 
executing the (flawed) notifier chain.

Does that logic make sense to you?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 03/14] notifiers: Assert that RCU is watching in notify_die
  2015-06-23  8:56         ` Ingo Molnar
@ 2015-06-23 11:08           ` Borislav Petkov
  0 siblings, 0 replies; 27+ messages in thread
From: Borislav Petkov @ 2015-06-23 11:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andy Lutomirski, Kees Cook, Paul E. McKenney, linux-kernel,
	Oleg Nesterov, Denys Vlasenko, Brian Gerst,
	Frédéric Weisbecker, X86 ML, Rik van Riel

On Tue, Jun 23, 2015 at 10:56:24AM +0200, Ingo Molnar wrote:
> So I think the theory is that crashes do happen, and that any RCU warning only 
> matters to (usually) small race windows.
> 
> So by the time a difficult crash truly happens, exactly in that race window, we'd 
> have fixed the RCU warning long ago.
> 
> I.e. the placement of the RCU warning isn't really relevant in the long run, as it 
> should not trigger.
> 
> In the short run it's probably more important to have it first, because if we have 
> that RCU race then we don't know whether we can trust anything that happens after 
> executing the (flawed) notifier chain.
> 
> Does that logic make sense to you?

Yap, it does actually. Nice :)

Thanks!

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 27+ messages in thread

Thread overview: 27+ messages
2015-06-18 19:08 [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
2015-06-18 19:08 ` [PATCH v2 01/14] uml: Fix do_signal() prototype Andy Lutomirski
2015-06-18 19:08 ` [PATCH v2 02/14] context_tracking: Add ct_state and CT_WARN_ON Andy Lutomirski
2015-06-18 19:08 ` [PATCH v2 03/14] notifiers: Assert that RCU is watching in notify_die Andy Lutomirski
2015-06-22 11:36   ` Borislav Petkov
2015-06-22 16:26     ` Andy Lutomirski
2015-06-22 16:33       ` Borislav Petkov
2015-06-22 17:03         ` Andy Lutomirski
2015-06-22 17:24           ` Borislav Petkov
2015-06-22 17:37             ` Andy Lutomirski
2015-06-22 18:15               ` Borislav Petkov
2015-06-22 19:48                 ` Andy Lutomirski
2015-06-23  8:56         ` Ingo Molnar
2015-06-23 11:08           ` Borislav Petkov
2015-06-18 19:08 ` [PATCH v2 04/14] x86: Move C entry and exit code to arch/x86/entry/common.c Andy Lutomirski
2015-06-18 19:08 ` [PATCH v2 05/14] x86/traps: Assert that we're in CONTEXT_KERNEL in exception entries Andy Lutomirski
2015-06-18 19:08 ` [PATCH v2 06/14] x86/entry: Add enter_from_user_mode and use it in syscalls Andy Lutomirski
2015-06-18 19:08 ` [PATCH v2 07/14] x86/entry: Add new, comprehensible entry and exit hooks Andy Lutomirski
2015-06-18 19:08 ` [PATCH v2 08/14] x86/entry/64: Really create an error-entry-from-usermode code path Andy Lutomirski
2015-06-18 19:08 ` [PATCH v2 09/14] x86/entry/64: Migrate 64-bit and compat syscalls to new exit hooks Andy Lutomirski
2015-06-18 19:08 ` [PATCH v2 10/14] x86/asm/entry/64: Save all regs on interrupt entry Andy Lutomirski
2015-06-18 19:08 ` [PATCH v2 11/14] x86/asm/entry/64: Simplify irq stack pt_regs handling Andy Lutomirski
2015-06-18 19:08 ` [PATCH v2 12/14] x86/asm/entry/64: Migrate error and interrupt exit work to C Andy Lutomirski
2015-06-18 19:08 ` [PATCH v2 13/14] x86/entry: Remove exception_enter from trap handlers Andy Lutomirski
2015-06-18 19:08 ` [PATCH v2 14/14] x86/entry: Remove SCHEDULE_USER and asm/context-tracking.h Andy Lutomirski
2015-06-22 19:50 ` [PATCH v2 00/14] x86: Rewrite exit-to-userspace code Andy Lutomirski
2015-06-23  5:32   ` Andy Lutomirski
