[PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
@ 2014-07-29  3:38 ` Andy Lutomirski
  0 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-29  3:38 UTC (permalink / raw)
  To: linux-kernel, Kees Cook, Will Drewry
  Cc: Oleg Nesterov, x86, linux-arm-kernel, linux-mips, linux-arch,
	linux-security-module, Alexei Starovoitov, hpa,
	Frederic Weisbecker, Andy Lutomirski

This applies to:
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git seccomp-fastpath

Gitweb:
https://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=seccomp/fastpath

This is both a cleanup and a speedup.  It reduces overhead due to
installing a trivial seccomp filter by 87%.  The speedup comes from
avoiding the full syscall tracing mechanism for filters that don't
return SECCOMP_RET_TRACE.

This series depends on splitting the seccomp hooks into two phases.
The first phase evaluates the filter; it can skip syscalls, allow
them, kill the calling task, or pass a u32 to the second phase.  The
second phase requires a full tracing context, and it sends ptrace
events if necessary.  The seccomp core part is in Kees' seccomp/fastpath
tree.

These patches implement a similar split for the x86 syscall
entry work.  The C callback is invoked in two phases: the first has
only a partial frame, and it can request phase 2 processing with a
full frame.

Finally, I switch the 64-bit system_call code to use the new split
entry work.  This is a net deletion of assembly code: it replaces
all of the audit entry muck.

In the process, I fixed some bugs.

If this is acceptable, someone can do the same tweak for the
ia32entry and entry_32 code.

This passes all seccomp tests that I know of.

Changes from v3:
 - Dropped the core seccomp changes from the email -- Kees has applied them.
 - Add patch 2 (the TIF_NOHZ change).
 - Fix TIF_NOHZ in the two-phase entry code (thanks, Oleg).

Changes from v2:
 - Fixed 32-bit x86 build (and the tests pass).
 - Put the doc patch where it belongs.

Changes from v1:
 - Rebased on top of Kees' shiny new seccomp tree (no effect on the x86
   part).
 - Improved patch 6 vs patch 7 split (thanks Alexei!)
 - Fixed bogus -ENOSYS in patch 5 (thanks Kees!)
 - Improved changelog message in patch 6.

Changes from RFC version:
 - The first three patches are more or less the same
 - The rest is more or less a rewrite

Andy Lutomirski (5):
  x86,x32,audit: Fix x32's AUDIT_ARCH wrt audit
  x86,entry: Only call user_exit if TIF_NOHZ
  x86: Split syscall_trace_enter into two phases
  x86_64,entry: Treat regs->ax the same in fastpath and slowpath
    syscalls
  x86_64,entry: Use split-phase syscall_trace_enter for 64-bit syscalls

 arch/x86/include/asm/calling.h |   6 +-
 arch/x86/include/asm/ptrace.h  |   5 ++
 arch/x86/kernel/entry_64.S     |  51 +++++--------
 arch/x86/kernel/ptrace.c       | 159 ++++++++++++++++++++++++++++++++++-------
 4 files changed, 161 insertions(+), 60 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
@ 2014-07-29  3:38 ` Andy Lutomirski
  0 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-29  3:38 UTC (permalink / raw)
  To: linux-arm-kernel

This applies to:
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git seccomp-fastpath

Gitweb:
https://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=seccomp/fastpath

This is both a cleanup and a speedup.  It reduces overhead due to
installing a trivial seccomp filter by 87%.  The speedup comes from
avoiding the full syscall tracing mechanism for filters that don't
return SECCOMP_RET_TRACE.

This series depends on splitting the seccomp hooks into two phases.
The first phase evaluates the filter; it can skip syscalls, allow
them, kill the calling task, or pass a u32 to the second phase.  The
second phase requires a full tracing context, and it sends ptrace
events if necessary.  The seccomp core part is in Kees' seccomp/fastpath
tree.

These patches implement a similar split for the x86 syscall
entry work.  The C callback is invoked in two phases: the first has
only a partial frame, and it can request phase 2 processing with a
full frame.

Finally, I switch the 64-bit system_call code to use the new split
entry work.  This is a net deletion of assembly code: it replaces
all of the audit entry muck.

In the process, I fixed some bugs.

If this is acceptable, someone can do the same tweak for the
ia32entry and entry_32 code.

This passes all seccomp tests that I know of.

Changes from v3:
 - Dropped the core seccomp changes from the email -- Kees has applied them.
 - Add patch 2 (the TIF_NOHZ change).
 - Fix TIF_NOHZ in the two-phase entry code (thanks, Oleg).

Changes from v2:
 - Fixed 32-bit x86 build (and the tests pass).
 - Put the doc patch where it belongs.

Changes from v1:
 - Rebased on top of Kees' shiny new seccomp tree (no effect on the x86
   part).
 - Improved patch 6 vs patch 7 split (thanks Alexei!)
 - Fixed bogus -ENOSYS in patch 5 (thanks Kees!)
 - Improved changelog message in patch 6.

Changes from RFC version:
 - The first three patches are more or less the same
 - The rest is more or less a rewrite

Andy Lutomirski (5):
  x86,x32,audit: Fix x32's AUDIT_ARCH wrt audit
  x86,entry: Only call user_exit if TIF_NOHZ
  x86: Split syscall_trace_enter into two phases
  x86_64,entry: Treat regs->ax the same in fastpath and slowpath
    syscalls
  x86_64,entry: Use split-phase syscall_trace_enter for 64-bit syscalls

 arch/x86/include/asm/calling.h |   6 +-
 arch/x86/include/asm/ptrace.h  |   5 ++
 arch/x86/kernel/entry_64.S     |  51 +++++--------
 arch/x86/kernel/ptrace.c       | 159 ++++++++++++++++++++++++++++++++++-------
 4 files changed, 161 insertions(+), 60 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 1/5] x86,x32,audit: Fix x32's AUDIT_ARCH wrt audit
  2014-07-29  3:38 ` Andy Lutomirski
@ 2014-07-29  3:38   ` Andy Lutomirski
  -1 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-29  3:38 UTC (permalink / raw)
  To: linux-kernel, Kees Cook, Will Drewry
  Cc: Oleg Nesterov, x86, linux-arm-kernel, linux-mips, linux-arch,
	linux-security-module, Alexei Starovoitov, hpa,
	Frederic Weisbecker, Andy Lutomirski

is_compat_task() is the wrong check for audit arch; the check should
be is_ia32_task(): x32 syscalls should be AUDIT_ARCH_X86_64, not
AUDIT_ARCH_I386.

CONFIG_AUDITSYSCALL is currently incompatible with x32, so this has
no visible effect.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/kernel/ptrace.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 93c182a..39296d2 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1441,15 +1441,6 @@ void send_sigtrap(struct task_struct *tsk, struct pt_regs *regs,
 	force_sig_info(SIGTRAP, &info, tsk);
 }
 
-
-#ifdef CONFIG_X86_32
-# define IS_IA32	1
-#elif defined CONFIG_IA32_EMULATION
-# define IS_IA32	is_compat_task()
-#else
-# define IS_IA32	0
-#endif
-
 /*
  * We must return the syscall number to actually look up in the table.
  * This can be -1L to skip running any syscall at all.
@@ -1487,7 +1478,7 @@ long syscall_trace_enter(struct pt_regs *regs)
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->orig_ax);
 
-	if (IS_IA32)
+	if (is_ia32_task())
 		audit_syscall_entry(AUDIT_ARCH_I386,
 				    regs->orig_ax,
 				    regs->bx, regs->cx,
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v4 1/5] x86,x32,audit: Fix x32's AUDIT_ARCH wrt audit
@ 2014-07-29  3:38   ` Andy Lutomirski
  0 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-29  3:38 UTC (permalink / raw)
  To: linux-arm-kernel

is_compat_task() is the wrong check for audit arch; the check should
be is_ia32_task(): x32 syscalls should be AUDIT_ARCH_X86_64, not
AUDIT_ARCH_I386.

CONFIG_AUDITSYSCALL is currently incompatible with x32, so this has
no visible effect.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/kernel/ptrace.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 93c182a..39296d2 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1441,15 +1441,6 @@ void send_sigtrap(struct task_struct *tsk, struct pt_regs *regs,
 	force_sig_info(SIGTRAP, &info, tsk);
 }
 
-
-#ifdef CONFIG_X86_32
-# define IS_IA32	1
-#elif defined CONFIG_IA32_EMULATION
-# define IS_IA32	is_compat_task()
-#else
-# define IS_IA32	0
-#endif
-
 /*
  * We must return the syscall number to actually look up in the table.
  * This can be -1L to skip running any syscall at all.
@@ -1487,7 +1478,7 @@ long syscall_trace_enter(struct pt_regs *regs)
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->orig_ax);
 
-	if (IS_IA32)
+	if (is_ia32_task())
 		audit_syscall_entry(AUDIT_ARCH_I386,
 				    regs->orig_ax,
 				    regs->bx, regs->cx,
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
  2014-07-29  3:38 ` Andy Lutomirski
@ 2014-07-29  3:38   ` Andy Lutomirski
  -1 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-29  3:38 UTC (permalink / raw)
  To: linux-kernel, Kees Cook, Will Drewry
  Cc: Oleg Nesterov, x86, linux-arm-kernel, linux-mips, linux-arch,
	linux-security-module, Alexei Starovoitov, hpa,
	Frederic Weisbecker, Andy Lutomirski

The RCU context tracking code requires that arch code call
user_exit() on any entry into kernel code if TIF_NOHZ is set.  This
patch adds a check for TIF_NOHZ and a comment to the syscall entry
tracing code.

The main purpose of this patch is to make the code easier to follow:
one can read the body of user_exit and of every function it calls
without finding any explanation of why it's called for traced
syscalls but not for untraced syscalls.  This makes it clear when
user_exit() is necessary.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/kernel/ptrace.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 39296d2..bbf338a 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1449,7 +1449,12 @@ long syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
 
-	user_exit();
+	/*
+	 * If TIF_NOHZ is set, we are required to call user_exit() before
+	 * doing anything that could touch RCU.
+	 */
+	if (test_thread_flag(TIF_NOHZ))
+		user_exit();
 
 	/*
 	 * If we stepped into a sysenter/syscall insn, it trapped in
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
@ 2014-07-29  3:38   ` Andy Lutomirski
  0 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-29  3:38 UTC (permalink / raw)
  To: linux-arm-kernel

The RCU context tracking code requires that arch code call
user_exit() on any entry into kernel code if TIF_NOHZ is set.  This
patch adds a check for TIF_NOHZ and a comment to the syscall entry
tracing code.

The main purpose of this patch is to make the code easier to follow:
one can read the body of user_exit and of every function it calls
without finding any explanation of why it's called for traced
syscalls but not for untraced syscalls.  This makes it clear when
user_exit() is necessary.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/kernel/ptrace.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 39296d2..bbf338a 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1449,7 +1449,12 @@ long syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;

-	user_exit();
+	/*
+	 * If TIF_NOHZ is set, we are required to call user_exit() before
+	 * doing anything that could touch RCU.
+	 */
+	if (test_thread_flag(TIF_NOHZ))
+		user_exit();

 	/*
 	 * If we stepped into a sysenter/syscall insn, it trapped in
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v4 3/5] x86: Split syscall_trace_enter into two phases
  2014-07-29  3:38 ` Andy Lutomirski
@ 2014-07-29  3:38   ` Andy Lutomirski
  -1 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-29  3:38 UTC (permalink / raw)
  To: linux-kernel, Kees Cook, Will Drewry
  Cc: Oleg Nesterov, x86, linux-arm-kernel, linux-mips, linux-arch,
	linux-security-module, Alexei Starovoitov, hpa,
	Frederic Weisbecker, Andy Lutomirski

This splits syscall_trace_enter into syscall_trace_enter_phase1 and
syscall_trace_enter_phase2.  Only phase 2 has full pt_regs, and only
phase 2 is permitted to modify any of pt_regs except for orig_ax.

The intent is that phase 1 can be called from the syscall fast path.

In this implementation, phase1 can handle any combination of
TIF_NOHZ (RCU context tracking), TIF_SECCOMP, and TIF_SYSCALL_AUDIT,
unless seccomp requests a ptrace event, in which case phase2 is
forced.

In principle, this could yield a big speedup for TIF_NOHZ as well as
for TIF_SECCOMP if syscall exit work were similarly split up.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/include/asm/ptrace.h |   5 ++
 arch/x86/kernel/ptrace.c      | 151 ++++++++++++++++++++++++++++++++++++------
 2 files changed, 135 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 6205f0c..86fc2bb 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -75,6 +75,11 @@ convert_ip_to_linear(struct task_struct *child, struct pt_regs *regs);
 extern void send_sigtrap(struct task_struct *tsk, struct pt_regs *regs,
 			 int error_code, int si_code);
 
+
+extern unsigned long syscall_trace_enter_phase1(struct pt_regs *, u32 arch);
+extern long syscall_trace_enter_phase2(struct pt_regs *, u32 arch,
+				       unsigned long phase1_result);
+
 extern long syscall_trace_enter(struct pt_regs *);
 extern void syscall_trace_leave(struct pt_regs *);
 
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index bbf338a..5da2672 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1441,20 +1441,126 @@ void send_sigtrap(struct task_struct *tsk, struct pt_regs *regs,
 	force_sig_info(SIGTRAP, &info, tsk);
 }
 
+static void do_audit_syscall_entry(struct pt_regs *regs, u32 arch)
+{
+#ifdef CONFIG_X86_64
+	if (arch == AUDIT_ARCH_X86_64) {
+		audit_syscall_entry(arch, regs->orig_ax, regs->di,
+				    regs->si, regs->dx, regs->r10);
+	} else
+#endif
+	{
+		audit_syscall_entry(arch, regs->orig_ax, regs->bx,
+				    regs->cx, regs->dx, regs->si);
+	}
+}
+
 /*
- * We must return the syscall number to actually look up in the table.
- * This can be -1L to skip running any syscall at all.
+ * We can return 0 to resume the syscall or anything else to go to phase
+ * 2.  If we resume the syscall, we need to put something appropriate in
+ * regs->orig_ax.
+ *
+ * NB: We don't have full pt_regs here, but regs->orig_ax and regs->ax
+ * are fully functional.
+ *
+ * For phase 2's benefit, our return value is:
+ * 0:			resume the syscall
+ * 1:			go to phase 2; no seccomp phase 2 needed
+ * anything else:	go to phase 2; pass return value to seccomp
  */
-long syscall_trace_enter(struct pt_regs *regs)
+unsigned long syscall_trace_enter_phase1(struct pt_regs *regs, u32 arch)
 {
-	long ret = 0;
+	unsigned long ret = 0;
+	u32 work;
+
+	BUG_ON(regs != task_pt_regs(current));
+
+	work = ACCESS_ONCE(current_thread_info()->flags) &
+		_TIF_WORK_SYSCALL_ENTRY;
 
 	/*
 	 * If TIF_NOHZ is set, we are required to call user_exit() before
 	 * doing anything that could touch RCU.
 	 */
-	if (test_thread_flag(TIF_NOHZ))
+	if (work & _TIF_NOHZ) {
 		user_exit();
+		work &= ~TIF_NOHZ;
+	}
+
+#ifdef CONFIG_SECCOMP
+	/*
+	 * Do seccomp first -- it should minimize exposure of other
+	 * code, and keeping seccomp fast is probably more valuable
+	 * than the rest of this.
+	 */
+	if (work & _TIF_SECCOMP) {
+		struct seccomp_data sd;
+
+		sd.arch = arch;
+		sd.nr = regs->orig_ax;
+		sd.instruction_pointer = regs->ip;
+#ifdef CONFIG_X86_64
+		if (arch == AUDIT_ARCH_X86_64) {
+			sd.args[0] = regs->di;
+			sd.args[1] = regs->si;
+			sd.args[2] = regs->dx;
+			sd.args[3] = regs->r10;
+			sd.args[4] = regs->r8;
+			sd.args[5] = regs->r9;
+		} else
+#endif
+		{
+			sd.args[0] = regs->bx;
+			sd.args[1] = regs->cx;
+			sd.args[2] = regs->dx;
+			sd.args[3] = regs->si;
+			sd.args[4] = regs->di;
+			sd.args[5] = regs->bp;
+		}
+
+		BUILD_BUG_ON(SECCOMP_PHASE1_OK != 0);
+		BUILD_BUG_ON(SECCOMP_PHASE1_SKIP != 1);
+
+		ret = seccomp_phase1(&sd);
+		if (ret == SECCOMP_PHASE1_SKIP) {
+			regs->orig_ax = -1;
+			ret = 0;
+		} else if (ret != SECCOMP_PHASE1_OK) {
+			return ret;  /* Go directly to phase 2 */
+		}
+
+		work &= ~_TIF_SECCOMP;
+	}
+#endif
+
+	/* Do our best to finish without phase 2. */
+	if (work == 0)
+		return ret;  /* seccomp and/or nohz only (ret == 0 here) */
+
+#ifdef CONFIG_AUDITSYSCALL
+	if (work == _TIF_SYSCALL_AUDIT) {
+		/*
+		 * If there is no more work to be done except auditing,
+		 * then audit in phase 1.  Phase 2 always audits, so, if
+		 * we audit here, then we can't go on to phase 2.
+		 */
+		do_audit_syscall_entry(regs, arch);
+		return 0;
+	}
+#endif
+
+	return 1;  /* Something is enabled that we can't handle in phase 1 */
+}
+
+/* Returns the syscall nr to run (which should match regs->orig_ax). */
+long syscall_trace_enter_phase2(struct pt_regs *regs, u32 arch,
+				unsigned long phase1_result)
+{
+	long ret = 0;
+	u32 work = ACCESS_ONCE(current_thread_info()->flags) &
+		_TIF_WORK_SYSCALL_ENTRY;
+
+	BUG_ON(regs != task_pt_regs(current));
 
 	/*
 	 * If we stepped into a sysenter/syscall insn, it trapped in
@@ -1463,17 +1569,20 @@ long syscall_trace_enter(struct pt_regs *regs)
 	 * do_debug() and we need to set it again to restore the user
 	 * state.  If we entered on the slow path, TF was already set.
 	 */
-	if (test_thread_flag(TIF_SINGLESTEP))
+	if (work & _TIF_SINGLESTEP)
 		regs->flags |= X86_EFLAGS_TF;
 
-	/* do the secure computing check first */
-	if (secure_computing()) {
+	/*
+	 * Call seccomp_phase2 before running the other hooks so that
+	 * they can see any changes made by a seccomp tracer.
+	 */
+	if (phase1_result > 1 && seccomp_phase2(phase1_result)) {
 		/* seccomp failures shouldn't expose any additional code. */
 		ret = -1L;
 		goto out;
 	}
 
-	if (unlikely(test_thread_flag(TIF_SYSCALL_EMU)))
+	if (unlikely(work & _TIF_SYSCALL_EMU))
 		ret = -1L;
 
 	if ((ret || test_thread_flag(TIF_SYSCALL_TRACE)) &&
@@ -1483,23 +1592,23 @@ long syscall_trace_enter(struct pt_regs *regs)
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->orig_ax);
 
-	if (is_ia32_task())
-		audit_syscall_entry(AUDIT_ARCH_I386,
-				    regs->orig_ax,
-				    regs->bx, regs->cx,
-				    regs->dx, regs->si);
-#ifdef CONFIG_X86_64
-	else
-		audit_syscall_entry(AUDIT_ARCH_X86_64,
-				    regs->orig_ax,
-				    regs->di, regs->si,
-				    regs->dx, regs->r10);
-#endif
+	do_audit_syscall_entry(regs, arch);
 
 out:
 	return ret ?: regs->orig_ax;
 }
 
+long syscall_trace_enter(struct pt_regs *regs)
+{
+	u32 arch = is_ia32_task() ? AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64;
+	unsigned long phase1_result = syscall_trace_enter_phase1(regs, arch);
+
+	if (phase1_result == 0)
+		return regs->orig_ax;
+	else
+		return syscall_trace_enter_phase2(regs, arch, phase1_result);
+}
+
 void syscall_trace_leave(struct pt_regs *regs)
 {
 	bool step;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v4 3/5] x86: Split syscall_trace_enter into two phases
@ 2014-07-29  3:38   ` Andy Lutomirski
  0 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-29  3:38 UTC (permalink / raw)
  To: linux-arm-kernel

This splits syscall_trace_enter into syscall_trace_enter_phase1 and
syscall_trace_enter_phase2.  Only phase 2 has full pt_regs, and only
phase 2 is permitted to modify any of pt_regs except for orig_ax.

The intent is that phase 1 can be called from the syscall fast path.

In this implementation, phase1 can handle any combination of
TIF_NOHZ (RCU context tracking), TIF_SECCOMP, and TIF_SYSCALL_AUDIT,
unless seccomp requests a ptrace event, in which case phase2 is
forced.

In principle, this could yield a big speedup for TIF_NOHZ as well as
for TIF_SECCOMP if syscall exit work were similarly split up.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/include/asm/ptrace.h |   5 ++
 arch/x86/kernel/ptrace.c      | 151 ++++++++++++++++++++++++++++++++++++------
 2 files changed, 135 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 6205f0c..86fc2bb 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -75,6 +75,11 @@ convert_ip_to_linear(struct task_struct *child, struct pt_regs *regs);
 extern void send_sigtrap(struct task_struct *tsk, struct pt_regs *regs,
 			 int error_code, int si_code);
 
+
+extern unsigned long syscall_trace_enter_phase1(struct pt_regs *, u32 arch);
+extern long syscall_trace_enter_phase2(struct pt_regs *, u32 arch,
+				       unsigned long phase1_result);
+
 extern long syscall_trace_enter(struct pt_regs *);
 extern void syscall_trace_leave(struct pt_regs *);
 
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index bbf338a..5da2672 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1441,20 +1441,126 @@ void send_sigtrap(struct task_struct *tsk, struct pt_regs *regs,
 	force_sig_info(SIGTRAP, &info, tsk);
 }
 
+static void do_audit_syscall_entry(struct pt_regs *regs, u32 arch)
+{
+#ifdef CONFIG_X86_64
+	if (arch == AUDIT_ARCH_X86_64) {
+		audit_syscall_entry(arch, regs->orig_ax, regs->di,
+				    regs->si, regs->dx, regs->r10);
+	} else
+#endif
+	{
+		audit_syscall_entry(arch, regs->orig_ax, regs->bx,
+				    regs->cx, regs->dx, regs->si);
+	}
+}
+
 /*
- * We must return the syscall number to actually look up in the table.
- * This can be -1L to skip running any syscall at all.
+ * We can return 0 to resume the syscall or anything else to go to phase
+ * 2.  If we resume the syscall, we need to put something appropriate in
+ * regs->orig_ax.
+ *
+ * NB: We don't have full pt_regs here, but regs->orig_ax and regs->ax
+ * are fully functional.
+ *
+ * For phase 2's benefit, our return value is:
+ * 0:			resume the syscall
+ * 1:			go to phase 2; no seccomp phase 2 needed
+ * anything else:	go to phase 2; pass return value to seccomp
  */
-long syscall_trace_enter(struct pt_regs *regs)
+unsigned long syscall_trace_enter_phase1(struct pt_regs *regs, u32 arch)
 {
-	long ret = 0;
+	unsigned long ret = 0;
+	u32 work;
+
+	BUG_ON(regs != task_pt_regs(current));
+
+	work = ACCESS_ONCE(current_thread_info()->flags) &
+		_TIF_WORK_SYSCALL_ENTRY;
 
 	/*
 	 * If TIF_NOHZ is set, we are required to call user_exit() before
 	 * doing anything that could touch RCU.
 	 */
-	if (test_thread_flag(TIF_NOHZ))
+	if (work & _TIF_NOHZ) {
 		user_exit();
+		work &= ~TIF_NOHZ;
+	}
+
+#ifdef CONFIG_SECCOMP
+	/*
+	 * Do seccomp first -- it should minimize exposure of other
+	 * code, and keeping seccomp fast is probably more valuable
+	 * than the rest of this.
+	 */
+	if (work & _TIF_SECCOMP) {
+		struct seccomp_data sd;
+
+		sd.arch = arch;
+		sd.nr = regs->orig_ax;
+		sd.instruction_pointer = regs->ip;
+#ifdef CONFIG_X86_64
+		if (arch == AUDIT_ARCH_X86_64) {
+			sd.args[0] = regs->di;
+			sd.args[1] = regs->si;
+			sd.args[2] = regs->dx;
+			sd.args[3] = regs->r10;
+			sd.args[4] = regs->r8;
+			sd.args[5] = regs->r9;
+		} else
+#endif
+		{
+			sd.args[0] = regs->bx;
+			sd.args[1] = regs->cx;
+			sd.args[2] = regs->dx;
+			sd.args[3] = regs->si;
+			sd.args[4] = regs->di;
+			sd.args[5] = regs->bp;
+		}
+
+		BUILD_BUG_ON(SECCOMP_PHASE1_OK != 0);
+		BUILD_BUG_ON(SECCOMP_PHASE1_SKIP != 1);
+
+		ret = seccomp_phase1(&sd);
+		if (ret == SECCOMP_PHASE1_SKIP) {
+			regs->orig_ax = -1;
+			ret = 0;
+		} else if (ret != SECCOMP_PHASE1_OK) {
+			return ret;  /* Go directly to phase 2 */
+		}
+
+		work &= ~_TIF_SECCOMP;
+	}
+#endif
+
+	/* Do our best to finish without phase 2. */
+	if (work == 0)
+		return ret;  /* seccomp and/or nohz only (ret == 0 here) */
+
+#ifdef CONFIG_AUDITSYSCALL
+	if (work == _TIF_SYSCALL_AUDIT) {
+		/*
+		 * If there is no more work to be done except auditing,
+		 * then audit in phase 1.  Phase 2 always audits, so, if
+		 * we audit here, then we can't go on to phase 2.
+		 */
+		do_audit_syscall_entry(regs, arch);
+		return 0;
+	}
+#endif
+
+	return 1;  /* Something is enabled that we can't handle in phase 1 */
+}
+
+/* Returns the syscall nr to run (which should match regs->orig_ax). */
+long syscall_trace_enter_phase2(struct pt_regs *regs, u32 arch,
+				unsigned long phase1_result)
+{
+	long ret = 0;
+	u32 work = ACCESS_ONCE(current_thread_info()->flags) &
+		_TIF_WORK_SYSCALL_ENTRY;
+
+	BUG_ON(regs != task_pt_regs(current));
 
 	/*
 	 * If we stepped into a sysenter/syscall insn, it trapped in
@@ -1463,17 +1569,20 @@ long syscall_trace_enter(struct pt_regs *regs)
 	 * do_debug() and we need to set it again to restore the user
 	 * state.  If we entered on the slow path, TF was already set.
 	 */
-	if (test_thread_flag(TIF_SINGLESTEP))
+	if (work & _TIF_SINGLESTEP)
 		regs->flags |= X86_EFLAGS_TF;
 
-	/* do the secure computing check first */
-	if (secure_computing()) {
+	/*
+	 * Call seccomp_phase2 before running the other hooks so that
+	 * they can see any changes made by a seccomp tracer.
+	 */
+	if (phase1_result > 1 && seccomp_phase2(phase1_result)) {
 		/* seccomp failures shouldn't expose any additional code. */
 		ret = -1L;
 		goto out;
 	}
 
-	if (unlikely(test_thread_flag(TIF_SYSCALL_EMU)))
+	if (unlikely(work & _TIF_SYSCALL_EMU))
 		ret = -1L;
 
 	if ((ret || test_thread_flag(TIF_SYSCALL_TRACE)) &&
@@ -1483,23 +1592,23 @@ long syscall_trace_enter(struct pt_regs *regs)
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->orig_ax);
 
-	if (is_ia32_task())
-		audit_syscall_entry(AUDIT_ARCH_I386,
-				    regs->orig_ax,
-				    regs->bx, regs->cx,
-				    regs->dx, regs->si);
-#ifdef CONFIG_X86_64
-	else
-		audit_syscall_entry(AUDIT_ARCH_X86_64,
-				    regs->orig_ax,
-				    regs->di, regs->si,
-				    regs->dx, regs->r10);
-#endif
+	do_audit_syscall_entry(regs, arch);
 
 out:
 	return ret ?: regs->orig_ax;
 }
 
+long syscall_trace_enter(struct pt_regs *regs)
+{
+	u32 arch = is_ia32_task() ? AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64;
+	unsigned long phase1_result = syscall_trace_enter_phase1(regs, arch);
+
+	if (phase1_result == 0)
+		return regs->orig_ax;
+	else
+		return syscall_trace_enter_phase2(regs, arch, phase1_result);
+}
+
 void syscall_trace_leave(struct pt_regs *regs)
 {
 	bool step;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v4 4/5] x86_64,entry: Treat regs->ax the same in fastpath and slowpath syscalls
  2014-07-29  3:38 ` Andy Lutomirski
@ 2014-07-29  3:38   ` Andy Lutomirski
  -1 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-29  3:38 UTC (permalink / raw)
  To: linux-kernel, Kees Cook, Will Drewry
  Cc: Oleg Nesterov, x86, linux-arm-kernel, linux-mips, linux-arch,
	linux-security-module, Alexei Starovoitov, hpa,
	Frederic Weisbecker, Andy Lutomirski

For slowpath syscalls, we initialize regs->ax to -ENOSYS and stick
the syscall number into regs->orig_ax prior to any possible tracing
and syscall execution.  This is user-visible ABI used by ptrace
syscall emulation and seccomp.

For fastpath syscalls, there's no good reason not to do the same
thing.  It's even slightly simpler than what we're currently doing.
It probably has no measureable performance impact.  It should have
no user-visible effect.

The purpose of this patch is to prepare for two-phase syscall
tracing, in which the first phase might modify the saved RAX without
leaving the fast path.  This change is just subtle enough that I'm
keeping it separate.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/include/asm/calling.h |  6 +++++-
 arch/x86/kernel/entry_64.S     | 13 ++++---------
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
index cb4c73b..76659b6 100644
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -85,7 +85,7 @@ For 32-bit we have the following conventions - kernel is built with
 #define ARGOFFSET	R11
 #define SWFRAME		ORIG_RAX
 
-	.macro SAVE_ARGS addskip=0, save_rcx=1, save_r891011=1
+	.macro SAVE_ARGS addskip=0, save_rcx=1, save_r891011=1, rax_enosys=0
 	subq  $9*8+\addskip, %rsp
 	CFI_ADJUST_CFA_OFFSET	9*8+\addskip
 	movq_cfi rdi, 8*8
@@ -96,7 +96,11 @@ For 32-bit we have the following conventions - kernel is built with
 	movq_cfi rcx, 5*8
 	.endif
 
+	.if \rax_enosys
+	movq $-ENOSYS, 4*8(%rsp)
+	.else
 	movq_cfi rax, 4*8
+	.endif
 
 	.if \save_r891011
 	movq_cfi r8,  3*8
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index b25ca96..1eb3094 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -405,8 +405,8 @@ GLOBAL(system_call_after_swapgs)
 	 * and short:
 	 */
 	ENABLE_INTERRUPTS(CLBR_NONE)
-	SAVE_ARGS 8,0
-	movq  %rax,ORIG_RAX-ARGOFFSET(%rsp)
+	SAVE_ARGS 8, 0, rax_enosys=1
+	movq_cfi rax,(ORIG_RAX-ARGOFFSET)
 	movq  %rcx,RIP-ARGOFFSET(%rsp)
 	CFI_REL_OFFSET rip,RIP-ARGOFFSET
 	testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
@@ -418,7 +418,7 @@ system_call_fastpath:
 	andl $__SYSCALL_MASK,%eax
 	cmpl $__NR_syscall_max,%eax
 #endif
-	ja badsys
+	ja ret_from_sys_call  /* and return regs->ax */
 	movq %r10,%rcx
 	call *sys_call_table(,%rax,8)  # XXX:	 rip relative
 	movq %rax,RAX-ARGOFFSET(%rsp)
@@ -477,10 +477,6 @@ sysret_signal:
 	FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
 	jmp int_check_syscall_exit_work
 
-badsys:
-	movq $-ENOSYS,RAX-ARGOFFSET(%rsp)
-	jmp ret_from_sys_call
-
 #ifdef CONFIG_AUDITSYSCALL
 	/*
 	 * Fast path for syscall audit without full syscall trace.
@@ -520,7 +516,6 @@ tracesys:
 	jz auditsys
 #endif
 	SAVE_REST
-	movq $-ENOSYS,RAX(%rsp) /* ptrace can change this for a bad syscall */
 	FIXUP_TOP_OF_STACK %rdi
 	movq %rsp,%rdi
 	call syscall_trace_enter
@@ -537,7 +532,7 @@ tracesys:
 	andl $__SYSCALL_MASK,%eax
 	cmpl $__NR_syscall_max,%eax
 #endif
-	ja   int_ret_from_sys_call	/* RAX(%rsp) set to -ENOSYS above */
+	ja   int_ret_from_sys_call	/* RAX(%rsp) is already set */
 	movq %r10,%rcx	/* fixup for C */
 	call *sys_call_table(,%rax,8)
 	movq %rax,RAX-ARGOFFSET(%rsp)
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v4 4/5] x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls
@ 2014-07-29  3:38   ` Andy Lutomirski
  0 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-29  3:38 UTC (permalink / raw)
  To: linux-arm-kernel

For slowpath syscalls, we initialize regs->ax to -ENOSYS and stick
the syscall number into regs->orig_ax prior to any possible tracing
and syscall execution.  This is user-visible ABI used by ptrace
syscall emulation and seccomp.

For fastpath syscalls, there's no good reason not to do the same
thing.  It's even slightly simpler than what we're currently doing.
It probably has no measureable performance impact.  It should have
no user-visible effect.

The purpose of this patch is to prepare for two-phase syscall
tracing, in which the first phase might modify the saved RAX without
leaving the fast path.  This change is just subtle enough that I'm
keeping it separate.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/include/asm/calling.h |  6 +++++-
 arch/x86/kernel/entry_64.S     | 13 ++++---------
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
index cb4c73b..76659b6 100644
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -85,7 +85,7 @@ For 32-bit we have the following conventions - kernel is built with
 #define ARGOFFSET	R11
 #define SWFRAME		ORIG_RAX
 
-	.macro SAVE_ARGS addskip=0, save_rcx=1, save_r891011=1
+	.macro SAVE_ARGS addskip=0, save_rcx=1, save_r891011=1, rax_enosys=0
 	subq  $9*8+\addskip, %rsp
 	CFI_ADJUST_CFA_OFFSET	9*8+\addskip
 	movq_cfi rdi, 8*8
@@ -96,7 +96,11 @@ For 32-bit we have the following conventions - kernel is built with
 	movq_cfi rcx, 5*8
 	.endif
 
+	.if \rax_enosys
+	movq $-ENOSYS, 4*8(%rsp)
+	.else
 	movq_cfi rax, 4*8
+	.endif
 
 	.if \save_r891011
 	movq_cfi r8,  3*8
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index b25ca96..1eb3094 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -405,8 +405,8 @@ GLOBAL(system_call_after_swapgs)
 	 * and short:
 	 */
 	ENABLE_INTERRUPTS(CLBR_NONE)
-	SAVE_ARGS 8,0
-	movq  %rax,ORIG_RAX-ARGOFFSET(%rsp)
+	SAVE_ARGS 8, 0, rax_enosys=1
+	movq_cfi rax,(ORIG_RAX-ARGOFFSET)
 	movq  %rcx,RIP-ARGOFFSET(%rsp)
 	CFI_REL_OFFSET rip,RIP-ARGOFFSET
 	testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
@@ -418,7 +418,7 @@ system_call_fastpath:
 	andl $__SYSCALL_MASK,%eax
 	cmpl $__NR_syscall_max,%eax
 #endif
-	ja badsys
+	ja ret_from_sys_call  /* and return regs->ax */
 	movq %r10,%rcx
 	call *sys_call_table(,%rax,8)  # XXX:	 rip relative
 	movq %rax,RAX-ARGOFFSET(%rsp)
@@ -477,10 +477,6 @@ sysret_signal:
 	FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
 	jmp int_check_syscall_exit_work
 
-badsys:
-	movq $-ENOSYS,RAX-ARGOFFSET(%rsp)
-	jmp ret_from_sys_call
-
 #ifdef CONFIG_AUDITSYSCALL
 	/*
 	 * Fast path for syscall audit without full syscall trace.
@@ -520,7 +516,6 @@ tracesys:
 	jz auditsys
 #endif
 	SAVE_REST
-	movq $-ENOSYS,RAX(%rsp) /* ptrace can change this for a bad syscall */
 	FIXUP_TOP_OF_STACK %rdi
 	movq %rsp,%rdi
 	call syscall_trace_enter
@@ -537,7 +532,7 @@ tracesys:
 	andl $__SYSCALL_MASK,%eax
 	cmpl $__NR_syscall_max,%eax
 #endif
-	ja   int_ret_from_sys_call	/* RAX(%rsp) set to -ENOSYS above */
+	ja   int_ret_from_sys_call	/* RAX(%rsp) is already set */
 	movq %r10,%rcx	/* fixup for C */
 	call *sys_call_table(,%rax,8)
 	movq %rax,RAX-ARGOFFSET(%rsp)
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v4 5/5] x86_64,entry: Use split-phase syscall_trace_enter for 64-bit syscalls
  2014-07-29  3:38 ` Andy Lutomirski
@ 2014-07-29  3:38   ` Andy Lutomirski
  -1 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-29  3:38 UTC (permalink / raw)
  To: linux-kernel, Kees Cook, Will Drewry
  Cc: Oleg Nesterov, x86, linux-arm-kernel, linux-mips, linux-arch,
	linux-security-module, Alexei Starovoitov, hpa,
	Frederic Weisbecker, Andy Lutomirski

On KVM on my box, this reduces the overhead from an always-accept
seccomp filter from ~130ns to ~17ns.  Most of that comes from
avoiding IRET on every syscall when seccomp is enabled.

In extremely approximate hacked-up benchmarking, just bypassing IRET
saves about 80ns, so there's another 43ns of savings here from
simplifying the seccomp path.

The diffstat is also rather nice :)

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/kernel/entry_64.S | 38 +++++++++++++++-----------------------
 1 file changed, 15 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 1eb3094..13e0c1d 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -479,22 +479,6 @@ sysret_signal:
 
 #ifdef CONFIG_AUDITSYSCALL
 	/*
-	 * Fast path for syscall audit without full syscall trace.
-	 * We just call __audit_syscall_entry() directly, and then
-	 * jump back to the normal fast path.
-	 */
-auditsys:
-	movq %r10,%r9			/* 6th arg: 4th syscall arg */
-	movq %rdx,%r8			/* 5th arg: 3rd syscall arg */
-	movq %rsi,%rcx			/* 4th arg: 2nd syscall arg */
-	movq %rdi,%rdx			/* 3rd arg: 1st syscall arg */
-	movq %rax,%rsi			/* 2nd arg: syscall number */
-	movl $AUDIT_ARCH_X86_64,%edi	/* 1st arg: audit arch */
-	call __audit_syscall_entry
-	LOAD_ARGS 0		/* reload call-clobbered registers */
-	jmp system_call_fastpath
-
-	/*
 	 * Return fast path for syscall audit.  Call __audit_syscall_exit()
 	 * directly and then jump back to the fast path with TIF_SYSCALL_AUDIT
 	 * masked off.
@@ -511,17 +495,25 @@ sysret_audit:
 
 	/* Do syscall tracing */
 tracesys:
-#ifdef CONFIG_AUDITSYSCALL
-	testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
-	jz auditsys
-#endif
+	leaq -REST_SKIP(%rsp), %rdi
+	movq $AUDIT_ARCH_X86_64, %rsi
+	call syscall_trace_enter_phase1
+	test %rax, %rax
+	jnz tracesys_phase2		/* if needed, run the slow path */
+	LOAD_ARGS 0			/* else restore clobbered regs */
+	jmp system_call_fastpath	/*      and return to the fast path */
+
+tracesys_phase2:
 	SAVE_REST
 	FIXUP_TOP_OF_STACK %rdi
-	movq %rsp,%rdi
-	call syscall_trace_enter
+	movq %rsp, %rdi
+	movq $AUDIT_ARCH_X86_64, %rsi
+	movq %rax,%rdx
+	call syscall_trace_enter_phase2
+
 	/*
 	 * Reload arg registers from stack in case ptrace changed them.
-	 * We don't reload %rax because syscall_trace_enter() returned
+	 * We don't reload %rax because syscall_trace_entry_phase2() returned
 	 * the value it wants us to use in the table lookup.
 	 */
 	LOAD_ARGS ARGOFFSET, 1
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v4 5/5] x86_64, entry: Use split-phase syscall_trace_enter for 64-bit syscalls
@ 2014-07-29  3:38   ` Andy Lutomirski
  0 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-29  3:38 UTC (permalink / raw)
  To: linux-arm-kernel

On KVM on my box, this reduces the overhead from an always-accept
seccomp filter from ~130ns to ~17ns.  Most of that comes from
avoiding IRET on every syscall when seccomp is enabled.

In extremely approximate hacked-up benchmarking, just bypassing IRET
saves about 80ns, so there's another 43ns of savings here from
simplifying the seccomp path.

The diffstat is also rather nice :)

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/kernel/entry_64.S | 38 +++++++++++++++-----------------------
 1 file changed, 15 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 1eb3094..13e0c1d 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -479,22 +479,6 @@ sysret_signal:
 
 #ifdef CONFIG_AUDITSYSCALL
 	/*
-	 * Fast path for syscall audit without full syscall trace.
-	 * We just call __audit_syscall_entry() directly, and then
-	 * jump back to the normal fast path.
-	 */
-auditsys:
-	movq %r10,%r9			/* 6th arg: 4th syscall arg */
-	movq %rdx,%r8			/* 5th arg: 3rd syscall arg */
-	movq %rsi,%rcx			/* 4th arg: 2nd syscall arg */
-	movq %rdi,%rdx			/* 3rd arg: 1st syscall arg */
-	movq %rax,%rsi			/* 2nd arg: syscall number */
-	movl $AUDIT_ARCH_X86_64,%edi	/* 1st arg: audit arch */
-	call __audit_syscall_entry
-	LOAD_ARGS 0		/* reload call-clobbered registers */
-	jmp system_call_fastpath
-
-	/*
 	 * Return fast path for syscall audit.  Call __audit_syscall_exit()
 	 * directly and then jump back to the fast path with TIF_SYSCALL_AUDIT
 	 * masked off.
@@ -511,17 +495,25 @@ sysret_audit:
 
 	/* Do syscall tracing */
 tracesys:
-#ifdef CONFIG_AUDITSYSCALL
-	testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
-	jz auditsys
-#endif
+	leaq -REST_SKIP(%rsp), %rdi
+	movq $AUDIT_ARCH_X86_64, %rsi
+	call syscall_trace_enter_phase1
+	test %rax, %rax
+	jnz tracesys_phase2		/* if needed, run the slow path */
+	LOAD_ARGS 0			/* else restore clobbered regs */
+	jmp system_call_fastpath	/*      and return to the fast path */
+
+tracesys_phase2:
 	SAVE_REST
 	FIXUP_TOP_OF_STACK %rdi
-	movq %rsp,%rdi
-	call syscall_trace_enter
+	movq %rsp, %rdi
+	movq $AUDIT_ARCH_X86_64, %rsi
+	movq %rax,%rdx
+	call syscall_trace_enter_phase2
+
 	/*
 	 * Reload arg registers from stack in case ptrace changed them.
-	 * We don't reload %rax because syscall_trace_enter() returned
+	 * We don't reload %rax because syscall_trace_entry_phase2() returned
 	 * the value it wants us to use in the table lookup.
 	 */
 	LOAD_ARGS ARGOFFSET, 1
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
  2014-07-29  3:38 ` Andy Lutomirski
@ 2014-07-29 19:20   ` Oleg Nesterov
  -1 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2014-07-29 19:20 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, Kees Cook, Will Drewry, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module,
	Alexei Starovoitov, hpa, Frederic Weisbecker

Andy, to avoid the confusion: I am not trying to review this changes.
As you probably know my understanding of asm code in entry.S is very
limited.

Just a couple of questions to ensure I understand this correctly.

On 07/28, Andy Lutomirski wrote:
>
> This is both a cleanup and a speedup.  It reduces overhead due to
> installing a trivial seccomp filter by 87%.  The speedup comes from
> avoiding the full syscall tracing mechanism for filters that don't
> return SECCOMP_RET_TRACE.

And only after I look at 5/5 I _seem_ to actually understand where
this speedup comes from.

So. Currently tracesys: path always lead to "iret" after syscall, with
this change we can avoid it if phase_1() returns zero, correct?

And, this also removes the special TIF_SYSCALL_AUDIT-only case in entry.S,
cool.

I am wondering if we can do something similar with do_notify_resume() ?

Stupid question. To simplify, lets forget that syscall_trace_enter()
already returns the value. Can't we simplify the asm code if we do
not export 2 functions, but make syscall_trace_enter() return
"bool slow_path_is_needed". So that "tracesys:" could do

	// pseudo code

tracesys:
	SAVE_REST
	FIXUP_TOP_OF_STACK

	call syscall_trace_enter

	if (!slow_path_is_needed) {
		addq REST_SKIP, %rsp
		jmp system_call_fastpath
	}

	...

?

Once again, I am just curious, it is not that I actually suggest to consider
this option.

Oleg.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
@ 2014-07-29 19:20   ` Oleg Nesterov
  0 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2014-07-29 19:20 UTC (permalink / raw)
  To: linux-arm-kernel

Andy, to avoid the confusion: I am not trying to review this changes.
As you probably know my understanding of asm code in entry.S is very
limited.

Just a couple of questions to ensure I understand this correctly.

On 07/28, Andy Lutomirski wrote:
>
> This is both a cleanup and a speedup.  It reduces overhead due to
> installing a trivial seccomp filter by 87%.  The speedup comes from
> avoiding the full syscall tracing mechanism for filters that don't
> return SECCOMP_RET_TRACE.

And only after I look at 5/5 I _seem_ to actually understand where
this speedup comes from.

So. Currently tracesys: path always lead to "iret" after syscall, with
this change we can avoid it if phase_1() returns zero, correct?

And, this also removes the special TIF_SYSCALL_AUDIT-only case in entry.S,
cool.

I am wondering if we can do something similar with do_notify_resume() ?

Stupid question. To simplify, lets forget that syscall_trace_enter()
already returns the value. Can't we simplify the asm code if we do
not export 2 functions, but make syscall_trace_enter() return
"bool slow_path_is_needed". So that "tracesys:" could do

	// pseudo code

tracesys:
	SAVE_REST
	FIXUP_TOP_OF_STACK

	call syscall_trace_enter

	if (!slow_path_is_needed) {
		addq REST_SKIP, %rsp
		jmp system_call_fastpath
	}

	...

?

Once again, I am just curious, it is not that I actually suggest to consider
this option.

Oleg.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 3/5] x86: Split syscall_trace_enter into two phases
  2014-07-29  3:38   ` Andy Lutomirski
@ 2014-07-29 19:25     ` Oleg Nesterov
  -1 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2014-07-29 19:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, Kees Cook, Will Drewry, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module,
	Alexei Starovoitov, hpa, Frederic Weisbecker

On 07/28, Andy Lutomirski wrote:
>
> +long syscall_trace_enter_phase2(struct pt_regs *regs, u32 arch,
> +				unsigned long phase1_result)
> +{
> +	long ret = 0;
> +	u32 work = ACCESS_ONCE(current_thread_info()->flags) &
> +		_TIF_WORK_SYSCALL_ENTRY;
> +
> +	BUG_ON(regs != task_pt_regs(current));
>  
>  	/*
>  	 * If we stepped into a sysenter/syscall insn, it trapped in
> @@ -1463,17 +1569,20 @@ long syscall_trace_enter(struct pt_regs *regs)
>  	 * do_debug() and we need to set it again to restore the user
>  	 * state.  If we entered on the slow path, TF was already set.
>  	 */
> -	if (test_thread_flag(TIF_SINGLESTEP))
> +	if (work & _TIF_SINGLESTEP)
>  		regs->flags |= X86_EFLAGS_TF;

This is minor, I won't argue, but this can be moved into phase1(), afaics.

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 3/5] x86: Split syscall_trace_enter into two phases
@ 2014-07-29 19:25     ` Oleg Nesterov
  0 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2014-07-29 19:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/28, Andy Lutomirski wrote:
>
> +long syscall_trace_enter_phase2(struct pt_regs *regs, u32 arch,
> +				unsigned long phase1_result)
> +{
> +	long ret = 0;
> +	u32 work = ACCESS_ONCE(current_thread_info()->flags) &
> +		_TIF_WORK_SYSCALL_ENTRY;
> +
> +	BUG_ON(regs != task_pt_regs(current));
>  
>  	/*
>  	 * If we stepped into a sysenter/syscall insn, it trapped in
> @@ -1463,17 +1569,20 @@ long syscall_trace_enter(struct pt_regs *regs)
>  	 * do_debug() and we need to set it again to restore the user
>  	 * state.  If we entered on the slow path, TF was already set.
>  	 */
> -	if (test_thread_flag(TIF_SINGLESTEP))
> +	if (work & _TIF_SINGLESTEP)
>  		regs->flags |= X86_EFLAGS_TF;

This is minor, I won't argue, but this can be moved into phase1(), afaics.

Oleg.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
  2014-07-29  3:38   ` Andy Lutomirski
@ 2014-07-29 19:32     ` Oleg Nesterov
  -1 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2014-07-29 19:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, Kees Cook, Will Drewry, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module,
	Alexei Starovoitov, hpa, Frederic Weisbecker

On 07/28, Andy Lutomirski wrote:
>
> @@ -1449,7 +1449,12 @@ long syscall_trace_enter(struct pt_regs *regs)
>  {
>  	long ret = 0;
>  
> -	user_exit();
> +	/*
> +	 * If TIF_NOHZ is set, we are required to call user_exit() before
> +	 * doing anything that could touch RCU.
> +	 */
> +	if (test_thread_flag(TIF_NOHZ))
> +		user_exit();

Personally I still think this change just adds more confusion, but I leave
this to you and Frederic.

It is not that "If TIF_NOHZ is set, we are required to call user_exit()", we
need to call user_exit() just because we enter the kernel. TIF_NOHZ is just
the implementation detail which triggers this slow path.

At least it should be correct, unless I am confused even more than I think.

Oleg.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
@ 2014-07-29 19:32     ` Oleg Nesterov
  0 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2014-07-29 19:32 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/28, Andy Lutomirski wrote:
>
> @@ -1449,7 +1449,12 @@ long syscall_trace_enter(struct pt_regs *regs)
>  {
>  	long ret = 0;
>  
> -	user_exit();
> +	/*
> +	 * If TIF_NOHZ is set, we are required to call user_exit() before
> +	 * doing anything that could touch RCU.
> +	 */
> +	if (test_thread_flag(TIF_NOHZ))
> +		user_exit();

Personally I still think this change just adds more confusion, but I leave
this to you and Frederic.

It is not that "If TIF_NOHZ is set, we are required to call user_exit()", we
need to call user_exit() just because we enter the kernel. TIF_NOHZ is just
the implementation detail which triggers this slow path.

At least it should be correct, unless I am confused even more than I think.

Oleg.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
  2014-07-29 19:20   ` Oleg Nesterov
  (?)
@ 2014-07-29 20:54     ` Andy Lutomirski
  -1 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-29 20:54 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: H. Peter Anvin, linux-arch, X86 ML, Frederic Weisbecker,
	LSM List, Linux MIPS Mailing List, linux-arm-kernel,
	Alexei Starovoitov, Will Drewry, Kees Cook, linux-kernel

On Jul 29, 2014 12:22 PM, "Oleg Nesterov" <oleg@redhat.com> wrote:
>
> Andy, to avoid the confusion: I am not trying to review this changes.
> As you probably know my understanding of asm code in entry.S is very
> limited.
>
> Just a couple of questions to ensure I understand this correctly.
>
> On 07/28, Andy Lutomirski wrote:
> >
> > This is both a cleanup and a speedup.  It reduces overhead due to
> > installing a trivial seccomp filter by 87%.  The speedup comes from
> > avoiding the full syscall tracing mechanism for filters that don't
> > return SECCOMP_RET_TRACE.
>
> And only after I look at 5/5 I _seem_ to actually understand where
> this speedup comes from.
>
> So. Currently tracesys: path always lead to "iret" after syscall, with
> this change we can avoid it if phase_1() returns zero, correct?
>
> And, this also removes the special TIF_SYSCALL_AUDIT-only case in entry.S,
> cool.
>
> I am wondering if we can do something similar with do_notify_resume() ?
>
>
> Stupid question. To simplify, lets forget that syscall_trace_enter()
> already returns the value. Can't we simplify the asm code if we do
> not export 2 functions, but make syscall_trace_enter() return
> "bool slow_path_is_needed". So that "tracesys:" could do
>
>         // pseudo code
>
> tracesys:
>         SAVE_REST
>         FIXUP_TOP_OF_STACK
>
>         call syscall_trace_enter
>
>         if (!slow_path_is_needed) {
>                 addq REST_SKIP, %rsp
>                 jmp system_call_fastpath
>         }
>
>         ...
>
> ?
>
> Once again, I am just curious, it is not that I actually suggest to consider
> this option.

We could, but this would lose a decent amount of the speedup.  I could
try it and benchmark it, but I'm guessing that the save and restore is
kind of expensive.  This will make audit slower than it currently is,
which may also annoy some people.  (Not me.)

I'm also not convinced that it would be much simpler.  My code is currently:

tracesys:
    leaq -REST_SKIP(%rsp), %rdi
    movq $AUDIT_ARCH_X86_64, %rsi
    call syscall_trace_enter_phase1
    test %rax, %rax
    jnz tracesys_phase2        /* if needed, run the slow path */
    LOAD_ARGS 0            /* else restore clobbered regs */
    jmp system_call_fastpath    /*      and return to the fast path */

tracesys_phase2:
    SAVE_REST
    FIXUP_TOP_OF_STACK %rdi
    movq %rsp, %rdi
    movq $AUDIT_ARCH_X86_64, %rsi
    movq %rax,%rdx
    call syscall_trace_enter_phase2

    LOAD_ARGS ARGOFFSET, 1
    RESTORE_REST

    ... slow path here ...

It would end up looking more like (totally untested):

tracesys:
    SAVE_REST
    FIXUP_TOP_OF_STACK %rdi
    mov %rsp, %rdi
    movq $AUDIT_ARCH_X86_64, %rsi
    call syscall_trace_enter
    LOAD_ARGS
    RESTORE_REST
    test [whatever condition]
    j[cond] system_call_fastpath

    ... slow path here ...

So it's a bit simpler.  Oddly, the ia32entry code doesn't have this
multiple syscall path distinction.

SAVE_REST is 6 movq instructions and a subq.  FIXUP_TOP_OF_STACK is 7
movqs (and 8 if I ever get my way).  RESTORE_TOP_OF_STACK is 4.
RESTORE_REST is 6 movqs and an adsq.  So we're talking about avoiding
21 movqs, and addq, and a subq.  That may be significant.  (And I
suspect that the difference is much larger on platforms like arm64,
but that's a separate issue.)

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
@ 2014-07-29 20:54     ` Andy Lutomirski
  0 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-29 20:54 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: H. Peter Anvin, linux-arch, X86 ML, Frederic Weisbecker,
	LSM List, Linux MIPS Mailing List, linux-arm-kernel,
	Alexei Starovoitov, Will Drewry, Kees Cook, linux-kernel

On Jul 29, 2014 12:22 PM, "Oleg Nesterov" <oleg@redhat.com> wrote:
>
> Andy, to avoid the confusion: I am not trying to review this changes.
> As you probably know my understanding of asm code in entry.S is very
> limited.
>
> Just a couple of questions to ensure I understand this correctly.
>
> On 07/28, Andy Lutomirski wrote:
> >
> > This is both a cleanup and a speedup.  It reduces overhead due to
> > installing a trivial seccomp filter by 87%.  The speedup comes from
> > avoiding the full syscall tracing mechanism for filters that don't
> > return SECCOMP_RET_TRACE.
>
> And only after I look at 5/5 I _seem_ to actually understand where
> this speedup comes from.
>
> So. Currently tracesys: path always lead to "iret" after syscall, with
> this change we can avoid it if phase_1() returns zero, correct?
>
> And, this also removes the special TIF_SYSCALL_AUDIT-only case in entry.S,
> cool.
>
> I am wondering if we can do something similar with do_notify_resume() ?
>
>
> Stupid question. To simplify, lets forget that syscall_trace_enter()
> already returns the value. Can't we simplify the asm code if we do
> not export 2 functions, but make syscall_trace_enter() return
> "bool slow_path_is_needed". So that "tracesys:" could do
>
>         // pseudo code
>
> tracesys:
>         SAVE_REST
>         FIXUP_TOP_OF_STACK
>
>         call syscall_trace_enter
>
>         if (!slow_path_is_needed) {
>                 addq REST_SKIP, %rsp
>                 jmp system_call_fastpath
>         }
>
>         ...
>
> ?
>
> Once again, I am just curious, it is not that I actually suggest to consider
> this option.

We could, but this would lose a decent amount of the speedup.  I could
try it and benchmark it, but I'm guessing that the save and restore is
kind of expensive.  This will make audit slower than it currently is,
which may also annoy some people.  (Not me.)

I'm also not convinced that it would be much simpler.  My code is currently:

tracesys:
    leaq -REST_SKIP(%rsp), %rdi
    movq $AUDIT_ARCH_X86_64, %rsi
    call syscall_trace_enter_phase1
    test %rax, %rax
    jnz tracesys_phase2        /* if needed, run the slow path */
    LOAD_ARGS 0            /* else restore clobbered regs */
    jmp system_call_fastpath    /*      and return to the fast path */

tracesys_phase2:
    SAVE_REST
    FIXUP_TOP_OF_STACK %rdi
    movq %rsp, %rdi
    movq $AUDIT_ARCH_X86_64, %rsi
    movq %rax,%rdx
    call syscall_trace_enter_phase2

    LOAD_ARGS ARGOFFSET, 1
    RESTORE_REST

    ... slow path here ...

It would end up looking more like (totally untested):

tracesys:
    SAVE_REST
    FIXUP_TOP_OF_STACK %rdi
    mov %rsp, %rdi
    movq $AUDIT_ARCH_X86_64, %rsi
    call syscall_trace_enter
    LOAD_ARGS
    RESTORE_REST
    test [whatever condition]
    j[cond] system_call_fastpath

    ... slow path here ...

So it's a bit simpler.  Oddly, the ia32entry code doesn't have this
multiple syscall path distinction.

SAVE_REST is 6 movq instructions and a subq.  FIXUP_TOP_OF_STACK is 7
movqs (and 8 if I ever get my way).  RESTORE_TOP_OF_STACK is 4.
RESTORE_REST is 6 movqs and an adsq.  So we're talking about avoiding
21 movqs, and addq, and a subq.  That may be significant.  (And I
suspect that the difference is much larger on platforms like arm64,
but that's a separate issue.)

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
@ 2014-07-29 20:54     ` Andy Lutomirski
  0 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-29 20:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Jul 29, 2014 12:22 PM, "Oleg Nesterov" <oleg@redhat.com> wrote:
>
> Andy, to avoid the confusion: I am not trying to review this changes.
> As you probably know my understanding of asm code in entry.S is very
> limited.
>
> Just a couple of questions to ensure I understand this correctly.
>
> On 07/28, Andy Lutomirski wrote:
> >
> > This is both a cleanup and a speedup.  It reduces overhead due to
> > installing a trivial seccomp filter by 87%.  The speedup comes from
> > avoiding the full syscall tracing mechanism for filters that don't
> > return SECCOMP_RET_TRACE.
>
> And only after I look at 5/5 I _seem_ to actually understand where
> this speedup comes from.
>
> So. Currently tracesys: path always lead to "iret" after syscall, with
> this change we can avoid it if phase_1() returns zero, correct?
>
> And, this also removes the special TIF_SYSCALL_AUDIT-only case in entry.S,
> cool.
>
> I am wondering if we can do something similar with do_notify_resume() ?
>
>
> Stupid question. To simplify, lets forget that syscall_trace_enter()
> already returns the value. Can't we simplify the asm code if we do
> not export 2 functions, but make syscall_trace_enter() return
> "bool slow_path_is_needed". So that "tracesys:" could do
>
>         // pseudo code
>
> tracesys:
>         SAVE_REST
>         FIXUP_TOP_OF_STACK
>
>         call syscall_trace_enter
>
>         if (!slow_path_is_needed) {
>                 addq REST_SKIP, %rsp
>                 jmp system_call_fastpath
>         }
>
>         ...
>
> ?
>
> Once again, I am just curious, it is not that I actually suggest to consider
> this option.

We could, but this would lose a decent amount of the speedup.  I could
try it and benchmark it, but I'm guessing that the save and restore is
kind of expensive.  This will make audit slower than it currently is,
which may also annoy some people.  (Not me.)

I'm also not convinced that it would be much simpler.  My code is currently:

tracesys:
    leaq -REST_SKIP(%rsp), %rdi
    movq $AUDIT_ARCH_X86_64, %rsi
    call syscall_trace_enter_phase1
    test %rax, %rax
    jnz tracesys_phase2        /* if needed, run the slow path */
    LOAD_ARGS 0            /* else restore clobbered regs */
    jmp system_call_fastpath    /*      and return to the fast path */

tracesys_phase2:
    SAVE_REST
    FIXUP_TOP_OF_STACK %rdi
    movq %rsp, %rdi
    movq $AUDIT_ARCH_X86_64, %rsi
    movq %rax,%rdx
    call syscall_trace_enter_phase2

    LOAD_ARGS ARGOFFSET, 1
    RESTORE_REST

    ... slow path here ...

It would end up looking more like (totally untested):

tracesys:
    SAVE_REST
    FIXUP_TOP_OF_STACK %rdi
    mov %rsp, %rdi
    movq $AUDIT_ARCH_X86_64, %rsi
    call syscall_trace_enter
    LOAD_ARGS
    RESTORE_REST
    test [whatever condition]
    j[cond] system_call_fastpath

    ... slow path here ...

So it's a bit simpler.  Oddly, the ia32entry code doesn't have this
multiple syscall path distinction.

SAVE_REST is 6 movq instructions and a subq.  FIXUP_TOP_OF_STACK is 7
movqs (and 8 if I ever get my way).  RESTORE_TOP_OF_STACK is 4.
RESTORE_REST is 6 movqs and an adsq.  So we're talking about avoiding
21 movqs, and addq, and a subq.  That may be significant.  (And I
suspect that the difference is much larger on platforms like arm64,
but that's a separate issue.)

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
  2014-07-29 20:54     ` Andy Lutomirski
  (?)
@ 2014-07-29 23:30       ` Andy Lutomirski
  -1 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-29 23:30 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: H. Peter Anvin, linux-arch, X86 ML, Frederic Weisbecker,
	LSM List, Linux MIPS Mailing List, linux-arm-kernel,
	Alexei Starovoitov, Will Drewry, Kees Cook, linux-kernel

On Tue, Jul 29, 2014 at 1:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Jul 29, 2014 12:22 PM, "Oleg Nesterov" <oleg@redhat.com> wrote:
>>
>> Andy, to avoid the confusion: I am not trying to review this changes.
>> As you probably know my understanding of asm code in entry.S is very
>> limited.
>>
>> Just a couple of questions to ensure I understand this correctly.
>>
>> On 07/28, Andy Lutomirski wrote:
>> >
>> > This is both a cleanup and a speedup.  It reduces overhead due to
>> > installing a trivial seccomp filter by 87%.  The speedup comes from
>> > avoiding the full syscall tracing mechanism for filters that don't
>> > return SECCOMP_RET_TRACE.
>>
>> And only after I look at 5/5 I _seem_ to actually understand where
>> this speedup comes from.
>>
>> So. Currently tracesys: path always lead to "iret" after syscall, with
>> this change we can avoid it if phase_1() returns zero, correct?
>>
>> And, this also removes the special TIF_SYSCALL_AUDIT-only case in entry.S,
>> cool.
>>
>> I am wondering if we can do something similar with do_notify_resume() ?
>>
>>
>> Stupid question. To simplify, lets forget that syscall_trace_enter()
>> already returns the value. Can't we simplify the asm code if we do
>> not export 2 functions, but make syscall_trace_enter() return
>> "bool slow_path_is_needed". So that "tracesys:" could do
>>
>>         // pseudo code
>>
>> tracesys:
>>         SAVE_REST
>>         FIXUP_TOP_OF_STACK
>>
>>         call syscall_trace_enter
>>
>>         if (!slow_path_is_needed) {
>>                 addq REST_SKIP, %rsp
>>                 jmp system_call_fastpath
>>         }
>>
>>         ...
>>
>> ?
>>
>> Once again, I am just curious, it is not that I actually suggest to consider
>> this option.
>
> We could, but this would lose a decent amount of the speedup.  I could
> try it and benchmark it, but I'm guessing that the save and restore is
> kind of expensive.  This will make audit slower than it currently is,
> which may also annoy some people.  (Not me.)
>
> I'm also not convinced that it would be much simpler.  My code is currently:
>
> tracesys:
>     leaq -REST_SKIP(%rsp), %rdi
>     movq $AUDIT_ARCH_X86_64, %rsi
>     call syscall_trace_enter_phase1
>     test %rax, %rax
>     jnz tracesys_phase2        /* if needed, run the slow path */
>     LOAD_ARGS 0            /* else restore clobbered regs */
>     jmp system_call_fastpath    /*      and return to the fast path */
>
> tracesys_phase2:
>     SAVE_REST
>     FIXUP_TOP_OF_STACK %rdi
>     movq %rsp, %rdi
>     movq $AUDIT_ARCH_X86_64, %rsi
>     movq %rax,%rdx
>     call syscall_trace_enter_phase2
>
>     LOAD_ARGS ARGOFFSET, 1
>     RESTORE_REST
>
>     ... slow path here ...
>
> It would end up looking more like (totally untested):
>
> tracesys:
>     SAVE_REST
>     FIXUP_TOP_OF_STACK %rdi
>     mov %rsp, %rdi
>     movq $AUDIT_ARCH_X86_64, %rsi
>     call syscall_trace_enter
>     LOAD_ARGS
>     RESTORE_REST
>     test [whatever condition]
>     j[cond] system_call_fastpath
>
>     ... slow path here ...
>
> So it's a bit simpler.  Oddly, the ia32entry code doesn't have this
> multiple syscall path distinction.
>
> SAVE_REST is 6 movq instructions and a subq.  FIXUP_TOP_OF_STACK is 7
> movqs (and 8 if I ever get my way).  RESTORE_TOP_OF_STACK is 4.
> RESTORE_REST is 6 movqs and an adsq.  So we're talking about avoiding
> 21 movqs, and addq, and a subq.  That may be significant.  (And I
> suspect that the difference is much larger on platforms like arm64,
> but that's a separate issue.)

To put some more options on the table: there's an argument to be made
that the whole fast-path/slow-path split isn't worth it.  We could
unconditionally set up a full frame for all syscalls.  This means:

No FIXUP_TOP_OF_STACK.  Instead, the system_call entry sets up RSP,
SS, CS, RCX, and EFLAGS right away.  That's five stores for all
syscalls instead of two loads and five stores for syscalls that need
it.  But it also gets rid of RESTORE_TOP_OF_STACK completely.

No more bugs involving C code that assumes a full stack frame when no
such frame exists inside syscall code.

We could possibly remove a whole bunch of duplicated code.

The upshot would be simpler code, faster slow-path syscalls, and
slower fast-path syscalls (but probably not much slower).  On the
other hand, there's zero chance that this would be ready for 3.17.

I'd tend to advocate for keeping the approach in my patches for now.
It's probably a smaller assembly diff than any of the other options --
the split between fast-path, slower fast-path, and slow-path already
exists due to the audit crap.  If we end up making more radical
changes later, then at worst we end up reverting part of the change to
ptrace.c.

--Andy

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
@ 2014-07-29 23:30       ` Andy Lutomirski
  0 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-29 23:30 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: H. Peter Anvin, linux-arch, X86 ML, Frederic Weisbecker,
	LSM List, Linux MIPS Mailing List, linux-arm-kernel,
	Alexei Starovoitov, Will Drewry, Kees Cook, linux-kernel

On Tue, Jul 29, 2014 at 1:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Jul 29, 2014 12:22 PM, "Oleg Nesterov" <oleg@redhat.com> wrote:
>>
>> Andy, to avoid the confusion: I am not trying to review this changes.
>> As you probably know my understanding of asm code in entry.S is very
>> limited.
>>
>> Just a couple of questions to ensure I understand this correctly.
>>
>> On 07/28, Andy Lutomirski wrote:
>> >
>> > This is both a cleanup and a speedup.  It reduces overhead due to
>> > installing a trivial seccomp filter by 87%.  The speedup comes from
>> > avoiding the full syscall tracing mechanism for filters that don't
>> > return SECCOMP_RET_TRACE.
>>
>> And only after I look at 5/5 I _seem_ to actually understand where
>> this speedup comes from.
>>
>> So. Currently tracesys: path always lead to "iret" after syscall, with
>> this change we can avoid it if phase_1() returns zero, correct?
>>
>> And, this also removes the special TIF_SYSCALL_AUDIT-only case in entry.S,
>> cool.
>>
>> I am wondering if we can do something similar with do_notify_resume() ?
>>
>>
>> Stupid question. To simplify, lets forget that syscall_trace_enter()
>> already returns the value. Can't we simplify the asm code if we do
>> not export 2 functions, but make syscall_trace_enter() return
>> "bool slow_path_is_needed". So that "tracesys:" could do
>>
>>         // pseudo code
>>
>> tracesys:
>>         SAVE_REST
>>         FIXUP_TOP_OF_STACK
>>
>>         call syscall_trace_enter
>>
>>         if (!slow_path_is_needed) {
>>                 addq REST_SKIP, %rsp
>>                 jmp system_call_fastpath
>>         }
>>
>>         ...
>>
>> ?
>>
>> Once again, I am just curious, it is not that I actually suggest to consider
>> this option.
>
> We could, but this would lose a decent amount of the speedup.  I could
> try it and benchmark it, but I'm guessing that the save and restore is
> kind of expensive.  This will make audit slower than it currently is,
> which may also annoy some people.  (Not me.)
>
> I'm also not convinced that it would be much simpler.  My code is currently:
>
> tracesys:
>     leaq -REST_SKIP(%rsp), %rdi
>     movq $AUDIT_ARCH_X86_64, %rsi
>     call syscall_trace_enter_phase1
>     test %rax, %rax
>     jnz tracesys_phase2        /* if needed, run the slow path */
>     LOAD_ARGS 0            /* else restore clobbered regs */
>     jmp system_call_fastpath    /*      and return to the fast path */
>
> tracesys_phase2:
>     SAVE_REST
>     FIXUP_TOP_OF_STACK %rdi
>     movq %rsp, %rdi
>     movq $AUDIT_ARCH_X86_64, %rsi
>     movq %rax,%rdx
>     call syscall_trace_enter_phase2
>
>     LOAD_ARGS ARGOFFSET, 1
>     RESTORE_REST
>
>     ... slow path here ...
>
> It would end up looking more like (totally untested):
>
> tracesys:
>     SAVE_REST
>     FIXUP_TOP_OF_STACK %rdi
>     mov %rsp, %rdi
>     movq $AUDIT_ARCH_X86_64, %rsi
>     call syscall_trace_enter
>     LOAD_ARGS
>     RESTORE_REST
>     test [whatever condition]
>     j[cond] system_call_fastpath
>
>     ... slow path here ...
>
> So it's a bit simpler.  Oddly, the ia32entry code doesn't have this
> multiple syscall path distinction.
>
> SAVE_REST is 6 movq instructions and a subq.  FIXUP_TOP_OF_STACK is 7
> movqs (and 8 if I ever get my way).  RESTORE_TOP_OF_STACK is 4.
> RESTORE_REST is 6 movqs and an adsq.  So we're talking about avoiding
> 21 movqs, and addq, and a subq.  That may be significant.  (And I
> suspect that the difference is much larger on platforms like arm64,
> but that's a separate issue.)

To put some more options on the table: there's an argument to be made
that the whole fast-path/slow-path split isn't worth it.  We could
unconditionally set up a full frame for all syscalls.  This means:

No FIXUP_TOP_OF_STACK.  Instead, the system_call entry sets up RSP,
SS, CS, RCX, and EFLAGS right away.  That's five stores for all
syscalls instead of two loads and five stores for syscalls that need
it.  But it also gets rid of RESTORE_TOP_OF_STACK completely.

No more bugs involving C code that assumes a full stack frame when no
such frame exists inside syscall code.

We could possibly remove a whole bunch of duplicated code.

The upshot would be simpler code, faster slow-path syscalls, and
slower fast-path syscalls (but probably not much slower).  On the
other hand, there's zero chance that this would be ready for 3.17.

I'd tend to advocate for keeping the approach in my patches for now.
It's probably a smaller assembly diff than any of the other options --
the split between fast-path, slower fast-path, and slow-path already
exists due to the audit crap.  If we end up making more radical
changes later, then at worst we end up reverting part of the change to
ptrace.c.

--Andy

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
@ 2014-07-29 23:30       ` Andy Lutomirski
  0 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-29 23:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 29, 2014 at 1:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Jul 29, 2014 12:22 PM, "Oleg Nesterov" <oleg@redhat.com> wrote:
>>
>> Andy, to avoid the confusion: I am not trying to review this changes.
>> As you probably know my understanding of asm code in entry.S is very
>> limited.
>>
>> Just a couple of questions to ensure I understand this correctly.
>>
>> On 07/28, Andy Lutomirski wrote:
>> >
>> > This is both a cleanup and a speedup.  It reduces overhead due to
>> > installing a trivial seccomp filter by 87%.  The speedup comes from
>> > avoiding the full syscall tracing mechanism for filters that don't
>> > return SECCOMP_RET_TRACE.
>>
>> And only after I look at 5/5 I _seem_ to actually understand where
>> this speedup comes from.
>>
>> So. Currently tracesys: path always lead to "iret" after syscall, with
>> this change we can avoid it if phase_1() returns zero, correct?
>>
>> And, this also removes the special TIF_SYSCALL_AUDIT-only case in entry.S,
>> cool.
>>
>> I am wondering if we can do something similar with do_notify_resume() ?
>>
>>
>> Stupid question. To simplify, lets forget that syscall_trace_enter()
>> already returns the value. Can't we simplify the asm code if we do
>> not export 2 functions, but make syscall_trace_enter() return
>> "bool slow_path_is_needed". So that "tracesys:" could do
>>
>>         // pseudo code
>>
>> tracesys:
>>         SAVE_REST
>>         FIXUP_TOP_OF_STACK
>>
>>         call syscall_trace_enter
>>
>>         if (!slow_path_is_needed) {
>>                 addq REST_SKIP, %rsp
>>                 jmp system_call_fastpath
>>         }
>>
>>         ...
>>
>> ?
>>
>> Once again, I am just curious, it is not that I actually suggest to consider
>> this option.
>
> We could, but this would lose a decent amount of the speedup.  I could
> try it and benchmark it, but I'm guessing that the save and restore is
> kind of expensive.  This will make audit slower than it currently is,
> which may also annoy some people.  (Not me.)
>
> I'm also not convinced that it would be much simpler.  My code is currently:
>
> tracesys:
>     leaq -REST_SKIP(%rsp), %rdi
>     movq $AUDIT_ARCH_X86_64, %rsi
>     call syscall_trace_enter_phase1
>     test %rax, %rax
>     jnz tracesys_phase2        /* if needed, run the slow path */
>     LOAD_ARGS 0            /* else restore clobbered regs */
>     jmp system_call_fastpath    /*      and return to the fast path */
>
> tracesys_phase2:
>     SAVE_REST
>     FIXUP_TOP_OF_STACK %rdi
>     movq %rsp, %rdi
>     movq $AUDIT_ARCH_X86_64, %rsi
>     movq %rax,%rdx
>     call syscall_trace_enter_phase2
>
>     LOAD_ARGS ARGOFFSET, 1
>     RESTORE_REST
>
>     ... slow path here ...
>
> It would end up looking more like (totally untested):
>
> tracesys:
>     SAVE_REST
>     FIXUP_TOP_OF_STACK %rdi
>     mov %rsp, %rdi
>     movq $AUDIT_ARCH_X86_64, %rsi
>     call syscall_trace_enter
>     LOAD_ARGS
>     RESTORE_REST
>     test [whatever condition]
>     j[cond] system_call_fastpath
>
>     ... slow path here ...
>
> So it's a bit simpler.  Oddly, the ia32entry code doesn't have this
> multiple syscall path distinction.
>
> SAVE_REST is 6 movq instructions and a subq.  FIXUP_TOP_OF_STACK is 7
> movqs (and 8 if I ever get my way).  RESTORE_TOP_OF_STACK is 4.
> RESTORE_REST is 6 movqs and an adsq.  So we're talking about avoiding
> 21 movqs, and addq, and a subq.  That may be significant.  (And I
> suspect that the difference is much larger on platforms like arm64,
> but that's a separate issue.)

To put some more options on the table: there's an argument to be made
that the whole fast-path/slow-path split isn't worth it.  We could
unconditionally set up a full frame for all syscalls.  This means:

No FIXUP_TOP_OF_STACK.  Instead, the system_call entry sets up RSP,
SS, CS, RCX, and EFLAGS right away.  That's five stores for all
syscalls instead of two loads and five stores for syscalls that need
it.  But it also gets rid of RESTORE_TOP_OF_STACK completely.

No more bugs involving C code that assumes a full stack frame when no
such frame exists inside syscall code.

We could possibly remove a whole bunch of duplicated code.

The upshot would be simpler code, faster slow-path syscalls, and
slower fast-path syscalls (but probably not much slower).  On the
other hand, there's zero chance that this would be ready for 3.17.

I'd tend to advocate for keeping the approach in my patches for now.
It's probably a smaller assembly diff than any of the other options --
the split between fast-path, slower fast-path, and slow-path already
exists due to the audit crap.  If we end up making more radical
changes later, then at worst we end up reverting part of the change to
ptrace.c.

--Andy

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
  2014-07-29 23:30       ` Andy Lutomirski
  (?)
@ 2014-07-30 15:32         ` Oleg Nesterov
  -1 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2014-07-30 15:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: H. Peter Anvin, linux-arch, X86 ML, Frederic Weisbecker,
	LSM List, Linux MIPS Mailing List, linux-arm-kernel,
	Alexei Starovoitov, Will Drewry, Kees Cook, linux-kernel

On 07/29, Andy Lutomirski wrote:
>
> > SAVE_REST is 6 movq instructions and a subq.  FIXUP_TOP_OF_STACK is 7
> > movqs (and 8 if I ever get my way).  RESTORE_TOP_OF_STACK is 4.
> > RESTORE_REST is 6 movqs and an adsq.  So we're talking about avoiding
> > 21 movqs, and addq, and a subq.  That may be significant.  (And I
> > suspect that the difference is much larger on platforms like arm64,
> > but that's a separate issue.)

OK, thanks. We could probably simplify the logic in phase1 + phase2 if
it was a single function though.

> To put some more options on the table: there's an argument to be made
> that the whole fast-path/slow-path split isn't worth it.  We could
> unconditionally set up a full frame for all syscalls.  This means:

Or, at least, can't we allocate the full frame and avoid "add/sub %rsp"?

> This means:
...
> On the
> other hand, there's zero chance that this would be ready for 3.17.
>
> I'd tend to advocate for keeping the approach in my patches for now.

Yes, sure, I didn't try to convince you to change this code. Thanks.

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
@ 2014-07-30 15:32         ` Oleg Nesterov
  0 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2014-07-30 15:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: H. Peter Anvin, linux-arch, X86 ML, Frederic Weisbecker,
	LSM List, Linux MIPS Mailing List, linux-arm-kernel,
	Alexei Starovoitov, Will Drewry, Kees Cook, linux-kernel

On 07/29, Andy Lutomirski wrote:
>
> > SAVE_REST is 6 movq instructions and a subq.  FIXUP_TOP_OF_STACK is 7
> > movqs (and 8 if I ever get my way).  RESTORE_TOP_OF_STACK is 4.
> > RESTORE_REST is 6 movqs and an adsq.  So we're talking about avoiding
> > 21 movqs, and addq, and a subq.  That may be significant.  (And I
> > suspect that the difference is much larger on platforms like arm64,
> > but that's a separate issue.)

OK, thanks. We could probably simplify the logic in phase1 + phase2 if
it was a single function though.

> To put some more options on the table: there's an argument to be made
> that the whole fast-path/slow-path split isn't worth it.  We could
> unconditionally set up a full frame for all syscalls.  This means:

Or, at least, can't we allocate the full frame and avoid "add/sub %rsp"?

> This means:
...
> On the
> other hand, there's zero chance that this would be ready for 3.17.
>
> I'd tend to advocate for keeping the approach in my patches for now.

Yes, sure, I didn't try to convince you to change this code. Thanks.

Oleg.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
@ 2014-07-30 15:32         ` Oleg Nesterov
  0 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2014-07-30 15:32 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/29, Andy Lutomirski wrote:
>
> > SAVE_REST is 6 movq instructions and a subq.  FIXUP_TOP_OF_STACK is 7
> > movqs (and 8 if I ever get my way).  RESTORE_TOP_OF_STACK is 4.
> > RESTORE_REST is 6 movqs and an adsq.  So we're talking about avoiding
> > 21 movqs, and addq, and a subq.  That may be significant.  (And I
> > suspect that the difference is much larger on platforms like arm64,
> > but that's a separate issue.)

OK, thanks. We could probably simplify the logic in phase1 + phase2 if
it was a single function though.

> To put some more options on the table: there's an argument to be made
> that the whole fast-path/slow-path split isn't worth it.  We could
> unconditionally set up a full frame for all syscalls.  This means:

Or, at least, can't we allocate the full frame and avoid "add/sub %rsp"?

> This means:
...
> On the
> other hand, there's zero chance that this would be ready for 3.17.
>
> I'd tend to advocate for keeping the approach in my patches for now.

Yes, sure, I didn't try to convince you to change this code. Thanks.

Oleg.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
  2014-07-29 19:32     ` Oleg Nesterov
@ 2014-07-30 16:43       ` Frederic Weisbecker
  -1 siblings, 0 replies; 62+ messages in thread
From: Frederic Weisbecker @ 2014-07-30 16:43 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, linux-kernel, Kees Cook, Will Drewry, x86,
	linux-arm-kernel, linux-mips, linux-arch, linux-security-module,
	Alexei Starovoitov, hpa

On Tue, Jul 29, 2014 at 09:32:32PM +0200, Oleg Nesterov wrote:
> On 07/28, Andy Lutomirski wrote:
> >
> > @@ -1449,7 +1449,12 @@ long syscall_trace_enter(struct pt_regs *regs)
> >  {
> >  	long ret = 0;
> >  
> > -	user_exit();
> > +	/*
> > +	 * If TIF_NOHZ is set, we are required to call user_exit() before
> > +	 * doing anything that could touch RCU.
> > +	 */
> > +	if (test_thread_flag(TIF_NOHZ))
> > +		user_exit();
> 
> Personally I still think this change just adds more confusion, but I leave
> this to you and Frederic.
> 
> It is not that "If TIF_NOHZ is set, we are required to call user_exit()", we
> need to call user_exit() just because we enter the kernel. TIF_NOHZ is just
> the implementation detail which triggers this slow path.
> 
> At least it should be correct, unless I am confused even more than I think.

Agreed, Perhaps the confusion is on the syscall_trace_enter() name which suggests
this is only about tracing? syscall_slowpath_enter() could be an alternative.
But that's still tracing in a general sense so...

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
@ 2014-07-30 16:43       ` Frederic Weisbecker
  0 siblings, 0 replies; 62+ messages in thread
From: Frederic Weisbecker @ 2014-07-30 16:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 29, 2014 at 09:32:32PM +0200, Oleg Nesterov wrote:
> On 07/28, Andy Lutomirski wrote:
> >
> > @@ -1449,7 +1449,12 @@ long syscall_trace_enter(struct pt_regs *regs)
> >  {
> >  	long ret = 0;
> >  
> > -	user_exit();
> > +	/*
> > +	 * If TIF_NOHZ is set, we are required to call user_exit() before
> > +	 * doing anything that could touch RCU.
> > +	 */
> > +	if (test_thread_flag(TIF_NOHZ))
> > +		user_exit();
> 
> Personally I still think this change just adds more confusion, but I leave
> this to you and Frederic.
> 
> It is not that "If TIF_NOHZ is set, we are required to call user_exit()", we
> need to call user_exit() just because we enter the kernel. TIF_NOHZ is just
> the implementation detail which triggers this slow path.
> 
> At least it should be correct, unless I am confused even more than I think.

Agreed, Perhaps the confusion is on the syscall_trace_enter() name which suggests
this is only about tracing? syscall_slowpath_enter() could be an alternative.
But that's still tracing in a general sense so...

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
  2014-07-29 23:30       ` Andy Lutomirski
  (?)
@ 2014-07-30 16:59         ` Frederic Weisbecker
  -1 siblings, 0 replies; 62+ messages in thread
From: Frederic Weisbecker @ 2014-07-30 16:59 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Oleg Nesterov, H. Peter Anvin, linux-arch, X86 ML, LSM List,
	Linux MIPS Mailing List, linux-arm-kernel, Alexei Starovoitov,
	Will Drewry, Kees Cook, linux-kernel

On Tue, Jul 29, 2014 at 04:30:58PM -0700, Andy Lutomirski wrote:
> On Tue, Jul 29, 2014 at 1:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> > On Jul 29, 2014 12:22 PM, "Oleg Nesterov" <oleg@redhat.com> wrote:
> >>
> >> Andy, to avoid the confusion: I am not trying to review this changes.
> >> As you probably know my understanding of asm code in entry.S is very
> >> limited.
> >>
> >> Just a couple of questions to ensure I understand this correctly.
> >>
> >> On 07/28, Andy Lutomirski wrote:
> >> >
> >> > This is both a cleanup and a speedup.  It reduces overhead due to
> >> > installing a trivial seccomp filter by 87%.  The speedup comes from
> >> > avoiding the full syscall tracing mechanism for filters that don't
> >> > return SECCOMP_RET_TRACE.
> >>
> >> And only after I look at 5/5 I _seem_ to actually understand where
> >> this speedup comes from.
> >>
> >> So. Currently tracesys: path always lead to "iret" after syscall, with
> >> this change we can avoid it if phase_1() returns zero, correct?
> >>
> >> And, this also removes the special TIF_SYSCALL_AUDIT-only case in entry.S,
> >> cool.
> >>
> >> I am wondering if we can do something similar with do_notify_resume() ?
> >>
> >>
> >> Stupid question. To simplify, lets forget that syscall_trace_enter()
> >> already returns the value. Can't we simplify the asm code if we do
> >> not export 2 functions, but make syscall_trace_enter() return
> >> "bool slow_path_is_needed". So that "tracesys:" could do
> >>
> >>         // pseudo code
> >>
> >> tracesys:
> >>         SAVE_REST
> >>         FIXUP_TOP_OF_STACK
> >>
> >>         call syscall_trace_enter
> >>
> >>         if (!slow_path_is_needed) {
> >>                 addq REST_SKIP, %rsp
> >>                 jmp system_call_fastpath
> >>         }
> >>
> >>         ...
> >>
> >> ?
> >>
> >> Once again, I am just curious, it is not that I actually suggest to consider
> >> this option.
> >
> > We could, but this would lose a decent amount of the speedup.  I could
> > try it and benchmark it, but I'm guessing that the save and restore is
> > kind of expensive.  This will make audit slower than it currently is,
> > which may also annoy some people.  (Not me.)
> >
> > I'm also not convinced that it would be much simpler.  My code is currently:
> >
> > tracesys:
> >     leaq -REST_SKIP(%rsp), %rdi
> >     movq $AUDIT_ARCH_X86_64, %rsi
> >     call syscall_trace_enter_phase1
> >     test %rax, %rax
> >     jnz tracesys_phase2        /* if needed, run the slow path */
> >     LOAD_ARGS 0            /* else restore clobbered regs */
> >     jmp system_call_fastpath    /*      and return to the fast path */
> >
> > tracesys_phase2:
> >     SAVE_REST
> >     FIXUP_TOP_OF_STACK %rdi
> >     movq %rsp, %rdi
> >     movq $AUDIT_ARCH_X86_64, %rsi
> >     movq %rax,%rdx
> >     call syscall_trace_enter_phase2
> >
> >     LOAD_ARGS ARGOFFSET, 1
> >     RESTORE_REST
> >
> >     ... slow path here ...
> >
> > It would end up looking more like (totally untested):
> >
> > tracesys:
> >     SAVE_REST
> >     FIXUP_TOP_OF_STACK %rdi
> >     mov %rsp, %rdi
> >     movq $AUDIT_ARCH_X86_64, %rsi
> >     call syscall_trace_enter
> >     LOAD_ARGS
> >     RESTORE_REST
> >     test [whatever condition]
> >     j[cond] system_call_fastpath
> >
> >     ... slow path here ...
> >
> > So it's a bit simpler.  Oddly, the ia32entry code doesn't have this
> > multiple syscall path distinction.
> >
> > SAVE_REST is 6 movq instructions and a subq.  FIXUP_TOP_OF_STACK is 7
> > movqs (and 8 if I ever get my way).  RESTORE_TOP_OF_STACK is 4.
> > RESTORE_REST is 6 movqs and an adsq.  So we're talking about avoiding
> > 21 movqs, and addq, and a subq.  That may be significant.  (And I
> > suspect that the difference is much larger on platforms like arm64,
> > but that's a separate issue.)
> 
> To put some more options on the table: there's an argument to be made
> that the whole fast-path/slow-path split isn't worth it.  We could
> unconditionally set up a full frame for all syscalls.  This means:
> 
> No FIXUP_TOP_OF_STACK.  Instead, the system_call entry sets up RSP,
> SS, CS, RCX, and EFLAGS right away.  That's five stores for all
> syscalls instead of two loads and five stores for syscalls that need
> it.  But it also gets rid of RESTORE_TOP_OF_STACK completely.
> 
> No more bugs involving C code that assumes a full stack frame when no
> such frame exists inside syscall code.
> 
> We could possibly remove a whole bunch of duplicated code.
> 
> The upshot would be simpler code, faster slow-path syscalls, and
> slower fast-path syscalls (but probably not much slower).  On the
> other hand, there's zero chance that this would be ready for 3.17.

Indeed.

If we ever take that direction (ie: remove that partial frame optimization),
there is that common part that we can find in many archs when they return
from syscalls, exceptions or interrupts which could be more or less consolidated as:

void sysret_check(struct pt_regs *regs)
{
          while(test_thread_flag(TIF_ALLWORK_MASK)) {
              int resched = need_resched();

              local_irq_enable();
              if (resched)
                  schedule();
              else
                  do_notify_resume(regs);
              local_irq_disable()
           }
}

But well, probably the syscall fastpath is still worth it.              

> 
> I'd tend to advocate for keeping the approach in my patches for now.
> It's probably a smaller assembly diff than any of the other options --
> the split between fast-path, slower fast-path, and slow-path already
> exists due to the audit crap.  If we end up making more radical
> changes later, then at worst we end up reverting part of the change to
> ptrace.c.
> 
> --Andy

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
@ 2014-07-30 16:59         ` Frederic Weisbecker
  0 siblings, 0 replies; 62+ messages in thread
From: Frederic Weisbecker @ 2014-07-30 16:59 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Oleg Nesterov, H. Peter Anvin, linux-arch, X86 ML, LSM List,
	Linux MIPS Mailing List, linux-arm-kernel, Alexei Starovoitov,
	Will Drewry, Kees Cook, linux-kernel

On Tue, Jul 29, 2014 at 04:30:58PM -0700, Andy Lutomirski wrote:
> On Tue, Jul 29, 2014 at 1:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> > On Jul 29, 2014 12:22 PM, "Oleg Nesterov" <oleg@redhat.com> wrote:
> >>
> >> Andy, to avoid the confusion: I am not trying to review this changes.
> >> As you probably know my understanding of asm code in entry.S is very
> >> limited.
> >>
> >> Just a couple of questions to ensure I understand this correctly.
> >>
> >> On 07/28, Andy Lutomirski wrote:
> >> >
> >> > This is both a cleanup and a speedup.  It reduces overhead due to
> >> > installing a trivial seccomp filter by 87%.  The speedup comes from
> >> > avoiding the full syscall tracing mechanism for filters that don't
> >> > return SECCOMP_RET_TRACE.
> >>
> >> And only after I look at 5/5 I _seem_ to actually understand where
> >> this speedup comes from.
> >>
> >> So. Currently tracesys: path always lead to "iret" after syscall, with
> >> this change we can avoid it if phase_1() returns zero, correct?
> >>
> >> And, this also removes the special TIF_SYSCALL_AUDIT-only case in entry.S,
> >> cool.
> >>
> >> I am wondering if we can do something similar with do_notify_resume() ?
> >>
> >>
> >> Stupid question. To simplify, lets forget that syscall_trace_enter()
> >> already returns the value. Can't we simplify the asm code if we do
> >> not export 2 functions, but make syscall_trace_enter() return
> >> "bool slow_path_is_needed". So that "tracesys:" could do
> >>
> >>         // pseudo code
> >>
> >> tracesys:
> >>         SAVE_REST
> >>         FIXUP_TOP_OF_STACK
> >>
> >>         call syscall_trace_enter
> >>
> >>         if (!slow_path_is_needed) {
> >>                 addq REST_SKIP, %rsp
> >>                 jmp system_call_fastpath
> >>         }
> >>
> >>         ...
> >>
> >> ?
> >>
> >> Once again, I am just curious, it is not that I actually suggest to consider
> >> this option.
> >
> > We could, but this would lose a decent amount of the speedup.  I could
> > try it and benchmark it, but I'm guessing that the save and restore is
> > kind of expensive.  This will make audit slower than it currently is,
> > which may also annoy some people.  (Not me.)
> >
> > I'm also not convinced that it would be much simpler.  My code is currently:
> >
> > tracesys:
> >     leaq -REST_SKIP(%rsp), %rdi
> >     movq $AUDIT_ARCH_X86_64, %rsi
> >     call syscall_trace_enter_phase1
> >     test %rax, %rax
> >     jnz tracesys_phase2        /* if needed, run the slow path */
> >     LOAD_ARGS 0            /* else restore clobbered regs */
> >     jmp system_call_fastpath    /*      and return to the fast path */
> >
> > tracesys_phase2:
> >     SAVE_REST
> >     FIXUP_TOP_OF_STACK %rdi
> >     movq %rsp, %rdi
> >     movq $AUDIT_ARCH_X86_64, %rsi
> >     movq %rax,%rdx
> >     call syscall_trace_enter_phase2
> >
> >     LOAD_ARGS ARGOFFSET, 1
> >     RESTORE_REST
> >
> >     ... slow path here ...
> >
> > It would end up looking more like (totally untested):
> >
> > tracesys:
> >     SAVE_REST
> >     FIXUP_TOP_OF_STACK %rdi
> >     mov %rsp, %rdi
> >     movq $AUDIT_ARCH_X86_64, %rsi
> >     call syscall_trace_enter
> >     LOAD_ARGS
> >     RESTORE_REST
> >     test [whatever condition]
> >     j[cond] system_call_fastpath
> >
> >     ... slow path here ...
> >
> > So it's a bit simpler.  Oddly, the ia32entry code doesn't have this
> > multiple syscall path distinction.
> >
> > SAVE_REST is 6 movq instructions and a subq.  FIXUP_TOP_OF_STACK is 7
> > movqs (and 8 if I ever get my way).  RESTORE_TOP_OF_STACK is 4.
> > RESTORE_REST is 6 movqs and an adsq.  So we're talking about avoiding
> > 21 movqs, and addq, and a subq.  That may be significant.  (And I
> > suspect that the difference is much larger on platforms like arm64,
> > but that's a separate issue.)
> 
> To put some more options on the table: there's an argument to be made
> that the whole fast-path/slow-path split isn't worth it.  We could
> unconditionally set up a full frame for all syscalls.  This means:
> 
> No FIXUP_TOP_OF_STACK.  Instead, the system_call entry sets up RSP,
> SS, CS, RCX, and EFLAGS right away.  That's five stores for all
> syscalls instead of two loads and five stores for syscalls that need
> it.  But it also gets rid of RESTORE_TOP_OF_STACK completely.
> 
> No more bugs involving C code that assumes a full stack frame when no
> such frame exists inside syscall code.
> 
> We could possibly remove a whole bunch of duplicated code.
> 
> The upshot would be simpler code, faster slow-path syscalls, and
> slower fast-path syscalls (but probably not much slower).  On the
> other hand, there's zero chance that this would be ready for 3.17.

Indeed.

If we ever take that direction (ie: remove that partial frame optimization),
there is that common part that we can find in many archs when they return
from syscalls, exceptions or interrupts which could be more or less consolidated as:

void sysret_check(struct pt_regs *regs)
{
          while(test_thread_flag(TIF_ALLWORK_MASK)) {
              int resched = need_resched();

              local_irq_enable();
              if (resched)
                  schedule();
              else
                  do_notify_resume(regs);
              local_irq_disable()
           }
}

But well, probably the syscall fastpath is still worth it.              

> 
> I'd tend to advocate for keeping the approach in my patches for now.
> It's probably a smaller assembly diff than any of the other options --
> the split between fast-path, slower fast-path, and slow-path already
> exists due to the audit crap.  If we end up making more radical
> changes later, then at worst we end up reverting part of the change to
> ptrace.c.
> 
> --Andy

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
@ 2014-07-30 16:59         ` Frederic Weisbecker
  0 siblings, 0 replies; 62+ messages in thread
From: Frederic Weisbecker @ 2014-07-30 16:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 29, 2014 at 04:30:58PM -0700, Andy Lutomirski wrote:
> On Tue, Jul 29, 2014 at 1:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> > On Jul 29, 2014 12:22 PM, "Oleg Nesterov" <oleg@redhat.com> wrote:
> >>
> >> Andy, to avoid the confusion: I am not trying to review this changes.
> >> As you probably know my understanding of asm code in entry.S is very
> >> limited.
> >>
> >> Just a couple of questions to ensure I understand this correctly.
> >>
> >> On 07/28, Andy Lutomirski wrote:
> >> >
> >> > This is both a cleanup and a speedup.  It reduces overhead due to
> >> > installing a trivial seccomp filter by 87%.  The speedup comes from
> >> > avoiding the full syscall tracing mechanism for filters that don't
> >> > return SECCOMP_RET_TRACE.
> >>
> >> And only after I look at 5/5 I _seem_ to actually understand where
> >> this speedup comes from.
> >>
> >> So. Currently tracesys: path always lead to "iret" after syscall, with
> >> this change we can avoid it if phase_1() returns zero, correct?
> >>
> >> And, this also removes the special TIF_SYSCALL_AUDIT-only case in entry.S,
> >> cool.
> >>
> >> I am wondering if we can do something similar with do_notify_resume() ?
> >>
> >>
> >> Stupid question. To simplify, lets forget that syscall_trace_enter()
> >> already returns the value. Can't we simplify the asm code if we do
> >> not export 2 functions, but make syscall_trace_enter() return
> >> "bool slow_path_is_needed". So that "tracesys:" could do
> >>
> >>         // pseudo code
> >>
> >> tracesys:
> >>         SAVE_REST
> >>         FIXUP_TOP_OF_STACK
> >>
> >>         call syscall_trace_enter
> >>
> >>         if (!slow_path_is_needed) {
> >>                 addq REST_SKIP, %rsp
> >>                 jmp system_call_fastpath
> >>         }
> >>
> >>         ...
> >>
> >> ?
> >>
> >> Once again, I am just curious, it is not that I actually suggest to consider
> >> this option.
> >
> > We could, but this would lose a decent amount of the speedup.  I could
> > try it and benchmark it, but I'm guessing that the save and restore is
> > kind of expensive.  This will make audit slower than it currently is,
> > which may also annoy some people.  (Not me.)
> >
> > I'm also not convinced that it would be much simpler.  My code is currently:
> >
> > tracesys:
> >     leaq -REST_SKIP(%rsp), %rdi
> >     movq $AUDIT_ARCH_X86_64, %rsi
> >     call syscall_trace_enter_phase1
> >     test %rax, %rax
> >     jnz tracesys_phase2        /* if needed, run the slow path */
> >     LOAD_ARGS 0            /* else restore clobbered regs */
> >     jmp system_call_fastpath    /*      and return to the fast path */
> >
> > tracesys_phase2:
> >     SAVE_REST
> >     FIXUP_TOP_OF_STACK %rdi
> >     movq %rsp, %rdi
> >     movq $AUDIT_ARCH_X86_64, %rsi
> >     movq %rax,%rdx
> >     call syscall_trace_enter_phase2
> >
> >     LOAD_ARGS ARGOFFSET, 1
> >     RESTORE_REST
> >
> >     ... slow path here ...
> >
> > It would end up looking more like (totally untested):
> >
> > tracesys:
> >     SAVE_REST
> >     FIXUP_TOP_OF_STACK %rdi
> >     mov %rsp, %rdi
> >     movq $AUDIT_ARCH_X86_64, %rsi
> >     call syscall_trace_enter
> >     LOAD_ARGS
> >     RESTORE_REST
> >     test [whatever condition]
> >     j[cond] system_call_fastpath
> >
> >     ... slow path here ...
> >
> > So it's a bit simpler.  Oddly, the ia32entry code doesn't have this
> > multiple syscall path distinction.
> >
> > SAVE_REST is 6 movq instructions and a subq.  FIXUP_TOP_OF_STACK is 7
> > movqs (and 8 if I ever get my way).  RESTORE_TOP_OF_STACK is 4.
> > RESTORE_REST is 6 movqs and an adsq.  So we're talking about avoiding
> > 21 movqs, and addq, and a subq.  That may be significant.  (And I
> > suspect that the difference is much larger on platforms like arm64,
> > but that's a separate issue.)
> 
> To put some more options on the table: there's an argument to be made
> that the whole fast-path/slow-path split isn't worth it.  We could
> unconditionally set up a full frame for all syscalls.  This means:
> 
> No FIXUP_TOP_OF_STACK.  Instead, the system_call entry sets up RSP,
> SS, CS, RCX, and EFLAGS right away.  That's five stores for all
> syscalls instead of two loads and five stores for syscalls that need
> it.  But it also gets rid of RESTORE_TOP_OF_STACK completely.
> 
> No more bugs involving C code that assumes a full stack frame when no
> such frame exists inside syscall code.
> 
> We could possibly remove a whole bunch of duplicated code.
> 
> The upshot would be simpler code, faster slow-path syscalls, and
> slower fast-path syscalls (but probably not much slower).  On the
> other hand, there's zero chance that this would be ready for 3.17.

Indeed.

If we ever take that direction (ie: remove that partial frame optimization),
there is that common part that we can find in many archs when they return
from syscalls, exceptions or interrupts which could be more or less consolidated as:

void sysret_check(struct pt_regs *regs)
{
          while(test_thread_flag(TIF_ALLWORK_MASK)) {
              int resched = need_resched();

              local_irq_enable();
              if (resched)
                  schedule();
              else
                  do_notify_resume(regs);
              local_irq_disable()
           }
}

But well, probably the syscall fastpath is still worth it.              

> 
> I'd tend to advocate for keeping the approach in my patches for now.
> It's probably a smaller assembly diff than any of the other options --
> the split between fast-path, slower fast-path, and slow-path already
> exists due to the audit crap.  If we end up making more radical
> changes later, then at worst we end up reverting part of the change to
> ptrace.c.
> 
> --Andy

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
  2014-07-30 16:43       ` Frederic Weisbecker
  (?)
@ 2014-07-30 17:23         ` Andy Lutomirski
  -1 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-30 17:23 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Oleg Nesterov, linux-kernel, Kees Cook, Will Drewry, X86 ML,
	linux-arm-kernel, Linux MIPS Mailing List, linux-arch, LSM List,
	Alexei Starovoitov, H. Peter Anvin

On Wed, Jul 30, 2014 at 9:43 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Tue, Jul 29, 2014 at 09:32:32PM +0200, Oleg Nesterov wrote:
>> On 07/28, Andy Lutomirski wrote:
>> >
>> > @@ -1449,7 +1449,12 @@ long syscall_trace_enter(struct pt_regs *regs)
>> >  {
>> >     long ret = 0;
>> >
>> > -   user_exit();
>> > +   /*
>> > +    * If TIF_NOHZ is set, we are required to call user_exit() before
>> > +    * doing anything that could touch RCU.
>> > +    */
>> > +   if (test_thread_flag(TIF_NOHZ))
>> > +           user_exit();
>>
>> Personally I still think this change just adds more confusion, but I leave
>> this to you and Frederic.
>>
>> It is not that "If TIF_NOHZ is set, we are required to call user_exit()", we
>> need to call user_exit() just because we enter the kernel. TIF_NOHZ is just
>> the implementation detail which triggers this slow path.
>>
>> At least it should be correct, unless I am confused even more than I think.
>
> Agreed, Perhaps the confusion is on the syscall_trace_enter() name which suggests
> this is only about tracing? syscall_slowpath_enter() could be an alternative.
> But that's still tracing in a general sense so...

At the end of the day, the syscall slowpath code calls a bunch of
functions depending on what TIF_XYZ flags are set.  As long as it's
structured like "if (TIF_A) do_a(); if (TIF_B) do_b();" or something
like that, it's comprehensible.  But once random functions with no
explicit flag checks or comments start showing up, it gets confusing.

If it's indeed all-or-nothing, I could remove the check and add a
comment.  But please keep in mind that, currently, the slow path is
*slow*, and my patches only improve the entry case.  So enabling
context tracking on every task will hurt.

--Andy

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
@ 2014-07-30 17:23         ` Andy Lutomirski
  0 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-30 17:23 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Oleg Nesterov, linux-kernel, Kees Cook, Will Drewry, X86 ML,
	linux-arm-kernel, Linux MIPS Mailing List, linux-arch, LSM List,
	Alexei Starovoitov, H. Peter Anvin

On Wed, Jul 30, 2014 at 9:43 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Tue, Jul 29, 2014 at 09:32:32PM +0200, Oleg Nesterov wrote:
>> On 07/28, Andy Lutomirski wrote:
>> >
>> > @@ -1449,7 +1449,12 @@ long syscall_trace_enter(struct pt_regs *regs)
>> >  {
>> >     long ret = 0;
>> >
>> > -   user_exit();
>> > +   /*
>> > +    * If TIF_NOHZ is set, we are required to call user_exit() before
>> > +    * doing anything that could touch RCU.
>> > +    */
>> > +   if (test_thread_flag(TIF_NOHZ))
>> > +           user_exit();
>>
>> Personally I still think this change just adds more confusion, but I leave
>> this to you and Frederic.
>>
>> It is not that "If TIF_NOHZ is set, we are required to call user_exit()", we
>> need to call user_exit() just because we enter the kernel. TIF_NOHZ is just
>> the implementation detail which triggers this slow path.
>>
>> At least it should be correct, unless I am confused even more than I think.
>
> Agreed, Perhaps the confusion is on the syscall_trace_enter() name which suggests
> this is only about tracing? syscall_slowpath_enter() could be an alternative.
> But that's still tracing in a general sense so...

At the end of the day, the syscall slowpath code calls a bunch of
functions depending on what TIF_XYZ flags are set.  As long as it's
structured like "if (TIF_A) do_a(); if (TIF_B) do_b();" or something
like that, it's comprehensible.  But once random functions with no
explicit flag checks or comments start showing up, it gets confusing.

If it's indeed all-or-nothing, I could remove the check and add a
comment.  But please keep in mind that, currently, the slow path is
*slow*, and my patches only improve the entry case.  So enabling
context tracking on every task will hurt.

--Andy

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
@ 2014-07-30 17:23         ` Andy Lutomirski
  0 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-30 17:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 30, 2014 at 9:43 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Tue, Jul 29, 2014 at 09:32:32PM +0200, Oleg Nesterov wrote:
>> On 07/28, Andy Lutomirski wrote:
>> >
>> > @@ -1449,7 +1449,12 @@ long syscall_trace_enter(struct pt_regs *regs)
>> >  {
>> >     long ret = 0;
>> >
>> > -   user_exit();
>> > +   /*
>> > +    * If TIF_NOHZ is set, we are required to call user_exit() before
>> > +    * doing anything that could touch RCU.
>> > +    */
>> > +   if (test_thread_flag(TIF_NOHZ))
>> > +           user_exit();
>>
>> Personally I still think this change just adds more confusion, but I leave
>> this to you and Frederic.
>>
>> It is not that "If TIF_NOHZ is set, we are required to call user_exit()", we
>> need to call user_exit() just because we enter the kernel. TIF_NOHZ is just
>> the implementation detail which triggers this slow path.
>>
>> At least it should be correct, unless I am confused even more than I think.
>
> Agreed, Perhaps the confusion is on the syscall_trace_enter() name which suggests
> this is only about tracing? syscall_slowpath_enter() could be an alternative.
> But that's still tracing in a general sense so...

At the end of the day, the syscall slowpath code calls a bunch of
functions depending on what TIF_XYZ flags are set.  As long as it's
structured like "if (TIF_A) do_a(); if (TIF_B) do_b();" or something
like that, it's comprehensible.  But once random functions with no
explicit flag checks or comments start showing up, it gets confusing.

If it's indeed all-or-nothing, I could remove the check and add a
comment.  But please keep in mind that, currently, the slow path is
*slow*, and my patches only improve the entry case.  So enabling
context tracking on every task will hurt.

--Andy

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
  2014-07-30 16:59         ` Frederic Weisbecker
  (?)
@ 2014-07-30 17:25           ` Andy Lutomirski
  -1 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-30 17:25 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Oleg Nesterov, H. Peter Anvin, linux-arch, X86 ML, LSM List,
	Linux MIPS Mailing List, linux-arm-kernel, Alexei Starovoitov,
	Will Drewry, Kees Cook, linux-kernel

On Wed, Jul 30, 2014 at 9:59 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Tue, Jul 29, 2014 at 04:30:58PM -0700, Andy Lutomirski wrote:
>> On Tue, Jul 29, 2014 at 1:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> > On Jul 29, 2014 12:22 PM, "Oleg Nesterov" <oleg@redhat.com> wrote:
>> >>
>> >> Andy, to avoid the confusion: I am not trying to review this changes.
>> >> As you probably know my understanding of asm code in entry.S is very
>> >> limited.
>> >>
>> >> Just a couple of questions to ensure I understand this correctly.
>> >>
>> >> On 07/28, Andy Lutomirski wrote:
>> >> >
>> >> > This is both a cleanup and a speedup.  It reduces overhead due to
>> >> > installing a trivial seccomp filter by 87%.  The speedup comes from
>> >> > avoiding the full syscall tracing mechanism for filters that don't
>> >> > return SECCOMP_RET_TRACE.
>> >>
>> >> And only after I look at 5/5 I _seem_ to actually understand where
>> >> this speedup comes from.
>> >>
>> >> So. Currently tracesys: path always lead to "iret" after syscall, with
>> >> this change we can avoid it if phase_1() returns zero, correct?
>> >>
>> >> And, this also removes the special TIF_SYSCALL_AUDIT-only case in entry.S,
>> >> cool.
>> >>
>> >> I am wondering if we can do something similar with do_notify_resume() ?
>> >>
>> >>
>> >> Stupid question. To simplify, lets forget that syscall_trace_enter()
>> >> already returns the value. Can't we simplify the asm code if we do
>> >> not export 2 functions, but make syscall_trace_enter() return
>> >> "bool slow_path_is_needed". So that "tracesys:" could do
>> >>
>> >>         // pseudo code
>> >>
>> >> tracesys:
>> >>         SAVE_REST
>> >>         FIXUP_TOP_OF_STACK
>> >>
>> >>         call syscall_trace_enter
>> >>
>> >>         if (!slow_path_is_needed) {
>> >>                 addq REST_SKIP, %rsp
>> >>                 jmp system_call_fastpath
>> >>         }
>> >>
>> >>         ...
>> >>
>> >> ?
>> >>
>> >> Once again, I am just curious, it is not that I actually suggest to consider
>> >> this option.
>> >
>> > We could, but this would lose a decent amount of the speedup.  I could
>> > try it and benchmark it, but I'm guessing that the save and restore is
>> > kind of expensive.  This will make audit slower than it currently is,
>> > which may also annoy some people.  (Not me.)
>> >
>> > I'm also not convinced that it would be much simpler.  My code is currently:
>> >
>> > tracesys:
>> >     leaq -REST_SKIP(%rsp), %rdi
>> >     movq $AUDIT_ARCH_X86_64, %rsi
>> >     call syscall_trace_enter_phase1
>> >     test %rax, %rax
>> >     jnz tracesys_phase2        /* if needed, run the slow path */
>> >     LOAD_ARGS 0            /* else restore clobbered regs */
>> >     jmp system_call_fastpath    /*      and return to the fast path */
>> >
>> > tracesys_phase2:
>> >     SAVE_REST
>> >     FIXUP_TOP_OF_STACK %rdi
>> >     movq %rsp, %rdi
>> >     movq $AUDIT_ARCH_X86_64, %rsi
>> >     movq %rax,%rdx
>> >     call syscall_trace_enter_phase2
>> >
>> >     LOAD_ARGS ARGOFFSET, 1
>> >     RESTORE_REST
>> >
>> >     ... slow path here ...
>> >
>> > It would end up looking more like (totally untested):
>> >
>> > tracesys:
>> >     SAVE_REST
>> >     FIXUP_TOP_OF_STACK %rdi
>> >     mov %rsp, %rdi
>> >     movq $AUDIT_ARCH_X86_64, %rsi
>> >     call syscall_trace_enter
>> >     LOAD_ARGS
>> >     RESTORE_REST
>> >     test [whatever condition]
>> >     j[cond] system_call_fastpath
>> >
>> >     ... slow path here ...
>> >
>> > So it's a bit simpler.  Oddly, the ia32entry code doesn't have this
>> > multiple syscall path distinction.
>> >
>> > SAVE_REST is 6 movq instructions and a subq.  FIXUP_TOP_OF_STACK is 7
>> > movqs (and 8 if I ever get my way).  RESTORE_TOP_OF_STACK is 4.
>> > RESTORE_REST is 6 movqs and an adsq.  So we're talking about avoiding
>> > 21 movqs, and addq, and a subq.  That may be significant.  (And I
>> > suspect that the difference is much larger on platforms like arm64,
>> > but that's a separate issue.)
>>
>> To put some more options on the table: there's an argument to be made
>> that the whole fast-path/slow-path split isn't worth it.  We could
>> unconditionally set up a full frame for all syscalls.  This means:
>>
>> No FIXUP_TOP_OF_STACK.  Instead, the system_call entry sets up RSP,
>> SS, CS, RCX, and EFLAGS right away.  That's five stores for all
>> syscalls instead of two loads and five stores for syscalls that need
>> it.  But it also gets rid of RESTORE_TOP_OF_STACK completely.
>>
>> No more bugs involving C code that assumes a full stack frame when no
>> such frame exists inside syscall code.
>>
>> We could possibly remove a whole bunch of duplicated code.
>>
>> The upshot would be simpler code, faster slow-path syscalls, and
>> slower fast-path syscalls (but probably not much slower).  On the
>> other hand, there's zero chance that this would be ready for 3.17.
>
> Indeed.
>
> If we ever take that direction (ie: remove that partial frame optimization),
> there is that common part that we can find in many archs when they return
> from syscalls, exceptions or interrupts which could be more or less consolidated as:
>
> void sysret_check(struct pt_regs *regs)
> {
>           while(test_thread_flag(TIF_ALLWORK_MASK)) {
>               int resched = need_resched();
>
>               local_irq_enable();
>               if (resched)
>                   schedule();
>               else
>                   do_notify_resume(regs);
>               local_irq_disable()
>            }
> }
>
> But well, probably the syscall fastpath is still worth it.

And yet x86_64 has this code implemented in assembly even in the
slowpath.  Go figure.

--Andy

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
@ 2014-07-30 17:25           ` Andy Lutomirski
  0 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-30 17:25 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Oleg Nesterov, H. Peter Anvin, linux-arch, X86 ML, LSM List,
	Linux MIPS Mailing List, linux-arm-kernel, Alexei Starovoitov,
	Will Drewry, Kees Cook, linux-kernel

On Wed, Jul 30, 2014 at 9:59 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Tue, Jul 29, 2014 at 04:30:58PM -0700, Andy Lutomirski wrote:
>> On Tue, Jul 29, 2014 at 1:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> > On Jul 29, 2014 12:22 PM, "Oleg Nesterov" <oleg@redhat.com> wrote:
>> >>
>> >> Andy, to avoid the confusion: I am not trying to review this changes.
>> >> As you probably know my understanding of asm code in entry.S is very
>> >> limited.
>> >>
>> >> Just a couple of questions to ensure I understand this correctly.
>> >>
>> >> On 07/28, Andy Lutomirski wrote:
>> >> >
>> >> > This is both a cleanup and a speedup.  It reduces overhead due to
>> >> > installing a trivial seccomp filter by 87%.  The speedup comes from
>> >> > avoiding the full syscall tracing mechanism for filters that don't
>> >> > return SECCOMP_RET_TRACE.
>> >>
>> >> And only after I look at 5/5 I _seem_ to actually understand where
>> >> this speedup comes from.
>> >>
>> >> So. Currently tracesys: path always lead to "iret" after syscall, with
>> >> this change we can avoid it if phase_1() returns zero, correct?
>> >>
>> >> And, this also removes the special TIF_SYSCALL_AUDIT-only case in entry.S,
>> >> cool.
>> >>
>> >> I am wondering if we can do something similar with do_notify_resume() ?
>> >>
>> >>
>> >> Stupid question. To simplify, lets forget that syscall_trace_enter()
>> >> already returns the value. Can't we simplify the asm code if we do
>> >> not export 2 functions, but make syscall_trace_enter() return
>> >> "bool slow_path_is_needed". So that "tracesys:" could do
>> >>
>> >>         // pseudo code
>> >>
>> >> tracesys:
>> >>         SAVE_REST
>> >>         FIXUP_TOP_OF_STACK
>> >>
>> >>         call syscall_trace_enter
>> >>
>> >>         if (!slow_path_is_needed) {
>> >>                 addq REST_SKIP, %rsp
>> >>                 jmp system_call_fastpath
>> >>         }
>> >>
>> >>         ...
>> >>
>> >> ?
>> >>
>> >> Once again, I am just curious, it is not that I actually suggest to consider
>> >> this option.
>> >
>> > We could, but this would lose a decent amount of the speedup.  I could
>> > try it and benchmark it, but I'm guessing that the save and restore is
>> > kind of expensive.  This will make audit slower than it currently is,
>> > which may also annoy some people.  (Not me.)
>> >
>> > I'm also not convinced that it would be much simpler.  My code is currently:
>> >
>> > tracesys:
>> >     leaq -REST_SKIP(%rsp), %rdi
>> >     movq $AUDIT_ARCH_X86_64, %rsi
>> >     call syscall_trace_enter_phase1
>> >     test %rax, %rax
>> >     jnz tracesys_phase2        /* if needed, run the slow path */
>> >     LOAD_ARGS 0            /* else restore clobbered regs */
>> >     jmp system_call_fastpath    /*      and return to the fast path */
>> >
>> > tracesys_phase2:
>> >     SAVE_REST
>> >     FIXUP_TOP_OF_STACK %rdi
>> >     movq %rsp, %rdi
>> >     movq $AUDIT_ARCH_X86_64, %rsi
>> >     movq %rax,%rdx
>> >     call syscall_trace_enter_phase2
>> >
>> >     LOAD_ARGS ARGOFFSET, 1
>> >     RESTORE_REST
>> >
>> >     ... slow path here ...
>> >
>> > It would end up looking more like (totally untested):
>> >
>> > tracesys:
>> >     SAVE_REST
>> >     FIXUP_TOP_OF_STACK %rdi
>> >     mov %rsp, %rdi
>> >     movq $AUDIT_ARCH_X86_64, %rsi
>> >     call syscall_trace_enter
>> >     LOAD_ARGS
>> >     RESTORE_REST
>> >     test [whatever condition]
>> >     j[cond] system_call_fastpath
>> >
>> >     ... slow path here ...
>> >
>> > So it's a bit simpler.  Oddly, the ia32entry code doesn't have this
>> > multiple syscall path distinction.
>> >
>> > SAVE_REST is 6 movq instructions and a subq.  FIXUP_TOP_OF_STACK is 7
>> > movqs (and 8 if I ever get my way).  RESTORE_TOP_OF_STACK is 4.
>> > RESTORE_REST is 6 movqs and an adsq.  So we're talking about avoiding
>> > 21 movqs, and addq, and a subq.  That may be significant.  (And I
>> > suspect that the difference is much larger on platforms like arm64,
>> > but that's a separate issue.)
>>
>> To put some more options on the table: there's an argument to be made
>> that the whole fast-path/slow-path split isn't worth it.  We could
>> unconditionally set up a full frame for all syscalls.  This means:
>>
>> No FIXUP_TOP_OF_STACK.  Instead, the system_call entry sets up RSP,
>> SS, CS, RCX, and EFLAGS right away.  That's five stores for all
>> syscalls instead of two loads and five stores for syscalls that need
>> it.  But it also gets rid of RESTORE_TOP_OF_STACK completely.
>>
>> No more bugs involving C code that assumes a full stack frame when no
>> such frame exists inside syscall code.
>>
>> We could possibly remove a whole bunch of duplicated code.
>>
>> The upshot would be simpler code, faster slow-path syscalls, and
>> slower fast-path syscalls (but probably not much slower).  On the
>> other hand, there's zero chance that this would be ready for 3.17.
>
> Indeed.
>
> If we ever take that direction (ie: remove that partial frame optimization),
> there is that common part that we can find in many archs when they return
> from syscalls, exceptions or interrupts which could be more or less consolidated as:
>
> void sysret_check(struct pt_regs *regs)
> {
>           while(test_thread_flag(TIF_ALLWORK_MASK)) {
>               int resched = need_resched();
>
>               local_irq_enable();
>               if (resched)
>                   schedule();
>               else
>                   do_notify_resume(regs);
>               local_irq_disable()
>            }
> }
>
> But well, probably the syscall fastpath is still worth it.

And yet x86_64 has this code implemented in assembly even in the
slowpath.  Go figure.

--Andy

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
@ 2014-07-30 17:25           ` Andy Lutomirski
  0 siblings, 0 replies; 62+ messages in thread
From: Andy Lutomirski @ 2014-07-30 17:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 30, 2014 at 9:59 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Tue, Jul 29, 2014 at 04:30:58PM -0700, Andy Lutomirski wrote:
>> On Tue, Jul 29, 2014 at 1:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> > On Jul 29, 2014 12:22 PM, "Oleg Nesterov" <oleg@redhat.com> wrote:
>> >>
>> >> Andy, to avoid the confusion: I am not trying to review this changes.
>> >> As you probably know my understanding of asm code in entry.S is very
>> >> limited.
>> >>
>> >> Just a couple of questions to ensure I understand this correctly.
>> >>
>> >> On 07/28, Andy Lutomirski wrote:
>> >> >
>> >> > This is both a cleanup and a speedup.  It reduces overhead due to
>> >> > installing a trivial seccomp filter by 87%.  The speedup comes from
>> >> > avoiding the full syscall tracing mechanism for filters that don't
>> >> > return SECCOMP_RET_TRACE.
>> >>
>> >> And only after I look at 5/5 I _seem_ to actually understand where
>> >> this speedup comes from.
>> >>
>> >> So. Currently tracesys: path always lead to "iret" after syscall, with
>> >> this change we can avoid it if phase_1() returns zero, correct?
>> >>
>> >> And, this also removes the special TIF_SYSCALL_AUDIT-only case in entry.S,
>> >> cool.
>> >>
>> >> I am wondering if we can do something similar with do_notify_resume() ?
>> >>
>> >>
>> >> Stupid question. To simplify, lets forget that syscall_trace_enter()
>> >> already returns the value. Can't we simplify the asm code if we do
>> >> not export 2 functions, but make syscall_trace_enter() return
>> >> "bool slow_path_is_needed". So that "tracesys:" could do
>> >>
>> >>         // pseudo code
>> >>
>> >> tracesys:
>> >>         SAVE_REST
>> >>         FIXUP_TOP_OF_STACK
>> >>
>> >>         call syscall_trace_enter
>> >>
>> >>         if (!slow_path_is_needed) {
>> >>                 addq REST_SKIP, %rsp
>> >>                 jmp system_call_fastpath
>> >>         }
>> >>
>> >>         ...
>> >>
>> >> ?
>> >>
>> >> Once again, I am just curious, it is not that I actually suggest to consider
>> >> this option.
>> >
>> > We could, but this would lose a decent amount of the speedup.  I could
>> > try it and benchmark it, but I'm guessing that the save and restore is
>> > kind of expensive.  This will make audit slower than it currently is,
>> > which may also annoy some people.  (Not me.)
>> >
>> > I'm also not convinced that it would be much simpler.  My code is currently:
>> >
>> > tracesys:
>> >     leaq -REST_SKIP(%rsp), %rdi
>> >     movq $AUDIT_ARCH_X86_64, %rsi
>> >     call syscall_trace_enter_phase1
>> >     test %rax, %rax
>> >     jnz tracesys_phase2        /* if needed, run the slow path */
>> >     LOAD_ARGS 0            /* else restore clobbered regs */
>> >     jmp system_call_fastpath    /*      and return to the fast path */
>> >
>> > tracesys_phase2:
>> >     SAVE_REST
>> >     FIXUP_TOP_OF_STACK %rdi
>> >     movq %rsp, %rdi
>> >     movq $AUDIT_ARCH_X86_64, %rsi
>> >     movq %rax,%rdx
>> >     call syscall_trace_enter_phase2
>> >
>> >     LOAD_ARGS ARGOFFSET, 1
>> >     RESTORE_REST
>> >
>> >     ... slow path here ...
>> >
>> > It would end up looking more like (totally untested):
>> >
>> > tracesys:
>> >     SAVE_REST
>> >     FIXUP_TOP_OF_STACK %rdi
>> >     mov %rsp, %rdi
>> >     movq $AUDIT_ARCH_X86_64, %rsi
>> >     call syscall_trace_enter
>> >     LOAD_ARGS
>> >     RESTORE_REST
>> >     test [whatever condition]
>> >     j[cond] system_call_fastpath
>> >
>> >     ... slow path here ...
>> >
>> > So it's a bit simpler.  Oddly, the ia32entry code doesn't have this
>> > multiple syscall path distinction.
>> >
>> > SAVE_REST is 6 movq instructions and a subq.  FIXUP_TOP_OF_STACK is 7
>> > movqs (and 8 if I ever get my way).  RESTORE_TOP_OF_STACK is 4.
>> > RESTORE_REST is 6 movqs and an adsq.  So we're talking about avoiding
>> > 21 movqs, and addq, and a subq.  That may be significant.  (And I
>> > suspect that the difference is much larger on platforms like arm64,
>> > but that's a separate issue.)
>>
>> To put some more options on the table: there's an argument to be made
>> that the whole fast-path/slow-path split isn't worth it.  We could
>> unconditionally set up a full frame for all syscalls.  This means:
>>
>> No FIXUP_TOP_OF_STACK.  Instead, the system_call entry sets up RSP,
>> SS, CS, RCX, and EFLAGS right away.  That's five stores for all
>> syscalls instead of two loads and five stores for syscalls that need
>> it.  But it also gets rid of RESTORE_TOP_OF_STACK completely.
>>
>> No more bugs involving C code that assumes a full stack frame when no
>> such frame exists inside syscall code.
>>
>> We could possibly remove a whole bunch of duplicated code.
>>
>> The upshot would be simpler code, faster slow-path syscalls, and
>> slower fast-path syscalls (but probably not much slower).  On the
>> other hand, there's zero chance that this would be ready for 3.17.
>
> Indeed.
>
> If we ever take that direction (ie: remove that partial frame optimization),
> there is that common part that we can find in many archs when they return
> from syscalls, exceptions or interrupts which could be more or less consolidated as:
>
> void sysret_check(struct pt_regs *regs)
> {
>           while(test_thread_flag(TIF_ALLWORK_MASK)) {
>               int resched = need_resched();
>
>               local_irq_enable();
>               if (resched)
>                   schedule();
>               else
>                   do_notify_resume(regs);
>               local_irq_disable()
>            }
> }
>
> But well, probably the syscall fastpath is still worth it.

And yet x86_64 has this code implemented in assembly even in the
slowpath.  Go figure.

--Andy

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
  2014-07-30 17:23         ` Andy Lutomirski
  (?)
@ 2014-07-31 15:16           ` Frederic Weisbecker
  -1 siblings, 0 replies; 62+ messages in thread
From: Frederic Weisbecker @ 2014-07-31 15:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Oleg Nesterov, linux-kernel, Kees Cook, Will Drewry, X86 ML,
	linux-arm-kernel, Linux MIPS Mailing List, linux-arch, LSM List,
	Alexei Starovoitov, H. Peter Anvin

On Wed, Jul 30, 2014 at 10:23:34AM -0700, Andy Lutomirski wrote:
> On Wed, Jul 30, 2014 at 9:43 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > On Tue, Jul 29, 2014 at 09:32:32PM +0200, Oleg Nesterov wrote:
> >> On 07/28, Andy Lutomirski wrote:
> >> >
> >> > @@ -1449,7 +1449,12 @@ long syscall_trace_enter(struct pt_regs *regs)
> >> >  {
> >> >     long ret = 0;
> >> >
> >> > -   user_exit();
> >> > +   /*
> >> > +    * If TIF_NOHZ is set, we are required to call user_exit() before
> >> > +    * doing anything that could touch RCU.
> >> > +    */
> >> > +   if (test_thread_flag(TIF_NOHZ))
> >> > +           user_exit();
> >>
> >> Personally I still think this change just adds more confusion, but I leave
> >> this to you and Frederic.
> >>
> >> It is not that "If TIF_NOHZ is set, we are required to call user_exit()", we
> >> need to call user_exit() just because we enter the kernel. TIF_NOHZ is just
> >> the implementation detail which triggers this slow path.
> >>
> >> At least it should be correct, unless I am confused even more than I think.
> >
> > Agreed, Perhaps the confusion is on the syscall_trace_enter() name which suggests
> > this is only about tracing? syscall_slowpath_enter() could be an alternative.
> > But that's still tracing in a general sense so...
> 
> At the end of the day, the syscall slowpath code calls a bunch of
> functions depending on what TIF_XYZ flags are set.  As long as it's
> structured like "if (TIF_A) do_a(); if (TIF_B) do_b();" or something
> like that, it's comprehensible.  But once random functions with no
> explicit flag checks or comments start showing up, it gets confusing.

Yeah that's a point. I don't mind much the TIF_NOHZ test if you like.

> 
> If it's indeed all-or-nothing, I could remove the check and add a
> comment.  But please keep in mind that, currently, the slow path is
> *slow*, and my patches only improve the entry case.  So enabling
> context tracking on every task will hurt.

That's what we do anyway. I haven't found a safe way to enabled context tracking
without tracking all CPUs.

> 
> --Andy

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
@ 2014-07-31 15:16           ` Frederic Weisbecker
  0 siblings, 0 replies; 62+ messages in thread
From: Frederic Weisbecker @ 2014-07-31 15:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Oleg Nesterov, linux-kernel, Kees Cook, Will Drewry, X86 ML,
	linux-arm-kernel, Linux MIPS Mailing List, linux-arch, LSM List,
	Alexei Starovoitov, H. Peter Anvin

On Wed, Jul 30, 2014 at 10:23:34AM -0700, Andy Lutomirski wrote:
> On Wed, Jul 30, 2014 at 9:43 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > On Tue, Jul 29, 2014 at 09:32:32PM +0200, Oleg Nesterov wrote:
> >> On 07/28, Andy Lutomirski wrote:
> >> >
> >> > @@ -1449,7 +1449,12 @@ long syscall_trace_enter(struct pt_regs *regs)
> >> >  {
> >> >     long ret = 0;
> >> >
> >> > -   user_exit();
> >> > +   /*
> >> > +    * If TIF_NOHZ is set, we are required to call user_exit() before
> >> > +    * doing anything that could touch RCU.
> >> > +    */
> >> > +   if (test_thread_flag(TIF_NOHZ))
> >> > +           user_exit();
> >>
> >> Personally I still think this change just adds more confusion, but I leave
> >> this to you and Frederic.
> >>
> >> It is not that "If TIF_NOHZ is set, we are required to call user_exit()", we
> >> need to call user_exit() just because we enter the kernel. TIF_NOHZ is just
> >> the implementation detail which triggers this slow path.
> >>
> >> At least it should be correct, unless I am confused even more than I think.
> >
> > Agreed, Perhaps the confusion is on the syscall_trace_enter() name which suggests
> > this is only about tracing? syscall_slowpath_enter() could be an alternative.
> > But that's still tracing in a general sense so...
> 
> At the end of the day, the syscall slowpath code calls a bunch of
> functions depending on what TIF_XYZ flags are set.  As long as it's
> structured like "if (TIF_A) do_a(); if (TIF_B) do_b();" or something
> like that, it's comprehensible.  But once random functions with no
> explicit flag checks or comments start showing up, it gets confusing.

Yeah that's a point. I don't mind much the TIF_NOHZ test if you like.

> 
> If it's indeed all-or-nothing, I could remove the check and add a
> comment.  But please keep in mind that, currently, the slow path is
> *slow*, and my patches only improve the entry case.  So enabling
> context tracking on every task will hurt.

That's what we do anyway. I haven't found a safe way to enabled context tracking
without tracking all CPUs.

> 
> --Andy

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
@ 2014-07-31 15:16           ` Frederic Weisbecker
  0 siblings, 0 replies; 62+ messages in thread
From: Frederic Weisbecker @ 2014-07-31 15:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 30, 2014 at 10:23:34AM -0700, Andy Lutomirski wrote:
> On Wed, Jul 30, 2014 at 9:43 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > On Tue, Jul 29, 2014 at 09:32:32PM +0200, Oleg Nesterov wrote:
> >> On 07/28, Andy Lutomirski wrote:
> >> >
> >> > @@ -1449,7 +1449,12 @@ long syscall_trace_enter(struct pt_regs *regs)
> >> >  {
> >> >     long ret = 0;
> >> >
> >> > -   user_exit();
> >> > +   /*
> >> > +    * If TIF_NOHZ is set, we are required to call user_exit() before
> >> > +    * doing anything that could touch RCU.
> >> > +    */
> >> > +   if (test_thread_flag(TIF_NOHZ))
> >> > +           user_exit();
> >>
> >> Personally I still think this change just adds more confusion, but I leave
> >> this to you and Frederic.
> >>
> >> It is not that "If TIF_NOHZ is set, we are required to call user_exit()", we
> >> need to call user_exit() just because we enter the kernel. TIF_NOHZ is just
> >> the implementation detail which triggers this slow path.
> >>
> >> At least it should be correct, unless I am confused even more than I think.
> >
> > Agreed, Perhaps the confusion is on the syscall_trace_enter() name which suggests
> > this is only about tracing? syscall_slowpath_enter() could be an alternative.
> > But that's still tracing in a general sense so...
> 
> At the end of the day, the syscall slowpath code calls a bunch of
> functions depending on what TIF_XYZ flags are set.  As long as it's
> structured like "if (TIF_A) do_a(); if (TIF_B) do_b();" or something
> like that, it's comprehensible.  But once random functions with no
> explicit flag checks or comments start showing up, it gets confusing.

Yeah that's a point. I don't mind much the TIF_NOHZ test if you like.

> 
> If it's indeed all-or-nothing, I could remove the check and add a
> comment.  But please keep in mind that, currently, the slow path is
> *slow*, and my patches only improve the entry case.  So enabling
> context tracking on every task will hurt.

That's what we do anyway. I haven't found a safe way to enabled context tracking
without tracking all CPUs.

> 
> --Andy

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
  2014-07-31 15:16           ` Frederic Weisbecker
  (?)
@ 2014-07-31 16:42             ` Oleg Nesterov
  -1 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2014-07-31 16:42 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Andy Lutomirski, linux-kernel, Kees Cook, Will Drewry, X86 ML,
	linux-arm-kernel, Linux MIPS Mailing List, linux-arch, LSM List,
	Alexei Starovoitov, H. Peter Anvin

On 07/31, Frederic Weisbecker wrote:
>
> On Wed, Jul 30, 2014 at 10:23:34AM -0700, Andy Lutomirski wrote:
> >
> > At the end of the day, the syscall slowpath code calls a bunch of
> > functions depending on what TIF_XYZ flags are set.  As long as it's
> > structured like "if (TIF_A) do_a(); if (TIF_B) do_b();" or something
> > like that, it's comprehensible.  But once random functions with no
> > explicit flag checks or comments start showing up, it gets confusing.
>
> Yeah that's a point. I don't mind much the TIF_NOHZ test if you like.

And in my opinion

	if (work & TIF_XYZ)
		user_exit();

looks even more confusing. Because, once again, TIF_XYZ is not the
reason to call user_exit().

Not to mention this adds a minor performance penalty.

> > If it's indeed all-or-nothing, I could remove the check and add a
> > comment.  But please keep in mind that, currently, the slow path is
> > *slow*, and my patches only improve the entry case.  So enabling
> > context tracking on every task will hurt.
>
> That's what we do anyway. I haven't found a safe way to enabled context tracking
> without tracking all CPUs.

And if we change this, then the code above becomes racy. The state of
TIF_XYZ can be changed right after the check. OK, it is racy anyway ;)
but still this adds more confusion.

I feel that TIF_XYZ must die. But yes, yes, I know that it is very simple
to say this. And no, so far I do not know how we can improve this all.


But again, again, I won't insist. Just another "can't resist" email,
please ignore.

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
@ 2014-07-31 16:42             ` Oleg Nesterov
  0 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2014-07-31 16:42 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Andy Lutomirski, linux-kernel, Kees Cook, Will Drewry, X86 ML,
	linux-arm-kernel, Linux MIPS Mailing List, linux-arch, LSM List,
	Alexei Starovoitov, H. Peter Anvin

On 07/31, Frederic Weisbecker wrote:
>
> On Wed, Jul 30, 2014 at 10:23:34AM -0700, Andy Lutomirski wrote:
> >
> > At the end of the day, the syscall slowpath code calls a bunch of
> > functions depending on what TIF_XYZ flags are set.  As long as it's
> > structured like "if (TIF_A) do_a(); if (TIF_B) do_b();" or something
> > like that, it's comprehensible.  But once random functions with no
> > explicit flag checks or comments start showing up, it gets confusing.
>
> Yeah that's a point. I don't mind much the TIF_NOHZ test if you like.

And in my opinion

	if (work & TIF_XYZ)
		user_exit();

looks even more confusing. Because, once again, TIF_XYZ is not the
reason to call user_exit().

Not to mention this adds a minor performance penalty.

> > If it's indeed all-or-nothing, I could remove the check and add a
> > comment.  But please keep in mind that, currently, the slow path is
> > *slow*, and my patches only improve the entry case.  So enabling
> > context tracking on every task will hurt.
>
> That's what we do anyway. I haven't found a safe way to enabled context tracking
> without tracking all CPUs.

And if we change this, then the code above becomes racy. The state of
TIF_XYZ can be changed right after the check. OK, it is racy anyway ;)
but still this adds more confusion.

I feel that TIF_XYZ must die. But yes, yes, I know that it is very simple
to say this. And no, so far I do not know how we can improve this all.


But again, again, I won't insist. Just another "can't resist" email,
please ignore.

Oleg.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
@ 2014-07-31 16:42             ` Oleg Nesterov
  0 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2014-07-31 16:42 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/31, Frederic Weisbecker wrote:
>
> On Wed, Jul 30, 2014 at 10:23:34AM -0700, Andy Lutomirski wrote:
> >
> > At the end of the day, the syscall slowpath code calls a bunch of
> > functions depending on what TIF_XYZ flags are set.  As long as it's
> > structured like "if (TIF_A) do_a(); if (TIF_B) do_b();" or something
> > like that, it's comprehensible.  But once random functions with no
> > explicit flag checks or comments start showing up, it gets confusing.
>
> Yeah that's a point. I don't mind much the TIF_NOHZ test if you like.

And in my opinion

	if (work & TIF_XYZ)
		user_exit();

looks even more confusing. Because, once again, TIF_XYZ is not the
reason to call user_exit().

Not to mention this adds a minor performance penalty.

> > If it's indeed all-or-nothing, I could remove the check and add a
> > comment.  But please keep in mind that, currently, the slow path is
> > *slow*, and my patches only improve the entry case.  So enabling
> > context tracking on every task will hurt.
>
> That's what we do anyway. I haven't found a safe way to enabled context tracking
> without tracking all CPUs.

And if we change this, then the code above becomes racy. The state of
TIF_XYZ can be changed right after the check. OK, it is racy anyway ;)
but still this adds more confusion.

I feel that TIF_XYZ must die. But yes, yes, I know that it is very simple
to say this. And no, so far I do not know how we can improve this all.

But again, again, I won't insist. Just another "can't resist" email,
please ignore.

Oleg.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
  2014-07-31 16:42             ` Oleg Nesterov
  (?)
@ 2014-07-31 16:49               ` Frederic Weisbecker
  -1 siblings, 0 replies; 62+ messages in thread
From: Frederic Weisbecker @ 2014-07-31 16:49 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, linux-kernel, Kees Cook, Will Drewry, X86 ML,
	linux-arm-kernel, Linux MIPS Mailing List, linux-arch, LSM List,
	Alexei Starovoitov, H. Peter Anvin

On Thu, Jul 31, 2014 at 06:42:46PM +0200, Oleg Nesterov wrote:
> On 07/31, Frederic Weisbecker wrote:
> >
> > On Wed, Jul 30, 2014 at 10:23:34AM -0700, Andy Lutomirski wrote:
> > >
> > > At the end of the day, the syscall slowpath code calls a bunch of
> > > functions depending on what TIF_XYZ flags are set.  As long as it's
> > > structured like "if (TIF_A) do_a(); if (TIF_B) do_b();" or something
> > > like that, it's comprehensible.  But once random functions with no
> > > explicit flag checks or comments start showing up, it gets confusing.
> >
> > Yeah that's a point. I don't mind much the TIF_NOHZ test if you like.
> 
> And in my opinion
> 
> 	if (work & TIF_XYZ)
> 		user_exit();
> 
> looks even more confusing. Because, once again, TIF_XYZ is not the
> reason to call user_exit().
> 
> Not to mention this adds a minor performance penalty.

That's a point too! You guys both convinced me! ;-)

> 
> > > If it's indeed all-or-nothing, I could remove the check and add a
> > > comment.  But please keep in mind that, currently, the slow path is
> > > *slow*, and my patches only improve the entry case.  So enabling
> > > context tracking on every task will hurt.
> >
> > That's what we do anyway. I haven't found a safe way to enabled context tracking
> > without tracking all CPUs.
> 
> And if we change this, then the code above becomes racy. The state of
> TIF_XYZ can be changed right after the check. OK, it is racy anyway ;)
> but still this adds more confusion.

No because all running tasks have this flag set when context tracking is
enabled. And context tracking can't be disabled on runtime.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
@ 2014-07-31 16:49               ` Frederic Weisbecker
  0 siblings, 0 replies; 62+ messages in thread
From: Frederic Weisbecker @ 2014-07-31 16:49 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, linux-kernel, Kees Cook, Will Drewry, X86 ML,
	linux-arm-kernel, Linux MIPS Mailing List, linux-arch, LSM List,
	Alexei Starovoitov, H. Peter Anvin

On Thu, Jul 31, 2014 at 06:42:46PM +0200, Oleg Nesterov wrote:
> On 07/31, Frederic Weisbecker wrote:
> >
> > On Wed, Jul 30, 2014 at 10:23:34AM -0700, Andy Lutomirski wrote:
> > >
> > > At the end of the day, the syscall slowpath code calls a bunch of
> > > functions depending on what TIF_XYZ flags are set.  As long as it's
> > > structured like "if (TIF_A) do_a(); if (TIF_B) do_b();" or something
> > > like that, it's comprehensible.  But once random functions with no
> > > explicit flag checks or comments start showing up, it gets confusing.
> >
> > Yeah that's a point. I don't mind much the TIF_NOHZ test if you like.
> 
> And in my opinion
> 
> 	if (work & TIF_XYZ)
> 		user_exit();
> 
> looks even more confusing. Because, once again, TIF_XYZ is not the
> reason to call user_exit().
> 
> Not to mention this adds a minor performance penalty.

That's a point too! You guys both convinced me! ;-)

> 
> > > If it's indeed all-or-nothing, I could remove the check and add a
> > > comment.  But please keep in mind that, currently, the slow path is
> > > *slow*, and my patches only improve the entry case.  So enabling
> > > context tracking on every task will hurt.
> >
> > That's what we do anyway. I haven't found a safe way to enabled context tracking
> > without tracking all CPUs.
> 
> And if we change this, then the code above becomes racy. The state of
> TIF_XYZ can be changed right after the check. OK, it is racy anyway ;)
> but still this adds more confusion.

No because all running tasks have this flag set when context tracking is
enabled. And context tracking can't be disabled on runtime.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
@ 2014-07-31 16:49               ` Frederic Weisbecker
  0 siblings, 0 replies; 62+ messages in thread
From: Frederic Weisbecker @ 2014-07-31 16:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 31, 2014 at 06:42:46PM +0200, Oleg Nesterov wrote:
> On 07/31, Frederic Weisbecker wrote:
> >
> > On Wed, Jul 30, 2014 at 10:23:34AM -0700, Andy Lutomirski wrote:
> > >
> > > At the end of the day, the syscall slowpath code calls a bunch of
> > > functions depending on what TIF_XYZ flags are set.  As long as it's
> > > structured like "if (TIF_A) do_a(); if (TIF_B) do_b();" or something
> > > like that, it's comprehensible.  But once random functions with no
> > > explicit flag checks or comments start showing up, it gets confusing.
> >
> > Yeah that's a point. I don't mind much the TIF_NOHZ test if you like.
> 
> And in my opinion
> 
> 	if (work & TIF_XYZ)
> 		user_exit();
> 
> looks even more confusing. Because, once again, TIF_XYZ is not the
> reason to call user_exit().
> 
> Not to mention this adds a minor performance penalty.

That's a point too! You guys both convinced me! ;-)

> 
> > > If it's indeed all-or-nothing, I could remove the check and add a
> > > comment.  But please keep in mind that, currently, the slow path is
> > > *slow*, and my patches only improve the entry case.  So enabling
> > > context tracking on every task will hurt.
> >
> > That's what we do anyway. I haven't found a safe way to enabled context tracking
> > without tracking all CPUs.
> 
> And if we change this, then the code above becomes racy. The state of
> TIF_XYZ can be changed right after the check. OK, it is racy anyway ;)
> but still this adds more confusion.

No because all running tasks have this flag set when context tracking is
enabled. And context tracking can't be disabled on runtime.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
  2014-07-31 16:49               ` Frederic Weisbecker
  (?)
@ 2014-07-31 16:54                 ` Oleg Nesterov
  -1 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2014-07-31 16:54 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Andy Lutomirski, linux-kernel, Kees Cook, Will Drewry, X86 ML,
	linux-arm-kernel, Linux MIPS Mailing List, linux-arch, LSM List,
	Alexei Starovoitov, H. Peter Anvin

On 07/31, Frederic Weisbecker wrote:
>
> On Thu, Jul 31, 2014 at 06:42:46PM +0200, Oleg Nesterov wrote:
> > On 07/31, Frederic Weisbecker wrote:
> > >
> > > On Wed, Jul 30, 2014 at 10:23:34AM -0700, Andy Lutomirski wrote:
> > > >
> > > > At the end of the day, the syscall slowpath code calls a bunch of
> > > > functions depending on what TIF_XYZ flags are set.  As long as it's
> > > > structured like "if (TIF_A) do_a(); if (TIF_B) do_b();" or something
> > > > like that, it's comprehensible.  But once random functions with no
> > > > explicit flag checks or comments start showing up, it gets confusing.
> > >
> > > Yeah that's a point. I don't mind much the TIF_NOHZ test if you like.
> >
> > And in my opinion
> >
> > 	if (work & TIF_XYZ)
> > 		user_exit();
> >
> > looks even more confusing. Because, once again, TIF_XYZ is not the
> > reason to call user_exit().
> >
> > Not to mention this adds a minor performance penalty.
>
> That's a point too! You guys both convinced me! ;-)

Very nice, now I know that you can agree with 2 opposite opinions at
the same time ;)

> > > > If it's indeed all-or-nothing, I could remove the check and add a
> > > > comment.  But please keep in mind that, currently, the slow path is
> > > > *slow*, and my patches only improve the entry case.  So enabling
> > > > context tracking on every task will hurt.
> > >
> > > That's what we do anyway. I haven't found a safe way to enabled context tracking
> > > without tracking all CPUs.
> >
> > And if we change this, then the code above becomes racy. The state of
> > TIF_XYZ can be changed right after the check. OK, it is racy anyway ;)
> > but still this adds more confusion.
>
> No because all running tasks have this flag set when context tracking is
> enabled. And context tracking can't be disabled on runtime.

Yes, yes, please note that I said "if we change this".

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
@ 2014-07-31 16:54                 ` Oleg Nesterov
  0 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2014-07-31 16:54 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Andy Lutomirski, linux-kernel, Kees Cook, Will Drewry, X86 ML,
	linux-arm-kernel, Linux MIPS Mailing List, linux-arch, LSM List,
	Alexei Starovoitov, H. Peter Anvin

On 07/31, Frederic Weisbecker wrote:
>
> On Thu, Jul 31, 2014 at 06:42:46PM +0200, Oleg Nesterov wrote:
> > On 07/31, Frederic Weisbecker wrote:
> > >
> > > On Wed, Jul 30, 2014 at 10:23:34AM -0700, Andy Lutomirski wrote:
> > > >
> > > > At the end of the day, the syscall slowpath code calls a bunch of
> > > > functions depending on what TIF_XYZ flags are set.  As long as it's
> > > > structured like "if (TIF_A) do_a(); if (TIF_B) do_b();" or something
> > > > like that, it's comprehensible.  But once random functions with no
> > > > explicit flag checks or comments start showing up, it gets confusing.
> > >
> > > Yeah that's a point. I don't mind much the TIF_NOHZ test if you like.
> >
> > And in my opinion
> >
> > 	if (work & TIF_XYZ)
> > 		user_exit();
> >
> > looks even more confusing. Because, once again, TIF_XYZ is not the
> > reason to call user_exit().
> >
> > Not to mention this adds a minor performance penalty.
>
> That's a point too! You guys both convinced me! ;-)

Very nice, now I know that you can agree with 2 opposite opinions at
the same time ;)

> > > > If it's indeed all-or-nothing, I could remove the check and add a
> > > > comment.  But please keep in mind that, currently, the slow path is
> > > > *slow*, and my patches only improve the entry case.  So enabling
> > > > context tracking on every task will hurt.
> > >
> > > That's what we do anyway. I haven't found a safe way to enabled context tracking
> > > without tracking all CPUs.
> >
> > And if we change this, then the code above becomes racy. The state of
> > TIF_XYZ can be changed right after the check. OK, it is racy anyway ;)
> > but still this adds more confusion.
>
> No because all running tasks have this flag set when context tracking is
> enabled. And context tracking can't be disabled on runtime.

Yes, yes, please note that I said "if we change this".

Oleg.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
@ 2014-07-31 16:54                 ` Oleg Nesterov
  0 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2014-07-31 16:54 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/31, Frederic Weisbecker wrote:
>
> On Thu, Jul 31, 2014 at 06:42:46PM +0200, Oleg Nesterov wrote:
> > On 07/31, Frederic Weisbecker wrote:
> > >
> > > On Wed, Jul 30, 2014 at 10:23:34AM -0700, Andy Lutomirski wrote:
> > > >
> > > > At the end of the day, the syscall slowpath code calls a bunch of
> > > > functions depending on what TIF_XYZ flags are set.  As long as it's
> > > > structured like "if (TIF_A) do_a(); if (TIF_B) do_b();" or something
> > > > like that, it's comprehensible.  But once random functions with no
> > > > explicit flag checks or comments start showing up, it gets confusing.
> > >
> > > Yeah that's a point. I don't mind much the TIF_NOHZ test if you like.
> >
> > And in my opinion
> >
> > 	if (work & TIF_XYZ)
> > 		user_exit();
> >
> > looks even more confusing. Because, once again, TIF_XYZ is not the
> > reason to call user_exit().
> >
> > Not to mention this adds a minor performance penalty.
>
> That's a point too! You guys both convinced me! ;-)

Very nice, now I know that you can agree with 2 opposite opinions at
the same time ;)

> > > > If it's indeed all-or-nothing, I could remove the check and add a
> > > > comment.  But please keep in mind that, currently, the slow path is
> > > > *slow*, and my patches only improve the entry case.  So enabling
> > > > context tracking on every task will hurt.
> > >
> > > That's what we do anyway. I haven't found a safe way to enabled context tracking
> > > without tracking all CPUs.
> >
> > And if we change this, then the code above becomes racy. The state of
> > TIF_XYZ can be changed right after the check. OK, it is racy anyway ;)
> > but still this adds more confusion.
>
> No because all running tasks have this flag set when context tracking is
> enabled. And context tracking can't be disabled on runtime.

Yes, yes, please note that I said "if we change this".

Oleg.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
  2014-07-30 17:25           ` Andy Lutomirski
  (?)
@ 2014-07-31 16:56             ` H. Peter Anvin
  -1 siblings, 0 replies; 62+ messages in thread
From: H. Peter Anvin @ 2014-07-31 16:56 UTC (permalink / raw)
  To: Andy Lutomirski, Frederic Weisbecker
  Cc: Oleg Nesterov, linux-arch, X86 ML, LSM List,
	Linux MIPS Mailing List, linux-arm-kernel, Alexei Starovoitov,
	Will Drewry, Kees Cook, linux-kernel

On 07/30/2014 10:25 AM, Andy Lutomirski wrote:
> 
> And yet x86_64 has this code implemented in assembly even in the
> slowpath.  Go figure.
> 

There is way too much assembly in entry_64.S probably because things
have been grafted on, ahem, "organically".  It is darn nigh impossible
to even remotely figure out what goes on in that file.

	-hpa



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
@ 2014-07-31 16:56             ` H. Peter Anvin
  0 siblings, 0 replies; 62+ messages in thread
From: H. Peter Anvin @ 2014-07-31 16:56 UTC (permalink / raw)
  To: Andy Lutomirski, Frederic Weisbecker
  Cc: Oleg Nesterov, linux-arch, X86 ML, LSM List,
	Linux MIPS Mailing List, linux-arm-kernel, Alexei Starovoitov,
	Will Drewry, Kees Cook, linux-kernel

On 07/30/2014 10:25 AM, Andy Lutomirski wrote:
> 
> And yet x86_64 has this code implemented in assembly even in the
> slowpath.  Go figure.
> 

There is way too much assembly in entry_64.S probably because things
have been grafted on, ahem, "organically".  It is darn nigh impossible
to even remotely figure out what goes on in that file.

	-hpa

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
@ 2014-07-31 16:56             ` H. Peter Anvin
  0 siblings, 0 replies; 62+ messages in thread
From: H. Peter Anvin @ 2014-07-31 16:56 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/30/2014 10:25 AM, Andy Lutomirski wrote:
> 
> And yet x86_64 has this code implemented in assembly even in the
> slowpath.  Go figure.
> 

There is way too much assembly in entry_64.S probably because things
have been grafted on, ahem, "organically".  It is darn nigh impossible
to even remotely figure out what goes on in that file.

	-hpa

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
  2014-07-31 16:54                 ` Oleg Nesterov
  (?)
@ 2014-07-31 16:58                   ` Oleg Nesterov
  -1 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2014-07-31 16:58 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Andy Lutomirski, linux-kernel, Kees Cook, Will Drewry, X86 ML,
	linux-arm-kernel, Linux MIPS Mailing List, linux-arch, LSM List,
	Alexei Starovoitov, H. Peter Anvin

On 07/31, Oleg Nesterov wrote:
>
> On 07/31, Frederic Weisbecker wrote:
> >
> > On Thu, Jul 31, 2014 at 06:42:46PM +0200, Oleg Nesterov wrote:
> > >
> > > And if we change this, then the code above becomes racy. The state of
> > > TIF_XYZ can be changed right after the check. OK, it is racy anyway ;)
> > > but still this adds more confusion.
> >
> > No because all running tasks have this flag set when context tracking is
> > enabled. And context tracking can't be disabled on runtime.
>
> Yes, yes, please note that I said "if we change this".

And "it is racy anyway" connects to the problems we discuss in another
thread.

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
@ 2014-07-31 16:58                   ` Oleg Nesterov
  0 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2014-07-31 16:58 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Andy Lutomirski, linux-kernel, Kees Cook, Will Drewry, X86 ML,
	linux-arm-kernel, Linux MIPS Mailing List, linux-arch, LSM List,
	Alexei Starovoitov, H. Peter Anvin

On 07/31, Oleg Nesterov wrote:
>
> On 07/31, Frederic Weisbecker wrote:
> >
> > On Thu, Jul 31, 2014 at 06:42:46PM +0200, Oleg Nesterov wrote:
> > >
> > > And if we change this, then the code above becomes racy. The state of
> > > TIF_XYZ can be changed right after the check. OK, it is racy anyway ;)
> > > but still this adds more confusion.
> >
> > No because all running tasks have this flag set when context tracking is
> > enabled. And context tracking can't be disabled on runtime.
>
> Yes, yes, please note that I said "if we change this".

And "it is racy anyway" connects to the problems we discuss in another
thread.

Oleg.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
@ 2014-07-31 16:58                   ` Oleg Nesterov
  0 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2014-07-31 16:58 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/31, Oleg Nesterov wrote:
>
> On 07/31, Frederic Weisbecker wrote:
> >
> > On Thu, Jul 31, 2014 at 06:42:46PM +0200, Oleg Nesterov wrote:
> > >
> > > And if we change this, then the code above becomes racy. The state of
> > > TIF_XYZ can be changed right after the check. OK, it is racy anyway ;)
> > > but still this adds more confusion.
> >
> > No because all running tasks have this flag set when context tracking is
> > enabled. And context tracking can't be disabled on runtime.
>
> Yes, yes, please note that I said "if we change this".

And "it is racy anyway" connects to the problems we discuss in another
thread.

Oleg.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
  2014-07-31 16:54                 ` Oleg Nesterov
  (?)
@ 2014-07-31 17:17                   ` Frederic Weisbecker
  -1 siblings, 0 replies; 62+ messages in thread
From: Frederic Weisbecker @ 2014-07-31 17:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, linux-kernel, Kees Cook, Will Drewry, X86 ML,
	linux-arm-kernel, Linux MIPS Mailing List, linux-arch, LSM List,
	Alexei Starovoitov, H. Peter Anvin

On Thu, Jul 31, 2014 at 06:54:23PM +0200, Oleg Nesterov wrote:
> On 07/31, Frederic Weisbecker wrote:
> >
> > On Thu, Jul 31, 2014 at 06:42:46PM +0200, Oleg Nesterov wrote:
> > > On 07/31, Frederic Weisbecker wrote:
> > > >
> > > > On Wed, Jul 30, 2014 at 10:23:34AM -0700, Andy Lutomirski wrote:
> > > > >
> > > > > At the end of the day, the syscall slowpath code calls a bunch of
> > > > > functions depending on what TIF_XYZ flags are set.  As long as it's
> > > > > structured like "if (TIF_A) do_a(); if (TIF_B) do_b();" or something
> > > > > like that, it's comprehensible.  But once random functions with no
> > > > > explicit flag checks or comments start showing up, it gets confusing.
> > > >
> > > > Yeah that's a point. I don't mind much the TIF_NOHZ test if you like.
> > >
> > > And in my opinion
> > >
> > > 	if (work & TIF_XYZ)
> > > 		user_exit();
> > >
> > > looks even more confusing. Because, once again, TIF_XYZ is not the
> > > reason to call user_exit().
> > >
> > > Not to mention this adds a minor performance penalty.
> >
> > That's a point too! You guys both convinced me! ;-)
> 
> Very nice, now I know that you can agree with 2 opposite opinions at
> the same time ;)

That's what an Acked-by from Schroedinger would look like!

> 
> > > > > If it's indeed all-or-nothing, I could remove the check and add a
> > > > > comment.  But please keep in mind that, currently, the slow path is
> > > > > *slow*, and my patches only improve the entry case.  So enabling
> > > > > context tracking on every task will hurt.
> > > >
> > > > That's what we do anyway. I haven't found a safe way to enabled context tracking
> > > > without tracking all CPUs.
> > >
> > > And if we change this, then the code above becomes racy. The state of
> > > TIF_XYZ can be changed right after the check. OK, it is racy anyway ;)
> > > but still this adds more confusion.
> >
> > No because all running tasks have this flag set when context tracking is
> > enabled. And context tracking can't be disabled on runtime.
> 
> Yes, yes, please note that I said "if we change this".

Yeah but the NO_HZ test wouldn't change much the situation I think.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
@ 2014-07-31 17:17                   ` Frederic Weisbecker
  0 siblings, 0 replies; 62+ messages in thread
From: Frederic Weisbecker @ 2014-07-31 17:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, linux-kernel, Kees Cook, Will Drewry, X86 ML,
	linux-arm-kernel, Linux MIPS Mailing List, linux-arch, LSM List,
	Alexei Starovoitov, H. Peter Anvin

On Thu, Jul 31, 2014 at 06:54:23PM +0200, Oleg Nesterov wrote:
> On 07/31, Frederic Weisbecker wrote:
> >
> > On Thu, Jul 31, 2014 at 06:42:46PM +0200, Oleg Nesterov wrote:
> > > On 07/31, Frederic Weisbecker wrote:
> > > >
> > > > On Wed, Jul 30, 2014 at 10:23:34AM -0700, Andy Lutomirski wrote:
> > > > >
> > > > > At the end of the day, the syscall slowpath code calls a bunch of
> > > > > functions depending on what TIF_XYZ flags are set.  As long as it's
> > > > > structured like "if (TIF_A) do_a(); if (TIF_B) do_b();" or something
> > > > > like that, it's comprehensible.  But once random functions with no
> > > > > explicit flag checks or comments start showing up, it gets confusing.
> > > >
> > > > Yeah that's a point. I don't mind much the TIF_NOHZ test if you like.
> > >
> > > And in my opinion
> > >
> > > 	if (work & TIF_XYZ)
> > > 		user_exit();
> > >
> > > looks even more confusing. Because, once again, TIF_XYZ is not the
> > > reason to call user_exit().
> > >
> > > Not to mention this adds a minor performance penalty.
> >
> > That's a point too! You guys both convinced me! ;-)
> 
> Very nice, now I know that you can agree with 2 opposite opinions at
> the same time ;)

That's what an Acked-by from Schroedinger would look like!

> 
> > > > > If it's indeed all-or-nothing, I could remove the check and add a
> > > > > comment.  But please keep in mind that, currently, the slow path is
> > > > > *slow*, and my patches only improve the entry case.  So enabling
> > > > > context tracking on every task will hurt.
> > > >
> > > > That's what we do anyway. I haven't found a safe way to enabled context tracking
> > > > without tracking all CPUs.
> > >
> > > And if we change this, then the code above becomes racy. The state of
> > > TIF_XYZ can be changed right after the check. OK, it is racy anyway ;)
> > > but still this adds more confusion.
> >
> > No because all running tasks have this flag set when context tracking is
> > enabled. And context tracking can't be disabled on runtime.
> 
> Yes, yes, please note that I said "if we change this".

Yeah but the NO_HZ test wouldn't change much the situation I think.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ
@ 2014-07-31 17:17                   ` Frederic Weisbecker
  0 siblings, 0 replies; 62+ messages in thread
From: Frederic Weisbecker @ 2014-07-31 17:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 31, 2014 at 06:54:23PM +0200, Oleg Nesterov wrote:
> On 07/31, Frederic Weisbecker wrote:
> >
> > On Thu, Jul 31, 2014 at 06:42:46PM +0200, Oleg Nesterov wrote:
> > > On 07/31, Frederic Weisbecker wrote:
> > > >
> > > > On Wed, Jul 30, 2014 at 10:23:34AM -0700, Andy Lutomirski wrote:
> > > > >
> > > > > At the end of the day, the syscall slowpath code calls a bunch of
> > > > > functions depending on what TIF_XYZ flags are set.  As long as it's
> > > > > structured like "if (TIF_A) do_a(); if (TIF_B) do_b();" or something
> > > > > like that, it's comprehensible.  But once random functions with no
> > > > > explicit flag checks or comments start showing up, it gets confusing.
> > > >
> > > > Yeah that's a point. I don't mind much the TIF_NOHZ test if you like.
> > >
> > > And in my opinion
> > >
> > > 	if (work & TIF_XYZ)
> > > 		user_exit();
> > >
> > > looks even more confusing. Because, once again, TIF_XYZ is not the
> > > reason to call user_exit().
> > >
> > > Not to mention this adds a minor performance penalty.
> >
> > That's a point too! You guys both convinced me! ;-)
> 
> Very nice, now I know that you can agree with 2 opposite opinions at
> the same time ;)

That's what an Acked-by from Schroedinger would look like!

> 
> > > > > If it's indeed all-or-nothing, I could remove the check and add a
> > > > > comment.  But please keep in mind that, currently, the slow path is
> > > > > *slow*, and my patches only improve the entry case.  So enabling
> > > > > context tracking on every task will hurt.
> > > >
> > > > That's what we do anyway. I haven't found a safe way to enabled context tracking
> > > > without tracking all CPUs.
> > >
> > > And if we change this, then the code above becomes racy. The state of
> > > TIF_XYZ can be changed right after the check. OK, it is racy anyway ;)
> > > but still this adds more confusion.
> >
> > No because all running tasks have this flag set when context tracking is
> > enabled. And context tracking can't be disabled on runtime.
> 
> Yes, yes, please note that I said "if we change this".

Yeah but the NO_HZ test wouldn't change much the situation I think.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
  2014-07-31 16:56             ` H. Peter Anvin
  (?)
@ 2014-07-31 17:20               ` Frederic Weisbecker
  -1 siblings, 0 replies; 62+ messages in thread
From: Frederic Weisbecker @ 2014-07-31 17:20 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andy Lutomirski, Oleg Nesterov, linux-arch, X86 ML, LSM List,
	Linux MIPS Mailing List, linux-arm-kernel, Alexei Starovoitov,
	Will Drewry, Kees Cook, linux-kernel

On Thu, Jul 31, 2014 at 09:56:48AM -0700, H. Peter Anvin wrote:
> On 07/30/2014 10:25 AM, Andy Lutomirski wrote:
> > 
> > And yet x86_64 has this code implemented in assembly even in the
> > slowpath.  Go figure.
> > 
> 
> There is way too much assembly in entry_64.S probably because things
> have been grafted on, ahem, "organically".  It is darn nigh impossible
> to even remotely figure out what goes on in that file.

Always warn your family and give them an estimate return hour before opening that file.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
@ 2014-07-31 17:20               ` Frederic Weisbecker
  0 siblings, 0 replies; 62+ messages in thread
From: Frederic Weisbecker @ 2014-07-31 17:20 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andy Lutomirski, Oleg Nesterov, linux-arch, X86 ML, LSM List,
	Linux MIPS Mailing List, linux-arm-kernel, Alexei Starovoitov,
	Will Drewry, Kees Cook, linux-kernel

On Thu, Jul 31, 2014 at 09:56:48AM -0700, H. Peter Anvin wrote:
> On 07/30/2014 10:25 AM, Andy Lutomirski wrote:
> > 
> > And yet x86_64 has this code implemented in assembly even in the
> > slowpath.  Go figure.
> > 
> 
> There is way too much assembly in entry_64.S probably because things
> have been grafted on, ahem, "organically".  It is darn nigh impossible
> to even remotely figure out what goes on in that file.

Always warn your family and give them an estimate return hour before opening that file.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath
@ 2014-07-31 17:20               ` Frederic Weisbecker
  0 siblings, 0 replies; 62+ messages in thread
From: Frederic Weisbecker @ 2014-07-31 17:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 31, 2014 at 09:56:48AM -0700, H. Peter Anvin wrote:
> On 07/30/2014 10:25 AM, Andy Lutomirski wrote:
> > 
> > And yet x86_64 has this code implemented in assembly even in the
> > slowpath.  Go figure.
> > 
> 
> There is way too much assembly in entry_64.S probably because things
> have been grafted on, ahem, "organically".  It is darn nigh impossible
> to even remotely figure out what goes on in that file.

Always warn your family and give them an estimate return hour before opening that file.

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2014-07-31 17:20 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-29  3:38 [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath Andy Lutomirski
2014-07-29  3:38 ` Andy Lutomirski
2014-07-29  3:38 ` [PATCH v4 1/5] x86,x32,audit: Fix x32's AUDIT_ARCH wrt audit Andy Lutomirski
2014-07-29  3:38   ` Andy Lutomirski
2014-07-29  3:38 ` [PATCH v4 2/5] x86,entry: Only call user_exit if TIF_NOHZ Andy Lutomirski
2014-07-29  3:38   ` Andy Lutomirski
2014-07-29 19:32   ` Oleg Nesterov
2014-07-29 19:32     ` Oleg Nesterov
2014-07-30 16:43     ` Frederic Weisbecker
2014-07-30 16:43       ` Frederic Weisbecker
2014-07-30 17:23       ` Andy Lutomirski
2014-07-30 17:23         ` Andy Lutomirski
2014-07-30 17:23         ` Andy Lutomirski
2014-07-31 15:16         ` Frederic Weisbecker
2014-07-31 15:16           ` Frederic Weisbecker
2014-07-31 15:16           ` Frederic Weisbecker
2014-07-31 16:42           ` Oleg Nesterov
2014-07-31 16:42             ` Oleg Nesterov
2014-07-31 16:42             ` Oleg Nesterov
2014-07-31 16:49             ` Frederic Weisbecker
2014-07-31 16:49               ` Frederic Weisbecker
2014-07-31 16:49               ` Frederic Weisbecker
2014-07-31 16:54               ` Oleg Nesterov
2014-07-31 16:54                 ` Oleg Nesterov
2014-07-31 16:54                 ` Oleg Nesterov
2014-07-31 16:58                 ` Oleg Nesterov
2014-07-31 16:58                   ` Oleg Nesterov
2014-07-31 16:58                   ` Oleg Nesterov
2014-07-31 17:17                 ` Frederic Weisbecker
2014-07-31 17:17                   ` Frederic Weisbecker
2014-07-31 17:17                   ` Frederic Weisbecker
2014-07-29  3:38 ` [PATCH v4 3/5] x86: Split syscall_trace_enter into two phases Andy Lutomirski
2014-07-29  3:38   ` Andy Lutomirski
2014-07-29 19:25   ` Oleg Nesterov
2014-07-29 19:25     ` Oleg Nesterov
2014-07-29  3:38 ` [PATCH v4 4/5] x86_64,entry: Treat regs->ax the same in fastpath and slowpath syscalls Andy Lutomirski
2014-07-29  3:38   ` [PATCH v4 4/5] x86_64, entry: " Andy Lutomirski
2014-07-29  3:38 ` [PATCH v4 5/5] x86_64,entry: Use split-phase syscall_trace_enter for 64-bit syscalls Andy Lutomirski
2014-07-29  3:38   ` [PATCH v4 5/5] x86_64, entry: " Andy Lutomirski
2014-07-29 19:20 ` [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath Oleg Nesterov
2014-07-29 19:20   ` Oleg Nesterov
2014-07-29 20:54   ` Andy Lutomirski
2014-07-29 20:54     ` Andy Lutomirski
2014-07-29 20:54     ` Andy Lutomirski
2014-07-29 23:30     ` Andy Lutomirski
2014-07-29 23:30       ` Andy Lutomirski
2014-07-29 23:30       ` Andy Lutomirski
2014-07-30 15:32       ` Oleg Nesterov
2014-07-30 15:32         ` Oleg Nesterov
2014-07-30 15:32         ` Oleg Nesterov
2014-07-30 16:59       ` Frederic Weisbecker
2014-07-30 16:59         ` Frederic Weisbecker
2014-07-30 16:59         ` Frederic Weisbecker
2014-07-30 17:25         ` Andy Lutomirski
2014-07-30 17:25           ` Andy Lutomirski
2014-07-30 17:25           ` Andy Lutomirski
2014-07-31 16:56           ` H. Peter Anvin
2014-07-31 16:56             ` H. Peter Anvin
2014-07-31 16:56             ` H. Peter Anvin
2014-07-31 17:20             ` Frederic Weisbecker
2014-07-31 17:20               ` Frederic Weisbecker
2014-07-31 17:20               ` Frederic Weisbecker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.