linux-kernel.vger.kernel.org archive mirror
* [PATCH v6 0/9] Syscall User Dispatch
@ 2020-09-04 20:31 Gabriel Krisman Bertazi
  2020-09-04 20:31 ` [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag Gabriel Krisman Bertazi
                   ` (8 more replies)
  0 siblings, 9 replies; 40+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-09-04 20:31 UTC (permalink / raw)
  To: luto, tglx, keescook
  Cc: x86, linux-kernel, linux-api, willy, linux-kselftest, shuah,
	Gabriel Krisman Bertazi, kernel

Hi,

v6 of this patch series includes only the type change requested by
Andy on the vdso patch, but since v5 included some bigger changes, I'm
documenting them in this cover letter as well.

Please note this applies on top of Linus' tree, and it passes the
seccomp and syscall user dispatch selftests.

v5 cover letter
--------------

This is v5 of Syscall User Dispatch.  It has some big changes in
comparison to v4.

First of all, it allows the vdso trampoline code for architectures that
support it.  This is exposed through an arch hook.  It also addresses
the concern about what happens when a bad selector is provided: instead
of SIGSEGV, we fail with SIGSYS, which is easier to debug.

Another major change is that it is now based on top of Gleixner's common
syscall entry work, and is supposed to only be used by that code.
Therefore, the entry symbol is not exported outside of kernel/entry/ code.

The biggest change in this version is the attempt to avoid using one of
the final TIF flags on x86 32 bit, without increasing the size of that
variable to 64 bit.  My expectation is that, with this work, plus the
removal of TIF_IA32, TIF_X32 and TIF_FORCE_TF, we might be able to avoid
changing this field to 64 bits at all.  Instead, this follows the
suggestion by Andy to have a generic TIF flag for SECCOMP and this
mechanism, and use another field to decide which one is enabled.  The
code for this is not complex, so it seems like a viable approach.

Finally, this version adds some documentation to the feature.

Kees, I dropped your reviewed-by on patch 5, given the amount of
changes.

Thanks,

Previous submissions are archived at:

RFC/v1: https://lkml.org/lkml/2020/7/8/96
v2: https://lkml.org/lkml/2020/7/9/17
v3: https://lkml.org/lkml/2020/7/12/4
v4: https://www.spinics.net/lists/linux-kselftest/msg16377.html
v5: https://lkml.org/lkml/2020/8/10/1320

Gabriel Krisman Bertazi (9):
  kernel: Support TIF_SYSCALL_INTERCEPT flag
  kernel: entry: Support TIF_SYSCALL_INTERCEPT on common entry code
  x86: vdso: Expose sigreturn address on vdso to the kernel
  signal: Expose SYS_USER_DISPATCH si_code type
  kernel: Implement selective syscall userspace redirection
  kernel: entry: Support Syscall User Dispatch for common syscall entry
  x86: Enable Syscall User Dispatch
  selftests: Add kselftest for syscall user dispatch
  doc: Document Syscall User Dispatch

 .../admin-guide/syscall-user-dispatch.rst     |  87 ++++++
 arch/Kconfig                                  |  21 ++
 arch/x86/Kconfig                              |   1 +
 arch/x86/entry/vdso/vdso2c.c                  |   2 +
 arch/x86/entry/vdso/vdso32/sigreturn.S        |   2 +
 arch/x86/entry/vdso/vma.c                     |  15 +
 arch/x86/include/asm/elf.h                    |   1 +
 arch/x86/include/asm/thread_info.h            |   4 +-
 arch/x86/include/asm/vdso.h                   |   2 +
 arch/x86/kernel/signal_compat.c               |   2 +-
 fs/exec.c                                     |   8 +
 include/linux/entry-common.h                  |   6 +-
 include/linux/sched.h                         |   8 +-
 include/linux/seccomp.h                       |  20 +-
 include/linux/syscall_intercept.h             |  71 +++++
 include/linux/syscall_user_dispatch.h         |  29 ++
 include/uapi/asm-generic/siginfo.h            |   3 +-
 include/uapi/linux/prctl.h                    |   5 +
 kernel/entry/Makefile                         |   1 +
 kernel/entry/common.c                         |  32 +-
 kernel/entry/common.h                         |  15 +
 kernel/entry/syscall_user_dispatch.c          | 101 ++++++
 kernel/fork.c                                 |  10 +-
 kernel/seccomp.c                              |   7 +-
 kernel/sys.c                                  |   5 +
 tools/testing/selftests/Makefile              |   1 +
 .../syscall_user_dispatch/.gitignore          |   2 +
 .../selftests/syscall_user_dispatch/Makefile  |   9 +
 .../selftests/syscall_user_dispatch/config    |   1 +
 .../syscall_user_dispatch.c                   | 292 ++++++++++++++++++
 30 files changed, 744 insertions(+), 19 deletions(-)
 create mode 100644 Documentation/admin-guide/syscall-user-dispatch.rst
 create mode 100644 include/linux/syscall_intercept.h
 create mode 100644 include/linux/syscall_user_dispatch.h
 create mode 100644 kernel/entry/common.h
 create mode 100644 kernel/entry/syscall_user_dispatch.c
 create mode 100644 tools/testing/selftests/syscall_user_dispatch/.gitignore
 create mode 100644 tools/testing/selftests/syscall_user_dispatch/Makefile
 create mode 100644 tools/testing/selftests/syscall_user_dispatch/config
 create mode 100644 tools/testing/selftests/syscall_user_dispatch/syscall_user_dispatch.c

-- 
2.28.0



* [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag
  2020-09-04 20:31 [PATCH v6 0/9] Syscall User Dispatch Gabriel Krisman Bertazi
@ 2020-09-04 20:31 ` Gabriel Krisman Bertazi
  2020-09-07 10:16   ` Christian Brauner
                     ` (2 more replies)
  2020-09-04 20:31 ` [PATCH v6 2/9] kernel: entry: Support TIF_SYSCALL_INTERCEPT on common entry code Gabriel Krisman Bertazi
                   ` (7 subsequent siblings)
  8 siblings, 3 replies; 40+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-09-04 20:31 UTC (permalink / raw)
  To: luto, tglx, keescook
  Cc: x86, linux-kernel, linux-api, willy, linux-kselftest, shuah,
	Gabriel Krisman Bertazi, kernel

Convert TIF_SECCOMP into a generic TI flag for any syscall interception
work being done by the kernel.  The actual type of work is exposed by a
new flag field outside of thread_info.  This ensures that the
syscall_intercept field is only accessed if struct seccomp has to be
accessed already, so that it doesn't add much cost to the seccomp
path.
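
As a rough illustration of the split between the single thread flag and
the per-mechanism field, here is a minimal userspace model.  It is only
a sketch: struct task_model and the model_* helpers are hypothetical
names, the thread_info bit is stood in for by a plain bool, and the
siglock serialization that the real helpers rely on is not modeled.

```c
#include <stdbool.h>

/* Bit values mirror the SYSINT_* defines used in this series. */
#define SYSINT_SECCOMP        0x1u
#define SYSINT_USER_DISPATCH  0x2u

/* Hypothetical stand-in for the relevant parts of a task. */
struct task_model {
	bool tif_syscall_intercept;      /* models the TIF_SYSCALL_INTERCEPT bit */
	unsigned int syscall_intercept;  /* which mechanisms are active */
};

/* Enable one mechanism; the generic flag is set while any bit is set. */
static void model_set_intercept(struct task_model *t, unsigned int type)
{
	t->syscall_intercept |= type;
	if (t->syscall_intercept)
		t->tif_syscall_intercept = true;
}

/* Disable one mechanism; the generic flag drops only when none remain. */
static void model_clear_intercept(struct task_model *t, unsigned int type)
{
	t->syscall_intercept &= ~type;
	if (t->syscall_intercept == 0)
		t->tif_syscall_intercept = false;
}
```

The point of the model is that clearing one mechanism leaves the generic
flag set as long as another mechanism is still enabled, so the entry
path only needs to test one bit before looking at the field.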

In order to avoid modifying every architecture at once, this patch has a
transition mechanism, such that architectures that define TIF_SECCOMP
continue to work by ignoring the syscall_intercept flag, as long as they
don't support other syscall interception mechanisms like the future
syscall user dispatch.  When migrating TIF_SECCOMP to
TIF_SYSCALL_INTERCEPT, they should adopt the semantics of checking the
syscall_intercept flag, like it is done in the common entry syscall
code, or even better, migrate to the common syscall entry code.

This was tested by running the selftests for seccomp.  No regressions
were observed, and all tests passed (with and without this patch).

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 include/linux/sched.h             |  6 ++-
 include/linux/seccomp.h           | 20 ++++++++-
 include/linux/syscall_intercept.h | 70 +++++++++++++++++++++++++++++++
 kernel/fork.c                     | 10 ++++-
 kernel/seccomp.c                  |  7 ++--
 5 files changed, 106 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/syscall_intercept.h

diff --git a/include/linux/sched.h b/include/linux/sched.h
index afe01e232935..3511c98a7849 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -959,7 +959,11 @@ struct task_struct {
 	kuid_t				loginuid;
 	unsigned int			sessionid;
 #endif
-	struct seccomp			seccomp;
+
+	struct {
+		unsigned int			syscall_intercept;
+		struct seccomp			seccomp;
+	};
 
 	/* Thread group tracking: */
 	u64				parent_exec_id;
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 02aef2844c38..027dc462cea9 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -20,6 +20,24 @@
 #include <linux/atomic.h>
 #include <asm/seccomp.h>
 
+/*
+ * Some transitional defines to avoid migrating every architecture code
+ * at once.
+ */
+
+#if defined(TIF_SECCOMP) && defined(TIF_SYSCALL_INTERCEPT)
+# error "TIF_SYSCALL_INTERCEPT and TIF_SECCOMP can't be defined at the same time"
+#endif
+
+/*
+ * If the arch has not transitioned to TIF_SYSCALL_INTERCEPT, this lets
+ * seccomp work with these architectures, as long as no other syscall
+ * intercept features are meant to be supported.
+ */
+#ifdef TIF_SECCOMP
+# define TIF_SYSCALL_INTERCEPT TIF_SECCOMP
+#endif
+
 struct seccomp_filter;
 /**
  * struct seccomp - the state of a seccomp'ed process
@@ -42,7 +60,7 @@ struct seccomp {
 extern int __secure_computing(const struct seccomp_data *sd);
 static inline int secure_computing(void)
 {
-	if (unlikely(test_thread_flag(TIF_SECCOMP)))
+	if (unlikely(test_thread_flag(TIF_SYSCALL_INTERCEPT)))
 		return  __secure_computing(NULL);
 	return 0;
 }
diff --git a/include/linux/syscall_intercept.h b/include/linux/syscall_intercept.h
new file mode 100644
index 000000000000..725d157699da
--- /dev/null
+++ b/include/linux/syscall_intercept.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020 Collabora Ltd.
+ */
+#ifndef _SYSCALL_INTERCEPT_H
+#define _SYSCALL_INTERCEPT_H
+
+#include <linux/sched.h>
+#include <linux/sched/signal.h>
+#include <linux/thread_info.h>
+
+#define SYSINT_SECCOMP		0x1
+
+#ifdef TIF_SYSCALL_INTERCEPT
+
+/* seccomp (at least) can modify TIF_SYSCALL_INTERCEPT from a different
+ * thread, which means it can race with itself or with
+ * syscall_user_dispatch. Therefore, TIF_SYSCALL_INTERCEPT and
+ * syscall_intercept are synchronized by tsk->sighand->siglock.
+ */
+
+static inline void __set_tsk_syscall_intercept(struct task_struct *tsk,
+					   unsigned int type)
+{
+	tsk->syscall_intercept |= type;
+
+	if (tsk->syscall_intercept)
+		set_tsk_thread_flag(tsk, TIF_SYSCALL_INTERCEPT);
+}
+
+static inline void __clear_tsk_syscall_intercept(struct task_struct *tsk,
+					     unsigned int type)
+{
+	tsk->syscall_intercept &= ~type;
+
+	if (tsk->syscall_intercept == 0)
+		clear_tsk_thread_flag(tsk, TIF_SYSCALL_INTERCEPT);
+}
+
+static inline void set_tsk_syscall_intercept(struct task_struct *tsk, unsigned int type)
+{
+	spin_lock_irq(&tsk->sighand->siglock);
+	__set_tsk_syscall_intercept(tsk, type);
+	spin_unlock_irq(&tsk->sighand->siglock);
+}
+
+static inline void clear_tsk_syscall_intercept(struct task_struct *tsk, unsigned int type)
+{
+	spin_lock_irq(&tsk->sighand->siglock);
+	__clear_tsk_syscall_intercept(tsk, type);
+	spin_unlock_irq(&tsk->sighand->siglock);
+}
+
+#else
+static inline void __set_tsk_syscall_intercept(struct task_struct *tsk, unsigned int type)
+{
+}
+static inline void set_tsk_syscall_intercept(struct task_struct *tsk, unsigned int type)
+{
+}
+static inline void __clear_tsk_syscall_intercept(struct task_struct *tsk, unsigned int type)
+{
+}
+static inline void clear_tsk_syscall_intercept(struct task_struct *tsk, unsigned int type)
+{
+}
+#endif
+
+#endif
+
diff --git a/kernel/fork.c b/kernel/fork.c
index 4d32190861bd..a39177bed8ea 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -49,7 +49,7 @@
 #include <linux/cgroup.h>
 #include <linux/security.h>
 #include <linux/hugetlb.h>
-#include <linux/seccomp.h>
+#include <linux/syscall_intercept.h>
 #include <linux/swap.h>
 #include <linux/syscalls.h>
 #include <linux/jiffies.h>
@@ -898,6 +898,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	 * the usage counts on the error path calling free_task.
 	 */
 	tsk->seccomp.filter = NULL;
+	tsk->syscall_intercept = 0;
 #endif
 
 	setup_thread_stack(tsk, orig);
@@ -1620,9 +1621,14 @@ static void copy_seccomp(struct task_struct *p)
 	 * If the parent gained a seccomp mode after copying thread
 	 * flags and between before we held the sighand lock, we have
 	 * to manually enable the seccomp thread flag here.
+	 *
+	 * In addition current sighand lock is asserted, so it is safe
+	 * to use the unlocked version of set_tsk_syscall_intercept.
 	 */
 	if (p->seccomp.mode != SECCOMP_MODE_DISABLED)
-		set_tsk_thread_flag(p, TIF_SECCOMP);
+		__set_tsk_syscall_intercept(p, SYSINT_SECCOMP);
+	else
+		__clear_tsk_syscall_intercept(p, SYSINT_SECCOMP);
 #endif
 }
 
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 3ee59ce0a323..d0643b500f2e 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -28,6 +28,7 @@
 #include <linux/slab.h>
 #include <linux/syscalls.h>
 #include <linux/sysctl.h>
+#include <linux/syscall_intercept.h>
 
 #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
 #include <asm/syscall.h>
@@ -352,14 +353,14 @@ static inline void seccomp_assign_mode(struct task_struct *task,
 
 	task->seccomp.mode = seccomp_mode;
 	/*
-	 * Make sure TIF_SECCOMP cannot be set before the mode (and
+	 * Make sure SYSINT_SECCOMP cannot be set before the mode (and
 	 * filter) is set.
 	 */
 	smp_mb__before_atomic();
 	/* Assume default seccomp processes want spec flaw mitigation. */
 	if ((flags & SECCOMP_FILTER_FLAG_SPEC_ALLOW) == 0)
 		arch_seccomp_spec_mitigate(task);
-	set_tsk_thread_flag(task, TIF_SECCOMP);
+	__set_tsk_syscall_intercept(task, SYSINT_SECCOMP);
 }
 
 #ifdef CONFIG_SECCOMP_FILTER
@@ -925,7 +926,7 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 
 	/*
 	 * Make sure that any changes to mode from another thread have
-	 * been seen after TIF_SECCOMP was seen.
+	 * been seen after SYSINT_SECCOMP was seen.
 	 */
 	rmb();
 
-- 
2.28.0



* [PATCH v6 2/9] kernel: entry: Support TIF_SYSCALL_INTERCEPT on common entry code
  2020-09-04 20:31 [PATCH v6 0/9] Syscall User Dispatch Gabriel Krisman Bertazi
  2020-09-04 20:31 ` [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag Gabriel Krisman Bertazi
@ 2020-09-04 20:31 ` Gabriel Krisman Bertazi
  2020-09-07 10:16   ` Christian Brauner
  2020-09-11  9:35   ` peterz
  2020-09-04 20:31 ` [PATCH v6 3/9] x86: vdso: Expose sigreturn address on vdso to the kernel Gabriel Krisman Bertazi
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 40+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-09-04 20:31 UTC (permalink / raw)
  To: luto, tglx, keescook
  Cc: x86, linux-kernel, linux-api, willy, linux-kselftest, shuah,
	Gabriel Krisman Bertazi, kernel

Architectures that use the common entry code (x86 at the time of
writing) need to have their defines updated in this commit.  This added
a measurable overhead of 1ns to the seccomp_benchmark selftest on a
bare-metal AMD system.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 arch/x86/include/asm/thread_info.h |  4 ++--
 include/linux/entry-common.h       |  6 +-----
 kernel/entry/common.c              | 24 +++++++++++++++++++++---
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 267701ae3d86..cf723181e1f2 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -82,7 +82,7 @@ struct thread_info {
 #define TIF_SSBD		5	/* Speculative store bypass disable */
 #define TIF_SYSCALL_EMU		6	/* syscall emulation active */
 #define TIF_SYSCALL_AUDIT	7	/* syscall auditing active */
-#define TIF_SECCOMP		8	/* secure computing */
+#define TIF_SYSCALL_INTERCEPT	8	/* Intercept system call */
 #define TIF_SPEC_IB		9	/* Indirect branch speculation mitigation */
 #define TIF_SPEC_FORCE_UPDATE	10	/* Force speculation MSR update in context switch */
 #define TIF_USER_RETURN_NOTIFY	11	/* notify kernel of userspace return */
@@ -112,7 +112,7 @@ struct thread_info {
 #define _TIF_SSBD		(1 << TIF_SSBD)
 #define _TIF_SYSCALL_EMU	(1 << TIF_SYSCALL_EMU)
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
-#define _TIF_SECCOMP		(1 << TIF_SECCOMP)
+#define _TIF_SYSCALL_INTERCEPT	(1 << TIF_SYSCALL_INTERCEPT)
 #define _TIF_SPEC_IB		(1 << TIF_SPEC_IB)
 #define _TIF_SPEC_FORCE_UPDATE	(1 << TIF_SPEC_FORCE_UPDATE)
 #define _TIF_USER_RETURN_NOTIFY	(1 << TIF_USER_RETURN_NOTIFY)
diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index efebbffcd5cc..72ce9ca860c6 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -21,10 +21,6 @@
 # define _TIF_SYSCALL_TRACEPOINT	(0)
 #endif
 
-#ifndef _TIF_SECCOMP
-# define _TIF_SECCOMP			(0)
-#endif
-
 #ifndef _TIF_SYSCALL_AUDIT
 # define _TIF_SYSCALL_AUDIT		(0)
 #endif
@@ -45,7 +41,7 @@
 #endif
 
 #define SYSCALL_ENTER_WORK						\
-	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SECCOMP |	\
+	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SYSCALL_INTERCEPT | \
 	 _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_EMU |			\
 	 ARCH_SYSCALL_ENTER_WORK)
 
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index fcae019158ca..44fd089d59da 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -4,6 +4,7 @@
 #include <linux/entry-common.h>
 #include <linux/livepatch.h>
 #include <linux/audit.h>
+#include <linux/syscall_intercept.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/syscalls.h>
@@ -41,6 +42,20 @@ static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
 	}
 }
 
+static inline long do_syscall_intercept(struct pt_regs *regs)
+{
+	int sysint_work = READ_ONCE(current->syscall_intercept);
+	int ret;
+
+	if (sysint_work & SYSINT_SECCOMP) {
+		ret = __secure_computing(NULL);
+		if (ret == -1L)
+			return ret;
+	}
+
+	return 0;
+}
+
 static long syscall_trace_enter(struct pt_regs *regs, long syscall,
 				unsigned long ti_work)
 {
@@ -53,9 +68,12 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
 			return -1L;
 	}
 
-	/* Do seccomp after ptrace, to catch any tracer changes. */
-	if (ti_work & _TIF_SECCOMP) {
-		ret = __secure_computing(NULL);
+	/*
+	 * Do syscall interception like seccomp after ptrace, to catch
+	 * any tracer changes.
+	 */
+	if (ti_work & _TIF_SYSCALL_INTERCEPT) {
+		ret = do_syscall_intercept(regs);
 		if (ret == -1L)
 			return ret;
 	}
-- 
2.28.0



* [PATCH v6 3/9] x86: vdso: Expose sigreturn address on vdso to the kernel
  2020-09-04 20:31 [PATCH v6 0/9] Syscall User Dispatch Gabriel Krisman Bertazi
  2020-09-04 20:31 ` [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag Gabriel Krisman Bertazi
  2020-09-04 20:31 ` [PATCH v6 2/9] kernel: entry: Support TIF_SYSCALL_INTERCEPT on common entry code Gabriel Krisman Bertazi
@ 2020-09-04 20:31 ` Gabriel Krisman Bertazi
  2020-09-22 19:40   ` Kees Cook
  2020-09-04 20:31 ` [PATCH v6 4/9] signal: Expose SYS_USER_DISPATCH si_code type Gabriel Krisman Bertazi
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 40+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-09-04 20:31 UTC (permalink / raw)
  To: luto, tglx, keescook
  Cc: x86, linux-kernel, linux-api, willy, linux-kselftest, shuah,
	Gabriel Krisman Bertazi, kernel

Syscall user redirection requires the signal trampoline code to not be
captured, in order to support returning with a locked selector while
avoiding recursion back into the signal handler.  For ia-32, which has
the trampoline in the vDSO, expose the entry points to the kernel, such
that it can avoid dispatching syscalls from that region to userspace.
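
The check this enables boils down to comparing the saved instruction
pointer against two known addresses inside the vDSO mapping.  A minimal
userspace sketch of that comparison follows; struct vdso_model and
ip_is_vdso_sigreturn() are hypothetical names, and the offsets a caller
passes in are made up for illustration rather than taken from a real
vDSO image.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the vDSO base and the two recorded offsets. */
struct vdso_model {
	unsigned long base;                      /* where the vDSO is mapped */
	unsigned long sigreturn_landing_pad;     /* offset right after sigreturn */
	unsigned long rt_sigreturn_landing_pad;  /* offset right after rt_sigreturn */
};

/* True only if ip is exactly one of the two landing pads. */
static bool ip_is_vdso_sigreturn(const struct vdso_model *v, unsigned long ip)
{
	return ip == v->base + v->sigreturn_landing_pad ||
	       ip == v->base + v->rt_sigreturn_landing_pad;
}
```

Matching exact addresses rather than a range keeps the exemption as
narrow as possible: only the syscall instruction of the trampoline
itself is let through.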

Changes since V1
  - Change return type to bool (Andy)

Suggested-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 arch/x86/entry/vdso/vdso2c.c           |  2 ++
 arch/x86/entry/vdso/vdso32/sigreturn.S |  2 ++
 arch/x86/entry/vdso/vma.c              | 15 +++++++++++++++
 arch/x86/include/asm/elf.h             |  1 +
 arch/x86/include/asm/vdso.h            |  2 ++
 5 files changed, 22 insertions(+)

diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c
index 7380908045c7..2d0f3d8bcc25 100644
--- a/arch/x86/entry/vdso/vdso2c.c
+++ b/arch/x86/entry/vdso/vdso2c.c
@@ -101,6 +101,8 @@ struct vdso_sym required_syms[] = {
 	{"__kernel_sigreturn", true},
 	{"__kernel_rt_sigreturn", true},
 	{"int80_landing_pad", true},
+	{"vdso32_rt_sigreturn_landing_pad", true},
+	{"vdso32_sigreturn_landing_pad", true},
 };
 
 __attribute__((format(printf, 1, 2))) __attribute__((noreturn))
diff --git a/arch/x86/entry/vdso/vdso32/sigreturn.S b/arch/x86/entry/vdso/vdso32/sigreturn.S
index c3233ee98a6b..1bd068f72d4c 100644
--- a/arch/x86/entry/vdso/vdso32/sigreturn.S
+++ b/arch/x86/entry/vdso/vdso32/sigreturn.S
@@ -18,6 +18,7 @@ __kernel_sigreturn:
 	movl $__NR_sigreturn, %eax
 	SYSCALL_ENTER_KERNEL
 .LEND_sigreturn:
+SYM_INNER_LABEL(vdso32_sigreturn_landing_pad, SYM_L_GLOBAL)
 	nop
 	.size __kernel_sigreturn,.-.LSTART_sigreturn
 
@@ -29,6 +30,7 @@ __kernel_rt_sigreturn:
 	movl $__NR_rt_sigreturn, %eax
 	SYSCALL_ENTER_KERNEL
 .LEND_rt_sigreturn:
+SYM_INNER_LABEL(vdso32_rt_sigreturn_landing_pad, SYM_L_GLOBAL)
 	nop
 	.size __kernel_rt_sigreturn,.-.LSTART_rt_sigreturn
 	.previous
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 9185cb1d13b9..3fc323d24824 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -436,6 +436,21 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 }
 #endif
 
+bool arch_syscall_is_vdso_sigreturn(struct pt_regs *regs)
+{
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+	const struct vdso_image *image = current->mm->context.vdso_image;
+	unsigned long vdso = (unsigned long) current->mm->context.vdso;
+
+	if (in_ia32_syscall() && image == &vdso_image_32) {
+		if (regs->ip == vdso + image->sym_vdso32_sigreturn_landing_pad ||
+		    regs->ip == vdso + image->sym_vdso32_rt_sigreturn_landing_pad)
+			return true;
+	}
+#endif
+	return false;
+}
+
 #ifdef CONFIG_X86_64
 static __init int vdso_setup(char *s)
 {
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index b9a5d488f1a5..eb41db289fe6 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -383,6 +383,7 @@ extern int arch_setup_additional_pages(struct linux_binprm *bprm,
 				       int uses_interp);
 extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
 					      int uses_interp);
+extern bool arch_syscall_is_vdso_sigreturn(struct pt_regs *regs);
 #define compat_arch_setup_additional_pages compat_arch_setup_additional_pages
 
 /* Do not change the values. See get_align_mask() */
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index bbcdc7b8f963..589f489dd375 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -27,6 +27,8 @@ struct vdso_image {
 	long sym___kernel_rt_sigreturn;
 	long sym___kernel_vsyscall;
 	long sym_int80_landing_pad;
+	long sym_vdso32_sigreturn_landing_pad;
+	long sym_vdso32_rt_sigreturn_landing_pad;
 };
 
 #ifdef CONFIG_X86_64
-- 
2.28.0



* [PATCH v6 4/9] signal: Expose SYS_USER_DISPATCH si_code type
  2020-09-04 20:31 [PATCH v6 0/9] Syscall User Dispatch Gabriel Krisman Bertazi
                   ` (2 preceding siblings ...)
  2020-09-04 20:31 ` [PATCH v6 3/9] x86: vdso: Expose sigreturn address on vdso to the kernel Gabriel Krisman Bertazi
@ 2020-09-04 20:31 ` Gabriel Krisman Bertazi
  2020-09-07 10:15   ` Christian Brauner
  2020-09-22 19:39   ` Kees Cook
  2020-09-04 20:31 ` [PATCH v6 5/9] kernel: Implement selective syscall userspace redirection Gabriel Krisman Bertazi
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 40+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-09-04 20:31 UTC (permalink / raw)
  To: luto, tglx, keescook
  Cc: x86, linux-kernel, linux-api, willy, linux-kselftest, shuah,
	Gabriel Krisman Bertazi, kernel

SYS_USER_DISPATCH will be triggered when a syscall is sent to userspace
by the Syscall User Dispatch mechanism.  This also adjusts the
BUILD_BUG_ON checks around the tree that depend on NSIGSYS.
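
From userspace, a SIGSYS handler could use the new si_code to tell the
two sources apart.  The sketch below is only an illustration:
sigsys_origin() is a hypothetical helper, and the fallback defines use
the values from this patch for headers that do not provide them yet.

```c
#include <signal.h>

/* Fallbacks for libc headers that predate these si_codes. */
#ifndef SYS_SECCOMP
#define SYS_SECCOMP 1
#endif
#ifndef SYS_USER_DISPATCH
#define SYS_USER_DISPATCH 2	/* value introduced by this patch */
#endif

/* Map a (signo, si_code) pair to the mechanism that raised SIGSYS. */
static const char *sigsys_origin(int signo, int si_code)
{
	if (signo != SIGSYS)
		return "not SIGSYS";

	switch (si_code) {
	case SYS_SECCOMP:
		return "seccomp";
	case SYS_USER_DISPATCH:
		return "syscall user dispatch";
	default:
		return "unknown";
	}
}
```

A handler installed with SA_SIGINFO would call it as
sigsys_origin(info->si_signo, info->si_code).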

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 arch/x86/kernel/signal_compat.c    | 2 +-
 include/uapi/asm-generic/siginfo.h | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index 9ccbf0576cd0..210aecc6eab9 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -31,7 +31,7 @@ static inline void signal_compat_build_tests(void)
 	BUILD_BUG_ON(NSIGBUS  != 5);
 	BUILD_BUG_ON(NSIGTRAP != 5);
 	BUILD_BUG_ON(NSIGCHLD != 6);
-	BUILD_BUG_ON(NSIGSYS  != 1);
+	BUILD_BUG_ON(NSIGSYS  != 2);
 
 	/* This is part of the ABI and can never change in size: */
 	BUILD_BUG_ON(sizeof(compat_siginfo_t) != 128);
diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index cb3d6c267181..37741908b846 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -284,7 +284,8 @@ typedef struct siginfo {
  * SIGSYS si_codes
  */
 #define SYS_SECCOMP	1	/* seccomp triggered */
-#define NSIGSYS		1
+#define SYS_USER_DISPATCH 2	/* syscall user dispatch triggered */
+#define NSIGSYS		2
 
 /*
  * SIGEMT si_codes
-- 
2.28.0



* [PATCH v6 5/9] kernel: Implement selective syscall userspace redirection
  2020-09-04 20:31 [PATCH v6 0/9] Syscall User Dispatch Gabriel Krisman Bertazi
                   ` (3 preceding siblings ...)
  2020-09-04 20:31 ` [PATCH v6 4/9] signal: Expose SYS_USER_DISPATCH si_code type Gabriel Krisman Bertazi
@ 2020-09-04 20:31 ` Gabriel Krisman Bertazi
  2020-09-05 11:24   ` Matthew Wilcox
  2020-09-11  9:44   ` peterz
  2020-09-04 20:31 ` [PATCH v6 6/9] kernel: entry: Support Syscall User Dispatch for common syscall entry Gabriel Krisman Bertazi
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 40+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-09-04 20:31 UTC (permalink / raw)
  To: luto, tglx, keescook
  Cc: x86, linux-kernel, linux-api, willy, linux-kselftest, shuah,
	Gabriel Krisman Bertazi, kernel, Paul Gofman

Introduce a mechanism to quickly disable/enable syscall handling for a
specific process and redirect to userspace via SIGSYS.  This is useful
for processes with parts that require syscall redirection and parts that
don't, but that need to perform this boundary crossing really fast,
without paying the cost of a system call to reconfigure syscall handling
on each boundary transition.  This is particularly important for Windows
games running over Wine.

The proposed interface looks like this:

  prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <start_addr>, <end_addr>, [selector])

The range [<start_addr>,<end_addr>] is a part of the process memory map
that is allowed to bypass the redirection code and dispatch syscalls
directly, such that in fast paths a process doesn't need to disable the
trap and the kernel doesn't have to check the selector.  This is
essential to return from SIGSYS to a blocked area without triggering
another SIGSYS from rt_sigreturn.

selector is an optional pointer to a char-sized userspace memory region
that has a key switch for the mechanism. This key switch is set to
either PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF to enable or disable
the redirection without calling the kernel.
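
To make the decision order concrete, here is a small userspace model of
what the description above implies for each syscall.  This is a sketch
under assumptions, not the kernel code: the struct, the enum and
model_dispatch() are hypothetical names, the MODEL_DISPATCH_* values
are placeholders for PR_SYS_DISPATCH_OFF/ON, a missing selector is
assumed to mean "always redirect", and any other selector value is
assumed to be the "bad selector" case that fails with SIGSYS.

```c
/* Possible outcomes for one syscall in this model. */
enum outcome { RUN_NATIVE, REDIRECT_SIGSYS, FAIL_SIGSYS };

/* Placeholder selector values (assumed, see the text above). */
#define MODEL_DISPATCH_OFF 0
#define MODEL_DISPATCH_ON  1

struct dispatch_model {
	unsigned long start;   /* first address of the dispatcher region */
	unsigned long end;     /* last address of the dispatcher region */
	const char *selector;  /* optional key switch, may be NULL */
};

static enum outcome model_dispatch(const struct dispatch_model *d,
				   unsigned long ip)
{
	/* Syscalls issued from the dispatcher region are never redirected. */
	if (ip >= d->start && ip <= d->end)
		return RUN_NATIVE;

	if (d->selector) {
		char state = *d->selector;

		if (state == MODEL_DISPATCH_OFF)
			return RUN_NATIVE;
		if (state != MODEL_DISPATCH_ON)
			return FAIL_SIGSYS;	/* bad selector value */
	}

	return REDIRECT_SIGSYS;
}
```

Checking the region first means the selector byte is not read at all
for syscalls issued by the dispatcher, which is what keeps that path
cheap.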

The feature is meant to be set per-thread and it is disabled on
fork/clone/execv.

Internally, this doesn't add overhead to the syscall hot path, and it
requires very little per-architecture support.  I avoided using seccomp,
even though it duplicates some functionality, due to previous feedback
that maybe it shouldn't mix with seccomp since it is not a security
mechanism.  And obviously, this should never be considered a security
mechanism, since any part of the program can by-pass it by using the
syscall dispatcher.

For the sysinfo benchmark, which measures the overhead added to
executing a native syscall that doesn't require interception, the
overhead using only the direct dispatcher region to issue syscalls is
negligible.  The overhead of using the selector is around 40ns for a
native (unredirected) syscall on my system, and it is (as expected)
dominated by the supervisor-mode user-address access.  In fact, with
SMAP off, the overhead is consistently less than 5ns on my test box.

An example code using this interface can be found at:
  https://gitlab.collabora.com/krisman/syscall-disable-personality

Changes since v4:
  (Andy Lutomirski)
  - Allow sigreturn coming from vDSO
  - Exit with SIGSYS instead of SIGSEGV on bad selector
  (Thomas Gleixner)
  - Use sizeof selector in access_ok
  - Document usage of __get_user
  - Use constant for state value
  - Split out x86 parts
  - Rebase on top of Gleixner's common entry code
  - Don't expose do_syscall_user_dispatch

Changes since v3:
  - NTR.

Changes since v2:
  (Matthew Wilcox suggestions)
  - Drop __user on non-ptr type.
  - Move #define closer to similar defs
  - Allow a memory region that can dispatch directly
  (Kees Cook suggestions)
  - Improve kconfig summary line
  - Move flag cleanup on execve to begin_new_exec
  - Hint branch predictor in the syscall path
  (Me)
  - Convert selector to char

Changes since RFC:
  (Kees Cook suggestions)
  - Don't mention personality while explaining the feature
  - Use syscall_get_nr
  - Remove header guard in several places
  - Convert WARN_ON to WARN_ON_ONCE
  - Explicit check for state values
  - Rename to syscall user dispatcher

Cc: Matthew Wilcox <willy@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Paul Gofman <gofmanp@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-api@vger.kernel.org
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 arch/Kconfig                          |  21 ++++++
 fs/exec.c                             |   8 ++
 include/linux/sched.h                 |   2 +
 include/linux/syscall_intercept.h     |   1 +
 include/linux/syscall_user_dispatch.h |  29 ++++++++
 include/uapi/linux/prctl.h            |   5 ++
 kernel/entry/Makefile                 |   1 +
 kernel/entry/common.h                 |  15 ++++
 kernel/entry/syscall_user_dispatch.c  | 101 ++++++++++++++++++++++++++
 kernel/sys.c                          |   5 ++
 10 files changed, 188 insertions(+)
 create mode 100644 include/linux/syscall_user_dispatch.h
 create mode 100644 kernel/entry/common.h
 create mode 100644 kernel/entry/syscall_user_dispatch.c

diff --git a/arch/Kconfig b/arch/Kconfig
index af14a567b493..6e022c5de020 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -468,6 +468,27 @@ config SECCOMP_FILTER
 
 	  See Documentation/userspace-api/seccomp_filter.rst for details.
 
+config HAVE_ARCH_SYSCALL_USER_DISPATCH
+	bool
+	help
+	  An arch should select this symbol if it provides all of these things:
+	  - TIF_SYSCALL_INTERCEPT
+	  - syscall_get_arch
+	  - syscall_rollback
+	  - syscall_get_nr
+	  - SIGSYS siginfo_t support
+	  - arch_syscall_is_vdso_sigreturn
+
+config SYSCALL_USER_DISPATCH
+	bool "Support syscall redirection to userspace dispatcher"
+	depends on HAVE_ARCH_SYSCALL_USER_DISPATCH
+	help
+	  Enable tasks to ask the kernel to redirect syscalls not
+	  issued from a predefined dispatcher back to userspace,
+	  depending on a userspace memory selector.
+
+	  This option is useful to optimize games running over Wine.
+
 config HAVE_ARCH_STACKLEAK
 	bool
 	help
diff --git a/fs/exec.c b/fs/exec.c
index a91003e28eaa..9cdf827151c4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -62,6 +62,7 @@
 #include <linux/oom.h>
 #include <linux/compat.h>
 #include <linux/vmalloc.h>
+#include <linux/syscall_intercept.h>
 
 #include <linux/uaccess.h>
 #include <asm/mmu_context.h>
@@ -1418,6 +1419,13 @@ int begin_new_exec(struct linux_binprm * bprm)
 	flush_thread();
 	me->personality &= ~bprm->per_clear;
 
+	/*
+	 * Prevent Syscall User Dispatch from crossing application
+	 * boundaries.  sighand is already unshared, so it is safe to
+	 * use the unlocked version here.
+	 */
+	__clear_tsk_syscall_intercept(me, SYSINT_USER_DISPATCH);
+
 	/*
 	 * We have to apply CLOEXEC before we change whether the process is
 	 * dumpable (in setup_new_exec) to avoid a race with a process in userspace
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3511c98a7849..47ce09589072 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -34,6 +34,7 @@
 #include <linux/rseq.h>
 #include <linux/seqlock.h>
 #include <linux/kcsan.h>
+#include <linux/syscall_user_dispatch.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
 struct audit_context;
@@ -963,6 +964,7 @@ struct task_struct {
 	struct {
 		unsigned int			syscall_intercept;
 		struct seccomp			seccomp;
+		struct syscall_user_dispatch	syscall_dispatch;
 	};
 
 	/* Thread group tracking: */
diff --git a/include/linux/syscall_intercept.h b/include/linux/syscall_intercept.h
index 725d157699da..21bc2eb668f3 100644
--- a/include/linux/syscall_intercept.h
+++ b/include/linux/syscall_intercept.h
@@ -10,6 +10,7 @@
 #include <linux/thread_info.h>
 
 #define SYSINT_SECCOMP		0x1
+#define SYSINT_USER_DISPATCH	0x2
 
 #ifdef TIF_SYSCALL_INTERCEPT
 
diff --git a/include/linux/syscall_user_dispatch.h b/include/linux/syscall_user_dispatch.h
new file mode 100644
index 000000000000..f831358bfaab
--- /dev/null
+++ b/include/linux/syscall_user_dispatch.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020 Collabora Ltd.
+ */
+#ifndef _SYSCALL_USER_DISPATCH_H
+#define _SYSCALL_USER_DISPATCH_H
+
+#ifdef CONFIG_SYSCALL_USER_DISPATCH
+struct syscall_user_dispatch {
+	char __user *selector;
+	unsigned long dispatcher_start;
+	unsigned long dispatcher_end;
+};
+
+int set_syscall_user_dispatch(int mode, unsigned long dispatcher_start,
+			      unsigned long dispatcher_end,
+			      char __user *selector);
+#else
+struct syscall_user_dispatch {};
+
+static inline int set_syscall_user_dispatch(int mode, unsigned long dispatcher_start,
+					    unsigned long dispatcher_end,
+					    char __user *selector)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_SYSCALL_USER_DISPATCH */
+
+#endif /* _SYSCALL_USER_DISPATCH_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 07b4f8131e36..96265246383d 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -238,4 +238,9 @@ struct prctl_mm_map {
 #define PR_SET_IO_FLUSHER		57
 #define PR_GET_IO_FLUSHER		58
 
+/* Dispatch syscalls to a userspace handler */
+#define PR_SET_SYSCALL_USER_DISPATCH	59
+# define PR_SYS_DISPATCH_OFF		0
+# define PR_SYS_DISPATCH_ON		1
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/entry/Makefile b/kernel/entry/Makefile
index 34c8a3f1c735..81182ab5a40c 100644
--- a/kernel/entry/Makefile
+++ b/kernel/entry/Makefile
@@ -11,3 +11,4 @@ CFLAGS_common.o		+= -fno-stack-protector
 
 obj-$(CONFIG_GENERIC_ENTRY) 		+= common.o
 obj-$(CONFIG_KVM_XFER_TO_GUEST_WORK)	+= kvm.o
+obj-$(CONFIG_SYSCALL_USER_DISPATCH)	+= syscall_user_dispatch.o
diff --git a/kernel/entry/common.h b/kernel/entry/common.h
new file mode 100644
index 000000000000..557ecaa3fb31
--- /dev/null
+++ b/kernel/entry/common.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _COMMON_H
+#define _COMMON_H
+
+#ifdef CONFIG_SYSCALL_USER_DISPATCH
+int do_syscall_user_dispatch(struct pt_regs *regs);
+#else
+static inline int do_syscall_user_dispatch(struct pt_regs *regs)
+{
+	WARN_ON_ONCE(1);
+	return 0;
+}
+#endif /* CONFIG_SYSCALL_USER_DISPATCH */
+
+#endif
diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
new file mode 100644
index 000000000000..12ea01711dc7
--- /dev/null
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2020 Collabora Ltd.
+ */
+#include <linux/sched.h>
+#include <linux/prctl.h>
+#include <linux/syscall_intercept.h>
+#include <linux/syscall_user_dispatch.h>
+#include <linux/uaccess.h>
+#include <linux/signal.h>
+#include <linux/elf.h>
+
+#include <asm/syscall.h>
+
+#include <linux/sched/signal.h>
+#include <linux/sched/task_stack.h>
+
+static void trigger_sigsys(struct pt_regs *regs)
+{
+	struct kernel_siginfo info;
+
+	clear_siginfo(&info);
+	info.si_signo = SIGSYS;
+	info.si_code = SYS_USER_DISPATCH;
+	info.si_call_addr = (void __user *)KSTK_EIP(current);
+	info.si_errno = 0;
+	info.si_arch = syscall_get_arch(current);
+	info.si_syscall = syscall_get_nr(current, regs);
+
+	force_sig_info(&info);
+}
+
+int do_syscall_user_dispatch(struct pt_regs *regs)
+{
+	struct syscall_user_dispatch *sd = &current->syscall_dispatch;
+	unsigned long ip = instruction_pointer(regs);
+	char state;
+
+	if (likely(ip >= sd->dispatcher_start && ip <= sd->dispatcher_end))
+		return 0;
+
+	if (unlikely(arch_syscall_is_vdso_sigreturn(regs)))
+		return 0;
+
+	if (likely(sd->selector)) {
+		/*
+		 * access_ok() is performed once, at prctl time, when
+		 * the selector is loaded by userspace.
+		 */
+		if (unlikely(__get_user(state, sd->selector)))
+			do_exit(SIGSEGV);
+
+		if (likely(state == PR_SYS_DISPATCH_OFF))
+			return 0;
+
+		if (state != PR_SYS_DISPATCH_ON)
+			do_exit(SIGSYS);
+	}
+
+	syscall_rollback(current, regs);
+	trigger_sigsys(regs);
+
+	return 1;
+}
+
+int set_syscall_user_dispatch(int mode, unsigned long dispatcher_start,
+			      unsigned long dispatcher_end, char __user *selector)
+{
+	switch (mode) {
+	case PR_SYS_DISPATCH_OFF:
+		if (dispatcher_start || dispatcher_end || selector)
+			return -EINVAL;
+		break;
+	case PR_SYS_DISPATCH_ON:
+		/*
+		 * Validate the direct dispatcher region just for basic
+		 * sanity.  If the user is able to submit a syscall from
+		 * an address, that address is obviously valid.
+		 */
+		if (dispatcher_end < dispatcher_start)
+			return -EINVAL;
+
+		if (selector && !access_ok(selector, sizeof(*selector)))
+			return -EFAULT;
+
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	current->syscall_dispatch.selector = selector;
+	current->syscall_dispatch.dispatcher_start = dispatcher_start;
+	current->syscall_dispatch.dispatcher_end = dispatcher_end;
+
+	if (mode == PR_SYS_DISPATCH_ON)
+		set_tsk_syscall_intercept(current, SYSINT_USER_DISPATCH);
+	else
+		clear_tsk_syscall_intercept(current, SYSINT_USER_DISPATCH);
+
+	return 0;
+}
diff --git a/kernel/sys.c b/kernel/sys.c
index ab6c409b1159..2786dca862b9 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -42,6 +42,7 @@
 #include <linux/syscore_ops.h>
 #include <linux/version.h>
 #include <linux/ctype.h>
+#include <linux/syscall_user_dispatch.h>
 
 #include <linux/compat.h>
 #include <linux/syscalls.h>
@@ -2530,6 +2531,10 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 
 		error = (current->flags & PR_IO_FLUSHER) == PR_IO_FLUSHER;
 		break;
+	case PR_SET_SYSCALL_USER_DISPATCH:
+		error = set_syscall_user_dispatch((int) arg2, arg3, arg4,
+						  (char __user *) arg5);
+		break;
 	default:
 		error = -EINVAL;
 		break;
-- 
2.28.0
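
The decision taken by do_syscall_user_dispatch() in the patch above can be
modelled in plain userspace C.  What follows is only an illustrative sketch
under stated assumptions, not kernel code: struct sud_model, enum sud_action
and sud_decide() are hypothetical names, the selector is passed by value
instead of being read with __get_user(), and the vDSO sigreturn check is
reduced to a boolean argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef PR_SYS_DISPATCH_OFF
# define PR_SYS_DISPATCH_OFF	0
#endif
#ifndef PR_SYS_DISPATCH_ON
# define PR_SYS_DISPATCH_ON	1
#endif

/* Hypothetical stand-in for struct syscall_user_dispatch. */
struct sud_model {
	unsigned long dispatcher_start;
	unsigned long dispatcher_end;
	bool has_selector;	/* models "sd->selector != NULL" */
};

/* Hypothetical result type; the kernel function returns an int instead. */
enum sud_action {
	SUD_EXECUTE,	/* let the syscall run natively */
	SUD_DISPATCH,	/* roll back and raise SIGSYS for the handler */
	SUD_KILL,	/* invalid selector value: the task is terminated */
};

enum sud_action sud_decide(const struct sud_model *sd, unsigned long ip,
			   bool is_vdso_sigreturn, char selector)
{
	assert(sd != NULL);

	/* Closed interval: both end points belong to the allowed region. */
	if (ip >= sd->dispatcher_start && ip <= sd->dispatcher_end)
		return SUD_EXECUTE;

	/* The vDSO sigreturn trampoline is never intercepted. */
	if (is_vdso_sigreturn)
		return SUD_EXECUTE;

	/* The selector is consulted only when one was registered. */
	if (sd->has_selector) {
		if (selector == PR_SYS_DISPATCH_OFF)
			return SUD_EXECUTE;
		if (selector != PR_SYS_DISPATCH_ON)
			return SUD_KILL;
	}

	return SUD_DISPATCH;
}
```

With a region of [0x1000, 0x1fff], a call from 0x1fff is executed whatever
the selector says, while a call from 0x2000 is executed, dispatched or fatal
depending only on the selector byte.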



* [PATCH v6 6/9] kernel: entry: Support Syscall User Dispatch for common syscall entry
  2020-09-04 20:31 [PATCH v6 0/9] Syscall User Dispatch Gabriel Krisman Bertazi
                   ` (4 preceding siblings ...)
  2020-09-04 20:31 ` [PATCH v6 5/9] kernel: Implement selective syscall userspace redirection Gabriel Krisman Bertazi
@ 2020-09-04 20:31 ` Gabriel Krisman Bertazi
  2020-09-07 10:15   ` Christian Brauner
  2020-09-11  9:46   ` peterz
  2020-09-04 20:31 ` [PATCH v6 7/9] x86: Enable Syscall User Dispatch Gabriel Krisman Bertazi
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 40+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-09-04 20:31 UTC (permalink / raw)
  To: luto, tglx, keescook
  Cc: x86, linux-kernel, linux-api, willy, linux-kselftest, shuah,
	Gabriel Krisman Bertazi, kernel

Syscall User Dispatch (SUD) must take precedence over seccomp, since the
use case is emulation (it can be invoked with a different ABI) such that
seccomp filtering by syscall number doesn't make sense in the first
place.  In addition, either the syscall is dispatched back to userspace,
in which case there is no resource for seccomp to protect, or the
syscall will be executed, and seccomp will execute next.

Regarding ptrace, I experimented with before and after, and while the
same ABI argument applies, I felt it was easier to debug if I let ptrace
happen for syscalls that are dispatched back to userspace.  In addition,
doing it after ptrace makes the code in syscall_exit_work slightly
simpler, since it doesn't require special handling for this feature.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 kernel/entry/common.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 44fd089d59da..fdb0c543539d 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -6,6 +6,8 @@
 #include <linux/audit.h>
 #include <linux/syscall_intercept.h>
 
+#include "common.h"
+
 #define CREATE_TRACE_POINTS
 #include <trace/events/syscalls.h>
 
@@ -47,6 +49,12 @@ static inline long do_syscall_intercept(struct pt_regs *regs)
 	int sysint_work = READ_ONCE(current->syscall_intercept);
 	int ret;
 
+	if (sysint_work & SYSINT_USER_DISPATCH) {
+		ret = do_syscall_user_dispatch(regs);
+		if (ret == -1L)
+			return ret;
+	}
+
 	if (sysint_work & SYSINT_SECCOMP) {
 		ret = __secure_computing(NULL);
 		if (ret == -1L)
-- 
2.28.0



* [PATCH v6 7/9] x86: Enable Syscall User Dispatch
  2020-09-04 20:31 [PATCH v6 0/9] Syscall User Dispatch Gabriel Krisman Bertazi
                   ` (5 preceding siblings ...)
  2020-09-04 20:31 ` [PATCH v6 6/9] kernel: entry: Support Syscall User Dispatch for common syscall entry Gabriel Krisman Bertazi
@ 2020-09-04 20:31 ` Gabriel Krisman Bertazi
  2020-09-22 19:37   ` Kees Cook
  2020-09-04 20:31 ` [PATCH v6 8/9] selftests: Add kselftest for syscall user dispatch Gabriel Krisman Bertazi
  2020-09-04 20:31 ` [PATCH v6 9/9] doc: Document Syscall User Dispatch Gabriel Krisman Bertazi
  8 siblings, 1 reply; 40+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-09-04 20:31 UTC (permalink / raw)
  To: luto, tglx, keescook
  Cc: x86, linux-kernel, linux-api, willy, linux-kselftest, shuah,
	Gabriel Krisman Bertazi, kernel

Syscall User Dispatch requirements are fully supported on x86.  This
patch flips the switch, marking it as supported.  This was tested
against the Syscall User Dispatch selftest.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7101ac64bb20..56ac8de99021 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -150,6 +150,7 @@ config X86
 	select HAVE_ARCH_COMPAT_MMAP_BASES	if MMU && COMPAT
 	select HAVE_ARCH_PREL32_RELOCATIONS
 	select HAVE_ARCH_SECCOMP_FILTER
+	select HAVE_ARCH_SYSCALL_USER_DISPATCH
 	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
 	select HAVE_ARCH_STACKLEAK
 	select HAVE_ARCH_TRACEHOOK
-- 
2.28.0



* [PATCH v6 8/9] selftests: Add kselftest for syscall user dispatch
  2020-09-04 20:31 [PATCH v6 0/9] Syscall User Dispatch Gabriel Krisman Bertazi
                   ` (6 preceding siblings ...)
  2020-09-04 20:31 ` [PATCH v6 7/9] x86: Enable Syscall User Dispatch Gabriel Krisman Bertazi
@ 2020-09-04 20:31 ` Gabriel Krisman Bertazi
  2020-09-22 19:35   ` Kees Cook
  2020-09-04 20:31 ` [PATCH v6 9/9] doc: Document Syscall User Dispatch Gabriel Krisman Bertazi
  8 siblings, 1 reply; 40+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-09-04 20:31 UTC (permalink / raw)
  To: luto, tglx, keescook
  Cc: x86, linux-kernel, linux-api, willy, linux-kselftest, shuah,
	Gabriel Krisman Bertazi, kernel

Implement functionality tests for syscall user dispatch.  In order to
make the test portable, refrain from open coding syscall dispatchers and
calculating glibc memory ranges.

Changes since v4:
  - Update bad selector test to reflect change in API

Changes since v3:
  - Sort entry in Makefile
  - Add SPDX header
  - Use __NR_syscalls if available

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 tools/testing/selftests/Makefile              |   1 +
 .../syscall_user_dispatch/.gitignore          |   2 +
 .../selftests/syscall_user_dispatch/Makefile  |   9 +
 .../selftests/syscall_user_dispatch/config    |   1 +
 .../syscall_user_dispatch.c                   | 292 ++++++++++++++++++
 5 files changed, 305 insertions(+)
 create mode 100644 tools/testing/selftests/syscall_user_dispatch/.gitignore
 create mode 100644 tools/testing/selftests/syscall_user_dispatch/Makefile
 create mode 100644 tools/testing/selftests/syscall_user_dispatch/config
 create mode 100644 tools/testing/selftests/syscall_user_dispatch/syscall_user_dispatch.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 9018f45d631d..34ab8579e22f 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -56,6 +56,7 @@ TARGETS += sparc64
 TARGETS += splice
 TARGETS += static_keys
 TARGETS += sync
+TARGETS += syscall_user_dispatch
 TARGETS += sysctl
 TARGETS += tc-testing
 TARGETS += timens
diff --git a/tools/testing/selftests/syscall_user_dispatch/.gitignore b/tools/testing/selftests/syscall_user_dispatch/.gitignore
new file mode 100644
index 000000000000..637f08107add
--- /dev/null
+++ b/tools/testing/selftests/syscall_user_dispatch/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+syscall_user_dispatch
diff --git a/tools/testing/selftests/syscall_user_dispatch/Makefile b/tools/testing/selftests/syscall_user_dispatch/Makefile
new file mode 100644
index 000000000000..eeb07a791057
--- /dev/null
+++ b/tools/testing/selftests/syscall_user_dispatch/Makefile
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0
+top_srcdir = ../../../..
+INSTALL_HDR_PATH = $(top_srcdir)/usr
+LINUX_HDR_PATH = $(INSTALL_HDR_PATH)/include/
+
+CFLAGS += -Wall -I$(LINUX_HDR_PATH)
+
+TEST_GEN_PROGS := syscall_user_dispatch
+include ../lib.mk
diff --git a/tools/testing/selftests/syscall_user_dispatch/config b/tools/testing/selftests/syscall_user_dispatch/config
new file mode 100644
index 000000000000..22c4dfe167ca
--- /dev/null
+++ b/tools/testing/selftests/syscall_user_dispatch/config
@@ -0,0 +1 @@
+CONFIG_SYSCALL_USER_DISPATCH=y
diff --git a/tools/testing/selftests/syscall_user_dispatch/syscall_user_dispatch.c b/tools/testing/selftests/syscall_user_dispatch/syscall_user_dispatch.c
new file mode 100644
index 000000000000..885b5125bd90
--- /dev/null
+++ b/tools/testing/selftests/syscall_user_dispatch/syscall_user_dispatch.c
@@ -0,0 +1,292 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2020 Collabora Ltd.
+ *
+ * Test code for syscall user dispatch
+ */
+
+#define _GNU_SOURCE
+#include <sys/prctl.h>
+#include <sys/sysinfo.h>
+#include <sys/syscall.h>
+#include <signal.h>
+
+#include <asm/unistd.h>
+#include "../kselftest_harness.h"
+
+#ifndef PR_SET_SYSCALL_USER_DISPATCH
+# define PR_SET_SYSCALL_USER_DISPATCH	59
+# define PR_SYS_DISPATCH_OFF	0
+# define PR_SYS_DISPATCH_ON	1
+#endif
+
+#ifndef SYS_USER_DISPATCH
+# define SYS_USER_DISPATCH	2
+#endif
+
+#ifdef __NR_syscalls
+# define MAGIC_SYSCALL_1 (__NR_syscalls + 1) /* Bad Linux syscall number */
+#else
+# define MAGIC_SYSCALL_1 (0xff00)  /* Bad Linux syscall number */
+#endif
+
+#define SYSCALL_DISPATCH_ON(x) ((x) = 1)
+#define SYSCALL_DISPATCH_OFF(x) ((x) = 0)
+
+/* Test Summary:
+ *
+ * - dispatch_trigger_sigsys: Verify if PR_SET_SYSCALL_USER_DISPATCH is
+ *   able to trigger SIGSYS on a syscall.
+ *
+ * - bad_selector: Test that a bad selector value terminates the task
+ *   with a SIGSYS that cannot be caught.
+ *
+ * - bad_prctl_param: Test that the API correctly rejects invalid
+ *   parameters on prctl
+ *
+ * - dispatch_and_return: Test that a syscall is selectively dispatched
+ *   to userspace depending on the value of selector.
+ *
+ * - disable_dispatch: Test that the PR_SYS_DISPATCH_OFF correctly
+ *   disables the dispatcher
+ *
+ * - direct_dispatch_range: Test that a syscall within the allowed range
+ *   can bypass the dispatcher.
+ */
+
+TEST_SIGNAL(dispatch_trigger_sigsys, SIGSYS)
+{
+	char sel = 0;
+	struct sysinfo info;
+	int ret;
+
+	ret = sysinfo(&info);
+	ASSERT_EQ(0, ret);
+
+	ret = prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_ON, 0, 0, &sel);
+	ASSERT_EQ(0, ret) {
+		TH_LOG("Kernel does not support CONFIG_SYSCALL_USER_DISPATCH");
+	}
+
+	SYSCALL_DISPATCH_ON(sel);
+
+	sysinfo(&info);
+
+	EXPECT_FALSE(true) {
+		TH_LOG("Unreachable!");
+	}
+}
+
+TEST(bad_prctl_param)
+{
+	char sel = 0;
+	int op;
+
+	/* Invalid op */
+	op = -1;
+	prctl(PR_SET_SYSCALL_USER_DISPATCH, op, 0, 0, &sel);
+	ASSERT_EQ(EINVAL, errno);
+
+	/* PR_SYS_DISPATCH_OFF */
+	op = PR_SYS_DISPATCH_OFF;
+
+	/* start_addr != 0 */
+	prctl(PR_SET_SYSCALL_USER_DISPATCH, op, 0x1, 0xff, 0);
+	EXPECT_EQ(EINVAL, errno);
+
+	/* end_addr != 0 */
+	prctl(PR_SET_SYSCALL_USER_DISPATCH, op, 0x0, 0xff, 0);
+	EXPECT_EQ(EINVAL, errno);
+
+	/* sel != NULL */
+	prctl(PR_SET_SYSCALL_USER_DISPATCH, op, 0x0, 0x0, &sel);
+	EXPECT_EQ(EINVAL, errno);
+
+	/* Valid parameter */
+	errno = 0;
+	prctl(PR_SET_SYSCALL_USER_DISPATCH, op, 0x0, 0x0, 0x0);
+	EXPECT_EQ(0, errno);
+
+	/* PR_SYS_DISPATCH_ON */
+	op = PR_SYS_DISPATCH_ON;
+
+	/* start_addr > end_addr */
+	prctl(PR_SET_SYSCALL_USER_DISPATCH, op, 0x1, 0x0, &sel);
+	EXPECT_EQ(EINVAL, errno);
+
+	/* Invalid selector */
+	prctl(PR_SET_SYSCALL_USER_DISPATCH, op, 0x0, 0x1, (void *) -1);
+	ASSERT_EQ(EFAULT, errno);
+}
+
+/*
+ * Use global selector for handle_sigsys tests, to avoid passing
+ * selector to signal handler
+ */
+char glob_sel;
+int nr_syscalls_emulated;
+int si_code;
+int si_errno;
+
+static void handle_sigsys(int sig, siginfo_t *info, void *ucontext)
+{
+	si_code = info->si_code;
+	si_errno = info->si_errno;
+
+	if (info->si_syscall == MAGIC_SYSCALL_1)
+		nr_syscalls_emulated++;
+
+	/* In preparation for sigreturn. */
+	SYSCALL_DISPATCH_OFF(glob_sel);
+}
+
+TEST(dispatch_and_return)
+{
+	long ret;
+	struct sigaction act;
+	sigset_t mask;
+
+	glob_sel = 0;
+	nr_syscalls_emulated = 0;
+	si_code = 0;
+	si_errno = 0;
+
+	memset(&act, 0, sizeof(act));
+	sigemptyset(&mask);
+
+	act.sa_sigaction = handle_sigsys;
+	act.sa_flags = SA_SIGINFO;
+	act.sa_mask = mask;
+
+	ret = sigaction(SIGSYS, &act, NULL);
+	ASSERT_EQ(0, ret);
+
+	/* Make sure selector is good prior to prctl. */
+	SYSCALL_DISPATCH_OFF(glob_sel);
+
+	ret = prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_ON, 0, 0, &glob_sel);
+	ASSERT_EQ(0, ret) {
+		TH_LOG("Kernel does not support CONFIG_SYSCALL_USER_DISPATCH");
+	}
+
+	/* MAGIC_SYSCALL_1 doesn't exist. */
+	SYSCALL_DISPATCH_OFF(glob_sel);
+	ret = syscall(MAGIC_SYSCALL_1);
+	EXPECT_EQ(-1, ret) {
+		TH_LOG("Dispatch triggered unexpectedly");
+	}
+
+	/* MAGIC_SYSCALL_1 should be emulated. */
+	nr_syscalls_emulated = 0;
+	SYSCALL_DISPATCH_ON(glob_sel);
+
+	ret = syscall(MAGIC_SYSCALL_1);
+	EXPECT_EQ(MAGIC_SYSCALL_1, ret) {
+		TH_LOG("Failed to intercept syscall");
+	}
+	EXPECT_EQ(1, nr_syscalls_emulated) {
+		TH_LOG("Failed to emulate syscall");
+	}
+	ASSERT_EQ(SYS_USER_DISPATCH, si_code) {
+		TH_LOG("Bad si_code in SIGSYS");
+	}
+	ASSERT_EQ(0, si_errno) {
+		TH_LOG("Bad si_errno in SIGSYS");
+	}
+}
+
+TEST_SIGNAL(bad_selector, SIGSYS)
+{
+	long ret;
+	struct sigaction act;
+	sigset_t mask;
+	struct sysinfo info;
+
+	glob_sel = 0;
+	nr_syscalls_emulated = 0;
+	si_code = 0;
+	si_errno = 0;
+
+	memset(&act, 0, sizeof(act));
+	sigemptyset(&mask);
+
+	act.sa_sigaction = handle_sigsys;
+	act.sa_flags = SA_SIGINFO;
+	act.sa_mask = mask;
+
+	ret = sigaction(SIGSYS, &act, NULL);
+	ASSERT_EQ(0, ret);
+
+	/* Make sure selector is good prior to prctl. */
+	SYSCALL_DISPATCH_OFF(glob_sel);
+
+	ret = prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_ON, 0, 0, &glob_sel);
+	ASSERT_EQ(0, ret) {
+		TH_LOG("Kernel does not support CONFIG_SYSCALL_USER_DISPATCH");
+	}
+
+	glob_sel = -1;
+
+	sysinfo(&info);
+
+	/* Even though it is ready to catch SIGSYS, the signal is
+	 * supposed to be uncatchable.
+	 */
+
+	EXPECT_FALSE(true) {
+		TH_LOG("Unreachable!");
+	}
+}
+
+TEST(disable_dispatch)
+{
+	int ret;
+	struct sysinfo info;
+	char sel = 0;
+
+	ret = prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_ON, 0, 0, &sel);
+	ASSERT_EQ(0, ret) {
+		TH_LOG("Kernel does not support CONFIG_SYSCALL_USER_DISPATCH");
+	}
+
+	/* MAGIC_SYSCALL_1 doesn't exist. */
+	SYSCALL_DISPATCH_OFF(glob_sel);
+
+	ret = prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_OFF, 0, 0, 0);
+	EXPECT_EQ(0, ret) {
+		TH_LOG("Failed to unset syscall user dispatch");
+	}
+
+	/* Shouldn't have any effect... */
+	SYSCALL_DISPATCH_ON(glob_sel);
+
+	ret = syscall(__NR_sysinfo, &info);
+	EXPECT_EQ(0, ret) {
+		TH_LOG("Dispatch triggered unexpectedly");
+	}
+}
+
+TEST(direct_dispatch_range)
+{
+	int ret = 0;
+	struct sysinfo info;
+	char sel = 0;
+
+	/*
+	 * Instead of calculating libc addresses, allow the entire
+	 * memory map and lock the selector.
+	 */
+	ret = prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_ON, 0, -1L, &sel);
+	ASSERT_EQ(0, ret) {
+		TH_LOG("Kernel does not support CONFIG_SYSCALL_USER_DISPATCH");
+	}
+
+	SYSCALL_DISPATCH_ON(sel);
+
+	ret = sysinfo(&info);
+	ASSERT_EQ(0, ret) {
+		TH_LOG("Dispatch triggered unexpectedly");
+	}
+}
+
+TEST_HARNESS_MAIN
-- 
2.28.0



* [PATCH v6 9/9] doc: Document Syscall User Dispatch
  2020-09-04 20:31 [PATCH v6 0/9] Syscall User Dispatch Gabriel Krisman Bertazi
                   ` (7 preceding siblings ...)
  2020-09-04 20:31 ` [PATCH v6 8/9] selftests: Add kselftest for syscall user dispatch Gabriel Krisman Bertazi
@ 2020-09-04 20:31 ` Gabriel Krisman Bertazi
  2020-09-22 19:35   ` Kees Cook
  8 siblings, 1 reply; 40+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-09-04 20:31 UTC (permalink / raw)
  To: luto, tglx, keescook
  Cc: x86, linux-kernel, linux-api, willy, linux-kselftest, shuah,
	Gabriel Krisman Bertazi, kernel

Explain the interface, provide some background and security notes.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 .../admin-guide/syscall-user-dispatch.rst     | 87 +++++++++++++++++++
 1 file changed, 87 insertions(+)
 create mode 100644 Documentation/admin-guide/syscall-user-dispatch.rst

diff --git a/Documentation/admin-guide/syscall-user-dispatch.rst b/Documentation/admin-guide/syscall-user-dispatch.rst
new file mode 100644
index 000000000000..96616660fded
--- /dev/null
+++ b/Documentation/admin-guide/syscall-user-dispatch.rst
@@ -0,0 +1,87 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Syscall User Dispatch
+=====================
+
+Background
+----------
+
+Compatibility layers like Wine need a way to efficiently emulate system
+calls of only a part of their process - the part that has the
+incompatible code - while being able to execute native syscalls without
+a high performance penalty on the native part of the process.  Seccomp
+falls short on this task, since it has limited support to efficiently
+filter syscalls based on memory regions, and it doesn't support removing
+filters.  Therefore a new mechanism is necessary.
+
+Syscall User Dispatch brings the filtering of the syscall dispatcher
+address back to userspace.  The application is in control of a flip
+switch, indicating the current personality of the process.  A
+multiple-personality application can then flip the switch without
+invoking the kernel, when crossing the compatibility layer API
+boundaries, to enable/disable the syscall redirection and execute
+syscalls directly (disabled) or send them to be emulated in userspace
+through a SIGSYS.
+
+The goal of this design is to provide very quick compatibility layer
+boundary crosses, which is achieved by not executing a syscall to change
+personality every time the compatibility layer executes.  Instead, a
+userspace memory region exposed to the kernel indicates the current
+personality, and the application simply modifies that variable to
+configure the mechanism.
+
+There is a relatively high cost associated with handling signals on most
+architectures, like x86, but at least for Wine, syscalls issued by
+native Windows code are currently not known to be a performance problem,
+since they are quite rare, at least for modern gaming applications.
+
+Since this mechanism is designed to capture syscalls issued by
+non-native applications, it must function on syscalls whose invocation
+ABI is completely unexpected to Linux.  Syscall User Dispatch therefore
+doesn't rely on any part of the syscall ABI to do the filtering.  It uses
+only the syscall dispatcher address and the userspace key.
+
+Interface
+---------
+
+A process can set up this mechanism on supported kernels
+(CONFIG_SYSCALL_USER_DISPATCH) by executing the following prctl:
+
+   prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <start_addr>, <end_addr>, [selector])
+
+<op> is either PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF, to enable and
+disable the mechanism globally for that thread.  When
+PR_SYS_DISPATCH_OFF is used, the other fields must be zero.
+
+<start_addr> and <end_addr> delimit a closed memory region interval from
+which syscalls are always executed directly, regardless of the userspace
+selector.  This provides a fast path for the C library, which includes
+the most common syscall dispatchers in the native code applications, and
+also provides a way for the signal handler to return without triggering
+a nested SIGSYS on (rt_)sigreturn.  Users of this interface should make
+sure that at least the signal trampoline code is included in this
+region.  In addition, on architectures that implement the trampoline
+code in the vDSO, that trampoline is never intercepted.
+
+[selector] is a pointer to a char-sized region in the process memory
+that provides a quick way to enable or disable syscall redirection
+thread-wide, without the need to invoke the kernel directly.  selector
+can be set to PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF.  Any other
+value should terminate the program with a SIGSYS.
+
+Security Notes
+--------------
+
+Syscall User Dispatch provides functionality for compatibility layers to
+quickly capture system calls issued by a non-native part of the
+application, while not impacting the Linux native regions of the
+process.  It is not a mechanism for sandboxing system calls, and it
+should not be seen as a security mechanism, since it is trivial for a
+malicious application to subvert the mechanism by jumping to an allowed
+dispatcher region prior to executing the syscall, or to discover the
+address and modify the selector value.  If the use case requires any
+kind of security sandboxing, Seccomp should be used instead.
+
+Any fork or exec of the existing process resets the mechanism to
+PR_SYS_DISPATCH_OFF.
-- 
2.28.0



* Re: [PATCH v6 5/9] kernel: Implement selective syscall userspace redirection
  2020-09-04 20:31 ` [PATCH v6 5/9] kernel: Implement selective syscall userspace redirection Gabriel Krisman Bertazi
@ 2020-09-05 11:24   ` Matthew Wilcox
  2020-09-11  9:44   ` peterz
  1 sibling, 0 replies; 40+ messages in thread
From: Matthew Wilcox @ 2020-09-05 11:24 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: luto, tglx, keescook, x86, linux-kernel, linux-api,
	linux-kselftest, shuah, kernel, Paul Gofman

On Fri, Sep 04, 2020 at 04:31:43PM -0400, Gabriel Krisman Bertazi wrote:
> +int set_syscall_user_dispatch(int mode, unsigned long dispatcher_start,
> +			      unsigned long dispatcher_end, char __user *selector)
> +{
> +	switch (mode) {
> +	case PR_SYS_DISPATCH_OFF:
...
> +	case PR_SYS_DISPATCH_ON:
...
> +	default:
> +		return -EINVAL;
...
> +	case PR_SET_SYSCALL_USER_DISPATCH:
> +		error = set_syscall_user_dispatch((int) arg2, arg3, arg4,
> +						  (char __user *) arg5);

This makes aliases of DISPATCH_OFF and DISPATCH_ON every 4GB throughout
the 64-bit space of arg2.  I don't think that was intentional (nor
desirable).  I'd suggest just making 'mode' a long and dropping the cast.
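
The aliasing described above can be reproduced outside the kernel.  This is
a small sketch that assumes an LP64 target (64-bit unsigned long, 32-bit
int) and a compiler such as GCC or Clang that truncates on the cast;
mode_from_arg_truncating() and mode_is_exact() are hypothetical helpers and
are not part of the patch.

```c
#ifndef PR_SYS_DISPATCH_OFF
# define PR_SYS_DISPATCH_OFF	0
#endif
#ifndef PR_SYS_DISPATCH_ON
# define PR_SYS_DISPATCH_ON	1
#endif

/* Mirrors the "(int) arg2" cast in the posted prctl() hunk. */
int mode_from_arg_truncating(unsigned long arg2)
{
	return (int)arg2;	/* the upper 32 bits are discarded */
}

/* Hypothetical strict check: compare the full-width value, no cast. */
int mode_is_exact(unsigned long arg2)
{
	return arg2 == PR_SYS_DISPATCH_OFF || arg2 == PR_SYS_DISPATCH_ON;
}
```

Under those assumptions 0x100000001 is truncated to PR_SYS_DISPATCH_ON by
the cast, whereas the full-width comparison rejects it.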


* Re: [PATCH v6 6/9] kernel: entry: Support Syscall User Dispatch for common syscall entry
  2020-09-04 20:31 ` [PATCH v6 6/9] kernel: entry: Support Syscall User Dispatch for common syscall entry Gabriel Krisman Bertazi
@ 2020-09-07 10:15   ` Christian Brauner
  2020-09-07 14:15     ` Andy Lutomirski
  2020-09-11  9:46   ` peterz
  1 sibling, 1 reply; 40+ messages in thread
From: Christian Brauner @ 2020-09-07 10:15 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: luto, tglx, keescook, x86, linux-kernel, linux-api, willy,
	linux-kselftest, shuah, kernel

On Fri, Sep 04, 2020 at 04:31:44PM -0400, Gabriel Krisman Bertazi wrote:
> Syscall User Dispatch (SUD) must take precedence over seccomp, since the
> use case is emulation (it can be invoked with a different ABI) such that
> seccomp filtering by syscall number doesn't make sense in the first
> place.  In addition, either the syscall is dispatched back to userspace,
> in which case there is no resource for seccomp to protect, or the

Tbh, I'm torn here. I'm not a super clever attacker but it feels to me
that this is still at least a clever way to circumvent a seccomp
sandbox.
If I'd be confined by a seccomp profile that would cause me to be
SIGKILLed when I try to do open() I could prctl() myself to do user
dispatch to prevent that from happening, no?

> syscall will be executed, and seccomp will execute next.
> 
> Regarding ptrace, I experimented with before and after, and while the
> same ABI argument applies, I felt it was easier to debug if I let ptrace
> happen for syscalls that are dispatched back to userspace.  In addition,
> doing it after ptrace makes the code in syscall_exit_work slightly
> simpler, since it doesn't require special handling for this feature.
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  kernel/entry/common.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index 44fd089d59da..fdb0c543539d 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -6,6 +6,8 @@
>  #include <linux/audit.h>
>  #include <linux/syscall_intercept.h>
>  
> +#include "common.h"
> +
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/syscalls.h>
>  
> @@ -47,6 +49,12 @@ static inline long do_syscall_intercept(struct pt_regs *regs)
>  	int sysint_work = READ_ONCE(current->syscall_intercept);
>  	int ret;
>  
> +	if (sysint_work & SYSINT_USER_DISPATCH) {
> +		ret = do_syscall_user_dispatch(regs);
> +		if (ret == -1L)
> +			return ret;
> +	}
> +
>  	if (sysint_work & SYSINT_SECCOMP) {
>  		ret = __secure_computing(NULL);
>  		if (ret == -1L)
> -- 
> 2.28.0


* Re: [PATCH v6 4/9] signal: Expose SYS_USER_DISPATCH si_code type
  2020-09-04 20:31 ` [PATCH v6 4/9] signal: Expose SYS_USER_DISPATCH si_code type Gabriel Krisman Bertazi
@ 2020-09-07 10:15   ` Christian Brauner
  2020-09-22 19:39   ` Kees Cook
  1 sibling, 0 replies; 40+ messages in thread
From: Christian Brauner @ 2020-09-07 10:15 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: luto, tglx, keescook, x86, linux-kernel, linux-api, willy,
	linux-kselftest, shuah, kernel

On Fri, Sep 04, 2020 at 04:31:42PM -0400, Gabriel Krisman Bertazi wrote:
> SYS_USER_DISPATCH will be triggered when a syscall is sent to userspace
> by the Syscall User Dispatch mechanism.  This adjusts eventual
> BUILD_BUG_ON around the tree.
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---

Thanks!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag
  2020-09-04 20:31 ` [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag Gabriel Krisman Bertazi
@ 2020-09-07 10:16   ` Christian Brauner
  2020-09-08  4:59     ` Gabriel Krisman Bertazi
  2020-09-11  9:32   ` peterz
  2020-09-22 19:44   ` Kees Cook
  2 siblings, 1 reply; 40+ messages in thread
From: Christian Brauner @ 2020-09-07 10:16 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: luto, tglx, keescook, x86, linux-kernel, linux-api, willy,
	linux-kselftest, shuah, kernel

On Fri, Sep 04, 2020 at 04:31:39PM -0400, Gabriel Krisman Bertazi wrote:
> Convert TIF_SECCOMP into a generic TI flag for any syscall interception
> work being done by the kernel.  The actual type of work is exposed by a
> new flag field outside of thread_info.  This ensures that the
> syscall_intercept field is only accessed if struct seccomp has to be
> accessed already, such that it doesn't incur a much higher cost to
> the seccomp path.
> 
> In order to avoid modifying every architecture at once, this patch has a
> transition mechanism, such that architectures that define TIF_SECCOMP
> continue to work by ignoring the syscall_intercept flag, as long as they
> don't support other syscall interception mechanisms like the future
> syscall user dispatch.  When migrating TIF_SECCOMP to
> TIF_SYSCALL_INTERCEPT, they should adopt the semantics of checking the
> syscall_intercept flag, like it is done in the common entry syscall
> code, or even better, migrate to the common syscall entry code.
> 
> This was tested by running the selftests for seccomp.  No regressions
> were observed, and all tests passed (with and without this patch).
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  include/linux/sched.h             |  6 ++-
>  include/linux/seccomp.h           | 20 ++++++++-
>  include/linux/syscall_intercept.h | 70 +++++++++++++++++++++++++++++++
>  kernel/fork.c                     | 10 ++++-
>  kernel/seccomp.c                  |  7 ++--
>  5 files changed, 106 insertions(+), 7 deletions(-)
>  create mode 100644 include/linux/syscall_intercept.h
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index afe01e232935..3511c98a7849 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -959,7 +959,11 @@ struct task_struct {
>  	kuid_t				loginuid;
>  	unsigned int			sessionid;
>  #endif
> -	struct seccomp			seccomp;
> +
> +	struct {
> +		unsigned int			syscall_intercept;
> +		struct seccomp			seccomp;
> +	};

If there's no specific reason to do this I'd not wrap this in an
anonymous struct. It doesn't really buy anything and there doesn't seem
to be precedent in struct task_struct right now. Also, if this somehow
adds padding it seems you might end up increasing the size of struct
task_struct more than necessary by accident? (I might be wrong though.)

>  
>  	/* Thread group tracking: */
>  	u64				parent_exec_id;
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index 02aef2844c38..027dc462cea9 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -20,6 +20,24 @@
>  #include <linux/atomic.h>
>  #include <asm/seccomp.h>
>  
> +/*
> + * Some transitional defines to avoid migrating every architecture code
> + * at once.
> + */
> +
> +#if defined(TIF_SECCOMP) && defined(TIF_SYSCALL_INTERCEPT)
> +# error "TIF_SYSCALL_INTERCEPT and TIF_SECCOMP can't be defined at the same time"
> +#endif
> +
> +/*
> + * If the arch has not transitioned to TIF_SYSCALL_INTERCEPT, this let
> + * seccomp work with these architectures, as long as no other syscall
> + * intercept features are meant to be supported.
> + */
> +#ifdef TIF_SECCOMP
> +# define TIF_SYSCALL_INTERCEPT TIF_SECCOMP
> +#endif
> +
>  struct seccomp_filter;
>  /**
>   * struct seccomp - the state of a seccomp'ed process
> @@ -42,7 +60,7 @@ struct seccomp {
>  extern int __secure_computing(const struct seccomp_data *sd);
>  static inline int secure_computing(void)
>  {
> -	if (unlikely(test_thread_flag(TIF_SECCOMP)))
> +	if (unlikely(test_thread_flag(TIF_SYSCALL_INTERCEPT)))
>  		return  __secure_computing(NULL);
>  	return 0;
>  }
> diff --git a/include/linux/syscall_intercept.h b/include/linux/syscall_intercept.h
> new file mode 100644
> index 000000000000..725d157699da
> --- /dev/null
> +++ b/include/linux/syscall_intercept.h
> @@ -0,0 +1,70 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2020 Collabora Ltd.
> + */
> +#ifndef _SYSCALL_INTERCEPT_H
> +#define _SYSCALL_INTERCEPT_H
> +
> +#include <linux/sched.h>
> +#include <linux/sched/signal.h>
> +#include <linux/thread_info.h>
> +
> +#define SYSINT_SECCOMP		0x1

<bikeshed>

Can we maybe use a better name for this? If no one minds the extra
characters I'd suggest:
SYSCALL_INTERCEPT_SECCOMP
or
SYS_INTERCEPT_SECCOMP

</bikeshed>

> +
> +#ifdef TIF_SYSCALL_INTERCEPT
> +
> +/* seccomp (at least) can modify TIF_SYSCALL_INTERCEPT from a different
> + * thread, which means it can race with itself or with
> + * syscall_user_dispatch. Therefore, TIF_SYSCALL_INTERCEPT and
> + * syscall_intercept are synchronized by tsk->sighand->siglock.
> + */
> +
> +static inline void __set_tsk_syscall_intercept(struct task_struct *tsk,
> +					   unsigned int type)
> +{
> +	tsk->syscall_intercept |= type;
> +
> +	if (tsk->syscall_intercept)
> +		set_tsk_thread_flag(tsk, TIF_SYSCALL_INTERCEPT);
> +}
> +
> +static inline void __clear_tsk_syscall_intercept(struct task_struct *tsk,
> +					     unsigned int type)
> +{
> +	tsk->syscall_intercept &= ~type;
> +
> +	if (tsk->syscall_intercept == 0)
> +		clear_tsk_thread_flag(tsk, TIF_SYSCALL_INTERCEPT);
> +}
> +
> +static inline void set_tsk_syscall_intercept(struct task_struct *tsk, unsigned int type)
> +{
> +	spin_lock_irq(&tsk->sighand->siglock);
> +	__set_tsk_syscall_intercept(tsk, type);
> +	spin_unlock_irq(&tsk->sighand->siglock);
> +}
> +
> +static inline void clear_tsk_syscall_intercept(struct task_struct *tsk, unsigned int type)
> +{
> +	spin_lock_irq(&tsk->sighand->siglock);
> +	__clear_tsk_syscall_intercept(tsk, type);
> +	spin_unlock_irq(&tsk->sighand->siglock);
> +}
> +
> +#else
> +static inline void __set_tsk_syscall_intercept(struct task_struct *tsk, unsigned int type)
> +{
> +}
> +static inline void set_tsk_syscall_intercept(struct task_struct *tsk, unsigned int type)
> +{
> +}
> +static inline void __clear_tsk_syscall_intercept(struct task_struct *tsk, unsigned int type)
> +{
> +}
> +static inline void clear_tsk_syscall_intercept(struct task_struct *tsk, unsigned int type)
> +{
> +}
> +#endif
> +
> +#endif
> +
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 4d32190861bd..a39177bed8ea 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -49,7 +49,7 @@
>  #include <linux/cgroup.h>
>  #include <linux/security.h>
>  #include <linux/hugetlb.h>
> -#include <linux/seccomp.h>
> +#include <linux/syscall_intercept.h>
>  #include <linux/swap.h>
>  #include <linux/syscalls.h>
>  #include <linux/jiffies.h>
> @@ -898,6 +898,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>  	 * the usage counts on the error path calling free_task.
>  	 */
>  	tsk->seccomp.filter = NULL;
> +	tsk->syscall_intercept = 0;
>  #endif
>  
>  	setup_thread_stack(tsk, orig);
> @@ -1620,9 +1621,14 @@ static void copy_seccomp(struct task_struct *p)
>  	 * If the parent gained a seccomp mode after copying thread
>  	 * flags and between before we held the sighand lock, we have
>  	 * to manually enable the seccomp thread flag here.
> +	 *
> +	 * In addition current sighand lock is asserted, so it is safe
> +	 * to use the unlocked version of set_tsk_syscall_intercept.
>  	 */
>  	if (p->seccomp.mode != SECCOMP_MODE_DISABLED)
> -		set_tsk_thread_flag(p, TIF_SECCOMP);
> +		__set_tsk_syscall_intercept(p, SYSINT_SECCOMP);
> +	else
> +		__clear_tsk_syscall_intercept(p, SYSINT_SECCOMP);
>  #endif
>  }
>  
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 3ee59ce0a323..d0643b500f2e 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -28,6 +28,7 @@
>  #include <linux/slab.h>
>  #include <linux/syscalls.h>
>  #include <linux/sysctl.h>
> +#include <linux/syscall_intercept.h>
>  
>  #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
>  #include <asm/syscall.h>
> @@ -352,14 +353,14 @@ static inline void seccomp_assign_mode(struct task_struct *task,
>  
>  	task->seccomp.mode = seccomp_mode;
>  	/*
> -	 * Make sure TIF_SECCOMP cannot be set before the mode (and
> +	 * Make sure SYSINT_SECCOMP cannot be set before the mode (and
>  	 * filter) is set.
>  	 */
>  	smp_mb__before_atomic();
>  	/* Assume default seccomp processes want spec flaw mitigation. */
>  	if ((flags & SECCOMP_FILTER_FLAG_SPEC_ALLOW) == 0)
>  		arch_seccomp_spec_mitigate(task);
> -	set_tsk_thread_flag(task, TIF_SECCOMP);
> +	__set_tsk_syscall_intercept(task, SYSINT_SECCOMP);
>  }
>  
>  #ifdef CONFIG_SECCOMP_FILTER
> @@ -925,7 +926,7 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
>  
>  	/*
>  	 * Make sure that any changes to mode from another thread have
> -	 * been seen after TIF_SECCOMP was seen.
> +	 * been seen after SYSINT_SECCOMP was seen.
>  	 */
>  	rmb();
>  
> -- 
> 2.28.0

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 2/9] kernel: entry: Support TIF_SYSCAL_INTERCEPT on common entry code
  2020-09-04 20:31 ` [PATCH v6 2/9] kernel: entry: Support TIF_SYSCAL_INTERCEPT on common entry code Gabriel Krisman Bertazi
@ 2020-09-07 10:16   ` Christian Brauner
  2020-09-11  9:35   ` peterz
  1 sibling, 0 replies; 40+ messages in thread
From: Christian Brauner @ 2020-09-07 10:16 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: luto, tglx, keescook, x86, linux-kernel, linux-api, willy,
	linux-kselftest, shuah, kernel

On Fri, Sep 04, 2020 at 04:31:40PM -0400, Gabriel Krisman Bertazi wrote:
> Syscalls that use common entry code (x86 at the moment of this writing)
> need to have their defines updated inside this commit.  This added a
> measurable overhead of 1ns to seccomp_benchmark selftests on a
> bare-metal AMD system.
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  arch/x86/include/asm/thread_info.h |  4 ++--
>  include/linux/entry-common.h       |  6 +-----
>  kernel/entry/common.c              | 24 +++++++++++++++++++++---
>  3 files changed, 24 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> index 267701ae3d86..cf723181e1f2 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -82,7 +82,7 @@ struct thread_info {
>  #define TIF_SSBD		5	/* Speculative store bypass disable */
>  #define TIF_SYSCALL_EMU		6	/* syscall emulation active */
>  #define TIF_SYSCALL_AUDIT	7	/* syscall auditing active */
> -#define TIF_SECCOMP		8	/* secure computing */
> +#define TIF_SYSCALL_INTERCEPT	8	/* Intercept system call */
>  #define TIF_SPEC_IB		9	/* Indirect branch speculation mitigation */
>  #define TIF_SPEC_FORCE_UPDATE	10	/* Force speculation MSR update in context switch */
>  #define TIF_USER_RETURN_NOTIFY	11	/* notify kernel of userspace return */
> @@ -112,7 +112,7 @@ struct thread_info {
>  #define _TIF_SSBD		(1 << TIF_SSBD)
>  #define _TIF_SYSCALL_EMU	(1 << TIF_SYSCALL_EMU)
>  #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
> -#define _TIF_SECCOMP		(1 << TIF_SECCOMP)
> +#define _TIF_SYSCALL_INTERCEPT	(1 << TIF_SYSCALL_INTERCEPT)
>  #define _TIF_SPEC_IB		(1 << TIF_SPEC_IB)
>  #define _TIF_SPEC_FORCE_UPDATE	(1 << TIF_SPEC_FORCE_UPDATE)
>  #define _TIF_USER_RETURN_NOTIFY	(1 << TIF_USER_RETURN_NOTIFY)
> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
> index efebbffcd5cc..72ce9ca860c6 100644
> --- a/include/linux/entry-common.h
> +++ b/include/linux/entry-common.h
> @@ -21,10 +21,6 @@
>  # define _TIF_SYSCALL_TRACEPOINT	(0)
>  #endif
>  
> -#ifndef _TIF_SECCOMP
> -# define _TIF_SECCOMP			(0)
> -#endif
> -
>  #ifndef _TIF_SYSCALL_AUDIT
>  # define _TIF_SYSCALL_AUDIT		(0)
>  #endif
> @@ -45,7 +41,7 @@
>  #endif
>  
>  #define SYSCALL_ENTER_WORK						\
> -	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SECCOMP |	\
> +	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SYSCALL_INTERCEPT | \
>  	 _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_EMU |			\
>  	 ARCH_SYSCALL_ENTER_WORK)
>  
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index fcae019158ca..44fd089d59da 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -4,6 +4,7 @@
>  #include <linux/entry-common.h>
>  #include <linux/livepatch.h>
>  #include <linux/audit.h>
> +#include <linux/syscall_intercept.h>
>  
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/syscalls.h>
> @@ -41,6 +42,20 @@ static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
>  	}
>  }
>  
> +static inline long do_syscall_intercept(struct pt_regs *regs)

Hey Gabriel,

I think you can drop the pt_regs argument and just have this be

static inline long do_syscall_intercept(void)

otherwise

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

> +{
> +	int sysint_work = READ_ONCE(current->syscall_intercept);
> +	int ret;
> +
> +	if (sysint_work & SYSINT_SECCOMP) {
> +		ret = __secure_computing(NULL);
> +		if (ret == -1L)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
>  static long syscall_trace_enter(struct pt_regs *regs, long syscall,
>  				unsigned long ti_work)
>  {
> @@ -53,9 +68,12 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
>  			return -1L;
>  	}
>  
> -	/* Do seccomp after ptrace, to catch any tracer changes. */
> -	if (ti_work & _TIF_SECCOMP) {
> -		ret = __secure_computing(NULL);
> +	/*
> +	 * Do syscall interception like seccomp after ptrace, to catch
> +	 * any tracer changes.
> +	 */
> +	if (ti_work & _TIF_SYSCALL_INTERCEPT) {
> +		ret = do_syscall_intercept(regs);
>  		if (ret == -1L)
>  			return ret;
>  	}
> -- 
> 2.28.0

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 6/9] kernel: entry: Support Syscall User Dispatch for common syscall entry
  2020-09-07 10:15   ` Christian Brauner
@ 2020-09-07 14:15     ` Andy Lutomirski
  2020-09-07 14:25       ` Christian Brauner
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2020-09-07 14:15 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Gabriel Krisman Bertazi, luto, tglx, keescook, x86, linux-kernel,
	linux-api, willy, linux-kselftest, shuah, kernel



> On Sep 7, 2020, at 3:15 AM, Christian Brauner <christian.brauner@ubuntu.com> wrote:
> 
> On Fri, Sep 04, 2020 at 04:31:44PM -0400, Gabriel Krisman Bertazi wrote:
>> Syscall User Dispatch (SUD) must take precedence over seccomp, since the
>> use case is emulation (it can be invoked with a different ABI) such that
>> seccomp filtering by syscall number doesn't make sense in the first
>> place.  In addition, either the syscall is dispatched back to userspace,
>> in which case there is no resource for seccomp to protect, or the
> 
> Tbh, I'm torn here. I'm not a super clever attacker but it feels to me
> that this is still at least a clever way to circumvent a seccomp
> sandbox.
> If I'd be confined by a seccomp profile that would cause me to be
> SIGKILLed when I try to do open() I could prctl() myself to do user
> dispatch to prevent that from happening, no?
> 

Not really, I think. The idea is that you didn’t actually do open(). You did a SYSCALL instruction which meant something else, and the syscall dispatch correctly prevented the kernel from misinterpreting it as open().


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 6/9] kernel: entry: Support Syscall User Dispatch for common syscall entry
  2020-09-07 14:15     ` Andy Lutomirski
@ 2020-09-07 14:25       ` Christian Brauner
  2020-09-07 20:20         ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Christian Brauner @ 2020-09-07 14:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Gabriel Krisman Bertazi, luto, tglx, keescook, x86, linux-kernel,
	linux-api, willy, linux-kselftest, shuah, kernel

On Mon, Sep 07, 2020 at 07:15:52AM -0700, Andy Lutomirski wrote:
> 
> 
> > On Sep 7, 2020, at 3:15 AM, Christian Brauner <christian.brauner@ubuntu.com> wrote:
> > 
> > On Fri, Sep 04, 2020 at 04:31:44PM -0400, Gabriel Krisman Bertazi wrote:
> >> Syscall User Dispatch (SUD) must take precedence over seccomp, since the
> >> use case is emulation (it can be invoked with a different ABI) such that
> >> seccomp filtering by syscall number doesn't make sense in the first
> >> place.  In addition, either the syscall is dispatched back to userspace,
> >> in which case there is no resource for seccomp to protect, or the
> > 
> > Tbh, I'm torn here. I'm not a super clever attacker but it feels to me
> > that this is still at least a clever way to circumvent a seccomp
> > sandbox.
> > If I'd be confined by a seccomp profile that would cause me to be
> > SIGKILLed when I try to do open() I could prctl() myself to do user
> > dispatch to prevent that from happening, no?
> > 
> 
> Not really, I think. The idea is that you didn’t actually do open().
> You did a SYSCALL instruction which meant something else, and the
> syscall dispatch correctly prevented the kernel from misinterpreting
> it as open().

Right, for the case where you're e.g. emulating Windows syscalls that's
true. I was thinking of when you're running natively on Linux: couldn't
I first load a seccomp profile "kill me if someone does an open()", then
exec() the target binary, and have that binary set up to do
prctl(USER_DISPATCH) first thing? I guess it's ok because, as far as I
had time to read it, this is an all-or-nothing mechanism, i.e. _all_
system calls are re-routed, in contrast to e.g. seccomp where I could do
this per-syscall. So for user-dispatch it wouldn't make sense to use it
on Linux per se. Still makes me a little uneasy. :)

Christian

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 6/9] kernel: entry: Support Syscall User Dispatch for common syscall entry
  2020-09-07 14:25       ` Christian Brauner
@ 2020-09-07 20:20         ` Andy Lutomirski
  0 siblings, 0 replies; 40+ messages in thread
From: Andy Lutomirski @ 2020-09-07 20:20 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Gabriel Krisman Bertazi, Andrew Lutomirski, Thomas Gleixner,
	Kees Cook, X86 ML, LKML, Linux API, Matthew Wilcox,
	open list:KERNEL SELFTEST FRAMEWORK, Shuah Khan, kernel

On Mon, Sep 7, 2020 at 7:25 AM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
>
> On Mon, Sep 07, 2020 at 07:15:52AM -0700, Andy Lutomirski wrote:
> >
> >
> > > On Sep 7, 2020, at 3:15 AM, Christian Brauner <christian.brauner@ubuntu.com> wrote:
> > >
> > > On Fri, Sep 04, 2020 at 04:31:44PM -0400, Gabriel Krisman Bertazi wrote:
> > >> Syscall User Dispatch (SUD) must take precedence over seccomp, since the
> > >> use case is emulation (it can be invoked with a different ABI) such that
> > >> seccomp filtering by syscall number doesn't make sense in the first
> > >> place.  In addition, either the syscall is dispatched back to userspace,
> > >> in which case there is no resource for seccomp to protect, or the
> > >
> > > Tbh, I'm torn here. I'm not a super clever attacker but it feels to me
> > > that this is still at least a clever way to circumvent a seccomp
> > > sandbox.
> > > If I'd be confined by a seccomp profile that would cause me to be
> > > SIGKILLed when I try to do open() I could prctl() myself to do user
> > > dispatch to prevent that from happening, no?
> > >
> >
> > Not really, I think. The idea is that you didn’t actually do open().
> > You did a SYSCALL instruction which meant something else, and the
> > syscall dispatch correctly prevented the kernel from misinterpreting
> > it as open().
>
> Right, for the case where you're e.g. emulating Windows syscalls that's
> true. I was thinking of when you're running natively on Linux: couldn't
> I first load a seccomp profile "kill me if someone does an open()", then
> exec() the target binary, and have that binary set up to do
> prctl(USER_DISPATCH) first thing? I guess it's ok because, as far as I
> had time to read it, this is an all-or-nothing mechanism, i.e. _all_
> system calls are re-routed, in contrast to e.g. seccomp where I could do
> this per-syscall. So for user-dispatch it wouldn't make sense to use it
> on Linux per se. Still makes me a little uneasy. :)

There's an escape hatch, so processes using this can still make syscalls.

Maybe think about it another way: a process using user dispatch should
definitely *not* trigger seccomp user notifiers, errno returns, or
ptrace events, since they'll all do the wrong thing.  IMO RET_KILL is
the same.

Barring some very severe defect, there's no way a program can use user
dispatch to escape seccomp -- a program could use user dispatch to
allow them to do:

mov $__NR_open, %rax
syscall

without dying despite the presence of a filter that would kill the
process if it tried to do open(), but this doesn't bypass the filter
at all.  The process could just as easily have done:

mov $__NR_open, %rax
jmp magic_stub(%rip)

without tripping the filter, since no system call actually happens here.

--Andy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag
  2020-09-07 10:16   ` Christian Brauner
@ 2020-09-08  4:59     ` Gabriel Krisman Bertazi
  2020-09-22 19:42       ` Kees Cook
  0 siblings, 1 reply; 40+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-09-08  4:59 UTC (permalink / raw)
  To: Christian Brauner
  Cc: luto, tglx, keescook, x86, linux-kernel, linux-api, willy,
	linux-kselftest, shuah, kernel

Christian Brauner <christian.brauner@ubuntu.com> writes:

> On Fri, Sep 04, 2020 at 04:31:39PM -0400, Gabriel Krisman Bertazi wrote:
>> index afe01e232935..3511c98a7849 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -959,7 +959,11 @@ struct task_struct {
>>  	kuid_t				loginuid;
>>  	unsigned int			sessionid;
>>  #endif
>> -	struct seccomp			seccomp;
>> +
>> +	struct {
>> +		unsigned int			syscall_intercept;
>> +		struct seccomp			seccomp;
>> +	};
>
> If there's no specific reason to do this I'd not wrap this in an
> anonymous struct. It doesn't really buy anything and there doesn't seem
> to be precedent in struct task_struct right now. Also, if this somehow
> adds padding it seems you might end up increasing the size of struct
> task_struct more than necessary by accident? (I might be wrong
> though.)

Hi Christian,

Thanks for your review on this and on the other patches of this series.

I wrapped these to prevent struct layout randomization from separating
the flags field from seccomp, as they are going to be used together and
I was trying to reduce the overhead on seccomp entry from two cache
misses when reading this structure.  Measuring it with seccomp_benchmark
didn't show any difference with the unwrapped version, so perhaps it was
a bit of premature optimization?
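
To make the padding question concrete, here is a minimal userspace
sketch, not the kernel's task_struct.  The toy_* types, their fields and
the two helpers are assumptions made up for illustration.  It only
compares the plain C layout with and without the anonymous wrapper; it
does not model struct layout randomization itself.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct seccomp; the field names are assumptions. */
struct toy_seccomp {
	int mode;
	void *filter;
};

/* Layout from the patch: both fields grouped in an anonymous struct. */
struct toy_task_wrapped {
	int before;
	struct {
		unsigned int syscall_intercept;
		struct toy_seccomp seccomp;
	};
	int after;
};

/* Layout without the wrapper. */
struct toy_task_flat {
	int before;
	unsigned int syscall_intercept;
	struct toy_seccomp seccomp;
	int after;
};

/* Bytes between the start of the flag word and the start of seccomp. */
static size_t wrapped_gap(void)
{
	return offsetof(struct toy_task_wrapped, seccomp) -
	       offsetof(struct toy_task_wrapped, syscall_intercept);
}

/* Extra bytes the wrapper costs on this ABI, possibly zero. */
static size_t wrapper_cost(void)
{
	assert(sizeof(struct toy_task_wrapped) >= sizeof(struct toy_task_flat));
	return sizeof(struct toy_task_wrapped) - sizeof(struct toy_task_flat);
}
```

The anonymous struct is aligned as a unit, so when the preceding member
is narrower than that alignment the wrapper can add padding that the
flat layout avoids.  Whether wrapper_cost() is zero depends on the
neighbouring fields and the ABI.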

>> diff --git a/include/linux/syscall_intercept.h b/include/linux/syscall_intercept.h
>> new file mode 100644
>> index 000000000000..725d157699da
>> --- /dev/null
>> +++ b/include/linux/syscall_intercept.h
>> @@ -0,0 +1,70 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * Copyright (C) 2020 Collabora Ltd.
>> + */
>> +#ifndef _SYSCALL_INTERCEPT_H
>> +#define _SYSCALL_INTERCEPT_H
>> +
>> +#include <linux/sched.h>
>> +#include <linux/sched/signal.h>
>> +#include <linux/thread_info.h>
>> +
>> +#define SYSINT_SECCOMP		0x1
>
> <bikeshed>
>
> Can we maybe use a better name for this? I noone minds the extra
> characters I'd suggest:
> SYSCALL_INTERCEPT_SECCOMP
> or
> SYS_INTERCEPT_SECCOMP
>
> </bikeshed>
>

Will do.

Thanks,

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag
  2020-09-04 20:31 ` [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag Gabriel Krisman Bertazi
  2020-09-07 10:16   ` Christian Brauner
@ 2020-09-11  9:32   ` peterz
  2020-09-11 20:08     ` Gabriel Krisman Bertazi
  2020-09-22 19:44   ` Kees Cook
  2 siblings, 1 reply; 40+ messages in thread
From: peterz @ 2020-09-11  9:32 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: luto, tglx, keescook, x86, linux-kernel, linux-api, willy,
	linux-kselftest, shuah, kernel

On Fri, Sep 04, 2020 at 04:31:39PM -0400, Gabriel Krisman Bertazi wrote:
> +static inline void __set_tsk_syscall_intercept(struct task_struct *tsk,
> +					   unsigned int type)
> +{
> +	tsk->syscall_intercept |= type;
> +
> +	if (tsk->syscall_intercept)
> +		set_tsk_thread_flag(tsk, TIF_SYSCALL_INTERCEPT);
> +}

Did the above want to be:

	unsigned int old = tsk->syscall_intercept;
	tsk->syscall_intercept |= type;
	if (!old)
		set_tsk_thread_flag(tsk, TIF_SYSCALL_INTERCEPT);

?

> +static inline void __clear_tsk_syscall_intercept(struct task_struct *tsk,
> +					     unsigned int type)
> +{
> +	tsk->syscall_intercept &= ~type;
> +
> +	if (tsk->syscall_intercept == 0)
> +		clear_tsk_thread_flag(tsk, TIF_SYSCALL_INTERCEPT);
> +}

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 2/9] kernel: entry: Support TIF_SYSCAL_INTERCEPT on common entry code
  2020-09-04 20:31 ` [PATCH v6 2/9] kernel: entry: Support TIF_SYSCAL_INTERCEPT on common entry code Gabriel Krisman Bertazi
  2020-09-07 10:16   ` Christian Brauner
@ 2020-09-11  9:35   ` peterz
  2020-09-11 20:11     ` Gabriel Krisman Bertazi
  1 sibling, 1 reply; 40+ messages in thread
From: peterz @ 2020-09-11  9:35 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: luto, tglx, keescook, x86, linux-kernel, linux-api, willy,
	linux-kselftest, shuah, kernel

On Fri, Sep 04, 2020 at 04:31:40PM -0400, Gabriel Krisman Bertazi wrote:
> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
> index efebbffcd5cc..72ce9ca860c6 100644
> --- a/include/linux/entry-common.h
> +++ b/include/linux/entry-common.h
> @@ -21,10 +21,6 @@
>  # define _TIF_SYSCALL_TRACEPOINT	(0)
>  #endif
>  
> -#ifndef _TIF_SECCOMP
> -# define _TIF_SECCOMP			(0)
> -#endif
> -
>  #ifndef _TIF_SYSCALL_AUDIT
>  # define _TIF_SYSCALL_AUDIT		(0)
>  #endif

Why doesn't this add:

#ifndef _TIF_SYSCALL_INTERCEPT
#define _TIF_SYSCALL_INTERCEPT		(0)
#endif

?
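
For illustration, a small userspace sketch of the fallback pattern being
asked about.  The TOY_* names and bit positions are assumptions, not the
kernel's definitions.  When no architecture header supplies the bit, the
fallback defines it as zero, so the work mask still builds and the
intercept test is constant false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical arch-provided bit; uncomment to model an arch that has it. */
/* #define TOY_TIF_SYSCALL_INTERCEPT	(1UL << 8) */
#define TOY_TIF_SYSCALL_TRACE		(1UL << 0)

/* Fallback: architectures without the flag contribute no bit. */
#ifndef TOY_TIF_SYSCALL_INTERCEPT
# define TOY_TIF_SYSCALL_INTERCEPT	(0UL)
#endif

#define TOY_SYSCALL_ENTER_WORK \
	(TOY_TIF_SYSCALL_TRACE | TOY_TIF_SYSCALL_INTERCEPT)

/* The mask must stay usable whether or not the arch defines the bit. */
static_assert((TOY_SYSCALL_ENTER_WORK & TOY_TIF_SYSCALL_TRACE) != 0,
	      "trace bit must be part of the enter-work mask");

/* True if the intercept bit is set; always false with the fallback. */
static bool wants_intercept(unsigned long ti_work)
{
	return (ti_work & TOY_TIF_SYSCALL_INTERCEPT) != 0;
}

/* True if any syscall-entry work is pending. */
static bool has_enter_work(unsigned long ti_work)
{
	return (ti_work & TOY_SYSCALL_ENTER_WORK) != 0;
}
```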


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 5/9] kernel: Implement selective syscall userspace redirection
  2020-09-04 20:31 ` [PATCH v6 5/9] kernel: Implement selective syscall userspace redirection Gabriel Krisman Bertazi
  2020-09-05 11:24   ` Matthew Wilcox
@ 2020-09-11  9:44   ` peterz
  1 sibling, 0 replies; 40+ messages in thread
From: peterz @ 2020-09-11  9:44 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: luto, tglx, keescook, x86, linux-kernel, linux-api, willy,
	linux-kselftest, shuah, kernel, Paul Gofman

On Fri, Sep 04, 2020 at 04:31:43PM -0400, Gabriel Krisman Bertazi wrote:

> +struct syscall_user_dispatch {
> +	char __user *selector;
> +	unsigned long dispatcher_start;
> +	unsigned long dispatcher_end;
> +};

> +int do_syscall_user_dispatch(struct pt_regs *regs)
> +{
> +	struct syscall_user_dispatch *sd = &current->syscall_dispatch;
> +	unsigned long ip = instruction_pointer(regs);
> +	char state;
> +
> +	if (likely(ip >= sd->dispatcher_start && ip <= sd->dispatcher_end))
> +		return 0;

If you use {offset,size}, instead of {start,end}, you can write the
above like:

	if (ip - sd->dispatcher_offset < sd->dispatcher_size)
		return 0;

which is just a single branch.
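
A minimal userspace sketch of why the two forms agree, assuming
uintptr_t in place of the kernel's unsigned long.  The function names
are hypothetical.  Note that the quoted patch treats dispatcher_end as
inclusive, so the equivalent size is end - start + 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two-comparison form, inclusive on both ends as in the quoted patch. */
static bool in_range_start_end(uintptr_t ip, uintptr_t start, uintptr_t end)
{
	return ip >= start && ip <= end;
}

/*
 * Single-comparison form.  Unsigned subtraction wraps around, so any
 * ip below offset becomes a huge value that fails the "< size" test.
 */
static bool in_range_offset_size(uintptr_t ip, uintptr_t offset,
				 uintptr_t size)
{
	return ip - offset < size;
}

/* Hypothetical helper: do both forms agree for the region [start, end]? */
static bool forms_agree(uintptr_t ip, uintptr_t start, uintptr_t end)
{
	assert(start <= end);
	/* size = end - start + 1 keeps the inclusive upper bound */
	return in_range_start_end(ip, start, end) ==
	       in_range_offset_size(ip, start, end - start + 1);
}
```

One caveat of this sketch: a region spanning the whole address space
makes end - start + 1 wrap to zero, so the {offset,size} form cannot
express it.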


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 6/9] kernel: entry: Support Syscall User Dispatch for common syscall entry
  2020-09-04 20:31 ` [PATCH v6 6/9] kernel: entry: Support Syscall User Dispatch for common syscall entry Gabriel Krisman Bertazi
  2020-09-07 10:15   ` Christian Brauner
@ 2020-09-11  9:46   ` peterz
  1 sibling, 0 replies; 40+ messages in thread
From: peterz @ 2020-09-11  9:46 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: luto, tglx, keescook, x86, linux-kernel, linux-api, willy,
	linux-kselftest, shuah, kernel

On Fri, Sep 04, 2020 at 04:31:44PM -0400, Gabriel Krisman Bertazi wrote:
> Syscall User Dispatch (SUD) must take precedence over seccomp, since the
> use case is emulation (it can be invoked with a different ABI) such that
> seccomp filtering by syscall number doesn't make sense in the first
> place.  In addition, either the syscall is dispatched back to userspace,
> in which case there is no resource for seccomp to protect, or the
> syscall will be executed, and seccomp will execute next.
> 
> Regarding ptrace, I experimented with before and after, and while the
> same ABI argument applies, I felt it was easier to debug if I let ptrace
> happen for syscalls that are dispatched back to userspace.  In addition,
> doing it after ptrace makes the code in syscall_exit_work slightly
> simpler, since it doesn't require special handling for this feature.

I think I'm with Andy that this should be before ptrace(). ptrace()
users will attempt to interpret things like they're regular syscalls,
and that's definitely not the case.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag
  2020-09-11  9:32   ` peterz
@ 2020-09-11 20:08     ` Gabriel Krisman Bertazi
  2020-09-24 11:24       ` Peter Zijlstra
  0 siblings, 1 reply; 40+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-09-11 20:08 UTC (permalink / raw)
  To: peterz
  Cc: luto, tglx, keescook, x86, linux-kernel, linux-api, willy,
	linux-kselftest, shuah, kernel

peterz@infradead.org writes:

> On Fri, Sep 04, 2020 at 04:31:39PM -0400, Gabriel Krisman Bertazi wrote:
>> +static inline void __set_tsk_syscall_intercept(struct task_struct *tsk,
>> +					   unsigned int type)
>> +{
>> +	tsk->syscall_intercept |= type;
>> +
>> +	if (tsk->syscall_intercept)
>> +		set_tsk_thread_flag(tsk, TIF_SYSCALL_INTERCEPT);
>> +}
>
> Did the above want to be:
>
> 	unsigned int old = tsk->syscall_intercept;
> 	tsk->syscall_intercept |= type;
> 	if (!old)
> 		set_tsk_thread_flag(tsk, TIF_SYSCALL_INTERCEPT);
>

Hi Peter,

Thanks for the review!

I'm not sure this change gains us anything.  For now,
__set_tsk_syscall_intercept cannot be called with !type, so both
versions behave the same, but my version is safe in that scenario.
This won't be called frequently enough for the extra calls to
set_tsk_thread_flag to matter.  Am I missing something?
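
To illustrate the difference between the two variants, here is a toy
userspace model.  struct toy_task, its flag_writes counter and the
helper names are assumptions for illustration only; the real helpers
operate on task_struct under siglock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the task state; not the kernel's task_struct. */
struct toy_task {
	unsigned int syscall_intercept;	/* per-feature bits */
	bool tif_flag;			/* models TIF_SYSCALL_INTERCEPT */
	unsigned int flag_writes;	/* modelled set_tsk_thread_flag() calls */
};

static void toy_set_flag(struct toy_task *t)
{
	t->tif_flag = true;
	t->flag_writes++;
}

/* Invariant both variants should keep: a non-empty mask implies the flag. */
static void toy_check(const struct toy_task *t)
{
	assert(!t->syscall_intercept || t->tif_flag);
}

/* Variant from the patch: set the flag whenever the mask is non-empty. */
static void set_intercept_patch(struct toy_task *t, unsigned int type)
{
	t->syscall_intercept |= type;
	if (t->syscall_intercept)
		toy_set_flag(t);
}

/* Variant from the review: set the flag only when the old mask was empty. */
static void set_intercept_review(struct toy_task *t, unsigned int type)
{
	unsigned int old = t->syscall_intercept;

	t->syscall_intercept |= type;
	if (!old)
		toy_set_flag(t);
}
```

With a non-zero type both variants end in the same state, and the review
variant skips the redundant flag write on later calls.  With type == 0
on an empty mask, the review variant sets the flag although the mask
stays empty, while the patch variant leaves it clear.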

Thanks,

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 2/9] kernel: entry: Support TIF_SYSCAL_INTERCEPT on common entry code
  2020-09-11  9:35   ` peterz
@ 2020-09-11 20:11     ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 40+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-09-11 20:11 UTC (permalink / raw)
  To: peterz
  Cc: luto, tglx, keescook, x86, linux-kernel, linux-api, willy,
	linux-kselftest, shuah, kernel

peterz@infradead.org writes:

> On Fri, Sep 04, 2020 at 04:31:40PM -0400, Gabriel Krisman Bertazi wrote:
>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
>> index efebbffcd5cc..72ce9ca860c6 100644
>> --- a/include/linux/entry-common.h
>> +++ b/include/linux/entry-common.h
>> @@ -21,10 +21,6 @@
>>  # define _TIF_SYSCALL_TRACEPOINT	(0)
>>  #endif
>>  
>> -#ifndef _TIF_SECCOMP
>> -# define _TIF_SECCOMP			(0)
>> -#endif
>> -
>>  #ifndef _TIF_SYSCALL_AUDIT
>>  # define _TIF_SYSCALL_AUDIT		(0)
>>  #endif
>
> Why doesn't this add:
>
> #ifndef _TIF_SYSCALL_INTERCEPT
> #define _TIF_SYSCALL_INTERCEPT		(0)
> #endif
>

I will add it in the next version.  Thanks!

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 8/9] selftests: Add kselftest for syscall user dispatch
  2020-09-04 20:31 ` [PATCH v6 8/9] selftests: Add kselftest for syscall user dispatch Gabriel Krisman Bertazi
@ 2020-09-22 19:35   ` Kees Cook
  0 siblings, 0 replies; 40+ messages in thread
From: Kees Cook @ 2020-09-22 19:35 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: luto, tglx, x86, linux-kernel, linux-api, willy, linux-kselftest,
	shuah, kernel

On Fri, Sep 04, 2020 at 04:31:46PM -0400, Gabriel Krisman Bertazi wrote:
> Implement functionality tests for syscall user dispatch.  In order to
> make the test portable, refrain from open coding syscall dispatchers and
> calculating glibc memory ranges.
> 
> Changes since v4:
>   - Update bad selector test to reflect change in API
> 
> Changes since v3:
>   - Sort entry in Makefile
>   - Add SPDX header
>   - Use __NR_syscalls if available
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

This passes, looks good. Thank you again for the self tests!

Acked-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 9/9] doc: Document Syscall User Dispatch
  2020-09-04 20:31 ` [PATCH v6 9/9] doc: Document Syscall User Dispatch Gabriel Krisman Bertazi
@ 2020-09-22 19:35   ` Kees Cook
  0 siblings, 0 replies; 40+ messages in thread
From: Kees Cook @ 2020-09-22 19:35 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: luto, tglx, x86, linux-kernel, linux-api, willy, linux-kselftest,
	shuah, kernel

On Fri, Sep 04, 2020 at 04:31:47PM -0400, Gabriel Krisman Bertazi wrote:
> Explain the interface, provide some background and security notes.
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

Looks good to me!

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 7/9] x86: Enable Syscall User Dispatch
  2020-09-04 20:31 ` [PATCH v6 7/9] x86: Enable Syscall User Dispatch Gabriel Krisman Bertazi
@ 2020-09-22 19:37   ` Kees Cook
  2020-09-23 20:23     ` Gabriel Krisman Bertazi
  0 siblings, 1 reply; 40+ messages in thread
From: Kees Cook @ 2020-09-22 19:37 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: luto, tglx, x86, linux-kernel, linux-api, willy, linux-kselftest,
	shuah, kernel

On Fri, Sep 04, 2020 at 04:31:45PM -0400, Gabriel Krisman Bertazi wrote:
> Syscall User Dispatch requirements are fully supported in x86. This
> patch flips the switch, marking it as supported.  This was tested
> against Syscall User Dispatch selftest.
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  arch/x86/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 7101ac64bb20..56ac8de99021 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -150,6 +150,7 @@ config X86
>  	select HAVE_ARCH_COMPAT_MMAP_BASES	if MMU && COMPAT
>  	select HAVE_ARCH_PREL32_RELOCATIONS
>  	select HAVE_ARCH_SECCOMP_FILTER
> +	select HAVE_ARCH_SYSCALL_USER_DISPATCH

Is this needed at all? I think simply "the architecture uses the generic
entry code" is sufficient to enable it. (Especially since there's a top
level config for SYSCALL_USER_DISPATCH, it feels like overkill).

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 4/9] signal: Expose SYS_USER_DISPATCH si_code type
  2020-09-04 20:31 ` [PATCH v6 4/9] signal: Expose SYS_USER_DISPATCH si_code type Gabriel Krisman Bertazi
  2020-09-07 10:15   ` Christian Brauner
@ 2020-09-22 19:39   ` Kees Cook
  1 sibling, 0 replies; 40+ messages in thread
From: Kees Cook @ 2020-09-22 19:39 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: luto, tglx, x86, linux-kernel, linux-api, willy, linux-kselftest,
	shuah, kernel

On Fri, Sep 04, 2020 at 04:31:42PM -0400, Gabriel Krisman Bertazi wrote:
> SYS_USER_DISPATCH will be triggered when a syscall is sent to userspace
> by the Syscall User Dispatch mechanism.  This adjusts eventual
> BUILD_BUG_ON around the tree.
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

Yup, this looks good; seccomp is glad to have a new SIGSYS friend. ;)

Acked-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 3/9] x86: vdso: Expose sigreturn address on vdso to the kernel
  2020-09-04 20:31 ` [PATCH v6 3/9] x86: vdso: Expose sigreturn address on vdso to the kernel Gabriel Krisman Bertazi
@ 2020-09-22 19:40   ` Kees Cook
  0 siblings, 0 replies; 40+ messages in thread
From: Kees Cook @ 2020-09-22 19:40 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: luto, tglx, x86, linux-kernel, linux-api, willy, linux-kselftest,
	shuah, kernel

On Fri, Sep 04, 2020 at 04:31:41PM -0400, Gabriel Krisman Bertazi wrote:
> Syscall user redirection requires the signal trampoline code to not be
> captured, in order to support returning with a locked selector while
> avoiding recursion back into the signal handler.  For ia-32, which has
> the trampoline in the vDSO, expose the entry points to the kernel, such
> that it can avoid dispatching syscalls from that region to userspace.
> 
> Changes since V1
>   - Change return address to bool (Andy)
> 
> Suggested-by: Andy Lutomirski <luto@kernel.org>
> Acked-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

Looks good to me; would anything else benefit from this information?

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag
  2020-09-08  4:59     ` Gabriel Krisman Bertazi
@ 2020-09-22 19:42       ` Kees Cook
  2020-09-23 20:28         ` Gabriel Krisman Bertazi
  0 siblings, 1 reply; 40+ messages in thread
From: Kees Cook @ 2020-09-22 19:42 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Christian Brauner, luto, tglx, x86, linux-kernel, linux-api,
	willy, linux-kselftest, shuah, kernel

On Tue, Sep 08, 2020 at 12:59:49AM -0400, Gabriel Krisman Bertazi wrote:
> Christian Brauner <christian.brauner@ubuntu.com> writes:
> 
> > On Fri, Sep 04, 2020 at 04:31:39PM -0400, Gabriel Krisman Bertazi wrote:
> >> index afe01e232935..3511c98a7849 100644
> >> --- a/include/linux/sched.h
> >> +++ b/include/linux/sched.h
> >> @@ -959,7 +959,11 @@ struct task_struct {
> >>  	kuid_t				loginuid;
> >>  	unsigned int			sessionid;
> >>  #endif
> >> -	struct seccomp			seccomp;
> >> +
> >> +	struct {
> >> +		unsigned int			syscall_intercept;
> >> +		struct seccomp			seccomp;
> >> +	};
> >
> > If there's no specific reason to do this I'd not wrap this in an
> > anonymous struct. It doesn't really buy anything and there doesn't seem
> > to be precedent in struct task_struct right now. Also, if this somehow
> > adds padding it seems you might end up increasing the size of struct
> > task_struct more than necessary by accident? (I might be wrong
> > though.)
> 
> Hi Christian,
> 
> Thanks for your review on this and on the other patches of this series.
> 
> I wrapped these to prevent struct layout randomization from separating
> the flags field from seccomp, as they are going to be used together and
> I was trying to reduce overhead to seccomp entry due to two cache misses
> when reading this structure.  Measuring it with seccomp_benchmark didn't show
> any difference with the unwrapped version, so perhaps it was a bit of
> premature optimization?

That should not be a thing to think about here. Structure randomization
already has a mode to protect against cache line issues. I would leave
this as just a new member; no wrapping struct.
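
As a rough illustration of the padding concern raised earlier in this
sub-thread, here is a user-space sketch.  It assumes a simplified stand-in
for struct seccomp and a single preceding 4-byte member; the real
task_struct layout depends on configuration and is not modelled here.
The anonymous struct takes the alignment of its strictest member, so on
some ABIs the wrapped layout may come out larger than the flat one.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct seccomp (assumption). */
struct seccomp_model {
	int mode;
	void *filter;
};

/* Layout from the patch: both fields wrapped in an anonymous struct. */
struct task_wrapped {
	unsigned int sessionid;
	struct {
		unsigned int syscall_intercept;
		struct seccomp_model seccomp;
	};
};

/* Layout suggested in review: syscall_intercept as a plain member. */
struct task_flat {
	unsigned int sessionid;
	unsigned int syscall_intercept;
	struct seccomp_model seccomp;
};

/* Wrapping can keep the size equal or grow it, never shrink it. */
static_assert(sizeof(struct task_wrapped) >= sizeof(struct task_flat),
	      "wrapping cannot shrink the structure");

/* Extra bytes the anonymous struct costs on the ABI being compiled for. */
static size_t wrapper_padding_cost(void)
{
	return sizeof(struct task_wrapped) - sizeof(struct task_flat);
}
```

Printing wrapper_padding_cost() on the target of interest shows whether
the wrapper costs anything there.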

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag
  2020-09-04 20:31 ` [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag Gabriel Krisman Bertazi
  2020-09-07 10:16   ` Christian Brauner
  2020-09-11  9:32   ` peterz
@ 2020-09-22 19:44   ` Kees Cook
  2020-09-23 20:18     ` Gabriel Krisman Bertazi
  2 siblings, 1 reply; 40+ messages in thread
From: Kees Cook @ 2020-09-22 19:44 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: luto, tglx, x86, linux-kernel, linux-api, willy, linux-kselftest,
	shuah, kernel

On Fri, Sep 04, 2020 at 04:31:39PM -0400, Gabriel Krisman Bertazi wrote:
> Convert TIF_SECCOMP into a generic TI flag for any syscall interception
> work being done by the kernel.  The actual type of work is exposed by a
> new flag field outside of thread_info.  This ensures that the
> syscall_intercept field is only accessed if struct seccomp has to be
> accessed already, such that it doesn't incur a much higher cost to
> the seccomp path.
> 
> In order to avoid modifying every architecture at once, this patch has a
> transition mechanism, such that architectures that define TIF_SECCOMP
> continue to work by ignoring the syscall_intercept flag, as long as they
> don't support other syscall interception mechanisms like the future
> syscall user dispatch.  When migrating TIF_SECCOMP to
> TIF_SYSCALL_INTERCEPT, they should adopt the semantics of checking the
> syscall_intercept flag, like it is done in the common entry syscall
> code, or even better, migrate to the common syscall entry code.

Can we "eat" all the other flags like ptrace, audit, etc, too? Doing
this only for seccomp seems strange.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag
  2020-09-22 19:44   ` Kees Cook
@ 2020-09-23 20:18     ` Gabriel Krisman Bertazi
  2020-09-23 20:49       ` Kees Cook
  0 siblings, 1 reply; 40+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-09-23 20:18 UTC (permalink / raw)
  To: Kees Cook, tglx
  Cc: luto, x86, linux-kernel, linux-api, willy, linux-kselftest,
	shuah, kernel

Kees Cook <keescook@chromium.org> writes:

> On Fri, Sep 04, 2020 at 04:31:39PM -0400, Gabriel Krisman Bertazi wrote:
>> Convert TIF_SECCOMP into a generic TI flag for any syscall interception
>> work being done by the kernel.  The actual type of work is exposed by a
>> new flag field outside of thread_info.  This ensures that the
>> syscall_intercept field is only accessed if struct seccomp has to be
>> accessed already, such that it doesn't incur a much higher cost to
>> the seccomp path.
>> 
>> In order to avoid modifying every architecture at once, this patch has a
>> transition mechanism, such that architectures that define TIF_SECCOMP
>> continue to work by ignoring the syscall_intercept flag, as long as they
>> don't support other syscall interception mechanisms like the future
>> syscall user dispatch.  When migrating TIF_SECCOMP to
>> TIF_SYSCALL_INTERCEPT, they should adopt the semantics of checking the
>> syscall_intercept flag, like it is done in the common entry syscall
>> code, or even better, migrate to the common syscall entry code.
>
> Can we "eat" all the other flags like ptrace, audit, etc, too? Doing
> this only for seccomp seems strange.

Hi Kees, Thanks again for the review.

Yes, we can, and I'm happy to follow up with that as part of my TIF
cleanup work, but can we avoid blocking the merge of the current patchset
on that, as it has already grown a lot from the original feature
submission?

Also, Thomas, do you mind this approach to reducing the usage of TIF_
flags?  I remember you suggested simply expanding the variable to 64
bits at some point.

Thanks,


-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 7/9] x86: Enable Syscall User Dispatch
  2020-09-22 19:37   ` Kees Cook
@ 2020-09-23 20:23     ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 40+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-09-23 20:23 UTC (permalink / raw)
  To: Kees Cook
  Cc: luto, tglx, x86, linux-kernel, linux-api, willy, linux-kselftest,
	shuah, kernel

Kees Cook <keescook@chromium.org> writes:

> On Fri, Sep 04, 2020 at 04:31:45PM -0400, Gabriel Krisman Bertazi wrote:
>> Syscall User Dispatch requirements are fully supported in x86. This
>> patch flips the switch, marking it as supported.  This was tested
>> against Syscall User Dispatch selftest.
>> 
>> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
>> ---
>>  arch/x86/Kconfig | 1 +
>>  1 file changed, 1 insertion(+)
>> 
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 7101ac64bb20..56ac8de99021 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -150,6 +150,7 @@ config X86
>>  	select HAVE_ARCH_COMPAT_MMAP_BASES	if MMU && COMPAT
>>  	select HAVE_ARCH_PREL32_RELOCATIONS
>>  	select HAVE_ARCH_SECCOMP_FILTER
>> +	select HAVE_ARCH_SYSCALL_USER_DISPATCH
>
> Is this needed at all? I think simply "the architecture uses the generic
> entry code" is sufficient to enable it. (Especially since there's a top
> level config for SYSCALL_USER_DISPATCH, it feels like overkill).

Maybe it is not necessary.  The reason I have this is to prevent
architectures migrating to the generic entry code from inadvertently
starting to support this feature, without thinking in advance whether
arch_syscall_is_vdso_sigreturn is needed.  If that is not a good reason,
I'm happy to drop it.

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag
  2020-09-22 19:42       ` Kees Cook
@ 2020-09-23 20:28         ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 40+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-09-23 20:28 UTC (permalink / raw)
  To: Kees Cook
  Cc: Christian Brauner, luto, tglx, x86, linux-kernel, linux-api,
	willy, linux-kselftest, shuah, kernel

Kees Cook <keescook@chromium.org> writes:

> On Tue, Sep 08, 2020 at 12:59:49AM -0400, Gabriel Krisman Bertazi wrote:
>> Christian Brauner <christian.brauner@ubuntu.com> writes:
>> 
>> > On Fri, Sep 04, 2020 at 04:31:39PM -0400, Gabriel Krisman Bertazi wrote:
>> >> index afe01e232935..3511c98a7849 100644
>> >> --- a/include/linux/sched.h
>> >> +++ b/include/linux/sched.h
>> >> @@ -959,7 +959,11 @@ struct task_struct {
>> >>  	kuid_t				loginuid;
>> >>  	unsigned int			sessionid;
>> >>  #endif
>> >> -	struct seccomp			seccomp;
>> >> +
>> >> +	struct {
>> >> +		unsigned int			syscall_intercept;
>> >> +		struct seccomp			seccomp;
>> >> +	};
>> >
>> > If there's no specific reason to do this I'd not wrap this in an
>> > anonymous struct. It doesn't really buy anything and there doesn't seem
>> > to be precedent in struct task_struct right now. Also, if this somehow
>> > adds padding it seems you might end up increasing the size of struct
>> > task_struct more than necessary by accident? (I might be wrong
>> > though.)
>> 
>> Hi Christian,
>> 
>> Thanks for your review on this and on the other patches of this series.
>> 
>> I wrapped these to prevent struct layout randomization from separating
>> the flags field from seccomp, as they are going to be used together and
>> I was trying to reduce overhead to seccomp entry due to two cache misses
>> when reading this structure.  Measuring it with seccomp_benchmark didn't show
>> any difference with the unwrapped version, so perhaps it was a bit of
>> premature optimization?
>
> That should not be a thing to think about here. Structure randomization
> already has a mode to protect against cache line issues. I would leave
> this as just a new member; no wrapping struct.

Makes sense.  I will drop it for the next iteration.  Thanks!

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag
  2020-09-23 20:18     ` Gabriel Krisman Bertazi
@ 2020-09-23 20:49       ` Kees Cook
  2020-09-25  8:00         ` Thomas Gleixner
  0 siblings, 1 reply; 40+ messages in thread
From: Kees Cook @ 2020-09-23 20:49 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: tglx, luto, x86, linux-kernel, linux-api, willy, linux-kselftest,
	shuah, kernel

On Wed, Sep 23, 2020 at 04:18:26PM -0400, Gabriel Krisman Bertazi wrote:
> Kees Cook <keescook@chromium.org> writes:
> 
> > On Fri, Sep 04, 2020 at 04:31:39PM -0400, Gabriel Krisman Bertazi wrote:
> >> Convert TIF_SECCOMP into a generic TI flag for any syscall interception
> >> work being done by the kernel.  The actual type of work is exposed by a
> >> new flag field outside of thread_info.  This ensures that the
> >> syscall_intercept field is only accessed if struct seccomp has to be
> >> accessed already, such that it doesn't incur a much higher cost to
> >> the seccomp path.
> >> 
> >> In order to avoid modifying every architecture at once, this patch has a
> >> transition mechanism, such that architectures that define TIF_SECCOMP
> >> continue to work by ignoring the syscall_intercept flag, as long as they
> >> don't support other syscall interception mechanisms like the future
> >> syscall user dispatch.  When migrating TIF_SECCOMP to
> >> TIF_SYSCALL_INTERCEPT, they should adopt the semantics of checking the
> >> syscall_intercept flag, like it is done in the common entry syscall
> >> code, or even better, migrate to the common syscall entry code.
> >
> > Can we "eat" all the other flags like ptrace, audit, etc, too? Doing
> > this only for seccomp seems strange.
> 
> Hi Kees, Thanks again for the review.
> 
> Yes, we can, and I'm happy to follow up with that as part of my TIF
> cleanup work, but can we avoid blocking the merge of the current patchset
> on that, as it has already grown a lot from the original feature
> submission?

In that case, I'd say just add the new TIF flag. The consolidation can
come later.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag
  2020-09-11 20:08     ` Gabriel Krisman Bertazi
@ 2020-09-24 11:24       ` Peter Zijlstra
  0 siblings, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2020-09-24 11:24 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: luto, tglx, keescook, x86, linux-kernel, linux-api, willy,
	linux-kselftest, shuah, kernel

On Fri, Sep 11, 2020 at 04:08:45PM -0400, Gabriel Krisman Bertazi wrote:
> peterz@infradead.org writes:
> 
> > On Fri, Sep 04, 2020 at 04:31:39PM -0400, Gabriel Krisman Bertazi wrote:
> >> +static inline void __set_tsk_syscall_intercept(struct task_struct *tsk,
> >> +					   unsigned int type)
> >> +{
> >> +	tsk->syscall_intercept |= type;
> >> +
> >> +	if (tsk->syscall_intercept)
> >> +		set_tsk_thread_flag(tsk, TIF_SYSCALL_INTERCEPT);
> >> +}
> >
> > Did the above want to be:
> >
> > 	unsigned int old = tsk->syscall_intercept;
> > 	tsk->syscall_intercept |= type;
> > 	if (!old)
> > 		set_tsk_thread_flag(tsk, TIF_SYSCALL_INTERCEPT)
> >
> 
> Hi Peter,
> 
> Thanks for the review!
> 
> I'm not sure this change gains us anything.  For now,
> __set_tsk_syscall_intercept cannot be called with !type, so both
> versions behave the same, but my version is safe with that scenario.
> This won't be called frequently enough for the extra calls to
> set_tsk_thread_flag to matter.  Am I missing something?

Your version will do set_tsk_thread_flag() for every invocation
(assuming non-zero type). That's sub-optimal.
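
A minimal user-space model of the difference between the two versions.
struct task_model, model_set_flag() and the write counter are stand-ins
invented for illustration; in the kernel set_tsk_thread_flag() is an
atomic bit operation on thread_info flags, which is what makes the
redundant calls worth avoiding.

```c
#include <stdbool.h>

/* User-space model of the two fields involved (assumed layout). */
struct task_model {
	unsigned int syscall_intercept;	/* which interception types are on */
	bool tif_syscall_intercept;	/* models TIF_SYSCALL_INTERCEPT */
	unsigned int flag_writes;	/* counts modelled flag-setting calls */
};

/* Stand-in for set_tsk_thread_flag(tsk, TIF_SYSCALL_INTERCEPT). */
static void model_set_flag(struct task_model *tsk)
{
	tsk->tif_syscall_intercept = true;
	tsk->flag_writes++;
}

/* Version from the patch: sets the flag whenever the field is nonzero. */
static void set_intercept_patch(struct task_model *tsk, unsigned int type)
{
	tsk->syscall_intercept |= type;

	if (tsk->syscall_intercept)
		model_set_flag(tsk);
}

/* Version suggested in review: sets the flag only when the field was
 * previously zero, i.e. on the first interception type being enabled. */
static void set_intercept_review(struct task_model *tsk, unsigned int type)
{
	unsigned int old = tsk->syscall_intercept;

	tsk->syscall_intercept |= type;
	if (!old)
		model_set_flag(tsk);
}
```

Enabling two types in a row costs two flag writes with the first version
and one with the second.  The second version does set the flag when called
with a zero type on a task that has nothing enabled, which is the corner
case mentioned in the quoted reply.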

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag
  2020-09-23 20:49       ` Kees Cook
@ 2020-09-25  8:00         ` Thomas Gleixner
  2020-09-25 16:15           ` Gabriel Krisman Bertazi
  0 siblings, 1 reply; 40+ messages in thread
From: Thomas Gleixner @ 2020-09-25  8:00 UTC (permalink / raw)
  To: Kees Cook, Gabriel Krisman Bertazi
  Cc: luto, x86, linux-kernel, linux-api, willy, linux-kselftest,
	shuah, kernel

On Wed, Sep 23 2020 at 13:49, Kees Cook wrote:
> On Wed, Sep 23, 2020 at 04:18:26PM -0400, Gabriel Krisman Bertazi wrote:
>> Kees Cook <keescook@chromium.org> writes:
>> Yes, we can, and I'm happy to follow up with that as part of my TIF
>> cleanup work, but can we avoid blocking the merge of the current patchset
>> on that, as it has already grown a lot from the original feature
>> submission?
>
> In that case, I'd say just add the new TIF flag. The consolidation can
> come later.

No. This is exactly the wrong order. Cleanup and consolidation have
precedence over features. I'm tired of 'we'll do that later' songs,
simply because in the very end I'm going to be the idiot who mops up the
resulting mess.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag
  2020-09-25  8:00         ` Thomas Gleixner
@ 2020-09-25 16:15           ` Gabriel Krisman Bertazi
  2020-09-25 20:30             ` Kees Cook
  0 siblings, 1 reply; 40+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-09-25 16:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kees Cook, luto, x86, linux-kernel, linux-api, willy,
	linux-kselftest, shuah, kernel

Thomas Gleixner <tglx@linutronix.de> writes:

> On Wed, Sep 23 2020 at 13:49, Kees Cook wrote:
>> On Wed, Sep 23, 2020 at 04:18:26PM -0400, Gabriel Krisman Bertazi wrote:
>>> Kees Cook <keescook@chromium.org> writes:
>>> Yes, we can, and I'm happy to follow up with that as part of my TIF
>>> cleanup work, but can we avoid blocking the merge of the current patchset
>>> on that, as it has already grown a lot from the original feature
>>> submission?
>>
>> In that case, I'd say just add the new TIF flag. The consolidation can
>> come later.
>
> No. This is exactly the wrong order. Cleanup and consolidation have
> precedence over features. I'm tired of 'we'll do that later' songs,
> simply because in the very end I'm going to be the idiot who mops up the
> resulting mess.
>

No problem.  I will follow up with a patchset consolidating those flags
into this syscall_intercept interface I proposed.  I assume there are no
immediate concerns with the consolidation approach itself.

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag
  2020-09-25 16:15           ` Gabriel Krisman Bertazi
@ 2020-09-25 20:30             ` Kees Cook
  0 siblings, 0 replies; 40+ messages in thread
From: Kees Cook @ 2020-09-25 20:30 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Thomas Gleixner, luto, x86, linux-kernel, linux-api, willy,
	linux-kselftest, shuah, kernel

On Fri, Sep 25, 2020 at 12:15:54PM -0400, Gabriel Krisman Bertazi wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
> 
> > On Wed, Sep 23 2020 at 13:49, Kees Cook wrote:
> >> On Wed, Sep 23, 2020 at 04:18:26PM -0400, Gabriel Krisman Bertazi wrote:
> >>> Kees Cook <keescook@chromium.org> writes:
> >>> Yes, we can, and I'm happy to follow up with that as part of my TIF
> >>> cleanup work, but can we avoid blocking the merge of the current patchset
> >>> on that, as it has already grown a lot from the original feature
> >>> submission?
> >>
> >> In that case, I'd say just add the new TIF flag. The consolidation can
> >> come later.
> >
> > No. This is exactly the wrong order. Cleanup and consolidation have
> > precedence over features. I'm tired of 'we'll do that later' songs,
> > simply because in the very end I'm going to be the idiot who mops up the
> > resulting mess.
> >
> 
> No problem.  I will follow up with a patchset consolidating those flags
> into this syscall_intercept interface I proposed.  I assume there are no
> immediate concerns with the consolidation approach itself.

I think the only issue is just finding a clean way to set/unset the
flags safely/quickly (a lock seems too heavy to me).

Should thread_info hold an entire u32 for all intercept flags (then the
TIF_WORK test is just a zero-test of the intercept u32 word)? Or should
there be a TIF_INTERCEPT and a totally separate u32 (e.g. in
task_struct) indicating which intercepts are enabled? (And if they're
separate, how do we atomically set/unset them?)

i.e.:

atomic_start
	toggle a per-intercept bit
	set TIF_INTERCEPT = !!(intercept word)
atomic_end
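
One lock-free shape this could take, sketched in user space with C11
atomics rather than kernel primitives.  It assumes the per-intercept bits
and the summary bit share a single 32-bit word; the INTERCEPT_* names and
intercept_update() are hypothetical.  If the summary flag lives in
thread_info and the per-intercept bits in task_struct, one compare-and-swap
cannot cover both words, so this only addresses the single-word variant.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low bits are per-intercept bits, the top bit is a
 * summary flag standing in for TIF_INTERCEPT. */
#define INTERCEPT_SECCOMP	(UINT32_C(1) << 0)
#define INTERCEPT_USER_DISPATCH	(UINT32_C(1) << 1)
#define INTERCEPT_TYPES_MASK	UINT32_C(0x7fffffff)
#define INTERCEPT_SUMMARY	(UINT32_C(1) << 31)

/* Toggle per-intercept bits and recompute the summary bit in one atomic
 * step, retrying if another thread changed the word in between.
 * Returns the value that was stored. */
static uint32_t intercept_update(_Atomic uint32_t *word, uint32_t bits,
				 bool enable)
{
	uint32_t old = atomic_load(word);
	uint32_t next;

	bits &= INTERCEPT_TYPES_MASK;
	do {
		next = old & INTERCEPT_TYPES_MASK;
		next = enable ? (next | bits) : (next & ~bits);
		/* summary = !!(intercept bits) */
		if (next)
			next |= INTERCEPT_SUMMARY;
		/* On failure, old is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(word, &old, next));

	return next;
}
```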

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2020-09-25 20:38 UTC | newest]

Thread overview: 40+ messages
2020-09-04 20:31 [PATCH v6 0/9] Syscall User Dispatch Gabriel Krisman Bertazi
2020-09-04 20:31 ` [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag Gabriel Krisman Bertazi
2020-09-07 10:16   ` Christian Brauner
2020-09-08  4:59     ` Gabriel Krisman Bertazi
2020-09-22 19:42       ` Kees Cook
2020-09-23 20:28         ` Gabriel Krisman Bertazi
2020-09-11  9:32   ` peterz
2020-09-11 20:08     ` Gabriel Krisman Bertazi
2020-09-24 11:24       ` Peter Zijlstra
2020-09-22 19:44   ` Kees Cook
2020-09-23 20:18     ` Gabriel Krisman Bertazi
2020-09-23 20:49       ` Kees Cook
2020-09-25  8:00         ` Thomas Gleixner
2020-09-25 16:15           ` Gabriel Krisman Bertazi
2020-09-25 20:30             ` Kees Cook
2020-09-04 20:31 ` [PATCH v6 2/9] kernel: entry: Support TIF_SYSCAL_INTERCEPT on common entry code Gabriel Krisman Bertazi
2020-09-07 10:16   ` Christian Brauner
2020-09-11  9:35   ` peterz
2020-09-11 20:11     ` Gabriel Krisman Bertazi
2020-09-04 20:31 ` [PATCH v6 3/9] x86: vdso: Expose sigreturn address on vdso to the kernel Gabriel Krisman Bertazi
2020-09-22 19:40   ` Kees Cook
2020-09-04 20:31 ` [PATCH v6 4/9] signal: Expose SYS_USER_DISPATCH si_code type Gabriel Krisman Bertazi
2020-09-07 10:15   ` Christian Brauner
2020-09-22 19:39   ` Kees Cook
2020-09-04 20:31 ` [PATCH v6 5/9] kernel: Implement selective syscall userspace redirection Gabriel Krisman Bertazi
2020-09-05 11:24   ` Matthew Wilcox
2020-09-11  9:44   ` peterz
2020-09-04 20:31 ` [PATCH v6 6/9] kernel: entry: Support Syscall User Dispatch for common syscall entry Gabriel Krisman Bertazi
2020-09-07 10:15   ` Christian Brauner
2020-09-07 14:15     ` Andy Lutomirski
2020-09-07 14:25       ` Christian Brauner
2020-09-07 20:20         ` Andy Lutomirski
2020-09-11  9:46   ` peterz
2020-09-04 20:31 ` [PATCH v6 7/9] x86: Enable Syscall User Dispatch Gabriel Krisman Bertazi
2020-09-22 19:37   ` Kees Cook
2020-09-23 20:23     ` Gabriel Krisman Bertazi
2020-09-04 20:31 ` [PATCH v6 8/9] selftests: Add kselftest for syscall user dispatch Gabriel Krisman Bertazi
2020-09-22 19:35   ` Kees Cook
2020-09-04 20:31 ` [PATCH v6 9/9] doc: Document Syscall User Dispatch Gabriel Krisman Bertazi
2020-09-22 19:35   ` Kees Cook
