All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
@ 2012-01-12 23:38 Will Drewry
  2012-01-12 23:38 ` [PATCH v3 2/3] seccomp_filters: system call filtering using BPF Will Drewry
                   ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Will Drewry @ 2012-01-12 23:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: keescook, john.johansen, serge.hallyn, coreyb, pmoore, eparis,
	djm, torvalds, segoon, rostedt, jmorris, scarybeasts, avi,
	penberg, viro, wad, luto, mingo, akpm, khilman, borislav.petkov,
	amwang, oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano,
	linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor,
	corbet, alan

This patch is a placeholder until Andy's (luto@mit.edu) patch arrives
implementing Linus's proposal for applying a "this is a process that has
*no* extra privileges at all, and can never get them".

It adds the "always_unprivileged" member to the task_struct and the
ability for a process to set it to 1 via prctl.  Fixup is then done
alongside MNT_NOSUID in fs/exec.c and security/commoncap.c
(as Eric Paris suggested).

selinux/hooks.c have not been touched but need a similar one line
change.  That said, this is just a placeholder and is not meant to
be authoritative: I look forward to Andy's patches!

Signed-off-by: Will Drewry <wad@chromium.org>
---
 fs/exec.c             |    3 ++-
 include/linux/prctl.h |    6 ++++++
 include/linux/sched.h |    1 +
 kernel/sys.c          |    4 +++-
 security/commoncap.c  |    3 ++-
 5 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 3625464..ce0e477 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1281,7 +1281,8 @@ int prepare_binprm(struct linux_binprm *bprm)
 	bprm->cred->euid = current_euid();
 	bprm->cred->egid = current_egid();
 
-	if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)) {
+	if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) &&
+	    !current->always_unprivileged) {
 		/* Set-uid? */
 		if (mode & S_ISUID) {
 			bprm->per_clear |= PER_CLEAR_ON_SETID;
diff --git a/include/linux/prctl.h b/include/linux/prctl.h
index a3baeb2..d5d6ab6 100644
--- a/include/linux/prctl.h
+++ b/include/linux/prctl.h
@@ -102,4 +102,10 @@
 
 #define PR_MCE_KILL_GET 34
 
+/*
+ * Set this to ensure that a process and any of its descendents may never
+ * escalate privileges (only reduce them).
+ */
+#define PR_SET_ALWAYS_UNPRIVILEGED 35
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1c4f3e9..2d6af15 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1402,6 +1402,7 @@ struct task_struct {
 	unsigned int sessionid;
 #endif
 	seccomp_t seccomp;
+	int always_unprivileged;
 
 /* Thread group tracking */
    	u32 parent_exec_id;
diff --git a/kernel/sys.c b/kernel/sys.c
index 481611f..fbb6248 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1776,7 +1776,6 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		case PR_SET_ENDIAN:
 			error = SET_ENDIAN(me, arg2);
 			break;
-
 		case PR_GET_SECCOMP:
 			error = prctl_get_seccomp();
 			break;
@@ -1841,6 +1840,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			else
 				error = PR_MCE_KILL_DEFAULT;
 			break;
+		case PR_SET_ALWAYS_UNPRIVILEGED:
+			current->always_unprivileged = 1;
+			break;
 		default:
 			error = -EINVAL;
 			break;
diff --git a/security/commoncap.c b/security/commoncap.c
index ee4f848..ca94952 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -439,7 +439,8 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c
 	if (!file_caps_enabled)
 		return 0;
 
-	if (bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)
+	if ((bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID) ||
+	    current->always_unprivileged)
 		return 0;
 
 	dentry = dget(bprm->file->f_dentry);
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v3 2/3] seccomp_filters: system call filtering using BPF
  2012-01-12 23:38 [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch Will Drewry
@ 2012-01-12 23:38 ` Will Drewry
  2012-01-13  0:51   ` Randy Dunlap
  2012-01-13 17:39   ` Eric Paris
  2012-01-12 23:38 ` [PATCH v3 3/3] Documentation: prctl/seccomp_filter Will Drewry
  2012-01-12 23:47 ` [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch Linus Torvalds
  2 siblings, 2 replies; 47+ messages in thread
From: Will Drewry @ 2012-01-12 23:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: keescook, john.johansen, serge.hallyn, coreyb, pmoore, eparis,
	djm, torvalds, segoon, rostedt, jmorris, scarybeasts, avi,
	penberg, viro, wad, luto, mingo, akpm, khilman, borislav.petkov,
	amwang, oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano,
	linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor,
	corbet, alan

This patch adds support for seccomp mode 2.  This mode enables dynamic
enforcement of system call filtering policy in the kernel as specified
by a userland task.  The policy is expressed in terms of a BPF program,
as is used for userland-exposed socket filtering.  Instead of network
data, the BPF program is evaluated over struct user_regs_struct at the
time of the system call (as retrieved using regviews).

A filter program may be installed by a userland task by calling
  prctl(PR_ATTACH_SECCOMP_FILTER, &fprog);
where fprog is of type struct sock_fprog.

If the first filter program allows subsequent prctl(2) calls, then
additional filter programs may be attached.  All attached programs
must be evaluated before a system call will be allowed to proceed.

To avoid CONFIG_COMPAT related landmines, once a filter program is
installed using specific is_compat_task() and current->personality, it
is not allowed to make system calls or attach additional filters which
use a different combination of is_compat_task() and
current->personality.

Filter programs are inherited across fork, clone, and execve, but if a
filter is attached by an unprivileged process (!CAP_SYS_ADMIN), the
"always unprivileged" process bit will be set.  This ensures that
unprivileged processes may not enact unexpected system call filters on
processes that are run as different users or with different capabilities
unless the process already had that ability.

There are a number of benefits to this approach. A few of which are
as follows:
- BPF has been exposed to userland for a long time.
- Userland already knows its ABI: expected register layout and system
  call numbers.
- Full register information is provided which may be relevant for
  certain syscalls (fork, rt_sigreturn) or for other userland
  filtering tactics (checking the PC).
- No time-of-check-time-of-use vulnerable data accesses are possible.

This patch includes its own BPF evaluator, but relies on the
net/core/filter.c BPF checking code.  It is possible to share
evaluators, but the performance sensitive nature of the network
filtering path makes it an iterative optimization which (I think :) can
be tackled separately via separate patchsets. (And at some point sharing
BPF JIT code!)

 v3: - macros to inline (oleg@redhat.com)
     - init_task behavior fixed (oleg@redhat.com)
     - drop creator entry and extra NULL check (oleg@redhat.com)
     - alloc returns -EINVAL on bad sizing (serge.hallyn@canonical.com)
     - (!) adds tentative use of "always_unprivileged" as per
       torvalds@linux-foundation.org and luto@mit.edu
 v2: - n/a

Signed-off-by: Will Drewry <wad@chromium.org>
---
 include/linux/prctl.h   |    3 +
 include/linux/seccomp.h |   68 +++++-
 kernel/Makefile         |    1 +
 kernel/fork.c           |    4 +
 kernel/seccomp.c        |    8 +
 kernel/seccomp_filter.c |  620 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sys.c            |    4 +
 security/Kconfig        |   12 +
 8 files changed, 717 insertions(+), 3 deletions(-)
 create mode 100644 kernel/seccomp_filter.c

diff --git a/include/linux/prctl.h b/include/linux/prctl.h
index d5d6ab6..a53fae3 100644
--- a/include/linux/prctl.h
+++ b/include/linux/prctl.h
@@ -64,6 +64,9 @@
 #define PR_GET_SECCOMP	21
 #define PR_SET_SECCOMP	22
 
+/* Set process seccomp filters */
+#define PR_ATTACH_SECCOMP_FILTER	36
+
 /* Get/set the capability bounding set (as per security/commoncap.c) */
 #define PR_CAPBSET_READ 23
 #define PR_CAPBSET_DROP 24
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index cc7a4e9..0296871 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -5,9 +5,28 @@
 #ifdef CONFIG_SECCOMP
 
 #include <linux/thread_info.h>
+#include <linux/types.h>
 #include <asm/seccomp.h>
 
-typedef struct { int mode; } seccomp_t;
+struct seccomp_filter;
+/**
+ * struct seccomp_struct - the state of a seccomp'ed process
+ *
+ * @mode:
+ *     if this is 0, seccomp is not in use.
+ *             is 1, the process is under standard seccomp rules.
+ *             is 2, the process is only allowed to make system calls where
+ *                   associated filters evaluate successfully.
+ * @filter: Metadata for filter if using CONFIG_SECCOMP_FILTER.
+ *          @filter must only be accessed from the context of current as there
+ *          is no guard.
+ */
+typedef struct seccomp_struct {
+	int mode;
+#ifdef CONFIG_SECCOMP_FILTER
+	struct seccomp_filter *filter;
+#endif
+} seccomp_t;
 
 extern void __secure_computing(int);
 static inline void secure_computing(int this_syscall)
@@ -28,8 +47,7 @@ static inline int seccomp_mode(seccomp_t *s)
 
 #include <linux/errno.h>
 
-typedef struct { } seccomp_t;
-
+typedef struct seccomp_struct { } seccomp_t;
 #define secure_computing(x) do { } while (0)
 
 static inline long prctl_get_seccomp(void)
@@ -49,4 +67,48 @@ static inline int seccomp_mode(seccomp_t *s)
 
 #endif /* CONFIG_SECCOMP */
 
+#ifdef CONFIG_SECCOMP_FILTER
+
+
+extern long prctl_attach_seccomp_filter(char __user *);
+
+extern struct seccomp_filter *get_seccomp_filter(struct seccomp_filter *);
+extern void put_seccomp_filter(struct seccomp_filter *);
+
+extern int seccomp_test_filters(int);
+extern void seccomp_filter_log_failure(int);
+extern void seccomp_struct_fork(struct seccomp_struct *child,
+				const struct seccomp_struct *parent);
+
+static inline void seccomp_struct_init_task(struct seccomp_struct *seccomp)
+{
+	seccomp->mode = 0;
+	seccomp->filter = NULL;
+}
+
+/* No locking is needed here because the task_struct will
+ * have no parallel consumers.
+ */
+static inline void seccomp_struct_free_task(struct seccomp_struct *seccomp)
+{
+	put_seccomp_filter(seccomp->filter);
+	seccomp->filter = NULL;
+}
+
+#else  /* CONFIG_SECCOMP_FILTER */
+
+#include <linux/errno.h>
+
+struct seccomp_filter { };
+/* Macros consume the unused dereference by the caller. */
+#define seccomp_struct_init_task(_seccomp) do { } while (0);
+#define seccomp_struct_fork(_tsk, _orig) do { } while (0);
+#define seccomp_struct_free_task(_seccomp) do { } while (0);
+
+static inline long prctl_attach_seccomp_filter(char __user *a2)
+{
+	return -ENOSYS;
+}
+
+#endif  /* CONFIG_SECCOMP_FILTER */
 #endif /* _LINUX_SECCOMP_H */
diff --git a/kernel/Makefile b/kernel/Makefile
index e898c5b..0584090 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -79,6 +79,7 @@ obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
 obj-$(CONFIG_GENERIC_HARDIRQS) += irq/
 obj-$(CONFIG_SECCOMP) += seccomp.o
+obj-$(CONFIG_SECCOMP_FILTER) += seccomp_filter.o
 obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
 obj-$(CONFIG_TREE_RCU) += rcutree.o
 obj-$(CONFIG_TREE_PREEMPT_RCU) += rcutree.o
diff --git a/kernel/fork.c b/kernel/fork.c
index da4a6a1..22f7ec1 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -34,6 +34,7 @@
 #include <linux/cgroup.h>
 #include <linux/security.h>
 #include <linux/hugetlb.h>
+#include <linux/seccomp.h>
 #include <linux/swap.h>
 #include <linux/syscalls.h>
 #include <linux/jiffies.h>
@@ -166,6 +167,7 @@ void free_task(struct task_struct *tsk)
 	free_thread_info(tsk->stack);
 	rt_mutex_debug_task_free(tsk);
 	ftrace_graph_exit_task(tsk);
+	seccomp_struct_free_task(&tsk->seccomp);
 	free_task_struct(tsk);
 }
 EXPORT_SYMBOL(free_task);
@@ -1089,6 +1091,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 		goto fork_out;
 
 	ftrace_graph_init_task(p);
+	seccomp_struct_init_task(&p->seccomp);
 
 	rt_mutex_init_task(p);
 
@@ -1375,6 +1378,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	if (clone_flags & CLONE_THREAD)
 		threadgroup_fork_read_unlock(current);
 	perf_event_fork(p);
+	seccomp_struct_fork(&p->seccomp, &current->seccomp);
 	return p;
 
 bad_fork_free_pid:
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 57d4b13..78719be 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -47,6 +47,14 @@ void __secure_computing(int this_syscall)
 				return;
 		} while (*++syscall);
 		break;
+#ifdef CONFIG_SECCOMP_FILTER
+	case 2:
+		if (seccomp_test_filters(this_syscall) == 0)
+			return;
+
+		seccomp_filter_log_failure(this_syscall);
+		break;
+#endif
 	default:
 		BUG();
 	}
diff --git a/kernel/seccomp_filter.c b/kernel/seccomp_filter.c
new file mode 100644
index 0000000..108a3f3
--- /dev/null
+++ b/kernel/seccomp_filter.c
@@ -0,0 +1,620 @@
+/* bpf program-based system call filtering
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ */
+
+#include <linux/capability.h>
+#include <linux/compat.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/rculist.h>
+#include <linux/filter.h>
+#include <linux/kallsyms.h>
+#include <linux/kref.h>
+#include <linux/module.h>
+#include <linux/pid.h>
+#include <linux/prctl.h>
+#include <linux/ptrace.h>
+#include <linux/ratelimit.h>
+#include <linux/reciprocal_div.h>
+#include <linux/regset.h>
+#include <linux/seccomp.h>
+#include <linux/security.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/user.h>
+
+
+/**
+ * struct seccomp_filter - container for seccomp BPF programs
+ *
+ * @usage: reference count to manage the object lifetime.
+ *         get/put helpers should be used when accessing an instance
+ *         outside of a lifetime-guarded section.  In general, this
+ *         is only needed for handling filters shared across tasks.
+ * @parent: pointer to the ancestor which this filter will be composed with.
+ * @flags: provide information about filter from creation time.
+ * @personality: personality of the process at filter creation time.
+ * @insns: the BPF program instructions to evaluate
+ * @count: the number of instructions in the program.
+ *
+ * seccomp_filter objects should never be modified after being attached
+ * to a task_struct (other than @usage).
+ */
+struct seccomp_filter {
+	struct kref usage;
+	struct seccomp_filter *parent;
+	struct {
+		uint32_t compat:1,		/* CONFIG_COMPAT */
+			 __reserved:31;
+	} flags;
+	int personality;
+	unsigned short count;  /* Instruction count */
+	struct sock_filter insns[0];
+};
+
+static unsigned int seccomp_run_filter(const u8 *buf,
+				       const size_t buflen,
+				       const struct sock_filter *);
+
+/**
+ * seccomp_filter_alloc - allocates a new filter object
+ * @padding: size of the insns[0] array in bytes
+ *
+ * The @padding should be a multiple of
+ * sizeof(struct sock_filter).
+ *
+ * Returns ERR_PTR on error or an allocated object.
+ */
+static struct seccomp_filter *seccomp_filter_alloc(unsigned long padding)
+{
+	struct seccomp_filter *f;
+	unsigned long bpf_blocks = padding / sizeof(struct sock_filter);
+
+	/* Drop oversized requests. */
+	if (bpf_blocks == 0 || bpf_blocks > BPF_MAXINSNS)
+		return ERR_PTR(-EINVAL);
+
+	/* Padding should always be in sock_filter increments. */
+	if (padding % sizeof(struct sock_filter))
+		return ERR_PTR(-EINVAL);
+
+	f = kzalloc(sizeof(struct seccomp_filter) + padding, GFP_KERNEL);
+	if (!f)
+		return ERR_PTR(-ENOMEM);
+	kref_init(&f->usage);
+	f->count = bpf_blocks;
+	return f;
+}
+
+/**
+ * seccomp_filter_free - frees the allocated filter.
+ * @filter: NULL or live object to be completely destructed.
+ */
+static void seccomp_filter_free(struct seccomp_filter *filter)
+{
+	if (!filter)
+		return;
+	put_seccomp_filter(filter->parent);
+	kfree(filter);
+}
+
+static void __put_seccomp_filter(struct kref *kref)
+{
+	struct seccomp_filter *orig =
+		container_of(kref, struct seccomp_filter, usage);
+	seccomp_filter_free(orig);
+}
+
+void seccomp_filter_log_failure(int syscall)
+{
+	pr_info("%s[%d]: system call %d blocked at 0x%lx\n",
+		current->comm, task_pid_nr(current), syscall,
+		KSTK_EIP(current));
+}
+
+/* put_seccomp_filter - decrements the ref count of @orig and may free. */
+void put_seccomp_filter(struct seccomp_filter *orig)
+{
+	if (!orig)
+		return;
+	kref_put(&orig->usage, __put_seccomp_filter);
+}
+
+/* get_seccomp_filter - increments the reference count of @orig. */
+struct seccomp_filter *get_seccomp_filter(struct seccomp_filter *orig)
+{
+	if (!orig)
+		return NULL;
+	kref_get(&orig->usage);
+	return orig;
+}
+
+static int seccomp_check_personality(struct seccomp_filter *filter)
+{
+	if (filter->personality != current->personality)
+		return -EACCES;
+#ifdef CONFIG_COMPAT
+	if (filter->flags.compat != (!!(is_compat_task())))
+		return -EACCES;
+#endif
+	return 0;
+}
+
+static const struct user_regset *
+find_prstatus(const struct user_regset_view *view)
+{
+	const struct user_regset *regset;
+	int n;
+
+	/* Skip 0. */
+	for (n = 1; n < view->n; ++n) {
+		regset = view->regsets + n;
+		if (regset->core_note_type == NT_PRSTATUS)
+			return regset;
+	}
+
+	return NULL;
+}
+
+/**
+ * seccomp_get_regs - returns a pointer to struct user_regs_struct
+ * @scratch: preallocated storage of size @available
+ * @available: pointer to the size of scratch.
+ *
+ * Returns NULL if the registers cannot be acquired or copied.
+ * Returns a populated pointer to @scratch by default.
+ * Otherwise, returns a pointer to a a u8 array containing the struct
+ * user_regs_struct appropriate for the task personality.  The pointer
+ * may be to the beginning of @scratch or to an externally managed data
+ * structure.  On success, @available should be updated with the
+ * valid region size of the returned pointer.
+ *
+ * If the architecture overrides the linkage, then the pointer may pointer to
+ * another location.
+ */
+__weak u8 *seccomp_get_regs(u8 *scratch, size_t *available)
+{
+	/* regset is usually returned based on task personality, not current
+	 * system call convention.  This behavior makes it unsafe to execute
+	 * BPF programs over regviews if is_compat_task or the personality
+	 * have changed since the program was installed.
+	 */
+	const struct user_regset_view *view = task_user_regset_view(current);
+	const struct user_regset *regset = &view->regsets[0];
+	size_t scratch_size = *available;
+	if (regset->core_note_type != NT_PRSTATUS) {
+		/* The architecture should override this method for speed. */
+		regset = find_prstatus(view);
+		if (!regset)
+			return NULL;
+	}
+	*available = regset->n * regset->size;
+	/* Make sure the scratch space isn't exceeded. */
+	if (*available > scratch_size)
+		*available = scratch_size;
+	if (regset->get(current, regset, 0, *available, scratch, NULL))
+		return NULL;
+	return scratch;
+}
+
+/**
+ * seccomp_test_filters - tests 'current' against the given syscall
+ * @syscall: number of the system call to test
+ *
+ * Returns 0 on ok and non-zero on error/failure.
+ */
+int seccomp_test_filters(int syscall)
+{
+	struct seccomp_filter *filter;
+	u8 regs_tmp[sizeof(struct user_regs_struct)], *regs;
+	size_t regs_size = sizeof(struct user_regs_struct);
+	int ret = -EACCES;
+
+	filter = current->seccomp.filter; /* uses task ref */
+	if (!filter)
+		goto out;
+
+	/* All filters in the list are required to share the same system call
+	 * convention so only the first filter is ever checked.
+	 */
+	if (seccomp_check_personality(filter))
+		goto out;
+
+	/* Grab the user_regs_struct.  Normally, regs == &regs_tmp, but
+	 * that is not mandatory.  E.g., it may return a point to
+	 * task_pt_regs(current).  NULL checking is mandatory.
+	 */
+	regs = seccomp_get_regs(regs_tmp, &regs_size);
+	if (!regs)
+		goto out;
+
+	/* Only allow a system call if it is allowed in all ancestors. */
+	ret = 0;
+	for ( ; filter != NULL; filter = filter->parent) {
+		/* Allowed if return value is the size of the data supplied. */
+		if (seccomp_run_filter(regs, regs_size, filter->insns) !=
+		    regs_size)
+			ret = -EACCES;
+	}
+out:
+	return ret;
+}
+
+/**
+ * seccomp_attach_filter: Attaches a seccomp filter to current.
+ * @fprog: BPF program to install
+ *
+ * Context: User context only. This function may sleep on allocation and
+ *          operates on current. current must be attempting a system call
+ *          when this is called (usually prctl).
+ *
+ * This function may be called repeatedly to install additional filters.
+ * Every filter successfully installed will be evaluated (in reverse order)
+ * for each system call the thread makes.
+ *
+ * Returns 0 on success or an errno on failure.
+ */
+long seccomp_attach_filter(struct sock_fprog *fprog)
+{
+	struct seccomp_filter *filter = NULL;
+	/* Note, len is a short so overflow should be impossible. */
+	unsigned long fp_size = fprog->len * sizeof(struct sock_filter);
+	long ret = -EPERM;
+
+	/* Allocate a new seccomp_filter */
+	filter = seccomp_filter_alloc(fp_size);
+	if (IS_ERR(filter)) {
+		ret = PTR_ERR(filter);
+		goto out;
+	}
+
+	/* Lock the process personality and calling convention. */
+#ifdef CONFIG_COMPAT
+	if (is_compat_task())
+		filter->flags.compat = 1;
+#endif
+	filter->personality = current->personality;
+
+	/* If a process lacks CAP_SYS_ADMIN in its namespace, force
+	 * this process and all descendents to always run unprivileged.
+	 * A privileged process will need to set this bit independently,
+	 * if desired.
+	 */
+	if (security_real_capable_noaudit(current, current_user_ns(),
+					  CAP_SYS_ADMIN) != 0)
+		current->always_unprivileged = 1;
+
+	/* Copy the instructions from fprog. */
+	ret = -EFAULT;
+	if (copy_from_user(filter->insns, fprog->filter, fp_size))
+		goto out;
+
+	/* Check the fprog */
+	ret = sk_chk_filter(filter->insns, filter->count);
+	if (ret)
+		goto out;
+
+	/* If there is an existing filter, make it the parent
+	 * and reuse the existing task-based ref.
+	 */
+	filter->parent = current->seccomp.filter;
+
+	/* Force all filters to use one system call convention. */
+	ret = -EINVAL;
+	if (filter->parent) {
+		if (filter->parent->flags.compat != filter->flags.compat)
+			goto out;
+		if (filter->parent->personality != filter->personality)
+			goto out;
+	}
+
+	/* Double claim the new filter so we can release it below simplifying
+	 * the error paths earlier.
+	 */
+	ret = 0;
+	get_seccomp_filter(filter);
+	current->seccomp.filter = filter;
+	/* Engage seccomp if it wasn't. This doesn't use PR_SET_SECCOMP. */
+	if (!current->seccomp.mode) {
+		current->seccomp.mode = 2;
+		set_thread_flag(TIF_SECCOMP);
+	}
+
+out:
+	put_seccomp_filter(filter);  /* for get or task, on err */
+	return ret;
+}
+
+long prctl_attach_seccomp_filter(char __user *user_filter)
+{
+	struct sock_fprog fprog;
+	long ret = -EINVAL;
+
+	ret = -EFAULT;
+	if (!user_filter)
+		goto out;
+
+	if (copy_from_user(&fprog, user_filter, sizeof(fprog)))
+		goto out;
+
+	ret = seccomp_attach_filter(&fprog);
+out:
+	return ret;
+}
+
+/* seccomp_struct_fork: manages inheritance on fork
+ * @child: forkee's seccomp_struct
+ * @parent: forker's seccomp_struct
+ * Ensures that @child inherit a seccomp_filter iff seccomp is enabled
+ * and the set of filters is marked as 'enabled'.
+ */
+void seccomp_struct_fork(struct seccomp_struct *child,
+			 const struct seccomp_struct *parent)
+{
+	if (!parent->mode)
+		return;
+	child->mode = parent->mode;
+	child->filter = get_seccomp_filter(parent->filter);
+}
+
+/* Returns a pointer to the BPF evaluator after checking the offset and size
+ * boundaries.  The signature almost matches the signature from
+ * net/core/filter.c with the hopes of sharing code in the future.
+ */
+static const void *load_pointer(const u8 *buf, size_t buflen,
+				int offset, size_t size,
+				void *unused)
+{
+	if (offset >= buflen)
+		goto fail;
+	if (offset < 0)
+		goto fail;
+	if (size > buflen - offset)
+		goto fail;
+	return buf + offset;
+fail:
+	return NULL;
+}
+
+/**
+ * seccomp_run_filter - evaluate BPF (over user_regs_struct)
+ *	@buf: buffer to execute the filter over
+ *	@buflen: length of the buffer
+ *	@fentry: filter to apply
+ *
+ * Decode and apply filter instructions to the buffer.
+ * Return length to keep, 0 for none. @buf is a regset we are
+ * filtering, @filter is the array of filter instructions.
+ * Because all jumps are guaranteed to be before last instruction,
+ * and last instruction guaranteed to be a RET, we dont need to check
+ * flen.
+ *
+ * See core/net/filter.c as this is nearly an exact copy.
+ * At some point, it would be nice to merge them to take advantage of
+ * optimizations (like JIT).
+ *
+ * A successful filter must return the full length of the data. Anything less
+ * will currently result in a seccomp failure.  In the future, it may be
+ * possible to use that for hard filtering registers on the fly so it is
+ * ideal for consumers to return 0 on intended failure.
+ */
+static unsigned int seccomp_run_filter(const u8 *buf,
+				       const size_t buflen,
+				       const struct sock_filter *fentry)
+{
+	const void *ptr;
+	u32 A = 0;			/* Accumulator */
+	u32 X = 0;			/* Index Register */
+	u32 mem[BPF_MEMWORDS];		/* Scratch Memory Store */
+	u32 tmp;
+	int k;
+
+	/*
+	 * Process array of filter instructions.
+	 */
+	for (;; fentry++) {
+#if defined(CONFIG_X86_32)
+#define	K (fentry->k)
+#else
+		const u32 K = fentry->k;
+#endif
+
+		switch (fentry->code) {
+		case BPF_S_ALU_ADD_X:
+			A += X;
+			continue;
+		case BPF_S_ALU_ADD_K:
+			A += K;
+			continue;
+		case BPF_S_ALU_SUB_X:
+			A -= X;
+			continue;
+		case BPF_S_ALU_SUB_K:
+			A -= K;
+			continue;
+		case BPF_S_ALU_MUL_X:
+			A *= X;
+			continue;
+		case BPF_S_ALU_MUL_K:
+			A *= K;
+			continue;
+		case BPF_S_ALU_DIV_X:
+			if (X == 0)
+				return 0;
+			A /= X;
+			continue;
+		case BPF_S_ALU_DIV_K:
+			A = reciprocal_divide(A, K);
+			continue;
+		case BPF_S_ALU_AND_X:
+			A &= X;
+			continue;
+		case BPF_S_ALU_AND_K:
+			A &= K;
+			continue;
+		case BPF_S_ALU_OR_X:
+			A |= X;
+			continue;
+		case BPF_S_ALU_OR_K:
+			A |= K;
+			continue;
+		case BPF_S_ALU_LSH_X:
+			A <<= X;
+			continue;
+		case BPF_S_ALU_LSH_K:
+			A <<= K;
+			continue;
+		case BPF_S_ALU_RSH_X:
+			A >>= X;
+			continue;
+		case BPF_S_ALU_RSH_K:
+			A >>= K;
+			continue;
+		case BPF_S_ALU_NEG:
+			A = -A;
+			continue;
+		case BPF_S_JMP_JA:
+			fentry += K;
+			continue;
+		case BPF_S_JMP_JGT_K:
+			fentry += (A > K) ? fentry->jt : fentry->jf;
+			continue;
+		case BPF_S_JMP_JGE_K:
+			fentry += (A >= K) ? fentry->jt : fentry->jf;
+			continue;
+		case BPF_S_JMP_JEQ_K:
+			fentry += (A == K) ? fentry->jt : fentry->jf;
+			continue;
+		case BPF_S_JMP_JSET_K:
+			fentry += (A & K) ? fentry->jt : fentry->jf;
+			continue;
+		case BPF_S_JMP_JGT_X:
+			fentry += (A > X) ? fentry->jt : fentry->jf;
+			continue;
+		case BPF_S_JMP_JGE_X:
+			fentry += (A >= X) ? fentry->jt : fentry->jf;
+			continue;
+		case BPF_S_JMP_JEQ_X:
+			fentry += (A == X) ? fentry->jt : fentry->jf;
+			continue;
+		case BPF_S_JMP_JSET_X:
+			fentry += (A & X) ? fentry->jt : fentry->jf;
+			continue;
+		case BPF_S_LD_W_ABS:
+			k = K;
+load_w:
+			ptr = load_pointer(buf, buflen, k, 4, &tmp);
+			if (ptr != NULL) {
+				/* Note, unlike on network data, values are not
+				 * byte swapped.
+				 */
+				A = *(const u32 *)ptr;
+				continue;
+			}
+			return 0;
+		case BPF_S_LD_H_ABS:
+			k = K;
+load_h:
+			ptr = load_pointer(buf, buflen, k, 2, &tmp);
+			if (ptr != NULL) {
+				A = *(const u16 *)ptr;
+				continue;
+			}
+			return 0;
+		case BPF_S_LD_B_ABS:
+			k = K;
+load_b:
+			ptr = load_pointer(buf, buflen, k, 1, &tmp);
+			if (ptr != NULL) {
+				A = *(const u8 *)ptr;
+				continue;
+			}
+			return 0;
+		case BPF_S_LD_W_LEN:
+			A = buflen;
+			continue;
+		case BPF_S_LDX_W_LEN:
+			X = buflen;
+			continue;
+		case BPF_S_LD_W_IND:
+			k = X + K;
+			goto load_w;
+		case BPF_S_LD_H_IND:
+			k = X + K;
+			goto load_h;
+		case BPF_S_LD_B_IND:
+			k = X + K;
+			goto load_b;
+		case BPF_S_LDX_B_MSH:
+			ptr = load_pointer(buf, buflen, K, 1, &tmp);
+			if (ptr != NULL) {
+				X = (*(u8 *)ptr & 0xf) << 2;
+				continue;
+			}
+			return 0;
+		case BPF_S_LD_IMM:
+			A = K;
+			continue;
+		case BPF_S_LDX_IMM:
+			X = K;
+			continue;
+		case BPF_S_LD_MEM:
+			A = mem[K];
+			continue;
+		case BPF_S_LDX_MEM:
+			X = mem[K];
+			continue;
+		case BPF_S_MISC_TAX:
+			X = A;
+			continue;
+		case BPF_S_MISC_TXA:
+			A = X;
+			continue;
+		case BPF_S_RET_K:
+			return K;
+		case BPF_S_RET_A:
+			return A;
+		case BPF_S_ST:
+			mem[K] = A;
+			continue;
+		case BPF_S_STX:
+			mem[K] = X;
+			continue;
+		case BPF_S_ANC_PROTOCOL:
+		case BPF_S_ANC_PKTTYPE:
+		case BPF_S_ANC_IFINDEX:
+		case BPF_S_ANC_MARK:
+		case BPF_S_ANC_QUEUE:
+		case BPF_S_ANC_HATYPE:
+		case BPF_S_ANC_RXHASH:
+		case BPF_S_ANC_CPU:
+		case BPF_S_ANC_NLATTR:
+		case BPF_S_ANC_NLATTR_NEST:
+			/* ignored */
+			continue;
+		default:
+			WARN_RATELIMIT(1, "Unknown code:%u jt:%u tf:%u k:%u\n",
+				       fentry->code, fentry->jt,
+				       fentry->jf, fentry->k);
+			return 0;
+		}
+	}
+
+	return 0;
+}
diff --git a/kernel/sys.c b/kernel/sys.c
index fbb6248..18b7397 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1782,6 +1782,10 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		case PR_SET_SECCOMP:
 			error = prctl_set_seccomp(arg2);
 			break;
+		case PR_ATTACH_SECCOMP_FILTER:
+			error = prctl_attach_seccomp_filter((char __user *)
+								arg2);
+			break;
 		case PR_GET_TSC:
 			error = GET_TSC_CTL(arg2);
 			break;
diff --git a/security/Kconfig b/security/Kconfig
index 51bd5a0..77b1106 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -84,6 +84,18 @@ config SECURITY_DMESG_RESTRICT
 
 	  If you are unsure how to answer this question, answer N.
 
+config SECCOMP_FILTER
+	bool "Enable seccomp-based system call filtering"
+	select SECCOMP
+	depends on EXPERIMENTAL
+	help
+	  This kernel feature expands CONFIG_SECCOMP to allow computing
+	  in environments with reduced kernel access dictated by a system
+	  call filter, expressed in BPF, installed by the application itself
+	  through prctl(2).
+
+	  See Documentation/prctl/seccomp_filter.txt for more detail.
+
 config SECURITY
 	bool "Enable different security models"
 	depends on SYSFS
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v3 3/3] Documentation: prctl/seccomp_filter
  2012-01-12 23:38 [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch Will Drewry
  2012-01-12 23:38 ` [PATCH v3 2/3] seccomp_filters: system call filtering using BPF Will Drewry
@ 2012-01-12 23:38 ` Will Drewry
  2012-01-15  1:52   ` Randy Dunlap
  2012-01-17 23:29     ` Eric Paris
  2012-01-12 23:47 ` [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch Linus Torvalds
  2 siblings, 2 replies; 47+ messages in thread
From: Will Drewry @ 2012-01-12 23:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: keescook, john.johansen, serge.hallyn, coreyb, pmoore, eparis,
	djm, torvalds, segoon, rostedt, jmorris, scarybeasts, avi,
	penberg, viro, wad, luto, mingo, akpm, khilman, borislav.petkov,
	amwang, oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano,
	linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor,
	corbet, alan

Documents how system call filtering using Berkeley Packet
Filter programs works and how it may be used.
Includes an example for x86 (32-bit).

v3: - call out BPF <-> Berkeley Packet Filter (rdunlap@xenotime.net)
    - document use of tentative always-unprivileged
    - guard sample compilation for i386 and x86_64
v2: - move code to samples (corbet@lwn.net)

Signed-off-by: Will Drewry <wad@chromium.org>
---
 Documentation/prctl/seccomp_filter.txt |   94 ++++++++++++++++++++++++++++++++
 samples/Makefile                       |    2 +-
 samples/seccomp/Makefile               |   18 ++++++
 samples/seccomp/bpf-example.c          |   74 +++++++++++++++++++++++++
 4 files changed, 187 insertions(+), 1 deletions(-)
 create mode 100644 Documentation/prctl/seccomp_filter.txt
 create mode 100644 samples/seccomp/Makefile
 create mode 100644 samples/seccomp/bpf-example.c

diff --git a/Documentation/prctl/seccomp_filter.txt b/Documentation/prctl/seccomp_filter.txt
new file mode 100644
index 0000000..2db8b89
--- /dev/null
+++ b/Documentation/prctl/seccomp_filter.txt
@@ -0,0 +1,94 @@
+		Seccomp filtering
+		=================
+
+Introduction
+------------
+
+A large number of system calls are exposed to every userland process
+with many of them going unused for the entire lifetime of the process.
+As system calls change and mature, bugs are found and eradicated.  A
+certain subset of userland applications benefit by having a reduced set
+of available system calls.  The resulting set reduces the total kernel
+surface exposed to the application.  System call filtering is meant for
+use with those applications.
+
+Seccomp filtering provides a means for a process to specify a filter
+for incoming system calls.  The filter is expressed as a Berkeley Packet
+Filter (BPF) program, as with socket filters, except that the data
+operated on is the current user_regs_struct.  This allows for expressive
+filtering of system calls using the pre-existing system call ABI and
+using a filter program language with a long history of being exposed to
+userland.  Additionally, BPF makes it impossible for users of seccomp to
+fall prey to time-of-check-time-of-use (TOCTOU) attacks that are common
+in system call interposition frameworks because the evaluated data is
+solely register state just after system call entry.
+
+What it isn't
+-------------
+
+System call filtering isn't a sandbox.  It provides a clearly defined
+mechanism for minimizing the exposed kernel surface.  Beyond that,
+policy for logical behavior and information flow should be managed with
+a combinations of other system hardening techniques and, potentially, a
+LSM of your choosing.  Expressive, dynamic filters provide further options down
+this path (avoiding pathological sizes or selecting which of the multiplexed
+system calls in socketcall() is allowed, for instance) which could be
+construed, incorrectly, as a more complete sandboxing solution.
+
+Usage
+-----
+
+An additional seccomp mode is added, but they are not directly set by the
+consuming process.  The new mode, '2', is only available if
+CONFIG_SECCOMP_FILTER is set and enabled using prctl with the
+PR_ATTACH_SECCOMP_FILTER argument.
+
+Interacting with seccomp filters is done using one prctl(2) call.
+
+PR_ATTACH_SECCOMP_FILTER:
+	Allows the specification of a new filter using a BPF program.
+	The BPF program will be executed over a user_regs_struct data
+	reflecting system call time except with the system call number
+	resident in orig_[register].  To allow a system call, the size
+	of the data must be returned.  At present, all other return values
+	result in the system call being blocked, but it is recommended to
+	return 0 in those cases.  This will allow for future custom return
+	values to be introduced, if ever desired.
+
+	Usage:
+		prctl(PR_ATTACH_SECCOMP_FILTER, prog);
+
+	The 'prog' argument is a pointer to a struct sock_fprog which will
+	contain the filter program.  If the program is invalid, the call
+	will return -1 and set errno to -EINVAL.
+
+	The struct user_regs_struct the @prog will see is based on the
+	personality of the task at the time of this prctl call.  Additionally,
+	is_compat_task is also tracked for the @prog.  This means that once set
+	the calling task will have all of its system calls blocked if it
+	switches its system call ABI (via personality or other means).
+
+	If fork/clone and execve are allowed by @prog, any child processes will
+	be constrained to the same filters and syscal call ABI as the parent.
+
+	When called from an unprivileged process (lacking CAP_SYS_ADMIN), the
+	"always_unprivileged" bit is enabled for the process.
+
+	Additionally, if prctl(2) is allowed by the attached filter,
+	additional filters may be layered on which will increase evaluation
+	time, but allow for further decreasing the attack surface during
+	execution of a process.
+
+The above call returns 0 on success and non-zero on error.
+
+Example
+-------
+
+samples/seccomp-bpf-example.c shows an example process that allows read from stdin,
+write to stdout/err, exit and signal returns for 32-bit x86.
+
+Adding architecture support
+-----------------------
+
+Any platform with seccomp support will support seccomp filters
+as long as CONFIG_SECCOMP_FILTER is enabled.
diff --git a/samples/Makefile b/samples/Makefile
index 6280817..f29b19c 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -1,4 +1,4 @@
 # Makefile for Linux samples code
 
 obj-$(CONFIG_SAMPLES)	+= kobject/ kprobes/ tracepoints/ trace_events/ \
-			   hw_breakpoint/ kfifo/ kdb/ hidraw/
+			   hw_breakpoint/ kfifo/ kdb/ hidraw/ seccomp/
diff --git a/samples/seccomp/Makefile b/samples/seccomp/Makefile
new file mode 100644
index 0000000..cdf0282
--- /dev/null
+++ b/samples/seccomp/Makefile
@@ -0,0 +1,18 @@
+# This sample is x86-only.
+ifeq ($(filter-out x86_64 i386,$(KBUILD_BUILDHOST)),)
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
+
+# List of programs to build
+hostprogs-y := bpf-example
+bpf-example-objs := bpf-example.o
+
+# Tell kbuild to always build the programs
+always := $(hostprogs-y)
+
+HOSTCFLAGS_bpf-example.o += -I$(objtree)/usr/include
+ifeq ($(KBUILD_BUILDHOST),x86_64)
+HOSTCFLAGS_bpf-example.o += -m32
+HOSTLOADLIBES_bpf-example += -m32
+endif
+endif  # host arch is x86
diff --git a/samples/seccomp/bpf-example.c b/samples/seccomp/bpf-example.c
new file mode 100644
index 0000000..f98b70a
--- /dev/null
+++ b/samples/seccomp/bpf-example.c
@@ -0,0 +1,74 @@
+/*
+ * Seccomp BPF example
+ *
+ * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ * Author: Will Drewry <wad@chromium.org>
+ *
+ * The code may be used by anyone for any purpose,
+ * and can serve as a starting point for developing
+ * applications using prctl(PR_ATTACH_SECCOMP_FILTER).
+ */
+
+#include <asm/unistd.h>
+#include <linux/filter.h>
+#include <stdio.h>
+#include <stddef.h>
+#include <sys/prctl.h>
+#include <sys/user.h>
+#include <unistd.h>
+
+#ifndef PR_ATTACH_SECCOMP_FILTER
+#	define PR_ATTACH_SECCOMP_FILTER 36
+#endif
+
+#define regoffset(_reg) (offsetof(struct user_regs_struct, _reg))
+static int install_filter(void)
+{
+	struct sock_filter filter[] = {
+		/* Grab the system call number */
+		BPF_STMT(BPF_LD+BPF_W+BPF_IND, regoffset(orig_eax)),
+		/* Jump table for the allowed syscalls */
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_rt_sigreturn, 10, 0),
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_sigreturn, 9, 0),
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit_group, 8, 0),
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit, 7, 0),
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 1, 0),
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_write, 2, 6),
+
+		/* Check that read is only using stdin. */
+		BPF_STMT(BPF_LD+BPF_W+BPF_IND, regoffset(ebx)),
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDIN_FILENO, 3, 4),
+
+		/* Check that write is only using stdout/stderr */
+		BPF_STMT(BPF_LD+BPF_W+BPF_IND, regoffset(ebx)),
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDOUT_FILENO, 1, 0),
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDERR_FILENO, 0, 1),
+
+		/* Put the "accept" value in A */
+		BPF_STMT(BPF_LD+BPF_W+BPF_LEN, 0),
+
+		BPF_STMT(BPF_RET+BPF_A,0),
+	};
+	struct sock_fprog prog = {
+		.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
+		.filter = filter,
+	};
+	if (prctl(PR_ATTACH_SECCOMP_FILTER, &prog)) {
+		perror("prctl");
+		return 1;
+	}
+	return 0;
+}
+
+#define payload(_c) _c, sizeof(_c)
+int main(int argc, char **argv) {
+	char buf[4096];
+	ssize_t bytes = 0;
+	if (install_filter())
+		return 1;
+	syscall(__NR_write, STDOUT_FILENO, payload("OHAI! WHAT IS YOUR NAME? "));
+	bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf));
+	syscall(__NR_write, STDOUT_FILENO, payload("HELLO, "));
+	syscall(__NR_write, STDOUT_FILENO, buf, bytes);
+	return 0;
+}
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-12 23:38 [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch Will Drewry
  2012-01-12 23:38 ` [PATCH v3 2/3] seccomp_filters: system call filtering using BPF Will Drewry
  2012-01-12 23:38 ` [PATCH v3 3/3] Documentation: prctl/seccomp_filter Will Drewry
@ 2012-01-12 23:47 ` Linus Torvalds
  2012-01-13  0:03   ` Will Drewry
  2012-01-13  0:42   ` Andrew Lutomirski
  2 siblings, 2 replies; 47+ messages in thread
From: Linus Torvalds @ 2012-01-12 23:47 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, keescook, john.johansen, serge.hallyn, coreyb,
	pmoore, eparis, djm, segoon, rostedt, jmorris, scarybeasts, avi,
	penberg, viro, luto, mingo, akpm, khilman, borislav.petkov,
	amwang, oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano,
	linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor,
	corbet, alan

On Thu, Jan 12, 2012 at 3:38 PM, Will Drewry <wad@chromium.org> wrote:
> This patch is a placeholder until Andy's (luto@mit.edu) patch arrives
> implementing Linus's proposal for applying a "this is a process that has
> *no* extra privileges at all, and can never get them".

I think we can simplify and improve the naming/logic by just saying
"can't change privileges".

I'd argue that that even includes "can't drop them", just to make it
really clear what the rules are.

So the usage model would be to first simply set the privileges to
whatever you want the sandbox to be, and then enter the restricted
mode.

                    Linus

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 2/3] seccomp_filters: system call filtering using BPF
  2012-01-13  0:51   ` Randy Dunlap
@ 2012-01-12 23:59       ` Will Drewry
  0 siblings, 0 replies; 47+ messages in thread
From: Will Drewry @ 2012-01-12 23:59 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-kernel, keescook, john.johansen, serge.hallyn, coreyb,
	pmoore, eparis, djm, torvalds, segoon, rostedt, jmorris,
	scarybeasts, avi, penberg, viro, luto, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Thu, Jan 12, 2012 at 6:51 PM, Randy Dunlap <rdunlap@xenotime.net> wrote:
> On 01/12/2012 03:38 PM, Will Drewry wrote:
>>  include/linux/prctl.h   |    3 +
>>  include/linux/seccomp.h |   68 +++++-
>>  kernel/Makefile         |    1 +
>>  kernel/fork.c           |    4 +
>>  kernel/seccomp.c        |    8 +
>>  kernel/seccomp_filter.c |  620 +++++++++++++++++++++++++++++++++++++++++++++++
>>  kernel/sys.c            |    4 +
>>  security/Kconfig        |   12 +
>>  8 files changed, 717 insertions(+), 3 deletions(-)
>>  create mode 100644 kernel/seccomp_filter.c
>>
>> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
>> index cc7a4e9..0296871 100644
>> --- a/include/linux/seccomp.h
>> +++ b/include/linux/seccomp.h
>> @@ -5,9 +5,28 @@
>>  #ifdef CONFIG_SECCOMP
>>
>>  #include <linux/thread_info.h>
>> +#include <linux/types.h>
>>  #include <asm/seccomp.h>
>>
>> -typedef struct { int mode; } seccomp_t;
>> +struct seccomp_filter;
>> +/**
>> + * struct seccomp_struct - the state of a seccomp'ed process
>> + *
>> + * @mode:
>> + *     if this is 0, seccomp is not in use.
>> + *             is 1, the process is under standard seccomp rules.
>> + *             is 2, the process is only allowed to make system calls where
>> + *                   associated filters evaluate successfully.
>> + * @filter: Metadata for filter if using CONFIG_SECCOMP_FILTER.
>> + *          @filter must only be accessed from the context of current as there
>> + *          is no guard.
>> + */
>> +typedef struct seccomp_struct {
>> +     int mode;
>> +#ifdef CONFIG_SECCOMP_FILTER
>> +     struct seccomp_filter *filter;
>> +#endif
>> +} seccomp_t;
>>
>>  extern void __secure_computing(int);
>>  static inline void secure_computing(int this_syscall)
>> @@ -28,8 +47,7 @@ static inline int seccomp_mode(seccomp_t *s)
>>
>>  #include <linux/errno.h>
>>
>> -typedef struct { } seccomp_t;
>> -
>> +typedef struct seccomp_struct { } seccomp_t;
>>  #define secure_computing(x) do { } while (0)
>>
>>  static inline long prctl_get_seccomp(void)
>> @@ -49,4 +67,48 @@ static inline int seccomp_mode(seccomp_t *s)
>>
>>  #endif /* CONFIG_SECCOMP */
>>
>> +#ifdef CONFIG_SECCOMP_FILTER
>> +
>> +
>> +extern long prctl_attach_seccomp_filter(char __user *);
>> +
>> +extern struct seccomp_filter *get_seccomp_filter(struct seccomp_filter *);
>> +extern void put_seccomp_filter(struct seccomp_filter *);
>> +
>> +extern int seccomp_test_filters(int);
>> +extern void seccomp_filter_log_failure(int);
>> +extern void seccomp_struct_fork(struct seccomp_struct *child,
>> +                             const struct seccomp_struct *parent);
>> +
>> +static inline void seccomp_struct_init_task(struct seccomp_struct *seccomp)
>> +{
>> +     seccomp->mode = 0;
>> +     seccomp->filter = NULL;
>> +}
>> +
>> +/* No locking is needed here because the task_struct will
>> + * have no parallel consumers.
>> + */
>
> (in multiple places:) Kernel multi-line comment style is:
>
> /*
>  * first line of text
>  * more stuff
>  */

Thanks! I'll roll through and clean them all up.  My apologies!

>> +static inline void seccomp_struct_free_task(struct seccomp_struct *seccomp)
>> +{
>> +     put_seccomp_filter(seccomp->filter);
>> +     seccomp->filter = NULL;
>> +}
>> +
>> +#else  /* CONFIG_SECCOMP_FILTER */
>> +
>> +#include <linux/errno.h>
>> +
>> +struct seccomp_filter { };
>> +/* Macros consume the unused dereference by the caller. */
>> +#define seccomp_struct_init_task(_seccomp) do { } while (0);
>> +#define seccomp_struct_fork(_tsk, _orig) do { } while (0);
>> +#define seccomp_struct_free_task(_seccomp) do { } while (0);
>> +
>> +static inline long prctl_attach_seccomp_filter(char __user *a2)
>> +{
>> +     return -ENOSYS;
>> +}
>> +
>> +#endif  /* CONFIG_SECCOMP_FILTER */
>>  #endif /* _LINUX_SECCOMP_H */
>
>
>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>> index 57d4b13..78719be 100644
>> --- a/kernel/seccomp.c
>> +++ b/kernel/seccomp.c
>> @@ -47,6 +47,14 @@ void __secure_computing(int this_syscall)
>>                               return;
>>               } while (*++syscall);
>>               break;
>> +#ifdef CONFIG_SECCOMP_FILTER
>> +     case 2:
>
> Can we get macros (defines) for these @modes instead of using
> inline constants?

Certainly!

>> +             if (seccomp_test_filters(this_syscall) == 0)
>> +                     return;
>> +
>> +             seccomp_filter_log_failure(this_syscall);
>> +             break;
>> +#endif
>>       default:
>>               BUG();
>>       }
>> diff --git a/kernel/seccomp_filter.c b/kernel/seccomp_filter.c
>> new file mode 100644
>> index 0000000..108a3f3
>> --- /dev/null
>> +++ b/kernel/seccomp_filter.c
>> @@ -0,0 +1,620 @@
>> +/* bpf program-based system call filtering
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
>> + *
>> + * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
>> + */
>> +

Aside from being a year off, is there a current style?  I just went
off of an existing file.

>
>> +/* seccomp_struct_fork: manages inheritance on fork
>
> /**
>  * seccomp_struct_fork - manages inheritance on fork

Thanks - sorry!

>> + * @child: forkee's seccomp_struct
>> + * @parent: forker's seccomp_struct
>> + * Ensures that @child inherit a seccomp_filter iff seccomp is enabled
>> + * and the set of filters is marked as 'enabled'.
>> + */
>> +void seccomp_struct_fork(struct seccomp_struct *child,
>> +                      const struct seccomp_struct *parent)
>> +{
>> +     if (!parent->mode)
>> +             return;
>> +     child->mode = parent->mode;
>> +     child->filter = get_seccomp_filter(parent->filter);
>> +}
>> +
>> +/* Returns a pointer to the BPF evaluator after checking the offset and size
>> + * boundaries.  The signature almost matches the signature from
>> + * net/core/filter.c with the hopes of sharing code in the future.
>
> Use kernel multi-line comment style.

Of course.

>> + */
>> +static const void *load_pointer(const u8 *buf, size_t buflen,
>> +                             int offset, size_t size,
>> +                             void *unused)
>> +{
>> +     if (offset >= buflen)
>> +             goto fail;
>> +     if (offset < 0)
>> +             goto fail;
>> +     if (size > buflen - offset)
>> +             goto fail;
>> +     return buf + offset;
>> +fail:
>> +     return NULL;
>> +}
>> +
>
>> diff --git a/security/Kconfig b/security/Kconfig
>> index 51bd5a0..77b1106 100644
>> --- a/security/Kconfig
>> +++ b/security/Kconfig
>> @@ -84,6 +84,18 @@ config SECURITY_DMESG_RESTRICT
>>
>>         If you are unsure how to answer this question, answer N.
>>
>> +config SECCOMP_FILTER
>> +     bool "Enable seccomp-based system call filtering"
>> +     select SECCOMP
>> +     depends on EXPERIMENTAL
>> +     help
>> +       This kernel feature expands CONFIG_SECCOMP to allow computing
>> +       in environments with reduced kernel access dictated by a system
>> +       call filter, expressed in BPF, installed by the application itself
>> +       through prctl(2).
>
> This help text is only useful to someone who already knows what it does/means
> IMO.

I'll attempt to clean that up so it makes actual sense to those without context!

>> +
>> +       See Documentation/prctl/seccomp_filter.txt for more detail.
>
> Yes, I'll look at that..

Awesome - thanks!

>> +
>>  config SECURITY
>>       bool "Enable different security models"
>>       depends on SYSFS
>
>
> --
> ~Randy
> *** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 2/3] seccomp_filters: system call filtering using BPF
@ 2012-01-12 23:59       ` Will Drewry
  0 siblings, 0 replies; 47+ messages in thread
From: Will Drewry @ 2012-01-12 23:59 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-kernel, keescook, john.johansen, serge.hallyn, coreyb,
	pmoore, eparis, djm, torvalds, segoon, rostedt, jmorris,
	scarybeasts, avi, penberg, viro, luto, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Thu, Jan 12, 2012 at 6:51 PM, Randy Dunlap <rdunlap@xenotime.net> wrote:
> On 01/12/2012 03:38 PM, Will Drewry wrote:
>>  include/linux/prctl.h   |    3 +
>>  include/linux/seccomp.h |   68 +++++-
>>  kernel/Makefile         |    1 +
>>  kernel/fork.c           |    4 +
>>  kernel/seccomp.c        |    8 +
>>  kernel/seccomp_filter.c |  620 +++++++++++++++++++++++++++++++++++++++++++++++
>>  kernel/sys.c            |    4 +
>>  security/Kconfig        |   12 +
>>  8 files changed, 717 insertions(+), 3 deletions(-)
>>  create mode 100644 kernel/seccomp_filter.c
>>
>> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
>> index cc7a4e9..0296871 100644
>> --- a/include/linux/seccomp.h
>> +++ b/include/linux/seccomp.h
>> @@ -5,9 +5,28 @@
>>  #ifdef CONFIG_SECCOMP
>>
>>  #include <linux/thread_info.h>
>> +#include <linux/types.h>
>>  #include <asm/seccomp.h>
>>
>> -typedef struct { int mode; } seccomp_t;
>> +struct seccomp_filter;
>> +/**
>> + * struct seccomp_struct - the state of a seccomp'ed process
>> + *
>> + * @mode:
>> + *     if this is 0, seccomp is not in use.
>> + *             is 1, the process is under standard seccomp rules.
>> + *             is 2, the process is only allowed to make system calls where
>> + *                   associated filters evaluate successfully.
>> + * @filter: Metadata for filter if using CONFIG_SECCOMP_FILTER.
>> + *          @filter must only be accessed from the context of current as there
>> + *          is no guard.
>> + */
>> +typedef struct seccomp_struct {
>> +     int mode;
>> +#ifdef CONFIG_SECCOMP_FILTER
>> +     struct seccomp_filter *filter;
>> +#endif
>> +} seccomp_t;
>>
>>  extern void __secure_computing(int);
>>  static inline void secure_computing(int this_syscall)
>> @@ -28,8 +47,7 @@ static inline int seccomp_mode(seccomp_t *s)
>>
>>  #include <linux/errno.h>
>>
>> -typedef struct { } seccomp_t;
>> -
>> +typedef struct seccomp_struct { } seccomp_t;
>>  #define secure_computing(x) do { } while (0)
>>
>>  static inline long prctl_get_seccomp(void)
>> @@ -49,4 +67,48 @@ static inline int seccomp_mode(seccomp_t *s)
>>
>>  #endif /* CONFIG_SECCOMP */
>>
>> +#ifdef CONFIG_SECCOMP_FILTER
>> +
>> +
>> +extern long prctl_attach_seccomp_filter(char __user *);
>> +
>> +extern struct seccomp_filter *get_seccomp_filter(struct seccomp_filter *);
>> +extern void put_seccomp_filter(struct seccomp_filter *);
>> +
>> +extern int seccomp_test_filters(int);
>> +extern void seccomp_filter_log_failure(int);
>> +extern void seccomp_struct_fork(struct seccomp_struct *child,
>> +                             const struct seccomp_struct *parent);
>> +
>> +static inline void seccomp_struct_init_task(struct seccomp_struct *seccomp)
>> +{
>> +     seccomp->mode = 0;
>> +     seccomp->filter = NULL;
>> +}
>> +
>> +/* No locking is needed here because the task_struct will
>> + * have no parallel consumers.
>> + */
>
> (in multiple places:) Kernel multi-line comment style is:
>
> /*
>  * first line of text
>  * more stuff
>  */

Thanks! I'll roll through and clean them all up.  My apologies!

>> +static inline void seccomp_struct_free_task(struct seccomp_struct *seccomp)
>> +{
>> +     put_seccomp_filter(seccomp->filter);
>> +     seccomp->filter = NULL;
>> +}
>> +
>> +#else  /* CONFIG_SECCOMP_FILTER */
>> +
>> +#include <linux/errno.h>
>> +
>> +struct seccomp_filter { };
>> +/* Macros consume the unused dereference by the caller. */
>> +#define seccomp_struct_init_task(_seccomp) do { } while (0);
>> +#define seccomp_struct_fork(_tsk, _orig) do { } while (0);
>> +#define seccomp_struct_free_task(_seccomp) do { } while (0);
>> +
>> +static inline long prctl_attach_seccomp_filter(char __user *a2)
>> +{
>> +     return -ENOSYS;
>> +}
>> +
>> +#endif  /* CONFIG_SECCOMP_FILTER */
>>  #endif /* _LINUX_SECCOMP_H */
>
>
>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>> index 57d4b13..78719be 100644
>> --- a/kernel/seccomp.c
>> +++ b/kernel/seccomp.c
>> @@ -47,6 +47,14 @@ void __secure_computing(int this_syscall)
>>                               return;
>>               } while (*++syscall);
>>               break;
>> +#ifdef CONFIG_SECCOMP_FILTER
>> +     case 2:
>
> Can we get macros (defines) for these @modes instead of using
> inline constants?

Certainly!

>> +             if (seccomp_test_filters(this_syscall) == 0)
>> +                     return;
>> +
>> +             seccomp_filter_log_failure(this_syscall);
>> +             break;
>> +#endif
>>       default:
>>               BUG();
>>       }
>> diff --git a/kernel/seccomp_filter.c b/kernel/seccomp_filter.c
>> new file mode 100644
>> index 0000000..108a3f3
>> --- /dev/null
>> +++ b/kernel/seccomp_filter.c
>> @@ -0,0 +1,620 @@
>> +/* bpf program-based system call filtering
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
>> + *
>> + * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
>> + */
>> +

Aside from being a year off, is there a current style?  I just went
off of an existing file.

>
>> +/* seccomp_struct_fork: manages inheritance on fork
>
> /**
>  * seccomp_struct_fork - manages inheritance on fork

Thanks - sorry!

>> + * @child: forkee's seccomp_struct
>> + * @parent: forker's seccomp_struct
>> + * Ensures that @child inherit a seccomp_filter iff seccomp is enabled
>> + * and the set of filters is marked as 'enabled'.
>> + */
>> +void seccomp_struct_fork(struct seccomp_struct *child,
>> +                      const struct seccomp_struct *parent)
>> +{
>> +     if (!parent->mode)
>> +             return;
>> +     child->mode = parent->mode;
>> +     child->filter = get_seccomp_filter(parent->filter);
>> +}
>> +
>> +/* Returns a pointer to the BPF evaluator after checking the offset and size
>> + * boundaries.  The signature almost matches the signature from
>> + * net/core/filter.c with the hopes of sharing code in the future.
>
> Use kernel multi-line comment style.

Of course.

>> + */
>> +static const void *load_pointer(const u8 *buf, size_t buflen,
>> +                             int offset, size_t size,
>> +                             void *unused)
>> +{
>> +     if (offset >= buflen)
>> +             goto fail;
>> +     if (offset < 0)
>> +             goto fail;
>> +     if (size > buflen - offset)
>> +             goto fail;
>> +     return buf + offset;
>> +fail:
>> +     return NULL;
>> +}
>> +
>
>> diff --git a/security/Kconfig b/security/Kconfig
>> index 51bd5a0..77b1106 100644
>> --- a/security/Kconfig
>> +++ b/security/Kconfig
>> @@ -84,6 +84,18 @@ config SECURITY_DMESG_RESTRICT
>>
>>         If you are unsure how to answer this question, answer N.
>>
>> +config SECCOMP_FILTER
>> +     bool "Enable seccomp-based system call filtering"
>> +     select SECCOMP
>> +     depends on EXPERIMENTAL
>> +     help
>> +       This kernel feature expands CONFIG_SECCOMP to allow computing
>> +       in environments with reduced kernel access dictated by a system
>> +       call filter, expressed in BPF, installed by the application itself
>> +       through prctl(2).
>
> This help text is only useful to someone who already knows what it does/means
> IMO.

I'll attempt to clean that up so it makes actual sense to those without context!

>> +
>> +       See Documentation/prctl/seccomp_filter.txt for more detail.
>
> Yes, I'll look at that..

Awesome - thanks!

>> +
>>  config SECURITY
>>       bool "Enable different security models"
>>       depends on SYSFS
>
>
> --
> ~Randy
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-12 23:47 ` [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch Linus Torvalds
@ 2012-01-13  0:03   ` Will Drewry
  2012-01-13  0:42   ` Andrew Lutomirski
  1 sibling, 0 replies; 47+ messages in thread
From: Will Drewry @ 2012-01-13  0:03 UTC (permalink / raw)
  To: Linus Torvalds, luto
  Cc: linux-kernel, keescook, john.johansen, serge.hallyn, coreyb,
	pmoore, eparis, djm, segoon, rostedt, jmorris, scarybeasts, avi,
	penberg, viro, mingo, akpm, khilman, borislav.petkov, amwang,
	oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano,
	linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor,
	corbet, alan

On Thu, Jan 12, 2012 at 5:47 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jan 12, 2012 at 3:38 PM, Will Drewry <wad@chromium.org> wrote:
>> This patch is a placeholder until Andy's (luto@mit.edu) patch arrives
>> implementing Linus's proposal for applying a "this is a process that has
>> *no* extra privileges at all, and can never get them".
>
> I think we can simplify and improve the naming/logic by just saying
> "can't change privileges".
>
> I'd argue that that even includes "can't drop them", just to make it
> really clear what the rules are.
>
> So the usage model would be to first simply set the privileges to
> whatever you want the sandbox to be, and then enter the restricted
> mode.

That sounds ideal to me.  This placeholder is certainly insufficient
then. I'll keep tweaking this patch then until its successor emerges.

Thanks!
will

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-12 23:47 ` [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch Linus Torvalds
  2012-01-13  0:03   ` Will Drewry
@ 2012-01-13  0:42   ` Andrew Lutomirski
  2012-01-13  0:57       ` Linus Torvalds
  1 sibling, 1 reply; 47+ messages in thread
From: Andrew Lutomirski @ 2012-01-13  0:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Will Drewry, linux-kernel, keescook, john.johansen, serge.hallyn,
	coreyb, pmoore, eparis, djm, segoon, rostedt, jmorris,
	scarybeasts, avi, penberg, viro, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Thu, Jan 12, 2012 at 3:47 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jan 12, 2012 at 3:38 PM, Will Drewry <wad@chromium.org> wrote:
>> This patch is a placeholder until Andy's (luto@mit.edu) patch arrives
>> implementing Linus's proposal for applying a "this is a process that has
>> *no* extra privileges at all, and can never get them".
>
> I think we can simplify and improve the naming/logic by just saying
> "can't change privileges".
>
> I'd argue that that even includes "can't drop them", just to make it
> really clear what the rules are.

That may prevent another use: set this new flag, chroot, drop
privileges, accept network connections.  (The idea being that chroot
might work unprivileged if this flag is set.)

--Andy

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 2/3] seccomp_filters: system call filtering using BPF
  2012-01-12 23:38 ` [PATCH v3 2/3] seccomp_filters: system call filtering using BPF Will Drewry
@ 2012-01-13  0:51   ` Randy Dunlap
  2012-01-12 23:59       ` Will Drewry
  2012-01-13 17:39   ` Eric Paris
  1 sibling, 1 reply; 47+ messages in thread
From: Randy Dunlap @ 2012-01-13  0:51 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, keescook, john.johansen, serge.hallyn, coreyb,
	pmoore, eparis, djm, torvalds, segoon, rostedt, jmorris,
	scarybeasts, avi, penberg, viro, luto, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On 01/12/2012 03:38 PM, Will Drewry wrote:
>  include/linux/prctl.h   |    3 +
>  include/linux/seccomp.h |   68 +++++-
>  kernel/Makefile         |    1 +
>  kernel/fork.c           |    4 +
>  kernel/seccomp.c        |    8 +
>  kernel/seccomp_filter.c |  620 +++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/sys.c            |    4 +
>  security/Kconfig        |   12 +
>  8 files changed, 717 insertions(+), 3 deletions(-)
>  create mode 100644 kernel/seccomp_filter.c
> 
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index cc7a4e9..0296871 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -5,9 +5,28 @@
>  #ifdef CONFIG_SECCOMP
>  
>  #include <linux/thread_info.h>
> +#include <linux/types.h>
>  #include <asm/seccomp.h>
>  
> -typedef struct { int mode; } seccomp_t;
> +struct seccomp_filter;
> +/**
> + * struct seccomp_struct - the state of a seccomp'ed process
> + *
> + * @mode:
> + *     if this is 0, seccomp is not in use.
> + *             is 1, the process is under standard seccomp rules.
> + *             is 2, the process is only allowed to make system calls where
> + *                   associated filters evaluate successfully.
> + * @filter: Metadata for filter if using CONFIG_SECCOMP_FILTER.
> + *          @filter must only be accessed from the context of current as there
> + *          is no guard.
> + */
> +typedef struct seccomp_struct {
> +	int mode;
> +#ifdef CONFIG_SECCOMP_FILTER
> +	struct seccomp_filter *filter;
> +#endif
> +} seccomp_t;
>  
>  extern void __secure_computing(int);
>  static inline void secure_computing(int this_syscall)
> @@ -28,8 +47,7 @@ static inline int seccomp_mode(seccomp_t *s)
>  
>  #include <linux/errno.h>
>  
> -typedef struct { } seccomp_t;
> -
> +typedef struct seccomp_struct { } seccomp_t;
>  #define secure_computing(x) do { } while (0)
>  
>  static inline long prctl_get_seccomp(void)
> @@ -49,4 +67,48 @@ static inline int seccomp_mode(seccomp_t *s)
>  
>  #endif /* CONFIG_SECCOMP */
>  
> +#ifdef CONFIG_SECCOMP_FILTER
> +
> +
> +extern long prctl_attach_seccomp_filter(char __user *);
> +
> +extern struct seccomp_filter *get_seccomp_filter(struct seccomp_filter *);
> +extern void put_seccomp_filter(struct seccomp_filter *);
> +
> +extern int seccomp_test_filters(int);
> +extern void seccomp_filter_log_failure(int);
> +extern void seccomp_struct_fork(struct seccomp_struct *child,
> +				const struct seccomp_struct *parent);
> +
> +static inline void seccomp_struct_init_task(struct seccomp_struct *seccomp)
> +{
> +	seccomp->mode = 0;
> +	seccomp->filter = NULL;
> +}
> +
> +/* No locking is needed here because the task_struct will
> + * have no parallel consumers.
> + */

(in multiple places:) Kernel multi-line comment style is:

/*
 * first line of text
 * more stuff
 */


> +static inline void seccomp_struct_free_task(struct seccomp_struct *seccomp)
> +{
> +	put_seccomp_filter(seccomp->filter);
> +	seccomp->filter = NULL;
> +}
> +
> +#else  /* CONFIG_SECCOMP_FILTER */
> +
> +#include <linux/errno.h>
> +
> +struct seccomp_filter { };
> +/* Macros consume the unused dereference by the caller. */
> +#define seccomp_struct_init_task(_seccomp) do { } while (0);
> +#define seccomp_struct_fork(_tsk, _orig) do { } while (0);
> +#define seccomp_struct_free_task(_seccomp) do { } while (0);
> +
> +static inline long prctl_attach_seccomp_filter(char __user *a2)
> +{
> +	return -ENOSYS;
> +}
> +
> +#endif  /* CONFIG_SECCOMP_FILTER */
>  #endif /* _LINUX_SECCOMP_H */


> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 57d4b13..78719be 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -47,6 +47,14 @@ void __secure_computing(int this_syscall)
>  				return;
>  		} while (*++syscall);
>  		break;
> +#ifdef CONFIG_SECCOMP_FILTER
> +	case 2:

Can we get macros (defines) for these @modes instead of using
inline constants?

> +		if (seccomp_test_filters(this_syscall) == 0)
> +			return;
> +
> +		seccomp_filter_log_failure(this_syscall);
> +		break;
> +#endif
>  	default:
>  		BUG();
>  	}
> diff --git a/kernel/seccomp_filter.c b/kernel/seccomp_filter.c
> new file mode 100644
> index 0000000..108a3f3
> --- /dev/null
> +++ b/kernel/seccomp_filter.c
> @@ -0,0 +1,620 @@
> +/* bpf program-based system call filtering
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + *
> + * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
> + */
> +
...

> +/* seccomp_struct_fork: manages inheritance on fork

/**
  * seccomp_struct_fork - manages inheritance on fork

> + * @child: forkee's seccomp_struct
> + * @parent: forker's seccomp_struct
> + * Ensures that @child inherit a seccomp_filter iff seccomp is enabled
> + * and the set of filters is marked as 'enabled'.
> + */
> +void seccomp_struct_fork(struct seccomp_struct *child,
> +			 const struct seccomp_struct *parent)
> +{
> +	if (!parent->mode)
> +		return;
> +	child->mode = parent->mode;
> +	child->filter = get_seccomp_filter(parent->filter);
> +}
> +
> +/* Returns a pointer to the BPF evaluator after checking the offset and size
> + * boundaries.  The signature almost matches the signature from
> + * net/core/filter.c with the hopes of sharing code in the future.

Use kernel multi-line comment style.

> + */
> +static const void *load_pointer(const u8 *buf, size_t buflen,
> +				int offset, size_t size,
> +				void *unused)
> +{
> +	if (offset >= buflen)
> +		goto fail;
> +	if (offset < 0)
> +		goto fail;
> +	if (size > buflen - offset)
> +		goto fail;
> +	return buf + offset;
> +fail:
> +	return NULL;
> +}
> +

> diff --git a/security/Kconfig b/security/Kconfig
> index 51bd5a0..77b1106 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -84,6 +84,18 @@ config SECURITY_DMESG_RESTRICT
>  
>  	  If you are unsure how to answer this question, answer N.
>  
> +config SECCOMP_FILTER
> +	bool "Enable seccomp-based system call filtering"
> +	select SECCOMP
> +	depends on EXPERIMENTAL
> +	help
> +	  This kernel feature expands CONFIG_SECCOMP to allow computing
> +	  in environments with reduced kernel access dictated by a system
> +	  call filter, expressed in BPF, installed by the application itself
> +	  through prctl(2).

This help text is only useful to someone who already knows what it does/means
IMO.

> +
> +	  See Documentation/prctl/seccomp_filter.txt for more detail.

Yes, I'll look at that..


> +
>  config SECURITY
>  	bool "Enable different security models"
>  	depends on SYSFS


-- 
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-13  0:42   ` Andrew Lutomirski
@ 2012-01-13  0:57       ` Linus Torvalds
  0 siblings, 0 replies; 47+ messages in thread
From: Linus Torvalds @ 2012-01-13  0:57 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Will Drewry, linux-kernel, keescook, john.johansen, serge.hallyn,
	coreyb, pmoore, eparis, djm, segoon, rostedt, jmorris,
	scarybeasts, avi, penberg, viro, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Thu, Jan 12, 2012 at 4:42 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>
> That may prevent another use: set this new flag, chroot, drop
> privileges, accept network connections.  (The idea being that chroot
> might work unprivileged if this flag is set.)

Well, if you have privileges, then just do

   chroot();
   drop privileges

and if you depend on the new flag, then you do

   drop privileges
   set new flag
   chroot

and if you want to work either way then you just do

   error = chroot
   drop privileges
   set new flag
   if error
      chroot

which does the right thing regardless of whether you had privileges
and/or a new kernel or not.

In any of the three cases I don't see why you'd ever want to drop
privileges *after* setting the new flag.

                   Linus

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
@ 2012-01-13  0:57       ` Linus Torvalds
  0 siblings, 0 replies; 47+ messages in thread
From: Linus Torvalds @ 2012-01-13  0:57 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Will Drewry, linux-kernel, keescook, john.johansen, serge.hallyn,
	coreyb, pmoore, eparis, djm, segoon, rostedt, jmorris,
	scarybeasts, avi, penberg, viro, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Thu, Jan 12, 2012 at 4:42 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>
> That may prevent another use: set this new flag, chroot, drop
> privileges, accept network connections.  (The idea being that chroot
> might work unprivileged if this flag is set.)

Well, if you have privileges, then just do

   chroot();
   drop privileges

and if you depend on the new flag, then you do

   drop privileges
   set new flag
   chroot

and if you want to work either way then you just do

   error = chroot
   drop privileges
   set new flag
   if error
      chroot

which does the right thing regardless of whether you had privileges
and/or a new kernel or not.

In any of the three cases I don't see why you'd ever want to drop
privileges *after* setting the new flag.

                   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-13  0:57       ` Linus Torvalds
@ 2012-01-13  1:11         ` Andrew Lutomirski
  -1 siblings, 0 replies; 47+ messages in thread
From: Andrew Lutomirski @ 2012-01-13  1:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Will Drewry, linux-kernel, keescook, john.johansen, serge.hallyn,
	coreyb, pmoore, eparis, djm, segoon, rostedt, jmorris,
	scarybeasts, avi, penberg, viro, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Thu, Jan 12, 2012 at 4:57 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jan 12, 2012 at 4:42 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>>
>> That may prevent another use: set this new flag, chroot, drop
>> privileges, accept network connections.  (The idea being that chroot
>> might work unprivileged if this flag is set.)
>
> Well, if you have privileges, then just do
>
>   chroot();
>   drop privileges
>
> and if you depend on the new flag, then you do
>
>   drop privileges
>   set new flag
>   chroot
>
> and if you want to work either way then you just do
>
>   error = chroot
>   drop privileges
>   set new flag
>   if error
>      chroot
>
> which does the right thing regardless of whether you had privileges
> and/or a new kernel or not.
>
> In any of the three cases I don't see why you'd ever want to drop
> privileges *after* setting the new flag.

Hmm...

What if you're a daemon that needs something like CAP_NET_BIND but
also wants to be able to run other helpers without CAP_NET_BIND?

(Also, preventing dropping of privileges will probably make a patch
more complicted -- I'll have to find and update all the places that
allow dropping privileges.)

--Andy

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
@ 2012-01-13  1:11         ` Andrew Lutomirski
  0 siblings, 0 replies; 47+ messages in thread
From: Andrew Lutomirski @ 2012-01-13  1:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Will Drewry, linux-kernel, keescook, john.johansen, serge.hallyn,
	coreyb, pmoore, eparis, djm, segoon, rostedt, jmorris,
	scarybeasts, avi, penberg, viro, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Thu, Jan 12, 2012 at 4:57 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jan 12, 2012 at 4:42 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>>
>> That may prevent another use: set this new flag, chroot, drop
>> privileges, accept network connections.  (The idea being that chroot
>> might work unprivileged if this flag is set.)
>
> Well, if you have privileges, then just do
>
>   chroot();
>   drop privileges
>
> and if you depend on the new flag, then you do
>
>   drop privileges
>   set new flag
>   chroot
>
> and if you want to work either way then you just do
>
>   error = chroot
>   drop privileges
>   set new flag
>   if error
>      chroot
>
> which does the right thing regardless of whether you had privileges
> and/or a new kernel or not.
>
> In any of the three cases I don't see why you'd ever want to drop
> privileges *after* setting the new flag.

Hmm...

What if you're a daemon that needs something like CAP_NET_BIND but
also wants to be able to run other helpers without CAP_NET_BIND?

(Also, preventing dropping of privileges will probably make a patch
more complicted -- I'll have to find and update all the places that
allow dropping privileges.)

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-13  1:11         ` Andrew Lutomirski
  (?)
@ 2012-01-13  1:17         ` Linus Torvalds
  2012-01-14 13:30           ` Jamie Lokier
  -1 siblings, 1 reply; 47+ messages in thread
From: Linus Torvalds @ 2012-01-13  1:17 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Will Drewry, linux-kernel, keescook, john.johansen, serge.hallyn,
	coreyb, pmoore, eparis, djm, segoon, rostedt, jmorris,
	scarybeasts, avi, penberg, viro, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Thu, Jan 12, 2012 at 5:11 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>
> What if you're a daemon that needs something like CAP_NET_BIND but
> also wants to be able to run other helpers without CAP_NET_BIND?
>
> (Also, preventing dropping of privileges will probably make a patch
> more complicted -- I'll have to find and update all the places that
> allow dropping privileges.)

Hey, if it actually makes it more complicated to say "don't change
privileges", then I guess my argument that it should be simpler is
wrong.

That said, the thing you bring up is *not* the actual use-case for the
suggestion. The use-case is a "run untrusted code". So the use-case
would be to set the flag after you've dropped CAP_NET_BIND, and
*before* you actually run the other helpers. You clearly must have a
fork() or something like that there, since you want to keep the
NET_BIND in the original daemon.

So I don't t think your example is actually the expected situation.
The obvious approach for your example is
 - run deamon with CAP_NET_BIND
 - per connection, fork, drop privileges in child, then set
"restricted" flag, and run the untrusted code.

(where the restricted mode setting may well obviously also do other
things - like limit allowed system calls etc)

                      Linus

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 2/3] seccomp_filters: system call filtering using BPF
  2012-01-12 23:59       ` Will Drewry
  (?)
@ 2012-01-13  1:35       ` Randy Dunlap
  -1 siblings, 0 replies; 47+ messages in thread
From: Randy Dunlap @ 2012-01-13  1:35 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, keescook, john.johansen, serge.hallyn, coreyb,
	pmoore, eparis, djm, torvalds, segoon, rostedt, jmorris,
	scarybeasts, avi, penberg, viro, luto, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On 01/12/2012 03:59 PM, Will Drewry wrote:
>>> diff --git a/kernel/seccomp_filter.c b/kernel/seccomp_filter.c
>>> new file mode 100644
>>> index 0000000..108a3f3
>>> --- /dev/null
>>> +++ b/kernel/seccomp_filter.c
>>> @@ -0,0 +1,620 @@
>>> +/* bpf program-based system call filtering
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License as published by
>>> + * the Free Software Foundation; either version 2 of the License, or
>>> + * (at your option) any later version.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License
>>> + * along with this program; if not, write to the Free Software
>>> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
>>> + *
>>> + * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
>>> + */
>>> +
> 
> Aside from being a year off, is there a current style?  I just went
> off of an existing file.

Sorry, I wasn't making a comment about that header.  Guess I should
have deleted it.

thanks,
-- 
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-13  1:11         ` Andrew Lutomirski
  (?)
  (?)
@ 2012-01-13  1:37         ` Will Drewry
  2012-01-13  1:41             ` Andrew Lutomirski
  -1 siblings, 1 reply; 47+ messages in thread
From: Will Drewry @ 2012-01-13  1:37 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Linus Torvalds, linux-kernel, keescook, john.johansen,
	serge.hallyn, coreyb, pmoore, eparis, djm, segoon, rostedt,
	jmorris, scarybeasts, avi, penberg, viro, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Thu, Jan 12, 2012 at 7:11 PM, Andrew Lutomirski <luto@mit.edu> wrote:
> (Also, preventing dropping of privileges will probably make a patch
> more complicted -- I'll have to find and update all the places that
> allow dropping privileges.)

An alternative approach might be that the restricted bit drops all
privileges that allows privilege changes in either direction.  E.g.,
- set restricted bit
-- adds a check anywhere MNT_NOSUID is
-- sets securebit to SECURE_NOROOT|..LOCKED
-- drops CAP_SETUID, CAP_DAC_OVERRIDE, ...
-- set the caps bounding set to the minimum the restricted bit allows

That may deviate from the intent (by re-using caps), but it could keep some
of the privilege transition checking code the same.

Just a thought,
will

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-13  1:37         ` Will Drewry
@ 2012-01-13  1:41             ` Andrew Lutomirski
  0 siblings, 0 replies; 47+ messages in thread
From: Andrew Lutomirski @ 2012-01-13  1:41 UTC (permalink / raw)
  To: Will Drewry
  Cc: Linus Torvalds, linux-kernel, keescook, john.johansen,
	serge.hallyn, coreyb, pmoore, eparis, djm, segoon, rostedt,
	jmorris, scarybeasts, avi, penberg, viro, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Thu, Jan 12, 2012 at 5:37 PM, Will Drewry <wad@chromium.org> wrote:
> On Thu, Jan 12, 2012 at 7:11 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>> (Also, preventing dropping of privileges will probably make a patch
>> more complicted -- I'll have to find and update all the places that
>> allow dropping privileges.)
>
> An alternative approach might be that the restricted bit drops all
> privileges that allows privilege changes in either direction.  E.g.,
> - set restricted bit
> -- adds a check anywhere MNT_NOSUID is
> -- sets securebit to SECURE_NOROOT|..LOCKED
> -- drops CAP_SETUID, CAP_DAC_OVERRIDE, ...
> -- set the caps bounding set to the minimum the restricted bit allows
>
> That may deviate from the intent (by re-using caps), but it could keep some
> of the privilege transition checking code the same.

I'm not sure it'll be much of a simplification.  The entire patch is
45 lines right now :)  I'll test it and send it out.

FWIW, though, it breaks apparmor (intentionally).  Can any of you
either explain what *should* happen or (better) volunteer to fix it?
It should be about three lines of code for someone who understands
what's going on.  I don't have an apparmor system, so I can't really
test it.

--Andy

>
> Just a thought,
> will

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
@ 2012-01-13  1:41             ` Andrew Lutomirski
  0 siblings, 0 replies; 47+ messages in thread
From: Andrew Lutomirski @ 2012-01-13  1:41 UTC (permalink / raw)
  To: Will Drewry
  Cc: Linus Torvalds, linux-kernel, keescook, john.johansen,
	serge.hallyn, coreyb, pmoore, eparis, djm, segoon, rostedt,
	jmorris, scarybeasts, avi, penberg, viro, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Thu, Jan 12, 2012 at 5:37 PM, Will Drewry <wad@chromium.org> wrote:
> On Thu, Jan 12, 2012 at 7:11 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>> (Also, preventing dropping of privileges will probably make a patch
>> more complicted -- I'll have to find and update all the places that
>> allow dropping privileges.)
>
> An alternative approach might be that the restricted bit drops all
> privileges that allows privilege changes in either direction.  E.g.,
> - set restricted bit
> -- adds a check anywhere MNT_NOSUID is
> -- sets securebit to SECURE_NOROOT|..LOCKED
> -- drops CAP_SETUID, CAP_DAC_OVERRIDE, ...
> -- set the caps bounding set to the minimum the restricted bit allows
>
> That may deviate from the intent (by re-using caps), but it could keep some
> of the privilege transition checking code the same.

I'm not sure it'll be much of a simplification.  The entire patch is
45 lines right now :)  I'll test it and send it out.

FWIW, though, it breaks apparmor (intentionally).  Can any of you
either explain what *should* happen or (better) volunteer to fix it?
It should be about three lines of code for someone who understands
what's going on.  I don't have an apparmor system, so I can't really
test it.

--Andy

>
> Just a thought,
> will
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-13  1:41             ` Andrew Lutomirski
@ 2012-01-13  2:09               ` Kees Cook
  -1 siblings, 0 replies; 47+ messages in thread
From: Kees Cook @ 2012-01-13  2:09 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Will Drewry, Linus Torvalds, linux-kernel, john.johansen,
	serge.hallyn, coreyb, pmoore, eparis, djm, segoon, rostedt,
	jmorris, scarybeasts, avi, penberg, viro, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Thu, Jan 12, 2012 at 5:41 PM, Andrew Lutomirski <luto@mit.edu> wrote:
> On Thu, Jan 12, 2012 at 5:37 PM, Will Drewry <wad@chromium.org> wrote:
>> On Thu, Jan 12, 2012 at 7:11 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>>> (Also, preventing dropping of privileges will probably make a patch
>>> more complicted -- I'll have to find and update all the places that
>>> allow dropping privileges.)
>>
>> An alternative approach might be that the restricted bit drops all
>> privileges that allows privilege changes in either direction.  E.g.,
>> - set restricted bit
>> -- adds a check anywhere MNT_NOSUID is
>> -- sets securebit to SECURE_NOROOT|..LOCKED
>> -- drops CAP_SETUID, CAP_DAC_OVERRIDE, ...
>> -- set the caps bounding set to the minimum the restricted bit allows
>>
>> That may deviate from the intent (by re-using caps), but it could keep some
>> of the privilege transition checking code the same.
>
> I'm not sure it'll be much of a simplification.  The entire patch is
> 45 lines right now :)  I'll test it and send it out.
>
> FWIW, though, it breaks apparmor (intentionally).  Can any of you
> either explain what *should* happen or (better) volunteer to fix it?
> It should be about three lines of code for someone who understands
> what's going on.  I don't have an apparmor system, so I can't really
> test it.

I'm happy to take a look at the AppArmor breakage. What's happening
with it at the moment?

-- 
Kees Cook
ChromeOS Security

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
@ 2012-01-13  2:09               ` Kees Cook
  0 siblings, 0 replies; 47+ messages in thread
From: Kees Cook @ 2012-01-13  2:09 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Will Drewry, Linus Torvalds, linux-kernel, john.johansen,
	serge.hallyn, coreyb, pmoore, eparis, djm, segoon, rostedt,
	jmorris, scarybeasts, avi, penberg, viro, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Thu, Jan 12, 2012 at 5:41 PM, Andrew Lutomirski <luto@mit.edu> wrote:
> On Thu, Jan 12, 2012 at 5:37 PM, Will Drewry <wad@chromium.org> wrote:
>> On Thu, Jan 12, 2012 at 7:11 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>>> (Also, preventing dropping of privileges will probably make a patch
>>> more complicted -- I'll have to find and update all the places that
>>> allow dropping privileges.)
>>
>> An alternative approach might be that the restricted bit drops all
>> privileges that allows privilege changes in either direction.  E.g.,
>> - set restricted bit
>> -- adds a check anywhere MNT_NOSUID is
>> -- sets securebit to SECURE_NOROOT|..LOCKED
>> -- drops CAP_SETUID, CAP_DAC_OVERRIDE, ...
>> -- set the caps bounding set to the minimum the restricted bit allows
>>
>> That may deviate from the intent (by re-using caps), but it could keep some
>> of the privilege transition checking code the same.
>
> I'm not sure it'll be much of a simplification.  The entire patch is
> 45 lines right now :)  I'll test it and send it out.
>
> FWIW, though, it breaks apparmor (intentionally).  Can any of you
> either explain what *should* happen or (better) volunteer to fix it?
> It should be about three lines of code for someone who understands
> what's going on.  I don't have an apparmor system, so I can't really
> test it.

I'm happy to take a look at the AppArmor breakage. What's happening
with it at the moment?

-- 
Kees Cook
ChromeOS Security
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 2/3] seccomp_filters: system call filtering using BPF
  2012-01-12 23:38 ` [PATCH v3 2/3] seccomp_filters: system call filtering using BPF Will Drewry
  2012-01-13  0:51   ` Randy Dunlap
@ 2012-01-13 17:39   ` Eric Paris
  2012-01-13 18:50       ` Will Drewry
  1 sibling, 1 reply; 47+ messages in thread
From: Eric Paris @ 2012-01-13 17:39 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, keescook, john.johansen, serge.hallyn, coreyb,
	pmoore, djm, torvalds, segoon, rostedt, jmorris, scarybeasts,
	avi, penberg, viro, luto, mingo, akpm, khilman, borislav.petkov,
	amwang, oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano,
	linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor,
	corbet, alan

On Thu, 2012-01-12 at 17:38 -0600, Will Drewry wrote:

> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index cc7a4e9..0296871 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h

> -typedef struct { int mode; } seccomp_t;
> +struct seccomp_filter;
> +/**
> + * struct seccomp_struct - the state of a seccomp'ed process
> + *
> + * @mode:
> + *     if this is 0, seccomp is not in use.
> + *             is 1, the process is under standard seccomp rules.
> + *             is 2, the process is only allowed to make system calls where
> + *                   associated filters evaluate successfully.
> + * @filter: Metadata for filter if using CONFIG_SECCOMP_FILTER.
> + *          @filter must only be accessed from the context of current as there
> + *          is no guard.
> + */
> +typedef struct seccomp_struct {
> +	int mode;
> +#ifdef CONFIG_SECCOMP_FILTER
> +	struct seccomp_filter *filter;
> +#endif
> +} seccomp_t;
>  
>  extern void __secure_computing(int);
>  static inline void secure_computing(int this_syscall)

Can we get rid of all of the typedef stuff?  I know you didn't add it
but now seems like a good time to follow typical kernel semantics if you
have to re-rev for some other reason.

-Eric


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 2/3] seccomp_filters: system call filtering using BPF
  2012-01-13 17:39   ` Eric Paris
@ 2012-01-13 18:50       ` Will Drewry
  0 siblings, 0 replies; 47+ messages in thread
From: Will Drewry @ 2012-01-13 18:50 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-kernel, keescook, john.johansen, serge.hallyn, coreyb,
	pmoore, djm, torvalds, segoon, rostedt, jmorris, scarybeasts,
	avi, penberg, viro, luto, mingo, akpm, khilman, borislav.petkov,
	amwang, oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano,
	linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor,
	corbet, alan

On Fri, Jan 13, 2012 at 11:39 AM, Eric Paris <eparis@redhat.com> wrote:
> On Thu, 2012-01-12 at 17:38 -0600, Will Drewry wrote:
>
>> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
>> index cc7a4e9..0296871 100644
>> --- a/include/linux/seccomp.h
>> +++ b/include/linux/seccomp.h
>
>> -typedef struct { int mode; } seccomp_t;
>> +struct seccomp_filter;
>> +/**
>> + * struct seccomp_struct - the state of a seccomp'ed process
>> + *
>> + * @mode:
>> + *     if this is 0, seccomp is not in use.
>> + *             is 1, the process is under standard seccomp rules.
>> + *             is 2, the process is only allowed to make system calls where
>> + *                   associated filters evaluate successfully.
>> + * @filter: Metadata for filter if using CONFIG_SECCOMP_FILTER.
>> + *          @filter must only be accessed from the context of current as there
>> + *          is no guard.
>> + */
>> +typedef struct seccomp_struct {
>> +     int mode;
>> +#ifdef CONFIG_SECCOMP_FILTER
>> +     struct seccomp_filter *filter;
>> +#endif
>> +} seccomp_t;
>>
>>  extern void __secure_computing(int);
>>  static inline void secure_computing(int this_syscall)
>
> Can we get rid of all of the typedef stuff?  I know you didn't add it
> but now seems like a good time to follow typical kernel semantics if you
> have to re-rev for some other reason.

Yup - I was hoping to do that separately since it touches extra files.
 I'll make a prereq patch that enacts the change (so it can be picked
up even if the seccomp-bpf is less successful).

cheers!
will

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 2/3] seccomp_filters: system call filtering using BPF
@ 2012-01-13 18:50       ` Will Drewry
  0 siblings, 0 replies; 47+ messages in thread
From: Will Drewry @ 2012-01-13 18:50 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-kernel, keescook, john.johansen, serge.hallyn, coreyb,
	pmoore, djm, torvalds, segoon, rostedt, jmorris, scarybeasts,
	avi, penberg, viro, luto, mingo, akpm, khilman, borislav.petkov,
	amwang, oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano,
	linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor,
	corbet, alan

On Fri, Jan 13, 2012 at 11:39 AM, Eric Paris <eparis@redhat.com> wrote:
> On Thu, 2012-01-12 at 17:38 -0600, Will Drewry wrote:
>
>> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
>> index cc7a4e9..0296871 100644
>> --- a/include/linux/seccomp.h
>> +++ b/include/linux/seccomp.h
>
>> -typedef struct { int mode; } seccomp_t;
>> +struct seccomp_filter;
>> +/**
>> + * struct seccomp_struct - the state of a seccomp'ed process
>> + *
>> + * @mode:
>> + *     if this is 0, seccomp is not in use.
>> + *             is 1, the process is under standard seccomp rules.
>> + *             is 2, the process is only allowed to make system calls where
>> + *                   associated filters evaluate successfully.
>> + * @filter: Metadata for filter if using CONFIG_SECCOMP_FILTER.
>> + *          @filter must only be accessed from the context of current as there
>> + *          is no guard.
>> + */
>> +typedef struct seccomp_struct {
>> +     int mode;
>> +#ifdef CONFIG_SECCOMP_FILTER
>> +     struct seccomp_filter *filter;
>> +#endif
>> +} seccomp_t;
>>
>>  extern void __secure_computing(int);
>>  static inline void secure_computing(int this_syscall)
>
> Can we get rid of all of the typedef stuff?  I know you didn't add it
> but now seems like a good time to follow typical kernel semantics if you
> have to re-rev for some other reason.

Yup - I was hoping to do that separately since it touches extra files.
 I'll make a prereq patch that enacts the change (so it can be picked
up even if the seccomp-bpf is less successful).

cheers!
will
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-13  1:17         ` Linus Torvalds
@ 2012-01-14 13:30           ` Jamie Lokier
  2012-01-14 19:21               ` Will Drewry
  2012-01-14 20:22             ` Linus Torvalds
  0 siblings, 2 replies; 47+ messages in thread
From: Jamie Lokier @ 2012-01-14 13:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Lutomirski, Will Drewry, linux-kernel, keescook,
	john.johansen, serge.hallyn, coreyb, pmoore, eparis, djm, segoon,
	rostedt, jmorris, scarybeasts, avi, penberg, viro, mingo, akpm,
	khilman, borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

Linus Torvalds wrote:
> On Thu, Jan 12, 2012 at 5:11 PM, Andrew Lutomirski <luto@mit.edu> wrote:
> >
> > What if you're a daemon that needs something like CAP_NET_BIND but
> > also wants to be able to run other helpers without CAP_NET_BIND?
> >
> > (Also, preventing dropping of privileges will probably make a patch
> > more complicted -- I'll have to find and update all the places that
> > allow dropping privileges.)
> 
> Hey, if it actually makes it more complicated to say "don't change
> privileges", then I guess my argument that it should be simpler is
> wrong.
> 
> That said, the thing you bring up is *not* the actual use-case for the
> suggestion. The use-case is a "run untrusted code". So the use-case
> would be to set the flag after you've dropped CAP_NET_BIND, and
> *before* you actually run the other helpers. You clearly must have a
> fork() or something like that there, since you want to keep the
> NET_BIND in the original daemon.

Well suppose you don't trust the daemon either.  It might be running
in a network namespace where it's safe for untrusted code to bind to
low ports.

Or maybe you just need to let it bind willy-nilly among a restricted
subset of low ports - which of course you would like to restrict with
the seccomp filter.

(This can't happen right now because the filter can only look at
arguments, not memory pointed to - so it can't look at the port
number.  Can it even see when sys_bind is called on archs like x86
that use sys_socketcall?!)

Anyway the principle is there - CAP_NET_BIND doesn't necessarily mean
the daemon code is trusted.

-- Jamie

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-14 13:30           ` Jamie Lokier
@ 2012-01-14 19:21               ` Will Drewry
  2012-01-14 20:22             ` Linus Torvalds
  1 sibling, 0 replies; 47+ messages in thread
From: Will Drewry @ 2012-01-14 19:21 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Linus Torvalds, Andrew Lutomirski, linux-kernel, keescook,
	john.johansen, serge.hallyn, coreyb, pmoore, eparis, djm, segoon,
	rostedt, jmorris, scarybeasts, avi, penberg, viro, mingo, akpm,
	khilman, borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Sat, Jan 14, 2012 at 7:30 AM, Jamie Lokier <jamie@shareable.org> wrote:
> Linus Torvalds wrote:
>> On Thu, Jan 12, 2012 at 5:11 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>> >
>> > What if you're a daemon that needs something like CAP_NET_BIND but
>> > also wants to be able to run other helpers without CAP_NET_BIND?
>> >
>> > (Also, preventing dropping of privileges will probably make a patch
>> > more complicted -- I'll have to find and update all the places that
>> > allow dropping privileges.)
>>
>> Hey, if it actually makes it more complicated to say "don't change
>> privileges", then I guess my argument that it should be simpler is
>> wrong.
>>
>> That said, the thing you bring up is *not* the actual use-case for the
>> suggestion. The use-case is a "run untrusted code". So the use-case
>> would be to set the flag after you've dropped CAP_NET_BIND, and
>> *before* you actually run the other helpers. You clearly must have a
>> fork() or something like that there, since you want to keep the
>> NET_BIND in the original daemon.
>
> Well suppose you don't trust the daemon either.  It might be running
> in a network namespace where it's safe for untrusted code to bind to
> low ports.
>
> Or maybe you just need to let it bind willy-nilly among a restricted
> subset of low ports - which of course you would like to restrict with
> the seccomp filter.

Unless the port values are the register arguments, seccomp filter
won't help.  It can be used to incrementally drop available system
calls (like socketcall(SYS_LISTEN) or whatever).

> (This can't happen right now because the filter can only look at
> arguments, not memory pointed to - so it can't look at the port
> number.  Can it even see when sys_bind is called on archs like x86
> that use sys_socketcall?!)

Yeah - multiplexed system calls like ipc and socketcall can be filtered
based on the argument value in the register. (socketcall's first argument is
"call".)

> Anyway the principle is there - CAP_NET_BIND doesn't necessarily mean
> the daemon code is trusted.

I think we're comparing apples to oranges. I believe the current proposal is a
bit that says "hey! I'm sandboxed!".   Defensive programming that is often
achieved through continued reduction of capabilities is important, but
orthogonal.  In that model, only once the last vestige of "privilege" is dropped
would the process  set the no_new_privs bit.  Until then, you rely on the
other access contol pieces you've put in place: namespacing, etc.

While I am a fan of capabilities systems, it would be very cool to have a
bottom floor, privilege-freezer which could help against some classes of
sandbox escapes.

cheers!
will

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
@ 2012-01-14 19:21               ` Will Drewry
  0 siblings, 0 replies; 47+ messages in thread
From: Will Drewry @ 2012-01-14 19:21 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Linus Torvalds, Andrew Lutomirski, linux-kernel, keescook,
	john.johansen, serge.hallyn, coreyb, pmoore, eparis, djm, segoon,
	rostedt, jmorris, scarybeasts, avi, penberg, viro, mingo, akpm,
	khilman, borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Sat, Jan 14, 2012 at 7:30 AM, Jamie Lokier <jamie@shareable.org> wrote:
> Linus Torvalds wrote:
>> On Thu, Jan 12, 2012 at 5:11 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>> >
>> > What if you're a daemon that needs something like CAP_NET_BIND but
>> > also wants to be able to run other helpers without CAP_NET_BIND?
>> >
>> > (Also, preventing dropping of privileges will probably make a patch
>> > more complicted -- I'll have to find and update all the places that
>> > allow dropping privileges.)
>>
>> Hey, if it actually makes it more complicated to say "don't change
>> privileges", then I guess my argument that it should be simpler is
>> wrong.
>>
>> That said, the thing you bring up is *not* the actual use-case for the
>> suggestion. The use-case is a "run untrusted code". So the use-case
>> would be to set the flag after you've dropped CAP_NET_BIND, and
>> *before* you actually run the other helpers. You clearly must have a
>> fork() or something like that there, since you want to keep the
>> NET_BIND in the original daemon.
>
> Well suppose you don't trust the daemon either.  It might be running
> in a network namespace where it's safe for untrusted code to bind to
> low ports.
>
> Or maybe you just need to let it bind willy-nilly among a restricted
> subset of low ports - which of course you would like to restrict with
> the seccomp filter.

Unless the port values are the register arguments, seccomp filter
won't help.  It can be used to incrementally drop available system
calls (like socketcall(SYS_LISTEN) or whatever).

> (This can't happen right now because the filter can only look at
> arguments, not memory pointed to - so it can't look at the port
> number.  Can it even see when sys_bind is called on archs like x86
> that use sys_socketcall?!)

Yeah - multiplexed system calls like ipc and socketcall can be filtered
based on the argument value in the register. (socketcall's first argument is
"call".)

> Anyway the principle is there - CAP_NET_BIND doesn't necessarily mean
> the daemon code is trusted.

I think we're comparing apples to oranges. I believe the current proposal is a
bit that says "hey! I'm sandboxed!".   Defensive programming that is often
achieved through continued reduction of capabilities is important, but
orthogonal.  In that model, only once the last vestige of "privilege" is dropped
would the process  set the no_new_privs bit.  Until then, you rely on the
other access contol pieces you've put in place: namespacing, etc.

While I am a fan of capabilities systems, it would be very cool to have a
bottom floor, privilege-freezer which could help against some classes of
sandbox escapes.

cheers!
will
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-14 13:30           ` Jamie Lokier
  2012-01-14 19:21               ` Will Drewry
@ 2012-01-14 20:22             ` Linus Torvalds
  2012-01-14 21:04               ` Andrew Lutomirski
  2012-01-15 20:16               ` Casey Schaufler
  1 sibling, 2 replies; 47+ messages in thread
From: Linus Torvalds @ 2012-01-14 20:22 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Andrew Lutomirski, Will Drewry, linux-kernel, keescook,
	john.johansen, serge.hallyn, coreyb, pmoore, eparis, djm, segoon,
	rostedt, jmorris, scarybeasts, avi, penberg, viro, mingo, akpm,
	khilman, borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Sat, Jan 14, 2012 at 5:30 AM, Jamie Lokier <jamie@shareable.org> wrote:
>
> Anyway the principle is there - CAP_NET_BIND doesn't necessarily mean
> the daemon code is trusted.

I really think all these arguments are *COMPLETELY* missing the point.

You don't have to use the new flag if you don't want to. Just let it go.

The point of the flag is to not allow security changes. It's that
easy. If you want something else, don't use it.

Because even "dropping capabilities" is quite often the wrong thing to
do. To one person it's "dropping capabilities", to another it is "no
longer running with the capabilities I need".

We've had security bugs that were *due* to dropped capabilities -
people dropped one capability but not another, and fooled code into
doing things they weren't expecting it to do. So quite frankly, I
believe that from a security standpoint it's a hell of a lot safer to
just keep the rules really simple.

Think of the "restrict" bit as "my universe will now run with this
*known* set of permissions".

And if you don't like it, don't use it. It really is that simple. But
don't make the model more complicated. Don't confuse it with "but but
capabilites" crap. The point of the patch is that it makes all of that
go away. There are no capabilities. There only is what you can do.

And yes, I really seriously do believe that is both safer and simpler
than some model that says "you can drop stuff", and then you have to
start making up rules for what "dropping" means.

Does "dropping" mean allowing setuid(geteuid()) for example? That *is*
dropping the uid in a _POSIX_SAVED_IDS environment. And I'm saying
that no, we should not even allow that. It's simply all too "subtle".

(I don't think Andrew's patch actually touched any of those paths, but
I didn't check)

                      Linus

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-14 20:22             ` Linus Torvalds
@ 2012-01-14 21:04               ` Andrew Lutomirski
  2012-01-15 20:16               ` Casey Schaufler
  1 sibling, 0 replies; 47+ messages in thread
From: Andrew Lutomirski @ 2012-01-14 21:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jamie Lokier, Will Drewry, linux-kernel, keescook, john.johansen,
	serge.hallyn, coreyb, pmoore, eparis, djm, segoon, rostedt,
	jmorris, scarybeasts, avi, penberg, viro, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Sat, Jan 14, 2012 at 12:22 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> We've had security bugs that were *due* to dropped capabilities -
> people dropped one capability but not another, and fooled code into
> doing things they weren't expecting it to do. So quite frankly, I
> believe that from a security standpoint it's a hell of a lot safer to
> just keep the rules really simple.
>

Which is the point of this patch.  If suid/sgid/file caps/LSM
transitions are disabled, you can't exploit thinks like sendmail
because they don't have any special privileges to exploit.

>
> Does "dropping" mean allowing setuid(geteuid()) for example? That *is*
> dropping the uid in a _POSIX_SAVED_IDS environment. And I'm saying
> that no, we should not even allow that. It's simply all too "subtle".
>
> (I don't think Andrew's patch actually touched any of those paths, but
> I didn't check)

Correct.  I did not touch any of these paths.  I think that's the
right choice for at least three reasons:

1. If there's an exploit that does PR_SET_NO_NEW_CAPS, then
setuid/capset/whatever, then runs something and exploits it, it is
likely to work just as will if setuid/capset is called before
PR_SET_NO_NEW_CAPS.

2. Even if I made that change, it wouldn't help.  You can always
simulate dropping capabilities with LD_PRELOAD or something like
seccomp mode 2.  The secure exec stuff that happens when you execve
something with the setuid bit set won't prevent it because the setuid
bit is *disabled* in this mode.  (Again, there is no exploit here --
there are no new privileges to steal.)

3. Simplicity.  It would be easy to miss something subtle.  There's
the dumpable bit and however it interacts with ptrace (especially on
Ubuntu, which seems to have a patched kernel with different rules),
there's capset, there's file capabilities (which can, I think, drop
capabilities as well as adding them), there's cap_bset, securebits,
etc.  Leaving them all alone means I don't need to worry about bitrot
or about missing one.


I'm tempted to submit a followup patch to allow unprivileged users to
use some of the sys_unshare modes if no_new_privs is set so we can
have an example use case.  CLONE_NEWIPC, CLONE_SYSVSEM, and
CLONE_NEWUTS would be a decent start, I think, and the patch would be
trivial.

--Andy

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 3/3] Documentation: prctl/seccomp_filter
  2012-01-12 23:38 ` [PATCH v3 3/3] Documentation: prctl/seccomp_filter Will Drewry
@ 2012-01-15  1:52   ` Randy Dunlap
  2012-01-16  1:41     ` Will Drewry
  2012-01-17 23:29     ` Eric Paris
  1 sibling, 1 reply; 47+ messages in thread
From: Randy Dunlap @ 2012-01-15  1:52 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, keescook, john.johansen, serge.hallyn, coreyb,
	pmoore, eparis, djm, torvalds, segoon, rostedt, jmorris,
	scarybeasts, avi, penberg, viro, luto, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On 01/12/2012 03:38 PM, Will Drewry wrote:
> Documents how system call filtering using Berkeley Packet
> Filter programs works and how it may be used.
> Includes an example for x86 (32-bit).
> 
> v3: - call out BPF <-> Berkeley Packet Filter (rdunlap@xenotime.net)
>     - document use of tentative always-unprivileged
>     - guard sample compilation for i386 and x86_64
> v2: - move code to samples (corbet@lwn.net)
> 
> Signed-off-by: Will Drewry <wad@chromium.org>
> ---
>  Documentation/prctl/seccomp_filter.txt |   94 ++++++++++++++++++++++++++++++++
>  samples/Makefile                       |    2 +-
>  samples/seccomp/Makefile               |   18 ++++++
>  samples/seccomp/bpf-example.c          |   74 +++++++++++++++++++++++++
>  4 files changed, 187 insertions(+), 1 deletions(-)
>  create mode 100644 Documentation/prctl/seccomp_filter.txt
>  create mode 100644 samples/seccomp/Makefile
>  create mode 100644 samples/seccomp/bpf-example.c
> 
> diff --git a/Documentation/prctl/seccomp_filter.txt b/Documentation/prctl/seccomp_filter.txt
> new file mode 100644
> index 0000000..2db8b89
> --- /dev/null
> +++ b/Documentation/prctl/seccomp_filter.txt
> @@ -0,0 +1,94 @@
> +		Seccomp filtering
> +		=================
> +
> +Introduction
> +------------
> +
> +A large number of system calls are exposed to every userland process
> +with many of them going unused for the entire lifetime of the process.
> +As system calls change and mature, bugs are found and eradicated.  A
> +certain subset of userland applications benefit by having a reduced set
> +of available system calls.  The resulting set reduces the total kernel
> +surface exposed to the application.  System call filtering is meant for
> +use with those applications.
> +
> +Seccomp filtering provides a means for a process to specify a filter
> +for incoming system calls.  The filter is expressed as a Berkeley Packet
> +Filter (BPF) program, as with socket filters, except that the data
> +operated on is the current user_regs_struct.  This allows for expressive
> +filtering of system calls using the pre-existing system call ABI and
> +using a filter program language with a long history of being exposed to
> +userland.  Additionally, BPF makes it impossible for users of seccomp to
> +fall prey to time-of-check-time-of-use (TOCTOU) attacks that are common
> +in system call interposition frameworks because the evaluated data is
> +solely register state just after system call entry.
> +
> +What it isn't
> +-------------
> +
> +System call filtering isn't a sandbox.  It provides a clearly defined
> +mechanism for minimizing the exposed kernel surface.  Beyond that,
> +policy for logical behavior and information flow should be managed with
> +a combinations of other system hardening techniques and, potentially, a

     combination                                                         an

> +LSM of your choosing.  Expressive, dynamic filters provide further options down
> +this path (avoiding pathological sizes or selecting which of the multiplexed
> +system calls in socketcall() is allowed, for instance) which could be
> +construed, incorrectly, as a more complete sandboxing solution.
> +
> +Usage
> +-----
> +
> +An additional seccomp mode is added, but they are not directly set by the
> +consuming process.  The new mode, '2', is only available if
> +CONFIG_SECCOMP_FILTER is set and enabled using prctl with the
> +PR_ATTACH_SECCOMP_FILTER argument.
> +
> +Interacting with seccomp filters is done using one prctl(2) call.
> +
> +PR_ATTACH_SECCOMP_FILTER:
> +	Allows the specification of a new filter using a BPF program.
> +	The BPF program will be executed over a user_regs_struct data
> +	reflecting system call time except with the system call number
> +	resident in orig_[register].  To allow a system call, the size
> +	of the data must be returned.  At present, all other return values
> +	result in the system call being blocked, but it is recommended to
> +	return 0 in those cases.  This will allow for future custom return
> +	values to be introduced, if ever desired.
> +
> +	Usage:
> +		prctl(PR_ATTACH_SECCOMP_FILTER, prog);
> +
> +	The 'prog' argument is a pointer to a struct sock_fprog which will
> +	contain the filter program.  If the program is invalid, the call
> +	will return -1 and set errno to -EINVAL.

	                                EINVAL.
(I think)

> +
> +	The struct user_regs_struct the @prog will see is based on the
> +	personality of the task at the time of this prctl call.  Additionally,
> +	is_compat_task is also tracked for the @prog.  This means that once set
> +	the calling task will have all of its system calls blocked if it
> +	switches its system call ABI (via personality or other means).
> +
> +	If fork/clone and execve are allowed by @prog, any child processes will
> +	be constrained to the same filters and syscal call ABI as the parent.

	                                       syscall

> +
> +	When called from an unprivileged process (lacking CAP_SYS_ADMIN), the
> +	"always_unprivileged" bit is enabled for the process.
> +
> +	Additionally, if prctl(2) is allowed by the attached filter,
> +	additional filters may be layered on which will increase evaluation
> +	time, but allow for further decreasing the attack surface during
> +	execution of a process.
> +
> +The above call returns 0 on success and non-zero on error.
> +
> +Example
> +-------
> +
> +samples/seccomp-bpf-example.c shows an example process that allows read from stdin,

   samples/seccomp/bpf-example.c

> +write to stdout/err, exit and signal returns for 32-bit x86.

                  /stderr,

> +
> +Adding architecture support
> +-----------------------
> +
> +Any platform with seccomp support will support seccomp filters
> +as long as CONFIG_SECCOMP_FILTER is enabled.



-- 
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-14 20:22             ` Linus Torvalds
  2012-01-14 21:04               ` Andrew Lutomirski
@ 2012-01-15 20:16               ` Casey Schaufler
  2012-01-15 20:59                 ` Andrew Lutomirski
  1 sibling, 1 reply; 47+ messages in thread
From: Casey Schaufler @ 2012-01-15 20:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jamie Lokier, Andrew Lutomirski, Will Drewry, linux-kernel,
	keescook, john.johansen, serge.hallyn, coreyb, pmoore, eparis,
	djm, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro,
	mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak,
	eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel,
	linux-security-module, olofj, mhalcrow, dlaor, corbet, alan,
	Casey Schaufler

On 1/14/2012 12:22 PM, Linus Torvalds wrote:
> On Sat, Jan 14, 2012 at 5:30 AM, Jamie Lokier<jamie@shareable.org>  wrote:
>> Anyway the principle is there - CAP_NET_BIND doesn't necessarily mean
>> the daemon code is trusted.
> I really think all these arguments are *COMPLETELY* missing the point.
>
> You don't have to use the new flag if you don't want to. Just let it go.
>
> The point of the flag is to not allow security changes. It's that
> easy. If you want something else, don't use it.
>
> Because even "dropping capabilities" is quite often the wrong thing to
> do. To one person it's "dropping capabilities", to another it is "no
> longer running with the capabilities I need".
>
> We've had security bugs that were *due* to dropped capabilities -
> people dropped one capability but not another, and fooled code into
> doing things they weren't expecting it to do. So quite frankly, I
> believe that from a security standpoint it's a hell of a lot safer to
> just keep the rules really simple.
>
> Think of the "restrict" bit as "my universe will now run with this
> *known* set of permissions".
>
> And if you don't like it, don't use it. It really is that simple. But
> don't make the model more complicated. Don't confuse it with "but but
> capabilites" crap. The point of the patch is that it makes all of that
> go away. There are no capabilities. There only is what you can do.
>
> And yes, I really seriously do believe that is both safer and simpler
> than some model that says "you can drop stuff", and then you have to
> start making up rules for what "dropping" means.
>
> Does "dropping" mean allowing setuid(geteuid()) for example? That *is*
> dropping the uid in a _POSIX_SAVED_IDS environment. And I'm saying
> that no, we should not even allow that. It's simply all too "subtle".

I am casting my two cents worth behind Linus. Dropping
privilege can be every bit as dangerous as granting privilege
in the real world of atrocious user land code. Especially in
the case of security policy enforcing user land code.

This even more important in environments that support fine
granularity of privilege, including capabilities and SELinux.
Under SELinux a domain transition can increase, decrease or
completely change a process' access rights and there is really
no way for the kernel to tell which it is because that's all
encoded in the arbitrary SELinux policy. Smack does not try
to maintain a notion of hierarchy of privilege, so the notion
of any change being equivalent to any other is in line with
the Smack philosophy.

>
> (I don't think Andrew's patch actually touched any of those paths, but
> I didn't check)
>
>                        Linus
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-15 20:16               ` Casey Schaufler
@ 2012-01-15 20:59                 ` Andrew Lutomirski
  2012-01-15 21:32                     ` Casey Schaufler
  0 siblings, 1 reply; 47+ messages in thread
From: Andrew Lutomirski @ 2012-01-15 20:59 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Linus Torvalds, Jamie Lokier, Will Drewry, linux-kernel,
	keescook, john.johansen, serge.hallyn, coreyb, pmoore, eparis,
	djm, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro,
	mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak,
	eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel,
	linux-security-module, olofj, mhalcrow, dlaor, corbet, alan

On Sun, Jan 15, 2012 at 12:16 PM, Casey Schaufler
<casey@schaufler-ca.com> wrote:
> On 1/14/2012 12:22 PM, Linus Torvalds wrote:
>>
>> And yes, I really seriously do believe that is both safer and simpler
>> than some model that says "you can drop stuff", and then you have to
>> start making up rules for what "dropping" means.
>>
>> Does "dropping" mean allowing setuid(geteuid()) for example? That *is*
>> dropping the uid in a _POSIX_SAVED_IDS environment. And I'm saying
>> that no, we should not even allow that. It's simply all too "subtle".
>
>
> I am casting my two cents worth behind Linus. Dropping
> privilege can be every bit as dangerous as granting privilege
> in the real world of atrocious user land code. Especially in
> the case of security policy enforcing user land code.

Can you think of *any* plausible attack that is possible with my patch
(i.e. no_new_privs allows setuid, setresuid, and capset) that would be
prevented or even mitigated if I blocked those syscalls?  I can't.
(The sendmail-style attack is impossible with no_new_privs.)

Also, how would you even block setuid(2) in a non-confusing manner?
The semantics and error returns are already such a disaster than it's
barely worth it for anything to check the return value.

>
> This even more important in environments that support fine
> granularity of privilege, including capabilities and SELinux.
> Under SELinux a domain transition can increase, decrease or
> completely change a process' access rights and there is really
> no way for the kernel to tell which it is because that's all
> encoded in the arbitrary SELinux policy. Smack does not try
> to maintain a notion of hierarchy of privilege, so the notion
> of any change being equivalent to any other is in line with
> the Smack philosophy.
>

My patch does not (barring bugs) allow selinux domain transitions.  I
certainly think that all security transitions that vary across
distributions should be blocked by no_new_privs.

--Andy

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-15 20:59                 ` Andrew Lutomirski
@ 2012-01-15 21:32                     ` Casey Schaufler
  0 siblings, 0 replies; 47+ messages in thread
From: Casey Schaufler @ 2012-01-15 21:32 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Linus Torvalds, Jamie Lokier, Will Drewry, linux-kernel,
	keescook, john.johansen, serge.hallyn, coreyb, pmoore, eparis,
	djm, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro,
	mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak,
	eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel,
	linux-security-module, olofj, mhalcrow, dlaor, corbet, alan,
	Casey Schaufler

On 1/15/2012 12:59 PM, Andrew Lutomirski wrote:
> On Sun, Jan 15, 2012 at 12:16 PM, Casey Schaufler
> <casey@schaufler-ca.com>  wrote:
>> On 1/14/2012 12:22 PM, Linus Torvalds wrote:
>>> And yes, I really seriously do believe that is both safer and simpler
>>> than some model that says "you can drop stuff", and then you have to
>>> start making up rules for what "dropping" means.
>>>
>>> Does "dropping" mean allowing setuid(geteuid()) for example? That *is*
>>> dropping the uid in a _POSIX_SAVED_IDS environment. And I'm saying
>>> that no, we should not even allow that. It's simply all too "subtle".
>>
>> I am casting my two cents worth behind Linus. Dropping
>> privilege can be every bit as dangerous as granting privilege
>> in the real world of atrocious user land code. Especially in
>> the case of security policy enforcing user land code.
> Can you think of *any* plausible attack that is possible with my patch
> (i.e. no_new_privs allows setuid, setresuid, and capset) that would be
> prevented or even mitigated if I blocked those syscalls?  I can't.
> (The sendmail-style attack is impossible with no_new_privs.)

I am notoriously bad at coming up with this sort of example.
I will try, I may not hit the mark, but it should be close.

The application is running with saved uid != euid when
no-new-privs is set. It execs a new binary, which keeps
the saved and effective uids. The program calls setreuid,
which succeeds. It opens the saved userid's files.

Some people reading this will say this is expected and
appropriate behavior, and others will say it's an exploit
under sandbox conditions. Those in the latter group can
point out that blocking setreuid() would prevent what
they consider an exploit. Those in the former group will
say blocking the setreuid would introduce a denial of
service that could lead to an exploit if the program is
poorly written.

Because the correctness of the behavior is ambiguous
(to me at least) when no_new_privs is set I argue that
we shouldn't have it. If we are going to have it my
feeling is that it should be as draconian as possible.
Because you can't always tell if state A is "more"
privileged than state B, any change between A and B has
to be treated as an escalation.

>
> Also, how would you even block setuid(2) in a non-confusing manner?
> The semantics and error returns are already such a disaster than it's
> barely worth it for anything to check the return value.
>
>> This even more important in environments that support fine
>> granularity of privilege, including capabilities and SELinux.
>> Under SELinux a domain transition can increase, decrease or
>> completely change a process' access rights and there is really
>> no way for the kernel to tell which it is because that's all
>> encoded in the arbitrary SELinux policy. Smack does not try
>> to maintain a notion of hierarchy of privilege, so the notion
>> of any change being equivalent to any other is in line with
>> the Smack philosophy.
>>
> My patch does not (barring bugs) allow selinux domain transitions.  I
> certainly think that all security transitions that vary across
> distributions should be blocked by no_new_privs.
>
> --Andy
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
@ 2012-01-15 21:32                     ` Casey Schaufler
  0 siblings, 0 replies; 47+ messages in thread
From: Casey Schaufler @ 2012-01-15 21:32 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Linus Torvalds, Jamie Lokier, Will Drewry, linux-kernel,
	keescook, john.johansen, serge.hallyn, coreyb, pmoore, eparis,
	djm, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro,
	mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak,
	eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel,
	linux-security-module, olofj, mhalcrow, dlaor, corbet, alan,
	Casey Schaufler

On 1/15/2012 12:59 PM, Andrew Lutomirski wrote:
> On Sun, Jan 15, 2012 at 12:16 PM, Casey Schaufler
> <casey@schaufler-ca.com>  wrote:
>> On 1/14/2012 12:22 PM, Linus Torvalds wrote:
>>> And yes, I really seriously do believe that is both safer and simpler
>>> than some model that says "you can drop stuff", and then you have to
>>> start making up rules for what "dropping" means.
>>>
>>> Does "dropping" mean allowing setuid(geteuid()) for example? That *is*
>>> dropping the uid in a _POSIX_SAVED_IDS environment. And I'm saying
>>> that no, we should not even allow that. It's simply all too "subtle".
>>
>> I am casting my two cents worth behind Linus. Dropping
>> privilege can be every bit as dangerous as granting privilege
>> in the real world of atrocious user land code. Especially in
>> the case of security policy enforcing user land code.
> Can you think of *any* plausible attack that is possible with my patch
> (i.e. no_new_privs allows setuid, setresuid, and capset) that would be
> prevented or even mitigated if I blocked those syscalls?  I can't.
> (The sendmail-style attack is impossible with no_new_privs.)

I am notoriously bad at coming up with this sort of example.
I will try, I may not hit the mark, but it should be close.

The application is running with saved uid != euid when
no-new-privs is set. It execs a new binary, which keeps
the saved and effective uids. The program calls setreuid,
which succeeds. It opens the saved userid's files.

Some people reading this will say this is expected and
appropriate behavior, and others will say it's an exploit
under sandbox conditions. Those in the latter group can
point out that blocking setreuid() would prevent what
they consider an exploit. Those in the former group will
say blocking the setreuid would introduce a denial of
service that could lead to an exploit if the program is
poorly written.

Because the correctness of the behavior is ambiguous
(to me at least) when no_new_privs is set I argue that
we shouldn't have it. If we are going to have it my
feeling is that it should be as draconian as possible.
Because you can't always tell if state A is "more"
privileged than state B, any change between A and B has
to be treated as an escalation.

>
> Also, how would you even block setuid(2) in a non-confusing manner?
> The semantics and error returns are already such a disaster than it's
> barely worth it for anything to check the return value.
>
>> This even more important in environments that support fine
>> granularity of privilege, including capabilities and SELinux.
>> Under SELinux a domain transition can increase, decrease or
>> completely change a process' access rights and there is really
>> no way for the kernel to tell which it is because that's all
>> encoded in the arbitrary SELinux policy. Smack does not try
>> to maintain a notion of hierarchy of privilege, so the notion
>> of any change being equivalent to any other is in line with
>> the Smack philosophy.
>>
> My patch does not (barring bugs) allow selinux domain transitions.  I
> certainly think that all security transitions that vary across
> distributions should be blocked by no_new_privs.
>
> --Andy
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-15 21:32                     ` Casey Schaufler
  (?)
@ 2012-01-15 22:07                     ` Andrew Lutomirski
  2012-01-16  2:04                         ` Will Drewry
  2012-01-16  2:41                         ` Casey Schaufler
  -1 siblings, 2 replies; 47+ messages in thread
From: Andrew Lutomirski @ 2012-01-15 22:07 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Linus Torvalds, Jamie Lokier, Will Drewry, linux-kernel,
	keescook, john.johansen, serge.hallyn, coreyb, pmoore, eparis,
	djm, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro,
	mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak,
	eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel,
	linux-security-module, olofj, mhalcrow, dlaor, corbet, alan

On Sun, Jan 15, 2012 at 1:32 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 1/15/2012 12:59 PM, Andrew Lutomirski wrote:
>>
>> On Sun, Jan 15, 2012 at 12:16 PM, Casey Schaufler
>> <casey@schaufler-ca.com>  wrote:
>>>
>>> On 1/14/2012 12:22 PM, Linus Torvalds wrote:
>>>>
>>>> And yes, I really seriously do believe that is both safer and simpler
>>>> than some model that says "you can drop stuff", and then you have to
>>>> start making up rules for what "dropping" means.
>>>>
>>>> Does "dropping" mean allowing setuid(geteuid()) for example? That *is*
>>>> dropping the uid in a _POSIX_SAVED_IDS environment. And I'm saying
>>>> that no, we should not even allow that. It's simply all too "subtle".
>>>
>>>
>>> I am casting my two cents worth behind Linus. Dropping
>>> privilege can be every bit as dangerous as granting privilege
>>> in the real world of atrocious user land code. Especially in
>>> the case of security policy enforcing user land code.
>>
>> Can you think of *any* plausible attack that is possible with my patch
>> (i.e. no_new_privs allows setuid, setresuid, and capset) that would be
>> prevented or even mitigated if I blocked those syscalls?  I can't.
>> (The sendmail-style attack is impossible with no_new_privs.)
>
>
> I am notoriously bad at coming up with this sort of example.
> I will try, I may not hit the mark, but it should be close.
>
> The application is running with saved uid != euid when
> no-new-privs is set. It execs a new binary, which keeps
> the saved and effective uids. The program calls setreuid,
> which succeeds. It opens the saved userid's files.

If you don't trust that binary, then why are you execing it with saved
uid != euid in the first place?  If you are setting no_new_privs, then
you are new code and should have at least some basic awareness of the
semantics.  The exact same "exploit" is possible if you have
CAP_DAC_OVERRIDE with either no_new_privs semantics -- if you have a
privilege and you run untrusted code, then you had better remove that
privilege somehow for the untrusted code.

IOW, *drop privileges if you are a sandbox*.  Otherwise you're screwed
with or without no_new_privs.


Another way of saying this is: no_new_privs is not a sandbox.  It's
just a way to make it safe for sandboxes and other such weird things
processes can do to themselves safe across execve.  If you want a
sandbox, use seccomp mode 2, which will require you to set
no_new_privs.


--Andy

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 3/3] Documentation: prctl/seccomp_filter
  2012-01-15  1:52   ` Randy Dunlap
@ 2012-01-16  1:41     ` Will Drewry
  0 siblings, 0 replies; 47+ messages in thread
From: Will Drewry @ 2012-01-16  1:41 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-kernel, keescook, john.johansen, serge.hallyn, coreyb,
	pmoore, eparis, djm, torvalds, segoon, rostedt, jmorris,
	scarybeasts, avi, penberg, viro, luto, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Sat, Jan 14, 2012 at 7:52 PM, Randy Dunlap <rdunlap@xenotime.net> wrote:
> On 01/12/2012 03:38 PM, Will Drewry wrote:
>> Documents how system call filtering using Berkeley Packet
>> Filter programs works and how it may be used.
>> Includes an example for x86 (32-bit).
>>
>> v3: - call out BPF <-> Berkeley Packet Filter (rdunlap@xenotime.net)
>>     - document use of tentative always-unprivileged
>>     - guard sample compilation for i386 and x86_64
>> v2: - move code to samples (corbet@lwn.net)
>>
>> Signed-off-by: Will Drewry <wad@chromium.org>
>> ---
>>  Documentation/prctl/seccomp_filter.txt |   94 ++++++++++++++++++++++++++++++++
>>  samples/Makefile                       |    2 +-
>>  samples/seccomp/Makefile               |   18 ++++++
>>  samples/seccomp/bpf-example.c          |   74 +++++++++++++++++++++++++
>>  4 files changed, 187 insertions(+), 1 deletions(-)
>>  create mode 100644 Documentation/prctl/seccomp_filter.txt
>>  create mode 100644 samples/seccomp/Makefile
>>  create mode 100644 samples/seccomp/bpf-example.c
>>
>> diff --git a/Documentation/prctl/seccomp_filter.txt b/Documentation/prctl/seccomp_filter.txt
>> new file mode 100644
>> index 0000000..2db8b89
>> --- /dev/null
>> +++ b/Documentation/prctl/seccomp_filter.txt
>> @@ -0,0 +1,94 @@
>> +             Seccomp filtering
>> +             =================
>> +
>> +Introduction
>> +------------
>> +
>> +A large number of system calls are exposed to every userland process
>> +with many of them going unused for the entire lifetime of the process.
>> +As system calls change and mature, bugs are found and eradicated.  A
>> +certain subset of userland applications benefit by having a reduced set
>> +of available system calls.  The resulting set reduces the total kernel
>> +surface exposed to the application.  System call filtering is meant for
>> +use with those applications.
>> +
>> +Seccomp filtering provides a means for a process to specify a filter
>> +for incoming system calls.  The filter is expressed as a Berkeley Packet
>> +Filter (BPF) program, as with socket filters, except that the data
>> +operated on is the current user_regs_struct.  This allows for expressive
>> +filtering of system calls using the pre-existing system call ABI and
>> +using a filter program language with a long history of being exposed to
>> +userland.  Additionally, BPF makes it impossible for users of seccomp to
>> +fall prey to time-of-check-time-of-use (TOCTOU) attacks that are common
>> +in system call interposition frameworks because the evaluated data is
>> +solely register state just after system call entry.
>> +
>> +What it isn't
>> +-------------
>> +
>> +System call filtering isn't a sandbox.  It provides a clearly defined
>> +mechanism for minimizing the exposed kernel surface.  Beyond that,
>> +policy for logical behavior and information flow should be managed with
>> +a combinations of other system hardening techniques and, potentially, a
>
>     combination                                                         an
>
>> +LSM of your choosing.  Expressive, dynamic filters provide further options down
>> +this path (avoiding pathological sizes or selecting which of the multiplexed
>> +system calls in socketcall() is allowed, for instance) which could be
>> +construed, incorrectly, as a more complete sandboxing solution.
>> +
>> +Usage
>> +-----
>> +
>> +An additional seccomp mode is added, but they are not directly set by the
>> +consuming process.  The new mode, '2', is only available if
>> +CONFIG_SECCOMP_FILTER is set and enabled using prctl with the
>> +PR_ATTACH_SECCOMP_FILTER argument.
>> +
>> +Interacting with seccomp filters is done using one prctl(2) call.
>> +
>> +PR_ATTACH_SECCOMP_FILTER:
>> +     Allows the specification of a new filter using a BPF program.
>> +     The BPF program will be executed over a user_regs_struct data
>> +     reflecting system call time except with the system call number
>> +     resident in orig_[register].  To allow a system call, the size
>> +     of the data must be returned.  At present, all other return values
>> +     result in the system call being blocked, but it is recommended to
>> +     return 0 in those cases.  This will allow for future custom return
>> +     values to be introduced, if ever desired.
>> +
>> +     Usage:
>> +             prctl(PR_ATTACH_SECCOMP_FILTER, prog);
>> +
>> +     The 'prog' argument is a pointer to a struct sock_fprog which will
>> +     contain the filter program.  If the program is invalid, the call
>> +     will return -1 and set errno to -EINVAL.
>
>                                        EINVAL.
> (I think)
>
>> +
>> +     The struct user_regs_struct the @prog will see is based on the
>> +     personality of the task at the time of this prctl call.  Additionally,
>> +     is_compat_task is also tracked for the @prog.  This means that once set
>> +     the calling task will have all of its system calls blocked if it
>> +     switches its system call ABI (via personality or other means).
>> +
>> +     If fork/clone and execve are allowed by @prog, any child processes will
>> +     be constrained to the same filters and syscal call ABI as the parent.
>
>                                               syscall
>
>> +
>> +     When called from an unprivileged process (lacking CAP_SYS_ADMIN), the
>> +     "always_unprivileged" bit is enabled for the process.
>> +
>> +     Additionally, if prctl(2) is allowed by the attached filter,
>> +     additional filters may be layered on which will increase evaluation
>> +     time, but allow for further decreasing the attack surface during
>> +     execution of a process.
>> +
>> +The above call returns 0 on success and non-zero on error.
>> +
>> +Example
>> +-------
>> +
>> +samples/seccomp-bpf-example.c shows an example process that allows read from stdin,
>
>   samples/seccomp/bpf-example.c
>
>> +write to stdout/err, exit and signal returns for 32-bit x86.
>
>                  /stderr,
>

Thanks for the close reading! I've got another patchset mostly rolled
and I'll pull these in too.
will

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-15 22:07                     ` Andrew Lutomirski
@ 2012-01-16  2:04                         ` Will Drewry
  2012-01-16  2:41                         ` Casey Schaufler
  1 sibling, 0 replies; 47+ messages in thread
From: Will Drewry @ 2012-01-16  2:04 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Casey Schaufler, Linus Torvalds, Jamie Lokier, linux-kernel,
	keescook, john.johansen, serge.hallyn, coreyb, pmoore, eparis,
	djm, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro,
	mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak,
	eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel,
	linux-security-module, olofj, mhalcrow, dlaor, corbet, alan

On Sun, Jan 15, 2012 at 4:07 PM, Andrew Lutomirski <luto@mit.edu> wrote:
> On Sun, Jan 15, 2012 at 1:32 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>> On 1/15/2012 12:59 PM, Andrew Lutomirski wrote:
>>>
>>> On Sun, Jan 15, 2012 at 12:16 PM, Casey Schaufler
>>> <casey@schaufler-ca.com>  wrote:
>>>>
>>>> On 1/14/2012 12:22 PM, Linus Torvalds wrote:
>>>>>
>>>>> And yes, I really seriously do believe that is both safer and simpler
>>>>> than some model that says "you can drop stuff", and then you have to
>>>>> start making up rules for what "dropping" means.
>>>>>
>>>>> Does "dropping" mean allowing setuid(geteuid()) for example? That *is*
>>>>> dropping the uid in a _POSIX_SAVED_IDS environment. And I'm saying
>>>>> that no, we should not even allow that. It's simply all too "subtle".
>>>>
>>>>
>>>> I am casting my two cents worth behind Linus. Dropping
>>>> privilege can be every bit as dangerous as granting privilege
>>>> in the real world of atrocious user land code. Especially in
>>>> the case of security policy enforcing user land code.
>>>
>>> Can you think of *any* plausible attack that is possible with my patch
>>> (i.e. no_new_privs allows setuid, setresuid, and capset) that would be
>>> prevented or even mitigated if I blocked those syscalls?  I can't.
>>> (The sendmail-style attack is impossible with no_new_privs.)
>>
>>
>> I am notoriously bad at coming up with this sort of example.
>> I will try, I may not hit the mark, but it should be close.
>>
>> The application is running with saved uid != euid when
>> no-new-privs is set. It execs a new binary, which keeps
>> the saved and effective uids. The program calls setreuid,
>> which succeeds. It opens the saved userid's files.
>
> If you don't trust that binary, then why are you execing it with saved
> uid != euid in the first place?  If you are setting no_new_privs, then
> you are new code and should have at least some basic awareness of the
> semantics.  The exact same "exploit" is possible if you have
> CAP_DAC_OVERRIDE with either no_new_privs semantics -- if you have a
> privilege and you run untrusted code, then you had better remove that
> privilege somehow for the untrusted code.
>
> IOW, *drop privileges if you are a sandbox*.  Otherwise you're screwed
> with or without no_new_privs.

One consideration could be to add do_exit()s at known DAC transitions
(set*id, fcaps). I don't know if that'd be wise, but it would remove
some described ambiguity.  The same could be done with exec when the
(e)uid/gid/fcaps change.  However, none of that helps with the opaque
LSM data, so that'd have to be left up to the LSMs and the LSM_* flag
you've added.

This does seem subject to bitrot though, and your current approach
certainly suits the needs of the general
don't-do-something-stupid-on-exec that seccomp filter, unprivileged
chroot, unshare, etc all need (which is really nice).

> Another way of saying this is: no_new_privs is not a sandbox.  It's
> just a way to make it safe for sandboxes and other such weird things
> processes can do to themselves safe across execve.  If you want a
> sandbox, use seccomp mode 2, which will require you to set
> no_new_privs.

And even with system call filtering, you'll need to have appropriately
set up accessible files, file descriptors, memory maps, etc if you
want a proper sandbox.  There's no single sandbox-equivalent kernel
feature.  no_new_privs would be an excellent addition to the sandbox
toolkit. (I've tried to synthesize it in userspace with an empty
capability bounding set, SECURE_NOROOT, and MNT_NOSUID.  Having this
in a future-proof task_struct bit form would be amazing.)

cheers!
will

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
@ 2012-01-16  2:04                         ` Will Drewry
  0 siblings, 0 replies; 47+ messages in thread
From: Will Drewry @ 2012-01-16  2:04 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Casey Schaufler, Linus Torvalds, Jamie Lokier, linux-kernel,
	keescook, john.johansen, serge.hallyn, coreyb, pmoore, eparis,
	djm, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro,
	mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak,
	eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel,
	linux-security-module, olofj, mhalcrow, dlaor, corbet, alan

On Sun, Jan 15, 2012 at 4:07 PM, Andrew Lutomirski <luto@mit.edu> wrote:
> On Sun, Jan 15, 2012 at 1:32 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>> On 1/15/2012 12:59 PM, Andrew Lutomirski wrote:
>>>
>>> On Sun, Jan 15, 2012 at 12:16 PM, Casey Schaufler
>>> <casey@schaufler-ca.com>  wrote:
>>>>
>>>> On 1/14/2012 12:22 PM, Linus Torvalds wrote:
>>>>>
>>>>> And yes, I really seriously do believe that is both safer and simpler
>>>>> than some model that says "you can drop stuff", and then you have to
>>>>> start making up rules for what "dropping" means.
>>>>>
>>>>> Does "dropping" mean allowing setuid(geteuid()) for example? That *is*
>>>>> dropping the uid in a _POSIX_SAVED_IDS environment. And I'm saying
>>>>> that no, we should not even allow that. It's simply all too "subtle".
>>>>
>>>>
>>>> I am casting my two cents worth behind Linus. Dropping
>>>> privilege can be every bit as dangerous as granting privilege
>>>> in the real world of atrocious user land code. Especially in
>>>> the case of security policy enforcing user land code.
>>>
>>> Can you think of *any* plausible attack that is possible with my patch
>>> (i.e. no_new_privs allows setuid, setresuid, and capset) that would be
>>> prevented or even mitigated if I blocked those syscalls?  I can't.
>>> (The sendmail-style attack is impossible with no_new_privs.)
>>
>>
>> I am notoriously bad at coming up with this sort of example.
>> I will try, I may not hit the mark, but it should be close.
>>
>> The application is running with saved uid != euid when
>> no-new-privs is set. It execs a new binary, which keeps
>> the saved and effective uids. The program calls setreuid,
>> which succeeds. It opens the saved userid's files.
>
> If you don't trust that binary, then why are you execing it with saved
> uid != euid in the first place?  If you are setting no_new_privs, then
> you are new code and should have at least some basic awareness of the
> semantics.  The exact same "exploit" is possible if you have
> CAP_DAC_OVERRIDE with either no_new_privs semantics -- if you have a
> privilege and you run untrusted code, then you had better remove that
> privilege somehow for the untrusted code.
>
> IOW, *drop privileges if you are a sandbox*.  Otherwise you're screwed
> with or without no_new_privs.

One consideration could be to add do_exit()s at known DAC transitions
(set*id, fcaps). I don't know if that'd be wise, but it would remove
some described ambiguity.  The same could be done with exec when the
(e)uid/gid/fcaps change.  However, none of that helps with the opaque
LSM data, so that'd have to be left up to the LSMs and the LSM_* flag
you've added.

This does seem subject to bitrot though, and your current approach
certainly suits the needs of the general
don't-do-something-stupid-on-exec that seccomp filter, unprivileged
chroot, unshare, etc all need (which is really nice).

> Another way of saying this is: no_new_privs is not a sandbox.  It's
> just a way to make it safe for sandboxes and other such weird things
> processes can do to themselves safe across execve.  If you want a
> sandbox, use seccomp mode 2, which will require you to set
> no_new_privs.

And even with system call filtering, you'll need to have appropriately
set up accessible files, file descriptors, memory maps, etc if you
want a proper sandbox.  There's no single sandbox-equivalent kernel
feature.  no_new_privs would be an excellent addition to the sandbox
toolkit. (I've tried to synthesize it in userspace with an empty
capability bounding set, SECURE_NOROOT, and MNT_NOSUID.  Having this
in a future-proof task_struct bit form would be amazing.)

cheers!
will
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-15 22:07                     ` Andrew Lutomirski
@ 2012-01-16  2:41                         ` Casey Schaufler
  2012-01-16  2:41                         ` Casey Schaufler
  1 sibling, 0 replies; 47+ messages in thread
From: Casey Schaufler @ 2012-01-16  2:41 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Linus Torvalds, Jamie Lokier, Will Drewry, linux-kernel,
	keescook, john.johansen, serge.hallyn, coreyb, pmoore, eparis,
	djm, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro,
	mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak,
	eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel,
	linux-security-module, olofj, mhalcrow, dlaor, corbet, alan,
	Casey Schaufler

On 1/15/2012 2:07 PM, Andrew Lutomirski wrote:
> On Sun, Jan 15, 2012 at 1:32 PM, Casey Schaufler<casey@schaufler-ca.com>  wrote:
>> On 1/15/2012 12:59 PM, Andrew Lutomirski wrote:
>>> On Sun, Jan 15, 2012 at 12:16 PM, Casey Schaufler
>>> <casey@schaufler-ca.com>    wrote:
>>>> On 1/14/2012 12:22 PM, Linus Torvalds wrote:
>>>>> And yes, I really seriously do believe that is both safer and simpler
>>>>> than some model that says "you can drop stuff", and then you have to
>>>>> start making up rules for what "dropping" means.
>>>>>
>>>>> Does "dropping" mean allowing setuid(geteuid()) for example? That *is*
>>>>> dropping the uid in a _POSIX_SAVED_IDS environment. And I'm saying
>>>>> that no, we should not even allow that. It's simply all too "subtle".
>>>>
>>>> I am casting my two cents worth behind Linus. Dropping
>>>> privilege can be every bit as dangerous as granting privilege
>>>> in the real world of atrocious user land code. Especially in
>>>> the case of security policy enforcing user land code.
>>> Can you think of *any* plausible attack that is possible with my patch
>>> (i.e. no_new_privs allows setuid, setresuid, and capset) that would be
>>> prevented or even mitigated if I blocked those syscalls?  I can't.
>>> (The sendmail-style attack is impossible with no_new_privs.)
>>
>> I am notoriously bad at coming up with this sort of example.
>> I will try, I may not hit the mark, but it should be close.
>>
>> The application is running with saved uid != euid when
>> no-new-privs is set. It execs a new binary, which keeps
>> the saved and effective uids. The program calls setreuid,
>> which succeeds. It opens the saved userid's files.
> If you don't trust that binary, then why are you execing it with saved
> uid != euid in the first place?

If I could trust the binary I wouldn't need your no_new_privs
semantics in the first place. Do you have any idea how big the
chrome browser binary is? You can't link it on a 32bit machine
it uses so much address space. On top of that, most modern
applications are compositions of scripts and interpreters built
on top of multiple layers of middleware. Of course I don't trust
the binary!

> If you are setting no_new_privs, then
> you are new code and should have at least some basic awareness of the
> semantics.

It's not the program setting no_new_privs that I'm worried about.
It's the nth descendant of that program and its ancestors that are
going to do screwy things.

> The exact same "exploit" is possible if you have
> CAP_DAC_OVERRIDE with either no_new_privs semantics -- if you have a
> privilege and you run untrusted code, then you had better remove that
> privilege somehow for the untrusted code.

Yes, and my very own name is engraved on the security wall of shame
for the classic sendmail capabilities exploit. Don't think for a
minute that something won't get done just because its obviously
inappropriate.

> IOW, *drop privileges if you are a sandbox*.  Otherwise you're screwed
> with or without no_new_privs.
>
>
> Another way of saying this is: no_new_privs is not a sandbox.  It's
> just a way to make it safe for sandboxes and other such weird things
> processes can do to themselves safe across execve.  If you want a
> sandbox, use seccomp mode 2, which will require you to set
> no_new_privs.
>
>
> --Andy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
@ 2012-01-16  2:41                         ` Casey Schaufler
  0 siblings, 0 replies; 47+ messages in thread
From: Casey Schaufler @ 2012-01-16  2:41 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Linus Torvalds, Jamie Lokier, Will Drewry, linux-kernel,
	keescook, john.johansen, serge.hallyn, coreyb, pmoore, eparis,
	djm, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro,
	mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak,
	eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel,
	linux-security-module, olofj, mhalcrow, dlaor, corbet, alan,
	Casey Schaufler

On 1/15/2012 2:07 PM, Andrew Lutomirski wrote:
> On Sun, Jan 15, 2012 at 1:32 PM, Casey Schaufler<casey@schaufler-ca.com>  wrote:
>> On 1/15/2012 12:59 PM, Andrew Lutomirski wrote:
>>> On Sun, Jan 15, 2012 at 12:16 PM, Casey Schaufler
>>> <casey@schaufler-ca.com>    wrote:
>>>> On 1/14/2012 12:22 PM, Linus Torvalds wrote:
>>>>> And yes, I really seriously do believe that is both safer and simpler
>>>>> than some model that says "you can drop stuff", and then you have to
>>>>> start making up rules for what "dropping" means.
>>>>>
>>>>> Does "dropping" mean allowing setuid(geteuid()) for example? That *is*
>>>>> dropping the uid in a _POSIX_SAVED_IDS environment. And I'm saying
>>>>> that no, we should not even allow that. It's simply all too "subtle".
>>>>
>>>> I am casting my two cents worth behind Linus. Dropping
>>>> privilege can be every bit as dangerous as granting privilege
>>>> in the real world of atrocious user land code. Especially in
>>>> the case of security policy enforcing user land code.
>>> Can you think of *any* plausible attack that is possible with my patch
>>> (i.e. no_new_privs allows setuid, setresuid, and capset) that would be
>>> prevented or even mitigated if I blocked those syscalls?  I can't.
>>> (The sendmail-style attack is impossible with no_new_privs.)
>>
>> I am notoriously bad at coming up with this sort of example.
>> I will try, I may not hit the mark, but it should be close.
>>
>> The application is running with saved uid != euid when
>> no-new-privs is set. It execs a new binary, which keeps
>> the saved and effective uids. The program calls setreuid,
>> which succeeds. It opens the saved userid's files.
> If you don't trust that binary, then why are you execing it with saved
> uid != euid in the first place?

If I could trust the binary I wouldn't need your no_new_privs
semantics in the first place. Do you have any idea how big the
chrome browser binary is? You can't link it on a 32bit machine
it uses so much address space. On top of that, most modern
applications are compositions of scripts and interpreters built
on top of multiple layers of middleware. Of course I don't trust
the binary!

> If you are setting no_new_privs, then
> you are new code and should have at least some basic awareness of the
> semantics.

It's not the program setting no_new_privs that I'm worried about.
It's the nth descendant of that program and its ancestors that are
going to do screwy things.

> The exact same "exploit" is possible if you have
> CAP_DAC_OVERRIDE with either no_new_privs semantics -- if you have a
> privilege and you run untrusted code, then you had better remove that
> privilege somehow for the untrusted code.

Yes, and my very own name is engraved on the security wall of shame
for the classic sendmail capabilities exploit. Don't think for a
minute that something won't get done just because its obviously
inappropriate.

> IOW, *drop privileges if you are a sandbox*.  Otherwise you're screwed
> with or without no_new_privs.
>
>
> Another way of saying this is: no_new_privs is not a sandbox.  It's
> just a way to make it safe for sandboxes and other such weird things
> processes can do to themselves safe across execve.  If you want a
> sandbox, use seccomp mode 2, which will require you to set
> no_new_privs.
>
>
> --Andy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-16  2:41                         ` Casey Schaufler
  (?)
@ 2012-01-16  7:45                         ` Andrew Lutomirski
  2012-01-16 18:02                             ` Casey Schaufler
  -1 siblings, 1 reply; 47+ messages in thread
From: Andrew Lutomirski @ 2012-01-16  7:45 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Linus Torvalds, Jamie Lokier, Will Drewry, linux-kernel,
	keescook, john.johansen, serge.hallyn, coreyb, pmoore, eparis,
	djm, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro,
	mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak,
	eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel,
	linux-security-module, olofj, mhalcrow, dlaor, corbet, alan

On Sun, Jan 15, 2012 at 6:41 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 1/15/2012 2:07 PM, Andrew Lutomirski wrote:
>>
>> On Sun, Jan 15, 2012 at 1:32 PM, Casey Schaufler<casey@schaufler-ca.com>
>>
>> If you don't trust that binary, then why are you execing it with saved
>> uid != euid in the first place?
>
>
> If I could trust the binary I wouldn't need your no_new_privs
> semantics in the first place. Do you have any idea how big the
> chrome browser binary is? You can't link it on a 32bit machine
> it uses so much address space. On top of that, most modern
> applications are compositions of scripts and interpreters built
> on top of multiple layers of middleware. Of course I don't trust
> the binary!
>

I'm not sure we're really talking about the same thing here.  I agree
that, if you are trying to sandbox untrusted code, then you probably
don't want that code messing with setuid, capset, or any other
privilege-changing operation.

no_new_privs is not intended to be that sandbox.  It is, by itself, at
best a small reduction in attack surface.

The attack surface accessible to a program (e.g. chrome) that you run
normally is huge.  There is filesystem access, ptrace, unix sockets,
any available privileges, setuid programs, /proc, etc.  LSMs try to
characterize and control that whole attack surface.  seccomp mode 2
allows a whitelisting approach in which everything is denied except
that which is explicitly allowed (and I think that's a much better
approach to sandboxing things).  The problem is that seccomp mode 2,
as well as anything else that changes the behavior of syscalls in a
nonstandard way (chroot, unshare, etc), can cause existing code to
malfunction.  That's how the sendmail bug came to be -- dropping a
privilege made sendmail do the wrong thing.  This type of attack works
by changing something that persists across a *gain* of privilege and
then attacking the code that gains that privilege.  If new things like
seccomp mode 2 require no_new_privs, then that entire class of attacks
is prevented.

In answer to your specific example, if you are trying to sandbox
chrome or anything else and you forget to drop your privileged saved
uid, I really don't think it's no_new_privs's job to rescue you.

--Andy

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-16  7:45                         ` Andrew Lutomirski
@ 2012-01-16 18:02                             ` Casey Schaufler
  0 siblings, 0 replies; 47+ messages in thread
From: Casey Schaufler @ 2012-01-16 18:02 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Linus Torvalds, Jamie Lokier, Will Drewry, linux-kernel,
	keescook, john.johansen, serge.hallyn, coreyb, pmoore, eparis,
	djm, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro,
	mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak,
	eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel,
	linux-security-module, olofj, mhalcrow, dlaor, corbet, alan,
	Casey Schaufler

On 1/15/2012 11:45 PM, Andrew Lutomirski wrote:
> On Sun, Jan 15, 2012 at 6:41 PM, Casey Schaufler<casey@schaufler-ca.com>  wrote:
>> On 1/15/2012 2:07 PM, Andrew Lutomirski wrote:
>>> On Sun, Jan 15, 2012 at 1:32 PM, Casey Schaufler<casey@schaufler-ca.com>
>>>
>>> If you don't trust that binary, then why are you execing it with saved
>>> uid != euid in the first place?
>>
>> If I could trust the binary I wouldn't need your no_new_privs
>> semantics in the first place. Do you have any idea how big the
>> chrome browser binary is? You can't link it on a 32bit machine
>> it uses so much address space. On top of that, most modern
>> applications are compositions of scripts and interpreters built
>> on top of multiple layers of middleware. Of course I don't trust
>> the binary!
>>
> I'm not sure we're really talking about the same thing here.  I agree
> that, if you are trying to sandbox untrusted code, then you probably
> don't want that code messing with setuid, capset, or any other
> privilege-changing operation.

My point is that if you're interacting with untrusted code it's
somewhat dangerous to change system call behavior and if you're
dealing strictly with trusted code you don't need to change
system call behavior.

> no_new_privs is not intended to be that sandbox.  It is, by itself, at
> best a small reduction in attack surface.

My preference is that we not complicate an environment that has
already proven too difficult for many programmers to use effectively.

> The attack surface accessible to a program (e.g. chrome) that you run
> normally is huge.  There is filesystem access, ptrace, unix sockets,
> any available privileges, setuid programs, /proc, etc.  LSMs try to
> characterize and control that whole attack surface.  seccomp mode 2
> allows a whitelisting approach in which everything is denied except
> that which is explicitly allowed (and I think that's a much better
> approach to sandboxing things).  The problem is that seccomp mode 2,
> as well as anything else that changes the behavior of syscalls in a
> nonstandard way (chroot, unshare, etc), can cause existing code to
> malfunction.  That's how the sendmail bug came to be -- dropping a
> privilege made sendmail do the wrong thing.  This type of attack works
> by changing something that persists across a *gain* of privilege and
> then attacking the code that gains that privilege.  If new things like
> seccomp mode 2 require no_new_privs, then that entire class of attacks
> is prevented.

Believe me, I am familiar with programs, trusted and otherwise,
that go wonky when there are little changes in the behavior of
syscalls. My take is that making security decisions based on the
system call used rather than on the thing being accessed leads
the programmer to expect behavior that does not always match the
underlying implementation. Yes, system call based controls are
easier to use and understand. That does not make them correct.

> In answer to your specific example, if you are trying to sandbox
> chrome or anything else and you forget to drop your privileged saved
> uid, I really don't think it's no_new_privs's job to rescue you.

As much as I dislike the modern application paradigms, I don't
see a lot of point in investing in new security facilities that
are not of value in that arena. And I really mean of value, not
just something that some "architect" threw onto a spreadsheet for
the PM to track. I expect that we're overdue for a mindset changing
sort of facility to address the issues that really matter.

Patches for that to follow, but not any time soon. :-)

>
> --Andy
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
@ 2012-01-16 18:02                             ` Casey Schaufler
  0 siblings, 0 replies; 47+ messages in thread
From: Casey Schaufler @ 2012-01-16 18:02 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Linus Torvalds, Jamie Lokier, Will Drewry, linux-kernel,
	keescook, john.johansen, serge.hallyn, coreyb, pmoore, eparis,
	djm, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro,
	mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak,
	eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel,
	linux-security-module, olofj, mhalcrow, dlaor, corbet, alan,
	Casey Schaufler

On 1/15/2012 11:45 PM, Andrew Lutomirski wrote:
> On Sun, Jan 15, 2012 at 6:41 PM, Casey Schaufler<casey@schaufler-ca.com>  wrote:
>> On 1/15/2012 2:07 PM, Andrew Lutomirski wrote:
>>> On Sun, Jan 15, 2012 at 1:32 PM, Casey Schaufler<casey@schaufler-ca.com>
>>>
>>> If you don't trust that binary, then why are you execing it with saved
>>> uid != euid in the first place?
>>
>> If I could trust the binary I wouldn't need your no_new_privs
>> semantics in the first place. Do you have any idea how big the
>> chrome browser binary is? You can't link it on a 32bit machine
>> it uses so much address space. On top of that, most modern
>> applications are compositions of scripts and interpreters built
>> on top of multiple layers of middleware. Of course I don't trust
>> the binary!
>>
> I'm not sure we're really talking about the same thing here.  I agree
> that, if you are trying to sandbox untrusted code, then you probably
> don't want that code messing with setuid, capset, or any other
> privilege-changing operation.

My point is that if you're interacting with untrusted code it's
somewhat dangerous to change system call behavior and if you're
dealing strictly with trusted code you don't need to change
system call behavior.

> no_new_privs is not intended to be that sandbox.  It is, by itself, at
> best a small reduction in attack surface.

My preference is that we not complicate an environment that has
already proven too difficult for many programmers to use effectively.

> The attack surface accessible to a program (e.g. chrome) that you run
> normally is huge.  There is filesystem access, ptrace, unix sockets,
> any available privileges, setuid programs, /proc, etc.  LSMs try to
> characterize and control that whole attack surface.  seccomp mode 2
> allows a whitelisting approach in which everything is denied except
> that which is explicitly allowed (and I think that's a much better
> approach to sandboxing things).  The problem is that seccomp mode 2,
> as well as anything else that changes the behavior of syscalls in a
> nonstandard way (chroot, unshare, etc), can cause existing code to
> malfunction.  That's how the sendmail bug came to be -- dropping a
> privilege made sendmail do the wrong thing.  This type of attack works
> by changing something that persists across a *gain* of privilege and
> then attacking the code that gains that privilege.  If new things like
> seccomp mode 2 require no_new_privs, then that entire class of attacks
> is prevented.

Believe me, I am familiar with programs, trusted and otherwise,
that go wonky when there are little changes in the behavior of
syscalls. My take is that making security decisions based on the
system call used rather than on the thing being accessed leads
the programmer to expect behavior that does not always match the
underlying implementation. Yes, system call based controls are
easier to use and understand. That does not make them correct.

> In answer to your specific example, if you are trying to sandbox
> chrome or anything else and you forget to drop your privileged saved
> uid, I really don't think it's no_new_privs's job to rescue you.

As much as I dislike the modern application paradigms, I don't
see a lot of point in investing in new security facilities that
are not of value in that arena. And I really mean of value, not
just something that some "architect" threw onto a spreadsheet for
the PM to track. I expect that we're overdue for a mindset changing
sort of facility to address the issues that really matter.

Patches for that to follow, but not any time soon. :-)

>
> --Andy
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 3/3] Documentation: prctl/seccomp_filter
  2012-01-12 23:38 ` [PATCH v3 3/3] Documentation: prctl/seccomp_filter Will Drewry
@ 2012-01-17 23:29     ` Eric Paris
  2012-01-17 23:29     ` Eric Paris
  1 sibling, 0 replies; 47+ messages in thread
From: Eric Paris @ 2012-01-17 23:29 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, keescook, john.johansen, serge.hallyn, coreyb,
	pmoore, eparis, djm, torvalds, segoon, rostedt, jmorris,
	scarybeasts, avi, penberg, viro, luto, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Thu, Jan 12, 2012 at 6:38 PM, Will Drewry <wad@chromium.org> wrote:

> +#ifndef PR_ATTACH_SECCOMP_FILTER
> +#      define PR_ATTACH_SECCOMP_FILTER 36
> +#endif


The example code here uses 36, but the real code I believe uses 37.
Can we get these in sync?  Thanks!

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 3/3] Documentation: prctl/seccomp_filter
@ 2012-01-17 23:29     ` Eric Paris
  0 siblings, 0 replies; 47+ messages in thread
From: Eric Paris @ 2012-01-17 23:29 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, keescook, john.johansen, serge.hallyn, coreyb,
	pmoore, eparis, djm, torvalds, segoon, rostedt, jmorris,
	scarybeasts, avi, penberg, viro, luto, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Thu, Jan 12, 2012 at 6:38 PM, Will Drewry <wad@chromium.org> wrote:

> +#ifndef PR_ATTACH_SECCOMP_FILTER
> +#      define PR_ATTACH_SECCOMP_FILTER 36
> +#endif


The example code here uses 36, but the real code I believe uses 37.
Can we get these in sync?  Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 3/3] Documentation: prctl/seccomp_filter
  2012-01-17 23:29     ` Eric Paris
  (?)
@ 2012-01-17 23:54     ` Will Drewry
  -1 siblings, 0 replies; 47+ messages in thread
From: Will Drewry @ 2012-01-17 23:54 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-kernel, keescook, john.johansen, serge.hallyn, coreyb,
	pmoore, eparis, djm, torvalds, segoon, rostedt, jmorris,
	scarybeasts, avi, penberg, viro, luto, mingo, akpm, khilman,
	borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh,
	dhowells, daniel.lezcano, linux-fsdevel, linux-security-module,
	olofj, mhalcrow, dlaor, corbet, alan

On Tue, Jan 17, 2012 at 5:29 PM, Eric Paris <eparis@parisplace.org> wrote:
> On Thu, Jan 12, 2012 at 6:38 PM, Will Drewry <wad@chromium.org> wrote:
>
>> +#ifndef PR_ATTACH_SECCOMP_FILTER
>> +#      define PR_ATTACH_SECCOMP_FILTER 36
>> +#endif
>
>
> The example code here uses 36, but the real code I believe uses 37.
> Can we get these in sync?  Thanks!

Nice - of course! [That is from before the PR_NO_NEW_PRIVS patch.]

thanks!

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
  2012-01-16  2:04                         ` Will Drewry
@ 2012-01-18  3:12                           ` Eric W. Biederman
  -1 siblings, 0 replies; 47+ messages in thread
From: Eric W. Biederman @ 2012-01-18  3:12 UTC (permalink / raw)
  To: Will Drewry
  Cc: Andrew Lutomirski, Casey Schaufler, Linus Torvalds, Jamie Lokier,
	linux-kernel, keescook, john.johansen, serge.hallyn, coreyb,
	pmoore, eparis, djm, segoon, rostedt, jmorris, scarybeasts, avi,
	penberg, viro, mingo, akpm, khilman, borislav.petkov, amwang,
	oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano,
	linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor,
	corbet, alan

Will Drewry <wad@chromium.org> writes:

> One consideration could be to add do_exit()s at known DAC transitions
> (set*id, fcaps). I don't know if that'd be wise, but it would remove
> some described ambiguity.  The same could be done with exec when the
> (e)uid/gid/fcaps change.  However, none of that helps with the opaque
> LSM data, so that'd have to be left up to the LSMs and the LSM_* flag
> you've added.

I went through and audited userspace recently and I could not find
anything that did not handle setuid failing.  It looks like kernel
developers are not the only ones who learned from the
sendmail/capabilities problem.

Eric



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch
@ 2012-01-18  3:12                           ` Eric W. Biederman
  0 siblings, 0 replies; 47+ messages in thread
From: Eric W. Biederman @ 2012-01-18  3:12 UTC (permalink / raw)
  To: Will Drewry
  Cc: Andrew Lutomirski, Casey Schaufler, Linus Torvalds, Jamie Lokier,
	linux-kernel, keescook, john.johansen, serge.hallyn, coreyb,
	pmoore, eparis, djm, segoon, rostedt, jmorris, scarybeasts, avi,
	penberg, viro, mingo, akpm, khilman, borislav.petkov, amwang,
	oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano,
	linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor,
	corbet, alan

Will Drewry <wad@chromium.org> writes:

> One consideration could be to add do_exit()s at known DAC transitions
> (set*id, fcaps). I don't know if that'd be wise, but it would remove
> some described ambiguity.  The same could be done with exec when the
> (e)uid/gid/fcaps change.  However, none of that helps with the opaque
> LSM data, so that'd have to be left up to the LSMs and the LSM_* flag
> you've added.

I went through and audited userspace recently and I could not find
anything that did not handle setuid failing.  It looks like kernel
developers are not the only ones who learned from the
sendmail/capabilities problem.

Eric



^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2012-01-18  3:12 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-12 23:38 [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch Will Drewry
2012-01-12 23:38 ` [PATCH v3 2/3] seccomp_filters: system call filtering using BPF Will Drewry
2012-01-13  0:51   ` Randy Dunlap
2012-01-12 23:59     ` Will Drewry
2012-01-12 23:59       ` Will Drewry
2012-01-13  1:35       ` Randy Dunlap
2012-01-13 17:39   ` Eric Paris
2012-01-13 18:50     ` Will Drewry
2012-01-13 18:50       ` Will Drewry
2012-01-12 23:38 ` [PATCH v3 3/3] Documentation: prctl/seccomp_filter Will Drewry
2012-01-15  1:52   ` Randy Dunlap
2012-01-16  1:41     ` Will Drewry
2012-01-17 23:29   ` Eric Paris
2012-01-17 23:29     ` Eric Paris
2012-01-17 23:54     ` Will Drewry
2012-01-12 23:47 ` [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch Linus Torvalds
2012-01-13  0:03   ` Will Drewry
2012-01-13  0:42   ` Andrew Lutomirski
2012-01-13  0:57     ` Linus Torvalds
2012-01-13  0:57       ` Linus Torvalds
2012-01-13  1:11       ` Andrew Lutomirski
2012-01-13  1:11         ` Andrew Lutomirski
2012-01-13  1:17         ` Linus Torvalds
2012-01-14 13:30           ` Jamie Lokier
2012-01-14 19:21             ` Will Drewry
2012-01-14 19:21               ` Will Drewry
2012-01-14 20:22             ` Linus Torvalds
2012-01-14 21:04               ` Andrew Lutomirski
2012-01-15 20:16               ` Casey Schaufler
2012-01-15 20:59                 ` Andrew Lutomirski
2012-01-15 21:32                   ` Casey Schaufler
2012-01-15 21:32                     ` Casey Schaufler
2012-01-15 22:07                     ` Andrew Lutomirski
2012-01-16  2:04                       ` Will Drewry
2012-01-16  2:04                         ` Will Drewry
2012-01-18  3:12                         ` Eric W. Biederman
2012-01-18  3:12                           ` Eric W. Biederman
2012-01-16  2:41                       ` Casey Schaufler
2012-01-16  2:41                         ` Casey Schaufler
2012-01-16  7:45                         ` Andrew Lutomirski
2012-01-16 18:02                           ` Casey Schaufler
2012-01-16 18:02                             ` Casey Schaufler
2012-01-13  1:37         ` Will Drewry
2012-01-13  1:41           ` Andrew Lutomirski
2012-01-13  1:41             ` Andrew Lutomirski
2012-01-13  2:09             ` Kees Cook
2012-01-13  2:09               ` Kees Cook

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.