linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* c/r of seccomp filters via underlying eBPF
@ 2015-09-04 16:04 Tycho Andersen
  2015-09-04 16:04 ` [PATCH 1/6] ebpf: add a seccomp program type Tycho Andersen
                   ` (5 more replies)
  0 siblings, 6 replies; 55+ messages in thread
From: Tycho Andersen @ 2015-09-04 16:04 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov,
	Serge E. Hallyn, Daniel Borkmann, linux-kernel, netdev

Hi all,

Here is a set that enables checkpoint restore of the underlying eBPF programs
that power seccomp filters via the API we discussed several months ago. A few
notes:

* We expose prog_id in the ebpf dump as the pointer to the ebpf program in
  kernel memory, since this is unique. I'm not sure if this is safe. The goal
  of exposing prog_id is to enable userspace to figure out how the seccomp
  filters are inherited (i.e. is the filter a result of a fork() or an
  identical filter installed). This is relevant because
  SECCOMP_FILTER_FLAG_TSYNC depends on this ancestry.

* I'm not sure what the fd argument to seccomp() should look like in the
  SECCOMP_FILTER_FLAG_EBPF case: it seems ugly either as &fd or fd; any
  thoughts on this are welcome.

* Please double check the first patch. I'm not too framiliar with eBPF, so this
  is mostly monkey see monkey do.

* The last patch is something I discovered while testing this set (I couldn't
  re-import some filters). I'm not sure if this is indicitave of a wider bug or
  if the fix is even correct but it Works For Me.

Thoughts welcome,

Tycho


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 1/6] ebpf: add a seccomp program type
  2015-09-04 16:04 c/r of seccomp filters via underlying eBPF Tycho Andersen
@ 2015-09-04 16:04 ` Tycho Andersen
  2015-09-04 20:17   ` Alexei Starovoitov
                     ` (2 more replies)
  2015-09-04 16:04 ` [PATCH 2/6] seccomp: make underlying bpf ref counted as well Tycho Andersen
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 55+ messages in thread
From: Tycho Andersen @ 2015-09-04 16:04 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov,
	Serge E. Hallyn, Daniel Borkmann, linux-kernel, netdev,
	Tycho Andersen

seccomp uses eBPF as its underlying storage and execution format, and eBPF
has features that seccomp would like to make use of in the future. This
patch adds a formal seccomp type to the eBPF verifier.

The current implementation of the seccomp eBPF type is very limited, and
doesn't support some interesting features (notably, maps) of eBPF. However,
the primary motivation for this patchset is to enable checkpoint/restore
for seccomp filters later in the series, to this limited feature set is ok
for now.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/uapi/linux/bpf.h |  1 +
 net/core/filter.c        | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 29ef6f9..79b825a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -122,6 +122,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_KPROBE,
 	BPF_PROG_TYPE_SCHED_CLS,
 	BPF_PROG_TYPE_SCHED_ACT,
+	BPF_PROG_TYPE_SECCOMP,
 };
 
 #define BPF_PSEUDO_MAP_FD	1
diff --git a/net/core/filter.c b/net/core/filter.c
index be3098f..ed339fa 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1466,6 +1466,39 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
 	}
 }
 
+static const struct bpf_func_proto *
+seccomp_func_proto(enum bpf_func_id func_id)
+{
+	/* Right now seccomp eBPF loading doesn't support maps; seccomp filters
+	 * are considered to be read-only after they're installed, so map fds
+	 * probably need to be invalidated when a seccomp filter with maps is
+	 * installed.
+	 *
+	 * The rest of these might be reasonable to call from seccomp, so we
+	 * export them.
+	 */
+	switch (func_id) {
+	case BPF_FUNC_ktime_get_ns:
+		return &bpf_ktime_get_ns_proto;
+	case BPF_FUNC_trace_printk:
+		return bpf_get_trace_printk_proto();
+	case BPF_FUNC_get_prandom_u32:
+		return &bpf_get_prandom_u32_proto;
+	case BPF_FUNC_get_smp_processor_id:
+		return &bpf_get_smp_processor_id_proto;
+	case BPF_FUNC_tail_call:
+		return &bpf_tail_call_proto;
+	case BPF_FUNC_get_current_pid_tgid:
+		return &bpf_get_current_pid_tgid_proto;
+	case BPF_FUNC_get_current_uid_gid:
+		return &bpf_get_current_uid_gid_proto;
+	case BPF_FUNC_get_current_comm:
+		return &bpf_get_current_comm_proto;
+	default:
+		return NULL;
+	}
+}
+
 static bool __is_valid_access(int off, int size, enum bpf_access_type type)
 {
 	/* check bounds */
@@ -1516,6 +1549,17 @@ static bool tc_cls_act_is_valid_access(int off, int size,
 	return __is_valid_access(off, size, type);
 }
 
+static bool seccomp_is_valid_access(int off, int size,
+				    enum bpf_access_type type)
+{
+	if (type == BPF_WRITE)
+		return false;
+
+	if (off < 0 || off >= sizeof(struct seccomp_data) || off & 3)
+		return false;
+
+	return true;
+}
 static u32 bpf_net_convert_ctx_access(enum bpf_access_type type, int dst_reg,
 				      int src_reg, int ctx_off,
 				      struct bpf_insn *insn_buf)
@@ -1630,6 +1674,45 @@ static u32 bpf_net_convert_ctx_access(enum bpf_access_type type, int dst_reg,
 	return insn - insn_buf;
 }
 
+static u32 seccomp_convert_ctx_access(enum bpf_access_type type, int dst_reg,
+				      int src_reg, int ctx_off,
+				      struct bpf_insn *insn_buf)
+{
+	struct bpf_insn *insn = insn_buf;
+
+	switch (ctx_off) {
+	case offsetof(struct seccomp_data, nr):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct seccomp_data, nr) != 4);
+
+		*insn++ = BPF_LDX_MEM(BPF_W, dst_reg, src_reg, ctx_off);
+		break;
+
+	case offsetof(struct seccomp_data, arch):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct seccomp_data, arch) != 4);
+
+		*insn++ = BPF_LDX_MEM(BPF_W, dst_reg, src_reg, ctx_off);
+		break;
+
+	case offsetof(struct seccomp_data, instruction_pointer):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct seccomp_data,
+					  instruction_pointer) != 8);
+
+		*insn++ = BPF_LDX_MEM(BPF_DW, dst_reg, src_reg, ctx_off);
+		break;
+
+	default:
+		if (ctx_off & 7 ||
+		    ctx_off < offsetof(struct seccomp_data, args))
+			return -EINVAL;
+
+		BUILD_BUG_ON(FIELD_SIZEOF(struct seccomp_data, args[0]) != 8);
+
+		*insn++ = BPF_LDX_MEM(BPF_DW, dst_reg, src_reg, ctx_off);
+	}
+
+	return insn - insn_buf;
+}
+
 static const struct bpf_verifier_ops sk_filter_ops = {
 	.get_func_proto = sk_filter_func_proto,
 	.is_valid_access = sk_filter_is_valid_access,
@@ -1642,6 +1725,12 @@ static const struct bpf_verifier_ops tc_cls_act_ops = {
 	.convert_ctx_access = bpf_net_convert_ctx_access,
 };
 
+static const struct bpf_verifier_ops seccomp_ops = {
+	.get_func_proto = seccomp_func_proto,
+	.is_valid_access = seccomp_is_valid_access,
+	.convert_ctx_access = seccomp_convert_ctx_access,
+};
+
 static struct bpf_prog_type_list sk_filter_type __read_mostly = {
 	.ops = &sk_filter_ops,
 	.type = BPF_PROG_TYPE_SOCKET_FILTER,
@@ -1657,11 +1746,17 @@ static struct bpf_prog_type_list sched_act_type __read_mostly = {
 	.type = BPF_PROG_TYPE_SCHED_ACT,
 };
 
+static struct bpf_prog_type_list seccomp_type __read_mostly = {
+	.ops = &seccomp_ops,
+	.type = BPF_PROG_TYPE_SECCOMP,
+};
+
 static int __init register_sk_filter_ops(void)
 {
 	bpf_register_prog_type(&sk_filter_type);
 	bpf_register_prog_type(&sched_cls_type);
 	bpf_register_prog_type(&sched_act_type);
+	bpf_register_prog_type(&seccomp_type);
 
 	return 0;
 }
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 2/6] seccomp: make underlying bpf ref counted as well
  2015-09-04 16:04 c/r of seccomp filters via underlying eBPF Tycho Andersen
  2015-09-04 16:04 ` [PATCH 1/6] ebpf: add a seccomp program type Tycho Andersen
@ 2015-09-04 16:04 ` Tycho Andersen
  2015-09-04 21:53   ` Andy Lutomirski
  2015-09-04 16:04 ` [PATCH 3/6] ebpf: add a way to dump an eBPF program Tycho Andersen
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 55+ messages in thread
From: Tycho Andersen @ 2015-09-04 16:04 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov,
	Serge E. Hallyn, Daniel Borkmann, linux-kernel, netdev,
	Tycho Andersen

In the next patch, we're going to add a way to access the underlying
filters via bpf fds. This means that we need to ref-count both the
struct seccomp_filter objects and the struct bpf_prog objects separately,
in case a process dies but a filter is still referred to by another
process.

Additionally, we mark classic converted seccomp filters as seccomp eBPF
programs, since they are a subset of what is supported in seccomp eBPF.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 kernel/seccomp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 5bd4779..acfe1fb 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -377,6 +377,8 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 	}
 
 	atomic_set(&sfilter->usage, 1);
+	atomic_set(&sfilter->prog->aux->refcnt, 1);
+	sfilter->prog->type = BPF_PROG_TYPE_SECCOMP;
 
 	return sfilter;
 }
@@ -469,7 +471,7 @@ void get_seccomp_filter(struct task_struct *tsk)
 static inline void seccomp_filter_free(struct seccomp_filter *filter)
 {
 	if (filter) {
-		bpf_prog_free(filter->prog);
+		bpf_prog_put(filter->prog);
 		kfree(filter);
 	}
 }
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-04 16:04 c/r of seccomp filters via underlying eBPF Tycho Andersen
  2015-09-04 16:04 ` [PATCH 1/6] ebpf: add a seccomp program type Tycho Andersen
  2015-09-04 16:04 ` [PATCH 2/6] seccomp: make underlying bpf ref counted as well Tycho Andersen
@ 2015-09-04 16:04 ` Tycho Andersen
  2015-09-04 20:17   ` Kees Cook
  2015-09-04 20:27   ` Alexei Starovoitov
  2015-09-04 16:04 ` [PATCH 4/6] seccomp: add a way to access filters via bpf fds Tycho Andersen
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 55+ messages in thread
From: Tycho Andersen @ 2015-09-04 16:04 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov,
	Serge E. Hallyn, Daniel Borkmann, linux-kernel, netdev,
	Tycho Andersen

This commit adds a way to dump eBPF programs. The initial implementation
doesn't support maps, and therefore only allows dumping seccomp ebpf
programs which themselves don't currently support maps.

We export the GPL bit as well as a unique ID for the program so that
userspace can detect when two seccomp filters were inherited from each
other and clone the filter tree accordingly.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/uapi/linux/bpf.h | 15 +++++++++++++++
 kernel/bpf/syscall.c     | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 79b825a..c5d8dc2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -107,6 +107,13 @@ enum bpf_cmd {
 	 * returns fd or negative error
 	 */
 	BPF_PROG_LOAD,
+
+	/* dump an existing bpf
+	 * err = bpf(BPF_PROG_DUMP, union bpf_attr *attr, u32 size)
+	 * Using attr->prog_fd, attr->dump_insn_cnt, attr->dump_insns
+	 * returns zero or negative error
+	 */
+	BPF_PROG_DUMP,
 };
 
 enum bpf_map_type {
@@ -160,6 +167,14 @@ union bpf_attr {
 		__aligned_u64	log_buf;	/* user supplied buffer */
 		__u32		kern_version;	/* checked when prog_type=kprobe */
 	};
+
+	struct { /* anonymous struct used by BPF_PROG_DUMP command */
+		__u32		prog_fd;
+		__u32		dump_insn_cnt;
+		__aligned_u64	dump_insns;	/* user supplied buffer */
+		__u8		gpl_compatible;
+		__u64		prog_id;	/* unique id for this prog */
+	};
 } __attribute__((aligned(8)));
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a1b14d1..ee580d0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -586,6 +586,47 @@ free_prog:
 	return err;
 }
 
+static int bpf_prog_dump(union bpf_attr *attr, union __user bpf_attr *uattr)
+{
+	int ufd = attr->prog_fd;
+	struct fd f = fdget(ufd);
+	struct bpf_prog *prog;
+	int ret = -EINVAL;
+
+	prog = get_prog(f);
+	if (IS_ERR(prog))
+		return PTR_ERR(prog);
+
+	/* For now, let's refuse to dump anything that isn't a seccomp program.
+	 * Other program types have support for maps, which our current dump
+	 * code doesn't support.
+	 */
+	if (prog->type != BPF_PROG_TYPE_SECCOMP)
+		goto out;
+
+	ret = -EFAULT;
+	if (put_user(prog->len, &uattr->dump_insn_cnt))
+		goto out;
+
+	if (put_user((u8) prog->gpl_compatible, &uattr->gpl_compatible))
+		goto out;
+
+	if (put_user((u64) prog, &uattr->prog_id))
+		goto out;
+
+	if (attr->dump_insns) {
+		u32 len = prog->len * sizeof(struct bpf_insn);
+
+		if (copy_to_user(u64_to_ptr(attr->dump_insns),
+				 prog->insns, len) != 0)
+			goto out;
+	}
+
+	ret = 0;
+out:
+	return ret;
+}
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	union bpf_attr attr = {};
@@ -650,6 +691,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_PROG_LOAD:
 		err = bpf_prog_load(&attr);
 		break;
+	case BPF_PROG_DUMP:
+		err = bpf_prog_dump(&attr, uattr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 4/6] seccomp: add a way to access filters via bpf fds
  2015-09-04 16:04 c/r of seccomp filters via underlying eBPF Tycho Andersen
                   ` (2 preceding siblings ...)
  2015-09-04 16:04 ` [PATCH 3/6] ebpf: add a way to dump an eBPF program Tycho Andersen
@ 2015-09-04 16:04 ` Tycho Andersen
  2015-09-04 20:26   ` Kees Cook
  2015-09-04 16:04 ` [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd Tycho Andersen
  2015-09-04 16:04 ` [PATCH 6/6] ebpf: allow BPF_REG_X in src_reg conditional jumps Tycho Andersen
  5 siblings, 1 reply; 55+ messages in thread
From: Tycho Andersen @ 2015-09-04 16:04 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov,
	Serge E. Hallyn, Daniel Borkmann, linux-kernel, netdev,
	Tycho Andersen

This patch adds a way for a process that is "real root" to access the
seccomp filters of another process. The process first does a
PTRACE_SECCOMP_GET_FILTER_FD to get an fd with that process' seccomp filter
attached, and then iterates on this with PTRACE_SECCOMP_NEXT_FILTER using
bpf(BPF_PROG_DUMP) to dump the actual program at each step.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h         | 12 ++++++++++
 include/linux/seccomp.h     | 14 +++++++++++
 include/uapi/linux/ptrace.h |  3 +++
 kernel/bpf/syscall.c        | 26 ++++++++++++++++++++-
 kernel/ptrace.c             |  7 ++++++
 kernel/seccomp.c            | 57 +++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4383476..30682dc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -157,6 +157,8 @@ void bpf_register_prog_type(struct bpf_prog_type_list *tl);
 void bpf_register_map_type(struct bpf_map_type_list *tl);
 
 struct bpf_prog *bpf_prog_get(u32 ufd);
+int bpf_prog_set(u32 ufd, struct bpf_prog *new);
+int bpf_new_fd(struct bpf_prog *prog, int flags);
 void bpf_prog_put(struct bpf_prog *prog);
 void bpf_prog_put_rcu(struct bpf_prog *prog);
 
@@ -175,6 +177,16 @@ static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 	return ERR_PTR(-EOPNOTSUPP);
 }
 
+static inline int bpf_prog_set(u32 ufd, struct bpf_prog *new)
+{
+	return -EINVAL;
+}
+
+static inline int bpf_new_fd(struct bpf_prog *prog, int flags)
+{
+	return -EINVAL;
+}
+
 static inline void bpf_prog_put(struct bpf_prog *prog)
 {
 }
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index f426503..d1a86ed 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -95,4 +95,18 @@ static inline void get_seccomp_filter(struct task_struct *tsk)
 	return;
 }
 #endif /* CONFIG_SECCOMP_FILTER */
+
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
+extern long seccomp_get_filter_fd(struct task_struct *child);
+extern long seccomp_next_filter(struct task_struct *child, u32 fd);
+#else
+static inline long seccomp_get_filter_fd(struct task_struct *child)
+{
+	return -EINVAL;
+}
+static inline long seccomp_next_filter(struct task_struct *child, u32 fd)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_SECCOMP_FILTER && CONFIG_CHECKPOINT_RESTORE */
 #endif /* _LINUX_SECCOMP_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index a7a6979..dfd7d2e 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -23,6 +23,9 @@
 
 #define PTRACE_SYSCALL		  24
 
+#define PTRACE_SECCOMP_GET_FILTER_FD	40
+#define PTRACE_SECCOMP_NEXT_FILTER	41
+
 /* 0x4200-0x4300 are reserved for architecture-independent additions.  */
 #define PTRACE_SETOPTIONS	0x4200
 #define PTRACE_GETEVENTMSG	0x4201
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ee580d0..58e7421 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -506,6 +506,30 @@ struct bpf_prog *bpf_prog_get(u32 ufd)
 }
 EXPORT_SYMBOL_GPL(bpf_prog_get);
 
+int bpf_prog_set(u32 ufd, struct bpf_prog *new)
+{
+	struct fd f;
+	struct bpf_prog *prog;
+
+	f = fdget(ufd);
+
+	prog = get_prog(f);
+	if (!IS_ERR(prog) && prog)
+		bpf_prog_put(prog);
+
+	atomic_inc(&new->aux->refcnt);
+	f.file->private_data = new;
+	fdput(f);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(bpf_prog_set);
+
+int bpf_new_fd(struct bpf_prog *prog, int flags)
+{
+	return anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog, flags);
+}
+EXPORT_SYMBOL_GPL(bpf_new_fd);
+
 /* last field in 'union bpf_attr' used by this command */
 #define	BPF_PROG_LOAD_LAST_FIELD kern_version
 
@@ -572,7 +596,7 @@ static int bpf_prog_load(union bpf_attr *attr)
 	if (err < 0)
 		goto free_used_maps;
 
-	err = anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog, O_RDWR | O_CLOEXEC);
+	err = bpf_new_fd(prog, O_RDWR | O_CLOEXEC);
 	if (err < 0)
 		/* failed to allocate fd */
 		goto free_used_maps;
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 787320d..4e4b534 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1016,6 +1016,13 @@ int ptrace_request(struct task_struct *child, long request,
 		break;
 	}
 #endif
+
+	case PTRACE_SECCOMP_GET_FILTER_FD:
+		return seccomp_get_filter_fd(child);
+
+	case PTRACE_SECCOMP_NEXT_FILTER:
+		return seccomp_next_filter(child, data);
+
 	default:
 		break;
 	}
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index acfe1fb..a2c5b32 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -26,6 +26,8 @@
 #endif
 
 #ifdef CONFIG_SECCOMP_FILTER
+#include <linux/bpf.h>
+#include <uapi/linux/bpf.h>
 #include <linux/filter.h>
 #include <linux/pid.h>
 #include <linux/ptrace.h>
@@ -814,6 +816,61 @@ static inline long seccomp_set_mode_filter(unsigned int flags,
 }
 #endif
 
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
+long seccomp_get_filter_fd(struct task_struct *child)
+{
+	long fd;
+	struct seccomp_filter *filter;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	if (child->seccomp.mode != SECCOMP_MODE_FILTER)
+		return -EINVAL;
+
+	filter = child->seccomp.filter;
+
+	fd = bpf_new_fd(filter->prog, O_RDONLY);
+	if (fd > 0)
+		atomic_inc(&filter->prog->aux->refcnt);
+
+	return fd;
+}
+
+long seccomp_next_filter(struct task_struct *child, u32 fd)
+{
+	struct seccomp_filter *cur;
+	struct bpf_prog *prog;
+	long ret = -ESRCH;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	if (child->seccomp.mode != SECCOMP_MODE_FILTER)
+		return -EINVAL;
+
+	prog = bpf_prog_get(fd);
+	if (IS_ERR(prog)) {
+		ret = PTR_ERR(prog);
+		goto out;
+	}
+
+	for (cur = child->seccomp.filter; cur; cur = cur->prev) {
+		if (cur->prog == prog) {
+			if (!cur->prev)
+				ret = -ENOENT;
+			else
+				ret = bpf_prog_set(fd, cur->prev->prog);
+			break;
+		}
+	}
+
+out:
+	bpf_prog_put(prog);
+	return ret;
+}
+#endif
+
 /* Common entry point for both prctl and syscall. */
 static long do_seccomp(unsigned int op, unsigned int flags,
 		       const char __user *uargs)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd
  2015-09-04 16:04 c/r of seccomp filters via underlying eBPF Tycho Andersen
                   ` (3 preceding siblings ...)
  2015-09-04 16:04 ` [PATCH 4/6] seccomp: add a way to access filters via bpf fds Tycho Andersen
@ 2015-09-04 16:04 ` Tycho Andersen
  2015-09-04 20:40   ` Alexei Starovoitov
  2015-09-04 20:41   ` Kees Cook
  2015-09-04 16:04 ` [PATCH 6/6] ebpf: allow BPF_REG_X in src_reg conditional jumps Tycho Andersen
  5 siblings, 2 replies; 55+ messages in thread
From: Tycho Andersen @ 2015-09-04 16:04 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov,
	Serge E. Hallyn, Daniel Borkmann, linux-kernel, netdev,
	Tycho Andersen

This is the final bit needed to support seccomp filters created via the bpf
syscall.

One concern with this patch is exactly what the interface should look like
for users, since seccomp()'s second argument is a pointer, we could ask
people to pass a pointer to the fd, but implies we might write to it which
seems impolite. Right now we cast the pointer (and force the user to cast
it), which generates ugly warnings. I'm not sure what the right answer is
here.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/seccomp.h      |  3 +-
 include/uapi/linux/seccomp.h |  1 +
 kernel/seccomp.c             | 70 ++++++++++++++++++++++++++++++++++++--------
 3 files changed, 61 insertions(+), 13 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index d1a86ed..a725dd5 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -3,7 +3,8 @@
 
 #include <uapi/linux/seccomp.h>
 
-#define SECCOMP_FILTER_FLAG_MASK	(SECCOMP_FILTER_FLAG_TSYNC)
+#define SECCOMP_FILTER_FLAG_MASK	(\
+	SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_EBPF)
 
 #ifdef CONFIG_SECCOMP
 
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 0f238a4..c29a423 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -16,6 +16,7 @@
 
 /* Valid flags for SECCOMP_SET_MODE_FILTER */
 #define SECCOMP_FILTER_FLAG_TSYNC	1
+#define SECCOMP_FILTER_FLAG_EBPF	(1 << 1)
 
 /*
  * All BPF programs must return a 32-bit value.
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index a2c5b32..9c6bea6 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -355,17 +355,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 
 	BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter));
 
-	/*
-	 * Installing a seccomp filter requires that the task has
-	 * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
-	 * This avoids scenarios where unprivileged tasks can affect the
-	 * behavior of privileged children.
-	 */
-	if (!task_no_new_privs(current) &&
-	    security_capable_noaudit(current_cred(), current_user_ns(),
-				     CAP_SYS_ADMIN) != 0)
-		return ERR_PTR(-EACCES);
-
 	/* Allocate a new seccomp_filter */
 	sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);
 	if (!sfilter)
@@ -509,6 +498,48 @@ static void seccomp_send_sigsys(int syscall, int reason)
 	info.si_syscall = syscall;
 	force_sig_info(SIGSYS, &info, current);
 }
+
+#ifdef CONFIG_BPF_SYSCALL
+static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
+{
+	/* XXX: this cast generates a warning. should we make people pass in
+	 * &fd, or is there some nicer way of doing this?
+	 */
+	u32 fd = (u32) filter;
+	struct seccomp_filter *ret;
+	struct bpf_prog *prog;
+
+	prog = bpf_prog_get(fd);
+	if (IS_ERR(prog))
+		return (struct seccomp_filter *) prog;
+
+	if (prog->type != BPF_PROG_TYPE_SECCOMP) {
+		bpf_prog_put(prog);
+		return ERR_PTR(-EINVAL);
+	}
+
+	ret = kzalloc(sizeof(*ret), GFP_KERNEL | __GFP_NOWARN);
+	if (!ret) {
+		bpf_prog_put(prog);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	ret->prog = prog;
+	atomic_set(&ret->usage, 1);
+
+	/* Intentionally don't bpf_prog_put() here, because the underlying prog
+	 * is refcounted too and we're holding a reference from the struct
+	 * seccomp_filter object.
+	 */
+
+	return ret;
+}
+#else
+static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
+{
+	return ERR_PTR(-EINVAL);
+}
+#endif
 #endif	/* CONFIG_SECCOMP_FILTER */
 
 /*
@@ -775,8 +806,23 @@ static long seccomp_set_mode_filter(unsigned int flags,
 	if (flags & ~SECCOMP_FILTER_FLAG_MASK)
 		return -EINVAL;
 
+	/*
+	 * Installing a seccomp filter requires that the task has
+	 * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
+	 * This avoids scenarios where unprivileged tasks can affect the
+	 * behavior of privileged children.
+	 */
+	if (!task_no_new_privs(current) &&
+	    security_capable_noaudit(current_cred(), current_user_ns(),
+				     CAP_SYS_ADMIN) != 0)
+		return -EACCES;
+
 	/* Prepare the new filter before holding any locks. */
-	prepared = seccomp_prepare_user_filter(filter);
+	if (flags & SECCOMP_FILTER_FLAG_EBPF)
+		prepared = seccomp_prepare_ebpf(filter);
+	else
+		prepared = seccomp_prepare_user_filter(filter);
+
 	if (IS_ERR(prepared))
 		return PTR_ERR(prepared);
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 6/6] ebpf: allow BPF_REG_X in src_reg conditional jumps
  2015-09-04 16:04 c/r of seccomp filters via underlying eBPF Tycho Andersen
                   ` (4 preceding siblings ...)
  2015-09-04 16:04 ` [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd Tycho Andersen
@ 2015-09-04 16:04 ` Tycho Andersen
  2015-09-04 21:06   ` Alexei Starovoitov
  5 siblings, 1 reply; 55+ messages in thread
From: Tycho Andersen @ 2015-09-04 16:04 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov,
	Serge E. Hallyn, Daniel Borkmann, linux-kernel, netdev,
	Tycho Andersen

The classic converter generates conditional jumps with:

if (BPF_SRC(fp->code) == BPF_K && (int) fp->k < 0) {
        ...
} else {
        insn->dst_reg = BPF_REG_A;
        insn->src_reg = BPF_REG_X;
        insn->imm = fp->k;
        bpf_src = BPF_SRC(fp->code);
}

but here, we enforce that the src_reg == BPF_REG_0. We should also allow
BPF_REG_X since that's what the converter generates; this enables us to
load eBPF programs that were generated by the converter.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 kernel/bpf/verifier.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 039d866..2fff8cd 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1080,7 +1080,12 @@ static int check_cond_jmp_op(struct verifier_env *env,
 		if (err)
 			return err;
 	} else {
-		if (insn->src_reg != BPF_REG_0) {
+		switch (insn->src_reg) {
+		case BPF_REG_0:
+		/* the classic converter generates BPF_JMP with src_reg of X */
+		case BPF_REG_X:
+			break;
+		default:
 			verbose("BPF_JMP uses reserved fields\n");
 			return -EINVAL;
 		}
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-04 16:04 ` [PATCH 3/6] ebpf: add a way to dump an eBPF program Tycho Andersen
@ 2015-09-04 20:17   ` Kees Cook
  2015-09-04 20:45     ` Tycho Andersen
  2015-09-04 20:27   ` Alexei Starovoitov
  1 sibling, 1 reply; 55+ messages in thread
From: Kees Cook @ 2015-09-04 20:17 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development

On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> This commit adds a way to dump eBPF programs. The initial implementation
> doesn't support maps, and therefore only allows dumping seccomp ebpf
> programs which themselves don't currently support maps.
>
> We export the GPL bit as well as a unique ID for the program so that

This unique ID appears to be the heap address for the prog. That's a
huge leak, and should not be done. We don't want to introduce new
kernel address leaks while we're trying to fix the remaining ones.
Shouldn't the "unique ID" be the fd itself? I imagine KCMP_FILE could
be used, for example.

-Kees

> userspace can detect when two seccomp filters were inherited from each
> other and clone the filter tree accordingly.
>
> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
> CC: Kees Cook <keescook@chromium.org>
> CC: Will Drewry <wad@chromium.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Pavel Emelyanov <xemul@parallels.com>
> CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> CC: Alexei Starovoitov <ast@kernel.org>
> CC: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  include/uapi/linux/bpf.h | 15 +++++++++++++++
>  kernel/bpf/syscall.c     | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 59 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 79b825a..c5d8dc2 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -107,6 +107,13 @@ enum bpf_cmd {
>          * returns fd or negative error
>          */
>         BPF_PROG_LOAD,
> +
> +       /* dump an existing bpf
> +        * err = bpf(BPF_PROG_DUMP, union bpf_attr *attr, u32 size)
> +        * Using attr->prog_fd, attr->dump_insn_cnt, attr->dump_insns
> +        * returns zero or negative error
> +        */
> +       BPF_PROG_DUMP,
>  };
>
>  enum bpf_map_type {
> @@ -160,6 +167,14 @@ union bpf_attr {
>                 __aligned_u64   log_buf;        /* user supplied buffer */
>                 __u32           kern_version;   /* checked when prog_type=kprobe */
>         };
> +
> +       struct { /* anonymous struct used by BPF_PROG_DUMP command */
> +               __u32           prog_fd;
> +               __u32           dump_insn_cnt;
> +               __aligned_u64   dump_insns;     /* user supplied buffer */
> +               __u8            gpl_compatible;
> +               __u64           prog_id;        /* unique id for this prog */
> +       };
>  } __attribute__((aligned(8)));
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index a1b14d1..ee580d0 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -586,6 +586,47 @@ free_prog:
>         return err;
>  }
>
> +static int bpf_prog_dump(union bpf_attr *attr, union __user bpf_attr *uattr)
> +{
> +       int ufd = attr->prog_fd;
> +       struct fd f = fdget(ufd);
> +       struct bpf_prog *prog;
> +       int ret = -EINVAL;
> +
> +       prog = get_prog(f);
> +       if (IS_ERR(prog))
> +               return PTR_ERR(prog);
> +
> +       /* For now, let's refuse to dump anything that isn't a seccomp program.
> +        * Other program types have support for maps, which our current dump
> +        * code doesn't support.
> +        */
> +       if (prog->type != BPF_PROG_TYPE_SECCOMP)
> +               goto out;
> +
> +       ret = -EFAULT;
> +       if (put_user(prog->len, &uattr->dump_insn_cnt))
> +               goto out;
> +
> +       if (put_user((u8) prog->gpl_compatible, &uattr->gpl_compatible))
> +               goto out;
> +
> +       if (put_user((u64) prog, &uattr->prog_id))
> +               goto out;
> +
> +       if (attr->dump_insns) {
> +               u32 len = prog->len * sizeof(struct bpf_insn);
> +
> +               if (copy_to_user(u64_to_ptr(attr->dump_insns),
> +                                prog->insns, len) != 0)
> +                       goto out;
> +       }
> +
> +       ret = 0;
> +out:
> +       return ret;
> +}
> +
>  SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
>  {
>         union bpf_attr attr = {};
> @@ -650,6 +691,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
>         case BPF_PROG_LOAD:
>                 err = bpf_prog_load(&attr);
>                 break;
> +       case BPF_PROG_DUMP:
> +               err = bpf_prog_dump(&attr, uattr);
> +               break;
>         default:
>                 err = -EINVAL;
>                 break;
> --
> 2.1.4
>



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/6] ebpf: add a seccomp program type
  2015-09-04 16:04 ` [PATCH 1/6] ebpf: add a seccomp program type Tycho Andersen
@ 2015-09-04 20:17   ` Alexei Starovoitov
  2015-09-04 21:09     ` Tycho Andersen
  2015-09-04 20:34   ` Kees Cook
  2015-09-04 21:50   ` Andy Lutomirski
  2 siblings, 1 reply; 55+ messages in thread
From: Alexei Starovoitov @ 2015-09-04 20:17 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, linux-kernel, netdev

On Fri, Sep 04, 2015 at 10:04:19AM -0600, Tycho Andersen wrote:
> seccomp uses eBPF as its underlying storage and execution format, and eBPF
> has features that seccomp would like to make use of in the future. This
> patch adds a formal seccomp type to the eBPF verifier.
> 
> The current implementation of the seccomp eBPF type is very limited, and
> doesn't support some interesting features (notably, maps) of eBPF. However,
> the primary motivation for this patchset is to enable checkpoint/restore
> for seccomp filters later in the series, to this limited feature set is ok
> for now.

yes. good compromise to start.

> +static const struct bpf_func_proto *
> +seccomp_func_proto(enum bpf_func_id func_id)
> +{
> +	/* Right now seccomp eBPF loading doesn't support maps; seccomp filters
> +	 * are considered to be read-only after they're installed, so map fds
> +	 * probably need to be invalidated when a seccomp filter with maps is
> +	 * installed.

Just disabling bpf_map_lookup/update() helpers (the way you did here)
is enough. The prorgram can still have references to maps, but since they
won't be accessed it's safe.

> +	 *
> +	 * The rest of these might be reasonable to call from seccomp, so we
> +	 * export them.
> +	 */
> +	switch (func_id) {
> +	case BPF_FUNC_ktime_get_ns:
> +		return &bpf_ktime_get_ns_proto;
> +	case BPF_FUNC_trace_printk:
> +		return bpf_get_trace_printk_proto();
> +	case BPF_FUNC_get_prandom_u32:
> +		return &bpf_get_prandom_u32_proto;
> +	case BPF_FUNC_get_smp_processor_id:
> +		return &bpf_get_smp_processor_id_proto;
> +	case BPF_FUNC_tail_call:
> +		return &bpf_tail_call_proto;
> +	case BPF_FUNC_get_current_pid_tgid:
> +		return &bpf_get_current_pid_tgid_proto;
> +	case BPF_FUNC_get_current_uid_gid:
> +		return &bpf_get_current_uid_gid_proto;
> +	case BPF_FUNC_get_current_comm:
> +		return &bpf_get_current_comm_proto;

the list looks good to start with.

>  
> +static u32 seccomp_convert_ctx_access(enum bpf_access_type type, int dst_reg,
> +				      int src_reg, int ctx_off,
> +				      struct bpf_insn *insn_buf)
> +{
> +	struct bpf_insn *insn = insn_buf;
> +
> +	switch (ctx_off) {
> +	case offsetof(struct seccomp_data, nr):

the conversion of seccomp_data fields is unnecessary.
We're doing conversion for sk_buff, because sk_buff and __sk_buff aree two
different structures. __sk_buff is user ABI with its own fields that losely
correspond to in-kernel struct sk_buff.
seccomp_data is already part of user ABI, so it's ok to access as-is.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/6] seccomp: add a way to access filters via bpf fds
  2015-09-04 16:04 ` [PATCH 4/6] seccomp: add a way to access filters via bpf fds Tycho Andersen
@ 2015-09-04 20:26   ` Kees Cook
  2015-09-04 20:29     ` Alexei Starovoitov
  0 siblings, 1 reply; 55+ messages in thread
From: Kees Cook @ 2015-09-04 20:26 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development

On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> This patch adds a way for a process that is "real root" to access the
> seccomp filters of another process. The process first does a
> PTRACE_SECCOMP_GET_FILTER_FD to get an fd with that process' seccomp filter
> attached, and then iterates on this with PTRACE_SECCOMP_NEXT_FILTER using
> bpf(BPF_PROG_DUMP) to dump the actual program at each step.

Why is this a new ptrace interface instead of a new seccomp interface?
I would expect this to only be valid for "current", otherwise we could
run into races as the ptracee adds filters. i.e. it is not safe to
examine seccomp filters from tasks other than current.

-Kees

>
> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
> CC: Kees Cook <keescook@chromium.org>
> CC: Will Drewry <wad@chromium.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Pavel Emelyanov <xemul@parallels.com>
> CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> CC: Alexei Starovoitov <ast@kernel.org>
> CC: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  include/linux/bpf.h         | 12 ++++++++++
>  include/linux/seccomp.h     | 14 +++++++++++
>  include/uapi/linux/ptrace.h |  3 +++
>  kernel/bpf/syscall.c        | 26 ++++++++++++++++++++-
>  kernel/ptrace.c             |  7 ++++++
>  kernel/seccomp.c            | 57 +++++++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 118 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 4383476..30682dc 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -157,6 +157,8 @@ void bpf_register_prog_type(struct bpf_prog_type_list *tl);
>  void bpf_register_map_type(struct bpf_map_type_list *tl);
>
>  struct bpf_prog *bpf_prog_get(u32 ufd);
> +int bpf_prog_set(u32 ufd, struct bpf_prog *new);
> +int bpf_new_fd(struct bpf_prog *prog, int flags);
>  void bpf_prog_put(struct bpf_prog *prog);
>  void bpf_prog_put_rcu(struct bpf_prog *prog);
>
> @@ -175,6 +177,16 @@ static inline struct bpf_prog *bpf_prog_get(u32 ufd)
>         return ERR_PTR(-EOPNOTSUPP);
>  }
>
> +static inline int bpf_prog_set(u32 ufd, struct bpf_prog *new)
> +{
> +       return -EINVAL;
> +}
> +
> +static inline int bpf_new_fd(struct bpf_prog *prog, int flags)
> +{
> +       return -EINVAL;
> +}
> +
>  static inline void bpf_prog_put(struct bpf_prog *prog)
>  {
>  }
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index f426503..d1a86ed 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -95,4 +95,18 @@ static inline void get_seccomp_filter(struct task_struct *tsk)
>         return;
>  }
>  #endif /* CONFIG_SECCOMP_FILTER */
> +
> +#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
> +extern long seccomp_get_filter_fd(struct task_struct *child);
> +extern long seccomp_next_filter(struct task_struct *child, u32 fd);
> +#else
> +static inline long seccomp_get_filter_fd(struct task_struct *child)
> +{
> +       return -EINVAL;
> +}
> +static inline long seccomp_next_filter(struct task_struct *child, u32 fd)
> +{
> +       return -EINVAL;
> +}
> +#endif /* CONFIG_SECCOMP_FILTER && CONFIG_CHECKPOINT_RESTORE */
>  #endif /* _LINUX_SECCOMP_H */
> diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
> index a7a6979..dfd7d2e 100644
> --- a/include/uapi/linux/ptrace.h
> +++ b/include/uapi/linux/ptrace.h
> @@ -23,6 +23,9 @@
>
>  #define PTRACE_SYSCALL           24
>
> +#define PTRACE_SECCOMP_GET_FILTER_FD   40
> +#define PTRACE_SECCOMP_NEXT_FILTER     41
> +
>  /* 0x4200-0x4300 are reserved for architecture-independent additions.  */
>  #define PTRACE_SETOPTIONS      0x4200
>  #define PTRACE_GETEVENTMSG     0x4201
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index ee580d0..58e7421 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -506,6 +506,30 @@ struct bpf_prog *bpf_prog_get(u32 ufd)
>  }
>  EXPORT_SYMBOL_GPL(bpf_prog_get);
>
> +int bpf_prog_set(u32 ufd, struct bpf_prog *new)
> +{
> +       struct fd f;
> +       struct bpf_prog *prog;
> +
> +       f = fdget(ufd);
> +
> +       prog = get_prog(f);
> +       if (!IS_ERR(prog) && prog)
> +               bpf_prog_put(prog);
> +
> +       atomic_inc(&new->aux->refcnt);
> +       f.file->private_data = new;
> +       fdput(f);
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(bpf_prog_set);
> +
> +int bpf_new_fd(struct bpf_prog *prog, int flags)
> +{
> +       return anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog, flags);
> +}
> +EXPORT_SYMBOL_GPL(bpf_new_fd);
> +
>  /* last field in 'union bpf_attr' used by this command */
>  #define        BPF_PROG_LOAD_LAST_FIELD kern_version
>
> @@ -572,7 +596,7 @@ static int bpf_prog_load(union bpf_attr *attr)
>         if (err < 0)
>                 goto free_used_maps;
>
> -       err = anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog, O_RDWR | O_CLOEXEC);
> +       err = bpf_new_fd(prog, O_RDWR | O_CLOEXEC);
>         if (err < 0)
>                 /* failed to allocate fd */
>                 goto free_used_maps;
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 787320d..4e4b534 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -1016,6 +1016,13 @@ int ptrace_request(struct task_struct *child, long request,
>                 break;
>         }
>  #endif
> +
> +       case PTRACE_SECCOMP_GET_FILTER_FD:
> +               return seccomp_get_filter_fd(child);
> +
> +       case PTRACE_SECCOMP_NEXT_FILTER:
> +               return seccomp_next_filter(child, data);
> +
>         default:
>                 break;
>         }
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index acfe1fb..a2c5b32 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -26,6 +26,8 @@
>  #endif
>
>  #ifdef CONFIG_SECCOMP_FILTER
> +#include <linux/bpf.h>
> +#include <uapi/linux/bpf.h>
>  #include <linux/filter.h>
>  #include <linux/pid.h>
>  #include <linux/ptrace.h>
> @@ -814,6 +816,61 @@ static inline long seccomp_set_mode_filter(unsigned int flags,
>  }
>  #endif
>
> +#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
> +long seccomp_get_filter_fd(struct task_struct *child)
> +{
> +       long fd;
> +       struct seccomp_filter *filter;
> +
> +       if (!capable(CAP_SYS_ADMIN))
> +               return -EACCES;
> +
> +       if (child->seccomp.mode != SECCOMP_MODE_FILTER)
> +               return -EINVAL;
> +
> +       filter = child->seccomp.filter;
> +
> +       fd = bpf_new_fd(filter->prog, O_RDONLY);
> +       if (fd > 0)
> +               atomic_inc(&filter->prog->aux->refcnt);
> +
> +       return fd;
> +}
> +
> +long seccomp_next_filter(struct task_struct *child, u32 fd)
> +{
> +       struct seccomp_filter *cur;
> +       struct bpf_prog *prog;
> +       long ret = -ESRCH;
> +
> +       if (!capable(CAP_SYS_ADMIN))
> +               return -EACCES;
> +
> +       if (child->seccomp.mode != SECCOMP_MODE_FILTER)
> +               return -EINVAL;
> +
> +       prog = bpf_prog_get(fd);
> +       if (IS_ERR(prog)) {
> +               ret = PTR_ERR(prog);
> +               goto out;
> +       }
> +
> +       for (cur = child->seccomp.filter; cur; cur = cur->prev) {
> +               if (cur->prog == prog) {
> +                       if (!cur->prev)
> +                               ret = -ENOENT;
> +                       else
> +                               ret = bpf_prog_set(fd, cur->prev->prog);
> +                       break;
> +               }
> +       }
> +
> +out:
> +       bpf_prog_put(prog);
> +       return ret;
> +}
> +#endif
> +
>  /* Common entry point for both prctl and syscall. */
>  static long do_seccomp(unsigned int op, unsigned int flags,
>                        const char __user *uargs)
> --
> 2.1.4
>



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-04 16:04 ` [PATCH 3/6] ebpf: add a way to dump an eBPF program Tycho Andersen
  2015-09-04 20:17   ` Kees Cook
@ 2015-09-04 20:27   ` Alexei Starovoitov
  2015-09-04 20:42     ` Tycho Andersen
  1 sibling, 1 reply; 55+ messages in thread
From: Alexei Starovoitov @ 2015-09-04 20:27 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, linux-kernel, netdev

On Fri, Sep 04, 2015 at 10:04:21AM -0600, Tycho Andersen wrote:
> This commit adds a way to dump eBPF programs. The initial implementation
> doesn't support maps, and therefore only allows dumping seccomp ebpf
> programs which themselves don't currently support maps.
> 
> 
> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
> CC: Kees Cook <keescook@chromium.org>
> CC: Will Drewry <wad@chromium.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Pavel Emelyanov <xemul@parallels.com>
> CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> CC: Alexei Starovoitov <ast@kernel.org>
> CC: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  include/uapi/linux/bpf.h | 15 +++++++++++++++
>  kernel/bpf/syscall.c     | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 59 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 79b825a..c5d8dc2 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -107,6 +107,13 @@ enum bpf_cmd {
>  	 * returns fd or negative error
>  	 */
>  	BPF_PROG_LOAD,
> +
> +	/* dump an existing bpf
> +	 * err = bpf(BPF_PROG_DUMP, union bpf_attr *attr, u32 size)
> +	 * Using attr->prog_fd, attr->dump_insn_cnt, attr->dump_insns
> +	 * returns zero or negative error
> +	 */
> +	BPF_PROG_DUMP,
>  };
>  
>  enum bpf_map_type {
> @@ -160,6 +167,14 @@ union bpf_attr {
>  		__aligned_u64	log_buf;	/* user supplied buffer */
>  		__u32		kern_version;	/* checked when prog_type=kprobe */
>  	};
> +
> +	struct { /* anonymous struct used by BPF_PROG_DUMP command */
> +		__u32		prog_fd;
> +		__u32		dump_insn_cnt;
> +		__aligned_u64	dump_insns;	/* user supplied buffer */
> +		__u8		gpl_compatible;
> +		__u64		prog_id;	/* unique id for this prog */
> +	};

my first reaction was to may be reuse existing struct used to load,
but I guess it's actually cleaner to have a new one like you did.
though prog_fd looks redundant and prog_id is ...

> +	if (put_user((u64) prog, &uattr->prog_id))
> +		goto out;

.. is definitely not secure.

> We export the GPL bit as well as a unique ID for the program so that
> userspace can detect when two seccomp filters were inherited from each
> other and clone the filter tree accordingly.

you mean that in-kernel prog pointer is the same?
I think user space can memcmp insns of programs instead?
Are you trying to solve the case when parent has an FD for bpf program
and child has another FD that points to the same program, and both
doing dump and need to coordinate?


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/6] seccomp: add a way to access filters via bpf fds
  2015-09-04 20:26   ` Kees Cook
@ 2015-09-04 20:29     ` Alexei Starovoitov
  2015-09-04 20:58       ` Tycho Andersen
  0 siblings, 1 reply; 55+ messages in thread
From: Alexei Starovoitov @ 2015-09-04 20:29 UTC (permalink / raw)
  To: Kees Cook
  Cc: Tycho Andersen, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, LKML, Network Development

On Fri, Sep 04, 2015 at 01:26:42PM -0700, Kees Cook wrote:
> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > This patch adds a way for a process that is "real root" to access the
> > seccomp filters of another process. The process first does a
> > PTRACE_SECCOMP_GET_FILTER_FD to get an fd with that process' seccomp filter
> > attached, and then iterates on this with PTRACE_SECCOMP_NEXT_FILTER using
> > bpf(BPF_PROG_DUMP) to dump the actual program at each step.
> 
> Why is this a new ptrace interface instead of a new seccomp interface?
> I would expect this to only be valid for "current", otherwise we could
> run into races as the ptracee adds filters. i.e. it is not safe to
> examine seccomp filters from tasks other than current.

same question.
I thought we discussed to add a command to seccomp() syscall for that?


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/6] ebpf: add a seccomp program type
  2015-09-04 16:04 ` [PATCH 1/6] ebpf: add a seccomp program type Tycho Andersen
  2015-09-04 20:17   ` Alexei Starovoitov
@ 2015-09-04 20:34   ` Kees Cook
  2015-09-04 21:06     ` Tycho Andersen
  2015-09-04 21:50   ` Andy Lutomirski
  2 siblings, 1 reply; 55+ messages in thread
From: Kees Cook @ 2015-09-04 20:34 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development

On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> seccomp uses eBPF as its underlying storage and execution format, and eBPF
> has features that seccomp would like to make use of in the future. This
> patch adds a formal seccomp type to the eBPF verifier.
>
> The current implementation of the seccomp eBPF type is very limited, and
> doesn't support some interesting features (notably, maps) of eBPF. However,
> the primary motivation for this patchset is to enable checkpoint/restore
> for seccomp filters later in the series, to this limited feature set is ok
> for now.
>
> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
> CC: Kees Cook <keescook@chromium.org>
> CC: Will Drewry <wad@chromium.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Pavel Emelyanov <xemul@parallels.com>
> CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> CC: Alexei Starovoitov <ast@kernel.org>
> CC: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  include/uapi/linux/bpf.h |  1 +
>  net/core/filter.c        | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 96 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 29ef6f9..79b825a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -122,6 +122,7 @@ enum bpf_prog_type {
>         BPF_PROG_TYPE_KPROBE,
>         BPF_PROG_TYPE_SCHED_CLS,
>         BPF_PROG_TYPE_SCHED_ACT,
> +       BPF_PROG_TYPE_SECCOMP,
>  };
>
>  #define BPF_PSEUDO_MAP_FD      1
> diff --git a/net/core/filter.c b/net/core/filter.c
> index be3098f..ed339fa 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1466,6 +1466,39 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
>         }
>  }
>
> +static const struct bpf_func_proto *
> +seccomp_func_proto(enum bpf_func_id func_id)
> +{
> +       /* Right now seccomp eBPF loading doesn't support maps; seccomp filters
> +        * are considered to be read-only after they're installed, so map fds
> +        * probably need to be invalidated when a seccomp filter with maps is
> +        * installed.
> +        *
> +        * The rest of these might be reasonable to call from seccomp, so we
> +        * export them.
> +        */
> +       switch (func_id) {
> +       case BPF_FUNC_ktime_get_ns:
> +               return &bpf_ktime_get_ns_proto;
> +       case BPF_FUNC_trace_printk:
> +               return bpf_get_trace_printk_proto();
> +       case BPF_FUNC_get_prandom_u32:
> +               return &bpf_get_prandom_u32_proto;
> +       case BPF_FUNC_get_smp_processor_id:
> +               return &bpf_get_smp_processor_id_proto;
> +       case BPF_FUNC_tail_call:
> +               return &bpf_tail_call_proto;
> +       case BPF_FUNC_get_current_pid_tgid:
> +               return &bpf_get_current_pid_tgid_proto;
> +       case BPF_FUNC_get_current_uid_gid:
> +               return &bpf_get_current_uid_gid_proto;
> +       case BPF_FUNC_get_current_comm:
> +               return &bpf_get_current_comm_proto;
> +       default:
> +               return NULL;
> +       }
> +}

While this list is probably fine, I don't want to mix the addition of
eBPF functions to the seccomp ABI with the CRIU changes. No function
calls are currently possible and it should stay that way.

I was expecting to see a validator, similar to the existing BPF
validator that is called when creating seccomp filters currently. Can
we add a similar validator for new BPF_PROG_TYPE_SECCOMP?

-Kees

> +
>  static bool __is_valid_access(int off, int size, enum bpf_access_type type)
>  {
>         /* check bounds */
> @@ -1516,6 +1549,17 @@ static bool tc_cls_act_is_valid_access(int off, int size,
>         return __is_valid_access(off, size, type);
>  }
>
> +static bool seccomp_is_valid_access(int off, int size,
> +                                   enum bpf_access_type type)
> +{
> +       if (type == BPF_WRITE)
> +               return false;
> +
> +       if (off < 0 || off >= sizeof(struct seccomp_data) || off & 3)
> +               return false;
> +
> +       return true;
> +}
>  static u32 bpf_net_convert_ctx_access(enum bpf_access_type type, int dst_reg,
>                                       int src_reg, int ctx_off,
>                                       struct bpf_insn *insn_buf)
> @@ -1630,6 +1674,45 @@ static u32 bpf_net_convert_ctx_access(enum bpf_access_type type, int dst_reg,
>         return insn - insn_buf;
>  }
>
> +static u32 seccomp_convert_ctx_access(enum bpf_access_type type, int dst_reg,
> +                                     int src_reg, int ctx_off,
> +                                     struct bpf_insn *insn_buf)
> +{
> +       struct bpf_insn *insn = insn_buf;
> +
> +       switch (ctx_off) {
> +       case offsetof(struct seccomp_data, nr):
> +               BUILD_BUG_ON(FIELD_SIZEOF(struct seccomp_data, nr) != 4);
> +
> +               *insn++ = BPF_LDX_MEM(BPF_W, dst_reg, src_reg, ctx_off);
> +               break;
> +
> +       case offsetof(struct seccomp_data, arch):
> +               BUILD_BUG_ON(FIELD_SIZEOF(struct seccomp_data, arch) != 4);
> +
> +               *insn++ = BPF_LDX_MEM(BPF_W, dst_reg, src_reg, ctx_off);
> +               break;
> +
> +       case offsetof(struct seccomp_data, instruction_pointer):
> +               BUILD_BUG_ON(FIELD_SIZEOF(struct seccomp_data,
> +                                         instruction_pointer) != 8);
> +
> +               *insn++ = BPF_LDX_MEM(BPF_DW, dst_reg, src_reg, ctx_off);
> +               break;
> +
> +       default:
> +               if (ctx_off & 7 ||
> +                   ctx_off < offsetof(struct seccomp_data, args))
> +                       return -EINVAL;
> +
> +               BUILD_BUG_ON(FIELD_SIZEOF(struct seccomp_data, args[0]) != 8);
> +
> +               *insn++ = BPF_LDX_MEM(BPF_DW, dst_reg, src_reg, ctx_off);
> +       }
> +
> +       return insn - insn_buf;
> +}
> +
>  static const struct bpf_verifier_ops sk_filter_ops = {
>         .get_func_proto = sk_filter_func_proto,
>         .is_valid_access = sk_filter_is_valid_access,
> @@ -1642,6 +1725,12 @@ static const struct bpf_verifier_ops tc_cls_act_ops = {
>         .convert_ctx_access = bpf_net_convert_ctx_access,
>  };
>
> +static const struct bpf_verifier_ops seccomp_ops = {
> +       .get_func_proto = seccomp_func_proto,
> +       .is_valid_access = seccomp_is_valid_access,
> +       .convert_ctx_access = seccomp_convert_ctx_access,
> +};
> +
>  static struct bpf_prog_type_list sk_filter_type __read_mostly = {
>         .ops = &sk_filter_ops,
>         .type = BPF_PROG_TYPE_SOCKET_FILTER,
> @@ -1657,11 +1746,17 @@ static struct bpf_prog_type_list sched_act_type __read_mostly = {
>         .type = BPF_PROG_TYPE_SCHED_ACT,
>  };
>
> +static struct bpf_prog_type_list seccomp_type __read_mostly = {
> +       .ops = &seccomp_ops,
> +       .type = BPF_PROG_TYPE_SECCOMP,
> +};
> +
>  static int __init register_sk_filter_ops(void)
>  {
>         bpf_register_prog_type(&sk_filter_type);
>         bpf_register_prog_type(&sched_cls_type);
>         bpf_register_prog_type(&sched_act_type);
> +       bpf_register_prog_type(&seccomp_type);
>
>         return 0;
>  }
> --
> 2.1.4
>



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd
  2015-09-04 16:04 ` [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd Tycho Andersen
@ 2015-09-04 20:40   ` Alexei Starovoitov
  2015-09-04 20:41   ` Kees Cook
  1 sibling, 0 replies; 55+ messages in thread
From: Alexei Starovoitov @ 2015-09-04 20:40 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, linux-kernel, netdev

On Fri, Sep 04, 2015 at 10:04:23AM -0600, Tycho Andersen wrote:
> This is the final bit needed to support seccomp filters created via the bpf
> syscall.
> 
> One concern with this patch is exactly what the interface should look like
> for users, since seccomp()'s second argument is a pointer, we could ask
> people to pass a pointer to the fd, but implies we might write to it which
> seems impolite. Right now we cast the pointer (and force the user to cast
> it), which generates ugly warnings. I'm not sure what the right answer is
> here.

I think passing &fd is fine. setsockopt does similar things.

> -#define SECCOMP_FILTER_FLAG_MASK	(SECCOMP_FILTER_FLAG_TSYNC)
> +#define SECCOMP_FILTER_FLAG_MASK	(\
> +	SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_EBPF)
>  
>  #ifdef CONFIG_SECCOMP
>  
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 0f238a4..c29a423 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -16,6 +16,7 @@
>  
>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
>  #define SECCOMP_FILTER_FLAG_TSYNC	1
> +#define SECCOMP_FILTER_FLAG_EBPF	(1 << 1)
...
> -	prepared = seccomp_prepare_user_filter(filter);
> +	if (flags & SECCOMP_FILTER_FLAG_EBPF)
> +		prepared = seccomp_prepare_ebpf(filter);
> +	else
> +		prepared = seccomp_prepare_user_filter(filter);
> +

I think instead of flag for existing SECCOMP_SET_MODE_FILTER
command, it would have been cleaner to add new command
SECCOMP_SET_MODE_BPF
and pass &fd to it.
Both kernel implementation and user side would look better ?


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd
  2015-09-04 16:04 ` [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd Tycho Andersen
  2015-09-04 20:40   ` Alexei Starovoitov
@ 2015-09-04 20:41   ` Kees Cook
  2015-09-05  7:13     ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 55+ messages in thread
From: Kees Cook @ 2015-09-04 20:41 UTC (permalink / raw)
  To: Tycho Andersen, Linux API
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development

On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> This is the final bit needed to support seccomp filters created via the bpf
> syscall.
>
> One concern with this patch is exactly what the interface should look like
> for users, since seccomp()'s second argument is a pointer, we could ask
> people to pass a pointer to the fd, but implies we might write to it which
> seems impolite. Right now we cast the pointer (and force the user to cast
> it), which generates ugly warnings. I'm not sure what the right answer is
> here.
>
> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
> CC: Kees Cook <keescook@chromium.org>
> CC: Will Drewry <wad@chromium.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Pavel Emelyanov <xemul@parallels.com>
> CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> CC: Alexei Starovoitov <ast@kernel.org>
> CC: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  include/linux/seccomp.h      |  3 +-
>  include/uapi/linux/seccomp.h |  1 +
>  kernel/seccomp.c             | 70 ++++++++++++++++++++++++++++++++++++--------
>  3 files changed, 61 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index d1a86ed..a725dd5 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -3,7 +3,8 @@
>
>  #include <uapi/linux/seccomp.h>
>
> -#define SECCOMP_FILTER_FLAG_MASK       (SECCOMP_FILTER_FLAG_TSYNC)
> +#define SECCOMP_FILTER_FLAG_MASK       (\
> +       SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_EBPF)
>
>  #ifdef CONFIG_SECCOMP
>
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 0f238a4..c29a423 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -16,6 +16,7 @@
>
>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
>  #define SECCOMP_FILTER_FLAG_TSYNC      1
> +#define SECCOMP_FILTER_FLAG_EBPF       (1 << 1)
>
>  /*
>   * All BPF programs must return a 32-bit value.
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index a2c5b32..9c6bea6 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -355,17 +355,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
>
>         BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter));
>
> -       /*
> -        * Installing a seccomp filter requires that the task has
> -        * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
> -        * This avoids scenarios where unprivileged tasks can affect the
> -        * behavior of privileged children.
> -        */
> -       if (!task_no_new_privs(current) &&
> -           security_capable_noaudit(current_cred(), current_user_ns(),
> -                                    CAP_SYS_ADMIN) != 0)
> -               return ERR_PTR(-EACCES);
> -
>         /* Allocate a new seccomp_filter */
>         sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);
>         if (!sfilter)
> @@ -509,6 +498,48 @@ static void seccomp_send_sigsys(int syscall, int reason)
>         info.si_syscall = syscall;
>         force_sig_info(SIGSYS, &info, current);
>  }
> +
> +#ifdef CONFIG_BPF_SYSCALL
> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
> +{
> +       /* XXX: this cast generates a warning. should we make people pass in
> +        * &fd, or is there some nicer way of doing this?
> +        */
> +       u32 fd = (u32) filter;

I think this is probably the right way to do it, modulo getting the
warning fixed. Let me invoke the great linux-api subscribers to get
some more opinions.

tl;dr: adding SECCOMP_FILTER_FLAG_EBPF to the flags changes the
pointer argument into an fd argument. Is this sane, should it be a
pointer to an fd, or should it not be a flag at all, creating a new
seccomp command instead (SECCOMP_MODE_FILTER_EBPF)?

-Kees

> +       struct seccomp_filter *ret;
> +       struct bpf_prog *prog;
> +
> +       prog = bpf_prog_get(fd);
> +       if (IS_ERR(prog))
> +               return (struct seccomp_filter *) prog;
> +
> +       if (prog->type != BPF_PROG_TYPE_SECCOMP) {
> +               bpf_prog_put(prog);
> +               return ERR_PTR(-EINVAL);
> +       }
> +
> +       ret = kzalloc(sizeof(*ret), GFP_KERNEL | __GFP_NOWARN);
> +       if (!ret) {
> +               bpf_prog_put(prog);
> +               return ERR_PTR(-ENOMEM);
> +       }
> +
> +       ret->prog = prog;
> +       atomic_set(&ret->usage, 1);
> +
> +       /* Intentionally don't bpf_prog_put() here, because the underlying prog
> +        * is refcounted too and we're holding a reference from the struct
> +        * seccomp_filter object.
> +        */
> +
> +       return ret;
> +}
> +#else
> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
> +{
> +       return ERR_PTR(-EINVAL);
> +}
> +#endif
>  #endif /* CONFIG_SECCOMP_FILTER */
>
>  /*
> @@ -775,8 +806,23 @@ static long seccomp_set_mode_filter(unsigned int flags,
>         if (flags & ~SECCOMP_FILTER_FLAG_MASK)
>                 return -EINVAL;
>
> +       /*
> +        * Installing a seccomp filter requires that the task has
> +        * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
> +        * This avoids scenarios where unprivileged tasks can affect the
> +        * behavior of privileged children.
> +        */
> +       if (!task_no_new_privs(current) &&
> +           security_capable_noaudit(current_cred(), current_user_ns(),
> +                                    CAP_SYS_ADMIN) != 0)
> +               return -EACCES;
> +
>         /* Prepare the new filter before holding any locks. */
> -       prepared = seccomp_prepare_user_filter(filter);
> +       if (flags & SECCOMP_FILTER_FLAG_EBPF)
> +               prepared = seccomp_prepare_ebpf(filter);
> +       else
> +               prepared = seccomp_prepare_user_filter(filter);
> +
>         if (IS_ERR(prepared))
>                 return PTR_ERR(prepared);
>
> --
> 2.1.4
>



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-04 20:27   ` Alexei Starovoitov
@ 2015-09-04 20:42     ` Tycho Andersen
  0 siblings, 0 replies; 55+ messages in thread
From: Tycho Andersen @ 2015-09-04 20:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, linux-kernel, netdev

Hi Alexei,

On Fri, Sep 04, 2015 at 01:27:05PM -0700, Alexei Starovoitov wrote:
> On Fri, Sep 04, 2015 at 10:04:21AM -0600, Tycho Andersen wrote:
> > This commit adds a way to dump eBPF programs. The initial implementation
> > doesn't support maps, and therefore only allows dumping seccomp ebpf
> > programs which themselves don't currently support maps.
> > 
> > 
> > Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
> > CC: Kees Cook <keescook@chromium.org>
> > CC: Will Drewry <wad@chromium.org>
> > CC: Oleg Nesterov <oleg@redhat.com>
> > CC: Andy Lutomirski <luto@amacapital.net>
> > CC: Pavel Emelyanov <xemul@parallels.com>
> > CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> > CC: Alexei Starovoitov <ast@kernel.org>
> > CC: Daniel Borkmann <daniel@iogearbox.net>
> > ---
> >  include/uapi/linux/bpf.h | 15 +++++++++++++++
> >  kernel/bpf/syscall.c     | 44 ++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 59 insertions(+)
> > 
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 79b825a..c5d8dc2 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -107,6 +107,13 @@ enum bpf_cmd {
> >  	 * returns fd or negative error
> >  	 */
> >  	BPF_PROG_LOAD,
> > +
> > +	/* dump an existing bpf
> > +	 * err = bpf(BPF_PROG_DUMP, union bpf_attr *attr, u32 size)
> > +	 * Using attr->prog_fd, attr->dump_insn_cnt, attr->dump_insns
> > +	 * returns zero or negative error
> > +	 */
> > +	BPF_PROG_DUMP,
> >  };
> >  
> >  enum bpf_map_type {
> > @@ -160,6 +167,14 @@ union bpf_attr {
> >  		__aligned_u64	log_buf;	/* user supplied buffer */
> >  		__u32		kern_version;	/* checked when prog_type=kprobe */
> >  	};
> > +
> > +	struct { /* anonymous struct used by BPF_PROG_DUMP command */
> > +		__u32		prog_fd;
> > +		__u32		dump_insn_cnt;
> > +		__aligned_u64	dump_insns;	/* user supplied buffer */
> > +		__u8		gpl_compatible;
> > +		__u64		prog_id;	/* unique id for this prog */
> > +	};
> 
> my first reaction was to may be reuse existing struct used to load,
> but I guess it's actually cleaner to have a new one like you did.
> though prog_fd looks redundant and prog_id is ...

prog_fd is input here, the rest are outputs.

> > +	if (put_user((u64) prog, &uattr->prog_id))
> > +		goto out;
> 
> .. is definitely not secure.
> 
> > We export the GPL bit as well as a unique ID for the program so that
> > userspace can detect when two seccomp filters were inherited from each
> > other and clone the filter tree accordingly.
> 
> you mean that in-kernel prog pointer is the same?
> I think user space can memcmp insns of programs instead?
> Are you trying to solve the case when parent has an FD for bpf program
> and child has another FD that points to the same program, and both
> doing dump and need to coordinate?

Yes, exactly. If we just do a memcmp(), two users can install the same
filter and have a different inheritance model on checkpoint vs
restore. This means that a checkpoint/restore'd process may see
different behavior when using SECCOMP_FILTER_FLAG_TSYNC in the future.

I'm not entirely clear on how much of a problem this actually is, and
perhaps it is too small to be worth worry about, but if there was
another way to export some unique id, that would be dandy.

Tycho

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-04 20:17   ` Kees Cook
@ 2015-09-04 20:45     ` Tycho Andersen
  2015-09-04 20:50       ` Kees Cook
  2015-09-04 21:48       ` Andy Lutomirski
  0 siblings, 2 replies; 55+ messages in thread
From: Tycho Andersen @ 2015-09-04 20:45 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development

On Fri, Sep 04, 2015 at 01:17:30PM -0700, Kees Cook wrote:
> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > This commit adds a way to dump eBPF programs. The initial implementation
> > doesn't support maps, and therefore only allows dumping seccomp ebpf
> > programs which themselves don't currently support maps.
> >
> > We export the GPL bit as well as a unique ID for the program so that
> 
> This unique ID appears to be the heap address for the prog. That's a
> huge leak, and should not be done. We don't want to introduce new
> kernel address leaks while we're trying to fix the remaining ones.
> Shouldn't the "unique ID" be the fd itself? I imagine KCMP_FILE
> could be used, for example.

No; we acquire the fd per process, so if a task installs a filter and
then forks N times, we'll grab N (+1) copies of the filter from N (+1)
different file descriptors. Ideally, we'd have some way to figure out
that these were all the same. Some sort of prog_id is one way,
although there may be others.

Tycho

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-04 20:45     ` Tycho Andersen
@ 2015-09-04 20:50       ` Kees Cook
  2015-09-04 20:58         ` Alexei Starovoitov
  2015-09-04 21:48       ` Andy Lutomirski
  1 sibling, 1 reply; 55+ messages in thread
From: Kees Cook @ 2015-09-04 20:50 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development

On Fri, Sep 4, 2015 at 1:45 PM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> On Fri, Sep 04, 2015 at 01:17:30PM -0700, Kees Cook wrote:
>> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
>> <tycho.andersen@canonical.com> wrote:
>> > This commit adds a way to dump eBPF programs. The initial implementation
>> > doesn't support maps, and therefore only allows dumping seccomp ebpf
>> > programs which themselves don't currently support maps.
>> >
>> > We export the GPL bit as well as a unique ID for the program so that
>>
>> This unique ID appears to be the heap address for the prog. That's a
>> huge leak, and should not be done. We don't want to introduce new
>> kernel address leaks while we're trying to fix the remaining ones.
>> Shouldn't the "unique ID" be the fd itself? I imagine KCMP_FILE
>> could be used, for example.
>
> No; we acquire the fd per process, so if a task installs a filter and
> then forks N times, we'll grab N (+1) copies of the filter from N (+1)
> different file descriptors. Ideally, we'd have some way to figure out
> that these were all the same. Some sort of prog_id is one way,
> although there may be others.

If KCMP_FILE or a new KCMP_BPF isn't possible, then we'll probably
have to add a unique id (counter) to all bpf programs as they're
created.

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-04 20:50       ` Kees Cook
@ 2015-09-04 20:58         ` Alexei Starovoitov
  2015-09-04 21:00           ` Tycho Andersen
  0 siblings, 1 reply; 55+ messages in thread
From: Alexei Starovoitov @ 2015-09-04 20:58 UTC (permalink / raw)
  To: Kees Cook
  Cc: Tycho Andersen, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, LKML, Network Development

On Fri, Sep 04, 2015 at 01:50:55PM -0700, Kees Cook wrote:
> On Fri, Sep 4, 2015 at 1:45 PM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > On Fri, Sep 04, 2015 at 01:17:30PM -0700, Kees Cook wrote:
> >> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
> >> <tycho.andersen@canonical.com> wrote:
> >> > This commit adds a way to dump eBPF programs. The initial implementation
> >> > doesn't support maps, and therefore only allows dumping seccomp ebpf
> >> > programs which themselves don't currently support maps.
> >> >
> >> > We export the GPL bit as well as a unique ID for the program so that
> >>
> >> This unique ID appears to be the heap address for the prog. That's a
> >> huge leak, and should not be done. We don't want to introduce new
> >> kernel address leaks while we're trying to fix the remaining ones.
> >> Shouldn't the "unique ID" be the fd itself? I imagine KCMP_FILE
> >> could be used, for example.
> >
> > No; we acquire the fd per process, so if a task installs a filter and
> > then forks N times, we'll grab N (+1) copies of the filter from N (+1)
> > different file descriptors. Ideally, we'd have some way to figure out
> > that these were all the same. Some sort of prog_id is one way,
> > although there may be others.
> 
> If KCMP_FILE or a new KCMP_BPF isn't possible, then we'll probably
> have to add a unique id (counter) to all bpf programs as they're
> created.

I think tweaking KCMP_FILE for anon_inodes should do the trick
and should work at the end (if it's not working already).


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/6] seccomp: add a way to access filters via bpf fds
  2015-09-04 20:29     ` Alexei Starovoitov
@ 2015-09-04 20:58       ` Tycho Andersen
  0 siblings, 0 replies; 55+ messages in thread
From: Tycho Andersen @ 2015-09-04 20:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, LKML, Network Development

On Fri, Sep 04, 2015 at 01:29:49PM -0700, Alexei Starovoitov wrote:
> On Fri, Sep 04, 2015 at 01:26:42PM -0700, Kees Cook wrote:
> > On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
> > <tycho.andersen@canonical.com> wrote:
> > > This patch adds a way for a process that is "real root" to access the
> > > seccomp filters of another process. The process first does a
> > > PTRACE_SECCOMP_GET_FILTER_FD to get an fd with that process' seccomp filter
> > > attached, and then iterates on this with PTRACE_SECCOMP_NEXT_FILTER using
> > > bpf(BPF_PROG_DUMP) to dump the actual program at each step.
> > 
> > Why is this a new ptrace interface instead of a new seccomp interface?
> > I would expect this to only be valid for "current", otherwise we could
> > run into races as the ptracee adds filters.

The task has to be in the stopped state in order for the filters to be
inspected, so I think this isn't a problem.

> > i.e. it is not safe to
> > examine seccomp filters from tasks other than current.
> 
> same question.
> I thought we discussed to add a command to seccomp() syscall for that?

We did initially, although at the end Andy posted about not allowing
tasks to see some filters that had been installed:

On Mon, 10 Aug 2015 14:36:09 -0700 Andy Lutomirski
<luto@amacapital.net> wrote:
> On Mon, Aug 10, 2015 at 2:31 PM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > On Mon, Aug 10, 2015 at 02:18:29PM -0700, Kees Cook wrote:
> >> On Sun, Aug 9, 2015 at 7:20 PM, Andy Lutomirski
> >> <luto@amacapital.net> wrote:
> >> > On Fri, Aug 7, 2015 at 10:59 AM, Tycho Andersen
> >> > <tycho.andersen@canonical.com> wrote:
> >> >> On Fri, Aug 07, 2015 at 10:36:02AM -0700, Andy Lutomirski wrote:
> >> >>> On Fri, Aug 7, 2015 at 8:45 AM, Tycho Andersen
> >> >>> <tycho.andersen@canonical.com> wrote:
> >> >>> >
> >> >>> > fd = seccomp(SECCOMP_GET_FILTER_FD);
> >> >>>
> >> >>>
> >> >
> >> > Let me propose something crazy.  First, some scenarios:
> >> >
> >> > Scenario A: A program runs in a sandbox.  It tries to interrogate
> >> > its
> >> > seccomp filter.  I think it would be fine, and maybe even
> >> > beneficial,
> >> > if this didn't work (or pretended there was no filter).
> >> >
> >> > Scenario B: A shell runs in a sandbox.  The user fires up some
> >> > program
> >> > in that shell and then, *still in that shell*, checkpoints it
> >> > with
> >> > CRIU.  I think CRIU shouldn't see the seccomp filter.
> >> >
> >> > Scenario C: A program runs in a sandbox.  CRIU, *in a different
> >> > sandbox*, tries to checkpoint that program.  IMO this is doomed
> >> > to
> >> > eventual failure no matter what the kernel does: restoring the
> >> > program
> >> > isn't going to work as expected.
> >> >
> >> > Scenario D: A shell runs in a sandbox.  Someone fires up a
> >> > seccomp-using program (with in inner filter) under that shell and
> >> > then
> >> > separately checkpoints it from inside the shell.  I think CRIU
> >> > should
> >> > see the inner filter but not the outer filter.
> >> >
> >> > So here's my crazy proposal: If process A tries to read process
> >> > B's
> >> > seccomp state, the kernel could check whether B's seccomp state
> >> > is a
> >> > descendent of A's.  If so, then the attempt to read B's seccomp
> >> > state
> >> > should see only the part that stricly descends from A's state.
> >> > (If
> >> > B's and A's states are the same under a referential equality
> >> > test,
> >> > then this means A should see no seccomp state at all).  If, on
> >> > the
> >> > other hand, B's state is not a descendent of A's state, then the
> >> > read
> >> > call should fail.
> >> >
> >> > Thoughts?
> >> >
> >> > At the very least, this means that no one ever needs to worry
> >> > about
> >> > preventing seccomp state reads in their seccomp filter program,
> >> > since
> >> > the filter is invisible from inside the sandbox using that
> >> > filter.
> >>
> >> I don't oppose this idea, but I think our first pass at CRIU can
> >> just
> >> be to work from the init namespace. I don't mind rejecting
> >> filter-reading requests when CAP_SYS_ADMIN is missing or something
> >> like that.
> >
> > The problem is that we can't do a CAP_SYS_ADMIN check, at least with
> > the proposed interface, since the seccomp() syscall will only tell
> > the
> > process about it's own filters. I was going to just have the
> > PT_SUSPEND_SECCOMP flag we use also affect whether seccomp() would
> > allow a task to inspect its own filters.
> >
> > If we do that, the result is that without PT_SUSPEND_SECCOMP, a task
> > always gets EACCES (or an empty list, whichever is better), and with
> > PT_SUSPEND_SECCOMP it always gets the real filters.
> 
> Hmm.  IMO this interface has the downside that we can't really
> implement my crazy proposal on top of it.  Why do we need the task to
> be able to read its own filters at all?  Wouldn't it be easier if the
> ptracer could read the tracee's filters directly?  (Even if it
> required that the tracee be stopped?)

Which is how we ended up with the proposed ptrace() interface. We
could proxy this through seccomp instead, but give that it is an
external process inspecting the filters, it fits better in ptrace IMO.
If we go back to a process asking for its own filters, then we could
do it via seccomp().

Tycho

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-04 20:58         ` Alexei Starovoitov
@ 2015-09-04 21:00           ` Tycho Andersen
  0 siblings, 0 replies; 55+ messages in thread
From: Tycho Andersen @ 2015-09-04 21:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, LKML, Network Development

On Fri, Sep 04, 2015 at 01:58:25PM -0700, Alexei Starovoitov wrote:
> On Fri, Sep 04, 2015 at 01:50:55PM -0700, Kees Cook wrote:
> > On Fri, Sep 4, 2015 at 1:45 PM, Tycho Andersen
> > <tycho.andersen@canonical.com> wrote:
> > > On Fri, Sep 04, 2015 at 01:17:30PM -0700, Kees Cook wrote:
> > >> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
> > >> <tycho.andersen@canonical.com> wrote:
> > >> > This commit adds a way to dump eBPF programs. The initial implementation
> > >> > doesn't support maps, and therefore only allows dumping seccomp ebpf
> > >> > programs which themselves don't currently support maps.
> > >> >
> > >> > We export the GPL bit as well as a unique ID for the program so that
> > >>
> > >> This unique ID appears to be the heap address for the prog. That's a
> > >> huge leak, and should not be done. We don't want to introduce new
> > >> kernel address leaks while we're trying to fix the remaining ones.
> > >> Shouldn't the "unique ID" be the fd itself? I imagine KCMP_FILE
> > >> could be used, for example.
> > >
> > > No; we acquire the fd per process, so if a task installs a filter and
> > > then forks N times, we'll grab N (+1) copies of the filter from N (+1)
> > > different file descriptors. Ideally, we'd have some way to figure out
> > > that these were all the same. Some sort of prog_id is one way,
> > > although there may be others.
> > 
> > If KCMP_FILE or a new KCMP_BPF isn't possible, then we'll probably
> > have to add a unique id (counter) to all bpf programs as they're
> > created.
> 
> I think tweaking KCMP_FILE for anon_inodes should do the trick
> and should work at the end (if it's not working already).

Sounds good. I'll look into that for the next version, thanks.

Tycho

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/6] ebpf: add a seccomp program type
  2015-09-04 20:34   ` Kees Cook
@ 2015-09-04 21:06     ` Tycho Andersen
  2015-09-04 21:08       ` Kees Cook
  0 siblings, 1 reply; 55+ messages in thread
From: Tycho Andersen @ 2015-09-04 21:06 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development

On Fri, Sep 04, 2015 at 01:34:12PM -0700, Kees Cook wrote:
> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > +static const struct bpf_func_proto *
> > +seccomp_func_proto(enum bpf_func_id func_id)
> > +{
> > +       /* Right now seccomp eBPF loading doesn't support maps; seccomp filters
> > +        * are considered to be read-only after they're installed, so map fds
> > +        * probably need to be invalidated when a seccomp filter with maps is
> > +        * installed.
> > +        *
> > +        * The rest of these might be reasonable to call from seccomp, so we
> > +        * export them.
> > +        */
> > +       switch (func_id) {
> > +       case BPF_FUNC_ktime_get_ns:
> > +               return &bpf_ktime_get_ns_proto;
> > +       case BPF_FUNC_trace_printk:
> > +               return bpf_get_trace_printk_proto();
> > +       case BPF_FUNC_get_prandom_u32:
> > +               return &bpf_get_prandom_u32_proto;
> > +       case BPF_FUNC_get_smp_processor_id:
> > +               return &bpf_get_smp_processor_id_proto;
> > +       case BPF_FUNC_tail_call:
> > +               return &bpf_tail_call_proto;
> > +       case BPF_FUNC_get_current_pid_tgid:
> > +               return &bpf_get_current_pid_tgid_proto;
> > +       case BPF_FUNC_get_current_uid_gid:
> > +               return &bpf_get_current_uid_gid_proto;
> > +       case BPF_FUNC_get_current_comm:
> > +               return &bpf_get_current_comm_proto;
> > +       default:
> > +               return NULL;
> > +       }
> > +}
> 
> While this list is probably fine, I don't want to mix the addition of
> eBPF functions to the seccomp ABI with the CRIU changes. No function
> calls are currently possible and it should stay that way.

Ok, I can remove them.

> I was expecting to see a validator, similar to the existing BPF
> validator that is called when creating seccomp filters currently. Can
> we add a similar validator for new BPF_PROG_TYPE_SECCOMP?

That's effectively what this patch does; when the eBPF is loaded via
bpf(), you tell bpf() you want a BPF_PROG_TYPE_SECCOMP, and it invokes
this validation/translation code, i.e. it uses
seccomp_is_valid_access() to check and make sure access are aligned
and inside struct seccomp_data.

Tycho

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/6] ebpf: allow BPF_REG_X in src_reg conditional jumps
  2015-09-04 16:04 ` [PATCH 6/6] ebpf: allow BPF_REG_X in src_reg conditional jumps Tycho Andersen
@ 2015-09-04 21:06   ` Alexei Starovoitov
  2015-09-04 22:43     ` Tycho Andersen
  0 siblings, 1 reply; 55+ messages in thread
From: Alexei Starovoitov @ 2015-09-04 21:06 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, linux-kernel, netdev

On Fri, Sep 04, 2015 at 10:04:24AM -0600, Tycho Andersen wrote:
> The classic converter generates conditional jumps with:
> 
> if (BPF_SRC(fp->code) == BPF_K && (int) fp->k < 0) {
>         ...
> } else {
>         insn->dst_reg = BPF_REG_A;
>         insn->src_reg = BPF_REG_X;
>         insn->imm = fp->k;
>         bpf_src = BPF_SRC(fp->code);
> }
> 
> but here, we enforce that the src_reg == BPF_REG_0. We should also allow
> BPF_REG_X since that's what the converter generates; this enables us to
> load eBPF programs that were generated by the converter.

good catch. classic->extended converter is just being untidy.
It shouldn't be populating unused 'src_reg' field when BPF_SRC == BPF_K
verifier is doing the right thing. It's rejecting instructions that
have junk in unused fields to make sure that someday we can extend it
with something useful.
The fix should be something like this:
diff --git a/net/core/filter.c b/net/core/filter.c
index 13079f03902e..05a04ea87172 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -478,9 +478,9 @@ do_pass:
                                bpf_src = BPF_X;
                        } else {
                                insn->dst_reg = BPF_REG_A;
-                               insn->src_reg = BPF_REG_X;
                                insn->imm = fp->k;
                                bpf_src = BPF_SRC(fp->code);
+                               insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
                        }


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/6] ebpf: add a seccomp program type
  2015-09-04 21:06     ` Tycho Andersen
@ 2015-09-04 21:08       ` Kees Cook
  2015-09-09 15:50         ` Tycho Andersen
  0 siblings, 1 reply; 55+ messages in thread
From: Kees Cook @ 2015-09-04 21:08 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development

On Fri, Sep 4, 2015 at 2:06 PM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> On Fri, Sep 04, 2015 at 01:34:12PM -0700, Kees Cook wrote:
>> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
>> <tycho.andersen@canonical.com> wrote:
>> > +static const struct bpf_func_proto *
>> > +seccomp_func_proto(enum bpf_func_id func_id)
>> > +{
>> > +       /* Right now seccomp eBPF loading doesn't support maps; seccomp filters
>> > +        * are considered to be read-only after they're installed, so map fds
>> > +        * probably need to be invalidated when a seccomp filter with maps is
>> > +        * installed.
>> > +        *
>> > +        * The rest of these might be reasonable to call from seccomp, so we
>> > +        * export them.
>> > +        */
>> > +       switch (func_id) {
>> > +       case BPF_FUNC_ktime_get_ns:
>> > +               return &bpf_ktime_get_ns_proto;
>> > +       case BPF_FUNC_trace_printk:
>> > +               return bpf_get_trace_printk_proto();
>> > +       case BPF_FUNC_get_prandom_u32:
>> > +               return &bpf_get_prandom_u32_proto;
>> > +       case BPF_FUNC_get_smp_processor_id:
>> > +               return &bpf_get_smp_processor_id_proto;
>> > +       case BPF_FUNC_tail_call:
>> > +               return &bpf_tail_call_proto;
>> > +       case BPF_FUNC_get_current_pid_tgid:
>> > +               return &bpf_get_current_pid_tgid_proto;
>> > +       case BPF_FUNC_get_current_uid_gid:
>> > +               return &bpf_get_current_uid_gid_proto;
>> > +       case BPF_FUNC_get_current_comm:
>> > +               return &bpf_get_current_comm_proto;
>> > +       default:
>> > +               return NULL;
>> > +       }
>> > +}
>>
>> While this list is probably fine, I don't want to mix the addition of
>> eBPF functions to the seccomp ABI with the CRIU changes. No function
>> calls are currently possible and it should stay that way.
>
> Ok, I can remove them.
>
>> I was expecting to see a validator, similar to the existing BPF
>> validator that is called when creating seccomp filters currently. Can
>> we add a similar validator for new BPF_PROG_TYPE_SECCOMP?
>
> That's effectively what this patch does; when the eBPF is loaded via
> bpf(), you tell bpf() you want a BPF_PROG_TYPE_SECCOMP, and it invokes
> this validation/translation code, i.e. it uses
> seccomp_is_valid_access() to check and make sure access are aligned
> and inside struct seccomp_data.

What about limiting the possible instructions?

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/6] ebpf: add a seccomp program type
  2015-09-04 20:17   ` Alexei Starovoitov
@ 2015-09-04 21:09     ` Tycho Andersen
  0 siblings, 0 replies; 55+ messages in thread
From: Tycho Andersen @ 2015-09-04 21:09 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, linux-kernel, netdev

On Fri, Sep 04, 2015 at 01:17:47PM -0700, Alexei Starovoitov wrote:
> On Fri, Sep 04, 2015 at 10:04:19AM -0600, Tycho Andersen wrote:
> > seccomp uses eBPF as its underlying storage and execution format, and eBPF
> > has features that seccomp would like to make use of in the future. This
> > patch adds a formal seccomp type to the eBPF verifier.
> > 
> > The current implementation of the seccomp eBPF type is very limited, and
> > doesn't support some interesting features (notably, maps) of eBPF. However,
> > the primary motivation for this patchset is to enable checkpoint/restore
> > for seccomp filters later in the series, to this limited feature set is ok
> > for now.
> 
> yes. good compromise to start.
> 
> > +static const struct bpf_func_proto *
> > +seccomp_func_proto(enum bpf_func_id func_id)
> > +{
> > +	/* Right now seccomp eBPF loading doesn't support maps; seccomp filters
> > +	 * are considered to be read-only after they're installed, so map fds
> > +	 * probably need to be invalidated when a seccomp filter with maps is
> > +	 * installed.
> 
> Just disabling bpf_map_lookup/update() helpers (the way you did here)
> is enough. The prorgram can still have references to maps, but since they
> won't be accessed it's safe.
> 
> > +	 *
> > +	 * The rest of these might be reasonable to call from seccomp, so we
> > +	 * export them.
> > +	 */
> > +	switch (func_id) {
> > +	case BPF_FUNC_ktime_get_ns:
> > +		return &bpf_ktime_get_ns_proto;
> > +	case BPF_FUNC_trace_printk:
> > +		return bpf_get_trace_printk_proto();
> > +	case BPF_FUNC_get_prandom_u32:
> > +		return &bpf_get_prandom_u32_proto;
> > +	case BPF_FUNC_get_smp_processor_id:
> > +		return &bpf_get_smp_processor_id_proto;
> > +	case BPF_FUNC_tail_call:
> > +		return &bpf_tail_call_proto;
> > +	case BPF_FUNC_get_current_pid_tgid:
> > +		return &bpf_get_current_pid_tgid_proto;
> > +	case BPF_FUNC_get_current_uid_gid:
> > +		return &bpf_get_current_uid_gid_proto;
> > +	case BPF_FUNC_get_current_comm:
> > +		return &bpf_get_current_comm_proto;
> 
> the list looks good to start with.
> 
> >  
> > +static u32 seccomp_convert_ctx_access(enum bpf_access_type type, int dst_reg,
> > +				      int src_reg, int ctx_off,
> > +				      struct bpf_insn *insn_buf)
> > +{
> > +	struct bpf_insn *insn = insn_buf;
> > +
> > +	switch (ctx_off) {
> > +	case offsetof(struct seccomp_data, nr):
> 
> the conversion of seccomp_data fields is unnecessary.
> We're doing conversion for sk_buff, because sk_buff and __sk_buff aree two
> different structures. __sk_buff is user ABI with its own fields that losely
> correspond to in-kernel struct sk_buff.
> seccomp_data is already part of user ABI, so it's ok to access as-is.

Ok, I noticed this but somehow didn't put it all together. I'll axe
this for the next version, thanks.

Tycho

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-04 20:45     ` Tycho Andersen
  2015-09-04 20:50       ` Kees Cook
@ 2015-09-04 21:48       ` Andy Lutomirski
  2015-09-04 22:28         ` Tycho Andersen
  1 sibling, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2015-09-04 21:48 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development

On Fri, Sep 4, 2015 at 1:45 PM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> On Fri, Sep 04, 2015 at 01:17:30PM -0700, Kees Cook wrote:
>> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
>> <tycho.andersen@canonical.com> wrote:
>> > This commit adds a way to dump eBPF programs. The initial implementation
>> > doesn't support maps, and therefore only allows dumping seccomp ebpf
>> > programs which themselves don't currently support maps.
>> >
>> > We export the GPL bit as well as a unique ID for the program so that
>>
>> This unique ID appears to be the heap address for the prog. That's a
>> huge leak, and should not be done. We don't want to introduce new
>> kernel address leaks while we're trying to fix the remaining ones.
>> Shouldn't the "unique ID" be the fd itself? I imagine KCMP_FILE
>> could be used, for example.
>
> No; we acquire the fd per process, so if a task installs a filter and
> then forks N times, we'll grab N (+1) copies of the filter from N (+1)
> different file descriptors. Ideally, we'd have some way to figure out
> that these were all the same. Some sort of prog_id is one way,
> although there may be others.

I disagree a bit.  I think we want the actual hierarchy to be a
well-defined thing, because I have plans to make the hierarchy
actually do something.  That means that we'll need to have a more
exact way to dump the hierarchy than "these two filters are identical"
or "these two filters are not identical".

--Andy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/6] ebpf: add a seccomp program type
  2015-09-04 16:04 ` [PATCH 1/6] ebpf: add a seccomp program type Tycho Andersen
  2015-09-04 20:17   ` Alexei Starovoitov
  2015-09-04 20:34   ` Kees Cook
@ 2015-09-04 21:50   ` Andy Lutomirski
  2015-09-09 16:13     ` Daniel Borkmann
  2 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2015-09-04 21:50 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	Network Development

On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> seccomp uses eBPF as its underlying storage and execution format, and eBPF
> has features that seccomp would like to make use of in the future. This
> patch adds a formal seccomp type to the eBPF verifier.
>
> The current implementation of the seccomp eBPF type is very limited, and
> doesn't support some interesting features (notably, maps) of eBPF. However,
> the primary motivation for this patchset is to enable checkpoint/restore
> for seccomp filters later in the series, to this limited feature set is ok
> for now.
>
> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
> CC: Kees Cook <keescook@chromium.org>
> CC: Will Drewry <wad@chromium.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Pavel Emelyanov <xemul@parallels.com>
> CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> CC: Alexei Starovoitov <ast@kernel.org>
> CC: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  include/uapi/linux/bpf.h |  1 +
>  net/core/filter.c        | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 96 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 29ef6f9..79b825a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -122,6 +122,7 @@ enum bpf_prog_type {
>         BPF_PROG_TYPE_KPROBE,
>         BPF_PROG_TYPE_SCHED_CLS,
>         BPF_PROG_TYPE_SCHED_ACT,
> +       BPF_PROG_TYPE_SECCOMP,
>  };
>
>  #define BPF_PSEUDO_MAP_FD      1
> diff --git a/net/core/filter.c b/net/core/filter.c
> index be3098f..ed339fa 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1466,6 +1466,39 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
>         }
>  }
>
> +static const struct bpf_func_proto *
> +seccomp_func_proto(enum bpf_func_id func_id)
> +{
> +       /* Right now seccomp eBPF loading doesn't support maps; seccomp filters
> +        * are considered to be read-only after they're installed, so map fds
> +        * probably need to be invalidated when a seccomp filter with maps is
> +        * installed.
> +        *
> +        * The rest of these might be reasonable to call from seccomp, so we
> +        * export them.
> +        */
> +       switch (func_id) {
> +       case BPF_FUNC_ktime_get_ns:
> +               return &bpf_ktime_get_ns_proto;
> +       case BPF_FUNC_trace_printk:
> +               return bpf_get_trace_printk_proto();
> +       case BPF_FUNC_get_prandom_u32:
> +               return &bpf_get_prandom_u32_proto;

I don't think we should expose prandom to unprivileged userspace.
This may be an attack vector.

Also, before we allow anything with side effects, we should think
carefully about whether seccomp filters with side effects (other than
through their return value) are okay.

--Andy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/6] seccomp: make underlying bpf ref counted as well
  2015-09-04 16:04 ` [PATCH 2/6] seccomp: make underlying bpf ref counted as well Tycho Andersen
@ 2015-09-04 21:53   ` Andy Lutomirski
  0 siblings, 0 replies; 55+ messages in thread
From: Andy Lutomirski @ 2015-09-04 21:53 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	Network Development

On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> Additionally, we mark classic converted seccomp filters as seccomp eBPF
> programs, since they are a subset of what is supported in seccomp eBPF.

Off the top of my head, I'm okay with this.

--Andy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-04 21:48       ` Andy Lutomirski
@ 2015-09-04 22:28         ` Tycho Andersen
  2015-09-04 23:08           ` Andy Lutomirski
  2015-09-04 23:27           ` Kees Cook
  0 siblings, 2 replies; 55+ messages in thread
From: Tycho Andersen @ 2015-09-04 22:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Cyrill Gorcunov

On Fri, Sep 04, 2015 at 02:48:03PM -0700, Andy Lutomirski wrote:
> On Fri, Sep 4, 2015 at 1:45 PM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > On Fri, Sep 04, 2015 at 01:17:30PM -0700, Kees Cook wrote:
> >> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
> >> <tycho.andersen@canonical.com> wrote:
> >> > This commit adds a way to dump eBPF programs. The initial implementation
> >> > doesn't support maps, and therefore only allows dumping seccomp ebpf
> >> > programs which themselves don't currently support maps.
> >> >
> >> > We export the GPL bit as well as a unique ID for the program so that
> >>
> >> This unique ID appears to be the heap address for the prog. That's a
> >> huge leak, and should not be done. We don't want to introduce new
> >> kernel address leaks while we're trying to fix the remaining ones.
> >> Shouldn't the "unique ID" be the fd itself? I imagine KCMP_FILE
> >> could be used, for example.
> >
> > No; we acquire the fd per process, so if a task installs a filter and
> > then forks N times, we'll grab N (+1) copies of the filter from N (+1)
> > different file descriptors. Ideally, we'd have some way to figure out
> > that these were all the same. Some sort of prog_id is one way,
> > although there may be others.
> 
> I disagree a bit.  I think we want the actual hierarchy to be a
> well-defined thing, because I have plans to make the hierarchy
> actually do something.  That means that we'll need to have a more
> exact way to dump the hierarchy than "these two filters are identical"
> or "these two filters are not identical".

Can you elaborate on what this would look like? I think with the
"these two filters are the same" primitive (the same in the sense that
they were inherited during a fork, not just that
memcmp(filter1->insns, filter2->insns) == 0) you can infer the entire
hierarchy, however clunky it may be to do so.

Another issue is that KCMP_FILE won't work in this case, as it
effectively compares the struct file *, which will be different since
we need to call anon_inode_getfd() for each call of
ptrace(PTRACE_SECCOMP_GET_FILTER_FD). We could add a KCMP_BPF (or just
a KCMP_FILE_PRIVATE_DATA, since that's effectively what it would be).
Does that make sense? [added Cyrill]

Tycho

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/6] ebpf: allow BPF_REG_X in src_reg conditional jumps
  2015-09-04 21:06   ` Alexei Starovoitov
@ 2015-09-04 22:43     ` Tycho Andersen
  2015-09-05  4:12       ` Alexei Starovoitov
  0 siblings, 1 reply; 55+ messages in thread
From: Tycho Andersen @ 2015-09-04 22:43 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, linux-kernel, netdev

Hi Alexei,

On Fri, Sep 04, 2015 at 02:06:19PM -0700, Alexei Starovoitov wrote:
> On Fri, Sep 04, 2015 at 10:04:24AM -0600, Tycho Andersen wrote:
> > The classic converter generates conditional jumps with:
> > 
> > if (BPF_SRC(fp->code) == BPF_K && (int) fp->k < 0) {
> >         ...
> > } else {
> >         insn->dst_reg = BPF_REG_A;
> >         insn->src_reg = BPF_REG_X;
> >         insn->imm = fp->k;
> >         bpf_src = BPF_SRC(fp->code);
> > }
> > 
> > but here, we enforce that the src_reg == BPF_REG_0. We should also allow
> > BPF_REG_X since that's what the converter generates; this enables us to
> > load eBPF programs that were generated by the converter.
> 
> good catch. classic->extended converter is just being untidy.
> It shouldn't be populating unused 'src_reg' field when BPF_SRC == BPF_K
> verifier is doing the right thing. It's rejecting instructions that
> have junk in unused fields to make sure that someday we can extend it
> with something useful.
> The fix should be something like this:
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 13079f03902e..05a04ea87172 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -478,9 +478,9 @@ do_pass:
>                                 bpf_src = BPF_X;
>                         } else {
>                                 insn->dst_reg = BPF_REG_A;
> -                               insn->src_reg = BPF_REG_X;
>                                 insn->imm = fp->k;
>                                 bpf_src = BPF_SRC(fp->code);
> +                               insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
>                         }

Yep, I just tested this and it works for me. Do you want to manage it
or should I carry it as part of this set?

Tycho

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-04 22:28         ` Tycho Andersen
@ 2015-09-04 23:08           ` Andy Lutomirski
  2015-09-05  0:27             ` Tycho Andersen
  2015-09-04 23:27           ` Kees Cook
  1 sibling, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2015-09-04 23:08 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Cyrill Gorcunov

On Fri, Sep 4, 2015 at 3:28 PM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> On Fri, Sep 04, 2015 at 02:48:03PM -0700, Andy Lutomirski wrote:
>> On Fri, Sep 4, 2015 at 1:45 PM, Tycho Andersen
>> <tycho.andersen@canonical.com> wrote:
>> > On Fri, Sep 04, 2015 at 01:17:30PM -0700, Kees Cook wrote:
>> >> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
>> >> <tycho.andersen@canonical.com> wrote:
>> >> > This commit adds a way to dump eBPF programs. The initial implementation
>> >> > doesn't support maps, and therefore only allows dumping seccomp ebpf
>> >> > programs which themselves don't currently support maps.
>> >> >
>> >> > We export the GPL bit as well as a unique ID for the program so that
>> >>
>> >> This unique ID appears to be the heap address for the prog. That's a
>> >> huge leak, and should not be done. We don't want to introduce new
>> >> kernel address leaks while we're trying to fix the remaining ones.
>> >> Shouldn't the "unique ID" be the fd itself? I imagine KCMP_FILE
>> >> could be used, for example.
>> >
>> > No; we acquire the fd per process, so if a task installs a filter and
>> > then forks N times, we'll grab N (+1) copies of the filter from N (+1)
>> > different file descriptors. Ideally, we'd have some way to figure out
>> > that these were all the same. Some sort of prog_id is one way,
>> > although there may be others.
>>
>> I disagree a bit.  I think we want the actual hierarchy to be a
>> well-defined thing, because I have plans to make the hierarchy
>> actually do something.  That means that we'll need to have a more
>> exact way to dump the hierarchy than "these two filters are identical"
>> or "these two filters are not identical".
>
> Can you elaborate on what this would look like? I think with the
> "these two filters are the same" primitive (the same in the sense that
> they were inherited during a fork, not just that
> memcmp(filter1->insns, filter2->insns) == 0) you can infer the entire
> hierarchy, however clunky it may be to do so.
>
> Another issue is that KCMP_FILE won't work in this case, as it
> effectively compares the struct file *, which will be different since
> we need to call anon_inode_getfd() for each call of
> ptrace(PTRACE_SECCOMP_GET_FILTER_FD). We could add a KCMP_BPF (or just
> a KCMP_FILE_PRIVATE_DATA, since that's effectively what it would be).
> Does that make sense? [added Cyrill]
>

I don't really know what it would look like.  I think we want a way to
compare struct seccomp_filter pointers.

FWIW, I *hate* kcmp.  It might be worth trying to come up with a less
awful way to do this.  For example, what if we could generate a kcmpfd
such that each kcmpfd contains (internally) a random symmetric key?
We could have a function that would return kernel pointers encrypted
by that key.

Of course, then we need to make sure that no one ever tries to keep a
kcmpfd around long enough that CRIU needs to checkpoint it, because
that's impossible.

Grr.

--Andy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-04 22:28         ` Tycho Andersen
  2015-09-04 23:08           ` Andy Lutomirski
@ 2015-09-04 23:27           ` Kees Cook
  2015-09-05  0:08             ` Andy Lutomirski
  1 sibling, 1 reply; 55+ messages in thread
From: Kees Cook @ 2015-09-04 23:27 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Andy Lutomirski, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Cyrill Gorcunov

On Fri, Sep 4, 2015 at 3:28 PM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> On Fri, Sep 04, 2015 at 02:48:03PM -0700, Andy Lutomirski wrote:
>> On Fri, Sep 4, 2015 at 1:45 PM, Tycho Andersen
>> <tycho.andersen@canonical.com> wrote:
>> > On Fri, Sep 04, 2015 at 01:17:30PM -0700, Kees Cook wrote:
>> >> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
>> >> <tycho.andersen@canonical.com> wrote:
>> >> > This commit adds a way to dump eBPF programs. The initial implementation
>> >> > doesn't support maps, and therefore only allows dumping seccomp ebpf
>> >> > programs which themselves don't currently support maps.
>> >> >
>> >> > We export the GPL bit as well as a unique ID for the program so that
>> >>
>> >> This unique ID appears to be the heap address for the prog. That's a
>> >> huge leak, and should not be done. We don't want to introduce new
>> >> kernel address leaks while we're trying to fix the remaining ones.
>> >> Shouldn't the "unique ID" be the fd itself? I imagine KCMP_FILE
>> >> could be used, for example.
>> >
>> > No; we acquire the fd per process, so if a task installs a filter and
>> > then forks N times, we'll grab N (+1) copies of the filter from N (+1)
>> > different file descriptors. Ideally, we'd have some way to figure out
>> > that these were all the same. Some sort of prog_id is one way,
>> > although there may be others.
>>
>> I disagree a bit.  I think we want the actual hierarchy to be a
>> well-defined thing, because I have plans to make the hierarchy
>> actually do something.  That means that we'll need to have a more
>> exact way to dump the hierarchy than "these two filters are identical"
>> or "these two filters are not identical".
>
> Can you elaborate on what this would look like? I think with the
> "these two filters are the same" primitive (the same in the sense that
> they were inherited during a fork, not just that
> memcmp(filter1->insns, filter2->insns) == 0) you can infer the entire
> hierarchy, however clunky it may be to do so.
>
> Another issue is that KCMP_FILE won't work in this case, as it
> effectively compares the struct file *, which will be different since
> we need to call anon_inode_getfd() for each call of
> ptrace(PTRACE_SECCOMP_GET_FILTER_FD). We could add a KCMP_BPF (or just
> a KCMP_FILE_PRIVATE_DATA, since that's effectively what it would be).
> Does that make sense? [added Cyrill]

If KCMP_FILE_PRIVATE_DATA isn't desired, I think a global counter id
is the next best.

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-04 23:27           ` Kees Cook
@ 2015-09-05  0:08             ` Andy Lutomirski
  0 siblings, 0 replies; 55+ messages in thread
From: Andy Lutomirski @ 2015-09-05  0:08 UTC (permalink / raw)
  To: Kees Cook
  Cc: Tycho Andersen, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Cyrill Gorcunov

On Fri, Sep 4, 2015 at 4:27 PM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Sep 4, 2015 at 3:28 PM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
>> On Fri, Sep 04, 2015 at 02:48:03PM -0700, Andy Lutomirski wrote:
>>> On Fri, Sep 4, 2015 at 1:45 PM, Tycho Andersen
>>> <tycho.andersen@canonical.com> wrote:
>>> > On Fri, Sep 04, 2015 at 01:17:30PM -0700, Kees Cook wrote:
>>> >> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
>>> >> <tycho.andersen@canonical.com> wrote:
>>> >> > This commit adds a way to dump eBPF programs. The initial implementation
>>> >> > doesn't support maps, and therefore only allows dumping seccomp ebpf
>>> >> > programs which themselves don't currently support maps.
>>> >> >
>>> >> > We export the GPL bit as well as a unique ID for the program so that
>>> >>
>>> >> This unique ID appears to be the heap address for the prog. That's a
>>> >> huge leak, and should not be done. We don't want to introduce new
>>> >> kernel address leaks while we're trying to fix the remaining ones.
>>> >> Shouldn't the "unique ID" be the fd itself? I imagine KCMP_FILE
>>> >> could be used, for example.
>>> >
>>> > No; we acquire the fd per process, so if a task installs a filter and
>>> > then forks N times, we'll grab N (+1) copies of the filter from N (+1)
>>> > different file descriptors. Ideally, we'd have some way to figure out
>>> > that these were all the same. Some sort of prog_id is one way,
>>> > although there may be others.
>>>
>>> I disagree a bit.  I think we want the actual hierarchy to be a
>>> well-defined thing, because I have plans to make the hierarchy
>>> actually do something.  That means that we'll need to have a more
>>> exact way to dump the hierarchy than "these two filters are identical"
>>> or "these two filters are not identical".
>>
>> Can you elaborate on what this would look like? I think with the
>> "these two filters are the same" primitive (the same in the sense that
>> they were inherited during a fork, not just that
>> memcmp(filter1->insns, filter2->insns) == 0) you can infer the entire
>> hierarchy, however clunky it may be to do so.
>>
>> Another issue is that KCMP_FILE won't work in this case, as it
>> effectively compares the struct file *, which will be different since
>> we need to call anon_inode_getfd() for each call of
>> ptrace(PTRACE_SECCOMP_GET_FILTER_FD). We could add a KCMP_BPF (or just
>> a KCMP_FILE_PRIVATE_DATA, since that's effectively what it would be).
>> Does that make sense? [added Cyrill]
>
> If KCMP_FILE_PRIVATE_DATA isn't desired, I think a global counter id
> is the next best.

The problem is that you can't checkpoint and restore it.  We could
have a counter relative to the parent filter, though.

--Andy

>
> -Kees
>
> --
> Kees Cook
> Chrome OS Security



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-04 23:08           ` Andy Lutomirski
@ 2015-09-05  0:27             ` Tycho Andersen
  2015-09-09 22:34               ` Tycho Andersen
  0 siblings, 1 reply; 55+ messages in thread
From: Tycho Andersen @ 2015-09-05  0:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Cyrill Gorcunov

On Fri, Sep 04, 2015 at 04:08:53PM -0700, Andy Lutomirski wrote:
> On Fri, Sep 4, 2015 at 3:28 PM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > On Fri, Sep 04, 2015 at 02:48:03PM -0700, Andy Lutomirski wrote:
> >> On Fri, Sep 4, 2015 at 1:45 PM, Tycho Andersen
> >> <tycho.andersen@canonical.com> wrote:
> >> > On Fri, Sep 04, 2015 at 01:17:30PM -0700, Kees Cook wrote:
> >> >> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
> >> >> <tycho.andersen@canonical.com> wrote:
> >> >> > This commit adds a way to dump eBPF programs. The initial implementation
> >> >> > doesn't support maps, and therefore only allows dumping seccomp ebpf
> >> >> > programs which themselves don't currently support maps.
> >> >> >
> >> >> > We export the GPL bit as well as a unique ID for the program so that
> >> >>
> >> >> This unique ID appears to be the heap address for the prog. That's a
> >> >> huge leak, and should not be done. We don't want to introduce new
> >> >> kernel address leaks while we're trying to fix the remaining ones.
> >> >> Shouldn't the "unique ID" be the fd itself? I imagine KCMP_FILE
> >> >> could be used, for example.
> >> >
> >> > No; we acquire the fd per process, so if a task installs a filter and
> >> > then forks N times, we'll grab N (+1) copies of the filter from N (+1)
> >> > different file descriptors. Ideally, we'd have some way to figure out
> >> > that these were all the same. Some sort of prog_id is one way,
> >> > although there may be others.
> >>
> >> I disagree a bit.  I think we want the actual hierarchy to be a
> >> well-defined thing, because I have plans to make the hierarchy
> >> actually do something.  That means that we'll need to have a more
> >> exact way to dump the hierarchy than "these two filters are identical"
> >> or "these two filters are not identical".
> >
> > Can you elaborate on what this would look like? I think with the
> > "these two filters are the same" primitive (the same in the sense that
> > they were inherited during a fork, not just that
> > memcmp(filter1->insns, filter2->insns) == 0) you can infer the entire
> > hierarchy, however clunky it may be to do so.
> >
> > Another issue is that KCMP_FILE won't work in this case, as it
> > effectively compares the struct file *, which will be different since
> > we need to call anon_inode_getfd() for each call of
> > ptrace(PTRACE_SECCOMP_GET_FILTER_FD). We could add a KCMP_BPF (or just
> > a KCMP_FILE_PRIVATE_DATA, since that's effectively what it would be).
> > Does that make sense? [added Cyrill]
> >
> 
> I don't really know what it would look like.  I think we want a way to
> compare struct seccomp_filter pointers.

Not to complicate things further, but this brings up another
interesting issue. Right now, we require PT_SUSPEND_SECCOMP in order
to restore seccomp and do things afterwards (otherwise the seccomp
filters might kill whatever things the restore process is doing). If
we want the struct seccomp_filter pointers to be identical on restore,
that means we need to restore when we are real root, because bpf()
requires that we be real root. This means that we essentially need to
ptrace the entire restore, which we don't want to do.

In order to work around this, I was thinking we could change the
ancestry check slightly:

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 9c6bea6..efc3f36 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -239,7 +239,7 @@ static int is_ancestor(struct seccomp_filter *parent,
 	if (parent == NULL)
 		return 1;
 	for (; child; child = child->prev)
-		if (child == parent)
+		if (child->prog == parent->prog)
 			return 1;
 	return 0;
 }

so that we can do bpf() when we're real root, and just restore seccomp
at the very end. This would mean that the struct bpf_prog pointers are
shared, but the struct seccomp_filter pointers aren't.

Even assuming we have some sort of way to identify shared ancestry, we
still need something like the above in order to be able to restore it.
This sort of sucks; it would be ideal to to share struct
seccomp_filter *s too. We could do something like
seccomp(COPY_FROM_PARENT) or something, but given the struggles Kees
told me he had with getting SECCOMP_FILTER_FLAG_TSYNC right, I suspect
that won't fly.

> FWIW, I *hate* kcmp.  It might be worth trying to come up with a less
> awful way to do this.  For example, what if we could generate a kcmpfd
> such that each kcmpfd contains (internally) a random symmetric key?
> We could have a function that would return kernel pointers encrypted
> by that key.

We could do what Kees is proposing for struct bpf_prog and just keep a
globally unique id on struct seccomp_filter, and allow asking for that
via some seccomp(GET_ID, fd) over the same fd we're using to dump the
bpf prog. That doesn't solve our restore problem, though.

Tycho

> Of course, then we need to make sure that no one ever tries to keep a
> kcmpfd around long enough that CRIU needs to checkpoint it, because
> that's impossible.
> 
> Grr.
> 
> --Andy

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/6] ebpf: allow BPF_REG_X in src_reg conditional jumps
  2015-09-04 22:43     ` Tycho Andersen
@ 2015-09-05  4:12       ` Alexei Starovoitov
  0 siblings, 0 replies; 55+ messages in thread
From: Alexei Starovoitov @ 2015-09-05  4:12 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, linux-kernel, netdev

On Fri, Sep 04, 2015 at 04:43:50PM -0600, Tycho Andersen wrote:
> > The fix should be something like this:
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 13079f03902e..05a04ea87172 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -478,9 +478,9 @@ do_pass:
> >                                 bpf_src = BPF_X;
> >                         } else {
> >                                 insn->dst_reg = BPF_REG_A;
> > -                               insn->src_reg = BPF_REG_X;
> >                                 insn->imm = fp->k;
> >                                 bpf_src = BPF_SRC(fp->code);
> > +                               insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
> >                         }
> 
> Yep, I just tested this and it works for me. Do you want to manage it
> or should I carry it as part of this set?

Though it's a bug, it doesn't affect anything at the moment
and not worth fixing in net, so please submit it as separate bug
fix when net-next reopens. imo the rest of the patches should
also go via net-next to minimize cross-tree conflicts.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd
  2015-09-04 20:41   ` Kees Cook
@ 2015-09-05  7:13     ` Michael Kerrisk (man-pages)
  2015-09-08 13:40       ` Tycho Andersen
  0 siblings, 1 reply; 55+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-09-05  7:13 UTC (permalink / raw)
  To: Kees Cook, Tycho Andersen, Linux API
  Cc: mtk.manpages, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, LKML, Network Development

On 09/04/2015 10:41 PM, Kees Cook wrote:
> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
>> This is the final bit needed to support seccomp filters created via the bpf
>> syscall.

Hmm. Thanks Kees, for CCinf linux-api@. That really should have been done at
the outset.

Tycho, where's the man-pages patch describing this new kernel-userspace
API feature? :-)

>> One concern with this patch is exactly what the interface should look like
>> for users, since seccomp()'s second argument is a pointer, we could ask
>> people to pass a pointer to the fd, but implies we might write to it which
>> seems impolite. Right now we cast the pointer (and force the user to cast
>> it), which generates ugly warnings. I'm not sure what the right answer is
>> here.
>>
>> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
>> CC: Kees Cook <keescook@chromium.org>
>> CC: Will Drewry <wad@chromium.org>
>> CC: Oleg Nesterov <oleg@redhat.com>
>> CC: Andy Lutomirski <luto@amacapital.net>
>> CC: Pavel Emelyanov <xemul@parallels.com>
>> CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
>> CC: Alexei Starovoitov <ast@kernel.org>
>> CC: Daniel Borkmann <daniel@iogearbox.net>
>> ---
>>  include/linux/seccomp.h      |  3 +-
>>  include/uapi/linux/seccomp.h |  1 +
>>  kernel/seccomp.c             | 70 ++++++++++++++++++++++++++++++++++++--------
>>  3 files changed, 61 insertions(+), 13 deletions(-)
>>
>> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
>> index d1a86ed..a725dd5 100644
>> --- a/include/linux/seccomp.h
>> +++ b/include/linux/seccomp.h
>> @@ -3,7 +3,8 @@
>>
>>  #include <uapi/linux/seccomp.h>
>>
>> -#define SECCOMP_FILTER_FLAG_MASK       (SECCOMP_FILTER_FLAG_TSYNC)
>> +#define SECCOMP_FILTER_FLAG_MASK       (\
>> +       SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_EBPF)
>>
>>  #ifdef CONFIG_SECCOMP
>>
>> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
>> index 0f238a4..c29a423 100644
>> --- a/include/uapi/linux/seccomp.h
>> +++ b/include/uapi/linux/seccomp.h
>> @@ -16,6 +16,7 @@
>>
>>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
>>  #define SECCOMP_FILTER_FLAG_TSYNC      1
>> +#define SECCOMP_FILTER_FLAG_EBPF       (1 << 1)
>>
>>  /*
>>   * All BPF programs must return a 32-bit value.
>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>> index a2c5b32..9c6bea6 100644
>> --- a/kernel/seccomp.c
>> +++ b/kernel/seccomp.c
>> @@ -355,17 +355,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
>>
>>         BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter));
>>
>> -       /*
>> -        * Installing a seccomp filter requires that the task has
>> -        * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
>> -        * This avoids scenarios where unprivileged tasks can affect the
>> -        * behavior of privileged children.
>> -        */
>> -       if (!task_no_new_privs(current) &&
>> -           security_capable_noaudit(current_cred(), current_user_ns(),
>> -                                    CAP_SYS_ADMIN) != 0)
>> -               return ERR_PTR(-EACCES);
>> -
>>         /* Allocate a new seccomp_filter */
>>         sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);
>>         if (!sfilter)
>> @@ -509,6 +498,48 @@ static void seccomp_send_sigsys(int syscall, int reason)
>>         info.si_syscall = syscall;
>>         force_sig_info(SIGSYS, &info, current);
>>  }
>> +
>> +#ifdef CONFIG_BPF_SYSCALL
>> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
>> +{
>> +       /* XXX: this cast generates a warning. should we make people pass in
>> +        * &fd, or is there some nicer way of doing this?
>> +        */
>> +       u32 fd = (u32) filter;
> 
> I think this is probably the right way to do it, modulo getting the
> warning fixed. Let me invoke the great linux-api subscribers to get
> some more opinions.

Sigh. It's sad, but the using a cast does seem the simplest option.
But, how about another idea...

> tl;dr: adding SECCOMP_FILTER_FLAG_EBPF to the flags changes the
> pointer argument into an fd argument. Is this sane, should it be a
> pointer to an fd, or should it not be a flag at all, creating a new
> seccomp command instead (SECCOMP_MODE_FILTER_EBPF)?

What about

    seccomp(SECCOMP_MODE_FILTER_EBPF, flags, structp)

Where structp is a pointer to something like

struct seccomp_ebpf {
    int size;      /* Size of this whole struct */
    int fd;
}

'size' allows for future expansion of the struct (in case we want to 
expand it later), and placing 'fd' inside a struct avoids unpleasant
implication that would be made by passing a pointer to an fd as the
third argument.

Cheers,

Michael


> -Kees
> 
>> +       struct seccomp_filter *ret;
>> +       struct bpf_prog *prog;
>> +
>> +       prog = bpf_prog_get(fd);
>> +       if (IS_ERR(prog))
>> +               return (struct seccomp_filter *) prog;
>> +
>> +       if (prog->type != BPF_PROG_TYPE_SECCOMP) {
>> +               bpf_prog_put(prog);
>> +               return ERR_PTR(-EINVAL);
>> +       }
>> +
>> +       ret = kzalloc(sizeof(*ret), GFP_KERNEL | __GFP_NOWARN);
>> +       if (!ret) {
>> +               bpf_prog_put(prog);
>> +               return ERR_PTR(-ENOMEM);
>> +       }
>> +
>> +       ret->prog = prog;
>> +       atomic_set(&ret->usage, 1);
>> +
>> +       /* Intentionally don't bpf_prog_put() here, because the underlying prog
>> +        * is refcounted too and we're holding a reference from the struct
>> +        * seccomp_filter object.
>> +        */
>> +
>> +       return ret;
>> +}
>> +#else
>> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
>> +{
>> +       return ERR_PTR(-EINVAL);
>> +}
>> +#endif
>>  #endif /* CONFIG_SECCOMP_FILTER */
>>
>>  /*
>> @@ -775,8 +806,23 @@ static long seccomp_set_mode_filter(unsigned int flags,
>>         if (flags & ~SECCOMP_FILTER_FLAG_MASK)
>>                 return -EINVAL;
>>
>> +       /*
>> +        * Installing a seccomp filter requires that the task has
>> +        * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
>> +        * This avoids scenarios where unprivileged tasks can affect the
>> +        * behavior of privileged children.
>> +        */
>> +       if (!task_no_new_privs(current) &&
>> +           security_capable_noaudit(current_cred(), current_user_ns(),
>> +                                    CAP_SYS_ADMIN) != 0)
>> +               return -EACCES;
>> +
>>         /* Prepare the new filter before holding any locks. */
>> -       prepared = seccomp_prepare_user_filter(filter);
>> +       if (flags & SECCOMP_FILTER_FLAG_EBPF)
>> +               prepared = seccomp_prepare_ebpf(filter);
>> +       else
>> +               prepared = seccomp_prepare_user_filter(filter);
>> +
>>         if (IS_ERR(prepared))
>>                 return PTR_ERR(prepared);
>>
>> --
>> 2.1.4
>>
> 
> 
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd
  2015-09-05  7:13     ` Michael Kerrisk (man-pages)
@ 2015-09-08 13:40       ` Tycho Andersen
  2015-09-09  0:07         ` Kees Cook
  0 siblings, 1 reply; 55+ messages in thread
From: Tycho Andersen @ 2015-09-08 13:40 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Kees Cook, Linux API, Alexei Starovoitov, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, LKML, Network Development

On Sat, Sep 05, 2015 at 09:13:02AM +0200, Michael Kerrisk (man-pages) wrote:
> On 09/04/2015 10:41 PM, Kees Cook wrote:
> > On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
> > <tycho.andersen@canonical.com> wrote:
> >> This is the final bit needed to support seccomp filters created via the bpf
> >> syscall.
> 
> Hmm. Thanks Kees, for CCinf linux-api@. That really should have been done at
> the outset.

Apologies, I'll cc the list on future versions.

> Tycho, where's the man-pages patch describing this new kernel-userspace
> API feature? :-)

Once we get the API finalized I'm happy to write it.

> >> One concern with this patch is exactly what the interface should look like
> >> for users, since seccomp()'s second argument is a pointer, we could ask
> >> people to pass a pointer to the fd, but implies we might write to it which
> >> seems impolite. Right now we cast the pointer (and force the user to cast
> >> it), which generates ugly warnings. I'm not sure what the right answer is
> >> here.
> >>
> >> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
> >> CC: Kees Cook <keescook@chromium.org>
> >> CC: Will Drewry <wad@chromium.org>
> >> CC: Oleg Nesterov <oleg@redhat.com>
> >> CC: Andy Lutomirski <luto@amacapital.net>
> >> CC: Pavel Emelyanov <xemul@parallels.com>
> >> CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> >> CC: Alexei Starovoitov <ast@kernel.org>
> >> CC: Daniel Borkmann <daniel@iogearbox.net>
> >> ---
> >>  include/linux/seccomp.h      |  3 +-
> >>  include/uapi/linux/seccomp.h |  1 +
> >>  kernel/seccomp.c             | 70 ++++++++++++++++++++++++++++++++++++--------
> >>  3 files changed, 61 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> >> index d1a86ed..a725dd5 100644
> >> --- a/include/linux/seccomp.h
> >> +++ b/include/linux/seccomp.h
> >> @@ -3,7 +3,8 @@
> >>
> >>  #include <uapi/linux/seccomp.h>
> >>
> >> -#define SECCOMP_FILTER_FLAG_MASK       (SECCOMP_FILTER_FLAG_TSYNC)
> >> +#define SECCOMP_FILTER_FLAG_MASK       (\
> >> +       SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_EBPF)
> >>
> >>  #ifdef CONFIG_SECCOMP
> >>
> >> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> >> index 0f238a4..c29a423 100644
> >> --- a/include/uapi/linux/seccomp.h
> >> +++ b/include/uapi/linux/seccomp.h
> >> @@ -16,6 +16,7 @@
> >>
> >>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
> >>  #define SECCOMP_FILTER_FLAG_TSYNC      1
> >> +#define SECCOMP_FILTER_FLAG_EBPF       (1 << 1)
> >>
> >>  /*
> >>   * All BPF programs must return a 32-bit value.
> >> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> >> index a2c5b32..9c6bea6 100644
> >> --- a/kernel/seccomp.c
> >> +++ b/kernel/seccomp.c
> >> @@ -355,17 +355,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
> >>
> >>         BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter));
> >>
> >> -       /*
> >> -        * Installing a seccomp filter requires that the task has
> >> -        * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
> >> -        * This avoids scenarios where unprivileged tasks can affect the
> >> -        * behavior of privileged children.
> >> -        */
> >> -       if (!task_no_new_privs(current) &&
> >> -           security_capable_noaudit(current_cred(), current_user_ns(),
> >> -                                    CAP_SYS_ADMIN) != 0)
> >> -               return ERR_PTR(-EACCES);
> >> -
> >>         /* Allocate a new seccomp_filter */
> >>         sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);
> >>         if (!sfilter)
> >> @@ -509,6 +498,48 @@ static void seccomp_send_sigsys(int syscall, int reason)
> >>         info.si_syscall = syscall;
> >>         force_sig_info(SIGSYS, &info, current);
> >>  }
> >> +
> >> +#ifdef CONFIG_BPF_SYSCALL
> >> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
> >> +{
> >> +       /* XXX: this cast generates a warning. should we make people pass in
> >> +        * &fd, or is there some nicer way of doing this?
> >> +        */
> >> +       u32 fd = (u32) filter;
> > 
> > I think this is probably the right way to do it, modulo getting the
> > warning fixed. Let me invoke the great linux-api subscribers to get
> > some more opinions.
> 
> Sigh. It's sad, but the using a cast does seem the simplest option.
> But, how about another idea...
> 
> > tl;dr: adding SECCOMP_FILTER_FLAG_EBPF to the flags changes the
> > pointer argument into an fd argument. Is this sane, should it be a
> > pointer to an fd, or should it not be a flag at all, creating a new
> > seccomp command instead (SECCOMP_MODE_FILTER_EBPF)?
> 
> What about
> 
>     seccomp(SECCOMP_MODE_FILTER_EBPF, flags, structp)
> 
> Where structp is a pointer to something like
> 
> struct seccomp_ebpf {
>     int size;      /* Size of this whole struct */
>     int fd;
> }
> 
> 'size' allows for future expansion of the struct (in case we want to 
> expand it later), and placing 'fd' inside a struct avoids unpleasant
> implication that would be made by passing a pointer to an fd as the
> third argument.

I like this; although perhaps something like bpf() has, with the
unions inside the struct so that we don't have to declare another
struct if we add another seccomp command. Kees?

Tycho

> Cheers,
> 
> Michael
> 
> 
> > -Kees
> > 
> >> +       struct seccomp_filter *ret;
> >> +       struct bpf_prog *prog;
> >> +
> >> +       prog = bpf_prog_get(fd);
> >> +       if (IS_ERR(prog))
> >> +               return (struct seccomp_filter *) prog;
> >> +
> >> +       if (prog->type != BPF_PROG_TYPE_SECCOMP) {
> >> +               bpf_prog_put(prog);
> >> +               return ERR_PTR(-EINVAL);
> >> +       }
> >> +
> >> +       ret = kzalloc(sizeof(*ret), GFP_KERNEL | __GFP_NOWARN);
> >> +       if (!ret) {
> >> +               bpf_prog_put(prog);
> >> +               return ERR_PTR(-ENOMEM);
> >> +       }
> >> +
> >> +       ret->prog = prog;
> >> +       atomic_set(&ret->usage, 1);
> >> +
> >> +       /* Intentionally don't bpf_prog_put() here, because the underlying prog
> >> +        * is refcounted too and we're holding a reference from the struct
> >> +        * seccomp_filter object.
> >> +        */
> >> +
> >> +       return ret;
> >> +}
> >> +#else
> >> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
> >> +{
> >> +       return ERR_PTR(-EINVAL);
> >> +}
> >> +#endif
> >>  #endif /* CONFIG_SECCOMP_FILTER */
> >>
> >>  /*
> >> @@ -775,8 +806,23 @@ static long seccomp_set_mode_filter(unsigned int flags,
> >>         if (flags & ~SECCOMP_FILTER_FLAG_MASK)
> >>                 return -EINVAL;
> >>
> >> +       /*
> >> +        * Installing a seccomp filter requires that the task has
> >> +        * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
> >> +        * This avoids scenarios where unprivileged tasks can affect the
> >> +        * behavior of privileged children.
> >> +        */
> >> +       if (!task_no_new_privs(current) &&
> >> +           security_capable_noaudit(current_cred(), current_user_ns(),
> >> +                                    CAP_SYS_ADMIN) != 0)
> >> +               return -EACCES;
> >> +
> >>         /* Prepare the new filter before holding any locks. */
> >> -       prepared = seccomp_prepare_user_filter(filter);
> >> +       if (flags & SECCOMP_FILTER_FLAG_EBPF)
> >> +               prepared = seccomp_prepare_ebpf(filter);
> >> +       else
> >> +               prepared = seccomp_prepare_user_filter(filter);
> >> +
> >>         if (IS_ERR(prepared))
> >>                 return PTR_ERR(prepared);
> >>
> >> --
> >> 2.1.4
> >>
> > 
> > 
> > 
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd
  2015-09-08 13:40       ` Tycho Andersen
@ 2015-09-09  0:07         ` Kees Cook
  2015-09-09 14:47           ` Tycho Andersen
  0 siblings, 1 reply; 55+ messages in thread
From: Kees Cook @ 2015-09-09  0:07 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Michael Kerrisk (man-pages),
	Linux API, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, LKML, Network Development

On Tue, Sep 8, 2015 at 6:40 AM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> On Sat, Sep 05, 2015 at 09:13:02AM +0200, Michael Kerrisk (man-pages) wrote:
>> On 09/04/2015 10:41 PM, Kees Cook wrote:
>> > On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
>> > <tycho.andersen@canonical.com> wrote:
>> >> This is the final bit needed to support seccomp filters created via the bpf
>> >> syscall.
>>
>> Hmm. Thanks Kees, for CCinf linux-api@. That really should have been done at
>> the outset.
>
> Apologies, I'll cc the list on future versions.
>
>> Tycho, where's the man-pages patch describing this new kernel-userspace
>> API feature? :-)
>
> Once we get the API finalized I'm happy to write it.
>
>> >> One concern with this patch is exactly what the interface should look like
>> >> for users, since seccomp()'s second argument is a pointer, we could ask
>> >> people to pass a pointer to the fd, but implies we might write to it which
>> >> seems impolite. Right now we cast the pointer (and force the user to cast
>> >> it), which generates ugly warnings. I'm not sure what the right answer is
>> >> here.
>> >>
>> >> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
>> >> CC: Kees Cook <keescook@chromium.org>
>> >> CC: Will Drewry <wad@chromium.org>
>> >> CC: Oleg Nesterov <oleg@redhat.com>
>> >> CC: Andy Lutomirski <luto@amacapital.net>
>> >> CC: Pavel Emelyanov <xemul@parallels.com>
>> >> CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
>> >> CC: Alexei Starovoitov <ast@kernel.org>
>> >> CC: Daniel Borkmann <daniel@iogearbox.net>
>> >> ---
>> >>  include/linux/seccomp.h      |  3 +-
>> >>  include/uapi/linux/seccomp.h |  1 +
>> >>  kernel/seccomp.c             | 70 ++++++++++++++++++++++++++++++++++++--------
>> >>  3 files changed, 61 insertions(+), 13 deletions(-)
>> >>
>> >> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
>> >> index d1a86ed..a725dd5 100644
>> >> --- a/include/linux/seccomp.h
>> >> +++ b/include/linux/seccomp.h
>> >> @@ -3,7 +3,8 @@
>> >>
>> >>  #include <uapi/linux/seccomp.h>
>> >>
>> >> -#define SECCOMP_FILTER_FLAG_MASK       (SECCOMP_FILTER_FLAG_TSYNC)
>> >> +#define SECCOMP_FILTER_FLAG_MASK       (\
>> >> +       SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_EBPF)
>> >>
>> >>  #ifdef CONFIG_SECCOMP
>> >>
>> >> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
>> >> index 0f238a4..c29a423 100644
>> >> --- a/include/uapi/linux/seccomp.h
>> >> +++ b/include/uapi/linux/seccomp.h
>> >> @@ -16,6 +16,7 @@
>> >>
>> >>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
>> >>  #define SECCOMP_FILTER_FLAG_TSYNC      1
>> >> +#define SECCOMP_FILTER_FLAG_EBPF       (1 << 1)
>> >>
>> >>  /*
>> >>   * All BPF programs must return a 32-bit value.
>> >> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>> >> index a2c5b32..9c6bea6 100644
>> >> --- a/kernel/seccomp.c
>> >> +++ b/kernel/seccomp.c
>> >> @@ -355,17 +355,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
>> >>
>> >>         BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter));
>> >>
>> >> -       /*
>> >> -        * Installing a seccomp filter requires that the task has
>> >> -        * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
>> >> -        * This avoids scenarios where unprivileged tasks can affect the
>> >> -        * behavior of privileged children.
>> >> -        */
>> >> -       if (!task_no_new_privs(current) &&
>> >> -           security_capable_noaudit(current_cred(), current_user_ns(),
>> >> -                                    CAP_SYS_ADMIN) != 0)
>> >> -               return ERR_PTR(-EACCES);
>> >> -
>> >>         /* Allocate a new seccomp_filter */
>> >>         sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);
>> >>         if (!sfilter)
>> >> @@ -509,6 +498,48 @@ static void seccomp_send_sigsys(int syscall, int reason)
>> >>         info.si_syscall = syscall;
>> >>         force_sig_info(SIGSYS, &info, current);
>> >>  }
>> >> +
>> >> +#ifdef CONFIG_BPF_SYSCALL
>> >> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
>> >> +{
>> >> +       /* XXX: this cast generates a warning. should we make people pass in
>> >> +        * &fd, or is there some nicer way of doing this?
>> >> +        */
>> >> +       u32 fd = (u32) filter;
>> >
>> > I think this is probably the right way to do it, modulo getting the
>> > warning fixed. Let me invoke the great linux-api subscribers to get
>> > some more opinions.
>>
>> Sigh. It's sad, but the using a cast does seem the simplest option.
>> But, how about another idea...
>>
>> > tl;dr: adding SECCOMP_FILTER_FLAG_EBPF to the flags changes the
>> > pointer argument into an fd argument. Is this sane, should it be a
>> > pointer to an fd, or should it not be a flag at all, creating a new
>> > seccomp command instead (SECCOMP_MODE_FILTER_EBPF)?
>>
>> What about
>>
>>     seccomp(SECCOMP_MODE_FILTER_EBPF, flags, structp)
>>
>> Where structp is a pointer to something like
>>
>> struct seccomp_ebpf {
>>     int size;      /* Size of this whole struct */
>>     int fd;
>> }
>>
>> 'size' allows for future expansion of the struct (in case we want to
>> expand it later), and placing 'fd' inside a struct avoids unpleasant
>> implication that would be made by passing a pointer to an fd as the
>> third argument.
>
> I like this; although perhaps something like bpf() has, with the
> unions inside the struct so that we don't have to declare another
> struct if we add another seccomp command. Kees?

Yeah, bpf's union looks good. Let's add a "command" flag, though:

seccomp(SECCOMP_MODE_FILTER_EBPF, int cmd, union, size);

And this cmd could be ADD_FD or something?

How's that look?

-Kees

>
> Tycho
>
>> Cheers,
>>
>> Michael
>>
>>
>> > -Kees
>> >
>> >> +       struct seccomp_filter *ret;
>> >> +       struct bpf_prog *prog;
>> >> +
>> >> +       prog = bpf_prog_get(fd);
>> >> +       if (IS_ERR(prog))
>> >> +               return (struct seccomp_filter *) prog;
>> >> +
>> >> +       if (prog->type != BPF_PROG_TYPE_SECCOMP) {
>> >> +               bpf_prog_put(prog);
>> >> +               return ERR_PTR(-EINVAL);
>> >> +       }
>> >> +
>> >> +       ret = kzalloc(sizeof(*ret), GFP_KERNEL | __GFP_NOWARN);
>> >> +       if (!ret) {
>> >> +               bpf_prog_put(prog);
>> >> +               return ERR_PTR(-ENOMEM);
>> >> +       }
>> >> +
>> >> +       ret->prog = prog;
>> >> +       atomic_set(&ret->usage, 1);
>> >> +
>> >> +       /* Intentionally don't bpf_prog_put() here, because the underlying prog
>> >> +        * is refcounted too and we're holding a reference from the struct
>> >> +        * seccomp_filter object.
>> >> +        */
>> >> +
>> >> +       return ret;
>> >> +}
>> >> +#else
>> >> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
>> >> +{
>> >> +       return ERR_PTR(-EINVAL);
>> >> +}
>> >> +#endif
>> >>  #endif /* CONFIG_SECCOMP_FILTER */
>> >>
>> >>  /*
>> >> @@ -775,8 +806,23 @@ static long seccomp_set_mode_filter(unsigned int flags,
>> >>         if (flags & ~SECCOMP_FILTER_FLAG_MASK)
>> >>                 return -EINVAL;
>> >>
>> >> +       /*
>> >> +        * Installing a seccomp filter requires that the task has
>> >> +        * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
>> >> +        * This avoids scenarios where unprivileged tasks can affect the
>> >> +        * behavior of privileged children.
>> >> +        */
>> >> +       if (!task_no_new_privs(current) &&
>> >> +           security_capable_noaudit(current_cred(), current_user_ns(),
>> >> +                                    CAP_SYS_ADMIN) != 0)
>> >> +               return -EACCES;
>> >> +
>> >>         /* Prepare the new filter before holding any locks. */
>> >> -       prepared = seccomp_prepare_user_filter(filter);
>> >> +       if (flags & SECCOMP_FILTER_FLAG_EBPF)
>> >> +               prepared = seccomp_prepare_ebpf(filter);
>> >> +       else
>> >> +               prepared = seccomp_prepare_user_filter(filter);
>> >> +
>> >>         if (IS_ERR(prepared))
>> >>                 return PTR_ERR(prepared);
>> >>
>> >> --
>> >> 2.1.4
>> >>
>> >
>> >
>> >
>>
>>
>> --
>> Michael Kerrisk
>> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
>> Linux/UNIX System Programming Training: http://man7.org/training/



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd
  2015-09-09  0:07         ` Kees Cook
@ 2015-09-09 14:47           ` Tycho Andersen
  2015-09-09 15:14             ` Alexei Starovoitov
  0 siblings, 1 reply; 55+ messages in thread
From: Tycho Andersen @ 2015-09-09 14:47 UTC (permalink / raw)
  To: Kees Cook
  Cc: Michael Kerrisk (man-pages),
	Linux API, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, LKML, Network Development

On Tue, Sep 08, 2015 at 05:07:03PM -0700, Kees Cook wrote:
>
> Yeah, bpf's union looks good. Let's add a "command" flag, though:
> 
> seccomp(SECCOMP_MODE_FILTER_EBPF, int cmd, union, size);
> 
> And this cmd could be ADD_FD or something?
> 
> How's that look?

I think we can drop the size (using the same strategy as bpf() and
checking for zeroes at the end), and keep the same signature for
seccomp(); so:

seccomp(SECCOMP_MODE_FILTER_EBPF, SECCOMP_ADD_BPF_FD, &union)

Yes, I'll use this in the next version.

Tycho

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd
  2015-09-09 14:47           ` Tycho Andersen
@ 2015-09-09 15:14             ` Alexei Starovoitov
  2015-09-09 15:55               ` Tycho Andersen
  0 siblings, 1 reply; 55+ messages in thread
From: Alexei Starovoitov @ 2015-09-09 15:14 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Michael Kerrisk (man-pages),
	Linux API, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, LKML, Network Development

On Wed, Sep 09, 2015 at 08:47:24AM -0600, Tycho Andersen wrote:
> On Tue, Sep 08, 2015 at 05:07:03PM -0700, Kees Cook wrote:
> >
> > Yeah, bpf's union looks good. Let's add a "command" flag, though:
> > 
> > seccomp(SECCOMP_MODE_FILTER_EBPF, int cmd, union, size);
> > 
> > And this cmd could be ADD_FD or something?
> > 
> > How's that look?
> 
> I think we can drop the size (using the same strategy as bpf() and
> checking for zeroes at the end), and keep the same signature for
> seccomp(); so:
> 
> seccomp(SECCOMP_MODE_FILTER_EBPF, SECCOMP_ADD_BPF_FD, &union)
> 
> Yes, I'll use this in the next version.

actually bpf() has size as the last argument:
SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
perf_event_open() doesn't and size is embedded as one of the fields.
Both approaches are equivally powerfull from extensitiblity
point of view. My preference was to keep size as an explicit
argument.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/6] ebpf: add a seccomp program type
  2015-09-04 21:08       ` Kees Cook
@ 2015-09-09 15:50         ` Tycho Andersen
  2015-09-09 16:07           ` Alexei Starovoitov
  2015-09-09 16:07           ` Daniel Borkmann
  0 siblings, 2 replies; 55+ messages in thread
From: Tycho Andersen @ 2015-09-09 15:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development

On Fri, Sep 04, 2015 at 02:08:37PM -0700, Kees Cook wrote:
> On Fri, Sep 4, 2015 at 2:06 PM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > On Fri, Sep 04, 2015 at 01:34:12PM -0700, Kees Cook wrote:
> >> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
> >> <tycho.andersen@canonical.com> wrote:
> >> > +static const struct bpf_func_proto *
> >> > +seccomp_func_proto(enum bpf_func_id func_id)
> >> > +{
> >> > +       /* Right now seccomp eBPF loading doesn't support maps; seccomp filters
> >> > +        * are considered to be read-only after they're installed, so map fds
> >> > +        * probably need to be invalidated when a seccomp filter with maps is
> >> > +        * installed.
> >> > +        *
> >> > +        * The rest of these might be reasonable to call from seccomp, so we
> >> > +        * export them.
> >> > +        */
> >> > +       switch (func_id) {
> >> > +       case BPF_FUNC_ktime_get_ns:
> >> > +               return &bpf_ktime_get_ns_proto;
> >> > +       case BPF_FUNC_trace_printk:
> >> > +               return bpf_get_trace_printk_proto();
> >> > +       case BPF_FUNC_get_prandom_u32:
> >> > +               return &bpf_get_prandom_u32_proto;
> >> > +       case BPF_FUNC_get_smp_processor_id:
> >> > +               return &bpf_get_smp_processor_id_proto;
> >> > +       case BPF_FUNC_tail_call:
> >> > +               return &bpf_tail_call_proto;
> >> > +       case BPF_FUNC_get_current_pid_tgid:
> >> > +               return &bpf_get_current_pid_tgid_proto;
> >> > +       case BPF_FUNC_get_current_uid_gid:
> >> > +               return &bpf_get_current_uid_gid_proto;
> >> > +       case BPF_FUNC_get_current_comm:
> >> > +               return &bpf_get_current_comm_proto;
> >> > +       default:
> >> > +               return NULL;
> >> > +       }
> >> > +}
> >>
> >> While this list is probably fine, I don't want to mix the addition of
> >> eBPF functions to the seccomp ABI with the CRIU changes. No function
> >> calls are currently possible and it should stay that way.
> >
> > Ok, I can remove them.
> >
> >> I was expecting to see a validator, similar to the existing BPF
> >> validator that is called when creating seccomp filters currently. Can
> >> we add a similar validator for new BPF_PROG_TYPE_SECCOMP?
> >
> > That's effectively what this patch does; when the eBPF is loaded via
> > bpf(), you tell bpf() you want a BPF_PROG_TYPE_SECCOMP, and it invokes
> > this validation/translation code, i.e. it uses
> > seccomp_is_valid_access() to check and make sure access are aligned
> > and inside struct seccomp_data.
> 
> What about limiting the possible instructions?

I totally overlooked this. A quick glance through the eBPF verifier
makes me think that we can just add another function to struct
bpf_verifier_ops called valid_instruction, which shouldn't be too
hard. Perhaps a more interesting question is what to allow:

BPF_LD(X) and BPF_ST(X): it looks like all types of stores are
  allowed, and only BPF_MEM and BPF_IMM loads are allowed; I think
  these can stay the same. BPF_XADD is new in eBPF, and I don't think
  we need it for seccomp (yet), since we don't have any shared memory
  via maps.

BPF_ALU: It looks like we're also not allowing regular BPF_ALU
  instruction BPF_MOD; eBPF adds a few ones: BPF_MOV (register move),
  BPF_ARSH (sign extended right shift), and BPF_END (endianness
  conversion), wich I think should all be safe. In particular, we need
  to allow BPF_MOV at least, since that's how the converter implements
  BPF_MISC | BPF_TAX from classic.

BPF_ALU64: I think we can safely allow all these as above, since
  they're just the 64-bit versions.

BPF_JMP: eBPF adds BPF_JNE, BPF_JSGT, BPF_JSGE, BPF_CALL, and
  BPF_EXIT, which I think all should be safe (except maybe BPF_CALL
  since we're not allowing functions really). Again we have to allow
  one of the new eBPF codes, as the converter implements BPF_RET as
  BPF_JMP | BPF_EXIT.

Thoughts?

Tycho

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd
  2015-09-09 15:14             ` Alexei Starovoitov
@ 2015-09-09 15:55               ` Tycho Andersen
  0 siblings, 0 replies; 55+ messages in thread
From: Tycho Andersen @ 2015-09-09 15:55 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kees Cook, Michael Kerrisk (man-pages),
	Linux API, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, LKML, Network Development

On Wed, Sep 09, 2015 at 08:14:04AM -0700, Alexei Starovoitov wrote:
> On Wed, Sep 09, 2015 at 08:47:24AM -0600, Tycho Andersen wrote:
> > On Tue, Sep 08, 2015 at 05:07:03PM -0700, Kees Cook wrote:
> > >
> > > Yeah, bpf's union looks good. Let's add a "command" flag, though:
> > > 
> > > seccomp(SECCOMP_MODE_FILTER_EBPF, int cmd, union, size);
> > > 
> > > And this cmd could be ADD_FD or something?
> > > 
> > > How's that look?
> > 
> > I think we can drop the size (using the same strategy as bpf() and
> > checking for zeroes at the end), and keep the same signature for
> > seccomp(); so:
> > 
> > seccomp(SECCOMP_MODE_FILTER_EBPF, SECCOMP_ADD_BPF_FD, &union)
> > 
> > Yes, I'll use this in the next version.
> 
> actually bpf() has size as the last argument:
> SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
> perf_event_open() doesn't and size is embedded as one of the fields.
> Both approaches are equivally powerfull from extensitiblity
> point of view. My preference was to keep size as an explicit
> argument.

Yep, sorry that was poorly written. I meant keeping the size as a
member of the struct as Michael originally suggested, mostly to avoid
having to change the signature of seccomp().

Tycho

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/6] ebpf: add a seccomp program type
  2015-09-09 15:50         ` Tycho Andersen
@ 2015-09-09 16:07           ` Alexei Starovoitov
  2015-09-09 16:09             ` Daniel Borkmann
  2015-09-09 16:07           ` Daniel Borkmann
  1 sibling, 1 reply; 55+ messages in thread
From: Alexei Starovoitov @ 2015-09-09 16:07 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, LKML, Network Development

On Wed, Sep 09, 2015 at 09:50:35AM -0600, Tycho Andersen wrote:
> > >
> > > That's effectively what this patch does; when the eBPF is loaded via
> > > bpf(), you tell bpf() you want a BPF_PROG_TYPE_SECCOMP, and it invokes
> > > this validation/translation code, i.e. it uses
> > > seccomp_is_valid_access() to check and make sure access are aligned
> > > and inside struct seccomp_data.
> > 
> > What about limiting the possible instructions?
> 
> I totally overlooked this. A quick glance through the eBPF verifier
> makes me think that we can just add another function to struct
> bpf_verifier_ops called valid_instruction, which shouldn't be too
> hard. Perhaps a more interesting question is what to allow:
> 
> BPF_LD(X) and BPF_ST(X): it looks like all types of stores are
>   allowed, and only BPF_MEM and BPF_IMM loads are allowed; I think
>   these can stay the same. BPF_XADD is new in eBPF, and I don't think
>   we need it for seccomp (yet), since we don't have any shared memory
>   via maps.
> 
> BPF_ALU: It looks like we're also not allowing regular BPF_ALU
>   instruction BPF_MOD; eBPF adds a few ones: BPF_MOV (register move),
>   BPF_ARSH (sign extended right shift), and BPF_END (endianness
>   conversion), wich I think should all be safe. In particular, we need
>   to allow BPF_MOV at least, since that's how the converter implements
>   BPF_MISC | BPF_TAX from classic.
> 
> BPF_ALU64: I think we can safely allow all these as above, since
>   they're just the 64-bit versions.
> 
> BPF_JMP: eBPF adds BPF_JNE, BPF_JSGT, BPF_JSGE, BPF_CALL, and
>   BPF_EXIT, which I think all should be safe (except maybe BPF_CALL
>   since we're not allowing functions really). Again we have to allow
>   one of the new eBPF codes, as the converter implements BPF_RET as
>   BPF_JMP | BPF_EXIT.
> 
> Thoughts?

Please do not add any per-instruction hacks. None of them are
necessary. Classic had to do extra ugly checks in seccomp only
because verifier wasn't flexible enough.
If you don't want to see any BPF_CALL in seccomp, just have
empty get_func_proto() callback for BPF_PROG_TYPE_SECCOMP
and verifier will reject all calls.
Currently we have only two non-generic instrucitons
LD_ABS and LD_IND that are avaialable for sockets/TC only,
because these are legacy instructions and we had to make
exceptions for them.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/6] ebpf: add a seccomp program type
  2015-09-09 15:50         ` Tycho Andersen
  2015-09-09 16:07           ` Alexei Starovoitov
@ 2015-09-09 16:07           ` Daniel Borkmann
  1 sibling, 0 replies; 55+ messages in thread
From: Daniel Borkmann @ 2015-09-09 16:07 UTC (permalink / raw)
  To: Tycho Andersen, Kees Cook
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, LKML, Network Development

On 09/09/2015 05:50 PM, Tycho Andersen wrote:
> On Fri, Sep 04, 2015 at 02:08:37PM -0700, Kees Cook wrote:
>> On Fri, Sep 4, 2015 at 2:06 PM, Tycho Andersen
[...]
>>>> I was expecting to see a validator, similar to the existing BPF
>>>> validator that is called when creating seccomp filters currently. Can
>>>> we add a similar validator for new BPF_PROG_TYPE_SECCOMP?
>>>
>>> That's effectively what this patch does; when the eBPF is loaded via
>>> bpf(), you tell bpf() you want a BPF_PROG_TYPE_SECCOMP, and it invokes
>>> this validation/translation code, i.e. it uses
>>> seccomp_is_valid_access() to check and make sure access are aligned
>>> and inside struct seccomp_data.
>>
>> What about limiting the possible instructions?
>
> I totally overlooked this. A quick glance through the eBPF verifier
> makes me think that we can just add another function to struct
> bpf_verifier_ops called valid_instruction, which shouldn't be too
> hard. Perhaps a more interesting question is what to allow:

It's possible, but keep in mind that when you disallow various
instructions from the base insns set, you won't be able to leverage
filter creation in the minimal C subset via clang/llvm anymore, so
usability would suffer from this side, even if you just use clang/llvm
to create the raw insns and later keep them in your application
directly.

And if you later on decide to allow maps, etc, hacking this together
by hand is a bit of a pain. ;)

[ Restricting helper functions and ctx access, etc via bpf_verifier_ops
   (as you can currently do) should not affect this. ]

> BPF_LD(X) and BPF_ST(X): it looks like all types of stores are
>    allowed, and only BPF_MEM and BPF_IMM loads are allowed; I think
>    these can stay the same. BPF_XADD is new in eBPF, and I don't think
>    we need it for seccomp (yet), since we don't have any shared memory
>    via maps.
>
> BPF_ALU: It looks like we're also not allowing regular BPF_ALU
>    instruction BPF_MOD; eBPF adds a few ones: BPF_MOV (register move),
>    BPF_ARSH (sign extended right shift), and BPF_END (endianness
>    conversion), wich I think should all be safe. In particular, we need
>    to allow BPF_MOV at least, since that's how the converter implements
>    BPF_MISC | BPF_TAX from classic.
>
> BPF_ALU64: I think we can safely allow all these as above, since
>    they're just the 64-bit versions.
>
> BPF_JMP: eBPF adds BPF_JNE, BPF_JSGT, BPF_JSGE, BPF_CALL, and
>    BPF_EXIT, which I think all should be safe (except maybe BPF_CALL
>    since we're not allowing functions really). Again we have to allow
>    one of the new eBPF codes, as the converter implements BPF_RET as
>    BPF_JMP | BPF_EXIT.
>
> Thoughts?
>
> Tycho


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/6] ebpf: add a seccomp program type
  2015-09-09 16:07           ` Alexei Starovoitov
@ 2015-09-09 16:09             ` Daniel Borkmann
  2015-09-09 16:37               ` Kees Cook
  0 siblings, 1 reply; 55+ messages in thread
From: Daniel Borkmann @ 2015-09-09 16:09 UTC (permalink / raw)
  To: Alexei Starovoitov, Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn, LKML,
	Network Development

On 09/09/2015 06:07 PM, Alexei Starovoitov wrote:
> On Wed, Sep 09, 2015 at 09:50:35AM -0600, Tycho Andersen wrote:
[...]
>> Thoughts?
>
> Please do not add any per-instruction hacks. None of them are
> necessary. Classic had to do extra ugly checks in seccomp only
> because verifier wasn't flexible enough.
> If you don't want to see any BPF_CALL in seccomp, just have
> empty get_func_proto() callback for BPF_PROG_TYPE_SECCOMP
> and verifier will reject all calls.
> Currently we have only two non-generic instrucitons
> LD_ABS and LD_IND that are avaialable for sockets/TC only,
> because these are legacy instructions and we had to make
> exceptions for them.

Yep, +1.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/6] ebpf: add a seccomp program type
  2015-09-04 21:50   ` Andy Lutomirski
@ 2015-09-09 16:13     ` Daniel Borkmann
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Borkmann @ 2015-09-09 16:13 UTC (permalink / raw)
  To: Andy Lutomirski, Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, linux-kernel,
	Network Development

On 09/04/2015 11:50 PM, Andy Lutomirski wrote:
> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
[...]
>> +static const struct bpf_func_proto *
>> +seccomp_func_proto(enum bpf_func_id func_id)
>> +{
>> +       /* Right now seccomp eBPF loading doesn't support maps; seccomp filters
>> +        * are considered to be read-only after they're installed, so map fds
>> +        * probably need to be invalidated when a seccomp filter with maps is
>> +        * installed.
>> +        *
>> +        * The rest of these might be reasonable to call from seccomp, so we
>> +        * export them.
>> +        */
>> +       switch (func_id) {
>> +       case BPF_FUNC_ktime_get_ns:
>> +               return &bpf_ktime_get_ns_proto;
>> +       case BPF_FUNC_trace_printk:
>> +               return bpf_get_trace_printk_proto();
>> +       case BPF_FUNC_get_prandom_u32:
>> +               return &bpf_get_prandom_u32_proto;
>
> I don't think we should expose prandom to unprivileged userspace.
> This may be an attack vector.

Agreed, it should not be exposed for unpriv'ed users.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/6] ebpf: add a seccomp program type
  2015-09-09 16:09             ` Daniel Borkmann
@ 2015-09-09 16:37               ` Kees Cook
  2015-09-09 16:52                 ` Alexei Starovoitov
  0 siblings, 1 reply; 55+ messages in thread
From: Kees Cook @ 2015-09-09 16:37 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Tycho Andersen, Alexei Starovoitov,
	Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov,
	Serge E. Hallyn, LKML, Network Development

On Wed, Sep 9, 2015 at 9:09 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 09/09/2015 06:07 PM, Alexei Starovoitov wrote:
>>
>> On Wed, Sep 09, 2015 at 09:50:35AM -0600, Tycho Andersen wrote:
>
> [...]
>>>
>>> Thoughts?
>>
>>
>> Please do not add any per-instruction hacks. None of them are
>> necessary. Classic had to do extra ugly checks in seccomp only
>> because verifier wasn't flexible enough.
>> If you don't want to see any BPF_CALL in seccomp, just have
>> empty get_func_proto() callback for BPF_PROG_TYPE_SECCOMP
>> and verifier will reject all calls.
>> Currently we have only two non-generic instrucitons
>> LD_ABS and LD_IND that are avaialable for sockets/TC only,
>> because these are legacy instructions and we had to make
>> exceptions for them.
>
> Yep, +1.

Hrmpf. This adds to the cognitive load for accepting this patch
series. :P Now I have to convince myself that there is no additional
exposure to seccomp by using the entire set of eBPF instructions.
While I'm pretty sure it'll be fine, I really don't want to risk being
wrong and opening a hole here. I will spend some time looking at the
new eBPF instructions...

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/6] ebpf: add a seccomp program type
  2015-09-09 16:37               ` Kees Cook
@ 2015-09-09 16:52                 ` Alexei Starovoitov
  2015-09-09 17:27                   ` Kees Cook
  0 siblings, 1 reply; 55+ messages in thread
From: Alexei Starovoitov @ 2015-09-09 16:52 UTC (permalink / raw)
  To: Kees Cook
  Cc: Daniel Borkmann, Tycho Andersen, Alexei Starovoitov, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	LKML, Network Development

On Wed, Sep 09, 2015 at 09:37:51AM -0700, Kees Cook wrote:
> On Wed, Sep 9, 2015 at 9:09 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> > On 09/09/2015 06:07 PM, Alexei Starovoitov wrote:
> >>
> >> On Wed, Sep 09, 2015 at 09:50:35AM -0600, Tycho Andersen wrote:
> >
> > [...]
> >>>
> >>> Thoughts?
> >>
> >>
> >> Please do not add any per-instruction hacks. None of them are
> >> necessary. Classic had to do extra ugly checks in seccomp only
> >> because verifier wasn't flexible enough.
> >> If you don't want to see any BPF_CALL in seccomp, just have
> >> empty get_func_proto() callback for BPF_PROG_TYPE_SECCOMP
> >> and verifier will reject all calls.
> >> Currently we have only two non-generic instrucitons
> >> LD_ABS and LD_IND that are avaialable for sockets/TC only,
> >> because these are legacy instructions and we had to make
> >> exceptions for them.
> >
> > Yep, +1.
> 
> Hrmpf. This adds to the cognitive load for accepting this patch
> series. :P Now I have to convince myself that there is no additional
> exposure to seccomp by using the entire set of eBPF instructions.
> While I'm pretty sure it'll be fine, I really don't want to risk being
> wrong and opening a hole here. I will spend some time looking at the
> new eBPF instructions...

note, as was discussed many times before, there is no pointer leak
prevention pass yet, so eBPF is root only.
Once the pass is complete it will prevent passing addresses to
functions, storing them in maps and returning from the program.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/6] ebpf: add a seccomp program type
  2015-09-09 16:52                 ` Alexei Starovoitov
@ 2015-09-09 17:27                   ` Kees Cook
  2015-09-09 17:31                     ` Tycho Andersen
  0 siblings, 1 reply; 55+ messages in thread
From: Kees Cook @ 2015-09-09 17:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Tycho Andersen, Alexei Starovoitov, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	LKML, Network Development

On Wed, Sep 9, 2015 at 9:52 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Wed, Sep 09, 2015 at 09:37:51AM -0700, Kees Cook wrote:
>> On Wed, Sep 9, 2015 at 9:09 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> > On 09/09/2015 06:07 PM, Alexei Starovoitov wrote:
>> >>
>> >> On Wed, Sep 09, 2015 at 09:50:35AM -0600, Tycho Andersen wrote:
>> >
>> > [...]
>> >>>
>> >>> Thoughts?
>> >>
>> >>
>> >> Please do not add any per-instruction hacks. None of them are
>> >> necessary. Classic had to do extra ugly checks in seccomp only
>> >> because verifier wasn't flexible enough.
>> >> If you don't want to see any BPF_CALL in seccomp, just have
>> >> empty get_func_proto() callback for BPF_PROG_TYPE_SECCOMP
>> >> and verifier will reject all calls.
>> >> Currently we have only two non-generic instrucitons
>> >> LD_ABS and LD_IND that are avaialable for sockets/TC only,
>> >> because these are legacy instructions and we had to make
>> >> exceptions for them.
>> >
>> > Yep, +1.
>>
>> Hrmpf. This adds to the cognitive load for accepting this patch
>> series. :P Now I have to convince myself that there is no additional
>> exposure to seccomp by using the entire set of eBPF instructions.
>> While I'm pretty sure it'll be fine, I really don't want to risk being
>> wrong and opening a hole here. I will spend some time looking at the
>> new eBPF instructions...
>
> note, as was discussed many times before, there is no pointer leak
> prevention pass yet, so eBPF is root only.
> Once the pass is complete it will prevent passing addresses to
> functions, storing them in maps and returning from the program.

Tycho, are you building new eBPF filters as the root user and then
attaching them later? I was imagining you were going to need this
entirely as non-root.

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/6] ebpf: add a seccomp program type
  2015-09-09 17:27                   ` Kees Cook
@ 2015-09-09 17:31                     ` Tycho Andersen
  0 siblings, 0 replies; 55+ messages in thread
From: Tycho Andersen @ 2015-09-09 17:31 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexei Starovoitov, Daniel Borkmann, Alexei Starovoitov,
	Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov,
	Serge E. Hallyn, LKML, Network Development

On Wed, Sep 09, 2015 at 10:27:08AM -0700, Kees Cook wrote:
> On Wed, Sep 9, 2015 at 9:52 AM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Wed, Sep 09, 2015 at 09:37:51AM -0700, Kees Cook wrote:
> >> On Wed, Sep 9, 2015 at 9:09 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> >> > On 09/09/2015 06:07 PM, Alexei Starovoitov wrote:
> >> >>
> >> >> On Wed, Sep 09, 2015 at 09:50:35AM -0600, Tycho Andersen wrote:
> >> >
> >> > [...]
> >> >>>
> >> >>> Thoughts?
> >> >>
> >> >>
> >> >> Please do not add any per-instruction hacks. None of them are
> >> >> necessary. Classic had to do extra ugly checks in seccomp only
> >> >> because verifier wasn't flexible enough.
> >> >> If you don't want to see any BPF_CALL in seccomp, just have
> >> >> empty get_func_proto() callback for BPF_PROG_TYPE_SECCOMP
> >> >> and verifier will reject all calls.
> >> >> Currently we have only two non-generic instrucitons
> >> >> LD_ABS and LD_IND that are avaialable for sockets/TC only,
> >> >> because these are legacy instructions and we had to make
> >> >> exceptions for them.
> >> >
> >> > Yep, +1.
> >>
> >> Hrmpf. This adds to the cognitive load for accepting this patch
> >> series. :P Now I have to convince myself that there is no additional
> >> exposure to seccomp by using the entire set of eBPF instructions.
> >> While I'm pretty sure it'll be fine, I really don't want to risk being
> >> wrong and opening a hole here. I will spend some time looking at the
> >> new eBPF instructions...
> >
> > note, as was discussed many times before, there is no pointer leak
> > prevention pass yet, so eBPF is root only.
> > Once the pass is complete it will prevent passing addresses to
> > functions, storing them in maps and returning from the program.
> 
> Tycho, are you building new eBPF filters as the root user and then
> attaching them later?

Yep, exactly.

> I was imagining you were going to need this entirely as non-root.

We can't exactly because of the real root restriction on bpf(). So we
just open the bpf fds when we're real root and keep them until the
very end of the restore when we finally install the seccomp filters.
Not ideal, of course, but I assume we can change how this works once
bpf() deemed safe for non-root.

Tycho

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-05  0:27             ` Tycho Andersen
@ 2015-09-09 22:34               ` Tycho Andersen
  2015-09-09 23:44                 ` Andy Lutomirski
  0 siblings, 1 reply; 55+ messages in thread
From: Tycho Andersen @ 2015-09-09 22:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Cyrill Gorcunov

On Fri, Sep 04, 2015 at 06:27:27PM -0600, Tycho Andersen wrote:
> On Fri, Sep 04, 2015 at 04:08:53PM -0700, Andy Lutomirski wrote:
> > On Fri, Sep 4, 2015 at 3:28 PM, Tycho Andersen
> > <tycho.andersen@canonical.com> wrote:
> > > On Fri, Sep 04, 2015 at 02:48:03PM -0700, Andy Lutomirski wrote:
> > >> On Fri, Sep 4, 2015 at 1:45 PM, Tycho Andersen
> > >> <tycho.andersen@canonical.com> wrote:
> > >> > On Fri, Sep 04, 2015 at 01:17:30PM -0700, Kees Cook wrote:
> > >> >> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
> > >> >> <tycho.andersen@canonical.com> wrote:
> > >> >> > This commit adds a way to dump eBPF programs. The initial implementation
> > >> >> > doesn't support maps, and therefore only allows dumping seccomp ebpf
> > >> >> > programs which themselves don't currently support maps.
> > >> >> >
> > >> >> > We export the GPL bit as well as a unique ID for the program so that
> > >> >>
> > >> >> This unique ID appears to be the heap address for the prog. That's a
> > >> >> huge leak, and should not be done. We don't want to introduce new
> > >> >> kernel address leaks while we're trying to fix the remaining ones.
> > >> >> Shouldn't the "unique ID" be the fd itself? I imagine KCMP_FILE
> > >> >> could be used, for example.
> > >> >
> > >> > No; we acquire the fd per process, so if a task installs a filter and
> > >> > then forks N times, we'll grab N (+1) copies of the filter from N (+1)
> > >> > different file descriptors. Ideally, we'd have some way to figure out
> > >> > that these were all the same. Some sort of prog_id is one way,
> > >> > although there may be others.
> > >>
> > >> I disagree a bit.  I think we want the actual hierarchy to be a
> > >> well-defined thing, because I have plans to make the hierarchy
> > >> actually do something.  That means that we'll need to have a more
> > >> exact way to dump the hierarchy than "these two filters are identical"
> > >> or "these two filters are not identical".
> > >
> > > Can you elaborate on what this would look like? I think with the
> > > "these two filters are the same" primitive (the same in the sense that
> > > they were inherited during a fork, not just that
> > > memcmp(filter1->insns, filter2->insns) == 0) you can infer the entire
> > > hierarchy, however clunky it may be to do so.
> > >
> > > Another issue is that KCMP_FILE won't work in this case, as it
> > > effectively compares the struct file *, which will be different since
> > > we need to call anon_inode_getfd() for each call of
> > > ptrace(PTRACE_SECCOMP_GET_FILTER_FD). We could add a KCMP_BPF (or just
> > > a KCMP_FILE_PRIVATE_DATA, since that's effectively what it would be).
> > > Does that make sense? [added Cyrill]
> > >
> > 
> > I don't really know what it would look like.  I think we want a way to
> > compare struct seccomp_filter pointers.

Here's a thought,

The set I'm currently proposing effectively separates the ref-counting
of the struct seccomp_filter from the struct bpf_prog (by necessity,
since we're referring to filters from fds). What if we went a little
futher, and made a copy of each seccomp_filter on fork(), keeping it
pointed at the same bpf_prog but adding some metadata about how it was
inherited (tsk->seccomp.filter->inheritence_count++ perhaps). This
would still require this change:

> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 9c6bea6..efc3f36 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -239,7 +239,7 @@ static int is_ancestor(struct seccomp_filter *parent,
>  	if (parent == NULL)
>  		return 1;
>  	for (; child; child = child->prev)
> -		if (child == parent)
> +		if (child->prog == parent->prog)
>  			return 1;
>  	return 0;
>  }

to get the ancestry right on restore, but the change would make more
sense given that the seccomp_filter pointers were in fact unique at
this point.

To access it, we can change the current set to instead of iterating on
bpf_prog, to iterate on seccomp_filter, and add a few seccomp
commands, so the whole sequence looks like this:

seccomp_fd = ptrace(PTRACE_SECCOMP_GET_FILTER_FD, pid);
if (seccomp(SECCOMP_MODE_FILTER_EBPF, SECCOMP_EBPF_INHERITANCE, fd) > 0) {
        /* mark this as inherited from parent */
} else {
        bpf_fd = seccomp(SECCOMP_MODE_FILTER_EBPF, SECCOMP_EBPF_GET_FD, seccomp_fd);
        bpf(BPF_PROG_DUMP, &attr, sizeof(attr));
        /* save things as normal */
}

Then we need some way to restore this inheritance_count; the only
thing I can think of right now is to let people specify it when they
add the filter via SECCOMP_EBPF_ADD_FD, but that doesn't seem ideal
since these are potentially unprivileged users (root in their user
ns). However, it's not clear to me how one would abuse it given that
is_ancestor actually checks the bpf_prog pointers.

Tycho

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-09 22:34               ` Tycho Andersen
@ 2015-09-09 23:44                 ` Andy Lutomirski
  2015-09-10  0:13                   ` Tycho Andersen
  0 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2015-09-09 23:44 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Cyrill Gorcunov

On Wed, Sep 9, 2015 at 3:34 PM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> On Fri, Sep 04, 2015 at 06:27:27PM -0600, Tycho Andersen wrote:
>> On Fri, Sep 04, 2015 at 04:08:53PM -0700, Andy Lutomirski wrote:
>> > On Fri, Sep 4, 2015 at 3:28 PM, Tycho Andersen
>> > <tycho.andersen@canonical.com> wrote:
>> > > On Fri, Sep 04, 2015 at 02:48:03PM -0700, Andy Lutomirski wrote:
>> > >> On Fri, Sep 4, 2015 at 1:45 PM, Tycho Andersen
>> > >> <tycho.andersen@canonical.com> wrote:
>> > >> > On Fri, Sep 04, 2015 at 01:17:30PM -0700, Kees Cook wrote:
>> > >> >> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
>> > >> >> <tycho.andersen@canonical.com> wrote:
>> > >> >> > This commit adds a way to dump eBPF programs. The initial implementation
>> > >> >> > doesn't support maps, and therefore only allows dumping seccomp ebpf
>> > >> >> > programs which themselves don't currently support maps.
>> > >> >> >
>> > >> >> > We export the GPL bit as well as a unique ID for the program so that
>> > >> >>
>> > >> >> This unique ID appears to be the heap address for the prog. That's a
>> > >> >> huge leak, and should not be done. We don't want to introduce new
>> > >> >> kernel address leaks while we're trying to fix the remaining ones.
>> > >> >> Shouldn't the "unique ID" be the fd itself? I imagine KCMP_FILE
>> > >> >> could be used, for example.
>> > >> >
>> > >> > No; we acquire the fd per process, so if a task installs a filter and
>> > >> > then forks N times, we'll grab N (+1) copies of the filter from N (+1)
>> > >> > different file descriptors. Ideally, we'd have some way to figure out
>> > >> > that these were all the same. Some sort of prog_id is one way,
>> > >> > although there may be others.
>> > >>
>> > >> I disagree a bit.  I think we want the actual hierarchy to be a
>> > >> well-defined thing, because I have plans to make the hierarchy
>> > >> actually do something.  That means that we'll need to have a more
>> > >> exact way to dump the hierarchy than "these two filters are identical"
>> > >> or "these two filters are not identical".
>> > >
>> > > Can you elaborate on what this would look like? I think with the
>> > > "these two filters are the same" primitive (the same in the sense that
>> > > they were inherited during a fork, not just that
>> > > memcmp(filter1->insns, filter2->insns) == 0) you can infer the entire
>> > > hierarchy, however clunky it may be to do so.
>> > >
>> > > Another issue is that KCMP_FILE won't work in this case, as it
>> > > effectively compares the struct file *, which will be different since
>> > > we need to call anon_inode_getfd() for each call of
>> > > ptrace(PTRACE_SECCOMP_GET_FILTER_FD). We could add a KCMP_BPF (or just
>> > > a KCMP_FILE_PRIVATE_DATA, since that's effectively what it would be).
>> > > Does that make sense? [added Cyrill]
>> > >
>> >
>> > I don't really know what it would look like.  I think we want a way to
>> > compare struct seccomp_filter pointers.
>
> Here's a thought,
>
> The set I'm currently proposing effectively separates the ref-counting
> of the struct seccomp_filter from the struct bpf_prog (by necessity,
> since we're referring to filters from fds). What if we went a little
> futher, and made a copy of each seccomp_filter on fork(), keeping it
> pointed at the same bpf_prog but adding some metadata about how it was
> inherited (tsk->seccomp.filter->inheritence_count++ perhaps). This
> would still require this change:

Won't that break the tsync mechanism?

--Andy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-09 23:44                 ` Andy Lutomirski
@ 2015-09-10  0:13                   ` Tycho Andersen
  2015-09-10  0:44                     ` Andy Lutomirski
  0 siblings, 1 reply; 55+ messages in thread
From: Tycho Andersen @ 2015-09-10  0:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Cyrill Gorcunov

On Wed, Sep 09, 2015 at 04:44:24PM -0700, Andy Lutomirski wrote:
> On Wed, Sep 9, 2015 at 3:34 PM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> >
> > Here's a thought,
> >
> > The set I'm currently proposing effectively separates the ref-counting
> > of the struct seccomp_filter from the struct bpf_prog (by necessity,
> > since we're referring to filters from fds). What if we went a little
> > futher, and made a copy of each seccomp_filter on fork(), keeping it
> > pointed at the same bpf_prog but adding some metadata about how it was
> > inherited (tsk->seccomp.filter->inheritence_count++ perhaps). This
> > would still require this change:
> 
> Won't that break the tsync mechanism?

We'll need the change I posted (is_ancestor comparing the underlying
bpf_prog instead of the seccomp_filter), but then I think it'll work.
I guess we'll need to do some more bookkeeping when we install filters
via TSYNC since each thread would need its own seccomp_filter, and
we'd also have to decide whether a filter installed via TSYNC was
inherited or not.

Am I missing something?

Tycho

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-10  0:13                   ` Tycho Andersen
@ 2015-09-10  0:44                     ` Andy Lutomirski
  2015-09-10  0:58                       ` Tycho Andersen
  0 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2015-09-10  0:44 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Cyrill Gorcunov

On Wed, Sep 9, 2015 at 5:13 PM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> On Wed, Sep 09, 2015 at 04:44:24PM -0700, Andy Lutomirski wrote:
>> On Wed, Sep 9, 2015 at 3:34 PM, Tycho Andersen
>> <tycho.andersen@canonical.com> wrote:
>> >
>> > Here's a thought,
>> >
>> > The set I'm currently proposing effectively separates the ref-counting
>> > of the struct seccomp_filter from the struct bpf_prog (by necessity,
>> > since we're referring to filters from fds). What if we went a little
>> > futher, and made a copy of each seccomp_filter on fork(), keeping it
>> > pointed at the same bpf_prog but adding some metadata about how it was
>> > inherited (tsk->seccomp.filter->inheritence_count++ perhaps). This
>> > would still require this change:
>>
>> Won't that break the tsync mechanism?
>
> We'll need the change I posted (is_ancestor comparing the underlying
> bpf_prog instead of the seccomp_filter), but then I think it'll work.
> I guess we'll need to do some more bookkeeping when we install filters
> via TSYNC since each thread would need its own seccomp_filter, and
> we'd also have to decide whether a filter installed via TSYNC was
> inherited or not.
>
> Am I missing something?

Yes.  I don't think that:

int fd = [create an ebpf fd];
if (fork()) {
  /* Process A */
  seccomp(attach fd);
  ...
} else {
  /* Process B */
  seccomp(attach fd);
  ...
}

should result in processes A and B being considered to have the same
seccomp_filter state.  In particular, I eventually want to make the
seccomp_filter state be considerably more interesting than just the
bpf program.

IOW I really do think that seccomp_filter should have identity.

There's another severe problem, I think.  Suppose that ebpf1 and ebpf2
are ebpf fds.  If processes C and D start out with no filters at all,
C attaches ebpf1 and ebpf2, and D attaches just ebpf2, then C and D
are definitely *not* in the same state, and neither is an ancestor of
the other.

--Andy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program
  2015-09-10  0:44                     ` Andy Lutomirski
@ 2015-09-10  0:58                       ` Tycho Andersen
  0 siblings, 0 replies; 55+ messages in thread
From: Tycho Andersen @ 2015-09-10  0:58 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Cyrill Gorcunov

On Wed, Sep 09, 2015 at 05:44:06PM -0700, Andy Lutomirski wrote:
> On Wed, Sep 9, 2015 at 5:13 PM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > On Wed, Sep 09, 2015 at 04:44:24PM -0700, Andy Lutomirski wrote:
> >> On Wed, Sep 9, 2015 at 3:34 PM, Tycho Andersen
> >> <tycho.andersen@canonical.com> wrote:
> >> >
> >> > Here's a thought,
> >> >
> >> > The set I'm currently proposing effectively separates the ref-counting
> >> > of the struct seccomp_filter from the struct bpf_prog (by necessity,
> >> > since we're referring to filters from fds). What if we went a little
> >> > futher, and made a copy of each seccomp_filter on fork(), keeping it
> >> > pointed at the same bpf_prog but adding some metadata about how it was
> >> > inherited (tsk->seccomp.filter->inheritence_count++ perhaps). This
> >> > would still require this change:
> >>
> >> Won't that break the tsync mechanism?
> >
> > We'll need the change I posted (is_ancestor comparing the underlying
> > bpf_prog instead of the seccomp_filter), but then I think it'll work.
> > I guess we'll need to do some more bookkeeping when we install filters
> > via TSYNC since each thread would need its own seccomp_filter, and
> > we'd also have to decide whether a filter installed via TSYNC was
> > inherited or not.
> >
> > Am I missing something?
> 
> Yes.  I don't think that:
> 
> int fd = [create an ebpf fd];
> if (fork()) {
>   /* Process A */
>   seccomp(attach fd);
>   ...
> } else {
>   /* Process B */
>   seccomp(attach fd);
>   ...
> }
> 
> should result in processes A and B being considered to have the same
> seccomp_filter state.  In particular, I eventually want to make the
> seccomp_filter state be considerably more interesting than just the
> bpf program.
> 
> There's another severe problem, I think.  Suppose that ebpf1 and ebpf2
> are ebpf fds.  If processes C and D start out with no filters at all,
> C attaches ebpf1 and ebpf2, and D attaches just ebpf2, then C and D
> are definitely *not* in the same state, and neither is an ancestor of
> the other.

Ah, yes.

> IOW I really do think that seccomp_filter should have identity.

What if we kept a pointer to the seccomp_filter that was inherited on
fork()? Everything "below" that in the tree is not inherited, and
everything above is. Unfortunately, it's not obvious how to restore
this state.

Tycho

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2015-09-10  0:59 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-04 16:04 c/r of seccomp filters via underlying eBPF Tycho Andersen
2015-09-04 16:04 ` [PATCH 1/6] ebpf: add a seccomp program type Tycho Andersen
2015-09-04 20:17   ` Alexei Starovoitov
2015-09-04 21:09     ` Tycho Andersen
2015-09-04 20:34   ` Kees Cook
2015-09-04 21:06     ` Tycho Andersen
2015-09-04 21:08       ` Kees Cook
2015-09-09 15:50         ` Tycho Andersen
2015-09-09 16:07           ` Alexei Starovoitov
2015-09-09 16:09             ` Daniel Borkmann
2015-09-09 16:37               ` Kees Cook
2015-09-09 16:52                 ` Alexei Starovoitov
2015-09-09 17:27                   ` Kees Cook
2015-09-09 17:31                     ` Tycho Andersen
2015-09-09 16:07           ` Daniel Borkmann
2015-09-04 21:50   ` Andy Lutomirski
2015-09-09 16:13     ` Daniel Borkmann
2015-09-04 16:04 ` [PATCH 2/6] seccomp: make underlying bpf ref counted as well Tycho Andersen
2015-09-04 21:53   ` Andy Lutomirski
2015-09-04 16:04 ` [PATCH 3/6] ebpf: add a way to dump an eBPF program Tycho Andersen
2015-09-04 20:17   ` Kees Cook
2015-09-04 20:45     ` Tycho Andersen
2015-09-04 20:50       ` Kees Cook
2015-09-04 20:58         ` Alexei Starovoitov
2015-09-04 21:00           ` Tycho Andersen
2015-09-04 21:48       ` Andy Lutomirski
2015-09-04 22:28         ` Tycho Andersen
2015-09-04 23:08           ` Andy Lutomirski
2015-09-05  0:27             ` Tycho Andersen
2015-09-09 22:34               ` Tycho Andersen
2015-09-09 23:44                 ` Andy Lutomirski
2015-09-10  0:13                   ` Tycho Andersen
2015-09-10  0:44                     ` Andy Lutomirski
2015-09-10  0:58                       ` Tycho Andersen
2015-09-04 23:27           ` Kees Cook
2015-09-05  0:08             ` Andy Lutomirski
2015-09-04 20:27   ` Alexei Starovoitov
2015-09-04 20:42     ` Tycho Andersen
2015-09-04 16:04 ` [PATCH 4/6] seccomp: add a way to access filters via bpf fds Tycho Andersen
2015-09-04 20:26   ` Kees Cook
2015-09-04 20:29     ` Alexei Starovoitov
2015-09-04 20:58       ` Tycho Andersen
2015-09-04 16:04 ` [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd Tycho Andersen
2015-09-04 20:40   ` Alexei Starovoitov
2015-09-04 20:41   ` Kees Cook
2015-09-05  7:13     ` Michael Kerrisk (man-pages)
2015-09-08 13:40       ` Tycho Andersen
2015-09-09  0:07         ` Kees Cook
2015-09-09 14:47           ` Tycho Andersen
2015-09-09 15:14             ` Alexei Starovoitov
2015-09-09 15:55               ` Tycho Andersen
2015-09-04 16:04 ` [PATCH 6/6] ebpf: allow BPF_REG_X in src_reg conditional jumps Tycho Andersen
2015-09-04 21:06   ` Alexei Starovoitov
2015-09-04 22:43     ` Tycho Andersen
2015-09-05  4:12       ` Alexei Starovoitov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).