linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* checkpoint/restore of seccomp filters v3
@ 2015-09-30 18:13 Tycho Andersen
  2015-09-30 18:13 ` [PATCH v3 1/5] seccomp: save the original filter Tycho Andersen
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Tycho Andersen @ 2015-09-30 18:13 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov,
	Serge E. Hallyn, Daniel Borkmann, linux-kernel, netdev,
	linux-api

Hi all,

Here's a re-worked set for c/r of seccomp filters which keeps around the
original bpf program passed to the kernel instead of trying to dump the
ebpf version. There are various comments/questions in the individual patch
notes.

I'm not sure this needs to go via net-next any more, as the impact in net/
is fairly minimal, and it seems more seccomp heavy. As such, this set is
based on seccomp/tip.

Thoughts welcome,

Tycho

P.S. Man page patches to come once we agree on the API :)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 1/5] seccomp: save the original filter
  2015-09-30 18:13 checkpoint/restore of seccomp filters v3 Tycho Andersen
@ 2015-09-30 18:13 ` Tycho Andersen
  2015-09-30 18:13 ` [PATCH v3 2/5] seccomp: add the concept of a seccomp filter FD Tycho Andersen
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Tycho Andersen @ 2015-09-30 18:13 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov,
	Serge E. Hallyn, Daniel Borkmann, linux-kernel, netdev,
	linux-api, Tycho Andersen

In order to implement checkpoint of seccomp filters, we need to keep track
of the original filter as the user gave it to us. Since we're doing this,
we need to also use bpf_prog_destroy to free the struct bpf_brogs so we
don't leak this memory.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/filter.h |  2 ++
 kernel/seccomp.c       | 24 ++++++++++++++++--------
 net/core/filter.c      |  4 ++--
 3 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index fa2cab9..6c045ba 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -410,6 +410,8 @@ int bpf_prog_create(struct bpf_prog **pfp, struct sock_fprog_kern *fprog);
 int bpf_prog_create_from_user(struct bpf_prog **pfp, struct sock_fprog *fprog,
 			      bpf_aux_classic_check_t trans);
 void bpf_prog_destroy(struct bpf_prog *fp);
+int bpf_prog_store_orig_filter(struct bpf_prog *fp,
+			       const struct sock_fprog *fprog);
 
 int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk);
 int sk_attach_bpf(u32 ufd, struct sock *sk);
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 5bd4779..09f3769 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -337,6 +337,14 @@ static inline void seccomp_sync_threads(void)
 	}
 }
 
+static inline void seccomp_filter_free(struct seccomp_filter *filter)
+{
+	if (filter) {
+		bpf_prog_destroy(filter->prog);
+		kfree(filter);
+	}
+}
+
 /**
  * seccomp_prepare_filter: Prepares a seccomp filter for use.
  * @fprog: BPF program to install
@@ -376,6 +384,14 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 		return ERR_PTR(ret);
 	}
 
+	if (config_enabled(CONFIG_CHECKPOINT_RESTORE)) {
+		ret = bpf_prog_store_orig_filter(sfilter->prog, fprog);
+		if (ret < 0) {
+			seccomp_filter_free(sfilter);
+			return ERR_PTR(ret);
+		}
+	}
+
 	atomic_set(&sfilter->usage, 1);
 
 	return sfilter;
@@ -466,14 +482,6 @@ void get_seccomp_filter(struct task_struct *tsk)
 	atomic_inc(&orig->usage);
 }
 
-static inline void seccomp_filter_free(struct seccomp_filter *filter)
-{
-	if (filter) {
-		bpf_prog_free(filter->prog);
-		kfree(filter);
-	}
-}
-
 /* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
 void put_seccomp_filter(struct task_struct *tsk)
 {
diff --git a/net/core/filter.c b/net/core/filter.c
index 13079f0..70995dd 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -832,8 +832,8 @@ static int bpf_check_classic(const struct sock_filter *filter,
 	return -EINVAL;
 }
 
-static int bpf_prog_store_orig_filter(struct bpf_prog *fp,
-				      const struct sock_fprog *fprog)
+int bpf_prog_store_orig_filter(struct bpf_prog *fp,
+			       const struct sock_fprog *fprog)
 {
 	unsigned int fsize = bpf_classic_proglen(fprog);
 	struct sock_fprog_kern *fkprog;
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 2/5] seccomp: add the concept of a seccomp filter FD
  2015-09-30 18:13 checkpoint/restore of seccomp filters v3 Tycho Andersen
  2015-09-30 18:13 ` [PATCH v3 1/5] seccomp: save the original filter Tycho Andersen
@ 2015-09-30 18:13 ` Tycho Andersen
  2015-09-30 18:27   ` Andy Lutomirski
  2015-09-30 18:29   ` kbuild test robot
  2015-09-30 18:13 ` [PATCH v3 3/5] seccomp: add a ptrace command to get seccomp filter fds Tycho Andersen
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 19+ messages in thread
From: Tycho Andersen @ 2015-09-30 18:13 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov,
	Serge E. Hallyn, Daniel Borkmann, linux-kernel, netdev,
	linux-api, Tycho Andersen

This patch introduces the concept of a seccomp fd, with a similar interface
and usage to ebpf fds. Initially, one is allowed to create, install, and
dump these fds. Any manipulation of seccomp fds requires users to be root
in their own user namespace, matching the checks done for
SECCOMP_SET_MODE_FILTER.

Installing a filterfd has some gotchas, though. Andy mentioned previously
that we should restrict installation to filter fds whose parent is already
in the filter tree. This doesn't quite work in the case of created seccomp
fds, since once you install a filter fd, you can't install any other filter
fd since it has no parent and there is no way to "pre-chain" filters before
installing them. To work around this, we allow installing filters who have
no parent. If the filter has a parent, we require the current filter try to
be an ancestor of it.

I'm not quite sure that the ancestor restriction is correct, since it can
still allow for "re-parenting" of filters, potentially introducing new
filters to a task. However, since these operations are limited to root in
the user ns, perhaps it is ok. There is also some potentially racy
behavior where a task re-parents a filter that another task has installed.

One option to work around this is to keep a bit on struct seccomp_filter to
allow each filter to have its parent set exactly once. (This would still
allow you to install a filter multiple times, as long as the parent was the
same in each case.)

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/uapi/linux/seccomp.h |  24 ++++++
 kernel/seccomp.c             | 189 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 210 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 0f238a4..4ee8770 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -13,10 +13,16 @@
 /* Valid operations for seccomp syscall. */
 #define SECCOMP_SET_MODE_STRICT	0
 #define SECCOMP_SET_MODE_FILTER	1
+#define SECCOMP_FILTER_FD		2
 
 /* Valid flags for SECCOMP_SET_MODE_FILTER */
 #define SECCOMP_FILTER_FLAG_TSYNC	1
 
+/* Valid commands for SECCOMP_FILTER_FD */
+#define SECCOMP_FD_NEW		0
+#define SECCOMP_FD_INSTALL	1
+#define SECCOMP_FD_DUMP	2
+
 /*
  * All BPF programs must return a 32-bit value.
  * The bottom 16-bits are for optional return data.
@@ -51,4 +57,22 @@ struct seccomp_data {
 	__u64 args[6];
 };
 
+struct seccomp_fd {
+	__u32			size;
+
+	union {
+		/* SECCOMP_FD_NEW */
+		struct sock_fprog __user	*new_prog;
+
+		/* SECCOMP_FD_INSTALL */
+		int	install_fd;
+
+		/* SECCOMP_FD_DUMP */
+		struct {
+			int				dump_fd;
+			struct sock_filter __user	*insns;
+		};
+	};
+};
+
 #endif /* _UAPI_LINUX_SECCOMP_H */
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 09f3769..6f0465c 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -26,6 +26,8 @@
 #endif
 
 #ifdef CONFIG_SECCOMP_FILTER
+#include <linux/anon_inodes.h>
+#include <linux/file.h>
 #include <linux/filter.h>
 #include <linux/pid.h>
 #include <linux/ptrace.h>
@@ -58,6 +60,7 @@ struct seccomp_filter {
 	atomic_t usage;
 	struct seccomp_filter *prev;
 	struct bpf_prog *prog;
+	spinlock_t prev_lock;
 };
 
 /* Limit any path through the tree to 256KB worth of instructions. */
@@ -393,6 +396,7 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 	}
 
 	atomic_set(&sfilter->usage, 1);
+	sfilter->prev_lock = __SPIN_LOCK_UNLOCKED(&sfilter->prev_lock);
 
 	return sfilter;
 }
@@ -441,6 +445,7 @@ static long seccomp_attach_filter(unsigned int flags,
 	struct seccomp_filter *walker;
 
 	assert_spin_locked(&current->sighand->siglock);
+	assert_spin_locked(&filter->prev_lock);
 
 	/* Validate resulting filter length. */
 	total_insns = filter->prog->len;
@@ -482,10 +487,8 @@ void get_seccomp_filter(struct task_struct *tsk)
 	atomic_inc(&orig->usage);
 }
 
-/* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
-void put_seccomp_filter(struct task_struct *tsk)
+static void seccomp_filter_decref(struct seccomp_filter *orig)
 {
-	struct seccomp_filter *orig = tsk->seccomp.filter;
 	/* Clean up single-reference branches iteratively. */
 	while (orig && atomic_dec_and_test(&orig->usage)) {
 		struct seccomp_filter *freeme = orig;
@@ -494,6 +497,12 @@ void put_seccomp_filter(struct task_struct *tsk)
 	}
 }
 
+/* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
+void put_seccomp_filter(struct task_struct *tsk)
+{
+	seccomp_filter_decref(tsk->seccomp.filter);
+}
+
 /**
  * seccomp_send_sigsys - signals the task to allow in-process syscall emulation
  * @syscall: syscall number to send to userland
@@ -797,7 +806,9 @@ static long seccomp_set_mode_filter(unsigned int flags,
 	if (!seccomp_may_assign_mode(seccomp_mode))
 		goto out;
 
+	spin_lock_irq(&prepared->prev_lock);
 	ret = seccomp_attach_filter(flags, prepared);
+	spin_unlock_irq(&prepared->prev_lock);
 	if (ret)
 		goto out;
 	/* Do not free the successfully attached filter. */
@@ -812,12 +823,182 @@ out_free:
 	seccomp_filter_free(prepared);
 	return ret;
 }
+
+int seccomp_fd_release(struct inode *ino, struct file *f)
+{
+	seccomp_filter_decref(f->private_data);
+	return 0;
+}
+
+static const struct file_operations seccomp_fops = {
+	.release = seccomp_fd_release,
+};
+
+static long seccomp_fd_new(struct seccomp_fd *seccomp_fd)
+{
+	struct seccomp_filter *filter;
+	long fd;
+	char __user *prog = (char __user *) seccomp_fd->new_prog;
+
+	filter = seccomp_prepare_user_filter(prog);
+	if (IS_ERR(filter))
+		return PTR_ERR(filter);
+
+	fd = anon_inode_getfd("seccomp", &seccomp_fops, filter,
+			      O_RDONLY | O_CLOEXEC);
+	if (fd < 0)
+		seccomp_filter_decref(filter);
+
+	return fd;
+}
+
+static struct seccomp_filter *seccomp_filter_get(struct fd f)
+{
+	struct seccomp_filter *filter;
+
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+
+	if (f.file->f_op != &seccomp_fops) {
+		fdput(f);
+		return ERR_PTR(-EINVAL);
+	}
+
+	filter = f.file->private_data;
+
+	return filter;
+}
+
+static long seccomp_fd_install(struct seccomp_fd *seccomp_fd)
+{
+	struct fd f;
+	struct seccomp_filter *filter;
+	int ret = -EINVAL;
+
+	f = fdget(seccomp_fd->install_fd);
+	filter = seccomp_filter_get(f);
+	if (IS_ERR(filter))
+		return PTR_ERR(filter);
+	atomic_inc(&filter->usage);
+	fdput(f);
+
+	spin_lock_irq(&current->sighand->siglock);
+	if (!seccomp_may_assign_mode(SECCOMP_MODE_FILTER))
+		goto out_sigunlock;
+
+	spin_lock_irq(&filter->prev_lock);
+	if (filter->prev && !is_ancestor(current->seccomp.filter, filter))
+		goto out_prev_unlock;
+
+	ret = seccomp_attach_filter(0, filter);
+	/* This may be the first filter installed, so let's set mode */
+	if (ret >= 0)
+		seccomp_assign_mode(current, SECCOMP_MODE_FILTER);
+
+out_prev_unlock:
+	spin_unlock_irq(&filter->prev_lock);
+
+out_sigunlock:
+	spin_unlock_irq(&current->sighand->siglock);
+
+	/* If the filter failed to install, let's decref it */
+	if (ret < 0)
+		seccomp_filter_decref(filter);
+	return ret;
+}
+
+static long seccomp_fd_dump(struct seccomp_fd *seccomp_fd)
+{
+	struct fd f;
+	int len;
+	struct sock_fprog_kern *orig;
+	struct seccomp_filter *filter;
+
+	f = fdget(seccomp_fd->dump_fd);
+	filter = seccomp_filter_get(f);
+	if (IS_ERR(filter))
+		return PTR_ERR(filter);
+
+	orig = filter->prog->orig_prog;
+	len = bpf_classic_proglen(orig);
+
+	/* Allow asking how long the filter is by passing a null buffer. */
+	if (seccomp_fd->insns &&
+	    copy_to_user(seccomp_fd->insns, orig->filter, len))
+		len = -EFAULT;
+
+	fdput(f);
+	return len;
+}
+
+static long seccomp_filter_fd(unsigned int cmd,
+			      const char __user *ulayer)
+{
+	long ret;
+	u32 size;
+	struct seccomp_fd seccomp_fd;
+	struct seccomp_fd __user *useccomp_fd =
+		(struct seccomp_fd __user *) ulayer;
+
+	/* As above, we restrict access to seccomp fds to processes who are
+	 * root in their own user ns.
+	 */
+	if (!task_no_new_privs(current) &&
+	    security_capable_noaudit(current_cred(), current_user_ns(),
+				     CAP_SYS_ADMIN) != 0)
+		return -EACCES;
+
+	if (get_user(size, &useccomp_fd->size))
+		return -EFAULT;
+
+	if (size > sizeof(seccomp_fd)) {
+		unsigned char __user *addr;
+		unsigned char __user *end;
+		unsigned char val;
+
+		addr = (void __user *)useccomp_fd + sizeof(seccomp_fd);
+		end  = (void __user *)useccomp_fd + size;
+
+		for (; addr < end; addr++) {
+			if (get_user(val, addr))
+				return -EFAULT;
+			if (val)
+				return -E2BIG;
+		}
+		size = sizeof(seccomp_fd);
+	}
+
+	if (copy_from_user(&seccomp_fd, useccomp_fd, size))
+		return -EFAULT;
+
+	switch (cmd) {
+	case SECCOMP_FD_NEW:
+		ret = seccomp_fd_new(&seccomp_fd);
+		break;
+	case SECCOMP_FD_INSTALL:
+		ret = seccomp_fd_install(&seccomp_fd);
+		break;
+	case SECCOMP_FD_DUMP:
+		ret = seccomp_fd_dump(&seccomp_fd);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
 #else
 static inline long seccomp_set_mode_filter(unsigned int flags,
 					   const char __user *filter)
 {
 	return -EINVAL;
 }
+
+static inline long seccomp_filter_fd(unsigned int flags
+				     const char __user *filter)
+{
+	return -EINVAL;
+}
 #endif
 
 /* Common entry point for both prctl and syscall. */
@@ -831,6 +1012,8 @@ static long do_seccomp(unsigned int op, unsigned int flags,
 		return seccomp_set_mode_strict();
 	case SECCOMP_SET_MODE_FILTER:
 		return seccomp_set_mode_filter(flags, uargs);
+	case SECCOMP_FILTER_FD:
+		return seccomp_filter_fd(flags, uargs);
 	default:
 		return -EINVAL;
 	}
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 3/5] seccomp: add a ptrace command to get seccomp filter fds
  2015-09-30 18:13 checkpoint/restore of seccomp filters v3 Tycho Andersen
  2015-09-30 18:13 ` [PATCH v3 1/5] seccomp: save the original filter Tycho Andersen
  2015-09-30 18:13 ` [PATCH v3 2/5] seccomp: add the concept of a seccomp filter FD Tycho Andersen
@ 2015-09-30 18:13 ` Tycho Andersen
  2015-09-30 18:13 ` [PATCH v3 4/5] kcmp: add KCMP_FILE_PRIVATE_DATA Tycho Andersen
  2015-09-30 18:13 ` [PATCH v3 5/5] bpf: save the program the user actually supplied Tycho Andersen
  4 siblings, 0 replies; 19+ messages in thread
From: Tycho Andersen @ 2015-09-30 18:13 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov,
	Serge E. Hallyn, Daniel Borkmann, linux-kernel, netdev,
	linux-api, Tycho Andersen

I just picked 40 for the constant out of thin air, but there may be a more
appropriate value for this. Also, we return EINVAL when there is no filter
for the index the user requested, but ptrace also returns EINVAL for
invalid commands, making it slightly awkward to test whether or not the
kernel supports this feature. It can still be done via,

if (is_in_mode_filter(pid)) {
	int fd;

	fd = ptrace(PTRACE_SECCOMP_GET_FILTER_FD, pid, NULL, 0);
	if (fd < 0 && errno == -EINVAL)
		/* not supported */

	...
}

since being in SECCOMP_MODE_FILTER implies that there is at least one
filter. If there is a more appropriate errno (ESRCH collides as well with
ptrace) to give here that may be better.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/seccomp.h     |  9 +++++++++
 include/uapi/linux/ptrace.h |  2 ++
 kernel/ptrace.c             |  4 ++++
 kernel/seccomp.c            | 28 ++++++++++++++++++++++++++++
 4 files changed, 43 insertions(+)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index f426503..637d91f 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -95,4 +95,13 @@ static inline void get_seccomp_filter(struct task_struct *tsk)
 	return;
 }
 #endif /* CONFIG_SECCOMP_FILTER */
+
+#if defined(CONFIG_CHECKPOINT_RESTORE) && defined(CONFIG_SECCOMP_FILTER)
+extern long seccomp_get_filter_fd(struct task_struct *task, long data);
+#else
+static inline long seccomp_get_filter_fd(struct task_struct *task, long data)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_CHECKPOINT_RESTORE && CONFIG_SECCOMP_FILTER */
 #endif /* _LINUX_SECCOMP_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index a7a6979..3271f5a 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -23,6 +23,8 @@
 
 #define PTRACE_SYSCALL		  24
 
+#define PTRACE_SECCOMP_GET_FILTER_FD 40
+
 /* 0x4200-0x4300 are reserved for architecture-independent additions.  */
 #define PTRACE_SETOPTIONS	0x4200
 #define PTRACE_GETEVENTMSG	0x4201
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 787320d..aede440 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1016,6 +1016,10 @@ int ptrace_request(struct task_struct *child, long request,
 		break;
 	}
 #endif
+
+	case PTRACE_SECCOMP_GET_FILTER_FD:
+		return seccomp_get_filter_fd(child, data);
+
 	default:
 		break;
 	}
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 6f0465c..7275ce0 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1058,3 +1058,31 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 	/* prctl interface doesn't have flags, so they are always zero. */
 	return do_seccomp(op, 0, uargs);
 }
+
+#if defined(CONFIG_CHECKPOINT_RESTORE) && defined(CONFIG_SECCOMP_FILTER)
+long seccomp_get_filter_fd(struct task_struct *task, long n)
+{
+	struct seccomp_filter *filter;
+	long fd;
+
+	if (task->seccomp.mode != SECCOMP_MODE_FILTER)
+		return -EINVAL;
+
+	filter = task->seccomp.filter;
+	while (n > 0 && filter) {
+		filter = filter->prev;
+		n--;
+	}
+
+	if (!filter)
+		return -EINVAL;
+
+	atomic_inc(&filter->usage);
+	fd = anon_inode_getfd("seccomp", &seccomp_fops, filter,
+			      O_RDONLY | O_CLOEXEC);
+	if (fd < 0)
+		seccomp_filter_decref(filter);
+
+	return fd;
+}
+#endif
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 4/5] kcmp: add KCMP_FILE_PRIVATE_DATA
  2015-09-30 18:13 checkpoint/restore of seccomp filters v3 Tycho Andersen
                   ` (2 preceding siblings ...)
  2015-09-30 18:13 ` [PATCH v3 3/5] seccomp: add a ptrace command to get seccomp filter fds Tycho Andersen
@ 2015-09-30 18:13 ` Tycho Andersen
  2015-09-30 18:25   ` Andy Lutomirski
  2015-09-30 18:13 ` [PATCH v3 5/5] bpf: save the program the user actually supplied Tycho Andersen
  4 siblings, 1 reply; 19+ messages in thread
From: Tycho Andersen @ 2015-09-30 18:13 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov,
	Serge E. Hallyn, Daniel Borkmann, linux-kernel, netdev,
	linux-api, Tycho Andersen

This command allows comparing the underling private data of two fds. This
is useful e.g. to find out if a seccomp filter is inherited, since struct
seccomp_filter are unique across tasks and are the private_data seccomp
fds.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/uapi/linux/kcmp.h |  1 +
 kernel/kcmp.c             | 14 ++++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/include/uapi/linux/kcmp.h b/include/uapi/linux/kcmp.h
index 84df14b..ed389d2 100644
--- a/include/uapi/linux/kcmp.h
+++ b/include/uapi/linux/kcmp.h
@@ -10,6 +10,7 @@ enum kcmp_type {
 	KCMP_SIGHAND,
 	KCMP_IO,
 	KCMP_SYSVSEM,
+	KCMP_FILE_PRIVATE_DATA,
 
 	KCMP_TYPES,
 };
diff --git a/kernel/kcmp.c b/kernel/kcmp.c
index 0aa69ea..9ae673b 100644
--- a/kernel/kcmp.c
+++ b/kernel/kcmp.c
@@ -165,6 +165,20 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
 		ret = -EOPNOTSUPP;
 #endif
 		break;
+	case KCMP_FILE_PRIVATE_DATA: {
+		struct file *filp1, *filp2;
+
+		filp1 = get_file_raw_ptr(task1, idx1);
+		filp2 = get_file_raw_ptr(task2, idx2);
+
+		if (filp1 && filp2)
+			ret = kcmp_ptr(filp1->private_data,
+				       filp2->private_data,
+				       KCMP_FILE_PRIVATE_DATA);
+		else
+			ret = -EBADF;
+		break;
+	}
 	default:
 		ret = -EINVAL;
 		break;
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 5/5] bpf: save the program the user actually supplied
  2015-09-30 18:13 checkpoint/restore of seccomp filters v3 Tycho Andersen
                   ` (3 preceding siblings ...)
  2015-09-30 18:13 ` [PATCH v3 4/5] kcmp: add KCMP_FILE_PRIVATE_DATA Tycho Andersen
@ 2015-09-30 18:13 ` Tycho Andersen
  4 siblings, 0 replies; 19+ messages in thread
From: Tycho Andersen @ 2015-09-30 18:13 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov,
	Serge E. Hallyn, Daniel Borkmann, linux-kernel, netdev,
	linux-api, Tycho Andersen

In some cases (e.g. seccomp) the program result might be translated from
the original program the user supplied. If we're saving the result for
checkpoint/restore, we should save exactly the program the user initially
supplied.

This causes problems when the translations seccomp makes are not allowed by
bpf_check_classic.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 net/core/filter.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 70995dd..5a4596b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -845,8 +845,7 @@ int bpf_prog_store_orig_filter(struct bpf_prog *fp,
 	fkprog = fp->orig_prog;
 	fkprog->len = fprog->len;
 
-	fkprog->filter = kmemdup(fp->insns, fsize,
-				 GFP_KERNEL | __GFP_NOWARN);
+	fkprog->filter = memdup_user(fprog->filter, fsize);
 	if (!fkprog->filter) {
 		kfree(fp->orig_prog);
 		return -ENOMEM;
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/5] kcmp: add KCMP_FILE_PRIVATE_DATA
  2015-09-30 18:13 ` [PATCH v3 4/5] kcmp: add KCMP_FILE_PRIVATE_DATA Tycho Andersen
@ 2015-09-30 18:25   ` Andy Lutomirski
  2015-09-30 18:41     ` Tycho Andersen
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2015-09-30 18:25 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	Network Development, Linux API

On Wed, Sep 30, 2015 at 11:13 AM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> This command allows comparing the underling private data of two fds. This
> is useful e.g. to find out if a seccomp filter is inherited, since struct
> seccomp_filter are unique across tasks and are the private_data seccomp
> fds.

This is very implementation-specific and may have nasty ABI
consequences far outside seccomp.  Let's do something specific to
seccomp and/or eBPF.

--Andy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/5] seccomp: add the concept of a seccomp filter FD
  2015-09-30 18:13 ` [PATCH v3 2/5] seccomp: add the concept of a seccomp filter FD Tycho Andersen
@ 2015-09-30 18:27   ` Andy Lutomirski
  2015-09-30 18:36     ` Tycho Andersen
  2015-09-30 18:29   ` kbuild test robot
  1 sibling, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2015-09-30 18:27 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	Network Development, Linux API

On Wed, Sep 30, 2015 at 11:13 AM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> This patch introduces the concept of a seccomp fd, with a similar interface
> and usage to ebpf fds. Initially, one is allowed to create, install, and
> dump these fds. Any manipulation of seccomp fds requires users to be root
> in their own user namespace, matching the checks done for
> SECCOMP_SET_MODE_FILTER.
>
> Installing a filterfd has some gotchas, though. Andy mentioned previously
> that we should restrict installation to filter fds whose parent is already
> in the filter tree. This doesn't quite work in the case of created seccomp
> fds, since once you install a filter fd, you can't install any other filter
> fd since it has no parent and there is no way to "pre-chain" filters before
> installing them.

ISTM, if we like the seccomp fd approach, we should have them be
created with a parent already set.  IOW the default should be that
their parent is the creator's seccomp fd and, if needed, creators
could specify a different parent.

--Andy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/5] seccomp: add the concept of a seccomp filter FD
  2015-09-30 18:13 ` [PATCH v3 2/5] seccomp: add the concept of a seccomp filter FD Tycho Andersen
  2015-09-30 18:27   ` Andy Lutomirski
@ 2015-09-30 18:29   ` kbuild test robot
  1 sibling, 0 replies; 19+ messages in thread
From: kbuild test robot @ 2015-09-30 18:29 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: kbuild-all, Kees Cook, Alexei Starovoitov, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, linux-kernel, netdev, linux-api, Tycho Andersen

[-- Attachment #1: Type: text/plain, Size: 1765 bytes --]

Hi Tycho,

[auto build test results on v4.3-rc3 -- if it's inappropriate base, please ignore]

config: i386-alldefconfig (attached as .config)
reproduce:
  git checkout 9613ae6bf5f111701614acb3eda3123d21a59239
  # save the attached .config to linux build tree
  make ARCH=i386 

All error/warnings (new ones prefixed by >>):

>> kernel/seccomp.c:998:10: error: expected ';', ',' or ')' before 'const'
             const char __user *filter)
             ^
   kernel/seccomp.c: In function 'do_seccomp':
>> kernel/seccomp.c:1016:10: error: implicit declaration of function 'seccomp_filter_fd' [-Werror=implicit-function-declaration]
      return seccomp_filter_fd(flags, uargs);
             ^
   cc1: some warnings being treated as errors

vim +998 kernel/seccomp.c

   992						   const char __user *filter)
   993	{
   994		return -EINVAL;
   995	}
   996	
   997	static inline long seccomp_filter_fd(unsigned int flags
 > 998					     const char __user *filter)
   999	{
  1000		return -EINVAL;
  1001	}
  1002	#endif
  1003	
  1004	/* Common entry point for both prctl and syscall. */
  1005	static long do_seccomp(unsigned int op, unsigned int flags,
  1006			       const char __user *uargs)
  1007	{
  1008		switch (op) {
  1009		case SECCOMP_SET_MODE_STRICT:
  1010			if (flags != 0 || uargs != NULL)
  1011				return -EINVAL;
  1012			return seccomp_set_mode_strict();
  1013		case SECCOMP_SET_MODE_FILTER:
  1014			return seccomp_set_mode_filter(flags, uargs);
  1015		case SECCOMP_FILTER_FD:
> 1016			return seccomp_filter_fd(flags, uargs);
  1017		default:
  1018			return -EINVAL;
  1019		}

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 9474 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/5] seccomp: add the concept of a seccomp filter FD
  2015-09-30 18:27   ` Andy Lutomirski
@ 2015-09-30 18:36     ` Tycho Andersen
  2015-09-30 18:47       ` Andy Lutomirski
  0 siblings, 1 reply; 19+ messages in thread
From: Tycho Andersen @ 2015-09-30 18:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	Network Development, Linux API

On Wed, Sep 30, 2015 at 11:27:34AM -0700, Andy Lutomirski wrote:
> On Wed, Sep 30, 2015 at 11:13 AM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > This patch introduces the concept of a seccomp fd, with a similar interface
> > and usage to ebpf fds. Initially, one is allowed to create, install, and
> > dump these fds. Any manipulation of seccomp fds requires users to be root
> > in their own user namespace, matching the checks done for
> > SECCOMP_SET_MODE_FILTER.
> >
> > Installing a filterfd has some gotchas, though. Andy mentioned previously
> > that we should restrict installation to filter fds whose parent is already
> > in the filter tree. This doesn't quite work in the case of created seccomp
> > fds, since once you install a filter fd, you can't install any other filter
> > fd since it has no parent and there is no way to "pre-chain" filters before
> > installing them.
> 
> ISTM, if we like the seccomp fd approach, we should have them be
> created with a parent already set.  IOW the default should be that
> their parent is the creator's seccomp fd and, if needed, creators
> could specify a different parent.

Allowing people doing SECCOMP_FD_NEW to specify a parent fd would
work. Then we can disallow installing a seccomp fd if its parent is
not the current filter, and get rid of the whole mess with prev
locking and all that.

Tycho

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/5] kcmp: add KCMP_FILE_PRIVATE_DATA
  2015-09-30 18:25   ` Andy Lutomirski
@ 2015-09-30 18:41     ` Tycho Andersen
  2015-09-30 18:47       ` Andy Lutomirski
  0 siblings, 1 reply; 19+ messages in thread
From: Tycho Andersen @ 2015-09-30 18:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	Network Development, Linux API

On Wed, Sep 30, 2015 at 11:25:41AM -0700, Andy Lutomirski wrote:
> On Wed, Sep 30, 2015 at 11:13 AM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > This command allows comparing the underling private data of two fds. This
> > is useful e.g. to find out if a seccomp filter is inherited, since struct
> > seccomp_filter are unique across tasks and are the private_data seccomp
> > fds.
> 
> This is very implementation-specific and may have nasty ABI
> consequences far outside seccomp.  Let's do something specific to
> seccomp and/or eBPF.

We could change the name to a less generic KCMP_SECCOMP_FD or
something, but without some sort of GUID on each struct
seccomp_filter, the implementation would be effectively the same as it
is today. Is that enough, or do we need a GUID?

Tycho

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/5] kcmp: add KCMP_FILE_PRIVATE_DATA
  2015-09-30 18:41     ` Tycho Andersen
@ 2015-09-30 18:47       ` Andy Lutomirski
  2015-09-30 18:55         ` Tycho Andersen
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2015-09-30 18:47 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	Network Development, Linux API

On Wed, Sep 30, 2015 at 11:41 AM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> On Wed, Sep 30, 2015 at 11:25:41AM -0700, Andy Lutomirski wrote:
>> On Wed, Sep 30, 2015 at 11:13 AM, Tycho Andersen
>> <tycho.andersen@canonical.com> wrote:
>> > This command allows comparing the underling private data of two fds. This
>> > is useful e.g. to find out if a seccomp filter is inherited, since struct
>> > seccomp_filter are unique across tasks and are the private_data seccomp
>> > fds.
>>
>> This is very implementation-specific and may have nasty ABI
>> consequences far outside seccomp.  Let's do something specific to
>> seccomp and/or eBPF.
>
> We could change the name to a less generic KCMP_SECCOMP_FD or
> something, but without some sort of GUID on each struct
> seccomp_filter, the implementation would be effectively the same as it
> is today. Is that enough, or do we need a GUID?
>

I don't care about the GUID.  I think we should name it
KCMP_SECCOMP_FD and make it only work on seccomp fds.

Alternatively, we could figure out why KCMP_FILE doesn't do the trick
and consider fixing it.  IMO it's really too bad that struct file is
so heavyweight that we can't really just embed one in all kinds of
structures.


--Andy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/5] seccomp: add the concept of a seccomp filter FD
  2015-09-30 18:36     ` Tycho Andersen
@ 2015-09-30 18:47       ` Andy Lutomirski
  0 siblings, 0 replies; 19+ messages in thread
From: Andy Lutomirski @ 2015-09-30 18:47 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	Network Development, Linux API

On Wed, Sep 30, 2015 at 11:36 AM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> On Wed, Sep 30, 2015 at 11:27:34AM -0700, Andy Lutomirski wrote:
>> On Wed, Sep 30, 2015 at 11:13 AM, Tycho Andersen
>> <tycho.andersen@canonical.com> wrote:
>> > This patch introduces the concept of a seccomp fd, with a similar interface
>> > and usage to ebpf fds. Initially, one is allowed to create, install, and
>> > dump these fds. Any manipulation of seccomp fds requires users to be root
>> > in their own user namespace, matching the checks done for
>> > SECCOMP_SET_MODE_FILTER.
>> >
>> > Installing a filterfd has some gotchas, though. Andy mentioned previously
>> > that we should restrict installation to filter fds whose parent is already
>> > in the filter tree. This doesn't quite work in the case of created seccomp
>> > fds, since once you install a filter fd, you can't install any other filter
>> > fd since it has no parent and there is no way to "pre-chain" filters before
>> > installing them.
>>
>> ISTM, if we like the seccomp fd approach, we should have them be
>> created with a parent already set.  IOW the default should be that
>> their parent is the creator's seccomp fd and, if needed, creators
>> could specify a different parent.
>
> Allowing people doing SECCOMP_FD_NEW to specify a parent fd would
> work. Then we can disallow installing a seccomp fd if its parent is
> not the current filter, and get rid of the whole mess with prev
> locking and all that.
>

Yes, please.

--Andy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/5] kcmp: add KCMP_FILE_PRIVATE_DATA
  2015-09-30 18:47       ` Andy Lutomirski
@ 2015-09-30 18:55         ` Tycho Andersen
  2015-09-30 18:56           ` Andy Lutomirski
  0 siblings, 1 reply; 19+ messages in thread
From: Tycho Andersen @ 2015-09-30 18:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	Network Development, Linux API

On Wed, Sep 30, 2015 at 11:47:05AM -0700, Andy Lutomirski wrote:
> On Wed, Sep 30, 2015 at 11:41 AM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > On Wed, Sep 30, 2015 at 11:25:41AM -0700, Andy Lutomirski wrote:
> >> On Wed, Sep 30, 2015 at 11:13 AM, Tycho Andersen
> >> <tycho.andersen@canonical.com> wrote:
> >> > This command allows comparing the underling private data of two fds. This
> >> > is useful e.g. to find out if a seccomp filter is inherited, since struct
> >> > seccomp_filter are unique across tasks and are the private_data seccomp
> >> > fds.
> >>
> >> This is very implementation-specific and may have nasty ABI
> >> consequences far outside seccomp.  Let's do something specific to
> >> seccomp and/or eBPF.
> >
> > We could change the name to a less generic KCMP_SECCOMP_FD or
> > something, but without some sort of GUID on each struct
> > seccomp_filter, the implementation would be effectively the same as it
> > is today. Is that enough, or do we need a GUID?
> >
> 
> I don't care about the GUID.  I think we should name it
> KCMP_SECCOMP_FD and make it only work on seccomp fds.

Ok, I can do that.

> Alternatively, we could figure out why KCMP_FILE doesn't do the trick
> and consider fixing it.  IMO it's really too bad that struct file is
> so heavyweight that we can't really just embed one in all kinds of
> structures.

The problem is that KCMP_FILE compares the file objects themselves,
instead of the underlying data. If I ask for a seccomp fd for filter 0
twice, I'll have two different file objects and they won't be equal. I
suppose we could add some special logic inside KCMP_FILE to compare
the underlying data in special cases (seccomp, ebpf, others?), but it
seems cleaner to have a separate command as you described above.

Tycho

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/5] kcmp: add KCMP_FILE_PRIVATE_DATA
  2015-09-30 18:55         ` Tycho Andersen
@ 2015-09-30 18:56           ` Andy Lutomirski
  2015-09-30 21:39             ` Tycho Andersen
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2015-09-30 18:56 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	Network Development, Linux API

On Wed, Sep 30, 2015 at 11:55 AM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> On Wed, Sep 30, 2015 at 11:47:05AM -0700, Andy Lutomirski wrote:
>> On Wed, Sep 30, 2015 at 11:41 AM, Tycho Andersen
>> <tycho.andersen@canonical.com> wrote:
>> > On Wed, Sep 30, 2015 at 11:25:41AM -0700, Andy Lutomirski wrote:
>> >> On Wed, Sep 30, 2015 at 11:13 AM, Tycho Andersen
>> >> <tycho.andersen@canonical.com> wrote:
>> >> > This command allows comparing the underling private data of two fds. This
>> >> > is useful e.g. to find out if a seccomp filter is inherited, since struct
>> >> > seccomp_filter are unique across tasks and are the private_data seccomp
>> >> > fds.
>> >>
>> >> This is very implementation-specific and may have nasty ABI
>> >> consequences far outside seccomp.  Let's do something specific to
>> >> seccomp and/or eBPF.
>> >
>> > We could change the name to a less generic KCMP_SECCOMP_FD or
>> > something, but without some sort of GUID on each struct
>> > seccomp_filter, the implementation would be effectively the same as it
>> > is today. Is that enough, or do we need a GUID?
>> >
>>
>> I don't care about the GUID.  I think we should name it
>> KCMP_SECCOMP_FD and make it only work on seccomp fds.
>
> Ok, I can do that.
>
>> Alternatively, we could figure out why KCMP_FILE doesn't do the trick
>> and consider fixing it.  IMO it's really too bad that struct file is
>> so heavyweight that we can't really just embed one in all kinds of
>> structures.
>
> The problem is that KCMP_FILE compares the file objects themselves,
> instead of the underlying data. If I ask for a seccomp fd for filter 0
> twice, I'll have two different file objects and they won't be equal. I
> suppose we could add some special logic inside KCMP_FILE to compare
> the underlying data in special cases (seccomp, ebpf, others?), but it
> seems cleaner to have a separate command as you described above.
>

What I meant was that maybe we could get the two requests to actually
produce the same struct file.  But that could get very messy
memory-wise.

--Andy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/5] kcmp: add KCMP_FILE_PRIVATE_DATA
  2015-09-30 18:56           ` Andy Lutomirski
@ 2015-09-30 21:39             ` Tycho Andersen
  2015-09-30 21:48               ` Andy Lutomirski
  0 siblings, 1 reply; 19+ messages in thread
From: Tycho Andersen @ 2015-09-30 21:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	Network Development, Linux API

[-- Attachment #1: Type: text/plain, Size: 2362 bytes --]

On Wed, Sep 30, 2015 at 11:56:25AM -0700, Andy Lutomirski wrote:
> On Wed, Sep 30, 2015 at 11:55 AM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > On Wed, Sep 30, 2015 at 11:47:05AM -0700, Andy Lutomirski wrote:
> >> On Wed, Sep 30, 2015 at 11:41 AM, Tycho Andersen
> >> <tycho.andersen@canonical.com> wrote:
> >> > On Wed, Sep 30, 2015 at 11:25:41AM -0700, Andy Lutomirski wrote:
> >> >> On Wed, Sep 30, 2015 at 11:13 AM, Tycho Andersen
> >> >> <tycho.andersen@canonical.com> wrote:
> >> >> > This command allows comparing the underling private data of two fds. This
> >> >> > is useful e.g. to find out if a seccomp filter is inherited, since struct
> >> >> > seccomp_filter are unique across tasks and are the private_data seccomp
> >> >> > fds.
> >> >>
> >> >> This is very implementation-specific and may have nasty ABI
> >> >> consequences far outside seccomp.  Let's do something specific to
> >> >> seccomp and/or eBPF.
> >> >
> >> > We could change the name to a less generic KCMP_SECCOMP_FD or
> >> > something, but without some sort of GUID on each struct
> >> > seccomp_filter, the implementation would be effectively the same as it
> >> > is today. Is that enough, or do we need a GUID?
> >> >
> >>
> >> I don't care about the GUID.  I think we should name it
> >> KCMP_SECCOMP_FD and make it only work on seccomp fds.
> >
> > Ok, I can do that.
> >
> >> Alternatively, we could figure out why KCMP_FILE doesn't do the trick
> >> and consider fixing it.  IMO it's really too bad that struct file is
> >> so heavyweight that we can't really just embed one in all kinds of
> >> structures.
> >
> > The problem is that KCMP_FILE compares the file objects themselves,
> > instead of the underlying data. If I ask for a seccomp fd for filter 0
> > twice, I'll have two different file objects and they won't be equal. I
> > suppose we could add some special logic inside KCMP_FILE to compare
> > the underlying data in special cases (seccomp, ebpf, others?), but it
> > seems cleaner to have a separate command as you described above.
> >
> 
> What I meant was that maybe we could get the two requests to actually
> produce the same struct file.  But that could get very messy
> memory-wise.

I see. The attached patch seems to work with KCMP_FILE and doesn't
look too bad if you don't mind the circular references. What do you
think?

Tycho

[-- Attachment #2: 0001-use-exactly-one-seccomp-file-per-filter-object.patch --]
[-- Type: text/x-diff, Size: 2488 bytes --]

>From 5c410df2df219dc9a68074afe5458b5563b89940 Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tycho.andersen@canonical.com>
Date: Wed, 30 Sep 2015 14:53:48 -0600
Subject: [PATCH] use exactly one seccomp file per filter object

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
---
 kernel/seccomp.c | 37 ++++++++++++++++++++++++++++++++-----
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index af58c49..ff3b1bd 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -60,6 +60,13 @@ struct seccomp_filter {
 	atomic_t usage;
 	struct seccomp_filter *prev;
 	struct bpf_prog *prog;
+
+	/* The file representing this seccomp_filter, if there is one. A 1:1
+	 * file:seccomp_filter mapping allows us to compare seccomp_filters via
+	 * kcmp(KCMP_FILE, ...).
+	 */
+	struct file *seccomp_file;
+	struct mutex file_lock;
 };
 
 /* Limit any path through the tree to 256KB worth of instructions. */
@@ -395,6 +402,7 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 	}
 
 	atomic_set(&sfilter->usage, 1);
+	mutex_init(&sfilter->file_lock);
 
 	return sfilter;
 }
@@ -821,7 +829,14 @@ out_free:
 
 int seccomp_fd_release(struct inode *ino, struct file *f)
 {
-	seccomp_filter_decref(f->private_data);
+	struct seccomp_filter *filter = f->private_data;
+
+	mutex_lock(&filter->file_lock);
+	filter->seccomp_file = NULL;
+	mutex_unlock(&filter->file_lock);
+
+	seccomp_filter_decref(filter);
+
 	return 0;
 }
 
@@ -1073,7 +1088,9 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 long seccomp_get_filter_fd(struct task_struct *task, long n)
 {
 	struct seccomp_filter *filter;
+	struct file *file;
 	long fd;
+	int flags = O_RDONLY | O_CLOEXEC;
 
 	if (task->seccomp.mode != SECCOMP_MODE_FILTER)
 		return -EINVAL;
@@ -1087,11 +1104,21 @@ long seccomp_get_filter_fd(struct task_struct *task, long n)
 	if (!filter)
 		return -EINVAL;
 
-	atomic_inc(&filter->usage);
-	fd = anon_inode_getfd("seccomp", &seccomp_fops, filter,
-			      O_RDONLY | O_CLOEXEC);
+	fd = get_unused_fd_flags(flags);
 	if (fd < 0)
-		seccomp_filter_decref(filter);
+		return fd;
+
+	mutex_lock(&filter->file_lock);
+	file = filter->seccomp_file;
+	if (!file) {
+		atomic_inc(&filter->usage);
+		file = anon_inode_getfile("seccomp", &seccomp_fops, filter,
+					  flags);
+		filter->seccomp_file = file;
+	}
+	mutex_unlock(&filter->file_lock);
+
+	fd_install(fd, file);
 
 	return fd;
 }
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/5] kcmp: add KCMP_FILE_PRIVATE_DATA
  2015-09-30 21:39             ` Tycho Andersen
@ 2015-09-30 21:48               ` Andy Lutomirski
  2015-09-30 22:10                 ` Tycho Andersen
  2015-10-01 16:45                 ` Tycho Andersen
  0 siblings, 2 replies; 19+ messages in thread
From: Andy Lutomirski @ 2015-09-30 21:48 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	Network Development, Linux API

On Wed, Sep 30, 2015 at 2:39 PM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> On Wed, Sep 30, 2015 at 11:56:25AM -0700, Andy Lutomirski wrote:
>> On Wed, Sep 30, 2015 at 11:55 AM, Tycho Andersen
>> <tycho.andersen@canonical.com> wrote:
>> > On Wed, Sep 30, 2015 at 11:47:05AM -0700, Andy Lutomirski wrote:
>> >> On Wed, Sep 30, 2015 at 11:41 AM, Tycho Andersen
>> >> <tycho.andersen@canonical.com> wrote:
>> >> > On Wed, Sep 30, 2015 at 11:25:41AM -0700, Andy Lutomirski wrote:
>> >> >> On Wed, Sep 30, 2015 at 11:13 AM, Tycho Andersen
>> >> >> <tycho.andersen@canonical.com> wrote:
>> >> >> > This command allows comparing the underling private data of two fds. This
>> >> >> > is useful e.g. to find out if a seccomp filter is inherited, since struct
>> >> >> > seccomp_filter are unique across tasks and are the private_data seccomp
>> >> >> > fds.
>> >> >>
>> >> >> This is very implementation-specific and may have nasty ABI
>> >> >> consequences far outside seccomp.  Let's do something specific to
>> >> >> seccomp and/or eBPF.
>> >> >
>> >> > We could change the name to a less generic KCMP_SECCOMP_FD or
>> >> > something, but without some sort of GUID on each struct
>> >> > seccomp_filter, the implementation would be effectively the same as it
>> >> > is today. Is that enough, or do we need a GUID?
>> >> >
>> >>
>> >> I don't care about the GUID.  I think we should name it
>> >> KCMP_SECCOMP_FD and make it only work on seccomp fds.
>> >
>> > Ok, I can do that.
>> >
>> >> Alternatively, we could figure out why KCMP_FILE doesn't do the trick
>> >> and consider fixing it.  IMO it's really too bad that struct file is
>> >> so heavyweight that we can't really just embed one in all kinds of
>> >> structures.
>> >
>> > The problem is that KCMP_FILE compares the file objects themselves,
>> > instead of the underlying data. If I ask for a seccomp fd for filter 0
>> > twice, I'll have two different file objects and they won't be equal. I
>> > suppose we could add some special logic inside KCMP_FILE to compare
>> > the underlying data in special cases (seccomp, ebpf, others?), but it
>> > seems cleaner to have a separate command as you described above.
>> >
>>
>> What I meant was that maybe we could get the two requests to actually
>> produce the same struct file.  But that could get very messy
>> memory-wise.
>
> I see. The attached patch seems to work with KCMP_FILE and doesn't
> look too bad if you don't mind the circular references. What do you
> think?
>

Could be reasonable.  I'm not that well versed on what fd_release
does.  Are we guaranteed that it's called when the last fd goes away
even if the struct file is pinned by the struct seccomp filter but is
otherwise unreferenced?

Kees?

How many of you will be at KS?

--Andy

> Tycho



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/5] kcmp: add KCMP_FILE_PRIVATE_DATA
  2015-09-30 21:48               ` Andy Lutomirski
@ 2015-09-30 22:10                 ` Tycho Andersen
  2015-10-01 16:45                 ` Tycho Andersen
  1 sibling, 0 replies; 19+ messages in thread
From: Tycho Andersen @ 2015-09-30 22:10 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	Network Development, Linux API

On Wed, Sep 30, 2015 at 02:48:47PM -0700, Andy Lutomirski wrote:
> On Wed, Sep 30, 2015 at 2:39 PM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > On Wed, Sep 30, 2015 at 11:56:25AM -0700, Andy Lutomirski wrote:
> >> On Wed, Sep 30, 2015 at 11:55 AM, Tycho Andersen
> >> <tycho.andersen@canonical.com> wrote:
> >> > On Wed, Sep 30, 2015 at 11:47:05AM -0700, Andy Lutomirski wrote:
> >> >> On Wed, Sep 30, 2015 at 11:41 AM, Tycho Andersen
> >> >> <tycho.andersen@canonical.com> wrote:
> >> >> > On Wed, Sep 30, 2015 at 11:25:41AM -0700, Andy Lutomirski wrote:
> >> >> >> On Wed, Sep 30, 2015 at 11:13 AM, Tycho Andersen
> >> >> >> <tycho.andersen@canonical.com> wrote:
> >> >> >> > This command allows comparing the underling private data of two fds. This
> >> >> >> > is useful e.g. to find out if a seccomp filter is inherited, since struct
> >> >> >> > seccomp_filter are unique across tasks and are the private_data seccomp
> >> >> >> > fds.
> >> >> >>
> >> >> >> This is very implementation-specific and may have nasty ABI
> >> >> >> consequences far outside seccomp.  Let's do something specific to
> >> >> >> seccomp and/or eBPF.
> >> >> >
> >> >> > We could change the name to a less generic KCMP_SECCOMP_FD or
> >> >> > something, but without some sort of GUID on each struct
> >> >> > seccomp_filter, the implementation would be effectively the same as it
> >> >> > is today. Is that enough, or do we need a GUID?
> >> >> >
> >> >>
> >> >> I don't care about the GUID.  I think we should name it
> >> >> KCMP_SECCOMP_FD and make it only work on seccomp fds.
> >> >
> >> > Ok, I can do that.
> >> >
> >> >> Alternatively, we could figure out why KCMP_FILE doesn't do the trick
> >> >> and consider fixing it.  IMO it's really too bad that struct file is
> >> >> so heavyweight that we can't really just embed one in all kinds of
> >> >> structures.
> >> >
> >> > The problem is that KCMP_FILE compares the file objects themselves,
> >> > instead of the underlying data. If I ask for a seccomp fd for filter 0
> >> > twice, I'll have two different file objects and they won't be equal. I
> >> > suppose we could add some special logic inside KCMP_FILE to compare
> >> > the underlying data in special cases (seccomp, ebpf, others?), but it
> >> > seems cleaner to have a separate command as you described above.
> >> >
> >>
> >> What I meant was that maybe we could get the two requests to actually
> >> produce the same struct file.  But that could get very messy
> >> memory-wise.
> >
> > I see. The attached patch seems to work with KCMP_FILE and doesn't
> > look too bad if you don't mind the circular references. What do you
> > think?
> >
> 
> Could be reasonable.  I'm not that well versed on what fd_release
> does.  Are we guaranteed that it's called when the last fd goes away
> even if the struct file is pinned by the struct seccomp filter but is
> otherwise unreferenced?

I think so; my understanding is that the vfs doesn't know anything
about the seccomp_filter reference, so when the last fd referring to
the struct file is closed that's when release is called.

> Kees?
> 
> How many of you will be at KS?

I will not.

Tycho

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/5] kcmp: add KCMP_FILE_PRIVATE_DATA
  2015-09-30 21:48               ` Andy Lutomirski
  2015-09-30 22:10                 ` Tycho Andersen
@ 2015-10-01 16:45                 ` Tycho Andersen
  1 sibling, 0 replies; 19+ messages in thread
From: Tycho Andersen @ 2015-10-01 16:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	Network Development, Linux API

On Wed, Sep 30, 2015 at 02:48:47PM -0700, Andy Lutomirski wrote:
> On Wed, Sep 30, 2015 at 2:39 PM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > On Wed, Sep 30, 2015 at 11:56:25AM -0700, Andy Lutomirski wrote:
> >> On Wed, Sep 30, 2015 at 11:55 AM, Tycho Andersen
> >> <tycho.andersen@canonical.com> wrote:
> >> > On Wed, Sep 30, 2015 at 11:47:05AM -0700, Andy Lutomirski wrote:
> >> >> On Wed, Sep 30, 2015 at 11:41 AM, Tycho Andersen
> >> >> <tycho.andersen@canonical.com> wrote:
> >> >> > On Wed, Sep 30, 2015 at 11:25:41AM -0700, Andy Lutomirski wrote:
> >> >> >> On Wed, Sep 30, 2015 at 11:13 AM, Tycho Andersen
> >> >> >> <tycho.andersen@canonical.com> wrote:
> >> >> >> > This command allows comparing the underling private data of two fds. This
> >> >> >> > is useful e.g. to find out if a seccomp filter is inherited, since struct
> >> >> >> > seccomp_filter are unique across tasks and are the private_data seccomp
> >> >> >> > fds.
> >> >> >>
> >> >> >> This is very implementation-specific and may have nasty ABI
> >> >> >> consequences far outside seccomp.  Let's do something specific to
> >> >> >> seccomp and/or eBPF.
> >> >> >
> >> >> > We could change the name to a less generic KCMP_SECCOMP_FD or
> >> >> > something, but without some sort of GUID on each struct
> >> >> > seccomp_filter, the implementation would be effectively the same as it
> >> >> > is today. Is that enough, or do we need a GUID?
> >> >> >
> >> >>
> >> >> I don't care about the GUID.  I think we should name it
> >> >> KCMP_SECCOMP_FD and make it only work on seccomp fds.
> >> >
> >> > Ok, I can do that.
> >> >
> >> >> Alternatively, we could figure out why KCMP_FILE doesn't do the trick
> >> >> and consider fixing it.  IMO it's really too bad that struct file is
> >> >> so heavyweight that we can't really just embed one in all kinds of
> >> >> structures.
> >> >
> >> > The problem is that KCMP_FILE compares the file objects themselves,
> >> > instead of the underlying data. If I ask for a seccomp fd for filter 0
> >> > twice, I'll have two different file objects and they won't be equal. I
> >> > suppose we could add some special logic inside KCMP_FILE to compare
> >> > the underlying data in special cases (seccomp, ebpf, others?), but it
> >> > seems cleaner to have a separate command as you described above.
> >> >
> >>
> >> What I meant was that maybe we could get the two requests to actually
> >> produce the same struct file.  But that could get very messy
> >> memory-wise.
> >
> > I see. The attached patch seems to work with KCMP_FILE and doesn't
> > look too bad if you don't mind the circular references. What do you
> > think?
> >
> 
> Could be reasonable.  I'm not that well versed on what fd_release
> does.  Are we guaranteed that it's called when the last fd goes away
> even if the struct file is pinned by the struct seccomp filter but is
> otherwise unreferenced?

After thinking about the patch a bit more, I think it's not safe.
There is a race where the last referenced is closed at the same time
someone else asks for the strcut file via ptrace(). I thought the lock
would fix this, but it doesn't. I'll see if I can find another way.

Tycho

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-10-01 16:45 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-30 18:13 checkpoint/restore of seccomp filters v3 Tycho Andersen
2015-09-30 18:13 ` [PATCH v3 1/5] seccomp: save the original filter Tycho Andersen
2015-09-30 18:13 ` [PATCH v3 2/5] seccomp: add the concept of a seccomp filter FD Tycho Andersen
2015-09-30 18:27   ` Andy Lutomirski
2015-09-30 18:36     ` Tycho Andersen
2015-09-30 18:47       ` Andy Lutomirski
2015-09-30 18:29   ` kbuild test robot
2015-09-30 18:13 ` [PATCH v3 3/5] seccomp: add a ptrace command to get seccomp filter fds Tycho Andersen
2015-09-30 18:13 ` [PATCH v3 4/5] kcmp: add KCMP_FILE_PRIVATE_DATA Tycho Andersen
2015-09-30 18:25   ` Andy Lutomirski
2015-09-30 18:41     ` Tycho Andersen
2015-09-30 18:47       ` Andy Lutomirski
2015-09-30 18:55         ` Tycho Andersen
2015-09-30 18:56           ` Andy Lutomirski
2015-09-30 21:39             ` Tycho Andersen
2015-09-30 21:48               ` Andy Lutomirski
2015-09-30 22:10                 ` Tycho Andersen
2015-10-01 16:45                 ` Tycho Andersen
2015-09-30 18:13 ` [PATCH v3 5/5] bpf: save the program the user actually supplied Tycho Andersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).