linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* v5 of seccomp filter c/r patches
@ 2015-10-02 16:27 Tycho Andersen
  2015-10-02 16:27 ` [PATCH v5 1/3] seccomp: add the concept of a seccomp filter FD Tycho Andersen
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Tycho Andersen @ 2015-10-02 16:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	netdev, linux-api

Hi all,

Here's v5 of the seccomp filter c/r set. The individual patch notes have
changes, but two highlights are:

* This series is now based on http://patchwork.ozlabs.org/patch/525492/ and
  will need to be built with that patch applied. This gets rid of two incorrect
  patches in the previous series and is a nicer API.

* I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return the
  same struct file across calls, so we still need a kcmp command. I've narrowed
  the scope of the one being added to only compare seccomp fds.

Thoughts welcome,

Tycho


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v5 1/3] seccomp: add the concept of a seccomp filter FD
  2015-10-02 16:27 v5 of seccomp filter c/r patches Tycho Andersen
@ 2015-10-02 16:27 ` Tycho Andersen
  2015-10-02 16:27 ` [PATCH v5 2/3] seccomp: add a ptrace command to get seccomp filter fds Tycho Andersen
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Tycho Andersen @ 2015-10-02 16:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	netdev, linux-api, Tycho Andersen

This patch introduces the concept of a seccomp fd, with a similar interface
and usage to ebpf fds. Initially, one is allowed to create, install, and
dump these fds. Any manipulation of seccomp fds requires users to be root
in their own user namespace, matching the checks done for
SECCOMP_SET_MODE_FILTER.

v2: Force users to specify the parent (as another fd) during fd creation,
    and don't allow them to install filters when a previously installed
    filter is not the parent of the to be installed filter. This avoids
    "re-parenting" scenarios, which can be racy or perhaps insecure.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/seccomp.h      |   5 ++
 include/uapi/linux/seccomp.h |  27 ++++++
 kernel/seccomp.c             | 203 ++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 232 insertions(+), 3 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index f426503..4253579 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -85,6 +85,7 @@ static inline int seccomp_mode(struct seccomp *s)
 #ifdef CONFIG_SECCOMP_FILTER
 extern void put_seccomp_filter(struct task_struct *tsk);
 extern void get_seccomp_filter(struct task_struct *tsk);
+extern struct seccomp_filter *seccomp_filter_from_file(struct file *f);
 #else  /* CONFIG_SECCOMP_FILTER */
 static inline void put_seccomp_filter(struct task_struct *tsk)
 {
@@ -94,5 +95,9 @@ static inline void get_seccomp_filter(struct task_struct *tsk)
 {
 	return;
 }
+static inline struct seccomp_filter *seccomp_filter_from_file(struct file *f)
+{
+	return ERR_PTR(-EINVAL);
+}
 #endif /* CONFIG_SECCOMP_FILTER */
 #endif /* _LINUX_SECCOMP_H */
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 0f238a4..e9f3660 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -13,10 +13,16 @@
 /* Valid operations for seccomp syscall. */
 #define SECCOMP_SET_MODE_STRICT	0
 #define SECCOMP_SET_MODE_FILTER	1
+#define SECCOMP_FILTER_FD		2
 
 /* Valid flags for SECCOMP_SET_MODE_FILTER */
 #define SECCOMP_FILTER_FLAG_TSYNC	1
 
+/* Valid commands for SECCOMP_FILTER_FD */
+#define SECCOMP_FD_NEW		0
+#define SECCOMP_FD_INSTALL	1
+#define SECCOMP_FD_DUMP	2
+
 /*
  * All BPF programs must return a 32-bit value.
  * The bottom 16-bits are for optional return data.
@@ -51,4 +57,25 @@ struct seccomp_data {
 	__u64 args[6];
 };
 
+struct seccomp_fd {
+	__u32			size;
+
+	union {
+		/* SECCOMP_FD_NEW */
+		struct {
+			struct sock_fprog __user	*new_prog;
+			int				new_parent;
+		};
+
+		/* SECCOMP_FD_INSTALL */
+		int	install_fd;
+
+		/* SECCOMP_FD_DUMP */
+		struct {
+			int				dump_fd;
+			struct sock_filter __user	*insns;
+		};
+	};
+};
+
 #endif /* _UAPI_LINUX_SECCOMP_H */
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 06858a7..ea3337d 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -26,6 +26,8 @@
 #endif
 
 #ifdef CONFIG_SECCOMP_FILTER
+#include <linux/anon_inodes.h>
+#include <linux/file.h>
 #include <linux/filter.h>
 #include <linux/pid.h>
 #include <linux/ptrace.h>
@@ -474,10 +476,8 @@ static inline void seccomp_filter_free(struct seccomp_filter *filter)
 	}
 }
 
-/* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
-void put_seccomp_filter(struct task_struct *tsk)
+static void seccomp_filter_decref(struct seccomp_filter *orig)
 {
-	struct seccomp_filter *orig = tsk->seccomp.filter;
 	/* Clean up single-reference branches iteratively. */
 	while (orig && atomic_dec_and_test(&orig->usage)) {
 		struct seccomp_filter *freeme = orig;
@@ -486,6 +486,12 @@ void put_seccomp_filter(struct task_struct *tsk)
 	}
 }
 
+/* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
+void put_seccomp_filter(struct task_struct *tsk)
+{
+	seccomp_filter_decref(tsk->seccomp.filter);
+}
+
 /**
  * seccomp_send_sigsys - signals the task to allow in-process syscall emulation
  * @syscall: syscall number to send to userland
@@ -804,12 +810,201 @@ out_free:
 	seccomp_filter_free(prepared);
 	return ret;
 }
+
+int seccomp_fd_release(struct inode *ino, struct file *f)
+{
+	seccomp_filter_decref(f->private_data);
+	return 0;
+}
+
+static const struct file_operations seccomp_fops = {
+	.release = seccomp_fd_release,
+};
+
+struct seccomp_filter *seccomp_filter_from_file(struct file *f)
+{
+	struct seccomp_filter *filter;
+
+	if (!f)
+		return ERR_PTR(-EBADF);
+
+	if (f->f_op != &seccomp_fops)
+		return ERR_PTR(-EINVAL);
+
+	filter = f->private_data;
+
+	return filter;
+}
+
+static long seccomp_fd_new(struct seccomp_fd *seccomp_fd)
+{
+	struct seccomp_filter *filter, *parent = NULL;
+	long fd = -1;
+	char __user *prog = (char __user *) seccomp_fd->new_prog;
+
+	if (seccomp_fd->new_parent >= 0) {
+		struct fd f;
+
+		f = fdget(seccomp_fd->new_parent);
+		parent = seccomp_filter_from_file(f.file);
+		if (IS_ERR(parent)) {
+			fdput(f);
+			return PTR_ERR(parent);
+		}
+
+		atomic_inc(&parent->usage);
+		fdput(f);
+	}
+
+	filter = seccomp_prepare_user_filter(prog);
+	if (IS_ERR(filter)) {
+		fd = PTR_ERR(filter);
+		goto out;
+	}
+
+	filter->prev = parent;
+
+	fd = anon_inode_getfd("seccomp", &seccomp_fops, filter,
+			      O_RDONLY | O_CLOEXEC);
+out:
+	/* decref iteravely frees parent, so we don't need to do so */
+	if (fd < 0)
+		seccomp_filter_decref(filter);
+
+	return fd;
+}
+
+static long seccomp_fd_install(struct seccomp_fd *seccomp_fd)
+{
+	struct fd f;
+	struct seccomp_filter *filter;
+	int ret = -EINVAL;
+
+	f = fdget(seccomp_fd->install_fd);
+	filter = seccomp_filter_from_file(f.file);
+	if (IS_ERR(filter)) {
+		fdput(f);
+		return PTR_ERR(filter);
+	}
+	atomic_inc(&filter->usage);
+	fdput(f);
+
+	spin_lock_irq(&current->sighand->siglock);
+	if (!seccomp_may_assign_mode(SECCOMP_MODE_FILTER))
+		goto out_sigunlock;
+
+	if (current->seccomp.mode == SECCOMP_MODE_FILTER &&
+	    current->seccomp.filter != filter->prev)
+		goto out_sigunlock;
+
+	ret = seccomp_attach_filter(0, filter);
+	/* This may be the first filter installed, so let's set mode */
+	if (ret >= 0)
+		seccomp_assign_mode(current, SECCOMP_MODE_FILTER);
+
+out_sigunlock:
+	spin_unlock_irq(&current->sighand->siglock);
+
+	/* If the filter failed to install, let's decref it */
+	if (ret < 0)
+		seccomp_filter_decref(filter);
+	return ret;
+}
+
+static long seccomp_fd_dump(struct seccomp_fd *seccomp_fd)
+{
+	struct fd f;
+	int len;
+	struct sock_fprog_kern *orig;
+	struct seccomp_filter *filter;
+
+	f = fdget(seccomp_fd->dump_fd);
+	filter = seccomp_filter_from_file(f.file);
+	if (IS_ERR(filter)) {
+		fdput(f);
+		return PTR_ERR(filter);
+	}
+
+	orig = filter->prog->orig_prog;
+	len = bpf_classic_proglen(orig);
+
+	/* Allow asking how long the filter is by passing a null buffer. */
+	if (seccomp_fd->insns &&
+	    copy_to_user(seccomp_fd->insns, orig->filter, len))
+		len = -EFAULT;
+
+	fdput(f);
+	return len;
+}
+
+static long seccomp_filter_fd(unsigned int cmd,
+			      const char __user *ulayer)
+{
+	long ret;
+	u32 size;
+	struct seccomp_fd seccomp_fd;
+	struct seccomp_fd __user *useccomp_fd =
+		(struct seccomp_fd __user *) ulayer;
+
+	/* As above, we restrict access to seccomp fds to processes who are
+	 * root in their own user ns.
+	 */
+	if (!task_no_new_privs(current) &&
+	    security_capable_noaudit(current_cred(), current_user_ns(),
+				     CAP_SYS_ADMIN) != 0)
+		return -EACCES;
+
+	if (get_user(size, &useccomp_fd->size))
+		return -EFAULT;
+
+	if (size > sizeof(seccomp_fd)) {
+		unsigned char __user *addr;
+		unsigned char __user *end;
+		unsigned char val;
+
+		addr = (void __user *)useccomp_fd + sizeof(seccomp_fd);
+		end  = (void __user *)useccomp_fd + size;
+
+		for (; addr < end; addr++) {
+			if (get_user(val, addr))
+				return -EFAULT;
+			if (val)
+				return -E2BIG;
+		}
+		size = sizeof(seccomp_fd);
+	}
+
+	if (copy_from_user(&seccomp_fd, useccomp_fd, size))
+		return -EFAULT;
+
+	switch (cmd) {
+	case SECCOMP_FD_NEW:
+		ret = seccomp_fd_new(&seccomp_fd);
+		break;
+	case SECCOMP_FD_INSTALL:
+		ret = seccomp_fd_install(&seccomp_fd);
+		break;
+	case SECCOMP_FD_DUMP:
+		ret = seccomp_fd_dump(&seccomp_fd);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
 #else
 static inline long seccomp_set_mode_filter(unsigned int flags,
 					   const char __user *filter)
 {
 	return -EINVAL;
 }
+
+static inline long seccomp_filter_fd(unsigned int flags,
+				     const char __user *filter)
+{
+	return -EINVAL;
+}
 #endif
 
 /* Common entry point for both prctl and syscall. */
@@ -823,6 +1018,8 @@ static long do_seccomp(unsigned int op, unsigned int flags,
 		return seccomp_set_mode_strict();
 	case SECCOMP_SET_MODE_FILTER:
 		return seccomp_set_mode_filter(flags, uargs);
+	case SECCOMP_FILTER_FD:
+		return seccomp_filter_fd(flags, uargs);
 	default:
 		return -EINVAL;
 	}
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 2/3] seccomp: add a ptrace command to get seccomp filter fds
  2015-10-02 16:27 v5 of seccomp filter c/r patches Tycho Andersen
  2015-10-02 16:27 ` [PATCH v5 1/3] seccomp: add the concept of a seccomp filter FD Tycho Andersen
@ 2015-10-02 16:27 ` Tycho Andersen
  2015-10-02 16:27 ` [PATCH v5 3/3] kcmp: add KCMP_SECCOMP_FD Tycho Andersen
  2015-10-02 21:10 ` v5 of seccomp filter c/r patches Kees Cook
  3 siblings, 0 replies; 16+ messages in thread
From: Tycho Andersen @ 2015-10-02 16:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	netdev, linux-api, Tycho Andersen

I just picked 40 for the constant out of thin air, but there may be a more
appropriate value for this. Also, we return EINVAL when there is no filter
for the index the user requested, but ptrace also returns EINVAL for
invalid commands, making it slightly awkward to test whether or not the
kernel supports this feature. It can still be done via,

if (is_in_mode_filter(pid)) {
	int fd;

	fd = ptrace(PTRACE_SECCOMP_GET_FILTER_FD, pid, NULL, 0);
	if (fd < 0 && errno == -EINVAL)
		/* not supported */

	...
}

since being in SECCOMP_MODE_FILTER implies that there is at least one
filter. If there is a more appropriate errno (ESRCH collides as well with
ptrace) to give here that may be better.

v2: use new bpf interface save_orig to save the original filter when
    necessary

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/seccomp.h     |  9 +++++++++
 include/uapi/linux/ptrace.h |  2 ++
 kernel/ptrace.c             |  4 ++++
 kernel/seccomp.c            | 31 ++++++++++++++++++++++++++++++-
 4 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 4253579..b0b1a52 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -100,4 +100,13 @@ static inline struct seccomp_filter *seccomp_filter_from_file(struct file *f)
 	return ERR_PTR(-EINVAL);
 }
 #endif /* CONFIG_SECCOMP_FILTER */
+
+#if defined(CONFIG_CHECKPOINT_RESTORE) && defined(CONFIG_SECCOMP_FILTER)
+extern long seccomp_get_filter_fd(struct task_struct *task, long data);
+#else
+static inline long seccomp_get_filter_fd(struct task_struct *task, long data)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_CHECKPOINT_RESTORE && CONFIG_SECCOMP_FILTER */
 #endif /* _LINUX_SECCOMP_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index a7a6979..3271f5a 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -23,6 +23,8 @@
 
 #define PTRACE_SYSCALL		  24
 
+#define PTRACE_SECCOMP_GET_FILTER_FD 40
+
 /* 0x4200-0x4300 are reserved for architecture-independent additions.  */
 #define PTRACE_SETOPTIONS	0x4200
 #define PTRACE_GETEVENTMSG	0x4201
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 787320d..aede440 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1016,6 +1016,10 @@ int ptrace_request(struct task_struct *child, long request,
 		break;
 	}
 #endif
+
+	case PTRACE_SECCOMP_GET_FILTER_FD:
+		return seccomp_get_filter_fd(child, data);
+
 	default:
 		break;
 	}
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index ea3337d..4d2b8f1 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -349,6 +349,7 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 {
 	struct seccomp_filter *sfilter;
 	int ret;
+	bool save_orig = config_enabled(CONFIG_CHECKPOINT_RESTORE);
 
 	if (fprog->len == 0 || fprog->len > BPF_MAXINSNS)
 		return ERR_PTR(-EINVAL);
@@ -372,7 +373,7 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 		return ERR_PTR(-ENOMEM);
 
 	ret = bpf_prog_create_from_user(&sfilter->prog, fprog,
-					seccomp_check_filter, false);
+					seccomp_check_filter, save_orig);
 	if (ret < 0) {
 		kfree(sfilter);
 		return ERR_PTR(ret);
@@ -1064,3 +1065,31 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 	/* prctl interface doesn't have flags, so they are always zero. */
 	return do_seccomp(op, 0, uargs);
 }
+
+#if defined(CONFIG_CHECKPOINT_RESTORE) && defined(CONFIG_SECCOMP_FILTER)
+long seccomp_get_filter_fd(struct task_struct *task, long n)
+{
+	struct seccomp_filter *filter;
+	long fd;
+
+	if (task->seccomp.mode != SECCOMP_MODE_FILTER)
+		return -EINVAL;
+
+	filter = task->seccomp.filter;
+	while (n > 0 && filter) {
+		filter = filter->prev;
+		n--;
+	}
+
+	if (!filter)
+		return -EINVAL;
+
+	atomic_inc(&filter->usage);
+	fd = anon_inode_getfd("seccomp", &seccomp_fops, filter,
+			      O_RDONLY | O_CLOEXEC);
+	if (fd < 0)
+		seccomp_filter_decref(filter);
+
+	return fd;
+}
+#endif
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 3/3] kcmp: add KCMP_SECCOMP_FD
  2015-10-02 16:27 v5 of seccomp filter c/r patches Tycho Andersen
  2015-10-02 16:27 ` [PATCH v5 1/3] seccomp: add the concept of a seccomp filter FD Tycho Andersen
  2015-10-02 16:27 ` [PATCH v5 2/3] seccomp: add a ptrace command to get seccomp filter fds Tycho Andersen
@ 2015-10-02 16:27 ` Tycho Andersen
  2015-10-02 21:10 ` v5 of seccomp filter c/r patches Kees Cook
  3 siblings, 0 replies; 16+ messages in thread
From: Tycho Andersen @ 2015-10-02 16:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	netdev, linux-api, Tycho Andersen

This command allows for comparing the filters pointed to by two seccomp
fds. This is useful e.g. to find out if a seccomp filter is inherited,
since struct seccomp_filter are unique across tasks and are the
private_data seccomp fds.

v2: switch to KCMP_SECCOMP_FD instead of KCMP_FILE_PRIVATE_DATA

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/uapi/linux/kcmp.h |  1 +
 kernel/kcmp.c             | 27 +++++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/include/uapi/linux/kcmp.h b/include/uapi/linux/kcmp.h
index 84df14b..cd7870b 100644
--- a/include/uapi/linux/kcmp.h
+++ b/include/uapi/linux/kcmp.h
@@ -10,6 +10,7 @@ enum kcmp_type {
 	KCMP_SIGHAND,
 	KCMP_IO,
 	KCMP_SYSVSEM,
+	KCMP_SECCOMP_FD,
 
 	KCMP_TYPES,
 };
diff --git a/kernel/kcmp.c b/kernel/kcmp.c
index 0aa69ea..d53db53 100644
--- a/kernel/kcmp.c
+++ b/kernel/kcmp.c
@@ -11,6 +11,7 @@
 #include <linux/bug.h>
 #include <linux/err.h>
 #include <linux/kcmp.h>
+#include <linux/seccomp.h>
 
 #include <asm/unistd.h>
 
@@ -165,6 +166,32 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
 		ret = -EOPNOTSUPP;
 #endif
 		break;
+	case KCMP_SECCOMP_FD: {
+		struct file *filp1, *filp2;
+
+		filp1 = get_file_raw_ptr(task1, idx1);
+		filp2 = get_file_raw_ptr(task2, idx2);
+
+		if (filp1 && filp2) {
+			struct seccomp_filter *filter1, *filter2;
+
+			filter1 = seccomp_filter_from_file(filp1);
+			if (IS_ERR(filter1)) {
+				ret = PTR_ERR(filter1);
+				break;
+			}
+
+			filter2 = seccomp_filter_from_file(filp2);
+			if (IS_ERR(filter2)) {
+				ret = PTR_ERR(filter2);
+				break;
+			}
+
+			ret = kcmp_ptr(filter1, filter2, KCMP_SECCOMP_FD);
+		} else
+			ret = -EBADF;
+		break;
+	}
 	default:
 		ret = -EINVAL;
 		break;
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: v5 of seccomp filter c/r patches
  2015-10-02 16:27 v5 of seccomp filter c/r patches Tycho Andersen
                   ` (2 preceding siblings ...)
  2015-10-02 16:27 ` [PATCH v5 3/3] kcmp: add KCMP_SECCOMP_FD Tycho Andersen
@ 2015-10-02 21:10 ` Kees Cook
  2015-10-02 21:29   ` Andy Lutomirski
  2015-10-02 22:44   ` Tycho Andersen
  3 siblings, 2 replies; 16+ messages in thread
From: Kees Cook @ 2015-10-02 21:10 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Linux API

On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> Hi all,
>
> Here's v5 of the seccomp filter c/r set. The individual patch notes have
> changes, but two highlights are:
>
> * This series is now based on http://patchwork.ozlabs.org/patch/525492/ and
>   will need to be built with that patch applied. This gets rid of two incorrect
>   patches in the previous series and is a nicer API.
>
> * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return the
>   same struct file across calls, so we still need a kcmp command. I've narrowed
>   the scope of the one being added to only compare seccomp fds.
>
> Thoughts welcome,

Hi, sorry I've been slow/busy. I'm finally reading through these threads.

Happy bit:
- avoiding eBPF and just saving the original filters makes things much easier.

Sad bit:
- inventing a new interface for seccompfds feels like massive overkill to me.

While Andy has big dreams, we're not presently doing seccompfd
monitoring, etc. There's no driving user for that kind of interface,
and accepting the maintenance burden of it only for CRIU seems unwise.

So, I'll go back to what I originally proposed at LSS (which it looks
like we're half way there now):

- save the original filter (done!)
- extract filters through a single special-purpose interface (looks
like ptrace is the way to go: root-only, stopped process, etc)
- compare filter content and issue TSYNCs to merge detected sibling
threads, since merging things that weren't merged before creates no
problems.

This means the parenting logic is heuristic, but it's entirely in
userspace, so the complexity burden doesn't live in seccomp which we,
by design, want to keep as simple as possible.

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: v5 of seccomp filter c/r patches
  2015-10-02 21:10 ` v5 of seccomp filter c/r patches Kees Cook
@ 2015-10-02 21:29   ` Andy Lutomirski
  2015-10-02 22:02     ` Kees Cook
  2015-10-02 22:44   ` Tycho Andersen
  1 sibling, 1 reply; 16+ messages in thread
From: Andy Lutomirski @ 2015-10-02 21:29 UTC (permalink / raw)
  To: Kees Cook
  Cc: Tycho Andersen, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Linux API

On Fri, Oct 2, 2015 at 2:10 PM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
>> Hi all,
>>
>> Here's v5 of the seccomp filter c/r set. The individual patch notes have
>> changes, but two highlights are:
>>
>> * This series is now based on http://patchwork.ozlabs.org/patch/525492/ and
>>   will need to be built with that patch applied. This gets rid of two incorrect
>>   patches in the previous series and is a nicer API.
>>
>> * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return the
>>   same struct file across calls, so we still need a kcmp command. I've narrowed
>>   the scope of the one being added to only compare seccomp fds.
>>
>> Thoughts welcome,
>
> Hi, sorry I've been slow/busy. I'm finally reading through these threads.
>
> Happy bit:
> - avoiding eBPF and just saving the original filters makes things much easier.
>
> Sad bit:
> - inventing a new interface for seccompfds feels like massive overkill to me.
>
> While Andy has big dreams, we're not presently doing seccompfd
> monitoring, etc. There's no driving user for that kind of interface,
> and accepting the maintenance burden of it only for CRIU seems unwise.
>
> So, I'll go back to what I originally proposed at LSS (which it looks
> like we're half way there now):
>
> - save the original filter (done!)
> - extract filters through a single special-purpose interface (looks
> like ptrace is the way to go: root-only, stopped process, etc)
> - compare filter content and issue TSYNCs to merge detected sibling
> threads, since merging things that weren't merged before creates no
> problems.
>
> This means the parenting logic is heuristic, but it's entirely in
> userspace, so the complexity burden doesn't live in seccomp which we,
> by design, want to keep as simple as possible.

This is okay with me with a future-proofing caveat: I think that
whatever reads out the filter should be clearly documented as
returning some special error code that indicates that that filter it
tried to read wasn't in the expected form.  That would happen for
native eBPF filters, and it would also happen for seccomp monitors
even if those monitors use classic BPF.

--Andy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: v5 of seccomp filter c/r patches
  2015-10-02 21:29   ` Andy Lutomirski
@ 2015-10-02 22:02     ` Kees Cook
  2015-10-02 22:04       ` Andy Lutomirski
  0 siblings, 1 reply; 16+ messages in thread
From: Kees Cook @ 2015-10-02 22:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Tycho Andersen, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Linux API

On Fri, Oct 2, 2015 at 2:29 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Oct 2, 2015 at 2:10 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen
>> <tycho.andersen@canonical.com> wrote:
>>> Hi all,
>>>
>>> Here's v5 of the seccomp filter c/r set. The individual patch notes have
>>> changes, but two highlights are:
>>>
>>> * This series is now based on http://patchwork.ozlabs.org/patch/525492/ and
>>>   will need to be built with that patch applied. This gets rid of two incorrect
>>>   patches in the previous series and is a nicer API.
>>>
>>> * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return the
>>>   same struct file across calls, so we still need a kcmp command. I've narrowed
>>>   the scope of the one being added to only compare seccomp fds.
>>>
>>> Thoughts welcome,
>>
>> Hi, sorry I've been slow/busy. I'm finally reading through these threads.
>>
>> Happy bit:
>> - avoiding eBPF and just saving the original filters makes things much easier.
>>
>> Sad bit:
>> - inventing a new interface for seccompfds feels like massive overkill to me.
>>
>> While Andy has big dreams, we're not presently doing seccompfd
>> monitoring, etc. There's no driving user for that kind of interface,
>> and accepting the maintenance burden of it only for CRIU seems unwise.
>>
>> So, I'll go back to what I originally proposed at LSS (which it looks
>> like we're half way there now):
>>
>> - save the original filter (done!)
>> - extract filters through a single special-purpose interface (looks
>> like ptrace is the way to go: root-only, stopped process, etc)
>> - compare filter content and issue TSYNCs to merge detected sibling
>> threads, since merging things that weren't merged before creates no
>> problems.
>>
>> This means the parenting logic is heuristic, but it's entirely in
>> userspace, so the complexity burden doesn't live in seccomp which we,
>> by design, want to keep as simple as possible.
>
> This is okay with me with a future-proofing caveat: I think that
> whatever reads out the filter should be clearly documented as
> returning some special error code that indicates that that filter it
> tried to read wasn't in the expected form.  That would happen for
> native eBPF filters, and it would also happen for seccomp monitors
> even if those monitors use classic BPF.

As in, it should have something like "give me BPF" and that'll start
failing when it's only eBPF in the future?

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: v5 of seccomp filter c/r patches
  2015-10-02 22:02     ` Kees Cook
@ 2015-10-02 22:04       ` Andy Lutomirski
  2015-10-02 22:06         ` Kees Cook
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Lutomirski @ 2015-10-02 22:04 UTC (permalink / raw)
  To: Kees Cook
  Cc: Tycho Andersen, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Linux API

On Fri, Oct 2, 2015 at 3:02 PM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Oct 2, 2015 at 2:29 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Fri, Oct 2, 2015 at 2:10 PM, Kees Cook <keescook@chromium.org> wrote:
>>> On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen
>>> <tycho.andersen@canonical.com> wrote:
>>>> Hi all,
>>>>
>>>> Here's v5 of the seccomp filter c/r set. The individual patch notes have
>>>> changes, but two highlights are:
>>>>
>>>> * This series is now based on http://patchwork.ozlabs.org/patch/525492/ and
>>>>   will need to be built with that patch applied. This gets rid of two incorrect
>>>>   patches in the previous series and is a nicer API.
>>>>
>>>> * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return the
>>>>   same struct file across calls, so we still need a kcmp command. I've narrowed
>>>>   the scope of the one being added to only compare seccomp fds.
>>>>
>>>> Thoughts welcome,
>>>
>>> Hi, sorry I've been slow/busy. I'm finally reading through these threads.
>>>
>>> Happy bit:
>>> - avoiding eBPF and just saving the original filters makes things much easier.
>>>
>>> Sad bit:
>>> - inventing a new interface for seccompfds feels like massive overkill to me.
>>>
>>> While Andy has big dreams, we're not presently doing seccompfd
>>> monitoring, etc. There's no driving user for that kind of interface,
>>> and accepting the maintenance burden of it only for CRIU seems unwise.
>>>
>>> So, I'll go back to what I originally proposed at LSS (which it looks
>>> like we're half way there now):
>>>
>>> - save the original filter (done!)
>>> - extract filters through a single special-purpose interface (looks
>>> like ptrace is the way to go: root-only, stopped process, etc)
>>> - compare filter content and issue TSYNCs to merge detected sibling
>>> threads, since merging things that weren't merged before creates no
>>> problems.
>>>
>>> This means the parenting logic is heuristic, but it's entirely in
>>> userspace, so the complexity burden doesn't live in seccomp which we,
>>> by design, want to keep as simple as possible.
>>
>> This is okay with me with a future-proofing caveat: I think that
>> whatever reads out the filter should be clearly documented as
>> returning some special error code that indicates that that filter it
>> tried to read wasn't in the expected form.  That would happen for
>> native eBPF filters, and it would also happen for seccomp monitors
>> even if those monitors use classic BPF.
>
> As in, it should have something like "give me BPF" and that'll start
> failing when it's only eBPF in the future?

Yes, but it might also start failing when if my dreams come true, it's
still classic BPF, but it's no longer a classic seccomp bpf filter
layer with the semantics we expect today.  (E.g. if it's classic bpf
but has a monitor attached, then the read should fail because
restoring it without restoring the monitor will cause all kinds of
mess.)

--Andy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: v5 of seccomp filter c/r patches
  2015-10-02 22:04       ` Andy Lutomirski
@ 2015-10-02 22:06         ` Kees Cook
  2015-10-02 22:16           ` Andy Lutomirski
  0 siblings, 1 reply; 16+ messages in thread
From: Kees Cook @ 2015-10-02 22:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Tycho Andersen, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Linux API

On Fri, Oct 2, 2015 at 3:04 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Oct 2, 2015 at 3:02 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Fri, Oct 2, 2015 at 2:29 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Fri, Oct 2, 2015 at 2:10 PM, Kees Cook <keescook@chromium.org> wrote:
>>>> On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen
>>>> <tycho.andersen@canonical.com> wrote:
>>>>> Hi all,
>>>>>
>>>>> Here's v5 of the seccomp filter c/r set. The individual patch notes have
>>>>> changes, but two highlights are:
>>>>>
>>>>> * This series is now based on http://patchwork.ozlabs.org/patch/525492/ and
>>>>>   will need to be built with that patch applied. This gets rid of two incorrect
>>>>>   patches in the previous series and is a nicer API.
>>>>>
>>>>> * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return the
>>>>>   same struct file across calls, so we still need a kcmp command. I've narrowed
>>>>>   the scope of the one being added to only compare seccomp fds.
>>>>>
>>>>> Thoughts welcome,
>>>>
>>>> Hi, sorry I've been slow/busy. I'm finally reading through these threads.
>>>>
>>>> Happy bit:
>>>> - avoiding eBPF and just saving the original filters makes things much easier.
>>>>
>>>> Sad bit:
>>>> - inventing a new interface for seccompfds feels like massive overkill to me.
>>>>
>>>> While Andy has big dreams, we're not presently doing seccompfd
>>>> monitoring, etc. There's no driving user for that kind of interface,
>>>> and accepting the maintenance burden of it only for CRIU seems unwise.
>>>>
>>>> So, I'll go back to what I originally proposed at LSS (which it looks
>>>> like we're half way there now):
>>>>
>>>> - save the original filter (done!)
>>>> - extract filters through a single special-purpose interface (looks
>>>> like ptrace is the way to go: root-only, stopped process, etc)
>>>> - compare filter content and issue TSYNCs to merge detected sibling
>>>> threads, since merging things that weren't merged before creates no
>>>> problems.
>>>>
>>>> This means the parenting logic is heuristic, but it's entirely in
>>>> userspace, so the complexity burden doesn't live in seccomp which we,
>>>> by design, want to keep as simple as possible.
>>>
>>> This is okay with me with a future-proofing caveat: I think that
>>> whatever reads out the filter should be clearly documented as
>>> returning some special error code that indicates that that filter it
>>> tried to read wasn't in the expected form.  That would happen for
>>> native eBPF filters, and it would also happen for seccomp monitors
>>> even if those monitors use classic BPF.
>>
>> As in, it should have something like "give me BPF" and that'll start
>> failing when it's only eBPF in the future?
>
> Yes, but it might also start failing when if my dreams come true, it's
> still classic BPF, but it's no longer a classic seccomp bpf filter
> layer with the semantics we expect today.  (E.g. if it's classic bpf
> but has a monitor attached, then the read should fail because
> restoring it without restoring the monitor will cause all kinds of
> mess.)

Ah-ha! Understood, and yeah, that seems fine.

Speaking of dreams -- what do you think about re-running seccomp in
the face of changed syscalls due to ptrace? Closing the ptrace hole
would be really nice.

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: v5 of seccomp filter c/r patches
  2015-10-02 22:06         ` Kees Cook
@ 2015-10-02 22:16           ` Andy Lutomirski
  0 siblings, 0 replies; 16+ messages in thread
From: Andy Lutomirski @ 2015-10-02 22:16 UTC (permalink / raw)
  To: Kees Cook
  Cc: Tycho Andersen, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Linux API

On Fri, Oct 2, 2015 at 3:06 PM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Oct 2, 2015 at 3:04 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Fri, Oct 2, 2015 at 3:02 PM, Kees Cook <keescook@chromium.org> wrote:
>>> On Fri, Oct 2, 2015 at 2:29 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>> On Fri, Oct 2, 2015 at 2:10 PM, Kees Cook <keescook@chromium.org> wrote:
>>>>> On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen
>>>>> <tycho.andersen@canonical.com> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Here's v5 of the seccomp filter c/r set. The individual patch notes have
>>>>>> changes, but two highlights are:
>>>>>>
>>>>>> * This series is now based on http://patchwork.ozlabs.org/patch/525492/ and
>>>>>>   will need to be built with that patch applied. This gets rid of two incorrect
>>>>>>   patches in the previous series and is a nicer API.
>>>>>>
>>>>>> * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return the
>>>>>>   same struct file across calls, so we still need a kcmp command. I've narrowed
>>>>>>   the scope of the one being added to only compare seccomp fds.
>>>>>>
>>>>>> Thoughts welcome,
>>>>>
>>>>> Hi, sorry I've been slow/busy. I'm finally reading through these threads.
>>>>>
>>>>> Happy bit:
>>>>> - avoiding eBPF and just saving the original filters makes things much easier.
>>>>>
>>>>> Sad bit:
>>>>> - inventing a new interface for seccompfds feels like massive overkill to me.
>>>>>
>>>>> While Andy has big dreams, we're not presently doing seccompfd
>>>>> monitoring, etc. There's no driving user for that kind of interface,
>>>>> and accepting the maintenance burden of it only for CRIU seems unwise.
>>>>>
>>>>> So, I'll go back to what I originally proposed at LSS (which it looks
>>>>> like we're half way there now):
>>>>>
>>>>> - save the original filter (done!)
>>>>> - extract filters through a single special-purpose interface (looks
>>>>> like ptrace is the way to go: root-only, stopped process, etc)
>>>>> - compare filter content and issue TSYNCs to merge detected sibling
>>>>> threads, since merging things that weren't merged before creates no
>>>>> problems.
>>>>>
>>>>> This means the parenting logic is heuristic, but it's entirely in
>>>>> userspace, so the complexity burden doesn't live in seccomp which we,
>>>>> by design, want to keep as simple as possible.
>>>>
>>>> This is okay with me with a future-proofing caveat: I think that
>>>> whatever reads out the filter should be clearly documented as
>>>> returning some special error code that indicates that that filter it
>>>> tried to read wasn't in the expected form.  That would happen for
>>>> native eBPF filters, and it would also happen for seccomp monitors
>>>> even if those monitors use classic BPF.
>>>
>>> As in, it should have something like "give me BPF" and that'll start
>>> failing when it's only eBPF in the future?
>>
>> Yes, but it might also start failing when if my dreams come true, it's
>> still classic BPF, but it's no longer a classic seccomp bpf filter
>> layer with the semantics we expect today.  (E.g. if it's classic bpf
>> but has a monitor attached, then the read should fail because
>> restoring it without restoring the monitor will cause all kinds of
>> mess.)
>
> Ah-ha! Understood, and yeah, that seems fine.
>
> Speaking of dreams -- what do you think about re-running seccomp in
> the face of changed syscalls due to ptrace? Closing the ptrace hole
> would be really nice.

Yes, absolutely!  We might even want to just move the seccomp check
after ptrace (except for seccomp-induced ptrace).

Unfortunately, I backed us into a corner with two-phase seccomp on
x86, and it's a big mess.  (I wrote the seccomp vs ptrace patches, and
I don't think they're acceptable.)  My big x86 low-level rewrite is an
attempt to get back out of that corner, and I'm hoping to resubmit the
bulk of it today or tomorrow.  Once that happens, I just need to fix
up the 64-bit native case (trivial, I know) and then revert two-phase
seccomp.

One nice outcome of all of this will be that the syscall tables will
contain bona fide C ABI compliant function pointers, which is
currently not the case.

--Andy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: v5 of seccomp filter c/r patches
  2015-10-02 21:10 ` v5 of seccomp filter c/r patches Kees Cook
  2015-10-02 21:29   ` Andy Lutomirski
@ 2015-10-02 22:44   ` Tycho Andersen
  2015-10-02 22:52     ` Andy Lutomirski
  2015-10-02 22:57     ` Daniel Borkmann
  1 sibling, 2 replies; 16+ messages in thread
From: Tycho Andersen @ 2015-10-02 22:44 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Linux API

On Fri, Oct 02, 2015 at 02:10:24PM -0700, Kees Cook wrote:
> On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > Hi all,
> >
> > Here's v5 of the seccomp filter c/r set. The individual patch notes have
> > changes, but two highlights are:
> >
> > * This series is now based on http://patchwork.ozlabs.org/patch/525492/ and
> >   will need to be built with that patch applied. This gets rid of two incorrect
> >   patches in the previous series and is a nicer API.
> >
> > * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return the
> >   same struct file across calls, so we still need a kcmp command. I've narrowed
> >   the scope of the one being added to only compare seccomp fds.
> >
> > Thoughts welcome,
> 
> Hi, sorry I've been slow/busy. I'm finally reading through these threads.
> 
> Happy bit:
> - avoiding eBPF and just saving the original filters makes things much easier.
> 
> Sad bit:
> - inventing a new interface for seccompfds feels like massive overkill to me.
> 
> While Andy has big dreams, we're not presently doing seccompfd
> monitoring, etc. There's no driving user for that kind of interface,
> and accepting the maintenance burden of it only for CRIU seems unwise.
> 
> So, I'll go back to what I originally proposed at LSS (which it looks
> like we're half way there now):
> 
> - save the original filter (done!)
> - extract filters through a single special-purpose interface (looks
> like ptrace is the way to go: root-only, stopped process, etc)
> - compare filter content and issue TSYNCs to merge detected sibling
> threads, since merging things that weren't merged before creates no
> problems.
> 
> This means the parenting logic is heuristic, but it's entirely in
> userspace, so the complexity burden doesn't live in seccomp which we,
> by design, want to keep as simple as possible.

Ok, how about,

struct sock_filter insns[BPF_MAXINSNS];
insn_cnt = ptrace(PTRACE_SECCOMP_GET_FILTER, pid, insns, i);

when asking for the ith filter? It returns either the number of
instructions, -EINVAL if something was wrong (i, pid,
CONFIG_CHECKPOINT_RESTORE isn't enabled). While it would always
succeed now, if/when the underlying filter was not created from a bpf
classic filter, we can return -EMEDIUMTYPE? (Suggestions welcome, I
picked this mostly based on what sounds nice.)

Tycho

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: v5 of seccomp filter c/r patches
  2015-10-02 22:44   ` Tycho Andersen
@ 2015-10-02 22:52     ` Andy Lutomirski
  2015-10-02 22:56       ` Tycho Andersen
  2015-10-02 22:57     ` Daniel Borkmann
  1 sibling, 1 reply; 16+ messages in thread
From: Andy Lutomirski @ 2015-10-02 22:52 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Linux API

On Fri, Oct 2, 2015 at 3:44 PM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> On Fri, Oct 02, 2015 at 02:10:24PM -0700, Kees Cook wrote:
>> On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen
>> <tycho.andersen@canonical.com> wrote:
>> > Hi all,
>> >
>> > Here's v5 of the seccomp filter c/r set. The individual patch notes have
>> > changes, but two highlights are:
>> >
>> > * This series is now based on http://patchwork.ozlabs.org/patch/525492/ and
>> >   will need to be built with that patch applied. This gets rid of two incorrect
>> >   patches in the previous series and is a nicer API.
>> >
>> > * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return the
>> >   same struct file across calls, so we still need a kcmp command. I've narrowed
>> >   the scope of the one being added to only compare seccomp fds.
>> >
>> > Thoughts welcome,
>>
>> Hi, sorry I've been slow/busy. I'm finally reading through these threads.
>>
>> Happy bit:
>> - avoiding eBPF and just saving the original filters makes things much easier.
>>
>> Sad bit:
>> - inventing a new interface for seccompfds feels like massive overkill to me.
>>
>> While Andy has big dreams, we're not presently doing seccompfd
>> monitoring, etc. There's no driving user for that kind of interface,
>> and accepting the maintenance burden of it only for CRIU seems unwise.
>>
>> So, I'll go back to what I originally proposed at LSS (which it looks
>> like we're half way there now):
>>
>> - save the original filter (done!)
>> - extract filters through a single special-purpose interface (looks
>> like ptrace is the way to go: root-only, stopped process, etc)
>> - compare filter content and issue TSYNCs to merge detected sibling
>> threads, since merging things that weren't merged before creates no
>> problems.
>>
>> This means the parenting logic is heuristic, but it's entirely in
>> userspace, so the complexity burden doesn't live in seccomp which we,
>> by design, want to keep as simple as possible.
>
> Ok, how about,
>
> struct sock_filter insns[BPF_MAXINSNS];
> insn_cnt = ptrace(PTRACE_SECCOMP_GET_FILTER, pid, insns, i);
>
> when asking for the ith filter? It returns either the number of
> instructions, -EINVAL if something was wrong (i, pid,
> CONFIG_CHECKPOINT_RESTORE isn't enabled). While it would always
> succeed now, if/when the underlying filter was not created from a bpf
> classic filter, we can return -EMEDIUMTYPE? (Suggestions welcome, I
> picked this mostly based on what sounds nice.)
>

Are we still requiring global permissions or that the caller isn't
seccomped at all?  I've not lost track of how we're resolving the case
where the caller and the tracee have exactly the same seccomp state
(or the tracee is derived from the caller's state or they're totally
unrelated states).

--Andy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: v5 of seccomp filter c/r patches
  2015-10-02 22:52     ` Andy Lutomirski
@ 2015-10-02 22:56       ` Tycho Andersen
  0 siblings, 0 replies; 16+ messages in thread
From: Tycho Andersen @ 2015-10-02 22:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development, Linux API

On Fri, Oct 02, 2015 at 03:52:03PM -0700, Andy Lutomirski wrote:
> On Fri, Oct 2, 2015 at 3:44 PM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > On Fri, Oct 02, 2015 at 02:10:24PM -0700, Kees Cook wrote:
> >> On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen
> >> <tycho.andersen@canonical.com> wrote:
> >> > Hi all,
> >> >
> >> > Here's v5 of the seccomp filter c/r set. The individual patch notes have
> >> > changes, but two highlights are:
> >> >
> >> > * This series is now based on http://patchwork.ozlabs.org/patch/525492/ and
> >> >   will need to be built with that patch applied. This gets rid of two incorrect
> >> >   patches in the previous series and is a nicer API.
> >> >
> >> > * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return the
> >> >   same struct file across calls, so we still need a kcmp command. I've narrowed
> >> >   the scope of the one being added to only compare seccomp fds.
> >> >
> >> > Thoughts welcome,
> >>
> >> Hi, sorry I've been slow/busy. I'm finally reading through these threads.
> >>
> >> Happy bit:
> >> - avoiding eBPF and just saving the original filters makes things much easier.
> >>
> >> Sad bit:
> >> - inventing a new interface for seccompfds feels like massive overkill to me.
> >>
> >> While Andy has big dreams, we're not presently doing seccompfd
> >> monitoring, etc. There's no driving user for that kind of interface,
> >> and accepting the maintenance burden of it only for CRIU seems unwise.
> >>
> >> So, I'll go back to what I originally proposed at LSS (which it looks
> >> like we're half way there now):
> >>
> >> - save the original filter (done!)
> >> - extract filters through a single special-purpose interface (looks
> >> like ptrace is the way to go: root-only, stopped process, etc)
> >> - compare filter content and issue TSYNCs to merge detected sibling
> >> threads, since merging things that weren't merged before creates no
> >> problems.
> >>
> >> This means the parenting logic is heuristic, but it's entirely in
> >> userspace, so the complexity burden doesn't live in seccomp which we,
> >> by design, want to keep as simple as possible.
> >
> > Ok, how about,
> >
> > struct sock_filter insns[BPF_MAXINSNS];
> > insn_cnt = ptrace(PTRACE_SECCOMP_GET_FILTER, pid, insns, i);
> >
> > when asking for the ith filter? It returns either the number of
> > instructions, -EINVAL if something was wrong (i, pid,
> > CONFIG_CHECKPOINT_RESTORE isn't enabled). While it would always
> > succeed now, if/when the underlying filter was not created from a bpf
> > classic filter, we can return -EMEDIUMTYPE? (Suggestions welcome, I
> > picked this mostly based on what sounds nice.)
> >
> 
> Are we still requiring global permissions or that the caller isn't
> seccomped at all?  I've not lost track of how we're resolving the case
> where the caller and the tracee have exactly the same seccomp state
> (or the tracee is derived from the caller's state or they're totally
> unrelated states).

At least for now, I think requiring real root and no seccomp is fine,
so I'll do that.

Tycho

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: v5 of seccomp filter c/r patches
  2015-10-02 22:44   ` Tycho Andersen
  2015-10-02 22:52     ` Andy Lutomirski
@ 2015-10-02 22:57     ` Daniel Borkmann
  2015-10-02 22:59       ` Tycho Andersen
  2015-10-02 23:00       ` Kees Cook
  1 sibling, 2 replies; 16+ messages in thread
From: Daniel Borkmann @ 2015-10-02 22:57 UTC (permalink / raw)
  To: Tycho Andersen, Kees Cook
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, LKML, Network Development,
	Linux API

On 10/03/2015 12:44 AM, Tycho Andersen wrote:
> On Fri, Oct 02, 2015 at 02:10:24PM -0700, Kees Cook wrote:
...
> Ok, how about,
>
> struct sock_filter insns[BPF_MAXINSNS];
> insn_cnt = ptrace(PTRACE_SECCOMP_GET_FILTER, pid, insns, i);

Would also be good that when the storage buffer (insns) is NULL,
it just returns you the number of sock_filter insns (or 0 when
nothing attached).

That would be consistent with classic socket filters (see
sk_get_filter()), and user space could allocate a specific
size instead of always passing in max insns.

> when asking for the ith filter? It returns either the number of
> instructions, -EINVAL if something was wrong (i, pid,
> CONFIG_CHECKPOINT_RESTORE isn't enabled). While it would always
> succeed now, if/when the underlying filter was not created from a bpf
> classic filter, we can return -EMEDIUMTYPE? (Suggestions welcome, I
> picked this mostly based on what sounds nice.)
>
> Tycho
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: v5 of seccomp filter c/r patches
  2015-10-02 22:57     ` Daniel Borkmann
@ 2015-10-02 22:59       ` Tycho Andersen
  2015-10-02 23:00       ` Kees Cook
  1 sibling, 0 replies; 16+ messages in thread
From: Tycho Andersen @ 2015-10-02 22:59 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn, LKML,
	Network Development, Linux API

On Sat, Oct 03, 2015 at 12:57:49AM +0200, Daniel Borkmann wrote:
> On 10/03/2015 12:44 AM, Tycho Andersen wrote:
> >On Fri, Oct 02, 2015 at 02:10:24PM -0700, Kees Cook wrote:
> ...
> >Ok, how about,
> >
> >struct sock_filter insns[BPF_MAXINSNS];
> >insn_cnt = ptrace(PTRACE_SECCOMP_GET_FILTER, pid, insns, i);
> 
> Would also be good that when the storage buffer (insns) is NULL,
> it just returns you the number of sock_filter insns (or 0 when
> nothing attached).
> 
> That would be consistent with classic socket filters (see
> sk_get_filter()), and user space could allocate a specific
> size instead of always passing in max insns.

Yep, the current set does this with SECCOMP_FD_DUMP and I agree that
it's nice behavior, so I'll plan on preserving it.

Thanks,

Tycho

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: v5 of seccomp filter c/r patches
  2015-10-02 22:57     ` Daniel Borkmann
  2015-10-02 22:59       ` Tycho Andersen
@ 2015-10-02 23:00       ` Kees Cook
  1 sibling, 0 replies; 16+ messages in thread
From: Kees Cook @ 2015-10-02 23:00 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Tycho Andersen, Alexei Starovoitov, Will Drewry, Oleg Nesterov,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn, LKML,
	Network Development, Linux API

On Fri, Oct 2, 2015 at 3:57 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 10/03/2015 12:44 AM, Tycho Andersen wrote:
>>
>> On Fri, Oct 02, 2015 at 02:10:24PM -0700, Kees Cook wrote:
>
> ...
>>
>> Ok, how about,
>>
>> struct sock_filter insns[BPF_MAXINSNS];
>> insn_cnt = ptrace(PTRACE_SECCOMP_GET_FILTER, pid, insns, i);
>
>
> Would also be good that when the storage buffer (insns) is NULL,
> it just returns you the number of sock_filter insns (or 0 when
> nothing attached).
>
> That would be consistent with classic socket filters (see
> sk_get_filter()), and user space could allocate a specific
> size instead of always passing in max insns.

Yes please. :)

>> when asking for the ith filter? It returns either the number of
>> instructions, -EINVAL if something was wrong (i, pid,
>> CONFIG_CHECKPOINT_RESTORE isn't enabled). While it would always
>> succeed now, if/when the underlying filter was not created from a bpf
>> classic filter, we can return -EMEDIUMTYPE? (Suggestions welcome, I
>> picked this mostly based on what sounds nice.)

We can bikeshed the non-classic case when we need it, but I think
EINVAL is "not under seccomp", and ENOENT is "no such index".

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-10-02 23:00 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-02 16:27 v5 of seccomp filter c/r patches Tycho Andersen
2015-10-02 16:27 ` [PATCH v5 1/3] seccomp: add the concept of a seccomp filter FD Tycho Andersen
2015-10-02 16:27 ` [PATCH v5 2/3] seccomp: add a ptrace command to get seccomp filter fds Tycho Andersen
2015-10-02 16:27 ` [PATCH v5 3/3] kcmp: add KCMP_SECCOMP_FD Tycho Andersen
2015-10-02 21:10 ` v5 of seccomp filter c/r patches Kees Cook
2015-10-02 21:29   ` Andy Lutomirski
2015-10-02 22:02     ` Kees Cook
2015-10-02 22:04       ` Andy Lutomirski
2015-10-02 22:06         ` Kees Cook
2015-10-02 22:16           ` Andy Lutomirski
2015-10-02 22:44   ` Tycho Andersen
2015-10-02 22:52     ` Andy Lutomirski
2015-10-02 22:56       ` Tycho Andersen
2015-10-02 22:57     ` Daniel Borkmann
2015-10-02 22:59       ` Tycho Andersen
2015-10-02 23:00       ` Kees Cook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).