* v8 of seccomp filter c/r
@ 2015-10-20 19:50 Tycho Andersen
  2015-10-20 19:50   ` Tycho Andersen
  0 siblings, 1 reply; 27+ messages in thread
From: Tycho Andersen @ 2015-10-20 19:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	linux-api

Hi all,

Here's v8 of the seccomp filter c/r stuff, which has some minor changes:
dropping a lock and changing a constant.

Note that this is now based on http://patchwork.ozlabs.org/patch/525492/ and
will need to be built with that patch applied. This gets rid of two incorrect
patches in a previous series and is a nicer API.

Thoughts welcome,

Tycho



* [PATCH v8] seccomp, ptrace: add support for dumping seccomp filters
@ 2015-10-20 19:50   ` Tycho Andersen
  0 siblings, 0 replies; 27+ messages in thread
From: Tycho Andersen @ 2015-10-20 19:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	linux-api, Tycho Andersen

This patch adds support for dumping a process' (classic BPF) seccomp
filters via ptrace.

PTRACE_SECCOMP_GET_FILTER allows the tracer to dump the user's classic BPF
seccomp filters. addr should be an integer which represents the ith seccomp
filter (0 is the most recently installed filter). data should be a struct
sock_filter * with enough room for the ith filter, or NULL, in which case
the filter is not saved. The return value for this command is the number of
BPF instructions the program represents, or negative in the case of errors.
Command-specific errors are ENOENT, which indicates that there is no ith
filter in this seccomp tree, and EMEDIUMTYPE, which indicates that the ith
filter was not installed as a classic BPF filter.
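
A minimal userspace sketch of the two-call pattern this description implies:
ask for the instruction count with data == NULL, then allocate a buffer and
fetch the instructions. The helper name dump_seccomp_filter() and the
GET_FILTER_REQ macro are illustrative assumptions, not part of the patch; the
caller is assumed to be a tracer already attached to a stopped tracee and to
hold CAP_SYS_ADMIN.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <linux/filter.h>   /* struct sock_filter */

/* Request number taken from the patch (PTRACE_SECCOMP_GET_FILTER). */
#define GET_FILTER_REQ 0x420c

/*
 * Hypothetical helper: dump the filter at position `index` of tracee `pid`.
 * On success returns the instruction count and stores a malloc'd buffer in
 * *out (caller frees). On failure returns -1 and leaves *out NULL.
 */
long dump_seccomp_filter(pid_t pid, unsigned long index,
                         struct sock_filter **out)
{
    struct sock_filter *buf;
    long len;

    *out = NULL;

    /* First call: data == NULL, so only the instruction count comes back. */
    len = ptrace(GET_FILTER_REQ, pid, (void *)index, NULL);
    if (len <= 0)
        return -1;

    buf = calloc((size_t)len, sizeof(*buf));
    if (!buf)
        return -1;

    /* Second call: the kernel copies the instructions into buf. */
    if (ptrace(GET_FILTER_REQ, pid, (void *)index, buf) != len) {
        int saved = errno;   /* free() may clobber errno */

        free(buf);
        errno = saved;
        return -1;
    }

    *out = buf;
    return len;
}
```

A caller would typically loop over index values until the helper fails with
errno == ENOENT, treating EMEDIUMTYPE as "present but not dumpable".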

A caveat with this approach is that there is no way to get explicitly at
the hierarchy of seccomp filters, and users need to memcmp() filters to
decide which are inherited. This means that a task which installs two of
the same filter can potentially confuse users of this interface.
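
To make the caveat concrete, here is a hedged sketch of the comparison a
userspace tool would have to do on two dumped programs. The helper name
same_filter() is an assumption for illustration; nothing like it is provided
by the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <linux/filter.h>   /* struct sock_filter */

/*
 * Hypothetical helper: true when two dumped programs are byte-identical.
 * Equal bytes do not prove inheritance: a task that installs the same
 * program twice produces two dumps that also compare equal.
 */
bool same_filter(const struct sock_filter *a, size_t a_len,
                 const struct sock_filter *b, size_t b_len)
{
    if (a_len != b_len)
        return false;
    return memcmp(a, b, a_len * sizeof(*a)) == 0;
}
```

Because the comparison sees only instruction bytes, a parent's filter and an
identical filter installed later by the child are indistinguishable, which is
exactly the ambiguity described above.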

v2: * make save_orig const
    * check that the orig_prog exists (not necessary right now, but when
       seccomp grows eBPF support it will be)
    * s/n/filter_off and make it an unsigned long to match ptrace
    * count "down" the tree instead of "up" when passing a filter offset

v3: * don't take the current task's lock for inspecting its seccomp mode
    * use a 0x42** constant for the ptrace command value

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/seccomp.h     | 11 +++++++
 include/uapi/linux/ptrace.h |  2 ++
 kernel/ptrace.c             |  5 ++++
 kernel/seccomp.c            | 72 ++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index f426503..2296e6b 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -95,4 +95,15 @@ static inline void get_seccomp_filter(struct task_struct *tsk)
 	return;
 }
 #endif /* CONFIG_SECCOMP_FILTER */
+
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
+extern long seccomp_get_filter(struct task_struct *task,
+			       unsigned long filter_off, void __user *data);
+#else
+static inline long seccomp_get_filter(struct task_struct *task,
+				      unsigned long n, void __user *data)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_SECCOMP_FILTER && CONFIG_CHECKPOINT_RESTORE */
 #endif /* _LINUX_SECCOMP_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index a7a6979..fb81065 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -64,6 +64,8 @@ struct ptrace_peeksiginfo_args {
 #define PTRACE_GETSIGMASK	0x420a
 #define PTRACE_SETSIGMASK	0x420b
 
+#define PTRACE_SECCOMP_GET_FILTER	0x420c
+
 /* Read signals from a shared (process wide) queue */
 #define PTRACE_PEEKSIGINFO_SHARED	(1 << 0)
 
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 787320d..b760bae 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1016,6 +1016,11 @@ int ptrace_request(struct task_struct *child, long request,
 		break;
 	}
 #endif
+
+	case PTRACE_SECCOMP_GET_FILTER:
+		ret = seccomp_get_filter(child, addr, datavp);
+		break;
+
 	default:
 		break;
 	}
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 06858a7..9a9008f 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -347,6 +347,7 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 {
 	struct seccomp_filter *sfilter;
 	int ret;
+	const bool save_orig = config_enabled(CONFIG_CHECKPOINT_RESTORE);
 
 	if (fprog->len == 0 || fprog->len > BPF_MAXINSNS)
 		return ERR_PTR(-EINVAL);
@@ -370,7 +371,7 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 		return ERR_PTR(-ENOMEM);
 
 	ret = bpf_prog_create_from_user(&sfilter->prog, fprog,
-					seccomp_check_filter, false);
+					seccomp_check_filter, save_orig);
 	if (ret < 0) {
 		kfree(sfilter);
 		return ERR_PTR(ret);
@@ -867,3 +868,72 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 	/* prctl interface doesn't have flags, so they are always zero. */
 	return do_seccomp(op, 0, uargs);
 }
+
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
+long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
+			void __user *data)
+{
+	struct seccomp_filter *filter;
+	struct sock_fprog_kern *fprog;
+	long ret;
+	unsigned long count = 0;
+
+	if (!capable(CAP_SYS_ADMIN) ||
+	    current->seccomp.mode != SECCOMP_MODE_DISABLED) {
+		return -EACCES;
+	}
+
+	spin_lock_irq(&task->sighand->siglock);
+	if (task->seccomp.mode != SECCOMP_MODE_FILTER) {
+		ret = -EINVAL;
+		goto out_task;
+	}
+
+	filter = task->seccomp.filter;
+	while (filter) {
+		filter = filter->prev;
+		count++;
+	}
+
+	if (filter_off >= count) {
+		ret = -ENOENT;
+		goto out_task;
+	}
+	count -= filter_off;
+
+	filter = task->seccomp.filter;
+	while (filter && count > 1) {
+		filter = filter->prev;
+		count--;
+	}
+
+	if (WARN_ON(count != 1)) {
+		/* The filter tree shouldn't shrink while we're using it. */
+		ret = -ENOENT;
+		goto out_task;
+	}
+
+	fprog = filter->prog->orig_prog;
+	if (!fprog) {
+		/* This must be a new non-cBPF filter, since we save
+		 * every cBPF filter's orig_prog above when
+		 * CONFIG_CHECKPOINT_RESTORE is enabled.
+		 */
+		ret = -EMEDIUMTYPE;
+		goto out_task;
+	}
+
+	ret = fprog->len;
+	if (!data)
+		goto out_task;
+
+	if (copy_to_user(data, fprog->filter, bpf_classic_proglen(fprog))) {
+		ret = -EFAULT;
+		goto out_task;
+	}
+
+out_task:
+	spin_unlock_irq(&task->sighand->siglock);
+	return ret;
+}
+#endif
-- 
2.5.0



* Re: [PATCH v8] seccomp, ptrace: add support for dumping seccomp filters
@ 2015-10-20 20:20     ` Oleg Nesterov
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Nesterov @ 2015-10-20 20:20 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	linux-api

On 10/20, Tycho Andersen wrote:
>
> +long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
> +			void __user *data)
> +{
> +	struct seccomp_filter *filter;
> +	struct sock_fprog_kern *fprog;
> +	long ret;
> +	unsigned long count = 0;
> +
> +	if (!capable(CAP_SYS_ADMIN) ||
> +	    current->seccomp.mode != SECCOMP_MODE_DISABLED) {
> +		return -EACCES;
> +	}
> +
> +	spin_lock_irq(&task->sighand->siglock);
> +	if (task->seccomp.mode != SECCOMP_MODE_FILTER) {
> +		ret = -EINVAL;
> +		goto out_task;
> +	}
> +
> +	filter = task->seccomp.filter;
> +	while (filter) {
> +		filter = filter->prev;
> +		count++;
> +	}
> +
> +	if (filter_off >= count) {
> +		ret = -ENOENT;
> +		goto out_task;
> +	}
> +	count -= filter_off;
> +
> +	filter = task->seccomp.filter;
> +	while (filter && count > 1) {
> +		filter = filter->prev;
> +		count--;
> +	}
> +
> +	if (WARN_ON(count != 1)) {
> +		/* The filter tree shouldn't shrink while we're using it. */
> +		ret = -ENOENT;
> +		goto out_task;
> +	}
> +
> +	fprog = filter->prog->orig_prog;
> +	if (!fprog) {
> +		/* This must be a new non-cBPF filter, since we save
> +		 * every cBPF filter's orig_prog above when
> +		 * CONFIG_CHECKPOINT_RESTORE is enabled.
> +		 */
> +		ret = -EMEDIUMTYPE;
> +		goto out_task;
> +	}
> +
> +	ret = fprog->len;
> +	if (!data)
> +		goto out_task;
> +
> +	if (copy_to_user(data, fprog->filter, bpf_classic_proglen(fprog))) {
> +		ret = -EFAULT;
> +		goto out_task;
> +	}

Oh wait, I didn't notice this when I looked at v7.

No, you can't do copy_to_user() from atomic context. You need to pin this
filter, drop the lock/irq, then copy_to_user().

Oleg.
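
The pin / unlock / copy ordering described above can be sketched as a
userspace analogy. This is not kernel code: the mutex stands in for ->siglock,
the reference count stands in for pinning the filter, and every name here
(struct blob, blob_install(), blob_read(), blob_put()) is hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted object standing in for a seccomp filter. */
struct blob {
    atomic_int refs;
    size_t len;
    unsigned char data[64];
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct blob *current_blob;   /* protected by table_lock */

/* Drop one reference; the last holder frees the object. */
static void blob_put(struct blob *b)
{
    if (atomic_fetch_sub(&b->refs, 1) == 1)
        free(b);
}

/* Publish a new blob and release the table's reference to the old one. */
int blob_install(const void *src, size_t len)
{
    struct blob *b, *old;

    if (len > sizeof(b->data))
        return -1;
    b = malloc(sizeof(*b));
    if (!b)
        return -1;
    atomic_init(&b->refs, 1);   /* reference held by the table */
    b->len = len;
    memcpy(b->data, src, len);

    pthread_mutex_lock(&table_lock);
    old = current_blob;
    current_blob = b;
    pthread_mutex_unlock(&table_lock);

    if (old)
        blob_put(old);
    return 0;
}

/* Copy the current blob out; the lock covers only the lookup and the pin. */
long blob_read(void *dst, size_t cap)
{
    struct blob *b;
    long ret = -1;

    pthread_mutex_lock(&table_lock);
    b = current_blob;
    if (b)
        atomic_fetch_add(&b->refs, 1);   /* pin before dropping the lock */
    pthread_mutex_unlock(&table_lock);
    if (!b)
        return -1;

    /* The potentially slow copy runs with no lock held. */
    if (b->len <= cap) {
        memcpy(dst, b->data, b->len);
        ret = (long)b->len;
    }

    blob_put(b);   /* unpin */
    return ret;
}
```

The point of the ordering is that the object cannot be freed between the
unlock and the copy, because the reader holds its own reference for that
window.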



* Re: [PATCH v8] seccomp, ptrace: add support for dumping seccomp filters
  2015-10-20 20:20     ` Oleg Nesterov
  (?)
@ 2015-10-20 20:26     ` Kees Cook
  2015-10-20 20:37       ` Tycho Andersen
  -1 siblings, 1 reply; 27+ messages in thread
From: Kees Cook @ 2015-10-20 20:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tycho Andersen, Alexei Starovoitov, Will Drewry, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Linux API

On Tue, Oct 20, 2015 at 1:20 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 10/20, Tycho Andersen wrote:
>>
>> +long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
>> +                     void __user *data)
>> +{
>> +     struct seccomp_filter *filter;
>> +     struct sock_fprog_kern *fprog;
>> +     long ret;
>> +     unsigned long count = 0;
>> +
>> +     if (!capable(CAP_SYS_ADMIN) ||
>> +         current->seccomp.mode != SECCOMP_MODE_DISABLED) {
>> +             return -EACCES;
>> +     }
>> +
>> +     spin_lock_irq(&task->sighand->siglock);
>> +     if (task->seccomp.mode != SECCOMP_MODE_FILTER) {
>> +             ret = -EINVAL;
>> +             goto out_task;
>> +     }
>> +
>> +     filter = task->seccomp.filter;
>> +     while (filter) {
>> +             filter = filter->prev;
>> +             count++;
>> +     }
>> +
>> +     if (filter_off >= count) {
>> +             ret = -ENOENT;
>> +             goto out_task;
>> +     }
>> +     count -= filter_off;
>> +
>> +     filter = task->seccomp.filter;
>> +     while (filter && count > 1) {
>> +             filter = filter->prev;
>> +             count--;
>> +     }
>> +
>> +     if (WARN_ON(count != 1)) {
>> +             /* The filter tree shouldn't shrink while we're using it. */
>> +             ret = -ENOENT;
>> +             goto out_task;
>> +     }
>> +
>> +     fprog = filter->prog->orig_prog;
>> +     if (!fprog) {
>> +             /* This must be a new non-cBPF filter, since we save
>> +              * every cBPF filter's orig_prog above when
>> +              * CONFIG_CHECKPOINT_RESTORE is enabled.
>> +              */
>> +             ret = -EMEDIUMTYPE;
>> +             goto out_task;
>> +     }
>> +
>> +     ret = fprog->len;
>> +     if (!data)
>> +             goto out_task;
>> +
>> +     if (copy_to_user(data, fprog->filter, bpf_classic_proglen(fprog))) {
>> +             ret = -EFAULT;
>> +             goto out_task;
>> +     }
>
> Oh wait, I didn't notice this when I looked at v7.
>
> No, you can't do copy_to_user() from atomic context. You need to pin this
> filter, drop the lock/irq, then copy_to_user().

Which CONFIGs would yell about this? CONFIG_DEBUG_ATOMIC_SLEEP?

-Kees

-- 
Kees Cook
Chrome OS Security


* Re: [PATCH v8] seccomp, ptrace: add support for dumping seccomp filters
  2015-10-20 20:26     ` Kees Cook
@ 2015-10-20 20:37       ` Tycho Andersen
  0 siblings, 0 replies; 27+ messages in thread
From: Tycho Andersen @ 2015-10-20 20:37 UTC (permalink / raw)
  To: Kees Cook
  Cc: Oleg Nesterov, Alexei Starovoitov, Will Drewry, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Linux API

On Tue, Oct 20, 2015 at 01:26:01PM -0700, Kees Cook wrote:
> On Tue, Oct 20, 2015 at 1:20 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > Oh wait, I didn't notice this when I looked at v7.
> >
> > No, you can't do copy_to_user() from atomic context. You need to pin this
> > filter, drop the lock/irq, then copy_to_user().
> 
> Which CONFIGs would yell about this? CONFIG_DEBUG_ATOMIC_SLEEP?

Yep, it seems to:

Oct 20 14:35:55 kernel kernel: [   17.879492] BUG: sleeping function called from invalid context at ./arch/x86/include/asm/uaccess.h:732
Oct 20 14:35:55 kernel kernel: [   17.880925] in_atomic(): 1, irqs_disabled(): 1, pid: 2023, name: criu
Oct 20 14:35:55 kernel kernel: [   17.881913] CPU: 2 PID: 2023 Comm: criu Not tainted 4.3.0-rc3+ #11
Oct 20 14:35:55 kernel kernel: [   17.881915] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Oct 20 14:35:55 kernel kernel: [   17.881916]  00000000000002dc ffff880078ca7dc8 ffffffff8133605d ffff880078819780
Oct 20 14:35:55 kernel kernel: [   17.881920]  ffff880078ca7de0 ffffffff81077b3f ffffffff81baf4a0 ffff880078ca7e08
Oct 20 14:35:55 kernel kernel: [   17.881922]  ffffffff81077bc4 00007ffe1570c420 ffff8800798bb460 0000000000000004
Oct 20 14:35:55 kernel kernel: [   17.881924] Call Trace:
Oct 20 14:35:55 kernel kernel: [   17.881932]  [<ffffffff8133605d>] dump_stack+0x4b/0x6e
Oct 20 14:35:55 kernel kernel: [   17.881940]  [<ffffffff81077b3f>] ___might_sleep+0xcf/0x110
Oct 20 14:35:55 kernel kernel: [   17.881943]  [<ffffffff81077bc4>] __might_sleep+0x44/0x80
Oct 20 14:35:55 kernel kernel: [   17.881950]  [<ffffffff8114b2f2>] __might_fault+0x32/0x40
Oct 20 14:35:55 kernel kernel: [   17.881956]  [<ffffffff810eadc5>] seccomp_get_filter+0x115/0x170
Oct 20 14:35:55 kernel kernel: [   17.881961]  [<ffffffff8105f483>] ptrace_request+0x73/0x5d0
Oct 20 14:35:55 kernel kernel: [   17.881969]  [<ffffffff81182698>] ? __fput+0x188/0x1f0
Oct 20 14:35:55 kernel kernel: [   17.881980]  [<ffffffff818ddf29>] ? _raw_spin_unlock_irqrestore+0x9/0x10
Oct 20 14:35:55 kernel kernel: [   17.881983]  [<ffffffff8107c30c>] ? wait_task_inactive+0xfc/0x1f0
Oct 20 14:35:55 kernel kernel: [   17.881986]  [<ffffffff810490ea>] ? __do_page_fault+0x1ca/0x410
Oct 20 14:35:55 kernel kernel: [   17.881990]  [<ffffffff81011d94>] arch_ptrace+0x2a4/0x320
Oct 20 14:35:55 kernel kernel: [   17.881993]  [<ffffffff8105f32a>] SyS_ptrace+0x7a/0x100
Oct 20 14:35:55 kernel kernel: [   17.881996]  [<ffffffff818de4ae>] entry_SYSCALL_64_fastpath+0x12/0x71

Thanks, Oleg. I'll make the change and re-send.

Tycho


* Re: [PATCH v8] seccomp, ptrace: add support for dumping seccomp filters
  2015-10-20 20:20     ` Oleg Nesterov
  (?)
  (?)
@ 2015-10-20 22:08     ` Tycho Andersen
  2015-10-21 18:51         ` Oleg Nesterov
  -1 siblings, 1 reply; 27+ messages in thread
From: Tycho Andersen @ 2015-10-20 22:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	linux-api

[-- Attachment #1: Type: text/plain, Size: 257 bytes --]

Hi Kees, Oleg,

On Tue, Oct 20, 2015 at 10:20:24PM +0200, Oleg Nesterov wrote:
>
> No, you can't do copy_to_user() from atomic context. You need to pin this
> filter, drop the lock/irq, then copy_to_user().

Attached is a patch which addresses this.

Tycho

[-- Attachment #2: 0001-seccomp-ptrace-add-support-for-dumping-seccomp-filte.patch --]
[-- Type: text/x-diff, Size: 6353 bytes --]

From 5be84ece99312304df2a61f727355ffd3fc604da Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tycho.andersen@canonical.com>
Date: Fri, 2 Oct 2015 18:49:43 -0600
Subject: [PATCH v9] seccomp, ptrace: add support for dumping seccomp filters

This patch adds support for dumping a process' (classic BPF) seccomp
filters via ptrace.

PTRACE_SECCOMP_GET_FILTER allows the tracer to dump the user's classic BPF
seccomp filters. addr should be an integer which represents the ith seccomp
filter (0 is the most recently installed filter). data should be a struct
sock_filter * with enough room for the ith filter, or NULL, in which case
the filter is not saved. The return value for this command is the number of
BPF instructions the program represents, or negative in the case of errors.
Command-specific errors are ENOENT, which indicates that there is no ith
filter in this seccomp tree, and EMEDIUMTYPE, which indicates that the ith
filter was not installed as a classic BPF filter.

A caveat with this approach is that there is no way to get explicitly at
the hierarchy of seccomp filters, and users need to memcmp() filters to
decide which are inherited. This means that a task which installs two of
the same filter can potentially confuse users of this interface.

v2: * make save_orig const
    * check that the orig_prog exists (not necessary right now, but when
       seccomp grows eBPF support it will be)
    * s/n/filter_off and make it an unsigned long to match ptrace
    * count "down" the tree instead of "up" when passing a filter offset

v3: * don't take the current task's lock for inspecting its seccomp mode
    * use a 0x42** constant for the ptrace command value

v4: * don't copy to userspace while holding spinlocks

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/seccomp.h     | 11 +++++++
 include/uapi/linux/ptrace.h |  2 ++
 kernel/ptrace.c             |  5 +++
 kernel/seccomp.c            | 76 ++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index f426503..2296e6b 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -95,4 +95,15 @@ static inline void get_seccomp_filter(struct task_struct *tsk)
 	return;
 }
 #endif /* CONFIG_SECCOMP_FILTER */
+
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
+extern long seccomp_get_filter(struct task_struct *task,
+			       unsigned long filter_off, void __user *data);
+#else
+static inline long seccomp_get_filter(struct task_struct *task,
+				      unsigned long n, void __user *data)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_SECCOMP_FILTER && CONFIG_CHECKPOINT_RESTORE */
 #endif /* _LINUX_SECCOMP_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index a7a6979..fb81065 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -64,6 +64,8 @@ struct ptrace_peeksiginfo_args {
 #define PTRACE_GETSIGMASK	0x420a
 #define PTRACE_SETSIGMASK	0x420b
 
+#define PTRACE_SECCOMP_GET_FILTER	0x420c
+
 /* Read signals from a shared (process wide) queue */
 #define PTRACE_PEEKSIGINFO_SHARED	(1 << 0)
 
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 787320d..b760bae 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1016,6 +1016,11 @@ int ptrace_request(struct task_struct *child, long request,
 		break;
 	}
 #endif
+
+	case PTRACE_SECCOMP_GET_FILTER:
+		ret = seccomp_get_filter(child, addr, datavp);
+		break;
+
 	default:
 		break;
 	}
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 06858a7..c922805b 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -347,6 +347,7 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 {
 	struct seccomp_filter *sfilter;
 	int ret;
+	const bool save_orig = config_enabled(CONFIG_CHECKPOINT_RESTORE);
 
 	if (fprog->len == 0 || fprog->len > BPF_MAXINSNS)
 		return ERR_PTR(-EINVAL);
@@ -370,7 +371,7 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 		return ERR_PTR(-ENOMEM);
 
 	ret = bpf_prog_create_from_user(&sfilter->prog, fprog,
-					seccomp_check_filter, false);
+					seccomp_check_filter, save_orig);
 	if (ret < 0) {
 		kfree(sfilter);
 		return ERR_PTR(ret);
@@ -867,3 +868,76 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 	/* prctl interface doesn't have flags, so they are always zero. */
 	return do_seccomp(op, 0, uargs);
 }
+
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
+long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
+			void __user *data)
+{
+	struct seccomp_filter *filter;
+	struct sock_fprog_kern *fprog;
+	long ret;
+	unsigned long count = 0;
+
+	if (!capable(CAP_SYS_ADMIN) ||
+	    current->seccomp.mode != SECCOMP_MODE_DISABLED) {
+		return -EACCES;
+	}
+
+	spin_lock_irq(&task->sighand->siglock);
+	if (task->seccomp.mode != SECCOMP_MODE_FILTER) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	filter = task->seccomp.filter;
+	while (filter) {
+		filter = filter->prev;
+		count++;
+	}
+
+	if (filter_off >= count) {
+		ret = -ENOENT;
+		goto out;
+	}
+	count -= filter_off;
+
+	filter = task->seccomp.filter;
+	while (filter && count > 1) {
+		filter = filter->prev;
+		count--;
+	}
+
+	if (WARN_ON(count != 1)) {
+		/* The filter tree shouldn't shrink while we're using it. */
+		ret = -ENOENT;
+		goto out;
+	}
+
+	fprog = filter->prog->orig_prog;
+	if (!fprog) {
+		/* This must be a new non-cBPF filter, since we save
+		 * every cBPF filter's orig_prog above when
+		 * CONFIG_CHECKPOINT_RESTORE is enabled.
+		 */
+		ret = -EMEDIUMTYPE;
+		goto out;
+	}
+
+	ret = fprog->len;
+	if (!data)
+		goto out;
+
+	get_seccomp_filter(task);
+	spin_unlock_irq(&task->sighand->siglock);
+
+	if (copy_to_user(data, fprog->filter, bpf_classic_proglen(fprog)))
+		ret = -EFAULT;
+
+	put_seccomp_filter(task);
+	return ret;
+
+out:
+	spin_unlock_irq(&task->sighand->siglock);
+	return ret;
+}
+#endif
-- 
2.5.0



* Re: [PATCH v8] seccomp, ptrace: add support for dumping seccomp filters
  2015-10-20 22:08     ` Tycho Andersen
@ 2015-10-21 18:51         ` Oleg Nesterov
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Nesterov @ 2015-10-21 18:51 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	linux-api

On 10/20, Tycho Andersen wrote:
>
> Hi Kees, Oleg,
>
> On Tue, Oct 20, 2015 at 10:20:24PM +0200, Oleg Nesterov wrote:
> >
> > No, you can't do copy_to_user() from atomic context. You need to pin this
> > filter, drop the lock/irq, then copy_to_user().
>
> Attached is a patch which addresses this.

Looks good to me, feel free to add my reviewed-by.


a couple of questions, I am just curious...

> +long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
> +			void __user *data)
> +{
> +	struct seccomp_filter *filter;
> +	struct sock_fprog_kern *fprog;
> +	long ret;
> +	unsigned long count = 0;
> +
> +	if (!capable(CAP_SYS_ADMIN) ||
> +	    current->seccomp.mode != SECCOMP_MODE_DISABLED) {
> +		return -EACCES;
> +	}
> +
> +	spin_lock_irq(&task->sighand->siglock);
> +	if (task->seccomp.mode != SECCOMP_MODE_FILTER) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	filter = task->seccomp.filter;
> +	while (filter) {
> +		filter = filter->prev;
> +		count++;
> +	}
> +
> +	if (filter_off >= count) {
> +		ret = -ENOENT;
> +		goto out;
> +	}
> +	count -= filter_off;
> +
> +	filter = task->seccomp.filter;
> +	while (filter && count > 1) {
> +		filter = filter->prev;
> +		count--;
> +	}
> +
> +	if (WARN_ON(count != 1)) {
> +		/* The filter tree shouldn't shrink while we're using it. */
> +		ret = -ENOENT;

Yes, but this looks a bit confusing. If we want this WARN_ON() check
because we are paranoid, then we should do

	WARN_ON(count != 1 || filter);

And "while we're using it" looks misleading, we rely on ->siglock.

Plus, if the tree could shrink, the additional check can't help anyway:
we could use the freed filter. So I don't really understand this check
and "filter != NULL" in the previous "while (filter && count > 1)".
Never mind...

The question is:

> +	fprog = filter->prog->orig_prog;
> +	if (!fprog) {

So is it possible or not? I didn't see the previous changes which
added "bool save" to seccomp_attach_filter() so I simply can't know.

Now,

> +		/* This must be a new non-cBPF filter, since we save
> +		 * every cBPF filter's orig_prog above when
> +		 * CONFIG_CHECKPOINT_RESTORE is enabled.
> +		 */
> +		ret = -EMEDIUMTYPE;

If this is possible, then probably we should simply change both
"while (filter)" loops above to skip a filter if orig_prog == NULL
and remove the -EMEDIUMTYPE code ?

Or what? Probably "a new non-cBPF filter" answers my question,
but I do not know what this cBPF/non-cBPF actually means ;)

In short. Who can attach a filter without "save => true" ?

Oleg.
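
For anyone who wants to poke at the counting logic discussed above, here is a
small userspace sketch that replays the two loops from the patch on a plain
linked list. struct node and pick() are made-up names for illustration only;
nothing here is kernel code, and the sketch assumes the chain cannot change
during the walk (which is what ->siglock provides in the patch).

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct seccomp_filter: only the prev link matters here. */
struct node {
	struct node *prev;
};

/*
 * Replays the counting logic from the patch on a plain linked list.
 * Returns the selected node, or NULL where the patch returns -ENOENT.
 */
static struct node *pick(struct node *newest, unsigned long filter_off)
{
	struct node *n;
	unsigned long count = 0;

	/* First pass: count every node reachable through ->prev. */
	for (n = newest; n; n = n->prev)
		count++;

	if (filter_off >= count)
		return NULL;
	count -= filter_off;

	/* Second pass: no NULL test, count never exceeds the chain length. */
	for (n = newest; count > 1; n = n->prev)
		count--;

	/* With a stable chain, the paranoid condition can never be true. */
	assert(count == 1 && n != NULL);
	return n;
}
```

On a three-node chain a <- b <- c (c installed last), pick(&c, 2) returns &c
and pick(&c, 0) returns &a, so in this sketch offset 0 lands on the node at
the far end of the prev chain. That may be worth comparing against the
description of index 0 in the commit message.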


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8] seccomp, ptrace: add support for dumping seccomp filters
  2015-10-21 18:51         ` Oleg Nesterov
  (?)
@ 2015-10-21 19:15         ` Tycho Andersen
  2015-10-21 20:12             ` Kees Cook
  2015-10-21 21:07             ` Oleg Nesterov
  -1 siblings, 2 replies; 27+ messages in thread
From: Tycho Andersen @ 2015-10-21 19:15 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	linux-api

Hi Oleg,

On Wed, Oct 21, 2015 at 08:51:46PM +0200, Oleg Nesterov wrote:
> On 10/20, Tycho Andersen wrote:
> >
> > Hi Kees, Oleg,
> >
> > On Tue, Oct 20, 2015 at 10:20:24PM +0200, Oleg Nesterov wrote:
> > >
> > > No, you can't do copy_to_user() from atomic context. You need to pin this
> > > filter, drop the lock/irq, then copy_to_user().
> >
> > Attached is a patch which addresses this.
> 
> Looks good to me, feel free to add my reviewed-by.
> 
> 
> a couple of questions, I am just curious...
> 
> > +long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
> > +			void __user *data)
> > +{
> > +	struct seccomp_filter *filter;
> > +	struct sock_fprog_kern *fprog;
> > +	long ret;
> > +	unsigned long count = 0;
> > +
> > +	if (!capable(CAP_SYS_ADMIN) ||
> > +	    current->seccomp.mode != SECCOMP_MODE_DISABLED) {
> > +		return -EACCES;
> > +	}
> > +
> > +	spin_lock_irq(&task->sighand->siglock);
> > +	if (task->seccomp.mode != SECCOMP_MODE_FILTER) {
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	filter = task->seccomp.filter;
> > +	while (filter) {
> > +		filter = filter->prev;
> > +		count++;
> > +	}
> > +
> > +	if (filter_off >= count) {
> > +		ret = -ENOENT;
> > +		goto out;
> > +	}
> > +	count -= filter_off;
> > +
> > +	filter = task->seccomp.filter;
> > +	while (filter && count > 1) {
> > +		filter = filter->prev;
> > +		count--;
> > +	}
> > +
> > +	if (WARN_ON(count != 1)) {
> > +		/* The filter tree shouldn't shrink while we're using it. */
> > +		ret = -ENOENT;
> 
> Yes, but this looks a bit confusing. If we want this WARN_ON() check
> because we are paranoid, then we should do
> 
> 	WARN_ON(count != 1 || filter);

I guess you mean !filter here? We want filter to be non-null, because
we use it later.

> And "while we're using it" looks misleading, we rely on ->siglock.
> 
> Plus, if the tree could shrink, the additional check can't help anyway:
> we could use the freed filter. So I don't really understand this check
> and "filter != NULL" in the previous "while (filter && count > 1)".
> Never mind...

Just paranoia. You're right that we could get rid of WARN_ON and the
null check. I can send an updated patch to drop these bits if
necessary. Kees?

> The question is:
> 
> > +	fprog = filter->prog->orig_prog;
> > +	if (!fprog) {
> 
> So is it possible or not? I didn't see the previous changes which
> added "bool save" to seccomp_attach_filter() so I simply can't know.

Currently, no, it's not. Every struct seccomp_filter is created via a
classic filter.

> Now,
> 
> > +		/* This must be a new non-cBPF filter, since we save
> > +		 * every cBPF filter's orig_prog above when
> > +		 * CONFIG_CHECKPOINT_RESTORE is enabled.
> > +		 */
> > +		ret = -EMEDIUMTYPE;
> 
> If this is possible, then probably we should simply change both
> "while (filter)" loops above to skip a filter if orig_prog == NULL
> and remove the -EMEDIUMTYPE code ?
> 
> Or what? Probably "a new non-cBPF filter" answers my question,
> but I do not know what this cBPF/non-cBPF actually means ;)
> 
> In short. Who can attach a filter without "save => true" ?

There are two kinds of BPF programs, a "classic" instruction set, and
an "extended" one (which has more features, like maps, that seccomp
probably wants to use someday). Right now, the kernel only supports
adding filters via the classic interface, which saves the orig_prog
and then converts it into the "extended" instruction set for internal
use in the kernel. This ptrace command just dumps the classic
programs.

In the future, if there exists a seccomp interface to add extended BPF
programs directly, they won't have an orig_prog, which will trigger
this error. We don't want to skip these filters because userspace has
no way to know that there is a filter there it couldn't dump. Instead,
we give EMEDIUMTYPE, so userspace knows to use whatever dumping
mechanism exists for this new filter type.

Tycho
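
For readers following along, a rough userspace sketch of how a tracer might
use the command as described in this thread: ask for the length with a NULL
buffer, allocate, then ask again for the instructions. dump_one_filter() is a
made-up helper name, and the fallback #define simply reuses the 0x420c value
from this patch. The sketch assumes the tracee is already attached and
stopped, and that the tracer has CAP_SYS_ADMIN and is not itself running
under seccomp.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <linux/filter.h>

#ifndef PTRACE_SECCOMP_GET_FILTER
#define PTRACE_SECCOMP_GET_FILTER 0x420c	/* value taken from this patch */
#endif

/*
 * Hypothetical helper: dump filter number idx of an attached, stopped tracee.
 * Returns the instruction count and stores a malloc'ed program in *out, or
 * returns a negative errno: -ENOENT when there is no filter at idx,
 * -EMEDIUMTYPE when the filter has no classic BPF form to dump.
 */
static long dump_one_filter(pid_t pid, unsigned long idx,
			    struct sock_filter **out)
{
	struct sock_filter *buf;
	long len, got;

	assert(out != NULL);
	*out = NULL;

	/* First call: NULL data, so only the length is returned. */
	len = ptrace(PTRACE_SECCOMP_GET_FILTER, pid, (void *)idx, NULL);
	if (len < 0)
		return -errno;
	if (len == 0)
		return 0;

	buf = calloc((size_t)len, sizeof(*buf));
	if (!buf)
		return -ENOMEM;

	/* Second call: copy the classic BPF instructions into buf. */
	got = ptrace(PTRACE_SECCOMP_GET_FILTER, pid, (void *)idx, buf);
	if (got < 0 || got > len) {
		int err = got < 0 ? errno : EINVAL;

		free(buf);
		return -err;
	}

	*out = buf;
	return got;
}
```

A caller would loop over idx starting at 0 until it sees -ENOENT, and treat
-EMEDIUMTYPE as "a filter exists here, but it needs some other dumping
mechanism" rather than skipping it silently.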

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8] seccomp, ptrace: add support for dumping seccomp filters
  2015-10-21 19:15         ` Tycho Andersen
@ 2015-10-21 20:12             ` Kees Cook
  2015-10-21 21:07             ` Oleg Nesterov
  1 sibling, 0 replies; 27+ messages in thread
From: Kees Cook @ 2015-10-21 20:12 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Oleg Nesterov, Alexei Starovoitov, Will Drewry, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Linux API

On Wed, Oct 21, 2015 at 12:15 PM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> Hi Oleg,
>
> On Wed, Oct 21, 2015 at 08:51:46PM +0200, Oleg Nesterov wrote:
>> On 10/20, Tycho Andersen wrote:
>> >
>> > Hi Kees, Oleg,
>> >
>> > On Tue, Oct 20, 2015 at 10:20:24PM +0200, Oleg Nesterov wrote:
>> > >
>> > > No, you can't do copy_to_user() from atomic context. You need to pin this
>> > > filter, drop the lock/irq, then copy_to_user().
>> >
>> > Attached is a patch which addresses this.
>>
>> Looks good to me, feel free to add my reviewed-by.
>>
>>
>> a couple of questions, I am just curious...
>>
>> > +long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
>> > +                   void __user *data)
>> > +{
>> > +   struct seccomp_filter *filter;
>> > +   struct sock_fprog_kern *fprog;
>> > +   long ret;
>> > +   unsigned long count = 0;
>> > +
>> > +   if (!capable(CAP_SYS_ADMIN) ||
>> > +       current->seccomp.mode != SECCOMP_MODE_DISABLED) {
>> > +           return -EACCES;
>> > +   }
>> > +
>> > +   spin_lock_irq(&task->sighand->siglock);
>> > +   if (task->seccomp.mode != SECCOMP_MODE_FILTER) {
>> > +           ret = -EINVAL;
>> > +           goto out;
>> > +   }
>> > +
>> > +   filter = task->seccomp.filter;
>> > +   while (filter) {
>> > +           filter = filter->prev;
>> > +           count++;
>> > +   }
>> > +
>> > +   if (filter_off >= count) {
>> > +           ret = -ENOENT;
>> > +           goto out;
>> > +   }
>> > +   count -= filter_off;
>> > +
>> > +   filter = task->seccomp.filter;
>> > +   while (filter && count > 1) {
>> > +           filter = filter->prev;
>> > +           count--;
>> > +   }
>> > +
>> > +   if (WARN_ON(count != 1)) {
>> > +           /* The filter tree shouldn't shrink while we're using it. */
>> > +           ret = -ENOENT;
>>
>> Yes, but this looks a bit confusing. If we want this WARN_ON() check
>> because we are paranoid, then we should do
>>
>>       WARN_ON(count != 1 || filter);
>
> I guess you mean !filter here? We want filter to be non-null, because
> we use it later.
>
>> And "while we're using it" looks misleading, we rely on ->siglock.
>>
>> Plus, if the tree could shrink, the additional check can't help anyway:
>> we could use the freed filter. So I don't really understand this check
>> and "filter != NULL" in the previous "while (filter && count > 1)".
>> Never mind...
>
> Just paranoia. You're right that we could get rid of WARN_ON and the
> null check. I can send an updated patch to drop these bits if
> necessary. Kees?

I like being really paranoid when dealing with the filters. Let's keep
the WARN_ON (with the "|| !filter" added) but maybe wrap it in
"unlikely"?

>> The question is:
>>
>> > +   fprog = filter->prog->orig_prog;
>> > +   if (!fprog) {
>>
>> So is it possible or not? I didn't see the previous changes which
>> added "bool save" to seccomp_attach_filter() so I simply can't know.
>
> Currently, no, it's not. Every struct seccomp_filter is created via a
> classic filter.
>
>> Now,
>>
>> > +           /* This must be a new non-cBPF filter, since we save
>> > +            * every cBPF filter's orig_prog above when
>> > +            * CONFIG_CHECKPOINT_RESTORE is enabled.
>> > +            */
>> > +           ret = -EMEDIUMTYPE;
>>
>> If this is possible, then probably we should simply change both
>> "while (filter)" loops above to skip a filter if orig_prog == NULL
>> and remove the -EMEDIUMTYPE code ?
>>
>> Or what? Probably "a new non-cBPF filter" answers my question,
>> but I do not know what this cBPF/non-cBPF actually means ;)
>>
>> In short. Who can attach a filter without "save => true" ?
>
> There are two kinds of BPF programs, a "classic" instruction set, and
> an "extended" one (which has more features, like maps, that seccomp
> probably wants to use someday). Right now, the kernel only supports
> adding filters via the classic interface, which saves the orig_prog
> and then converts it into the "extended" instruction set for internal
> use in the kernel. This ptrace command just dumps the classic
> programs.
>
> In the future, if there exists a seccomp interface to add extended BPF
> programs directly, they won't have an orig_prog, which will trigger
> this error. We don't want to skip these filters because userspace has
> no way to know that there is a filter there it couldn't dump. Instead,
> we give EMEDIUMTYPE, so userspace knows to use whatever dumping
> mechanism exists for this new filter type.

These tests are for future-proofing, and I think we should keep them.

-Kees

>
> Tycho



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8] seccomp, ptrace: add support for dumping seccomp filters
  2015-10-21 20:12             ` Kees Cook
  (?)
@ 2015-10-21 20:18             ` Daniel Borkmann
  2015-10-21 20:37               ` Tycho Andersen
  -1 siblings, 1 reply; 27+ messages in thread
From: Daniel Borkmann @ 2015-10-21 20:18 UTC (permalink / raw)
  To: Kees Cook, Tycho Andersen
  Cc: Oleg Nesterov, Alexei Starovoitov, Will Drewry, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, LKML, Linux API

On 10/21/2015 10:12 PM, Kees Cook wrote:
> On Wed, Oct 21, 2015 at 12:15 PM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
>> Hi Oleg,
>>
>> On Wed, Oct 21, 2015 at 08:51:46PM +0200, Oleg Nesterov wrote:
>>> On 10/20, Tycho Andersen wrote:
>>>>
>>>> Hi Kees, Oleg,
>>>>
>>>> On Tue, Oct 20, 2015 at 10:20:24PM +0200, Oleg Nesterov wrote:
>>>>>
>>>>> No, you can't do copy_to_user() from atomic context. You need to pin this
>>>>> filter, drop the lock/irq, then copy_to_user().
>>>>
>>>> Attached is a patch which addresses this.
>>>
>>> Looks good to me, feel free to add my reviewed-by.
>>>
>>>
>>> a couple of questions, I am just curious...
>>>
>>>> +long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
>>>> +                   void __user *data)
>>>> +{
>>>> +   struct seccomp_filter *filter;
>>>> +   struct sock_fprog_kern *fprog;
>>>> +   long ret;
>>>> +   unsigned long count = 0;
>>>> +
>>>> +   if (!capable(CAP_SYS_ADMIN) ||
>>>> +       current->seccomp.mode != SECCOMP_MODE_DISABLED) {
>>>> +           return -EACCES;
>>>> +   }
>>>> +
>>>> +   spin_lock_irq(&task->sighand->siglock);
>>>> +   if (task->seccomp.mode != SECCOMP_MODE_FILTER) {
>>>> +           ret = -EINVAL;
>>>> +           goto out;
>>>> +   }
>>>> +
>>>> +   filter = task->seccomp.filter;
>>>> +   while (filter) {
>>>> +           filter = filter->prev;
>>>> +           count++;
>>>> +   }
>>>> +
>>>> +   if (filter_off >= count) {
>>>> +           ret = -ENOENT;
>>>> +           goto out;
>>>> +   }
>>>> +   count -= filter_off;
>>>> +
>>>> +   filter = task->seccomp.filter;
>>>> +   while (filter && count > 1) {
>>>> +           filter = filter->prev;
>>>> +           count--;
>>>> +   }
>>>> +
>>>> +   if (WARN_ON(count != 1)) {
>>>> +           /* The filter tree shouldn't shrink while we're using it. */
>>>> +           ret = -ENOENT;
>>>
>>> Yes, but this looks a bit confusing. If we want this WARN_ON() check
>>> because we are paranoid, then we should do
>>>
>>>        WARN_ON(count != 1 || filter);
>>
>> I guess you mean !filter here? We want filter to be non-null, because
>> we use it later.
>>
>>> And "while we're using it" looks misleading, we rely on ->siglock.
>>>
>>> Plus, if the tree could shrink, the additional check can't help anyway:
>>> we could use the freed filter. So I don't really understand this check
>>> and "filter != NULL" in the previous "while (filter && count > 1)".
>>> Never mind...
>>
>> Just paranoia. You're right that we could get rid of WARN_ON and the
>> null check. I can send an updated patch to drop these bits if
>> necessary. Kees?
>
> I like being really paranoid when dealing with the filters. Let's keep
> the WARN_ON (with the "|| !filter" added) but maybe wrap it in
> "unlikely"?

Btw, the conditions inside the WARN_ON() macro would already resolve
to unlikely().

Best,
Daniel
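
As a rough illustration of the point about unlikely(), here is a simplified
userspace rendition of the generic WARN_ON() shape. It is only a sketch: the
fprintf() stands in for the kernel's warning machinery, check() is a made-up
helper, and the macro relies on GNU statement expressions (gcc/clang).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Userspace stand-in for WARN_ON(): evaluates the condition once, reports
 * it when true, and yields the result already wrapped in unlikely().
 */
#define WARN_ON(condition) ({						\
	int __ret_warn_on = !!(condition);				\
	if (unlikely(__ret_warn_on))					\
		fprintf(stderr, "WARNING at %s:%d\n",			\
			__FILE__, __LINE__);				\
	unlikely(__ret_warn_on);					\
})

/* Made-up helper mirroring the check discussed in this thread. */
static int check(unsigned long count, const void *filter)
{
	if (WARN_ON(count != 1 || !filter))
		return -1;
	return 0;
}
```

Because the macro's value is already unlikely(...), writing
if (unlikely(WARN_ON(...))) at the call site would add nothing.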

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8] seccomp, ptrace: add support for dumping seccomp filters
  2015-10-21 20:18             ` Daniel Borkmann
@ 2015-10-21 20:37               ` Tycho Andersen
  0 siblings, 0 replies; 27+ messages in thread
From: Tycho Andersen @ 2015-10-21 20:37 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kees Cook, Oleg Nesterov, Alexei Starovoitov, Will Drewry,
	Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn, LKML,
	Linux API

[-- Attachment #1: Type: text/plain, Size: 425 bytes --]

On Wed, Oct 21, 2015 at 10:18:20PM +0200, Daniel Borkmann wrote:
> On 10/21/2015 10:12 PM, Kees Cook wrote:
> >
> >I like being really paranoid when dealing with the filters. Let's keep
> >the WARN_ON (with the "|| !filter" added) but maybe wrap it in
> >"unlikely"?
> 
> Btw, the conditions inside the WARN_ON() macro would already resolve
> to unlikely().

Here's an updated patch with the !filter as well.

Thanks,

Tycho

[-- Attachment #2: 0001-seccomp-ptrace-add-support-for-dumping-seccomp-filte.patch --]
[-- Type: text/x-diff, Size: 6445 bytes --]

From f37256a6f5e9e975943024ec0a26796a48521492 Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tycho.andersen@canonical.com>
Date: Fri, 2 Oct 2015 18:49:43 -0600
Subject: [PATCH] seccomp, ptrace: add support for dumping seccomp filters

This patch adds support for dumping a process' (classic BPF) seccomp
filters via ptrace.

PTRACE_SECCOMP_GET_FILTER allows the tracer to dump the user's classic BPF
seccomp filters. addr should be an integer which represents the ith seccomp
filter (0 is the most recently installed filter). data should be a struct
sock_filter * with enough room for the ith filter, or NULL, in which case
the filter is not saved. The return value for this command is the number of
BPF instructions the program represents, or negative in the case of errors.
Command-specific errors are ENOENT, which indicates that there is no ith
filter in this seccomp tree, and EMEDIUMTYPE, which indicates that the ith
filter was not installed as a classic BPF filter.

A caveat with this approach is that there is no way to get explicitly at
the hierarchy of seccomp filters, and users need to memcmp() filters to
decide which are inherited. This means that a task which installs two of
the same filter can potentially confuse users of this interface.

v2: * make save_orig const
    * check that the orig_prog exists (not necessary right now, but when
       seccomp grows eBPF support it will be)
    * s/n/filter_off and make it an unsigned long to match ptrace
    * count "down" the tree instead of "up" when passing a filter offset

v3: * don't take the current task's lock for inspecting its seccomp mode
    * use a 0x42** constant for the ptrace command value

v4: * don't copy to userspace while holding spinlocks

v5: * add another condition to WARN_ON

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/seccomp.h     | 11 +++++++
 include/uapi/linux/ptrace.h |  2 ++
 kernel/ptrace.c             |  5 +++
 kernel/seccomp.c            | 76 ++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index f426503..2296e6b 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -95,4 +95,15 @@ static inline void get_seccomp_filter(struct task_struct *tsk)
 	return;
 }
 #endif /* CONFIG_SECCOMP_FILTER */
+
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
+extern long seccomp_get_filter(struct task_struct *task,
+			       unsigned long filter_off, void __user *data);
+#else
+static inline long seccomp_get_filter(struct task_struct *task,
+				      unsigned long n, void __user *data)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_SECCOMP_FILTER && CONFIG_CHECKPOINT_RESTORE */
 #endif /* _LINUX_SECCOMP_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index a7a6979..fb81065 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -64,6 +64,8 @@ struct ptrace_peeksiginfo_args {
 #define PTRACE_GETSIGMASK	0x420a
 #define PTRACE_SETSIGMASK	0x420b
 
+#define PTRACE_SECCOMP_GET_FILTER	0x420c
+
 /* Read signals from a shared (process wide) queue */
 #define PTRACE_PEEKSIGINFO_SHARED	(1 << 0)
 
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 787320d..b760bae 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1016,6 +1016,11 @@ int ptrace_request(struct task_struct *child, long request,
 		break;
 	}
 #endif
+
+	case PTRACE_SECCOMP_GET_FILTER:
+		ret = seccomp_get_filter(child, addr, datavp);
+		break;
+
 	default:
 		break;
 	}
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 06858a7..580ac2d 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -347,6 +347,7 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 {
 	struct seccomp_filter *sfilter;
 	int ret;
+	const bool save_orig = config_enabled(CONFIG_CHECKPOINT_RESTORE);
 
 	if (fprog->len == 0 || fprog->len > BPF_MAXINSNS)
 		return ERR_PTR(-EINVAL);
@@ -370,7 +371,7 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 		return ERR_PTR(-ENOMEM);
 
 	ret = bpf_prog_create_from_user(&sfilter->prog, fprog,
-					seccomp_check_filter, false);
+					seccomp_check_filter, save_orig);
 	if (ret < 0) {
 		kfree(sfilter);
 		return ERR_PTR(ret);
@@ -867,3 +868,76 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 	/* prctl interface doesn't have flags, so they are always zero. */
 	return do_seccomp(op, 0, uargs);
 }
+
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
+long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
+			void __user *data)
+{
+	struct seccomp_filter *filter;
+	struct sock_fprog_kern *fprog;
+	long ret;
+	unsigned long count = 0;
+
+	if (!capable(CAP_SYS_ADMIN) ||
+	    current->seccomp.mode != SECCOMP_MODE_DISABLED) {
+		return -EACCES;
+	}
+
+	spin_lock_irq(&task->sighand->siglock);
+	if (task->seccomp.mode != SECCOMP_MODE_FILTER) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	filter = task->seccomp.filter;
+	while (filter) {
+		filter = filter->prev;
+		count++;
+	}
+
+	if (filter_off >= count) {
+		ret = -ENOENT;
+		goto out;
+	}
+	count -= filter_off;
+
+	filter = task->seccomp.filter;
+	while (filter && count > 1) {
+		filter = filter->prev;
+		count--;
+	}
+
+	if (WARN_ON(count != 1 || !filter)) {
+		/* The filter tree shouldn't shrink while we're using it. */
+		ret = -ENOENT;
+		goto out;
+	}
+
+	fprog = filter->prog->orig_prog;
+	if (!fprog) {
+		/* This must be a new non-cBPF filter, since we save
+		 * every cBPF filter's orig_prog above when
+		 * CONFIG_CHECKPOINT_RESTORE is enabled.
+		 */
+		ret = -EMEDIUMTYPE;
+		goto out;
+	}
+
+	ret = fprog->len;
+	if (!data)
+		goto out;
+
+	get_seccomp_filter(task);
+	spin_unlock_irq(&task->sighand->siglock);
+
+	if (copy_to_user(data, fprog->filter, bpf_classic_proglen(fprog)))
+		ret = -EFAULT;
+
+	put_seccomp_filter(task);
+	return ret;
+
+out:
+	spin_unlock_irq(&task->sighand->siglock);
+	return ret;
+}
+#endif
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v8] seccomp, ptrace: add support for dumping seccomp filters
  2015-10-21 19:15         ` Tycho Andersen
@ 2015-10-21 21:07             ` Oleg Nesterov
  2015-10-21 21:07             ` Oleg Nesterov
  1 sibling, 0 replies; 27+ messages in thread
From: Oleg Nesterov @ 2015-10-21 21:07 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	linux-api

On 10/21, Tycho Andersen wrote:
>
> Hi Oleg,
>
> On Wed, Oct 21, 2015 at 08:51:46PM +0200, Oleg Nesterov wrote:
> > > +
> > > +	if (WARN_ON(count != 1)) {
> > > +		/* The filter tree shouldn't shrink while we're using it. */
> > > +		ret = -ENOENT;
> >
> > Yes. but this looks a bit confusing. If we want this WARN_ON() check
> > because we are paranoid, then we should do
> >
> > 	WARN_ON(count != 1 || filter);
>
> I guess you mean !filter here? We want filter to be non-null, because
> we use it later.

Yes, yes, sorry for confusion. And (if we could race with shrink) it
could be NULL so this paranoid check is not complete.

> > And "while we're using it" looks misleading, we rely on ->siglock.
> >
> > Plus if we could be shrunk the additional check can't help anyway,
> > we can use the freed filter. So I don't really understand this check
> > and "filter != NULL" in the previous "while (filter && count > 1)".
> > Nevermind...
>
> Just paranoia. You're right that we could get rid of WARN_ON and the
> null check. I can send an updated patch to drop these bits if
> necessary. Kees?

Just in case, I am fine either way, this is minor.

> > In short. Who can attach a filter without "save => true" ?
>
> There are two kinds of BPF programs, a "classic" instruction set, and
> an "extended" one (which has more features, like maps, that seccomp
> probably wants to use someday). Right now, the kernel only supports
> adding filters via the classic interface, which saves the orig_prog
> and then converts it into the "extended" instruction set for internal
> use in the kernel. This ptrace command just dumps the classic
> programs.

OK,

> In the future, if there exists a seccomp interface to add extended BPF
> programs directly, they won't have an orig_prog, which will trigger
> this error.

Hmm. It is not clear to me why this "new" interface won't or can't save
orig_prog like we currently do. But this doesn't matter.

If we know that currently this is not possible why should we confuse the
reader? Can't we remove this code or turn it into WARN_ON(!orig_prog)
to make it clear?


And this leads to another question... If we expect that this interface
can change later, then perhaps PTRACE_SECCOMP_GET_FILTER should also
dump some header before copy_to_user(fprog->filter) ? Say, just
"unsigned long version" == 0 for now. So that we can avoid
PTRACE_SECCOMP_GET_FILTER_V2 in future.
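
A minimal sketch of how a tracer might consume such a header, assuming a layout the patch does not implement: one "unsigned long version" word followed directly by the classic BPF instructions. DUMP_VERSION_CBPF and parse_versioned_dump() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <linux/filter.h>	/* struct sock_filter */

/* HYPOTHETICAL: version 0 means "classic BPF instructions follow". */
#define DUMP_VERSION_CBPF 0UL

/* Parse a buffer holding a version word followed by `insns` classic BPF
 * instructions. On success returns 0 and points *prog at the instructions
 * inside buf; buf is assumed to be aligned for unsigned long. */
static int parse_versioned_dump(const void *buf, size_t buf_len, size_t insns,
				const struct sock_filter **prog)
{
	unsigned long version;

	assert(buf && prog);

	if (buf_len < sizeof(version) + insns * sizeof(struct sock_filter))
		return -EINVAL;

	memcpy(&version, buf, sizeof(version));
	if (version != DUMP_VERSION_CBPF)
		return -EOPNOTSUPP;	/* a format this reader does not know */

	*prog = (const struct sock_filter *)((const char *)buf + sizeof(version));
	return 0;
}
```

The point of the version word in this sketch is that an old reader fails cleanly on a newer format instead of misinterpreting the payload as classic BPF.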

Tycho, Kees, to clarify, it is not that I really think we should do this,
up to you. I am just asking.

Oleg.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8] seccomp, ptrace: add support for dumping seccomp filters
  2015-10-21 21:07             ` Oleg Nesterov
  (?)
@ 2015-10-21 21:20             ` Kees Cook
  -1 siblings, 0 replies; 27+ messages in thread
From: Kees Cook @ 2015-10-21 21:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tycho Andersen, Alexei Starovoitov, Will Drewry, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Linux API

On Wed, Oct 21, 2015 at 2:07 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 10/21, Tycho Andersen wrote:
>>
>> Hi Oleg,
>>
>> On Wed, Oct 21, 2015 at 08:51:46PM +0200, Oleg Nesterov wrote:
>> > > +
>> > > + if (WARN_ON(count != 1)) {
>> > > +         /* The filter tree shouldn't shrink while we're using it. */
>> > > +         ret = -ENOENT;
>> >
>> > Yes. but this looks a bit confusing. If we want this WARN_ON() check
>> > because we are paranoid, then we should do
>> >
>> >     WARN_ON(count != 1 || filter);
>>
>> I guess you mean !filter here? We want filter to be non-null, because
>> we use it later.
>
> Yes, yes, sorry for confusion. And (if we could race with shrink) it
> could be NULL so this paranoid check is not complete.
>
>> > And "while we're using it" looks misleading, we rely on ->siglock.
>> >
>> > Plus if we could be shrunk the additional check can't help anyway,
>> > we can use the freed filter. So I don't really understand this check
>> > and "filter != NULL" in the previous "while (filter && count > 1)".
>> > Nevermind...
>>
>> Just paranoia. You're right that we could get rid of WARN_ON and the
>> null check. I can send an updated patch to drop these bits if
>> necessary. Kees?
>
> Just in case, I am fine either way, this is minor.
>
>> > In short. Who can attach a filter without "save => true" ?
>>
>> There are two kinds of BPF programs, a "classic" instruction set, and
>> an "extended" one (which has more features, like maps, that seccomp
>> probably wants to use someday). Right now, the kernel only supports
>> adding filters via the classic interface, which saves the orig_prog
>> and then converts it into the "extended" instruction set for internal
>> use in the kernel. This ptrace command just dumps the classic
>> programs.
>
> OK,
>
>> In the future, if there exists a seccomp interface to add extended BPF
>> programs directly, they won't have an orig_prog, which will trigger
>> this error.
>
> Hmm. It is not clear to me why this "new" interface won't or can't save
> orig_prog like we currently do. But this doesn't matter.
>
> If we know that currently this is not possible why should we confuse the
> reader? Can't we remove this code or turn it into WARN_ON(!orig_prog)
> to make it clear?
>
>
> And this leads to another question... If we expect that this interface
> can change later, then perhaps PTRACE_SECCOMP_GET_FILTER should also
> dump some header before copy_to_user(fprog->filter) ? Say, just
> "unsigned long version" == 0 for now. So that we can avoid
> PTRACE_SECCOMP_GET_FILTER_V2 in future.
>
> Tycho, Kees, to clarify, it is not that I really think we should do this,
> up to you. I am just asking.

There is a long and painful thread on this and eBPF. The tl;dr is
mostly: anything in the future will be using eBPF, and that already
has the bpf syscall it would be using for its work.

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8] seccomp, ptrace: add support for dumping seccomp filters
  2015-10-21 21:07             ` Oleg Nesterov
  (?)
  (?)
@ 2015-10-21 21:33             ` Tycho Andersen
  2015-10-25 15:39                 ` Oleg Nesterov
  -1 siblings, 1 reply; 27+ messages in thread
From: Tycho Andersen @ 2015-10-21 21:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	linux-api

Hi Oleg,

On Wed, Oct 21, 2015 at 11:07:56PM +0200, Oleg Nesterov wrote:
> On 10/21, Tycho Andersen wrote:
> >
> > Hi Oleg,
> >
> > On Wed, Oct 21, 2015 at 08:51:46PM +0200, Oleg Nesterov wrote:
> > > > +
> > > > +	if (WARN_ON(count != 1)) {
> > > > +		/* The filter tree shouldn't shrink while we're using it. */
> > > > +		ret = -ENOENT;
> > >
> > > Yes. but this looks a bit confusing. If we want this WARN_ON() check
> > > because we are paranoid, then we should do
> > >
> > > 	WARN_ON(count != 1 || filter);
> >
> > I guess you mean !filter here? We want filter to be non-null, because
> > we use it later.
> 
> Yes, yes, sorry for confusion. And (if we could race with shrink) it
> could be NULL so this paranoid check is not complete.

Yep, but you're right that we shouldn't be able to race, so it is
mostly unnecessary.

> > > And "while we're using it" looks misleading, we rely on ->siglock.
> > >
> > > Plus if we could be shrunk the additional check can't help anyway,
> > > we can use the freed filter. So I don't really understand this check
> > > and "filter != NULL" in the previous "while (filter && count > 1)".
> > > Nevermind...
> >
> > Just paranoia. You're right that we could get rid of WARN_ON and the
> > null check. I can send an updated patch to drop these bits if
> > necessary. Kees?
> 
> Just in case, I am fine either way, this is minor.
> 
> > > In short. Who can attach a filter without "save => true" ?
> >
> > There are two kinds of BPF programs, a "classic" instruction set, and
> > an "extended" one (which has more features, like maps, that seccomp
> > probably wants to use someday). Right now, the kernel only supports
> > adding filters via the classic interface, which saves the orig_prog
> > and then converts it into the "extended" instruction set for internal
> > use in the kernel. This ptrace command just dumps the classic
> > programs.
> 
> OK,
> 
> > In the future, if there exists a seccomp interface to add extended BPF
> > programs directly, they won't have an orig_prog, which will trigger
> > this error.
> 
> Hmm. It is not clear to me why this "new" interface won't or can't save
> orig_prog like we currently do. But this doesn't matter.

orig_prog is the classic BPF format, but if we import a program
directly from userspace as extended BPF, there was no classic to
convert from, so no concept of "original" program in that sense.
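
The distinction can be sketched with a small userspace model. The struct and function names below are hypothetical and only loosely follow the kernel's; the model assumes that a program loaded through the classic interface keeps a saved copy of its instructions and that any other program has none.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <linux/filter.h>	/* struct sock_filter */

/* Hypothetical model of the saved classic program. */
struct model_orig {
	unsigned short len;		/* number of classic BPF instructions */
	struct sock_filter *insns;
};

/* Hypothetical model of a loaded program. */
struct model_prog {
	struct model_orig *orig;	/* NULL if never loaded as classic BPF */
};

/* Returns the instruction count, copying the instructions into dst when
 * dst is non-NULL, or -EMEDIUMTYPE when no classic original was saved. */
static long model_dump(const struct model_prog *prog, struct sock_filter *dst)
{
	assert(prog);

	if (!prog->orig)
		return -EMEDIUMTYPE;

	if (dst)
		memcpy(dst, prog->orig->insns,
		       prog->orig->len * sizeof(struct sock_filter));

	return prog->orig->len;
}
```

Passing a NULL dst gives the length-only query described in the commit message, so a caller can size its buffer before asking for the instructions.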

> If we know that currently this is not possible why should we confuse the
> reader? Can't we remove this code or turn it into WARN_ON(!orig_prog)
> to make it clear?

I guess to me it seems better to just future proof it, since we think
that this functionality may come soon, but I don't mind a WARN_ON
either.

> 
> And this leads to another question... If we expect that this interface
> can change later, then perhaps PTRACE_SECCOMP_GET_FILTER should also
> dump some header before copy_to_user(fprog->filter) ? Say, just
> "unsigned long version" == 0 for now. So that we can avoid
> PTRACE_SECCOMP_GET_FILTER_V2 in future.

So this is interesting. Like Kees mentioned, the bulk of the work
would be done by the bpf syscall. We'd still need some way to get
access to the fd itself, which we could (ab)use
PTRACE_SECCOMP_GET_FILTER for, by returning the fd + BPF_MAXINSNS (so
that it doesn't conflict with length) or something like that. Or add a
_V2 as you say. If there is some change we can make to have a nicer
interface than fd + BPF_MAXINSNS to future proof, I'm fine with making
it.
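
For illustration, a tracer could split such an overloaded return value as sketched below. The encoding is hypothetical and nothing in the patch implements it. One assumption differs from the wording above: a classic length can be as large as BPF_MAXINSNS, so a bare fd + BPF_MAXINSNS would make fd 0 collide with a maximum-length filter, and the sketch offsets fds by BPF_MAXINSNS + 1 instead.

```c
#include <assert.h>
#include <linux/filter.h>	/* BPF_MAXINSNS */

enum dump_kind { DUMP_ERROR, DUMP_CBPF_LEN, DUMP_EBPF_FD };

struct dump_result {
	enum dump_kind kind;
	long value;	/* negative errno, instruction count, or fd */
};

/* HYPOTHETICAL: first return value that denotes an fd, not a length. */
#define DUMP_FD_BASE ((long)BPF_MAXINSNS + 1)

/* What the kernel side would return for an fd under this encoding. */
static long encode_fd(int fd)
{
	assert(fd >= 0);
	return DUMP_FD_BASE + fd;
}

/* Classify a kernel-style return value (negative errno on failure). */
static struct dump_result decode_ret(long ret)
{
	struct dump_result r;

	if (ret < 0) {
		r.kind = DUMP_ERROR;
		r.value = ret;
	} else if (ret <= BPF_MAXINSNS) {
		r.kind = DUMP_CBPF_LEN;
		r.value = ret;
	} else {
		r.kind = DUMP_EBPF_FD;
		r.value = ret - DUMP_FD_BASE;
	}
	return r;
}
```

A separate request number avoids this arithmetic entirely, which is the trade-off against the _V2 option mentioned above.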

> Tycho, Kees, to clarify, it is not that I really think we should do this,
> up to you. I am just asking.

No problem, thanks for the reviews.

Tycho

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8] seccomp, ptrace: add support for dumping seccomp filters
  2015-10-21 21:33             ` Tycho Andersen
@ 2015-10-25 15:39                 ` Oleg Nesterov
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Nesterov @ 2015-10-25 15:39 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, Will Drewry, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	linux-api

On 10/21, Tycho Andersen wrote:
>
> > And this leads to another question... If we expect that this interface
> > can change later, then perhaps PTRACE_SECCOMP_GET_FILTER should also
> > dump some header before copy_to_user(fprog->filter) ? Say, just
> > "unsigned long version" == 0 for now. So that we can avoid
> > PTRACE_SECCOMP_GET_FILTER_V2 in future.
>
> So this is interesting. Like Kees mentioned, the bulk of the work
> would be done by the bpf syscall. We'd still need some way to get
> access to the fd itself, which we could (ab)use
> PTRACE_SECCOMP_GET_FILTER for, by returning the fd + BPF_MAXINSNS (so
> that it doesn't conflict with length) or something like that. Or add a
> _V2 as you say. If there is some change we can make to have a nicer
> interface than fd + BPF_MAXINSNS to future proof, I'm fine with making
> it.

Can't comment, this is up to you/Kees ;)

So, just in case, let me repeat I am fine with this patch.

Oleg.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8] seccomp, ptrace: add support for dumping seccomp filters
  2015-10-25 15:39                 ` Oleg Nesterov
  (?)
@ 2015-10-26  6:46                 ` Kees Cook
  2015-10-26  7:07                     ` Kees Cook
  -1 siblings, 1 reply; 27+ messages in thread
From: Kees Cook @ 2015-10-26  6:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tycho Andersen, Alexei Starovoitov, Will Drewry, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Linux API

On Mon, Oct 26, 2015 at 12:39 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 10/21, Tycho Andersen wrote:
>>
>> > And this leads to another question... If we expect that this interface
>> > can change later, then perhaps PTRACE_SECCOMP_GET_FILTER should also
>> > dump some header before copy_to_user(fprog->filter) ? Say, just
>> > "unsigned long version" == 0 for now. So that we can avoid
>> > PTRACE_SECCOMP_GET_FILTER_V2 in future.
>>
>> So this is interesting. Like Kees mentioned, the bulk of the work
>> would be done by the bpf syscall. We'd still need some way to get
>> access to the fd itself, which we could (ab)use
>> PTRACE_SECCOMP_GET_FILTER for, by returning the fd + BPF_MAXINSNS (so
>> that it doesn't conflict with length) or something like that. Or add a
>> _V2 as you say. If there is some change we can make to have a nicer
>> interface than fd + BPF_MAXINSNS to future proof, I'm fine with making
>> it.
>
> Can't comment, this is up to you/Kees ;)
>
> So, just in case, let me repeat I am fine with this patch.

Cool, thanks. I'll get this into my tree after kernel summit. Thanks
for suffering through all this Tycho!

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8] seccomp, ptrace: add support for dumping seccomp filters
@ 2015-10-26  7:07                     ` Kees Cook
  0 siblings, 0 replies; 27+ messages in thread
From: Kees Cook @ 2015-10-26  7:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tycho Andersen, Alexei Starovoitov, Will Drewry, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Linux API

On Mon, Oct 26, 2015 at 3:46 PM, Kees Cook <keescook@chromium.org> wrote:
> On Mon, Oct 26, 2015 at 12:39 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> On 10/21, Tycho Andersen wrote:
>>>
>>> > And this leads to another question... If we expect that this interface
>>> > can change later, then perhaps PTRACE_SECCOMP_GET_FILTER should also
>>> > dump some header before copy_to_user(fprog->filter) ? Say, just
>>> > "unsigned long version" == 0 for now. So that we can avoid
>>> > PTRACE_SECCOMP_GET_FILTER_V2 in future.
>>>
>>> So this is interesting. Like Kees mentioned, the bulk of the work
>>> would be done by the bpf syscall. We'd still need some way to get
>>> access to the fd itself, which we could (ab)use
>>> PTRACE_SECCOMP_GET_FILTER for, by returning the fd + BPF_MAXINSNS (so
>>> that it doesn't conflict with length) or something like that. Or add a
>>> _V2 as you say. If there is some change we can make to have a nicer
>>> interface than fd + BPF_MAXINSNS to future proof, I'm fine with making
>>> it.
>>
>> Can't comment, this is up to you/Kees ;)
>>
>> So, just in case, let me repeat I am fine with this patch.
>
> Cool, thanks. I'll get this into my tree after kernel summit. Thanks
> for suffering through all this Tycho!

Actually, since this depends on changes in net, could this get pulled
in from that direction?

Acked-by: Kees Cook <keescook@chromium.org>

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8] seccomp, ptrace: add support for dumping seccomp filters
  2015-10-26  7:07                     ` Kees Cook
  (?)
@ 2015-10-27  0:04                     ` Tycho Andersen
  2015-10-27  0:17                         ` Daniel Borkmann
  -1 siblings, 1 reply; 27+ messages in thread
From: Tycho Andersen @ 2015-10-27  0:04 UTC (permalink / raw)
  To: Kees Cook, David S. Miller
  Cc: Oleg Nesterov, Alexei Starovoitov, Will Drewry, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Linux API, netdev

[-- Attachment #1: Type: text/plain, Size: 473 bytes --]

Hi David,

On Mon, Oct 26, 2015 at 04:07:01PM +0900, Kees Cook wrote:
> On Mon, Oct 26, 2015 at 3:46 PM, Kees Cook <keescook@chromium.org> wrote:
> > Cool, thanks. I'll get this into my tree after kernel summit. Thanks
> > for suffering through all this Tycho!
> 
> Actually, since this depends on changes in net, could this get pulled
> in from that direction?
> 
> Acked-by: Kees Cook <keescook@chromium.org>

Can we get the attached patch into net-next?

Thanks,

Tycho

[-- Attachment #2: 0001-seccomp-ptrace-add-support-for-dumping-seccomp-filte.patch --]
[-- Type: text/x-diff, Size: 6441 bytes --]

From 5d9be66e4f48e0882a5546376380147f2f711bec Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tycho.andersen@canonical.com>
Date: Fri, 2 Oct 2015 18:49:43 -0600
Subject: [PATCH] seccomp, ptrace: add support for dumping seccomp filters

This patch adds support for dumping a process' (classic BPF) seccomp
filters via ptrace.

PTRACE_SECCOMP_GET_FILTER allows the tracer to dump the user's classic BPF
seccomp filters. addr should be an integer which represents the ith seccomp
filter (0 is the most recently installed filter). data should be a struct
sock_filter * with enough room for the ith filter, or NULL, in which case
the filter is not saved. The return value for this command is the number of
BPF instructions the program represents, or negative in the case of errors.
Command-specific errors are ENOENT, which indicates that there is no ith
filter in this seccomp tree, and EMEDIUMTYPE, which indicates that the ith
filter was not installed as a classic BPF filter.

A caveat with this approach is that there is no way to get explicitly at
the hierarchy of seccomp filters, and users need to memcmp() filters to
decide which are inherited. This means that a task which installs two of
the same filter can potentially confuse users of this interface.

v2: * make save_orig const
    * check that the orig_prog exists (not necessary right now, but when
       seccomp grows eBPF support it will be)
    * s/n/filter_off and make it an unsigned long to match ptrace
    * count "down" the tree instead of "up" when passing a filter offset

v3: * don't take the current task's lock for inspecting its seccomp mode
    * use a 0x42** constant for the ptrace command value

v4: * don't copy to userspace while holding spinlocks

v5: * add another condition to WARN_ON

v6: * rebase on net-next

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/seccomp.h     | 11 +++++++
 include/uapi/linux/ptrace.h |  2 ++
 kernel/ptrace.c             |  5 +++
 kernel/seccomp.c            | 76 ++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index f426503..2296e6b 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -95,4 +95,15 @@ static inline void get_seccomp_filter(struct task_struct *tsk)
 	return;
 }
 #endif /* CONFIG_SECCOMP_FILTER */
+
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
+extern long seccomp_get_filter(struct task_struct *task,
+			       unsigned long filter_off, void __user *data);
+#else
+static inline long seccomp_get_filter(struct task_struct *task,
+				      unsigned long n, void __user *data)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_SECCOMP_FILTER && CONFIG_CHECKPOINT_RESTORE */
 #endif /* _LINUX_SECCOMP_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index a7a6979..fb81065 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -64,6 +64,8 @@ struct ptrace_peeksiginfo_args {
 #define PTRACE_GETSIGMASK	0x420a
 #define PTRACE_SETSIGMASK	0x420b
 
+#define PTRACE_SECCOMP_GET_FILTER	0x420c
+
 /* Read signals from a shared (process wide) queue */
 #define PTRACE_PEEKSIGINFO_SHARED	(1 << 0)
 
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 787320d..b760bae 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1016,6 +1016,11 @@ int ptrace_request(struct task_struct *child, long request,
 		break;
 	}
 #endif
+
+	case PTRACE_SECCOMP_GET_FILTER:
+		ret = seccomp_get_filter(child, addr, datavp);
+		break;
+
 	default:
 		break;
 	}
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 06858a7..580ac2d 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -347,6 +347,7 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 {
 	struct seccomp_filter *sfilter;
 	int ret;
+	const bool save_orig = config_enabled(CONFIG_CHECKPOINT_RESTORE);
 
 	if (fprog->len == 0 || fprog->len > BPF_MAXINSNS)
 		return ERR_PTR(-EINVAL);
@@ -370,7 +371,7 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 		return ERR_PTR(-ENOMEM);
 
 	ret = bpf_prog_create_from_user(&sfilter->prog, fprog,
-					seccomp_check_filter, false);
+					seccomp_check_filter, save_orig);
 	if (ret < 0) {
 		kfree(sfilter);
 		return ERR_PTR(ret);
@@ -867,3 +868,76 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 	/* prctl interface doesn't have flags, so they are always zero. */
 	return do_seccomp(op, 0, uargs);
 }
+
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
+long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
+			void __user *data)
+{
+	struct seccomp_filter *filter;
+	struct sock_fprog_kern *fprog;
+	long ret;
+	unsigned long count = 0;
+
+	if (!capable(CAP_SYS_ADMIN) ||
+	    current->seccomp.mode != SECCOMP_MODE_DISABLED) {
+		return -EACCES;
+	}
+
+	spin_lock_irq(&task->sighand->siglock);
+	if (task->seccomp.mode != SECCOMP_MODE_FILTER) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	filter = task->seccomp.filter;
+	while (filter) {
+		filter = filter->prev;
+		count++;
+	}
+
+	if (filter_off >= count) {
+		ret = -ENOENT;
+		goto out;
+	}
+	count -= filter_off;
+
+	filter = task->seccomp.filter;
+	while (filter && count > 1) {
+		filter = filter->prev;
+		count--;
+	}
+
+	if (WARN_ON(count != 1 || !filter)) {
+		/* The filter tree shouldn't shrink while we're using it. */
+		ret = -ENOENT;
+		goto out;
+	}
+
+	fprog = filter->prog->orig_prog;
+	if (!fprog) {
+		/* This must be a new non-cBPF filter, since we save every
+		 * cBPF filter's orig_prog above when
+		 * CONFIG_CHECKPOINT_RESTORE is enabled.
+		 */
+		ret = -EMEDIUMTYPE;
+		goto out;
+	}
+
+	ret = fprog->len;
+	if (!data)
+		goto out;
+
+	get_seccomp_filter(task);
+	spin_unlock_irq(&task->sighand->siglock);
+
+	if (copy_to_user(data, fprog->filter, bpf_classic_proglen(fprog)))
+		ret = -EFAULT;
+
+	put_seccomp_filter(task);
+	return ret;
+
+out:
+	spin_unlock_irq(&task->sighand->siglock);
+	return ret;
+}
+#endif
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v8] seccomp, ptrace: add support for dumping seccomp filters
  2015-10-27  0:04                     ` Tycho Andersen
@ 2015-10-27  0:17                         ` Daniel Borkmann
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel Borkmann @ 2015-10-27  0:17 UTC (permalink / raw)
  To: Tycho Andersen, Kees Cook, David S. Miller
  Cc: Oleg Nesterov, Alexei Starovoitov, Will Drewry, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, LKML, Linux API, netdev

Hi Tycho,

On 10/27/2015 01:04 AM, Tycho Andersen wrote:
> On Mon, Oct 26, 2015 at 04:07:01PM +0900, Kees Cook wrote:
>> On Mon, Oct 26, 2015 at 3:46 PM, Kees Cook <keescook@chromium.org> wrote:
>>> Cool, thanks. I'll get this into my tree after kernel summit. Thanks
>>> for suffering through all this Tycho!
>>
>> Actually, since this depends on changes in net, could this get pulled
>> in from that direction?
>>
>> Acked-by: Kees Cook <keescook@chromium.org>
>
> Can we get the attached patch into net-next?

You need to make a fresh, formal submission of your patch to netdev,
not as an attachment (otherwise patchwork cannot properly pick it up).

Also, indicate the right tree in the subject as: [PATCH net-next] ...

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 27+ messages in thread
