* [PATCH v6 1/3] seccomp: kill the seccomp_t typedef @ 2012-01-28 22:11 Will Drewry 2012-01-28 22:11 ` [PATCH v6 2/3] seccomp_filters: system call filtering using BPF Will Drewry ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Will Drewry @ 2012-01-28 22:11 UTC (permalink / raw) To: linux-kernel Cc: keescook, john.johansen, serge.hallyn, coreyb, pmoore, eparis, djm, torvalds, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro, wad, luto, mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor, corbet, alan, indan, mcgrathr Replaces the seccomp_t typedef with seccomp_struct to match modern kernel style. Signed-off-by: Will Drewry <wad@chromium.org> --- include/linux/sched.h | 2 +- include/linux/seccomp.h | 10 ++++++---- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 4032ec1..288b5cb 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1418,7 +1418,7 @@ struct task_struct { uid_t loginuid; unsigned int sessionid; #endif - seccomp_t seccomp; + struct seccomp_struct seccomp; /* Thread group tracking */ u32 parent_exec_id; diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h index cc7a4e9..171ab66 100644 --- a/include/linux/seccomp.h +++ b/include/linux/seccomp.h @@ -7,7 +7,9 @@ #include <linux/thread_info.h> #include <asm/seccomp.h> -typedef struct { int mode; } seccomp_t; +struct seccomp_struct { + int mode; +}; extern void __secure_computing(int); static inline void secure_computing(int this_syscall) @@ -19,7 +21,7 @@ static inline void secure_computing(int this_syscall) extern long prctl_get_seccomp(void); extern long prctl_set_seccomp(unsigned long); -static inline int seccomp_mode(seccomp_t *s) +static inline int seccomp_mode(struct seccomp_struct *s) { return s->mode; } @@ -28,7 +30,7 @@ static inline int seccomp_mode(seccomp_t *s) 
#include <linux/errno.h> -typedef struct { } seccomp_t; +struct seccomp_struct { }; #define secure_computing(x) do { } while (0) @@ -42,7 +44,7 @@ static inline long prctl_set_seccomp(unsigned long arg2) return -EINVAL; } -static inline int seccomp_mode(seccomp_t *s) +static inline int seccomp_mode(struct seccomp_struct *s) { return 0; } -- 1.7.5.4
* [PATCH v6 2/3] seccomp_filters: system call filtering using BPF 2012-01-28 22:11 [PATCH v6 1/3] seccomp: kill the seccomp_t typedef Will Drewry @ 2012-01-28 22:11 ` Will Drewry 2012-01-31 14:13 ` Eduardo Otubo 2012-02-02 15:32 ` Serge E. Hallyn 2012-01-28 22:11 ` [PATCH v6 3/3] Documentation: prctl/seccomp_filter Will Drewry 2012-02-02 15:29 ` [PATCH v6 1/3] seccomp: kill the seccomp_t typedef Serge E. Hallyn 2 siblings, 2 replies; 13+ messages in thread From: Will Drewry @ 2012-01-28 22:11 UTC (permalink / raw) To: linux-kernel Cc: keescook, john.johansen, serge.hallyn, coreyb, pmoore, eparis, djm, torvalds, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro, wad, luto, mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor, corbet, alan, indan, mcgrathr [This patch depends on luto@mit.edu's no_new_privs patch: https://lkml.org/lkml/2012/1/12/446 ] This patch adds support for seccomp mode 2. This mode enables dynamic enforcement of system call filtering policy in the kernel as specified by a userland task. The policy is expressed in terms of a Berkeley Packet Filter program, as is used for userland-exposed socket filtering. Instead of network data, the BPF program is evaluated over struct seccomp_filter_data at the time of the system call. A filter program may be installed by a userland task by calling prctl(PR_ATTACH_SECCOMP_FILTER, &fprog); where fprog is of type struct sock_fprog. If the first filter program allows subsequent prctl(2) calls, then additional filter programs may be attached. All attached programs must be evaluated before a system call will be allowed to proceed. To avoid CONFIG_COMPAT related landmines, once a filter program is installed using specific is_compat_task() value, it is not allowed to make system calls using the alternate entry point. 
Filter programs will be inherited across fork/clone and execve, however the installation of filters must be preceded by setting 'no_new_privs' to ensure that unprivileged tasks cannot attach filters that affect privileged tasks (e.g., setuid binary). Tasks with CAP_SYS_ADMIN in their namespace may install inheritable filters without setting the no_new_privs bit. There are a number of benefits to this approach. A few of which are as follows: - BPF has been exposed to userland for a long time. - Userland already knows its ABI: system call numbers and desired arguments - No time-of-check-time-of-use vulnerable data accesses are possible. - system call arguments are loaded on demand only to minimize copying required for system call number-only policy decisions. This patch includes its own BPF evaluator, but relies on the net/core/filter.c BPF checking code. It is possible to share evaluators, but the performance sensitive nature of the network filtering path makes it an iterative optimization which (I think :) can be tackled separately via separate patchsets. (And at some point sharing BPF JIT code!) v6: - fix memory leak on attach compat check failure - require no_new_privs || CAP_SYS_ADMIN prior to filter installation. (luto@mit.edu) - s/seccomp_struct_/seccomp_/ for macros/functions (amwang@redhat.com) - cleaned up Kconfig (amwang@redhat.com) - on block, note if the call was compat (so the # means something) v5: - uses syscall_get_arguments (indan@nul.nu,oleg@redhat.com, mcgrathr@chromium.org) - uses union-based arg storage with hi/lo struct to handle endianness. Compromises between the two alternate proposals to minimize extra arg shuffling and account for endianness assuming userspace uses offsetof(). 
(mcgrathr@chromium.org, indan@nul.nu) - update Kconfig description - add include/seccomp_filter.h and add its installation - (naive) on-demand syscall argument loading - drop seccomp_t (eparis@redhat.com) v4: - adjusted prctl to make room for PR_[SG]ET_NO_NEW_PRIVS - now uses current->no_new_privs (luto@mit.edu,torvalds@linux-foundation.com) - assign names to seccomp modes (rdunlap@xenotime.net) - fix style issues (rdunlap@xenotime.net) - reworded Kconfig entry (rdunlap@xenotime.net) v3: - macros to inline (oleg@redhat.com) - init_task behavior fixed (oleg@redhat.com) - drop creator entry and extra NULL check (oleg@redhat.com) - alloc returns -EINVAL on bad sizing (serge.hallyn@canonical.com) - adds tentative use of "always_unprivileged" as per torvalds@linux-foundation.org and luto@mit.edu v2: - (patch 2 only) Signed-off-by: Will Drewry <wad@chromium.org> --- include/linux/Kbuild | 1 + include/linux/prctl.h | 3 + include/linux/seccomp.h | 63 ++++ include/linux/seccomp_filter.h | 79 +++++ kernel/Makefile | 1 + kernel/fork.c | 4 + kernel/seccomp.c | 10 +- kernel/seccomp_filter.c | 627 ++++++++++++++++++++++++++++++++++++++++ kernel/sys.c | 4 + security/Kconfig | 20 ++ 10 files changed, 811 insertions(+), 1 deletions(-) create mode 100644 include/linux/seccomp_filter.h create mode 100644 kernel/seccomp_filter.c diff --git a/include/linux/Kbuild b/include/linux/Kbuild index c94e717..5659454 100644 --- a/include/linux/Kbuild +++ b/include/linux/Kbuild @@ -330,6 +330,7 @@ header-y += scc.h header-y += sched.h header-y += screen_info.h header-y += sdla.h +header-y += seccomp_filter.h header-y += securebits.h header-y += selinux_netlink.h header-y += sem.h diff --git a/include/linux/prctl.h b/include/linux/prctl.h index 7ddc7f1..b8c4beb 100644 --- a/include/linux/prctl.h +++ b/include/linux/prctl.h @@ -114,4 +114,7 @@ # define PR_SET_MM_START_BRK 6 # define PR_SET_MM_BRK 7 +/* Set process seccomp filters */ +#define PR_ATTACH_SECCOMP_FILTER 37 + #endif /* _LINUX_PRCTL_H 
*/ diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h index 171ab66..d3b896b 100644 --- a/include/linux/seccomp.h +++ b/include/linux/seccomp.h @@ -5,10 +5,29 @@ #ifdef CONFIG_SECCOMP #include <linux/thread_info.h> +#include <linux/types.h> #include <asm/seccomp.h> +/* Valid values of seccomp_struct.mode */ +#define SECCOMP_MODE_DISABLED 0 /* seccomp is not in use. */ +#define SECCOMP_MODE_STRICT 1 /* uses hard-coded seccomp.c rules. */ +#define SECCOMP_MODE_FILTER 2 /* system call access determined by filter. */ + +struct seccomp_filter; +/** + * struct seccomp_struct - the state of a seccomp'ed process + * + * @mode: indicates one of the valid values above for controlled + * system calls available to a process. + * @filter: Metadata for filter if using CONFIG_SECCOMP_FILTER. + * @filter must only be accessed from the context of current as there + * is no guard. + */ struct seccomp_struct { int mode; +#ifdef CONFIG_SECCOMP_FILTER + struct seccomp_filter *filter; +#endif }; extern void __secure_computing(int); @@ -51,4 +70,48 @@ static inline int seccomp_mode(struct seccomp_struct *s) #endif /* CONFIG_SECCOMP */ +#ifdef CONFIG_SECCOMP_FILTER + + +extern long prctl_attach_seccomp_filter(char __user *); + +extern struct seccomp_filter *get_seccomp_filter(struct seccomp_filter *); +extern void put_seccomp_filter(struct seccomp_filter *); + +extern int seccomp_test_filters(int); +extern void seccomp_filter_log_failure(int); +extern void seccomp_fork(struct seccomp_struct *child, + const struct seccomp_struct *parent); + +static inline void seccomp_init_task(struct seccomp_struct *seccomp) +{ + seccomp->mode = SECCOMP_MODE_DISABLED; + seccomp->filter = NULL; +} + +/* No locking is needed here because the task_struct will + * have no parallel consumers. 
+ */ +static inline void seccomp_free_task(struct seccomp_struct *seccomp) +{ + put_seccomp_filter(seccomp->filter); + seccomp->filter = NULL; +} + +#else /* CONFIG_SECCOMP_FILTER */ + +#include <linux/errno.h> + +struct seccomp_filter { }; +/* Macros consume the unused dereference by the caller. */ +#define seccomp_init_task(_seccomp) do { } while (0) +#define seccomp_fork(_tsk, _orig) do { } while (0) +#define seccomp_free_task(_seccomp) do { } while (0) + +static inline long prctl_attach_seccomp_filter(char __user *a2) +{ + return -ENOSYS; +} + +#endif /* CONFIG_SECCOMP_FILTER */ #endif /* _LINUX_SECCOMP_H */ diff --git a/include/linux/seccomp_filter.h b/include/linux/seccomp_filter.h new file mode 100644 index 0000000..3ecd641 --- /dev/null +++ b/include/linux/seccomp_filter.h @@ -0,0 +1,79 @@ +/* + * Seccomp-based system call filtering data structures and definitions. + * + * Copyright (C) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org> + * + * This copyrighted material is made available to anyone wishing to use, + * modify, copy, or redistribute it subject to the terms and conditions + * of the GNU General Public License v.2. + * + */ + +#ifndef __LINUX_SECCOMP_FILTER_H__ +#define __LINUX_SECCOMP_FILTER_H__ + +#include <asm/byteorder.h> +#include <linux/compiler.h> +#include <linux/types.h> + +/* + * Keep the contents of this file similar to linux/filter.h: + * struct sock_filter and sock_fprog and versions. + * Custom naming exists solely if divergence is ever needed. + */ + +/* + * Current version of the filter code architecture. + */ +#define SECCOMP_BPF_MAJOR_VERSION 1 +#define SECCOMP_BPF_MINOR_VERSION 1 + +struct seccomp_filter_block { /* Filter block */ + __u16 code; /* Actual filter code */ + __u8 jt; /* Jump true */ + __u8 jf; /* Jump false */ + __u32 k; /* Generic multiuse field */ +}; + +struct seccomp_fprog { /* Required for SO_ATTACH_FILTER.
*/ + unsigned short len; /* Number of filter blocks */ + struct seccomp_filter_block __user *filter; +}; + +/* Ensure the u32 ordering is consistent with platform byte order. */ +#if defined(__LITTLE_ENDIAN) +#define SECCOMP_ENDIAN_SWAP(x, y) x, y +#elif defined(__BIG_ENDIAN) +#define SECCOMP_ENDIAN_SWAP(x, y) y, x +#else +#error edit for your odd arch byteorder. +#endif + +/* System call argument layout for the filter data. */ +union seccomp_filter_arg { + struct { + __u32 SECCOMP_ENDIAN_SWAP(lo32, hi32); + }; + __u64 u64; +}; + +/* + * Expected data the BPF program will execute over. + * Endianness will be arch specific, but the values will be + * swapped, as above, to allow for consistent BPF programs. + */ +struct seccomp_filter_data { + int syscall_nr; + __u32 __reserved; + union seccomp_filter_arg args[6]; +}; + +#undef SECCOMP_ENDIAN_SWAP + +/* + * Defined valid return values for the BPF program. + */ +#define SECCOMP_BPF_ALLOW 0xFFFFFFFF +#define SECCOMP_BPF_DENY 0 + +#endif /* __LINUX_SECCOMP_FILTER_H__ */ diff --git a/kernel/Makefile b/kernel/Makefile index 2d9de86..fd81bac 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -78,6 +78,7 @@ obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o obj-$(CONFIG_GENERIC_HARDIRQS) += irq/ obj-$(CONFIG_SECCOMP) += seccomp.o +obj-$(CONFIG_SECCOMP_FILTER) += seccomp_filter.o obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o obj-$(CONFIG_TREE_RCU) += rcutree.o obj-$(CONFIG_TREE_PREEMPT_RCU) += rcutree.o diff --git a/kernel/fork.c b/kernel/fork.c index 051f090..0007933 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -34,6 +34,7 @@ #include <linux/cgroup.h> #include <linux/security.h> #include <linux/hugetlb.h> +#include <linux/seccomp.h> #include <linux/swap.h> #include <linux/syscalls.h> #include <linux/jiffies.h> @@ -169,6 +170,7 @@ void free_task(struct task_struct *tsk) free_thread_info(tsk->stack); rt_mutex_debug_task_free(tsk); ftrace_graph_exit_task(tsk); + 
seccomp_free_task(&tsk->seccomp); free_task_struct(tsk); } EXPORT_SYMBOL(free_task); @@ -1093,6 +1095,7 @@ static struct task_struct *copy_process(unsigned long clone_flags, goto fork_out; ftrace_graph_init_task(p); + seccomp_init_task(&p->seccomp); rt_mutex_init_task(p); @@ -1376,6 +1379,7 @@ static struct task_struct *copy_process(unsigned long clone_flags, if (clone_flags & CLONE_THREAD) threadgroup_change_end(current); perf_event_fork(p); + seccomp_fork(&p->seccomp, &current->seccomp); trace_task_newtask(p, clone_flags); diff --git a/kernel/seccomp.c b/kernel/seccomp.c index e8d76c5..a045dd4 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -37,7 +37,7 @@ void __secure_computing(int this_syscall) int * syscall; switch (mode) { - case 1: + case SECCOMP_MODE_STRICT: syscall = mode1_syscalls; #ifdef CONFIG_COMPAT if (is_compat_task()) @@ -48,6 +48,14 @@ void __secure_computing(int this_syscall) return; } while (*++syscall); break; +#ifdef CONFIG_SECCOMP_FILTER + case SECCOMP_MODE_FILTER: + if (seccomp_test_filters(this_syscall) == 0) + return; + + seccomp_filter_log_failure(this_syscall); + break; +#endif default: BUG(); } diff --git a/kernel/seccomp_filter.c b/kernel/seccomp_filter.c new file mode 100644 index 0000000..0e2e56c --- /dev/null +++ b/kernel/seccomp_filter.c @@ -0,0 +1,627 @@ +/* + * linux/kernel/seccomp_filter.c + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details.
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (C) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org> + * + * Extends linux/kernel/seccomp.c to allow tasks to install system call + * filters using a Berkeley Packet Filter program which is executed over + * struct seccomp_filter_data. + */ + +#include <asm/syscall.h> + +#include <linux/capability.h> +#include <linux/compat.h> +#include <linux/err.h> +#include <linux/errno.h> +#include <linux/rculist.h> +#include <linux/filter.h> +#include <linux/kallsyms.h> +#include <linux/kref.h> +#include <linux/module.h> +#include <linux/pid.h> +#include <linux/prctl.h> +#include <linux/ptrace.h> +#include <linux/ratelimit.h> +#include <linux/reciprocal_div.h> +#include <linux/regset.h> +#include <linux/seccomp.h> +#include <linux/seccomp_filter.h> +#include <linux/security.h> +#include <linux/seccomp.h> +#include <linux/sched.h> +#include <linux/slab.h> +#include <linux/uaccess.h> +#include <linux/user.h> + + +/** + * struct seccomp_filter - container for seccomp BPF programs + * + * @usage: reference count to manage the object lifetime. + * get/put helpers should be used when accessing an instance + * outside of a lifetime-guarded section. In general, this + * is only needed for handling filters shared across tasks. + * @parent: pointer to the ancestor which this filter will be composed with. + * @insns: the BPF program instructions to evaluate + * @count: the number of instructions in the program. + * + * seccomp_filter objects should never be modified after being attached + * to a task_struct (other than @usage). 
+ */ +struct seccomp_filter { + struct kref usage; + struct seccomp_filter *parent; + struct { + uint32_t compat:1; + } flags; + unsigned short count; /* Instruction count */ + struct sock_filter insns[0]; +}; + +/* + * struct seccomp_filter_metadata - BPF data wrapper + * @data: data accessible to the BPF program. + * @has_args: indicates that the args have been lazily populated. + * + * Used by seccomp_load_pointer. + */ +struct seccomp_filter_metadata { + struct seccomp_filter_data data; + bool has_args; +}; + +static unsigned int seccomp_run_filter(void *, uint32_t, + const struct sock_filter *); + +/** + * seccomp_filter_alloc - allocates a new filter object + * @padding: size of the insns[0] array in bytes + * + * The @padding should be a multiple of + * sizeof(struct sock_filter). + * + * Returns ERR_PTR on error or an allocated object. + */ +static struct seccomp_filter *seccomp_filter_alloc(unsigned long padding) +{ + struct seccomp_filter *f; + unsigned long bpf_blocks = padding / sizeof(struct sock_filter); + + /* Drop oversized requests. */ + if (bpf_blocks == 0 || bpf_blocks > BPF_MAXINSNS) + return ERR_PTR(-EINVAL); + + /* Padding should always be in sock_filter increments. */ + if (padding % sizeof(struct sock_filter)) + return ERR_PTR(-EINVAL); + + f = kzalloc(sizeof(struct seccomp_filter) + padding, GFP_KERNEL); + if (!f) + return ERR_PTR(-ENOMEM); + kref_init(&f->usage); + f->count = bpf_blocks; + return f; +} + +/** + * seccomp_filter_free - frees the allocated filter. + * @filter: NULL or live object to be completely destructed. 
+ */ +static void seccomp_filter_free(struct seccomp_filter *filter) +{ + if (!filter) + return; + put_seccomp_filter(filter->parent); + kfree(filter); +} + +static void __put_seccomp_filter(struct kref *kref) +{ + struct seccomp_filter *orig = + container_of(kref, struct seccomp_filter, usage); + seccomp_filter_free(orig); +} + +void seccomp_filter_log_failure(int syscall) +{ + int compat = 0; +#ifdef CONFIG_COMPAT + compat = is_compat_task(); +#endif + pr_info("%s[%d]: %ssystem call %d blocked at 0x%lx\n", + current->comm, task_pid_nr(current), + (compat ? "compat " : ""), + syscall, KSTK_EIP(current)); +} + +/* put_seccomp_filter - decrements the ref count of @orig and may free. */ +void put_seccomp_filter(struct seccomp_filter *orig) +{ + if (!orig) + return; + kref_put(&orig->usage, __put_seccomp_filter); +} + +/* get_seccomp_filter - increments the reference count of @orig. */ +struct seccomp_filter *get_seccomp_filter(struct seccomp_filter *orig) +{ + if (!orig) + return NULL; + kref_get(&orig->usage); + return orig; +} + +#if BITS_PER_LONG == 32 +static inline unsigned long *seccomp_filter_data_arg( + struct seccomp_filter_data *data, int index) +{ + /* Avoid inconsistent hi contents. */ + data->args[index].hi32 = 0; + return (unsigned long *) &(data->args[index].lo32); +} +#elif BITS_PER_LONG == 64 +static inline unsigned long *seccomp_filter_data_arg( + struct seccomp_filter_data *data, int index) +{ + return (unsigned long *) &(data->args[index].u64); +} +#else +#error Unknown BITS_PER_LONG. +#endif + +/** + * seccomp_load_pointer: checks and returns a pointer to the requested offset + * @buf: u8 array to index into + * @buflen: length of the @buf array + * @offset: offset to return data from + * @size: size of the data to retrieve at offset + * @unused: placeholder which net/core/filter.c uses for temporary + * storage. Ideally, the two code paths can be merged.
+ * + * Returns a pointer to the BPF evaluator after checking the offset and size + * boundaries. + */ +static inline void *seccomp_load_pointer(void *data, int offset, size_t size, + void *buffer) +{ + struct seccomp_filter_metadata *metadata = data; + int arg; + if (offset >= sizeof(metadata->data)) + goto fail; + if (offset < 0) + goto fail; + if (size > sizeof(metadata->data) - offset) + goto fail; + if (metadata->has_args) + goto pass; + /* No argument data touched. */ + if (offset + size - 1 < offsetof(struct seccomp_filter_data, args)) + goto pass; + for (arg = 0; arg < ARRAY_SIZE(metadata->data.args); ++arg) + syscall_get_arguments(current, task_pt_regs(current), arg, 1, + seccomp_filter_data_arg(&metadata->data, arg)); + metadata->has_args = true; +pass: + return ((__u8 *)(&metadata->data)) + offset; +fail: + return NULL; +} + +/** + * seccomp_test_filters - tests 'current' against the given syscall + * @syscall: number of the system call to test + * + * Returns 0 on ok and non-zero on error/failure. + */ +int seccomp_test_filters(int syscall) +{ + int ret = -EACCES; + struct seccomp_filter *filter; + struct seccomp_filter_metadata metadata; + + filter = current->seccomp.filter; /* uses task ref */ + if (!filter) + goto out; + + metadata.data.syscall_nr = syscall; + metadata.has_args = false; + +#ifdef CONFIG_COMPAT + if (filter->flags.compat != !!(is_compat_task())) + goto out; +#endif + + /* Only allow a system call if it is allowed in all ancestors. */ + ret = 0; + for ( ; filter != NULL; filter = filter->parent) { + /* Allowed if return value is SECCOMP_BPF_ALLOW */ + if (seccomp_run_filter(&metadata, sizeof(metadata.data), + filter->insns) != SECCOMP_BPF_ALLOW) + ret = -EACCES; + } +out: + return ret; +} + +/** + * seccomp_attach_filter: Attaches a seccomp filter to current. + * @fprog: BPF program to install + * + * Context: User context only. This function may sleep on allocation and + * operates on current. 
current must be attempting a system call + * when this is called (usually prctl). + * + * This function may be called repeatedly to install additional filters. + * Every filter successfully installed will be evaluated (in reverse order) + * for each system call the thread makes. + * + * Returns 0 on success or an errno on failure. + */ +long seccomp_attach_filter(struct sock_fprog *fprog) +{ + struct seccomp_filter *filter = NULL; + /* Note, len is a short so overflow should be impossible. */ + unsigned long fp_size = fprog->len * sizeof(struct sock_filter); + long ret = -EPERM; + + /* Allocate a new seccomp_filter */ + filter = seccomp_filter_alloc(fp_size); + if (IS_ERR(filter)) { + ret = PTR_ERR(filter); + goto out; + } + + /* Copy the instructions from fprog. */ + ret = -EFAULT; + if (copy_from_user(filter->insns, fprog->filter, fp_size)) + goto out; + + /* Check the fprog */ + ret = sk_chk_filter(filter->insns, filter->count); + if (ret) + goto out; + + /* + * Installing a seccomp filter requires that the task + * have CAP_SYS_ADMIN in its namespace or be running with + * no_new_privs. This avoids scenarios where unprivileged + * tasks can affect the behavior of privileged children. + */ + ret = -EACCES; + if (!current->no_new_privs && + security_capable_noaudit(current_cred(), current_user_ns(), + CAP_SYS_ADMIN) != 0) + goto out; + + /* + * If there is an existing filter, make it the parent + * and reuse the existing task-based ref. + */ + filter->parent = current->seccomp.filter; + +#ifdef CONFIG_COMPAT + /* Disallow changing system calling conventions after the fact. */ + filter->flags.compat = !!(is_compat_task()); + + if (filter->parent && + filter->parent->flags.compat != filter->flags.compat) + goto out; +#endif + + /* + * Double claim the new filter so we can release it below simplifying + * the error paths earlier. + */ + ret = 0; + get_seccomp_filter(filter); + current->seccomp.filter = filter; + /* Engage seccomp if it wasn't. 
This doesn't use PR_SET_SECCOMP. */ + if (current->seccomp.mode == SECCOMP_MODE_DISABLED) { + current->seccomp.mode = SECCOMP_MODE_FILTER; + set_thread_flag(TIF_SECCOMP); + } + +out: + put_seccomp_filter(filter); /* for get or task, on err */ + return ret; +} + +#ifdef CONFIG_COMPAT +/* This should be kept in sync with net/compat.c which changes infrequently. */ +struct compat_sock_fprog { + u16 len; + compat_uptr_t filter; /* struct sock_filter */ +}; + +static long compat_attach_seccomp_filter(char __user *optval) +{ + struct compat_sock_fprog __user *fprog32 = + (struct compat_sock_fprog __user *)optval; + struct sock_fprog __user *kfprog = + compat_alloc_user_space(sizeof(struct sock_fprog)); + compat_uptr_t ptr; + u16 len; + + if (!access_ok(VERIFY_READ, fprog32, sizeof(*fprog32)) || + !access_ok(VERIFY_WRITE, kfprog, sizeof(struct sock_fprog)) || + __get_user(len, &fprog32->len) || + __get_user(ptr, &fprog32->filter) || + __put_user(len, &kfprog->len) || + __put_user(compat_ptr(ptr), &kfprog->filter)) + return -EFAULT; + + return seccomp_attach_filter(kfprog); +} +#endif + +long prctl_attach_seccomp_filter(char __user *user_filter) +{ + struct sock_fprog fprog; + long ret = -EINVAL; + ret = -EFAULT; + if (!user_filter) + goto out; + +#ifdef CONFIG_COMPAT + if (is_compat_task()) + return compat_attach_seccomp_filter(user_filter); +#endif + + if (copy_from_user(&fprog, user_filter, sizeof(fprog))) + goto out; + + ret = seccomp_attach_filter(&fprog); +out: + return ret; +} + +/** + * seccomp_fork: manages inheritance on fork + * @child: forkee's seccomp_struct + * @parent: forker's seccomp_struct + * + * Ensures that @child inherits seccomp mode and state iff + * seccomp filtering is in use. 
+ */ +void seccomp_fork(struct seccomp_struct *child, + const struct seccomp_struct *parent) +{ + child->mode = parent->mode; + if (parent->mode != SECCOMP_MODE_FILTER) + return; + child->filter = get_seccomp_filter(parent->filter); +} + +/** + * seccomp_run_filter - evaluate BPF + * @data: opaque buffer to execute the filter over + * @datalen: length of the buffer + * @fentry: filter to apply + * + * Decode and apply filter instructions to the buffer. Return length to + * keep, 0 for none. @data is a seccomp_filter_metadata we are filtering, + * @fentry is the array of filter instructions. Because all jumps are + * guaranteed to be before last instruction, and last instruction + * guaranteed to be a RET, we don't need to check flen. + * + * See net/core/filter.c as this is nearly an exact copy. + * At some point, it would be nice to merge them to take advantage of + * optimizations (like JIT). + */ +static unsigned int seccomp_run_filter(void *data, uint32_t datalen, + const struct sock_filter *fentry) +{ + const void *ptr; + u32 A = 0; /* Accumulator */ + u32 X = 0; /* Index Register */ + u32 mem[BPF_MEMWORDS]; /* Scratch Memory Store */ + u32 tmp; + int k; + + /* + * Process array of filter instructions.
+ */ + for (;; fentry++) { +#if defined(CONFIG_X86_32) +#define K (fentry->k) +#else + const u32 K = fentry->k; +#endif + + switch (fentry->code) { + case BPF_S_ALU_ADD_X: + A += X; + continue; + case BPF_S_ALU_ADD_K: + A += K; + continue; + case BPF_S_ALU_SUB_X: + A -= X; + continue; + case BPF_S_ALU_SUB_K: + A -= K; + continue; + case BPF_S_ALU_MUL_X: + A *= X; + continue; + case BPF_S_ALU_MUL_K: + A *= K; + continue; + case BPF_S_ALU_DIV_X: + if (X == 0) + return 0; + A /= X; + continue; + case BPF_S_ALU_DIV_K: + A = reciprocal_divide(A, K); + continue; + case BPF_S_ALU_AND_X: + A &= X; + continue; + case BPF_S_ALU_AND_K: + A &= K; + continue; + case BPF_S_ALU_OR_X: + A |= X; + continue; + case BPF_S_ALU_OR_K: + A |= K; + continue; + case BPF_S_ALU_LSH_X: + A <<= X; + continue; + case BPF_S_ALU_LSH_K: + A <<= K; + continue; + case BPF_S_ALU_RSH_X: + A >>= X; + continue; + case BPF_S_ALU_RSH_K: + A >>= K; + continue; + case BPF_S_ALU_NEG: + A = -A; + continue; + case BPF_S_JMP_JA: + fentry += K; + continue; + case BPF_S_JMP_JGT_K: + fentry += (A > K) ? fentry->jt : fentry->jf; + continue; + case BPF_S_JMP_JGE_K: + fentry += (A >= K) ? fentry->jt : fentry->jf; + continue; + case BPF_S_JMP_JEQ_K: + fentry += (A == K) ? fentry->jt : fentry->jf; + continue; + case BPF_S_JMP_JSET_K: + fentry += (A & K) ? fentry->jt : fentry->jf; + continue; + case BPF_S_JMP_JGT_X: + fentry += (A > X) ? fentry->jt : fentry->jf; + continue; + case BPF_S_JMP_JGE_X: + fentry += (A >= X) ? fentry->jt : fentry->jf; + continue; + case BPF_S_JMP_JEQ_X: + fentry += (A == X) ? fentry->jt : fentry->jf; + continue; + case BPF_S_JMP_JSET_X: + fentry += (A & X) ? fentry->jt : fentry->jf; + continue; + case BPF_S_LD_W_ABS: + k = K; +load_w: + ptr = seccomp_load_pointer(data, k, 4, &tmp); + if (ptr != NULL) { + /* + * Assume load_pointer did any byte swapping. 
+ */ + A = *(const u32 *)ptr; + continue; + } + return 0; + case BPF_S_LD_H_ABS: + k = K; +load_h: + ptr = seccomp_load_pointer(data, k, 2, &tmp); + if (ptr != NULL) { + A = *(const u16 *)ptr; + continue; + } + return 0; + case BPF_S_LD_B_ABS: + k = K; +load_b: + ptr = seccomp_load_pointer(data, k, 1, &tmp); + if (ptr != NULL) { + A = *(const u8 *)ptr; + continue; + } + return 0; + case BPF_S_LD_W_LEN: + A = datalen; + continue; + case BPF_S_LDX_W_LEN: + X = datalen; + continue; + case BPF_S_LD_W_IND: + k = X + K; + goto load_w; + case BPF_S_LD_H_IND: + k = X + K; + goto load_h; + case BPF_S_LD_B_IND: + k = X + K; + goto load_b; + case BPF_S_LDX_B_MSH: + ptr = seccomp_load_pointer(data, K, 1, &tmp); + if (ptr != NULL) { + X = (*(u8 *)ptr & 0xf) << 2; + continue; + } + return 0; + case BPF_S_LD_IMM: + A = K; + continue; + case BPF_S_LDX_IMM: + X = K; + continue; + case BPF_S_LD_MEM: + A = mem[K]; + continue; + case BPF_S_LDX_MEM: + X = mem[K]; + continue; + case BPF_S_MISC_TAX: + X = A; + continue; + case BPF_S_MISC_TXA: + A = X; + continue; + case BPF_S_RET_K: + return K; + case BPF_S_RET_A: + return A; + case BPF_S_ST: + mem[K] = A; + continue; + case BPF_S_STX: + mem[K] = X; + continue; + case BPF_S_ANC_PROTOCOL: + case BPF_S_ANC_PKTTYPE: + case BPF_S_ANC_IFINDEX: + case BPF_S_ANC_MARK: + case BPF_S_ANC_QUEUE: + case BPF_S_ANC_HATYPE: + case BPF_S_ANC_RXHASH: + case BPF_S_ANC_CPU: + case BPF_S_ANC_NLATTR: + case BPF_S_ANC_NLATTR_NEST: + continue; + default: + WARN_RATELIMIT(1, "Unknown code:%u jt:%u tf:%u k:%u\n", + fentry->code, fentry->jt, + fentry->jf, fentry->k); + return 0; + } + } + + return 0; +} diff --git a/kernel/sys.c b/kernel/sys.c index 4070153..8e43f70 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1901,6 +1901,10 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, case PR_SET_SECCOMP: error = prctl_set_seccomp(arg2); break; + case PR_ATTACH_SECCOMP_FILTER: + error = prctl_attach_seccomp_filter((char __user *) + arg2); 
+ break; case PR_GET_TSC: error = GET_TSC_CTL(arg2); break; diff --git a/security/Kconfig b/security/Kconfig index 51bd5a0..3c55d36 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -84,6 +84,26 @@ config SECURITY_DMESG_RESTRICT If you are unsure how to answer this question, answer N. +config SECCOMP_FILTER + bool "Enable seccomp-based system call filtering" + select SECCOMP + help + This option provides support for limiting the accessibility of + system calls at a task-level using a dynamically defined policy. + + System call filtering policy is expressed as a Berkeley Packet + Filter program. The program is attached using prctl(2) and + cannot be detached. Once attached, the filter program will + evaluate each system call, and its arguments, the task + makes. Its output determines if the system call may proceed. + If the system call is disallowed, the task will be terminated + immediately. + + Dynamically limiting system call access aids software in the + creation of secure computation environments. + + See Documentation/prctl/seccomp_filter.txt for more detail. + config SECURITY bool "Enable different security models" depends on SYSFS -- 1.7.5.4
* Re: [PATCH v6 2/3] seccomp_filters: system call filtering using BPF
  2012-01-28 22:11 ` [PATCH v6 2/3] seccomp_filters: system call filtering using BPF Will Drewry
@ 2012-01-31 14:13   ` Eduardo Otubo
  2012-01-31 15:20     ` Will Drewry
  2012-02-02 15:32   ` Serge E. Hallyn
  1 sibling, 1 reply; 13+ messages in thread
From: Eduardo Otubo @ 2012-01-31 14:13 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, keescook, john.johansen, serge.hallyn, coreyb, pmoore,
	eparis, djm, torvalds, segoon, rostedt, jmorris, scarybeasts, avi,
	penberg, viro, luto, mingo, akpm, khilman, borislav.petkov, amwang,
	oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano,
	linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor, corbet,
	alan, indan, mcgrathr

On Sat, Jan 28, 2012 at 04:11:54PM -0600, Will Drewry wrote:
> [This patch depends on luto@mit.edu's no_new_privs patch:
>  https://lkml.org/lkml/2012/1/12/446
> ]

Will,

I know you clearly pointed to use luto@mit.edu's first no_new_privs
patch, but I couldn't avoid to test it with the latest (and 3rd) version
of the patch [0]. Which defines PR_GET_NO_NEW_PRIVS as 37 as you can see
here [1]. The compilation then would break here:

  CC      kernel/sys.o
kernel/sys.c: In function ‘sys_prctl’:
kernel/sys.c:1975: error: duplicate case value
kernel/sys.c:1904: error: previously used here
make[1]: *** [kernel/sys.o] Error 1
make: *** [kernel] Error 2

I just changed the value of PR_ATTACH_SECCOMP_FILTER to 38 and
everything went fine. Do you see any problems on changing this value?

Regards,

[0] - https://git.kernel.org/?p=linux/kernel/git/luto/linux.git;a=heads
[1] - https://git.kernel.org/?p=linux/kernel/git/luto/linux.git;a=blobdiff;f=include/linux/prctl.h;h=a6b5ac9cfe560eeb277646fbe338ae2b14c46caf;hp=7ddc7f1b480fd41318d94c0a39c8e2ff80f9c5f8;hb=7102b0e278af50d27b5d61d1be5faaba1b0a091e;hpb=acb42a3b611d7ad4cb173c3b37674b549df2ffeb

-- 
Eduardo Otubo
Software Engineer
Linux Technology Center
IBM Systems & Technology Group
Mobile: +55 19 8135 0885
eotubo@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH v6 2/3] seccomp_filters: system call filtering using BPF
  2012-01-31 14:13   ` Eduardo Otubo
@ 2012-01-31 15:20     ` Will Drewry
  0 siblings, 0 replies; 13+ messages in thread
From: Will Drewry @ 2012-01-31 15:20 UTC (permalink / raw)
  To: Eduardo Otubo
  Cc: linux-kernel, keescook, john.johansen, serge.hallyn, coreyb, pmoore,
	eparis, djm, torvalds, segoon, rostedt, jmorris, scarybeasts, avi,
	penberg, viro, luto, mingo, akpm, khilman, borislav.petkov, amwang,
	oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano,
	linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor, corbet,
	alan, indan, mcgrathr

On Tue, Jan 31, 2012 at 7:13 AM, Eduardo Otubo <otubo@linux.vnet.ibm.com> wrote:
> On Sat, Jan 28, 2012 at 04:11:54PM -0600, Will Drewry wrote:
>> [This patch depends on luto@mit.edu's no_new_privs patch:
>>  https://lkml.org/lkml/2012/1/12/446
>> ]
>
> Will,
>
> I know you clearly pointed to use luto@mit.edu's first no_new_privs
> patch, but I couldn't avoid to test it with the latest (and 3rd) version
> of the patch [0]. Which defines PR_GET_NO_NEW_PRIVS as 37 as you can see
> here [1]. The compilation then would break here:
>
>   CC      kernel/sys.o
> kernel/sys.c: In function ‘sys_prctl’:
> kernel/sys.c:1975: error: duplicate case value
> kernel/sys.c:1904: error: previously used here
> make[1]: *** [kernel/sys.o] Error 1
> make: *** [kernel] Error 2
>
> I just changed the value of PR_ATTACH_SECCOMP_FILTER to 38 and
> everything went fine. Do you see any problems on changing this value?

Should be fine -- in the next version, I won't be adding a new PR_
define at all. Feel free to change it to whatever compiles -- the
code only uses the define name for access.

Sorry for the collision - I posted the last rev without the latest from luto.

Cheers!
will

> Regards,
>
> [0] - https://git.kernel.org/?p=linux/kernel/git/luto/linux.git;a=heads
> [1] -
> https://git.kernel.org/?p=linux/kernel/git/luto/linux.git;a=blobdiff;f=include/linux/prctl.h;h=a6b5ac9cfe560eeb277646fbe338ae2b14c46caf;hp=7ddc7f1b480fd41318d94c0a39c8e2ff80f9c5f8;hb=7102b0e278af50d27b5d61d1be5faaba1b0a091e;hpb=acb42a3b611d7ad4cb173c3b37674b549df2ffeb
>
> --
> Eduardo Otubo
> Software Engineer
> Linux Technology Center
> IBM Systems & Technology Group
> Mobile: +55 19 8135 0885
> eotubo@linux.vnet.ibm.com
>

^ permalink raw reply	[flat|nested] 13+ messages in thread
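The collision in this exchange comes down to two prctl option names sharing the value 37, which the compiler rejects as a duplicate case label. A minimal sketch of the renumbered layout follows. The EX_ prefixed names and dispatch_option() are hypothetical stand-ins invented for illustration, and the values 37 and 38 are the ones mentioned in the messages above.

```c
#include <assert.h>

/* Hypothetical stand-ins; the values follow the renumbering described above. */
#define EX_PR_GET_NO_NEW_PRIVS		37
#define EX_PR_ATTACH_SECCOMP_FILTER	38

/* Catch a future collision at build time with a clearer message. */
static_assert(EX_PR_GET_NO_NEW_PRIVS != EX_PR_ATTACH_SECCOMP_FILTER,
	      "prctl option values must be unique");

enum ex_handler { EX_UNKNOWN, EX_GET_NNP, EX_ATTACH_FILTER };

/* Callers refer to the options by name only, so renumbering is a one-line change. */
enum ex_handler dispatch_option(int option)
{
	switch (option) {
	case EX_PR_GET_NO_NEW_PRIVS:
		return EX_GET_NNP;
	case EX_PR_ATTACH_SECCOMP_FILTER:
		return EX_ATTACH_FILTER;
	default:
		return EX_UNKNOWN;
	}
}
```

Setting both defines to 37 in this sketch reproduces the "duplicate case value" error from the build log, with the static assertion firing first.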
* Re: [PATCH v6 2/3] seccomp_filters: system call filtering using BPF
  2012-01-28 22:11 ` [PATCH v6 2/3] seccomp_filters: system call filtering using BPF Will Drewry
  2012-01-31 14:13   ` Eduardo Otubo
@ 2012-02-02 15:32   ` Serge E. Hallyn
  2012-02-03 23:14     ` Will Drewry
  1 sibling, 1 reply; 13+ messages in thread
From: Serge E. Hallyn @ 2012-02-02 15:32 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, keescook, john.johansen, coreyb, pmoore, eparis, djm,
	torvalds, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro,
	luto, mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak,
	eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel,
	linux-security-module, olofj, mhalcrow, dlaor, corbet, alan, indan,
	mcgrathr

Quoting Will Drewry (wad@chromium.org):
> [This patch depends on luto@mit.edu's no_new_privs patch:
>  https://lkml.org/lkml/2012/1/12/446
> ]
>
> This patch adds support for seccomp mode 2. This mode enables dynamic
> enforcement of system call filtering policy in the kernel as specified
> by a userland task. The policy is expressed in terms of a Berkeley
> Packet Filter program, as is used for userland-exposed socket filtering.
> Instead of network data, the BPF program is evaluated over struct
> seccomp_filter_data at the time of the system call.
>
> A filter program may be installed by a userland task by calling
>   prctl(PR_ATTACH_SECCOMP_FILTER, &fprog);
> where fprog is of type struct sock_fprog.
>
> If the first filter program allows subsequent prctl(2) calls, then
> additional filter programs may be attached. All attached programs
> must be evaluated before a system call will be allowed to proceed.
>
> To avoid CONFIG_COMPAT related landmines, once a filter program is
> installed using specific is_compat_task() value, it is not allowed to
> make system calls using the alternate entry point.
>
> Filter programs will be inherited across fork/clone and execve, however
> the installation of filters must be preceded by setting 'no_new_privs'
> to ensure that unprivileged tasks cannot attach filters that affect
> privileged tasks (e.g., setuid binary). Tasks with CAP_SYS_ADMIN
> in their namespace may install inheritable filters without setting
> the no_new_privs bit.
>
> There are a number of benefits to this approach. A few of which are
> as follows:
> - BPF has been exposed to userland for a long time.
> - Userland already knows its ABI: system call numbers and desired
>   arguments
> - No time-of-check-time-of-use vulnerable data accesses are possible.
> - system call arguments are loaded on demand only to minimize copying
>   required for system call number-only policy decisions.
>
> This patch includes its own BPF evaluator, but relies on the
> net/core/filter.c BPF checking code. It is possible to share
> evaluators, but the performance sensitive nature of the network
> filtering path makes it an iterative optimization which (I think :) can
> be tackled separately via separate patchsets. (And at some point sharing
> BPF JIT code!)
>
> v6: - fix memory leak on attach compat check failure
>     - require no_new_privs || CAP_SYS_ADMIN prior to filter
>       installation. (luto@mit.edu)
>     - s/seccomp_struct_/seccomp_/ for macros/functions
>       (amwang@redhat.com)
>     - cleaned up Kconfig (amwang@redhat.com)
>     - on block, note if the call was compat (so the # means something)
> v5: - uses syscall_get_arguments
>       (indan@nul.nu,oleg@redhat.com, mcgrathr@chromium.org)
>     - uses union-based arg storage with hi/lo struct to
>       handle endianness. Compromises between the two alternate
>       proposals to minimize extra arg shuffling and account for
>       endianness assuming userspace uses offsetof().
>       (mcgrathr@chromium.org, indan@nul.nu)
>     - update Kconfig description
>     - add include/seccomp_filter.h and add its installation
>     - (naive) on-demand syscall argument loading
>     - drop seccomp_t (eparis@redhat.com)
> v4: - adjusted prctl to make room for PR_[SG]ET_NO_NEW_PRIVS
>     - now uses current->no_new_privs
>       (luto@mit.edu,torvalds@linux-foundation.com)
>     - assign names to seccomp modes (rdunlap@xenotime.net)
>     - fix style issues (rdunlap@xenotime.net)
>     - reworded Kconfig entry (rdunlap@xenotime.net)
> v3: - macros to inline (oleg@redhat.com)
>     - init_task behavior fixed (oleg@redhat.com)
>     - drop creator entry and extra NULL check (oleg@redhat.com)
>     - alloc returns -EINVAL on bad sizing (serge.hallyn@canonical.com)
>     - adds tentative use of "always_unprivileged" as per
>       torvalds@linux-foundation.org and luto@mit.edu
> v2: - (patch 2 only)
>
> Signed-off-by: Will Drewry <wad@chromium.org>

Hi Will,

as far as I can tell based on changelog I suspect you could have
kept my Acked-by (from v3?). However, I'll wait until your next
submission (as I see there were a few change requests), and do a
final complete new review of that.

Thanks for continuing to push on this.

-serge

^ permalink raw reply	[flat|nested] 13+ messages in thread
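Two rules from the quoted description lend themselves to a small model: a filter may only be attached once no_new_privs is set or the task holds CAP_SYS_ADMIN, and every attached filter must allow a system call before it proceeds. The sketch below is a userspace illustration only. struct ex_task, ex_attach(), ex_syscall_allowed(), the two example policies, the eight-filter limit and the -ENOMEM return are all assumptions made for this example. The -EACCES return mirrors what the documentation patch in this series describes, and 0xFFFFFFFF / 0 follow the ALLOW / DENY values used by the sample helper header.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_ALLOW 0xFFFFFFFFu
#define EX_DENY  0u

/* A filter is modelled as a function of the system call number only. */
typedef uint32_t (*ex_filter_fn)(int syscall_nr);

/* Hypothetical per-task state; the limit of 8 filters is arbitrary. */
struct ex_task {
	bool no_new_privs;
	bool cap_sys_admin;
	ex_filter_fn filters[8];
	size_t nfilters;
};

/* Attach a filter; refused unless no_new_privs or CAP_SYS_ADMIN holds. */
int ex_attach(struct ex_task *t, ex_filter_fn fn)
{
	assert(t != NULL && fn != NULL);

	if (!t->no_new_privs && !t->cap_sys_admin)
		return -EACCES;
	if (t->nfilters == sizeof(t->filters) / sizeof(t->filters[0]))
		return -ENOMEM;
	t->filters[t->nfilters++] = fn;
	return 0;
}

/* A call proceeds only if every attached filter allows it. */
bool ex_syscall_allowed(const struct ex_task *t, int nr)
{
	for (size_t i = 0; i < t->nfilters; i++)
		if (t->filters[i](nr) != EX_ALLOW)
			return false;
	return true;
}

/* Example policies: the first allows numbers 0..9, the second blocks 3. */
uint32_t ex_allow_low(int nr)
{
	return (nr >= 0 && nr < 10) ? EX_ALLOW : EX_DENY;
}

uint32_t ex_block_three(int nr)
{
	return nr == 3 ? EX_DENY : EX_ALLOW;
}
```

Because each additional filter can only turn an allow into a deny, layering filters in this model narrows the permitted set and never widens it, which matches the stated intent of stacking filters.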
* Re: [PATCH v6 2/3] seccomp_filters: system call filtering using BPF
  2012-02-02 15:32   ` Serge E. Hallyn
@ 2012-02-03 23:14     ` Will Drewry
  0 siblings, 0 replies; 13+ messages in thread
From: Will Drewry @ 2012-02-03 23:14 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-kernel, keescook, john.johansen, coreyb, pmoore, eparis, djm,
	torvalds, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro,
	luto, mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak,
	eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel,
	linux-security-module, olofj, mhalcrow, dlaor, corbet, alan, indan,
	mcgrathr

On Thu, Feb 2, 2012 at 7:32 AM, Serge E. Hallyn <serge.hallyn@canonical.com> wrote:
> Quoting Will Drewry (wad@chromium.org):
>> [This patch depends on luto@mit.edu's no_new_privs patch:
>>  https://lkml.org/lkml/2012/1/12/446
>> ]
>>
>> This patch adds support for seccomp mode 2. This mode enables dynamic
>> enforcement of system call filtering policy in the kernel as specified
>> by a userland task. The policy is expressed in terms of a Berkeley
>> Packet Filter program, as is used for userland-exposed socket filtering.
>> Instead of network data, the BPF program is evaluated over struct
>> seccomp_filter_data at the time of the system call.
>>
>> A filter program may be installed by a userland task by calling
>>   prctl(PR_ATTACH_SECCOMP_FILTER, &fprog);
>> where fprog is of type struct sock_fprog.
>>
>> If the first filter program allows subsequent prctl(2) calls, then
>> additional filter programs may be attached. All attached programs
>> must be evaluated before a system call will be allowed to proceed.
>>
>> To avoid CONFIG_COMPAT related landmines, once a filter program is
>> installed using specific is_compat_task() value, it is not allowed to
>> make system calls using the alternate entry point.
>>
>> Filter programs will be inherited across fork/clone and execve, however
>> the installation of filters must be preceded by setting 'no_new_privs'
>> to ensure that unprivileged tasks cannot attach filters that affect
>> privileged tasks (e.g., setuid binary). Tasks with CAP_SYS_ADMIN
>> in their namespace may install inheritable filters without setting
>> the no_new_privs bit.
>>
>> There are a number of benefits to this approach. A few of which are
>> as follows:
>> - BPF has been exposed to userland for a long time.
>> - Userland already knows its ABI: system call numbers and desired
>>   arguments
>> - No time-of-check-time-of-use vulnerable data accesses are possible.
>> - system call arguments are loaded on demand only to minimize copying
>>   required for system call number-only policy decisions.
>>
>> This patch includes its own BPF evaluator, but relies on the
>> net/core/filter.c BPF checking code. It is possible to share
>> evaluators, but the performance sensitive nature of the network
>> filtering path makes it an iterative optimization which (I think :) can
>> be tackled separately via separate patchsets. (And at some point sharing
>> BPF JIT code!)
>>
>> v6: - fix memory leak on attach compat check failure
>>     - require no_new_privs || CAP_SYS_ADMIN prior to filter
>>       installation. (luto@mit.edu)
>>     - s/seccomp_struct_/seccomp_/ for macros/functions
>>       (amwang@redhat.com)
>>     - cleaned up Kconfig (amwang@redhat.com)
>>     - on block, note if the call was compat (so the # means something)
>> v5: - uses syscall_get_arguments
>>       (indan@nul.nu,oleg@redhat.com, mcgrathr@chromium.org)
>>     - uses union-based arg storage with hi/lo struct to
>>       handle endianness. Compromises between the two alternate
>>       proposals to minimize extra arg shuffling and account for
>>       endianness assuming userspace uses offsetof().
>>       (mcgrathr@chromium.org, indan@nul.nu)
>>     - update Kconfig description
>>     - add include/seccomp_filter.h and add its installation
>>     - (naive) on-demand syscall argument loading
>>     - drop seccomp_t (eparis@redhat.com)
>> v4: - adjusted prctl to make room for PR_[SG]ET_NO_NEW_PRIVS
>>     - now uses current->no_new_privs
>>       (luto@mit.edu,torvalds@linux-foundation.com)
>>     - assign names to seccomp modes (rdunlap@xenotime.net)
>>     - fix style issues (rdunlap@xenotime.net)
>>     - reworded Kconfig entry (rdunlap@xenotime.net)
>> v3: - macros to inline (oleg@redhat.com)
>>     - init_task behavior fixed (oleg@redhat.com)
>>     - drop creator entry and extra NULL check (oleg@redhat.com)
>>     - alloc returns -EINVAL on bad sizing (serge.hallyn@canonical.com)
>>     - adds tentative use of "always_unprivileged" as per
>>       torvalds@linux-foundation.org and luto@mit.edu
>> v2: - (patch 2 only)
>>
>> Signed-off-by: Will Drewry <wad@chromium.org>
>
> Hi Will,
>
> as far as I can tell based on changelog I suspect you could have
> kept my Acked-by (from v3?). However, I'll wait until your next
> submission (as I see there were a few change requests), and do a
> final complete new review of that.

Thanks, Serge! I just failed at the proper protocol and didn't mean
to not include your Acked-by. However, I am changing a fair amount of
the internals this time around, so I'll be happy to have another full
review.

> Thanks for continuing to push on this.

Definitely! I've been traveling this week, so it's been a bit slow
going, but I hope to have the next rev up early next week if not
sooner.

Cheers!
will

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [PATCH v6 3/3] Documentation: prctl/seccomp_filter 2012-01-28 22:11 [PATCH v6 1/3] seccomp: kill the seccomp_t typedef Will Drewry 2012-01-28 22:11 ` [PATCH v6 2/3] seccomp_filters: system call filtering using BPF Will Drewry @ 2012-01-28 22:11 ` Will Drewry 2012-01-30 22:47 ` Corey Bryant 2012-02-02 15:29 ` [PATCH v6 1/3] seccomp: kill the seccomp_t typedef Serge E. Hallyn 2 siblings, 1 reply; 13+ messages in thread From: Will Drewry @ 2012-01-28 22:11 UTC (permalink / raw) To: linux-kernel Cc: keescook, john.johansen, serge.hallyn, coreyb, pmoore, eparis, djm, torvalds, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro, wad, luto, mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor, corbet, alan, indan, mcgrathr Documents how system call filtering using Berkeley Packet Filter programs works and how it may be used. Includes an example for x86 (32-bit) and a semi-generic example using an example code generator. v6: - tweak the language to note the requirement of PR_SET_NO_NEW_PRIVS being called prior to use. 
(luto@mit.edu) v5: - update sample to use system call arguments - adds a "fancy" example using a macro-based generator - cleaned up bpf in the sample - update docs to mention arguments - fix prctl value (eparis@redhat.com) - language cleanup (rdunlap@xenotime.net) v4: - update for no_new_privs use - minor tweaks v3: - call out BPF <-> Berkeley Packet Filter (rdunlap@xenotime.net) - document use of tentative always-unprivileged - guard sample compilation for i386 and x86_64 v2: - move code to samples (corbet@lwn.net) Signed-off-by: Will Drewry <wad@chromium.org> --- Documentation/prctl/seccomp_filter.txt | 100 +++++++++++++++ samples/Makefile | 2 +- samples/seccomp/Makefile | 27 ++++ samples/seccomp/bpf-direct.c | 77 +++++++++++ samples/seccomp/bpf-fancy.c | 95 ++++++++++++++ samples/seccomp/bpf-helper.c | 89 +++++++++++++ samples/seccomp/bpf-helper.h | 219 ++++++++++++++++++++++++++++++++ 7 files changed, 608 insertions(+), 1 deletions(-) create mode 100644 Documentation/prctl/seccomp_filter.txt create mode 100644 samples/seccomp/Makefile create mode 100644 samples/seccomp/bpf-direct.c create mode 100644 samples/seccomp/bpf-fancy.c create mode 100644 samples/seccomp/bpf-helper.c create mode 100644 samples/seccomp/bpf-helper.h diff --git a/Documentation/prctl/seccomp_filter.txt b/Documentation/prctl/seccomp_filter.txt new file mode 100644 index 0000000..4ad7649 --- /dev/null +++ b/Documentation/prctl/seccomp_filter.txt @@ -0,0 +1,100 @@ + Seccomp filtering + ================= + +Introduction +------------ + +A large number of system calls are exposed to every userland process +with many of them going unused for the entire lifetime of the process. +As system calls change and mature, bugs are found and eradicated. A +certain subset of userland applications benefit by having a reduced set +of available system calls. The resulting set reduces the total kernel +surface exposed to the application. System call filtering is meant for +use with those applications. 
+
+Seccomp filtering provides a means for a process to specify a filter for
+incoming system calls. The filter is expressed as a Berkeley Packet
+Filter (BPF) program, as with socket filters, except that the data
+operated on is related to the system call being made: system call
+number, and the system call arguments. This allows for expressive
+filtering of system calls using a filter program language with a long
+history of being exposed to userland and a straightforward data set.
+
+Additionally, BPF makes it impossible for users of seccomp to fall prey
+to time-of-check-time-of-use (TOCTOU) attacks that are common in system
+call interposition frameworks. BPF programs may not dereference
+pointers, which constrains all filters to solely evaluating the system
+call arguments directly.
+
+What it isn't
+-------------
+
+System call filtering isn't a sandbox. It provides a clearly defined
+mechanism for minimizing the exposed kernel surface. Beyond that,
+policy for logical behavior and information flow should be managed with
+a combination of other system hardening techniques and, potentially, an
+LSM of your choosing. Expressive, dynamic filters provide further options down
+this path (avoiding pathological sizes or selecting which of the multiplexed
+system calls in socketcall() is allowed, for instance) which could be
+construed, incorrectly, as a more complete sandboxing solution.
+
+Usage
+-----
+
+An additional seccomp mode is added, but it is not set directly by
+the consuming process. The new mode, '2', is only available if
+CONFIG_SECCOMP_FILTER is set and enabled using prctl with the
+PR_ATTACH_SECCOMP_FILTER argument.
+
+Interacting with seccomp filters is done using one prctl(2) call.
+
+PR_ATTACH_SECCOMP_FILTER:
+	Allows the specification of a new filter using a BPF program.
+	The BPF program will be executed over struct seccomp_filter_data
+	reflecting the system call number, arguments, and other
+	metadata. To allow a system call, SECCOMP_BPF_ALLOW must be
+	returned. At present, all other return values result in the
+	system call being blocked, but it is recommended to return
+	SECCOMP_BPF_DENY in those cases. This will allow for future
+	custom return values to be introduced, if ever desired.
+
+	Usage:
+		prctl(PR_ATTACH_SECCOMP_FILTER, prog);
+
+	The 'prog' argument is a pointer to a struct sock_fprog which will
+	contain the filter program. If the program is invalid, the call
+	will return -1 and set errno to EINVAL.
+
+	Note, is_compat_task is also tracked for the @prog. This means
+	that once set the calling task will have all of its system calls
+	blocked if it switches its system call ABI.
+
+	If fork/clone and execve are allowed by @prog, any child processes will
+	be constrained to the same filters and system call ABI as the parent.
+
+	Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or
+	run with CAP_SYS_ADMIN privileges in its namespace. If neither
+	holds, -EACCES will be returned. This requirement ensures that filter
+	programs cannot be applied to child processes with greater privileges
+	than the task that installed them.
+
+	Additionally, if prctl(2) is allowed by the attached filter,
+	additional filters may be layered on which will increase evaluation
+	time, but allow for further decreasing the attack surface during
+	execution of a process.
+
+The above call returns 0 on success and non-zero on error.
+
+Example
+-------
+
+The samples/seccomp/ directory contains both a 32-bit specific example
+and a more generic example of a higher level macro interface for BPF
+program generation.
+ +Adding architecture support +----------------------- + +Any platform with seccomp support will support seccomp filters as long +as CONFIG_SECCOMP_FILTER is enabled and the architecture has implemented +syscall_get_arguments. diff --git a/samples/Makefile b/samples/Makefile index 6280817..f29b19c 100644 --- a/samples/Makefile +++ b/samples/Makefile @@ -1,4 +1,4 @@ # Makefile for Linux samples code obj-$(CONFIG_SAMPLES) += kobject/ kprobes/ tracepoints/ trace_events/ \ - hw_breakpoint/ kfifo/ kdb/ hidraw/ + hw_breakpoint/ kfifo/ kdb/ hidraw/ seccomp/ diff --git a/samples/seccomp/Makefile b/samples/seccomp/Makefile new file mode 100644 index 0000000..0298c6f --- /dev/null +++ b/samples/seccomp/Makefile @@ -0,0 +1,27 @@ +# kbuild trick to avoid linker error. Can be omitted if a module is built. +obj- := dummy.o + +hostprogs-y := bpf-fancy +bpf-fancy-objs := bpf-fancy.o bpf-helper.o + +HOSTCFLAGS_bpf-fancy.o += -I$(objtree)/usr/include +HOSTCFLAGS_bpf-fancy.o += -idirafter $(objtree)/include +HOSTCFLAGS_bpf-helper.o += -I$(objtree)/usr/include +HOSTCFLAGS_bpf-helper.o += -idirafter $(objtree)/include + +# bpf-direct.c is x86-only. 
+ifeq ($(filter-out x86_64 i386,$(KBUILD_BUILDHOST)),) +# List of programs to build +hostprogs-y += bpf-direct +bpf-direct-objs := bpf-direct.o +endif + +# Tell kbuild to always build the programs +always := $(hostprogs-y) + +HOSTCFLAGS_bpf-direct.o += -I$(objtree)/usr/include +HOSTCFLAGS_bpf-direct.o += -idirafter $(objtree)/include +ifeq ($(KBUILD_BUILDHOST),x86_64) +HOSTCFLAGS_bpf-direct.o += -m32 +HOSTLOADLIBES_bpf-direct += -m32 +endif diff --git a/samples/seccomp/bpf-direct.c b/samples/seccomp/bpf-direct.c new file mode 100644 index 0000000..d799244 --- /dev/null +++ b/samples/seccomp/bpf-direct.c @@ -0,0 +1,77 @@ +/* + * 32-bit seccomp filter example with BPF macros + * + * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org> + * Author: Will Drewry <wad@chromium.org> + * + * The code may be used by anyone for any purpose, + * and can serve as a starting point for developing + * applications using prctl(PR_ATTACH_SECCOMP_FILTER). + */ + +#include <linux/filter.h> +#include <linux/ptrace.h> +#include <linux/seccomp_filter.h> +#include <linux/unistd.h> +#include <stdio.h> +#include <stddef.h> +#include <sys/prctl.h> +#include <unistd.h> + +#ifndef PR_ATTACH_SECCOMP_FILTER +# define PR_ATTACH_SECCOMP_FILTER 37 +#endif + +#define syscall_arg(_n) (offsetof(struct seccomp_filter_data, args[_n].lo32)) +#define nr (offsetof(struct seccomp_filter_data, syscall_nr)) + +static int install_filter(void) +{ + struct seccomp_filter_block filter[] = { + /* Grab the system call number */ + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, nr), + /* Jump table for the allowed syscalls */ + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_rt_sigreturn, 10, 0), + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_sigreturn, 9, 0), + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit_group, 8, 0), + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit, 7, 0), + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 1, 0), + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_write, 2, 6), + + /* Check that read is only using stdin. 
*/ + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)), + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDIN_FILENO, 3, 4), + + /* Check that write is only using stdout/stderr */ + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)), + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDOUT_FILENO, 1, 0), + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDERR_FILENO, 0, 1), + + BPF_STMT(BPF_RET+BPF_K, SECCOMP_BPF_ALLOW), + BPF_STMT(BPF_RET+BPF_K, SECCOMP_BPF_DENY), + }; + struct seccomp_fprog prog = { + .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])), + .filter = filter, + }; + if (prctl(PR_ATTACH_SECCOMP_FILTER, &prog)) { + perror("prctl"); + return 1; + } + return 0; +} + +#define payload(_c) (_c), sizeof((_c)) +int main(int argc, char **argv) +{ + char buf[4096]; + ssize_t bytes = 0; + if (install_filter()) + return 1; + syscall(__NR_write, STDOUT_FILENO, + payload("OHAI! WHAT IS YOUR NAME? ")); + bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf)); + syscall(__NR_write, STDOUT_FILENO, payload("HELLO, ")); + syscall(__NR_write, STDOUT_FILENO, buf, bytes); + return 0; +} diff --git a/samples/seccomp/bpf-fancy.c b/samples/seccomp/bpf-fancy.c new file mode 100644 index 0000000..1318b1a --- /dev/null +++ b/samples/seccomp/bpf-fancy.c @@ -0,0 +1,95 @@ +/* + * Seccomp BPF example using a macro-based generator. + * + * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org> + * Author: Will Drewry <wad@chromium.org> + * + * The code may be used by anyone for any purpose, + * and can serve as a starting point for developing + * applications using prctl(PR_ATTACH_SECCOMP_FILTER). 
+ */ + +#include <linux/seccomp_filter.h> +#include <linux/unistd.h> +#include <stdio.h> +#include <string.h> +#include <sys/prctl.h> +#include <unistd.h> + +#include "bpf-helper.h" + +#ifndef PR_ATTACH_SECCOMP_FILTER +# define PR_ATTACH_SECCOMP_FILTER 37 +#endif + +int main(int argc, char **argv) +{ + struct bpf_labels l; + static const char msg1[] = "Please type something: "; + static const char msg2[] = "You typed: "; + char buf[256]; + struct seccomp_filter_block filter[] = { + LOAD_SYSCALL_NR, + SYSCALL(__NR_exit, ALLOW), + SYSCALL(__NR_exit_group, ALLOW), + SYSCALL(__NR_write, JUMP(&l, write_fd)), + SYSCALL(__NR_read, JUMP(&l, read)), + DENY, /* Don't passthrough into a label */ + + LABEL(&l, read), + ARG(0), + JNE(STDIN_FILENO, DENY), + ARG(1), + JNE((unsigned long)buf, DENY), + ARG(2), + JGE(sizeof(buf), DENY), + ALLOW, + + LABEL(&l, write_fd), + ARG(0), + JEQ(STDOUT_FILENO, JUMP(&l, write_buf)), + JEQ(STDERR_FILENO, JUMP(&l, write_buf)), + DENY, + + LABEL(&l, write_buf), + ARG(1), + JEQ((unsigned long)msg1, JUMP(&l, msg1_len)), + JEQ((unsigned long)msg2, JUMP(&l, msg2_len)), + JEQ((unsigned long)buf, JUMP(&l, buf_len)), + DENY, + + LABEL(&l, msg1_len), + ARG(2), + JLT(sizeof(msg1), ALLOW), + DENY, + + LABEL(&l, msg2_len), + ARG(2), + JLT(sizeof(msg2), ALLOW), + DENY, + + LABEL(&l, buf_len), + ARG(2), + JLT(sizeof(buf), ALLOW), + DENY, + }; + struct seccomp_fprog prog = { + .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])), + .filter = filter, + }; + ssize_t bytes; + bpf_resolve_jumps(&l, filter, sizeof(filter)/sizeof(*filter)); + + if (prctl(PR_ATTACH_SECCOMP_FILTER, &prog)) { + perror("prctl"); + return 1; + } + syscall(__NR_write, STDOUT_FILENO, msg1, strlen(msg1)); + bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf)-1); + bytes = (bytes > 0 ? 
bytes : 0); + syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2)); + syscall(__NR_write, STDERR_FILENO, buf, bytes); + /* Now get killed */ + syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2)+2); + return 0; +} diff --git a/samples/seccomp/bpf-helper.c b/samples/seccomp/bpf-helper.c new file mode 100644 index 0000000..e1b6bc7 --- /dev/null +++ b/samples/seccomp/bpf-helper.c @@ -0,0 +1,89 @@ +/* + * Seccomp BPF helper functions + * + * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org> + * Author: Will Drewry <wad@chromium.org> + * + * The code may be used by anyone for any purpose, + * and can serve as a starting point for developing + * applications using prctl(PR_ATTACH_SECCOMP_FILTER). + */ + +#include <stdio.h> +#include <string.h> + +#include "bpf-helper.h" + +int bpf_resolve_jumps(struct bpf_labels *labels, + struct seccomp_filter_block *filter, size_t count) +{ + struct seccomp_filter_block *begin = filter; + __u8 insn = count - 1; + + if (count < 1) + return -1; + /* + * Walk it once, backwards, to build the label table and do fixups. + * Since backward jumps are disallowed by BPF, this is easy. + */ + filter += insn; + for (; filter >= begin; --insn, --filter) { + if (filter->code != (BPF_JMP+BPF_JA)) + continue; + switch ((filter->jt<<8)|filter->jf) { + case (JUMP_JT<<8)|JUMP_JF: + if (labels->labels[filter->k].location == 0xffffffff) { + fprintf(stderr, "Unresolved label: '%s'\n", + labels->labels[filter->k].label); + return 1; + } + filter->k = labels->labels[filter->k].location - + (insn + 1); + filter->jt = 0; + filter->jf = 0; + continue; + case (LABEL_JT<<8)|LABEL_JF: + if (labels->labels[filter->k].location != 0xffffffff) { + fprintf(stderr, "Duplicate label use: '%s'\n", + labels->labels[filter->k].label); + return 1; + } + labels->labels[filter->k].location = insn; + filter->k = 0; /* fall through */ + filter->jt = 0; + filter->jf = 0; + continue; + } + } + return 0; +} + +/* Simple lookup table for labels. 
*/ +__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label) +{ + struct __bpf_label *begin = labels->labels, *end; + int id; + if (labels->count == 0) { + begin->label = label; + begin->location = 0xffffffff; + labels->count++; + return 0; + } + end = begin + labels->count; + for (id = 0; begin < end; ++begin, ++id) { + if (!strcmp(label, begin->label)) + return id; + } + begin->label = label; + begin->location = 0xffffffff; + labels->count++; + return id; +} + +void seccomp_bpf_print(struct seccomp_filter_block *filter, size_t count) +{ + struct seccomp_filter_block *end = filter + count; + for ( ; filter < end; ++filter) + printf("{ code=%u,jt=%u,jf=%u,k=%u },\n", + filter->code, filter->jt, filter->jf, filter->k); +} diff --git a/samples/seccomp/bpf-helper.h b/samples/seccomp/bpf-helper.h new file mode 100644 index 0000000..92b94ec --- /dev/null +++ b/samples/seccomp/bpf-helper.h @@ -0,0 +1,219 @@ +/* + * Example wrapper around BPF macros. + * + * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org> + * Author: Will Drewry <wad@chromium.org> + * + * The code may be used by anyone for any purpose, + * and can serve as a starting point for developing + * applications using prctl(PR_ATTACH_SECCOMP_FILTER). + * + * No guarantees are provided with respect to the correctness + * or functionality of this code. 
+ */ +#ifndef __BPF_HELPER_H__ +#define __BPF_HELPER_H__ + +#include <asm/bitsperlong.h> /* for __BITS_PER_LONG */ +#include <linux/filter.h> +#include <linux/seccomp_filter.h> /* for seccomp_filter_data.arg */ +#include <linux/types.h> +#include <linux/unistd.h> +#include <stddef.h> + +#define BPF_LABELS_MAX 256 +struct bpf_labels { + int count; + struct __bpf_label { + const char *label; + __u32 location; + } labels[BPF_LABELS_MAX]; +}; + +int bpf_resolve_jumps(struct bpf_labels *labels, + struct seccomp_filter_block *filter, size_t count); +__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label); +void seccomp_bpf_print(struct seccomp_filter_block *filter, size_t count); + +#define JUMP_JT 0xff +#define JUMP_JF 0xff +#define LABEL_JT 0xfe +#define LABEL_JF 0xfe + +#define ALLOW \ + BPF_STMT(BPF_RET+BPF_K, 0xFFFFFFFF) +#define DENY \ + BPF_STMT(BPF_RET+BPF_K, 0) +#define JUMP(labels, label) \ + BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \ + JUMP_JT, JUMP_JF) +#define LABEL(labels, label) \ + BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \ + LABEL_JT, LABEL_JF) +#define SYSCALL(nr, jt) \ + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (nr), 0, 1), \ + jt + +/* Lame, but just an example */ +#define FIND_LABEL(labels, label) seccomp_bpf_label((labels), #label) + +#define EXPAND(...) 
__VA_ARGS__ +/* Map all width-sensitive operations */ +#if __BITS_PER_LONG == 32 + +#define JEQ(x, jt) JEQ32(x, EXPAND(jt)) +#define JNE(x, jt) JNE32(x, EXPAND(jt)) +#define JGT(x, jt) JGT32(x, EXPAND(jt)) +#define JLT(x, jt) JLT32(x, EXPAND(jt)) +#define JGE(x, jt) JGE32(x, EXPAND(jt)) +#define JLE(x, jt) JLE32(x, EXPAND(jt)) +#define JA(x, jt) JA32(x, EXPAND(jt)) +#define ARG(i) ARG_32(i) + +#elif __BITS_PER_LONG == 64 + +#define JEQ(x, jt) \ + JEQ64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ + EXPAND(jt)) +#define JGT(x, jt) \ + JGT64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ + EXPAND(jt)) +#define JGE(x, jt) \ + JGE64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ + EXPAND(jt)) +#define JNE(x, jt) \ + JNE64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ + EXPAND(jt)) +#define JLT(x, jt) \ + JLT64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ + EXPAND(jt)) +#define JLE(x, jt) \ + JLE64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ + EXPAND(jt)) + +#define JA(x, jt) \ + JA64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ + EXPAND(jt)) +#define ARG(i) ARG_64(i) + +#else +#error __BITS_PER_LONG value unusable. 
+#endif + +/* Loads the arg into A */ +#define ARG_32(idx) \ + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \ + offsetof(struct seccomp_filter_data, args[(idx)].lo32)) + +/* Loads hi into A and lo in X */ +#define ARG_64(idx) \ + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \ + offsetof(struct seccomp_filter_data, args[(idx)].lo32)), \ + BPF_STMT(BPF_ST, 0), /* lo -> M[0] */ \ + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \ + offsetof(struct seccomp_filter_data, args[(idx)].hi32)), \ + BPF_STMT(BPF_ST, 1) /* hi -> M[1] */ + +#define JEQ32(value, jt) \ + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 0, 1), \ + jt + +#define JNE32(value, jt) \ + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 1, 0), \ + jt + +/* Checks the lo, then swaps to check the hi. A=lo,X=hi */ +#define JEQ64(lo, hi, jt) \ + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \ + BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 0, 2), \ + BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ + jt, \ + BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ + +#define JNE64(lo, hi, jt) \ + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 5, 0), \ + BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 2, 0), \ + BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ + jt, \ + BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ + +#define JA32(value, jt) \ + BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (value), 0, 1), \ + jt + +#define JA64(lo, hi, jt) \ + BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (hi), 3, 0), \ + BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ + BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (lo), 0, 2), \ + BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ + jt, \ + BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ + +#define JGE32(value, jt) \ + BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 0, 1), \ + jt + +#define JLT32(value, jt) \ + BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 1, 0), \ + jt + +/* Shortcut checking if hi > arg.hi. 
*/ +#define JGE64(lo, hi, jt) \ + BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 4, 0), \ + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \ + BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ + BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (lo), 0, 2), \ + BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ + jt, \ + BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ + +#define JLT64(lo, hi, jt) \ + BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (hi), 0, 4), \ + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \ + BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ + BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 2, 0), \ + BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ + jt, \ + BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ + +#define JGT32(value, jt) \ + BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \ + jt + +#define JLE32(value, jt) \ + BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \ + jt + +/* Check hi > args.hi first, then do the GE checking */ +#define JGT64(lo, hi, jt) \ + BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 4, 0), \ + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \ + BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ + BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 0, 2), \ + BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ + jt, \ + BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ + +#define JLE64(lo, hi, jt) \ + BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 6, 0), \ + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 3), \ + BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ + BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 2, 0), \ + BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ + jt, \ + BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ + +#define LOAD_SYSCALL_NR \ + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \ + offsetof(struct seccomp_filter_data, syscall_nr)) + +#endif /* __BPF_HELPER_H__ */ -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v6 3/3] Documentation: prctl/seccomp_filter 2012-01-28 22:11 ` [PATCH v6 3/3] Documentation: prctl/seccomp_filter Will Drewry @ 2012-01-30 22:47 ` Corey Bryant 2012-01-30 22:52 ` Will Drewry 0 siblings, 1 reply; 13+ messages in thread From: Corey Bryant @ 2012-01-30 22:47 UTC (permalink / raw) To: Will Drewry Cc: linux-kernel, keescook, john.johansen, serge.hallyn, pmoore, eparis, djm, torvalds, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro, luto, mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor, corbet, alan, indan, mcgrathr On 01/28/2012 05:11 PM, Will Drewry wrote: > Documents how system call filtering using Berkeley Packet > Filter programs works and how it may be used. > Includes an example for x86 (32-bit) and a semi-generic > example using an example code generator. > > v6: - tweak the language to note the requirement of > PR_SET_NO_NEW_PRIVS being called prior to use. 
(luto@mit.edu) > v5: - update sample to use system call arguments > - adds a "fancy" example using a macro-based generator > - cleaned up bpf in the sample > - update docs to mention arguments > - fix prctl value (eparis@redhat.com) > - language cleanup (rdunlap@xenotime.net) > v4: - update for no_new_privs use > - minor tweaks > v3: - call out BPF<-> Berkeley Packet Filter (rdunlap@xenotime.net) > - document use of tentative always-unprivileged > - guard sample compilation for i386 and x86_64 > v2: - move code to samples (corbet@lwn.net) > > Signed-off-by: Will Drewry<wad@chromium.org> > --- > Documentation/prctl/seccomp_filter.txt | 100 +++++++++++++++ > samples/Makefile | 2 +- > samples/seccomp/Makefile | 27 ++++ > samples/seccomp/bpf-direct.c | 77 +++++++++++ > samples/seccomp/bpf-fancy.c | 95 ++++++++++++++ > samples/seccomp/bpf-helper.c | 89 +++++++++++++ > samples/seccomp/bpf-helper.h | 219 ++++++++++++++++++++++++++++++++ > 7 files changed, 608 insertions(+), 1 deletions(-) > create mode 100644 Documentation/prctl/seccomp_filter.txt > create mode 100644 samples/seccomp/Makefile > create mode 100644 samples/seccomp/bpf-direct.c > create mode 100644 samples/seccomp/bpf-fancy.c > create mode 100644 samples/seccomp/bpf-helper.c > create mode 100644 samples/seccomp/bpf-helper.h > > diff --git a/Documentation/prctl/seccomp_filter.txt b/Documentation/prctl/seccomp_filter.txt > new file mode 100644 > index 0000000..4ad7649 > --- /dev/null > +++ b/Documentation/prctl/seccomp_filter.txt > @@ -0,0 +1,100 @@ > + Seccomp filtering > + ================= > + > +Introduction > +------------ > + > +A large number of system calls are exposed to every userland process > +with many of them going unused for the entire lifetime of the process. > +As system calls change and mature, bugs are found and eradicated. A > +certain subset of userland applications benefit by having a reduced set > +of available system calls. 
The resulting set reduces the total kernel > +surface exposed to the application. System call filtering is meant for > +use with those applications. > + > +Seccomp filtering provides a means for a process to specify a filter for > +incoming system calls. The filter is expressed as a Berkeley Packet > +Filter (BPF) program, as with socket filters, except that the data > +operated on is related to the system call being made: system call > +number, and the system call arguments. This allows for expressive > +filtering of system calls using a filter program language with a long > +history of being exposed to userland and a straightforward data set. > + > +Additionally, BPF makes it impossible for users of seccomp to fall prey > +to time-of-check-time-of-use (TOCTOU) attacks that are common in system > +call interposition frameworks. BPF programs may not dereference > +pointers which constrains all filters to solely evaluating the system > +call arguments directly. > + > +What it isn't > +------------- > + > +System call filtering isn't a sandbox. It provides a clearly defined > +mechanism for minimizing the exposed kernel surface. Beyond that, > +policy for logical behavior and information flow should be managed with > +a combination of other system hardening techniques and, potentially, an > +LSM of your choosing. Expressive, dynamic filters provide further options down > +this path (avoiding pathological sizes or selecting which of the multiplexed > +system calls in socketcall() is allowed, for instance) which could be > +construed, incorrectly, as a more complete sandboxing solution. > + > +Usage > +----- > + > +An additional seccomp mode is added, but they are not directly set by > +the consuming process. The new mode, '2', is only available if > +CONFIG_SECCOMP_FILTER is set and enabled using prctl with the > +PR_ATTACH_SECCOMP_FILTER argument. > + > +Interacting with seccomp filters is done using one prctl(2) call. 
> + > +PR_ATTACH_SECCOMP_FILTER: > + Allows the specification of a new filter using a BPF program. > + The BPF program will be executed over struct seccomp_filter_data > + reflecting the system call number, arguments, and other > + metadata, To allow a system call, SECCOMP_BPF_ALLOW must be > + returned. At present, all other return values result in the > + system call being blocked, but it is recommended to return > + SECCOMP_BPF_DENY in those cases. This will allow for future > + custom return values to be introduced, if ever desired. > + > + Usage: > + prctl(PR_ATTACH_SECCOMP_FILTER, prog); > + > + The 'prog' argument is a pointer to a struct sock_fprog which will > + contain the filter program. If the program is invalid, the call > + will return -1 and set errno to EINVAL. > + > + Note, is_compat_task is also tracked for the @prog. This means > + that once set the calling task will have all of its system calls > + blocked if it switches its system call ABI. > + > + If fork/clone and execve are allowed by @prog, any child processes will > + be constrained to the same filters and system call ABI as the parent. > + > + Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or > + run with CAP_SYS_ADMIN privileges in its namespace. If these are not > + true, -EACCES will be returned. This requirement ensures that filter > + programs cannot be applied to child processes with greater privileges > + than the task that installed them. > + > + Additionally, if prctl(2) is allowed by the attached filter, > + additional filters may be layered on which will increase evaluation > + time, but allow for further decreasing the attack surface during > + execution of a process. > + > +The above call returns 0 on success and non-zero on error. > + > +Example > +------- > + > +The samples/seccomp/ directory contains both a 32-bit specific example > +and a more generic example of a higher level macro interface for BPF > +program generation. 
> + > +Adding architecture support > +----------------------- > + > +Any platform with seccomp support will support seccomp filters as long > +as CONFIG_SECCOMP_FILTER is enabled and the architecture has implemented > +syscall_get_arguments. > diff --git a/samples/Makefile b/samples/Makefile > index 6280817..f29b19c 100644 > --- a/samples/Makefile > +++ b/samples/Makefile > @@ -1,4 +1,4 @@ > # Makefile for Linux samples code > > obj-$(CONFIG_SAMPLES) += kobject/ kprobes/ tracepoints/ trace_events/ \ > - hw_breakpoint/ kfifo/ kdb/ hidraw/ > + hw_breakpoint/ kfifo/ kdb/ hidraw/ seccomp/ > diff --git a/samples/seccomp/Makefile b/samples/seccomp/Makefile > new file mode 100644 > index 0000000..0298c6f > --- /dev/null > +++ b/samples/seccomp/Makefile > @@ -0,0 +1,27 @@ > +# kbuild trick to avoid linker error. Can be omitted if a module is built. > +obj- := dummy.o > + > +hostprogs-y := bpf-fancy > +bpf-fancy-objs := bpf-fancy.o bpf-helper.o > + > +HOSTCFLAGS_bpf-fancy.o += -I$(objtree)/usr/include > +HOSTCFLAGS_bpf-fancy.o += -idirafter $(objtree)/include > +HOSTCFLAGS_bpf-helper.o += -I$(objtree)/usr/include > +HOSTCFLAGS_bpf-helper.o += -idirafter $(objtree)/include > + > +# bpf-direct.c is x86-only. 
> +ifeq ($(filter-out x86_64 i386,$(KBUILD_BUILDHOST)),) > +# List of programs to build > +hostprogs-y += bpf-direct > +bpf-direct-objs := bpf-direct.o > +endif > + > +# Tell kbuild to always build the programs > +always := $(hostprogs-y) > + > +HOSTCFLAGS_bpf-direct.o += -I$(objtree)/usr/include > +HOSTCFLAGS_bpf-direct.o += -idirafter $(objtree)/include > +ifeq ($(KBUILD_BUILDHOST),x86_64) > +HOSTCFLAGS_bpf-direct.o += -m32 > +HOSTLOADLIBES_bpf-direct += -m32 > +endif > diff --git a/samples/seccomp/bpf-direct.c b/samples/seccomp/bpf-direct.c > new file mode 100644 > index 0000000..d799244 > --- /dev/null > +++ b/samples/seccomp/bpf-direct.c > @@ -0,0 +1,77 @@ > +/* > + * 32-bit seccomp filter example with BPF macros > + * > + * Copyright (c) 2012 The Chromium OS Authors<chromium-os-dev@chromium.org> > + * Author: Will Drewry<wad@chromium.org> > + * > + * The code may be used by anyone for any purpose, > + * and can serve as a starting point for developing > + * applications using prctl(PR_ATTACH_SECCOMP_FILTER). 
> + */ > + > +#include<linux/filter.h> > +#include<linux/ptrace.h> > +#include<linux/seccomp_filter.h> > +#include<linux/unistd.h> > +#include<stdio.h> > +#include<stddef.h> > +#include<sys/prctl.h> > +#include<unistd.h> > + > +#ifndef PR_ATTACH_SECCOMP_FILTER > +# define PR_ATTACH_SECCOMP_FILTER 37 > +#endif > + > +#define syscall_arg(_n) (offsetof(struct seccomp_filter_data, args[_n].lo32)) > +#define nr (offsetof(struct seccomp_filter_data, syscall_nr)) > + > +static int install_filter(void) > +{ > + struct seccomp_filter_block filter[] = { > + /* Grab the system call number */ > + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, nr), > + /* Jump table for the allowed syscalls */ > + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_rt_sigreturn, 10, 0), > + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_sigreturn, 9, 0), > + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit_group, 8, 0), > + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit, 7, 0), > + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 1, 0), > + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_write, 2, 6), > + > + /* Check that read is only using stdin. */ > + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)), > + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDIN_FILENO, 3, 4), > + > + /* Check that write is only using stdout/stderr */ > + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)), > + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDOUT_FILENO, 1, 0), > + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDERR_FILENO, 0, 1), > + > + BPF_STMT(BPF_RET+BPF_K, SECCOMP_BPF_ALLOW), > + BPF_STMT(BPF_RET+BPF_K, SECCOMP_BPF_DENY), > + }; > + struct seccomp_fprog prog = { > + .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])), > + .filter = filter, > + }; > + if (prctl(PR_ATTACH_SECCOMP_FILTER,&prog)) { > + perror("prctl"); > + return 1; > + } > + return 0; > +} > + > +#define payload(_c) (_c), sizeof((_c)) > +int main(int argc, char **argv) > +{ > + char buf[4096]; > + ssize_t bytes = 0; > + if (install_filter()) > + return 1; > + syscall(__NR_write, STDOUT_FILENO, > + payload("OHAI! WHAT IS YOUR NAME? 
")); > + bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf)); > + syscall(__NR_write, STDOUT_FILENO, payload("HELLO, ")); > + syscall(__NR_write, STDOUT_FILENO, buf, bytes); > + return 0; > +} > diff --git a/samples/seccomp/bpf-fancy.c b/samples/seccomp/bpf-fancy.c > new file mode 100644 > index 0000000..1318b1a > --- /dev/null > +++ b/samples/seccomp/bpf-fancy.c > @@ -0,0 +1,95 @@ > +/* > + * Seccomp BPF example using a macro-based generator. > + * > + * Copyright (c) 2012 The Chromium OS Authors<chromium-os-dev@chromium.org> > + * Author: Will Drewry<wad@chromium.org> > + * > + * The code may be used by anyone for any purpose, > + * and can serve as a starting point for developing > + * applications using prctl(PR_ATTACH_SECCOMP_FILTER). > + */ > + > +#include<linux/seccomp_filter.h> > +#include<linux/unistd.h> > +#include<stdio.h> > +#include<string.h> > +#include<sys/prctl.h> > +#include<unistd.h> > + > +#include "bpf-helper.h" > + > +#ifndef PR_ATTACH_SECCOMP_FILTER > +# define PR_ATTACH_SECCOMP_FILTER 37 > +#endif > + > +int main(int argc, char **argv) > +{ > + struct bpf_labels l; > + static const char msg1[] = "Please type something: "; > + static const char msg2[] = "You typed: "; > + char buf[256]; > + struct seccomp_filter_block filter[] = { > + LOAD_SYSCALL_NR, > + SYSCALL(__NR_exit, ALLOW), > + SYSCALL(__NR_exit_group, ALLOW), > + SYSCALL(__NR_write, JUMP(&l, write_fd)), > + SYSCALL(__NR_read, JUMP(&l, read)), > + DENY, /* Don't passthrough into a label */ > + > + LABEL(&l, read), > + ARG(0), > + JNE(STDIN_FILENO, DENY), > + ARG(1), > + JNE((unsigned long)buf, DENY), > + ARG(2), > + JGE(sizeof(buf), DENY), > + ALLOW, > + > + LABEL(&l, write_fd), > + ARG(0), > + JEQ(STDOUT_FILENO, JUMP(&l, write_buf)), > + JEQ(STDERR_FILENO, JUMP(&l, write_buf)), > + DENY, > + > + LABEL(&l, write_buf), > + ARG(1), > + JEQ((unsigned long)msg1, JUMP(&l, msg1_len)), > + JEQ((unsigned long)msg2, JUMP(&l, msg2_len)), > + JEQ((unsigned long)buf, JUMP(&l, buf_len)), > + 
DENY, > + > + LABEL(&l, msg1_len), > + ARG(2), > + JLT(sizeof(msg1), ALLOW), > + DENY, > + > + LABEL(&l, msg2_len), > + ARG(2), > + JLT(sizeof(msg2), ALLOW), > + DENY, > + > + LABEL(&l, buf_len), > + ARG(2), > + JLT(sizeof(buf), ALLOW), > + DENY, > + }; > + struct seccomp_fprog prog = { > + .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])), > + .filter = filter, > + }; > + ssize_t bytes; > + bpf_resolve_jumps(&l, filter, sizeof(filter)/sizeof(*filter)); > + > + if (prctl(PR_ATTACH_SECCOMP_FILTER,&prog)) { > + perror("prctl"); > + return 1; > + } > + syscall(__NR_write, STDOUT_FILENO, msg1, strlen(msg1)); > + bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf)-1); > + bytes = (bytes> 0 ? bytes : 0); > + syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2)); > + syscall(__NR_write, STDERR_FILENO, buf, bytes); > + /* Now get killed */ > + syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2)+2); > + return 0; > +} > diff --git a/samples/seccomp/bpf-helper.c b/samples/seccomp/bpf-helper.c > new file mode 100644 > index 0000000..e1b6bc7 > --- /dev/null > +++ b/samples/seccomp/bpf-helper.c > @@ -0,0 +1,89 @@ > +/* > + * Seccomp BPF helper functions > + * > + * Copyright (c) 2012 The Chromium OS Authors<chromium-os-dev@chromium.org> > + * Author: Will Drewry<wad@chromium.org> > + * > + * The code may be used by anyone for any purpose, > + * and can serve as a starting point for developing > + * applications using prctl(PR_ATTACH_SECCOMP_FILTER). > + */ > + > +#include<stdio.h> > +#include<string.h> > + > +#include "bpf-helper.h" > + > +int bpf_resolve_jumps(struct bpf_labels *labels, > + struct seccomp_filter_block *filter, size_t count) > +{ > + struct seccomp_filter_block *begin = filter; > + __u8 insn = count - 1; > + > + if (count< 1) > + return -1; > + /* > + * Walk it once, backwards, to build the label table and do fixups. > + * Since backward jumps are disallowed by BPF, this is easy. 
> + */ > + filter += insn; > + for (; filter>= begin; --insn, --filter) { > + if (filter->code != (BPF_JMP+BPF_JA)) > + continue; > + switch ((filter->jt<<8)|filter->jf) { > + case (JUMP_JT<<8)|JUMP_JF: > + if (labels->labels[filter->k].location == 0xffffffff) { > + fprintf(stderr, "Unresolved label: '%s'\n", > + labels->labels[filter->k].label); > + return 1; > + } > + filter->k = labels->labels[filter->k].location - > + (insn + 1); > + filter->jt = 0; > + filter->jf = 0; > + continue; > + case (LABEL_JT<<8)|LABEL_JF: > + if (labels->labels[filter->k].location != 0xffffffff) { > + fprintf(stderr, "Duplicate label use: '%s'\n", > + labels->labels[filter->k].label); > + return 1; > + } > + labels->labels[filter->k].location = insn; > + filter->k = 0; /* fall through */ > + filter->jt = 0; > + filter->jf = 0; > + continue; > + } > + } > + return 0; > +} > + > +/* Simple lookup table for labels. */ > +__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label) > +{ > + struct __bpf_label *begin = labels->labels, *end; > + int id; > + if (labels->count == 0) { > + begin->label = label; > + begin->location = 0xffffffff; > + labels->count++; > + return 0; > + } > + end = begin + labels->count; > + for (id = 0; begin< end; ++begin, ++id) { > + if (!strcmp(label, begin->label)) > + return id; > + } > + begin->label = label; > + begin->location = 0xffffffff; > + labels->count++; > + return id; > +} > + > +void seccomp_bpf_print(struct seccomp_filter_block *filter, size_t count) > +{ > + struct seccomp_filter_block *end = filter + count; > + for ( ; filter< end; ++filter) > + printf("{ code=%u,jt=%u,jf=%u,k=%u },\n", > + filter->code, filter->jt, filter->jf, filter->k); > +} > diff --git a/samples/seccomp/bpf-helper.h b/samples/seccomp/bpf-helper.h > new file mode 100644 > index 0000000..92b94ec > --- /dev/null > +++ b/samples/seccomp/bpf-helper.h > @@ -0,0 +1,219 @@ > +/* > + * Example wrapper around BPF macros. 
> + * > + * Copyright (c) 2012 The Chromium OS Authors<chromium-os-dev@chromium.org> > + * Author: Will Drewry<wad@chromium.org> > + * > + * The code may be used by anyone for any purpose, > + * and can serve as a starting point for developing > + * applications using prctl(PR_ATTACH_SECCOMP_FILTER). > + * > + * No guarantees are provided with respect to the correctness > + * or functionality of this code. > + */ > +#ifndef __BPF_HELPER_H__ > +#define __BPF_HELPER_H__ > + > +#include<asm/bitsperlong.h> /* for __BITS_PER_LONG */ > +#include<linux/filter.h> > +#include<linux/seccomp_filter.h> /* for seccomp_filter_data.arg */ > +#include<linux/types.h> > +#include<linux/unistd.h> > +#include<stddef.h> > + > +#define BPF_LABELS_MAX 256 > +struct bpf_labels { > + int count; > + struct __bpf_label { > + const char *label; > + __u32 location; > + } labels[BPF_LABELS_MAX]; > +}; > + > +int bpf_resolve_jumps(struct bpf_labels *labels, > + struct seccomp_filter_block *filter, size_t count); > +__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label); > +void seccomp_bpf_print(struct seccomp_filter_block *filter, size_t count); > + > +#define JUMP_JT 0xff > +#define JUMP_JF 0xff > +#define LABEL_JT 0xfe > +#define LABEL_JF 0xfe > + > +#define ALLOW \ > + BPF_STMT(BPF_RET+BPF_K, 0xFFFFFFFF) > +#define DENY \ > + BPF_STMT(BPF_RET+BPF_K, 0) > +#define JUMP(labels, label) \ > + BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \ > + JUMP_JT, JUMP_JF) > +#define LABEL(labels, label) \ > + BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \ > + LABEL_JT, LABEL_JF) > +#define SYSCALL(nr, jt) \ > + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (nr), 0, 1), \ > + jt > + > +/* Lame, but just an example */ > +#define FIND_LABEL(labels, label) seccomp_bpf_label((labels), #label) > + > +#define EXPAND(...) 
__VA_ARGS__ > +/* Map all width-sensitive operations */ > +#if __BITS_PER_LONG == 32 > + > +#define JEQ(x, jt) JEQ32(x, EXPAND(jt)) > +#define JNE(x, jt) JNE32(x, EXPAND(jt)) > +#define JGT(x, jt) JGT32(x, EXPAND(jt)) > +#define JLT(x, jt) JLT32(x, EXPAND(jt)) > +#define JGE(x, jt) JGE32(x, EXPAND(jt)) > +#define JLE(x, jt) JLE32(x, EXPAND(jt)) > +#define JA(x, jt) JA32(x, EXPAND(jt)) > +#define ARG(i) ARG_32(i) > + > +#elif __BITS_PER_LONG == 64 > + > +#define JEQ(x, jt) \ > + JEQ64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ > + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ > + EXPAND(jt)) > +#define JGT(x, jt) \ > + JGT64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ > + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ > + EXPAND(jt)) > +#define JGE(x, jt) \ > + JGE64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ > + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ > + EXPAND(jt)) > +#define JNE(x, jt) \ > + JNE64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ > + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ > + EXPAND(jt)) > +#define JLT(x, jt) \ > + JLT64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ > + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ > + EXPAND(jt)) > +#define JLE(x, jt) \ > + JLE64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ > + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ > + EXPAND(jt)) > + > +#define JA(x, jt) \ > + JA64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ > + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ > + EXPAND(jt)) > +#define ARG(i) ARG_64(i) > + > +#else > +#error __BITS_PER_LONG value unusable. 
> +#endif > + > +/* Loads the arg into A */ > +#define ARG_32(idx) \ > + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \ > + offsetof(struct seccomp_filter_data, args[(idx)].lo32)) > + > +/* Loads hi into A and lo in X */ > +#define ARG_64(idx) \ > + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \ > + offsetof(struct seccomp_filter_data, args[(idx)].lo32)), \ > + BPF_STMT(BPF_ST, 0), /* lo -> M[0] */ \ > + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \ > + offsetof(struct seccomp_filter_data, args[(idx)].hi32)), \ > + BPF_STMT(BPF_ST, 1) /* hi -> M[1] */ > + > +#define JEQ32(value, jt) \ > + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 0, 1), \ > + jt > + > +#define JNE32(value, jt) \ > + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 1, 0), \ > + jt > + > +/* Checks the lo, then swaps to check the hi. A=lo,X=hi */ > +#define JEQ64(lo, hi, jt) \ > + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \ > + BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ > + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 0, 2), \ > + BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ > + jt, \ > + BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ > + > +#define JNE64(lo, hi, jt) \ > + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 5, 0), \ > + BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ > + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 2, 0), \ > + BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ > + jt, \ > + BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ > + > +#define JA32(value, jt) \ > + BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (value), 0, 1), \ > + jt > + > +#define JA64(lo, hi, jt) \ > + BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (hi), 3, 0), \ > + BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ > + BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (lo), 0, 2), \ > + BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ > + jt, \ > + BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ > + > +#define JGE32(value, jt) \ > + BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 0, 1), \ > + jt > + > +#define JLT32(value, jt) \ > + 
BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 1, 0), \
> +	jt
> +
> +/* Shortcut checking if hi> arg.hi. */
> +#define JGE64(lo, hi, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 4, 0), \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
> +	BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (lo), 0, 2), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
> +	jt, \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
> +
> +#define JLT64(lo, hi, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (hi), 0, 4), \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 2, 0), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
> +	jt, \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
> +
> +#define JGT32(value, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \
> +	jt
> +
> +#define JLE32(value, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \
> +	jt

Should the true/false offsets be reversed here?

Thanks for all the work on this. We're looking forward to using it with QEMU.

> +
> +/* Check hi> args.hi first, then do the GE checking */
> +#define JGT64(lo, hi, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 4, 0), \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 0, 2), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
> +	jt, \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
> +
> +#define JLE64(lo, hi, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 6, 0), \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 3), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 2, 0), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
> +	jt, \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
> +
> +#define LOAD_SYSCALL_NR \
> +	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \
> +		 offsetof(struct seccomp_filter_data, syscall_nr))
> +
> +#endif /* __BPF_HELPER_H__ */

--
Regards,
Corey
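On the JLE32 question raised in the review above: as posted, JLE32 expands to exactly the same instruction as JGT32, so the action placed in "jt" would run when A > value. The sketch below models only the skip behaviour of a single classic-BPF JGT instruction, assuming the intent of JLE32 is "run the action when A <= value". The struct and function names are hypothetical and exist only for this illustration; nothing here is taken from the kernel's BPF interpreter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one BPF_JMP+BPF_JGT+BPF_K instruction. */
struct jgt_insn {
	uint32_t k;	/* constant compared against the accumulator A */
	uint8_t jt;	/* instructions skipped when A > k */
	uint8_t jf;	/* instructions skipped when A <= k */
};

/*
 * In the helper macros the action ("jt" argument) is the instruction
 * right after the jump, so a skip of 0 runs it and a skip of 1 steps
 * over it.  Returns 1 if the action runs for accumulator value a.
 */
static int action_runs(struct jgt_insn insn, uint32_t a)
{
	unsigned int skip = a > insn.k ? insn.jt : insn.jf;

	return skip == 0;
}

/* JGT32 as posted: offsets (0, 1). */
static int jgt32_action_runs(uint32_t value, uint32_t a)
{
	struct jgt_insn insn = { .k = value, .jt = 0, .jf = 1 };

	return action_runs(insn, a);
}

/* JLE32 as posted: also offsets (0, 1), so it behaves like JGT32. */
static int jle32_as_posted_action_runs(uint32_t value, uint32_t a)
{
	struct jgt_insn insn = { .k = value, .jt = 0, .jf = 1 };

	return action_runs(insn, a);
}

/* JLE32 with the offsets reversed to (1, 0). */
static int jle32_reversed_action_runs(uint32_t value, uint32_t a)
{
	struct jgt_insn insn = { .k = value, .jt = 1, .jf = 0 };

	return action_runs(insn, a);
}

/* Small self-check showing where the two JLE32 variants differ. */
static void self_check(void)
{
	assert(jle32_as_posted_action_runs(10, 9) == jgt32_action_runs(10, 9));
	assert(jle32_reversed_action_runs(10, 9) == 1);
	assert(jle32_reversed_action_runs(10, 11) == 0);
}
```

Under that assumption the reversed (1, 0) form mirrors how JLT32 is built from BPF_JGE in the same header.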
* Re: [PATCH v6 3/3] Documentation: prctl/seccomp_filter 2012-01-30 22:47 ` Corey Bryant @ 2012-01-30 22:52 ` Will Drewry 0 siblings, 0 replies; 13+ messages in thread From: Will Drewry @ 2012-01-30 22:52 UTC (permalink / raw) To: Corey Bryant Cc: linux-kernel, keescook, john.johansen, serge.hallyn, pmoore, eparis, djm, torvalds, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro, luto, mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor, corbet, alan, indan, mcgrathr On Mon, Jan 30, 2012 at 4:47 PM, Corey Bryant <coreyb@linux.vnet.ibm.com> wrote: > > > On 01/28/2012 05:11 PM, Will Drewry wrote: >> >> Documents how system call filtering using Berkeley Packet >> Filter programs works and how it may be used. >> Includes an example for x86 (32-bit) and a semi-generic >> example using an example code generator. >> >> v6: - tweak the language to note the requirement of >> PR_SET_NO_NEW_PRIVS being called prior to use. 
(luto@mit.edu) >> v5: - update sample to use system call arguments >> - adds a "fancy" example using a macro-based generator >> - cleaned up bpf in the sample >> - update docs to mention arguments >> - fix prctl value (eparis@redhat.com) >> - language cleanup (rdunlap@xenotime.net) >> v4: - update for no_new_privs use >> - minor tweaks >> v3: - call out BPF<-> Berkeley Packet Filter (rdunlap@xenotime.net) >> - document use of tentative always-unprivileged >> - guard sample compilation for i386 and x86_64 >> v2: - move code to samples (corbet@lwn.net) >> >> Signed-off-by: Will Drewry<wad@chromium.org> >> --- >> Documentation/prctl/seccomp_filter.txt | 100 +++++++++++++++ >> samples/Makefile | 2 +- >> samples/seccomp/Makefile | 27 ++++ >> samples/seccomp/bpf-direct.c | 77 +++++++++++ >> samples/seccomp/bpf-fancy.c | 95 ++++++++++++++ >> samples/seccomp/bpf-helper.c | 89 +++++++++++++ >> samples/seccomp/bpf-helper.h | 219 >> ++++++++++++++++++++++++++++++++ >> 7 files changed, 608 insertions(+), 1 deletions(-) >> create mode 100644 Documentation/prctl/seccomp_filter.txt >> create mode 100644 samples/seccomp/Makefile >> create mode 100644 samples/seccomp/bpf-direct.c >> create mode 100644 samples/seccomp/bpf-fancy.c >> create mode 100644 samples/seccomp/bpf-helper.c >> create mode 100644 samples/seccomp/bpf-helper.h >> >> diff --git a/Documentation/prctl/seccomp_filter.txt >> b/Documentation/prctl/seccomp_filter.txt >> new file mode 100644 >> index 0000000..4ad7649 >> --- /dev/null >> +++ b/Documentation/prctl/seccomp_filter.txt >> @@ -0,0 +1,100 @@ >> + Seccomp filtering >> + ================= >> + >> +Introduction >> +------------ >> + >> +A large number of system calls are exposed to every userland process >> +with many of them going unused for the entire lifetime of the process. >> +As system calls change and mature, bugs are found and eradicated. A >> +certain subset of userland applications benefit by having a reduced set >> +of available system calls. 
The resulting set reduces the total kernel >> +surface exposed to the application. System call filtering is meant for >> +use with those applications. >> + >> +Seccomp filtering provides a means for a process to specify a filter for >> +incoming system calls. The filter is expressed as a Berkeley Packet >> +Filter (BPF) program, as with socket filters, except that the data >> +operated on is related to the system call being made: system call >> +number, and the system call arguments. This allows for expressive >> +filtering of system calls using a filter program language with a long >> +history of being exposed to userland and a straightforward data set. >> + >> +Additionally, BPF makes it impossible for users of seccomp to fall prey >> +to time-of-check-time-of-use (TOCTOU) attacks that are common in system >> +call interposition frameworks. BPF programs may not dereference >> +pointers which constrains all filters to solely evaluating the system >> +call arguments directly. >> + >> +What it isn't >> +------------- >> + >> +System call filtering isn't a sandbox. It provides a clearly defined >> +mechanism for minimizing the exposed kernel surface. Beyond that, >> +policy for logical behavior and information flow should be managed with >> +a combination of other system hardening techniques and, potentially, an >> +LSM of your choosing. Expressive, dynamic filters provide further >> options down >> +this path (avoiding pathological sizes or selecting which of the >> multiplexed >> +system calls in socketcall() is allowed, for instance) which could be >> +construed, incorrectly, as a more complete sandboxing solution. >> + >> +Usage >> +----- >> + >> +An additional seccomp mode is added, but they are not directly set by >> +the consuming process. The new mode, '2', is only available if >> +CONFIG_SECCOMP_FILTER is set and enabled using prctl with the >> +PR_ATTACH_SECCOMP_FILTER argument. >> + >> +Interacting with seccomp filters is done using one prctl(2) call. 
>> + >> +PR_ATTACH_SECCOMP_FILTER: >> + Allows the specification of a new filter using a BPF program. >> + The BPF program will be executed over struct seccomp_filter_data >> + reflecting the system call number, arguments, and other >> + metadata, To allow a system call, SECCOMP_BPF_ALLOW must be >> + returned. At present, all other return values result in the >> + system call being blocked, but it is recommended to return >> + SECCOMP_BPF_DENY in those cases. This will allow for future >> + custom return values to be introduced, if ever desired. >> + >> + Usage: >> + prctl(PR_ATTACH_SECCOMP_FILTER, prog); >> + >> + The 'prog' argument is a pointer to a struct sock_fprog which will >> + contain the filter program. If the program is invalid, the call >> + will return -1 and set errno to EINVAL. >> + >> + Note, is_compat_task is also tracked for the @prog. This means >> + that once set the calling task will have all of its system calls >> + blocked if it switches its system call ABI. >> + >> + If fork/clone and execve are allowed by @prog, any child processes >> will >> + be constrained to the same filters and system call ABI as the >> parent. >> + >> + Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or >> + run with CAP_SYS_ADMIN privileges in its namespace. If these are >> not >> + true, -EACCES will be returned. This requirement ensures that >> filter >> + programs cannot be applied to child processes with greater >> privileges >> + than the task that installed them. >> + >> + Additionally, if prctl(2) is allowed by the attached filter, >> + additional filters may be layered on which will increase >> evaluation >> + time, but allow for further decreasing the attack surface during >> + execution of a process. >> + >> +The above call returns 0 on success and non-zero on error. 
>> + >> +Example >> +------- >> + >> +The samples/seccomp/ directory contains both a 32-bit specific example >> +and a more generic example of a higher level macro interface for BPF >> +program generation. >> + >> +Adding architecture support >> +----------------------- >> + >> +Any platform with seccomp support will support seccomp filters as long >> +as CONFIG_SECCOMP_FILTER is enabled and the architecture has implemented >> +syscall_get_arguments. >> diff --git a/samples/Makefile b/samples/Makefile >> index 6280817..f29b19c 100644 >> --- a/samples/Makefile >> +++ b/samples/Makefile >> @@ -1,4 +1,4 @@ >> # Makefile for Linux samples code >> >> obj-$(CONFIG_SAMPLES) += kobject/ kprobes/ tracepoints/ trace_events/ \ >> - hw_breakpoint/ kfifo/ kdb/ hidraw/ >> + hw_breakpoint/ kfifo/ kdb/ hidraw/ seccomp/ >> diff --git a/samples/seccomp/Makefile b/samples/seccomp/Makefile >> new file mode 100644 >> index 0000000..0298c6f >> --- /dev/null >> +++ b/samples/seccomp/Makefile >> @@ -0,0 +1,27 @@ >> +# kbuild trick to avoid linker error. Can be omitted if a module is >> built. >> +obj- := dummy.o >> + >> +hostprogs-y := bpf-fancy >> +bpf-fancy-objs := bpf-fancy.o bpf-helper.o >> + >> +HOSTCFLAGS_bpf-fancy.o += -I$(objtree)/usr/include >> +HOSTCFLAGS_bpf-fancy.o += -idirafter $(objtree)/include >> +HOSTCFLAGS_bpf-helper.o += -I$(objtree)/usr/include >> +HOSTCFLAGS_bpf-helper.o += -idirafter $(objtree)/include >> + >> +# bpf-direct.c is x86-only. 
>> +ifeq ($(filter-out x86_64 i386,$(KBUILD_BUILDHOST)),) >> +# List of programs to build >> +hostprogs-y += bpf-direct >> +bpf-direct-objs := bpf-direct.o >> +endif >> + >> +# Tell kbuild to always build the programs >> +always := $(hostprogs-y) >> + >> +HOSTCFLAGS_bpf-direct.o += -I$(objtree)/usr/include >> +HOSTCFLAGS_bpf-direct.o += -idirafter $(objtree)/include >> +ifeq ($(KBUILD_BUILDHOST),x86_64) >> +HOSTCFLAGS_bpf-direct.o += -m32 >> +HOSTLOADLIBES_bpf-direct += -m32 >> +endif >> diff --git a/samples/seccomp/bpf-direct.c b/samples/seccomp/bpf-direct.c >> new file mode 100644 >> index 0000000..d799244 >> --- /dev/null >> +++ b/samples/seccomp/bpf-direct.c >> @@ -0,0 +1,77 @@ >> +/* >> + * 32-bit seccomp filter example with BPF macros >> + * >> + * Copyright (c) 2012 The Chromium OS >> Authors<chromium-os-dev@chromium.org> >> + * Author: Will Drewry<wad@chromium.org> >> + * >> + * The code may be used by anyone for any purpose, >> + * and can serve as a starting point for developing >> + * applications using prctl(PR_ATTACH_SECCOMP_FILTER). 
>> + */ >> + >> +#include<linux/filter.h> >> +#include<linux/ptrace.h> >> +#include<linux/seccomp_filter.h> >> +#include<linux/unistd.h> >> +#include<stdio.h> >> +#include<stddef.h> >> +#include<sys/prctl.h> >> +#include<unistd.h> >> + >> +#ifndef PR_ATTACH_SECCOMP_FILTER >> +# define PR_ATTACH_SECCOMP_FILTER 37 >> +#endif >> + >> +#define syscall_arg(_n) (offsetof(struct seccomp_filter_data, >> args[_n].lo32)) >> +#define nr (offsetof(struct seccomp_filter_data, syscall_nr)) >> + >> +static int install_filter(void) >> +{ >> + struct seccomp_filter_block filter[] = { >> + /* Grab the system call number */ >> + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, nr), >> + /* Jump table for the allowed syscalls */ >> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_rt_sigreturn, 10, 0), >> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_sigreturn, 9, 0), >> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit_group, 8, 0), >> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit, 7, 0), >> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 1, 0), >> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_write, 2, 6), >> + >> + /* Check that read is only using stdin. 
*/ >> + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)), >> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDIN_FILENO, 3, 4), >> + >> + /* Check that write is only using stdout/stderr */ >> + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)), >> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDOUT_FILENO, 1, 0), >> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDERR_FILENO, 0, 1), >> + >> + BPF_STMT(BPF_RET+BPF_K, SECCOMP_BPF_ALLOW), >> + BPF_STMT(BPF_RET+BPF_K, SECCOMP_BPF_DENY), >> + }; >> + struct seccomp_fprog prog = { >> + .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])), >> + .filter = filter, >> + }; >> + if (prctl(PR_ATTACH_SECCOMP_FILTER,&prog)) { >> + perror("prctl"); >> + return 1; >> + } >> + return 0; >> +} >> + >> +#define payload(_c) (_c), sizeof((_c)) >> +int main(int argc, char **argv) >> +{ >> + char buf[4096]; >> + ssize_t bytes = 0; >> + if (install_filter()) >> + return 1; >> + syscall(__NR_write, STDOUT_FILENO, >> + payload("OHAI! WHAT IS YOUR NAME? ")); >> + bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf)); >> + syscall(__NR_write, STDOUT_FILENO, payload("HELLO, ")); >> + syscall(__NR_write, STDOUT_FILENO, buf, bytes); >> + return 0; >> +} >> diff --git a/samples/seccomp/bpf-fancy.c b/samples/seccomp/bpf-fancy.c >> new file mode 100644 >> index 0000000..1318b1a >> --- /dev/null >> +++ b/samples/seccomp/bpf-fancy.c >> @@ -0,0 +1,95 @@ >> +/* >> + * Seccomp BPF example using a macro-based generator. >> + * >> + * Copyright (c) 2012 The Chromium OS >> Authors<chromium-os-dev@chromium.org> >> + * Author: Will Drewry<wad@chromium.org> >> + * >> + * The code may be used by anyone for any purpose, >> + * and can serve as a starting point for developing >> + * applications using prctl(PR_ATTACH_SECCOMP_FILTER). 
>> + */ >> + >> +#include<linux/seccomp_filter.h> >> +#include<linux/unistd.h> >> +#include<stdio.h> >> +#include<string.h> >> +#include<sys/prctl.h> >> +#include<unistd.h> >> + >> +#include "bpf-helper.h" >> + >> +#ifndef PR_ATTACH_SECCOMP_FILTER >> +# define PR_ATTACH_SECCOMP_FILTER 37 >> +#endif >> + >> +int main(int argc, char **argv) >> +{ >> + struct bpf_labels l; >> + static const char msg1[] = "Please type something: "; >> + static const char msg2[] = "You typed: "; >> + char buf[256]; >> + struct seccomp_filter_block filter[] = { >> + LOAD_SYSCALL_NR, >> + SYSCALL(__NR_exit, ALLOW), >> + SYSCALL(__NR_exit_group, ALLOW), >> + SYSCALL(__NR_write, JUMP(&l, write_fd)), >> + SYSCALL(__NR_read, JUMP(&l, read)), >> + DENY, /* Don't passthrough into a label */ >> + >> + LABEL(&l, read), >> + ARG(0), >> + JNE(STDIN_FILENO, DENY), >> + ARG(1), >> + JNE((unsigned long)buf, DENY), >> + ARG(2), >> + JGE(sizeof(buf), DENY), >> + ALLOW, >> + >> + LABEL(&l, write_fd), >> + ARG(0), >> + JEQ(STDOUT_FILENO, JUMP(&l, write_buf)), >> + JEQ(STDERR_FILENO, JUMP(&l, write_buf)), >> + DENY, >> + >> + LABEL(&l, write_buf), >> + ARG(1), >> + JEQ((unsigned long)msg1, JUMP(&l, msg1_len)), >> + JEQ((unsigned long)msg2, JUMP(&l, msg2_len)), >> + JEQ((unsigned long)buf, JUMP(&l, buf_len)), >> + DENY, >> + >> + LABEL(&l, msg1_len), >> + ARG(2), >> + JLT(sizeof(msg1), ALLOW), >> + DENY, >> + >> + LABEL(&l, msg2_len), >> + ARG(2), >> + JLT(sizeof(msg2), ALLOW), >> + DENY, >> + >> + LABEL(&l, buf_len), >> + ARG(2), >> + JLT(sizeof(buf), ALLOW), >> + DENY, >> + }; >> + struct seccomp_fprog prog = { >> + .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])), >> + .filter = filter, >> + }; >> + ssize_t bytes; >> + bpf_resolve_jumps(&l, filter, sizeof(filter)/sizeof(*filter)); >> + >> + if (prctl(PR_ATTACH_SECCOMP_FILTER,&prog)) { >> + perror("prctl"); >> + return 1; >> + } >> + syscall(__NR_write, STDOUT_FILENO, msg1, strlen(msg1)); >> + bytes = syscall(__NR_read, STDIN_FILENO, buf, 
sizeof(buf)-1); >> + bytes = (bytes> 0 ? bytes : 0); >> + syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2)); >> + syscall(__NR_write, STDERR_FILENO, buf, bytes); >> + /* Now get killed */ >> + syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2)+2); >> + return 0; >> +} >> diff --git a/samples/seccomp/bpf-helper.c b/samples/seccomp/bpf-helper.c >> new file mode 100644 >> index 0000000..e1b6bc7 >> --- /dev/null >> +++ b/samples/seccomp/bpf-helper.c >> @@ -0,0 +1,89 @@ >> +/* >> + * Seccomp BPF helper functions >> + * >> + * Copyright (c) 2012 The Chromium OS >> Authors<chromium-os-dev@chromium.org> >> + * Author: Will Drewry<wad@chromium.org> >> + * >> + * The code may be used by anyone for any purpose, >> + * and can serve as a starting point for developing >> + * applications using prctl(PR_ATTACH_SECCOMP_FILTER). >> + */ >> + >> +#include<stdio.h> >> +#include<string.h> >> + >> +#include "bpf-helper.h" >> + >> +int bpf_resolve_jumps(struct bpf_labels *labels, >> + struct seccomp_filter_block *filter, size_t count) >> +{ >> + struct seccomp_filter_block *begin = filter; >> + __u8 insn = count - 1; >> + >> + if (count< 1) >> + return -1; >> + /* >> + * Walk it once, backwards, to build the label table and do fixups. >> + * Since backward jumps are disallowed by BPF, this is easy. 
>> + */ >> + filter += insn; >> + for (; filter>= begin; --insn, --filter) { >> + if (filter->code != (BPF_JMP+BPF_JA)) >> + continue; >> + switch ((filter->jt<<8)|filter->jf) { >> + case (JUMP_JT<<8)|JUMP_JF: >> + if (labels->labels[filter->k].location == >> 0xffffffff) { >> + fprintf(stderr, "Unresolved label: >> '%s'\n", >> + labels->labels[filter->k].label); >> + return 1; >> + } >> + filter->k = labels->labels[filter->k].location - >> + (insn + 1); >> + filter->jt = 0; >> + filter->jf = 0; >> + continue; >> + case (LABEL_JT<<8)|LABEL_JF: >> + if (labels->labels[filter->k].location != >> 0xffffffff) { >> + fprintf(stderr, "Duplicate label use: >> '%s'\n", >> + labels->labels[filter->k].label); >> + return 1; >> + } >> + labels->labels[filter->k].location = insn; >> + filter->k = 0; /* fall through */ >> + filter->jt = 0; >> + filter->jf = 0; >> + continue; >> + } >> + } >> + return 0; >> +} >> + >> +/* Simple lookup table for labels. */ >> +__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label) >> +{ >> + struct __bpf_label *begin = labels->labels, *end; >> + int id; >> + if (labels->count == 0) { >> + begin->label = label; >> + begin->location = 0xffffffff; >> + labels->count++; >> + return 0; >> + } >> + end = begin + labels->count; >> + for (id = 0; begin< end; ++begin, ++id) { >> + if (!strcmp(label, begin->label)) >> + return id; >> + } >> + begin->label = label; >> + begin->location = 0xffffffff; >> + labels->count++; >> + return id; >> +} >> + >> +void seccomp_bpf_print(struct seccomp_filter_block *filter, size_t count) >> +{ >> + struct seccomp_filter_block *end = filter + count; >> + for ( ; filter< end; ++filter) >> + printf("{ code=%u,jt=%u,jf=%u,k=%u },\n", >> + filter->code, filter->jt, filter->jf, filter->k); >> +} >> diff --git a/samples/seccomp/bpf-helper.h b/samples/seccomp/bpf-helper.h >> new file mode 100644 >> index 0000000..92b94ec >> --- /dev/null >> +++ b/samples/seccomp/bpf-helper.h >> @@ -0,0 +1,219 @@ >> +/* >> + * 
Example wrapper around BPF macros. >> + * >> + * Copyright (c) 2012 The Chromium OS >> Authors<chromium-os-dev@chromium.org> >> + * Author: Will Drewry<wad@chromium.org> >> + * >> + * The code may be used by anyone for any purpose, >> + * and can serve as a starting point for developing >> + * applications using prctl(PR_ATTACH_SECCOMP_FILTER). >> + * >> + * No guarantees are provided with respect to the correctness >> + * or functionality of this code. >> + */ >> +#ifndef __BPF_HELPER_H__ >> +#define __BPF_HELPER_H__ >> + >> +#include<asm/bitsperlong.h> /* for __BITS_PER_LONG */ >> +#include<linux/filter.h> >> +#include<linux/seccomp_filter.h> /* for seccomp_filter_data.arg */ >> +#include<linux/types.h> >> +#include<linux/unistd.h> >> +#include<stddef.h> >> + >> +#define BPF_LABELS_MAX 256 >> +struct bpf_labels { >> + int count; >> + struct __bpf_label { >> + const char *label; >> + __u32 location; >> + } labels[BPF_LABELS_MAX]; >> +}; >> + >> +int bpf_resolve_jumps(struct bpf_labels *labels, >> + struct seccomp_filter_block *filter, size_t count); >> +__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label); >> +void seccomp_bpf_print(struct seccomp_filter_block *filter, size_t >> count); >> + >> +#define JUMP_JT 0xff >> +#define JUMP_JF 0xff >> +#define LABEL_JT 0xfe >> +#define LABEL_JF 0xfe >> + >> +#define ALLOW \ >> + BPF_STMT(BPF_RET+BPF_K, 0xFFFFFFFF) >> +#define DENY \ >> + BPF_STMT(BPF_RET+BPF_K, 0) >> +#define JUMP(labels, label) \ >> + BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \ >> + JUMP_JT, JUMP_JF) >> +#define LABEL(labels, label) \ >> + BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \ >> + LABEL_JT, LABEL_JF) >> +#define SYSCALL(nr, jt) \ >> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (nr), 0, 1), \ >> + jt >> + >> +/* Lame, but just an example */ >> +#define FIND_LABEL(labels, label) seccomp_bpf_label((labels), #label) >> + >> +#define EXPAND(...) 
__VA_ARGS__ >> +/* Map all width-sensitive operations */ >> +#if __BITS_PER_LONG == 32 >> + >> +#define JEQ(x, jt) JEQ32(x, EXPAND(jt)) >> +#define JNE(x, jt) JNE32(x, EXPAND(jt)) >> +#define JGT(x, jt) JGT32(x, EXPAND(jt)) >> +#define JLT(x, jt) JLT32(x, EXPAND(jt)) >> +#define JGE(x, jt) JGE32(x, EXPAND(jt)) >> +#define JLE(x, jt) JLE32(x, EXPAND(jt)) >> +#define JA(x, jt) JA32(x, EXPAND(jt)) >> +#define ARG(i) ARG_32(i) >> + >> +#elif __BITS_PER_LONG == 64 >> + >> +#define JEQ(x, jt) \ >> + JEQ64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ >> + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ >> + EXPAND(jt)) >> +#define JGT(x, jt) \ >> + JGT64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ >> + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ >> + EXPAND(jt)) >> +#define JGE(x, jt) \ >> + JGE64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ >> + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ >> + EXPAND(jt)) >> +#define JNE(x, jt) \ >> + JNE64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ >> + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ >> + EXPAND(jt)) >> +#define JLT(x, jt) \ >> + JLT64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ >> + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ >> + EXPAND(jt)) >> +#define JLE(x, jt) \ >> + JLE64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ >> + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ >> + EXPAND(jt)) >> + >> +#define JA(x, jt) \ >> + JA64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \ >> + ((union seccomp_filter_arg){.u64 = (x)}).hi32, \ >> + EXPAND(jt)) >> +#define ARG(i) ARG_64(i) >> + >> +#else >> +#error __BITS_PER_LONG value unusable. 
>> +#endif >> + >> +/* Loads the arg into A */ >> +#define ARG_32(idx) \ >> + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \ >> + offsetof(struct seccomp_filter_data, args[(idx)].lo32)) >> + >> +/* Loads hi into A and lo in X */ >> +#define ARG_64(idx) \ >> + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \ >> + offsetof(struct seccomp_filter_data, args[(idx)].lo32)), \ >> + BPF_STMT(BPF_ST, 0), /* lo -> M[0] */ \ >> + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \ >> + offsetof(struct seccomp_filter_data, args[(idx)].hi32)), \ >> + BPF_STMT(BPF_ST, 1) /* hi -> M[1] */ >> + >> +#define JEQ32(value, jt) \ >> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 0, 1), \ >> + jt >> + >> +#define JNE32(value, jt) \ >> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 1, 0), \ >> + jt >> + >> +/* Checks the lo, then swaps to check the hi. A=lo,X=hi */ >> +#define JEQ64(lo, hi, jt) \ >> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \ >> + BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ >> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 0, 2), \ >> + BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ >> + jt, \ >> + BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ >> + >> +#define JNE64(lo, hi, jt) \ >> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 5, 0), \ >> + BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ >> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 2, 0), \ >> + BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ >> + jt, \ >> + BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ >> + >> +#define JA32(value, jt) \ >> + BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (value), 0, 1), \ >> + jt >> + >> +#define JA64(lo, hi, jt) \ >> + BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (hi), 3, 0), \ >> + BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ >> + BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (lo), 0, 2), \ >> + BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ >> + jt, \ >> + BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ >> + >> +#define JGE32(value, jt) \ >> + BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 0, 1), \ >> + jt >> + 
>> +#define JLT32(value, jt) \ >> + BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 1, 0), \ >> + jt >> + >> +/* Shortcut checking if hi> arg.hi. */ >> +#define JGE64(lo, hi, jt) \ >> + BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 4, 0), \ >> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \ >> + BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ >> + BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (lo), 0, 2), \ >> + BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ >> + jt, \ >> + BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ >> + >> +#define JLT64(lo, hi, jt) \ >> + BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (hi), 0, 4), \ >> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \ >> + BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ >> + BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 2, 0), \ >> + BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ >> + jt, \ >> + BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ >> + >> +#define JGT32(value, jt) \ >> + BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \ >> + jt >> + >> +#define JLE32(value, jt) \ >> + BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \ >> + jt > > > Should the true/false offsets be reversed here? Looks that way :) > Thanks for all the work on this. We're looking forward to using it with > QEMU. Definitely - thanks! will ^ permalink raw reply [flat|nested] 13+ messages in thread
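The JLE32 question raised above can be checked in userspace. What follows is a minimal sketch, not part of the posted patch. It assumes the uapi header `<linux/filter.h>` is available for `struct sock_filter`, `BPF_STMT` and `BPF_JUMP`. The names `JLE32_POSTED`, `JLE32_FIXED`, `RET_ALLOW`, `RET_DENY`, `run_filter`, `posted_allows` and `fixed_allows` are hypothetical and exist only for this illustration. The evaluator handles just the two opcodes used here and takes the accumulator as a parameter instead of loading it from `seccomp_filter_data`.

```c
#include <linux/filter.h>   /* struct sock_filter, BPF_STMT, BPF_JUMP, opcodes */
#include <stddef.h>
#include <stdint.h>

/* JLE32 as posted: same offsets as JGT32, so jt runs when A > value. */
#define JLE32_POSTED(value, jt) \
    BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \
    jt

/* Assumed fix: swap the offsets so jt is skipped when A > value. */
#define JLE32_FIXED(value, jt) \
    BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 1, 0), \
    jt

#define RET_ALLOW BPF_STMT(BPF_RET+BPF_K, 0xFFFFFFFF)
#define RET_DENY  BPF_STMT(BPF_RET+BPF_K, 0)

/*
 * Hypothetical evaluator for the two opcodes used in this sketch:
 * JGT against an immediate, and RET of an immediate. The caller
 * supplies the accumulator A directly.
 */
uint32_t run_filter(const struct sock_filter *f, size_t len, uint32_t a)
{
    size_t pc = 0;

    while (pc < len) {
        const struct sock_filter *insn = &f[pc++];

        switch (insn->code) {
        case BPF_JMP+BPF_JGT+BPF_K:
            /* Offsets are relative to the next instruction. */
            pc += (a > insn->k) ? insn->jt : insn->jf;
            break;
        case BPF_RET+BPF_K:
            return insn->k;
        default:
            return 0;   /* unknown opcode: treat as deny */
        }
    }
    return 0;           /* ran off the end: treat as deny */
}

/* "Allow if A <= 10, else deny", built with the posted macro. */
int posted_allows(uint32_t a)
{
    const struct sock_filter prog[] = { JLE32_POSTED(10, RET_ALLOW), RET_DENY };

    return run_filter(prog, sizeof(prog) / sizeof(prog[0]), a) != 0;
}

/* The same policy, built with the swapped offsets. */
int fixed_allows(uint32_t a)
{
    const struct sock_filter prog[] = { JLE32_FIXED(10, RET_ALLOW), RET_DENY };

    return run_filter(prog, sizeof(prog) / sizeof(prog[0]), a) != 0;
}
```

Under these assumptions, `posted_allows()` accepts values greater than 10 and rejects smaller ones, which is the behaviour of JGT32. `fixed_allows()` accepts 10 and below, matching how JNE32 and JLT32 invert JEQ32 and JGE32.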
* Re: [PATCH v6 1/3] seccomp: kill the seccomp_t typedef 2012-01-28 22:11 [PATCH v6 1/3] seccomp: kill the seccomp_t typedef Will Drewry 2012-01-28 22:11 ` [PATCH v6 2/3] seccomp_filters: system call filtering using BPF Will Drewry 2012-01-28 22:11 ` [PATCH v6 3/3] Documentation: prctl/seccomp_filter Will Drewry @ 2012-02-02 15:29 ` Serge E. Hallyn 2012-02-03 23:16 ` Will Drewry 2 siblings, 1 reply; 13+ messages in thread From: Serge E. Hallyn @ 2012-02-02 15:29 UTC (permalink / raw) To: Will Drewry Cc: linux-kernel, keescook, john.johansen, coreyb, pmoore, eparis, djm, torvalds, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro, luto, mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor, corbet, alan, indan, mcgrathr Quoting Will Drewry (wad@chromium.org): > Replaces the seccomp_t typedef with seccomp_struct to match modern > kernel style. (sorry, I'm a bit behind on list) You were going to switch this to 'struct seccomp' right? 
* Re: [PATCH v6 1/3] seccomp: kill the seccomp_t typedef 2012-02-02 15:29 ` [PATCH v6 1/3] seccomp: kill the seccomp_t typedef Serge E. Hallyn @ 2012-02-03 23:16 ` Will Drewry 2012-02-04 1:05 ` Linus Torvalds 0 siblings, 1 reply; 13+ messages in thread From: Will Drewry @ 2012-02-03 23:16 UTC (permalink / raw) To: Serge E. Hallyn Cc: linux-kernel, keescook, john.johansen, coreyb, pmoore, eparis, djm, torvalds, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro, luto, mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor, corbet, alan, indan, mcgrathr On Thu, Feb 2, 2012 at 7:29 AM, Serge E. Hallyn <serge.hallyn@canonical.com> wrote: > Quoting Will Drewry (wad@chromium.org): >> Replaces the seccomp_t typedef with seccomp_struct to match modern >> kernel style. > > (sorry, I'm a bit behind on list) > > You were going to switch this to 'struct seccomp' right? I wasn't sure if task_struct { ... struct seccomp seccomp; } was as ideal. I've noticed that almost all of the duplicate names in the task struct use redundancy to differentiate the naming, but I'm happy enough to rename if appropriate.
* Re: [PATCH v6 1/3] seccomp: kill the seccomp_t typedef 2012-02-03 23:16 ` Will Drewry @ 2012-02-04 1:05 ` Linus Torvalds 2012-02-06 16:13 ` Will Drewry 0 siblings, 1 reply; 13+ messages in thread From: Linus Torvalds @ 2012-02-04 1:05 UTC (permalink / raw) To: Will Drewry Cc: Serge E. Hallyn, linux-kernel, keescook, john.johansen, coreyb, pmoore, eparis, djm, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro, luto, mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor, corbet, alan, indan, mcgrathr On Fri, Feb 3, 2012 at 3:16 PM, Will Drewry <wad@chromium.org> wrote: > > task_struct { > ... > struct seccomp seccomp; > } > > was as ideal. I've noticed that almost all of the duplicate names in > the task struct use redundancy to differentiate the naming, but I'm > happy enough to rename if appropriate. The redundant "struct xyz_struct" naming is traditional, but we try to avoid it these days. The reason for it is that I long long ago was a bit confused about the C namespace rules, so for the longest time I made struct names unique for no really good reason. The struct/union namespace is separate from the other namespaces, so trying to make things unique really has no good reason. And obviously "struct task_struct" is one of those very old things, and then the "struct xyz_struct" naming kind of spread from there. I think "struct seccomp" is fine, and even if "struct x x" looks a bit odd, it's at least _less_ repetition than "struct x_struct x" which is just really repetitive. That said, just to make "grep" easier, please do the whole "struct xyz" always together, and always with just a single space in between them, so that git grep "struct xyz" does the right thing. And for the same reason, when declaring a struct, people should always use "struct xyz {", with that exact spacing. 
The exact details of spacing obviously have no semantic meaning, but making it easy to grep for use and for definition is really convenient. Linus
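The rename discussed above can be sketched as follows. This is an illustration under the assumption that only the struct tag changes, not the patch that was eventually posted. `struct task_sketch` and `task_seccomp_mode` are hypothetical stand-ins for the relevant slice of `task_struct`.

```c
/* The typedef replaced by a plain struct, using the shorter tag. */
struct seccomp {
    int mode;
};

/* Accessor from the posted patch, retargeted at the shorter tag. */
static inline int seccomp_mode(struct seccomp *s)
{
    return s->mode;
}

/*
 * Hypothetical stand-in for task_struct. "struct seccomp seccomp" is
 * legal C because struct tags and member names live in separate
 * namespaces, so the tag needs no "_struct" suffix to stay unique.
 */
struct task_sketch {
    struct seccomp seccomp;
};

/* Hypothetical helper showing a use site of the embedded member. */
int task_seccomp_mode(struct task_sketch *t)
{
    return seccomp_mode(&t->seccomp);
}
```

With the spacing kept exactly as written, `git grep "struct seccomp {"` would locate the definition and `git grep "struct seccomp"` would list the uses as well.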
* Re: [PATCH v6 1/3] seccomp: kill the seccomp_t typedef 2012-02-04 1:05 ` Linus Torvalds @ 2012-02-06 16:13 ` Will Drewry 0 siblings, 0 replies; 13+ messages in thread From: Will Drewry @ 2012-02-06 16:13 UTC (permalink / raw) To: Linus Torvalds Cc: Serge E. Hallyn, linux-kernel, keescook, john.johansen, coreyb, pmoore, eparis, djm, segoon, rostedt, jmorris, scarybeasts, avi, penberg, viro, luto, mingo, akpm, khilman, borislav.petkov, amwang, oleg, ak, eric.dumazet, gregkh, dhowells, daniel.lezcano, linux-fsdevel, linux-security-module, olofj, mhalcrow, dlaor, corbet, alan, indan, mcgrathr On Fri, Feb 3, 2012 at 7:05 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Fri, Feb 3, 2012 at 3:16 PM, Will Drewry <wad@chromium.org> wrote: >> >> task_struct { >> ... >> struct seccomp seccomp; >> } >> >> was as ideal. I've noticed that almost all of the duplicate names in >> the task struct use redundancy to differentiate the naming, but I'm >> happy enough to rename if appropriate. > > The redundant "struct xyz_struct" naming is traditional, but we try to > avoid it these days. The reason for it is that I long long ago was a > bit confused about the C namespace rules, so for the longest time I > made struct names unique for no really good reason. The struct/union > namespace is separate from the other namespaces, so trying to make > things unique really has no good reason. > > And obviously "struct task_struct" is one of those very old things, > and then the "struct xyz_struct" naming kind of spread from there. > > I think "struct seccomp" is fine, and even if "struct x x" looks a bit > odd, it's at least _less_ repetition than "struct x_struct x" which is > just really repetitive. > > That said, just to make "grep" easier, please do the whole "struct > xyz" always together, and always with just a single space in between > them, so that > > git grep "struct xyz" > > does the right thing. 
And for the same reason, when declaring a > struct, people should always use "struct xyz {", with that exact > spacing. The exact details of spacing obviously has no semantic > meaning, but making it easy to grep for use and for definition is > really convenient. Thanks for the background and explanation! will
Thread overview: 13+ messages
2012-01-28 22:11 [PATCH v6 1/3] seccomp: kill the seccomp_t typedef Will Drewry
2012-01-28 22:11 ` [PATCH v6 2/3] seccomp_filters: system call filtering using BPF Will Drewry
2012-01-31 14:13 ` Eduardo Otubo
2012-01-31 15:20 ` Will Drewry
2012-02-02 15:32 ` Serge E. Hallyn
2012-02-03 23:14 ` Will Drewry
2012-01-28 22:11 ` [PATCH v6 3/3] Documentation: prctl/seccomp_filter Will Drewry
2012-01-30 22:47 ` Corey Bryant
2012-01-30 22:52 ` Will Drewry
2012-02-02 15:29 ` [PATCH v6 1/3] seccomp: kill the seccomp_t typedef Serge E. Hallyn
2012-02-03 23:16 ` Will Drewry
2012-02-04 1:05 ` Linus Torvalds
2012-02-06 16:13 ` Will Drewry