* [PATCH v2 0/4] seccomp trap to userspace
@ 2018-05-17 15:12 Tycho Andersen
2018-05-17 15:12 ` [PATCH v2 1/4] seccomp: add a return code to " Tycho Andersen
` (4 more replies)
0 siblings, 5 replies; 20+ messages in thread
From: Tycho Andersen @ 2018-05-17 15:12 UTC (permalink / raw)
To: linux-kernel, containers
Cc: Kees Cook, Andy Lutomirski, Oleg Nesterov, Eric W . Biederman,
Serge E . Hallyn, Christian Brauner, Tyler Hicks, Akihiro Suda,
Tobin C . Harding, Tycho Andersen
Hi,
After a while focusing on other things, I finally managed ot get a v2 of
this series prepared. I believe I've addressed all the feedback from v1,
except for one major point: switching the communication protocol over
the fd to nlattr. I looked into doing this, but the kernel stuff for
dealing with nlattr seems to require an skb (via nlmsg_{new,put} and
netlink_unicast), which means we need to deal with the netlink sequence
numbers, portids, and create a socket protocol. I can do this if we
still think nlattr is necessary, but based on looking at it, it seems
like a lot of extra code for no real benefit.
I've also added support for passing fds. The code itself is simple, but
the API could/should probably be different, see patch 4 for discussion.
Tycho
Tycho Andersen (4):
seccomp: add a return code to trap to userspace
seccomp: make get_nth_filter available outside of CHECKPOINT_RESTORE
seccomp: add a way to get a listener fd from ptrace
seccomp: add support for passing fds via USER_NOTIF
arch/Kconfig | 7 +
include/linux/seccomp.h | 14 +-
include/uapi/linux/ptrace.h | 2 +
include/uapi/linux/seccomp.h | 20 +-
kernel/ptrace.c | 4 +
kernel/seccomp.c | 480 +++++++++++++++++-
tools/testing/selftests/seccomp/seccomp_bpf.c | 359 ++++++++++++-
7 files changed, 878 insertions(+), 8 deletions(-)
--
2.17.0
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v2 1/4] seccomp: add a return code to trap to userspace
2018-05-17 15:12 [PATCH v2 0/4] seccomp trap to userspace Tycho Andersen
@ 2018-05-17 15:12 ` Tycho Andersen
2018-05-17 15:33 ` Oleg Nesterov
` (3 more replies)
2018-05-17 15:12 ` [PATCH v2 2/4] seccomp: make get_nth_filter available outside of CHECKPOINT_RESTORE Tycho Andersen
` (3 subsequent siblings)
4 siblings, 4 replies; 20+ messages in thread
From: Tycho Andersen @ 2018-05-17 15:12 UTC (permalink / raw)
To: linux-kernel, containers
Cc: Kees Cook, Andy Lutomirski, Oleg Nesterov, Eric W . Biederman,
Serge E . Hallyn, Christian Brauner, Tyler Hicks, Akihiro Suda,
Tobin C . Harding, Tycho Andersen
This patch introduces a means for syscalls matched in seccomp to notify
some other task that a particular filter has been triggered.
The motivation for this is primarily for use with containers. For example,
if a container does an init_module(), we obviously don't want to load this
untrusted code, which may be compiled for the wrong version of the kernel
anyway. Instead, we could parse the module image, figure out which module
the container is trying to load and load it on the host.
As another example, containers cannot mknod(), since this checks
capable(CAP_SYS_ADMIN). However, harmless devices like /dev/null or
/dev/zero should be ok for containers to mknod, but we'd like to avoid hard
coding some whitelist in the kernel. Another example is mount(), which has
many security restrictions for good reason, but configuration or runtime
knowledge could potentially be used to relax these restrictions.
This patch adds functionality that is already possible via at least two
other means that I know about, both of which involve ptrace(): first, one
could ptrace attach, and then iterate through syscalls via PTRACE_SYSCALL.
Unfortunately this is slow, so a faster version would be to install a
filter that does SECCOMP_RET_TRACE, which triggers a PTRACE_EVENT_SECCOMP.
Since ptrace allows only one tracer, if the container runtime is that
tracer, users inside the container (or outside) trying to debug it will not
be able to use ptrace, which is annoying. It also means that older
distributions based on Upstart cannot boot inside containers using ptrace,
since upstart itself uses ptrace to start services.
The actual implementation of this is fairly small, although getting the
synchronization right was/is slightly complex.
Finally, it's worth noting that the classic seccomp TOCTOU of reading
memory data from the task still applies here, but can be avoided with
careful design of the userspace handler: if the userspace handler reads all
of the task memory that is necessary before applying its security policy,
the tracee's subsequent memory edits will not be read by the tracer.
v2: * make id a u64; the idea here being that it will never overflow,
because 64 is huge (one syscall every nanosecond => wrap every 584
years)
* prevent nesting of user notifications: if someone is already attached
the tree in one place, nobody else can attach to the tree
* notify the listener of signals the tracee receives as well
* implement poll
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
CC: Kees Cook <keescook@chromium.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: "Serge E. Hallyn" <serge@hallyn.com>
CC: Christian Brauner <christian.brauner@ubuntu.com>
CC: Tyler Hicks <tyhicks@canonical.com>
CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
---
arch/Kconfig | 7 +
include/linux/seccomp.h | 3 +-
include/uapi/linux/seccomp.h | 18 +-
kernel/seccomp.c | 402 +++++++++++++++++-
tools/testing/selftests/seccomp/seccomp_bpf.c | 181 +++++++-
5 files changed, 605 insertions(+), 6 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 8e0d665c8d53..dd99eef3e049 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -401,6 +401,13 @@ config SECCOMP_FILTER
See Documentation/prctl/seccomp_filter.txt for details.
+config SECCOMP_USER_NOTIFICATION
+ bool "Enable the SECCOMP_RET_USER_NOTIF seccomp action"
+ depends on SECCOMP_FILTER
+ help
+ Enable SECCOMP_RET_USER_NOTIF, a return code which can be used by seccomp
+ programs to notify a userspace listener that a particular event happened.
+
config HAVE_GCC_PLUGINS
bool
help
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index c723a5c4e3ff..0fd3e0676a1c 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -5,7 +5,8 @@
#include <uapi/linux/seccomp.h>
#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | \
- SECCOMP_FILTER_FLAG_LOG)
+ SECCOMP_FILTER_FLAG_LOG | \
+ SECCOMP_FILTER_FLAG_GET_LISTENER)
#ifdef CONFIG_SECCOMP
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 2a0bd9dd104d..8160e6cad528 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -17,8 +17,9 @@
#define SECCOMP_GET_ACTION_AVAIL 2
/* Valid flags for SECCOMP_SET_MODE_FILTER */
-#define SECCOMP_FILTER_FLAG_TSYNC 1
-#define SECCOMP_FILTER_FLAG_LOG 2
+#define SECCOMP_FILTER_FLAG_TSYNC 1
+#define SECCOMP_FILTER_FLAG_LOG 2
+#define SECCOMP_FILTER_FLAG_GET_LISTENER 4
/*
* All BPF programs must return a 32-bit value.
@@ -34,6 +35,7 @@
#define SECCOMP_RET_KILL SECCOMP_RET_KILL_THREAD
#define SECCOMP_RET_TRAP 0x00030000U /* disallow and force a SIGSYS */
#define SECCOMP_RET_ERRNO 0x00050000U /* returns an errno */
+#define SECCOMP_RET_USER_NOTIF 0x7fc00000U /* notifies userspace */
#define SECCOMP_RET_TRACE 0x7ff00000U /* pass to a tracer or disallow */
#define SECCOMP_RET_LOG 0x7ffc0000U /* allow after logging */
#define SECCOMP_RET_ALLOW 0x7fff0000U /* allow */
@@ -59,4 +61,16 @@ struct seccomp_data {
__u64 args[6];
};
+struct seccomp_notif {
+ __u64 id;
+ pid_t pid;
+ struct seccomp_data data;
+};
+
+struct seccomp_notif_resp {
+ __u64 id;
+ __s32 error;
+ __s64 val;
+};
+
#endif /* _UAPI_LINUX_SECCOMP_H */
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index dc77548167ef..a169a62cb78a 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -38,6 +38,53 @@
#include <linux/tracehook.h>
#include <linux/uaccess.h>
+#ifdef CONFIG_SECCOMP_USER_NOTIFICATION
+#include <linux/file.h>
+#include <linux/anon_inodes.h>
+
+enum notify_state {
+ SECCOMP_NOTIFY_INIT,
+ SECCOMP_NOTIFY_SENT,
+ SECCOMP_NOTIFY_REPLIED,
+};
+
+struct seccomp_knotif {
+ /* The pid whose filter triggered the notification */
+ pid_t pid;
+
+ /*
+ * The "cookie" for this request; this is unique for this filter.
+ */
+ u32 id;
+
+ /*
+ * The seccomp data. This pointer is valid the entire time this
+ * notification is active, since it comes from __seccomp_filter which
+ * eclipses the entire lifecycle here.
+ */
+ const struct seccomp_data *data;
+
+ /*
+ * Notification states. When SECCOMP_RET_USER_NOTIF is returned, a
+ * struct seccomp_knotif is created and starts out in INIT. Once the
+ * handler reads the notification off of an FD, it transitions to READ.
+ * If a signal is received the state transitions back to INIT and
+ * another message is sent. When the userspace handler replies, state
+ * transitions to REPLIED.
+ */
+ enum notify_state state;
+
+ /* The return values, only valid when in SECCOMP_NOTIFY_REPLIED */
+ int error;
+ long val;
+
+ /* Signals when this has entered SECCOMP_NOTIFY_REPLIED */
+ struct completion ready;
+
+ struct list_head list;
+};
+#endif
+
/**
* struct seccomp_filter - container for seccomp BPF programs
*
@@ -64,6 +111,27 @@ struct seccomp_filter {
bool log;
struct seccomp_filter *prev;
struct bpf_prog *prog;
+
+#ifdef CONFIG_SECCOMP_USER_NOTIFICATION
+ /*
+ * A semaphore that users of this notification can wait on for
+ * changes. Actual reads and writes are still controlled with
+ * filter->notify_lock.
+ */
+ struct semaphore request;
+
+ /* A lock for all notification-related accesses. */
+ struct mutex notify_lock;
+
+ /* Is there currently an attached listener? */
+ bool has_listener;
+
+ /* The id of the next request. */
+ u64 next_id;
+
+ /* A list of struct seccomp_knotif elements. */
+ struct list_head notifications;
+#endif
};
/* Limit any path through the tree to 256KB worth of instructions. */
@@ -383,6 +451,13 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
if (!sfilter)
return ERR_PTR(-ENOMEM);
+#ifdef CONFIG_SECCOMP_USER_NOTIFICATION
+ mutex_init(&sfilter->notify_lock);
+ sema_init(&sfilter->request, 0);
+ INIT_LIST_HEAD(&sfilter->notifications);
+ sfilter->next_id = get_random_u64();
+#endif
+
ret = bpf_prog_create_from_user(&sfilter->prog, fprog,
seccomp_check_filter, save_orig);
if (ret < 0) {
@@ -547,13 +622,15 @@ static void seccomp_send_sigsys(int syscall, int reason)
#define SECCOMP_LOG_TRACE (1 << 4)
#define SECCOMP_LOG_LOG (1 << 5)
#define SECCOMP_LOG_ALLOW (1 << 6)
+#define SECCOMP_LOG_USER_NOTIF (1 << 7)
static u32 seccomp_actions_logged = SECCOMP_LOG_KILL_PROCESS |
SECCOMP_LOG_KILL_THREAD |
SECCOMP_LOG_TRAP |
SECCOMP_LOG_ERRNO |
SECCOMP_LOG_TRACE |
- SECCOMP_LOG_LOG;
+ SECCOMP_LOG_LOG |
+ SECCOMP_LOG_USER_NOTIF;
static inline void seccomp_log(unsigned long syscall, long signr, u32 action,
bool requested)
@@ -572,6 +649,9 @@ static inline void seccomp_log(unsigned long syscall, long signr, u32 action,
case SECCOMP_RET_TRACE:
log = requested && seccomp_actions_logged & SECCOMP_LOG_TRACE;
break;
+ case SECCOMP_RET_USER_NOTIF:
+ log = requested && seccomp_actions_logged & SECCOMP_LOG_USER_NOTIF;
+ break;
case SECCOMP_RET_LOG:
log = seccomp_actions_logged & SECCOMP_LOG_LOG;
break;
@@ -645,6 +725,91 @@ void secure_computing_strict(int this_syscall)
}
#else
+#ifdef CONFIG_SECCOMP_USER_NOTIFICATION
+static u64 seccomp_next_notify_id(struct seccomp_filter *filter)
+{
+ u64 ret = filter->next_id;
+
+ /* Note: overflow is ok here, the id just needs to be unique */
+ filter->next_id++;
+
+ return ret;
+}
+
+static void seccomp_do_user_notification(int this_syscall,
+ struct seccomp_filter *match,
+ const struct seccomp_data *sd)
+{
+ int err;
+ long ret = 0;
+ struct seccomp_knotif n = {};
+
+ mutex_lock(&match->notify_lock);
+ if (!match->has_listener) {
+ err = -ENOSYS;
+ goto out;
+ }
+
+ n.pid = current->pid;
+ n.state = SECCOMP_NOTIFY_INIT;
+ n.data = sd;
+ n.id = seccomp_next_notify_id(match);
+ init_completion(&n.ready);
+
+ list_add(&n.list, &match->notifications);
+
+ mutex_unlock(&match->notify_lock);
+ up(&match->request);
+
+ err = wait_for_completion_interruptible(&n.ready);
+ mutex_lock(&match->notify_lock);
+
+ /*
+ * Here it's possible we got a signal and then had to wait on the mutex
+ * while the reply was sent, so let's be sure there wasn't a response
+ * in the meantime.
+ */
+ if (err < 0 && n.state != SECCOMP_NOTIFY_REPLIED) {
+ /*
+ * We got a signal. Let's tell userspace about it (potentially
+ * again, if we had already notified them about the first one).
+ */
+ if (n.state == SECCOMP_NOTIFY_SENT) {
+ n.state = SECCOMP_NOTIFY_INIT;
+ up(&match->request);
+ }
+ mutex_unlock(&match->notify_lock);
+ err = wait_for_completion_killable(&n.ready);
+ mutex_lock(&match->notify_lock);
+ if (err < 0)
+ goto remove_list;
+ }
+
+ ret = n.val;
+ err = n.error;
+
+ WARN(n.state != SECCOMP_NOTIFY_REPLIED,
+ "notified about write complete when state is not write");
+
+remove_list:
+ list_del(&n.list);
+out:
+ mutex_unlock(&match->notify_lock);
+ syscall_set_return_value(current, task_pt_regs(current),
+ err, ret);
+}
+#else
+static void seccomp_do_user_notification(int this_syscall,
+ u32 action,
+ struct seccomp_filter *match,
+ const struct seccomp_data *sd)
+{
+ WARN(1, "user notification received, but disabled");
+ seccomp_log(this_syscall, SIGSYS, action, true);
+ do_exit(SIGSYS);
+}
+#endif
+
#ifdef CONFIG_SECCOMP_FILTER
static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
const bool recheck_after_trace)
@@ -722,6 +887,9 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
return 0;
+ case SECCOMP_RET_USER_NOTIF:
+ seccomp_do_user_notification(this_syscall, match, sd);
+ goto skip;
case SECCOMP_RET_LOG:
seccomp_log(this_syscall, 0, action, true);
return 0;
@@ -828,6 +996,11 @@ static long seccomp_set_mode_strict(void)
}
#ifdef CONFIG_SECCOMP_FILTER
+#ifdef CONFIG_SECCOMP_USER_NOTIFICATION
+static struct file *init_listener(struct task_struct *,
+ struct seccomp_filter *);
+#endif
+
/**
* seccomp_set_mode_filter: internal function for setting seccomp filter
* @flags: flags to change filter behavior
@@ -847,6 +1020,8 @@ static long seccomp_set_mode_filter(unsigned int flags,
const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
struct seccomp_filter *prepared = NULL;
long ret = -EINVAL;
+ int listener = 0;
+ struct file *listener_f = NULL;
/* Validate flags. */
if (flags & ~SECCOMP_FILTER_FLAG_MASK)
@@ -857,13 +1032,28 @@ static long seccomp_set_mode_filter(unsigned int flags,
if (IS_ERR(prepared))
return PTR_ERR(prepared);
+ if (flags & SECCOMP_FILTER_FLAG_GET_LISTENER) {
+ listener = get_unused_fd_flags(O_RDWR);
+ if (listener < 0) {
+ ret = listener;
+ goto out_free;
+ }
+
+ listener_f = init_listener(current, prepared);
+ if (IS_ERR(listener_f)) {
+ put_unused_fd(listener);
+ ret = PTR_ERR(listener_f);
+ goto out_free;
+ }
+ }
+
/*
* Make sure we cannot change seccomp or nnp state via TSYNC
* while another thread is in the middle of calling exec.
*/
if (flags & SECCOMP_FILTER_FLAG_TSYNC &&
mutex_lock_killable(¤t->signal->cred_guard_mutex))
- goto out_free;
+ goto out_put_fd;
spin_lock_irq(¤t->sighand->siglock);
@@ -881,6 +1071,16 @@ static long seccomp_set_mode_filter(unsigned int flags,
spin_unlock_irq(¤t->sighand->siglock);
if (flags & SECCOMP_FILTER_FLAG_TSYNC)
mutex_unlock(¤t->signal->cred_guard_mutex);
+out_put_fd:
+ if (flags & SECCOMP_FILTER_FLAG_GET_LISTENER) {
+ if (ret < 0) {
+ fput(listener_f);
+ put_unused_fd(listener);
+ } else {
+ fd_install(listener, listener_f);
+ ret = listener;
+ }
+ }
out_free:
seccomp_filter_free(prepared);
return ret;
@@ -909,6 +1109,9 @@ static long seccomp_get_action_avail(const char __user *uaction)
case SECCOMP_RET_LOG:
case SECCOMP_RET_ALLOW:
break;
+ case SECCOMP_RET_USER_NOTIF:
+ if (IS_ENABLED(CONFIG_SECCOMP_USER_NOTIFICATION))
+ break;
default:
return -EOPNOTSUPP;
}
@@ -1105,6 +1308,7 @@ long seccomp_get_metadata(struct task_struct *task,
#define SECCOMP_RET_KILL_THREAD_NAME "kill_thread"
#define SECCOMP_RET_TRAP_NAME "trap"
#define SECCOMP_RET_ERRNO_NAME "errno"
+#define SECCOMP_RET_USER_NOTIF_NAME "user_notif"
#define SECCOMP_RET_TRACE_NAME "trace"
#define SECCOMP_RET_LOG_NAME "log"
#define SECCOMP_RET_ALLOW_NAME "allow"
@@ -1114,6 +1318,7 @@ static const char seccomp_actions_avail[] =
SECCOMP_RET_KILL_THREAD_NAME " "
SECCOMP_RET_TRAP_NAME " "
SECCOMP_RET_ERRNO_NAME " "
+ SECCOMP_RET_USER_NOTIF_NAME " "
SECCOMP_RET_TRACE_NAME " "
SECCOMP_RET_LOG_NAME " "
SECCOMP_RET_ALLOW_NAME;
@@ -1131,6 +1336,7 @@ static const struct seccomp_log_name seccomp_log_names[] = {
{ SECCOMP_LOG_TRACE, SECCOMP_RET_TRACE_NAME },
{ SECCOMP_LOG_LOG, SECCOMP_RET_LOG_NAME },
{ SECCOMP_LOG_ALLOW, SECCOMP_RET_ALLOW_NAME },
+ { SECCOMP_LOG_USER_NOTIF, SECCOMP_RET_USER_NOTIF_NAME },
{ }
};
@@ -1279,3 +1485,195 @@ static int __init seccomp_sysctl_init(void)
device_initcall(seccomp_sysctl_init)
#endif /* CONFIG_SYSCTL */
+
+#ifdef CONFIG_SECCOMP_USER_NOTIFICATION
+static int seccomp_notify_release(struct inode *inode, struct file *file)
+{
+ struct seccomp_filter *filter = file->private_data;
+ struct seccomp_knotif *knotif;
+
+ mutex_lock(&filter->notify_lock);
+
+ /*
+ * If this file is being closed because e.g. the task who owned it
+ * died, let's wake everyone up who was waiting on us.
+ */
+ list_for_each_entry(knotif, &filter->notifications, list) {
+ if (knotif->state == SECCOMP_NOTIFY_REPLIED)
+ continue;
+
+ knotif->state = SECCOMP_NOTIFY_REPLIED;
+ knotif->error = -ENOSYS;
+ knotif->val = 0;
+
+ complete(&knotif->ready);
+ }
+
+ filter->has_listener = false;
+ mutex_unlock(&filter->notify_lock);
+ __put_seccomp_filter(filter);
+ return 0;
+}
+
+static ssize_t seccomp_notify_read(struct file *f, char __user *buf,
+ size_t size, loff_t *ppos)
+{
+ struct seccomp_filter *filter = f->private_data;
+ struct seccomp_knotif *knotif = NULL, *cur;
+ struct seccomp_notif unotif;
+ ssize_t ret;
+
+ /* No offset reads. */
+ if (*ppos != 0)
+ return -EINVAL;
+
+ ret = down_interruptible(&filter->request);
+ if (ret < 0)
+ return ret;
+
+ mutex_lock(&filter->notify_lock);
+ list_for_each_entry(cur, &filter->notifications, list) {
+ if (cur->state == SECCOMP_NOTIFY_INIT) {
+ knotif = cur;
+ break;
+ }
+ }
+
+ /*
+ * If we didn't find a notification, it could be that the task was
+ * interrupted between the time we were woken and when we were able to
+ * acquire the rw lock. Should we retry here or just -ENOENT? -ENOENT
+ * for now.
+ */
+ if (!knotif) {
+ ret = -ENOENT;
+ goto out;
+ }
+
+ unotif.id = knotif->id;
+ unotif.pid = knotif->pid;
+ unotif.data = *(knotif->data);
+
+ size = min_t(size_t, size, sizeof(struct seccomp_notif));
+ if (copy_to_user(buf, &unotif, size)) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ ret = sizeof(unotif);
+ knotif->state = SECCOMP_NOTIFY_SENT;
+
+out:
+ mutex_unlock(&filter->notify_lock);
+ return ret;
+}
+
+static ssize_t seccomp_notify_write(struct file *file, const char __user *buf,
+ size_t size, loff_t *ppos)
+{
+ struct seccomp_filter *filter = file->private_data;
+ struct seccomp_notif_resp resp = {};
+ struct seccomp_knotif *knotif = NULL;
+ ssize_t ret = -EINVAL;
+
+ /* No partial writes. */
+ if (*ppos != 0)
+ return -EINVAL;
+
+ size = min_t(size_t, size, sizeof(resp));
+ if (copy_from_user(&resp, buf, size))
+ return -EFAULT;
+
+ ret = mutex_lock_interruptible(&filter->notify_lock);
+ if (ret < 0)
+ return ret;
+
+ list_for_each_entry(knotif, &filter->notifications, list) {
+ if (knotif->id == resp.id)
+ break;
+ }
+
+ if (!knotif || knotif->id != resp.id) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /* Allow exactly one reply. */
+ if (knotif->state != SECCOMP_NOTIFY_SENT) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ret = size;
+ knotif->state = SECCOMP_NOTIFY_REPLIED;
+ knotif->error = resp.error;
+ knotif->val = resp.val;
+ complete(&knotif->ready);
+out:
+ mutex_unlock(&filter->notify_lock);
+ return ret;
+}
+
+static __poll_t seccomp_notify_poll(struct file *file,
+ struct poll_table_struct *poll_tab)
+{
+ struct seccomp_filter *filter = file->private_data;
+ __poll_t ret = 0;
+ struct seccomp_knotif *cur;
+
+ ret = mutex_lock_interruptible(&filter->notify_lock);
+ if (ret < 0)
+ return ret;
+
+ list_for_each_entry(cur, &filter->notifications, list) {
+ if (cur->state == SECCOMP_NOTIFY_INIT)
+ ret |= EPOLLIN | EPOLLRDNORM;
+ if (cur->state == SECCOMP_NOTIFY_SENT)
+ ret |= EPOLLOUT | EPOLLWRNORM;
+ }
+
+ mutex_unlock(&filter->notify_lock);
+
+ return ret;
+}
+
+static const struct file_operations seccomp_notify_ops = {
+ .read = seccomp_notify_read,
+ .write = seccomp_notify_write,
+ .poll = seccomp_notify_poll,
+ .release = seccomp_notify_release,
+};
+
+static struct file *init_listener(struct task_struct *task,
+ struct seccomp_filter *filter)
+{
+ struct file *ret = ERR_PTR(-EBUSY);
+ struct seccomp_filter *cur;
+ bool have_listener = false;
+
+ for (cur = task->seccomp.filter; cur; cur = cur->prev) {
+ mutex_lock(&cur->notify_lock);
+ if (cur->has_listener)
+ have_listener = true;
+ }
+
+ if (have_listener)
+ goto out;
+
+ ret = anon_inode_getfile("seccomp notify", &seccomp_notify_ops,
+ filter, O_RDWR);
+ if (IS_ERR(ret))
+ goto out;
+
+
+ /* The file has a reference to it now */
+ __get_seccomp_filter(filter);
+ filter->has_listener = true;
+
+out:
+ for (cur = task->seccomp.filter; cur; cur = cur->prev)
+ mutex_unlock(&cur->notify_lock);
+
+ return ret;
+}
+#endif
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 168c66d74fc5..bb96df66222f 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -40,10 +40,12 @@
#include <sys/fcntl.h>
#include <sys/mman.h>
#include <sys/times.h>
+#include <sys/socket.h>
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
+#include <poll.h>
#include "../kselftest_harness.h"
@@ -150,6 +152,24 @@ struct seccomp_metadata {
};
#endif
+#ifndef SECCOMP_FILTER_FLAG_GET_LISTENER
+#define SECCOMP_FILTER_FLAG_GET_LISTENER 4
+
+#define SECCOMP_RET_USER_NOTIF 0x7fc00000U
+
+struct seccomp_notif {
+ __u64 id;
+ pid_t pid;
+ struct seccomp_data data;
+};
+
+struct seccomp_notif_resp {
+ __u64 id;
+ __s32 error;
+ __s64 val;
+};
+#endif
+
#ifndef seccomp
int seccomp(unsigned int op, unsigned int flags, void *args)
{
@@ -2072,7 +2092,8 @@ TEST(seccomp_syscall_mode_lock)
TEST(detect_seccomp_filter_flags)
{
unsigned int flags[] = { SECCOMP_FILTER_FLAG_TSYNC,
- SECCOMP_FILTER_FLAG_LOG };
+ SECCOMP_FILTER_FLAG_LOG,
+ SECCOMP_FILTER_FLAG_GET_LISTENER };
unsigned int flag, all_flags;
int i;
long ret;
@@ -2917,6 +2938,164 @@ TEST(get_metadata)
ASSERT_EQ(0, kill(pid, SIGKILL));
}
+static int user_trap_syscall(int nr, unsigned int flags)
+{
+ struct sock_filter filter[] = {
+ BPF_STMT(BPF_LD+BPF_W+BPF_ABS,
+ offsetof(struct seccomp_data, nr)),
+ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, nr, 0, 1),
+ BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_USER_NOTIF),
+ BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
+ };
+
+ struct sock_fprog prog = {
+ .len = (unsigned short)ARRAY_SIZE(filter),
+ .filter = filter,
+ };
+
+ return seccomp(SECCOMP_SET_MODE_FILTER, flags, &prog);
+}
+
+static int read_notif(int listener, struct seccomp_notif *req)
+{
+ int ret;
+
+ do {
+ errno = 0;
+ ret = read(listener, req, sizeof(*req));
+ } while (ret == -1 && errno == ENOENT);
+ return ret;
+}
+
+static void signal_handler(int signal)
+{
+}
+
+#define USER_NOTIF_MAGIC 116983961184613L
+TEST(get_user_notification_syscall)
+{
+ pid_t pid;
+ long ret;
+ int status, listener;
+ struct seccomp_notif req = {};
+ struct seccomp_notif_resp resp = {};
+ struct pollfd pollfd;
+
+ pid = fork();
+ ASSERT_GE(pid, 0);
+
+ /* Check that we get -ENOSYS with no listener attached */
+ if (pid == 0) {
+ if (user_trap_syscall(__NR_getpid, 0) < 0)
+ exit(1);
+ ret = syscall(__NR_getpid);
+ exit(ret >= 0 || errno != ENOSYS);
+ }
+
+ EXPECT_EQ(waitpid(pid, &status, 0), pid);
+ EXPECT_EQ(true, WIFEXITED(status));
+ EXPECT_EQ(0, WEXITSTATUS(status));
+
+ /* Check that the basic notification machinery works */
+ listener = user_trap_syscall(__NR_getpid,
+ SECCOMP_FILTER_FLAG_GET_LISTENER);
+ EXPECT_GE(listener, 0);
+
+ /* Installing a second listener in the chain should EBUSY */
+ EXPECT_EQ(user_trap_syscall(__NR_getpid,
+ SECCOMP_FILTER_FLAG_GET_LISTENER),
+ -1);
+ EXPECT_EQ(errno, EBUSY);
+
+ pid = fork();
+ ASSERT_GE(pid, 0);
+
+ if (pid == 0) {
+ ret = syscall(__NR_getpid);
+ exit(ret != USER_NOTIF_MAGIC);
+ }
+
+ EXPECT_EQ(read(listener, &req, sizeof(req)), sizeof(req));
+
+ pollfd.fd = listener;
+ pollfd.events = POLLIN | POLLOUT;
+
+ EXPECT_GT(poll(&pollfd, 1, -1), 0);
+ EXPECT_EQ(pollfd.revents, POLLOUT);
+
+ EXPECT_EQ(req.data.nr, __NR_getpid);
+
+ resp.id = req.id;
+ resp.error = 0;
+ resp.val = USER_NOTIF_MAGIC;
+
+ EXPECT_EQ(write(listener, &resp, sizeof(resp)), sizeof(resp));
+
+ EXPECT_EQ(waitpid(pid, &status, 0), pid);
+ EXPECT_EQ(true, WIFEXITED(status));
+ EXPECT_EQ(0, WEXITSTATUS(status));
+
+ /*
+ * Check that nothing bad happens when we kill the task in the middle
+ * of a syscall.
+ */
+ pid = fork();
+ ASSERT_GE(pid, 0);
+
+ if (pid == 0) {
+ ret = syscall(__NR_getpid);
+ exit(ret != USER_NOTIF_MAGIC);
+ }
+
+ ret = read(listener, &req, sizeof(req));
+ EXPECT_EQ(ret, sizeof(req));
+
+ EXPECT_EQ(kill(pid, SIGKILL), 0);
+ EXPECT_EQ(waitpid(pid, NULL, 0), pid);
+
+ resp.id = req.id;
+ ret = write(listener, &resp, sizeof(resp));
+ EXPECT_EQ(ret, -1);
+ EXPECT_EQ(errno, EINVAL);
+
+ /*
+ * Check that we get another notification about a signal in the middle
+ * of a syscall.
+ */
+ pid = fork();
+ ASSERT_GE(pid, 0);
+
+ if (pid == 0) {
+ if (signal(SIGUSR1, signal_handler) == SIG_ERR) {
+ perror("signal");
+ exit(1);
+ }
+ ret = syscall(__NR_getpid);
+ exit(ret != USER_NOTIF_MAGIC);
+ }
+
+ ret = read_notif(listener, &req);
+ EXPECT_EQ(ret, sizeof(req));
+ EXPECT_EQ(errno, 0);
+
+ EXPECT_EQ(kill(pid, SIGUSR1), 0);
+
+ ret = read_notif(listener, &req);
+ EXPECT_EQ(ret, sizeof(req));
+ EXPECT_EQ(errno, 0);
+
+ resp.id = req.id;
+ ret = write(listener, &resp, sizeof(resp));
+ EXPECT_EQ(ret, sizeof(resp));
+ EXPECT_EQ(errno, 0);
+
+ EXPECT_EQ(waitpid(pid, &status, 0), pid);
+ EXPECT_EQ(true, WIFEXITED(status));
+ EXPECT_EQ(0, WEXITSTATUS(status));
+
+ close(listener);
+}
+
/*
* TODO:
* - add microbenchmarks
--
2.17.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 2/4] seccomp: make get_nth_filter available outside of CHECKPOINT_RESTORE
2018-05-17 15:12 [PATCH v2 0/4] seccomp trap to userspace Tycho Andersen
2018-05-17 15:12 ` [PATCH v2 1/4] seccomp: add a return code to " Tycho Andersen
@ 2018-05-17 15:12 ` Tycho Andersen
2018-05-17 15:12 ` [PATCH v2 3/4] seccomp: add a way to get a listener fd from ptrace Tycho Andersen
` (2 subsequent siblings)
4 siblings, 0 replies; 20+ messages in thread
From: Tycho Andersen @ 2018-05-17 15:12 UTC (permalink / raw)
To: linux-kernel, containers
Cc: Kees Cook, Andy Lutomirski, Oleg Nesterov, Eric W . Biederman,
Serge E . Hallyn, Christian Brauner, Tyler Hicks, Akihiro Suda,
Tobin C . Harding, Tycho Andersen
In the next commit we'll use this same mnemonic to get a listener for the
nth filter, so we need it available outside of CHECKPOINT_RESTORE. This is
slightly looser than necessary, because it really could be
CHECKPOINT_RESTORE || USER_NOTIFICATION, but it's declared static and this
complicates the code less, so hopefully it's ok.
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
CC: Kees Cook <keescook@chromium.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: "Serge E. Hallyn" <serge@hallyn.com>
CC: Christian Brauner <christian.brauner@ubuntu.com>
CC: Tyler Hicks <tyhicks@canonical.com>
CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
v2: new in v2
---
kernel/seccomp.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index a169a62cb78a..f136eca93f2f 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1180,7 +1180,7 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
return do_seccomp(op, 0, uargs);
}
-#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
+#if defined(CONFIG_SECCOMP_FILTER)
static struct seccomp_filter *get_nth_filter(struct task_struct *task,
unsigned long filter_off)
{
@@ -1227,6 +1227,7 @@ static struct seccomp_filter *get_nth_filter(struct task_struct *task,
return filter;
}
+#if defined(CONFIG_CHECKPOINT_RESTORE)
long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
void __user *data)
{
@@ -1299,7 +1300,8 @@ long seccomp_get_metadata(struct task_struct *task,
__put_seccomp_filter(filter);
return ret;
}
-#endif
+#endif /* CONFIG_CHECKPOINT_RESTORE */
+#endif /* CONFIG_SECCOMP_FILTER */
#ifdef CONFIG_SYSCTL
--
2.17.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 3/4] seccomp: add a way to get a listener fd from ptrace
2018-05-17 15:12 [PATCH v2 0/4] seccomp trap to userspace Tycho Andersen
2018-05-17 15:12 ` [PATCH v2 1/4] seccomp: add a return code to " Tycho Andersen
2018-05-17 15:12 ` [PATCH v2 2/4] seccomp: make get_nth_filter available outside of CHECKPOINT_RESTORE Tycho Andersen
@ 2018-05-17 15:12 ` Tycho Andersen
2018-05-17 15:41 ` Oleg Nesterov
2018-05-18 14:05 ` Christian Brauner
2018-05-17 15:12 ` [PATCH v2 4/4] seccomp: add support for passing fds via USER_NOTIF Tycho Andersen
2018-05-18 14:03 ` [PATCH v2 0/4] seccomp trap to userspace Christian Brauner
4 siblings, 2 replies; 20+ messages in thread
From: Tycho Andersen @ 2018-05-17 15:12 UTC (permalink / raw)
To: linux-kernel, containers
Cc: Kees Cook, Andy Lutomirski, Oleg Nesterov, Eric W . Biederman,
Serge E . Hallyn, Christian Brauner, Tyler Hicks, Akihiro Suda,
Tobin C . Harding, Tycho Andersen
As an alternative to SECCOMP_FILTER_FLAG_GET_LISTENER, perhaps a ptrace()
version which can acquire filters is useful. There are at least two reasons
this is preferable, even though it uses ptrace:
1. You can control tasks that aren't cooperating with you
2. You can control tasks whose filters block sendmsg() and socket(); if the
task installs a filter which blocks these calls, there's no way with
SECCOMP_FILTER_FLAG_GET_LISTENER to get the fd out to the privileged task.
v2: fix a bug where listener mode was not unset when an unused fd was not
available
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
CC: Kees Cook <keescook@chromium.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: "Serge E. Hallyn" <serge@hallyn.com>
CC: Christian Brauner <christian.brauner@ubuntu.com>
CC: Tyler Hicks <tyhicks@canonical.com>
CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
---
include/linux/seccomp.h | 11 ++++
include/uapi/linux/ptrace.h | 2 +
kernel/ptrace.c | 4 ++
kernel/seccomp.c | 27 ++++++++
tools/testing/selftests/seccomp/seccomp_bpf.c | 66 +++++++++++++++++++
5 files changed, 110 insertions(+)
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 0fd3e0676a1c..10e684899b7b 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -111,4 +111,15 @@ static inline long seccomp_get_metadata(struct task_struct *task,
return -EINVAL;
}
#endif /* CONFIG_SECCOMP_FILTER && CONFIG_CHECKPOINT_RESTORE */
+
+#ifdef CONFIG_SECCOMP_USER_NOTIFICATION
+extern long seccomp_get_listener(struct task_struct *task,
+ unsigned long filter_off);
+#else
+static inline long seccomp_get_listener(struct task_struct *task,
+ unsigned long filter_off)
+{
+ return -EINVAL;
+}
+#endif/* CONFIG_SECCOMP_USER_NOTIFICATION */
#endif /* _LINUX_SECCOMP_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index d5a1b8a492b9..dc0abf81de3b 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -73,6 +73,8 @@ struct seccomp_metadata {
__u64 flags; /* Output: filter's flags */
};
+#define PTRACE_SECCOMP_GET_LISTENER 0x420e
+
/* Read signals from a shared (process wide) queue */
#define PTRACE_PEEKSIGINFO_SHARED (1 << 0)
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 21fec73d45d4..fcbdb6f4dc07 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1096,6 +1096,10 @@ int ptrace_request(struct task_struct *child, long request,
ret = seccomp_get_metadata(child, addr, datavp);
break;
+ case PTRACE_SECCOMP_GET_LISTENER:
+ ret = seccomp_get_listener(child, addr);
+ break;
+
default:
break;
}
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index f136eca93f2f..7c23aee76bb4 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1678,4 +1678,31 @@ static struct file *init_listener(struct task_struct *task,
return ret;
}
+
+long seccomp_get_listener(struct task_struct *task,
+ unsigned long filter_off)
+{
+ struct seccomp_filter *filter;
+ struct file *listener;
+ int fd;
+
+ filter = get_nth_filter(task, filter_off);
+ if (IS_ERR(filter))
+ return PTR_ERR(filter);
+
+ fd = get_unused_fd_flags(O_RDWR);
+ if (fd < 0) {
+ __put_seccomp_filter(filter);
+ return fd;
+ }
+
+ listener = init_listener(task, task->seccomp.filter);
+ if (IS_ERR(listener)) {
+ put_unused_fd(fd);
+ return PTR_ERR(listener);
+ }
+
+ fd_install(fd, listener);
+ return fd;
+}
#endif
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index bb96df66222f..473905f33e0b 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -178,6 +178,10 @@ int seccomp(unsigned int op, unsigned int flags, void *args)
}
#endif
+#ifndef PTRACE_SECCOMP_GET_LISTENER
+#define PTRACE_SECCOMP_GET_LISTENER 0x420e
+#endif
+
#if __BYTE_ORDER == __LITTLE_ENDIAN
#define syscall_arg(_n) (offsetof(struct seccomp_data, args[_n]))
#elif __BYTE_ORDER == __BIG_ENDIAN
@@ -3096,6 +3100,68 @@ TEST(get_user_notification_syscall)
close(listener);
}
+TEST(get_user_notification_ptrace)
+{
+ pid_t pid;
+ int status, listener;
+ int sk_pair[2];
+ char c;
+ struct seccomp_notif req = {};
+ struct seccomp_notif_resp resp = {};
+
+ ASSERT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0);
+
+ pid = fork();
+ ASSERT_GE(pid, 0);
+
+ if (pid == 0) {
+ EXPECT_EQ(user_trap_syscall(__NR_getpid, 0), 0);
+
+ /* Test that we get ENOSYS while not attached */
+ EXPECT_EQ(syscall(__NR_getpid), -1);
+ EXPECT_EQ(errno, ENOSYS);
+
+ /* Signal we're ready and have installed the filter. */
+ EXPECT_EQ(write(sk_pair[1], "J", 1), 1);
+
+ EXPECT_EQ(read(sk_pair[1], &c, 1), 1);
+ EXPECT_EQ(c, 'H');
+
+ exit(syscall(__NR_getpid) != USER_NOTIF_MAGIC);
+ }
+
+ EXPECT_EQ(read(sk_pair[0], &c, 1), 1);
+ EXPECT_EQ(c, 'J');
+
+ EXPECT_EQ(ptrace(PTRACE_ATTACH, pid), 0);
+ EXPECT_EQ(waitpid(pid, NULL, 0), pid);
+ listener = ptrace(PTRACE_SECCOMP_GET_LISTENER, pid, 0);
+ EXPECT_GE(listener, 0);
+
+ /* EBUSY for second listener */
+ EXPECT_EQ(ptrace(PTRACE_SECCOMP_GET_LISTENER, pid, 0), -1);
+ EXPECT_EQ(errno, EBUSY);
+
+ EXPECT_EQ(ptrace(PTRACE_DETACH, pid, NULL, 0), 0);
+
+ /* Now signal we are done and respond with magic */
+ EXPECT_EQ(write(sk_pair[0], "H", 1), 1);
+
+ EXPECT_EQ(read(listener, &req, sizeof(req)), sizeof(req));
+
+ resp.id = req.id;
+ resp.error = 0;
+ resp.val = USER_NOTIF_MAGIC;
+
+ EXPECT_EQ(write(listener, &resp, sizeof(resp)), sizeof(resp));
+
+ EXPECT_EQ(waitpid(pid, &status, 0), pid);
+ EXPECT_EQ(true, WIFEXITED(status));
+ EXPECT_EQ(0, WEXITSTATUS(status));
+
+ close(listener);
+}
+
/*
* TODO:
* - add microbenchmarks
--
2.17.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 4/4] seccomp: add support for passing fds via USER_NOTIF
2018-05-17 15:12 [PATCH v2 0/4] seccomp trap to userspace Tycho Andersen
` (2 preceding siblings ...)
2018-05-17 15:12 ` [PATCH v2 3/4] seccomp: add a way to get a listener fd from ptrace Tycho Andersen
@ 2018-05-17 15:12 ` Tycho Andersen
2018-05-18 14:03 ` [PATCH v2 0/4] seccomp trap to userspace Christian Brauner
4 siblings, 0 replies; 20+ messages in thread
From: Tycho Andersen @ 2018-05-17 15:12 UTC (permalink / raw)
To: linux-kernel, containers
Cc: Kees Cook, Andy Lutomirski, Oleg Nesterov, Eric W . Biederman,
Serge E . Hallyn, Christian Brauner, Tyler Hicks, Akihiro Suda,
Tobin C . Harding, Tycho Andersen
The idea here is that the userspace handler should be able to pass an fd
back to the trapped task, for example so it can be returned from socket().
I've proposed one API here, but I'm open to other options. In particular,
this only lets you return an fd from a syscall, which may not be enough in
all cases. For example, if an fd is written to an output parameter instead
of returned, the current API can't handle this. Another case is that
netlink takes as input fds sometimes (IFLA_NET_NS_FD, e.g.). If netlink
ever decides to install an fd and output it, we wouldn't be able to handle
this either.
Still, the vast majority of interesting cases are covered by this API, so
perhaps it is Enough.
I've left it as a separate commit for two reasons:
* It illustrates the way in which we would grow struct seccomp_notif and
struct seccomp_notif_resp without using netlink
* It shows just how little code is needed to accomplish this :)
v2: new in v2
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
CC: Kees Cook <keescook@chromium.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: "Serge E. Hallyn" <serge@hallyn.com>
CC: Christian Brauner <christian.brauner@ubuntu.com>
CC: Tyler Hicks <tyhicks@canonical.com>
CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
---
include/uapi/linux/seccomp.h | 2 +
kernel/seccomp.c | 49 +++++++-
tools/testing/selftests/seccomp/seccomp_bpf.c | 112 ++++++++++++++++++
3 files changed, 161 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 8160e6cad528..3124427219cb 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -71,6 +71,8 @@ struct seccomp_notif_resp {
__u64 id;
__s32 error;
__s64 val;
+ __u8 return_fd;
+ __u32 fd;
};
#endif /* _UAPI_LINUX_SECCOMP_H */
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 7c23aee76bb4..c783b8dcd001 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -77,6 +77,8 @@ struct seccomp_knotif {
/* The return values, only valid when in SECCOMP_NOTIFY_REPLIED */
int error;
long val;
+ struct file *file;
+ unsigned int flags;
/* Signals when this has entered SECCOMP_NOTIFY_REPLIED */
struct completion ready;
@@ -785,13 +787,35 @@ static void seccomp_do_user_notification(int this_syscall,
goto remove_list;
}
- ret = n.val;
- err = n.error;
+ if (n.file) {
+ int fd;
+
+ fd = get_unused_fd_flags(n.flags);
+ if (fd < 0) {
+ err = fd;
+ ret = -1;
+ goto remove_list;
+ }
+
+ ret = fd;
+ err = 0;
+
+ fd_install(fd, n.file);
+ /* Don't fput, since fd has a reference now */
+ n.file = NULL;
+ } else {
+ ret = n.val;
+ err = n.error;
+ }
+
WARN(n.state != SECCOMP_NOTIFY_REPLIED,
"notified about write complete when state is not write");
remove_list:
+ if (n.file)
+ fput(n.file);
+
list_del(&n.list);
out:
mutex_unlock(&match->notify_lock);
@@ -1610,6 +1634,27 @@ static ssize_t seccomp_notify_write(struct file *file, const char __user *buf,
knotif->state = SECCOMP_NOTIFY_REPLIED;
knotif->error = resp.error;
knotif->val = resp.val;
+
+ if (resp.return_fd) {
+ struct fd fd;
+
+ /*
+ * This is a little hokey: we need a real fget() (i.e. not
+ * __fget_light(), which is what fdget does), but we also need
+ * the flags from strcut fd. So, we get it, put it, and get it
+ * again for real.
+ */
+ fd = fdget(resp.fd);
+ knotif->flags = fd.flags;
+ fdput(fd);
+
+ knotif->file = fget(resp.fd);
+ if (!knotif->file) {
+ ret = -EBADF;
+ goto out;
+ }
+ }
+
complete(&knotif->ready);
out:
mutex_unlock(&filter->notify_lock);
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 473905f33e0b..b04d3ecc61f4 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -167,6 +167,8 @@ struct seccomp_notif_resp {
__u64 id;
__s32 error;
__s64 val;
+ __u8 return_fd;
+ __u32 fd;
};
#endif
@@ -3162,6 +3164,116 @@ TEST(get_user_notification_ptrace)
close(listener);
}
+TEST(user_notification_pass_fd)
+{
+ pid_t pid;
+ int status, listener;
+ int sk_pair[2];
+ char c;
+ struct seccomp_notif req = {};
+ struct seccomp_notif_resp resp = {};
+ long ret;
+
+ ASSERT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0);
+
+ pid = fork();
+ ASSERT_GE(pid, 0);
+
+ if (pid == 0) {
+ int fd;
+ char buf[16];
+
+ EXPECT_EQ(user_trap_syscall(__NR_getpid, 0), 0);
+
+ /* Signal we're ready and have installed the filter. */
+ EXPECT_EQ(write(sk_pair[1], "J", 1), 1);
+
+ EXPECT_EQ(read(sk_pair[1], &c, 1), 1);
+ EXPECT_EQ(c, 'H');
+ close(sk_pair[1]);
+
+ /* An fd from getpid(). Let the games begin. */
+ fd = syscall(__NR_getpid);
+ EXPECT_GT(fd, 0);
+ EXPECT_EQ(read(fd, buf, sizeof(buf)), 12);
+ close(fd);
+
+ exit(strcmp("hello world", buf));
+ }
+
+ EXPECT_EQ(read(sk_pair[0], &c, 1), 1);
+ EXPECT_EQ(c, 'J');
+
+ EXPECT_EQ(ptrace(PTRACE_ATTACH, pid), 0);
+ EXPECT_EQ(waitpid(pid, NULL, 0), pid);
+ listener = ptrace(PTRACE_SECCOMP_GET_LISTENER, pid, 0);
+ EXPECT_GE(listener, 0);
+ EXPECT_EQ(ptrace(PTRACE_DETACH, pid, NULL, 0), 0);
+
+ /* Now signal we are done installing so it can do a getpid */
+ EXPECT_EQ(write(sk_pair[0], "H", 1), 1);
+ close(sk_pair[0]);
+
+ /* Make a new socket pair so we can send half across */
+ EXPECT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0);
+
+ ret = read_notif(listener, &req);
+ EXPECT_EQ(ret, sizeof(req));
+ EXPECT_EQ(errno, 0);
+
+ resp.id = req.id;
+ resp.return_fd = 1;
+ resp.fd = sk_pair[1];
+ EXPECT_EQ(write(listener, &resp, sizeof(resp)), sizeof(resp));
+ close(sk_pair[1]);
+
+ EXPECT_EQ(write(sk_pair[0], "hello world\0", 12), 12);
+ close(sk_pair[0]);
+
+ EXPECT_EQ(waitpid(pid, &status, 0), pid);
+ EXPECT_EQ(true, WIFEXITED(status));
+ EXPECT_EQ(0, WEXITSTATUS(status));
+ close(listener);
+}
+
+TEST(user_notification_struct_size_mismatch)
+{
+ pid_t pid;
+ long ret;
+ int status, listener, len;
+ struct seccomp_notif req;
+ struct seccomp_notif_resp resp;
+
+ listener = user_trap_syscall(__NR_getpid,
+ SECCOMP_FILTER_FLAG_GET_LISTENER);
+ EXPECT_GE(listener, 0);
+
+ pid = fork();
+ ASSERT_GE(pid, 0);
+
+ if (pid == 0) {
+ ret = syscall(__NR_getpid);
+ exit(ret != USER_NOTIF_MAGIC);
+ }
+
+ EXPECT_EQ(read(listener, &req, sizeof(req)), sizeof(req));
+
+ resp.id = req.id;
+ resp.error = 0;
+ resp.val = USER_NOTIF_MAGIC;
+
+ /*
+ * Only write a partial structure: this is what was available before we
+ * had fd support.
+ */
+ len = offsetof(struct seccomp_notif_resp, val) + sizeof(resp.val);
+ EXPECT_EQ(write(listener, &resp, len), len);
+
+ EXPECT_EQ(waitpid(pid, &status, 0), pid);
+ EXPECT_EQ(true, WIFEXITED(status));
+ EXPECT_EQ(0, WEXITSTATUS(status));
+}
+
/*
* TODO:
* - add microbenchmarks
--
2.17.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
2018-05-17 15:12 ` [PATCH v2 1/4] seccomp: add a return code to " Tycho Andersen
@ 2018-05-17 15:33 ` Oleg Nesterov
2018-05-17 15:39 ` Tycho Andersen
2018-05-18 14:04 ` Christian Brauner
` (2 subsequent siblings)
3 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2018-05-17 15:33 UTC (permalink / raw)
To: Tycho Andersen
Cc: linux-kernel, containers, Kees Cook, Andy Lutomirski,
Eric W . Biederman, Serge E . Hallyn, Christian Brauner,
Tyler Hicks, Akihiro Suda, Tobin C . Harding
I didn't read this series yet, and I don't even understand what are you
trying to do, just one question...
On 05/17, Tycho Andersen wrote:
>
> +static struct file *init_listener(struct task_struct *task,
> + struct seccomp_filter *filter)
> +{
> + struct file *ret = ERR_PTR(-EBUSY);
> + struct seccomp_filter *cur;
> + bool have_listener = false;
> +
> + for (cur = task->seccomp.filter; cur; cur = cur->prev) {
> + mutex_lock(&cur->notify_lock);
Did you test this patch with CONFIG_LOCKDEP ?
>From lockdep pov this loop tries to take the same lock twice or more, it shoul
complain.
Oleg.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
2018-05-17 15:33 ` Oleg Nesterov
@ 2018-05-17 15:39 ` Tycho Andersen
2018-05-17 15:46 ` Oleg Nesterov
0 siblings, 1 reply; 20+ messages in thread
From: Tycho Andersen @ 2018-05-17 15:39 UTC (permalink / raw)
To: Oleg Nesterov
Cc: linux-kernel, containers, Kees Cook, Andy Lutomirski,
Eric W . Biederman, Serge E . Hallyn, Christian Brauner,
Tyler Hicks, Akihiro Suda, Tobin C . Harding
Hi Oleg,
Thanks for taking a look!
On Thu, May 17, 2018 at 05:33:24PM +0200, Oleg Nesterov wrote:
> I didn't read this series yet, and I don't even understand what are you
> trying to do, just one question...
>
> On 05/17, Tycho Andersen wrote:
> >
> > +static struct file *init_listener(struct task_struct *task,
> > + struct seccomp_filter *filter)
> > +{
> > + struct file *ret = ERR_PTR(-EBUSY);
> > + struct seccomp_filter *cur;
> > + bool have_listener = false;
> > +
> > + for (cur = task->seccomp.filter; cur; cur = cur->prev) {
> > + mutex_lock(&cur->notify_lock);
>
> Did you test this patch with CONFIG_LOCKDEP ?
Yes, with,
CONFIG_LOCKDEP=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
> From lockdep pov this loop tries to take the same lock twice or more, it shoul
> complain.
I didn't, but I guess that's because it's not trying to take the same lock
twice -- the pointer cur is changing in the loop. Unless I'm misunderstanding
what you're saying.
The idea behind this code is to lock the entire chain of filters up to the
parent so that we can ensure none of them have a listener installed. This is
based on a suggestion from Andy last review cycle to not allow two listeners,
since nesting would be confusing.
Tycho
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 3/4] seccomp: add a way to get a listener fd from ptrace
2018-05-17 15:12 ` [PATCH v2 3/4] seccomp: add a way to get a listener fd from ptrace Tycho Andersen
@ 2018-05-17 15:41 ` Oleg Nesterov
2018-05-17 15:57 ` Tycho Andersen
2018-05-18 14:05 ` Christian Brauner
1 sibling, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2018-05-17 15:41 UTC (permalink / raw)
To: Tycho Andersen
Cc: linux-kernel, containers, Kees Cook, Andy Lutomirski,
Eric W . Biederman, Serge E . Hallyn, Christian Brauner,
Tyler Hicks, Akihiro Suda, Tobin C . Harding
again, I don't understand this code yet, but
On 05/17, Tycho Andersen wrote:
>
> +long seccomp_get_listener(struct task_struct *task,
> + unsigned long filter_off)
> +{
> + struct seccomp_filter *filter;
> + struct file *listener;
> + int fd;
> +
> + filter = get_nth_filter(task, filter_off);
> + if (IS_ERR(filter))
> + return PTR_ERR(filter);
> +
> + fd = get_unused_fd_flags(O_RDWR);
> + if (fd < 0) {
> + __put_seccomp_filter(filter);
> + return fd;
> + }
> +
> + listener = init_listener(task, task->seccomp.filter);
> + if (IS_ERR(listener)) {
> + put_unused_fd(fd);
> + return PTR_ERR(listener);
__put_seccomp_filter() ?
and since init_listener() does __get_seccomp_filter() on sucess, it is needed
uncondtitionally?
Oleg.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
2018-05-17 15:39 ` Tycho Andersen
@ 2018-05-17 15:46 ` Oleg Nesterov
2018-05-24 15:28 ` Tycho Andersen
0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2018-05-17 15:46 UTC (permalink / raw)
To: Tycho Andersen
Cc: linux-kernel, containers, Kees Cook, Andy Lutomirski,
Eric W . Biederman, Serge E . Hallyn, Christian Brauner,
Tyler Hicks, Akihiro Suda, Tobin C . Harding
On 05/17, Tycho Andersen wrote:
>
> > From lockdep pov this loop tries to take the same lock twice or more, it shoul
> > complain.
>
> I didn't, but I guess that's because it's not trying to take the same lock
> twice -- the pointer cur is changing in the loop.
Yes, I see. But this is the same lock for lockdep, it has the same class.
Oleg.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 3/4] seccomp: add a way to get a listener fd from ptrace
2018-05-17 15:41 ` Oleg Nesterov
@ 2018-05-17 15:57 ` Tycho Andersen
2018-05-17 15:59 ` Tycho Andersen
0 siblings, 1 reply; 20+ messages in thread
From: Tycho Andersen @ 2018-05-17 15:57 UTC (permalink / raw)
To: Oleg Nesterov
Cc: linux-kernel, containers, Kees Cook, Andy Lutomirski,
Eric W . Biederman, Serge E . Hallyn, Christian Brauner,
Tyler Hicks, Akihiro Suda, Tobin C . Harding
On Thu, May 17, 2018 at 05:41:39PM +0200, Oleg Nesterov wrote:
> again, I don't understand this code yet, but
>
> On 05/17, Tycho Andersen wrote:
> >
> > +long seccomp_get_listener(struct task_struct *task,
> > + unsigned long filter_off)
> > +{
> > + struct seccomp_filter *filter;
> > + struct file *listener;
> > + int fd;
> > +
> > + filter = get_nth_filter(task, filter_off);
> > + if (IS_ERR(filter))
> > + return PTR_ERR(filter);
> > +
> > + fd = get_unused_fd_flags(O_RDWR);
> > + if (fd < 0) {
> > + __put_seccomp_filter(filter);
> > + return fd;
> > + }
> > +
> > + listener = init_listener(task, task->seccomp.filter);
> > + if (IS_ERR(listener)) {
> > + put_unused_fd(fd);
> > + return PTR_ERR(listener);
>
> __put_seccomp_filter() ?
Yes, I think you're right here.
> and since init_listener() does __get_seccomp_filter() on sucess, it is needed
> uncondtitionally?
I think there does need to be a __get_seccomp_filter() on success in
init_listener(), because it's paired with the __put_seccomp_filter in
seccomp_notify_release. The listener fd has a reference to the filter,
and that shouldn't go away until after the fd is freed.
Tycho
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 3/4] seccomp: add a way to get a listener fd from ptrace
2018-05-17 15:57 ` Tycho Andersen
@ 2018-05-17 15:59 ` Tycho Andersen
0 siblings, 0 replies; 20+ messages in thread
From: Tycho Andersen @ 2018-05-17 15:59 UTC (permalink / raw)
To: Oleg Nesterov
Cc: linux-kernel, containers, Kees Cook, Andy Lutomirski,
Eric W . Biederman, Serge E . Hallyn, Christian Brauner,
Tyler Hicks, Akihiro Suda, Tobin C . Harding
On Thu, May 17, 2018 at 09:57:33AM -0600, Tycho Andersen wrote:
> On Thu, May 17, 2018 at 05:41:39PM +0200, Oleg Nesterov wrote:
> > and since init_listener() does __get_seccomp_filter() on sucess, it is needed
> > uncondtitionally?
>
> I think there does need to be a __get_seccomp_filter() on success in
> init_listener(), because it's paired with the __put_seccomp_filter in
> seccomp_notify_release. The listener fd has a reference to the filter,
> and that shouldn't go away until after the fd is freed.
Oh, sorry. I see what you meant here. Yes, it should be unconditional.
Tycho
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 0/4] seccomp trap to userspace
2018-05-17 15:12 [PATCH v2 0/4] seccomp trap to userspace Tycho Andersen
` (3 preceding siblings ...)
2018-05-17 15:12 ` [PATCH v2 4/4] seccomp: add support for passing fds via USER_NOTIF Tycho Andersen
@ 2018-05-18 14:03 ` Christian Brauner
4 siblings, 0 replies; 20+ messages in thread
From: Christian Brauner @ 2018-05-18 14:03 UTC (permalink / raw)
To: Tycho Andersen
Cc: linux-kernel, containers, Tobin C . Harding, Kees Cook,
Akihiro Suda, Oleg Nesterov, Andy Lutomirski, Eric W . Biederman,
Christian Brauner, Tyler Hicks
On Thu, May 17, 2018 at 09:12:14AM -0600, Tycho Andersen wrote:
> Hi,
>
> After a while focusing on other things, I finally managed ot get a v2 of
> this series prepared. I believe I've addressed all the feedback from v1,
> except for one major point: switching the communication protocol over
> the fd to nlattr. I looked into doing this, but the kernel stuff for
> dealing with nlattr seems to require an skb (via nlmsg_{new,put} and
> netlink_unicast), which means we need to deal with the netlink sequence
> numbers, portids, and create a socket protocol. I can do this if we
> still think nlattr is necessary, but based on looking at it, it seems
> like a lot of extra code for no real benefit.
Yes, we've had that discussion before and I agree. I fail to see the
benefit here too.
Christian
>
> I've also added support for passing fds. The code itself is simple, but
> the API could/should probably be different, see patch 4 for discussion.
>
> Tycho
>
> Tycho Andersen (4):
> seccomp: add a return code to trap to userspace
> seccomp: make get_nth_filter available outside of CHECKPOINT_RESTORE
> seccomp: add a way to get a listener fd from ptrace
> seccomp: add support for passing fds via USER_NOTIF
>
> arch/Kconfig | 7 +
> include/linux/seccomp.h | 14 +-
> include/uapi/linux/ptrace.h | 2 +
> include/uapi/linux/seccomp.h | 20 +-
> kernel/ptrace.c | 4 +
> kernel/seccomp.c | 480 +++++++++++++++++-
> tools/testing/selftests/seccomp/seccomp_bpf.c | 359 ++++++++++++-
> 7 files changed, 878 insertions(+), 8 deletions(-)
>
> --
> 2.17.0
>
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
2018-05-17 15:12 ` [PATCH v2 1/4] seccomp: add a return code to " Tycho Andersen
2018-05-17 15:33 ` Oleg Nesterov
@ 2018-05-18 14:04 ` Christian Brauner
2018-05-18 15:21 ` Tycho Andersen
2018-05-19 0:14 ` kbuild test robot
2018-05-19 5:01 ` kbuild test robot
3 siblings, 1 reply; 20+ messages in thread
From: Christian Brauner @ 2018-05-18 14:04 UTC (permalink / raw)
To: Tycho Andersen
Cc: linux-kernel, containers, Tobin C . Harding, Kees Cook,
Akihiro Suda, Oleg Nesterov, Andy Lutomirski, Eric W . Biederman,
Christian Brauner, Tyler Hicks
On Thu, May 17, 2018 at 09:12:15AM -0600, Tycho Andersen wrote:
> This patch introduces a means for syscalls matched in seccomp to notify
> some other task that a particular filter has been triggered.
>
> The motivation for this is primarily for use with containers. For example,
> if a container does an init_module(), we obviously don't want to load this
> untrusted code, which may be compiled for the wrong version of the kernel
> anyway. Instead, we could parse the module image, figure out which module
> the container is trying to load and load it on the host.
>
> As another example, containers cannot mknod(), since this checks
> capable(CAP_SYS_ADMIN). However, harmless devices like /dev/null or
> /dev/zero should be ok for containers to mknod, but we'd like to avoid hard
> coding some whitelist in the kernel. Another example is mount(), which has
> many security restrictions for good reason, but configuration or runtime
> knowledge could potentially be used to relax these restrictions.
>
> This patch adds functionality that is already possible via at least two
> other means that I know about, both of which involve ptrace(): first, one
> could ptrace attach, and then iterate through syscalls via PTRACE_SYSCALL.
> Unfortunately this is slow, so a faster version would be to install a
> filter that does SECCOMP_RET_TRACE, which triggers a PTRACE_EVENT_SECCOMP.
> Since ptrace allows only one tracer, if the container runtime is that
> tracer, users inside the container (or outside) trying to debug it will not
> be able to use ptrace, which is annoying. It also means that older
> distributions based on Upstart cannot boot inside containers using ptrace,
> since upstart itself uses ptrace to start services.
>
> The actual implementation of this is fairly small, although getting the
> synchronization right was/is slightly complex.
>
> Finally, it's worth noting that the classic seccomp TOCTOU of reading
> memory data from the task still applies here, but can be avoided with
> careful design of the userspace handler: if the userspace handler reads all
> of the task memory that is necessary before applying its security policy,
> the tracee's subsequent memory edits will not be read by the tracer.
>
> v2: * make id a u64; the idea here being that it will never overflow,
> because 64 is huge (one syscall every nanosecond => wrap every 584
> years)
> * prevent nesting of user notifications: if someone is already attached
> the tree in one place, nobody else can attach to the tree
> * notify the listener of signals the tracee receives as well
> * implement poll
>
> Signed-off-by: Tycho Andersen <tycho@tycho.ws>
> CC: Kees Cook <keescook@chromium.org>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Eric W. Biederman <ebiederm@xmission.com>
> CC: "Serge E. Hallyn" <serge@hallyn.com>
> CC: Christian Brauner <christian.brauner@ubuntu.com>
> CC: Tyler Hicks <tyhicks@canonical.com>
> CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
> ---
> arch/Kconfig | 7 +
> include/linux/seccomp.h | 3 +-
> include/uapi/linux/seccomp.h | 18 +-
> kernel/seccomp.c | 402 +++++++++++++++++-
> tools/testing/selftests/seccomp/seccomp_bpf.c | 181 +++++++-
> 5 files changed, 605 insertions(+), 6 deletions(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 8e0d665c8d53..dd99eef3e049 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -401,6 +401,13 @@ config SECCOMP_FILTER
>
> See Documentation/prctl/seccomp_filter.txt for details.
>
> +config SECCOMP_USER_NOTIFICATION
> + bool "Enable the SECCOMP_RET_USER_NOTIF seccomp action"
> + depends on SECCOMP_FILTER
> + help
> + Enable SECCOMP_RET_USER_NOTIF, a return code which can be used by seccomp
> + programs to notify a userspace listener that a particular event happened.
> +
> config HAVE_GCC_PLUGINS
> bool
> help
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index c723a5c4e3ff..0fd3e0676a1c 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -5,7 +5,8 @@
> #include <uapi/linux/seccomp.h>
>
> #define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | \
> - SECCOMP_FILTER_FLAG_LOG)
> + SECCOMP_FILTER_FLAG_LOG | \
> + SECCOMP_FILTER_FLAG_GET_LISTENER)
>
> #ifdef CONFIG_SECCOMP
>
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 2a0bd9dd104d..8160e6cad528 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -17,8 +17,9 @@
> #define SECCOMP_GET_ACTION_AVAIL 2
>
> /* Valid flags for SECCOMP_SET_MODE_FILTER */
> -#define SECCOMP_FILTER_FLAG_TSYNC 1
> -#define SECCOMP_FILTER_FLAG_LOG 2
> +#define SECCOMP_FILTER_FLAG_TSYNC 1
> +#define SECCOMP_FILTER_FLAG_LOG 2
> +#define SECCOMP_FILTER_FLAG_GET_LISTENER 4
>
> /*
> * All BPF programs must return a 32-bit value.
> @@ -34,6 +35,7 @@
> #define SECCOMP_RET_KILL SECCOMP_RET_KILL_THREAD
> #define SECCOMP_RET_TRAP 0x00030000U /* disallow and force a SIGSYS */
> #define SECCOMP_RET_ERRNO 0x00050000U /* returns an errno */
> +#define SECCOMP_RET_USER_NOTIF 0x7fc00000U /* notifies userspace */
> #define SECCOMP_RET_TRACE 0x7ff00000U /* pass to a tracer or disallow */
> #define SECCOMP_RET_LOG 0x7ffc0000U /* allow after logging */
> #define SECCOMP_RET_ALLOW 0x7fff0000U /* allow */
> @@ -59,4 +61,16 @@ struct seccomp_data {
> __u64 args[6];
> };
>
> +struct seccomp_notif {
> + __u64 id;
> + pid_t pid;
> + struct seccomp_data data;
> +};
> +
> +struct seccomp_notif_resp {
> + __u64 id;
> + __s32 error;
> + __s64 val;
> +};
> +
> #endif /* _UAPI_LINUX_SECCOMP_H */
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index dc77548167ef..a169a62cb78a 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -38,6 +38,53 @@
> #include <linux/tracehook.h>
> #include <linux/uaccess.h>
>
> +#ifdef CONFIG_SECCOMP_USER_NOTIFICATION
> +#include <linux/file.h>
> +#include <linux/anon_inodes.h>
> +
> +enum notify_state {
> + SECCOMP_NOTIFY_INIT,
> + SECCOMP_NOTIFY_SENT,
> + SECCOMP_NOTIFY_REPLIED,
> +};
> +
> +struct seccomp_knotif {
> + /* The pid whose filter triggered the notification */
> + pid_t pid;
> +
> + /*
> + * The "cookie" for this request; this is unique for this filter.
> + */
> + u32 id;
> +
> + /*
> + * The seccomp data. This pointer is valid the entire time this
> + * notification is active, since it comes from __seccomp_filter which
> + * eclipses the entire lifecycle here.
> + */
> + const struct seccomp_data *data;
> +
> + /*
> + * Notification states. When SECCOMP_RET_USER_NOTIF is returned, a
> + * struct seccomp_knotif is created and starts out in INIT. Once the
> + * handler reads the notification off of an FD, it transitions to READ.
> + * If a signal is received the state transitions back to INIT and
> + * another message is sent. When the userspace handler replies, state
> + * transitions to REPLIED.
> + */
> + enum notify_state state;
> +
> + /* The return values, only valid when in SECCOMP_NOTIFY_REPLIED */
> + int error;
> + long val;
> +
> + /* Signals when this has entered SECCOMP_NOTIFY_REPLIED */
> + struct completion ready;
> +
> + struct list_head list;
> +};
> +#endif
> +
> /**
> * struct seccomp_filter - container for seccomp BPF programs
> *
> @@ -64,6 +111,27 @@ struct seccomp_filter {
> bool log;
> struct seccomp_filter *prev;
> struct bpf_prog *prog;
> +
> +#ifdef CONFIG_SECCOMP_USER_NOTIFICATION
> + /*
> + * A semaphore that users of this notification can wait on for
> + * changes. Actual reads and writes are still controlled with
> + * filter->notify_lock.
> + */
> + struct semaphore request;
> +
> + /* A lock for all notification-related accesses. */
> + struct mutex notify_lock;
> +
> + /* Is there currently an attached listener? */
> + bool has_listener;
> +
> + /* The id of the next request. */
> + u64 next_id;
> +
> + /* A list of struct seccomp_knotif elements. */
> + struct list_head notifications;
> +#endif
> };
>
> /* Limit any path through the tree to 256KB worth of instructions. */
> @@ -383,6 +451,13 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
> if (!sfilter)
> return ERR_PTR(-ENOMEM);
>
> +#ifdef CONFIG_SECCOMP_USER_NOTIFICATION
> + mutex_init(&sfilter->notify_lock);
> + sema_init(&sfilter->request, 0);
> + INIT_LIST_HEAD(&sfilter->notifications);
> + sfilter->next_id = get_random_u64();
> +#endif
> +
> ret = bpf_prog_create_from_user(&sfilter->prog, fprog,
> seccomp_check_filter, save_orig);
> if (ret < 0) {
> @@ -547,13 +622,15 @@ static void seccomp_send_sigsys(int syscall, int reason)
> #define SECCOMP_LOG_TRACE (1 << 4)
> #define SECCOMP_LOG_LOG (1 << 5)
> #define SECCOMP_LOG_ALLOW (1 << 6)
> +#define SECCOMP_LOG_USER_NOTIF (1 << 7)
>
> static u32 seccomp_actions_logged = SECCOMP_LOG_KILL_PROCESS |
> SECCOMP_LOG_KILL_THREAD |
> SECCOMP_LOG_TRAP |
> SECCOMP_LOG_ERRNO |
> SECCOMP_LOG_TRACE |
> - SECCOMP_LOG_LOG;
> + SECCOMP_LOG_LOG |
> + SECCOMP_LOG_USER_NOTIF;
>
> static inline void seccomp_log(unsigned long syscall, long signr, u32 action,
> bool requested)
> @@ -572,6 +649,9 @@ static inline void seccomp_log(unsigned long syscall, long signr, u32 action,
> case SECCOMP_RET_TRACE:
> log = requested && seccomp_actions_logged & SECCOMP_LOG_TRACE;
> break;
> + case SECCOMP_RET_USER_NOTIF:
> + log = requested && seccomp_actions_logged & SECCOMP_LOG_USER_NOTIF;
> + break;
> case SECCOMP_RET_LOG:
> log = seccomp_actions_logged & SECCOMP_LOG_LOG;
> break;
> @@ -645,6 +725,91 @@ void secure_computing_strict(int this_syscall)
> }
> #else
>
> +#ifdef CONFIG_SECCOMP_USER_NOTIFICATION
> +static u64 seccomp_next_notify_id(struct seccomp_filter *filter)
> +{
> + u64 ret = filter->next_id;
> +
> + /* Note: overflow is ok here, the id just needs to be unique */
> + filter->next_id++;
> +
> + return ret;
> +}
Nit: Depending on how averse people are to relying on side-effects this
could be simplified to:
static inline u64 seccomp_next_notify_id(struct seccomp_filter *filter)
{
/* Note: Overflow is ok. The id just needs to be unique. */
return filter->next_id++;
}
> +
> +static void seccomp_do_user_notification(int this_syscall,
> + struct seccomp_filter *match,
> + const struct seccomp_data *sd)
> +{
> + int err;
> + long ret = 0;
> + struct seccomp_knotif n = {};
> +
> + mutex_lock(&match->notify_lock);
> + if (!match->has_listener) {
> + err = -ENOSYS;
> + goto out;
> + }
Nit:
err = -ENOSYS;
mutex_lock(&match->notify_lock);
if (!match->has_listener)
goto out;
looks cleaner to me or you do the err initalization at the top of the
function. :)
> +
> + n.pid = current->pid;
> + n.state = SECCOMP_NOTIFY_INIT;
> + n.data = sd;
> + n.id = seccomp_next_notify_id(match);
> + init_completion(&n.ready);
> +
> + list_add(&n.list, &match->notifications);
> +
> + mutex_unlock(&match->notify_lock);
> + up(&match->request);
> +
> + err = wait_for_completion_interruptible(&n.ready);
> + mutex_lock(&match->notify_lock);
> +
> + /*
> + * Here it's possible we got a signal and then had to wait on the mutex
> + * while the reply was sent, so let's be sure there wasn't a response
> + * in the meantime.
> + */
> + if (err < 0 && n.state != SECCOMP_NOTIFY_REPLIED) {
> + /*
> + * We got a signal. Let's tell userspace about it (potentially
> + * again, if we had already notified them about the first one).
> + */
> + if (n.state == SECCOMP_NOTIFY_SENT) {
> + n.state = SECCOMP_NOTIFY_INIT;
> + up(&match->request);
> + }
> + mutex_unlock(&match->notify_lock);
> + err = wait_for_completion_killable(&n.ready);
> + mutex_lock(&match->notify_lock);
> + if (err < 0)
> + goto remove_list;
> + }
> +
> + ret = n.val;
> + err = n.error;
> +
> + WARN(n.state != SECCOMP_NOTIFY_REPLIED,
> + "notified about write complete when state is not write");
Nit: That message seems a little cryptic.
> +
> +remove_list:
> + list_del(&n.list);
> +out:
> + mutex_unlock(&match->notify_lock);
> + syscall_set_return_value(current, task_pt_regs(current),
> + err, ret);
> +}
> +#else
> +static void seccomp_do_user_notification(int this_syscall,
> + u32 action,
> + struct seccomp_filter *match,
> + const struct seccomp_data *sd)
> +{
> + WARN(1, "user notification received, but disabled");
Nit: "received unexpected user notification" might be clearer
> + seccomp_log(this_syscall, SIGSYS, action, true);
> + do_exit(SIGSYS);
> +}
> +#endif
> +
> #ifdef CONFIG_SECCOMP_FILTER
> static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
> const bool recheck_after_trace)
> @@ -722,6 +887,9 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
>
> return 0;
>
> + case SECCOMP_RET_USER_NOTIF:
> + seccomp_do_user_notification(this_syscall, match, sd);
> + goto skip;
> case SECCOMP_RET_LOG:
> seccomp_log(this_syscall, 0, action, true);
> return 0;
> @@ -828,6 +996,11 @@ static long seccomp_set_mode_strict(void)
> }
>
> #ifdef CONFIG_SECCOMP_FILTER
> +#ifdef CONFIG_SECCOMP_USER_NOTIFICATION
> +static struct file *init_listener(struct task_struct *,
> + struct seccomp_filter *);
> +#endif
> +
> /**
> * seccomp_set_mode_filter: internal function for setting seccomp filter
> * @flags: flags to change filter behavior
> @@ -847,6 +1020,8 @@ static long seccomp_set_mode_filter(unsigned int flags,
> const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
> struct seccomp_filter *prepared = NULL;
> long ret = -EINVAL;
> + int listener = 0;
> + struct file *listener_f = NULL;
>
> /* Validate flags. */
> if (flags & ~SECCOMP_FILTER_FLAG_MASK)
> @@ -857,13 +1032,28 @@ static long seccomp_set_mode_filter(unsigned int flags,
> if (IS_ERR(prepared))
> return PTR_ERR(prepared);
>
> + if (flags & SECCOMP_FILTER_FLAG_GET_LISTENER) {
> + listener = get_unused_fd_flags(O_RDWR);
> + if (listener < 0) {
> + ret = listener;
> + goto out_free;
> + }
> +
> + listener_f = init_listener(current, prepared);
> + if (IS_ERR(listener_f)) {
> + put_unused_fd(listener);
> + ret = PTR_ERR(listener_f);
> + goto out_free;
> + }
> + }
> +
> /*
> * Make sure we cannot change seccomp or nnp state via TSYNC
> * while another thread is in the middle of calling exec.
> */
> if (flags & SECCOMP_FILTER_FLAG_TSYNC &&
> mutex_lock_killable(¤t->signal->cred_guard_mutex))
> - goto out_free;
> + goto out_put_fd;
>
> spin_lock_irq(¤t->sighand->siglock);
>
> @@ -881,6 +1071,16 @@ static long seccomp_set_mode_filter(unsigned int flags,
> spin_unlock_irq(¤t->sighand->siglock);
> if (flags & SECCOMP_FILTER_FLAG_TSYNC)
> mutex_unlock(¤t->signal->cred_guard_mutex);
> +out_put_fd:
> + if (flags & SECCOMP_FILTER_FLAG_GET_LISTENER) {
> + if (ret < 0) {
> + fput(listener_f);
> + put_unused_fd(listener);
> + } else {
> + fd_install(listener, listener_f);
> + ret = listener;
> + }
> + }
> out_free:
> seccomp_filter_free(prepared);
> return ret;
> @@ -909,6 +1109,9 @@ static long seccomp_get_action_avail(const char __user *uaction)
> case SECCOMP_RET_LOG:
> case SECCOMP_RET_ALLOW:
> break;
> + case SECCOMP_RET_USER_NOTIF:
> + if (IS_ENABLED(CONFIG_SECCOMP_USER_NOTIFICATION))
> + break;
> default:
> return -EOPNOTSUPP;
> }
> @@ -1105,6 +1308,7 @@ long seccomp_get_metadata(struct task_struct *task,
> #define SECCOMP_RET_KILL_THREAD_NAME "kill_thread"
> #define SECCOMP_RET_TRAP_NAME "trap"
> #define SECCOMP_RET_ERRNO_NAME "errno"
> +#define SECCOMP_RET_USER_NOTIF_NAME "user_notif"
> #define SECCOMP_RET_TRACE_NAME "trace"
> #define SECCOMP_RET_LOG_NAME "log"
> #define SECCOMP_RET_ALLOW_NAME "allow"
> @@ -1114,6 +1318,7 @@ static const char seccomp_actions_avail[] =
> SECCOMP_RET_KILL_THREAD_NAME " "
> SECCOMP_RET_TRAP_NAME " "
> SECCOMP_RET_ERRNO_NAME " "
> + SECCOMP_RET_USER_NOTIF_NAME " "
> SECCOMP_RET_TRACE_NAME " "
> SECCOMP_RET_LOG_NAME " "
> SECCOMP_RET_ALLOW_NAME;
> @@ -1131,6 +1336,7 @@ static const struct seccomp_log_name seccomp_log_names[] = {
> { SECCOMP_LOG_TRACE, SECCOMP_RET_TRACE_NAME },
> { SECCOMP_LOG_LOG, SECCOMP_RET_LOG_NAME },
> { SECCOMP_LOG_ALLOW, SECCOMP_RET_ALLOW_NAME },
> + { SECCOMP_LOG_USER_NOTIF, SECCOMP_RET_USER_NOTIF_NAME },
> { }
> };
>
> @@ -1279,3 +1485,195 @@ static int __init seccomp_sysctl_init(void)
> device_initcall(seccomp_sysctl_init)
>
> #endif /* CONFIG_SYSCTL */
> +
> +#ifdef CONFIG_SECCOMP_USER_NOTIFICATION
> +static int seccomp_notify_release(struct inode *inode, struct file *file)
> +{
> + struct seccomp_filter *filter = file->private_data;
> + struct seccomp_knotif *knotif;
> +
> + mutex_lock(&filter->notify_lock);
> +
> + /*
> + * If this file is being closed because e.g. the task who owned it
> + * died, let's wake everyone up who was waiting on us.
> + */
> + list_for_each_entry(knotif, &filter->notifications, list) {
> + if (knotif->state == SECCOMP_NOTIFY_REPLIED)
> + continue;
> +
> + knotif->state = SECCOMP_NOTIFY_REPLIED;
> + knotif->error = -ENOSYS;
> + knotif->val = 0;
> +
> + complete(&knotif->ready);
> + }
> +
> + filter->has_listener = false;
> + mutex_unlock(&filter->notify_lock);
> + __put_seccomp_filter(filter);
> + return 0;
> +}
> +
> +static ssize_t seccomp_notify_read(struct file *f, char __user *buf,
> + size_t size, loff_t *ppos)
> +{
> + struct seccomp_filter *filter = f->private_data;
> + struct seccomp_knotif *knotif = NULL, *cur;
> + struct seccomp_notif unotif;
> + ssize_t ret;
> +
> + /* No offset reads. */
> + if (*ppos != 0)
> + return -EINVAL;
> +
> + ret = down_interruptible(&filter->request);
> + if (ret < 0)
> + return ret;
> +
> + mutex_lock(&filter->notify_lock);
> + list_for_each_entry(cur, &filter->notifications, list) {
> + if (cur->state == SECCOMP_NOTIFY_INIT) {
> + knotif = cur;
> + break;
> + }
> + }
> +
> + /*
> + * If we didn't find a notification, it could be that the task was
> + * interrupted between the time we were woken and when we were able to
> + * acquire the rw lock. Should we retry here or just -ENOENT? -ENOENT
> + * for now.
> + */
> + if (!knotif) {
> + ret = -ENOENT;
> + goto out;
> + }
> +
> + unotif.id = knotif->id;
> + unotif.pid = knotif->pid;
> + unotif.data = *(knotif->data);
> +
> + size = min_t(size_t, size, sizeof(struct seccomp_notif));
> + if (copy_to_user(buf, &unotif, size)) {
> + ret = -EFAULT;
> + goto out;
> + }
> +
> + ret = sizeof(unotif);
> + knotif->state = SECCOMP_NOTIFY_SENT;
> +
> +out:
> + mutex_unlock(&filter->notify_lock);
> + return ret;
> +}
> +
> +static ssize_t seccomp_notify_write(struct file *file, const char __user *buf,
> + size_t size, loff_t *ppos)
> +{
> + struct seccomp_filter *filter = file->private_data;
> + struct seccomp_notif_resp resp = {};
> + struct seccomp_knotif *knotif = NULL;
> + ssize_t ret = -EINVAL;
> +
> + /* No partial writes. */
> + if (*ppos != 0)
> + return -EINVAL;
> +
> + size = min_t(size_t, size, sizeof(resp));
> + if (copy_from_user(&resp, buf, size))
> + return -EFAULT;
> +
> + ret = mutex_lock_interruptible(&filter->notify_lock);
> + if (ret < 0)
> + return ret;
> +
> + list_for_each_entry(knotif, &filter->notifications, list) {
> + if (knotif->id == resp.id)
> + break;
> + }
> +
> + if (!knotif || knotif->id != resp.id) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + /* Allow exactly one reply. */
> + if (knotif->state != SECCOMP_NOTIFY_SENT) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + ret = size;
> + knotif->state = SECCOMP_NOTIFY_REPLIED;
> + knotif->error = resp.error;
> + knotif->val = resp.val;
> + complete(&knotif->ready);
> +out:
> + mutex_unlock(&filter->notify_lock);
> + return ret;
> +}
> +
> +static __poll_t seccomp_notify_poll(struct file *file,
> + struct poll_table_struct *poll_tab)
> +{
> + struct seccomp_filter *filter = file->private_data;
> + __poll_t ret = 0;
> + struct seccomp_knotif *cur;
> +
> + ret = mutex_lock_interruptible(&filter->notify_lock);
> + if (ret < 0)
> + return ret;
> +
> + list_for_each_entry(cur, &filter->notifications, list) {
> + if (cur->state == SECCOMP_NOTIFY_INIT)
> + ret |= EPOLLIN | EPOLLRDNORM;
> + if (cur->state == SECCOMP_NOTIFY_SENT)
> + ret |= EPOLLOUT | EPOLLWRNORM;
> + }
> +
> + mutex_unlock(&filter->notify_lock);
> +
> + return ret;
> +}
> +
> +static const struct file_operations seccomp_notify_ops = {
> + .read = seccomp_notify_read,
> + .write = seccomp_notify_write,
> + .poll = seccomp_notify_poll,
> + .release = seccomp_notify_release,
> +};
> +
> +static struct file *init_listener(struct task_struct *task,
> + struct seccomp_filter *filter)
> +{
> + struct file *ret = ERR_PTR(-EBUSY);
> + struct seccomp_filter *cur;
> + bool have_listener = false;
> +
> + for (cur = task->seccomp.filter; cur; cur = cur->prev) {
> + mutex_lock(&cur->notify_lock);
> + if (cur->has_listener)
> + have_listener = true;
> + }
> +
> + if (have_listener)
> + goto out;
> +
> + ret = anon_inode_getfile("seccomp notify", &seccomp_notify_ops,
> + filter, O_RDWR);
> + if (IS_ERR(ret))
> + goto out;
> +
> +
> + /* The file has a reference to it now */
> + __get_seccomp_filter(filter);
> + filter->has_listener = true;
> +
> +out:
> + for (cur = task->seccomp.filter; cur; cur = cur->prev)
> + mutex_unlock(&cur->notify_lock);
> +
> + return ret;
> +}
> +#endif
> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
> index 168c66d74fc5..bb96df66222f 100644
> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
> @@ -40,10 +40,12 @@
> #include <sys/fcntl.h>
> #include <sys/mman.h>
> #include <sys/times.h>
> +#include <sys/socket.h>
>
> #define _GNU_SOURCE
> #include <unistd.h>
> #include <sys/syscall.h>
> +#include <poll.h>
>
> #include "../kselftest_harness.h"
>
> @@ -150,6 +152,24 @@ struct seccomp_metadata {
> };
> #endif
>
> +#ifndef SECCOMP_FILTER_FLAG_GET_LISTENER
> +#define SECCOMP_FILTER_FLAG_GET_LISTENER 4
> +
> +#define SECCOMP_RET_USER_NOTIF 0x7fc00000U
> +
> +struct seccomp_notif {
> + __u64 id;
> + pid_t pid;
> + struct seccomp_data data;
> +};
> +
> +struct seccomp_notif_resp {
> + __u64 id;
> + __s32 error;
> + __s64 val;
> +};
> +#endif
> +
> #ifndef seccomp
> int seccomp(unsigned int op, unsigned int flags, void *args)
> {
> @@ -2072,7 +2092,8 @@ TEST(seccomp_syscall_mode_lock)
> TEST(detect_seccomp_filter_flags)
> {
> unsigned int flags[] = { SECCOMP_FILTER_FLAG_TSYNC,
> - SECCOMP_FILTER_FLAG_LOG };
> + SECCOMP_FILTER_FLAG_LOG,
> + SECCOMP_FILTER_FLAG_GET_LISTENER };
> unsigned int flag, all_flags;
> int i;
> long ret;
> @@ -2917,6 +2938,164 @@ TEST(get_metadata)
> ASSERT_EQ(0, kill(pid, SIGKILL));
> }
>
> +static int user_trap_syscall(int nr, unsigned int flags)
> +{
> + struct sock_filter filter[] = {
> + BPF_STMT(BPF_LD+BPF_W+BPF_ABS,
> + offsetof(struct seccomp_data, nr)),
> + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, nr, 0, 1),
> + BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_USER_NOTIF),
> + BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
> + };
> +
> + struct sock_fprog prog = {
> + .len = (unsigned short)ARRAY_SIZE(filter),
> + .filter = filter,
> + };
> +
> + return seccomp(SECCOMP_SET_MODE_FILTER, flags, &prog);
> +}
> +
> +static int read_notif(int listener, struct seccomp_notif *req)
> +{
> + int ret;
> +
> + do {
> + errno = 0;
> + ret = read(listener, req, sizeof(*req));
> + } while (ret == -1 && errno == ENOENT);
> + return ret;
> +}
> +
> +static void signal_handler(int signal)
> +{
> +}
> +
> +#define USER_NOTIF_MAGIC 116983961184613L
> +TEST(get_user_notification_syscall)
> +{
> + pid_t pid;
> + long ret;
> + int status, listener;
> + struct seccomp_notif req = {};
> + struct seccomp_notif_resp resp = {};
> + struct pollfd pollfd;
> +
> + pid = fork();
> + ASSERT_GE(pid, 0);
> +
> + /* Check that we get -ENOSYS with no listener attached */
> + if (pid == 0) {
> + if (user_trap_syscall(__NR_getpid, 0) < 0)
> + exit(1);
> + ret = syscall(__NR_getpid);
> + exit(ret >= 0 || errno != ENOSYS);
> + }
> +
> + EXPECT_EQ(waitpid(pid, &status, 0), pid);
> + EXPECT_EQ(true, WIFEXITED(status));
> + EXPECT_EQ(0, WEXITSTATUS(status));
> +
> + /* Check that the basic notification machinery works */
> + listener = user_trap_syscall(__NR_getpid,
> + SECCOMP_FILTER_FLAG_GET_LISTENER);
> + EXPECT_GE(listener, 0);
> +
> + /* Installing a second listener in the chain should EBUSY */
> + EXPECT_EQ(user_trap_syscall(__NR_getpid,
> + SECCOMP_FILTER_FLAG_GET_LISTENER),
> + -1);
> + EXPECT_EQ(errno, EBUSY);
> +
> + pid = fork();
> + ASSERT_GE(pid, 0);
> +
> + if (pid == 0) {
> + ret = syscall(__NR_getpid);
> + exit(ret != USER_NOTIF_MAGIC);
> + }
> +
> + EXPECT_EQ(read(listener, &req, sizeof(req)), sizeof(req));
> +
> + pollfd.fd = listener;
> + pollfd.events = POLLIN | POLLOUT;
> +
> + EXPECT_GT(poll(&pollfd, 1, -1), 0);
> + EXPECT_EQ(pollfd.revents, POLLOUT);
> +
> + EXPECT_EQ(req.data.nr, __NR_getpid);
> +
> + resp.id = req.id;
> + resp.error = 0;
> + resp.val = USER_NOTIF_MAGIC;
> +
> + EXPECT_EQ(write(listener, &resp, sizeof(resp)), sizeof(resp));
> +
> + EXPECT_EQ(waitpid(pid, &status, 0), pid);
> + EXPECT_EQ(true, WIFEXITED(status));
> + EXPECT_EQ(0, WEXITSTATUS(status));
> +
> + /*
> + * Check that nothing bad happens when we kill the task in the middle
> + * of a syscall.
> + */
> + pid = fork();
> + ASSERT_GE(pid, 0);
> +
> + if (pid == 0) {
> + ret = syscall(__NR_getpid);
> + exit(ret != USER_NOTIF_MAGIC);
> + }
> +
> + ret = read(listener, &req, sizeof(req));
> + EXPECT_EQ(ret, sizeof(req));
> +
> + EXPECT_EQ(kill(pid, SIGKILL), 0);
> + EXPECT_EQ(waitpid(pid, NULL, 0), pid);
> +
> + resp.id = req.id;
> + ret = write(listener, &resp, sizeof(resp));
> + EXPECT_EQ(ret, -1);
> + EXPECT_EQ(errno, EINVAL);
> +
> + /*
> + * Check that we get another notification about a signal in the middle
> + * of a syscall.
> + */
> + pid = fork();
> + ASSERT_GE(pid, 0);
> +
> + if (pid == 0) {
> + if (signal(SIGUSR1, signal_handler) == SIG_ERR) {
> + perror("signal");
> + exit(1);
> + }
> + ret = syscall(__NR_getpid);
> + exit(ret != USER_NOTIF_MAGIC);
> + }
> +
> + ret = read_notif(listener, &req);
> + EXPECT_EQ(ret, sizeof(req));
> + EXPECT_EQ(errno, 0);
> +
> + EXPECT_EQ(kill(pid, SIGUSR1), 0);
> +
> + ret = read_notif(listener, &req);
> + EXPECT_EQ(ret, sizeof(req));
> + EXPECT_EQ(errno, 0);
> +
> + resp.id = req.id;
> + ret = write(listener, &resp, sizeof(resp));
> + EXPECT_EQ(ret, sizeof(resp));
> + EXPECT_EQ(errno, 0);
> +
> + EXPECT_EQ(waitpid(pid, &status, 0), pid);
> + EXPECT_EQ(true, WIFEXITED(status));
> + EXPECT_EQ(0, WEXITSTATUS(status));
> +
> + close(listener);
> +}
> +
> /*
> * TODO:
> * - add microbenchmarks
> --
> 2.17.0
>
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 3/4] seccomp: add a way to get a listener fd from ptrace
2018-05-17 15:12 ` [PATCH v2 3/4] seccomp: add a way to get a listener fd from ptrace Tycho Andersen
2018-05-17 15:41 ` Oleg Nesterov
@ 2018-05-18 14:05 ` Christian Brauner
2018-05-18 15:10 ` Tycho Andersen
1 sibling, 1 reply; 20+ messages in thread
From: Christian Brauner @ 2018-05-18 14:05 UTC (permalink / raw)
To: Tycho Andersen
Cc: linux-kernel, containers, Tobin C . Harding, Kees Cook,
Akihiro Suda, Oleg Nesterov, Andy Lutomirski, Eric W . Biederman,
Christian Brauner, Tyler Hicks
On Thu, May 17, 2018 at 09:12:17AM -0600, Tycho Andersen wrote:
> As an alternative to SECCOMP_FILTER_FLAG_GET_LISTENER, perhaps a ptrace()
> version which can acquire filters is useful. There are at least two reasons
> this is preferable, even though it uses ptrace:
>
> 1. You can control tasks that aren't cooperating with you
> 2. You can control tasks whose filters block sendmsg() and socket(); if the
> task installs a filter which blocks these calls, there's no way with
> SECCOMP_FILTER_FLAG_GET_LISTENER to get the fd out to the privileged task.
I get the problem I guess the question we need to answer is do we care
enought to bring ptrace into this? Not really objecting, just asking. :)
If blocking sendmsg() or socket() becomes an issue because people like
to shoot themselves in the foot we can surely add this option later.
Christian
>
> v2: fix a bug where listener mode was not unset when an unused fd was not
> available
>
> Signed-off-by: Tycho Andersen <tycho@tycho.ws>
> CC: Kees Cook <keescook@chromium.org>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Eric W. Biederman <ebiederm@xmission.com>
> CC: "Serge E. Hallyn" <serge@hallyn.com>
> CC: Christian Brauner <christian.brauner@ubuntu.com>
> CC: Tyler Hicks <tyhicks@canonical.com>
> CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
> ---
> include/linux/seccomp.h | 11 ++++
> include/uapi/linux/ptrace.h | 2 +
> kernel/ptrace.c | 4 ++
> kernel/seccomp.c | 27 ++++++++
> tools/testing/selftests/seccomp/seccomp_bpf.c | 66 +++++++++++++++++++
> 5 files changed, 110 insertions(+)
>
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index 0fd3e0676a1c..10e684899b7b 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -111,4 +111,15 @@ static inline long seccomp_get_metadata(struct task_struct *task,
> return -EINVAL;
> }
> #endif /* CONFIG_SECCOMP_FILTER && CONFIG_CHECKPOINT_RESTORE */
> +
> +#ifdef CONFIG_SECCOMP_USER_NOTIFICATION
> +extern long seccomp_get_listener(struct task_struct *task,
> + unsigned long filter_off);
> +#else
> +static inline long seccomp_get_listener(struct task_struct *task,
> + unsigned long filter_off)
> +{
> + return -EINVAL;
> +}
> +#endif/* CONFIG_SECCOMP_USER_NOTIFICATION */
> #endif /* _LINUX_SECCOMP_H */
> diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
> index d5a1b8a492b9..dc0abf81de3b 100644
> --- a/include/uapi/linux/ptrace.h
> +++ b/include/uapi/linux/ptrace.h
> @@ -73,6 +73,8 @@ struct seccomp_metadata {
> __u64 flags; /* Output: filter's flags */
> };
>
> +#define PTRACE_SECCOMP_GET_LISTENER 0x420e
> +
> /* Read signals from a shared (process wide) queue */
> #define PTRACE_PEEKSIGINFO_SHARED (1 << 0)
>
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 21fec73d45d4..fcbdb6f4dc07 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -1096,6 +1096,10 @@ int ptrace_request(struct task_struct *child, long request,
> ret = seccomp_get_metadata(child, addr, datavp);
> break;
>
> + case PTRACE_SECCOMP_GET_LISTENER:
> + ret = seccomp_get_listener(child, addr);
> + break;
> +
> default:
> break;
> }
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index f136eca93f2f..7c23aee76bb4 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -1678,4 +1678,31 @@ static struct file *init_listener(struct task_struct *task,
>
> return ret;
> }
> +
> +long seccomp_get_listener(struct task_struct *task,
> + unsigned long filter_off)
> +{
> + struct seccomp_filter *filter;
> + struct file *listener;
> + int fd;
> +
> + filter = get_nth_filter(task, filter_off);
> + if (IS_ERR(filter))
> + return PTR_ERR(filter);
> +
> + fd = get_unused_fd_flags(O_RDWR);
> + if (fd < 0) {
> + __put_seccomp_filter(filter);
> + return fd;
> + }
> +
> + listener = init_listener(task, task->seccomp.filter);
> + if (IS_ERR(listener)) {
> + put_unused_fd(fd);
> + return PTR_ERR(listener);
> + }
> +
> + fd_install(fd, listener);
> + return fd;
> +}
> #endif
> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
> index bb96df66222f..473905f33e0b 100644
> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
> @@ -178,6 +178,10 @@ int seccomp(unsigned int op, unsigned int flags, void *args)
> }
> #endif
>
> +#ifndef PTRACE_SECCOMP_GET_LISTENER
> +#define PTRACE_SECCOMP_GET_LISTENER 0x420e
> +#endif
> +
> #if __BYTE_ORDER == __LITTLE_ENDIAN
> #define syscall_arg(_n) (offsetof(struct seccomp_data, args[_n]))
> #elif __BYTE_ORDER == __BIG_ENDIAN
> @@ -3096,6 +3100,68 @@ TEST(get_user_notification_syscall)
> close(listener);
> }
>
> +TEST(get_user_notification_ptrace)
> +{
> + pid_t pid;
> + int status, listener;
> + int sk_pair[2];
> + char c;
> + struct seccomp_notif req = {};
> + struct seccomp_notif_resp resp = {};
> +
> + ASSERT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0);
> +
> + pid = fork();
> + ASSERT_GE(pid, 0);
> +
> + if (pid == 0) {
> + EXPECT_EQ(user_trap_syscall(__NR_getpid, 0), 0);
> +
> + /* Test that we get ENOSYS while not attached */
> + EXPECT_EQ(syscall(__NR_getpid), -1);
> + EXPECT_EQ(errno, ENOSYS);
> +
> + /* Signal we're ready and have installed the filter. */
> + EXPECT_EQ(write(sk_pair[1], "J", 1), 1);
> +
> + EXPECT_EQ(read(sk_pair[1], &c, 1), 1);
> + EXPECT_EQ(c, 'H');
> +
> + exit(syscall(__NR_getpid) != USER_NOTIF_MAGIC);
> + }
> +
> + EXPECT_EQ(read(sk_pair[0], &c, 1), 1);
> + EXPECT_EQ(c, 'J');
> +
> + EXPECT_EQ(ptrace(PTRACE_ATTACH, pid), 0);
> + EXPECT_EQ(waitpid(pid, NULL, 0), pid);
> + listener = ptrace(PTRACE_SECCOMP_GET_LISTENER, pid, 0);
> + EXPECT_GE(listener, 0);
> +
> + /* EBUSY for second listener */
> + EXPECT_EQ(ptrace(PTRACE_SECCOMP_GET_LISTENER, pid, 0), -1);
> + EXPECT_EQ(errno, EBUSY);
> +
> + EXPECT_EQ(ptrace(PTRACE_DETACH, pid, NULL, 0), 0);
> +
> + /* Now signal we are done and respond with magic */
> + EXPECT_EQ(write(sk_pair[0], "H", 1), 1);
> +
> + EXPECT_EQ(read(listener, &req, sizeof(req)), sizeof(req));
> +
> + resp.id = req.id;
> + resp.error = 0;
> + resp.val = USER_NOTIF_MAGIC;
> +
> + EXPECT_EQ(write(listener, &resp, sizeof(resp)), sizeof(resp));
> +
> + EXPECT_EQ(waitpid(pid, &status, 0), pid);
> + EXPECT_EQ(true, WIFEXITED(status));
> + EXPECT_EQ(0, WEXITSTATUS(status));
> +
> + close(listener);
> +}
> +
> /*
> * TODO:
> * - add microbenchmarks
> --
> 2.17.0
>
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 3/4] seccomp: add a way to get a listener fd from ptrace
2018-05-18 14:05 ` Christian Brauner
@ 2018-05-18 15:10 ` Tycho Andersen
0 siblings, 0 replies; 20+ messages in thread
From: Tycho Andersen @ 2018-05-18 15:10 UTC (permalink / raw)
To: Christian Brauner
Cc: linux-kernel, containers, Tobin C . Harding, Kees Cook,
Akihiro Suda, Oleg Nesterov, Andy Lutomirski, Eric W . Biederman,
Christian Brauner, Tyler Hicks
On Fri, May 18, 2018 at 04:05:56PM +0200, Christian Brauner wrote:
> On Thu, May 17, 2018 at 09:12:17AM -0600, Tycho Andersen wrote:
> > As an alternative to SECCOMP_FILTER_FLAG_GET_LISTENER, perhaps a ptrace()
> > version which can acquire filters is useful. There are at least two reasons
> > this is preferable, even though it uses ptrace:
> >
> > 1. You can control tasks that aren't cooperating with you
> > 2. You can control tasks whose filters block sendmsg() and socket(); if the
> > task installs a filter which blocks these calls, there's no way with
> > SECCOMP_FILTER_FLAG_GET_LISTENER to get the fd out to the privileged task.
>
> I get the problem I guess the question we need to answer is do we care
> enought to bring ptrace into this? Not really objecting, just asking. :)
> If blocking sendmsg() or socket() becomes an issue because people like
> to shoot themselves in the foot we can surely add this option later.
It doesn't seem that unreasonable to me to want to filter socket() or
sendmsg() though, so designing an API that doesn't support that from
the get-go seems like a bad idea. But that's why there are two
alternatives, so we can argue about it :)
Tycho
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
2018-05-18 14:04 ` Christian Brauner
@ 2018-05-18 15:21 ` Tycho Andersen
0 siblings, 0 replies; 20+ messages in thread
From: Tycho Andersen @ 2018-05-18 15:21 UTC (permalink / raw)
To: Christian Brauner
Cc: linux-kernel, containers, Tobin C . Harding, Kees Cook,
Akihiro Suda, Oleg Nesterov, Andy Lutomirski, Eric W . Biederman,
Christian Brauner, Tyler Hicks
On Fri, May 18, 2018 at 04:04:16PM +0200, Christian Brauner wrote:
> On Thu, May 17, 2018 at 09:12:15AM -0600, Tycho Andersen wrote:
> > +#ifdef CONFIG_SECCOMP_USER_NOTIFICATION
> > +static u64 seccomp_next_notify_id(struct seccomp_filter *filter)
> > +{
> > + u64 ret = filter->next_id;
> > +
> > + /* Note: overflow is ok here, the id just needs to be unique */
> > + filter->next_id++;
> > +
> > + return ret;
> > +}
>
> Nit: Depending on how averse people are to relying on side-effects this
> could be simplified to:
>
> static inline u64 seccomp_next_notify_id(struct seccomp_filter *filter)
> {
> /* Note: Overflow is ok. The id just needs to be unique. */
> return filter->next_id++;
> }
Oh, yes, definitely. I think this is leftover from when this function
worked a different way.
> > +
> > +static void seccomp_do_user_notification(int this_syscall,
> > + struct seccomp_filter *match,
> > + const struct seccomp_data *sd)
> > +{
> > + int err;
> > + long ret = 0;
> > + struct seccomp_knotif n = {};
> > +
> > + mutex_lock(&match->notify_lock);
> > + if (!match->has_listener) {
> > + err = -ENOSYS;
> > + goto out;
> > + }
>
> Nit:
>
> err = -ENOSYS;
> mutex_lock(&match->notify_lock);
> if (!match->has_listener)
> goto out;
>
> looks cleaner to me or you do the err initalization at the top of the
> function. :)
Ok :)
> > +
> > + n.pid = current->pid;
> > + n.state = SECCOMP_NOTIFY_INIT;
> > + n.data = sd;
> > + n.id = seccomp_next_notify_id(match);
> > + init_completion(&n.ready);
> > +
> > + list_add(&n.list, &match->notifications);
> > +
> > + mutex_unlock(&match->notify_lock);
> > + up(&match->request);
> > +
> > + err = wait_for_completion_interruptible(&n.ready);
> > + mutex_lock(&match->notify_lock);
> > +
> > + /*
> > + * Here it's possible we got a signal and then had to wait on the mutex
> > + * while the reply was sent, so let's be sure there wasn't a response
> > + * in the meantime.
> > + */
> > + if (err < 0 && n.state != SECCOMP_NOTIFY_REPLIED) {
> > + /*
> > + * We got a signal. Let's tell userspace about it (potentially
> > + * again, if we had already notified them about the first one).
> > + */
> > + if (n.state == SECCOMP_NOTIFY_SENT) {
> > + n.state = SECCOMP_NOTIFY_INIT;
> > + up(&match->request);
> > + }
> > + mutex_unlock(&match->notify_lock);
> > + err = wait_for_completion_killable(&n.ready);
> > + mutex_lock(&match->notify_lock);
> > + if (err < 0)
> > + goto remove_list;
> > + }
> > +
> > + ret = n.val;
> > + err = n.error;
> > +
> > + WARN(n.state != SECCOMP_NOTIFY_REPLIED,
> > + "notified about write complete when state is not write");
>
> Nit: That message seems a little cryptic.
Perhaps we can just drop it. It's just a sanity check, but given the
tests above, it doesn't seem likely.
> > +
> > +remove_list:
> > + list_del(&n.list);
> > +out:
> > + mutex_unlock(&match->notify_lock);
> > + syscall_set_return_value(current, task_pt_regs(current),
> > + err, ret);
> > +}
> > +#else
> > +static void seccomp_do_user_notification(int this_syscall,
> > + u32 action,
> > + struct seccomp_filter *match,
> > + const struct seccomp_data *sd)
> > +{
> > + WARN(1, "user notification received, but disabled");
>
> Nit: "received unexpected user notification" might be clearer
Yes, I wonder if we shouldn't just drop this too -- it's not a kernel
bug, but a userspace bug that they're using features that aren't
enabled.
We could enhance the verifier with a static check for
BPF_RET | BPF_K == SECCOMPO_RET_USER_NOTIF and reject such programs if
user notification isn't enabled. Of course, it wouldn't handle the
dynamic case, but it might be useful.
Tycho
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
2018-05-17 15:12 ` [PATCH v2 1/4] seccomp: add a return code to " Tycho Andersen
2018-05-17 15:33 ` Oleg Nesterov
2018-05-18 14:04 ` Christian Brauner
@ 2018-05-19 0:14 ` kbuild test robot
2018-05-19 5:01 ` kbuild test robot
3 siblings, 0 replies; 20+ messages in thread
From: kbuild test robot @ 2018-05-19 0:14 UTC (permalink / raw)
To: Tycho Andersen
Cc: kbuild-all, linux-kernel, containers, Kees Cook, Andy Lutomirski,
Oleg Nesterov, Eric W . Biederman, Serge E . Hallyn,
Christian Brauner, Tyler Hicks, Akihiro Suda, Tobin C . Harding,
Tycho Andersen
[-- Attachment #1: Type: text/plain, Size: 15430 bytes --]
Hi Tycho,
I love your patch! Yet something to improve:
[auto build test ERROR on linus/master]
[also build test ERROR on v4.17-rc5]
[cannot apply to next-20180517]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Tycho-Andersen/seccomp-trap-to-userspace/20180519-071527
config: x86_64-randconfig-x010-201819 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All error/warnings (new ones prefixed by >>):
kernel/seccomp.c: In function '__seccomp_filter':
>> kernel/seccomp.c:891:46: warning: passing argument 2 of 'seccomp_do_user_notification' makes integer from pointer without a cast [-Wint-conversion]
seccomp_do_user_notification(this_syscall, match, sd);
^~~~~
kernel/seccomp.c:802:13: note: expected 'u32 {aka unsigned int}' but argument is of type 'struct seccomp_filter *'
static void seccomp_do_user_notification(int this_syscall,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> kernel/seccomp.c:891:53: error: passing argument 3 of 'seccomp_do_user_notification' from incompatible pointer type [-Werror=incompatible-pointer-types]
seccomp_do_user_notification(this_syscall, match, sd);
^~
kernel/seccomp.c:802:13: note: expected 'struct seccomp_filter *' but argument is of type 'const struct seccomp_data *'
static void seccomp_do_user_notification(int this_syscall,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> kernel/seccomp.c:891:3: error: too few arguments to function 'seccomp_do_user_notification'
seccomp_do_user_notification(this_syscall, match, sd);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/seccomp.c:802:13: note: declared here
static void seccomp_do_user_notification(int this_syscall,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/seccomp.c: In function 'seccomp_set_mode_filter':
>> kernel/seccomp.c:1036:14: error: implicit declaration of function 'get_unused_fd_flags'; did you mean 'getname_flags'? [-Werror=implicit-function-declaration]
listener = get_unused_fd_flags(O_RDWR);
^~~~~~~~~~~~~~~~~~~
getname_flags
>> kernel/seccomp.c:1042:16: error: implicit declaration of function 'init_listener'; did you mean 'init_llist_head'? [-Werror=implicit-function-declaration]
listener_f = init_listener(current, prepared);
^~~~~~~~~~~~~
init_llist_head
>> kernel/seccomp.c:1042:14: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
listener_f = init_listener(current, prepared);
^
>> kernel/seccomp.c:1044:4: error: implicit declaration of function 'put_unused_fd'; did you mean 'put_user_ns'? [-Werror=implicit-function-declaration]
put_unused_fd(listener);
^~~~~~~~~~~~~
put_user_ns
>> kernel/seccomp.c:1077:4: error: implicit declaration of function 'fput'; did you mean 'iput'? [-Werror=implicit-function-declaration]
fput(listener_f);
^~~~
iput
>> kernel/seccomp.c:1080:4: error: implicit declaration of function 'fd_install'; did you mean 'fs_initcall'? [-Werror=implicit-function-declaration]
fd_install(listener, listener_f);
^~~~~~~~~~
fs_initcall
cc1: some warnings being treated as errors
vim +/seccomp_do_user_notification +891 kernel/seccomp.c
738
739 static void seccomp_do_user_notification(int this_syscall,
740 struct seccomp_filter *match,
741 const struct seccomp_data *sd)
742 {
743 int err;
744 long ret = 0;
745 struct seccomp_knotif n = {};
746
747 mutex_lock(&match->notify_lock);
748 if (!match->has_listener) {
749 err = -ENOSYS;
750 goto out;
751 }
752
753 n.pid = current->pid;
754 n.state = SECCOMP_NOTIFY_INIT;
755 n.data = sd;
756 n.id = seccomp_next_notify_id(match);
757 init_completion(&n.ready);
758
759 list_add(&n.list, &match->notifications);
760
761 mutex_unlock(&match->notify_lock);
762 up(&match->request);
763
764 err = wait_for_completion_interruptible(&n.ready);
765 mutex_lock(&match->notify_lock);
766
767 /*
768 * Here it's possible we got a signal and then had to wait on the mutex
769 * while the reply was sent, so let's be sure there wasn't a response
770 * in the meantime.
771 */
772 if (err < 0 && n.state != SECCOMP_NOTIFY_REPLIED) {
773 /*
774 * We got a signal. Let's tell userspace about it (potentially
775 * again, if we had already notified them about the first one).
776 */
777 if (n.state == SECCOMP_NOTIFY_SENT) {
778 n.state = SECCOMP_NOTIFY_INIT;
779 up(&match->request);
780 }
781 mutex_unlock(&match->notify_lock);
782 err = wait_for_completion_killable(&n.ready);
783 mutex_lock(&match->notify_lock);
784 if (err < 0)
785 goto remove_list;
786 }
787
788 ret = n.val;
789 err = n.error;
790
791 WARN(n.state != SECCOMP_NOTIFY_REPLIED,
792 "notified about write complete when state is not write");
793
794 remove_list:
795 list_del(&n.list);
796 out:
797 mutex_unlock(&match->notify_lock);
798 syscall_set_return_value(current, task_pt_regs(current),
799 err, ret);
800 }
801 #else
> 802 static void seccomp_do_user_notification(int this_syscall,
803 u32 action,
804 struct seccomp_filter *match,
805 const struct seccomp_data *sd)
806 {
807 WARN(1, "user notification received, but disabled");
808 seccomp_log(this_syscall, SIGSYS, action, true);
809 do_exit(SIGSYS);
810 }
811 #endif
812
813 #ifdef CONFIG_SECCOMP_FILTER
814 static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
815 const bool recheck_after_trace)
816 {
817 u32 filter_ret, action;
818 struct seccomp_filter *match = NULL;
819 int data;
820
821 /*
822 * Make sure that any changes to mode from another thread have
823 * been seen after TIF_SECCOMP was seen.
824 */
825 rmb();
826
827 filter_ret = seccomp_run_filters(sd, &match);
828 data = filter_ret & SECCOMP_RET_DATA;
829 action = filter_ret & SECCOMP_RET_ACTION_FULL;
830
831 switch (action) {
832 case SECCOMP_RET_ERRNO:
833 /* Set low-order bits as an errno, capped at MAX_ERRNO. */
834 if (data > MAX_ERRNO)
835 data = MAX_ERRNO;
836 syscall_set_return_value(current, task_pt_regs(current),
837 -data, 0);
838 goto skip;
839
840 case SECCOMP_RET_TRAP:
841 /* Show the handler the original registers. */
842 syscall_rollback(current, task_pt_regs(current));
843 /* Let the filter pass back 16 bits of data. */
844 seccomp_send_sigsys(this_syscall, data);
845 goto skip;
846
847 case SECCOMP_RET_TRACE:
848 /* We've been put in this state by the ptracer already. */
849 if (recheck_after_trace)
850 return 0;
851
852 /* ENOSYS these calls if there is no tracer attached. */
853 if (!ptrace_event_enabled(current, PTRACE_EVENT_SECCOMP)) {
854 syscall_set_return_value(current,
855 task_pt_regs(current),
856 -ENOSYS, 0);
857 goto skip;
858 }
859
860 /* Allow the BPF to provide the event message */
861 ptrace_event(PTRACE_EVENT_SECCOMP, data);
862 /*
863 * The delivery of a fatal signal during event
864 * notification may silently skip tracer notification,
865 * which could leave us with a potentially unmodified
866 * syscall that the tracer would have liked to have
867 * changed. Since the process is about to die, we just
868 * force the syscall to be skipped and let the signal
869 * kill the process and correctly handle any tracer exit
870 * notifications.
871 */
872 if (fatal_signal_pending(current))
873 goto skip;
874 /* Check if the tracer forced the syscall to be skipped. */
875 this_syscall = syscall_get_nr(current, task_pt_regs(current));
876 if (this_syscall < 0)
877 goto skip;
878
879 /*
880 * Recheck the syscall, since it may have changed. This
881 * intentionally uses a NULL struct seccomp_data to force
882 * a reload of all registers. This does not goto skip since
883 * a skip would have already been reported.
884 */
885 if (__seccomp_filter(this_syscall, NULL, true))
886 return -1;
887
888 return 0;
889
890 case SECCOMP_RET_USER_NOTIF:
> 891 seccomp_do_user_notification(this_syscall, match, sd);
892 goto skip;
893 case SECCOMP_RET_LOG:
894 seccomp_log(this_syscall, 0, action, true);
895 return 0;
896
897 case SECCOMP_RET_ALLOW:
898 /*
899 * Note that the "match" filter will always be NULL for
900 * this action since SECCOMP_RET_ALLOW is the starting
901 * state in seccomp_run_filters().
902 */
903 return 0;
904
905 case SECCOMP_RET_KILL_THREAD:
906 case SECCOMP_RET_KILL_PROCESS:
907 default:
908 seccomp_log(this_syscall, SIGSYS, action, true);
909 /* Dump core only if this is the last remaining thread. */
910 if (action == SECCOMP_RET_KILL_PROCESS ||
911 get_nr_threads(current) == 1) {
912 siginfo_t info;
913
914 /* Show the original registers in the dump. */
915 syscall_rollback(current, task_pt_regs(current));
916 /* Trigger a manual coredump since do_exit skips it. */
917 seccomp_init_siginfo(&info, this_syscall, data);
918 do_coredump(&info);
919 }
920 if (action == SECCOMP_RET_KILL_PROCESS)
921 do_group_exit(SIGSYS);
922 else
923 do_exit(SIGSYS);
924 }
925
926 unreachable();
927
928 skip:
929 seccomp_log(this_syscall, 0, action, match ? match->log : false);
930 return -1;
931 }
932 #else
933 static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
934 const bool recheck_after_trace)
935 {
936 BUG();
937 }
938 #endif
939
940 int __secure_computing(const struct seccomp_data *sd)
941 {
942 int mode = current->seccomp.mode;
943 int this_syscall;
944
945 if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
946 unlikely(current->ptrace & PT_SUSPEND_SECCOMP))
947 return 0;
948
949 this_syscall = sd ? sd->nr :
950 syscall_get_nr(current, task_pt_regs(current));
951
952 switch (mode) {
953 case SECCOMP_MODE_STRICT:
954 __secure_computing_strict(this_syscall); /* may call do_exit */
955 return 0;
956 case SECCOMP_MODE_FILTER:
957 return __seccomp_filter(this_syscall, sd, false);
958 default:
959 BUG();
960 }
961 }
962 #endif /* CONFIG_HAVE_ARCH_SECCOMP_FILTER */
963
964 long prctl_get_seccomp(void)
965 {
966 return current->seccomp.mode;
967 }
968
969 /**
970 * seccomp_set_mode_strict: internal function for setting strict seccomp
971 *
972 * Once current->seccomp.mode is non-zero, it may not be changed.
973 *
974 * Returns 0 on success or -EINVAL on failure.
975 */
976 static long seccomp_set_mode_strict(void)
977 {
978 const unsigned long seccomp_mode = SECCOMP_MODE_STRICT;
979 long ret = -EINVAL;
980
981 spin_lock_irq(¤t->sighand->siglock);
982
983 if (!seccomp_may_assign_mode(seccomp_mode))
984 goto out;
985
986 #ifdef TIF_NOTSC
987 disable_TSC();
988 #endif
989 seccomp_assign_mode(current, seccomp_mode);
990 ret = 0;
991
992 out:
993 spin_unlock_irq(¤t->sighand->siglock);
994
995 return ret;
996 }
997
998 #ifdef CONFIG_SECCOMP_FILTER
999 #ifdef CONFIG_SECCOMP_USER_NOTIFICATION
1000 static struct file *init_listener(struct task_struct *,
1001 struct seccomp_filter *);
1002 #endif
1003
1004 /**
1005 * seccomp_set_mode_filter: internal function for setting seccomp filter
1006 * @flags: flags to change filter behavior
1007 * @filter: struct sock_fprog containing filter
1008 *
1009 * This function may be called repeatedly to install additional filters.
1010 * Every filter successfully installed will be evaluated (in reverse order)
1011 * for each system call the task makes.
1012 *
1013 * Once current->seccomp.mode is non-zero, it may not be changed.
1014 *
1015 * Returns 0 on success or -EINVAL on failure.
1016 */
1017 static long seccomp_set_mode_filter(unsigned int flags,
1018 const char __user *filter)
1019 {
1020 const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
1021 struct seccomp_filter *prepared = NULL;
1022 long ret = -EINVAL;
1023 int listener = 0;
1024 struct file *listener_f = NULL;
1025
1026 /* Validate flags. */
1027 if (flags & ~SECCOMP_FILTER_FLAG_MASK)
1028 return -EINVAL;
1029
1030 /* Prepare the new filter before holding any locks. */
1031 prepared = seccomp_prepare_user_filter(filter);
1032 if (IS_ERR(prepared))
1033 return PTR_ERR(prepared);
1034
1035 if (flags & SECCOMP_FILTER_FLAG_GET_LISTENER) {
> 1036 listener = get_unused_fd_flags(O_RDWR);
1037 if (listener < 0) {
1038 ret = listener;
1039 goto out_free;
1040 }
1041
> 1042 listener_f = init_listener(current, prepared);
1043 if (IS_ERR(listener_f)) {
> 1044 put_unused_fd(listener);
1045 ret = PTR_ERR(listener_f);
1046 goto out_free;
1047 }
1048 }
1049
1050 /*
1051 * Make sure we cannot change seccomp or nnp state via TSYNC
1052 * while another thread is in the middle of calling exec.
1053 */
1054 if (flags & SECCOMP_FILTER_FLAG_TSYNC &&
1055 mutex_lock_killable(¤t->signal->cred_guard_mutex))
1056 goto out_put_fd;
1057
1058 spin_lock_irq(¤t->sighand->siglock);
1059
1060 if (!seccomp_may_assign_mode(seccomp_mode))
1061 goto out;
1062
1063 ret = seccomp_attach_filter(flags, prepared);
1064 if (ret)
1065 goto out;
1066 /* Do not free the successfully attached filter. */
1067 prepared = NULL;
1068
1069 seccomp_assign_mode(current, seccomp_mode);
1070 out:
1071 spin_unlock_irq(¤t->sighand->siglock);
1072 if (flags & SECCOMP_FILTER_FLAG_TSYNC)
1073 mutex_unlock(¤t->signal->cred_guard_mutex);
1074 out_put_fd:
1075 if (flags & SECCOMP_FILTER_FLAG_GET_LISTENER) {
1076 if (ret < 0) {
> 1077 fput(listener_f);
1078 put_unused_fd(listener);
1079 } else {
> 1080 fd_install(listener, listener_f);
1081 ret = listener;
1082 }
1083 }
1084 out_free:
1085 seccomp_filter_free(prepared);
1086 return ret;
1087 }
1088 #else
1089 static inline long seccomp_set_mode_filter(unsigned int flags,
1090 const char __user *filter)
1091 {
1092 return -EINVAL;
1093 }
1094 #endif
1095
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26139 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
2018-05-17 15:12 ` [PATCH v2 1/4] seccomp: add a return code to " Tycho Andersen
` (2 preceding siblings ...)
2018-05-19 0:14 ` kbuild test robot
@ 2018-05-19 5:01 ` kbuild test robot
2018-05-21 22:55 ` Tycho Andersen
3 siblings, 1 reply; 20+ messages in thread
From: kbuild test robot @ 2018-05-19 5:01 UTC (permalink / raw)
To: Tycho Andersen
Cc: kbuild-all, linux-kernel, containers, Kees Cook, Andy Lutomirski,
Oleg Nesterov, Eric W . Biederman, Serge E . Hallyn,
Christian Brauner, Tyler Hicks, Akihiro Suda, Tobin C . Harding,
Tycho Andersen
[-- Attachment #1: Type: text/plain, Size: 12534 bytes --]
Hi Tycho,
I love your patch! Yet something to improve:
[auto build test ERROR on linus/master]
[also build test ERROR on v4.17-rc5]
[cannot apply to next-20180517]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Tycho-Andersen/seccomp-trap-to-userspace/20180519-071527
config: i386-randconfig-a1-05181545 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All error/warnings (new ones prefixed by >>):
kernel/seccomp.c: In function '__seccomp_filter':
kernel/seccomp.c:891:46: warning: passing argument 2 of 'seccomp_do_user_notification' makes integer from pointer without a cast
seccomp_do_user_notification(this_syscall, match, sd);
^
kernel/seccomp.c:802:13: note: expected 'u32' but argument is of type 'struct seccomp_filter *'
static void seccomp_do_user_notification(int this_syscall,
^
>> kernel/seccomp.c:891:53: warning: passing argument 3 of 'seccomp_do_user_notification' from incompatible pointer type
seccomp_do_user_notification(this_syscall, match, sd);
^
kernel/seccomp.c:802:13: note: expected 'struct seccomp_filter *' but argument is of type 'const struct seccomp_data *'
static void seccomp_do_user_notification(int this_syscall,
^
kernel/seccomp.c:891:3: error: too few arguments to function 'seccomp_do_user_notification'
seccomp_do_user_notification(this_syscall, match, sd);
^
kernel/seccomp.c:802:13: note: declared here
static void seccomp_do_user_notification(int this_syscall,
^
kernel/seccomp.c: In function 'seccomp_set_mode_filter':
>> kernel/seccomp.c:1036:3: error: implicit declaration of function 'get_unused_fd_flags' [-Werror=implicit-function-declaration]
listener = get_unused_fd_flags(O_RDWR);
^
>> kernel/seccomp.c:1042:3: error: implicit declaration of function 'init_listener' [-Werror=implicit-function-declaration]
listener_f = init_listener(current, prepared);
^
kernel/seccomp.c:1042:14: warning: assignment makes pointer from integer without a cast
listener_f = init_listener(current, prepared);
^
>> kernel/seccomp.c:1044:4: error: implicit declaration of function 'put_unused_fd' [-Werror=implicit-function-declaration]
put_unused_fd(listener);
^
>> kernel/seccomp.c:1077:4: error: implicit declaration of function 'fput' [-Werror=implicit-function-declaration]
fput(listener_f);
^
>> kernel/seccomp.c:1080:4: error: implicit declaration of function 'fd_install' [-Werror=implicit-function-declaration]
fd_install(listener, listener_f);
^
cc1: some warnings being treated as errors
vim +/get_unused_fd_flags +1036 kernel/seccomp.c
812
813 #ifdef CONFIG_SECCOMP_FILTER
814 static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
815 const bool recheck_after_trace)
816 {
817 u32 filter_ret, action;
818 struct seccomp_filter *match = NULL;
819 int data;
820
821 /*
822 * Make sure that any changes to mode from another thread have
823 * been seen after TIF_SECCOMP was seen.
824 */
825 rmb();
826
827 filter_ret = seccomp_run_filters(sd, &match);
828 data = filter_ret & SECCOMP_RET_DATA;
829 action = filter_ret & SECCOMP_RET_ACTION_FULL;
830
831 switch (action) {
832 case SECCOMP_RET_ERRNO:
833 /* Set low-order bits as an errno, capped at MAX_ERRNO. */
834 if (data > MAX_ERRNO)
835 data = MAX_ERRNO;
836 syscall_set_return_value(current, task_pt_regs(current),
837 -data, 0);
838 goto skip;
839
840 case SECCOMP_RET_TRAP:
841 /* Show the handler the original registers. */
842 syscall_rollback(current, task_pt_regs(current));
843 /* Let the filter pass back 16 bits of data. */
844 seccomp_send_sigsys(this_syscall, data);
845 goto skip;
846
847 case SECCOMP_RET_TRACE:
848 /* We've been put in this state by the ptracer already. */
849 if (recheck_after_trace)
850 return 0;
851
852 /* ENOSYS these calls if there is no tracer attached. */
853 if (!ptrace_event_enabled(current, PTRACE_EVENT_SECCOMP)) {
854 syscall_set_return_value(current,
855 task_pt_regs(current),
856 -ENOSYS, 0);
857 goto skip;
858 }
859
860 /* Allow the BPF to provide the event message */
861 ptrace_event(PTRACE_EVENT_SECCOMP, data);
862 /*
863 * The delivery of a fatal signal during event
864 * notification may silently skip tracer notification,
865 * which could leave us with a potentially unmodified
866 * syscall that the tracer would have liked to have
867 * changed. Since the process is about to die, we just
868 * force the syscall to be skipped and let the signal
869 * kill the process and correctly handle any tracer exit
870 * notifications.
871 */
872 if (fatal_signal_pending(current))
873 goto skip;
874 /* Check if the tracer forced the syscall to be skipped. */
875 this_syscall = syscall_get_nr(current, task_pt_regs(current));
876 if (this_syscall < 0)
877 goto skip;
878
879 /*
880 * Recheck the syscall, since it may have changed. This
881 * intentionally uses a NULL struct seccomp_data to force
882 * a reload of all registers. This does not goto skip since
883 * a skip would have already been reported.
884 */
885 if (__seccomp_filter(this_syscall, NULL, true))
886 return -1;
887
888 return 0;
889
890 case SECCOMP_RET_USER_NOTIF:
> 891 seccomp_do_user_notification(this_syscall, match, sd);
892 goto skip;
893 case SECCOMP_RET_LOG:
894 seccomp_log(this_syscall, 0, action, true);
895 return 0;
896
897 case SECCOMP_RET_ALLOW:
898 /*
899 * Note that the "match" filter will always be NULL for
900 * this action since SECCOMP_RET_ALLOW is the starting
901 * state in seccomp_run_filters().
902 */
903 return 0;
904
905 case SECCOMP_RET_KILL_THREAD:
906 case SECCOMP_RET_KILL_PROCESS:
907 default:
908 seccomp_log(this_syscall, SIGSYS, action, true);
909 /* Dump core only if this is the last remaining thread. */
910 if (action == SECCOMP_RET_KILL_PROCESS ||
911 get_nr_threads(current) == 1) {
912 siginfo_t info;
913
914 /* Show the original registers in the dump. */
915 syscall_rollback(current, task_pt_regs(current));
916 /* Trigger a manual coredump since do_exit skips it. */
917 seccomp_init_siginfo(&info, this_syscall, data);
918 do_coredump(&info);
919 }
920 if (action == SECCOMP_RET_KILL_PROCESS)
921 do_group_exit(SIGSYS);
922 else
923 do_exit(SIGSYS);
924 }
925
926 unreachable();
927
928 skip:
929 seccomp_log(this_syscall, 0, action, match ? match->log : false);
930 return -1;
931 }
932 #else
933 static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
934 const bool recheck_after_trace)
935 {
936 BUG();
937 }
938 #endif
939
940 int __secure_computing(const struct seccomp_data *sd)
941 {
942 int mode = current->seccomp.mode;
943 int this_syscall;
944
945 if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
946 unlikely(current->ptrace & PT_SUSPEND_SECCOMP))
947 return 0;
948
949 this_syscall = sd ? sd->nr :
950 syscall_get_nr(current, task_pt_regs(current));
951
952 switch (mode) {
953 case SECCOMP_MODE_STRICT:
954 __secure_computing_strict(this_syscall); /* may call do_exit */
955 return 0;
956 case SECCOMP_MODE_FILTER:
957 return __seccomp_filter(this_syscall, sd, false);
958 default:
959 BUG();
960 }
961 }
962 #endif /* CONFIG_HAVE_ARCH_SECCOMP_FILTER */
963
964 long prctl_get_seccomp(void)
965 {
966 return current->seccomp.mode;
967 }
968
969 /**
970 * seccomp_set_mode_strict: internal function for setting strict seccomp
971 *
972 * Once current->seccomp.mode is non-zero, it may not be changed.
973 *
974 * Returns 0 on success or -EINVAL on failure.
975 */
976 static long seccomp_set_mode_strict(void)
977 {
978 const unsigned long seccomp_mode = SECCOMP_MODE_STRICT;
979 long ret = -EINVAL;
980
981 spin_lock_irq(¤t->sighand->siglock);
982
983 if (!seccomp_may_assign_mode(seccomp_mode))
984 goto out;
985
986 #ifdef TIF_NOTSC
987 disable_TSC();
988 #endif
989 seccomp_assign_mode(current, seccomp_mode);
990 ret = 0;
991
992 out:
993 spin_unlock_irq(¤t->sighand->siglock);
994
995 return ret;
996 }
997
998 #ifdef CONFIG_SECCOMP_FILTER
999 #ifdef CONFIG_SECCOMP_USER_NOTIFICATION
1000 static struct file *init_listener(struct task_struct *,
1001 struct seccomp_filter *);
1002 #endif
1003
1004 /**
1005 * seccomp_set_mode_filter: internal function for setting seccomp filter
1006 * @flags: flags to change filter behavior
1007 * @filter: struct sock_fprog containing filter
1008 *
1009 * This function may be called repeatedly to install additional filters.
1010 * Every filter successfully installed will be evaluated (in reverse order)
1011 * for each system call the task makes.
1012 *
1013 * Once current->seccomp.mode is non-zero, it may not be changed.
1014 *
1015 * Returns 0 on success or -EINVAL on failure.
1016 */
1017 static long seccomp_set_mode_filter(unsigned int flags,
1018 const char __user *filter)
1019 {
1020 const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
1021 struct seccomp_filter *prepared = NULL;
1022 long ret = -EINVAL;
1023 int listener = 0;
1024 struct file *listener_f = NULL;
1025
1026 /* Validate flags. */
1027 if (flags & ~SECCOMP_FILTER_FLAG_MASK)
1028 return -EINVAL;
1029
1030 /* Prepare the new filter before holding any locks. */
1031 prepared = seccomp_prepare_user_filter(filter);
1032 if (IS_ERR(prepared))
1033 return PTR_ERR(prepared);
1034
1035 if (flags & SECCOMP_FILTER_FLAG_GET_LISTENER) {
> 1036 listener = get_unused_fd_flags(O_RDWR);
1037 if (listener < 0) {
1038 ret = listener;
1039 goto out_free;
1040 }
1041
> 1042 listener_f = init_listener(current, prepared);
1043 if (IS_ERR(listener_f)) {
> 1044 put_unused_fd(listener);
1045 ret = PTR_ERR(listener_f);
1046 goto out_free;
1047 }
1048 }
1049
1050 /*
1051 * Make sure we cannot change seccomp or nnp state via TSYNC
1052 * while another thread is in the middle of calling exec.
1053 */
1054 if (flags & SECCOMP_FILTER_FLAG_TSYNC &&
1055 mutex_lock_killable(¤t->signal->cred_guard_mutex))
1056 goto out_put_fd;
1057
1058 spin_lock_irq(¤t->sighand->siglock);
1059
1060 if (!seccomp_may_assign_mode(seccomp_mode))
1061 goto out;
1062
1063 ret = seccomp_attach_filter(flags, prepared);
1064 if (ret)
1065 goto out;
1066 /* Do not free the successfully attached filter. */
1067 prepared = NULL;
1068
1069 seccomp_assign_mode(current, seccomp_mode);
1070 out:
1071 spin_unlock_irq(¤t->sighand->siglock);
1072 if (flags & SECCOMP_FILTER_FLAG_TSYNC)
1073 mutex_unlock(¤t->signal->cred_guard_mutex);
1074 out_put_fd:
1075 if (flags & SECCOMP_FILTER_FLAG_GET_LISTENER) {
1076 if (ret < 0) {
> 1077 fput(listener_f);
1078 put_unused_fd(listener);
1079 } else {
> 1080 fd_install(listener, listener_f);
1081 ret = listener;
1082 }
1083 }
1084 out_free:
1085 seccomp_filter_free(prepared);
1086 return ret;
1087 }
1088 #else
1089 static inline long seccomp_set_mode_filter(unsigned int flags,
1090 const char __user *filter)
1091 {
1092 return -EINVAL;
1093 }
1094 #endif
1095
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28767 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
2018-05-19 5:01 ` kbuild test robot
@ 2018-05-21 22:55 ` Tycho Andersen
0 siblings, 0 replies; 20+ messages in thread
From: Tycho Andersen @ 2018-05-21 22:55 UTC (permalink / raw)
To: kbuild test robot
Cc: kbuild-all, linux-kernel, containers, Kees Cook, Andy Lutomirski,
Oleg Nesterov, Eric W . Biederman, Serge E . Hallyn,
Christian Brauner, Tyler Hicks, Akihiro Suda, Tobin C . Harding
On Sat, May 19, 2018 at 01:01:15PM +0800, kbuild test robot wrote:
> Hi Tycho,
>
> I love your patch! Yet something to improve:
Whoops, seems I forgot to compile the
!CONFIG_SECCOMP_USER_NOTIFICATION case. Anyways, I've fixed this for
v3.
Tycho
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
2018-05-17 15:46 ` Oleg Nesterov
@ 2018-05-24 15:28 ` Tycho Andersen
0 siblings, 0 replies; 20+ messages in thread
From: Tycho Andersen @ 2018-05-24 15:28 UTC (permalink / raw)
To: Oleg Nesterov
Cc: linux-kernel, containers, Kees Cook, Andy Lutomirski,
Eric W . Biederman, Serge E . Hallyn, Christian Brauner,
Tyler Hicks, Akihiro Suda, Tobin C . Harding
Hi Oleg,
On Thu, May 17, 2018 at 05:46:37PM +0200, Oleg Nesterov wrote:
> On 05/17, Tycho Andersen wrote:
> >
> > > From lockdep pov this loop tries to take the same lock twice or more, it shoul
> > > complain.
> >
> > I didn't, but I guess that's because it's not trying to take the same lock
> > twice -- the pointer cur is changing in the loop.
>
> Yes, I see. But this is the same lock for lockdep, it has the same class.
I finally figured this out, I needed CONFIG_PROVE_LOCKING=y too,
anyway, I've added the nesting annotations for v3. Thanks!
Tycho
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2018-05-24 15:28 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-17 15:12 [PATCH v2 0/4] seccomp trap to userspace Tycho Andersen
2018-05-17 15:12 ` [PATCH v2 1/4] seccomp: add a return code to " Tycho Andersen
2018-05-17 15:33 ` Oleg Nesterov
2018-05-17 15:39 ` Tycho Andersen
2018-05-17 15:46 ` Oleg Nesterov
2018-05-24 15:28 ` Tycho Andersen
2018-05-18 14:04 ` Christian Brauner
2018-05-18 15:21 ` Tycho Andersen
2018-05-19 0:14 ` kbuild test robot
2018-05-19 5:01 ` kbuild test robot
2018-05-21 22:55 ` Tycho Andersen
2018-05-17 15:12 ` [PATCH v2 2/4] seccomp: make get_nth_filter available outside of CHECKPOINT_RESTORE Tycho Andersen
2018-05-17 15:12 ` [PATCH v2 3/4] seccomp: add a way to get a listener fd from ptrace Tycho Andersen
2018-05-17 15:41 ` Oleg Nesterov
2018-05-17 15:57 ` Tycho Andersen
2018-05-17 15:59 ` Tycho Andersen
2018-05-18 14:05 ` Christian Brauner
2018-05-18 15:10 ` Tycho Andersen
2018-05-17 15:12 ` [PATCH v2 4/4] seccomp: add support for passing fds via USER_NOTIF Tycho Andersen
2018-05-18 14:03 ` [PATCH v2 0/4] seccomp trap to userspace Christian Brauner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).