From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751116AbbIJVzf (ORCPT ); Thu, 10 Sep 2015 17:55:35 -0400
Received: from mail-io0-f178.google.com ([209.85.223.178]:36033 "EHLO mail-io0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750950AbbIJVze (ORCPT ); Thu, 10 Sep 2015 17:55:34 -0400
MIME-Version: 1.0
In-Reply-To:
References:
Date: Thu, 10 Sep 2015 21:55:33 +0000
Message-ID:
Subject: Re: eBPF / seccomp globals?
From: Michael Tirado
To: Kees Cook
Cc: LKML
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 4, 2015 at 8:37 PM, Kees Cook wrote:
>
> Do you still need file capabilities with the availability of the new
> ambient capabilities?
>
> https://s3hh.wordpress.com/2015/07/25/ambient-capabilities/
> http://thread.gmane.org/gmane.linux.kernel.lsm/24034

Ah.. thanks for the info on this, my launcher program could use ambient
capabilities if whoever invoked it already has that capability. I am
trying to have the new environment explicitly defined as a white list,
and avoid any type of privilege escalation not already granted by root
user either by filesystem mechanisms (setuid / file caps) or inheritable
caps. I would still like to be able to launch programs with file
capabilities since we can lock those down with capability bounding set,
and maybe even setuid binaries too (with a hefty warning message). This
rules out LD_PRELOAD for me, and also some linkers may not support it at
all.

> On the TODO list is
> doing deep argument inspection, but it is not an easy thing to get
> right. :)

Yes, please do not rush such a thing!! It might even be a can of worms
not worth opening.

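For readers unfamiliar with the capability mechanics mentioned above, here is a minimal sketch of how a launcher might query the bounding set and raise an ambient capability. It is an illustration only, not code from the launcher discussed in this thread: the helper names cap_in_bounding_set() and raise_ambient_cap() are hypothetical, and the fallback constants assume the values used by Linux 4.3 and later.

```c
#include <assert.h>
#include <linux/capability.h>
#include <sys/prctl.h>

/* Fallbacks for headers that predate ambient capabilities (Linux 4.3). */
#ifndef PR_CAP_AMBIENT
#define PR_CAP_AMBIENT 47
#endif
#ifndef PR_CAP_AMBIENT_RAISE
#define PR_CAP_AMBIENT_RAISE 2
#endif

/*
 * Hypothetical helper: returns 1 if cap is in the calling thread's
 * bounding set, 0 if it has been dropped, and -1 (errno set) if the
 * kernel rejects the capability number.
 */
int cap_in_bounding_set(int cap)
{
	int r = prctl(PR_CAPBSET_READ, (unsigned long)cap, 0UL, 0UL, 0UL);

	/* PR_CAPBSET_READ has no other documented return values. */
	assert(r == -1 || r == 0 || r == 1);
	return r;
}

/*
 * Hypothetical helper: raise cap into the ambient set so that it is
 * kept across execve() of a binary without file capabilities.
 * Returns 0 on success and -1 on failure; the kernel refuses (EPERM)
 * unless the caller already holds cap in both its permitted and
 * inheritable sets.
 */
int raise_ambient_cap(int cap)
{
	return prctl(PR_CAP_AMBIENT, (unsigned long)PR_CAP_AMBIENT_RAISE,
		     (unsigned long)cap, 0UL, 0UL);
}
```

A launcher could call cap_in_bounding_set(CAP_NET_BIND_SERVICE) before deciding whether raise_ambient_cap() is worth attempting. Shrinking the bounding set itself (PR_CAPBSET_DROP) requires CAP_SETPCAP, so it is left out of this sketch.
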
In case anyone is wondering what I am doing for-now(tm) while waiting for
eBPF map support, or some other way to deal with this problem: I have
crafted a very hacky patch to work around the issue that will allow 2
system calls to pass through before the filter program is run. I'm lazily
using google webmail so, sorry if the tabs are missing :(

From: Michael R. Tirado
Date: Thu, 10 Sep 2015 08:28:41 +0000
Subject: [PATCH] Add new seccomp filter mode + flag to allow two syscalls to pass before the filter is run.

allows a launcher program to setuid(drop caps) and exec if those two
privileges are not granted in seccomp filter whitelist.

DISCLAIMER: I am doing this as a quick temporary workaround to this
complex problem. Also, there may be a more efficient way to implement
it instead of branching in the filter loop.
---
 include/linux/seccomp.h      |  2 +-
 include/uapi/linux/seccomp.h |  2 ++
 kernel/seccomp.c             | 23 ++++++++++++++++++++---
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index a19ddac..5547448c 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -3,7 +3,7 @@
 
 #include
 
-#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC)
+#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_DEFERRED)
 
 #ifdef CONFIG_SECCOMP
 
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 0f238a4..43a8fb8 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -9,6 +9,7 @@
 #define SECCOMP_MODE_DISABLED 0 /* seccomp is not in use. */
 #define SECCOMP_MODE_STRICT 1 /* uses hard-coded filter. */
 #define SECCOMP_MODE_FILTER 2 /* uses user-supplied filter. */
+#define SECCOMP_MODE_FILTER_DEFERRED 3 /* sets filter mode + deferred flag */
 
 /* Valid operations for seccomp syscall. */
 #define SECCOMP_SET_MODE_STRICT 0
@@ -16,6 +17,7 @@
 
 /* Valid flags for SECCOMP_SET_MODE_FILTER */
 #define SECCOMP_FILTER_FLAG_TSYNC 1
+#define SECCOMP_FILTER_FLAG_DEFERRED 2 /* grant two unfiltered syscalls */
 
 /*
  * All BPF programs must return a 32-bit value.
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 245df6b..dc2a5af 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -58,6 +58,7 @@ struct seccomp_filter {
 	atomic_t usage;
 	struct seccomp_filter *prev;
 	struct bpf_prog *prog;
+	unsigned int deferred;
 };
 
 /* Limit any path through the tree to 256KB worth of instructions. */
@@ -196,7 +197,12 @@ static u32 seccomp_run_filters(struct seccomp_data *sd)
 	 * value always takes priority (ignoring the DATA).
 	 */
 	for (; f; f = f->prev) {
-		u32 cur_ret = BPF_PROG_RUN(f->prog, (void *)sd);
+		u32 cur_ret;
+		if (unlikely(f->deferred)) {
+			--f->deferred;
+			continue;
+		}
+		cur_ret = BPF_PROG_RUN(f->prog, (void *)sd);
 
 		if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
 			ret = cur_ret;
@@ -444,6 +450,14 @@ static long seccomp_attach_filter(unsigned int flags,
 	}
 
 	/*
+	 * in certain cases we may wish to defer filtering, and allow some
+	 * syscalls. eg, a launcher program will setuid(drop caps) then exec.
+	 */
+	if (flags & SECCOMP_FILTER_FLAG_DEFERRED) {
+		filter->deferred = 2;
+	}
+
+	/*
 	 * If there is an existing filter, make it the prev and don't drop its
 	 * task reference.
 	 */
@@ -838,6 +852,7 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 {
 	unsigned int op;
 	char __user *uargs;
+	unsigned int flags = 0;
 
 	switch (seccomp_mode) {
 	case SECCOMP_MODE_STRICT:
@@ -849,6 +864,9 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 		 */
 		uargs = NULL;
 		break;
+	/* set flag, older kernels lack seccomp syscall */
+	case SECCOMP_MODE_FILTER_DEFERRED:
+		flags = SECCOMP_FILTER_FLAG_DEFERRED;
 	case SECCOMP_MODE_FILTER:
 		op = SECCOMP_SET_MODE_FILTER;
 		uargs = filter;
@@ -857,6 +875,5 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 		return -EINVAL;
 	}
 
-	/* prctl interface doesn't have flags, so they are always zero. */
-	return do_seccomp(op, 0, uargs);
+	return do_seccomp(op, flags, uargs);
 }
--
1.8.4
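
The effect of the deferred counter in the filter loop can be tried out in user space. The sketch below is a simplified model, not kernel code: struct model_filter and model_run_filters() are hypothetical stand-ins for struct seccomp_filter and seccomp_run_filters(), and a fixed verdict field takes the place of running the BPF program. The SECCOMP_RET_* values are copied from include/uapi/linux/seccomp.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Action values as defined in include/uapi/linux/seccomp.h. */
#define SECCOMP_RET_KILL   0x00000000U
#define SECCOMP_RET_ALLOW  0x7fff0000U
#define SECCOMP_RET_ACTION 0x7fff0000U

/* Hypothetical stand-in for struct seccomp_filter. */
struct model_filter {
	struct model_filter *prev;	/* older filter in the stack */
	unsigned int deferred;		/* invocations still to be skipped */
	uint32_t verdict;		/* replaces the BPF program's result */
};

/*
 * Model of the patched loop: walk the filter stack once per syscall.
 * A filter with a non-zero deferred count is skipped and its counter
 * decremented; otherwise the most restrictive (lowest) action wins.
 */
uint32_t model_run_filters(struct model_filter *f)
{
	uint32_t ret = SECCOMP_RET_ALLOW;

	assert(f != NULL);	/* the model expects at least one filter */

	for (; f; f = f->prev) {
		uint32_t cur_ret;

		if (f->deferred) {
			--f->deferred;
			continue;
		}
		cur_ret = f->verdict;
		if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
			ret = cur_ret;
	}
	return ret;
}
```

With a single filter set up as { NULL, 2, SECCOMP_RET_KILL }, the first two calls to model_run_filters() return SECCOMP_RET_ALLOW and the third returns SECCOMP_RET_KILL. Note that in this model, as in the loop in the patch, the counter tracks filter invocations rather than particular system calls, so any two syscalls consume the allowance, and each filter in the stack carries its own counter.
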