From: Will Drewry <wad@chromium.org>
To: Indan Zupancic <indan@nul.nu>
Cc: linux-kernel@vger.kernel.org, keescook@chromium.org,
john.johansen@canonical.com, serge.hallyn@canonical.com,
coreyb@linux.vnet.ibm.com, pmoore@redhat.com, eparis@redhat.com,
djm@mindrot.org, torvalds@linux-foundation.org,
segoon@openwall.com, rostedt@goodmis.org, jmorris@namei.org,
scarybeasts@gmail.com, avi@redhat.com, penberg@cs.helsinki.fi,
viro@zeniv.linux.org.uk, luto@mit.edu, mingo@elte.hu,
akpm@linux-foundation.org, khilman@ti.com,
borislav.petkov@amd.com, amwang@redhat.com, oleg@redhat.com,
ak@linux.intel.com, eric.dumazet@gmail.com, gregkh@suse.de,
dhowells@redhat.com, daniel.lezcano@free.fr,
linux-fsdevel@vger.kernel.org,
linux-security-module@vger.kernel.org, olofj@chromium.org,
mhalcrow@google.com, dlaor@redhat.com, corbet@lwn.net,
alan@lxorguk.ukuu.org.uk, mcgrathr@chromium.org
Subject: Re: [PATCH v5 2/3] seccomp_filters: system call filtering using BPF
Date: Tue, 31 Jan 2012 08:18:51 -0700 [thread overview]
Message-ID: <CABqD9ha9g-8zE6bTt=67LFCbt-y3nGvOOatZoDbFxn=mM0Fy_g@mail.gmail.com> (raw)
In-Reply-To: <CABqD9hYpiDtCM5gPgjjHOJ6DcFs-HHUORAGi+AMFs1+9EAk1UA@mail.gmail.com>
On Tue, Jan 31, 2012 at 4:04 AM, Will Drewry <wad@chromium.org> wrote:
> On Mon, Jan 30, 2012 at 7:42 PM, Indan Zupancic <indan@nul.nu> wrote:
>> On Mon, January 30, 2012 23:26, Will Drewry wrote:
>>> Do you think something along the lines of 2 kB is sane for a config-less change?
>>
>> Yes, especially if there is some way to get rid of it anyway,
>> like disabling SECCOMP or some option under CONFIG_EMBEDDED.
>> But it seems you need at least a hidden config option which
>> depends on the stuff you need.
>
> Disabling SECCOMP would definitely do it.
>
>>>
>>> Doing exactly that. I've been tinkering with the best way to minimize
>>> the impact to the existing BPF evaluator. Right now, I'm adding a
>>> very small number of new instructions to the sk_buff specific code
>>> path, but I haven't yet benchmarked - just disasssembled.
>>
>> I would do all the checking in sk_chk_filter(), so you know that when
>> you do run the filter, you can't hit the sk_buff paths. This doesn't
>> cause any slow down for the networking path.
>
> Ah sorry - I was referring to the intrusion of a load_pointer function
> pointer. I want to leave the current networking path as untouched as
> possible. For checking, I agree -- a quick change to sk_chk_filter or
> even just a small helper function that scans for any codes in the
> ancillary range will do the trick.
I see now. I can do all the fixup in sk_chk_filter. Clever! Sorry for
not catching on faster :)
>
>>> I agree. I will post the next series with a proposed integration. If
>>> there is a lot of resistance, then the difference will be going from a
>>> 2kB changes to a 3kB change.
>>
>> Let's see how it goes.
>>
>>>> I think you should go on a quest to make sure (almost) all archs have that file,
>>>> before this patch can be merged. At least the archs that have ptrace support.
>>>
>>> I'm an idiot. CONFIG_HAVE_ARCH_TRACEHOOK covers asm/syscall.h
>>>
>>> So I have two choices:
>>> 1. allow seccomp with filtering on these platforms by fail if an
>>> argument is accessed
>>> 2. return ENOSYS when a filter is attempted to be installed on
>>> platforms with no tracehook support.
>>
>> I vote for:
>>
>> 3. Add tracehook support to all archs.
>
> I don't see these #3 as mutually exclusive :) tracehook requires:
> - task_pt_regs() in asm/processor.h or asm/ptrace.h
> - arch_has_single_step() if there is hardware single-step support
> - arch_has_block_step() if there is hardware block-step support
> - asm/syscall.h supplying asm-generic/syscall.h interface
> - linux/regset.h user_regset interfaces
> - CORE_DUMP_USE_REGSET #define'd in linux/elf.h
> -TIF_SYSCALL_TRACE calls tracehook_report_syscall_{entry,exit}
> - TIF_NOTIFY_RESUME calls tracehook_notify_resume()
> - signal delivery calls tracehook_signal_handler()
>
>> Maybe not all archs, but at least some more. That way, every time someone
>> adds something tracehook specific, more archs support it.
>
> Well the other arch I want this on specifically for my purposes is
> arm, but someone recently posted a partial asm/syscall.h for arm, but
> I didn't see that one go anywhere just yet. (I know syscall_get_nr
> can be tricky on arm because it doesn't (didn't) have a way of
> tracking in-syscall state.)
>
> ref: https://lkml.org/lkml/2011/12/1/131
>
>> syscall.h has no TRACEHOOK defines or anything though.
>
> Nope - it is just part of what is expected.
>
>> Only syscall_rollback() looks tricky. I have no clue what the difference
>> between syscall_get_error() and syscall_get_return_value() is. But you
>> only need to add syscall_get_nr() and syscall_[gs]et_arguments(), which
>> should be possible for all archs.
>
> It seems even syscall_get_nr can have some wrinkles :)
>
> ref: http://lkml.indiana.edu/hypermail/linux/kernel/0906.3/00096.html
>
>> How many archs don't support tracehook?
>
> 14 out of 26. However, 5 of those still have asm/syscall.h
>
>>> I think #2 is the nicest user contract, but #1 allows working filters
>>> even on less hospitable arches even if they can't user arguments
>>> (yet). I'm coding it up as #2, but it is easy to switch to #1.
>>
>> If you don't support the arch, don't compile any code at all, and
>> let prctl(2) return EINVAL. You don't want to return ENOSYS.
>
> I was thinking of the inline prctl handler, but EINVAL makes sense.
>
>>>> Yeah, I figured that out later on. It's quite nifty, but I find the recursion
>>>> within kref_put() slightly worrisome. Perhaps the code would be cleaner if this
>>>> was avoided and done differently, but I can't think of a good alternative. I'll
>>>> wait for the new version to see if I can find a way.
>>>
>>> Thanks - sure. Since kref_put is just an atomic_dec_and_test followed
>>> by a call to the function pointer, I wasn't too concerned. Without
>>> changing how the relationships are handled, I'm not sure how to
>>> approach it differently (and still avoid races). IIRC, this isn't much
>>> different to how namespaces work, they just use their own atomic
>>> counters.
>>
>> Well, the thing is, this recursion is controlled by user space depending
>> on how many filters they have installed. What is preventing them to force
>> you out of stack?
>
> Hrm true. The easiest option is to just convert it to iterative by
> not using kref_t, but I'll look more closely.
>
>> So perhaps add at least some arbitrary filter limit to avoid this?
>
> Definitely possible -- probably as a sysctl. I'm not quite sure what
> number makes sense yet, but I'll look at breaking the recursion first.
> Thanks!
>
>>> I'll clarify a bit. My original ptrace integration worked such that a
>>> tracer may only intercept syscall failures if it attached prior to the
>>> failing filter being installed. I did it this way to avoid using
>>> ptrace to escape system call filtering. However, since I don't have
>>> that as part of the patch series, it doesn't make sense to keep it. (I
>>> tracked a tracer pid_struct in the filters.) If it needs to come back
>>> with later patchsets, then it can be tackled then!
>>
>> The problem of that is that filters can be shared between processes with
>> different ptracers. And you have all the hassle of keeping it up to date.
>>
>> I think seccomp should always come first and just trust ptrace. This
>> because it can deny all ptrace() calls for filtered tasks, so the only
>> untrusted tasks doing ptrace() are outside of seccomp's filtering control.
>> And those could do the same system calls themselves.
>>
>> The case where there is one task being filtered and allowed to do ptrace,
>> but not some other syscall, ptracing another filtered task which isn't
>> allowed to do ptrace, but allowed to do that other syscall, is quite far
>> fetched I think. If you really want to handle this, then you could run
>> the ptracer's filters against the tracee's post-ptrace syscall state.
>> This is best done in the ptracer's context, just before continuing the
>> system call. (You really want Oleg's SIKILL immediate patch then.)
>>
>> What about:
>>
>> 1) Seccomp filters can deny a syscall by killing the task.
>>
>> 2) Seccomp can deny a syscall and let it return an error value.
>>
>> I know you're not fond of this one. It's just a performance
>> optimisation as sometimes a lot of denied but harmless syscalls
>> are called by glibc all the time, like getxattr(). Hardcoding
>> the kill policy seems wrong when it can be avoided. If this is
>> too hard then skip it, but if it's easy to add then please do.
>> I'll take a look at this with your next patch version.
>
> It's easy on x86 harder on other arches. I would suggest saving
> changing the __secure_computing signature until after the core
> functionality lands, but that's just me.
>
>> 3) Seccomp can allow a syscall to proceed normally.
>>
>> 4) Seccomp can set a hint to skip ptrace syscall events for this syscall.
>> A filter sets this by returning a specific value.
>>
>> 5) Ptrace always gets a syscall event when it asked for it.
>>
>> 6) Ptrace can set an option to honour seccomp's hint and to not get all
>> syscall events.
>>
>> This way all seccomp needs to do is to set some flags which ptrace can check.
>
> I like the use of flags/options to trigger ptrace handling. If I were
> to stack rank these for pursuit after the core functionality lands,
> it'd be to add #6 (and its deps) then #2. With #6, #2 can be
> simulated (by having a supervisor that changes the syscall number to
> -1), but that is much less ideal than just returning SECCOMP_ERROR
> instead of SECCOMP_ALLOW/DENY and letting an error code get bubbled
> up.
>
> thanks!
> will
next prev parent reply other threads:[~2012-01-31 15:18 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-01-27 23:24 [PATCH v5 1/3] seccomp: kill the seccomp_t typedef Will Drewry
2012-01-27 23:24 ` [PATCH v5 2/3] seccomp_filters: system call filtering using BPF Will Drewry
2012-01-28 11:21 ` Cong Wang
2012-01-28 20:13 ` Will Drewry
2012-01-28 16:28 ` Andrew Lutomirski
2012-01-28 19:50 ` Will Drewry
2012-01-29 4:39 ` Indan Zupancic
2012-01-29 20:27 ` Will Drewry
2012-01-30 2:37 ` Indan Zupancic
2012-01-30 22:26 ` Will Drewry
2012-01-30 22:29 ` Andrew Lutomirski
2012-01-30 22:42 ` Will Drewry
2012-01-30 22:59 ` Indan Zupancic
2012-01-31 1:42 ` Indan Zupancic
2012-01-31 11:04 ` Will Drewry
2012-01-31 15:18 ` Will Drewry [this message]
2012-02-01 1:36 ` Indan Zupancic
2012-02-01 9:02 ` Will Drewry
2012-02-01 10:56 ` Indan Zupancic
2012-02-01 17:34 ` Roland McGrath
2012-01-27 23:24 ` [PATCH v5 3/3] Documentation: prctl/seccomp_filter Will Drewry
2012-01-27 23:41 ` [PATCH v5 1/3] seccomp: kill the seccomp_t typedef Greg KH
2012-01-27 23:47 ` Will Drewry
2012-01-28 0:38 ` Steven Rostedt
2012-01-28 11:13 ` Cong Wang
2012-01-28 21:06 ` Will Drewry
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CABqD9ha9g-8zE6bTt=67LFCbt-y3nGvOOatZoDbFxn=mM0Fy_g@mail.gmail.com' \
--to=wad@chromium.org \
--cc=ak@linux.intel.com \
--cc=akpm@linux-foundation.org \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=amwang@redhat.com \
--cc=avi@redhat.com \
--cc=borislav.petkov@amd.com \
--cc=corbet@lwn.net \
--cc=coreyb@linux.vnet.ibm.com \
--cc=daniel.lezcano@free.fr \
--cc=dhowells@redhat.com \
--cc=djm@mindrot.org \
--cc=dlaor@redhat.com \
--cc=eparis@redhat.com \
--cc=eric.dumazet@gmail.com \
--cc=gregkh@suse.de \
--cc=indan@nul.nu \
--cc=jmorris@namei.org \
--cc=john.johansen@canonical.com \
--cc=keescook@chromium.org \
--cc=khilman@ti.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-security-module@vger.kernel.org \
--cc=luto@mit.edu \
--cc=mcgrathr@chromium.org \
--cc=mhalcrow@google.com \
--cc=mingo@elte.hu \
--cc=oleg@redhat.com \
--cc=olofj@chromium.org \
--cc=penberg@cs.helsinki.fi \
--cc=pmoore@redhat.com \
--cc=rostedt@goodmis.org \
--cc=scarybeasts@gmail.com \
--cc=segoon@openwall.com \
--cc=serge.hallyn@canonical.com \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).