From: Julien Tinnes <jln@google.com>
To: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	David Drysdale <drysdale@google.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Paolo Bonzini <pbonzini@redhat.com>,
	LSM List <linux-security-module@vger.kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Paul Moore <paul@paul-moore.com>,
	James Morris <james.l.morris@oracle.com>,
	Linux API <linux-api@vger.kernel.org>,
	Meredydd Luff <meredydd@senatehouse.org>,
	Christoph Hellwig <hch@infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 11/11] seccomp: Add tgid and tid into seccomp_data
Date: Fri, 25 Jul 2014 11:24:51 -0700
Message-ID: <CAKyRK=gTVV+isc8=MpQ=x9xXj+wVsd7WtHzxHknZ=9WxT+c01g@mail.gmail.com>
In-Reply-To: <CAGXu5jLPrKA5LR-9=M6jAfPXYoztGzXPiaSiXgEcUE=+na73GA@mail.gmail.com>

On Fri, Jul 25, 2014 at 10:38 AM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Jul 25, 2014 at 10:18 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> [cc: Eric Biederman]
>>
>> On Fri, Jul 25, 2014 at 10:10 AM, Kees Cook <keescook@chromium.org> wrote:

>>> Julien had been wanting something like this too (though he'd suggested
>>> it via prctl): limit the signal functions to "self" only. I wonder if
>>> adding a prctl like done for O_BENEATH could work for signal sending?
>>>
>>
>>
>> Can we do one better and add a flag to prevent any non-self pid
>> lookups?  This might actually be easy on top of the pid namespace work
>> (e.g. we could change the way that find_task_by_vpid works).
>
> Ooh, that would be extremely interesting, yes. Kind of an extreme form
> of pid namespace without actually being a namespace.
>
>> It's far from just being signals.  There's access_process_vm, ptrace,
>> all the signal functions, clock_gettime (see CPUCLOCK_PID -- yes, this
>> is ridiculous), and probably some others that I've forgotten about or
>> never noticed in the first place.
>
> Yeah, that would be very interesting.

Yes, this would be incredibly useful.

1. For Chromium [1], I dislike relying on seccomp purely for
"access-control" (to other processes or files), because it's really
hard to think about everything (things like CPUCLOCK_PID bite,
see https://crbug.com/374479).
So we have a first layer of sandboxing (using PID + NET namespaces and
chroot) for "access-control" and a second layer for kernel attack
surface reduction and a few other things using seccomp-bpf.

The first layer isn't currently very good; it's heavyweight and
complex (you need an init(1) per namespace and that init cannot be
multi-purposed as a useful process because pid = 1 can never receive
signals). One PID namespace per process isn't something that scales
well. (Also before USER_NS it required a setuid root program).

2. Even with a safe pure seccomp-bpf sandbox that prevents sending
signals to other processes, ptrace() et al. and that restricts
clock_gettime(2) properly, things quickly become very tedious because
as far as the kernel is concerned, the process under this BPF program
can still pass ptrace_may_access() to other processes. This means, for
instance, that no matter what you do, a model where open() is allowed
can't work if /proc is available. We need a mode that says
"ptrace_may_access() will never pass".
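
To illustrate why clock_gettime(2) needs restricting at all: a CPU-time
clock for another process is named by a negative clockid that carries
the target pid in its upper bits. The sketch below reproduces that bit
layout (it follows the kernel's MAKE_PROCESS_CPUCLOCK() and
CPUCLOCK_PID() macros); the sketch_* names are made up for this
example, and it is not taken from the Chromium policy.

```c
#include <sys/types.h>

/* Clock type stored in the low bits; 2 corresponds to CPUCLOCK_SCHED. */
#define SKETCH_CPUCLOCK_SCHED 2

/* Build the clockid that names another process's CPU-time clock.
 * Same bit layout as the kernel's MAKE_PROCESS_CPUCLOCK(). */
static clockid_t sketch_make_process_cpuclock(pid_t pid)
{
	return (clockid_t)((~(unsigned int)pid << 3) | SKETCH_CPUCLOCK_SCHED);
}

/* Recover the pid, as the kernel's CPUCLOCK_PID() does.
 * Assumes arithmetic right shift of negative values (GCC and Clang). */
static pid_t sketch_cpuclock_pid(clockid_t id)
{
	return (pid_t)~(id >> 3);
}

/* The fixed clocks (CLOCK_REALTIME, CLOCK_MONOTONIC, ...) are all >= 0,
 * so a negative clockid is one that may refer to some other task. */
static int sketch_clockid_is_dynamic(clockid_t id)
{
	return id < 0;
}
```

A seccomp-bpf rule that only wants to allow "self" clocks can therefore
compare the first argument of clock_gettime() against the small set of
non-negative ids it needs and reject the rest.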

So yes, I really would like:
- a prctl that says: "I'm dropping privileges and I now can't interact
with other thread groups (via signals, ptrace, etc.)".
- Something to drop access to the file system. It could be an
unprivileged way to chroot() to an empty directory (unprivileged
namespaces work for that, except if you're already in a chroot).
This is a little tricky without allowing chroot escapes, so I suspect
we would want to express it in terms of mount namespace, or something
else, rather than chroot.

Then we have the primitives we need to build sandboxes in a simple
way and we can add seccomp-bpf on top to do things such as open()
hooking (via SECCOMP_RET_TRAP) and to restrict the kernel attack
surface.
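
As a rough illustration of the open() hooking mentioned above, here is
a minimal seccomp-bpf sketch that returns SECCOMP_RET_TRAP for
openat() and allows everything else. The sketch_* names and the policy
itself are assumptions made for this example, not Chromium's filter.
It matches openat() because that is what a libc open() wrapper
typically issues on current systems, and it takes the AUDIT_ARCH_*
value as a parameter rather than hard-coding one architecture.

```c
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

#define SKETCH_FILTER_LEN 7

/* Fill `out` with a hypothetical policy: kill on a foreign syscall ABI,
 * raise SIGSYS on openat(), allow every other syscall. */
static void sketch_build_open_trap(struct sock_filter out[SKETCH_FILTER_LEN],
				   unsigned int audit_arch)
{
	const struct sock_filter prog[SKETCH_FILTER_LEN] = {
		/* Check the ABI first; syscall numbers differ per arch. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, arch)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, audit_arch, 1, 0),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
		/* Load the syscall number and single out openat(). */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_openat, 0, 1),
		/* SIGSYS goes to a user-space handler that can broker the open. */
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	size_t i;

	for (i = 0; i < SKETCH_FILTER_LEN; i++)
		out[i] = prog[i];
}

/* Install the filter on the calling thread; returns 0 on success. */
static int sketch_install_open_trap(unsigned int audit_arch)
{
	struct sock_filter insns[SKETCH_FILTER_LEN];
	struct sock_fprog prog = { .len = SKETCH_FILTER_LEN, .filter = insns };

	sketch_build_open_trap(insns, audit_arch);
	/* Needed so an unprivileged process may attach a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

A process using this would install a SIGSYS handler before calling
sketch_install_open_trap(), and that handler would decide what to do
with each trapped open.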

Julien

[1] https://code.google.com/p/chromium/wiki/LinuxSandboxing
