Kernel-hardening Archive on lore.kernel.org
 help / color / Atom feed
From: Kees Cook <keescook@chromium.org>
To: "Mickaël Salaün" <mic@digikod.net>
Cc: linux-security-module <linux-security-module@vger.kernel.org>,
	Andreas Gruenbacher <agruenba@redhat.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Casey Schaufler <casey@schaufler-ca.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	David Drysdale <drysdale@google.com>,
	Eric Paris <eparis@redhat.com>,
	James Morris <james.l.morris@oracle.com>,
	Jeff Dike <jdike@addtoit.com>, Julien Tinnes <jln@google.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Paul Moore <pmoore@redhat.com>,
	Richard Weinberger <richard@nod.at>,
	"Serge E . Hallyn" <serge@hallyn.com>,
	Stephen Smalley <sds@tycho.nsa.gov>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Will Drewry <wad@chromium.org>,
	Linux API <linux-api@vger.kernel.org>,
	"kernel-hardening@lists.openwall.com"
	<kernel-hardening@lists.openwall.com>
Subject: [kernel-hardening] Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing
Date: Tue, 26 Apr 2016 15:46:00 -0700
Message-ID: <CAGXu5jJEQ0gm=bZ+c3ttxsz3WFU6xWPcpcpdUH0ttQTwwCfnqA@mail.gmail.com> (raw)
In-Reply-To: <5717C897.1050508@digikod.net>

On Wed, Apr 20, 2016 at 11:21 AM, Mickaël Salaün <mic@digikod.net> wrote:
> Hi,
>
> Does anyone had time to review some patches?

Hi! Sorry for the delay on this. I keep getting distracted by other
stuff. I've got some time on a plane tomorrow, so I'll bring your
series along and spend some time reading through it more carefully.

-Kees

>
> What do you think about the ToCToU workarounds?
> What about the userland API?
>
> The series can be found here: https://github.com/l0kod/linux/commits/seccomp-object-v1
>
>  Mickaël
>
>
> On 24/03/2016 02:46, Mickaël Salaün wrote:
>> Hi,
>>
>> This series is a proof of concept (not ready for production) to extend seccomp
>> with the ability to check argument pointers of syscalls as kernel object (e.g.
>> file path). This add a needed feature to create a full sandbox managed by
>> userland like the Seatbelt/XNU Sandbox or the OpenBSD Pledge. It was initially
>> inspired from a partial seccomp-LSM prototype [1] but has evolved a lot since :)
>>
>> The audience for this RFC is limited to security-related actors to discuss
>> about this new feature before enlarging the scope to a wider audience. This
>> aims to focus on the security goal, usability and architecture before entering
>> into the gory details of each subsystem. I also wish to get constructive
>> criticisms about the userland API and intrusiveness of the code (and what could
>> be the other ways to do it better) before going further (and addressing the
>> TODO and FIXME in the code).
>>
>> The approach taken is to add the minimum amount of code while still allowing
>> the userland to create access rules via seccomp. The current limitation of
>> seccomp is to get raw syscall arguments value but there is no way to
>> dereference a pointer to check its content (e.g. the first argument of the open
>> syscall). This seccomp evolution brings a generic way to check against argument
>> pointer regardless from the syscall unlike current LSMs.
>>
>> Here is the use case scenario:
>> * First, a process must load some groups of seccomp checkers. This checkers are
>>   dedicated structs describing a pointed data (e.g. path). They are
>>   semantically grouped to be efficiently managed and checked in batch. Each
>>   group have a static ID. This IDs are unique and they reference groups only
>>   accessible from the filters created by the same process.
>> * The loaded checkers are inherited and accessible by the newly created
>>   filters. This groups can be referenced by filters with a new return value
>>   SECCOMP_RET_ARGEVAL. Value in  SECCOMP_RET_DATA contains a group ID and an
>>   argument bitmask. This return value is only meaningful between stacked
>>   filters to ask a check and get the result in the extended struct
>>   seccomp_data. The new fields are "is_valid_syscall", "arg_group" containing a
>>   group ID and "matches[6]" consisting of one 64-bits mask per argument. This
>>   bitmasks are useful to get the check result of each checker from a group on a
>>   syscall argument which is handy to create a custom access control engine from
>>   userland.
>> * SECCOMP_RET_ARGEVAL is equivalent to SECCOMP_RET_ACCESS except that the
>>   following filters can take a decision regarding a match (e.g. return EACCESS
>>   or emulate the syscall).
>>
>> Each checker is autonomous and new ones can easily be added in the future.
>> There is currently two checkers for path objects:
>> * SECCOMP_CHECK_FS_LITERAL checks if a string match a defined path;
>> * SECCOMP_CHECK_FS_BENEATH checks if the path representation of a string is
>>   equal or equivalent to a file belonging to a defined path.
>>
>> This design does not seems too intrusive but is flexible enough to allow a
>> powerful sandbox mechanism accessible by any process on Linux. The use of
>> seccomp, including this new feature, is more suitable with the help of a
>> userland library (e.g. libseccomp) that could help to specify a high-level
>> language to express a security policy instead of raw syscall rules.
>>
>> The main concern should be about time-of-check-time-of-use (TOCTOU) race
>> conditions attacks. Because of the nature of seccomp (executed before the
>> effective syscall and before a potential ptrace), it is not possible to block
>> all races but to detect them.
>>
>> There is still some questions I couldn't answer for sure (grep for FIXME or
>> XXX). Comments appreciated.
>>
>> Tested on the x86 and UM architectures in 32 and 64 bits (with audit enabled).
>>
>> [1] https://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=seccomp/lsm
>>
>>
>> # Need for LSM
>>
>> Because the arguments can be checked before the syscall actually evaluate them,
>> there is two race condition classes:
>> * The data pointed by the user address is in control of the userland (e.g. a
>>   tracing process) and is so subject to TOCTOU race conditions between the
>>   seccomp filter evaluation and the effective resource grabbing (part of each
>>   syscall code).
>> * The semantic of the pointed data is also subject to race condition because
>>   there is no lock on the resource (e.g. file) between the evaluation of the
>>   argument by the seccomp filter and the use of the pointed resource by each
>>   part of the syscall code.
>>
>> The solution to fix these race conditions is to copy the userspace data and to
>> lock the pointed resource. Whereas it is easy to copy the userspace data, it is
>> not realistic to lock any pointed resources because of obvious locking issues.
>> However, it is possible to detect a TOCTOU race condition with the help of LSM
>> hooks. This way, we can keep a flexible access control (e.g. by controlling
>> syscall return values) while blocking unattended malicious or bogus userland
>> behavior (e.g. exploit a race-condition).
>>
>> To be able to deny access to a malicious userland behavior we must replay the
>> seccomp filters and verify the intermediate return values to find out if the
>> filters policy is still respected. Thanks to a cache we can detect if a check
>> replay is necessary. Otherwise, the LSM hooks are really quick for
>> non-malicious userland.
>>
>> # Cache handling
>>
>> Each time a checker is called, for each argument to check, it get them from
>> it's seccomp_argeval_checked cache if any, or create a new cache entry and put
>> it otherwise. This cache entries will be used to evaluate arguments.
>>
>> When rechecking in the LSM hooks, first it find out which argument is mapped to
>> the hook check and find if it differ from the corresponding cache entry. If it
>> match, then return OK without replaying the checks, or if nothing match, replay
>> all the checks from this check type.
>>
>> # How to use it
>>
>> The SECCOMP_ARGFLAG_* help to narrow the rules constraints:
>> * SECCOMP_ARGFLAG_FS_DENTRY: Check and rely on the path name.
>> * SECCOMP_ARGFLAG_FS_INODE: Check the data "container" whatever it's path name.
>> * SECCOMP_ARGFLAG_FS_DEVICE: Check the device (i.e. file system) on which the
>>   file is, e.g. it can be use to allow access to USB mass-storage or dm-verity
>>   content only
>> * SECCOMP_ARGFLAG_FS_MOUNT: Check the file mount point, e.g. can enforce a
>>   read-only bind mount (but is less flexible than the other checks)
>> * SECCOMP_ARGFLAG_FS_NOFOLLOW: Check the file without following it if it is a
>>   symlink. Useful for rename(2) or open(2) with O_NOFOLLOW to have consistent
>>   check. However, LSM hooks will deny all unattended accesses set by the rules
>>   ignoring this flag (i.e. it act as a fail-safe).
>>
>> # Limitations
>>
>> ## Ptrace
>> If a process can ptrace another one, the tracer can execute whatever syscall it
>> wants without being constrained by any seccomp filter from the tracee. This
>> apply for this seccomp extension as well. Any seccomp filter should then deny
>> the use of ptrace.
>>
>> The LSM hooks must ensure that the filters results are the same (with the same
>> arguments) but must not deny any ptraced modifications (e.g. syscall argument
>> change).
>>
>> ## Stateless access
>> Unlike current LSMs, the policies are stateless. It's not possible to mark and
>> track a kernel object (e.g. file descriptor). Capsicum seems more appropriate
>> for this kind of feature.
>>
>> ## Resource usage
>> We must limit the resources taken by a filter list, and so the number of rules,
>> to not allow any process to exhaust the system.
>>
>>
>> Regards,
>>
>> Mickaël Salaün (17):
>>   um: Export the sys_call_table
>>   seccomp: Fix typo
>>   selftest/seccomp: Fix the flag name SECCOMP_FILTER_FLAG_TSYNC
>>   selftest/seccomp: Fix the seccomp(2) signature
>>   security/seccomp: Add LSM and create arrays of syscall metadata
>>   seccomp: Add the SECCOMP_ADD_CHECKER_GROUP command
>>   seccomp: Add seccomp object checker evaluation
>>   selftest/seccomp: Remove unknown_ret_is_kill_above_allow test
>>   selftest/seccomp: Extend seccomp_data until matches[6]
>>   selftest/seccomp: Add field_is_valid_syscall test
>>   selftest/seccomp: Add argeval_open_whitelist test
>>   audit,seccomp: Extend audit with seccomp state
>>   selftest/seccomp: Rename TRACE_poke to TRACE_poke_sys_read
>>   selftest/seccomp: Make tracer_poke() more generic
>>   selftest/seccomp: Add argeval_toctou_argument test
>>   security/seccomp: Protect against filesystem TOCTOU
>>   selftest/seccomp: Add argeval_toctou_filesystem test
>>
>>  arch/x86/um/asm/syscall.h                     |   2 +
>>  include/asm-generic/vmlinux.lds.h             |  22 +
>>  include/linux/audit.h                         |  25 ++
>>  include/linux/compat.h                        |  10 +
>>  include/linux/lsm_hooks.h                     |   5 +
>>  include/linux/seccomp.h                       | 136 +++++-
>>  include/linux/syscalls.h                      |  68 +++
>>  include/uapi/linux/seccomp.h                  | 105 +++++
>>  kernel/audit.h                                |   3 +
>>  kernel/auditsc.c                              |  36 +-
>>  kernel/fork.c                                 |  13 +-
>>  kernel/seccomp.c                              | 594 +++++++++++++++++++++++++-
>>  security/Kconfig                              |   1 +
>>  security/Makefile                             |   2 +
>>  security/seccomp/Kconfig                      |  14 +
>>  security/seccomp/Makefile                     |   3 +
>>  security/seccomp/checker_fs.c                 | 524 +++++++++++++++++++++++
>>  security/seccomp/checker_fs.h                 |  18 +
>>  security/seccomp/lsm.c                        | 135 ++++++
>>  security/seccomp/lsm.h                        |  19 +
>>  security/security.c                           |   1 +
>>  tools/testing/selftests/seccomp/seccomp_bpf.c | 572 +++++++++++++++++++++++--
>>  22 files changed, 2248 insertions(+), 60 deletions(-)
>>  create mode 100644 security/seccomp/Kconfig
>>  create mode 100644 security/seccomp/Makefile
>>  create mode 100644 security/seccomp/checker_fs.c
>>  create mode 100644 security/seccomp/checker_fs.h
>>  create mode 100644 security/seccomp/lsm.c
>>  create mode 100644 security/seccomp/lsm.h
>>
>



-- 
Kees Cook
Chrome OS & Brillo Security

  reply index

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-24  1:46 [kernel-hardening] " Mickaël Salaün
2016-03-24  1:46 ` [kernel-hardening] [RFC v1 01/17] um: Export the sys_call_table Mickaël Salaün
2016-03-24  1:46 ` [kernel-hardening] [RFC v1 02/17] seccomp: Fix typo Mickaël Salaün
2016-03-24  1:46 ` [kernel-hardening] [RFC v1 03/17] selftest/seccomp: Fix the flag name SECCOMP_FILTER_FLAG_TSYNC Mickaël Salaün
2016-03-24  4:35   ` [kernel-hardening] " Kees Cook
2016-03-29 15:35     ` Shuah Khan
2016-03-29 18:46       ` [kernel-hardening] [PATCH 1/2] " Mickaël Salaün
2016-03-29 19:06         ` [kernel-hardening] " Shuah Khan
2016-03-24  1:46 ` [kernel-hardening] [RFC v1 04/17] selftest/seccomp: Fix the seccomp(2) signature Mickaël Salaün
2016-03-24  4:36   ` [kernel-hardening] " Kees Cook
2016-03-29 15:38     ` Shuah Khan
2016-03-29 18:51       ` [kernel-hardening] [PATCH 2/2] " Mickaël Salaün
2016-03-29 19:07         ` [kernel-hardening] " Shuah Khan
2016-03-24  1:46 ` [kernel-hardening] [RFC v1 05/17] security/seccomp: Add LSM and create arrays of syscall metadata Mickaël Salaün
2016-03-24 15:47   ` [kernel-hardening] " Casey Schaufler
2016-03-24 16:01   ` Casey Schaufler
2016-03-24 21:31     ` Mickaël Salaün
2016-03-24  1:46 ` [kernel-hardening] [RFC v1 06/17] seccomp: Add the SECCOMP_ADD_CHECKER_GROUP command Mickaël Salaün
2016-03-24  1:46 ` [kernel-hardening] [RFC v1 07/17] seccomp: Add seccomp object checker evaluation Mickaël Salaün
2016-03-24  1:46 ` [kernel-hardening] [RFC v1 08/17] selftest/seccomp: Remove unknown_ret_is_kill_above_allow test Mickaël Salaün
2016-03-24  2:53 ` [kernel-hardening] [RFC v1 09/17] selftest/seccomp: Extend seccomp_data until matches[6] Mickaël Salaün
2016-03-24  2:53   ` [kernel-hardening] [RFC v1 10/17] selftest/seccomp: Add field_is_valid_syscall test Mickaël Salaün
2016-03-24  2:53   ` [kernel-hardening] [RFC v1 11/17] selftest/seccomp: Add argeval_open_whitelist test Mickaël Salaün
2016-03-24  2:53   ` [kernel-hardening] [RFC v1 12/17] audit,seccomp: Extend audit with seccomp state Mickaël Salaün
2016-03-24  2:53   ` [kernel-hardening] [RFC v1 13/17] selftest/seccomp: Rename TRACE_poke to TRACE_poke_sys_read Mickaël Salaün
2016-03-24  2:53   ` [kernel-hardening] [RFC v1 14/17] selftest/seccomp: Make tracer_poke() more generic Mickaël Salaün
2016-03-24  2:54   ` [kernel-hardening] [RFC v1 15/17] selftest/seccomp: Add argeval_toctou_argument test Mickaël Salaün
2016-03-24  2:54   ` [kernel-hardening] [RFC v1 16/17] security/seccomp: Protect against filesystem TOCTOU Mickaël Salaün
2016-03-24  2:54   ` [kernel-hardening] [RFC v1 17/17] selftest/seccomp: Add argeval_toctou_filesystem test Mickaël Salaün
2016-03-24 16:24 ` [kernel-hardening] Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing Kees Cook
2016-03-27  5:03   ` Loganaden Velvindron
2016-04-20 18:21 ` Mickaël Salaün
2016-04-26 22:46   ` Kees Cook [this message]
2016-04-28  2:36 ` Kees Cook
2016-04-28 23:45   ` Mickaël Salaün
2016-05-21 12:58     ` Mickaël Salaün
2016-05-02 22:19   ` James Morris
2016-05-21 15:19   ` Daniel Borkmann
2016-05-22 21:30     ` Mickaël Salaün

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGXu5jJEQ0gm=bZ+c3ttxsz3WFU6xWPcpcpdUH0ttQTwwCfnqA@mail.gmail.com' \
    --to=keescook@chromium.org \
    --cc=agruenba@redhat.com \
    --cc=arnd@arndb.de \
    --cc=casey@schaufler-ca.com \
    --cc=daniel@iogearbox.net \
    --cc=drysdale@google.com \
    --cc=eparis@redhat.com \
    --cc=james.l.morris@oracle.com \
    --cc=jdike@addtoit.com \
    --cc=jln@google.com \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=luto@kernel.org \
    --cc=mic@digikod.net \
    --cc=mtk.manpages@gmail.com \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=pmoore@redhat.com \
    --cc=richard@nod.at \
    --cc=sds@tycho.nsa.gov \
    --cc=serge@hallyn.com \
    --cc=wad@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Kernel-hardening Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kernel-hardening/0 kernel-hardening/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kernel-hardening kernel-hardening/ https://lore.kernel.org/kernel-hardening \
		kernel-hardening@lists.openwall.com
	public-inbox-index kernel-hardening

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/com.openwall.lists.kernel-hardening


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git