Kernel-hardening Archive on lore.kernel.org
From: "Mickaël Salaün" <mic@digikod.net>
To: linux-security-module@vger.kernel.org
Cc: Andreas Gruenbacher <agruenba@redhat.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Casey Schaufler <casey@schaufler-ca.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	David Drysdale <drysdale@google.com>,
	Eric Paris <eparis@redhat.com>,
	James Morris <james.l.morris@oracle.com>,
	Jeff Dike <jdike@addtoit.com>, Julien Tinnes <jln@google.com>,
	Kees Cook <keescook@chromium.org>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Paul Moore <pmoore@redhat.com>,
	Richard Weinberger <richard@nod.at>,
	"Serge E . Hallyn" <serge@hallyn.com>,
	Stephen Smalley <sds@tycho.nsa.gov>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Will Drewry <wad@chromium.org>,
	linux-api@vger.kernel.org, kernel-hardening@lists.openwall.com
Subject: [kernel-hardening] Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing
Date: Wed, 20 Apr 2016 20:21:11 +0200
Message-ID: <5717C897.1050508@digikod.net>
In-Reply-To: <1458784008-16277-1-git-send-email-mic@digikod.net>

Hi,

Has anyone had time to review some of the patches?

What do you think about the TOCTOU workarounds?
What about the userland API?

The series can be found here: https://github.com/l0kod/linux/commits/seccomp-object-v1

 Mickaël


On 24/03/2016 02:46, Mickaël Salaün wrote:
> Hi,
> 
> This series is a proof of concept (not ready for production) that extends
> seccomp with the ability to check syscall argument pointers as kernel objects
> (e.g. a file path). This adds a feature needed to create a full sandbox
> managed by userland, like the Seatbelt/XNU Sandbox or OpenBSD Pledge. It was
> initially inspired by a partial seccomp-LSM prototype [1] but has evolved a
> lot since :)
> 
> The audience for this RFC is limited to security-related actors, to discuss
> this new feature before enlarging the scope to a wider audience. The aim is
> to focus on the security goal, usability and architecture before entering
> into the gory details of each subsystem. I also wish to get constructive
> criticism about the userland API and the intrusiveness of the code (and what
> other ways could do it better) before going further (and addressing the
> TODO and FIXME in the code).
> 
> The approach taken is to add the minimum amount of code while still allowing
> userland to create access rules via seccomp. The current limitation of
> seccomp is that a filter only gets the raw syscall argument values; there is
> no way to dereference a pointer to check its content (e.g. the first argument
> of the open syscall). This seccomp evolution brings a generic way to check
> pointer arguments regardless of the syscall, unlike current LSMs.
> 
> Here is the use case scenario:
> * First, a process must load some groups of seccomp checkers. These checkers
>   are dedicated structs describing pointed-to data (e.g. a path). They are
>   semantically grouped to be efficiently managed and checked in batch. Each
>   group has a static ID. These IDs are unique and they reference groups only
>   accessible from the filters created by the same process.
> * The loaded checkers are inherited and accessible by the newly created
>   filters. These groups can be referenced by filters with a new return value,
>   SECCOMP_RET_ARGEVAL. The value in SECCOMP_RET_DATA contains a group ID and
>   an argument bitmask. This return value is only meaningful between stacked
>   filters, to ask for a check and get the result in the extended struct
>   seccomp_data. The new fields are "is_valid_syscall", "arg_group" containing
>   a group ID, and "matches[6]" consisting of one 64-bit mask per argument.
>   These bitmasks give the check result of each checker from a group on a
>   syscall argument, which is handy to create a custom access control engine
>   from userland.
> * SECCOMP_RET_ARGEVAL is equivalent to SECCOMP_RET_ALLOW except that the
>   following filters can take a decision regarding a match (e.g. return EACCES
>   or emulate the syscall).
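
To make the matches[6] description above concrete, here is a minimal userland
sketch of reading the per-argument bitmasks. Only the field names
(is_valid_syscall, arg_group, matches) come from the cover letter. The struct
name, the field types and order, and the convention that bit i of matches[n]
stands for the i-th checker of the group are assumptions for illustration, not
the layout used by the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_ARGS 6

/*
 * Hypothetical mirror of the extended struct seccomp_data. Field types and
 * order are assumptions; only the three new field names come from the RFC.
 */
struct sketch_seccomp_data {
	int nr;
	uint32_t arch;
	uint64_t instruction_pointer;
	uint64_t args[SKETCH_NR_ARGS];
	uint32_t is_valid_syscall;
	uint32_t arg_group;
	uint64_t matches[SKETCH_NR_ARGS];
};

/* Assumption: bit `checker` of matches[arg] is set when that checker matched. */
static bool sketch_checker_matched(const struct sketch_seccomp_data *sd,
				   unsigned int arg, unsigned int checker)
{
	assert(arg < SKETCH_NR_ARGS && checker < 64);
	return (sd->matches[arg] >> checker) & 1;
}

/* True if any checker of the evaluated group matched the given argument. */
static bool sketch_any_match(const struct sketch_seccomp_data *sd,
			     unsigned int arg)
{
	assert(arg < SKETCH_NR_ARGS);
	return sd->matches[arg] != 0;
}
```

A filter stacked after the SECCOMP_RET_ARGEVAL one would do the equivalent
test in BPF on the matches[] words. How the group ID and the argument bitmask
are packed into SECCOMP_RET_DATA is not shown here because the cover letter
does not spell it out.
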
> 
> Each checker is autonomous and new ones can easily be added in the future.
> There are currently two checkers for path objects:
> * SECCOMP_CHECK_FS_LITERAL checks if a string matches a defined path;
> * SECCOMP_CHECK_FS_BENEATH checks if the path representation of a string is
>   equal or equivalent to a file belonging to a defined path.
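
As an illustration of the difference between the two checkers, here is a
purely lexical sketch in C. It assumes both strings are non-empty, already
canonical absolute paths. The function names are hypothetical, and the real
checkers work on resolved kernel path objects rather than on strings, so this
is not how checker_fs.c does it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Literal check: the argument string must equal the rule path exactly. */
static bool sketch_fs_literal(const char *rule, const char *arg)
{
	assert(rule && arg);
	return strcmp(rule, arg) == 0;
}

/*
 * Beneath check on canonical absolute paths: arg is the rule path itself
 * or lies somewhere under it. Lexical only, no symlink or ".." handling.
 */
static bool sketch_fs_beneath(const char *rule, const char *arg)
{
	size_t len;

	assert(rule && arg);
	len = strlen(rule);
	/* Drop trailing slashes so "/tmp/" and "/tmp" behave alike. */
	while (len > 1 && rule[len - 1] == '/')
		len--;
	if (strncmp(rule, arg, len) != 0)
		return false;
	/* The root directory is a prefix of every absolute path. */
	if (len == 1 && rule[0] == '/')
		return true;
	/* The match must end on a path component boundary. */
	return arg[len] == '\0' || arg[len] == '/';
}
```

With a rule of "/tmp", the beneath sketch accepts "/tmp" and "/tmp/a/b" but
rejects "/tmpfoo". A lexical comparison alone would also accept "/tmp/../etc",
which is one reason the check has to be done on the resolved path in the
kernel.
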
> 
> This design does not seem too intrusive but is flexible enough to allow a
> powerful sandbox mechanism accessible by any process on Linux. seccomp,
> including this new feature, is better used through a userland library (e.g.
> libseccomp) that could provide a high-level language to express a security
> policy instead of raw syscall rules.
> 
> The main concern should be time-of-check-time-of-use (TOCTOU) race-condition
> attacks. Because of the nature of seccomp (executed before the effective
> syscall and before a potential ptrace), it is not possible to block all
> races, only to detect them.
> 
> There are still some questions I couldn't answer for sure (grep for FIXME or
> XXX). Comments appreciated.
> 
> Tested on the x86 and UM architectures in 32 and 64 bits (with audit enabled).
> 
> [1] https://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=seccomp/lsm
> 
> 
> # Need for LSM
> 
> Because the arguments can be checked before the syscall actually evaluates
> them, there are two classes of race condition:
> * The data pointed to by the user address is under the control of userland
>   (e.g. a tracing process) and is therefore subject to TOCTOU race conditions
>   between the seccomp filter evaluation and the effective resource grabbing
>   (part of each syscall's code).
> * The semantics of the pointed-to data are also subject to race conditions
>   because there is no lock on the resource (e.g. file) between the evaluation
>   of the argument by the seccomp filter and the use of the pointed-to
>   resource by each part of the syscall code.
> 
> The solution to fix these race conditions is to copy the userspace data and
> to lock the pointed-to resource. Whereas it is easy to copy the userspace
> data, it is not realistic to lock every pointed-to resource because of
> obvious locking issues. However, it is possible to detect a TOCTOU race
> condition with the help of LSM hooks. This way, we can keep a flexible access
> control (e.g. by controlling syscall return values) while blocking unexpected
> malicious or bogus userland behavior (e.g. exploiting a race condition).
> 
> To be able to deny access to a malicious userland behavior we must replay the
> seccomp filters and verify the intermediate return values to find out if the
> filter policy is still respected. Thanks to a cache we can detect whether a
> check replay is necessary. Otherwise, the LSM hooks are really quick for
> non-malicious userland.
> 
> # Cache handling
> 
> Each time a checker is called, for each argument to check, it gets the entry
> from its seccomp_argeval_checked cache if any, or otherwise creates a new
> cache entry and stores it. These cache entries are used to evaluate
> arguments.
> 
> When rechecking in the LSM hooks, it first finds out which argument is mapped
> to the hook check and whether it differs from the corresponding cache entry.
> If it matches, the hook returns OK without replaying the checks; if nothing
> matches, it replays all the checks of this check type.
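
The recheck decision described above can be sketched as follows. This is a
simplified userland model, not the code of the series: the struct, the enum
and the function name are hypothetical, and identifying the cached object by a
(device, inode) pair is an assumption made only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache entry: what a checker resolved for one syscall argument. */
struct sketch_cache_entry {
	unsigned int arg;	/* syscall argument index (0..5) */
	uint64_t dev;		/* assumed identity of the object at filter time */
	uint64_t ino;
};

enum sketch_verdict { SKETCH_OK, SKETCH_REPLAY };

/*
 * Compare the object seen by the hook with the cached entries for `arg`.
 * A hit means nothing changed since the filter ran, so no replay is needed.
 */
static enum sketch_verdict
sketch_recheck(const struct sketch_cache_entry *cache, size_t n,
	       unsigned int arg, uint64_t dev, uint64_t ino)
{
	size_t i;

	assert(cache || n == 0);
	for (i = 0; i < n; i++) {
		if (cache[i].arg == arg && cache[i].dev == dev &&
		    cache[i].ino == ino)
			return SKETCH_OK;
	}
	/* Nothing matched: the caller must replay the checks of this type. */
	return SKETCH_REPLAY;
}
```

The point of the fast path is that a well-behaved process pays only for the
comparison loop; the more expensive replay happens only when the object seen
by the hook no longer corresponds to what the filter evaluated.
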
> 
> # How to use it
> 
> The SECCOMP_ARGFLAG_* flags help to narrow the rule constraints:
> * SECCOMP_ARGFLAG_FS_DENTRY: Check and rely on the path name.
> * SECCOMP_ARGFLAG_FS_INODE: Check the data "container" whatever its path name.
> * SECCOMP_ARGFLAG_FS_DEVICE: Check the device (i.e. file system) on which the
>   file is, e.g. it can be used to allow access to USB mass-storage or
>   dm-verity content only.
> * SECCOMP_ARGFLAG_FS_MOUNT: Check the file mount point, e.g. can enforce a
>   read-only bind mount (but is less flexible than the other checks).
> * SECCOMP_ARGFLAG_FS_NOFOLLOW: Check the file without following it if it is a
>   symlink. Useful for rename(2) or open(2) with O_NOFOLLOW to have a
>   consistent check. However, LSM hooks will deny all unexpected accesses set
>   by the rules ignoring this flag (i.e. it acts as a fail-safe).
> 
> # Limitations
> 
> ## Ptrace
> If a process can ptrace another one, the tracer can execute whatever syscall it
> wants without being constrained by any seccomp filter from the tracee. This
> applies to this seccomp extension as well. Any seccomp filter should then deny
> the use of ptrace.
> 
> The LSM hooks must ensure that the filter results are the same (with the same
> arguments) but must not deny any ptraced modifications (e.g. syscall argument
> change).
> 
> ## Stateless access
> Unlike current LSMs, the policies are stateless. It's not possible to mark and
> track a kernel object (e.g. file descriptor). Capsicum seems more appropriate
> for this kind of feature.
> 
> ## Resource usage
> We must limit the resources taken by a filter list, and so the number of rules,
> so that no process can exhaust the system.
> 
> 
> Regards,
> 
> Mickaël Salaün (17):
>   um: Export the sys_call_table
>   seccomp: Fix typo
>   selftest/seccomp: Fix the flag name SECCOMP_FILTER_FLAG_TSYNC
>   selftest/seccomp: Fix the seccomp(2) signature
>   security/seccomp: Add LSM and create arrays of syscall metadata
>   seccomp: Add the SECCOMP_ADD_CHECKER_GROUP command
>   seccomp: Add seccomp object checker evaluation
>   selftest/seccomp: Remove unknown_ret_is_kill_above_allow test
>   selftest/seccomp: Extend seccomp_data until matches[6]
>   selftest/seccomp: Add field_is_valid_syscall test
>   selftest/seccomp: Add argeval_open_whitelist test
>   audit,seccomp: Extend audit with seccomp state
>   selftest/seccomp: Rename TRACE_poke to TRACE_poke_sys_read
>   selftest/seccomp: Make tracer_poke() more generic
>   selftest/seccomp: Add argeval_toctou_argument test
>   security/seccomp: Protect against filesystem TOCTOU
>   selftest/seccomp: Add argeval_toctou_filesystem test
> 
>  arch/x86/um/asm/syscall.h                     |   2 +
>  include/asm-generic/vmlinux.lds.h             |  22 +
>  include/linux/audit.h                         |  25 ++
>  include/linux/compat.h                        |  10 +
>  include/linux/lsm_hooks.h                     |   5 +
>  include/linux/seccomp.h                       | 136 +++++-
>  include/linux/syscalls.h                      |  68 +++
>  include/uapi/linux/seccomp.h                  | 105 +++++
>  kernel/audit.h                                |   3 +
>  kernel/auditsc.c                              |  36 +-
>  kernel/fork.c                                 |  13 +-
>  kernel/seccomp.c                              | 594 +++++++++++++++++++++++++-
>  security/Kconfig                              |   1 +
>  security/Makefile                             |   2 +
>  security/seccomp/Kconfig                      |  14 +
>  security/seccomp/Makefile                     |   3 +
>  security/seccomp/checker_fs.c                 | 524 +++++++++++++++++++++++
>  security/seccomp/checker_fs.h                 |  18 +
>  security/seccomp/lsm.c                        | 135 ++++++
>  security/seccomp/lsm.h                        |  19 +
>  security/security.c                           |   1 +
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 572 +++++++++++++++++++++++--
>  22 files changed, 2248 insertions(+), 60 deletions(-)
>  create mode 100644 security/seccomp/Kconfig
>  create mode 100644 security/seccomp/Makefile
>  create mode 100644 security/seccomp/checker_fs.c
>  create mode 100644 security/seccomp/checker_fs.h
>  create mode 100644 security/seccomp/lsm.c
>  create mode 100644 security/seccomp/lsm.h
> 



