kernel-hardening.lists.openwall.com archive mirror
* [kernel-hardening] [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing
@ 2016-03-24  1:46 Mickaël Salaün
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 01/17] um: Export the sys_call_table Mickaël Salaün
                   ` (11 more replies)
  0 siblings, 12 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24  1:46 UTC
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Stephen Smalley,
	Tetsuo Handa, Will Drewry, linux-api, kernel-hardening

Hi,

This series is a proof of concept (not ready for production) that extends
seccomp with the ability to check syscall pointer arguments as kernel objects
(e.g. file paths). This adds a feature needed to create a full sandbox managed
by userland, like the Seatbelt/XNU Sandbox or OpenBSD's pledge. It was
initially inspired by a partial seccomp-LSM prototype [1] but has evolved a
lot since :)

The audience for this RFC is limited to security-related actors, to discuss
this new feature before widening the scope to a larger audience. The aim is to
focus on the security goal, usability and architecture before entering into
the gory details of each subsystem. I would also like to get constructive
criticism about the userland API and the intrusiveness of the code (and how it
could be done better) before going further (and addressing the TODO and FIXME
items in the code).

The approach taken is to add the minimum amount of code while still allowing
userland to create access rules via seccomp. The current limitation of seccomp
is that it only gets raw syscall argument values: there is no way to
dereference a pointer to check its content (e.g. the first argument of the
open syscall). This seccomp evolution brings a generic way to check pointer
arguments regardless of the syscall, unlike current LSMs.
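To make the current limitation concrete, here is a minimal userland sketch, assuming a Linux system with the UAPI <linux/seccomp.h> header. It fills a struct seccomp_data the way a classic BPF filter would see an openat(2) call (used instead of open(2) because it exists on every architecture). The helper names are hypothetical. Two buffers holding the same path give different argument values, because the filter only receives the user address, never the string behind it.

```c
#define _POSIX_C_SOURCE 200809L	/* for AT_FDCWD in strict ISO C modes */
#include <fcntl.h>		/* AT_FDCWD, O_RDONLY */
#include <linux/seccomp.h>	/* struct seccomp_data */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>	/* SYS_openat */

/* Hypothetical helper: copy only the raw register values, as the kernel does
 * when it prepares the data a cBPF filter can load. */
static struct seccomp_data openat_seccomp_data(const char *path, int flags)
{
	struct seccomp_data data;

	memset(&data, 0, sizeof(data));
	data.nr = SYS_openat;
	data.args[0] = (uint64_t)AT_FDCWD;
	data.args[1] = (uint64_t)(uintptr_t)path;	/* the address, not the string */
	data.args[2] = (uint64_t)flags;
	return data;
}

/* All a cBPF filter can do with args[1] is compare it as a number. */
static int same_path_argument(const struct seccomp_data *a,
			      const struct seccomp_data *b)
{
	return a->args[1] == b->args[1];
}
```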

Here is the use case scenario:
* First, a process must load some groups of seccomp checkers. These checkers
  are dedicated structs describing the pointed data (e.g. a path). They are
  semantically grouped to be efficiently managed and checked in batch. Each
  group has a static ID. These IDs are unique and reference groups that are
  only accessible from the filters created by the same process.
* The loaded checkers are inherited by, and accessible to, newly created
  filters. These groups can be referenced by filters with a new return value,
  SECCOMP_RET_ARGEVAL. The value in SECCOMP_RET_DATA contains a group ID and
  an argument bitmask. This return value is only meaningful between stacked
  filters: it requests a check whose result is returned in the extended
  struct seccomp_data. The new fields are "is_valid_syscall", "arg_group"
  containing a group ID, and "matches[6]" consisting of one 64-bit mask per
  argument. These bitmasks give the result of each checker from a group on a
  syscall argument, which is handy to create a custom access control engine
  from userland.
* SECCOMP_RET_ARGEVAL is equivalent to SECCOMP_RET_ALLOW except that the
  following filters can take a decision regarding a match (e.g. return EACCES
  or emulate the syscall).
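The following sketch shows how a filter might pack a group ID and an argument bitmask into the 16-bit SECCOMP_RET_DATA field, and how a later filter might read one bit of matches[]. The bit layout, the struct seccomp_data_argeval name and its field types are assumptions made for illustration (the series extends struct seccomp_data itself); only struct seccomp_data and SECCOMP_RET_DATA come from the existing UAPI header, and SECCOMP_RET_ARGEVAL is deliberately not defined here.

```c
#include <linux/seccomp.h>	/* struct seccomp_data, SECCOMP_RET_DATA */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of the 16-bit SECCOMP_RET_DATA payload. */
#define ARGEVAL_ARGS_MASK	0x003fU	/* bits 0-5: arguments to check */
#define ARGEVAL_GROUP_SHIFT	8	/* bits 8-15: checker group ID */
#define ARGEVAL_GROUP_MASK	0xff00U

static uint32_t argeval_data(uint8_t group_id, uint8_t arg_mask)
{
	return (((uint32_t)group_id << ARGEVAL_GROUP_SHIFT) & ARGEVAL_GROUP_MASK) |
	       (arg_mask & ARGEVAL_ARGS_MASK);
}

static uint8_t argeval_group(uint32_t ret)
{
	return (ret & SECCOMP_RET_DATA & ARGEVAL_GROUP_MASK) >> ARGEVAL_GROUP_SHIFT;
}

static uint8_t argeval_args(uint32_t ret)
{
	return ret & SECCOMP_RET_DATA & ARGEVAL_ARGS_MASK;
}

/* Hypothetical view of the extended fields seen by the following filters. */
struct seccomp_data_argeval {
	struct seccomp_data base;
	uint32_t is_valid_syscall;
	uint32_t arg_group;
	uint64_t matches[6];	/* one bit per checker of the group */
};

/* True if checker number "checker" of the group matched argument "arg". */
static bool argeval_matched(const struct seccomp_data_argeval *d,
			    unsigned int arg, unsigned int checker)
{
	if (arg >= 6 || checker >= 64)
		return false;
	return (d->matches[arg] >> checker) & 1;
}
```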

Each checker is autonomous, and new ones can easily be added in the future.
There are currently two checkers for path objects:
* SECCOMP_CHECK_FS_LITERAL checks if a string matches a defined path;
* SECCOMP_CHECK_FS_BENEATH checks if the path represented by a string is, or
  is equivalent to, a file located beneath a defined path.
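The difference between the two checkers can be approximated in userland with plain string operations. This sketch is only a lexical stand-in under that assumption: the function names are hypothetical, and unlike the kernel-side checkers, which work on path objects, it does not follow symlinks, resolve "..", or look at mount points.

```c
#include <stdbool.h>
#include <string.h>

/* Lexical analogue of SECCOMP_CHECK_FS_LITERAL: exact string match. */
static bool path_literal_match(const char *path, const char *rule)
{
	return strcmp(path, rule) == 0;
}

/* Lexical analogue of SECCOMP_CHECK_FS_BENEATH: true if path is the rule
 * path itself or lies beneath it on a path component boundary. */
static bool path_beneath_match(const char *path, const char *rule)
{
	size_t len = strlen(rule);

	/* Ignore trailing slashes in the rule, except for "/" itself. */
	while (len > 1 && rule[len - 1] == '/')
		len--;
	if (strncmp(path, rule, len) != 0)
		return false;
	if (len == 1 && rule[0] == '/')
		return path[0] == '/';
	/* "/usr" must match "/usr" and "/usr/lib" but not "/usrlocal". */
	return path[len] == '\0' || path[len] == '/';
}
```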

This design does not seem too intrusive but is flexible enough to allow a
powerful sandbox mechanism accessible to any process on Linux. Seccomp,
including this new feature, is easier to use with the help of a userland
library (e.g. libseccomp) that could provide a high-level language to express
a security policy instead of raw syscall rules.

The main concern should be time-of-check-to-time-of-use (TOCTOU) race
condition attacks. Because of the nature of seccomp (executed before the
effective syscall and before a potential ptrace), it is not possible to block
all races, only to detect them.

There are still some questions I couldn't answer for sure (grep for FIXME or
XXX). Comments appreciated.

Tested on the x86 and UM architectures in 32 and 64 bits (with audit enabled).

[1] https://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=seccomp/lsm


# Need for LSM

Because the arguments can be checked before the syscall actually evaluates
them, there are two classes of race condition:
* The data pointed to by the user address is under the control of userland
  (e.g. a tracing process) and is thus subject to TOCTOU race conditions
  between the seccomp filter evaluation and the effective resource grabbing
  (part of each syscall's code).
* The semantics of the pointed data are also subject to race conditions
  because there is no lock on the resource (e.g. file) between the evaluation
  of the argument by the seccomp filter and the use of the pointed resource by
  each part of the syscall code.

The solution to fix these race conditions would be to copy the userspace data
and to lock the pointed resource. Whereas copying the userspace data is easy,
locking every pointed resource is not realistic because of obvious locking
issues. However, it is possible to detect a TOCTOU race condition with the help
of LSM hooks. This way, we can keep a flexible access control (e.g. by
controlling syscall return values) while blocking unexpected malicious or buggy
userland behavior (e.g. exploiting a race condition).

To be able to deny a malicious userland behavior, we must replay the seccomp
filters and verify the intermediate return values to find out whether the
filters' policy is still respected. Thanks to a cache, we can detect whether a
check replay is necessary; when it is not, the LSM hooks are really quick for
non-malicious userland.

# Cache handling

Each time a checker is called, for each argument to check, it gets the
argument from its seccomp_argeval_checked cache if any, or otherwise creates a
new cache entry and stores it. These cache entries are then used to evaluate
the arguments.

When rechecking in the LSM hooks, it first finds out which argument is mapped
to the hook check and whether it differs from the corresponding cache entry.
If it matches, it returns OK without replaying the checks; if nothing matches,
it replays all the checks of this check type.
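A rough model of this cache logic, reduced to a single string argument, is sketched below. The entry layout (a copied value plus the resulting match bitmask), the fixed buffer size and every function name are assumptions; the series' seccomp_argeval_checked structure is not reproduced here. The recheck first compares what the hook sees with the cached copy, replays the check only when they differ, and then accepts the access only if the replayed result is unchanged.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical cache entry for one checked argument. */
struct argeval_cache_entry {
	bool valid;
	char value[256];		/* copy of the checked user data */
	unsigned long long matches;	/* checker results for that copy */
};

typedef unsigned long long (*argeval_check_fn)(const char *value);

/* Example checker: bit 0 is set if the path is under /tmp/. */
static unsigned long long example_check_tmp(const char *value)
{
	return strncmp(value, "/tmp/", 5) == 0 ? 1ULL : 0ULL;
}

/* Filter time: reuse the cached result for an identical value, otherwise
 * copy the value, run the checker on the copy and store the result. */
static unsigned long long argeval_check(struct argeval_cache_entry *e,
					const char *value, argeval_check_fn check)
{
	size_t len = strlen(value);

	if (e->valid && strcmp(e->value, value) == 0)
		return e->matches;
	if (len >= sizeof(e->value)) {
		e->valid = false;	/* too long to cache: fail closed */
		return 0;
	}
	memcpy(e->value, value, len + 1);
	e->matches = check(e->value);
	e->valid = true;
	return e->matches;
}

/* Hook time: fast path if the value seen by the syscall is the cached one,
 * otherwise replay the checker and accept only an unchanged result. */
static bool argeval_recheck(const struct argeval_cache_entry *e,
			    const char *seen, argeval_check_fn check,
			    unsigned int *replays)
{
	if (!e->valid)
		return false;
	if (strcmp(e->value, seen) == 0)
		return true;
	(*replays)++;
	return check(seen) == e->matches;
}
```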

# How to use it

The SECCOMP_ARGFLAG_* flags help to narrow the rule constraints:
* SECCOMP_ARGFLAG_FS_DENTRY: Check and rely on the path name.
* SECCOMP_ARGFLAG_FS_INODE: Check the data "container" whatever its path name.
* SECCOMP_ARGFLAG_FS_DEVICE: Check the device (i.e. file system) on which the
  file is located, e.g. it can be used to allow access to USB mass-storage or
  dm-verity content only.
* SECCOMP_ARGFLAG_FS_MOUNT: Check the file mount point, e.g. it can enforce a
  read-only bind mount (but is less flexible than the other checks).
* SECCOMP_ARGFLAG_FS_NOFOLLOW: Check the file without following it if it is a
  symlink. Useful for rename(2) or open(2) with O_NOFOLLOW to get a consistent
  check. However, the LSM hooks will deny all unexpected accesses resulting
  from rules that ignore this flag (i.e. it acts as a fail-safe).
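To illustrate how such flags narrow a rule, here is a userland sketch that compares a path with a rule path using stat(2) and lstat(2). The ARGFLAG_FS_* macros are hypothetical stand-ins with made-up values (the real SECCOMP_ARGFLAG_FS_* values live in the patched UAPI header), the "every selected property must match" semantic is an assumption, and the mount point check is left out because it needs information that struct stat does not carry.

```c
#define _POSIX_C_SOURCE 200809L	/* for lstat() in strict ISO C modes */
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical stand-ins for SECCOMP_ARGFLAG_FS_*; values are made up. */
#define ARGFLAG_FS_DENTRY	(1U << 0)
#define ARGFLAG_FS_INODE	(1U << 1)
#define ARGFLAG_FS_DEVICE	(1U << 2)
#define ARGFLAG_FS_NOFOLLOW	(1U << 3)

static int get_stat(const char *path, struct stat *st, bool nofollow)
{
	return nofollow ? lstat(path, st) : stat(path, st);
}

/* Returns 1 if every property selected by flags matches, 0 if one does not,
 * and -1 if a file cannot be examined. */
static int fs_rule_match(const char *path, const char *rule_path,
			 unsigned int flags)
{
	bool nofollow = flags & ARGFLAG_FS_NOFOLLOW;
	struct stat a, b;

	/* DENTRY: rely on the name itself. */
	if ((flags & ARGFLAG_FS_DENTRY) && strcmp(path, rule_path) != 0)
		return 0;
	if (!(flags & (ARGFLAG_FS_INODE | ARGFLAG_FS_DEVICE)))
		return 1;
	if (get_stat(path, &a, nofollow) || get_stat(rule_path, &b, nofollow))
		return -1;
	/* DEVICE: same file system, whatever the file. */
	if ((flags & ARGFLAG_FS_DEVICE) && a.st_dev != b.st_dev)
		return 0;
	/* INODE: same file, whatever its name. */
	if ((flags & ARGFLAG_FS_INODE) &&
	    (a.st_dev != b.st_dev || a.st_ino != b.st_ino))
		return 0;
	return 1;
}
```

For example, "/." and "/" name the same inode, so an INODE-only rule matches while a DENTRY rule does not.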

# Limitations

## Ptrace
If a process can ptrace another one, the tracer can execute whatever syscall it
wants without being constrained by any seccomp filter of the tracee. This
applies to this seccomp extension as well. Any seccomp filter should therefore
deny the use of ptrace.

The LSM hooks must ensure that the filter results are the same (with the same
arguments) but must not deny any ptraced modifications (e.g. a syscall
argument change).
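For reference, a minimal classic BPF filter that makes ptrace(2) fail with EPERM for the sandboxed process, and for its descendants which inherit the filter, could look like this. It only uses the existing seccomp UAPI; the EXPECTED_AUDIT_ARCH macro and the install_deny_ptrace() name are hypothetical, only x86-64 and arm64 are handled, and the x32 ABI is not filtered out.

```c
#include <errno.h>		/* EPERM */
#include <linux/audit.h>	/* AUDIT_ARCH_* */
#include <linux/filter.h>	/* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>	/* struct seccomp_data, SECCOMP_* */
#include <stddef.h>		/* offsetof */
#include <sys/prctl.h>		/* prctl, PR_SET_* */
#include <sys/syscall.h>	/* __NR_ptrace */

#if defined(__x86_64__)
#define EXPECTED_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define EXPECTED_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

static struct sock_filter deny_ptrace_insns[] = {
	/* Kill the process if the syscall ABI is not the expected one. */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, EXPECTED_AUDIT_ARCH, 1, 0),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
	/* ptrace(2) fails with EPERM, everything else is allowed. */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_ptrace, 0, 1),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Returns 0 on success, -1 with errno set otherwise. */
int install_deny_ptrace(void)
{
	struct sock_fprog prog = {
		.len = (unsigned short)(sizeof(deny_ptrace_insns) /
					sizeof(deny_ptrace_insns[0])),
		.filter = deny_ptrace_insns,
	};

	/* Required to install a filter without CAP_SYS_ADMIN. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```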

## Stateless access
Unlike current LSMs, the policies are stateless. It is not possible to mark
and track a kernel object (e.g. a file descriptor). Capsicum seems more
appropriate for this kind of feature.

## Resource usage
We must limit the resources taken by a filter list, and thus the number of
rules, so that no process can exhaust the system.
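One possible way to bound the resources is to charge every checker group against a per-process budget when it is loaded, in the spirit of the existing cap on the total number of filter instructions in kernel/seccomp.c. The limit, the per-group penalty, the struct checker_group layout and the function name below are all hypothetical.

```c
#include <errno.h>	/* ENOMEM */
#include <stddef.h>	/* NULL, size_t */

/* Hypothetical budget and per-group overhead. */
#define MAX_CHECKERS_PER_PATH	1024U
#define GROUP_PENALTY		4U

struct checker_group {
	const struct checker_group *prev;	/* group loaded before this one */
	size_t nr_checkers;
};

/* Returns 0 if a new group of nr_new checkers fits in the budget shared with
 * all the groups already loaded (walked from "last"), -ENOMEM otherwise. */
static int account_new_group(const struct checker_group *last, size_t nr_new)
{
	const struct checker_group *g;
	size_t total;

	if (nr_new > MAX_CHECKERS_PER_PATH)
		return -ENOMEM;
	total = nr_new + GROUP_PENALTY;
	for (g = last; g; g = g->prev)
		total += g->nr_checkers + GROUP_PENALTY;
	return total > MAX_CHECKERS_PER_PATH ? -ENOMEM : 0;
}
```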


Regards,

Mickaël Salaün (17):
  um: Export the sys_call_table
  seccomp: Fix typo
  selftest/seccomp: Fix the flag name SECCOMP_FILTER_FLAG_TSYNC
  selftest/seccomp: Fix the seccomp(2) signature
  security/seccomp: Add LSM and create arrays of syscall metadata
  seccomp: Add the SECCOMP_ADD_CHECKER_GROUP command
  seccomp: Add seccomp object checker evaluation
  selftest/seccomp: Remove unknown_ret_is_kill_above_allow test
  selftest/seccomp: Extend seccomp_data until matches[6]
  selftest/seccomp: Add field_is_valid_syscall test
  selftest/seccomp: Add argeval_open_whitelist test
  audit,seccomp: Extend audit with seccomp state
  selftest/seccomp: Rename TRACE_poke to TRACE_poke_sys_read
  selftest/seccomp: Make tracer_poke() more generic
  selftest/seccomp: Add argeval_toctou_argument test
  security/seccomp: Protect against filesystem TOCTOU
  selftest/seccomp: Add argeval_toctou_filesystem test

 arch/x86/um/asm/syscall.h                     |   2 +
 include/asm-generic/vmlinux.lds.h             |  22 +
 include/linux/audit.h                         |  25 ++
 include/linux/compat.h                        |  10 +
 include/linux/lsm_hooks.h                     |   5 +
 include/linux/seccomp.h                       | 136 +++++-
 include/linux/syscalls.h                      |  68 +++
 include/uapi/linux/seccomp.h                  | 105 +++++
 kernel/audit.h                                |   3 +
 kernel/auditsc.c                              |  36 +-
 kernel/fork.c                                 |  13 +-
 kernel/seccomp.c                              | 594 +++++++++++++++++++++++++-
 security/Kconfig                              |   1 +
 security/Makefile                             |   2 +
 security/seccomp/Kconfig                      |  14 +
 security/seccomp/Makefile                     |   3 +
 security/seccomp/checker_fs.c                 | 524 +++++++++++++++++++++++
 security/seccomp/checker_fs.h                 |  18 +
 security/seccomp/lsm.c                        | 135 ++++++
 security/seccomp/lsm.h                        |  19 +
 security/security.c                           |   1 +
 tools/testing/selftests/seccomp/seccomp_bpf.c | 572 +++++++++++++++++++++++--
 22 files changed, 2248 insertions(+), 60 deletions(-)
 create mode 100644 security/seccomp/Kconfig
 create mode 100644 security/seccomp/Makefile
 create mode 100644 security/seccomp/checker_fs.c
 create mode 100644 security/seccomp/checker_fs.h
 create mode 100644 security/seccomp/lsm.c
 create mode 100644 security/seccomp/lsm.h

-- 
2.8.0.rc3


* [kernel-hardening] [RFC v1 01/17] um: Export the sys_call_table
  2016-03-24  1:46 [kernel-hardening] [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing Mickaël Salaün
@ 2016-03-24  1:46 ` Mickaël Salaün
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 02/17] seccomp: Fix typo Mickaël Salaün
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24  1:46 UTC
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Stephen Smalley,
	Tetsuo Handa, Will Drewry, linux-api, kernel-hardening

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
---
 arch/x86/um/asm/syscall.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/um/asm/syscall.h b/arch/x86/um/asm/syscall.h
index 11ab90dc5f14..7f76696d6b1b 100644
--- a/arch/x86/um/asm/syscall.h
+++ b/arch/x86/um/asm/syscall.h
@@ -8,6 +8,8 @@ typedef asmlinkage long (*sys_call_ptr_t)(unsigned long, unsigned long,
 					  unsigned long, unsigned long,
 					  unsigned long, unsigned long);
 
+extern const sys_call_ptr_t sys_call_table[];
+
 static inline int syscall_get_arch(void)
 {
 #ifdef CONFIG_X86_32
-- 
2.8.0.rc3


* [kernel-hardening] [RFC v1 02/17] seccomp: Fix typo
  2016-03-24  1:46 [kernel-hardening] [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing Mickaël Salaün
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 01/17] um: Export the sys_call_table Mickaël Salaün
@ 2016-03-24  1:46 ` Mickaël Salaün
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 03/17] selftest/seccomp: Fix the flag name SECCOMP_FILTER_FLAG_TSYNC Mickaël Salaün
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24  1:46 UTC
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Stephen Smalley,
	Tetsuo Handa, Will Drewry, linux-api, kernel-hardening

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
---
 kernel/seccomp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 15a1795bbba1..2c94693e4163 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -915,7 +915,7 @@ long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
 
 	fprog = filter->prog->orig_prog;
 	if (!fprog) {
-		/* This must be a new non-cBPF filter, since we save every
+		/* This must be a new non-cBPF filter, since we save
 		 * every cBPF filter's orig_prog above when
 		 * CONFIG_CHECKPOINT_RESTORE is enabled.
 		 */
-- 
2.8.0.rc3


* [kernel-hardening] [RFC v1 03/17] selftest/seccomp: Fix the flag name SECCOMP_FILTER_FLAG_TSYNC
  2016-03-24  1:46 [kernel-hardening] [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing Mickaël Salaün
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 01/17] um: Export the sys_call_table Mickaël Salaün
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 02/17] seccomp: Fix typo Mickaël Salaün
@ 2016-03-24  1:46 ` Mickaël Salaün
  2016-03-24  4:35   ` [kernel-hardening] " Kees Cook
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 04/17] selftest/seccomp: Fix the seccomp(2) signature Mickaël Salaün
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24  1:46 UTC
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Stephen Smalley,
	Tetsuo Handa, Will Drewry, linux-api, kernel-hardening

Rename SECCOMP_FLAG_FILTER_TSYNC to SECCOMP_FILTER_FLAG_TSYNC to match
the UAPI.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index b9453b838162..9c1460f277c2 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1497,8 +1497,8 @@ TEST_F(TRACE_syscall, syscall_dropped)
 #define SECCOMP_SET_MODE_FILTER 1
 #endif
 
-#ifndef SECCOMP_FLAG_FILTER_TSYNC
-#define SECCOMP_FLAG_FILTER_TSYNC 1
+#ifndef SECCOMP_FILTER_FLAG_TSYNC
+#define SECCOMP_FILTER_FLAG_TSYNC 1
 #endif
 
 #ifndef seccomp
@@ -1613,7 +1613,7 @@ TEST(TSYNC_first)
 		TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
 	}
 
-	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
 		      &prog);
 	ASSERT_NE(ENOSYS, errno) {
 		TH_LOG("Kernel does not support seccomp syscall!");
@@ -1831,7 +1831,7 @@ TEST_F(TSYNC, two_siblings_with_ancestor)
 		self->sibling_count++;
 	}
 
-	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
 		      &self->apply_prog);
 	ASSERT_EQ(0, ret) {
 		TH_LOG("Could install filter on all threads!");
@@ -1892,7 +1892,7 @@ TEST_F(TSYNC, two_siblings_with_no_filter)
 		TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
 	}
 
-	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
 		      &self->apply_prog);
 	ASSERT_NE(ENOSYS, errno) {
 		TH_LOG("Kernel does not support seccomp syscall!");
@@ -1940,7 +1940,7 @@ TEST_F(TSYNC, two_siblings_with_one_divergence)
 		self->sibling_count++;
 	}
 
-	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
 		      &self->apply_prog);
 	ASSERT_EQ(self->sibling[0].system_tid, ret) {
 		TH_LOG("Did not fail on diverged sibling.");
@@ -1992,7 +1992,7 @@ TEST_F(TSYNC, two_siblings_not_under_filter)
 		TH_LOG("Kernel does not support SECCOMP_SET_MODE_FILTER!");
 	}
 
-	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
 		      &self->apply_prog);
 	ASSERT_EQ(ret, self->sibling[0].system_tid) {
 		TH_LOG("Did not fail on diverged sibling.");
@@ -2021,7 +2021,7 @@ TEST_F(TSYNC, two_siblings_not_under_filter)
 	/* Switch to the remaining sibling */
 	sib = !sib;
 
-	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
 		      &self->apply_prog);
 	ASSERT_EQ(0, ret) {
 		TH_LOG("Expected the remaining sibling to sync");
@@ -2044,7 +2044,7 @@ TEST_F(TSYNC, two_siblings_not_under_filter)
 	while (!kill(self->sibling[sib].system_tid, 0))
 		sleep(0.1);
 
-	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
 		      &self->apply_prog);
 	ASSERT_EQ(0, ret);  /* just us chickens */
 }
-- 
2.8.0.rc3


* [kernel-hardening] [RFC v1 04/17] selftest/seccomp: Fix the seccomp(2) signature
  2016-03-24  1:46 [kernel-hardening] [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing Mickaël Salaün
                   ` (2 preceding siblings ...)
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 03/17] selftest/seccomp: Fix the flag name SECCOMP_FILTER_FLAG_TSYNC Mickaël Salaün
@ 2016-03-24  1:46 ` Mickaël Salaün
  2016-03-24  4:36   ` [kernel-hardening] " Kees Cook
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 05/17] security/seccomp: Add LSM and create arrays of syscall metadata Mickaël Salaün
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24  1:46 UTC
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Stephen Smalley,
	Tetsuo Handa, Will Drewry, linux-api, kernel-hardening

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 9c1460f277c2..150829dd7998 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1502,10 +1502,10 @@ TEST_F(TRACE_syscall, syscall_dropped)
 #endif
 
 #ifndef seccomp
-int seccomp(unsigned int op, unsigned int flags, struct sock_fprog *filter)
+int seccomp(unsigned int op, unsigned int flags, void *args)
 {
 	errno = 0;
-	return syscall(__NR_seccomp, op, flags, filter);
+	return syscall(__NR_seccomp, op, flags, args);
 }
 #endif
 
-- 
2.8.0.rc3


* [kernel-hardening] [RFC v1 05/17] security/seccomp: Add LSM and create arrays of syscall metadata
  2016-03-24  1:46 [kernel-hardening] [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing Mickaël Salaün
                   ` (3 preceding siblings ...)
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 04/17] selftest/seccomp: Fix the seccomp(2) signature Mickaël Salaün
@ 2016-03-24  1:46 ` Mickaël Salaün
  2016-03-24 15:47   ` [kernel-hardening] " Casey Schaufler
  2016-03-24 16:01   ` Casey Schaufler
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 06/17] seccomp: Add the SECCOMP_ADD_CHECKER_GROUP command Mickaël Salaün
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24  1:46 UTC
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Stephen Smalley,
	Tetsuo Handa, Will Drewry, linux-api, kernel-hardening

To prevent userland from making mistakes by misusing a syscall parameter, the
kernel checks the type of the syscall parameters (e.g. char pointer). At
compile time, we create a memory section (i.e. __syscalls_argdesc) with
syscall metadata. At boot time, this section is used to create an array
(i.e. seccomp_syscalls_argdesc) usable to check the syscall arguments.
In the same way, another array can be created and used for compat mode.
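The compile-time classification behind __SC_ARGDESC_TYPE can be tried outside the kernel with GCC or Clang. This sketch reuses the same __builtin_types_compatible_p() trick to fill a descriptor in a static initializer; the EX_* and ex_* names are hypothetical copies of the __SACT__* constants and struct syscall_argdesc, and ex_sys_open() is a dummy function standing in for a syscall.

```c
/* Hypothetical copies of the __SACT__* values (0 means "no argument"). */
#define EX_SACT_NONE		0
#define EX_SACT_OTHER		1
#define EX_SACT_CONST_CHAR_PTR	2
#define EX_SACT_CHAR_PTR	3

/* Classify a C type at compile time (GCC/Clang builtin). */
#define EX_ARGDESC_TYPE(t)					\
	(__builtin_types_compatible_p(t, const char *) ?	\
	 EX_SACT_CONST_CHAR_PTR :				\
	 __builtin_types_compatible_p(t, char *) ?		\
	 EX_SACT_CHAR_PTR : EX_SACT_OTHER)

/* Hypothetical copy of struct syscall_argdesc. */
struct ex_argdesc {
	const void *addr;
	unsigned char args[6];
};

/* Dummy two-argument function standing in for a syscall. */
static long ex_sys_open(const char *filename, int flags)
{
	(void)filename;
	return flags;
}

/* Unlisted arguments are zero-initialized, i.e. EX_SACT_NONE. */
static const struct ex_argdesc ex_open_desc = {
	.addr = (const void *)ex_sys_open,
	.args = { EX_ARGDESC_TYPE(const char *), EX_ARGDESC_TYPE(int) },
};
```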

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: David Drysdale <drysdale@google.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <pmoore@redhat.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Will Drewry <wad@chromium.org>
---
 include/asm-generic/vmlinux.lds.h | 22 ++++++++++
 include/linux/compat.h            | 10 +++++
 include/linux/lsm_hooks.h         |  5 +++
 include/linux/syscalls.h          | 68 ++++++++++++++++++++++++++++++
 security/Kconfig                  |  1 +
 security/Makefile                 |  2 +
 security/seccomp/Kconfig          | 14 +++++++
 security/seccomp/Makefile         |  3 ++
 security/seccomp/lsm.c            | 87 +++++++++++++++++++++++++++++++++++++++
 security/seccomp/lsm.h            | 19 +++++++++
 security/security.c               |  1 +
 11 files changed, 232 insertions(+)
 create mode 100644 security/seccomp/Kconfig
 create mode 100644 security/seccomp/Makefile
 create mode 100644 security/seccomp/lsm.c
 create mode 100644 security/seccomp/lsm.h

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index c4bd0e2c173c..b8792fc083c2 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -153,6 +153,26 @@
 #define TRACE_SYSCALLS()
 #endif
 
+#ifdef CONFIG_SECURITY_SECCOMP
+#define ARGDESC_SYSCALLS() . = ALIGN(8);				\
+			 VMLINUX_SYMBOL(__start_syscalls_argdesc) = .;	\
+			 *(__syscalls_argdesc)				\
+			 VMLINUX_SYMBOL(__stop_syscalls_argdesc) = .;
+
+#ifdef CONFIG_COMPAT
+#define COMPAT_ARGDESC_SYSCALLS() . = ALIGN(8);				\
+		 VMLINUX_SYMBOL(__start_compat_syscalls_argdesc) = .;	\
+		 *(__compat_syscalls_argdesc)				\
+		 VMLINUX_SYMBOL(__stop_compat_syscalls_argdesc) = .;
+#else
+#define COMPAT_ARGDESC_SYSCALLS()
+#endif	/* CONFIG_COMPAT */
+
+#else
+#define ARGDESC_SYSCALLS()
+#define COMPAT_ARGDESC_SYSCALLS()
+#endif /* CONFIG_SECURITY_SECCOMP */
+
 #ifdef CONFIG_SERIAL_EARLYCON
 #define EARLYCON_TABLE() STRUCT_ALIGN();			\
 			 VMLINUX_SYMBOL(__earlycon_table) = .;	\
@@ -511,6 +531,8 @@
 	MEM_DISCARD(init.data)						\
 	KERNEL_CTORS()							\
 	MCOUNT_REC()							\
+	ARGDESC_SYSCALLS()						\
+	COMPAT_ARGDESC_SYSCALLS()					\
 	*(.init.rodata)							\
 	FTRACE_EVENTS()							\
 	TRACE_SYSCALLS()						\
diff --git a/include/linux/compat.h b/include/linux/compat.h
index a76c9172b2eb..b63579a401e8 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -15,6 +15,7 @@
 #include <linux/fs.h>
 #include <linux/aio_abi.h>	/* for aio_context_t */
 #include <linux/unistd.h>
+#include <linux/syscalls.h>	/* for SYSCALL_FILL_ARGDESC_SECTION */
 
 #include <asm/compat.h>
 #include <asm/siginfo.h>
@@ -28,7 +29,15 @@
 #define __SC_DELOUSE(t,v) ((t)(unsigned long)(v))
 #endif
 
+#ifdef CONFIG_SECURITY_SECCOMP
+#define COMPAT_SYSCALL_FILL_ARGDESC(...)	\
+	SYSCALL_FILL_ARGDESC_SECTION("__compat_syscalls_argdesc", __VA_ARGS__)
+#else
+#define COMPAT_SYSCALL_FILL_ARGDESC(...)
+#endif /* CONFIG_SECURITY_SECCOMP */
+
 #define COMPAT_SYSCALL_DEFINE0(name) \
+	COMPAT_SYSCALL_FILL_ARGDESC(compat_sys_##name, 0)	\
 	asmlinkage long compat_sys_##name(void)
 
 #define COMPAT_SYSCALL_DEFINE1(name, ...) \
@@ -45,6 +54,7 @@
 	COMPAT_SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
 
 #define COMPAT_SYSCALL_DEFINEx(x, name, ...)				\
+	COMPAT_SYSCALL_FILL_ARGDESC(compat_sys##name, x, __VA_ARGS__)	\
 	asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))\
 		__attribute__((alias(__stringify(compat_SyS##name))));  \
 	static inline long C_SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 71969de4058c..12df41669308 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1892,5 +1892,10 @@ extern void __init yama_add_hooks(void);
 #else
 static inline void __init yama_add_hooks(void) { }
 #endif
+#ifdef CONFIG_SECURITY_SECCOMP
+extern void __init seccomp_init(void);
+#else
+static inline void __init seccomp_init(void) { }
+#endif
 
 #endif /* ! __LINUX_LSM_HOOKS_H */
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 185815c96433..0f846c408bba 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -79,6 +79,8 @@ union bpf_attr;
 #include <linux/quota.h>
 #include <linux/key.h>
 #include <trace/syscall.h>
+#include <uapi/asm/unistd.h>
+#include <linux/seccomp.h>
 
 /*
  * __MAP - apply a macro to syscall arguments
@@ -98,6 +100,24 @@ union bpf_attr;
 #define __MAP6(m,t,a,...) m(t,a), __MAP5(m,__VA_ARGS__)
 #define __MAP(n,...) __MAP##n(__VA_ARGS__)
 
+#define __COMPARGS6
+#define __COMPARGS5 , 0
+#define __COMPARGS4 , 0, 0
+#define __COMPARGS3 , 0, 0, 0
+#define __COMPARGS2 , 0, 0, 0, 0
+#define __COMPARGS1 , 0, 0, 0, 0, 0
+#define __COMPARGS0 0, 0, 0, 0, 0, 0
+#define __COMPARGS(n) __COMPARGS##n
+
+#define __COMPDECL6
+#define __COMPDECL5
+#define __COMPDECL4
+#define __COMPDECL3
+#define __COMPDECL2
+#define __COMPDECL1
+#define __COMPDECL0 void
+#define __COMPDECL(n) __COMPDECL##n
+
 #define __SC_DECL(t, a)	t a
 #define __TYPE_IS_L(t)	(__same_type((t)0, 0L))
 #define __TYPE_IS_UL(t)	(__same_type((t)0, 0UL))
@@ -175,8 +195,55 @@ extern struct trace_event_functions exit_syscall_print_funcs;
 #define SYSCALL_METADATA(sname, nb, ...)
 #endif
 
+#ifdef CONFIG_SECURITY_SECCOMP
+/*
+ * Do not store the symbol name but the syscall symbol address.
+ * FIXME: Handle aliased symbols (i.e. different name but same address)?
+ *
+ * @addr: syscall address
+ * @args: syscall arguments C type (i.e. __SACT__* values)
+ */
+struct syscall_argdesc {
+	const void *addr;
+	u8 args[6];
+};
+
+/* Syscall Argument C Type (none means no argument) */
+#define __SACT__NONE			0
+#define __SACT__OTHER			1
+#define __SACT__CONST_CHAR_PTR		2
+#define __SACT__CHAR_PTR		3
+
+#define __SC_ARGDESC_TYPE(t, a)						\
+	__builtin_types_compatible_p(typeof(t), const char *) ?		\
+	__SACT__CONST_CHAR_PTR :					\
+	__builtin_types_compatible_p(typeof(t), char *) ?		\
+	__SACT__CHAR_PTR :						\
+	__SACT__OTHER
+
+#define SYSCALL_FILL_ARGDESC_SECTION(_section, sname, nb, ...)		\
+	asmlinkage long sname(__MAP(nb, __SC_DECL, __VA_ARGS__)		\
+			__COMPDECL(nb));				\
+	static struct syscall_argdesc __used				\
+		__attribute__((section(_section)))			\
+		syscall_argdesc_##sname = {				\
+			.addr = sname,					\
+			.args = {					\
+				__MAP(nb, __SC_ARGDESC_TYPE, __VA_ARGS__)\
+				__COMPARGS(nb)				\
+			},						\
+		};
+
+#define SYSCALL_FILL_ARGDESC(...)	\
+	SYSCALL_FILL_ARGDESC_SECTION("__syscalls_argdesc", __VA_ARGS__)
+
+#else
+#define SYSCALL_FILL_ARGDESC(...)
+#endif /* CONFIG_SECURITY_SECCOMP */
+
 #define SYSCALL_DEFINE0(sname)					\
 	SYSCALL_METADATA(_##sname, 0);				\
+	SYSCALL_FILL_ARGDESC(sys_##sname, 0)			\
 	asmlinkage long sys_##sname(void)
 
 #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
@@ -188,6 +255,7 @@ extern struct trace_event_functions exit_syscall_print_funcs;
 
 #define SYSCALL_DEFINEx(x, sname, ...)				\
 	SYSCALL_METADATA(sname, x, __VA_ARGS__)			\
+	SYSCALL_FILL_ARGDESC(sys##sname, x, __VA_ARGS__)	\
 	__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
 
 #define __PROTECT(...) asmlinkage_protect(__VA_ARGS__)
diff --git a/security/Kconfig b/security/Kconfig
index e45237897b43..c98fe1a924cd 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -123,6 +123,7 @@ source security/smack/Kconfig
 source security/tomoyo/Kconfig
 source security/apparmor/Kconfig
 source security/yama/Kconfig
+source security/seccomp/Kconfig
 
 source security/integrity/Kconfig
 
diff --git a/security/Makefile b/security/Makefile
index c9bfbc84ff50..0e4cdefc4777 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -8,6 +8,7 @@ subdir-$(CONFIG_SECURITY_SMACK)		+= smack
 subdir-$(CONFIG_SECURITY_TOMOYO)        += tomoyo
 subdir-$(CONFIG_SECURITY_APPARMOR)	+= apparmor
 subdir-$(CONFIG_SECURITY_YAMA)		+= yama
+subdir-$(CONFIG_SECCOMP_FILTER)		+= seccomp
 
 # always enable default capabilities
 obj-y					+= commoncap.o
@@ -22,6 +23,7 @@ obj-$(CONFIG_AUDIT)			+= lsm_audit.o
 obj-$(CONFIG_SECURITY_TOMOYO)		+= tomoyo/
 obj-$(CONFIG_SECURITY_APPARMOR)		+= apparmor/
 obj-$(CONFIG_SECURITY_YAMA)		+= yama/
+obj-$(CONFIG_SECCOMP_FILTER)	+= seccomp/
 obj-$(CONFIG_CGROUP_DEVICE)		+= device_cgroup.o
 
 # Object integrity file lists
diff --git a/security/seccomp/Kconfig b/security/seccomp/Kconfig
new file mode 100644
index 000000000000..7b0fe649ed89
--- /dev/null
+++ b/security/seccomp/Kconfig
@@ -0,0 +1,14 @@
+config SECURITY_SECCOMP
+	bool "Seccomp LSM support"
+	depends on AUDIT
+	depends on SECCOMP
+	depends on SECURITY
+	default y
+	help
+	  This selects an extension to the Seccomp BPF to be able to filter
+	  syscall arguments as kernel objects (e.g. file path).
+	  This stacked LSM is needed to detect and block race-condition attacks
+	  against argument evaluation (i.e. TOCTOU). Further information can be
+	  found in Documentation/prctl/seccomp_filter.txt .
+
+	  If you are unsure how to answer this question, answer Y.
diff --git a/security/seccomp/Makefile b/security/seccomp/Makefile
new file mode 100644
index 000000000000..f2e848d81138
--- /dev/null
+++ b/security/seccomp/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_SECURITY_SECCOMP) := seccomp.o
+
+seccomp-y := lsm.o
diff --git a/security/seccomp/lsm.c b/security/seccomp/lsm.c
new file mode 100644
index 000000000000..93c881724341
--- /dev/null
+++ b/security/seccomp/lsm.c
@@ -0,0 +1,87 @@
+/*
+ * Seccomp Linux Security Module
+ *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/syscall.h>	/* sys_call_table */
+#include <linux/compat.h>
+#include <linux/slab.h>	/* kcalloc() */
+#include <linux/syscalls.h>	/* syscall_argdesc */
+
+#include "lsm.h"
+
+/* TODO: Remove the need for CONFIG_SYSFS dependency */
+
+struct syscall_argdesc (*seccomp_syscalls_argdesc)[] = NULL;
+#ifdef CONFIG_COMPAT
+struct syscall_argdesc (*compat_seccomp_syscalls_argdesc)[] = NULL;
+#endif	/* CONFIG_COMPAT */
+
+static const struct syscall_argdesc *__init
+find_syscall_argdesc(const struct syscall_argdesc *start,
+		const struct syscall_argdesc *stop, const void *addr)
+{
+	if (unlikely(!addr || !start || !stop)) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	for (; start < stop; start++) {
+		if (start->addr == addr)
+			return start;
+	}
+	return NULL;
+}
+
+static inline void __init init_argdesc(void)
+{
+	const struct syscall_argdesc *argdesc;
+	const void *addr;
+	int i;
+
+	seccomp_syscalls_argdesc = kcalloc(NR_syscalls,
+			sizeof((*seccomp_syscalls_argdesc)[0]), GFP_KERNEL);
+	if (unlikely(!seccomp_syscalls_argdesc)) {
+		WARN_ON(1);
+		return;
+	}
+	for (i = 0; i < NR_syscalls; i++) {
+		addr = sys_call_table[i];
+		argdesc = find_syscall_argdesc(__start_syscalls_argdesc,
+				__stop_syscalls_argdesc, addr);
+		if (!argdesc)
+			continue;
+
+		(*seccomp_syscalls_argdesc)[i] = *argdesc;
+	}
+
+#ifdef CONFIG_COMPAT
+	compat_seccomp_syscalls_argdesc = kcalloc(IA32_NR_syscalls,
+			sizeof((*compat_seccomp_syscalls_argdesc)[0]),
+			GFP_KERNEL);
+	if (unlikely(!compat_seccomp_syscalls_argdesc)) {
+		WARN_ON(1);
+		return;
+	}
+	for (i = 0; i < IA32_NR_syscalls; i++) {
+		addr = ia32_sys_call_table[i];
+		argdesc = find_syscall_argdesc(__start_compat_syscalls_argdesc,
+				__stop_compat_syscalls_argdesc, addr);
+		if (!argdesc)
+			continue;
+
+		(*compat_seccomp_syscalls_argdesc)[i] = *argdesc;
+	}
+#endif	/* CONFIG_COMPAT */
+}
+
+void __init seccomp_init(void)
+{
+	pr_info("seccomp: Becoming ready for sandboxing\n");
+	init_argdesc();
+}
diff --git a/security/seccomp/lsm.h b/security/seccomp/lsm.h
new file mode 100644
index 000000000000..ededbd27c225
--- /dev/null
+++ b/security/seccomp/lsm.h
@@ -0,0 +1,19 @@
+/*
+ * Seccomp Linux Security Module
+ *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/syscalls.h>	/* syscall_argdesc */
+
+extern const struct syscall_argdesc __start_syscalls_argdesc[];
+extern const struct syscall_argdesc __stop_syscalls_argdesc[];
+
+#ifdef CONFIG_COMPAT
+extern const struct syscall_argdesc __start_compat_syscalls_argdesc[];
+extern const struct syscall_argdesc __stop_compat_syscalls_argdesc[];
+#endif	/* CONFIG_COMPAT */
diff --git a/security/security.c b/security/security.c
index e8ffd92ae2eb..76e50345cd82 100644
--- a/security/security.c
+++ b/security/security.c
@@ -60,6 +60,7 @@ int __init security_init(void)
 	 */
 	capability_add_hooks();
 	yama_add_hooks();
+	seccomp_init();
 
 	/*
 	 * Load all the remaining security modules.
-- 
2.8.0.rc3

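At boot, init_argdesc() above builds an array indexed by syscall number. For each sys_call_table entry, it scans the syscall metadata linker section for a descriptor with the same address. The userland sketch below reproduces only that lookup. The fake syscalls, the simplified struct argdesc and build_index() are hypothetical stand-ins, not the kernel's struct syscall_argdesc, which is defined in an earlier patch of this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef long (*syscall_fn)(void);

/* Hypothetical, simplified descriptor; the real struct syscall_argdesc
 * is defined elsewhere in this series and is not shown here. */
struct argdesc {
	syscall_fn fn;          /* syscall entry point this entry describes */
	unsigned char args[6];  /* illustrative per-argument type tags */
};

static long sys_a(void) { return 0; }
static long sys_b(void) { return 1; }
static long sys_c(void) { return 2; } /* deliberately left undescribed */

/* Stand-in for the __start/__stop_syscalls_argdesc section (unordered). */
static const struct argdesc descs[] = {
	{ sys_b, { 1, 0, 0, 0, 0, 0 } },
	{ sys_a, { 2, 2, 0, 0, 0, 0 } },
};

/* Stand-in for sys_call_table, indexed by syscall number. */
static const syscall_fn table[] = { sys_a, sys_b, sys_c };
#define NR_TABLE (sizeof(table) / sizeof(table[0]))
#define NR_DESCS (sizeof(descs) / sizeof(descs[0]))

/* Linear scan by address, same shape as find_syscall_argdesc(). */
static const struct argdesc *find_desc(const struct argdesc *start,
				       const struct argdesc *stop,
				       syscall_fn fn)
{
	for (; start < stop; start++)
		if (start->fn == fn)
			return start;
	return NULL;
}

/* Syscalls without a descriptor keep a zeroed entry, like the
 * kcalloc()'d array filled by init_argdesc(). */
static struct argdesc *build_index(void)
{
	struct argdesc *idx = calloc(NR_TABLE, sizeof(*idx));
	size_t i;

	if (!idx)
		return NULL;
	for (i = 0; i < NR_TABLE; i++) {
		const struct argdesc *d =
			find_desc(descs, descs + NR_DESCS, table[i]);
		if (d)
			idx[i] = *d;
	}
	return idx;
}
```

The nested scan costs NR_syscalls times the number of descriptors. It runs only once, because init_argdesc() is an __init function.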
* [kernel-hardening] [RFC v1 06/17] seccomp: Add the SECCOMP_ADD_CHECKER_GROUP command
  2016-03-24  1:46 [kernel-hardening] [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing Mickaël Salaün
                   ` (4 preceding siblings ...)
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 05/17] security/seccomp: Add LSM and create arrays of syscall metadata Mickaël Salaün
@ 2016-03-24  1:46 ` Mickaël Salaün
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 07/17] seccomp: Add seccomp object checker evaluation Mickaël Salaün
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24  1:46 UTC (permalink / raw)
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Stephen Smalley,
	Tetsuo Handa, Will Drewry, linux-api, kernel-hardening

A new command SECCOMP_ADD_CHECKER_GROUP allows userland seccomp filters
to reference kernel objects with checkers in a batch.
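The sketch below shows the userland side. It builds a two-checker group with the macros from the UAPI header added below. The definitions are copied from this RFC and do not exist in mainline headers. The paths and the group ID are arbitrary examples. group_header_ok() is a hypothetical helper that mirrors the header checks done by seccomp_add_checker_group(): version 1, a non-zero ID, and 1 to 64 checkers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Copied from this RFC's include/uapi/linux/seccomp.h (not in mainline). */
#define SECCOMP_CHECK_FS_LITERAL	1
#define SECCOMP_CHECK_FS_BENEATH	2
#define SECCOMP_OBJFLAG_FS_DENTRY	(1 << 0)
#define SECCOMP_OBJFLAG_FS_INODE	(1 << 1)
#define SECCOMP_OBJFLAG_FS_DEVICE	(1 << 2)
#define SECCOMP_OBJTYPE_PATH		1

struct seccomp_object_path {
	uint32_t flags;
	const char *path;
};

struct seccomp_checker {
	uint32_t check;
	uint32_t type;
	unsigned int len;
	union {
		const struct seccomp_object_path *object_path;
	};
};

struct seccomp_checker_group {
	uint8_t version;
	uint8_t id;
	unsigned int len;
	const struct seccomp_checker (*checkers)[];
};

#define SECCOMP_MAKE_PATH_DENTRY(_p) \
	{ .flags = SECCOMP_OBJFLAG_FS_DENTRY, .path = _p }
#define SECCOMP_MAKE_PATH_INODE(_p) \
	{ .flags = SECCOMP_OBJFLAG_FS_INODE | SECCOMP_OBJFLAG_FS_DEVICE, \
	  .path = _p }
#define SECCOMP_MAKE_OBJ_PATH(_c, _p) \
	{ .check = SECCOMP_CHECK_##_c, .type = SECCOMP_OBJTYPE_PATH, \
	  .len = 0, .object_path = _p }

/* Example objects: files beneath /etc, and the /tmp inode itself. */
static const struct seccomp_object_path etc_dir =
	SECCOMP_MAKE_PATH_DENTRY("/etc");
static const struct seccomp_object_path tmp_inode =
	SECCOMP_MAKE_PATH_INODE("/tmp");

static const struct seccomp_checker checkers[] = {
	SECCOMP_MAKE_OBJ_PATH(FS_BENEATH, &etc_dir),   /* checker 0 */
	SECCOMP_MAKE_OBJ_PATH(FS_LITERAL, &tmp_inode), /* checker 1 */
};

static const struct seccomp_checker_group group = {
	.version = 1,
	.id = 1, /* 0 is reserved: it means "no evaluated checkers" */
	.len = sizeof(checkers) / sizeof(checkers[0]),
	.checkers = &checkers,
};

/* Hypothetical helper mirroring the header checks of
 * seccomp_add_checker_group(). */
static bool group_header_ok(const struct seccomp_checker_group *g)
{
	return g->version == 1 && g->id != 0 && g->len != 0 && g->len <= 64;
}
```

On a kernel carrying this series, the group would be submitted with syscall(__NR_seccomp, SECCOMP_ADD_CHECKER_GROUP, 0, &group). The caller also needs no_new_privs or CAP_SYS_ADMIN, otherwise the command fails with EACCES.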

Each checker is autonomous and new ones can easily be added in the
future. There are currently two checkers for path objects:
* SECCOMP_CHECK_FS_LITERAL checks if a string matches a defined path;
* SECCOMP_CHECK_FS_BENEATH checks if the path representation of a string
  is equal or equivalent to a file located beneath a defined path.

These checkers can use a bitmask of flags to match a path:
* SECCOMP_OBJFLAG_FS_DENTRY matches a unique file;
* SECCOMP_OBJFLAG_FS_INODE only matches a file inode (must be used with
  the device flag);
* SECCOMP_OBJFLAG_FS_DEVICE matches the device of a file;
* SECCOMP_OBJFLAG_FS_MOUNT matches the mount point of a file;
* SECCOMP_OBJFLAG_FS_NOFOLLOW does not follow a symlink for the
  initial checker evaluation.
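The kernel enforces per-checker flag rules in wrong_check_flags(), in security/seccomp/checker_fs.c below. Those rules can be restated as a small predicate. check_flags_ok() is a hypothetical positive-form rewrite of that function. The constants and masks are copied from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from this RFC's UAPI header (not in mainline). */
#define SECCOMP_CHECK_FS_LITERAL	1
#define SECCOMP_CHECK_FS_BENEATH	2
#define SECCOMP_OBJFLAG_FS_DENTRY	(1 << 0)
#define SECCOMP_OBJFLAG_FS_INODE	(1 << 1)
#define SECCOMP_OBJFLAG_FS_DEVICE	(1 << 2)
#define SECCOMP_OBJFLAG_FS_MOUNT	(1 << 3)
#define SECCOMP_OBJFLAG_FS_NOFOLLOW	(1 << 4)

/* Flags accepted per checker, as in checker_fs.c. */
#define MASK_LITERAL (SECCOMP_OBJFLAG_FS_DENTRY | SECCOMP_OBJFLAG_FS_INODE | \
		      SECCOMP_OBJFLAG_FS_DEVICE | SECCOMP_OBJFLAG_FS_MOUNT | \
		      SECCOMP_OBJFLAG_FS_NOFOLLOW)
#define MASK_BENEATH (SECCOMP_OBJFLAG_FS_DENTRY | SECCOMP_OBJFLAG_FS_INODE | \
		      SECCOMP_OBJFLAG_FS_NOFOLLOW)

/* Positive form of wrong_check_flags(): true when @flags are acceptable. */
static bool check_flags_ok(uint32_t check, uint32_t flags)
{
	uint32_t mask;

	/* An inode number alone is ambiguous: it must come with the device. */
	if ((flags & SECCOMP_OBJFLAG_FS_INODE) &&
	    !(flags & SECCOMP_OBJFLAG_FS_DEVICE))
		return false;

	switch (check) {
	case SECCOMP_CHECK_FS_LITERAL:
		mask = MASK_LITERAL;
		break;
	case SECCOMP_CHECK_FS_BENEATH:
		mask = MASK_BENEATH;
		break;
	default:
		return false;
	}
	/* At least one allowed flag, and no flag outside the mask. */
	return (flags & mask) != 0 && (flags & ~mask) == 0;
}
```

As written, these masks imply that SECCOMP_OBJFLAG_FS_INODE can never be accepted with SECCOMP_CHECK_FS_BENEATH. The device flag it requires lies outside that checker's mask.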

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Drysdale <drysdale@google.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Kerrisk <mtk@man7.org>
Cc: Paul Moore <pmoore@redhat.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Will Drewry <wad@chromium.org>
---
 include/linux/seccomp.h       |  32 ++++++
 include/uapi/linux/seccomp.h  |  81 ++++++++++++++++
 kernel/seccomp.c              | 221 ++++++++++++++++++++++++++++++++++++++++++
 security/seccomp/Makefile     |   2 +-
 security/seccomp/checker_fs.c | 102 +++++++++++++++++++
 5 files changed, 437 insertions(+), 1 deletion(-)
 create mode 100644 security/seccomp/checker_fs.c

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 2296e6b2f690..78f5861a0328 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -9,8 +9,10 @@
 
 #include <linux/thread_info.h>
 #include <asm/seccomp.h>
+#include <linux/path.h>
 
 struct seccomp_filter;
+struct seccomp_filter_checker_group;
 /**
  * struct seccomp - the state of a seccomp'ed process
  *
@@ -19,12 +21,20 @@ struct seccomp_filter;
  * @filter: must always point to a valid seccomp-filter or NULL as it is
  *          accessed without locking during system call entry.
  *
+ * @checker_group: an append-only list of argument checkers usable by filters
+ *                 created after the last update.
+ *
  *          @filter must only be accessed from the context of current as there
  *          is no read locking.
  */
 struct seccomp {
 	int mode;
 	struct seccomp_filter *filter;
+
+#ifdef CONFIG_SECURITY_SECCOMP
+	/* @checker_group is only used for filter creation and unique per thread */
+	struct seccomp_filter_checker_group *checker_group;
+#endif
 };
 
 #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
@@ -85,6 +95,28 @@ static inline int seccomp_mode(struct seccomp *s)
 #ifdef CONFIG_SECCOMP_FILTER
 extern void put_seccomp_filter(struct task_struct *tsk);
 extern void get_seccomp_filter(struct task_struct *tsk);
+
+#ifdef CONFIG_SECURITY_SECCOMP
+struct seccomp_filter_object_path {
+	u32 flags;
+	struct path path;
+};
+
+struct seccomp_filter_checker {
+	/* e.g. SECCOMP_CHECK_FS_LITERAL */
+	u32 check;
+	/* e.g. SECCOMP_OBJTYPE_PATH */
+	u32 type;
+	union {
+		struct seccomp_filter_object_path object_path;
+	};
+};
+
+
+long seccomp_set_argcheck_fs(const struct seccomp_checker *,
+			     struct seccomp_filter_checker *);
+#endif /* CONFIG_SECURITY_SECCOMP */
+
 #else  /* CONFIG_SECCOMP_FILTER */
 static inline void put_seccomp_filter(struct task_struct *tsk)
 {
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 0f238a43ff1e..ca7e9343f3d7 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -13,6 +13,7 @@
 /* Valid operations for seccomp syscall. */
 #define SECCOMP_SET_MODE_STRICT	0
 #define SECCOMP_SET_MODE_FILTER	1
+#define SECCOMP_ADD_CHECKER_GROUP	2 /* add a group of checkers */
 
 /* Valid flags for SECCOMP_SET_MODE_FILTER */
 #define SECCOMP_FILTER_FLAG_TSYNC	1
@@ -35,6 +36,25 @@
 #define SECCOMP_RET_ACTION	0x7fff0000U
 #define SECCOMP_RET_DATA	0x0000ffffU
 
+/* Object checks */
+#define SECCOMP_CHECK_FS_LITERAL	1
+#define SECCOMP_CHECK_FS_BENEATH	2
+
+/* Object flags */
+#define SECCOMP_OBJFLAG_FS_DENTRY	(1 << 0)
+#define SECCOMP_OBJFLAG_FS_INODE	(1 << 1)
+#define SECCOMP_OBJFLAG_FS_DEVICE	(1 << 2)
+#define SECCOMP_OBJFLAG_FS_MOUNT	(1 << 3)
+/* Does the evaluation follow a symlink argument path? (cf. fs/namei.c)
+ * This flag is only used by the seccomp filter, not by the LSM check that
+ * enforces access control. Take care of the different path interpretations
+ * per syscall (e.g. rename(2) or open(2) with O_NOFOLLOW).
+ */
+#define SECCOMP_OBJFLAG_FS_NOFOLLOW	(1 << 4)
+
+/* Argument types */
+#define SECCOMP_OBJTYPE_PATH		1
+
 /**
  * struct seccomp_data - the format the BPF program executes over.
  * @nr: the system call number
@@ -51,4 +71,65 @@ struct seccomp_data {
 	__u64 args[6];
 };
 
+/* TODO: Add a "at" field (default to AT_FDCWD) */
+struct seccomp_object_path {
+	/* e.g. SECCOMP_OBJFLAG_FS_DENTRY */
+	__u32 flags;
+	const char *path;
+};
+
+struct seccomp_checker {
+	__u32 check;
+	__u32 type;
+	/* Must match the checker extra size, if any */
+	unsigned int len;
+	/* Checkers must be pointers to allow future additions */
+	union {
+		const struct seccomp_object_path *object_path;
+	};
+};
+
+#define SECCOMP_MAKE_PATH_DENTRY(_p)				\
+	{							\
+		.flags = SECCOMP_OBJFLAG_FS_DENTRY,		\
+		.path = _p,					\
+	}
+
+#define SECCOMP_MAKE_PATH_INODE(_p)				\
+	{							\
+		.flags = SECCOMP_OBJFLAG_FS_INODE |		\
+			SECCOMP_OBJFLAG_FS_DEVICE,		\
+		.path = _p,					\
+	}
+
+#define SECCOMP_MAKE_PATH_MOUNT(_p)				\
+	{							\
+		.flags = SECCOMP_OBJFLAG_FS_MOUNT,		\
+		.path = _p,					\
+	}
+
+#define SECCOMP_MAKE_PATH_ALL(_p)				\
+	{							\
+		.flags = SECCOMP_OBJFLAG_FS_DENTRY |		\
+			SECCOMP_OBJFLAG_FS_INODE |		\
+			SECCOMP_OBJFLAG_FS_DEVICE |		\
+			SECCOMP_OBJFLAG_FS_MOUNT,		\
+		.path = _p,					\
+	}
+
+#define SECCOMP_MAKE_OBJ_PATH(_c, _p)				\
+	{							\
+		.check = SECCOMP_CHECK_##_c,			\
+		.type = SECCOMP_OBJTYPE_PATH,			\
+		.len = 0,					\
+		.object_path = _p,				\
+	}
+
+struct seccomp_checker_group {
+	__u8 version;
+	__u8 id;
+	unsigned int len;
+	const struct seccomp_checker (*checkers)[];
+};
+
 #endif /* _UAPI_LINUX_SECCOMP_H */
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 2c94693e4163..0e5471d2891c 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -6,6 +6,8 @@
  * Copyright (C) 2012 Google, Inc.
  * Will Drewry <wad@chromium.org>
  *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
  * This defines a simple but solid secure-computing facility.
  *
  * Mode 1 uses a fixed list of allowed system calls.
@@ -60,6 +62,34 @@ struct seccomp_filter {
 	struct bpf_prog *prog;
 };
 
+/* Argument group attached to seccomp filters
+ *
+ * @usage keeps track of the references
+ * @prev links to the previous checker_group
+ * @id is given by userland to easily check a filter statically and not
+ *     leak data from the kernel
+ * @checkers_len is the number of @checkers elements
+ * @checkers contains the checkers
+ *
+ * seccomp_filter_checker_group checkers are organized in a tree linked via the
+ * @prev pointer. For any task, it appears to be a singly-linked list starting
+ * with current->seccomp.checker_group, the most recently added checker
+ * group. All filters created by a process share the checker groups created by
+ * this process before the filter creation, but these groups cannot be changed.
+ * However, multiple checker groups may share a @prev node, which results in a
+ * unidirectional tree existing in memory. They are not inherited through
+ * fork().
+ */
+#ifdef CONFIG_SECURITY_SECCOMP
+struct seccomp_filter_checker_group {
+	atomic_t usage;
+	struct seccomp_filter_checker_group *prev;
+	u8 id;
+	unsigned int checkers_len;
+	struct seccomp_filter_checker checkers[];
+};
+#endif /* CONFIG_SECURITY_SECCOMP */
+
 /* Limit any path through the tree to 256KB worth of instructions. */
 #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
 
@@ -467,6 +497,38 @@ void get_seccomp_filter(struct task_struct *tsk)
 	atomic_inc(&orig->usage);
 }
 
+#ifdef CONFIG_SECURITY_SECCOMP
+/* Do not free @checker */
+static void put_seccomp_obj(struct seccomp_filter_checker *checker)
+{
+	switch (checker->type) {
+	case SECCOMP_OBJTYPE_PATH:
+		/* Pointer checks done in path_put() */
+		path_put(&checker->object_path.path);
+		break;
+	default:
+		WARN_ON(1);
+	}
+}
+
+/* Free @checker_group */
+static void put_seccomp_checker_group(struct seccomp_filter_checker_group *checker_group)
+{
+	int i;
+	struct seccomp_filter_checker_group *orig = checker_group;
+
+	/* Clean up single-reference branches iteratively. */
+	while (orig && atomic_dec_and_test(&orig->usage)) {
+		struct seccomp_filter_checker_group *freeme = orig;
+
+		for (i = 0; i < freeme->checkers_len; i++)
+			put_seccomp_obj(&freeme->checkers[i]);
+		orig = orig->prev;
+		kfree(freeme);
+	}
+}
+#endif /* CONFIG_SECURITY_SECCOMP */
+
 static inline void seccomp_filter_free(struct seccomp_filter *filter)
 {
 	if (filter) {
@@ -485,6 +547,9 @@ void put_seccomp_filter(struct task_struct *tsk)
 		orig = orig->prev;
 		seccomp_filter_free(freeme);
 	}
+#ifdef CONFIG_SECURITY_SECCOMP
+	put_seccomp_checker_group(tsk->seccomp.checker_group);
+#endif
 }
 
 /**
@@ -813,6 +878,158 @@ static inline long seccomp_set_mode_filter(unsigned int flags,
 }
 #endif
 
+#ifdef CONFIG_SECURITY_SECCOMP
+
+/* Limit checkers to 64 so that matches fit in a 64-bit bitmask. */
+#define SECCOMP_CHECKERS_MAX 64
+
+/* Limit the checker group list and its checkers to 256KB. */
+#define SECCOMP_GROUP_CHECKERS_MAX_SIZE (1 << 18)
+
+static long seccomp_add_checker_group(unsigned int flags, const char __user *group)
+{
+	struct seccomp_checker_group kgroup;
+	struct seccomp_checker (*kcheckers)[], *user_checker;
+	struct seccomp_filter_checker_group *filter_group, *walker;
+	struct seccomp_filter_checker *kernel_obj;
+	unsigned int i;
+	unsigned long group_size, kcheckers_size, full_group_size;
+	long result;
+
+	if (!task_no_new_privs(current) &&
+	    security_capable_noaudit(current_cred(),
+				     current_user_ns(), CAP_SYS_ADMIN) != 0)
+		return -EACCES;
+	if (flags != 0 || !group)
+		return -EINVAL;
+
+#ifdef CONFIG_COMPAT
+	if (is_compat_task()) {
+		struct compat_seccomp_checker_group kgroup32;
+
+		if (copy_from_user(&kgroup32, group, sizeof(kgroup32)))
+			return -EFAULT;
+		kgroup.version = kgroup32.version;
+		kgroup.id = kgroup32.id;
+		kgroup.len = kgroup32.len;
+		kgroup.checkers = compat_ptr(kgroup32.checkers);
+	} else			/* Falls through to the if below */
+#endif /* CONFIG_COMPAT */
+	if (copy_from_user(&kgroup, group, sizeof(kgroup)))
+		return -EFAULT;
+
+	if (kgroup.version != 1)
+		return -EINVAL;
+	/* The group ID 0 means no evaluated checkers */
+	if (kgroup.id == 0)
+		return -EINVAL;
+	if (kgroup.len == 0)
+		return -EINVAL;
+	if (kgroup.len > SECCOMP_CHECKERS_MAX)
+		return -E2BIG;
+
+	/* Validate resulting checker_group ID and length. */
+	group_size = sizeof(*filter_group) +
+		kgroup.len * sizeof(filter_group->checkers[0]);
+	full_group_size = group_size;
+	for (walker = current->seccomp.checker_group;
+			walker; walker = walker->prev) {
+		if (walker->id == kgroup.id)
+			return -EINVAL;
+		/* TODO: add penalty? */
+		full_group_size += sizeof(*walker) +
+			walker->checkers_len * sizeof(walker->checkers[0]);
+	}
+	if (full_group_size > SECCOMP_GROUP_CHECKERS_MAX_SIZE)
+		return -ENOMEM;
+
+	kcheckers_size = kgroup.len * sizeof((*kcheckers)[0]);
+	kcheckers = kmalloc(kcheckers_size, GFP_KERNEL);
+	if (!kcheckers)
+		return -ENOMEM;
+
+#ifdef CONFIG_COMPAT
+	if (is_compat_task()) {
+		unsigned int i, kcheckers32_size;
+		struct compat_seccomp_checker (*kcheckers32)[];
+
+		kcheckers32_size = kgroup.len * sizeof((*kcheckers32)[0]);
+		kcheckers32 = kmalloc(kcheckers32_size, GFP_KERNEL);
+		if (!kcheckers32) {
+			result = -ENOMEM;
+			goto free_kcheckers;
+		}
+		if (copy_from_user(kcheckers32, kgroup.checkers, kcheckers32_size)) {
+			kfree(kcheckers32);
+			result = -EFAULT;
+			goto free_kcheckers;
+		}
+		for (i = 0; i < kgroup.len; i++) {
+			(*kcheckers)[i].check = (*kcheckers32)[i].check;
+			(*kcheckers)[i].type = (*kcheckers32)[i].type;
+			(*kcheckers)[i].len = (*kcheckers32)[i].len;
+			(*kcheckers)[i].object_path = compat_ptr((*kcheckers32)[i].checker);
+		}
+		kfree(kcheckers32);
+	} else			/* Falls through to the if below */
+#endif /* CONFIG_COMPAT */
+	if (copy_from_user(kcheckers, kgroup.checkers, kcheckers_size)) {
+		result = -EFAULT;
+		goto free_kcheckers;
+	}
+
+	/* filter_group->checkers must be zeroed to correctly be freed on error */
+	filter_group = kzalloc(group_size, GFP_KERNEL);
+	if (!filter_group) {
+		result = -ENOMEM;
+		goto free_kcheckers;
+	}
+	filter_group->prev = NULL;
+	filter_group->id = kgroup.id;
+	filter_group->checkers_len = kgroup.len;
+	for (i = 0; i < filter_group->checkers_len; i++) {
+		user_checker = &(*kcheckers)[i];
+		kernel_obj = &filter_group->checkers[i];
+		switch (user_checker->check) {
+		case SECCOMP_CHECK_FS_LITERAL:
+		case SECCOMP_CHECK_FS_BENEATH:
+			kernel_obj->check = user_checker->check;
+			result =
+			    seccomp_set_argcheck_fs(user_checker, kernel_obj);
+			if (result)
+				goto free_group;
+			break;
+		default:
+			result = -EINVAL;
+			goto free_group;
+		}
+	}
+
+	atomic_set(&filter_group->usage, 1);
+	filter_group->prev = current->seccomp.checker_group;
+	/* No need to update filter_group->prev->usage because it gets one
+	 * reference from this filter but loses one from
+	 * current->seccomp.checker_group.
+	 */
+	current->seccomp.checker_group = filter_group;
+	/* XXX: Return the number of groups? */
+	result = 0;
+	goto free_kcheckers;
+
+free_group:
+	for (i = 0; i < filter_group->checkers_len; i++) {
+		kernel_obj = &filter_group->checkers[i];
+		if (kernel_obj->type)
+			put_seccomp_obj(kernel_obj);
+	}
+	kfree(filter_group);
+
+free_kcheckers:
+	kfree(kcheckers);
+	return result;
+}
+#endif /* CONFIG_SECURITY_SECCOMP */
+
 /* Common entry point for both prctl and syscall. */
 static long do_seccomp(unsigned int op, unsigned int flags,
 		       const char __user *uargs)
@@ -824,6 +1041,10 @@ static long do_seccomp(unsigned int op, unsigned int flags,
 		return seccomp_set_mode_strict();
 	case SECCOMP_SET_MODE_FILTER:
 		return seccomp_set_mode_filter(flags, uargs);
+#ifdef CONFIG_SECURITY_SECCOMP
+	case SECCOMP_ADD_CHECKER_GROUP:
+		return seccomp_add_checker_group(flags, uargs);
+#endif /* CONFIG_SECURITY_SECCOMP */
 	default:
 		return -EINVAL;
 	}
diff --git a/security/seccomp/Makefile b/security/seccomp/Makefile
index f2e848d81138..1ed68b23a922 100644
--- a/security/seccomp/Makefile
+++ b/security/seccomp/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_SECURITY_SECCOMP) := seccomp.o
 
-seccomp-y := lsm.o
+seccomp-y := lsm.o checker_fs.o
diff --git a/security/seccomp/checker_fs.c b/security/seccomp/checker_fs.c
new file mode 100644
index 000000000000..c11efc892de5
--- /dev/null
+++ b/security/seccomp/checker_fs.c
@@ -0,0 +1,102 @@
+/*
+ * Seccomp Linux Security Module - File System Checkers
+ *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/compat.h>
+#include <linux/namei.h>	/* user_lpath() */
+#include <linux/path.h>
+#include <linux/seccomp.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>	/* copy_from_user() */
+
+#ifdef CONFIG_COMPAT
+/* struct seccomp_object_path */
+struct compat_seccomp_object_path {
+	__u32 flags;
+	compat_uptr_t path;	/* const char * */
+};
+#endif
+
+static const u32 path_flags_mask_literal =
+	SECCOMP_OBJFLAG_FS_DENTRY |
+	SECCOMP_OBJFLAG_FS_INODE |
+	SECCOMP_OBJFLAG_FS_DEVICE |
+	SECCOMP_OBJFLAG_FS_MOUNT |
+	SECCOMP_OBJFLAG_FS_NOFOLLOW;
+
+static const u32 path_flags_mask_beneath =
+	SECCOMP_OBJFLAG_FS_DENTRY |
+	SECCOMP_OBJFLAG_FS_INODE |
+	SECCOMP_OBJFLAG_FS_NOFOLLOW;
+
+/* Return true for any error, or false if flags are OK. */
+static bool wrong_check_flags(u32 check, u32 flags)
+{
+	u32 path_flags_mask;
+
+	/* Do not allow insecure check: inode without device */
+	if ((flags & SECCOMP_OBJFLAG_FS_INODE) &&
+	    !(flags & SECCOMP_OBJFLAG_FS_DEVICE))
+		return true;
+
+	switch (check) {
+	case SECCOMP_CHECK_FS_LITERAL:
+		path_flags_mask = path_flags_mask_literal;
+		break;
+	case SECCOMP_CHECK_FS_BENEATH:
+		path_flags_mask = path_flags_mask_beneath;
+		break;
+	default:
+		WARN_ON(1);
+		return true;
+	}
+	/* Need at least one flag, but only in the allowed mask */
+	return !(flags & path_flags_mask) ||
+		((flags | path_flags_mask) != path_flags_mask);
+}
+
+static long set_argtype_path(const struct seccomp_checker *user_checker,
+			     struct seccomp_filter_checker *kernel_checker)
+{
+	struct seccomp_object_path user_cp;
+
+	/* @len is not used for @object_path */
+	if (user_checker->len != 0)
+		return -EINVAL;
+
+#ifdef CONFIG_COMPAT
+	if (is_compat_task()) {
+		struct compat_seccomp_object_path user_cp32;
+
+		if (copy_from_user(&user_cp32, user_checker->object_path, sizeof(user_cp32)))
+			return -EFAULT;
+		user_cp.flags = user_cp32.flags;
+		user_cp.path = compat_ptr(user_cp32.path);
+	} else			/* Falls through to the if below */
+#endif
+	if (copy_from_user(&user_cp, user_checker->object_path, sizeof(user_cp)))
+		return -EFAULT;
+
+	if (wrong_check_flags(kernel_checker->check, user_cp.flags))
+		return -EINVAL;
+	kernel_checker->object_path.flags = user_cp.flags;
+	/* Do not follow symlinks for objects */
+	return user_lpath(user_cp.path, &kernel_checker->object_path.path);
+}
+
+long seccomp_set_argcheck_fs(const struct seccomp_checker *user_checker,
+			     struct seccomp_filter_checker *kernel_checker)
+{
+	switch (user_checker->type) {
+	case SECCOMP_OBJTYPE_PATH:
+		kernel_checker->type = user_checker->type;
+		return set_argtype_path(user_checker, kernel_checker);
+	}
+	return -EINVAL;
+}
-- 
2.8.0.rc3

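put_seccomp_checker_group() above releases a chain of reference-counted groups. It stops at the first node still used by another branch of the tree described in the struct comment. The userland sketch below follows the same loop. C11 atomics stand in for atomic_t, and struct group is a hypothetical simplification with the checkers omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Simplified stand-in for struct seccomp_filter_checker_group; the
 * checkers[] payload (and its path_put() calls) is omitted. */
struct group {
	atomic_int usage;
	struct group *prev;
	unsigned char id;
};

static int groups_freed; /* counts frees, for illustration only */

/* New group on top of @prev; like seccomp_add_checker_group(), it takes
 * over the caller's reference to @prev instead of adding one. */
static struct group *group_new(unsigned char id, struct group *prev)
{
	struct group *g = calloc(1, sizeof(*g));

	if (!g)
		return NULL;
	atomic_init(&g->usage, 1);
	g->prev = prev;
	g->id = id;
	return g;
}

static void group_get(struct group *g)
{
	if (g)
		atomic_fetch_add(&g->usage, 1);
}

/* Same loop as put_seccomp_checker_group(): free nodes towards the root
 * and stop at the first one still referenced by another branch. */
static void group_put(struct group *g)
{
	while (g && atomic_fetch_sub(&g->usage, 1) == 1) {
		struct group *freeme = g;

		g = g->prev;
		free(freeme);
		groups_freed++;
	}
}
```

atomic_fetch_sub() returns the previous value, so comparing it with 1 plays the role of atomic_dec_and_test().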
* [kernel-hardening] [RFC v1 07/17] seccomp: Add seccomp object checker evaluation
  2016-03-24  1:46 [kernel-hardening] [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing Mickaël Salaün
                   ` (5 preceding siblings ...)
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 06/17] seccomp: Add the SECCOMP_ADD_CHECKER_GROUP command Mickaël Salaün
@ 2016-03-24  1:46 ` Mickaël Salaün
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 08/17] selftest/seccomp: Remove unknown_ret_is_kill_above_allow test Mickaël Salaün
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24  1:46 UTC (permalink / raw)
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Stephen Smalley,
	Tetsuo Handa, Will Drewry, linux-api, kernel-hardening

This adds a new seccomp filter return value named SECCOMP_RET_ARGEVAL.
It is equivalent to SECCOMP_RET_ALLOW except that the next stacked
filters can take a decision regarding a match (e.g. return EACCES or
emulate a syscall).
SECCOMP_RET_ARGEVAL is a special return value only usable between
filter evaluations; it cannot reach userland.

Userland can create a seccomp BPF filter returning SECCOMP_RET_ARGEVAL
with two values in the 16 least significant bits:
* a checker group ID (SECCOMP_RET_CHECKER_GROUP bits) selecting the
  checker batch to run against the current syscall;
* a syscall argument bitmask (SECCOMP_RET_ARG_MATCHES bits) selecting
  which arguments are given to the checkers.
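The packing can be illustrated with the masks this patch adds to the UAPI header. SECCOMP_RET_CHECKER_GROUP covers bits 0-7 and SECCOMP_RET_ARG_MATCHES covers bits 8-13. The helpers below are hypothetical. Mapping bit i of the argument bitmask to syscall argument i is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from this RFC's include/uapi/linux/seccomp.h. */
#define SECCOMP_RET_ARGEVAL		0x80ff0000U
#define SECCOMP_RET_INTER		0xffff0000U
#define SECCOMP_RET_CHECKER_GROUP	0x000000ffU
#define SECCOMP_RET_ARG_MATCHES		0x00003f00U

/* Hypothetical encoder: bit i of @arg_mask selects syscall argument i
 * (an assumption), so only the six low bits survive the mask. */
static uint32_t argeval_ret(uint8_t group_id, uint8_t arg_mask)
{
	return SECCOMP_RET_ARGEVAL |
	       (((uint32_t)arg_mask << 8) & SECCOMP_RET_ARG_MATCHES) |
	       group_id;
}

/* Hypothetical decoders, as the kernel side would need them. */
static bool is_argeval(uint32_t ret)
{
	return (ret & SECCOMP_RET_INTER) == SECCOMP_RET_ARGEVAL;
}

static uint8_t argeval_group(uint32_t ret)
{
	return ret & SECCOMP_RET_CHECKER_GROUP;
}

static uint8_t argeval_args(uint32_t ret)
{
	return (ret & SECCOMP_RET_ARG_MATCHES) >> 8;
}
```

SECCOMP_RET_ARGEVAL sets the top bit. It must therefore be tested with the wider SECCOMP_RET_INTER mask rather than SECCOMP_RET_ACTION (0x7fff0000U), which would strip that bit. For example, a first-stage filter asking for group 1 on the first argument would return the constant 0x80ff0101.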

The checker group IDs are unique per filter creator (i.e. thread). They
remain private and can only be referenced by filters created by the same
userland thread after the checker group.

Three fields are added to struct seccomp_data:
* is_valid_syscall allows seccomp filters to check if the current
  syscall number is known by the kernel;
* checker_group is the checker group ID requested by the previous filter;
* arg_matches[6] consists of one 64-bit mask per syscall argument. These
  bitmasks give the result of each checker from a group against a syscall
  argument.

A path cache is used to protect against time-of-check-to-time-of-use
(TOCTOU) race-condition attacks on userland addresses. For now, this
cache uses the audit framework, which must therefore be enabled at boot
time.
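The cache structures added below key each entry by userland pointer (struct seccomp_argeval_cache_entry with its uptr, args and next fields). The minimal sketch below assumes a lookup-or-insert policy that this excerpt does not show. Each user pointer is resolved once per syscall, and any other argument carrying the same pointer reuses that result instead of re-reading userland memory. The resolver and the resolved value are hypothetical stand-ins for the path lookup. The audit and LSM hooks that provide the actual protection are not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for struct seccomp_argeval_cache_entry: the
 * resolved struct path and hash_len are replaced by one opaque value. */
struct cache_entry {
	const void *uptr;        /* userland address given as an argument */
	uint8_t args;            /* bitmask of the arguments using @uptr */
	long resolved;           /* stand-in for the resolved object */
	struct cache_entry *next;
};

/* Counting stand-in for the path lookup, for illustration only. */
static int resolve_calls;
static long fake_resolve(const void *uptr)
{
	resolve_calls++;
	return (long)((uintptr_t)uptr & 0xffff);
}

/* Hypothetical lookup-or-insert keyed by user pointer. */
static struct cache_entry *cache_get(struct cache_entry **head,
				     const void *uptr, unsigned int argnr,
				     long (*resolve)(const void *))
{
	struct cache_entry *e;

	for (e = *head; e; e = e->next) {
		if (e->uptr == uptr) {
			e->args |= (uint8_t)(1u << argnr);
			return e;
		}
	}
	e = calloc(1, sizeof(*e));
	if (!e)
		return NULL;
	e->uptr = uptr;
	e->args = (uint8_t)(1u << argnr);
	e->resolved = resolve(uptr);
	e->next = *head;
	*head = e;
	return e;
}

/* Drop every entry, e.g. when the syscall completes. */
static void cache_flush(struct cache_entry **head)
{
	while (*head) {
		struct cache_entry *e = *head;

		*head = e->next;
		free(e);
	}
}
```

Keying by user pointer explains the note in the entry comment below: the same dentry may appear in several entries when different user pointers resolve to it.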

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Drysdale <drysdale@google.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Kerrisk <mtk@man7.org>
Cc: Paul Moore <pmoore@redhat.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Will Drewry <wad@chromium.org>
---
 include/linux/seccomp.h       |  77 +++++++++++-
 include/uapi/linux/seccomp.h  |  24 ++++
 kernel/fork.c                 |  11 +-
 kernel/seccomp.c              | 251 ++++++++++++++++++++++++++++++++++++++-
 security/seccomp/checker_fs.c | 269 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 623 insertions(+), 9 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 78f5861a0328..0c5468f78945 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -13,6 +13,8 @@
 
 struct seccomp_filter;
 struct seccomp_filter_checker_group;
+struct seccomp_argeval_cache;
+
 /**
  * struct seccomp - the state of a seccomp'ed process
  *
@@ -34,6 +36,9 @@ struct seccomp {
 #ifdef CONFIG_SECURITY_SECCOMP
 	/* @checker_group is only used for filter creation and unique per thread */
 	struct seccomp_filter_checker_group *checker_group;
+
+	/* syscall-lifetime data */
+	struct seccomp_argeval_cache *arg_cache;
 #endif
 };
 
@@ -93,10 +98,12 @@ static inline int seccomp_mode(struct seccomp *s)
 #endif /* CONFIG_SECCOMP */
 
 #ifdef CONFIG_SECCOMP_FILTER
-extern void put_seccomp_filter(struct task_struct *tsk);
+extern void put_seccomp(struct task_struct *tsk);
 extern void get_seccomp_filter(struct task_struct *tsk);
 
 #ifdef CONFIG_SECURITY_SECCOMP
+extern void flush_seccomp_cache(struct task_struct *tsk);
+
 struct seccomp_filter_object_path {
 	u32 flags;
 	struct path path;
@@ -112,20 +119,86 @@ struct seccomp_filter_checker {
 	};
 };
 
+/** seccomp_argrule_t - Argument rule matcher
+ * e.g. seccomp_argrule_path_literal()
+ * This prototype gets the whole syscall argument picture to be able to infer
+ * semantics from multiple arguments (e.g. a pointer plus the size of the
+ * pointed data, which can be indicated by @checker).
+ *
+ * Return a bitmask of the arguments that match @checker.
+ *
+ * @argdesc: Pointer to the argument type description.
+ * @args: Pointer to an array of the (max) six arguments, interpreted
+ *	thanks to @argdesc.
+ * @to_check: Bitmask of the arguments to check; should have at least one
+ *	bit set to make sense.
+ * @checker: The checker to evaluate against @args.
+ */
+typedef u8 seccomp_argrule_t(const u8(*argdesc)[6],
+			     const u64(*args)[6], u8 to_check,
+			     const struct seccomp_filter_checker *checker);
+
+/* seccomp LSM */
+
+seccomp_argrule_t *get_argrule_checker(u32 check);
+struct syscall_argdesc *syscall_nr_to_argdesc(int nr);
+
+/**
+ * struct seccomp_argeval_cache_fs
+ *
+ * @hash_len: refers to the hash_len field of struct qstr.
+ */
+struct seccomp_argeval_cache_fs {
+	struct path *path;
+	u64 hash_len;
+};
+
+/**
+ * struct seccomp_argeval_cache_entry
+ *
+ * To stay consistent with the filter checks, only the original arguments
+ * are checked, not those set by a tracer process, if any.
+ *
+ * Because the cache is uptr-oriented, it is possible to have the same dentry
+ * in multiple cache entries (but with different uptr).
+ */
+struct seccomp_argeval_cache_entry {
+	const void __user *uptr;
+	u8 args;
+	union {
+		struct seccomp_argeval_cache_fs fs;
+	};
+	struct seccomp_argeval_cache_entry *next;
+};
+
+struct seccomp_argeval_cache {
+	/* e.g. SECCOMP_OBJTYPE_PATH */
+	u32 type;
+	struct seccomp_argeval_cache_entry *entry;
+	struct seccomp_argeval_cache *next;
+};
+
+void put_seccomp_filter_checker(struct seccomp_filter_checker *);
+
+u8 seccomp_argrule_path(const u8(*)[6], const u64(*)[6], u8,
+			const struct seccomp_filter_checker *);
 
 long seccomp_set_argcheck_fs(const struct seccomp_checker *,
 			     struct seccomp_filter_checker *);
+
 #endif /* CONFIG_SECURITY_SECCOMP */
 
 #else  /* CONFIG_SECCOMP_FILTER */
-static inline void put_seccomp_filter(struct task_struct *tsk)
+static inline void put_seccomp(struct task_struct *tsk)
 {
 	return;
 }
+
 static inline void get_seccomp_filter(struct task_struct *tsk)
 {
 	return;
 }
+
 #endif /* CONFIG_SECCOMP_FILTER */
 
 #if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index ca7e9343f3d7..36d9be535249 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -32,9 +32,15 @@
 #define SECCOMP_RET_TRACE	0x7ff00000U /* pass to a tracer or disallow */
 #define SECCOMP_RET_ALLOW	0x7fff0000U /* allow */
 
+/* Intermediate return values */
+#define SECCOMP_RET_ARGEVAL	0x80ff0000U /* trigger argument evaluation */
+
 /* Masks for the return value sections. */
+#define SECCOMP_RET_INTER	0xffff0000U
 #define SECCOMP_RET_ACTION	0x7fff0000U
 #define SECCOMP_RET_DATA	0x0000ffffU
+#define SECCOMP_RET_CHECKER_GROUP	0x000000ffU
+#define SECCOMP_RET_ARG_MATCHES	0x00003f00U
 
 /* Object checks */
 #define SECCOMP_CHECK_FS_LITERAL	1
@@ -57,20 +63,38 @@
 
 /**
  * struct seccomp_data - the format the BPF program executes over.
+ *
+ * Userland can find the seccomp_data version by comparing the struct length
+ * (BPF_LEN) with offsetof(struct seccomp_data, <field>) + sizeof(<field>).
+ *
  * @nr: the system call number
  * @arch: indicates system call convention as an AUDIT_ARCH_* value
  *        as defined in <linux/audit.h>.
  * @instruction_pointer: at the time of the system call.
  * @args: up to 6 system call arguments always stored as 64-bit values
  *        regardless of the architecture.
+ * @is_valid_syscall: set to 1 if the syscall exists and was found or 0
+ *                    otherwise (needed for argument type resolution).
+ * @checker_group: checker group selected by the previously executed filter
+ *                 (only the 8 least significant bits are used).
+ * @arg_matches: 6 bitmasks indicating which argument checkers matched the
+ *               system call arguments.
  */
 struct seccomp_data {
 	int nr;
 	__u32 arch;
 	__u64 instruction_pointer;
 	__u64 args[6];
+	__u32 is_valid_syscall; /* SECCOMP_DATA_VALIDSYS_PRESENT */
+	__u32 checker_group; /* SECCOMP_DATA_ARGEVAL_PRESENT */
+	__u64 arg_matches[6]; /* SECCOMP_DATA_ARGEVAL_PRESENT */
 };
 
+/* Up to seccomp_data.is_valid_syscall */
+#define SECCOMP_DATA_VALIDSYS_PRESENT	1
+/* Up to seccomp_data.arg_matches[6] */
+#define SECCOMP_DATA_ARGEVAL_PRESENT	1
+
 /* TODO: Add a "at" field (default to AT_FDCWD) */
 struct seccomp_object_path {
 	/* e.g. SECCOMP_OBJFLAG_FS_DENTRY */
diff --git a/kernel/fork.c b/kernel/fork.c
index 2e391c754ae7..b8155ebdd308 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -226,7 +226,7 @@ void free_task(struct task_struct *tsk)
 	free_thread_info(tsk->stack);
 	rt_mutex_debug_task_free(tsk);
 	ftrace_graph_exit_task(tsk);
-	put_seccomp_filter(tsk);
+	put_seccomp(tsk);
 	arch_release_task_struct(tsk);
 	free_task_struct(tsk);
 }
@@ -359,7 +359,11 @@ static struct task_struct *dup_task_struct(struct task_struct *orig)
 	 * the usage counts on the error path calling free_task.
 	 */
 	tsk->seccomp.filter = NULL;
-#endif
+#ifdef CONFIG_SECURITY_SECCOMP
+	tsk->seccomp.checker_group = NULL;
+	tsk->seccomp.arg_cache = NULL;
+#endif /* CONFIG_SECURITY_SECCOMP */
+#endif /* CONFIG_SECCOMP */
 
 	setup_thread_stack(tsk, orig);
 	clear_user_return_notifier(tsk);
@@ -1175,7 +1179,8 @@ static void copy_seccomp(struct task_struct *p)
 
 	/* Ref-count the new filter user, and assign it. */
 	get_seccomp_filter(current);
-	p->seccomp = current->seccomp;
+	p->seccomp.mode = current->seccomp.mode;
+	p->seccomp.filter = current->seccomp.filter;
 
 	/*
 	 * Explicitly enable no_new_privs here in case it got set
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 0e5471d2891c..60e11863857e 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -23,6 +23,13 @@
 #include <linux/slab.h>
 #include <linux/syscalls.h>
 
+#include <linux/bitops.h>	/* BIT_ULL() */
+#include <linux/fs.h>
+#include <linux/fs_struct.h>
+#include <linux/mount.h>
+#include <linux/namei.h>	/* user_lpath*() path_put() */
+#include <linux/path.h>
+
 #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
 #include <asm/syscall.h>
 #endif
@@ -34,6 +41,33 @@
 #include <linux/security.h>
 #include <linux/tracehook.h>
 #include <linux/uaccess.h>
+#include <linux/ftrace.h>	/* for arch_syscall_match_sym_name() overload */
+
+#ifdef CONFIG_SECURITY_SECCOMP
+#include <linux/kernel.h>	/* FIELD_SIZEOF() */
+
+#ifdef CONFIG_COMPAT
+/* struct seccomp_checker_group */
+struct compat_seccomp_checker_group {
+	__u8 version;
+	__u8 id;
+	unsigned int len;
+	compat_uptr_t checkers;	/* const struct seccomp_checker (*)[] */
+};
+
+/* struct seccomp_checker */
+struct compat_seccomp_checker {
+	__u32 check;
+	__u32 type;
+	unsigned int len;
+	compat_uptr_t checker;	/* const struct seccomp_object_path * */
+};
+
+extern struct syscall_argdesc (*compat_seccomp_syscalls_argdesc)[];
+#endif /* CONFIG_COMPAT */
+
+extern struct syscall_argdesc (*seccomp_syscalls_argdesc)[];
+#endif /* CONFIG_SECURITY_SECCOMP */
 
 /**
  * struct seccomp_filter - container for seccomp BPF programs
@@ -46,6 +80,8 @@
  * @len: the number of instructions in the program
  * @insnsi: the BPF program instructions to evaluate
  *
+ * @checker_group: the list of argument checkers usable by a filter
+ *
  * seccomp_filter objects are organized in a tree linked via the @prev
  * pointer.  For any task, it appears to be a singly-linked list starting
  * with current->seccomp.filter, the most recently attached or inherited filter.
@@ -60,6 +96,9 @@ struct seccomp_filter {
 	atomic_t usage;
 	struct seccomp_filter *prev;
 	struct bpf_prog *prog;
+#ifdef CONFIG_SECURITY_SECCOMP
+	struct seccomp_filter_checker_group *checker_group;
+#endif
 };
 
 /* Argument group attached to seccomp filters
@@ -93,6 +132,18 @@ struct seccomp_filter_checker_group {
 /* Limit any path through the tree to 256KB worth of instructions. */
 #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
 
+static void clean_seccomp_data(struct seccomp_data *sd)
+{
+	sd->is_valid_syscall = 0;
+	sd->checker_group = 0;
+	sd->arg_matches[0] = 0ULL;
+	sd->arg_matches[1] = 0ULL;
+	sd->arg_matches[2] = 0ULL;
+	sd->arg_matches[3] = 0ULL;
+	sd->arg_matches[4] = 0ULL;
+	sd->arg_matches[5] = 0ULL;
+}
+
 /*
  * Endianness is explicitly ignored and left for BPF program authors to manage
  * as per the specific architecture.
@@ -113,6 +164,7 @@ static void populate_seccomp_data(struct seccomp_data *sd)
 	sd->args[4] = args[4];
 	sd->args[5] = args[5];
 	sd->instruction_pointer = KSTK_EIP(task);
+	clean_seccomp_data(sd);
 }
 
 /**
@@ -197,6 +249,136 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
 	return 0;
 }
 
+#ifdef CONFIG_SECURITY_SECCOMP
+seccomp_argrule_t *get_argrule_checker(u32 check)
+{
+	switch (check) {
+	case SECCOMP_CHECK_FS_LITERAL:
+	case SECCOMP_CHECK_FS_BENEATH:
+		return seccomp_argrule_path;
+	}
+	return NULL;
+}
+
+struct syscall_argdesc *syscall_nr_to_argdesc(int nr)
+{
+	unsigned int nr_syscalls;
+	struct syscall_argdesc (*seccomp_sa)[];
+
+#ifdef CONFIG_COMPAT
+	if (is_compat_task()) {
+		nr_syscalls = IA32_NR_syscalls;
+		seccomp_sa = compat_seccomp_syscalls_argdesc;
+	} else /* falls below */
+#endif	/* CONFIG_COMPAT */
+	{
+		nr_syscalls = NR_syscalls;
+		seccomp_sa = seccomp_syscalls_argdesc;
+	}
+
+	if (nr >= nr_syscalls || nr < 0)
+		return NULL;
+	if (unlikely(!seccomp_sa)) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	return &(*seccomp_sa)[nr];
+}
+
+/* Return the address of the checker group matching the group ID, or NULL if
+ * none matches.
+ */
+static struct seccomp_filter_checker_group *seccomp_update_argrule_data(
+		struct seccomp_filter *filter,
+		struct seccomp_data *sd, u16 ret_data)
+{
+	int i, j;
+	u8 match;
+	struct seccomp_filter_checker_group *walker, *checker_group = NULL;
+	const struct syscall_argdesc *argdesc;
+	struct seccomp_filter_checker *checker;
+	seccomp_argrule_t *engine;
+
+	const u8 group_id = ret_data & SECCOMP_RET_CHECKER_GROUP;
+	const u8 to_check = (ret_data & SECCOMP_RET_ARG_MATCHES) >> 8;
+
+	clean_seccomp_data(sd);
+
+	/* Find the matching group in those accessible to this filter */
+	for (walker = filter->checker_group; walker; walker = walker->prev) {
+		if (walker->id == group_id) {
+			checker_group = walker;
+			break;
+		}
+	}
+	if (!checker_group)
+		return NULL;
+	sd->checker_group = checker_group->id;
+
+	argdesc = syscall_nr_to_argdesc(sd->nr);
+	if (!argdesc)
+		return checker_group;
+	sd->is_valid_syscall = 1;
+
+	for (i = 0; i < checker_group->checkers_len; i++) {
+		checker = &checker_group->checkers[i];
+		engine = get_argrule_checker(checker->check);
+		if (engine) {
+			match = (*engine)(&argdesc->args, &sd->args, to_check, checker);
+
+			for (j = 0; j < 6; j++) {
+				sd->arg_matches[j] |=
+				    ((BIT_ULL(j) & match) >> j) << i;
+			}
+		}
+	}
+	return checker_group;
+}
+
+static void free_seccomp_argeval_cache_entry(u32 type,
+					     struct seccomp_argeval_cache_entry
+					     *entry)
+{
+	while (entry) {
+		struct seccomp_argeval_cache_entry *freeme = entry;
+
+		switch (type) {
+		case SECCOMP_OBJTYPE_PATH:
+			if (entry->fs.path) {
+				/* Pointer checks done in path_put() */
+				path_put(entry->fs.path);
+				kfree(entry->fs.path);
+			}
+			break;
+		default:
+			WARN_ON(1);
+		}
+		entry = entry->next;
+		kfree(freeme);
+	}
+}
+
+static void free_seccomp_argeval_cache(struct seccomp_argeval_cache *arg_cache)
+{
+	while (arg_cache) {
+		struct seccomp_argeval_cache *freeme = arg_cache;
+
+		free_seccomp_argeval_cache_entry(arg_cache->type, arg_cache->entry);
+		arg_cache = arg_cache->next;
+		kfree(freeme);
+	}
+}
+
+void flush_seccomp_cache(struct task_struct *tsk)
+{
+	free_seccomp_argeval_cache(tsk->seccomp.arg_cache);
+	tsk->seccomp.arg_cache = NULL;
+}
+#endif /* CONFIG_SECURITY_SECCOMP */
+
+static void put_seccomp_filter(struct task_struct *tsk);
+
 /**
  * seccomp_run_filters - evaluates all seccomp filters against @syscall
  * @syscall: number of the current system call
@@ -205,6 +387,9 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
  */
 static u32 seccomp_run_filters(struct seccomp_data *sd)
 {
+#ifdef CONFIG_SECURITY_SECCOMP
+	struct seccomp_filter_checker_group *walker, *arg_match = NULL;
+#endif
 	struct seccomp_data sd_local;
 	u32 ret = SECCOMP_RET_ALLOW;
 	/* Make sure cross-thread synced filter points somewhere sane. */
@@ -219,17 +404,54 @@ static u32 seccomp_run_filters(struct seccomp_data *sd)
 		populate_seccomp_data(&sd_local);
 		sd = &sd_local;
 	}
+#ifdef CONFIG_SECURITY_SECCOMP
+	/* Cleanup old (syscall-lifetime) cache */
+	flush_seccomp_cache(current);
+#endif
 
 	/*
 	 * All filters in the list are evaluated and the lowest BPF return
 	 * value always takes priority (ignoring the DATA).
 	 */
 	for (; f; f = f->prev) {
-		u32 cur_ret = BPF_PROG_RUN(f->prog, (void *)sd);
+		u32 cur_ret;
+
+#ifdef CONFIG_SECURITY_SECCOMP
+		if (arg_match) {
+			bool found = false;
+
+			/* Find if the argument group is accessible from this filter */
+			for (walker = f->checker_group; walker; walker = walker->prev) {
+				if (walker == arg_match) {
+					found = true;
+					break;
+				}
+			}
+			if (!found)
+				clean_seccomp_data(sd);
+		}
+#endif /* CONFIG_SECURITY_SECCOMP */
+		cur_ret = BPF_PROG_RUN(f->prog, (void *)sd);
 
-		if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
+#ifdef CONFIG_SECURITY_SECCOMP
+		/* Intermediate return values */
+		if ((cur_ret & SECCOMP_RET_INTER) == SECCOMP_RET_ARGEVAL) {
+			/* XXX: sd modification /!\ */
+			arg_match = seccomp_update_argrule_data(f, sd,
+					(cur_ret & SECCOMP_RET_DATA));
+		} else if (arg_match) {
+			clean_seccomp_data(sd);
+			arg_match = NULL;
+		}
+#endif /* CONFIG_SECURITY_SECCOMP */
+
+		if ((cur_ret & SECCOMP_RET_INTER) < (ret & SECCOMP_RET_ACTION))
 			ret = cur_ret;
 	}
+#ifdef CONFIG_SECURITY_SECCOMP
+	if (arg_match && sd != &sd_local)
+		clean_seccomp_data(sd);
+#endif /* CONFIG_SECURITY_SECCOMP */
 	return ret;
 }
 #endif /* CONFIG_SECCOMP_FILTER */
@@ -407,6 +629,13 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 		return ERR_PTR(ret);
 	}
 
+#ifdef CONFIG_SECURITY_SECCOMP
+	sfilter->checker_group =
+		lockless_dereference(current->seccomp.checker_group);
+	if (sfilter->checker_group)
+		atomic_inc(&sfilter->checker_group->usage);
+#endif /* CONFIG_SECURITY_SECCOMP */
+
 	atomic_set(&sfilter->usage, 1);
 
 	return sfilter;
@@ -532,13 +761,16 @@ static void put_seccomp_checker_group(struct seccomp_filter_checker_group *check
 static inline void seccomp_filter_free(struct seccomp_filter *filter)
 {
 	if (filter) {
+#ifdef CONFIG_SECURITY_SECCOMP
+		put_seccomp_checker_group(filter->checker_group);
+#endif /* CONFIG_SECURITY_SECCOMP */
 		bpf_prog_destroy(filter->prog);
 		kfree(filter);
 	}
 }
 
 /* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
-void put_seccomp_filter(struct task_struct *tsk)
+static void put_seccomp_filter(struct task_struct *tsk)
 {
 	struct seccomp_filter *orig = tsk->seccomp.filter;
 	/* Clean up single-reference branches iteratively. */
@@ -547,7 +779,14 @@ void put_seccomp_filter(struct task_struct *tsk)
 		orig = orig->prev;
 		seccomp_filter_free(freeme);
 	}
+}
+
+void put_seccomp(struct task_struct *tsk)
+{
+	put_seccomp_filter(tsk);
 #ifdef CONFIG_SECURITY_SECCOMP
+	/* Free in that order because of referenced checkers */
+	free_seccomp_argeval_cache(tsk->seccomp.arg_cache);
 	put_seccomp_checker_group(tsk->seccomp.checker_group);
 #endif
 }
@@ -673,6 +912,9 @@ static u32 __seccomp_phase1_filter(int this_syscall, struct seccomp_data *sd)
 	case SECCOMP_RET_TRACE:
 		return filter_ret;  /* Save the rest for phase 2. */
 
+	case SECCOMP_RET_ARGEVAL:
+		/* Handled in seccomp_run_filters() */
+		BUG();
 	case SECCOMP_RET_ALLOW:
 		return SECCOMP_PHASE1_OK;
 
@@ -881,7 +1123,8 @@ static inline long seccomp_set_mode_filter(unsigned int flags,
 #ifdef CONFIG_SECURITY_SECCOMP
 
 /* Limit checkers number to 64 to be able to show matches with a bitmask. */
-#define SECCOMP_CHECKERS_MAX 64
+#define SECCOMP_CHECKERS_MAX \
+	(FIELD_SIZEOF(struct seccomp_data, arg_matches[0]) * BITS_PER_BYTE)
 
 /* Limit arg group list and their checkers to 256KB. */
 #define SECCOMP_GROUP_CHECKERS_MAX_SIZE (1 << 18)
diff --git a/security/seccomp/checker_fs.c b/security/seccomp/checker_fs.c
index c11efc892de5..0a5ec3a204e7 100644
--- a/security/seccomp/checker_fs.c
+++ b/security/seccomp/checker_fs.c
@@ -8,11 +8,13 @@
  * published by the Free Software Foundation.
  */
 
+#include <linux/bitops.h>	/* BIT() */
 #include <linux/compat.h>
 #include <linux/namei.h>	/* user_lpath() */
 #include <linux/path.h>
 #include <linux/seccomp.h>
 #include <linux/slab.h>
+#include <linux/syscalls.h>	/* __SACT__CONST_CHAR_PTR */
 #include <linux/uaccess.h>	/* copy_from_user() */
 
 #ifdef CONFIG_COMPAT
@@ -61,6 +63,273 @@ static bool wrong_check_flags(u32 check, u32 flags)
 		((flags | path_flags_mask) != path_flags_mask);
 }
 
+/* This cache prevents TOCTOU race conditions between seccomp filter checks
+ * and LSM hook checks: the dereferenced data is kept in cache and only
+ * dereferenced once for the whole syscall lifetime.
+ *
+ * Ignore @follow_symlink if @str_path matches a cache entry (i.e. do not
+ * store @follow_symlink in the cache).
+ */
+static const struct path *get_cache_path(const char __user *str_path,
+					 bool follow_symlink, u8 arg_nr)
+{
+	struct path *path = NULL;
+	u64 hash_len = 0;
+	struct filename *name;
+	struct seccomp_argeval_cache_entry **entry = NULL;
+	struct seccomp_argeval_cache **arg_cache = &current->seccomp.arg_cache;
+	bool new_cache = false;
+
+	/* Find a cache entry matching @str_path */
+	while (*arg_cache) {
+		if ((*arg_cache)->type == SECCOMP_OBJTYPE_PATH) {
+			entry = &(*arg_cache)->entry;
+			while (*entry) {
+				if ((*entry)->uptr == str_path) {
+					/* Add this argument to the cache */
+					(*entry)->args |= BIT(arg_nr);
+					return (*entry)->fs.path;
+				}
+				entry = &(*entry)->next;
+			}
+			break;
+		}
+		arg_cache = &(*arg_cache)->next;
+	}
+
+	/* Save @str_path in the audit_names list of the current audit context
+	 * (cf. __audit_reusename) to avoid a TOCTOU race condition on the
+	 * syscall argument.
+	 * @name will be freed with audit_syscall_exit(), audit_free() or
+	 * audit_seccomp_exit().
+	 */
+	name = getname(str_path);
+	if (IS_ERR(name))
+		return NULL;
+
+	path = kmalloc(sizeof(*path), GFP_KERNEL);
+	if (path) {
+		int ret;
+
+		/* @follow_symlink is only evaluated for the first cache entry */
+		if (follow_symlink)
+			ret = user_path(str_path, path);
+		else
+			ret = user_lpath(str_path, path);
+		if (ret) {
+			/* Store invalid path entry as well */
+			kfree(path);
+			path = NULL;
+		} else {
+			/* FIXME: How not to make this racy because of possible
+			 * concurrent dentry update by other task?
+			 */
+			hash_len = path->dentry->d_name.hash_len;
+		}
+	} else {
+		return NULL;
+	}
+
+	/* Append a new cache entry for @str_path */
+	if (!*arg_cache) {
+		*arg_cache = kmalloc(sizeof(**arg_cache), GFP_KERNEL);
+		if (!*arg_cache)
+			goto free_path;
+		new_cache = true;
+		(*arg_cache)->type = SECCOMP_OBJTYPE_PATH;
+		(*arg_cache)->next = NULL;
+		entry = &(*arg_cache)->entry;
+	}
+	*entry = kmalloc(sizeof(**entry), GFP_KERNEL);
+	if (!*entry)
+		goto free_cache;
+	(*entry)->uptr = str_path;
+	(*entry)->args = BIT(arg_nr);
+	(*entry)->fs.path = path;
+	(*entry)->fs.hash_len = hash_len;
+	(*entry)->next = NULL;
+	return (*entry)->fs.path;
+
+free_cache:
+	if (new_cache) {
+		/* It is not mandatory to free the cache because it is linked */
+		kfree(*arg_cache);
+		*arg_cache = NULL;
+	}
+
+free_path:
+	kfree(path);
+	return NULL;
+}
+
+#define EQUAL_NOT_NULL(a, b) (a && a == b)
+
+static bool check_path_literal(const struct path *p1, const struct path *p2,
+			       u32 flags)
+{
+	bool result_dentry = !(flags & SECCOMP_OBJFLAG_FS_DENTRY);
+	bool result_inode = !(flags & SECCOMP_OBJFLAG_FS_INODE);
+	bool result_device = !(flags & SECCOMP_OBJFLAG_FS_DEVICE);
+	bool result_mount = !(flags & SECCOMP_OBJFLAG_FS_MOUNT);
+
+	if (unlikely(!p1 || !p2)) {
+		WARN_ON(1);
+		return false;
+	}
+
+	if (!result_dentry && p1->dentry == p2->dentry)
+		result_dentry = true;
+	/* XXX: Use d_inode_rcu() instead? */
+	if (!result_inode
+	    && EQUAL_NOT_NULL(d_inode(p1->dentry)->i_ino,
+			      d_inode(p2->dentry)->i_ino))
+		result_inode = true;
+	/* Check superblock instead of device major/minor */
+	if (!result_device
+	    && EQUAL_NOT_NULL(d_inode(p1->dentry)->i_sb,
+			      d_inode(p2->dentry)->i_sb))
+		result_device = true;
+	if (!result_mount && EQUAL_NOT_NULL(p1->mnt, p2->mnt))
+		result_mount = true;
+
+	return result_dentry && result_inode && result_device && result_mount;
+}
+
+static bool check_path_beneath(const struct path *p1, const struct path *p2,
+			       u32 flags)
+{
+	struct path walker = {
+		/* Mount can't be checked here */
+		.mnt = NULL,
+		.dentry = NULL,
+	};
+
+	if (unlikely(!p1 || !p2)) {
+		WARN_ON(1);
+		return false;
+	}
+
+	/* Meaningless mount and device checks are not in flags thanks to
+	 * the previous call to wrong_check_flags().
+	 */
+	if (unlikely((flags | path_flags_mask_beneath)
+				!= path_flags_mask_beneath)) {
+		WARN_ON(1);
+		return false;
+	}
+
+	for (walker.dentry = p2->dentry; !IS_ROOT(walker.dentry);
+			walker.dentry = walker.dentry->d_parent) {
+		if (check_path_literal(p1, &walker, flags))
+			return true;
+	}
+	return false;
+}
+
+/* Must be called with a locked path->dentry */
+static bool argrule_match_path(const struct seccomp_filter_checker *checker,
+			       const struct path *arg)
+{
+	const struct seccomp_filter_object_path *object_path;
+
+	if (unlikely(!checker || !arg)) {
+		WARN_ON(1);
+		return false;
+	}
+
+	switch (checker->type) {
+	case SECCOMP_OBJTYPE_PATH:
+		object_path = &checker->object_path;
+		if (unlikely(!object_path->path.dentry)) {
+			WARN_ON(1);
+			return false;
+		}
+
+		/* Comparing mnt+pathname is not enough because pivot_root can
+		 * remove a path prefix; could be used to allow access to a
+		 * subdirectory with bind mounting and pivot-rooting to
+		 * simulate the initial mnt+pathname configuration.
+		 *
+		 * The check should allow comparing bind-mounted files while
+		 * keeping the user's path semantics.
+		 */
+		switch (checker->check) {
+		case SECCOMP_CHECK_FS_LITERAL:
+			return check_path_literal(&object_path->path, arg,
+						  object_path->flags);
+		case SECCOMP_CHECK_FS_BENEATH:
+			return check_path_beneath(&object_path->path, arg,
+						  object_path->flags);
+		default:
+			WARN_ON(1);
+			return false;
+		}
+	default:
+		WARN_ON(1);
+	}
+	return false;
+}
+
+/* Return a bitmask of the arguments matched by @checker. */
+u8 seccomp_argrule_path(const u8(*argdesc)[6], const u64(*args)[6],
+			u8 to_check,
+			const struct seccomp_filter_checker *checker)
+{
+	int i;
+	const char __user *str_path;
+	const struct path *path;
+	u8 ret = 0;
+	bool follow_symlink;
+
+	if (unlikely(!argdesc || !args || !checker)) {
+		WARN_ON(1);
+		goto out;
+	}
+	switch (checker->check) {
+	case SECCOMP_CHECK_FS_LITERAL:
+	case SECCOMP_CHECK_FS_BENEATH:
+		break;
+	default:
+		WARN_ON(1);
+		goto out;
+	}
+
+	if (wrong_check_flags(checker->check, checker->object_path.flags)) {
+		WARN_ON(1);
+		goto out;
+	}
+	follow_symlink = !(checker->object_path.flags & SECCOMP_OBJFLAG_FS_NOFOLLOW);
+
+	/* XXX: Add a whole cache lock? */
+	for (i = 0; i < 6; i++) {
+		if (!(BIT(i) & to_check))
+			continue;
+		if ((*argdesc)[i] != __SACT__CONST_CHAR_PTR)
+			continue;
+#ifdef CONFIG_COMPAT
+		if (is_compat_task()) {
+			str_path = compat_ptr((*args)[i]);
+		} else	/* Falls below */
+#endif
+		str_path = (const char __user *)((unsigned long)(*args)[i]);
+		/* Paths are interpreted differently by each syscall: some
+		 * follow symlinks whereas others don't (cf.
+		 * linux/namei.h:user_*path*).
+		 */
+		/* XXX: Do we need to check mnt/namespace? */
+		path = get_cache_path(str_path, follow_symlink, i);
+		if (!path)
+			continue;
+		spin_lock(&path->dentry->d_lock);
+		if (argrule_match_path(checker, path))
+			ret |= BIT(i);
+		spin_unlock(&path->dentry->d_lock);
+	}
+
+out:
+	return ret;
+}
+
 static long set_argtype_path(const struct seccomp_checker *user_checker,
 			     struct seccomp_filter_checker *kernel_checker)
 {
-- 
2.8.0.rc3


* [kernel-hardening] [RFC v1 08/17] selftest/seccomp: Remove unknown_ret_is_kill_above_allow test
  2016-03-24  1:46 [kernel-hardening] [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing Mickaël Salaün
                   ` (6 preceding siblings ...)
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 07/17] seccomp: Add seccomp object checker evaluation Mickaël Salaün
@ 2016-03-24  1:46 ` Mickaël Salaün
  2016-03-24  2:53 ` [kernel-hardening] [RFC v1 09/17] selftest/seccomp: Extend seccomp_data until matches[6] Mickaël Salaün
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24  1:46 UTC (permalink / raw)
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Stephen Smalley,
	Tetsuo Handa, Will Drewry, linux-api, kernel-hardening

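This test is no longer relevant: return values with the high bit set are now
reserved for intermediate values (see SECCOMP_RET_INTER) instead of being
treated as an unknown action that kills the task.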

Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 22 ----------------------
 1 file changed, 22 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 150829dd7998..023717bf3185 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -363,28 +363,6 @@ TEST_SIGNAL(unknown_ret_is_kill_inside, SIGSYS)
 	}
 }
 
-/* return code >= 0x80000000 is unused. */
-TEST_SIGNAL(unknown_ret_is_kill_above_allow, SIGSYS)
-{
-	struct sock_filter filter[] = {
-		BPF_STMT(BPF_RET|BPF_K, 0x90000000U),
-	};
-	struct sock_fprog prog = {
-		.len = (unsigned short)ARRAY_SIZE(filter),
-		.filter = filter,
-	};
-	long ret;
-
-	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
-	ASSERT_EQ(0, ret);
-
-	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
-	ASSERT_EQ(0, ret);
-	EXPECT_EQ(0, syscall(__NR_getpid)) {
-		TH_LOG("getpid() shouldn't ever return");
-	}
-}
-
 TEST_SIGNAL(KILL_all, SIGSYS)
 {
 	struct sock_filter filter[] = {
-- 
2.8.0.rc3


* [kernel-hardening] [RFC v1 09/17] selftest/seccomp: Extend seccomp_data until matches[6]
  2016-03-24  1:46 [kernel-hardening] [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing Mickaël Salaün
                   ` (7 preceding siblings ...)
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 08/17] selftest/seccomp: Remove unknown_ret_is_kill_above_allow test Mickaël Salaün
@ 2016-03-24  2:53 ` Mickaël Salaün
  2016-03-24  2:53   ` [kernel-hardening] [RFC v1 10/17] selftest/seccomp: Add field_is_valid_syscall test Mickaël Salaün
                     ` (7 more replies)
  2016-03-24 16:24 ` [kernel-hardening] Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing Kees Cook
                   ` (2 subsequent siblings)
  11 siblings, 8 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24  2:53 UTC (permalink / raw)
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Stephen Smalley,
	Tetsuo Handa, Will Drewry, linux-api, kernel-hardening

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <pmoore@redhat.com>
Cc: Will Drewry <wad@chromium.org>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 023717bf3185..edaa405111aa 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -84,13 +84,21 @@ struct seccomp_data {
 	__u32 arch;
 	__u64 instruction_pointer;
 	__u64 args[6];
+	__u32 is_valid_syscall; /* SECCOMP_DATA_VALIDSYS_PRESENT */
+	__u32 checker_group; /* SECCOMP_DATA_ARGEVAL_PRESENT */
+	__u64 arg_matches[6]; /* SECCOMP_DATA_ARGEVAL_PRESENT */
 };
+
+#define SECCOMP_DATA_ARGEVAL_PRESENT
 #endif
 
 #if __BYTE_ORDER == __LITTLE_ENDIAN
 #define syscall_arg(_n) (offsetof(struct seccomp_data, args[_n]))
+#define match_arg(_n) (offsetof(struct seccomp_data, arg_matches[_n]))
 #elif __BYTE_ORDER == __BIG_ENDIAN
 #define syscall_arg(_n) (offsetof(struct seccomp_data, args[_n]) + sizeof(__u32))
+#define match_arg(_n) \
+	(offsetof(struct seccomp_data, arg_matches[_n]) + sizeof(__u32))
 #else
 #error "wut? Unknown __BYTE_ORDER?!"
 #endif
@@ -502,7 +510,11 @@ TEST_SIGNAL(KILL_one_arg_six, SIGSYS)
 TEST(arg_out_of_range)
 {
 	struct sock_filter filter[] = {
+#ifdef SECCOMP_DATA_ARGEVAL_PRESENT
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS, match_arg(6)),
+#else
 		BPF_STMT(BPF_LD|BPF_W|BPF_ABS, syscall_arg(6)),
+#endif
 		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
 	};
 	struct sock_fprog prog = {
-- 
2.8.0.rc3


* [kernel-hardening] [RFC v1 10/17] selftest/seccomp: Add field_is_valid_syscall test
  2016-03-24  2:53 ` [kernel-hardening] [RFC v1 09/17] selftest/seccomp: Extend seccomp_data until matches[6] Mickaël Salaün
@ 2016-03-24  2:53   ` Mickaël Salaün
  2016-03-24  2:53   ` [kernel-hardening] [RFC v1 11/17] selftest/seccomp: Add argeval_open_whitelist test Mickaël Salaün
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24  2:53 UTC (permalink / raw)
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Stephen Smalley,
	Tetsuo Handa, Will Drewry, linux-api, kernel-hardening

Test the new seccomp_data field: is_valid_syscall.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <pmoore@redhat.com>
Cc: Will Drewry <wad@chromium.org>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 31 +++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index edaa405111aa..8b1a6bfc64a1 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -2208,6 +2208,37 @@ TEST(syscall_restart)
 		_metadata->passed = 0;
 }
 
+#ifdef SECCOMP_DATA_ARGEVAL_PRESENT
+TEST(field_is_valid_syscall)
+{
+	struct sock_filter filter[] = {
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
+				offsetof(struct seccomp_data, nr)),
+		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_getpid, 1, 0),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
+				offsetof(struct seccomp_data, is_valid_syscall)),
+		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, 1, 1, 0),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | EINVAL),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+	};
+	struct sock_fprog prog = {
+		.len = (unsigned short)ARRAY_SIZE(filter),
+		.filter = filter,
+	};
+
+	ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+		TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+	}
+	EXPECT_EQ(0, seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog)) {
+		TH_LOG("Failed to install filter!");
+	}
+
+	EXPECT_EQ(-1, syscall(__NR_getpid));
+	EXPECT_EQ(EINVAL, errno);
+}
+#endif /* SECCOMP_DATA_ARGEVAL_PRESENT */
+
 /*
  * TODO:
  * - add microbenchmarks
-- 
2.8.0.rc3


* [kernel-hardening] [RFC v1 11/17] selftest/seccomp: Add argeval_open_whitelist test
  2016-03-24  2:53 ` [kernel-hardening] [RFC v1 09/17] selftest/seccomp: Extend seccomp_data until matches[6] Mickaël Salaün
  2016-03-24  2:53   ` [kernel-hardening] [RFC v1 10/17] selftest/seccomp: Add field_is_valid_syscall test Mickaël Salaün
@ 2016-03-24  2:53   ` Mickaël Salaün
  2016-03-24  2:53   ` [kernel-hardening] [RFC v1 12/17] audit,seccomp: Extend audit with seccomp state Mickaël Salaün
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24  2:53 UTC (permalink / raw)
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Stephen Smalley,
	Tetsuo Handa, Will Drewry, linux-api, kernel-hardening

Test a basic sandbox adding a checker group and using it for path
filtering.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <pmoore@redhat.com>
Cc: Will Drewry <wad@chromium.org>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 114 ++++++++++++++++++++++++++
 1 file changed, 114 insertions(+)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 8b1a6bfc64a1..49c5d39c30a4 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -2237,6 +2237,120 @@ TEST(field_is_valid_syscall)
 	EXPECT_EQ(-1, syscall(__NR_getpid));
 	EXPECT_EQ(EINVAL, errno);
 }
+
+#define PATH_DEV_NULL "/dev/null"
+#define PATH_DEV_ZERO "/dev/zero"
+
+/* sandbox0 only allows opening @allowed_path */
+void apply_sandbox0(struct __test_metadata *_metadata, const char *allowed_path)
+{
+	struct sock_filter filter0[] = {
+		/* Only care about open(2) */
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
+				offsetof(struct seccomp_data, nr)),
+		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_open, 0, 1),
+		/* Check the first argument against the checkers of group 5 */
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ARGEVAL | 1 << 8 | 5),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+	};
+	struct sock_fprog prog0 = {
+		.len = (unsigned short)ARRAY_SIZE(filter0),
+		.filter = filter0,
+	};
+	struct sock_filter filter1[] = {
+		/* There is no need to check the arch or the syscall number
+		 * because of the @checker_group check.
+		 */
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
+				offsetof(struct seccomp_data, checker_group)),
+		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, 5, 1, 0),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+		/* Kill if not a valid syscall (unknown open‽) */
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
+				offsetof(struct seccomp_data, is_valid_syscall)),
+		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, 1, 1, 0),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_KILL),
+		/* Deny access if the first argument was not validated by the
+		 * checker.
+		 */
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS, match_arg(0)),
+		/* Match the first two checkers, if any */
+		BPF_JUMP(BPF_JMP|BPF_JSET|BPF_K, 3, 0, 1),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+		/* Use an impossible errno value to ensure it comes from our
+		 * filter (should be EACCES most of the time).
+		 */
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | E2BIG),
+	};
+	struct sock_fprog prog1 = {
+		.len = (unsigned short)ARRAY_SIZE(filter1),
+		.filter = filter1,
+	};
+	struct seccomp_object_path path0 = SECCOMP_MAKE_PATH_DENTRY(allowed_path);
+	struct seccomp_checker checker0[] = {
+		SECCOMP_MAKE_OBJ_PATH(FS_LITERAL, &path0),
+	};
+	/* Group 5 */
+	struct seccomp_checker_group checker_group0 = {
+		.version = 1,
+		.id = 5,
+		.len = ARRAY_SIZE(checker0),
+		.checkers = &checker0,
+	};
+
+	/* Set up the test sandbox */
+	ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+		TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+	}
+	/* Load the path checkers */
+	EXPECT_EQ(0, seccomp(SECCOMP_ADD_CHECKER_GROUP, 0, &checker_group0)) {
+		TH_LOG("Failed to add checker group!");
+	}
+	/* Load filters in reverse order */
+	EXPECT_EQ(0, seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog1)) {
+		TH_LOG("Failed to install filter!");
+	}
+	EXPECT_EQ(0, seccomp(SECCOMP_SET_MODE_FILTER,
+				SECCOMP_FILTER_FLAG_TSYNC, &prog0)) {
+		TH_LOG("Failed to install filter!");
+	}
+}
+
+TEST(argeval_open_whitelist)
+{
+	int fd;
+
+	/* Validate the first test file */
+	fd = open(PATH_DEV_ZERO, O_RDONLY);
+	EXPECT_NE(-1, fd) {
+		TH_LOG("Failed to open " PATH_DEV_ZERO);
+	}
+	close(fd);
+
+	/* Validate the second test file */
+	fd = open(PATH_DEV_NULL, O_RDONLY);
+	EXPECT_NE(-1, fd) {
+		TH_LOG("Failed to open " PATH_DEV_NULL);
+	}
+	close(fd);
+
+	apply_sandbox0(_metadata, PATH_DEV_ZERO);
+
+	/* Allowed file */
+	fd = open(PATH_DEV_ZERO, O_RDONLY);
+	EXPECT_NE(-1, fd) {
+		TH_LOG("Failed to open " PATH_DEV_ZERO);
+	}
+	close(fd);
+
+	/* Denied file (by the filter) */
+	fd = open(PATH_DEV_NULL, O_RDONLY);
+	EXPECT_EQ(-1, fd) {
+		TH_LOG("Could open " PATH_DEV_NULL);
+	}
+	EXPECT_EQ(E2BIG, errno);
+	close(fd);
+}
 #endif /* SECCOMP_DATA_ARGEVAL_PRESENT */
 
 /*
-- 
2.8.0.rc3

* [kernel-hardening] [RFC v1 12/17] audit,seccomp: Extend audit with seccomp state
  2016-03-24  2:53 ` [kernel-hardening] [RFC v1 09/17] selftest/seccomp: Extend seccomp_data until matches[6] Mickaël Salaün
  2016-03-24  2:53   ` [kernel-hardening] [RFC v1 10/17] selftest/seccomp: Add field_is_valid_syscall test Mickaël Salaün
  2016-03-24  2:53   ` [kernel-hardening] [RFC v1 11/17] selftest/seccomp: Add argeval_open_whitelist test Mickaël Salaün
@ 2016-03-24  2:53   ` Mickaël Salaün
  2016-03-24  2:53   ` [kernel-hardening] [RFC v1 13/17] selftest/seccomp: Rename TRACE_poke to TRACE_poke_sys_read Mickaël Salaün
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24  2:53 UTC (permalink / raw)
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Stephen Smalley,
	Tetsuo Handa, Will Drewry, linux-api, kernel-hardening

Extend the audit framework to know whether we are in a seccomp filter
evaluation.
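
The intended pairing can be sketched in user space. This is a simplified model, not the kernel code. The names `struct ctx`, `seccomp_entry()`, `seccomp_exit()` and `run_filter()` are hypothetical stand-ins for the audit context, for audit_seccomp_entry()/audit_seccomp_exit(), and for __seccomp_phase1_filter().

The flag is set on entry and cleared on every exit path. Cached names are dropped only when the syscall is skipped (do_free = 1). Otherwise they are kept for the syscall itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct audit_context. */
struct ctx {
	bool in_seccomp;	/* true while seccomp filters are evaluated */
	int name_count;		/* number of cached path names */
};

void seccomp_entry(struct ctx *c)
{
	/* Nesting or leftover names would be a bug (BUG_ON in the patch). */
	assert(!c->in_seccomp && c->name_count == 0);
	c->in_seccomp = true;
}

void seccomp_exit(struct ctx *c, bool do_free)
{
	assert(c->in_seccomp);
	c->in_seccomp = false;
	if (do_free)
		c->name_count = 0;	/* stands in for audit_free_names() */
}

/* One filter run: a path is resolved and cached during evaluation, then
 * the syscall is either allowed (names kept) or skipped (names dropped),
 * with one exit path per outcome like the goto labels in the patch. */
int run_filter(struct ctx *c, bool allow)
{
	seccomp_entry(c);
	c->name_count++;
	if (!allow)
		goto skip;
	seccomp_exit(c, false);
	return 0;
skip:
	seccomp_exit(c, true);
	return -1;
}
```

Routing every return through a dedicated label, as the patch does with its audit_exit and skip labels, keeps the flag from leaking into the next filter run.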

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <pmoore@redhat.com>
Cc: Will Drewry <wad@chromium.org>
---
 include/linux/audit.h | 25 +++++++++++++++++++++++++
 kernel/audit.h        |  3 +++
 kernel/auditsc.c      | 36 ++++++++++++++++++++++++++++++++++--
 kernel/seccomp.c      | 21 ++++++++++++++++++---
 4 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index b40ed5df5542..480df19473d9 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -230,6 +230,8 @@ extern void __audit_free(struct task_struct *task);
 extern void __audit_syscall_entry(int major, unsigned long a0, unsigned long a1,
 				  unsigned long a2, unsigned long a3);
 extern void __audit_syscall_exit(int ret_success, long ret_value);
+extern void __audit_seccomp_entry(void);
+extern void __audit_seccomp_exit(int do_free);
 extern struct filename *__audit_reusename(const __user char *uptr);
 extern void __audit_getname(struct filename *name);
 
@@ -270,17 +272,40 @@ static inline void audit_syscall_exit(void *pt_regs)
 		__audit_syscall_exit(success, return_code);
 	}
 }
+
+static inline void audit_seccomp_entry(void)
+{
+	if (unlikely(current->audit_context))
+		__audit_seccomp_entry();
+}
+
+static inline void audit_seccomp_exit(int do_free)
+{
+	if (unlikely(current->audit_context))
+		__audit_seccomp_exit(do_free);
+}
+
 static inline struct filename *audit_reusename(const __user char *name)
 {
 	if (unlikely(!audit_dummy_context()))
 		return __audit_reusename(name);
+#ifdef CONFIG_SECURITY_SECCOMP
+	if (current->audit_context)
+		return __audit_reusename(name);
+#endif /* CONFIG_SECURITY_SECCOMP */
 	return NULL;
 }
+
 static inline void audit_getname(struct filename *name)
 {
 	if (unlikely(!audit_dummy_context()))
 		__audit_getname(name);
+#ifdef CONFIG_SECURITY_SECCOMP
+	else if (current->audit_context)
+		__audit_getname(name);
+#endif /* CONFIG_SECURITY_SECCOMP */
 }
+
 static inline void audit_inode(struct filename *name,
 				const struct dentry *dentry,
 				unsigned int parent) {
diff --git a/kernel/audit.h b/kernel/audit.h
index cbbe6bb6496e..c63ce77b44ae 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -108,6 +108,9 @@ struct audit_proctitle {
 struct audit_context {
 	int		    dummy;	/* must be the first element */
 	int		    in_syscall;	/* 1 if task is in a syscall */
+#ifdef CONFIG_SECURITY_SECCOMP
+	int		    in_seccomp;	/* 1 if task is in seccomp */
+#endif
 	enum audit_state    state, current_state;
 	unsigned int	    serial;     /* serial number for record */
 	int		    major;      /* syscall number */
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 195ffaee50b9..dd1d9f4b1c61 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1501,7 +1501,10 @@ void __audit_syscall_entry(int major, unsigned long a1, unsigned long a2,
 	if (!context)
 		return;
 
-	BUG_ON(context->in_syscall || context->name_count);
+	BUG_ON(context->in_syscall);
+#ifndef CONFIG_SECURITY_SECCOMP
+	BUG_ON(context->name_count);
+#endif /* CONFIG_SECURITY_SECCOMP */
 
 	if (!audit_enabled)
 		return;
@@ -1580,6 +1583,35 @@ void __audit_syscall_exit(int success, long return_code)
 	tsk->audit_context = context;
 }
 
+void __audit_seccomp_entry(void)
+{
+	struct audit_context *context = current->audit_context;
+
+	if (!context)
+		return;
+	BUG_ON(context->in_seccomp || context->name_count);
+	if (!audit_enabled)
+		return;
+
+	context->in_seccomp = 1;
+}
+
+void __audit_seccomp_exit(int do_free)
+{
+	struct audit_context *context = current->audit_context;
+
+	if (!context)
+		return;
+	/* in_seccomp is only set when audit is enabled */
+	if (!audit_enabled)
+		return;
+
+	BUG_ON(!context->in_seccomp);
+	context->in_seccomp = 0;
+	if (do_free)
+		audit_free_names(context);
+}
+
 static inline void handle_one(const struct inode *inode)
 {
 #ifdef CONFIG_AUDIT_TREE
@@ -1728,7 +1760,7 @@ void __audit_getname(struct filename *name)
 	struct audit_context *context = current->audit_context;
 	struct audit_names *n;
 
-	if (!context->in_syscall)
+	if (!context->in_syscall && !context->in_seccomp)
 		return;
 
 	n = audit_alloc_name(context, AUDIT_TYPE_UNKNOWN);
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 60e11863857e..a8a6ba31ecc4 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -881,7 +881,7 @@ int __secure_computing(void)
 static u32 __seccomp_phase1_filter(int this_syscall, struct seccomp_data *sd)
 {
 	u32 filter_ret, action;
-	int data;
+	int data, ret;
 
 	/*
 	 * Make sure that any changes to mode from another thread have
@@ -889,6 +889,9 @@ static u32 __seccomp_phase1_filter(int this_syscall, struct seccomp_data *sd)
 	 */
 	rmb();
 
+	/* Enable caching */
+	audit_seccomp_entry();
+
 	filter_ret = seccomp_run_filters(sd);
 	data = filter_ret & SECCOMP_RET_DATA;
 	action = filter_ret & SECCOMP_RET_ACTION;
@@ -910,13 +913,15 @@ static u32 __seccomp_phase1_filter(int this_syscall, struct seccomp_data *sd)
 		goto skip;
 
 	case SECCOMP_RET_TRACE:
-		return filter_ret;  /* Save the rest for phase 2. */
+		ret = filter_ret;  /* Save the rest for phase 2. */
+		goto audit_exit;
 
 	case SECCOMP_RET_ARGEVAL:
 		/* Handled in seccomp_run_filters() */
 		BUG();
 	case SECCOMP_RET_ALLOW:
-		return SECCOMP_PHASE1_OK;
+		ret = SECCOMP_PHASE1_OK;
+		goto audit_exit;
 
 	case SECCOMP_RET_KILL:
 	default:
@@ -926,7 +931,12 @@ static u32 __seccomp_phase1_filter(int this_syscall, struct seccomp_data *sd)
 
 	unreachable();
 
+audit_exit:
+	audit_seccomp_exit(0);
+	return ret;
+
 skip:
+	audit_seccomp_exit(1);
 	audit_seccomp(this_syscall, 0, action);
 	return SECCOMP_PHASE1_SKIP;
 }
@@ -1139,6 +1149,11 @@ static long seccomp_add_checker_group(unsigned int flags, const char __user *gro
 	unsigned long group_size, kcheckers_size, full_group_size;
 	long result;
 
+	/* FIXME: Deny insecure path evaluation (i.e. without audit_names) for
+	 * the entire task life.
+	 */
+	if (!current->audit_context)
+		return -EPERM;
 	if (!task_no_new_privs(current) &&
 	    security_capable_noaudit(current_cred(),
 				     current_user_ns(), CAP_SYS_ADMIN) != 0)
-- 
2.8.0.rc3

* [kernel-hardening] [RFC v1 13/17] selftest/seccomp: Rename TRACE_poke to TRACE_poke_sys_read
  2016-03-24  2:53 ` [kernel-hardening] [RFC v1 09/17] selftest/seccomp: Extend seccomp_data until matches[6] Mickaël Salaün
                     ` (2 preceding siblings ...)
  2016-03-24  2:53   ` [kernel-hardening] [RFC v1 12/17] audit,seccomp: Extend audit with seccomp state Mickaël Salaün
@ 2016-03-24  2:53   ` Mickaël Salaün
  2016-03-24  2:53   ` [kernel-hardening] [RFC v1 14/17] selftest/seccomp: Make tracer_poke() more generic Mickaël Salaün
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24  2:53 UTC (permalink / raw)
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Stephen Smalley,
	Tetsuo Handa, Will Drewry, linux-api, kernel-hardening

Cosmetic rename for future name derivatives.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <pmoore@redhat.com>
Cc: Will Drewry <wad@chromium.org>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 49c5d39c30a4..7c48d4cf476a 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1138,14 +1138,14 @@ void tracer_poke(struct __test_metadata *_metadata, pid_t tracee, int status,
 	EXPECT_EQ(0, ret);
 }
 
-FIXTURE_DATA(TRACE_poke) {
+FIXTURE_DATA(TRACE_poke_sys_read) {
 	struct sock_fprog prog;
 	pid_t tracer;
 	long poked;
 	struct tracer_args_poke_t tracer_args;
 };
 
-FIXTURE_SETUP(TRACE_poke)
+FIXTURE_SETUP(TRACE_poke_sys_read)
 {
 	struct sock_filter filter[] = {
 		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
@@ -1170,14 +1170,14 @@ FIXTURE_SETUP(TRACE_poke)
 					   &self->tracer_args);
 }
 
-FIXTURE_TEARDOWN(TRACE_poke)
+FIXTURE_TEARDOWN(TRACE_poke_sys_read)
 {
 	teardown_trace_fixture(_metadata, self->tracer);
 	if (self->prog.filter)
 		free(self->prog.filter);
 }
 
-TEST_F(TRACE_poke, read_has_side_effects)
+TEST_F(TRACE_poke_sys_read, read_has_side_effects)
 {
 	ssize_t ret;
 
@@ -1193,7 +1193,7 @@ TEST_F(TRACE_poke, read_has_side_effects)
 	EXPECT_EQ(0x1001, self->poked);
 }
 
-TEST_F(TRACE_poke, getpid_runs_normally)
+TEST_F(TRACE_poke_sys_read, getpid_runs_normally)
 {
 	long ret;
 
-- 
2.8.0.rc3

* [kernel-hardening] [RFC v1 14/17] selftest/seccomp: Make tracer_poke() more generic
  2016-03-24  2:53 ` [kernel-hardening] [RFC v1 09/17] selftest/seccomp: Extend seccomp_data until matches[6] Mickaël Salaün
                     ` (3 preceding siblings ...)
  2016-03-24  2:53   ` [kernel-hardening] [RFC v1 13/17] selftest/seccomp: Rename TRACE_poke to TRACE_poke_sys_read Mickaël Salaün
@ 2016-03-24  2:53   ` Mickaël Salaün
  2016-03-24  2:54   ` [kernel-hardening] [RFC v1 15/17] selftest/seccomp: Add argeval_toctou_argument test Mickaël Salaün
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24  2:53 UTC (permalink / raw)
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Stephen Smalley,
	Tetsuo Handa, Will Drewry, linux-api, kernel-hardening

Let tracer_poke() write a buffer of poke_len words instead of a hard-coded
value, and use a flag value (0x2001) that differs from the RET_TRACE data
(0x1001) to avoid ambiguity.
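
The generalized tracer_poke() issues one PTRACE_POKEDATA per word, so poke_len counts longs, not bytes. Below is a user-space sketch of the same loop. Here memcpy() stands in for PTRACE_POKEDATA, and the helper name poke_words() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy @len words from @data to @addr one word at a time, mirroring the
 * tracer's one-PTRACE_POKEDATA-per-long loop. @len is a word count. */
void poke_words(unsigned long *addr, const unsigned long *data, size_t len)
{
	size_t i;

	assert(addr != NULL && data != NULL);
	for (i = 0; i < len; i++)
		memcpy(addr + i, data + i, sizeof(unsigned long));
}
```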

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <pmoore@redhat.com>
Cc: Will Drewry <wad@chromium.org>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 7c48d4cf476a..f3a6ef4fce62 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1113,14 +1113,16 @@ void teardown_trace_fixture(struct __test_metadata *_metadata,
 
 /* "poke" tracer arguments and function. */
 struct tracer_args_poke_t {
-	unsigned long poke_addr;
+	unsigned long *poke_addr;
+	unsigned long *poke_data;
+	unsigned long poke_len;
 };
 
 void tracer_poke(struct __test_metadata *_metadata, pid_t tracee, int status,
 		 void *args)
 {
 	int ret;
-	unsigned long msg;
+	unsigned long msg, i;
 	struct tracer_args_poke_t *info = (struct tracer_args_poke_t *)args;
 
 	ret = ptrace(PTRACE_GETEVENTMSG, tracee, NULL, &msg);
@@ -1134,8 +1136,14 @@ void tracer_poke(struct __test_metadata *_metadata, pid_t tracee, int status,
 	 * Registers are not touched to try to keep this relatively arch
 	 * agnostic.
 	 */
-	ret = ptrace(PTRACE_POKEDATA, tracee, info->poke_addr, 0x1001);
-	EXPECT_EQ(0, ret);
+	for (i = 0; i < info->poke_len; i++) {
+		unsigned long addr = (unsigned long)info->poke_addr +
+			i * sizeof(long);
+
+		ret = ptrace(PTRACE_POKEDATA, tracee,
+				addr, *(info->poke_data + i));
+		EXPECT_EQ(0, ret);
+	}
 }
 
 FIXTURE_DATA(TRACE_poke_sys_read) {
@@ -1143,6 +1151,7 @@ FIXTURE_DATA(TRACE_poke_sys_read) {
 	pid_t tracer;
 	long poked;
 	struct tracer_args_poke_t tracer_args;
+	unsigned long flag;
 };
 
 FIXTURE_SETUP(TRACE_poke_sys_read)
@@ -1163,7 +1172,10 @@ FIXTURE_SETUP(TRACE_poke_sys_read)
 	self->prog.len = (unsigned short)ARRAY_SIZE(filter);
 
 	/* Set up tracer args. */
-	self->tracer_args.poke_addr = (unsigned long)&self->poked;
+	self->tracer_args.poke_addr = &self->poked;
+	self->flag = 0x2001;
+	self->tracer_args.poke_data = &self->flag;
+	self->tracer_args.poke_len = 1;
 
 	/* Launch tracer. */
 	self->tracer = setup_trace_fixture(_metadata, tracer_poke,
@@ -1190,7 +1202,7 @@ TEST_F(TRACE_poke_sys_read, read_has_side_effects)
 	EXPECT_EQ(0, self->poked);
 	ret = read(-1, NULL, 0);
 	EXPECT_EQ(-1, ret);
-	EXPECT_EQ(0x1001, self->poked);
+	EXPECT_EQ(0x2001, self->poked);
 }
 
 TEST_F(TRACE_poke_sys_read, getpid_runs_normally)
-- 
2.8.0.rc3

* [kernel-hardening] [RFC v1 15/17] selftest/seccomp: Add argeval_toctou_argument test
  2016-03-24  2:53 ` [kernel-hardening] [RFC v1 09/17] selftest/seccomp: Extend seccomp_data until matches[6] Mickaël Salaün
                     ` (4 preceding siblings ...)
  2016-03-24  2:53   ` [kernel-hardening] [RFC v1 14/17] selftest/seccomp: Make tracer_poke() more generic Mickaël Salaün
@ 2016-03-24  2:54   ` Mickaël Salaün
  2016-03-24  2:54   ` [kernel-hardening] [RFC v1 16/17] security/seccomp: Protect against filesystem TOCTOU Mickaël Salaün
  2016-03-24  2:54   ` [kernel-hardening] [RFC v1 17/17] selftest/seccomp: Add argeval_toctou_filesystem test Mickaël Salaün
  7 siblings, 0 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24  2:54 UTC (permalink / raw)
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Stephen Smalley,
	Tetsuo Handa, Will Drewry, linux-api, kernel-hardening

Test whether a time-of-check-to-time-of-use (TOCTOU) race can change a
syscall argument after the seccomp filter evaluation but before the
syscall is actually executed.
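
PTRACE_POKEDATA writes whole words, so the test rounds each path buffer up to a multiple of sizeof(long) and zero-fills the tail. The sketch below shows that rounding. The helper names are hypothetical; they mirror the delta/size computation in FIXTURE_SETUP(TRACE_poke_arg_path).

```c
#include <assert.h>
#include <stddef.h>

/* Round @size up to the next multiple of sizeof(long); aligned sizes are
 * returned unchanged. */
size_t round_up_to_long(size_t size)
{
	size_t delta = size % sizeof(long);

	return size - delta + (delta ? sizeof(long) : 0);
}

/* Number of words the tracer must poke to cover @size bytes. */
size_t words_for(size_t size)
{
	size_t rounded = round_up_to_long(size);

	assert(rounded % sizeof(long) == 0);
	return rounded / sizeof(long);
}
```

The tracer's poke_len should then be the word count (words_for()), not the rounded byte size.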

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <pmoore@redhat.com>
Cc: Will Drewry <wad@chromium.org>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 157 ++++++++++++++++++++++++++
 1 file changed, 157 insertions(+)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index f3a6ef4fce62..64b4d758b007 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -2363,6 +2363,163 @@ TEST(argeval_open_whitelist)
 	EXPECT_EQ(E2BIG, errno);
 	close(fd);
 }
+
+FIXTURE_DATA(TRACE_poke_arg_path) {
+	struct sock_fprog prog;
+	pid_t tracer;
+	struct tracer_args_poke_t tracer_args;
+	char *path_orig;
+	char *path_hijack;
+};
+
+FIXTURE_SETUP(TRACE_poke_arg_path)
+{
+	unsigned long orig_delta, orig_size, hijack_delta, hijack_size;
+	struct sock_filter filter[] = {
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
+			offsetof(struct seccomp_data, nr)),
+		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_open, 0, 1),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_TRACE | 0x1001),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+	};
+
+	memset(&self->prog, 0, sizeof(self->prog));
+	self->prog.filter = malloc(sizeof(filter));
+	ASSERT_NE(NULL, self->prog.filter);
+	memcpy(self->prog.filter, filter, sizeof(filter));
+	self->prog.len = (unsigned short)ARRAY_SIZE(filter);
+
+	/* @path_orig must be writable */
+	orig_delta = sizeof(PATH_DEV_ZERO) % sizeof(long);
+	orig_size = sizeof(PATH_DEV_ZERO) - orig_delta +
+		(orig_delta ? sizeof(long) : 0);
+	self->path_orig = malloc(orig_size);
+	ASSERT_NE(NULL, self->path_orig);
+	memset(self->path_orig, 0, orig_size);
+	memcpy(self->path_orig, PATH_DEV_ZERO, sizeof(PATH_DEV_ZERO));
+	self->tracer_args.poke_addr = (unsigned long *)self->path_orig;
+
+	hijack_delta = sizeof(PATH_DEV_NULL) % sizeof(long);
+	hijack_size = sizeof(PATH_DEV_NULL) - hijack_delta +
+		(hijack_delta ? sizeof(long) : 0);
+	/* @path_hijack must be able to override @path_orig */
+	ASSERT_GE(orig_size, hijack_size);
+	self->path_hijack = malloc(hijack_size);
+	ASSERT_NE(NULL, self->path_hijack);
+	memset(self->path_hijack, 0, hijack_size);
+	memcpy(self->path_hijack, PATH_DEV_NULL, sizeof(PATH_DEV_NULL));
+	self->tracer_args.poke_data = (unsigned long *)self->path_hijack;
+	self->tracer_args.poke_len = hijack_size / sizeof(long);
+
+	/* Launch tracer */
+	self->tracer = setup_trace_fixture(_metadata, tracer_poke,
+					   &self->tracer_args);
+}
+
+FIXTURE_TEARDOWN(TRACE_poke_arg_path)
+{
+	teardown_trace_fixture(_metadata, self->tracer);
+	if (self->prog.filter)
+		free(self->prog.filter);
+	free(self->path_orig);
+	free(self->path_hijack);
+}
+
+/* Any tracer process can bypass a seccomp filter, so we can't protect against
+ * this threat; to properly sandbox a process, its filter should deny any
+ * ptrace call.
+ *
+ * However, a seccomped process can fork and ask its child to modify shared
+ * memory holding the syscall arguments. This can be used to trigger
+ * TOCTOU race conditions between the filter evaluation and the actual
+ * syscall operations. For testing purposes, it is simpler to ask a dedicated
+ * tracer process to do the same thing after the filter evaluation to achieve
+ * the same result. The kernel must detect and block this race condition.
+ */
+TEST_F(TRACE_poke_arg_path, argeval_toctou_argument)
+{
+	int fd;
+	char buf;
+	ssize_t len;
+
+	/* Validate the first test file */
+	fd = open(self->path_orig, O_RDONLY);
+	EXPECT_NE(-1, fd) {
+		TH_LOG("Failed to open %s", self->path_orig);
+	}
+	len = read(fd, &buf, sizeof(buf));
+	EXPECT_EQ(1, len) {
+		TH_LOG("Failed to read from %s", self->path_orig);
+	}
+	EXPECT_EQ(0, buf) {
+		TH_LOG("Got unexpected value from %s", self->path_orig);
+	}
+	close(fd);
+
+	/* Validate the second test file */
+	fd = open(self->path_hijack, O_RDONLY);
+	EXPECT_NE(-1, fd) {
+		TH_LOG("Failed to open %s", self->path_hijack);
+	}
+	len = read(fd, &buf, sizeof(buf));
+	EXPECT_EQ(0, len) {
+		TH_LOG("Able to read from %s", self->path_hijack);
+	}
+	close(fd);
+
+	apply_sandbox0(_metadata, PATH_DEV_ZERO);
+
+	/* Allowed file: /dev/zero */
+	fd = open(self->path_orig, O_RDONLY);
+	EXPECT_NE(-1, fd) {
+		TH_LOG("Failed to open %s", self->path_orig);
+	}
+	len = read(fd, &buf, sizeof(buf));
+	EXPECT_EQ(1, len) {
+		TH_LOG("Failed to read from %s", self->path_orig);
+	}
+	EXPECT_EQ(0, buf) {
+		TH_LOG("Got unexpected value from %s", self->path_orig);
+	}
+	close(fd);
+
+	/* Denied file: /dev/null */
+	fd = open(self->path_hijack, O_RDONLY);
+	EXPECT_EQ(-1, fd) {
+		TH_LOG("Could open %s", self->path_hijack);
+	}
+	close(fd);
+
+	/* Setup the hijack for every open: replace /dev/zero with /dev/null */
+	EXPECT_EQ(0, seccomp(SECCOMP_SET_MODE_FILTER,
+				SECCOMP_FILTER_FLAG_TSYNC, &self->prog)) {
+		TH_LOG("Failed to install filter!");
+	}
+
+	/* Should read /dev/zero even if it is hijacked with /dev/null after
+	 * the filter
+	 */
+	fd = open(self->path_orig, O_RDONLY);
+	EXPECT_NE(-1, fd) {
+		TH_LOG("Failed to open %s", self->path_orig);
+	}
+	len = read(fd, &buf, sizeof(buf));
+	EXPECT_EQ(1, len) {
+		TH_LOG("Failed to read from %s", self->path_orig);
+	}
+	EXPECT_EQ(0, buf) {
+		TH_LOG("Got unexpected value from %s", self->path_orig);
+	}
+	close(fd);
+
+	/* Now path_orig is definitely hijacked, so it must be denied */
+	fd = open(self->path_orig, O_RDONLY);
+	EXPECT_EQ(-1, fd) {
+		TH_LOG("Could open %s", self->path_orig);
+	}
+	EXPECT_EQ(E2BIG, errno);
+	close(fd);
+}
 #endif /* SECCOMP_DATA_ARGEVAL_PRESENT */
 
 /*
-- 
2.8.0.rc3

* [kernel-hardening] [RFC v1 16/17] security/seccomp: Protect against filesystem TOCTOU
  2016-03-24  2:53 ` [kernel-hardening] [RFC v1 09/17] selftest/seccomp: Extend seccomp_data until matches[6] Mickaël Salaün
                     ` (5 preceding siblings ...)
  2016-03-24  2:54   ` [kernel-hardening] [RFC v1 15/17] selftest/seccomp: Add argeval_toctou_argument test Mickaël Salaün
@ 2016-03-24  2:54   ` Mickaël Salaün
  2016-03-24  2:54   ` [kernel-hardening] [RFC v1 17/17] selftest/seccomp: Add argeval_toctou_filesystem test Mickaël Salaün
  7 siblings, 0 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24  2:54 UTC (permalink / raw)
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Stephen Smalley,
	Tetsuo Handa, Will Drewry, linux-api, kernel-hardening

Detect a TOCTOU race condition attack on the filesystem by checking whether
the actual syscall (i.e. the LSM hooks) sees the same files as those
previously checked by the seccomp filter.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: David Drysdale <drysdale@google.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <pmoore@redhat.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Will Drewry <wad@chromium.org>
---
 include/linux/seccomp.h       |  27 ++++++++
 kernel/fork.c                 |   2 +
 kernel/seccomp.c              | 103 +++++++++++++++++++++++++++-
 security/seccomp/checker_fs.c | 153 ++++++++++++++++++++++++++++++++++++++++++
 security/seccomp/checker_fs.h |  18 +++++
 security/seccomp/lsm.c        |  48 +++++++++++++
 6 files changed, 350 insertions(+), 1 deletion(-)
 create mode 100644 security/seccomp/checker_fs.h

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 0c5468f78945..8ea63813ca64 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -13,7 +13,9 @@
 
 struct seccomp_filter;
 struct seccomp_filter_checker_group;
+struct seccomp_argeval_checked;
 struct seccomp_argeval_cache;
+struct seccomp_argeval_syscall;
 
 /**
  * struct seccomp - the state of a seccomp'ed process
@@ -38,7 +40,9 @@ struct seccomp {
 	struct seccomp_filter_checker_group *checker_group;
 
 	/* syscall-lifetime data */
+	struct seccomp_argeval_checked *arg_checked;
 	struct seccomp_argeval_cache *arg_cache;
+	struct seccomp_argeval_syscall *orig_syscall;
 #endif
 };
 
@@ -153,6 +157,14 @@ struct seccomp_argeval_cache_fs {
 	u64 hash_len;
 };
 
+struct seccomp_argeval_history {
+	/* @checker points to current.seccomp->checker_group->checkers[] */
+	struct seccomp_filter_checker *checker;
+	u8 asked;
+	u8 result;
+	struct seccomp_argeval_history *next;
+};
+
 /**
  * struct seccomp_argeval_cache_entry
  *
@@ -178,6 +190,13 @@ struct seccomp_argeval_cache {
 	struct seccomp_argeval_cache *next;
 };
 
+/* Use get_argrule_checker() */
+struct seccomp_argeval_checked {
+	u32 check;
+	struct seccomp_argeval_history *history;
+	struct seccomp_argeval_checked *next;
+};
+
 void put_seccomp_filter_checker(struct seccomp_filter_checker *);
 
 u8 seccomp_argrule_path(const u8(*)[6], const u64(*)[6], u8,
@@ -186,6 +205,14 @@ u8 seccomp_argrule_path(const u8(*)[6], const u64(*)[6], u8,
 long seccomp_set_argcheck_fs(const struct seccomp_checker *,
 			     struct seccomp_filter_checker *);
 
+/* Save the syscall properties to be able to properly recheck the filters
+ * even if the syscall or its arguments have been tampered with by a tracer.
+ */
+struct seccomp_argeval_syscall {
+	int nr;
+	u64 args[6];
+};
+
 #endif /* CONFIG_SECURITY_SECCOMP */
 
 #else  /* CONFIG_SECCOMP_FILTER */
diff --git a/kernel/fork.c b/kernel/fork.c
index b8155ebdd308..f41912acd755 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -361,7 +361,9 @@ static struct task_struct *dup_task_struct(struct task_struct *orig)
 	tsk->seccomp.filter = NULL;
 #ifdef CONFIG_SECURITY_SECCOMP
 	tsk->seccomp.checker_group = NULL;
+	tsk->seccomp.arg_checked = NULL;
 	tsk->seccomp.arg_cache = NULL;
+	tsk->seccomp.orig_syscall = NULL;
 #endif /* CONFIG_SECURITY_SECCOMP */
 #endif /* CONFIG_SECCOMP */
 
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index a8a6ba31ecc4..735b7caf4e06 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -286,6 +286,40 @@ struct syscall_argdesc *syscall_nr_to_argdesc(int nr)
 	return &(*seccomp_sa)[nr];
 }
 
+/* Return a new empty history entry for the check type or NULL if ENOMEM */
+static struct seccomp_argeval_history *new_argeval_history(u32 check)
+{
+	struct seccomp_argeval_checked **arg_checked;
+	struct seccomp_argeval_history **history;
+	bool found = false;
+
+	/* Find the check type */
+	arg_checked = &current->seccomp.arg_checked;
+	while (*arg_checked) {
+		if ((*arg_checked)->check == check) {
+			found = true;
+			break;
+		}
+		arg_checked = &(*arg_checked)->next;
+	}
+	if (!found) {
+		*arg_checked = kmalloc(sizeof(**arg_checked), GFP_KERNEL);
+		if (!*arg_checked)
+			return NULL;
+		(*arg_checked)->check = check;
+		(*arg_checked)->history = NULL;
+		(*arg_checked)->next = NULL;
+	}
+
+	/* Append to history */
+	history = &(*arg_checked)->history;
+	while (*history)
+		history = &(*history)->next;
+	*history = kzalloc(sizeof(**history), GFP_KERNEL);
+
+	return *history;
+}
+
 /* Return the argument group address that match the group ID, or NULL
  * otherwise.
  */
@@ -299,6 +333,7 @@ static struct seccomp_filter_checker_group *seccomp_update_argrule_data(
 	const struct syscall_argdesc *argdesc;
 	struct seccomp_filter_checker *checker;
 	seccomp_argrule_t *engine;
+	struct seccomp_argeval_history *history;
 
 	const u8 group_id = ret_data & SECCOMP_RET_CHECKER_GROUP;
 	const u8 to_check = (ret_data & SECCOMP_RET_ARG_MATCHES) >> 8;
@@ -327,6 +362,17 @@ static struct seccomp_filter_checker_group *seccomp_update_argrule_data(
 		if (engine) {
 			match = (*engine)(&argdesc->args, &sd->args, to_check, checker);
 
+			/* Append the results to be able to replay the checks */
+			history = new_argeval_history(checker->check);
+			if (!history) {
+				/* XXX: return -ENOMEM somehow? */
+				break;
+			}
+			history->checker = checker;
+			history->asked = to_check;
+			history->result = match;
+
+			/* Store the matches after the history record */
 			for (j = 0; j < 6; j++) {
 				sd->arg_matches[j] |=
 				    ((BIT_ULL(j) & match) >> j) << i;
@@ -375,6 +421,39 @@ void flush_seccomp_cache(struct task_struct *tsk)
 	free_seccomp_argeval_cache(tsk->seccomp.arg_cache);
 	tsk->seccomp.arg_cache = NULL;
 }
+
+static void free_seccomp_argeval_history(struct seccomp_argeval_history *history)
+{
+	struct seccomp_argeval_history *walker = history;
+
+	while (walker) {
+		struct seccomp_argeval_history *freeme = walker;
+
+		/* Must not free history->checker owned by
+		 * current.seccomp->checker_group->checkers[]
+		 */
+		walker = walker->next;
+		kfree(freeme);
+	}
+}
+
+static void free_seccomp_argeval_checked(struct seccomp_argeval_checked *checked)
+{
+	struct seccomp_argeval_checked *walker = checked;
+
+	while (walker) {
+		struct seccomp_argeval_checked *freeme = walker;
+
+		free_seccomp_argeval_history(walker->history);
+		walker = walker->next;
+		kfree(freeme);
+	}
+}
+
+static inline void free_seccomp_argeval_syscall(struct seccomp_argeval_syscall *orig_syscall)
+{
+	kfree(orig_syscall);
+}
 #endif /* CONFIG_SECURITY_SECCOMP */
 
 static void put_seccomp_filter(struct task_struct *tsk);
@@ -405,7 +484,27 @@ static u32 seccomp_run_filters(struct seccomp_data *sd)
 		sd = &sd_local;
 	}
 #ifdef CONFIG_SECURITY_SECCOMP
-	/* Cleanup old (syscall-lifetime) cache */
+	/* Back up the current syscall and its arguments (used by the filters)
+	 * so that the LSM checks cannot be misled by a ptrace SETREGS command
+	 * issued after the filter evaluation.
+	 */
+	if (!current->seccomp.orig_syscall) {
+		current->seccomp.orig_syscall =
+		    kmalloc(sizeof(*current->seccomp.orig_syscall), GFP_KERNEL);
+		if (!current->seccomp.orig_syscall)
+			return SECCOMP_RET_KILL;
+	}
+	current->seccomp.orig_syscall->nr = sd->nr;
+	current->seccomp.orig_syscall->args[0] = sd->args[0];
+	current->seccomp.orig_syscall->args[1] = sd->args[1];
+	current->seccomp.orig_syscall->args[2] = sd->args[2];
+	current->seccomp.orig_syscall->args[3] = sd->args[3];
+	current->seccomp.orig_syscall->args[4] = sd->args[4];
+	current->seccomp.orig_syscall->args[5] = sd->args[5];
+
+	/* Cleanup old (syscall-lifetime) history and cache */
+	free_seccomp_argeval_checked(current->seccomp.arg_checked);
+	current->seccomp.arg_checked = NULL;
 	flush_seccomp_cache(current);
 #endif
 
@@ -786,7 +885,9 @@ void put_seccomp(struct task_struct *tsk)
 	put_seccomp_filter(tsk);
 #ifdef CONFIG_SECURITY_SECCOMP
 	/* Free in that order because of referenced checkers */
+	free_seccomp_argeval_checked(tsk->seccomp.arg_checked);
 	free_seccomp_argeval_cache(tsk->seccomp.arg_cache);
+	free_seccomp_argeval_syscall(tsk->seccomp.orig_syscall);
 	put_seccomp_checker_group(tsk->seccomp.checker_group);
 #endif
 }
diff --git a/security/seccomp/checker_fs.c b/security/seccomp/checker_fs.c
index 0a5ec3a204e7..994d889b0334 100644
--- a/security/seccomp/checker_fs.c
+++ b/security/seccomp/checker_fs.c
@@ -8,6 +8,7 @@
  * published by the Free Software Foundation.
  */
 
+#include <asm/syscall.h>	/* syscall_get_nr() */
 #include <linux/bitops.h>	/* BIT() */
 #include <linux/compat.h>
 #include <linux/namei.h>	/* user_lpath() */
@@ -17,6 +18,8 @@
 #include <linux/syscalls.h>	/* __SACT__CONST_CHAR_PTR */
 #include <linux/uaccess.h>	/* copy_from_user() */
 
+#include "checker_fs.h"
+
 #ifdef CONFIG_COMPAT
 /* struct seccomp_object_path */
 struct compat_seccomp_object_path {
@@ -330,6 +333,156 @@ out:
 	return ret;
 }
 
+/* argeval_find_args_file - return a bitmask of the syscall arguments that
+ * match a struct file and have changed since the filter checks
+ *
+ * To match a file with a syscall argument, we get its path and deduce the
+ * corresponding user address (uptr). If a match is found, the file's inode
+ * must match the cached inode; otherwise, the access is denied unless a
+ * second filter check gives exactly the same result as the first one. This
+ * ensures the seccomp filter results stay the same while still allowing a
+ * tracer process to change the tracee's syscall arguments.
+ *
+ * If the syscall takes multiple paths at the same address but the filter
+ * checks only one of these arguments, the inode is checked for all paths
+ * with this address, detecting a TOCTOU for all of them even if they were
+ * not evaluated by the filter.
+ */
+static u8 argeval_find_args_file(const struct file *file)
+{
+	const struct syscall_argdesc *argdesc;
+	struct dentry *dentry;
+	u8 result = 0;
+	struct seccomp_argeval_cache *arg_cache = current->seccomp.arg_cache;
+
+	if (unlikely(!file)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	/* Create the argument mask matching the uptr.
+	 * The syscall arguments may have been changed by a tracer.
+	 */
+	argdesc = syscall_nr_to_argdesc(syscall_get_nr(current,
+				task_pt_regs(current)));
+	if (unlikely(!argdesc)) {
+		WARN_ON(1);
+		return 0;
+	}
+	dentry = file->f_path.dentry;
+
+	/* Look in the cache for this path */
+	for (; arg_cache; arg_cache = arg_cache->next) {
+		struct seccomp_argeval_cache_entry *entry = arg_cache->entry;
+
+		switch (arg_cache->type) {
+		case SECCOMP_OBJTYPE_PATH:
+			/* Ignore the mount point so as not to be fooled by a
+			 * malicious one. Only look for a previously
+			 * seen dentry.
+			 */
+			for (; entry; entry = entry->next) {
+				/* Check for the same filename/argument.
+				 * If the hash and the length are the same
+				 * but the path is different, we treat it
+				 * as a race condition.
+				 */
+				if (entry->fs.hash_len !=
+				    dentry->d_name.hash_len)
+					continue;
+				/* Ignore exact match (i.e. pointed file didn't
+				 * change)
+				 */
+				if (entry->fs.path
+				    && entry->fs.path->dentry == dentry)
+					continue;
+				/* TODO: Add process info/audit */
+				pr_warn("seccomp: TOCTOU race-condition detected!\n");
+				result |= entry->args;
+			}
+			break;
+		default:
+			WARN_ON(1);
+		}
+	}
+	return result;
+}
+
+/**
+ * argeval_history_recheck_file - recheck with the seccomp filters if needed
+ */
+static bool argeval_history_recheck_file(const struct seccomp_argeval_history
+					 *history, seccomp_argrule_t *engine,
+					 const struct syscall_argdesc *argdesc,
+					 const u64(*args)[6], u8 arg_matches)
+{
+	/* Flush the cache to not rely on the first seccomp filter check
+	 * results
+	 */
+	flush_seccomp_cache(current);
+
+	while (history) {
+		/* Only check the changed arguments */
+		if (history->asked & arg_matches) {
+			u8 match = (*engine)(&argdesc->args, args,
+					history->asked, history->checker);
+
+			if (match != history->result)
+				return true;
+		}
+		history = history->next;
+	}
+	return false;
+}
+
+int seccomp_check_file(const struct file *file)
+{
+	int result = -EPERM;
+	const struct syscall_argdesc *argdesc;
+	struct seccomp_argeval_syscall *orig_syscall;
+	struct seccomp_argeval_checked *arg_checked;
+	seccomp_argrule_t *engine;
+	u8 arg_matches;
+
+	/* @file may be null (e.g. security_mmap_file) */
+	if (!file)
+		return 0;
+	/* Early check to exit quickly if no history */
+	arg_checked = current->seccomp.arg_checked;
+	if (!arg_checked)
+		return 0;
+	orig_syscall = current->seccomp.orig_syscall;
+	if (unlikely(!orig_syscall)) {
+		WARN_ON(1);
+		return 0;
+	}
+	/* Check if anything changed from the cache */
+	arg_matches = argeval_find_args_file(file);
+	if (!arg_matches)
+		return 0;
+	/* The syscall may have been changed by the tracer process */
+	argdesc = syscall_nr_to_argdesc(orig_syscall->nr);
+	if (!argdesc) {
+		WARN_ON(1);
+		goto out;
+	}
+	do {
+		engine = get_argrule_checker(arg_checked->check);
+		/* The syscall arguments may have been changed by the tracer
+		 * process
+		 */
+		/* FIXME: Adapt the checker to "struct file *" to avoid races */
+		result =
+		    argeval_history_recheck_file(arg_checked->history, engine,
+						 argdesc, &orig_syscall->args,
+						 arg_matches) ? -EPERM : 0;
+		arg_checked = arg_checked->next;
+	} while (arg_checked && !result);
+
+out:
+	return result;
+}
+
 static long set_argtype_path(const struct seccomp_checker *user_checker,
 			     struct seccomp_filter_checker *kernel_checker)
 {
diff --git a/security/seccomp/checker_fs.h b/security/seccomp/checker_fs.h
new file mode 100644
index 000000000000..7ac102b510ec
--- /dev/null
+++ b/security/seccomp/checker_fs.h
@@ -0,0 +1,18 @@
+/*
+ * Seccomp Linux Security Module - File System Checkers
+ *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _SECURITY_SECCOMP_CHECKER_FS_H
+#define _SECURITY_SECCOMP_CHECKER_FS_H
+
+#include <linux/fs.h>
+
+int seccomp_check_file(const struct file *);
+
+#endif /* _SECURITY_SECCOMP_CHECKER_FS_H */
diff --git a/security/seccomp/lsm.c b/security/seccomp/lsm.c
index 93c881724341..7bde63505dbd 100644
--- a/security/seccomp/lsm.c
+++ b/security/seccomp/lsm.c
@@ -10,9 +10,13 @@
 
 #include <asm/syscall.h>	/* sys_call_table */
 #include <linux/compat.h>
+#include <linux/cred.h>
+#include <linux/fs.h>
+#include <linux/lsm_hooks.h>
 #include <linux/slab.h>	/* kcalloc() */
 #include <linux/syscalls.h>	/* syscall_argdesc */
 
+#include "checker_fs.h"
 #include "lsm.h"
 
 /* TODO: Remove the need for CONFIG_SYSFS dependency */
@@ -22,6 +26,49 @@ struct syscall_argdesc (*seccomp_syscalls_argdesc)[] = NULL;
 struct syscall_argdesc (*compat_seccomp_syscalls_argdesc)[] = NULL;
 #endif	/* CONFIG_COMPAT */
 
+#define SECCOMP_HOOK(CHECK, NAME, ...)				\
+	static inline int seccomp_hook_##NAME(__VA_ARGS__)	\
+	{ 							\
+		return seccomp_check_##CHECK(CHECK);		\
+	}
+
+#define SECCOMP_HOOK_INIT(NAME) LSM_HOOK_INIT(NAME, seccomp_hook_##NAME)
+
+/* TODO: file_set_fowner, file_alloc_security? */
+
+SECCOMP_HOOK(file, binder_transfer_file, struct task_struct *from, struct task_struct *to, struct file *file)
+SECCOMP_HOOK(file, file_permission, struct file *file, int mask)
+SECCOMP_HOOK(file, file_ioctl, struct file *file, unsigned int cmd, unsigned long arg)
+SECCOMP_HOOK(file, mmap_file, struct file *file, unsigned long reqprot, unsigned long prot, unsigned long flags)
+SECCOMP_HOOK(file, file_lock, struct file *file, unsigned int cmd)
+SECCOMP_HOOK(file, file_fcntl, struct file *file, unsigned int cmd, unsigned long arg)
+SECCOMP_HOOK(file, file_receive, struct file *file)
+SECCOMP_HOOK(file, file_open, struct file *file, const struct cred *cred)
+SECCOMP_HOOK(file, kernel_fw_from_file, struct file *file, char *buf, size_t size)
+SECCOMP_HOOK(file, kernel_module_from_file, struct file *file)
+
+/* TODO: Add hooks with:
+ * - struct dentry *
+ * - struct path *
+ * - struct inode *
+ * ...
+ */
+
+
+static struct security_hook_list seccomp_hooks[] = {
+	SECCOMP_HOOK_INIT(binder_transfer_file),
+	SECCOMP_HOOK_INIT(file_permission),
+	SECCOMP_HOOK_INIT(file_ioctl),
+	SECCOMP_HOOK_INIT(mmap_file),
+	SECCOMP_HOOK_INIT(file_lock),
+	SECCOMP_HOOK_INIT(file_fcntl),
+	SECCOMP_HOOK_INIT(file_receive),
+	SECCOMP_HOOK_INIT(file_open),
+	SECCOMP_HOOK_INIT(kernel_fw_from_file),
+	SECCOMP_HOOK_INIT(kernel_module_from_file),
+};
+
+
 static const struct syscall_argdesc *__init
 find_syscall_argdesc(const struct syscall_argdesc *start,
 		const struct syscall_argdesc *stop, const void *addr)
@@ -84,4 +131,5 @@ void __init seccomp_init(void)
 {
 	pr_info("seccomp: Becoming ready for sandboxing\n");
 	init_argdesc();
+	security_add_hooks(seccomp_hooks, ARRAY_SIZE(seccomp_hooks));
 }
-- 
2.8.0.rc3

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [kernel-hardening] [RFC v1 17/17] selftest/seccomp: Add argeval_toctou_filesystem test
  2016-03-24  2:53 ` [kernel-hardening] [RFC v1 09/17] selftest/seccomp: Extend seccomp_data until matches[6] Mickaël Salaün
                     ` (6 preceding siblings ...)
  2016-03-24  2:54   ` [kernel-hardening] [RFC v1 16/17] security/seccomp: Protect against filesystem TOCTOU Mickaël Salaün
@ 2016-03-24  2:54   ` Mickaël Salaün
  7 siblings, 0 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24  2:54 UTC (permalink / raw)
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Stephen Smalley,
	Tetsuo Handa, Will Drewry, linux-api, kernel-hardening

Add a test that checks the detection of a TOCTOU race-condition attack on
the filesystem: a file is renamed after the seccomp filter evaluation but
before the syscall takes effect.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <pmoore@redhat.com>
Cc: Will Drewry <wad@chromium.org>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 180 +++++++++++++++++++++++++-
 1 file changed, 178 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 64b4d758b007..1558e0079fe9 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -10,6 +10,7 @@
 #define __have_sigval_t 1
 #define __have_sigevent_t 1
 
+#define _GNU_SOURCE
 #include <errno.h>
 #include <linux/filter.h>
 #include <sys/prctl.h>
@@ -32,8 +33,6 @@
 #include <sys/fcntl.h>
 #include <sys/mman.h>
 #include <sys/times.h>
-
-#define _GNU_SOURCE
 #include <unistd.h>
 #include <sys/syscall.h>
 
@@ -2520,6 +2519,183 @@ TEST_F(TRACE_poke_arg_path, argeval_toctou_argument)
 	EXPECT_EQ(E2BIG, errno);
 	close(fd);
 }
+
+char *new_file(struct __test_metadata *_metadata, const char *name, char buf)
+{
+	int ret, fd, path_len;
+	char *path;
+	const char tmpl[] = "/tmp/seccomp-test_%s.XXXXXX";
+
+	path_len = sizeof(tmpl) - 2 + strlen(name);
+	path = malloc(path_len);
+	ASSERT_NE(path, NULL);
+	ret = snprintf(path, path_len, tmpl, name);
+	ASSERT_EQ(ret, path_len - 1);
+	fd = mkostemp(path, O_CLOEXEC);
+	ASSERT_NE(fd, -1);
+	ret = write(fd, &buf, sizeof(buf));
+	ASSERT_EQ(ret, sizeof(buf));
+	close(fd);
+	return path;
+}
+
+struct tracer_args_files {
+	char *path_orig, *path_hijack, *path_swap;
+};
+
+/* Move a file after the filter evaluation but before the effective syscall. */
+void tracer_swap_file(struct __test_metadata *_metadata, pid_t tracee,
+		int status, void *args)
+{
+	int ret;
+	unsigned long msg;
+	struct tracer_args_files *info = (struct tracer_args_files *)args;
+
+	ret = ptrace(PTRACE_GETEVENTMSG, tracee, NULL, &msg);
+	EXPECT_EQ(0, ret);
+	/* If this fails, don't try to recover. */
+	ASSERT_EQ(0x1002, msg) {
+		kill(tracee, SIGKILL);
+	}
+	/* Let's start the shell game: swap orig and hijack via swap. */
+	ret = rename(info->path_orig, info->path_swap);
+	EXPECT_EQ(0, ret);
+	ret = rename(info->path_hijack, info->path_orig);
+	EXPECT_EQ(0, ret);
+	ret = rename(info->path_swap, info->path_hijack);
+	EXPECT_EQ(0, ret);
+}
+
+FIXTURE_DATA(TRACE_swap_file) {
+	struct sock_fprog prog;
+	pid_t tracer;
+	struct tracer_args_files tracer_args;
+	char *path_orig, *path_hijack, *path_swap;
+};
+
+FIXTURE_SETUP(TRACE_swap_file)
+{
+	int fd;
+	unsigned long orig_delta, orig_size, hijack_delta, hijack_size;
+	struct sock_filter filter[] = {
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
+			offsetof(struct seccomp_data, nr)),
+		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_open, 0, 1),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_TRACE | 0x1002),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+	};
+
+	memset(&self->prog, 0, sizeof(self->prog));
+	self->prog.filter = malloc(sizeof(filter));
+	ASSERT_NE(NULL, self->prog.filter);
+	memcpy(self->prog.filter, filter, sizeof(filter));
+	self->prog.len = (unsigned short)ARRAY_SIZE(filter);
+
+	/* Create all the files */
+	self->path_orig = new_file(_metadata, "orig", 'O');
+	self->tracer_args.path_orig = self->path_orig;
+	self->path_hijack = new_file(_metadata, "hijack", 'H');
+	self->tracer_args.path_hijack = self->path_hijack;
+	self->path_swap = new_file(_metadata, "swap", 'S');
+	self->tracer_args.path_swap = self->path_swap;
+
+	/* Remove the temporary swap file */
+	unlink(self->path_swap);
+
+	/* Launch tracer */
+	self->tracer = setup_trace_fixture(_metadata, tracer_swap_file,
+					   &self->tracer_args);
+}
+
+FIXTURE_TEARDOWN(TRACE_swap_file)
+{
+	teardown_trace_fixture(_metadata, self->tracer);
+	if (self->prog.filter)
+		free(self->prog.filter);
+	if (self->path_orig) {
+		unlink(self->path_orig);
+		free(self->path_orig);
+	}
+	if (self->path_hijack) {
+		unlink(self->path_hijack);
+		free(self->path_hijack);
+	}
+	if (self->path_swap) {
+		unlink(self->path_swap);
+		free(self->path_swap);
+	}
+}
+
+TEST_F(TRACE_swap_file, argeval_toctou_filesystem)
+{
+	int fd;
+	char buf;
+	ssize_t len;
+
+	/* Validate the first test file */
+	fd = open(self->path_orig, O_RDONLY);
+	EXPECT_NE(-1, fd) {
+		TH_LOG("Failed to open %s", self->path_orig);
+	}
+	len = read(fd, &buf, sizeof(buf));
+	EXPECT_EQ(1, len) {
+		TH_LOG("Failed to read from %s", self->path_orig);
+	}
+	EXPECT_EQ('O', buf) {
+		TH_LOG("Got unexpected value from %s", self->path_orig);
+	}
+	close(fd);
+
+	/* Validate the second test file */
+	fd = open(self->path_hijack, O_RDONLY);
+	EXPECT_NE(-1, fd) {
+		TH_LOG("Failed to open %s", self->path_hijack);
+	}
+	len = read(fd, &buf, sizeof(buf));
+	EXPECT_EQ(1, len) {
+		TH_LOG("Failed to read from %s", self->path_hijack);
+	}
+	EXPECT_EQ('H', buf) {
+		TH_LOG("Got unexpected value from %s", self->path_hijack);
+	}
+	close(fd);
+
+	apply_sandbox0(_metadata, self->path_orig);
+
+	/* Set up the hijack for every open() */
+	EXPECT_EQ(0, seccomp(SECCOMP_SET_MODE_FILTER,
+				SECCOMP_FILTER_FLAG_TSYNC, &self->prog)) {
+		TH_LOG("Failed to install filter!");
+	}
+
+	/* Hijacked file */
+	fd = open(self->path_orig, O_RDONLY);
+	EXPECT_EQ(-1, fd) {
+		TH_LOG("Could open %s", self->path_hijack);
+	}
+	EXPECT_EQ(EPERM, errno);
+	close(fd);
+
+	/* Denied file */
+	fd = open(self->path_orig, O_RDONLY);
+	EXPECT_EQ(-1, fd) {
+		TH_LOG("Could open %s", self->path_hijack);
+	}
+	EXPECT_EQ(E2BIG, errno);
+	close(fd);
+}
+
+/*
+ * TODO: tests to add
+ * - symlink following
+ * - dentry/inode/device/mount checkers
+ * - PATH_BENEATH
+ * - object creation with nonexistent file
+ * - validate that ptrace's SETREGS is still working on a process using seccomp-objects
+ * - TOCTOU with a hard link (should pass)
+ * - limits
+ */
+
 #endif /* SECCOMP_DATA_ARGEVAL_PRESENT */
 
 /*
-- 
2.8.0.rc3

* [kernel-hardening] Re: [RFC v1 03/17] selftest/seccomp: Fix the flag name SECCOMP_FILTER_FLAG_TSYNC
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 03/17] selftest/seccomp: Fix the flag name SECCOMP_FILTER_FLAG_TSYNC Mickaël Salaün
@ 2016-03-24  4:35   ` Kees Cook
  2016-03-29 15:35     ` Shuah Khan
  0 siblings, 1 reply; 39+ messages in thread
From: Kees Cook @ 2016-03-24  4:35 UTC (permalink / raw)
  To: Mickaël Salaün, Shuah Khan
  Cc: linux-security-module, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Michael Kerrisk, Paul Moore, Richard Weinberger,
	Serge E . Hallyn, Stephen Smalley, Tetsuo Handa, Will Drewry,
	Linux API, kernel-hardening

On Wed, Mar 23, 2016 at 6:46 PM, Mickaël Salaün <mic@digikod.net> wrote:
> Rename SECCOMP_FLAG_FILTER_TSYNC to SECCOMP_FILTER_FLAG_TSYNC to match
> the UAPI.
>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Will Drewry <wad@chromium.org>

Hah, oops. Thanks! Shuah, can you take this patch into the selftest tree?

Acked-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
> index b9453b838162..9c1460f277c2 100644
> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
> @@ -1497,8 +1497,8 @@ TEST_F(TRACE_syscall, syscall_dropped)
>  #define SECCOMP_SET_MODE_FILTER 1
>  #endif
>
> -#ifndef SECCOMP_FLAG_FILTER_TSYNC
> -#define SECCOMP_FLAG_FILTER_TSYNC 1
> +#ifndef SECCOMP_FILTER_FLAG_TSYNC
> +#define SECCOMP_FILTER_FLAG_TSYNC 1
>  #endif
>
>  #ifndef seccomp
> @@ -1613,7 +1613,7 @@ TEST(TSYNC_first)
>                 TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
>         }
>
> -       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
> +       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
>                       &prog);
>         ASSERT_NE(ENOSYS, errno) {
>                 TH_LOG("Kernel does not support seccomp syscall!");
> @@ -1831,7 +1831,7 @@ TEST_F(TSYNC, two_siblings_with_ancestor)
>                 self->sibling_count++;
>         }
>
> -       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
> +       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
>                       &self->apply_prog);
>         ASSERT_EQ(0, ret) {
>                 TH_LOG("Could install filter on all threads!");
> @@ -1892,7 +1892,7 @@ TEST_F(TSYNC, two_siblings_with_no_filter)
>                 TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
>         }
>
> -       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
> +       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
>                       &self->apply_prog);
>         ASSERT_NE(ENOSYS, errno) {
>                 TH_LOG("Kernel does not support seccomp syscall!");
> @@ -1940,7 +1940,7 @@ TEST_F(TSYNC, two_siblings_with_one_divergence)
>                 self->sibling_count++;
>         }
>
> -       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
> +       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
>                       &self->apply_prog);
>         ASSERT_EQ(self->sibling[0].system_tid, ret) {
>                 TH_LOG("Did not fail on diverged sibling.");
> @@ -1992,7 +1992,7 @@ TEST_F(TSYNC, two_siblings_not_under_filter)
>                 TH_LOG("Kernel does not support SECCOMP_SET_MODE_FILTER!");
>         }
>
> -       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
> +       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
>                       &self->apply_prog);
>         ASSERT_EQ(ret, self->sibling[0].system_tid) {
>                 TH_LOG("Did not fail on diverged sibling.");
> @@ -2021,7 +2021,7 @@ TEST_F(TSYNC, two_siblings_not_under_filter)
>         /* Switch to the remaining sibling */
>         sib = !sib;
>
> -       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
> +       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
>                       &self->apply_prog);
>         ASSERT_EQ(0, ret) {
>                 TH_LOG("Expected the remaining sibling to sync");
> @@ -2044,7 +2044,7 @@ TEST_F(TSYNC, two_siblings_not_under_filter)
>         while (!kill(self->sibling[sib].system_tid, 0))
>                 sleep(0.1);
>
> -       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
> +       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
>                       &self->apply_prog);
>         ASSERT_EQ(0, ret);  /* just us chickens */
>  }
> --
> 2.8.0.rc3
>



-- 
Kees Cook
Chrome OS & Brillo Security

* [kernel-hardening] Re: [RFC v1 04/17] selftest/seccomp: Fix the seccomp(2) signature
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 04/17] selftest/seccomp: Fix the seccomp(2) signature Mickaël Salaün
@ 2016-03-24  4:36   ` Kees Cook
  2016-03-29 15:38     ` Shuah Khan
  0 siblings, 1 reply; 39+ messages in thread
From: Kees Cook @ 2016-03-24  4:36 UTC (permalink / raw)
  To: Mickaël Salaün, Shuah Khan
  Cc: linux-security-module, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Michael Kerrisk, Paul Moore, Richard Weinberger,
	Serge E . Hallyn, Stephen Smalley, Tetsuo Handa, Will Drewry,
	Linux API, kernel-hardening

On Wed, Mar 23, 2016 at 6:46 PM, Mickaël Salaün <mic@digikod.net> wrote:
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Will Drewry <wad@chromium.org>

Another good catch. Shuah, can you take this one too?

Acked-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
> index 9c1460f277c2..150829dd7998 100644
> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
> @@ -1502,10 +1502,10 @@ TEST_F(TRACE_syscall, syscall_dropped)
>  #endif
>
>  #ifndef seccomp
> -int seccomp(unsigned int op, unsigned int flags, struct sock_fprog *filter)
> +int seccomp(unsigned int op, unsigned int flags, void *args)
>  {
>         errno = 0;
> -       return syscall(__NR_seccomp, op, flags, filter);
> +       return syscall(__NR_seccomp, op, flags, args);
>  }
>  #endif
>
> --
> 2.8.0.rc3
>



-- 
Kees Cook
Chrome OS & Brillo Security

* [kernel-hardening] Re: [RFC v1 05/17] security/seccomp: Add LSM and create arrays of syscall metadata
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 05/17] security/seccomp: Add LSM and create arrays of syscall metadata Mickaël Salaün
@ 2016-03-24 15:47   ` Casey Schaufler
  2016-03-24 16:01   ` Casey Schaufler
  1 sibling, 0 replies; 39+ messages in thread
From: Casey Schaufler @ 2016-03-24 15:47 UTC (permalink / raw)
  To: Mickaël Salaün, linux-security-module
  Cc: Andreas Gruenbacher, Andy Lutomirski, Andy Lutomirski,
	Arnd Bergmann, Daniel Borkmann, David Drysdale, Eric Paris,
	James Morris, Jeff Dike, Julien Tinnes, Kees Cook,
	Michael Kerrisk, Paul Moore, Richard Weinberger,
	Serge E . Hallyn, Stephen Smalley, Tetsuo Handa, Will Drewry,
	linux-api, kernel-hardening

On 3/23/2016 6:46 PM, Mickaël Salaün wrote:
> To prevent userland from making mistakes by misusing a syscall parameter,
> the kernel checks the type of the syscall parameters (e.g. char pointer).
> At compile time, we create a memory section (i.e. __syscalls_argdesc) with
> syscall metadata. At boot time, this section is used to create an array
> (i.e. seccomp_syscalls_argdesc) usable to check the syscall arguments.
> In the same way, another array can be created and used for compat mode.
>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Cc: Andreas Gruenbacher <agruenba@redhat.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Casey Schaufler <casey@schaufler-ca.com>
> Cc: David Drysdale <drysdale@google.com>
> Cc: James Morris <james.l.morris@oracle.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Paul Moore <pmoore@redhat.com>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Cc: Stephen Smalley <sds@tycho.nsa.gov>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Will Drewry <wad@chromium.org>
> ---
>  include/asm-generic/vmlinux.lds.h | 22 ++++++++++
>  include/linux/compat.h            | 10 +++++
>  include/linux/lsm_hooks.h         |  5 +++
>  include/linux/syscalls.h          | 68 ++++++++++++++++++++++++++++++
>  security/Kconfig                  |  1 +
>  security/Makefile                 |  2 +
>  security/seccomp/Kconfig          | 14 +++++++
>  security/seccomp/Makefile         |  3 ++
>  security/seccomp/lsm.c            | 87 +++++++++++++++++++++++++++++++++++++++
>  security/seccomp/lsm.h            | 19 +++++++++
>  security/security.c               |  1 +
>  11 files changed, 232 insertions(+)
>  create mode 100644 security/seccomp/Kconfig
>  create mode 100644 security/seccomp/Makefile
>  create mode 100644 security/seccomp/lsm.c
>  create mode 100644 security/seccomp/lsm.h
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index c4bd0e2c173c..b8792fc083c2 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -153,6 +153,26 @@
>  #define TRACE_SYSCALLS()
>  #endif
>  
> +#ifdef CONFIG_SECURITY_SECCOMP
> +#define ARGDESC_SYSCALLS() . = ALIGN(8);				\
> +			 VMLINUX_SYMBOL(__start_syscalls_argdesc) = .;	\
> +			 *(__syscalls_argdesc)				\
> +			 VMLINUX_SYMBOL(__stop_syscalls_argdesc) = .;
> +
> +#ifdef CONFIG_COMPAT
> +#define COMPAT_ARGDESC_SYSCALLS() . = ALIGN(8);				\
> +		 VMLINUX_SYMBOL(__start_compat_syscalls_argdesc) = .;	\
> +		 *(__compat_syscalls_argdesc)				\
> +		 VMLINUX_SYMBOL(__stop_compat_syscalls_argdesc) = .;
> +#else
> +#define COMPAT_ARGDESC_SYSCALLS()
> +#endif	/* CONFIG_COMPAT */
> +
> +#else
> +#define ARGDESC_SYSCALLS()
> +#define COMPAT_ARGDESC_SYSCALLS()
> +#endif /* CONFIG_SECURITY_SECCOMP */
> +
>  #ifdef CONFIG_SERIAL_EARLYCON
>  #define EARLYCON_TABLE() STRUCT_ALIGN();			\
>  			 VMLINUX_SYMBOL(__earlycon_table) = .;	\
> @@ -511,6 +531,8 @@
>  	MEM_DISCARD(init.data)						\
>  	KERNEL_CTORS()							\
>  	MCOUNT_REC()							\
> +	ARGDESC_SYSCALLS()						\
> +	COMPAT_ARGDESC_SYSCALLS()					\
>  	*(.init.rodata)							\
>  	FTRACE_EVENTS()							\
>  	TRACE_SYSCALLS()						\
> diff --git a/include/linux/compat.h b/include/linux/compat.h
> index a76c9172b2eb..b63579a401e8 100644
> --- a/include/linux/compat.h
> +++ b/include/linux/compat.h
> @@ -15,6 +15,7 @@
>  #include <linux/fs.h>
>  #include <linux/aio_abi.h>	/* for aio_context_t */
>  #include <linux/unistd.h>
> +#include <linux/syscalls.h>	/* for SYSCALL_FILL_ARGDESC_SECTION */
>  
>  #include <asm/compat.h>
>  #include <asm/siginfo.h>
> @@ -28,7 +29,15 @@
>  #define __SC_DELOUSE(t,v) ((t)(unsigned long)(v))
>  #endif
>  
> +#ifdef CONFIG_SECURITY_SECCOMP
> +#define COMPAT_SYSCALL_FILL_ARGDESC(...)	\
> +	SYSCALL_FILL_ARGDESC_SECTION("__compat_syscalls_argdesc", __VA_ARGS__)
> +#else
> +#define COMPAT_SYSCALL_FILL_ARGDESC(...)
> +#endif /* CONFIG_SECURITY_SECCOMP */
> +
>  #define COMPAT_SYSCALL_DEFINE0(name) \
> +	COMPAT_SYSCALL_FILL_ARGDESC(compat_sys_##name, 0)	\
>  	asmlinkage long compat_sys_##name(void)
>  
>  #define COMPAT_SYSCALL_DEFINE1(name, ...) \
> @@ -45,6 +54,7 @@
>  	COMPAT_SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
>  
>  #define COMPAT_SYSCALL_DEFINEx(x, name, ...)				\
> +	COMPAT_SYSCALL_FILL_ARGDESC(compat_sys##name, x, __VA_ARGS__)	\
>  	asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))\
>  		__attribute__((alias(__stringify(compat_SyS##name))));  \
>  	static inline long C_SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index 71969de4058c..12df41669308 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -1892,5 +1892,10 @@ extern void __init yama_add_hooks(void);
>  #else
>  static inline void __init yama_add_hooks(void) { }
>  #endif
> +#ifdef CONFIG_SECURITY_SECCOMP
> +extern void __init seccomp_init(void);
> +#else
> +static inline void __init seccomp_init(void) { }
> +#endif
>  
>  #endif /* ! __LINUX_LSM_HOOKS_H */
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 185815c96433..0f846c408bba 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -79,6 +79,8 @@ union bpf_attr;
>  #include <linux/quota.h>
>  #include <linux/key.h>
>  #include <trace/syscall.h>
> +#include <uapi/asm/unistd.h>
> +#include <linux/seccomp.h>
>  
>  /*
>   * __MAP - apply a macro to syscall arguments
> @@ -98,6 +100,24 @@ union bpf_attr;
>  #define __MAP6(m,t,a,...) m(t,a), __MAP5(m,__VA_ARGS__)
>  #define __MAP(n,...) __MAP##n(__VA_ARGS__)
>  
> +#define __COMPARGS6
> +#define __COMPARGS5 , 0
> +#define __COMPARGS4 , 0, 0
> +#define __COMPARGS3 , 0, 0, 0
> +#define __COMPARGS2 , 0, 0, 0, 0
> +#define __COMPARGS1 , 0, 0, 0, 0, 0
> +#define __COMPARGS0 0, 0, 0, 0, 0, 0
> +#define __COMPARGS(n) __COMPARGS##n
> +
> +#define __COMPDECL6
> +#define __COMPDECL5
> +#define __COMPDECL4
> +#define __COMPDECL3
> +#define __COMPDECL2
> +#define __COMPDECL1
> +#define __COMPDECL0 void
> +#define __COMPDECL(n) __COMPDECL##n
> +
>  #define __SC_DECL(t, a)	t a
>  #define __TYPE_IS_L(t)	(__same_type((t)0, 0L))
>  #define __TYPE_IS_UL(t)	(__same_type((t)0, 0UL))
> @@ -175,8 +195,55 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>  #define SYSCALL_METADATA(sname, nb, ...)
>  #endif
>  
> +#ifdef CONFIG_SECURITY_SECCOMP
> +/*
> + * Do not store the symbol name but the syscall symbol address.
> + * FIXME: Handle aliased symbols (i.e. different name but same address)?
> + *
> + * @addr: syscall address
> + * @args: syscall arguments C type (i.e. __SACT__* values)
> + */
> +struct syscall_argdesc {
> +	const void *addr;
> +	u8 args[6];
> +};
> +
> +/* Syscall Argument C Type (none means no argument) */
> +#define __SACT__NONE			0
> +#define __SACT__OTHER			1
> +#define __SACT__CONST_CHAR_PTR		2
> +#define __SACT__CHAR_PTR		3
> +
> +#define __SC_ARGDESC_TYPE(t, a)						\
> +	__builtin_types_compatible_p(typeof(t), const char *) ?		\
> +	__SACT__CONST_CHAR_PTR :					\
> +	__builtin_types_compatible_p(typeof(t), char *) ?		\
> +	__SACT__CHAR_PTR :						\
> +	__SACT__OTHER
> +
> +#define SYSCALL_FILL_ARGDESC_SECTION(_section, sname, nb, ...)		\
> +	asmlinkage long sname(__MAP(nb, __SC_DECL, __VA_ARGS__)		\
> +			__COMPDECL(nb));				\
> +	static struct syscall_argdesc __used				\
> +		__attribute__((section(_section)))			\
> +		syscall_argdesc_##sname = {				\
> +			.addr = sname,					\
> +			.args = {					\
> +				__MAP(nb, __SC_ARGDESC_TYPE, __VA_ARGS__)\
> +				__COMPARGS(nb)				\
> +			},						\
> +		};
> +
> +#define SYSCALL_FILL_ARGDESC(...)	\
> +	SYSCALL_FILL_ARGDESC_SECTION("__syscalls_argdesc", __VA_ARGS__)
> +
> +#else
> +#define SYSCALL_FILL_ARGDESC(...)
> +#endif /* CONFIG_SECURITY_SECCOMP */
> +
>  #define SYSCALL_DEFINE0(sname)					\
>  	SYSCALL_METADATA(_##sname, 0);				\
> +	SYSCALL_FILL_ARGDESC(sys_##sname, 0)			\
>  	asmlinkage long sys_##sname(void)
>  
>  #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
> @@ -188,6 +255,7 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>  
>  #define SYSCALL_DEFINEx(x, sname, ...)				\
>  	SYSCALL_METADATA(sname, x, __VA_ARGS__)			\
> +	SYSCALL_FILL_ARGDESC(sys##sname, x, __VA_ARGS__)	\
>  	__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
>  
>  #define __PROTECT(...) asmlinkage_protect(__VA_ARGS__)
> diff --git a/security/Kconfig b/security/Kconfig
> index e45237897b43..c98fe1a924cd 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -123,6 +123,7 @@ source security/smack/Kconfig
>  source security/tomoyo/Kconfig
>  source security/apparmor/Kconfig
>  source security/yama/Kconfig
> +source security/seccomp/Kconfig
>  
>  source security/integrity/Kconfig
>  
> diff --git a/security/Makefile b/security/Makefile
> index c9bfbc84ff50..0e4cdefc4777 100644
> --- a/security/Makefile
> +++ b/security/Makefile
> @@ -8,6 +8,7 @@ subdir-$(CONFIG_SECURITY_SMACK)		+= smack
>  subdir-$(CONFIG_SECURITY_TOMOYO)        += tomoyo
>  subdir-$(CONFIG_SECURITY_APPARMOR)	+= apparmor
>  subdir-$(CONFIG_SECURITY_YAMA)		+= yama
> +subdir-$(CONFIG_SECCOMP_FILTER)		+= seccomp
>  
>  # always enable default capabilities
>  obj-y					+= commoncap.o
> @@ -22,6 +23,7 @@ obj-$(CONFIG_AUDIT)			+= lsm_audit.o
>  obj-$(CONFIG_SECURITY_TOMOYO)		+= tomoyo/
>  obj-$(CONFIG_SECURITY_APPARMOR)		+= apparmor/
>  obj-$(CONFIG_SECURITY_YAMA)		+= yama/
> +obj-$(CONFIG_SECCOMP_FILTER)	+= seccomp/
>  obj-$(CONFIG_CGROUP_DEVICE)		+= device_cgroup.o
>  
>  # Object integrity file lists
> diff --git a/security/seccomp/Kconfig b/security/seccomp/Kconfig
> new file mode 100644
> index 000000000000..7b0fe649ed89
> --- /dev/null
> +++ b/security/seccomp/Kconfig
> @@ -0,0 +1,14 @@
> +config SECURITY_SECCOMP
> +	bool "Seccomp LSM support"
> +	depends on AUDIT
> +	depends on SECCOMP
> +	depends on SECURITY
> +	default y
> +	help
> +	  This selects an extension to the Seccomp BPF to be able to filter
> +	  syscall arguments as kernel objects (e.g. file path).
> +	  This stacked LSM is needed to detect and block race-condition attacks
> +	  against argument evaluation (i.e. TOCTOU). Further information can be
> +	  found in Documentation/prctl/seccomp_filter.txt .
> +
> +	  If you are unsure how to answer this question, answer Y.
> diff --git a/security/seccomp/Makefile b/security/seccomp/Makefile
> new file mode 100644
> index 000000000000..f2e848d81138
> --- /dev/null
> +++ b/security/seccomp/Makefile
> @@ -0,0 +1,3 @@
> +obj-$(CONFIG_SECURITY_SECCOMP) := seccomp.o
> +
> +seccomp-y := lsm.o
> diff --git a/security/seccomp/lsm.c b/security/seccomp/lsm.c
> new file mode 100644
> index 000000000000..93c881724341
> --- /dev/null
> +++ b/security/seccomp/lsm.c
> @@ -0,0 +1,87 @@
> +/*
> + * Seccomp Linux Security Module
> + *
> + * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2, as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <asm/syscall.h>	/* sys_call_table */
> +#include <linux/compat.h>
> +#include <linux/slab.h>	/* kcalloc() */
> +#include <linux/syscalls.h>	/* syscall_argdesc */
> +
> +#include "lsm.h"
> +
> +/* TODO: Remove the need for CONFIG_SYSFS dependency */
> +
> +struct syscall_argdesc (*seccomp_syscalls_argdesc)[] = NULL;
> +#ifdef CONFIG_COMPAT
> +struct syscall_argdesc (*compat_seccomp_syscalls_argdesc)[] = NULL;
> +#endif	/* CONFIG_COMPAT */
> +
> +static const struct syscall_argdesc *__init
> +find_syscall_argdesc(const struct syscall_argdesc *start,
> +		const struct syscall_argdesc *stop, const void *addr)
> +{
> +	if (unlikely(!addr || !start || !stop)) {
> +		WARN_ON(1);
> +		return NULL;
> +	}
> +
> +	for (; start < stop; start++) {
> +		if (start->addr == addr)
> +			return start;
> +	}
> +	return NULL;
> +}
> +
> +static inline void __init init_argdesc(void)
> +{
> +	const struct syscall_argdesc *argdesc;
> +	const void *addr;
> +	int i;
> +
> +	seccomp_syscalls_argdesc = kcalloc(NR_syscalls,
> +			sizeof((*seccomp_syscalls_argdesc)[0]), GFP_KERNEL);
> +	if (unlikely(!seccomp_syscalls_argdesc)) {
> +		WARN_ON(1);
> +		return;
> +	}
> +	for (i = 0; i < NR_syscalls; i++) {
> +		addr = sys_call_table[i];
> +		argdesc = find_syscall_argdesc(__start_syscalls_argdesc,
> +				__stop_syscalls_argdesc, addr);
> +		if (!argdesc)
> +			continue;
> +
> +		(*seccomp_syscalls_argdesc)[i] = *argdesc;
> +	}
> +
> +#ifdef CONFIG_COMPAT
> +	compat_seccomp_syscalls_argdesc = kcalloc(IA32_NR_syscalls,
> +			sizeof((*compat_seccomp_syscalls_argdesc)[0]),
> +			GFP_KERNEL);
> +	if (unlikely(!compat_seccomp_syscalls_argdesc)) {
> +		WARN_ON(1);
> +		return;
> +	}
> +	for (i = 0; i < IA32_NR_syscalls; i++) {
> +		addr = ia32_sys_call_table[i];
> +		argdesc = find_syscall_argdesc(__start_compat_syscalls_argdesc,
> +				__stop_compat_syscalls_argdesc, addr);
> +		if (!argdesc)
> +			continue;
> +
> +		(*compat_seccomp_syscalls_argdesc)[i] = *argdesc;
> +	}
> +#endif	/* CONFIG_COMPAT */
> +}
> +
> +void __init seccomp_init(void)
> +{
> +	pr_info("seccomp: Becoming ready for sandboxing\n");
> +	init_argdesc();
> +}
> diff --git a/security/seccomp/lsm.h b/security/seccomp/lsm.h
> new file mode 100644
> index 000000000000..ededbd27c225
> --- /dev/null
> +++ b/security/seccomp/lsm.h
> @@ -0,0 +1,19 @@
> +/*
> + * Seccomp Linux Security Module
> + *
> + * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2, as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/syscalls.h>	/* syscall_argdesc */
> +
> +extern const struct syscall_argdesc __start_syscalls_argdesc[];
> +extern const struct syscall_argdesc __stop_syscalls_argdesc[];
> +
> +#ifdef CONFIG_COMPAT
> +extern const struct syscall_argdesc __start_compat_syscalls_argdesc[];
> +extern const struct syscall_argdesc __stop_compat_syscalls_argdesc[];
> +#endif	/* CONFIG_COMPAT */
> diff --git a/security/security.c b/security/security.c
> index e8ffd92ae2eb..76e50345cd82 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -60,6 +60,7 @@ int __init security_init(void)
>  	 */
>  	capability_add_hooks();
>  	yama_add_hooks();
> +	seccomp_init();

Can you make this seccomp_add_hooks() instead?
That makes it a bit easier to distinguish between
the modules that are being explicitly stacked and
those that are using the generic init mechanism.

>  
>  	/*
>  	 * Load all the remaining security modules.

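This is a minimal user-space sketch, not taken from the patch. It shows how the quoted __SC_ARGDESC_TYPE() classification and the __COMPARGS padding fill the six-slot args[] array. It assumes a GCC- or Clang-compatible C11 compiler. The SACT_* codes and ARG_TYPE() are hypothetical stand-ins for __SACT__* and __SC_ARGDESC_TYPE(). The descriptor is built by hand, not through __MAP(), for the kernel's readlink(const char *path, char *buf, int bufsiz) prototype with __user dropped.

```c
#include <assert.h>

/* Hypothetical stand-ins for the patch's __SACT__* codes. */
#define SACT_NONE           0
#define SACT_OTHER          1
#define SACT_CONST_CHAR_PTR 2
#define SACT_CHAR_PTR       3

/*
 * Same decision chain as __SC_ARGDESC_TYPE(), applied to a type name.
 * __builtin_types_compatible_p() yields an integer constant expression,
 * so the result can initialize static data.
 */
#define ARG_TYPE(t)                                                  \
	(__builtin_types_compatible_p(t, const char *) ?              \
		SACT_CONST_CHAR_PTR :                                 \
	 __builtin_types_compatible_p(t, char *) ? SACT_CHAR_PTR :    \
		SACT_OTHER)

/*
 * readlink(const char *path, char *buf, int bufsiz): three real slots,
 * then zero padding (SACT_NONE), as __COMPARGS3 would append.
 */
static const unsigned char readlink_args[6] = {
	ARG_TYPE(const char *), ARG_TYPE(char *), ARG_TYPE(int),
	SACT_NONE, SACT_NONE, SACT_NONE,
};

/* Pointee qualifiers matter: const char * and char * are distinct. */
static_assert(ARG_TYPE(const char *) == SACT_CONST_CHAR_PTR,
	      "const char * is classified separately from char *");
```

The sketch parenthesizes the whole conditional expression. The patch's unparenthesized form works only because it always expands as a complete initializer element. Because __builtin_types_compatible_p() ignores top-level qualifiers, char *const still counts as a char pointer, while unsigned char * falls into the OTHER bucket.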
^ permalink raw reply	[flat|nested] 39+ messages in thread

* [kernel-hardening] Re: [RFC v1 05/17] security/seccomp: Add LSM and create arrays of syscall metadata
  2016-03-24  1:46 ` [kernel-hardening] [RFC v1 05/17] security/seccomp: Add LSM and create arrays of syscall metadata Mickaël Salaün
  2016-03-24 15:47   ` [kernel-hardening] " Casey Schaufler
@ 2016-03-24 16:01   ` Casey Schaufler
  2016-03-24 21:31     ` Mickaël Salaün
  1 sibling, 1 reply; 39+ messages in thread
From: Casey Schaufler @ 2016-03-24 16:01 UTC (permalink / raw)
  To: Mickaël Salaün, linux-security-module
  Cc: Andreas Gruenbacher, Andy Lutomirski, Andy Lutomirski,
	Arnd Bergmann, Daniel Borkmann, David Drysdale, Eric Paris,
	James Morris, Jeff Dike, Julien Tinnes, Kees Cook,
	Michael Kerrisk, Paul Moore, Richard Weinberger,
	Serge E . Hallyn, Stephen Smalley, Tetsuo Handa, Will Drewry,
	linux-api, kernel-hardening

On 3/23/2016 6:46 PM, Mickaël Salaün wrote:
> To prevent userland from making mistakes by misusing a syscall parameter,
> the kernel checks the type of the syscall parameters (e.g. char pointer).
> At compile time we create a memory section (i.e. __syscalls_argdesc) with
> syscall metadata. At boot time, this section is used to create an array
> (i.e. seccomp_syscalls_argdesc) used to check the syscall arguments.
> In the same way, another array is created and used for compat mode.
>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Cc: Andreas Gruenbacher <agruenba@redhat.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Casey Schaufler <casey@schaufler-ca.com>
> Cc: David Drysdale <drysdale@google.com>
> Cc: James Morris <james.l.morris@oracle.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Paul Moore <pmoore@redhat.com>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Cc: Stephen Smalley <sds@tycho.nsa.gov>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Will Drewry <wad@chromium.org>
> ---
>  include/asm-generic/vmlinux.lds.h | 22 ++++++++++
>  include/linux/compat.h            | 10 +++++
>  include/linux/lsm_hooks.h         |  5 +++
>  include/linux/syscalls.h          | 68 ++++++++++++++++++++++++++++++
>  security/Kconfig                  |  1 +
>  security/Makefile                 |  2 +
>  security/seccomp/Kconfig          | 14 +++++++
>  security/seccomp/Makefile         |  3 ++
>  security/seccomp/lsm.c            | 87 +++++++++++++++++++++++++++++++++++++++
>  security/seccomp/lsm.h            | 19 +++++++++
>  security/security.c               |  1 +
>  11 files changed, 232 insertions(+)
>  create mode 100644 security/seccomp/Kconfig
>  create mode 100644 security/seccomp/Makefile
>  create mode 100644 security/seccomp/lsm.c
>  create mode 100644 security/seccomp/lsm.h
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index c4bd0e2c173c..b8792fc083c2 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -153,6 +153,26 @@
>  #define TRACE_SYSCALLS()
>  #endif
>  
> +#ifdef CONFIG_SECURITY_SECCOMP
> +#define ARGDESC_SYSCALLS() . = ALIGN(8);				\
> +			 VMLINUX_SYMBOL(__start_syscalls_argdesc) = .;	\
> +			 *(__syscalls_argdesc)				\
> +			 VMLINUX_SYMBOL(__stop_syscalls_argdesc) = .;
> +
> +#ifdef CONFIG_COMPAT
> +#define COMPAT_ARGDESC_SYSCALLS() . = ALIGN(8);				\
> +		 VMLINUX_SYMBOL(__start_compat_syscalls_argdesc) = .;	\
> +		 *(__compat_syscalls_argdesc)				\
> +		 VMLINUX_SYMBOL(__stop_compat_syscalls_argdesc) = .;
> +#else
> +#define COMPAT_ARGDESC_SYSCALLS()
> +#endif	/* CONFIG_COMPAT */
> +
> +#else
> +#define ARGDESC_SYSCALLS()
> +#define COMPAT_ARGDESC_SYSCALLS()
> +#endif /* CONFIG_SECURITY_SECCOMP */
> +
>  #ifdef CONFIG_SERIAL_EARLYCON
>  #define EARLYCON_TABLE() STRUCT_ALIGN();			\
>  			 VMLINUX_SYMBOL(__earlycon_table) = .;	\
> @@ -511,6 +531,8 @@
>  	MEM_DISCARD(init.data)						\
>  	KERNEL_CTORS()							\
>  	MCOUNT_REC()							\
> +	ARGDESC_SYSCALLS()						\
> +	COMPAT_ARGDESC_SYSCALLS()					\
>  	*(.init.rodata)							\
>  	FTRACE_EVENTS()							\
>  	TRACE_SYSCALLS()						\
> diff --git a/include/linux/compat.h b/include/linux/compat.h
> index a76c9172b2eb..b63579a401e8 100644
> --- a/include/linux/compat.h
> +++ b/include/linux/compat.h
> @@ -15,6 +15,7 @@
>  #include <linux/fs.h>
>  #include <linux/aio_abi.h>	/* for aio_context_t */
>  #include <linux/unistd.h>
> +#include <linux/syscalls.h>	/* for SYSCALL_FILL_ARGDESC_SECTION */
>  
>  #include <asm/compat.h>
>  #include <asm/siginfo.h>
> @@ -28,7 +29,15 @@
>  #define __SC_DELOUSE(t,v) ((t)(unsigned long)(v))
>  #endif
>  
> +#ifdef CONFIG_SECURITY_SECCOMP
> +#define COMPAT_SYSCALL_FILL_ARGDESC(...)	\
> +	SYSCALL_FILL_ARGDESC_SECTION("__compat_syscalls_argdesc", __VA_ARGS__)
> +#else
> +#define COMPAT_SYSCALL_FILL_ARGDESC(...)
> +#endif /* CONFIG_SECURITY_SECCOMP */
> +
>  #define COMPAT_SYSCALL_DEFINE0(name) \
> +	COMPAT_SYSCALL_FILL_ARGDESC(compat_sys_##name, 0)	\
>  	asmlinkage long compat_sys_##name(void)
>  
>  #define COMPAT_SYSCALL_DEFINE1(name, ...) \
> @@ -45,6 +54,7 @@
>  	COMPAT_SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
>  
>  #define COMPAT_SYSCALL_DEFINEx(x, name, ...)				\
> +	COMPAT_SYSCALL_FILL_ARGDESC(compat_sys##name, x, __VA_ARGS__)	\
>  	asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))\
>  		__attribute__((alias(__stringify(compat_SyS##name))));  \
>  	static inline long C_SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index 71969de4058c..12df41669308 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -1892,5 +1892,10 @@ extern void __init yama_add_hooks(void);
>  #else
>  static inline void __init yama_add_hooks(void) { }
>  #endif
> +#ifdef CONFIG_SECURITY_SECCOMP
> +extern void __init seccomp_init(void);
> +#else
> +static inline void __init seccomp_init(void) { }
> +#endif
>  
>  #endif /* ! __LINUX_LSM_HOOKS_H */
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 185815c96433..0f846c408bba 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -79,6 +79,8 @@ union bpf_attr;
>  #include <linux/quota.h>
>  #include <linux/key.h>
>  #include <trace/syscall.h>
> +#include <uapi/asm/unistd.h>
> +#include <linux/seccomp.h>
>  
>  /*
>   * __MAP - apply a macro to syscall arguments
> @@ -98,6 +100,24 @@ union bpf_attr;
>  #define __MAP6(m,t,a,...) m(t,a), __MAP5(m,__VA_ARGS__)
>  #define __MAP(n,...) __MAP##n(__VA_ARGS__)
>  
> +#define __COMPARGS6
> +#define __COMPARGS5 , 0
> +#define __COMPARGS4 , 0, 0
> +#define __COMPARGS3 , 0, 0, 0
> +#define __COMPARGS2 , 0, 0, 0, 0
> +#define __COMPARGS1 , 0, 0, 0, 0, 0
> +#define __COMPARGS0 0, 0, 0, 0, 0, 0
> +#define __COMPARGS(n) __COMPARGS##n
> +
> +#define __COMPDECL6
> +#define __COMPDECL5
> +#define __COMPDECL4
> +#define __COMPDECL3
> +#define __COMPDECL2
> +#define __COMPDECL1
> +#define __COMPDECL0 void
> +#define __COMPDECL(n) __COMPDECL##n
> +
>  #define __SC_DECL(t, a)	t a
>  #define __TYPE_IS_L(t)	(__same_type((t)0, 0L))
>  #define __TYPE_IS_UL(t)	(__same_type((t)0, 0UL))
> @@ -175,8 +195,55 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>  #define SYSCALL_METADATA(sname, nb, ...)
>  #endif
>  
> +#ifdef CONFIG_SECURITY_SECCOMP
> +/*
> + * Do not store the symbol name but the syscall symbol address.
> + * FIXME: Handle aliased symbols (i.e. different name but same address)?
> + *
> + * @addr: syscall address
> + * @args: syscall arguments C type (i.e. __SACT__* values)
> + */
> +struct syscall_argdesc {
> +	const void *addr;
> +	u8 args[6];
> +};
> +
> +/* Syscall Argument C Type (none means no argument) */
> +#define __SACT__NONE			0
> +#define __SACT__OTHER			1
> +#define __SACT__CONST_CHAR_PTR		2
> +#define __SACT__CHAR_PTR		3
> +
> +#define __SC_ARGDESC_TYPE(t, a)						\
> +	__builtin_types_compatible_p(typeof(t), const char *) ?		\
> +	__SACT__CONST_CHAR_PTR :					\
> +	__builtin_types_compatible_p(typeof(t), char *) ?		\
> +	__SACT__CHAR_PTR :						\
> +	__SACT__OTHER
> +
> +#define SYSCALL_FILL_ARGDESC_SECTION(_section, sname, nb, ...)		\
> +	asmlinkage long sname(__MAP(nb, __SC_DECL, __VA_ARGS__)		\
> +			__COMPDECL(nb));				\
> +	static struct syscall_argdesc __used				\
> +		__attribute__((section(_section)))			\
> +		syscall_argdesc_##sname = {				\
> +			.addr = sname,					\
> +			.args = {					\
> +				__MAP(nb, __SC_ARGDESC_TYPE, __VA_ARGS__)\
> +				__COMPARGS(nb)				\
> +			},						\
> +		};
> +
> +#define SYSCALL_FILL_ARGDESC(...)	\
> +	SYSCALL_FILL_ARGDESC_SECTION("__syscalls_argdesc", __VA_ARGS__)
> +
> +#else
> +#define SYSCALL_FILL_ARGDESC(...)
> +#endif /* CONFIG_SECURITY_SECCOMP */
> +
>  #define SYSCALL_DEFINE0(sname)					\
>  	SYSCALL_METADATA(_##sname, 0);				\
> +	SYSCALL_FILL_ARGDESC(sys_##sname, 0)			\
>  	asmlinkage long sys_##sname(void)
>  
>  #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
> @@ -188,6 +255,7 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>  
>  #define SYSCALL_DEFINEx(x, sname, ...)				\
>  	SYSCALL_METADATA(sname, x, __VA_ARGS__)			\
> +	SYSCALL_FILL_ARGDESC(sys##sname, x, __VA_ARGS__)	\
>  	__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
>  
>  #define __PROTECT(...) asmlinkage_protect(__VA_ARGS__)
> diff --git a/security/Kconfig b/security/Kconfig
> index e45237897b43..c98fe1a924cd 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -123,6 +123,7 @@ source security/smack/Kconfig
>  source security/tomoyo/Kconfig
>  source security/apparmor/Kconfig
>  source security/yama/Kconfig
> +source security/seccomp/Kconfig
>  
>  source security/integrity/Kconfig
>  
> diff --git a/security/Makefile b/security/Makefile
> index c9bfbc84ff50..0e4cdefc4777 100644
> --- a/security/Makefile
> +++ b/security/Makefile
> @@ -8,6 +8,7 @@ subdir-$(CONFIG_SECURITY_SMACK)		+= smack
>  subdir-$(CONFIG_SECURITY_TOMOYO)        += tomoyo
>  subdir-$(CONFIG_SECURITY_APPARMOR)	+= apparmor
>  subdir-$(CONFIG_SECURITY_YAMA)		+= yama
> +subdir-$(CONFIG_SECCOMP_FILTER)		+= seccomp
>  
>  # always enable default capabilities
>  obj-y					+= commoncap.o
> @@ -22,6 +23,7 @@ obj-$(CONFIG_AUDIT)			+= lsm_audit.o
>  obj-$(CONFIG_SECURITY_TOMOYO)		+= tomoyo/
>  obj-$(CONFIG_SECURITY_APPARMOR)		+= apparmor/
>  obj-$(CONFIG_SECURITY_YAMA)		+= yama/
> +obj-$(CONFIG_SECCOMP_FILTER)	+= seccomp/
>  obj-$(CONFIG_CGROUP_DEVICE)		+= device_cgroup.o
>  
>  # Object integrity file lists
> diff --git a/security/seccomp/Kconfig b/security/seccomp/Kconfig
> new file mode 100644
> index 000000000000..7b0fe649ed89
> --- /dev/null
> +++ b/security/seccomp/Kconfig
> @@ -0,0 +1,14 @@
> +config SECURITY_SECCOMP
> +	bool "Seccomp LSM support"
> +	depends on AUDIT
> +	depends on SECCOMP
> +	depends on SECURITY
> +	default y
> +	help
> +	  This selects an extension to the Seccomp BPF to be able to filter
> +	  syscall arguments as kernel objects (e.g. file path).
> +	  This stacked LSM is needed to detect and block race-condition attacks
> +	  against argument evaluation (i.e. TOCTOU). Further information can be
> +	  found in Documentation/prctl/seccomp_filter.txt .
> +
> +	  If you are unsure how to answer this question, answer Y.
> diff --git a/security/seccomp/Makefile b/security/seccomp/Makefile
> new file mode 100644
> index 000000000000..f2e848d81138
> --- /dev/null
> +++ b/security/seccomp/Makefile
> @@ -0,0 +1,3 @@
> +obj-$(CONFIG_SECURITY_SECCOMP) := seccomp.o
> +
> +seccomp-y := lsm.o
> diff --git a/security/seccomp/lsm.c b/security/seccomp/lsm.c
> new file mode 100644
> index 000000000000..93c881724341
> --- /dev/null
> +++ b/security/seccomp/lsm.c
> @@ -0,0 +1,87 @@
> +/*
> + * Seccomp Linux Security Module
> + *
> + * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2, as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <asm/syscall.h>	/* sys_call_table */
> +#include <linux/compat.h>
> +#include <linux/slab.h>	/* kcalloc() */
> +#include <linux/syscalls.h>	/* syscall_argdesc */
> +
> +#include "lsm.h"
> +
> +/* TODO: Remove the need for CONFIG_SYSFS dependency */
> +
> +struct syscall_argdesc (*seccomp_syscalls_argdesc)[] = NULL;
> +#ifdef CONFIG_COMPAT
> +struct syscall_argdesc (*compat_seccomp_syscalls_argdesc)[] = NULL;
> +#endif	/* CONFIG_COMPAT */
> +
> +static const struct syscall_argdesc *__init
> +find_syscall_argdesc(const struct syscall_argdesc *start,
> +		const struct syscall_argdesc *stop, const void *addr)
> +{
> +	if (unlikely(!addr || !start || !stop)) {
> +		WARN_ON(1);
> +		return NULL;
> +	}
> +
> +	for (; start < stop; start++) {
> +		if (start->addr == addr)
> +			return start;
> +	}
> +	return NULL;
> +}
> +
> +static inline void __init init_argdesc(void)
> +{
> +	const struct syscall_argdesc *argdesc;
> +	const void *addr;
> +	int i;
> +
> +	seccomp_syscalls_argdesc = kcalloc(NR_syscalls,
> +			sizeof((*seccomp_syscalls_argdesc)[0]), GFP_KERNEL);
> +	if (unlikely(!seccomp_syscalls_argdesc)) {
> +		WARN_ON(1);
> +		return;
> +	}
> +	for (i = 0; i < NR_syscalls; i++) {
> +		addr = sys_call_table[i];
> +		argdesc = find_syscall_argdesc(__start_syscalls_argdesc,
> +				__stop_syscalls_argdesc, addr);
> +		if (!argdesc)
> +			continue;
> +
> +		(*seccomp_syscalls_argdesc)[i] = *argdesc;
> +	}
> +
> +#ifdef CONFIG_COMPAT
> +	compat_seccomp_syscalls_argdesc = kcalloc(IA32_NR_syscalls,
> +			sizeof((*compat_seccomp_syscalls_argdesc)[0]),
> +			GFP_KERNEL);
> +	if (unlikely(!compat_seccomp_syscalls_argdesc)) {
> +		WARN_ON(1);
> +		return;
> +	}
> +	for (i = 0; i < IA32_NR_syscalls; i++) {
> +		addr = ia32_sys_call_table[i];
> +		argdesc = find_syscall_argdesc(__start_compat_syscalls_argdesc,
> +				__stop_compat_syscalls_argdesc, addr);
> +		if (!argdesc)
> +			continue;
> +
> +		(*compat_seccomp_syscalls_argdesc)[i] = *argdesc;
> +	}
> +#endif	/* CONFIG_COMPAT */
> +}
> +
> +void __init seccomp_init(void)
> +{
> +	pr_info("seccomp: Becoming ready for sandboxing\n");
> +	init_argdesc();
> +}

This isn't using the LSM infrastructure at all, is it?
It looks like the only reason you're calling it a security
module is to get the initialization code called in
security_init().

Let me amend my previous comment, which was to change
the name of seccomp_init(). Leave it as is, but add a
comment before it that explains why you've put the
call in the midst of the security module initialization.

> diff --git a/security/seccomp/lsm.h b/security/seccomp/lsm.h
> new file mode 100644
> index 000000000000..ededbd27c225
> --- /dev/null
> +++ b/security/seccomp/lsm.h
> @@ -0,0 +1,19 @@
> +/*
> + * Seccomp Linux Security Module
> + *
> + * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2, as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/syscalls.h>	/* syscall_argdesc */
> +
> +extern const struct syscall_argdesc __start_syscalls_argdesc[];
> +extern const struct syscall_argdesc __stop_syscalls_argdesc[];
> +
> +#ifdef CONFIG_COMPAT
> +extern const struct syscall_argdesc __start_compat_syscalls_argdesc[];
> +extern const struct syscall_argdesc __stop_compat_syscalls_argdesc[];
> +#endif	/* CONFIG_COMPAT */
> diff --git a/security/security.c b/security/security.c
> index e8ffd92ae2eb..76e50345cd82 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -60,6 +60,7 @@ int __init security_init(void)
>  	 */
>  	capability_add_hooks();
>  	yama_add_hooks();
> +	seccomp_init();
>  
>  	/*
>  	 * Load all the remaining security modules.

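The commit message describes a two-step scheme. At build time, each SYSCALL_DEFINEx() drops a struct syscall_argdesc into the __syscalls_argdesc section. At boot, init_argdesc() walks sys_call_table[] and matches each entry by address against the range between __start_syscalls_argdesc and __stop_syscalls_argdesc. The user-space sketch below mimics that flow under stated assumptions. It relies on GNU ld (or a compatible ELF linker) automatically defining __start_<sec>/__stop_<sec> for a section whose name is a valid C identifier, in place of the kernel's linker-script symbols. demo_table[] is a hypothetical stand-in for sys_call_table[], and all demo_* names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the patch's struct syscall_argdesc. */
struct argdesc {
	const void *addr;   /* entry-point address used as the lookup key */
	uint8_t args[6];    /* per-argument type codes, zero-padded */
};

static_assert(sizeof(((struct argdesc *)0)->args) == 6,
	      "one slot per possible syscall argument");

#define ARG_NONE       0
#define ARG_OTHER      1
#define ARG_CONST_CHAR 2

/* Hypothetical "syscalls" to describe. */
static long demo_getpid(void) { return 42; }
static long demo_open(const char *path, int flags) { (void)path; return flags; }
static long demo_unregistered(void) { return -1; }

/*
 * Build-time step: emit one descriptor per function into the
 * "argdesc_demo" section; "used" stops the compiler from dropping it.
 */
#define REGISTER_ARGDESC(fn, ...)                                   \
	static const struct argdesc argdesc_##fn                    \
	__attribute__((used, section("argdesc_demo"))) = {          \
		.addr = (const void *)fn, .args = { __VA_ARGS__ } }

REGISTER_ARGDESC(demo_getpid, ARG_NONE);
REGISTER_ARGDESC(demo_open, ARG_CONST_CHAR, ARG_OTHER);

/* Defined by the linker because the section name is a C identifier. */
extern const struct argdesc __start_argdesc_demo[];
extern const struct argdesc __stop_argdesc_demo[];

/* Stand-in for sys_call_table[]: index = syscall number. */
static const void *const demo_table[] = {
	(const void *)demo_getpid,
	(const void *)demo_open,
	(const void *)demo_unregistered,
};
#define DEMO_NR (sizeof(demo_table) / sizeof(demo_table[0]))

/* Boot-time result; zero-initialized like kcalloc(). */
static struct argdesc demo_argdesc[DEMO_NR];

/* Linear search by address, like find_syscall_argdesc(). */
static const struct argdesc *find_argdesc(const void *addr)
{
	const struct argdesc *d;

	for (d = __start_argdesc_demo; d < __stop_argdesc_demo; d++)
		if (d->addr == addr)
			return d;
	return NULL;
}

/* Boot-time step, like init_argdesc(): index descriptors by number. */
static void build_argdesc_table(void)
{
	for (size_t i = 0; i < DEMO_NR; i++) {
		const struct argdesc *d = find_argdesc(demo_table[i]);

		if (d)
			demo_argdesc[i] = *d;
	}
}
```

As with find_syscall_argdesc(), the lookup is a linear scan per table slot. That is acceptable only because it runs once at initialization (marked __init in the patch). Slots whose function has no descriptor, like demo_unregistered here, keep their zeroed entry.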

* [kernel-hardening] Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing
  2016-03-24  1:46 [kernel-hardening] [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing Mickaël Salaün
                   ` (8 preceding siblings ...)
  2016-03-24  2:53 ` [kernel-hardening] [RFC v1 09/17] selftest/seccomp: Extend seccomp_data until matches[6] Mickaël Salaün
@ 2016-03-24 16:24 ` Kees Cook
  2016-03-27  5:03   ` Loganaden Velvindron
  2016-04-20 18:21 ` Mickaël Salaün
  2016-04-28  2:36 ` Kees Cook
  11 siblings, 1 reply; 39+ messages in thread
From: Kees Cook @ 2016-03-24 16:24 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-security-module, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Michael Kerrisk, Paul Moore, Richard Weinberger,
	Serge E . Hallyn, Stephen Smalley, Tetsuo Handa, Will Drewry,
	Linux API, kernel-hardening

On Wed, Mar 23, 2016 at 6:46 PM, Mickaël Salaün <mic@digikod.net> wrote:
> Hi,
>
> This series is a proof of concept (not ready for production) to extend seccomp
> with the ability to check argument pointers of syscalls as kernel objects (e.g.
> file path). This adds a needed feature to create a full sandbox managed by
> userland like the Seatbelt/XNU Sandbox or the OpenBSD Pledge. It was initially
> inspired by a partial seccomp-LSM prototype [1] but has evolved a lot since :)

This is interesting! I'd really like to get argument inspection
working. I'm going to spend some time examining this series more
closely, but my initial reaction is that I'm suspicious of the ToCToU
checking -- I'd rather there be no race at all. As for the bug-fixes,
I'll get those pulled in now. Thanks!

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

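Kees's concern is the window between checking what a user pointer refers to and the kernel using it. The deterministic user-space sketch below illustrates the general double-fetch problem; it is not the series' TOCTOU protection. racy_open() checks a shared buffer in place and then reads it again. copy_once_open() reads it once into private memory, then checks and uses only that copy. The between_check_and_use hook is a hypothetical stand-in for a concurrent writer thread, and every name here is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_PATH_LEN 64

/* Shared "user memory" that another thread may rewrite at any moment. */
static char user_path[DEMO_PATH_LEN];

/* Hypothetical hook standing in for a concurrent writer. */
static void (*between_check_and_use)(void);

/* Toy policy: only paths under /tmp/ are allowed. */
static bool path_allowed(const char *p)
{
	return strncmp(p, "/tmp/", 5) == 0;
}

/* Racy: check the shared buffer in place, then fetch it a second time. */
static bool racy_open(char *used, size_t len)
{
	assert(len > 0);
	if (!path_allowed(user_path))       /* first fetch: check */
		return false;
	if (between_check_and_use)
		between_check_and_use();    /* the race window */
	strncpy(used, user_path, len - 1);  /* second fetch: use */
	used[len - 1] = '\0';
	return true;
}

/* Race-free: fetch once into private memory, then check and use that copy. */
static bool copy_once_open(char *used, size_t len)
{
	assert(len > 0);
	strncpy(used, user_path, len - 1);  /* single fetch */
	used[len - 1] = '\0';
	if (between_check_and_use)
		between_check_and_use();    /* later writes no longer matter */
	return path_allowed(used);
}

/* Simulated attacker: swaps the path after it has been checked. */
static void attacker_swap(void)
{
	strcpy(user_path, "/etc/shadow");
}
```

In the racy version, the check approves /tmp/ok while the value actually used is /etc/shadow. In the copy-once version, the checked value and the used value are the same by construction.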

* [kernel-hardening] Re: [RFC v1 05/17] security/seccomp: Add LSM and create arrays of syscall metadata
  2016-03-24 16:01   ` Casey Schaufler
@ 2016-03-24 21:31     ` Mickaël Salaün
  0 siblings, 0 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-24 21:31 UTC (permalink / raw)
  To: Casey Schaufler, linux-security-module
  Cc: Andreas Gruenbacher, Andy Lutomirski, Andy Lutomirski,
	Arnd Bergmann, Daniel Borkmann, David Drysdale, Eric Paris,
	James Morris, Jeff Dike, Julien Tinnes, Kees Cook,
	Michael Kerrisk, Paul Moore, Richard Weinberger,
	Serge E . Hallyn, Stephen Smalley, Tetsuo Handa, Will Drewry,
	linux-api, kernel-hardening


On 24/03/2016 17:01, Casey Schaufler wrote:
> On 3/23/2016 6:46 PM, Mickaël Salaün wrote:
>> diff --git a/security/seccomp/lsm.c b/security/seccomp/lsm.c
>> new file mode 100644
>> index 000000000000..93c881724341
>> --- /dev/null
>> +++ b/security/seccomp/lsm.c
>> @@ -0,0 +1,87 @@
>> +/*
>> + * Seccomp Linux Security Module
>> + *
>> + * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2, as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +#include <asm/syscall.h>	/* sys_call_table */
>> +#include <linux/compat.h>
>> +#include <linux/slab.h>	/* kcalloc() */
>> +#include <linux/syscalls.h>	/* syscall_argdesc */
>> +
>> +#include "lsm.h"
>> +
>> +/* TODO: Remove the need for CONFIG_SYSFS dependency */
>> +
>> +struct syscall_argdesc (*seccomp_syscalls_argdesc)[] = NULL;
>> +#ifdef CONFIG_COMPAT
>> +struct syscall_argdesc (*compat_seccomp_syscalls_argdesc)[] = NULL;
>> +#endif	/* CONFIG_COMPAT */
>> +
>> +static const struct syscall_argdesc *__init
>> +find_syscall_argdesc(const struct syscall_argdesc *start,
>> +		const struct syscall_argdesc *stop, const void *addr)
>> +{
>> +	if (unlikely(!addr || !start || !stop)) {
>> +		WARN_ON(1);
>> +		return NULL;
>> +	}
>> +
>> +	for (; start < stop; start++) {
>> +		if (start->addr == addr)
>> +			return start;
>> +	}
>> +	return NULL;
>> +}
>> +
>> +static inline void __init init_argdesc(void)
>> +{
>> +	const struct syscall_argdesc *argdesc;
>> +	const void *addr;
>> +	int i;
>> +
>> +	seccomp_syscalls_argdesc = kcalloc(NR_syscalls,
>> +			sizeof((*seccomp_syscalls_argdesc)[0]), GFP_KERNEL);
>> +	if (unlikely(!seccomp_syscalls_argdesc)) {
>> +		WARN_ON(1);
>> +		return;
>> +	}
>> +	for (i = 0; i < NR_syscalls; i++) {
>> +		addr = sys_call_table[i];
>> +		argdesc = find_syscall_argdesc(__start_syscalls_argdesc,
>> +				__stop_syscalls_argdesc, addr);
>> +		if (!argdesc)
>> +			continue;
>> +
>> +		(*seccomp_syscalls_argdesc)[i] = *argdesc;
>> +	}
>> +
>> +#ifdef CONFIG_COMPAT
>> +	compat_seccomp_syscalls_argdesc = kcalloc(IA32_NR_syscalls,
>> +			sizeof((*compat_seccomp_syscalls_argdesc)[0]),
>> +			GFP_KERNEL);
>> +	if (unlikely(!compat_seccomp_syscalls_argdesc)) {
>> +		WARN_ON(1);
>> +		return;
>> +	}
>> +	for (i = 0; i < IA32_NR_syscalls; i++) {
>> +		addr = ia32_sys_call_table[i];
>> +		argdesc = find_syscall_argdesc(__start_compat_syscalls_argdesc,
>> +				__stop_compat_syscalls_argdesc, addr);
>> +		if (!argdesc)
>> +			continue;
>> +
>> +		(*compat_seccomp_syscalls_argdesc)[i] = *argdesc;
>> +	}
>> +#endif	/* CONFIG_COMPAT */
>> +}
>> +
>> +void __init seccomp_init(void)
>> +{
>> +	pr_info("seccomp: Becoming ready for sandboxing\n");
>> +	init_argdesc();
>> +}
> 
> This isn't using the LSM infrastructure at all, is it?
> It looks like the only reason you're calling it a security
> module is to get the initialization code called in
> security_init().
> 
> Let me amend my previous comment, which was to change
> the name of seccomp_init(). Leave it as is, but add a
> comment before it that explains why you've put the
> call in the midst of the security module initialization.

The patch "[RFC v1 16/17] security/seccomp: Protect against filesystem TOCTOU" adds LSM hooks, so it makes sense to follow your first comment and rename seccomp_init() to seccomp_add_hooks().

 Mickaël


* Re: [kernel-hardening] Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing
  2016-03-24 16:24 ` [kernel-hardening] Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing Kees Cook
@ 2016-03-27  5:03   ` Loganaden Velvindron
  0 siblings, 0 replies; 39+ messages in thread
From: Loganaden Velvindron @ 2016-03-27  5:03 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Mickaël Salaün, linux-security-module,
	Andreas Gruenbacher, Andy Lutomirski, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, David Drysdale,
	Eric Paris, James Morris, Jeff Dike, Julien Tinnes,
	Michael Kerrisk, Paul Moore, Richard Weinberger,
	Serge E . Hallyn, Stephen Smalley, Tetsuo Handa, Will Drewry,
	Linux API

On Thu, Mar 24, 2016 at 4:24 PM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Mar 23, 2016 at 6:46 PM, Mickaël Salaün <mic@digikod.net> wrote:
>> Hi,
>>
>> This series is a proof of concept (not ready for production) to extend seccomp
>> with the ability to check argument pointers of syscalls as kernel objects (e.g.
>> file path). This adds a needed feature to create a full sandbox managed by
>> userland like the Seatbelt/XNU Sandbox or the OpenBSD Pledge. It was initially
>> inspired by a partial seccomp-LSM prototype [1] but has evolved a lot since :)
>
> This is interesting! I'd really like to get argument inspection
> working. I'm going to spend some time examining this series more
> closely, but my initial reaction is that I'm suspicious of the ToCToU
> checking -- I'd rather there be no race at all. As for the bug-fixes,
> I'll get those pulled in now. Thanks!
>

Personally, I love the OpenBSD pledge() mechanism. It makes it so easy
to apply attack surface reduction. If seccomp moves closer to pledge,
that would be great.

See here:
https://github.com/dimkr/libwaive

* [kernel-hardening] Re: [RFC v1 03/17] selftest/seccomp: Fix the flag name SECCOMP_FILTER_FLAG_TSYNC
  2016-03-24  4:35   ` [kernel-hardening] " Kees Cook
@ 2016-03-29 15:35     ` Shuah Khan
  2016-03-29 18:46       ` [kernel-hardening] [PATCH 1/2] " Mickaël Salaün
  0 siblings, 1 reply; 39+ messages in thread
From: Shuah Khan @ 2016-03-29 15:35 UTC (permalink / raw)
  To: Kees Cook, Mickaël Salaün
  Cc: linux-security-module, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Michael Kerrisk, Paul Moore, Richard Weinberger,
	Serge E . Hallyn, Stephen Smalley, Tetsuo Handa, Will Drewry,
	Linux API, kernel-hardening, Shuah Khan

On 03/23/2016 10:35 PM, Kees Cook wrote:
> On Wed, Mar 23, 2016 at 6:46 PM, Mickaël Salaün <mic@digikod.net> wrote:
>> Rename SECCOMP_FLAG_FILTER_TSYNC to SECCOMP_FILTER_FLAG_TSYNC to match
>> the UAPI.
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Will Drewry <wad@chromium.org>
> 
> Hah, oops. Thanks! Shuah, can you take this patch into the selftest tree?
> 
> Acked-by: Kees Cook <keescook@chromium.org>
> 

Hi Michael,

Could you please send me the patch. I can't find it in my Inbox. I can get
this into rc-2 with Kees Cook's ack.

thanks,
-- Shuah
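SECCOMP_FILTER_FLAG_TSYNC is passed as the flags argument of seccomp(2), together with SECCOMP_SET_MODE_FILTER. The following is a minimal sketch, assuming Linux 3.17 or later and UAPI headers that define __NR_seccomp. It installs an allow-everything classic BPF filter on all threads of the calling process. The fallback #defines mirror the selftest's own. allow_all_insns, allow_all_prog and install_allow_all_tsync() are illustrative names.

```c
#include <assert.h>
#include <linux/filter.h>   /* struct sock_filter, struct sock_fprog, BPF_STMT */
#include <linux/seccomp.h>  /* SECCOMP_RET_ALLOW, SECCOMP_SET_MODE_FILTER */
#include <sys/prctl.h>      /* prctl(), PR_SET_NO_NEW_PRIVS */
#include <sys/syscall.h>    /* __NR_seccomp */
#include <unistd.h>         /* syscall() */

/* Fallbacks for older headers, as in the selftest. */
#ifndef SECCOMP_SET_MODE_FILTER
#define SECCOMP_SET_MODE_FILTER 1
#endif
#ifndef SECCOMP_FILTER_FLAG_TSYNC
#define SECCOMP_FILTER_FLAG_TSYNC 1
#endif

static_assert(SECCOMP_FILTER_FLAG_TSYNC == 1,
	      "TSYNC is bit 0 of the seccomp() flags");

/* One-instruction classic BPF program: allow every syscall. */
static struct sock_filter allow_all_insns[] = {
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

static struct sock_fprog allow_all_prog = {
	.len = sizeof(allow_all_insns) / sizeof(allow_all_insns[0]),
	.filter = allow_all_insns,
};

/*
 * Install the filter on every thread of the calling process.
 * Returns 0 on success, -1 (errno set) on most errors, or, with TSYNC,
 * the TID of a thread that could not be synchronized.
 */
static long install_allow_all_tsync(void)
{
	/* Required for callers without CAP_SYS_ADMIN. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;
	return syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER,
		       SECCOMP_FILTER_FLAG_TSYNC, &allow_all_prog);
}
```

A failed TSYNC can return a thread ID instead of -1, so callers should treat any non-zero return value as failure.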
> 
>> ---
>>  tools/testing/selftests/seccomp/seccomp_bpf.c | 18 +++++++++---------
>>  1 file changed, 9 insertions(+), 9 deletions(-)
>>
>> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
>> index b9453b838162..9c1460f277c2 100644
>> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
>> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
>> @@ -1497,8 +1497,8 @@ TEST_F(TRACE_syscall, syscall_dropped)
>>  #define SECCOMP_SET_MODE_FILTER 1
>>  #endif
>>
>> -#ifndef SECCOMP_FLAG_FILTER_TSYNC
>> -#define SECCOMP_FLAG_FILTER_TSYNC 1
>> +#ifndef SECCOMP_FILTER_FLAG_TSYNC
>> +#define SECCOMP_FILTER_FLAG_TSYNC 1
>>  #endif
>>
>>  #ifndef seccomp
>> @@ -1613,7 +1613,7 @@ TEST(TSYNC_first)
>>                 TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
>>         }
>>
>> -       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
>> +       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
>>                       &prog);
>>         ASSERT_NE(ENOSYS, errno) {
>>                 TH_LOG("Kernel does not support seccomp syscall!");
>> @@ -1831,7 +1831,7 @@ TEST_F(TSYNC, two_siblings_with_ancestor)
>>                 self->sibling_count++;
>>         }
>>
>> -       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
>> +       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
>>                       &self->apply_prog);
>>         ASSERT_EQ(0, ret) {
>>                 TH_LOG("Could install filter on all threads!");
>> @@ -1892,7 +1892,7 @@ TEST_F(TSYNC, two_siblings_with_no_filter)
>>                 TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
>>         }
>>
>> -       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
>> +       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
>>                       &self->apply_prog);
>>         ASSERT_NE(ENOSYS, errno) {
>>                 TH_LOG("Kernel does not support seccomp syscall!");
>> @@ -1940,7 +1940,7 @@ TEST_F(TSYNC, two_siblings_with_one_divergence)
>>                 self->sibling_count++;
>>         }
>>
>> -       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
>> +       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
>>                       &self->apply_prog);
>>         ASSERT_EQ(self->sibling[0].system_tid, ret) {
>>                 TH_LOG("Did not fail on diverged sibling.");
>> @@ -1992,7 +1992,7 @@ TEST_F(TSYNC, two_siblings_not_under_filter)
>>                 TH_LOG("Kernel does not support SECCOMP_SET_MODE_FILTER!");
>>         }
>>
>> -       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
>> +       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
>>                       &self->apply_prog);
>>         ASSERT_EQ(ret, self->sibling[0].system_tid) {
>>                 TH_LOG("Did not fail on diverged sibling.");
>> @@ -2021,7 +2021,7 @@ TEST_F(TSYNC, two_siblings_not_under_filter)
>>         /* Switch to the remaining sibling */
>>         sib = !sib;
>>
>> -       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
>> +       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
>>                       &self->apply_prog);
>>         ASSERT_EQ(0, ret) {
>>                 TH_LOG("Expected the remaining sibling to sync");
>> @@ -2044,7 +2044,7 @@ TEST_F(TSYNC, two_siblings_not_under_filter)
>>         while (!kill(self->sibling[sib].system_tid, 0))
>>                 sleep(0.1);
>>
>> -       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
>> +       ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
>>                       &self->apply_prog);
>>         ASSERT_EQ(0, ret);  /* just us chickens */
>>  }
>> --
>> 2.8.0.rc3
>>
> 
> 
> 

* [kernel-hardening] Re: [RFC v1 04/17] selftest/seccomp: Fix the seccomp(2) signature
  2016-03-24  4:36   ` [kernel-hardening] " Kees Cook
@ 2016-03-29 15:38     ` Shuah Khan
  2016-03-29 18:51       ` [kernel-hardening] [PATCH 2/2] " Mickaël Salaün
  0 siblings, 1 reply; 39+ messages in thread
From: Shuah Khan @ 2016-03-29 15:38 UTC (permalink / raw)
  To: Kees Cook, Mickaël Salaün
  Cc: linux-security-module, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Michael Kerrisk, Paul Moore, Richard Weinberger,
	Serge E . Hallyn, Stephen Smalley, Tetsuo Handa, Will Drewry,
	Linux API, kernel-hardening, Shuah Khan

On 03/23/2016 10:36 PM, Kees Cook wrote:
> On Wed, Mar 23, 2016 at 6:46 PM, Mickaël Salaün <mic@digikod.net> wrote:
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Will Drewry <wad@chromium.org>
> 
> Another good catch. Shuah, can you take this one too?
> 
> Acked-by: Kees Cook <keescook@chromium.org>
> 
> -Kees

Hi Michael,

Could you please send me the patch? I can't find it in my inbox. I can get
this into rc2 with Kees Cook's ack.

thanks,
-- Shuah

> 
>> ---
>>  tools/testing/selftests/seccomp/seccomp_bpf.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
>> index 9c1460f277c2..150829dd7998 100644
>> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
>> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
>> @@ -1502,10 +1502,10 @@ TEST_F(TRACE_syscall, syscall_dropped)
>>  #endif
>>
>>  #ifndef seccomp
>> -int seccomp(unsigned int op, unsigned int flags, struct sock_fprog *filter)
>> +int seccomp(unsigned int op, unsigned int flags, void *args)
>>  {
>>         errno = 0;
>> -       return syscall(__NR_seccomp, op, flags, filter);
>> +       return syscall(__NR_seccomp, op, flags, args);
>>  }
>>  #endif
>>
>> --
>> 2.8.0.rc3
>>
> 
> 
> 

* [kernel-hardening] [PATCH 1/2] selftest/seccomp: Fix the flag name SECCOMP_FILTER_FLAG_TSYNC
  2016-03-29 15:35     ` Shuah Khan
@ 2016-03-29 18:46       ` Mickaël Salaün
  2016-03-29 19:06         ` [kernel-hardening] " Shuah Khan
  0 siblings, 1 reply; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-29 18:46 UTC (permalink / raw)
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Shuah Khan,
	Stephen Smalley, Tetsuo Handa, Will Drewry, linux-api,
	kernel-hardening

Rename SECCOMP_FLAG_FILTER_TSYNC to SECCOMP_FILTER_FLAG_TSYNC to match
the UAPI.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Will Drewry <wad@chromium.org>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index b9453b838162..9c1460f277c2 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1497,8 +1497,8 @@ TEST_F(TRACE_syscall, syscall_dropped)
 #define SECCOMP_SET_MODE_FILTER 1
 #endif
 
-#ifndef SECCOMP_FLAG_FILTER_TSYNC
-#define SECCOMP_FLAG_FILTER_TSYNC 1
+#ifndef SECCOMP_FILTER_FLAG_TSYNC
+#define SECCOMP_FILTER_FLAG_TSYNC 1
 #endif
 
 #ifndef seccomp
@@ -1613,7 +1613,7 @@ TEST(TSYNC_first)
 		TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
 	}
 
-	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
 		      &prog);
 	ASSERT_NE(ENOSYS, errno) {
 		TH_LOG("Kernel does not support seccomp syscall!");
@@ -1831,7 +1831,7 @@ TEST_F(TSYNC, two_siblings_with_ancestor)
 		self->sibling_count++;
 	}
 
-	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
 		      &self->apply_prog);
 	ASSERT_EQ(0, ret) {
 		TH_LOG("Could install filter on all threads!");
@@ -1892,7 +1892,7 @@ TEST_F(TSYNC, two_siblings_with_no_filter)
 		TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
 	}
 
-	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
 		      &self->apply_prog);
 	ASSERT_NE(ENOSYS, errno) {
 		TH_LOG("Kernel does not support seccomp syscall!");
@@ -1940,7 +1940,7 @@ TEST_F(TSYNC, two_siblings_with_one_divergence)
 		self->sibling_count++;
 	}
 
-	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
 		      &self->apply_prog);
 	ASSERT_EQ(self->sibling[0].system_tid, ret) {
 		TH_LOG("Did not fail on diverged sibling.");
@@ -1992,7 +1992,7 @@ TEST_F(TSYNC, two_siblings_not_under_filter)
 		TH_LOG("Kernel does not support SECCOMP_SET_MODE_FILTER!");
 	}
 
-	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
 		      &self->apply_prog);
 	ASSERT_EQ(ret, self->sibling[0].system_tid) {
 		TH_LOG("Did not fail on diverged sibling.");
@@ -2021,7 +2021,7 @@ TEST_F(TSYNC, two_siblings_not_under_filter)
 	/* Switch to the remaining sibling */
 	sib = !sib;
 
-	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
 		      &self->apply_prog);
 	ASSERT_EQ(0, ret) {
 		TH_LOG("Expected the remaining sibling to sync");
@@ -2044,7 +2044,7 @@ TEST_F(TSYNC, two_siblings_not_under_filter)
 	while (!kill(self->sibling[sib].system_tid, 0))
 		sleep(0.1);
 
-	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
 		      &self->apply_prog);
 	ASSERT_EQ(0, ret);  /* just us chickens */
 }
-- 
2.8.0.rc3

* [kernel-hardening] [PATCH 2/2] selftest/seccomp: Fix the seccomp(2) signature
  2016-03-29 15:38     ` Shuah Khan
@ 2016-03-29 18:51       ` Mickaël Salaün
  2016-03-29 19:07         ` [kernel-hardening] " Shuah Khan
  0 siblings, 1 reply; 39+ messages in thread
From: Mickaël Salaün @ 2016-03-29 18:51 UTC (permalink / raw)
  To: linux-security-module
  Cc: Mickaël Salaün, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Kees Cook, Michael Kerrisk, Paul Moore,
	Richard Weinberger, Serge E . Hallyn, Shuah Khan,
	Stephen Smalley, Tetsuo Handa, Will Drewry, linux-api,
	kernel-hardening

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Will Drewry <wad@chromium.org>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 9c1460f277c2..150829dd7998 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1502,10 +1502,10 @@ TEST_F(TRACE_syscall, syscall_dropped)
 #endif
 
 #ifndef seccomp
-int seccomp(unsigned int op, unsigned int flags, struct sock_fprog *filter)
+int seccomp(unsigned int op, unsigned int flags, void *args)
 {
 	errno = 0;
-	return syscall(__NR_seccomp, op, flags, filter);
+	return syscall(__NR_seccomp, op, flags, args);
 }
 #endif
 
-- 
2.8.0.rc3
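
These two fixes belong together. The UAPI header spells the flag SECCOMP_FILTER_FLAG_TSYNC, and the third seccomp(2) argument is a void * because the operation determines its type. Below is a minimal sketch of a wrapper shaped like the fixed selftest helper, used to install a trivial allow-all filter on every thread. The names sys_seccomp(), allow_all and install_allow_all_tsync() are illustrative and do not come from the selftests.

```c
#define _GNU_SOURCE
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Same shape as the fixed selftest helper: the last argument is void *
 * because seccomp(2) interprets it according to op. */
static int sys_seccomp(unsigned int op, unsigned int flags, void *args)
{
	return (int)syscall(__NR_seccomp, op, flags, args);
}

/* Trivial filter that allows every syscall. */
static struct sock_filter allow_all[] = {
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/*
 * Install allow_all on every thread of the calling process.
 * Returns 0 on success, -1 with errno set on error, or (as the TSYNC
 * selftests check) the TID of a thread that could not be synchronized.
 */
static int install_allow_all_tsync(void)
{
	struct sock_fprog prog = {
		.len = sizeof(allow_all) / sizeof(allow_all[0]),
		.filter = allow_all,
	};

	/* Without CAP_SYS_ADMIN, installing a filter requires no_new_privs. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -1;
	return sys_seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
			   &prog);
}
```

The old misspelled SECCOMP_FLAG_FILTER_TSYNC fallback happened to have the same value (1), so the tests still passed. That is probably why the typo went unnoticed.
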

* [kernel-hardening] Re: [PATCH 1/2] selftest/seccomp: Fix the flag name SECCOMP_FILTER_FLAG_TSYNC
  2016-03-29 18:46       ` [kernel-hardening] [PATCH 1/2] " Mickaël Salaün
@ 2016-03-29 19:06         ` Shuah Khan
  0 siblings, 0 replies; 39+ messages in thread
From: Shuah Khan @ 2016-03-29 19:06 UTC (permalink / raw)
  To: Mickaël Salaün, linux-security-module
  Cc: Andreas Gruenbacher, Andy Lutomirski, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, David Drysdale,
	Eric Paris, James Morris, Jeff Dike, Julien Tinnes, Kees Cook,
	Michael Kerrisk, Paul Moore, Richard Weinberger,
	Serge E . Hallyn, Stephen Smalley, Tetsuo Handa, Will Drewry,
	linux-api, kernel-hardening, Shuah Khan

On 03/29/2016 12:46 PM, Mickaël Salaün wrote:
> Rename SECCOMP_FLAG_FILTER_TSYNC to SECCOMP_FILTER_FLAG_TSYNC to match
> the UAPI.
> 
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Shuah Khan <shuahkh@osg.samsung.com>
> Cc: Will Drewry <wad@chromium.org>
> ---
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)

Thanks. Applied to linux-kselftest fixes for 4.6-rc2

-- Shuah

* [kernel-hardening] Re: [PATCH 2/2] selftest/seccomp: Fix the seccomp(2) signature
  2016-03-29 18:51       ` [kernel-hardening] [PATCH 2/2] " Mickaël Salaün
@ 2016-03-29 19:07         ` Shuah Khan
  0 siblings, 0 replies; 39+ messages in thread
From: Shuah Khan @ 2016-03-29 19:07 UTC (permalink / raw)
  To: Mickaël Salaün, linux-security-module
  Cc: Andreas Gruenbacher, Andy Lutomirski, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, David Drysdale,
	Eric Paris, James Morris, Jeff Dike, Julien Tinnes, Kees Cook,
	Michael Kerrisk, Paul Moore, Richard Weinberger,
	Serge E . Hallyn, Stephen Smalley, Tetsuo Handa, Will Drewry,
	linux-api, kernel-hardening, Shuah Khan

On 03/29/2016 12:51 PM, Mickaël Salaün wrote:
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Shuah Khan <shuahkh@osg.samsung.com>
> Cc: Will Drewry <wad@chromium.org>
> ---

Thanks. Applied to linux-kselftest fixes for 4.6-rc2

-- Shuah

* [kernel-hardening] Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing
  2016-03-24  1:46 [kernel-hardening] [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing Mickaël Salaün
                   ` (9 preceding siblings ...)
  2016-03-24 16:24 ` [kernel-hardening] Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing Kees Cook
@ 2016-04-20 18:21 ` Mickaël Salaün
  2016-04-26 22:46   ` Kees Cook
  2016-04-28  2:36 ` Kees Cook
  11 siblings, 1 reply; 39+ messages in thread
From: Mickaël Salaün @ 2016-04-20 18:21 UTC (permalink / raw)
  To: linux-security-module
  Cc: Andreas Gruenbacher, Andy Lutomirski, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, David Drysdale,
	Eric Paris, James Morris, Jeff Dike, Julien Tinnes, Kees Cook,
	Michael Kerrisk, Paul Moore, Richard Weinberger,
	Serge E . Hallyn, Stephen Smalley, Tetsuo Handa, Will Drewry,
	linux-api, kernel-hardening


Hi,

Has anyone had time to review some of the patches?

What do you think about the ToCToU workarounds?
What about the userland API?

The series can be found here: https://github.com/l0kod/linux/commits/seccomp-object-v1

 Mickaël


On 24/03/2016 02:46, Mickaël Salaün wrote:
> Hi,
> 
> This series is a proof of concept (not ready for production) to extend seccomp
> with the ability to check syscall pointer arguments as kernel objects (e.g.
> file paths). This adds a needed feature to create a full sandbox managed by
> userland, like the Seatbelt/XNU Sandbox or OpenBSD Pledge. It was initially
> inspired by a partial seccomp-LSM prototype [1] but has evolved a lot since :)
> 
> The audience for this RFC is limited to security-related people, to discuss
> this new feature before widening the scope to a larger audience. The aim is
> to focus on the security goal, usability and architecture before getting
> into the gory details of each subsystem. I would also welcome constructive
> criticism about the userland API and the intrusiveness of the code (and how
> it could be done better) before going further (and addressing the TODO and
> FIXME items in the code).
> 
> The approach taken is to add the minimum amount of code while still allowing
> userland to create access rules via seccomp. Currently, seccomp filters can
> only read raw syscall argument values; there is no way to dereference a
> pointer to check what it points to (e.g. the first argument of the open
> syscall). This seccomp evolution brings a generic way to check pointer
> arguments regardless of the syscall, unlike current LSMs.
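
Here is the limitation in concrete form. With the current classic BPF seccomp interface, a filter only sees struct seccomp_data. For openat(2), it can test the value of the pathname pointer in args[1], but it can never read the string that pointer refers to. This is a minimal sketch that assumes a little-endian target, where the low 32 bits of each argument come first. It also omits the architecture check that a real filter needs. The names openat_filter and openat_prog are illustrative.

```c
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/syscall.h>

/*
 * Deny openat(2) only when its pathname pointer is NULL. This is about as
 * far as a classic BPF filter can go: no instruction can dereference
 * args[1] to inspect the path string itself.
 * NOTE: a real filter must check seccomp_data.arch first.
 */
static struct sock_filter openat_filter[] = {
	/* [0] A = syscall number */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	/* [1] not openat: jump to [7] (allow) */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_openat, 0, 5),
	/* [2] A = low 32 bits of the pathname pointer (little-endian) */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[1])),
	/* [3] low half non-zero: jump to [7] (allow) */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 0, 3),
	/* [4] A = high 32 bits of the pathname pointer */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
		 offsetof(struct seccomp_data, args[1]) + 4),
	/* [5] high half non-zero: jump to [7] (allow) */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 0, 1),
	/* [6] NULL pathname: fail with EFAULT */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EFAULT),
	/* [7] allow */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

static struct sock_fprog openat_prog = {
	.len = sizeof(openat_filter) / sizeof(openat_filter[0]),
	.filter = openat_filter,
};
```
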
> 
> Here is the use case scenario:
> * First, a process must load some groups of seccomp checkers. These checkers
>   are dedicated structs describing pointed-to data (e.g. a path). They are
>   grouped semantically so they can be managed and checked efficiently in
>   batch. Each group has a static ID. These IDs are unique and reference
>   groups that are only accessible from the filters created by the same
>   process.
> * The loaded checkers are inherited by, and accessible to, newly created
>   filters. These groups can be referenced by filters with a new return
>   value, SECCOMP_RET_ARGEVAL. The value in SECCOMP_RET_DATA contains a group
>   ID and an argument bitmask. This return value is only meaningful between
>   stacked filters: it requests a check, whose result is returned in the
>   extended struct seccomp_data. The new fields are "is_valid_syscall",
>   "arg_group" containing a group ID, and "matches[6]" consisting of one
>   64-bit mask per argument. These bitmasks give the check result of each
>   checker from a group on a syscall argument, which is handy to create a
>   custom access control engine from userland.
> * SECCOMP_RET_ARGEVAL is equivalent to SECCOMP_RET_ALLOW except that the
>   following filters can take a decision regarding a match (e.g. return
>   EACCES or emulate the syscall).
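
The data flow above can be modelled in plain C. This sketch rests on explicit assumptions. The text does not say how the group ID and the argument bitmask share the 16-bit SECCOMP_RET_DATA field. The layout below (group ID in bits 8-15, one bit per syscall argument in bits 0-5) is therefore hypothetical, and so is the example_seccomp_data mirror of the extended struct. The sketch also assumes that bit N of matches[i] reports whether checker N of the group matched argument i.

```c
#include <linux/seccomp.h> /* SECCOMP_RET_DATA */
#include <stdint.h>

/* Hypothetical layout of SECCOMP_RET_DATA for SECCOMP_RET_ARGEVAL. */
#define EX_ARGEVAL_GROUP_SHIFT 8
#define EX_ARGEVAL_ARGS_MASK   0x3fU /* one bit per syscall argument */

/* Hypothetical mirror of the new fields of the extended seccomp_data. */
struct example_seccomp_data {
	uint32_t is_valid_syscall;
	uint32_t arg_group;  /* group ID that was evaluated */
	uint64_t matches[6]; /* per argument: bit N set = checker N matched */
};

/* First filter: build the data part of an ARGEVAL return value. */
static uint32_t ex_argeval_data(uint8_t group_id, uint8_t arg_mask)
{
	return (((uint32_t)group_id << EX_ARGEVAL_GROUP_SHIFT) |
		(arg_mask & EX_ARGEVAL_ARGS_MASK)) & SECCOMP_RET_DATA;
}

/* Kernel side (modelled): decode the group ID ... */
static uint8_t ex_argeval_group(uint32_t data)
{
	return (data & SECCOMP_RET_DATA) >> EX_ARGEVAL_GROUP_SHIFT;
}

/* ... and the set of arguments to evaluate. */
static uint8_t ex_argeval_args(uint32_t data)
{
	return data & EX_ARGEVAL_ARGS_MASK;
}

/* Following filter: did checker `checker` of the group match argument `arg`? */
static int ex_checker_matched(const struct example_seccomp_data *sd,
			      unsigned int arg, unsigned int checker)
{
	if (arg >= 6 || checker >= 64)
		return 0;
	return (sd->matches[arg] >> checker) & 1;
}
```
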
> 
> Each checker is autonomous and new ones can easily be added in the future.
> There are currently two checkers for path objects:
> * SECCOMP_CHECK_FS_LITERAL checks if a string matches a defined path;
> * SECCOMP_CHECK_FS_BENEATH checks if the path representation of a string is
>   equal or equivalent to a file belonging to (i.e. beneath) a defined path.
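
Here is a rough user-space approximation of the two checkers; the kernel side works on resolved kernel objects rather than strings. A literal check is a plain string comparison. A "beneath" check first resolves the path, then tests whether it is the reference directory or lies below it. The helpers ex_check_literal() and ex_check_beneath() are hypothetical. This sketch uses realpath(3), follows symlinks, and is itself racy in the TOCTOU sense.

```c
#define _DEFAULT_SOURCE
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Literal checker: the argument string must equal the reference path. */
static int ex_check_literal(const char *arg, const char *ref)
{
	return strcmp(arg, ref) == 0;
}

/*
 * Beneath checker: resolve both paths (following symlinks, "." and "..")
 * and test whether arg is ref itself or a file below it.
 * Returns 0 if either path cannot be resolved.
 */
static int ex_check_beneath(const char *arg, const char *ref)
{
	char rarg[PATH_MAX], rref[PATH_MAX];
	size_t len;

	if (!realpath(arg, rarg) || !realpath(ref, rref))
		return 0;
	len = strlen(rref);
	if (len == 1) /* ref resolved to "/": every absolute path is beneath */
		return 1;
	/* Prefix match on a path-component boundary ("/tmpx" is not under "/tmp"). */
	return strncmp(rarg, rref, len) == 0 &&
	       (rarg[len] == '\0' || rarg[len] == '/');
}
```
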
> 
> This design does not seem too intrusive but is flexible enough to allow a
> powerful sandbox mechanism accessible to any process on Linux. This new
> feature, like seccomp in general, is best used with the help of a userland
> library (e.g. libseccomp) that could provide a high-level language to
> express a security policy instead of raw syscall rules.
> 
> The main concern should be time-of-check-to-time-of-use (TOCTOU) race
> condition attacks. Because of the nature of seccomp (executed before the
> effective syscall and before a potential ptrace), it is not possible to
> block all races, only to detect them.
> 
> There are still some questions I couldn't answer for sure (grep for FIXME or
> XXX). Comments appreciated.
> 
> Tested on the x86 and UM architectures in 32 and 64 bits (with audit enabled).
> 
> [1] https://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=seccomp/lsm
> 
> 
> # Need for LSM
> 
> Because the arguments can be checked before the syscall actually evaluates
> them, there are two classes of race conditions:
> * The data pointed to by the user address is under userland control (e.g. a
>   tracing process) and is therefore subject to TOCTOU race conditions
>   between the seccomp filter evaluation and the effective resource grabbing
>   (part of each syscall's code).
> * The semantics of the pointed-to data are also subject to race conditions
>   because there is no lock on the resource (e.g. file) between the
>   evaluation of the argument by the seccomp filter and the use of the
>   pointed-to resource by each part of the syscall code.
> 
> The solution to fix these race conditions is to copy the userspace data and
> to lock the pointed-to resource. Whereas it is easy to copy the userspace
> data, it is not realistic to lock every pointed-to resource because of
> obvious locking issues. However, it is possible to detect a TOCTOU race
> condition with the help of LSM hooks. This way, we can keep a flexible access
> control (e.g. by controlling syscall return values) while blocking
> unexpected malicious or buggy userland behavior (e.g. exploiting a race
> condition).
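
A deterministic simulation shows the first race class and why copying the data fixes it. A hook stands in for a racing thread or tracer that rewrites the user buffer between the check and the use. Everything here is hypothetical and models only the ordering problem, not kernel code: ex_check_then_use_racy(), ex_check_then_use_copy(), ex_swap_path() and the "/allowed/" prefix. Copying does not address the second race class, where the file itself changes. That is why detection through LSM hooks is proposed.

```c
#include <stdio.h>
#include <string.h>

#define EX_ALLOWED_PREFIX "/allowed/"

/* Stands in for another thread (or a tracer) writing to the user buffer. */
typedef void (*ex_race_hook)(char *user_buf);

/* Racy: validate the shared buffer, then read it again to "use" it. */
static int ex_check_then_use_racy(char *user_buf, ex_race_hook hook,
				  char *used, size_t used_len)
{
	if (strncmp(user_buf, EX_ALLOWED_PREFIX,
		    strlen(EX_ALLOWED_PREFIX)) != 0)
		return -1;                        /* time of check */
	if (hook)
		hook(user_buf);                   /* the race window */
	snprintf(used, used_len, "%s", user_buf); /* time of use */
	return 0;
}

/* Safer: copy once, then check and use only the private copy. */
static int ex_check_then_use_copy(char *user_buf, ex_race_hook hook,
				  char *used, size_t used_len)
{
	char copy[256];

	snprintf(copy, sizeof(copy), "%s", user_buf);
	if (strncmp(copy, EX_ALLOWED_PREFIX, strlen(EX_ALLOWED_PREFIX)) != 0)
		return -1;
	if (hook)
		hook(user_buf);                   /* cannot affect the copy */
	snprintf(used, used_len, "%s", copy);
	return 0;
}

/* Example attacker: swap the path after it has been checked. */
static void ex_swap_path(char *user_buf)
{
	strcpy(user_buf, "/etc/shadow");
}
```

With the racy version, a buffer holding "/allowed/file" passes the check, but the value actually used is "/etc/shadow". The copying version keeps using "/allowed/file".
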
> 
> To be able to deny access to malicious userland behavior, we must replay the
> seccomp filters and verify the intermediate return values to find out if the
> filter policy is still respected. Thanks to a cache, we can detect whether a
> check replay is necessary, so the LSM hooks stay really quick for
> non-malicious userland.
> 
> # Cache handling
> 
> Each time a checker is called, for each argument to check, it gets the
> argument from its seccomp_argeval_checked cache if there is one, or
> otherwise creates a new cache entry and stores it there. These cache entries
> will be used to evaluate arguments.
>
> When rechecking in the LSM hooks, it first finds out which argument is
> mapped to the hook check and whether it differs from the corresponding cache
> entry. If it matches, it returns OK without replaying the checks; if nothing
> matches, it replays all the checks of this check type.
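
The recheck logic can be sketched as a hypothetical model; it is not the seccomp_argeval_checked structure from the series. The model makes three assumptions. A checked file is identified by its (device, inode) pair. The cache entry keeps the checker match bitmask computed at filter time. A replay callback runs only when the object reaching the LSM hook differs from the cached one. The ex_* names are illustrative.

```c
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical cache entry filled when the seccomp filter evaluated an argument. */
struct ex_argeval_cache {
	dev_t dev;        /* identity of the object seen at check time */
	ino_t ino;
	uint64_t matches; /* checker results computed at check time */
};

/* Re-run all checks of this check type on the object actually used. */
typedef uint64_t (*ex_replay_fn)(dev_t dev, ino_t ino);

/*
 * Called from an LSM hook. Returns 0 when the policy still holds and
 * -1 when replaying the checks gives a different result (a TOCTOU race).
 */
static int ex_lsm_recheck(const struct ex_argeval_cache *c, dev_t dev,
			  ino_t ino, ex_replay_fn replay)
{
	if (c->dev == dev && c->ino == ino)
		return 0; /* fast path: same object as checked, no replay */
	/* Different object: replay and require the same checker results. */
	return replay(dev, ino) == c->matches ? 0 : -1;
}

/* Example replay: checker 0 matches only inode 42; counts invocations. */
static unsigned int ex_replay_calls;

static uint64_t ex_replay_demo(dev_t dev, ino_t ino)
{
	(void)dev;
	ex_replay_calls++;
	return ino == 42 ? 1 : 0;
}
```
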
> 
> # How to use it
> 
> The SECCOMP_ARGFLAG_* flags help to narrow the rule constraints:
> * SECCOMP_ARGFLAG_FS_DENTRY: Check and rely on the path name.
> * SECCOMP_ARGFLAG_FS_INODE: Check the data "container" whatever its path name.
> * SECCOMP_ARGFLAG_FS_DEVICE: Check the device (i.e. file system) on which the
>   file is, e.g. it can be used to allow access to USB mass-storage or
>   dm-verity content only.
> * SECCOMP_ARGFLAG_FS_MOUNT: Check the file mount point, e.g. it can enforce a
>   read-only bind mount (but is less flexible than the other checks).
> * SECCOMP_ARGFLAG_FS_NOFOLLOW: Check the file without following it if it is a
>   symlink. Useful for rename(2) or open(2) with O_NOFOLLOW to have consistent
>   checks. However, the LSM hooks will deny any unexpected access resulting
>   from rules that ignore this flag (i.e. they act as a fail-safe).
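
In user space, some of these flags resemble familiar choices. A check can compare names, compare (st_dev, st_ino), compare st_dev only, or use lstat(2) instead of stat(2). The sketch below shows only that analogy. The EX_FS_* values and ex_same_object() are hypothetical and unrelated to the series' actual flag values. The mount-point check has no simple equivalent here.

```c
#define _DEFAULT_SOURCE
#include <string.h>
#include <sys/stat.h>

/* Hypothetical flags mirroring the SECCOMP_ARGFLAG_FS_* ideas. */
#define EX_FS_DENTRY   (1U << 0) /* compare path names */
#define EX_FS_INODE    (1U << 1) /* same file, whatever its name */
#define EX_FS_DEVICE   (1U << 2) /* same file system */
#define EX_FS_NOFOLLOW (1U << 3) /* do not follow a final symlink */

static int ex_stat(const char *path, struct stat *st, unsigned int flags)
{
	return (flags & EX_FS_NOFOLLOW) ? lstat(path, st) : stat(path, st);
}

/* Return 1 if a and b satisfy every requested property, 0 otherwise. */
static int ex_same_object(const char *a, const char *b, unsigned int flags)
{
	struct stat sa, sb;

	if ((flags & EX_FS_DENTRY) && strcmp(a, b) != 0)
		return 0;
	if (!(flags & (EX_FS_INODE | EX_FS_DEVICE)))
		return 1;
	if (ex_stat(a, &sa, flags) || ex_stat(b, &sb, flags))
		return 0;
	if (sa.st_dev != sb.st_dev)
		return 0;
	/* An inode number is only meaningful together with its device. */
	return !(flags & EX_FS_INODE) || sa.st_ino == sb.st_ino;
}
```
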
> 
> # Limitations
> 
> ## Ptrace
> If a process can ptrace another one, the tracer can execute whatever syscall
> it wants without being constrained by any seccomp filter of the tracee. This
> applies to this seccomp extension as well. Any seccomp filter should
> therefore deny the use of ptrace.
>
> The LSM hooks must ensure that the filter results are the same (with the same
> arguments) but must not deny any ptraced modifications (e.g. syscall argument
> changes).
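
Following the advice above, a classic BPF filter could make ptrace(2) fail with EPERM while allowing every other syscall. This is a minimal sketch: it omits the seccomp_data.arch check that a real filter needs, and deny_ptrace_filter is an illustrative name. The filter only affects processes that run under it.

```c
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/syscall.h>

/* Make ptrace(2) fail with EPERM; allow every other syscall.
 * NOTE: a real filter must check seccomp_data.arch first. */
static struct sock_filter deny_ptrace_filter[] = {
	/* [0] A = syscall number */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	/* [1] ptrace: fall through to [2]; otherwise jump to [3] */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_ptrace, 0, 1),
	/* [2] deny */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
	/* [3] allow */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

static struct sock_fprog deny_ptrace_prog = {
	.len = sizeof(deny_ptrace_filter) / sizeof(deny_ptrace_filter[0]),
	.filter = deny_ptrace_filter,
};
```

To apply it, a process would call prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) and then seccomp(SECCOMP_SET_MODE_FILTER, 0, &deny_ptrace_prog), as the selftests do.
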
> 
> ## Stateless access
> Unlike current LSMs, the policies are stateless. It's not possible to mark and
> track a kernel object (e.g. file descriptor). Capsicum seems more appropriate
> for this kind of feature.
> 
> ## Resource usage
> We must limit the resources taken by a filter list, and thus the number of
> rules, so that no process can exhaust the system's resources.
> 
> 
> Regards,
> 
> Mickaël Salaün (17):
>   um: Export the sys_call_table
>   seccomp: Fix typo
>   selftest/seccomp: Fix the flag name SECCOMP_FILTER_FLAG_TSYNC
>   selftest/seccomp: Fix the seccomp(2) signature
>   security/seccomp: Add LSM and create arrays of syscall metadata
>   seccomp: Add the SECCOMP_ADD_CHECKER_GROUP command
>   seccomp: Add seccomp object checker evaluation
>   selftest/seccomp: Remove unknown_ret_is_kill_above_allow test
>   selftest/seccomp: Extend seccomp_data until matches[6]
>   selftest/seccomp: Add field_is_valid_syscall test
>   selftest/seccomp: Add argeval_open_whitelist test
>   audit,seccomp: Extend audit with seccomp state
>   selftest/seccomp: Rename TRACE_poke to TRACE_poke_sys_read
>   selftest/seccomp: Make tracer_poke() more generic
>   selftest/seccomp: Add argeval_toctou_argument test
>   security/seccomp: Protect against filesystem TOCTOU
>   selftest/seccomp: Add argeval_toctou_filesystem test
> 
>  arch/x86/um/asm/syscall.h                     |   2 +
>  include/asm-generic/vmlinux.lds.h             |  22 +
>  include/linux/audit.h                         |  25 ++
>  include/linux/compat.h                        |  10 +
>  include/linux/lsm_hooks.h                     |   5 +
>  include/linux/seccomp.h                       | 136 +++++-
>  include/linux/syscalls.h                      |  68 +++
>  include/uapi/linux/seccomp.h                  | 105 +++++
>  kernel/audit.h                                |   3 +
>  kernel/auditsc.c                              |  36 +-
>  kernel/fork.c                                 |  13 +-
>  kernel/seccomp.c                              | 594 +++++++++++++++++++++++++-
>  security/Kconfig                              |   1 +
>  security/Makefile                             |   2 +
>  security/seccomp/Kconfig                      |  14 +
>  security/seccomp/Makefile                     |   3 +
>  security/seccomp/checker_fs.c                 | 524 +++++++++++++++++++++++
>  security/seccomp/checker_fs.h                 |  18 +
>  security/seccomp/lsm.c                        | 135 ++++++
>  security/seccomp/lsm.h                        |  19 +
>  security/security.c                           |   1 +
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 572 +++++++++++++++++++++++--
>  22 files changed, 2248 insertions(+), 60 deletions(-)
>  create mode 100644 security/seccomp/Kconfig
>  create mode 100644 security/seccomp/Makefile
>  create mode 100644 security/seccomp/checker_fs.c
>  create mode 100644 security/seccomp/checker_fs.h
>  create mode 100644 security/seccomp/lsm.c
>  create mode 100644 security/seccomp/lsm.h
> 


* [kernel-hardening] Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing
  2016-04-20 18:21 ` Mickaël Salaün
@ 2016-04-26 22:46   ` Kees Cook
  0 siblings, 0 replies; 39+ messages in thread
From: Kees Cook @ 2016-04-26 22:46 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-security-module, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Michael Kerrisk, Paul Moore, Richard Weinberger,
	Serge E . Hallyn, Stephen Smalley, Tetsuo Handa, Will Drewry,
	Linux API, kernel-hardening

On Wed, Apr 20, 2016 at 11:21 AM, Mickaël Salaün <mic@digikod.net> wrote:
> Hi,
>
> Has anyone had time to review some of the patches?

Hi! Sorry for the delay on this. I keep getting distracted by other
stuff. I've got some time on a plane tomorrow, so I'll bring your
series along and spend some time reading through it more carefully.

-Kees

>
> What do you think about the ToCToU workarounds?
> What about the userland API?
>
> The series can be found here: https://github.com/l0kod/linux/commits/seccomp-object-v1
>
>  Mickaël
>
>
> On 24/03/2016 02:46, Mickaël Salaün wrote:
>> Hi,
>>
>> This series is a proof of concept (not ready for production) to extend seccomp
>> with the ability to check syscall pointer arguments as kernel objects (e.g.
>> file paths). This adds a needed feature to create a full sandbox managed by
>> userland, like the Seatbelt/XNU Sandbox or OpenBSD Pledge. It was initially
>> inspired by a partial seccomp-LSM prototype [1] but has evolved a lot since :)
>>
>> The audience for this RFC is limited to security-related people, to discuss
>> this new feature before widening the scope to a larger audience. The aim is
>> to focus on the security goal, usability and architecture before getting
>> into the gory details of each subsystem. I would also welcome constructive
>> criticism about the userland API and the intrusiveness of the code (and how
>> it could be done better) before going further (and addressing the TODO and
>> FIXME items in the code).
>>
>> The approach taken is to add the minimum amount of code while still allowing
>> userland to create access rules via seccomp. Currently, seccomp filters can
>> only read raw syscall argument values; there is no way to dereference a
>> pointer to check what it points to (e.g. the first argument of the open
>> syscall). This seccomp evolution brings a generic way to check pointer
>> arguments regardless of the syscall, unlike current LSMs.
>>
>> Here is the use case scenario:
>> * First, a process must load some groups of seccomp checkers. These checkers
>>   are dedicated structs describing pointed-to data (e.g. a path). They are
>>   grouped semantically so they can be managed and checked efficiently in
>>   batch. Each group has a static ID. These IDs are unique and reference
>>   groups that are only accessible from the filters created by the same
>>   process.
>> * The loaded checkers are inherited by, and accessible to, newly created
>>   filters. These groups can be referenced by filters with a new return
>>   value, SECCOMP_RET_ARGEVAL. The value in SECCOMP_RET_DATA contains a group
>>   ID and an argument bitmask. This return value is only meaningful between
>>   stacked filters: it requests a check, whose result is returned in the
>>   extended struct seccomp_data. The new fields are "is_valid_syscall",
>>   "arg_group" containing a group ID, and "matches[6]" consisting of one
>>   64-bit mask per argument. These bitmasks give the check result of each
>>   checker from a group on a syscall argument, which is handy to create a
>>   custom access control engine from userland.
>> * SECCOMP_RET_ARGEVAL is equivalent to SECCOMP_RET_ALLOW except that the
>>   following filters can take a decision regarding a match (e.g. return
>>   EACCES or emulate the syscall).
>>
>> Each checker is autonomous and new ones can easily be added in the future.
>> There are currently two checkers for path objects:
>> * SECCOMP_CHECK_FS_LITERAL checks if a string matches a defined path;
>> * SECCOMP_CHECK_FS_BENEATH checks if the path representation of a string is
>>   equal or equivalent to a file belonging to (i.e. beneath) a defined path.
>>
>> This design does not seem too intrusive but is flexible enough to allow a
>> powerful sandbox mechanism accessible to any process on Linux. This new
>> feature, like seccomp in general, is best used with the help of a userland
>> library (e.g. libseccomp) that could provide a high-level language to
>> express a security policy instead of raw syscall rules.
>>
>> The main concern should be time-of-check-to-time-of-use (TOCTOU) race
>> condition attacks. Because of the nature of seccomp (executed before the
>> effective syscall and before a potential ptrace), it is not possible to
>> block all races, only to detect them.
>>
>> There are still some questions I couldn't answer for sure (grep for FIXME or
>> XXX). Comments appreciated.
>>
>> Tested on the x86 and UM architectures in 32 and 64 bits (with audit enabled).
>>
>> [1] https://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=seccomp/lsm
>>
>>
>> # Need for LSM
>>
>> Because the arguments can be checked before the syscall actually evaluates
>> them, there are two classes of race conditions:
>> * The data pointed to by the user address is under userland control (e.g. a
>>   tracing process) and is therefore subject to TOCTOU race conditions
>>   between the seccomp filter evaluation and the effective resource grabbing
>>   (part of each syscall's code).
>> * The semantics of the pointed-to data are also subject to race conditions
>>   because there is no lock on the resource (e.g. file) between the
>>   evaluation of the argument by the seccomp filter and the use of the
>>   pointed-to resource by each part of the syscall code.
>>
>> The solution to fix these race conditions is to copy the userspace data and to
>> lock the pointed resource. Whereas it is easy to copy the userspace data, it is
>> not realistic to lock any pointed resources because of obvious locking issues.
>> However, it is possible to detect a TOCTOU race condition with the help of LSM
>> hooks. This way, we can keep flexible access control (e.g. by controlling
>> syscall return values) while blocking unexpected malicious or bogus userland
>> behavior (e.g. exploiting a race condition).
>>
>> To deny malicious userland behavior, we must replay the seccomp filters
>> and verify the intermediate return values to find out if the filter policy
>> is still respected. A cache lets us detect whether a replay is necessary,
>> so the LSM hooks stay really quick for non-malicious userland.
>>
>> # Cache handling
>>
>> Each time a checker is called, it gets each argument to check from its
>> seccomp_argeval_checked cache if present; otherwise it creates a new cache
>> entry and stores it. These cache entries are then used to evaluate arguments.
>>
>> When rechecking in the LSM hooks, it first finds out which argument is mapped
>> to the hook check and whether it differs from the corresponding cache entry.
>> If it matches, it returns OK without replaying the checks; if nothing
>> matches, it replays all the checks of this check type.
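The recheck fast path described above might look roughly like this minimal sketch. It assumes each cache entry records the (device, inode) pair of the object a checker resolved; the struct and function names are hypothetical and the series may key its cache differently. When the function returns true, the caller is expected to replay all the checks of that type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical cache entry: identity of an object resolved while the seccomp
 * filters were evaluated. */
struct argeval_cached_obj {
    dev_t dev;
    ino_t ino;
};

/* Called from an LSM hook with the object the syscall actually reached.
 * Returns false when it matches a cached entry (only integer comparisons),
 * true when the checks must be replayed, e.g. after a TOCTOU race. */
static bool argeval_needs_replay(const struct argeval_cached_obj *cache,
                                 size_t n,
                                 const struct argeval_cached_obj *seen)
{
    assert(seen != NULL && (cache != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        if (cache[i].dev == seen->dev && cache[i].ino == seen->ino)
            return false;
    }
    return true;
}
```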
>>
>> # How to use it
>>
>> The SECCOMP_ARGFLAG_* flags help narrow the rule constraints:
>> * SECCOMP_ARGFLAG_FS_DENTRY: Check and rely on the path name.
>> * SECCOMP_ARGFLAG_FS_INODE: Check the data "container" whatever its path name.
>> * SECCOMP_ARGFLAG_FS_DEVICE: Check the device (i.e. file system) on which the
>>   file is, e.g. it can be used to allow access to USB mass-storage or dm-verity
>>   content only.
>> * SECCOMP_ARGFLAG_FS_MOUNT: Check the file mount point, e.g. it can enforce a
>>   read-only bind mount (but is less flexible than the other checks).
>> * SECCOMP_ARGFLAG_FS_NOFOLLOW: Check the file without following it if it is a
>>   symlink. Useful for rename(2) or open(2) with O_NOFOLLOW to have consistent
>>   checks. However, LSM hooks will deny all unexpected accesses set by the rules
>>   ignoring this flag (i.e. it acts as a fail-safe).
>>
>> # Limitations
>>
>> ## Ptrace
>> If a process can ptrace another one, the tracer can execute whatever syscall it
>> wants without being constrained by any seccomp filter from the tracee. This
>> applies to this seccomp extension as well. Any seccomp filter should therefore
>> deny the use of ptrace.
>>
>> The LSM hooks must ensure that the filter results are the same (with the same
>> arguments) but must not deny any ptraced modifications (e.g. syscall argument
>> change).
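With classic seccomp-BPF as it exists today, the deny-ptrace advice above can be followed with a filter along these lines. This is a minimal sketch assuming x86-64 or AArch64 (a production filter would also reject x32 syscalls and handle other ABIs), and the function names are illustrative:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>

#if defined(__x86_64__)
#define FILTER_ARCH AUDIT_ARCH_X86_64 /* x32 syscalls are not rejected here */
#elif defined(__aarch64__)
#define FILTER_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* Make ptrace(2) fail with EPERM for this thread and its future children. */
static int install_deny_ptrace_filter(void)
{
    struct sock_filter filter[] = {
        /* Kill the thread if the syscall uses an unexpected ABI. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, FILTER_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* Return EPERM for ptrace, allow everything else. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_ptrace, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Required for an unprivileged process to install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Only call this after installing the filter: without it, PTRACE_TRACEME
 * would really make the parent process our tracer. */
static bool ptrace_is_denied(void)
{
    errno = 0;
    return ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1 && errno == EPERM;
}
```

Because seccomp filters are inherited across fork(2) and kept across execve(2), installing this early in a launcher process covers the whole sandboxed process tree.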
>>
>> ## Stateless access
>> Unlike current LSMs, the policies are stateless. It's not possible to mark and
>> track a kernel object (e.g. file descriptor). Capsicum seems more appropriate
>> for this kind of feature.
>>
>> ## Resource usage
>> We must limit the resources taken by a filter list, and so the number of rules,
>> to prevent any process from exhausting the system.
>>
>>
>> Regards,
>>
>> Mickaël Salaün (17):
>>   um: Export the sys_call_table
>>   seccomp: Fix typo
>>   selftest/seccomp: Fix the flag name SECCOMP_FILTER_FLAG_TSYNC
>>   selftest/seccomp: Fix the seccomp(2) signature
>>   security/seccomp: Add LSM and create arrays of syscall metadata
>>   seccomp: Add the SECCOMP_ADD_CHECKER_GROUP command
>>   seccomp: Add seccomp object checker evaluation
>>   selftest/seccomp: Remove unknown_ret_is_kill_above_allow test
>>   selftest/seccomp: Extend seccomp_data until matches[6]
>>   selftest/seccomp: Add field_is_valid_syscall test
>>   selftest/seccomp: Add argeval_open_whitelist test
>>   audit,seccomp: Extend audit with seccomp state
>>   selftest/seccomp: Rename TRACE_poke to TRACE_poke_sys_read
>>   selftest/seccomp: Make tracer_poke() more generic
>>   selftest/seccomp: Add argeval_toctou_argument test
>>   security/seccomp: Protect against filesystem TOCTOU
>>   selftest/seccomp: Add argeval_toctou_filesystem test
>>
>>  arch/x86/um/asm/syscall.h                     |   2 +
>>  include/asm-generic/vmlinux.lds.h             |  22 +
>>  include/linux/audit.h                         |  25 ++
>>  include/linux/compat.h                        |  10 +
>>  include/linux/lsm_hooks.h                     |   5 +
>>  include/linux/seccomp.h                       | 136 +++++-
>>  include/linux/syscalls.h                      |  68 +++
>>  include/uapi/linux/seccomp.h                  | 105 +++++
>>  kernel/audit.h                                |   3 +
>>  kernel/auditsc.c                              |  36 +-
>>  kernel/fork.c                                 |  13 +-
>>  kernel/seccomp.c                              | 594 +++++++++++++++++++++++++-
>>  security/Kconfig                              |   1 +
>>  security/Makefile                             |   2 +
>>  security/seccomp/Kconfig                      |  14 +
>>  security/seccomp/Makefile                     |   3 +
>>  security/seccomp/checker_fs.c                 | 524 +++++++++++++++++++++++
>>  security/seccomp/checker_fs.h                 |  18 +
>>  security/seccomp/lsm.c                        | 135 ++++++
>>  security/seccomp/lsm.h                        |  19 +
>>  security/security.c                           |   1 +
>>  tools/testing/selftests/seccomp/seccomp_bpf.c | 572 +++++++++++++++++++++++--
>>  22 files changed, 2248 insertions(+), 60 deletions(-)
>>  create mode 100644 security/seccomp/Kconfig
>>  create mode 100644 security/seccomp/Makefile
>>  create mode 100644 security/seccomp/checker_fs.c
>>  create mode 100644 security/seccomp/checker_fs.h
>>  create mode 100644 security/seccomp/lsm.c
>>  create mode 100644 security/seccomp/lsm.h
>>
>



-- 
Kees Cook
Chrome OS & Brillo Security


* [kernel-hardening] Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing
  2016-03-24  1:46 [kernel-hardening] [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing Mickaël Salaün
                   ` (10 preceding siblings ...)
  2016-04-20 18:21 ` Mickaël Salaün
@ 2016-04-28  2:36 ` Kees Cook
  2016-04-28 23:45   ` Mickaël Salaün
                     ` (2 more replies)
  11 siblings, 3 replies; 39+ messages in thread
From: Kees Cook @ 2016-04-28  2:36 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-security-module, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	David Drysdale, Eric Paris, James Morris, Jeff Dike,
	Julien Tinnes, Michael Kerrisk, Paul Moore, Richard Weinberger,
	Serge E . Hallyn, Stephen Smalley, Tetsuo Handa, Will Drewry,
	Linux API, kernel-hardening

On Wed, Mar 23, 2016 at 6:46 PM, Mickaël Salaün <mic@digikod.net> wrote:
> Hi,
>
> This series is a proof of concept (not ready for production) to extend seccomp
> with the ability to check argument pointers of syscalls as kernel objects (e.g.
> file path). This adds a needed feature to create a full sandbox managed by
> userland like the Seatbelt/XNU Sandbox or the OpenBSD Pledge. It was initially
> inspired by a partial seccomp-LSM prototype [1] but has evolved a lot since :)
>
> The audience for this RFC is limited to security-related actors, to discuss
> this new feature before enlarging the scope to a wider audience. This
> aims to focus on the security goal, usability and architecture before entering
> into the gory details of each subsystem. I also wish to get constructive
> criticism about the userland API and intrusiveness of the code (and other
> possible ways to do it better) before going further (and addressing the
> TODO and FIXME in the code).
>
> The approach taken is to add the minimum amount of code while still allowing
> the userland to create access rules via seccomp. The current limitation of
> seccomp is that it only gets raw syscall argument values: there is no way to
> dereference a pointer to check its content (e.g. the first argument of the open
> syscall). This seccomp evolution brings a generic way to check against argument
> pointers regardless of the syscall, unlike current LSMs.

Okay, I've read through this whole series now (sorry for the huge
delay). I think that it is overly complex for what it results in
providing. Here are some background thoughts I had:

1) People have asked for "dereferenced argument inspection" (I will
call this DAI...), in that they would like to be able to process
arguments like how BPF traditionally processes packets. This series
doesn't provide that. Rather, it provides static checks against
specific arguments types (currently just path checks).

2) When I dig into the requirements people have around DAI, it's
mostly about checking path names. There is some interest in some of
the network structures, but mostly it's path names. This series
certainly underscores this since your first example is path names. :)

3) Solving ToCToU should also solve performance problems. For example,
this series, on a successful syscall, will look up a pathname twice
(once in seccomp, then again in the syscall, and then compares the
results in the LSM as a ToCToU back-stop). This seems like a waste of
effort, since this reimplements the work the kernel is already doing
to pass the resulting structure to the LSM hooks. As such, since this
series is doing static checks and not allowing byte processing for
DAI, I'm convinced that it should entirely happen in the LSM hooks.

4) Performing the checks in the LSM hooks carries a risk of exposing
the syscall's argument processing code to an attacker, but I think
that is okay since very similar code would already need to be written
to do the same thing before entering the syscall. The only way out of
this, I think, would be to standardize syscall argument processing.

5) If we can standardize syscall argument processing, we could also
change when it happens, and retain the results for the syscall,
allowing for full byte processing style of DAI. e.g. copy userspace to
kernel space, do BPF on the argument, if okay, pass the kernel copy to
the syscall where it continues the processing. If the kernel copy
wasn't already created by seccomp, the syscall would just make that
copy itself, etc.

So, I see DAI as going one of two ways:

a) rewrite all syscall entry to use a common cacheable argument parser
and offering true BPF processing of the argument bytes.

b) use the existing LSM hooks and define a policy language that can be
loaded ahead of time.

Doing "a" has many problems, I think. Not the least of which is that I
can't imagine a way for such an architectural change to not have
negative performance impacts for the regular case.

Doing "b" means writing a policy engine. I would expect it to look a
lot like either AppArmor or TOMOYO. TOMOYO has network structure
processing, so probably it would look more like TOMOYO if you wanted
more than just file paths. Maybe a seccomp LSM could share logic from
one of the existing path-based LSMs.

Another note I had for this series was that because the checker tries
to keep a cached struct path, it allows unprivileged users to check
for path names existing or not, regardless of the user's permissions.
Instead, you have to check the path against the policy each time.
AppArmor does this efficiently with pre-built deterministic finite
automata (built from regular expressions), and TOMOYO just does
string compares and limited glob parsing every time.
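A minimal sketch of checking the path against a policy on every access, assuming a plain allow-list of fnmatch(3) globs. It only mimics the flavor of string compares plus limited glob matching; it is not TOMOYO's syntax or code, and the policy entries and function name are hypothetical. Since it never touches the file system, the answer does not depend on whether a path exists.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy: each entry is a glob over absolute path names. */
static const char *const allowed_globs[] = {
    "/etc/*.conf",
    "/usr/lib/*",
    "/tmp/sandbox/*",
};

/* Evaluate the policy on every access; no lookup result is cached.
 * A real LSM would match the path the kernel resolved, not a raw string. */
static bool path_allowed(const char *path)
{
    assert(path != NULL);
    for (size_t i = 0; i < sizeof(allowed_globs) / sizeof(allowed_globs[0]); i++) {
        /* FNM_PATHNAME: '*' never matches '/', so globs do not recurse.
         * FNM_PERIOD: '*' cannot match a leading '.', so "dir/.." is refused. */
        if (fnmatch(allowed_globs[i], path, FNM_PATHNAME | FNM_PERIOD) == 0)
            return true;
    }
    return false;
}
```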

So, I can't take this as-is, but I'll take the one fix near the start.
:) I hope this isn't too discouraging, since I'd love to see this
solved. Hopefully you can keep chipping away at it!
Thanks!

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security


* [kernel-hardening] Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing
  2016-04-28  2:36 ` Kees Cook
@ 2016-04-28 23:45   ` Mickaël Salaün
  2016-05-21 12:58     ` Mickaël Salaün
  2016-05-02 22:19   ` James Morris
  2016-05-21 15:19   ` Daniel Borkmann
  2 siblings, 1 reply; 39+ messages in thread
From: Mickaël Salaün @ 2016-04-28 23:45 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-security-module, Andreas Gruenbacher, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, David Drysdale,
	Eric Paris, James Morris, Julien Tinnes, Michael Kerrisk,
	Paul Moore, Serge E . Hallyn, Stephen Smalley, Tetsuo Handa,
	Will Drewry, Linux API, kernel-hardening



Thanks for the comments. Here are mine:

On 28/04/2016 04:36, Kees Cook wrote:
> Okay, I've read through this whole series now (sorry for the huge
> delay). I think that it is overly complex for what it results in
> providing. Here are some background thoughts I had:

It may be a bit complex, but my goal was to create a generic framework that is easy to extend in the future.

> 
> 1) People have asked for "dereferenced argument inspection" (I will
> call this DAI...), in that they would like to be able to process
> arguments like how BPF traditionally processes packets. This series
> doesn't provide that. Rather, it provides static checks against
> specific arguments types (currently just path checks).

The thing is, a network packet can be filtered based on some basic type checks (e.g. integer, bit fields, enum), but it seems really complex to check things like file paths (without even thinking about a BPF regex engine :).
However, the approach taken in this series is to allow complex checks based on a path *object*, which would not be possible by directly inspecting the argument. Indeed, the kernel can expose more information than just a (user-controlled) string: parent directory, inode, device, mount point…

I think that the need is not to be able to (directly) dereference syscall arguments but to be able to evaluate arguments.

Moreover, exposing raw arguments in the seccomp_data struct can be tricky because of possible multiple pointer indirections.

Last but not least, giving a BPF program the ability to interpret syscall arguments can lead to inconsistent evaluation (incorrect mirroring of the OS code and state, cf. http://www.isoc.org/isoc/conferences/ndss/03/proceedings/papers/11.pdf).

However, another approach could be to expose a high-level object (canonicalized path, inode values…) in the seccomp_data struct, but this seems more complex and less flexible. How would a path reference object be stored? How would the path hierarchy be checked?

> 
> 2) When I dig into the requirements people have around DAI, it's
> mostly about checking path names. There is some interest in some of
> the network structures, but mostly it's path names. This series
> certainly underscores this since your first example is path names. :)

Indeed. The same approach could be used to filter arguments such as sockets.

> 
> 3) Solving ToCToU should also solve performance problems. For example,
> this series, on a successful syscall, will look up a pathname twice
> (once in seccomp, then again in the syscall, and then compares the
> results in the LSM as a ToCToU back-stop). This seems like a waste of
> effort, since this reimplements the work the kernel is already doing
> to pass the resulting structure to the LSM hooks. As such, since this
> series is doing static checks and not allowing byte processing for
> DAI, I'm convinced that it should entirely happen in the LSM hooks.

This could be misleading. This series uses the audit cache to avoid evaluating the same path multiple times (and to prevent ToCToU). I think there is then no more penalty than for a syscall using the same file descriptor multiple times. If the checked file is not modified by another process, the file (path) cache check is only an integer/address comparison.

Given the current seccomp and syscall workflow in Linux, I don't see any other way to check an argument without modifying the current kernel behavior (e.g. ptrace hook) or locking resources (e.g. files).


> 
> 4) Performing the checks in the LSM hooks carries a risk of exposing
> the syscall's argument processing code to an attacker, but I think
> that is okay since very similar code would already need to be written
> to do the same thing before entering the syscall. The only way out of
> this, I think, would be to standardize syscall argument processing.

I created a basic, but generic and non-intrusive, syscall argument table to store useful types (e.g. int, char *), but standardizing syscall argument processing seems like a big, long-term task. However, using a cache similar to audit's for some argument types seems more realistic ;)

> 
> 5) If we can standardize syscall argument processing, we could also
> change when it happens, and retain the results for the syscall,
> allowing for full byte processing style of DAI. e.g. copy userspace to
> kernel space, do BPF on the argument, if okay, pass the kernel copy to
> the syscall where it continues the processing. If the kernel copy
> wasn't already created by seccomp, the syscall would just make that
> copy itself, etc.

Cf. my first comment and the mirroring code problem (and complexity).

> 
> So, I see DAI as going one of two ways:
> 
> a) rewrite all syscall entry to use a common cacheable argument parser
> and offering true BPF processing of the argument bytes.
> 
> b) use the existing LSM hooks and define a policy language that can be
> loaded ahead of time.
> 
> Doing "a" has many problems, I think. Not the least of which is that I
> can't imagine a way for such an architectural change to not have
> negative performance impacts for the regular case.

Agree :)

> 
> Doing "b" means writing a policy engine. I would expect it to look a
> lot like either AppArmor or TOMOYO. TOMOYO has network structure
> processing, so probably it would look more like TOMOYO if you wanted
> more than just file paths. Maybe a seccomp LSM could share logic from
> one of the existing path-based LSMs.

An interesting thing about BPF is that it's already an engine, and it would be interesting to do more than deny access, for example by faking syscall return values, as is already possible. Moreover, this keeps the attack surface small.

> 
> Another note I had for this series was that because the checker tries
> to keep a cached struct path, it allows unprivileged users to check
> for path names existing or not, regardless of the user's permissions.

The registration of a new checker (SECCOMP_ADD_CHECKER_GROUP) against a file path should only be allowed according to the current user's permissions, and only a filter from the same thread can check against this same file path. So I don't see how this series allows unprivileged users to check for path names regardless of their permissions.


> Instead, you have to check the path against the policy each time.

Well, "each time" doesn't mean parsing the path again but a cache comparison (as the kernel does anyway).

> AppArmor does this efficiently with pre-built deterministic finite
> automata (built from regular expressions), and TOMOYO just does
> string compares and limited glob parsing every time.
> 
> So, I can't take this as-is, but I'll take the one fix near the start.
> :) I hope this isn't too discouraging, since I'd love to see this
> solved. Hopefully you can keep chipping away at it!

Of course this series is not ready as-is, but I'm convinced the main ideas could make sandbox developers happy!

Regards,
 Mickaël




* [kernel-hardening] Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing
  2016-04-28  2:36 ` Kees Cook
  2016-04-28 23:45   ` Mickaël Salaün
@ 2016-05-02 22:19   ` James Morris
  2016-05-21 15:19   ` Daniel Borkmann
  2 siblings, 0 replies; 39+ messages in thread
From: James Morris @ 2016-05-02 22:19 UTC (permalink / raw)
  To: Kees Cook
  Cc: Mickaël Salaün, linux-security-module,
	Andreas Gruenbacher, Andy Lutomirski, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, David Drysdale,
	Eric Paris, James Morris, Jeff Dike, Julien Tinnes,
	Michael Kerrisk, Paul Moore, Richard Weinberger,
	Serge E . Hallyn, Stephen Smalley, Tetsuo Handa, Will Drewry,
	Linux API, kernel-hardening

On Wed, 27 Apr 2016, Kees Cook wrote:

> Doing "b" means writing a policy engine. I would expect it to look a
> lot like either AppArmor or TOMOYO. TOMOYO has network structure
> processing, so probably it would look more like TOMOYO if you wanted
> more than just file paths. Maybe a seccomp LSM could share logic from
> one of the existing path-based LSMs.

Right, and that LSM should probably be AppArmor, which is actually being 
used and maintained.


-- 
James Morris
<jmorris@namei.org>


* [kernel-hardening] Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing
  2016-04-28 23:45   ` Mickaël Salaün
@ 2016-05-21 12:58     ` Mickaël Salaün
  0 siblings, 0 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-05-21 12:58 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-security-module, Andreas Gruenbacher, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, David Drysdale,
	Eric Paris, James Morris, Julien Tinnes, Michael Kerrisk,
	Paul Moore, Serge E . Hallyn, Stephen Smalley, Tetsuo Handa,
	Will Drewry, Linux API, kernel-hardening



Hi,

I will make another attempt with an approach closer to the LSM, without trying to extend the seccomp filter code too much. This will move all the checks into the LSM, remove the need for the audit cache, remove the (possible) double checks and remove the syscall metadata code.

However, I would really like to get some feedback on the way I use the SECCOMP_ADD_CHECKER_GROUP command with the BPF stack, through seccomp_data.{checker_group,arg_matches[]}.
As described below, I think this approach is simple and powerful. I plan to keep this way of identifying a kernel object rather than using/extracting some code from AppArmor. It's more flexible and closer to the seccomp way of filtering.

 Mickaël


On 29/04/2016 01:45, Mickaël Salaün wrote:
> Thanks for the comments. Here are mine:
> 
> On 28/04/2016 04:36, Kees Cook wrote:
>> Okay, I've read through this whole series now (sorry for the huge
>> delay). I think that it is overly complex for what it results in
>> providing. Here are some background thoughts I had:
> 
> It may be a bit complex, but my goal was to create a generic framework that is easy to extend in the future.
> 
>>
>> 1) People have asked for "dereferenced argument inspection" (I will
>> call this DAI...), in that they would like to be able to process
>> arguments like how BPF traditionally processes packets. This series
>> doesn't provide that. Rather, it provides static checks against
>> specific arguments types (currently just path checks).
> 
> The thing is, a network packet can be filtered based on some basic type checks (e.g. integer, bit fields, enum), but it seems really complex to check things like file paths (without even thinking about a BPF regex engine :).
> However, the approach taken in this series is to allow complex checks based on a path *object*, which would not be possible by directly inspecting the argument. Indeed, the kernel can expose more information than just a (user-controlled) string: parent directory, inode, device, mount point…
> 
> I think that the need is not to be able to (directly) dereference syscall arguments but to be able to evaluate arguments.
> 
> Moreover, exposing raw arguments in the seccomp_data struct can be tricky because of possible multiple pointer indirections.
> 
> Last but not least, giving a BPF program the ability to interpret syscall arguments can lead to inconsistent evaluation (incorrect mirroring of the OS code and state, cf. http://www.isoc.org/isoc/conferences/ndss/03/proceedings/papers/11.pdf).
> 
> However, another approach could be to expose a high-level object (canonicalized path, inode values…) in the seccomp_data struct, but this seems more complex and less flexible. How would a path reference object be stored? How would the path hierarchy be checked?
> 
>>
>> 2) When I dig into the requirements people have around DAI, it's
>> mostly about checking path names. There is some interest in some of
>> the network structures, but mostly it's path names. This series
>> certainly underscores this since your first example is path names. :)
> 
> Indeed. The same approach could be used to filter arguments such as sockets.
> 
>>
>> 3) Solving ToCToU should also solve performance problems. For example,
>> this series, on a successful syscall, will look up a pathname twice
>> (once in seccomp, then again in the syscall, and then compares the
>> results in the LSM as a ToCToU back-stop). This seems like a waste of
>> effort, since this reimplements the work the kernel is already doing
>> to pass the resulting structure to the LSM hooks. As such, since this
>> series is doing static checks and not allowing byte processing for
>> DAI, I'm convinced that it should entirely happen in the LSM hooks.
> 
> This could be misleading. This series uses the audit cache to avoid evaluating the same path multiple times (and to prevent ToCToU). I think there is then no more penalty than for a syscall using the same file descriptor multiple times. If the checked file is not modified by another process, the file (path) cache check is only an integer/address comparison.
> 
> Given the current seccomp and syscall workflow in Linux, I don't see any other way to check an argument without modifying the current kernel behavior (e.g. ptrace hook) or locking resources (e.g. files).
> 
> 
>>
>> 4) Performing the checks in the LSM hooks carries a risk of exposing
>> the syscall's argument processing code to an attacker, but I think
>> that is okay since very similar code would already need to be written
>> to do the same thing before entering the syscall. The only way out of
>> this, I think, would be to standardize syscall argument processing.
> 
> I created a basic, but generic and non-intrusive, syscall argument table to store useful types (e.g. int, char *), but standardizing syscall argument processing seems like a big, long-term task. However, using a cache similar to audit's for some argument types seems more realistic ;)
> 
>>
>> 5) If we can standardize syscall argument processing, we could also
>> change when it happens, and retain the results for the syscall,
>> allowing for full byte processing style of DAI. e.g. copy userspace to
>> kernel space, do BPF on the argument, if okay, pass the kernel copy to
>> the syscall where it continues the processing. If the kernel copy
>> wasn't already created by seccomp, the syscall would just make that
>> copy itself, etc.
> 
> Cf. my first comment and the mirroring code problem (and complexity).
> 
>>
>> So, I see DAI as going one of two ways:
>>
>> a) rewrite all syscall entry to use a common cacheable argument parser
>> and offering true BPF processing of the argument bytes.
>>
>> b) use the existing LSM hooks and define a policy language that can be
>> loaded ahead of time.
>>
>> Doing "a" has many problems, I think. Not the least of which is that I
>> can't imagine a way for such an architectural change to not have
>> negative performance impacts for the regular case.
> 
> Agree :)
> 
>>
>> Doing "b" means writing a policy engine. I would expect it to look a
>> lot like either AppArmor or TOMOYO. TOMOYO has network structure
>> processing, so probably it would look more like TOMOYO if you wanted
>> more than just file paths. Maybe a seccomp LSM could share logic from
>> one of the existing path-based LSMs.
> 
> An interesting thing about BPF is that it's already an engine, and it would be interesting to do more than deny access, for example by faking syscall return values, as is already possible. Moreover, this keeps the attack surface small.
> 
>>
>> Another note I had for this series was that because the checker tries
>> to keep a cached struct path, it allows unprivileged users to check
>> for path names existing or not, regardless of the user's permissions.
> 
> The registration of a new checker (SECCOMP_ADD_CHECKER_GROUP) against a file path should only be allowed according to the current user's permissions, and only a filter from the same thread can check against this same file path. So I don't see how this series allows unprivileged users to check for path names regardless of their permissions.
> 
> 
>> Instead, you have to check the path against the policy each time.
> 
> Well, "each time" doesn't mean parsing the path again but a cache comparison (as the kernel does anyway).
> 
>> AppArmor does this efficiently with pre-built deterministic finite
>> automata (built from regular expressions), and TOMOYO just does
>> string compares and limited glob parsing every time.
>>
>> So, I can't take this as-is, but I'll take the one fix near the start.
>> :) I hope this isn't too discouraging, since I'd love to see this
>> solved. Hopefully you can keep chipping away at it!
> 
> Of course this series is not ready as-is, but I'm convinced the main ideas could make sandbox developers happy!
> 
> Regards,
>  Mickaël
> 





* [kernel-hardening] Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing
  2016-04-28  2:36 ` Kees Cook
  2016-04-28 23:45   ` Mickaël Salaün
  2016-05-02 22:19   ` James Morris
@ 2016-05-21 15:19   ` Daniel Borkmann
  2016-05-22 21:30     ` Mickaël Salaün
  2 siblings, 1 reply; 39+ messages in thread
From: Daniel Borkmann @ 2016-05-21 15:19 UTC (permalink / raw)
  To: Kees Cook, Mickaël Salaün
  Cc: linux-security-module, Andreas Gruenbacher, Andy Lutomirski,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, David Drysdale,
	Eric Paris, James Morris, Jeff Dike, Julien Tinnes,
	Michael Kerrisk, Paul Moore, Richard Weinberger,
	Serge E . Hallyn, Stephen Smalley, Tetsuo Handa, Will Drewry,
	Linux API, kernel-hardening, alexei.starovoitov

Hi Mickaël,

[ sorry for commenting so late ... ]

On 04/28/2016 04:36 AM, Kees Cook wrote:
> On Wed, Mar 23, 2016 at 6:46 PM, Mickaël Salaün <mic@digikod.net> wrote:
>> Hi,
>>
>> This series is a proof of concept (not ready for production) to extend seccomp
>> with the ability to check argument pointers of syscalls as kernel objects (e.g.
>> file path). This adds a needed feature to create a full sandbox managed by
>> userland like the Seatbelt/XNU Sandbox or the OpenBSD Pledge. It was initially
>> inspired by a partial seccomp-LSM prototype [1] but has evolved a lot since :)
>>
>> The audience for this RFC is limited to security-related actors, to discuss
>> this new feature before enlarging the scope to a wider audience. This
>> aims to focus on the security goal, usability and architecture before entering
>> into the gory details of each subsystem. I also wish to get constructive
>> criticism about the userland API and intrusiveness of the code (and other
>> possible ways to do it better) before going further (and addressing the
>> TODO and FIXME in the code).
>>
>> The approach taken is to add the minimum amount of code while still allowing
>> the userland to create access rules via seccomp. The current limitation of
>> seccomp is that it only gets raw syscall argument values: there is no way to
>> dereference a pointer to check its content (e.g. the first argument of the open
>> syscall). This seccomp evolution brings a generic way to check against argument
>> pointers regardless of the syscall, unlike current LSMs.
>
> Okay, I've read through this whole series now (sorry for the huge
> delay). I think that it is overly complex for what it results in
> providing. Here are some background thoughts I had:
>
> 1) People have asked for "dereferenced argument inspection" (I will
> call this DAI...), in that they would like to be able to process
> arguments like how BPF traditionally processes packets. This series
> doesn't provide that. Rather, it provides static checks against
> specific argument types (currently just path checks).
>
> 2) When I dig into the requirements people have around DAI, it's
> mostly about checking path names. There is some interest in some of
> the network structures, but mostly it's path names. This series
> certainly underscores this since your first example is path names. :)

Out of curiosity, did you look at whether adding some very basic
eBPF support to seccomp-BPF could also give you the option of
inspecting arguments eventually?

By basic, I mean adding a new eBPF program type BPF_PROG_TYPE_SECCOMP
where the only thing allowed would be a very limited set of
helpers. No maps, etc. allowed for this type. If needed for extracting
args, you could extend struct seccomp_data for eBPF use and add a new
set of helper functions that would allow you to extract/walk arguments
and, for example, pass the extracted buffer back to the eBPF prog for
further inspection.

Have a look at the samples in [1,2], which are for tracing, but possibly
it could be designed in a similar way, where clang compiles
the policy down into eBPF bytecode. Did you look in this direction,
or do you have any thoughts on it?

   [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/samples/bpf/tracex5_kern.c
   [2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/samples/bpf/tracex1_kern.c
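
For comparison with this proposal: a classic seccomp-BPF filter today sees only struct seccomp_data (syscall number, arch, instruction pointer and six raw 64-bit argument values), so it can test integer arguments but can compare a pointer argument, such as open()'s pathname, only as an address. The following is a minimal sketch assuming x86_64 and the uapi headers <linux/seccomp.h>, <linux/filter.h> and <linux/audit.h>; the policy (deny AF_INET sockets with EPERM) and the names deny_inet_socket and install_deny_inet_socket are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* nr (4 bytes), arch (4 bytes), instruction_pointer (8 bytes), then args[6]. */
static_assert(offsetof(struct seccomp_data, args[0]) == 16,
              "args[] follows nr, arch and instruction_pointer");

/* Hypothetical policy, x86_64 only: socket(AF_INET, ...) fails with EPERM,
 * everything else is allowed. Only raw argument values are visible here. */
static struct sock_filter deny_inet_socket[] = {
    /* Kill the task if the syscall ABI is not x86_64. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    /* Only socket() is inspected further; other syscalls jump to ALLOW. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_socket, 0, 3),
    /* Low 32 bits of args[0] (little-endian): the socket domain.
     * A complete filter would also check the high word. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AF_INET, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

static const struct sock_fprog deny_inet_socket_prog = {
    .len = sizeof(deny_inet_socket) / sizeof(deny_inet_socket[0]),
    .filter = deny_inet_socket,
};

/* Install the filter on the calling thread; returns 0 on success. */
int install_deny_inet_socket(void)
{
    /* Required for unprivileged callers before attaching a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &deny_inet_socket_prog);
}
```

After install_deny_inet_socket() succeeds on x86_64, socket(AF_INET, ...) should fail with EPERM while socket(AF_UNIX, ...) still works. There is no classic BPF instruction that could follow args[1] of openat() into user memory, which is the gap both this series and the eBPF helper idea try to fill.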

> 3) Solving ToCToU should also solve performance problems. For example,
> this series, on a successful syscall, will look up a pathname twice
> (once in seccomp, then again in the syscall, and then compares the
> results in the LSM as a ToCToU back-stop). This seems like a waste of
> effort, since this reimplements the work the kernel is already doing
> to pass the resulting structure to the LSM hooks. As such, since this
> series is doing static checks and not allowing byte processing for
> DAI, I'm convinced that it should entirely happen in the LSM hooks.
>
> 4) Performing the checks in the LSM hooks carries a risk of exposing
> the syscall's argument processing code to an attacker, but I think
> that is okay since very similar code would already need to be written
> to do the same thing before entering the syscall. The only way out of
> this, I think, would be to standardize syscall argument processing.
>
> 5) If we can standardize syscall argument processing, we could also
> change when it happens, and retain the results for the syscall,
> allowing for full byte processing style of DAI. e.g. copy userspace to
> kernel space, do BPF on the argument, if okay, pass the kernel copy to
> the syscall where it continues the processing. If the kernel copy
> wasn't already created by seccomp, the syscall would just make that
> copy itself, etc.
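
The double lookup and ToCToU concern in points 3 and 5 can be illustrated with a deterministic userspace model. In this sketch, user_buf stands in for the pathname buffer in the caller's memory and attacker_race() for a concurrent thread rewriting it; all names and the "/tmp/ only" policy are hypothetical, and no kernel interface is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a pathname buffer in the caller's memory. */
static char user_buf[256];

static_assert(sizeof(user_buf) > sizeof("/etc/shadow"),
              "the simulated race must fit in user_buf");

/* Hypothetical race: another thread rewrites the buffer after the policy
 * check has looked at it. Deterministic here so the outcome is repeatable. */
static void attacker_race(void)
{
    strcpy(user_buf, "/etc/shadow");
}

/* Example policy: only paths under /tmp/ are allowed. */
static bool policy_allows(const char *path)
{
    return strncmp(path, "/tmp/", 5) == 0;
}

/* Check-then-refetch: the check and the "syscall" each read user memory,
 * so what gets used is not necessarily what was checked. */
const char *double_fetch_open(char *used, size_t len)
{
    if (!policy_allows(user_buf))        /* first fetch, for the check */
        return NULL;
    attacker_race();                     /* window between check and use */
    snprintf(used, len, "%s", user_buf); /* second fetch, for the use */
    return used;
}

/* Copy once: a single copy (think copy_from_user()) is both checked and
 * used, so later changes to user memory no longer matter. */
const char *copy_once_open(char *used, size_t len)
{
    snprintf(used, len, "%s", user_buf); /* the only fetch */
    if (!policy_allows(used))
        return NULL;
    attacker_race();                     /* too late to affect the result */
    return used;
}
```

With "/tmp/ok" in user_buf, double_fetch_open() ends up using "/etc/shadow" while copy_once_open() uses "/tmp/ok". Point 5 roughly amounts to letting the syscall consume the copy that seccomp already checked instead of fetching the argument again.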
>
> So, I see DAI as going one of two ways:
>
> a) rewrite all syscall entry to use a common cacheable argument parser
> and offering true BPF processing of the argument bytes.
>
> b) use the existing LSM hooks and define a policy language that can be
> loaded ahead of time.
>
> Doing "a" has many problems, I think. Not the least of which is that I
> can't imagine a way for such an architectural change to not have
> negative performance impacts for the regular case.
>
> Doing "b" means writing a policy engine. I would expect it to look a
> lot like either AppArmor or TOMOYO. TOMOYO has network structure
> processing, so probably it would look more like TOMOYO if you wanted
> more than just file paths. Maybe a seccomp LSM could share logic from
> one of the existing path-based LSMs.
>
> Another note I had for this series was that because the checker tries
> to keep a cached struct path, it allows unprivileged users to check
> for path names existing or not, regardless of the user's permissions.
> Instead, you have to check the path against the policy each time.
> AppArmor does this efficiently with pre-built deterministic finite
> automata (built from regular expressions), and TOMOYO just does
> string compares and limited glob parsing every time.
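
Checking the path against the policy on every access, rather than caching a resolved struct path, can be sketched with POSIX fnmatch(3) and FNM_PATHNAME. This is closer to TOMOYO's string compares and limited globbing than to AppArmor's pre-built DFA; the rule list and the name path_is_allowed are hypothetical, and the sketch assumes it is given the pathname the kernel already resolved (e.g. in an LSM hook), not the raw user string.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allow-list. Rules are plain strings: loading the policy never
 * resolves a path, so it cannot reveal whether a path exists. */
static const char *const allowed_patterns[] = {
    "/usr/lib/*",
    "/tmp/sandbox/*",
    "/etc/ld.so.cache",
};

/* Match the resolved path against every rule on each access. */
bool path_is_allowed(const char *path)
{
    size_t n = sizeof(allowed_patterns) / sizeof(allowed_patterns[0]);

    for (size_t i = 0; i < n; i++) {
        /* FNM_PATHNAME: '*' does not match '/', so a rule covers one directory. */
        if (fnmatch(allowed_patterns[i], path, FNM_PATHNAME) == 0)
            return true;
    }
    return false;
}
```

Here "/usr/lib/libc.so.6" is allowed but "/usr/lib/x86_64-linux-gnu/libc.so.6" is not, because the '*' stops at the next slash. The cost is linear in the number of rules per access, which is the trade-off a precompiled DFA avoids.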
>
> So, I can't take this as-is, but I'll take the one fix near the start.
> :) I hope this isn't too discouraging, since I'd love to see this
> solved. Hopefully you can keep chipping away at it!
> Thanks!
>
> -Kees
>


* [kernel-hardening] Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing
  2016-05-21 15:19   ` Daniel Borkmann
@ 2016-05-22 21:30     ` Mickaël Salaün
  0 siblings, 0 replies; 39+ messages in thread
From: Mickaël Salaün @ 2016-05-22 21:30 UTC (permalink / raw)
  To: Daniel Borkmann, Kees Cook
  Cc: linux-security-module, Andreas Gruenbacher, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, David Drysdale, Eric Paris,
	James Morris, Jeff Dike, Julien Tinnes, Michael Kerrisk,
	Paul Moore, Richard Weinberger, Serge E . Hallyn,
	Stephen Smalley, Tetsuo Handa, Will Drewry, Linux API,
	kernel-hardening, alexei.starovoitov



Hi Daniel,

On 21/05/2016 17:19, Daniel Borkmann wrote:
> Out of curiosity, did you look at whether adding some very basic
> eBPF support to seccomp-BPF could also give you the option of
> inspecting arguments eventually?
> 
> By basic, I mean adding a new eBPF program type BPF_PROG_TYPE_SECCOMP
> where the only thing allowed would be a very limited set of
> helpers. No maps, etc. allowed for this type. If needed for extracting
> args, you could extend struct seccomp_data for eBPF use and add a new
> set of helper functions that would allow you to extract/walk arguments
> and, for example, pass the extracted buffer back to the eBPF prog for
> further inspection.
> 
> Have a look at the samples in [1,2], which are for tracing, but possibly
> it could be designed in a similar way, where clang compiles
> the policy down into eBPF bytecode. Did you look in this direction,
> or do you have any thoughts on it?
> 
>   [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/samples/bpf/tracex5_kern.c
>   [2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/samples/bpf/tracex1_kern.c

One of my initial goals was to reuse as much of the existing code as possible without modifying the BPF part. I use (or abuse) the seccomp BPF filter stack so that the kernel can run some checks outside of BPF while still getting the result of each intermediate BPF filter.
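
For readers less familiar with filter stacking: when several seccomp filters are attached, the kernel runs all of them and keeps the most restrictive action. The sketch below is a userspace model of that rule, written to mirror ACTION_ONLY() and seccomp_run_filters() in kernel/seccomp.c of kernels since 4.14 (the 2016 code compared the unsigned SECCOMP_RET_ACTION bits, which orders the actions that existed then the same way). It does not model the checker groups of this series; combine_filter_results is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/seccomp.h>

#ifndef SECCOMP_RET_KILL_PROCESS
#define SECCOMP_RET_KILL_PROCESS 0x80000000U /* added in Linux 4.14 */
#endif

/* Action part of a return value, read as signed so that KILL_PROCESS
 * (0x80000000) sorts below every other action. */
static int32_t action_only(uint32_t ret)
{
    return (int32_t)(ret & 0xffff0000U);
}

/* All stacked filters run; the lowest (most restrictive) action wins and
 * is returned with its full value, including the SECCOMP_RET_DATA bits. */
uint32_t combine_filter_results(const uint32_t *rets, size_t n)
{
    uint32_t ret = SECCOMP_RET_ALLOW;

    for (size_t i = 0; i < n; i++) {
        if (action_only(rets[i]) < action_only(ret))
            ret = rets[i];
    }
    return ret;
}
```

For example, combining ALLOW, ERRNO|1 and TRACE yields ERRNO|1, which is why an intermediate filter's result can only tighten, never loosen, what earlier filters decided.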

I have not really looked at the eBPF possibilities, but they seem interesting now that I plan to move the kernel object evaluation into the LSM only. However, the current seccomp code whitelists a very small subset of BPF, and adding more instructions would extend the attack surface. But maybe, as you said, we could create some custom eBPF functions dedicated to kernel object inspection and add them (BPF_CALL) to the current whitelist (for the LSM). It would be less hacky than the stacked BPF I used, but could be more complex.
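
To make the whitelist argument concrete, here is a simplified checker in the spirit of the kernel's seccomp_check_filter(). It is not the kernel's code: it accepts only a few classic BPF opcodes, requires absolute loads to be aligned 32-bit reads inside struct seccomp_data, and omits the ALU, scratch-memory and jump-target checks the kernel performs. The name filter_is_whitelisted is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Accept a filter only if every instruction is in a small allowed set. */
bool filter_is_whitelisted(const struct sock_filter *insns, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        const struct sock_filter *f = &insns[i];

        switch (f->code) {
        case BPF_LD | BPF_W | BPF_ABS:
            /* Only aligned 32-bit loads within struct seccomp_data. */
            if (f->k >= sizeof(struct seccomp_data) || (f->k & 3))
                return false;
            break;
        case BPF_JMP | BPF_JA:
        case BPF_JMP | BPF_JEQ | BPF_K:
        case BPF_JMP | BPF_JGT | BPF_K:
        case BPF_JMP | BPF_JGE | BPF_K:
        case BPF_JMP | BPF_JSET | BPF_K:
        case BPF_RET | BPF_K:
        case BPF_RET | BPF_A:
            break;
        default:
            /* Byte/half-word loads and anything else are rejected. */
            return false;
        }
    }
    return len > 0;
}
```

In this model, supporting a helper call for kernel object inspection would mean adding another accepted opcode plus the code behind it, and every such addition is reachable by any unprivileged process that installs a filter.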

Kees, what do you think about this?




end of thread

Thread overview: 39+ messages
2016-03-24  1:46 [kernel-hardening] [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing Mickaël Salaün
2016-03-24  1:46 ` [kernel-hardening] [RFC v1 01/17] um: Export the sys_call_table Mickaël Salaün
2016-03-24  1:46 ` [kernel-hardening] [RFC v1 02/17] seccomp: Fix typo Mickaël Salaün
2016-03-24  1:46 ` [kernel-hardening] [RFC v1 03/17] selftest/seccomp: Fix the flag name SECCOMP_FILTER_FLAG_TSYNC Mickaël Salaün
2016-03-24  4:35   ` [kernel-hardening] " Kees Cook
2016-03-29 15:35     ` Shuah Khan
2016-03-29 18:46       ` [kernel-hardening] [PATCH 1/2] " Mickaël Salaün
2016-03-29 19:06         ` [kernel-hardening] " Shuah Khan
2016-03-24  1:46 ` [kernel-hardening] [RFC v1 04/17] selftest/seccomp: Fix the seccomp(2) signature Mickaël Salaün
2016-03-24  4:36   ` [kernel-hardening] " Kees Cook
2016-03-29 15:38     ` Shuah Khan
2016-03-29 18:51       ` [kernel-hardening] [PATCH 2/2] " Mickaël Salaün
2016-03-29 19:07         ` [kernel-hardening] " Shuah Khan
2016-03-24  1:46 ` [kernel-hardening] [RFC v1 05/17] security/seccomp: Add LSM and create arrays of syscall metadata Mickaël Salaün
2016-03-24 15:47   ` [kernel-hardening] " Casey Schaufler
2016-03-24 16:01   ` Casey Schaufler
2016-03-24 21:31     ` Mickaël Salaün
2016-03-24  1:46 ` [kernel-hardening] [RFC v1 06/17] seccomp: Add the SECCOMP_ADD_CHECKER_GROUP command Mickaël Salaün
2016-03-24  1:46 ` [kernel-hardening] [RFC v1 07/17] seccomp: Add seccomp object checker evaluation Mickaël Salaün
2016-03-24  1:46 ` [kernel-hardening] [RFC v1 08/17] selftest/seccomp: Remove unknown_ret_is_kill_above_allow test Mickaël Salaün
2016-03-24  2:53 ` [kernel-hardening] [RFC v1 09/17] selftest/seccomp: Extend seccomp_data until matches[6] Mickaël Salaün
2016-03-24  2:53   ` [kernel-hardening] [RFC v1 10/17] selftest/seccomp: Add field_is_valid_syscall test Mickaël Salaün
2016-03-24  2:53   ` [kernel-hardening] [RFC v1 11/17] selftest/seccomp: Add argeval_open_whitelist test Mickaël Salaün
2016-03-24  2:53   ` [kernel-hardening] [RFC v1 12/17] audit,seccomp: Extend audit with seccomp state Mickaël Salaün
2016-03-24  2:53   ` [kernel-hardening] [RFC v1 13/17] selftest/seccomp: Rename TRACE_poke to TRACE_poke_sys_read Mickaël Salaün
2016-03-24  2:53   ` [kernel-hardening] [RFC v1 14/17] selftest/seccomp: Make tracer_poke() more generic Mickaël Salaün
2016-03-24  2:54   ` [kernel-hardening] [RFC v1 15/17] selftest/seccomp: Add argeval_toctou_argument test Mickaël Salaün
2016-03-24  2:54   ` [kernel-hardening] [RFC v1 16/17] security/seccomp: Protect against filesystem TOCTOU Mickaël Salaün
2016-03-24  2:54   ` [kernel-hardening] [RFC v1 17/17] selftest/seccomp: Add argeval_toctou_filesystem test Mickaël Salaün
2016-03-24 16:24 ` [kernel-hardening] Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing Kees Cook
2016-03-27  5:03   ` Loganaden Velvindron
2016-04-20 18:21 ` Mickaël Salaün
2016-04-26 22:46   ` Kees Cook
2016-04-28  2:36 ` Kees Cook
2016-04-28 23:45   ` Mickaël Salaün
2016-05-21 12:58     ` Mickaël Salaün
2016-05-02 22:19   ` James Morris
2016-05-21 15:19   ` Daniel Borkmann
2016-05-22 21:30     ` Mickaël Salaün
