* [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing
@ 2016-09-14  7:23 Mickaël Salaün
  2016-09-14  7:23 ` [RFC v3 01/22] landlock: Add Kconfig Mickaël Salaün
                   ` (22 more replies)
  0 siblings, 23 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

Hi,

This series, an improvement of the previous RFC [1], is a proof of concept that
fills some gaps in seccomp, such as the ability to check syscall argument
pointers or to create more dynamic security policies. The goal of this new
stackable Linux Security Module (LSM) called Landlock is to allow any process,
including unprivileged ones, to create powerful security sandboxes comparable
to the Seatbelt/XNU Sandbox or OpenBSD Pledge. This kind of sandbox helps to
mitigate the security impact of bugs or unexpected/malicious behaviors in
userland applications.


# Landlock LSM

This series is mainly focused on cgroups while keeping the possibility to
enforce access rules through seccomp. eBPF programs are used to create a
security rule. They are very limited (i.e. can only call a whitelist of
functions) and cannot cause a denial of service (i.e. no loops). A new
dedicated eBPF map makes it possible to collect Landlock handles and compare
them with system resources (e.g. files or network connections).

The approach taken is to add the minimum amount of code while still allowing
the userland to create quite complex access rules. A dedicated security policy
language, such as those used by SELinux, AppArmor and other major LSMs, takes
a lot of code and is meant for a trusted process (i.e. root/administrator).


# eBPF

To get an expressive language while still being safe and small, Landlock is
based on eBPF. Landlock should be usable by untrusted processes and must
therefore expose a minimal attack surface. The eBPF bytecode is minimal yet
powerful, widely used, and designed to be used by not-so-trusted applications.
Reusing this code avoids reproducing the same mistakes and minimizes new code
while still taking a generic approach. There are only a few new features, like
a new kind of arraymap and a few dedicated eBPF functions.

An eBPF program has access to an eBPF context which contains the LSM hook
arguments (as seccomp-bpf does with syscall arguments). They can be used
directly or passed to helper functions according to their types. It is then
possible to do complex access checks without race conditions or inconsistent
evaluation (i.e. incorrect mirroring of the OS code and state [2]).

There is one eBPF program subtype per LSM hook. This makes it possible to
statically check which context accesses are performed by an eBPF program. This
is needed to prevent kernel address leaks and to ensure the right use of LSM
hook arguments with eBPF functions. Moreover, this safe pointer handling
removes the need for runtime checks or abstract data, which improves
performance. Any user can add multiple
Landlock eBPF programs per LSM hook. They are stacked and evaluated one after
the other (cf. seccomp-bpf).


# LSM hooks

Contrary to syscalls, LSM hooks are security checkpoints and are not
architecture dependent. They are designed to match a security need reflected by
a security policy (e.g. access to a file). Exposing parts of some LSM hooks
instead of using the syscall API for sandboxing should help to avoid bugs and
hacks as encountered by the first RFC. Instead of redoing the work of the LSM
hooks through syscalls, we should use and expose them as the policies of
access control LSMs do.

Only a subset of the hooks is meaningful for an unprivileged sandbox mechanism
(e.g. file system or network access control). Landlock uses an abstraction of
raw LSM hooks, which makes it possible to deal with future changes of the LSM
hook API. Moreover, thanks to the eBPF program typing (per LSM hook) used by
Landlock, it should not be hard to make such evolutions backward compatible.


# Use case scenario

First, a process needs to create a new dedicated eBPF map containing handles.
These handles are references to system resources (e.g. file or directory) and
grouped in one or multiple maps to be efficiently managed and checked in
batches. This kind of map can be passed to Landlock eBPF functions to compare,
for example, with a file access request. The handles are only accessible from
the eBPF programs created by the same thread.

The loaded Landlock eBPF programs can be triggered by a seccomp filter
returning RET_LANDLOCK. In addition, a cookie (16-bit value) can be passed from
a seccomp filter to eBPF programs. This allows flexible security policies
between seccomp and Landlock.
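
The cookie passing can be modeled with a minimal sketch, assuming the cookie
travels in the low 16 bits of the seccomp filter return value (the same bits
as SECCOMP_RET_DATA). The SKETCH_* names and the action value used for
RET_LANDLOCK are hypothetical placeholders, not the values defined by this
series.

```c
#include <stdint.h>

/* Same mask as SECCOMP_RET_DATA in <linux/seccomp.h>: the low 16 bits. */
#define SKETCH_RET_DATA		0x0000ffffU
/* Hypothetical placeholder action value, NOT the one from this series. */
#define SKETCH_RET_LANDLOCK	0x00020000U

/* Build a filter return value carrying a 16-bit cookie (assumed layout). */
static inline uint32_t sketch_ret_landlock(uint16_t cookie)
{
	return SKETCH_RET_LANDLOCK | ((uint32_t)cookie & SKETCH_RET_DATA);
}

/* Extract the cookie from a filter return value. */
static inline uint16_t sketch_ret_cookie(uint32_t ret)
{
	return (uint16_t)(ret & SKETCH_RET_DATA);
}
```

With this layout, a seccomp filter can select a policy variant per syscall
through the cookie while the action bits stay untouched.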

Another way to enforce a Landlock security policy is to attach Landlock
programs to a dedicated cgroup. All the processes in this cgroup will then be
subject to this policy. For unprivileged processes, this can be done thanks to
cgroup delegation.

A triggered Landlock eBPF program can allow or deny an access, according to
its subtype (i.e. LSM hook), thanks to errno return values.


# Sandbox example with process hierarchy sandboxing (seccomp)

  $ ls /home
  user1
  $ LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
      ./samples/landlock/sandbox /bin/sh -i
  Launching a new sandboxed process.
  $ ls /home
  ls: cannot open directory '/home': Permission denied


# Sandbox example with conditional access control depending on a cgroup

  $ mkdir /sys/fs/cgroup/sandboxed
  $ ls /home
  user1
  $ LANDLOCK_CGROUPS='/sys/fs/cgroup/sandboxed' \
      LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
      ./samples/landlock/sandbox
  Ready to sandbox with cgroups.
  $ ls /home
  user1
  $ echo $$ > /sys/fs/cgroup/sandboxed/cgroup.procs
  $ ls /home
  ls: cannot open directory '/home': Permission denied


# Current limitations and possible improvements

For now, eBPF programs can only return an errno code. It may be interesting to
be able to do other actions like seccomp-filter does (e.g. kill process). Such
features can easily be implemented but the main advantage of the current
approach is to be able to only execute eBPF programs until one returns an errno
code instead of executing all programs like seccomp-filter does.
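
This short-circuit evaluation can be illustrated with a minimal userland model
in C. It is a sketch, not the kernel code of this series: the names
sketch_rule_fn and sketch_run_rules are hypothetical, and it assumes that a
rule returns 0 to allow the access or an errno value to deny it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical rule type: returns 0 to allow, or an errno value to deny. */
typedef int (*sketch_rule_fn)(const void *ctx);

/*
 * Run the stacked rules in order and stop at the first one returning a
 * non-zero value. The rules after that one are not executed. If nr_run is
 * not NULL, it receives the number of rules actually executed.
 */
static int sketch_run_rules(const sketch_rule_fn *rules, size_t nr,
		const void *ctx, size_t *nr_run)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < nr; i++) {
		ret = rules[i](ctx);
		if (ret)
			break;
	}
	if (nr_run)
		*nr_run = (i < nr) ? i + 1 : nr;
	return ret;
}

/* Two trivial rules used to exercise the model. */
static int sketch_allow(const void *ctx)
{
	(void)ctx;
	return 0;
}

static int sketch_deny(const void *ctx)
{
	(void)ctx;
	return EACCES;
}
```

With the stack { sketch_allow, sketch_deny, sketch_allow }, only the first two
rules run and the access is denied with EACCES.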

It is quite easy to add new eBPF functions to extend Landlock. The main concern
should be about the ability to leak information from the current process to
another one (e.g. through maps), so as not to reproduce the same
security-sensitive behavior as ptrace.

This design does not seem too intrusive but is flexible enough to allow a
powerful sandbox mechanism accessible by any process on Linux. The use of
seccomp and Landlock is more suitable with the help of a userland library (e.g.
libseccomp) that could help to specify a high-level language to express a
security policy instead of raw eBPF programs. Moreover, thanks to LLVM, it is
possible to express an eBPF program with a subset of C.


# FAQ

## Why not use a policy language like those used by SELinux or AppArmor?

These kinds of LSMs are dedicated to administrators. They already manage the
system and are not a threat to the system security. However, seccomp, and
Landlock too, should be available to anyone, which potentially includes
untrusted users and processes. To reduce the attack surface, Landlock should
expose the minimum amount of code, hence minimal complexity. Moreover, another
threat is giving malicious code a new way to gain more information. For
example, Landlock features should not allow a program to get the file owner if
the directory containing this file is not readable. This data could then be
exfiltrated thanks to the access result. Thus, we should limit the
expressiveness of the available checks. The current approach is to do the
checks in such a way that only a comparison with an already accessed resource
(e.g. file descriptor) is possible. This provides a reference to compare with,
without exposing much information.


## Why a new LSM? Are SELinux, AppArmor, Smack or Tomoyo not good enough?

The current access control LSMs are fine for their purpose, which is to give
*root* the ability to enforce a security policy for the *system*. What is
missing is a way for the developer of an application, or an *unprivileged
user*, to enforce a security policy on it, as seccomp can do for raw syscall
filtering. Moreover, Landlock handles stacked hook programs from different
users. It must then ensure there are no possible malicious interactions between
these programs.

Difference with other (access control) LSMs:
* not only dedicated to administrators (i.e. no_new_priv);
* limited kernel attack surface (e.g. policy parsing);
* helpers to compare complex objects (path/FD), no access to internal kernel
  data (do not leak addresses);
* constrained policy rules/programs (no DoS: deterministic execution time);
* do not leak more information than the loader process can legitimately have
  access to (minimize metadata inference): must compare from an already allowed
  file (through a handle).


## Why is seccomp-filter not enough?

A seccomp filter can only access raw syscall arguments, which means that it is
not possible to filter according to pointed-to data such as a file path. As
demonstrated by the first version of this patch series, filtering at the
syscall level is complicated (e.g. need to take care of race conditions). This
is mainly because the access control checkpoints of the kernel are not at this
high level but further down, at the LSM hook level. The LSM hooks are designed
to handle this kind of check. This series uses this approach to leverage the
ability of unprivileged users to limit themselves.

Cf. "What it isn't?" in Documentation/prctl/seccomp_filter.txt


## As a developer, why do I need this feature?

Landlock's goal is to help userland to limit its attack surface.
Security-conscious developers would like to protect users from a security bug
in their applications and the third-party dependencies they are using. Such a
bug can compromise all the user data and help an attacker to perform a
privilege escalation. Using an *unprivileged sandbox* feature such as Landlock
empowers developers with the ability to properly compartmentalize their
software and limit the impact of being compromised.


## As a user, why do I need this feature?

Any user can already use seccomp-filter to whitelist a set of syscalls to
reduce the kernel attack surface for a set of processes. However an
unprivileged user can't create a security policy as the root user can thanks to
SELinux and other access control LSMs. Landlock allows any unprivileged user to
protect their data from being accessed by any process they run but only an
identified subset. User tools can be created to help create such a high-level
access control policy. This policy may not be powerful enough to express the
same policies as the current access control LSMs, because of the threat an
unprivileged user can be to the system, but it should be enough for most
use-cases (e.g. blacklist or whitelist a set of file hierarchies).


## Can Landlock limit network access or other resources?

Limiting network access is obviously in the scope of Landlock but it is not yet
implemented. The main goal now is to get feedback about the whole concept, the
API and the file access control part. More access control types could be
implemented in the future.


## Why use the seccomp(2) syscall?

Landlock uses the same semantics as seccomp to apply access rule restrictions.
It adds a new layer of security for the current process, which is inherited by
its children. It makes sense to use a single access-restricting syscall (that
should be allowed by seccomp-filter rules) which can only drop privileges.
Moreover, a Landlock eBPF program could come from outside a process (e.g.
passed through a UNIX socket). It is then useful to differentiate the
creation/load of Landlock eBPF programs via bpf(2), from rule enforcing via
seccomp(2).


## Why use cgroups?

cgroups are designed to handle groups of processes. One use case is to manage
containers. Sandboxing based on process hierarchy (seccomp) is designed to handle
immutable security policies, which is a good security property but does not
match all use cases. A user can attach Landlock rules to a cgroup. Doing so,
all the processes in that cgroup will be subject to the security policy.
However, if the user is allowed to manage this cgroup, it could dynamically
move this group of processes to a cgroup with another security policy (or
none). Landlock rules can be applied either on a process hierarchy (e.g.
application with built-in sandboxing) or a group of processes (e.g. container
sandboxing). Both approaches can be combined for the same process.


# Changes since RFC v2

* revamp cgroup handling:
  * use Daniel Mack's patches "Add eBPF hooks for cgroups" v5 [3]
  * remove bpf_landlock_cmp_cgroup_beneath()
  * make BPF_PROG_ATTACH usable with delegated cgroups
  * add a new CGRP_NO_NEW_PRIVS flag for safe cgroups
  * handle Landlock sandboxing for cgroups hierarchy
  * allow unprivileged processes to attach Landlock eBPF program to cgroups
* add subtype to eBPF programs:
  * replace Landlock hook identification by custom eBPF program types with a
    dedicated subtype field
  * manage fine-grained privileged Landlock programs
  * register Landlock programs for dedicated trigger origins (e.g. syscall,
    return from seccomp filter and/or interruption)
* performance and memory optimizations: use an array to access Landlock hooks
  directly but do not duplicate it for each thread (seccomp-based)
* allow running Landlock programs without seccomp filter
* fix seccomp-related issues
* remove extra errno bounding check for Landlock programs
* add some examples for optional eBPF functions or context access (network
  related) according to security checks to allow more features for privileged
  programs (e.g. Checmate)


# Changes since RFC v1

* focus on the LSM hooks, not the syscalls:
  * much simpler implementation
  * does not need audit cache tricks to avoid race conditions
  * simpler to use and more generic because using the LSM hook abstraction
    directly
  * more efficient because only checking in LSM hooks
  * architecture agnostic
* switch from cBPF to eBPF:
  * new eBPF program types dedicated to Landlock
  * custom functions used by the eBPF program
  * gain some new features (e.g. 10 registers, can load values of different
    size, LLVM translator) but only a few functions allowed and a dedicated map
    type
  * new context: LSM hook ID, cookie and LSM hook arguments
  * need to set the sysctl kernel.unprivileged_bpf_disable to 0 (default value)
    to be able to load hook filters as unprivileged users
* smaller and simpler:
  * no more checker groups but dedicated arraymap of handles
  * simpler userland structs thanks to eBPF functions
* distinctive name: Landlock


This series can be applied on top of Daniel Mack's patches for
BPF_PROG_ATTACH v5 [3] on Linux next-20160913 (commit
562d4a2d7fa26b11d995f418951f3396a5d0f550). This can be
tested with CONFIG_SECURITY_LANDLOCK, CONFIG_SECCOMP_FILTER and
CONFIG_CGROUP_BPF. I would really appreciate constructive comments on the
usability, architecture, code and userland API of Landlock LSM.

[1] https://lkml.kernel.org/r/1472121165-29071-1-git-send-email-mic@digikod.net
[2] https://crypto.stanford.edu/cs155/papers/traps.pdf
[3] https://lkml.kernel.org/r/1473696735-11269-1-git-send-email-daniel@zonque.org

Regards,

Mickaël Salaün (22):
  landlock: Add Kconfig
  bpf: Move u64_to_ptr() to BPF headers and inline it
  bpf,landlock: Add a new arraymap type to deal with (Landlock) handles
  bpf: Set register type according to is_valid_access()
  bpf,landlock: Add eBPF program subtype and is_valid_subtype() verifier
  landlock: Add LSM hooks
  landlock: Handle file comparisons
  seccomp: Fix documentation for struct seccomp_filter
  seccomp: Move struct seccomp_filter in seccomp.h
  seccomp: Split put_seccomp_filter() with put_seccomp()
  seccomp,landlock: Handle Landlock hooks per process hierarchy
  bpf: Cosmetic change for bpf_prog_attach()
  bpf/cgroup: Replace struct bpf_prog with union bpf_object
  bpf/cgroup: Make cgroup_bpf_update() return an error code
  bpf/cgroup: Move capability check
  bpf/cgroup,landlock: Handle Landlock hooks per cgroup
  cgroup: Add access check for cgroup_get_from_fd()
  cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
  landlock: Add interrupted origin
  landlock: Add update and debug access flags
  bpf,landlock: Add optional skb pointer in the Landlock context
  samples/landlock: Add sandbox example

 include/linux/bpf-cgroup.h     |  19 +-
 include/linux/bpf.h            |  45 +++-
 include/linux/cgroup-defs.h    |   9 +
 include/linux/cgroup.h         |   2 +-
 include/linux/filter.h         |   1 +
 include/linux/landlock.h       |  86 ++++++++
 include/linux/lsm_hooks.h      |   5 +
 include/linux/seccomp.h        |  58 +++++-
 include/uapi/linux/bpf.h       | 122 +++++++++++
 include/uapi/linux/seccomp.h   |   2 +
 kernel/bpf/arraymap.c          | 226 ++++++++++++++++++++-
 kernel/bpf/cgroup.c            |  78 ++++---
 kernel/bpf/syscall.c           |  78 ++++---
 kernel/bpf/verifier.c          |  47 ++++-
 kernel/cgroup.c                |  66 +++++-
 kernel/fork.c                  |  25 ++-
 kernel/seccomp.c               | 107 +++++++---
 kernel/trace/bpf_trace.c       |  12 +-
 net/core/filter.c              |  21 +-
 samples/Makefile               |   2 +-
 samples/landlock/.gitignore    |   1 +
 samples/landlock/Makefile      |  16 ++
 samples/landlock/sandbox.c     | 307 ++++++++++++++++++++++++++++
 security/Kconfig               |   1 +
 security/Makefile              |   2 +
 security/landlock/Kconfig      |  23 +++
 security/landlock/Makefile     |   3 +
 security/landlock/checker_fs.c | 179 ++++++++++++++++
 security/landlock/checker_fs.h |  20 ++
 security/landlock/common.h     |  27 +++
 security/landlock/lsm.c        | 451 +++++++++++++++++++++++++++++++++++++++++
 security/landlock/manager.c    | 281 +++++++++++++++++++++++++
 security/security.c            |   1 +
 33 files changed, 2194 insertions(+), 129 deletions(-)
 create mode 100644 include/linux/landlock.h
 create mode 100644 samples/landlock/.gitignore
 create mode 100644 samples/landlock/Makefile
 create mode 100644 samples/landlock/sandbox.c
 create mode 100644 security/landlock/Kconfig
 create mode 100644 security/landlock/Makefile
 create mode 100644 security/landlock/checker_fs.c
 create mode 100644 security/landlock/checker_fs.h
 create mode 100644 security/landlock/common.h
 create mode 100644 security/landlock/lsm.c
 create mode 100644 security/landlock/manager.c

-- 
2.9.3


* [RFC v3 01/22] landlock: Add Kconfig
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
@ 2016-09-14  7:23 ` Mickaël Salaün
  2016-09-14  7:23 ` [RFC v3 02/22] bpf: Move u64_to_ptr() to BPF headers and inline it Mickaël Salaün
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:23 UTC (permalink / raw)

Initial Landlock Kconfig needed to split the Landlock eBPF and seccomp
parts to ease the review.

Changes from v2:
* add seccomp filter or cgroups (with eBPF programs attached support)
  dependencies

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---
 security/Kconfig          |  1 +
 security/landlock/Kconfig | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)
 create mode 100644 security/landlock/Kconfig

diff --git a/security/Kconfig b/security/Kconfig
index 118f4549404e..c63194c561c5 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -164,6 +164,7 @@ source security/tomoyo/Kconfig
 source security/apparmor/Kconfig
 source security/loadpin/Kconfig
 source security/yama/Kconfig
+source security/landlock/Kconfig
 
 source security/integrity/Kconfig
 
diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
new file mode 100644
index 000000000000..dec64270b06d
--- /dev/null
+++ b/security/landlock/Kconfig
@@ -0,0 +1,23 @@
+config SECURITY_LANDLOCK
+	bool "Landlock sandbox support"
+	depends on SECURITY
+	depends on BPF_SYSCALL
+	depends on SECCOMP_FILTER || CGROUP_BPF
+	default y
+	help
+	  Landlock is a stacked LSM which allows any user to load a security
+	  policy to restrict their processes (i.e. create a sandbox). The
+	  policy is a list of stacked eBPF programs for some LSM hooks. Each
+	  program can do some access comparison to check if an access request
+	  is legitimate.
+
+	  You need to enable seccomp filter and/or cgroups (with eBPF programs
+	  attached support) to apply a security policy to either a process
+	  hierarchy (e.g. application with built-in sandboxing) or a group of
+	  processes (e.g. container sandboxing). It is recommended to enable
+	  both seccomp filter and cgroups.
+
+	  Further information about eBPF can be found in
+	  Documentation/networking/filter.txt
+
+	  If you are unsure how to answer this question, answer Y.
-- 
2.9.3


* [RFC v3 02/22] bpf: Move u64_to_ptr() to BPF headers and inline it
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
  2016-09-14  7:23 ` [RFC v3 01/22] landlock: Add Kconfig Mickaël Salaün
@ 2016-09-14  7:23 ` Mickaël Salaün
  2016-09-14  7:23 ` [RFC v3 03/22] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles Mickaël Salaün
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:23 UTC (permalink / raw)

This helper will be useful for arraymap (next commit).
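
For context, here is a minimal userland sketch of the pointer round trip this
helper supports: userland stores a pointer in a 64-bit field and the other
side converts it back through an integer as wide as a pointer. The
sketch_ptr_to_u64() and sketch_u64_to_ptr() names are hypothetical, and the
sketch leaves out the __user annotation, which only exists in kernel code.

```c
#include <stdint.h>

/* Hypothetical userland helper: store a pointer in a 64-bit field. */
static inline uint64_t sketch_ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}

/*
 * Userland model of the reverse conversion: narrow the 64-bit value to an
 * integer as wide as a pointer before casting, which keeps the cast valid
 * on 32-bit architectures too.
 */
static inline void *sketch_u64_to_ptr(uint64_t val)
{
	return (void *)(uintptr_t)val;
}
```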

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h  | 6 ++++++
 kernel/bpf/syscall.c | 6 ------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9a904f63f8c1..fa9a988400d9 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -274,6 +274,12 @@ static inline void bpf_long_memcpy(void *dst, const void *src, u32 size)
 
 /* verify correctness of eBPF program */
 int bpf_check(struct bpf_prog **fp, union bpf_attr *attr);
+
+/* helper to convert user pointers passed inside __aligned_u64 fields */
+static inline void __user *u64_to_ptr(__u64 val)
+{
+	return (void __user *) (unsigned long) val;
+}
 #else
 static inline void bpf_register_prog_type(struct bpf_prog_type_list *tl)
 {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1a8592a082ce..776c752604b0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -252,12 +252,6 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd)
 	return map;
 }
 
-/* helper to convert user pointers passed inside __aligned_u64 fields */
-static void __user *u64_to_ptr(__u64 val)
-{
-	return (void __user *) (unsigned long) val;
-}
-
 int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
 {
 	return -ENOTSUPP;
-- 
2.9.3


* [RFC v3 03/22] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
  2016-09-14  7:23 ` [RFC v3 01/22] landlock: Add Kconfig Mickaël Salaün
  2016-09-14  7:23 ` [RFC v3 02/22] bpf: Move u64_to_ptr() to BPF headers and inline it Mickaël Salaün
@ 2016-09-14  7:23 ` Mickaël Salaün
  2016-09-14 18:51   ` Alexei Starovoitov
  2016-10-03 23:53   ` Kees Cook
  2016-09-14  7:23 ` [RFC v3 04/22] bpf: Set register type according to is_valid_access() Mickaël Salaün
                   ` (19 subsequent siblings)
  22 siblings, 2 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:23 UTC (permalink / raw)

This new arraymap looks like a set and brings new properties:
* strong typing of entries: the eBPF functions get the array type of
  elements instead of CONST_PTR_TO_MAP (e.g.
  CONST_PTR_TO_LANDLOCK_HANDLE_FS);
* force sequential filling (i.e. replace or append-only update), which
  allows quick browsing of all entries.

This strong typing is useful to statically check if the content of a map
can be passed to an eBPF function. For example, Landlock uses it to store
and manage kernel objects (e.g. struct file) instead of dealing with
userland raw data. This improves efficiency and ensures that an eBPF
program can only call functions with the right high-level arguments.

The enum bpf_map_handle_type lists low-level types (e.g.
BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) which are identified when
updating a map entry (handle). These handle types are used to infer a
high-level arraymap type, which is listed in enum bpf_map_array_type
(e.g. BPF_MAP_ARRAY_TYPE_LANDLOCK_FS).

For now, this new arraymap is only used by Landlock LSM (cf. next
commits) but it could be useful for other needs.
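
The sequential filling rule (replace or append-only update) can be modeled in
userland as follows. This is a sketch under assumptions, not the kernel
implementation: struct sketch_handle_array and sketch_update_index() are
hypothetical names, the RLIMIT_NOFILE, flag and type checks are left out, and
the increment of n_entries on append is assumed.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical userland model of a handle arraymap (not the kernel struct). */
struct sketch_handle_array {
	uint32_t max_entries;	/* fixed capacity chosen at map creation */
	uint32_t n_entries;	/* number of entries filled so far */
};

/*
 * Model of the index check: an update may replace an existing entry
 * (index < n_entries) or append exactly one (index == n_entries).
 * Returns 0 on success or a negative errno value.
 */
static int sketch_update_index(struct sketch_handle_array *array,
		uint32_t index)
{
	if (index >= array->max_entries)
		return -E2BIG;
	if (index > array->n_entries)
		return -EINVAL;
	if (index == array->n_entries)
		array->n_entries++;	/* append (assumed behavior) */
	return 0;
}
```

Because no hole can exist below n_entries, a reader can browse all entries
with a simple loop from 0 to n_entries - 1.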

Changes since v2:
* add a RLIMIT_NOFILE-based limit to the maximum number of arraymap
  handle entries (suggested by Andy Lutomirski)
* remove useless checks

Changes since v1:
* arraymap of handles replaces custom checker groups
* simpler userland API

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/CALCETrWwTiz3kZTkEgOW24-DvhQq6LftwEXh77FD2G5o71yD7g@mail.gmail.com
---
 include/linux/bpf.h      |  14 ++++
 include/uapi/linux/bpf.h |  18 +++++
 kernel/bpf/arraymap.c    | 203 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c    |  12 ++-
 4 files changed, 246 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index fa9a988400d9..eae4ce4542c1 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -13,6 +13,10 @@
 #include <linux/percpu.h>
 #include <linux/err.h>
 
+#ifdef CONFIG_SECURITY_LANDLOCK
+#include <linux/fs.h> /* struct file */
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 struct perf_event;
 struct bpf_map;
 
@@ -38,6 +42,7 @@ struct bpf_map_ops {
 struct bpf_map {
 	atomic_t refcnt;
 	enum bpf_map_type map_type;
+	enum bpf_map_array_type map_array_type;
 	u32 key_size;
 	u32 value_size;
 	u32 max_entries;
@@ -187,6 +192,9 @@ struct bpf_array {
 	 */
 	enum bpf_prog_type owner_prog_type;
 	bool owner_jited;
+#ifdef CONFIG_SECURITY_LANDLOCK
+	u32 n_entries;	/* number of entries in a handle array */
+#endif /* CONFIG_SECURITY_LANDLOCK */
 	union {
 		char value[0] __aligned(8);
 		void *ptrs[0] __aligned(8);
@@ -194,6 +202,12 @@ struct bpf_array {
 	};
 };
 
+#ifdef CONFIG_SECURITY_LANDLOCK
+struct map_landlock_handle {
+	u32 type; /* enum bpf_map_handle_type */
+};
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 #define MAX_TAIL_CALL_CNT 32
 
 struct bpf_event_entry {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7cd36166f9b7..b68de57f7ab8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -87,6 +87,15 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_PERCPU_ARRAY,
 	BPF_MAP_TYPE_STACK_TRACE,
 	BPF_MAP_TYPE_CGROUP_ARRAY,
+	BPF_MAP_TYPE_LANDLOCK_ARRAY,
+};
+
+enum bpf_map_array_type {
+	BPF_MAP_ARRAY_TYPE_UNSPEC,
+};
+
+enum bpf_map_handle_type {
+	BPF_MAP_HANDLE_TYPE_UNSPEC,
 };
 
 enum bpf_prog_type {
@@ -510,4 +519,13 @@ struct xdp_md {
 	__u32 data_end;
 };
 
+/* Map handle entry */
+struct landlock_handle {
+	__u32 type; /* enum bpf_map_handle_type */
+	union {
+		__u32 fd;
+		__aligned_u64 glob;
+	};
+} __attribute__((aligned(8)));
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index a2ac051c342f..94256597eacd 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -16,6 +16,13 @@
 #include <linux/mm.h>
 #include <linux/filter.h>
 #include <linux/perf_event.h>
+#include <linux/file.h> /* fput() */
+#include <linux/fs.h> /* struct file */
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+#include <asm/resource.h> /* RLIMIT_NOFILE */
+#include <linux/sched.h> /* rlimit() */
+#endif /* CONFIG_SECURITY_LANDLOCK */
 
 static void bpf_array_free_percpu(struct bpf_array *array)
 {
@@ -580,3 +587,199 @@ static int __init register_cgroup_array_map(void)
 }
 late_initcall(register_cgroup_array_map);
 #endif
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+static struct bpf_map *landlock_array_map_alloc(union bpf_attr *attr)
+{
+	if (attr->value_size != sizeof(struct landlock_handle))
+		return ERR_PTR(-EINVAL);
+	attr->value_size = sizeof(struct map_landlock_handle);
+
+	return array_map_alloc(attr);
+}
+
+static void landlock_put_handle(struct map_landlock_handle *handle)
+{
+	enum bpf_map_handle_type handle_type = handle->type;
+
+	switch (handle_type) {
+	case BPF_MAP_HANDLE_TYPE_UNSPEC:
+	default:
+		WARN_ON(1);
+	}
+	/* safeguard */
+	handle->type = BPF_MAP_HANDLE_TYPE_UNSPEC;
+}
+
+static void landlock_array_map_free(struct bpf_map *map)
+{
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	int i;
+
+	synchronize_rcu();
+
+	for (i = 0; i < array->n_entries; i++)
+		landlock_put_handle((struct map_landlock_handle *)
+				(array->value + array->elem_size * i));
+	kvfree(array);
+}
+
+static enum bpf_map_array_type landlock_get_array_type(
+		enum bpf_map_handle_type handle_type)
+{
+	switch (handle_type) {
+	case BPF_MAP_HANDLE_TYPE_UNSPEC:
+	default:
+		return -EINVAL;
+	}
+}
+
+#define FGET_OR_RET(file, fd) { \
+	file = fget(fd); \
+	if (unlikely(IS_ERR(file))) \
+		return PTR_ERR(file); \
+	}
+
+/**
+ * landlock_store_handle - store an user handle in an arraymap entry
+ *
+ * @dst: non-NULL kernel-side Landlock handle destination
+ * @handle: non-NULL user-side Landlock handle source
+ */
+static inline long landlock_store_handle(struct map_landlock_handle *dst,
+		struct landlock_handle *handle)
+{
+	enum bpf_map_handle_type handle_type = handle->type;
+
+	switch (handle_type) {
+	case BPF_MAP_HANDLE_TYPE_UNSPEC:
+	default:
+		WARN_ON(1);
+		return -EINVAL;
+	}
+	dst->type = handle_type;
+	return 0;
+}
+
+static void *nop_map_lookup_elem(struct bpf_map *map, void *key)
+{
+	return ERR_PTR(-EINVAL);
+}
+
+/* called from syscall or from eBPF program */
+static int landlock_array_map_update_elem(struct bpf_map *map, void *key,
+		void *value, u64 map_flags)
+{
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	u32 index = *(u32 *)key;
+	enum bpf_map_array_type array_type;
+	int ret;
+	struct landlock_handle *khandle = (struct landlock_handle *)value;
+	struct map_landlock_handle *handle, handle_new;
+
+	if (unlikely(map_flags > BPF_EXIST))
+		/* unknown flags */
+		return -EINVAL;
+
+	/*
+	 * Limit number of entries in an arraymap of handles to the maximum
+	 * number of open files for the current process. The maximum number of
+	 * handle entries (including all arraymaps) for a process is then
+	 * (RLIMIT_NOFILE - 1) * RLIMIT_NOFILE. If the process' RLIMIT_NOFILE
+	 * is 0, then any entry update is forbidden.
+	 *
+	 * An eBPF program can inherit all the arraymap FDs. The worst case is
+	 * to fill a bunch of arraymaps, create an eBPF program, close the
+	 * arraymap FDs, and start again. The maximum number of arraymap
+	 * entries can then be close to RLIMIT_NOFILE^3.
+	 *
+	 * FIXME: This should be improved... any idea?
+	 */
+	if (unlikely(index >= rlimit(RLIMIT_NOFILE)))
+		return -EMFILE;
+
+	if (unlikely(index >= array->map.max_entries))
+		/* all elements were pre-allocated, cannot insert a new one */
+		return -E2BIG;
+
+	/* FIXME: add lock */
+	if (unlikely(index > array->n_entries))
+		/* only replace an existing entry or append a new one */
+		return -EINVAL;
+
+	/* TODO: handle all flags, not only BPF_ANY */
+	if (unlikely(map_flags == BPF_NOEXIST))
+		/* all elements already exist */
+		return -EEXIST;
+
+	if (unlikely(!khandle))
+		return -EINVAL;
+
+	array_type = landlock_get_array_type(khandle->type);
+	if (array_type < 0)
+		return array_type;
+
+	if (!map->map_array_type) {
+		/* set the initial set type */
+		map->map_array_type = array_type;
+	} else if (map->map_array_type != array_type) {
+		return -EINVAL;
+	}
+
+	ret = landlock_store_handle(&handle_new, khandle);
+	if (!ret) {
+		/* map->value_size == sizeof(struct map_landlock_handle) */
+		handle = (struct map_landlock_handle *)
+			(array->value + array->elem_size * index);
+		/* FIXME: make atomic update */
+		if (index < array->n_entries)
+			landlock_put_handle(handle);
+		*handle = handle_new;
+		/* TODO: use atomic_inc? */
+		if (index == array->n_entries)
+			array->n_entries++;
+	}
+	/* FIXME: unlock */
+
+	return ret;
+}
+
+/* called from syscall or from eBPF program */
+static int landlock_array_map_delete_elem(struct bpf_map *map, void *key)
+{
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	u32 index = *(u32 *)key;
+
+	/* only remove the last element */
+	/* TODO: use atomic_dec? */
+	if (array->n_entries && index == array->n_entries - 1) {
+		array->n_entries--;
+		landlock_put_handle((struct map_landlock_handle *)
+				(array->value + array->elem_size * index));
+		return 0;
+	}
+	return -EINVAL;
+}
+
+static const struct bpf_map_ops landlock_array_ops = {
+	.map_alloc = landlock_array_map_alloc,
+	.map_free = landlock_array_map_free,
+	.map_get_next_key = array_map_get_next_key,
+	.map_lookup_elem = nop_map_lookup_elem,
+	.map_update_elem = landlock_array_map_update_elem,
+	.map_delete_elem = landlock_array_map_delete_elem,
+};
+
+static struct bpf_map_type_list landlock_array_type __read_mostly = {
+	.ops = &landlock_array_ops,
+	.type = BPF_MAP_TYPE_LANDLOCK_ARRAY,
+};
+
+static int __init register_landlock_array_map(void)
+{
+	bpf_register_map_type(&landlock_array_type);
+	return 0;
+}
+
+late_initcall(register_landlock_array_map);
+#endif /* CONFIG_SECURITY_LANDLOCK */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d5d28758d04c..c0c4a92dae8c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1793,6 +1793,15 @@ static struct bpf_map *ld_imm64_to_map_ptr(struct bpf_insn *insn)
 	return (struct bpf_map *) (unsigned long) imm64;
 }
 
+static inline enum bpf_reg_type bpf_reg_type_from_map(struct bpf_map *map)
+{
+	switch (map->map_array_type) {
+	case BPF_MAP_ARRAY_TYPE_UNSPEC:
+	default:
+		return CONST_PTR_TO_MAP;
+	}
+}
+
 /* verify BPF_LD_IMM64 instruction */
 static int check_ld_imm(struct verifier_env *env, struct bpf_insn *insn)
 {
@@ -1819,8 +1828,9 @@ static int check_ld_imm(struct verifier_env *env, struct bpf_insn *insn)
 	/* replace_map_fd_with_map_ptr() should have caught bad ld_imm64 */
 	BUG_ON(insn->src_reg != BPF_PSEUDO_MAP_FD);
 
-	regs[insn->dst_reg].type = CONST_PTR_TO_MAP;
 	regs[insn->dst_reg].map_ptr = ld_imm64_to_map_ptr(insn);
+	regs[insn->dst_reg].type =
+		bpf_reg_type_from_map(regs[insn->dst_reg].map_ptr);
 	return 0;
 }
 
-- 
2.9.3


* [RFC v3 04/22] bpf: Set register type according to is_valid_access()
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (2 preceding siblings ...)
  2016-09-14  7:23 ` [RFC v3 03/22] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles Mickaël Salaün
@ 2016-09-14  7:23 ` Mickaël Salaün
  2016-10-19 14:54   ` Thomas Graf
  2016-09-14  7:23 ` [RFC v3 05/22] bpf,landlock: Add eBPF program subtype and is_valid_subtype() verifier Mickaël Salaün
                   ` (18 subsequent siblings)
  22 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

This fixes a pointer leak when an unprivileged eBPF program reads a pointer
value from the context. Even if is_valid_access() returns a pointer
type, the eBPF verifier replaces it with UNKNOWN_VALUE. The register
value containing an address is then allowed to leak. Moreover, this
prevents unprivileged eBPF programs from using functions with (legitimate)
pointer arguments.

This bug was not a problem until now because the only unprivileged eBPF
program allowed is of type BPF_PROG_TYPE_SOCKET_FILTER and all the types
from its context are UNKNOWN_VALUE.
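
As an illustration only, the behaviour change can be modelled in plain
userland C. This is a sketch, not verifier code: the enum, the function
names and the helper_accepts() check are hypothetical stand-ins, and
PTR_TO_STRUCT_FILE is borrowed from a later patch of this series as an
example of a pointer type reported by is_valid_access().

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the verifier's register types (illustrative). */
enum reg_type { UNKNOWN_VALUE, PTR_TO_PACKET, PTR_TO_STRUCT_FILE };

/*
 * Before the patch: the type reported by is_valid_access() is kept only
 * when pointer leaks are allowed; otherwise the register is downgraded to
 * UNKNOWN_VALUE, i.e. treated as a plain scalar.
 */
static enum reg_type ctx_load_type_before(enum reg_type reported,
					  bool allow_ptr_leaks)
{
	return allow_ptr_leaks ? reported : UNKNOWN_VALUE;
}

/* After the patch: the reported type is always kept. */
static enum reg_type ctx_load_type_after(enum reg_type reported,
					 bool allow_ptr_leaks)
{
	(void)allow_ptr_leaks; /* no longer consulted */
	return reported;
}

/* Hypothetical helper-argument check: only the expected type is accepted. */
static bool helper_accepts(enum reg_type actual, enum reg_type expected)
{
	return actual == expected;
}
```

In this model, an unprivileged load of a struct file pointer yields
UNKNOWN_VALUE before the patch, so helper_accepts() rejects it; after the
patch the register keeps its pointer type and the helper call passes.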

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Fixes: 969bf05eb3ce ("bpf: direct packet access")
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
 kernel/bpf/verifier.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c0c4a92dae8c..608cbffb0e86 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -794,10 +794,8 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
 		}
 		err = check_ctx_access(env, off, size, t, &reg_type);
 		if (!err && t == BPF_READ && value_regno >= 0) {
-			mark_reg_unknown_value(state->regs, value_regno);
-			if (env->allow_ptr_leaks)
-				/* note that reg.[id|off|range] == 0 */
-				state->regs[value_regno].type = reg_type;
+			/* note that reg.[id|off|range] == 0 */
+			state->regs[value_regno].type = reg_type;
 		}
 
 	} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
-- 
2.9.3


* [RFC v3 05/22] bpf,landlock: Add eBPF program subtype and is_valid_subtype() verifier
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (3 preceding siblings ...)
  2016-09-14  7:23 ` [RFC v3 04/22] bpf: Set register type according to is_valid_access() Mickaël Salaün
@ 2016-09-14  7:23 ` Mickaël Salaün
  2016-10-19 15:01   ` Thomas Graf
  2016-09-14  7:23 ` [RFC v3 06/22] landlock: Add LSM hooks Mickaël Salaün
                   ` (17 subsequent siblings)
  22 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

The goal of the program subtype is to allow different static
fine-grained verifications for a single program type.

The struct bpf_verifier_ops gets a new optional function:
is_valid_subtype(). This new verifier is called at the beginning of the
eBPF program verification to check if the (optional) program subtype is
valid.

For now, only Landlock eBPF programs are using a program subtype but
this could be used by other program types in the future.

Cf. the next commit to see how the subtype is used by Landlock LSM.
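
The optional-callback pattern described above can be sketched in userland
C as follows. This is a reduced model under stated assumptions: struct
prog_subtype, struct verifier_ops, demo_is_valid_subtype() and
check_subtype() are hypothetical names, and the accepted ID range (1 to 3)
is only an example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced subtype: only a hook ID is modelled here. */
struct prog_subtype {
	uint32_t id;
};

struct verifier_ops {
	/* Optional: NULL means the program type has no subtype to check. */
	bool (*is_valid_subtype)(const struct prog_subtype *subtype);
};

/* Example policy: accept IDs 1..3 only (assumption for illustration). */
static bool demo_is_valid_subtype(const struct prog_subtype *subtype)
{
	return subtype->id >= 1 && subtype->id <= 3;
}

/* Run the subtype check, if any, before the rest of the verification. */
static int check_subtype(const struct verifier_ops *ops,
			 const struct prog_subtype *subtype)
{
	if (ops->is_valid_subtype && !ops->is_valid_subtype(subtype))
		return -EINVAL;
	return 0;
}

static const struct verifier_ops ops_with_check = {
	.is_valid_subtype = demo_is_valid_subtype,
};
static const struct verifier_ops ops_without_check = {
	.is_valid_subtype = NULL,
};
```

Program types that do not set the callback are unaffected: with
ops_without_check, any subtype value passes, which matches the "optional"
wording above.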

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lkml.kernel.org/r/20160827205559.GA43880@ast-mbp.thefacebook.com
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
---
 include/linux/bpf.h      |  8 ++++++--
 include/linux/filter.h   |  1 +
 include/uapi/linux/bpf.h |  9 +++++++++
 kernel/bpf/syscall.c     |  5 +++--
 kernel/bpf/verifier.c    |  9 +++++++--
 kernel/trace/bpf_trace.c | 12 ++++++++----
 net/core/filter.c        | 21 +++++++++++++--------
 7 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index eae4ce4542c1..9aa01d9d3d80 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -149,17 +149,21 @@ struct bpf_prog;
 
 struct bpf_verifier_ops {
 	/* return eBPF function prototype for verification */
-	const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id);
+	const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id,
+			union bpf_prog_subtype *prog_subtype);
 
 	/* return true if 'size' wide access at offset 'off' within bpf_context
 	 * with 'type' (read or write) is allowed
 	 */
 	bool (*is_valid_access)(int off, int size, enum bpf_access_type type,
-				enum bpf_reg_type *reg_type);
+				enum bpf_reg_type *reg_type,
+				union bpf_prog_subtype *prog_subtype);
 
 	u32 (*convert_ctx_access)(enum bpf_access_type type, int dst_reg,
 				  int src_reg, int ctx_off,
 				  struct bpf_insn *insn, struct bpf_prog *prog);
+
+	bool (*is_valid_subtype)(union bpf_prog_subtype *prog_subtype);
 };
 
 struct bpf_prog_type_list {
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1f09c521adfe..88470cdd3ee1 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -406,6 +406,7 @@ struct bpf_prog {
 	kmemcheck_bitfield_end(meta);
 	u32			len;		/* Number of filter blocks */
 	enum bpf_prog_type	type;		/* Type of BPF program */
+	union bpf_prog_subtype	subtype;	/* For fine-grained verifications */
 	struct bpf_prog_aux	*aux;		/* Auxiliary fields */
 	struct sock_fprog_kern	*orig_prog;	/* Original BPF program */
 	unsigned int		(*bpf_func)(const struct sk_buff *skb,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b68de57f7ab8..667b6ef3ff1e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -127,6 +127,14 @@ enum bpf_attach_type {
 
 #define BPF_F_NO_PREALLOC	(1U << 0)
 
+union bpf_prog_subtype {
+	struct {
+		__u32		id; /* enum landlock_hook_id */
+		__u16		origin; /* LANDLOCK_FLAG_ORIGIN_* */
+		__aligned_u64	access; /* LANDLOCK_FLAG_ACCESS_* */
+	} landlock_hook;
+} __attribute__((aligned(8)));
+
 union bpf_attr {
 	struct { /* anonymous struct used by BPF_MAP_CREATE command */
 		__u32	map_type;	/* one of enum bpf_map_type */
@@ -155,6 +163,7 @@ union bpf_attr {
 		__u32		log_size;	/* size of user buffer */
 		__aligned_u64	log_buf;	/* user supplied buffer */
 		__u32		kern_version;	/* checked when prog_type=kprobe */
+		union bpf_prog_subtype prog_subtype;	/* checked when prog_type=landlock */
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 776c752604b0..8b3f4d2b4802 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -572,7 +572,7 @@ static void fixup_bpf_calls(struct bpf_prog *prog)
 				continue;
 			}
 
-			fn = prog->aux->ops->get_func_proto(insn->imm);
+			fn = prog->aux->ops->get_func_proto(insn->imm, &prog->subtype);
 			/* all functions that have prototype and verifier allowed
 			 * programs to call them, must be real in-kernel functions
 			 */
@@ -710,7 +710,7 @@ struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type)
 EXPORT_SYMBOL_GPL(bpf_prog_get_type);
 
 /* last field in 'union bpf_attr' used by this command */
-#define	BPF_PROG_LOAD_LAST_FIELD kern_version
+#define	BPF_PROG_LOAD_LAST_FIELD prog_subtype
 
 static int bpf_prog_load(union bpf_attr *attr)
 {
@@ -768,6 +768,7 @@ static int bpf_prog_load(union bpf_attr *attr)
 	err = find_prog_type(type, prog);
 	if (err < 0)
 		goto free_prog;
+	prog->subtype = attr->prog_subtype;
 
 	/* run eBPF verifier */
 	err = bpf_check(&prog, attr);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 608cbffb0e86..c434817e6ef4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -684,7 +684,8 @@ static int check_ctx_access(struct verifier_env *env, int off, int size,
 			    enum bpf_access_type t, enum bpf_reg_type *reg_type)
 {
 	if (env->prog->aux->ops->is_valid_access &&
-	    env->prog->aux->ops->is_valid_access(off, size, t, reg_type)) {
+	    env->prog->aux->ops->is_valid_access(off, size, t, reg_type,
+		    &env->prog->subtype)) {
 		/* remember the offset of last byte accessed in ctx */
 		if (env->prog->aux->max_ctx_offset < off + size)
 			env->prog->aux->max_ctx_offset = off + size;
@@ -1173,7 +1174,7 @@ static int check_call(struct verifier_env *env, int func_id)
 	}
 
 	if (env->prog->aux->ops->get_func_proto)
-		fn = env->prog->aux->ops->get_func_proto(func_id);
+		fn = env->prog->aux->ops->get_func_proto(func_id, &env->prog->subtype);
 
 	if (!fn) {
 		verbose("unknown func %d\n", func_id);
@@ -2768,6 +2769,10 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
 	if ((*prog)->len <= 0 || (*prog)->len > BPF_MAXINSNS)
 		return -E2BIG;
 
+	if ((*prog)->aux->ops->is_valid_subtype &&
+	    !(*prog)->aux->ops->is_valid_subtype(&(*prog)->subtype))
+		return -EINVAL;
+
 	/* 'struct verifier_env' can be global, but since it's not small,
 	 * allocate/free it every time bpf_check() is called
 	 */
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 5dcb99281259..51cf0f254bf2 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -435,7 +435,8 @@ static const struct bpf_func_proto *tracing_func_proto(enum bpf_func_id func_id)
 	}
 }
 
-static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func_id)
+static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func_id,
+		union bpf_prog_subtype *prog_subtype)
 {
 	switch (func_id) {
 	case BPF_FUNC_perf_event_output:
@@ -449,7 +450,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
 
 /* bpf+kprobe programs can access fields of 'struct pt_regs' */
 static bool kprobe_prog_is_valid_access(int off, int size, enum bpf_access_type type,
-					enum bpf_reg_type *reg_type)
+					enum bpf_reg_type *reg_type,
+					union bpf_prog_subtype *prog_subtype)
 {
 	if (off < 0 || off >= sizeof(struct pt_regs))
 		return false;
@@ -517,7 +519,8 @@ static const struct bpf_func_proto bpf_get_stackid_proto_tp = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
-static const struct bpf_func_proto *tp_prog_func_proto(enum bpf_func_id func_id)
+static const struct bpf_func_proto *tp_prog_func_proto(enum bpf_func_id func_id,
+		union bpf_prog_subtype *prog_subtype)
 {
 	switch (func_id) {
 	case BPF_FUNC_perf_event_output:
@@ -530,7 +533,8 @@ static const struct bpf_func_proto *tp_prog_func_proto(enum bpf_func_id func_id)
 }
 
 static bool tp_prog_is_valid_access(int off, int size, enum bpf_access_type type,
-				    enum bpf_reg_type *reg_type)
+				    enum bpf_reg_type *reg_type,
+				    union bpf_prog_subtype *prog_subtype)
 {
 	if (off < sizeof(void *) || off >= PERF_MAX_TRACE_SIZE)
 		return false;
diff --git a/net/core/filter.c b/net/core/filter.c
index 9e9d99e52814..e61f02d0dd64 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2411,7 +2411,8 @@ static const struct bpf_func_proto bpf_xdp_event_output_proto = {
 };
 
 static const struct bpf_func_proto *
-sk_filter_func_proto(enum bpf_func_id func_id)
+sk_filter_func_proto(enum bpf_func_id func_id,
+		union bpf_prog_subtype *prog_subtype)
 {
 	switch (func_id) {
 	case BPF_FUNC_map_lookup_elem:
@@ -2437,7 +2438,8 @@ sk_filter_func_proto(enum bpf_func_id func_id)
 }
 
 static const struct bpf_func_proto *
-tc_cls_act_func_proto(enum bpf_func_id func_id)
+tc_cls_act_func_proto(enum bpf_func_id func_id,
+		union bpf_prog_subtype *prog_subtype)
 {
 	switch (func_id) {
 	case BPF_FUNC_skb_store_bytes:
@@ -2485,18 +2487,18 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_skb_under_cgroup:
 		return &bpf_skb_under_cgroup_proto;
 	default:
-		return sk_filter_func_proto(func_id);
+		return sk_filter_func_proto(func_id, prog_subtype);
 	}
 }
 
 static const struct bpf_func_proto *
-xdp_func_proto(enum bpf_func_id func_id)
+xdp_func_proto(enum bpf_func_id func_id, union bpf_prog_subtype *prog_subtype)
 {
 	switch (func_id) {
 	case BPF_FUNC_perf_event_output:
 		return &bpf_xdp_event_output_proto;
 	default:
-		return sk_filter_func_proto(func_id);
+		return sk_filter_func_proto(func_id, prog_subtype);
 	}
 }
 
@@ -2515,7 +2517,8 @@ static bool __is_valid_access(int off, int size, enum bpf_access_type type)
 
 static bool sk_filter_is_valid_access(int off, int size,
 				      enum bpf_access_type type,
-				      enum bpf_reg_type *reg_type)
+				      enum bpf_reg_type *reg_type,
+				      union bpf_prog_subtype *prog_subtype)
 {
 	switch (off) {
 	case offsetof(struct __sk_buff, tc_classid):
@@ -2539,7 +2542,8 @@ static bool sk_filter_is_valid_access(int off, int size,
 
 static bool tc_cls_act_is_valid_access(int off, int size,
 				       enum bpf_access_type type,
-				       enum bpf_reg_type *reg_type)
+				       enum bpf_reg_type *reg_type,
+				       union bpf_prog_subtype *prog_subtype)
 {
 	if (type == BPF_WRITE) {
 		switch (off) {
@@ -2582,7 +2586,8 @@ static bool __is_valid_xdp_access(int off, int size,
 
 static bool xdp_is_valid_access(int off, int size,
 				enum bpf_access_type type,
-				enum bpf_reg_type *reg_type)
+				enum bpf_reg_type *reg_type,
+				union bpf_prog_subtype *prog_subtype)
 {
 	if (type == BPF_WRITE)
 		return false;
-- 
2.9.3


* [RFC v3 06/22] landlock: Add LSM hooks
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (4 preceding siblings ...)
  2016-09-14  7:23 ` [RFC v3 05/22] bpf,landlock: Add eBPF program subtype and is_valid_subtype() verifier Mickaël Salaün
@ 2016-09-14  7:23 ` Mickaël Salaün
  2016-10-19 15:19   ` Thomas Graf
  2016-09-14  7:24 ` [RFC v3 07/22] landlock: Handle file comparisons Mickaël Salaün
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

Add LSM hooks which can be used by userland through Landlock (eBPF)
programs. These programs are limited to a whitelist of functions (cf.
next commit). The eBPF program context is described by the struct
landlock_data (cf. include/uapi/linux/bpf.h):
* hook: LSM hook ID
* origin: what triggered this Landlock program (syscall, dedicated
  seccomp return or interruption)
* cookie: the 16-bit value from the seccomp filter that triggered this
  Landlock program
* args[6]: array of some LSM hook arguments
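
The layout of this context can be checked with a small userland sketch.
The struct below mirrors the fields listed above using fixed-width types
in place of __u32/__u16/__u64; the name landlock_data_sketch and the
landlock_arg_index() helper are hypothetical, and the 8-byte alignment
requirement on args[] offsets is an assumption of this sketch, not
something stated by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userland mirror of the context fields (illustrative only). */
struct landlock_data_sketch {
	uint32_t hook;     /* LSM hook ID */
	uint16_t origin;   /* what triggered the program */
	uint16_t cookie;   /* 16-bit value from the seccomp filter */
	uint64_t args[6];  /* some LSM hook arguments */
};

/*
 * Map a byte offset in the context to an index in args[], or -1 if the
 * offset does not designate the start of an args[] slot.
 */
static int landlock_arg_index(size_t off)
{
	size_t base = offsetof(struct landlock_data_sketch, args);

	if (off < base || off >= sizeof(struct landlock_data_sketch))
		return -1;
	if ((off - base) % sizeof(uint64_t) != 0)
		return -1;
	return (int)((off - base) / sizeof(uint64_t));
}
```

With this layout the three scalar fields occupy the first 8 bytes and the
six arguments follow, so a program reading offset 8 gets args[0] and
offset 48 gets args[5].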

The LSM hook arguments can contain raw values as integers or
(unleakable) pointers. The only way to use the pointers is to pass them
to an eBPF function according to their types (e.g. the
bpf_landlock_cmp_fs_beneath_with_struct_file function can use a struct
file pointer).

For each Landlock program, the subtype specifies, through the "id"
field, the LSM hook to which the program is dedicated. The "origin"
field must contain every trigger for which the Landlock program will
be called (e.g. every syscall and/or seccomp filters returning
RET_LANDLOCK). The "access" bitfield can be used to allow a program to
access a specific feature from a Landlock hook (i.e. context value or
function). The flag guarding this feature may only be enabled according
to the capabilities of the process loading the program.
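
A possible sanity check on such a subtype is sketched below in userland
C. The constants reuse the values this patch adds to
include/uapi/linux/bpf.h, with the leading underscores dropped to stay
clear of reserved identifiers. The struct and subtype_looks_valid() are
hypothetical, and the rule that "origin" must not be empty is an
assumption of this sketch; the capability check mentioned above is not
modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum landlock_hook_id {
	LANDLOCK_HOOK_UNSPEC,
	LANDLOCK_HOOK_FILE_OPEN,
	LANDLOCK_HOOK_FILE_PERMISSION,
	LANDLOCK_HOOK_MMAP_FILE,
};
#define LANDLOCK_HOOK_LAST		LANDLOCK_HOOK_MMAP_FILE

#define LANDLOCK_FLAG_ORIGIN_SYSCALL	(1 << 0)
#define LANDLOCK_FLAG_ORIGIN_SECCOMP	(1 << 1)
#define LANDLOCK_FLAG_ORIGIN_MASK	((1 << 2) - 1)

/* No access flag is defined yet, so the mask is empty. */
#define LANDLOCK_FLAG_ACCESS_MASK	((1ULL << 0) - 1)

/* Hypothetical mirror of the landlock_hook member of the subtype union. */
struct landlock_hook_subtype {
	uint32_t id;      /* enum landlock_hook_id */
	uint16_t origin;  /* LANDLOCK_FLAG_ORIGIN_* */
	uint64_t access;  /* LANDLOCK_FLAG_ACCESS_* */
};

static bool subtype_looks_valid(const struct landlock_hook_subtype *s)
{
	/* the hook ID must name a real hook */
	if (s->id == LANDLOCK_HOOK_UNSPEC || s->id > LANDLOCK_HOOK_LAST)
		return false;
	/* at least one known trigger, and no unknown bits (assumption) */
	if (!s->origin || (s->origin & ~LANDLOCK_FLAG_ORIGIN_MASK))
		return false;
	/* no unknown access flags */
	if (s->access & ~LANDLOCK_FLAG_ACCESS_MASK)
		return false;
	return true;
}
```

For example, a program dedicated to file_open and triggered on every
syscall would use id = LANDLOCK_HOOK_FILE_OPEN, origin =
LANDLOCK_FLAG_ORIGIN_SYSCALL and access = 0.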

For now, there are three hooks for file system access control:
* file_open
* file_permission
* mmap_file

Changes since v2:
* use subtypes instead of dedicated eBPF program types for each hook
  (suggested by Alexei Starovoitov)
* replace convert_ctx_access() with subtype check
* use an array of Landlock program list instead of a single list
* handle running Landlock programs without needing a seccomp filter
* use, check and expose "origin" to Landlock programs
* mask the unused struct cred * (suggested by Andy Lutomirski)

Changes since v1:
* revamp access control from a syscall-based to a LSM hooks-based
* do not use audit cache
* no race conditions by design
* architecture agnostic
* switch from cBPF to eBPF (suggested by Daniel Borkmann)
* new BPF context

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Will Drewry <wad@chromium.org>
Link: https://lkml.kernel.org/r/20160827205559.GA43880@ast-mbp.thefacebook.com
Link: https://lkml.kernel.org/r/20160827180642.GA38754@ast-mbp.thefacebook.com
Link: https://lkml.kernel.org/r/CALCETrUK1umtXMEXXKzMAccNQCVTPA8_XNDf01B5=gAZuJwvsQ@mail.gmail.com
Link: https://lkml.kernel.org/r/20160827204307.GA43714@ast-mbp.thefacebook.com
---
 include/linux/bpf.h        |   5 +
 include/linux/lsm_hooks.h  |   5 +
 include/uapi/linux/bpf.h   |  37 ++++++++
 kernel/bpf/syscall.c       |  10 +-
 kernel/bpf/verifier.c      |   6 ++
 security/Makefile          |   2 +
 security/landlock/Makefile |   3 +
 security/landlock/lsm.c    | 222 +++++++++++++++++++++++++++++++++++++++++++++
 security/security.c        |   1 +
 9 files changed, 289 insertions(+), 2 deletions(-)
 create mode 100644 security/landlock/Makefile
 create mode 100644 security/landlock/lsm.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9aa01d9d3d80..36c3e482239c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -85,6 +85,8 @@ enum bpf_arg_type {
 
 	ARG_PTR_TO_CTX,		/* pointer to context */
 	ARG_ANYTHING,		/* any (initialized) argument is ok */
+
+	ARG_PTR_TO_STRUCT_FILE,		/* pointer to struct file */
 };
 
 /* type of values returned from helper functions */
@@ -143,6 +145,9 @@ enum bpf_reg_type {
 	 */
 	PTR_TO_PACKET,
 	PTR_TO_PACKET_END,	 /* skb->data + headlen */
+
+	/* Landlock */
+	PTR_TO_STRUCT_FILE,
 };
 
 struct bpf_prog;
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 558adfa5c8a8..069af34301d4 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1933,5 +1933,10 @@ void __init loadpin_add_hooks(void);
 #else
 static inline void loadpin_add_hooks(void) { };
 #endif
+#ifdef CONFIG_SECURITY_LANDLOCK
+extern void __init landlock_add_hooks(void);
+#else
+static inline void __init landlock_add_hooks(void) { }
+#endif /* CONFIG_SECURITY_LANDLOCK */
 
 #endif /* ! __LINUX_LSM_HOOKS_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 667b6ef3ff1e..ad87003fe892 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -108,6 +108,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_XDP,
 	BPF_PROG_TYPE_PERF_EVENT,
 	BPF_PROG_TYPE_CGROUP_SOCKET,
+	BPF_PROG_TYPE_LANDLOCK,
 };
 
 enum bpf_attach_type {
@@ -528,6 +529,23 @@ struct xdp_md {
 	__u32 data_end;
 };
 
+/* LSM hooks */
+enum landlock_hook_id {
+	LANDLOCK_HOOK_UNSPEC,
+	LANDLOCK_HOOK_FILE_OPEN,
+	LANDLOCK_HOOK_FILE_PERMISSION,
+	LANDLOCK_HOOK_MMAP_FILE,
+};
+#define _LANDLOCK_HOOK_LAST LANDLOCK_HOOK_MMAP_FILE
+
+/* Trigger type */
+#define LANDLOCK_FLAG_ORIGIN_SYSCALL	(1 << 0)
+#define LANDLOCK_FLAG_ORIGIN_SECCOMP	(1 << 1)
+#define _LANDLOCK_FLAG_ORIGIN_MASK	((1 << 2) - 1)
+
+/* context of function access flags */
+#define _LANDLOCK_FLAG_ACCESS_MASK	((1ULL << 0) - 1)
+
 /* Map handle entry */
 struct landlock_handle {
 	__u32 type; /* enum bpf_map_handle_type */
@@ -537,4 +555,23 @@ struct landlock_handle {
 	};
 } __attribute__((aligned(8)));
 
+/**
+ * struct landlock_data
+ *
+ * @hook: LSM hook ID (e.g. LANDLOCK_HOOK_FILE_OPEN)
+ * @origin: bit indicating for which reason the program is running
+ * @cookie: value set by a seccomp-filter return value RET_LANDLOCK. This comes
+ *          from a trusted seccomp-bpf program: the same process that loaded
+ *          this Landlock hook program.
+ * @args: LSM hook arguments, see include/linux/lsm_hooks.h for their
+ *        description and the LANDLOCK_HOOK* definitions from
+ *        security/landlock/lsm.c for their types.
+ */
+struct landlock_data {
+	__u32 hook; /* enum landlock_hook_id */
+	__u16 origin; /* LANDLOCK_FLAG_ORIGIN_* */
+	__u16 cookie; /* seccomp RET_LANDLOCK */
+	__u64 args[6];
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8b3f4d2b4802..f22e3b63d253 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -739,8 +739,14 @@ static int bpf_prog_load(union bpf_attr *attr)
 	    attr->kern_version != LINUX_VERSION_CODE)
 		return -EINVAL;
 
-	if (type != BPF_PROG_TYPE_SOCKET_FILTER && !capable(CAP_SYS_ADMIN))
-		return -EPERM;
+	switch (type) {
+	case BPF_PROG_TYPE_SOCKET_FILTER:
+	case BPF_PROG_TYPE_LANDLOCK:
+		break;
+	default:
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+	}
 
 	/* plain bpf_prog allocation */
 	prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c434817e6ef4..5c9982d55612 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -245,6 +245,7 @@ static const char * const reg_type_str[] = {
 	[CONST_IMM]		= "imm",
 	[PTR_TO_PACKET]		= "pkt",
 	[PTR_TO_PACKET_END]	= "pkt_end",
+	[PTR_TO_STRUCT_FILE]	= "struct_file",
 };
 
 static void print_verifier_state(struct verifier_state *state)
@@ -555,6 +556,7 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
 	case PTR_TO_PACKET_END:
 	case FRAME_PTR:
 	case CONST_PTR_TO_MAP:
+	case PTR_TO_STRUCT_FILE:
 		return true;
 	default:
 		return false;
@@ -972,6 +974,10 @@ static int check_func_arg(struct verifier_env *env, u32 regno,
 		expected_type = PTR_TO_CTX;
 		if (type != expected_type)
 			goto err_type;
+	} else if (arg_type == ARG_PTR_TO_STRUCT_FILE) {
+		expected_type = PTR_TO_STRUCT_FILE;
+		if (type != expected_type)
+			goto err_type;
 	} else if (arg_type == ARG_PTR_TO_STACK ||
 		   arg_type == ARG_PTR_TO_RAW_STACK) {
 		expected_type = PTR_TO_STACK;
diff --git a/security/Makefile b/security/Makefile
index f2d71cdb8e19..3fdc2f19dc48 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -9,6 +9,7 @@ subdir-$(CONFIG_SECURITY_TOMOYO)        += tomoyo
 subdir-$(CONFIG_SECURITY_APPARMOR)	+= apparmor
 subdir-$(CONFIG_SECURITY_YAMA)		+= yama
 subdir-$(CONFIG_SECURITY_LOADPIN)	+= loadpin
+subdir-$(CONFIG_SECURITY_LANDLOCK)		+= landlock
 
 # always enable default capabilities
 obj-y					+= commoncap.o
@@ -24,6 +25,7 @@ obj-$(CONFIG_SECURITY_TOMOYO)		+= tomoyo/
 obj-$(CONFIG_SECURITY_APPARMOR)		+= apparmor/
 obj-$(CONFIG_SECURITY_YAMA)		+= yama/
 obj-$(CONFIG_SECURITY_LOADPIN)		+= loadpin/
+obj-$(CONFIG_SECURITY_LANDLOCK)	+= landlock/
 obj-$(CONFIG_CGROUP_DEVICE)		+= device_cgroup.o
 
 # Object integrity file lists
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
new file mode 100644
index 000000000000..59669d70bc7e
--- /dev/null
+++ b/security/landlock/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
+
+landlock-y := lsm.o
diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c
new file mode 100644
index 000000000000..c032183e5d95
--- /dev/null
+++ b/security/landlock/lsm.c
@@ -0,0 +1,222 @@
+/*
+ * Landlock LSM
+ *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/current.h>
+#include <linux/bpf.h> /* enum bpf_reg_type, struct landlock_data */
+#include <linux/cred.h>
+#include <linux/err.h> /* MAX_ERRNO */
+#include <linux/filter.h> /* struct bpf_prog, BPF_PROG_RUN() */
+#include <linux/kernel.h> /* FIELD_SIZEOF() */
+#include <linux/lsm_hooks.h>
+#include <linux/types.h> /* uintptr_t */
+
+#define LANDLOCK_MAP0(m, ...)
+#define LANDLOCK_MAP1(m, d, t, a) m(d, t, a)
+#define LANDLOCK_MAP2(m, d, t, a, ...) m(d, t, a), LANDLOCK_MAP1(m, __VA_ARGS__)
+#define LANDLOCK_MAP3(m, d, t, a, ...) m(d, t, a), LANDLOCK_MAP2(m, __VA_ARGS__)
+#define LANDLOCK_MAP4(m, d, t, a, ...) m(d, t, a), LANDLOCK_MAP3(m, __VA_ARGS__)
+#define LANDLOCK_MAP5(m, d, t, a, ...) m(d, t, a), LANDLOCK_MAP4(m, __VA_ARGS__)
+#define LANDLOCK_MAP6(m, d, t, a, ...) m(d, t, a), LANDLOCK_MAP5(m, __VA_ARGS__)
+#define LANDLOCK_MAP(n, ...) LANDLOCK_MAP##n(__VA_ARGS__)
+
+#define LANDLOCK_ARG_D(d, t, a) d
+#define LANDLOCK_ARG_TA(d, t, a) t a
+#define LANDLOCK_ARG_A(d, t, a) ((u64)(uintptr_t)a)
+
+#define LANDLOCK_HOOKx(X, NAME, CNAME, ...)				\
+	static inline int landlock_hook_##NAME(				\
+		LANDLOCK_MAP(X, LANDLOCK_ARG_TA, __VA_ARGS__))		\
+	{								\
+		__u64 args[6] = {					\
+			LANDLOCK_MAP(X, LANDLOCK_ARG_A, __VA_ARGS__)	\
+		};							\
+		return landlock_run_prog(LANDLOCK_HOOK_##CNAME, args);	\
+	}								\
+	static inline bool __is_valid_access_hook_##CNAME(		\
+			int off, int size, enum bpf_access_type type,	\
+			enum bpf_reg_type *reg_type,			\
+			union bpf_prog_subtype *prog_subtype)		\
+	{								\
+		enum bpf_reg_type arg_types[6] = {			\
+			LANDLOCK_MAP(X, LANDLOCK_ARG_D, __VA_ARGS__)	\
+		};							\
+		return __is_valid_access(off, size, type, reg_type,	\
+				arg_types, prog_subtype);		\
+	}								\
+
+#define LANDLOCK_HOOK1(NAME, ...) LANDLOCK_HOOKx(1, NAME, __VA_ARGS__)
+#define LANDLOCK_HOOK2(NAME, ...) LANDLOCK_HOOKx(2, NAME, __VA_ARGS__)
+#define LANDLOCK_HOOK3(NAME, ...) LANDLOCK_HOOKx(3, NAME, __VA_ARGS__)
+#define LANDLOCK_HOOK4(NAME, ...) LANDLOCK_HOOKx(4, NAME, __VA_ARGS__)
+#define LANDLOCK_HOOK5(NAME, ...) LANDLOCK_HOOKx(5, NAME, __VA_ARGS__)
+#define LANDLOCK_HOOK6(NAME, ...) LANDLOCK_HOOKx(6, NAME, __VA_ARGS__)
+
+#define LANDLOCK_HOOK_INIT(NAME) LSM_HOOK_INIT(NAME, landlock_hook_##NAME)
+
+
+static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
+{
+	return 0;
+}
+
+static const struct bpf_func_proto *bpf_landlock_func_proto(
+		enum bpf_func_id func_id, union bpf_prog_subtype *prog_subtype)
+{
+	switch (func_id) {
+	default:
+		return NULL;
+	}
+}
+
+static bool __is_valid_access(int off, int size, enum bpf_access_type type,
+		enum bpf_reg_type *reg_type,
+		enum bpf_reg_type arg_types[6],
+		union bpf_prog_subtype *prog_subtype)
+{
+	int arg_nb, expected_size;
+
+	if (type != BPF_READ)
+		return false;
+	if (off < 0 || off >= sizeof(struct landlock_data))
+		return false;
+
+	/* check size */
+	switch (off) {
+	case offsetof(struct landlock_data, origin):
+	case offsetof(struct landlock_data, cookie):
+		expected_size = sizeof(__u16);
+		break;
+	case offsetof(struct landlock_data, hook):
+		expected_size = sizeof(__u32);
+		break;
+	case offsetof(struct landlock_data, args[0]) ...
+			offsetof(struct landlock_data, args[5]):
+		expected_size = sizeof(__u64);
+		break;
+	default:
+		return false;
+	}
+	if (expected_size != size)
+		return false;
+
+	/* check pointer access and set pointer type */
+	switch (off) {
+	case offsetof(struct landlock_data, args[0]) ...
+			offsetof(struct landlock_data, args[5]):
+		arg_nb = (off - offsetof(struct landlock_data, args[0]))
+			/ FIELD_SIZEOF(struct landlock_data, args[0]);
+		*reg_type = arg_types[arg_nb];
+		if (*reg_type == NOT_INIT)
+			return false;
+		break;
+	}
+
+	return true;
+}
+
+LANDLOCK_HOOK2(file_open, FILE_OPEN,
+	PTR_TO_STRUCT_FILE, struct file *, file,
+	NOT_INIT, const struct cred *, cred
+)
+
+LANDLOCK_HOOK2(file_permission, FILE_PERMISSION,
+	PTR_TO_STRUCT_FILE, struct file *, file,
+	UNKNOWN_VALUE, int, mask
+)
+
+LANDLOCK_HOOK4(mmap_file, MMAP_FILE,
+	PTR_TO_STRUCT_FILE, struct file *, file,
+	UNKNOWN_VALUE, unsigned long, reqprot,
+	UNKNOWN_VALUE, unsigned long, prot,
+	UNKNOWN_VALUE, unsigned long, flags
+)
+
+static struct security_hook_list landlock_hooks[] = {
+	LANDLOCK_HOOK_INIT(file_open),
+	LANDLOCK_HOOK_INIT(file_permission),
+	LANDLOCK_HOOK_INIT(mmap_file),
+};
+
+void __init landlock_add_hooks(void)
+{
+	pr_info("landlock: Becoming ready for sandboxing\n");
+	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks));
+}
+
+#define LANDLOCK_CASE_ACCESS_HOOK(CNAME)			\
+	case LANDLOCK_HOOK_##CNAME:				\
+		return __is_valid_access_hook_##CNAME(		\
+				off, size, type, reg_type, prog_subtype);
+
+static inline bool bpf_landlock_is_valid_access(int off, int size,
+		enum bpf_access_type type, enum bpf_reg_type *reg_type,
+		union bpf_prog_subtype *prog_subtype)
+{
+	enum landlock_hook_id hook_id = prog_subtype->landlock_hook.id;
+
+	switch (hook_id) {
+	LANDLOCK_CASE_ACCESS_HOOK(FILE_OPEN)
+	LANDLOCK_CASE_ACCESS_HOOK(FILE_PERMISSION)
+	LANDLOCK_CASE_ACCESS_HOOK(MMAP_FILE)
+	case LANDLOCK_HOOK_UNSPEC:
+	default:
+		return false;
+	}
+}
+
+static inline bool bpf_landlock_is_valid_subtype(
+		union bpf_prog_subtype *prog_subtype)
+{
+	enum landlock_hook_id hook_id = prog_subtype->landlock_hook.id;
+
+	switch (hook_id) {
+	case LANDLOCK_HOOK_FILE_OPEN:
+	case LANDLOCK_HOOK_FILE_PERMISSION:
+	case LANDLOCK_HOOK_MMAP_FILE:
+		break;
+	case LANDLOCK_HOOK_UNSPEC:
+	default:
+		return false;
+	}
+	if (!prog_subtype->landlock_hook.id ||
+			prog_subtype->landlock_hook.id > _LANDLOCK_HOOK_LAST)
+		return false;
+	if (!prog_subtype->landlock_hook.origin ||
+			(prog_subtype->landlock_hook.origin &
+			 ~_LANDLOCK_FLAG_ORIGIN_MASK))
+		return false;
+#ifndef CONFIG_SECCOMP_FILTER
+	if (prog_subtype->landlock_hook.origin & LANDLOCK_FLAG_ORIGIN_SECCOMP)
+		return false;
+#endif /* !CONFIG_SECCOMP_FILTER */
+	if (prog_subtype->landlock_hook.access & ~_LANDLOCK_FLAG_ACCESS_MASK)
+		return false;
+
+	return true;
+}
+
+static const struct bpf_verifier_ops bpf_landlock_ops = {
+	.get_func_proto	= bpf_landlock_func_proto,
+	.is_valid_access = bpf_landlock_is_valid_access,
+	.is_valid_subtype = bpf_landlock_is_valid_subtype,
+};
+
+static struct bpf_prog_type_list bpf_landlock_type __read_mostly = {
+	.ops	= &bpf_landlock_ops,
+	.type	= BPF_PROG_TYPE_LANDLOCK,
+};
+
+static int __init register_landlock_filter_ops(void)
+{
+	bpf_register_prog_type(&bpf_landlock_type);
+	return 0;
+}
+
+late_initcall(register_landlock_filter_ops);
diff --git a/security/security.c b/security/security.c
index f825304f04a7..92f0f1f209b6 100644
--- a/security/security.c
+++ b/security/security.c
@@ -61,6 +61,7 @@ int __init security_init(void)
 	capability_add_hooks();
 	yama_add_hooks();
 	loadpin_add_hooks();
+	landlock_add_hooks();
 
 	/*
 	 * Load all the remaining security modules.
-- 
2.9.3


* [RFC v3 07/22] landlock: Handle file comparisons
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (5 preceding siblings ...)
  2016-09-14  7:23 ` [RFC v3 06/22] landlock: Add LSM hooks Mickaël Salaün
@ 2016-09-14  7:24 ` Mickaël Salaün
  2016-09-14 19:07   ` Jann Horn
                     ` (2 more replies)
  2016-09-14  7:24 ` [RFC v3 08/22] seccomp: Fix documentation for struct seccomp_filter Mickaël Salaün
                   ` (15 subsequent siblings)
  22 siblings, 3 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

Add eBPF functions to compare a file system access with a Landlock file
system handle:
* bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
  This function compares the dentry, inode, device or mount point of
  the currently accessed file with a reference handle.
* bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
  This function lets an eBPF program check whether the currently
  accessed file is the same as, or in the hierarchy of, a reference
  handle.
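
The matching rules of the two helpers can be sketched in plain userspace
C. This is only a model, not the kernel code: struct model_path and the
function names are hypothetical stand-ins for struct path and the eBPF
helpers, the flag values are copied from the uapi hunk of this patch, and
only the BPF_MAP_ARRAY_OP_OR behaviour is modelled (the patch also
requires non-null inode, superblock and mount values, which the model
skips).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Flag values copied from the uapi hunk of this patch. */
#define FS_DENTRY	(1u << 0)
#define FS_INODE	(1u << 1)
#define FS_DEVICE	(1u << 2)
#define FS_MOUNT	(1u << 3)
#define FS_MASK		((1u << 4) - 1)
#define OPT_REVERSE	(1u << 0)

/* Hypothetical stand-in for struct path: plain IDs plus a parent link. */
struct model_path {
	int dentry, inode, device, mount;
	const struct model_path *parent; /* NULL at the root */
};

/*
 * Return 0 if at least one handle matches every requested property,
 * 1 if none does, and -EINVAL for unknown property bits.
 */
int cmp_fs_prop(unsigned int prop, const struct model_path *const handles[],
		size_t n, const struct model_path *file)
{
	size_t i;

	if (prop & ~FS_MASK)
		return -EINVAL;
	for (i = 0; i < n; i++) {
		const struct model_path *h = handles[i];

		if ((prop & FS_DENTRY) && h->dentry != file->dentry)
			continue;
		if ((prop & FS_INODE) && h->inode != file->inode)
			continue;
		if ((prop & FS_DEVICE) && h->device != file->device)
			continue;
		if ((prop & FS_MOUNT) && h->mount != file->mount)
			continue;
		return 0;
	}
	return 1;
}

/* Return 1 if @p is @root or one of its descendants, 0 otherwise. */
int is_under(const struct model_path *p, const struct model_path *root)
{
	for (; p; p = p->parent)
		if (p == root)
			return 1;
	return 0;
}

/*
 * Return 0 if the file is the same as or beneath one of the handles.
 * With OPT_REVERSE the roles are swapped: a handle must be the same as
 * or beneath the file.
 */
int cmp_fs_beneath(unsigned int opt, const struct model_path *const handles[],
		   size_t n, const struct model_path *file)
{
	size_t i;

	if (opt & ~OPT_REVERSE)
		return -EINVAL;
	for (i = 0; i < n; i++) {
		int hit = (opt & OPT_REVERSE) ? is_under(handles[i], file)
					      : is_under(file, handles[i]);

		if (hit)
			return 0;
	}
	return 1;
}
```

Note the return convention shared with the patch: 0 means "match", 1
means "no match", and a negative value signals an error, so a rule must
not treat any non-zero value as a plain mismatch.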

The goal of a file system handle is to abstract kernel objects such as a
struct file or a struct inode. Userland can create this kind of handle
with the BPF_MAP_UPDATE_ELEM command. The element is a struct
landlock_handle containing the handle type (e.g.
BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. The element
could also be any description able to match a struct file or a struct
inode (e.g. a path or a glob string).
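
How a descriptor-based handle turns into a stored kernel object can be
modelled as follows. This is a userspace sketch under stated
assumptions: every type and function name below is hypothetical, the
fake descriptor table replaces the real file lookup, and the error codes
for a bad descriptor or an unknown type are guesses, not taken from the
patch. It only mirrors the order of operations in
landlock_store_handle(): look up the file, refuse internal mounts, take
a reference on the path, and store the path rather than the descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mirrors BPF_MAP_HANDLE_TYPE_UNSPEC / ..._LANDLOCK_FS_FD. */
enum handle_type { HANDLE_UNSPEC, HANDLE_FS_FD };

/* What userland passes in: a type plus a file descriptor (hypothetical). */
struct user_handle {
	unsigned int type;
	int fd;
};

/* Hypothetical stand-in for an open file and its path. */
struct model_file {
	int path_refs;		/* models the path reference count */
	int mnt_internal;	/* models the MNT_INTERNAL mount flag */
};

/* What the map keeps: the resolved object, not the descriptor. */
struct stored_handle {
	unsigned int type;
	struct model_file *path;
};

/* Resolve @src against a fake descriptor table and store it in @dst. */
int store_handle(struct stored_handle *dst, const struct user_handle *src,
		 struct model_file *const fd_table[], size_t nfds)
{
	struct model_file *f;

	if (src->type != HANDLE_FS_FD)
		return -EINVAL;		/* assumed error code */
	if (src->fd < 0 || (size_t)src->fd >= nfds || !fd_table[src->fd])
		return -EBADF;		/* assumed error code */
	f = fd_table[src->fd];
	/* Only files from user-visible mounts may become handles. */
	if (f->mnt_internal)
		return -EINVAL;
	f->path_refs++;			/* models path_get() */
	dst->type = src->type;
	dst->path = f;
	return 0;
}

/* Drop the reference taken by store_handle(); models path_put(). */
void put_handle(struct stored_handle *h)
{
	if (h->path) {
		h->path->path_refs--;
		h->path = NULL;
	}
}
```

Because the map holds its own reference on the path, the handle stays
valid after userland closes the descriptor it was created from.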

Changes since v2:
* add MNT_INTERNAL check to only add file handles from a user-visible FS
  (e.g. no anonymous inode)
* replace struct file* with struct path* in map_landlock_handle
* add BPF protos
* fix bpf_landlock_cmp_fs_prop_with_struct_file()

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Link: https://lkml.kernel.org/r/CALCETrWwTiz3kZTkEgOW24-DvhQq6LftwEXh77FD2G5o71yD7g@mail.gmail.com
---
 include/linux/bpf.h            |  10 +++
 include/uapi/linux/bpf.h       |  49 +++++++++++
 kernel/bpf/arraymap.c          |  21 +++++
 kernel/bpf/verifier.c          |   8 ++
 security/landlock/Makefile     |   2 +-
 security/landlock/checker_fs.c | 179 +++++++++++++++++++++++++++++++++++++++++
 security/landlock/checker_fs.h |  20 +++++
 security/landlock/lsm.c        |   6 ++
 8 files changed, 294 insertions(+), 1 deletion(-)
 create mode 100644 security/landlock/checker_fs.c
 create mode 100644 security/landlock/checker_fs.h

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 36c3e482239c..f7325c17f720 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -87,6 +87,7 @@ enum bpf_arg_type {
 	ARG_ANYTHING,		/* any (initialized) argument is ok */
 
 	ARG_PTR_TO_STRUCT_FILE,		/* pointer to struct file */
+	ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS,	/* pointer to Landlock FS handle */
 };
 
 /* type of values returned from helper functions */
@@ -148,6 +149,7 @@ enum bpf_reg_type {
 
 	/* Landlock */
 	PTR_TO_STRUCT_FILE,
+	CONST_PTR_TO_LANDLOCK_HANDLE_FS,
 };
 
 struct bpf_prog;
@@ -214,6 +216,9 @@ struct bpf_array {
 #ifdef CONFIG_SECURITY_LANDLOCK
 struct map_landlock_handle {
 	u32 type; /* enum bpf_map_handle_type */
+	union {
+		struct path path;
+	};
 };
 #endif /* CONFIG_SECURITY_LANDLOCK */
 
@@ -348,6 +353,11 @@ extern const struct bpf_func_proto bpf_skb_vlan_push_proto;
 extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
 extern const struct bpf_func_proto bpf_get_stackid_proto;
 
+#ifdef CONFIG_SECURITY_LANDLOCK
+extern const struct bpf_func_proto bpf_landlock_cmp_fs_prop_with_struct_file_proto;
+extern const struct bpf_func_proto bpf_landlock_cmp_fs_beneath_with_struct_file_proto;
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 /* Shared helpers among cBPF and eBPF. */
 void bpf_user_rnd_init_once(void);
 u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ad87003fe892..905dcace7255 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -92,10 +92,20 @@ enum bpf_map_type {
 
 enum bpf_map_array_type {
 	BPF_MAP_ARRAY_TYPE_UNSPEC,
+	BPF_MAP_ARRAY_TYPE_LANDLOCK_FS,
 };
 
 enum bpf_map_handle_type {
 	BPF_MAP_HANDLE_TYPE_UNSPEC,
+	BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD,
+	/* BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_GLOB, */
+};
+
+enum bpf_map_array_op {
+	BPF_MAP_ARRAY_OP_UNSPEC,
+	BPF_MAP_ARRAY_OP_OR,
+	BPF_MAP_ARRAY_OP_AND,
+	BPF_MAP_ARRAY_OP_XOR,
 };
 
 enum bpf_prog_type {
@@ -434,6 +444,34 @@ enum bpf_func_id {
 	 */
 	BPF_FUNC_skb_change_tail,
 
+	/**
+	 * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
+	 * Compare file system handles with a struct file
+	 *
+	 * @prop: properties to check against (e.g. LANDLOCK_FLAG_FS_DENTRY)
+	 * @map: handles to compare against
+	 * @map_op: which elements of the map to use (e.g. BPF_MAP_ARRAY_OP_OR)
+	 * @file: struct file address to compare with (taken from the context)
+	 *
+	 * Return: 0 if the file matches the handles, 1 otherwise, or a negative
+	 * value if an error occurred.
+	 */
+	BPF_FUNC_landlock_cmp_fs_prop_with_struct_file,
+
+	/**
+	 * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
+	 * Check if a struct file is a leaf of file system handles
+	 *
+	 * @opt: check options (e.g. LANDLOCK_FLAG_OPT_REVERSE)
+	 * @map: handles to compare against
+	 * @map_op: which elements of the map to use (e.g. BPF_MAP_ARRAY_OP_OR)
+	 * @file: struct file address to compare with (taken from the context)
+	 *
+	 * Return: 0 if the file is the same or beneath the handles,
+	 * 1 otherwise, or a negative value if an error occurred.
+	 */
+	BPF_FUNC_landlock_cmp_fs_beneath_with_struct_file,
+
 	__BPF_FUNC_MAX_ID,
 };
 
@@ -546,6 +584,17 @@ enum landlock_hook_id {
 /* context of function access flags */
 #define _LANDLOCK_FLAG_ACCESS_MASK	((1ULL << 0) - 1)
 
+/* Handle check flags */
+#define LANDLOCK_FLAG_FS_DENTRY		(1 << 0)
+#define LANDLOCK_FLAG_FS_INODE		(1 << 1)
+#define LANDLOCK_FLAG_FS_DEVICE		(1 << 2)
+#define LANDLOCK_FLAG_FS_MOUNT		(1 << 3)
+#define _LANDLOCK_FLAG_FS_MASK		((1ULL << 4) - 1)
+
+/* Handle option flags */
+#define LANDLOCK_FLAG_OPT_REVERSE	(1<<0)
+#define _LANDLOCK_FLAG_OPT_MASK		((1ULL << 1) - 1)
+
 /* Map handle entry */
 struct landlock_handle {
 	__u32 type; /* enum bpf_map_handle_type */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 94256597eacd..edaab4c87292 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -21,6 +21,8 @@
 
 #ifdef CONFIG_SECURITY_LANDLOCK
 #include <asm/resource.h> /* RLIMIT_NOFILE */
+#include <linux/mount.h> /* struct vfsmount, MNT_INTERNAL */
+#include <linux/path.h> /* path_get(), path_put() */
 #include <linux/sched.h> /* rlimit() */
 #endif /* CONFIG_SECURITY_LANDLOCK */
 
@@ -603,6 +605,9 @@ static void landlock_put_handle(struct map_landlock_handle *handle)
 	enum bpf_map_handle_type handle_type = handle->type;
 
 	switch (handle_type) {
+	case BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD:
+		path_put(&handle->path);
+		break;
 	case BPF_MAP_HANDLE_TYPE_UNSPEC:
 	default:
 		WARN_ON(1);
@@ -628,6 +633,8 @@ static enum bpf_map_array_type landlock_get_array_type(
 		enum bpf_map_handle_type handle_type)
 {
 	switch (handle_type) {
+	case BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD:
+		return BPF_MAP_ARRAY_TYPE_LANDLOCK_FS;
 	case BPF_MAP_HANDLE_TYPE_UNSPEC:
 	default:
 		return -EINVAL;
@@ -650,8 +657,22 @@ static inline long landlock_store_handle(struct map_landlock_handle *dst,
 		struct landlock_handle *handle)
 {
 	enum bpf_map_handle_type handle_type = handle->type;
+	struct file *handle_file;
+
+	/* access control already done for the FD */
 
 	switch (handle_type) {
+	case BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD:
+		FGET_OR_RET(handle_file, handle->fd);
+		/* check if the FD is tied to a user mount point */
+		if (unlikely(handle_file->f_path.mnt->mnt_flags & MNT_INTERNAL)) {
+			fput(handle_file);
+			return -EINVAL;
+		}
+		path_get(&handle_file->f_path);
+		dst->path = handle_file->f_path;
+		fput(handle_file);
+		break;
 	case BPF_MAP_HANDLE_TYPE_UNSPEC:
 	default:
 		WARN_ON(1);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5c9982d55612..8d7b18574f5a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -246,6 +246,7 @@ static const char * const reg_type_str[] = {
 	[PTR_TO_PACKET]		= "pkt",
 	[PTR_TO_PACKET_END]	= "pkt_end",
 	[PTR_TO_STRUCT_FILE]	= "struct_file",
+	[CONST_PTR_TO_LANDLOCK_HANDLE_FS] = "landlock_handle_fs",
 };
 
 static void print_verifier_state(struct verifier_state *state)
@@ -557,6 +558,7 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
 	case FRAME_PTR:
 	case CONST_PTR_TO_MAP:
 	case PTR_TO_STRUCT_FILE:
+	case CONST_PTR_TO_LANDLOCK_HANDLE_FS:
 		return true;
 	default:
 		return false;
@@ -978,6 +980,10 @@ static int check_func_arg(struct verifier_env *env, u32 regno,
 		expected_type = PTR_TO_STRUCT_FILE;
 		if (type != expected_type)
 			goto err_type;
+	} else if (arg_type == ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS) {
+		expected_type = CONST_PTR_TO_LANDLOCK_HANDLE_FS;
+		if (type != expected_type)
+			goto err_type;
 	} else if (arg_type == ARG_PTR_TO_STACK ||
 		   arg_type == ARG_PTR_TO_RAW_STACK) {
 		expected_type = PTR_TO_STACK;
@@ -1801,6 +1807,8 @@ static struct bpf_map *ld_imm64_to_map_ptr(struct bpf_insn *insn)
 static inline enum bpf_reg_type bpf_reg_type_from_map(struct bpf_map *map)
 {
 	switch (map->map_array_type) {
+	case BPF_MAP_ARRAY_TYPE_LANDLOCK_FS:
+		return CONST_PTR_TO_LANDLOCK_HANDLE_FS;
 	case BPF_MAP_ARRAY_TYPE_UNSPEC:
 	default:
 		return CONST_PTR_TO_MAP;
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index 59669d70bc7e..27f359a8cfaa 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
-landlock-y := lsm.o
+landlock-y := lsm.o checker_fs.o
diff --git a/security/landlock/checker_fs.c b/security/landlock/checker_fs.c
new file mode 100644
index 000000000000..39eb85dc7d18
--- /dev/null
+++ b/security/landlock/checker_fs.c
@@ -0,0 +1,179 @@
+/*
+ * Landlock LSM - File System Checkers
+ *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/bpf.h> /* enum bpf_map_array_op */
+#include <linux/errno.h>
+#include <linux/fs.h> /* path_is_under() */
+#include <linux/path.h> /* struct path */
+
+#include "checker_fs.h"
+
+#define EQUAL_NOT_NULL(a, b) (a && a == b)
+
+/*
+ * bpf_landlock_cmp_fs_prop_with_struct_file
+ *
+ * Cf. include/uapi/linux/bpf.h
+ */
+static inline u64 bpf_landlock_cmp_fs_prop_with_struct_file(u64 r1_property,
+		u64 r2_map, u64 r3_map_op, u64 r4_file, u64 r5)
+{
+	u8 property = (u8) r1_property;
+	struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
+	enum bpf_map_array_op map_op = r3_map_op;
+	struct file *file = (struct file *) (unsigned long) r4_file;
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	struct path *p1, *p2;
+	struct map_landlock_handle *handle;
+	int i;
+
+	/* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS is an arraymap */
+	if (unlikely(!map)) {
+		WARN_ON(1);
+		return -EFAULT;
+	}
+	if (unlikely(!file))
+		return -ENOENT;
+	if (unlikely((property | _LANDLOCK_FLAG_FS_MASK) != _LANDLOCK_FLAG_FS_MASK))
+		return -EINVAL;
+
+	/* for now, only handle OP_OR */
+	switch (map_op) {
+	case BPF_MAP_ARRAY_OP_OR:
+		break;
+	case BPF_MAP_ARRAY_OP_UNSPEC:
+	case BPF_MAP_ARRAY_OP_AND:
+	case BPF_MAP_ARRAY_OP_XOR:
+	default:
+		return -EINVAL;
+	}
+	p2 = &file->f_path;
+
+	synchronize_rcu();
+
+	for (i = 0; i < array->n_entries; i++) {
+		bool result_dentry = !(property & LANDLOCK_FLAG_FS_DENTRY);
+		bool result_inode = !(property & LANDLOCK_FLAG_FS_INODE);
+		bool result_device = !(property & LANDLOCK_FLAG_FS_DEVICE);
+		bool result_mount = !(property & LANDLOCK_FLAG_FS_MOUNT);
+
+		handle = (struct map_landlock_handle *)
+				(array->value + array->elem_size * i);
+
+		if (handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) {
+			WARN_ON(1);
+			return -EFAULT;
+		}
+		p1 = &handle->path;
+
+		if (!result_dentry && p1->dentry == p2->dentry)
+			result_dentry = true;
+		/* TODO: use d_inode_rcu() instead? */
+		if (!result_inode
+		    && EQUAL_NOT_NULL(d_inode(p1->dentry)->i_ino,
+				      d_inode(p2->dentry)->i_ino))
+			result_inode = true;
+		/* check superblock instead of device major/minor */
+		if (!result_device
+		    && EQUAL_NOT_NULL(d_inode(p1->dentry)->i_sb,
+				      d_inode(p2->dentry)->i_sb))
+			result_device = true;
+		if (!result_mount && EQUAL_NOT_NULL(p1->mnt, p2->mnt))
+			result_mount = true;
+		if (result_dentry && result_inode && result_device && result_mount)
+			return 0;
+	}
+	return 1;
+}
+
+const struct bpf_func_proto bpf_landlock_cmp_fs_prop_with_struct_file_proto = {
+	.func		= bpf_landlock_cmp_fs_prop_with_struct_file,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_STRUCT_FILE,
+};
+
+/*
+ * bpf_landlock_cmp_fs_beneath_with_struct_file
+ *
+ * Cf. include/uapi/linux/bpf.h
+ */
+static inline u64 bpf_landlock_cmp_fs_beneath_with_struct_file(u64 r1_option,
+		u64 r2_map, u64 r3_map_op, u64 r4_file, u64 r5)
+{
+	u8 option = (u8) r1_option;
+	struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
+	enum bpf_map_array_op map_op = r3_map_op;
+	struct file *file = (struct file *) (unsigned long) r4_file;
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	struct path *p1, *p2;
+	struct map_landlock_handle *handle;
+	int i;
+
+	/* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS is an arraymap */
+	if (unlikely(!map)) {
+		WARN_ON(1);
+		return -EFAULT;
+	}
+	/* @file can be null for anonymous mmap */
+	if (unlikely(!file))
+		return -ENOENT;
+	if (unlikely((option | _LANDLOCK_FLAG_OPT_MASK) != _LANDLOCK_FLAG_OPT_MASK))
+		return -EINVAL;
+
+	/* for now, only handle OP_OR */
+	switch (map_op) {
+	case BPF_MAP_ARRAY_OP_OR:
+		break;
+	case BPF_MAP_ARRAY_OP_UNSPEC:
+	case BPF_MAP_ARRAY_OP_AND:
+	case BPF_MAP_ARRAY_OP_XOR:
+	default:
+		return -EINVAL;
+	}
+	/* p1 and p2 will be set correctly in the loop */
+	p1 = &file->f_path;
+	p2 = p1;
+
+	synchronize_rcu();
+
+	for (i = 0; i < array->n_entries; i++) {
+		handle = (struct map_landlock_handle *)
+				(array->value + array->elem_size * i);
+
+		/* protected by the proto types, should not happen */
+		if (unlikely(handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD)) {
+			WARN_ON(1);
+			return -EINVAL;
+		}
+
+		if (option & LANDLOCK_FLAG_OPT_REVERSE)
+			p2 = &handle->path;
+		else
+			p1 = &handle->path;
+
+		if (path_is_under(p2, p1))
+			return 0;
+	}
+	return 1;
+}
+
+const struct bpf_func_proto bpf_landlock_cmp_fs_beneath_with_struct_file_proto = {
+	.func		= bpf_landlock_cmp_fs_beneath_with_struct_file,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_STRUCT_FILE,
+};
diff --git a/security/landlock/checker_fs.h b/security/landlock/checker_fs.h
new file mode 100644
index 000000000000..a62f84e39efd
--- /dev/null
+++ b/security/landlock/checker_fs.h
@@ -0,0 +1,20 @@
+/*
+ * Landlock LSM - File System Checkers
+ *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _SECURITY_LANDLOCK_CHECKER_FS_H
+#define _SECURITY_LANDLOCK_CHECKER_FS_H
+
+#include <linux/fs.h>
+#include <linux/seccomp.h>
+
+extern const struct bpf_func_proto bpf_landlock_cmp_fs_prop_with_struct_file_proto;
+extern const struct bpf_func_proto bpf_landlock_cmp_fs_beneath_with_struct_file_proto;
+
+#endif /* _SECURITY_LANDLOCK_CHECKER_FS_H */
diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c
index c032183e5d95..952b7bc66328 100644
--- a/security/landlock/lsm.c
+++ b/security/landlock/lsm.c
@@ -17,6 +17,8 @@
 #include <linux/lsm_hooks.h>
 #include <linux/types.h> /* uintptr_t */
 
+#include "checker_fs.h"
+
 #define LANDLOCK_MAP0(m, ...)
 #define LANDLOCK_MAP1(m, d, t, a) m(d, t, a)
 #define LANDLOCK_MAP2(m, d, t, a, ...) m(d, t, a), LANDLOCK_MAP1(m, __VA_ARGS__)
@@ -70,6 +72,10 @@ static const struct bpf_func_proto *bpf_landlock_func_proto(
 		enum bpf_func_id func_id, union bpf_prog_subtype *prog_subtype)
 {
 	switch (func_id) {
+	case BPF_FUNC_landlock_cmp_fs_prop_with_struct_file:
+		return &bpf_landlock_cmp_fs_prop_with_struct_file_proto;
+	case BPF_FUNC_landlock_cmp_fs_beneath_with_struct_file:
+		return &bpf_landlock_cmp_fs_beneath_with_struct_file_proto;
 	default:
 		return NULL;
 	}
-- 
2.9.3


* [RFC v3 08/22] seccomp: Fix documentation for struct seccomp_filter
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (6 preceding siblings ...)
  2016-09-14  7:24 ` [RFC v3 07/22] landlock: Handle file comparisons Mickaël Salaün
@ 2016-09-14  7:24 ` Mickaël Salaün
  2016-09-14  7:24 ` [RFC v3 09/22] seccomp: Move struct seccomp_filter in seccomp.h Mickaël Salaün
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
---
 kernel/seccomp.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 0db7c8a2afe2..dccfc05cb3ec 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -41,8 +41,7 @@
  *         outside of a lifetime-guarded section.  In general, this
  *         is only needed for handling filters shared across tasks.
  * @prev: points to a previously installed, or inherited, filter
- * @len: the number of instructions in the program
- * @insnsi: the BPF program instructions to evaluate
+ * @prog: the BPF program to evaluate
  *
  * seccomp_filter objects are organized in a tree linked via the @prev
  * pointer.  For any task, it appears to be a singly-linked list starting
-- 
2.9.3


* [RFC v3 09/22] seccomp: Move struct seccomp_filter in seccomp.h
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (7 preceding siblings ...)
  2016-09-14  7:24 ` [RFC v3 08/22] seccomp: Fix documentation for struct seccomp_filter Mickaël Salaün
@ 2016-09-14  7:24 ` Mickaël Salaün
  2016-09-14  7:24 ` [RFC v3 10/22] seccomp: Split put_seccomp_filter() with put_seccomp() Mickaël Salaün
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

Make struct seccomp_filter public in preparation for the upcoming use
of the new thread_prev field added for the Landlock LSM.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
---
 include/linux/seccomp.h | 27 ++++++++++++++++++++++++++-
 kernel/seccomp.c        | 26 --------------------------
 2 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index ecc296c137cd..a0459a7315ce 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -10,7 +10,32 @@
 #include <linux/thread_info.h>
 #include <asm/seccomp.h>
 
-struct seccomp_filter;
+/**
+ * struct seccomp_filter - container for seccomp BPF programs
+ *
+ * @usage: reference count to manage the object lifetime.
+ *         get/put helpers should be used when accessing an instance
+ *         outside of a lifetime-guarded section.  In general, this
+ *         is only needed for handling filters shared across tasks.
+ * @prev: points to a previously installed, or inherited, filter
+ * @prog: the BPF program to evaluate
+ *
+ * seccomp_filter objects are organized in a tree linked via the @prev
+ * pointer.  For any task, it appears to be a singly-linked list starting
+ * with current->seccomp.filter, the most recently attached or inherited filter.
+ * However, multiple filters may share a @prev node, by way of fork(), which
+ * results in a unidirectional tree existing in memory.  This is similar to
+ * how namespaces work.
+ *
+ * seccomp_filter objects should never be modified after being attached
+ * to a task_struct (other than @usage).
+ */
+struct seccomp_filter {
+	atomic_t usage;
+	struct seccomp_filter *prev;
+	struct bpf_prog *prog;
+};
+
 /**
  * struct seccomp - the state of a seccomp'ed process
  *
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index dccfc05cb3ec..1867bbfa7c6c 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -33,32 +33,6 @@
 #include <linux/tracehook.h>
 #include <linux/uaccess.h>
 
-/**
- * struct seccomp_filter - container for seccomp BPF programs
- *
- * @usage: reference count to manage the object lifetime.
- *         get/put helpers should be used when accessing an instance
- *         outside of a lifetime-guarded section.  In general, this
- *         is only needed for handling filters shared across tasks.
- * @prev: points to a previously installed, or inherited, filter
- * @prog: the BPF program to evaluate
- *
- * seccomp_filter objects are organized in a tree linked via the @prev
- * pointer.  For any task, it appears to be a singly-linked list starting
- * with current->seccomp.filter, the most recently attached or inherited filter.
- * However, multiple filters may share a @prev node, by way of fork(), which
- * results in a unidirectional tree existing in memory.  This is similar to
- * how namespaces work.
- *
- * seccomp_filter objects should never be modified after being attached
- * to a task_struct (other than @usage).
- */
-struct seccomp_filter {
-	atomic_t usage;
-	struct seccomp_filter *prev;
-	struct bpf_prog *prog;
-};
-
 /* Limit any path through the tree to 256KB worth of instructions. */
 #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
 
-- 
2.9.3


* [RFC v3 10/22] seccomp: Split put_seccomp_filter() with put_seccomp()
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (8 preceding siblings ...)
  2016-09-14  7:24 ` [RFC v3 09/22] seccomp: Move struct seccomp_filter in seccomp.h Mickaël Salaün
@ 2016-09-14  7:24 ` Mickaël Salaün
  2016-09-14  7:24 ` [RFC v3 11/22] seccomp,landlock: Handle Landlock hooks per process hierarchy Mickaël Salaün
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

The semantics are unchanged. This will be useful for the Landlock
integration with seccomp (next commit).
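
The shape of the split can be sketched in userspace C. This is a model
under stated assumptions, not the kernel code: struct model_filter,
struct model_task, filter_new() and freed_count are hypothetical, and a
plain int replaces the atomic reference count. It shows a filter-level
put that walks the @prev chain and frees single-reference ancestors
iteratively, plus a thin task-level wrapper in the role this patch
gives to put_seccomp().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct seccomp_filter. */
struct model_filter {
	int usage;			/* plain int instead of atomic_t */
	struct model_filter *prev;	/* previously installed filter */
};

/* Hypothetical stand-in for the seccomp part of a task. */
struct model_task {
	struct model_filter *filter;
};

/* Number of filters freed so far, kept only to observe the model. */
int freed_count;

/* Allocate a filter with one reference; it takes over the caller's
 * reference on @prev. */
struct model_filter *filter_new(struct model_filter *prev)
{
	struct model_filter *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	f->usage = 1;
	f->prev = prev;
	return f;
}

/* Filter-level helper: drop one reference and free every ancestor
 * whose count reaches zero, without recursion. */
void put_filter(struct model_filter *filter)
{
	while (filter && --filter->usage == 0) {
		struct model_filter *freeme = filter;

		filter = filter->prev;
		free(freeme);
		freed_count++;
	}
}

/* Task-level wrapper: release whatever the task currently points to. */
void put_task(struct model_task *tsk)
{
	put_filter(tsk->filter);
}
```

Taking a filter instead of a task lets callers release a filter that is
not (or no longer) reachable through tsk->seccomp.filter, while the
task-level wrapper keeps the free_task() call site unchanged in
behaviour.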

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
---
 include/linux/seccomp.h |  5 +++--
 kernel/fork.c           |  2 +-
 kernel/seccomp.c        | 18 +++++++++++++-----
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index a0459a7315ce..ffdab7cdd162 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -102,13 +102,14 @@ static inline int seccomp_mode(struct seccomp *s)
 #endif /* CONFIG_SECCOMP */
 
 #ifdef CONFIG_SECCOMP_FILTER
-extern void put_seccomp_filter(struct task_struct *tsk);
+extern void put_seccomp(struct task_struct *tsk);
 extern void get_seccomp_filter(struct task_struct *tsk);
 #else  /* CONFIG_SECCOMP_FILTER */
-static inline void put_seccomp_filter(struct task_struct *tsk)
+static inline void put_seccomp(struct task_struct *tsk)
 {
 	return;
 }
+
 static inline void get_seccomp_filter(struct task_struct *tsk)
 {
 	return;
diff --git a/kernel/fork.c b/kernel/fork.c
index 3584f521e3a6..99df46f157cf 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -276,7 +276,7 @@ void free_task(struct task_struct *tsk)
 	free_thread_stack(tsk);
 	rt_mutex_debug_task_free(tsk);
 	ftrace_graph_exit_task(tsk);
-	put_seccomp_filter(tsk);
+	put_seccomp(tsk);
 	arch_release_task_struct(tsk);
 	free_task_struct(tsk);
 }
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 1867bbfa7c6c..92b15083b1b2 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -36,6 +36,8 @@
 /* Limit any path through the tree to 256KB worth of instructions. */
 #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
 
+static void put_seccomp_filter(struct seccomp_filter *filter);
+
 /*
  * Endianness is explicitly ignored and left for BPF program authors to manage
  * as per the specific architecture.
@@ -286,7 +288,7 @@ static inline void seccomp_sync_threads(void)
 		 * current's path will hold a reference.  (This also
 		 * allows a put before the assignment.)
 		 */
-		put_seccomp_filter(thread);
+		put_seccomp_filter(thread->seccomp.filter);
 		smp_store_release(&thread->seccomp.filter,
 				  caller->seccomp.filter);
 
@@ -448,10 +450,11 @@ static inline void seccomp_filter_free(struct seccomp_filter *filter)
 	}
 }
 
-/* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
-void put_seccomp_filter(struct task_struct *tsk)
+/* put_seccomp_filter - decrements the ref count of a filter */
+static void put_seccomp_filter(struct seccomp_filter *filter)
 {
-	struct seccomp_filter *orig = tsk->seccomp.filter;
+	struct seccomp_filter *orig = filter;
+
 	/* Clean up single-reference branches iteratively. */
 	while (orig && atomic_dec_and_test(&orig->usage)) {
 		struct seccomp_filter *freeme = orig;
@@ -460,6 +463,11 @@ void put_seccomp_filter(struct task_struct *tsk)
 	}
 }
 
+void put_seccomp(struct task_struct *tsk)
+{
+	put_seccomp_filter(tsk->seccomp.filter);
+}
+
 /**
  * seccomp_send_sigsys - signals the task to allow in-process syscall emulation
  * @syscall: syscall number to send to userland
@@ -871,7 +879,7 @@ long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
 	if (copy_to_user(data, fprog->filter, bpf_classic_proglen(fprog)))
 		ret = -EFAULT;
 
-	put_seccomp_filter(task);
+	put_seccomp_filter(task->seccomp.filter);
 	return ret;
 
 out:
-- 
2.9.3

* [RFC v3 11/22] seccomp,landlock: Handle Landlock hooks per process hierarchy
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (9 preceding siblings ...)
  2016-09-14  7:24 ` [RFC v3 10/22] seccomp: Split put_seccomp_filter() with put_seccomp() Mickaël Salaün
@ 2016-09-14  7:24 ` Mickaël Salaün
  2016-09-14 18:43   ` Andy Lutomirski
  2016-09-14  7:24 ` [RFC v3 12/22] bpf: Cosmetic change for bpf_prog_attach() Mickaël Salaün
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups, Andrew Morton

A Landlock program is triggered according to its subtype/origin
bitfield. The LANDLOCK_FLAG_ORIGIN_SECCOMP value triggers the Landlock
program when a seccomp filter returns RET_LANDLOCK. Moreover, the
filter can return a 16-bit cookie, which the Landlock programs can read
in their context.

Only seccomp filters loaded from the same thread, and before a given
Landlock program, can trigger that program through
LANDLOCK_FLAG_ORIGIN_SECCOMP. Multiple Landlock programs can be
triggered by one or more seccomp filters. This way, each RET_LANDLOCK
(with its specific cookie) triggers all the allowed Landlock programs
once.
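
A minimal userspace sketch of how such a return value splits into an
action and a cookie. It assumes the SECCOMP_RET_ACTION and
SECCOMP_RET_DATA masks from the existing uapi header and the
SECCOMP_RET_LANDLOCK value proposed by this patch. The helper names are
hypothetical and are not part of the series:

```c
#include <assert.h>
#include <stdint.h>

/* Masks and actions as defined in include/uapi/linux/seccomp.h. */
#define SECCOMP_RET_ACTION	0x7fff0000U
#define SECCOMP_RET_DATA	0x0000ffffU
#define SECCOMP_RET_ERRNO	0x00050000U
#define SECCOMP_RET_ALLOW	0x7fff0000U
/* Value proposed by this patch; not defined by mainline headers. */
#define SECCOMP_RET_LANDLOCK	0x00070000U

/* Hypothetical helper: build a RET_LANDLOCK value carrying a cookie. */
static inline uint32_t landlock_ret(uint16_t cookie)
{
	return SECCOMP_RET_LANDLOCK | cookie;
}

/* Hypothetical helper: extract the action part of a filter result. */
static inline uint32_t ret_action(uint32_t ret)
{
	return ret & SECCOMP_RET_ACTION;
}

/* Hypothetical helper: extract the 16-bit data (cookie) part. */
static inline uint16_t ret_cookie(uint32_t ret)
{
	return (uint16_t)(ret & SECCOMP_RET_DATA);
}

/*
 * Same comparison as in seccomp_run_filters(): the result with the
 * lowest action value takes precedence.
 */
static inline uint32_t strictest(uint32_t ret, uint32_t cur_ret)
{
	if (ret_action(cur_ret) < ret_action(ret))
		return cur_ret;
	return ret;
}

/* Small usage example: the cookie survives the round trip. */
static inline void check_cookie_roundtrip(void)
{
	uint32_t ret = landlock_ret(0x1234);

	assert(ret_action(ret) == SECCOMP_RET_LANDLOCK);
	assert(ret_cookie(ret) == 0x1234);
}
```

With these values, RET_LANDLOCK takes precedence over RET_ALLOW but not
over RET_ERRNO, because a lower action value wins the comparison.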

Changes since v2:
* Landlock programs can now run without a seccomp filter, for any
  syscall (from the process) or interruption
* move Landlock related functions and structs into security/landlock/*
  (to manage cgroups as well)
* fix seccomp filter handling: run Landlock programs for each of their
  legitimate seccomp filters
* properly clean up all seccomp results
* cosmetic changes to ease the understanding
* fix some ifdef

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/landlock.h     |  77 ++++++++++++++
 include/linux/seccomp.h      |  26 +++++
 include/uapi/linux/seccomp.h |   2 +
 kernel/fork.c                |  23 +++-
 kernel/seccomp.c             |  68 +++++++++++-
 security/landlock/Makefile   |   2 +-
 security/landlock/common.h   |  27 +++++
 security/landlock/lsm.c      |  96 ++++++++++++++++-
 security/landlock/manager.c  | 242 +++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 552 insertions(+), 11 deletions(-)
 create mode 100644 include/linux/landlock.h
 create mode 100644 security/landlock/common.h
 create mode 100644 security/landlock/manager.c

diff --git a/include/linux/landlock.h b/include/linux/landlock.h
new file mode 100644
index 000000000000..932ae57fa70e
--- /dev/null
+++ b/include/linux/landlock.h
@@ -0,0 +1,77 @@
+/*
+ * Landlock LSM - Public headers
+ *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _LINUX_LANDLOCK_H
+#define _LINUX_LANDLOCK_H
+#ifdef CONFIG_SECURITY_LANDLOCK
+
+#include <linux/bpf.h>	/* _LANDLOCK_HOOK_LAST */
+#include <linux/types.h> /* atomic_t */
+
+#ifdef CONFIG_SECCOMP_FILTER
+#include <linux/seccomp.h> /* struct seccomp_filter */
+#endif /* CONFIG_SECCOMP_FILTER */
+
+
+#ifdef CONFIG_SECCOMP_FILTER
+struct landlock_seccomp_ret {
+	struct landlock_seccomp_ret *prev;
+	struct seccomp_filter *filter;
+	u16 cookie;
+	bool triggered;
+};
+#endif /* CONFIG_SECCOMP_FILTER */
+
+struct landlock_rule {
+	atomic_t usage;
+	struct landlock_rule *prev;
+	struct bpf_prog *prog;
+#ifdef CONFIG_SECCOMP_FILTER
+	/*
+	 * List of filters (through filter->thread_prev) allowed to trigger
+	 * this Landlock program.
+	 */
+	struct seccomp_filter *thread_filter;
+#endif /* CONFIG_SECCOMP_FILTER */
+};
+
+/**
+ * struct landlock_hooks - Landlock hook programs enforced on a thread
+ *
+ * This keeps the performance impact low when forking a process. Instead of
+ * copying the full array and incrementing the usage field of each entry,
+ * only create a pointer to struct landlock_hooks and increment its usage
+ * field.
+ *
+ * A new struct landlock_hooks must be created with a call to
+ * new_landlock_hooks().
+ *
+ * @usage: reference count to manage the object lifetime. When a thread needs
+ *         to add Landlock programs and @usage is greater than 1, the thread
+ *         must duplicate struct landlock_hooks so that it does not change
+ *         its children's rules as well.
+ */
+struct landlock_hooks {
+	atomic_t usage;
+	struct landlock_rule *rules[_LANDLOCK_HOOK_LAST];
+};
+
+
+struct landlock_hooks *new_landlock_hooks(void);
+void put_landlock_hooks(struct landlock_hooks *hooks);
+
+#ifdef CONFIG_SECCOMP_FILTER
+void put_landlock_ret(struct landlock_seccomp_ret *landlock_ret);
+int landlock_seccomp_set_hook(unsigned int flags,
+		const char __user *user_bpf_fd);
+#endif /* CONFIG_SECCOMP_FILTER */
+
+#endif /* CONFIG_SECURITY_LANDLOCK */
+#endif /* _LINUX_LANDLOCK_H */
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index ffdab7cdd162..3cb90bf43a24 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -10,6 +10,10 @@
 #include <linux/thread_info.h>
 #include <asm/seccomp.h>
 
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_SECURITY_LANDLOCK)
+#include <linux/landlock.h>
+#endif /* CONFIG_SECCOMP_FILTER && CONFIG_SECURITY_LANDLOCK */
+
 /**
  * struct seccomp_filter - container for seccomp BPF programs
  *
@@ -19,6 +23,7 @@
  *         is only needed for handling filters shared across tasks.
  * @prev: points to a previously installed, or inherited, filter
  * @prog: the BPF program to evaluate
+ * @thread_prev: points to filters installed by the same thread
  *
  * seccomp_filter objects are organized in a tree linked via the @prev
  * pointer.  For any task, it appears to be a singly-linked list starting
@@ -34,6 +39,9 @@ struct seccomp_filter {
 	atomic_t usage;
 	struct seccomp_filter *prev;
 	struct bpf_prog *prog;
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_SECURITY_LANDLOCK)
+	struct seccomp_filter *thread_prev;
+#endif /* CONFIG_SECCOMP_FILTER && CONFIG_SECURITY_LANDLOCK */
 };
 
 /**
@@ -43,6 +51,11 @@ struct seccomp_filter {
  *         system calls available to a process.
  * @filter: must always point to a valid seccomp-filter or NULL as it is
  *          accessed without locking during system call entry.
+ * @thread_filter: list of filters allowed to trigger an associated Landlock
+ *                 hook via a RET_LANDLOCK; must walk through thread_prev.
+ * @landlock_ret: one unique private list per thread storing the RET_LANDLOCK
+ *                values of all filters.
+ * @landlock_hooks: contains an array of Landlock programs.
  *
  *          @filter must only be accessed from the context of current as there
  *          is no read locking.
@@ -50,6 +63,12 @@ struct seccomp_filter {
 struct seccomp {
 	int mode;
 	struct seccomp_filter *filter;
+
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_SECURITY_LANDLOCK)
+	struct seccomp_filter *thread_filter;
+	struct landlock_seccomp_ret *landlock_ret;
+	struct landlock_hooks *landlock_hooks;
+#endif /* CONFIG_SECCOMP_FILTER && CONFIG_SECURITY_LANDLOCK */
 };
 
 #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
@@ -103,13 +122,20 @@ static inline int seccomp_mode(struct seccomp *s)
 
 #ifdef CONFIG_SECCOMP_FILTER
 extern void put_seccomp(struct task_struct *tsk);
+extern void put_seccomp_filter(struct seccomp_filter *filter);
 extern void get_seccomp_filter(struct task_struct *tsk);
+
 #else  /* CONFIG_SECCOMP_FILTER */
 static inline void put_seccomp(struct task_struct *tsk)
 {
 	return;
 }
 
+static inline void put_seccomp_filter(struct seccomp_filter *filter)
+{
+	return;
+}
+
 static inline void get_seccomp_filter(struct task_struct *tsk)
 {
 	return;
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 0f238a43ff1e..a1273ceb5b3d 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -13,6 +13,7 @@
 /* Valid operations for seccomp syscall. */
 #define SECCOMP_SET_MODE_STRICT	0
 #define SECCOMP_SET_MODE_FILTER	1
+#define SECCOMP_SET_LANDLOCK_HOOK	2
 
 /* Valid flags for SECCOMP_SET_MODE_FILTER */
 #define SECCOMP_FILTER_FLAG_TSYNC	1
@@ -28,6 +29,7 @@
 #define SECCOMP_RET_KILL	0x00000000U /* kill the task immediately */
 #define SECCOMP_RET_TRAP	0x00030000U /* disallow and force a SIGSYS */
 #define SECCOMP_RET_ERRNO	0x00050000U /* returns an errno */
+#define SECCOMP_RET_LANDLOCK	0x00070000U /* trigger Landlock LSM */
 #define SECCOMP_RET_TRACE	0x7ff00000U /* pass to a tracer or disallow */
 #define SECCOMP_RET_ALLOW	0x7fff0000U /* allow */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 99df46f157cf..3dba89fa2cea 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -429,7 +429,12 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	 * the usage counts on the error path calling free_task.
 	 */
 	tsk->seccomp.filter = NULL;
-#endif
+#ifdef CONFIG_SECURITY_LANDLOCK
+	tsk->seccomp.thread_filter = NULL;
+	tsk->seccomp.landlock_ret = NULL;
+	tsk->seccomp.landlock_hooks = NULL;
+#endif /* CONFIG_SECURITY_LANDLOCK */
+#endif /* CONFIG_SECCOMP */
 
 	setup_thread_stack(tsk, orig);
 	clear_user_return_notifier(tsk);
@@ -1284,7 +1289,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 	return 0;
 }
 
-static void copy_seccomp(struct task_struct *p)
+static int copy_seccomp(struct task_struct *p)
 {
 #ifdef CONFIG_SECCOMP
 	/*
@@ -1297,7 +1302,14 @@ static void copy_seccomp(struct task_struct *p)
 
 	/* Ref-count the new filter user, and assign it. */
 	get_seccomp_filter(current);
-	p->seccomp = current->seccomp;
+	p->seccomp.mode = current->seccomp.mode;
+	p->seccomp.filter = current->seccomp.filter;
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_SECURITY_LANDLOCK)
+	/* No copy for thread_filter nor landlock_ret. */
+	p->seccomp.landlock_hooks = current->seccomp.landlock_hooks;
+	if (p->seccomp.landlock_hooks)
+		atomic_inc(&p->seccomp.landlock_hooks->usage);
+#endif /* CONFIG_SECCOMP_FILTER && CONFIG_SECURITY_LANDLOCK */
 
 	/*
 	 * Explicitly enable no_new_privs here in case it got set
@@ -1315,6 +1327,7 @@ static void copy_seccomp(struct task_struct *p)
 	if (p->seccomp.mode != SECCOMP_MODE_DISABLED)
 		set_tsk_thread_flag(p, TIF_SECCOMP);
 #endif
+	return 0;
 }
 
 SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr)
@@ -1674,7 +1687,9 @@ static __latent_entropy struct task_struct *copy_process(
 	 * Copy seccomp details explicitly here, in case they were changed
 	 * before holding sighand lock.
 	 */
-	copy_seccomp(p);
+	retval = copy_seccomp(p);
+	if (retval)
+		goto bad_fork_cancel_cgroup;
 
 	/*
 	 * Process group and session signals need to be delivered to just the
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 92b15083b1b2..13729b8b9f21 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -6,6 +6,8 @@
  * Copyright (C) 2012 Google, Inc.
  * Will Drewry <wad@chromium.org>
  *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
  * This defines a simple but solid secure-computing facility.
  *
  * Mode 1 uses a fixed list of allowed system calls.
@@ -32,12 +34,11 @@
 #include <linux/security.h>
 #include <linux/tracehook.h>
 #include <linux/uaccess.h>
+#include <linux/landlock.h>
 
 /* Limit any path through the tree to 256KB worth of instructions. */
 #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
 
-static void put_seccomp_filter(struct seccomp_filter *filter);
-
 /*
  * Endianness is explicitly ignored and left for BPF program authors to manage
  * as per the specific architecture.
@@ -152,6 +153,10 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd)
 {
 	struct seccomp_data sd_local;
 	u32 ret = SECCOMP_RET_ALLOW;
+#ifdef CONFIG_SECURITY_LANDLOCK
+	struct landlock_seccomp_ret *landlock_ret, *init_landlock_ret =
+		current->seccomp.landlock_ret;
+#endif /* CONFIG_SECURITY_LANDLOCK */
 	/* Make sure cross-thread synced filter points somewhere sane. */
 	struct seccomp_filter *f =
 			lockless_dereference(current->seccomp.filter);
@@ -171,8 +176,46 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd)
 	 */
 	for (; f; f = f->prev) {
 		u32 cur_ret = BPF_PROG_RUN(f->prog, (void *)sd);
+		u32 action = cur_ret & SECCOMP_RET_ACTION;
+#ifdef CONFIG_SECURITY_LANDLOCK
+		u32 data = cur_ret & SECCOMP_RET_DATA;
+
+		if (action == SECCOMP_RET_LANDLOCK &&
+				current->seccomp.landlock_hooks) {
+			bool found_ret = false;
+
+			/*
+			 * Keep track of filters from the current task that
+			 * trigger a RET_LANDLOCK.
+			 */
+			for (landlock_ret = init_landlock_ret;
+					landlock_ret;
+					landlock_ret = landlock_ret->prev) {
+				if (landlock_ret->filter == f) {
+					landlock_ret->triggered = true;
+					landlock_ret->cookie = data;
+					found_ret = true;
+					break;
+				}
+			}
+			if (!found_ret) {
+				/*
+				 * Lazy allocation of landlock_ret; it is freed
+				 * when the thread exits.
+				 */
+				landlock_ret = kzalloc(sizeof(*landlock_ret),
+						GFP_KERNEL);
+				if (!landlock_ret)
+					return SECCOMP_RET_KILL;
+				atomic_inc(&f->usage);
+				landlock_ret->filter = f;
+				landlock_ret->prev = current->seccomp.landlock_ret;
+				current->seccomp.landlock_ret = landlock_ret;
+			}
+		}
+#endif /* CONFIG_SECURITY_LANDLOCK */
 
-		if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
+		if (action < (ret & SECCOMP_RET_ACTION))
 			ret = cur_ret;
 	}
 	return ret;
@@ -424,6 +467,11 @@ static long seccomp_attach_filter(unsigned int flags,
 	 */
 	filter->prev = current->seccomp.filter;
 	current->seccomp.filter = filter;
+#ifdef CONFIG_SECURITY_LANDLOCK
+	/* Chain the filters from the same thread, if any. */
+	filter->thread_prev = current->seccomp.thread_filter;
+	current->seccomp.thread_filter = filter;
+#endif /* CONFIG_SECURITY_LANDLOCK */
 
 	/* Now that the new filter is in place, synchronize to all threads. */
 	if (flags & SECCOMP_FILTER_FLAG_TSYNC)
@@ -451,14 +499,16 @@ static inline void seccomp_filter_free(struct seccomp_filter *filter)
 }
 
 /* put_seccomp_filter - decrements the ref count of a filter */
-static void put_seccomp_filter(struct seccomp_filter *filter)
+void put_seccomp_filter(struct seccomp_filter *filter)
 {
 	struct seccomp_filter *orig = filter;
 
 	/* Clean up single-reference branches iteratively. */
 	while (orig && atomic_dec_and_test(&orig->usage)) {
 		struct seccomp_filter *freeme = orig;
+
 		orig = orig->prev;
+		/* must not put orig->thread_prev */
 		seccomp_filter_free(freeme);
 	}
 }
@@ -466,6 +516,10 @@ static void put_seccomp_filter(struct seccomp_filter *filter)
 void put_seccomp(struct task_struct *tsk)
 {
 	put_seccomp_filter(tsk->seccomp.filter);
+#ifdef CONFIG_SECURITY_LANDLOCK
+	put_landlock_hooks(tsk->seccomp.landlock_hooks);
+	put_landlock_ret(tsk->seccomp.landlock_ret);
+#endif /* CONFIG_SECURITY_LANDLOCK */
 }
 
 /**
@@ -612,6 +666,8 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 
 		return 0;
 
+	case SECCOMP_RET_LANDLOCK:
+		/* fall through */
 	case SECCOMP_RET_ALLOW:
 		return 0;
 
@@ -770,6 +826,10 @@ static long do_seccomp(unsigned int op, unsigned int flags,
 		return seccomp_set_mode_strict();
 	case SECCOMP_SET_MODE_FILTER:
 		return seccomp_set_mode_filter(flags, uargs);
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_SECURITY_LANDLOCK)
+	case SECCOMP_SET_LANDLOCK_HOOK:
+		return landlock_seccomp_set_hook(flags, uargs);
+#endif /* CONFIG_SECCOMP_FILTER && CONFIG_SECURITY_LANDLOCK */
 	default:
 		return -EINVAL;
 	}
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index 27f359a8cfaa..432c83e7c6b9 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
-landlock-y := lsm.o checker_fs.o
+landlock-y := lsm.o manager.o checker_fs.o
diff --git a/security/landlock/common.h b/security/landlock/common.h
new file mode 100644
index 000000000000..4e686b40c87f
--- /dev/null
+++ b/security/landlock/common.h
@@ -0,0 +1,27 @@
+/*
+ * Landlock LSM - private headers
+ *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _SECURITY_LANDLOCK_COMMON_H
+#define _SECURITY_LANDLOCK_COMMON_H
+
+#include <linux/bpf.h> /* enum landlock_hook_id */
+
+/**
+ * get_index - get an index for the rules of struct landlock_hooks
+ *
+ * @hook_id: a Landlock hook ID
+ */
+static inline int get_index(enum landlock_hook_id hook_id)
+{
+	/* hook ID > 0 for loaded programs */
+	return hook_id - 1;
+}
+
+#endif /* _SECURITY_LANDLOCK_COMMON_H */
diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c
index 952b7bc66328..b6e0bace683d 100644
--- a/security/landlock/lsm.c
+++ b/security/landlock/lsm.c
@@ -14,10 +14,13 @@
 #include <linux/err.h> /* MAX_ERRNO */
 #include <linux/filter.h> /* struct bpf_prog, BPF_PROG_RUN() */
 #include <linux/kernel.h> /* FIELD_SIZEOF() */
+#include <linux/landlock.h>
 #include <linux/lsm_hooks.h>
+#include <linux/seccomp.h> /* struct seccomp_* */
 #include <linux/types.h> /* uintptr_t */
 
 #include "checker_fs.h"
+#include "common.h"
 
 #define LANDLOCK_MAP0(m, ...)
 #define LANDLOCK_MAP1(m, d, t, a) m(d, t, a)
@@ -62,10 +65,99 @@
 
 #define LANDLOCK_HOOK_INIT(NAME) LSM_HOOK_INIT(NAME, landlock_hook_##NAME)
 
+/**
+ * landlock_run_prog_for_syscall - run Landlock program for a syscall
+ *
+ * @hook_idx: hook index in the rules array
+ * @ctx: non-NULL eBPF context; the "origin" field will be updated
+ * @hooks: Landlock hooks pointer
+ */
+static u32 landlock_run_prog_for_syscall(u32 hook_idx,
+		struct landlock_data *ctx, struct landlock_hooks *hooks)
+{
+	struct landlock_rule *rule;
+	u32 cur_ret = 0, ret = 0;
+
+	if (!hooks)
+		return 0;
+
+	for (rule = hooks->rules[hook_idx]; rule && !ret; rule = rule->prev) {
+		if (!(rule->prog->subtype.landlock_hook.origin & ctx->origin))
+			continue;
+		cur_ret = BPF_PROG_RUN(rule->prog, (void *)ctx);
+		if (cur_ret > MAX_ERRNO)
+			ret = MAX_ERRNO;
+		else
+			ret = cur_ret;
+	}
+	return ret;
+}
 
 static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
 {
-	return 0;
+	u32 cur_ret = 0, ret = 0;
+#ifdef CONFIG_SECCOMP_FILTER
+	struct landlock_seccomp_ret *lr;
+#endif /* CONFIG_SECCOMP_FILTER */
+	struct landlock_rule *rule;
+	u32 hook_idx = get_index(hook_id);
+
+	struct landlock_data ctx = {
+		.hook = hook_id,
+		.cookie = 0,
+		.args[0] = args[0],
+		.args[1] = args[1],
+		.args[2] = args[2],
+		.args[3] = args[3],
+		.args[4] = args[4],
+		.args[5] = args[5],
+	};
+
+	/* TODO: use lockless_dereference()? */
+
+#ifdef CONFIG_SECCOMP_FILTER
+	/* seccomp triggers and landlock_ret cleanup */
+	ctx.origin = LANDLOCK_FLAG_ORIGIN_SECCOMP;
+	for (lr = current->seccomp.landlock_ret; lr; lr = lr->prev) {
+		if (!lr->triggered)
+			continue;
+		lr->triggered = false;
+		/* clean up all seccomp results */
+		if (ret)
+			continue;
+		ctx.cookie = lr->cookie;
+		for (rule = current->seccomp.landlock_hooks->rules[hook_idx];
+				rule && !ret; rule = rule->prev) {
+			struct seccomp_filter *filter;
+
+			if (!(rule->prog->subtype.landlock_hook.origin &
+						ctx.origin))
+				continue;
+			for (filter = rule->thread_filter; filter;
+					filter = filter->thread_prev) {
+				if (filter != lr->filter)
+					continue;
+				cur_ret = BPF_PROG_RUN(rule->prog, (void *)&ctx);
+				if (cur_ret > MAX_ERRNO)
+					ret = MAX_ERRNO;
+				else
+					ret = cur_ret;
+				/* walk to the next program */
+				break;
+			}
+		}
+	}
+	if (ret)
+		return -ret;
+	ctx.cookie = 0;
+
+	/* syscall trigger */
+	ctx.origin = LANDLOCK_FLAG_ORIGIN_SYSCALL;
+	ret = landlock_run_prog_for_syscall(hook_idx, &ctx,
+			current->seccomp.landlock_hooks);
+#endif /* CONFIG_SECCOMP_FILTER */
+
+	return -ret;
 }
 
 static const struct bpf_func_proto *bpf_landlock_func_proto(
@@ -152,7 +244,7 @@ static struct security_hook_list landlock_hooks[] = {
 
 void __init landlock_add_hooks(void)
 {
-	pr_info("landlock: Becoming ready for sandboxing\n");
+	pr_info("landlock: Becoming ready to sandbox with seccomp\n");
 	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks));
 }
 
diff --git a/security/landlock/manager.c b/security/landlock/manager.c
new file mode 100644
index 000000000000..e9f3f1092023
--- /dev/null
+++ b/security/landlock/manager.c
@@ -0,0 +1,242 @@
+/*
+ * Landlock LSM - seccomp and cgroups managers
+ *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/atomic.h> /* atomic_*() */
+#include <asm/page.h> /* PAGE_SIZE */
+#include <asm/uaccess.h> /* copy_from_user() */
+#include <linux/bpf.h> /* bpf_prog_put() */
+#include <linux/filter.h> /* struct bpf_prog */
+#include <linux/kernel.h> /* round_up() */
+#include <linux/landlock.h>
+#include <linux/sched.h> /* current_cred(), task_no_new_privs() */
+#include <linux/security.h> /* security_capable_noaudit() */
+#include <linux/slab.h> /* kmalloc(), kzalloc(), kfree() */
+#include <linux/types.h> /* atomic_t */
+
+#ifdef CONFIG_SECCOMP_FILTER
+#include <linux/seccomp.h> /* struct seccomp_filter */
+#endif /* CONFIG_SECCOMP_FILTER */
+
+#include "common.h"
+
+static void put_landlock_rule(struct landlock_rule *rule)
+{
+	struct landlock_rule *orig = rule;
+
+	/* Clean up single-reference branches iteratively. */
+	while (orig && atomic_dec_and_test(&orig->usage)) {
+		struct landlock_rule *freeme = orig;
+
+#ifdef CONFIG_SECCOMP_FILTER
+		put_seccomp_filter(orig->thread_filter);
+#endif /* CONFIG_SECCOMP_FILTER */
+		bpf_prog_put(orig->prog);
+		orig = orig->prev;
+		kfree(freeme);
+	}
+}
+
+void put_landlock_hooks(struct landlock_hooks *hooks)
+{
+	if (!hooks)
+		return;
+
+	if (atomic_dec_and_test(&hooks->usage)) {
+		int i;
+
+		for (i = 0; i < ARRAY_SIZE(hooks->rules); i++)
+			put_landlock_rule(hooks->rules[i]);
+		kfree(hooks);
+	}
+}
+
+#ifdef CONFIG_SECCOMP_FILTER
+void put_landlock_ret(struct landlock_seccomp_ret *landlock_ret)
+{
+	struct landlock_seccomp_ret *orig = landlock_ret;
+
+	while (orig) {
+		struct landlock_seccomp_ret *freeme = orig;
+
+		put_seccomp_filter(orig->filter);
+		orig = orig->prev;
+		kfree(freeme);
+	}
+}
+#endif /* CONFIG_SECCOMP_FILTER */
+
+struct landlock_hooks *new_landlock_hooks(void)
+{
+	struct landlock_hooks *ret;
+
+	/* array filled with NULL values */
+	ret = kzalloc(sizeof(*ret), GFP_KERNEL);
+	if (!ret)
+		return ERR_PTR(-ENOMEM);
+	atomic_set(&ret->usage, 1);
+	return ret;
+}
+
+/* Limit Landlock hooks to 256KB. */
+#define LANDLOCK_HOOKS_MAX_PAGES (1 << 6)
+
+/**
+ * landlock_set_hook - attach a Landlock program to @current_hooks
+ *
+ * @current_hooks: landlock_hooks pointer, must be locked (if needed) to
+ *                 prevent a concurrent put/free. This pointer must not be
+ *                 freed after the call.
+ * @prog: non-NULL Landlock program to append to @current_hooks. @prog will be
+ *        owned by landlock_set_hook() and freed if an error happened.
+ * @thread_filter: pointer to the seccomp filter of the current thread, if any
+ *
+ * Return @current_hooks or a new pointer when OK. Return a pointer error
+ * otherwise.
+ */
+static struct landlock_hooks *landlock_set_hook(
+		struct landlock_hooks *current_hooks, struct bpf_prog *prog,
+		struct seccomp_filter *thread_filter)
+{
+	struct landlock_hooks *new_hooks = current_hooks;
+	unsigned long pages;
+	struct landlock_rule *rule;
+	u32 hook_idx;
+
+	if (prog->type != BPF_PROG_TYPE_LANDLOCK) {
+		new_hooks = ERR_PTR(-EINVAL);
+		goto put_prog;
+	}
+
+	/* validate allocated memory */
+	pages = prog->pages;
+	if (current_hooks) {
+		int i;
+		struct landlock_rule *walker;
+
+		for (i = 0; i < ARRAY_SIZE(current_hooks->rules); i++) {
+			for (walker = current_hooks->rules[i]; walker;
+					walker = walker->prev) {
+				/* TODO: add penalty for each prog? */
+				pages += walker->prog->pages;
+			}
+		}
+		/* count landlock_hooks if we will allocate it */
+		if (atomic_read(&current_hooks->usage) != 1)
+			pages += round_up(sizeof(*current_hooks), PAGE_SIZE) /
+				PAGE_SIZE;
+	}
+	if (pages > LANDLOCK_HOOKS_MAX_PAGES) {
+		new_hooks = ERR_PTR(-E2BIG);
+		goto put_prog;
+	}
+
+	rule = kmalloc(sizeof(*rule), GFP_KERNEL);
+	if (!rule) {
+		new_hooks = ERR_PTR(-ENOMEM);
+		goto put_prog;
+	}
+	rule->prev = NULL;
+	rule->prog = prog;
+	/* attach the filters from the same thread, if any */
+	rule->thread_filter = thread_filter;
+	if (rule->thread_filter)
+		atomic_inc(&rule->thread_filter->usage);
+	atomic_set(&rule->usage, 1);
+
+	if (!current_hooks) {
+		/* add a new landlock_hooks, if needed */
+		new_hooks = new_landlock_hooks();
+		if (IS_ERR(new_hooks))
+			goto put_rule;
+	} else if (atomic_read(&current_hooks->usage) > 1) {
+		int i;
+
+		/* copy landlock_hooks, if shared */
+		new_hooks = new_landlock_hooks();
+		if (IS_ERR(new_hooks))
+			goto put_rule;
+		for (i = 0; i < ARRAY_SIZE(new_hooks->rules); i++) {
+			new_hooks->rules[i] =
+				current_hooks->rules[i];
+			if (new_hooks->rules[i])
+				atomic_inc(&new_hooks->rules[i]->usage);
+		}
+		/*
+		 * @current_hooks will not be freed here because its usage
+		 * field is > 1. It is only protected from being freed by
+		 * another subject by the caller of landlock_set_hook(), which
+		 * should hold a lock if needed.
+		 */
+		put_landlock_hooks(current_hooks);
+	}
+
+	/* subtype.landlock_hook.id > 0 for loaded programs */
+	hook_idx = get_index(rule->prog->subtype.landlock_hook.id);
+	rule->prev = new_hooks->rules[hook_idx];
+	new_hooks->rules[hook_idx] = rule;
+	return new_hooks;
+
+put_prog:
+	bpf_prog_put(prog);
+	return new_hooks;
+
+put_rule:
+	put_landlock_rule(rule);
+	return new_hooks;
+}
+
+/**
+ * landlock_seccomp_set_hook - attach a Landlock program to the current process
+ *
+ * current->seccomp.landlock_hooks is lazily allocated. When a process forks,
+ * only a pointer is copied. When a new hook is added by a process, if there
+ * are other references to this process' landlock_hooks, then a new allocation
+ * is made to contain an array pointing to Landlock program lists. This design
+ * has a low performance impact and is memory efficient while keeping the
+ * property of append-only programs.
+ *
+ * @flags: not used for now, but could be used for TSYNC
+ * @user_bpf_fd: file descriptor pointing to a loaded/checked eBPF program
+ *               dedicated to Landlock
+ */
+#ifdef CONFIG_SECCOMP_FILTER
+int landlock_seccomp_set_hook(unsigned int flags, const char __user *user_bpf_fd)
+{
+	struct landlock_hooks *new_hooks;
+	struct bpf_prog *prog;
+	int bpf_fd;
+
+	if (!task_no_new_privs(current) &&
+	    security_capable_noaudit(current_cred(),
+				     current_user_ns(), CAP_SYS_ADMIN) != 0)
+		return -EPERM;
+	if (!user_bpf_fd)
+		return -EINVAL;
+	if (flags)
+		return -EINVAL;
+	if (copy_from_user(&bpf_fd, user_bpf_fd, sizeof(bpf_fd)))
+		return -EFAULT;
+	prog = bpf_prog_get(bpf_fd);
+	if (IS_ERR(prog))
+		return PTR_ERR(prog);
+
+	/*
+	 * We don't need to lock anything for the current process hierarchy,
+	 * everything is guarded by the atomic counters.
+	 */
+	new_hooks = landlock_set_hook(current->seccomp.landlock_hooks, prog,
+			current->seccomp.thread_filter);
+	/* @prog is managed/freed by landlock_set_hook() */
+	if (IS_ERR(new_hooks))
+		return PTR_ERR(new_hooks);
+	current->seccomp.landlock_hooks = new_hooks;
+	return 0;
+}
+#endif /* CONFIG_SECCOMP_FILTER */
-- 
2.9.3

* [RFC v3 12/22] bpf: Cosmetic change for bpf_prog_attach()
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (10 preceding siblings ...)
  2016-09-14  7:24 ` [RFC v3 11/22] seccomp,landlock: Handle Landlock hooks per process hierarchy Mickaël Salaün
@ 2016-09-14  7:24 ` Mickaël Salaün
  2016-09-14  7:24 ` [RFC v3 13/22] bpf/cgroup: Replace struct bpf_prog with union bpf_object Mickaël Salaün
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

Move code outside a switch/case to ease code factoring (cf. next
commit).

This applies on top of Daniel Mack's "Add eBPF hooks for cgroups":
https://lkml.kernel.org/r/1473696735-11269-1-git-send-email-daniel@zonque.org

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Daniel Mack <daniel@zonque.org>
---
 kernel/bpf/syscall.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f22e3b63d253..45a91d511119 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -843,23 +843,24 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_CGROUP_INET_EGRESS:
 		prog = bpf_prog_get_type(attr->attach_bpf_fd,
 					 BPF_PROG_TYPE_CGROUP_SOCKET);
-		if (IS_ERR(prog))
-			return PTR_ERR(prog);
-
-		cgrp = cgroup_get_from_fd(attr->target_fd);
-		if (IS_ERR(cgrp)) {
-			bpf_prog_put(prog);
-			return PTR_ERR(cgrp);
-		}
-
-		cgroup_bpf_update(cgrp, prog, attr->attach_type);
-		cgroup_put(cgrp);
 		break;
 
 	default:
 		return -EINVAL;
 	}
 
+	if (IS_ERR(prog))
+		return PTR_ERR(prog);
+
+	cgrp = cgroup_get_from_fd(attr->target_fd);
+	if (IS_ERR(cgrp)) {
+		bpf_prog_put(prog);
+		return PTR_ERR(cgrp);
+	}
+
+	cgroup_bpf_update(cgrp, prog, attr->attach_type);
+	cgroup_put(cgrp);
+
 	return 0;
 }
 
-- 
2.9.3

* [RFC v3 13/22] bpf/cgroup: Replace struct bpf_prog with union bpf_object
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (11 preceding siblings ...)
  2016-09-14  7:24 ` [RFC v3 12/22] bpf: Cosmetic change for bpf_prog_attach() Mickaël Salaün
@ 2016-09-14  7:24 ` Mickaël Salaün
  2016-09-14  7:24 ` [RFC v3 14/22] bpf/cgroup: Make cgroup_bpf_update() return an error code Mickaël Salaün
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

This allows CONFIG_CGROUP_BPF to manage different types of pointers
instead of only eBPF programs. This will be useful in the next commits
to support Landlock with cgroups.
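
To illustrate the idea outside the kernel, here is a minimal userspace
sketch of a per-attach-type slot that can hold more than one kind of
pointer. All names (struct prog, struct hooks, slot_put(), the enum
values) are hypothetical stand-ins and not the kernel definitions. The
attach type used as array index tells which union member is valid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; only the shape matters. */
struct prog  { int refcount; };
struct hooks { int refcount; };

enum attach_type { ATTACH_SOCKET, ATTACH_OTHER, MAX_ATTACH_TYPE };

/* One slot holds either kind of pointer; the array index says which. */
union object {
	struct prog  *prog;
	struct hooks *hooks;
};

struct slots {
	union object pinned[MAX_ATTACH_TYPE];
};

/* Drop the reference held by one slot, picking the member by attach type. */
static void slot_put(struct slots *s, enum attach_type type)
{
	assert(type < MAX_ATTACH_TYPE);

	switch (type) {
	case ATTACH_OTHER:
		if (s->pinned[type].hooks) {
			s->pinned[type].hooks->refcount--;
			s->pinned[type].hooks = NULL;
		}
		break;
	default:
		if (s->pinned[type].prog) {
			s->pinned[type].prog->refcount--;
			s->pinned[type].prog = NULL;
		}
	}
}
```

The union costs no extra storage compared to a bare pointer, which is
presumably why it is used here rather than a tagged struct.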

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Daniel Mack <daniel@zonque.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Tejun Heo <tj@kernel.org>
---
 include/linux/bpf-cgroup.h |  8 ++++++--
 kernel/bpf/cgroup.c        | 44 +++++++++++++++++++++++---------------------
 2 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index fc076de74ab9..2234042d7f61 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -14,14 +14,18 @@ struct sk_buff;
 extern struct static_key_false cgroup_bpf_enabled_key;
 #define cgroup_bpf_enabled static_branch_unlikely(&cgroup_bpf_enabled_key)
 
+union bpf_object {
+	struct bpf_prog *prog;
+};
+
 struct cgroup_bpf {
 	/*
 	 * Store two sets of bpf_prog pointers, one for programs that are
 	 * pinned directly to this cgroup, and one for those that are effective
 	 * when this cgroup is accessed.
 	 */
-	struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE];
-	struct bpf_prog *effective[MAX_BPF_ATTACH_TYPE];
+	union bpf_object pinned[MAX_BPF_ATTACH_TYPE];
+	union bpf_object effective[MAX_BPF_ATTACH_TYPE];
 };
 
 void cgroup_bpf_put(struct cgroup *cgrp);
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 21d168c3ad35..782878ec4f2d 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -20,18 +20,18 @@ DEFINE_STATIC_KEY_FALSE(cgroup_bpf_enabled_key);
 EXPORT_SYMBOL(cgroup_bpf_enabled_key);
 
 /**
- * cgroup_bpf_put() - put references of all bpf programs
+ * cgroup_bpf_put() - put references of all bpf objects
  * @cgrp: the cgroup to modify
  */
 void cgroup_bpf_put(struct cgroup *cgrp)
 {
 	unsigned int type;
 
-	for (type = 0; type < ARRAY_SIZE(cgrp->bpf.prog); type++) {
-		struct bpf_prog *prog = cgrp->bpf.prog[type];
+	for (type = 0; type < ARRAY_SIZE(cgrp->bpf.pinned); type++) {
+		union bpf_object pinned = cgrp->bpf.pinned[type];
 
-		if (prog) {
-			bpf_prog_put(prog);
+		if (pinned.prog) {
+			bpf_prog_put(pinned.prog);
 			static_branch_dec(&cgroup_bpf_enabled_key);
 		}
 	}
@@ -47,11 +47,12 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent)
 	unsigned int type;
 
 	for (type = 0; type < ARRAY_SIZE(cgrp->bpf.effective); type++) {
-		struct bpf_prog *e;
+		union bpf_object e;
 
-		e = rcu_dereference_protected(parent->bpf.effective[type],
-					      lockdep_is_held(&cgroup_mutex));
-		rcu_assign_pointer(cgrp->bpf.effective[type], e);
+		e.prog = rcu_dereference_protected(
+				parent->bpf.effective[type].prog,
+				lockdep_is_held(&cgroup_mutex));
+		rcu_assign_pointer(cgrp->bpf.effective[type].prog, e.prog);
 	}
 }
 
@@ -87,32 +88,33 @@ void __cgroup_bpf_update(struct cgroup *cgrp,
 			 struct bpf_prog *prog,
 			 enum bpf_attach_type type)
 {
-	struct bpf_prog *old_prog, *effective;
+	union bpf_object obj, old_pinned, effective;
 	struct cgroup_subsys_state *pos;
 
-	old_prog = xchg(cgrp->bpf.prog + type, prog);
+	obj.prog = prog;
+	old_pinned = xchg(cgrp->bpf.pinned + type, obj);
 
-	effective = (!prog && parent) ?
-		rcu_dereference_protected(parent->bpf.effective[type],
+	effective.prog = (!obj.prog && parent) ?
+		rcu_dereference_protected(parent->bpf.effective[type].prog,
 					  lockdep_is_held(&cgroup_mutex)) :
-		prog;
+		obj.prog;
 
 	css_for_each_descendant_pre(pos, &cgrp->self) {
 		struct cgroup *desc = container_of(pos, struct cgroup, self);
 
 		/* skip the subtree if the descendant has its own program */
-		if (desc->bpf.prog[type] && desc != cgrp)
+		if (desc->bpf.pinned[type].prog && desc != cgrp)
 			pos = css_rightmost_descendant(pos);
 		else
-			rcu_assign_pointer(desc->bpf.effective[type],
-					   effective);
+			rcu_assign_pointer(desc->bpf.effective[type].prog,
+					   effective.prog);
 	}
 
-	if (prog)
+	if (obj.prog)
 		static_branch_inc(&cgroup_bpf_enabled_key);
 
-	if (old_prog) {
-		bpf_prog_put(old_prog);
+	if (old_pinned.prog) {
+		bpf_prog_put(old_pinned.prog);
 		static_branch_dec(&cgroup_bpf_enabled_key);
 	}
 }
@@ -151,7 +153,7 @@ int __cgroup_bpf_run_filter(struct sock *sk,
 
 	rcu_read_lock();
 
-	prog = rcu_dereference(cgrp->bpf.effective[type]);
+	prog = rcu_dereference(cgrp->bpf.effective[type].prog);
 	if (prog) {
 		unsigned int offset = skb->data - skb_mac_header(skb);
 
-- 
2.9.3

* [RFC v3 14/22] bpf/cgroup: Make cgroup_bpf_update() return an error code
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (12 preceding siblings ...)
  2016-09-14  7:24 ` [RFC v3 13/22] bpf/cgroup: Replace struct bpf_prog with union bpf_object Mickaël Salaün
@ 2016-09-14  7:24 ` Mickaël Salaün
  2016-09-14 21:16   ` Alexei Starovoitov
  2016-09-14  7:24 ` [RFC v3 15/22] bpf/cgroup: Move capability check Mickaël Salaün
                   ` (8 subsequent siblings)
  22 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

This will be useful in the next commits, where attaching a Landlock
program to a cgroup can fail.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Daniel Mack <daniel@zonque.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Tejun Heo <tj@kernel.org>
---
 include/linux/bpf-cgroup.h |  4 ++--
 kernel/bpf/cgroup.c        |  3 ++-
 kernel/bpf/syscall.c       | 10 ++++++----
 kernel/cgroup.c            |  6 ++++--
 4 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 2234042d7f61..6cca7924ee17 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -31,13 +31,13 @@ struct cgroup_bpf {
 void cgroup_bpf_put(struct cgroup *cgrp);
 void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent);
 
-void __cgroup_bpf_update(struct cgroup *cgrp,
+int __cgroup_bpf_update(struct cgroup *cgrp,
 			 struct cgroup *parent,
 			 struct bpf_prog *prog,
 			 enum bpf_attach_type type);
 
 /* Wrapper for __cgroup_bpf_update() protected by cgroup_mutex */
-void cgroup_bpf_update(struct cgroup *cgrp,
+int cgroup_bpf_update(struct cgroup *cgrp,
 		       struct bpf_prog *prog,
 		       enum bpf_attach_type type);
 
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 782878ec4f2d..7b75fa692617 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -83,7 +83,7 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent)
  *
  * Must be called with cgroup_mutex held.
  */
-void __cgroup_bpf_update(struct cgroup *cgrp,
+int __cgroup_bpf_update(struct cgroup *cgrp,
 			 struct cgroup *parent,
 			 struct bpf_prog *prog,
 			 enum bpf_attach_type type)
@@ -117,6 +117,7 @@ void __cgroup_bpf_update(struct cgroup *cgrp,
 		bpf_prog_put(old_pinned.prog);
 		static_branch_dec(&cgroup_bpf_enabled_key);
 	}
+	return 0;
 }
 
 /**
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 45a91d511119..c978f2d9a1b3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -831,6 +831,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 {
 	struct bpf_prog *prog;
 	struct cgroup *cgrp;
+	int result;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
@@ -858,10 +859,10 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 		return PTR_ERR(cgrp);
 	}
 
-	cgroup_bpf_update(cgrp, prog, attr->attach_type);
+	result = cgroup_bpf_update(cgrp, prog, attr->attach_type);
 	cgroup_put(cgrp);
 
-	return 0;
+	return result;
 }
 
 #define BPF_PROG_DETACH_LAST_FIELD attach_type
@@ -869,6 +870,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 static int bpf_prog_detach(const union bpf_attr *attr)
 {
 	struct cgroup *cgrp;
+	int result = 0;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
@@ -883,7 +885,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 		if (IS_ERR(cgrp))
 			return PTR_ERR(cgrp);
 
-		cgroup_bpf_update(cgrp, NULL, attr->attach_type);
+		result = cgroup_bpf_update(cgrp, NULL, attr->attach_type);
 		cgroup_put(cgrp);
 		break;
 
@@ -891,7 +893,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 		return -EINVAL;
 	}
 
-	return 0;
+	return result;
 }
 #endif /* CONFIG_CGROUP_BPF */
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 87324ce481b1..48b650a640a9 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -6450,15 +6450,17 @@ static __init int cgroup_namespaces_init(void)
 subsys_initcall(cgroup_namespaces_init);
 
 #ifdef CONFIG_CGROUP_BPF
-void cgroup_bpf_update(struct cgroup *cgrp,
+int cgroup_bpf_update(struct cgroup *cgrp,
 		       struct bpf_prog *prog,
 		       enum bpf_attach_type type)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
+	int result;
 
 	mutex_lock(&cgroup_mutex);
-	__cgroup_bpf_update(cgrp, parent, prog, type);
+	result = __cgroup_bpf_update(cgrp, parent, prog, type);
 	mutex_unlock(&cgroup_mutex);
+	return result;
 }
 #endif /* CONFIG_CGROUP_BPF */
 
-- 
2.9.3

* [RFC v3 15/22] bpf/cgroup: Move capability check
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (13 preceding siblings ...)
  2016-09-14  7:24 ` [RFC v3 14/22] bpf/cgroup: Make cgroup_bpf_update() return an error code Mickaël Salaün
@ 2016-09-14  7:24 ` Mickaël Salaün
  2016-09-14  7:24 ` [RFC v3 16/22] bpf/cgroup,landlock: Handle Landlock hooks per cgroup Mickaël Salaün
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

This makes it possible to add more BPF attach types with different
capability checks.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Daniel Mack <daniel@zonque.org>
---
 kernel/bpf/syscall.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c978f2d9a1b3..8599596fd6cf 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -833,15 +833,15 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	struct cgroup *cgrp;
 	int result;
 
-	if (!capable(CAP_NET_ADMIN))
-		return -EPERM;
-
 	if (CHECK_ATTR(BPF_PROG_ATTACH))
 		return -EINVAL;
 
 	switch (attr->attach_type) {
 	case BPF_CGROUP_INET_INGRESS:
 	case BPF_CGROUP_INET_EGRESS:
+		if (!capable(CAP_NET_ADMIN))
+			return -EPERM;
+
 		prog = bpf_prog_get_type(attr->attach_bpf_fd,
 					 BPF_PROG_TYPE_CGROUP_SOCKET);
 		break;
@@ -872,15 +872,15 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	struct cgroup *cgrp;
 	int result = 0;
 
-	if (!capable(CAP_NET_ADMIN))
-		return -EPERM;
-
 	if (CHECK_ATTR(BPF_PROG_DETACH))
 		return -EINVAL;
 
 	switch (attr->attach_type) {
 	case BPF_CGROUP_INET_INGRESS:
 	case BPF_CGROUP_INET_EGRESS:
+		if (!capable(CAP_NET_ADMIN))
+			return -EPERM;
+
 		cgrp = cgroup_get_from_fd(attr->target_fd);
 		if (IS_ERR(cgrp))
 			return PTR_ERR(cgrp);
-- 
2.9.3

* [RFC v3 16/22] bpf/cgroup,landlock: Handle Landlock hooks per cgroup
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (14 preceding siblings ...)
  2016-09-14  7:24 ` [RFC v3 15/22] bpf/cgroup: Move capability check Mickaël Salaün
@ 2016-09-14  7:24 ` Mickaël Salaün
  2016-10-03 23:43   ` Kees Cook
  2016-09-14  7:24 ` [RFC v3 17/22] cgroup: Add access check for cgroup_get_from_fd() Mickaël Salaün
                   ` (6 subsequent siblings)
  22 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

This allows adding new eBPF programs to Landlock hooks dedicated to a
cgroup, thanks to the BPF_PROG_ATTACH command. As for socket eBPF
programs, the Landlock hooks attached to a cgroup are propagated to the
nested cgroups. However, when a new Landlock program is attached to one
of these nested cgroups, this cgroup hierarchy forks the Landlock hooks.
This design is simple and matches the current CONFIG_CGROUP_BPF
inheritance. The difference lies in the fact that Landlock programs can
only be stacked, not removed. This matches the append-only seccomp
behavior. Userland is free to handle Landlock hooks attached to a cgroup
in more complicated ways (e.g. continuous inheritance), but care should
be taken to properly handle error cases (e.g. memory allocation errors).
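
The inheritance and fork behavior described above can be modeled in a
few lines of userspace C. This is only a sketch under simplifying
assumptions: programs are plain integers, the hook list is a fixed-size
array copied by value instead of a refcounted object, and every name
(hook_list, cgroup_node, tree_attach(), MAX_PROGS) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROGS 8	/* arbitrary bound for this sketch */

/* Ordered stack of attached programs, identified here by integers. */
struct hook_list {
	int count;
	int prog_ids[MAX_PROGS];
};

struct cgroup_node {
	struct cgroup_node *parent;
	int has_own;			/* a program was attached directly here */
	struct hook_list effective;	/* what actually runs for this cgroup */
};

/* A new cgroup starts with a copy of its parent's effective hooks. */
static void node_init(struct cgroup_node *node, struct cgroup_node *parent)
{
	node->parent = parent;
	node->has_own = 0;
	node->effective = (struct hook_list){ 0 };
	if (parent)
		node->effective = parent->effective;
}

/*
 * True if d is target itself, or a descendant of target with no direct
 * attachment on the path from d up to (but excluding) target.
 */
static int inherits_from(const struct cgroup_node *d,
			 const struct cgroup_node *target)
{
	for (; d && d != target; d = d->parent)
		if (d->has_own)
			return 0;
	return d == target;
}

/*
 * Append prog_id to target's hooks and propagate the new list to every
 * descendant that has not forked. target must be an element of nodes.
 * There is deliberately no detach operation: the list is append-only.
 * Returns 0 on success, -1 if the list is full.
 */
static int tree_attach(struct cgroup_node *nodes, size_t n,
		       struct cgroup_node *target, int prog_id)
{
	struct hook_list updated;
	size_t i;

	assert(target != NULL);
	updated = target->effective;
	if (updated.count >= MAX_PROGS)
		return -1;
	updated.prog_ids[updated.count++] = prog_id;

	for (i = 0; i < n; i++)
		if (inherits_from(&nodes[i], target))
			nodes[i].effective = updated;
	target->has_own = 1;
	return 0;
}
```

With a root, a child and a grandchild, attaching to the grandchild
forks it: a later attach on the root still reaches the child, but no
longer the grandchild.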

Changes since v2:
* new design based on BPF_PROG_ATTACH (suggested by Alexei Starovoitov)

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Daniel Mack <daniel@zonque.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Link: https://lkml.kernel.org/r/20160826021432.GA8291@ast-mbp.thefacebook.com
Link: https://lkml.kernel.org/r/20160827204307.GA43714@ast-mbp.thefacebook.com
---
 include/linux/bpf-cgroup.h  |  7 +++++++
 include/linux/cgroup-defs.h |  2 ++
 include/linux/landlock.h    |  9 +++++++++
 include/uapi/linux/bpf.h    |  1 +
 kernel/bpf/cgroup.c         | 33 ++++++++++++++++++++++++++++++---
 kernel/bpf/syscall.c        | 11 +++++++++++
 security/landlock/lsm.c     | 40 +++++++++++++++++++++++++++++++++++++++-
 security/landlock/manager.c | 32 ++++++++++++++++++++++++++++++++
 8 files changed, 131 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 6cca7924ee17..439c681159e2 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -14,8 +14,15 @@ struct sk_buff;
 extern struct static_key_false cgroup_bpf_enabled_key;
 #define cgroup_bpf_enabled static_branch_unlikely(&cgroup_bpf_enabled_key)
 
+#ifdef CONFIG_SECURITY_LANDLOCK
+struct landlock_hooks;
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 union bpf_object {
 	struct bpf_prog *prog;
+#ifdef CONFIG_SECURITY_LANDLOCK
+	struct landlock_hooks *hooks;
+#endif /* CONFIG_SECURITY_LANDLOCK */
 };
 
 struct cgroup_bpf {
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 861b4677fc5b..fe1023bf7b9d 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -301,8 +301,10 @@ struct cgroup {
 	/* used to schedule release agent */
 	struct work_struct release_agent_work;
 
+#ifdef CONFIG_CGROUP_BPF
 	/* used to store eBPF programs */
 	struct cgroup_bpf bpf;
+#endif /* CONFIG_CGROUP_BPF */
 
 	/* ids of the ancestors at each level including self */
 	int ancestor_ids[];
diff --git a/include/linux/landlock.h b/include/linux/landlock.h
index 932ae57fa70e..179a848110f3 100644
--- a/include/linux/landlock.h
+++ b/include/linux/landlock.h
@@ -19,6 +19,9 @@
 #include <linux/seccomp.h> /* struct seccomp_filter */
 #endif /* CONFIG_SECCOMP_FILTER */
 
+#ifdef CONFIG_CGROUP_BPF
+#include <linux/cgroup-defs.h> /* struct cgroup */
+#endif /* CONFIG_CGROUP_BPF */
 
 #ifdef CONFIG_SECCOMP_FILTER
 struct landlock_seccomp_ret {
@@ -65,6 +68,7 @@ struct landlock_hooks {
 
 
 struct landlock_hooks *new_landlock_hooks(void);
+void get_landlock_hooks(struct landlock_hooks *hooks);
 void put_landlock_hooks(struct landlock_hooks *hooks);
 
 #ifdef CONFIG_SECCOMP_FILTER
@@ -73,5 +77,10 @@ int landlock_seccomp_set_hook(unsigned int flags,
 		const char __user *user_bpf_fd);
 #endif /* CONFIG_SECCOMP_FILTER */
 
+#ifdef CONFIG_CGROUP_BPF
+struct landlock_hooks *landlock_cgroup_set_hook(struct cgroup *cgrp,
+		struct bpf_prog *prog);
+#endif /* CONFIG_CGROUP_BPF */
+
 #endif /* CONFIG_SECURITY_LANDLOCK */
 #endif /* _LINUX_LANDLOCK_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 905dcace7255..12e61508f879 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -124,6 +124,7 @@ enum bpf_prog_type {
 enum bpf_attach_type {
 	BPF_CGROUP_INET_INGRESS,
 	BPF_CGROUP_INET_EGRESS,
+	BPF_CGROUP_LANDLOCK,
 	__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 7b75fa692617..1c18fe46958a 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -15,6 +15,7 @@
 #include <linux/bpf.h>
 #include <linux/bpf-cgroup.h>
 #include <net/sock.h>
+#include <linux/landlock.h>
 
 DEFINE_STATIC_KEY_FALSE(cgroup_bpf_enabled_key);
 EXPORT_SYMBOL(cgroup_bpf_enabled_key);
@@ -31,7 +32,15 @@ void cgroup_bpf_put(struct cgroup *cgrp)
 		union bpf_object pinned = cgrp->bpf.pinned[type];
 
 		if (pinned.prog) {
-			bpf_prog_put(pinned.prog);
+			switch (type) {
+			case BPF_CGROUP_LANDLOCK:
+#ifdef CONFIG_SECURITY_LANDLOCK
+				put_landlock_hooks(pinned.hooks);
+				break;
+#endif /* CONFIG_SECURITY_LANDLOCK */
+			default:
+				bpf_prog_put(pinned.prog);
+			}
 			static_branch_dec(&cgroup_bpf_enabled_key);
 		}
 	}
@@ -53,6 +62,10 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent)
 				parent->bpf.effective[type].prog,
 				lockdep_is_held(&cgroup_mutex));
 		rcu_assign_pointer(cgrp->bpf.effective[type].prog, e.prog);
+#ifdef CONFIG_SECURITY_LANDLOCK
+		if (type == BPF_CGROUP_LANDLOCK)
+			get_landlock_hooks(e.hooks);
+#endif /* CONFIG_SECURITY_LANDLOCK */
 	}
 }
 
@@ -91,7 +104,18 @@ int __cgroup_bpf_update(struct cgroup *cgrp,
 	union bpf_object obj, old_pinned, effective;
 	struct cgroup_subsys_state *pos;
 
-	obj.prog = prog;
+	switch (type) {
+	case BPF_CGROUP_LANDLOCK:
+#ifdef CONFIG_SECURITY_LANDLOCK
+		/* append hook */
+		obj.hooks = landlock_cgroup_set_hook(cgrp, prog);
+		if (IS_ERR(obj.hooks))
+			return PTR_ERR(obj.hooks);
+		break;
+#endif /* CONFIG_SECURITY_LANDLOCK */
+	default:
+		obj.prog = prog;
+	}
 	old_pinned = xchg(cgrp->bpf.pinned + type, obj);
 
 	effective.prog = (!obj.prog && parent) ?
@@ -114,7 +138,10 @@ int __cgroup_bpf_update(struct cgroup *cgrp,
 		static_branch_inc(&cgroup_bpf_enabled_key);
 
 	if (old_pinned.prog) {
-		bpf_prog_put(old_pinned.prog);
+#ifdef CONFIG_SECURITY_LANDLOCK
+		if (type != BPF_CGROUP_LANDLOCK)
+#endif /* CONFIG_SECURITY_LANDLOCK */
+			bpf_prog_put(old_pinned.prog);
 		static_branch_dec(&cgroup_bpf_enabled_key);
 	}
 	return 0;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8599596fd6cf..e9c5add327e6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -846,6 +846,16 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 					 BPF_PROG_TYPE_CGROUP_SOCKET);
 		break;
 
+	case BPF_CGROUP_LANDLOCK:
+#ifdef CONFIG_SECURITY_LANDLOCK
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+
+		prog = bpf_prog_get_type(attr->attach_bpf_fd,
+				BPF_PROG_TYPE_LANDLOCK);
+		break;
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 	default:
 		return -EINVAL;
 	}
@@ -889,6 +899,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 		cgroup_put(cgrp);
 		break;
 
+	case BPF_CGROUP_LANDLOCK:
 	default:
 		return -EINVAL;
 	}
diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c
index b6e0bace683d..000dd0c7ec3d 100644
--- a/security/landlock/lsm.c
+++ b/security/landlock/lsm.c
@@ -9,6 +9,7 @@
  */
 
 #include <asm/current.h>
+#include <linux/bpf-cgroup.h> /* cgroup_bpf_enabled */
 #include <linux/bpf.h> /* enum bpf_reg_type, struct landlock_data */
 #include <linux/cred.h>
 #include <linux/err.h> /* MAX_ERRNO */
@@ -19,6 +20,10 @@
 #include <linux/seccomp.h> /* struct seccomp_* */
 #include <linux/types.h> /* uintptr_t */
 
+#ifdef CONFIG_CGROUP_BPF
+#include <linux/cgroup-defs.h> /* struct cgroup */
+#endif /* CONFIG_CGROUP_BPF */
+
 #include "checker_fs.h"
 #include "common.h"
 
@@ -99,6 +104,9 @@ static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
 #ifdef CONFIG_SECCOMP_FILTER
 	struct landlock_seccomp_ret *lr;
 #endif /* CONFIG_SECCOMP_FILTER */
+#ifdef CONFIG_CGROUP_BPF
+	struct cgroup *cgrp;
+#endif /* CONFIG_CGROUP_BPF */
 	struct landlock_rule *rule;
 	u32 hook_idx = get_index(hook_id);
 
@@ -115,6 +123,11 @@ static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
 
 	/* TODO: use lockless_dereference()? */
 
+	/*
+	 * Run the seccomp-based triggers before the cgroup-based triggers to
+	 * prioritize fine-grained policies (i.e. per thread), and return early.
+	 */
+
 #ifdef CONFIG_SECCOMP_FILTER
 	/* seccomp triggers and landlock_ret cleanup */
 	ctx.origin = LANDLOCK_FLAG_ORIGIN_SECCOMP;
@@ -155,8 +168,21 @@ static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
 	ctx.origin = LANDLOCK_FLAG_ORIGIN_SYSCALL;
 	ret = landlock_run_prog_for_syscall(hook_idx, &ctx,
 			current->seccomp.landlock_hooks);
+	if (ret)
+		return -ret;
 #endif /* CONFIG_SECCOMP_FILTER */
 
+#ifdef CONFIG_CGROUP_BPF
+	/* syscall trigger */
+	if (cgroup_bpf_enabled) {
+		ctx.origin = LANDLOCK_FLAG_ORIGIN_SYSCALL;
+		/* get the default cgroup associated with the current thread */
+		cgrp = task_css_set(current)->dfl_cgrp;
+		ret = landlock_run_prog_for_syscall(hook_idx, &ctx,
+				cgrp->bpf.effective[BPF_CGROUP_LANDLOCK].hooks);
+	}
+#endif /* CONFIG_CGROUP_BPF */
+
 	return -ret;
 }
 
@@ -242,9 +268,21 @@ static struct security_hook_list landlock_hooks[] = {
 	LANDLOCK_HOOK_INIT(mmap_file),
 };
 
+#ifdef CONFIG_SECCOMP_FILTER
+#ifdef CONFIG_CGROUP_BPF
+#define LANDLOCK_MANAGERS "seccomp and cgroups"
+#else /* CONFIG_CGROUP_BPF */
+#define LANDLOCK_MANAGERS "seccomp"
+#endif /* CONFIG_CGROUP_BPF */
+#elif defined(CONFIG_CGROUP_BPF)
+#define LANDLOCK_MANAGERS "cgroups"
+#else
+#error "Need CONFIG_SECCOMP_FILTER or CONFIG_CGROUP_BPF"
+#endif /* CONFIG_SECCOMP_FILTER */
+
 void __init landlock_add_hooks(void)
 {
-	pr_info("landlock: Becoming ready to sandbox with seccomp\n");
+	pr_info("landlock: Becoming ready to sandbox with " LANDLOCK_MANAGERS "\n");
 	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks));
 }
 
diff --git a/security/landlock/manager.c b/security/landlock/manager.c
index e9f3f1092023..50aa1305d0d1 100644
--- a/security/landlock/manager.c
+++ b/security/landlock/manager.c
@@ -24,6 +24,11 @@
 #include <linux/seccomp.h> /* struct seccomp_filter */
 #endif /* CONFIG_SECCOMP_FILTER */
 
+#ifdef CONFIG_CGROUP_BPF
+#include <linux/bpf-cgroup.h> /* struct cgroup_bpf */
+#include <linux/cgroup-defs.h> /* struct cgroup */
+#endif /* CONFIG_CGROUP_BPF */
+
 #include "common.h"
 
 static void put_landlock_rule(struct landlock_rule *rule)
@@ -84,6 +89,12 @@ struct landlock_hooks *new_landlock_hooks(void)
 	return ret;
 }
 
+inline void get_landlock_hooks(struct landlock_hooks *hooks)
+{
+	if (hooks)
+		atomic_inc(&hooks->usage);
+}
+
 /* Limit Landlock hooks to 256KB. */
 #define LANDLOCK_HOOKS_MAX_PAGES (1 << 6)
 
@@ -240,3 +251,24 @@ int landlock_seccomp_set_hook(unsigned int flags, const char __user *user_bpf_fd
 	return 0;
 }
 #endif /* CONFIG_SECCOMP_FILTER */
+
+/**
+ * landlock_cgroup_set_hook - attach a Landlock program to a cgroup
+ *
+ * Must be called with cgroup_mutex held.
+ *
+ * @cgrp: non-NULL cgroup pointer to attach to
+ * @prog: Landlock program pointer
+ */
+#ifdef CONFIG_CGROUP_BPF
+struct landlock_hooks *landlock_cgroup_set_hook(struct cgroup *cgrp,
+		struct bpf_prog *prog)
+{
+	if (!prog)
+		return ERR_PTR(-EINVAL);
+
+	/* copy the inherited hooks and append a new one */
+	return landlock_set_hook(cgrp->bpf.effective[BPF_CGROUP_LANDLOCK].hooks,
+			prog, NULL);
+}
+#endif /* CONFIG_CGROUP_BPF */
-- 
2.9.3

* [RFC v3 17/22] cgroup: Add access check for cgroup_get_from_fd()
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (15 preceding siblings ...)
  2016-09-14  7:24 ` [RFC v3 16/22] bpf/cgroup,landlock: Handle Landlock hooks per cgroup Mickaël Salaün
@ 2016-09-14  7:24 ` Mickaël Salaün
  2016-09-14 22:06   ` Mickaël Salaün
  2016-09-14  7:24 ` [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks Mickaël Salaün
                   ` (5 subsequent siblings)
  22 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

Add a security access check for cgroup-backed FDs. The "cgroup.procs" file
of the corresponding cgroup must be readable to identify the cgroup, and
writable to prove that the current process can manage this cgroup (e.g.
through delegation). This is similar to the check done by
cgroup_procs_write_permission().
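
For comparison, a userspace process can approximate the same test on a
cgroup directory FD before calling bpf(2). The sketch below is an
assumption-laden illustration, not the kernel check: the helper name
can_manage_cgroup() is hypothetical, and faccessat() with flags 0 uses
the real UID/GID whereas the kernel's inode_permission() works on the
filesystem credentials.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>	/* AT_* constants */
#include <unistd.h>	/* faccessat(), R_OK, W_OK */

/*
 * Return 1 if "cgroup.procs" under the directory referred to by
 * cgroup_dirfd is both readable and writable by the caller, else 0
 * (including when the FD is invalid or the file does not exist).
 */
static int can_manage_cgroup(int cgroup_dirfd)
{
	return faccessat(cgroup_dirfd, "cgroup.procs", R_OK | W_OK, 0) == 0;
}
```

A caller would typically open the cgroup2 directory with O_DIRECTORY,
run this check, and only then pass the same FD as target_fd.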

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Daniel Mack <daniel@zonque.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup.h |  2 +-
 kernel/bpf/arraymap.c  |  2 +-
 kernel/bpf/syscall.c   |  6 +++---
 kernel/cgroup.c        | 16 +++++++++++++++-
 4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index c4688742ddc4..5767d471e292 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -87,7 +87,7 @@ struct cgroup_subsys_state *css_tryget_online_from_dir(struct dentry *dentry,
 						       struct cgroup_subsys *ss);
 
 struct cgroup *cgroup_get_from_path(const char *path);
-struct cgroup *cgroup_get_from_fd(int fd);
+struct cgroup *cgroup_get_from_fd(int fd, int access_mask);
 
 int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);
 int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index edaab4c87292..1d4de8e0ab13 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -552,7 +552,7 @@ static void *cgroup_fd_array_get_ptr(struct bpf_map *map,
 				     struct file *map_file /* not used */,
 				     int fd)
 {
-	return cgroup_get_from_fd(fd);
+	return cgroup_get_from_fd(fd, MAY_READ);
 }
 
 static void cgroup_fd_array_put_ptr(void *ptr)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e9c5add327e6..f90225dbbb59 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -17,6 +17,7 @@
 #include <linux/license.h>
 #include <linux/filter.h>
 #include <linux/version.h>
+#include <linux/fs.h>
 
 DEFINE_PER_CPU(int, bpf_prog_active);
 
@@ -863,7 +864,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	if (IS_ERR(prog))
 		return PTR_ERR(prog);
 
-	cgrp = cgroup_get_from_fd(attr->target_fd);
+	cgrp = cgroup_get_from_fd(attr->target_fd, MAY_WRITE);
 	if (IS_ERR(cgrp)) {
 		bpf_prog_put(prog);
 		return PTR_ERR(cgrp);
@@ -891,10 +892,9 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 		if (!capable(CAP_NET_ADMIN))
 			return -EPERM;
 
-		cgrp = cgroup_get_from_fd(attr->target_fd);
+		cgrp = cgroup_get_from_fd(attr->target_fd, MAY_WRITE);
 		if (IS_ERR(cgrp))
 			return PTR_ERR(cgrp);
-
 		result = cgroup_bpf_update(cgrp, NULL, attr->attach_type);
 		cgroup_put(cgrp);
 		break;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 48b650a640a9..3bbaf3f02ed2 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -6241,17 +6241,20 @@ EXPORT_SYMBOL_GPL(cgroup_get_from_path);
 /**
  * cgroup_get_from_fd - get a cgroup pointer from a fd
  * @fd: fd obtained by open(cgroup2_dir)
+ * @access_mask: contains the permission mask
  *
  * Find the cgroup from a fd which should be obtained
  * by opening a cgroup directory.  Returns a pointer to the
  * cgroup on success. ERR_PTR is returned if the cgroup
  * cannot be found.
  */
-struct cgroup *cgroup_get_from_fd(int fd)
+struct cgroup *cgroup_get_from_fd(int fd, int access_mask)
 {
 	struct cgroup_subsys_state *css;
 	struct cgroup *cgrp;
 	struct file *f;
+	struct inode *inode;
+	int ret;
 
 	f = fget_raw(fd);
 	if (!f)
@@ -6268,6 +6271,17 @@ struct cgroup *cgroup_get_from_fd(int fd)
 		return ERR_PTR(-EBADF);
 	}
 
+	ret = -ENOMEM;
+	inode = kernfs_get_inode(f->f_path.dentry->d_sb, cgrp->procs_file.kn);
+	if (inode) {
+		ret = inode_permission(inode, access_mask);
+		iput(inode);
+	}
+	if (ret) {
+		cgroup_put(cgrp);
+		return ERR_PTR(ret);
+	}
+
 	return cgrp;
 }
 EXPORT_SYMBOL_GPL(cgroup_get_from_fd);
-- 
2.9.3

* [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (16 preceding siblings ...)
  2016-09-14  7:24 ` [RFC v3 17/22] cgroup: Add access check for cgroup_get_from_fd() Mickaël Salaün
@ 2016-09-14  7:24 ` Mickaël Salaün
  2016-09-14 18:27   ` Andy Lutomirski
  2016-09-14  7:24 ` [RFC v3 19/22] landlock: Add interrupted origin Mickaël Salaün
                   ` (4 subsequent siblings)
  22 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

Add a new flag CGRP_NO_NEW_PRIVS for each cgroup. This flag is initially
set for all cgroups except the root. The flag is cleared when a new process
without the no_new_privs flag is attached to the cgroup.

If a cgroup is landlocked, then any new attempt, from an unprivileged
process, to attach a process without no_new_privs to this cgroup will
be denied.

This makes it possible to safely manage Landlock rules with cgroup
delegation, as with seccomp.
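
To make the intended semantics easier to follow, here is a minimal
user-space model of the flag. It is only a sketch: struct model_cgroup,
model_cgroup_init(), model_set_hook() and model_attach() are hypothetical
names that do not exist in the kernel, and the caller_is_admin parameter
stands in for the CAP_SYS_ADMIN check done with security_capable_noaudit()
in the diff below.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model_cgroup {
	bool no_new_privs;	/* models the CGRP_NO_NEW_PRIVS bit */
	bool landlocked;	/* models a non-NULL Landlock hook list */
};

/* A freshly created non-root cgroup starts with the flag set. */
static void model_cgroup_init(struct model_cgroup *cgrp)
{
	assert(cgrp != NULL);
	cgrp->no_new_privs = true;
	cgrp->landlocked = false;
}

/*
 * Attach a Landlock hook. An unprivileged caller may only do so while
 * every task seen so far had no_new_privs set.
 */
static int model_set_hook(struct model_cgroup *cgrp, bool caller_is_admin)
{
	assert(cgrp != NULL);
	if (!cgrp->no_new_privs && !caller_is_admin)
		return -EPERM;
	cgrp->landlocked = true;
	return 0;
}

/*
 * Attach one task. The cgroup flag is ANDed with the task's bit; when
 * the result is false, a landlocked cgroup refuses unprivileged callers,
 * otherwise the flag is cleared.
 */
static int model_attach(struct model_cgroup *cgrp, bool task_nnp,
		bool caller_is_admin)
{
	bool nnp;

	assert(cgrp != NULL);
	nnp = cgrp->no_new_privs && task_nnp;
	if (!nnp) {
		if (cgrp->landlocked && !caller_is_admin)
			return -EPERM;
		cgrp->no_new_privs = false;
	}
	return 0;
}
```

In this model, once a privileged caller has attached a task without
no_new_privs, the flag stays cleared and later unprivileged calls to
model_set_hook() fail with -EPERM.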

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Daniel Mack <daniel@zonque.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup-defs.h |  7 +++++++
 kernel/bpf/syscall.c        |  7 ++++---
 kernel/cgroup.c             | 44 ++++++++++++++++++++++++++++++++++++++++++--
 security/landlock/manager.c |  7 +++++++
 4 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index fe1023bf7b9d..ce0e4c90ae7d 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -59,6 +59,13 @@ enum {
 	 * specified at mount time and thus is implemented here.
 	 */
 	CGRP_CPUSET_CLONE_CHILDREN,
+	/*
+	 * Keep track of the no_new_privs property of processes in the cgroup.
+	 * This is useful to quickly check if all processes in the cgroup have
+	 * their no_new_privs bit on. This flag is initially set to true but is
+	 * ANDed with the no_new_privs bit of every process joining the cgroup.
+	 */
+	CGRP_NO_NEW_PRIVS,
 };
 
 /* cgroup_root->flags */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f90225dbbb59..ff8b53a8a2a0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -849,9 +849,10 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 
 	case BPF_CGROUP_LANDLOCK:
 #ifdef CONFIG_SECURITY_LANDLOCK
-		if (!capable(CAP_SYS_ADMIN))
-			return -EPERM;
-
+		/*
+		 * security/capability check done in landlock_cgroup_set_hook()
+		 * called by cgroup_bpf_update()
+		 */
 		prog = bpf_prog_get_type(attr->attach_bpf_fd,
 				BPF_PROG_TYPE_LANDLOCK);
 		break;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 3bbaf3f02ed2..913e2d3b6d55 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -62,6 +62,7 @@
 #include <linux/proc_ns.h>
 #include <linux/nsproxy.h>
 #include <linux/file.h>
+#include <linux/bitops.h>
 #include <net/sock.h>
 
 #define CREATE_TRACE_POINTS
@@ -1985,6 +1986,7 @@ static void init_cgroup_root(struct cgroup_root *root,
 		strcpy(root->name, opts->name);
 	if (opts->cpuset_clone_children)
 		set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags);
+	/* no CGRP_NO_NEW_PRIVS flag for the root */
 }
 
 static int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
@@ -2812,14 +2814,35 @@ static int cgroup_attach_task(struct cgroup *dst_cgrp,
 	LIST_HEAD(preloaded_csets);
 	struct task_struct *task;
 	int ret;
+#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_SECURITY_LANDLOCK)
+	bool no_new_privs;
+#endif /* CONFIG_CGROUP_BPF && CONFIG_SECURITY_LANDLOCK */
 
 	if (!cgroup_may_migrate_to(dst_cgrp))
 		return -EBUSY;
 
+	task = leader;
+#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_SECURITY_LANDLOCK)
+	no_new_privs = !!(dst_cgrp->flags & BIT_ULL(CGRP_NO_NEW_PRIVS));
+	do {
+		no_new_privs = no_new_privs && task_no_new_privs(task);
+		if (!no_new_privs) {
+			if (dst_cgrp->bpf.pinned[BPF_CGROUP_LANDLOCK].hooks &&
+					security_capable_noaudit(current_cred(),
+						current_user_ns(),
+						CAP_SYS_ADMIN) != 0)
+				return -EPERM;
+			clear_bit(CGRP_NO_NEW_PRIVS, &dst_cgrp->flags);
+			break;
+		}
+		if (!threadgroup)
+			break;
+	} while_each_thread(leader, task);
+#endif /* CONFIG_CGROUP_BPF && CONFIG_SECURITY_LANDLOCK */
+
 	/* look up all src csets */
 	spin_lock_irq(&css_set_lock);
 	rcu_read_lock();
-	task = leader;
 	do {
 		cgroup_migrate_add_src(task_css_set(task), dst_cgrp,
 				       &preloaded_csets);
@@ -4345,9 +4368,22 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
 		return -EBUSY;
 
 	mutex_lock(&cgroup_mutex);
-
 	percpu_down_write(&cgroup_threadgroup_rwsem);
 
+#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_SECURITY_LANDLOCK)
+	if (!(from->flags & BIT_ULL(CGRP_NO_NEW_PRIVS))) {
+		if (to->bpf.pinned[BPF_CGROUP_LANDLOCK].hooks &&
+				security_capable_noaudit(current_cred(),
+					current_user_ns(), CAP_SYS_ADMIN) != 0) {
+			pr_warn("%s: EPERM\n", __func__);
+			ret = -EPERM;
+			goto out_unlock;
+		}
+		pr_warn("%s: no EPERM\n", __func__);
+		clear_bit(CGRP_NO_NEW_PRIVS, &to->flags);
+	}
+#endif /* CONFIG_CGROUP_BPF && CONFIG_SECURITY_LANDLOCK */
+
 	/* all tasks in @from are being moved, all csets are source */
 	spin_lock_irq(&css_set_lock);
 	list_for_each_entry(link, &from->cset_links, cset_link)
@@ -4378,6 +4414,7 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
 	} while (task && !ret);
 out_err:
 	cgroup_migrate_finish(&preloaded_csets);
+out_unlock:
 	percpu_up_write(&cgroup_threadgroup_rwsem);
 	mutex_unlock(&cgroup_mutex);
 	return ret;
@@ -5241,6 +5278,9 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 
 	if (test_bit(CGRP_CPUSET_CLONE_CHILDREN, &parent->flags))
 		set_bit(CGRP_CPUSET_CLONE_CHILDREN, &cgrp->flags);
+#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_SECURITY_LANDLOCK)
+	set_bit(CGRP_NO_NEW_PRIVS, &cgrp->flags);
+#endif /* CONFIG_CGROUP_BPF && CONFIG_SECURITY_LANDLOCK */
 
 	cgrp->self.serial_nr = css_serial_nr_next++;
 
diff --git a/security/landlock/manager.c b/security/landlock/manager.c
index 50aa1305d0d1..479f6990aeff 100644
--- a/security/landlock/manager.c
+++ b/security/landlock/manager.c
@@ -11,6 +11,7 @@
 #include <asm/atomic.h> /* atomic_*() */
 #include <asm/page.h> /* PAGE_SIZE */
 #include <asm/uaccess.h> /* copy_from_user() */
+#include <linux/bitops.h> /* BIT_ULL() */
 #include <linux/bpf.h> /* bpf_prog_put() */
 #include <linux/filter.h> /* struct bpf_prog */
 #include <linux/kernel.h> /* round_up() */
@@ -267,6 +268,12 @@ struct landlock_hooks *landlock_cgroup_set_hook(struct cgroup *cgrp,
 	if (!prog)
 		return ERR_PTR(-EINVAL);
 
+	/* check no_new_privs for tasks in the cgroup */
+	if (!(cgrp->flags & BIT_ULL(CGRP_NO_NEW_PRIVS)) &&
+			security_capable_noaudit(current_cred(),
+				current_user_ns(), CAP_SYS_ADMIN) != 0)
+		return ERR_PTR(-EPERM);
+
 	/* copy the inherited hooks and append a new one */
 	return landlock_set_hook(cgrp->bpf.effective[BPF_CGROUP_LANDLOCK].hooks,
 			prog, NULL);
-- 
2.9.3


* [RFC v3 19/22] landlock: Add interrupted origin
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (17 preceding siblings ...)
  2016-09-14  7:24 ` [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks Mickaël Salaün
@ 2016-09-14  7:24 ` Mickaël Salaün
  2016-09-14 18:29   ` Andy Lutomirski
  2016-09-14  7:24 ` [RFC v3 20/22] landlock: Add update and debug access flags Mickaël Salaün
                   ` (3 subsequent siblings)
  22 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

This third origin of a hook call should cover all possible trigger paths
(e.g. page fault). Landlock eBPF programs can then make decisions
accordingly.
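
As a rough illustration, the origin selection and a possible origin test
can be modelled in user space as follows. The three flag values are the
ones from the bpf.h hunk below. MODEL_ORIGIN_MASK, model_pick_origin(),
model_origin_is_known() and model_should_run() are hypothetical names,
and the assumption that a program runs only when its subscribed origins
include the current one is mine, not something this patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the include/uapi/linux/bpf.h hunk of this patch. */
#define LANDLOCK_FLAG_ORIGIN_SYSCALL	(1 << 0)
#define LANDLOCK_FLAG_ORIGIN_SECCOMP	(1 << 1)
#define LANDLOCK_FLAG_ORIGIN_INTERRUPT	(1 << 2)
/* Hypothetical name for the mask covering the three origin bits. */
#define MODEL_ORIGIN_MASK		((1 << 3) - 1)

/* Mirrors the in_interrupt() branch added to landlock_run_prog(). */
static uint16_t model_pick_origin(bool in_interrupt)
{
	return in_interrupt ? LANDLOCK_FLAG_ORIGIN_INTERRUPT :
		LANDLOCK_FLAG_ORIGIN_SYSCALL;
}

/* True when no bit outside the known origin flags is set. */
static bool model_origin_is_known(uint16_t origin)
{
	return (origin & ~MODEL_ORIGIN_MASK) == 0;
}

/*
 * Assumption: a program is run only if the set of origins it subscribed
 * to contains the origin of the current call.
 */
static bool model_should_run(uint16_t subscribed, uint16_t current_origin)
{
	assert(model_origin_is_known(current_origin));
	return (subscribed & current_origin) != 0;
}
```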

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Kees Cook <keescook@chromium.org>
---
 include/uapi/linux/bpf.h |  3 ++-
 security/landlock/lsm.c  | 17 +++++++++++++++--
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 12e61508f879..3cc52e51357f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -580,7 +580,8 @@ enum landlock_hook_id {
 /* Trigger type */
 #define LANDLOCK_FLAG_ORIGIN_SYSCALL	(1 << 0)
 #define LANDLOCK_FLAG_ORIGIN_SECCOMP	(1 << 1)
-#define _LANDLOCK_FLAG_ORIGIN_MASK	((1 << 2) - 1)
+#define LANDLOCK_FLAG_ORIGIN_INTERRUPT	(1 << 2)
+#define _LANDLOCK_FLAG_ORIGIN_MASK	((1 << 3) - 1)
 
 /* context of function access flags */
 #define _LANDLOCK_FLAG_ACCESS_MASK	((1ULL << 0) - 1)
diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c
index 000dd0c7ec3d..2a15839a08c8 100644
--- a/security/landlock/lsm.c
+++ b/security/landlock/lsm.c
@@ -17,6 +17,7 @@
 #include <linux/kernel.h> /* FIELD_SIZEOF() */
 #include <linux/landlock.h>
 #include <linux/lsm_hooks.h>
+#include <linux/preempt.h> /* in_interrupt() */
 #include <linux/seccomp.h> /* struct seccomp_* */
 #include <linux/types.h> /* uintptr_t */
 
@@ -109,6 +110,7 @@ static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
 #endif /* CONFIG_CGROUP_BPF */
 	struct landlock_rule *rule;
 	u32 hook_idx = get_index(hook_id);
+	u16 current_call;
 
 	struct landlock_data ctx = {
 		.hook = hook_id,
@@ -128,6 +130,16 @@ static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
 	 * prioritize fine-grained policies (i.e. per thread), and return early.
 	 */
 
+	if (unlikely(in_interrupt())) {
+		current_call = LANDLOCK_FLAG_ORIGIN_INTERRUPT;
+#ifdef CONFIG_SECCOMP_FILTER
+		/* bypass landlock_ret evaluation */
+		goto seccomp_int;
+#endif /* CONFIG_SECCOMP_FILTER */
+	} else {
+		current_call = LANDLOCK_FLAG_ORIGIN_SYSCALL;
+	}
+
 #ifdef CONFIG_SECCOMP_FILTER
 	/* seccomp triggers and landlock_ret cleanup */
 	ctx.origin = LANDLOCK_FLAG_ORIGIN_SECCOMP;
@@ -164,8 +176,9 @@ static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
 		return -ret;
 	ctx.cookie = 0;
 
+seccomp_int:
 	/* syscall trigger */
-	ctx.origin = LANDLOCK_FLAG_ORIGIN_SYSCALL;
+	ctx.origin = current_call;
 	ret = landlock_run_prog_for_syscall(hook_idx, &ctx,
 			current->seccomp.landlock_hooks);
 	if (ret)
@@ -175,7 +188,7 @@ static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
 #ifdef CONFIG_CGROUP_BPF
 	/* syscall trigger */
 	if (cgroup_bpf_enabled) {
-		ctx.origin = LANDLOCK_FLAG_ORIGIN_SYSCALL;
+		ctx.origin = current_call;
 		/* get the default cgroup associated with the current thread */
 		cgrp = task_css_set(current)->dfl_cgrp;
 		ret = landlock_run_prog_for_syscall(hook_idx, &ctx,
-- 
2.9.3


* [RFC v3 20/22] landlock: Add update and debug access flags
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (18 preceding siblings ...)
  2016-09-14  7:24 ` [RFC v3 19/22] landlock: Add interrupted origin Mickaël Salaün
@ 2016-09-14  7:24 ` Mickaël Salaün
  2016-09-14  7:24 ` [RFC v3 21/22] bpf,landlock: Add optional skb pointer in the Landlock context Mickaël Salaün
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

For now, the update and debug accesses are only available to a process
with CAP_SYS_ADMIN. This could change in the future.

The capability check is done statically when loading an eBPF program,
according to the current process. If the process has enough rights and
sets the appropriate access flags, then the dedicated functions or data
will be accessible.

With the update access, the following functions are available:
* bpf_map_lookup_elem
* bpf_map_update_elem
* bpf_map_delete_elem
* bpf_tail_call

With the debug access, the following functions are available:
* bpf_trace_printk
* bpf_get_prandom_u32
* bpf_get_current_pid_tgid
* bpf_get_current_uid_gid
* bpf_get_current_comm
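
The gating described by the two lists above can be sketched in user
space as follows. This is only a model: enum model_helper and the
model_*() functions are hypothetical stand-ins for the BPF_FUNC_* IDs
and for bpf_landlock_func_proto() / bpf_landlock_is_valid_subtype(),
MODEL_ACCESS_MASK is a hypothetical name for the access mask, and
has_cap_sys_admin replaces the capable(CAP_SYS_ADMIN) call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the include/uapi/linux/bpf.h hunk of this patch. */
#define LANDLOCK_FLAG_ACCESS_UPDATE	(1 << 0)
#define LANDLOCK_FLAG_ACCESS_DEBUG	(1 << 1)
/* Hypothetical name for the mask of all known access bits. */
#define MODEL_ACCESS_MASK		((1ULL << 2) - 1)

/* Hypothetical helper IDs standing in for the BPF_FUNC_* values. */
enum model_helper {
	MODEL_MAP_LOOKUP_ELEM,
	MODEL_MAP_UPDATE_ELEM,
	MODEL_MAP_DELETE_ELEM,
	MODEL_TAIL_CALL,
	MODEL_TRACE_PRINTK,
	MODEL_GET_PRANDOM_U32,
	MODEL_GET_CURRENT_PID_TGID,
	MODEL_GET_CURRENT_UID_GID,
	MODEL_GET_CURRENT_COMM,
	MODEL_HELPER_MAX,
};

/* Access flag required by a helper, following the two lists above. */
static uint64_t model_required_access(enum model_helper id)
{
	assert(id < MODEL_HELPER_MAX);
	switch (id) {
	case MODEL_MAP_LOOKUP_ELEM:
	case MODEL_MAP_UPDATE_ELEM:
	case MODEL_MAP_DELETE_ELEM:
	case MODEL_TAIL_CALL:
		return LANDLOCK_FLAG_ACCESS_UPDATE;
	default:
		return LANDLOCK_FLAG_ACCESS_DEBUG;
	}
}

/* A helper is usable only if the program holds the matching flag. */
static bool model_helper_allowed(enum model_helper id, uint64_t access)
{
	return (access & model_required_access(id)) != 0;
}

/* Load-time check: unknown bits are refused, any flag needs the capability. */
static bool model_subtype_valid(uint64_t access, bool has_cap_sys_admin)
{
	if (access & ~MODEL_ACCESS_MASK)
		return false;
	if (access != 0 && !has_cap_sys_admin)
		return false;
	return true;
}
```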

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sargun Dhillon <sargun@sargun.me>
---
 include/uapi/linux/bpf.h |  4 +++-
 security/landlock/lsm.c  | 54 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 3cc52e51357f..8cfc2de2ab76 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -584,7 +584,9 @@ enum landlock_hook_id {
 #define _LANDLOCK_FLAG_ORIGIN_MASK	((1 << 3) - 1)
 
 /* context of function access flags */
-#define _LANDLOCK_FLAG_ACCESS_MASK	((1ULL << 0) - 1)
+#define LANDLOCK_FLAG_ACCESS_UPDATE	(1 << 0)
+#define LANDLOCK_FLAG_ACCESS_DEBUG	(1 << 1)
+#define _LANDLOCK_FLAG_ACCESS_MASK	((1ULL << 2) - 1)
 
 /* Handle check flags */
 #define LANDLOCK_FLAG_FS_DENTRY		(1 << 0)
diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c
index 2a15839a08c8..56c45abe979c 100644
--- a/security/landlock/lsm.c
+++ b/security/landlock/lsm.c
@@ -202,11 +202,57 @@ static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
 static const struct bpf_func_proto *bpf_landlock_func_proto(
 		enum bpf_func_id func_id, union bpf_prog_subtype *prog_subtype)
 {
+	bool access_update = !!(prog_subtype->landlock_hook.access &
+			LANDLOCK_FLAG_ACCESS_UPDATE);
+	bool access_debug = !!(prog_subtype->landlock_hook.access &
+			LANDLOCK_FLAG_ACCESS_DEBUG);
+
 	switch (func_id) {
 	case BPF_FUNC_landlock_cmp_fs_prop_with_struct_file:
 		return &bpf_landlock_cmp_fs_prop_with_struct_file_proto;
 	case BPF_FUNC_landlock_cmp_fs_beneath_with_struct_file:
 		return &bpf_landlock_cmp_fs_beneath_with_struct_file_proto;
+
+	/* access_update */
+	case BPF_FUNC_map_lookup_elem:
+		if (access_update)
+			return &bpf_map_lookup_elem_proto;
+		return NULL;
+	case BPF_FUNC_map_update_elem:
+		if (access_update)
+			return &bpf_map_update_elem_proto;
+		return NULL;
+	case BPF_FUNC_map_delete_elem:
+		if (access_update)
+			return &bpf_map_delete_elem_proto;
+		return NULL;
+	case BPF_FUNC_tail_call:
+		if (access_update)
+			return &bpf_tail_call_proto;
+		return NULL;
+
+	/* access_debug */
+	case BPF_FUNC_trace_printk:
+		if (access_debug)
+			return bpf_get_trace_printk_proto();
+		return NULL;
+	case BPF_FUNC_get_prandom_u32:
+		if (access_debug)
+			return &bpf_get_prandom_u32_proto;
+		return NULL;
+	case BPF_FUNC_get_current_pid_tgid:
+		if (access_debug)
+			return &bpf_get_current_pid_tgid_proto;
+		return NULL;
+	case BPF_FUNC_get_current_uid_gid:
+		if (access_debug)
+			return &bpf_get_current_uid_gid_proto;
+		return NULL;
+	case BPF_FUNC_get_current_comm:
+		if (access_debug)
+			return &bpf_get_current_comm_proto;
+		return NULL;
+
 	default:
 		return NULL;
 	}
@@ -348,6 +394,14 @@ static inline bool bpf_landlock_is_valid_subtype(
 	if (prog_subtype->landlock_hook.access & ~_LANDLOCK_FLAG_ACCESS_MASK)
 		return false;
 
+	/* check access flags */
+	if (prog_subtype->landlock_hook.access & LANDLOCK_FLAG_ACCESS_UPDATE &&
+			!capable(CAP_SYS_ADMIN))
+		return false;
+	if (prog_subtype->landlock_hook.access & LANDLOCK_FLAG_ACCESS_DEBUG &&
+			!capable(CAP_SYS_ADMIN))
+		return false;
+
 	return true;
 }
 
-- 
2.9.3


* [RFC v3 21/22] bpf,landlock: Add optional skb pointer in the Landlock context
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (19 preceding siblings ...)
  2016-09-14  7:24 ` [RFC v3 20/22] landlock: Add update and debug access flags Mickaël Salaün
@ 2016-09-14  7:24 ` Mickaël Salaün
  2016-09-14 21:20   ` Alexei Starovoitov
  2016-09-14  7:24 ` [RFC v3 22/22] samples/landlock: Add sandbox example Mickaël Salaün
  2016-09-14 14:36 ` [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing David Laight
  22 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

This is a proof of concept to expose optional values that could depend
on the process access rights.

There are two dedicated flags: LANDLOCK_FLAG_ACCESS_SKB_READ and
LANDLOCK_FLAG_ACCESS_SKB_WRITE. Each of them can be activated to access
eBPF functions manipulating an skb in a read or write way.
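
The context access rule for the new field can be sketched in user space
as follows. struct model_landlock_data copies the layout of struct
landlock_data from the bpf.h hunk below; model_check_layout() and
model_opt_skb_access_ok() are hypothetical names that only approximate
the opt_skb cases added to __is_valid_access().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the include/uapi/linux/bpf.h hunk of this patch. */
#define LANDLOCK_FLAG_ACCESS_SKB_READ	(1 << 2)
#define LANDLOCK_FLAG_ACCESS_SKB_WRITE	(1 << 3)

/* User-space copy of the struct landlock_data layout after this patch. */
struct model_landlock_data {
	uint32_t hook;
	uint16_t origin;
	uint16_t cookie;
	uint64_t args[6];
	uint64_t opt_skb;
};

/* Sanity check: opt_skb directly follows the six 64-bit hook arguments. */
static void model_check_layout(void)
{
	assert(offsetof(struct model_landlock_data, opt_skb) ==
	       offsetof(struct model_landlock_data, args) +
	       6 * sizeof(uint64_t));
}

/*
 * A load of opt_skb is accepted only as a full 64-bit access and only
 * if the program holds at least one of the two skb access flags.
 */
static bool model_opt_skb_access_ok(size_t off, size_t size, uint64_t access)
{
	if (off != offsetof(struct model_landlock_data, opt_skb))
		return false;
	if (size != sizeof(uint64_t))
		return false;
	return (access & (LANDLOCK_FLAG_ACCESS_SKB_READ |
			  LANDLOCK_FLAG_ACCESS_SKB_WRITE)) != 0;
}
```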

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sargun Dhillon <sargun@sargun.me>
---
 include/linux/bpf.h      |  2 ++
 include/uapi/linux/bpf.h |  7 ++++++-
 kernel/bpf/verifier.c    |  6 ++++++
 security/landlock/lsm.c  | 26 ++++++++++++++++++++++++++
 4 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f7325c17f720..218973777612 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -88,6 +88,7 @@ enum bpf_arg_type {
 
 	ARG_PTR_TO_STRUCT_FILE,		/* pointer to struct file */
 	ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS,	/* pointer to Landlock FS handle */
+	ARG_PTR_TO_STRUCT_SKB,		/* pointer to struct skb */
 };
 
 /* type of values returned from helper functions */
@@ -150,6 +151,7 @@ enum bpf_reg_type {
 	/* Landlock */
 	PTR_TO_STRUCT_FILE,
 	CONST_PTR_TO_LANDLOCK_HANDLE_FS,
+	PTR_TO_STRUCT_SKB,
 };
 
 struct bpf_prog;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 8cfc2de2ab76..7d9e56952ed9 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -586,7 +586,9 @@ enum landlock_hook_id {
 /* context of function access flags */
 #define LANDLOCK_FLAG_ACCESS_UPDATE	(1 << 0)
 #define LANDLOCK_FLAG_ACCESS_DEBUG	(1 << 1)
-#define _LANDLOCK_FLAG_ACCESS_MASK	((1ULL << 2) - 1)
+#define LANDLOCK_FLAG_ACCESS_SKB_READ	(1 << 2)
+#define LANDLOCK_FLAG_ACCESS_SKB_WRITE	(1 << 3)
+#define _LANDLOCK_FLAG_ACCESS_MASK	((1ULL << 4) - 1)
 
 /* Handle check flags */
 #define LANDLOCK_FLAG_FS_DENTRY		(1 << 0)
@@ -619,12 +621,15 @@ struct landlock_handle {
  * @args: LSM hook arguments, see include/linux/lsm_hooks.h for there
  *        description and the LANDLOCK_HOOK* definitions from
  *        security/landlock/lsm.c for their types.
+ * @opt_skb: optional skb pointer, accessible with the
+ *           LANDLOCK_FLAG_ACCESS_SKB_* flags for network-related hooks.
  */
 struct landlock_data {
 	__u32 hook; /* enum landlock_hook_id */
 	__u16 origin; /* LANDLOCK_FLAG_ORIGIN_* */
 	__u16 cookie; /* seccomp RET_LANDLOCK */
 	__u64 args[6];
+	__u64 opt_skb;
 };
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8d7b18574f5a..a95154c1a60f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -247,6 +247,7 @@ static const char * const reg_type_str[] = {
 	[PTR_TO_PACKET_END]	= "pkt_end",
 	[PTR_TO_STRUCT_FILE]	= "struct_file",
 	[CONST_PTR_TO_LANDLOCK_HANDLE_FS] = "landlock_handle_fs",
+	[PTR_TO_STRUCT_SKB]	= "struct_skb",
 };
 
 static void print_verifier_state(struct verifier_state *state)
@@ -559,6 +560,7 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
 	case CONST_PTR_TO_MAP:
 	case PTR_TO_STRUCT_FILE:
 	case CONST_PTR_TO_LANDLOCK_HANDLE_FS:
+	case PTR_TO_STRUCT_SKB:
 		return true;
 	default:
 		return false;
@@ -984,6 +986,10 @@ static int check_func_arg(struct verifier_env *env, u32 regno,
 		expected_type = CONST_PTR_TO_LANDLOCK_HANDLE_FS;
 		if (type != expected_type)
 			goto err_type;
+	} else if (arg_type == ARG_PTR_TO_STRUCT_SKB) {
+		expected_type = PTR_TO_STRUCT_SKB;
+		if (type != expected_type)
+			goto err_type;
 	} else if (arg_type == ARG_PTR_TO_STACK ||
 		   arg_type == ARG_PTR_TO_RAW_STACK) {
 		expected_type = PTR_TO_STACK;
diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c
index 56c45abe979c..8b0e6f0eb6b7 100644
--- a/security/landlock/lsm.c
+++ b/security/landlock/lsm.c
@@ -281,6 +281,7 @@ static bool __is_valid_access(int off, int size, enum bpf_access_type type,
 		break;
 	case offsetof(struct landlock_data, args[0]) ...
 			offsetof(struct landlock_data, args[5]):
+	case offsetof(struct landlock_data, opt_skb):
 		expected_size = sizeof(__u64);
 		break;
 	default:
@@ -299,6 +300,13 @@ static bool __is_valid_access(int off, int size, enum bpf_access_type type,
 		if (*reg_type == NOT_INIT)
 			return false;
 		break;
+	case offsetof(struct landlock_data, opt_skb):
+		if (!(prog_subtype->landlock_hook.access &
+				(LANDLOCK_FLAG_ACCESS_SKB_READ |
+				 LANDLOCK_FLAG_ACCESS_SKB_WRITE)))
+			return false;
+		*reg_type = PTR_TO_STRUCT_SKB;
+		break;
 	}
 
 	return true;
@@ -401,6 +409,24 @@ static inline bool bpf_landlock_is_valid_subtype(
 	if (prog_subtype->landlock_hook.access & LANDLOCK_FLAG_ACCESS_DEBUG &&
 			!capable(CAP_SYS_ADMIN))
 		return false;
+	/*
+	 * Capability checks must be enforced for every landlocked process.
+	 * To support user namespaces/capabilities, we must then check the
+	 * namespaces of a task before putting it in a landlocked cgroup.
+	 * This could be implemented in the future.
+	 */
+	if (prog_subtype->landlock_hook.access & LANDLOCK_FLAG_ACCESS_SKB_READ &&
+			!capable(CAP_NET_ADMIN))
+		return false;
+	/*
+	 * It is interesting to differentiate read and write access to be able
+	 * to securely delegate some work to unprivileged (and potentially
+	 * compromised/untrusted) processes. This different type of access can
+	 * be checked for function calls or context accesses.
+	 */
+	if (prog_subtype->landlock_hook.access & LANDLOCK_FLAG_ACCESS_SKB_WRITE &&
+			!capable(CAP_NET_ADMIN))
+		return false;
 
 	return true;
 }
-- 
2.9.3


* [RFC v3 22/22] samples/landlock: Add sandbox example
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (20 preceding siblings ...)
  2016-09-14  7:24 ` [RFC v3 21/22] bpf,landlock: Add optional skb pointer in the Landlock context Mickaël Salaün
@ 2016-09-14  7:24 ` Mickaël Salaün
  2016-09-14 21:24   ` Alexei Starovoitov
  2016-09-14 14:36 ` [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing David Laight
  22 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14  7:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	cgroups

Add a basic sandbox tool to create a process isolated from some parts of
the system. This can depend on the current cgroup.

Example with the current process hierarchy (seccomp):

  $ ls /home
  user1
  $ LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
      ./samples/landlock/sandbox /bin/sh -i
  Launching a new sandboxed process.
  $ ls /home
  ls: cannot open directory '/home': Permission denied

Example with a cgroup:

  $ mkdir /sys/fs/cgroup/sandboxed
  $ ls /home
  user1
  $ LANDLOCK_CGROUPS='/sys/fs/cgroup/sandboxed' \
      LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
      ./samples/landlock/sandbox
  Ready to sandbox with cgroups.
  $ ls /home
  user1
  $ echo $$ > /sys/fs/cgroup/sandboxed/cgroup.procs
  $ ls /home
  ls: cannot open directory '/home': Permission denied
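
Both environment variables in the examples above hold colon-separated
path lists. The following sketch shows one way to split such a list. It
is a variant of the sample's parse_path(), not the code from this patch:
model_split_paths() is a hypothetical name, it uses strchr() instead of
strsep(), and it checks the malloc() result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: split a writable,
 * colon-separated string in place. Returns the number of entries,
 * 0 for a NULL input, or -1 if the allocation fails. The caller
 * frees *list; the entries point into the original string.
 */
static int model_split_paths(char *env, const char ***list)
{
	int i, nb;
	char *cur, *sep;

	assert(list != NULL);
	*list = NULL;
	if (!env)
		return 0;

	/* one entry, plus one more per separator */
	nb = 1;
	for (cur = env; (cur = strchr(cur, ':')) != NULL; cur++)
		nb++;

	*list = malloc(nb * sizeof(**list));
	if (!*list)
		return -1;

	cur = env;
	for (i = 0; i < nb; i++) {
		(*list)[i] = cur;
		sep = strchr(cur, ':');
		if (!sep)
			break;
		/* terminate this entry and move past the separator */
		*sep = '\0';
		cur = sep + 1;
	}
	return nb;
}
```

As with strsep(), two consecutive colons yield an empty entry, so a
caller may want to skip empty strings before calling open().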

Changes since v2:
* use BPF_PROG_ATTACH for cgroup handling

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---
 samples/Makefile            |   2 +-
 samples/landlock/.gitignore |   1 +
 samples/landlock/Makefile   |  16 +++
 samples/landlock/sandbox.c  | 307 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 325 insertions(+), 1 deletion(-)
 create mode 100644 samples/landlock/.gitignore
 create mode 100644 samples/landlock/Makefile
 create mode 100644 samples/landlock/sandbox.c

diff --git a/samples/Makefile b/samples/Makefile
index 1a20169d85ac..a2dcd57ca7ac 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -2,4 +2,4 @@
 
 obj-$(CONFIG_SAMPLES)	+= kobject/ kprobes/ trace_events/ livepatch/ \
 			   hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/ \
-			   configfs/ connector/ v4l/ trace_printk/
+			   configfs/ connector/ v4l/ trace_printk/ landlock/
diff --git a/samples/landlock/.gitignore b/samples/landlock/.gitignore
new file mode 100644
index 000000000000..f6c6da930a30
--- /dev/null
+++ b/samples/landlock/.gitignore
@@ -0,0 +1 @@
+/sandbox
diff --git a/samples/landlock/Makefile b/samples/landlock/Makefile
new file mode 100644
index 000000000000..d1044b2afd27
--- /dev/null
+++ b/samples/landlock/Makefile
@@ -0,0 +1,16 @@
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
+
+hostprogs-$(CONFIG_SECURITY_LANDLOCK) := sandbox
+sandbox-objs := sandbox.o
+
+always := $(hostprogs-y)
+
+HOSTCFLAGS += -I$(objtree)/usr/include
+
+# Trick to allow make to be run from this directory
+all:
+	$(MAKE) -C ../../ $$PWD/
+
+clean:
+	$(MAKE) -C ../../ M=$$PWD clean
diff --git a/samples/landlock/sandbox.c b/samples/landlock/sandbox.c
new file mode 100644
index 000000000000..9d6ac00cdd23
--- /dev/null
+++ b/samples/landlock/sandbox.c
@@ -0,0 +1,307 @@
+/*
+ * Landlock LSM - Sandbox example
+ *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 3, as
+ * published by the Free Software Foundation.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h> /* open() */
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <linux/prctl.h>
+#include <linux/seccomp.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/prctl.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+
+#include "../../tools/include/linux/filter.h"
+
+#include "../bpf/libbpf.c"
+
+#ifndef seccomp
+static int seccomp(unsigned int op, unsigned int flags, void *args)
+{
+	errno = 0;
+	return syscall(__NR_seccomp, op, flags, args);
+}
+#endif
+
+static int landlock_prog_load(const struct bpf_insn *insns, int prog_len,
+		enum landlock_hook_id hook_id, __u64 access)
+{
+	union bpf_attr attr = {
+		.prog_type = BPF_PROG_TYPE_LANDLOCK,
+		.insns = ptr_to_u64((void *) insns),
+		.insn_cnt = prog_len / sizeof(struct bpf_insn),
+		.license = ptr_to_u64((void *) "GPL"),
+		.log_buf = ptr_to_u64(bpf_log_buf),
+		.log_size = LOG_BUF_SIZE,
+		.log_level = 1,
+		.prog_subtype.landlock_hook = {
+			.id = hook_id,
+			.origin = LANDLOCK_FLAG_ORIGIN_SECCOMP |
+				LANDLOCK_FLAG_ORIGIN_SYSCALL |
+				LANDLOCK_FLAG_ORIGIN_INTERRUPT,
+			.access = access,
+		},
+	};
+
+	/* assign one field outside of struct init to make sure any
+	 * padding is zero initialized
+	 */
+	attr.kern_version = 0;
+
+	bpf_log_buf[0] = 0;
+
+	return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+}
+
+#define ARRAY_SIZE(a)	(sizeof(a) / sizeof(a[0]))
+
+static int apply_sandbox(const char **allowed_paths, int path_nb, const char
+		**cgroup_paths, int cgroup_nb)
+{
+	__u32 key;
+	int i, ret = 0, map_fs = -1, offset;
+
+	/* set up the test sandbox */
+	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+		perror("prctl(no_new_priv)");
+		return 1;
+	}
+
+	/* register a new syscall filter */
+	struct sock_filter filter0[] = {
+		/* pass a cookie containing 5 to the LSM hook filter */
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_LANDLOCK | 5),
+	};
+	struct sock_fprog prog0 = {
+		.len = (unsigned short)ARRAY_SIZE(filter0),
+		.filter = filter0,
+	};
+	if (!cgroup_nb) {
+		if (seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog0)) {
+			perror("seccomp(set_filter)");
+			return 1;
+		}
+	}
+
+	if (path_nb) {
+		map_fs = bpf_create_map(BPF_MAP_TYPE_LANDLOCK_ARRAY,
+				sizeof(key), sizeof(struct landlock_handle),
+				10, 0);
+		if (map_fs < 0) {
+			fprintf(stderr, "bpf_create_map(fs): %s\n",
+					strerror(errno));
+			return 1;
+		}
+		for (key = 0; key < path_nb; key++) {
+			int fd = open(allowed_paths[key],
+					O_RDONLY | O_CLOEXEC);
+			if (fd < 0) {
+				fprintf(stderr, "open(fs: \"%s\"): %s\n",
+						allowed_paths[key],
+						strerror(errno));
+				return 1;
+			}
+			struct landlock_handle handle = {
+				.type = BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD,
+				.fd = (__u64)fd,
+			};
+
+			/* register a new LSM handle */
+			if (bpf_update_elem(map_fs, &key, &handle, BPF_ANY)) {
+				fprintf(stderr, "bpf_update_elem(fs: \"%s\"): %s\n",
+						allowed_paths[key],
+						strerror(errno));
+				close(fd);
+				return 1;
+			}
+			close(fd);
+		}
+	}
+
+	/* load a LSM filter hook (eBPF) */
+	struct bpf_insn hook_pre[] = {
+		/* save context */
+		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+
+#if 0
+		/* check our cookie (not used in this example) */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_6, offsetof(struct
+					landlock_data, cookie)),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 5, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+#endif
+	};
+	struct bpf_insn hook_path[] = {
+		/* specify an option, if any */
+		BPF_MOV32_IMM(BPF_REG_1, 0),
+		/* handles to compare with */
+		BPF_LD_MAP_FD(BPF_REG_2, map_fs),
+		BPF_MOV64_IMM(BPF_REG_3, BPF_MAP_ARRAY_OP_OR),
+		/* hook argument (struct file) */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_6, offsetof(struct
+					landlock_data, args[0])),
+		/* checker function */
+		BPF_EMIT_CALL(BPF_FUNC_landlock_cmp_fs_beneath_with_struct_file),
+
+		/* if the checked path is beneath the handle */
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+		/* allow anonymous mapping */
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_0, -ENOENT, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+		/* deny by default, if any error */
+		BPF_JMP_IMM(BPF_JGE, BPF_REG_0, 0, 2),
+		BPF_MOV32_IMM(BPF_REG_0, EACCES),
+		BPF_EXIT_INSN(),
+	};
+	struct bpf_insn hook_post[] = {
+		BPF_MOV32_IMM(BPF_REG_0, EACCES),
+		BPF_EXIT_INSN(),
+	};
+
+	unsigned long hook_size = sizeof(hook_pre) + sizeof(hook_path) *
+		(path_nb ? 1 : 0) + sizeof(hook_post);
+
+	struct bpf_insn *hook0 = malloc(hook_size);
+	if (!hook0) {
+		perror("malloc");
+		ret = 1;
+		goto err_alloc;
+	}
+	memcpy(hook0, hook_pre, sizeof(hook_pre));
+	offset = sizeof(hook_pre) / sizeof(hook0[0]);
+	if (path_nb) {
+		memcpy(hook0 + offset, hook_path, sizeof(hook_path));
+		offset += sizeof(hook_path) / sizeof(hook0[0]);
+	}
+	memcpy(hook0 + offset, hook_post, sizeof(hook_post));
+
+	/* TODO: handle inode_permission hook (e.g. chdir) */
+	enum landlock_hook_id hooks[] = {
+		LANDLOCK_HOOK_FILE_OPEN,
+		LANDLOCK_HOOK_FILE_PERMISSION,
+		LANDLOCK_HOOK_MMAP_FILE,
+	};
+	for (i = 0; i < ARRAY_SIZE(hooks) && !ret; i++) {
+		int bpf0 = landlock_prog_load(hook0, hook_size, hooks[i], 0);
+		if (bpf0 == -1) {
+			perror("prog_load");
+			fprintf(stderr, "%s", bpf_log_buf);
+			ret = 1;
+			break;
+		}
+		if (!cgroup_nb) {
+			if (seccomp(SECCOMP_SET_LANDLOCK_HOOK, 0, &bpf0)) {
+				perror("seccomp(set_hook)");
+				ret = 1;
+			}
+		} else {
+			for (key = 0; key < cgroup_nb && !ret; key++) {
+				int fd = open(cgroup_paths[key],
+						O_DIRECTORY | O_CLOEXEC);
+				if (fd < 0) {
+					fprintf(stderr, "open(cgroup: \"%s\"): %s\n",
+							cgroup_paths[key], strerror(errno));
+					ret = 1;
+					break;
+				}
+				if (bpf_prog_attach(bpf0, fd, BPF_CGROUP_LANDLOCK)) {
+					fprintf(stderr, "bpf_prog_attach(cgroup: \"%s\"): %s\n",
+							cgroup_paths[key], strerror(errno));
+					ret = 1;
+				}
+				close(fd);
+			}
+		}
+		close(bpf0);
+	}
+
+	free(hook0);
+err_alloc:
+	if (path_nb) {
+		close(map_fs);
+	}
+	return ret;
+}
+
+#define ENV_FS_PATH_NAME "LANDLOCK_ALLOWED"
+#define ENV_CGROUP_PATH_NAME "LANDLOCK_CGROUPS"
+#define ENV_PATH_TOKEN ":"
+
+static int parse_path(char *env_path, const char ***path_list)
+{
+	int i, path_nb = 0;
+
+	if (env_path) {
+		path_nb++;
+		for (i = 0; env_path[i]; i++) {
+			if (env_path[i] == ENV_PATH_TOKEN[0]) {
+				path_nb++;
+			}
+		}
+	}
+	*path_list = malloc(path_nb * sizeof(**path_list));
+	for (i = 0; i < path_nb; i++) {
+		(*path_list)[i] = strsep(&env_path, ENV_PATH_TOKEN);
+	}
+
+	return path_nb;
+}
+
+int main(int argc, char * const argv[], char * const *envp)
+{
+	char *cmd_path;
+	char *env_path_allowed, *env_path_cgroup;
+	int path_nb, cgroup_nb;
+	const char **sb_paths = NULL;
+	const char **cg_paths = NULL;
+	char * const *cmd_argv;
+
+	env_path_allowed = getenv(ENV_FS_PATH_NAME);
+	if (env_path_allowed)
+		env_path_allowed = strdup(env_path_allowed);
+	env_path_cgroup = getenv(ENV_CGROUP_PATH_NAME);
+	if (env_path_cgroup)
+		env_path_cgroup = strdup(env_path_cgroup);
+
+	path_nb = parse_path(env_path_allowed, &sb_paths);
+	cgroup_nb = parse_path(env_path_cgroup, &cg_paths);
+	if (argc < 2 && !cgroup_nb) {
+		fprintf(stderr, "usage: %s <cmd> [args]...\n\n", argv[0]);
+		fprintf(stderr, "Environment variables containing paths, each separated by a colon:\n");
+		fprintf(stderr, "* %s (whitelist of allowed files and directories)\n",
+				ENV_FS_PATH_NAME);
+		fprintf(stderr, "* %s (optional cgroup paths for which the sandbox is enabled)\n",
+				ENV_CGROUP_PATH_NAME);
+		fprintf(stderr, "\nexample:\n%s='/bin:/lib:/usr:/tmp:/proc/self/fd/0' %s /bin/sh -i\n",
+				ENV_FS_PATH_NAME, argv[0]);
+		return 1;
+	}
+	if (apply_sandbox(sb_paths, path_nb, cg_paths, cgroup_nb))
+		return 1;
+	if (!cgroup_nb) {
+		cmd_path = argv[1];
+		cmd_argv = argv + 1;
+		fprintf(stderr, "Launching a new sandboxed process.\n");
+		execve(cmd_path, cmd_argv, envp);
+		perror("execve");
+		return 1;
+	}
+	fprintf(stderr, "Ready to sandbox with cgroups.\n");
+	return 0;
+}
-- 
2.9.3
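
A note on parse_path() in the sample above: it stores the result of malloc()
without checking it, and it calls malloc(0) when the environment variable is
unset. A minimal sketch of a checked variant follows, assuming the same
colon-separated format. The name parse_path_checked() and the -1 return
convention are hypothetical and not part of the patch; strchr() is used
instead of strsep() only to keep the sketch within standard C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ENV_PATH_TOKEN ":"

/*
 * Split a writable, colon-separated string in place.
 * Returns the number of entries, 0 if env_path is NULL,
 * or -1 if the list cannot be allocated (hypothetical convention).
 */
static int parse_path_checked(char *env_path, const char ***path_list)
{
	int i, path_nb;
	char *cur = env_path;

	assert(path_list);
	*path_list = NULL;
	if (!env_path)
		return 0;

	/* one entry, plus one per separator */
	path_nb = 1;
	for (i = 0; env_path[i]; i++) {
		if (env_path[i] == ENV_PATH_TOKEN[0])
			path_nb++;
	}

	*path_list = malloc(path_nb * sizeof(**path_list));
	if (!*path_list)
		return -1;

	/* overwrite each separator with a NUL byte, as strsep() would */
	for (i = 0; i < path_nb; i++) {
		char *sep = strchr(cur, ENV_PATH_TOKEN[0]);

		(*path_list)[i] = cur;
		if (sep) {
			*sep = '\0';
			cur = sep + 1;
		}
	}
	return path_nb;
}
```

A caller would treat a negative return value as fatal before passing the list
on to apply_sandbox().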

^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing
  2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (21 preceding siblings ...)
  2016-09-14  7:24 ` [RFC v3 22/22] samples/landlock: Add sandbox example Mickaël Salaün
@ 2016-09-14 14:36 ` David Laight
  22 siblings, 0 replies; 76+ messages in thread
From: David Laight @ 2016-09-14 14:36 UTC (permalink / raw)
  To: 'Mickaël Salaün', linux-kernel
  Cc: Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups

From: Mickaël Salaün
> Sent: 14 September 2016 08:24
...
> ## Why is seccomp-filter not enough?
> 
> A seccomp filter can only access raw syscall arguments, which means that it
> is not possible to filter according to pointed-to data such as a file path.
> As demonstrated by the first version of this patch series, filtering at the
> syscall level is complicated (e.g. need to take care of race conditions).
> This is mainly because the access control checkpoints of the kernel are not
> at this high level but further down, at the LSM hooks level. The LSM hooks
> are designed to handle this kind of check. This series uses this approach to
> leverage the ability of unprivileged users to limit themselves.

You cannot validate file path parameters during syscall entry.
It can only be done after the user buffer has been read into kernel memory.
(ie you must only access the buffer once.)

This has nothing to do with where the kernel does any access checks,
and everything to do with the fact that another thread/process can
modify the buffer after you have validated it.

	David
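
The point above can be illustrated with a small user-space model (a sketch,
not kernel code): the caller-controlled buffer is copied exactly once, and
both the check and any later use operate on the private copy. The policy in
path_allowed() and all names here are hypothetical. In the kernel, the copy
is what helpers such as getname() provide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PATH_BUF_SIZE 64

/* Hypothetical policy: only paths under /tmp/ are allowed. */
static int path_allowed(const char *path)
{
	return strncmp(path, "/tmp/", 5) == 0;
}

/*
 * Read the shared buffer a single time into 'out', then validate 'out'.
 * Returns 0 if the snapshot is allowed, -1 otherwise. The shared buffer
 * may change at any moment; the snapshot cannot.
 */
static int fetch_and_check(const volatile char *shared,
			   char out[PATH_BUF_SIZE])
{
	size_t i;

	assert(shared && out);
	for (i = 0; i < PATH_BUF_SIZE - 1 && shared[i]; i++)
		out[i] = shared[i];
	out[i] = '\0';

	return path_allowed(out) ? 0 : -1;
}
```

If the check were done on the shared buffer and the path were read again
afterwards, another thread could swap the contents between the two reads.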

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
  2016-09-14  7:24 ` [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks Mickaël Salaün
@ 2016-09-14 18:27   ` Andy Lutomirski
  2016-09-14 22:11     ` Mickaël Salaün
  0 siblings, 1 reply; 76+ messages in thread
From: Andy Lutomirski @ 2016-09-14 18:27 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry, kernel-hardening, Linux API, LSM List,
	Network Development, open list:CONTROL GROUP (CGROUP)

On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
> Add a new flag CGRP_NO_NEW_PRIVS for each cgroup. This flag is initially
> set for all cgroups except the root. The flag is cleared when a new process
> without the no_new_privs flag is attached to the cgroup.
>
> If a cgroup is landlocked, then any new attempt, from an unprivileged
> process, to attach a process without no_new_privs to this cgroup will
> be denied.

Until and unless everyone can agree on a way to properly namespace,
delegate, etc cgroups, I think that trying to add unprivileged
semantics to cgroups is nuts.  Given the big thread about cgroup v2,
no-internal-tasks, etc, I just don't see how this approach can be
viable.

Can we try to make landlock work completely independently of cgroups
so that it doesn't get stuck and so that programs can use it without
worrying about cgroup v1 vs v2, interactions with cgroup managers,
cgroup managers that (supposedly?) will start migrating processes
around piecemeal and almost certainly blowing up landlock in the
process, etc?

I have no problem with looking at prototypes for how landlock +
cgroups would work, but I can't imagine the result being mergeable.

* Re: [RFC v3 19/22] landlock: Add interrupted origin
  2016-09-14  7:24 ` [RFC v3 19/22] landlock: Add interrupted origin Mickaël Salaün
@ 2016-09-14 18:29   ` Andy Lutomirski
  2016-09-14 22:14     ` Mickaël Salaün
  0 siblings, 1 reply; 76+ messages in thread
From: Andy Lutomirski @ 2016-09-14 18:29 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry, kernel-hardening, Linux API, LSM List,
	Network Development, open list:CONTROL GROUP (CGROUP)

On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
> This third origin of hook call should cover all possible trigger paths
> (e.g. page fault). Landlock eBPF programs can then take decisions
> accordingly.
>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Kees Cook <keescook@chromium.org>
> ---


>
> +       if (unlikely(in_interrupt())) {

IMO security hooks have no business being called from interrupts.
Aren't they all synchronous things done by tasks?  Interrupts are
driver things.

Are you trying to check for page faults and such?

--Andy

* Re: [RFC v3 11/22] seccomp,landlock: Handle Landlock hooks per process hierarchy
  2016-09-14  7:24 ` [RFC v3 11/22] seccomp,landlock: Handle Landlock hooks per process hierarchy Mickaël Salaün
@ 2016-09-14 18:43   ` Andy Lutomirski
  2016-09-14 22:34     ` Mickaël Salaün
  0 siblings, 1 reply; 76+ messages in thread
From: Andy Lutomirski @ 2016-09-14 18:43 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry, kernel-hardening, Linux API, LSM List,
	Network Development, open list:CONTROL GROUP (CGROUP),
	Andrew Morton

On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
> A Landlock program will be triggered according to its subtype/origin
> bitfield. The LANDLOCK_FLAG_ORIGIN_SECCOMP value will trigger the
> Landlock program when a seccomp filter returns RET_LANDLOCK.
> Moreover, it is possible to return a 16-bit cookie which will be
> readable by the Landlock programs in their context.

Are you envisioning that the filters will return RET_LANDLOCK most of
the time or rarely?  If it's most of the time, then maybe this could
be simplified a bit by unconditionally calling the landlock filter and
letting the landlock filter access a struct seccomp_data if needed.


>
> Only seccomp filters loaded from the same thread and before a Landlock
> program can trigger it through LANDLOCK_FLAG_ORIGIN_SECCOMP. Multiple
> Landlock programs can be triggered by one or more seccomp filters. This
> way, each RET_LANDLOCK (with specific cookie) will trigger all the
> allowed Landlock programs once.

This interface seems somewhat awkward.  Should we not have a way to
atomically install a whole pile of landlock filters and the associated
seccomp filter all at once?

--Andy

* Re: [RFC v3 03/22] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles
  2016-09-14  7:23 ` [RFC v3 03/22] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles Mickaël Salaün
@ 2016-09-14 18:51   ` Alexei Starovoitov
  2016-09-14 23:22     ` Mickaël Salaün
  2016-10-03 23:53   ` Kees Cook
  1 sibling, 1 reply; 76+ messages in thread
From: Alexei Starovoitov @ 2016-09-14 18:51 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups

On Wed, Sep 14, 2016 at 09:23:56AM +0200, Mickaël Salaün wrote:
> This new arraymap looks like a set and brings new properties:
> * strong typing of entries: the eBPF functions get the array type of
>   elements instead of CONST_PTR_TO_MAP (e.g.
>   CONST_PTR_TO_LANDLOCK_HANDLE_FS);
> * force sequential filling (i.e. replace or append-only update), which
>   allows quick browsing of all entries.
> 
> This strong typing is useful to statically check if the content of a map
> can be passed to an eBPF function. For example, Landlock uses it to store
> and manage kernel objects (e.g. struct file) instead of dealing with
> userland raw data. This improves efficiency and ensures that an eBPF
> program can only call functions with the right high-level arguments.
> 
> The enum bpf_map_handle_type lists low-level types (e.g.
> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) which are identified when
> updating a map entry (handle). These handle types are used to infer a
> high-level arraymap type, which is listed in enum bpf_map_array_type
> (e.g. BPF_MAP_ARRAY_TYPE_LANDLOCK_FS).
> 
> For now, this new arraymap is only used by Landlock LSM (cf. next
> commits) but it could be useful for other needs.
> 
> Changes since v2:
> * add a RLIMIT_NOFILE-based limit to the maximum number of arraymap
>   handle entries (suggested by Andy Lutomirski)
> * remove useless checks
> 
> Changes since v1:
> * arraymap of handles replace custom checker groups
> * simpler userland API
> 
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Kees Cook <keescook@chromium.org>
> Link: https://lkml.kernel.org/r/CALCETrWwTiz3kZTkEgOW24-DvhQq6LftwEXh77FD2G5o71yD7g@mail.gmail.com
> ---
>  include/linux/bpf.h      |  14 ++++
>  include/uapi/linux/bpf.h |  18 +++++
>  kernel/bpf/arraymap.c    | 203 +++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/bpf/verifier.c    |  12 ++-
>  4 files changed, 246 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index fa9a988400d9..eae4ce4542c1 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -13,6 +13,10 @@
>  #include <linux/percpu.h>
>  #include <linux/err.h>
>  
> +#ifdef CONFIG_SECURITY_LANDLOCK
> +#include <linux/fs.h> /* struct file */
> +#endif /* CONFIG_SECURITY_LANDLOCK */
> +
>  struct perf_event;
>  struct bpf_map;
>  
> @@ -38,6 +42,7 @@ struct bpf_map_ops {
>  struct bpf_map {
>  	atomic_t refcnt;
>  	enum bpf_map_type map_type;
> +	enum bpf_map_array_type map_array_type;
>  	u32 key_size;
>  	u32 value_size;
>  	u32 max_entries;
> @@ -187,6 +192,9 @@ struct bpf_array {
>  	 */
>  	enum bpf_prog_type owner_prog_type;
>  	bool owner_jited;
> +#ifdef CONFIG_SECURITY_LANDLOCK
> +	u32 n_entries;	/* number of entries in a handle array */
> +#endif /* CONFIG_SECURITY_LANDLOCK */
>  	union {
>  		char value[0] __aligned(8);
>  		void *ptrs[0] __aligned(8);
> @@ -194,6 +202,12 @@ struct bpf_array {
>  	};
>  };
>  
> +#ifdef CONFIG_SECURITY_LANDLOCK
> +struct map_landlock_handle {
> +	u32 type; /* enum bpf_map_handle_type */
> +};
> +#endif /* CONFIG_SECURITY_LANDLOCK */
> +
>  #define MAX_TAIL_CALL_CNT 32
>  
>  struct bpf_event_entry {
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 7cd36166f9b7..b68de57f7ab8 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -87,6 +87,15 @@ enum bpf_map_type {
>  	BPF_MAP_TYPE_PERCPU_ARRAY,
>  	BPF_MAP_TYPE_STACK_TRACE,
>  	BPF_MAP_TYPE_CGROUP_ARRAY,
> +	BPF_MAP_TYPE_LANDLOCK_ARRAY,
> +};
> +
> +enum bpf_map_array_type {
> +	BPF_MAP_ARRAY_TYPE_UNSPEC,
> +};
> +
> +enum bpf_map_handle_type {
> +	BPF_MAP_HANDLE_TYPE_UNSPEC,
>  };

I'm missing something. Why does it have to be special and have its own
fd array implementation?
Please take a look at how BPF_MAP_TYPE_PERF_EVENT_ARRAY,
BPF_MAP_TYPE_CGROUP_ARRAY and BPF_MAP_TYPE_PROG_ARRAY are done.
They all store objects that user space passes via FD into an array map.
I think the same model should apply here.

* Re: [RFC v3 07/22] landlock: Handle file comparisons
  2016-09-14  7:24 ` [RFC v3 07/22] landlock: Handle file comparisons Mickaël Salaün
@ 2016-09-14 19:07   ` Jann Horn
  2016-09-14 22:39     ` Mickaël Salaün
  2016-09-14 21:06   ` Alexei Starovoitov
  2016-10-03 23:30   ` Kees Cook
  2 siblings, 1 reply; 76+ messages in thread
From: Jann Horn @ 2016-09-14 19:07 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups


On Wed, Sep 14, 2016 at 09:24:00AM +0200, Mickaël Salaün wrote:
> Add eBPF functions to compare file system access with a Landlock file
> system handle:
> * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
>   This function compares the dentry, inode, device or mount point of
>   the currently accessed file with a reference handle.
> * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
>   This function allows an eBPF program to check if the currently accessed
>   file is the same as, or in the hierarchy of, a reference handle.
[...]
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index 94256597eacd..edaab4c87292 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -603,6 +605,9 @@ static void landlock_put_handle(struct map_landlock_handle *handle)
>  	enum bpf_map_handle_type handle_type = handle->type;
>  
>  	switch (handle_type) {
> +	case BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD:
> +		path_put(&handle->path);
> +		break;
>  	case BPF_MAP_HANDLE_TYPE_UNSPEC:
>  	default:
>  		WARN_ON(1);
[...]
> diff --git a/security/landlock/checker_fs.c b/security/landlock/checker_fs.c
> new file mode 100644
> index 000000000000..39eb85dc7d18
> --- /dev/null
> +++ b/security/landlock/checker_fs.c
[...]
> +static inline u64 bpf_landlock_cmp_fs_prop_with_struct_file(u64 r1_property,
> +		u64 r2_map, u64 r3_map_op, u64 r4_file, u64 r5)
> +{
> +	u8 property = (u8) r1_property;
> +	struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
> +	enum bpf_map_array_op map_op = r3_map_op;
> +	struct file *file = (struct file *) (unsigned long) r4_file;
> +	struct bpf_array *array = container_of(map, struct bpf_array, map);
> +	struct path *p1, *p2;
> +	struct map_landlock_handle *handle;
> +	int i;

Please don't use int when iterating over an array, use size_t.


> +	/* for now, only handle OP_OR */

Is "OP_OR" an appropriate name for something that ANDs the success of
checks?


[...]
> +	synchronize_rcu();

Can you put a comment here that explains what's going on?


> +	for (i = 0; i < array->n_entries; i++) {
> +		bool result_dentry = !(property & LANDLOCK_FLAG_FS_DENTRY);
> +		bool result_inode = !(property & LANDLOCK_FLAG_FS_INODE);
> +		bool result_device = !(property & LANDLOCK_FLAG_FS_DEVICE);
> +		bool result_mount = !(property & LANDLOCK_FLAG_FS_MOUNT);
> +
> +		handle = (struct map_landlock_handle *)
> +				(array->value + array->elem_size * i);
> +
> +		if (handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) {
> +			WARN_ON(1);
> +			return -EFAULT;
> +		}
> +		p1 = &handle->path;
> +
> +		if (!result_dentry && p1->dentry == p2->dentry)
> +			result_dentry = true;

Why is this safe? As far as I can tell, this is not in an RCU read-side
critical section (synchronize_rcu() was just called), and no lock has been
taken. What prevents someone from removing the arraymap entry while we're
looking at it? Am I missing something?


[...]
> +static inline u64 bpf_landlock_cmp_fs_beneath_with_struct_file(u64 r1_option,
> +		u64 r2_map, u64 r3_map_op, u64 r4_file, u64 r5)
> +{
> +	u8 option = (u8) r1_option;
> +	struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
> +	enum bpf_map_array_op map_op = r3_map_op;
> +	struct file *file = (struct file *) (unsigned long) r4_file;
> +	struct bpf_array *array = container_of(map, struct bpf_array, map);
> +	struct path *p1, *p2;
> +	struct map_landlock_handle *handle;
> +	int i;

As above, please use size_t.
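
To make the size_t request concrete: n_entries is declared as a u32 in the
patch, so an int index mixes signedness in the loop comparison. Below is a
minimal user-space sketch of the same loop shape with an unsigned index.
struct toy_array and count_tagged() are hypothetical stand-ins, not the
kernel's struct bpf_array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the array part of struct bpf_array. */
struct toy_array {
	uint32_t n_entries;	/* number of used entries */
	uint32_t elem_size;	/* size of one entry, in bytes */
	unsigned char value[64];
};

/* Count the entries whose first byte equals 'tag'. */
static size_t count_tagged(const struct toy_array *array, unsigned char tag)
{
	size_t i, hits = 0;

	/* the caller must not describe more data than the buffer holds */
	assert((size_t)array->n_entries * array->elem_size <=
	       sizeof(array->value));

	for (i = 0; i < array->n_entries; i++) {
		const unsigned char *elem =
			array->value + (size_t)array->elem_size * i;

		if (elem[0] == tag)
			hits++;
	}
	return hits;
}
```

With size_t, both the index and the offset computation stay unsigned and are
wide enough for any object size.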

* Re: [RFC v3 07/22] landlock: Handle file comparisons
  2016-09-14  7:24 ` [RFC v3 07/22] landlock: Handle file comparisons Mickaël Salaün
  2016-09-14 19:07   ` Jann Horn
@ 2016-09-14 21:06   ` Alexei Starovoitov
  2016-09-14 23:02     ` Mickaël Salaün
  2016-10-03 23:30   ` Kees Cook
  2 siblings, 1 reply; 76+ messages in thread
From: Alexei Starovoitov @ 2016-09-14 21:06 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups

On Wed, Sep 14, 2016 at 09:24:00AM +0200, Mickaël Salaün wrote:
> Add eBPF functions to compare file system access with a Landlock file
> system handle:
> * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
>   This function compares the dentry, inode, device or mount point of
>   the currently accessed file with a reference handle.
> * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
>   This function allows an eBPF program to check if the currently accessed
>   file is the same as, or in the hierarchy of, a reference handle.
> 
> The goal of a file system handle is to abstract kernel objects such as a
> struct file or a struct inode. Userland can create this kind of handle
> thanks to the BPF_MAP_UPDATE_ELEM command. The element is a struct
> landlock_handle containing the handle type (e.g.
> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. This could
> also be any description able to match a struct file or a struct inode
> (e.g. path or glob string).
> 
> Changes since v2:
> * add MNT_INTERNAL check to only add file handle from user-visible FS
>   (e.g. no anonymous inode)
> * replace struct file* with struct path* in map_landlock_handle
> * add BPF protos
> * fix bpf_landlock_cmp_fs_prop_with_struct_file()
> 
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: James Morris <james.l.morris@oracle.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Link: https://lkml.kernel.org/r/CALCETrWwTiz3kZTkEgOW24-DvhQq6LftwEXh77FD2G5o71yD7g@mail.gmail.com

Thanks for keeping the links to the previous discussion.
Long term it should help, though I worry we are already at the point
where there are too many outstanding issues to resolve before we
can proceed with reasonable code review.

> +/*
> + * bpf_landlock_cmp_fs_prop_with_struct_file
> + *
> + * Cf. include/uapi/linux/bpf.h
> + */
> +static inline u64 bpf_landlock_cmp_fs_prop_with_struct_file(u64 r1_property,
> +		u64 r2_map, u64 r3_map_op, u64 r4_file, u64 r5)
> +{
> +	u8 property = (u8) r1_property;
> +	struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
> +	enum bpf_map_array_op map_op = r3_map_op;
> +	struct file *file = (struct file *) (unsigned long) r4_file;

Please use the just-added BPF_CALL_ macros. They will help the readability of the above.

> +	struct bpf_array *array = container_of(map, struct bpf_array, map);
> +	struct path *p1, *p2;
> +	struct map_landlock_handle *handle;
> +	int i;
> +
> +	/* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS is an arraymap */
> +	if (unlikely(!map)) {
> +		WARN_ON(1);
> +		return -EFAULT;
> +	}
> +	if (unlikely(!file))
> +		return -ENOENT;
> +	if (unlikely((property | _LANDLOCK_FLAG_FS_MASK) != _LANDLOCK_FLAG_FS_MASK))
> +		return -EINVAL;
> +
> +	/* for now, only handle OP_OR */
> +	switch (map_op) {
> +	case BPF_MAP_ARRAY_OP_OR:
> +		break;
> +	case BPF_MAP_ARRAY_OP_UNSPEC:
> +	case BPF_MAP_ARRAY_OP_AND:
> +	case BPF_MAP_ARRAY_OP_XOR:
> +	default:
> +		return -EINVAL;
> +	}
> +	p2 = &file->f_path;
> +
> +	synchronize_rcu();

That is completely broken.
BPF programs execute under rcu_read_lock(), so synchronize_rcu() must not
be called from here.
Please enable CONFIG_PROVE_RCU and retest everything.
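
Why this cannot work can be shown with a toy, single-threaded model (a
sketch only; this is not kernel RCU and every toy_* name is hypothetical).
A grace period ends only once every reader that was already inside a
read-side section has left it, so a caller that is itself a reader would be
waiting on itself. The model reports that case as -EDEADLK instead of
hanging.

```c
#include <assert.h>
#include <errno.h>

/* Nesting depth of read-side sections entered by this (only) thread. */
static int toy_read_depth;

static void toy_rcu_read_lock(void)
{
	toy_read_depth++;
}

static void toy_rcu_read_unlock(void)
{
	assert(toy_read_depth > 0);
	toy_read_depth--;
}

/*
 * Model of "wait for all pre-existing readers to finish".
 * Returns 0 if the wait could complete, or -EDEADLK if the caller is
 * still inside a read-side section and would wait on itself forever.
 */
static int toy_synchronize_rcu(void)
{
	if (toy_read_depth > 0)
		return -EDEADLK;
	return 0;
}

/* Model of a helper called by a program that runs under the read lock. */
static int toy_helper_in_reader(void)
{
	int ret;

	toy_rcu_read_lock();		/* taken by the program's caller */
	ret = toy_synchronize_rcu();	/* the call made by the helper */
	toy_rcu_read_unlock();
	return ret;
}
```

In this model the waiting belongs to the updater, outside any read-side
section; the reader only dereferences the entry while the lock is held.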

For the next RFC, I would suggest posting only the minimal 7 patches up to
this point, with a simple example that demonstrates the use case.
I would also avoid all the unpriv stuff and all of seccomp for the next RFC,
otherwise I don't think we can realistically make forward progress, since
there are too many issues raised in the subsequent patches.

The common part that is mergeable is the prog's subtype extension to
the verifier, which can be used for better tracing and is the common
piece of infra needed for both the landlock and checmate LSMs
(which must be one LSM anyway).

* Re: [RFC v3 14/22] bpf/cgroup: Make cgroup_bpf_update() return an error code
  2016-09-14  7:24 ` [RFC v3 14/22] bpf/cgroup: Make cgroup_bpf_update() return an error code Mickaël Salaün
@ 2016-09-14 21:16   ` Alexei Starovoitov
  0 siblings, 0 replies; 76+ messages in thread
From: Alexei Starovoitov @ 2016-09-14 21:16 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups

On Wed, Sep 14, 2016 at 09:24:07AM +0200, Mickaël Salaün wrote:
> This will be useful to support Landlock for the next commits.
> 
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Daniel Mack <daniel@zonque.org>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Tejun Heo <tj@kernel.org>

I think this is good to do regardless. Sooner or later cgroup_bpf_update
will start rejecting the prog attach, e.g. with the flag we discussed
that would disallow processes lower in the cgroup hierarchy from installing
their own bpf programs.
It will be a minimal change to bpf_prog_attach() error handling,
but will greatly help Mickael to build stuff on top.
DanielM, can you refactor your patch to do that from the start?

Thanks!

> ---
>  include/linux/bpf-cgroup.h |  4 ++--
>  kernel/bpf/cgroup.c        |  3 ++-
>  kernel/bpf/syscall.c       | 10 ++++++----
>  kernel/cgroup.c            |  6 ++++--
>  4 files changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index 2234042d7f61..6cca7924ee17 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -31,13 +31,13 @@ struct cgroup_bpf {
>  void cgroup_bpf_put(struct cgroup *cgrp);
>  void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent);
>  
> -void __cgroup_bpf_update(struct cgroup *cgrp,
> +int __cgroup_bpf_update(struct cgroup *cgrp,
>  			 struct cgroup *parent,
>  			 struct bpf_prog *prog,
>  			 enum bpf_attach_type type);
>  
>  /* Wrapper for __cgroup_bpf_update() protected by cgroup_mutex */
> -void cgroup_bpf_update(struct cgroup *cgrp,
> +int cgroup_bpf_update(struct cgroup *cgrp,
>  		       struct bpf_prog *prog,
>  		       enum bpf_attach_type type);
>  
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 782878ec4f2d..7b75fa692617 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -83,7 +83,7 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent)
>   *
>   * Must be called with cgroup_mutex held.
>   */
> -void __cgroup_bpf_update(struct cgroup *cgrp,
> +int __cgroup_bpf_update(struct cgroup *cgrp,
>  			 struct cgroup *parent,
>  			 struct bpf_prog *prog,
>  			 enum bpf_attach_type type)
> @@ -117,6 +117,7 @@ void __cgroup_bpf_update(struct cgroup *cgrp,
>  		bpf_prog_put(old_pinned.prog);
>  		static_branch_dec(&cgroup_bpf_enabled_key);
>  	}
> +	return 0;
>  }
>  
>  /**
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 45a91d511119..c978f2d9a1b3 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -831,6 +831,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
>  {
>  	struct bpf_prog *prog;
>  	struct cgroup *cgrp;
> +	int result;
>  
>  	if (!capable(CAP_NET_ADMIN))
>  		return -EPERM;
> @@ -858,10 +859,10 @@ static int bpf_prog_attach(const union bpf_attr *attr)
>  		return PTR_ERR(cgrp);
>  	}
>  
> -	cgroup_bpf_update(cgrp, prog, attr->attach_type);
> +	result = cgroup_bpf_update(cgrp, prog, attr->attach_type);
>  	cgroup_put(cgrp);
>  
> -	return 0;
> +	return result;
>  }
>  
>  #define BPF_PROG_DETACH_LAST_FIELD attach_type
> @@ -869,6 +870,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
>  static int bpf_prog_detach(const union bpf_attr *attr)
>  {
>  	struct cgroup *cgrp;
> +	int result = 0;
>  
>  	if (!capable(CAP_NET_ADMIN))
>  		return -EPERM;
> @@ -883,7 +885,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
>  		if (IS_ERR(cgrp))
>  			return PTR_ERR(cgrp);
>  
> -		cgroup_bpf_update(cgrp, NULL, attr->attach_type);
> +		result = cgroup_bpf_update(cgrp, NULL, attr->attach_type);
>  		cgroup_put(cgrp);
>  		break;
>  
> @@ -891,7 +893,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
>  		return -EINVAL;
>  	}
>  
> -	return 0;
> +	return result;
>  }
>  #endif /* CONFIG_CGROUP_BPF */
>  
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 87324ce481b1..48b650a640a9 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -6450,15 +6450,17 @@ static __init int cgroup_namespaces_init(void)
>  subsys_initcall(cgroup_namespaces_init);
>  
>  #ifdef CONFIG_CGROUP_BPF
> -void cgroup_bpf_update(struct cgroup *cgrp,
> +int cgroup_bpf_update(struct cgroup *cgrp,
>  		       struct bpf_prog *prog,
>  		       enum bpf_attach_type type)
>  {
>  	struct cgroup *parent = cgroup_parent(cgrp);
> +	int result;
>  
>  	mutex_lock(&cgroup_mutex);
> -	__cgroup_bpf_update(cgrp, parent, prog, type);
> +	result = __cgroup_bpf_update(cgrp, parent, prog, type);
>  	mutex_unlock(&cgroup_mutex);
> +	return result;
>  }
>  #endif /* CONFIG_CGROUP_BPF */
>  
> -- 
> 2.9.3
> 

* Re: [RFC v3 21/22] bpf,landlock: Add optional skb pointer in the Landlock context
  2016-09-14  7:24 ` [RFC v3 21/22] bpf,landlock: Add optional skb pointer in the Landlock context Mickaël Salaün
@ 2016-09-14 21:20   ` Alexei Starovoitov
  2016-09-14 22:46     ` Mickaël Salaün
  0 siblings, 1 reply; 76+ messages in thread
From: Alexei Starovoitov @ 2016-09-14 21:20 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups

On Wed, Sep 14, 2016 at 09:24:14AM +0200, Mickaël Salaün wrote:
> This is a proof of concept to expose optional values that could depend
> on the process access rights.
> 
> There are two dedicated flags: LANDLOCK_FLAG_ACCESS_SKB_READ and
> LANDLOCK_FLAG_ACCESS_SKB_WRITE. Each of them can be activated to access
> eBPF functions manipulating a skb in a read or write way.
> 
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
...
>  /* Handle check flags */
>  #define LANDLOCK_FLAG_FS_DENTRY		(1 << 0)
> @@ -619,12 +621,15 @@ struct landlock_handle {
>   * @args: LSM hook arguments, see include/linux/lsm_hooks.h for there
>   *        description and the LANDLOCK_HOOK* definitions from
>   *        security/landlock/lsm.c for their types.
> + * @opt_skb: optional skb pointer, accessible with the
> + *           LANDLOCK_FLAG_ACCESS_SKB_* flags for network-related hooks.
>   */
>  struct landlock_data {
>  	__u32 hook; /* enum landlock_hook_id */
>  	__u16 origin; /* LANDLOCK_FLAG_ORIGIN_* */
>  	__u16 cookie; /* seccomp RET_LANDLOCK */
>  	__u64 args[6];
> +	__u64 opt_skb;
>  };

missing something here.
This patch doesn't make use of it.
That's something for the future?
How that field will be populated?
Why make it different vs the rest of args[6]?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 22/22] samples/landlock: Add sandbox example
  2016-09-14  7:24 ` [RFC v3 22/22] samples/landlock: Add sandbox example Mickaël Salaün
@ 2016-09-14 21:24   ` Alexei Starovoitov
  0 siblings, 0 replies; 76+ messages in thread
From: Alexei Starovoitov @ 2016-09-14 21:24 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups

On Wed, Sep 14, 2016 at 09:24:15AM +0200, Mickaël Salaün wrote:
> Add a basic sandbox tool to create a process isolated from some part of
> the system. This can depend of the current cgroup.
> 
> Example with the current process hierarchy (seccomp):
> 
>   $ ls /home
>   user1
>   $ LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
>       ./samples/landlock/sandbox /bin/sh -i
>   Launching a new sandboxed process.
>   $ ls /home
>   ls: cannot open directory '/home': Permission denied
> 
> Example with a cgroup:
> 
>   $ mkdir /sys/fs/cgroup/sandboxed
>   $ ls /home
>   user1
>   $ LANDLOCK_CGROUPS='/sys/fs/cgroup/sandboxed' \
>       LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
>       ./samples/landlock/sandbox
>   Ready to sandbox with cgroups.
>   $ ls /home
>   user1
>   $ echo $$ > /sys/fs/cgroup/sandboxed/cgroup.procs
>   $ ls /home
>   ls: cannot open directory '/home': Permission denied
> 
> Changes since v2:
> * use BPF_PROG_ATTACH for cgroup handling
> 
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
...
> +	struct bpf_insn hook_path[] = {
> +		/* specify an option, if any */
> +		BPF_MOV32_IMM(BPF_REG_1, 0),
> +		/* handles to compare with */
> +		BPF_LD_MAP_FD(BPF_REG_2, map_fs),
> +		BPF_MOV64_IMM(BPF_REG_3, BPF_MAP_ARRAY_OP_OR),
> +		/* hook argument (struct file) */
> +		BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_6, offsetof(struct
> +					landlock_data, args[0])),
> +		/* checker function */
> +		BPF_EMIT_CALL(BPF_FUNC_landlock_cmp_fs_beneath_with_struct_file),

the example is sweet!
Since only that helper is used, could you skip the other one
from the patches (for now) ?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 17/22] cgroup: Add access check for cgroup_get_from_fd()
  2016-09-14  7:24 ` [RFC v3 17/22] cgroup: Add access check for cgroup_get_from_fd() Mickaël Salaün
@ 2016-09-14 22:06   ` Mickaël Salaün
  0 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14 22:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups

[-- Attachment #1.1: Type: text/plain, Size: 2103 bytes --]


On 14/09/2016 09:24, Mickaël Salaün wrote:
> Add security access check for cgroup backed FD. The "cgroup.procs" file
> of the corresponding cgroup must be readable to identify the cgroup, and
> writable to prove that the current process can manage this cgroup (e.g.
> through delegation). This is similar to the check done by
> cgroup_procs_write_permission().
> 
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Daniel Mack <daniel@zonque.org>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Tejun Heo <tj@kernel.org>
> ---
>  include/linux/cgroup.h |  2 +-
>  kernel/bpf/arraymap.c  |  2 +-
>  kernel/bpf/syscall.c   |  6 +++---
>  kernel/cgroup.c        | 16 +++++++++++++++-
>  4 files changed, 20 insertions(+), 6 deletions(-)
...
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 48b650a640a9..3bbaf3f02ed2 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -6241,17 +6241,20 @@ EXPORT_SYMBOL_GPL(cgroup_get_from_path);
>  /**
>   * cgroup_get_from_fd - get a cgroup pointer from a fd
>   * @fd: fd obtained by open(cgroup2_dir)
> + * @access_mask: contains the permission mask
>   *
>   * Find the cgroup from a fd which should be obtained
>   * by opening a cgroup directory.  Returns a pointer to the
>   * cgroup on success. ERR_PTR is returned if the cgroup
>   * cannot be found.
>   */
> -struct cgroup *cgroup_get_from_fd(int fd)
> +struct cgroup *cgroup_get_from_fd(int fd, int access_mask)
>  {
>  	struct cgroup_subsys_state *css;
>  	struct cgroup *cgrp;
>  	struct file *f;
> +	struct inode *inode;
> +	int ret;
>  
>  	f = fget_raw(fd);
>  	if (!f)
> @@ -6268,6 +6271,17 @@ struct cgroup *cgroup_get_from_fd(int fd)
>  		return ERR_PTR(-EBADF);
>  	}
>  
> +	ret = -ENOMEM;
> +	inode = kernfs_get_inode(f->f_path.dentry->d_sb, cgrp->procs_file.kn);

I forgot to properly move fput(f) after this line… This will be fixed.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
  2016-09-14 18:27   ` Andy Lutomirski
@ 2016-09-14 22:11     ` Mickaël Salaün
  2016-09-15  1:25       ` Andy Lutomirski
  0 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14 22:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry, kernel-hardening, Linux API, LSM List,
	Network Development, open list:CONTROL GROUP (CGROUP)

[-- Attachment #1.1: Type: text/plain, Size: 2025 bytes --]


On 14/09/2016 20:27, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
>> Add a new flag CGRP_NO_NEW_PRIVS for each cgroup. This flag is initially
>> set for all cgroup except the root. The flag is clear when a new process
>> without the no_new_privs flags is attached to the cgroup.
>>
>> If a cgroup is landlocked, then any new attempt, from an unprivileged
>> process, to attach a process without no_new_privs to this cgroup will
>> be denied.
> 
> Until and unless everyone can agree on a way to properly namespace,
> delegate, etc cgroups, I think that trying to add unprivileged
> semantics to cgroups is nuts.  Given the big thread about cgroup v2,
> no-internal-tasks, etc, I just don't see how this approach can be
> viable.

As far as I can tell, the no_new_privs flag of a task is not related to
namespaces. The CGRP_NO_NEW_PRIVS flag is only a cache to quickly access
the no_new_privs property of *tasks* in a cgroup. The semantics are unchanged.
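
A rough user-space model of that caching behaviour, for illustration only.
The struct names, the cgroup_attach() helper and the -EPERM return value are
assumptions and are not taken from the patch; a privileged attach is assumed
to be allowed and to clear the cached flag:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel objects discussed above. */
struct cgroup_model {
	bool no_new_privs;	/* cache: every attached task has no_new_privs */
	bool landlocked;	/* a Landlock program is attached to the cgroup */
};

struct task_model {
	bool no_new_privs;
};

/*
 * Attach a task to a cgroup. A task without no_new_privs clears the cached
 * flag, unless the cgroup is landlocked and the caller is unprivileged, in
 * which case the attach is refused.
 */
static int cgroup_attach(struct cgroup_model *cg, const struct task_model *t,
			 bool privileged)
{
	assert(cg && t);
	if (!t->no_new_privs) {
		if (cg->landlocked && !privileged)
			return -EPERM;	/* assumed error code */
		cg->no_new_privs = false;
	}
	return 0;
}
```

The flag only summarizes the state of the attached tasks; it does not change
what no_new_privs means for any individual task.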

Using cgroup is optional, any task could use the seccomp-based
landlocking instead. However, for those that want/need to manage a
security policy in a more dynamic way, using cgroups may make sense.

I thought cgroup delegation was OK in v2, isn't that the case? Do you
have some links?

> 
> Can we try to make landlock work completely independently of cgroups
> so that it doesn't get stuck and so that programs can use it without
> worrying about cgroup v1 vs v2, interactions with cgroup managers,
> cgroup managers that (supposedly?) will start migrating processes
> around piecemeal and almost certainly blowing up landlock in the
> process, etc?

This RFC handles both the cgroup and seccomp approaches in a similar way. I
don't see why building on top of cgroup v2 is a problem. Are there
security issues with delegation?

> 
> I have no problem with looking at prototypes for how landlock +
> cgroups would work, but I can't imagine the result being mergeable.
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 19/22] landlock: Add interrupted origin
  2016-09-14 18:29   ` Andy Lutomirski
@ 2016-09-14 22:14     ` Mickaël Salaün
  2016-09-15  1:19       ` Andy Lutomirski
  0 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14 22:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry, kernel-hardening, Linux API, LSM List,
	Network Development, open list:CONTROL GROUP (CGROUP)

[-- Attachment #1.1: Type: text/plain, Size: 891 bytes --]


On 14/09/2016 20:29, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
>> This third origin of hook call should cover all possible trigger paths
>> (e.g. page fault). Landlock eBPF programs can then take decisions
>> accordingly.
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> Cc: Alexei Starovoitov <ast@kernel.org>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>> Cc: Kees Cook <keescook@chromium.org>
>> ---
> 
> 
>>
>> +       if (unlikely(in_interrupt())) {
> 
> IMO security hooks have no business being called from interrupts.
> Aren't they all synchronous things done by tasks?  Interrupts are
> driver things.
> 
> Are you trying to check for page faults and such?

Yes, that was the idea you put in my mind. I am not sure how to deal with
this.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 11/22] seccomp,landlock: Handle Landlock hooks per process hierarchy
  2016-09-14 18:43   ` Andy Lutomirski
@ 2016-09-14 22:34     ` Mickaël Salaün
  2016-10-03 23:52       ` Kees Cook
  0 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14 22:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry, kernel-hardening, Linux API, LSM List,
	Network Development, open list:CONTROL GROUP (CGROUP),
	Andrew Morton

[-- Attachment #1.1: Type: text/plain, Size: 1920 bytes --]


On 14/09/2016 20:43, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
>> A Landlock program will be triggered according to its subtype/origin
>> bitfield. The LANDLOCK_FLAG_ORIGIN_SECCOMP value will trigger the
>> Landlock program when a seccomp filter will return RET_LANDLOCK.
>> Moreover, it is possible to return a 16-bit cookie which will be
>> readable by the Landlock programs in its context.
> 
> Are you envisioning that the filters will return RET_LANDLOCK most of
> the time or rarely?  If it's most of the time, then maybe this could
> be simplified a bit by unconditionally calling the landlock filter and
> letting the landlock filter access a struct seccomp_data if needed.

Exposing seccomp_data in a Landlock context may be a good idea. The main
implication is that Landlock programs may then be architecture specific
(if dealing with data) as seccomp filters are. Another point is that it
removes any direct binding between seccomp filters and Landlock programs.
I will try this (simpler) approach.

> 
>>
>> Only seccomp filters loaded from the same thread and before a Landlock
>> program can trigger it through LANDLOCK_FLAG_ORIGIN_SECCOMP. Multiple
>> Landlock programs can be triggered by one or more seccomp filters. This
>> way, each RET_LANDLOCK (with specific cookie) will trigger all the
>> allowed Landlock programs once.
> 
> This interface seems somewhat awkward.  Should we not have a way to
> atomically install a whole pile of landlock filters and associated
> seccomp filter all at once?

I can change the seccomp(2) use in this way: instead of loading a
Landlock program, (atomically) load an array of Landlock programs.

However, exposing seccomp_data to Landlock programs looks like a better
way to deal with it. This does not need to manage an array of Landlock
programs.

 Mickaël


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 07/22] landlock: Handle file comparisons
  2016-09-14 19:07   ` Jann Horn
@ 2016-09-14 22:39     ` Mickaël Salaün
  0 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14 22:39 UTC (permalink / raw)
  To: Jann Horn
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups

[-- Attachment #1.1: Type: text/plain, Size: 3266 bytes --]


On 14/09/2016 21:07, Jann Horn wrote:
> On Wed, Sep 14, 2016 at 09:24:00AM +0200, Mickaël Salaün wrote:
>> Add eBPF functions to compare file system access with a Landlock file
>> system handle:
>> * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
>>   This function allows to compare the dentry, inode, device or mount
>>   point of the currently accessed file, with a reference handle.
>> * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
>>   This function allows an eBPF program to check if the current accessed
>>   file is the same or in the hierarchy of a reference handle.
> [...]
>> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
>> index 94256597eacd..edaab4c87292 100644
>> --- a/kernel/bpf/arraymap.c
>> +++ b/kernel/bpf/arraymap.c
>> @@ -603,6 +605,9 @@ static void landlock_put_handle(struct map_landlock_handle *handle)
>>  	enum bpf_map_handle_type handle_type = handle->type;
>>  
>>  	switch (handle_type) {
>> +	case BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD:
>> +		path_put(&handle->path);
>> +		break;
>>  	case BPF_MAP_HANDLE_TYPE_UNSPEC:
>>  	default:
>>  		WARN_ON(1);
> [...]
>> diff --git a/security/landlock/checker_fs.c b/security/landlock/checker_fs.c
>> new file mode 100644
>> index 000000000000..39eb85dc7d18
>> --- /dev/null
>> +++ b/security/landlock/checker_fs.c
> [...]
>> +static inline u64 bpf_landlock_cmp_fs_prop_with_struct_file(u64 r1_property,
>> +		u64 r2_map, u64 r3_map_op, u64 r4_file, u64 r5)
>> +{
>> +	u8 property = (u8) r1_property;
>> +	struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
>> +	enum bpf_map_array_op map_op = r3_map_op;
>> +	struct file *file = (struct file *) (unsigned long) r4_file;
>> +	struct bpf_array *array = container_of(map, struct bpf_array, map);
>> +	struct path *p1, *p2;
>> +	struct map_landlock_handle *handle;
>> +	int i;
> 
> Please don't use int when iterating over an array, use size_t.

OK, I will use size_t.

> 
> 
>> +	/* for now, only handle OP_OR */
> 
> Is "OP_OR" an appropriate name for something that ANDs the success of
> checks?
> 
> 
> [...]
>> +	synchronize_rcu();
> 
> Can you put a comment here that explains what's going on?

Hum, this should not be here.

> 
> 
>> +	for (i = 0; i < array->n_entries; i++) {
>> +		bool result_dentry = !(property & LANDLOCK_FLAG_FS_DENTRY);
>> +		bool result_inode = !(property & LANDLOCK_FLAG_FS_INODE);
>> +		bool result_device = !(property & LANDLOCK_FLAG_FS_DEVICE);
>> +		bool result_mount = !(property & LANDLOCK_FLAG_FS_MOUNT);
>> +
>> +		handle = (struct map_landlock_handle *)
>> +				(array->value + array->elem_size * i);
>> +
>> +		if (handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) {
>> +			WARN_ON(1);
>> +			return -EFAULT;
>> +		}
>> +		p1 = &handle->path;
>> +
>> +		if (!result_dentry && p1->dentry == p2->dentry)
>> +			result_dentry = true;
> 
> Why is this safe? As far as I can tell, this is not in an RCU read-side
> critical section (synchronize_rcu() was just called), and no lock has been
> taken. What prevents someone from removing the arraymap entry while we're
> looking at it? Am I missing something?

I will try to properly deal with RCU.
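
On the OP_OR naming question above, the comparison can be modelled in user
space as follows. This is only a sketch under assumptions: the quoted loop is
truncated, so how the per-property results are combined is a guess; struct
fs_ref and both helpers are hypothetical; and only the FLAG_FS_DENTRY value
(1 << 0) appears in the series, the other flag values are made up. In this
model the requested properties are ANDed within one handle, and the handles
of the array are ORed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FLAG_FS_DENTRY	(1 << 0)
#define FLAG_FS_INODE	(1 << 1)	/* assumed value */
#define FLAG_FS_DEVICE	(1 << 2)	/* assumed value */
#define FLAG_FS_MOUNT	(1 << 3)	/* assumed value */

/* Hypothetical identifiers standing in for dentry/inode/device/mount. */
struct fs_ref {
	unsigned long dentry, inode, device, mount;
};

/* Every requested property must match; unrequested ones are ignored. */
static bool ref_matches(unsigned int property, const struct fs_ref *handle,
			const struct fs_ref *file)
{
	bool ok_dentry = !(property & FLAG_FS_DENTRY) ||
			 handle->dentry == file->dentry;
	bool ok_inode = !(property & FLAG_FS_INODE) ||
			handle->inode == file->inode;
	bool ok_device = !(property & FLAG_FS_DEVICE) ||
			 handle->device == file->device;
	bool ok_mount = !(property & FLAG_FS_MOUNT) ||
			handle->mount == file->mount;

	return ok_dentry && ok_inode && ok_device && ok_mount;
}

/* "OR" over the array: one matching handle is enough. */
static bool any_handle_matches(unsigned int property,
			       const struct fs_ref *handles, size_t n,
			       const struct fs_ref *file)
{
	size_t i;	/* size_t index, as requested in the review */

	assert(handles || n == 0);
	for (i = 0; i < n; i++)
		if (ref_matches(property, &handles[i], file))
			return true;
	return false;
}
```

Note that this model says nothing about locking; the RCU question is separate
from the naming one.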


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 21/22] bpf,landlock: Add optional skb pointer in the Landlock context
  2016-09-14 21:20   ` Alexei Starovoitov
@ 2016-09-14 22:46     ` Mickaël Salaün
  0 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14 22:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups

[-- Attachment #1.1: Type: text/plain, Size: 1626 bytes --]


On 14/09/2016 23:20, Alexei Starovoitov wrote:
> On Wed, Sep 14, 2016 at 09:24:14AM +0200, Mickaël Salaün wrote:
>> This is a proof of concept to expose optional values that could depend
>> of the process access rights.
>>
>> There is two dedicated flags: LANDLOCK_FLAG_ACCESS_SKB_READ and
>> LANDLOCK_FLAG_ACCESS_SKB_WRITE. Each of them can be activated to access
>> eBPF functions manipulating a skb in a read or write way.
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ...
>>  /* Handle check flags */
>>  #define LANDLOCK_FLAG_FS_DENTRY		(1 << 0)
>> @@ -619,12 +621,15 @@ struct landlock_handle {
>>   * @args: LSM hook arguments, see include/linux/lsm_hooks.h for there
>>   *        description and the LANDLOCK_HOOK* definitions from
>>   *        security/landlock/lsm.c for their types.
>> + * @opt_skb: optional skb pointer, accessible with the
>> + *           LANDLOCK_FLAG_ACCESS_SKB_* flags for network-related hooks.
>>   */
>>  struct landlock_data {
>>  	__u32 hook; /* enum landlock_hook_id */
>>  	__u16 origin; /* LANDLOCK_FLAG_ORIGIN_* */
>>  	__u16 cookie; /* seccomp RET_LANDLOCK */
>>  	__u64 args[6];
>> +	__u64 opt_skb;
>>  };
> 
> missing something here.
> This patch doesn't make use of it.
> That's something for the future?
> How that field will be populated?
> Why make it different vs the rest of args[6]?
> 
> 

I don't use this code; its only purpose is to show how to deal with
fine-grained privileges of Landlock programs (to allow Sargun to add his
custom helpers from Checmate). However, this optional field may be part
of args[6].


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 07/22] landlock: Handle file comparisons
  2016-09-14 21:06   ` Alexei Starovoitov
@ 2016-09-14 23:02     ` Mickaël Salaün
  2016-09-14 23:24       ` Alexei Starovoitov
  0 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14 23:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups

[-- Attachment #1.1: Type: text/plain, Size: 4653 bytes --]



On 14/09/2016 23:06, Alexei Starovoitov wrote:
> On Wed, Sep 14, 2016 at 09:24:00AM +0200, Mickaël Salaün wrote:
>> Add eBPF functions to compare file system access with a Landlock file
>> system handle:
>> * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
>>   This function allows to compare the dentry, inode, device or mount
>>   point of the currently accessed file, with a reference handle.
>> * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
>>   This function allows an eBPF program to check if the current accessed
>>   file is the same or in the hierarchy of a reference handle.
>>
>> The goal of file system handle is to abstract kernel objects such as a
>> struct file or a struct inode. Userland can create this kind of handle
>> thanks to the BPF_MAP_UPDATE_ELEM command. The element is a struct
>> landlock_handle containing the handle type (e.g.
>> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. This could
>> also be any descriptions able to match a struct file or a struct inode
>> (e.g. path or glob string).
>>
>> Changes since v2:
>> * add MNT_INTERNAL check to only add file handle from user-visible FS
>>   (e.g. no anonymous inode)
>> * replace struct file* with struct path* in map_landlock_handle
>> * add BPF protos
>> * fix bpf_landlock_cmp_fs_prop_with_struct_file()
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> Cc: Alexei Starovoitov <ast@kernel.org>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>> Cc: David S. Miller <davem@davemloft.net>
>> Cc: James Morris <james.l.morris@oracle.com>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Serge E. Hallyn <serge@hallyn.com>
>> Link: https://lkml.kernel.org/r/CALCETrWwTiz3kZTkEgOW24-DvhQq6LftwEXh77FD2G5o71yD7g@mail.gmail.com
> 
> thanks for keeping the links to the previous discussion.
> Long term it should help, though I worry we are already at the point
> where there are too many outstanding issues to resolve before we
> can proceed with reasonable code review.
> 
>> +/*
>> + * bpf_landlock_cmp_fs_prop_with_struct_file
>> + *
>> + * Cf. include/uapi/linux/bpf.h
>> + */
>> +static inline u64 bpf_landlock_cmp_fs_prop_with_struct_file(u64 r1_property,
>> +		u64 r2_map, u64 r3_map_op, u64 r4_file, u64 r5)
>> +{
>> +	u8 property = (u8) r1_property;
>> +	struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
>> +	enum bpf_map_array_op map_op = r3_map_op;
>> +	struct file *file = (struct file *) (unsigned long) r4_file;
> 
> please use just added BPF_CALL_ macros. They will help readability of the above.
> 
>> +	struct bpf_array *array = container_of(map, struct bpf_array, map);
>> +	struct path *p1, *p2;
>> +	struct map_landlock_handle *handle;
>> +	int i;
>> +
>> +	/* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS is an arraymap */
>> +	if (unlikely(!map)) {
>> +		WARN_ON(1);
>> +		return -EFAULT;
>> +	}
>> +	if (unlikely(!file))
>> +		return -ENOENT;
>> +	if (unlikely((property | _LANDLOCK_FLAG_FS_MASK) != _LANDLOCK_FLAG_FS_MASK))
>> +		return -EINVAL;
>> +
>> +	/* for now, only handle OP_OR */
>> +	switch (map_op) {
>> +	case BPF_MAP_ARRAY_OP_OR:
>> +		break;
>> +	case BPF_MAP_ARRAY_OP_UNSPEC:
>> +	case BPF_MAP_ARRAY_OP_AND:
>> +	case BPF_MAP_ARRAY_OP_XOR:
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +	p2 = &file->f_path;
>> +
>> +	synchronize_rcu();
> 
> that is completely broken.
> bpf programs are executing under rcu_lock.
> please enable CONFIG_PROVE_RCU and retest everything.

Thanks for the tip. I will fix this.

> 
> I would suggest for the next RFC to do minimal 7 patches up to this point
> with simple example that demonstrates the use case.
> I would avoid all unpriv stuff and all of seccomp for the next RFC as well,
> otherwise I don't think we can realistically make forward progress, since
> there are too many issues raised in the subsequent patches.

I hope we will find a common agreement about seccomp vs cgroup… I think
both approaches have their advantages, can be complementary and nicely
combined.

Unprivileged sandboxing is the main goal of Landlock. This should not be
a problem, even for privileged features, thanks to the new subtype/access.

> 
> The common part that is mergeable is prog's subtype extension to
> the verifier that can be used for better tracing and is the common
> piece of infra needed for both landlock and checmate LSMs
> (which must be one LSM anyway)

Agreed. With this RFC, the Checmate features (i.e. network helpers)
should be able to sit on top of Landlock.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 03/22] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles
  2016-09-14 18:51   ` Alexei Starovoitov
@ 2016-09-14 23:22     ` Mickaël Salaün
  2016-09-14 23:28       ` Alexei Starovoitov
  0 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-14 23:22 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups

[-- Attachment #1.1: Type: text/plain, Size: 5140 bytes --]


On 14/09/2016 20:51, Alexei Starovoitov wrote:
> On Wed, Sep 14, 2016 at 09:23:56AM +0200, Mickaël Salaün wrote:
>> This new arraymap looks like a set and brings new properties:
>> * strong typing of entries: the eBPF functions get the array type of
>>   elements instead of CONST_PTR_TO_MAP (e.g.
>>   CONST_PTR_TO_LANDLOCK_HANDLE_FS);
>> * force sequential filling (i.e. replace or append-only update), which
>>   allow quick browsing of all entries.
>>
>> This strong typing is useful to statically check if the content of a map
>> can be passed to an eBPF function. For example, Landlock use it to store
>> and manage kernel objects (e.g. struct file) instead of dealing with
>> userland raw data. This improve efficiency and ensure that an eBPF
>> program can only call functions with the right high-level arguments.
>>
>> The enum bpf_map_handle_type list low-level types (e.g.
>> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) which are identified when
>> updating a map entry (handle). This handle types are used to infer a
>> high-level arraymap type which are listed in enum bpf_map_array_type
>> (e.g. BPF_MAP_ARRAY_TYPE_LANDLOCK_FS).
>>
>> For now, this new arraymap is only used by Landlock LSM (cf. next
>> commits) but it could be useful for other needs.
>>
>> Changes since v2:
>> * add a RLIMIT_NOFILE-based limit to the maximum number of arraymap
>>   handle entries (suggested by Andy Lutomirski)
>> * remove useless checks
>>
>> Changes since v1:
>> * arraymap of handles replace custom checker groups
>> * simpler userland API
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> Cc: Alexei Starovoitov <ast@kernel.org>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>> Cc: David S. Miller <davem@davemloft.net>
>> Cc: Kees Cook <keescook@chromium.org>
>> Link: https://lkml.kernel.org/r/CALCETrWwTiz3kZTkEgOW24-DvhQq6LftwEXh77FD2G5o71yD7g@mail.gmail.com
>> ---
>>  include/linux/bpf.h      |  14 ++++
>>  include/uapi/linux/bpf.h |  18 +++++
>>  kernel/bpf/arraymap.c    | 203 +++++++++++++++++++++++++++++++++++++++++++++++
>>  kernel/bpf/verifier.c    |  12 ++-
>>  4 files changed, 246 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index fa9a988400d9..eae4ce4542c1 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -13,6 +13,10 @@
>>  #include <linux/percpu.h>
>>  #include <linux/err.h>
>>  
>> +#ifdef CONFIG_SECURITY_LANDLOCK
>> +#include <linux/fs.h> /* struct file */
>> +#endif /* CONFIG_SECURITY_LANDLOCK */
>> +
>>  struct perf_event;
>>  struct bpf_map;
>>  
>> @@ -38,6 +42,7 @@ struct bpf_map_ops {
>>  struct bpf_map {
>>  	atomic_t refcnt;
>>  	enum bpf_map_type map_type;
>> +	enum bpf_map_array_type map_array_type;
>>  	u32 key_size;
>>  	u32 value_size;
>>  	u32 max_entries;
>> @@ -187,6 +192,9 @@ struct bpf_array {
>>  	 */
>>  	enum bpf_prog_type owner_prog_type;
>>  	bool owner_jited;
>> +#ifdef CONFIG_SECURITY_LANDLOCK
>> +	u32 n_entries;	/* number of entries in a handle array */
>> +#endif /* CONFIG_SECURITY_LANDLOCK */
>>  	union {
>>  		char value[0] __aligned(8);
>>  		void *ptrs[0] __aligned(8);
>> @@ -194,6 +202,12 @@ struct bpf_array {
>>  	};
>>  };
>>  
>> +#ifdef CONFIG_SECURITY_LANDLOCK
>> +struct map_landlock_handle {
>> +	u32 type; /* enum bpf_map_handle_type */
>> +};
>> +#endif /* CONFIG_SECURITY_LANDLOCK */
>> +
>>  #define MAX_TAIL_CALL_CNT 32
>>  
>>  struct bpf_event_entry {
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 7cd36166f9b7..b68de57f7ab8 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -87,6 +87,15 @@ enum bpf_map_type {
>>  	BPF_MAP_TYPE_PERCPU_ARRAY,
>>  	BPF_MAP_TYPE_STACK_TRACE,
>>  	BPF_MAP_TYPE_CGROUP_ARRAY,
>> +	BPF_MAP_TYPE_LANDLOCK_ARRAY,
>> +};
>> +
>> +enum bpf_map_array_type {
>> +	BPF_MAP_ARRAY_TYPE_UNSPEC,
>> +};
>> +
>> +enum bpf_map_handle_type {
>> +	BPF_MAP_HANDLE_TYPE_UNSPEC,
>>  };
> 
> Missing something. Why does it have to be special and have its own
> fd array implementation?
> Please take a look at how BPF_MAP_TYPE_PERF_EVENT_ARRAY,
> BPF_MAP_TYPE_CGROUP_ARRAY and BPF_MAP_TYPE_PROG_ARRAY are done.
> They all store objects into an array map that user space passes via FD.
> I think the same model should apply here.

The idea is to have multiple ways for userland to describe a resource
(e.g. an open file descriptor, a path or a glob pattern). The kernel
representation could then be a "struct path *" or dedicated types (e.g.
custom glob).

Another interesting point (that could replace
check_map_func_compatibility()) is that BPF_MAP_TYPE_LANDLOCK_ARRAY
translates to dedicated (abstract) types (instead of CONST_PTR_TO_MAP)
thanks to bpf_reg_type_from_map(). This is useful to abstract the userland
(map) interface from the kernel object(s) dealing with that type.

A third point is that BPF_MAP_TYPE_LANDLOCK_ARRAY is a kind of set. It
is optimized to quickly walk through all the elements in a sequential way.
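
A minimal user-space sketch of that "replace or append-only" behaviour, for
illustration only. struct handle_set, HANDLE_SET_MAX and both helpers are
hypothetical names, the error codes are assumptions, and plain integers stand
in for the kernel objects; only the n_entries idea comes from the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define HANDLE_SET_MAX 8	/* hypothetical capacity */

struct handle_set {
	size_t n_entries;		/* number of filled slots */
	int values[HANDLE_SET_MAX];	/* stand-in for kernel objects */
};

/*
 * Replace an existing slot, or append exactly at index == n_entries.
 * Any other index would leave a hole, so it is rejected.
 */
static int handle_set_update(struct handle_set *set, size_t index, int value)
{
	assert(set);
	if (index >= HANDLE_SET_MAX)
		return -E2BIG;
	if (index > set->n_entries)
		return -EINVAL;
	if (index == set->n_entries)
		set->n_entries++;
	set->values[index] = value;
	return 0;
}

/* Lookups only walk the filled prefix, never the whole capacity. */
static int handle_set_contains(const struct handle_set *set, int value)
{
	size_t i;

	assert(set);
	for (i = 0; i < set->n_entries; i++)
		if (set->values[i] == value)
			return 1;
	return 0;
}
```

Because entries are always contiguous, a checker can stop at n_entries
instead of scanning max_entries and testing each slot for validity.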


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 07/22] landlock: Handle file comparisons
  2016-09-14 23:02     ` Mickaël Salaün
@ 2016-09-14 23:24       ` Alexei Starovoitov
  2016-09-15 21:25         ` Mickaël Salaün
  0 siblings, 1 reply; 76+ messages in thread
From: Alexei Starovoitov @ 2016-09-14 23:24 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups

On Thu, Sep 15, 2016 at 01:02:22AM +0200, Mickaël Salaün wrote:
> > 
> > I would suggest for the next RFC to do minimal 7 patches up to this point
> > with simple example that demonstrates the use case.
> > I would avoid all unpriv stuff and all of seccomp for the next RFC as well,
> > otherwise I don't think we can realistically make forward progress, since
> > there are too many issues raised in the subsequent patches.
> 
> I hope we will find a common agreement about seccomp vs cgroup… I think
> both approaches have their advantages, can be complementary and nicely
> combined.

I don't mind having both a task-based lsm and a cgroup-based one as long as
the infrastructure is not duplicated and the scaling issues from the earlier
version are resolved.
I'm proposing to do cgroup only for the next RFC, since mine and Sargun's
use case for this bpf+lsm+cgroup is _not_ security or sandboxing.
There is no need for unpriv, no_new_priv for cgroups, or the other things
that Andy is concerned about.

> Unprivileged sandboxing is the main goal of Landlock. This should not be
> a problem, even for privileged features, thanks to the new subtype/access.

Yes. The point is that the unpriv stuff can come later, after agreement is
reached. If we keep arguing about seccomp details this set won't go anywhere.
Even in the basic part (which is cgroup+bpf+lsm) there are plenty of
questions still to be agreed.

> Agreed. With this RFC, the Checmate features (i.e. network helpers)
> should be able to sit on top of Landlock.

I think neither of them should be called fancy names for no technical reason.
We will have only one bpf based lsm. That's it and it doesn't
need an obscure name. Directory name can be security/bpf/..stuff.c

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 03/22] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles
  2016-09-14 23:22     ` Mickaël Salaün
@ 2016-09-14 23:28       ` Alexei Starovoitov
  2016-09-15 21:51         ` Mickaël Salaün
  0 siblings, 1 reply; 76+ messages in thread
From: Alexei Starovoitov @ 2016-09-14 23:28 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups

On Thu, Sep 15, 2016 at 01:22:49AM +0200, Mickaël Salaün wrote:
> 
> On 14/09/2016 20:51, Alexei Starovoitov wrote:
> > On Wed, Sep 14, 2016 at 09:23:56AM +0200, Mickaël Salaün wrote:
> >> This new arraymap looks like a set and brings new properties:
> >> * strong typing of entries: the eBPF functions get the array type of
> >>   elements instead of CONST_PTR_TO_MAP (e.g.
> >>   CONST_PTR_TO_LANDLOCK_HANDLE_FS);
> >> * force sequential filling (i.e. replace or append-only update), which
> >>   allow quick browsing of all entries.
> >>
> >> This strong typing is useful to statically check if the content of a map
> >> can be passed to an eBPF function. For example, Landlock use it to store
> >> and manage kernel objects (e.g. struct file) instead of dealing with
> >> userland raw data. This improve efficiency and ensure that an eBPF
> >> program can only call functions with the right high-level arguments.
> >>
> >> The enum bpf_map_handle_type list low-level types (e.g.
> >> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) which are identified when
> >> updating a map entry (handle). This handle types are used to infer a
> >> high-level arraymap type which are listed in enum bpf_map_array_type
> >> (e.g. BPF_MAP_ARRAY_TYPE_LANDLOCK_FS).
> >>
> >> For now, this new arraymap is only used by Landlock LSM (cf. next
> >> commits) but it could be useful for other needs.
> >>
> >> Changes since v2:
> >> * add a RLIMIT_NOFILE-based limit to the maximum number of arraymap
> >>   handle entries (suggested by Andy Lutomirski)
> >> * remove useless checks
> >>
> >> Changes since v1:
> >> * arraymap of handles replace custom checker groups
> >> * simpler userland API
> >>
> >> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> >> Cc: Alexei Starovoitov <ast@kernel.org>
> >> Cc: Andy Lutomirski <luto@amacapital.net>
> >> Cc: Daniel Borkmann <daniel@iogearbox.net>
> >> Cc: David S. Miller <davem@davemloft.net>
> >> Cc: Kees Cook <keescook@chromium.org>
> >> Link: https://lkml.kernel.org/r/CALCETrWwTiz3kZTkEgOW24-DvhQq6LftwEXh77FD2G5o71yD7g@mail.gmail.com
> >> ---
> >>  include/linux/bpf.h      |  14 ++++
> >>  include/uapi/linux/bpf.h |  18 +++++
> >>  kernel/bpf/arraymap.c    | 203 +++++++++++++++++++++++++++++++++++++++++++++++
> >>  kernel/bpf/verifier.c    |  12 ++-
> >>  4 files changed, 246 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> >> index fa9a988400d9..eae4ce4542c1 100644
> >> --- a/include/linux/bpf.h
> >> +++ b/include/linux/bpf.h
> >> @@ -13,6 +13,10 @@
> >>  #include <linux/percpu.h>
> >>  #include <linux/err.h>
> >>  
> >> +#ifdef CONFIG_SECURITY_LANDLOCK
> >> +#include <linux/fs.h> /* struct file */
> >> +#endif /* CONFIG_SECURITY_LANDLOCK */
> >> +
> >>  struct perf_event;
> >>  struct bpf_map;
> >>  
> >> @@ -38,6 +42,7 @@ struct bpf_map_ops {
> >>  struct bpf_map {
> >>  	atomic_t refcnt;
> >>  	enum bpf_map_type map_type;
> >> +	enum bpf_map_array_type map_array_type;
> >>  	u32 key_size;
> >>  	u32 value_size;
> >>  	u32 max_entries;
> >> @@ -187,6 +192,9 @@ struct bpf_array {
> >>  	 */
> >>  	enum bpf_prog_type owner_prog_type;
> >>  	bool owner_jited;
> >> +#ifdef CONFIG_SECURITY_LANDLOCK
> >> +	u32 n_entries;	/* number of entries in a handle array */
> >> +#endif /* CONFIG_SECURITY_LANDLOCK */
> >>  	union {
> >>  		char value[0] __aligned(8);
> >>  		void *ptrs[0] __aligned(8);
> >> @@ -194,6 +202,12 @@ struct bpf_array {
> >>  	};
> >>  };
> >>  
> >> +#ifdef CONFIG_SECURITY_LANDLOCK
> >> +struct map_landlock_handle {
> >> +	u32 type; /* enum bpf_map_handle_type */
> >> +};
> >> +#endif /* CONFIG_SECURITY_LANDLOCK */
> >> +
> >>  #define MAX_TAIL_CALL_CNT 32
> >>  
> >>  struct bpf_event_entry {
> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >> index 7cd36166f9b7..b68de57f7ab8 100644
> >> --- a/include/uapi/linux/bpf.h
> >> +++ b/include/uapi/linux/bpf.h
> >> @@ -87,6 +87,15 @@ enum bpf_map_type {
> >>  	BPF_MAP_TYPE_PERCPU_ARRAY,
> >>  	BPF_MAP_TYPE_STACK_TRACE,
> >>  	BPF_MAP_TYPE_CGROUP_ARRAY,
> >> +	BPF_MAP_TYPE_LANDLOCK_ARRAY,
> >> +};
> >> +
> >> +enum bpf_map_array_type {
> >> +	BPF_MAP_ARRAY_TYPE_UNSPEC,
> >> +};
> >> +
> >> +enum bpf_map_handle_type {
> >> +	BPF_MAP_HANDLE_TYPE_UNSPEC,
> >>  };
> > 
> > missing something. why it has to be special to have it's own
> > fd array implementation?
> > Please take a look how BPF_MAP_TYPE_PERF_EVENT_ARRAY, 
> > BPF_MAP_TYPE_CGROUP_ARRAY and BPF_MAP_TYPE_PROG_ARRAY are done.
> > The all store objects into array map that user space passes via FD.
> > I think the same model should apply here.
> 
> The idea is to have multiple ways for userland to describe a resource
> (e.g. an open file descriptor, a path or a glob pattern). The kernel
> representation could then be a "struct path *" or dedicated types (e.g.
> custom glob).

Hmm. I think the user space API should only deal with FDs. Everything
else is user space's job to encapsulate/hide.

> Another interesting point (that could replace
> check_map_func_compatibility()) is that BPF_MAP_TYPE_LANDLOCK_ARRAY
> translates to dedicated (abstract) types (instead of CONST_PTR_TO_MAP)
> thanks to bpf_reg_type_from_map(). This is useful to abstract userland
> (map) interface with kernel object(s) dealing with that type.

I'm probably missing something. If the user space interface is an FD,
to the kernel they're just different object types. Nothing else.

> A third point is that BPF_MAP_TYPE_LANDLOCK_ARRAY is a kind of set. It
> is optimized to quickly walk through all the elements in a sequential way.

Why is a set any faster to walk than an array?
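
One possible reading of the patch description quoted above, sketched as a
user-space model rather than the actual kernel code: because updates are
replace-or-append only, the filled entries always form a dense prefix, so a
walk can stop at n_entries and never has to test a slot for emptiness. Only
n_entries comes from the patch; struct handle_array, handle_array_update(),
handle_array_contains() and MAX_ENTRIES are hypothetical names used for
illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTRIES 8	/* hypothetical capacity, stands in for max_entries */

/* Hypothetical user-space model; this is not the kernel's struct bpf_array. */
struct handle_array {
	size_t n_entries;		/* slots filled so far, always dense */
	int handles[MAX_ENTRIES];	/* stand-in for kernel object pointers */
};

/*
 * Replace an existing slot, or append at exactly n_entries.
 * Any index past n_entries would leave a hole, so it is rejected.
 * Returns 0 on success, -1 on rejection.
 */
static int handle_array_update(struct handle_array *arr, size_t index,
			       int handle)
{
	assert(arr != NULL);
	if (index > arr->n_entries || index >= MAX_ENTRIES)
		return -1;
	arr->handles[index] = handle;
	if (index == arr->n_entries)
		arr->n_entries++;
	return 0;
}

/* Walk only the filled prefix; no per-slot emptiness check is needed. */
static int handle_array_contains(const struct handle_array *arr, int handle)
{
	for (size_t i = 0; i < arr->n_entries; i++)
		if (arr->handles[i] == handle)
			return 1;
	return 0;
}
```

Whether this beats a plain array in practice depends on how sparsely the
plain array would be filled; for a fully populated array the two walks are
the same.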

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 19/22] landlock: Add interrupted origin
  2016-09-14 22:14     ` Mickaël Salaün
@ 2016-09-15  1:19       ` Andy Lutomirski
  2016-10-03 23:46         ` Kees Cook
  0 siblings, 1 reply; 76+ messages in thread
From: Andy Lutomirski @ 2016-09-15  1:19 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry, kernel-hardening, Linux API, LSM List,
	Network Development, open list:CONTROL GROUP (CGROUP)

On Wed, Sep 14, 2016 at 3:14 PM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On 14/09/2016 20:29, Andy Lutomirski wrote:
>> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>> This third origin of hook call should cover all possible trigger paths
>>> (e.g. page fault). Landlock eBPF programs can then take decisions
>>> accordingly.
>>>
>>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>>> Cc: Alexei Starovoitov <ast@kernel.org>
>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>>> Cc: Kees Cook <keescook@chromium.org>
>>> ---
>>
>>
>>>
>>> +       if (unlikely(in_interrupt())) {
>>
>> IMO security hooks have no business being called from interrupts.
>> Aren't they all synchronous things done by tasks?  Interrupts are
>> driver things.
>>
>> Are you trying to check for page faults and such?
>
> Yes, that was the idea you did put in my mind. Not sure how to deal with
> this.
>

It's not so easy, unfortunately.  The easiest reliable way might be to
set a TS_ flag on all syscall entries when TIF_SECCOMP or similar is
set.

--Andy

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
  2016-09-14 22:11     ` Mickaël Salaün
@ 2016-09-15  1:25       ` Andy Lutomirski
  2016-09-15  2:19         ` Alexei Starovoitov
  2016-09-15 19:35         ` Mickaël Salaün
  0 siblings, 2 replies; 76+ messages in thread
From: Andy Lutomirski @ 2016-09-15  1:25 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry, kernel-hardening, Linux API, LSM List,
	Network Development, open list:CONTROL GROUP (CGROUP)

On Wed, Sep 14, 2016 at 3:11 PM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On 14/09/2016 20:27, Andy Lutomirski wrote:
>> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>> Add a new flag CGRP_NO_NEW_PRIVS for each cgroup. This flag is initially
>>> set for all cgroup except the root. The flag is clear when a new process
>>> without the no_new_privs flags is attached to the cgroup.
>>>
>>> If a cgroup is landlocked, then any new attempt, from an unprivileged
>>> process, to attach a process without no_new_privs to this cgroup will
>>> be denied.
>>
>> Until and unless everyone can agree on a way to properly namespace,
>> delegate, etc cgroups, I think that trying to add unprivileged
>> semantics to cgroups is nuts.  Given the big thread about cgroup v2,
>> no-internal-tasks, etc, I just don't see how this approach can be
>> viable.
>
> As far as I can tell, the no_new_privs flag of at task is not related to
> namespaces. The CGRP_NO_NEW_PRIVS flag is only a cache to quickly access
> the no_new_privs property of *tasks* in a cgroup. The semantic is unchanged.
>
> Using cgroup is optional, any task could use the seccomp-based
> landlocking instead. However, for those that want/need to manage a
> security policy in a more dynamic way, using cgroups may make sense.
>
> I though cgroup delegation was OK in the v2, isn't it the case? Do you
> have some links?
>
>>
>> Can we try to make landlock work completely independently of cgroups
>> so that it doesn't get stuck and so that programs can use it without
>> worrying about cgroup v1 vs v2, interactions with cgroup managers,
>> cgroup managers that (supposedly?) will start migrating processes
>> around piecemeal and almost certainly blowing up landlock in the
>> process, etc?
>
> This RFC handle both cgroup and seccomp approaches in a similar way. I
> don't see why building on top of cgroup v2 is a problem. Is there
> security issues with delegation?

What I mean is: cgroup v2 delegation has a functionality problem.
Tejun says [1]:

We haven't had to face this decision because cgroup has never properly
supported delegating to applications and the in-use setups where this
happens are custom configurations where there is no boundary between
system and applications and adhoc trial-and-error is good enough a way
to find a working solution.  That wiggle room goes away once we
officially open this up to individual applications.

Unless and until that changes, I think that landlock should stay away
from cgroups.  Others could reasonably disagree with me.


[1] https://lkml.kernel.org/g/<20160909225747.GA30105@mtj.duckdns.org
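
For readers following the quoted patch description, the flag behaviour it
describes can be modeled in user space roughly as follows. This is a sketch
under assumptions, not the patch itself: struct cgroup_model, struct
task_model and both functions are hypothetical, and letting a privileged
attacher clear the flag on a landlocked cgroup is only one reading of the
description, which states just that unprivileged attempts are denied.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a task and a cgroup. */
struct task_model {
	bool no_new_privs;
};

struct cgroup_model {
	bool is_root;
	bool no_new_privs;	/* models the CGRP_NO_NEW_PRIVS cache flag */
	bool landlocked;	/* a Landlock rule is attached to this cgroup */
};

/* The flag starts set for every cgroup except the root. */
static void cgroup_model_init(struct cgroup_model *cg, bool is_root)
{
	cg->is_root = is_root;
	cg->no_new_privs = !is_root;
	cg->landlocked = false;
}

/*
 * Attach a task to a cgroup. Returns 0 on success, -1 if denied.
 * A task without no_new_privs clears the cached flag, unless the cgroup
 * is landlocked and the attacher is unprivileged, in which case the
 * attach is refused and the flag is left untouched.
 */
static int cgroup_model_attach(struct cgroup_model *cg,
			       const struct task_model *task,
			       bool attacher_privileged)
{
	if (!task->no_new_privs) {
		if (cg->landlocked && !attacher_privileged)
			return -1;
		/* Assumption: a privileged attach is allowed through. */
		cg->no_new_privs = false;
	}
	return 0;
}
```

The model only captures the cache semantics described in the commit
message; it says nothing about the delegation problem discussed in this
message.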

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
  2016-09-15  1:25       ` Andy Lutomirski
@ 2016-09-15  2:19         ` Alexei Starovoitov
  2016-09-15  2:27           ` Andy Lutomirski
  2016-09-15 19:35         ` Mickaël Salaün
  1 sibling, 1 reply; 76+ messages in thread
From: Alexei Starovoitov @ 2016-09-15  2:19 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mickaël Salaün, linux-kernel, Alexei Starovoitov,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, Linux API, LSM List, Network Development,
	open list:CONTROL GROUP (CGROUP)

On Wed, Sep 14, 2016 at 06:25:07PM -0700, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 3:11 PM, Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On 14/09/2016 20:27, Andy Lutomirski wrote:
> >> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
> >>> Add a new flag CGRP_NO_NEW_PRIVS for each cgroup. This flag is initially
> >>> set for all cgroup except the root. The flag is clear when a new process
> >>> without the no_new_privs flags is attached to the cgroup.
> >>>
> >>> If a cgroup is landlocked, then any new attempt, from an unprivileged
> >>> process, to attach a process without no_new_privs to this cgroup will
> >>> be denied.
> >>
> >> Until and unless everyone can agree on a way to properly namespace,
> >> delegate, etc cgroups, I think that trying to add unprivileged
> >> semantics to cgroups is nuts.  Given the big thread about cgroup v2,
> >> no-internal-tasks, etc, I just don't see how this approach can be
> >> viable.
> >
> > As far as I can tell, the no_new_privs flag of at task is not related to
> > namespaces. The CGRP_NO_NEW_PRIVS flag is only a cache to quickly access
> > the no_new_privs property of *tasks* in a cgroup. The semantic is unchanged.
> >
> > Using cgroup is optional, any task could use the seccomp-based
> > landlocking instead. However, for those that want/need to manage a
> > security policy in a more dynamic way, using cgroups may make sense.
> >
> > I though cgroup delegation was OK in the v2, isn't it the case? Do you
> > have some links?
> >
> >>
> >> Can we try to make landlock work completely independently of cgroups
> >> so that it doesn't get stuck and so that programs can use it without
> >> worrying about cgroup v1 vs v2, interactions with cgroup managers,
> >> cgroup managers that (supposedly?) will start migrating processes
> >> around piecemeal and almost certainly blowing up landlock in the
> >> process, etc?
> >
> > This RFC handle both cgroup and seccomp approaches in a similar way. I
> > don't see why building on top of cgroup v2 is a problem. Is there
> > security issues with delegation?
> 
> What I mean is: cgroup v2 delegation has a functionality problem.
> Tejun says [1]:
> 
> We haven't had to face this decision because cgroup has never properly
> supported delegating to applications and the in-use setups where this
> happens are custom configurations where there is no boundary between
> system and applications and adhoc trial-and-error is good enough a way
> to find a working solution.  That wiggle room goes away once we
> officially open this up to individual applications.
> 
> Unless and until that changes, I think that landlock should stay away
> from cgroups.  Others could reasonably disagree with me.

Our and Sargun's use cases for cgroup+lsm+bpf are not for security
and not for sandboxing. So the above doesn't matter in such contexts.
lsm hooks + cgroups provide a convenient scope and existing entry points.
Please see the checmate examples for how it's used.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
  2016-09-15  2:19         ` Alexei Starovoitov
@ 2016-09-15  2:27           ` Andy Lutomirski
  2016-09-15  4:00             ` Alexei Starovoitov
  0 siblings, 1 reply; 76+ messages in thread
From: Andy Lutomirski @ 2016-09-15  2:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Mickaël Salaün, linux-kernel, Alexei Starovoitov,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, Linux API, LSM List, Network Development,
	open list:CONTROL GROUP (CGROUP)

On Wed, Sep 14, 2016 at 7:19 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Wed, Sep 14, 2016 at 06:25:07PM -0700, Andy Lutomirski wrote:
>> On Wed, Sep 14, 2016 at 3:11 PM, Mickaël Salaün <mic@digikod.net> wrote:
>> >
>> > On 14/09/2016 20:27, Andy Lutomirski wrote:
>> >> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
>> >>> Add a new flag CGRP_NO_NEW_PRIVS for each cgroup. This flag is initially
>> >>> set for all cgroup except the root. The flag is clear when a new process
>> >>> without the no_new_privs flags is attached to the cgroup.
>> >>>
>> >>> If a cgroup is landlocked, then any new attempt, from an unprivileged
>> >>> process, to attach a process without no_new_privs to this cgroup will
>> >>> be denied.
>> >>
>> >> Until and unless everyone can agree on a way to properly namespace,
>> >> delegate, etc cgroups, I think that trying to add unprivileged
>> >> semantics to cgroups is nuts.  Given the big thread about cgroup v2,
>> >> no-internal-tasks, etc, I just don't see how this approach can be
>> >> viable.
>> >
>> > As far as I can tell, the no_new_privs flag of at task is not related to
>> > namespaces. The CGRP_NO_NEW_PRIVS flag is only a cache to quickly access
>> > the no_new_privs property of *tasks* in a cgroup. The semantic is unchanged.
>> >
>> > Using cgroup is optional, any task could use the seccomp-based
>> > landlocking instead. However, for those that want/need to manage a
>> > security policy in a more dynamic way, using cgroups may make sense.
>> >
>> > I though cgroup delegation was OK in the v2, isn't it the case? Do you
>> > have some links?
>> >
>> >>
>> >> Can we try to make landlock work completely independently of cgroups
>> >> so that it doesn't get stuck and so that programs can use it without
>> >> worrying about cgroup v1 vs v2, interactions with cgroup managers,
>> >> cgroup managers that (supposedly?) will start migrating processes
>> >> around piecemeal and almost certainly blowing up landlock in the
>> >> process, etc?
>> >
>> > This RFC handle both cgroup and seccomp approaches in a similar way. I
>> > don't see why building on top of cgroup v2 is a problem. Is there
>> > security issues with delegation?
>>
>> What I mean is: cgroup v2 delegation has a functionality problem.
>> Tejun says [1]:
>>
>> We haven't had to face this decision because cgroup has never properly
>> supported delegating to applications and the in-use setups where this
>> happens are custom configurations where there is no boundary between
>> system and applications and adhoc trial-and-error is good enough a way
>> to find a working solution.  That wiggle room goes away once we
>> officially open this up to individual applications.
>>
>> Unless and until that changes, I think that landlock should stay away
>> from cgroups.  Others could reasonably disagree with me.
>
> Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
> and not for sandboxing. So the above doesn't matter in such contexts.
> lsm hooks + cgroups provide convenient scope and existing entry points.
> Please see checmate examples how it's used.
>

To be clear: I'm not arguing at all that there shouldn't be
bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
landlock interface shouldn't expose any cgroup integration, at least
until the cgroup situation settles down a lot.

--Andy

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
  2016-09-15  2:27           ` Andy Lutomirski
@ 2016-09-15  4:00             ` Alexei Starovoitov
  2016-09-15  4:08               ` Andy Lutomirski
  0 siblings, 1 reply; 76+ messages in thread
From: Alexei Starovoitov @ 2016-09-15  4:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mickaël Salaün, linux-kernel, Alexei Starovoitov,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, Linux API, LSM List, Network Development,
	open list:CONTROL GROUP (CGROUP)

On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
> >> >
> >> > This RFC handle both cgroup and seccomp approaches in a similar way. I
> >> > don't see why building on top of cgroup v2 is a problem. Is there
> >> > security issues with delegation?
> >>
> >> What I mean is: cgroup v2 delegation has a functionality problem.
> >> Tejun says [1]:
> >>
> >> We haven't had to face this decision because cgroup has never properly
> >> supported delegating to applications and the in-use setups where this
> >> happens are custom configurations where there is no boundary between
> >> system and applications and adhoc trial-and-error is good enough a way
> >> to find a working solution.  That wiggle room goes away once we
> >> officially open this up to individual applications.
> >>
> >> Unless and until that changes, I think that landlock should stay away
> >> from cgroups.  Others could reasonably disagree with me.
> >
> > Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
> > and not for sandboxing. So the above doesn't matter in such contexts.
> > lsm hooks + cgroups provide convenient scope and existing entry points.
> > Please see checmate examples how it's used.
> >
> 
> To be clear: I'm not arguing at all that there shouldn't be
> bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
> landlock interface shouldn't expose any cgroup integration, at least
> until the cgroup situation settles down a lot.

ahh. yes. we're perfectly in agreement here.
I'm suggesting that the next RFC shouldn't include unpriv
and seccomp at all. Once bpf+lsm+cgroup is merged, we can
argue about unpriv with cgroups and even unpriv as a whole,
since it's not a given. Seccomp integration is also questionable.
I'd rather not have seccomp as a gate keeper for this lsm.
lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
don't have a one-to-one relationship, so mixing them up is only
asking for trouble further down the road.
If we really need to carry some information from seccomp to lsm+bpf,
it's easier to add eBPF support to seccomp and let the bpf side deal
with passing whatever information is needed.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
  2016-09-15  4:00             ` Alexei Starovoitov
@ 2016-09-15  4:08               ` Andy Lutomirski
  2016-09-15  4:31                 ` Alexei Starovoitov
  0 siblings, 1 reply; 76+ messages in thread
From: Andy Lutomirski @ 2016-09-15  4:08 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Mickaël Salaün, linux-kernel, Alexei Starovoitov,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, Linux API, LSM List, Network Development,
	open list:CONTROL GROUP (CGROUP)

On Wed, Sep 14, 2016 at 9:00 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
>> >> >
>> >> > This RFC handle both cgroup and seccomp approaches in a similar way. I
>> >> > don't see why building on top of cgroup v2 is a problem. Is there
>> >> > security issues with delegation?
>> >>
>> >> What I mean is: cgroup v2 delegation has a functionality problem.
>> >> Tejun says [1]:
>> >>
>> >> We haven't had to face this decision because cgroup has never properly
>> >> supported delegating to applications and the in-use setups where this
>> >> happens are custom configurations where there is no boundary between
>> >> system and applications and adhoc trial-and-error is good enough a way
>> >> to find a working solution.  That wiggle room goes away once we
>> >> officially open this up to individual applications.
>> >>
>> >> Unless and until that changes, I think that landlock should stay away
>> >> from cgroups.  Others could reasonably disagree with me.
>> >
>> > Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
>> > and not for sandboxing. So the above doesn't matter in such contexts.
>> > lsm hooks + cgroups provide convenient scope and existing entry points.
>> > Please see checmate examples how it's used.
>> >
>>
>> To be clear: I'm not arguing at all that there shouldn't be
>> bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
>> landlock interface shouldn't expose any cgroup integration, at least
>> until the cgroup situation settles down a lot.
>
> ahh. yes. we're perfectly in agreement here.
> I'm suggesting that the next RFC shouldn't include unpriv
> and seccomp at all. Once bpf+lsm+cgroup is merged, we can
> argue about unpriv with cgroups and even unpriv as a whole,
> since it's not a given. Seccomp integration is also questionable.
> I'd rather not have seccomp as a gate keeper for this lsm.
> lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
> don't have one to one relationship, so mixing them up is only
> asking for trouble further down the road.
> If we really need to carry some information from seccomp to lsm+bpf,
> it's easier to add eBPF support to seccomp and let bpf side deal
> with passing whatever information.
>

As an argument for keeping seccomp (or an extended seccomp) as the
interface for an unprivileged bpf+lsm: seccomp already checks off most
of the boxes for safely letting unprivileged programs sandbox
themselves.  Furthermore, to the extent that there are use cases for
unprivileged bpf+lsm that *aren't* expressible within the seccomp
hierarchy, I suspect that syscall filters have exactly the same
problem and that we should fix seccomp to cover it.

If I ever add a "seccomp monitor", which is something I want to do
eventually, I think it should work for lsm+bpf as well, which is
another argument for keeping it in seccomp.

--Andy

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
  2016-09-15  4:08               ` Andy Lutomirski
@ 2016-09-15  4:31                 ` Alexei Starovoitov
  2016-09-15  4:38                   ` Andy Lutomirski
  0 siblings, 1 reply; 76+ messages in thread
From: Alexei Starovoitov @ 2016-09-15  4:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mickaël Salaün, linux-kernel, Alexei Starovoitov,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, Linux API, LSM List, Network Development,
	open list:CONTROL GROUP (CGROUP)

On Wed, Sep 14, 2016 at 09:08:57PM -0700, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 9:00 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
> >> >> >
> >> >> > This RFC handle both cgroup and seccomp approaches in a similar way. I
> >> >> > don't see why building on top of cgroup v2 is a problem. Is there
> >> >> > security issues with delegation?
> >> >>
> >> >> What I mean is: cgroup v2 delegation has a functionality problem.
> >> >> Tejun says [1]:
> >> >>
> >> >> We haven't had to face this decision because cgroup has never properly
> >> >> supported delegating to applications and the in-use setups where this
> >> >> happens are custom configurations where there is no boundary between
> >> >> system and applications and adhoc trial-and-error is good enough a way
> >> >> to find a working solution.  That wiggle room goes away once we
> >> >> officially open this up to individual applications.
> >> >>
> >> >> Unless and until that changes, I think that landlock should stay away
> >> >> from cgroups.  Others could reasonably disagree with me.
> >> >
> >> > Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
> >> > and not for sandboxing. So the above doesn't matter in such contexts.
> >> > lsm hooks + cgroups provide convenient scope and existing entry points.
> >> > Please see checmate examples how it's used.
> >> >
> >>
> >> To be clear: I'm not arguing at all that there shouldn't be
> >> bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
> >> landlock interface shouldn't expose any cgroup integration, at least
> >> until the cgroup situation settles down a lot.
> >
> > ahh. yes. we're perfectly in agreement here.
> > I'm suggesting that the next RFC shouldn't include unpriv
> > and seccomp at all. Once bpf+lsm+cgroup is merged, we can
> > argue about unpriv with cgroups and even unpriv as a whole,
> > since it's not a given. Seccomp integration is also questionable.
> > I'd rather not have seccomp as a gate keeper for this lsm.
> > lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
> > don't have one to one relationship, so mixing them up is only
> > asking for trouble further down the road.
> > If we really need to carry some information from seccomp to lsm+bpf,
> > it's easier to add eBPF support to seccomp and let bpf side deal
> > with passing whatever information.
> >
> 
> As an argument for keeping seccomp (or an extended seccomp) as the
> interface for an unprivileged bpf+lsm: seccomp already checks off most
> of the boxes for safely letting unprivileged programs sandbox
> themselves.  

You mean the attach part of the seccomp syscall that deals with no_new_priv?
Sure, that's reusable.

> Furthermore, to the extent that there are use cases for
> unprivileged bpf+lsm that *aren't* expressible within the seccomp
> hierarchy, I suspect that syscall filters have exactly the same
> problem and that we should fix seccomp to cover it.

Not sure what you mean by 'seccomp hierarchy'. The normal process
hierarchy?
IMO the main deficiency of seccomp is the inability to look into arguments.
One can argue that it's a blessing, since composite args
are not yet copied into the kernel memory.
But in a lot of cases the seccomp arguments are FDs pointing
to kernel objects and if programs could examine those objects
the sandboxing scope would be more precise.
lsm+bpf solves that part and I'd still argue that it's
orthogonal to seccomp's pass/reject flow.
I mean if seccomp says 'ok' the syscall should continue executing
as normal and whatever LSM hooks were triggered by it may have
their own lsm+bpf verdicts.
Furthermore in the process hierarchy different children
should be able to set their own lsm+bpf filters that are not
related to parallel seccomp+bpf hierarchy of programs.
seccomp syscall can be an interface to attach programs
to lsm hooks, but nothing more than that.


* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
  2016-09-15  4:31                 ` Alexei Starovoitov
@ 2016-09-15  4:38                   ` Andy Lutomirski
  2016-09-15  4:48                     ` Alexei Starovoitov
  0 siblings, 1 reply; 76+ messages in thread
From: Andy Lutomirski @ 2016-09-15  4:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Mickaël Salaün, linux-kernel, Alexei Starovoitov,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, Linux API, LSM List, Network Development,
	open list:CONTROL GROUP (CGROUP)

On Wed, Sep 14, 2016 at 9:31 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Wed, Sep 14, 2016 at 09:08:57PM -0700, Andy Lutomirski wrote:
>> On Wed, Sep 14, 2016 at 9:00 PM, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>> > On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
>> >> >> >
>> >> >> > This RFC handle both cgroup and seccomp approaches in a similar way. I
>> >> >> > don't see why building on top of cgroup v2 is a problem. Is there
>> >> >> > security issues with delegation?
>> >> >>
>> >> >> What I mean is: cgroup v2 delegation has a functionality problem.
>> >> >> Tejun says [1]:
>> >> >>
>> >> >> We haven't had to face this decision because cgroup has never properly
>> >> >> supported delegating to applications and the in-use setups where this
>> >> >> happens are custom configurations where there is no boundary between
>> >> >> system and applications and adhoc trial-and-error is good enough a way
>> >> >> to find a working solution.  That wiggle room goes away once we
>> >> >> officially open this up to individual applications.
>> >> >>
>> >> >> Unless and until that changes, I think that landlock should stay away
>> >> >> from cgroups.  Others could reasonably disagree with me.
>> >> >
>> >> > Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
>> >> > and not for sandboxing. So the above doesn't matter in such contexts.
>> >> > lsm hooks + cgroups provide convenient scope and existing entry points.
>> >> > Please see checmate examples how it's used.
>> >> >
>> >>
>> >> To be clear: I'm not arguing at all that there shouldn't be
>> >> bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
>> >> landlock interface shouldn't expose any cgroup integration, at least
>> >> until the cgroup situation settles down a lot.
>> >
>> > ahh. yes. we're perfectly in agreement here.
>> > I'm suggesting that the next RFC shouldn't include unpriv
>> > and seccomp at all. Once bpf+lsm+cgroup is merged, we can
>> > argue about unpriv with cgroups and even unpriv as a whole,
>> > since it's not a given. Seccomp integration is also questionable.
>> > I'd rather not have seccomp as a gate keeper for this lsm.
>> > lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
>> > don't have one to one relationship, so mixing them up is only
>> > asking for trouble further down the road.
>> > If we really need to carry some information from seccomp to lsm+bpf,
>> > it's easier to add eBPF support to seccomp and let bpf side deal
>> > with passing whatever information.
>> >
>>
>> As an argument for keeping seccomp (or an extended seccomp) as the
>> interface for an unprivileged bpf+lsm: seccomp already checks off most
>> of the boxes for safely letting unprivileged programs sandbox
>> themselves.
>
> you mean the attach part of seccomp syscall that deals with no_new_priv?
> sure, that's reusable.
>
>> Furthermore, to the extent that there are use cases for
>> unprivileged bpf+lsm that *aren't* expressible within the seccomp
>> hierarchy, I suspect that syscall filters have exactly the same
>> problem and that we should fix seccomp to cover it.
>
> not sure what you mean by 'seccomp hierarchy'. The normal process
> hierarchy ?

Kind of.  I mean the filter layers that are inherited across fork(),
the TSYNC mechanism, etc.

> imo the main deficiency of seccomp is inability to look into arguments.
> One can argue that it's a blessing, since composite args
> are not yet copied into the kernel memory.
> But in a lot of cases the seccomp arguments are FDs pointing
> to kernel objects and if programs could examine those objects
> the sandboxing scope would be more precise.
> lsm+bpf solves that part and I'd still argue that it's
> orthogonal to seccomp's pass/reject flow.
> I mean if seccomp says 'ok' the syscall should continue executing
> as normal and whatever LSM hooks were triggered by it may have
> their own lsm+bpf verdicts.

I agree with all of this...

> Furthermore in the process hierarchy different children
> should be able to set their own lsm+bpf filters that are not
> related to parallel seccomp+bpf hierarchy of programs.
> seccomp syscall can be an interface to attach programs
> to lsm hooks, but nothing more than that.

I'm not sure what you mean.  I mean that, logically, I think we should
be able to do:

seccomp(attach a syscall filter);
fork();
child does seccomp(attach some lsm filters);

I think that they *should* be related to the seccomp+bpf hierarchy of
programs in that they are entries in the same logical list of filter
layers installed.  Some of those layers can be syscall filters and
some of the layers can be lsm filters.  If we subsequently add a way
to attach a removable seccomp filter or a way to attach a seccomp
filter that logs failures to some fd watched by an outside monitor, I
think that should work for lsm, too, with more or less the same
interface.
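
To make the "same logical list of filter layers" idea a bit more concrete,
here is a rough userspace model of it. This is only a sketch and not kernel
code: struct filter_layer, struct task_model and the three helpers are names
made up for illustration. It assumes each task points at its newest layer and
each layer points at the older one below it, so a forked child shares its
parent's chain and anything it attaches afterwards stays private to it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum layer_kind { LAYER_SYSCALL, LAYER_LSM };

struct filter_layer {
	enum layer_kind kind;
	const struct filter_layer *prev;	/* older layer, shared with ancestors */
};

struct task_model {
	const struct filter_layer *layers;	/* most recently attached layer */
};

/*
 * Attach a new layer on top of whatever the task already has.
 * Layers are never freed here to keep the sketch short; a real
 * implementation would need reference counting.
 */
int attach_layer(struct task_model *t, enum layer_kind kind)
{
	struct filter_layer *l = malloc(sizeof(*l));

	if (!l)
		return -1;
	l->kind = kind;
	l->prev = t->layers;
	t->layers = l;
	return 0;
}

/* fork(): the child starts with the parent's chain; nothing is copied. */
struct task_model fork_task(const struct task_model *parent)
{
	struct task_model child = { .layers = parent->layers };

	return child;
}

/* Count the layers of a given kind that apply to a task. */
size_t count_layers(const struct task_model *t, enum layer_kind kind)
{
	size_t n = 0;

	for (const struct filter_layer *l = t->layers; l; l = l->prev)
		if (l->kind == kind)
			n++;
	return n;
}
```

With this model, a parent that attaches a syscall layer and then forks ends
up with a child that sees that layer plus any LSM layer the child adds on its
own, while the parent's view is unchanged.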

If we need a way for a sandbox manager to opt different children into
different subsets of fancy filters, then I think that syscall filters
and lsm filters should use the same mechanism.

I think we might be on the same page here and just saying it different ways.


* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
  2016-09-15  4:38                   ` Andy Lutomirski
@ 2016-09-15  4:48                     ` Alexei Starovoitov
  2016-09-15 19:41                       ` Mickaël Salaün
  0 siblings, 1 reply; 76+ messages in thread
From: Alexei Starovoitov @ 2016-09-15  4:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mickaël Salaün, linux-kernel, Alexei Starovoitov,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, Linux API, LSM List, Network Development,
	open list:CONTROL GROUP (CGROUP)

On Wed, Sep 14, 2016 at 09:38:16PM -0700, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 9:31 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Wed, Sep 14, 2016 at 09:08:57PM -0700, Andy Lutomirski wrote:
> >> On Wed, Sep 14, 2016 at 9:00 PM, Alexei Starovoitov
> >> <alexei.starovoitov@gmail.com> wrote:
> >> > On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
> >> >> >> >
> >> >> >> > This RFC handle both cgroup and seccomp approaches in a similar way. I
> >> >> >> > don't see why building on top of cgroup v2 is a problem. Is there
> >> >> >> > security issues with delegation?
> >> >> >>
> >> >> >> What I mean is: cgroup v2 delegation has a functionality problem.
> >> >> >> Tejun says [1]:
> >> >> >>
> >> >> >> We haven't had to face this decision because cgroup has never properly
> >> >> >> supported delegating to applications and the in-use setups where this
> >> >> >> happens are custom configurations where there is no boundary between
> >> >> >> system and applications and adhoc trial-and-error is good enough a way
> >> >> >> to find a working solution.  That wiggle room goes away once we
> >> >> >> officially open this up to individual applications.
> >> >> >>
> >> >> >> Unless and until that changes, I think that landlock should stay away
> >> >> >> from cgroups.  Others could reasonably disagree with me.
> >> >> >
> >> >> > Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
> >> >> > and not for sandboxing. So the above doesn't matter in such contexts.
> >> >> > lsm hooks + cgroups provide convenient scope and existing entry points.
> >> >> > Please see checmate examples how it's used.
> >> >> >
> >> >>
> >> >> To be clear: I'm not arguing at all that there shouldn't be
> >> >> bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
> >> >> landlock interface shouldn't expose any cgroup integration, at least
> >> >> until the cgroup situation settles down a lot.
> >> >
> >> > ahh. yes. we're perfectly in agreement here.
> >> > I'm suggesting that the next RFC shouldn't include unpriv
> >> > and seccomp at all. Once bpf+lsm+cgroup is merged, we can
> >> > argue about unpriv with cgroups and even unpriv as a whole,
> >> > since it's not a given. Seccomp integration is also questionable.
> >> > I'd rather not have seccomp as a gate keeper for this lsm.
> >> > lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
> >> > don't have one to one relationship, so mixing them up is only
> >> > asking for trouble further down the road.
> >> > If we really need to carry some information from seccomp to lsm+bpf,
> >> > it's easier to add eBPF support to seccomp and let bpf side deal
> >> > with passing whatever information.
> >> >
> >>
> >> As an argument for keeping seccomp (or an extended seccomp) as the
> >> interface for an unprivileged bpf+lsm: seccomp already checks off most
> >> of the boxes for safely letting unprivileged programs sandbox
> >> themselves.
> >
> > you mean the attach part of seccomp syscall that deals with no_new_priv?
> > sure, that's reusable.
> >
> >> Furthermore, to the extent that there are use cases for
> >> unprivileged bpf+lsm that *aren't* expressible within the seccomp
> >> hierarchy, I suspect that syscall filters have exactly the same
> >> problem and that we should fix seccomp to cover it.
> >
> > not sure what you mean by 'seccomp hierarchy'. The normal process
> > hierarchy ?
> 
> Kind of.  I mean the filter layers that are inherited across fork(),
> the TSYNC mechanism, etc.
> 
> > imo the main deficiency of seccomp is inability to look into arguments.
> > One can argue that it's a blessing, since composite args
> > are not yet copied into the kernel memory.
> > But in a lot of cases the seccomp arguments are FDs pointing
> > to kernel objects and if programs could examine those objects
> > the sandboxing scope would be more precise.
> > lsm+bpf solves that part and I'd still argue that it's
> > orthogonal to seccomp's pass/reject flow.
> > I mean if seccomp says 'ok' the syscall should continue executing
> > as normal and whatever LSM hooks were triggered by it may have
> > their own lsm+bpf verdicts.
> 
> I agree with all of this...
> 
> > Furthermore in the process hierarchy different children
> > should be able to set their own lsm+bpf filters that are not
> > related to parallel seccomp+bpf hierarchy of programs.
> > seccomp syscall can be an interface to attach programs
> > to lsm hooks, but nothing more than that.
> 
> I'm not sure what you mean.  I mean that, logically, I think we should
> be able to do:
> 
> seccomp(attach a syscall filter);
> fork();
> child does seccomp(attach some lsm filters);
> 
> I think that they *should* be related to the seccomp+bpf hierarchy of
> programs in that they are entries in the same logical list of filter
> layers installed.  Some of those layers can be syscall filters and
> some of the layers can be lsm filters.  If we subsequently add a way
> to attach a removable seccomp filter or a way to attach a seccomp
> filter that logs failures to some fd watched by an outside monitor, I
> think that should work for lsm, too, with more or less the same
> interface.
> 
> If we need a way for a sandbox manager to opt different children into
> different subsets of fancy filters, then I think that syscall filters
> and lsm filters should use the same mechanism.
> 
> I think we might be on the same page here and just saying it different ways.

Sounds like it :)
All of the above makes sense to me.
The 'orthogonal' part is that the user should be able to use
this seccomp-managed hierarchy without actually enabling
TIF_SECCOMP for the task and syscalls should still go through
fast path and all the way till lsm hooks as normal.
I don't want to pay _any_ performance penalty for this feature
for lsm hooks (and all syscalls) that don't have bpf programs attached.
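
As a rough illustration of the "no penalty when nothing is attached" point,
a hook dispatcher can bail out on a single pointer test before touching any
program. The sketch below is a plain userspace model under that assumption;
hook_prog_t, struct hook_progs, run_hook() and deny_all() are hypothetical
names, not anything from this series.

```c
#include <stddef.h>

/* Hypothetical model: names and layout are illustrative only. */
typedef int (*hook_prog_t)(const void *ctx);

struct hook_progs {
	hook_prog_t *progs;	/* NULL when nothing is attached */
	size_t count;
};

/* Returns 0 to allow, or the first non-zero verdict from a program. */
int run_hook(const struct hook_progs *h, const void *ctx)
{
	/* Fast path: no programs attached, so only one pointer test. */
	if (!h->progs)
		return 0;

	for (size_t i = 0; i < h->count; i++) {
		int ret = h->progs[i](ctx);

		if (ret)
			return ret;
	}
	return 0;
}

/* Example program that refuses everything. */
int deny_all(const void *ctx)
{
	(void)ctx;
	return -1;	/* stand-in for an error code such as -EPERM */
}
```

A hook with an empty program list returns 0 right away; a hook with
deny_all() attached returns its verdict.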


* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
  2016-09-15  1:25       ` Andy Lutomirski
  2016-09-15  2:19         ` Alexei Starovoitov
@ 2016-09-15 19:35         ` Mickaël Salaün
  1 sibling, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-15 19:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry, kernel-hardening, Linux API, LSM List,
	Network Development, open list:CONTROL GROUP (CGROUP)



On 15/09/2016 03:25, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 3:11 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>
>> On 14/09/2016 20:27, Andy Lutomirski wrote:
>>> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>>> Add a new flag CGRP_NO_NEW_PRIVS for each cgroup. This flag is initially
>>>> set for all cgroups except the root. The flag is cleared when a new process
>>>> without the no_new_privs flag is attached to the cgroup.
>>>>
>>>> If a cgroup is landlocked, then any new attempt, from an unprivileged
>>>> process, to attach a process without no_new_privs to this cgroup will
>>>> be denied.
>>>
>>> Until and unless everyone can agree on a way to properly namespace,
>>> delegate, etc cgroups, I think that trying to add unprivileged
>>> semantics to cgroups is nuts.  Given the big thread about cgroup v2,
>>> no-internal-tasks, etc, I just don't see how this approach can be
>>> viable.
>>
>> As far as I can tell, the no_new_privs flag of a task is not related to
>> namespaces. The CGRP_NO_NEW_PRIVS flag is only a cache to quickly access
>> the no_new_privs property of *tasks* in a cgroup. The semantics are unchanged.
>>
>> Using cgroup is optional, any task could use the seccomp-based
>> landlocking instead. However, for those that want/need to manage a
>> security policy in a more dynamic way, using cgroups may make sense.
>>
>> I thought cgroup delegation was OK in v2, isn't that the case? Do you
>> have some links?
>>
>>>
>>> Can we try to make landlock work completely independently of cgroups
>>> so that it doesn't get stuck and so that programs can use it without
>>> worrying about cgroup v1 vs v2, interactions with cgroup managers,
>>> cgroup managers that (supposedly?) will start migrating processes
>>> around piecemeal and almost certainly blowing up landlock in the
>>> process, etc?
>>
>> This RFC handle both cgroup and seccomp approaches in a similar way. I
>> don't see why building on top of cgroup v2 is a problem. Is there
>> security issues with delegation?
> 
> What I mean is: cgroup v2 delegation has a functionality problem.
> Tejun says [1]:
> 
> We haven't had to face this decision because cgroup has never properly
> supported delegating to applications and the in-use setups where this
> happens are custom configurations where there is no boundary between
> system and applications and adhoc trial-and-error is good enough a way
> to find a working solution.  That wiggle room goes away once we
> officially open this up to individual applications.
> 
> Unless and until that changes, I think that landlock should stay away
> from cgroups.  Others could reasonably disagree with me.
> 
> [1] https://lkml.kernel.org/r/20160909225747.GA30105@mtj.duckdns.org
> 

I don't get the same impression from this message:
https://lkml.kernel.org/r/20160826155026.GD16906@mtj.duckdns.org

On 26/08/2016 17:50, Tejun Heo wrote:
> Please refer to "2-5. Delegation" of Documentation/cgroup-v2.txt.
> Delegation on v1 is broken on both core and specific controller
> behaviors and thus discouraged.  On v2, delegation should work just
> fine.

Tejun, could you please clarify if there is still a problem with cgroup
v2 delegation?

This patch only implements a cache mechanism with the CGRP_NO_NEW_PRIVS
flag. If cgroups can group processes correctly, I don't see any
(security) issue here. It's the administrator's choice to delegate a part
of the cgroup management. It's then the delegatee's responsibility to
correctly put processes in cgroups. This is comparable to a process
which is responsible for correctly calling seccomp(2).
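
For readers who want the flag semantics spelled out, here is a small
userspace model of the rules from the patch description. It is a sketch, not
the patch itself: struct cgroup_model and both helpers are hypothetical names,
and the behaviour for a *privileged* attacher (allowed, and the cache is
cleared) is an assumption, since the description only says what happens for
unprivileged ones.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one cgroup; not the kernel's struct cgroup. */
struct cgroup_model {
	bool no_new_privs;	/* cache: every attached task has no_new_privs */
	bool landlocked;	/* at least one Landlock program is attached */
};

/* The flag starts set for every cgroup except the root. */
void cgroup_model_init(struct cgroup_model *cg, bool is_root)
{
	cg->no_new_privs = !is_root;
	cg->landlocked = false;
}

/* Returns 0 on success or -EPERM if the attach is refused. */
int cgroup_model_attach(struct cgroup_model *cg, bool task_nnp,
			bool attacher_privileged)
{
	if (!task_nnp) {
		/* Unprivileged attach of a non-nnp task to a landlocked cgroup. */
		if (cg->landlocked && !attacher_privileged)
			return -EPERM;
		/* Assumption: otherwise the attach succeeds and clears the cache. */
		cg->no_new_privs = false;
	}
	return 0;
}
```

In this model the flag never changes the no_new_privs state of any task; it
only records whether all tasks seen so far had it set.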

 Mickaël



* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
  2016-09-15  4:48                     ` Alexei Starovoitov
@ 2016-09-15 19:41                       ` Mickaël Salaün
  2016-09-20  4:37                         ` Sargun Dhillon
  0 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-15 19:41 UTC (permalink / raw)
  To: Alexei Starovoitov, Andy Lutomirski
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry, kernel-hardening, Linux API, LSM List,
	Network Development, open list:CONTROL GROUP (CGROUP)



On 15/09/2016 06:48, Alexei Starovoitov wrote:
> On Wed, Sep 14, 2016 at 09:38:16PM -0700, Andy Lutomirski wrote:
>> On Wed, Sep 14, 2016 at 9:31 PM, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>>> On Wed, Sep 14, 2016 at 09:08:57PM -0700, Andy Lutomirski wrote:
>>>> On Wed, Sep 14, 2016 at 9:00 PM, Alexei Starovoitov
>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>> On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
>>>>>>>>>
>>>>>>>>> This RFC handle both cgroup and seccomp approaches in a similar way. I
>>>>>>>>> don't see why building on top of cgroup v2 is a problem. Is there
>>>>>>>>> security issues with delegation?
>>>>>>>>
>>>>>>>> What I mean is: cgroup v2 delegation has a functionality problem.
>>>>>>>> Tejun says [1]:
>>>>>>>>
>>>>>>>> We haven't had to face this decision because cgroup has never properly
>>>>>>>> supported delegating to applications and the in-use setups where this
>>>>>>>> happens are custom configurations where there is no boundary between
>>>>>>>> system and applications and adhoc trial-and-error is good enough a way
>>>>>>>> to find a working solution.  That wiggle room goes away once we
>>>>>>>> officially open this up to individual applications.
>>>>>>>>
>>>>>>>> Unless and until that changes, I think that landlock should stay away
>>>>>>>> from cgroups.  Others could reasonably disagree with me.
>>>>>>>
>>>>>>> Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
>>>>>>> and not for sandboxing. So the above doesn't matter in such contexts.
>>>>>>> lsm hooks + cgroups provide convenient scope and existing entry points.
>>>>>>> Please see checmate examples how it's used.
>>>>>>>
>>>>>>
>>>>>> To be clear: I'm not arguing at all that there shouldn't be
>>>>>> bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
>>>>>> landlock interface shouldn't expose any cgroup integration, at least
>>>>>> until the cgroup situation settles down a lot.
>>>>>
>>>>> ahh. yes. we're perfectly in agreement here.
>>>>> I'm suggesting that the next RFC shouldn't include unpriv
>>>>> and seccomp at all. Once bpf+lsm+cgroup is merged, we can
>>>>> argue about unpriv with cgroups and even unpriv as a whole,
>>>>> since it's not a given. Seccomp integration is also questionable.
>>>>> I'd rather not have seccomp as a gate keeper for this lsm.
>>>>> lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
>>>>> don't have one to one relationship, so mixing them up is only
>>>>> asking for trouble further down the road.
>>>>> If we really need to carry some information from seccomp to lsm+bpf,
>>>>> it's easier to add eBPF support to seccomp and let bpf side deal
>>>>> with passing whatever information.
>>>>>
>>>>
>>>> As an argument for keeping seccomp (or an extended seccomp) as the
>>>> interface for an unprivileged bpf+lsm: seccomp already checks off most
>>>> of the boxes for safely letting unprivileged programs sandbox
>>>> themselves.
>>>
>>> you mean the attach part of seccomp syscall that deals with no_new_priv?
>>> sure, that's reusable.
>>>
>>>> Furthermore, to the extent that there are use cases for
>>>> unprivileged bpf+lsm that *aren't* expressible within the seccomp
>>>> hierarchy, I suspect that syscall filters have exactly the same
>>>> problem and that we should fix seccomp to cover it.
>>>
>>> not sure what you mean by 'seccomp hierarchy'. The normal process
>>> hierarchy ?
>>
>> Kind of.  I mean the filter layers that are inherited across fork(),
>> the TSYNC mechanism, etc.
>>
>>> imo the main deficiency of seccomp is inability to look into arguments.
>>> One can argue that it's a blessing, since composite args
>>> are not yet copied into the kernel memory.
>>> But in a lot of cases the seccomp arguments are FDs pointing
>>> to kernel objects and if programs could examine those objects
>>> the sandboxing scope would be more precise.
>>> lsm+bpf solves that part and I'd still argue that it's
>>> orthogonal to seccomp's pass/reject flow.
>>> I mean if seccomp says 'ok' the syscall should continue executing
>>> as normal and whatever LSM hooks were triggered by it may have
>>> their own lsm+bpf verdicts.
>>
>> I agree with all of this...
>>
>>> Furthermore in the process hierarchy different children
>>> should be able to set their own lsm+bpf filters that are not
>>> related to parallel seccomp+bpf hierarchy of programs.
>>> seccomp syscall can be an interface to attach programs
>>> to lsm hooks, but nothing more than that.
>>
>> I'm not sure what you mean.  I mean that, logically, I think we should
>> be able to do:
>>
>> seccomp(attach a syscall filter);
>> fork();
>> child does seccomp(attach some lsm filters);
>>
>> I think that they *should* be related to the seccomp+bpf hierarchy of
>> programs in that they are entries in the same logical list of filter
>> layers installed.  Some of those layers can be syscall filters and
>> some of the layers can be lsm filters.  If we subsequently add a way
>> to attach a removable seccomp filter or a way to attach a seccomp
>> filter that logs failures to some fd watched by an outside monitor, I
>> think that should work for lsm, too, with more or less the same
>> interface.
>>
>> If we need a way for a sandbox manager to opt different children into
>> different subsets of fancy filters, then I think that syscall filters
>> and lsm filters should use the same mechanism.
>>
>> I think we might be on the same page here and just saying it different ways.
> 
> Sounds like it :)
> All of the above makes sense to me.
> The 'orthogonal' part is that the user should be able to use
> this seccomp-managed hierarchy without actually enabling
> TIF_SECCOMP for the task and syscalls should still go through
> fast path and all the way till lsm hooks as normal.
> I don't want to pay _any_ performance penalty for this feature
> for lsm hooks (and all syscalls) that don't have bpf programs attached.

Yes, it seems that we are all on the same page here, and that matches this
RFC implementation. So, using the seccomp(2) *interface* to attach
Landlock programs to a process hierarchy is still on track. :)



* Re: [RFC v3 07/22] landlock: Handle file comparisons
  2016-09-14 23:24       ` Alexei Starovoitov
@ 2016-09-15 21:25         ` Mickaël Salaün
  2016-09-20  0:12           ` lsm naming dilemma. " Alexei Starovoitov
  0 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-15 21:25 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups



On 15/09/2016 01:24, Alexei Starovoitov wrote:
> On Thu, Sep 15, 2016 at 01:02:22AM +0200, Mickaël Salaün wrote:
>>>
>>> I would suggest for the next RFC to do minimal 7 patches up to this point
>>> with simple example that demonstrates the use case.
>>> I would avoid all unpriv stuff and all of seccomp for the next RFC as well,
>>> otherwise I don't think we can realistically make forward progress, since
>>> there are too many issues raised in the subsequent patches.
>>
>> I hope we will find a common agreement about seccomp vs cgroup… I think
>> both approaches have their advantages, can be complementary and nicely
>> combined.
> 
> I don't mind having both task based lsm and cgroup based as long as
> infrastructure is not duplicated and scaling issues from earlier version
> are resolved.

It should be much better with this RFC.

> I'm proposing to do cgroup only for the next RFC, since mine and Sargun's
> use case for this bpf+lsm+cgroup is _not_ security or sandboxing.

Well, the purpose of an LSM is to do security stuff. The main goal of
Landlock is to bring security features to userland, including unprivileged
processes, at least via the seccomp interface [1].

> No need for unpriv, no_new_priv to cgroups are other things that Andy
> is concerned about.

I'm concerned about security too! :)

> 
>> Unprivileged sandboxing is the main goal of Landlock. This should not be
>> a problem, even for privileged features, thanks to the new subtype/access.
> 
> yes. the point is that unpriv stuff can come later after agreement is reached.
> If we keep arguing about seccomp details this set won't go anywhere.
> Even in the basic part (which is cgroup+bpf+lsm) there are plenty of
> questions still to be agreed.

Using the seccomp(2) (unpriv) *interface* is OK according to a more
recent thread [1].

[1]
https://lkml.kernel.org/r/20160915044852.GA66000@ast-mbp.thefacebook.com

> 
>> Agreed. With this RFC, the Checmate features (i.e. network helpers)
>> should be able to sit on top of Landlock.
> 
> I think neither of them should be called fancy names for no technical reason.
> We will have only one bpf based lsm. That's it and it doesn't
> need an obscure name. Directory name can be security/bpf/..stuff.c

I disagree on an LSM named "BPF". I first started with the "seccomp LSM"
name (first RFC) but I later realized that it is confusing because
seccomp is associated with its syscall and the underlying features. The
same thing goes for BPF. It is also needlessly hard to grep for a name so
widely used in the kernel source tree.
Making an association between the generic eBPF mechanism and a
security-centric approach (i.e. LSM) seems a bit reductive (for BPF).
Moreover, the seccomp interface [1] can still be used.

Landlock is a nice name to depict a sandbox as an enclave (i.e. a
landlocked country/state). I want to keep this name, which is simple,
expresses the goal of Landlock nicely and is comparable to other sandbox
mechanisms such as Seatbelt or Pledge.
Landlock should not be confused with the underlying eBPF implementation.
Landlock could use more than only eBPF in the future and eBPF could be
used in other LSMs as well.

 Mickaël



* Re: [RFC v3 03/22] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles
  2016-09-14 23:28       ` Alexei Starovoitov
@ 2016-09-15 21:51         ` Mickaël Salaün
  0 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-15 21:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups




On 15/09/2016 01:28, Alexei Starovoitov wrote:
> On Thu, Sep 15, 2016 at 01:22:49AM +0200, Mickaël Salaün wrote:
>>
>> On 14/09/2016 20:51, Alexei Starovoitov wrote:
>>> On Wed, Sep 14, 2016 at 09:23:56AM +0200, Mickaël Salaün wrote:
>>>> This new arraymap looks like a set and brings new properties:
>>>> * strong typing of entries: the eBPF functions get the array type of
>>>>   elements instead of CONST_PTR_TO_MAP (e.g.
>>>>   CONST_PTR_TO_LANDLOCK_HANDLE_FS);
>>>> * force sequential filling (i.e. replace or append-only update), which
>>>>   allows quick browsing of all entries.
>>>>
>>>> This strong typing is useful to statically check if the content of a map
>>>> can be passed to an eBPF function. For example, Landlock uses it to store
>>>> and manage kernel objects (e.g. struct file) instead of dealing with
>>>> userland raw data. This improves efficiency and ensures that an eBPF
>>>> program can only call functions with the right high-level arguments.
>>>>
>>>> The enum bpf_map_handle_type lists low-level types (e.g.
>>>> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) which are identified when
>>>> updating a map entry (handle). These handle types are used to infer a
>>>> high-level arraymap type, which is listed in enum bpf_map_array_type
>>>> (e.g. BPF_MAP_ARRAY_TYPE_LANDLOCK_FS).
>>>>
>>>> For now, this new arraymap is only used by Landlock LSM (cf. next
>>>> commits) but it could be useful for other needs.
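
The two properties described above (a type inferred from the handles, and
replace-or-append-only updates) can be sketched with a small userspace model.
This is only an illustration and not the code from the patch: every name
below is hypothetical, HANDLE_NET_FD/ARRAY_NET are invented just to show a
type mismatch, and refusing to mix types in one array is an assumption about
how the inference would be enforced.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical names; the real enums are in include/uapi/linux/bpf.h. */
enum handle_type { HANDLE_UNSPEC, HANDLE_FS_FD, HANDLE_NET_FD };
enum array_type { ARRAY_UNSPEC, ARRAY_FS, ARRAY_NET };

#define MAX_ENTRIES 4

struct handle_array {
	enum array_type type;	/* inferred from the first handle stored */
	size_t n_entries;	/* slots [0, n_entries) are filled */
	int handles[MAX_ENTRIES];
};

/* Map a low-level handle type to the high-level array type. */
enum array_type array_type_of(enum handle_type h)
{
	switch (h) {
	case HANDLE_FS_FD:
		return ARRAY_FS;
	case HANDLE_NET_FD:
		return ARRAY_NET;
	default:
		return ARRAY_UNSPEC;
	}
}

/* Replace an existing slot or append right after the last one. */
int handle_array_update(struct handle_array *a, size_t index,
			enum handle_type h, int value)
{
	enum array_type t = array_type_of(h);

	if (t == ARRAY_UNSPEC)
		return -EINVAL;
	if (index > a->n_entries || index >= MAX_ENTRIES)
		return -EINVAL;	/* would leave a hole, or the array is full */
	if (a->type != ARRAY_UNSPEC && a->type != t)
		return -EINVAL;	/* assumption: mixed types are refused */

	a->type = t;
	a->handles[index] = value;
	if (index == a->n_entries)
		a->n_entries++;
	return 0;
}
```

Because slots are always filled from index 0 upward, walking entries
0..n_entries-1 visits every handle without having to skip empty slots.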
>>>>
>>>> Changes since v2:
>>>> * add a RLIMIT_NOFILE-based limit to the maximum number of arraymap
>>>>   handle entries (suggested by Andy Lutomirski)
>>>> * remove useless checks
>>>>
>>>> Changes since v1:
>>>> * arraymap of handles replace custom checker groups
>>>> * simpler userland API
>>>>
>>>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>>>> Cc: Alexei Starovoitov <ast@kernel.org>
>>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>>>> Cc: David S. Miller <davem@davemloft.net>
>>>> Cc: Kees Cook <keescook@chromium.org>
>>>> Link: https://lkml.kernel.org/r/CALCETrWwTiz3kZTkEgOW24-DvhQq6LftwEXh77FD2G5o71yD7g@mail.gmail.com
>>>> ---
>>>>  include/linux/bpf.h      |  14 ++++
>>>>  include/uapi/linux/bpf.h |  18 +++++
>>>>  kernel/bpf/arraymap.c    | 203 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>  kernel/bpf/verifier.c    |  12 ++-
>>>>  4 files changed, 246 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>>> index fa9a988400d9..eae4ce4542c1 100644
>>>> --- a/include/linux/bpf.h
>>>> +++ b/include/linux/bpf.h
>>>> @@ -13,6 +13,10 @@
>>>>  #include <linux/percpu.h>
>>>>  #include <linux/err.h>
>>>>  
>>>> +#ifdef CONFIG_SECURITY_LANDLOCK
>>>> +#include <linux/fs.h> /* struct file */
>>>> +#endif /* CONFIG_SECURITY_LANDLOCK */
>>>> +
>>>>  struct perf_event;
>>>>  struct bpf_map;
>>>>  
>>>> @@ -38,6 +42,7 @@ struct bpf_map_ops {
>>>>  struct bpf_map {
>>>>  	atomic_t refcnt;
>>>>  	enum bpf_map_type map_type;
>>>> +	enum bpf_map_array_type map_array_type;
>>>>  	u32 key_size;
>>>>  	u32 value_size;
>>>>  	u32 max_entries;
>>>> @@ -187,6 +192,9 @@ struct bpf_array {
>>>>  	 */
>>>>  	enum bpf_prog_type owner_prog_type;
>>>>  	bool owner_jited;
>>>> +#ifdef CONFIG_SECURITY_LANDLOCK
>>>> +	u32 n_entries;	/* number of entries in a handle array */
>>>> +#endif /* CONFIG_SECURITY_LANDLOCK */
>>>>  	union {
>>>>  		char value[0] __aligned(8);
>>>>  		void *ptrs[0] __aligned(8);
>>>> @@ -194,6 +202,12 @@ struct bpf_array {
>>>>  	};
>>>>  };
>>>>  
>>>> +#ifdef CONFIG_SECURITY_LANDLOCK
>>>> +struct map_landlock_handle {
>>>> +	u32 type; /* enum bpf_map_handle_type */
>>>> +};
>>>> +#endif /* CONFIG_SECURITY_LANDLOCK */
>>>> +
>>>>  #define MAX_TAIL_CALL_CNT 32
>>>>  
>>>>  struct bpf_event_entry {
>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>> index 7cd36166f9b7..b68de57f7ab8 100644
>>>> --- a/include/uapi/linux/bpf.h
>>>> +++ b/include/uapi/linux/bpf.h
>>>> @@ -87,6 +87,15 @@ enum bpf_map_type {
>>>>  	BPF_MAP_TYPE_PERCPU_ARRAY,
>>>>  	BPF_MAP_TYPE_STACK_TRACE,
>>>>  	BPF_MAP_TYPE_CGROUP_ARRAY,
>>>> +	BPF_MAP_TYPE_LANDLOCK_ARRAY,
>>>> +};
>>>> +
>>>> +enum bpf_map_array_type {
>>>> +	BPF_MAP_ARRAY_TYPE_UNSPEC,
>>>> +};
>>>> +
>>>> +enum bpf_map_handle_type {
>>>> +	BPF_MAP_HANDLE_TYPE_UNSPEC,
>>>>  };
>>>
>>> I'm missing something. Why does it have to be special and have its own
>>> fd array implementation?
>>> Please take a look at how BPF_MAP_TYPE_PERF_EVENT_ARRAY,
>>> BPF_MAP_TYPE_CGROUP_ARRAY and BPF_MAP_TYPE_PROG_ARRAY are done.
>>> They all store objects into an array map that user space passes via FD.
>>> I think the same model should apply here.
>>
>> The idea is to have multiple ways for userland to describe a resource
>> (e.g. an open file descriptor, a path or a glob pattern). The kernel
>> representation could then be a "struct path *" or dedicated types (e.g.
>> custom glob).
> 
> hmm. I think the user space API should only deal with FDs. Everything
> else is user space's job to encapsulate/hide.

How would you create an FD referring to a glob, a user or port ranges,
for example?

> 
>> Another interesting point (that could replace
>> check_map_func_compatibility()) is that BPF_MAP_TYPE_LANDLOCK_ARRAY
>> translates to dedicated (abstract) types (instead of CONST_PTR_TO_MAP)
>> thanks to bpf_reg_type_from_map(). This is useful to abstract userland
>> (map) interface with kernel object(s) dealing with that type.
> 
> I'm probably missing something. If the user space interface is an FD,
> to the kernel they're different object types. Nothing else.

Yes, but what if there is more than one way to express a resource (cf.
previous comment)? An FD can refer to an *existing file* but a glob
pattern could match a bunch of files (existing or not). This was a
concern for Kees Cook and James Morris [1].

[1]
https://lkml.kernel.org/r/CAGXu5jK1U12vMk11HD_x_gNz3Rk4ZgEfdThY7DHvm4e4sPRh4g@mail.gmail.com
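
To make the FD-versus-glob distinction concrete, here is a minimal userspace
sketch. It assumes a tagged handle with a hypothetical glob variant
(HANDLE_TYPE_FS_GLOB and struct resource_handle are illustrative names, not
part of this series) and uses POSIX fnmatch() only as a stand-in for whatever
matching a kernel implementation might do.

```c
#include <fnmatch.h>
#include <stdint.h>

enum handle_type {
	HANDLE_TYPE_FS_FD,   /* mirrors BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD */
	HANDLE_TYPE_FS_GLOB, /* hypothetical: not defined by this series */
};

/* Illustrative stand-in for a userland handle description. */
struct resource_handle {
	uint32_t type;
	union {
		int fd;           /* refers to one existing, open file */
		const char *glob; /* may match files that do not exist yet */
	};
};

/* Return 1 if path matches a glob handle, 0 otherwise. */
static int glob_handle_matches(const struct resource_handle *h,
			       const char *path)
{
	if (h->type != HANDLE_TYPE_FS_GLOB)
		return 0;
	/* FNM_PATHNAME: '*' does not match across '/' */
	return fnmatch(h->glob, path, FNM_PATHNAME) == 0;
}
```

A glob handle matches a path purely by name, so it can cover a file that has
not been created yet, which an open file descriptor cannot express.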

> 
>> A third point is that BPF_MAP_TYPE_LANDLOCK_ARRAY is a kind of set. It
>> is optimized to quickly walk through all the elements in a sequential way.
> 
> why is a set any faster to walk than an array?

It is an array with only sequential entries (i.e. no holes in the array).
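
A minimal userspace sketch of that property, assuming a fixed capacity and
simplified stand-in types (struct handle, struct handle_array and MAX_ENTRIES
are illustrative, not the kernel definitions). Updates may only replace an
existing slot or append at index n_entries, so a walk visits exactly the
filled prefix and never has to test a slot for emptiness.

```c
#include <stdint.h>

#define MAX_ENTRIES 8 /* hypothetical capacity */

struct handle {               /* stand-in for struct map_landlock_handle */
	uint32_t type;
};

struct handle_array {
	uint32_t n_entries;   /* slots [0, n_entries) are always filled */
	struct handle slots[MAX_ENTRIES];
};

/* Replace an existing slot or append; any other index would leave a hole. */
static int handle_array_update(struct handle_array *a, uint32_t index,
			       struct handle h)
{
	if (index > a->n_entries || index >= MAX_ENTRIES)
		return -1;
	a->slots[index] = h;
	if (index == a->n_entries)
		a->n_entries++;
	return 0;
}

/* Walk only the filled prefix and count the handles of a given type. */
static uint32_t handle_array_count_type(const struct handle_array *a,
					uint32_t type)
{
	uint32_t i, hits = 0;

	for (i = 0; i < a->n_entries; i++)
		if (a->slots[i].type == type)
			hits++;
	return hits;
}
```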


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* lsm naming dilemma. Re: [RFC v3 07/22] landlock: Handle file comparisons
  2016-09-15 21:25         ` Mickaël Salaün
@ 2016-09-20  0:12           ` Alexei Starovoitov
  2016-09-20  1:10             ` Sargun Dhillon
  0 siblings, 1 reply; 76+ messages in thread
From: Alexei Starovoitov @ 2016-09-20  0:12 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups

On Thu, Sep 15, 2016 at 11:25:10PM +0200, Mickaël Salaün wrote:
> >> Agreed. With this RFC, the Checmate features (i.e. network helpers)
> >> should be able to sit on top of Landlock.
> > 
> > I think neither of them should be called fancy names for no technical reason.
> > We will have only one bpf based lsm. That's it and it doesn't
> > need an obscure name. Directory name can be security/bpf/..stuff.c
> 
> I disagree on an LSM named "BPF". I first started with the "seccomp LSM"
> name (first RFC) but I later realized that it is confusing because
> seccomp is associated to its syscall and the underlying features. Same
> thing goes for BPF. It is also artificially hard to grep on a name too
> used in the kernel source tree.
> Making an association between the generic eBPF mechanism and a security
> centric approach (i.e. LSM) seems a bit reductive (for BPF). Moreover,
> the seccomp interface [1] can still be used.

agree with above.

> Landlock is a nice name to depict a sandbox as an enclave (i.e. a
> landlocked country/state). I want to keep this name, which is simple,
> express the goal of Landlock nicely and is comparable to other sandbox
> mechanisms as Seatbelt or Pledge.
> Landlock should not be confused with the underlying eBPF implementation.
> Landlock could use more than only eBPF in the future and eBPF could be
> used in other LSM as well.

there will not be two bpf based LSMs.
Therefore unless you can convince Sargun to give up his 'checmate' name,
nothing goes in.
The features you both need are 90% the same, so they must be done
as part of single LSM whatever you both agree to call it.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: lsm naming dilemma. Re: [RFC v3 07/22] landlock: Handle file comparisons
  2016-09-20  0:12           ` lsm naming dilemma. " Alexei Starovoitov
@ 2016-09-20  1:10             ` Sargun Dhillon
  2016-09-20 16:58               ` Mickaël Salaün
  0 siblings, 1 reply; 76+ messages in thread
From: Sargun Dhillon @ 2016-09-20  1:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Mickaël Salaün, LKML, Alexei Starovoitov,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	Daniel Mack, David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, LSM, netdev, cgroups

I'm fine giving up the Checmate name. Landlock seems easy enough to
Google. I haven't gotten a chance to look through the entire patchset
yet, but it does seem like they are somewhat similar.

On Mon, Sep 19, 2016 at 5:12 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Thu, Sep 15, 2016 at 11:25:10PM +0200, Mickaël Salaün wrote:
>> >> Agreed. With this RFC, the Checmate features (i.e. network helpers)
>> >> should be able to sit on top of Landlock.
>> >
>> > I think neither of them should be called fancy names for no technical reason.
>> > We will have only one bpf based lsm. That's it and it doesn't
>> > need an obscure name. Directory name can be security/bpf/..stuff.c
>>
>> I disagree on an LSM named "BPF". I first started with the "seccomp LSM"
>> name (first RFC) but I later realized that it is confusing because
>> seccomp is associated to its syscall and the underlying features. Same
>> thing goes for BPF. It is also artificially hard to grep on a name too
>> used in the kernel source tree.
>> Making an association between the generic eBPF mechanism and a security
>> centric approach (i.e. LSM) seems a bit reductive (for BPF). Moreover,
>> the seccomp interface [1] can still be used.
>
> agree with above.
>
>> Landlock is a nice name to depict a sandbox as an enclave (i.e. a
>> landlocked country/state). I want to keep this name, which is simple,
>> express the goal of Landlock nicely and is comparable to other sandbox
>> mechanisms as Seatbelt or Pledge.
>> Landlock should not be confused with the underlying eBPF implementation.
>> Landlock could use more than only eBPF in the future and eBPF could be
>> used in other LSM as well.
>
> there will not be two bpf based LSMs.
> Therefore unless you can convince Sargun to give up his 'checmate' name,
> nothing goes in.
> The features you both need are 90% the same, so they must be done
> as part of single LSM whatever you both agree to call it.
>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
  2016-09-15 19:41                       ` Mickaël Salaün
@ 2016-09-20  4:37                         ` Sargun Dhillon
  2016-09-20 17:02                           ` Mickaël Salaün
  0 siblings, 1 reply; 76+ messages in thread
From: Sargun Dhillon @ 2016-09-20  4:37 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Alexei Starovoitov, Andy Lutomirski, linux-kernel,
	Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, Linux API, LSM List, Network Development,
	open list:CONTROL GROUP (CGROUP)

On Thu, Sep 15, 2016 at 09:41:33PM +0200, Mickaël Salaün wrote:
> 
> On 15/09/2016 06:48, Alexei Starovoitov wrote:
> > On Wed, Sep 14, 2016 at 09:38:16PM -0700, Andy Lutomirski wrote:
> >> On Wed, Sep 14, 2016 at 9:31 PM, Alexei Starovoitov
> >> <alexei.starovoitov@gmail.com> wrote:
> >>> On Wed, Sep 14, 2016 at 09:08:57PM -0700, Andy Lutomirski wrote:
> >>>> On Wed, Sep 14, 2016 at 9:00 PM, Alexei Starovoitov
> >>>> <alexei.starovoitov@gmail.com> wrote:
> >>>>> On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
> >>>>>>>>>
> >>>>>>>>> This RFC handle both cgroup and seccomp approaches in a similar way. I
> >>>>>>>>> don't see why building on top of cgroup v2 is a problem. Is there
> >>>>>>>>> security issues with delegation?
> >>>>>>>>
> >>>>>>>> What I mean is: cgroup v2 delegation has a functionality problem.
> >>>>>>>> Tejun says [1]:
> >>>>>>>>
> >>>>>>>> We haven't had to face this decision because cgroup has never properly
> >>>>>>>> supported delegating to applications and the in-use setups where this
> >>>>>>>> happens are custom configurations where there is no boundary between
> >>>>>>>> system and applications and adhoc trial-and-error is good enough a way
> >>>>>>>> to find a working solution.  That wiggle room goes away once we
> >>>>>>>> officially open this up to individual applications.
> >>>>>>>>
> >>>>>>>> Unless and until that changes, I think that landlock should stay away
> >>>>>>>> from cgroups.  Others could reasonably disagree with me.
> >>>>>>>
> >>>>>>> Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
> >>>>>>> and not for sandboxing. So the above doesn't matter in such contexts.
> >>>>>>> lsm hooks + cgroups provide convenient scope and existing entry points.
> >>>>>>> Please see checmate examples how it's used.
> >>>>>>>
> >>>>>>
> >>>>>> To be clear: I'm not arguing at all that there shouldn't be
> >>>>>> bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
> >>>>>> landlock interface shouldn't expose any cgroup integration, at least
> >>>>>> until the cgroup situation settles down a lot.
> >>>>>
> >>>>> ahh. yes. we're perfectly in agreement here.
> >>>>> I'm suggesting that the next RFC shouldn't include unpriv
> >>>>> and seccomp at all. Once bpf+lsm+cgroup is merged, we can
> >>>>> argue about unpriv with cgroups and even unpriv as a whole,
> >>>>> since it's not a given. Seccomp integration is also questionable.
> >>>>> I'd rather not have seccomp as a gate keeper for this lsm.
> >>>>> lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
> >>>>> don't have one to one relationship, so mixing them up is only
> >>>>> asking for trouble further down the road.
> >>>>> If we really need to carry some information from seccomp to lsm+bpf,
> >>>>> it's easier to add eBPF support to seccomp and let bpf side deal
> >>>>> with passing whatever information.
> >>>>>
> >>>>
> >>>> As an argument for keeping seccomp (or an extended seccomp) as the
> >>>> interface for an unprivileged bpf+lsm: seccomp already checks off most
> >>>> of the boxes for safely letting unprivileged programs sandbox
> >>>> themselves.
> >>>
> >>> you mean the attach part of seccomp syscall that deals with no_new_priv?
> >>> sure, that's reusable.
> >>>
> >>>> Furthermore, to the extent that there are use cases for
> >>>> unprivileged bpf+lsm that *aren't* expressible within the seccomp
> >>>> hierarchy, I suspect that syscall filters have exactly the same
> >>>> problem and that we should fix seccomp to cover it.
> >>>
> >>> not sure what you mean by 'seccomp hierarchy'. The normal process
> >>> hierarchy ?
> >>
> >> Kind of.  I mean the filter layers that are inherited across fork(),
> >> the TSYNC mechanism, etc.
> >>
> >>> imo the main deficiency of secccomp is inability to look into arguments.
> >>> One can argue that it's a blessing, since composite args
> >>> are not yet copied into the kernel memory.
> >>> But in a lot of cases the seccomp arguments are FDs pointing
> >>> to kernel objects and if programs could examine those objects
> >>> the sandboxing scope would be more precise.
> >>> lsm+bpf solves that part and I'd still argue that it's
> >>> orthogonal to seccomp's pass/reject flow.
> >>> I mean if seccomp says 'ok' the syscall should continue executing
> >>> as normal and whatever LSM hooks were triggered by it may have
> >>> their own lsm+bpf verdicts.
> >>
> >> I agree with all of this...
> >>
> >>> Furthermore in the process hierarchy different children
> >>> should be able to set their own lsm+bpf filters that are not
> >>> related to parallel seccomp+bpf hierarchy of programs.
> >>> seccomp syscall can be an interface to attach programs
> >>> to lsm hooks, but nothing more than that.
> >>
> >> I'm not sure what you mean.  I mean that, logically, I think we should
> >> be able to do:
> >>
> >> seccomp(attach a syscall filter);
> >> fork();
> >> child does seccomp(attach some lsm filters);
> >>
> >> I think that they *should* be related to the seccomp+bpf hierarchy of
> >> programs in that they are entries in the same logical list of filter
> >> layers installed.  Some of those layers can be syscall filters and
> >> some of the layers can be lsm filters.  If we subsequently add a way
> >> to attach a removable seccomp filter or a way to attach a seccomp
> >> filter that logs failures to some fd watched by an outside monitor, I
> >> think that should work for lsm, too, with more or less the same
> >> interface.
> >>
> >> If we need a way for a sandbox manager to opt different children into
> >> different subsets of fancy filters, then I think that syscall filters
> >> and lsm filters should use the same mechanism.
> >>
> >> I think we might be on the same page here and just saying it different ways.
> > 
> > Sounds like it :)
> > All of the above makes sense to me.
> > The 'orthogonal' part is that the user should be able to use
> > this seccomp-managed hierarchy without actually enabling
> > TIF_SECCOMP for the task and syscalls should still go through
> > fast path and all the way till lsm hooks as normal.
> > I don't want to pay _any_ performance penalty for this feature
> > for lsm hooks (and all syscalls) that don't have bpf programs attached.
> 
> Yes, it seems that we are all on the same page here, and that match this
> RFC implementation. So, using the seccomp(2) *interface* to attach
> Landlock programs to a process hierarchy is still on track. :)
> 

So, I'm catching up on this after a little while away. I really like the
simplicity of the approach Daniel took with his patches. I began to have
difficulty reading your patchset once you got into using seccomp + unprivileged
mode. I would love to see a separate patchset that only has the verifier and
LSM hook changes. Do you think you could decompose your patchset into an MVP?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: lsm naming dilemma. Re: [RFC v3 07/22] landlock: Handle file comparisons
  2016-09-20  1:10             ` Sargun Dhillon
@ 2016-09-20 16:58               ` Mickaël Salaün
  0 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-20 16:58 UTC (permalink / raw)
  To: Sargun Dhillon, Alexei Starovoitov
  Cc: LKML, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Serge E . Hallyn, Tejun Heo,
	Will Drewry, kernel-hardening, linux-api, LSM, netdev, cgroups

[-- Attachment #1.1: Type: text/plain, Size: 2137 bytes --]


On 20/09/2016 03:10, Sargun Dhillon wrote:
> I'm fine giving up the Checmate name. Landlock seems easy enough to
> Google. I haven't gotten a chance to look through the entire patchset
> yet, but it does seem like they are somewhat similar.

Excellent! I'm looking forward to your review.


> 
> On Mon, Sep 19, 2016 at 5:12 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> On Thu, Sep 15, 2016 at 11:25:10PM +0200, Mickaël Salaün wrote:
>>>>> Agreed. With this RFC, the Checmate features (i.e. network helpers)
>>>>> should be able to sit on top of Landlock.
>>>>
>>>> I think neither of them should be called fancy names for no technical reason.
>>>> We will have only one bpf based lsm. That's it and it doesn't
>>>> need an obscure name. Directory name can be security/bpf/..stuff.c
>>>
>>> I disagree on an LSM named "BPF". I first started with the "seccomp LSM"
>>> name (first RFC) but I later realized that it is confusing because
>>> seccomp is associated to its syscall and the underlying features. Same
>>> thing goes for BPF. It is also artificially hard to grep on a name too
>>> used in the kernel source tree.
>>> Making an association between the generic eBPF mechanism and a security
>>> centric approach (i.e. LSM) seems a bit reductive (for BPF). Moreover,
>>> the seccomp interface [1] can still be used.
>>
>> agree with above.
>>
>>> Landlock is a nice name to depict a sandbox as an enclave (i.e. a
>>> landlocked country/state). I want to keep this name, which is simple,
>>> express the goal of Landlock nicely and is comparable to other sandbox
>>> mechanisms as Seatbelt or Pledge.
>>> Landlock should not be confused with the underlying eBPF implementation.
>>> Landlock could use more than only eBPF in the future and eBPF could be
>>> used in other LSM as well.
>>
>> there will not be two bpf based LSMs.
>> Therefore unless you can convince Sargun to give up his 'checmate' name,
>> nothing goes in.
>> The features you both need are 90% the same, so they must be done
>> as part of single LSM whatever you both agree to call it.
>>
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
  2016-09-20  4:37                         ` Sargun Dhillon
@ 2016-09-20 17:02                           ` Mickaël Salaün
  0 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-09-20 17:02 UTC (permalink / raw)
  To: Sargun Dhillon
  Cc: Alexei Starovoitov, Andy Lutomirski, linux-kernel,
	Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, Linux API, LSM List, Network Development,
	open list:CONTROL GROUP (CGROUP)

[-- Attachment #1.1: Type: text/plain, Size: 7168 bytes --]


On 20/09/2016 06:37, Sargun Dhillon wrote:
> On Thu, Sep 15, 2016 at 09:41:33PM +0200, Mickaël Salaün wrote:
>>
>> On 15/09/2016 06:48, Alexei Starovoitov wrote:
>>> On Wed, Sep 14, 2016 at 09:38:16PM -0700, Andy Lutomirski wrote:
>>>> On Wed, Sep 14, 2016 at 9:31 PM, Alexei Starovoitov
>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>> On Wed, Sep 14, 2016 at 09:08:57PM -0700, Andy Lutomirski wrote:
>>>>>> On Wed, Sep 14, 2016 at 9:00 PM, Alexei Starovoitov
>>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>>> On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
>>>>>>>>>>>
>>>>>>>>>>> This RFC handle both cgroup and seccomp approaches in a similar way. I
>>>>>>>>>>> don't see why building on top of cgroup v2 is a problem. Is there
>>>>>>>>>>> security issues with delegation?
>>>>>>>>>>
>>>>>>>>>> What I mean is: cgroup v2 delegation has a functionality problem.
>>>>>>>>>> Tejun says [1]:
>>>>>>>>>>
>>>>>>>>>> We haven't had to face this decision because cgroup has never properly
>>>>>>>>>> supported delegating to applications and the in-use setups where this
>>>>>>>>>> happens are custom configurations where there is no boundary between
>>>>>>>>>> system and applications and adhoc trial-and-error is good enough a way
>>>>>>>>>> to find a working solution.  That wiggle room goes away once we
>>>>>>>>>> officially open this up to individual applications.
>>>>>>>>>>
>>>>>>>>>> Unless and until that changes, I think that landlock should stay away
>>>>>>>>>> from cgroups.  Others could reasonably disagree with me.
>>>>>>>>>
>>>>>>>>> Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
>>>>>>>>> and not for sandboxing. So the above doesn't matter in such contexts.
>>>>>>>>> lsm hooks + cgroups provide convenient scope and existing entry points.
>>>>>>>>> Please see checmate examples how it's used.
>>>>>>>>>
>>>>>>>>
>>>>>>>> To be clear: I'm not arguing at all that there shouldn't be
>>>>>>>> bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
>>>>>>>> landlock interface shouldn't expose any cgroup integration, at least
>>>>>>>> until the cgroup situation settles down a lot.
>>>>>>>
>>>>>>> ahh. yes. we're perfectly in agreement here.
>>>>>>> I'm suggesting that the next RFC shouldn't include unpriv
>>>>>>> and seccomp at all. Once bpf+lsm+cgroup is merged, we can
>>>>>>> argue about unpriv with cgroups and even unpriv as a whole,
>>>>>>> since it's not a given. Seccomp integration is also questionable.
>>>>>>> I'd rather not have seccomp as a gate keeper for this lsm.
>>>>>>> lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
>>>>>>> don't have one to one relationship, so mixing them up is only
>>>>>>> asking for trouble further down the road.
>>>>>>> If we really need to carry some information from seccomp to lsm+bpf,
>>>>>>> it's easier to add eBPF support to seccomp and let bpf side deal
>>>>>>> with passing whatever information.
>>>>>>>
>>>>>>
>>>>>> As an argument for keeping seccomp (or an extended seccomp) as the
>>>>>> interface for an unprivileged bpf+lsm: seccomp already checks off most
>>>>>> of the boxes for safely letting unprivileged programs sandbox
>>>>>> themselves.
>>>>>
>>>>> you mean the attach part of seccomp syscall that deals with no_new_priv?
>>>>> sure, that's reusable.
>>>>>
>>>>>> Furthermore, to the extent that there are use cases for
>>>>>> unprivileged bpf+lsm that *aren't* expressible within the seccomp
>>>>>> hierarchy, I suspect that syscall filters have exactly the same
>>>>>> problem and that we should fix seccomp to cover it.
>>>>>
>>>>> not sure what you mean by 'seccomp hierarchy'. The normal process
>>>>> hierarchy ?
>>>>
>>>> Kind of.  I mean the filter layers that are inherited across fork(),
>>>> the TSYNC mechanism, etc.
>>>>
>>>>> imo the main deficiency of secccomp is inability to look into arguments.
>>>>> One can argue that it's a blessing, since composite args
>>>>> are not yet copied into the kernel memory.
>>>>> But in a lot of cases the seccomp arguments are FDs pointing
>>>>> to kernel objects and if programs could examine those objects
>>>>> the sandboxing scope would be more precise.
>>>>> lsm+bpf solves that part and I'd still argue that it's
>>>>> orthogonal to seccomp's pass/reject flow.
>>>>> I mean if seccomp says 'ok' the syscall should continue executing
>>>>> as normal and whatever LSM hooks were triggered by it may have
>>>>> their own lsm+bpf verdicts.
>>>>
>>>> I agree with all of this...
>>>>
>>>>> Furthermore in the process hierarchy different children
>>>>> should be able to set their own lsm+bpf filters that are not
>>>>> related to parallel seccomp+bpf hierarchy of programs.
>>>>> seccomp syscall can be an interface to attach programs
>>>>> to lsm hooks, but nothing more than that.
>>>>
>>>> I'm not sure what you mean.  I mean that, logically, I think we should
>>>> be able to do:
>>>>
>>>> seccomp(attach a syscall filter);
>>>> fork();
>>>> child does seccomp(attach some lsm filters);
>>>>
>>>> I think that they *should* be related to the seccomp+bpf hierarchy of
>>>> programs in that they are entries in the same logical list of filter
>>>> layers installed.  Some of those layers can be syscall filters and
>>>> some of the layers can be lsm filters.  If we subsequently add a way
>>>> to attach a removable seccomp filter or a way to attach a seccomp
>>>> filter that logs failures to some fd watched by an outside monitor, I
>>>> think that should work for lsm, too, with more or less the same
>>>> interface.
>>>>
>>>> If we need a way for a sandbox manager to opt different children into
>>>> different subsets of fancy filters, then I think that syscall filters
>>>> and lsm filters should use the same mechanism.
>>>>
>>>> I think we might be on the same page here and just saying it different ways.
>>>
>>> Sounds like it :)
>>> All of the above makes sense to me.
>>> The 'orthogonal' part is that the user should be able to use
>>> this seccomp-managed hierarchy without actually enabling
>>> TIF_SECCOMP for the task and syscalls should still go through
>>> fast path and all the way till lsm hooks as normal.
>>> I don't want to pay _any_ performance penalty for this feature
>>> for lsm hooks (and all syscalls) that don't have bpf programs attached.
>>
>> Yes, it seems that we are all on the same page here, and that match this
>> RFC implementation. So, using the seccomp(2) *interface* to attach
>> Landlock programs to a process hierarchy is still on track. :)
>>
> 
> So, I'm catching up on this after a little while away. I really like the
> simplicity of the approach Daniel took with his patches. I began to have
> difficulty reading your patchset once you got into using seccomp + unprivileged
> mode. I would love to see a separate patchset that only has the verifier and
> LSM hook changes. Do you think you could decompose your patchset into an MVP?
> 

OK, I'll try to split the common parts from the seccomp part, but there
is already a dedicated patch for the LSM hooks [06/22].


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 07/22] landlock: Handle file comparisons
  2016-09-14  7:24 ` [RFC v3 07/22] landlock: Handle file comparisons Mickaël Salaün
  2016-09-14 19:07   ` Jann Horn
  2016-09-14 21:06   ` Alexei Starovoitov
@ 2016-10-03 23:30   ` Kees Cook
  2 siblings, 0 replies; 76+ messages in thread
From: Kees Cook @ 2016-10-03 23:30 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: LKML, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Tejun Heo, Will Drewry, kernel-hardening, Linux API,
	linux-security-module, Network Development, Cgroups

On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
> Add eBPF functions to compare file system access with a Landlock file
> system handle:
> * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
>   This function compares the dentry, inode, device or mount
>   point of the currently accessed file with a reference handle.
> * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
>   This function allows an eBPF program to check if the currently accessed
>   file is the same as, or in the hierarchy of, a reference handle.
>
> The goal of a file system handle is to abstract kernel objects such as a
> struct file or a struct inode. Userland can create this kind of handle
> thanks to the BPF_MAP_UPDATE_ELEM command. The element is a struct
> landlock_handle containing the handle type (e.g.
> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. This could
> also be any description able to match a struct file or a struct inode
> (e.g. path or glob string).
>
> Changes since v2:
> * add MNT_INTERNAL check to only add file handle from user-visible FS
>   (e.g. no anonymous inode)
> * replace struct file* with struct path* in map_landlock_handle
> * add BPF protos
> * fix bpf_landlock_cmp_fs_prop_with_struct_file()
>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: James Morris <james.l.morris@oracle.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Link: https://lkml.kernel.org/r/CALCETrWwTiz3kZTkEgOW24-DvhQq6LftwEXh77FD2G5o71yD7g@mail.gmail.com
> ---
>  include/linux/bpf.h            |  10 +++
>  include/uapi/linux/bpf.h       |  49 +++++++++++
>  kernel/bpf/arraymap.c          |  21 +++++
>  kernel/bpf/verifier.c          |   8 ++
>  security/landlock/Makefile     |   2 +-
>  security/landlock/checker_fs.c | 179 +++++++++++++++++++++++++++++++++++++++++
>  security/landlock/checker_fs.h |  20 +++++
>  security/landlock/lsm.c        |   6 ++
>  8 files changed, 294 insertions(+), 1 deletion(-)
>  create mode 100644 security/landlock/checker_fs.c
>  create mode 100644 security/landlock/checker_fs.h
> [...]
> diff --git a/security/landlock/checker_fs.c b/security/landlock/checker_fs.c
> new file mode 100644
> index 000000000000..39eb85dc7d18
> --- /dev/null
> +++ b/security/landlock/checker_fs.c
> @@ -0,0 +1,179 @@
> +/*
> + * Landlock LSM - File System Checkers
> + *
> + * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2, as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/bpf.h> /* enum bpf_map_array_op */
> +#include <linux/errno.h>
> +#include <linux/fs.h> /* path_is_under() */
> +#include <linux/path.h> /* struct path */
> +
> +#include "checker_fs.h"
> +
> +#define EQUAL_NOT_NULL(a, b) (a && a == b)
> +
> +/*
> + * bpf_landlock_cmp_fs_prop_with_struct_file
> + *
> + * Cf. include/uapi/linux/bpf.h
> + */
> +static inline u64 bpf_landlock_cmp_fs_prop_with_struct_file(u64 r1_property,
> +               u64 r2_map, u64 r3_map_op, u64 r4_file, u64 r5)
> +{
> +       u8 property = (u8) r1_property;
> +       struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
> +       enum bpf_map_array_op map_op = r3_map_op;
> +       struct file *file = (struct file *) (unsigned long) r4_file;
> +       struct bpf_array *array = container_of(map, struct bpf_array, map);
> +       struct path *p1, *p2;
> +       struct map_landlock_handle *handle;
> +       int i;
> +
> +       /* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS is an arraymap */
> +       if (unlikely(!map)) {
> +               WARN_ON(1);
> +               return -EFAULT;
> +       }

Just some minor style/readability nits...

This is more readable as:

if (WARN_ON(!map))
    return -EFAULT;

(WARN_ON already includes the unlikely() and passes through the test result.)
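
As a rough userspace illustration of that idiom (a sketch only: warn_on_impl()
and check_map() are hypothetical names, and the kernel's real WARN_ON also
wraps its result in unlikely() and dumps a backtrace, which is omitted here):

```c
#include <errno.h>
#include <stdio.h>

/*
 * Userspace stand-in for the kernel's WARN_ON(): report the condition
 * and hand its truth value back to the caller.
 */
static int warn_on_impl(int cond, const char *expr, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	return cond;
}

#define WARN_ON(cond) warn_on_impl(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical checker standing in for the map validation in the patch. */
static int check_map(const void *map)
{
	/* The test and the warning collapse into a single condition. */
	if (WARN_ON(!map))
		return -EFAULT;
	return 0;
}
```

Because the macro passes the test result through, the separate braces and the
explicit WARN_ON(1) are no longer needed.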

> +       if (unlikely(!file))
> +               return -ENOENT;
> +       if (unlikely((property | _LANDLOCK_FLAG_FS_MASK) != _LANDLOCK_FLAG_FS_MASK))
> +               return -EINVAL;
> +
> +       /* for now, only handle OP_OR */
> +       switch (map_op) {
> +       case BPF_MAP_ARRAY_OP_OR:
> +               break;
> +       case BPF_MAP_ARRAY_OP_UNSPEC:
> +       case BPF_MAP_ARRAY_OP_AND:
> +       case BPF_MAP_ARRAY_OP_XOR:
> +       default:
> +               return -EINVAL;
> +       }
> +       p2 = &file->f_path;
> +
> +       synchronize_rcu();
> +
> +       for (i = 0; i < array->n_entries; i++) {
> +               bool result_dentry = !(property & LANDLOCK_FLAG_FS_DENTRY);
> +               bool result_inode = !(property & LANDLOCK_FLAG_FS_INODE);
> +               bool result_device = !(property & LANDLOCK_FLAG_FS_DEVICE);
> +               bool result_mount = !(property & LANDLOCK_FLAG_FS_MOUNT);
> +
> +               handle = (struct map_landlock_handle *)
> +                               (array->value + array->elem_size * i);
> +
> +               if (handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) {
> +                       WARN_ON(1);
> +                       return -EFAULT;
> +               }

Same here... and in the other function (much of which seems to repeat
-- can some of these checks be put into common functions?)
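
One possible shape for such common functions, as a standalone sketch: every
name below (check_fs_args(), check_handle(), FLAG_FS_MASK, the enums) is a
hypothetical stand-in for the patch's types, not the kernel definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the patch's types and flags. */
enum handle_type { HANDLE_TYPE_UNSPEC = 0, HANDLE_TYPE_FS_FD = 1 };
enum array_op { OP_UNSPEC = 0, OP_OR, OP_AND, OP_XOR };
#define FLAG_FS_MASK 0x0fu

struct handle { enum handle_type type; };

/* Argument checks that both comparison helpers would otherwise repeat. */
static int check_fs_args(const void *map, const void *obj,
			 unsigned int property, enum array_op op)
{
	if (!map)
		return -EFAULT;
	if (!obj)
		return -ENOENT;
	/* Reject any property bit outside the known mask. */
	if ((property | FLAG_FS_MASK) != FLAG_FS_MASK)
		return -EINVAL;
	/* Only OR is handled for now, as in the patch. */
	if (op != OP_OR)
		return -EINVAL;
	return 0;
}

/* Per-entry check: the stored handle must have the expected type. */
static int check_handle(const struct handle *h, enum handle_type expected)
{
	return (h && h->type == expected) ? 0 : -EFAULT;
}
```

Each helper in the patch could then start with a single call to the shared
check and only keep its type-specific logic.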

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 16/22] bpf/cgroup,landlock: Handle Landlock hooks per cgroup
  2016-09-14  7:24 ` [RFC v3 16/22] bpf/cgroup,landlock: Handle Landlock hooks per cgroup Mickaël Salaün
@ 2016-10-03 23:43   ` Kees Cook
  2016-10-05 20:58     ` Mickaël Salaün
  0 siblings, 1 reply; 76+ messages in thread
From: Kees Cook @ 2016-10-03 23:43 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: LKML, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Tejun Heo, Will Drewry, kernel-hardening, Linux API,
	linux-security-module, Network Development, Cgroups

On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
> This allows adding new eBPF programs to Landlock hooks dedicated to a
> cgroup thanks to the BPF_PROG_ATTACH command. As with socket eBPF
> programs, the Landlock hooks attached to a cgroup are propagated to the
> nested cgroups. However, when a new Landlock program is attached to one
> of these nested cgroups, that cgroup hierarchy forks the Landlock hooks.
> This design is simple and matches the current CONFIG_BPF_CGROUP
> inheritance. The difference lies in the fact that Landlock programs can
> only be stacked, not removed. This matches the append-only seccomp
> behavior. Userland is free to handle Landlock hooks attached to a cgroup
> in more complicated ways (e.g. continuous inheritance), but care should
> be taken to properly handle error cases (e.g. memory allocation errors).
>
> Changes since v2:
> * new design based on BPF_PROG_ATTACH (suggested by Alexei Starovoitov)
>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Daniel Mack <daniel@zonque.org>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Tejun Heo <tj@kernel.org>
> Link: https://lkml.kernel.org/r/20160826021432.GA8291@ast-mbp.thefacebook.com
> Link: https://lkml.kernel.org/r/20160827204307.GA43714@ast-mbp.thefacebook.com
> ---
>  include/linux/bpf-cgroup.h  |  7 +++++++
>  include/linux/cgroup-defs.h |  2 ++
>  include/linux/landlock.h    |  9 +++++++++
>  include/uapi/linux/bpf.h    |  1 +
>  kernel/bpf/cgroup.c         | 33 ++++++++++++++++++++++++++++++---
>  kernel/bpf/syscall.c        | 11 +++++++++++
>  security/landlock/lsm.c     | 40 +++++++++++++++++++++++++++++++++++++++-
>  security/landlock/manager.c | 32 ++++++++++++++++++++++++++++++++
>  8 files changed, 131 insertions(+), 4 deletions(-)
>
> [...]
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 7b75fa692617..1c18fe46958a 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -15,6 +15,7 @@
>  #include <linux/bpf.h>
>  #include <linux/bpf-cgroup.h>
>  #include <net/sock.h>
> +#include <linux/landlock.h>
>
>  DEFINE_STATIC_KEY_FALSE(cgroup_bpf_enabled_key);
>  EXPORT_SYMBOL(cgroup_bpf_enabled_key);
> @@ -31,7 +32,15 @@ void cgroup_bpf_put(struct cgroup *cgrp)
>                 union bpf_object pinned = cgrp->bpf.pinned[type];
>
>                 if (pinned.prog) {
> -                       bpf_prog_put(pinned.prog);
> +                       switch (type) {
> +                       case BPF_CGROUP_LANDLOCK:
> +#ifdef CONFIG_SECURITY_LANDLOCK
> +                               put_landlock_hooks(pinned.hooks);
> +                               break;
> +#endif /* CONFIG_SECURITY_LANDLOCK */
> +                       default:
> +                               bpf_prog_put(pinned.prog);
> +                       }
>                         static_branch_dec(&cgroup_bpf_enabled_key);
>                 }
>         }

I get creeped out by type-controlled unions of pointers. :P I don't
have a suggestion to improve this, but I don't like seeing a pointer
type managed separately from the pointer itself as it tends to bypass
a lot of both static and dynamic checking. A union is better than a
cast of void *, but it still worries me. :)

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 19/22] landlock: Add interrupted origin
  2016-09-15  1:19       ` Andy Lutomirski
@ 2016-10-03 23:46         ` Kees Cook
  2016-10-05 21:01           ` Mickaël Salaün
  0 siblings, 1 reply; 76+ messages in thread
From: Kees Cook @ 2016-10-03 23:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mickaël Salaün, linux-kernel, Alexei Starovoitov,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	Linux API, LSM List, Network Development,
	open list:CONTROL GROUP (CGROUP)

On Wed, Sep 14, 2016 at 6:19 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Wed, Sep 14, 2016 at 3:14 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>
>> On 14/09/2016 20:29, Andy Lutomirski wrote:
>>> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>>> This third origin of hook call should cover all possible trigger paths
>>>> (e.g. page fault). Landlock eBPF programs can then take decisions
>>>> accordingly.
>>>>
>>>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>>>> Cc: Alexei Starovoitov <ast@kernel.org>
>>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>>>> Cc: Kees Cook <keescook@chromium.org>
>>>> ---
>>>
>>>
>>>>
>>>> +       if (unlikely(in_interrupt())) {
>>>
>>> IMO security hooks have no business being called from interrupts.
>>> Aren't they all synchronous things done by tasks?  Interrupts are
>>> driver things.
>>>
>>> Are you trying to check for page faults and such?
>>
>> Yes, that was the idea you put in my mind. I'm not sure how to deal
>> with this.
>>
>
> It's not so easy, unfortunately.  The easiest reliable way might be to
> set a TS_ flag on all syscall entries when TIF_SECCOMP or similar is
> set.

To keep this series smaller, let's leave the idea of interrupt
hooks out -- the intention is for stricter syscall filtering, yes?

Once things are better established and there's a use-case for this,
it can be added back in.

-Kees


-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 11/22] seccomp,landlock: Handle Landlock hooks per process hierarchy
  2016-09-14 22:34     ` Mickaël Salaün
@ 2016-10-03 23:52       ` Kees Cook
  2016-10-05 21:05         ` Mickaël Salaün
  0 siblings, 1 reply; 76+ messages in thread
From: Kees Cook @ 2016-10-03 23:52 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Andy Lutomirski, linux-kernel, Alexei Starovoitov, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Tejun Heo, Will Drewry, kernel-hardening, Linux API, LSM List,
	Network Development, open list:CONTROL GROUP (CGROUP),
	Andrew Morton

On Wed, Sep 14, 2016 at 3:34 PM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On 14/09/2016 20:43, Andy Lutomirski wrote:
>> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>> A Landlock program will be triggered according to its subtype/origin
>>> bitfield. The LANDLOCK_FLAG_ORIGIN_SECCOMP value will trigger the
>>> Landlock program when a seccomp filter returns RET_LANDLOCK.
>>> Moreover, it is possible to return a 16-bit cookie which will be
>>> readable by the Landlock programs in their context.
>>
>> Are you envisioning that the filters will return RET_LANDLOCK most of
>> the time or rarely?  If it's most of the time, then maybe this could
>> be simplified a bit by unconditionally calling the landlock filter and
>> letting the landlock filter access a struct seccomp_data if needed.
>
> Exposing seccomp_data in a Landlock context may be a good idea. The main
> implication is that Landlock programs may then be architecture specific
> (if dealing with data) as seccomp filters are. Another point is that it
> removes any direct binding between seccomp filters and Landlock programs.
> I will try this (simpler) approach.

Yeah, I would prefer that the seccomp code isn't doing list management
to identify the landlock hooks to trigger, etc. I think that's better
done on the LSM side. And since multiple seccomp filters could trigger
landlock, it may be best to just leave the low 16 bits unused
entirely. Then all state management is handled by the landlock eBPF
maps, not a value coming from seccomp that can get stomped on by new
filters, etc.
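
To make the "low 16 bits" concrete, here is a small sketch of how a 32-bit
filter return value splits into an action and a data half. The 16-bit data
mask matches seccomp's SECCOMP_RET_DATA; the action mask is assumed to be its
complement, and RET_LANDLOCK_EXAMPLE is a made-up value, since the RFC's
actual RET_LANDLOCK number is not shown in this thread.

```c
#include <stdint.h>

/* Action in the high half, 16 bits of data (the "cookie") in the low half. */
#define RET_ACTION_MASK 0xffff0000u
#define RET_DATA_MASK   0x0000ffffu

/* Hypothetical action value, for illustration only. */
#define RET_LANDLOCK_EXAMPLE 0x00020000u

static uint32_t ret_action(uint32_t ret)
{
	return ret & RET_ACTION_MASK;
}

static uint16_t ret_cookie(uint32_t ret)
{
	return (uint16_t)(ret & RET_DATA_MASK);
}

/*
 * With the cookie left unused, the LSM side only needs to know whether
 * Landlock was requested; whatever sits in the low bits is ignored.
 */
static int wants_landlock(uint32_t ret)
{
	return ret_action(ret) == RET_LANDLOCK_EXAMPLE;
}
```

Since several stacked filters produce a single combined return value, a cookie
in the low bits can be overwritten by a later filter, which is why keeping the
state in the Landlock eBPF maps is the more robust option.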

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 03/22] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles
  2016-09-14  7:23 ` [RFC v3 03/22] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles Mickaël Salaün
  2016-09-14 18:51   ` Alexei Starovoitov
@ 2016-10-03 23:53   ` Kees Cook
  2016-10-05 22:02     ` Mickaël Salaün
  1 sibling, 1 reply; 76+ messages in thread
From: Kees Cook @ 2016-10-03 23:53 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: LKML, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Tejun Heo, Will Drewry, kernel-hardening, Linux API,
	linux-security-module, Network Development, Cgroups

On Wed, Sep 14, 2016 at 12:23 AM, Mickaël Salaün <mic@digikod.net> wrote:
> This new arraymap looks like a set and brings new properties:
> * strong typing of entries: the eBPF functions get the array type of
>   elements instead of CONST_PTR_TO_MAP (e.g.
>   CONST_PTR_TO_LANDLOCK_HANDLE_FS);
> * forced sequential filling (i.e. replace or append-only update), which
>   allows quick browsing of all entries.
>
> This strong typing is useful to statically check if the content of a map
> can be passed to an eBPF function. For example, Landlock uses it to store
> and manage kernel objects (e.g. struct file) instead of dealing with
> userland raw data. This improves efficiency and ensures that an eBPF
> program can only call functions with the right high-level arguments.
>
> The enum bpf_map_handle_type lists low-level types (e.g.
> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) which are identified when
> updating a map entry (handle). These handle types are used to infer a
> high-level arraymap type, which is listed in enum bpf_map_array_type
> (e.g. BPF_MAP_ARRAY_TYPE_LANDLOCK_FS).
>
> For now, this new arraymap is only used by Landlock LSM (cf. next
> commits) but it could be useful for other needs.
>
> Changes since v2:
> * add a RLIMIT_NOFILE-based limit to the maximum number of arraymap
>   handle entries (suggested by Andy Lutomirski)
> * remove useless checks
>
> Changes since v1:
> * arraymap of handles replace custom checker groups
> * simpler userland API
>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Kees Cook <keescook@chromium.org>
> Link: https://lkml.kernel.org/r/CALCETrWwTiz3kZTkEgOW24-DvhQq6LftwEXh77FD2G5o71yD7g@mail.gmail.com
> ---
>  include/linux/bpf.h      |  14 ++++
>  include/uapi/linux/bpf.h |  18 +++++
>  kernel/bpf/arraymap.c    | 203 +++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/bpf/verifier.c    |  12 ++-
>  4 files changed, 246 insertions(+), 1 deletion(-)
>
> [...]
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index a2ac051c342f..94256597eacd 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> [...]
> +       /*
> +        * Limit number of entries in an arraymap of handles to the maximum
> +        * number of open files for the current process. The maximum number of
> +        * handle entries (including all arraymaps) for a process is then
> +        * (RLIMIT_NOFILE - 1) * RLIMIT_NOFILE. If the process' RLIMIT_NOFILE
> +        * is 0, then any entry update is forbidden.
> +        *
> +        * An eBPF program can inherit all the arraymap FD. The worse case is
> +        * to fill a bunch of arraymaps, create an eBPF program, close the
> +        * arraymap FDs, and start again. The maximum number of arraymap
> +        * entries can then be close to RLIMIT_NOFILE^3.
> +        *
> +        * FIXME: This should be improved... any idea?
> +        */
> +       if (unlikely(index >= rlimit(RLIMIT_NOFILE)))
> +               return -EMFILE;

I'm not sure what's best for resource management here. Landlock will
be holding open path structs, for example, but how are you expecting
to track things like network policies? An allowed IP address, for
example, doesn't have a handle outside of doing a full
socket()/connect() setup.

I think an explicit design for resource management should be
considered up front...
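
For reference, the bound in the hunk above amounts to the following userspace
sketch. check_handle_index() and current_nofile_limit() are hypothetical
names; the kernel's rlimit() reads the soft limit of the current task, which
getrlimit() exposes here as rlim_cur.

```c
#include <errno.h>
#include <sys/resource.h>

/* Mirrors the patch's bound: an index at or past the limit is refused. */
static int check_handle_index(unsigned long index, unsigned long nofile_limit)
{
	if (index >= nofile_limit)
		return -EMFILE;
	return 0;
}

/* Read the caller's soft RLIMIT_NOFILE; returns 0 if it cannot be read. */
static unsigned long current_nofile_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return 0;
	return (unsigned long)rl.rlim_cur;
}
```

Note that this only caps the size of one arraymap. It says nothing about
objects that have no file descriptor behind them, which is the gap raised
above for network policies.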

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 16/22] bpf/cgroup,landlock: Handle Landlock hooks per cgroup
  2016-10-03 23:43   ` Kees Cook
@ 2016-10-05 20:58     ` Mickaël Salaün
  2016-10-05 21:25       ` Kees Cook
  0 siblings, 1 reply; 76+ messages in thread
From: Mickaël Salaün @ 2016-10-05 20:58 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Tejun Heo, Will Drewry, kernel-hardening, Linux API,
	linux-security-module, Network Development, Cgroups

On 04/10/2016 01:43, Kees Cook wrote:
> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
>> This allows adding new eBPF programs to Landlock hooks dedicated to a
>> cgroup thanks to the BPF_PROG_ATTACH command. As with socket eBPF
>> programs, the Landlock hooks attached to a cgroup are propagated to the
>> nested cgroups. However, when a new Landlock program is attached to one
>> of these nested cgroups, that cgroup hierarchy forks the Landlock hooks.
>> This design is simple and matches the current CONFIG_BPF_CGROUP
>> inheritance. The difference lies in the fact that Landlock programs can
>> only be stacked, not removed. This matches the append-only seccomp
>> behavior. Userland is free to handle Landlock hooks attached to a cgroup
>> in more complicated ways (e.g. continuous inheritance), but care should
>> be taken to properly handle error cases (e.g. memory allocation errors).
>>
>> Changes since v2:
>> * new design based on BPF_PROG_ATTACH (suggested by Alexei Starovoitov)
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> Cc: Alexei Starovoitov <ast@kernel.org>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>> Cc: Daniel Mack <daniel@zonque.org>
>> Cc: David S. Miller <davem@davemloft.net>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Tejun Heo <tj@kernel.org>
>> Link: https://lkml.kernel.org/r/20160826021432.GA8291@ast-mbp.thefacebook.com
>> Link: https://lkml.kernel.org/r/20160827204307.GA43714@ast-mbp.thefacebook.com
>> ---
>>  include/linux/bpf-cgroup.h  |  7 +++++++
>>  include/linux/cgroup-defs.h |  2 ++
>>  include/linux/landlock.h    |  9 +++++++++
>>  include/uapi/linux/bpf.h    |  1 +
>>  kernel/bpf/cgroup.c         | 33 ++++++++++++++++++++++++++++++---
>>  kernel/bpf/syscall.c        | 11 +++++++++++
>>  security/landlock/lsm.c     | 40 +++++++++++++++++++++++++++++++++++++++-
>>  security/landlock/manager.c | 32 ++++++++++++++++++++++++++++++++
>>  8 files changed, 131 insertions(+), 4 deletions(-)
>>
>> [...]
>> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
>> index 7b75fa692617..1c18fe46958a 100644
>> --- a/kernel/bpf/cgroup.c
>> +++ b/kernel/bpf/cgroup.c
>> @@ -15,6 +15,7 @@
>>  #include <linux/bpf.h>
>>  #include <linux/bpf-cgroup.h>
>>  #include <net/sock.h>
>> +#include <linux/landlock.h>
>>
>>  DEFINE_STATIC_KEY_FALSE(cgroup_bpf_enabled_key);
>>  EXPORT_SYMBOL(cgroup_bpf_enabled_key);
>> @@ -31,7 +32,15 @@ void cgroup_bpf_put(struct cgroup *cgrp)
>>                 union bpf_object pinned = cgrp->bpf.pinned[type];
>>
>>                 if (pinned.prog) {
>> -                       bpf_prog_put(pinned.prog);
>> +                       switch (type) {
>> +                       case BPF_CGROUP_LANDLOCK:
>> +#ifdef CONFIG_SECURITY_LANDLOCK
>> +                               put_landlock_hooks(pinned.hooks);
>> +                               break;
>> +#endif /* CONFIG_SECURITY_LANDLOCK */
>> +                       default:
>> +                               bpf_prog_put(pinned.prog);
>> +                       }
>>                         static_branch_dec(&cgroup_bpf_enabled_key);
>>                 }
>>         }
> 
> I get creeped out by type-controlled unions of pointers. :P I don't
> have a suggestion to improve this, but I don't like seeing a pointer
> type managed separately from the pointer itself as it tends to bypass
> a lot of both static and dynamic checking. A union is better than a
> cast of void *, but it still worries me. :)

This is not fully satisfactory for me either, but the other approach is
to use two distinct struct fields instead of a union.
Would you prefer a "type" field in the "pinned" struct to select the
union member?

 Mickaël


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 19/22] landlock: Add interrupted origin
  2016-10-03 23:46         ` Kees Cook
@ 2016-10-05 21:01           ` Mickaël Salaün
  0 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-10-05 21:01 UTC (permalink / raw)
  To: Kees Cook, Andy Lutomirski
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening, Linux API, LSM List, Network Development,
	open list:CONTROL GROUP (CGROUP)

On 04/10/2016 01:46, Kees Cook wrote:
> On Wed, Sep 14, 2016 at 6:19 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Wed, Sep 14, 2016 at 3:14 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>>
>>> On 14/09/2016 20:29, Andy Lutomirski wrote:
>>>> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>>>> This third origin of hook call should cover all possible trigger paths
>>>>> (e.g. page fault). Landlock eBPF programs can then take decisions
>>>>> accordingly.
>>>>>
>>>>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>>>>> Cc: Alexei Starovoitov <ast@kernel.org>
>>>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>>>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>>>>> Cc: Kees Cook <keescook@chromium.org>
>>>>> ---
>>>>
>>>>
>>>>>
>>>>> +       if (unlikely(in_interrupt())) {
>>>>
>>>> IMO security hooks have no business being called from interrupts.
>>>> Aren't they all synchronous things done by tasks?  Interrupts are
>>>> driver things.
>>>>
>>>> Are you trying to check for page faults and such?
>>>
>>> Yes, that was the idea you put in my mind. I'm not sure how to deal
>>> with this.
>>>
>>
>> It's not so easy, unfortunately.  The easiest reliable way might be to
>> set a TS_ flag on all syscall entries when TIF_SECCOMP or similar is
>> set.
> 
> To keep this series smaller, let's leave the idea of interrupt
> hooks out -- the intention is for stricter syscall filtering, yes?
> 
> Once things are better established and there's a use-case for this,
> it can be added back in.

Right, I'm no longer convinced it's worth it.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 11/22] seccomp,landlock: Handle Landlock hooks per process hierarchy
  2016-10-03 23:52       ` Kees Cook
@ 2016-10-05 21:05         ` Mickaël Salaün
  0 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-10-05 21:05 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, linux-kernel, Alexei Starovoitov, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Tejun Heo, Will Drewry, kernel-hardening, Linux API, LSM List,
	Network Development, open list:CONTROL GROUP (CGROUP),
	Andrew Morton

On 04/10/2016 01:52, Kees Cook wrote:
> On Wed, Sep 14, 2016 at 3:34 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>
>> On 14/09/2016 20:43, Andy Lutomirski wrote:
>>> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>>> A Landlock program will be triggered according to its subtype/origin
>>>> bitfield. The LANDLOCK_FLAG_ORIGIN_SECCOMP value will trigger the
>>>> Landlock program when a seccomp filter returns RET_LANDLOCK.
>>>> Moreover, it is possible to return a 16-bit cookie which will be
>>>> readable by the Landlock programs in their context.
>>>
>>> Are you envisioning that the filters will return RET_LANDLOCK most of
>>> the time or rarely?  If it's most of the time, then maybe this could
>>> be simplified a bit by unconditionally calling the landlock filter and
>>> letting the landlock filter access a struct seccomp_data if needed.
>>
>> Exposing seccomp_data in a Landlock context may be a good idea. The main
>> implication is that Landlock programs may then be architecture specific
>> (if dealing with data) as seccomp filters are. Another point is that it
>> removes any direct binding between seccomp filters and Landlock programs.
>> I will try this (simpler) approach.
> 
> Yeah, I would prefer that the seccomp code isn't doing list management
> to identify the landlock hooks to trigger, etc. I think that's better
> done on the LSM side. And since multiple seccomp filters could trigger
> landlock, it may be best to just leave the low 16 bits unused
> entirely. Then all state management is handled by the landlock eBPF
> maps, not a value coming from seccomp that can get stomped on by new
> filters, etc.

Right, this approach should be simpler, more efficient and independent
from seccomp. This will be in the next RFC.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 16/22] bpf/cgroup,landlock: Handle Landlock hooks per cgroup
  2016-10-05 20:58     ` Mickaël Salaün
@ 2016-10-05 21:25       ` Kees Cook
  0 siblings, 0 replies; 76+ messages in thread
From: Kees Cook @ 2016-10-05 21:25 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: LKML, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Tejun Heo, Will Drewry, kernel-hardening, Linux API,
	linux-security-module, Network Development, Cgroups

On Wed, Oct 5, 2016 at 1:58 PM, Mickaël Salaün <mic@digikod.net> wrote:
>
>
> On 04/10/2016 01:43, Kees Cook wrote:
>> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>> This allows adding new eBPF programs to Landlock hooks dedicated to a
>>> cgroup thanks to the BPF_PROG_ATTACH command. As with socket eBPF
>>> programs, the Landlock hooks attached to a cgroup are propagated to the
>>> nested cgroups. However, when a new Landlock program is attached to one
>>> of these nested cgroups, that cgroup hierarchy forks the Landlock hooks.
>>> This design is simple and matches the current CONFIG_BPF_CGROUP
>>> inheritance. The difference lies in the fact that Landlock programs can
>>> only be stacked, not removed. This matches the append-only seccomp
>>> behavior. Userland is free to handle Landlock hooks attached to a cgroup
>>> in more complicated ways (e.g. continuous inheritance), but care should
>>> be taken to properly handle error cases (e.g. memory allocation errors).
>>>
>>> Changes since v2:
>>> * new design based on BPF_PROG_ATTACH (suggested by Alexei Starovoitov)
>>>
>>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>>> Cc: Alexei Starovoitov <ast@kernel.org>
>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>>> Cc: Daniel Mack <daniel@zonque.org>
>>> Cc: David S. Miller <davem@davemloft.net>
>>> Cc: Kees Cook <keescook@chromium.org>
>>> Cc: Tejun Heo <tj@kernel.org>
>>> Link: https://lkml.kernel.org/r/20160826021432.GA8291@ast-mbp.thefacebook.com
>>> Link: https://lkml.kernel.org/r/20160827204307.GA43714@ast-mbp.thefacebook.com
>>> ---
>>>  include/linux/bpf-cgroup.h  |  7 +++++++
>>>  include/linux/cgroup-defs.h |  2 ++
>>>  include/linux/landlock.h    |  9 +++++++++
>>>  include/uapi/linux/bpf.h    |  1 +
>>>  kernel/bpf/cgroup.c         | 33 ++++++++++++++++++++++++++++++---
>>>  kernel/bpf/syscall.c        | 11 +++++++++++
>>>  security/landlock/lsm.c     | 40 +++++++++++++++++++++++++++++++++++++++-
>>>  security/landlock/manager.c | 32 ++++++++++++++++++++++++++++++++
>>>  8 files changed, 131 insertions(+), 4 deletions(-)
>>>
>>> [...]
>>> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
>>> index 7b75fa692617..1c18fe46958a 100644
>>> --- a/kernel/bpf/cgroup.c
>>> +++ b/kernel/bpf/cgroup.c
>>> @@ -15,6 +15,7 @@
>>>  #include <linux/bpf.h>
>>>  #include <linux/bpf-cgroup.h>
>>>  #include <net/sock.h>
>>> +#include <linux/landlock.h>
>>>
>>>  DEFINE_STATIC_KEY_FALSE(cgroup_bpf_enabled_key);
>>>  EXPORT_SYMBOL(cgroup_bpf_enabled_key);
>>> @@ -31,7 +32,15 @@ void cgroup_bpf_put(struct cgroup *cgrp)
>>>                 union bpf_object pinned = cgrp->bpf.pinned[type];
>>>
>>>                 if (pinned.prog) {
>>> -                       bpf_prog_put(pinned.prog);
>>> +                       switch (type) {
>>> +                       case BPF_CGROUP_LANDLOCK:
>>> +#ifdef CONFIG_SECURITY_LANDLOCK
>>> +                               put_landlock_hooks(pinned.hooks);
>>> +                               break;
>>> +#endif /* CONFIG_SECURITY_LANDLOCK */
>>> +                       default:
>>> +                               bpf_prog_put(pinned.prog);
>>> +                       }
>>>                         static_branch_dec(&cgroup_bpf_enabled_key);
>>>                 }
>>>         }
>>
>> I get creeped out by type-controlled unions of pointers. :P I don't
>> have a suggestion to improve this, but I don't like seeing a pointer
>> type managed separately from the pointer itself as it tends to bypass
>> a lot of both static and dynamic checking. A union is better than a
>> cast of void *, but it still worries me. :)
>
> This is not fully satisfactory for me either, but the other approach is
> to use two distinct struct fields instead of a union.
> Would you prefer a "type" field in the "pinned" struct to select the
> union member?

Since memory usage isn't a huge deal for this, I'd actually prefer
there just be no union at all. Have a type field, and a distinct
pointer field for each type you're expecting to use. That way there
can never be confusion between types and you could even validate that
only a single field type has been populated, etc.
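
A minimal sketch of that layout, with made-up names throughout (struct pinned,
struct prog and struct hooks stand in for the cgroup slot, struct bpf_prog and
the Landlock hooks; the refcount fields only mimic the put operations):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bpf_prog and the Landlock hooks. */
struct prog { int refcount; };
struct hooks { int refcount; };

enum pinned_type { PINNED_NONE = 0, PINNED_PROG, PINNED_LANDLOCK };

/* One pointer field per type instead of a union keyed by an outside index. */
struct pinned {
	enum pinned_type type;
	struct prog *prog;
	struct hooks *hooks;
};

/* True only if exactly the field named by the type tag is populated. */
static bool pinned_is_valid(const struct pinned *p)
{
	switch (p->type) {
	case PINNED_NONE:
		return !p->prog && !p->hooks;
	case PINNED_PROG:
		return p->prog && !p->hooks;
	case PINNED_LANDLOCK:
		return p->hooks && !p->prog;
	}
	return false;
}

/* Drop the reference held by the slot; returns -1 on an inconsistent slot. */
static int pinned_put(struct pinned *p)
{
	if (!pinned_is_valid(p))
		return -1;
	if (p->type == PINNED_PROG)
		p->prog->refcount--;
	else if (p->type == PINNED_LANDLOCK)
		p->hooks->refcount--;
	p->type = PINNED_NONE;
	p->prog = NULL;
	p->hooks = NULL;
	return 0;
}
```

The cost is one extra pointer per slot, and in exchange a mismatch between the
tag and the populated field can be detected instead of silently reinterpreted.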

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC v3 03/22] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles
  2016-10-03 23:53   ` Kees Cook
@ 2016-10-05 22:02     ` Mickaël Salaün
  0 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-10-05 22:02 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Tejun Heo, Will Drewry, kernel-hardening, Linux API,
	linux-security-module, Network Development, Cgroups

On 04/10/2016 01:53, Kees Cook wrote:
> On Wed, Sep 14, 2016 at 12:23 AM, Mickaël Salaün <mic@digikod.net> wrote:
>> This new arraymap looks like a set and brings new properties:
>> * strong typing of entries: the eBPF functions get the array type of
>>   elements instead of CONST_PTR_TO_MAP (e.g.
>>   CONST_PTR_TO_LANDLOCK_HANDLE_FS);
>> * forced sequential filling (i.e. replace or append-only update), which
>>   allows quick browsing of all entries.
>>
>> This strong typing is useful to statically check if the content of a map
>> can be passed to an eBPF function. For example, Landlock uses it to store
>> and manage kernel objects (e.g. struct file) instead of dealing with
>> userland raw data. This improves efficiency and ensures that an eBPF
>> program can only call functions with the right high-level arguments.
>>
>> The enum bpf_map_handle_type lists low-level types (e.g.
>> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) which are identified when
>> updating a map entry (handle). These handle types are used to infer a
>> high-level arraymap type, which is listed in enum bpf_map_array_type
>> (e.g. BPF_MAP_ARRAY_TYPE_LANDLOCK_FS).
>>
>> For now, this new arraymap is only used by Landlock LSM (cf. next
>> commits) but it could be useful for other needs.
>>
>> Changes since v2:
>> * add a RLIMIT_NOFILE-based limit to the maximum number of arraymap
>>   handle entries (suggested by Andy Lutomirski)
>> * remove useless checks
>>
>> Changes since v1:
>> * arraymap of handles replace custom checker groups
>> * simpler userland API
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> Cc: Alexei Starovoitov <ast@kernel.org>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>> Cc: David S. Miller <davem@davemloft.net>
>> Cc: Kees Cook <keescook@chromium.org>
>> Link: https://lkml.kernel.org/r/CALCETrWwTiz3kZTkEgOW24-DvhQq6LftwEXh77FD2G5o71yD7g@mail.gmail.com
>> ---
>>  include/linux/bpf.h      |  14 ++++
>>  include/uapi/linux/bpf.h |  18 +++++
>>  kernel/bpf/arraymap.c    | 203 +++++++++++++++++++++++++++++++++++++++++++++++
>>  kernel/bpf/verifier.c    |  12 ++-
>>  4 files changed, 246 insertions(+), 1 deletion(-)
>>
>> [...]
>> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
>> index a2ac051c342f..94256597eacd 100644
>> --- a/kernel/bpf/arraymap.c
>> +++ b/kernel/bpf/arraymap.c
>> [...]
>> +       /*
>> +        * Limit number of entries in an arraymap of handles to the maximum
>> +        * number of open files for the current process. The maximum number of
>> +        * handle entries (including all arraymaps) for a process is then
>> +        * (RLIMIT_NOFILE - 1) * RLIMIT_NOFILE. If the process' RLIMIT_NOFILE
>> +        * is 0, then any entry update is forbidden.
>> +        *
>> +        * An eBPF program can inherit all the arraymap FD. The worse case is
>> +        * to fill a bunch of arraymaps, create an eBPF program, close the
>> +        * arraymap FDs, and start again. The maximum number of arraymap
>> +        * entries can then be close to RLIMIT_NOFILE^3.
>> +        *
>> +        * FIXME: This should be improved... any idea?
>> +        */
>> +       if (unlikely(index >= rlimit(RLIMIT_NOFILE)))
>> +               return -EMFILE;
> 
> I'm not sure what's best for resource management here. Landlock will
> be holding open path structs, for example, but how are you expecting
> to track things like network policies? An allowed IP address, for
> example, doesn't have a handle outside of doing a full
> socket()/connect() setup.

Path and file references are hard to handle correctly, but other things
should be simpler. External resources (i.e. not relative to the running
system, as paths are) like network hosts or ports could simply be
expressed as raw values (as is done for iptables rules). Moreover, for
network rules, relying on raw packet values (which
BPF_PROG_TYPE_SOCKET_FILTER programs have access to) may be more than
enough.
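
To make the raw-value idea a bit more concrete, here is a minimal
userland sketch of a rule that is stored as plain values and compared
against a destination address and port. struct net_rule and
net_rule_match() are hypothetical names used only for illustration;
nothing like them exists in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rule layout: raw values only, no kernel object is held. */
struct net_rule {
	uint32_t addr;	/* IPv4 network address, host byte order */
	uint32_t mask;	/* netmask, host byte order */
	uint16_t port;	/* destination port, 0 means "any port" */
};

/* Return true if the destination (daddr, dport) is covered by the rule. */
static bool net_rule_match(const struct net_rule *rule, uint32_t daddr,
			   uint16_t dport)
{
	if ((daddr & rule->mask) != (rule->addr & rule->mask))
		return false;
	return rule->port == 0 || rule->port == dport;
}
```

Such a rule costs a fixed amount of memory and pins nothing else in the
kernel, which is why it should be easier to account for than a path.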

> 
> I think an explicit design for resource management should be
> considered up front...

I'm not really sure how to handle that part…

There are basically two ways to express a "kernel object": relative (with
an internal pointer to a struct, e.g. struct file) or absolute (a raw
value). Both of them use kernel memory. However, only the former may
impact other parts of the kernel (e.g. it can force the kernel to hold
an object such as a struct dentry). The impact of this is not clear to
me, but it looks like other per-process resource management: number of
open files, number of network connections…
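
As a rough illustration of the relative/absolute distinction, a handle
could be modelled as a tagged union. This is only a userland sketch: the
type and function names are hypothetical, and the void pointer merely
stands in for a kernel pointer such as struct file *.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum handle_kind {
	HANDLE_RELATIVE,	/* refers to a live object, which it pins */
	HANDLE_ABSOLUTE,	/* self-contained raw value */
};

struct handle {
	enum handle_kind kind;
	union {
		const void *object;	/* stand-in for e.g. struct file * */
		uint64_t raw;		/* e.g. a port number */
	} u;
};

/* Only relative handles keep another object alive. */
static bool handle_pins_object(const struct handle *h)
{
	return h->kind == HANDLE_RELATIVE && h->u.object != NULL;
}

/* Count how many entries of an array hold a reference to an object. */
static size_t count_pinned(const struct handle *handles, size_t n)
{
	size_t i, pinned = 0;

	for (i = 0; i < n; i++)
		if (handle_pins_object(&handles[i]))
			pinned++;
	return pinned;
}
```

Both kinds occupy one slot of map memory; the count returned by
count_pinned() is the extra cost that only relative handles impose.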

The most reasonable approach seems to be to charge the user for the
(kernel) memory dedicated to the user's policy. How can I do it? Maybe
by decrementing RLIMIT_NPROC and checking RLIMIT_AS (i.e. acting like a
virtual process)?
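
Whatever limit ends up being used, the accounting itself could look
roughly like the sketch below. It is a userland model under the
assumption of a single byte budget per user; struct policy_account,
policy_charge() and policy_uncharge() are hypothetical names, and a real
kernel version would also need locking or atomics.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-user budget; invariant: charged <= limit. */
struct policy_account {
	size_t charged;	/* bytes currently accounted to the user */
	size_t limit;	/* maximum bytes the user may hold */
};

/* Try to account for 'bytes' more; refuse if the budget would be exceeded. */
static bool policy_charge(struct policy_account *acct, size_t bytes)
{
	if (bytes > acct->limit - acct->charged)
		return false;
	acct->charged += bytes;
	return true;
}

/* Give back 'bytes', never dropping below zero. */
static void policy_uncharge(struct policy_account *acct, size_t bytes)
{
	acct->charged -= bytes <= acct->charged ? bytes : acct->charged;
}
```

An entry update would call policy_charge() before storing the handle,
and the map teardown would call policy_uncharge() for each entry.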

There are no such limits for other eBPF maps (even those dealing with
FDs), so this may be too much.
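
For reference, the two bounds from the comment in the patch quoted above
can be computed for a given RLIMIT_NOFILE with a few lines of userland C.
The helper names are made up for this sketch, and the multiplications
are not protected against overflow for very large limits.

```c
#include <stdint.h>
#include <sys/resource.h>

/* Bound while the arraymap FDs stay open: (n - 1) * n, or 0 if n is 0. */
static uint64_t handles_per_process(uint64_t n)
{
	return n == 0 ? 0 : (n - 1) * n;
}

/* Bound when eBPF programs keep closed arraymaps alive: close to n^3. */
static uint64_t handles_worst_case(uint64_t n)
{
	return n * n * n;
}

/* Read the soft RLIMIT_NOFILE of the caller; 0 on success, -1 on error. */
static int nofile_soft_limit(uint64_t *out)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -1;
	*out = (uint64_t)rl.rlim_cur;
	return 0;
}
```

With a soft limit of 1024, this gives 1047552 entries in the first case
and 1073741824 in the second, which shows how far apart the two are.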

 Mickaël



* Re: [RFC v3 04/22] bpf: Set register type according to is_valid_access()
  2016-09-14  7:23 ` [RFC v3 04/22] bpf: Set register type according to is_valid_access() Mickaël Salaün
@ 2016-10-19 14:54   ` Thomas Graf
  2016-10-19 15:10     ` Daniel Borkmann
  0 siblings, 1 reply; 76+ messages in thread
From: Thomas Graf @ 2016-10-19 14:54 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups

On 09/14/16 at 09:23am, Mickaël Salaün wrote:
> This fix a pointer leak when an unprivileged eBPF program read a pointer
> value from the context. Even if is_valid_access() returns a pointer
> type, the eBPF verifier replace it with UNKNOWN_VALUE. The register
> value containing an address is then allowed to leak. Moreover, this
> prevented unprivileged eBPF programs to use functions with (legitimate)
> pointer arguments.
> 
> This bug was not a problem until now because the only unprivileged eBPF
> program allowed is of type BPF_PROG_TYPE_SOCKET_FILTER and all the types
> from its context are UNKNOWN_VALUE.
> 
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Fixes: 969bf05eb3ce ("bpf: direct packet access")
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>

Can you post this fix separately? It's valid and needed outside of the
scope of this series.


* Re: [RFC v3 05/22] bpf,landlock: Add eBPF program subtype and is_valid_subtype() verifier
  2016-09-14  7:23 ` [RFC v3 05/22] bpf,landlock: Add eBPF program subtype and is_valid_subtype() verifier Mickaël Salaün
@ 2016-10-19 15:01   ` Thomas Graf
  0 siblings, 0 replies; 76+ messages in thread
From: Thomas Graf @ 2016-10-19 15:01 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups

On 09/14/16 at 09:23am, Mickaël Salaün wrote:
> @@ -155,6 +163,7 @@ union bpf_attr {
>  		__u32		log_size;	/* size of user buffer */
>  		__aligned_u64	log_buf;	/* user supplied buffer */
>  		__u32		kern_version;	/* checked when prog_type=kprobe */
> +		union bpf_prog_subtype prog_subtype;	/* checked when prog_type=landlock */

The comment seems bogus, this is not landlock specific.


* Re: [RFC v3 04/22] bpf: Set register type according to is_valid_access()
  2016-10-19 14:54   ` Thomas Graf
@ 2016-10-19 15:10     ` Daniel Borkmann
  0 siblings, 0 replies; 76+ messages in thread
From: Daniel Borkmann @ 2016-10-19 15:10 UTC (permalink / raw)
  To: Thomas Graf, Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry, kernel-hardening, linux-api, linux-security-module,
	netdev, cgroups

On 10/19/2016 04:54 PM, Thomas Graf wrote:
> On 09/14/16 at 09:23am, Mickaël Salaün wrote:
>> This fix a pointer leak when an unprivileged eBPF program read a pointer
>> value from the context. Even if is_valid_access() returns a pointer
>> type, the eBPF verifier replace it with UNKNOWN_VALUE. The register
>> value containing an address is then allowed to leak. Moreover, this
>> prevented unprivileged eBPF programs to use functions with (legitimate)
>> pointer arguments.
>>
>> This bug was not a problem until now because the only unprivileged eBPF
>> program allowed is of type BPF_PROG_TYPE_SOCKET_FILTER and all the types
>> from its context are UNKNOWN_VALUE.
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> Fixes: 969bf05eb3ce ("bpf: direct packet access")
>> Cc: Alexei Starovoitov <ast@kernel.org>
>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>
> Can you post this fix separately? It's valid and needed outside of the
> scope of this series.

Yes, that one was already merged:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=1955351da41caa1dbf4139191358fed84909d64b


* Re: [RFC v3 06/22] landlock: Add LSM hooks
  2016-09-14  7:23 ` [RFC v3 06/22] landlock: Add LSM hooks Mickaël Salaün
@ 2016-10-19 15:19   ` Thomas Graf
  2016-10-19 22:42     ` Mickaël Salaün
  0 siblings, 1 reply; 76+ messages in thread
From: Thomas Graf @ 2016-10-19 15:19 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups

On 09/14/16 at 09:23am, Mickaël Salaün wrote:
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 9aa01d9d3d80..36c3e482239c 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -85,6 +85,8 @@ enum bpf_arg_type {
>  
>  	ARG_PTR_TO_CTX,		/* pointer to context */
>  	ARG_ANYTHING,		/* any (initialized) argument is ok */
> +
> +	ARG_PTR_TO_STRUCT_FILE,		/* pointer to struct file */

This should go into patch 7 I guess?

> +void __init landlock_add_hooks(void)
> +{
> +	pr_info("landlock: Becoming ready for sandboxing\n");
> +	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks));
> +}

Can we add the hooks when we load the first BPF program for a hook? That
would also avoid making this conditional on a new config option which
all distros have to enable anyway.

I would really like to see this patch split into the LSM part which
allows running BPF progs at LSM and your specific sandboxing use case
which requires the new BPF helpers, new reg type, etc.


* Re: [RFC v3 06/22] landlock: Add LSM hooks
  2016-10-19 15:19   ` Thomas Graf
@ 2016-10-19 22:42     ` Mickaël Salaün
  0 siblings, 0 replies; 76+ messages in thread
From: Mickaël Salaün @ 2016-10-19 22:42 UTC (permalink / raw)
  To: Thomas Graf
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, cgroups


On 19/10/2016 17:19, Thomas Graf wrote:
> On 09/14/16 at 09:23am, Mickaël Salaün wrote:
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 9aa01d9d3d80..36c3e482239c 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -85,6 +85,8 @@ enum bpf_arg_type {
>>  
>>  	ARG_PTR_TO_CTX,		/* pointer to context */
>>  	ARG_ANYTHING,		/* any (initialized) argument is ok */
>> +
>> +	ARG_PTR_TO_STRUCT_FILE,		/* pointer to struct file */
> 
> This should go into patch 7 I guess?

Right, the ARG_PTR_* are only used by BPF helpers.

> 
>> +void __init landlock_add_hooks(void)
>> +{
>> +	pr_info("landlock: Becoming ready for sandboxing\n");
>> +	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks));
>> +}
> 
> Can we add the hooks when we load the first BPF program for a hook? That
> would also avoid making this conditional on a new config option which
> all distros have to enable anyway.

We could add the hooks either one by one or all at once when loading a
BPF program whose subtype matches the hook type, but I'm not sure it is
worth it.
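
The "all hooks at once" variant would boil down to a register-once
pattern, sketched here in userland C. All names are hypothetical, the
counter exists only to make the behaviour observable, and a kernel
implementation would have to serialize concurrent program loads.

```c
#include <stdbool.h>

static bool hooks_registered;	/* set once the hooks are in place */
static int register_calls;	/* how many times registration ran */

/* Stand-in for the real registration of the LSM hooks. */
static void register_all_hooks(void)
{
	register_calls++;
}

/* Called on each program load; only the first call registers the hooks. */
static void on_program_load(void)
{
	if (!hooks_registered) {
		register_all_hooks();
		hooks_registered = true;
	}
}
```

The hook-by-hook variant would keep one such flag per hook type instead
of a single global one.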

I'd like to enable this LSM by default but we should be able to disable
it if needed, like most kernel features.

> 
> I would really like to see this patch split into the LSM part which
> allows running BPF progs at LSM and your specific sandboxing use case
> which requires the new BPF helpers, new reg type, etc.
> 

I'll try to split it as much as possible.



Thread overview: 76+ messages
2016-09-14  7:23 [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
2016-09-14  7:23 ` [RFC v3 01/22] landlock: Add Kconfig Mickaël Salaün
2016-09-14  7:23 ` [RFC v3 02/22] bpf: Move u64_to_ptr() to BPF headers and inline it Mickaël Salaün
2016-09-14  7:23 ` [RFC v3 03/22] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles Mickaël Salaün
2016-09-14 18:51   ` Alexei Starovoitov
2016-09-14 23:22     ` Mickaël Salaün
2016-09-14 23:28       ` Alexei Starovoitov
2016-09-15 21:51         ` Mickaël Salaün
2016-10-03 23:53   ` Kees Cook
2016-10-05 22:02     ` Mickaël Salaün
2016-09-14  7:23 ` [RFC v3 04/22] bpf: Set register type according to is_valid_access() Mickaël Salaün
2016-10-19 14:54   ` Thomas Graf
2016-10-19 15:10     ` Daniel Borkmann
2016-09-14  7:23 ` [RFC v3 05/22] bpf,landlock: Add eBPF program subtype and is_valid_subtype() verifier Mickaël Salaün
2016-10-19 15:01   ` Thomas Graf
2016-09-14  7:23 ` [RFC v3 06/22] landlock: Add LSM hooks Mickaël Salaün
2016-10-19 15:19   ` Thomas Graf
2016-10-19 22:42     ` Mickaël Salaün
2016-09-14  7:24 ` [RFC v3 07/22] landlock: Handle file comparisons Mickaël Salaün
2016-09-14 19:07   ` Jann Horn
2016-09-14 22:39     ` Mickaël Salaün
2016-09-14 21:06   ` Alexei Starovoitov
2016-09-14 23:02     ` Mickaël Salaün
2016-09-14 23:24       ` Alexei Starovoitov
2016-09-15 21:25         ` Mickaël Salaün
2016-09-20  0:12           ` lsm naming dilemma. " Alexei Starovoitov
2016-09-20  1:10             ` Sargun Dhillon
2016-09-20 16:58               ` Mickaël Salaün
2016-10-03 23:30   ` Kees Cook
2016-09-14  7:24 ` [RFC v3 08/22] seccomp: Fix documentation for struct seccomp_filter Mickaël Salaün
2016-09-14  7:24 ` [RFC v3 09/22] seccomp: Move struct seccomp_filter in seccomp.h Mickaël Salaün
2016-09-14  7:24 ` [RFC v3 10/22] seccomp: Split put_seccomp_filter() with put_seccomp() Mickaël Salaün
2016-09-14  7:24 ` [RFC v3 11/22] seccomp,landlock: Handle Landlock hooks per process hierarchy Mickaël Salaün
2016-09-14 18:43   ` Andy Lutomirski
2016-09-14 22:34     ` Mickaël Salaün
2016-10-03 23:52       ` Kees Cook
2016-10-05 21:05         ` Mickaël Salaün
2016-09-14  7:24 ` [RFC v3 12/22] bpf: Cosmetic change for bpf_prog_attach() Mickaël Salaün
2016-09-14  7:24 ` [RFC v3 13/22] bpf/cgroup: Replace struct bpf_prog with union bpf_object Mickaël Salaün
2016-09-14  7:24 ` [RFC v3 14/22] bpf/cgroup: Make cgroup_bpf_update() return an error code Mickaël Salaün
2016-09-14 21:16   ` Alexei Starovoitov
2016-09-14  7:24 ` [RFC v3 15/22] bpf/cgroup: Move capability check Mickaël Salaün
2016-09-14  7:24 ` [RFC v3 16/22] bpf/cgroup,landlock: Handle Landlock hooks per cgroup Mickaël Salaün
2016-10-03 23:43   ` Kees Cook
2016-10-05 20:58     ` Mickaël Salaün
2016-10-05 21:25       ` Kees Cook
2016-09-14  7:24 ` [RFC v3 17/22] cgroup: Add access check for cgroup_get_from_fd() Mickaël Salaün
2016-09-14 22:06   ` Mickaël Salaün
2016-09-14  7:24 ` [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks Mickaël Salaün
2016-09-14 18:27   ` Andy Lutomirski
2016-09-14 22:11     ` Mickaël Salaün
2016-09-15  1:25       ` Andy Lutomirski
2016-09-15  2:19         ` Alexei Starovoitov
2016-09-15  2:27           ` Andy Lutomirski
2016-09-15  4:00             ` Alexei Starovoitov
2016-09-15  4:08               ` Andy Lutomirski
2016-09-15  4:31                 ` Alexei Starovoitov
2016-09-15  4:38                   ` Andy Lutomirski
2016-09-15  4:48                     ` Alexei Starovoitov
2016-09-15 19:41                       ` Mickaël Salaün
2016-09-20  4:37                         ` Sargun Dhillon
2016-09-20 17:02                           ` Mickaël Salaün
2016-09-15 19:35         ` Mickaël Salaün
2016-09-14  7:24 ` [RFC v3 19/22] landlock: Add interrupted origin Mickaël Salaün
2016-09-14 18:29   ` Andy Lutomirski
2016-09-14 22:14     ` Mickaël Salaün
2016-09-15  1:19       ` Andy Lutomirski
2016-10-03 23:46         ` Kees Cook
2016-10-05 21:01           ` Mickaël Salaün
2016-09-14  7:24 ` [RFC v3 20/22] landlock: Add update and debug access flags Mickaël Salaün
2016-09-14  7:24 ` [RFC v3 21/22] bpf,landlock: Add optional skb pointer in the Landlock context Mickaël Salaün
2016-09-14 21:20   ` Alexei Starovoitov
2016-09-14 22:46     ` Mickaël Salaün
2016-09-14  7:24 ` [RFC v3 22/22] samples/landlock: Add sandbox example Mickaël Salaün
2016-09-14 21:24   ` Alexei Starovoitov
2016-09-14 14:36 ` [RFC v3 00/22] Landlock LSM: Unprivileged sandboxing David Laight
