linux-kernel.vger.kernel.org archive mirror
* [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
@ 2016-08-25 10:32 Mickaël Salaün
From: Mickaël Salaün @ 2016-08-25 10:32 UTC
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova, James Morris,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Will Drewry, kernel-hardening, linux-api, linux-security-module,
	netdev

Hi,

This series is a proof of concept that fills some gaps in seccomp, such as the
ability to check syscall argument pointers or to create more dynamic security
policies. The goal of this new stackable Linux Security Module (LSM), called
Landlock, is to allow any process, including unprivileged ones, to create
powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or OpenBSD
Pledge. This kind of sandbox helps to mitigate the security impact of bugs or
unexpected/malicious behaviors in userland applications.

The first RFC [1] was focused on extending seccomp while staying at the syscall
level. This brought a working PoC but with some (mitigated) ToCToU race
conditions due to the seccomp ptrace hole (now fixed) and the non-atomic
syscall argument evaluation (hence the LSM hooks).


# Landlock LSM

This second RFC is a fresh revamp of the code that keeps the ideas that worked.
The series mainly focuses on LSM hooks, while keeping the possibility to tie
them to syscalls. The new code removes all race conditions by design. It now
uses eBPF instead of a subset of cBPF (as used by seccomp-bpf). Thanks to
dedicated eBPF functions, this removes the need for the previous stacked cBPF
hack to do complex access checks. An eBPF program is still very limited (i.e.
it can only call a whitelist of functions) and cannot cause a denial of service
(i.e. no loops). The other major improvement is the replacement of the previous
custom checker groups of syscall arguments with a new dedicated eBPF map to
collect and compare Landlock handles with system resources (e.g. files or
network connections).

The approach taken is to add the minimum amount of code while still allowing
userland to create quite complex access rules. A dedicated security policy
language, such as those used by SELinux, AppArmor and other major LSMs,
requires a lot of code and is dedicated to a trusted process (i.e.
root/administrator).


# eBPF

To get an expressive language while still being safe and small, Landlock is
based on eBPF. Landlock should be usable by untrusted processes and must
therefore expose a minimal attack surface. The eBPF bytecode is minimal yet
powerful, widely used and designed to be usable by not-so-trusted applications.
Reusing this code avoids reproducing the same mistakes and minimizes new code
while still taking a generic approach. There are only a few new features: a new
kind of arraymap and a few dedicated eBPF functions.

An eBPF program has access to an eBPF context which contains the LSM hook
arguments (as seccomp-bpf does with syscall arguments). They can be used
directly or passed to helper functions according to their types. It is then
possible to do complex access checks without race conditions or inconsistent
evaluation (i.e. incorrect mirroring of the OS code and state [2]).

There is one new eBPF program type per LSM hook. This makes it possible to
statically check which context accesses are performed by an eBPF program. This
is needed to prevent kernel address leaks and to ensure the right use of LSM
hook arguments with eBPF functions. Moreover, this safe pointer handling
removes the need for runtime checks or abstract data, which improves
performance. Any user can add multiple Landlock eBPF programs per LSM hook.
They are stacked and evaluated one after the other (cf. seccomp-bpf).


# LSM hooks

Contrary to syscalls, LSM hooks are security checkpoints and are not
architecture dependent. They are designed to match a security need reflected by
a security policy (e.g. access to a file). Exposing parts of some LSM hooks
instead of using the syscall API for sandboxing should help to avoid bugs and
hacks such as those encountered by the first RFC. Instead of redoing the work
of the LSM hooks through syscalls, we should use and expose them as the
policies of access control LSMs do.

Only a subset of the hooks is meaningful for an unprivileged sandbox mechanism
(e.g. file system or network access control). Landlock uses an abstraction of
raw LSM hooks, which makes it possible to deal with future changes of the LSM
hook API. Moreover, thanks to the eBPF program typing (per LSM hook) used by
Landlock, it should not be hard to make such evolutions backward compatible.


# Use case scenario

First, a process needs to create a new dedicated eBPF map containing handles.
These handles are references to system resources (e.g. file or directory) and
are grouped in one or multiple maps to be efficiently managed and checked in
batches. This kind of map can be passed to Landlock eBPF functions to compare,
for example, with a file access request. The handles are only accessible from
the eBPF programs created by the same thread.

The loaded Landlock eBPF programs can be triggered by a seccomp filter
returning RET_LANDLOCK. In addition, a cookie (16-bit value) can be passed from
a seccomp filter to eBPF programs. This allows flexible security policies
combining seccomp and Landlock.
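
As a rough illustration of how a 16-bit cookie can travel with a seccomp return
action, the sketch below follows the mainline seccomp layout (action in the
upper bits, 16 bits of data below). The RET_LANDLOCK_EXAMPLE value and the
helper names are made up for this example; the real constant is defined by the
seccomp patch of this series, and carrying the cookie in the data bits is an
assumption here.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as mainline seccomp: action in the high bits, data in the low 16. */
#define RET_ACTION_MASK		0x7fff0000U
#define RET_DATA_MASK		0x0000ffffU
/* Hypothetical action value, for illustration only. */
#define RET_LANDLOCK_EXAMPLE	0x00020000U

/* Build a filter return value that triggers Landlock and carries a cookie. */
static uint32_t landlock_ret(uint16_t cookie)
{
	/* the action must not overlap the data bits */
	assert((RET_LANDLOCK_EXAMPLE & RET_DATA_MASK) == 0);
	return RET_LANDLOCK_EXAMPLE | ((uint32_t)cookie & RET_DATA_MASK);
}

/* Tell whether a filter return value asks for the Landlock programs to run. */
static int is_landlock_ret(uint32_t ret)
{
	return (ret & RET_ACTION_MASK) == RET_LANDLOCK_EXAMPLE;
}

/* Extract the cookie that would be handed to the eBPF programs. */
static uint16_t ret_cookie(uint32_t ret)
{
	return (uint16_t)(ret & RET_DATA_MASK);
}
```

A filter could, for instance, use distinct cookies for distinct syscall
families so that one eBPF program can apply a different rule to each.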

A triggered Landlock eBPF program can then allow or deny an access, according
to its type (i.e. LSM hook), thanks to errno return values.


# Sandbox example with conditional access control depending on cgroup

  $ mkdir /sys/fs/cgroup/sandboxed
  $ ls /home
  user1
  $ LANDLOCK_CGROUPS='/sys/fs/cgroup/sandboxed' \
      LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
      ./sandbox /bin/sh -i
  $ ls /home
  user1
  $ echo $$ > /sys/fs/cgroup/sandboxed/cgroup.procs
  $ ls /home
  ls: cannot open directory '/home': Permission denied


# Current limitations and possible improvements

For now, eBPF programs can only return an errno code. It may be interesting to
be able to do other actions like seccomp-filter does (e.g. kill process). Such
features can easily be implemented, but the main advantage of the current
approach is that eBPF programs are only executed until one returns an errno
code, instead of executing all programs like seccomp-filter does.
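
The early-exit behavior described above can be modeled in plain userland C.
This is only a sketch of the evaluation order, not the kernel code: hook_prog_t,
run_stacked() and the three toy programs are hypothetical names, and each
"program" is an ordinary function returning 0 (allow) or an errno value (deny).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a loaded Landlock program: returns 0 to allow the
 * access, or an errno value to deny it.
 */
typedef int (*hook_prog_t)(const void *ctx);

static int allow_all(const void *ctx) { (void)ctx; return 0; }
static int deny_eacces(const void *ctx) { (void)ctx; return EACCES; }
static int deny_eperm(const void *ctx) { (void)ctx; return EPERM; }

/*
 * Run the stacked programs in order and stop at the first one that returns an
 * errno value. *n_run (if not NULL) receives the number of programs executed.
 */
static int run_stacked(const hook_prog_t *progs, size_t n, const void *ctx,
		       size_t *n_run)
{
	size_t i;
	int ret = 0;

	assert(progs != NULL || n == 0);
	for (i = 0; i < n && !ret; i++)
		ret = progs[i](ctx);
	if (n_run)
		*n_run = i;
	return ret;
}
```

With the stack { allow_all, deny_eacces, deny_eperm }, the third program never
runs: the second one already denied the access with EACCES.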

It is quite easy to add new eBPF functions to extend Landlock. The main concern
should be the ability to leak information from the current process to another
one (e.g. through maps), so as not to reproduce the same security-sensitive
behavior as ptrace.

This design does not seem too intrusive but is flexible enough to allow a
powerful sandbox mechanism accessible by any process on Linux. Seccomp and
Landlock are best used with the help of a userland library (e.g. libseccomp)
that could provide a high-level language to express a security policy instead
of raw eBPF programs. Moreover, thanks to LLVM, it is possible to express an
eBPF program with a subset of C.


# FAQ

## Why not use a language like those used by SELinux or AppArmor?

These kinds of LSMs are dedicated to administrators. They already manage the
system and are not a threat to the system security. However, seccomp, and
Landlock too, should be available to anyone, which potentially includes
untrusted users and processes. To reduce the attack surface, Landlock should
expose the minimum amount of code, hence minimal complexity. Moreover, another
threat is giving malicious code a new way to gain more information. For
example, Landlock features should not allow a program to get the file owner if
the directory containing this file is not readable. This data could otherwise
be exfiltrated through the access result. Thus, we should limit the
expressiveness of the available checks. The current approach is to do the
checks in such a way that only a comparison with an already accessed resource
(e.g. file descriptor) is possible. This provides a reference to compare with,
without exposing much information.


## Why a new LSM? Are SELinux, AppArmor, Smack or Tomoyo not good enough?

The current access control LSMs are fine for their purpose, which is to give
*root* the ability to enforce a security policy for the *system*. What is
missing is a way for a developer or an *unprivileged user* to enforce a
security policy for any application, as seccomp can do for raw syscall
filtering. Moreover, Landlock handles stacked hook programs from different
users. It must then ensure there are no possible malicious interactions
between these programs.

Differences with other (access control) LSMs:
* not only dedicated to administrators (i.e. no_new_privs);
* limited kernel attack surface (e.g. policy parsing);
* helpers to compare complex objects (path/FD), no access to internal kernel
  data (do not leak addresses);
* constrained policy rules/programs (no DoS: deterministic execution time);
* do not leak more information than the loader process can legitimately have
  access to (minimize metadata inference): must compare from an already allowed
  file (through a handle).


## Why is seccomp-filter not enough?

A seccomp filter can only access raw syscall arguments, which means that it is
not possible to filter according to pointed-to data such as a file path. As
demonstrated by the first version of this patch series, filtering at the
syscall level is complicated (e.g. need to take care of race conditions). This
is mainly because the access control checkpoints of the kernel are not at this
high level but further down, at the LSM hook level. The LSM hooks are designed
to handle this kind of check. This series uses this approach to let
unprivileged users limit themselves.

Cf. "What it isn't?" in Documentation/prctl/seccomp_filter.txt


## As a developer, why do I need this feature?

Landlock's goal is to help userland to limit its attack surface.
Security-conscious developers would like to protect users from a security bug
in their applications and the third-party dependencies they are using. Such a
bug can compromise all the user data and help an attacker to perform a
privilege escalation. Using an *unprivileged sandbox* feature such as Landlock
empowers developers to properly compartmentalize their software and limit the
impact of being compromised.


## As a user, why do I need this feature?

Any user can already use seccomp-filter to whitelist a set of syscalls to
reduce the kernel attack surface for a set of processes. However, an
unprivileged user can't create a security policy as the root user can thanks to
SELinux and other access control LSMs. Landlock allows any unprivileged user to
protect their data so that, among the processes they run, only an identified
subset can access it. User tools can be created to help create such a
high-level access control policy. This policy may not be powerful enough to
express the same policies as the current access control LSMs, because of the
threat an unprivileged user can be to the system, but it should be enough for
most use cases (e.g. blacklist or whitelist a set of file hierarchies).


## Can Landlock limit network access or other resources?

Limiting network access is obviously in the scope of Landlock but it is not yet
implemented. The main goal now is to get feedback about the whole concept, the
API and the file access control part. More access control types could be
implemented in the future.


## Why use the seccomp(2) syscall?

Landlock uses the same semantics as seccomp to apply access rule restrictions.
It adds a new layer of security for the current process, which is inherited by
its children. It makes sense to use a single access-restricting syscall (that
should be allowed by seccomp-filter rules) which can only drop privileges.
Moreover, a Landlock eBPF program could come from outside a process (e.g.
passed through a UNIX socket). It is then useful to differentiate the
creation/load of Landlock eBPF programs via bpf(2) from rule enforcement via
seccomp(2).


# Differences from the RFC v1

* focus on the LSM hooks, not the syscalls:
  * much simpler implementation
  * does not need audit cache tricks to avoid race conditions
  * simpler to use and more generic because it uses the LSM hook abstraction
    directly
  * more efficient because checks only happen in LSM hooks
  * architecture agnostic
* switch from cBPF to eBPF:
  * new eBPF program types dedicated to Landlock
  * custom functions used by the eBPF program
  * gain some new features (e.g. 10 registers, can load values of different
    sizes, LLVM translator) but only a few functions allowed and a dedicated
    map type
  * new context: LSM hook ID, cookie and LSM hook arguments
  * need to set the sysctl kernel.unprivileged_bpf_disabled to 0 (default
    value) to be able to load hook filters as unprivileged users
* smaller and simpler:
  * no more checker groups but dedicated arraymap of handles
  * simpler userland structs thanks to eBPF functions
* distinctive name: Landlock


[1] https://lkml.kernel.org/r/1458784008-16277-1-git-send-email-mic@digikod.net
[2] https://crypto.stanford.edu/cs155/papers/traps.pdf


This series can be applied on Linux 4.7 and be tested with
CONFIG_SECURITY_LANDLOCK and CONFIG_CGROUPS. I would really appreciate
constructive comments on the usability, architecture, code and userland API of
Landlock LSM.

Regards,

Mickaël Salaün (10):
  landlock: Add Kconfig
  bpf: Move u64_to_ptr() to BPF headers and inline it
  bpf,landlock: Add a new arraymap type to deal with (Landlock) handles
  seccomp: Split put_seccomp_filter() with put_seccomp()
  seccomp: Handle Landlock
  landlock: Add LSM hooks
  landlock: Add errno check
  landlock: Handle file system comparisons
  landlock: Handle cgroups
  samples/landlock: Add sandbox example

 include/linux/bpf.h                   |  41 +++++
 include/linux/lsm_hooks.h             |   5 +
 include/linux/seccomp.h               |  54 ++++++-
 include/uapi/asm-generic/errno-base.h |   1 +
 include/uapi/linux/bpf.h              | 103 ++++++++++++
 include/uapi/linux/seccomp.h          |   2 +
 kernel/bpf/arraymap.c                 | 222 +++++++++++++++++++++++++
 kernel/bpf/syscall.c                  |  18 ++-
 kernel/bpf/verifier.c                 |  32 +++-
 kernel/fork.c                         |  41 ++++-
 kernel/seccomp.c                      | 211 +++++++++++++++++++++++-
 samples/Makefile                      |   2 +-
 samples/landlock/.gitignore           |   1 +
 samples/landlock/Makefile             |  16 ++
 samples/landlock/sandbox.c            | 295 ++++++++++++++++++++++++++++++++++
 security/Kconfig                      |   1 +
 security/Makefile                     |   2 +
 security/landlock/Kconfig             |  19 +++
 security/landlock/Makefile            |   3 +
 security/landlock/checker_cgroup.c    |  96 +++++++++++
 security/landlock/checker_cgroup.h    |  18 +++
 security/landlock/checker_fs.c        | 183 +++++++++++++++++++++
 security/landlock/checker_fs.h        |  20 +++
 security/landlock/lsm.c               | 228 ++++++++++++++++++++++++++
 security/security.c                   |   1 +
 25 files changed, 1592 insertions(+), 23 deletions(-)
 create mode 100644 samples/landlock/.gitignore
 create mode 100644 samples/landlock/Makefile
 create mode 100644 samples/landlock/sandbox.c
 create mode 100644 security/landlock/Kconfig
 create mode 100644 security/landlock/Makefile
 create mode 100644 security/landlock/checker_cgroup.c
 create mode 100644 security/landlock/checker_cgroup.h
 create mode 100644 security/landlock/checker_fs.c
 create mode 100644 security/landlock/checker_fs.h
 create mode 100644 security/landlock/lsm.c

-- 
2.8.1

* [RFC v2 01/10] landlock: Add Kconfig
From: Mickaël Salaün @ 2016-08-25 10:32 UTC

Add the initial Landlock Kconfig, needed to split the Landlock eBPF and
seccomp parts to ease the review.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---
 security/Kconfig          |  1 +
 security/landlock/Kconfig | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)
 create mode 100644 security/landlock/Kconfig

diff --git a/security/Kconfig b/security/Kconfig
index 176758cdfa57..be6c549dd0ca 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -124,6 +124,7 @@ source security/tomoyo/Kconfig
 source security/apparmor/Kconfig
 source security/loadpin/Kconfig
 source security/yama/Kconfig
+source security/landlock/Kconfig
 
 source security/integrity/Kconfig
 
diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
new file mode 100644
index 000000000000..dc8328d216d7
--- /dev/null
+++ b/security/landlock/Kconfig
@@ -0,0 +1,16 @@
+config SECURITY_LANDLOCK
+	bool "Landlock sandbox support"
+	depends on SECURITY
+	select BPF_SYSCALL
+	select SECCOMP
+	default y
+	help
+	  Landlock is a stacked LSM which allows any user to load a security policy
+	  to restrict their processes (i.e. create a sandbox). The policy is a list
+	  of stacked eBPF programs for some LSM hooks. Each program can do some
+	  access comparison to check if an access request is legitimate.
+
+	  Further information about eBPF can be found in
+	  Documentation/networking/filter.txt
+
+	  If you are unsure how to answer this question, answer Y.
-- 
2.8.1

* [RFC v2 02/10] bpf: Move u64_to_ptr() to BPF headers and inline it
From: Mickaël Salaün @ 2016-08-25 10:32 UTC

This helper will be useful for arraymap (next commit).

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h  | 6 ++++++
 kernel/bpf/syscall.c | 6 ------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0de4de6dd43e..ca3742729ae7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -251,6 +251,12 @@ static inline void bpf_long_memcpy(void *dst, const void *src, u32 size)
 
 /* verify correctness of eBPF program */
 int bpf_check(struct bpf_prog **fp, union bpf_attr *attr);
+
+/* helper to convert user pointers passed inside __aligned_u64 fields */
+static inline void __user *u64_to_ptr(__u64 val)
+{
+	return (void __user *) (unsigned long) val;
+}
 #else
 static inline void bpf_register_prog_type(struct bpf_prog_type_list *tl)
 {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 46ecce4b79ed..d305a3ce0fa7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -247,12 +247,6 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd)
 	return map;
 }
 
-/* helper to convert user pointers passed inside __aligned_u64 fields */
-static void __user *u64_to_ptr(__u64 val)
-{
-	return (void __user *) (unsigned long) val;
-}
-
 int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
 {
 	return -ENOTSUPP;
-- 
2.8.1

* [RFC v2 03/10] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles
From: Mickaël Salaün @ 2016-08-25 10:32 UTC

This new arraymap looks like a set and brings new properties:
* strong typing of entries: the eBPF functions get the array type of
  elements instead of CONST_PTR_TO_MAP (e.g.
  CONST_PTR_TO_LANDLOCK_HANDLE_FS);
* forced sequential filling (i.e. replace or append-only update), which
  allows quick browsing of all entries.

This strong typing is useful to statically check if the content of a map
can be passed to an eBPF function. For example, Landlock uses it to store
and manage kernel objects (e.g. struct file) instead of dealing with
userland raw data. This improves efficiency and ensures that an eBPF
program can only call functions with the right high-level arguments.

The enum bpf_map_handle_type lists low-level types (e.g.
BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) which are identified when
updating a map entry (handle). These handle types are used to infer a
high-level arraymap type, which is listed in enum bpf_map_array_type
(e.g. BPF_MAP_ARRAY_TYPE_LANDLOCK_FS).

For now, this new arraymap is only used by Landlock LSM (cf. next
commits) but it could be useful for other needs.
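
The "replace or append-only" rule can be pictured with a small userland model.
This is a simplified sketch and not the kernel code from the diff below:
struct handle_set, MAX_ENTRIES and the two helpers are hypothetical, plain ints
stand in for the stored handles, and flags, locking and reference counting are
left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ENTRIES 4	/* hypothetical capacity, fixed at creation */

struct handle_set {
	size_t n_entries;	/* slots [0, n_entries) are filled */
	int slots[MAX_ENTRIES];	/* stand-in for the stored handles */
};

/* Replace an existing entry or append exactly at n_entries; never leave a gap. */
static int handle_set_update(struct handle_set *set, size_t index, int value)
{
	assert(set != NULL);
	if (index >= MAX_ENTRIES)
		return -E2BIG;
	if (index > set->n_entries)
		return -EINVAL;
	set->slots[index] = value;
	if (index == set->n_entries)
		set->n_entries++;
	return 0;
}

/* Only the last entry may be removed, which keeps the set contiguous. */
static int handle_set_delete(struct handle_set *set, size_t index)
{
	assert(set != NULL);
	if (set->n_entries && index == set->n_entries - 1) {
		set->n_entries--;
		return 0;
	}
	return -EINVAL;
}
```

Because the filled slots always form the prefix [0, n_entries), browsing all
entries is a simple loop with no need to test for empty slots.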

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
---
 include/linux/bpf.h      |  18 +++++
 include/uapi/linux/bpf.h |  18 +++++
 kernel/bpf/arraymap.c    | 181 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c     |   9 ++-
 kernel/bpf/verifier.c    |  12 +++-
 5 files changed, 235 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ca3742729ae7..9a5b388be099 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -12,6 +12,10 @@
 #include <linux/file.h>
 #include <linux/percpu.h>
 
+#ifdef CONFIG_SECURITY_LANDLOCK
+#include <linux/fs.h> /* struct file */
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 struct bpf_map;
 
 /* map is generic key/value storage optionally accesible by eBPF programs */
@@ -34,6 +38,7 @@ struct bpf_map_ops {
 struct bpf_map {
 	atomic_t refcnt;
 	enum bpf_map_type map_type;
+	enum bpf_map_array_type map_array_type;
 	u32 key_size;
 	u32 value_size;
 	u32 max_entries;
@@ -183,12 +188,25 @@ struct bpf_array {
 	 */
 	enum bpf_prog_type owner_prog_type;
 	bool owner_jited;
+#ifdef CONFIG_SECURITY_LANDLOCK
+	u32 n_entries;	/* number of entries in a handle array */
+#endif /* CONFIG_SECURITY_LANDLOCK */
 	union {
 		char value[0] __aligned(8);
 		void *ptrs[0] __aligned(8);
 		void __percpu *pptrs[0] __aligned(8);
 	};
 };
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+struct map_landlock_handle {
+	u32 type;
+	union {
+		struct file *file;
+	};
+};
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 #define MAX_TAIL_CALL_CNT 32
 
 u64 bpf_tail_call(u64 ctx, u64 r2, u64 index, u64 r4, u64 r5);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 406459b935a2..a60eedc17d40 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -84,6 +84,15 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_PERCPU_HASH,
 	BPF_MAP_TYPE_PERCPU_ARRAY,
 	BPF_MAP_TYPE_STACK_TRACE,
+	BPF_MAP_TYPE_LANDLOCK_ARRAY,
+};
+
+enum bpf_map_array_type {
+	BPF_MAP_ARRAY_TYPE_UNSPEC,
+};
+
+enum bpf_map_handle_type {
+	BPF_MAP_HANDLE_TYPE_UNSPEC,
 };
 
 enum bpf_prog_type {
@@ -386,4 +395,13 @@ struct bpf_tunnel_key {
 	__u32 tunnel_label;
 };
 
+/* Map handle entry */
+struct landlock_handle {
+	__u32 type; /* enum bpf_map_handle_type */
+	union {
+		__u32 fd;
+		__aligned_u64 glob;
+	};
+} __attribute__((aligned(8)));
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 76d5a794e426..5938b8ee475b 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -16,6 +16,8 @@
 #include <linux/mm.h>
 #include <linux/filter.h>
 #include <linux/perf_event.h>
+#include <linux/file.h> /* fput() */
+#include <linux/fs.h> /* struct file */
 
 static void bpf_array_free_percpu(struct bpf_array *array)
 {
@@ -491,3 +493,182 @@ static int __init register_perf_event_array_map(void)
 	return 0;
 }
 late_initcall(register_perf_event_array_map);
+
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+static struct bpf_map *landlock_array_map_alloc(union bpf_attr *attr)
+{
+	if (attr->value_size != sizeof(struct landlock_handle))
+		return ERR_PTR(-EINVAL);
+	attr->value_size = sizeof(struct map_landlock_handle);
+
+	return array_map_alloc(attr);
+}
+
+static void landlock_put_handle(struct map_landlock_handle *handle)
+{
+	switch (handle->type) {
+		/* TODO: add handle types */
+	default:
+		WARN_ON(1);
+	}
+	/* safeguard */
+	handle->type = BPF_MAP_HANDLE_TYPE_UNSPEC;
+}
+
+static void landlock_array_map_free(struct bpf_map *map)
+{
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	int i;
+
+	synchronize_rcu();
+
+	for (i = 0; i < array->n_entries; i++)
+		landlock_put_handle((struct map_landlock_handle *)
+				(array->value + array->elem_size * i));
+	kvfree(array);
+}
+
+static enum bpf_map_array_type landlock_get_array_type(
+		enum bpf_map_handle_type handle_type)
+{
+	switch (handle_type) {
+		/* TODO: add handle types */
+	case BPF_MAP_HANDLE_TYPE_UNSPEC:
+	default:
+		return -EINVAL;
+	}
+}
+
+#define FGET_OR_RET(file, fd) { \
+	file = fget(fd); \
+	if (unlikely(IS_ERR(file))) \
+		return PTR_ERR(file); \
+	}
+
+static inline long landlock_store_handle(struct map_landlock_handle *dst,
+		struct landlock_handle *khandle)
+{
+	struct path kpath;
+
+	if (unlikely(!khandle))
+		return -EINVAL;
+
+	/* access control already done for the FD */
+
+	switch (khandle->type) {
+		/* TODO: add handle types */
+	default:
+		WARN_ON(1);
+		path_put(&kpath);
+		return -EINVAL;
+	}
+	dst->type = khandle->type;
+	return 0;
+}
+
+static void *nop_map_lookup_elem(struct bpf_map *map, void *key)
+{
+	return ERR_PTR(-EINVAL);
+}
+
+/* called from syscall or from eBPF program */
+static int landlock_array_map_update_elem(struct bpf_map *map, void *key,
+		void *value, u64 map_flags)
+{
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	u32 index = *(u32 *)key;
+	enum bpf_map_array_type array_type;
+	int ret;
+	struct landlock_handle *khandle = (struct landlock_handle *)value;
+	struct map_landlock_handle *handle, handle_new;
+
+	if (unlikely(map_flags > BPF_EXIST))
+		/* unknown flags */
+		return -EINVAL;
+
+	if (unlikely(index >= array->map.max_entries))
+		/* all elements were pre-allocated, cannot insert a new one */
+		return -E2BIG;
+
+	/* FIXME: add lock */
+	if (unlikely(index > array->n_entries))
+		/* only replace an existing entry or append a new one */
+		return -EINVAL;
+
+	/* TODO: handle all flags, not only BPF_ANY */
+	if (unlikely(map_flags == BPF_NOEXIST))
+		/* all elements already exist */
+		return -EEXIST;
+
+	if (unlikely(!khandle))
+		return -EINVAL;
+
+	array_type = landlock_get_array_type(khandle->type);
+	if (array_type < 0)
+		return array_type;
+
+	if (!map->map_array_type) {
+		/* set the initial set type */
+		map->map_array_type = array_type;
+	} else if (map->map_array_type != array_type) {
+		return -EINVAL;
+	}
+
+	ret = landlock_store_handle(&handle_new, khandle);
+	if (!ret) {
+		/* map->value_size == sizeof(struct map_landlock_handle) */
+		handle = (struct map_landlock_handle *)
+			(array->value + array->elem_size * index);
+		/* FIXME: make atomic update */
+		if (index < array->n_entries)
+			landlock_put_handle(handle);
+		*handle = handle_new;
+		/* TODO: use atomic_inc? */
+		if (index == array->n_entries)
+			array->n_entries++;
+	}
+	/* FIXME: unlock */
+
+	return ret;
+}
+
+/* called from syscall or from eBPF program */
+static int landlock_array_map_delete_elem(struct bpf_map *map, void *key)
+{
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	u32 index = *(u32 *)key;
+
+	/* only remove the last element */
+	/* TODO: use atomic_dec? */
+	if (array->n_entries && index == array->n_entries - 1) {
+		array->n_entries--;
+		landlock_put_handle((struct map_landlock_handle *)
+				(array->value + array->elem_size * index));
+		return 0;
+	}
+	return -EINVAL;
+}
+
+static const struct bpf_map_ops landlock_array_ops = {
+	.map_alloc = landlock_array_map_alloc,
+	.map_free = landlock_array_map_free,
+	.map_get_next_key = array_map_get_next_key,
+	.map_lookup_elem = nop_map_lookup_elem,
+	.map_update_elem = landlock_array_map_update_elem,
+	.map_delete_elem = landlock_array_map_delete_elem,
+};
+
+static struct bpf_map_type_list landlock_array_type __read_mostly = {
+	.ops = &landlock_array_ops,
+	.type = BPF_MAP_TYPE_LANDLOCK_ARRAY,
+};
+
+static int __init register_landlock_array_map(void)
+{
+	bpf_register_map_type(&landlock_array_type);
+	return 0;
+}
+
+late_initcall(register_landlock_array_map);
+#endif /* CONFIG_SECURITY_LANDLOCK */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d305a3ce0fa7..32a10ef4b878 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -717,8 +717,13 @@ static int bpf_prog_load(union bpf_attr *attr)
 	    attr->kern_version != LINUX_VERSION_CODE)
 		return -EINVAL;
 
-	if (type != BPF_PROG_TYPE_SOCKET_FILTER && !capable(CAP_SYS_ADMIN))
-		return -EPERM;
+	switch (type) {
+	case BPF_PROG_TYPE_SOCKET_FILTER:
+		break;
+	default:
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+	}
 
 	/* plain bpf_prog allocation */
 	prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index eec9f90ba030..c15f6cc28e00 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1716,6 +1716,15 @@ static struct bpf_map *ld_imm64_to_map_ptr(struct bpf_insn *insn)
 	return (struct bpf_map *) (unsigned long) imm64;
 }
 
+static inline enum bpf_reg_type bpf_reg_type_from_map(struct bpf_map *map)
+{
+	switch (map->map_array_type) {
+	case BPF_MAP_ARRAY_TYPE_UNSPEC:
+	default:
+		return CONST_PTR_TO_MAP;
+	}
+}
+
 /* verify BPF_LD_IMM64 instruction */
 static int check_ld_imm(struct verifier_env *env, struct bpf_insn *insn)
 {
@@ -1742,8 +1751,9 @@ static int check_ld_imm(struct verifier_env *env, struct bpf_insn *insn)
 	/* replace_map_fd_with_map_ptr() should have caught bad ld_imm64 */
 	BUG_ON(insn->src_reg != BPF_PSEUDO_MAP_FD);
 
-	regs[insn->dst_reg].type = CONST_PTR_TO_MAP;
 	regs[insn->dst_reg].map_ptr = ld_imm64_to_map_ptr(insn);
+	regs[insn->dst_reg].type =
+		bpf_reg_type_from_map(regs[insn->dst_reg].map_ptr);
 	return 0;
 }
 
-- 
2.8.1


* [RFC v2 04/10] seccomp: Split put_seccomp_filter() with put_seccomp()
  2016-08-25 10:32 [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (2 preceding siblings ...)
  2016-08-25 10:32 ` [RFC v2 03/10] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles Mickaël Salaün
@ 2016-08-25 10:32 ` Mickaël Salaün
  2016-08-25 10:32 ` [RFC v2 05/10] seccomp: Handle Landlock Mickaël Salaün
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-25 10:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova, James Morris,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Will Drewry, kernel-hardening, linux-api, linux-security-module,
	netdev

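The semantics are unchanged: put_seccomp_filter() now takes a filter
pointer and becomes static, while the new put_seccomp() takes the task.
This will be useful for the Landlock integration with seccomp (next
commit).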
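
As a rough userspace illustration of the split (not the kernel code), the
sketch below models a reference-counted filter chain. The names struct
filter, struct task, put_filter(), put_task(), push_filter() and
freed_count are hypothetical stand-ins, and a plain int replaces the
kernel's atomic_t:

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel structures (not the real layout). */
struct filter {
	int usage;		/* models the atomic_t usage counter */
	struct filter *prev;
};

struct task {
	struct filter *filter;	/* models tsk->seccomp.filter */
};

static int freed_count;	/* instrumentation for this sketch only */

/* Drop one reference on a filter, freeing single-reference ancestors. */
static void put_filter(struct filter *filter)
{
	struct filter *orig = filter;

	while (orig && --orig->usage == 0) {
		struct filter *freeme = orig;

		orig = orig->prev;
		free(freeme);
		freed_count++;
	}
}

/* Task-level wrapper: the place where more per-task state can be released. */
static void put_task(struct task *tsk)
{
	put_filter(tsk->filter);
}

/* Hypothetical helper: push a new filter holding a single reference. */
static struct filter *push_filter(struct filter *prev)
{
	struct filter *f = malloc(sizeof(*f));

	assert(f);
	f->usage = 1;
	f->prev = prev;
	return f;
}
```

With this shape, any code holding its own reference on a filter can call
put_filter() directly without going through a task, which is what a
separate per-filter user needs.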

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
---
 include/linux/seccomp.h |  5 +++--
 kernel/fork.c           |  2 +-
 kernel/seccomp.c        | 18 +++++++++++++-----
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 2296e6b2f690..29b20fe8fd4d 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -83,13 +83,14 @@ static inline int seccomp_mode(struct seccomp *s)
 #endif /* CONFIG_SECCOMP */
 
 #ifdef CONFIG_SECCOMP_FILTER
-extern void put_seccomp_filter(struct task_struct *tsk);
+extern void put_seccomp(struct task_struct *tsk);
 extern void get_seccomp_filter(struct task_struct *tsk);
 #else  /* CONFIG_SECCOMP_FILTER */
-static inline void put_seccomp_filter(struct task_struct *tsk)
+static inline void put_seccomp(struct task_struct *tsk)
 {
 	return;
 }
+
 static inline void get_seccomp_filter(struct task_struct *tsk)
 {
 	return;
diff --git a/kernel/fork.c b/kernel/fork.c
index 4a7ec0c6c88c..b23a71ec8003 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -235,7 +235,7 @@ void free_task(struct task_struct *tsk)
 	free_thread_stack(tsk->stack);
 	rt_mutex_debug_task_free(tsk);
 	ftrace_graph_exit_task(tsk);
-	put_seccomp_filter(tsk);
+	put_seccomp(tsk);
 	arch_release_task_struct(tsk);
 	free_task_struct(tsk);
 }
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 7002796f14a4..f1f475691c27 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -60,6 +60,8 @@ struct seccomp_filter {
 	struct bpf_prog *prog;
 };
 
+static void put_seccomp_filter(struct seccomp_filter *filter);
+
 /* Limit any path through the tree to 256KB worth of instructions. */
 #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
 
@@ -313,7 +315,7 @@ static inline void seccomp_sync_threads(void)
 		 * current's path will hold a reference.  (This also
 		 * allows a put before the assignment.)
 		 */
-		put_seccomp_filter(thread);
+		put_seccomp_filter(thread->seccomp.filter);
 		smp_store_release(&thread->seccomp.filter,
 				  caller->seccomp.filter);
 
@@ -475,10 +477,11 @@ static inline void seccomp_filter_free(struct seccomp_filter *filter)
 	}
 }
 
-/* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
-void put_seccomp_filter(struct task_struct *tsk)
+/* put_seccomp_filter - decrements the ref count of a filter */
+static void put_seccomp_filter(struct seccomp_filter *filter)
 {
-	struct seccomp_filter *orig = tsk->seccomp.filter;
+	struct seccomp_filter *orig = filter;
+
 	/* Clean up single-reference branches iteratively. */
 	while (orig && atomic_dec_and_test(&orig->usage)) {
 		struct seccomp_filter *freeme = orig;
@@ -487,6 +490,11 @@ void put_seccomp_filter(struct task_struct *tsk)
 	}
 }
 
+void put_seccomp(struct task_struct *tsk)
+{
+	put_seccomp_filter(tsk->seccomp.filter);
+}
+
 /**
  * seccomp_send_sigsys - signals the task to allow in-process syscall emulation
  * @syscall: syscall number to send to userland
@@ -926,7 +934,7 @@ long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
 	if (copy_to_user(data, fprog->filter, bpf_classic_proglen(fprog)))
 		ret = -EFAULT;
 
-	put_seccomp_filter(task);
+	put_seccomp_filter(task->seccomp.filter);
 	return ret;
 
 out:
-- 
2.8.1


* [RFC v2 05/10] seccomp: Handle Landlock
  2016-08-25 10:32 [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (3 preceding siblings ...)
  2016-08-25 10:32 ` [RFC v2 04/10] seccomp: Split put_seccomp_filter() with put_seccomp() Mickaël Salaün
@ 2016-08-25 10:32 ` Mickaël Salaün
  2016-08-25 10:32 ` [RFC v2 06/10] landlock: Add LSM hooks Mickaël Salaün
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-25 10:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova, James Morris,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Will Drewry, kernel-hardening, linux-api, linux-security-module,
	netdev, Andrew Morton

A Landlock program can be triggered when a seccomp filter returns
RET_LANDLOCK. Moreover, the filter can return a 16-bit cookie which
will be readable by the Landlock programs.
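
To make the encoding concrete, here is a small userspace sketch of how
such a return value splits into an action and a cookie. The
SECCOMP_RET_ACTION and SECCOMP_RET_DATA masks are the existing seccomp
ones, SECCOMP_RET_LANDLOCK is the value proposed by this patch, and the
three helper functions are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECCOMP_RET_ACTION	0x7fff0000U /* action bits */
#define SECCOMP_RET_DATA	0x0000ffffU /* 16-bit data (the cookie) */
#define SECCOMP_RET_LANDLOCK	0x00070000U /* value proposed by this patch */

/* Build the value a filter would return to trigger Landlock. */
static inline uint32_t ret_landlock(uint16_t cookie)
{
	return SECCOMP_RET_LANDLOCK | cookie;
}

/* True if the action part of a filter return value is RET_LANDLOCK. */
static inline bool is_ret_landlock(uint32_t ret)
{
	return (ret & SECCOMP_RET_ACTION) == SECCOMP_RET_LANDLOCK;
}

/* Extract the 16-bit cookie from a filter return value. */
static inline uint16_t ret_cookie(uint32_t ret)
{
	return (uint16_t)(ret & SECCOMP_RET_DATA);
}
```

For example, ret_landlock(0x1234) yields 0x00071234, from which
ret_cookie() recovers 0x1234.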

Only seccomp filters loaded by the same thread, and before a given
Landlock program, can trigger that program. Multiple Landlock programs
can be triggered by one or more seccomp filters. This way, each
RET_LANDLOCK (with its specific cookie) will trigger all the allowed
Landlock programs once.
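
The bookkeeping can be pictured with the following userspace model of the
filter walk. It is a simplified sketch: each filter carries a canned
verdict instead of running a BPF program, and the names struct filter,
struct landlock_ret and run_filters() are stand-ins rather than the
kernel definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECCOMP_RET_ACTION	0x7fff0000U
#define SECCOMP_RET_DATA	0x0000ffffU
#define SECCOMP_RET_LANDLOCK	0x00070000U /* value proposed by this patch */
#define SECCOMP_RET_ALLOW	0x7fff0000U

struct filter {
	uint32_t ret;		/* canned verdict, replaces BPF_PROG_RUN() */
	struct filter *prev;
};

struct landlock_ret {
	struct landlock_ret *prev;
	const struct filter *filter;
	uint16_t cookie;
	bool triggered;
};

/* Walk all filters, record RET_LANDLOCK hits, keep the lowest action. */
static uint32_t run_filters(const struct filter *f, struct landlock_ret *slots)
{
	uint32_t ret = SECCOMP_RET_ALLOW;
	struct landlock_ret *slot;

	/* Reset the per-filter state left by the previous syscall. */
	for (slot = slots; slot; slot = slot->prev)
		slot->triggered = false;

	for (; f; f = f->prev) {
		uint32_t cur = f->ret;
		uint32_t action = cur & SECCOMP_RET_ACTION;

		if (action == SECCOMP_RET_LANDLOCK) {
			/* Find the slot dedicated to this filter. */
			for (slot = slots; slot; slot = slot->prev) {
				if (slot->filter == f) {
					slot->triggered = true;
					slot->cookie = cur & SECCOMP_RET_DATA;
					break;
				}
			}
		}
		if (action < (ret & SECCOMP_RET_ACTION))
			ret = cur;
	}
	return ret;
}
```

Each filter owns exactly one result slot, so a Landlock program can later
check whether one of the filters allowed to trigger it returned
RET_LANDLOCK for the current syscall, and with which cookie.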

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/seccomp.h      |  49 +++++++++++
 include/uapi/linux/seccomp.h |   2 +
 kernel/fork.c                |  39 ++++++++-
 kernel/seccomp.c             | 190 ++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 275 insertions(+), 5 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 29b20fe8fd4d..785ccbebf687 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -10,7 +10,33 @@
 #include <linux/thread_info.h>
 #include <asm/seccomp.h>
 
+#ifdef CONFIG_SECURITY_LANDLOCK
+#include <linux/bpf.h>	/* struct bpf_prog */
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 struct seccomp_filter;
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+struct seccomp_landlock_ret {
+	struct seccomp_landlock_ret *prev;
+	/* @filter points to a @landlock_filter list */
+	struct seccomp_filter *filter;
+	u16 cookie;
+	bool triggered;
+};
+
+struct seccomp_landlock_prog {
+	atomic_t usage;
+	struct seccomp_landlock_prog *prev;
+	/*
+	 * List of filters (through filter->landlock_prev) allowed to trigger
+	 * this Landlock program.
+	 */
+	struct seccomp_filter *filter;
+	struct bpf_prog *prog;
+};
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 /**
  * struct seccomp - the state of a seccomp'ed process
  *
@@ -18,6 +44,10 @@ struct seccomp_filter;
  *         system calls available to a process.
  * @filter: must always point to a valid seccomp-filter or NULL as it is
  *          accessed without locking during system call entry.
+ * @landlock_filter: list of filters allowed to trigger an associated
+ *                    Landlock hook via a RET_LANDLOCK.
+ * @landlock_ret: stored values from a RET_LANDLOCK.
+ * @landlock_prog: list of Landlock programs.
  *
  *          @filter must only be accessed from the context of current as there
  *          is no read locking.
@@ -25,6 +55,12 @@ struct seccomp_filter;
 struct seccomp {
 	int mode;
 	struct seccomp_filter *filter;
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+	struct seccomp_filter *landlock_filter;
+	struct seccomp_landlock_ret *landlock_ret;
+	struct seccomp_landlock_prog *landlock_prog;
+#endif /* CONFIG_SECURITY_LANDLOCK */
 };
 
 #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
@@ -85,6 +121,12 @@ static inline int seccomp_mode(struct seccomp *s)
 #ifdef CONFIG_SECCOMP_FILTER
 extern void put_seccomp(struct task_struct *tsk);
 extern void get_seccomp_filter(struct task_struct *tsk);
+#ifdef CONFIG_SECURITY_LANDLOCK
+extern void put_landlock_ret(struct seccomp_landlock_ret *landlock_ret);
+extern struct seccomp_landlock_ret *dup_landlock_ret(
+		struct seccomp_landlock_ret *ret_orig);
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 #else  /* CONFIG_SECCOMP_FILTER */
 static inline void put_seccomp(struct task_struct *tsk)
 {
@@ -95,6 +137,13 @@ static inline void get_seccomp_filter(struct task_struct *tsk)
 {
 	return;
 }
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+static inline void put_landlock_ret(struct seccomp_landlock_ret *landlock_ret) {}
+static inline struct seccomp_landlock_ret *dup_landlock_ret(
+		struct seccomp_landlock_ret *ret_orig) { return NULL; }
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 #endif /* CONFIG_SECCOMP_FILTER */
 
 #if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 0f238a43ff1e..b4aab1c19b8a 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -13,6 +13,7 @@
 /* Valid operations for seccomp syscall. */
 #define SECCOMP_SET_MODE_STRICT	0
 #define SECCOMP_SET_MODE_FILTER	1
+#define SECCOMP_SET_LANDLOCK_HOOK	2
 
 /* Valid flags for SECCOMP_SET_MODE_FILTER */
 #define SECCOMP_FILTER_FLAG_TSYNC	1
@@ -28,6 +29,7 @@
 #define SECCOMP_RET_KILL	0x00000000U /* kill the task immediately */
 #define SECCOMP_RET_TRAP	0x00030000U /* disallow and force a SIGSYS */
 #define SECCOMP_RET_ERRNO	0x00050000U /* returns an errno */
+#define SECCOMP_RET_LANDLOCK	0x00070000U /* trigger LSM evaluation */
 #define SECCOMP_RET_TRACE	0x7ff00000U /* pass to a tracer or disallow */
 #define SECCOMP_RET_ALLOW	0x7fff0000U /* allow */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index b23a71ec8003..3658c1e95e03 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -369,7 +369,12 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	 * the usage counts on the error path calling free_task.
 	 */
 	tsk->seccomp.filter = NULL;
-#endif
+#ifdef CONFIG_SECURITY_LANDLOCK
+	tsk->seccomp.landlock_filter = NULL;
+	tsk->seccomp.landlock_ret = NULL;
+	tsk->seccomp.landlock_prog = NULL;
+#endif /* CONFIG_SECURITY_LANDLOCK */
+#endif /* CONFIG_SECCOMP */
 
 	setup_thread_stack(tsk, orig);
 	clear_user_return_notifier(tsk);
@@ -1200,9 +1205,12 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 	return 0;
 }
 
-static void copy_seccomp(struct task_struct *p)
+static int copy_seccomp(struct task_struct *p)
 {
 #ifdef CONFIG_SECCOMP
+#ifdef CONFIG_SECURITY_LANDLOCK
+	struct seccomp_landlock_ret *ret_walk;
+#endif /* CONFIG_SECURITY_LANDLOCK */
 	/*
 	 * Must be called with sighand->lock held, which is common to
 	 * all threads in the group. Holding cred_guard_mutex is not
@@ -1213,7 +1221,27 @@ static void copy_seccomp(struct task_struct *p)
 
 	/* Ref-count the new filter user, and assign it. */
 	get_seccomp_filter(current);
-	p->seccomp = current->seccomp;
+	p->seccomp.mode = current->seccomp.mode;
+	p->seccomp.filter = current->seccomp.filter;
+#ifdef CONFIG_SECURITY_LANDLOCK
+	/* No copy for: landlock_filter, landlock_handle */
+	p->seccomp.landlock_prog = current->seccomp.landlock_prog;
+	if (p->seccomp.landlock_prog)
+		atomic_inc(&p->seccomp.landlock_prog->usage);
+	/* Deep copy for landlock_ret to avoid allocating for each syscall */
+	for (ret_walk = current->seccomp.landlock_ret;
+			ret_walk; ret_walk = ret_walk->prev) {
+		struct seccomp_landlock_ret *ret_new;
+
+		ret_new = dup_landlock_ret(ret_walk);
+		if (IS_ERR(ret_new)) {
+			put_landlock_ret(p->seccomp.landlock_ret);
+			return PTR_ERR(ret_new);
+		}
+		ret_new->prev = p->seccomp.landlock_ret;
+		p->seccomp.landlock_ret = ret_new;
+	}
+#endif /* CONFIG_SECURITY_LANDLOCK */
 
 	/*
 	 * Explicitly enable no_new_privs here in case it got set
@@ -1231,6 +1259,7 @@ static void copy_seccomp(struct task_struct *p)
 	if (p->seccomp.mode != SECCOMP_MODE_DISABLED)
 		set_tsk_thread_flag(p, TIF_SECCOMP);
 #endif
+	return 0;
 }
 
 SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr)
@@ -1589,7 +1618,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	 * Copy seccomp details explicitly here, in case they were changed
 	 * before holding sighand lock.
 	 */
-	copy_seccomp(p);
+	retval = copy_seccomp(p);
+	if (retval)
+		goto bad_fork_cancel_cgroup;
 
 	/*
 	 * Process group and session signals need to be delivered to just the
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index f1f475691c27..5df7274c7ec3 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -6,6 +6,8 @@
  * Copyright (C) 2012 Google, Inc.
  * Will Drewry <wad@chromium.org>
  *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
  * This defines a simple but solid secure-computing facility.
  *
  * Mode 1 uses a fixed list of allowed system calls.
@@ -33,6 +35,10 @@
 #include <linux/tracehook.h>
 #include <linux/uaccess.h>
 
+#ifdef CONFIG_SECURITY_LANDLOCK
+#include <linux/bpf.h>	/* bpf_prog_put()  */
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 /**
  * struct seccomp_filter - container for seccomp BPF programs
  *
@@ -58,6 +64,9 @@ struct seccomp_filter {
 	atomic_t usage;
 	struct seccomp_filter *prev;
 	struct bpf_prog *prog;
+#ifdef CONFIG_SECURITY_LANDLOCK
+	struct seccomp_filter *landlock_prev;
+#endif /* CONFIG_SECURITY_LANDLOCK */
 };
 
 static void put_seccomp_filter(struct seccomp_filter *filter);
@@ -179,6 +188,10 @@ static u32 seccomp_run_filters(struct seccomp_data *sd)
 {
 	struct seccomp_data sd_local;
 	u32 ret = SECCOMP_RET_ALLOW;
+#ifdef CONFIG_SECURITY_LANDLOCK
+	struct seccomp_landlock_ret *landlock_ret, *init_landlock_ret =
+		current->seccomp.landlock_ret;
+#endif /* CONFIG_SECURITY_LANDLOCK */
 	/* Make sure cross-thread synced filter points somewhere sane. */
 	struct seccomp_filter *f =
 			lockless_dereference(current->seccomp.filter);
@@ -191,6 +204,14 @@ static u32 seccomp_run_filters(struct seccomp_data *sd)
 		populate_seccomp_data(&sd_local);
 		sd = &sd_local;
 	}
+#ifdef CONFIG_SECURITY_LANDLOCK
+	for (landlock_ret = init_landlock_ret;
+			landlock_ret;
+			landlock_ret = landlock_ret->prev) {
+		/* No need to clean the cookie. */
+		landlock_ret->triggered = false;
+	}
+#endif /* CONFIG_SECURITY_LANDLOCK */
 
 	/*
 	 * All filters in the list are evaluated and the lowest BPF return
@@ -198,8 +219,27 @@ static u32 seccomp_run_filters(struct seccomp_data *sd)
 	 */
 	for (; f; f = f->prev) {
 		u32 cur_ret = BPF_PROG_RUN(f->prog, (void *)sd);
+		u32 action = cur_ret & SECCOMP_RET_ACTION;
+#ifdef CONFIG_SECURITY_LANDLOCK
+		u32 data = cur_ret & SECCOMP_RET_DATA;
+		if (action == SECCOMP_RET_LANDLOCK) {
+			/*
+			 * Keep track of filters from the current task that
+			 * trigger a RET_LANDLOCK.
+			 */
+			for (landlock_ret = init_landlock_ret;
+					landlock_ret;
+					landlock_ret = landlock_ret->prev) {
+				if (landlock_ret->filter == f) {
+					landlock_ret->triggered = true;
+					landlock_ret->cookie = data;
+					break;
+				}
+			}
+		}
+#endif /* CONFIG_SECURITY_LANDLOCK */
 
-		if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
+		if (action < (ret & SECCOMP_RET_ACTION))
 			ret = cur_ret;
 	}
 	return ret;
@@ -426,6 +466,9 @@ static long seccomp_attach_filter(unsigned int flags,
 {
 	unsigned long total_insns;
 	struct seccomp_filter *walker;
+#ifdef CONFIG_SECURITY_LANDLOCK
+	struct seccomp_landlock_ret *landlock_ret;
+#endif /* CONFIG_SECURITY_LANDLOCK */
 
 	assert_spin_locked(&current->sighand->siglock);
 
@@ -450,6 +493,21 @@ static long seccomp_attach_filter(unsigned int flags,
 	 * task reference.
 	 */
 	filter->prev = current->seccomp.filter;
+#ifdef CONFIG_SECURITY_LANDLOCK
+	filter->landlock_prev = current->seccomp.landlock_filter;
+	current->seccomp.landlock_filter = filter;
+
+	/* Dedicated Landlock result */
+	landlock_ret = kmalloc(sizeof(*landlock_ret), GFP_KERNEL);
+	if (!landlock_ret)
+		return -ENOMEM;
+	landlock_ret->prev = current->seccomp.landlock_ret;
+	atomic_inc(&filter->usage);
+	landlock_ret->filter = filter;
+	landlock_ret->cookie = 0;
+	landlock_ret->triggered = false;
+	current->seccomp.landlock_ret = landlock_ret;
+#endif /* CONFIG_SECURITY_LANDLOCK */
 	current->seccomp.filter = filter;
 
 	/* Now that the new filter is in place, synchronize to all threads. */
@@ -459,6 +517,55 @@ static long seccomp_attach_filter(unsigned int flags,
 	return 0;
 }
 
+#ifdef CONFIG_SECURITY_LANDLOCK
+struct seccomp_landlock_ret *dup_landlock_ret(
+		struct seccomp_landlock_ret *ret_orig)
+{
+	struct seccomp_landlock_ret *ret_new;
+
+	if (!ret_orig)
+		return NULL;
+	ret_new = kmalloc(sizeof(*ret_new), GFP_KERNEL);
+	if (!ret_new)
+		return ERR_PTR(-ENOMEM);
+	ret_new->filter = ret_orig->filter;
+	if (ret_new->filter)
+		atomic_inc(&ret_new->filter->usage);
+	ret_new->cookie = 0;
+	ret_new->triggered = false;
+	ret_new->prev = NULL;
+	return ret_new;
+}
+
+static void put_landlock_prog(struct seccomp_landlock_prog *landlock_prog)
+{
+	struct seccomp_landlock_prog *orig = landlock_prog;
+
+	/* Clean up single-reference branches iteratively. */
+	while (orig && atomic_dec_and_test(&orig->usage)) {
+		struct seccomp_landlock_prog *freeme = orig;
+
+		put_seccomp_filter(orig->filter);
+		bpf_prog_put(orig->prog);
+		orig = orig->prev;
+		kfree(freeme);
+	}
+}
+
+void put_landlock_ret(struct seccomp_landlock_ret *landlock_ret)
+{
+	struct seccomp_landlock_ret *orig = landlock_ret;
+
+	while (orig) {
+		struct seccomp_landlock_ret *freeme = orig;
+
+		put_seccomp_filter(orig->filter);
+		orig = orig->prev;
+		kfree(freeme);
+	}
+}
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 /* get_seccomp_filter - increments the reference count of the filter on @tsk */
 void get_seccomp_filter(struct task_struct *tsk)
 {
@@ -485,7 +592,9 @@ static void put_seccomp_filter(struct seccomp_filter *filter)
 	/* Clean up single-reference branches iteratively. */
 	while (orig && atomic_dec_and_test(&orig->usage)) {
 		struct seccomp_filter *freeme = orig;
+
 		orig = orig->prev;
+		/* must not put orig->landlock_prev */
 		seccomp_filter_free(freeme);
 	}
 }
@@ -493,6 +602,10 @@ static void put_seccomp_filter(struct seccomp_filter *filter)
 void put_seccomp(struct task_struct *tsk)
 {
 	put_seccomp_filter(tsk->seccomp.filter);
+#ifdef CONFIG_SECURITY_LANDLOCK
+	put_landlock_prog(tsk->seccomp.landlock_prog);
+	put_landlock_ret(tsk->seccomp.landlock_ret);
+#endif /* CONFIG_SECURITY_LANDLOCK */
 }
 
 /**
@@ -609,6 +722,8 @@ static u32 __seccomp_phase1_filter(int this_syscall, struct seccomp_data *sd)
 	case SECCOMP_RET_TRACE:
 		return filter_ret;  /* Save the rest for phase 2. */
 
+	case SECCOMP_RET_LANDLOCK:
+		/* fall through */
 	case SECCOMP_RET_ALLOW:
 		return SECCOMP_PHASE1_OK;
 
@@ -814,6 +929,75 @@ static inline long seccomp_set_mode_filter(unsigned int flags,
 }
 #endif
 
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+
+/* Limit Landlock programs to 64 pages (256KB with 4KB pages). */
+#define LANDLOCK_PROG_LIST_MAX_PAGES (1 << 6)
+
+static long landlock_set_hook(unsigned int flags, const char __user *user_bpf_fd)
+{
+	long result;
+	unsigned long prog_list_pages;
+	struct seccomp_landlock_prog *landlock_prog, *cp_walker;
+	int bpf_fd;
+	struct bpf_prog *prog;
+
+	if (!task_no_new_privs(current) &&
+	    security_capable_noaudit(current_cred(),
+				     current_user_ns(), CAP_SYS_ADMIN) != 0)
+		return -EACCES;
+	if (!user_bpf_fd)
+		return -EINVAL;
+
+	/* could be used for TSYNC */
+	if (flags)
+		return -EINVAL;
+
+	if (copy_from_user(&bpf_fd, user_bpf_fd, sizeof(bpf_fd)))
+		return -EFAULT;
+	prog = bpf_prog_get(bpf_fd);
+	if (IS_ERR(prog))
+		return PTR_ERR(prog);
+	switch (prog->type) {
+		/* TODO: add LSM hooks */
+	default:
+		result = -EINVAL;
+		goto put_prog;
+	}
+
+	/* validate allocated memory */
+	prog_list_pages = prog->pages;
+	for (cp_walker = current->seccomp.landlock_prog; cp_walker;
+			cp_walker = cp_walker->prev) {
+		/* TODO: add penalty for each prog? */
+		prog_list_pages += cp_walker->prog->pages;
+	}
+	if (prog_list_pages > LANDLOCK_PROG_LIST_MAX_PAGES) {
+		result = -ENOMEM;
+		goto put_prog;
+	}
+
+	landlock_prog = kmalloc(sizeof(*landlock_prog), GFP_KERNEL);
+	if (!landlock_prog) {
+		result = -ENOMEM;
+		goto put_prog;
+	}
+	landlock_prog->prog = prog;
+	landlock_prog->filter = current->seccomp.filter;
+	if (landlock_prog->filter)
+		atomic_inc(&landlock_prog->filter->usage);
+	atomic_set(&landlock_prog->usage, 1);
+	landlock_prog->prev = current->seccomp.landlock_prog;
+	current->seccomp.landlock_prog = landlock_prog;
+	return 0;
+
+put_prog:
+	bpf_prog_put(prog);
+	return result;
+}
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 /* Common entry point for both prctl and syscall. */
 static long do_seccomp(unsigned int op, unsigned int flags,
 		       const char __user *uargs)
@@ -825,6 +1009,10 @@ static long do_seccomp(unsigned int op, unsigned int flags,
 		return seccomp_set_mode_strict();
 	case SECCOMP_SET_MODE_FILTER:
 		return seccomp_set_mode_filter(flags, uargs);
+#ifdef CONFIG_SECURITY_LANDLOCK
+	case SECCOMP_SET_LANDLOCK_HOOK:
+		return landlock_set_hook(flags, uargs);
+#endif /* CONFIG_SECURITY_LANDLOCK */
 	default:
 		return -EINVAL;
 	}
-- 
2.8.1


* [RFC v2 06/10] landlock: Add LSM hooks
  2016-08-25 10:32 [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (4 preceding siblings ...)
  2016-08-25 10:32 ` [RFC v2 05/10] seccomp: Handle Landlock Mickaël Salaün
@ 2016-08-25 10:32 ` Mickaël Salaün
  2016-08-30 18:56   ` Andy Lutomirski
  2016-08-25 10:32 ` [RFC v2 07/10] landlock: Add errno check Mickaël Salaün
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-25 10:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova, James Morris,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Will Drewry, kernel-hardening, linux-api, linux-security-module,
	netdev

Add LSM hooks which can be used by userland through Landlock (eBPF)
programs. These programs are limited to a whitelist of functions (cf.
next commit). The eBPF program context is described by the struct
landlock_data (cf. include/uapi/linux/bpf.h):
* hook: LSM hook ID (useful when using the same program for multiple LSM
  hooks);
* cookie: the 16-bit value from the seccomp filter that triggered this
  Landlock program;
* args[6]: array of LSM hook arguments.
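
As an illustration of the context layout listed above, the following
userspace sketch mirrors struct landlock_data with fixed-width types and
fills it from a list of raw argument values. landlock_ctx_init() and
LANDLOCK_MAX_ARGS are hypothetical names, and the sketch stores the hook
ID directly for simplicity:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the struct landlock_data proposed in this patch. */
struct landlock_data {
	uint32_t hook;
	uint16_t cookie;
	uint64_t args[6];
};

#define LANDLOCK_MAX_ARGS 6	/* hypothetical name for the array size */

/*
 * Hypothetical helper: build a context from a hook ID, a cookie and up to
 * six raw argument values. Unused argument slots are left zeroed.
 */
static struct landlock_data landlock_ctx_init(uint32_t hook, uint16_t cookie,
					       const uint64_t *args,
					       size_t nargs)
{
	struct landlock_data ctx;
	size_t i;

	assert(nargs <= LANDLOCK_MAX_ARGS);
	memset(&ctx, 0, sizeof(ctx));
	ctx.hook = hook;
	ctx.cookie = cookie;
	for (i = 0; i < nargs; i++)
		ctx.args[i] = args[i];
	return ctx;
}
```

For a hook such as file_permission(struct file *file, int mask), args[0]
would presumably carry the file pointer cast to a 64-bit integer and
args[1] the mask, with the remaining slots unused.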

The LSM hook arguments can contain raw values such as integers or
(unleakable) pointers. The only way to use the pointers is to pass them
to an eBPF function according to their types (e.g. the
bpf_landlock_cmp_fs_beneath_with_struct_file function can use a struct
file pointer).

For now, there are three hooks for file system access control:
* file_open;
* file_permission;
* mmap_file.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h        |   7 ++
 include/linux/lsm_hooks.h  |   5 ++
 include/uapi/linux/bpf.h   |  20 +++++
 kernel/bpf/syscall.c       |   3 +
 kernel/bpf/verifier.c      |   8 ++
 kernel/seccomp.c           |   7 +-
 security/Makefile          |   2 +
 security/landlock/Makefile |   3 +
 security/landlock/lsm.c    | 211 +++++++++++++++++++++++++++++++++++++++++++++
 security/security.c        |   1 +
 10 files changed, 265 insertions(+), 2 deletions(-)
 create mode 100644 security/landlock/Makefile
 create mode 100644 security/landlock/lsm.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9a5b388be099..557e7efdf0cd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -81,6 +81,9 @@ enum bpf_arg_type {
 
 	ARG_PTR_TO_CTX,		/* pointer to context */
 	ARG_ANYTHING,		/* any (initialized) argument is ok */
+
+	ARG_PTR_TO_STRUCT_FILE,		/* pointer to struct file */
+	ARG_PTR_TO_STRUCT_CRED,		/* pointer to struct cred */
 };
 
 /* type of values returned from helper functions */
@@ -139,6 +142,10 @@ enum bpf_reg_type {
 	 */
 	PTR_TO_PACKET,
 	PTR_TO_PACKET_END,	 /* skb->data + headlen */
+
+	/* Landlock */
+	PTR_TO_STRUCT_FILE,
+	PTR_TO_STRUCT_CRED,
 };
 
 struct bpf_prog;
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 7ae397669d8b..6792ae8fb53d 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1898,5 +1898,10 @@ void __init loadpin_add_hooks(void);
 #else
 static inline void loadpin_add_hooks(void) { };
 #endif
+#ifdef CONFIG_SECURITY_LANDLOCK
+extern void __init landlock_add_hooks(void);
+#else
+static inline void __init landlock_add_hooks(void) { }
+#endif /* CONFIG_SECURITY_LANDLOCK */
 
 #endif /* ! __LINUX_LSM_HOOKS_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a60eedc17d40..983d14e910ff 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -102,6 +102,9 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SCHED_CLS,
 	BPF_PROG_TYPE_SCHED_ACT,
 	BPF_PROG_TYPE_TRACEPOINT,
+	BPF_PROG_TYPE_LANDLOCK_FILE_OPEN,
+	BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION,
+	BPF_PROG_TYPE_LANDLOCK_MMAP_FILE,
 };
 
 #define BPF_PSEUDO_MAP_FD	1
@@ -404,4 +407,21 @@ struct landlock_handle {
 	};
 } __attribute__((aligned(8)));
 
+/**
+ * struct landlock_data
+ *
+ * @hook: LSM hook ID
+ * @cookie: value set by a seccomp-filter return value RET_LANDLOCK. This comes
+ *          from a trusted seccomp-bpf program: the same process that loaded
+ *          this Landlock hook program.
+ * @args: LSM hook arguments, see include/linux/lsm_hooks.h for their
+ *        description and the LANDLOCK_HOOK* definitions from
+ *        security/landlock/lsm.c for their types.
+ */
+struct landlock_data {
+	__u32 hook;
+	__u16 cookie;
+	__u64 args[6];
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 32a10ef4b878..6b8bfc34c751 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -719,6 +719,9 @@ static int bpf_prog_load(union bpf_attr *attr)
 
 	switch (type) {
 	case BPF_PROG_TYPE_SOCKET_FILTER:
+	case BPF_PROG_TYPE_LANDLOCK_FILE_OPEN:
+	case BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION:
+	case BPF_PROG_TYPE_LANDLOCK_MMAP_FILE:
 		break;
 	default:
 		if (!capable(CAP_SYS_ADMIN))
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c15f6cc28e00..2931e2efcc10 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -244,6 +244,8 @@ static const char * const reg_type_str[] = {
 	[CONST_IMM]		= "imm",
 	[PTR_TO_PACKET]		= "pkt",
 	[PTR_TO_PACKET_END]	= "pkt_end",
+	[PTR_TO_STRUCT_FILE]	= "struct_file",
+	[PTR_TO_STRUCT_CRED]	= "struct_cred",
 };
 
 static void print_verifier_state(struct verifier_state *state)
@@ -554,6 +556,8 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
 	case PTR_TO_PACKET_END:
 	case FRAME_PTR:
 	case CONST_PTR_TO_MAP:
+	case PTR_TO_STRUCT_FILE:
+	case PTR_TO_STRUCT_CRED:
 		return true;
 	default:
 		return false;
@@ -943,6 +947,10 @@ static int check_func_arg(struct verifier_env *env, u32 regno,
 		expected_type = CONST_PTR_TO_MAP;
 	} else if (arg_type == ARG_PTR_TO_CTX) {
 		expected_type = PTR_TO_CTX;
+	} else if (arg_type == ARG_PTR_TO_STRUCT_FILE) {
+		expected_type = PTR_TO_STRUCT_FILE;
+	} else if (arg_type == ARG_PTR_TO_STRUCT_CRED) {
+		expected_type = PTR_TO_STRUCT_CRED;
 	} else if (arg_type == ARG_PTR_TO_STACK ||
 		   arg_type == ARG_PTR_TO_RAW_STACK) {
 		expected_type = PTR_TO_STACK;
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 5df7274c7ec3..3395e370cd47 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -36,7 +36,7 @@
 #include <linux/uaccess.h>
 
 #ifdef CONFIG_SECURITY_LANDLOCK
-#include <linux/bpf.h>	/* bpf_prog_put()  */
+#include <linux/bpf.h>	/* bpf_prog_put(), BPF_PROG_TYPE_LANDLOCK_*  */
 #endif /* CONFIG_SECURITY_LANDLOCK */
 
 /**
@@ -960,7 +960,10 @@ static long landlock_set_hook(unsigned int flags, const char __user *user_bpf_fd
 	if (IS_ERR(prog))
 		return PTR_ERR(prog);
 	switch (prog->type) {
-		/* TODO: add LSM hooks */
+	case BPF_PROG_TYPE_LANDLOCK_FILE_OPEN:
+	case BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION:
+	case BPF_PROG_TYPE_LANDLOCK_MMAP_FILE:
+		break;
 	default:
 		result = -EINVAL;
 		goto put_prog;
diff --git a/security/Makefile b/security/Makefile
index f2d71cdb8e19..3fdc2f19dc48 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -9,6 +9,7 @@ subdir-$(CONFIG_SECURITY_TOMOYO)        += tomoyo
 subdir-$(CONFIG_SECURITY_APPARMOR)	+= apparmor
 subdir-$(CONFIG_SECURITY_YAMA)		+= yama
 subdir-$(CONFIG_SECURITY_LOADPIN)	+= loadpin
+subdir-$(CONFIG_SECURITY_LANDLOCK)		+= landlock
 
 # always enable default capabilities
 obj-y					+= commoncap.o
@@ -24,6 +25,7 @@ obj-$(CONFIG_SECURITY_TOMOYO)		+= tomoyo/
 obj-$(CONFIG_SECURITY_APPARMOR)		+= apparmor/
 obj-$(CONFIG_SECURITY_YAMA)		+= yama/
 obj-$(CONFIG_SECURITY_LOADPIN)		+= loadpin/
+obj-$(CONFIG_SECURITY_LANDLOCK)	+= landlock/
 obj-$(CONFIG_CGROUP_DEVICE)		+= device_cgroup.o
 
 # Object integrity file lists
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
new file mode 100644
index 000000000000..59669d70bc7e
--- /dev/null
+++ b/security/landlock/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
+
+landlock-y := lsm.o
diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c
new file mode 100644
index 000000000000..aa9d4a64826e
--- /dev/null
+++ b/security/landlock/lsm.c
@@ -0,0 +1,211 @@
+/*
+ * Landlock LSM
+ *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/current.h>
+#include <linux/bpf.h> /* enum bpf_reg_type, struct landlock_data */
+#include <linux/cred.h>
+#include <linux/err.h> /* MAX_ERRNO */
+#include <linux/filter.h> /* struct bpf_prog, BPF_PROG_RUN() */
+#include <linux/kernel.h> /* FIELD_SIZEOF() */
+#include <linux/lsm_hooks.h>
+#include <linux/seccomp.h> /* struct seccomp_* */
+
+#define LANDLOCK_HOOK_INIT(NAME) LSM_HOOK_INIT(NAME, landlock_hook_##NAME)
+
+#define LANDLOCK_HOOKx(X, NAME, CNAME, ...)				\
+	static inline int landlock_hook_##NAME(				\
+		LANDLOCK_MAP(X, LANDLOCK_ARG_TA, __VA_ARGS__))		\
+	{								\
+		__u64 args[6] = {					\
+			LANDLOCK_MAP(X, LANDLOCK_ARG_A, __VA_ARGS__)	\
+		};							\
+		return landlock_run_prog(args);				\
+	}								\
+	static inline bool bpf_landlock_##NAME##_is_valid_access(	\
+			int off, int size, enum bpf_access_type type,	\
+			enum bpf_reg_type *reg_type)			\
+	{								\
+		enum bpf_reg_type arg_types[6] = {			\
+			LANDLOCK_MAP(X, LANDLOCK_ARG_D, __VA_ARGS__)	\
+		};							\
+		return __is_valid_access(off, size, type, reg_type, arg_types); \
+	}								\
+	static const struct bpf_verifier_ops bpf_landlock_##NAME##_ops = { \
+		.get_func_proto	= bpf_landlock_func_proto,		\
+		.is_valid_access = bpf_landlock_##NAME##_is_valid_access, \
+		.convert_ctx_access = landlock_convert_ctx_access,	\
+	};								\
+	static struct bpf_prog_type_list bpf_landlock_##NAME##_type __read_mostly = { \
+		.ops	= &bpf_landlock_##NAME##_ops,			\
+		.type	= BPF_PROG_TYPE_LANDLOCK_##CNAME,		\
+	};								\
+	static int __init register_landlock_##NAME##_filter_ops(void)	\
+	{								\
+		bpf_register_prog_type(&bpf_landlock_##NAME##_type);	\
+		return 0;						\
+	}								\
+	late_initcall(register_landlock_##NAME##_filter_ops);
+
+#define LANDLOCK_HOOK1(NAME, ...) LANDLOCK_HOOKx(1, NAME, __VA_ARGS__)
+#define LANDLOCK_HOOK2(NAME, ...) LANDLOCK_HOOKx(2, NAME, __VA_ARGS__)
+#define LANDLOCK_HOOK3(NAME, ...) LANDLOCK_HOOKx(3, NAME, __VA_ARGS__)
+#define LANDLOCK_HOOK4(NAME, ...) LANDLOCK_HOOKx(4, NAME, __VA_ARGS__)
+#define LANDLOCK_HOOK5(NAME, ...) LANDLOCK_HOOKx(5, NAME, __VA_ARGS__)
+#define LANDLOCK_HOOK6(NAME, ...) LANDLOCK_HOOKx(6, NAME, __VA_ARGS__)
+
+#define LANDLOCK_MAP0(m,...)
+#define LANDLOCK_MAP1(m,d,t,a) m(d,t,a)
+#define LANDLOCK_MAP2(m,d,t,a,...) m(d,t,a), LANDLOCK_MAP1(m,__VA_ARGS__)
+#define LANDLOCK_MAP3(m,d,t,a,...) m(d,t,a), LANDLOCK_MAP2(m,__VA_ARGS__)
+#define LANDLOCK_MAP4(m,d,t,a,...) m(d,t,a), LANDLOCK_MAP3(m,__VA_ARGS__)
+#define LANDLOCK_MAP5(m,d,t,a,...) m(d,t,a), LANDLOCK_MAP4(m,__VA_ARGS__)
+#define LANDLOCK_MAP6(m,d,t,a,...) m(d,t,a), LANDLOCK_MAP5(m,__VA_ARGS__)
+#define LANDLOCK_MAP(n,...) LANDLOCK_MAP##n(__VA_ARGS__)
+
+#define LANDLOCK_ARG_D(d,t,a) d
+#define LANDLOCK_ARG_TA(d,t,a) t a
+#define LANDLOCK_ARG_A(d,t,a) (u64)a
+
+
+static int landlock_run_prog(__u64 args[6])
+{
+	u32 cur_ret = 0, ret = 0;
+	struct seccomp_landlock_ret *landlock_ret;
+	struct seccomp_landlock_prog *prog;
+
+	/* the hook ID is faked by landlock_convert_ctx_access() */
+	struct landlock_data ctx = {
+		.args[0] = args[0],
+		.args[1] = args[1],
+		.args[2] = args[2],
+		.args[3] = args[3],
+		.args[4] = args[4],
+		.args[5] = args[5],
+	};
+
+	/* TODO: use lockless_dereference()? */
+	/* run all the triggered Landlock programs */
+	for (landlock_ret = current->seccomp.landlock_ret;
+			landlock_ret; landlock_ret = landlock_ret->prev) {
+		if (landlock_ret->triggered) {
+			ctx.cookie = landlock_ret->cookie;
+			for (prog = current->seccomp.landlock_prog;
+					prog; prog = prog->prev) {
+				if (prog->filter == landlock_ret->filter) {
+					cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
+					break;
+				}
+			}
+			if (!ret) {
+				if (cur_ret > MAX_ERRNO)
+					ret = MAX_ERRNO;
+				else
+					ret = cur_ret;
+			}
+		}
+	}
+	return -ret;
+}
+
+static const struct bpf_func_proto *bpf_landlock_func_proto(
+		enum bpf_func_id func_id)
+{
+	return NULL;
+}
+
+static u32 landlock_convert_ctx_access(enum bpf_access_type type, int dst_reg,
+				      int src_reg, int ctx_off,
+				      struct bpf_insn *insn_buf,
+				      struct bpf_prog *prog)
+{
+	struct bpf_insn *insn = insn_buf;
+
+	/* only handle 32-bit values */
+	switch (ctx_off) {
+	case offsetof(struct landlock_data, hook):
+		*insn++ = BPF_MOV32_IMM(dst_reg, prog->type);
+		break;
+	default:
+		return 1;
+	}
+
+	return insn - insn_buf;
+}
+
+static bool __is_valid_access(int off, int size, enum bpf_access_type type,
+		enum bpf_reg_type *reg_type, enum bpf_reg_type arg_types[6])
+{
+	int arg_nb, expected_size;
+
+	if (type != BPF_READ)
+		return false;
+	if (off < 0 || off >= sizeof(struct landlock_data))
+		return false;
+
+	switch (off) {
+	case offsetof(struct landlock_data, cookie):
+		expected_size = sizeof(__u16);
+		break;
+	case offsetof(struct landlock_data, hook):
+		expected_size = sizeof(__u32);
+		break;
+	case offsetof(struct landlock_data, args[0]) ...
+			offsetof(struct landlock_data, args[5]):
+		expected_size = sizeof(__u64);
+		break;
+	default:
+		return false;
+	}
+	if (expected_size != size)
+		return false;
+
+	/* check pointer type */
+	switch (off) {
+	case offsetof(struct landlock_data, args[0]) ...
+			offsetof(struct landlock_data, args[5]):
+		arg_nb = (off - offsetof(struct landlock_data, args[0]))
+			/ FIELD_SIZEOF(struct landlock_data, args[0]);
+		*reg_type = arg_types[arg_nb];
+		if (*reg_type == NOT_INIT)
+			return false;
+		break;
+	}
+
+	return true;
+}
+
+LANDLOCK_HOOK2(file_open, FILE_OPEN,
+	PTR_TO_STRUCT_FILE, struct file *, file,
+	PTR_TO_STRUCT_CRED, const struct cred *, cred
+)
+
+LANDLOCK_HOOK2(file_permission, FILE_PERMISSION,
+	PTR_TO_STRUCT_FILE, struct file *, file,
+	UNKNOWN_VALUE, int, mask
+)
+
+LANDLOCK_HOOK4(mmap_file, MMAP_FILE,
+	PTR_TO_STRUCT_FILE, struct file *, file,
+	UNKNOWN_VALUE, unsigned long, reqprot,
+	UNKNOWN_VALUE, unsigned long, prot,
+	UNKNOWN_VALUE, unsigned long, flags
+)
+
+static struct security_hook_list landlock_hooks[] = {
+	LANDLOCK_HOOK_INIT(file_open),
+	LANDLOCK_HOOK_INIT(file_permission),
+	LANDLOCK_HOOK_INIT(mmap_file),
+};
+
+void __init landlock_add_hooks(void)
+{
+	pr_info("landlock: Becoming ready for sandboxing\n");
+	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks));
+}
diff --git a/security/security.c b/security/security.c
index 709569305d32..d918c5ca8b81 100644
--- a/security/security.c
+++ b/security/security.c
@@ -61,6 +61,7 @@ int __init security_init(void)
 	capability_add_hooks();
 	yama_add_hooks();
 	loadpin_add_hooks();
+	landlock_add_hooks();
 
 	/*
 	 * Load all the remaining security modules.
-- 
2.8.1


* [RFC v2 07/10] landlock: Add errno check
  2016-08-25 10:32 [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (5 preceding siblings ...)
  2016-08-25 10:32 ` [RFC v2 06/10] landlock: Add LSM hooks Mickaël Salaün
@ 2016-08-25 10:32 ` Mickaël Salaün
  2016-08-25 11:13   ` Andy Lutomirski
  2016-08-25 10:32 ` [RFC v2 08/10] landlock: Handle file system comparisons Mickaël Salaün
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-25 10:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova, James Morris,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Will Drewry, kernel-hardening, linux-api, linux-security-module,
	netdev

Add a maximum errno value, _ERRNO_LAST, and return EPERM for any Landlock
program result above it.

This is not strictly needed but should improve reliability.
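
The check can be modelled in userspace as the small sketch below. The
names sketch_clamp_errno() and SKETCH_ERRNO_LAST are hypothetical
stand-ins for the logic in landlock_run_prog() and for _ERRNO_LAST; only
the comparison against ERANGE and the EPERM fallback come from the patch.

```c
#include <errno.h>

/* Hypothetical stand-in for the _ERRNO_LAST macro added by this patch. */
#define SKETCH_ERRNO_LAST ERANGE

/*
 * Clamp an untrusted program result to a base errno value. Anything
 * above the last base errno collapses to EPERM, mirroring the patched
 * branch in landlock_run_prog().
 */
static unsigned int sketch_clamp_errno(unsigned int cur_ret)
{
	if (cur_ret > SKETCH_ERRNO_LAST)
		return EPERM;
	return cur_ret;
}
```

A result of 0 still means "no error", so the caller can keep negating
the clamped value as before.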

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
---
 include/uapi/asm-generic/errno-base.h | 1 +
 security/landlock/lsm.c               | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/uapi/asm-generic/errno-base.h b/include/uapi/asm-generic/errno-base.h
index 65115978510f..43407a403e72 100644
--- a/include/uapi/asm-generic/errno-base.h
+++ b/include/uapi/asm-generic/errno-base.h
@@ -35,5 +35,6 @@
 #define	EPIPE		32	/* Broken pipe */
 #define	EDOM		33	/* Math argument out of domain of func */
 #define	ERANGE		34	/* Math result not representable */
+#define	_ERRNO_LAST	ERANGE
 
 #endif
diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c
index aa9d4a64826e..322309068066 100644
--- a/security/landlock/lsm.c
+++ b/security/landlock/lsm.c
@@ -11,7 +11,6 @@
 #include <asm/current.h>
 #include <linux/bpf.h> /* enum bpf_reg_type, struct landlock_data */
 #include <linux/cred.h>
-#include <linux/err.h> /* MAX_ERRNO */
 #include <linux/filter.h> /* struct bpf_prog, BPF_PROG_RUN() */
 #include <linux/kernel.h> /* FIELD_SIZEOF() */
 #include <linux/lsm_hooks.h>
@@ -104,8 +103,9 @@ static int landlock_run_prog(__u64 args[6])
 				}
 			}
 			if (!ret) {
-				if (cur_ret > MAX_ERRNO)
-					ret = MAX_ERRNO;
+				/* check errno to not mess with kernel code */
+				if (cur_ret > _ERRNO_LAST)
+					ret = EPERM;
 				else
 					ret = cur_ret;
 			}
-- 
2.8.1


* [RFC v2 08/10] landlock: Handle file system comparisons
  2016-08-25 10:32 [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (6 preceding siblings ...)
  2016-08-25 10:32 ` [RFC v2 07/10] landlock: Add errno check Mickaël Salaün
@ 2016-08-25 10:32 ` Mickaël Salaün
  2016-08-25 11:12   ` Andy Lutomirski
  2016-08-25 10:32 ` [RFC v2 09/10] landlock: Handle cgroups Mickaël Salaün
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-25 10:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova, James Morris,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Will Drewry, kernel-hardening, linux-api, linux-security-module,
	netdev

Add eBPF functions to compare a file system access with a Landlock file
system handle:
* bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
  This function compares the dentry, inode, device or mount point of
  the currently accessed file with a reference handle.
* bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
  This function lets an eBPF program check whether the currently
  accessed file is the same as, or lies beneath, a reference handle.
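
As a rough illustration of the property comparison, here is a userspace
model of the matching loop. struct sketch_path and the SKETCH_* names
are hypothetical; plain integers stand in for the kernel pointers and
inode numbers the real helper reads. The flag values mirror
LANDLOCK_FLAG_FS_* from this patch. As in the patch, matches accumulate
across handles, so two different handles may each satisfy one of the
requested properties.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Values mirror LANDLOCK_FLAG_FS_* from the uapi hunk of this patch. */
#define SKETCH_FS_DENTRY (1 << 0)
#define SKETCH_FS_INODE  (1 << 1)
#define SKETCH_FS_DEVICE (1 << 2)
#define SKETCH_FS_MOUNT  (1 << 3)
#define SKETCH_FS_MASK   ((1 << 4) - 1)

/* Hypothetical model of the fields compared by the helper. */
struct sketch_path {
	unsigned long dentry;
	unsigned long ino;
	unsigned long sb;
	unsigned long mnt;
};

/* Returns 0 on match, 1 on mismatch, -EINVAL on unknown flags. */
static int sketch_cmp_fs_prop(unsigned int property,
			      const struct sketch_path *handles, size_t n,
			      const struct sketch_path *file)
{
	/* A property that was not requested counts as already matched. */
	bool ok_dentry = !(property & SKETCH_FS_DENTRY);
	bool ok_inode = !(property & SKETCH_FS_INODE);
	bool ok_device = !(property & SKETCH_FS_DEVICE);
	bool ok_mount = !(property & SKETCH_FS_MOUNT);
	size_t i;

	if ((property | SKETCH_FS_MASK) != SKETCH_FS_MASK)
		return -EINVAL;

	/* OR semantics: any handle may contribute a match. */
	for (i = 0; i < n; i++) {
		const struct sketch_path *h = &handles[i];

		if (!ok_dentry && h->dentry == file->dentry)
			ok_dentry = true;
		if (!ok_inode && h->ino && h->ino == file->ino)
			ok_inode = true;
		if (!ok_device && h->sb && h->sb == file->sb)
			ok_device = true;
		if (!ok_mount && h->mnt && h->mnt == file->mnt)
			ok_mount = true;
		if (ok_dentry && ok_inode && ok_device && ok_mount)
			return 0;
	}
	return 1;
}
```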

The goal of a file system handle is to abstract kernel objects such as a
struct file or a struct inode. Userland can create this kind of handle
with the BPF_MAP_UPDATE_ELEM command. The element is a struct
landlock_handle containing the handle type (e.g.
BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. A handle could
also be any description able to match a struct file or a struct inode
(e.g. a path or glob string).
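
The relation between a handle type and the array type inferred for the
map can be sketched as follows. This is a userspace model only: the
sketch_* names are hypothetical, struct sketch_handle is a simplified
stand-in for struct landlock_handle, and the mapping follows
landlock_get_array_type() as shown in this patch.

```c
#include <errno.h>
#include <stddef.h>

/* Enumerators mirror enum bpf_map_handle_type from this patch. */
enum sketch_handle_type {
	SKETCH_HANDLE_UNSPEC,
	SKETCH_HANDLE_FS_FD,
	SKETCH_HANDLE_FS_GLOB,
};

/* Enumerators mirror enum bpf_map_array_type from this patch. */
enum sketch_array_type {
	SKETCH_ARRAY_UNSPEC,
	SKETCH_ARRAY_FS,
};

/* Simplified model of the element userland stores in the map. */
struct sketch_handle {
	unsigned int type; /* enum sketch_handle_type */
	int fd;
};

/* Both FS handle types select the FS array type; the rest is invalid. */
static int sketch_array_type_for(const struct sketch_handle *h)
{
	if (!h)
		return -EINVAL;

	switch (h->type) {
	case SKETCH_HANDLE_FS_FD:
	case SKETCH_HANDLE_FS_GLOB:
		return SKETCH_ARRAY_FS;
	default:
		return -EINVAL;
	}
}
```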

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h            |   4 +-
 include/uapi/linux/bpf.h       |  52 +++++++++++-
 kernel/bpf/arraymap.c          |  17 +++-
 kernel/bpf/verifier.c          |   6 ++
 security/landlock/Makefile     |   2 +-
 security/landlock/checker_fs.c | 183 +++++++++++++++++++++++++++++++++++++++++
 security/landlock/checker_fs.h |  20 +++++
 security/landlock/lsm.c        |  11 ++-
 8 files changed, 288 insertions(+), 7 deletions(-)
 create mode 100644 security/landlock/checker_fs.c
 create mode 100644 security/landlock/checker_fs.h

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 557e7efdf0cd..79014aedbea4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -84,6 +84,7 @@ enum bpf_arg_type {
 
 	ARG_PTR_TO_STRUCT_FILE,		/* pointer to struct file */
 	ARG_PTR_TO_STRUCT_CRED,		/* pointer to struct cred */
+	ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS,	/* pointer to Landlock FS handle */
 };
 
 /* type of values returned from helper functions */
@@ -146,6 +147,7 @@ enum bpf_reg_type {
 	/* Landlock */
 	PTR_TO_STRUCT_FILE,
 	PTR_TO_STRUCT_CRED,
+	CONST_PTR_TO_LANDLOCK_HANDLE_FS,
 };
 
 struct bpf_prog;
@@ -207,7 +209,7 @@ struct bpf_array {
 
 #ifdef CONFIG_SECURITY_LANDLOCK
 struct map_landlock_handle {
-	u32 type;
+	u32 type; /* e.g. BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD */
 	union {
 		struct file *file;
 	};
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 983d14e910ff..88af79dd668c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -89,10 +89,20 @@ enum bpf_map_type {
 
 enum bpf_map_array_type {
 	BPF_MAP_ARRAY_TYPE_UNSPEC,
+	BPF_MAP_ARRAY_TYPE_LANDLOCK_FS,
 };
 
 enum bpf_map_handle_type {
 	BPF_MAP_HANDLE_TYPE_UNSPEC,
+	BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD,
+	BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_GLOB,
+};
+
+enum bpf_map_array_op {
+	BPF_MAP_ARRAY_OP_UNSPEC,
+	BPF_MAP_ARRAY_OP_OR,
+	BPF_MAP_ARRAY_OP_AND,
+	BPF_MAP_ARRAY_OP_XOR,
 };
 
 enum bpf_prog_type {
@@ -325,6 +335,35 @@ enum bpf_func_id {
 	 */
 	BPF_FUNC_skb_get_tunnel_opt,
 	BPF_FUNC_skb_set_tunnel_opt,
+
+	/**
+	 * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
+	 * Compare file system handles with a struct file
+	 *
+	 * @prop: properties to check against (e.g. LANDLOCK_FLAG_FS_DENTRY)
+	 * @map: handles to compare against
+	 * @map_op: which elements of the map to use (e.g. BPF_MAP_ARRAY_OP_OR)
+	 * @file: struct file address to compare with (taken from the context)
+	 *
+	 * Return: 0 if the file matches the handles, 1 otherwise, or a negative
+	 * value if an error occurred.
+	 */
+	BPF_FUNC_landlock_cmp_fs_prop_with_struct_file,
+
+	/**
+	 * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
+	 * Check if a struct file is a leaf of file system handles
+	 *
+	 * @opt: check options (e.g. LANDLOCK_FLAG_OPT_REVERSE)
+	 * @map: handles to compare against
+	 * @map_op: which elements of the map to use (e.g. BPF_MAP_ARRAY_OP_OR)
+	 * @file: struct file address to compare with (taken from the context)
+	 *
+	 * Return: 0 if the file is the same or beneath the handles,
+	 * 1 otherwise, or a negative value if an error occurred.
+	 */
+	BPF_FUNC_landlock_cmp_fs_beneath_with_struct_file,
+
 	__BPF_FUNC_MAX_ID,
 };
 
@@ -398,6 +437,17 @@ struct bpf_tunnel_key {
 	__u32 tunnel_label;
 };
 
+/* Handle check flags */
+#define LANDLOCK_FLAG_FS_DENTRY	(1 << 0)
+#define LANDLOCK_FLAG_FS_INODE	(1 << 1)
+#define LANDLOCK_FLAG_FS_DEVICE	(1 << 2)
+#define LANDLOCK_FLAG_FS_MOUNT	(1 << 3)
+#define _LANDLOCK_FLAG_FS_MASK	((1 << 4) - 1)
+
+/* Handle option flags */
+#define LANDLOCK_FLAG_OPT_REVERSE	(1<<0)
+#define _LANDLOCK_FLAG_OPT_MASK	((1 << 1) - 1)
+
 /* Map handle entry */
 struct landlock_handle {
 	__u32 type; /* enum bpf_map_handle_type */
@@ -410,7 +460,7 @@ struct landlock_handle {
 /**
  * struct landlock_data
  *
- * @hook: LSM hook ID
+ * @hook: LSM hook ID (e.g. BPF_PROG_TYPE_LANDLOCK_FILE_OPEN)
  * @cookie: value set by a seccomp-filter return value RET_LANDLOCK. This come
  *          from a trusted seccomp-bpf program: the same process that loaded
  *          this Landlock hook program.
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 5938b8ee475b..6804dafd8355 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -508,7 +508,12 @@ static struct bpf_map *landlock_array_map_alloc(union bpf_attr *attr)
 static void landlock_put_handle(struct map_landlock_handle *handle)
 {
 	switch (handle->type) {
-		/* TODO: add handle types */
+	case BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD:
+		if (likely(handle->file))
+			fput(handle->file);
+		else
+			WARN_ON(1);
+		break;
 	default:
 		WARN_ON(1);
 	}
@@ -533,7 +538,9 @@ static enum bpf_map_array_type landlock_get_array_type(
 		enum bpf_map_handle_type handle_type)
 {
 	switch (handle_type) {
-		/* TODO: add handle types */
+	case BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD:
+	case BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_GLOB:
+		return BPF_MAP_ARRAY_TYPE_LANDLOCK_FS;
 	case BPF_MAP_HANDLE_TYPE_UNSPEC:
 	default:
 		return -EINVAL;
@@ -550,6 +557,7 @@ static inline long landlock_store_handle(struct map_landlock_handle *dst,
 		struct landlock_handle *khandle)
 {
 	struct path kpath;
+	struct file *handle_file;
 
 	if (unlikely(!khandle))
 		return -EINVAL;
@@ -557,7 +565,10 @@ static inline long landlock_store_handle(struct map_landlock_handle *dst,
 	/* access control already done for the FD */
 
 	switch (khandle->type) {
-		/* TODO: add handle types */
+	case BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD:
+		FGET_OR_RET(handle_file, khandle->fd);
+		dst->file = handle_file;
+		break;
 	default:
 		WARN_ON(1);
 		path_put(&kpath);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2931e2efcc10..b182c88d5c13 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -246,6 +246,7 @@ static const char * const reg_type_str[] = {
 	[PTR_TO_PACKET_END]	= "pkt_end",
 	[PTR_TO_STRUCT_FILE]	= "struct_file",
 	[PTR_TO_STRUCT_CRED]	= "struct_cred",
+	[CONST_PTR_TO_LANDLOCK_HANDLE_FS] = "landlock_handle_fs",
 };
 
 static void print_verifier_state(struct verifier_state *state)
@@ -558,6 +559,7 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
 	case CONST_PTR_TO_MAP:
 	case PTR_TO_STRUCT_FILE:
 	case PTR_TO_STRUCT_CRED:
+	case CONST_PTR_TO_LANDLOCK_HANDLE_FS:
 		return true;
 	default:
 		return false;
@@ -951,6 +953,8 @@ static int check_func_arg(struct verifier_env *env, u32 regno,
 		expected_type = PTR_TO_STRUCT_FILE;
 	} else if (arg_type == ARG_PTR_TO_STRUCT_CRED) {
 		expected_type = PTR_TO_STRUCT_CRED;
+	} else if (arg_type == ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS) {
+		expected_type = CONST_PTR_TO_LANDLOCK_HANDLE_FS;
 	} else if (arg_type == ARG_PTR_TO_STACK ||
 		   arg_type == ARG_PTR_TO_RAW_STACK) {
 		expected_type = PTR_TO_STACK;
@@ -1727,6 +1731,8 @@ static struct bpf_map *ld_imm64_to_map_ptr(struct bpf_insn *insn)
 static inline enum bpf_reg_type bpf_reg_type_from_map(struct bpf_map *map)
 {
 	switch (map->map_array_type) {
+	case BPF_MAP_ARRAY_TYPE_LANDLOCK_FS:
+		return CONST_PTR_TO_LANDLOCK_HANDLE_FS;
 	case BPF_MAP_ARRAY_TYPE_UNSPEC:
 	default:
 		return CONST_PTR_TO_MAP;
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index 59669d70bc7e..27f359a8cfaa 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
-landlock-y := lsm.o
+landlock-y := lsm.o checker_fs.o
diff --git a/security/landlock/checker_fs.c b/security/landlock/checker_fs.c
new file mode 100644
index 000000000000..4d2f1e5d41b6
--- /dev/null
+++ b/security/landlock/checker_fs.c
@@ -0,0 +1,183 @@
+/*
+ * Landlock LSM - File System Checkers
+ *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/bpf.h> /* enum bpf_map_array_op */
+#include <linux/errno.h>
+#include <linux/fs.h> /* path_is_under() */
+#include <linux/path.h> /* struct path */
+
+#include "checker_fs.h"
+
+#define EQUAL_NOT_NULL(a, b) (a && a == b)
+
+/*
+ * bpf_landlock_cmp_fs_prop_with_struct_file
+ *
+ * Cf. include/uapi/linux/bpf.h
+ */
+static inline u64 bpf_landlock_cmp_fs_prop_with_struct_file(u64 r1_property,
+		u64 r2_map, u64 r3_map_op, u64 r4_file, u64 r5)
+{
+	u8 property = (u8) r1_property;
+	struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
+	enum bpf_map_array_op map_op = r3_map_op;
+	struct file *file = (struct file *) (unsigned long) r4_file;
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	struct path *p1, *p2;
+	struct map_landlock_handle *handle;
+	int i;
+	bool result_dentry = !(property & LANDLOCK_FLAG_FS_DENTRY);
+	bool result_inode = !(property & LANDLOCK_FLAG_FS_INODE);
+	bool result_device = !(property & LANDLOCK_FLAG_FS_DEVICE);
+	bool result_mount = !(property & LANDLOCK_FLAG_FS_MOUNT);
+
+	/* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS is an arraymap */
+	if (unlikely(!map)) {
+		WARN_ON(1);
+		return -EFAULT;
+	}
+	if (unlikely(!file))
+		return -ENOENT;
+	if (unlikely((property | _LANDLOCK_FLAG_FS_MASK) != _LANDLOCK_FLAG_FS_MASK))
+		return -EINVAL;
+
+	/* for now, only handle OP_OR */
+	switch (map_op) {
+	case BPF_MAP_ARRAY_OP_OR:
+		break;
+	case BPF_MAP_ARRAY_OP_UNSPEC:
+	case BPF_MAP_ARRAY_OP_AND:
+	case BPF_MAP_ARRAY_OP_XOR:
+	default:
+		return -EINVAL;
+	}
+
+	synchronize_rcu();
+
+	for (i = 0; i < array->n_entries; i++) {
+		handle = (struct map_landlock_handle *)
+				(array->value + array->elem_size * i);
+
+		if (handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) {
+			WARN_ON(1);
+			return -EFAULT;
+		}
+
+		p1 = &handle->file->f_path;
+		p2 = &file->f_path;
+		if (unlikely(!p1 || !p2)) {
+			WARN_ON(1);
+			return -EFAULT;
+		}
+
+		if (!result_dentry && p1->dentry == p2->dentry)
+			result_dentry = true;
+		/* TODO: use d_inode_rcu() instead? */
+		if (!result_inode
+		    && EQUAL_NOT_NULL(d_inode(p1->dentry)->i_ino,
+				      d_inode(p2->dentry)->i_ino))
+			result_inode = true;
+		/* check superblock instead of device major/minor */
+		if (!result_device
+		    && EQUAL_NOT_NULL(d_inode(p1->dentry)->i_sb,
+				      d_inode(p2->dentry)->i_sb))
+			result_device = true;
+		if (!result_mount && EQUAL_NOT_NULL(p1->mnt, p2->mnt))
+			result_mount = true;
+		if (result_dentry && result_inode && result_device && result_mount)
+			return 0;
+	}
+	return 1;
+}
+
+const struct bpf_func_proto bpf_landlock_cmp_fs_prop_with_struct_file_proto = {
+	.func		= bpf_landlock_cmp_fs_prop_with_struct_file,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_STRUCT_FILE,
+};
+
+/*
+ * bpf_landlock_cmp_fs_beneath_with_struct_file
+ *
+ * Cf. include/uapi/linux/bpf.h
+ */
+static inline u64 bpf_landlock_cmp_fs_beneath_with_struct_file(u64 r1_option,
+		u64 r2_map, u64 r3_map_op, u64 r4_file, u64 r5)
+{
+	u8 option = (u8) r1_option;
+	struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
+	enum bpf_map_array_op map_op = r3_map_op;
+	struct file *file = (struct file *) (unsigned long) r4_file;
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	struct path *p1, *p2;
+	struct map_landlock_handle *handle;
+	int i;
+
+	/* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS is an arraymap */
+	if (unlikely(!map)) {
+		WARN_ON(1);
+		return -EFAULT;
+	}
+	/* @file can be null for anonymous mmap */
+	if (unlikely(!file))
+		return -ENOENT;
+	if (unlikely((option | _LANDLOCK_FLAG_OPT_MASK) != _LANDLOCK_FLAG_OPT_MASK))
+		return -EINVAL;
+
+	/* for now, only handle OP_OR */
+	switch (map_op) {
+	case BPF_MAP_ARRAY_OP_OR:
+		break;
+	case BPF_MAP_ARRAY_OP_UNSPEC:
+	case BPF_MAP_ARRAY_OP_AND:
+	case BPF_MAP_ARRAY_OP_XOR:
+	default:
+		return -EINVAL;
+	}
+
+	synchronize_rcu();
+
+	for (i = 0; i < array->n_entries; i++) {
+		handle = (struct map_landlock_handle *)
+				(array->value + array->elem_size * i);
+
+		/* protected by the proto types, should not happen */
+		if (unlikely(handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD)) {
+			WARN_ON(1);
+			return -EINVAL;
+		}
+
+		if (option & LANDLOCK_FLAG_OPT_REVERSE) {
+			p1 = &file->f_path;
+			p2 = &handle->file->f_path;
+		} else {
+			p1 = &handle->file->f_path;
+			p2 = &file->f_path;
+		}
+
+		if (path_is_under(p2, p1))
+			return 0;
+	}
+	return 1;
+}
+
+const struct bpf_func_proto bpf_landlock_cmp_fs_beneath_with_struct_file_proto = {
+	.func		= bpf_landlock_cmp_fs_beneath_with_struct_file,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_STRUCT_FILE,
+};
diff --git a/security/landlock/checker_fs.h b/security/landlock/checker_fs.h
new file mode 100644
index 000000000000..a62f84e39efd
--- /dev/null
+++ b/security/landlock/checker_fs.h
@@ -0,0 +1,20 @@
+/*
+ * Landlock LSM - File System Checkers
+ *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _SECURITY_LANDLOCK_CHECKER_FS_H
+#define _SECURITY_LANDLOCK_CHECKER_FS_H
+
+#include <linux/fs.h>
+#include <linux/seccomp.h>
+
+extern const struct bpf_func_proto bpf_landlock_cmp_fs_prop_with_struct_file_proto;
+extern const struct bpf_func_proto bpf_landlock_cmp_fs_beneath_with_struct_file_proto;
+
+#endif /* _SECURITY_LANDLOCK_CHECKER_FS_H */
diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c
index 322309068066..8645743243b6 100644
--- a/security/landlock/lsm.c
+++ b/security/landlock/lsm.c
@@ -16,6 +16,8 @@
 #include <linux/lsm_hooks.h>
 #include <linux/seccomp.h> /* struct seccomp_* */
 
+#include "checker_fs.h"
+
 #define LANDLOCK_HOOK_INIT(NAME) LSM_HOOK_INIT(NAME, landlock_hook_##NAME)
 
 #define LANDLOCK_HOOKx(X, NAME, CNAME, ...)				\
@@ -117,7 +119,14 @@ static int landlock_run_prog(__u64 args[6])
 static const struct bpf_func_proto *bpf_landlock_func_proto(
 		enum bpf_func_id func_id)
 {
-	return NULL;
+	switch (func_id) {
+	case BPF_FUNC_landlock_cmp_fs_prop_with_struct_file:
+		return &bpf_landlock_cmp_fs_prop_with_struct_file_proto;
+	case BPF_FUNC_landlock_cmp_fs_beneath_with_struct_file:
+		return &bpf_landlock_cmp_fs_beneath_with_struct_file_proto;
+	default:
+		return NULL;
+	}
 }
 
 static u32 landlock_convert_ctx_access(enum bpf_access_type type, int dst_reg,
-- 
2.8.1


* [RFC v2 09/10] landlock: Handle cgroups
  2016-08-25 10:32 [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (7 preceding siblings ...)
  2016-08-25 10:32 ` [RFC v2 08/10] landlock: Handle file system comparisons Mickaël Salaün
@ 2016-08-25 10:32 ` Mickaël Salaün
  2016-08-25 11:09   ` Andy Lutomirski
  2016-08-26  2:14   ` Alexei Starovoitov
  2016-08-25 10:32 ` [RFC v2 10/10] samples/landlock: Add sandbox example Mickaël Salaün
                   ` (4 subsequent siblings)
  13 siblings, 2 replies; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-25 10:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova, James Morris,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Will Drewry, kernel-hardening, linux-api, linux-security-module,
	netdev

Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op)
to compare the current process cgroup with a cgroup handle. The handle
matches if the current cgroup is the same as the handle's cgroup or
beneath it. This makes it possible to write conditional rules based on
the current cgroup.
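
The "same or beneath" test can be pictured with a parent-pointer walk.
The sketch below is a userspace model under that assumption, not the
kernel code: struct sketch_cgroup and both functions are hypothetical,
and the LANDLOCK_FLAG_OPT_REVERSE option is left out for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cgroup node; the root has a NULL parent. */
struct sketch_cgroup {
	const struct sketch_cgroup *parent;
};

/* True if @cur is @ancestor itself or one of its descendants. */
static bool sketch_is_same_or_beneath(const struct sketch_cgroup *cur,
				      const struct sketch_cgroup *ancestor)
{
	for (; cur; cur = cur->parent) {
		if (cur == ancestor)
			return true;
	}
	return false;
}

/* Returns 0 if any handle matches (OR semantics), 1 otherwise. */
static int sketch_cmp_cgroup_beneath(
		const struct sketch_cgroup *const *handles, size_t n,
		const struct sketch_cgroup *cur)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (sketch_is_same_or_beneath(cur, handles[i]))
			return 0;
	}
	return 1;
}
```

The return convention (0 for a match, 1 otherwise) follows the helper
documentation added to include/uapi/linux/bpf.h by this patch.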

A cgroup handle is a map entry created from a file descriptor referring
to a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the
map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the
inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP.

An unprivileged process can create and manipulate cgroups thanks to
cgroup delegation.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h                |  8 ++++
 include/uapi/linux/bpf.h           | 15 ++++++
 kernel/bpf/arraymap.c              | 30 ++++++++++++
 kernel/bpf/verifier.c              |  6 +++
 security/landlock/Kconfig          |  3 ++
 security/landlock/Makefile         |  2 +-
 security/landlock/checker_cgroup.c | 96 ++++++++++++++++++++++++++++++++++++++
 security/landlock/checker_cgroup.h | 18 +++++++
 security/landlock/lsm.c            |  8 ++++
 9 files changed, 185 insertions(+), 1 deletion(-)
 create mode 100644 security/landlock/checker_cgroup.c
 create mode 100644 security/landlock/checker_cgroup.h

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 79014aedbea4..9e6786e7a40a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -14,6 +14,9 @@
 
 #ifdef CONFIG_SECURITY_LANDLOCK
 #include <linux/fs.h> /* struct file */
+#ifdef CONFIG_CGROUPS
+#include <linux/cgroup-defs.h> /* struct cgroup_subsys_state */
+#endif	/* CONFIG_CGROUPS */
 #endif /* CONFIG_SECURITY_LANDLOCK */
 
 struct bpf_map;
@@ -85,6 +88,7 @@ enum bpf_arg_type {
 	ARG_PTR_TO_STRUCT_FILE,		/* pointer to struct file */
 	ARG_PTR_TO_STRUCT_CRED,		/* pointer to struct cred */
 	ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS,	/* pointer to Landlock FS handle */
+	ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP,	/* pointer to Landlock cgroup handle */
 };
 
 /* type of values returned from helper functions */
@@ -148,6 +152,7 @@ enum bpf_reg_type {
 	PTR_TO_STRUCT_FILE,
 	PTR_TO_STRUCT_CRED,
 	CONST_PTR_TO_LANDLOCK_HANDLE_FS,
+	CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP,
 };
 
 struct bpf_prog;
@@ -212,6 +217,9 @@ struct map_landlock_handle {
 	u32 type; /* e.g. BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD */
 	union {
 		struct file *file;
+#ifdef CONFIG_CGROUPS
+		struct cgroup_subsys_state *css;
+#endif	/* CONFIG_CGROUPS */
 	};
 };
 #endif /* CONFIG_SECURITY_LANDLOCK */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 88af79dd668c..7f60b9fdb35c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -90,12 +90,14 @@ enum bpf_map_type {
 enum bpf_map_array_type {
 	BPF_MAP_ARRAY_TYPE_UNSPEC,
 	BPF_MAP_ARRAY_TYPE_LANDLOCK_FS,
+	BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP,
 };
 
 enum bpf_map_handle_type {
 	BPF_MAP_HANDLE_TYPE_UNSPEC,
 	BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD,
 	BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_GLOB,
+	BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD,
 };
 
 enum bpf_map_array_op {
@@ -364,6 +366,19 @@ enum bpf_func_id {
 	 */
 	BPF_FUNC_landlock_cmp_fs_beneath_with_struct_file,
 
+	/**
+	 * bpf_landlock_cmp_cgroup_beneath(opt, map, map_op)
+	 * Check if the current process is a leaf of cgroup handles
+	 *
+	 * @opt: check options (e.g. LANDLOCK_FLAG_OPT_REVERSE)
+	 * @map: handles to compare against
+	 * @map_op: which elements of the map to use (e.g. BPF_MAP_ARRAY_OP_OR)
+	 *
+	 * Return: 0 if the current cgroup is the same as or beneath the handle,
+	 * 1 otherwise, or a negative value if an error occurred.
+	 */
+	BPF_FUNC_landlock_cmp_cgroup_beneath,
+
 	__BPF_FUNC_MAX_ID,
 };
 
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 6804dafd8355..050b3d8d88c8 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -19,6 +19,12 @@
 #include <linux/file.h> /* fput() */
 #include <linux/fs.h> /* struct file */
 
+#ifdef CONFIG_SECURITY_LANDLOCK
+#ifdef CONFIG_CGROUPS
+#include <linux/cgroup-defs.h> /* struct cgroup_subsys_state */
+#endif	/* CONFIG_CGROUPS */
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 static void bpf_array_free_percpu(struct bpf_array *array)
 {
 	int i;
@@ -514,6 +520,12 @@ static void landlock_put_handle(struct map_landlock_handle *handle)
 		else
 			WARN_ON(1);
 		break;
+	case BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD:
+		if (likely(handle->css))
+			css_put(handle->css);
+		else
+			WARN_ON(1);
+		break;
 	default:
 		WARN_ON(1);
 	}
@@ -541,6 +553,10 @@ static enum bpf_map_array_type landlock_get_array_type(
 	case BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD:
 	case BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_GLOB:
 		return BPF_MAP_ARRAY_TYPE_LANDLOCK_FS;
+	case BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD:
+#ifdef CONFIG_CGROUPS
+		return BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP;
+#endif	/* CONFIG_CGROUPS */
 	case BPF_MAP_HANDLE_TYPE_UNSPEC:
 	default:
 		return -EINVAL;
@@ -557,6 +573,9 @@ static inline long landlock_store_handle(struct map_landlock_handle *dst,
 		struct landlock_handle *khandle)
 {
 	struct path kpath;
+#ifdef CONFIG_CGROUPS
+	struct cgroup_subsys_state *css;
+#endif	/* CONFIG_CGROUPS */
 	struct file *handle_file;
 
 	if (unlikely(!khandle))
@@ -569,6 +588,17 @@ static inline long landlock_store_handle(struct map_landlock_handle *dst,
 		FGET_OR_RET(handle_file, khandle->fd);
 		dst->file = handle_file;
 		break;
+#ifdef CONFIG_CGROUPS
+	case BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD:
+		FGET_OR_RET(handle_file, khandle->fd);
+		css = css_tryget_online_from_dir(file_dentry(handle_file), NULL);
+		fput(handle_file);
+		/* NULL css check done by css_tryget_online_from_dir() */
+		if (IS_ERR(css))
+			return PTR_ERR(css);
+		dst->css = css;
+		break;
+#endif	/* CONFIG_CGROUPS */
 	default:
 		WARN_ON(1);
 		path_put(&kpath);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b182c88d5c13..b4e5c3bbc520 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -247,6 +247,7 @@ static const char * const reg_type_str[] = {
 	[PTR_TO_STRUCT_FILE]	= "struct_file",
 	[PTR_TO_STRUCT_CRED]	= "struct_cred",
 	[CONST_PTR_TO_LANDLOCK_HANDLE_FS] = "landlock_handle_fs",
+	[CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP] = "landlock_handle_cgroup",
 };
 
 static void print_verifier_state(struct verifier_state *state)
@@ -560,6 +561,7 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
 	case PTR_TO_STRUCT_FILE:
 	case PTR_TO_STRUCT_CRED:
 	case CONST_PTR_TO_LANDLOCK_HANDLE_FS:
+	case CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP:
 		return true;
 	default:
 		return false;
@@ -955,6 +957,8 @@ static int check_func_arg(struct verifier_env *env, u32 regno,
 		expected_type = PTR_TO_STRUCT_CRED;
 	} else if (arg_type == ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS) {
 		expected_type = CONST_PTR_TO_LANDLOCK_HANDLE_FS;
+	} else if (arg_type == ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP) {
+		expected_type = CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP;
 	} else if (arg_type == ARG_PTR_TO_STACK ||
 		   arg_type == ARG_PTR_TO_RAW_STACK) {
 		expected_type = PTR_TO_STACK;
@@ -1733,6 +1737,8 @@ static inline enum bpf_reg_type bpf_reg_type_from_map(struct bpf_map *map)
 	switch (map->map_array_type) {
 	case BPF_MAP_ARRAY_TYPE_LANDLOCK_FS:
 		return CONST_PTR_TO_LANDLOCK_HANDLE_FS;
+	case BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP:
+		return CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP;
 	case BPF_MAP_ARRAY_TYPE_UNSPEC:
 	default:
 		return CONST_PTR_TO_MAP;
diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
index dc8328d216d7..414eb047e50e 100644
--- a/security/landlock/Kconfig
+++ b/security/landlock/Kconfig
@@ -10,6 +10,9 @@ config SECURITY_LANDLOCK
 	  of stacked eBPF programs for some LSM hooks. Each program can do some
 	  access comparison to check if an access request is legitimate.
 
+	  It is recommended to enable cgroups to be able to match a policy
+	  according to a group of processes.
+
 	  Further information about eBPF can be found in
 	  Documentation/networking/filter.txt
 
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index 27f359a8cfaa..cdaaa152b849 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
-landlock-y := lsm.o checker_fs.o
+landlock-y := lsm.o checker_fs.o checker_cgroup.o
diff --git a/security/landlock/checker_cgroup.c b/security/landlock/checker_cgroup.c
new file mode 100644
index 000000000000..97f29ac64188
--- /dev/null
+++ b/security/landlock/checker_cgroup.c
@@ -0,0 +1,96 @@
+/*
+ * Landlock LSM - cgroup Checkers
+ *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifdef CONFIG_CGROUPS
+
+#include <asm/current.h>
+#include <linux/bpf.h> /* enum bpf_map_array_op */
+#include <linux/cgroup-defs.h> /* struct cgroup_subsys_state */
+#include <linux/cgroup.h> /* cgroup_is_descendant(), task_css_set() */
+#include <linux/errno.h>
+
+#include "checker_cgroup.h"
+
+
+/*
+ * bpf_landlock_cmp_cgroup_beneath
+ *
+ * Cf. include/uapi/linux/bpf.h
+ */
+static inline u64 bpf_landlock_cmp_cgroup_beneath(u64 r1_option, u64 r2_map,
+		u64 r3_map_op, u64 r4, u64 r5)
+{
+	u8 option = (u8) r1_option;
+	struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
+	enum bpf_map_array_op map_op = r3_map_op;
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	struct cgroup *cg1, *cg2;
+	struct map_landlock_handle *handle;
+	int i;
+
+	/* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP is an arraymap */
+	if (unlikely(!map)) {
+		WARN_ON(1);
+		return -EFAULT;
+	}
+	if (unlikely((option | _LANDLOCK_FLAG_OPT_MASK) != _LANDLOCK_FLAG_OPT_MASK))
+		return -EINVAL;
+
+	/* for now, only handle OP_OR */
+	switch (map_op) {
+	case BPF_MAP_ARRAY_OP_OR:
+		break;
+	case BPF_MAP_ARRAY_OP_UNSPEC:
+	case BPF_MAP_ARRAY_OP_AND:
+	case BPF_MAP_ARRAY_OP_XOR:
+	default:
+		return -EINVAL;
+	}
+
+	synchronize_rcu();
+
+	for (i = 0; i < array->n_entries; i++) {
+		handle = (struct map_landlock_handle *)
+				(array->value + array->elem_size * i);
+
+		/* protected by the proto types, should not happen */
+		if (unlikely(handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD)) {
+			WARN_ON(1);
+			return -EFAULT;
+		}
+		if (unlikely(!handle->css)) {
+			WARN_ON(1);
+			return -EFAULT;
+		}
+
+		if (option & LANDLOCK_FLAG_OPT_REVERSE) {
+			cg1 = handle->css->cgroup;
+			cg2 = task_css_set(current)->dfl_cgrp;
+		} else {
+			cg1 = task_css_set(current)->dfl_cgrp;
+			cg2 = handle->css->cgroup;
+		}
+
+		if (cgroup_is_descendant(cg1, cg2))
+			return 0;
+	}
+	return 1;
+}
+
+const struct bpf_func_proto bpf_landlock_cmp_cgroup_beneath_proto = {
+	.func		= bpf_landlock_cmp_cgroup_beneath,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP,
+	.arg3_type	= ARG_ANYTHING,
+};
+
+#endif	/* CONFIG_CGROUPS */
diff --git a/security/landlock/checker_cgroup.h b/security/landlock/checker_cgroup.h
new file mode 100644
index 000000000000..497cad7c2bb8
--- /dev/null
+++ b/security/landlock/checker_cgroup.h
@@ -0,0 +1,18 @@
+/*
+ * Landlock LSM - cgroup Checkers
+ *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifdef CONFIG_CGROUPS
+#ifndef _SECURITY_LANDLOCK_CHECKER_CGROUP_H
+#define _SECURITY_LANDLOCK_CHECKER_CGROUP_H
+
+extern const struct bpf_func_proto bpf_landlock_cmp_cgroup_beneath_proto;
+
+#endif /* _SECURITY_LANDLOCK_CHECKER_CGROUP_H */
+#endif /* CONFIG_CGROUPS */
diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c
index 8645743243b6..cc4759f4e6c5 100644
--- a/security/landlock/lsm.c
+++ b/security/landlock/lsm.c
@@ -18,6 +18,10 @@
 
 #include "checker_fs.h"
 
+#ifdef CONFIG_CGROUPS
+#include "checker_cgroup.h"
+#endif /* CONFIG_CGROUPS */
+
 #define LANDLOCK_HOOK_INIT(NAME) LSM_HOOK_INIT(NAME, landlock_hook_##NAME)
 
 #define LANDLOCK_HOOKx(X, NAME, CNAME, ...)				\
@@ -124,6 +128,10 @@ static const struct bpf_func_proto *bpf_landlock_func_proto(
 		return &bpf_landlock_cmp_fs_prop_with_struct_file_proto;
 	case BPF_FUNC_landlock_cmp_fs_beneath_with_struct_file:
 		return &bpf_landlock_cmp_fs_beneath_with_struct_file_proto;
+	case BPF_FUNC_landlock_cmp_cgroup_beneath:
+#ifdef CONFIG_CGROUPS
+		return &bpf_landlock_cmp_cgroup_beneath_proto;
+#endif	/* CONFIG_CGROUPS */
 	default:
 		return NULL;
 	}
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [RFC v2 10/10] samples/landlock: Add sandbox example
  2016-08-25 10:32 [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (8 preceding siblings ...)
  2016-08-25 10:32 ` [RFC v2 09/10] landlock: Handle cgroups Mickaël Salaün
@ 2016-08-25 10:32 ` Mickaël Salaün
  2016-08-25 11:05 ` [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Andy Lutomirski
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-25 10:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova, James Morris,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Will Drewry, kernel-hardening, linux-api, linux-security-module,
	netdev

Add a basic sandbox tool to create a process isolated from some parts of
the system. The isolation can depend on the current cgroup.

Example:

  $ mkdir /sys/fs/cgroup/sandboxed
  $ ls /home
  user1
  $ LANDLOCK_CGROUPS='/sys/fs/cgroup/sandboxed' \
      LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
      ./sandbox /bin/sh -i
  $ ls /home
  user1
  $ echo $$ > /sys/fs/cgroup/sandboxed/cgroup.procs
  $ ls /home
  ls: cannot open directory '/home': Permission denied
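
The two variables above are colon-separated lists. As a minimal sketch of
how such a list can be split (split_paths() is a hypothetical name, not a
function from this patch; unlike parse_path() in sandbox.c it checks the
allocation and does not rely on strsep()):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Split a writable, colon-separated string in place and store a pointer to
 * each element in a newly allocated array. Returns the number of elements,
 * 0 for a NULL list, or -1 if the allocation fails. The caller frees *out.
 */
int split_paths(char *list, const char ***out)
{
	const char **paths;
	char *p;
	int nb = 1, i = 0;

	assert(out != NULL);
	*out = NULL;
	if (!list)
		return 0;
	/* one element more than the number of separators */
	for (p = list; (p = strchr(p, ':')) != NULL; p++)
		nb++;
	paths = malloc(nb * sizeof(*paths));
	if (!paths)
		return -1;
	paths[i++] = list;
	for (p = list; (p = strchr(p, ':')) != NULL; ) {
		*p++ = '\0';	/* terminate the previous element */
		paths[i++] = p;	/* the next one starts after the colon */
	}
	*out = paths;
	return nb;
}
```

As in the sample, an empty string yields a single empty element, so a
caller may want to reject empty paths before opening them.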

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
 samples/Makefile            |   2 +-
 samples/landlock/.gitignore |   1 +
 samples/landlock/Makefile   |  16 +++
 samples/landlock/sandbox.c  | 295 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 313 insertions(+), 1 deletion(-)
 create mode 100644 samples/landlock/.gitignore
 create mode 100644 samples/landlock/Makefile
 create mode 100644 samples/landlock/sandbox.c

diff --git a/samples/Makefile b/samples/Makefile
index 2e3b523d7097..42e6a613f728 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -2,4 +2,4 @@
 
 obj-$(CONFIG_SAMPLES)	+= kobject/ kprobes/ trace_events/ livepatch/ \
 			   hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/ \
-			   configfs/ connector/ v4l/
+			   configfs/ connector/ v4l/ landlock/
diff --git a/samples/landlock/.gitignore b/samples/landlock/.gitignore
new file mode 100644
index 000000000000..f6c6da930a30
--- /dev/null
+++ b/samples/landlock/.gitignore
@@ -0,0 +1 @@
+/sandbox
diff --git a/samples/landlock/Makefile b/samples/landlock/Makefile
new file mode 100644
index 000000000000..d1044b2afd27
--- /dev/null
+++ b/samples/landlock/Makefile
@@ -0,0 +1,16 @@
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
+
+hostprogs-$(CONFIG_SECURITY_LANDLOCK) := sandbox
+sandbox-objs := sandbox.o
+
+always := $(hostprogs-y)
+
+HOSTCFLAGS += -I$(objtree)/usr/include
+
+# Trick to allow make to be run from this directory
+all:
+	$(MAKE) -C ../../ $$PWD/
+
+clean:
+	$(MAKE) -C ../../ M=$$PWD clean
diff --git a/samples/landlock/sandbox.c b/samples/landlock/sandbox.c
new file mode 100644
index 000000000000..86604963c30c
--- /dev/null
+++ b/samples/landlock/sandbox.c
@@ -0,0 +1,295 @@
+/*
+ * Landlock LSM - Sandbox Example
+ *
+ * Copyright (C) 2016  Mickaël Salaün <mic@digikod.net>
+ *
+ * The code may be used by anyone for any purpose, and can serve as a starting
+ * point for developing a sandbox.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h> /* open() */
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <linux/prctl.h>
+#include <linux/seccomp.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/prctl.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+
+#include "../../tools/include/linux/filter.h"
+
+#include "../bpf/libbpf.c"
+
+#ifndef seccomp
+static int seccomp(unsigned int op, unsigned int flags, void *args)
+{
+	errno = 0;
+	return syscall(__NR_seccomp, op, flags, args);
+}
+#endif
+
+#define ARRAY_SIZE(a)	(sizeof(a) / sizeof(a[0]))
+
+static int apply_sandbox(const char **allowed_paths, int path_nb, const char **cgroup_paths, int cgroup_nb)
+{
+	__u32 key;
+	int i, ret = 0, map_fs = -1, map_cg = -1, offset;
+
+	/* set up the test sandbox */
+	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+		perror("prctl(no_new_priv)");
+		return 1;
+	}
+
+	/* register a new syscall filter */
+	struct sock_filter filter0[] = {
+		/* pass a cookie containing 5 to the LSM hook filter */
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_LANDLOCK | 5),
+	};
+	struct sock_fprog prog0 = {
+		.len = (unsigned short)ARRAY_SIZE(filter0),
+		.filter = filter0,
+	};
+	if (seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog0)) {
+		perror("seccomp(set_filter)");
+		return 1;
+	}
+
+	if (path_nb) {
+		map_fs = bpf_create_map(BPF_MAP_TYPE_LANDLOCK_ARRAY, sizeof(key), sizeof(struct landlock_handle), 10, 0);
+		if (map_fs < 0) {
+			fprintf(stderr, "bpf_create_map(fs");
+			perror(")");
+			return 1;
+		}
+		for (key = 0; key < path_nb; key++) {
+			int fd = open(allowed_paths[key], O_RDONLY | O_CLOEXEC);
+			if (fd < 0) {
+				fprintf(stderr, "open(fs: \"%s\"", allowed_paths[key]);
+				perror(")");
+				return 1;
+			}
+			struct landlock_handle handle = {
+				.type = BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD,
+				.fd = (__u64)fd,
+			};
+
+			/* register a new LSM handle */
+			if (bpf_update_elem(map_fs, &key, &handle, BPF_ANY)) {
+				fprintf(stderr, "bpf_update_elem(fs: \"%s\"", allowed_paths[key]);
+				perror(")");
+				close(fd);
+				return 1;
+			}
+			close(fd);
+		}
+	}
+	if (cgroup_nb) {
+		map_cg = bpf_create_map(BPF_MAP_TYPE_LANDLOCK_ARRAY, sizeof(key), sizeof(struct landlock_handle), 10, 0);
+		if (map_cg < 0) {
+			fprintf(stderr, "bpf_create_map(cgroup");
+			perror(")");
+			ret = 1;
+			goto err_map_cgroup;
+		}
+		for (key = 0; key < cgroup_nb; key++) {
+			int fd = open(cgroup_paths[key], O_RDONLY | O_CLOEXEC);
+			if (fd < 0) {
+				fprintf(stderr, "open(cgroup: \"%s\"", cgroup_paths[key]);
+				perror(")");
+				return 1;
+			}
+			struct landlock_handle handle = {
+				.type = BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD,
+				.fd = (__u64)fd,
+			};
+
+			/* register a new LSM handle */
+			if (bpf_update_elem(map_cg, &key, &handle, BPF_ANY)) {
+				fprintf(stderr, "bpf_update_elem(cgroup: \"%s\"", cgroup_paths[key]);
+				perror(")");
+				close(fd);
+				return 1;
+			}
+			close(fd);
+		}
+	}
+
+	/* load a LSM filter hook (eBPF) */
+	struct bpf_insn hook_pre[] = {
+		/* save context */
+		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+
+		/* check our cookie (not used in this example) */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_6, offsetof(struct landlock_data, cookie)),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 5, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	struct bpf_insn hook_path[] = {
+		/* specify an option, if any */
+		BPF_MOV32_IMM(BPF_REG_1, 0),
+		/* handles to compare with */
+		BPF_LD_MAP_FD(BPF_REG_2, map_fs),
+		BPF_MOV64_IMM(BPF_REG_3, BPF_MAP_ARRAY_OP_OR),
+		/* hook argument (struct file) */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_6, offsetof(struct landlock_data, args[0])),
+		/* checker function */
+		BPF_EMIT_CALL(BPF_FUNC_landlock_cmp_fs_beneath_with_struct_file),
+
+		/* if the checked path is beneath the handle */
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+		/* allow anonymous mapping */
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_0, -ENOENT, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+		/* deny by default, if any error */
+		BPF_JMP_IMM(BPF_JSGE, BPF_REG_0, 0, 2),
+		BPF_MOV32_IMM(BPF_REG_0, EACCES),
+		BPF_EXIT_INSN(),
+	};
+	struct bpf_insn hook_cgroup[] = {
+		/* specify an option, if any */
+		BPF_MOV32_IMM(BPF_REG_1, 0),
+		/* handles to compare with */
+		BPF_LD_MAP_FD(BPF_REG_2, map_cg),
+		BPF_MOV64_IMM(BPF_REG_3, BPF_MAP_ARRAY_OP_OR),
+		/* checker function */
+		BPF_EMIT_CALL(BPF_FUNC_landlock_cmp_cgroup_beneath),
+
+		/* if the current process is in a blacklisted cgroup */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 1, 2),
+		BPF_MOV32_IMM(BPF_REG_0, EACCES),
+		BPF_EXIT_INSN(),
+	};
+	struct bpf_insn hook_post[] = {
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	/* deny all processes if no cgroup is specified */
+	if (cgroup_nb == 0) {
+		hook_post[0] = BPF_MOV32_IMM(BPF_REG_0, EACCES);
+	}
+
+	unsigned long hook_size = sizeof(hook_pre) + sizeof(hook_path) * (path_nb ? 1 : 0) +
+		sizeof(hook_cgroup) * (cgroup_nb ? 1 : 0) + sizeof(hook_post);
+
+	struct bpf_insn *hook0 = malloc(hook_size);
+	if (!hook0) {
+		perror("malloc");
+		ret = 1;
+		goto err_alloc;
+	}
+	memcpy(hook0, hook_pre, sizeof(hook_pre));
+	offset = sizeof(hook_pre) / sizeof(hook0[0]);
+	if (path_nb) {
+		memcpy(hook0 + offset, hook_path, sizeof(hook_path));
+		offset += sizeof(hook_path) / sizeof(hook0[0]);
+	}
+	if (cgroup_nb) {
+		memcpy(hook0 + offset, hook_cgroup, sizeof(hook_cgroup));
+		offset += sizeof(hook_cgroup) / sizeof(hook0[0]);
+	}
+	memcpy(hook0 + offset, hook_post, sizeof(hook_post));
+
+	/* TODO: handle inode_permission hook (e.g. chdir) */
+	enum bpf_prog_type hook_types[] = {
+		BPF_PROG_TYPE_LANDLOCK_FILE_OPEN,
+		BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION,
+		BPF_PROG_TYPE_LANDLOCK_MMAP_FILE,
+	};
+	for (i = 0; i < ARRAY_SIZE(hook_types); i++) {
+		int bpf0 = bpf_prog_load(hook_types[i],
+				hook0, hook_size, "GPL", 0);
+		if (bpf0 == -1) {
+			perror("bpf");
+			fprintf(stderr, "%s", bpf_log_buf);
+			ret = 1;
+			break;
+		}
+		if (seccomp(SECCOMP_SET_LANDLOCK_HOOK, 0, &bpf0)) {
+			perror("seccomp(set_hook)");
+			ret = 1;
+			close(bpf0);
+			break;
+		}
+		close(bpf0);
+	}
+
+	free(hook0);
+err_alloc:
+	if (cgroup_nb) {
+		close(map_cg);
+	}
+err_map_cgroup:
+	if (path_nb) {
+		close(map_fs);
+	}
+	return ret;
+}
+
+#define ENV_FS_PATH_NAME "LANDLOCK_ALLOWED"
+#define ENV_CGROUP_PATH_NAME "LANDLOCK_CGROUPS"
+#define ENV_PATH_TOKEN ":"
+
+static int parse_path(char *env_path, const char ***path_list) {
+	int i, path_nb = 0;
+
+	if (env_path) {
+		path_nb++;
+		for (i = 0; env_path[i]; i++) {
+			if (env_path[i] == ENV_PATH_TOKEN[0]) {
+				path_nb++;
+			}
+		}
+	}
+	*path_list = malloc(path_nb * sizeof(**path_list));
+	for (i = 0; i < path_nb; i++) {
+		(*path_list)[i] = strsep(&env_path, ENV_PATH_TOKEN);
+	}
+
+	return path_nb;
+}
+
+int main(int argc, char * const argv[], char * const *envp)
+{
+	char *cmd_path;
+	char *env_path_allowed, *env_path_cgroup;
+	int path_nb, cgroup_nb;
+	const char **sb_paths = NULL;
+	const char **cg_paths = NULL;
+	char * const *cmd_argv;
+
+	env_path_allowed = getenv(ENV_FS_PATH_NAME);
+	if (env_path_allowed)
+		env_path_allowed = strdup(env_path_allowed);
+	env_path_cgroup = getenv(ENV_CGROUP_PATH_NAME);
+	if (env_path_cgroup)
+		env_path_cgroup = strdup(env_path_cgroup);
+
+	if (argc < 2) {
+		fprintf(stderr, "usage: %s <cmd> [args]...\n\n", argv[0]);
+		fprintf(stderr, "Environment variables containing paths, each separated by a colon:\n");
+		fprintf(stderr, "* %s (whitelist of allowed files and directories)\n", ENV_FS_PATH_NAME);
+		fprintf(stderr, "* %s (optional cgroups for which the sandbox is enabled)\n", ENV_CGROUP_PATH_NAME);
+		fprintf(stderr, "\nexample:\n%s='/sys/fs/cgroup/sandboxed' %s='/bin:/lib:/usr:/tmp:/proc/self/fd/0' %s /bin/sh -i\n", ENV_CGROUP_PATH_NAME, ENV_FS_PATH_NAME, argv[0]);
+		return 1;
+	}
+	path_nb = parse_path(env_path_allowed, &sb_paths);
+	cgroup_nb = parse_path(env_path_cgroup, &cg_paths);
+	cmd_path = argv[1];
+	cmd_argv = argv + 1;
+	if (apply_sandbox(sb_paths, path_nb, cg_paths, cgroup_nb))
+		return 1;
+	execve(cmd_path, cmd_argv, envp);
+	perror("execve");
+	return 1;
+}
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
  2016-08-25 10:32 [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (9 preceding siblings ...)
  2016-08-25 10:32 ` [RFC v2 10/10] samples/landlock: Add sandbox example Mickaël Salaün
@ 2016-08-25 11:05 ` Andy Lutomirski
  2016-08-25 13:57   ` Mickaël Salaün
  2016-08-27  7:40 ` Andy Lutomirski
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-25 11:05 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Will Drewry, kernel-hardening,
	Linux API, LSM List, Network Development

On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote:
> Hi,
>
> This series is a proof of concept to fill some missing part of seccomp as the
> ability to check syscall argument pointers or creating more dynamic security
> policies. The goal of this new stackable Linux Security Module (LSM) called
> Landlock is to allow any process, including unprivileged ones, to create
> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
> OpenBSD Pledge. This kind of sandbox help to mitigate the security impact of
> bugs or unexpected/malicious behaviors in userland applications.
>

Maybe I'm missing an obvious description, but: do you have a
description of the eBPF API to landlock?  What function do you
provide, when is it called, what functions can it call, what does the
fancy new arraymap do, etc?

--Andy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups
  2016-08-25 10:32 ` [RFC v2 09/10] landlock: Handle cgroups Mickaël Salaün
@ 2016-08-25 11:09   ` Andy Lutomirski
  2016-08-25 14:44     ` Mickaël Salaün
  2016-08-26  2:14   ` Alexei Starovoitov
  1 sibling, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-25 11:09 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Will Drewry, kernel-hardening,
	Linux API, LSM List, Network Development

On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote:
> Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op)
> to compare the current process cgroup with a cgroup handle, The handle
> can match the current cgroup if it is the same or a child. This allows
> to make conditional rules according to the current cgroup.
>
> A cgroup handle is a map entry created from a file descriptor referring
> a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the
> map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the
> inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP.

Can you elaborate on why this is useful?  I.e. why not just supply
different policies to different subtrees.

Also, how does this interact with the current cgroup v1 vs v2 mess?
As far as I can tell, no one can even really agree on what "what
cgroup am I in" means right now.

>
> An unprivileged process can create and manipulate cgroups thanks to
> cgroup delegation.

What is cgroup delegation?

--Andy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 08/10] landlock: Handle file system comparisons
  2016-08-25 10:32 ` [RFC v2 08/10] landlock: Handle file system comparisons Mickaël Salaün
@ 2016-08-25 11:12   ` Andy Lutomirski
  2016-08-25 14:10     ` Mickaël Salaün
  0 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-25 11:12 UTC (permalink / raw)
  To: Mickaël Salaün, Eric W. Biederman
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Will Drewry, kernel-hardening,
	Linux API, LSM List, Network Development

On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote:
> Add eBPF functions to compare file system access with a Landlock file
> system handle:
> * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
>   This function allows to compare the dentry, inode, device or mount
>   point of the currently accessed file, with a reference handle.
> * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
>   This function allows an eBPF program to check if the current accessed
>   file is the same or in the hierarchy of a reference handle.
>
> The goal of file system handle is to abstract kernel objects such as a
> struct file or a struct inode. Userland can create this kind of handle
> thanks to the BPF_MAP_UPDATE_ELEM command. The element is a struct
> landlock_handle containing the handle type (e.g.
> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. This could
> also be any descriptions able to match a struct file or a struct inode
> (e.g. path or glob string).

This needs Eric's opinion.

Also, where do all the struct file *'s get stashed?  Are they
preserved in the arraymap?  What prevents reference cycles or absurdly
large numbers of struct files getting pinned?

--Andy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 07/10] landlock: Add errno check
  2016-08-25 10:32 ` [RFC v2 07/10] landlock: Add errno check Mickaël Salaün
@ 2016-08-25 11:13   ` Andy Lutomirski
  0 siblings, 0 replies; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-25 11:13 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Will Drewry, kernel-hardening,
	Linux API, LSM List, Network Development

On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote:
> Add a max errno value.
>
> This is not strictly needed but should improve reliability.
>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Cc: James Morris <james.l.morris@oracle.com>
> Cc: Kees Cook <keescook@chromium.org>
> ---
>  include/uapi/asm-generic/errno-base.h | 1 +
>  security/landlock/lsm.c               | 6 +++---
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/include/uapi/asm-generic/errno-base.h b/include/uapi/asm-generic/errno-base.h
> index 65115978510f..43407a403e72 100644
> --- a/include/uapi/asm-generic/errno-base.h
> +++ b/include/uapi/asm-generic/errno-base.h
> @@ -35,5 +35,6 @@
>  #define        EPIPE           32      /* Broken pipe */
>  #define        EDOM            33      /* Math argument out of domain of func */
>  #define        ERANGE          34      /* Math result not representable */
> +#define        _ERRNO_LAST     ERANGE

At the very least this needs a more sensible name.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
  2016-08-25 11:05 ` [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Andy Lutomirski
@ 2016-08-25 13:57   ` Mickaël Salaün
  0 siblings, 0 replies; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-25 13:57 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development


On 25/08/2016 13:05, Andy Lutomirski wrote:
> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote:
>> Hi,
>>
>> This series is a proof of concept to fill some missing part of seccomp as the
>> ability to check syscall argument pointers or creating more dynamic security
>> policies. The goal of this new stackable Linux Security Module (LSM) called
>> Landlock is to allow any process, including unprivileged ones, to create
>> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
>> OpenBSD Pledge. This kind of sandbox help to mitigate the security impact of
>> bugs or unexpected/malicious behaviors in userland applications.
>>
> 
> Maybe I'm missing an obvious description, but: do you have a
> description of the eBPF API to landlock?  What function do you
> provide, when is it called, what functions can it call, what does the
> fancy new arraymap do, etc?
> 
> --Andy
> 

The eBPF context is described in "[RFC v2 06/10] landlock: Add LSM hooks".

The provided eBPF functions are described in "[RFC v2 08/10] landlock:
Handle file system comparisons"
(bpf_landlock_cmp_fs_prop_with_struct_file and
bpf_landlock_cmp_fs_beneath_with_struct_file) and "[RFC v2 09/10]
landlock: Handle cgroups" (bpf_landlock_cmp_cgroup_beneath). The
function descriptions are summarized in include/uapi/linux/bpf.h.

These functions can be called by an eBPF program of type
BPF_PROG_TYPE_LANDLOCK_FILE_OPEN, BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION
and BPF_PROG_TYPE_LANDLOCK_MMAP_FILE as described in "[RFC v2 06/10]
landlock: Add LSM hooks".

I tried to split the commits as much as possible to ease the review. The
"[RFC v2 10/10] samples/landlock: Add sandbox example" may help to see
the whole picture.
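
To make that picture more concrete, here is a rough plain C model of the
decision taken by the eBPF program that the sample assembles from
hook_pre, hook_path, hook_cgroup and hook_post. sample_decision() and its
parameters are hypothetical names used only for illustration. It assumes
that the fs checker returns a positive value when the file is not beneath
any handle, and it treats every negative value other than -ENOENT as an
error, which is what the "deny by default, if any error" comment in the
sample asks for:

```c
#include <assert.h>
#include <errno.h>

#define SAMPLE_COOKIE 5	/* value passed with SECCOMP_RET_LANDLOCK | 5 */

/*
 * Plain C model of the sample policy (not the eBPF code itself).
 *
 * fs_ret: result of the fs "beneath" checker: 0 if the file is beneath an
 *         allowed path, positive if not (assumption), negative on error.
 *         Ignored if have_paths is 0.
 * cg_ret: result of bpf_landlock_cmp_cgroup_beneath(): 0 if the current
 *         process is in a listed cgroup, 1 if not, negative on error.
 *         Ignored if have_cgroups is 0.
 *
 * Returns 0 to allow the access or an errno value to deny it.
 */
int sample_decision(int cookie, int have_paths, long fs_ret,
		    int have_cgroups, long cg_ret)
{
	if (cookie != SAMPLE_COOKIE)
		return 0;		/* hook_pre: not our filter */
	if (have_paths) {
		if (fs_ret == 0)
			return 0;	/* beneath an allowed path */
		if (fs_ret == -ENOENT)
			return 0;	/* anonymous mapping */
		if (fs_ret < 0)
			return EACCES;	/* any other error: deny */
	}
	if (have_cgroups) {
		if (cg_ret != 1)
			return EACCES;	/* in a listed cgroup, or error */
		return 0;		/* outside: policy not enforced */
	}
	return EACCES;	/* no cgroup given: enforce for every process */
}
```

With the example from the sample commit message, an access to /home gives
a positive fs_ret: it is allowed while the shell is outside
/sys/fs/cgroup/sandboxed (cg_ret == 1) and denied with EACCES once the
shell has joined it (cg_ret == 0).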

Hope this helps,
 Mickaël


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 08/10] landlock: Handle file system comparisons
  2016-08-25 11:12   ` Andy Lutomirski
@ 2016-08-25 14:10     ` Mickaël Salaün
  2016-08-26 14:57       ` Andy Lutomirski
  0 siblings, 1 reply; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-25 14:10 UTC (permalink / raw)
  To: Andy Lutomirski, Eric W. Biederman
  Cc: LKML, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Will Drewry, kernel-hardening,
	Linux API, LSM List, Network Development


On 25/08/2016 13:12, Andy Lutomirski wrote:
> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote:
>> Add eBPF functions to compare file system access with a Landlock file
>> system handle:
>> * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
>>   This function allows to compare the dentry, inode, device or mount
>>   point of the currently accessed file, with a reference handle.
>> * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
>>   This function allows an eBPF program to check if the current accessed
>>   file is the same or in the hierarchy of a reference handle.
>>
>> The goal of file system handle is to abstract kernel objects such as a
>> struct file or a struct inode. Userland can create this kind of handle
>> thanks to the BPF_MAP_UPDATE_ELEM command. The element is a struct
>> landlock_handle containing the handle type (e.g.
>> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. This could
>> also be any descriptions able to match a struct file or a struct inode
>> (e.g. path or glob string).
> 
> This needs Eric's opinion.
> 
> Also, where do all the struct file *'s get stashed?  Are they
> preserved in the arraymap?  What prevents reference cycles or absurdly
> large numbers of struct files getting pinned?

Yes, the struct files are kept in the arraymap and dropped when there are
no more references to them. Currently, the limitations are the maximum
number of open file descriptors referring to an arraymap and the maximum
number of eBPF Landlock programs loaded in a process
(LANDLOCK_PROG_LIST_MAX_PAGES in kernel/seccomp.c).
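
A toy model of that lifetime rule, in plain C rather than kernel code:
struct obj stands in for a struct file, and the names obj_get(),
obj_put(), map_update() and map_free() are made up for this sketch. It
only shows that a map entry holds its own reference, so userland can close
its file descriptor right after BPF_MAP_UPDATE_ELEM (as the sample does)
and the object is released when the map drops its entries:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a refcounted kernel object such as a struct file. */
struct obj {
	int refs;	/* number of holders */
	int released;	/* set once the last reference is dropped */
};

void obj_get(struct obj *o)
{
	o->refs++;
}

void obj_put(struct obj *o)
{
	assert(o->refs > 0);
	if (--o->refs == 0)
		o->released = 1;
}

#define MAP_MAX 4

/* Toy array map: each non-NULL slot owns one reference. */
struct handle_map {
	struct obj *slot[MAP_MAX];
};

/*
 * Store o in an empty slot and take a reference on it. This toy only
 * fills empty slots. Returns 0 on success, -1 otherwise.
 */
int map_update(struct handle_map *m, size_t idx, struct obj *o)
{
	if (idx >= MAP_MAX || m->slot[idx])
		return -1;
	obj_get(o);
	m->slot[idx] = o;
	return 0;
}

/* Drop every reference held by the map. */
void map_free(struct handle_map *m)
{
	size_t i;

	for (i = 0; i < MAP_MAX; i++) {
		if (m->slot[i]) {
			obj_put(m->slot[i]);
			m->slot[i] = NULL;
		}
	}
}
```

The model has no limit other than MAP_MAX, which is the gap discussed
here: nothing bounds how many objects a set of maps can pin.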

What kind of reference cycles do you have in mind?

It probably needs another limit for kernel object references as well.
What is the best option here? Add another static limitation or use an
existing one?

 Mickaël


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups
  2016-08-25 11:09   ` Andy Lutomirski
@ 2016-08-25 14:44     ` Mickaël Salaün
  2016-08-26 12:55       ` Tejun Heo
  2016-08-26 14:20       ` Andy Lutomirski
  0 siblings, 2 replies; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-25 14:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development, cgroups




On 25/08/2016 13:09, Andy Lutomirski wrote:
> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote:
>> Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op)
>> to compare the current process cgroup with a cgroup handle, The handle
>> can match the current cgroup if it is the same or a child. This allows
>> to make conditional rules according to the current cgroup.
>>
>> A cgroup handle is a map entry created from a file descriptor referring
>> a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the
>> map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the
>> inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP.
> 
> Can you elaborate on why this is useful?  I.e. why not just supply
> different policies to different subtrees.

The main use case I see is to load the security policies at the start of
a user session for all processes but not enforce them right away. The
user can then keep a shell for Landlock administration tasks and lock
the other processes with a dedicated cgroup on the fly. This allows the
user to make unremovable Landlock security policies but only activate
them when needed for specific processes.

> 
> Also, how does this interact with the current cgroup v1 vs v2 mess?
> As far as I can tell, no one can even really agree on what "what
> cgroup am I in" means right now.

I tested with cgroup-v2 but indeed, it seems a bit different with
cgroup-v1 :)
Does anyone know how to handle both cases?

> 
>>
>> An unprivileged process can create and manipulate cgroups thanks to
>> cgroup delegation.
> 
> What is cgroup delegation?

This is simply the act of changing the owner of the cgroup files (under
/sys/fs/cgroup) to allow an unprivileged user to manage them (cf.
Documentation/cgroup-v2.txt).
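
To make the delegation step concrete, here is a minimal user-space
sketch. It assumes, following Documentation/cgroup-v2.txt, that handing
over the directory and its "cgroup.procs" file is what delegation
requires; delegate_cgroup() is a made-up name for illustration, not an
existing API.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * delegate_cgroup() is a hypothetical helper, not an existing API.
 * It gives a user control over one cgroup v2 directory by changing the
 * owner of the directory and of its "cgroup.procs" file.
 * Returns 0 on success, -1 on error.
 */
static int delegate_cgroup(const char *cgroup_dir, uid_t uid, gid_t gid)
{
	char procs[4096];
	struct stat st;
	int len;

	/* Only a directory can be a cgroup. */
	if (stat(cgroup_dir, &st) != 0 || !S_ISDIR(st.st_mode))
		return -1;

	len = snprintf(procs, sizeof(procs), "%s/cgroup.procs", cgroup_dir);
	if (len < 0 || (size_t)len >= sizeof(procs))
		return -1;	/* path does not fit in the buffer */

	/* Owning the directory is needed to create sub-cgroups ... */
	if (chown(cgroup_dir, uid, gid) != 0)
		return -1;
	/* ... and owning cgroup.procs is needed to move processes into it. */
	if (chown(procs, uid, gid) != 0)
		return -1;
	return 0;
}
```

A privileged manager would call this once per user session, e.g. on a
freshly created directory below the cgroup v2 mount point.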

 Mickaël



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups
  2016-08-25 10:32 ` [RFC v2 09/10] landlock: Handle cgroups Mickaël Salaün
  2016-08-25 11:09   ` Andy Lutomirski
@ 2016-08-26  2:14   ` Alexei Starovoitov
  2016-08-26 15:10     ` Mickaël Salaün
  1 sibling, 1 reply; 66+ messages in thread
From: Alexei Starovoitov @ 2016-08-26  2:14 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Daniel Borkmann, Daniel Mack, David S . Miller, Elena Reshetova,
	Kees Cook, Sargun Dhillon, kernel-hardening, linux-api,
	linux-security-module, netdev, anaravaram, tj

On Thu, Aug 25, 2016 at 12:32:44PM +0200, Mickaël Salaün wrote:
> Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op)
> to compare the current process cgroup with a cgroup handle, The handle
> can match the current cgroup if it is the same or a child. This allows
> to make conditional rules according to the current cgroup.
> 
> A cgroup handle is a map entry created from a file descriptor referring
> a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the
> map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the
> inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP.
> 
> An unprivileged process can create and manipulate cgroups thanks to
> cgroup delegation.
> 
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
...
> +static inline u64 bpf_landlock_cmp_cgroup_beneath(u64 r1_option, u64 r2_map,
> +		u64 r3_map_op, u64 r4, u64 r5)
> +{
> +	u8 option = (u8) r1_option;
> +	struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
> +	enum bpf_map_array_op map_op = r3_map_op;
> +	struct bpf_array *array = container_of(map, struct bpf_array, map);
> +	struct cgroup *cg1, *cg2;
> +	struct map_landlock_handle *handle;
> +	int i;
> +
> +	/* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP is an arraymap */
> +	if (unlikely(!map)) {
> +		WARN_ON(1);
> +		return -EFAULT;
> +	}
> +	if (unlikely((option | _LANDLOCK_FLAG_OPT_MASK) != _LANDLOCK_FLAG_OPT_MASK))
> +		return -EINVAL;
> +
> +	/* for now, only handle OP_OR */
> +	switch (map_op) {
> +	case BPF_MAP_ARRAY_OP_OR:
> +		break;
> +	case BPF_MAP_ARRAY_OP_UNSPEC:
> +	case BPF_MAP_ARRAY_OP_AND:
> +	case BPF_MAP_ARRAY_OP_XOR:
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	synchronize_rcu();
> +
> +	for (i = 0; i < array->n_entries; i++) {
> +		handle = (struct map_landlock_handle *)
> +				(array->value + array->elem_size * i);
> +
> +		/* protected by the proto types, should not happen */
> +		if (unlikely(handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD)) {
> +			WARN_ON(1);
> +			return -EFAULT;
> +		}
> +		if (unlikely(!handle->css)) {
> +			WARN_ON(1);
> +			return -EFAULT;
> +		}
> +
> +		if (option & LANDLOCK_FLAG_OPT_REVERSE) {
> +			cg1 = handle->css->cgroup;
> +			cg2 = task_css_set(current)->dfl_cgrp;
> +		} else {
> +			cg1 = task_css_set(current)->dfl_cgrp;
> +			cg2 = handle->css->cgroup;
> +		}
> +
> +		if (cgroup_is_descendant(cg1, cg2))
> +			return 0;
> +	}
> +	return 1;
> +}

- please take a look at the existing bpf_current_task_under_cgroup and
reuse BPF_MAP_TYPE_CGROUP_ARRAY as a minimum. Adding a new cgroup array
is nothing but duplication of code.

- I don't think such 'for' loop can scale. The solution needs to work
with thousands of containers and thousands of cgroups.
In the patch 06/10 the proposal is to use 'current' as holder of
the programs:
+   for (prog = current->seccomp.landlock_prog;
+                   prog; prog = prog->prev) {
+           if (prog->filter == landlock_ret->filter) {
+                   cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
+                   break;
+           }
+   }
imo that's the root of the scalability issue.
I think to be able to scale, the bpf programs have to be attached to
cgroups instead of tasks.
That would be a very different api. seccomp doesn't need to be touched.
But that is the only way I see to be able to scale.
Maybe another way of thinking about it is the 'lsm cgroup controller'
that Sargun is proposing.
The lsm hooks will provide stable execution points and the programs
will be called like:
prog = task_css_set(current)->dfl_cgrp->bpf.prog_effective[lsm_hook_id];
BPF_PROG_RUN(prog, ctx);
The delegation functionality and 'prog_effective' logic that
Daniel Mack is proposing will be fully reused here.
External container management software will be able to apply bpf
programs to control tasks under cgroup and such
bpf_landlock_cmp_cgroup_beneath() helper won't be necessary.
The user will be able to register different programs for different lsm hooks.
If I understand the patch 6/10 correctly, there is one (or a list) prog for
all lsm hooks per task which is not flexible enough.
Anoop Naravaram's use case is to control the ports the applications
under a cgroup can bind and listen on.
This use case can be solved by such an 'lsm cgroup controller' by
attaching a bpf program to the security_socket_bind lsm hook and
filtering the sockaddr.
Furthermore, Sargun's use case is to allow further sockaddr rewrites
from the bpf program, which can be done as a natural extension
of such a mechanism.
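
As a rough illustration of the per-cgroup dispatch described above, here
is a small user-space model. Everything in it is an assumption made for
the sketch: struct cgroup_model, the hook ids and the rule that a cgroup
without its own program inherits the effective program of its parent are
not taken from any of the posted patches.

```c
#include <errno.h>
#include <stddef.h>

enum lsm_hook_id { HOOK_FILE_OPEN, HOOK_SOCKET_BIND, HOOK_MAX };

/* Stand-in for a loaded bpf program: returns 0 to allow, -errno to deny. */
typedef int (*hook_prog_t)(const void *ctx);

struct cgroup_model {
	struct cgroup_model *parent;
	hook_prog_t prog[HOOK_MAX];		/* attached directly to this cgroup */
	hook_prog_t prog_effective[HOOK_MAX];	/* own program, else inherited */
};

/* Example program that denies everything it is asked about. */
static int deny_prog(const void *ctx)
{
	(void)ctx;
	return -EPERM;
}

/* Recompute the effective programs; the parent must be updated first. */
static void update_effective(struct cgroup_model *cg)
{
	int hook;

	for (hook = 0; hook < HOOK_MAX; hook++) {
		if (cg->prog[hook])
			cg->prog_effective[hook] = cg->prog[hook];
		else if (cg->parent)
			cg->prog_effective[hook] = cg->parent->prog_effective[hook];
		else
			cg->prog_effective[hook] = NULL;
	}
}

/* One array lookup per hook, independent of the number of cgroups. */
static int run_lsm_hook(const struct cgroup_model *cg, enum lsm_hook_id hook,
			const void *ctx)
{
	hook_prog_t prog = cg->prog_effective[hook];

	return prog ? prog(ctx) : 0;
}
```

The point of the model is that the cost at hook time is a single indexed
load, while the inheritance work is done once when a program is attached.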

If I understood Daniel's, Anoop's, Sargun's and your use cases
correctly, the common piece of kernel infrastructure that can solve
them all can start from Daniel's current set of patches that
establish a mechanism for attaching a bpf program to a cgroup.
Then add lsm hooks to it and later allow argument rewrites
(since the arguments are already in the kernel, no ToCToU problems exist).

As for the safety and type checking that bpf programs have to go
through, I like the approach of patch 06/10:
+LANDLOCK_HOOK2(file_open, FILE_OPEN,
+       PTR_TO_STRUCT_FILE, struct file *, file,
+       PTR_TO_STRUCT_CRED, const struct cred *, cred
+)
teaching the verifier to recognize struct file, cred, sockaddr
will let bpf programs access them naturally without any overhead.
Though:
@@ -102,6 +102,9 @@ enum bpf_prog_type {
        BPF_PROG_TYPE_SCHED_CLS,
        BPF_PROG_TYPE_SCHED_ACT,
        BPF_PROG_TYPE_TRACEPOINT,
+       BPF_PROG_TYPE_LANDLOCK_FILE_OPEN,
+       BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION,
+       BPF_PROG_TYPE_LANDLOCK_MMAP_FILE,
 };
is a bit of overkill.
I think it would be cleaner to have a single
BPF_PROG_TYPE_LSM and pass lsm_hook_id as well at program load time,
so that the verifier can do safety checks based on the type info
provided in the LANDLOCK_HOOKs.
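
A sketch of what "one program type plus a hook id" could look like from
the verifier's side is below. It is only a model: the table layout,
check_ctx_arg() and the socket_bind entry with PTR_TO_SOCKADDR are
assumptions for illustration; only the file_open argument types come
from the LANDLOCK_HOOK2() declaration quoted above.

```c
#include <errno.h>

#define HOOK_MAX_ARGS 2

enum ctx_arg_type {
	ARG_UNUSED = 0,
	PTR_TO_STRUCT_FILE,
	PTR_TO_STRUCT_CRED,
	PTR_TO_SOCKADDR,	/* hypothetical, for a socket_bind hook */
};

enum lsm_hook_id {
	LSM_HOOK_FILE_OPEN,
	LSM_HOOK_SOCKET_BIND,	/* hypothetical second hook */
	LSM_HOOK_MAX,
};

/* Per-hook argument types, indexed by the hook id given at load time. */
static const enum ctx_arg_type hook_args[LSM_HOOK_MAX][HOOK_MAX_ARGS] = {
	[LSM_HOOK_FILE_OPEN]   = { PTR_TO_STRUCT_FILE, PTR_TO_STRUCT_CRED },
	[LSM_HOOK_SOCKET_BIND] = { PTR_TO_SOCKADDR, ARG_UNUSED },
};

/*
 * Check that argument 'arg' of hook 'hook_id' has the type the program
 * wants to use it as. Returns 0 if it matches, -EACCES on a type
 * mismatch and -EINVAL for an unknown hook or argument index.
 */
static int check_ctx_arg(unsigned int hook_id, unsigned int arg,
			 enum ctx_arg_type wanted)
{
	if (hook_id >= LSM_HOOK_MAX || arg >= HOOK_MAX_ARGS)
		return -EINVAL;
	if (wanted == ARG_UNUSED || hook_args[hook_id][arg] != wanted)
		return -EACCES;
	return 0;
}
```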

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups
  2016-08-25 14:44     ` Mickaël Salaün
@ 2016-08-26 12:55       ` Tejun Heo
  2016-08-26 14:20       ` Andy Lutomirski
  1 sibling, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2016-08-26 12:55 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Andy Lutomirski, LKML, Alexei Starovoitov, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Will Drewry,
	Kernel Hardening, Linux API, LSM List, Network Development,
	cgroups

Hello,

On Thu, Aug 25, 2016 at 04:44:13PM +0200, Mickaël Salaün wrote:
> I tested with cgroup-v2 but indeed, it seems a bit different with
> cgroup-v1 :)
> Does anyone know how to handle both cases?

If you wanna do a cgroup membership test, just do a cgroup v2 membership
test.  No need to introduce a new controller and possibly a struct sock
association field for that.  That's what all new cgroup-aware network
operations are using anyway, and it doesn't conflict with whether other
controllers are v1 or v2.

For an example of using the cgroup v2 membership test, please take a
look at cgroup_mt_v1().
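
The membership test itself boils down to an ancestor check. A tiny
user-space model of that idea is sketched below; struct cgroup_node and
cgroup_node_is_descendant() are invented names that only mimic what the
kernel's cgroup_is_descendant() answers, not how it is implemented.

```c
#include <stdbool.h>
#include <stddef.h>

struct cgroup_node {
	const char *name;
	struct cgroup_node *parent;	/* NULL for the root of the hierarchy */
};

/* True if 'cgrp' is 'ancestor' itself or lives somewhere below it. */
static bool cgroup_node_is_descendant(const struct cgroup_node *cgrp,
				      const struct cgroup_node *ancestor)
{
	for (; cgrp; cgrp = cgrp->parent) {
		if (cgrp == ancestor)
			return true;
	}
	return false;
}
```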

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups
  2016-08-25 14:44     ` Mickaël Salaün
  2016-08-26 12:55       ` Tejun Heo
@ 2016-08-26 14:20       ` Andy Lutomirski
  2016-08-26 15:50         ` Tejun Heo
  1 sibling, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-26 14:20 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: LKML, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development,
	open list:CONTROL GROUP (CGROUP)

On Thu, Aug 25, 2016 at 7:44 AM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On 25/08/2016 13:09, Andy Lutomirski wrote:
>> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>> Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op)
>>> to compare the current process cgroup with a cgroup handle, The handle
>>> can match the current cgroup if it is the same or a child. This allows
>>> to make conditional rules according to the current cgroup.
>>>
>>> A cgroup handle is a map entry created from a file descriptor referring
>>> a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the
>>> map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the
>>> inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP.
>>
>> Can you elaborate on why this is useful?  I.e. why not just supply
>> different policies to different subtrees.
>
> The main use case I see is to load the security policies at the start of
> a user session for all processes but not enforce them right away. The
> user can then keep a shell for Landlock administration tasks and lock
> the other processes with a dedicated cgroup on the fly. This allows the
> user to make unremovable Landlock security policies but only activate
> them when needed for specific processes.

This seems like a bit of a dubious use case to me.  The landlock
mechanism should be flexible enough to do this kind of thing even
without cgroups, and "spawn a process, wait a while, and then confine
it by fiddling with cgroups" seems a lot dicier than just loading the
right policy in the first place, especially since eBPF policies can be
stateful.

>
>>
>> Also, how does this interact with the current cgroup v1 vs v2 mess?
>> As far as I can tell, no one can even really agree on what "what
>> cgroup am I in" means right now.
>
> I tested with cgroup-v2 but indeed, it seems a bit different with
> cgroup-v1 :)
> Does anyone know how to handle both cases?
>
>>
>>>
>>> An unprivileged process can create and manipulate cgroups thanks to
>>> cgroup delegation.
>>
>> What is cgroup delegation?
>
> This is simply the action of changing the owner of cgroup sysfs files to
> allow an unprivileged user to handle them (cf. Documentation/cgroup-v2.txt)

As far as I can tell, Tejun and systemd both actively discourage doing
this.  Maybe I misunderstand.  But in any event, the admin giving you
a cgroup hierarchy you can use for this means that the admin has to
cooperate with your policy, and it further requires (with cgroup v2 or
similar, which is most likely the future) that your lockdown policy be
compatible with your resource control policy.

I would suggest dropping this lockdown feature until a use case
emerges that really can't be addressed adequately without it.

--Andy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 08/10] landlock: Handle file system comparisons
  2016-08-25 14:10     ` Mickaël Salaün
@ 2016-08-26 14:57       ` Andy Lutomirski
  2016-08-27 13:45         ` Mickaël Salaün
  0 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-26 14:57 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Eric W. Biederman, LKML, Alexei Starovoitov, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Will Drewry,
	kernel-hardening, Linux API, LSM List, Network Development

On Thu, Aug 25, 2016 at 7:10 AM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On 25/08/2016 13:12, Andy Lutomirski wrote:
>> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>> Add eBPF functions to compare file system access with a Landlock file
>>> system handle:
>>> * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
>>>   This function allows to compare the dentry, inode, device or mount
>>>   point of the currently accessed file, with a reference handle.
>>> * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
>>>   This function allows an eBPF program to check if the current accessed
>>>   file is the same or in the hierarchy of a reference handle.
>>>
>>> The goal of file system handle is to abstract kernel objects such as a
>>> struct file or a struct inode. Userland can create this kind of handle
>>> thanks to the BPF_MAP_UPDATE_ELEM command. The element is a struct
>>> landlock_handle containing the handle type (e.g.
>>> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. This could
>>> also be any descriptions able to match a struct file or a struct inode
>>> (e.g. path or glob string).
>>
>> This needs Eric's opinion.
>>
>> Also, where do all the struct file *'s get stashed?  Are they
>> preserved in the arraymap?  What prevents reference cycles or absurdly
>> large numbers of struct files getting pinned?
>
> Yes, the struct file are kept in the arraymap and dropped when there is
> no more reference on them. Currently, the limitations are the maximum
> number of open file descriptors referring to an arraymap and the maximum
> number of eBPF Landlock programs loaded in a process
> (LANDLOCK_PROG_LIST_MAX_PAGES in kernel/seccomp.c).
>
> What kind of reference cycles have you in mind?

Shoving evil things into the arraymaps, e.g. unix sockets with
SCM_RIGHTS messages pending, eBPF program references, the arraymap fd
itself, another arraymap fd, etc.

>
> It probably needs another limit for kernel object references as well.
> What is the best option here? Add another static limitation or use an
> existing one?

Dunno.  If RLIMIT_NOFILE could be made to work, that would be nice.
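
Purely as a sketch of the idea, charging pinned files against the
open-file limit (RLIMIT_NOFILE) could look like the user-space check
below. may_pin_file() and the notion of a per-process "already pinned"
counter are assumptions made for illustration; nothing like this exists
in the posted patches.

```c
#include <sys/resource.h>

/*
 * Hypothetical accounting check: may one more struct file be pinned,
 * given how many are already pinned on behalf of this process?
 * Returns 1 if allowed, 0 if the soft limit is reached, -1 on error.
 */
static int may_pin_file(unsigned long already_pinned)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -1;
	if (rl.rlim_cur == RLIM_INFINITY)
		return 1;
	return already_pinned < rl.rlim_cur ? 1 : 0;
}
```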

--Andy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups
  2016-08-26  2:14   ` Alexei Starovoitov
@ 2016-08-26 15:10     ` Mickaël Salaün
  2016-08-26 23:05       ` Alexei Starovoitov
  0 siblings, 1 reply; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-26 15:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Daniel Borkmann, Daniel Mack, David S . Miller, Elena Reshetova,
	Kees Cook, Sargun Dhillon, kernel-hardening, linux-api,
	linux-security-module, netdev, Anoop Naravaram, Tejun Heo,
	cgroups, corbet, lizefan, hannes, davem, kuznet, James Morris,
	yoshfuji, kaber, edumazet, maheshb, weiwan, tom, Will Drewry




On 26/08/2016 04:14, Alexei Starovoitov wrote:
> On Thu, Aug 25, 2016 at 12:32:44PM +0200, Mickaël Salaün wrote:
>> Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op)
>> to compare the current process cgroup with a cgroup handle, The handle
>> can match the current cgroup if it is the same or a child. This allows
>> to make conditional rules according to the current cgroup.
>>
>> A cgroup handle is a map entry created from a file descriptor referring
>> a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the
>> map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the
>> inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP.
>>
>> An unprivileged process can create and manipulate cgroups thanks to
>> cgroup delegation.
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ...
>> +static inline u64 bpf_landlock_cmp_cgroup_beneath(u64 r1_option, u64 r2_map,
>> +		u64 r3_map_op, u64 r4, u64 r5)
>> +{
>> +	u8 option = (u8) r1_option;
>> +	struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
>> +	enum bpf_map_array_op map_op = r3_map_op;
>> +	struct bpf_array *array = container_of(map, struct bpf_array, map);
>> +	struct cgroup *cg1, *cg2;
>> +	struct map_landlock_handle *handle;
>> +	int i;
>> +
>> +	/* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP is an arraymap */
>> +	if (unlikely(!map)) {
>> +		WARN_ON(1);
>> +		return -EFAULT;
>> +	}
>> +	if (unlikely((option | _LANDLOCK_FLAG_OPT_MASK) != _LANDLOCK_FLAG_OPT_MASK))
>> +		return -EINVAL;
>> +
>> +	/* for now, only handle OP_OR */
>> +	switch (map_op) {
>> +	case BPF_MAP_ARRAY_OP_OR:
>> +		break;
>> +	case BPF_MAP_ARRAY_OP_UNSPEC:
>> +	case BPF_MAP_ARRAY_OP_AND:
>> +	case BPF_MAP_ARRAY_OP_XOR:
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	synchronize_rcu();
>> +
>> +	for (i = 0; i < array->n_entries; i++) {
>> +		handle = (struct map_landlock_handle *)
>> +				(array->value + array->elem_size * i);
>> +
>> +		/* protected by the proto types, should not happen */
>> +		if (unlikely(handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD)) {
>> +			WARN_ON(1);
>> +			return -EFAULT;
>> +		}
>> +		if (unlikely(!handle->css)) {
>> +			WARN_ON(1);
>> +			return -EFAULT;
>> +		}
>> +
>> +		if (option & LANDLOCK_FLAG_OPT_REVERSE) {
>> +			cg1 = handle->css->cgroup;
>> +			cg2 = task_css_set(current)->dfl_cgrp;
>> +		} else {
>> +			cg1 = task_css_set(current)->dfl_cgrp;
>> +			cg2 = handle->css->cgroup;
>> +		}
>> +
>> +		if (cgroup_is_descendant(cg1, cg2))
>> +			return 0;
>> +	}
>> +	return 1;
>> +}
> 
> - please take a loook at exisiting bpf_current_task_under_cgroup and
> reuse BPF_MAP_TYPE_CGROUP_ARRAY as a minimum. Doing new cgroup array
> is nothing but duplication of the code.

Oh, I didn't know about this patchset and the new helper. Indeed, it
looks a lot like mine except there is no static verification of the map
type as I did with the arraymap of handles, and no batch mode either. I
think the return value of bpf_current_task_under_cgroup is error-prone
if an eBPF program does an "if (ret)" test on the value (because of the
negative errno return value). Inverting the 0 and 1 return values should
fix this (0 == success, 1 == failure, <0 == error).
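
To spell out the pitfall, here is a small stand-alone sketch of how a
caller would interpret the two conventions. The function names are made
up for the example; the only thing taken from the helper is its
tri-state return value (1, 0 or a negative errno).

```c
#include <errno.h>
#include <stdbool.h>

/* Current convention: 1 = under the cgroup, 0 = not under, <0 = error. */

/* Naive caller: "if (ret)" also treats an error as a match. */
static bool match_naive(int ret)
{
	if (ret)
		return true;
	return false;
}

/* Careful caller: only a strictly positive value is a match. */
static bool match_strict(int ret)
{
	return ret > 0;
}

/* Inverted convention: 0 = match, 1 = no match, <0 = error. */

/* Here "if (ret)" fails closed: mismatch and error both deny. */
static bool deny_inverted(int ret)
{
	if (ret)
		return true;
	return false;
}
```

With the current convention the naive test grants a match on -EAGAIN,
whereas with the inverted one the same test denies on any error.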


To sum up, there are four related patchsets:
* "Landlock LSM: Unprivileged sandboxing" (this series)
* "Add Checmate, BPF-driven minor LSM" (Sargun Dhillon)
* "Networking cgroup controller" (Anoop Naravaram)
* "Add eBPF hooks for cgroups" (Daniel Mack)

The three other series (Sargun's, Anoop's and Daniel's) are mainly
focused on network access-control via cgroup for *containers*. As far as
I can tell, only a *root* user (CAP_SYS_ADMIN) can use them. Landlock's
goal is to empower all processes (privileged or not) to create their own
sandbox. This also means, as explained in "[RFC v2 00/10] Landlock
LSM: Unprivileged sandboxing", there are more constraints. For example,
it is not acceptable to let a process probe the kernel memory as it
wishes. More details are in the Landlock cover letter.


Another important point is that supporting cgroup for Landlock is
optional. Landlock does not rely on cgroup to be usable; cgroup support
is only a feature available when (unprivileged) users can manage their
own cgroup, which is an important constraint. Put another way, Landlock
should not rely on cgroup to create sandboxes. Indeed, a process
creating a sandbox does not necessarily have access to the cgroup mount
point (directly or not).


> 
> - I don't think such 'for' loop can scale. The solution needs to work
> with thousands of containers and thousands of cgroups.
> In the patch 06/10 the proposal is to use 'current' as holder of
> the programs:
> +   for (prog = current->seccomp.landlock_prog;
> +                   prog; prog = prog->prev) {
> +           if (prog->filter == landlock_ret->filter) {
> +                   cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
> +                   break;
> +           }
> +   }
> imo that's the root of scalability issue.
> I think to be able to scale the bpf programs have to be attached to
> cgroups instead of tasks.
> That would be very different api. seccomp doesn't need to be touched.
> But that is the only way I see to be able to scale.

Landlock is inspired by seccomp, which also uses a BPF program per
thread. For seccomp, each BPF program is executed for each syscall.
For Landlock, some BPF programs are executed for some LSM hooks. I don't
see why it is a scalability issue for Landlock compared to seccomp. I
also don't see why storing the BPF program list pointer in the cgroup
struct instead of the task struct changes a lot here. The BPF program
execution will be the same anyway (for each LSM hook). Kees should
probably have a better opinion on this.


> May be another way of thinking about it is 'lsm cgroup controller'
> that Sargun is proposing.
> The lsm hooks will provide stable execution points and the programs
> will be called like:
> prog = task_css_set(current)->dfl_cgrp->bpf.prog_effective[lsm_hook_id];
> BPF_PROG_RUN(prog, ctx);
> The delegation functionality and 'prog_effective' logic that
> Daniel Mack is proposing will be fully reused here.
> External container management software will be able to apply bpf
> programs to control tasks under cgroup and such
> bpf_landlock_cmp_cgroup_beneath() helper won't be necessary.
> The user will be able to register different programs for different lsm hooks.
> If I understand the patch 6/10 correctly, there is one (or a list) prog for
> all lsm hooks per task which is not flexible enough.

For each LSM hook triggered by a thread, all of its Landlock eBPF
programs (dedicated to this kind of hook) will be evaluated (if
needed). This is the same behavior as seccomp (list of BPF programs
attached to a process hierarchy) except the BPF programs are not
evaluated for syscalls but for LSM hooks. There is no way to make it more
fine-grained :)
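
For illustration, the evaluation can be modelled in user space as below.
This is a sketch under assumptions, not the code of patch 06/10: struct
prog_node, the hook_kind enum and the "first denial wins" combination
rule are invented for the example.

```c
#include <errno.h>
#include <stddef.h>

enum hook_kind { HOOK_FILE_OPEN, HOOK_FILE_PERMISSION, HOOK_MMAP_FILE };

struct prog_node {
	const struct prog_node *prev;	/* older program, possibly the parent's */
	enum hook_kind hook;		/* the only hook this program is for */
	int (*run)(const void *ctx);	/* 0 = allow, -errno = deny */
};

/* Two trivial stand-ins for loaded programs. */
static int allow_prog(const void *ctx)
{
	(void)ctx;
	return 0;
}

static int deny_prog(const void *ctx)
{
	(void)ctx;
	return -EACCES;
}

/*
 * Walk the list from the newest program to the oldest one and run only
 * the programs dedicated to 'hook'. The first denial ends the walk.
 */
static int run_hook_progs(const struct prog_node *newest, enum hook_kind hook,
			  const void *ctx)
{
	const struct prog_node *p;
	int ret;

	for (p = newest; p; p = p->prev) {
		if (p->hook != hook)
			continue;
		ret = p->run(ctx);
		if (ret)
			return ret;
	}
	return 0;
}
```

A child only prepends nodes to the list it inherited, so a program
loaded by a parent keeps applying to the whole process hierarchy.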


> Anoop Naravaram's use case is to control the ports the applications
> under cgroup can bind and listen on.
> Such use case can be solved by such 'lsm cgroup controller' by
> attaching bpf program to security_socket_bind lsm hook and
> filtering sockaddr.
> Furthermore Sargun's use case is to allow further sockaddr rewrites
> from the bpf program which can be done as natural extension
> of such mechanism.
> 
> If I understood Daniel's Anoop's Sargun's and yours use cases
> correctly the common piece of kernel infrastructure that can solve
> them all can start from Daniel's current set of patches that
> establish a mechanism of attaching bpf program to a cgroup.
> Then adding lsm hooks to it and later allowing argument rewrite
> (since they're already in the kernel and no ToCToU problems exist)

To sum up, the pieces we have in common are the use of eBPF and
(optionally for Landlock) its execution depending on the current cgroup.

Moreover, the three other series (Sargun's, Anoop's and Daniel's) do not
deal with unprivileged processes, which is the main purpose of Landlock.
I'm not sure that allowing sockaddr rewrites is a good idea here... Like
other LSMs, Landlock is dedicated to access-control.


For the network-related series, I think it makes more sense to simply
create a netfilter rule matching a cgroup and then add more features to
netfilter (restrict port ranges and so on) thanks to eBPF programs.
Containers are (usually) in a dedicated network namespace, which opens
the possibility to not only rely on cgroups (e.g. match UID,
netmask...). It would also be more flexible to be able to load a BPF
program in netfilter and update its maps on the fly to make dynamic
rules, like ipset does, but in a more generic way.


> 
> As far as safety and type checking that bpf programs has to do,
> I like the approach of patch 06/10:
> +LANDLOCK_HOOK2(file_open, FILE_OPEN,
> +       PTR_TO_STRUCT_FILE, struct file *, file,
> +       PTR_TO_STRUCT_CRED, const struct cred *, cred
> +)
> teaching verifier to recognize struct file, cred, sockaddr
> will let bpf program access them naturally without any overhead.
> Though:
> @@ -102,6 +102,9 @@ enum bpf_prog_type {
>         BPF_PROG_TYPE_SCHED_CLS,
>         BPF_PROG_TYPE_SCHED_ACT,
>         BPF_PROG_TYPE_TRACEPOINT,
> +       BPF_PROG_TYPE_LANDLOCK_FILE_OPEN,
> +       BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION,
> +       BPF_PROG_TYPE_LANDLOCK_MMAP_FILE,
>  };
> is a bit of overkill.
> I think it would be cleaner to have single
> BPF_PROG_TYPE_LSM and at program load time pass
> lsm_hook_id as well, so that verifier can do safety checks
> based on type info provided in LANDLOCK_HOOKs

I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF
verifier checks programs according to their types. If we need to check
specific context value types (e.g. PTR_TO_STRUCT_FILE), we need
dedicated program types. I don't see any other way to do it with the
current verifier code. Moreover, it's the purpose of program types, right?

Regards,
 Mickaël



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups
  2016-08-26 14:20       ` Andy Lutomirski
@ 2016-08-26 15:50         ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2016-08-26 15:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mickaël Salaün, LKML, Alexei Starovoitov,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova, James Morris,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Will Drewry, Kernel Hardening, Linux API, LSM List,
	Network Development, open list:CONTROL GROUP (CGROUP)

Hello,

On Fri, Aug 26, 2016 at 07:20:35AM -0700, Andy Lutomirski wrote:
> > This is simply the action of changing the owner of cgroup sysfs files to
> > allow an unprivileged user to handle them (cf. Documentation/cgroup-v2.txt)
> 
> As far as I can tell, Tejun and systemd both actively discourage doing
> this.  Maybe I misunderstand.  But in any event, the admin giving you

Please refer to "2-5. Delegation" of Documentation/cgroup-v2.txt.
Delegation on v1 is broken on both core and specific controller
behaviors and thus discouraged.  On v2, delegation should work just
fine.

I haven't looked in detail but in general I'm not too excited about
layering a security mechanism on top of cgroup.  Maybe it makes some
sense when security domains coincide with resource domains but at any
rate please keep me in the loop.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups
  2016-08-26 15:10     ` Mickaël Salaün
@ 2016-08-26 23:05       ` Alexei Starovoitov
  2016-08-27  7:30         ` Andy Lutomirski
                           ` (3 more replies)
  0 siblings, 4 replies; 66+ messages in thread
From: Alexei Starovoitov @ 2016-08-26 23:05 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Daniel Borkmann, Daniel Mack, David S . Miller, Kees Cook,
	Sargun Dhillon, kernel-hardening, linux-api,
	linux-security-module, netdev, Tejun Heo, cgroups

On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> 

trimming cc list again. When it's too big vger will consider it as spam.

> On 26/08/2016 04:14, Alexei Starovoitov wrote:
> > On Thu, Aug 25, 2016 at 12:32:44PM +0200, Mickaël Salaün wrote:
> >> Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op)
> >> to compare the current process cgroup with a cgroup handle, The handle
> >> can match the current cgroup if it is the same or a child. This allows
> >> to make conditional rules according to the current cgroup.
> >>
> >> A cgroup handle is a map entry created from a file descriptor referring
> >> a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the
> >> map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the
> >> inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP.
> >>
> >> An unprivileged process can create and manipulate cgroups thanks to
> >> cgroup delegation.
> >>
> >> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > ...
> >> +static inline u64 bpf_landlock_cmp_cgroup_beneath(u64 r1_option, u64 r2_map,
> >> +		u64 r3_map_op, u64 r4, u64 r5)
> >> +{
> >> +	u8 option = (u8) r1_option;
> >> +	struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
> >> +	enum bpf_map_array_op map_op = r3_map_op;
> >> +	struct bpf_array *array = container_of(map, struct bpf_array, map);
> >> +	struct cgroup *cg1, *cg2;
> >> +	struct map_landlock_handle *handle;
> >> +	int i;
> >> +
> >> +	/* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP is an arraymap */
> >> +	if (unlikely(!map)) {
> >> +		WARN_ON(1);
> >> +		return -EFAULT;
> >> +	}
> >> +	if (unlikely((option | _LANDLOCK_FLAG_OPT_MASK) != _LANDLOCK_FLAG_OPT_MASK))
> >> +		return -EINVAL;
> >> +
> >> +	/* for now, only handle OP_OR */
> >> +	switch (map_op) {
> >> +	case BPF_MAP_ARRAY_OP_OR:
> >> +		break;
> >> +	case BPF_MAP_ARRAY_OP_UNSPEC:
> >> +	case BPF_MAP_ARRAY_OP_AND:
> >> +	case BPF_MAP_ARRAY_OP_XOR:
> >> +	default:
> >> +		return -EINVAL;
> >> +	}
> >> +
> >> +	synchronize_rcu();
> >> +
> >> +	for (i = 0; i < array->n_entries; i++) {
> >> +		handle = (struct map_landlock_handle *)
> >> +				(array->value + array->elem_size * i);
> >> +
> >> +		/* protected by the proto types, should not happen */
> >> +		if (unlikely(handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD)) {
> >> +			WARN_ON(1);
> >> +			return -EFAULT;
> >> +		}
> >> +		if (unlikely(!handle->css)) {
> >> +			WARN_ON(1);
> >> +			return -EFAULT;
> >> +		}
> >> +
> >> +		if (option & LANDLOCK_FLAG_OPT_REVERSE) {
> >> +			cg1 = handle->css->cgroup;
> >> +			cg2 = task_css_set(current)->dfl_cgrp;
> >> +		} else {
> >> +			cg1 = task_css_set(current)->dfl_cgrp;
> >> +			cg2 = handle->css->cgroup;
> >> +		}
> >> +
> >> +		if (cgroup_is_descendant(cg1, cg2))
> >> +			return 0;
> >> +	}
> >> +	return 1;
> >> +}
> > 
> > - please take a look at the existing bpf_current_task_under_cgroup and
> > reuse BPF_MAP_TYPE_CGROUP_ARRAY as a minimum. Doing new cgroup array
> > is nothing but duplication of the code.
> 
> Oh, I didn't know about this patchset and the new helper. Indeed, it
> looks a lot like mine except there is no static verification of the map
> type as I did with the arraymap of handles, and no batch mode either. I
> think the return value of bpf_current_task_under_cgroup is error-prone
> if an eBPF program do an "if(ret)" test on the value (because of the
> negative ERRNO return value). Inverting the 0 and 1 return values should
> fix this (0 == succeed, 1 == failed, <0 == error).

nothing to fix. It's good as-is. Use if (ret > 0) instead.
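
A minimal userspace sketch of that calling convention, assuming a helper
that returns 1 when the task is under the cgroup, 0 when it is not, and a
negative errno on error. fake_under_cgroup() and allow_access() are
hypothetical stand-ins for illustration, not kernel API:

```c
#include <errno.h>

/*
 * Hypothetical stand-in for a helper with a tri-state return value:
 * 1 = task is under the cgroup, 0 = it is not, <0 = negative errno.
 */
static int fake_under_cgroup(int idx, int nr_entries, const int *under)
{
	if (idx < 0 || idx >= nr_entries)
		return -EINVAL;
	return under[idx] ? 1 : 0;
}

/* Grant access only on a positive match; errors are treated as "no". */
static int allow_access(int idx, int nr_entries, const int *under)
{
	int ret = fake_under_cgroup(idx, nr_entries, under);

	if (ret > 0)
		return 1;
	return 0;
}
```

With a plain "if (ret)" test the error case would be taken for a match,
which is why the comparison is written as "ret > 0" here.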

> 
> To sum up, there is four related patchsets:
> * "Landlock LSM: Unprivileged sandboxing" (this series)
> * "Add Checmate, BPF-driven minor LSM" (Sargun Dhillon)
> * "Networking cgroup controller" (Anoop Naravaram)
> * "Add eBPF hooks for cgroups" (Daniel Mack)
> 
> The three other series (Sargun's, Anoop's and Daniel's) are mainly
> focused on network access-control via cgroup for *containers*. As far as
> I can tell, only a *root* user (CAP_SYS_ADMIN) can use them. Landlock's
> goal is to empower all processes (privileged or not) to create their own
> sandbox. This also means, like explained in "[RFC v2 00/10] Landlock
> LSM: Unprivileged sandboxing", there is more constraints. For example,
> it is not acceptable to let a process probe the kernel memory as it
> wish. More details are in the Landlock cover-letter.
> 
> 
> Another important point is that supporting cgroup for Landlock is
> optional. It does not rely on cgroup to be usable but is only a feature
> available when (unprivileged) users can manage their own cgroup, which
> is an important constraint. Put another way, Landlock should not rely on
> cgroup to create sandboxes. Indeed, a process creating a sandbox do not
> necessarily have access to the cgroup mount point (directly or not).

cgroup is the common way to group multiple tasks.
Without cgroup only a parent<->child relationship will be possible,
which will limit the usability of such an lsm to a master task that
controls its children. Such an api restriction would have been ok if we
could extend it in the future, but unfortunately a task-centric design
won't allow it without creating a parallel lsm that is cgroup based.
Therefore I think we have to go with a cgroup-centric api, and your
application has to use cgroups from the start even though parent-child
would have been enough.
Also I don't think the kernel can afford two bpf based lsms, one task
based and another cgroup based, so we have to find common ground
that suits both use cases.
Having unprivileged access is a subset. There is no strong reason why
cgroup+lsm+bpf should always be limited to root only.
When we can guarantee no pointer leaks, we can allow unpriv.

> > 
> > - I don't think such 'for' loop can scale. The solution needs to work
> > with thousands of containers and thousands of cgroups.
> > In the patch 06/10 the proposal is to use 'current' as holder of
> > the programs:
> > +   for (prog = current->seccomp.landlock_prog;
> > +                   prog; prog = prog->prev) {
> > +           if (prog->filter == landlock_ret->filter) {
> > +                   cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
> > +                   break;
> > +           }
> > +   }
> > imo that's the root of scalability issue.
> > I think to be able to scale the bpf programs have to be attached to
> > cgroups instead of tasks.
> > That would be very different api. seccomp doesn't need to be touched.
> > But that is the only way I see to be able to scale.
> 
> Landlock is inspired from seccomp which also use a BPF program per
> thread. For seccomp, each BPF programs are executed for each syscall.
> For Landlock, some BPF programs are executed for some LSM hooks. I don't
> see why it is a scale issue for Landlock comparing to seccomp. I also
> don't see why storing the BPF program list pointer in the cgroup struct
> instead of the task struct change a lot here. The BPF programs execution
> will be the same anyway (for each LSM hook). Kees should probably have a
> better opinion on this.

seccomp has its own issues and copying them doesn't make this lsm any better.
For example, seccomp bpf programs are all gigantic switch statements that
look for interesting syscall numbers. All syscalls of a task pay a
non-trivial seccomp penalty due to such a design. If bpf was attached per
syscall it would have been much cheaper. Of course doing it this way
for seccomp is not easy, but for lsm such a facility is already there.
A blanket call of a single bpf prog for all lsm hooks is unnecessary
overhead that can and should be avoided.

> > May be another way of thinking about it is 'lsm cgroup controller'
> > that Sargun is proposing.
> > The lsm hooks will provide stable execution points and the programs
> > will be called like:
> > prog = task_css_set(current)->dfl_cgrp->bpf.prog_effective[lsm_hook_id];
> > BPF_PROG_RUN(prog, ctx);
> > The delegation functionality and 'prog_effective' logic that
> > Daniel Mack is proposing will be fully reused here.
> > External container management software will be able to apply bpf
> > programs to control tasks under cgroup and such
> > bpf_landlock_cmp_cgroup_beneath() helper won't be necessary.
> > The user will be able to register different programs for different lsm hooks.
> > If I understand the patch 6/10 correctly, there is one (or a list) prog for
> > all lsm hooks per task which is not flexible enough.
> 
> For each LSM hook triggered by a thread, all of its Landlock eBPF
> programs (dedicated for this kind of hook) will be evaluated (if
> needed). This is the same behavior as seccomp (list of BPF programs
> attached to a process hierarchy) except the BPF programs are not
> evaluated for syscall but for LSM hooks. There is no way to make it more
> fine-grained :)

There is a way to attach a different bpf program per cgroup
and per lsm hook. Such an approach drastically reduces the overhead
of a sandboxed application.
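
A rough userspace sketch of such a per-cgroup, per-hook lookup. The
struct layout, the hook ids and the inheritance rule (a cgroup without
its own program uses its parent's effective one) are assumptions made
for illustration, not the code from Daniel's series:

```c
#include <stddef.h>

/* Hypothetical hook ids; a real table would have one per lsm hook. */
enum hook_id { HOOK_FILE_OPEN, HOOK_SOCKET_BIND, HOOK_MAX };

typedef int (*prog_fn)(const void *ctx);

struct cg {
	struct cg *parent;
	prog_fn prog[HOOK_MAX];           /* programs attached directly */
	prog_fn prog_effective[HOOK_MAX]; /* resolved at attach time */
};

/*
 * Resolve the effective program for one hook: the cgroup's own program
 * if it has one, otherwise the parent's effective program.
 * Parents must be updated before their children.
 */
static void cg_update_effective(struct cg *cg, enum hook_id id)
{
	if (cg->prog[id])
		cg->prog_effective[id] = cg->prog[id];
	else if (cg->parent)
		cg->prog_effective[id] = cg->parent->prog_effective[id];
	else
		cg->prog_effective[id] = NULL;
}

/* Hook fast path: one array lookup, no list walk; 0 means "no opinion". */
static int cg_run_hook(const struct cg *cg, enum hook_id id, const void *ctx)
{
	prog_fn prog = cg->prog_effective[id];

	return prog ? prog(ctx) : 0;
}

/* Example program that denies everything. */
static int deny_all(const void *ctx)
{
	(void)ctx;
	return -1;
}
```

The cost of inheritance is paid when a program is attached, so a hook
with no program attached anywhere in the hierarchy only costs a NULL
check.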

> > Anoop Naravaram's use case is to control the ports the applications
> > under cgroup can bind and listen on.
> > Such use case can be solved by such 'lsm cgroup controller' by
> > attaching bpf program to security_socket_bind lsm hook and
> > filtering sockaddr.
> > Furthermore Sargun's use case is to allow further sockaddr rewrites
> > from the bpf program which can be done as natural extension
> > of such mechanism.
> > 
> > If I understood Daniel's Anoop's Sargun's and yours use cases
> > correctly the common piece of kernel infrastructure that can solve
> > them all can start from Daniel's current set of patches that
> > establish a mechanism of attaching bpf program to a cgroup.
> > Then adding lsm hooks to it and later allowing argument rewrite
> > (since they're already in the kernel and no ToCToU problems exist)
> 
> To sum up, the pieces we have in common are the eBPF use and (optionally
> for Landlock) their execution depending on the current cgroup.
> 
> Moreover, the three other series (Sargun's, Anoop's and Daniel's) do not
> deal with unprivileged process which is the main purpose of Landlock.
> I'm not sure that allowing sockaddr rewrites is a good idea here... Like
> other LSMs, Landlock is dedicated to access-control.

we have to find common ground and common infra to solve all these use cases.
Pointing out the differences isn't going to make this snowflake
any more special.

> For the network-related series, I think it make more sense to simply
> create a netfilter rule matching a cgroup and then add more features to
> netfilter (restrict port ranges and so on) thanks to eBPF programs.
> Containers are (usually) in a dedicated network namespace, which open
> the possibility to not only rely on cgroups (e.g. match UID,
> netmask...). It would also be more flexible to be able to load a BPF
> program in netfilter and update its maps on the fly to make dynamic
> rules, like ipset does, but in a more generic way.
> 
> 
> > 
> > As far as safety and type checking that bpf programs has to do,
> > I like the approach of patch 06/10:
> > +LANDLOCK_HOOK2(file_open, FILE_OPEN,
> > +       PTR_TO_STRUCT_FILE, struct file *, file,
> > +       PTR_TO_STRUCT_CRED, const struct cred *, cred
> > +)
> > teaching verifier to recognize struct file, cred, sockaddr
> > will let bpf program access them naturally without any overhead.
> > Though:
> > @@ -102,6 +102,9 @@ enum bpf_prog_type {
> >         BPF_PROG_TYPE_SCHED_CLS,
> >         BPF_PROG_TYPE_SCHED_ACT,
> >         BPF_PROG_TYPE_TRACEPOINT,
> > +       BPF_PROG_TYPE_LANDLOCK_FILE_OPEN,
> > +       BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION,
> > +       BPF_PROG_TYPE_LANDLOCK_MMAP_FILE,
> >  };
> > is a bit of overkill.
> > I think it would be cleaner to have single
> > BPF_PROG_TYPE_LSM and at program load time pass
> > lsm_hook_id as well, so that verifier can do safety checks
> > based on type info provided in LANDLOCK_HOOKs
> 
> I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF
> verifier check programs according to their types. If we need to check
> specific context value types (e.g. PTR_TO_STRUCT_FILE), we need a
> dedicated program types. I don't see any other way to do it with the
> current verifier code. Moreover it's the purpose of program types, right?

Adding a new bpf program type for every lsm hook is not acceptable.
Either do one new program type + pass lsm_hook_id as suggested
or please come up with an alternative approach.
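
To make the suggestion concrete, a small sketch of a table that maps an
lsm_hook_id to the argument types a verifier would have to enforce. All
names below are hypothetical; only the file_open signature (struct file,
struct cred) comes from the LANDLOCK_HOOK2 excerpt quoted above, the
other two entries are placeholders:

```c
/* Hypothetical argument types a verifier could track per hook. */
enum arg_type {
	ARG_NONE = 0,
	ARG_PTR_TO_STRUCT_FILE,
	ARG_PTR_TO_STRUCT_CRED,
};

/* Hypothetical hook ids passed by userland at program load time. */
enum lsm_hook_id {
	HOOK_FILE_OPEN,
	HOOK_FILE_PERMISSION,
	HOOK_MMAP_FILE,
	HOOK_ID_MAX,
};

#define HOOK_MAX_ARGS 2

struct hook_proto {
	enum arg_type args[HOOK_MAX_ARGS];
};

/* One program type, one table: the hook id selects the context layout. */
static const struct hook_proto hook_protos[HOOK_ID_MAX] = {
	[HOOK_FILE_OPEN] = { { ARG_PTR_TO_STRUCT_FILE, ARG_PTR_TO_STRUCT_CRED } },
	/* Placeholders: only the struct file argument is modelled here. */
	[HOOK_FILE_PERMISSION] = { { ARG_PTR_TO_STRUCT_FILE, ARG_NONE } },
	[HOOK_MMAP_FILE] = { { ARG_PTR_TO_STRUCT_FILE, ARG_NONE } },
};

/* Type of argument 'arg' for 'hook_id', or ARG_NONE if out of range. */
static enum arg_type hook_arg_type(unsigned int hook_id, unsigned int arg)
{
	if (hook_id >= HOOK_ID_MAX || arg >= HOOK_MAX_ARGS)
		return ARG_NONE;
	return hook_protos[hook_id].args[arg];
}
```

Adding a hook then means adding a table entry rather than a new
BPF_PROG_TYPE_* value.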

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups
  2016-08-26 23:05       ` Alexei Starovoitov
@ 2016-08-27  7:30         ` Andy Lutomirski
  2016-08-27 18:11           ` Alexei Starovoitov
  2016-08-27 14:06         ` [RFC v2 09/10] landlock: Handle cgroups (performance) Mickaël Salaün
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-27  7:30 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: kernel-hardening, Alexei Starovoitov, Tejun Heo, Sargun Dhillon,
	Network Development, Linux API, Kees Cook, LSM List,
	linux-kernel, open list:CONTROL GROUP (CGROUP),
	David S . Miller, Mickaël Salaün, Daniel Mack,
	Daniel Borkmann

On Aug 27, 2016 1:05 AM, "Alexei Starovoitov"
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> >
>
> trimming cc list again. When it's too big vger will consider it as spam.
>
> > On 26/08/2016 04:14, Alexei Starovoitov wrote:
> > > On Thu, Aug 25, 2016 at 12:32:44PM +0200, Mickaël Salaün wrote:
> > >> Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op)
> > >> to compare the current process cgroup with a cgroup handle, The handle
> > >> can match the current cgroup if it is the same or a child. This allows
> > >> to make conditional rules according to the current cgroup.
> > >>
> > >> A cgroup handle is a map entry created from a file descriptor referring
> > >> a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the
> > >> map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the
> > >> inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP.
> > >>
> > >> An unprivileged process can create and manipulate cgroups thanks to
> > >> cgroup delegation.
> > >>
> > >> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > ...
> > >> +static inline u64 bpf_landlock_cmp_cgroup_beneath(u64 r1_option, u64 r2_map,
> > >> +          u64 r3_map_op, u64 r4, u64 r5)
> > >> +{
> > >> +  u8 option = (u8) r1_option;
> > >> +  struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
> > >> +  enum bpf_map_array_op map_op = r3_map_op;
> > >> +  struct bpf_array *array = container_of(map, struct bpf_array, map);
> > >> +  struct cgroup *cg1, *cg2;
> > >> +  struct map_landlock_handle *handle;
> > >> +  int i;
> > >> +
> > >> +  /* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP is an arraymap */
> > >> +  if (unlikely(!map)) {
> > >> +          WARN_ON(1);
> > >> +          return -EFAULT;
> > >> +  }
> > >> +  if (unlikely((option | _LANDLOCK_FLAG_OPT_MASK) != _LANDLOCK_FLAG_OPT_MASK))
> > >> +          return -EINVAL;
> > >> +
> > >> +  /* for now, only handle OP_OR */
> > >> +  switch (map_op) {
> > >> +  case BPF_MAP_ARRAY_OP_OR:
> > >> +          break;
> > >> +  case BPF_MAP_ARRAY_OP_UNSPEC:
> > >> +  case BPF_MAP_ARRAY_OP_AND:
> > >> +  case BPF_MAP_ARRAY_OP_XOR:
> > >> +  default:
> > >> +          return -EINVAL;
> > >> +  }
> > >> +
> > >> +  synchronize_rcu();
> > >> +
> > >> +  for (i = 0; i < array->n_entries; i++) {
> > >> +          handle = (struct map_landlock_handle *)
> > >> +                          (array->value + array->elem_size * i);
> > >> +
> > >> +          /* protected by the proto types, should not happen */
> > >> +          if (unlikely(handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD)) {
> > >> +                  WARN_ON(1);
> > >> +                  return -EFAULT;
> > >> +          }
> > >> +          if (unlikely(!handle->css)) {
> > >> +                  WARN_ON(1);
> > >> +                  return -EFAULT;
> > >> +          }
> > >> +
> > >> +          if (option & LANDLOCK_FLAG_OPT_REVERSE) {
> > >> +                  cg1 = handle->css->cgroup;
> > >> +                  cg2 = task_css_set(current)->dfl_cgrp;
> > >> +          } else {
> > >> +                  cg1 = task_css_set(current)->dfl_cgrp;
> > >> +                  cg2 = handle->css->cgroup;
> > >> +          }
> > >> +
> > >> +          if (cgroup_is_descendant(cg1, cg2))
> > >> +                  return 0;
> > >> +  }
> > >> +  return 1;
> > >> +}
> > >
> > > - please take a look at the existing bpf_current_task_under_cgroup and
> > > reuse BPF_MAP_TYPE_CGROUP_ARRAY as a minimum. Doing new cgroup array
> > > is nothing but duplication of the code.
> >
> > Oh, I didn't know about this patchset and the new helper. Indeed, it
> > looks a lot like mine except there is no static verification of the map
> > type as I did with the arraymap of handles, and no batch mode either. I
> > think the return value of bpf_current_task_under_cgroup is error-prone
> > if an eBPF program do an "if(ret)" test on the value (because of the
> > negative ERRNO return value). Inverting the 0 and 1 return values should
> > fix this (0 == succeed, 1 == failed, <0 == error).
>
> nothing to fix. It's good as-is. Use if (ret > 0) instead.
>
> >
> > To sum up, there is four related patchsets:
> > * "Landlock LSM: Unprivileged sandboxing" (this series)
> > * "Add Checmate, BPF-driven minor LSM" (Sargun Dhillon)
> > * "Networking cgroup controller" (Anoop Naravaram)
> > * "Add eBPF hooks for cgroups" (Daniel Mack)
> >
> > The three other series (Sargun's, Anoop's and Daniel's) are mainly
> > focused on network access-control via cgroup for *containers*. As far as
> > I can tell, only a *root* user (CAP_SYS_ADMIN) can use them. Landlock's
> > goal is to empower all processes (privileged or not) to create their own
> > sandbox. This also means, like explained in "[RFC v2 00/10] Landlock
> > LSM: Unprivileged sandboxing", there is more constraints. For example,
> > it is not acceptable to let a process probe the kernel memory as it
> > wish. More details are in the Landlock cover-letter.
> >
> >
> > Another important point is that supporting cgroup for Landlock is
> > optional. It does not rely on cgroup to be usable but is only a feature
> > available when (unprivileged) users can manage their own cgroup, which
> > is an important constraint. Put another way, Landlock should not rely on
> > cgroup to create sandboxes. Indeed, a process creating a sandbox do not
> > necessarily have access to the cgroup mount point (directly or not).
>
> cgroup is the common way to group multiple tasks.
> Without cgroup only parent<->child relationship will be possible,
> which will limit usability of such lsm to a master task that controls
> its children. Such api restriction would have been ok, if we could
> extend it in the future, but unfortunately task-centric won't allow it
> without creating a parallel lsm that is cgroup based.
> Therefore I think we have to go with cgroup-centric api and your
> application has to use cgroups from the start though only parent-child
> would have been enough.
> Also I don't think the kernel can afford two bpf based lsm. One task
> based and another cgroup based, so we have to find common ground
> that suits both use cases.
> Having unprivileged access is a subset. There is no strong reason why
> cgroup+lsm+bpf should be limited to root only always.
> When we can guarantee no pointer leaks, we can allow unpriv.

I don't really understand what you mean.  In the context of landlock,
which is a *sandbox*, can one of you explain a use case that
materially benefits from this type of cgroup usage?  I haven't thought
of one.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
  2016-08-25 10:32 [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (10 preceding siblings ...)
  2016-08-25 11:05 ` [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Andy Lutomirski
@ 2016-08-27  7:40 ` Andy Lutomirski
  2016-08-27 15:10   ` Mickaël Salaün
  2016-08-30 16:06 ` [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Andy Lutomirski
  2016-09-15  9:19 ` Pavel Machek
  13 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-27  7:40 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Will Drewry, kernel-hardening,
	Linux API, LSM List, Network Development

On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote:
> Hi,
>
> This series is a proof of concept to fill some missing part of seccomp as the
> ability to check syscall argument pointers or creating more dynamic security
> policies. The goal of this new stackable Linux Security Module (LSM) called
> Landlock is to allow any process, including unprivileged ones, to create
> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
> OpenBSD Pledge. This kind of sandbox help to mitigate the security impact of
> bugs or unexpected/malicious behaviors in userland applications.
>
> The first RFC [1] was focused on extending seccomp while staying at the syscall
> level. This brought a working PoC but with some (mitigated) ToCToU race
> conditions due to the seccomp ptrace hole (now fixed) and the non-atomic
> syscall argument evaluation (hence the LSM hooks).
>
>
> # Landlock LSM
>
> This second RFC is a fresh revamp of the code while keeping some working ideas.
> This series is mainly focused on LSM hooks, while keeping the possibility to
> tied them to syscalls. This new code removes all race conditions by design. It
> now use eBPF instead of a subset of cBPF (as used by seccomp-bpf). This allow
> to remove the previous stacked cBPF hack to do complex access checks thanks to
> dedicated eBPF functions. An eBPF program is still very limited (i.e. can only
> call a whitelist of functions) and can not do a denial of service (i.e. no
> loop). The other major improvement is the replacement of the previous custom
> checker groups of syscall arguments with a new dedicated eBPF map to collect
> and compare Landlock handles with system resources (e.g. files or network
> connections).
>
> The approach taken is to add the minimum amount of code while still allowing
> the userland to create quite complex access rules. A dedicated security policy
> language such as used by SELinux, AppArmor and other major LSMs is a lot of
> code and dedicated to a trusted process (i.e. root/administrator).
>

I think there might be a problem with the current design.  If I add a
seccomp filter that uses RET_LANDLOCK and some landlock filters, what
happens if a second seccomp filter *also* uses RET_LANDLOCK?  I think
they'll interfere with each other.  It might end up being necessary to
require only one landlock seccomp layer at a time or to find a way to
stick all the filters in a layer together with the LSM callbacks or
maybe to just drop RET_LANDLOCK and let the callbacks look at the
syscall args.

BTW, what happens if an LSM hook is called outside a syscall context,
e.g. from a page fault?

>
>
> # Sandbox example with conditional access control depending on cgroup
>
>   $ mkdir /sys/fs/cgroup/sandboxed
>   $ ls /home
>   user1
>   $ LANDLOCK_CGROUPS='/sys/fs/cgroup/sandboxed' \
>       LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
>       ./sandbox /bin/sh -i
>   $ ls /home
>   user1
>   $ echo $$ > /sys/fs/cgroup/sandboxed/cgroup.procs
>   $ ls /home
>   ls: cannot open directory '/home': Permission denied
>

Something occurs to me that isn't strictly relevant to landlock but
may be relevant to unprivileged cgroups: can you cause trouble by
setting up a nastily-configured cgroup and running a setuid program in
it?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 08/10] landlock: Handle file system comparisons
  2016-08-26 14:57       ` Andy Lutomirski
@ 2016-08-27 13:45         ` Mickaël Salaün
  0 siblings, 0 replies; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-27 13:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Eric W. Biederman, LKML, Alexei Starovoitov, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Will Drewry,
	kernel-hardening, Linux API, LSM List, Network Development



On 26/08/2016 16:57, Andy Lutomirski wrote:
> On Thu, Aug 25, 2016 at 7:10 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>
>> On 25/08/2016 13:12, Andy Lutomirski wrote:
>>> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>>> Add eBPF functions to compare file system access with a Landlock file
>>>> system handle:
>>>> * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
>>>>   This function allows to compare the dentry, inode, device or mount
>>>>   point of the currently accessed file, with a reference handle.
>>>> * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
>>>>   This function allows an eBPF program to check if the current accessed
>>>>   file is the same or in the hierarchy of a reference handle.
>>>>
>>>> The goal of file system handle is to abstract kernel objects such as a
>>>> struct file or a struct inode. Userland can create this kind of handle
>>>> thanks to the BPF_MAP_UPDATE_ELEM command. The element is a struct
>>>> landlock_handle containing the handle type (e.g.
>>>> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. This could
>>>> also be any descriptions able to match a struct file or a struct inode
>>>> (e.g. path or glob string).
>>>
>>> This needs Eric's opinion.
>>>
>>> Also, where do all the struct file *'s get stashed?  Are they
>>> preserved in the arraymap?  What prevents reference cycles or absurdly
>>> large numbers of struct files getting pinned?
>>
>> Yes, the struct file are kept in the arraymap and dropped when there is
>> no more reference on them. Currently, the limitations are the maximum
>> number of open file descriptors referring to an arraymap and the maximum
>> number of eBPF Landlock programs loaded in a process
>> (LANDLOCK_PROG_LIST_MAX_PAGES in kernel/seccomp.c).
>>
>> What kind of reference cycles have you in mind?
> 
> Shoving evil things into the arraymaps, e.g. unix sockets with
> SCM_RIGHTS messages pending, eBPF program references, the arraymap fd
> itself, another arraymap fd, etc.

The arraymap of Landlock handles is strongly typed and can check the
kind of FD it gets when creating/updating an entry, which is already done
for the cgroup type. It may be wise to add another check for FS types as
well, which should be a one-liner. I'll do it for the next round.


>>
>> It probably needs another limit for kernel object references as well.
>> What is the best option here? Add another static limitation or use an
>> existing one?
> 
> Dunno.  If RLIMIT_FILE could be made to work, that would be nice.

RLIMIT_NOFILE is used for the eBPF map creation, but only the memory
limit applies to storing the map entries (struct file pointers). I'll add
a new static limit for the number of FD-based arraymap entries because
RLIMIT_NOFILE does not reflect the same semantics. The struct files are
not usable as FDs; their only purpose is to be compared with another file.

 Mickaël



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups (performance)
  2016-08-26 23:05       ` Alexei Starovoitov
  2016-08-27  7:30         ` Andy Lutomirski
@ 2016-08-27 14:06         ` Mickaël Salaün
  2016-08-27 18:06           ` Alexei Starovoitov
  2016-08-27 14:19         ` [RFC v2 09/10] landlock: Handle cgroups (netfilter match) Mickaël Salaün
  2016-08-27 14:34         ` [RFC v2 09/10] landlock: Handle cgroups (program types) Mickaël Salaün
  3 siblings, 1 reply; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-27 14:06 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Daniel Borkmann, Daniel Mack, David S . Miller, Kees Cook,
	Sargun Dhillon, kernel-hardening, linux-api,
	linux-security-module, netdev, Tejun Heo, cgroups



On 27/08/2016 01:05, Alexei Starovoitov wrote:
> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
>>
>>>
>>> - I don't think such 'for' loop can scale. The solution needs to work
>>> with thousands of containers and thousands of cgroups.
>>> In the patch 06/10 the proposal is to use 'current' as holder of
>>> the programs:
>>> +   for (prog = current->seccomp.landlock_prog;
>>> +                   prog; prog = prog->prev) {
>>> +           if (prog->filter == landlock_ret->filter) {
>>> +                   cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
>>> +                   break;
>>> +           }
>>> +   }
>>> imo that's the root of scalability issue.
>>> I think to be able to scale the bpf programs have to be attached to
>>> cgroups instead of tasks.
>>> That would be very different api. seccomp doesn't need to be touched.
>>> But that is the only way I see to be able to scale.
>>
>> Landlock is inspired from seccomp which also use a BPF program per
>> thread. For seccomp, each BPF programs are executed for each syscall.
>> For Landlock, some BPF programs are executed for some LSM hooks. I don't
>> see why it is a scale issue for Landlock comparing to seccomp. I also
>> don't see why storing the BPF program list pointer in the cgroup struct
>> instead of the task struct change a lot here. The BPF programs execution
>> will be the same anyway (for each LSM hook). Kees should probably have a
>> better opinion on this.
> 
> seccomp has its own issues and copying them doesn't make this lsm any better.
> Like seccomp bpf programs are all gigantic switch statement that looks
> for interesting syscall numbers. All syscalls of a task are paying
> non-trivial seccomp penalty due to such design. If bpf was attached per
> syscall it would have been much cheaper. Of course doing it this way
> for seccomp is not easy, but for lsm such facility is already there.
> Blank call of a single bpf prog for all lsm hooks is unnecessary
> overhead that can and should be avoided.

It's probably a misunderstanding. Contrary to seccomp, which runs all the
thread's BPF programs for any syscall, Landlock only runs eBPF programs
for the triggered LSM hooks, if their type matches. Indeed, thanks to the
multiple eBPF program types, Landlock only runs an eBPF program when
needed. Landlock will have almost no performance overhead if the syscalls
do not trigger the watched LSM hooks for the current process.
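
As a simplified userspace model of this behavior (not the code from
patch 06/10): each program carries the hook type it was loaded for, and
a hook only runs the programs whose type matches. The names and the
"non-zero means deny" convention are assumptions for illustration:

```c
#include <stddef.h>

/* Hypothetical hook types, one per eBPF program type. */
enum hook_type { HOOK_FILE_OPEN, HOOK_FILE_PERMISSION, HOOK_MMAP_FILE };

struct prog {
	enum hook_type type;         /* hook this program was loaded for */
	int (*run)(const void *ctx); /* returns non-zero to deny */
	struct prog *prev;           /* previously attached program */
};

/*
 * Walk the thread's program list for one triggered hook. Programs of
 * another type are skipped without being run. Stops at the first deny.
 * *nr_run reports how many programs were actually executed.
 */
static int run_hook(struct prog *last, enum hook_type type,
		const void *ctx, unsigned int *nr_run)
{
	struct prog *p;
	int ret = 0;

	*nr_run = 0;
	for (p = last; p; p = p->prev) {
		if (p->type != type)
			continue;
		(*nr_run)++;
		ret = p->run(ctx);
		if (ret)
			break;
	}
	return ret;
}

/* Example programs. */
static int prog_allow(const void *ctx)
{
	(void)ctx;
	return 0;
}

static int prog_deny(const void *ctx)
{
	(void)ctx;
	return 1;
}
```

In this model a hook with no matching program executes no eBPF code at
all, although the list itself is still walked.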


> 
>>> May be another way of thinking about it is 'lsm cgroup controller'
>>> that Sargun is proposing.
>>> The lsm hooks will provide stable execution points and the programs
>>> will be called like:
>>> prog = task_css_set(current)->dfl_cgrp->bpf.prog_effective[lsm_hook_id];
>>> BPF_PROG_RUN(prog, ctx);
>>> The delegation functionality and 'prog_effective' logic that
>>> Daniel Mack is proposing will be fully reused here.
>>> External container management software will be able to apply bpf
>>> programs to control tasks under cgroup and such
>>> bpf_landlock_cmp_cgroup_beneath() helper won't be necessary.
>>> The user will be able to register different programs for different lsm hooks.
>>> If I understand the patch 6/10 correctly, there is one (or a list) prog for
>>> all lsm hooks per task which is not flexible enough.
>>
>> For each LSM hook triggered by a thread, all of its Landlock eBPF
>> programs (dedicated for this kind of hook) will be evaluated (if
>> needed). This is the same behavior as seccomp (list of BPF programs
>> attached to a process hierarchy) except the BPF programs are not
>> evaluated for syscall but for LSM hooks. There is no way to make it more
>> fine-grained :)
> 
> There is a way to attach different bpf program per cgroup
> and per lsm hook. Such approach drastically reduces overhead
> of sandboxed application.

As said above, Landlock will not run an eBPF program when not strictly
needed. Attaching to a cgroup will have the same performance impact as
attaching to a process hierarchy.



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups (netfilter match)
  2016-08-26 23:05       ` Alexei Starovoitov
  2016-08-27  7:30         ` Andy Lutomirski
  2016-08-27 14:06         ` [RFC v2 09/10] landlock: Handle cgroups (performance) Mickaël Salaün
@ 2016-08-27 14:19         ` Mickaël Salaün
  2016-08-27 18:32           ` Alexei Starovoitov
  2016-08-27 14:34         ` [RFC v2 09/10] landlock: Handle cgroups (program types) Mickaël Salaün
  3 siblings, 1 reply; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-27 14:19 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Daniel Borkmann, Daniel Mack, David S . Miller, Kees Cook,
	Sargun Dhillon, kernel-hardening, linux-api,
	linux-security-module, netdev, Tejun Heo, cgroups



On 27/08/2016 01:05, Alexei Starovoitov wrote:
> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
>> To sum up, there is four related patchsets:
>> * "Landlock LSM: Unprivileged sandboxing" (this series)
>> * "Add Checmate, BPF-driven minor LSM" (Sargun Dhillon)
>> * "Networking cgroup controller" (Anoop Naravaram)
>> * "Add eBPF hooks for cgroups" (Daniel Mack)

>>> Anoop Naravaram's use case is to control the ports the applications
>>> under cgroup can bind and listen on.
>>> Such use case can be solved by such 'lsm cgroup controller' by
>>> attaching bpf program to security_socket_bind lsm hook and
>>> filtering sockaddr.
>>> Furthermore Sargun's use case is to allow further sockaddr rewrites
>>> from the bpf program which can be done as natural extension
>>> of such mechanism.
>>>
>>> If I understood Daniel's Anoop's Sargun's and yours use cases
>>> correctly the common piece of kernel infrastructure that can solve
>>> them all can start from Daniel's current set of patches that
>>> establish a mechanism of attaching bpf program to a cgroup.
>>> Then adding lsm hooks to it and later allowing argument rewrite
>>> (since they're already in the kernel and no ToCToU problems exist)

>> For the network-related series, I think it make more sense to simply
>> create a netfilter rule matching a cgroup and then add more features to
>> netfilter (restrict port ranges and so on) thanks to eBPF programs.
>> Containers are (usually) in a dedicated network namespace, which open
>> the possibility to not only rely on cgroups (e.g. match UID,
>> netmask...). It would also be more flexible to be able to load a BPF
>> program in netfilter and update its maps on the fly to make dynamic
>> rules, like ipset does, but in a more generic way.

What do the netdev folks think about this design?



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups (program types)
  2016-08-26 23:05       ` Alexei Starovoitov
                           ` (2 preceding siblings ...)
  2016-08-27 14:19         ` [RFC v2 09/10] landlock: Handle cgroups (netfilter match) Mickaël Salaün
@ 2016-08-27 14:34         ` Mickaël Salaün
  2016-08-27 18:19           ` Alexei Starovoitov
  3 siblings, 1 reply; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-27 14:34 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Daniel Borkmann, Daniel Mack, David S . Miller, Kees Cook,
	Sargun Dhillon, kernel-hardening, linux-api,
	linux-security-module, netdev, Tejun Heo


[-- Attachment #1.1: Type: text/plain, Size: 1931 bytes --]


On 27/08/2016 01:05, Alexei Starovoitov wrote:
> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
>
>>> As far as safety and type checking that bpf programs has to do,
>>> I like the approach of patch 06/10:
>>> +LANDLOCK_HOOK2(file_open, FILE_OPEN,
>>> +       PTR_TO_STRUCT_FILE, struct file *, file,
>>> +       PTR_TO_STRUCT_CRED, const struct cred *, cred
>>> +)
>>> teaching verifier to recognize struct file, cred, sockaddr
>>> will let bpf program access them naturally without any overhead.
>>> Though:
>>> @@ -102,6 +102,9 @@ enum bpf_prog_type {
>>>         BPF_PROG_TYPE_SCHED_CLS,
>>>         BPF_PROG_TYPE_SCHED_ACT,
>>>         BPF_PROG_TYPE_TRACEPOINT,
>>> +       BPF_PROG_TYPE_LANDLOCK_FILE_OPEN,
>>> +       BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION,
>>> +       BPF_PROG_TYPE_LANDLOCK_MMAP_FILE,
>>>  };
>>> is a bit of overkill.
>>> I think it would be cleaner to have single
>>> BPF_PROG_TYPE_LSM and at program load time pass
>>> lsm_hook_id as well, so that verifier can do safety checks
>>> based on type info provided in LANDLOCK_HOOKs
>>
>> I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF
>> verifier check programs according to their types. If we need to check
>> specific context value types (e.g. PTR_TO_STRUCT_FILE), we need a
>> dedicated program types. I don't see any other way to do it with the
>> current verifier code. Moreover it's the purpose of program types, right?
> 
> Adding new bpf program type for every lsm hook is not acceptable.
> Either do one new program type + pass lsm_hook_id as suggested
> or please come up with an alternative approach.

OK, so we have to modify the verifier to rely not only on the program
type but also on another value to check the context accesses. Do you have a
hint about where this value could come from? Do we need to add a new bpf
command to associate a program with a subtype?



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
  2016-08-27  7:40 ` Andy Lutomirski
@ 2016-08-27 15:10   ` Mickaël Salaün
  2016-08-27 15:21     ` [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing (cgroup delegation) Mickaël Salaün
  0 siblings, 1 reply; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-27 15:10 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development


[-- Attachment #1.1: Type: text/plain, Size: 5052 bytes --]


On 27/08/2016 09:40, Andy Lutomirski wrote:
> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote:
>> Hi,
>>
>> This series is a proof of concept to fill some missing part of seccomp as the
>> ability to check syscall argument pointers or creating more dynamic security
>> policies. The goal of this new stackable Linux Security Module (LSM) called
>> Landlock is to allow any process, including unprivileged ones, to create
>> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
>> OpenBSD Pledge. This kind of sandbox help to mitigate the security impact of
>> bugs or unexpected/malicious behaviors in userland applications.
>>
>> The first RFC [1] was focused on extending seccomp while staying at the syscall
>> level. This brought a working PoC but with some (mitigated) ToCToU race
>> conditions due to the seccomp ptrace hole (now fixed) and the non-atomic
>> syscall argument evaluation (hence the LSM hooks).
>>
>>
>> # Landlock LSM
>>
>> This second RFC is a fresh revamp of the code while keeping some working ideas.
>> This series is mainly focused on LSM hooks, while keeping the possibility to
>> tied them to syscalls. This new code removes all race conditions by design. It
>> now use eBPF instead of a subset of cBPF (as used by seccomp-bpf). This allow
>> to remove the previous stacked cBPF hack to do complex access checks thanks to
>> dedicated eBPF functions. An eBPF program is still very limited (i.e. can only
>> call a whitelist of functions) and can not do a denial of service (i.e. no
>> loop). The other major improvement is the replacement of the previous custom
>> checker groups of syscall arguments with a new dedicated eBPF map to collect
>> and compare Landlock handles with system resources (e.g. files or network
>> connections).
>>
>> The approach taken is to add the minimum amount of code while still allowing
>> the userland to create quite complex access rules. A dedicated security policy
>> language such as used by SELinux, AppArmor and other major LSMs is a lot of
>> code and dedicated to a trusted process (i.e. root/administrator).
>>
> 
> I think there might be a problem with the current design.  If I add a
> seccomp filter that uses RET_LANDLOCK and some landlock filters, what
> happens if a second seccomp filter *also* uses RET_LANDLOCK?  I think
> they'll interfere with each other.  It might end up being necessary to
> require only one landlock seccomp layer at a time or to find a way to
> stick all the filters in a layer together with the LSM callbacks or
> maybe to just drop RET_LANDLOCK and let the callbacks look at the
> syscall args.

This is correctly managed. For each RET_LANDLOCK, if there are one or
more associated Landlock programs (i.e. created by the same thread after
this seccomp filter), one Landlock program instance is run for each
seccomp filter that triggers them. This way, each cookie linked to a
RET_LANDLOCK is evaluated once by each relevant Landlock program.

Example with a thread that loaded multiple seccomp filters (SF) and
multiple Landlock programs (LP), each associated with one LSM hook: SF0, SF1,
LP0(file_open), SF2, LP1(file_open), LP2(file_permission)

* If SF0 returns RET_LANDLOCK(cookie0), then LP0 and LP1 are run with
cookie0 if the current syscall triggers the file_open hook, and LP2 is
run with cookie0 if the syscall triggers the file_permission hook.

* In addition to the previous case, if SF1 returns
RET_LANDLOCK(cookie1), then LP0 and LP1 are run with cookie1 if the
current syscall triggers the file_open hook, and LP2 is run with cookie1
if the syscall triggers the file_permission hook.

* In addition to the previous cases, if SF2 returns
RET_LANDLOCK(cookie2), then (only) LP1 is run with cookie2 if the
current syscall triggers the file_open hook, and LP2 is run with cookie2
if the syscall triggers the file_permission hook.
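
The three cases above can be modelled in a few lines of userspace C. This
is only a sketch of the association rule as described in this mail (a
Landlock program is tied to every seccomp filter loaded before it by the
same thread); the names load_order, landlock_programs_for and the hook
enum are made up for the illustration and are not taken from the patches.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical hook identifiers, named after the hooks in the example. */
enum hook { HOOK_FILE_OPEN, HOOK_FILE_PERMISSION };

/* One entry of the thread's load order: seccomp filter or Landlock program. */
struct entry {
	int is_landlock;	/* 0: seccomp filter (SF), 1: Landlock program (LP) */
	enum hook hook;		/* only meaningful for Landlock programs */
};

/* Load order from the example: SF0, SF1, LP0, SF2, LP1, LP2. */
static const struct entry load_order[] = {
	{ 0, HOOK_FILE_OPEN },		/* SF0 (hook field unused) */
	{ 0, HOOK_FILE_OPEN },		/* SF1 (hook field unused) */
	{ 1, HOOK_FILE_OPEN },		/* LP0(file_open) */
	{ 0, HOOK_FILE_OPEN },		/* SF2 (hook field unused) */
	{ 1, HOOK_FILE_OPEN },		/* LP1(file_open) */
	{ 1, HOOK_FILE_PERMISSION },	/* LP2(file_permission) */
};

/*
 * Store in out[] the indexes of the Landlock programs that would run when
 * seccomp filter number sf returned RET_LANDLOCK and the syscall triggered
 * the given hook. Returns the number of indexes stored.
 */
static size_t landlock_programs_for(unsigned int sf, enum hook hook,
				    unsigned int *out, size_t out_len)
{
	size_t n = 0;
	unsigned int sf_seen = 0, lp_seen = 0;
	int after_filter = 0;

	assert(out != NULL);
	for (size_t i = 0; i < ARRAY_SIZE(load_order); i++) {
		const struct entry *e = &load_order[i];

		if (!e->is_landlock) {
			/* Programs loaded after this filter are tied to it. */
			if (sf_seen == sf)
				after_filter = 1;
			sf_seen++;
			continue;
		}
		if (after_filter && e->hook == hook && n < out_len)
			out[n++] = lp_seen;
		lp_seen++;
	}
	return n;
}
```

Calling landlock_programs_for(2, HOOK_FILE_OPEN, ...) yields only LP1,
because LP0 was loaded before SF2 and is therefore not tied to it.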


> 
> BTW, what happens if an LSM hook is called outside a syscall context,
> e.g. from a page fault?

Good catch! For now, only a syscall can trigger an LSM hook because of
the RET_LANDLOCK constraint. It may be wise to trigger them without a
cookie and add a dedicated variable in the eBPF context.

> 
>>
>>
>> # Sandbox example with conditional access control depending on cgroup
>>
>>   $ mkdir /sys/fs/cgroup/sandboxed
>>   $ ls /home
>>   user1
>>   $ LANDLOCK_CGROUPS='/sys/fs/cgroup/sandboxed' \
>>       LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
>>       ./sandbox /bin/sh -i
>>   $ ls /home
>>   user1
>>   $ echo $$ > /sys/fs/cgroup/sandboxed/cgroup.procs
>>   $ ls /home
>>   ls: cannot open directory '/home': Permission denied
>>
> 
> Something occurs to me that isn't strictly relevant to landlock but
> may be relevant to unprivileged cgroups: can you cause trouble by
> setting up a nastily-configured cgroup and running a setuid program in
> it?
> 

I hope not… But the use of cgroups should not be mandatory for Landlock.



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing (cgroup delegation)
  2016-08-27 15:10   ` Mickaël Salaün
@ 2016-08-27 15:21     ` Mickaël Salaün
  0 siblings, 0 replies; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-27 15:21 UTC (permalink / raw)
  To: Mickaël Salaün, Andy Lutomirski
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development, Tejun Heo, cgroups


[-- Attachment #1.1: Type: text/plain, Size: 1324 bytes --]

Cc Tejun and the cgroups ML.

On 27/08/2016 17:10, Mickaël Salaün wrote:
> On 27/08/2016 09:40, Andy Lutomirski wrote:
>> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>>
>>> # Sandbox example with conditional access control depending on cgroup
>>>
>>>   $ mkdir /sys/fs/cgroup/sandboxed
>>>   $ ls /home
>>>   user1
>>>   $ LANDLOCK_CGROUPS='/sys/fs/cgroup/sandboxed' \
>>>       LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
>>>       ./sandbox /bin/sh -i
>>>   $ ls /home
>>>   user1
>>>   $ echo $$ > /sys/fs/cgroup/sandboxed/cgroup.procs
>>>   $ ls /home
>>>   ls: cannot open directory '/home': Permission denied
>>>
>>
>> Something occurs to me that isn't strictly relevant to landlock but
>> may be relevant to unprivileged cgroups: can you cause trouble by
>> setting up a nastily-configured cgroup and running a setuid program in
>> it?
>>
> 
> I hope not… But the use of cgroups should not be mandatory for Landlock.
> 

In a previous email:

On 26/08/2016 17:50, Tejun Heo wrote:
> I haven't looked in detail but in general I'm not too excited about
> layering security mechanism on top of cgroup.  Maybe it makes some
> sense when security domain coincides with resource domains but at any
> rate please keep me in the loop.




^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups (performance)
  2016-08-27 14:06         ` [RFC v2 09/10] landlock: Handle cgroups (performance) Mickaël Salaün
@ 2016-08-27 18:06           ` Alexei Starovoitov
  2016-08-27 19:35             ` Mickaël Salaün
  0 siblings, 1 reply; 66+ messages in thread
From: Alexei Starovoitov @ 2016-08-27 18:06 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Daniel Borkmann, Daniel Mack, David S . Miller, Kees Cook,
	Sargun Dhillon, kernel-hardening, linux-api,
	linux-security-module, netdev, Tejun Heo, cgroups

On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
> 
> On 27/08/2016 01:05, Alexei Starovoitov wrote:
> > On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> >>
> >>>
> >>> - I don't think such 'for' loop can scale. The solution needs to work
> >>> with thousands of containers and thousands of cgroups.
> >>> In the patch 06/10 the proposal is to use 'current' as holder of
> >>> the programs:
> >>> +   for (prog = current->seccomp.landlock_prog;
> >>> +                   prog; prog = prog->prev) {
> >>> +           if (prog->filter == landlock_ret->filter) {
> >>> +                   cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
> >>> +                   break;
> >>> +           }
> >>> +   }
> >>> imo that's the root of scalability issue.
> >>> I think to be able to scale the bpf programs have to be attached to
> >>> cgroups instead of tasks.
> >>> That would be very different api. seccomp doesn't need to be touched.
> >>> But that is the only way I see to be able to scale.
> >>
> >> Landlock is inspired from seccomp which also use a BPF program per
> >> thread. For seccomp, each BPF programs are executed for each syscall.
> >> For Landlock, some BPF programs are executed for some LSM hooks. I don't
> >> see why it is a scale issue for Landlock comparing to seccomp. I also
> >> don't see why storing the BPF program list pointer in the cgroup struct
> >> instead of the task struct change a lot here. The BPF programs execution
> >> will be the same anyway (for each LSM hook). Kees should probably have a
> >> better opinion on this.
> > 
> > seccomp has its own issues and copying them doesn't make this lsm any better.
> > Like seccomp bpf programs are all gigantic switch statement that looks
> > for interesting syscall numbers. All syscalls of a task are paying
> > non-trivial seccomp penalty due to such design. If bpf was attached per
> > syscall it would have been much cheaper. Of course doing it this way
> > for seccomp is not easy, but for lsm such facility is already there.
> > Blank call of a single bpf prog for all lsm hooks is unnecessary
> > overhead that can and should be avoided.
> 
> It's probably a misunderstanding. Contrary to seccomp which run all the
> thread's BPF programs for any syscall, Landlock only run eBPF programs
> for the triggered LSM hooks, if their type match. Indeed, thanks to the
> multiple eBPF program types and contrary to seccomp, Landlock only run
> an eBPF program when needed. Landlock will have almost no performance
> overhead if the syscalls do not trigger the watched LSM hooks for the
> current process.

that's not what I see in the patch 06/10:
all lsm_hooks in 'static struct security_hook_list landlock_hooks'
(which eventually means all lsm hooks) will call
static inline int landlock_hook_##NAME
which will call landlock_run_prog()
which does:
+ for (landlock_ret = current->seccomp.landlock_ret;
+      landlock_ret; landlock_ret = landlock_ret->prev) {
+    if (landlock_ret->triggered) {
+       ctx.cookie = landlock_ret->cookie;
+       for (prog = current->seccomp.landlock_prog;
+            prog; prog = prog->prev) {
+               if (prog->filter == landlock_ret->filter) {
+                       cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
+                       break;
+               }
+       }

that is unacceptable overhead and not a scalable design.
It kinda works for 3 lsm_hooks as in patch 6, but doesn't scale
as soon as more lsm hooks are added.

> As said above, Landlock will not run an eBPF programs when not strictly
> needed. Attaching to a cgroup will have the same performance impact as
> attaching to a process hierarchy.

Having a prog per cgroup per lsm_hook is the only scalable way I
could come up with. If you see another way, please propose.
current->seccomp.landlock_prog is not the answer.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups
  2016-08-27  7:30         ` Andy Lutomirski
@ 2016-08-27 18:11           ` Alexei Starovoitov
  2016-08-28  8:14             ` Andy Lutomirski
  0 siblings, 1 reply; 66+ messages in thread
From: Alexei Starovoitov @ 2016-08-27 18:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: kernel-hardening, Alexei Starovoitov, Tejun Heo, Sargun Dhillon,
	Network Development, Linux API, Kees Cook, LSM List,
	linux-kernel, open list:CONTROL GROUP (CGROUP),
	David S . Miller, Mickaël Salaün, Daniel Mack,
	Daniel Borkmann

On Sat, Aug 27, 2016 at 12:30:36AM -0700, Andy Lutomirski wrote:
> > cgroup is the common way to group multiple tasks.
> > Without cgroup only parent<->child relationship will be possible,
> > which will limit usability of such lsm to a master task that controls
> > its children. Such api restriction would have been ok, if we could
> > extend it in the future, but unfortunately task-centric won't allow it
> > without creating a parallel lsm that is cgroup based.
> > Therefore I think we have to go with cgroup-centric api and your
> > application has to use cgroups from the start though only parent-child
> > would have been enough.
> > Also I don't think the kernel can afford two bpf based lsm. One task
> > based and another cgroup based, so we have to find common ground
> > that suits both use cases.
> > Having unprivliged access is a subset. There is no strong reason why
> > cgroup+lsm+bpf should be limited to root only always.
> > When we can guarantee no pointer leaks, we can allow unpriv.
> 
> I don't really understand what you mean.  In the context of landlock,
> which is a *sandbox*, can one of you explain a use case that
> materially benefits from this type of cgroup usage?  I haven't thought
> of one.

In the case of a seccomp-like sandbox where a parent controls its child
processes, a cgroup is not needed. It's needed when container management
software needs to control a set of applications. If we can have one bpf-based
lsm that works both with and without cgroups, I'd be fine with it. Right now
I haven't seen a plausible proposal to do that. Therefore a cgroup-based
api is the common api that works for sandboxes as well, though requiring the
parent to create a cgroup just to control a single child is cumbersome.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups (program types)
  2016-08-27 14:34         ` [RFC v2 09/10] landlock: Handle cgroups (program types) Mickaël Salaün
@ 2016-08-27 18:19           ` Alexei Starovoitov
  2016-08-27 19:55             ` Mickaël Salaün
  0 siblings, 1 reply; 66+ messages in thread
From: Alexei Starovoitov @ 2016-08-27 18:19 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Daniel Borkmann, Daniel Mack, David S . Miller, Kees Cook,
	Sargun Dhillon, kernel-hardening, linux-api,
	linux-security-module, netdev, Tejun Heo

On Sat, Aug 27, 2016 at 04:34:55PM +0200, Mickaël Salaün wrote:
> 
> On 27/08/2016 01:05, Alexei Starovoitov wrote:
> > On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> >
> >>> As far as safety and type checking that bpf programs has to do,
> >>> I like the approach of patch 06/10:
> >>> +LANDLOCK_HOOK2(file_open, FILE_OPEN,
> >>> +       PTR_TO_STRUCT_FILE, struct file *, file,
> >>> +       PTR_TO_STRUCT_CRED, const struct cred *, cred
> >>> +)
> >>> teaching verifier to recognize struct file, cred, sockaddr
> >>> will let bpf program access them naturally without any overhead.
> >>> Though:
> >>> @@ -102,6 +102,9 @@ enum bpf_prog_type {
> >>>         BPF_PROG_TYPE_SCHED_CLS,
> >>>         BPF_PROG_TYPE_SCHED_ACT,
> >>>         BPF_PROG_TYPE_TRACEPOINT,
> >>> +       BPF_PROG_TYPE_LANDLOCK_FILE_OPEN,
> >>> +       BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION,
> >>> +       BPF_PROG_TYPE_LANDLOCK_MMAP_FILE,
> >>>  };
> >>> is a bit of overkill.
> >>> I think it would be cleaner to have single
> >>> BPF_PROG_TYPE_LSM and at program load time pass
> >>> lsm_hook_id as well, so that verifier can do safety checks
> >>> based on type info provided in LANDLOCK_HOOKs
> >>
> >> I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF
> >> verifier check programs according to their types. If we need to check
> >> specific context value types (e.g. PTR_TO_STRUCT_FILE), we need a
> >> dedicated program types. I don't see any other way to do it with the
> >> current verifier code. Moreover it's the purpose of program types, right?
> > 
> > Adding new bpf program type for every lsm hook is not acceptable.
> > Either do one new program type + pass lsm_hook_id as suggested
> > or please come up with an alternative approach.
> 
> OK, so we have to modify the verifier to not only rely on the program
> type but on another value to check the context accesses. Do you have a
> hint from where this value could come from? Do we need to add a new bpf
> command to associate a program to a subtype?

It's another field prog_subtype (or prog_hook_id) in union bpf_attr.
Both prog_type and prog_hook_id are used during verification.
prog_type distinguishes the main aspects whereas prog_hook_id selects
which lsm_hook's argument definition to apply.
At the time of attaching to a hook, the prog_hook_id passed at
load time should match the lsm's hook_id.
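
A rough userspace model of that two-level check, to make the idea
concrete. This is a sketch only: struct prog_attr, hook_args,
ctx_access_ok and attach_ok are hypothetical names, not existing kernel
symbols, and the argument list given for file_permission is an assumption
made for the illustration. Only the file_open argument types come from
the LANDLOCK_HOOK2 definition quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HOOK_ARGS 2

/* Hypothetical stand-ins for the program type and the lsm hook ids. */
enum prog_type { PROG_TYPE_OTHER, PROG_TYPE_LSM };
enum hook_id { HOOK_FILE_OPEN, HOOK_FILE_PERMISSION, HOOK_MAX };
enum arg_type { ARG_NONE, ARG_SCALAR, PTR_TO_STRUCT_FILE, PTR_TO_STRUCT_CRED };

/* Per-hook argument definitions, selected by the subtype (hook id). */
static const enum arg_type hook_args[HOOK_MAX][MAX_HOOK_ARGS] = {
	[HOOK_FILE_OPEN]       = { PTR_TO_STRUCT_FILE, PTR_TO_STRUCT_CRED },
	[HOOK_FILE_PERMISSION] = { PTR_TO_STRUCT_FILE, ARG_SCALAR },
};

/* The two values given at load time: one program type plus a subtype. */
struct prog_attr {
	enum prog_type prog_type;
	unsigned int prog_subtype;	/* hook id chosen by the loader */
};

/* Verification time: may the program use context argument arg as wanted? */
static int ctx_access_ok(const struct prog_attr *attr, unsigned int arg,
			 enum arg_type wanted)
{
	assert(attr != NULL);
	if (attr->prog_type != PROG_TYPE_LSM)
		return 0;
	if (attr->prog_subtype >= HOOK_MAX || arg >= MAX_HOOK_ARGS)
		return 0;
	return hook_args[attr->prog_subtype][arg] == wanted;
}

/* Attach time: the subtype given at load time must match the target hook. */
static int attach_ok(const struct prog_attr *attr, enum hook_id target)
{
	assert(attr != NULL);
	return attr->prog_type == PROG_TYPE_LSM &&
	       attr->prog_subtype == (unsigned int)target;
}
```

With this split, adding a hook means adding one row to hook_args rather
than a new entry in enum bpf_prog_type.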

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups (netfilter match)
  2016-08-27 14:19         ` [RFC v2 09/10] landlock: Handle cgroups (netfilter match) Mickaël Salaün
@ 2016-08-27 18:32           ` Alexei Starovoitov
  0 siblings, 0 replies; 66+ messages in thread
From: Alexei Starovoitov @ 2016-08-27 18:32 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Daniel Borkmann, Daniel Mack, David S . Miller, Kees Cook,
	Sargun Dhillon, kernel-hardening, linux-api,
	linux-security-module, netdev, Tejun Heo, cgroups

On Sat, Aug 27, 2016 at 04:19:05PM +0200, Mickaël Salaün wrote:
> 
> On 27/08/2016 01:05, Alexei Starovoitov wrote:
> > On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> >> To sum up, there is four related patchsets:
> >> * "Landlock LSM: Unprivileged sandboxing" (this series)
> >> * "Add Checmate, BPF-driven minor LSM" (Sargun Dhillon)
> >> * "Networking cgroup controller" (Anoop Naravaram)
> >> * "Add eBPF hooks for cgroups" (Daniel Mack)
> 
> >>> Anoop Naravaram's use case is to control the ports the applications
> >>> under cgroup can bind and listen on.
> >>> Such use case can be solved by such 'lsm cgroup controller' by
> >>> attaching bpf program to security_socket_bind lsm hook and
> >>> filtering sockaddr.
> >>> Furthermore Sargun's use case is to allow further sockaddr rewrites
> >>> from the bpf program which can be done as natural extension
> >>> of such mechanism.
> >>>
> >>> If I understood Daniel's Anoop's Sargun's and yours use cases
> >>> correctly the common piece of kernel infrastructure that can solve
> >>> them all can start from Daniel's current set of patches that
> >>> establish a mechanism of attaching bpf program to a cgroup.
> >>> Then adding lsm hooks to it and later allowing argument rewrite
> >>> (since they're already in the kernel and no ToCToU problems exist)
> 
> >> For the network-related series, I think it make more sense to simply
> >> create a netfilter rule matching a cgroup and then add more features to
> >> netfilter (restrict port ranges and so on) thanks to eBPF programs.
> >> Containers are (usually) in a dedicated network namespace, which open
> >> the possibility to not only rely on cgroups (e.g. match UID,
> >> netmask...). It would also be more flexible to be able to load a BPF
> >> program in netfilter and update its maps on the fly to make dynamic
> >> rules, like ipset does, but in a more generic way.
> 
> What do the netdev folks think about this design?

such design doesn't scale when used for container management and
that's what we need to solve.
netns has its overhead and management issues. There are proposals to
solve that but that is orthogonal to containers in general.
A lot of them don't use netns.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups (performance)
  2016-08-27 18:06           ` Alexei Starovoitov
@ 2016-08-27 19:35             ` Mickaël Salaün
  2016-08-27 20:43               ` Alexei Starovoitov
  0 siblings, 1 reply; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-27 19:35 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Daniel Borkmann, Daniel Mack, David S . Miller, Kees Cook,
	Sargun Dhillon, kernel-hardening, linux-api,
	linux-security-module, netdev, Tejun Heo, cgroups


[-- Attachment #1.1: Type: text/plain, Size: 4801 bytes --]


On 27/08/2016 20:06, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>>
>> On 27/08/2016 01:05, Alexei Starovoitov wrote:
>>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
>>>>
>>>>>
>>>>> - I don't think such 'for' loop can scale. The solution needs to work
>>>>> with thousands of containers and thousands of cgroups.
>>>>> In the patch 06/10 the proposal is to use 'current' as holder of
>>>>> the programs:
>>>>> +   for (prog = current->seccomp.landlock_prog;
>>>>> +                   prog; prog = prog->prev) {
>>>>> +           if (prog->filter == landlock_ret->filter) {
>>>>> +                   cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
>>>>> +                   break;
>>>>> +           }
>>>>> +   }
>>>>> imo that's the root of scalability issue.
>>>>> I think to be able to scale the bpf programs have to be attached to
>>>>> cgroups instead of tasks.
>>>>> That would be very different api. seccomp doesn't need to be touched.
>>>>> But that is the only way I see to be able to scale.
>>>>
>>>> Landlock is inspired from seccomp which also use a BPF program per
>>>> thread. For seccomp, each BPF programs are executed for each syscall.
>>>> For Landlock, some BPF programs are executed for some LSM hooks. I don't
>>>> see why it is a scale issue for Landlock comparing to seccomp. I also
>>>> don't see why storing the BPF program list pointer in the cgroup struct
>>>> instead of the task struct change a lot here. The BPF programs execution
>>>> will be the same anyway (for each LSM hook). Kees should probably have a
>>>> better opinion on this.
>>>
>>> seccomp has its own issues and copying them doesn't make this lsm any better.
>>> Like seccomp bpf programs are all gigantic switch statement that looks
>>> for interesting syscall numbers. All syscalls of a task are paying
>>> non-trivial seccomp penalty due to such design. If bpf was attached per
>>> syscall it would have been much cheaper. Of course doing it this way
>>> for seccomp is not easy, but for lsm such facility is already there.
>>> Blank call of a single bpf prog for all lsm hooks is unnecessary
>>> overhead that can and should be avoided.
>>
>> It's probably a misunderstanding. Contrary to seccomp which run all the
>> thread's BPF programs for any syscall, Landlock only run eBPF programs
>> for the triggered LSM hooks, if their type match. Indeed, thanks to the
>> multiple eBPF program types and contrary to seccomp, Landlock only run
>> an eBPF program when needed. Landlock will have almost no performance
>> overhead if the syscalls do not trigger the watched LSM hooks for the
>> current process.
> 
> that's not what I see in the patch 06/10:
> all lsm_hooks in 'static struct security_hook_list landlock_hooks'
> (which eventually means all lsm hooks) will call
> static inline int landlock_hook_##NAME
> which will call landlock_run_prog()
> which does:
> + for (landlock_ret = current->seccomp.landlock_ret;
> +      landlock_ret; landlock_ret = landlock_ret->prev) {
> +    if (landlock_ret->triggered) {
> +       ctx.cookie = landlock_ret->cookie;
> +       for (prog = current->seccomp.landlock_prog;
> +            prog; prog = prog->prev) {
> +               if (prog->filter == landlock_ret->filter) {
> +                       cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
> +                       break;
> +               }
> +       }
> 
> that is unacceptable overhead and not a scalable design.
> It kinda works for 3 lsm_hooks as in patch 6, but doesn't scale
> as soon as more lsm hooks are added.

Good catch! I forgot to check the program (sub)type in the loop to only
run the needed programs for the current hook. I will fix this.


> 
>> As said above, Landlock will not run an eBPF programs when not strictly
>> needed. Attaching to a cgroup will have the same performance impact as
>> attaching to a process hierarchy.
> 
> Having a prog per cgroup per lsm_hook is the only scalable way I
> could come up with. If you see another way, please propose.
> current->seccomp.landlock_prog is not the answer.

Hum, I don't see the difference, from a performance point of view, between
a cgroup-based and a process hierarchy-based system.

Maybe a better option would be to use an array of pointers with N
entries, one for each supported hook, instead of a single pointer list?
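
To make the array idea concrete, here is a minimal userspace sketch. It
assumes one singly linked list of programs per hook; struct hook_progs,
hook_progs_add and hook_progs_run are hypothetical names, and the run
callback only stands in for BPF_PROG_RUN on a loaded program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ids for the three hooks handled in patch 06/10. */
enum hook_id { HOOK_FILE_OPEN, HOOK_FILE_PERMISSION, HOOK_MMAP_FILE, HOOK_MAX };

struct prog_node {
	int (*run)(void *ctx);		/* stand-in for BPF_PROG_RUN */
	struct prog_node *prev;		/* previously loaded program, same hook */
};

/* One list head per supported hook instead of a single list for all hooks. */
struct hook_progs {
	struct prog_node *last[HOOK_MAX];
};

static void hook_progs_add(struct hook_progs *hp, enum hook_id hook,
			   struct prog_node *node)
{
	assert((unsigned int)hook < HOOK_MAX);
	node->prev = hp->last[hook];
	hp->last[hook] = node;
}

/*
 * Run only the programs registered for this hook. Returns 0 if all of
 * them allow the access, or the first non-zero value returned.
 */
static int hook_progs_run(const struct hook_progs *hp, enum hook_id hook,
			  void *ctx)
{
	const struct prog_node *node;

	assert((unsigned int)hook < HOOK_MAX);
	for (node = hp->last[hook]; node; node = node->prev) {
		int ret = node->run(ctx);

		if (ret)
			return ret;
	}
	return 0;
}

/* Two toy programs: each counts its invocation in *ctx, then decides. */
static int prog_allow(void *ctx)
{
	++*(int *)ctx;
	return 0;
}

static int prog_deny(void *ctx)
{
	++*(int *)ctx;
	return -1;
}
```

A hook with no registered program then costs one pointer load and a NULL
test, whatever the number of programs attached to the other hooks.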

Anyway, being able to attach an LSM hook program to a cgroup thanks to
the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
to use a process hierarchy). The downside will be having to handle an LSM
hook program which is not triggered by a seccomp filter, but this is
needed anyway to handle interruptions.



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups (program types)
  2016-08-27 18:19           ` Alexei Starovoitov
@ 2016-08-27 19:55             ` Mickaël Salaün
  2016-08-27 20:56               ` Alexei Starovoitov
  0 siblings, 1 reply; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-27 19:55 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Daniel Borkmann, Daniel Mack, David S . Miller, Kees Cook,
	Sargun Dhillon, kernel-hardening, linux-api,
	linux-security-module, netdev, Tejun Heo


[-- Attachment #1.1: Type: text/plain, Size: 2645 bytes --]


On 27/08/2016 20:19, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 04:34:55PM +0200, Mickaël Salaün wrote:
>>
>> On 27/08/2016 01:05, Alexei Starovoitov wrote:
>>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
>>>
>>>>> As far as safety and type checking that bpf programs has to do,
>>>>> I like the approach of patch 06/10:
>>>>> +LANDLOCK_HOOK2(file_open, FILE_OPEN,
>>>>> +       PTR_TO_STRUCT_FILE, struct file *, file,
>>>>> +       PTR_TO_STRUCT_CRED, const struct cred *, cred
>>>>> +)
>>>>> teaching verifier to recognize struct file, cred, sockaddr
>>>>> will let bpf program access them naturally without any overhead.
>>>>> Though:
>>>>> @@ -102,6 +102,9 @@ enum bpf_prog_type {
>>>>>         BPF_PROG_TYPE_SCHED_CLS,
>>>>>         BPF_PROG_TYPE_SCHED_ACT,
>>>>>         BPF_PROG_TYPE_TRACEPOINT,
>>>>> +       BPF_PROG_TYPE_LANDLOCK_FILE_OPEN,
>>>>> +       BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION,
>>>>> +       BPF_PROG_TYPE_LANDLOCK_MMAP_FILE,
>>>>>  };
>>>>> is a bit of overkill.
>>>>> I think it would be cleaner to have single
>>>>> BPF_PROG_TYPE_LSM and at program load time pass
>>>>> lsm_hook_id as well, so that verifier can do safety checks
>>>>> based on type info provided in LANDLOCK_HOOKs
>>>>
>>>> I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF
>>>> verifier check programs according to their types. If we need to check
>>>> specific context value types (e.g. PTR_TO_STRUCT_FILE), we need a
>>>> dedicated program types. I don't see any other way to do it with the
>>>> current verifier code. Moreover it's the purpose of program types, right?
>>>
>>> Adding new bpf program type for every lsm hook is not acceptable.
>>> Either do one new program type + pass lsm_hook_id as suggested
>>> or please come up with an alternative approach.
>>
>> OK, so we have to modify the verifier to not only rely on the program
>> type but on another value to check the context accesses. Do you have a
>> hint from where this value could come from? Do we need to add a new bpf
>> command to associate a program to a subtype?
> 
> It's another field prog_subtype (or prog_hook_id) in union bpf_attr.
> Both prog_type and prog_hook_id are used during verification.
> prog_type distinguishes the main aspects whereas prog_hook_id selects
> which lsm_hook's argument definition to apply.
> At the time of attaching to a hook, the prog_hook_id passed at the
> load time should match lsm's hook_id.

OK, so this new prog_subtype field should be used/set by a new bpf_cmd,
right? Something like BPF_PROG_SUBTYPE or BPF_PROG_METADATA?



* Re: [RFC v2 09/10] landlock: Handle cgroups (performance)
  2016-08-27 19:35             ` Mickaël Salaün
@ 2016-08-27 20:43               ` Alexei Starovoitov
  2016-08-27 21:14                 ` Mickaël Salaün
  0 siblings, 1 reply; 66+ messages in thread
From: Alexei Starovoitov @ 2016-08-27 20:43 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Daniel Borkmann, Daniel Mack, David S . Miller, Kees Cook,
	Sargun Dhillon, kernel-hardening, linux-api,
	linux-security-module, netdev, Tejun Heo, cgroups

On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
> 
> On 27/08/2016 20:06, Alexei Starovoitov wrote:
> > On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
> >>
> >> On 27/08/2016 01:05, Alexei Starovoitov wrote:
> >>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> >>>>
> >>>>>
> >>>>> - I don't think such 'for' loop can scale. The solution needs to work
> >>>>> with thousands of containers and thousands of cgroups.
> >>>>> In the patch 06/10 the proposal is to use 'current' as holder of
> >>>>> the programs:
> >>>>> +   for (prog = current->seccomp.landlock_prog;
> >>>>> +                   prog; prog = prog->prev) {
> >>>>> +           if (prog->filter == landlock_ret->filter) {
> >>>>> +                   cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
> >>>>> +                   break;
> >>>>> +           }
> >>>>> +   }
> >>>>> imo that's the root of scalability issue.
> >>>>> I think to be able to scale the bpf programs have to be attached to
> >>>>> cgroups instead of tasks.
> >>>>> That would be very different api. seccomp doesn't need to be touched.
> >>>>> But that is the only way I see to be able to scale.
> >>>>
> >>>> Landlock is inspired from seccomp which also use a BPF program per
> >>>> thread. For seccomp, each BPF programs are executed for each syscall.
> >>>> For Landlock, some BPF programs are executed for some LSM hooks. I don't
> >>>> see why it is a scale issue for Landlock comparing to seccomp. I also
> >>>> don't see why storing the BPF program list pointer in the cgroup struct
> >>>> instead of the task struct change a lot here. The BPF programs execution
> >>>> will be the same anyway (for each LSM hook). Kees should probably have a
> >>>> better opinion on this.
> >>>
> >>> seccomp has its own issues and copying them doesn't make this lsm any better.
> >>> Like seccomp bpf programs are all gigantic switch statement that looks
> >>> for interesting syscall numbers. All syscalls of a task are paying
> >>> non-trivial seccomp penalty due to such design. If bpf was attached per
> >>> syscall it would have been much cheaper. Of course doing it this way
> >>> for seccomp is not easy, but for lsm such facility is already there.
> >>> Blank call of a single bpf prog for all lsm hooks is unnecessary
> >>> overhead that can and should be avoided.
> >>
> >> It's probably a misunderstanding. Contrary to seccomp which run all the
> >> thread's BPF programs for any syscall, Landlock only run eBPF programs
> >> for the triggered LSM hooks, if their type match. Indeed, thanks to the
> >> multiple eBPF program types and contrary to seccomp, Landlock only run
> >> an eBPF program when needed. Landlock will have almost no performance
> >> overhead if the syscalls do not trigger the watched LSM hooks for the
> >> current process.
> > 
> > that's not what I see in the patch 06/10:
> > all lsm_hooks in 'static struct security_hook_list landlock_hooks'
> > (which eventually means all lsm hooks) will call
> > static inline int landlock_hook_##NAME
> > which will call landlock_run_prog()
> > which does:
> > + for (landlock_ret = current->seccomp.landlock_ret;
> > +      landlock_ret; landlock_ret = landlock_ret->prev) {
> > +    if (landlock_ret->triggered) {
> > +       ctx.cookie = landlock_ret->cookie;
> > +       for (prog = current->seccomp.landlock_prog;
> > +            prog; prog = prog->prev) {
> > +               if (prog->filter == landlock_ret->filter) {
> > +                       cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
> > +                       break;
> > +               }
> > +       }
> > 
> > that is unacceptable overhead and not a scalable design.
> > It kinda works for 3 lsm_hooks as in patch 6, but doesn't scale
> > as soon as more lsm hooks are added.
> 
> Good catch! I forgot to check the program (sub)type in the loop to only
> run the needed programs for the current hook. I will fix this.
> 
> 
> > 
> >> As said above, Landlock will not run an eBPF programs when not strictly
> >> needed. Attaching to a cgroup will have the same performance impact as
> >> attaching to a process hierarchy.
> > 
> > Having a prog per cgroup per lsm_hook is the only scalable way I
> > could come up with. If you see another way, please propose.
> > current->seccomp.landlock_prog is not the answer.
> 
> Hum, I don't see the difference from a performance point of view between
> a cgroup-based or a process hierarchy-based system.
> 
> Maybe a better option should be to use an array of pointers with N
> entries, one for each supported hook, instead of a unique pointer list?

Yes, clearly an array dereference is faster than a linked list walk.
Now the question is where to keep this prog_array[num_lsm_hooks]?
Since we cannot keep it inside task_struct, we have to allocate it
every time a task is created. What to do on fork? That
will require changes all over. Then the obvious optimization would be
to share this allocated array of prog pointers across multiple tasks...
and little by little this new facility will look like cgroup.
Hence the suggestion to put this array into cgroup from the start.
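
For illustration only, the array-versus-list point above can be sketched
in plain userspace C. This is not code from the patch series: the names
lsm_hook, hook_progs, run_from_list and run_from_array are hypothetical,
and prog_fn merely stands in for a loaded BPF program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook identifiers; the real set would come from the LSM. */
enum lsm_hook { HOOK_FILE_OPEN, HOOK_FILE_PERMISSION, HOOK_MMAP_FILE, NUM_HOOKS };

/* Stand-in for a BPF program: returns 0 to allow, nonzero to deny. */
typedef int (*prog_fn)(const void *ctx);

/* List-based layout: every hook invocation walks all attached programs. */
struct prog {
	enum lsm_hook hook;
	prog_fn run;
	struct prog *prev;
};

static int run_from_list(const struct prog *head, enum lsm_hook hook,
			 const void *ctx)
{
	for (; head; head = head->prev)
		if (head->hook == hook)
			return head->run(ctx);
	return 0; /* no program attached to this hook: allow */
}

/* Array-based layout: one slot per hook, found with a single index. */
struct hook_progs {
	prog_fn progs[NUM_HOOKS];
};

static int run_from_array(const struct hook_progs *hp, enum lsm_hook hook,
			  const void *ctx)
{
	prog_fn fn = hp->progs[hook];

	return fn ? fn(ctx) : 0; /* empty slot: allow */
}

/* Example program that denies everything. */
static int deny_open(const void *ctx)
{
	(void)ctx;
	return 1;
}
```

The array lookup costs the same however many hooks are supported, while
the list walk grows with the number of attached programs. Where the
array lives (task, shared object, or cgroup) is the open question in
this thread and is not settled by the sketch.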

> Anyway, being able to attach an LSM hook program to a cgroup thanks to
> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
> to use a process hierarchy). The downside will be to handle an LSM hook
> program which is not triggered by a seccomp-filter, but this should be
> needed anyway to handle interruptions.

What do you mean by 'not triggered by seccomp'?
You're not suggesting that this lsm has to enable seccomp to be functional?
imo that's a non-starter due to overhead.


* Re: [RFC v2 09/10] landlock: Handle cgroups (program types)
  2016-08-27 19:55             ` Mickaël Salaün
@ 2016-08-27 20:56               ` Alexei Starovoitov
  2016-08-27 21:18                 ` Mickaël Salaün
  0 siblings, 1 reply; 66+ messages in thread
From: Alexei Starovoitov @ 2016-08-27 20:56 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Daniel Borkmann, Daniel Mack, David S . Miller, Kees Cook,
	Sargun Dhillon, kernel-hardening, linux-api,
	linux-security-module, netdev, Tejun Heo

On Sat, Aug 27, 2016 at 09:55:01PM +0200, Mickaël Salaün wrote:
> 
> On 27/08/2016 20:19, Alexei Starovoitov wrote:
> > On Sat, Aug 27, 2016 at 04:34:55PM +0200, Mickaël Salaün wrote:
> >>
> >> On 27/08/2016 01:05, Alexei Starovoitov wrote:
> >>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> >>>
> >>>>> As far as safety and type checking that bpf programs has to do,
> >>>>> I like the approach of patch 06/10:
> >>>>> +LANDLOCK_HOOK2(file_open, FILE_OPEN,
> >>>>> +       PTR_TO_STRUCT_FILE, struct file *, file,
> >>>>> +       PTR_TO_STRUCT_CRED, const struct cred *, cred
> >>>>> +)
> >>>>> teaching verifier to recognize struct file, cred, sockaddr
> >>>>> will let bpf program access them naturally without any overhead.
> >>>>> Though:
> >>>>> @@ -102,6 +102,9 @@ enum bpf_prog_type {
> >>>>>         BPF_PROG_TYPE_SCHED_CLS,
> >>>>>         BPF_PROG_TYPE_SCHED_ACT,
> >>>>>         BPF_PROG_TYPE_TRACEPOINT,
> >>>>> +       BPF_PROG_TYPE_LANDLOCK_FILE_OPEN,
> >>>>> +       BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION,
> >>>>> +       BPF_PROG_TYPE_LANDLOCK_MMAP_FILE,
> >>>>>  };
> >>>>> is a bit of overkill.
> >>>>> I think it would be cleaner to have single
> >>>>> BPF_PROG_TYPE_LSM and at program load time pass
> >>>>> lsm_hook_id as well, so that verifier can do safety checks
> >>>>> based on type info provided in LANDLOCK_HOOKs
> >>>>
> >>>> I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF
> >>>> verifier check programs according to their types. If we need to check
> >>>> specific context value types (e.g. PTR_TO_STRUCT_FILE), we need a
> >>>> dedicated program types. I don't see any other way to do it with the
> >>>> current verifier code. Moreover it's the purpose of program types, right?
> >>>
> >>> Adding new bpf program type for every lsm hook is not acceptable.
> >>> Either do one new program type + pass lsm_hook_id as suggested
> >>> or please come up with an alternative approach.
> >>
> >> OK, so we have to modify the verifier to not only rely on the program
> >> type but on another value to check the context accesses. Do you have a
> >> hint from where this value could come from? Do we need to add a new bpf
> >> command to associate a program to a subtype?
> > 
> > It's another field prog_subtype (or prog_hook_id) in union bpf_attr.
> > Both prog_type and prog_hook_id are used during verification.
> > prog_type distinguishes the main aspects whereas prog_hook_id selects
> > which lsm_hook's argument definition to apply.
> > At the time of attaching to a hook, the prog_hook_id passed at the
> > load time should match lsm's hook_id.
> 
> OK, so this new prog_subtype field should be use/set by a new bpf_cmd,
> right? Something like BPF_PROG_SUBTYPE or BPF_PROG_METADATA?

No new command. It will be an optional field of the existing BPF_PROG_LOAD.
In other words, instead of replicating everything that different
bpf_prog_type-s need, we can pass this hook_id field to fine-tune
the purpose (and applicability) of the program.
And one BPF_PROG_TYPE_LANDLOCK type will cover all lsm hooks.
For example, the existing BPF_PROG_TYPE_TRACEPOINT checks the safety
for all tracepoints even though they all have different arguments.
For tracepoints it's easier, since the only difference between them
is the range of ctx access. Here we need strong type safety
of arguments and therefore need the extra hook_id at load time.
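
As a rough sketch of the idea above (one program type plus a hook_id
given at load time), the per-hook argument check could look like the
following userspace C. Everything here is an assumption made for
illustration: hook_defs, ctx_access_ok and attach_ok are hypothetical
names, not verifier API, and the argument tables only mirror the
LANDLOCK_HOOK2(file_open, ...) definition quoted earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical argument types the verifier would track per hook. */
enum arg_type { ARG_NONE = 0, PTR_TO_STRUCT_FILE, PTR_TO_STRUCT_CRED };

/* Hypothetical hook identifiers passed alongside the program type. */
enum hook_id { HOOK_FILE_OPEN, HOOK_FILE_PERMISSION, HOOK_MAX };

#define MAX_ARGS 2

struct hook_def {
	enum arg_type args[MAX_ARGS];
};

/* One table entry per hook instead of one program type per hook. */
static const struct hook_def hook_defs[HOOK_MAX] = {
	[HOOK_FILE_OPEN]       = { { PTR_TO_STRUCT_FILE, PTR_TO_STRUCT_CRED } },
	/* Second argument is a scalar mask, so no pointer type is recorded. */
	[HOOK_FILE_PERMISSION] = { { PTR_TO_STRUCT_FILE, ARG_NONE } },
};

/* Load time: may a program declared for hook_id use argument arg_idx as 'want'? */
static int ctx_access_ok(unsigned int hook_id, unsigned int arg_idx,
			 enum arg_type want)
{
	if (hook_id >= HOOK_MAX || arg_idx >= MAX_ARGS || want == ARG_NONE)
		return 0;
	return hook_defs[hook_id].args[arg_idx] == want;
}

/* Attach time: the hook_id given at load must match the hook being attached to. */
static int attach_ok(unsigned int prog_hook_id, unsigned int lsm_hook_id)
{
	return prog_hook_id < HOOK_MAX && prog_hook_id == lsm_hook_id;
}
```

With this shape, adding an LSM hook means adding a table row rather than
a new enum bpf_prog_type value.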


* Re: [RFC v2 09/10] landlock: Handle cgroups (performance)
  2016-08-27 20:43               ` Alexei Starovoitov
@ 2016-08-27 21:14                 ` Mickaël Salaün
  2016-08-28  8:13                   ` Andy Lutomirski
  0 siblings, 1 reply; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-27 21:14 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Daniel Borkmann, Daniel Mack, David S . Miller, Kees Cook,
	Sargun Dhillon, kernel-hardening, linux-api,
	linux-security-module, netdev, Tejun Heo, cgroups


On 27/08/2016 22:43, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
>> On 27/08/2016 20:06, Alexei Starovoitov wrote:
>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>>>> As said above, Landlock will not run an eBPF programs when not strictly
>>>> needed. Attaching to a cgroup will have the same performance impact as
>>>> attaching to a process hierarchy.
>>>
>>> Having a prog per cgroup per lsm_hook is the only scalable way I
>>> could come up with. If you see another way, please propose.
>>> current->seccomp.landlock_prog is not the answer.
>>
>> Hum, I don't see the difference from a performance point of view between
>> a cgroup-based or a process hierarchy-based system.
>>
>> Maybe a better option should be to use an array of pointers with N
>> entries, one for each supported hook, instead of a unique pointer list?
> 
> yes, clearly array dereference is faster than link list walk.
> Now the question is where to keep this prog_array[num_lsm_hooks] ?
> Since we cannot keep it inside task_struct, we have to allocate it.
> Every time the task is creted then. What to do on the fork? That
> will require changes all over. Then the obvious optimization would be
> to share this allocated array of prog pointers across multiple tasks...
> and little by little this new facility will look like cgroup.
> Hence the suggestion to put this array into cgroup from the start.

I see your point :)

> 
>> Anyway, being able to attach an LSM hook program to a cgroup thanks to
>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
>> to use a process hierarchy). The downside will be to handle an LSM hook
>> program which is not triggered by a seccomp-filter, but this should be
>> needed anyway to handle interruptions.
> 
> what do you mean 'not triggered by seccomp' ?
> You're not suggesting that this lsm has to enable seccomp to be functional?
> imo that's non starter due to overhead.

Yes, for now, it is triggered by a new seccomp filter return value
RET_LANDLOCK, which can take a 16-bit value called cookie. This should
not be required, but it could be useful to bind a seccomp filter security
policy to a Landlock one. Waiting for Kees's point of view…



* Re: [RFC v2 09/10] landlock: Handle cgroups (program types)
  2016-08-27 20:56               ` Alexei Starovoitov
@ 2016-08-27 21:18                 ` Mickaël Salaün
  0 siblings, 0 replies; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-27 21:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Daniel Borkmann, Daniel Mack, David S . Miller, Kees Cook,
	Sargun Dhillon, kernel-hardening, linux-api,
	linux-security-module, netdev, Tejun Heo


On 27/08/2016 22:56, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 09:55:01PM +0200, Mickaël Salaün wrote:
>>
>> On 27/08/2016 20:19, Alexei Starovoitov wrote:
>>> On Sat, Aug 27, 2016 at 04:34:55PM +0200, Mickaël Salaün wrote:
>>>>
>>>> On 27/08/2016 01:05, Alexei Starovoitov wrote:
>>>>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
>>>>>
>>>>>>> As far as safety and type checking that bpf programs has to do,
>>>>>>> I like the approach of patch 06/10:
>>>>>>> +LANDLOCK_HOOK2(file_open, FILE_OPEN,
>>>>>>> +       PTR_TO_STRUCT_FILE, struct file *, file,
>>>>>>> +       PTR_TO_STRUCT_CRED, const struct cred *, cred
>>>>>>> +)
>>>>>>> teaching verifier to recognize struct file, cred, sockaddr
>>>>>>> will let bpf program access them naturally without any overhead.
>>>>>>> Though:
>>>>>>> @@ -102,6 +102,9 @@ enum bpf_prog_type {
>>>>>>>         BPF_PROG_TYPE_SCHED_CLS,
>>>>>>>         BPF_PROG_TYPE_SCHED_ACT,
>>>>>>>         BPF_PROG_TYPE_TRACEPOINT,
>>>>>>> +       BPF_PROG_TYPE_LANDLOCK_FILE_OPEN,
>>>>>>> +       BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION,
>>>>>>> +       BPF_PROG_TYPE_LANDLOCK_MMAP_FILE,
>>>>>>>  };
>>>>>>> is a bit of overkill.
>>>>>>> I think it would be cleaner to have single
>>>>>>> BPF_PROG_TYPE_LSM and at program load time pass
>>>>>>> lsm_hook_id as well, so that verifier can do safety checks
>>>>>>> based on type info provided in LANDLOCK_HOOKs
>>>>>>
>>>>>> I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF
>>>>>> verifier check programs according to their types. If we need to check
>>>>>> specific context value types (e.g. PTR_TO_STRUCT_FILE), we need a
>>>>>> dedicated program types. I don't see any other way to do it with the
>>>>>> current verifier code. Moreover it's the purpose of program types, right?
>>>>>
>>>>> Adding new bpf program type for every lsm hook is not acceptable.
>>>>> Either do one new program type + pass lsm_hook_id as suggested
>>>>> or please come up with an alternative approach.
>>>>
>>>> OK, so we have to modify the verifier to not only rely on the program
>>>> type but on another value to check the context accesses. Do you have a
>>>> hint from where this value could come from? Do we need to add a new bpf
>>>> command to associate a program to a subtype?
>>>
>>> It's another field prog_subtype (or prog_hook_id) in union bpf_attr.
>>> Both prog_type and prog_hook_id are used during verification.
>>> prog_type distinguishes the main aspects whereas prog_hook_id selects
>>> which lsm_hook's argument definition to apply.
>>> At the time of attaching to a hook, the prog_hook_id passed at the
>>> load time should match lsm's hook_id.
>>
>> OK, so this new prog_subtype field should be use/set by a new bpf_cmd,
>> right? Something like BPF_PROG_SUBTYPE or BPF_PROG_METADATA?
> 
> No new command. It will be an optional field to existing BPF_PROG_LOAD.
> In other words instead of replicating everything that different
> bpf_prog_type-s need, we can pass this hook_id field to fine tune
> the purpose (and applicability) of the program.
> And one BPF_PROG_TYPE_LANDLOCK type will cover all lsm hooks.
> For example existing BPF_PROG_TYPE_TRACEPOINT checks the safety
> for all tracepoints while they all have different arguments.
> For tracepoints it's easier, since the only difference between them
> is the range of ctx access. Here we need strong type safety
> of arguments therefore need extra hook_id at load time.

OK, I will do it.



* Re: [RFC v2 09/10] landlock: Handle cgroups (performance)
  2016-08-27 21:14                 ` Mickaël Salaün
@ 2016-08-28  8:13                   ` Andy Lutomirski
  2016-08-28  9:42                     ` Mickaël Salaün
  0 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-28  8:13 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: kernel-hardening, Alexei Starovoitov, Tejun Heo, Sargun Dhillon,
	Network Development, Linux API, Kees Cook, LSM List,
	linux-kernel, open list:CONTROL GROUP (CGROUP),
	David S . Miller, Daniel Mack, Alexei Starovoitov,
	Daniel Borkmann

On Aug 27, 2016 11:14 PM, "Mickaël Salaün" <mic@digikod.net> wrote:
>
>
> On 27/08/2016 22:43, Alexei Starovoitov wrote:
> > On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
> >> On 27/08/2016 20:06, Alexei Starovoitov wrote:
> >>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
> >>>> As said above, Landlock will not run an eBPF programs when not strictly
> >>>> needed. Attaching to a cgroup will have the same performance impact as
> >>>> attaching to a process hierarchy.
> >>>
> >>> Having a prog per cgroup per lsm_hook is the only scalable way I
> >>> could come up with. If you see another way, please propose.
> >>> current->seccomp.landlock_prog is not the answer.
> >>
> >> Hum, I don't see the difference from a performance point of view between
> >> a cgroup-based or a process hierarchy-based system.
> >>
> >> Maybe a better option should be to use an array of pointers with N
> >> entries, one for each supported hook, instead of a unique pointer list?
> >
> > yes, clearly array dereference is faster than link list walk.
> > Now the question is where to keep this prog_array[num_lsm_hooks] ?
> > Since we cannot keep it inside task_struct, we have to allocate it.
> > Every time the task is creted then. What to do on the fork? That
> > will require changes all over. Then the obvious optimization would be
> > to share this allocated array of prog pointers across multiple tasks...
> > and little by little this new facility will look like cgroup.
> > Hence the suggestion to put this array into cgroup from the start.
>
> I see your point :)
>
> >
> >> Anyway, being able to attach an LSM hook program to a cgroup thanks to
> >> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
> >> to use a process hierarchy). The downside will be to handle an LSM hook
> >> program which is not triggered by a seccomp-filter, but this should be
> >> needed anyway to handle interruptions.
> >
> > what do you mean 'not triggered by seccomp' ?
> > You're not suggesting that this lsm has to enable seccomp to be functional?
> > imo that's non starter due to overhead.
>
> Yes, for now, it is triggered by a new seccomp filter return value
> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
> be needed but could be useful to bind a seccomp filter security policy
> with a Landlock one. Waiting for Kees's point of view…
>

I'm not Kees, but I'd be okay with that.  I still think that doing
this by process hierarchy a la seccomp will be easier to use and to
understand (which is quite important for this kind of work) than doing
it by cgroup.

A feature I've wanted to add for a while is to have an fd that
represents a seccomp layer, the idea being that you would set up your
seccomp layer (with syscall filter, landlock hooks, etc) and then you
would have a syscall to install that layer.  Then an unprivileged
sandbox manager could set up its layer and still be able to inject new
processes into it later on, no cgroups needed.

--Andy


* Re: [RFC v2 09/10] landlock: Handle cgroups
  2016-08-27 18:11           ` Alexei Starovoitov
@ 2016-08-28  8:14             ` Andy Lutomirski
  0 siblings, 0 replies; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-28  8:14 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: LSM List, Network Development, Alexei Starovoitov, Linux API,
	Sargun Dhillon, Tejun Heo, Kees Cook, David S . Miller,
	open list:CONTROL GROUP (CGROUP),
	Mickaël Salaün, Daniel Mack, linux-kernel,
	kernel-hardening, Daniel Borkmann

On Aug 27, 2016 8:12 PM, "Alexei Starovoitov"
<alexei.starovoitov@gmail.com> wrote:
>
> On Sat, Aug 27, 2016 at 12:30:36AM -0700, Andy Lutomirski wrote:
> > > cgroup is the common way to group multiple tasks.
> > > Without cgroup only parent<->child relationship will be possible,
> > > which will limit usability of such lsm to a master task that controls
> > > its children. Such api restriction would have been ok, if we could
> > > extend it in the future, but unfortunately task-centric won't allow it
> > > without creating a parallel lsm that is cgroup based.
> > > Therefore I think we have to go with cgroup-centric api and your
> > > application has to use cgroups from the start though only parent-child
> > > would have been enough.
> > > Also I don't think the kernel can afford two bpf based lsm. One task
> > > based and another cgroup based, so we have to find common ground
> > > that suits both use cases.
> > > Having unprivliged access is a subset. There is no strong reason why
> > > cgroup+lsm+bpf should be limited to root only always.
> > > When we can guarantee no pointer leaks, we can allow unpriv.
> >
> > I don't really understand what you mean.  In the context of landlock,
> > which is a *sandbox*, can one of you explain a use case that
> > materially benefits from this type of cgroup usage?  I haven't thought
> > of one.
>
> In case of seccomp-like sandbox where parent controls child processes
> cgroup is not needed. It's needed when container management software
> needs to control a set of applications. If we can have one bpf-based lsm
> that works via cgroup and without, I'd be fine with it. Right now
> I haven't seen a plausible proposal to do that. Therefore cgroup based
> api is a common api that works for sandbox as well, though requiring
> parent to create a cgroup just to control a single child is cumbersome.
>

I don't believe that a common API can work to accomplish your goal.
For privileged container management, the manager is trusted.  For
unprivileged sandboxing, the manager is emphatically not trusted,
which means you need special rules like NO_NEW_PRIVS, and, unless you
want to start restricting setuid and such in some cgroups, you really
do need a different interface for joining the sandbox than whatever
the container manager is using.

What could make sense is to have one BPF-based LSM that supports both
a seccomp-like unprivileged interface and a cgroup-based privileged
interface.  Most of the code for it is the BPF part anyway -- all that
the cgroup or seccomp part needs to do is to figure out which BPF
program(s) to call.

Also, for container management software, you don't really need
everything tied to cgroup -- you just need a way to cleanly add new
processes to the same security context.
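
To make the "one BPF-based LSM, two interfaces" idea above concrete,
here is a minimal userspace sketch. It assumes a hypothetical task_ctx
that can carry both a hierarchy-inherited program set and a cgroup
program set; none of these names (prog_set, task_ctx, run_set, lsm_hook)
come from the patches, and the ordering between the two sets is an
arbitrary choice made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a BPF program: returns 0 to allow, nonzero to deny. */
typedef int (*prog_fn)(const void *ctx);

struct prog_set {
	prog_fn progs[4];
	size_t count;
};

/* Hypothetical per-task view: either pointer may be NULL. */
struct task_ctx {
	const struct prog_set *task_progs;   /* seccomp-like, unprivileged */
	const struct prog_set *cgroup_progs; /* cgroup-based, privileged */
};

/* Run every program in a set; the first denial wins. */
static int run_set(const struct prog_set *set, const void *ctx)
{
	size_t i;

	if (!set)
		return 0;
	for (i = 0; i < set->count; i++) {
		int ret = set->progs[i](ctx);

		if (ret)
			return ret;
	}
	return 0;
}

/* Shared hook body: only the lookup of the program sets differs per interface. */
static int lsm_hook(const struct task_ctx *t, const void *ctx)
{
	int ret = run_set(t->task_progs, ctx);

	return ret ? ret : run_set(t->cgroup_progs, ctx);
}

static int allow_all(const void *ctx) { (void)ctx; return 0; }
static int deny_all(const void *ctx)  { (void)ctx; return 1; }
```

The rules for joining each kind of context (NO_NEW_PRIVS and so on) are
deliberately left out; the sketch only shows that the program-running
part can be common.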


* Re: [RFC v2 09/10] landlock: Handle cgroups (performance)
  2016-08-28  8:13                   ` Andy Lutomirski
@ 2016-08-28  9:42                     ` Mickaël Salaün
  2016-08-30 18:55                       ` Andy Lutomirski
  0 siblings, 1 reply; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-28  9:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: kernel-hardening, Alexei Starovoitov, Tejun Heo, Sargun Dhillon,
	Network Development, Linux API, Kees Cook, LSM List,
	linux-kernel, open list:CONTROL GROUP (CGROUP),
	David S . Miller, Daniel Mack, Alexei Starovoitov,
	Daniel Borkmann



On 28/08/2016 10:13, Andy Lutomirski wrote:
> On Aug 27, 2016 11:14 PM, "Mickaël Salaün" <mic@digikod.net> wrote:
>>
>>
>> On 27/08/2016 22:43, Alexei Starovoitov wrote:
>>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
>>>> On 27/08/2016 20:06, Alexei Starovoitov wrote:
>>>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>>>>>> As said above, Landlock will not run an eBPF programs when not strictly
>>>>>> needed. Attaching to a cgroup will have the same performance impact as
>>>>>> attaching to a process hierarchy.
>>>>>
>>>>> Having a prog per cgroup per lsm_hook is the only scalable way I
>>>>> could come up with. If you see another way, please propose.
>>>>> current->seccomp.landlock_prog is not the answer.
>>>>
>>>> Hum, I don't see the difference from a performance point of view between
>>>> a cgroup-based or a process hierarchy-based system.
>>>>
>>>> Maybe a better option should be to use an array of pointers with N
>>>> entries, one for each supported hook, instead of a unique pointer list?
>>>
>>> yes, clearly array dereference is faster than link list walk.
>>> Now the question is where to keep this prog_array[num_lsm_hooks] ?
>>> Since we cannot keep it inside task_struct, we have to allocate it.
>>> Every time the task is creted then. What to do on the fork? That
>>> will require changes all over. Then the obvious optimization would be
>>> to share this allocated array of prog pointers across multiple tasks...
>>> and little by little this new facility will look like cgroup.
>>> Hence the suggestion to put this array into cgroup from the start.
>>
>> I see your point :)
>>
>>>
>>>> Anyway, being able to attach an LSM hook program to a cgroup thanks to
>>>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
>>>> to use a process hierarchy). The downside will be to handle an LSM hook
>>>> program which is not triggered by a seccomp-filter, but this should be
>>>> needed anyway to handle interruptions.
>>>
>>> what do you mean 'not triggered by seccomp' ?
>>> You're not suggesting that this lsm has to enable seccomp to be functional?
>>> imo that's non starter due to overhead.
>>
>> Yes, for now, it is triggered by a new seccomp filter return value
>> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
>> be needed but could be useful to bind a seccomp filter security policy
>> with a Landlock one. Waiting for Kees's point of view…
>>
> 
> I'm not Kees, but I'd be okay with that.  I still think that doing
> this by process hierarchy a la seccomp will be easier to use and to
> understand (which is quite important for this kind of work) than doing
> it by cgroup.
> 
> A feature I've wanted to add for a while is to have an fd that
> represents a seccomp layer, the idea being that you would set up your
> seccomp layer (with syscall filter, landlock hooks, etc) and then you
> would have a syscall to install that layer.  Then an unprivileged
> sandbox manager could set up its layer and still be able to inject new
> processes into it later on, no cgroups needed.

A nice thing I didn't highlight about Landlock is that a process can
prepare a layer of rules (arraymap of handles + Landlock programs) and
pass the file descriptors of the Landlock programs to another process.
This process could then apply these programs to get sandboxed. However,
for now, because a Landlock program is only triggered by a seccomp
filter (which does not follow the Landlock programs as an FD), they will
be useless.

The FD referring to an arraymap of handles can also be used to update a
map and change the behavior of a Landlock program. A master process can
then add or remove restrictions to another process hierarchy on the fly.

However, I think it would make more sense to use cgroups if we want to
move an existing (unwilling) unsandboxed process into a sandboxed
environment. Of course, some more no_new_privs checks would be needed.
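
The map-update behavior described above can be illustrated with a small
userspace model. It assumes a hypothetical fixed-size handle_map shared
between a manager and a checking program; map_update and check_access
are made-up names standing in for an eBPF map update and a Landlock
program, and -1 stands in for an error such as -EPERM.

```c
#include <assert.h>
#include <stddef.h>

#define MAP_SIZE 4

/* Hypothetical stand-in for an arraymap of handles: nonzero means allowed. */
struct handle_map {
	int allowed[MAP_SIZE];
};

/* Manager side: change one entry through its reference to the map. */
static int map_update(struct handle_map *m, size_t idx, int allowed)
{
	if (idx >= MAP_SIZE)
		return -1;
	m->allowed[idx] = allowed;
	return 0;
}

/* Program side: the same code gives a different answer once the map changes. */
static int check_access(const struct handle_map *m, size_t handle)
{
	return (handle < MAP_SIZE && m->allowed[handle]) ? 0 : -1;
}
```

The program itself is never reloaded here: only the data it consults
changes, which is what lets a master process tighten or relax
restrictions on the fly.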



* Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
  2016-08-25 10:32 [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (11 preceding siblings ...)
  2016-08-27  7:40 ` Andy Lutomirski
@ 2016-08-30 16:06 ` Andy Lutomirski
  2016-08-30 19:51   ` Mickaël Salaün
  2016-09-15  9:19 ` Pavel Machek
  13 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-30 16:06 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Will Drewry, kernel-hardening,
	Linux API, LSM List, Network Development

On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote:
> Hi,
>
> This series is a proof of concept to fill some missing part of seccomp as the
> ability to check syscall argument pointers or creating more dynamic security
> policies. The goal of this new stackable Linux Security Module (LSM) called
> Landlock is to allow any process, including unprivileged ones, to create
> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
> OpenBSD Pledge. This kind of sandbox help to mitigate the security impact of
> bugs or unexpected/malicious behaviors in userland applications.

Mickaël, will you be at KS and/or LPC?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups (performance)
  2016-08-28  9:42                     ` Mickaël Salaün
@ 2016-08-30 18:55                       ` Andy Lutomirski
  2016-08-30 20:20                         ` Mickaël Salaün
  0 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-30 18:55 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: kernel-hardening, Alexei Starovoitov, Tejun Heo, Sargun Dhillon,
	Network Development, Linux API, Kees Cook, LSM List,
	linux-kernel, open list:CONTROL GROUP (CGROUP),
	David S . Miller, Daniel Mack, Alexei Starovoitov,
	Daniel Borkmann

On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün <mic@digikod.net> wrote:
>
>
> On 28/08/2016 10:13, Andy Lutomirski wrote:
>> On Aug 27, 2016 11:14 PM, "Mickaël Salaün" <mic@digikod.net> wrote:
>>>
>>>
>>> On 27/08/2016 22:43, Alexei Starovoitov wrote:
>>>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
>>>>> On 27/08/2016 20:06, Alexei Starovoitov wrote:
>>>>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>>>>>>> As said above, Landlock will not run an eBPF programs when not strictly
>>>>>>> needed. Attaching to a cgroup will have the same performance impact as
>>>>>>> attaching to a process hierarchy.
>>>>>>
>>>>>> Having a prog per cgroup per lsm_hook is the only scalable way I
>>>>>> could come up with. If you see another way, please propose.
>>>>>> current->seccomp.landlock_prog is not the answer.
>>>>>
>>>>> Hum, I don't see the difference from a performance point of view between
>>>>> a cgroup-based or a process hierarchy-based system.
>>>>>
>>>>> Maybe a better option should be to use an array of pointers with N
>>>>> entries, one for each supported hook, instead of a unique pointer list?
>>>>
>>>> yes, clearly array dereference is faster than link list walk.
>>>> Now the question is where to keep this prog_array[num_lsm_hooks] ?
>>>> Since we cannot keep it inside task_struct, we have to allocate it.
>>>> Every time the task is creted then. What to do on the fork? That
>>>> will require changes all over. Then the obvious optimization would be
>>>> to share this allocated array of prog pointers across multiple tasks...
>>>> and little by little this new facility will look like cgroup.
>>>> Hence the suggestion to put this array into cgroup from the start.
>>>
>>> I see your point :)
>>>
>>>>
>>>>> Anyway, being able to attach an LSM hook program to a cgroup thanks to
>>>>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
>>>>> to use a process hierarchy). The downside will be to handle an LSM hook
>>>>> program which is not triggered by a seccomp-filter, but this should be
>>>>> needed anyway to handle interruptions.
>>>>
>>>> what do you mean 'not triggered by seccomp' ?
>>>> You're not suggesting that this lsm has to enable seccomp to be functional?
>>>> imo that's non starter due to overhead.
>>>
>>> Yes, for now, it is triggered by a new seccomp filter return value
>>> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
>>> be needed but could be useful to bind a seccomp filter security policy
>>> with a Landlock one. Waiting for Kees's point of view…
>>>
>>
>> I'm not Kees, but I'd be okay with that.  I still think that doing
>> this by process hierarchy a la seccomp will be easier to use and to
>> understand (which is quite important for this kind of work) than doing
>> it by cgroup.
>>
>> A feature I've wanted to add for a while is to have an fd that
>> represents a seccomp layer, the idea being that you would set up your
>> seccomp layer (with syscall filter, landlock hooks, etc) and then you
>> would have a syscall to install that layer.  Then an unprivileged
>> sandbox manager could set up its layer and still be able to inject new
>> processes into it later on, no cgroups needed.
>
> A nice thing I didn't highlight about Landlock is that a process can
> prepare a layer of rules (arraymap of handles + Landlock programs) and
> pass the file descriptors of the Landlock programs to another process.
> This process could then apply this programs to get sandboxed. However,
> for now, because a Landlock program is only triggered by a seccomp
> filter (which do not follow the Landlock programs as a FD), they will be
> useless.
>
> The FD referring to an arraymap of handles can also be used to update a
> map and change the behavior of a Landlock program. A master process can
> then add or remove restrictions to another process hierarchy on the fly.

Maybe this could be extended a little bit.  The fd could hold the
seccomp filter *and* the LSM hook filters.  FMODE_EXECUTE could give
the ability to install it and FMODE_WRITE could give the ability to
modify it.
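
To make the split between "install" and "modify" rights concrete, here
is a minimal userspace sketch. It is not kernel code: struct
layer_handle, layer_install(), layer_add_rule() and the LAYER_FMODE_*
values are all hypothetical names made up for illustration, and the
only idea taken from the proposal above is that one mode bit gates
installing the layer and another gates changing it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mode bits, defined locally so this builds in userspace.
 * They only mimic the idea of per-fd access modes. */
#define LAYER_FMODE_WRITE   0x2u
#define LAYER_FMODE_EXECUTE 0x20u

/* Hypothetical handle: what an fd for a whole seccomp layer might carry. */
struct layer_handle {
	unsigned int f_mode;   /* access granted when the fd was obtained */
	unsigned int nr_rules; /* stand-in for the filter + hook programs */
	bool installed;        /* true once the caller applied the layer */
};

/* Installing the layer on the caller requires the "execute" right. */
int layer_install(struct layer_handle *h)
{
	assert(h);
	if (!(h->f_mode & LAYER_FMODE_EXECUTE))
		return -EACCES;
	h->installed = true;
	return 0;
}

/* Changing the rules requires the "write" right. */
int layer_add_rule(struct layer_handle *h)
{
	assert(h);
	if (!(h->f_mode & LAYER_FMODE_WRITE))
		return -EACCES;
	h->nr_rules++;
	return 0;
}
```

With this shape, a sandbox manager could keep a handle carrying the
write bit and pass an execute-only handle to the processes it wants to
confine, so that they can enter the layer but not relax it.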

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 06/10] landlock: Add LSM hooks
  2016-08-25 10:32 ` [RFC v2 06/10] landlock: Add LSM hooks Mickaël Salaün
@ 2016-08-30 18:56   ` Andy Lutomirski
  2016-08-30 20:10     ` Mickaël Salaün
  0 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-30 18:56 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Serge E. Hallyn, David Drysdale, kernel-hardening,
	Alexei Starovoitov, James Morris, Sargun Dhillon,
	Network Development, Casey Schaufler, Linux API, Kees Cook,
	LSM List, linux-kernel, David S . Miller, Daniel Mack,
	Arnd Bergmann, Will Drewry, Paul Moore, Elena Reshetova,
	Daniel Borkmann

On Aug 25, 2016 12:34 PM, "Mickaël Salaün" <mic@digikod.net> wrote:
>
> Add LSM hooks which can be used by userland through Landlock (eBPF)
> programs. This programs are limited to a whitelist of functions (cf.
> next commit). The eBPF program context is depicted by the struct
> landlock_data (cf. include/uapi/linux/bpf.h):
> * hook: LSM hook ID (useful when using the same program for multiple LSM
>   hooks);
> * cookie: the 16-bit value from the seccomp filter that triggered this
>   Landlock program;
> * args[6]: array of LSM hook arguments.
>
> The LSM hook arguments can contain raw values as integers or
> (unleakable) pointers. The only way to use the pointers are to pass them
> to an eBPF function according to their types (e.g. the
> bpf_landlock_cmp_fs_beneath_with_struct_file function can use a struct
> file pointer).
>
> For now, there is three hooks for file system access control:
> * file_open;
> * file_permission;
> * mmap_file.
>

What's the purpose of exposing struct cred * to userspace?  It's
primarily just an optimization to save a bit of RAM, and it's a
dubious optimization at that.  What are you using it for?  Would it
make more sense to use struct task_struct * or struct pid * instead?

Also, exposing struct cred * has a really weird side-effect: it allows
(maybe even encourages) checking for pointer equality between two
struct cred * objects.  Doing so will have erratic results.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
  2016-08-30 16:06 ` [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Andy Lutomirski
@ 2016-08-30 19:51   ` Mickaël Salaün
  2016-08-30 19:55     ` Andy Lutomirski
  0 siblings, 1 reply; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-30 19:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development, Tejun Heo


[-- Attachment #1.1: Type: text/plain, Size: 897 bytes --]


On 30/08/2016 18:06, Andy Lutomirski wrote:
> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote:
>> Hi,
>>
>> This series is a proof of concept to fill some missing part of seccomp as the
>> ability to check syscall argument pointers or creating more dynamic security
>> policies. The goal of this new stackable Linux Security Module (LSM) called
>> Landlock is to allow any process, including unprivileged ones, to create
>> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
>> OpenBSD Pledge. This kind of sandbox help to mitigate the security impact of
>> bugs or unexpected/malicious behaviors in userland applications.
> 
> Mickaël, will you be at KS and/or LPC?
> 

I won't be at KS/LPC but I will give a talk at Kernel Recipes (Paris)
for which registration will start Thursday (and will not last long). :)

 Mickaël


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
  2016-08-30 19:51   ` Mickaël Salaün
@ 2016-08-30 19:55     ` Andy Lutomirski
  0 siblings, 0 replies; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-30 19:55 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: LKML, Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development, Tejun Heo

On Tue, Aug 30, 2016 at 12:51 PM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On 30/08/2016 18:06, Andy Lutomirski wrote:
>> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>> Hi,
>>>
>>> This series is a proof of concept to fill some missing part of seccomp as the
>>> ability to check syscall argument pointers or creating more dynamic security
>>> policies. The goal of this new stackable Linux Security Module (LSM) called
>>> Landlock is to allow any process, including unprivileged ones, to create
>>> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
>>> OpenBSD Pledge. This kind of sandbox help to mitigate the security impact of
>>> bugs or unexpected/malicious behaviors in userland applications.
>>
>> Mickaël, will you be at KS and/or LPC?
>>
>
> I won't be at KS/LPC but I will give a talk at Kernel Recipes (Paris)
> for which registration will start Thursday (and will not last long). :)

There's a teeny tiny chance I'll be there.  I've done way too much
traveling lately.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 06/10] landlock: Add LSM hooks
  2016-08-30 18:56   ` Andy Lutomirski
@ 2016-08-30 20:10     ` Mickaël Salaün
  2016-08-30 20:18       ` Andy Lutomirski
  0 siblings, 1 reply; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-30 20:10 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Serge E. Hallyn, David Drysdale, kernel-hardening,
	Alexei Starovoitov, James Morris, Sargun Dhillon,
	Network Development, Casey Schaufler, Linux API, Kees Cook,
	LSM List, linux-kernel, David S . Miller, Daniel Mack,
	Arnd Bergmann, Will Drewry, Paul Moore, Elena Reshetova,
	Daniel Borkmann


[-- Attachment #1.1: Type: text/plain, Size: 2347 bytes --]


On 30/08/2016 20:56, Andy Lutomirski wrote:
> On Aug 25, 2016 12:34 PM, "Mickaël Salaün" <mic@digikod.net> wrote:
>>
>> Add LSM hooks which can be used by userland through Landlock (eBPF)
>> programs. This programs are limited to a whitelist of functions (cf.
>> next commit). The eBPF program context is depicted by the struct
>> landlock_data (cf. include/uapi/linux/bpf.h):
>> * hook: LSM hook ID (useful when using the same program for multiple LSM
>>   hooks);
>> * cookie: the 16-bit value from the seccomp filter that triggered this
>>   Landlock program;
>> * args[6]: array of LSM hook arguments.
>>
>> The LSM hook arguments can contain raw values as integers or
>> (unleakable) pointers. The only way to use the pointers are to pass them
>> to an eBPF function according to their types (e.g. the
>> bpf_landlock_cmp_fs_beneath_with_struct_file function can use a struct
>> file pointer).
>>
>> For now, there is three hooks for file system access control:
>> * file_open;
>> * file_permission;
>> * mmap_file.
>>
> 
> What's the purpose of exposing struct cred * to userspace?  It's
> primarily just an optimization to save a bit of RAM, and it's a
> dubious optimization at that.  What are you using it for?  Would it
> make more sense to use struct task_struct * or struct pid * instead?
> 
> Also, exposing struct cred * has a really weird side-effect: it allows
> (maybe even encourages) checking for pointer equality between two
> struct cred * objects.  Doing so will have erratic results.
> 

The pointers exposed in the eBPF context are not directly readable by an
unprivileged eBPF program thanks to the strong typing of the Landlock
context and the static eBPF verification. There is no way to leak a
kernel pointer to userspace from an unprivileged eBPF program: pointer
arithmetic and comparison are prohibited. Pointers can only be passed as
arguments to dedicated eBPF functions.

For now, struct cred * is simply not used by any eBPF function and is
therefore not usable at all. It only exists here because I map the LSM
hook arguments in a generic/automatic way to the eBPF context.

I'm planning to extend the Landlock context with extra pointers,
whatever the LSM hook. We could then use task_struct, skb or any other
kernel objects, in a safe way, with dedicated functions.
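
As a rough illustration of "pointers can only be handed to a function
that expects their type", here is a userspace mock. The hook, cookie
and args[6] fields follow the description of struct landlock_data in
the patch; everything else (the mock_ names, enum arg_type, the
parallel types[] array) is an assumption made for this sketch. In the
RFC the type check is done statically by the eBPF verifier, not at
run time as below.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical argument kinds for this sketch. */
enum arg_type { ARG_NONE, ARG_RAW_INT, ARG_PTR_FILE, ARG_PTR_CRED };

/* Mock of the context: hook ID, 16-bit cookie, six hook arguments.
 * The types[] array is an assumption; it records what each slot holds. */
struct mock_landlock_data {
	uint32_t hook;
	uint16_t cookie;
	uint64_t args[6];
	enum arg_type types[6];
};

/* Hypothetical stand-in for a dedicated eBPF function: it accepts only
 * a slot typed as a struct file pointer and rejects anything else. */
int mock_helper_takes_file(const struct mock_landlock_data *ctx,
			   unsigned int idx)
{
	assert(ctx);
	if (idx >= 6)
		return -EINVAL;
	if (ctx->types[idx] != ARG_PTR_FILE)
		return -EACCES;
	return 0;
}
```

A slot typed as ARG_PTR_CRED has no helper that accepts it in this
mock, which mirrors the situation described above: the value sits in
the context but nothing can consume it.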


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 06/10] landlock: Add LSM hooks
  2016-08-30 20:10     ` Mickaël Salaün
@ 2016-08-30 20:18       ` Andy Lutomirski
  2016-08-30 20:27         ` Mickaël Salaün
  0 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-30 20:18 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Serge E. Hallyn, David Drysdale, kernel-hardening,
	Alexei Starovoitov, James Morris, Sargun Dhillon,
	Network Development, Casey Schaufler, Linux API, Kees Cook,
	LSM List, linux-kernel, David S . Miller, Daniel Mack,
	Arnd Bergmann, Will Drewry, Paul Moore, Elena Reshetova,
	Daniel Borkmann

On Tue, Aug 30, 2016 at 1:10 PM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On 30/08/2016 20:56, Andy Lutomirski wrote:
>> On Aug 25, 2016 12:34 PM, "Mickaël Salaün" <mic@digikod.net> wrote:
>>>
>>> Add LSM hooks which can be used by userland through Landlock (eBPF)
>>> programs. This programs are limited to a whitelist of functions (cf.
>>> next commit). The eBPF program context is depicted by the struct
>>> landlock_data (cf. include/uapi/linux/bpf.h):
>>> * hook: LSM hook ID (useful when using the same program for multiple LSM
>>>   hooks);
>>> * cookie: the 16-bit value from the seccomp filter that triggered this
>>>   Landlock program;
>>> * args[6]: array of LSM hook arguments.
>>>
>>> The LSM hook arguments can contain raw values as integers or
>>> (unleakable) pointers. The only way to use the pointers are to pass them
>>> to an eBPF function according to their types (e.g. the
>>> bpf_landlock_cmp_fs_beneath_with_struct_file function can use a struct
>>> file pointer).
>>>
>>> For now, there is three hooks for file system access control:
>>> * file_open;
>>> * file_permission;
>>> * mmap_file.
>>>
>>
>> What's the purpose of exposing struct cred * to userspace?  It's
>> primarily just an optimization to save a bit of RAM, and it's a
>> dubious optimization at that.  What are you using it for?  Would it
>> make more sense to use struct task_struct * or struct pid * instead?
>>
>> Also, exposing struct cred * has a really weird side-effect: it allows
>> (maybe even encourages) checking for pointer equality between two
>> struct cred * objects.  Doing so will have erratic results.
>>
>
> The pointers exposed in the ePBF context are not directly readable by an
> unprivileged eBPF program thanks to the strong typing of the Landlock
> context and the static eBPF verification. There is no way to leak a
> kernel pointer to userspace from an unprivileged eBPF program: pointer
> arithmetic and comparison are prohibited. Pointers can only be pass as
> argument to dedicated eBPF functions.

I'm not talking about leaking the value -- I'm talking about leaking
the predicate (a == b) for two struct cred pointers.  That predicate
shouldn't be available because it has very odd effects.

>
> For now, struct cred * is simply not used by any eBPF function and then
> not usable at all. It only exist here because I map the LSM hook
> arguments in a generic/automatic way to the eBPF context.

Maybe remove it from this patch set then?

--Andy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups (performance)
  2016-08-30 18:55                       ` Andy Lutomirski
@ 2016-08-30 20:20                         ` Mickaël Salaün
  2016-08-30 20:23                           ` Andy Lutomirski
  0 siblings, 1 reply; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-30 20:20 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: kernel-hardening, Alexei Starovoitov, Tejun Heo, Sargun Dhillon,
	Network Development, Linux API, Kees Cook, LSM List,
	linux-kernel, open list:CONTROL GROUP (CGROUP),
	David S . Miller, Daniel Mack, Alexei Starovoitov,
	Daniel Borkmann


[-- Attachment #1.1: Type: text/plain, Size: 4687 bytes --]


On 30/08/2016 20:55, Andy Lutomirski wrote:
> On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>
>>
>> On 28/08/2016 10:13, Andy Lutomirski wrote:
>>> On Aug 27, 2016 11:14 PM, "Mickaël Salaün" <mic@digikod.net> wrote:
>>>>
>>>>
>>>> On 27/08/2016 22:43, Alexei Starovoitov wrote:
>>>>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
>>>>>> On 27/08/2016 20:06, Alexei Starovoitov wrote:
>>>>>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>>>>>>>> As said above, Landlock will not run an eBPF programs when not strictly
>>>>>>>> needed. Attaching to a cgroup will have the same performance impact as
>>>>>>>> attaching to a process hierarchy.
>>>>>>>
>>>>>>> Having a prog per cgroup per lsm_hook is the only scalable way I
>>>>>>> could come up with. If you see another way, please propose.
>>>>>>> current->seccomp.landlock_prog is not the answer.
>>>>>>
>>>>>> Hum, I don't see the difference from a performance point of view between
>>>>>> a cgroup-based or a process hierarchy-based system.
>>>>>>
>>>>>> Maybe a better option should be to use an array of pointers with N
>>>>>> entries, one for each supported hook, instead of a unique pointer list?
>>>>>
>>>>> yes, clearly array dereference is faster than link list walk.
>>>>> Now the question is where to keep this prog_array[num_lsm_hooks] ?
>>>>> Since we cannot keep it inside task_struct, we have to allocate it.
>>>>> Every time the task is creted then. What to do on the fork? That
>>>>> will require changes all over. Then the obvious optimization would be
>>>>> to share this allocated array of prog pointers across multiple tasks...
>>>>> and little by little this new facility will look like cgroup.
>>>>> Hence the suggestion to put this array into cgroup from the start.
>>>>
>>>> I see your point :)
>>>>
>>>>>
>>>>>> Anyway, being able to attach an LSM hook program to a cgroup thanks to
>>>>>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
>>>>>> to use a process hierarchy). The downside will be to handle an LSM hook
>>>>>> program which is not triggered by a seccomp-filter, but this should be
>>>>>> needed anyway to handle interruptions.
>>>>>
>>>>> what do you mean 'not triggered by seccomp' ?
>>>>> You're not suggesting that this lsm has to enable seccomp to be functional?
>>>>> imo that's non starter due to overhead.
>>>>
>>>> Yes, for now, it is triggered by a new seccomp filter return value
>>>> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
>>>> be needed but could be useful to bind a seccomp filter security policy
>>>> with a Landlock one. Waiting for Kees's point of view…
>>>>
>>>
>>> I'm not Kees, but I'd be okay with that.  I still think that doing
>>> this by process hierarchy a la seccomp will be easier to use and to
>>> understand (which is quite important for this kind of work) than doing
>>> it by cgroup.
>>>
>>> A feature I've wanted to add for a while is to have an fd that
>>> represents a seccomp layer, the idea being that you would set up your
>>> seccomp layer (with syscall filter, landlock hooks, etc) and then you
>>> would have a syscall to install that layer.  Then an unprivileged
>>> sandbox manager could set up its layer and still be able to inject new
>>> processes into it later on, no cgroups needed.
>>
>> A nice thing I didn't highlight about Landlock is that a process can
>> prepare a layer of rules (arraymap of handles + Landlock programs) and
>> pass the file descriptors of the Landlock programs to another process.
>> This process could then apply this programs to get sandboxed. However,
>> for now, because a Landlock program is only triggered by a seccomp
>> filter (which do not follow the Landlock programs as a FD), they will be
>> useless.
>>
>> The FD referring to an arraymap of handles can also be used to update a
>> map and change the behavior of a Landlock program. A master process can
>> then add or remove restrictions to another process hierarchy on the fly.
> 
> Maybe this could be extended a little bit.  The fd could hold the
> seccomp filter *and* the LSM hook filters.  FMODE_EXECUTE could give
> the ability to install it and FMODE_WRITE could give the ability to
> modify it.
> 

This is interesting! It should be possible to append the seccomp stack
of a source process to the seccomp stack of the target process when a
Landlock program is passed and then activated through seccomp(2).

As for FMODE_EXECUTE/FMODE_WRITE, are you suggesting that the
permissions of the eBPF program FD be managed in a specific way?


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups (performance)
  2016-08-30 20:20                         ` Mickaël Salaün
@ 2016-08-30 20:23                           ` Andy Lutomirski
  2016-08-30 20:33                             ` Mickaël Salaün
  0 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-30 20:23 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: kernel-hardening, Alexei Starovoitov, Tejun Heo, Sargun Dhillon,
	Network Development, Linux API, Kees Cook, LSM List,
	linux-kernel, open list:CONTROL GROUP (CGROUP),
	David S . Miller, Daniel Mack, Alexei Starovoitov,
	Daniel Borkmann

On Tue, Aug 30, 2016 at 1:20 PM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On 30/08/2016 20:55, Andy Lutomirski wrote:
>> On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>>
>>>
>>> On 28/08/2016 10:13, Andy Lutomirski wrote:
>>>> On Aug 27, 2016 11:14 PM, "Mickaël Salaün" <mic@digikod.net> wrote:
>>>>>
>>>>>
>>>>> On 27/08/2016 22:43, Alexei Starovoitov wrote:
>>>>>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
>>>>>>> On 27/08/2016 20:06, Alexei Starovoitov wrote:
>>>>>>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>>>>>>>>> As said above, Landlock will not run an eBPF programs when not strictly
>>>>>>>>> needed. Attaching to a cgroup will have the same performance impact as
>>>>>>>>> attaching to a process hierarchy.
>>>>>>>>
>>>>>>>> Having a prog per cgroup per lsm_hook is the only scalable way I
>>>>>>>> could come up with. If you see another way, please propose.
>>>>>>>> current->seccomp.landlock_prog is not the answer.
>>>>>>>
>>>>>>> Hum, I don't see the difference from a performance point of view between
>>>>>>> a cgroup-based or a process hierarchy-based system.
>>>>>>>
>>>>>>> Maybe a better option should be to use an array of pointers with N
>>>>>>> entries, one for each supported hook, instead of a unique pointer list?
>>>>>>
>>>>>> yes, clearly array dereference is faster than link list walk.
>>>>>> Now the question is where to keep this prog_array[num_lsm_hooks] ?
>>>>>> Since we cannot keep it inside task_struct, we have to allocate it.
>>>>>> Every time the task is creted then. What to do on the fork? That
>>>>>> will require changes all over. Then the obvious optimization would be
>>>>>> to share this allocated array of prog pointers across multiple tasks...
>>>>>> and little by little this new facility will look like cgroup.
>>>>>> Hence the suggestion to put this array into cgroup from the start.
>>>>>
>>>>> I see your point :)
>>>>>
>>>>>>
>>>>>>> Anyway, being able to attach an LSM hook program to a cgroup thanks to
>>>>>>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
>>>>>>> to use a process hierarchy). The downside will be to handle an LSM hook
>>>>>>> program which is not triggered by a seccomp-filter, but this should be
>>>>>>> needed anyway to handle interruptions.
>>>>>>
>>>>>> what do you mean 'not triggered by seccomp' ?
>>>>>> You're not suggesting that this lsm has to enable seccomp to be functional?
>>>>>> imo that's non starter due to overhead.
>>>>>
>>>>> Yes, for now, it is triggered by a new seccomp filter return value
>>>>> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
>>>>> be needed but could be useful to bind a seccomp filter security policy
>>>>> with a Landlock one. Waiting for Kees's point of view…
>>>>>
>>>>
>>>> I'm not Kees, but I'd be okay with that.  I still think that doing
>>>> this by process hierarchy a la seccomp will be easier to use and to
>>>> understand (which is quite important for this kind of work) than doing
>>>> it by cgroup.
>>>>
>>>> A feature I've wanted to add for a while is to have an fd that
>>>> represents a seccomp layer, the idea being that you would set up your
>>>> seccomp layer (with syscall filter, landlock hooks, etc) and then you
>>>> would have a syscall to install that layer.  Then an unprivileged
>>>> sandbox manager could set up its layer and still be able to inject new
>>>> processes into it later on, no cgroups needed.
>>>
>>> A nice thing I didn't highlight about Landlock is that a process can
>>> prepare a layer of rules (arraymap of handles + Landlock programs) and
>>> pass the file descriptors of the Landlock programs to another process.
>>> This process could then apply this programs to get sandboxed. However,
>>> for now, because a Landlock program is only triggered by a seccomp
>>> filter (which do not follow the Landlock programs as a FD), they will be
>>> useless.
>>>
>>> The FD referring to an arraymap of handles can also be used to update a
>>> map and change the behavior of a Landlock program. A master process can
>>> then add or remove restrictions to another process hierarchy on the fly.
>>
>> Maybe this could be extended a little bit.  The fd could hold the
>> seccomp filter *and* the LSM hook filters.  FMODE_EXECUTE could give
>> the ability to install it and FMODE_WRITE could give the ability to
>> modify it.
>>
>
> This is interesting! It should be possible to append the seccomp stack
> of a source process to the seccomp stack of the target process when a
> Landlock program is passed and then activated through seccomp(2).
>
> For the FMODE_EXECUTE/FMODE_WRITE, are you suggesting to manage
> permission of the eBPF program FD in a specific way?
>

This wouldn't be an eBPF program FD -- it would be an FD encapsulating
an entire configuration including seccomp BPF program, whatever
landlock stuff is associated, and eventual seccomp monitor
configuration (once I write that code), etc.

You wouldn't say "attach this process's seccomp stack to me" -- you'd
say "attach this seccomp layer to me".

A decision that we'd have to make would be whether the FD links to the
parent layer or whether it can be attached without regard to what the
parent layer is.
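
The two options can be sketched in userspace as follows. This is only a
toy model under stated assumptions: struct toy_layer, struct toy_task,
attach_linked() and attach_floating() are hypothetical names, and a
"layer" is reduced to an id plus a parent pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy layer: an id standing in for the filters, plus its parent. */
struct toy_layer {
	const struct toy_layer *parent;
	int id;
};

/* Toy task: only remembers the innermost layer applied to it. */
struct toy_task {
	const struct toy_layer *current_layer;
};

/* Option A: the layer is linked to its parent, so it can only be
 * attached by a task that is already inside exactly that parent. */
int attach_linked(struct toy_task *t, const struct toy_layer *l)
{
	assert(t && l);
	if (l->parent != t->current_layer)
		return -EINVAL;
	t->current_layer = l;
	return 0;
}

/* Option B: the layer content (here just the id) can be stacked on
 * top of whatever the task already has; the caller supplies a fresh
 * node that records this particular attachment. */
int attach_floating(struct toy_task *t, struct toy_layer *node, int id)
{
	assert(t && node);
	node->id = id;
	node->parent = t->current_layer;
	t->current_layer = node;
	return 0;
}
```

Option A keeps a single shared tree of layers, while option B needs one
node per attachment because the same content can end up under
different parents.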

--Andy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 06/10] landlock: Add LSM hooks
  2016-08-30 20:18       ` Andy Lutomirski
@ 2016-08-30 20:27         ` Mickaël Salaün
  0 siblings, 0 replies; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-30 20:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Serge E. Hallyn, David Drysdale, kernel-hardening,
	Alexei Starovoitov, James Morris, Sargun Dhillon,
	Network Development, Casey Schaufler, Linux API, Kees Cook,
	LSM List, linux-kernel, David S . Miller, Daniel Mack,
	Arnd Bergmann, Will Drewry, Paul Moore, Elena Reshetova,
	Daniel Borkmann


[-- Attachment #1.1: Type: text/plain, Size: 2804 bytes --]


On 30/08/2016 22:18, Andy Lutomirski wrote:
> On Tue, Aug 30, 2016 at 1:10 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>
>> On 30/08/2016 20:56, Andy Lutomirski wrote:
>>> On Aug 25, 2016 12:34 PM, "Mickaël Salaün" <mic@digikod.net> wrote:
>>>>
>>>> Add LSM hooks which can be used by userland through Landlock (eBPF)
>>>> programs. This programs are limited to a whitelist of functions (cf.
>>>> next commit). The eBPF program context is depicted by the struct
>>>> landlock_data (cf. include/uapi/linux/bpf.h):
>>>> * hook: LSM hook ID (useful when using the same program for multiple LSM
>>>>   hooks);
>>>> * cookie: the 16-bit value from the seccomp filter that triggered this
>>>>   Landlock program;
>>>> * args[6]: array of LSM hook arguments.
>>>>
>>>> The LSM hook arguments can contain raw values as integers or
>>>> (unleakable) pointers. The only way to use the pointers are to pass them
>>>> to an eBPF function according to their types (e.g. the
>>>> bpf_landlock_cmp_fs_beneath_with_struct_file function can use a struct
>>>> file pointer).
>>>>
>>>> For now, there is three hooks for file system access control:
>>>> * file_open;
>>>> * file_permission;
>>>> * mmap_file.
>>>>
>>>
>>> What's the purpose of exposing struct cred * to userspace?  It's
>>> primarily just an optimization to save a bit of RAM, and it's a
>>> dubious optimization at that.  What are you using it for?  Would it
>>> make more sense to use struct task_struct * or struct pid * instead?
>>>
>>> Also, exposing struct cred * has a really weird side-effect: it allows
>>> (maybe even encourages) checking for pointer equality between two
>>> struct cred * objects.  Doing so will have erratic results.
>>>
>>
>> The pointers exposed in the ePBF context are not directly readable by an
>> unprivileged eBPF program thanks to the strong typing of the Landlock
>> context and the static eBPF verification. There is no way to leak a
>> kernel pointer to userspace from an unprivileged eBPF program: pointer
>> arithmetic and comparison are prohibited. Pointers can only be pass as
>> argument to dedicated eBPF functions.
> 
> I'm not talking about leaking the value -- I'm talking about leaking
> the predicate (a == b) for two struct cred pointers.  That predicate
> shouldn't be available because it has very odd effects.

I'm pretty sure this case is covered by the impossibility of comparing
pointers.
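
To show what "impossible" means here, below is a toy model of such a
rule. It is not the kernel verifier: enum reg_kind, enum toy_op,
struct toy_insn and toy_check_insn() are hypothetical and only capture
the policy described in this thread, namely that a pointer may be
passed to a helper call but may not take part in arithmetic or in an
equality test.

```c
#include <assert.h>
#include <errno.h>

/* What a register holds, as tracked by the toy checker. */
enum reg_kind { REG_SCALAR, REG_PTR };

/* A tiny instruction set: add, jump-if-equal, helper call. */
enum toy_op { OP_ADD, OP_JEQ, OP_CALL };

struct toy_insn {
	enum toy_op op;
	enum reg_kind dst; /* kind of the destination/left operand */
	enum reg_kind src; /* kind of the source/right operand */
};

/* Returns 0 if the instruction is allowed, -EACCES otherwise. */
int toy_check_insn(const struct toy_insn *insn)
{
	assert(insn);
	if (insn->op == OP_CALL)
		return 0; /* pointers may be handed to helpers */
	if (insn->dst == REG_PTR || insn->src == REG_PTR)
		return -EACCES; /* no pointer arithmetic, no a == b */
	return 0;
}
```

Under this rule a program that tries to branch on the equality of two
pointer-typed values is rejected when it is checked, before it ever
runs.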

> 
>>
>> For now, struct cred * is simply not used by any eBPF function and then
>> not usable at all. It only exist here because I map the LSM hook
>> arguments in a generic/automatic way to the eBPF context.
> 
> Maybe remove it from this patch set then?

Well, this is done with the LANDLOCK_HOOK* macros but I will remove it.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC v2 09/10] landlock: Handle cgroups (performance)
  2016-08-30 20:23                           ` Andy Lutomirski
@ 2016-08-30 20:33                             ` Mickaël Salaün
  2016-08-30 20:55                               ` Alexei Starovoitov
  0 siblings, 1 reply; 66+ messages in thread
From: Mickaël Salaün @ 2016-08-30 20:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: kernel-hardening, Alexei Starovoitov, Tejun Heo, Sargun Dhillon,
	Network Development, Linux API, Kees Cook, LSM List,
	linux-kernel, open list:CONTROL GROUP (CGROUP),
	David S . Miller, Daniel Mack, Alexei Starovoitov,
	Daniel Borkmann



On 30/08/2016 22:23, Andy Lutomirski wrote:
> On Tue, Aug 30, 2016 at 1:20 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>
>> On 30/08/2016 20:55, Andy Lutomirski wrote:
>>> On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>>>
>>>>
>>>> On 28/08/2016 10:13, Andy Lutomirski wrote:
>>>>> On Aug 27, 2016 11:14 PM, "Mickaël Salaün" <mic@digikod.net> wrote:
>>>>>>
>>>>>>
>>>>>> On 27/08/2016 22:43, Alexei Starovoitov wrote:
>>>>>>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
>>>>>>>> On 27/08/2016 20:06, Alexei Starovoitov wrote:
>>>>>>>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>>>>>>>>>> As said above, Landlock will not run an eBPF program when not strictly
>>>>>>>>>> needed. Attaching to a cgroup will have the same performance impact as
>>>>>>>>>> attaching to a process hierarchy.
>>>>>>>>>
>>>>>>>>> Having a prog per cgroup per lsm_hook is the only scalable way I
>>>>>>>>> could come up with. If you see another way, please propose.
>>>>>>>>> current->seccomp.landlock_prog is not the answer.
>>>>>>>>
>>>>>>>> Hum, I don't see the difference from a performance point of view between
>>>>>>>> a cgroup-based or a process hierarchy-based system.
>>>>>>>>
>>>>>>>> Maybe a better option should be to use an array of pointers with N
>>>>>>>> entries, one for each supported hook, instead of a unique pointer list?
>>>>>>>
>>>>>>> yes, clearly array dereference is faster than linked list walk.
>>>>>>> Now the question is where to keep this prog_array[num_lsm_hooks] ?
>>>>>>> Since we cannot keep it inside task_struct, we have to allocate it.
>>>>>>> Every time the task is created then. What to do on the fork? That
>>>>>>> will require changes all over. Then the obvious optimization would be
>>>>>>> to share this allocated array of prog pointers across multiple tasks...
>>>>>>> and little by little this new facility will look like cgroup.
>>>>>>> Hence the suggestion to put this array into cgroup from the start.
>>>>>>
>>>>>> I see your point :)
>>>>>>
>>>>>>>
>>>>>>>> Anyway, being able to attach an LSM hook program to a cgroup thanks to
>>>>>>>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
>>>>>>>> to use a process hierarchy). The downside will be to handle an LSM hook
>>>>>>>> program which is not triggered by a seccomp-filter, but this should be
>>>>>>>> needed anyway to handle interruptions.
>>>>>>>
>>>>>>> what do you mean 'not triggered by seccomp' ?
>>>>>>> You're not suggesting that this lsm has to enable seccomp to be functional?
>>>>>>> imo that's non starter due to overhead.
>>>>>>
>>>>>> Yes, for now, it is triggered by a new seccomp filter return value
>>>>>> RET_LANDLOCK, which can take a 16-bit value called cookie. This does not
>>>>>> have to be required, but it could be useful to bind a seccomp filter
>>>>>> security policy with a Landlock one. Waiting for Kees's point of view…
>>>>>>
>>>>>
>>>>> I'm not Kees, but I'd be okay with that.  I still think that doing
>>>>> this by process hierarchy a la seccomp will be easier to use and to
>>>>> understand (which is quite important for this kind of work) than doing
>>>>> it by cgroup.
>>>>>
>>>>> A feature I've wanted to add for a while is to have an fd that
>>>>> represents a seccomp layer, the idea being that you would set up your
>>>>> seccomp layer (with syscall filter, landlock hooks, etc) and then you
>>>>> would have a syscall to install that layer.  Then an unprivileged
>>>>> sandbox manager could set up its layer and still be able to inject new
>>>>> processes into it later on, no cgroups needed.
>>>>
>>>> A nice thing I didn't highlight about Landlock is that a process can
>>>> prepare a layer of rules (arraymap of handles + Landlock programs) and
>>>> pass the file descriptors of the Landlock programs to another process.
>>>> This process could then apply these programs to get sandboxed. However,
>>>> for now, because a Landlock program is only triggered by a seccomp
>>>> filter (which does not follow the Landlock programs as an FD), they will be
>>>> useless.
>>>>
>>>> The FD referring to an arraymap of handles can also be used to update a
>>>> map and change the behavior of a Landlock program. A master process can
>>>> then add or remove restrictions to another process hierarchy on the fly.
>>>
>>> Maybe this could be extended a little bit.  The fd could hold the
>>> seccomp filter *and* the LSM hook filters.  FMODE_EXECUTE could give
>>> the ability to install it and FMODE_WRITE could give the ability to
>>> modify it.
>>>
>>
>> This is interesting! It should be possible to append the seccomp stack
>> of a source process to the seccomp stack of the target process when a
>> Landlock program is passed and then activated through seccomp(2).
>>
>> For the FMODE_EXECUTE/FMODE_WRITE, are you suggesting to manage
>> permission of the eBPF program FD in a specific way?
>>
> 
> This wouldn't be an eBPF program FD -- it would be an FD encapsulating
> an entire configuration including seccomp BPF program, whatever
> landlock stuff is associated, and eventual seccomp monitor
> configuration (once I write that code), etc.
> 
> You wouldn't say "attach this process's seccomp stack to me" -- you'd
> say "attach this seccomp layer to me".
> 
> A decision that we'd have to make would be whether the FD links to the
> parent layer or whether it can be attached without regard to what the
> parent layer is.

OK, I like that, but I think it could be done in a second step. :)



* Re: [RFC v2 09/10] landlock: Handle cgroups (performance)
  2016-08-30 20:33                             ` Mickaël Salaün
@ 2016-08-30 20:55                               ` Alexei Starovoitov
  2016-08-30 21:45                                 ` Andy Lutomirski
  0 siblings, 1 reply; 66+ messages in thread
From: Alexei Starovoitov @ 2016-08-30 20:55 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Andy Lutomirski, kernel-hardening, Alexei Starovoitov, Tejun Heo,
	Sargun Dhillon, Network Development, Linux API, Kees Cook,
	LSM List, linux-kernel, open list:CONTROL GROUP (CGROUP),
	David S . Miller, Daniel Mack, Daniel Borkmann

On Tue, Aug 30, 2016 at 10:33:31PM +0200, Mickaël Salaün wrote:
> OK, I like that, but I think it could be done on a second time. :)

I don't. A single FD that is a collection of objects seems an odd abstraction
to me. I also don't see what it actually solves.
I think LSM and seccomp should be orthogonal and not tied to each other.


* Re: [RFC v2 09/10] landlock: Handle cgroups (performance)
  2016-08-30 20:55                               ` Alexei Starovoitov
@ 2016-08-30 21:45                                 ` Andy Lutomirski
  2016-08-31  1:36                                   ` Alexei Starovoitov
  0 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-30 21:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: LSM List, Network Development, Alexei Starovoitov, Linux API,
	Sargun Dhillon, Tejun Heo, Kees Cook, David S . Miller,
	open list:CONTROL GROUP (CGROUP),
	Mickaël Salaün, Daniel Mack, linux-kernel,
	kernel-hardening, Daniel Borkmann

On Aug 30, 2016 1:56 PM, "Alexei Starovoitov"
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Aug 30, 2016 at 10:33:31PM +0200, Mickaël Salaün wrote:
> > OK, I like that, but I think it could be done on a second time. :)
>
> I don't. Single FD that is a collection of objects seems an odd abstraction
> to me. I also don't see what it actually solves.
> I think lsm and seccomp should be orthogonal and not tied into each other.
>

It's not a random collection of objects.  It's a fully configured
sandboxing layer.

One might argue that landlock shouldn't be tied to seccomp (in theory,
attached progs could be given access to syscall_get_xyz()), but I
think that the seccomp attachment mechanism is the right way to
install unprivileged filters.  It handles the no_new_privs stuff, it
allows TSYNC, it's totally independent of systemwide policy, etc.

Trying to use cgroups or similar for this is going to be much nastier.
Some tighter sandboxes (Sandstorm, etc) aren't even going to dream of
putting cgroupfs in their containers, so requiring cgroups or similar
would be a mess for that type of application.
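
To make the "fully configured sandboxing layer" idea concrete, here is a
minimal userspace sketch in C. It is not code from the patch set: the names
layer, layer_handle, LAYER_MODE_INSTALL and LAYER_MODE_MODIFY are all
hypothetical, and the two mode bits only model the FMODE_EXECUTE/FMODE_WRITE
split suggested earlier in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical permission bits carried by a handle to a layer. They model
 * the FMODE_EXECUTE (install) / FMODE_WRITE (modify) idea from the thread. */
enum layer_mode {
	LAYER_MODE_INSTALL = 1 << 0,	/* holder may attach the layer to a task */
	LAYER_MODE_MODIFY  = 1 << 1,	/* holder may change the layer's rules */
};

#define NUM_HOOKS 4	/* assumption: tiny hook count, for illustration only */

/* One sandboxing layer: a syscall filter plus one program slot per hook. */
struct layer {
	const void *syscall_filter;		/* stands in for a seccomp BPF program */
	const void *hook_progs[NUM_HOOKS];	/* stand in for per-hook LSM programs */
};

/* What an FD would represent: a reference to a layer plus access rights. */
struct layer_handle {
	struct layer *layer;
	unsigned int mode;
};

struct task {
	struct layer *installed;
};

/* Attach the layer to the task only if the handle permits installation. */
static bool layer_install(struct task *t, const struct layer_handle *h)
{
	if (!(h->mode & LAYER_MODE_INSTALL))
		return false;
	t->installed = h->layer;
	return true;
}

/* Replace one hook program only if the handle permits modification. */
static bool layer_set_hook(const struct layer_handle *h, size_t hook,
			   const void *prog)
{
	if (!(h->mode & LAYER_MODE_MODIFY) || hook >= NUM_HOOKS)
		return false;
	h->layer->hook_progs[hook] = prog;
	return true;
}
```

A sandbox manager would keep a handle with both bits set and hand out
install-only handles to the processes it wants to confine.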

--Andy


* Re: [RFC v2 09/10] landlock: Handle cgroups (performance)
  2016-08-30 21:45                                 ` Andy Lutomirski
@ 2016-08-31  1:36                                   ` Alexei Starovoitov
  2016-08-31  3:29                                     ` Andy Lutomirski
  0 siblings, 1 reply; 66+ messages in thread
From: Alexei Starovoitov @ 2016-08-31  1:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LSM List, Network Development, Alexei Starovoitov, Linux API,
	Sargun Dhillon, Tejun Heo, Kees Cook, David S . Miller,
	open list:CONTROL GROUP (CGROUP),
	Mickaël Salaün, Daniel Mack, linux-kernel,
	kernel-hardening, Daniel Borkmann

On Tue, Aug 30, 2016 at 02:45:14PM -0700, Andy Lutomirski wrote:
> 
> One might argue that landlock shouldn't be tied to seccomp (in theory,
> attached progs could be given access to syscall_get_xyz()), but I

proposed lsm is way more powerful than syscall_get_xyz.
no need to dumb it down.

> think that the seccomp attachment mechanism is the right way to
> install unprivileged filters.  It handles the no_new_privs stuff, it
> allows TSYNC, it's totally independent of systemwide policy, etc.
> 
> Trying to use cgroups or similar for this is going to be much nastier.
> Some tighter sandboxes (Sandstorm, etc) aren't even going to dream of
> putting cgroupfs in their containers, so requiring cgroups or similar
> would be a mess for that type of application.

I don't see why it is a 'mess'. cgroups are already used by the majority
of systems, so I don't see why requiring a cgroup is such a big deal.
But let's say we don't use them. What is the implementation going to look like
for a task-based hierarchy? Note that we need an array of bpf_prog pointers,
one for each lsm hook. Where is this array going to be stored?
We cannot put it in task_struct, since it's too large. We cannot put it
into 'struct seccomp' directly either, unless it becomes a pointer.
Is that the proposal?
So now we will be wasting an extra 1 kbyte of memory per task. Not great.
Would we want to optimize it by sharing such a struct seccomp with its prog
array across threads of the same task? Or by dynamically allocating it when
landlock is in use? That may sound nice, but how do we account for that kernel
memory? I guess that is also solvable by charging memlock.
With the cgroup-based approach we don't need to worry about all that.
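
The 1 kbyte figure can be checked with a small C sketch. The hook count is an
assumption: ASSUMED_NUM_LSM_HOOKS is set to 128 only because that reproduces
1024 bytes with 8-byte pointers; it is not the real number of LSM hooks, and
hook_prog_array is a hypothetical name.

```c
#include <stddef.h>

/* Assumption: 128 hooks, chosen so that 128 * 8 bytes = 1024 bytes on a
 * 64-bit machine. The real LSM hook count is not taken from the patches. */
#define ASSUMED_NUM_LSM_HOOKS 128

struct bpf_prog;	/* opaque: only pointers to it are stored */

/* One program pointer per LSM hook, as described above. */
struct hook_prog_array {
	struct bpf_prog *progs[ASSUMED_NUM_LSM_HOOKS];
};

/* Memory cost if every task carries its own private copy of the array. */
static size_t per_task_cost(size_t ntasks)
{
	return ntasks * sizeof(struct hook_prog_array);
}
```

With these assumptions, per_task_cost(1000) is about 1 MB on a 64-bit system,
which is the overhead that sharing the array is meant to avoid.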


* Re: [RFC v2 09/10] landlock: Handle cgroups (performance)
  2016-08-31  1:36                                   ` Alexei Starovoitov
@ 2016-08-31  3:29                                     ` Andy Lutomirski
  0 siblings, 0 replies; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-31  3:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: LSM List, Network Development, Alexei Starovoitov, Linux API,
	Sargun Dhillon, Tejun Heo, Kees Cook, David S . Miller,
	open list:CONTROL GROUP (CGROUP),
	Mickaël Salaün, Daniel Mack, linux-kernel,
	kernel-hardening, Daniel Borkmann

On Tue, Aug 30, 2016 at 6:36 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Tue, Aug 30, 2016 at 02:45:14PM -0700, Andy Lutomirski wrote:
>>
>> One might argue that landlock shouldn't be tied to seccomp (in theory,
>> attached progs could be given access to syscall_get_xyz()), but I
>
> proposed lsm is way more powerful than syscall_get_xyz.
> no need to dumb it down.

I think you're misunderstanding me.

Mickaël's code allows one to make the LSM hook filters depend on the
syscall using SECCOMP_RET_LANDLOCK.  I'm suggesting that a similar
effect could be achieved by allowing the eBPF LSM hook to call
syscall_get_xyz() if it wants to.
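
For readers who have not looked at the patches, the cookie mechanism can be
sketched as follows. A seccomp filter return value carries an action in its
upper bits and 16 bits of data in its lower bits; the two masks below mirror
SECCOMP_RET_ACTION and SECCOMP_RET_DATA from <linux/seccomp.h>. The value
RET_LANDLOCK_EXAMPLE is hypothetical and is not the value used by the series.

```c
#include <stdint.h>

/* Local copies of the seccomp return-value layout (action | data). */
#define RET_ACTION_MASK	0x7fff0000U	/* mirrors SECCOMP_RET_ACTION */
#define RET_DATA_MASK	0x0000ffffU	/* mirrors SECCOMP_RET_DATA */

/* Hypothetical action value standing in for SECCOMP_RET_LANDLOCK. */
#define RET_LANDLOCK_EXAMPLE	0x00020000U

/* Build a filter return value asking for Landlock checks with a cookie. */
static uint32_t ret_landlock(uint16_t cookie)
{
	return RET_LANDLOCK_EXAMPLE | cookie;
}

/* Does this return value select the (hypothetical) Landlock action? */
static int is_landlock(uint32_t ret)
{
	return (ret & RET_ACTION_MASK) == RET_LANDLOCK_EXAMPLE;
}

/* Extract the 16-bit cookie that a hook program could later match on. */
static uint16_t ret_cookie(uint32_t ret)
{
	return (uint16_t)(ret & RET_DATA_MASK);
}
```

The syscall_get_xyz() alternative would let the hook program inspect the
current syscall directly instead of relying on a cookie set by the filter.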

>
>> think that the seccomp attachment mechanism is the right way to
>> install unprivileged filters.  It handles the no_new_privs stuff, it
>> allows TSYNC, it's totally independent of systemwide policy, etc.
>>
>> Trying to use cgroups or similar for this is going to be much nastier.
>> Some tighter sandboxes (Sandstorm, etc) aren't even going to dream of
>> putting cgroupfs in their containers, so requiring cgroups or similar
>> would be a mess for that type of application.
>
> I don't see why it is a 'mess'. cgroups are already used by the majority
> of systems, so I don't see why requiring a cgroup is such a big deal.

Requiring cgroup to be configured in isn't a big deal.  Requiring

> But let's say we don't use them. What is the implementation going to look like
> for a task-based hierarchy? Note that we need an array of bpf_prog pointers,
> one for each lsm hook. Where is this array going to be stored?
> We cannot put it in task_struct, since it's too large. We cannot put it
> into 'struct seccomp' directly either, unless it becomes a pointer.
> Is that the proposal?

It would go in struct seccomp_filter or in something pointed to from there.

> So now we will be wasting an extra 1 kbyte of memory per task. Not great.
> Would we want to optimize it by sharing such a struct seccomp with its prog
> array across threads of the same task? Or by dynamically allocating it when
> landlock is in use? That may sound nice, but how do we account for that kernel
> memory? I guess that is also solvable by charging memlock.
> With the cgroup-based approach we don't need to worry about all that.
>

The considerations are essentially identical either way.

With cgroups, if you want to share the memory between multiple
separate sandboxes (Firejail instances, Sandstorm grains, Chromium
instances, xdg-apps, etc), you'd need to get them to all coordinate to
share a cgroup.  With a seccomp-like interface, you'd need to get them
to coordinate to share an installed layer (using my FD idea or
similar).

There would *not* be any duplication of this memory just because a
sandboxed process called fork().
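
A minimal userspace sketch of why fork() need not copy anything, assuming the
program array lives in a reference-counted object that tasks point to. All
names here (layer, task_fork, task_exit) are hypothetical, and the plain
integer counter stands in for whatever atomic reference count the kernel
would use.

```c
#include <stdlib.h>

#define NUM_HOOKS 4	/* assumption: tiny hook count, for illustration only */

/* Shared, reference-counted layer holding one program slot per hook. */
struct layer {
	unsigned int refcount;	/* not atomic: single-threaded model only */
	const void *hook_progs[NUM_HOOKS];
};

struct task {
	struct layer *layer;
};

/* Allocate a zeroed layer owned by one reference; NULL on failure. */
static struct layer *layer_alloc(void)
{
	struct layer *l = calloc(1, sizeof(*l));

	if (l)
		l->refcount = 1;
	return l;
}

/* Drop one reference and free the layer when the last one goes away. */
static void layer_put(struct layer *l)
{
	if (l && --l->refcount == 0)
		free(l);
}

/* Model of fork(): the child shares the parent's layer, nothing is copied. */
static void task_fork(const struct task *parent, struct task *child)
{
	child->layer = parent->layer;
	if (child->layer)
		child->layer->refcount++;
}

/* Model of task exit: release this task's reference to the layer. */
static void task_exit(struct task *t)
{
	layer_put(t->layer);
	t->layer = NULL;
}
```

However many tasks are forked, only one copy of the array exists until the
last task holding a reference exits.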

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
  2016-08-25 10:32 [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
                   ` (12 preceding siblings ...)
  2016-08-30 16:06 ` [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Andy Lutomirski
@ 2016-09-15  9:19 ` Pavel Machek
  2016-09-20 17:08   ` Mickaël Salaün
  13 siblings, 1 reply; 66+ messages in thread
From: Pavel Machek @ 2016-09-15  9:19 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev

Hi!

> This series is a proof of concept to fill some missing part of seccomp as the
> ability to check syscall argument pointers or creating more dynamic security
> policies. The goal of this new stackable Linux Security Module (LSM) called
> Landlock is to allow any process, including unprivileged ones, to create
> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
> OpenBSD Pledge. This kind of sandbox help to mitigate the security impact of
> bugs or unexpected/malicious behaviors in userland applications.
> 
> The first RFC [1] was focused on extending seccomp while staying at the syscall
> level. This brought a working PoC but with some (mitigated) ToCToU race
> conditions due to the seccomp ptrace hole (now fixed) and the non-atomic
> syscall argument evaluation (hence the LSM hooks).

Long and nice description follows. Should it go to Documentation/
somewhere?

Because some documentation would be useful...
								Pavel

>  include/linux/bpf.h                   |  41 +++++
>  include/linux/lsm_hooks.h             |   5 +
>  include/linux/seccomp.h               |  54 ++++++-
>  include/uapi/asm-generic/errno-base.h |   1 +
>  include/uapi/linux/bpf.h              | 103 ++++++++++++
>  include/uapi/linux/seccomp.h          |   2 +
>  kernel/bpf/arraymap.c                 | 222 +++++++++++++++++++++++++
>  kernel/bpf/syscall.c                  |  18 ++-
>  kernel/bpf/verifier.c                 |  32 +++-
>  kernel/fork.c                         |  41 ++++-
>  kernel/seccomp.c                      | 211 +++++++++++++++++++++++-
>  samples/Makefile                      |   2 +-
>  samples/landlock/.gitignore           |   1 +
>  samples/landlock/Makefile             |  16 ++
>  samples/landlock/sandbox.c            | 295 ++++++++++++++++++++++++++++++++++
>  security/Kconfig                      |   1 +
>  security/Makefile                     |   2 +
>  security/landlock/Kconfig             |  19 +++
>  security/landlock/Makefile            |   3 +
>  security/landlock/checker_cgroup.c    |  96 +++++++++++
>  security/landlock/checker_cgroup.h    |  18 +++
>  security/landlock/checker_fs.c        | 183 +++++++++++++++++++++
>  security/landlock/checker_fs.h        |  20 +++
>  security/landlock/lsm.c               | 228 ++++++++++++++++++++++++++
>  security/security.c                   |   1 +
>  25 files changed, 1592 insertions(+), 23 deletions(-)
>  create mode 100644 samples/landlock/.gitignore
>  create mode 100644 samples/landlock/Makefile
>  create mode 100644 samples/landlock/sandbox.c
>  create mode 100644 security/landlock/Kconfig
>  create mode 100644 security/landlock/Makefile
>  create mode 100644 security/landlock/checker_cgroup.c
>  create mode 100644 security/landlock/checker_cgroup.h
>  create mode 100644 security/landlock/checker_fs.c
>  create mode 100644 security/landlock/checker_fs.h
>  create mode 100644 security/landlock/lsm.c
> 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
  2016-09-15  9:19 ` Pavel Machek
@ 2016-09-20 17:08   ` Mickaël Salaün
  2016-09-24  7:45     ` Pavel Machek
  2016-10-03 22:56     ` Kees Cook
  0 siblings, 2 replies; 66+ messages in thread
From: Mickaël Salaün @ 2016-09-20 17:08 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev



On 15/09/2016 11:19, Pavel Machek wrote:
> Hi!
> 
>> This series is a proof of concept to fill some missing part of seccomp as the
>> ability to check syscall argument pointers or creating more dynamic security
>> policies. The goal of this new stackable Linux Security Module (LSM) called
>> Landlock is to allow any process, including unprivileged ones, to create
>> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
>> OpenBSD Pledge. This kind of sandbox help to mitigate the security impact of
>> bugs or unexpected/malicious behaviors in userland applications.
>>
>> The first RFC [1] was focused on extending seccomp while staying at the syscall
>> level. This brought a working PoC but with some (mitigated) ToCToU race
>> conditions due to the seccomp ptrace hole (now fixed) and the non-atomic
>> syscall argument evaluation (hence the LSM hooks).
> 
> Long and nice description follows. Should it go to Documentation/
> somewhere?
> 
> Because some documentation would be useful...
> 								Pavel

Right, but I was looking for feedback before investing in documentation. :)


> 
>>  include/linux/bpf.h                   |  41 +++++
>>  include/linux/lsm_hooks.h             |   5 +
>>  include/linux/seccomp.h               |  54 ++++++-
>>  include/uapi/asm-generic/errno-base.h |   1 +
>>  include/uapi/linux/bpf.h              | 103 ++++++++++++
>>  include/uapi/linux/seccomp.h          |   2 +
>>  kernel/bpf/arraymap.c                 | 222 +++++++++++++++++++++++++
>>  kernel/bpf/syscall.c                  |  18 ++-
>>  kernel/bpf/verifier.c                 |  32 +++-
>>  kernel/fork.c                         |  41 ++++-
>>  kernel/seccomp.c                      | 211 +++++++++++++++++++++++-
>>  samples/Makefile                      |   2 +-
>>  samples/landlock/.gitignore           |   1 +
>>  samples/landlock/Makefile             |  16 ++
>>  samples/landlock/sandbox.c            | 295 ++++++++++++++++++++++++++++++++++
>>  security/Kconfig                      |   1 +
>>  security/Makefile                     |   2 +
>>  security/landlock/Kconfig             |  19 +++
>>  security/landlock/Makefile            |   3 +
>>  security/landlock/checker_cgroup.c    |  96 +++++++++++
>>  security/landlock/checker_cgroup.h    |  18 +++
>>  security/landlock/checker_fs.c        | 183 +++++++++++++++++++++
>>  security/landlock/checker_fs.h        |  20 +++
>>  security/landlock/lsm.c               | 228 ++++++++++++++++++++++++++
>>  security/security.c                   |   1 +
>>  25 files changed, 1592 insertions(+), 23 deletions(-)
>>  create mode 100644 samples/landlock/.gitignore
>>  create mode 100644 samples/landlock/Makefile
>>  create mode 100644 samples/landlock/sandbox.c
>>  create mode 100644 security/landlock/Kconfig
>>  create mode 100644 security/landlock/Makefile
>>  create mode 100644 security/landlock/checker_cgroup.c
>>  create mode 100644 security/landlock/checker_cgroup.h
>>  create mode 100644 security/landlock/checker_fs.c
>>  create mode 100644 security/landlock/checker_fs.h
>>  create mode 100644 security/landlock/lsm.c
>>
> 



* Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
  2016-09-20 17:08   ` Mickaël Salaün
@ 2016-09-24  7:45     ` Pavel Machek
  2016-10-03 22:56     ` Kees Cook
  1 sibling, 0 replies; 66+ messages in thread
From: Pavel Machek @ 2016-09-24  7:45 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev

On Tue 2016-09-20 19:08:23, Mickaël Salaün wrote:
> 
> On 15/09/2016 11:19, Pavel Machek wrote:
> > Hi!
> > 
> >> This series is a proof of concept to fill some missing part of seccomp as the
> >> ability to check syscall argument pointers or creating more dynamic security
> >> policies. The goal of this new stackable Linux Security Module (LSM) called
> >> Landlock is to allow any process, including unprivileged ones, to create
> >> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
> >> OpenBSD Pledge. This kind of sandbox help to mitigate the security impact of
> >> bugs or unexpected/malicious behaviors in userland applications.
> >>
> >> The first RFC [1] was focused on extending seccomp while staying at the syscall
> >> level. This brought a working PoC but with some (mitigated) ToCToU race
> >> conditions due to the seccomp ptrace hole (now fixed) and the non-atomic
> >> syscall argument evaluation (hence the LSM hooks).
> > 
> > Long and nice description follows. Should it go to Documentation/
> > somewhere?
> > 
> > Because some documentation would be useful...
> 
> Right, but I was looking for feedback before investing in documentation. :)

Heh. And I was hoping to learn what I'm reviewing. Too bad :-).

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
  2016-09-20 17:08   ` Mickaël Salaün
  2016-09-24  7:45     ` Pavel Machek
@ 2016-10-03 22:56     ` Kees Cook
  2016-10-05 20:30       ` Mickaël Salaün
  1 sibling, 1 reply; 66+ messages in thread
From: Kees Cook @ 2016-10-03 22:56 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Pavel Machek, LKML, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova, James Morris,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Will Drewry,
	kernel-hardening, Linux API, linux-security-module,
	Network Development

On Tue, Sep 20, 2016 at 10:08 AM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On 15/09/2016 11:19, Pavel Machek wrote:
>> Hi!
>>
>>> This series is a proof of concept to fill some missing part of seccomp as the
>>> ability to check syscall argument pointers or creating more dynamic security
>>> policies. The goal of this new stackable Linux Security Module (LSM) called
>>> Landlock is to allow any process, including unprivileged ones, to create
>>> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
>>> OpenBSD Pledge. This kind of sandbox help to mitigate the security impact of
>>> bugs or unexpected/malicious behaviors in userland applications.
>>>
>>> The first RFC [1] was focused on extending seccomp while staying at the syscall
>>> level. This brought a working PoC but with some (mitigated) ToCToU race
>>> conditions due to the seccomp ptrace hole (now fixed) and the non-atomic
>>> syscall argument evaluation (hence the LSM hooks).
>>
>> Long and nice description follows. Should it go to Documentation/
>> somewhere?
>>
>> Because some documentation would be useful...
>>                                                               Pavel
>
> Right, but I was looking for feedback before investing in documentation. :)

Heh, understood. There are a number of grammar issues that slow me
down when reading this, so when it does move into Documentation/, I'll
have some English nit-picks. :)

While reading I found myself wanting an explicit list of "guiding
principles" for anyone implementing new hooks. It is touched on in
several places (don't expose things, don't allow for privilege
changes, etc). Having that spelled out somewhere would be nice.

-Kees

-- 
Kees Cook
Nexus Security


* Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
  2016-10-03 22:56     ` Kees Cook
@ 2016-10-05 20:30       ` Mickaël Salaün
  0 siblings, 0 replies; 66+ messages in thread
From: Mickaël Salaün @ 2016-10-05 20:30 UTC (permalink / raw)
  To: Kees Cook
  Cc: Pavel Machek, LKML, Alexei Starovoitov, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Daniel Borkmann, Daniel Mack,
	David Drysdale, David S . Miller, Elena Reshetova, James Morris,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Will Drewry,
	kernel-hardening, Linux API, linux-security-module,
	Network Development


On 04/10/2016 00:56, Kees Cook wrote:
> On Tue, Sep 20, 2016 at 10:08 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>
>> On 15/09/2016 11:19, Pavel Machek wrote:
>>> Hi!
>>>
>>>> This series is a proof of concept to fill some missing part of seccomp as the
>>>> ability to check syscall argument pointers or creating more dynamic security
>>>> policies. The goal of this new stackable Linux Security Module (LSM) called
>>>> Landlock is to allow any process, including unprivileged ones, to create
>>>> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
>>>> OpenBSD Pledge. This kind of sandbox help to mitigate the security impact of
>>>> bugs or unexpected/malicious behaviors in userland applications.
>>>>
>>>> The first RFC [1] was focused on extending seccomp while staying at the syscall
>>>> level. This brought a working PoC but with some (mitigated) ToCToU race
>>>> conditions due to the seccomp ptrace hole (now fixed) and the non-atomic
>>>> syscall argument evaluation (hence the LSM hooks).
>>>
>>> Long and nice description follows. Should it go to Documentation/
>>> somewhere?
>>>
>>> Because some documentation would be useful...
>>>                                                               Pavel
>>
>> Right, but I was looking for feedback before investing in documentation. :)
> 
> Heh, understood. There are a number of grammar issues that slow me
> down when reading this, so when it does move into Documentation/, I'll
> have some English nit-picks. :)
> 
> While reading I found myself wanting an explicit list of "guiding
> principles" for anyone implementing new hooks. It is touched on in
> several places (don't expose things, don't allow for privilege
> changes, etc). Having that spelled out somewhere would be nice.

Right, I'm going to try to write more consistent documentation that
spells out the "guiding principles".

 Mickaël



end of thread

Thread overview: 66+ messages
2016-08-25 10:32 [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 01/10] landlock: Add Kconfig Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 02/10] bpf: Move u64_to_ptr() to BPF headers and inline it Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 03/10] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 04/10] seccomp: Split put_seccomp_filter() with put_seccomp() Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 05/10] seccomp: Handle Landlock Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 06/10] landlock: Add LSM hooks Mickaël Salaün
2016-08-30 18:56   ` Andy Lutomirski
2016-08-30 20:10     ` Mickaël Salaün
2016-08-30 20:18       ` Andy Lutomirski
2016-08-30 20:27         ` Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 07/10] landlock: Add errno check Mickaël Salaün
2016-08-25 11:13   ` Andy Lutomirski
2016-08-25 10:32 ` [RFC v2 08/10] landlock: Handle file system comparisons Mickaël Salaün
2016-08-25 11:12   ` Andy Lutomirski
2016-08-25 14:10     ` Mickaël Salaün
2016-08-26 14:57       ` Andy Lutomirski
2016-08-27 13:45         ` Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 09/10] landlock: Handle cgroups Mickaël Salaün
2016-08-25 11:09   ` Andy Lutomirski
2016-08-25 14:44     ` Mickaël Salaün
2016-08-26 12:55       ` Tejun Heo
2016-08-26 14:20       ` Andy Lutomirski
2016-08-26 15:50         ` Tejun Heo
2016-08-26  2:14   ` Alexei Starovoitov
2016-08-26 15:10     ` Mickaël Salaün
2016-08-26 23:05       ` Alexei Starovoitov
2016-08-27  7:30         ` Andy Lutomirski
2016-08-27 18:11           ` Alexei Starovoitov
2016-08-28  8:14             ` Andy Lutomirski
2016-08-27 14:06         ` [RFC v2 09/10] landlock: Handle cgroups (performance) Mickaël Salaün
2016-08-27 18:06           ` Alexei Starovoitov
2016-08-27 19:35             ` Mickaël Salaün
2016-08-27 20:43               ` Alexei Starovoitov
2016-08-27 21:14                 ` Mickaël Salaün
2016-08-28  8:13                   ` Andy Lutomirski
2016-08-28  9:42                     ` Mickaël Salaün
2016-08-30 18:55                       ` Andy Lutomirski
2016-08-30 20:20                         ` Mickaël Salaün
2016-08-30 20:23                           ` Andy Lutomirski
2016-08-30 20:33                             ` Mickaël Salaün
2016-08-30 20:55                               ` Alexei Starovoitov
2016-08-30 21:45                                 ` Andy Lutomirski
2016-08-31  1:36                                   ` Alexei Starovoitov
2016-08-31  3:29                                     ` Andy Lutomirski
2016-08-27 14:19         ` [RFC v2 09/10] landlock: Handle cgroups (netfilter match) Mickaël Salaün
2016-08-27 18:32           ` Alexei Starovoitov
2016-08-27 14:34         ` [RFC v2 09/10] landlock: Handle cgroups (program types) Mickaël Salaün
2016-08-27 18:19           ` Alexei Starovoitov
2016-08-27 19:55             ` Mickaël Salaün
2016-08-27 20:56               ` Alexei Starovoitov
2016-08-27 21:18                 ` Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 10/10] samples/landlock: Add sandbox example Mickaël Salaün
2016-08-25 11:05 ` [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Andy Lutomirski
2016-08-25 13:57   ` Mickaël Salaün
2016-08-27  7:40 ` Andy Lutomirski
2016-08-27 15:10   ` Mickaël Salaün
2016-08-27 15:21     ` [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing (cgroup delegation) Mickaël Salaün
2016-08-30 16:06 ` [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Andy Lutomirski
2016-08-30 19:51   ` Mickaël Salaün
2016-08-30 19:55     ` Andy Lutomirski
2016-09-15  9:19 ` Pavel Machek
2016-09-20 17:08   ` Mickaël Salaün
2016-09-24  7:45     ` Pavel Machek
2016-10-03 22:56     ` Kees Cook
2016-10-05 20:30       ` Mickaël Salaün
