linux-kernel.vger.kernel.org archive mirror
* [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing
@ 2018-02-27  0:41 Mickaël Salaün
  2018-02-27  0:41 ` [PATCH bpf-next v8 01/11] fs,security: Add a security blob to nameidata Mickaël Salaün
                   ` (11 more replies)
  0 siblings, 12 replies; 55+ messages in thread
From: Mickaël Salaün @ 2018-02-27  0:41 UTC
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev

Hi,

This eighth series is a major revamp of the Landlock design compared to
the previous series [1]. This enables more flexibility and granularity
of access control with file paths. It is now possible to enforce an
access control according to a file hierarchy. Landlock uses the concept
of inode and path to identify such hierarchy. In a way, it brings tools
to program what is a file hierarchy.

There are now three types of Landlock hooks: FS_WALK, FS_PICK and FS_GET.
Each of them accepts a dedicated eBPF program, called a Landlock
program.  They can be chained to enforce a full access control according
to a list of directories or files. The set of actions on a file is well
defined (e.g. read, write, ioctl, append, lock, mount...) taking
inspiration from the major Linux LSMs and some other access-controls
like Capsicum.  These program types are designed to be cache-friendly,
which gives room for optimizations in the future.

The documentation patch contains some kernel documentation and
explanations on how to use Landlock.  The compiled documentation and
a talk I gave at FOSDEM can be found here: https://landlock.io
This patch series can be found in the branch landlock-v8 in this repo:
https://github.com/landlock-lsm/linux

There are still some minor issues with this patch series but it should
demonstrate how powerful this design may be. One of these issues is that
it is not a stackable LSM anymore, but the infrastructure management of
security blobs should allow stacking it with other LSMs [4].

This is the first step of the roadmap discussed at LPC [2].  While the
intended final goal is to allow unprivileged users to use Landlock, this
series allows only a process with global CAP_SYS_ADMIN to load and
enforce a rule.  This may help to get feedback and avoid unexpected
behaviors.

This series can be applied on top of bpf-next, commit 7d72637eb39f
("Merge branch 'x86-jit'").  This can be tested with
CONFIG_SECCOMP_FILTER and CONFIG_SECURITY_LANDLOCK.  I would really
appreciate constructive comments on the design and the code.


# Landlock LSM

The goal of this new Linux Security Module (LSM) called Landlock is to
allow any process, including unprivileged ones, to create powerful
security sandboxes comparable to XNU Sandbox or OpenBSD Pledge. This
kind of sandbox is expected to help mitigate the security impact of bugs
or unexpected/malicious behaviors in user-space applications.

The approach taken is to add the minimum amount of code while still
allowing the user-space application to create quite complex access
rules.  A dedicated security policy language such as the one used by
SELinux, AppArmor and other major LSMs involves a lot of code and is
usually permitted to only a trusted user (i.e. root).  On the contrary,
eBPF programs already exist and are designed to be safely loaded by
unprivileged user-space.

This design does not seem too intrusive but is flexible enough to allow
a powerful sandbox mechanism accessible by any process on Linux. The use
of seccomp and Landlock is more suitable with the help of a user-space
library (e.g.  libseccomp) that could help to specify a high-level
language to express a security policy instead of raw eBPF programs.
Moreover, thanks to the LLVM front-end, it is quite easy to write an
eBPF program with a subset of the C language.


# Frequently asked questions

## Why is seccomp-bpf not enough?

A seccomp filter can access only raw syscall arguments (i.e. the
register values) which means that it is not possible to filter according
to the value pointed to by an argument, such as a file pathname. As an
embryonic Landlock version demonstrated, filtering at the syscall level
is complicated (e.g. need to take care of race conditions). This is
mainly because the access control checkpoints of the kernel are not at
this high level but further down, at the LSM-hook level. The LSM
hooks are designed to handle this kind of check.  Landlock abstracts
this approach to leverage the ability of unprivileged users to limit
themselves.

Cf. section "What it isn't?" in Documentation/prctl/seccomp_filter.txt
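
To make the limitation concrete, here is a minimal user-space sketch of
what a seccomp filter gets to see. struct seccomp_data comes from
<linux/seccomp.h>; the two helpers, seccomp_view() and
same_path_as_seen_by_seccomp(), are hypothetical names used only for this
illustration and are not part of the series.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <linux/seccomp.h>	/* struct seccomp_data */

/*
 * Hypothetical helper: build the view a filter would get of a syscall
 * whose first argument is a pathname. Only the address is recorded,
 * because args[] holds raw register values.
 */
static struct seccomp_data seccomp_view(int nr, const char *path)
{
	struct seccomp_data sd;

	memset(&sd, 0, sizeof(sd));
	sd.nr = nr;
	sd.args[0] = (uint64_t)(uintptr_t)path;
	return sd;
}

/*
 * Hypothetical helper: the only path comparison available at this
 * level is a comparison of addresses, not of the strings they point to.
 */
static bool same_path_as_seen_by_seccomp(const struct seccomp_data *a,
					 const struct seccomp_data *b)
{
	return a->args[0] == b->args[0];
}
```

Two buffers holding the same pathname at different addresses look
different to such a filter, and the buffer content may still change after
the filter ran, which is the race mentioned above.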


## Why use the seccomp(2) syscall?

Landlock uses the same semantics as seccomp to apply access rule
restrictions. It adds a new layer of security for the current process
which is inherited by its children. It makes sense to use a unique
access-restricting syscall (that should be allowed by seccomp filters)
which can only drop privileges. Moreover, a Landlock rule could come
from outside a process (e.g.  passed through a UNIX socket). It is then
useful to differentiate the creation/load of Landlock eBPF programs via
bpf(2), from rule enforcement via seccomp(2).


## Why a new LSM? Are SELinux, AppArmor, Smack and Tomoyo not good
   enough?

The current access control LSMs are fine for their purpose which is to
give *root* the ability to enforce a security policy for the
*system*. What is missing is a way to enforce a security policy for any
application by its developer and *unprivileged user* as seccomp can do
for raw syscall filtering.

Differences from other (access control) LSMs:
* not only dedicated to administrators (i.e. no_new_privs);
* limited kernel attack surface (e.g. policy parsing);
* constrained policy rules (no DoS: deterministic execution time);
* do not leak more information than the loader process can legitimately
  have access to (minimize metadata inference).


# Changes since v7

* major revamp of the file system enforcement:
  * new eBPF map dedicated to tie an inode with an arbitrary 64-bit
    value, which can be used to tag files
  * three new Landlock hooks: FS_WALK, FS_PICK and FS_GET
  * add the ability to chain Landlock programs
  * add a new eBPF map type to compare inodes
  * don't use macros anymore
* replace subtype fields:
  * triggers: fine-grained bitfield of actions on which a Landlock
    program may be called (if it comes from a sandboxed process)
  * previous: a parent chained program
* upstreamed patches:
  * commit 369130b63178 ("selftests: Enhance kselftest_harness.h to
    print which assert failed")


# Changes since v6

* upstreamed patches:
  * commit 752ba56fb130 ("bpf: Extend check_uarg_tail_zero() checks")
  * commit 0b40808a1084 ("selftests: Make test_harness.h more generally
    available") and related ones
  * commit 3bb857e47e49 ("LSM: Enable multiple calls to
    security_add_hooks() for the same LSM")
* simplify the landlock_context (remove syscall_* fields) and add three
  FS sub-events: IOCTL, LOCK, FCNTL
* minimize the number of callable BPF functions from a Landlock rule
* do not split put_seccomp_filter() with put_seccomp()
* rename Landlock version to Landlock ABI
* miscellaneous fixes
* rebase on net-next


# Changes since v5

* eBPF program subtype:
  * use a prog_subtype pointer instead of inlining it into bpf_attr
  * enable a future-proof behavior (reject unhandled data/size)
  * add tests
* use a simple rule hierarchy (similar to seccomp-bpf)
* add a ptrace scope protection
* add more tests
* add more documentation
* rename some files
* miscellaneous fixes
* rebase on net-next


# Changes since v4

* upstreamed patches:
  * commit d498f8719a09 ("bpf: Rebuild bpf.o for any dependency update")
  * commit a734fb5d6006 ("samples/bpf: Reset global variables") and
    related ones
  * commit f4874d01beba ("bpf: Use bpf_create_map() from the library")
    and related ones
  * commit d02d8986a768 ("bpf: Always test unprivileged programs")
  * commit 640eb7e7b524 ("fs: Constify path_is_under()'s arguments")
  * commit 535e7b4b5ef2 ("bpf: Use u64_to_user_ptr()")
* revamp Landlock to not expose an LSM hook interface but wrap and
  abstract them with Landlock events (currently one for all filesystem
  related operations: LANDLOCK_SUBTYPE_EVENT_FS)
* wrap all filesystem kernel objects through the same FS handle (struct
  landlock_handle_fs): struct file, struct inode, struct path and struct
  dentry
* a rule doesn't return an errno code but only a boolean to allow or deny
  an access request
* handle all filesystem related LSM hooks
* add some tests and a sample:
  * BPF context tests
  * Landlock sandboxing tests and sample
  * write Landlock rules in C and compile them with LLVM
* change field names of eBPF program subtype
* remove arraymap of handles for now (will be replaced with a revamped
  map)
* remove cgroup handling for now
* add user and kernel documentation
* rebase on net-next


# Changes since v3

* upstreamed patch:
  * commit 1955351da41c ("bpf: Set register type according to
    is_valid_access()")
* use abstract LSM hook arguments with custom types (e.g.
  *_LANDLOCK_ARG_FS for struct file, struct inode and struct path)
* add more LSM hooks to support full filesystem access control
* improve the sandbox example
* fix races and RCU issues:
  * eBPF program execution and eBPF helpers
  * revamp the arraymap of handles to cleanly deal with update/delete
* eBPF program subtype for Landlock:
  * remove the "origin" field
  * add an "option" field
* rebase onto Daniel Mack's patches v7 [3]
* remove merged commit 1955351da41c ("bpf: Set register type according
  to is_valid_access()")
* fix spelling mistakes
* cleanup some type and variable names
* split patches
* for now, remove cgroup delegation handling for unprivileged user
* remove extra access check for cgroup_get_from_fd()
* remove unused example code dealing with skb
* remove seccomp-bpf link:
  * no more seccomp cookie
  * for now, it is no longer possible to check the current syscall
    properties


# Changes since v2

* revamp cgroup handling:
  * use Daniel Mack's patches "Add eBPF hooks for cgroups" v5
  * remove bpf_landlock_cmp_cgroup_beneath()
  * make BPF_PROG_ATTACH usable with delegated cgroups
  * add a new CGRP_NO_NEW_PRIVS flag for safe cgroups
  * handle Landlock sandboxing for cgroups hierarchy
  * allow unprivileged processes to attach Landlock eBPF program to
    cgroups
* add subtype to eBPF programs:
  * replace Landlock hook identification by custom eBPF program types
    with a dedicated subtype field
  * manage fine-grained privileged Landlock programs
  * register Landlock programs for dedicated trigger origins (e.g.
    syscall, return from seccomp filter and/or interruption)
* performance and memory optimizations: use an array to access Landlock
  hooks directly but do not duplicate it for each thread
  (seccomp-based)
* allow running Landlock programs without seccomp filter
* fix seccomp-related issues
* remove extra errno bounding check for Landlock programs
* add some examples for optional eBPF functions or context access
  (network related) according to security checks to allow more features
  for privileged programs (e.g. Checmate)


# Changes since v1

* focus on the LSM hooks, not the syscalls:
  * much simpler implementation
  * does not need audit cache tricks to avoid race conditions
  * simpler to use and more generic because it uses the LSM hook
    abstraction directly
  * more efficient because only checking in LSM hooks
  * architecture agnostic
* switch from cBPF to eBPF:
  * new eBPF program types dedicated to Landlock
  * custom functions used by the eBPF program
  * gain some new features (e.g. 10 registers, can load values of
    different size, LLVM translator) but only a few functions allowed
    and a dedicated map type
  * new context: LSM hook ID, cookie and LSM hook arguments
  * need to set the sysctl kernel.unprivileged_bpf_disable to 0 (default
    value) to be able to load hook filters as unprivileged users
* smaller and simpler:
  * no more checker groups but dedicated arraymap of handles
  * simpler userland structs thanks to eBPF functions
* distinctive name: Landlock


[1] https://lkml.kernel.org/r/20170821000933.13024-1-mic@digikod.net
[2] https://lkml.kernel.org/r/5828776A.1010104@digikod.net
[3] https://lkml.kernel.org/r/1477390454-12553-1-git-send-email-daniel@zonque.org
[4] https://lwn.net/Articles/741963/

Regards,

Mickaël Salaün (11):
  fs,security: Add a security blob to nameidata
  fs,security: Add a new file access type: MAY_CHROOT
  bpf: Add eBPF program subtype and is_valid_subtype() verifier
  bpf,landlock: Define an eBPF program type for Landlock hooks
  seccomp,landlock: Enforce Landlock programs per process hierarchy
  bpf,landlock: Add a new map type: inode
  landlock: Handle filesystem access control
  landlock: Add ptrace restrictions
  bpf: Add a Landlock sandbox example
  bpf,landlock: Add tests for Landlock
  landlock: Add user and kernel documentation for Landlock

 Documentation/security/index.rst               |    1 +
 Documentation/security/landlock/index.rst      |   19 +
 Documentation/security/landlock/kernel.rst     |  100 +++
 Documentation/security/landlock/user.rst       |  206 +++++
 MAINTAINERS                                    |   13 +
 fs/namei.c                                     |   39 +
 fs/open.c                                      |    3 +-
 include/linux/bpf.h                            |   35 +-
 include/linux/bpf_types.h                      |    6 +
 include/linux/fs.h                             |    1 +
 include/linux/landlock.h                       |   61 ++
 include/linux/lsm_hooks.h                      |   12 +
 include/linux/namei.h                          |   14 +-
 include/linux/seccomp.h                        |    5 +
 include/linux/security.h                       |    7 +
 include/uapi/linux/bpf.h                       |   34 +-
 include/uapi/linux/landlock.h                  |  155 ++++
 include/uapi/linux/seccomp.h                   |    1 +
 kernel/bpf/Makefile                            |    3 +
 kernel/bpf/cgroup.c                            |    6 +-
 kernel/bpf/core.c                              |    6 +-
 kernel/bpf/helpers.c                           |   38 +
 kernel/bpf/inodemap.c                          |  318 ++++++++
 kernel/bpf/syscall.c                           |   62 +-
 kernel/bpf/verifier.c                          |   44 +-
 kernel/fork.c                                  |    8 +-
 kernel/seccomp.c                               |    4 +
 kernel/trace/bpf_trace.c                       |   15 +-
 net/core/filter.c                              |   76 +-
 samples/bpf/Makefile                           |    4 +
 samples/bpf/bpf_load.c                         |   85 +-
 samples/bpf/bpf_load.h                         |    7 +
 samples/bpf/cookie_uid_helper_example.c        |    2 +-
 samples/bpf/fds_example.c                      |    2 +-
 samples/bpf/landlock1.h                        |   14 +
 samples/bpf/landlock1_kern.c                   |  171 ++++
 samples/bpf/landlock1_user.c                   |  164 ++++
 samples/bpf/sock_example.c                     |    3 +-
 samples/bpf/test_cgrp2_attach.c                |    2 +-
 samples/bpf/test_cgrp2_attach2.c               |    4 +-
 samples/bpf/test_cgrp2_sock.c                  |    2 +-
 security/Kconfig                               |    1 +
 security/Makefile                              |    2 +
 security/landlock/Kconfig                      |   18 +
 security/landlock/Makefile                     |    6 +
 security/landlock/chain.c                      |   39 +
 security/landlock/chain.h                      |   35 +
 security/landlock/common.h                     |   94 +++
 security/landlock/enforce.c                    |  386 +++++++++
 security/landlock/enforce.h                    |   21 +
 security/landlock/enforce_seccomp.c            |  112 +++
 security/landlock/hooks.c                      |  121 +++
 security/landlock/hooks.h                      |   35 +
 security/landlock/hooks_cred.c                 |   52 ++
 security/landlock/hooks_cred.h                 |    1 +
 security/landlock/hooks_fs.c                   | 1021 ++++++++++++++++++++++++
 security/landlock/hooks_fs.h                   |   60 ++
 security/landlock/hooks_ptrace.c               |  124 +++
 security/landlock/hooks_ptrace.h               |   11 +
 security/landlock/init.c                       |  238 ++++++
 security/landlock/tag.c                        |  373 +++++++++
 security/landlock/tag.h                        |   36 +
 security/landlock/tag_fs.c                     |   59 ++
 security/landlock/tag_fs.h                     |   26 +
 security/landlock/task.c                       |   34 +
 security/landlock/task.h                       |   29 +
 security/security.c                            |   19 +-
 tools/include/uapi/linux/bpf.h                 |   34 +-
 tools/include/uapi/linux/landlock.h            |  155 ++++
 tools/lib/bpf/bpf.c                            |   15 +-
 tools/lib/bpf/bpf.h                            |    8 +-
 tools/lib/bpf/libbpf.c                         |    5 +-
 tools/perf/tests/bpf.c                         |    2 +-
 tools/testing/selftests/Makefile               |    1 +
 tools/testing/selftests/bpf/bpf_helpers.h      |    7 +
 tools/testing/selftests/bpf/test_align.c       |    2 +-
 tools/testing/selftests/bpf/test_tag.c         |    2 +-
 tools/testing/selftests/bpf/test_verifier.c    |  102 ++-
 tools/testing/selftests/landlock/.gitignore    |    5 +
 tools/testing/selftests/landlock/Makefile      |   35 +
 tools/testing/selftests/landlock/test.h        |   31 +
 tools/testing/selftests/landlock/test_base.c   |   27 +
 tools/testing/selftests/landlock/test_chain.c  |  249 ++++++
 tools/testing/selftests/landlock/test_fs.c     |  492 ++++++++++++
 tools/testing/selftests/landlock/test_ptrace.c |  158 ++++
 85 files changed, 5960 insertions(+), 75 deletions(-)
 create mode 100644 Documentation/security/landlock/index.rst
 create mode 100644 Documentation/security/landlock/kernel.rst
 create mode 100644 Documentation/security/landlock/user.rst
 create mode 100644 include/linux/landlock.h
 create mode 100644 include/uapi/linux/landlock.h
 create mode 100644 kernel/bpf/inodemap.c
 create mode 100644 samples/bpf/landlock1.h
 create mode 100644 samples/bpf/landlock1_kern.c
 create mode 100644 samples/bpf/landlock1_user.c
 create mode 100644 security/landlock/Kconfig
 create mode 100644 security/landlock/Makefile
 create mode 100644 security/landlock/chain.c
 create mode 100644 security/landlock/chain.h
 create mode 100644 security/landlock/common.h
 create mode 100644 security/landlock/enforce.c
 create mode 100644 security/landlock/enforce.h
 create mode 100644 security/landlock/enforce_seccomp.c
 create mode 100644 security/landlock/hooks.c
 create mode 100644 security/landlock/hooks.h
 create mode 100644 security/landlock/hooks_cred.c
 create mode 100644 security/landlock/hooks_cred.h
 create mode 100644 security/landlock/hooks_fs.c
 create mode 100644 security/landlock/hooks_fs.h
 create mode 100644 security/landlock/hooks_ptrace.c
 create mode 100644 security/landlock/hooks_ptrace.h
 create mode 100644 security/landlock/init.c
 create mode 100644 security/landlock/tag.c
 create mode 100644 security/landlock/tag.h
 create mode 100644 security/landlock/tag_fs.c
 create mode 100644 security/landlock/tag_fs.h
 create mode 100644 security/landlock/task.c
 create mode 100644 security/landlock/task.h
 create mode 100644 tools/include/uapi/linux/landlock.h
 create mode 100644 tools/testing/selftests/landlock/.gitignore
 create mode 100644 tools/testing/selftests/landlock/Makefile
 create mode 100644 tools/testing/selftests/landlock/test.h
 create mode 100644 tools/testing/selftests/landlock/test_base.c
 create mode 100644 tools/testing/selftests/landlock/test_chain.c
 create mode 100644 tools/testing/selftests/landlock/test_fs.c
 create mode 100644 tools/testing/selftests/landlock/test_ptrace.c

-- 
2.16.2


* [PATCH bpf-next v8 01/11] fs,security: Add a security blob to nameidata
  2018-02-27  0:41 [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing Mickaël Salaün
@ 2018-02-27  0:41 ` Mickaël Salaün
  2018-02-27  0:57   ` Al Viro
                     ` (2 more replies)
  2018-02-27  0:41 ` [PATCH bpf-next v8 02/11] fs,security: Add a new file access type: MAY_CHROOT Mickaël Salaün
                   ` (10 subsequent siblings)
  11 siblings, 3 replies; 55+ messages in thread
From: Mickaël Salaün @ 2018-02-27  0:41 UTC
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	Alexander Viro, James Morris, John Johansen, Stephen Smalley,
	Tetsuo Handa, linux-fsdevel

The function current_nameidata_lookup(const struct inode *) can be used
to retrieve the lookup state, including the LSM security blob pointer,
tied to the inode being walked through. This makes it possible to follow
a path lookup and know where an inode access comes from. This is needed
for the Landlock LSM to be able to restrict access to file paths.

The LSM hook nameidata_put_lookup(struct nameidata_lookup *, struct
inode *) is called before freeing the associated nameidata.
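
As an illustration, the lookup accessor can be modeled in user space as
follows. This is only a sketch: the ERR_PTR()/IS_ERR()/PTR_ERR() helpers
are user-space stand-ins for the kernel ones, current_nameidata replaces
current->nameidata, struct inode and struct nameidata are reduced to the
fields used here, and walk_type_for() is a hypothetical LSM-side caller
that does not exist in the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Simplified copies of the types touched by the patch. */
enum namei_type { LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND };
struct inode { unsigned long i_ino; };
struct nameidata_lookup {
	void *security;		/* blob owned by the LSM */
	enum namei_type type;	/* refreshed during the path walk */
};
struct nameidata {
	struct inode *inode;
	struct nameidata_lookup lookup;
};

/* Stand-in for current->nameidata. */
static struct nameidata *current_nameidata;

/* Same logic as the function added to fs/namei.c. */
static struct nameidata_lookup *current_nameidata_lookup(
		const struct inode *inode)
{
	if (!current_nameidata || current_nameidata->inode != inode)
		return ERR_PTR(-ENOENT);
	return &current_nameidata->lookup;
}

/*
 * Hypothetical LSM-side caller: return the type of the last walked
 * component, or a negative errno when the inode is not the one
 * currently being walked.
 */
static int walk_type_for(const struct inode *inode)
{
	struct nameidata_lookup *lookup = current_nameidata_lookup(inode);

	if (IS_ERR(lookup))
		return (int)PTR_ERR(lookup);
	return lookup->type;
}
```

A caller always has to check the result with IS_ERR() because an inode
permission check may happen outside of any path walk, or for an inode
other than the one the walk currently points to.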

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: James Morris <jmorris@namei.org>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: linux-fsdevel@vger.kernel.org
---
 fs/namei.c                | 39 +++++++++++++++++++++++++++++++++++++++
 include/linux/lsm_hooks.h |  7 +++++++
 include/linux/namei.h     | 14 +++++++++++++-
 include/linux/security.h  |  7 +++++++
 security/security.c       |  7 +++++++
 5 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 921ae32dbc80..d592b3fb0d1e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -505,6 +505,9 @@ struct nameidata {
 	struct inode	*link_inode;
 	unsigned	root_seq;
 	int		dfd;
+#ifdef CONFIG_SECURITY
+	struct nameidata_lookup lookup;
+#endif
 } __randomize_layout;
 
 static void set_nameidata(struct nameidata *p, int dfd, struct filename *name)
@@ -515,6 +518,9 @@ static void set_nameidata(struct nameidata *p, int dfd, struct filename *name)
 	p->name = name;
 	p->total_link_count = old ? old->total_link_count : 0;
 	p->saved = old;
+#ifdef CONFIG_SECURITY
+	p->lookup.security = NULL;
+#endif
 	current->nameidata = p;
 }
 
@@ -522,6 +528,7 @@ static void restore_nameidata(void)
 {
 	struct nameidata *now = current->nameidata, *old = now->saved;
 
+	security_nameidata_put_lookup(&now->lookup, now->inode);
 	current->nameidata = old;
 	if (old)
 		old->total_link_count = now->total_link_count;
@@ -549,6 +556,27 @@ static int __nd_alloc_stack(struct nameidata *nd)
 	return 0;
 }
 
+#ifdef CONFIG_SECURITY
+/**
+ * current_nameidata_lookup - get the state of the current path walk
+ *
+ * @inode: inode associated to the path walk
+ *
+ * Used by LSM modules for access restriction based on path walk. The LSM is in
+ * charge of the lookup->security blob allocation and management. The hook
+ * security_nameidata_put_lookup() will be called after the path walk end.
+ *
+ * Return ERR_PTR(-ENOENT) if there is no match.
+ */
+struct nameidata_lookup *current_nameidata_lookup(const struct inode *inode)
+{
+	if (!current->nameidata || current->nameidata->inode != inode)
+		return ERR_PTR(-ENOENT);
+	return &current->nameidata->lookup;
+}
+EXPORT_SYMBOL(current_nameidata_lookup);
+#endif
+
 /**
  * path_connected - Verify that a path->dentry is below path->mnt.mnt_root
  * @path: nameidate to verify
@@ -2009,6 +2037,13 @@ static inline u64 hash_name(const void *salt, const char *name)
 
 #endif
 
+static inline void refresh_lookup(struct nameidata *nd)
+{
+#ifdef CONFIG_SECURITY
+	nd->lookup.type = nd->last_type;
+#endif
+}
+
 /*
  * Name resolution.
  * This is the basic name resolution function, turning a pathname into
@@ -2025,6 +2060,8 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 		name++;
 	if (!*name)
 		return 0;
+	/* be ready for may_lookup() */
+	refresh_lookup(nd);
 
 	/* At this point we know we have a real path component. */
 	for(;;) {
@@ -2064,6 +2101,8 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 		nd->last.hash_len = hash_len;
 		nd->last.name = name;
 		nd->last_type = type;
+		/* be ready for the next security_inode_permission() */
+		refresh_lookup(nd);
 
 		name += hashlen_len(hash_len);
 		if (!*name)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 7161d8e7ee79..d71cf183f0be 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -428,6 +428,10 @@
  *	security module does not know about attribute or a negative error code
  *	to abort the copy up. Note that the caller is responsible for reading
  *	and writing the xattrs as this hook is merely a filter.
+ * @nameidata_put_lookup:
+ *	Deallocate and clear the current's nameidata->lookup.security field.
+ *	@lookup->security contains the security structure to be freed.
+ *	@inode is the last associated inode to the path walk
  *
  * Security hooks for file operations
  *
@@ -1514,6 +1518,8 @@ union security_list_options {
 	void (*inode_getsecid)(struct inode *inode, u32 *secid);
 	int (*inode_copy_up)(struct dentry *src, struct cred **new);
 	int (*inode_copy_up_xattr)(const char *name);
+	void (*nameidata_put_lookup)(struct nameidata_lookup *lookup,
+					struct inode *inode);
 
 	int (*file_permission)(struct file *file, int mask);
 	int (*file_alloc_security)(struct file *file);
@@ -1805,6 +1811,7 @@ struct security_hook_heads {
 	struct list_head inode_getsecid;
 	struct list_head inode_copy_up;
 	struct list_head inode_copy_up_xattr;
+	struct list_head nameidata_put_lookup;
 	struct list_head file_permission;
 	struct list_head file_alloc_security;
 	struct list_head file_free_security;
diff --git a/include/linux/namei.h b/include/linux/namei.h
index a982bb7cd480..ba08cbb41f97 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -14,7 +14,19 @@ enum { MAX_NESTED_LINKS = 8 };
 /*
  * Type of the last component on LOOKUP_PARENT
  */
-enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
+enum namei_type {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
+
+#ifdef CONFIG_SECURITY
+struct nameidata_lookup {
+	void *security;
+	enum namei_type type;
+};
+
+struct inode;
+
+extern struct nameidata_lookup *current_nameidata_lookup(
+		const struct inode *inode);
+#endif
 
 /*
  * The bitmask for a lookup event:
diff --git a/include/linux/security.h b/include/linux/security.h
index 73f1ef625d40..b1fd4370daf8 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -31,6 +31,7 @@
 #include <linux/string.h>
 #include <linux/mm.h>
 #include <linux/fs.h>
+#include <linux/namei.h>
 
 struct linux_binprm;
 struct cred;
@@ -302,6 +303,8 @@ int security_inode_listsecurity(struct inode *inode, char *buffer, size_t buffer
 void security_inode_getsecid(struct inode *inode, u32 *secid);
 int security_inode_copy_up(struct dentry *src, struct cred **new);
 int security_inode_copy_up_xattr(const char *name);
+void security_nameidata_put_lookup(struct nameidata_lookup *lookup,
+					struct inode *inode);
 int security_file_permission(struct file *file, int mask);
 int security_file_alloc(struct file *file);
 void security_file_free(struct file *file);
@@ -808,6 +811,10 @@ static inline int security_inode_copy_up_xattr(const char *name)
 	return -EOPNOTSUPP;
 }
 
+static inline void security_nameidata_put_lookup(
+		struct nameidata_lookup *lookup, struct inode *inode)
+{ }
+
 static inline int security_file_permission(struct file *file, int mask)
 {
 	return 0;
diff --git a/security/security.c b/security/security.c
index 1cd8526cb0b7..17053c7a1a77 100644
--- a/security/security.c
+++ b/security/security.c
@@ -857,6 +857,13 @@ int security_inode_copy_up_xattr(const char *name)
 }
 EXPORT_SYMBOL(security_inode_copy_up_xattr);
 
+void security_nameidata_put_lookup(struct nameidata_lookup *lookup,
+					struct inode *inode)
+{
+	call_void_hook(nameidata_put_lookup, lookup, inode);
+}
+EXPORT_SYMBOL(security_nameidata_put_lookup);
+
 int security_file_permission(struct file *file, int mask)
 {
 	int ret;
-- 
2.16.2


* [PATCH bpf-next v8 02/11] fs,security: Add a new file access type: MAY_CHROOT
  2018-02-27  0:41 [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing Mickaël Salaün
  2018-02-27  0:41 ` [PATCH bpf-next v8 01/11] fs,security: Add a security blob to nameidata Mickaël Salaün
@ 2018-02-27  0:41 ` Mickaël Salaün
  2018-02-27  0:41 ` [PATCH bpf-next v8 03/11] bpf: Add eBPF program subtype and is_valid_subtype() verifier Mickaël Salaün
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 55+ messages in thread
From: Mickaël Salaün @ 2018-02-27  0:41 UTC
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	Alexander Viro, James Morris, John Johansen, Stephen Smalley,
	Tetsuo Handa, linux-fsdevel

For compatibility reasons, MAY_CHROOT is always set with MAY_CHDIR.
However, this new flag makes it possible to differentiate a chdir from a
chroot.

This is needed for the Landlock LSM to be able to evaluate a new root
directory.
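
A minimal sketch of how a permission hook could tell the two requests
apart. The flag values are the ones from include/linux/fs.h (MAY_CHROOT
being the one added here); the two helper functions are hypothetical and
only serve the illustration.

```c
#include <stdbool.h>

/* Values as in include/linux/fs.h; MAY_CHROOT is added by this patch. */
#define MAY_EXEC	0x00000001
#define MAY_CHDIR	0x00000040
#define MAY_CHROOT	0x00000100

/* Hypothetical helper: true only for a chroot(2) request. */
static bool is_chroot_request(int mask)
{
	return (mask & MAY_CHROOT) != 0;
}

/* Hypothetical helper: true for chdir(2)/fchdir(2), false for chroot(2). */
static bool is_plain_chdir_request(int mask)
{
	return (mask & MAY_CHDIR) && !(mask & MAY_CHROOT);
}
```

Because chroot(2) keeps passing MAY_CHDIR, existing code that only tests
MAY_CHDIR behaves as before.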

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: James Morris <jmorris@namei.org>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: linux-fsdevel@vger.kernel.org
---
 fs/open.c          | 3 ++-
 include/linux/fs.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/open.c b/fs/open.c
index 7ea118471dce..084d147c0e96 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -489,7 +489,8 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename)
 	if (error)
 		goto out;
 
-	error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
+	error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR |
+			MAY_CHROOT);
 	if (error)
 		goto dput_and_out;
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2a815560fda0..67c62374446c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -90,6 +90,7 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 #define MAY_CHDIR		0x00000040
 /* called from RCU mode, don't block */
 #define MAY_NOT_BLOCK		0x00000080
+#define MAY_CHROOT		0x00000100
 
 /*
  * flags in file.f_mode.  Note that FMODE_READ and FMODE_WRITE must correspond
-- 
2.16.2


* [PATCH bpf-next v8 03/11] bpf: Add eBPF program subtype and is_valid_subtype() verifier
  2018-02-27  0:41 [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing Mickaël Salaün
  2018-02-27  0:41 ` [PATCH bpf-next v8 01/11] fs,security: Add a security blob to nameidata Mickaël Salaün
  2018-02-27  0:41 ` [PATCH bpf-next v8 02/11] fs,security: Add a new file access type: MAY_CHROOT Mickaël Salaün
@ 2018-02-27  0:41 ` Mickaël Salaün
  2018-02-27  0:41 ` [PATCH bpf-next v8 04/11] bpf,landlock: Define an eBPF program type for Landlock hooks Mickaël Salaün
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 55+ messages in thread
From: Mickaël Salaün @ 2018-02-27  0:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev

The goal of the program subtype is to allow different static, fine-grained
verifications for a single program type.

The struct bpf_verifier_ops gets a new optional function:
is_valid_subtype(). This new verifier is called at the beginning of the
eBPF program verification to check if the (optional) program subtype is
valid.
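
The dispatch rule added to bpf_check() can be summarized with the following
user-space sketch. The struct and function names are simplified stand-ins
(hypothetical, not the kernel definitions); only the decision logic is meant
to match the hunk in kernel/bpf/verifier.c.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_prog_extra. */
struct prog_extra {
	unsigned int type;
};

/* Hypothetical stand-in for struct bpf_verifier_ops. */
struct verifier_ops {
	bool (*is_valid_subtype)(struct prog_extra *extra); /* optional */
};

/* Returns 0 if verification may continue, -EINVAL otherwise. */
static int check_subtype(const struct verifier_ops *ops,
			 struct prog_extra *extra)
{
	if (ops->is_valid_subtype)
		return ops->is_valid_subtype(extra) ? 0 : -EINVAL;
	/* No callback: reject any subtype the program type cannot handle. */
	return extra ? -EINVAL : 0;
}

/* Hypothetical callback: requires a subtype with a non-zero type. */
static bool demo_is_valid_subtype(struct prog_extra *extra)
{
	return extra && extra->type != 0;
}
```

Note that when the callback exists it is also called with a NULL extra, so
a program type can decide for itself whether a subtype is mandatory.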

The struct bpf_prog_ops gets a new optional function: put_extra(). This
may be used to release the data referenced by struct bpf_prog_extra.

For now, only Landlock eBPF programs are using a program subtype (see
next commits) but this could be used by other program types in the
future.
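
The size handling in bpf_prog_load() is what should keep the subtype
extensible: a caller may pass a larger structure than the kernel knows, as
long as the unknown tail is zero. Below is a minimal user-space sketch of
that rule. known_subtype and copy_subtype() are hypothetical names, and the
in-kernel code uses check_uarg_tail_zero() and copy_from_user() instead.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the subtype layout known to the kernel. */
struct known_subtype {
	unsigned int type;
	unsigned long long triggers;
};

/*
 * Copy at most sizeof(*dst) bytes from src into a zeroed dst. A source
 * larger than dst is accepted only if every extra byte is zero. A NULL
 * source with a non-zero size is rejected. Returns 0 or a negative errno.
 */
static int copy_subtype(struct known_subtype *dst, const void *src,
			size_t src_size)
{
	const unsigned char *bytes = src;
	const size_t known = sizeof(*dst);
	size_t i;

	memset(dst, 0, known);
	if (!src)
		return src_size ? -EINVAL : 0;
	for (i = known; i < src_size; i++) {
		if (bytes[i])
			return -E2BIG;
	}
	memcpy(dst, src, src_size < known ? src_size : known);
	return 0;
}
```

A shorter source leaves the remaining fields zeroed, which mirrors the
kzalloc() followed by a copy of min(size) bytes in the patch.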

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Link: https://lkml.kernel.org/r/20160827205559.GA43880@ast-mbp.thefacebook.com
---

Changes since v7:
* rename LANDLOCK_SUBTYPE_* to LANDLOCK_*
* move subtype in bpf_prog_aux and use only one bit for has_subtype
  (suggested by Alexei Starovoitov)
* wrap the prog_subtype with a prog_extra to be able to reference kernel
  pointers:
  * add an optional put_extra() function to struct bpf_prog_ops to be
    able to free the pointed data
  * replace all the prog_subtype with prog_extra in the struct
    bpf_verifier_ops functions
* remove the ABI field (requested by Alexei Starovoitov)
* rename subtype fields

Changes since v6:
* rename Landlock version to ABI to better reflect its purpose
* fix unsigned integer checks
* fix pointer cast
* constify pointers
* rebase

Changes since v5:
* use a prog_subtype pointer and make it future-proof
* add subtype test
* constify bpf_load_program()'s subtype argument
* cleanup subtype initialization
* rebase

Changes since v4:
* replace the "status" field with "version" (more generic)
* replace the "access" field with "ability" (less confusing)

Changes since v3:
* remove the "origin" field
* add an "option" field
* cleanup comments
---
 include/linux/bpf.h                         | 17 ++++++-
 include/uapi/linux/bpf.h                    | 11 +++++
 kernel/bpf/cgroup.c                         |  6 ++-
 kernel/bpf/core.c                           |  5 +-
 kernel/bpf/syscall.c                        | 35 ++++++++++++-
 kernel/bpf/verifier.c                       | 19 ++++++--
 kernel/trace/bpf_trace.c                    | 15 ++++--
 net/core/filter.c                           | 76 ++++++++++++++++++-----------
 samples/bpf/bpf_load.c                      |  3 +-
 samples/bpf/cookie_uid_helper_example.c     |  2 +-
 samples/bpf/fds_example.c                   |  2 +-
 samples/bpf/sock_example.c                  |  3 +-
 samples/bpf/test_cgrp2_attach.c             |  2 +-
 samples/bpf/test_cgrp2_attach2.c            |  4 +-
 samples/bpf/test_cgrp2_sock.c               |  2 +-
 tools/include/uapi/linux/bpf.h              | 11 +++++
 tools/lib/bpf/bpf.c                         | 15 ++++--
 tools/lib/bpf/bpf.h                         |  8 +--
 tools/lib/bpf/libbpf.c                      |  5 +-
 tools/perf/tests/bpf.c                      |  2 +-
 tools/testing/selftests/bpf/test_align.c    |  2 +-
 tools/testing/selftests/bpf/test_tag.c      |  2 +-
 tools/testing/selftests/bpf/test_verifier.c | 18 ++++++-
 23 files changed, 200 insertions(+), 65 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 66df387106de..377b2f3519f3 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -200,26 +200,38 @@ bpf_ctx_record_field_size(struct bpf_insn_access_aux *aux, u32 size)
 	aux->ctx_field_size = size;
 }
 
+/* specific data per program type */
+struct bpf_prog_extra {
+	union bpf_prog_subtype subtype;
+	union {
+		struct bpf_prog *previous;
+	} landlock_hook;
+};
+
 struct bpf_prog_ops {
 	int (*test_run)(struct bpf_prog *prog, const union bpf_attr *kattr,
 			union bpf_attr __user *uattr);
+	void (*put_extra)(struct bpf_prog_extra *prog_extra);
 };
 
 struct bpf_verifier_ops {
 	/* return eBPF function prototype for verification */
-	const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id);
+	const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id,
+				      const struct bpf_prog_extra *prog_extra);
 
 	/* return true if 'size' wide access at offset 'off' within bpf_context
 	 * with 'type' (read or write) is allowed
 	 */
 	bool (*is_valid_access)(int off, int size, enum bpf_access_type type,
-				struct bpf_insn_access_aux *info);
+				struct bpf_insn_access_aux *info,
+				const struct bpf_prog_extra *prog_extra);
 	int (*gen_prologue)(struct bpf_insn *insn, bool direct_write,
 			    const struct bpf_prog *prog);
 	u32 (*convert_ctx_access)(enum bpf_access_type type,
 				  const struct bpf_insn *src,
 				  struct bpf_insn *dst,
 				  struct bpf_prog *prog, u32 *target_size);
+	bool (*is_valid_subtype)(struct bpf_prog_extra *prog_extra);
 };
 
 struct bpf_prog_offload_ops {
@@ -264,6 +276,7 @@ struct bpf_prog_aux {
 		struct work_struct work;
 		struct rcu_head	rcu;
 	};
+	struct bpf_prog_extra *extra;
 };
 
 struct bpf_array {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index db6bdc375126..87885c92ca78 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -231,6 +231,15 @@ enum bpf_attach_type {
 #define BPF_F_RDONLY		(1U << 3)
 #define BPF_F_WRONLY		(1U << 4)
 
+union bpf_prog_subtype {
+	struct {
+		__u32		type; /* enum landlock_hook_type */
+		__aligned_u64	triggers; /* LANDLOCK_TRIGGER_* */
+		__aligned_u64	options; /* LANDLOCK_OPTION_* */
+		__u32		previous; /* chained program FD */
+	} landlock_hook;
+} __attribute__((aligned(8)));
+
 union bpf_attr {
 	struct { /* anonymous struct used by BPF_MAP_CREATE command */
 		__u32	map_type;	/* one of enum bpf_map_type */
@@ -270,6 +279,8 @@ union bpf_attr {
 		__u32		prog_flags;
 		char		prog_name[BPF_OBJ_NAME_LEN];
 		__u32		prog_ifindex;	/* ifindex of netdev to prep for */
+		__aligned_u64	prog_subtype;	/* bpf_prog_subtype address */
+		__u32		prog_subtype_size;
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index c1c0b60d3f2f..77d6d25d2f50 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -545,7 +545,8 @@ int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
 EXPORT_SYMBOL(__cgroup_bpf_check_dev_permission);
 
 static const struct bpf_func_proto *
-cgroup_dev_func_proto(enum bpf_func_id func_id)
+cgroup_dev_func_proto(enum bpf_func_id func_id,
+		      const struct bpf_prog_extra *prog_extra)
 {
 	switch (func_id) {
 	case BPF_FUNC_map_lookup_elem:
@@ -566,7 +567,8 @@ cgroup_dev_func_proto(enum bpf_func_id func_id)
 
 static bool cgroup_dev_is_valid_access(int off, int size,
 				       enum bpf_access_type type,
-				       struct bpf_insn_access_aux *info)
+				       struct bpf_insn_access_aux *info,
+				       const struct bpf_prog_extra *prog_extra)
 {
 	const int size_default = sizeof(__u32);
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 29ca9208dcfa..e4567f7434af 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -142,7 +142,10 @@ struct bpf_prog *bpf_prog_realloc(struct bpf_prog *fp_old, unsigned int size,
 
 void __bpf_prog_free(struct bpf_prog *fp)
 {
-	kfree(fp->aux);
+	if (fp->aux) {
+		kfree(fp->aux->extra);
+		kfree(fp->aux);
+	}
 	vfree(fp);
 }
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e24aa3241387..90d7de6d7393 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -992,6 +992,8 @@ static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
 	if (atomic_dec_and_test(&prog->aux->refcnt)) {
 		int i;
 
+		if (prog->aux->ops->put_extra && prog->aux->extra)
+			prog->aux->ops->put_extra(prog->aux->extra);
 		trace_bpf_prog_put_rcu(prog);
 		/* bpf_prog_free_id() must be called first */
 		bpf_prog_free_id(prog, do_idr_lock);
@@ -1168,7 +1170,7 @@ struct bpf_prog *bpf_prog_get_type_dev(u32 ufd, enum bpf_prog_type type,
 EXPORT_SYMBOL_GPL(bpf_prog_get_type_dev);
 
 /* last field in 'union bpf_attr' used by this command */
-#define	BPF_PROG_LOAD_LAST_FIELD prog_ifindex
+#define	BPF_PROG_LOAD_LAST_FIELD prog_subtype_size
 
 static int bpf_prog_load(union bpf_attr *attr)
 {
@@ -1249,6 +1251,35 @@ static int bpf_prog_load(union bpf_attr *attr)
 	if (err)
 		goto free_prog;
 
+	/* copy eBPF program subtype from user space */
+	if (attr->prog_subtype) {
+		u32 size;
+
+		err = check_uarg_tail_zero(u64_to_user_ptr(attr->prog_subtype),
+					   sizeof(prog->aux->extra->subtype),
+					   attr->prog_subtype_size);
+		if (err)
+			goto free_prog;
+		size = min_t(u32, attr->prog_subtype_size,
+			     sizeof(prog->aux->extra->subtype));
+
+		prog->aux->extra = kzalloc(sizeof(*prog->aux->extra),
+					   GFP_KERNEL | GFP_USER);
+		if (!prog->aux->extra) {
+			err = -ENOMEM;
+			goto free_prog;
+		}
+		if (copy_from_user(&prog->aux->extra->subtype,
+				   u64_to_user_ptr(attr->prog_subtype), size)
+				   != 0) {
+			err = -EFAULT;
+			goto free_prog;
+		}
+	} else if (attr->prog_subtype_size != 0) {
+		err = -EINVAL;
+		goto free_prog;
+	}
+
 	/* run eBPF verifier */
 	err = bpf_check(&prog, attr);
 	if (err < 0)
@@ -1281,6 +1312,8 @@ static int bpf_prog_load(union bpf_attr *attr)
 	return err;
 
 free_used_maps:
+	if (prog->aux->ops->put_extra && prog->aux->extra)
+		prog->aux->ops->put_extra(prog->aux->extra);
 	free_used_maps(prog->aux);
 free_prog:
 	bpf_prog_uncharge_memlock(prog);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3c74b163eaeb..ed0905338bb6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1310,7 +1310,8 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off,
 	};
 
 	if (env->ops->is_valid_access &&
-	    env->ops->is_valid_access(off, size, t, &info)) {
+	    env->ops->is_valid_access(off, size, t, &info,
+				      env->prog->aux->extra)) {
 		/* A non zero info.ctx_field_size indicates that this field is a
 		 * candidate for later verifier transformation to load the whole
 		 * field and then apply a mask when accessed with a narrower
@@ -2325,7 +2326,8 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 	}
 
 	if (env->ops->get_func_proto)
-		fn = env->ops->get_func_proto(func_id);
+		fn = env->ops->get_func_proto(func_id, env->prog->aux->extra);
+
 	if (!fn) {
 		verbose(env, "unknown func %s#%d\n", func_id_name(func_id),
 			func_id);
@@ -5546,7 +5548,7 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 			insn      = new_prog->insnsi + i + delta;
 		}
 patch_call_imm:
-		fn = env->ops->get_func_proto(insn->imm);
+		fn = env->ops->get_func_proto(insn->imm, prog->aux->extra);
 		/* all functions that have prototype and verifier allowed
 		 * programs to call them, must be real in-kernel functions
 		 */
@@ -5633,6 +5635,17 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
 	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
 		env->strict_alignment = true;
 
+	if (env->ops->is_valid_subtype) {
+		if (!env->ops->is_valid_subtype(env->prog->aux->extra)) {
+			ret = -EINVAL;
+			goto err_unlock;
+		}
+	} else if (env->prog->aux->extra) {
+		/* do not accept a subtype if the program does not handle it */
+		ret = -EINVAL;
+		goto err_unlock;
+	}
+
 	if (bpf_prog_is_dev_bound(env->prog->aux)) {
 		ret = bpf_prog_offload_verifier_prep(env);
 		if (ret)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index fc2838ac8b78..bb5ada28c0f6 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -568,7 +568,8 @@ static const struct bpf_func_proto *tracing_func_proto(enum bpf_func_id func_id)
 	}
 }
 
-static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func_id)
+static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func_id,
+		const struct bpf_prog_extra *prog_extra)
 {
 	switch (func_id) {
 	case BPF_FUNC_perf_event_output:
@@ -588,7 +589,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
 
 /* bpf+kprobe programs can access fields of 'struct pt_regs' */
 static bool kprobe_prog_is_valid_access(int off, int size, enum bpf_access_type type,
-					struct bpf_insn_access_aux *info)
+					struct bpf_insn_access_aux *info,
+					const struct bpf_prog_extra *prog_extra)
 {
 	if (off < 0 || off >= sizeof(struct pt_regs))
 		return false;
@@ -687,7 +689,8 @@ static const struct bpf_func_proto bpf_perf_prog_read_value_proto_tp = {
          .arg3_type      = ARG_CONST_SIZE,
 };
 
-static const struct bpf_func_proto *tp_prog_func_proto(enum bpf_func_id func_id)
+static const struct bpf_func_proto *tp_prog_func_proto(enum bpf_func_id func_id,
+		const struct bpf_prog_extra *prog_extra)
 {
 	switch (func_id) {
 	case BPF_FUNC_perf_event_output:
@@ -702,7 +705,8 @@ static const struct bpf_func_proto *tp_prog_func_proto(enum bpf_func_id func_id)
 }
 
 static bool tp_prog_is_valid_access(int off, int size, enum bpf_access_type type,
-				    struct bpf_insn_access_aux *info)
+				    struct bpf_insn_access_aux *info,
+				    const struct bpf_prog_extra *prog_extra)
 {
 	if (off < sizeof(void *) || off >= PERF_MAX_TRACE_SIZE)
 		return false;
@@ -724,7 +728,8 @@ const struct bpf_prog_ops tracepoint_prog_ops = {
 };
 
 static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type,
-				    struct bpf_insn_access_aux *info)
+				    struct bpf_insn_access_aux *info,
+				    const struct bpf_prog_extra *prog_extra)
 {
 	const int size_sp = FIELD_SIZEOF(struct bpf_perf_event_data,
 					 sample_period);
diff --git a/net/core/filter.c b/net/core/filter.c
index 08ab4c65a998..8c01140db592 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3403,7 +3403,8 @@ static const struct bpf_func_proto bpf_sock_ops_cb_flags_set_proto = {
 };
 
 static const struct bpf_func_proto *
-bpf_base_func_proto(enum bpf_func_id func_id)
+bpf_base_func_proto(enum bpf_func_id func_id,
+		    const struct bpf_prog_extra *prog_extra)
 {
 	switch (func_id) {
 	case BPF_FUNC_map_lookup_elem:
@@ -3431,7 +3432,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 }
 
 static const struct bpf_func_proto *
-sock_filter_func_proto(enum bpf_func_id func_id)
+sock_filter_func_proto(enum bpf_func_id func_id,
+		       const struct bpf_prog_extra *prog_extra)
 {
 	switch (func_id) {
 	/* inet and inet6 sockets are created in a process
@@ -3440,12 +3442,13 @@ sock_filter_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_get_current_uid_gid:
 		return &bpf_get_current_uid_gid_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog_extra);
 	}
 }
 
 static const struct bpf_func_proto *
-sk_filter_func_proto(enum bpf_func_id func_id)
+sk_filter_func_proto(enum bpf_func_id func_id,
+		     const struct bpf_prog_extra *prog_extra)
 {
 	switch (func_id) {
 	case BPF_FUNC_skb_load_bytes:
@@ -3455,12 +3458,13 @@ sk_filter_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_get_socket_uid:
 		return &bpf_get_socket_uid_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog_extra);
 	}
 }
 
 static const struct bpf_func_proto *
-tc_cls_act_func_proto(enum bpf_func_id func_id)
+tc_cls_act_func_proto(enum bpf_func_id func_id,
+		      const struct bpf_prog_extra *prog_extra)
 {
 	switch (func_id) {
 	case BPF_FUNC_skb_store_bytes:
@@ -3522,12 +3526,13 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_get_socket_uid:
 		return &bpf_get_socket_uid_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog_extra);
 	}
 }
 
 static const struct bpf_func_proto *
-xdp_func_proto(enum bpf_func_id func_id)
+xdp_func_proto(enum bpf_func_id func_id,
+	       const struct bpf_prog_extra *prog_extra)
 {
 	switch (func_id) {
 	case BPF_FUNC_perf_event_output:
@@ -3545,12 +3550,13 @@ xdp_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_redirect_map:
 		return &bpf_xdp_redirect_map_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog_extra);
 	}
 }
 
 static const struct bpf_func_proto *
-lwt_inout_func_proto(enum bpf_func_id func_id)
+lwt_inout_func_proto(enum bpf_func_id func_id,
+		     const struct bpf_prog_extra *prog_extra)
 {
 	switch (func_id) {
 	case BPF_FUNC_skb_load_bytes:
@@ -3572,12 +3578,13 @@ lwt_inout_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_skb_under_cgroup:
 		return &bpf_skb_under_cgroup_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog_extra);
 	}
 }
 
 static const struct bpf_func_proto *
-	sock_ops_func_proto(enum bpf_func_id func_id)
+	sock_ops_func_proto(enum bpf_func_id func_id,
+			    const struct bpf_prog_extra *prog_extra)
 {
 	switch (func_id) {
 	case BPF_FUNC_setsockopt:
@@ -3589,11 +3596,13 @@ static const struct bpf_func_proto *
 	case BPF_FUNC_sock_map_update:
 		return &bpf_sock_map_update_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog_extra);
 	}
 }
 
-static const struct bpf_func_proto *sk_skb_func_proto(enum bpf_func_id func_id)
+static const struct bpf_func_proto *
+sk_skb_func_proto(enum bpf_func_id func_id,
+		  const struct bpf_prog_extra *prog_extra)
 {
 	switch (func_id) {
 	case BPF_FUNC_skb_store_bytes:
@@ -3613,12 +3622,13 @@ static const struct bpf_func_proto *sk_skb_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_sk_redirect_map:
 		return &bpf_sk_redirect_map_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog_extra);
 	}
 }
 
 static const struct bpf_func_proto *
-lwt_xmit_func_proto(enum bpf_func_id func_id)
+lwt_xmit_func_proto(enum bpf_func_id func_id,
+		    const struct bpf_prog_extra *prog_extra)
 {
 	switch (func_id) {
 	case BPF_FUNC_skb_get_tunnel_key:
@@ -3648,12 +3658,13 @@ lwt_xmit_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_set_hash_invalid:
 		return &bpf_set_hash_invalid_proto;
 	default:
-		return lwt_inout_func_proto(func_id);
+		return lwt_inout_func_proto(func_id, prog_extra);
 	}
 }
 
 static bool bpf_skb_is_valid_access(int off, int size, enum bpf_access_type type,
-				    struct bpf_insn_access_aux *info)
+				    struct bpf_insn_access_aux *info,
+				    const struct bpf_prog_extra *prog_extra)
 {
 	const int size_default = sizeof(__u32);
 
@@ -3696,7 +3707,8 @@ static bool bpf_skb_is_valid_access(int off, int size, enum bpf_access_type type
 
 static bool sk_filter_is_valid_access(int off, int size,
 				      enum bpf_access_type type,
-				      struct bpf_insn_access_aux *info)
+				      struct bpf_insn_access_aux *info,
+				      const struct bpf_prog_extra *prog_extra)
 {
 	switch (off) {
 	case bpf_ctx_range(struct __sk_buff, tc_classid):
@@ -3716,12 +3728,13 @@ static bool sk_filter_is_valid_access(int off, int size,
 		}
 	}
 
-	return bpf_skb_is_valid_access(off, size, type, info);
+	return bpf_skb_is_valid_access(off, size, type, info, prog_extra);
 }
 
 static bool lwt_is_valid_access(int off, int size,
 				enum bpf_access_type type,
-				struct bpf_insn_access_aux *info)
+				struct bpf_insn_access_aux *info,
+				const struct bpf_prog_extra *prog_extra)
 {
 	switch (off) {
 	case bpf_ctx_range(struct __sk_buff, tc_classid):
@@ -3750,12 +3763,13 @@ static bool lwt_is_valid_access(int off, int size,
 		break;
 	}
 
-	return bpf_skb_is_valid_access(off, size, type, info);
+	return bpf_skb_is_valid_access(off, size, type, info, prog_extra);
 }
 
 static bool sock_filter_is_valid_access(int off, int size,
 					enum bpf_access_type type,
-					struct bpf_insn_access_aux *info)
+					struct bpf_insn_access_aux *info,
+					const struct bpf_prog_extra *prog_extra)
 {
 	if (type == BPF_WRITE) {
 		switch (off) {
@@ -3826,7 +3840,8 @@ static int tc_cls_act_prologue(struct bpf_insn *insn_buf, bool direct_write,
 
 static bool tc_cls_act_is_valid_access(int off, int size,
 				       enum bpf_access_type type,
-				       struct bpf_insn_access_aux *info)
+				       struct bpf_insn_access_aux *info,
+				       const struct bpf_prog_extra *prog_extra)
 {
 	if (type == BPF_WRITE) {
 		switch (off) {
@@ -3855,7 +3870,7 @@ static bool tc_cls_act_is_valid_access(int off, int size,
 		return false;
 	}
 
-	return bpf_skb_is_valid_access(off, size, type, info);
+	return bpf_skb_is_valid_access(off, size, type, info, prog_extra);
 }
 
 static bool __is_valid_xdp_access(int off, int size)
@@ -3872,7 +3887,8 @@ static bool __is_valid_xdp_access(int off, int size)
 
 static bool xdp_is_valid_access(int off, int size,
 				enum bpf_access_type type,
-				struct bpf_insn_access_aux *info)
+				struct bpf_insn_access_aux *info,
+				const struct bpf_prog_extra *prog_extra)
 {
 	if (type == BPF_WRITE)
 		return false;
@@ -3904,7 +3920,8 @@ EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
 
 static bool sock_ops_is_valid_access(int off, int size,
 				     enum bpf_access_type type,
-				     struct bpf_insn_access_aux *info)
+				     struct bpf_insn_access_aux *info,
+				     const struct bpf_prog_extra *prog_extra)
 {
 	const int size_default = sizeof(__u32);
 
@@ -3950,7 +3967,8 @@ static int sk_skb_prologue(struct bpf_insn *insn_buf, bool direct_write,
 
 static bool sk_skb_is_valid_access(int off, int size,
 				   enum bpf_access_type type,
-				   struct bpf_insn_access_aux *info)
+				   struct bpf_insn_access_aux *info,
+				   const struct bpf_prog_extra *prog_extra)
 {
 	switch (off) {
 	case bpf_ctx_range(struct __sk_buff, tc_classid):
@@ -3979,7 +3997,7 @@ static bool sk_skb_is_valid_access(int off, int size,
 		break;
 	}
 
-	return bpf_skb_is_valid_access(off, size, type, info);
+	return bpf_skb_is_valid_access(off, size, type, info, prog_extra);
 }
 
 static u32 bpf_convert_ctx_access(enum bpf_access_type type,
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 69806d74fa53..5bb37db6054b 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -72,6 +72,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 	char buf[256];
 	int fd, efd, err, id;
 	struct perf_event_attr attr = {};
+	union bpf_prog_subtype *st = NULL;
 
 	attr.type = PERF_TYPE_TRACEPOINT;
 	attr.sample_type = PERF_SAMPLE_RAW;
@@ -102,7 +103,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 	}
 
 	fd = bpf_load_program(prog_type, prog, insns_cnt, license, kern_version,
-			      bpf_log_buf, BPF_LOG_BUF_SIZE);
+			      bpf_log_buf, BPF_LOG_BUF_SIZE, st);
 	if (fd < 0) {
 		printf("bpf_load_program() err=%d\n%s", errno, bpf_log_buf);
 		return -1;
diff --git a/samples/bpf/cookie_uid_helper_example.c b/samples/bpf/cookie_uid_helper_example.c
index 9d751e209f31..df457c07d35d 100644
--- a/samples/bpf/cookie_uid_helper_example.c
+++ b/samples/bpf/cookie_uid_helper_example.c
@@ -159,7 +159,7 @@ static void prog_load(void)
 	};
 	prog_fd = bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog,
 					ARRAY_SIZE(prog), "GPL", 0,
-					log_buf, sizeof(log_buf));
+					log_buf, sizeof(log_buf), NULL);
 	if (prog_fd < 0)
 		error(1, errno, "failed to load prog\n%s\n", log_buf);
 }
diff --git a/samples/bpf/fds_example.c b/samples/bpf/fds_example.c
index e29bd52ff9e8..0f4f5f6a9f9f 100644
--- a/samples/bpf/fds_example.c
+++ b/samples/bpf/fds_example.c
@@ -62,7 +62,7 @@ static int bpf_prog_create(const char *object)
 	} else {
 		return bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER,
 					insns, insns_cnt, "GPL", 0,
-					bpf_log_buf, BPF_LOG_BUF_SIZE);
+					bpf_log_buf, BPF_LOG_BUF_SIZE, NULL);
 	}
 }
 
diff --git a/samples/bpf/sock_example.c b/samples/bpf/sock_example.c
index 6fc6e193ef1b..3778f66deb76 100644
--- a/samples/bpf/sock_example.c
+++ b/samples/bpf/sock_example.c
@@ -60,7 +60,8 @@ static int test_sock(void)
 	size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
 
 	prog_fd = bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog, insns_cnt,
-				   "GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE);
+				   "GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE,
+				   NULL);
 	if (prog_fd < 0) {
 		printf("failed to load prog '%s'\n", strerror(errno));
 		goto cleanup;
diff --git a/samples/bpf/test_cgrp2_attach.c b/samples/bpf/test_cgrp2_attach.c
index 4bfcaf93fcf3..f8a91d2b7896 100644
--- a/samples/bpf/test_cgrp2_attach.c
+++ b/samples/bpf/test_cgrp2_attach.c
@@ -72,7 +72,7 @@ static int prog_load(int map_fd, int verdict)
 
 	return bpf_load_program(BPF_PROG_TYPE_CGROUP_SKB,
 				prog, insns_cnt, "GPL", 0,
-				bpf_log_buf, BPF_LOG_BUF_SIZE);
+				bpf_log_buf, BPF_LOG_BUF_SIZE, NULL);
 }
 
 static int usage(const char *argv0)
diff --git a/samples/bpf/test_cgrp2_attach2.c b/samples/bpf/test_cgrp2_attach2.c
index 1af412ec6007..16023240ce5d 100644
--- a/samples/bpf/test_cgrp2_attach2.c
+++ b/samples/bpf/test_cgrp2_attach2.c
@@ -45,7 +45,7 @@ static int prog_load(int verdict)
 
 	ret = bpf_load_program(BPF_PROG_TYPE_CGROUP_SKB,
 			       prog, insns_cnt, "GPL", 0,
-			       bpf_log_buf, BPF_LOG_BUF_SIZE);
+			       bpf_log_buf, BPF_LOG_BUF_SIZE, NULL);
 
 	if (ret < 0) {
 		log_err("Loading program");
@@ -229,7 +229,7 @@ static int prog_load_cnt(int verdict, int val)
 
 	ret = bpf_load_program(BPF_PROG_TYPE_CGROUP_SKB,
 			       prog, insns_cnt, "GPL", 0,
-			       bpf_log_buf, BPF_LOG_BUF_SIZE);
+			       bpf_log_buf, BPF_LOG_BUF_SIZE, NULL);
 
 	if (ret < 0) {
 		log_err("Loading program");
diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c
index e79594dd629b..b4b8d96f61e7 100644
--- a/samples/bpf/test_cgrp2_sock.c
+++ b/samples/bpf/test_cgrp2_sock.c
@@ -115,7 +115,7 @@ static int prog_load(__u32 idx, __u32 mark, __u32 prio)
 	insns_cnt /= sizeof(struct bpf_insn);
 
 	ret = bpf_load_program(BPF_PROG_TYPE_CGROUP_SOCK, prog, insns_cnt,
-				"GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE);
+				"GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE, NULL);
 
 	free(prog);
 
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index db6bdc375126..87885c92ca78 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -231,6 +231,15 @@ enum bpf_attach_type {
 #define BPF_F_RDONLY		(1U << 3)
 #define BPF_F_WRONLY		(1U << 4)
 
+union bpf_prog_subtype {
+	struct {
+		__u32		type; /* enum landlock_hook_type */
+		__aligned_u64	triggers; /* LANDLOCK_TRIGGER_* */
+		__aligned_u64	options; /* LANDLOCK_OPTION_* */
+		__u32		previous; /* chained program FD */
+	} landlock_hook;
+} __attribute__((aligned(8)));
+
 union bpf_attr {
 	struct { /* anonymous struct used by BPF_MAP_CREATE command */
 		__u32	map_type;	/* one of enum bpf_map_type */
@@ -270,6 +279,8 @@ union bpf_attr {
 		__u32		prog_flags;
 		char		prog_name[BPF_OBJ_NAME_LEN];
 		__u32		prog_ifindex;	/* ifindex of netdev to prep for */
+		__aligned_u64	prog_subtype;	/* bpf_prog_subtype address */
+		__u32		prog_subtype_size;
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 592a58a2b681..630060c71c5e 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -150,7 +150,8 @@ int bpf_load_program_name(enum bpf_prog_type type, const char *name,
 			  const struct bpf_insn *insns,
 			  size_t insns_cnt, const char *license,
 			  __u32 kern_version, char *log_buf,
-			  size_t log_buf_sz)
+			  size_t log_buf_sz,
+			  const union bpf_prog_subtype *subtype)
 {
 	int fd;
 	union bpf_attr attr;
@@ -166,6 +167,8 @@ int bpf_load_program_name(enum bpf_prog_type type, const char *name,
 	attr.log_level = 0;
 	attr.kern_version = kern_version;
 	memcpy(attr.prog_name, name, min(name_len, BPF_OBJ_NAME_LEN - 1));
+	attr.prog_subtype = ptr_to_u64(subtype);
+	attr.prog_subtype_size = subtype ? sizeof(*subtype) : 0;
 
 	fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
 	if (fd >= 0 || !log_buf || !log_buf_sz)
@@ -182,16 +185,18 @@ int bpf_load_program_name(enum bpf_prog_type type, const char *name,
 int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
 		     size_t insns_cnt, const char *license,
 		     __u32 kern_version, char *log_buf,
-		     size_t log_buf_sz)
+		     size_t log_buf_sz, const union bpf_prog_subtype *subtype)
 {
 	return bpf_load_program_name(type, NULL, insns, insns_cnt, license,
-				     kern_version, log_buf, log_buf_sz);
+				     kern_version, log_buf, log_buf_sz,
+				     subtype);
 }
 
 int bpf_verify_program(enum bpf_prog_type type, const struct bpf_insn *insns,
 		       size_t insns_cnt, int strict_alignment,
 		       const char *license, __u32 kern_version,
-		       char *log_buf, size_t log_buf_sz, int log_level)
+		       char *log_buf, size_t log_buf_sz, int log_level,
+		       const union bpf_prog_subtype *subtype)
 {
 	union bpf_attr attr;
 
@@ -205,6 +210,8 @@ int bpf_verify_program(enum bpf_prog_type type, const struct bpf_insn *insns,
 	attr.log_level = log_level;
 	log_buf[0] = 0;
 	attr.kern_version = kern_version;
+	attr.prog_subtype = ptr_to_u64(subtype);
+	attr.prog_subtype_size = subtype ? sizeof(*subtype) : 0;
 	attr.prog_flags = strict_alignment ? BPF_F_STRICT_ALIGNMENT : 0;
 
 	return sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 8d18fb73d7fb..25f58da59df3 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -47,15 +47,17 @@ int bpf_load_program_name(enum bpf_prog_type type, const char *name,
 			  const struct bpf_insn *insns,
 			  size_t insns_cnt, const char *license,
 			  __u32 kern_version, char *log_buf,
-			  size_t log_buf_sz);
+			  size_t log_buf_sz,
+			  const union bpf_prog_subtype *subtype);
 int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
 		     size_t insns_cnt, const char *license,
 		     __u32 kern_version, char *log_buf,
-		     size_t log_buf_sz);
+		     size_t log_buf_sz, const union bpf_prog_subtype *subtype);
 int bpf_verify_program(enum bpf_prog_type type, const struct bpf_insn *insns,
 		       size_t insns_cnt, int strict_alignment,
 		       const char *license, __u32 kern_version,
-		       char *log_buf, size_t log_buf_sz, int log_level);
+		       char *log_buf, size_t log_buf_sz, int log_level,
+		       const union bpf_prog_subtype *subtype);
 
 int bpf_map_update_elem(int fd, const void *key, const void *value,
 			__u64 flags);
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 97073d649c1a..615860db6224 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1175,7 +1175,8 @@ load_program(enum bpf_prog_type type, const char *name, struct bpf_insn *insns,
 		pr_warning("Alloc log buffer for bpf loader error, continue without log\n");
 
 	ret = bpf_load_program_name(type, name, insns, insns_cnt, license,
-				    kern_version, log_buf, BPF_LOG_BUF_SIZE);
+				    kern_version, log_buf, BPF_LOG_BUF_SIZE,
+				    NULL);
 
 	if (ret >= 0) {
 		*pfd = ret;
@@ -1202,7 +1203,7 @@ load_program(enum bpf_prog_type type, const char *name, struct bpf_insn *insns,
 
 			fd = bpf_load_program_name(BPF_PROG_TYPE_KPROBE, name,
 						   insns, insns_cnt, license,
-						   kern_version, NULL, 0);
+						   kern_version, NULL, 0, NULL);
 			if (fd >= 0) {
 				close(fd);
 				ret = -LIBBPF_ERRNO__PROGTYPE;
diff --git a/tools/perf/tests/bpf.c b/tools/perf/tests/bpf.c
index e8399beca62b..2635de920a71 100644
--- a/tools/perf/tests/bpf.c
+++ b/tools/perf/tests/bpf.c
@@ -306,7 +306,7 @@ static int check_env(void)
 
 	err = bpf_load_program(BPF_PROG_TYPE_KPROBE, insns,
 			       sizeof(insns) / sizeof(insns[0]),
-			       license, kver_int, NULL, 0);
+			       license, kver_int, NULL, 0, NULL);
 	if (err < 0) {
 		pr_err("Missing basic BPF support, skip this test: %s\n",
 		       strerror(errno));
diff --git a/tools/testing/selftests/bpf/test_align.c b/tools/testing/selftests/bpf/test_align.c
index ff8bd7e3e50c..ed638ce9932e 100644
--- a/tools/testing/selftests/bpf/test_align.c
+++ b/tools/testing/selftests/bpf/test_align.c
@@ -625,7 +625,7 @@ static int do_test_single(struct bpf_align_test *test)
 	prog_len = probe_filter_length(prog);
 	fd_prog = bpf_verify_program(prog_type ? : BPF_PROG_TYPE_SOCKET_FILTER,
 				     prog, prog_len, 1, "GPL", 0,
-				     bpf_vlog, sizeof(bpf_vlog), 2);
+				     bpf_vlog, sizeof(bpf_vlog), 2, NULL);
 	if (fd_prog < 0 && test->result != REJECT) {
 		printf("Failed to load program.\n");
 		printf("%s", bpf_vlog);
diff --git a/tools/testing/selftests/bpf/test_tag.c b/tools/testing/selftests/bpf/test_tag.c
index 8b201895c569..0b58e746af0c 100644
--- a/tools/testing/selftests/bpf/test_tag.c
+++ b/tools/testing/selftests/bpf/test_tag.c
@@ -58,7 +58,7 @@ static int bpf_try_load_prog(int insns, int fd_map,
 
 	bpf_filler(insns, fd_map);
 	fd_prog = bpf_load_program(BPF_PROG_TYPE_SCHED_CLS, prog, insns, "", 0,
-				   NULL, 0);
+				   NULL, 0, NULL);
 	assert(fd_prog > 0);
 	if (fd_map > 0)
 		bpf_filler(insns, 0);
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index c987d3a2426f..3c24a5a7bafc 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -77,6 +77,8 @@ struct bpf_test {
 	} result, result_unpriv;
 	enum bpf_prog_type prog_type;
 	uint8_t flags;
+	bool has_prog_subtype;
+	union bpf_prog_subtype prog_subtype;
 };
 
 /* Note we want this to be 64 bit aligned so that the end of our array is
@@ -11228,7 +11230,16 @@ static struct bpf_test tests[] = {
 		.result = ACCEPT,
 		.retval = 2,
 	},
-
+	{
+		"superfluous subtype",
+		.insns = {
+			BPF_MOV32_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "",
+		.result = REJECT,
+		.has_prog_subtype = true,
+	},
 };
 
 static int probe_filter_length(const struct bpf_insn *fp)
@@ -11346,6 +11357,8 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
 	const char *expected_err;
 	uint32_t retval;
 	int i, err;
+	union bpf_prog_subtype *prog_subtype =
+		test->has_prog_subtype ? &test->prog_subtype : NULL;
 
 	for (i = 0; i < MAX_NR_MAPS; i++)
 		map_fds[i] = -1;
@@ -11354,7 +11367,8 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
 
 	fd_prog = bpf_verify_program(prog_type ? : BPF_PROG_TYPE_SOCKET_FILTER,
 				     prog, prog_len, test->flags & F_LOAD_WITH_STRICT_ALIGNMENT,
-				     "GPL", 0, bpf_vlog, sizeof(bpf_vlog), 1);
+				     "GPL", 0, bpf_vlog, sizeof(bpf_vlog), 1,
+				     prog_subtype);
 
 	expected_ret = unpriv && test->result_unpriv != UNDEF ?
 		       test->result_unpriv : test->result;
-- 
2.16.2


* [PATCH bpf-next v8 04/11] bpf,landlock: Define an eBPF program type for Landlock hooks
  2018-02-27  0:41 [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing Mickaël Salaün
                   ` (2 preceding siblings ...)
  2018-02-27  0:41 ` [PATCH bpf-next v8 03/11] bpf: Add eBPF program subtype and is_valid_subtype() verifier Mickaël Salaün
@ 2018-02-27  0:41 ` Mickaël Salaün
  2018-02-27  0:41 ` [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy Mickaël Salaün
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 55+ messages in thread
From: Mickaël Salaün @ 2018-02-27  0:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev

Add a new type of eBPF program used by Landlock hooks. This type of
program can be chained with the same eBPF program type (according to
subtype rules). A state can be kept with a value available in the
program's context (e.g. named "cookie" for Landlock programs).

This new BPF program type will be registered with the Landlock LSM
initialization.

Add an initial Landlock Kconfig and update the MAINTAINERS file.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v7:
* cosmetic fixes
* rename LANDLOCK_SUBTYPE_* to LANDLOCK_*
* cleanup UAPI definitions and move them from bpf.h to landlock.h
  (suggested by Alexei Starovoitov)
* disable Landlock by default (suggested by Alexei Starovoitov)
* rename BPF_PROG_TYPE_LANDLOCK_{RULE,HOOK}
* update the Kconfig
* update the MAINTAINERS file
* replace the IOCTL, LOCK and FCNTL events with FS_PICK, FS_WALK and
  FS_GET hook types
* add the ability to chain programs with an eBPF program file descriptor
  (i.e. the "previous" field in a Landlock subtype) and keep a state
  with a "cookie" value available from the context
* add a "triggers" subtype bitfield to match specific actions (e.g.
  append, chdir, read...)
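
To illustrate the "triggers" bitfield check, here is a minimal user-space
sketch of the validation done for a fs_pick subtype in
bpf_landlock_is_valid_subtype().  The three trigger values are copied from
landlock.h in this patch.  FS_PICK_TRIGGER_MASK and fs_pick_triggers_ok()
are names made up for this example (the kernel side uses
_LANDLOCK_TRIGGER_FS_PICK_MASK from common.h).

```c
#include <stdbool.h>
#include <stdint.h>

/* Three of the trigger bits defined in landlock.h by this patch. */
#define LANDLOCK_TRIGGER_FS_PICK_APPEND	(1ULL << 0)
#define LANDLOCK_TRIGGER_FS_PICK_READ	(1ULL << 14)
#define LANDLOCK_TRIGGER_FS_PICK_WRITE	(1ULL << 23)

/*
 * Hypothetical name mirroring _LANDLOCK_TRIGGER_FS_PICK_MASK: every bit
 * up to and including the last defined trigger (WRITE).
 */
#define FS_PICK_TRIGGER_MASK \
	((LANDLOCK_TRIGGER_FS_PICK_WRITE << 1ULL) - 1)

/*
 * Hypothetical helper: a fs_pick subtype needs at least one trigger and
 * must not set any bit outside the known range.
 */
static bool fs_pick_triggers_ok(uint64_t triggers)
{
	if (!triggers)
		return false;
	return !(triggers & ~FS_PICK_TRIGGER_MASK);
}
```

With these definitions, READ | WRITE is accepted, while an empty bitfield
or an unknown bit such as (1ULL << 24) is rejected.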

Changes since v6:
* add 3 more sub-events: IOCTL, LOCK, FCNTL
  https://lkml.kernel.org/r/2fbc99a6-f190-f335-bd14-04bdeed35571@digikod.net
* rename LANDLOCK_VERSION to LANDLOCK_ABI to better reflect its purpose,
  and move it from landlock.h to common.h
* rename BPF_PROG_TYPE_LANDLOCK to BPF_PROG_TYPE_LANDLOCK_RULE: an eBPF
  program could be used for something other than a rule
* simplify struct landlock_context by removing the arch and syscall_nr fields
* remove all eBPF map function calls, remove ABILITY_WRITE
* refactor bpf_landlock_func_proto() (suggested by Kees Cook)
* constify pointers
* fix doc inclusion

Changes since v5:
* rename file hooks.c to init.c
* fix spelling

Changes since v4:
* merge a minimal (not enabled) LSM code and Kconfig in this commit

Changes since v3:
* split commit
* revamp the landlock_context:
  * add arch, syscall_nr and syscall_cmd (ioctl, fcntl…) to be able to
    cross-check action with the event type
  * replace args array with dedicated fields to ease the addition of new
    fields
---
 MAINTAINERS                         |  13 +++
 include/linux/bpf_types.h           |   3 +
 include/uapi/linux/bpf.h            |   1 +
 include/uapi/linux/landlock.h       | 155 +++++++++++++++++++++++++++++++
 security/Kconfig                    |   1 +
 security/Makefile                   |   2 +
 security/landlock/Kconfig           |  18 ++++
 security/landlock/Makefile          |   3 +
 security/landlock/common.h          |  32 +++++++
 security/landlock/init.c            | 180 ++++++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h      |   1 +
 tools/include/uapi/linux/landlock.h | 155 +++++++++++++++++++++++++++++++
 12 files changed, 564 insertions(+)
 create mode 100644 include/uapi/linux/landlock.h
 create mode 100644 security/landlock/Kconfig
 create mode 100644 security/landlock/Makefile
 create mode 100644 security/landlock/common.h
 create mode 100644 security/landlock/init.c
 create mode 100644 tools/include/uapi/linux/landlock.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bdc260e36b7..ac0809094bae 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7882,6 +7882,19 @@ S:	Maintained
 F:	net/l3mdev
 F:	include/net/l3mdev.h
 
+LANDLOCK SECURITY MODULE
+M:	Mickaël Salaün <mic@digikod.net>
+S:	Supported
+F:	Documentation/security/landlock/
+F:	include/linux/landlock.h
+F:	include/uapi/linux/landlock.h
+F:	samples/bpf/landlock*
+F:	security/landlock/
+F:	tools/include/uapi/linux/landlock.h
+F:	tools/testing/selftests/landlock/
+K:	landlock
+K:	LANDLOCK
+
 LANTIQ MIPS ARCHITECTURE
 M:	John Crispin <john@phrozen.org>
 L:	linux-mips@linux-mips.org
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 19b8349a3809..0ca019f3ae4a 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -22,6 +22,9 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_PERF_EVENT, perf_event)
 #ifdef CONFIG_CGROUP_BPF
 BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
 #endif
+#ifdef CONFIG_SECURITY_LANDLOCK
+BPF_PROG_TYPE(BPF_PROG_TYPE_LANDLOCK_HOOK, landlock)
+#endif
 
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 87885c92ca78..2433aa1a0fd4 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -133,6 +133,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SOCK_OPS,
 	BPF_PROG_TYPE_SK_SKB,
 	BPF_PROG_TYPE_CGROUP_DEVICE,
+	BPF_PROG_TYPE_LANDLOCK_HOOK,
 };
 
 enum bpf_attach_type {
diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
new file mode 100644
index 000000000000..49a132092fd9
--- /dev/null
+++ b/include/uapi/linux/landlock.h
@@ -0,0 +1,155 @@
+/*
+ * Landlock - UAPI headers
+ *
+ * Copyright © 2017-2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _UAPI__LINUX_LANDLOCK_H__
+#define _UAPI__LINUX_LANDLOCK_H__
+
+#include <linux/types.h>
+
+#define LANDLOCK_RET_ALLOW	0
+#define LANDLOCK_RET_DENY	1
+
+/**
+ * enum landlock_hook_type - hook type for which a Landlock program is called
+ *
+ * A hook is a policy decision point which exposes the same context type for
+ * each program evaluation.
+ *
+ * @LANDLOCK_HOOK_FS_PICK: called for the last element of a file path
+ * @LANDLOCK_HOOK_FS_WALK: called for each directory of a file path (excluding
+ *			   the directory passed to fs_pick, if any)
+ * @LANDLOCK_HOOK_FS_GET: called for file opening or receiving or when
+ *			  changing directory or root
+ */
+enum landlock_hook_type {
+	LANDLOCK_HOOK_FS_PICK = 1,
+	LANDLOCK_HOOK_FS_WALK,
+	LANDLOCK_HOOK_FS_GET,
+};
+
+/**
+ * DOC: landlock_subtype_options
+ *
+ * - %LANDLOCK_OPTION_PREVIOUS: specify a previous file descriptor in the
+ *			        dedicated field
+ */
+#define LANDLOCK_OPTION_PREVIOUS			(1ULL << 0)
+
+/**
+ * DOC: landlock_triggers
+ *
+ * A landlock trigger is used as a bitmask in subtype.landlock_hook.triggers
+ * for a fs_pick program.  It defines a set of actions for which the program
+ * should verify an access request.
+ *
+ * - %LANDLOCK_TRIGGER_FS_PICK_APPEND
+ * - %LANDLOCK_TRIGGER_FS_PICK_CHDIR
+ * - %LANDLOCK_TRIGGER_FS_PICK_CHROOT
+ * - %LANDLOCK_TRIGGER_FS_PICK_CREATE
+ * - %LANDLOCK_TRIGGER_FS_PICK_EXECUTE
+ * - %LANDLOCK_TRIGGER_FS_PICK_FCNTL
+ * - %LANDLOCK_TRIGGER_FS_PICK_GETATTR
+ * - %LANDLOCK_TRIGGER_FS_PICK_IOCTL
+ * - %LANDLOCK_TRIGGER_FS_PICK_LINK
+ * - %LANDLOCK_TRIGGER_FS_PICK_LINKTO
+ * - %LANDLOCK_TRIGGER_FS_PICK_LOCK
+ * - %LANDLOCK_TRIGGER_FS_PICK_MAP
+ * - %LANDLOCK_TRIGGER_FS_PICK_MOUNTON
+ * - %LANDLOCK_TRIGGER_FS_PICK_OPEN
+ * - %LANDLOCK_TRIGGER_FS_PICK_READ
+ * - %LANDLOCK_TRIGGER_FS_PICK_READDIR
+ * - %LANDLOCK_TRIGGER_FS_PICK_RECEIVE
+ * - %LANDLOCK_TRIGGER_FS_PICK_RENAME
+ * - %LANDLOCK_TRIGGER_FS_PICK_RENAMETO
+ * - %LANDLOCK_TRIGGER_FS_PICK_RMDIR
+ * - %LANDLOCK_TRIGGER_FS_PICK_SETATTR
+ * - %LANDLOCK_TRIGGER_FS_PICK_TRANSFER
+ * - %LANDLOCK_TRIGGER_FS_PICK_UNLINK
+ * - %LANDLOCK_TRIGGER_FS_PICK_WRITE
+ */
+#define LANDLOCK_TRIGGER_FS_PICK_APPEND			(1ULL << 0)
+#define LANDLOCK_TRIGGER_FS_PICK_CHDIR			(1ULL << 1)
+#define LANDLOCK_TRIGGER_FS_PICK_CHROOT			(1ULL << 2)
+#define LANDLOCK_TRIGGER_FS_PICK_CREATE			(1ULL << 3)
+#define LANDLOCK_TRIGGER_FS_PICK_EXECUTE		(1ULL << 4)
+#define LANDLOCK_TRIGGER_FS_PICK_FCNTL			(1ULL << 5)
+#define LANDLOCK_TRIGGER_FS_PICK_GETATTR		(1ULL << 6)
+#define LANDLOCK_TRIGGER_FS_PICK_IOCTL			(1ULL << 7)
+#define LANDLOCK_TRIGGER_FS_PICK_LINK			(1ULL << 8)
+#define LANDLOCK_TRIGGER_FS_PICK_LINKTO			(1ULL << 9)
+#define LANDLOCK_TRIGGER_FS_PICK_LOCK			(1ULL << 10)
+#define LANDLOCK_TRIGGER_FS_PICK_MAP			(1ULL << 11)
+#define LANDLOCK_TRIGGER_FS_PICK_MOUNTON		(1ULL << 12)
+#define LANDLOCK_TRIGGER_FS_PICK_OPEN			(1ULL << 13)
+#define LANDLOCK_TRIGGER_FS_PICK_READ			(1ULL << 14)
+#define LANDLOCK_TRIGGER_FS_PICK_READDIR		(1ULL << 15)
+#define LANDLOCK_TRIGGER_FS_PICK_RECEIVE		(1ULL << 16)
+#define LANDLOCK_TRIGGER_FS_PICK_RENAME			(1ULL << 17)
+#define LANDLOCK_TRIGGER_FS_PICK_RENAMETO		(1ULL << 18)
+#define LANDLOCK_TRIGGER_FS_PICK_RMDIR			(1ULL << 19)
+#define LANDLOCK_TRIGGER_FS_PICK_SETATTR		(1ULL << 20)
+#define LANDLOCK_TRIGGER_FS_PICK_TRANSFER		(1ULL << 21)
+#define LANDLOCK_TRIGGER_FS_PICK_UNLINK			(1ULL << 22)
+#define LANDLOCK_TRIGGER_FS_PICK_WRITE			(1ULL << 23)
+
+/* inode_lookup */
+/* LOOKUP_ROOT can only be seen for the first fs_walk call */
+#define LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_ROOT		1
+#define LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_DOT		2
+#define LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_DOTDOT	3
+
+/**
+ * struct landlock_ctx_fs_pick - context accessible to a fs_pick program
+ *
+ * @cookie: value saved and restored between calls to chained programs
+ * @chain: chain pointer to identify the current chain
+ * @inode: pointer to the current kernel object that can be used with
+ *	   bpf_inode_get_tag()
+ * @inode_lookup: bitflags to identify how we got there
+ */
+struct landlock_ctx_fs_pick {
+	__u64 cookie;
+	__u64 chain;
+	__u64 inode;
+	__u8 inode_lookup;
+};
+
+/**
+ * struct landlock_ctx_fs_walk - context accessible to a fs_walk program
+ *
+ * @cookie: value saved and restored between calls to chained programs
+ * @chain: chain pointer to identify the current chain
+ * @inode: pointer to the current kernel object that can be used with
+ *	   bpf_inode_get_tag()
+ * @inode_lookup: bitflags to identify how we got there
+ */
+struct landlock_ctx_fs_walk {
+	__u64 cookie;
+	__u64 chain;
+	__u64 inode;
+	__u8 inode_lookup;
+};
+
+/**
+ * struct landlock_ctx_fs_get - context accessible to a fs_get program
+ *
+ * @cookie: value saved and restored between calls to chained programs
+ * @chain: chain pointer to identify the current chain
+ * @tag_object: pointer that can be used to tag a file/inode with
+ *		bpf_landlock_set_tag()
+ */
+struct landlock_ctx_fs_get {
+	__u64 cookie;
+	__u64 chain;
+	__u64 tag_object;
+};
+
+#endif /* _UAPI__LINUX_LANDLOCK_H__ */
diff --git a/security/Kconfig b/security/Kconfig
index c4302067a3ad..649695e88c87 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -237,6 +237,7 @@ source security/tomoyo/Kconfig
 source security/apparmor/Kconfig
 source security/loadpin/Kconfig
 source security/yama/Kconfig
+source security/landlock/Kconfig
 
 source security/integrity/Kconfig
 
diff --git a/security/Makefile b/security/Makefile
index 4d2d3782ddef..808317bd11d1 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -10,6 +10,7 @@ subdir-$(CONFIG_SECURITY_TOMOYO)        += tomoyo
 subdir-$(CONFIG_SECURITY_APPARMOR)	+= apparmor
 subdir-$(CONFIG_SECURITY_YAMA)		+= yama
 subdir-$(CONFIG_SECURITY_LOADPIN)	+= loadpin
+subdir-$(CONFIG_SECURITY_LANDLOCK)		+= landlock
 
 # always enable default capabilities
 obj-y					+= commoncap.o
@@ -25,6 +26,7 @@ obj-$(CONFIG_SECURITY_TOMOYO)		+= tomoyo/
 obj-$(CONFIG_SECURITY_APPARMOR)		+= apparmor/
 obj-$(CONFIG_SECURITY_YAMA)		+= yama/
 obj-$(CONFIG_SECURITY_LOADPIN)		+= loadpin/
+obj-$(CONFIG_SECURITY_LANDLOCK)	+= landlock/
 obj-$(CONFIG_CGROUP_DEVICE)		+= device_cgroup.o
 
 # Object integrity file lists
diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
new file mode 100644
index 000000000000..8bd103102008
--- /dev/null
+++ b/security/landlock/Kconfig
@@ -0,0 +1,18 @@
+config SECURITY_LANDLOCK
+	bool "Landlock support"
+	depends on SECURITY
+	depends on BPF_SYSCALL
+	depends on SECCOMP_FILTER
+	default n
+	help
+	  This selects Landlock, a programmatic access control.  It makes it
+	  possible to restrict processes on the fly (i.e. create a sandbox).
+	  The security policy is a set of eBPF programs, dedicated to denying a
+	  list of actions on specific kernel objects (e.g. files).
+
+	  You need to enable seccomp filter to apply a security policy to a
+	  process hierarchy (e.g. application with built-in sandboxing).
+
+	  See Documentation/security/landlock/ for further information.
+
+	  If you are unsure how to answer this question, answer N.
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
new file mode 100644
index 000000000000..7205f9a7a2ee
--- /dev/null
+++ b/security/landlock/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
+
+landlock-y := init.o
diff --git a/security/landlock/common.h b/security/landlock/common.h
new file mode 100644
index 000000000000..0906678c0ed0
--- /dev/null
+++ b/security/landlock/common.h
@@ -0,0 +1,32 @@
+/*
+ * Landlock LSM - private headers
+ *
+ * Copyright © 2016-2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _SECURITY_LANDLOCK_COMMON_H
+#define _SECURITY_LANDLOCK_COMMON_H
+
+#include <linux/bpf.h> /* struct bpf_prog_aux */
+#include <linux/filter.h> /* bpf_prog */
+#include <linux/refcount.h> /* refcount_t */
+#include <uapi/linux/landlock.h> /* enum landlock_hook_type */
+
+#define LANDLOCK_NAME "landlock"
+
+/* UAPI bounds and bitmasks */
+
+#define _LANDLOCK_HOOK_LAST LANDLOCK_HOOK_FS_GET
+
+#define _LANDLOCK_OPTION_LAST		LANDLOCK_OPTION_PREVIOUS
+#define _LANDLOCK_OPTION_MASK		((_LANDLOCK_OPTION_LAST << 1ULL) - 1)
+
+#define _LANDLOCK_TRIGGER_FS_PICK_LAST	LANDLOCK_TRIGGER_FS_PICK_WRITE
+#define _LANDLOCK_TRIGGER_FS_PICK_MASK	((_LANDLOCK_TRIGGER_FS_PICK_LAST << 1ULL) - 1)
+
+#endif /* _SECURITY_LANDLOCK_COMMON_H */
diff --git a/security/landlock/init.c b/security/landlock/init.c
new file mode 100644
index 000000000000..ef2ee0742c53
--- /dev/null
+++ b/security/landlock/init.c
@@ -0,0 +1,180 @@
+/*
+ * Landlock LSM - init
+ *
+ * Copyright © 2016-2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/bpf.h> /* enum bpf_access_type */
+#include <linux/capability.h> /* capable */
+#include <linux/filter.h> /* struct bpf_prog */
+
+#include "common.h" /* LANDLOCK_* */
+
+static bool bpf_landlock_is_valid_access(int off, int size,
+		enum bpf_access_type type, struct bpf_insn_access_aux *info,
+		const struct bpf_prog_extra *prog_extra)
+{
+	const union bpf_prog_subtype *prog_subtype;
+	enum bpf_reg_type reg_type = NOT_INIT;
+	int max_size = 0;
+
+	if (WARN_ON(!prog_extra))
+		return false;
+	prog_subtype = &prog_extra->subtype;
+
+	if (off < 0)
+		return false;
+	if (size <= 0 || size > sizeof(__u64))
+		return false;
+
+	/* check memory range access */
+	switch (reg_type) {
+	case NOT_INIT:
+		return false;
+	case SCALAR_VALUE:
+		/* allow partial raw value */
+		if (size > max_size)
+			return false;
+		info->ctx_field_size = max_size;
+		break;
+	default:
+		/* deny partial pointer */
+		if (size != max_size)
+			return false;
+	}
+
+	info->reg_type = reg_type;
+	return true;
+}
+
+/*
+ * Check order of Landlock programs
+ *
+ * Keep in sync with enforce.c:is_hook_type_forkable().
+ */
+static bool good_previous_prog(enum landlock_hook_type current_type,
+		const struct bpf_prog *previous)
+{
+	enum landlock_hook_type previous_type;
+
+	if (previous->type != BPF_PROG_TYPE_LANDLOCK_HOOK)
+		return false;
+	if (WARN_ON(!previous->aux->extra))
+		return false;
+	previous_type = previous->aux->extra->subtype.landlock_hook.type;
+	switch (current_type) {
+	case LANDLOCK_HOOK_FS_PICK:
+		switch (previous_type) {
+		case LANDLOCK_HOOK_FS_PICK:
+		case LANDLOCK_HOOK_FS_WALK:
+			return true;
+		default:
+			return false;
+		}
+	case LANDLOCK_HOOK_FS_GET:
+		/* In the future, fs_get could be chained with another fs_get
+		 * (different triggers), but not for now. */
+		if (previous_type != LANDLOCK_HOOK_FS_PICK)
+			return false;
+		return true;
+	case LANDLOCK_HOOK_FS_WALK:
+		return false;
+	}
+	WARN_ON(1);
+	return false;
+}
+
+static bool bpf_landlock_is_valid_subtype(struct bpf_prog_extra *prog_extra)
+{
+	const union bpf_prog_subtype *subtype;
+
+	if (!prog_extra)
+		return false;
+	subtype = &prog_extra->subtype;
+
+	switch (subtype->landlock_hook.type) {
+	case LANDLOCK_HOOK_FS_PICK:
+		if (!subtype->landlock_hook.triggers ||
+				subtype->landlock_hook.triggers &
+				~_LANDLOCK_TRIGGER_FS_PICK_MASK)
+			return false;
+		break;
+	case LANDLOCK_HOOK_FS_WALK:
+	case LANDLOCK_HOOK_FS_GET:
+		if (subtype->landlock_hook.triggers)
+			return false;
+		break;
+	default:
+		return false;
+	}
+
+	if (subtype->landlock_hook.options & ~_LANDLOCK_OPTION_MASK)
+		return false;
+	if (subtype->landlock_hook.options & LANDLOCK_OPTION_PREVIOUS) {
+		struct bpf_prog *previous;
+
+		/* check and save the chained program */
+		previous = bpf_prog_get(subtype->landlock_hook.previous);
+		if (IS_ERR(previous))
+			return false;
+		if (!good_previous_prog(subtype->landlock_hook.type,
+					previous)) {
+			bpf_prog_put(previous);
+			return false;
+		}
+		/* It is not possible to create loops because the current
+		 * program does not exist yet. */
+		prog_extra->landlock_hook.previous = previous;
+	}
+
+	return true;
+}
+
+static const struct bpf_func_proto *bpf_landlock_func_proto(
+		enum bpf_func_id func_id,
+		const struct bpf_prog_extra *prog_extra)
+{
+	u64 hook_type;
+
+	if (WARN_ON(!prog_extra))
+		return NULL;
+	hook_type = prog_extra->subtype.landlock_hook.type;
+
+	/* generic functions */
+	/* TODO: do we need/want update/delete functions for every LL prog?
+	 * => impurity vs. audit */
+	switch (func_id) {
+	case BPF_FUNC_map_lookup_elem:
+		return &bpf_map_lookup_elem_proto;
+	case BPF_FUNC_map_update_elem:
+		return &bpf_map_update_elem_proto;
+	case BPF_FUNC_map_delete_elem:
+		return &bpf_map_delete_elem_proto;
+	default:
+		break;
+	}
+	return NULL;
+}
+
+static void bpf_landlock_put_extra(struct bpf_prog_extra *prog_extra)
+{
+	if (WARN_ON(!prog_extra))
+		return;
+	if (prog_extra->landlock_hook.previous)
+		bpf_prog_put(prog_extra->landlock_hook.previous);
+}
+
+const struct bpf_verifier_ops landlock_verifier_ops = {
+	.get_func_proto	= bpf_landlock_func_proto,
+	.is_valid_access = bpf_landlock_is_valid_access,
+	.is_valid_subtype = bpf_landlock_is_valid_subtype,
+};
+
+const struct bpf_prog_ops landlock_prog_ops = {
+	.put_extra = bpf_landlock_put_extra,
+};
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 87885c92ca78..2433aa1a0fd4 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -133,6 +133,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SOCK_OPS,
 	BPF_PROG_TYPE_SK_SKB,
 	BPF_PROG_TYPE_CGROUP_DEVICE,
+	BPF_PROG_TYPE_LANDLOCK_HOOK,
 };
 
 enum bpf_attach_type {
diff --git a/tools/include/uapi/linux/landlock.h b/tools/include/uapi/linux/landlock.h
new file mode 100644
index 000000000000..49a132092fd9
--- /dev/null
+++ b/tools/include/uapi/linux/landlock.h
@@ -0,0 +1,155 @@
+/*
+ * Landlock - UAPI headers
+ *
+ * Copyright © 2017-2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _UAPI__LINUX_LANDLOCK_H__
+#define _UAPI__LINUX_LANDLOCK_H__
+
+#include <linux/types.h>
+
+#define LANDLOCK_RET_ALLOW	0
+#define LANDLOCK_RET_DENY	1
+
+/**
+ * enum landlock_hook_type - hook type for which a Landlock program is called
+ *
+ * A hook is a policy decision point which exposes the same context type for
+ * each program evaluation.
+ *
+ * @LANDLOCK_HOOK_FS_PICK: called for the last element of a file path
+ * @LANDLOCK_HOOK_FS_WALK: called for each directory of a file path (excluding
+ *			   the directory passed to fs_pick, if any)
+ * @LANDLOCK_HOOK_FS_GET: called for file opening or receiving or when
+ *			  changing directory or root
+ */
+enum landlock_hook_type {
+	LANDLOCK_HOOK_FS_PICK = 1,
+	LANDLOCK_HOOK_FS_WALK,
+	LANDLOCK_HOOK_FS_GET,
+};
+
+/**
+ * DOC: landlock_subtype_options
+ *
+ * - %LANDLOCK_OPTION_PREVIOUS: specify a previous file descriptor in the
+ *			        dedicated field
+ */
+#define LANDLOCK_OPTION_PREVIOUS			(1ULL << 0)
+
+/**
+ * DOC: landlock_triggers
+ *
+ * A landlock trigger is used as a bitmask in subtype.landlock_hook.triggers
+ * for a fs_pick program.  It defines a set of actions for which the program
+ * should verify an access request.
+ *
+ * - %LANDLOCK_TRIGGER_FS_PICK_APPEND
+ * - %LANDLOCK_TRIGGER_FS_PICK_CHDIR
+ * - %LANDLOCK_TRIGGER_FS_PICK_CHROOT
+ * - %LANDLOCK_TRIGGER_FS_PICK_CREATE
+ * - %LANDLOCK_TRIGGER_FS_PICK_EXECUTE
+ * - %LANDLOCK_TRIGGER_FS_PICK_FCNTL
+ * - %LANDLOCK_TRIGGER_FS_PICK_GETATTR
+ * - %LANDLOCK_TRIGGER_FS_PICK_IOCTL
+ * - %LANDLOCK_TRIGGER_FS_PICK_LINK
+ * - %LANDLOCK_TRIGGER_FS_PICK_LINKTO
+ * - %LANDLOCK_TRIGGER_FS_PICK_LOCK
+ * - %LANDLOCK_TRIGGER_FS_PICK_MAP
+ * - %LANDLOCK_TRIGGER_FS_PICK_MOUNTON
+ * - %LANDLOCK_TRIGGER_FS_PICK_OPEN
+ * - %LANDLOCK_TRIGGER_FS_PICK_READ
+ * - %LANDLOCK_TRIGGER_FS_PICK_READDIR
+ * - %LANDLOCK_TRIGGER_FS_PICK_RECEIVE
+ * - %LANDLOCK_TRIGGER_FS_PICK_RENAME
+ * - %LANDLOCK_TRIGGER_FS_PICK_RENAMETO
+ * - %LANDLOCK_TRIGGER_FS_PICK_RMDIR
+ * - %LANDLOCK_TRIGGER_FS_PICK_SETATTR
+ * - %LANDLOCK_TRIGGER_FS_PICK_TRANSFER
+ * - %LANDLOCK_TRIGGER_FS_PICK_UNLINK
+ * - %LANDLOCK_TRIGGER_FS_PICK_WRITE
+ */
+#define LANDLOCK_TRIGGER_FS_PICK_APPEND			(1ULL << 0)
+#define LANDLOCK_TRIGGER_FS_PICK_CHDIR			(1ULL << 1)
+#define LANDLOCK_TRIGGER_FS_PICK_CHROOT			(1ULL << 2)
+#define LANDLOCK_TRIGGER_FS_PICK_CREATE			(1ULL << 3)
+#define LANDLOCK_TRIGGER_FS_PICK_EXECUTE		(1ULL << 4)
+#define LANDLOCK_TRIGGER_FS_PICK_FCNTL			(1ULL << 5)
+#define LANDLOCK_TRIGGER_FS_PICK_GETATTR		(1ULL << 6)
+#define LANDLOCK_TRIGGER_FS_PICK_IOCTL			(1ULL << 7)
+#define LANDLOCK_TRIGGER_FS_PICK_LINK			(1ULL << 8)
+#define LANDLOCK_TRIGGER_FS_PICK_LINKTO			(1ULL << 9)
+#define LANDLOCK_TRIGGER_FS_PICK_LOCK			(1ULL << 10)
+#define LANDLOCK_TRIGGER_FS_PICK_MAP			(1ULL << 11)
+#define LANDLOCK_TRIGGER_FS_PICK_MOUNTON		(1ULL << 12)
+#define LANDLOCK_TRIGGER_FS_PICK_OPEN			(1ULL << 13)
+#define LANDLOCK_TRIGGER_FS_PICK_READ			(1ULL << 14)
+#define LANDLOCK_TRIGGER_FS_PICK_READDIR		(1ULL << 15)
+#define LANDLOCK_TRIGGER_FS_PICK_RECEIVE		(1ULL << 16)
+#define LANDLOCK_TRIGGER_FS_PICK_RENAME			(1ULL << 17)
+#define LANDLOCK_TRIGGER_FS_PICK_RENAMETO		(1ULL << 18)
+#define LANDLOCK_TRIGGER_FS_PICK_RMDIR			(1ULL << 19)
+#define LANDLOCK_TRIGGER_FS_PICK_SETATTR		(1ULL << 20)
+#define LANDLOCK_TRIGGER_FS_PICK_TRANSFER		(1ULL << 21)
+#define LANDLOCK_TRIGGER_FS_PICK_UNLINK			(1ULL << 22)
+#define LANDLOCK_TRIGGER_FS_PICK_WRITE			(1ULL << 23)
+
+/* inode_lookup */
+/* LOOKUP_ROOT can only be seen for the first fs_walk call */
+#define LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_ROOT		1
+#define LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_DOT		2
+#define LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_DOTDOT	3
+
+/**
+ * struct landlock_ctx_fs_pick - context accessible to a fs_pick program
+ *
+ * @cookie: value saved and restored between calls to chained programs
+ * @chain: chain pointer to identify the current chain
+ * @inode: pointer to the current kernel object that can be used with
+ *	   bpf_inode_get_tag()
+ * @inode_lookup: bitflags to identify how we got there
+ */
+struct landlock_ctx_fs_pick {
+	__u64 cookie;
+	__u64 chain;
+	__u64 inode;
+	__u8 inode_lookup;
+};
+
+/**
+ * struct landlock_ctx_fs_walk - context accessible to a fs_walk program
+ *
+ * @cookie: value saved and restored between calls to chained programs
+ * @chain: chain pointer to identify the current chain
+ * @inode: pointer to the current kernel object that can be used with
+ *	   bpf_inode_get_tag()
+ * @inode_lookup: bitflags to identify how we got there
+ */
+struct landlock_ctx_fs_walk {
+	__u64 cookie;
+	__u64 chain;
+	__u64 inode;
+	__u8 inode_lookup;
+};
+
+/**
+ * struct landlock_ctx_fs_get - context accessible to a fs_get program
+ *
+ * @cookie: value saved and restored between calls to chained programs
+ * @chain: chain pointer to identify the current chain
+ * @tag_object: pointer that can be used to tag a file/inode with
+ *		bpf_landlock_set_tag()
+ */
+struct landlock_ctx_fs_get {
+	__u64 cookie;
+	__u64 chain;
+	__u64 tag_object;
+};
+
+#endif /* _UAPI__LINUX_LANDLOCK_H__ */
-- 
2.16.2


* [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
  2018-02-27  0:41 [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing Mickaël Salaün
                   ` (3 preceding siblings ...)
  2018-02-27  0:41 ` [PATCH bpf-next v8 04/11] bpf,landlock: Define an eBPF program type for Landlock hooks Mickaël Salaün
@ 2018-02-27  0:41 ` Mickaël Salaün
  2018-02-27  2:08   ` Alexei Starovoitov
  2018-02-27  0:41 ` [PATCH bpf-next v8 06/11] bpf,landlock: Add a new map type: inode Mickaël Salaün
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 55+ messages in thread
From: Mickaël Salaün @ 2018-02-27  0:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	Andrew Morton

The seccomp(2) syscall can be used by a task to apply a Landlock program
to itself. Like a seccomp filter, a Landlock program is enforced for the
current task and all its future children. A program is immutable and a
task can only add new restricting programs to itself, forming a list of
programs.

A Landlock program is tied to a Landlock hook. If the action on a kernel
object is allowed by the other Linux security mechanisms (e.g. DAC,
capabilities, other LSM), then a Landlock hook related to this kind of
object is triggered. The list of programs for this hook is then
evaluated. Each program returns a 32-bit value which can deny the action
on a kernel object with a non-zero value. If every program in the list
returns zero, then the action on the object is allowed.
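
The decision rule above can be sketched as follows.  This is a
user-space illustration under assumptions: prog_fn,
run_hook_programs(), allow_all() and deny_all() are hypothetical names,
and stopping at the first denial is a choice of this sketch, not
something stated by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Landlock program: returns 0 to allow. */
typedef uint32_t (*prog_fn)(const void *ctx);

/*
 * Return 0 if every program returns zero, or -1 as soon as one program
 * returns a non-zero value (early exit is an assumption of this sketch).
 */
int run_hook_programs(const prog_fn *progs, size_t count, const void *ctx)
{
	size_t i;

	assert(progs || count == 0);
	for (i = 0; i < count; i++) {
		if (progs[i](ctx) != 0)
			return -1;
	}
	return 0;
}

/* Example programs used to exercise the rule. */
uint32_t allow_all(const void *ctx)
{
	(void)ctx;
	return 0;
}

uint32_t deny_all(const void *ctx)
{
	(void)ctx;
	return 1;
}
```

An empty list allows the action, which matches the idea that a task
without Landlock programs is not restricted by Landlock.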

Multiple Landlock programs can be chained to share a 64-bit value for a
call chain (e.g. evaluating multiple elements of a file path).  This
chaining is restricted when a process constructs the chain by loading a
program, and additional checks are performed when it requests to apply
this chain of programs to itself.  The restrictions ensure that it is
not possible to call multiple programs in a way that would require
handling multiple shared values (i.e. cookies) for one chain.  For now,
only a fs_pick program can be chained to the same type of program,
because it may make sense if they have different triggers (cf. next
commits).  These restrictions still allow reusing Landlock programs in
a safe way (e.g. use the same loaded fs_walk program with multiple
chains of fs_pick programs).
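
To make the idea of one shared value per chain more concrete, here is a
small user-space sketch.  Everything in it is an assumption made for
illustration: struct chain_ctx, chained_prog_fn, count_element(),
deny_if_deep() and check_path() are hypothetical, and the real hook
semantics are defined by the following patches, not by this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-chain state: a single 64-bit cookie. */
struct chain_ctx {
	uint64_t cookie;
};

/* Hypothetical chained program: returns 0 to allow. */
typedef uint32_t (*chained_prog_fn)(struct chain_ctx *ctx);

/* "walk"-like program: count each path element in the shared cookie. */
uint32_t count_element(struct chain_ctx *ctx)
{
	ctx->cookie++;
	return 0;
}

/* "pick"-like program: deny when more than 3 elements were walked. */
uint32_t deny_if_deep(struct chain_ctx *ctx)
{
	return ctx->cookie > 3 ? 1 : 0;
}

/*
 * Run @walk once per path element, then @pick once.  Both programs see
 * the same cookie because they belong to the same chain.
 * Return 0 if allowed, -1 if denied.
 */
int check_path(size_t elements, chained_prog_fn walk, chained_prog_fn pick)
{
	struct chain_ctx ctx = { .cookie = 0 };
	size_t i;

	assert(walk && pick);
	for (i = 0; i < elements; i++) {
		if (walk(&ctx) != 0)
			return -1;
	}
	return pick(&ctx) != 0 ? -1 : 0;
}
```

With a single cookie per chain, the second program can rely on what the
first one recorded.  Allowing two unrelated chains to feed the same
program would require several cookies at once, which is what the
restrictions rule out.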

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Will Drewry <wad@chromium.org>
Link: https://lkml.kernel.org/r/c10a503d-5e35-7785-2f3d-25ed8dd63fab@digikod.net
---

Changes since v7:
* handle and verify program chains
* split and rename providers.c to enforce.c and enforce_seccomp.c
* rename LANDLOCK_SUBTYPE_* to LANDLOCK_*

Changes since v6:
* rename some functions with more accurate names to reflect that an eBPF
  program for Landlock could be used for something else than a rule
* reword rule "appending" to "prepending" and explain it
* remove the superfluous no_new_privs check, only check global
  CAP_SYS_ADMIN when prepending a Landlock rule (needed for containers)
* create and use {get,put}_seccomp_landlock() (suggested by Kees Cook)
* replace ifdef with static inlined function (suggested by Kees Cook)
* use get_user() (suggested by Kees Cook)
* replace atomic_t with refcount_t (requested by Kees Cook)
* move struct landlock_{rule,events} from landlock.h to common.h
* cleanup headers

Changes since v5:
* remove struct landlock_node and use a similar inheritance mechanism
  as seccomp-bpf (requested by Andy Lutomirski)
* rename SECCOMP_ADD_LANDLOCK_RULE to SECCOMP_APPEND_LANDLOCK_RULE
* rename file manager.c to providers.c
* add comments
* typo and cosmetic fixes

Changes since v4:
* merge manager and seccomp patches
* return -EFAULT in seccomp(2) when user_bpf_fd is null to easily check
  if Landlock is supported
* only allow a process with the global CAP_SYS_ADMIN to use Landlock
  (will be lifted in the future)
* add an early check to exit as soon as possible if the current process
  does not have Landlock rules

Changes since v3:
* remove the hard link with seccomp (suggested by Andy Lutomirski and
  Kees Cook):
  * remove the cookie which could imply multiple evaluation of Landlock
    rules
  * remove the origin field in struct landlock_data
* remove documentation fix (merged upstream)
* rename the new seccomp command to SECCOMP_ADD_LANDLOCK_RULE
* internal renaming
* split commit
* new design to be able to inherit on the fly the parent rules

Changes since v2:
* Landlock programs can now be run without seccomp filter but for any
  syscall (from the process) or interruption
* move Landlock related functions and structs into security/landlock/*
  (to manage cgroups as well)
* fix seccomp filter handling: run Landlock programs for each of their
  legitimate seccomp filter
* properly clean up all seccomp results
* cosmetic changes to ease the understanding
* fix some ifdef
---
 include/linux/landlock.h            |  37 ++++
 include/linux/seccomp.h             |   5 +
 include/uapi/linux/seccomp.h        |   1 +
 kernel/fork.c                       |   8 +-
 kernel/seccomp.c                    |   4 +
 security/landlock/Makefile          |   3 +-
 security/landlock/chain.c           |  39 ++++
 security/landlock/chain.h           |  35 ++++
 security/landlock/common.h          |  53 +++++
 security/landlock/enforce.c         | 386 ++++++++++++++++++++++++++++++++++++
 security/landlock/enforce.h         |  21 ++
 security/landlock/enforce_seccomp.c | 102 ++++++++++
 12 files changed, 692 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/landlock.h
 create mode 100644 security/landlock/chain.c
 create mode 100644 security/landlock/chain.h
 create mode 100644 security/landlock/enforce.c
 create mode 100644 security/landlock/enforce.h
 create mode 100644 security/landlock/enforce_seccomp.c

diff --git a/include/linux/landlock.h b/include/linux/landlock.h
new file mode 100644
index 000000000000..933d65c00075
--- /dev/null
+++ b/include/linux/landlock.h
@@ -0,0 +1,37 @@
+/*
+ * Landlock LSM - public kernel headers
+ *
+ * Copyright © 2016-2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _LINUX_LANDLOCK_H
+#define _LINUX_LANDLOCK_H
+
+#include <linux/errno.h>
+#include <linux/sched.h> /* task_struct */
+
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_SECURITY_LANDLOCK)
+extern int landlock_seccomp_prepend_prog(unsigned int flags,
+		const int __user *user_bpf_fd);
+extern void put_seccomp_landlock(struct task_struct *tsk);
+extern void get_seccomp_landlock(struct task_struct *tsk);
+#else /* CONFIG_SECCOMP_FILTER && CONFIG_SECURITY_LANDLOCK */
+static inline int landlock_seccomp_prepend_prog(unsigned int flags,
+		const int __user *user_bpf_fd)
+{
+	return -EINVAL;
+}
+static inline void put_seccomp_landlock(struct task_struct *tsk)
+{
+}
+static inline void get_seccomp_landlock(struct task_struct *tsk)
+{
+}
+#endif /* CONFIG_SECCOMP_FILTER && CONFIG_SECURITY_LANDLOCK */
+
+#endif /* _LINUX_LANDLOCK_H */
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index c723a5c4e3ff..dedad0d5b664 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -9,6 +9,7 @@
 
 #ifdef CONFIG_SECCOMP
 
+#include <linux/landlock.h>
 #include <linux/thread_info.h>
 #include <asm/seccomp.h>
 
@@ -20,6 +21,7 @@ struct seccomp_filter;
  *         system calls available to a process.
  * @filter: must always point to a valid seccomp-filter or NULL as it is
  *          accessed without locking during system call entry.
+ * @landlock_prog_set: contains a set of Landlock programs.
  *
  *          @filter must only be accessed from the context of current as there
  *          is no read locking.
@@ -27,6 +29,9 @@ struct seccomp_filter;
 struct seccomp {
 	int mode;
 	struct seccomp_filter *filter;
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_SECURITY_LANDLOCK)
+	struct landlock_prog_set *landlock_prog_set;
+#endif /* CONFIG_SECCOMP_FILTER && CONFIG_SECURITY_LANDLOCK */
 };
 
 #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 2a0bd9dd104d..a4927638be82 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -15,6 +15,7 @@
 #define SECCOMP_SET_MODE_STRICT		0
 #define SECCOMP_SET_MODE_FILTER		1
 #define SECCOMP_GET_ACTION_AVAIL	2
+#define SECCOMP_PREPEND_LANDLOCK_PROG	3
 
 /* Valid flags for SECCOMP_SET_MODE_FILTER */
 #define SECCOMP_FILTER_FLAG_TSYNC	1
diff --git a/kernel/fork.c b/kernel/fork.c
index be8aa5b98666..5a5f8cbbfcb9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -48,6 +48,7 @@
 #include <linux/security.h>
 #include <linux/hugetlb.h>
 #include <linux/seccomp.h>
+#include <linux/landlock.h>
 #include <linux/swap.h>
 #include <linux/syscalls.h>
 #include <linux/jiffies.h>
@@ -385,6 +386,7 @@ void free_task(struct task_struct *tsk)
 	rt_mutex_debug_task_free(tsk);
 	ftrace_graph_exit_task(tsk);
 	put_seccomp_filter(tsk);
+	put_seccomp_landlock(tsk);
 	arch_release_task_struct(tsk);
 	if (tsk->flags & PF_KTHREAD)
 		free_kthread_struct(tsk);
@@ -814,7 +816,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	 * the usage counts on the error path calling free_task.
 	 */
 	tsk->seccomp.filter = NULL;
-#endif
+#ifdef CONFIG_SECURITY_LANDLOCK
+	tsk->seccomp.landlock_prog_set = NULL;
+#endif /* CONFIG_SECURITY_LANDLOCK */
+#endif /* CONFIG_SECCOMP */
 
 	setup_thread_stack(tsk, orig);
 	clear_user_return_notifier(tsk);
@@ -1496,6 +1501,7 @@ static void copy_seccomp(struct task_struct *p)
 
 	/* Ref-count the new filter user, and assign it. */
 	get_seccomp_filter(current);
+	get_seccomp_landlock(current);
 	p->seccomp = current->seccomp;
 
 	/*
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 940fa408a288..47a37f6c0dcd 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -37,6 +37,7 @@
 #include <linux/security.h>
 #include <linux/tracehook.h>
 #include <linux/uaccess.h>
+#include <linux/landlock.h>
 
 /**
  * struct seccomp_filter - container for seccomp BPF programs
@@ -932,6 +933,9 @@ static long do_seccomp(unsigned int op, unsigned int flags,
 			return -EINVAL;
 
 		return seccomp_get_action_avail(uargs);
+	case SECCOMP_PREPEND_LANDLOCK_PROG:
+		return landlock_seccomp_prepend_prog(flags,
+				(const int __user *)uargs);
 	default:
 		return -EINVAL;
 	}
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index 7205f9a7a2ee..05fce359028e 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
-landlock-y := init.o
+landlock-y := init.o chain.o \
+	enforce.o enforce_seccomp.o
diff --git a/security/landlock/chain.c b/security/landlock/chain.c
new file mode 100644
index 000000000000..805f4cb60e7e
--- /dev/null
+++ b/security/landlock/chain.c
@@ -0,0 +1,39 @@
+/*
+ * Landlock LSM - chain helpers
+ *
+ * Copyright © 2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/refcount.h>
+#include <linux/slab.h>
+
+#include "chain.h"
+
+/* TODO: use a dedicated kmem_cache_alloc() instead of k*alloc() */
+
+/* never return NULL */
+struct landlock_chain *landlock_new_chain(u8 index)
+{
+	struct landlock_chain *chain;
+
+	chain = kzalloc(sizeof(*chain), GFP_KERNEL);
+	if (!chain)
+		return ERR_PTR(-ENOMEM);
+	chain->index = index;
+	refcount_set(&chain->usage, 1);
+	return chain;
+}
+
+void landlock_put_chain(struct landlock_chain *chain)
+{
+	if (!chain)
+		return;
+	if (!refcount_dec_and_test(&chain->usage))
+		return;
+	kfree(chain);
+}
diff --git a/security/landlock/chain.h b/security/landlock/chain.h
new file mode 100644
index 000000000000..a1497ee779a6
--- /dev/null
+++ b/security/landlock/chain.h
@@ -0,0 +1,35 @@
+/*
+ * Landlock LSM - chain headers
+ *
+ * Copyright © 2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _SECURITY_LANDLOCK_CHAIN_H
+#define _SECURITY_LANDLOCK_CHAIN_H
+
+#include <linux/landlock.h> /* struct landlock_chain */
+#include <linux/refcount.h>
+
+/*
+ * @chain_index: index of the chain (defined by the user, different from a
+ *		 program list)
+ * @next: point to the next sibling in the same prog_set (used to match a chain
+ *	  against the current process)
+ * @index: index in the array dedicated to store data for a chain instance
+ */
+struct landlock_chain {
+	struct landlock_chain *next;
+	refcount_t usage;
+	u8 index;
+	u8 shared:1;
+};
+
+struct landlock_chain *landlock_new_chain(u8 index);
+void landlock_put_chain(struct landlock_chain *chain);
+
+#endif /* _SECURITY_LANDLOCK_CHAIN_H */
diff --git a/security/landlock/common.h b/security/landlock/common.h
index 0906678c0ed0..245e4ccafcf2 100644
--- a/security/landlock/common.h
+++ b/security/landlock/common.h
@@ -29,4 +29,57 @@
 #define _LANDLOCK_TRIGGER_FS_PICK_LAST	LANDLOCK_TRIGGER_FS_PICK_WRITE
 #define _LANDLOCK_TRIGGER_FS_PICK_MASK	((_LANDLOCK_TRIGGER_FS_PICK_LAST << 1ULL) - 1)
 
+struct landlock_chain;
+
+/*
+ * @is_last_of_type: in a chain of programs, it marks if this program is the
+ *		     last of its type
+ */
+struct landlock_prog_list {
+	struct landlock_prog_list *prev;
+	struct bpf_prog *prog;
+	struct landlock_chain *chain;
+	refcount_t usage;
+	u8 is_last_of_type:1;
+};
+
+/**
+ * struct landlock_prog_set - Landlock programs enforced on a thread
+ *
+ * This is used for low performance impact when forking a process. Instead of
+ * copying the full array and incrementing the usage of each entry, only
+ * create a pointer to &struct landlock_prog_set and increment its usage. When
+ * prepending a new program, if &struct landlock_prog_set is shared with other
+ * tasks, then duplicate it and prepend the program to this new &struct
+ * landlock_prog_set.
+ *
+ * @usage: reference count to manage the object lifetime. When a thread needs
+ *	   to add Landlock programs and if @usage is greater than 1, then the
+ *	   thread must duplicate &struct landlock_prog_set to not change the
+ *	   children's programs as well.
+ * @chain_last: chain of the last prepended program
+ * @programs: array of non-NULL &struct landlock_prog_list pointers
+ */
+struct landlock_prog_set {
+	struct landlock_chain *chain_last;
+	struct landlock_prog_list *programs[_LANDLOCK_HOOK_LAST];
+	refcount_t usage;
+};
+
+/**
+ * get_index - get an index for the programs of struct landlock_prog_set
+ *
+ * @type: a Landlock hook type
+ */
+static inline int get_index(enum landlock_hook_type type)
+{
+	/* type ID > 0 for loaded programs */
+	return type - 1;
+}
+
+static inline enum landlock_hook_type get_type(struct bpf_prog *prog)
+{
+	return prog->aux->extra->subtype.landlock_hook.type;
+}
+
 #endif /* _SECURITY_LANDLOCK_COMMON_H */
diff --git a/security/landlock/enforce.c b/security/landlock/enforce.c
new file mode 100644
index 000000000000..8846cfd9aff7
--- /dev/null
+++ b/security/landlock/enforce.c
@@ -0,0 +1,386 @@
+/*
+ * Landlock LSM - enforcing helpers
+ *
+ * Copyright © 2016-2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/barrier.h> /* smp_store_release() */
+#include <asm/page.h> /* PAGE_SIZE */
+#include <linux/bpf.h> /* bpf_prog_put() */
+#include <linux/compiler.h> /* READ_ONCE() */
+#include <linux/err.h> /* PTR_ERR() */
+#include <linux/errno.h>
+#include <linux/filter.h> /* struct bpf_prog */
+#include <linux/refcount.h>
+#include <linux/slab.h> /* kzalloc(), kfree() */
+
+#include "chain.h"
+#include "common.h" /* struct landlock_prog_list */
+
+/* TODO: use a dedicated kmem_cache_alloc() instead of k*alloc() */
+
+static void put_landlock_prog_list(struct landlock_prog_list *prog_list)
+{
+	struct landlock_prog_list *orig = prog_list;
+
+	/* clean up single-reference branches iteratively */
+	while (orig && refcount_dec_and_test(&orig->usage)) {
+		struct landlock_prog_list *freeme = orig;
+
+		if (orig->prog)
+			bpf_prog_put(orig->prog);
+		landlock_put_chain(orig->chain);
+		orig = orig->prev;
+		kfree(freeme);
+	}
+}
+
+void landlock_put_prog_set(struct landlock_prog_set *prog_set)
+{
+	if (prog_set && refcount_dec_and_test(&prog_set->usage)) {
+		size_t i;
+
+		for (i = 0; i < ARRAY_SIZE(prog_set->programs); i++)
+			put_landlock_prog_list(prog_set->programs[i]);
+		landlock_put_chain(prog_set->chain_last);
+		kfree(prog_set);
+	}
+}
+
+void landlock_get_prog_set(struct landlock_prog_set *prog_set)
+{
+	struct landlock_chain *chain;
+
+	if (!prog_set)
+		return;
+	refcount_inc(&prog_set->usage);
+	chain = prog_set->chain_last;
+	/* mark all inherited chains as (potentially) shared */
+	while (chain && !chain->shared) {
+		chain->shared = 1;
+		chain = chain->next;
+	}
+}
+
+static struct landlock_prog_set *new_landlock_prog_set(void)
+{
+	struct landlock_prog_set *ret;
+
+	/* array filled with NULL values */
+	ret = kzalloc(sizeof(*ret), GFP_KERNEL);
+	if (!ret)
+		return ERR_PTR(-ENOMEM);
+	refcount_set(&ret->usage, 1);
+	return ret;
+}
+
+/*
+ * If a program type is able to fork, this means that there is one amongst
+ * multiple programs (types) that may be called after, depending on the action
+ * type. This means that if a (sub)type has a "triggers" field (e.g. fs_pick),
+ * then it is forkable.
+ *
+ * Keep in sync with init.c:good_previous_prog().
+ */
+static bool is_hook_type_forkable(enum landlock_hook_type hook_type)
+{
+	switch (hook_type) {
+	case LANDLOCK_HOOK_FS_WALK:
+		return false;
+	case LANDLOCK_HOOK_FS_PICK:
+		/* can fork to fs_get or fs_ioctl... */
+		return true;
+	case LANDLOCK_HOOK_FS_GET:
+		return false;
+	}
+	WARN_ON(1);
+	return false;
+}
+
+/**
+ * store_landlock_prog - prepend and deduplicate a Landlock prog_list
+ *
+ * Prepend @prog to @init_prog_set while ignoring @prog and its chained programs
+ * if they are already in @ref_prog_set.  Whatever is the result of this
+ * function call, you can call bpf_prog_put(@prog) after.
+ *
+ * @init_prog_set: empty prog_set to prepend to
+ * @ref_prog_set: prog_set to check for duplicate programs
+ * @prog: program chain to prepend
+ *
+ * Return -errno on error or 0 if @prog was successfully stored.
+ */
+static int store_landlock_prog(struct landlock_prog_set *init_prog_set,
+		const struct landlock_prog_set *ref_prog_set,
+		struct bpf_prog *prog)
+{
+	struct landlock_prog_list *tmp_list = NULL;
+	int err;
+	u32 hook_idx;
+	bool new_is_last_of_type;
+	bool first = true;
+	struct landlock_chain *chain = NULL;
+	enum landlock_hook_type last_type;
+	struct bpf_prog *new = prog;
+
+	/* allocate all the memory we need */
+	for (; new; new = new->aux->extra->landlock_hook.previous) {
+		bool ignore = false;
+		struct landlock_prog_list *new_list;
+
+		new_is_last_of_type = first || (last_type != get_type(new));
+		last_type = get_type(new);
+		first = false;
+		/* ignore duplicate programs */
+		if (ref_prog_set) {
+			struct landlock_prog_list *ref;
+			struct bpf_prog *new_prev;
+
+			/*
+			 * The subtype verifier has already checked the
+			 * coherency of the program types chained in @new (cf.
+			 * good_previous_prog).
+			 *
+			 * Here we only allow linking to a chain if the common
+			 * program's type is able to fork (e.g. fs_pick) and
+			 * come from the same task (i.e. not shared).  This
+			 * program must also be the last one of its type in
+			 * both the @ref and the @new chains.  Finally, two
+			 * programs with the same parent must be of different
+			 * type.
+			 */
+			if (WARN_ON(!new->aux->extra))
+				continue;
+			new_prev = new->aux->extra->landlock_hook.previous;
+			hook_idx = get_index(get_type(new));
+			for (ref = ref_prog_set->programs[hook_idx];
+					ref; ref = ref->prev) {
+				struct bpf_prog *ref_prev;
+
+				ignore = (ref->prog == new);
+				if (ignore)
+					break;
+				ref_prev = ref->prog->aux->extra->
+					landlock_hook.previous;
+				/* deny fork to the same types */
+				if (new_prev && new_prev == ref_prev) {
+					err = -EINVAL;
+					goto put_tmp_list;
+				}
+			}
+			/* remaining programs are already in ref_prog_set */
+			if (ignore) {
+				bool is_forkable =
+					is_hook_type_forkable(get_type(new));
+
+				if (ref->chain->shared || !is_forkable ||
+						!new_is_last_of_type ||
+						!ref->is_last_of_type) {
+					err = -EINVAL;
+					goto put_tmp_list;
+				}
+				/* use the same session (i.e. cookie state) */
+				chain = ref->chain;
+				/* will increment the usage counter later */
+				break;
+			}
+		}
+
+		new = bpf_prog_inc(new);
+		if (IS_ERR(new)) {
+			err = PTR_ERR(new);
+			goto put_tmp_list;
+		}
+		new_list = kzalloc(sizeof(*new_list), GFP_KERNEL);
+		if (!new_list) {
+			bpf_prog_put(new);
+			err = -ENOMEM;
+			goto put_tmp_list;
+		}
+		/* ignore Landlock types in this tmp_list */
+		new_list->is_last_of_type = new_is_last_of_type;
+		new_list->prog = new;
+		new_list->prev = tmp_list;
+		refcount_set(&new_list->usage, 1);
+		tmp_list = new_list;
+	}
+
+	if (!tmp_list)
+		/* inform user space that this program was already added */
+		return -EEXIST;
+
+	if (!chain) {
+		u8 chain_index;
+
+		if (ref_prog_set) {
+			/* this is a new independent chain */
+			chain_index = ref_prog_set->chain_last->index + 1;
+			/* check for integer overflow */
+			if (chain_index < ref_prog_set->chain_last->index) {
+				err = -E2BIG;
+				goto put_tmp_list;
+			}
+		} else {
+			chain_index = 0;
+		}
+		chain = landlock_new_chain(chain_index);
+		if (IS_ERR(chain)) {
+			err = PTR_ERR(chain);
+			goto put_tmp_list;
+		}
+		/* no need to refcount_dec(&init_prog_set->chain_last) */
+	}
+	init_prog_set->chain_last = chain;
+
+	/* properly store the list (without error cases) */
+	while (tmp_list) {
+		struct landlock_prog_list *new_list;
+
+		new_list = tmp_list;
+		tmp_list = tmp_list->prev;
+		/* do not increment the previous prog list usage */
+		hook_idx = get_index(get_type(new_list->prog));
+		new_list->prev = init_prog_set->programs[hook_idx];
+		new_list->chain = chain;
+		refcount_inc(&chain->usage);
+		/* no need to add from the last program to the first because
+		 * each of them is a different Landlock type */
+		smp_store_release(&init_prog_set->programs[hook_idx], new_list);
+	}
+	return 0;
+
+put_tmp_list:
+	put_landlock_prog_list(tmp_list);
+	return err;
+}
+
+/* limit Landlock programs set to 256KB */
+#define LANDLOCK_PROGRAMS_MAX_PAGES (1 << 6)
+
+/**
+ * landlock_prepend_prog - attach a Landlock prog_list to @current_prog_set
+ *
+ * Whatever is the result of this function call, you can call
+ * bpf_prog_put(@prog) after.
+ *
+ * @current_prog_set: landlock_prog_set pointer, must be locked (if needed) to
+ *                    prevent a concurrent put/free. This pointer must not be
+ *                    freed after the call.
+ * @prog: non-NULL Landlock prog_list to prepend to @current_prog_set. @prog
+ *	  will be owned by landlock_prepend_prog() and freed if an error
+ *	  happened.
+ *
+ * Return @current_prog_set or a new pointer when OK. Return a pointer error
+ * otherwise.
+ */
+struct landlock_prog_set *landlock_prepend_prog(
+		struct landlock_prog_set *current_prog_set,
+		struct bpf_prog *prog)
+{
+	struct landlock_prog_set *new_prog_set = current_prog_set;
+	unsigned long pages;
+	int err;
+	size_t i;
+	struct landlock_prog_set tmp_prog_set = {};
+
+	if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK)
+		return ERR_PTR(-EINVAL);
+
+	/* validate memory size allocation */
+	pages = prog->pages;
+	if (current_prog_set) {
+		size_t i;
+
+		for (i = 0; i < ARRAY_SIZE(current_prog_set->programs); i++) {
+			struct landlock_prog_list *walker_p;
+
+			for (walker_p = current_prog_set->programs[i];
+					walker_p; walker_p = walker_p->prev)
+				pages += walker_p->prog->pages;
+		}
+		/* count a struct landlock_prog_set if we need to allocate one */
+		if (refcount_read(&current_prog_set->usage) != 1)
+			pages += round_up(sizeof(*current_prog_set), PAGE_SIZE)
+				/ PAGE_SIZE;
+	}
+	if (pages > LANDLOCK_PROGRAMS_MAX_PAGES)
+		return ERR_PTR(-E2BIG);
+
+	/* ensure early that we can allocate enough memory for the new
+	 * prog_lists */
+	err = store_landlock_prog(&tmp_prog_set, current_prog_set, prog);
+	if (err)
+		return ERR_PTR(err);
+
+	/*
+	 * Each task_struct points to an array of prog list pointers.  These
+	 * tables are duplicated when additions are made (which means each
+	 * table needs to be refcounted for the processes using it). When a new
+	 * table is created, all the refcounters on the prog_list are bumped (to
+	 * track each table that references the prog). When a new prog is
+	 * added, it's just prepended to the list for the new table to point
+	 * at.
+	 *
+	 * Manage all the possible errors before this step to not uselessly
+	 * duplicate current_prog_set and avoid a rollback.
+	 */
+	if (!new_prog_set) {
+		/*
+		 * If there is no Landlock program set used by the current task,
+		 * then create a new one.
+		 */
+		new_prog_set = new_landlock_prog_set();
+		if (IS_ERR(new_prog_set))
+			goto put_tmp_lists;
+	} else if (refcount_read(&current_prog_set->usage) > 1) {
+		/*
+		 * If the current task is not the sole user of its Landlock
+		 * program set, then duplicate them.
+		 */
+		new_prog_set = new_landlock_prog_set();
+		if (IS_ERR(new_prog_set))
+			goto put_tmp_lists;
+		for (i = 0; i < ARRAY_SIZE(new_prog_set->programs); i++) {
+			new_prog_set->programs[i] =
+				READ_ONCE(current_prog_set->programs[i]);
+			if (new_prog_set->programs[i])
+				refcount_inc(&new_prog_set->programs[i]->usage);
+		}
+
+		/*
+		 * Landlock program set from the current task will not be freed
+		 * here because the usage is strictly greater than 1. It is
+		 * only prevented to be freed by another task thanks to the
+		 * caller of landlock_prepend_prog() which should be locked if
+		 * needed.
+		 */
+		landlock_put_prog_set(current_prog_set);
+	}
+
+	/* prepend tmp_prog_set to new_prog_set */
+	for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++) {
+		/* get the last new list */
+		struct landlock_prog_list *last_list =
+			tmp_prog_set.programs[i];
+
+		if (last_list) {
+			while (last_list->prev)
+				last_list = last_list->prev;
+			/* no need to increment usage (pointer replacement) */
+			last_list->prev = new_prog_set->programs[i];
+			new_prog_set->programs[i] = tmp_prog_set.programs[i];
+		}
+	}
+	new_prog_set->chain_last = tmp_prog_set.chain_last;
+	return new_prog_set;
+
+put_tmp_lists:
+	for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++)
+		put_landlock_prog_list(tmp_prog_set.programs[i]);
+	return new_prog_set;
+}
diff --git a/security/landlock/enforce.h b/security/landlock/enforce.h
new file mode 100644
index 000000000000..27de15d4ca3e
--- /dev/null
+++ b/security/landlock/enforce.h
@@ -0,0 +1,21 @@
+/*
+ * Landlock LSM - enforcing helpers headers
+ *
+ * Copyright © 2016-2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _SECURITY_LANDLOCK_ENFORCE_H
+#define _SECURITY_LANDLOCK_ENFORCE_H
+
+struct landlock_prog_set *landlock_prepend_prog(
+		struct landlock_prog_set *current_prog_set,
+		struct bpf_prog *prog);
+void landlock_put_prog_set(struct landlock_prog_set *prog_set);
+void landlock_get_prog_set(struct landlock_prog_set *prog_set);
+
+#endif /* _SECURITY_LANDLOCK_ENFORCE_H */
diff --git a/security/landlock/enforce_seccomp.c b/security/landlock/enforce_seccomp.c
new file mode 100644
index 000000000000..8da72e868422
--- /dev/null
+++ b/security/landlock/enforce_seccomp.c
@@ -0,0 +1,102 @@
+/*
+ * Landlock LSM - enforcing with seccomp
+ *
+ * Copyright © 2016-2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifdef CONFIG_SECCOMP_FILTER
+
+#include <linux/bpf.h> /* bpf_prog_put() */
+#include <linux/capability.h>
+#include <linux/err.h> /* PTR_ERR() */
+#include <linux/errno.h>
+#include <linux/filter.h> /* struct bpf_prog */
+#include <linux/landlock.h>
+#include <linux/refcount.h>
+#include <linux/sched.h> /* current */
+#include <linux/uaccess.h> /* get_user() */
+
+#include "enforce.h"
+
+/* headers in include/linux/landlock.h */
+
+/**
+ * landlock_seccomp_prepend_prog - attach a Landlock program to the current
+ *                                 process
+ *
+ * current->seccomp.landlock_prog_set is lazily allocated. When a process
+ * forks, only a pointer is copied.  When a new program is added by a process,
+ * if there are other references to this process' prog_set, then a new
+ * allocation is made to contain an array pointing to Landlock program lists.
+ * This design has a low performance impact and is memory efficient while
+ * keeping the property of prepend-only programs.
+ *
+ * For now, installing a Landlock prog requires that the requesting task has
+ * the global CAP_SYS_ADMIN. We cannot force the use of no_new_privs to not
+ * exclude containers where a process may legitimately acquire more privileges
+ * thanks to an SUID binary.
+ *
+ * @flags: not used for now, but could be used for TSYNC
+ * @user_bpf_fd: file descriptor pointing to a loaded Landlock prog
+ */
+int landlock_seccomp_prepend_prog(unsigned int flags,
+		const int __user *user_bpf_fd)
+{
+	struct landlock_prog_set *new_prog_set;
+	struct bpf_prog *prog;
+	int bpf_fd, err;
+
+	/* planned to be replaced with a no_new_privs check to allow
+	 * unprivileged tasks */
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+	/* enable to check if Landlock is supported with early EFAULT */
+	if (!user_bpf_fd)
+		return -EFAULT;
+	if (flags)
+		return -EINVAL;
+	err = get_user(bpf_fd, user_bpf_fd);
+	if (err)
+		return err;
+
+	prog = bpf_prog_get(bpf_fd);
+	if (IS_ERR(prog)) {
+		err = PTR_ERR(prog);
+		goto free_task;
+	}
+
+	/*
+	 * We don't need to lock anything for the current process hierarchy,
+	 * everything is guarded by the atomic counters.
+	 */
+	new_prog_set = landlock_prepend_prog(
+			current->seccomp.landlock_prog_set, prog);
+	bpf_prog_put(prog);
+	/* @prog is managed/freed by landlock_prepend_prog() */
+	if (IS_ERR(new_prog_set)) {
+		err = PTR_ERR(new_prog_set);
+		goto free_task;
+	}
+	current->seccomp.landlock_prog_set = new_prog_set;
+	return 0;
+
+free_task:
+	return err;
+}
+
+void put_seccomp_landlock(struct task_struct *tsk)
+{
+	landlock_put_prog_set(tsk->seccomp.landlock_prog_set);
+}
+
+void get_seccomp_landlock(struct task_struct *tsk)
+{
+	landlock_get_prog_set(tsk->seccomp.landlock_prog_set);
+}
+
+#endif /* CONFIG_SECCOMP_FILTER */
-- 
2.16.2


* [PATCH bpf-next v8 06/11] bpf,landlock: Add a new map type: inode
  2018-02-27  0:41 [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing Mickaël Salaün
                   ` (4 preceding siblings ...)
  2018-02-27  0:41 ` [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy Mickaël Salaün
@ 2018-02-27  0:41 ` Mickaël Salaün
  2018-02-28 17:35   ` kbuild test robot
  2018-02-27  0:41 ` [PATCH bpf-next v8 07/11] landlock: Handle filesystem access control Mickaël Salaün
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 55+ messages in thread
From: Mickaël Salaün @ 2018-02-27  0:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev

This new map stores arbitrary 64-bit values referenced by inode keys.
The map can be updated from user space with file descriptors pointing to
inodes tied to a file system.  From the point of view of an eBPF
(Landlock) program, such a map is read-only and can only be used to
retrieve a 64-bit value tied to a given inode.  This is useful to
recognize an inode tagged by user space without needing any access right
to this inode (i.e. no need to have write access to this inode).
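
To illustrate the semantics described above, here is a minimal user-space
sketch, not the kernel code: a fixed-size array of (inode, value) pairs
with a linear lookup that returns zero when no value is tied to the
inode.  All sketch_* names and SKETCH_MAX_ENTRIES are hypothetical and
only exist for this illustration; reference counting, RCU and the
FD-to-inode conversion are left out on purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ENTRIES 4	/* hypothetical capacity (max_entries) */

/* Hypothetical stand-in for a kernel inode: only its address matters. */
struct sketch_inode {
	int unused;
};

struct sketch_elem {
	const struct sketch_inode *inode;	/* NULL marks a free slot */
	uint64_t value;
};

struct sketch_map {
	size_t nb_entries;
	struct sketch_elem elems[SKETCH_MAX_ENTRIES];
};

/* Store @value for @inode in the first free slot; -1 when the map is full. */
static int sketch_update(struct sketch_map *map,
		const struct sketch_inode *inode, uint64_t value)
{
	size_t i;

	assert(map && inode);
	if (map->nb_entries >= SKETCH_MAX_ENTRIES)
		return -1;
	for (i = 0; i < SKETCH_MAX_ENTRIES; i++) {
		if (!map->elems[i].inode) {
			map->elems[i].inode = inode;
			map->elems[i].value = value;
			map->nb_entries++;
			return 0;
		}
	}
	return -1;
}

/* Linear scan; zero means "no value tied to this inode". */
static uint64_t sketch_lookup(const struct sketch_map *map,
		const struct sketch_inode *inode)
{
	size_t i;

	assert(map && inode);
	for (i = 0; i < SKETCH_MAX_ENTRIES; i++) {
		if (map->elems[i].inode == inode)
			return map->elems[i].value;
	}
	return 0;
}
```

Because zero doubles as "not found" in this model, a tag value of zero
cannot be told apart from an untagged inode.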

This also adds new BPF map object types: landlock_tag_object and
landlock_chain.  The landlock_chain pointer is needed to be able to
handle multiple tags per inode.  The landlock_tag_object is needed to
update a reference to a list of shared tags.  This is typically used by
a struct file (reference) and a struct inode (shared list of tags).
This way, we can charge the number of tagged files to the process/user,
while still being able to read the tags from the pointed inode.
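
The "multiple tags per inode" idea can be sketched in user space as a
list of (chain, value) pairs attached to one object.  This is only a
simplified model under stated assumptions: the sketch_* names are
hypothetical, and the RCU protection, reference counting and per-file
accounting that the patch implements are deliberately omitted.  As in
the patch, setting a value of zero deletes the tag, and reading a
missing tag yields zero.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct landlock_chain: only its address matters. */
struct sketch_chain {
	int unused;
};

struct sketch_tag {
	struct sketch_tag *next;
	const struct sketch_chain *chain;
	uint64_t value;
};

/* Return the tag value tied to @chain, or zero if there is none. */
static uint64_t sketch_get_tag(const struct sketch_tag *head,
		const struct sketch_chain *chain)
{
	for (; head; head = head->next) {
		if (head->chain == chain)
			return head->value;
	}
	return 0;
}

/*
 * Set, update or (when @value is zero) delete the tag tied to @chain.
 * Return 0 on success, -1 on allocation failure.
 */
static int sketch_set_tag(struct sketch_tag **head,
		const struct sketch_chain *chain, uint64_t value)
{
	struct sketch_tag **walk, *tag;

	assert(head && chain);
	for (walk = head; *walk; walk = &(*walk)->next) {
		if ((*walk)->chain != chain)
			continue;
		if (value) {
			(*walk)->value = value;
		} else {
			/* unlink and free the matching tag */
			tag = *walk;
			*walk = tag->next;
			free(tag);
		}
		return 0;
	}
	/* do not create a tag with a value of zero */
	if (!value)
		return 0;
	tag = calloc(1, sizeof(*tag));
	if (!tag)
		return -1;
	tag->chain = chain;
	tag->value = value;
	/* @walk now points to the terminating NULL link: append there */
	*walk = tag;
	return 0;
}
```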

Add dedicated BPF functions to handle this type of map:
* bpf_inode_map_update_elem()
* bpf_inode_map_lookup_elem()
* bpf_inode_map_delete_elem()
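
These three functions take a file descriptor as the key and resolve it
to the underlying inode, so two descriptors opened on the same file
select the same map entry.  The following user-space sketch shows an
analogous identity check with fstat(2); it is only an analogy (the
kernel compares struct inode pointers, not device and inode numbers),
and the sketch_* helpers are hypothetical names, not part of this patch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* True when both descriptors refer to the same (device, inode number). */
static bool sketch_same_inode(int fd_a, int fd_b)
{
	struct stat a, b;

	if (fstat(fd_a, &a) || fstat(fd_b, &b))
		return false;
	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/* Open both paths and report whether they resolve to the same inode. */
static bool sketch_paths_share_inode(const char *path_a, const char *path_b)
{
	int fd_a, fd_b;
	bool same = false;

	assert(path_a && path_b);
	fd_a = open(path_a, O_RDONLY);
	fd_b = open(path_b, O_RDONLY);
	if (fd_a >= 0 && fd_b >= 0)
		same = sketch_same_inode(fd_a, fd_b);
	if (fd_a >= 0)
		close(fd_a);
	if (fd_b >= 0)
		close(fd_b);
	return same;
}
```

For instance, sketch_paths_share_inode("/", "/.") holds because both
paths name the root directory, even though two distinct descriptors are
opened.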

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Jann Horn <jann@thejh.net>
---

Changes since v7:
* new design with a dedicated map and a BPF function to tie a value to
  an inode
* add the ability to set or get a tag on an inode from a Landlock
  program

Changes since v6:
* remove WARN_ON() for missing dentry->d_inode
* refactor bpf_landlock_func_proto() (suggested by Kees Cook)

Changes since v5:
* cosmetic fixes and rebase

Changes since v4:
* use a file abstraction (handle) to wrap inode, dentry, path and file
  structs
* remove bpf_landlock_cmp_fs_beneath()
* rename the BPF helper and move it to kernel/bpf/
* tighten helpers accessible by a Landlock rule

Changes since v3:
* remove bpf_landlock_cmp_fs_prop() (suggested by Alexei Starovoitov)
* add hooks dealing with struct inode and struct path pointers:
  inode_permission and inode_getattr
* add abstraction over eBPF helper arguments thanks to wrapping structs
* add bpf_landlock_get_fs_mode() helper to check file type and mode
* merge WARN_ON() (suggested by Kees Cook)
* fix and update bpf_helpers.h
* use BPF_CALL_* for eBPF helpers (suggested by Alexei Starovoitov)
* make handle arraymap safe (RCU) and remove buggy synchronize_rcu()
* factor out the arraymap walk
* use size_t to index array (suggested by Jann Horn)

Changes since v2:
* add MNT_INTERNAL check to only add file handle from user-visible FS
  (e.g. no anonymous inode)
* replace struct file* with struct path* in map_landlock_handle
* add BPF protos
* fix bpf_landlock_cmp_fs_prop_with_struct_file()
---
 include/linux/bpf.h            |  18 ++
 include/linux/bpf_types.h      |   3 +
 include/linux/landlock.h       |  24 +++
 include/uapi/linux/bpf.h       |  22 ++-
 kernel/bpf/Makefile            |   3 +
 kernel/bpf/core.c              |   1 +
 kernel/bpf/helpers.c           |  38 +++++
 kernel/bpf/inodemap.c          | 318 +++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c           |  27 ++-
 kernel/bpf/verifier.c          |  25 +++
 security/landlock/Makefile     |   1 +
 security/landlock/tag.c        | 373 +++++++++++++++++++++++++++++++++++++++++
 security/landlock/tag.h        |  36 ++++
 security/landlock/tag_fs.c     |  59 +++++++
 security/landlock/tag_fs.h     |  26 +++
 tools/include/uapi/linux/bpf.h |  22 ++-
 16 files changed, 993 insertions(+), 3 deletions(-)
 create mode 100644 kernel/bpf/inodemap.c
 create mode 100644 security/landlock/tag.c
 create mode 100644 security/landlock/tag.h
 create mode 100644 security/landlock/tag_fs.c
 create mode 100644 security/landlock/tag_fs.h

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 377b2f3519f3..c9b940a44c3e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -127,6 +127,10 @@ enum bpf_arg_type {
 
 	ARG_PTR_TO_CTX,		/* pointer to context */
 	ARG_ANYTHING,		/* any (initialized) argument is ok */
+
+	ARG_PTR_TO_INODE,	/* pointer to a struct inode */
+	ARG_PTR_TO_LL_TAG_OBJ,	/* pointer to a struct landlock_tag_object */
+	ARG_PTR_TO_LL_CHAIN,	/* pointer to a struct landlock_chain */
 };
 
 /* type of values returned from helper functions */
@@ -184,6 +188,9 @@ enum bpf_reg_type {
 	PTR_TO_PACKET_META,	 /* skb->data - meta_len */
 	PTR_TO_PACKET,		 /* reg points to skb->data */
 	PTR_TO_PACKET_END,	 /* skb->data + headlen */
+	PTR_TO_INODE,		 /* reg points to struct inode */
+	PTR_TO_LL_TAG_OBJ,	 /* reg points to struct landlock_tag_object */
+	PTR_TO_LL_CHAIN,	 /* reg points to struct landlock_chain */
 };
 
 /* The information passed from prog-specific *_is_valid_access
@@ -306,6 +313,10 @@ struct bpf_event_entry {
 	struct rcu_head rcu;
 };
 
+
+u64 bpf_tail_call(u64 ctx, u64 r2, u64 index, u64 r4, u64 r5);
+u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
+
 bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp);
 int bpf_prog_calc_tag(struct bpf_prog *fp);
 
@@ -447,6 +458,10 @@ void bpf_fd_array_map_clear(struct bpf_map *map);
 int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
 				void *key, void *value, u64 map_flags);
 int bpf_fd_htab_map_lookup_elem(struct bpf_map *map, void *key, u32 *value);
+int bpf_inode_map_update_elem(struct bpf_map *map, int *key, u64 *value,
+			      u64 flags);
+int bpf_inode_map_lookup_elem(struct bpf_map *map, int *key, u64 *value);
+int bpf_inode_map_delete_elem(struct bpf_map *map, int *key);
 
 int bpf_get_file_flag(int flags);
 
@@ -686,6 +701,9 @@ extern const struct bpf_func_proto bpf_skb_vlan_push_proto;
 extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
 extern const struct bpf_func_proto bpf_get_stackid_proto;
 extern const struct bpf_func_proto bpf_sock_map_update_proto;
+extern const struct bpf_func_proto bpf_inode_map_lookup_proto;
+extern const struct bpf_func_proto bpf_inode_get_tag_proto;
+extern const struct bpf_func_proto bpf_landlock_set_tag_proto;
 
 /* Shared helpers among cBPF and eBPF. */
 void bpf_user_rnd_init_once(void);
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 0ca019f3ae4a..44dca1fa9d01 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -50,3 +50,6 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops)
 #endif
 BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops)
 #endif
+#ifdef CONFIG_SECURITY_LANDLOCK
+BPF_MAP_TYPE(BPF_MAP_TYPE_INODE, inode_ops)
+#endif
diff --git a/include/linux/landlock.h b/include/linux/landlock.h
index 933d65c00075..e85c2c0ab582 100644
--- a/include/linux/landlock.h
+++ b/include/linux/landlock.h
@@ -15,6 +15,30 @@
 #include <linux/errno.h>
 #include <linux/sched.h> /* task_struct */
 
+struct inode;
+struct landlock_chain;
+struct landlock_tag_object;
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+extern u64 landlock_get_inode_tag(const struct inode *inode,
+		const struct landlock_chain *chain);
+extern int landlock_set_object_tag(struct landlock_tag_object *tag_obj,
+		struct landlock_chain *chain, u64 value);
+#else /* CONFIG_SECURITY_LANDLOCK */
+static inline u64 landlock_get_inode_tag(const struct inode *inode,
+		const struct landlock_chain *chain)
+{
+	WARN_ON(1);
+	return 0;
+}
+static inline int landlock_set_object_tag(struct landlock_tag_object *tag_obj,
+		struct landlock_chain *chain, u64 value)
+{
+	WARN_ON(1);
+	return -ENOTSUPP;
+}
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 #if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_SECURITY_LANDLOCK)
 extern int landlock_seccomp_prepend_prog(unsigned int flags,
 		const int __user *user_bpf_fd);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2433aa1a0fd4..6dffd4ec7036 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -114,6 +114,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_DEVMAP,
 	BPF_MAP_TYPE_SOCKMAP,
 	BPF_MAP_TYPE_CPUMAP,
+	BPF_MAP_TYPE_INODE,
 };
 
 enum bpf_prog_type {
@@ -708,6 +709,22 @@ union bpf_attr {
  * int bpf_override_return(pt_regs, rc)
  *	@pt_regs: pointer to struct pt_regs
  *	@rc: the return value to set
+ *
+ * u64 bpf_inode_map_lookup(map, key)
+ *     @map: pointer to inode map
+ *     @key: pointer to inode
+ *     Return: value tied to this key, or zero if none
+ *
+ * u64 bpf_inode_get_tag(inode, chain)
+ *     @inode: pointer to struct inode
+ *     @chain: pointer to struct landlock_chain
+ *     Return: tag tied to this inode and chain, or zero if none
+ *
+ * int bpf_landlock_set_tag(tag_obj, chain, value)
+ *     @tag_obj: pointer pointing to a taggable object (e.g. inode)
+ *     @chain: pointer to struct landlock_chain
+ *     @value: value of the tag
+ *     Return: 0 on success or negative error code
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -769,7 +786,10 @@ union bpf_attr {
 	FN(perf_prog_read_value),	\
 	FN(getsockopt),			\
 	FN(override_return),		\
-	FN(sock_ops_cb_flags_set),
+	FN(sock_ops_cb_flags_set),	\
+	FN(inode_map_lookup),		\
+	FN(inode_get_tag),		\
+	FN(landlock_set_tag),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index a713fd23ec88..68069d9630e1 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -18,3 +18,6 @@ ifeq ($(CONFIG_PERF_EVENTS),y)
 obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
 endif
 obj-$(CONFIG_CGROUP_BPF) += cgroup.o
+ifeq ($(CONFIG_SECURITY_LANDLOCK),y)
+obj-$(CONFIG_BPF_SYSCALL) += inodemap.o
+endif
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index e4567f7434af..e32b184c0281 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1772,6 +1772,7 @@ const struct bpf_func_proto bpf_get_current_pid_tgid_proto __weak;
 const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
 const struct bpf_func_proto bpf_get_current_comm_proto __weak;
 const struct bpf_func_proto bpf_sock_map_update_proto __weak;
+const struct bpf_func_proto bpf_inode_map_update_proto __weak;
 
 const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
 {
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 3d24e238221e..794bd6f604fc 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -18,6 +18,7 @@
 #include <linux/sched.h>
 #include <linux/uidgid.h>
 #include <linux/filter.h>
+#include <linux/landlock.h>
 
 /* If kernel subsystem is allowing eBPF programs to call this function,
  * inside its own verifier_ops->get_func_proto() callback it should return
@@ -179,3 +180,40 @@ const struct bpf_func_proto bpf_get_current_comm_proto = {
 	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
 	.arg2_type	= ARG_CONST_SIZE,
 };
+
+BPF_CALL_2(bpf_inode_get_tag, void *, inode, void *, chain)
+{
+	if (WARN_ON(!inode))
+		return 0;
+	if (WARN_ON(!chain))
+		return 0;
+
+	return landlock_get_inode_tag(inode, chain);
+}
+
+const struct bpf_func_proto bpf_inode_get_tag_proto = {
+	.func		= bpf_inode_get_tag,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_INODE,
+	.arg2_type	= ARG_PTR_TO_LL_CHAIN,
+};
+
+BPF_CALL_3(bpf_landlock_set_tag, void *, tag_obj, void *, chain, u64, value)
+{
+	if (WARN_ON(!tag_obj))
+		return -EFAULT;
+	if (WARN_ON(!chain))
+		return -EFAULT;
+
+	return landlock_set_object_tag(tag_obj, chain, value);
+}
+
+const struct bpf_func_proto bpf_landlock_set_tag_proto = {
+	.func		= bpf_landlock_set_tag,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_LL_TAG_OBJ,
+	.arg2_type	= ARG_PTR_TO_LL_CHAIN,
+	.arg3_type	= ARG_ANYTHING,
+};
diff --git a/kernel/bpf/inodemap.c b/kernel/bpf/inodemap.c
new file mode 100644
index 000000000000..27714d2bc1c7
--- /dev/null
+++ b/kernel/bpf/inodemap.c
@@ -0,0 +1,318 @@
+/*
+ * inode map for Landlock
+ *
+ * Copyright © 2017-2018 Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/resource.h> /* RLIMIT_NOFILE */
+#include <linux/bpf.h>
+#include <linux/err.h>
+#include <linux/file.h> /* fput() */
+#include <linux/filter.h> /* BPF_CALL_2() */
+#include <linux/fs.h> /* struct file */
+#include <linux/mm.h>
+#include <linux/mount.h> /* MNT_INTERNAL */
+#include <linux/path.h> /* struct path */
+#include <linux/sched/signal.h> /* rlimit() */
+#include <linux/security.h>
+#include <linux/slab.h>
+
+struct inode_elem {
+	struct inode *inode;
+	u64 value;
+};
+
+struct inode_array {
+	struct bpf_map map;
+	size_t nb_entries;
+	struct inode_elem elems[0];
+};
+
+/* must call iput(inode) after this call */
+static struct inode *inode_from_fd(int ufd, bool check_access)
+{
+	struct inode *ret;
+	struct fd f;
+	int deny;
+
+	f = fdget(ufd);
+	if (unlikely(!f.file || !file_inode(f.file))) {
+		ret = ERR_PTR(-EBADF);
+		goto put_fd;
+	}
+	/* TODO: add this check when called from an eBPF program too (already
+	 * checked by the LSM parent hooks anyway) */
+	if (unlikely(IS_PRIVATE(file_inode(f.file)))) {
+		ret = ERR_PTR(-EINVAL);
+		goto put_fd;
+	}
+	/* check if the FD is tied to a mount point */
+	/* TODO: add this check when called from an eBPF program too */
+	if (unlikely(!f.file->f_path.mnt || f.file->f_path.mnt->mnt_flags &
+				MNT_INTERNAL)) {
+		ret = ERR_PTR(-EINVAL);
+		goto put_fd;
+	}
+	if (check_access) {
+		/* need to be allowed to access attributes from this file to
+		 * then be able to compare an inode to this entry */
+		deny = security_inode_getattr(&f.file->f_path);
+		if (deny) {
+			ret = ERR_PTR(deny);
+			goto put_fd;
+		}
+	}
+	ret = file_inode(f.file);
+	ihold(ret);
+
+put_fd:
+	fdput(f);
+	return ret;
+}
+
+/* (never) called from eBPF program */
+static int fake_map_delete_elem(struct bpf_map *map, void *key)
+{
+	WARN_ON(1);
+	return -EINVAL;
+}
+
+/* called from syscall */
+static int sys_inode_map_delete_elem(struct bpf_map *map, struct inode *key)
+{
+	struct inode_array *array = container_of(map, struct inode_array, map);
+	struct inode *inode;
+	int i;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	for (i = 0; i < array->map.max_entries; i++) {
+		if (array->elems[i].inode == key) {
+			inode = xchg(&array->elems[i].inode, NULL);
+			array->nb_entries--;
+			iput(inode);
+			return 0;
+		}
+	}
+	return -ENOENT;
+}
+
+/* called from syscall */
+int bpf_inode_map_delete_elem(struct bpf_map *map, int *key)
+{
+	struct inode *inode;
+	int err;
+
+	inode = inode_from_fd(*key, false);
+	if (IS_ERR(inode))
+		return PTR_ERR(inode);
+	err = sys_inode_map_delete_elem(map, inode);
+	iput(inode);
+	return err;
+}
+
+static void inode_map_free(struct bpf_map *map)
+{
+	struct inode_array *array = container_of(map, struct inode_array, map);
+	int i;
+
+	synchronize_rcu();
+	for (i = 0; i < array->map.max_entries; i++)
+		iput(array->elems[i].inode);
+	bpf_map_area_free(array);
+}
+
+static struct bpf_map *inode_map_alloc(union bpf_attr *attr)
+{
+	int numa_node = bpf_map_attr_numa_node(attr);
+	struct inode_array *array;
+	u64 array_size;
+
+	/* only allow root to create this type of map (for now), should be
+	 * removed when Landlock becomes usable by unprivileged users */
+	if (!capable(CAP_SYS_ADMIN))
+		return ERR_PTR(-EPERM);
+
+	/* the key is a file descriptor and the value must be 64-bits (for
+	 * now) */
+	if (attr->max_entries == 0 || attr->key_size != sizeof(u32) ||
+	    attr->value_size != FIELD_SIZEOF(struct inode_elem, value) ||
+	    attr->map_flags & ~(BPF_F_RDONLY | BPF_F_WRONLY) ||
+	    numa_node != NUMA_NO_NODE)
+		return ERR_PTR(-EINVAL);
+
+	if (attr->value_size > KMALLOC_MAX_SIZE)
+		/* if value_size is bigger, the user space won't be able to
+		 * access the elements.
+		 */
+		return ERR_PTR(-E2BIG);
+
+	/*
+	 * Limit number of entries in an inode map to the maximum number of
+	 * open files for the current process. The maximum number of file
+	 * references (including all inode maps) for a process is then
+	 * (RLIMIT_NOFILE - 1) * RLIMIT_NOFILE. If the process' RLIMIT_NOFILE
+	 * is 0, then any entry update is forbidden.
+	 *
+	 * An eBPF program can inherit all the inode map FDs. The worst case is
+	 * to fill a bunch of arraymaps, create an eBPF program, close the
+	 * inode map FDs, and start again. The maximum number of inode map
+	 * entries can then be close to RLIMIT_NOFILE^3.
+	 */
+	if (attr->max_entries > rlimit(RLIMIT_NOFILE))
+		return ERR_PTR(-EMFILE);
+
+	array_size = sizeof(*array);
+	array_size += (u64) attr->max_entries * sizeof(struct inode_elem);
+
+	/* make sure there is no u32 overflow later in round_up() */
+	if (array_size >= U32_MAX - PAGE_SIZE)
+		return ERR_PTR(-ENOMEM);
+
+	/* allocate all map elements and zero-initialize them */
+	array = bpf_map_area_alloc(array_size, numa_node);
+	if (!array)
+		return ERR_PTR(-ENOMEM);
+
+	/* copy mandatory map attributes */
+	array->map.key_size = attr->key_size;
+	array->map.map_flags = attr->map_flags;
+	array->map.map_type = attr->map_type;
+	array->map.max_entries = attr->max_entries;
+	array->map.numa_node = numa_node;
+	array->map.pages = round_up(array_size, PAGE_SIZE) >> PAGE_SHIFT;
+	array->map.value_size = attr->value_size;
+
+	return &array->map;
+}
+
+/* (never) called from eBPF program */
+static void *fake_map_lookup_elem(struct bpf_map *map, void *key)
+{
+	WARN_ON(1);
+	return ERR_PTR(-EINVAL);
+}
+
+/* called from syscall (wrapped) and eBPF program */
+static u64 inode_map_lookup_elem(struct bpf_map *map, struct inode *key)
+{
+	struct inode_array *array = container_of(map, struct inode_array, map);
+	size_t i;
+	u64 ret = 0;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	/* TODO: use rbtree to switch to O(log n) */
+	for (i = 0; i < array->map.max_entries; i++) {
+		if (array->elems[i].inode == key) {
+			ret = array->elems[i].value;
+			break;
+		}
+	}
+	return ret;
+}
+
+/* key is an FD when called from a syscall, but an inode pointer when called
+ * from an eBPF program */
+
+/* called from syscall */
+int bpf_inode_map_lookup_elem(struct bpf_map *map, int *key, u64 *value)
+{
+	struct inode *inode;
+
+	inode = inode_from_fd(*key, false);
+	if (IS_ERR(inode))
+		return PTR_ERR(inode);
+	*value = inode_map_lookup_elem(map, inode);
+	iput(inode);
+	if (!*value)
+		return -ENOENT;
+	return 0;
+}
+
+/* (never) called from eBPF program */
+static int fake_map_update_elem(struct bpf_map *map, void *key, void *value,
+				u64 flags)
+{
+	WARN_ON(1);
+	/* do not leak an inode accessed by a Landlock program */
+	return -EINVAL;
+}
+
+/* called from syscall */
+static int sys_inode_map_update_elem(struct bpf_map *map, struct inode *key,
+		u64 *value, u64 flags)
+{
+	struct inode_array *array = container_of(map, struct inode_array, map);
+	size_t i;
+
+	if (unlikely(flags != BPF_ANY))
+		return -EINVAL;
+
+	if (unlikely(array->nb_entries >= array->map.max_entries))
+		/* all elements were pre-allocated, cannot insert a new one */
+		return -E2BIG;
+
+	for (i = 0; i < array->map.max_entries; i++) {
+		if (!array->elems[i].inode) {
+			/* the inode (key) is already grabbed by the caller */
+			ihold(key);
+			array->elems[i].inode = key;
+			array->elems[i].value = *value;
+			array->nb_entries++;
+			return 0;
+		}
+	}
+	WARN_ON(1);
+	return -ENOENT;
+}
+
+/* called from syscall */
+int bpf_inode_map_update_elem(struct bpf_map *map, int *key, u64 *value,
+			      u64 flags)
+{
+	struct inode *inode;
+	int err;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	inode = inode_from_fd(*key, true);
+	if (IS_ERR(inode))
+		return PTR_ERR(inode);
+	err = sys_inode_map_update_elem(map, inode, value, flags);
+	iput(inode);
+	return err;
+}
+
+/* called from syscall or (never) from eBPF program */
+static int fake_map_get_next_key(struct bpf_map *map, void *key,
+				 void *next_key)
+{
+	/* do not leak a file descriptor */
+	return -EINVAL;
+}
+
+/* void map for eBPF program */
+const struct bpf_map_ops inode_ops = {
+	.map_alloc = inode_map_alloc,
+	.map_free = inode_map_free,
+	.map_get_next_key = fake_map_get_next_key,
+	.map_lookup_elem = fake_map_lookup_elem,
+	.map_delete_elem = fake_map_delete_elem,
+	.map_update_elem = fake_map_update_elem,
+};
+
+BPF_CALL_2(bpf_inode_map_lookup, struct bpf_map *, map, void *, key)
+{
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	return inode_map_lookup_elem(map, key);
+}
+
+const struct bpf_func_proto bpf_inode_map_lookup_proto = {
+	.func		= bpf_inode_map_lookup,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_INODE,
+};
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 90d7de6d7393..fd140da20e68 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -554,6 +554,22 @@ int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
 	return -ENOTSUPP;
 }
 
+int __weak bpf_inode_map_update_elem(struct bpf_map *map, int *key,
+				     u64 *value, u64 flags)
+{
+	return -ENOTSUPP;
+}
+
+int __weak bpf_inode_map_lookup_elem(struct bpf_map *map, int *key, u64 *value)
+{
+	return -ENOTSUPP;
+}
+
+int __weak bpf_inode_map_delete_elem(struct bpf_map *map, int *key)
+{
+	return -ENOTSUPP;
+}
+
 /* last field in 'union bpf_attr' used by this command */
 #define BPF_MAP_LOOKUP_ELEM_LAST_FIELD value
 
@@ -614,6 +630,8 @@ static int map_lookup_elem(union bpf_attr *attr)
 		err = bpf_fd_array_map_lookup_elem(map, key, value);
 	} else if (IS_FD_HASH(map)) {
 		err = bpf_fd_htab_map_lookup_elem(map, key, value);
+	} else if (map->map_type == BPF_MAP_TYPE_INODE) {
+		err = bpf_inode_map_lookup_elem(map, key, value);
 	} else {
 		rcu_read_lock();
 		ptr = map->ops->map_lookup_elem(map, key);
@@ -719,6 +737,10 @@ static int map_update_elem(union bpf_attr *attr)
 		err = bpf_fd_htab_map_update_elem(map, f.file, key, value,
 						  attr->flags);
 		rcu_read_unlock();
+	} else if (map->map_type == BPF_MAP_TYPE_INODE) {
+		rcu_read_lock();
+		err = bpf_inode_map_update_elem(map, key, value, attr->flags);
+		rcu_read_unlock();
 	} else {
 		rcu_read_lock();
 		err = map->ops->map_update_elem(map, key, value, attr->flags);
@@ -776,7 +798,10 @@ static int map_delete_elem(union bpf_attr *attr)
 	preempt_disable();
 	__this_cpu_inc(bpf_prog_active);
 	rcu_read_lock();
-	err = map->ops->map_delete_elem(map, key);
+	if (map->map_type == BPF_MAP_TYPE_INODE)
+		err = bpf_inode_map_delete_elem(map, key);
+	else
+		err = map->ops->map_delete_elem(map, key);
 	rcu_read_unlock();
 	__this_cpu_dec(bpf_prog_active);
 	preempt_enable();
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ed0905338bb6..4a13dda251a8 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -224,6 +224,9 @@ static const char * const reg_type_str[] = {
 	[PTR_TO_PACKET]		= "pkt",
 	[PTR_TO_PACKET_META]	= "pkt_meta",
 	[PTR_TO_PACKET_END]	= "pkt_end",
+	[PTR_TO_INODE]		= "inode",
+	[PTR_TO_LL_TAG_OBJ]	= "landlock_tag_object",
+	[PTR_TO_LL_CHAIN]	= "landlock_chain",
 };
 
 static void print_liveness(struct bpf_verifier_env *env,
@@ -949,6 +952,9 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
 	case PTR_TO_PACKET_META:
 	case PTR_TO_PACKET_END:
 	case CONST_PTR_TO_MAP:
+	case PTR_TO_INODE:
+	case PTR_TO_LL_TAG_OBJ:
+	case PTR_TO_LL_CHAIN:
 		return true;
 	default:
 		return false;
@@ -1909,6 +1915,18 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 		expected_type = PTR_TO_CTX;
 		if (type != expected_type)
 			goto err_type;
+	} else if (arg_type == ARG_PTR_TO_INODE) {
+		expected_type = PTR_TO_INODE;
+		if (type != expected_type)
+			goto err_type;
+	} else if (arg_type == ARG_PTR_TO_LL_TAG_OBJ) {
+		expected_type = PTR_TO_LL_TAG_OBJ;
+		if (type != expected_type)
+			goto err_type;
+	} else if (arg_type == ARG_PTR_TO_LL_CHAIN) {
+		expected_type = PTR_TO_LL_CHAIN;
+		if (type != expected_type)
+			goto err_type;
 	} else if (arg_type_is_mem_ptr(arg_type)) {
 		expected_type = PTR_TO_STACK;
 		/* One exception here. In case function allows for NULL to be
@@ -2066,6 +2084,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		    func_id != BPF_FUNC_map_delete_elem)
 			goto error;
 		break;
+	case BPF_MAP_TYPE_INODE:
+		if (func_id != BPF_FUNC_inode_map_lookup)
+			goto error;
+		break;
 	default:
 		break;
 	}
@@ -2108,6 +2130,9 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		if (map->map_type != BPF_MAP_TYPE_SOCKMAP)
 			goto error;
 		break;
+	case BPF_FUNC_inode_map_lookup:
+		if (map->map_type != BPF_MAP_TYPE_INODE)
+			goto error;
 	default:
 		break;
 	}
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index 05fce359028e..0e1dd4612ecc 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
 landlock-y := init.o chain.o \
+	tag.o tag_fs.o \
 	enforce.o enforce_seccomp.o
diff --git a/security/landlock/tag.c b/security/landlock/tag.c
new file mode 100644
index 000000000000..3f7f0f04f220
--- /dev/null
+++ b/security/landlock/tag.c
@@ -0,0 +1,373 @@
+/*
+ * Landlock LSM - tag helpers
+ *
+ * Copyright © 2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/atomic.h>
+#include <linux/landlock.h> /* landlock_set_object_tag */
+#include <linux/rculist.h>
+#include <linux/refcount.h>
+#include <linux/slab.h>
+
+#include "chain.h"
+#include "tag.h"
+
+/* TODO: use a dedicated kmem_cache_alloc() instead of k*alloc() */
+
+/*
+ * @list_object: list of tags tied to a kernel object, e.g. inode
+ * @rcu_put: for freeing this tag
+ */
+struct landlock_tag {
+	struct list_head list_object;
+	struct rcu_head rcu_put;
+	struct landlock_chain *chain;
+	atomic64_t value;
+	/* usage is only for tag_ref, not for tag_root nor tag list */
+	refcount_t usage;
+};
+
+/* never return NULL */
+static struct landlock_tag *new_tag(struct landlock_chain *chain, u64 value)
+{
+	struct landlock_tag *tag;
+
+	tag = kzalloc(sizeof(*tag), GFP_ATOMIC);
+	if (!tag)
+		return ERR_PTR(-ENOMEM);
+	if (WARN_ON(!refcount_inc_not_zero(&chain->usage))) {
+		kfree(tag);
+		return ERR_PTR(-EFAULT);
+	}
+	tag->chain = chain;
+	INIT_LIST_HEAD(&tag->list_object);
+	refcount_set(&tag->usage, 1);
+	atomic64_set(&tag->value, value);
+	return tag;
+}
+
+static void free_tag(struct landlock_tag *tag)
+{
+	if (!tag)
+		return;
+	if (WARN_ON(refcount_read(&tag->usage)))
+		return;
+	landlock_put_chain(tag->chain);
+	kfree(tag);
+}
+
+struct landlock_tag_root {
+	spinlock_t appending;
+	struct list_head tag_list;
+	struct rcu_head rcu_put;
+	refcount_t tag_nb;
+};
+
+/* never return NULL */
+static struct landlock_tag_root *new_tag_root(struct landlock_chain *chain,
+		u64 value)
+{
+	struct landlock_tag_root *root;
+	struct landlock_tag *tag;
+
+	root = kzalloc(sizeof(*root), GFP_ATOMIC);
+	if (!root)
+		return ERR_PTR(-ENOMEM);
+	spin_lock_init(&root->appending);
+	refcount_set(&root->tag_nb, 1);
+	INIT_LIST_HEAD(&root->tag_list);
+
+	tag = new_tag(chain, value);
+	if (IS_ERR(tag)) {
+		kfree(root);
+		return ERR_CAST(tag);
+	}
+	list_add_tail(&tag->list_object, &root->tag_list);
+	return root;
+}
+
+static void free_tag_root(struct landlock_tag_root *root)
+{
+	if (!root)
+		return;
+	if (WARN_ON(refcount_read(&root->tag_nb)))
+		return;
+	/* the tag list should be singular if it is a call from put_tag() or
+	 * empty if it is a call from landlock_set_tag():free_ref */
+	if (WARN_ON(!list_is_singular(&root->tag_list) &&
+				!list_empty(&root->tag_list)))
+		return;
+	kfree(root);
+}
+
+static void put_tag_root_rcu(struct rcu_head *head)
+{
+	struct landlock_tag_root *root;
+
+	root = container_of(head, struct landlock_tag_root, rcu_put);
+	free_tag_root(root);
+}
+
+/* drop a reference on the tag_root and queue it for freeing if it was the last */
+static void put_tag_root(struct landlock_tag_root **root,
+		spinlock_t *root_lock)
+{
+	struct landlock_tag_root *freeme;
+
+	if (!root || WARN_ON(!root_lock))
+		return;
+
+	rcu_read_lock();
+	freeme = rcu_dereference(*root);
+	if (WARN_ON(!freeme))
+		goto out_rcu;
+	if (!refcount_dec_and_lock(&freeme->tag_nb, root_lock))
+		goto out_rcu;
+
+	rcu_assign_pointer(*root, NULL);
+	spin_unlock(root_lock);
+	call_rcu(&freeme->rcu_put, put_tag_root_rcu);
+
+out_rcu:
+	rcu_read_unlock();
+}
+
+static void put_tag_rcu(struct rcu_head *head)
+{
+	struct landlock_tag *tag;
+
+	tag = container_of(head, struct landlock_tag, rcu_put);
+	free_tag(tag);
+}
+
+/* put @tag if not recycled in an RCU */
+/* Only called to free an object; chain deletion will happen after all the
+ * tagged struct files are deleted because their tied task is being deleted as
+ * well.  Then, there is no need to explicitly delete the tag associated to a
+ * chain when this chain is getting deleted. */
+static void put_tag(struct landlock_tag *tag, struct landlock_tag_root **root,
+		spinlock_t *root_lock)
+{
+	if (!tag)
+		return;
+	if (!refcount_dec_and_test(&tag->usage))
+		return;
+	put_tag_root(root, root_lock);
+	list_del_rcu(&tag->list_object);
+	call_rcu(&tag->rcu_put, put_tag_rcu);
+}
+
+/*
+ * landlock_tag_ref - Account for tags
+ *
+ * @tag_nb: count the number of tags pointed by @tag, will free the struct when
+ *	    reaching zero
+ */
+struct landlock_tag_ref {
+	struct landlock_tag_ref *next;
+	struct landlock_tag *tag;
+};
+
+/* never return NULL */
+static struct landlock_tag_ref *landlock_new_tag_ref(void)
+{
+	struct landlock_tag_ref *ret;
+
+	ret = kzalloc(sizeof(*ret), GFP_ATOMIC);
+	if (!ret)
+		return ERR_PTR(-ENOMEM);
+	return ret;
+}
+
+void landlock_free_tag_ref(struct landlock_tag_ref *tag_ref,
+		struct landlock_tag_root **tag_root, spinlock_t *root_lock)
+{
+	while (tag_ref) {
+		struct landlock_tag_ref *freeme = tag_ref;
+
+		tag_ref = tag_ref->next;
+		put_tag(freeme->tag, tag_root, root_lock);
+		kfree(freeme);
+	}
+}
+
+/* tweaked from rculist.h */
+#define list_for_each_entry_nopre_rcu(pos, head, member)		\
+	for (; &pos->member != (head);					\
+	     pos = list_entry_rcu((pos)->member.next, typeof(*(pos)), member))
+
+int landlock_set_tag(struct landlock_tag_ref **tag_ref,
+		struct landlock_tag_root **tag_root,
+		spinlock_t *root_lock,
+		struct landlock_chain *chain, u64 value)
+{
+	struct landlock_tag_root *root;
+	struct landlock_tag_ref *ref, **ref_next, **ref_walk, **ref_prev;
+	struct landlock_tag *tag, *last_tag;
+	int err;
+
+	if (WARN_ON(!tag_ref) || WARN_ON(!tag_root))
+		return -EFAULT;
+
+	/* start by looking for a (protected) ref to the tag */
+	ref_walk = tag_ref;
+	ref_prev = tag_ref;
+	ref_next = tag_ref;
+	tag = NULL;
+	while (*ref_walk) {
+		ref_next = &(*ref_walk)->next;
+		if (!WARN_ON(!(*ref_walk)->tag) &&
+				(*ref_walk)->tag->chain == chain) {
+			tag = (*ref_walk)->tag;
+			break;
+		}
+		ref_prev = ref_walk;
+		ref_walk = &(*ref_walk)->next;
+	}
+	if (tag) {
+		if (value) {
+			/* the tag already exists (and is protected) */
+			atomic64_set(&tag->value, value);
+		} else {
+			/* a value of zero means to delete the tag */
+			put_tag(tag, tag_root, root_lock);
+			*ref_prev = *ref_next;
+			kfree(*ref_walk);
+		}
+		return 0;
+	} else if (!value) {
+		/* do not create a tag with a value of zero */
+		return 0;
+	}
+
+	/* create a new tag and a dedicated ref earlier to keep a consistent
+	 * usage of the tag in case of memory allocation error */
+	ref = landlock_new_tag_ref();
+	if (IS_ERR(ref))
+		return PTR_ERR(ref);
+
+	/* as lock-less as possible */
+	rcu_read_lock();
+	root = rcu_dereference(*tag_root);
+	/* if tag_root does not exist or is being deleted */
+	if (!root || !refcount_inc_not_zero(&root->tag_nb)) {
+		/* may need to create a new tag_root */
+		spin_lock(root_lock);
+		/* the root may have been created meanwhile, recheck */
+		root = rcu_dereference(*tag_root);
+		if (root) {
+			refcount_inc(&root->tag_nb);
+			spin_unlock(root_lock);
+		} else {
+			/* create a tag_root populated with the tag */
+			root = new_tag_root(chain, value);
+			if (IS_ERR(root)) {
+				spin_unlock(root_lock);
+				err = PTR_ERR(root);
+				tag_root = NULL;
+				goto free_ref;
+			}
+			rcu_assign_pointer(*tag_root, root);
+			spin_unlock(root_lock);
+			tag = list_first_entry(&root->tag_list, typeof(*tag),
+					list_object);
+			goto register_tag;
+		}
+	}
+
+	last_tag = NULL;
+	/* look for the tag */
+	list_for_each_entry_rcu(tag, &root->tag_list, list_object) {
+		/* ignore tag being deleted */
+		if (tag->chain == chain &&
+				refcount_inc_not_zero(&tag->usage)) {
+			atomic64_set(&tag->value, value);
+			goto register_tag;
+		}
+		last_tag = tag;
+	}
+	/*
+	 * Did not find a matching chain: lock tag_root, continue an exclusive
+	 * appending walk through the list (a new tag may have been appended
+	 * after the first walk), and if not matching one of the potential new
+	 * tags, then append a new one.
+	 */
+	spin_lock(&root->appending);
+	if (last_tag)
+		tag = list_entry_rcu(last_tag->list_object.next, typeof(*tag),
+				list_object);
+	else
+		tag = list_entry_rcu(root->tag_list.next, typeof(*tag),
+				list_object);
+	list_for_each_entry_nopre_rcu(tag, &root->tag_list, list_object) {
+		/* ignore tag being deleted */
+		if (tag->chain == chain &&
+				refcount_inc_not_zero(&tag->usage)) {
+			spin_unlock(&root->appending);
+			atomic64_set(&tag->value, value);
+			goto register_tag;
+		}
+	}
+	/* did not find any tag, create a new one */
+	tag = new_tag(chain, value);
+	if (IS_ERR(tag)) {
+		spin_unlock(&root->appending);
+		err = PTR_ERR(tag);
+		goto free_ref;
+	}
+	list_add_tail_rcu(&tag->list_object, &root->tag_list);
+	spin_unlock(&root->appending);
+
+register_tag:
+	rcu_read_unlock();
+	ref->tag = tag;
+	*ref_next = ref;
+	return 0;
+
+free_ref:
+	put_tag_root(tag_root, root_lock);
+	rcu_read_unlock();
+	landlock_free_tag_ref(ref, NULL, NULL);
+	return err;
+}
+
+int landlock_set_object_tag(struct landlock_tag_object *tag_obj,
+		struct landlock_chain *chain, u64 value)
+{
+	if (WARN_ON(!tag_obj))
+		return -EFAULT;
+	return landlock_set_tag(tag_obj->ref, tag_obj->root, tag_obj->lock,
+			chain, value);
+}
+
+u64 landlock_get_tag(const struct landlock_tag_root *tag_root,
+		const struct landlock_chain *chain)
+{
+	const struct landlock_tag_root *root;
+	struct landlock_tag *tag;
+	u64 ret = 0;
+
+	rcu_read_lock();
+	root = rcu_dereference(tag_root);
+	if (!root)
+		goto out_rcu;
+
+	/* no need to check if it is being deleted, it is guarded by RCU */
+	list_for_each_entry_rcu(tag, &root->tag_list, list_object) {
+		/* may return to-be-deleted tag */
+		if (tag->chain == chain) {
+			ret = atomic64_read(&tag->value);
+			goto out_rcu;
+		}
+	}
+
+out_rcu:
+	rcu_read_unlock();
+	return ret;
+}
diff --git a/security/landlock/tag.h b/security/landlock/tag.h
new file mode 100644
index 000000000000..71ad9f9ef16e
--- /dev/null
+++ b/security/landlock/tag.h
@@ -0,0 +1,36 @@
+/*
+ * Landlock LSM - tag headers
+ *
+ * Copyright © 2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _SECURITY_LANDLOCK_TAG_H
+#define _SECURITY_LANDLOCK_TAG_H
+
+#include <linux/spinlock_types.h>
+
+struct landlock_tag;
+struct landlock_tag_root;
+struct landlock_tag_ref;
+
+struct landlock_tag_object {
+	spinlock_t *lock;
+	struct landlock_tag_root **root;
+	struct landlock_tag_ref **ref;
+};
+
+int landlock_set_tag(struct landlock_tag_ref **tag_ref,
+		struct landlock_tag_root **tag_root,
+		spinlock_t *root_lock,
+		struct landlock_chain *chain, u64 value);
+u64 landlock_get_tag(const struct landlock_tag_root *tag_root,
+		const struct landlock_chain *chain);
+void landlock_free_tag_ref(struct landlock_tag_ref *tag_ref,
+		struct landlock_tag_root **tag_root, spinlock_t *root_lock);
+
+#endif /* _SECURITY_LANDLOCK_TAG_H */
diff --git a/security/landlock/tag_fs.c b/security/landlock/tag_fs.c
new file mode 100644
index 000000000000..86a48e8a61f4
--- /dev/null
+++ b/security/landlock/tag_fs.c
@@ -0,0 +1,59 @@
+/*
+ * Landlock LSM - tag FS helpers
+ *
+ * Copyright © 2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/fs.h> /* struct inode */
+#include <linux/landlock.h> /* landlock_get_inode_tag */
+#include <linux/slab.h>
+
+#include "tag_fs.h"
+#include "tag.h"
+
+u64 landlock_get_inode_tag(const struct inode *inode,
+		const struct landlock_chain *chain)
+{
+	return landlock_get_tag(inode->i_security, chain);
+}
+
+/* never return NULL */
+struct landlock_tag_fs *landlock_new_tag_fs(struct inode *inode)
+{
+	struct landlock_tag_fs *tag_fs;
+
+	tag_fs = kmalloc(sizeof(*tag_fs), GFP_KERNEL);
+	if (!tag_fs)
+		return ERR_PTR(-ENOMEM);
+	ihold(inode);
+	tag_fs->inode = inode;
+	tag_fs->ref = NULL;
+	return tag_fs;
+}
+
+void landlock_reset_tag_fs(struct landlock_tag_fs *tag_fs, struct inode *inode)
+{
+	if (WARN_ON(!tag_fs))
+		return;
+	landlock_free_tag_ref(tag_fs->ref, (struct landlock_tag_root **)
+			&tag_fs->inode->i_security, &tag_fs->inode->i_lock);
+	iput(tag_fs->inode);
+	ihold(inode);
+	tag_fs->inode = inode;
+	tag_fs->ref = NULL;
+}
+
+void landlock_free_tag_fs(struct landlock_tag_fs *tag_fs)
+{
+	if (!tag_fs)
+		return;
+	landlock_free_tag_ref(tag_fs->ref, (struct landlock_tag_root **)
+			&tag_fs->inode->i_security, &tag_fs->inode->i_lock);
+	iput(tag_fs->inode);
+	kfree(tag_fs);
+}
diff --git a/security/landlock/tag_fs.h b/security/landlock/tag_fs.h
new file mode 100644
index 000000000000..a73b84c43d35
--- /dev/null
+++ b/security/landlock/tag_fs.h
@@ -0,0 +1,26 @@
+/*
+ * Landlock LSM - tag FS headers
+ *
+ * Copyright © 2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _SECURITY_LANDLOCK_TAG_FS_H
+#define _SECURITY_LANDLOCK_TAG_FS_H
+
+#include <linux/fs.h> /* struct inode */
+
+struct landlock_tag_fs {
+	struct inode *inode;
+	struct landlock_tag_ref *ref;
+};
+
+struct landlock_tag_fs *landlock_new_tag_fs(struct inode *inode);
+void landlock_reset_tag_fs(struct landlock_tag_fs *tag_fs, struct inode *inode);
+void landlock_free_tag_fs(struct landlock_tag_fs *tag_fs);
+
+#endif /* _SECURITY_LANDLOCK_TAG_FS_H */
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 2433aa1a0fd4..6dffd4ec7036 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -114,6 +114,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_DEVMAP,
 	BPF_MAP_TYPE_SOCKMAP,
 	BPF_MAP_TYPE_CPUMAP,
+	BPF_MAP_TYPE_INODE,
 };
 
 enum bpf_prog_type {
@@ -708,6 +709,22 @@ union bpf_attr {
  * int bpf_override_return(pt_regs, rc)
  *	@pt_regs: pointer to struct pt_regs
  *	@rc: the return value to set
+ *
+ * u64 bpf_inode_map_lookup(map, key)
+ *     @map: pointer to inode map
+ *     @key: pointer to inode
+ *     Return: value tied to this key, or zero if none
+ *
+ * u64 bpf_inode_get_tag(inode, chain)
+ *     @inode: pointer to struct inode
+ *     @chain: pointer to struct landlock_chain
+ *     Return: tag tied to this inode and chain, or zero if none
+ *
+ * int bpf_landlock_set_tag(tag_obj, chain, value)
+ *     @tag_obj: pointer pointing to a taggable object (e.g. inode)
+ *     @chain: pointer to struct landlock_chain
+ *     @value: value of the tag
+ *     Return: 0 on success or negative error code
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -769,7 +786,10 @@ union bpf_attr {
 	FN(perf_prog_read_value),	\
 	FN(getsockopt),			\
 	FN(override_return),		\
-	FN(sock_ops_cb_flags_set),
+	FN(sock_ops_cb_flags_set),	\
+	FN(inode_map_lookup),		\
+	FN(inode_get_tag),		\
+	FN(landlock_set_tag),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
2.16.2


* [PATCH bpf-next v8 07/11] landlock: Handle filesystem access control
  2018-02-27  0:41 [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing Mickaël Salaün
                   ` (5 preceding siblings ...)
  2018-02-27  0:41 ` [PATCH bpf-next v8 06/11] bpf,landlock: Add a new map type: inode Mickaël Salaün
@ 2018-02-27  0:41 ` Mickaël Salaün
  2018-02-27  0:41 ` [PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions Mickaël Salaün
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 55+ messages in thread
From: Mickaël Salaün @ 2018-02-27  0:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev

This adds three Landlock hooks: FS_WALK, FS_PICK and FS_GET.

The FS_WALK hook is used to walk through a file path. A program tied to
this hook will be evaluated for each directory traversal except the last
one if it is the leaf of the path.
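
Since an FS_WALK program is evaluated once per traversed directory, it
needs a place to keep state between two evaluations. In this patch that
place is a per-chain cookie, loaded by landlock_update_ctx_fs_walk() and
stored back by landlock_save_ctx_fs_walk(). The user-space sketch below
only models that load/run/save cycle. The names prog_fn,
run_with_cookie(), depth_limit_prog() and demo_states are hypothetical,
and a real program would be eBPF, not a C callback.

```c
#include <stddef.h>
#include <stdint.h>

/* one slot per chain, like struct landlock_walk_state in hooks_fs.c */
struct walk_state {
	uint64_t cookie;
};

/* stand-in for the context handed to a program (hypothetical) */
struct prog_ctx {
	uint64_t cookie;
};

/* stand-in for an eBPF program: returns 0 to allow, nonzero to deny */
typedef int (*prog_fn)(struct prog_ctx *ctx);

/* load the cookie of the chain, run the program, save the cookie back */
static int run_with_cookie(prog_fn prog, struct walk_state *states,
		size_t chain_index)
{
	struct prog_ctx ctx = { .cookie = states[chain_index].cookie };
	int ret = prog(&ctx);

	/* saved even on deny: landlock_access_deny() calls save_ctx()
	 * before looking at the return value of the program */
	states[chain_index].cookie = ctx.cookie;
	return ret;
}

/* example policy: count traversed directories, deny past a depth of 3 */
static int depth_limit_prog(struct prog_ctx *ctx)
{
	ctx->cookie++;
	return ctx->cookie > 3;
}

/* zero-filled, like the kcalloc() in new_walk_list() */
static struct walk_state demo_states[2];
```

Each chain only sees its own slot, so two chains attached to the same
hook do not disturb each other's cookie.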

The FS_PICK hook is used to validate a set of actions requested on a
file. These actions are defined with triggers (e.g. read, write, open,
append...).
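
The trigger test performed by landlock_access_deny() can be sketched in
plain C as below. The TRIGGER_* names and bit values are hypothetical
placeholders, not the UAPI constants of this series. Only the shape of
the check is taken from the patch: a program is skipped when a nonzero
set of requested triggers has no bit in common with the triggers the
program subscribed to.

```c
#include <stdbool.h>
#include <stdint.h>

/* hypothetical trigger bits, for illustration only */
#define TRIGGER_OPEN	(UINT64_C(1) << 0)
#define TRIGGER_READ	(UINT64_C(1) << 1)
#define TRIGGER_WRITE	(UINT64_C(1) << 2)
#define TRIGGER_APPEND	(UINT64_C(1) << 3)

/*
 * Return true if a program subscribed to @subscribed must run for a
 * request carrying @requested.
 */
static bool should_run(uint64_t requested, uint64_t subscribed)
{
	/* zero means "no filtering", which is what decide_fs_walk()
	 * passes to landlock_decide() */
	if (!requested)
		return true;
	return (requested & subscribed) != 0;
}
```

With this rule, a program subscribed to read and write is not evaluated
for a pure open request, which keeps unrelated programs off the path.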

The FS_GET hook is used to tag open files, which is necessary to be able
to evaluate relative paths.  A program tied to this hook can tag a file
with an inode map.
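
Going by landlock_set_tag() and landlock_get_tag() from the previous
patch, a tag is a u64 value tied to an object and a chain: setting zero
deletes the tag, zero is never stored, and a lookup returning zero means
"no tag". The sketch below models only these semantics with a fixed
array. It has none of the RCU, refcount or locking logic of the real
code, and struct tagged_object, MAX_TAGS, set_tag(), get_tag() and the
demo_* variables are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TAGS 4

/* a slot is free when its value is zero */
struct tag {
	const void *chain;
	uint64_t value;
};

struct tagged_object {
	struct tag tags[MAX_TAGS];
};

/* return the tag tied to this object and chain, or zero if none */
static uint64_t get_tag(const struct tagged_object *obj, const void *chain)
{
	size_t i;

	for (i = 0; i < MAX_TAGS; i++) {
		if (obj->tags[i].value && obj->tags[i].chain == chain)
			return obj->tags[i].value;
	}
	return 0;
}

/* a value of zero deletes the tag; a zero tag is never created */
static int set_tag(struct tagged_object *obj, const void *chain,
		uint64_t value)
{
	struct tag *free_slot = NULL;
	size_t i;

	for (i = 0; i < MAX_TAGS; i++) {
		struct tag *walker = &obj->tags[i];

		if (walker->value && walker->chain == chain) {
			/* update in place; zero releases the slot */
			walker->value = value;
			return 0;
		}
		if (!walker->value && !free_slot)
			free_slot = walker;
	}
	if (!value)
		return 0;
	if (!free_slot)
		return -ENOSPC;
	free_slot->chain = chain;
	free_slot->value = value;
	return 0;
}

/* only the addresses matter: they stand in for two distinct chains */
static const int demo_chain_a = 0, demo_chain_b = 0;
static struct tagged_object demo_obj;
```

Because tags are keyed by chain, one sandbox cannot read or overwrite
the tag another chain put on the same object.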

A Landlock program can be chained to another if it is permitted by the
BPF verifier. An FS_WALK program can be chained to an FS_PICK program,
which can in turn be chained to an FS_GET program.

The Landlock LSM hooks are registered after those of the other LSMs, so
that actions defined by user space (via eBPF programs) only run if the
access was already granted by the major (privileged) LSMs.
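
Why the registration order matters can be sketched as follows, assuming
the usual behaviour of the LSM framework for integer hooks: they are
called in registration order and the first nonzero return value ends the
walk. All names below (hook_fn, call_hooks(), major_grant(),
major_deny(), sandbox_hook(), the stack_* arrays) are hypothetical, and
the model is user-space C, not kernel code.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*hook_fn)(void);

/* counts how often the last-registered (sandbox) hook really ran */
static int sandbox_calls;

static int major_grant(void)
{
	return 0;
}

static int major_deny(void)
{
	return -EACCES;
}

static int sandbox_hook(void)
{
	sandbox_calls++;
	return 0;
}

/* call hooks in registration order, stop at the first denial */
static int call_hooks(const hook_fn *hooks, size_t nb)
{
	size_t i;

	for (i = 0; i < nb; i++) {
		int rc = hooks[i]();

		if (rc)
			return rc;
	}
	return 0;
}

/* the sandbox hook is registered last in both stacks */
static const hook_fn stack_denied[] = {
	major_grant, major_deny, sandbox_hook,
};
static const hook_fn stack_granted[] = {
	major_grant, sandbox_hook,
};
```

In the first stack the sandbox hook never runs because an earlier hook
already denied the access. In the second one it runs exactly once.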

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v7:
* major rewrite with clean Landlock hooks able to deal with file paths

Changes since v6:
* add 3 more sub-events: IOCTL, LOCK, FCNTL
  https://lkml.kernel.org/r/2fbc99a6-f190-f335-bd14-04bdeed35571@digikod.net
* use the new security_add_hooks()
* explain the -Werror=unused-function
* constify pointers
* cleanup headers

Changes since v5:
* split hooks.[ch] into hooks.[ch] and hooks_fs.[ch]
* add more documentation
* cosmetic fixes
* rebase (SCALAR_VALUE)

Changes since v4:
* add LSM hook abstraction called Landlock event
  * use the compiler type checking to verify the hooks used by an event
  * handle all filesystem related LSM hooks (e.g. file_permission,
    mmap_file, sb_mount...)
* register BPF programs for Landlock just after LSM hooks registration
* move hooks registration after other LSMs
* add failsafes to check if a hook is not used by the kernel
* allow partial raw value access from the context (needed for programs
  generated by LLVM)

Changes since v3:
* split commit
* add hooks dealing with struct inode and struct path pointers:
  inode_permission and inode_getattr
* add abstraction over eBPF helper arguments thanks to wrapping structs
---
 include/linux/lsm_hooks.h           |    5 +
 security/landlock/Makefile          |    5 +-
 security/landlock/common.h          |    9 +
 security/landlock/enforce_seccomp.c |   10 +
 security/landlock/hooks.c           |  121 +++++
 security/landlock/hooks.h           |   35 ++
 security/landlock/hooks_cred.c      |   52 ++
 security/landlock/hooks_cred.h      |    1 +
 security/landlock/hooks_fs.c        | 1021 +++++++++++++++++++++++++++++++++++
 security/landlock/hooks_fs.h        |   60 ++
 security/landlock/init.c            |   56 ++
 security/landlock/task.c            |   34 ++
 security/landlock/task.h            |   29 +
 security/security.c                 |   12 +-
 14 files changed, 1447 insertions(+), 3 deletions(-)
 create mode 100644 security/landlock/hooks.c
 create mode 100644 security/landlock/hooks.h
 create mode 100644 security/landlock/hooks_cred.c
 create mode 100644 security/landlock/hooks_cred.h
 create mode 100644 security/landlock/hooks_fs.c
 create mode 100644 security/landlock/hooks_fs.h
 create mode 100644 security/landlock/task.c
 create mode 100644 security/landlock/task.h

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index d71cf183f0be..c40163385b68 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -2032,5 +2032,10 @@ void __init loadpin_add_hooks(void);
 #else
 static inline void loadpin_add_hooks(void) { };
 #endif
+#ifdef CONFIG_SECURITY_LANDLOCK
+extern void __init landlock_add_hooks(void);
+#else
+static inline void __init landlock_add_hooks(void) { }
+#endif /* CONFIG_SECURITY_LANDLOCK */
 
 #endif /* ! __LINUX_LSM_HOOKS_H */
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index 0e1dd4612ecc..d0f532a93b4e 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
-landlock-y := init.o chain.o \
+landlock-y := init.o chain.o task.o \
 	tag.o tag_fs.o \
-	enforce.o enforce_seccomp.o
+	enforce.o enforce_seccomp.o \
+	hooks.o hooks_cred.o hooks_fs.o
diff --git a/security/landlock/common.h b/security/landlock/common.h
index 245e4ccafcf2..6d36b70068d5 100644
--- a/security/landlock/common.h
+++ b/security/landlock/common.h
@@ -82,4 +82,13 @@ static inline enum landlock_hook_type get_type(struct bpf_prog *prog)
 	return prog->aux->extra->subtype.landlock_hook.type;
 }
 
+__maybe_unused
+static bool current_has_prog_type(enum landlock_hook_type hook_type)
+{
+	struct landlock_prog_set *prog_set;
+
+	prog_set = current->seccomp.landlock_prog_set;
+	return (prog_set && prog_set->programs[get_index(hook_type)]);
+}
+
 #endif /* _SECURITY_LANDLOCK_COMMON_H */
diff --git a/security/landlock/enforce_seccomp.c b/security/landlock/enforce_seccomp.c
index 8da72e868422..7d06ad26e0f8 100644
--- a/security/landlock/enforce_seccomp.c
+++ b/security/landlock/enforce_seccomp.c
@@ -22,6 +22,7 @@
 #include <linux/uaccess.h> /* get_user() */
 
 #include "enforce.h"
+#include "task.h"
 
 /* headers in include/linux/landlock.h */
 
@@ -64,6 +65,13 @@ int landlock_seccomp_prepend_prog(unsigned int flags,
 	if (err)
 		return err;
 
+	/* allocate current->security here to not have to handle this in
+	 * hook_nameidata_free_security() */
+	if (!current->security) {
+		current->security = landlock_new_task_security(GFP_KERNEL);
+		if (!current->security)
+			return -ENOMEM;
+	}
 	prog = bpf_prog_get(bpf_fd);
 	if (IS_ERR(prog)) {
 		err = PTR_ERR(prog);
@@ -86,6 +94,8 @@ int landlock_seccomp_prepend_prog(unsigned int flags,
 	return 0;
 
 free_task:
+	landlock_free_task_security(current->security);
+	current->security = NULL;
 	return err;
 }
 
diff --git a/security/landlock/hooks.c b/security/landlock/hooks.c
new file mode 100644
index 000000000000..e9535937a7b9
--- /dev/null
+++ b/security/landlock/hooks.c
@@ -0,0 +1,121 @@
+/*
+ * Landlock LSM - hook helpers
+ *
+ * Copyright © 2016-2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/current.h>
+#include <linux/bpf.h> /* struct bpf_prog_aux */
+#include <linux/errno.h>
+#include <linux/filter.h> /* BPF_PROG_RUN() */
+#include <linux/rculist.h> /* list_add_tail_rcu */
+#include <uapi/linux/landlock.h> /* struct landlock_context */
+
+#include "common.h" /* struct landlock_rule, get_index() */
+#include "hooks.h" /* landlock_hook_ctx */
+
+#include "hooks_fs.h"
+
+/* return a Landlock program context (e.g. hook_ctx->fs_walk.prog_ctx) */
+static void *update_ctx(enum landlock_hook_type hook_type,
+		struct landlock_hook_ctx *hook_ctx,
+		const struct landlock_chain *chain)
+{
+	switch (hook_type) {
+	case LANDLOCK_HOOK_FS_WALK:
+		return landlock_update_ctx_fs_walk(hook_ctx->fs_walk, chain);
+	case LANDLOCK_HOOK_FS_PICK:
+		return landlock_update_ctx_fs_pick(hook_ctx->fs_pick, chain);
+	case LANDLOCK_HOOK_FS_GET:
+		return landlock_update_ctx_fs_get(hook_ctx->fs_get, chain);
+	}
+	WARN_ON(1);
+	return NULL;
+}
+
+/* save the program context (e.g. hook_ctx->fs_get.prog_ctx.inode_tag) */
+static int save_ctx(enum landlock_hook_type hook_type,
+		struct landlock_hook_ctx *hook_ctx,
+		struct landlock_chain *chain)
+{
+	switch (hook_type) {
+	case LANDLOCK_HOOK_FS_WALK:
+		return landlock_save_ctx_fs_walk(hook_ctx->fs_walk, chain);
+	case LANDLOCK_HOOK_FS_PICK:
+		return landlock_save_ctx_fs_pick(hook_ctx->fs_pick, chain);
+	case LANDLOCK_HOOK_FS_GET:
+		/* no need to save the cookie */
+		return 0;
+	}
+	WARN_ON(1);
+	return 1;
+}
+
+/**
+ * landlock_access_deny - run Landlock programs tied to a hook
+ *
+ * @hook_type: hook type, used to index the programs array
+ * @hook_ctx: non-NULL valid hook context (wraps the eBPF program context)
+ * @prog_set: Landlock program set pointer
+ * @triggers: a bitmask to check if a program should be run
+ *
+ * Return true if at least one program returns deny.
+ */
+static bool landlock_access_deny(enum landlock_hook_type hook_type,
+		struct landlock_hook_ctx *hook_ctx,
+		struct landlock_prog_set *prog_set, u64 triggers)
+{
+	struct landlock_prog_list *prog_list, *prev_list = NULL;
+	u32 hook_idx = get_index(hook_type);
+
+	if (!prog_set)
+		return false;
+
+	for (prog_list = prog_set->programs[hook_idx];
+			prog_list; prog_list = prog_list->prev) {
+		u32 ret;
+		void *prog_ctx;
+
+		/* check if @prog expects at least one of these triggers */
+		if (triggers && !(triggers & prog_list->prog->aux->extra->
+					subtype.landlock_hook.triggers))
+			continue;
+		prog_ctx = update_ctx(hook_type, hook_ctx, prog_list->chain);
+		if (!prog_ctx || WARN_ON(IS_ERR(prog_ctx)))
+			return true;
+		rcu_read_lock();
+		ret = BPF_PROG_RUN(prog_list->prog, prog_ctx);
+		rcu_read_unlock();
+		if (save_ctx(hook_type, hook_ctx, prog_list->chain))
+			return true;
+		/* deny access if a program returns a value different than 0 */
+		if (ret)
+			return true;
+		if (prev_list && prog_list->prev && prog_list->prev->prog->
+				aux->extra->subtype.landlock_hook.type ==
+				prev_list->prog->aux->extra->
+				subtype.landlock_hook.type)
+			WARN_ON(prog_list->prev != prev_list);
+		prev_list = prog_list;
+	}
+	return false;
+}
+
+int landlock_decide(enum landlock_hook_type hook_type,
+		struct landlock_hook_ctx *hook_ctx, u64 triggers)
+{
+	bool deny = false;
+
+#ifdef CONFIG_SECCOMP_FILTER
+	deny = landlock_access_deny(hook_type, hook_ctx,
+			current->seccomp.landlock_prog_set, triggers);
+#endif /* CONFIG_SECCOMP_FILTER */
+
+	/* should we use -EPERM or -EACCES? */
+	return deny ? -EACCES : 0;
+}
diff --git a/security/landlock/hooks.h b/security/landlock/hooks.h
new file mode 100644
index 000000000000..30ffd8ffa738
--- /dev/null
+++ b/security/landlock/hooks.h
@@ -0,0 +1,35 @@
+/*
+ * Landlock LSM - hooks helpers
+ *
+ * Copyright © 2016-2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/current.h>
+#include <linux/sched.h> /* struct task_struct */
+#include <linux/seccomp.h>
+
+#include "hooks_fs.h"
+
+struct landlock_hook_ctx {
+	union {
+		struct landlock_hook_ctx_fs_walk *fs_walk;
+		struct landlock_hook_ctx_fs_pick *fs_pick;
+		struct landlock_hook_ctx_fs_get *fs_get;
+	};
+};
+
+static inline bool landlocked(const struct task_struct *task)
+{
+#ifdef CONFIG_SECCOMP_FILTER
+	return !!(task->seccomp.landlock_prog_set);
+#else
+	return false;
+#endif /* CONFIG_SECCOMP_FILTER */
+}
+
+int landlock_decide(enum landlock_hook_type, struct landlock_hook_ctx *, u64);
diff --git a/security/landlock/hooks_cred.c b/security/landlock/hooks_cred.c
new file mode 100644
index 000000000000..1e30b3a3fe0e
--- /dev/null
+++ b/security/landlock/hooks_cred.c
@@ -0,0 +1,52 @@
+/*
+ * Landlock LSM - credential hooks
+ *
+ * Copyright © 2017-2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/cred.h>
+#include <linux/lsm_hooks.h>
+#include <linux/slab.h> /* alloc(), kfree() */
+
+#include "common.h" /* LANDLOCK_NAME */
+#include "task.h"
+
+static void hook_cred_free(struct cred *cred)
+{
+	struct landlock_task_security *tsec = cred->security;
+
+	if (!tsec)
+		return;
+	cred->security = NULL;
+	landlock_free_task_security(tsec);
+}
+
+/* TODO: make Landlock exclusive until the LSM stacking infrastructure */
+static int hook_cred_prepare(struct cred *new, const struct cred *old,
+		gfp_t gfp)
+{
+	struct landlock_task_security *tsec;
+
+	/* TODO: only allocate if the current task is landlocked */
+	tsec = landlock_new_task_security(gfp);
+	if (!tsec)
+		return -ENOMEM;
+	new->security = tsec;
+	return 0;
+}
+
+static struct security_hook_list landlock_hooks[] = {
+	LSM_HOOK_INIT(cred_prepare, hook_cred_prepare),
+	LSM_HOOK_INIT(cred_free, hook_cred_free),
+};
+
+__init void landlock_add_hooks_cred(void)
+{
+	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
+			LANDLOCK_NAME);
+}
diff --git a/security/landlock/hooks_cred.h b/security/landlock/hooks_cred.h
new file mode 100644
index 000000000000..18ec646a7eb0
--- /dev/null
+++ b/security/landlock/hooks_cred.h
@@ -0,0 +1 @@
+__init void landlock_add_hooks_cred(void);
diff --git a/security/landlock/hooks_fs.c b/security/landlock/hooks_fs.c
new file mode 100644
index 000000000000..8f91800feef4
--- /dev/null
+++ b/security/landlock/hooks_fs.c
@@ -0,0 +1,1021 @@
+/*
+ * Landlock LSM - filesystem hooks
+ *
+ * Copyright © 2016-2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/bpf.h> /* enum bpf_access_type */
+#include <linux/kernel.h> /* ARRAY_SIZE */
+#include <linux/lsm_hooks.h>
+#include <linux/rcupdate.h> /* synchronize_rcu() */
+#include <linux/stat.h> /* S_ISDIR */
+#include <linux/stddef.h> /* offsetof */
+#include <linux/types.h> /* uintptr_t */
+#include <linux/workqueue.h> /* INIT_WORK() */
+
+/* permissions translation */
+#include <linux/fs.h> /* MAY_* */
+#include <linux/mman.h> /* PROT_* */
+#include <linux/namei.h>
+
+/* hook arguments */
+#include <linux/cred.h>
+#include <linux/dcache.h> /* struct dentry */
+#include <linux/fs.h> /* struct inode, struct iattr */
+#include <linux/mm_types.h> /* struct vm_area_struct */
+#include <linux/mount.h> /* struct vfsmount */
+#include <linux/path.h> /* struct path */
+#include <linux/sched.h> /* struct task_struct */
+#include <linux/time.h> /* struct timespec */
+
+#include "chain.h"
+#include "common.h"
+#include "hooks_fs.h"
+#include "hooks.h"
+#include "tag.h"
+#include "task.h"
+
+/* fs_pick */
+
+#include <asm/page.h> /* PAGE_SIZE */
+#include <asm/syscall.h>
+#include <linux/dcache.h> /* d_path, dentry_path_raw */
+#include <linux/err.h> /* *_ERR */
+#include <linux/gfp.h> /* __get_free_page, GFP_KERNEL */
+#include <linux/path.h> /* struct path */
+#include <linux/sched/task_stack.h> /* task_pt_regs dependency */
+
+bool landlock_is_valid_access_fs_pick(int off, enum bpf_access_type type,
+		enum bpf_reg_type *reg_type, int *max_size)
+{
+	switch (off) {
+	case offsetof(struct landlock_ctx_fs_pick, cookie):
+		if (type != BPF_READ && type != BPF_WRITE)
+			return false;
+		*reg_type = SCALAR_VALUE;
+		*max_size = sizeof(u64);
+		return true;
+	case offsetof(struct landlock_ctx_fs_pick, chain):
+		if (type != BPF_READ)
+			return false;
+		*reg_type = PTR_TO_LL_CHAIN;
+		*max_size = sizeof(u64);
+		return true;
+	case offsetof(struct landlock_ctx_fs_pick, inode):
+		if (type != BPF_READ)
+			return false;
+		*reg_type = PTR_TO_INODE;
+		*max_size = sizeof(u64);
+		return true;
+	case offsetof(struct landlock_ctx_fs_pick, inode_lookup):
+		if (type != BPF_READ)
+			return false;
+		*reg_type = SCALAR_VALUE;
+		/* TODO: check the bit mask */
+		*max_size = sizeof(u8);
+		return true;
+	default:
+		return false;
+	}
+}
+
+bool landlock_is_valid_access_fs_walk(int off, enum bpf_access_type type,
+		enum bpf_reg_type *reg_type, int *max_size)
+{
+	switch (off) {
+	case offsetof(struct landlock_ctx_fs_walk, cookie):
+		if (type != BPF_READ && type != BPF_WRITE)
+			return false;
+		*reg_type = SCALAR_VALUE;
+		*max_size = sizeof(u64);
+		return true;
+	case offsetof(struct landlock_ctx_fs_walk, chain):
+		if (type != BPF_READ)
+			return false;
+		*reg_type = PTR_TO_LL_CHAIN;
+		*max_size = sizeof(u64);
+		return true;
+	case offsetof(struct landlock_ctx_fs_walk, inode):
+		if (type != BPF_READ)
+			return false;
+		*reg_type = PTR_TO_INODE;
+		*max_size = sizeof(u64);
+		return true;
+	case offsetof(struct landlock_ctx_fs_walk, inode_lookup):
+		if (type != BPF_READ)
+			return false;
+		*reg_type = SCALAR_VALUE;
+		/* TODO: check the bit mask */
+		*max_size = sizeof(u8);
+		return true;
+	default:
+		return false;
+	}
+}
+
+bool landlock_is_valid_access_fs_get(int off, enum bpf_access_type type,
+		enum bpf_reg_type *reg_type, int *max_size)
+{
+	switch (off) {
+	case offsetof(struct landlock_ctx_fs_get, cookie):
+		/* fs_get is the last possible hook, hence not useful to allow
+		 * cookie modification */
+		if (type != BPF_READ)
+			return false;
+		*reg_type = SCALAR_VALUE;
+		*max_size = sizeof(u64);
+		return true;
+	case offsetof(struct landlock_ctx_fs_get, chain):
+		if (type != BPF_READ)
+			return false;
+		*reg_type = PTR_TO_LL_CHAIN;
+		*max_size = sizeof(u64);
+		return true;
+	case offsetof(struct landlock_ctx_fs_get, tag_object):
+		if (type != BPF_READ)
+			return false;
+		*reg_type = PTR_TO_LL_TAG_OBJ;
+		*max_size = sizeof(u64);
+		return true;
+	default:
+		return false;
+	}
+}
+
+/* fs_walk */
+
+struct landlock_walk_state {
+	u64 cookie;
+};
+
+struct landlock_walk_list {
+	struct work_struct work;
+	/* array of states */
+	struct landlock_walk_state *state;
+	struct inode *last_inode;
+	struct task_struct *task;
+	struct landlock_walk_list *next;
+	enum namei_type lookup_type;
+};
+
+/* allocate an array of states nested in a new struct landlock_walk_list */
+/* never return NULL */
+/* TODO: use a dedicated kmem_cache_alloc() instead of k*alloc() */
+static struct landlock_walk_list *new_walk_list(struct task_struct *task)
+{
+	struct landlock_walk_list *walk_list;
+	struct landlock_walk_state *walk_state;
+	struct landlock_prog_set *prog_set =
+		task->seccomp.landlock_prog_set;
+
+	/* allocate an array of cookies: one for each fs_walk program */
+	if (WARN_ON(!prog_set))
+		return ERR_PTR(-EFAULT);
+	/* fill with zero */
+	walk_state = kcalloc(prog_set->chain_last->index + 1,
+			sizeof(*walk_state), GFP_ATOMIC);
+	if (!walk_state)
+		return ERR_PTR(-ENOMEM);
+	walk_list = kzalloc(sizeof(*walk_list), GFP_ATOMIC);
+	if (!walk_list) {
+		kfree(walk_state);
+		return ERR_PTR(-ENOMEM);
+	}
+	walk_list->state = walk_state;
+	walk_list->task = task;
+	return walk_list;
+}
+
+static void free_walk_list(struct landlock_walk_list *walker)
+{
+	while (walker) {
+		struct landlock_walk_list *freeme = walker;
+
+		walker = walker->next;
+		/* iput() might sleep */
+		iput(freeme->last_inode);
+		kfree(freeme->state);
+		kfree(freeme);
+	}
+}
+
+/* called from workqueue */
+static void free_walk_list_deferred(struct work_struct *work)
+{
+	struct landlock_walk_list *walk_list;
+
+	synchronize_rcu();
+	walk_list = container_of(work, struct landlock_walk_list, work);
+	free_walk_list(walk_list);
+}
+
+void landlock_free_walk_list(struct landlock_walk_list *freeme)
+{
+	if (!freeme)
+		return;
+	INIT_WORK(&freeme->work, free_walk_list_deferred);
+	schedule_work(&freeme->work);
+}
+
+/* return NULL if there are no fs_walk programs */
+static struct landlock_walk_list *get_current_walk_list(
+		const struct inode *inode)
+{
+	struct landlock_walk_list **walk_list;
+	struct nameidata_lookup *lookup;
+
+	lookup = current_nameidata_lookup(inode);
+	if (IS_ERR(lookup))
+		/* -ENOENT */
+		return ERR_CAST(lookup);
+	if (WARN_ON(!lookup))
+		return ERR_PTR(-EFAULT);
+	walk_list = (struct landlock_walk_list **)&lookup->security;
+	if (!*walk_list) {
+		struct landlock_walk_list *new_list;
+
+		/* allocate a landlock_walk_list to be able to move it without
+		 * new allocation in hook_nameidata_put_lookup() */
+		new_list = new_walk_list(current);
+		if (IS_ERR_OR_NULL(new_list))
+			/* no fs_walk prog */
+			return ERR_CAST(new_list);
+		*walk_list = new_list;
+	}
+	(*walk_list)->lookup_type = lookup->type;
+	return *walk_list;
+}
+
+static inline u8 translate_lookup(enum namei_type type)
+{
+	/* TODO: Use bitmask instead, and add an autonomous LOOKUP_ROOT
+	 * (doesn't show when encountering a LAST_DOTDOT)? */
+	BUILD_BUG_ON(LAST_ROOT != LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_ROOT);
+	BUILD_BUG_ON(LAST_DOT != LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_DOT);
+	BUILD_BUG_ON(LAST_DOTDOT != LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_DOTDOT);
+	return type & 3;
+}
+
+/* for now, handle syscalls dealing with up to 2 concurrent path walks */
+#define LANDLOCK_MAX_CONCURRENT_WALK 2
+
+/* retrieve the walk state strictly associated to an inode (i.e. when the
+ * actual walk is done) */
+/* never return NULL */
+static struct landlock_walk_list *get_saved_walk_list(struct inode *inode)
+{
+	struct landlock_task_security *tsec;
+	struct landlock_walk_list **walker, *walk_match = NULL;
+	unsigned int walk_nb = 0;
+
+	tsec = current_security();
+	if (WARN_ON(!tsec) || WARN_ON(!inode))
+		return ERR_PTR(-EFAULT);
+	/* find the walk that matches the inode */
+	walker = &tsec->walk_list;
+	while (*walker) {
+		walk_nb++;
+		if (walk_nb > LANDLOCK_MAX_CONCURRENT_WALK) {
+			free_walk_list(*walker);
+			*walker = NULL;
+			break;
+		}
+		if (!walk_match && (*walker)->last_inode == inode)
+			walk_match = *walker;
+		walker = &(*walker)->next;
+	}
+	if (!walk_match) {
+		/* create empty walk states */
+		walk_match = new_walk_list(current);
+		if (IS_ERR(walk_match))
+			return walk_match;
+		ihold(inode);
+		walk_match->last_inode = inode;
+		walk_match->next = tsec->walk_list;
+		tsec->walk_list = walk_match;
+	}
+	return walk_match;
+}
+
+/* Move the walk state/list into current->security.  It will be freed by
+ * hook_cred_free(). */
+static void hook_nameidata_put_lookup(struct nameidata_lookup *lookup,
+		struct inode *inode)
+{
+	struct landlock_task_security *tsec;
+	struct landlock_walk_list *walk_list = lookup->security;
+
+	if (!landlocked(current))
+		return;
+	if (!walk_list)
+		return;
+	if (!inode)
+		goto free_list;
+	if (WARN_ON(walk_list->task != current))
+		goto free_list;
+	tsec = current_security();
+	if (WARN_ON(!tsec))
+		goto free_list;
+	inode = igrab(inode);
+	if (!inode)
+		goto free_list;
+	walk_list->lookup_type = lookup->type;
+	walk_list->last_inode = inode;
+	walk_list->next = tsec->walk_list;
+	tsec->walk_list = walk_list;
+	return;
+
+free_list:
+	landlock_free_walk_list(walk_list);
+}
+
+struct landlock_hook_ctx_fs_walk {
+	struct landlock_walk_state *state;
+	struct landlock_ctx_fs_walk prog_ctx;
+};
+
+/* set cookie and chain */
+struct landlock_ctx_fs_walk *landlock_update_ctx_fs_walk(
+		struct landlock_hook_ctx_fs_walk *hook_ctx,
+		const struct landlock_chain *chain)
+{
+	if (WARN_ON(!hook_ctx))
+		return NULL;
+	if (WARN_ON(!hook_ctx->state))
+		return NULL;
+	/* cookie initially contains zero */
+	hook_ctx->prog_ctx.cookie = hook_ctx->state[chain->index].cookie;
+	hook_ctx->prog_ctx.chain = (uintptr_t)chain;
+	return &hook_ctx->prog_ctx;
+}
+
+/* save cookie */
+int landlock_save_ctx_fs_walk(struct landlock_hook_ctx_fs_walk *hook_ctx,
+		struct landlock_chain *chain)
+{
+	if (WARN_ON(!hook_ctx))
+		return 1;
+	if (WARN_ON(!hook_ctx->state))
+		return 1;
+	hook_ctx->state[chain->index].cookie = hook_ctx->prog_ctx.cookie;
+	return 0;
+}
+
+static int decide_fs_walk(int may_mask, struct inode *inode)
+{
+	struct landlock_walk_list *walk_list;
+	struct landlock_hook_ctx_fs_walk fs_walk = {};
+	struct landlock_hook_ctx hook_ctx = {
+		.fs_walk = &fs_walk,
+	};
+	const enum landlock_hook_type hook_type = LANDLOCK_HOOK_FS_WALK;
+
+	if (!current_has_prog_type(hook_type))
+		/* no fs_walk */
+		return 0;
+	if (WARN_ON(!inode))
+		return -EFAULT;
+	walk_list = get_current_walk_list(inode);
+	if (IS_ERR_OR_NULL(walk_list))
+		/* error or no fs_walk */
+		return PTR_ERR(walk_list);
+
+	fs_walk.state = walk_list->state;
+	/* init common data: inode, inode_lookup */
+	fs_walk.prog_ctx.inode = (uintptr_t)inode;
+	fs_walk.prog_ctx.inode_lookup =
+		translate_lookup(walk_list->lookup_type);
+	return landlock_decide(hook_type, &hook_ctx, 0);
+}
+
+/* fs_pick */
+
+struct landlock_hook_ctx_fs_pick {
+	__u64 triggers;
+	struct landlock_walk_state *state;
+	struct landlock_ctx_fs_pick prog_ctx;
+};
+
+/* set cookie and chain */
+struct landlock_ctx_fs_pick *landlock_update_ctx_fs_pick(
+		struct landlock_hook_ctx_fs_pick *hook_ctx,
+		const struct landlock_chain *chain)
+{
+	if (WARN_ON(!hook_ctx))
+		return NULL;
+	if (WARN_ON(!hook_ctx->state))
+		return NULL;
+	/* cookie initially contains zero */
+	hook_ctx->prog_ctx.cookie = hook_ctx->state[chain->index].cookie;
+	hook_ctx->prog_ctx.chain = (uintptr_t)chain;
+	return &hook_ctx->prog_ctx;
+}
+
+/* save cookie */
+int landlock_save_ctx_fs_pick(struct landlock_hook_ctx_fs_pick *hook_ctx,
+		struct landlock_chain *chain)
+{
+	if (WARN_ON(!hook_ctx))
+		return 1;
+	if (WARN_ON(!hook_ctx->state))
+		return 1;
+	hook_ctx->state[chain->index].cookie = hook_ctx->prog_ctx.cookie;
+	return 0;
+}
+
+static int decide_fs_pick(__u64 triggers, struct inode *inode)
+{
+	struct landlock_walk_list *walk_list;
+	struct landlock_hook_ctx_fs_pick fs_pick = {};
+	struct landlock_hook_ctx hook_ctx = {
+		.fs_pick = &fs_pick,
+	};
+	const enum landlock_hook_type hook_type = LANDLOCK_HOOK_FS_PICK;
+
+	if (WARN_ON(!triggers))
+		return 0;
+	if (!current_has_prog_type(hook_type))
+		/* no fs_pick */
+		return 0;
+	if (WARN_ON(!inode))
+		return -EFAULT;
+	/* first, try to get the current walk (e.g. open(2)) */
+	walk_list = get_current_walk_list(inode);
+	if (!walk_list || PTR_ERR(walk_list) == -ENOENT) {
+		/* otherwise, the path walk may have ended (e.g. access(2)) */
+		walk_list = get_saved_walk_list(inode);
+		if (IS_ERR(walk_list))
+			return PTR_ERR(walk_list);
+		if (WARN_ON(!walk_list))
+			return -EFAULT;
+	}
+	if (IS_ERR(walk_list))
+		return PTR_ERR(walk_list);
+
+	fs_pick.state = walk_list->state;
+	fs_pick.triggers = triggers;
+	/* init common data: inode */
+	fs_pick.prog_ctx.inode = (uintptr_t)inode;
+	fs_pick.prog_ctx.inode_lookup =
+		translate_lookup(walk_list->lookup_type);
+	return landlock_decide(hook_type, &hook_ctx, fs_pick.triggers);
+}
+
+/* fs_get */
+
+struct landlock_hook_ctx_fs_get {
+	struct landlock_walk_state *state;
+	struct landlock_ctx_fs_get prog_ctx;
+};
+
+/* set cookie and chain */
+struct landlock_ctx_fs_get *landlock_update_ctx_fs_get(
+		struct landlock_hook_ctx_fs_get *hook_ctx,
+		const struct landlock_chain *chain)
+{
+	if (WARN_ON(!hook_ctx))
+		return NULL;
+	if (WARN_ON(!hook_ctx->state))
+		return NULL;
+	hook_ctx->prog_ctx.cookie = hook_ctx->state[chain->index].cookie;
+	hook_ctx->prog_ctx.chain = (uintptr_t)chain;
+	return &hook_ctx->prog_ctx;
+}
+
+static int decide_fs_get(struct inode *inode,
+		struct landlock_tag_ref **tag_ref)
+{
+	struct landlock_walk_list *walk_list;
+	struct landlock_hook_ctx_fs_get fs_get = {};
+	struct landlock_hook_ctx hook_ctx = {
+		.fs_get = &fs_get,
+	};
+	struct landlock_tag_object tag_obj = {
+		.lock = &inode->i_lock,
+		.root = (struct landlock_tag_root **)&inode->i_security,
+		.ref = tag_ref,
+	};
+	const enum landlock_hook_type hook_type = LANDLOCK_HOOK_FS_GET;
+
+	if (!current_has_prog_type(hook_type))
+		/* no fs_get */
+		return 0;
+	if (WARN_ON(!inode))
+		return -EFAULT;
+	walk_list = get_saved_walk_list(inode);
+	if (IS_ERR(walk_list))
+		return PTR_ERR(walk_list);
+	if (WARN_ON(!walk_list))
+		return -EFAULT;
+	fs_get.state = walk_list->state;
+	/* init common data: tag_obj */
+	fs_get.prog_ctx.tag_object = (uintptr_t)&tag_obj;
+	return landlock_decide(hook_type, &hook_ctx, 0);
+}
+
+/* helpers */
+
+static u64 fs_may_to_triggers(int may_mask, umode_t mode)
+{
+	u64 ret = 0;
+
+	if (may_mask & MAY_EXEC)
+		ret |= LANDLOCK_TRIGGER_FS_PICK_EXECUTE;
+	if (may_mask & MAY_READ) {
+		if (S_ISDIR(mode))
+			ret |= LANDLOCK_TRIGGER_FS_PICK_READDIR;
+		else
+			ret |= LANDLOCK_TRIGGER_FS_PICK_READ;
+	}
+	if (may_mask & MAY_WRITE)
+		ret |= LANDLOCK_TRIGGER_FS_PICK_WRITE;
+	if (may_mask & MAY_APPEND)
+		ret |= LANDLOCK_TRIGGER_FS_PICK_APPEND;
+	/* do not (re-)run fs_pick in hook_file_open() */
+	if (may_mask & MAY_OPEN)
+		ret |= LANDLOCK_TRIGGER_FS_PICK_OPEN;
+	if (may_mask & MAY_CHROOT)
+		ret |= LANDLOCK_TRIGGER_FS_PICK_CHROOT;
+	else if (may_mask & MAY_CHDIR)
+		ret |= LANDLOCK_TRIGGER_FS_PICK_CHDIR;
+	/* XXX: ignore MAY_ACCESS */
+	WARN_ON(!ret);
+	return ret;
+}
+
+static inline u64 mem_prot_to_triggers(unsigned long prot, bool private)
+{
+	u64 ret = LANDLOCK_TRIGGER_FS_PICK_MAP;
+
+	/* private mappings do not write to files */
+	if (!private && (prot & PROT_WRITE))
+		ret |= LANDLOCK_TRIGGER_FS_PICK_WRITE;
+	if (prot & PROT_READ)
+		ret |= LANDLOCK_TRIGGER_FS_PICK_READ;
+	if (prot & PROT_EXEC)
+		ret |= LANDLOCK_TRIGGER_FS_PICK_EXECUTE;
+	WARN_ON(!ret);
+	return ret;
+}
+
+/* binder hooks */
+
+static int hook_binder_transfer_file(struct task_struct *from,
+		struct task_struct *to, struct file *file)
+{
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!file))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_TRANSFER,
+			file_inode(file));
+}
+
+/* sb hooks */
+
+static int hook_sb_statfs(struct dentry *dentry)
+{
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!dentry))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_GETATTR,
+			dentry->d_inode);
+}
+
+/* TODO: handle mount source and remount */
+static int hook_sb_mount(const char *dev_name, const struct path *path,
+		const char *type, unsigned long flags, void *data)
+{
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!path))
+		return 0;
+	if (WARN_ON(!path->dentry))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_MOUNTON,
+			path->dentry->d_inode);
+}
+
+/*
+ * The @old_path is similar to a destination mount point.
+ */
+static int hook_sb_pivotroot(const struct path *old_path,
+		const struct path *new_path)
+{
+	int err;
+	struct landlock_task_security *tsec;
+
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!old_path))
+		return 0;
+	if (WARN_ON(!old_path->dentry))
+		return 0;
+	err = decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_MOUNTON,
+			old_path->dentry->d_inode);
+	if (err)
+		return err;
+	err = decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_CHROOT,
+			new_path->dentry->d_inode);
+	if (err)
+		return err;
+
+	/* handle root directory tag */
+	tsec = current_security();
+	if (!tsec->root) {
+		struct landlock_tag_fs *new_tag_fs;
+
+		new_tag_fs = landlock_new_tag_fs(new_path->dentry->d_inode);
+		if (IS_ERR(new_tag_fs))
+			return PTR_ERR(new_tag_fs);
+		tsec->root = new_tag_fs;
+	} else {
+		landlock_reset_tag_fs(tsec->root, new_path->dentry->d_inode);
+	}
+	return decide_fs_get(tsec->root->inode, &tsec->root->ref);
+}
+
+/* inode hooks */
+
+/* a directory inode contains only one dentry */
+static int hook_inode_create(struct inode *dir, struct dentry *dentry,
+		umode_t mode)
+{
+	if (!landlocked(current))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_CREATE, dir);
+}
+
+static int hook_inode_link(struct dentry *old_dentry, struct inode *dir,
+		struct dentry *new_dentry)
+{
+	if (!landlocked(current))
+		return 0;
+	if (!WARN_ON(!old_dentry)) {
+		int ret = decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_LINK,
+				old_dentry->d_inode);
+		if (ret)
+			return ret;
+	}
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_LINKTO, dir);
+}
+
+static int hook_inode_unlink(struct inode *dir, struct dentry *dentry)
+{
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!dentry))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_UNLINK,
+			dentry->d_inode);
+}
+
+static int hook_inode_symlink(struct inode *dir, struct dentry *dentry,
+		const char *old_name)
+{
+	if (!landlocked(current))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_CREATE, dir);
+}
+
+static int hook_inode_mkdir(struct inode *dir, struct dentry *dentry,
+		umode_t mode)
+{
+	if (!landlocked(current))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_CREATE, dir);
+}
+
+static int hook_inode_rmdir(struct inode *dir, struct dentry *dentry)
+{
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!dentry))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_RMDIR, dentry->d_inode);
+}
+
+static int hook_inode_mknod(struct inode *dir, struct dentry *dentry,
+		umode_t mode, dev_t dev)
+{
+	if (!landlocked(current))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_CREATE, dir);
+}
+
+static int hook_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
+		struct inode *new_dir, struct dentry *new_dentry)
+{
+	if (!landlocked(current))
+		return 0;
+	/* TODO: add artificial walk session from old_dir to old_dentry */
+	if (!WARN_ON(!old_dentry)) {
+		int ret = decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_RENAME,
+				old_dentry->d_inode);
+		if (ret)
+			return ret;
+	}
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_RENAMETO, new_dir);
+}
+
+static int hook_inode_readlink(struct dentry *dentry)
+{
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!dentry))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_READ, dentry->d_inode);
+}
+
+/* ignore the inode_follow_link hook (could set is_symlink in the fs_walk
+ * context) */
+
+static int hook_inode_permission(struct inode *inode, int mask)
+{
+	int err;
+	u64 triggers;
+	struct landlock_tag_fs **tag_fs;
+	struct landlock_task_security *tsec;
+
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!inode))
+		return 0;
+
+	triggers = fs_may_to_triggers(mask, inode->i_mode);
+	/* decide_fs_walk() is exclusive with decide_fs_pick(): in a path walk,
+	 * ignore execute-only access on directory for any fs_pick program. */
+	if (triggers == LANDLOCK_TRIGGER_FS_PICK_EXECUTE &&
+			S_ISDIR(inode->i_mode))
+		return decide_fs_walk(mask, inode);
+
+	err = decide_fs_pick(triggers, inode);
+	if (err)
+		return err;
+
+	/* handle current working directory and root directory tags */
+	tsec = current_security();
+	if (triggers & LANDLOCK_TRIGGER_FS_PICK_CHDIR)
+		tag_fs = &tsec->cwd;
+	else if (triggers & LANDLOCK_TRIGGER_FS_PICK_CHROOT)
+		tag_fs = &tsec->root;
+	else
+		return 0;
+	if (!*tag_fs) {
+		struct landlock_tag_fs *new_tag_fs;
+
+		new_tag_fs = landlock_new_tag_fs(inode);
+		if (IS_ERR(new_tag_fs))
+			return PTR_ERR(new_tag_fs);
+		*tag_fs = new_tag_fs;
+	} else {
+		landlock_reset_tag_fs(*tag_fs, inode);
+	}
+	return decide_fs_get((*tag_fs)->inode, &(*tag_fs)->ref);
+}
+
+static int hook_inode_setattr(struct dentry *dentry, struct iattr *attr)
+{
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!dentry))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_SETATTR,
+			dentry->d_inode);
+}
+
+static int hook_inode_getattr(const struct path *path)
+{
+	/* TODO: link parent inode and path */
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!path))
+		return 0;
+	if (WARN_ON(!path->dentry))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_GETATTR,
+			path->dentry->d_inode);
+}
+
+static int hook_inode_setxattr(struct dentry *dentry, const char *name,
+		const void *value, size_t size, int flags)
+{
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!dentry))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_SETATTR,
+			dentry->d_inode);
+}
+
+static int hook_inode_getxattr(struct dentry *dentry, const char *name)
+{
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!dentry))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_GETATTR,
+			dentry->d_inode);
+}
+
+static int hook_inode_listxattr(struct dentry *dentry)
+{
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!dentry))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_GETATTR,
+			dentry->d_inode);
+}
+
+static int hook_inode_removexattr(struct dentry *dentry, const char *name)
+{
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!dentry))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_SETATTR,
+			dentry->d_inode);
+}
+
+static int hook_inode_getsecurity(struct inode *inode, const char *name,
+		void **buffer, bool alloc)
+{
+	if (!landlocked(current))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_GETATTR, inode);
+}
+
+static int hook_inode_setsecurity(struct inode *inode, const char *name,
+		const void *value, size_t size, int flag)
+{
+	if (!landlocked(current))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_SETATTR, inode);
+}
+
+static int hook_inode_listsecurity(struct inode *inode, char *buffer,
+		size_t buffer_size)
+{
+	if (!landlocked(current))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_GETATTR, inode);
+}
+
+/* file hooks */
+
+static int hook_file_ioctl(struct file *file, unsigned int cmd,
+		unsigned long arg)
+{
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!file))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_IOCTL,
+			file_inode(file));
+}
+
+static int hook_file_lock(struct file *file, unsigned int cmd)
+{
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!file))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_LOCK, file_inode(file));
+}
+
+static int hook_file_fcntl(struct file *file, unsigned int cmd,
+		unsigned long arg)
+{
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!file))
+		return 0;
+	return decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_FCNTL,
+			file_inode(file));
+}
+
+static int hook_mmap_file(struct file *file, unsigned long reqprot,
+		unsigned long prot, unsigned long flags)
+{
+	if (!landlocked(current))
+		return 0;
+	/* file can be null for anonymous mmap */
+	if (!file)
+		return 0;
+	return decide_fs_pick(mem_prot_to_triggers(prot, flags & MAP_PRIVATE),
+			file_inode(file));
+}
+
+static int hook_file_mprotect(struct vm_area_struct *vma,
+		unsigned long reqprot, unsigned long prot)
+{
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!vma))
+		return 0;
+	if (!vma->vm_file)
+		return 0;
+	return decide_fs_pick(mem_prot_to_triggers(prot,
+				!(vma->vm_flags & VM_SHARED)),
+			file_inode(vma->vm_file));
+}
+
+static int hook_file_receive(struct file *file)
+{
+	int err;
+
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!file))
+		return 0;
+	err = decide_fs_pick(LANDLOCK_TRIGGER_FS_PICK_RECEIVE,
+			file_inode(file));
+	if (err)
+		return err;
+
+	return decide_fs_get(file_inode(file),
+			(struct landlock_tag_ref **)&file->f_security);
+}
+
+static int hook_file_open(struct file *file, const struct cred *cred)
+{
+	if (!landlocked(current))
+		return 0;
+	if (WARN_ON(!file))
+		return 0;
+	/* do not re-run fs_pick/LANDLOCK_TRIGGER_FS_PICK_OPEN here for now */
+	return decide_fs_get(file_inode(file),
+			(struct landlock_tag_ref **)&file->f_security);
+}
+
+static void hook_inode_free_security(struct inode *inode)
+{
+	if (!landlocked(current))
+		return;
+	WARN_ON(inode->i_security);
+}
+
+static void hook_file_free_security(struct file *file)
+{
+	if (!landlocked(current))
+		return;
+	/* free inode tags */
+	if (!file_inode(file))
+		return;
+	landlock_free_tag_ref(file->f_security, (struct landlock_tag_root **)
+			&file_inode(file)->i_security,
+			&file_inode(file)->i_lock);
+}
+
+static struct security_hook_list landlock_hooks[] = {
+	LSM_HOOK_INIT(binder_transfer_file, hook_binder_transfer_file),
+
+	LSM_HOOK_INIT(sb_statfs, hook_sb_statfs),
+	LSM_HOOK_INIT(sb_mount, hook_sb_mount),
+	LSM_HOOK_INIT(sb_pivotroot, hook_sb_pivotroot),
+
+	LSM_HOOK_INIT(inode_create, hook_inode_create),
+	LSM_HOOK_INIT(inode_link, hook_inode_link),
+	LSM_HOOK_INIT(inode_unlink, hook_inode_unlink),
+	LSM_HOOK_INIT(inode_symlink, hook_inode_symlink),
+	LSM_HOOK_INIT(inode_mkdir, hook_inode_mkdir),
+	LSM_HOOK_INIT(inode_rmdir, hook_inode_rmdir),
+	LSM_HOOK_INIT(inode_mknod, hook_inode_mknod),
+	LSM_HOOK_INIT(inode_rename, hook_inode_rename),
+	LSM_HOOK_INIT(inode_readlink, hook_inode_readlink),
+	LSM_HOOK_INIT(inode_permission, hook_inode_permission),
+	LSM_HOOK_INIT(inode_setattr, hook_inode_setattr),
+	LSM_HOOK_INIT(inode_getattr, hook_inode_getattr),
+	LSM_HOOK_INIT(inode_setxattr, hook_inode_setxattr),
+	LSM_HOOK_INIT(inode_getxattr, hook_inode_getxattr),
+	LSM_HOOK_INIT(inode_listxattr, hook_inode_listxattr),
+	LSM_HOOK_INIT(inode_removexattr, hook_inode_removexattr),
+	LSM_HOOK_INIT(inode_getsecurity, hook_inode_getsecurity),
+	LSM_HOOK_INIT(inode_setsecurity, hook_inode_setsecurity),
+	LSM_HOOK_INIT(inode_listsecurity, hook_inode_listsecurity),
+	LSM_HOOK_INIT(nameidata_put_lookup, hook_nameidata_put_lookup),
+
+	/* do not handle file_permission for now */
+	LSM_HOOK_INIT(inode_free_security, hook_inode_free_security),
+	LSM_HOOK_INIT(file_free_security, hook_file_free_security),
+	LSM_HOOK_INIT(file_ioctl, hook_file_ioctl),
+	LSM_HOOK_INIT(file_lock, hook_file_lock),
+	LSM_HOOK_INIT(file_fcntl, hook_file_fcntl),
+	LSM_HOOK_INIT(mmap_file, hook_mmap_file),
+	LSM_HOOK_INIT(file_mprotect, hook_file_mprotect),
+	LSM_HOOK_INIT(file_receive, hook_file_receive),
+	LSM_HOOK_INIT(file_open, hook_file_open),
+};
+
+__init void landlock_add_hooks_fs(void)
+{
+	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
+			LANDLOCK_NAME);
+}
diff --git a/security/landlock/hooks_fs.h b/security/landlock/hooks_fs.h
new file mode 100644
index 000000000000..71cd2e7c47d4
--- /dev/null
+++ b/security/landlock/hooks_fs.h
@@ -0,0 +1,60 @@
+/*
+ * Landlock LSM - filesystem hooks
+ *
+ * Copyright © 2017-2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/bpf.h> /* enum bpf_access_type */
+
+#include "common.h" /* struct landlock_chain */
+
+/* needed for struct landlock_task_security */
+struct landlock_walk_list;
+
+void landlock_free_walk_list(struct landlock_walk_list *freeme);
+
+__init void landlock_add_hooks_fs(void);
+
+/* fs_pick */
+
+struct landlock_hook_ctx_fs_pick;
+
+bool landlock_is_valid_access_fs_pick(int off, enum bpf_access_type type,
+		enum bpf_reg_type *reg_type, int *max_size);
+
+struct landlock_ctx_fs_pick *landlock_update_ctx_fs_pick(
+		struct landlock_hook_ctx_fs_pick *hook_ctx,
+		const struct landlock_chain *chain);
+
+int landlock_save_ctx_fs_pick(struct landlock_hook_ctx_fs_pick *hook_ctx,
+		struct landlock_chain *chain);
+
+/* fs_walk */
+
+struct landlock_hook_ctx_fs_walk;
+
+bool landlock_is_valid_access_fs_walk(int off, enum bpf_access_type type,
+		enum bpf_reg_type *reg_type, int *max_size);
+
+struct landlock_ctx_fs_walk *landlock_update_ctx_fs_walk(
+		struct landlock_hook_ctx_fs_walk *hook_ctx,
+		const struct landlock_chain *chain);
+
+int landlock_save_ctx_fs_walk(struct landlock_hook_ctx_fs_walk *hook_ctx,
+		struct landlock_chain *chain);
+
+/* fs_get */
+
+struct landlock_hook_ctx_fs_get;
+
+bool landlock_is_valid_access_fs_get(int off, enum bpf_access_type type,
+		enum bpf_reg_type *reg_type, int *max_size);
+
+struct landlock_ctx_fs_get *landlock_update_ctx_fs_get(
+		struct landlock_hook_ctx_fs_get *hook_ctx,
+		const struct landlock_chain *chain);
diff --git a/security/landlock/init.c b/security/landlock/init.c
index ef2ee0742c53..3486272d17b2 100644
--- a/security/landlock/init.c
+++ b/security/landlock/init.c
@@ -12,8 +12,11 @@
 #include <linux/bpf.h> /* enum bpf_access_type */
 #include <linux/capability.h> /* capable */
 #include <linux/filter.h> /* struct bpf_prog */
+#include <linux/lsm_hooks.h>
 
 #include "common.h" /* LANDLOCK_* */
+#include "hooks_fs.h"
+#include "hooks_cred.h"
 
 static bool bpf_landlock_is_valid_access(int off, int size,
 		enum bpf_access_type type, struct bpf_insn_access_aux *info,
@@ -32,6 +35,28 @@ static bool bpf_landlock_is_valid_access(int off, int size,
 	if (size <= 0 || size > sizeof(__u64))
 		return false;
 
+	/* set register type and max size */
+	switch (prog_subtype->landlock_hook.type) {
+	case LANDLOCK_HOOK_FS_PICK:
+		if (!landlock_is_valid_access_fs_pick(off, type, &reg_type,
+					&max_size))
+			return false;
+		break;
+	case LANDLOCK_HOOK_FS_WALK:
+		if (!landlock_is_valid_access_fs_walk(off, type, &reg_type,
+					&max_size))
+			return false;
+		break;
+	case LANDLOCK_HOOK_FS_GET:
+		if (!landlock_is_valid_access_fs_get(off, type, &reg_type,
+					&max_size))
+			return false;
+		break;
+	default:
+		WARN_ON(1);
+		return false;
+	}
+
 	/* check memory range access */
 	switch (reg_type) {
 	case NOT_INIT:
@@ -158,6 +183,30 @@ static const struct bpf_func_proto *bpf_landlock_func_proto(
 	default:
 		break;
 	}
+
+	switch (hook_type) {
+	case LANDLOCK_HOOK_FS_WALK:
+	case LANDLOCK_HOOK_FS_PICK:
+		switch (func_id) {
+		case BPF_FUNC_inode_map_lookup:
+			return &bpf_inode_map_lookup_proto;
+		case BPF_FUNC_inode_get_tag:
+			return &bpf_inode_get_tag_proto;
+		default:
+			break;
+		}
+		break;
+	case LANDLOCK_HOOK_FS_GET:
+		switch (func_id) {
+		case BPF_FUNC_inode_get_tag:
+			return &bpf_inode_get_tag_proto;
+		case BPF_FUNC_landlock_set_tag:
+			return &bpf_landlock_set_tag_proto;
+		default:
+			break;
+		}
+		break;
+	}
 	return NULL;
 }
 
@@ -178,3 +227,10 @@ const struct bpf_verifier_ops landlock_verifier_ops = {
 const struct bpf_prog_ops landlock_prog_ops = {
 	.put_extra = bpf_landlock_put_extra,
 };
+
+void __init landlock_add_hooks(void)
+{
+	pr_info(LANDLOCK_NAME ": Ready to sandbox with seccomp\n");
+	landlock_add_hooks_cred();
+	landlock_add_hooks_fs();
+}
diff --git a/security/landlock/task.c b/security/landlock/task.c
new file mode 100644
index 000000000000..8932570d3314
--- /dev/null
+++ b/security/landlock/task.c
@@ -0,0 +1,34 @@
+/*
+ * Landlock LSM - task helpers
+ *
+ * Copyright © 2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/slab.h>
+#include <linux/types.h> /* gfp_t */
+
+#include "hooks_fs.h" /* landlock_free_walk_list() */
+#include "tag_fs.h"
+#include "task.h"
+
+/* TODO: inherit tsec->root and tsec->cwd on fork/execve */
+
+void landlock_free_task_security(struct landlock_task_security *tsec)
+{
+	if (!tsec)
+		return;
+	landlock_free_walk_list(tsec->walk_list);
+	landlock_free_tag_fs(tsec->root);
+	landlock_free_tag_fs(tsec->cwd);
+	kfree(tsec);
+}
+
+struct landlock_task_security *landlock_new_task_security(gfp_t gfp)
+{
+	return kzalloc(sizeof(struct landlock_task_security), gfp);
+}
diff --git a/security/landlock/task.h b/security/landlock/task.h
new file mode 100644
index 000000000000..31e640a6a4cb
--- /dev/null
+++ b/security/landlock/task.h
@@ -0,0 +1,29 @@
+/*
+ * Landlock LSM - task headers
+ *
+ * Copyright © 2018 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018 ANSSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _SECURITY_LANDLOCK_TASK_H
+#define _SECURITY_LANDLOCK_TASK_H
+
+#include <linux/types.h> /* gfp_t */
+
+#include "hooks_fs.h"
+#include "tag_fs.h"
+
+/* exclusively used by the current task (i.e. no concurrent access) */
+struct landlock_task_security {
+	struct landlock_walk_list *walk_list;
+	struct landlock_tag_fs *root, *cwd;
+};
+
+struct landlock_task_security *landlock_new_task_security(gfp_t gfp);
+void landlock_free_task_security(struct landlock_task_security *tsec);
+
+#endif /* _SECURITY_LANDLOCK_TASK_H */
diff --git a/security/security.c b/security/security.c
index 17053c7a1a77..5000b64a5363 100644
--- a/security/security.c
+++ b/security/security.c
@@ -76,10 +76,20 @@ int __init security_init(void)
 	loadpin_add_hooks();
 
 	/*
-	 * Load all the remaining security modules.
+	 * Load all remaining privileged security modules.
 	 */
 	do_security_initcalls();
 
+	/*
+	 * Load potentially-unprivileged security modules at the end.
+	 *
+	 * For unprivileged access control, we don't want to let any
+	 * process run checks (e.g. through an eBPF program) on kernel
+	 * objects (e.g. files) if a privileged security policy forbids
+	 * access to them.
+	 */
+	landlock_add_hooks();
+
 	return 0;
 }
 
-- 
2.16.2

* [PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions
  2018-02-27  0:41 [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing Mickaël Salaün
                   ` (6 preceding siblings ...)
  2018-02-27  0:41 ` [PATCH bpf-next v8 07/11] landlock: Handle filesystem access control Mickaël Salaün
@ 2018-02-27  0:41 ` Mickaël Salaün
  2018-02-27  4:17   ` Andy Lutomirski
  2018-02-27  0:41 ` [PATCH bpf-next v8 09/11] bpf: Add a Landlock sandbox example Mickaël Salaün
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 55+ messages in thread
From: Mickaël Salaün @ 2018-02-27  0:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev

A landlocked process has fewer privileges than a non-landlocked process
and must therefore be subject to additional restrictions when
manipulating processes. To be allowed to use ptrace(2) and related
syscalls on a target process, a landlocked process must have a subset of
the target process' rules.
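
As a rough illustration of the rule above, here is a minimal userspace
sketch, not the kernel code. It assumes that each task keeps, per hook
type, a pointer to its newest program, and that each program links back
to the one enforced before it (prev). The names struct prog_node, struct
prog_set and NB_HOOKS are hypothetical stand-ins used only for this
example. A tracer is allowed when, for every hook where it has programs,
its newest program appears somewhere in the tracee's chain:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a chained Landlock program. */
struct prog_node {
	const char *name;
	struct prog_node *prev;	/* program enforced before this one */
};

#define NB_HOOKS 3	/* assumed: one slot each for FS_WALK, FS_PICK, FS_GET */

/* Hypothetical stand-in for a task's set of programs. */
struct prog_set {
	struct prog_node *programs[NB_HOOKS];	/* newest program per hook */
};

/*
 * Return true if, for every hook where @tracer has programs, the newest
 * program of @tracer is found by walking back through @tracee's chain.
 */
bool progs_are_subset(const struct prog_set *tracer,
		const struct prog_set *tracee)
{
	size_t i;

	if (!tracer || !tracee)
		return false;
	if (tracer == tracee)
		return true;

	for (i = 0; i < NB_HOOKS; i++) {
		const struct prog_node *walker;
		bool found = false;

		/* no program on this hook: nothing to compare */
		if (!tracer->programs[i])
			continue;
		for (walker = tracee->programs[i]; walker;
				walker = walker->prev) {
			if (walker == tracer->programs[i]) {
				found = true;
				break;
			}
		}
		if (!found)
			return false;
	}
	return true;
}
```

With a parent holding only a "base" program on one hook and a child that
stacked an "extra" program on top of it, progs_are_subset(&parent,
&child) holds while the reverse does not, so the child cannot trace its
less restricted parent.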

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v6:
* factor out ptrace check
* constify pointers
* cleanup headers
* use the new security_add_hooks()
---
 security/landlock/Makefile       |   2 +-
 security/landlock/hooks_ptrace.c | 124 +++++++++++++++++++++++++++++++++++++++
 security/landlock/hooks_ptrace.h |  11 ++++
 security/landlock/init.c         |   2 +
 4 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 security/landlock/hooks_ptrace.c
 create mode 100644 security/landlock/hooks_ptrace.h

diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index d0f532a93b4e..605504d852d3 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -3,4 +3,4 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 landlock-y := init.o chain.o task.o \
 	tag.o tag_fs.o \
 	enforce.o enforce_seccomp.o \
-	hooks.o hooks_cred.o hooks_fs.o
+	hooks.o hooks_cred.o hooks_fs.o hooks_ptrace.o
diff --git a/security/landlock/hooks_ptrace.c b/security/landlock/hooks_ptrace.c
new file mode 100644
index 000000000000..f1b977b9c808
--- /dev/null
+++ b/security/landlock/hooks_ptrace.c
@@ -0,0 +1,124 @@
+/*
+ * Landlock LSM - ptrace hooks
+ *
+ * Copyright © 2017 Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/current.h>
+#include <linux/errno.h>
+#include <linux/kernel.h> /* ARRAY_SIZE */
+#include <linux/lsm_hooks.h>
+#include <linux/sched.h> /* struct task_struct */
+#include <linux/seccomp.h>
+
+#include "common.h" /* struct landlock_prog_set */
+#include "hooks.h" /* landlocked() */
+#include "hooks_ptrace.h"
+
+static bool progs_are_subset(const struct landlock_prog_set *parent,
+		const struct landlock_prog_set *child)
+{
+	size_t i;
+
+	if (!parent || !child)
+		return false;
+	if (parent == child)
+		return true;
+
+	for (i = 0; i < ARRAY_SIZE(child->programs); i++) {
+		struct landlock_prog_list *walker;
+		bool found_parent = false;
+
+		if (!parent->programs[i])
+			continue;
+		for (walker = child->programs[i]; walker;
+				walker = walker->prev) {
+			if (walker == parent->programs[i]) {
+				found_parent = true;
+				break;
+			}
+		}
+		if (!found_parent)
+			return false;
+	}
+	return true;
+}
+
+static bool task_has_subset_progs(const struct task_struct *parent,
+		const struct task_struct *child)
+{
+#ifdef CONFIG_SECCOMP_FILTER
+	if (progs_are_subset(parent->seccomp.landlock_prog_set,
+				child->seccomp.landlock_prog_set))
+		/* must be ANDed with other providers (i.e. cgroup) */
+		return true;
+#endif /* CONFIG_SECCOMP_FILTER */
+	return false;
+}
+
+static int task_ptrace(const struct task_struct *parent,
+		const struct task_struct *child)
+{
+	if (!landlocked(parent))
+		return 0;
+
+	if (!landlocked(child))
+		return -EPERM;
+
+	if (task_has_subset_progs(parent, child))
+		return 0;
+
+	return -EPERM;
+}
+
+/**
+ * hook_ptrace_access_check - determine whether the current process may access
+ *			      another
+ *
+ * @child: the process to be accessed
+ * @mode: the mode of attachment
+ *
+ * If the current task has Landlock programs, then the child must have at least
+ * the same programs.  Otherwise, access is denied.
+ *
+ * Determine whether a process may access another, returning 0 if permission
+ * granted, -errno if denied.
+ */
+static int hook_ptrace_access_check(struct task_struct *child,
+		unsigned int mode)
+{
+	return task_ptrace(current, child);
+}
+
+/**
+ * hook_ptrace_traceme - determine whether another process may trace the
+ *			 current one
+ *
+ * @parent: the task proposed to be the tracer
+ *
+ * If the parent has Landlock programs, then the current task must have the
+ * same or more programs.
+ * Otherwise, tracing is denied.
+ *
+ * Determine whether the nominated task is permitted to trace the current
+ * process, returning 0 if permission is granted, -errno if denied.
+ */
+static int hook_ptrace_traceme(struct task_struct *parent)
+{
+	return task_ptrace(parent, current);
+}
+
+static struct security_hook_list landlock_hooks[] = {
+	LSM_HOOK_INIT(ptrace_access_check, hook_ptrace_access_check),
+	LSM_HOOK_INIT(ptrace_traceme, hook_ptrace_traceme),
+};
+
+__init void landlock_add_hooks_ptrace(void)
+{
+	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
+			LANDLOCK_NAME);
+}
diff --git a/security/landlock/hooks_ptrace.h b/security/landlock/hooks_ptrace.h
new file mode 100644
index 000000000000..15b1f3479e0e
--- /dev/null
+++ b/security/landlock/hooks_ptrace.h
@@ -0,0 +1,11 @@
+/*
+ * Landlock LSM - ptrace hooks
+ *
+ * Copyright © 2017 Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+__init void landlock_add_hooks_ptrace(void);
diff --git a/security/landlock/init.c b/security/landlock/init.c
index 3486272d17b2..0f16848f5ad1 100644
--- a/security/landlock/init.c
+++ b/security/landlock/init.c
@@ -17,6 +17,7 @@
 #include "common.h" /* LANDLOCK_* */
 #include "hooks_fs.h"
 #include "hooks_cred.h"
+#include "hooks_ptrace.h"
 
 static bool bpf_landlock_is_valid_access(int off, int size,
 		enum bpf_access_type type, struct bpf_insn_access_aux *info,
@@ -232,5 +233,6 @@ void __init landlock_add_hooks(void)
 {
 	pr_info(LANDLOCK_NAME ": Ready to sandbox with seccomp\n");
 	landlock_add_hooks_cred();
+	landlock_add_hooks_ptrace();
 	landlock_add_hooks_fs();
 }
-- 
2.16.2
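
Note on the ptrace hooks in this patch: the order of checks in task_ptrace()
can be modelled in userspace as below. This is only a sketch under stated
assumptions. progs_are_subset() is defined elsewhere in the series and is not
shown in this message, so the model assumes (hypothetically) that a program
set is a singly linked list with the newest program first, and that "subset"
means the tracer's list is a tail of the tracee's list. struct prog_node,
is_tail_of(), model_ptrace() and demo_ptrace_model() are illustrative names,
not kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one node per Landlock program, newest first. */
struct prog_node {
	int id;
	const struct prog_node *prev;	/* older program, or NULL */
};

/* Assumption: "subset" means parent's list is a tail of child's list. */
bool is_tail_of(const struct prog_node *parent, const struct prog_node *child)
{
	for (; child; child = child->prev) {
		if (child == parent)
			return true;
	}
	return false;
}

/* Same order of checks as task_ptrace(); NULL models "not landlocked". */
int model_ptrace(const struct prog_node *parent, const struct prog_node *child)
{
	if (!parent)
		return 0;	/* unsandboxed tracer: nothing to enforce */
	if (!child)
		return -EPERM;	/* sandboxed tracer, unsandboxed tracee */
	if (is_tail_of(parent, child))
		return 0;	/* tracee is at least as restricted */
	return -EPERM;
}

/* Usage example covering each branch of the decision. */
void demo_ptrace_model(void)
{
	const struct prog_node base = { .id = 1, .prev = NULL };
	const struct prog_node nested = { .id = 2, .prev = &base };
	const struct prog_node other = { .id = 3, .prev = NULL };

	assert(model_ptrace(NULL, &base) == 0);
	assert(model_ptrace(&base, NULL) == -EPERM);
	assert(model_ptrace(&base, &base) == 0);
	assert(model_ptrace(&base, &nested) == 0);
	assert(model_ptrace(&nested, &base) == -EPERM);
	assert(model_ptrace(&base, &other) == -EPERM);
}
```

The asymmetric cases are the interesting ones: a tracer may attach to a task
that carries the same programs plus more, but not to one that carries fewer
or unrelated programs.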


* [PATCH bpf-next v8 09/11] bpf: Add a Landlock sandbox example
  2018-02-27  0:41 [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing Mickaël Salaün
                   ` (7 preceding siblings ...)
  2018-02-27  0:41 ` [PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions Mickaël Salaün
@ 2018-02-27  0:41 ` Mickaël Salaün
  2018-02-27  0:41 ` [PATCH bpf-next v8 10/11] bpf,landlock: Add tests for Landlock Mickaël Salaün
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 55+ messages in thread
From: Mickaël Salaün @ 2018-02-27  0:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev

Add a basic sandbox tool to launch a command which is only allowed
read-only or read-write access to a whitelist of file hierarchies.

Add to the bpf_load library the ability to handle a BPF program subtype.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---
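
Note on the whitelist format: landlock1_user.c reads LL_PATH_RO and
LL_PATH_RW as colon-separated path lists. A minimal standalone sketch of that
splitting step follows. It is not the sample's parse_path(): it uses strchr()
instead of strsep() so that it only depends on standard C, and it checks the
malloc() result. split_paths() and demo_split_paths() are hypothetical names
introduced for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split a colon-separated list in place. Returns the number of entries and
 * stores a malloc()ed array of pointers into "list" in *out. Returns 0 and
 * sets *out to NULL if "list" is NULL or the allocation fails.
 */
size_t split_paths(char *list, char ***out)
{
	size_t count, i;
	char **paths;

	*out = NULL;
	if (!list)
		return 0;
	/* one entry, plus one per separator */
	count = 1;
	for (i = 0; list[i]; i++) {
		if (list[i] == ':')
			count++;
	}
	paths = malloc(count * sizeof(*paths));
	if (!paths)
		return 0;
	for (i = 0; i < count; i++) {
		char *sep = strchr(list, ':');

		paths[i] = list;
		if (sep) {
			*sep = '\0';	/* terminate this entry */
			list = sep + 1;	/* continue after the separator */
		}
	}
	*out = paths;
	return count;
}

/* Usage example with a list shaped like the LL_PATH_RO value in the help. */
void demo_split_paths(void)
{
	char env[] = "/bin:/lib:/usr";
	char **paths;
	size_t n = split_paths(env, &paths);

	assert(n == 3);
	assert(strcmp(paths[0], "/bin") == 0);
	assert(strcmp(paths[1], "/lib") == 0);
	assert(strcmp(paths[2], "/usr") == 0);
	free(paths);
}
```

As in the sample, an empty string yields a single empty entry, which a
caller would then fail to open().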

Changes since v7:
* rewrite the example using an inode map
* add to bpf_load the ability to handle subtypes per program type
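
To illustrate the naming convention that the bpf_load change relies on: a
"subtype/<name>" ELF section is paired with the program section called
"<name>", and the digits after "landlock" give the program number that a
later subtype can reference through .previous. The sketch below only
reproduces the string checks visible in this patch. The function names
subtype_section_target(), landlock_prog_id() and demo_section_names() are
hypothetical and not part of bpf_load.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns the program name inside a "subtype/<name>" section, or NULL. */
const char *subtype_section_target(const char *shname)
{
	if (strncmp(shname, "subtype", 7) != 0 || shname[7] != '/')
		return NULL;
	return shname + 8;
}

/* Returns N for a "landlockN" program section, or -1 if malformed. */
int landlock_prog_id(const char *event)
{
	if (strncmp(event, "landlock", 8) != 0)
		return -1;
	if (!isdigit((unsigned char)event[8]))
		return -1;
	return atoi(event + 8);
}

/* Usage example with the section names used by landlock1_kern.c. */
void demo_section_names(void)
{
	const char *target = subtype_section_target("subtype/landlock2");

	assert(target && strcmp(target, "landlock2") == 0);
	assert(landlock_prog_id(target) == 2);
	assert(subtype_section_target("subtypeX") == NULL);
	assert(landlock_prog_id("landlock") == -1);
}
```

With this convention, "subtype/landlock2" with .previous = 1 refers to the
program loaded from the "landlock1" section, which therefore has to be
loaded first.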

Changes since v6:
* check return value of load_and_attach()
* allow writing to pipes
* rename BPF_PROG_TYPE_LANDLOCK to BPF_PROG_TYPE_LANDLOCK_RULE
* rename Landlock version to ABI to better reflect its purpose
* use const variable (suggested by Kees Cook)
* remove useless definitions (suggested by Kees Cook)
* add detailed explanations (suggested by Kees Cook)

Changes since v5:
* cosmetic fixes
* rebase

Changes since v4:
* write the Landlock rule in C and compile it with LLVM
* remove cgroup handling
* remove path handling: only handle a read-only environment
* remove errno return codes

Changes since v3:
* remove seccomp and origin field: completely free from seccomp programs
* handle more FS-related hooks
* handle inode hooks and directory traversal
* add faked but consistent view thanks to ENOENT
* add /lib64 in the example
* fix spelling
* rename some types and definitions (e.g. SECCOMP_ADD_LANDLOCK_RULE)

Changes since v2:
* use BPF_PROG_ATTACH for cgroup handling
---
 samples/bpf/Makefile         |   4 +
 samples/bpf/bpf_load.c       |  82 ++++++++++++++++++++-
 samples/bpf/bpf_load.h       |   7 ++
 samples/bpf/landlock1.h      |  14 ++++
 samples/bpf/landlock1_kern.c | 171 +++++++++++++++++++++++++++++++++++++++++++
 samples/bpf/landlock1_user.c | 164 +++++++++++++++++++++++++++++++++++++++++
 6 files changed, 439 insertions(+), 3 deletions(-)
 create mode 100644 samples/bpf/landlock1.h
 create mode 100644 samples/bpf/landlock1_kern.c
 create mode 100644 samples/bpf/landlock1_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index ec3fc8d88e87..015b1375daa5 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -43,6 +43,7 @@ hostprogs-y += xdp_redirect_cpu
 hostprogs-y += xdp_monitor
 hostprogs-y += xdp_rxq_info
 hostprogs-y += syscall_tp
+hostprogs-y += landlock1
 
 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
@@ -93,6 +94,7 @@ xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o
 xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
 xdp_rxq_info-objs := bpf_load.o $(LIBBPF) xdp_rxq_info_user.o
 syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
+landlock1-objs := bpf_load.o $(LIBBPF) landlock1_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -144,6 +146,7 @@ always += xdp_monitor_kern.o
 always += xdp_rxq_info_kern.o
 always += xdp2skb_meta_kern.o
 always += syscall_tp_kern.o
+always += landlock1_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
@@ -188,6 +191,7 @@ HOSTLOADLIBES_xdp_redirect_cpu += -lelf
 HOSTLOADLIBES_xdp_monitor += -lelf
 HOSTLOADLIBES_xdp_rxq_info += -lelf
 HOSTLOADLIBES_syscall_tp += -lelf
+HOSTLOADLIBES_landlock1 += -lelf
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 5bb37db6054b..f7c91093b2f5 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -12,6 +12,7 @@
 #include <stdlib.h>
 #include <linux/bpf.h>
 #include <linux/filter.h>
+#include <linux/landlock.h>
 #include <linux/perf_event.h>
 #include <linux/netlink.h>
 #include <linux/rtnetlink.h>
@@ -43,6 +44,9 @@ int prog_array_fd = -1;
 struct bpf_map_data map_data[MAX_MAPS];
 int map_data_count = 0;
 
+struct bpf_subtype_data subtype_data[MAX_PROGS];
+int subtype_data_count = 0;
+
 static int populate_prog_array(const char *event, int prog_fd)
 {
 	int ind = atoi(event), err;
@@ -67,12 +71,14 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 	bool is_cgroup_sk = strncmp(event, "cgroup/sock", 11) == 0;
 	bool is_sockops = strncmp(event, "sockops", 7) == 0;
 	bool is_sk_skb = strncmp(event, "sk_skb", 6) == 0;
+	bool is_landlock = strncmp(event, "landlock", 8) == 0;
 	size_t insns_cnt = size / sizeof(struct bpf_insn);
 	enum bpf_prog_type prog_type;
 	char buf[256];
 	int fd, efd, err, id;
 	struct perf_event_attr attr = {};
 	union bpf_prog_subtype *st = NULL;
+	struct bpf_subtype_data *sd = NULL;
 
 	attr.type = PERF_TYPE_TRACEPOINT;
 	attr.sample_type = PERF_SAMPLE_RAW;
@@ -97,6 +103,50 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 		prog_type = BPF_PROG_TYPE_SOCK_OPS;
 	} else if (is_sk_skb) {
 		prog_type = BPF_PROG_TYPE_SK_SKB;
+	} else if (is_landlock) {
+		int i, prog_id;
+		const char *event_id = (event + 8);
+
+		if (!isdigit(*event_id)) {
+			printf("invalid prog number\n");
+			return -1;
+		}
+		prog_id = atoi(event_id);
+		for (i = 0; i < subtype_data_count; i++) {
+			if (subtype_data[i].name && strcmp(event,
+						subtype_data[i].name) == 0) {
+				/* save the prog_id for a later program */
+				sd = &subtype_data[i];
+				sd->prog_id = prog_id;
+				st = &sd->subtype;
+				free(sd->name);
+				sd->name = NULL;
+				break;
+			}
+		}
+		if (!st) {
+			printf("missing subtype\n");
+			return -1;
+		}
+		/* automatic conversion of program pointer to FD */
+		if (st->landlock_hook.options & LANDLOCK_OPTION_PREVIOUS) {
+			int previous = -1;
+
+			/* assume the previous program is already loaded */
+			for (i = 0; i < subtype_data_count; i++) {
+				if (subtype_data[i].prog_id ==
+						st->landlock_hook.previous) {
+					previous = subtype_data[i].prog_fd;
+					break;
+				}
+			}
+			if (previous == -1) {
+				printf("could not find the previous program\n");
+				return -1;
+			}
+			st->landlock_hook.previous = previous;
+		}
+		prog_type = BPF_PROG_TYPE_LANDLOCK_HOOK;
 	} else {
 		printf("Unknown event '%s'\n", event);
 		return -1;
@@ -108,10 +158,13 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 		printf("bpf_load_program() err=%d\n%s", errno, bpf_log_buf);
 		return -1;
 	}
+	if (sd)
+		sd->prog_fd = fd;
 
 	prog_fd[prog_cnt++] = fd;
 
-	if (is_xdp || is_perf_event || is_cgroup_skb || is_cgroup_sk)
+	if (is_xdp || is_perf_event || is_cgroup_skb || is_cgroup_sk ||
+	    is_landlock)
 		return 0;
 
 	if (is_socket || is_sockops || is_sk_skb) {
@@ -515,6 +568,29 @@ static int do_load_bpf_file(const char *path, fixup_map_cb fixup_map)
 			data_maps = data;
 			for (j = 0; j < MAX_MAPS; j++)
 				map_data[j].fd = -1;
+		} else if (strncmp(shname, "subtype", 7) == 0) {
+			processed_sec[i] = true;
+			if (*(shname + 7) != '/') {
+				printf("invalid name of subtype section\n");
+				return 1;
+			}
+			if (data->d_size != sizeof(union bpf_prog_subtype)) {
+				printf("invalid size of subtype section: %zd\n",
+				       data->d_size);
+				printf("ref: %zd\n",
+				       sizeof(union bpf_prog_subtype));
+				return 1;
+			}
+			if (subtype_data_count >= MAX_PROGS) {
+				printf("too many subtype sections\n");
+				return 1;
+			}
+			memcpy(&subtype_data[subtype_data_count].subtype,
+					data->d_buf,
+					sizeof(union bpf_prog_subtype));
+			subtype_data[subtype_data_count].name =
+				strdup((shname + 8));
+			subtype_data_count++;
 		} else if (shdr.sh_type == SHT_SYMTAB) {
 			strtabidx = shdr.sh_link;
 			symbols = data;
@@ -575,7 +651,6 @@ static int do_load_bpf_file(const char *path, fixup_map_cb fixup_map)
 
 	/* load programs */
 	for (i = 1; i < ehdr.e_shnum; i++) {
-
 		if (processed_sec[i])
 			continue;
 
@@ -590,7 +665,8 @@ static int do_load_bpf_file(const char *path, fixup_map_cb fixup_map)
 		    memcmp(shname, "socket", 6) == 0 ||
 		    memcmp(shname, "cgroup/", 7) == 0 ||
 		    memcmp(shname, "sockops", 7) == 0 ||
-		    memcmp(shname, "sk_skb", 6) == 0) {
+		    memcmp(shname, "sk_skb", 6) == 0 ||
+		    memcmp(shname, "landlock", 8) == 0) {
 			ret = load_and_attach(shname, data->d_buf,
 					      data->d_size);
 			if (ret != 0)
diff --git a/samples/bpf/bpf_load.h b/samples/bpf/bpf_load.h
index 453c200b389b..b5abe793e271 100644
--- a/samples/bpf/bpf_load.h
+++ b/samples/bpf/bpf_load.h
@@ -24,6 +24,13 @@ struct bpf_map_data {
 	struct bpf_map_def def;
 };
 
+struct bpf_subtype_data {
+	char *name;
+	int prog_id;
+	int prog_fd;
+	union bpf_prog_subtype subtype;
+};
+
 typedef void (*fixup_map_cb)(struct bpf_map_data *map, int idx);
 
 extern int prog_fd[MAX_PROGS];
diff --git a/samples/bpf/landlock1.h b/samples/bpf/landlock1.h
new file mode 100644
index 000000000000..6d0538816088
--- /dev/null
+++ b/samples/bpf/landlock1.h
@@ -0,0 +1,14 @@
+/*
+ * Landlock sample 1 - common header
+ *
+ * Copyright © 2018 Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#define MAP_MARK_READ		(1ULL << 63)
+#define MAP_MARK_WRITE		(1ULL << 62)
+#define COOKIE_VALUE_FREEZED	(1ULL << 61)
+#define _MAP_MARK_MASK		(MAP_MARK_READ | MAP_MARK_WRITE | COOKIE_VALUE_FREEZED)
diff --git a/samples/bpf/landlock1_kern.c b/samples/bpf/landlock1_kern.c
new file mode 100644
index 000000000000..113243677ddd
--- /dev/null
+++ b/samples/bpf/landlock1_kern.c
@@ -0,0 +1,171 @@
+/*
+ * Landlock sample 1 - whitelist of read only or read-write file hierarchy
+ *
+ * Copyright © 2017-2018 Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * This file contains a function that will be compiled to eBPF bytecode thanks
+ * to LLVM/Clang.
+ *
+ * Each SEC() means that the following function or variable will be part of a
+ * custom ELF section. These sections are then processed by the userspace part
+ * (see landlock1_user.c) to extract eBPF bytecode and take into account
+ * variables describing the eBPF program subtype or its license.
+ */
+
+#include <uapi/linux/bpf.h>
+#include <uapi/linux/landlock.h>
+
+#include "bpf_helpers.h"
+#include "landlock1.h" /* MAP_MARK_* */
+
+SEC("maps")
+struct bpf_map_def inode_map = {
+	.type = BPF_MAP_TYPE_INODE,
+	.key_size = sizeof(u32),
+	.value_size = sizeof(u64),
+	.max_entries = 20,
+};
+
+SEC("subtype/landlock1")
+static union bpf_prog_subtype _subtype1 = {
+	.landlock_hook = {
+		.type = LANDLOCK_HOOK_FS_WALK,
+	}
+};
+
+static __always_inline __u64 update_cookie(__u64 cookie, __u8 lookup,
+		void *inode, void *chain, bool freeze)
+{
+	__u64 map_allow = 0;
+
+	if (cookie == 0) {
+		cookie = bpf_inode_get_tag(inode, chain);
+		if (cookie)
+			return cookie;
+		/* only look for the first match in the map, ignore nested
+		 * paths in this example */
+		map_allow = bpf_inode_map_lookup(&inode_map, inode);
+		if (map_allow)
+			cookie = 1 | map_allow;
+	} else {
+		if (cookie & COOKIE_VALUE_FREEZED)
+			return cookie;
+		map_allow = cookie & _MAP_MARK_MASK;
+		cookie &= ~_MAP_MARK_MASK;
+		switch (lookup) {
+		case LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_DOTDOT:
+			cookie--;
+			break;
+		case LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_DOT:
+			break;
+		default:
+			/* ignore _MAP_MARK_MASK overflow in this example */
+			cookie++;
+			break;
+		}
+		if (cookie >= 1)
+			cookie |= map_allow;
+	}
+	/* do not modify the cookie for each fs_pick */
+	if (freeze && cookie)
+		cookie |= COOKIE_VALUE_FREEZED;
+	return cookie;
+}
+
+/*
+ * The function fs_walk() is a simple Landlock program enforced on a set of
+ * processes. This program will be run for each walk through a file path.
+ *
+ * The argument ctx contains the context of the program when it is run, which
+ * enables evaluating the file path.  This context can change for each run of
+ * the program.
+ */
+SEC("landlock1")
+int fs_walk(struct landlock_ctx_fs_walk *ctx)
+{
+	ctx->cookie = update_cookie(ctx->cookie, ctx->inode_lookup,
+			(void *)ctx->inode, (void *)ctx->chain, false);
+	return LANDLOCK_RET_ALLOW;
+}
+
+SEC("subtype/landlock2")
+static union bpf_prog_subtype _subtype2 = {
+	.landlock_hook = {
+		.type = LANDLOCK_HOOK_FS_PICK,
+		.options = LANDLOCK_OPTION_PREVIOUS,
+		.previous = 1, /* landlock1 */
+		.triggers = LANDLOCK_TRIGGER_FS_PICK_CHDIR |
+			    LANDLOCK_TRIGGER_FS_PICK_GETATTR |
+			    LANDLOCK_TRIGGER_FS_PICK_READDIR |
+			    LANDLOCK_TRIGGER_FS_PICK_TRANSFER |
+			    LANDLOCK_TRIGGER_FS_PICK_OPEN,
+	}
+};
+
+SEC("landlock2")
+int fs_pick_ro(struct landlock_ctx_fs_pick *ctx)
+{
+	ctx->cookie = update_cookie(ctx->cookie, ctx->inode_lookup,
+			(void *)ctx->inode, (void *)ctx->chain, true);
+	if (ctx->cookie & MAP_MARK_READ)
+		return LANDLOCK_RET_ALLOW;
+	return LANDLOCK_RET_DENY;
+}
+
+SEC("subtype/landlock3")
+static union bpf_prog_subtype _subtype3 = {
+	.landlock_hook = {
+		.type = LANDLOCK_HOOK_FS_PICK,
+		.options = LANDLOCK_OPTION_PREVIOUS,
+		.previous = 2, /* landlock2 */
+		.triggers = LANDLOCK_TRIGGER_FS_PICK_APPEND |
+			    LANDLOCK_TRIGGER_FS_PICK_CREATE |
+			    LANDLOCK_TRIGGER_FS_PICK_LINK |
+			    LANDLOCK_TRIGGER_FS_PICK_LINKTO |
+			    LANDLOCK_TRIGGER_FS_PICK_LOCK |
+			    LANDLOCK_TRIGGER_FS_PICK_MOUNTON |
+			    LANDLOCK_TRIGGER_FS_PICK_RENAME |
+			    LANDLOCK_TRIGGER_FS_PICK_RENAMETO |
+			    LANDLOCK_TRIGGER_FS_PICK_RMDIR |
+			    LANDLOCK_TRIGGER_FS_PICK_SETATTR |
+			    LANDLOCK_TRIGGER_FS_PICK_UNLINK |
+			    LANDLOCK_TRIGGER_FS_PICK_WRITE,
+	}
+};
+
+SEC("landlock3")
+int fs_pick_rw(struct landlock_ctx_fs_pick *ctx)
+{
+	ctx->cookie = update_cookie(ctx->cookie, ctx->inode_lookup,
+			(void *)ctx->inode, (void *)ctx->chain, true);
+	if (ctx->cookie & MAP_MARK_WRITE)
+		return LANDLOCK_RET_ALLOW;
+	return LANDLOCK_RET_DENY;
+}
+
+SEC("subtype/landlock4")
+static union bpf_prog_subtype _subtype4 = {
+	.landlock_hook = {
+		.type = LANDLOCK_HOOK_FS_GET,
+		.options = LANDLOCK_OPTION_PREVIOUS,
+		.previous = 3, /* landlock3 */
+	}
+};
+
+SEC("landlock4")
+int fs_get(struct landlock_ctx_fs_get *ctx)
+{
+	/* save the cookie in the tag for relative path lookup */
+	bpf_landlock_set_tag((void *)ctx->tag_object, (void *)ctx->chain,
+			ctx->cookie & ~COOKIE_VALUE_FREEZED);
+	return LANDLOCK_RET_ALLOW;
+}
+
+SEC("license")
+static const char _license[] = "GPL";
diff --git a/samples/bpf/landlock1_user.c b/samples/bpf/landlock1_user.c
new file mode 100644
index 000000000000..e46e0ca182cd
--- /dev/null
+++ b/samples/bpf/landlock1_user.c
@@ -0,0 +1,164 @@
+/*
+ * Landlock sample 1 - whitelist of read-only or read-write file hierarchies
+ *
+ * Copyright © 2017-2018 Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include "bpf_load.h"
+#include "landlock1.h" /* MAP_MARK_* */
+#include "libbpf.h"
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h> /* open() */
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <linux/landlock.h>
+#include <linux/prctl.h>
+#include <linux/seccomp.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/prctl.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+
+#ifndef seccomp
+static int seccomp(unsigned int op, unsigned int flags, void *args)
+{
+	errno = 0;
+	return syscall(__NR_seccomp, op, flags, args);
+}
+#endif
+
+static int apply_sandbox(int prog_fd)
+{
+	int ret = 0;
+
+	/* set up the test sandbox */
+	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+		perror("prctl(no_new_priv)");
+		return 1;
+	}
+	if (seccomp(SECCOMP_PREPEND_LANDLOCK_PROG, 0, &prog_fd)) {
+		perror("seccomp(set_hook)");
+		ret = 1;
+	}
+	close(prog_fd);
+
+	return ret;
+}
+
+#define ENV_FS_PATH_RO_NAME "LL_PATH_RO"
+#define ENV_FS_PATH_RW_NAME "LL_PATH_RW"
+#define ENV_PATH_TOKEN ":"
+
+static int parse_path(char *env_path, const char ***path_list)
+{
+	int i, path_nb = 0;
+
+	if (env_path) {
+		path_nb++;
+		for (i = 0; env_path[i]; i++) {
+			if (env_path[i] == ENV_PATH_TOKEN[0])
+				path_nb++;
+		}
+	}
+	*path_list = malloc(path_nb * sizeof(**path_list));
+	for (i = 0; i < path_nb; i++)
+		(*path_list)[i] = strsep(&env_path, ENV_PATH_TOKEN);
+
+	return path_nb;
+}
+
+static int populate_map(const char *env_var, unsigned long long value,
+		int map_fd)
+{
+	int path_nb, ref_fd, i;
+	char *env_path_name;
+	const char **path_list = NULL;
+
+	env_path_name = getenv(env_var);
+	if (!env_path_name)
+		return 0;
+	env_path_name = strdup(env_path_name);
+	path_nb = parse_path(env_path_name, &path_list);
+
+	for (i = 0; i < path_nb; i++) {
+		ref_fd = open(path_list[i], O_RDONLY | O_CLOEXEC);
+		if (ref_fd < 0) {
+			fprintf(stderr, "Failed to open \"%s\": %s\n",
+					path_list[i],
+					strerror(errno));
+			return 1;
+		}
+		if (bpf_map_update_elem(map_fd, &ref_fd, &value, BPF_ANY)) {
+			fprintf(stderr, "Failed to update the map with"
+					" \"%s\": %s\n", path_list[i],
+					strerror(errno));
+			return 1;
+		}
+		close(ref_fd);
+	}
+	free(env_path_name);
+	return 0;
+}
+
+int main(int argc, char * const argv[], char * const *envp)
+{
+	char filename[256];
+	char *cmd_path;
+	char * const *cmd_argv;
+	int ll_prog;
+
+	if (argc < 2) {
+		fprintf(stderr, "usage: %s <cmd> [args]...\n\n", argv[0]);
+		fprintf(stderr, "Launch a command in a restricted environment.\n");
+		fprintf(stderr, "Environment variables containing paths, each separated by a colon:\n");
+		fprintf(stderr, "* %s: whitelist of allowed files and directories to be read\n",
+				ENV_FS_PATH_RO_NAME);
+		fprintf(stderr, "* %s: whitelist of allowed files and directories to be modified\n",
+				ENV_FS_PATH_RW_NAME);
+		fprintf(stderr, "\nexample:\n"
+				"%s=\"/bin:/lib:/lib64:/usr:${HOME}\" "
+				"%s=\"/tmp:/dev/urandom:/dev/random:/dev/null\" "
+				"%s /bin/sh -i\n",
+				ENV_FS_PATH_RO_NAME, ENV_FS_PATH_RW_NAME, argv[0]);
+		return 1;
+	}
+
+	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+	if (load_bpf_file(filename)) {
+		printf("%s", bpf_log_buf);
+		return 1;
+	}
+	ll_prog = prog_fd[3]; /* fs_get */
+	if (!ll_prog) {
+		if (errno)
+			printf("load_bpf_file: %s\n", strerror(errno));
+		else
+			printf("load_bpf_file: Error\n");
+		return 1;
+	}
+
+	if (populate_map(ENV_FS_PATH_RO_NAME, MAP_MARK_READ, map_fd[0]))
+		return 1;
+	if (populate_map(ENV_FS_PATH_RW_NAME, MAP_MARK_READ | MAP_MARK_WRITE,
+				map_fd[0]))
+		return 1;
+	close(map_fd[0]);
+
+	fprintf(stderr, "Launching a new sandboxed process\n");
+	if (apply_sandbox(ll_prog))
+		return 1;
+	cmd_path = argv[1];
+	cmd_argv = argv + 1;
+	execve(cmd_path, cmd_argv, envp);
+	perror("Failed to call execve");
+	return 1;
+}
-- 
2.16.2
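
Note on the cookie used by landlock1_kern.c: the low bits count how deep the
walk is below a whitelisted inode, and the three top bits carry the
read/write marks and the freeze flag. The userspace model below follows the
control flow of update_cookie() so the arithmetic can be checked outside the
kernel. It is a sketch with stated assumptions: the "tag" and "map_value"
parameters stand in for the results of bpf_inode_get_tag() and
bpf_inode_map_lookup(), enum lookup stands in for the
LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_* values (whose numeric values are not
shown here), and MAP_MARK_MASK corresponds to the sample's _MAP_MARK_MASK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAP_MARK_READ		(1ULL << 63)
#define MAP_MARK_WRITE		(1ULL << 62)
#define COOKIE_VALUE_FREEZED	(1ULL << 61)
#define MAP_MARK_MASK \
	(MAP_MARK_READ | MAP_MARK_WRITE | COOKIE_VALUE_FREEZED)

/* Hypothetical stand-ins for LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_*. */
enum lookup { LOOKUP_NAME, LOOKUP_DOT, LOOKUP_DOTDOT };

/* Userspace model of update_cookie(); see the assumptions above. */
uint64_t model_update_cookie(uint64_t cookie, enum lookup lookup,
		uint64_t tag, uint64_t map_value, bool freeze)
{
	uint64_t map_allow = 0;

	if (cookie == 0) {
		if (tag)
			return tag;	/* resume from a saved tag */
		if (map_value)
			cookie = 1 | map_value;	/* entered a whitelisted tree */
	} else {
		if (cookie & COOKIE_VALUE_FREEZED)
			return cookie;
		map_allow = cookie & MAP_MARK_MASK;
		cookie &= ~MAP_MARK_MASK;
		switch (lookup) {
		case LOOKUP_DOTDOT:
			cookie--;	/* one level up */
			break;
		case LOOKUP_DOT:
			break;		/* same level */
		default:
			cookie++;	/* one level down */
			break;
		}
		if (cookie >= 1)
			cookie |= map_allow;	/* still inside the tree */
	}
	if (freeze && cookie)
		cookie |= COOKIE_VALUE_FREEZED;
	return cookie;
}

/* Usage example: walk "/", "tmp", "sub", "..", ".." with /tmp whitelisted. */
void demo_cookie_walk(void)
{
	const uint64_t rw = MAP_MARK_READ | MAP_MARK_WRITE;
	uint64_t c = 0;

	c = model_update_cookie(c, LOOKUP_NAME, 0, 0, false);
	assert(c == 0);			/* "/" is not whitelisted */
	c = model_update_cookie(c, LOOKUP_NAME, 0, rw, false);
	assert(c == (1 | rw));		/* "tmp" found in the map */
	c = model_update_cookie(c, LOOKUP_NAME, 0, 0, false);
	assert(c == (2 | rw));		/* "sub" is one level deeper */
	c = model_update_cookie(c, LOOKUP_DOTDOT, 0, 0, false);
	assert(c == (1 | rw));		/* back in "tmp" */
	c = model_update_cookie(c, LOOKUP_DOTDOT, 0, 0, false);
	assert(c == 0);			/* left the tree: marks dropped */
}
```

Once the depth returns to zero the marks are gone, so an fs_pick program
testing MAP_MARK_READ or MAP_MARK_WRITE on that cookie denies the access.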



* [PATCH bpf-next v8 10/11] bpf,landlock: Add tests for Landlock
  2018-02-27  0:41 [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing Mickaël Salaün
                   ` (8 preceding siblings ...)
  2018-02-27  0:41 ` [PATCH bpf-next v8 09/11] bpf: Add a Landlock sandbox example Mickaël Salaün
@ 2018-02-27  0:41 ` Mickaël Salaün
  2018-02-27  0:41 ` [PATCH bpf-next v8 11/11] landlock: Add user and kernel documentation " Mickaël Salaün
  2018-02-27  4:36 ` [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing Andy Lutomirski
  11 siblings, 0 replies; 55+ messages in thread
From: Mickaël Salaün @ 2018-02-27  0:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev

Test basic context access, ptrace protection, filesystem hooks and
Landlock program chaining with multiple cases.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Will Drewry <wad@chromium.org>
---

Changes since v7:
* update tests and add new ones for filesystem hierarchy and Landlock
  chains.

Changes since v6:
* use the new kselftest_harness.h
* use const variables
* replace ASSERT_STEP with ASSERT_*
* rename BPF_PROG_TYPE_LANDLOCK to BPF_PROG_TYPE_LANDLOCK_RULE
* force sample library rebuild
* fix install target

Changes since v5:
* add subtype test
* add ptrace tests
* split and rename files
* cleanup and rebase
---
 tools/testing/selftests/Makefile               |   1 +
 tools/testing/selftests/bpf/bpf_helpers.h      |   7 +
 tools/testing/selftests/bpf/test_verifier.c    |  84 +++++
 tools/testing/selftests/landlock/.gitignore    |   5 +
 tools/testing/selftests/landlock/Makefile      |  35 ++
 tools/testing/selftests/landlock/test.h        |  31 ++
 tools/testing/selftests/landlock/test_base.c   |  27 ++
 tools/testing/selftests/landlock/test_chain.c  | 249 +++++++++++++
 tools/testing/selftests/landlock/test_fs.c     | 492 +++++++++++++++++++++++++
 tools/testing/selftests/landlock/test_ptrace.c | 158 ++++++++
 10 files changed, 1089 insertions(+)
 create mode 100644 tools/testing/selftests/landlock/.gitignore
 create mode 100644 tools/testing/selftests/landlock/Makefile
 create mode 100644 tools/testing/selftests/landlock/test.h
 create mode 100644 tools/testing/selftests/landlock/test_base.c
 create mode 100644 tools/testing/selftests/landlock/test_chain.c
 create mode 100644 tools/testing/selftests/landlock/test_fs.c
 create mode 100644 tools/testing/selftests/landlock/test_ptrace.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 7442dfb73b7f..5d00deb3cab6 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -14,6 +14,7 @@ TARGETS += gpio
 TARGETS += intel_pstate
 TARGETS += ipc
 TARGETS += kcmp
+TARGETS += landlock
 TARGETS += lib
 TARGETS += membarrier
 TARGETS += memfd
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index dde2c11d7771..414e267491f7 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -86,6 +86,13 @@ static int (*bpf_perf_prog_read_value)(void *ctx, void *buf,
 	(void *) BPF_FUNC_perf_prog_read_value;
 static int (*bpf_override_return)(void *ctx, unsigned long rc) =
 	(void *) BPF_FUNC_override_return;
+static unsigned long long (*bpf_inode_map_lookup)(void *map, void *key) =
+	(void *) BPF_FUNC_inode_map_lookup;
+static unsigned long long (*bpf_inode_get_tag)(void *inode, void *chain) =
+	(void *) BPF_FUNC_inode_get_tag;
+static unsigned long long (*bpf_landlock_set_tag)(void *tag_obj, void *chain,
+						  unsigned long long value) =
+	(void *) BPF_FUNC_landlock_set_tag;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 3c24a5a7bafc..5f68b95187fe 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -31,6 +31,7 @@
 #include <linux/bpf_perf_event.h>
 #include <linux/bpf.h>
 #include <linux/if_ether.h>
+#include <linux/landlock.h>
 
 #include <bpf/bpf.h>
 
@@ -11240,6 +11241,89 @@ static struct bpf_test tests[] = {
 		.result = REJECT,
 		.has_prog_subtype = true,
 	},
+	{
+		"missing subtype",
+		.insns = {
+			BPF_MOV32_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_LANDLOCK_HOOK,
+	},
+	{
+		"landlock/fs_pick: always accept",
+		.insns = {
+			BPF_MOV32_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_LANDLOCK_HOOK,
+		.has_prog_subtype = true,
+		.prog_subtype = {
+			.landlock_hook = {
+				.type = LANDLOCK_HOOK_FS_PICK,
+				.triggers = LANDLOCK_TRIGGER_FS_PICK_READ,
+			}
+		},
+	},
+	{
+		"landlock/fs_pick: read context",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_7, BPF_REG_6,
+				offsetof(struct landlock_ctx_fs_pick, cookie)),
+			/* test operations on raw values */
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, 1),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_7, BPF_REG_6,
+				offsetof(struct landlock_ctx_fs_pick, inode)),
+			BPF_MOV32_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_LANDLOCK_HOOK,
+		.has_prog_subtype = true,
+		.prog_subtype = {
+			.landlock_hook = {
+				.type = LANDLOCK_HOOK_FS_PICK,
+				.triggers = LANDLOCK_TRIGGER_FS_PICK_READ,
+			}
+		},
+	},
+	{
+		"landlock/fs_pick: no option for previous program",
+		.insns = {
+			BPF_MOV32_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_LANDLOCK_HOOK,
+		.prog_subtype = {
+			.landlock_hook = {
+				.type = LANDLOCK_HOOK_FS_PICK,
+				.previous = 1,
+			}
+		},
+	},
+	{
+		"landlock/fs_pick: bad previous program FD",
+		.insns = {
+			BPF_MOV32_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_LANDLOCK_HOOK,
+		.prog_subtype = {
+			.landlock_hook = {
+				.type = LANDLOCK_HOOK_FS_PICK,
+				.options = LANDLOCK_OPTION_PREVIOUS,
+				/* assume FD 0 is a TTY or a pipe */
+				.previous = 0,
+			}
+		},
+	},
 };
 
 static int probe_filter_length(const struct bpf_insn *fp)
diff --git a/tools/testing/selftests/landlock/.gitignore b/tools/testing/selftests/landlock/.gitignore
new file mode 100644
index 000000000000..d4e365980c9c
--- /dev/null
+++ b/tools/testing/selftests/landlock/.gitignore
@@ -0,0 +1,5 @@
+/test_base
+/test_chain
+/test_fs
+/test_ptrace
+/tmp_*
diff --git a/tools/testing/selftests/landlock/Makefile b/tools/testing/selftests/landlock/Makefile
new file mode 100644
index 000000000000..9b2791ded1cc
--- /dev/null
+++ b/tools/testing/selftests/landlock/Makefile
@@ -0,0 +1,35 @@
+LIBDIR := ../../../lib
+OBJDIR := ../../../lib/bpf
+BPFOBJS := $(OBJDIR)/bpf.o $(OBJDIR)/nlattr.o
+LOADOBJ := ../../../../samples/bpf/bpf_load.o
+
+CFLAGS += -Wl,-no-as-needed -Wall -O2 -I../../../include/uapi -I$(LIBDIR)
+LDFLAGS += -lelf
+
+test_src = $(wildcard test_*.c)
+
+test_objs := $(test_src:.c=)
+
+TEST_PROGS := $(test_objs)
+
+.PHONY: all clean force
+
+all: $(test_objs)
+
+# force a rebuild of BPFOBJS when its dependencies are updated
+force:
+
+# rebuild bpf.o as a workaround for the samples/bpf bug
+$(BPFOBJS): $(LOADOBJ) force
+	$(MAKE) -C $(OBJDIR)
+
+$(LOADOBJ): force
+	$(MAKE) -C $(dir $(LOADOBJ))
+
+$(test_objs): $(BPFOBJS) $(LOADOBJ) ../kselftest_harness.h
+
+include ../lib.mk
+
+clean:
+	$(RM) $(test_objs)
+
diff --git a/tools/testing/selftests/landlock/test.h b/tools/testing/selftests/landlock/test.h
new file mode 100644
index 000000000000..3046516488d9
--- /dev/null
+++ b/tools/testing/selftests/landlock/test.h
@@ -0,0 +1,31 @@
+/*
+ * Landlock helpers
+ *
+ * Copyright © 2017 Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <errno.h>
+#include <linux/landlock.h>
+#include <linux/seccomp.h>
+#include <sys/prctl.h>
+#include <sys/syscall.h>
+
+#include "../kselftest_harness.h"
+#include "../../../../samples/bpf/bpf_load.h"
+
+#ifndef SECCOMP_PREPEND_LANDLOCK_PROG
+#define SECCOMP_PREPEND_LANDLOCK_PROG	3
+#endif
+
+#ifndef seccomp
+static int __attribute__((unused)) seccomp(unsigned int op, unsigned int flags,
+		void *args)
+{
+	errno = 0;
+	return syscall(__NR_seccomp, op, flags, args);
+}
+#endif
diff --git a/tools/testing/selftests/landlock/test_base.c b/tools/testing/selftests/landlock/test_base.c
new file mode 100644
index 000000000000..3ad18a779ecf
--- /dev/null
+++ b/tools/testing/selftests/landlock/test_base.c
@@ -0,0 +1,27 @@
+/*
+ * Landlock tests - base
+ *
+ * Copyright © 2017 Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+
+#include "test.h"
+
+TEST(seccomp_landlock)
+{
+	int ret;
+
+	ret = seccomp(SECCOMP_PREPEND_LANDLOCK_PROG, 0, NULL);
+	EXPECT_EQ(-1, ret);
+	EXPECT_EQ(EFAULT, errno) {
+		TH_LOG("Kernel does not support CONFIG_SECURITY_LANDLOCK");
+	}
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/landlock/test_chain.c b/tools/testing/selftests/landlock/test_chain.c
new file mode 100644
index 000000000000..916e84802fd4
--- /dev/null
+++ b/tools/testing/selftests/landlock/test_chain.c
@@ -0,0 +1,249 @@
+/*
+ * Landlock tests - chain
+ *
+ * Copyright © 2018 Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <errno.h>
+
+#include "test.h"
+
+static int new_prog(struct __test_metadata *_metadata, int is_valid,
+		__u32 hook_type, int prev)
+{
+	const struct bpf_insn prog_accept[] = {
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	union bpf_prog_subtype subtype = {
+		.landlock_hook = {
+			.type = hook_type,
+			.triggers = hook_type == LANDLOCK_HOOK_FS_PICK ?
+				LANDLOCK_TRIGGER_FS_PICK_OPEN : 0,
+		}
+	};
+	int prog;
+	char log[256] = "";
+
+	if (prev != -1) {
+		subtype.landlock_hook.options = LANDLOCK_OPTION_PREVIOUS;
+		subtype.landlock_hook.previous = prev;
+	}
+	prog = bpf_load_program(BPF_PROG_TYPE_LANDLOCK_HOOK,
+			(const struct bpf_insn *)&prog_accept,
+			sizeof(prog_accept) / sizeof(struct bpf_insn), "GPL",
+			0, log, sizeof(log), &subtype);
+	if (is_valid) {
+		ASSERT_NE(-1, prog) {
+			TH_LOG("Failed to load program: %s\n%s",
+					strerror(errno), log);
+		}
+	} else {
+		ASSERT_EQ(-1, prog) {
+			TH_LOG("Successfully loaded a wrong program\n");
+		}
+		ASSERT_EQ(errno, EINVAL);
+	}
+	return prog;
+}
+
+static void apply_chain(struct __test_metadata *_metadata, int is_valid,
+		int prog)
+{
+	if (is_valid) {
+		ASSERT_EQ(0, seccomp(SECCOMP_PREPEND_LANDLOCK_PROG, 0, &prog)) {
+			TH_LOG("Failed to apply chain: %s", strerror(errno));
+		}
+	} else {
+		ASSERT_NE(0, seccomp(SECCOMP_PREPEND_LANDLOCK_PROG, 0, &prog)) {
+			TH_LOG("Successfully applied a wrong chain");
+		}
+		ASSERT_EQ(errno, EINVAL);
+	}
+}
+
+TEST(chain_fs_good_walk_pick)
+{
+	/* fs_walk1 -> [fs_pick1] */
+	int fs_walk1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_WALK, -1);
+	int fs_pick1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_walk1);
+	apply_chain(_metadata, 1, fs_pick1);
+	EXPECT_EQ(0, close(fs_pick1));
+	EXPECT_EQ(0, close(fs_walk1));
+}
+
+TEST(chain_fs_good_pick_pick)
+{
+	/* fs_pick1 -> [fs_pick2] */
+	int fs_pick1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, -1);
+	int fs_pick2 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_pick1);
+	apply_chain(_metadata, 1, fs_pick2);
+	EXPECT_EQ(0, close(fs_pick2));
+	EXPECT_EQ(0, close(fs_pick1));
+}
+
+TEST(chain_fs_wrong_pick_walk)
+{
+	/* fs_pick1 -> fs_walk1 */
+	int fs_pick1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, -1);
+	new_prog(_metadata, 0, LANDLOCK_HOOK_FS_WALK, fs_pick1);
+	EXPECT_EQ(0, close(fs_pick1));
+}
+
+TEST(chain_fs_wrong_walk_walk)
+{
+	/* fs_walk1 -> fs_walk2 */
+	int fs_walk1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_WALK, -1);
+	new_prog(_metadata, 0, LANDLOCK_HOOK_FS_WALK, fs_walk1);
+	EXPECT_EQ(0, close(fs_walk1));
+}
+
+TEST(chain_fs_good_pick_get)
+{
+	/* fs_pick1 -> [fs_get1] */
+	int fs_pick1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, -1);
+	int fs_get1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_GET, fs_pick1);
+	apply_chain(_metadata, 1, fs_get1);
+	EXPECT_EQ(0, close(fs_get1));
+	EXPECT_EQ(0, close(fs_pick1));
+}
+
+TEST(chain_fs_wrong_get_get)
+{
+	/* fs_get1 -> [fs_get2] */
+	int fs_get1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_GET, -1);
+	new_prog(_metadata, 0, LANDLOCK_HOOK_FS_GET, fs_get1);
+	EXPECT_EQ(0, close(fs_get1));
+}
+
+TEST(chain_fs_wrong_tree_1)
+{
+	/* [fs_walk1] -> { [fs_pick1] , [fs_pick2] } */
+	int fs_walk1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_WALK, -1);
+	apply_chain(_metadata, 1, fs_walk1);
+	int fs_pick1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_walk1);
+	apply_chain(_metadata, 0, fs_pick1);
+	int fs_pick2 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_walk1);
+	apply_chain(_metadata, 0, fs_pick2);
+	EXPECT_EQ(0, close(fs_pick2));
+	EXPECT_EQ(0, close(fs_pick1));
+	EXPECT_EQ(0, close(fs_walk1));
+}
+
+TEST(chain_fs_wrong_tree_2)
+{
+	/* fs_walk1 -> { [fs_pick1] , [fs_pick2] } */
+	int fs_walk1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_WALK, -1);
+	int fs_pick1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_walk1);
+	apply_chain(_metadata, 1, fs_pick1);
+	int fs_pick2 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_walk1);
+	apply_chain(_metadata, 0, fs_pick2);
+	EXPECT_EQ(0, close(fs_pick2));
+	EXPECT_EQ(0, close(fs_pick1));
+	EXPECT_EQ(0, close(fs_walk1));
+}
+
+TEST(chain_fs_wrong_tree_3)
+{
+	/* fs_walk1 -> [fs_pick1] -> [fs_pick2] */
+	int fs_walk1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_WALK, -1);
+	int fs_pick1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_walk1);
+	apply_chain(_metadata, 1, fs_pick1);
+	int fs_pick2 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_pick1);
+	apply_chain(_metadata, 0, fs_pick2);
+	EXPECT_EQ(0, close(fs_pick2));
+	EXPECT_EQ(0, close(fs_pick1));
+	EXPECT_EQ(0, close(fs_walk1));
+}
+
+TEST(chain_fs_wrong_tree_4)
+{
+	/* fs_walk1 -> fs_pick1 -> fs_pick2 -> { [fs_get1] , [fs_get2] } */
+	int fs_walk1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_WALK, -1);
+	int fs_pick1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_walk1);
+	int fs_pick2 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_pick1);
+	int fs_get1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_GET, fs_pick2);
+	apply_chain(_metadata, 1, fs_get1);
+	int fs_get2 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_GET, fs_pick2);
+	apply_chain(_metadata, 0, fs_get2);
+	EXPECT_EQ(0, close(fs_get2));
+	EXPECT_EQ(0, close(fs_get1));
+	EXPECT_EQ(0, close(fs_pick2));
+	EXPECT_EQ(0, close(fs_pick1));
+	EXPECT_EQ(0, close(fs_walk1));
+}
+
+TEST(chain_fs_wrong_tree_5)
+{
+	/* fs_walk1 -> fs_pick1 -> { [fs_pick2] , [fs_pick3] } */
+	int fs_walk1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_WALK, -1);
+	int fs_pick1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_walk1);
+	int fs_pick2 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_pick1);
+	apply_chain(_metadata, 1, fs_pick2);
+	int fs_pick3 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_pick1);
+	apply_chain(_metadata, 0, fs_pick3);
+	EXPECT_EQ(0, close(fs_pick3));
+	EXPECT_EQ(0, close(fs_pick2));
+	EXPECT_EQ(0, close(fs_pick1));
+	EXPECT_EQ(0, close(fs_walk1));
+}
+
+TEST(chain_fs_wrong_tree_6)
+{
+	/* thread 1: fs_walk1 -> fs_pick1 -> [fs_pick2] */
+	/* thread 2: fs_walk1 -> fs_pick1 -> [fs_pick2] -> [fs_get1] */
+	int child;
+	int fs_walk1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_WALK, -1);
+	int fs_pick1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_walk1);
+	int fs_pick2 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_pick1);
+	apply_chain(_metadata, 1, fs_pick2);
+	child = fork();
+	if (child) {
+		/* parent */
+		int status;
+		waitpid(child, &status, 0);
+		EXPECT_TRUE(WIFEXITED(status) && !WEXITSTATUS(status));
+	} else {
+		/* child */
+		int fs_get1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_GET,
+				fs_pick2);
+		apply_chain(_metadata, 0, fs_get1);
+		_exit(0);
+	}
+	EXPECT_EQ(0, close(fs_pick2));
+	EXPECT_EQ(0, close(fs_pick1));
+	EXPECT_EQ(0, close(fs_walk1));
+}
+
+TEST(chain_fs_good_tree_1)
+{
+	/* fs_walk1 -> fs_pick1 -> [fs_pick2] */
+	int fs_walk1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_WALK, -1);
+	int fs_pick1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_walk1);
+	int fs_pick2 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_pick1);
+	apply_chain(_metadata, 1, fs_pick2);
+	EXPECT_EQ(0, close(fs_pick2));
+	EXPECT_EQ(0, close(fs_pick1));
+	EXPECT_EQ(0, close(fs_walk1));
+}
+
+TEST(chain_fs_good_tree_2)
+{
+	/* fs_walk1 -> fs_pick1 -> [fs_pick2] -> [fs_get1] */
+	int fs_walk1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_WALK, -1);
+	int fs_pick1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_walk1);
+	int fs_pick2 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_PICK, fs_pick1);
+	apply_chain(_metadata, 1, fs_pick2);
+	int fs_get1 = new_prog(_metadata, 1, LANDLOCK_HOOK_FS_GET, fs_pick2);
+	apply_chain(_metadata, 1, fs_get1);
+	EXPECT_EQ(0, close(fs_get1));
+	EXPECT_EQ(0, close(fs_pick2));
+	EXPECT_EQ(0, close(fs_pick1));
+	EXPECT_EQ(0, close(fs_walk1));
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/landlock/test_fs.c b/tools/testing/selftests/landlock/test_fs.c
new file mode 100644
index 000000000000..54d85b16aafb
--- /dev/null
+++ b/tools/testing/selftests/landlock/test_fs.c
@@ -0,0 +1,492 @@
+/*
+ * Landlock tests - file system
+ *
+ * Copyright © 2018 Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <fcntl.h> /* O_DIRECTORY */
+#include <sys/stat.h> /* struct stat */
+#include <unistd.h> /* faccessat() */
+
+#include "test.h"
+
+#define TEST_PATH_TRIGGERS ( \
+		LANDLOCK_TRIGGER_FS_PICK_OPEN | \
+		LANDLOCK_TRIGGER_FS_PICK_READDIR | \
+		LANDLOCK_TRIGGER_FS_PICK_EXECUTE | \
+		LANDLOCK_TRIGGER_FS_PICK_GETATTR)
+
+static void enforce_depth(struct __test_metadata *_metadata, int depth)
+{
+	const struct bpf_insn prog_walk[] = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1,
+			offsetof(struct landlock_ctx_fs_walk, cookie)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_1,
+			offsetof(struct landlock_ctx_fs_walk, inode_lookup)),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7,
+				LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_DOTDOT, 3),
+		/* assume 1 is the root */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, 1, 4),
+		BPF_ALU64_IMM(BPF_SUB, BPF_REG_6, 1),
+		BPF_JMP_IMM(BPF_JA, 0, 0, 2),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_7,
+				LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_DOT, 1),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, 1),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6,
+			offsetof(struct landlock_ctx_fs_walk, cookie)),
+		BPF_MOV32_IMM(BPF_REG_0, LANDLOCK_RET_ALLOW),
+		BPF_EXIT_INSN(),
+	};
+	const struct bpf_insn prog_pick[] = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1,
+			offsetof(struct landlock_ctx_fs_pick, cookie)),
+		/* allow without fs_walk */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, 0, 11),
+		BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_1,
+			offsetof(struct landlock_ctx_fs_walk, inode_lookup)),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7,
+				LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_DOTDOT, 3),
+		/* assume 1 is the root */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, 1, 4),
+		BPF_ALU64_IMM(BPF_SUB, BPF_REG_6, 1),
+		BPF_JMP_IMM(BPF_JA, 0, 0, 2),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_7,
+				LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_DOT, 1),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, 1),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6,
+			offsetof(struct landlock_ctx_fs_walk, cookie)),
+		/* with fs_walk */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, depth + 1, 2),
+		BPF_MOV32_IMM(BPF_REG_0, LANDLOCK_RET_DENY),
+		BPF_EXIT_INSN(),
+		BPF_MOV32_IMM(BPF_REG_0, LANDLOCK_RET_ALLOW),
+		BPF_EXIT_INSN(),
+	};
+	union bpf_prog_subtype subtype = {
+		.landlock_hook = {
+			.type = LANDLOCK_HOOK_FS_WALK,
+		}
+	};
+	int fd_walk, fd_pick;
+	char log[1030] = "";
+
+	fd_walk = bpf_load_program(BPF_PROG_TYPE_LANDLOCK_HOOK,
+			(const struct bpf_insn *)&prog_walk,
+			sizeof(prog_walk) / sizeof(struct bpf_insn), "GPL",
+			0, log, sizeof(log), &subtype);
+	ASSERT_NE(-1, fd_walk) {
+		TH_LOG("Failed to load fs_walk program: %s\n%s",
+				strerror(errno), log);
+	}
+
+	subtype.landlock_hook.type = LANDLOCK_HOOK_FS_PICK;
+	subtype.landlock_hook.options = LANDLOCK_OPTION_PREVIOUS;
+	subtype.landlock_hook.previous = fd_walk;
+	subtype.landlock_hook.triggers = TEST_PATH_TRIGGERS;
+	fd_pick = bpf_load_program(BPF_PROG_TYPE_LANDLOCK_HOOK,
+			(const struct bpf_insn *)&prog_pick,
+			sizeof(prog_pick) / sizeof(struct bpf_insn), "GPL",
+			0, log, sizeof(log), &subtype);
+	ASSERT_NE(-1, fd_pick) {
+		TH_LOG("Failed to load fs_pick program: %s\n%s",
+				strerror(errno), log);
+	}
+
+	ASSERT_EQ(0, seccomp(SECCOMP_PREPEND_LANDLOCK_PROG, 0, &fd_pick)) {
+		TH_LOG("Failed to apply Landlock chain: %s", strerror(errno));
+	}
+	EXPECT_EQ(0, close(fd_pick));
+	EXPECT_EQ(0, close(fd_walk));
+}
+
+static void test_path_rel(struct __test_metadata *_metadata, int dirfd,
+		const char *path, int ret)
+{
+	int fd;
+	struct stat statbuf;
+
+	ASSERT_EQ(ret, faccessat(dirfd, path, R_OK | X_OK, 0));
+	ASSERT_EQ(ret, fstatat(dirfd, path, &statbuf, 0));
+	fd = openat(dirfd, path, O_DIRECTORY);
+	if (ret) {
+		ASSERT_EQ(-1, fd);
+	} else {
+		ASSERT_NE(-1, fd);
+		EXPECT_EQ(0, close(fd));
+	}
+}
+
+static void test_path(struct __test_metadata *_metadata, const char *path,
+		int ret)
+{
+	test_path_rel(_metadata, AT_FDCWD, path, ret);
+}
+
+const char d1[] = "/usr";
+const char d1_dotdot1[] = "/usr/share/..";
+const char d1_dotdot2[] = "/usr/../usr/share/..";
+const char d1_dotdot3[] = "/usr/../../usr/share/..";
+const char d1_dotdot4[] = "/usr/../../../usr/share/..";
+const char d1_dotdot5[] = "/usr/../../../usr/share/../.";
+const char d1_dotdot6[] = "/././usr/./share/..";
+const char d2[] = "/usr/share";
+const char d2_dotdot1[] = "/usr/share/doc/..";
+const char d2_dotdot2[] = "/usr/../usr/share";
+const char d3[] = "/usr/share/doc";
+const char d4[] = "/etc";
+
+TEST(fs_depth_free)
+{
+	test_path(_metadata, d1, 0);
+	test_path(_metadata, d2, 0);
+	test_path(_metadata, d3, 0);
+}
+
+TEST(fs_depth_1)
+{
+	enforce_depth(_metadata, 1);
+	test_path(_metadata, d1, 0);
+	test_path(_metadata, d1_dotdot1, 0);
+	test_path(_metadata, d1_dotdot2, 0);
+	test_path(_metadata, d1_dotdot3, 0);
+	test_path(_metadata, d1_dotdot4, 0);
+	test_path(_metadata, d1_dotdot5, 0);
+	test_path(_metadata, d1_dotdot6, 0);
+	test_path(_metadata, d2, -1);
+	test_path(_metadata, d2_dotdot1, -1);
+	test_path(_metadata, d2_dotdot2, -1);
+	test_path(_metadata, d3, -1);
+}
+
+TEST(fs_depth_2)
+{
+	enforce_depth(_metadata, 2);
+	test_path(_metadata, d1, -1);
+	test_path(_metadata, d1_dotdot1, -1);
+	test_path(_metadata, d1_dotdot2, -1);
+	test_path(_metadata, d1_dotdot3, -1);
+	test_path(_metadata, d1_dotdot4, -1);
+	test_path(_metadata, d1_dotdot5, -1);
+	test_path(_metadata, d1_dotdot6, -1);
+	test_path(_metadata, d2, 0);
+	test_path(_metadata, d2_dotdot2, 0);
+	test_path(_metadata, d2_dotdot1, 0);
+	test_path(_metadata, d3, -1);
+}
+
+#define MAP_VALUE_ALLOW 1
+#define COOKIE_VALUE_ALLOW 2
+
+static int create_inode_map(struct __test_metadata *_metadata,
+		const char *const dirs[])
+{
+	int map, key, i;
+	__u64 value = MAP_VALUE_ALLOW;
+
+	ASSERT_NE(NULL, dirs) {
+		TH_LOG("No directory list\n");
+	}
+	ASSERT_NE(NULL, dirs[0]) {
+		TH_LOG("Empty directory list\n");
+	}
+	for (i = 0; dirs[i]; i++);
+	map = bpf_create_map(BPF_MAP_TYPE_INODE, sizeof(key), sizeof(value),
+			i, 0);
+	ASSERT_NE(-1, map) {
+		TH_LOG("Failed to create a map of %d elements: %s\n", i,
+				strerror(errno));
+	}
+	for (i = 0; dirs[i]; i++) {
+		key = open(dirs[i], O_RDONLY | O_CLOEXEC | O_DIRECTORY);
+		ASSERT_NE(-1, key) {
+			TH_LOG("Failed to open directory \"%s\": %s\n", dirs[i],
+					strerror(errno));
+		}
+		ASSERT_EQ(0, bpf_map_update_elem(map, &key, &value, BPF_ANY)) {
+			TH_LOG("Failed to update the map with \"%s\": %s\n",
+					dirs[i], strerror(errno));
+		}
+		close(key);
+	}
+	return map;
+}
+
+#define TAG_VALUE_ALLOW 1
+
+static void enforce_map(struct __test_metadata *_metadata, int map,
+		bool subpath, bool tag)
+{
+	/* does not handle dot or dotdot */
+	const struct bpf_insn prog_walk[] = {
+		BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_1),
+		/* look at the inode's tag */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6,
+			offsetof(struct landlock_ctx_fs_walk, inode)),
+		BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_6,
+			offsetof(struct landlock_ctx_fs_walk, chain)),
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+				BPF_FUNC_inode_get_tag),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, TAG_VALUE_ALLOW, 5),
+		/* look for the requested inode in the map */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_6,
+			offsetof(struct landlock_ctx_fs_walk, inode)),
+		BPF_LD_MAP_FD(BPF_REG_1, map), /* 2 instructions */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+				BPF_FUNC_inode_map_lookup),
+		/* if it is there, then mark the session as such */
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_0, MAP_VALUE_ALLOW, 2),
+		BPF_MOV64_IMM(BPF_REG_7, COOKIE_VALUE_ALLOW),
+		BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_7,
+			offsetof(struct landlock_ctx_fs_walk, cookie)),
+		/* allow to walk anything... but not to pick anything */
+		BPF_MOV32_IMM(BPF_REG_0, LANDLOCK_RET_ALLOW),
+		BPF_EXIT_INSN(),
+	};
+	/* does not handle dot or dotdot */
+	const struct bpf_insn prog_pick[] = {
+		BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_1),
+		/* allow if the inode's tag is marked as such */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6,
+			offsetof(struct landlock_ctx_fs_pick, inode)),
+		BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_6,
+			offsetof(struct landlock_ctx_fs_pick, chain)),
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+				BPF_FUNC_inode_get_tag),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, TAG_VALUE_ALLOW, 9),
+		/* look if the walk saw an inode in the whitelist */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_7, BPF_REG_6,
+			offsetof(struct landlock_ctx_fs_pick, cookie)),
+		/* if it was there, then allow access */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_7, COOKIE_VALUE_ALLOW, 7),
+		/* otherwise, look for the requested inode in the map */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_6,
+			offsetof(struct landlock_ctx_fs_pick, inode)),
+		BPF_LD_MAP_FD(BPF_REG_1, map), /* 2 instructions */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+				BPF_FUNC_inode_map_lookup),
+		/* if it is there, then allow access */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, MAP_VALUE_ALLOW, 2),
+		/* otherwise deny access */
+		BPF_MOV32_IMM(BPF_REG_0, LANDLOCK_RET_DENY),
+		BPF_EXIT_INSN(),
+		BPF_MOV32_IMM(BPF_REG_0, LANDLOCK_RET_ALLOW),
+		BPF_EXIT_INSN(),
+	};
+	const struct bpf_insn prog_get[] = {
+		BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_1),
+		/* if prog_pick allowed this prog_get, then keep the state in
+		 * the inode's tag */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6,
+			offsetof(struct landlock_ctx_fs_get, tag_object)),
+		BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_6,
+			offsetof(struct landlock_ctx_fs_get, chain)),
+		BPF_MOV64_IMM(BPF_REG_3, TAG_VALUE_ALLOW),
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+				BPF_FUNC_landlock_set_tag),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+		/* for this test, deny on error */
+		BPF_MOV32_IMM(BPF_REG_0, LANDLOCK_RET_DENY),
+		BPF_EXIT_INSN(),
+		/* the check was previously performed by prog_pick */
+		BPF_MOV32_IMM(BPF_REG_0, LANDLOCK_RET_ALLOW),
+		BPF_EXIT_INSN(),
+	};
+	union bpf_prog_subtype subtype = {};
+	int fd_walk = -1, fd_pick, fd_get, fd_last;
+	char log[1024] = "";
+
+	if (subpath) {
+		subtype.landlock_hook.type = LANDLOCK_HOOK_FS_WALK;
+		fd_walk = bpf_load_program(BPF_PROG_TYPE_LANDLOCK_HOOK,
+				(const struct bpf_insn *)&prog_walk,
+				sizeof(prog_walk) / sizeof(struct bpf_insn),
+				"GPL", 0, log, sizeof(log), &subtype);
+		ASSERT_NE(-1, fd_walk) {
+			TH_LOG("Failed to load fs_walk program: %s\n%s",
+					strerror(errno), log);
+		}
+		subtype.landlock_hook.options = LANDLOCK_OPTION_PREVIOUS;
+		subtype.landlock_hook.previous = fd_walk;
+	}
+
+	subtype.landlock_hook.type = LANDLOCK_HOOK_FS_PICK;
+	subtype.landlock_hook.triggers = TEST_PATH_TRIGGERS;
+	fd_pick = bpf_load_program(BPF_PROG_TYPE_LANDLOCK_HOOK,
+			(const struct bpf_insn *)&prog_pick,
+			sizeof(prog_pick) / sizeof(struct bpf_insn), "GPL", 0,
+			log, sizeof(log), &subtype);
+	ASSERT_NE(-1, fd_pick) {
+		TH_LOG("Failed to load fs_pick program: %s\n%s",
+				strerror(errno), log);
+	}
+	fd_last = fd_pick;
+
+	if (tag) {
+		subtype.landlock_hook.type = LANDLOCK_HOOK_FS_GET;
+		subtype.landlock_hook.triggers = 0;
+		subtype.landlock_hook.options = LANDLOCK_OPTION_PREVIOUS;
+		subtype.landlock_hook.previous = fd_pick;
+		fd_get = bpf_load_program(BPF_PROG_TYPE_LANDLOCK_HOOK,
+				(const struct bpf_insn *)&prog_get,
+				sizeof(prog_get) / sizeof(struct bpf_insn),
+				"GPL", 0, log, sizeof(log), &subtype);
+		ASSERT_NE(-1, fd_get) {
+			TH_LOG("Failed to load fs_get program: %s\n%s",
+					strerror(errno), log);
+		}
+		fd_last = fd_get;
+	}
+
+	ASSERT_EQ(0, seccomp(SECCOMP_PREPEND_LANDLOCK_PROG, 0, &fd_last)) {
+		TH_LOG("Failed to apply Landlock chain: %s", strerror(errno));
+	}
+	if (tag)
+		EXPECT_EQ(0, close(fd_get));
+	EXPECT_EQ(0, close(fd_pick));
+	if (subpath)
+		EXPECT_EQ(0, close(fd_walk));
+}
+
+/* does not handle dot or dotdot */
+static void check_map_whitelist(struct __test_metadata *_metadata,
+		bool subpath)
+{
+	int map = create_inode_map(_metadata, (const char *const [])
+			{ d2, NULL });
+	ASSERT_NE(-1, map);
+	enforce_map(_metadata, map, subpath, false);
+	test_path(_metadata, d1, -1);
+	test_path(_metadata, d2, 0);
+	test_path(_metadata, d3, subpath ? 0 : -1);
+	EXPECT_EQ(0, close(map));
+}
+
+TEST(fs_map_whitelist_literal)
+{
+	check_map_whitelist(_metadata, false);
+}
+
+TEST(fs_map_whitelist_subpath)
+{
+	check_map_whitelist(_metadata, true);
+}
+
+const char r2[] = ".";
+const char r3[] = "./doc";
+
+enum relative_access {
+	REL_OPEN,
+	REL_CHDIR,
+	REL_CHROOT,
+};
+
+static void check_tag(struct __test_metadata *_metadata,
+		bool enforce, bool with_tag, enum relative_access rel)
+{
+	int dirfd;
+	int map = -1;
+	int access_beneath, access_absolute;
+
+	if (rel == REL_CHROOT) {
+		/* do not tag with the chdir, only with the chroot */
+		ASSERT_NE(-1, chdir(d2));
+	}
+	if (enforce) {
+		map = create_inode_map(_metadata, (const char *const [])
+				{ d1, NULL });
+		ASSERT_NE(-1, map);
+		enforce_map(_metadata, map, true, with_tag);
+	}
+	switch (rel) {
+	case REL_OPEN:
+		dirfd = open(d2, O_DIRECTORY);
+		ASSERT_NE(-1, dirfd);
+		break;
+	case REL_CHDIR:
+		ASSERT_NE(-1, chdir(d2));
+		dirfd = AT_FDCWD;
+		break;
+	case REL_CHROOT:
+		ASSERT_NE(-1, chroot(d2)) {
+			TH_LOG("Failed to chroot: %s\n", strerror(errno));
+		}
+		dirfd = AT_FDCWD;
+		break;
+	default:
+		ASSERT_TRUE(false);
+		return;
+	}
+
+	access_beneath = (!enforce || with_tag) ? 0 : -1;
+	test_path_rel(_metadata, dirfd, r2, access_beneath);
+	test_path_rel(_metadata, dirfd, r3, access_beneath);
+
+	access_absolute = (enforce || rel == REL_CHROOT) ? -1 : 0;
+	test_path(_metadata, d4, access_absolute);
+	test_path_rel(_metadata, dirfd, d4, access_absolute);
+
+	if (rel == REL_OPEN)
+		EXPECT_EQ(0, close(dirfd));
+	if (enforce)
+		EXPECT_EQ(0, close(map));
+}
+
+TEST(fs_notag_allow_open)
+{
+	/* no enforcement, via open */
+	check_tag(_metadata, false, false, REL_OPEN);
+}
+
+TEST(fs_notag_allow_chdir)
+{
+	/* no enforcement, via chdir */
+	check_tag(_metadata, false, false, REL_CHDIR);
+}
+
+TEST(fs_notag_allow_chroot)
+{
+	/* no enforcement, via chroot */
+	check_tag(_metadata, false, false, REL_CHROOT);
+}
+
+TEST(fs_notag_deny_open)
+{
+	/* enforcement without tag, via open */
+	check_tag(_metadata, true, false, REL_OPEN);
+}
+
+TEST(fs_notag_deny_chdir)
+{
+	/* enforcement without tag, via chdir */
+	check_tag(_metadata, true, false, REL_CHDIR);
+}
+
+TEST(fs_notag_deny_chroot)
+{
+	/* enforcement without tag, via chroot */
+	check_tag(_metadata, true, false, REL_CHROOT);
+}
+
+TEST(fs_tag_allow_open)
+{
+	/* enforcement with tag, via open */
+	check_tag(_metadata, true, true, REL_OPEN);
+}
+
+TEST(fs_tag_allow_chdir)
+{
+	/* enforcement with tag, via chdir */
+	check_tag(_metadata, true, true, REL_CHDIR);
+}
+
+TEST(fs_tag_allow_chroot)
+{
+	/* enforcement with tag, via chroot */
+	check_tag(_metadata, true, true, REL_CHROOT);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/landlock/test_ptrace.c b/tools/testing/selftests/landlock/test_ptrace.c
new file mode 100644
index 000000000000..1423a60b6e0a
--- /dev/null
+++ b/tools/testing/selftests/landlock/test_ptrace.c
@@ -0,0 +1,158 @@
+/*
+ * Landlock tests - ptrace
+ *
+ * Copyright © 2017 Mickaël Salaün <mic@digikod.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#define _GNU_SOURCE
+#include <signal.h> /* raise */
+#include <sys/ptrace.h>
+#include <sys/types.h> /* waitpid */
+#include <sys/wait.h> /* waitpid */
+#include <unistd.h> /* fork, pipe */
+
+#include "test.h"
+
+static void apply_null_sandbox(struct __test_metadata *_metadata)
+{
+	const struct bpf_insn prog_accept[] = {
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	const union bpf_prog_subtype subtype = {
+		.landlock_hook = {
+			.type = LANDLOCK_HOOK_FS_PICK,
+			.triggers = LANDLOCK_TRIGGER_FS_PICK_OPEN,
+		}
+	};
+	int prog;
+	char log[256] = "";
+
+	prog = bpf_load_program(BPF_PROG_TYPE_LANDLOCK_HOOK,
+			(const struct bpf_insn *)&prog_accept,
+			sizeof(prog_accept) / sizeof(struct bpf_insn), "GPL",
+			0, log, sizeof(log), &subtype);
+	ASSERT_NE(-1, prog) {
+		TH_LOG("Failed to load minimal rule: %s\n%s",
+				strerror(errno), log);
+	}
+	ASSERT_EQ(0, seccomp(SECCOMP_PREPEND_LANDLOCK_PROG, 0, &prog)) {
+		TH_LOG("Failed to apply minimal rule: %s", strerror(errno));
+	}
+	EXPECT_EQ(0, close(prog));
+}
+
+/* Check PTRACE_TRACEME and PTRACE_ATTACH with accept-all Landlock programs */
+static void check_ptrace(struct __test_metadata *_metadata,
+		int sandbox_both, int sandbox_parent, int sandbox_child,
+		int expect_ptrace)
+{
+	pid_t child;
+	int status;
+	int pipefd[2];
+
+	ASSERT_EQ(0, pipe(pipefd));
+	if (sandbox_both)
+		apply_null_sandbox(_metadata);
+
+	child = fork();
+	ASSERT_LE(0, child);
+	if (child == 0) {
+		char buf;
+
+		EXPECT_EQ(0, close(pipefd[1]));
+		if (sandbox_child)
+			apply_null_sandbox(_metadata);
+
+		/* test traceme */
+		ASSERT_EQ(expect_ptrace, ptrace(PTRACE_TRACEME));
+		if (expect_ptrace) {
+			ASSERT_EQ(EPERM, errno);
+		} else {
+			ASSERT_EQ(0, raise(SIGSTOP));
+		}
+
+		/* sync */
+		ASSERT_EQ(1, read(pipefd[0], &buf, 1)) {
+			TH_LOG("Failed to read() sync from parent");
+		}
+		ASSERT_EQ('.', buf);
+		_exit(_metadata->passed ? EXIT_SUCCESS : EXIT_FAILURE);
+	}
+
+	EXPECT_EQ(0, close(pipefd[0]));
+	if (sandbox_parent)
+		apply_null_sandbox(_metadata);
+
+	/* test traceme */
+	if (!expect_ptrace) {
+		ASSERT_EQ(child, waitpid(child, &status, 0));
+		ASSERT_EQ(1, WIFSTOPPED(status));
+		ASSERT_EQ(0, ptrace(PTRACE_DETACH, child, NULL, 0));
+	}
+	/* test attach */
+	ASSERT_EQ(expect_ptrace, ptrace(PTRACE_ATTACH, child, NULL, 0));
+	if (expect_ptrace) {
+		ASSERT_EQ(EPERM, errno);
+	} else {
+		ASSERT_EQ(child, waitpid(child, &status, 0));
+		ASSERT_EQ(1, WIFSTOPPED(status));
+		ASSERT_EQ(0, ptrace(PTRACE_CONT, child, NULL, 0));
+	}
+
+	/* sync */
+	ASSERT_EQ(1, write(pipefd[1], ".", 1)) {
+		TH_LOG("Failed to write() sync to child");
+	}
+	ASSERT_EQ(child, waitpid(child, &status, 0));
+	if (WIFSIGNALED(status) || WEXITSTATUS(status))
+		_metadata->passed = 0;
+}
+
+TEST(ptrace_allow_without_sandbox)
+{
+	/* no sandbox */
+	check_ptrace(_metadata, 0, 0, 0, 0);
+}
+
+TEST(ptrace_allow_with_one_sandbox)
+{
+	/* child sandbox */
+	check_ptrace(_metadata, 0, 0, 1, 0);
+}
+
+TEST(ptrace_allow_with_nested_sandbox)
+{
+	/* inherited and child sandbox */
+	check_ptrace(_metadata, 1, 0, 1, 0);
+}
+
+TEST(ptrace_deny_with_parent_sandbox)
+{
+	/* parent sandbox */
+	check_ptrace(_metadata, 0, 1, 0, -1);
+}
+
+TEST(ptrace_deny_with_nested_and_parent_sandbox)
+{
+	/* inherited and parent sandbox */
+	check_ptrace(_metadata, 1, 1, 0, -1);
+}
+
+TEST(ptrace_deny_with_forked_sandbox)
+{
+	/* inherited, parent and child sandbox */
+	check_ptrace(_metadata, 1, 1, 1, -1);
+}
+
+TEST(ptrace_deny_with_sibling_sandbox)
+{
+	/* parent and child sandbox */
+	check_ptrace(_metadata, 0, 1, 1, -1);
+}
+
+TEST_HARNESS_MAIN
-- 
2.16.2



* [PATCH bpf-next v8 11/11] landlock: Add user and kernel documentation for Landlock
  2018-02-27  0:41 [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing Mickaël Salaün
                   ` (9 preceding siblings ...)
  2018-02-27  0:41 ` [PATCH bpf-next v8 10/11] bpf,landlock: Add tests for Landlock Mickaël Salaün
@ 2018-02-27  0:41 ` Mickaël Salaün
  2018-02-27  4:36 ` [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing Andy Lutomirski
  11 siblings, 0 replies; 55+ messages in thread
From: Mickaël Salaün @ 2018-02-27  0:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev

This documentation can be built with the Sphinx framework.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v7:
* update documentation according to the Landlock revamp

Changes since v6:
* add a check for ctx->event
* rename BPF_PROG_TYPE_LANDLOCK to BPF_PROG_TYPE_LANDLOCK_RULE
* rename Landlock version to ABI to better reflect its purpose and add a
  dedicated changelog section
* update tables
* relax no_new_privs recommendations
* remove ABILITY_WRITE related functions
* reword rule "appending" to "prepending" and explain it
* cosmetic fixes

Changes since v5:
* update the rule hierarchy inheritance explanation
* briefly explain ctx->arg2
* add ptrace restrictions
* explain EPERM
* update example (subtype)
* use ":manpage:"
---
 Documentation/security/index.rst           |   1 +
 Documentation/security/landlock/index.rst  |  19 +++
 Documentation/security/landlock/kernel.rst | 100 ++++++++++++++
 Documentation/security/landlock/user.rst   | 206 +++++++++++++++++++++++++++++
 4 files changed, 326 insertions(+)
 create mode 100644 Documentation/security/landlock/index.rst
 create mode 100644 Documentation/security/landlock/kernel.rst
 create mode 100644 Documentation/security/landlock/user.rst

diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
index 298a94a33f05..1db294025d0f 100644
--- a/Documentation/security/index.rst
+++ b/Documentation/security/index.rst
@@ -11,3 +11,4 @@ Security Documentation
    LSM
    self-protection
    tpm/index
+   landlock/index
diff --git a/Documentation/security/landlock/index.rst b/Documentation/security/landlock/index.rst
new file mode 100644
index 000000000000..8afde6a5805c
--- /dev/null
+++ b/Documentation/security/landlock/index.rst
@@ -0,0 +1,19 @@
+=========================================
+Landlock LSM: programmatic access control
+=========================================
+
+Landlock is a stackable Linux Security Module (LSM) that makes it possible to
+create security sandboxes.  This kind of sandbox is expected to help mitigate
+the security impact of bugs or unexpected/malicious behaviors in user-space
+applications.  The current version allows only a process with the global
+CAP_SYS_ADMIN capability to create such sandboxes but the ultimate goal of
+Landlock is to empower any process, including unprivileged ones, to securely
+restrict themselves.  Landlock is inspired by seccomp-bpf but instead of
+filtering syscalls and their raw arguments, a Landlock rule can inspect the use
+of kernel objects like files and hence make a decision according to the kernel
+semantics.
+
+.. toctree::
+
+    user
+    kernel
diff --git a/Documentation/security/landlock/kernel.rst b/Documentation/security/landlock/kernel.rst
new file mode 100644
index 000000000000..0a52915e346c
--- /dev/null
+++ b/Documentation/security/landlock/kernel.rst
@@ -0,0 +1,100 @@
+==============================
+Landlock: kernel documentation
+==============================
+
+eBPF properties
+===============
+
+To get an expressive language while still being safe and small, Landlock is
+based on eBPF. Landlock should be usable by untrusted processes and must
+therefore expose a minimal attack surface. The eBPF bytecode is minimal,
+powerful, widely used and designed to be used by untrusted applications. Thus,
+reusing the eBPF support in the kernel enables a generic approach while
+minimizing new code.
+
+An eBPF program has access to an eBPF context containing some fields used to
+inspect the current object. These arguments can be used directly (e.g. cookie)
+or passed to helper functions according to their types (e.g. inode pointer). It
+is then possible to do complex access checks without race conditions or
+inconsistent evaluation (i.e.  `incorrect mirroring of the OS code and state
+<https://www.ndss-symposium.org/ndss2003/traps-and-pitfalls-practical-problems-system-call-interposition-based-security-tools/>`_).
+
+A Landlock hook describes a particular access type.  For now, there are three
+hooks dedicated to filesystem related operations: LANDLOCK_HOOK_FS_PICK,
+LANDLOCK_HOOK_FS_WALK and LANDLOCK_HOOK_FS_GET.  A Landlock program is tied to
+one hook.  This makes it possible to statically check context accesses,
+potentially performed by such a program, and hence prevents kernel address
+leaks and ensures the right use of hook arguments with eBPF functions.  Any
+user can add multiple Landlock programs per Landlock hook.  They are stacked
+and evaluated one after the other, starting from the most recent program, as
+seccomp-bpf does with its filters.  Underneath, a hook is an abstraction over a
+set of LSM hooks.
+
+
+Guiding principles
+==================
+
+Unprivileged use
+----------------
+
+* Landlock helpers and context should be usable by any unprivileged and
+  untrusted program while following the system security policy enforced by
+  other access control mechanisms (e.g. DAC, LSM).
+
+
+Landlock hook and context
+-------------------------
+
+* A Landlock hook shall be focused on access control on kernel objects instead
+  of syscall filtering (i.e. syscall arguments), which is the purpose of
+  seccomp-bpf.
+* A Landlock context provided by a hook shall express the minimal and most
+  generic interface to control an access for a kernel object.
+* A hook shall guarantee that all the BPF function calls from a program are
+  safe.  Thus, the related Landlock context arguments shall always be of the
+  same type for a particular hook.  For example, a network hook could share
+  helpers with a file hook because of UNIX sockets.  However, the same helpers
+  may not be compatible for an FS handle and a net handle.
+* Multiple hooks may use the same context interface.
+
+
+Landlock helpers
+----------------
+
+* Landlock helpers shall be as generic as possible while at the same time being
+  as simple as possible and following the syscall creation principles (cf.
+  *Documentation/process/adding-syscalls.rst*).
+* The only behavior change allowed on a helper is to fix a (logical) bug to
+  match the initial semantics.
+* Helpers shall be reentrant, i.e. only take inputs from arguments (e.g. from
+  the BPF context), to enable a hook to use a cache.  Future program options
+  might change this cache behavior.
+* It is quite easy to add new helpers to extend Landlock.  The main concern
+  should be about the possibility to leak information from the kernel that may
+  not be accessible otherwise (i.e. side-channel attack).
+
+
+Questions and answers
+=====================
+
+Why not create a custom hook for each kind of action?
+-----------------------------------------------------
+
+Landlock programs can handle these checks.  Adding more exceptions to the
+kernel code would lead to more code complexity.  A decision to ignore a kind of
+action can and should be done at the beginning of a Landlock program.
+
+
+Why does a program not return an errno or a kill code?
+------------------------------------------------------
+
+seccomp filters can return multiple kinds of codes, including an errno value or
+a kill signal, which may be convenient for access control.  Those return codes
+are hardwired in the userland ABI.  Instead, Landlock's approach is to return a
+boolean to allow or deny an action, which is much simpler and more generic.
+Moreover, we do not really have a choice because, unlike seccomp, Landlock
+programs are not enforced at the syscall entry point but may be executed at any
+point in the kernel (through LSM hooks) where an errno return code may not make
+sense.  However, with this simple ABI and with the ability to call helpers,
+Landlock may gain features similar to seccomp-bpf in the future while being
+compatible with previous programs.
diff --git a/Documentation/security/landlock/user.rst b/Documentation/security/landlock/user.rst
new file mode 100644
index 000000000000..3130063c9087
--- /dev/null
+++ b/Documentation/security/landlock/user.rst
@@ -0,0 +1,206 @@
+================================
+Landlock: userland documentation
+================================
+
+Landlock programs
+=================
+
+eBPF programs are used to create security programs.  They are contained and can
+call only a whitelist of dedicated functions. Moreover, they cannot loop, which
+protects from denial of service.  More information on BPF can be found in
+*Documentation/networking/filter.txt*.
+
+
+Writing a program
+-----------------
+
+To enforce a security policy, a thread first needs to create a Landlock program.
+The easiest way to write an eBPF program depicting a security program is to write
+it in the C language.  As described in *samples/bpf/README.rst*, LLVM can
+compile such programs.  Files *samples/bpf/landlock1_kern.c* and those in
+*tools/testing/selftests/landlock/* can be used as examples.
+
+Once the eBPF program is created, the next step is to create the metadata
+describing the Landlock program.  This metadata includes a subtype which
+contains the hook type to which the program is tied and some options.
+
+.. code-block:: c
+
+    union bpf_prog_subtype subtype = {
+        .landlock_hook = {
+            .type = LANDLOCK_HOOK_FS_PICK,
+            .triggers = LANDLOCK_TRIGGER_FS_PICK_OPEN,
+        }
+    };
+
+A Landlock hook describes the kind of kernel object for which a program will be
+triggered to allow or deny an action.  For example, the hook
+LANDLOCK_HOOK_FS_PICK can be triggered every time a landlocked thread performs
+a set of actions related to the filesystem (e.g. open, read, write, mount...).
+These actions are identified by the `triggers` bitfield.
+
+The next step is to fill a :c:type:`union bpf_attr <bpf_attr>` with
+BPF_PROG_TYPE_LANDLOCK_HOOK, the previously created subtype and other BPF
+program metadata.  This bpf_attr must then be passed to the :manpage:`bpf(2)`
+syscall alongside the BPF_PROG_LOAD command.  If everything is deemed correct
+by the kernel, the thread gets a file descriptor referring to this program.
+
+In the following code, the *insn* variable is an array of BPF instructions
+which can be extracted from an ELF file as is done in bpf_load_file() from
+*samples/bpf/bpf_load.c*.
+
+.. code-block:: c
+
+    union bpf_attr attr = {
+        .prog_type = BPF_PROG_TYPE_LANDLOCK_HOOK,
+        .insn_cnt = sizeof(insn) / sizeof(struct bpf_insn),
+        .insns = (__u64) (unsigned long) insn,
+        .license = (__u64) (unsigned long) "GPL",
+        .prog_subtype = &subtype,
+        .prog_subtype_size = sizeof(subtype),
+    };
+    int fd = bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
+    if (fd == -1)
+        exit(1);
+
+
+Enforcing a program
+-------------------
+
+Once the Landlock program has been created or received (e.g. through a UNIX
+socket), the thread willing to sandbox itself (and its future children) should
+perform the following two steps.
+
+The thread should first request to never be allowed to get new privileges with a
+call to :manpage:`prctl(2)` and the PR_SET_NO_NEW_PRIVS option.  More
+information can be found in *Documentation/userspace-api/no_new_privs.rst*.
+
+.. code-block:: c
+
+    if (prctl(PR_SET_NO_NEW_PRIVS, 1, NULL, 0, 0))
+        exit(1);
+
+A thread can apply a program to itself by using the :manpage:`seccomp(2)` syscall.
+The operation is SECCOMP_PREPEND_LANDLOCK_PROG, the flags must be empty and the
+*args* argument must point to a valid Landlock program file descriptor.
+
+.. code-block:: c
+
+    if (seccomp(SECCOMP_PREPEND_LANDLOCK_PROG, 0, &fd))
+        exit(1);
+
+If the syscall succeeds, the program is now enforced on the calling thread and
+will be enforced on all subsequently created children of the thread as well.
+Once a thread is landlocked, there is no way to remove this security policy;
+only stacking more restrictions is allowed.  The program evaluation is
+performed from the newest to the oldest.
+
+When a syscall asks for an action on a kernel object, if this action is denied,
+then an EACCES errno code is returned through the syscall.
+
+
+.. _inherited_programs:
+
+Inherited programs
+------------------
+
+Every new thread resulting from a :manpage:`clone(2)` inherits Landlock program
+restrictions from its parent.  This is similar to the seccomp inheritance as
+described in *Documentation/userspace-api/seccomp_filter.rst*.
+
+
+Ptrace restrictions
+-------------------
+
+A landlocked process has fewer privileges than a non-landlocked process and must
+then be subject to additional restrictions when manipulating another process.
+To be allowed to use :manpage:`ptrace(2)` and related syscalls on a target
+process, a landlocked process must have a subset of the target's programs.
+
+
+Chained programs
+================
+
+Landlock programs can be chained according to the hook they are tied to.  This
+allows keeping state between multiple program evaluations for an object access
+check (e.g. walking through a file path).  The *cookie* field from the context
+can be used as temporary storage shared between a chain of programs.
+
+The following graph is an example of the chain of programs used in
+*samples/bpf/landlock1_kern.c*.  The fs_walk program evaluates whether a file
+is beneath a set of file hierarchies.  The first fs_pick program may be called
+when there is a read-like action (i.e. trigger for open, chdir, getattr...).
+The second fs_pick program may be called for write-like actions.  And finally,
+the fs_get program is called to tag a file when it is opened, received or when
+the current task changes directory.  This tagging is needed to be able to keep
+the state of this file evaluation for a later one involving the same open file.
+
+::
+
+    .---------.
+    | fs_walk |
+    '---------'
+         |
+         v
+    .---------.
+    | fs_pick |  open, chdir, getattr...
+    '---------'
+         |
+         v
+    .---------.
+    | fs_pick |  create, write, link...
+    '---------'
+         |
+         v
+    .--------.
+    | fs_get |
+    '--------'
+
+
+Landlock structures and constants
+=================================
+
+Hook types
+----------
+
+.. kernel-doc:: include/uapi/linux/landlock.h
+    :functions: landlock_hook_type
+
+
+Contexts
+--------
+
+.. kernel-doc:: include/uapi/linux/landlock.h
+    :functions: landlock_ctx_fs_pick landlock_ctx_fs_walk landlock_ctx_fs_get
+
+
+Triggers for fs_pick
+--------------------
+
+.. kernel-doc:: include/uapi/linux/landlock.h
+    :functions: landlock_triggers
+
+
+Helper functions
+----------------
+
+::
+
+    u64 bpf_inode_get_tag(inode, chain)
+        @inode: pointer to struct inode
+        @chain: pointer to struct landlock_chain
+        Return: tag tied to this inode and chain, or zero if none
+
+    int bpf_landlock_set_tag(tag_obj, chain, value)
+        @tag_obj: pointer to a taggable object (e.g. inode)
+        @chain: pointer to struct landlock_chain
+        @value: value of the tag
+        Return: 0 on success or negative error code
+
+See *include/uapi/linux/bpf.h* for documentation of the other functions.
+
+
+Additional documentation
+========================
+
+See https://landlock.io
-- 
2.16.2



* Re: [PATCH bpf-next v8 01/11] fs,security: Add a security blob to nameidata
  2018-02-27  0:41 ` [PATCH bpf-next v8 01/11] fs,security: Add a security blob to nameidata Mickaël Salaün
@ 2018-02-27  0:57   ` Al Viro
  2018-02-27  1:23     ` Al Viro
  2018-02-28 16:27   ` kbuild test robot
  2018-02-28 16:58   ` kbuild test robot
  2 siblings, 1 reply; 55+ messages in thread
From: Al Viro @ 2018-02-27  0:57 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	James Morris, John Johansen, Stephen Smalley, Tetsuo Handa,
	linux-fsdevel

On Tue, Feb 27, 2018 at 01:41:11AM +0100, Mickaël Salaün wrote:
> The function current_nameidata_security(struct inode *) can be used to
> retrieve a blob's pointer address tied to the inode being walk through.
> This enable to follow a path lookup and know where an inode access come
> from. This is needed for the Landlock LSM to be able to restrict access
> to file path.
> 
> The LSM hook nameidata_free_security(struct inode *) is called before
> freeing the associated nameidata.

NAK.  Not without well-defined semantics and "some Linux S&M uses that for
something, don't ask what" does not count.


* Re: [PATCH bpf-next v8 01/11] fs,security: Add a security blob to nameidata
  2018-02-27  0:57   ` Al Viro
@ 2018-02-27  1:23     ` Al Viro
  2018-03-11 20:14       ` Mickaël Salaün
  0 siblings, 1 reply; 55+ messages in thread
From: Al Viro @ 2018-02-27  1:23 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	James Morris, John Johansen, Stephen Smalley, Tetsuo Handa,
	linux-fsdevel

On Tue, Feb 27, 2018 at 12:57:21AM +0000, Al Viro wrote:
> On Tue, Feb 27, 2018 at 01:41:11AM +0100, Mickaël Salaün wrote:
> > The function current_nameidata_security(struct inode *) can be used to
> > retrieve a blob's pointer address tied to the inode being walk through.
> > This enable to follow a path lookup and know where an inode access come
> > from. This is needed for the Landlock LSM to be able to restrict access
> > to file path.
> > 
> > The LSM hook nameidata_free_security(struct inode *) is called before
> > freeing the associated nameidata.
> 
> NAK.  Not without well-defined semantics and "some Linux S&M uses that for
> something, don't ask what" does not count.

Incidentally, pathwalk mechanics is subject to change at zero notice, so
if you want something, you'd better
	* have explicitly defined semantics
	* explain what it is - on fsdevel
	* not have it hidden behind the layers of opaque LSM dreck, pardon
the redundance.

Again, pathwalk internals have changed in the past and may bloody well
change again in the future.  There's a damn good reason why struct nameidata
is _not_ visible outside of fs/namei.c, and quietly relying upon any
implementation details is no-go.


* Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
  2018-02-27  0:41 ` [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy Mickaël Salaün
@ 2018-02-27  2:08   ` Alexei Starovoitov
  2018-02-27  4:40     ` Andy Lutomirski
  0 siblings, 1 reply; 55+ messages in thread
From: Alexei Starovoitov @ 2018-02-27  2:08 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	Andrew Morton

On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote:
> The seccomp(2) syscall can be used by a task to apply a Landlock program
> to itself. As a seccomp filter, a Landlock program is enforced for the
> current task and all its future children. A program is immutable and a
> task can only add new restricting programs to itself, forming a list of
> programss.
> 
> A Landlock program is tied to a Landlock hook. If the action on a kernel
> object is allowed by the other Linux security mechanisms (e.g. DAC,
> capabilities, other LSM), then a Landlock hook related to this kind of
> object is triggered. The list of programs for this hook is then
> evaluated. Each program return a 32-bit value which can deny the action
> on a kernel object with a non-zero value. If every programs of the list
> return zero, then the action on the object is allowed.
> 
> Multiple Landlock programs can be chained to share a 64-bits value for a
> call chain (e.g. evaluating multiple elements of a file path).  This
> chaining is restricted when a process construct this chain by loading a
> program, but additional checks are performed when it requests to apply
> this chain of programs to itself.  The restrictions ensure that it is
> not possible to call multiple programs in a way that would imply to
> handle multiple shared values (i.e. cookies) for one chain.  For now,
> only a fs_pick program can be chained to the same type of program,
> because it may make sense if they have different triggers (cf. next
> commits).  This restrictions still allows to reuse Landlock programs in
> a safe way (e.g. use the same loaded fs_walk program with multiple
> chains of fs_pick programs).
> 
> Signed-off-by: Mickaël Salaün <mic@digikod.net>

...

> +struct landlock_prog_set *landlock_prepend_prog(
> +		struct landlock_prog_set *current_prog_set,
> +		struct bpf_prog *prog)
> +{
> +	struct landlock_prog_set *new_prog_set = current_prog_set;
> +	unsigned long pages;
> +	int err;
> +	size_t i;
> +	struct landlock_prog_set tmp_prog_set = {};
> +
> +	if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK)
> +		return ERR_PTR(-EINVAL);
> +
> +	/* validate memory size allocation */
> +	pages = prog->pages;
> +	if (current_prog_set) {
> +		size_t i;
> +
> +		for (i = 0; i < ARRAY_SIZE(current_prog_set->programs); i++) {
> +			struct landlock_prog_list *walker_p;
> +
> +			for (walker_p = current_prog_set->programs[i];
> +					walker_p; walker_p = walker_p->prev)
> +				pages += walker_p->prog->pages;
> +		}
> +		/* count a struct landlock_prog_set if we need to allocate one */
> +		if (refcount_read(&current_prog_set->usage) != 1)
> +			pages += round_up(sizeof(*current_prog_set), PAGE_SIZE)
> +				/ PAGE_SIZE;
> +	}
> +	if (pages > LANDLOCK_PROGRAMS_MAX_PAGES)
> +		return ERR_PTR(-E2BIG);
> +
> +	/* ensure early that we can allocate enough memory for the new
> +	 * prog_lists */
> +	err = store_landlock_prog(&tmp_prog_set, current_prog_set, prog);
> +	if (err)
> +		return ERR_PTR(err);
> +
> +	/*
> +	 * Each task_struct points to an array of prog list pointers.  These
> +	 * tables are duplicated when additions are made (which means each
> +	 * table needs to be refcounted for the processes using it). When a new
> +	 * table is created, all the refcounters on the prog_list are bumped (to
> +	 * track each table that references the prog). When a new prog is
> +	 * added, it's just prepended to the list for the new table to point
> +	 * at.
> +	 *
> +	 * Manage all the possible errors before this step to not uselessly
> +	 * duplicate current_prog_set and avoid a rollback.
> +	 */
> +	if (!new_prog_set) {
> +		/*
> +		 * If there is no Landlock program set used by the current task,
> +		 * then create a new one.
> +		 */
> +		new_prog_set = new_landlock_prog_set();
> +		if (IS_ERR(new_prog_set))
> +			goto put_tmp_lists;
> +	} else if (refcount_read(&current_prog_set->usage) > 1) {
> +		/*
> +		 * If the current task is not the sole user of its Landlock
> +		 * program set, then duplicate them.
> +		 */
> +		new_prog_set = new_landlock_prog_set();
> +		if (IS_ERR(new_prog_set))
> +			goto put_tmp_lists;
> +		for (i = 0; i < ARRAY_SIZE(new_prog_set->programs); i++) {
> +			new_prog_set->programs[i] =
> +				READ_ONCE(current_prog_set->programs[i]);
> +			if (new_prog_set->programs[i])
> +				refcount_inc(&new_prog_set->programs[i]->usage);
> +		}
> +
> +		/*
> +		 * Landlock program set from the current task will not be freed
> +		 * here because the usage is strictly greater than 1. It is
> +		 * only prevented to be freed by another task thanks to the
> +		 * caller of landlock_prepend_prog() which should be locked if
> +		 * needed.
> +		 */
> +		landlock_put_prog_set(current_prog_set);
> +	}
> +
> +	/* prepend tmp_prog_set to new_prog_set */
> +	for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++) {
> +		/* get the last new list */
> +		struct landlock_prog_list *last_list =
> +			tmp_prog_set.programs[i];
> +
> +		if (last_list) {
> +			while (last_list->prev)
> +				last_list = last_list->prev;
> +			/* no need to increment usage (pointer replacement) */
> +			last_list->prev = new_prog_set->programs[i];
> +			new_prog_set->programs[i] = tmp_prog_set.programs[i];
> +		}
> +	}
> +	new_prog_set->chain_last = tmp_prog_set.chain_last;
> +	return new_prog_set;
> +
> +put_tmp_lists:
> +	for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++)
> +		put_landlock_prog_list(tmp_prog_set.programs[i]);
> +	return new_prog_set;
> +}

Nack on the chaining concept.
Please do not reinvent the wheel.
There is an existing mechanism for attaching/detaching/querying multiple
programs attached to cgroup and tracing hooks that are also
efficiently executed via BPF_PROG_RUN_ARRAY.
Please use that instead.
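
For illustration only, the array-style evaluation referred to above can be
modelled in user space roughly as follows.  This is a sketch, not the kernel's
BPF_PROG_RUN_ARRAY macro: prog_fn, struct prog_array and run_prog_array() are
hypothetical names, and the "non-zero return denies" rule is taken from the
quoted commit message.  It runs every program instead of stopping at the first
denial, which is believed to match how the cgroup hooks combine results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a loaded program: returns 0 to allow. */
typedef uint32_t (*prog_fn)(const void *ctx);

/* Hypothetical flat array of the programs attached to one hook. */
struct prog_array {
	size_t count;
	const prog_fn *progs;
};

/*
 * Run every program in the array.  Returns 0 if all programs returned zero
 * (access allowed) and 1 if at least one returned non-zero (access denied).
 */
static int run_prog_array(const struct prog_array *array, const void *ctx)
{
	int denied = 0;
	size_t i;

	if (!array)
		return 0;	/* nothing attached: nothing to deny */
	for (i = 0; i < array->count; i++)
		if (array->progs[i](ctx) != 0)
			denied = 1;
	return denied;
}

/* Two trivial example programs (assumptions for the demo). */
static uint32_t allow_all(const void *ctx) { (void)ctx; return 0; }
static uint32_t deny_all(const void *ctx) { (void)ctx; return 1; }

/* Usage checks for the sketch. */
static void run_prog_array_selfcheck(void)
{
	const prog_fn permissive[] = { allow_all, allow_all };
	const prog_fn mixed[] = { allow_all, deny_all };
	const struct prog_array a = { 2, permissive };
	const struct prog_array b = { 2, mixed };

	assert(run_prog_array(&a, NULL) == 0);
	assert(run_prog_array(&b, NULL) == 1);
	assert(run_prog_array(NULL, NULL) == 0);
}
```

With a flat array there is no per-chain linked list to walk or refcount; any
state shared between programs would have to live elsewhere.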



* Re: [PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions
  2018-02-27  0:41 ` [PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions Mickaël Salaün
@ 2018-02-27  4:17   ` Andy Lutomirski
  2018-02-27  5:01     ` Andy Lutomirski
  2018-02-27 22:18     ` Mickaël Salaün
  0 siblings, 2 replies; 55+ messages in thread
From: Andy Lutomirski @ 2018-02-27  4:17 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Casey Schaufler, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, James Morris, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development

On Tue, Feb 27, 2018 at 12:41 AM, Mickaël Salaün <mic@digikod.net> wrote:
> A landlocked process has less privileges than a non-landlocked process
> and must then be subject to additional restrictions when manipulating
> processes. To be allowed to use ptrace(2) and related syscalls on a
> target process, a landlocked process must have a subset of the target
> process' rules.
>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: James Morris <james.l.morris@oracle.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> ---
>
> Changes since v6:
> * factor out ptrace check
> * constify pointers
> * cleanup headers
> * use the new security_add_hooks()
> ---
>  security/landlock/Makefile       |   2 +-
>  security/landlock/hooks_ptrace.c | 124 +++++++++++++++++++++++++++++++++++++++
>  security/landlock/hooks_ptrace.h |  11 ++++
>  security/landlock/init.c         |   2 +
>  4 files changed, 138 insertions(+), 1 deletion(-)
>  create mode 100644 security/landlock/hooks_ptrace.c
>  create mode 100644 security/landlock/hooks_ptrace.h
>
> diff --git a/security/landlock/Makefile b/security/landlock/Makefile
> index d0f532a93b4e..605504d852d3 100644
> --- a/security/landlock/Makefile
> +++ b/security/landlock/Makefile
> @@ -3,4 +3,4 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
>  landlock-y := init.o chain.o task.o \
>         tag.o tag_fs.o \
>         enforce.o enforce_seccomp.o \
> -       hooks.o hooks_cred.o hooks_fs.o
> +       hooks.o hooks_cred.o hooks_fs.o hooks_ptrace.o
> diff --git a/security/landlock/hooks_ptrace.c b/security/landlock/hooks_ptrace.c
> new file mode 100644
> index 000000000000..f1b977b9c808
> --- /dev/null
> +++ b/security/landlock/hooks_ptrace.c
> @@ -0,0 +1,124 @@
> +/*
> + * Landlock LSM - ptrace hooks
> + *
> + * Copyright © 2017 Mickaël Salaün <mic@digikod.net>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2, as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <asm/current.h>
> +#include <linux/errno.h>
> +#include <linux/kernel.h> /* ARRAY_SIZE */
> +#include <linux/lsm_hooks.h>
> +#include <linux/sched.h> /* struct task_struct */
> +#include <linux/seccomp.h>
> +
> +#include "common.h" /* struct landlock_prog_set */
> +#include "hooks.h" /* landlocked() */
> +#include "hooks_ptrace.h"
> +
> +static bool progs_are_subset(const struct landlock_prog_set *parent,
> +               const struct landlock_prog_set *child)
> +{
> +       size_t i;
> +
> +       if (!parent || !child)
> +               return false;
> +       if (parent == child)
> +               return true;
> +
> +       for (i = 0; i < ARRAY_SIZE(child->programs); i++) {

ARRAY_SIZE(child->programs) seems misleading.  Is there no define
NUM_LANDLOCK_PROG_TYPES or similar?

> +               struct landlock_prog_list *walker;
> +               bool found_parent = false;
> +
> +               if (!parent->programs[i])
> +                       continue;
> +               for (walker = child->programs[i]; walker;
> +                               walker = walker->prev) {
> +                       if (walker == parent->programs[i]) {
> +                               found_parent = true;
> +                               break;
> +                       }
> +               }
> +               if (!found_parent)
> +                       return false;
> +       }
> +       return true;
> +}

If you used seccomp, you'd get this type of check for free, and it
would be a lot easier to comprehend.  AFAICT the only extra leniency
you're granting is that you're agnostic to the order in which the
rules associated with different program types were applied, which
could easily be added to seccomp.
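
For reference, the seccomp-style check alluded to above is essentially a walk
along a single chain of prev pointers.  A minimal user-space sketch of that
idea follows; struct filter_node and is_ancestor_sketch() are hypothetical
names, not the kernel's definitions, and treating a NULL parent as "ancestor of
everything" is an assumption modelled on seccomp (the quoted
progs_are_subset() returns false in that case instead).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node of an append-only filter chain (newest first). */
struct filter_node {
	const struct filter_node *prev;
};

/*
 * Return true if @parent is reachable from @child by following prev
 * pointers, i.e. the child carries at least every filter of the parent.
 */
static bool is_ancestor_sketch(const struct filter_node *parent,
			       const struct filter_node *child)
{
	/* Assumption: a task without filters is an ancestor of any task. */
	if (parent == NULL)
		return true;
	for (; child; child = child->prev)
		if (child == parent)
			return true;
	return false;
}

/* Usage checks for the sketch. */
static void is_ancestor_selfcheck(void)
{
	const struct filter_node root = { NULL };
	const struct filter_node mid = { &root };
	const struct filter_node leaf = { &mid };
	const struct filter_node sibling = { &root };

	assert(is_ancestor_sketch(&root, &leaf));
	assert(is_ancestor_sketch(&leaf, &leaf));
	assert(!is_ancestor_sketch(&leaf, &root));
	assert(!is_ancestor_sketch(&mid, &sibling));
}
```

Because there is one chain rather than one list per program type, a single
walk replaces the per-type loop in the quoted function.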


* Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing
  2018-02-27  0:41 [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing Mickaël Salaün
                   ` (10 preceding siblings ...)
  2018-02-27  0:41 ` [PATCH bpf-next v8 11/11] landlock: Add user and kernel documentation " Mickaël Salaün
@ 2018-02-27  4:36 ` Andy Lutomirski
  2018-02-27 22:03   ` Mickaël Salaün
  11 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2018-02-27  4:36 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Casey Schaufler, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, James Morris, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development

On Tue, Feb 27, 2018 at 12:41 AM, Mickaël Salaün <mic@digikod.net> wrote:
> Hi,
>
> This eight series is a major revamp of the Landlock design compared to
> the previous series [1]. This enables more flexibility and granularity
> of access control with file paths. It is now possible to enforce an
> access control according to a file hierarchy. Landlock uses the concept
> of inode and path to identify such hierarchy. In a way, it brings tools
> to program what is a file hierarchy.
>
> There is now three types of Landlock hooks: FS_WALK, FS_PICK and FS_GET.
> Each of them accepts a dedicated eBPF program, called a Landlock
> program.  They can be chained to enforce a full access control according
> to a list of directories or files. The set of actions on a file is well
> defined (e.g. read, write, ioctl, append, lock, mount...) taking
> inspiration from the major Linux LSMs and some other access-controls
> like Capsicum.  These program types are designed to be cache-friendly,
> which give room for optimizations in the future.
>
> The documentation patch contains some kernel documentation and
> explanations on how to use Landlock.  The compiled documentation and
> a talk I gave at FOSDEM can be found here: https://landlock.io
> This patch series can be found in the branch landlock-v8 in this repo:
> https://github.com/landlock-lsm/linux
>
> There is still some minor issues with this patch series but it should
> demonstrate how powerful this design may be. One of these issues is that
> it is not a stackable LSM anymore, but the infrastructure management of
> security blobs should allow to stack it with other LSM [4].
>
> This is the first step of the roadmap discussed at LPC [2].  While the
> intended final goal is to allow unprivileged users to use Landlock, this
> series allows only a process with global CAP_SYS_ADMIN to load and
> enforce a rule.  This may help to get feedback and avoid unexpected
> behaviors.
>
> This series can be applied on top of bpf-next, commit 7d72637eb39f
> ("Merge branch 'x86-jit'").  This can be tested with
> CONFIG_SECCOMP_FILTER and CONFIG_SECURITY_LANDLOCK.  I would really
> appreciate constructive comments on the design and the code.
>
>
> # Landlock LSM
>
> The goal of this new Linux Security Module (LSM) called Landlock is to
> allow any process, including unprivileged ones, to create powerful
> security sandboxes comparable to XNU Sandbox or OpenBSD Pledge. This
> kind of sandbox is expected to help mitigate the security impact of bugs
> or unexpected/malicious behaviors in user-space applications.
>
> The approach taken is to add the minimum amount of code while still
> allowing the user-space application to create quite complex access
> rules.  A dedicated security policy language such as the one used by
> SELinux, AppArmor and other major LSMs involves a lot of code and is
> usually permitted to only a trusted user (i.e. root).  On the contrary,
> eBPF programs already exist and are designed to be safely loaded by
> unprivileged user-space.
>
> This design does not seem too intrusive but is flexible enough to allow
> a powerful sandbox mechanism accessible by any process on Linux. The use
> of seccomp and Landlock is more suitable with the help of a user-space
> library (e.g.  libseccomp) that could help to specify a high-level
> language to express a security policy instead of raw eBPF programs.
> Moreover, thanks to the LLVM front-end, it is quite easy to write an
> eBPF program with a subset of the C language.
>
>
> # Frequently asked questions
>
> ## Why is seccomp-bpf not enough?
>
> A seccomp filter can access only raw syscall arguments (i.e. the
> register values) which means that it is not possible to filter according
> to the value pointed to by an argument, such as a file pathname. As an
> embryonic Landlock version demonstrated, filtering at the syscall level
> is complicated (e.g. need to take care of race conditions). This is
> mainly because the access control checkpoints of the kernel are not at
> this high-level but more underneath, at the LSM-hook level. The LSM
> hooks are designed to handle this kind of checks.  Landlock abstracts
> this approach to leverage the ability of unprivileged users to limit
> themselves.
>
> Cf. section "What it isn't?" in Documentation/prctl/seccomp_filter.txt
>
>
> ## Why use the seccomp(2) syscall?
>
> Landlock uses the same semantics as seccomp to apply access rule
> restrictions. It adds a new layer of security for the current process
> which is inherited by its children. It makes sense to use a unique
> access-restricting syscall (that should be allowed by seccomp filters)
> which can only drop privileges. Moreover, a Landlock rule could come
> from outside a process (e.g.  passed through a UNIX socket). It is then
> useful to differentiate the creation/load of Landlock eBPF programs via
> bpf(2), from rule enforcement via seccomp(2).

This seems like a weak argument to me.  Sure, this is a bit different
from seccomp(), and maybe shoving it into the seccomp() multiplexer is
awkward, but surely the bpf() multiplexer is even less applicable.

But I think that you have more in common with seccomp() than you're
giving it credit for.  With seccomp, you need to either prevent
ptrace() of any more-privileged task or you need to filter to make
sure you can't trace a more privileged program.  With landlock, you
need exactly the same thing.  You have basically the same no_new_privs
considerations, etc.

Also, looking forward, I think you're going to want a bunch of the
stuff that's under consideration as new seccomp features.  Tycho is
working on a "user notifier" feature for seccomp where, in addition to
accepting, rejecting, or kicking to ptrace, you can send a message to
the creator of the filter and wait for a reply.  I think that Landlock
will want exactly the same feature.

In other words, it really seems to me that you should extend seccomp()
with the ability to attach filters to things that aren't syscall
entry, e.g. file open.

I would also seriously consider doing a scaled-back Landlock variant
first, with the intent of getting the main mechanism into the kernel.
In particular, there are two big sources of complexity in Landlock.
You need to deal with the API for managing bpf programs that filter
various actions beyond just syscall entry, and you need to deal with
giving those filters a way to deal with inodes, paths, etc.  But you
can do the former without the latter.  For example, you could start
with some Landlock-style filters on things that have nothing to do
with files.  For example, you could allow a filter for connecting to
an abstract-namespace unix socket.  Or you could have a hook for
file_receive.  (You couldn't meaningfully filter based on the *path*
of the fd being received without adding all the path infrastructure,
but you could filter on the *type* of the fd being received.)  Both of
these add new sandboxing abilities that don't currently exist.  In
particular, you can't write a seccomp rule that prevents receiving an
fd using recvmsg() right now unless you block cmsg entirely.  And you
can't write a filter that allows connecting to unix sockets by path
without allowing abstract namespace sockets either.

If you split up Landlock like this then, once you got all the
installation and management of filters down, you could submit patches
to add all the path stuff and deal with that review separately.

What do you all think?


* Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
  2018-02-27  2:08   ` Alexei Starovoitov
@ 2018-02-27  4:40     ` Andy Lutomirski
  2018-02-27  4:54       ` Alexei Starovoitov
  0 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2018-02-27  4:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Mickaël Salaün, LKML, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	Kernel Hardening, Linux API, LSM List, Network Development,
	Andrew Morton

On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote:
>> The seccomp(2) syscall can be used by a task to apply a Landlock program
>> to itself. As a seccomp filter, a Landlock program is enforced for the
>> current task and all its future children. A program is immutable and a
>> task can only add new restricting programs to itself, forming a list of
>> programs.
>>
>> A Landlock program is tied to a Landlock hook. If the action on a kernel
>> object is allowed by the other Linux security mechanisms (e.g. DAC,
>> capabilities, other LSM), then a Landlock hook related to this kind of
>> object is triggered. The list of programs for this hook is then
>> evaluated. Each program returns a 32-bit value which can deny the action
>> on a kernel object with a non-zero value. If every program of the list
>> returns zero, then the action on the object is allowed.
>>
>> Multiple Landlock programs can be chained to share a 64-bit value for a
>> call chain (e.g. evaluating multiple elements of a file path).  This
>> chaining is restricted when a process constructs this chain by loading a
>> program, but additional checks are performed when it requests to apply
>> this chain of programs to itself.  The restrictions ensure that it is
>> not possible to call multiple programs in a way that would imply
>> handling multiple shared values (i.e. cookies) for one chain.  For now,
>> only a fs_pick program can be chained to the same type of program,
>> because it may make sense if they have different triggers (cf. next
>> commits).  These restrictions still allow reusing Landlock programs in
>> a safe way (e.g. use the same loaded fs_walk program with multiple
>> chains of fs_pick programs).
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>
> ...
>
>> +struct landlock_prog_set *landlock_prepend_prog(
>> +             struct landlock_prog_set *current_prog_set,
>> +             struct bpf_prog *prog)
>> +{
>> +     struct landlock_prog_set *new_prog_set = current_prog_set;
>> +     unsigned long pages;
>> +     int err;
>> +     size_t i;
>> +     struct landlock_prog_set tmp_prog_set = {};
>> +
>> +     if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK)
>> +             return ERR_PTR(-EINVAL);
>> +
>> +     /* validate memory size allocation */
>> +     pages = prog->pages;
>> +     if (current_prog_set) {
>> +             size_t i;
>> +
>> +             for (i = 0; i < ARRAY_SIZE(current_prog_set->programs); i++) {
>> +                     struct landlock_prog_list *walker_p;
>> +
>> +                     for (walker_p = current_prog_set->programs[i];
>> +                                     walker_p; walker_p = walker_p->prev)
>> +                             pages += walker_p->prog->pages;
>> +             }
>> +             /* count a struct landlock_prog_set if we need to allocate one */
>> +             if (refcount_read(&current_prog_set->usage) != 1)
>> +                     pages += round_up(sizeof(*current_prog_set), PAGE_SIZE)
>> +                             / PAGE_SIZE;
>> +     }
>> +     if (pages > LANDLOCK_PROGRAMS_MAX_PAGES)
>> +             return ERR_PTR(-E2BIG);
>> +
>> +     /* ensure early that we can allocate enough memory for the new
>> +      * prog_lists */
>> +     err = store_landlock_prog(&tmp_prog_set, current_prog_set, prog);
>> +     if (err)
>> +             return ERR_PTR(err);
>> +
>> +     /*
>> +      * Each task_struct points to an array of prog list pointers.  These
>> +      * tables are duplicated when additions are made (which means each
>> +      * table needs to be refcounted for the processes using it). When a new
>> +      * table is created, all the refcounters on the prog_list are bumped (to
>> +      * track each table that references the prog). When a new prog is
>> +      * added, it's just prepended to the list for the new table to point
>> +      * at.
>> +      *
>> +      * Manage all the possible errors before this step to not uselessly
>> +      * duplicate current_prog_set and avoid a rollback.
>> +      */
>> +     if (!new_prog_set) {
>> +             /*
>> +              * If there is no Landlock program set used by the current task,
>> +              * then create a new one.
>> +              */
>> +             new_prog_set = new_landlock_prog_set();
>> +             if (IS_ERR(new_prog_set))
>> +                     goto put_tmp_lists;
>> +     } else if (refcount_read(&current_prog_set->usage) > 1) {
>> +             /*
>> +              * If the current task is not the sole user of its Landlock
>> +              * program set, then duplicate them.
>> +              */
>> +             new_prog_set = new_landlock_prog_set();
>> +             if (IS_ERR(new_prog_set))
>> +                     goto put_tmp_lists;
>> +             for (i = 0; i < ARRAY_SIZE(new_prog_set->programs); i++) {
>> +                     new_prog_set->programs[i] =
>> +                             READ_ONCE(current_prog_set->programs[i]);
>> +                     if (new_prog_set->programs[i])
>> +                             refcount_inc(&new_prog_set->programs[i]->usage);
>> +             }
>> +
>> +             /*
>> +              * Landlock program set from the current task will not be freed
>> +              * here because the usage is strictly greater than 1. It is
>> +              * only prevented to be freed by another task thanks to the
>> +              * caller of landlock_prepend_prog() which should be locked if
>> +              * needed.
>> +              */
>> +             landlock_put_prog_set(current_prog_set);
>> +     }
>> +
>> +     /* prepend tmp_prog_set to new_prog_set */
>> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++) {
>> +             /* get the last new list */
>> +             struct landlock_prog_list *last_list =
>> +                     tmp_prog_set.programs[i];
>> +
>> +             if (last_list) {
>> +                     while (last_list->prev)
>> +                             last_list = last_list->prev;
>> +                     /* no need to increment usage (pointer replacement) */
>> +                     last_list->prev = new_prog_set->programs[i];
>> +                     new_prog_set->programs[i] = tmp_prog_set.programs[i];
>> +             }
>> +     }
>> +     new_prog_set->chain_last = tmp_prog_set.chain_last;
>> +     return new_prog_set;
>> +
>> +put_tmp_lists:
>> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++)
>> +             put_landlock_prog_list(tmp_prog_set.programs[i]);
>> +     return new_prog_set;
>> +}
>
> Nack on the chaining concept.
> Please do not reinvent the wheel.
> There is an existing mechanism for attaching/detaching/querying multiple
> programs attached to cgroup and tracing hooks that are also
> efficiently executed via BPF_PROG_RUN_ARRAY.
> Please use that instead.
>

I don't see how that would help.  Suppose you add a filter, then
fork(), and then the child adds another filter.  Do you want to
duplicate the entire array?  You certainly can't *modify* the array
because you'll affect processes that shouldn't be affected.

In contrast, doing this through seccomp like the earlier patches
seemed just fine to me, and seccomp already had the right logic.
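
For illustration, the fork case can be modeled in user space with a
prev-linked, refcounted list, in the spirit of what seccomp does.
This is only a sketch under stated assumptions: every name below is
hypothetical, an int stands in for a bpf program, and there is no
locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct filter_node {
	int id;                   /* stands in for a bpf program */
	unsigned int usage;       /* number of references to this node */
	struct filter_node *prev; /* older filters, shared with ancestors */
};

/* Prepend a filter; on success the new node owns the caller's ref on head. */
struct filter_node *filter_prepend(struct filter_node *head, int id)
{
	struct filter_node *node = malloc(sizeof(*node));

	if (!node)
		return NULL;
	node->id = id;
	node->usage = 1;
	node->prev = head;
	return node;
}

/* fork(): the child shares the parent's list by taking one reference. */
struct filter_node *filter_fork(struct filter_node *head)
{
	if (head)
		head->usage++;
	return head;
}

/* Drop one reference and free every node that became unreferenced. */
void filter_put(struct filter_node *head)
{
	while (head && --head->usage == 0) {
		struct filter_node *prev = head->prev;

		free(head);
		head = prev;
	}
}

size_t filter_count(const struct filter_node *head)
{
	size_t n = 0;

	for (; head; head = head->prev)
		n++;
	return n;
}

/* Usage sketch: add a filter, fork, then the child adds another one. */
void example_fork(void)
{
	struct filter_node *parent = filter_prepend(NULL, 1);
	struct filter_node *child;

	assert(parent);
	child = filter_prepend(filter_fork(parent), 2);
	assert(child);

	assert(filter_count(parent) == 1); /* parent is unaffected */
	assert(filter_count(child) == 2);
	assert(child->prev == parent);     /* tail is shared, not copied */

	filter_put(child);                 /* frees node 2 only */
	assert(parent->usage == 1);
	filter_put(parent);
}
```

The child's addition never touches the nodes the parent can see, so
nothing has to be duplicated and nothing shared has to be modified.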


* Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
  2018-02-27  4:40     ` Andy Lutomirski
@ 2018-02-27  4:54       ` Alexei Starovoitov
  2018-02-27  5:20         ` Andy Lutomirski
  0 siblings, 1 reply; 55+ messages in thread
From: Alexei Starovoitov @ 2018-02-27  4:54 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mickaël Salaün, LKML, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	Kernel Hardening, Linux API, LSM List, Network Development,
	Andrew Morton

On Tue, Feb 27, 2018 at 04:40:34AM +0000, Andy Lutomirski wrote:
> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote:
> >> The seccomp(2) syscall can be used by a task to apply a Landlock program
> >> to itself. As a seccomp filter, a Landlock program is enforced for the
> >> current task and all its future children. A program is immutable and a
> >> task can only add new restricting programs to itself, forming a list of
> >> programs.
> >>
> >> A Landlock program is tied to a Landlock hook. If the action on a kernel
> >> object is allowed by the other Linux security mechanisms (e.g. DAC,
> >> capabilities, other LSM), then a Landlock hook related to this kind of
> >> object is triggered. The list of programs for this hook is then
> >> evaluated. Each program returns a 32-bit value which can deny the action
> >> on a kernel object with a non-zero value. If every program of the list
> >> returns zero, then the action on the object is allowed.
> >>
> >> Multiple Landlock programs can be chained to share a 64-bit value for a
> >> call chain (e.g. evaluating multiple elements of a file path).  This
> >> chaining is restricted when a process constructs this chain by loading a
> >> program, but additional checks are performed when it requests to apply
> >> this chain of programs to itself.  The restrictions ensure that it is
> >> not possible to call multiple programs in a way that would imply
> >> handling multiple shared values (i.e. cookies) for one chain.  For now,
> >> only a fs_pick program can be chained to the same type of program,
> >> because it may make sense if they have different triggers (cf. next
> >> commits).  These restrictions still allow reusing Landlock programs in
> >> a safe way (e.g. use the same loaded fs_walk program with multiple
> >> chains of fs_pick programs).
> >>
> >> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> >
> > ...
> >
> >> +struct landlock_prog_set *landlock_prepend_prog(
> >> +             struct landlock_prog_set *current_prog_set,
> >> +             struct bpf_prog *prog)
> >> +{
> >> +     struct landlock_prog_set *new_prog_set = current_prog_set;
> >> +     unsigned long pages;
> >> +     int err;
> >> +     size_t i;
> >> +     struct landlock_prog_set tmp_prog_set = {};
> >> +
> >> +     if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK)
> >> +             return ERR_PTR(-EINVAL);
> >> +
> >> +     /* validate memory size allocation */
> >> +     pages = prog->pages;
> >> +     if (current_prog_set) {
> >> +             size_t i;
> >> +
> >> +             for (i = 0; i < ARRAY_SIZE(current_prog_set->programs); i++) {
> >> +                     struct landlock_prog_list *walker_p;
> >> +
> >> +                     for (walker_p = current_prog_set->programs[i];
> >> +                                     walker_p; walker_p = walker_p->prev)
> >> +                             pages += walker_p->prog->pages;
> >> +             }
> >> +             /* count a struct landlock_prog_set if we need to allocate one */
> >> +             if (refcount_read(&current_prog_set->usage) != 1)
> >> +                     pages += round_up(sizeof(*current_prog_set), PAGE_SIZE)
> >> +                             / PAGE_SIZE;
> >> +     }
> >> +     if (pages > LANDLOCK_PROGRAMS_MAX_PAGES)
> >> +             return ERR_PTR(-E2BIG);
> >> +
> >> +     /* ensure early that we can allocate enough memory for the new
> >> +      * prog_lists */
> >> +     err = store_landlock_prog(&tmp_prog_set, current_prog_set, prog);
> >> +     if (err)
> >> +             return ERR_PTR(err);
> >> +
> >> +     /*
> >> +      * Each task_struct points to an array of prog list pointers.  These
> >> +      * tables are duplicated when additions are made (which means each
> >> +      * table needs to be refcounted for the processes using it). When a new
> >> +      * table is created, all the refcounters on the prog_list are bumped (to
> >> +      * track each table that references the prog). When a new prog is
> >> +      * added, it's just prepended to the list for the new table to point
> >> +      * at.
> >> +      *
> >> +      * Manage all the possible errors before this step to not uselessly
> >> +      * duplicate current_prog_set and avoid a rollback.
> >> +      */
> >> +     if (!new_prog_set) {
> >> +             /*
> >> +              * If there is no Landlock program set used by the current task,
> >> +              * then create a new one.
> >> +              */
> >> +             new_prog_set = new_landlock_prog_set();
> >> +             if (IS_ERR(new_prog_set))
> >> +                     goto put_tmp_lists;
> >> +     } else if (refcount_read(&current_prog_set->usage) > 1) {
> >> +             /*
> >> +              * If the current task is not the sole user of its Landlock
> >> +              * program set, then duplicate them.
> >> +              */
> >> +             new_prog_set = new_landlock_prog_set();
> >> +             if (IS_ERR(new_prog_set))
> >> +                     goto put_tmp_lists;
> >> +             for (i = 0; i < ARRAY_SIZE(new_prog_set->programs); i++) {
> >> +                     new_prog_set->programs[i] =
> >> +                             READ_ONCE(current_prog_set->programs[i]);
> >> +                     if (new_prog_set->programs[i])
> >> +                             refcount_inc(&new_prog_set->programs[i]->usage);
> >> +             }
> >> +
> >> +             /*
> >> +              * Landlock program set from the current task will not be freed
> >> +              * here because the usage is strictly greater than 1. It is
> >> +              * only prevented to be freed by another task thanks to the
> >> +              * caller of landlock_prepend_prog() which should be locked if
> >> +              * needed.
> >> +              */
> >> +             landlock_put_prog_set(current_prog_set);
> >> +     }
> >> +
> >> +     /* prepend tmp_prog_set to new_prog_set */
> >> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++) {
> >> +             /* get the last new list */
> >> +             struct landlock_prog_list *last_list =
> >> +                     tmp_prog_set.programs[i];
> >> +
> >> +             if (last_list) {
> >> +                     while (last_list->prev)
> >> +                             last_list = last_list->prev;
> >> +                     /* no need to increment usage (pointer replacement) */
> >> +                     last_list->prev = new_prog_set->programs[i];
> >> +                     new_prog_set->programs[i] = tmp_prog_set.programs[i];
> >> +             }
> >> +     }
> >> +     new_prog_set->chain_last = tmp_prog_set.chain_last;
> >> +     return new_prog_set;
> >> +
> >> +put_tmp_lists:
> >> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++)
> >> +             put_landlock_prog_list(tmp_prog_set.programs[i]);
> >> +     return new_prog_set;
> >> +}
> >
> > Nack on the chaining concept.
> > Please do not reinvent the wheel.
> > There is an existing mechanism for attaching/detaching/querying multiple
> > programs attached to cgroup and tracing hooks that are also
> > efficiently executed via BPF_PROG_RUN_ARRAY.
> > Please use that instead.
> >
> 
> I don't see how that would help.  Suppose you add a filter, then
> fork(), and then the child adds another filter.  Do you want to
> duplicate the entire array?  You certainly can't *modify* the array
> because you'll affect processes that shouldn't be affected.
> 
> In contrast, doing this through seccomp like the earlier patches
> seemed just fine to me, and seccomp already had the right logic.

It doesn't look to me like the existing seccomp side of managing the
fork situation can be reused.  Here there is an attempt to add a
'chaining' concept, which is sort of an extension of the existing
seccomp style, but it is somehow heavily done on the bpf side and
contradicts cgroup/tracing.



* Re: [PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions
  2018-02-27  4:17   ` Andy Lutomirski
@ 2018-02-27  5:01     ` Andy Lutomirski
  2018-02-27 22:14       ` Mickaël Salaün
  2018-02-27 22:18     ` Mickaël Salaün
  1 sibling, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2018-02-27  5:01 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Casey Schaufler, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, James Morris, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development



> On Feb 26, 2018, at 8:17 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> 
>> On Tue, Feb 27, 2018 at 12:41 AM, Mickaël Salaün <mic@digikod.net> wrote:
>> A landlocked process has less privileges than a non-landlocked process
>> and must then be subject to additional restrictions when manipulating
>> processes. To be allowed to use ptrace(2) and related syscalls on a
>> target process, a landlocked process must have a subset of the target
>> process' rules.
>> 
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> Cc: Alexei Starovoitov <ast@kernel.org>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>> Cc: David S. Miller <davem@davemloft.net>
>> Cc: James Morris <james.l.morris@oracle.com>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Serge E. Hallyn <serge@hallyn.com>
>> ---
>> 
>> Changes since v6:
>> * factor out ptrace check
>> * constify pointers
>> * cleanup headers
>> * use the new security_add_hooks()
>> ---
>> security/landlock/Makefile       |   2 +-
>> security/landlock/hooks_ptrace.c | 124 +++++++++++++++++++++++++++++++++++++++
>> security/landlock/hooks_ptrace.h |  11 ++++
>> security/landlock/init.c         |   2 +
>> 4 files changed, 138 insertions(+), 1 deletion(-)
>> create mode 100644 security/landlock/hooks_ptrace.c
>> create mode 100644 security/landlock/hooks_ptrace.h
>> 
>> diff --git a/security/landlock/Makefile b/security/landlock/Makefile
>> index d0f532a93b4e..605504d852d3 100644
>> --- a/security/landlock/Makefile
>> +++ b/security/landlock/Makefile
>> @@ -3,4 +3,4 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
>> landlock-y := init.o chain.o task.o \
>>        tag.o tag_fs.o \
>>        enforce.o enforce_seccomp.o \
>> -       hooks.o hooks_cred.o hooks_fs.o
>> +       hooks.o hooks_cred.o hooks_fs.o hooks_ptrace.o
>> diff --git a/security/landlock/hooks_ptrace.c b/security/landlock/hooks_ptrace.c
>> new file mode 100644
>> index 000000000000..f1b977b9c808
>> --- /dev/null
>> +++ b/security/landlock/hooks_ptrace.c
>> @@ -0,0 +1,124 @@
>> +/*
>> + * Landlock LSM - ptrace hooks
>> + *
>> + * Copyright © 2017 Mickaël Salaün <mic@digikod.net>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2, as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +#include <asm/current.h>
>> +#include <linux/errno.h>
>> +#include <linux/kernel.h> /* ARRAY_SIZE */
>> +#include <linux/lsm_hooks.h>
>> +#include <linux/sched.h> /* struct task_struct */
>> +#include <linux/seccomp.h>
>> +
>> +#include "common.h" /* struct landlock_prog_set */
>> +#include "hooks.h" /* landlocked() */
>> +#include "hooks_ptrace.h"
>> +
>> +static bool progs_are_subset(const struct landlock_prog_set *parent,
>> +               const struct landlock_prog_set *child)
>> +{
>> +       size_t i;
>> +
>> +       if (!parent || !child)
>> +               return false;
>> +       if (parent == child)
>> +               return true;
>> +
>> +       for (i = 0; i < ARRAY_SIZE(child->programs); i++) {
> 
> ARRAY_SIZE(child->programs) seems misleading.  Is there no define
> NUM_LANDLOCK_PROG_TYPES or similar?
> 
>> +               struct landlock_prog_list *walker;
>> +               bool found_parent = false;
>> +
>> +               if (!parent->programs[i])
>> +                       continue;
>> +               for (walker = child->programs[i]; walker;
>> +                               walker = walker->prev) {
>> +                       if (walker == parent->programs[i]) {
>> +                               found_parent = true;
>> +                               break;
>> +                       }
>> +               }
>> +               if (!found_parent)
>> +                       return false;
>> +       }
>> +       return true;
>> +}
> 
> If you used seccomp, you'd get this type of check for free, and it
> would be a lot easier to comprehend.  AFAICT the only extra leniency
> you're granting is that you're agnostic to the order in which the
> rules associated with different program types were applied, which
> could easily be added to seccomp.

On second thought, this is all way too complicated.  I think the
correct logic is either "if you are filtered by Landlock, you cannot
ptrace anything" or to delete this patch entirely.  If something like
Tycho's notifiers goes in, then it's not obvious that, just because
you have the same set of filters, you have the same privilege.
Similarly, if a feature that lets a filter query its cgroup goes in
(and you proposed this once!) then the logic you implemented here is
wrong.

Or you could just say that it's the responsibility of a Landlock user
to properly filter ptrace() just like it's the responsibility of
seccomp users to filter ptrace if needed.

I take this as further evidence that Landlock makes much more sense as
part of seccomp than as a totally separate thing.  We've very
carefully reviewed these things for seccomp.  Please don't make us do
it again from scratch.
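
For comparison with the subset walk quoted above, the first
alternative ("if you are filtered by Landlock, you cannot ptrace
anything") would reduce to a single test on the tracer.  The following
is a user-space sketch only: struct task_model and the function name
are hypothetical stand-ins, not kernel interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-task Landlock state. */
struct task_model {
	const void *landlock_progs; /* NULL when the task is not landlocked */
};

/*
 * Model of the simple rule: a landlocked tracer may not ptrace anything.
 * Returns 0 to allow and -EPERM to deny.  The tracee is deliberately not
 * consulted, so there is no list comparison at all.
 */
int ptrace_access_check_model(const struct task_model *tracer)
{
	return tracer->landlock_progs ? -EPERM : 0;
}

/* Usage sketch. */
void example_ptrace_rule(void)
{
	int some_prog_set; /* any non-NULL address marks "landlocked" */
	struct task_model unrestricted = { .landlock_progs = NULL };
	struct task_model landlocked = { .landlock_progs = &some_prog_set };

	assert(ptrace_access_check_model(&unrestricted) == 0);
	assert(ptrace_access_check_model(&landlocked) == -EPERM);
}
```

Because the answer does not depend on which filters are installed, a
rule of this shape would stay correct if filters later gain notifiers
or cgroup queries.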


* Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
  2018-02-27  4:54       ` Alexei Starovoitov
@ 2018-02-27  5:20         ` Andy Lutomirski
  2018-02-27  5:32           ` Alexei Starovoitov
  0 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2018-02-27  5:20 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Mickaël Salaün, LKML,
	Alexei Starovoitov, Arnaldo Carvalho de Melo, Casey Schaufler,
	Daniel Borkmann, David Drysdale, David S . Miller,
	Eric W . Biederman, James Morris, Jann Horn, Jonathan Corbet,
	Michael Kerrisk, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Shuah Khan, Tejun Heo, Thomas Graf,
	Tycho Andersen, Will Drewry, Kernel Hardening, Linux API,
	LSM List, Network Development, Andrew Morton

On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Tue, Feb 27, 2018 at 04:40:34AM +0000, Andy Lutomirski wrote:
>> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>> > On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote:
>> >> The seccomp(2) syscall can be used by a task to apply a Landlock program
>> >> to itself. As a seccomp filter, a Landlock program is enforced for the
>> >> current task and all its future children. A program is immutable and a
>> >> task can only add new restricting programs to itself, forming a list of
>> >> programs.
>> >>
>> >> A Landlock program is tied to a Landlock hook. If the action on a kernel
>> >> object is allowed by the other Linux security mechanisms (e.g. DAC,
>> >> capabilities, other LSM), then a Landlock hook related to this kind of
>> >> object is triggered. The list of programs for this hook is then
>> >> evaluated. Each program returns a 32-bit value which can deny the action
>> >> on a kernel object with a non-zero value. If every program of the list
>> >> returns zero, then the action on the object is allowed.
>> >>
>> >> Multiple Landlock programs can be chained to share a 64-bit value for a
>> >> call chain (e.g. evaluating multiple elements of a file path).  This
>> >> chaining is restricted when a process constructs this chain by loading a
>> >> program, but additional checks are performed when it requests to apply
>> >> this chain of programs to itself.  The restrictions ensure that it is
>> >> not possible to call multiple programs in a way that would imply
>> >> handling multiple shared values (i.e. cookies) for one chain.  For now,
>> >> only a fs_pick program can be chained to the same type of program,
>> >> because it may make sense if they have different triggers (cf. next
>> >> commits).  These restrictions still allow reusing Landlock programs in
>> >> a safe way (e.g. use the same loaded fs_walk program with multiple
>> >> chains of fs_pick programs).
>> >>
>> >> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> >
>> > ...
>> >
>> >> +struct landlock_prog_set *landlock_prepend_prog(
>> >> +             struct landlock_prog_set *current_prog_set,
>> >> +             struct bpf_prog *prog)
>> >> +{
>> >> +     struct landlock_prog_set *new_prog_set = current_prog_set;
>> >> +     unsigned long pages;
>> >> +     int err;
>> >> +     size_t i;
>> >> +     struct landlock_prog_set tmp_prog_set = {};
>> >> +
>> >> +     if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK)
>> >> +             return ERR_PTR(-EINVAL);
>> >> +
>> >> +     /* validate memory size allocation */
>> >> +     pages = prog->pages;
>> >> +     if (current_prog_set) {
>> >> +             size_t i;
>> >> +
>> >> +             for (i = 0; i < ARRAY_SIZE(current_prog_set->programs); i++) {
>> >> +                     struct landlock_prog_list *walker_p;
>> >> +
>> >> +                     for (walker_p = current_prog_set->programs[i];
>> >> +                                     walker_p; walker_p = walker_p->prev)
>> >> +                             pages += walker_p->prog->pages;
>> >> +             }
>> >> +             /* count a struct landlock_prog_set if we need to allocate one */
>> >> +             if (refcount_read(&current_prog_set->usage) != 1)
>> >> +                     pages += round_up(sizeof(*current_prog_set), PAGE_SIZE)
>> >> +                             / PAGE_SIZE;
>> >> +     }
>> >> +     if (pages > LANDLOCK_PROGRAMS_MAX_PAGES)
>> >> +             return ERR_PTR(-E2BIG);
>> >> +
>> >> +     /* ensure early that we can allocate enough memory for the new
>> >> +      * prog_lists */
>> >> +     err = store_landlock_prog(&tmp_prog_set, current_prog_set, prog);
>> >> +     if (err)
>> >> +             return ERR_PTR(err);
>> >> +
>> >> +     /*
>> >> +      * Each task_struct points to an array of prog list pointers.  These
>> >> +      * tables are duplicated when additions are made (which means each
>> >> +      * table needs to be refcounted for the processes using it). When a new
>> >> +      * table is created, all the refcounters on the prog_list are bumped (to
>> >> +      * track each table that references the prog). When a new prog is
>> >> +      * added, it's just prepended to the list for the new table to point
>> >> +      * at.
>> >> +      *
>> >> +      * Manage all the possible errors before this step to not uselessly
>> >> +      * duplicate current_prog_set and avoid a rollback.
>> >> +      */
>> >> +     if (!new_prog_set) {
>> >> +             /*
>> >> +              * If there is no Landlock program set used by the current task,
>> >> +              * then create a new one.
>> >> +              */
>> >> +             new_prog_set = new_landlock_prog_set();
>> >> +             if (IS_ERR(new_prog_set))
>> >> +                     goto put_tmp_lists;
>> >> +     } else if (refcount_read(&current_prog_set->usage) > 1) {
>> >> +             /*
>> >> +              * If the current task is not the sole user of its Landlock
>> >> +              * program set, then duplicate them.
>> >> +              */
>> >> +             new_prog_set = new_landlock_prog_set();
>> >> +             if (IS_ERR(new_prog_set))
>> >> +                     goto put_tmp_lists;
>> >> +             for (i = 0; i < ARRAY_SIZE(new_prog_set->programs); i++) {
>> >> +                     new_prog_set->programs[i] =
>> >> +                             READ_ONCE(current_prog_set->programs[i]);
>> >> +                     if (new_prog_set->programs[i])
>> >> +                             refcount_inc(&new_prog_set->programs[i]->usage);
>> >> +             }
>> >> +
>> >> +             /*
>> >> +              * Landlock program set from the current task will not be freed
>> >> +              * here because the usage is strictly greater than 1. It is
>> >> +              * only prevented to be freed by another task thanks to the
>> >> +              * caller of landlock_prepend_prog() which should be locked if
>> >> +              * needed.
>> >> +              */
>> >> +             landlock_put_prog_set(current_prog_set);
>> >> +     }
>> >> +
>> >> +     /* prepend tmp_prog_set to new_prog_set */
>> >> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++) {
>> >> +             /* get the last new list */
>> >> +             struct landlock_prog_list *last_list =
>> >> +                     tmp_prog_set.programs[i];
>> >> +
>> >> +             if (last_list) {
>> >> +                     while (last_list->prev)
>> >> +                             last_list = last_list->prev;
>> >> +                     /* no need to increment usage (pointer replacement) */
>> >> +                     last_list->prev = new_prog_set->programs[i];
>> >> +                     new_prog_set->programs[i] = tmp_prog_set.programs[i];
>> >> +             }
>> >> +     }
>> >> +     new_prog_set->chain_last = tmp_prog_set.chain_last;
>> >> +     return new_prog_set;
>> >> +
>> >> +put_tmp_lists:
>> >> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++)
>> >> +             put_landlock_prog_list(tmp_prog_set.programs[i]);
>> >> +     return new_prog_set;
>> >> +}
>> >
>> > Nack on the chaining concept.
>> > Please do not reinvent the wheel.
>> > There is an existing mechanism for attaching/detaching/quering multiple
>> > programs attached to cgroup and tracing hooks that are also
>> > efficiently executed via BPF_PROG_RUN_ARRAY.
>> > Please use that instead.
>> >
>>
>> I don't see how that would help.  Suppose you add a filter, then
>> fork(), and then the child adds another filter.  Do you want to
>> duplicate the entire array?  You certainly can't *modify* the array
>> because you'll affect processes that shouldn't be affected.
>>
>> In contrast, doing this through seccomp like the earlier patches
>> seemed just fine to me, and seccomp already had the right logic.
>
> it doesn't look to me that existing seccomp side of managing fork
> situation can be reused. Here there is an attempt to add 'chaining'
> concept which sort of an extension of existing seccomp style,
> but somehow heavily done on bpf side and contradicts cgroup/tracing.
>

I don't see why the seccomp way can't be used.  I agree with you that
the seccomp *style* shouldn't be used in bpf code like this, but I
think that Landlock programs can and should just live in the existing
seccomp chain.  If the existing seccomp code needs some modification
to make this work, then so be it.

In other words, the kernel already has two kinds of chaining:
seccomp's and bpf's.  bpf's doesn't work right for this type of usage
across fork(), whereas seccomp's already handles that case correctly.
(In contrast, seccomp's is totally wrong for cgroup-attached filters.)
So IMO Landlock should use the seccomp core code and call into bpf
for the actual filtering.
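
To make the fork() argument concrete, here is a minimal userspace sketch of
a seccomp-style chain: each filter points at the older filter it was stacked
on, and a child that adds a filter only prepends a node, so the parent's view
never changes. The names (struct filter, filter_prepend(), filter_fork(),
filter_put(), chain_length()) are hypothetical and simplified; this is not
the kernel's seccomp code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a seccomp-style filter node. */
struct filter {
	int refs;            /* tasks plus newer filters pointing at this node */
	int id;              /* illustrative payload */
	struct filter *prev; /* older filter, possibly shared with ancestors */
};

/*
 * Prepend a new filter. The new node's prev pointer takes over the
 * reference the caller held on the old head. Returns NULL on failure.
 */
static struct filter *filter_prepend(struct filter *head, int id)
{
	struct filter *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	f->refs = 1;
	f->id = id;
	f->prev = head;
	return f;
}

/* fork(): the child shares the parent's chain by taking one reference. */
static struct filter *filter_fork(struct filter *head)
{
	if (head)
		head->refs++;
	return head;
}

/* Drop one reference and free every node that becomes unused. */
static void filter_put(struct filter *head)
{
	while (head) {
		struct filter *prev = head->prev;

		assert(head->refs > 0);
		if (--head->refs > 0)
			break;
		free(head);
		head = prev;
	}
}

/* Number of filters visible from this head. */
static int chain_length(const struct filter *head)
{
	int n = 0;

	for (; head; head = head->prev)
		n++;
	return n;
}
```

If a parent installs one filter, forks, and the child prepends a second one,
the child sees two filters while the parent still sees one, and nothing had
to be copied.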

For what it's worth, I haven't figured out what Mickaël means about
chaining restrictions involving cookies.  Something seems wrong here,
and the design is too complicated.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
  2018-02-27  5:20         ` Andy Lutomirski
@ 2018-02-27  5:32           ` Alexei Starovoitov
  2018-02-27 16:39             ` Andy Lutomirski
  0 siblings, 1 reply; 55+ messages in thread
From: Alexei Starovoitov @ 2018-02-27  5:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mickaël Salaün, LKML, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development, Andrew Morton

On Tue, Feb 27, 2018 at 05:20:55AM +0000, Andy Lutomirski wrote:
> On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Tue, Feb 27, 2018 at 04:40:34AM +0000, Andy Lutomirski wrote:
> >> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov
> >> <alexei.starovoitov@gmail.com> wrote:
> >> > On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote:
> >> >> The seccomp(2) syscall can be used by a task to apply a Landlock program
> >> >> to itself. As a seccomp filter, a Landlock program is enforced for the
> >> >> current task and all its future children. A program is immutable and a
> >> >> task can only add new restricting programs to itself, forming a list of
> >> >> programss.
> >> >>
> >> >> A Landlock program is tied to a Landlock hook. If the action on a kernel
> >> >> object is allowed by the other Linux security mechanisms (e.g. DAC,
> >> >> capabilities, other LSM), then a Landlock hook related to this kind of
> >> >> object is triggered. The list of programs for this hook is then
> >> >> evaluated. Each program return a 32-bit value which can deny the action
> >> >> on a kernel object with a non-zero value. If every programs of the list
> >> >> return zero, then the action on the object is allowed.
> >> >>
> >> >> Multiple Landlock programs can be chained to share a 64-bits value for a
> >> >> call chain (e.g. evaluating multiple elements of a file path).  This
> >> >> chaining is restricted when a process construct this chain by loading a
> >> >> program, but additional checks are performed when it requests to apply
> >> >> this chain of programs to itself.  The restrictions ensure that it is
> >> >> not possible to call multiple programs in a way that would imply to
> >> >> handle multiple shared values (i.e. cookies) for one chain.  For now,
> >> >> only a fs_pick program can be chained to the same type of program,
> >> >> because it may make sense if they have different triggers (cf. next
> >> >> commits).  This restrictions still allows to reuse Landlock programs in
> >> >> a safe way (e.g. use the same loaded fs_walk program with multiple
> >> >> chains of fs_pick programs).
> >> >>
> >> >> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> >> >
> >> > ...
> >> >
> >> >> +struct landlock_prog_set *landlock_prepend_prog(
> >> >> +             struct landlock_prog_set *current_prog_set,
> >> >> +             struct bpf_prog *prog)
> >> >> +{
> >> >> +     struct landlock_prog_set *new_prog_set = current_prog_set;
> >> >> +     unsigned long pages;
> >> >> +     int err;
> >> >> +     size_t i;
> >> >> +     struct landlock_prog_set tmp_prog_set = {};
> >> >> +
> >> >> +     if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK)
> >> >> +             return ERR_PTR(-EINVAL);
> >> >> +
> >> >> +     /* validate memory size allocation */
> >> >> +     pages = prog->pages;
> >> >> +     if (current_prog_set) {
> >> >> +             size_t i;
> >> >> +
> >> >> +             for (i = 0; i < ARRAY_SIZE(current_prog_set->programs); i++) {
> >> >> +                     struct landlock_prog_list *walker_p;
> >> >> +
> >> >> +                     for (walker_p = current_prog_set->programs[i];
> >> >> +                                     walker_p; walker_p = walker_p->prev)
> >> >> +                             pages += walker_p->prog->pages;
> >> >> +             }
> >> >> +             /* count a struct landlock_prog_set if we need to allocate one */
> >> >> +             if (refcount_read(&current_prog_set->usage) != 1)
> >> >> +                     pages += round_up(sizeof(*current_prog_set), PAGE_SIZE)
> >> >> +                             / PAGE_SIZE;
> >> >> +     }
> >> >> +     if (pages > LANDLOCK_PROGRAMS_MAX_PAGES)
> >> >> +             return ERR_PTR(-E2BIG);
> >> >> +
> >> >> +     /* ensure early that we can allocate enough memory for the new
> >> >> +      * prog_lists */
> >> >> +     err = store_landlock_prog(&tmp_prog_set, current_prog_set, prog);
> >> >> +     if (err)
> >> >> +             return ERR_PTR(err);
> >> >> +
> >> >> +     /*
> >> >> +      * Each task_struct points to an array of prog list pointers.  These
> >> >> +      * tables are duplicated when additions are made (which means each
> >> >> +      * table needs to be refcounted for the processes using it). When a new
> >> >> +      * table is created, all the refcounters on the prog_list are bumped (to
> >> >> +      * track each table that references the prog). When a new prog is
> >> >> +      * added, it's just prepended to the list for the new table to point
> >> >> +      * at.
> >> >> +      *
> >> >> +      * Manage all the possible errors before this step to not uselessly
> >> >> +      * duplicate current_prog_set and avoid a rollback.
> >> >> +      */
> >> >> +     if (!new_prog_set) {
> >> >> +             /*
> >> >> +              * If there is no Landlock program set used by the current task,
> >> >> +              * then create a new one.
> >> >> +              */
> >> >> +             new_prog_set = new_landlock_prog_set();
> >> >> +             if (IS_ERR(new_prog_set))
> >> >> +                     goto put_tmp_lists;
> >> >> +     } else if (refcount_read(&current_prog_set->usage) > 1) {
> >> >> +             /*
> >> >> +              * If the current task is not the sole user of its Landlock
> >> >> +              * program set, then duplicate them.
> >> >> +              */
> >> >> +             new_prog_set = new_landlock_prog_set();
> >> >> +             if (IS_ERR(new_prog_set))
> >> >> +                     goto put_tmp_lists;
> >> >> +             for (i = 0; i < ARRAY_SIZE(new_prog_set->programs); i++) {
> >> >> +                     new_prog_set->programs[i] =
> >> >> +                             READ_ONCE(current_prog_set->programs[i]);
> >> >> +                     if (new_prog_set->programs[i])
> >> >> +                             refcount_inc(&new_prog_set->programs[i]->usage);
> >> >> +             }
> >> >> +
> >> >> +             /*
> >> >> +              * Landlock program set from the current task will not be freed
> >> >> +              * here because the usage is strictly greater than 1. It is
> >> >> +              * only prevented to be freed by another task thanks to the
> >> >> +              * caller of landlock_prepend_prog() which should be locked if
> >> >> +              * needed.
> >> >> +              */
> >> >> +             landlock_put_prog_set(current_prog_set);
> >> >> +     }
> >> >> +
> >> >> +     /* prepend tmp_prog_set to new_prog_set */
> >> >> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++) {
> >> >> +             /* get the last new list */
> >> >> +             struct landlock_prog_list *last_list =
> >> >> +                     tmp_prog_set.programs[i];
> >> >> +
> >> >> +             if (last_list) {
> >> >> +                     while (last_list->prev)
> >> >> +                             last_list = last_list->prev;
> >> >> +                     /* no need to increment usage (pointer replacement) */
> >> >> +                     last_list->prev = new_prog_set->programs[i];
> >> >> +                     new_prog_set->programs[i] = tmp_prog_set.programs[i];
> >> >> +             }
> >> >> +     }
> >> >> +     new_prog_set->chain_last = tmp_prog_set.chain_last;
> >> >> +     return new_prog_set;
> >> >> +
> >> >> +put_tmp_lists:
> >> >> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++)
> >> >> +             put_landlock_prog_list(tmp_prog_set.programs[i]);
> >> >> +     return new_prog_set;
> >> >> +}
> >> >
> >> > Nack on the chaining concept.
> >> > Please do not reinvent the wheel.
> >> > There is an existing mechanism for attaching/detaching/quering multiple
> >> > programs attached to cgroup and tracing hooks that are also
> >> > efficiently executed via BPF_PROG_RUN_ARRAY.
> >> > Please use that instead.
> >> >
> >>
> >> I don't see how that would help.  Suppose you add a filter, then
> >> fork(), and then the child adds another filter.  Do you want to
> >> duplicate the entire array?  You certainly can't *modify* the array
> >> because you'll affect processes that shouldn't be affected.
> >>
> >> In contrast, doing this through seccomp like the earlier patches
> >> seemed just fine to me, and seccomp already had the right logic.
> >
> > it doesn't look to me that existing seccomp side of managing fork
> > situation can be reused. Here there is an attempt to add 'chaining'
> > concept which sort of an extension of existing seccomp style,
> > but somehow heavily done on bpf side and contradicts cgroup/tracing.
> >
> 
> I don't see why the seccomp way can't be used.  I agree with you that
> the seccomp *style* shouldn't be used in bpf code like this, but I
> think that Landlock programs can and should just live in the existing
> seccomp chain.  If the existing seccomp code needs some modification
> to make this work, then so be it.

+1
if that was the case...
but that's not my reading of the patch set.

> In other words, the kernel already has two kinds of chaining:
> seccomp's and bpf's.  bpf's doesn't work right for this type of usage
> across fork(), whereas seccomp's already handles that case correctly.
> (In contrast, seccomp's is totally wrong for cgroup-attached filters.)
>  So IMO Landlock should use the seccomp core code and call into bpf
> for the actual filtering.

+1
in cgroup we had to invent this new BPF_PROG_RUN_ARRAY mechanism,
since cgroup hierarchy can be complicated with bpf progs attached
at different levels with different override/multiprog properties,
so walking a linked list and checking all flags at run-time would have
been too slow. That's why we added compute_effective_progs().

imo doing similar stuff at fork is not a big deal either.
allocating new bpf_prog_array for the task and computing
array of progs is trivial.
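
A rough userspace sketch of that idea, for illustration only: walk the
per-task list once and store the programs in a flat array that the hot path
can iterate without chasing pointers. The types and the flatten() helper are
hypothetical; they are not the kernel's bpf_prog_array or
compute_effective_progs().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical program handle. */
struct prog {
	int id;
};

/* Hypothetical linked list of attached programs, newest first. */
struct prog_node {
	struct prog *prog;
	struct prog_node *prev;
};

/* Flat array computed once, e.g. when the set of programs changes. */
struct prog_array {
	size_t cnt;
	struct prog *progs[];
};

/* Build the flat array from the list. Returns NULL on allocation failure. */
static struct prog_array *flatten(const struct prog_node *head)
{
	const struct prog_node *n;
	struct prog_array *arr;
	size_t cnt = 0, i = 0;

	for (n = head; n; n = n->prev)
		cnt++;

	arr = malloc(sizeof(*arr) + cnt * sizeof(arr->progs[0]));
	if (!arr)
		return NULL;

	arr->cnt = cnt;
	for (n = head; n; n = n->prev)
		arr->progs[i++] = n->prog;
	assert(i == cnt);
	return arr;
}
```

The cost is one allocation and one list walk per change, in exchange for a
simple loop over arr->progs at run time.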

> For what it's worth, I haven't figured out that Mickaël means about
> chaining restrictions involving cookies.  This seems like something
> wrong and the design is too complicated.

the cookie part I don't like either.
In networking land we have cb in skb and xdp metadata to pass data
between programs.
imo right now for landlock there is no need to do any of this stuff.
It's purely a feature extension that is clearly controversial and
blocking even basic review of the rest of the patches.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
  2018-02-27  5:32           ` Alexei Starovoitov
@ 2018-02-27 16:39             ` Andy Lutomirski
  2018-02-27 17:30               ` Casey Schaufler
  2018-02-27 21:48               ` Mickaël Salaün
  0 siblings, 2 replies; 55+ messages in thread
From: Andy Lutomirski @ 2018-02-27 16:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Mickaël Salaün, LKML,
	Alexei Starovoitov, Arnaldo Carvalho de Melo, Casey Schaufler,
	Daniel Borkmann, David Drysdale, David S . Miller,
	Eric W . Biederman, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	Kernel Hardening, Linux API, LSM List, Network Development,
	Andrew Morton

On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Tue, Feb 27, 2018 at 05:20:55AM +0000, Andy Lutomirski wrote:
>> On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>> > On Tue, Feb 27, 2018 at 04:40:34AM +0000, Andy Lutomirski wrote:
>> >> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov
>> >> <alexei.starovoitov@gmail.com> wrote:
>> >> > On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote:
>> >> >> The seccomp(2) syscall can be used by a task to apply a Landlock program
>> >> >> to itself. As a seccomp filter, a Landlock program is enforced for the
>> >> >> current task and all its future children. A program is immutable and a
>> >> >> task can only add new restricting programs to itself, forming a list of
>> >> >> programss.
>> >> >>
>> >> >> A Landlock program is tied to a Landlock hook. If the action on a kernel
>> >> >> object is allowed by the other Linux security mechanisms (e.g. DAC,
>> >> >> capabilities, other LSM), then a Landlock hook related to this kind of
>> >> >> object is triggered. The list of programs for this hook is then
>> >> >> evaluated. Each program return a 32-bit value which can deny the action
>> >> >> on a kernel object with a non-zero value. If every programs of the list
>> >> >> return zero, then the action on the object is allowed.
>> >> >>
>> >> >> Multiple Landlock programs can be chained to share a 64-bits value for a
>> >> >> call chain (e.g. evaluating multiple elements of a file path).  This
>> >> >> chaining is restricted when a process construct this chain by loading a
>> >> >> program, but additional checks are performed when it requests to apply
>> >> >> this chain of programs to itself.  The restrictions ensure that it is
>> >> >> not possible to call multiple programs in a way that would imply to
>> >> >> handle multiple shared values (i.e. cookies) for one chain.  For now,
>> >> >> only a fs_pick program can be chained to the same type of program,
>> >> >> because it may make sense if they have different triggers (cf. next
>> >> >> commits).  This restrictions still allows to reuse Landlock programs in
>> >> >> a safe way (e.g. use the same loaded fs_walk program with multiple
>> >> >> chains of fs_pick programs).
>> >> >>
>> >> >> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> >> >
>> >> > ...
>> >> >
>> >> >> +struct landlock_prog_set *landlock_prepend_prog(
>> >> >> +             struct landlock_prog_set *current_prog_set,
>> >> >> +             struct bpf_prog *prog)
>> >> >> +{
>> >> >> +     struct landlock_prog_set *new_prog_set = current_prog_set;
>> >> >> +     unsigned long pages;
>> >> >> +     int err;
>> >> >> +     size_t i;
>> >> >> +     struct landlock_prog_set tmp_prog_set = {};
>> >> >> +
>> >> >> +     if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK)
>> >> >> +             return ERR_PTR(-EINVAL);
>> >> >> +
>> >> >> +     /* validate memory size allocation */
>> >> >> +     pages = prog->pages;
>> >> >> +     if (current_prog_set) {
>> >> >> +             size_t i;
>> >> >> +
>> >> >> +             for (i = 0; i < ARRAY_SIZE(current_prog_set->programs); i++) {
>> >> >> +                     struct landlock_prog_list *walker_p;
>> >> >> +
>> >> >> +                     for (walker_p = current_prog_set->programs[i];
>> >> >> +                                     walker_p; walker_p = walker_p->prev)
>> >> >> +                             pages += walker_p->prog->pages;
>> >> >> +             }
>> >> >> +             /* count a struct landlock_prog_set if we need to allocate one */
>> >> >> +             if (refcount_read(&current_prog_set->usage) != 1)
>> >> >> +                     pages += round_up(sizeof(*current_prog_set), PAGE_SIZE)
>> >> >> +                             / PAGE_SIZE;
>> >> >> +     }
>> >> >> +     if (pages > LANDLOCK_PROGRAMS_MAX_PAGES)
>> >> >> +             return ERR_PTR(-E2BIG);
>> >> >> +
>> >> >> +     /* ensure early that we can allocate enough memory for the new
>> >> >> +      * prog_lists */
>> >> >> +     err = store_landlock_prog(&tmp_prog_set, current_prog_set, prog);
>> >> >> +     if (err)
>> >> >> +             return ERR_PTR(err);
>> >> >> +
>> >> >> +     /*
>> >> >> +      * Each task_struct points to an array of prog list pointers.  These
>> >> >> +      * tables are duplicated when additions are made (which means each
>> >> >> +      * table needs to be refcounted for the processes using it). When a new
>> >> >> +      * table is created, all the refcounters on the prog_list are bumped (to
>> >> >> +      * track each table that references the prog). When a new prog is
>> >> >> +      * added, it's just prepended to the list for the new table to point
>> >> >> +      * at.
>> >> >> +      *
>> >> >> +      * Manage all the possible errors before this step to not uselessly
>> >> >> +      * duplicate current_prog_set and avoid a rollback.
>> >> >> +      */
>> >> >> +     if (!new_prog_set) {
>> >> >> +             /*
>> >> >> +              * If there is no Landlock program set used by the current task,
>> >> >> +              * then create a new one.
>> >> >> +              */
>> >> >> +             new_prog_set = new_landlock_prog_set();
>> >> >> +             if (IS_ERR(new_prog_set))
>> >> >> +                     goto put_tmp_lists;
>> >> >> +     } else if (refcount_read(&current_prog_set->usage) > 1) {
>> >> >> +             /*
>> >> >> +              * If the current task is not the sole user of its Landlock
>> >> >> +              * program set, then duplicate them.
>> >> >> +              */
>> >> >> +             new_prog_set = new_landlock_prog_set();
>> >> >> +             if (IS_ERR(new_prog_set))
>> >> >> +                     goto put_tmp_lists;
>> >> >> +             for (i = 0; i < ARRAY_SIZE(new_prog_set->programs); i++) {
>> >> >> +                     new_prog_set->programs[i] =
>> >> >> +                             READ_ONCE(current_prog_set->programs[i]);
>> >> >> +                     if (new_prog_set->programs[i])
>> >> >> +                             refcount_inc(&new_prog_set->programs[i]->usage);
>> >> >> +             }
>> >> >> +
>> >> >> +             /*
>> >> >> +              * Landlock program set from the current task will not be freed
>> >> >> +              * here because the usage is strictly greater than 1. It is
>> >> >> +              * only prevented to be freed by another task thanks to the
>> >> >> +              * caller of landlock_prepend_prog() which should be locked if
>> >> >> +              * needed.
>> >> >> +              */
>> >> >> +             landlock_put_prog_set(current_prog_set);
>> >> >> +     }
>> >> >> +
>> >> >> +     /* prepend tmp_prog_set to new_prog_set */
>> >> >> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++) {
>> >> >> +             /* get the last new list */
>> >> >> +             struct landlock_prog_list *last_list =
>> >> >> +                     tmp_prog_set.programs[i];
>> >> >> +
>> >> >> +             if (last_list) {
>> >> >> +                     while (last_list->prev)
>> >> >> +                             last_list = last_list->prev;
>> >> >> +                     /* no need to increment usage (pointer replacement) */
>> >> >> +                     last_list->prev = new_prog_set->programs[i];
>> >> >> +                     new_prog_set->programs[i] = tmp_prog_set.programs[i];
>> >> >> +             }
>> >> >> +     }
>> >> >> +     new_prog_set->chain_last = tmp_prog_set.chain_last;
>> >> >> +     return new_prog_set;
>> >> >> +
>> >> >> +put_tmp_lists:
>> >> >> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++)
>> >> >> +             put_landlock_prog_list(tmp_prog_set.programs[i]);
>> >> >> +     return new_prog_set;
>> >> >> +}
>> >> >
>> >> > Nack on the chaining concept.
>> >> > Please do not reinvent the wheel.
>> >> > There is an existing mechanism for attaching/detaching/quering multiple
>> >> > programs attached to cgroup and tracing hooks that are also
>> >> > efficiently executed via BPF_PROG_RUN_ARRAY.
>> >> > Please use that instead.
>> >> >
>> >>
>> >> I don't see how that would help.  Suppose you add a filter, then
>> >> fork(), and then the child adds another filter.  Do you want to
>> >> duplicate the entire array?  You certainly can't *modify* the array
>> >> because you'll affect processes that shouldn't be affected.
>> >>
>> >> In contrast, doing this through seccomp like the earlier patches
>> >> seemed just fine to me, and seccomp already had the right logic.
>> >
>> > it doesn't look to me that existing seccomp side of managing fork
>> > situation can be reused. Here there is an attempt to add 'chaining'
>> > concept which sort of an extension of existing seccomp style,
>> > but somehow heavily done on bpf side and contradicts cgroup/tracing.
>> >
>>
>> I don't see why the seccomp way can't be used.  I agree with you that
>> the seccomp *style* shouldn't be used in bpf code like this, but I
>> think that Landlock programs can and should just live in the existing
>> seccomp chain.  If the existing seccomp code needs some modification
>> to make this work, then so be it.
>
> +1
> if that was the case...
> but that's not my reading of the patch set.

An earlier version of the patch set used the seccomp filter chain.
Mickaël, what exactly was wrong with that approach other than that the
seccomp() syscall was awkward for you to use?  You could add a
seccomp_add_landlock_rule() syscall if you needed to.

As a side comment, why is this an LSM at all, let alone a non-stacking
LSM?  It would make a lot more sense to me to make Landlock depend on
having LSMs configured in but to call the landlock hooks directly from
the security_xyz() hooks.

>
>> In other words, the kernel already has two kinds of chaining:
>> seccomp's and bpf's.  bpf's doesn't work right for this type of usage
>> across fork(), whereas seccomp's already handles that case correctly.
>> (In contrast, seccomp's is totally wrong for cgroup-attached filters.)
>>  So IMO Landlock should use the seccomp core code and call into bpf
>> for the actual filtering.
>
> +1
> in cgroup we had to invent this new BPF_PROG_RUN_ARRAY mechanism,
> since cgroup hierarchy can be complicated with bpf progs attached
> at different levels with different override/multiprog properties,
> so walking link list and checking all flags at run-time would have
> been too slow. That's why we added compute_effective_progs().

If we start adding override flags to Landlock, I think we're doing it
wrong.  With cgroup bpf programs, the whole mess is set up by the
administrator.  With seccomp, and with Landlock if done correctly, it
*won't* be set up by the administrator, so the chance that everyone
gets all the flags right is about zero.  All attached filters should
run unconditionally.
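
As a sketch of what "run unconditionally" could look like, assuming the rule
from the commit message that any non-zero return denies the action: every
filter in the chain is called, with no override flags and no early exit, and
the results are combined afterwards. All names here (filter_fn, struct
filter_node, run_all_filters(), the allow()/deny() examples and the calls
counter) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter callback: zero allows, non-zero denies. */
typedef uint32_t (*filter_fn)(const void *ctx);

struct filter_node {
	filter_fn run;
	const struct filter_node *prev; /* older filter in the chain */
};

/* Invocation counter, only here to show that no filter is skipped. */
static unsigned int calls;

static uint32_t allow(const void *ctx)
{
	(void)ctx;
	calls++;
	return 0;
}

static uint32_t deny(const void *ctx)
{
	(void)ctx;
	calls++;
	return 1;
}

/* Run every filter; return 1 if at least one denied, 0 otherwise. */
static int run_all_filters(const struct filter_node *head, const void *ctx)
{
	int denied = 0;

	for (; head; head = head->prev) {
		assert(head->run != NULL);
		if (head->run(ctx) != 0)
			denied = 1;
	}
	return denied;
}
```

With a chain of allow, deny, allow, all three filters run and the action is
denied; there is no flag a later filter could set to skip an earlier one.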

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
  2018-02-27 16:39             ` Andy Lutomirski
@ 2018-02-27 17:30               ` Casey Schaufler
  2018-02-27 17:36                 ` Andy Lutomirski
  2018-02-27 21:48               ` Mickaël Salaün
  1 sibling, 1 reply; 55+ messages in thread
From: Casey Schaufler @ 2018-02-27 17:30 UTC (permalink / raw)
  To: Andy Lutomirski, Alexei Starovoitov
  Cc: Mickaël Salaün, LKML, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, Jann Horn, Jonathan Corbet,
	Michael Kerrisk, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Shuah Khan, Tejun Heo, Thomas Graf,
	Tycho Andersen, Will Drewry, Kernel Hardening, Linux API,
	LSM List, Network Development, Andrew Morton

On 2/27/2018 8:39 AM, Andy Lutomirski wrote:
> On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> [ Snip ]
> An earlier version of the patch set used the seccomp filter chain.
> Mickaël, what exactly was wrong with that approach other than that the
> seccomp() syscall was awkward for you to use?  You could add a
> seccomp_add_landlock_rule() syscall if you needed to.
>
> As a side comment, why is this an LSM at all, let alone a non-stacking
> LSM?  It would make a lot more sense to me to make Landlock depend on
> having LSMs configured in but to call the landlock hooks directly from
> the security_xyz() hooks.

Please, no. It is my serious intention to have at least the
infrastructure blob management merged within a release or two, and
I think that's all Landlock needs. The security_xyz() hooks are
sufficiently hackish as it is without unnecessarily adding more
special cases.



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
  2018-02-27 17:30               ` Casey Schaufler
@ 2018-02-27 17:36                 ` Andy Lutomirski
  2018-02-27 18:03                   ` Casey Schaufler
  0 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2018-02-27 17:36 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Andy Lutomirski, Alexei Starovoitov, Mickaël Salaün,
	LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Daniel Borkmann, David Drysdale, David S . Miller,
	Eric W . Biederman, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	Kernel Hardening, Linux API, LSM List, Network Development,
	Andrew Morton

On Tue, Feb 27, 2018 at 5:30 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 2/27/2018 8:39 AM, Andy Lutomirski wrote:
>> On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>>> [ Snip ]
>> An earlier version of the patch set used the seccomp filter chain.
>> Mickaël, what exactly was wrong with that approach other than that the
>> seccomp() syscall was awkward for you to use?  You could add a
>> seccomp_add_landlock_rule() syscall if you needed to.
>>
>> As a side comment, why is this an LSM at all, let alone a non-stacking
>> LSM?  It would make a lot more sense to me to make Landlock depend on
>> having LSMs configured in but to call the landlock hooks directly from
>> the security_xyz() hooks.
>
> Please, no. It is my serious intention to have at least the
> infrastructure blob management in within a release or two, and
> I think that's all Landlock needs. The security_xyz() hooks are
> sufficiently hackish as it is without unnecessarily adding more
> special cases.
>
>

What do you mean by "infrastructure blob management"?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
  2018-02-27 17:36                 ` Andy Lutomirski
@ 2018-02-27 18:03                   ` Casey Schaufler
  0 siblings, 0 replies; 55+ messages in thread
From: Casey Schaufler @ 2018-02-27 18:03 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexei Starovoitov, Mickaël Salaün, LKML,
	Alexei Starovoitov, Arnaldo Carvalho de Melo, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development, Andrew Morton

On 2/27/2018 9:36 AM, Andy Lutomirski wrote:
> On Tue, Feb 27, 2018 at 5:30 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>> On 2/27/2018 8:39 AM, Andy Lutomirski wrote:
>>> On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov
>>> <alexei.starovoitov@gmail.com> wrote:
>>>> [ Snip ]
>>> An earlier version of the patch set used the seccomp filter chain.
>>> Mickaël, what exactly was wrong with that approach other than that the
>>> seccomp() syscall was awkward for you to use?  You could add a
>>> seccomp_add_landlock_rule() syscall if you needed to.
>>>
>>> As a side comment, why is this an LSM at all, let alone a non-stacking
>>> LSM?  It would make a lot more sense to me to make Landlock depend on
>>> having LSMs configured in but to call the landlock hooks directly from
>>> the security_xyz() hooks.
>> Please, no. It is my serious intention to have at least the
>> infrastructure blob management in within a release or two, and
>> I think that's all Landlock needs. The security_xyz() hooks are
>> sufficiently hackish as it is without unnecessarily adding more
>> special cases.
>>
>>
> What do you mean by "infrastructure blob management"?

Today each security module manages its own module-specific data,
for example inode->i_security and file->f_security. This prevents
two security modules that both have inode or file data from being
used at the same time, because they both need to manage those fields.
Moving the management of the module-specific data (aka "blobs") from
the security modules to the module infrastructure will allow those
modules to coexist. Restrictions apply, of course, but I don't think
that Landlock uses any of the facilities that would have issues.
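
As a rough illustration of that idea, here is a user-space sketch (not
code from any patch set; struct object, blob_register() and the other
names are hypothetical). The infrastructure owns one allocation per
object and each module only gets an offset into it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel object such as an inode. */
struct object {
	void *security;	/* one blob, owned by the infrastructure */
};

static size_t blob_total;	/* running size of the combined blob */

/*
 * Called once per module at registration time: reserve `size` bytes
 * and return the module's offset into every object's blob.
 */
size_t blob_register(size_t size)
{
	size_t offset = blob_total;

	/* keep each module's area pointer-aligned */
	blob_total += (size + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
	return offset;
}

/* The infrastructure, not the modules, allocates the blob. */
int object_alloc_security(struct object *obj)
{
	obj->security = calloc(1, blob_total ? blob_total : 1);
	return obj->security ? 0 : -1;
}

void object_free_security(struct object *obj)
{
	free(obj->security);
	obj->security = NULL;
}

/* A module reaches its private area through its own offset. */
void *blob_get(const struct object *obj, size_t offset)
{
	assert(offset < blob_total);
	return (char *)obj->security + offset;
}
```

With this layout, two modules that both need per-inode data never touch
the same pointer field; they only differ in the offset they were given.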

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
  2018-02-27 16:39             ` Andy Lutomirski
  2018-02-27 17:30               ` Casey Schaufler
@ 2018-02-27 21:48               ` Mickaël Salaün
  2018-04-08 13:13                 ` Mickaël Salaün
  1 sibling, 1 reply; 55+ messages in thread
From: Mickaël Salaün @ 2018-02-27 21:48 UTC (permalink / raw)
  To: Andy Lutomirski, Alexei Starovoitov
  Cc: LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Casey Schaufler, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, Jann Horn, Jonathan Corbet,
	Michael Kerrisk, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Shuah Khan, Tejun Heo, Thomas Graf,
	Tycho Andersen, Will Drewry, Kernel Hardening, Linux API,
	LSM List, Network Development, Andrew Morton


[-- Attachment #1.1: Type: text/plain, Size: 17637 bytes --]


On 27/02/2018 17:39, Andy Lutomirski wrote:
> On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> On Tue, Feb 27, 2018 at 05:20:55AM +0000, Andy Lutomirski wrote:
>>> On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov
>>> <alexei.starovoitov@gmail.com> wrote:
>>>> On Tue, Feb 27, 2018 at 04:40:34AM +0000, Andy Lutomirski wrote:
>>>>> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov
>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>> On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote:
>>>>>>> The seccomp(2) syscall can be used by a task to apply a Landlock program
>>>>>>> to itself. As a seccomp filter, a Landlock program is enforced for the
>>>>>>> current task and all its future children. A program is immutable and a
>>>>>>> task can only add new restricting programs to itself, forming a list of
>>>>>>> programss.
>>>>>>>
>>>>>>> A Landlock program is tied to a Landlock hook. If the action on a kernel
>>>>>>> object is allowed by the other Linux security mechanisms (e.g. DAC,
>>>>>>> capabilities, other LSM), then a Landlock hook related to this kind of
>>>>>>> object is triggered. The list of programs for this hook is then
>>>>>>> evaluated. Each program return a 32-bit value which can deny the action
>>>>>>> on a kernel object with a non-zero value. If every programs of the list
>>>>>>> return zero, then the action on the object is allowed.
>>>>>>>
>>>>>>> Multiple Landlock programs can be chained to share a 64-bits value for a
>>>>>>> call chain (e.g. evaluating multiple elements of a file path).  This
>>>>>>> chaining is restricted when a process construct this chain by loading a
>>>>>>> program, but additional checks are performed when it requests to apply
>>>>>>> this chain of programs to itself.  The restrictions ensure that it is
>>>>>>> not possible to call multiple programs in a way that would imply to
>>>>>>> handle multiple shared values (i.e. cookies) for one chain.  For now,
>>>>>>> only a fs_pick program can be chained to the same type of program,
>>>>>>> because it may make sense if they have different triggers (cf. next
>>>>>>> commits).  This restrictions still allows to reuse Landlock programs in
>>>>>>> a safe way (e.g. use the same loaded fs_walk program with multiple
>>>>>>> chains of fs_pick programs).
>>>>>>>
>>>>>>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>> +struct landlock_prog_set *landlock_prepend_prog(
>>>>>>> +             struct landlock_prog_set *current_prog_set,
>>>>>>> +             struct bpf_prog *prog)
>>>>>>> +{
>>>>>>> +     struct landlock_prog_set *new_prog_set = current_prog_set;
>>>>>>> +     unsigned long pages;
>>>>>>> +     int err;
>>>>>>> +     size_t i;
>>>>>>> +     struct landlock_prog_set tmp_prog_set = {};
>>>>>>> +
>>>>>>> +     if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK)
>>>>>>> +             return ERR_PTR(-EINVAL);
>>>>>>> +
>>>>>>> +     /* validate memory size allocation */
>>>>>>> +     pages = prog->pages;
>>>>>>> +     if (current_prog_set) {
>>>>>>> +             size_t i;
>>>>>>> +
>>>>>>> +             for (i = 0; i < ARRAY_SIZE(current_prog_set->programs); i++) {
>>>>>>> +                     struct landlock_prog_list *walker_p;
>>>>>>> +
>>>>>>> +                     for (walker_p = current_prog_set->programs[i];
>>>>>>> +                                     walker_p; walker_p = walker_p->prev)
>>>>>>> +                             pages += walker_p->prog->pages;
>>>>>>> +             }
>>>>>>> +             /* count a struct landlock_prog_set if we need to allocate one */
>>>>>>> +             if (refcount_read(&current_prog_set->usage) != 1)
>>>>>>> +                     pages += round_up(sizeof(*current_prog_set), PAGE_SIZE)
>>>>>>> +                             / PAGE_SIZE;
>>>>>>> +     }
>>>>>>> +     if (pages > LANDLOCK_PROGRAMS_MAX_PAGES)
>>>>>>> +             return ERR_PTR(-E2BIG);
>>>>>>> +
>>>>>>> +     /* ensure early that we can allocate enough memory for the new
>>>>>>> +      * prog_lists */
>>>>>>> +     err = store_landlock_prog(&tmp_prog_set, current_prog_set, prog);
>>>>>>> +     if (err)
>>>>>>> +             return ERR_PTR(err);
>>>>>>> +
>>>>>>> +     /*
>>>>>>> +      * Each task_struct points to an array of prog list pointers.  These
>>>>>>> +      * tables are duplicated when additions are made (which means each
>>>>>>> +      * table needs to be refcounted for the processes using it). When a new
>>>>>>> +      * table is created, all the refcounters on the prog_list are bumped (to
>>>>>>> +      * track each table that references the prog). When a new prog is
>>>>>>> +      * added, it's just prepended to the list for the new table to point
>>>>>>> +      * at.
>>>>>>> +      *
>>>>>>> +      * Manage all the possible errors before this step to not uselessly
>>>>>>> +      * duplicate current_prog_set and avoid a rollback.
>>>>>>> +      */
>>>>>>> +     if (!new_prog_set) {
>>>>>>> +             /*
>>>>>>> +              * If there is no Landlock program set used by the current task,
>>>>>>> +              * then create a new one.
>>>>>>> +              */
>>>>>>> +             new_prog_set = new_landlock_prog_set();
>>>>>>> +             if (IS_ERR(new_prog_set))
>>>>>>> +                     goto put_tmp_lists;
>>>>>>> +     } else if (refcount_read(&current_prog_set->usage) > 1) {
>>>>>>> +             /*
>>>>>>> +              * If the current task is not the sole user of its Landlock
>>>>>>> +              * program set, then duplicate them.
>>>>>>> +              */
>>>>>>> +             new_prog_set = new_landlock_prog_set();
>>>>>>> +             if (IS_ERR(new_prog_set))
>>>>>>> +                     goto put_tmp_lists;
>>>>>>> +             for (i = 0; i < ARRAY_SIZE(new_prog_set->programs); i++) {
>>>>>>> +                     new_prog_set->programs[i] =
>>>>>>> +                             READ_ONCE(current_prog_set->programs[i]);
>>>>>>> +                     if (new_prog_set->programs[i])
>>>>>>> +                             refcount_inc(&new_prog_set->programs[i]->usage);
>>>>>>> +             }
>>>>>>> +
>>>>>>> +             /*
>>>>>>> +              * Landlock program set from the current task will not be freed
>>>>>>> +              * here because the usage is strictly greater than 1. It is
>>>>>>> +              * only prevented to be freed by another task thanks to the
>>>>>>> +              * caller of landlock_prepend_prog() which should be locked if
>>>>>>> +              * needed.
>>>>>>> +              */
>>>>>>> +             landlock_put_prog_set(current_prog_set);
>>>>>>> +     }
>>>>>>> +
>>>>>>> +     /* prepend tmp_prog_set to new_prog_set */
>>>>>>> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++) {
>>>>>>> +             /* get the last new list */
>>>>>>> +             struct landlock_prog_list *last_list =
>>>>>>> +                     tmp_prog_set.programs[i];
>>>>>>> +
>>>>>>> +             if (last_list) {
>>>>>>> +                     while (last_list->prev)
>>>>>>> +                             last_list = last_list->prev;
>>>>>>> +                     /* no need to increment usage (pointer replacement) */
>>>>>>> +                     last_list->prev = new_prog_set->programs[i];
>>>>>>> +                     new_prog_set->programs[i] = tmp_prog_set.programs[i];
>>>>>>> +             }
>>>>>>> +     }
>>>>>>> +     new_prog_set->chain_last = tmp_prog_set.chain_last;
>>>>>>> +     return new_prog_set;
>>>>>>> +
>>>>>>> +put_tmp_lists:
>>>>>>> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++)
>>>>>>> +             put_landlock_prog_list(tmp_prog_set.programs[i]);
>>>>>>> +     return new_prog_set;
>>>>>>> +}
>>>>>>
>>>>>> Nack on the chaining concept.
>>>>>> Please do not reinvent the wheel.
>>>>>> There is an existing mechanism for attaching/detaching/quering multiple
>>>>>> programs attached to cgroup and tracing hooks that are also
>>>>>> efficiently executed via BPF_PROG_RUN_ARRAY.
>>>>>> Please use that instead.
>>>>>>
>>>>>
>>>>> I don't see how that would help.  Suppose you add a filter, then
>>>>> fork(), and then the child adds another filter.  Do you want to
>>>>> duplicate the entire array?  You certainly can't *modify* the array
>>>>> because you'll affect processes that shouldn't be affected.
>>>>>
>>>>> In contrast, doing this through seccomp like the earlier patches
>>>>> seemed just fine to me, and seccomp already had the right logic.
>>>>
>>>> it doesn't look to me that existing seccomp side of managing fork
>>>> situation can be reused. Here there is an attempt to add 'chaining'
>>>> concept which sort of an extension of existing seccomp style,
>>>> but somehow heavily done on bpf side and contradicts cgroup/tracing.
>>>>
>>>
>>> I don't see why the seccomp way can't be used.  I agree with you that
>>> the seccomp *style* shouldn't be used in bpf code like this, but I
>>> think that Landlock programs can and should just live in the existing
>>> seccomp chain.  If the existing seccomp code needs some modification
>>> to make this work, then so be it.
>>
>> +1
>> if that was the case...
>> but that's not my reading of the patch set.
> 
> An earlier version of the patch set used the seccomp filter chain.
> Mickaël, what exactly was wrong with that approach other than that the
> seccomp() syscall was awkward for you to use?  You could add a
> seccomp_add_landlock_rule() syscall if you needed to.

Nothing was wrong with that; this part did not change (see my
next comment).

> 
> As a side comment, why is this an LSM at all, let alone a non-stacking
> LSM?  It would make a lot more sense to me to make Landlock depend on
> having LSMs configured in but to call the landlock hooks directly from
> the security_xyz() hooks.

See Casey's answer and his patch series: https://lwn.net/Articles/741963/

> 
>>
>>> In other words, the kernel already has two kinds of chaining:
>>> seccomp's and bpf's.  bpf's doesn't work right for this type of usage
>>> across fork(), whereas seccomp's already handles that case correctly.
>>> (In contrast, seccomp's is totally wrong for cgroup-attached filters.)
>>>  So IMO Landlock should use the seccomp core code and call into bpf
>>> for the actual filtering.
>>
>> +1
>> in cgroup we had to invent this new BPF_PROG_RUN_ARRAY mechanism,
>> since cgroup hierarchy can be complicated with bpf progs attached
>> at different levels with different override/multiprog properties,
>> so walking link list and checking all flags at run-time would have
>> been too slow. That's why we added compute_effective_progs().
> 
> If we start adding override flags to Landlock, I think we're doing it
> wrong.   With cgroup bpf programs, the whole mess is set up by the
> administrator.  With seccomp, and with Landlock if done correctly, it
> *won't* be set up by the administrator, so the chance that everyone
> gets all the flags right is about zero.  All attached filters should
> run unconditionally.


There is a misunderstanding about this chaining mechanism. This should
not be confused with the list of seccomp filters nor the cgroup
hierarchies. Landlock programs can be stacked the same way seccomp's
filters can (cf. struct landlock_prog_set, the "chain_last" field is an
optimization which is not used for this struct handling). This stackable
property did not change from the previous patch series. The chaining
mechanism is for another use case, which does not make sense for seccomp
filters nor other eBPF program types, at least for now, from what I can
tell.
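
To make the stacking property concrete, here is a simplified user-space
sketch (struct prog_node, the deny_mask field and the helper names are
assumptions made for this example, not the structures from the patch).
Programs are only ever prepended, and existing nodes are shared and
never modified, so a child adding a program after fork() cannot affect
its parent:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical program node; `prev` points at older programs. */
struct prog_node {
	unsigned int deny_mask;	/* stands in for a real eBPF program */
	unsigned int usage;	/* number of references to this node */
	struct prog_node *prev;
};

/*
 * Prepend a program for one task. The caller's reference on `head`
 * is transferred to the new node (pointer replacement), so no
 * counter is incremented and no existing node is modified.
 */
struct prog_node *prog_prepend(struct prog_node *head, unsigned int deny_mask)
{
	struct prog_node *node = malloc(sizeof(*node));

	if (!node)
		return NULL;
	node->deny_mask = deny_mask;
	node->usage = 1;
	node->prev = head;
	return node;
}

/* fork(): the child shares the parent's list and takes a reference. */
struct prog_node *prog_inherit(struct prog_node *head)
{
	if (head) {
		assert(head->usage > 0);
		head->usage++;
	}
	return head;
}

/* Run every program, newest first; any match denies the action. */
int prog_check(const struct prog_node *head, unsigned int action)
{
	const struct prog_node *walker;

	for (walker = head; walker; walker = walker->prev)
		if (walker->deny_mask & action)
			return -1;
	return 0;
}

/* Drop one reference and free the nodes nobody points at anymore. */
void prog_put(struct prog_node *head)
{
	while (head && --head->usage == 0) {
		struct prog_node *prev = head->prev;

		free(head);
		head = prev;
	}
}
```

A parent holding one program and a child that inherits it and then
prepends a second one end up with two list heads sharing the same tail.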

You may want to take a look at my talk at FOSDEM
(https://landlock.io/talks/2018-02-04_landlock-fosdem.pdf), especially
slides 11 and 12.

Let me explain my reasoning about this program chaining thing.

To check if an action on a file is allowed, we first need to identify
this file and match it to the security policy. In a previous
(non-public) patch series, I tried to use one type of eBPF program to
check every kind of access to a file. To be able to identify a file, I
relied on an eBPF map, similar to the current inode map. This map stores
a set of references to file descriptors. I then created a function
bpf_is_file_beneath() to check if the requested file was beneath a file
in the map. This way, no chaining, only one eBPF program type to check
an access to a file... but some issues then emerged. First, this design
creates a side-channel which helps an attacker using such a program to
infer some information not normally available, for example to get a hint
on where a file descriptor (received from a UNIX socket) comes from.
Another issue is that this type of program would be called for each
component of a path. Indeed, when the kernel checks if an access to a
file is allowed, it walks through all of the directories in its path
(checking if the current process is allowed to execute them). That first
attempt led me to rethink the way we could filter an access to a file
*path*.

To minimize the number of calls to an eBPF program dedicated to
validating an access to a file path, I decided to create three subtypes
of eBPF programs. The FS_WALK type is called when walking through every
directory of a file path (except the last one if it is the target). We
can then restrict this type of program to the minimum set of functions
it is allowed to call and the minimum set of data available from its
context. The first implicit chaining is for this type of program. To be
able to evaluate a path while being called for all its components, this
program needs to store a state (to remember what was the parent
directory of this path). There is no "previous" field in the subtype for
this program because it is chained with itself, for each directory. This
makes it possible to create a FS_WALK program to evaluate a file
hierarchy, thanks to the inode map which can be used to check if a
directory of this hierarchy is part of an allowed (or denied) list of
directories. This design makes it possible to express a file hierarchy
in a programmatic way, without requiring an eBPF helper to do the job
(unlike my first experiment).
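
The implicit chaining described above can be sketched in plain
user-space C. This is only a simulation under simplifying assumptions:
struct inode_map, fs_walk_prog() and walk_path() are hypothetical names,
the map is a flat array, and ".." and mount points are ignored. The
walk program is called once per directory and the cookie carries its
state from one call to the next:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COOKIE_BENEATH_ALLOWED 1ULL

/* Hypothetical stand-in for the inode map: a set of allowed directories. */
struct inode_map {
	const unsigned long *inodes;
	size_t len;
};

int inode_map_has(const struct inode_map *map, unsigned long ino)
{
	size_t i;

	for (i = 0; i < map->len; i++)
		if (map->inodes[i] == ino)
			return 1;
	return 0;
}

/*
 * Called once per directory while walking a path. Once an allowed
 * directory has been seen, the cookie remembers that every following
 * component is beneath it.
 */
void fs_walk_prog(const struct inode_map *map, unsigned long dir_ino,
		uint64_t *cookie)
{
	assert(map && cookie);
	if (inode_map_has(map, dir_ino))
		*cookie = COOKIE_BENEATH_ALLOWED;
}

/*
 * Simulates the caller: run the walk program on every directory, then
 * take the decision a chained program would take from the cookie.
 * Returns 0 to allow and non-zero to deny.
 */
int walk_path(const struct inode_map *map, const unsigned long *dirs,
		size_t n)
{
	uint64_t cookie = 0;
	size_t i;

	for (i = 0; i < n; i++)
		fs_walk_prog(map, dirs[i], &cookie);
	return cookie == COOKIE_BENEATH_ALLOWED ? 0 : 1;
}
```

No helper comparing whole paths is needed here: the hierarchy check
falls out of one map lookup per component plus the remembered state.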

The explicit chaining is used to tie a path evaluation (with a FS_WALK
program) to an access to the actual file being requested (the last
component of a file path), with a FS_PICK program. It is only at this
time that the kernel checks for the requested action (e.g. read, write,
chdir, append...). To be able to filter such access requests we could
have one call to the same program for every action and let this program
check for which action it was called. However, this design does not
allow the kernel to know if the current action is indeed handled by this
program. Hence, it is not possible to implement a cache mechanism to
only call this program if it knows how to handle this action.

The approach I took for this FS_PICK type of program is to add to its
subtype which actions it can handle (with the "triggers" bitfield, seen
as ORed actions). This way, the kernel knows if a call to a FS_PICK
program is necessary. If the user wants to enforce a different security
policy according to the action requested on a file, then they need
multiple FS_PICK programs. However, to reduce the number of such
programs, this patch series allows a FS_PICK program to be chained with
another, the same way a FS_WALK is chained with itself. This way, if the
user wants to check if the action is, for example, an "open" and a
"read" and not a "map" and a "read", then they can chain multiple
FS_PICK programs with different trigger actions. The OR check performed
by the kernel is not a limitation then, only a way to know if a call to
an eBPF program is needed.
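
A minimal sketch of that trigger check, assuming made-up action bits
and a hypothetical struct fs_pick_prog (the real series defines its own
set of actions and subtype layout). The caller only runs the programs
whose triggers intersect the requested action:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical action bits, for illustration only. */
#define ACT_OPEN	(1U << 0)
#define ACT_READ	(1U << 1)
#define ACT_WRITE	(1U << 2)
#define ACT_MAP		(1U << 3)

struct fs_pick_prog {
	uint32_t triggers;	/* ORed actions this program handles */
	uint32_t deny;		/* actions it rejects (stands in for eBPF code) */
	unsigned int calls;	/* how often the program actually ran */
};

/* Returns 0 to allow and non-zero to deny. */
int fs_pick_run(struct fs_pick_prog *progs, size_t n, uint32_t action)
{
	size_t i;

	assert(progs || n == 0);
	for (i = 0; i < n; i++) {
		/* skip programs that did not subscribe to this action */
		if (!(progs[i].triggers & action))
			continue;
		progs[i].calls++;
		if (progs[i].deny & action)
			return 1;
	}
	return 0;
}
```

Because the triggers are known before any program runs, the calls
counter stays at zero for programs that are irrelevant to the action,
which is what leaves room for caching decisions later.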

The last type of program is FS_GET. This one is called when a process
gets a struct file or changes its working directory. This is the only
program type able (and allowed) to tag a file. This restriction is
important to avoid being subject to resource exhaustion attacks (i.e.
tagging every inode accessible to an attacker, which would allocate too
much kernel memory).

This design gives room for improvements to create a cache of eBPF
context (input data, including maps if any), with the result of an eBPF
program. This would help limit the number of calls to an eBPF program
the same way SELinux or other kernel components do to limit costly
checks.

The eBPF maps of progs are useful to call the same type of eBPF
program. They do not fit this use case because we may want multiple
eBPF programs according to the action requested on a kernel object (e.g.
FS_GET). The other reason is that the eBPF program does not know what
will be the next (type of) access check performed by the kernel.

To say it another way, this chaining mechanism is a way to split a
kernel object evaluation with multiple specialized programs, each of
them being able to deal with data tied to their type. Using a monolithic
eBPF program to check everything does not scale and does not fit with
unprivileged use either.

As a side note, the cookie value is only an ephemeral value to keep a
state between multiple program calls. It can be used to create a state
machine for an object evaluation.

I don't see a way to do an efficient and programmatic path evaluation,
with different access checks, with the current eBPF features. Please let
me know if you know how to do it another way.



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing
  2018-02-27  4:36 ` [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing Andy Lutomirski
@ 2018-02-27 22:03   ` Mickaël Salaün
  2018-02-27 23:09     ` Andy Lutomirski
  0 siblings, 1 reply; 55+ messages in thread
From: Mickaël Salaün @ 2018-02-27 22:03 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Casey Schaufler, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, James Morris, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development


[-- Attachment #1.1: Type: text/plain, Size: 8758 bytes --]


On 27/02/2018 05:36, Andy Lutomirski wrote:
> On Tue, Feb 27, 2018 at 12:41 AM, Mickaël Salaün <mic@digikod.net> wrote:
>> Hi,
>>
>> This eight series is a major revamp of the Landlock design compared to
>> the previous series [1]. This enables more flexibility and granularity
>> of access control with file paths. It is now possible to enforce an
>> access control according to a file hierarchy. Landlock uses the concept
>> of inode and path to identify such hierarchy. In a way, it brings tools
>> to program what is a file hierarchy.
>>
>> There is now three types of Landlock hooks: FS_WALK, FS_PICK and FS_GET.
>> Each of them accepts a dedicated eBPF program, called a Landlock
>> program.  They can be chained to enforce a full access control according
>> to a list of directories or files. The set of actions on a file is well
>> defined (e.g. read, write, ioctl, append, lock, mount...) taking
>> inspiration from the major Linux LSMs and some other access-controls
>> like Capsicum.  These program types are designed to be cache-friendly,
>> which give room for optimizations in the future.
>>
>> The documentation patch contains some kernel documentation and
>> explanations on how to use Landlock.  The compiled documentation and
>> a talk I gave at FOSDEM can be found here: https://landlock.io
>> This patch series can be found in the branch landlock-v8 in this repo:
>> https://github.com/landlock-lsm/linux
>>
>> There is still some minor issues with this patch series but it should
>> demonstrate how powerful this design may be. One of these issues is that
>> it is not a stackable LSM anymore, but the infrastructure management of
>> security blobs should allow to stack it with other LSM [4].
>>
>> This is the first step of the roadmap discussed at LPC [2].  While the
>> intended final goal is to allow unprivileged users to use Landlock, this
>> series allows only a process with global CAP_SYS_ADMIN to load and
>> enforce a rule.  This may help to get feedback and avoid unexpected
>> behaviors.
>>
>> This series can be applied on top of bpf-next, commit 7d72637eb39f
>> ("Merge branch 'x86-jit'").  This can be tested with
>> CONFIG_SECCOMP_FILTER and CONFIG_SECURITY_LANDLOCK.  I would really
>> appreciate constructive comments on the design and the code.
>>
>>
>> # Landlock LSM
>>
>> The goal of this new Linux Security Module (LSM) called Landlock is to
>> allow any process, including unprivileged ones, to create powerful
>> security sandboxes comparable to XNU Sandbox or OpenBSD Pledge. This
>> kind of sandbox is expected to help mitigate the security impact of bugs
>> or unexpected/malicious behaviors in user-space applications.
>>
>> The approach taken is to add the minimum amount of code while still
>> allowing the user-space application to create quite complex access
>> rules.  A dedicated security policy language such as the one used by
>> SELinux, AppArmor and other major LSMs involves a lot of code and is
>> usually permitted to only a trusted user (i.e. root).  On the contrary,
>> eBPF programs already exist and are designed to be safely loaded by
>> unprivileged user-space.
>>
>> This design does not seem too intrusive but is flexible enough to allow
>> a powerful sandbox mechanism accessible by any process on Linux. The use
>> of seccomp and Landlock is more suitable with the help of a user-space
>> library (e.g.  libseccomp) that could help to specify a high-level
>> language to express a security policy instead of raw eBPF programs.
>> Moreover, thanks to the LLVM front-end, it is quite easy to write an
>> eBPF program with a subset of the C language.
>>
>>
>> # Frequently asked questions
>>
>> ## Why is seccomp-bpf not enough?
>>
>> A seccomp filter can access only raw syscall arguments (i.e. the
>> register values) which means that it is not possible to filter according
>> to the value pointed to by an argument, such as a file pathname. As an
>> embryonic Landlock version demonstrated, filtering at the syscall level
>> is complicated (e.g. need to take care of race conditions). This is
>> mainly because the access control checkpoints of the kernel are not at
>> this high-level but more underneath, at the LSM-hook level. The LSM
>> hooks are designed to handle this kind of checks.  Landlock abstracts
>> this approach to leverage the ability of unprivileged users to limit
>> themselves.
>>
>> Cf. section "What it isn't?" in Documentation/prctl/seccomp_filter.txt
>>
>>
>> ## Why use the seccomp(2) syscall?
>>
>> Landlock use the same semantic as seccomp to apply access rule
>> restrictions. It add a new layer of security for the current process
>> which is inherited by its children. It makes sense to use an unique
>> access-restricting syscall (that should be allowed by seccomp filters)
>> which can only drop privileges. Moreover, a Landlock rule could come
>> from outside a process (e.g.  passed through a UNIX socket). It is then
>> useful to differentiate the creation/load of Landlock eBPF programs via
>> bpf(2), from rule enforcement via seccomp(2).
> 
> This seems like a weak argument to me.  Sure, this is a bit different
> from seccomp(), and maybe shoving it into the seccomp() multiplexer is
> awkward, but surely the bpf() multiplexer is even less applicable.

I think using the seccomp syscall is fine, and everyone agreed on it.

> 
> But I think that you have more in common with seccomp() than you're
> giving it credit for.  With seccomp, you need to either prevent
> ptrace() of any more-privileged task or you need to filter to make
> sure you can't trace a more privileged program.  With landlock, you
> need exactly the same thing.  You have basically the same no_new_privs
> considerations, etc.

Right, I did not mean to not give credit to seccomp at all.

> 
> Also, looking forward, I think you're going to want a bunch of the
> stuff that's under consideration as new seccomp features.  Tycho is
> working on a "user notifier" feature for seccomp where, in addition to
> accepting, rejecting, or kicking to ptrace, you can send a message to
> the creator of the filter and wait for a reply.  I think that Landlock
> will want exactly the same feature.

I don't see why this would be useful at all here. Landlock does not
filter at the syscall level but handles kernel objects and actions as
an LSM does. That is the whole purpose of Landlock.

> 
> In other words, it really seems to be that you should extend seccomp()
> with the ability to attach filters to things that aren't syscall
> entry, e.g. file open.

It seems that this is what I do… I am not sure I understand you here.

> 
> I would also seriously consider doing a scaled-back Landlock variant
> first, with the intent of getting the main mechanism into the kernel.
> In particular, there are two big sources of complexity in Landlock.
> You need to deal with the API for managing bpf programs that filter
> various actions beyond just syscall entry, and you need to deal with
> giving those filters a way to deal with inodes, paths, etc.  But you
> can do the former without the latter.  For example, you could start
> with some Landlock-style filters on things that have nothing to do
> with files.  For example, you could allow a filter for connecting to
> an abstract-namespace unix socket.  Or you could have a hook for
> file_receive.  (You couldn't meaningfully filter based on the *path*
> of the fd being received without adding all the path infrastructure,
> but you could fitler on the *type* of the fd being received.)  Both of
> these add new sandboxing abilities that don't currently exist.  In
> particular, you can't write a seccomp rule that prevents receiving an
> fd using recvmsg() right now unless you block cmsg entirely.  And you
> can't write a filter that allows connecting to unix sockets by path
> without allowing abstract namespace sockets either.

What you are proposing here seems like my previous patch series (v7). As
was suggested by Kees (and requested by future users), this patch
series is able to handle file paths. This is a good thing because it
highlights that this design can scale to evaluate complex objects (i.e.
file paths), which was not the case for the previous patch series.

> 
> If you split up Landlock like this then, once you got all the
> installation and management of filters down, you could submit patches
> to add all the path stuff and deal with that review separately.
> 
> What do you all think?
> 

I may be able to strip some parts for a first inclusion, but controlling
access to files is a complex and important use case.



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions
  2018-02-27  5:01     ` Andy Lutomirski
@ 2018-02-27 22:14       ` Mickaël Salaün
  2018-02-27 23:02         ` Andy Lutomirski
  0 siblings, 1 reply; 55+ messages in thread
From: Mickaël Salaün @ 2018-02-27 22:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Casey Schaufler, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, James Morris, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development


[-- Attachment #1.1: Type: text/plain, Size: 6196 bytes --]


On 27/02/2018 06:01, Andy Lutomirski wrote:
> 
> 
>> On Feb 26, 2018, at 8:17 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>>> On Tue, Feb 27, 2018 at 12:41 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>> A landlocked process has less privileges than a non-landlocked process
>>> and must then be subject to additional restrictions when manipulating
>>> processes. To be allowed to use ptrace(2) and related syscalls on a
>>> target process, a landlocked process must have a subset of the target
>>> process' rules.
>>>
>>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>>> Cc: Alexei Starovoitov <ast@kernel.org>
>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>>> Cc: David S. Miller <davem@davemloft.net>
>>> Cc: James Morris <james.l.morris@oracle.com>
>>> Cc: Kees Cook <keescook@chromium.org>
>>> Cc: Serge E. Hallyn <serge@hallyn.com>
>>> ---
>>>
>>> Changes since v6:
>>> * factor out ptrace check
>>> * constify pointers
>>> * cleanup headers
>>> * use the new security_add_hooks()
>>> ---
>>> security/landlock/Makefile       |   2 +-
>>> security/landlock/hooks_ptrace.c | 124 +++++++++++++++++++++++++++++++++++++++
>>> security/landlock/hooks_ptrace.h |  11 ++++
>>> security/landlock/init.c         |   2 +
>>> 4 files changed, 138 insertions(+), 1 deletion(-)
>>> create mode 100644 security/landlock/hooks_ptrace.c
>>> create mode 100644 security/landlock/hooks_ptrace.h
>>>
>>> diff --git a/security/landlock/Makefile b/security/landlock/Makefile
>>> index d0f532a93b4e..605504d852d3 100644
>>> --- a/security/landlock/Makefile
>>> +++ b/security/landlock/Makefile
>>> @@ -3,4 +3,4 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
>>> landlock-y := init.o chain.o task.o \
>>>        tag.o tag_fs.o \
>>>        enforce.o enforce_seccomp.o \
>>> -       hooks.o hooks_cred.o hooks_fs.o
>>> +       hooks.o hooks_cred.o hooks_fs.o hooks_ptrace.o
>>> diff --git a/security/landlock/hooks_ptrace.c b/security/landlock/hooks_ptrace.c
>>> new file mode 100644
>>> index 000000000000..f1b977b9c808
>>> --- /dev/null
>>> +++ b/security/landlock/hooks_ptrace.c
>>> @@ -0,0 +1,124 @@
>>> +/*
>>> + * Landlock LSM - ptrace hooks
>>> + *
>>> + * Copyright © 2017 Mickaël Salaün <mic@digikod.net>
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License version 2, as
>>> + * published by the Free Software Foundation.
>>> + */
>>> +
>>> +#include <asm/current.h>
>>> +#include <linux/errno.h>
>>> +#include <linux/kernel.h> /* ARRAY_SIZE */
>>> +#include <linux/lsm_hooks.h>
>>> +#include <linux/sched.h> /* struct task_struct */
>>> +#include <linux/seccomp.h>
>>> +
>>> +#include "common.h" /* struct landlock_prog_set */
>>> +#include "hooks.h" /* landlocked() */
>>> +#include "hooks_ptrace.h"
>>> +
>>> +static bool progs_are_subset(const struct landlock_prog_set *parent,
>>> +               const struct landlock_prog_set *child)
>>> +{
>>> +       size_t i;
>>> +
>>> +       if (!parent || !child)
>>> +               return false;
>>> +       if (parent == child)
>>> +               return true;
>>> +
>>> +       for (i = 0; i < ARRAY_SIZE(child->programs); i++) {
>>
>> ARRAY_SIZE(child->programs) seems misleading.  Is there no define
>> NUM_LANDLOCK_PROG_TYPES or similar?
>>
>>> +               struct landlock_prog_list *walker;
>>> +               bool found_parent = false;
>>> +
>>> +               if (!parent->programs[i])
>>> +                       continue;
>>> +               for (walker = child->programs[i]; walker;
>>> +                               walker = walker->prev) {
>>> +                       if (walker == parent->programs[i]) {
>>> +                               found_parent = true;
>>> +                               break;
>>> +                       }
>>> +               }
>>> +               if (!found_parent)
>>> +                       return false;
>>> +       }
>>> +       return true;
>>> +}
>>
>> If you used seccomp, you'd get this type of check for free, and it
>> would be a lot easier to comprehend.  AFAICT the only extra leniency
>> you're granting is that you're agnostic to the order in which the
>> rules associated with different program types were applied, which
>> could easily be added to seccomp.
> 
> On second thought, this is all way too complicated.  I think the correct logic is either "if you are filtered by Landlock, you cannot ptrace anything" or to delete this patch entirely.

This does not fit a lot of use cases like running a container
constrained with some Landlock programs. We should not deny users the
ability to debug their stuff.

This patch adds the minimal protections needed for a Landlock security
policy to be meaningful. Without them, a policy may be easily bypassed,
hence useless.
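
For illustration, the subset check under discussion can be modelled in
plain userspace C. This is a minimal sketch, not the kernel code: the
demo_* types and the three-entry hook enum are hypothetical stand-ins for
struct landlock_prog_set and struct landlock_prog_list, and only the
walking logic is taken from the quoted progs_are_subset().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical hook slots; the real list lives in the Landlock headers. */
enum { DEMO_FS_WALK, DEMO_FS_PICK, DEMO_FS_GET, DEMO_HOOK_COUNT };

/* One enforced program; prev points to the program enforced before it. */
struct demo_prog_list {
	const struct demo_prog_list *prev;
	int id;
};

/* Per-task view: the most recently enforced program for each hook. */
struct demo_prog_set {
	const struct demo_prog_list *programs[DEMO_HOOK_COUNT];
};

/*
 * Return true if every program stack of @parent is reachable from the
 * matching stack of @child, i.e. @child carries at least the
 * restrictions of @parent.
 */
bool demo_progs_are_subset(const struct demo_prog_set *parent,
			   const struct demo_prog_set *child)
{
	size_t i;

	if (!parent || !child)
		return false;
	if (parent == child)
		return true;

	for (i = 0; i < DEMO_ARRAY_SIZE(child->programs); i++) {
		const struct demo_prog_list *walker;
		bool found_parent = false;

		if (!parent->programs[i])
			continue;
		/* Walk from the newest child program back to the oldest. */
		for (walker = child->programs[i]; walker;
				walker = walker->prev) {
			if (walker == parent->programs[i]) {
				found_parent = true;
				break;
			}
		}
		if (!found_parent)
			return false;
	}
	return true;
}
```

With a tracer holding program A and a tracee holding A plus a later
program B for the same hook, the check passes from tracer to tracee but
not the other way round.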


> If something like Tycho's notifiers goes in, then it's not obvious that, just because you have the same set of filters, you have the same privilege.  Similarly, if a feature that lets a filter query its cgroup goes in (and you proposed this once!) then the logic you implemented here is wrong.

I don't get your point. Please take a look at the tests (patch 10).

> 
> Or you could just say that it's the responsibility of a Landlock user to properly filter ptrace() just like it's the responsibility of seccomp users to filter ptrace if needed.

A user should be able to enforce a security policy on ptrace as well,
but this patch enforces a minimal set of security boundaries. It will be
easy to add a new Landlock program type to get this kind of access control.

> 
> I take this as further evidence that Landlock makes much more sense as part of seccomp than as a totally separate thing.  We've very carefully reviewed these things for seccomp.  Please don't make us do it again from scratch.
> 

Landlock is more complex than seccomp, because of its different goal.
seccomp is less restrictive because it is simpler, but this patch
follows the same logic with the knowledge of the Landlock internals.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions
  2018-02-27  4:17   ` Andy Lutomirski
  2018-02-27  5:01     ` Andy Lutomirski
@ 2018-02-27 22:18     ` Mickaël Salaün
  1 sibling, 0 replies; 55+ messages in thread
From: Mickaël Salaün @ 2018-02-27 22:18 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Casey Schaufler, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, James Morris, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development


[-- Attachment #1.1: Type: text/plain, Size: 3461 bytes --]



On 27/02/2018 05:17, Andy Lutomirski wrote:
> On Tue, Feb 27, 2018 at 12:41 AM, Mickaël Salaün <mic@digikod.net> wrote:
>> A landlocked process has less privileges than a non-landlocked process
>> and must then be subject to additional restrictions when manipulating
>> processes. To be allowed to use ptrace(2) and related syscalls on a
>> target process, a landlocked process must have a subset of the target
>> process' rules.
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> Cc: Alexei Starovoitov <ast@kernel.org>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>> Cc: David S. Miller <davem@davemloft.net>
>> Cc: James Morris <james.l.morris@oracle.com>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Serge E. Hallyn <serge@hallyn.com>
>> ---
>>
>> Changes since v6:
>> * factor out ptrace check
>> * constify pointers
>> * cleanup headers
>> * use the new security_add_hooks()
>> ---
>>  security/landlock/Makefile       |   2 +-
>>  security/landlock/hooks_ptrace.c | 124 +++++++++++++++++++++++++++++++++++++++
>>  security/landlock/hooks_ptrace.h |  11 ++++
>>  security/landlock/init.c         |   2 +
>>  4 files changed, 138 insertions(+), 1 deletion(-)
>>  create mode 100644 security/landlock/hooks_ptrace.c
>>  create mode 100644 security/landlock/hooks_ptrace.h
>>
>> diff --git a/security/landlock/Makefile b/security/landlock/Makefile
>> index d0f532a93b4e..605504d852d3 100644
>> --- a/security/landlock/Makefile
>> +++ b/security/landlock/Makefile
>> @@ -3,4 +3,4 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
>>  landlock-y := init.o chain.o task.o \
>>         tag.o tag_fs.o \
>>         enforce.o enforce_seccomp.o \
>> -       hooks.o hooks_cred.o hooks_fs.o
>> +       hooks.o hooks_cred.o hooks_fs.o hooks_ptrace.o
>> diff --git a/security/landlock/hooks_ptrace.c b/security/landlock/hooks_ptrace.c
>> new file mode 100644
>> index 000000000000..f1b977b9c808
>> --- /dev/null
>> +++ b/security/landlock/hooks_ptrace.c
>> @@ -0,0 +1,124 @@
>> +/*
>> + * Landlock LSM - ptrace hooks
>> + *
>> + * Copyright © 2017 Mickaël Salaün <mic@digikod.net>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2, as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +#include <asm/current.h>
>> +#include <linux/errno.h>
>> +#include <linux/kernel.h> /* ARRAY_SIZE */
>> +#include <linux/lsm_hooks.h>
>> +#include <linux/sched.h> /* struct task_struct */
>> +#include <linux/seccomp.h>
>> +
>> +#include "common.h" /* struct landlock_prog_set */
>> +#include "hooks.h" /* landlocked() */
>> +#include "hooks_ptrace.h"
>> +
>> +static bool progs_are_subset(const struct landlock_prog_set *parent,
>> +               const struct landlock_prog_set *child)
>> +{
>> +       size_t i;
>> +
>> +       if (!parent || !child)
>> +               return false;
>> +       if (parent == child)
>> +               return true;
>> +
>> +       for (i = 0; i < ARRAY_SIZE(child->programs); i++) {
> 
> ARRAY_SIZE(child->programs) seems misleading.  Is there no define
> NUM_LANDLOCK_PROG_TYPES or similar?

Yes, there is _LANDLOCK_HOOK_LAST, but this code seems more readable
exactly because it does not require the developer (or the code checking
tools) to know about this static value.
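
As a small illustration of that trade-off, here is a userspace sketch in
which the loop bound is derived from the array itself. The demo_* names
and the enum values are assumptions made for the example; they do not
reflect how _LANDLOCK_HOOK_LAST is actually defined.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical hook numbering, starting at 1 with a trailing alias. */
enum demo_hook {
	DEMO_HOOK_FS_WALK = 1,
	DEMO_HOOK_FS_PICK,
	DEMO_HOOK_FS_GET,
	DEMO_HOOK_LAST = DEMO_HOOK_FS_GET,
};

struct demo_prog_set {
	const void *programs[DEMO_HOOK_LAST];
};

/*
 * Count the populated slots. The bound comes from the array type, so
 * the loop stays correct however the array happens to be sized.
 */
size_t demo_count_loaded(const struct demo_prog_set *set)
{
	size_t i, n = 0;

	for (i = 0; i < DEMO_ARRAY_SIZE(set->programs); i++) {
		if (set->programs[i])
			n++;
	}
	return n;
}
```

A reader of demo_count_loaded() does not need to know whether
DEMO_HOOK_LAST is a count or the last valid value; a named constant
would make the intent explicit but reintroduces that question.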


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions
  2018-02-27 22:14       ` Mickaël Salaün
@ 2018-02-27 23:02         ` Andy Lutomirski
  2018-02-27 23:23           ` Andy Lutomirski
  0 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2018-02-27 23:02 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Casey Schaufler, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, James Morris, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development

On Tue, Feb 27, 2018 at 10:14 PM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On 27/02/2018 06:01, Andy Lutomirski wrote:
>>
>>
>>> On Feb 26, 2018, at 8:17 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>
>>>> On Tue, Feb 27, 2018 at 12:41 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>>> A landlocked process has less privileges than a non-landlocked process
>>>> and must then be subject to additional restrictions when manipulating
>>>> processes. To be allowed to use ptrace(2) and related syscalls on a
>>>> target process, a landlocked process must have a subset of the target
>>>> process' rules.
>>>>
>>>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>>>> Cc: Alexei Starovoitov <ast@kernel.org>
>>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>>>> Cc: David S. Miller <davem@davemloft.net>
>>>> Cc: James Morris <james.l.morris@oracle.com>
>>>> Cc: Kees Cook <keescook@chromium.org>
>>>> Cc: Serge E. Hallyn <serge@hallyn.com>
>>>> ---
>>>>
>>>> Changes since v6:
>>>> * factor out ptrace check
>>>> * constify pointers
>>>> * cleanup headers
>>>> * use the new security_add_hooks()
>>>> ---
>>>> security/landlock/Makefile       |   2 +-
>>>> security/landlock/hooks_ptrace.c | 124 +++++++++++++++++++++++++++++++++++++++
>>>> security/landlock/hooks_ptrace.h |  11 ++++
>>>> security/landlock/init.c         |   2 +
>>>> 4 files changed, 138 insertions(+), 1 deletion(-)
>>>> create mode 100644 security/landlock/hooks_ptrace.c
>>>> create mode 100644 security/landlock/hooks_ptrace.h
>>>>
>>>> diff --git a/security/landlock/Makefile b/security/landlock/Makefile
>>>> index d0f532a93b4e..605504d852d3 100644
>>>> --- a/security/landlock/Makefile
>>>> +++ b/security/landlock/Makefile
>>>> @@ -3,4 +3,4 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
>>>> landlock-y := init.o chain.o task.o \
>>>>        tag.o tag_fs.o \
>>>>        enforce.o enforce_seccomp.o \
>>>> -       hooks.o hooks_cred.o hooks_fs.o
>>>> +       hooks.o hooks_cred.o hooks_fs.o hooks_ptrace.o
>>>> diff --git a/security/landlock/hooks_ptrace.c b/security/landlock/hooks_ptrace.c
>>>> new file mode 100644
>>>> index 000000000000..f1b977b9c808
>>>> --- /dev/null
>>>> +++ b/security/landlock/hooks_ptrace.c
>>>> @@ -0,0 +1,124 @@
>>>> +/*
>>>> + * Landlock LSM - ptrace hooks
>>>> + *
>>>> + * Copyright © 2017 Mickaël Salaün <mic@digikod.net>
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify
>>>> + * it under the terms of the GNU General Public License version 2, as
>>>> + * published by the Free Software Foundation.
>>>> + */
>>>> +
>>>> +#include <asm/current.h>
>>>> +#include <linux/errno.h>
>>>> +#include <linux/kernel.h> /* ARRAY_SIZE */
>>>> +#include <linux/lsm_hooks.h>
>>>> +#include <linux/sched.h> /* struct task_struct */
>>>> +#include <linux/seccomp.h>
>>>> +
>>>> +#include "common.h" /* struct landlock_prog_set */
>>>> +#include "hooks.h" /* landlocked() */
>>>> +#include "hooks_ptrace.h"
>>>> +
>>>> +static bool progs_are_subset(const struct landlock_prog_set *parent,
>>>> +               const struct landlock_prog_set *child)
>>>> +{
>>>> +       size_t i;
>>>> +
>>>> +       if (!parent || !child)
>>>> +               return false;
>>>> +       if (parent == child)
>>>> +               return true;
>>>> +
>>>> +       for (i = 0; i < ARRAY_SIZE(child->programs); i++) {
>>>
>>> ARRAY_SIZE(child->programs) seems misleading.  Is there no define
>>> NUM_LANDLOCK_PROG_TYPES or similar?
>>>
>>>> +               struct landlock_prog_list *walker;
>>>> +               bool found_parent = false;
>>>> +
>>>> +               if (!parent->programs[i])
>>>> +                       continue;
>>>> +               for (walker = child->programs[i]; walker;
>>>> +                               walker = walker->prev) {
>>>> +                       if (walker == parent->programs[i]) {
>>>> +                               found_parent = true;
>>>> +                               break;
>>>> +                       }
>>>> +               }
>>>> +               if (!found_parent)
>>>> +                       return false;
>>>> +       }
>>>> +       return true;
>>>> +}
>>>
>>> If you used seccomp, you'd get this type of check for free, and it
>>> would be a lot easier to comprehend.  AFAICT the only extra leniency
>>> you're granting is that you're agnostic to the order in which the
>>> rules associated with different program types were applied, which
>>> could easily be added to seccomp.
>>
>> On second thought, this is all way too complicated.  I think the correct logic is either "if you are filtered by Landlock, you cannot ptrace anything" or to delete this patch entirely.
>
> This does not fit a lot of use cases like running a container
> constrained with some Landlock programs. We should not deny users the
> ability to debug their stuff.
>
> This patch adds the minimal protections needed for a Landlock security
> policy to be meaningful. Without them, a policy may be easily bypassed,
> hence useless.
>

I think you're wrong here.  Any sane container trying to use Landlock
like this would also create a PID namespace.  Problem solved.  I still
think you should drop this patch.

>
>> If something like Tycho's notifiers goes in, then it's not obvious that, just because you have the same set of filters, you have the same privilege.  Similarly, if a feature that lets a filter query its cgroup goes in (and you proposed this once!) then the logic you implemented here is wrong.
>
> I don't get your point. Please take a look at the tests (patch 10).

I don't know what you want me to look at.

What I'm saying is: suppose I write a filter like this:

bool allow_some_action(void)
{
  int value_from_container_manager = call_out_to_user_notifier();
  return value_from_container_manager == 42;
}

or

bool allow_some_action(void)
{
  return my_cgroup_is("/foo/bar");
}

In both of these cases, your code will do the wrong thing.
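
To make the objection concrete, here is a userspace sketch of the second
case. All names are hypothetical: demo_task, demo_filter and the cgroup
string only model a filter whose verdict depends on per-task state.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct demo_task;

/* A filter is a single decision function, shared by pointer. */
struct demo_filter {
	bool (*allow)(const struct demo_task *task);
};

struct demo_task {
	const char *cgroup;
	const struct demo_filter *filter;
};

/* The verdict depends on the calling task, not only on the filter. */
bool demo_allow_in_foo_bar(const struct demo_task *task)
{
	return strcmp(task->cgroup, "/foo/bar") == 0;
}

/* What a pointer-identity check concludes about two tasks. */
bool demo_same_filter(const struct demo_task *a, const struct demo_task *b)
{
	return a->filter == b->filter;
}

/* What each task may actually do. */
bool demo_allowed(const struct demo_task *task)
{
	return task->filter->allow(task);
}
```

Two tasks in different cgroups can share the very same filter object, so
demo_same_filter() reports them as equivalent while demo_allowed() gives
different answers for each.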

>
>>
>> Or you could just say that it's the responsibility of a Landlock user to properly filter ptrace() just like it's the responsibility of seccomp users to filter ptrace if needed.
>
> A user should be able to enforce a security policy on ptrace as well,
> but this patch enforces a minimal set of security boundaries. It will be
> easy to add a new Landlock program type to get this kind of access control.

It sounds like you want Landlock to be a complete container system all
by itself.  I disagree with that design goal.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing
  2018-02-27 22:03   ` Mickaël Salaün
@ 2018-02-27 23:09     ` Andy Lutomirski
  2018-03-06 22:25       ` Mickaël Salaün
  0 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2018-02-27 23:09 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Casey Schaufler, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, James Morris, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development

On Tue, Feb 27, 2018 at 10:03 PM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On 27/02/2018 05:36, Andy Lutomirski wrote:
>> On Tue, Feb 27, 2018 at 12:41 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>> Hi,
>>>

>>>
>>> ## Why use the seccomp(2) syscall?
>>>
>>> Landlock use the same semantic as seccomp to apply access rule
>>> restrictions. It add a new layer of security for the current process
>>> which is inherited by its children. It makes sense to use an unique
>>> access-restricting syscall (that should be allowed by seccomp filters)
>>> which can only drop privileges. Moreover, a Landlock rule could come
>>> from outside a process (e.g.  passed through a UNIX socket). It is then
>>> useful to differentiate the creation/load of Landlock eBPF programs via
>>> bpf(2), from rule enforcement via seccomp(2).
>>
>> This seems like a weak argument to me.  Sure, this is a bit different
>> from seccomp(), and maybe shoving it into the seccomp() multiplexer is
>> awkward, but surely the bpf() multiplexer is even less applicable.
>
> I think using the seccomp syscall is fine, and everyone agreed on it.
>

Ah, sorry, I completely misread what you wrote.  My apologies.  You
can disregard most of my email.

>
>>
>> Also, looking forward, I think you're going to want a bunch of the
>> stuff that's under consideration as new seccomp features.  Tycho is
>> working on a "user notifier" feature for seccomp where, in addition to
>> accepting, rejecting, or kicking to ptrace, you can send a message to
>> the creator of the filter and wait for a reply.  I think that Landlock
>> will want exactly the same feature.
>
> I don't see why this would be useful at all here. Landlock does not
> filter at the syscall level but handles kernel objects and actions as
> an LSM does. That is the whole purpose of Landlock.

Suppose I'm writing a container manager.  I want to run "mount" in the
container, but I don't want to allow mount() in general and I want to
emulate certain mount() actions.  I can write a filter that catches
mount using seccomp and calls out to the container manager for help.
This isn't theoretical -- Tycho wants *exactly* this use case to be
supported.

But using seccomp for this is indeed annoying.  It would be nice to
use Landlock's ability to filter based on the filesystem type, for
example.  So Tycho could write a Landlock rule like:

bool filter_mount(...)
{
  if (path needs emulation)
    call_user_notifier();
}

And it should work.

This means that, if both seccomp user notifiers and Landlock make it
upstream, then there should probably be a way to have a user notifier
bound to a seccomp filter and a set of landlock filters.
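
A rough userspace model of that idea, assuming a plain callback in place
of the user notifier. Nothing here is an existing kernel interface:
demo_notifier_fn, demo_needs_emulation() and the "proc" rule are invented
purely for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct demo_mount_request {
	const char *fstype;
	const char *target;
};

/* Stand-in for asking the container manager and waiting for a reply. */
typedef bool (*demo_notifier_fn)(const struct demo_mount_request *req);

/* Hypothetical policy: only proc mounts are candidates for emulation. */
bool demo_needs_emulation(const struct demo_mount_request *req)
{
	return strcmp(req->fstype, "proc") == 0;
}

/* Example manager: accepts a proc mount only on /proc. */
bool demo_manager(const struct demo_mount_request *req)
{
	return strcmp(req->target, "/proc") == 0;
}

/*
 * Deny mount by default; delegate the decision to the notifier for
 * requests that need emulation. Without a notifier, deny as well.
 */
bool demo_filter_mount(const struct demo_mount_request *req,
		       demo_notifier_fn notify)
{
	if (demo_needs_emulation(req))
		return notify ? notify(req) : false;
	return false;
}
```

The filter decides on an object-level property (the filesystem type) and
leaves the final verdict for the interesting cases to whoever installed
it.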

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions
  2018-02-27 23:02         ` Andy Lutomirski
@ 2018-02-27 23:23           ` Andy Lutomirski
  2018-02-28  0:00             ` Mickaël Salaün
  0 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2018-02-27 23:23 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mickaël Salaün, LKML, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	Kernel Hardening, Linux API, LSM List, Network Development

On Tue, Feb 27, 2018 at 11:02 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Tue, Feb 27, 2018 at 10:14 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>
>> On 27/02/2018 06:01, Andy Lutomirski wrote:
>>>
>>>
>>>> On Feb 26, 2018, at 8:17 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>
>>>>> On Tue, Feb 27, 2018 at 12:41 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>>>> A landlocked process has less privileges than a non-landlocked process
>>>>> and must then be subject to additional restrictions when manipulating
>>>>> processes. To be allowed to use ptrace(2) and related syscalls on a
>>>>> target process, a landlocked process must have a subset of the target
>>>>> process' rules.
>>>>>
>>>>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>>>>> Cc: Alexei Starovoitov <ast@kernel.org>
>>>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>>>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>>>>> Cc: David S. Miller <davem@davemloft.net>
>>>>> Cc: James Morris <james.l.morris@oracle.com>
>>>>> Cc: Kees Cook <keescook@chromium.org>
>>>>> Cc: Serge E. Hallyn <serge@hallyn.com>
>>>>> ---
>>>>>
>>>>> Changes since v6:
>>>>> * factor out ptrace check
>>>>> * constify pointers
>>>>> * cleanup headers
>>>>> * use the new security_add_hooks()
>>>>> ---
>>>>> security/landlock/Makefile       |   2 +-
>>>>> security/landlock/hooks_ptrace.c | 124 +++++++++++++++++++++++++++++++++++++++
>>>>> security/landlock/hooks_ptrace.h |  11 ++++
>>>>> security/landlock/init.c         |   2 +
>>>>> 4 files changed, 138 insertions(+), 1 deletion(-)
>>>>> create mode 100644 security/landlock/hooks_ptrace.c
>>>>> create mode 100644 security/landlock/hooks_ptrace.h
>>>>>
>>>>> diff --git a/security/landlock/Makefile b/security/landlock/Makefile
>>>>> index d0f532a93b4e..605504d852d3 100644
>>>>> --- a/security/landlock/Makefile
>>>>> +++ b/security/landlock/Makefile
>>>>> @@ -3,4 +3,4 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
>>>>> landlock-y := init.o chain.o task.o \
>>>>>        tag.o tag_fs.o \
>>>>>        enforce.o enforce_seccomp.o \
>>>>> -       hooks.o hooks_cred.o hooks_fs.o
>>>>> +       hooks.o hooks_cred.o hooks_fs.o hooks_ptrace.o
>>>>> diff --git a/security/landlock/hooks_ptrace.c b/security/landlock/hooks_ptrace.c
>>>>> new file mode 100644
>>>>> index 000000000000..f1b977b9c808
>>>>> --- /dev/null
>>>>> +++ b/security/landlock/hooks_ptrace.c
>>>>> @@ -0,0 +1,124 @@
>>>>> +/*
>>>>> + * Landlock LSM - ptrace hooks
>>>>> + *
>>>>> + * Copyright © 2017 Mickaël Salaün <mic@digikod.net>
>>>>> + *
>>>>> + * This program is free software; you can redistribute it and/or modify
>>>>> + * it under the terms of the GNU General Public License version 2, as
>>>>> + * published by the Free Software Foundation.
>>>>> + */
>>>>> +
>>>>> +#include <asm/current.h>
>>>>> +#include <linux/errno.h>
>>>>> +#include <linux/kernel.h> /* ARRAY_SIZE */
>>>>> +#include <linux/lsm_hooks.h>
>>>>> +#include <linux/sched.h> /* struct task_struct */
>>>>> +#include <linux/seccomp.h>
>>>>> +
>>>>> +#include "common.h" /* struct landlock_prog_set */
>>>>> +#include "hooks.h" /* landlocked() */
>>>>> +#include "hooks_ptrace.h"
>>>>> +
>>>>> +static bool progs_are_subset(const struct landlock_prog_set *parent,
>>>>> +               const struct landlock_prog_set *child)
>>>>> +{
>>>>> +       size_t i;
>>>>> +
>>>>> +       if (!parent || !child)
>>>>> +               return false;
>>>>> +       if (parent == child)
>>>>> +               return true;
>>>>> +
>>>>> +       for (i = 0; i < ARRAY_SIZE(child->programs); i++) {
>>>>
>>>> ARRAY_SIZE(child->programs) seems misleading.  Is there no define
>>>> NUM_LANDLOCK_PROG_TYPES or similar?
>>>>
>>>>> +               struct landlock_prog_list *walker;
>>>>> +               bool found_parent = false;
>>>>> +
>>>>> +               if (!parent->programs[i])
>>>>> +                       continue;
>>>>> +               for (walker = child->programs[i]; walker;
>>>>> +                               walker = walker->prev) {
>>>>> +                       if (walker == parent->programs[i]) {
>>>>> +                               found_parent = true;
>>>>> +                               break;
>>>>> +                       }
>>>>> +               }
>>>>> +               if (!found_parent)
>>>>> +                       return false;
>>>>> +       }
>>>>> +       return true;
>>>>> +}
>>>>
>>>> If you used seccomp, you'd get this type of check for free, and it
>>>> would be a lot easier to comprehend.  AFAICT the only extra leniency
>>>> you're granting is that you're agnostic to the order in which the
>>>> rules associated with different program types were applied, which
>>>> could easily be added to seccomp.
>>>
>>> On second thought, this is all way too complicated.  I think the correct logic is either "if you are filtered by Landlock, you cannot ptrace anything" or to delete this patch entirely.
>>
>> This does not fit a lot of use cases like running a container
>> constrained with some Landlock programs. We should not deny users the
>> ability to debug their stuff.
>>
>> This patch adds the minimal protections needed for a Landlock security
>> policy to be meaningful. Without them, a policy may be easily bypassed,
>> hence useless.
>>
>
> I think you're wrong here.  Any sane container trying to use Landlock
> like this would also create a PID namespace.  Problem solved.  I still
> think you should drop this patch.
>
>>
>>> If something like Tycho's notifiers goes in, then it's not obvious that, just because you have the same set of filters, you have the same privilege.  Similarly, if a feature that lets a filter query its cgroup goes in (and you proposed this once!) then the logic you implemented here is wrong.
>>
>> I don't get your point. Please take a look at the tests (patch 10).
>
> I don't know what you want me to look at.
>
> What I'm saying is: suppose I write a filter like this:
>
> bool allow_some_action(void)
> {
>   int value_from_container_manager = call_out_to_user_notifier();
>   return value_from_container_manager == 42;
> }
>
> or
>
> bool allow_some_action(void)
> {
>   return my_cgroup_is("/foo/bar");
> }
>
> In both of these cases, your code will do the wrong thing.
>
>>
>>>
>>> Or you could just say that it's the responsibility of a Landlock user to properly filter ptrace() just like it's the responsibility of seccomp users to filter ptrace if needed.
>>
>> A user should be able to enforce a security policy on ptrace as well,
>> but this patch enforces a minimal set of security boundaries. It will be
>> easy to add a new Landlock program type to get this kind of access control.
>
> It sounds like you want Landlock to be a complete container system all
> by itself.  I disagree with that design goal.

Having actually read your series more correctly now (oops!), I still
think that this patch should be dropped.  I can see an argument for
having a flag that one can set when adding a seccomp filter that says
"prevent ptrace of any child that doesn't have this exact stack
installed", but I think that could be added later and should not be
part of an initial submission.  For now, Landlock users can block
ptrace() entirely or use PID namespaces.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions
  2018-02-27 23:23           ` Andy Lutomirski
@ 2018-02-28  0:00             ` Mickaël Salaün
  2018-02-28  0:09               ` Andy Lutomirski
  0 siblings, 1 reply; 55+ messages in thread
From: Mickaël Salaün @ 2018-02-28  0:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Casey Schaufler, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, James Morris, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development


[-- Attachment #1.1: Type: text/plain, Size: 9732 bytes --]


On 28/02/2018 00:23, Andy Lutomirski wrote:
> On Tue, Feb 27, 2018 at 11:02 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> On Tue, Feb 27, 2018 at 10:14 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>>
>>> On 27/02/2018 06:01, Andy Lutomirski wrote:
>>>>
>>>>
>>>>> On Feb 26, 2018, at 8:17 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>>
>>>>>> On Tue, Feb 27, 2018 at 12:41 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>>>>> A landlocked process has less privileges than a non-landlocked process
>>>>>> and must then be subject to additional restrictions when manipulating
>>>>>> processes. To be allowed to use ptrace(2) and related syscalls on a
>>>>>> target process, a landlocked process must have a subset of the target
>>>>>> process' rules.
>>>>>>
>>>>>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>>>>>> Cc: Alexei Starovoitov <ast@kernel.org>
>>>>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>>>>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>>>>>> Cc: David S. Miller <davem@davemloft.net>
>>>>>> Cc: James Morris <james.l.morris@oracle.com>
>>>>>> Cc: Kees Cook <keescook@chromium.org>
>>>>>> Cc: Serge E. Hallyn <serge@hallyn.com>
>>>>>> ---
>>>>>>
>>>>>> Changes since v6:
>>>>>> * factor out ptrace check
>>>>>> * constify pointers
>>>>>> * cleanup headers
>>>>>> * use the new security_add_hooks()
>>>>>> ---
>>>>>> security/landlock/Makefile       |   2 +-
>>>>>> security/landlock/hooks_ptrace.c | 124 +++++++++++++++++++++++++++++++++++++++
>>>>>> security/landlock/hooks_ptrace.h |  11 ++++
>>>>>> security/landlock/init.c         |   2 +
>>>>>> 4 files changed, 138 insertions(+), 1 deletion(-)
>>>>>> create mode 100644 security/landlock/hooks_ptrace.c
>>>>>> create mode 100644 security/landlock/hooks_ptrace.h
>>>>>>
>>>>>> diff --git a/security/landlock/Makefile b/security/landlock/Makefile
>>>>>> index d0f532a93b4e..605504d852d3 100644
>>>>>> --- a/security/landlock/Makefile
>>>>>> +++ b/security/landlock/Makefile
>>>>>> @@ -3,4 +3,4 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
>>>>>> landlock-y := init.o chain.o task.o \
>>>>>>        tag.o tag_fs.o \
>>>>>>        enforce.o enforce_seccomp.o \
>>>>>> -       hooks.o hooks_cred.o hooks_fs.o
>>>>>> +       hooks.o hooks_cred.o hooks_fs.o hooks_ptrace.o
>>>>>> diff --git a/security/landlock/hooks_ptrace.c b/security/landlock/hooks_ptrace.c
>>>>>> new file mode 100644
>>>>>> index 000000000000..f1b977b9c808
>>>>>> --- /dev/null
>>>>>> +++ b/security/landlock/hooks_ptrace.c
>>>>>> @@ -0,0 +1,124 @@
>>>>>> +/*
>>>>>> + * Landlock LSM - ptrace hooks
>>>>>> + *
>>>>>> + * Copyright © 2017 Mickaël Salaün <mic@digikod.net>
>>>>>> + *
>>>>>> + * This program is free software; you can redistribute it and/or modify
>>>>>> + * it under the terms of the GNU General Public License version 2, as
>>>>>> + * published by the Free Software Foundation.
>>>>>> + */
>>>>>> +
>>>>>> +#include <asm/current.h>
>>>>>> +#include <linux/errno.h>
>>>>>> +#include <linux/kernel.h> /* ARRAY_SIZE */
>>>>>> +#include <linux/lsm_hooks.h>
>>>>>> +#include <linux/sched.h> /* struct task_struct */
>>>>>> +#include <linux/seccomp.h>
>>>>>> +
>>>>>> +#include "common.h" /* struct landlock_prog_set */
>>>>>> +#include "hooks.h" /* landlocked() */
>>>>>> +#include "hooks_ptrace.h"
>>>>>> +
>>>>>> +static bool progs_are_subset(const struct landlock_prog_set *parent,
>>>>>> +               const struct landlock_prog_set *child)
>>>>>> +{
>>>>>> +       size_t i;
>>>>>> +
>>>>>> +       if (!parent || !child)
>>>>>> +               return false;
>>>>>> +       if (parent == child)
>>>>>> +               return true;
>>>>>> +
>>>>>> +       for (i = 0; i < ARRAY_SIZE(child->programs); i++) {
>>>>>
>>>>> ARRAY_SIZE(child->programs) seems misleading.  Is there no define
>>>>> NUM_LANDLOCK_PROG_TYPES or similar?
>>>>>
>>>>>> +               struct landlock_prog_list *walker;
>>>>>> +               bool found_parent = false;
>>>>>> +
>>>>>> +               if (!parent->programs[i])
>>>>>> +                       continue;
>>>>>> +               for (walker = child->programs[i]; walker;
>>>>>> +                               walker = walker->prev) {
>>>>>> +                       if (walker == parent->programs[i]) {
>>>>>> +                               found_parent = true;
>>>>>> +                               break;
>>>>>> +                       }
>>>>>> +               }
>>>>>> +               if (!found_parent)
>>>>>> +                       return false;
>>>>>> +       }
>>>>>> +       return true;
>>>>>> +}
>>>>>
>>>>> If you used seccomp, you'd get this type of check for free, and it
>>>>> would be a lot easier to comprehend.  AFAICT the only extra leniency
>>>>> you're granting is that you're agnostic to the order in which the
>>>>> rules associated with different program types were applied, which
>>>>> could easily be added to seccomp.
>>>>
>>>> On second thought, this is all way too complicated.  I think the correct logic is either "if you are filtered by Landlock, you cannot ptrace anything" or to delete this patch entirely.
>>>
>>> This does not fit a lot of use cases like running a container
>>> constrained with some Landlock programs. We should not deny users the
>>> ability to debug their stuff.
>>>
>>> This patch adds the minimal protections needed to have a
>>> meaningful Landlock security policy. Without them, policies may be
>>> easily bypassable, hence useless.
>>>
>>
>> I think you're wrong here.  Any sane container trying to use Landlock
>> like this would also create a PID namespace.  Problem solved.  I still
>> think you should drop this patch.

Containers are one use case, another is built-in sandboxing (e.g. for web
browsers…) and another is sandbox managers (e.g. Firejail,
Bubblewrap, Flatpak…). In some of these use cases, especially from a
developer point of view, you may want/need to debug your applications
(without requiring root). For nested Landlock access-controls
(e.g. container + user session + web browser), it may not be allowed to
create a PID namespace, but you still want to have a meaningful
access-control.

>>
>>>
>>>> If something like Tycho's notifiers goes in, then it's not obvious that, just because you have the same set of filters, you have the same privilege.  Similarly, if a feature that lets a filter query its cgroup goes in (and you proposed this once!) then the logic you implemented here is wrong.
>>>
>>> I don't get your point. Please take a look at the tests (patch 10).
>>
>> I don't know what you want me to look at.
>>
>> What I'm saying is: suppose I write a filter like this:
>>
>> bool allow_some_action(void)
>> {
>>   int value_from_container_manager = call_out_to_user_notifier();
>>   return value_from_container_manager == 42;
>> }
>>
>> or
>>
>> bool allow_some_action(void)
>> {
>>   return my_cgroup_is("/foo/bar");
>> }
>>
>> In both of these cases, your code will do the wrong thing.

You are right that the same filters/programs may not be
equivalent if they use external data (other than from the eBPF context)
to make a decision. This is why a function like
my_cgroup_is("/foo/bar") should not be allowed. If we want to enforce a
security policy according to a cgroup, the Landlock programs should be
pinned on this cgroup. This way, the kernel knows whether these programs
should be called or not. It is the same argument I used in the thread
[PATCH bpf-next v8 05/11] about the cache.

The only way a Landlock program may change its behavior is because of an
eBPF map. However, in this case the map is common to all the instances
of this program.

To say it another way, Landlock's enforcement API (currently only
seccomp) is in charge of defining what a subject is. By using seccomp to
enforce a policy, the subject is a hierarchy of processes. By pinning a
Landlock program to a cgroup, the subject is the set of processes under
this cgroup. This is much more efficient than letting a program define
its own subjects. This also makes it possible to audit which processes
are restricted by a set of Landlock programs. Because of that, calls to
functions like bpf_get_current_pid_tgid() should not be allowed (or
should be limited) for a Landlock program. Let's make these programs as
pure as possible. :)
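
To make the ptrace rule discussed in this thread concrete, here is a
minimal userspace model of the subset check from the quoted patch. It is
only a sketch: the struct names, the "id" field and NB_PROG_TYPES are
simplified stand-ins (assumptions), not the kernel definitions. The
comparison is by pointer identity, which is only meaningful if two
identical program lists really imply identical restrictions, i.e. if the
programs are pure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: assumed count for FS_WALK, FS_PICK and FS_GET. */
#define NB_PROG_TYPES 3

struct prog_list {
	const struct prog_list *prev;	/* older program of the same type, or NULL */
	int id;				/* stands in for the eBPF program */
};

struct prog_set {
	/* most recently added program for each program type */
	const struct prog_list *programs[NB_PROG_TYPES];
};

/*
 * Return true if, for every program type, the newest program of @parent
 * is reachable by walking the ->prev chain of @child, i.e. @child carries
 * at least the restrictions of @parent.
 */
static bool progs_are_subset(const struct prog_set *parent,
		const struct prog_set *child)
{
	size_t i;

	if (!parent || !child)
		return false;
	if (parent == child)
		return true;

	for (i = 0; i < NB_PROG_TYPES; i++) {
		const struct prog_list *walker;
		bool found_parent = false;

		/* no restriction of this type on the parent side */
		if (!parent->programs[i])
			continue;
		for (walker = child->programs[i]; walker;
				walker = walker->prev) {
			if (walker == parent->programs[i]) {
				found_parent = true;
				break;
			}
		}
		if (!found_parent)
			return false;
	}
	return true;
}
```

In this model a tracer whose set is { A } may trace a tracee whose set
is { A, B } (B chained on top of A), but not the other way around.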


>>
>>>
>>>>
>>>> Or you could just say that it's the responsibility of a Landlock user to properly filter ptrace() just like it's the responsibility of seccomp users to filter ptrace if needed.
>>>
>>> A user should be able to enforce a security policy on ptrace as well,
>>> but this patch enforces a minimal set of security boundaries. It will be
>>> easy to add a new Landlock program type to get this kind of access control.
>>
>> It sounds like you want Landlock to be a complete container system all
>> by itself.  I disagree with that design goal.
> 
> Having actually read your series more correctly now (oops!), I still
> think that this patch should be dropped.  I can see an argument for
> having a flag that one can set when adding a seccomp filter that says
> "prevent ptrace of any child that doesn't have this exact stack
> installed", but I think that could be added later and should not be
> part of an initial submission.  For now, Landlock users can block
> ptrace() entirely or use PID namespaces.
> 

I also thought about using a flag, but we should encourage sane/safe
default behavior, which means at least not having trivially bypassable
access-control rules, so that users do not shoot themselves in the foot.
However, a flag could be added to disable this safe behavior.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions
  2018-02-28  0:00             ` Mickaël Salaün
@ 2018-02-28  0:09               ` Andy Lutomirski
  2018-03-06 22:28                 ` Mickaël Salaün
  0 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2018-02-28  0:09 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Andy Lutomirski, LKML, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	Kernel Hardening, Linux API, LSM List, Network Development

On Wed, Feb 28, 2018 at 12:00 AM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On 28/02/2018 00:23, Andy Lutomirski wrote:
>> On Tue, Feb 27, 2018 at 11:02 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>> On Tue, Feb 27, 2018 at 10:14 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>>>
>>>
>>> I think you're wrong here.  Any sane container trying to use Landlock
>>> like this would also create a PID namespace.  Problem solved.  I still
>>> think you should drop this patch.
>
> Containers are one use case, another is built-in sandboxing (e.g. for web
> browsers…) and another is sandbox managers (e.g. Firejail,
> Bubblewrap, Flatpak…). In some of these use cases, especially from a
> developer point of view, you may want/need to debug your applications
> (without requiring root). For nested Landlock access-controls
> (e.g. container + user session + web browser), it may not be allowed to
> create a PID namespace, but you still want to have a meaningful
> access-control.
>

The consideration should be exactly the same as for normal seccomp.
If I'm in a container (using PID namespaces + seccomp) and I run a web
browser, I can debug the browser.

If there's a real use case for adding this type of automatic ptrace
protection, then by all means, let's add it as a general seccomp
feature.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 01/11] fs,security: Add a security blob to nameidata
  2018-02-27  0:41 ` [PATCH bpf-next v8 01/11] fs,security: Add a security blob to nameidata Mickaël Salaün
  2018-02-27  0:57   ` Al Viro
@ 2018-02-28 16:27   ` kbuild test robot
  2018-02-28 16:58   ` kbuild test robot
  2 siblings, 0 replies; 55+ messages in thread
From: kbuild test robot @ 2018-02-28 16:27 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: kbuild-all, linux-kernel, Mickaël Salaün,
	Alexei Starovoitov, Andy Lutomirski, Arnaldo Carvalho de Melo,
	Casey Schaufler, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, James Morris, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, Alexander Viro,
	James Morris, John Johansen, Stephen Smalley, Tetsuo Handa,
	linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1200 bytes --]

Hi Mickaël,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Micka-l-Sala-n/Landlock-LSM-Toward-unprivileged-sandboxing/20180228-233659
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: x86_64-randconfig-x017-201808 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from fs//quota/dquot.c:74:0:
>> include/linux/security.h:815:10: warning: 'struct nameidata_lookup' declared inside parameter list will not be visible outside of this definition or declaration
      struct nameidata_lookup *lookup, struct inode *inode)
             ^~~~~~~~~~~~~~~~

vim +815 include/linux/security.h

   813	
   814	static inline void security_nameidata_put_lookup(
 > 815			struct nameidata_lookup *lookup, struct inode *inode)
   816	{ }
   817	
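
GCC emits this warning when a struct tag first appears inside a
prototype's parameter list: the type is then scoped to that one
declaration. The usual remedy is a forward declaration at file scope
before the prototype. The following is a standalone sketch of that
pattern, not the actual fix; put_lookup_demo() is a hypothetical helper
added only for illustration, and the report does not say which header or
configuration the real declaration belongs to.

```c
#include <stddef.h>

/*
 * Forward declarations at file scope: both struct tags are now visible to
 * every later prototype, so the compiler no longer creates a new type that
 * is local to a single parameter list.
 */
struct nameidata_lookup;
struct inode;

/* Stub with the same shape as the inline helper quoted in the report. */
static inline void security_nameidata_put_lookup(
		struct nameidata_lookup *lookup, struct inode *inode)
{
	(void)lookup;
	(void)inode;
}

/* Hypothetical caller: it only passes opaque pointers around. */
static inline int put_lookup_demo(void)
{
	security_nameidata_put_lookup(NULL, NULL);
	return 0;
}
```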

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26738 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 01/11] fs,security: Add a security blob to nameidata
  2018-02-27  0:41 ` [PATCH bpf-next v8 01/11] fs,security: Add a security blob to nameidata Mickaël Salaün
  2018-02-27  0:57   ` Al Viro
  2018-02-28 16:27   ` kbuild test robot
@ 2018-02-28 16:58   ` kbuild test robot
  2 siblings, 0 replies; 55+ messages in thread
From: kbuild test robot @ 2018-02-28 16:58 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: kbuild-all, linux-kernel, Mickaël Salaün,
	Alexei Starovoitov, Andy Lutomirski, Arnaldo Carvalho de Melo,
	Casey Schaufler, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, James Morris, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev, Alexander Viro,
	James Morris, John Johansen, Stephen Smalley, Tetsuo Handa,
	linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 2547 bytes --]

Hi Mickaël,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Micka-l-Sala-n/Landlock-LSM-Toward-unprivileged-sandboxing/20180228-233659
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: i386-randconfig-a1-201808 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from init/main.c:37:0:
>> include/linux/security.h:815:43: warning: 'struct nameidata_lookup' declared inside parameter list
      struct nameidata_lookup *lookup, struct inode *inode)
                                              ^
>> include/linux/security.h:815:43: warning: its scope is only this definition or declaration, which is probably not what you want
--
   In file included from fs/namei.c:27:0:
>> include/linux/security.h:815:43: warning: 'struct nameidata_lookup' declared inside parameter list
      struct nameidata_lookup *lookup, struct inode *inode)
                                              ^
>> include/linux/security.h:815:43: warning: its scope is only this definition or declaration, which is probably not what you want
   fs/namei.c: In function 'restore_nameidata':
   fs/namei.c:531:36: error: 'struct nameidata' has no member named 'lookup'
     security_nameidata_put_lookup(&now->lookup, now->inode);
                                       ^
--
   In file included from include/linux/lsm_hooks.h:28:0,
                    from security/commoncap.c:15:
>> include/linux/security.h:815:43: warning: 'struct nameidata_lookup' declared inside parameter list
      struct nameidata_lookup *lookup, struct inode *inode)
                                              ^
>> include/linux/security.h:815:43: warning: its scope is only this definition or declaration, which is probably not what you want
   In file included from security/commoncap.c:15:0:
>> include/linux/lsm_hooks.h:1522:13: warning: 'struct nameidata_lookup' declared inside parameter list
         struct inode *inode);
                ^

vim +815 include/linux/security.h

   813	
   814	static inline void security_nameidata_put_lookup(
 > 815			struct nameidata_lookup *lookup, struct inode *inode)
   816	{ }
   817	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29018 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 06/11] bpf,landlock: Add a new map type: inode
  2018-02-27  0:41 ` [PATCH bpf-next v8 06/11] bpf,landlock: Add a new map type: inode Mickaël Salaün
@ 2018-02-28 17:35   ` kbuild test robot
  0 siblings, 0 replies; 55+ messages in thread
From: kbuild test robot @ 2018-02-28 17:35 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: kbuild-all, linux-kernel, Mickaël Salaün,
	Alexei Starovoitov, Andy Lutomirski, Arnaldo Carvalho de Melo,
	Casey Schaufler, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, James Morris, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, kernel-hardening,
	linux-api, linux-security-module, netdev

Hi Mickaël,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Micka-l-Sala-n/Landlock-LSM-Toward-unprivileged-sandboxing/20180228-233659
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   include/linux/init.h:134:6: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/init.h:135:5: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/init.h:268:6: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/init.h:269:6: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/printk.h:200:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/mem_encrypt.h:32:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/mem_encrypt.h:34:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/mem_encrypt.h:37:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/mem_encrypt.h:38:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/mem_encrypt.h:40:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/mem_encrypt.h:42:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/mem_encrypt.h:43:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/mem_encrypt.h:45:5: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/mem_encrypt.h:46:5: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/mem_encrypt.h:49:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/qspinlock.h:53:32: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/workqueue.h:646:5: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/workqueue.h:647:5: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/numa.h:34:12: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/numa.h:35:13: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/numa.h:62:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/vmalloc.h:64:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/vmalloc.h:173:8: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/vmalloc.h:174:8: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/fixmap.h:174:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/fixmap.h:176:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/fixmap.h:178:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/fixmap.h:180:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/apic.h:254:13: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/apic.h:430:13: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/io_apic.h:184:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/smp.h:113:6: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/smp.h:125:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/smp.h:126:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/percpu.h:110:33: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/percpu.h:112:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/percpu.h:114:12: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/percpu.h:118:12: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/percpu.h:126:12: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/hrtimer.h:497:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/memory_hotplug.h:221:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/mmzone.h:1292:15: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/kmemleak.h:29:33: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/kasan.h:29:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/kasan.h:30:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/pgtable.h:28:5: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/slab.h:135:6: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/slab.h:716:6: sparse: attribute 'indirect_branch': unknown attribute
>> security/landlock/tag.c:127:18: sparse: incompatible types in comparison expression (different address spaces)
   security/landlock/tag.c:257:16: sparse: incompatible types in comparison expression (different address spaces)
   security/landlock/tag.c:263:24: sparse: incompatible types in comparison expression (different address spaces)
   security/landlock/tag.c:357:16: sparse: incompatible types in comparison expression (different address spaces)
--
   arch/x86/include/asm/mem_encrypt.h:37:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/mem_encrypt.h:38:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/mem_encrypt.h:40:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/mem_encrypt.h:42:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/mem_encrypt.h:43:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/mem_encrypt.h:45:5: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/mem_encrypt.h:46:5: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/mem_encrypt.h:49:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/qspinlock.h:53:32: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/workqueue.h:646:5: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/workqueue.h:647:5: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/wait_bit.h:41:13: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/numa.h:34:12: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/numa.h:35:13: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/numa.h:62:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/vmalloc.h:64:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/vmalloc.h:173:8: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/vmalloc.h:174:8: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/fixmap.h:174:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/fixmap.h:176:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/fixmap.h:178:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/fixmap.h:180:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/apic.h:254:13: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/apic.h:430:13: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/io_apic.h:184:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/smp.h:113:6: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/smp.h:125:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/smp.h:126:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/percpu.h:110:33: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/percpu.h:112:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/percpu.h:114:12: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/percpu.h:118:12: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/percpu.h:126:12: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/fs.h:63:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/fs.h:64:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/fs.h:65:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/fs.h:66:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/memory_hotplug.h:221:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/mmzone.h:1292:15: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/fs.h:2422:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/fs.h:2423:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/fs.h:3330:5: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/hrtimer.h:497:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/kmemleak.h:29:33: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/kasan.h:29:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/kasan.h:30:6: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/pgtable.h:28:5: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/slab.h:135:6: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/slab.h:716:6: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/mm.h:1753:6: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/mm.h:1941:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/mm.h:2083:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/mm.h:2671:6: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/swiotlb.h:39:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/swiotlb.h:124:13: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/swiotlb.h:9:12: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/swiotlb.h:10:12: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/swiotlb.h:11:13: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/swiotlb.h:12:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/dma-contiguous.h:85:5: sparse: attribute 'indirect_branch': unknown attribute
   arch/x86/include/asm/vdso.h:44:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/cred.h:167:13: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/nsproxy.h:74:5: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/io.h:47:6: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/netdevice.h:302:5: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/netdevice.h:4056:5: sparse: attribute 'indirect_branch': unknown attribute
   include/linux/ftrace.h:462:6: sparse: attribute 'indirect_branch': unknown attribute
   include/trace/events/bpf.h:59:1: sparse: attribute 'indirect_branch': unknown attribute
   include/trace/events/bpf.h:95:1: sparse: attribute 'indirect_branch': unknown attribute
   include/trace/events/bpf.h:120:1: sparse: attribute 'indirect_branch': unknown attribute
   include/trace/events/bpf.h:150:1: sparse: attribute 'indirect_branch': unknown attribute
   include/trace/events/bpf.h:191:1: sparse: attribute 'indirect_branch': unknown attribute
   include/trace/events/bpf.h:231:1: sparse: attribute 'indirect_branch': unknown attribute
   include/trace/events/bpf.h:285:1: sparse: attribute 'indirect_branch': unknown attribute
   include/trace/events/bpf.h:315:1: sparse: attribute 'indirect_branch': unknown attribute
   include/trace/events/xdp.h:28:1: sparse: attribute 'indirect_branch': unknown attribute
   include/trace/events/xdp.h:53:1: sparse: attribute 'indirect_branch': unknown attribute
   include/trace/events/xdp.h:155:1: sparse: attribute 'indirect_branch': unknown attribute
   include/trace/events/xdp.h:190:1: sparse: attribute 'indirect_branch': unknown attribute
   kernel/bpf/core.c:1549:31: sparse: incorrect type in return expression (different address spaces) @@    expected struct bpf_prog_array [noderef] <asn:4>* @@    got sn:4>* @@
   kernel/bpf/core.c:1549:31:    expected struct bpf_prog_array [noderef] <asn:4>*
   kernel/bpf/core.c:1549:31:    got void *
   kernel/bpf/core.c:1553:17: sparse: incorrect type in return expression (different address spaces) @@    expected struct bpf_prog_array [noderef] <asn:4>* @@    got rray [noderef] <asn:4>* @@
   kernel/bpf/core.c:1553:17:    expected struct bpf_prog_array [noderef] <asn:4>*
   kernel/bpf/core.c:1553:17:    got struct bpf_prog_array *<noident>
   kernel/bpf/core.c:1561:9: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct callback_head *head @@    got struct callback_hstruct callback_head *head @@
   kernel/bpf/core.c:1561:9:    expected struct callback_head *head
   kernel/bpf/core.c:1561:9:    got struct callback_head [noderef] <asn:4>*<noident>
   kernel/bpf/core.c:1624:34: sparse: incorrect type in initializer (different address spaces) @@    expected struct bpf_prog **prog @@    got struct bpf_prog *struct bpf_prog **prog @@
   kernel/bpf/core.c:1624:34:    expected struct bpf_prog **prog
   kernel/bpf/core.c:1624:34:    got struct bpf_prog *[noderef] <asn:4>*<noident>
   kernel/bpf/core.c:1647:31: sparse: incorrect type in assignment (different address spaces) @@    expected struct bpf_prog **existing_prog @@    got struct bpf_prog *struct bpf_prog **existing_prog @@
   kernel/bpf/core.c:1647:31:    expected struct bpf_prog **existing_prog
   kernel/bpf/core.c:1647:31:    got struct bpf_prog *[noderef] <asn:4>*<noident>
   kernel/bpf/core.c:1669:15: sparse: incorrect type in assignment (different address spaces) @@    expected struct bpf_prog_array *array @@    got struct bpf_prog_astruct bpf_prog_array *array @@
   kernel/bpf/core.c:1669:15:    expected struct bpf_prog_array *array
   kernel/bpf/core.c:1669:15:    got struct bpf_prog_array [noderef] <asn:4>*
   kernel/bpf/core.c:1675:31: sparse: incorrect type in assignment (different address spaces) @@    expected struct bpf_prog **[assigned] existing_prog @@    got structstruct bpf_prog **[assigned] existing_prog @@
   kernel/bpf/core.c:1675:31:    expected struct bpf_prog **[assigned] existing_prog
   kernel/bpf/core.c:1675:31:    got struct bpf_prog *[noderef] <asn:4>*<noident>
   include/trace/events/bpf.h:59:1: sparse: Using plain integer as NULL pointer
   include/trace/events/bpf.h:95:1: sparse: Using plain integer as NULL pointer
   include/trace/events/bpf.h:120:1: sparse: Using plain integer as NULL pointer
   include/trace/events/bpf.h:191:1: sparse: Using plain integer as NULL pointer
   include/trace/events/bpf.h:231:1: sparse: Using plain integer as NULL pointer
   include/trace/events/bpf.h:285:1: sparse: too many warnings

vim +127 security/landlock/tag.c

   116	
   117	/* return true if the tag_root is queued for freeing, false otherwise */
   118	static void put_tag_root(struct landlock_tag_root **root,
   119			spinlock_t *root_lock)
   120	{
   121		struct landlock_tag_root *freeme;
   122	
   123		if (!root || WARN_ON(!root_lock))
   124			return;
   125	
   126		rcu_read_lock();
 > 127		freeme = rcu_dereference(*root);
   128		if (WARN_ON(!freeme))
   129			goto out_rcu;
   130		if (!refcount_dec_and_lock(&freeme->tag_nb, root_lock))
   131			goto out_rcu;
   132	
   133		rcu_assign_pointer(*root, NULL);
   134		spin_unlock(root_lock);
   135		call_rcu(&freeme->rcu_put, put_tag_root_rcu);
   136	
   137	out_rcu:
   138		rcu_read_unlock();
   139	}
   140	
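
This class of sparse warning most likely means that a pointer passed to
rcu_dereference() is not annotated with __rcu (here, the "root"
parameter). The sketch below shows the annotation pattern in a form that
also builds outside the kernel. The macro definitions are assumed to
mirror the kernel's sparse-only ones and expand to nothing for a normal
compiler; struct tag_root, tag_root_deref() and tag_root_count() are
hypothetical stand-ins, and the accessor has none of the ordering
guarantees of the real rcu_dereference().

```c
#include <stddef.h>

/* Sparse-only annotations (assumed to mirror the kernel's); no-ops otherwise. */
#ifdef __CHECKER__
# define __rcu   __attribute__((noderef, address_space(4)))
# define __force __attribute__((force))
#else
# define __rcu
# define __force
#endif

/* Simplified stand-in for struct landlock_tag_root. */
struct tag_root {
	int tag_nb;
};

/*
 * Stand-in for rcu_dereference(): strips the __rcu address space so the
 * result can be dereferenced. The slot itself is declared __rcu, which is
 * what keeps sparse from flagging an address-space mismatch.
 */
static struct tag_root *tag_root_deref(struct tag_root __rcu **root)
{
	return (struct tag_root __force *)*root;
}

/* Return the tag count, 0 for an empty slot, or -1 if no slot was given. */
static int tag_root_count(struct tag_root __rcu **root)
{
	struct tag_root *r;

	if (!root)
		return -1;
	r = tag_root_deref(root);
	return r ? r->tag_nb : 0;
}
```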

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing
  2018-02-27 23:09     ` Andy Lutomirski
@ 2018-03-06 22:25       ` Mickaël Salaün
  2018-03-06 22:33         ` Andy Lutomirski
  0 siblings, 1 reply; 55+ messages in thread
From: Mickaël Salaün @ 2018-03-06 22:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Casey Schaufler, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, James Morris, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development





On 28/02/2018 00:09, Andy Lutomirski wrote:
> On Tue, Feb 27, 2018 at 10:03 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>
>> On 27/02/2018 05:36, Andy Lutomirski wrote:
>>> On Tue, Feb 27, 2018 at 12:41 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>>> Hi,
>>>>
> 
>>>>
>>>> ## Why use the seccomp(2) syscall?
>>>>
>>>> Landlock uses the same semantics as seccomp to apply access rule
>>>> restrictions. It adds a new layer of security for the current process
>>>> which is inherited by its children. It makes sense to use a unique
>>>> access-restricting syscall (that should be allowed by seccomp filters)
>>>> which can only drop privileges. Moreover, a Landlock rule could come
>>>> from outside a process (e.g.  passed through a UNIX socket). It is then
>>>> useful to differentiate the creation/load of Landlock eBPF programs via
>>>> bpf(2), from rule enforcement via seccomp(2).
>>>
>>> This seems like a weak argument to me.  Sure, this is a bit different
>>> from seccomp(), and maybe shoving it into the seccomp() multiplexer is
>>> awkward, but surely the bpf() multiplexer is even less applicable.
>>
>> I think using the seccomp syscall is fine, and everyone agreed on it.
>>
> 
> Ah, sorry, I completely misread what you wrote.  My apologies.  You
> can disregard most of my email.
> 
>>
>>>
>>> Also, looking forward, I think you're going to want a bunch of the
>>> stuff that's under consideration as new seccomp features.  Tycho is
>>> working on a "user notifier" feature for seccomp where, in addition to
>>> accepting, rejecting, or kicking to ptrace, you can send a message to
>>> the creator of the filter and wait for a reply.  I think that Landlock
>>> will want exactly the same feature.
>>
>> I don't see why this would be useful at all here. Landlock does not
>> filter at the syscall level but handles kernel objects and actions, as
>> an LSM does. That is the whole purpose of Landlock.
> 
> Suppose I'm writing a container manager.  I want to run "mount" in the
> container, but I don't want to allow mount() in general and I want to
> emulate certain mount() actions.  I can write a filter that catches
> mount using seccomp and calls out to the container manager for help.
> This isn't theoretical -- Tycho wants *exactly* this use case to be
> supported.

Well, I think this use case should be handled with something like
LD_PRELOAD and a helper library. FYI, I did something like this:
https://github.com/stemjail/stemshim

Otherwise, we should think about enabling a process to (dynamically)
extend/patch the vDSO (similar to LD_PRELOAD, but at the syscall level
and working with static binaries) for a subset of processes (the same
way seccomp filters are inherited). It may be more powerful and flexible
than extending the kernel/seccomp to patch (buggy?) userland.

> 
> But using seccomp for this is indeed annoying.  It would be nice to
> use Landlock's ability to filter based on the filesystem type, for
> example.  So Tycho could write a Landlock rule like:
> 
> bool filter_mount(...)
> {
>   if (path needs emulation)
>     call_user_notifier();
> }
> 
> And it should work.
> 
> This means that, if both seccomp user notifiers and Landlock make it
> upstream, then there should probably be a way to have a user notifier
> bound to a seccomp filter and a set of landlock filters.
> 

Combining seccomp filters and Landlock programs may be powerful. However,
for this use case, I think a *post-syscall* vDSO-like mechanism (which
could get some data returned by a Landlock program) may be much more
flexible (with less kernel code). What is needed here is a way to know
the kernel semantics (Landlock) and a way to patch userland without
patching its code (vDSO-like).



* Re: [PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions
  2018-02-28  0:09               ` Andy Lutomirski
@ 2018-03-06 22:28                 ` Mickaël Salaün
  2018-04-01 22:48                   ` Mickaël Salaün
  0 siblings, 1 reply; 55+ messages in thread
From: Mickaël Salaün @ 2018-03-06 22:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Casey Schaufler, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, James Morris, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development




On 28/02/2018 01:09, Andy Lutomirski wrote:
> On Wed, Feb 28, 2018 at 12:00 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>
>> On 28/02/2018 00:23, Andy Lutomirski wrote:
>>> On Tue, Feb 27, 2018 at 11:02 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>>> On Tue, Feb 27, 2018 at 10:14 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>>>>
>>>>
>>>> I think you're wrong here.  Any sane container trying to use Landlock
>>>> like this would also create a PID namespace.  Problem solved.  I still
>>>> think you should drop this patch.
>>
>> Containers are one use case, another is built-in sandboxing (e.g. for web
>> browsers…) and another one is sandbox managers (e.g. Firejail,
>> Bubblewrap, Flatpak…). In some of these use cases, especially from a
>> developer's point of view, you may want/need to debug your applications
>> (without needing to be root). For nested Landlock access controls
>> (e.g. container + user session + web browser), it may not be allowed to
>> create a PID namespace, but you still want to have a meaningful
>> access control.
>>
> 
> The consideration should be exactly the same as for normal seccomp.
> If I'm in a container (using PID namespaces + seccomp) and I run a web
> browser, I can debug the browser.
> 
> If there's a real use case for adding this type of automatic ptrace
> protection, then by all means, let's add it as a general seccomp
> feature.
> 

Right, it makes sense to add this feature to seccomp filters as well.
What do you think, Kees?



* Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing
  2018-03-06 22:25       ` Mickaël Salaün
@ 2018-03-06 22:33         ` Andy Lutomirski
  2018-03-06 22:46           ` Tycho Andersen
  0 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2018-03-06 22:33 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Casey Schaufler, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, James Morris, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development

On Tue, Mar 6, 2018 at 10:25 PM, Mickaël Salaün <mic@digikod.net> wrote:
>
>
> On 28/02/2018 00:09, Andy Lutomirski wrote:
>> On Tue, Feb 27, 2018 at 10:03 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>>
>>> On 27/02/2018 05:36, Andy Lutomirski wrote:
>>>> On Tue, Feb 27, 2018 at 12:41 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>>>> Hi,
>>>>>
>>
>>>>>
>>>>> ## Why use the seccomp(2) syscall?
>>>>>
>>>>> Landlock uses the same semantics as seccomp to apply access rule
>>>>> restrictions. It adds a new layer of security for the current process
>>>>> which is inherited by its children. It makes sense to use a unique
>>>>> access-restricting syscall (that should be allowed by seccomp filters)
>>>>> which can only drop privileges. Moreover, a Landlock rule could come
>>>>> from outside a process (e.g.  passed through a UNIX socket). It is then
>>>>> useful to differentiate the creation/load of Landlock eBPF programs via
>>>>> bpf(2), from rule enforcement via seccomp(2).
>>>>
>>>> This seems like a weak argument to me.  Sure, this is a bit different
>>>> from seccomp(), and maybe shoving it into the seccomp() multiplexer is
>>>> awkward, but surely the bpf() multiplexer is even less applicable.
>>>
>>> I think using the seccomp syscall is fine, and everyone agreed on it.
>>>
>>
>> Ah, sorry, I completely misread what you wrote.  My apologies.  You
>> can disregard most of my email.
>>
>>>
>>>>
>>>> Also, looking forward, I think you're going to want a bunch of the
>>>> stuff that's under consideration as new seccomp features.  Tycho is
>>>> working on a "user notifier" feature for seccomp where, in addition to
>>>> accepting, rejecting, or kicking to ptrace, you can send a message to
>>>> the creator of the filter and wait for a reply.  I think that Landlock
>>>> will want exactly the same feature.
>>>
>>> I don't see why this would be useful at all here. Landlock does not
>>> filter at the syscall level but handles kernel objects and actions, as
>>> an LSM does. That is the whole purpose of Landlock.
>>
>> Suppose I'm writing a container manager.  I want to run "mount" in the
>> container, but I don't want to allow mount() in general and I want to
>> emulate certain mount() actions.  I can write a filter that catches
>> mount using seccomp and calls out to the container manager for help.
>> This isn't theoretical -- Tycho wants *exactly* this use case to be
>> supported.
>
> Well, I think this use case should be handled with something like
> LD_PRELOAD and a helper library. FYI, I did something like this:
> https://github.com/stemjail/stemshim

I doubt that will work for containers.  Containers that use user
namespaces and, for example, setuid programs aren't going to honor
LD_PRELOAD.

>
> Otherwise, we should think about enabling a process to (dynamically)
> extend/patch the vDSO (similar to LD_PRELOAD but at the syscall level
> and works with static binaries) for a subset of processes (the same way
> seccomp filters are inherited). It may be more powerful and flexible
> than extending the kernel/seccomp to patch (buggy?) userland.

Egads!


* Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing
  2018-03-06 22:33         ` Andy Lutomirski
@ 2018-03-06 22:46           ` Tycho Andersen
  2018-03-06 23:06             ` Mickaël Salaün
  0 siblings, 1 reply; 55+ messages in thread
From: Tycho Andersen @ 2018-03-06 22:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mickaël Salaün, LKML, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Will Drewry,
	Kernel Hardening, Linux API, LSM List, Network Development

On Tue, Mar 06, 2018 at 10:33:17PM +0000, Andy Lutomirski wrote:
> >> Suppose I'm writing a container manager.  I want to run "mount" in the
> >> container, but I don't want to allow mount() in general and I want to
> >> emulate certain mount() actions.  I can write a filter that catches
> >> mount using seccomp and calls out to the container manager for help.
> >> This isn't theoretical -- Tycho wants *exactly* this use case to be
> >> supported.
> >
> > Well, I think this use case should be handled with something like
> > LD_PRELOAD and a helper library. FYI, I did something like this:
> > https://github.com/stemjail/stemshim
> 
> I doubt that will work for containers.  Containers that use user
> namespaces and, for example, setuid programs aren't going to honor
> LD_PRELOAD.

Or anything that calls syscalls directly, like go programs.

Tycho
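
A minimal sketch of the point above, assuming Linux with a libc that
provides syscall(2). The helper names pid_via_libc() and
pid_via_raw_syscall() are hypothetical and only serve the illustration.
The first goes through the getpid symbol, which an LD_PRELOAD library
could replace; the second enters the kernel by syscall number, so a
preloaded getpid() is never consulted. Go programs go one step further
and emit the syscall instruction inline, so not even the syscall()
wrapper used here is involved.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolved through the dynamic linker: an LD_PRELOAD library that
 * defines getpid() would be called instead of libc's version. */
pid_t pid_via_libc(void)
{
	return getpid();
}

/* Enters the kernel by syscall number. No getpid symbol is looked up,
 * so an interposed getpid() is bypassed. */
pid_t pid_via_raw_syscall(void)
{
	return (pid_t)syscall(SYS_getpid);
}
```

Both helpers return the same PID, but only the first one can be
redirected by symbol interposition.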


* Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing
  2018-03-06 22:46           ` Tycho Andersen
@ 2018-03-06 23:06             ` Mickaël Salaün
  2018-03-07  1:21               ` Andy Lutomirski
  0 siblings, 1 reply; 55+ messages in thread
From: Mickaël Salaün @ 2018-03-06 23:06 UTC (permalink / raw)
  To: Tycho Andersen, Andy Lutomirski
  Cc: LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Casey Schaufler, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, James Morris, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Will Drewry, Kernel Hardening, Linux API, LSM List,
	Network Development




On 06/03/2018 23:46, Tycho Andersen wrote:
> On Tue, Mar 06, 2018 at 10:33:17PM +0000, Andy Lutomirski wrote:
>>>> Suppose I'm writing a container manager.  I want to run "mount" in the
>>>> container, but I don't want to allow mount() in general and I want to
>>>> emulate certain mount() actions.  I can write a filter that catches
>>>> mount using seccomp and calls out to the container manager for help.
>>>> This isn't theoretical -- Tycho wants *exactly* this use case to be
>>>> supported.
>>>
>>> Well, I think this use case should be handled with something like
>>> LD_PRELOAD and a helper library. FYI, I did something like this:
>>> https://github.com/stemjail/stemshim
>>
>> I doubt that will work for containers.  Containers that use user
>> namespaces and, for example, setuid programs aren't going to honor
>> LD_PRELOAD.
> 
> Or anything that calls syscalls directly, like go programs.

That's why I suggest the vDSO-like approach. Enforcing an access control
is not the issue here; patching a buggy userland (without patching its
code) is the issue, isn't it?

As far as I remember, the main problem is handling file descriptors
while "emulating" the kernel behavior. This can be done with "shim"
code mapped into every process. Chrome used something like this (in a
previous sandbox mechanism) as a kind of emulation (with the current
seccomp-bpf). I think it should be doable to replace the (userland)
emulation code with an IPC wrapper receiving file descriptors through
a UNIX socket.
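
The file descriptor passing mentioned above can be sketched with
SCM_RIGHTS ancillary data over an AF_UNIX socket. This is only an
illustration of the standard mechanism, not code from the patch series
or from Chrome; send_fd() and recv_fd() are hypothetical helper names.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send the descriptor fd over the connected UNIX socket sock.
 * Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
	char dummy = 'F';	/* at least one byte of real data is sent */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment */
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock. Returns the new descriptor,
 * or -1 if none was attached or an error occurred. */
int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

In the scenario discussed here, a broker outside the sandbox would open
the file and call send_fd(), and the shim inside the sandboxed process
would call recv_fd() and hand the result back to the caller as if
open() had succeeded.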



* Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing
  2018-03-06 23:06             ` Mickaël Salaün
@ 2018-03-07  1:21               ` Andy Lutomirski
  2018-03-08 23:51                 ` Mickaël Salaün
  0 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2018-03-07  1:21 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Tycho Andersen, LKML, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Will Drewry,
	Kernel Hardening, Linux API, LSM List, Network Development

On Tue, Mar 6, 2018 at 11:06 PM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On 06/03/2018 23:46, Tycho Andersen wrote:
>> On Tue, Mar 06, 2018 at 10:33:17PM +0000, Andy Lutomirski wrote:
>>>>> Suppose I'm writing a container manager.  I want to run "mount" in the
>>>>> container, but I don't want to allow mount() in general and I want to
>>>>> emulate certain mount() actions.  I can write a filter that catches
>>>>> mount using seccomp and calls out to the container manager for help.
>>>>> This isn't theoretical -- Tycho wants *exactly* this use case to be
>>>>> supported.
>>>>
>>>> Well, I think this use case should be handled with something like
>>>> LD_PRELOAD and a helper library. FYI, I did something like this:
>>>> https://github.com/stemjail/stemshim
>>>
>>> I doubt that will work for containers.  Containers that use user
>>> namespaces and, for example, setuid programs aren't going to honor
>>> LD_PRELOAD.
>>
>> Or anything that calls syscalls directly, like go programs.
>
> That's why the vDSO-like approach. Enforcing an access control is not
> the issue here, patching a buggy userland (without patching its code) is
> the issue isn't it?
>
> As far as I remember, the main problem is to handle file descriptors
> while "emulating" the kernel behavior. This can be done with a "shim"
> code mapped in every processes. Chrome used something like this (in a
> previous sandbox mechanism) as a kind of emulation (with the current
> seccomp-bpf ). I think it should be doable to replace the (userland)
> emulation code with an IPC wrapper receiving file descriptors through
> UNIX socket.
>

Can you explain exactly what you mean by "vDSO-like"?

When a 64-bit program does a syscall, it just executes the SYSCALL
instruction.  The vDSO isn't involved at all.  32-bit programs usually
go through the vDSO, but not always.

It could be possible to force-load a DSO into an entire container and
rig up seccomp to intercept all SYSCALLs not originating from the DSO
such that they merely redirect control to the DSO, but that seems
quite messy.
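
For readers unfamiliar with the vDSO, a small sketch of how a process
can locate it, assuming Linux and a libc with getauxval(3). The names
vdso_base() and looks_like_elf() are hypothetical. The kernel maps the
vDSO into the process and advertises its address in the auxiliary
vector; nothing forces a program to call into it, which is the point
made above about SYSCALL being executed directly.

```c
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/* Address of the vDSO's ELF header as advertised by the kernel,
 * or NULL if the auxiliary vector has no AT_SYSINFO_EHDR entry. */
const void *vdso_base(void)
{
	return (const void *)getauxval(AT_SYSINFO_EHDR);
}

/* Non-zero if base points at something starting with the ELF magic. */
int looks_like_elf(const void *base)
{
	return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```

When a vDSO is present, looks_like_elf(vdso_base()) is expected to be
true: it is an ordinary shared object that the dynamic linker or the
program may use, or ignore.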


* Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing
  2018-03-07  1:21               ` Andy Lutomirski
@ 2018-03-08 23:51                 ` Mickaël Salaün
  2018-03-08 23:53                   ` Andy Lutomirski
  0 siblings, 1 reply; 55+ messages in thread
From: Mickaël Salaün @ 2018-03-08 23:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Tycho Andersen, LKML, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Will Drewry,
	Kernel Hardening, Linux API, LSM List, Network Development




On 07/03/2018 02:21, Andy Lutomirski wrote:
> On Tue, Mar 6, 2018 at 11:06 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>
>> On 06/03/2018 23:46, Tycho Andersen wrote:
>>> On Tue, Mar 06, 2018 at 10:33:17PM +0000, Andy Lutomirski wrote:
>>>>>> Suppose I'm writing a container manager.  I want to run "mount" in the
>>>>>> container, but I don't want to allow mount() in general and I want to
>>>>>> emulate certain mount() actions.  I can write a filter that catches
>>>>>> mount using seccomp and calls out to the container manager for help.
>>>>>> This isn't theoretical -- Tycho wants *exactly* this use case to be
>>>>>> supported.
>>>>>
>>>>> Well, I think this use case should be handled with something like
>>>>> LD_PRELOAD and a helper library. FYI, I did something like this:
>>>>> https://github.com/stemjail/stemshim
>>>>
>>>> I doubt that will work for containers.  Containers that use user
>>>> namespaces and, for example, setuid programs aren't going to honor
>>>> LD_PRELOAD.
>>>
>>> Or anything that calls syscalls directly, like go programs.
>>
>> That's why the vDSO-like approach. Enforcing an access control is not
>> the issue here, patching a buggy userland (without patching its code) is
>> the issue isn't it?
>>
>> As far as I remember, the main problem is to handle file descriptors
>> while "emulating" the kernel behavior. This can be done with a "shim"
>> code mapped in every processes. Chrome used something like this (in a
>> previous sandbox mechanism) as a kind of emulation (with the current
>> seccomp-bpf ). I think it should be doable to replace the (userland)
>> emulation code with an IPC wrapper receiving file descriptors through
>> UNIX socket.
>>
> 
> Can you explain exactly what you mean by "vDSO-like"?
> 
> When a 64-bit program does a syscall, it just executes the SYSCALL
> instruction.  The vDSO isn't involved at all.  32-bit programs usually
> go through the vDSO, but not always.
> 
> It could be possible to force-load a DSO into an entire container and
> rig up seccomp to intercept all SYSCALLs not originating from the DSO
> such that they merely redirect control to the DSO, but that seems
> quite messy.

The vDSO is code mapped into all processes. As you said, these processes
may use it or not. What I was thinking about is to use the same concept,
i.e. map "shim" code into each process belonging to a particular
hierarchy (the same way seccomp filters are inherited across processes).
With a seccomp filter matching some syscalls (e.g. mount, open), it is
possible to jump back to the shim code thanks to SECCOMP_RET_TRAP. This
shim code should then be able to emulate/patch what is needed, even
faking a file opening by receiving a file descriptor through a UNIX
socket. As the Chrome sandbox did, the seccomp filter may look at the
calling address to allow the shim code to call syscalls without being
caught, if needed. However, relying on SIGSYS may not fit with
arbitrary code. A new SECCOMP_RET_EMULATE (?) could be used to jump
to a specific process address, to emulate the syscall in an easier way
than only relying on a {c,e}BPF program.
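
A rough sketch of the SECCOMP_RET_TRAP part of this idea, using the
existing seccomp-bpf interface. It is deliberately incomplete: a real
filter must first check seccomp_data.arch, the calling-address check
and the SIGSYS handler are left out, and SECCOMP_RET_EMULATE does not
exist, so it is not used. The names trap_mount_filter and
install_trap_mount_filter() are hypothetical.

```c
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/* Classic BPF program: trap mount(2), allow everything else.
 * NOTE: a production filter must validate seccomp_data.arch first. */
static struct sock_filter trap_mount_filter[] = {
	/* A = syscall number */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	/* if (A == __NR_mount) fall through, else skip one instruction */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mount, 0, 1),
	/* deliver SIGSYS to the calling thread instead of running mount */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Install the filter on the calling thread; it is inherited by its
 * children. Returns 0 on success, -1 on error. */
int install_trap_mount_filter(void)
{
	struct sock_fprog prog = {
		.len = sizeof(trap_mount_filter) / sizeof(trap_mount_filter[0]),
		.filter = trap_mount_filter,
	};

	/* Required for unprivileged callers before installing a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

The shim would register a SIGSYS handler before calling
install_trap_mount_filter(); the handler receives the syscall number
and arguments in siginfo and the saved register context, and can
emulate the call from there.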



* Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing
  2018-03-08 23:51                 ` Mickaël Salaün
@ 2018-03-08 23:53                   ` Andy Lutomirski
  2018-04-01 22:04                     ` Mickaël Salaün
  0 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2018-03-08 23:53 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Tycho Andersen, LKML, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Will Drewry,
	Kernel Hardening, Linux API, LSM List, Network Development

On Thu, Mar 8, 2018 at 11:51 PM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On 07/03/2018 02:21, Andy Lutomirski wrote:
>> On Tue, Mar 6, 2018 at 11:06 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>>
>>> On 06/03/2018 23:46, Tycho Andersen wrote:
>>>> On Tue, Mar 06, 2018 at 10:33:17PM +0000, Andy Lutomirski wrote:
>>>>>>> Suppose I'm writing a container manager.  I want to run "mount" in the
>>>>>>> container, but I don't want to allow mount() in general and I want to
>>>>>>> emulate certain mount() actions.  I can write a filter that catches
>>>>>>> mount using seccomp and calls out to the container manager for help.
>>>>>>> This isn't theoretical -- Tycho wants *exactly* this use case to be
>>>>>>> supported.
>>>>>>
>>>>>> Well, I think this use case should be handled with something like
>>>>>> LD_PRELOAD and a helper library. FYI, I did something like this:
>>>>>> https://github.com/stemjail/stemshim
>>>>>
>>>>> I doubt that will work for containers.  Containers that use user
>>>>> namespaces and, for example, setuid programs aren't going to honor
>>>>> LD_PRELOAD.
>>>>
>>>> Or anything that calls syscalls directly, like go programs.
>>>
>>> That's why the vDSO-like approach. Enforcing an access control is not
>>> the issue here, patching a buggy userland (without patching its code) is
>>> the issue isn't it?
>>>
>>> As far as I remember, the main problem is to handle file descriptors
>>> while "emulating" the kernel behavior. This can be done with a "shim"
>>> code mapped in every processes. Chrome used something like this (in a
>>> previous sandbox mechanism) as a kind of emulation (with the current
>>> seccomp-bpf ). I think it should be doable to replace the (userland)
>>> emulation code with an IPC wrapper receiving file descriptors through
>>> UNIX socket.
>>>
>>
>> Can you explain exactly what you mean by "vDSO-like"?
>>
>> When a 64-bit program does a syscall, it just executes the SYSCALL
>> instruction.  The vDSO isn't involved at all.  32-bit programs usually
>> go through the vDSO, but not always.
>>
>> It could be possible to force-load a DSO into an entire container and
>> rig up seccomp to intercept all SYSCALLs not originating from the DSO
>> such that they merely redirect control to the DSO, but that seems
>> quite messy.
>
> vDSO is a code mapped for all processes. As you said, these processes
> may use it or not. What I was thinking about is to use the same concept,
> i.e. map a "shim" code into each processes pertaining to a particular
> hierarchy (the same way seccomp filters are inherited across processes).
> With a seccomp filter matching some syscall (e.g. mount, open), it is
> possible to jump back to the shim code thanks to SECCOMP_RET_TRAP. This
> shim code should then be able to emulate/patch what is needed, even
> faking a file opening by receiving a file descriptor through a UNIX
> socket. As did the Chrome sandbox, the seccomp filter may look at the
> calling address to allow the shim code to call syscalls without being
> catched, if needed. However, relying on SIGSYS may not fit with
> arbitrary code. Using a new SECCOMP_RET_EMULATE (?) may be used to jump
> to a specific process address, to emulate the syscall in an easier way
> than only relying on a {c,e}BPF program.
>

This could indeed be done, but I think that Tycho's approach is much
cleaner and probably faster.


* Re: [PATCH bpf-next v8 01/11] fs,security: Add a security blob to nameidata
  2018-02-27  1:23     ` Al Viro
@ 2018-03-11 20:14       ` Mickaël Salaün
  0 siblings, 0 replies; 55+ messages in thread
From: Mickaël Salaün @ 2018-03-11 20:14 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev,
	James Morris, John Johansen, Stephen Smalley, Tetsuo Handa,
	linux-fsdevel




On 02/27/2018 02:23 AM, Al Viro wrote:
> On Tue, Feb 27, 2018 at 12:57:21AM +0000, Al Viro wrote:
>> On Tue, Feb 27, 2018 at 01:41:11AM +0100, Mickaël Salaün wrote:
>>> The function current_nameidata_security(struct inode *) can be used to
>>> retrieve a blob's pointer address tied to the inode being walked through.
>>> This makes it possible to follow a path lookup and know where an inode
>>> access comes from. This is needed for the Landlock LSM to be able to
>>> restrict access to file paths.
>>>
>>> The LSM hook nameidata_free_security(struct inode *) is called before
>>> freeing the associated nameidata.
>>
>> NAK.  Not without well-defined semantics and "some Linux S&M uses that for
>> something, don't ask what" does not count.
> 
> Incidentally, pathwalk mechanics is subject to change at zero notice, so
> if you want something, you'd better
> 	* have explicitly defined semantics
> 	* explain what it is - on fsdevel
> 	* not have it hidden behind the layers of opaque LSM dreck, pardon
> the redundance.
> 
> Again, pathwalk internals have changed in the past and may bloody well
> change again in the future.  There's a damn good reason why struct nameidata
> is _not_ visible outside of fs/namei.c, and quietly relying upon any
> implementation details is no-go.
> 

I thought this whole patch series would go to linux-fsdevel but only
this patch did. I'll CC fsdevel for the next round. Meanwhile, the
cover letter is here: https://lkml.org/lkml/2018/2/26/1214
The code using current_nameidata_lookup(inode) is in patch 07/11:
https://lkml.org/lkml/2018/2/26/1206

To sum up, I don't know of any way to identify whether a directory
(execute) access was directly requested by a process or inferred by the
kernel because of a path walk. This was not needed until now because the
other access control systems (either the DAC or access controls enforced
by inode-based LSMs, i.e. SELinux and Smack) do not care about the file
hierarchy. Path-based access controls (i.e. AppArmor and Tomoyo)
directly use the notion of path to define a security policy (in the
kernel, not only in the user space configuration). Landlock can't rely
on xattrs (because of composed and unprivileged access control). Because
we can't know for sure which path an inode comes from (if any),
path-based LSM hooks do not help for some file system checks (e.g.
inode_permission). With Landlock, I am trying to find a way to identify
a set of inodes, from the user space point of view, which is most of the
time related to file hierarchies.

I needed a way to "follow" a path walk, with the minimum amount of code,
and if possible without touching fs/namei.c. I saw that the
pathwalk mechanism has evolved over time. With this patch, I tried to
make a kernel object (nameidata) usable in some way by LSMs, but only
through an inode (current_nameidata_lookup(inode)). The "only" guarantee
of this function should be to identify whether an inode is tied to a path
walk. This makes it possible to follow a path walk and know why an inode
access is requested.

I get your concern about the "instability" of the path walk mechanism.
However, I thought that path resolution should not change from the user
space point of view, like any other Linux ABI. Anyway, all the current
inode-based access controls, including DAC, rely on this path walk
mechanism. This patch does not expose anything to user space, except
through the API of Landlock, which currently relies on path walk
resolutions, already visible to user space. Did I miss something? Do you
have another suggestion to tie an inode to a path walk?

Thanks,
 Mickaël



* [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing
  2018-03-08 23:53                   ` Andy Lutomirski
@ 2018-04-01 22:04                     ` Mickaël Salaün
  2018-04-02  0:39                       ` Tycho Andersen
  0 siblings, 1 reply; 55+ messages in thread
From: Mickaël Salaün @ 2018-04-01 22:04 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Tycho Andersen, LKML, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Will Drewry,
	Kernel Hardening, Linux API, LSM List, Network Development




On 03/09/2018 12:53 AM, Andy Lutomirski wrote:
> On Thu, Mar 8, 2018 at 11:51 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>
>> On 07/03/2018 02:21, Andy Lutomirski wrote:
>>> On Tue, Mar 6, 2018 at 11:06 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>>>
>>>> On 06/03/2018 23:46, Tycho Andersen wrote:
>>>>> On Tue, Mar 06, 2018 at 10:33:17PM +0000, Andy Lutomirski wrote:
>>>>>>>> Suppose I'm writing a container manager.  I want to run "mount" in the
>>>>>>>> container, but I don't want to allow mount() in general and I want to
>>>>>>>> emulate certain mount() actions.  I can write a filter that catches
>>>>>>>> mount using seccomp and calls out to the container manager for help.
>>>>>>>> This isn't theoretical -- Tycho wants *exactly* this use case to be
>>>>>>>> supported.
>>>>>>>
>>>>>>> Well, I think this use case should be handled with something like
>>>>>>> LD_PRELOAD and a helper library. FYI, I did something like this:
>>>>>>> https://github.com/stemjail/stemshim
>>>>>>
>>>>>> I doubt that will work for containers.  Containers that use user
>>>>>> namespaces and, for example, setuid programs aren't going to honor
>>>>>> LD_PRELOAD.
>>>>>
>>>>> Or anything that calls syscalls directly, like go programs.
>>>>
>>>> Hence the vDSO-like approach. Enforcing an access control is not
>>>> the issue here; patching a buggy userland (without patching its code) is
>>>> the issue, isn't it?
>>>>
>>>> As far as I remember, the main problem is to handle file descriptors
>>>> while "emulating" the kernel behavior. This can be done with "shim"
>>>> code mapped into every process. Chrome used something like this (in a
>>>> previous sandbox mechanism) as a kind of emulation (with the current
>>>> seccomp-bpf). I think it should be doable to replace the (userland)
>>>> emulation code with an IPC wrapper receiving file descriptors through
>>>> a UNIX socket.
>>>>
>>>
>>> Can you explain exactly what you mean by "vDSO-like"?
>>>
>>> When a 64-bit program does a syscall, it just executes the SYSCALL
>>> instruction.  The vDSO isn't involved at all.  32-bit programs usually
>>> go through the vDSO, but not always.
>>>
>>> It could be possible to force-load a DSO into an entire container and
>>> rig up seccomp to intercept all SYSCALLs not originating from the DSO
>>> such that they merely redirect control to the DSO, but that seems
>>> quite messy.
>>
>> The vDSO is code mapped into all processes. As you said, these processes
>> may use it or not. What I was thinking about is to use the same concept,
>> i.e. map "shim" code into each process pertaining to a particular
>> hierarchy (the same way seccomp filters are inherited across processes).
>> With a seccomp filter matching some syscalls (e.g. mount, open), it is
>> possible to jump back to the shim code thanks to SECCOMP_RET_TRAP. This
>> shim code should then be able to emulate/patch what is needed, even
>> faking a file opening by receiving a file descriptor through a UNIX
>> socket. As the Chrome sandbox did, the seccomp filter may look at the
>> calling address to allow the shim code to call syscalls without being
>> caught, if needed. However, relying on SIGSYS may not fit with
>> arbitrary code. A new SECCOMP_RET_EMULATE (?) could be used to jump
>> to a specific process address, to emulate the syscall in an easier way
>> than only relying on a {c,e}BPF program.
>>
> 
> This could indeed be done, but I think that Tycho's approach is much
> cleaner and probably faster.
> 

I like it too, but how does Tycho's approach handle file descriptors?
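
For comparison, the fd-passing half of the shim idea is plain SCM_RIGHTS
over a UNIX socket. A minimal user-space sketch follows; the helper names
send_fd() and recv_fd() are made up for illustration, error handling is
reduced to the bare minimum, and nothing here is part of the patch series:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Hypothetical helper: send fd over the UNIX socket sock (0 or -1). */
static int send_fd(int sock, int fd)
{
	char byte = 0;	/* one byte of ordinary data carries the cmsg */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(fd >= 0);	/* passing an invalid fd is a caller bug */
	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd from sock (new fd or -1). */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

With this scheme the shim itself has to receive the descriptor and hand
its number back to the caller, which is exactly the bookkeeping I would
like to understand for the notification-based approach.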



* Re: [PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions
  2018-03-06 22:28                 ` Mickaël Salaün
@ 2018-04-01 22:48                   ` Mickaël Salaün
  0 siblings, 0 replies; 55+ messages in thread
From: Mickaël Salaün @ 2018-04-01 22:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Casey Schaufler, Daniel Borkmann, David Drysdale,
	David S . Miller, Eric W . Biederman, James Morris, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development




On 03/06/2018 11:28 PM, Mickaël Salaün wrote:
> 
> On 28/02/2018 01:09, Andy Lutomirski wrote:
>> On Wed, Feb 28, 2018 at 12:00 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>>
>>> On 28/02/2018 00:23, Andy Lutomirski wrote:
>>>> On Tue, Feb 27, 2018 at 11:02 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>>>> On Tue, Feb 27, 2018 at 10:14 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>>>>>
>>>>>
>>>>> I think you're wrong here.  Any sane container trying to use Landlock
>>>>> like this would also create a PID namespace.  Problem solved.  I still
>>>>> think you should drop this patch.
>>>
>>> Containers are one use case, another is built-in sandboxing (e.g. for web
>>> browsers…) and another one is sandbox managers (e.g. Firejail,
>>> Bubblewrap, Flatpak…). In some of these use cases, especially from a
>>> developer's point of view, you may want/need to debug your applications
>>> (without needing to be root). For nested Landlock access controls
>>> (e.g. container + user session + web browser), it may not be allowed to
>>> create a PID namespace, but you still want to have meaningful
>>> access control.
>>>
>>
>> The consideration should be exactly the same as for normal seccomp.
>> If I'm in a container (using PID namespaces + seccomp) and I run a web
>> browser, I can debug the browser.
>>
>> If there's a real use case for adding this type of automatic ptrace
>> protection, then by all means, let's add it as a general seccomp
>> feature.
>>
> 
> Right, it makes sense to add this feature to seccomp filters as well.
> What do you think, Kees?
> 

On second thought, it may be useful for seccomp, but it should be a
separate patch series, independent from this one.

The idea to keep in mind is that this ptrace restriction is an automatic
way to define what is called a subject in common access control
vocabulary, as used by SELinux. A subject should not be able to
impersonate another one with fewer restrictions (to get more rights).
Because of the stackable restrictions of Landlock (same principle used
by seccomp), it is easy to identify which subject (i.e. group of
processes) is more restricted (or has different restrictions) than
another. This follows the same principle as Yama's ptrace_scope.
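
To illustrate the comparison, here is a simplified user-space model (not
the code from the patch; struct restriction and may_trace() are names
invented for this sketch). Each subject is represented by the head of its
stack of restrictions, and a tracer may only trace a subject that carries
at least all of the tracer's own restrictions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one stacked restriction (e.g. one program). */
struct restriction {
	const struct restriction *prev;	/* older restriction, or NULL */
	int id;
};

/*
 * A subject is identified by the head of its restriction stack (NULL
 * means unrestricted).  Tracing is allowed only if the tracer's whole
 * stack is also enforced on the tracee, i.e. the tracer's head is
 * reachable by walking the tracee's stack.
 */
static bool may_trace(const struct restriction *tracer,
		      const struct restriction *tracee)
{
	const struct restriction *walker;

	if (!tracer)
		return true;	/* an unrestricted tracer is not limited here */
	for (walker = tracee; walker; walker = walker->prev)
		if (walker == tracer)
			return true;
	return false;
}

/* Usage example: b and c each stack one more restriction on top of a. */
static void may_trace_example(void)
{
	const struct restriction a = { NULL, 1 };
	const struct restriction b = { &a, 2 };
	const struct restriction c = { &a, 3 };

	assert(may_trace(&a, &b));	/* b is more restricted than a */
	assert(!may_trace(&b, &a));	/* a has fewer restrictions than b */
	assert(!may_trace(&b, &c));	/* siblings: different restrictions */
}
```

The pointer comparison works in this model because stacked restrictions
are immutable and shared, so two subjects with a common ancestor share the
tail of their stacks.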

Another important argument for a ptrace-protection mechanism separate
from seccomp is that Landlock programs may be applied (i.e. define a
subject) by other means than a process hierarchy. Another way to
define a Landlock subject may be by using cgroups (which was previously
discussed). I'm also thinking about being able to create (real)
capabilities (not to be confused with POSIX capabilities), which may be
useful to implement some parts of Capsicum, by attaching Landlock
programs to a file descriptor (and not directly to a group of
processes). All this to highlight that the ptrace protection is specific
to Landlock and may not be directly shared with seccomp.

Even if Landlock follows in the footsteps of seccomp, they are different
beasts.



* Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing
  2018-04-01 22:04                     ` Mickaël Salaün
@ 2018-04-02  0:39                       ` Tycho Andersen
  0 siblings, 0 replies; 55+ messages in thread
From: Tycho Andersen @ 2018-04-02  0:39 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Andy Lutomirski, LKML, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Will Drewry,
	Kernel Hardening, Linux API, LSM List, Network Development

Hi Mickaël,

On Mon, Apr 02, 2018 at 12:04:36AM +0200, Mickaël Salaün wrote:
> >> The vDSO is code mapped into all processes. As you said, these processes
> >> may use it or not. What I was thinking about is to use the same concept,
> >> i.e. map "shim" code into each process pertaining to a particular
> >> hierarchy (the same way seccomp filters are inherited across processes).
> >> With a seccomp filter matching some syscalls (e.g. mount, open), it is
> >> possible to jump back to the shim code thanks to SECCOMP_RET_TRAP. This
> >> shim code should then be able to emulate/patch what is needed, even
> >> faking a file opening by receiving a file descriptor through a UNIX
> >> socket. As the Chrome sandbox did, the seccomp filter may look at the
> >> calling address to allow the shim code to call syscalls without being
> >> caught, if needed. However, relying on SIGSYS may not fit with
> >> arbitrary code. A new SECCOMP_RET_EMULATE (?) could be used to jump
> >> to a specific process address, to emulate the syscall in an easier way
> >> than only relying on a {c,e}BPF program.
> >>
> > 
> > This could indeed be done, but I think that Tycho's approach is much
> > cleaner and probably faster.
> > 
> 
> I like it too but how does this handle file descriptors?

I think it could be done fairly simply; the most complicated part is
probably designing an API that doesn't suck. But the basic idea would
be:

struct seccomp_notif_resp {
    __u64 id;
    __s32 error;
    __s64 val;
    __s32 fd;
};

If the handler responds with fd >= 0, we grab the tracer's fd,
duplicate it, and install it somewhere in the tracee's fd table. Since
things like socket() will want to return the fd number as it's
installed and the handler doesn't know that, we'll probably want some
way to indicate that the kernel should return this value. We could
either mandate that if fd >= 0, the installed fd number is the value
returned from the syscall, or add another flag that says "no, install
the fd, but really return what's in val instead".

I guess we can't mandate that we return fd, because e.g. netlink
sockets can sometimes return fds as part of the netlink messages, and
not as the return value from the syscall.
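
To make the second option concrete, here is a rough user-space model of
how the return value could be chosen. It is only a sketch: the
return_val flag and the syscall_result() helper are hypothetical, error
is assumed to hold a negative errno, and the actual duplication of the fd
into the tracee is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the struct above with <stdint.h> types, plus one extra flag. */
struct notif_resp {
	uint64_t id;
	int32_t error;		/* assumed: 0, or a negative errno */
	int64_t val;
	int32_t fd;		/* tracer-side fd to install, or -1 */
	bool return_val;	/* hypothetical: return val, not the new fd */
};

/*
 * installed_fd is the number the duplicated fd received in the tracee's
 * fd table, or -1 if nothing was installed.
 */
static int64_t syscall_result(const struct notif_resp *resp,
			      int installed_fd)
{
	assert(resp != NULL);

	if (resp->error)
		return resp->error;
	if (resp->fd >= 0 && installed_fd >= 0 && !resp->return_val)
		return installed_fd;	/* e.g. an emulated socket() */
	return resp->val;		/* e.g. fd handed over out of band */
}
```

With return_val unset, an emulated socket() would return whatever slot
the fd landed in; with it set, the handler stays in control of the return
value while the fd is still installed.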

Tycho


* Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
  2018-02-27 21:48               ` Mickaël Salaün
@ 2018-04-08 13:13                 ` Mickaël Salaün
  2018-04-08 21:06                   ` Andy Lutomirski
  0 siblings, 1 reply; 55+ messages in thread
From: Mickaël Salaün @ 2018-04-08 13:13 UTC (permalink / raw)
  To: Andy Lutomirski, Alexei Starovoitov, Daniel Borkmann
  Cc: LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Casey Schaufler, David Drysdale, David S . Miller,
	Eric W . Biederman, Jann Horn, Jonathan Corbet, Michael Kerrisk,
	Kees Cook, Paul Moore, Sargun Dhillon, Serge E . Hallyn,
	Shuah Khan, Tejun Heo, Thomas Graf, Tycho Andersen, Will Drewry,
	Kernel Hardening, Linux API, LSM List, Network Development,
	Andrew Morton




On 02/27/2018 10:48 PM, Mickaël Salaün wrote:
> 
> On 27/02/2018 17:39, Andy Lutomirski wrote:
>> On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>>> On Tue, Feb 27, 2018 at 05:20:55AM +0000, Andy Lutomirski wrote:
>>>> On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov
>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>> On Tue, Feb 27, 2018 at 04:40:34AM +0000, Andy Lutomirski wrote:
>>>>>> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov
>>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>>> On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote:
>>>>>>>> The seccomp(2) syscall can be used by a task to apply a Landlock program
>>>>>>>> to itself. Like a seccomp filter, a Landlock program is enforced for the
>>>>>>>> current task and all its future children. A program is immutable and a
>>>>>>>> task can only add new restricting programs to itself, forming a list of
>>>>>>>> programs.
>>>>>>>>
>>>>>>>> A Landlock program is tied to a Landlock hook. If the action on a kernel
>>>>>>>> object is allowed by the other Linux security mechanisms (e.g. DAC,
>>>>>>>> capabilities, other LSM), then a Landlock hook related to this kind of
>>>>>>>> object is triggered. The list of programs for this hook is then
>>>>>>>> evaluated. Each program returns a 32-bit value which can deny the action
>>>>>>>> on a kernel object with a non-zero value. If every program of the list
>>>>>>>> returns zero, then the action on the object is allowed.
>>>>>>>>
>>>>>>>> Multiple Landlock programs can be chained to share a 64-bit value for a
>>>>>>>> call chain (e.g. evaluating multiple elements of a file path).  This
>>>>>>>> chaining is restricted when a process constructs this chain by loading a
>>>>>>>> program, but additional checks are performed when it requests to apply
>>>>>>>> this chain of programs to itself.  The restrictions ensure that it is
>>>>>>>> not possible to call multiple programs in a way that would imply
>>>>>>>> handling multiple shared values (i.e. cookies) for one chain.  For now,
>>>>>>>> only a fs_pick program can be chained to the same type of program,
>>>>>>>> because it may make sense if they have different triggers (cf. next
>>>>>>>> commits).  These restrictions still allow reusing Landlock programs in
>>>>>>>> a safe way (e.g. use the same loaded fs_walk program with multiple
>>>>>>>> chains of fs_pick programs).
>>>>>>>>
>>>>>>>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>> +struct landlock_prog_set *landlock_prepend_prog(
>>>>>>>> +             struct landlock_prog_set *current_prog_set,
>>>>>>>> +             struct bpf_prog *prog)
>>>>>>>> +{
>>>>>>>> +     struct landlock_prog_set *new_prog_set = current_prog_set;
>>>>>>>> +     unsigned long pages;
>>>>>>>> +     int err;
>>>>>>>> +     size_t i;
>>>>>>>> +     struct landlock_prog_set tmp_prog_set = {};
>>>>>>>> +
>>>>>>>> +     if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK)
>>>>>>>> +             return ERR_PTR(-EINVAL);
>>>>>>>> +
>>>>>>>> +     /* validate memory size allocation */
>>>>>>>> +     pages = prog->pages;
>>>>>>>> +     if (current_prog_set) {
>>>>>>>> +             size_t i;
>>>>>>>> +
>>>>>>>> +             for (i = 0; i < ARRAY_SIZE(current_prog_set->programs); i++) {
>>>>>>>> +                     struct landlock_prog_list *walker_p;
>>>>>>>> +
>>>>>>>> +                     for (walker_p = current_prog_set->programs[i];
>>>>>>>> +                                     walker_p; walker_p = walker_p->prev)
>>>>>>>> +                             pages += walker_p->prog->pages;
>>>>>>>> +             }
>>>>>>>> +             /* count a struct landlock_prog_set if we need to allocate one */
>>>>>>>> +             if (refcount_read(&current_prog_set->usage) != 1)
>>>>>>>> +                     pages += round_up(sizeof(*current_prog_set), PAGE_SIZE)
>>>>>>>> +                             / PAGE_SIZE;
>>>>>>>> +     }
>>>>>>>> +     if (pages > LANDLOCK_PROGRAMS_MAX_PAGES)
>>>>>>>> +             return ERR_PTR(-E2BIG);
>>>>>>>> +
>>>>>>>> +     /* ensure early that we can allocate enough memory for the new
>>>>>>>> +      * prog_lists */
>>>>>>>> +     err = store_landlock_prog(&tmp_prog_set, current_prog_set, prog);
>>>>>>>> +     if (err)
>>>>>>>> +             return ERR_PTR(err);
>>>>>>>> +
>>>>>>>> +     /*
>>>>>>>> +      * Each task_struct points to an array of prog list pointers.  These
>>>>>>>> +      * tables are duplicated when additions are made (which means each
>>>>>>>> +      * table needs to be refcounted for the processes using it). When a new
>>>>>>>> +      * table is created, all the refcounters on the prog_list are bumped (to
>>>>>>>> +      * track each table that references the prog). When a new prog is
>>>>>>>> +      * added, it's just prepended to the list for the new table to point
>>>>>>>> +      * at.
>>>>>>>> +      *
>>>>>>>> +      * Manage all the possible errors before this step to not uselessly
>>>>>>>> +      * duplicate current_prog_set and avoid a rollback.
>>>>>>>> +      */
>>>>>>>> +     if (!new_prog_set) {
>>>>>>>> +             /*
>>>>>>>> +              * If there is no Landlock program set used by the current task,
>>>>>>>> +              * then create a new one.
>>>>>>>> +              */
>>>>>>>> +             new_prog_set = new_landlock_prog_set();
>>>>>>>> +             if (IS_ERR(new_prog_set))
>>>>>>>> +                     goto put_tmp_lists;
>>>>>>>> +     } else if (refcount_read(&current_prog_set->usage) > 1) {
>>>>>>>> +             /*
>>>>>>>> +              * If the current task is not the sole user of its Landlock
>>>>>>>> +              * program set, then duplicate them.
>>>>>>>> +              */
>>>>>>>> +             new_prog_set = new_landlock_prog_set();
>>>>>>>> +             if (IS_ERR(new_prog_set))
>>>>>>>> +                     goto put_tmp_lists;
>>>>>>>> +             for (i = 0; i < ARRAY_SIZE(new_prog_set->programs); i++) {
>>>>>>>> +                     new_prog_set->programs[i] =
>>>>>>>> +                             READ_ONCE(current_prog_set->programs[i]);
>>>>>>>> +                     if (new_prog_set->programs[i])
>>>>>>>> +                             refcount_inc(&new_prog_set->programs[i]->usage);
>>>>>>>> +             }
>>>>>>>> +
>>>>>>>> +             /*
>>>>>>>> +              * Landlock program set from the current task will not be freed
>>>>>>>> +              * here because the usage is strictly greater than 1. It is
>>>>>>>> +              * only prevented to be freed by another task thanks to the
>>>>>>>> +              * caller of landlock_prepend_prog() which should be locked if
>>>>>>>> +              * needed.
>>>>>>>> +              */
>>>>>>>> +             landlock_put_prog_set(current_prog_set);
>>>>>>>> +     }
>>>>>>>> +
>>>>>>>> +     /* prepend tmp_prog_set to new_prog_set */
>>>>>>>> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++) {
>>>>>>>> +             /* get the last new list */
>>>>>>>> +             struct landlock_prog_list *last_list =
>>>>>>>> +                     tmp_prog_set.programs[i];
>>>>>>>> +
>>>>>>>> +             if (last_list) {
>>>>>>>> +                     while (last_list->prev)
>>>>>>>> +                             last_list = last_list->prev;
>>>>>>>> +                     /* no need to increment usage (pointer replacement) */
>>>>>>>> +                     last_list->prev = new_prog_set->programs[i];
>>>>>>>> +                     new_prog_set->programs[i] = tmp_prog_set.programs[i];
>>>>>>>> +             }
>>>>>>>> +     }
>>>>>>>> +     new_prog_set->chain_last = tmp_prog_set.chain_last;
>>>>>>>> +     return new_prog_set;
>>>>>>>> +
>>>>>>>> +put_tmp_lists:
>>>>>>>> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++)
>>>>>>>> +             put_landlock_prog_list(tmp_prog_set.programs[i]);
>>>>>>>> +     return new_prog_set;
>>>>>>>> +}
>>>>>>>
>>>>>>> Nack on the chaining concept.
>>>>>>> Please do not reinvent the wheel.
>>>>>>> There is an existing mechanism for attaching/detaching/querying multiple
>>>>>>> programs attached to cgroup and tracing hooks that are also
>>>>>>> efficiently executed via BPF_PROG_RUN_ARRAY.
>>>>>>> Please use that instead.
>>>>>>>
>>>>>>
>>>>>> I don't see how that would help.  Suppose you add a filter, then
>>>>>> fork(), and then the child adds another filter.  Do you want to
>>>>>> duplicate the entire array?  You certainly can't *modify* the array
>>>>>> because you'll affect processes that shouldn't be affected.
>>>>>>
>>>>>> In contrast, doing this through seccomp like the earlier patches
>>>>>> seemed just fine to me, and seccomp already had the right logic.
>>>>>
>>>>> It doesn't look to me like the existing seccomp way of managing the fork
>>>>> situation can be reused. Here there is an attempt to add a 'chaining'
>>>>> concept which is sort of an extension of the existing seccomp style,
>>>>> but somehow heavily done on the bpf side and contradicts cgroup/tracing.
>>>>>
>>>>
>>>> I don't see why the seccomp way can't be used.  I agree with you that
>>>> the seccomp *style* shouldn't be used in bpf code like this, but I
>>>> think that Landlock programs can and should just live in the existing
>>>> seccomp chain.  If the existing seccomp code needs some modification
>>>> to make this work, then so be it.
>>>
>>> +1
>>> if that was the case...
>>> but that's not my reading of the patch set.
>>
>> An earlier version of the patch set used the seccomp filter chain.
>> Mickaël, what exactly was wrong with that approach other than that the
>> seccomp() syscall was awkward for you to use?  You could add a
>> seccomp_add_landlock_rule() syscall if you needed to.
> 
> Nothing was wrong with that; this part did not change (see my
> next comment).
> 
>>
>> As a side comment, why is this an LSM at all, let alone a non-stacking
>> LSM?  It would make a lot more sense to me to make Landlock depend on
>> having LSMs configured in but to call the landlock hooks directly from
>> the security_xyz() hooks.
> 
> See Casey's answer and his patch series: https://lwn.net/Articles/741963/
> 
>>
>>>
>>>> In other words, the kernel already has two kinds of chaining:
>>>> seccomp's and bpf's.  bpf's doesn't work right for this type of usage
>>>> across fork(), whereas seccomp's already handles that case correctly.
>>>> (In contrast, seccomp's is totally wrong for cgroup-attached filters.)
>>>>  So IMO Landlock should use the seccomp core code and call into bpf
>>>> for the actual filtering.
>>>
>>> +1
>>> In cgroup we had to invent this new BPF_PROG_RUN_ARRAY mechanism,
>>> since a cgroup hierarchy can be complicated, with bpf progs attached
>>> at different levels with different override/multiprog properties,
>>> so walking a linked list and checking all flags at run-time would have
>>> been too slow. That's why we added compute_effective_progs().
>>
>> If we start adding override flags to Landlock, I think we're doing it
>> wrong.   With cgroup bpf programs, the whole mess is set up by the
>> administrator.  With seccomp, and with Landlock if done correctly, it
>> *won't* be set up by the administrator, so the chance that everyone
>> gets all the flags right is about zero.  All attached filters should
>> run unconditionally.
> 
> 
> There is a misunderstanding about this chaining mechanism. It should
> not be confused with the list of seccomp filters nor with cgroup
> hierarchies. Landlock programs can be stacked the same way seccomp
> filters can (cf. struct landlock_prog_set; the "chain_last" field is an
> optimization which is not used for this struct handling). This stackable
> property did not change from the previous patch series. The chaining
> mechanism is for another use case, which does not make sense for seccomp
> filters nor other eBPF program types, at least for now, from what I can
> tell.
> 
> You may want to take a look at my talk at FOSDEM
> (https://landlock.io/talks/2018-02-04_landlock-fosdem.pdf), especially
> slides 11 and 12.
> 
> Let me explain my reasoning about this program chaining thing.
> 
> To check if an action on a file is allowed, we first need to identify
> this file and match it to the security policy. In a previous
> (non-public) patch series, I tried to use one type of eBPF program to
> check every kind of access to a file. To be able to identify a file, I
> relied on an eBPF map, similar to the current inode map. This map stored
> a set of references to file descriptors. I then created a function
> bpf_is_file_beneath() to check if the requested file was beneath a file
> in the map. This way, no chaining, only one eBPF program type to check
> an access to a file... but some issues then emerged. First, this design
> creates a side channel which helps an attacker using such a program to
> infer some information not normally available, for example to get a hint
> about where a file descriptor (received from a UNIX socket) comes from.
> Another issue is that this type of program would be called for each
> component of a path. Indeed, when the kernel checks if an access to a
> file is allowed, it walks through all of the directories in its path
> (checking if the current process is allowed to execute them). That first
> attempt led me to rethink the way we could filter an access to a file
> *path*.
> 
> To minimize the number of calls to an eBPF program dedicated to
> validating an access to a file path, I decided to create three subtypes of
> eBPF programs. The FS_WALK type is called when walking through every
> directory of a file path (except the last one if it is the target). We
> can then restrict this type of program to the minimum set of functions
> it is allowed to call and the minimum set of data available from its
> context. The first implicit chaining is for this type of program. To be
> able to evaluate a path while being called for all its components, this
> program needs to store a state (to remember what the parent directory
> of this path was). There is no "previous" field in the subtype for this
> program because it is chained with itself, for each directory. This
> makes it possible to create a FS_WALK program to evaluate a file
> hierarchy, thanks to the inode map which can be used to check if a
> directory of this hierarchy is part of an allowed (or denied) list of
> directories. This design makes it possible to express a file hierarchy
> in a programmatic way, without requiring an eBPF helper to do the job
> (unlike my first experiment).
> 
> The explicit chaining is used to tie a path evaluation (with a FS_WALK
> program) to an access to the actual file being requested (the last
> component of a file path), with a FS_PICK program. It is only at this
> time that the kernel checks for the requested action (e.g. read, write,
> chdir, append...). To be able to filter such access requests, we could
> have one call to the same program for every action and let this program
> check for which action it was called. However, this design does not
> allow the kernel to know if the current action is indeed handled by this
> program. Hence, it is not possible to implement a cache mechanism to
> only call this program if it knows how to handle this action.
> 
> The approach I took for this FS_PICK type of program is to add to its
> subtype which actions it can handle (with the "triggers" bitfield, seen
> as ORed actions). This way, the kernel knows if a call to a FS_PICK
> program is necessary. If the user wants to enforce a different security
> policy according to the action requested on a file, then they need
> multiple FS_PICK programs. However, to reduce the number of such
> programs, this patch series allows a FS_PICK program to be chained with
> another, the same way a FS_WALK is chained with itself. This way, if the
> user wants to check if the action is, for example, an "open" and a "read"
> and not a "map" and a "read", then they can chain multiple FS_PICK
> programs with different trigger actions. The OR check performed by the
> kernel is not a limitation then, only a way to know if a call to an eBPF
> program is needed.
> 
> The last type of program is FS_GET. This one is called when a process
> gets a struct file or changes its working directory. This is the only
> program type able (and allowed) to tag a file. This restriction is
> important to avoid being subject to resource exhaustion attacks (i.e.
> tagging every inode accessible to an attacker, which would allocate too
> much kernel memory).
> 
> This design leaves room for improvements such as a cache of eBPF
> contexts (input data, including maps if any) with the result of an eBPF
> program. This would help limit the number of calls to an eBPF program the
> same way SELinux or other kernel components do to limit costly checks.
> 
> The eBPF maps of progs are useful to call the same type of eBPF
> program. They do not fit this use case because we may want multiple
> eBPF programs according to the action requested on a kernel object (e.g.
> FS_GET). The other reason is that the eBPF program does not know what
> will be the next (type of) access check performed by the kernel.
> 
> To put it another way, this chaining mechanism is a way to split a
> kernel object evaluation across multiple specialized programs, each of
> them being able to deal with data tied to their type. Using a monolithic
> eBPF program to check everything does not scale and does not fit with
> unprivileged use either.
> 
> As a side note, the cookie value is only an ephemeral value to keep a
> state between multiple program calls. It can be used to create a state
> machine for an object evaluation.
> 
> I don't see a way to do an efficient and programmatic path evaluation,
> with different access checks, with the current eBPF features. Please let
> me know if you know how to do it another way.
> 

Andy, Alexei, Daniel, what do you think about this Landlock program
chaining and cookie?
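
To make the question concrete, here is a simplified user-space model of
the chaining and cookie described above. It is not the kernel code: all
names (check_path(), struct pick_prog, the TRIGGER_* bits, the example
programs and the inode numbers) are invented for this sketch, and inodes
are plain integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRIGGER_READ	(1u << 0)
#define TRIGGER_WRITE	(1u << 1)

/* Each program returns 0 to allow and non-zero to deny. */
typedef int (*walk_fn)(uint64_t inode, uint64_t *cookie);
typedef int (*pick_fn)(uint64_t inode, uint32_t action, uint64_t *cookie);

struct pick_prog {
	uint32_t triggers;	/* ORed actions this program handles */
	pick_fn run;
};

/*
 * inodes[] holds the n path components; the last one is the target.  The
 * walk program is chained with itself over the directories, then every
 * pick program whose triggers match the action sees the same cookie.
 */
static int check_path(const uint64_t *inodes, size_t n, uint32_t action,
		      walk_fn walk, const struct pick_prog *picks,
		      size_t npicks)
{
	uint64_t cookie = 0;	/* ephemeral: lives for one evaluation */
	size_t i;
	int ret;

	assert(walk != NULL);
	if (n == 0)
		return 0;
	for (i = 0; i + 1 < n; i++) {
		ret = walk(inodes[i], &cookie);
		if (ret)
			return ret;
	}
	for (i = 0; i < npicks; i++) {
		if (!(picks[i].triggers & action))
			continue;	/* not handled: the call is skipped */
		ret = picks[i].run(inodes[n - 1], action, &cookie);
		if (ret)
			return ret;
	}
	return 0;
}

#define ALLOWED_DIR	10u	/* made-up inode number of an allowed dir */

/* Example walk program: remember that the path crossed ALLOWED_DIR. */
static int example_walk(uint64_t inode, uint64_t *cookie)
{
	if (inode == ALLOWED_DIR)
		*cookie = 1;
	return 0;
}

/* Example pick program: allow writes only beneath ALLOWED_DIR. */
static int example_pick_write(uint64_t inode, uint32_t action,
			      uint64_t *cookie)
{
	(void)inode;
	(void)action;
	return *cookie == 1 ? 0 : 1;
}
```

A write to a file whose path crosses inode 10 is allowed, a write
elsewhere is denied, and a read never reaches example_pick_write()
because its triggers do not match.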



* Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
  2018-04-08 13:13                 ` Mickaël Salaün
@ 2018-04-08 21:06                   ` Andy Lutomirski
  2018-04-08 22:01                     ` Mickaël Salaün
  0 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2018-04-08 21:06 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Andy Lutomirski, Alexei Starovoitov, Daniel Borkmann, LKML,
	Alexei Starovoitov, Arnaldo Carvalho de Melo, Casey Schaufler,
	David Drysdale, David S . Miller, Eric W . Biederman, Jann Horn,
	Jonathan Corbet, Michael Kerrisk, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Tejun Heo,
	Thomas Graf, Tycho Andersen, Will Drewry, Kernel Hardening,
	Linux API, LSM List, Network Development, Andrew Morton

On Sun, Apr 8, 2018 at 6:13 AM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On 02/27/2018 10:48 PM, Mickaël Salaün wrote:
>>
>> On 27/02/2018 17:39, Andy Lutomirski wrote:
>>> On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov
>>> <alexei.starovoitov@gmail.com> wrote:
>>>> On Tue, Feb 27, 2018 at 05:20:55AM +0000, Andy Lutomirski wrote:
>>>>> On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov
>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>> On Tue, Feb 27, 2018 at 04:40:34AM +0000, Andy Lutomirski wrote:
>>>>>>> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov
>>>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>>>> On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote:
>>>>>>>>> The seccomp(2) syscall can be used by a task to apply a Landlock program
>>>>>>>>> to itself. Like a seccomp filter, a Landlock program is enforced for the
>>>>>>>>> current task and all its future children. A program is immutable and a
>>>>>>>>> task can only add new restricting programs to itself, forming a list of
>>>>>>>>> programs.
>>>>>>>>>
>>>>>>>>> A Landlock program is tied to a Landlock hook. If the action on a kernel
>>>>>>>>> object is allowed by the other Linux security mechanisms (e.g. DAC,
>>>>>>>>> capabilities, other LSM), then a Landlock hook related to this kind of
>>>>>>>>> object is triggered. The list of programs for this hook is then
>>>>>>>>> evaluated. Each program returns a 32-bit value which can deny the action
>>>>>>>>> on a kernel object with a non-zero value. If every program of the list
>>>>>>>>> returns zero, then the action on the object is allowed.
>>>>>>>>>
>>>>>>>>> Multiple Landlock programs can be chained to share a 64-bit value for a
>>>>>>>>> call chain (e.g. evaluating multiple elements of a file path).  This
>>>>>>>>> chaining is restricted when a process constructs this chain by loading a
>>>>>>>>> program, but additional checks are performed when it requests to apply
>>>>>>>>> this chain of programs to itself.  The restrictions ensure that it is
>>>>>>>>> not possible to call multiple programs in a way that would imply
>>>>>>>>> handling multiple shared values (i.e. cookies) for one chain.  For now,
>>>>>>>>> only a fs_pick program can be chained to the same type of program,
>>>>>>>>> because it may make sense if they have different triggers (cf. next
>>>>>>>>> commits).  These restrictions still allow reusing Landlock programs in
>>>>>>>>> a safe way (e.g. use the same loaded fs_walk program with multiple
>>>>>>>>> chains of fs_pick programs).
>>>>>>>>>
>>>>>>>>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>> +struct landlock_prog_set *landlock_prepend_prog(
>>>>>>>>> +             struct landlock_prog_set *current_prog_set,
>>>>>>>>> +             struct bpf_prog *prog)
>>>>>>>>> +{
>>>>>>>>> +     struct landlock_prog_set *new_prog_set = current_prog_set;
>>>>>>>>> +     unsigned long pages;
>>>>>>>>> +     int err;
>>>>>>>>> +     size_t i;
>>>>>>>>> +     struct landlock_prog_set tmp_prog_set = {};
>>>>>>>>> +
>>>>>>>>> +     if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK)
>>>>>>>>> +             return ERR_PTR(-EINVAL);
>>>>>>>>> +
>>>>>>>>> +     /* validate memory size allocation */
>>>>>>>>> +     pages = prog->pages;
>>>>>>>>> +     if (current_prog_set) {
>>>>>>>>> +             size_t i;
>>>>>>>>> +
>>>>>>>>> +             for (i = 0; i < ARRAY_SIZE(current_prog_set->programs); i++) {
>>>>>>>>> +                     struct landlock_prog_list *walker_p;
>>>>>>>>> +
>>>>>>>>> +                     for (walker_p = current_prog_set->programs[i];
>>>>>>>>> +                                     walker_p; walker_p = walker_p->prev)
>>>>>>>>> +                             pages += walker_p->prog->pages;
>>>>>>>>> +             }
>>>>>>>>> +             /* count a struct landlock_prog_set if we need to allocate one */
>>>>>>>>> +             if (refcount_read(&current_prog_set->usage) != 1)
>>>>>>>>> +                     pages += round_up(sizeof(*current_prog_set), PAGE_SIZE)
>>>>>>>>> +                             / PAGE_SIZE;
>>>>>>>>> +     }
>>>>>>>>> +     if (pages > LANDLOCK_PROGRAMS_MAX_PAGES)
>>>>>>>>> +             return ERR_PTR(-E2BIG);
>>>>>>>>> +
>>>>>>>>> +     /* ensure early that we can allocate enough memory for the new
>>>>>>>>> +      * prog_lists */
>>>>>>>>> +     err = store_landlock_prog(&tmp_prog_set, current_prog_set, prog);
>>>>>>>>> +     if (err)
>>>>>>>>> +             return ERR_PTR(err);
>>>>>>>>> +
>>>>>>>>> +     /*
>>>>>>>>> +      * Each task_struct points to an array of prog list pointers.  These
>>>>>>>>> +      * tables are duplicated when additions are made (which means each
>>>>>>>>> +      * table needs to be refcounted for the processes using it). When a new
>>>>>>>>> +      * table is created, all the refcounters on the prog_list are bumped (to
>>>>>>>>> +      * track each table that references the prog). When a new prog is
>>>>>>>>> +      * added, it's just prepended to the list for the new table to point
>>>>>>>>> +      * at.
>>>>>>>>> +      *
>>>>>>>>> +      * Manage all the possible errors before this step to not uselessly
>>>>>>>>> +      * duplicate current_prog_set and avoid a rollback.
>>>>>>>>> +      */
>>>>>>>>> +     if (!new_prog_set) {
>>>>>>>>> +             /*
>>>>>>>>> +              * If there is no Landlock program set used by the current task,
>>>>>>>>> +              * then create a new one.
>>>>>>>>> +              */
>>>>>>>>> +             new_prog_set = new_landlock_prog_set();
>>>>>>>>> +             if (IS_ERR(new_prog_set))
>>>>>>>>> +                     goto put_tmp_lists;
>>>>>>>>> +     } else if (refcount_read(&current_prog_set->usage) > 1) {
>>>>>>>>> +             /*
>>>>>>>>> +              * If the current task is not the sole user of its Landlock
>>>>>>>>> +              * program set, then duplicate them.
>>>>>>>>> +              */
>>>>>>>>> +             new_prog_set = new_landlock_prog_set();
>>>>>>>>> +             if (IS_ERR(new_prog_set))
>>>>>>>>> +                     goto put_tmp_lists;
>>>>>>>>> +             for (i = 0; i < ARRAY_SIZE(new_prog_set->programs); i++) {
>>>>>>>>> +                     new_prog_set->programs[i] =
>>>>>>>>> +                             READ_ONCE(current_prog_set->programs[i]);
>>>>>>>>> +                     if (new_prog_set->programs[i])
>>>>>>>>> +                             refcount_inc(&new_prog_set->programs[i]->usage);
>>>>>>>>> +             }
>>>>>>>>> +
>>>>>>>>> +             /*
>>>>>>>>> +              * Landlock program set from the current task will not be freed
>>>>>>>>> +              * here because the usage is strictly greater than 1. It is
>>>>>>>>> +              * only prevented to be freed by another task thanks to the
>>>>>>>>> +              * caller of landlock_prepend_prog() which should be locked if
>>>>>>>>> +              * needed.
>>>>>>>>> +              */
>>>>>>>>> +             landlock_put_prog_set(current_prog_set);
>>>>>>>>> +     }
>>>>>>>>> +
>>>>>>>>> +     /* prepend tmp_prog_set to new_prog_set */
>>>>>>>>> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++) {
>>>>>>>>> +             /* get the last new list */
>>>>>>>>> +             struct landlock_prog_list *last_list =
>>>>>>>>> +                     tmp_prog_set.programs[i];
>>>>>>>>> +
>>>>>>>>> +             if (last_list) {
>>>>>>>>> +                     while (last_list->prev)
>>>>>>>>> +                             last_list = last_list->prev;
>>>>>>>>> +                     /* no need to increment usage (pointer replacement) */
>>>>>>>>> +                     last_list->prev = new_prog_set->programs[i];
>>>>>>>>> +                     new_prog_set->programs[i] = tmp_prog_set.programs[i];
>>>>>>>>> +             }
>>>>>>>>> +     }
>>>>>>>>> +     new_prog_set->chain_last = tmp_prog_set.chain_last;
>>>>>>>>> +     return new_prog_set;
>>>>>>>>> +
>>>>>>>>> +put_tmp_lists:
>>>>>>>>> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++)
>>>>>>>>> +             put_landlock_prog_list(tmp_prog_set.programs[i]);
>>>>>>>>> +     return new_prog_set;
>>>>>>>>> +}
>>>>>>>>
>>>>>>>> Nack on the chaining concept.
>>>>>>>> Please do not reinvent the wheel.
>>>>>>>> There is an existing mechanism for attaching/detaching/querying multiple
>>>>>>>> programs attached to cgroup and tracing hooks that are also
>>>>>>>> efficiently executed via BPF_PROG_RUN_ARRAY.
>>>>>>>> Please use that instead.
>>>>>>>>
>>>>>>>
>>>>>>> I don't see how that would help.  Suppose you add a filter, then
>>>>>>> fork(), and then the child adds another filter.  Do you want to
>>>>>>> duplicate the entire array?  You certainly can't *modify* the array
>>>>>>> because you'll affect processes that shouldn't be affected.
>>>>>>>
>>>>>>> In contrast, doing this through seccomp like the earlier patches
>>>>>>> seemed just fine to me, and seccomp already had the right logic.
>>>>>>
>>>>>> it doesn't look to me like the existing seccomp handling of the fork
>>>>>> situation can be reused. Here there is an attempt to add a 'chaining'
>>>>>> concept which is sort of an extension of the existing seccomp style,
>>>>>> but is somehow heavily done on the bpf side and contradicts cgroup/tracing.
>>>>>>
>>>>>
>>>>> I don't see why the seccomp way can't be used.  I agree with you that
>>>>> the seccomp *style* shouldn't be used in bpf code like this, but I
>>>>> think that Landlock programs can and should just live in the existing
>>>>> seccomp chain.  If the existing seccomp code needs some modification
>>>>> to make this work, then so be it.
>>>>
>>>> +1
>>>> if that was the case...
>>>> but that's not my reading of the patch set.
>>>
>>> An earlier version of the patch set used the seccomp filter chain.
>>> Mickaël, what exactly was wrong with that approach other than that the
>>> seccomp() syscall was awkward for you to use?  You could add a
>>> seccomp_add_landlock_rule() syscall if you needed to.
>>
>> Nothing was wrong with that; this part did not change (see my
>> next comment).
>>
>>>
>>> As a side comment, why is this an LSM at all, let alone a non-stacking
>>> LSM?  It would make a lot more sense to me to make Landlock depend on
>>> having LSMs configured in but to call the landlock hooks directly from
>>> the security_xyz() hooks.
>>
>> See Casey's answer and his patch series: https://lwn.net/Articles/741963/
>>
>>>
>>>>
>>>>> In other words, the kernel already has two kinds of chaining:
>>>>> seccomp's and bpf's.  bpf's doesn't work right for this type of usage
>>>>> across fork(), whereas seccomp's already handles that case correctly.
>>>>> (In contrast, seccomp's is totally wrong for cgroup-attached filters.)
>>>>>  So IMO Landlock should use the seccomp core code and call into bpf
>>>>> for the actual filtering.
>>>>
>>>> +1
>>>> in cgroup we had to invent this new BPF_PROG_RUN_ARRAY mechanism,
>>>> since cgroup hierarchy can be complicated with bpf progs attached
>>>> at different levels with different override/multiprog properties,
>>>> so walking link list and checking all flags at run-time would have
>>>> been too slow. That's why we added compute_effective_progs().
>>>
>>> If we start adding override flags to Landlock, I think we're doing it
>>> wrong.   With cgroup bpf programs, the whole mess is set up by the
>>> administrator.  With seccomp, and with Landlock if done correctly, it
>>> *won't* be set up by the administrator, so the chance that everyone
>>> gets all the flags right is about zero.  All attached filters should
>>> run unconditionally.
>>
>>
>> There is a misunderstanding about this chaining mechanism. This should
>> not be confused with the list of seccomp filters nor the cgroup
>> hierarchies. Landlock programs can be stacked the same way seccomp's
>> filters can (cf. struct landlock_prog_set, the "chain_last" field is an
>> optimization which is not used for this struct handling). This stackable
>> property did not change from the previous patch series. The chaining
>> mechanism is for another use case, which does not make sense for seccomp
>> filters nor other eBPF program types, at least for now, from what I can
>> tell.
>>
>> You may want to take a look at my talk at FOSDEM
>> (https://landlock.io/talks/2018-02-04_landlock-fosdem.pdf), especially
>> slides 11 and 12.
>>
>> Let me explain my reasoning about this program chaining thing.
>>
>> To check if an action on a file is allowed, we first need to identify
>> this file and match it to the security policy. In a previous
>> (non-public) patch series, I tried to use one type of eBPF program to
>> check every kind of access to a file. To be able to identify a file, I
>> relied on an eBPF map, similar to the current inode map. This map stores
>> a set of references to file descriptors. I then created a function
>> bpf_is_file_beneath() to check if the requested file was beneath a file
>> in the map. This way, no chaining, only one eBPF program type to check
>> an access to a file... but some issues then emerged. First, this design
>> creates a side-channel which helps an attacker using such a program to
>> infer some information not normally available, for example to get a hint
>> on where a file descriptor (received from a UNIX socket) comes from.
>> Another issue is that this type of program would be called for each
>> component of a path. Indeed, when the kernel checks if an access to a
>> file is allowed, it walks through all of the directories in its path
>> (checking if the current process is allowed to execute them). That first
>> attempt led me to rethink the way we could filter an access to a file
>> *path*.
>>
>> To minimize the number of calls to an eBPF program dedicated to
>> validating an access to a file path, I decided to create three subtypes
>> of eBPF programs. The FS_WALK type is called when walking through every
>> directory of a file path (except the last one if it is the target). We
>> can then restrict this type of program to the minimum set of functions
>> it is allowed to call and the minimum set of data available from its
>> context. The first, implicit chaining is for this type of program. To be
>> able to evaluate a path while being called for all its components, this
>> program needs to store a state (to remember what was the parent directory
>> of this path). There is no "previous" field in the subtype for this
>> program because it is chained with itself, for each directory. This
>> makes it possible to create a FS_WALK program to evaluate a file
>> hierarchy, thanks to the inode map which can be used to check if a
>> directory of this hierarchy is part of an allowed (or denied) list of
>> directories. This design makes it possible to express a file hierarchy in
>> a programmatic way, without requiring an eBPF helper to do the job
>> (unlike my first experiment).
>>
>> The explicit chaining is used to tie a path evaluation (with a FS_WALK
>> program) to an access to the actual file being requested (the last
>> component of a file path), with a FS_PICK program. It is only at this
>> time that the kernel checks for the requested action (e.g. read, write,
>> chdir, append...). To be able to filter such access requests we can have
>> one call to the same program for every action and let this program check
>> for which action it was called. However, this design does not allow the
>> kernel to know if the current action is indeed handled by this program.
>> Hence, it is not possible to implement a cache mechanism to only call
>> this program if it knows how to handle this action.
>>
>> The approach I took for this FS_PICK type of program is to add to its
>> subtype which actions it can handle (with the "triggers" bitfield, seen
>> as ORed actions). This way, the kernel knows if a call to a FS_PICK
>> program is necessary. If the user wants to enforce a different security
>> policy according to the action requested on a file, then it needs
>> multiple FS_PICK programs. However, to reduce the number of such
>> programs, this patch series allows a FS_PICK program to be chained with
>> another, the same way a FS_WALK is chained with itself. This way, if the
>> user wants to check if the action is, for example, an "open" and a "read"
>> and not a "map" and a "read", then it can chain multiple FS_PICK
>> programs with different trigger actions. The OR check performed by the
>> kernel is not a limitation then, only a way to know if a call to an eBPF
>> program is needed.
>>
>> The last type of program is FS_GET. This one is called when a process
>> gets a struct file or changes its working directory. This is the only
>> program type able (and allowed) to tag a file. This restriction is
>> important to avoid being subject to resource exhaustion attacks (i.e.
>> tagging every inode accessible to an attacker, which would allocate too
>> much kernel memory).
>>
>> This design gives room for improvements to create a cache of eBPF
>> context (input data, including maps if any), with the result of an eBPF
>> program. This would help limit the number of calls to an eBPF program the
>> same way SELinux or other kernel components do to limit costly checks.
>>
>> The eBPF maps of progs are useful to call the same type of eBPF
>> program. They do not fit this use case because we may want multiple
>> eBPF programs according to the action requested on a kernel object (e.g.
>> FS_GET). The other reason is that the eBPF program does not know what
>> will be the next (type of) access check performed by the kernel.
>>
>> To say it another way, this chaining mechanism is a way to split a
>> kernel object evaluation with multiple specialized programs, each of
>> them being able to deal with data tied to their type. Using a monolithic
>> eBPF program to check everything does not scale and does not fit with
>> unprivileged use either.
>>
>> As a side note, the cookie value is only an ephemeral value to keep a
>> state between multiple program calls. It can be used to create a state
>> machine for an object evaluation.
>>
>> I don't see a way to do an efficient and programmatic path evaluation,
>> with different access checks, with the current eBPF features. Please let
>> me know if you know how to do it another way.
>>
>
> Andy, Alexei, Daniel, what do you think about this Landlock program
> chaining and cookie?
>

Can you give a small pseudocode real world example that actually needs
chaining?  The mechanism is quite complicated and I'd like to
understand how it'll be used.
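
For reference, here is one way the quoted description could be read, as a
rough userspace model rather than anything from the patch.  Everything in it
is an assumption made up for illustration: the action bits, the cookie
values, dir_is_allowed() (a stand-in for an inode map lookup) and
check_access() (a stand-in for the kernel side).  Real Landlock programs
would be eBPF, not C callbacks.  The model only shows the three properties
described above: FS_WALK runs once per directory and keeps state in the
cookie, a FS_PICK program runs only when its triggers match the requested
action, and any non-zero return denies the access.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical action bits; the real "triggers" values are not shown here. */
enum { ACT_OPEN = 1u << 0, ACT_READ = 1u << 1, ACT_WRITE = 1u << 2 };

/* Hypothetical cookie states shared by the programs of one chain. */
enum { COOKIE_OUTSIDE = 0, COOKIE_INSIDE = 1 };

/* Stand-in for an inode map lookup: is this directory on the allowed list? */
static int dir_is_allowed(const char *name)
{
	return strcmp(name, "srv") == 0;
}

/* Model of a FS_WALK program: called once per directory component. */
static uint32_t fs_walk(const char *dir, uint64_t *cookie)
{
	/* Once an allowed directory is crossed, stay "inside" (ignores ".."). */
	if (dir_is_allowed(dir))
		*cookie = COOKIE_INSIDE;
	return 0; /* this model never denies the walk itself */
}

/* Model of a FS_PICK program chained after fs_walk(). */
struct fs_pick {
	uint32_t triggers; /* ORed actions this program handles */
	uint32_t (*run)(uint32_t action, uint64_t *cookie);
};

/* Deny the action unless the walk went through an allowed directory. */
static uint32_t pick_write(uint32_t action, uint64_t *cookie)
{
	(void)action;
	return *cookie == COOKIE_INSIDE ? 0 : 1; /* non-zero denies */
}

/* Model of the kernel side: walk the directories, then pick the target. */
static uint32_t check_access(const char *const *dirs, size_t ndirs,
			     uint32_t action, const struct fs_pick *picks,
			     size_t npicks)
{
	uint64_t cookie = COOKIE_OUTSIDE; /* ephemeral, one per evaluation */
	size_t i;

	for (i = 0; i < ndirs; i++)
		if (fs_walk(dirs[i], &cookie))
			return 1;

	for (i = 0; i < npicks; i++) {
		if (!(picks[i].triggers & action))
			continue; /* program does not handle this action */
		if (picks[i].run(action, &cookie))
			return 1;
	}
	return 0; /* every program that ran returned zero */
}
```

With a single pick program whose triggers are ACT_WRITE, a write under
srv/data is allowed, a write under etc is denied, and a read under etc never
calls the program at all.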

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
  2018-04-08 21:06                   ` Andy Lutomirski
@ 2018-04-08 22:01                     ` Mickaël Salaün
  2018-04-10  4:48                       ` Alexei Starovoitov
  0 siblings, 1 reply; 55+ messages in thread
From: Mickaël Salaün @ 2018-04-08 22:01 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexei Starovoitov, Daniel Borkmann, LKML, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, Casey Schaufler, David Drysdale,
	David S . Miller, Eric W . Biederman, Jann Horn, Jonathan Corbet,
	Michael Kerrisk, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Shuah Khan, Tejun Heo, Thomas Graf,
	Tycho Andersen, Will Drewry, Kernel Hardening, Linux API,
	LSM List, Network Development, Andrew Morton, Al Viro,
	Linux-Fsdevel


[-- Attachment #1.1: Type: text/plain, Size: 22278 bytes --]


On 04/08/2018 11:06 PM, Andy Lutomirski wrote:
> On Sun, Apr 8, 2018 at 6:13 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>
>> On 02/27/2018 10:48 PM, Mickaël Salaün wrote:
>>>
>>> On 27/02/2018 17:39, Andy Lutomirski wrote:
>>>> On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov
>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>> On Tue, Feb 27, 2018 at 05:20:55AM +0000, Andy Lutomirski wrote:
>>>>>> On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov
>>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>>> On Tue, Feb 27, 2018 at 04:40:34AM +0000, Andy Lutomirski wrote:
>>>>>>>> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov
>>>>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>>>>> On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote:
>>>>>>>>>> The seccomp(2) syscall can be used by a task to apply a Landlock program
>>>>>>>>>> to itself. As a seccomp filter, a Landlock program is enforced for the
>>>>>>>>>> current task and all its future children. A program is immutable and a
>>>>>>>>>> task can only add new restricting programs to itself, forming a list of
>>>>>>>>>> programs.
>>>>>>>>>>
>>>>>>>>>> A Landlock program is tied to a Landlock hook. If the action on a kernel
>>>>>>>>>> object is allowed by the other Linux security mechanisms (e.g. DAC,
>>>>>>>>>> capabilities, other LSM), then a Landlock hook related to this kind of
>>>>>>>>>> object is triggered. The list of programs for this hook is then
>>>>>>>>>> evaluated. Each program returns a 32-bit value which can deny the action
>>>>>>>>>> on a kernel object with a non-zero value. If every program of the list
>>>>>>>>>> returns zero, then the action on the object is allowed.
>>>>>>>>>>
>>>>>>>>>> Multiple Landlock programs can be chained to share a 64-bit value for a
>>>>>>>>>> call chain (e.g. evaluating multiple elements of a file path).  This
>>>>>>>>>> chaining is restricted when a process constructs this chain by loading a
>>>>>>>>>> program, but additional checks are performed when it requests to apply
>>>>>>>>>> this chain of programs to itself.  The restrictions ensure that it is
>>>>>>>>>> not possible to call multiple programs in a way that would require
>>>>>>>>>> handling multiple shared values (i.e. cookies) for one chain.  For now,
>>>>>>>>>> only a fs_pick program can be chained to the same type of program,
>>>>>>>>>> because it may make sense if they have different triggers (cf. next
>>>>>>>>>> commits).  These restrictions still allow reusing Landlock programs in
>>>>>>>>>> a safe way (e.g. use the same loaded fs_walk program with multiple
>>>>>>>>>> chains of fs_pick programs).
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>>> +struct landlock_prog_set *landlock_prepend_prog(
>>>>>>>>>> +             struct landlock_prog_set *current_prog_set,
>>>>>>>>>> +             struct bpf_prog *prog)
>>>>>>>>>> +{
>>>>>>>>>> +     struct landlock_prog_set *new_prog_set = current_prog_set;
>>>>>>>>>> +     unsigned long pages;
>>>>>>>>>> +     int err;
>>>>>>>>>> +     size_t i;
>>>>>>>>>> +     struct landlock_prog_set tmp_prog_set = {};
>>>>>>>>>> +
>>>>>>>>>> +     if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK)
>>>>>>>>>> +             return ERR_PTR(-EINVAL);
>>>>>>>>>> +
>>>>>>>>>> +     /* validate memory size allocation */
>>>>>>>>>> +     pages = prog->pages;
>>>>>>>>>> +     if (current_prog_set) {
>>>>>>>>>> +             size_t i;
>>>>>>>>>> +
>>>>>>>>>> +             for (i = 0; i < ARRAY_SIZE(current_prog_set->programs); i++) {
>>>>>>>>>> +                     struct landlock_prog_list *walker_p;
>>>>>>>>>> +
>>>>>>>>>> +                     for (walker_p = current_prog_set->programs[i];
>>>>>>>>>> +                                     walker_p; walker_p = walker_p->prev)
>>>>>>>>>> +                             pages += walker_p->prog->pages;
>>>>>>>>>> +             }
>>>>>>>>>> +             /* count a struct landlock_prog_set if we need to allocate one */
>>>>>>>>>> +             if (refcount_read(&current_prog_set->usage) != 1)
>>>>>>>>>> +                     pages += round_up(sizeof(*current_prog_set), PAGE_SIZE)
>>>>>>>>>> +                             / PAGE_SIZE;
>>>>>>>>>> +     }
>>>>>>>>>> +     if (pages > LANDLOCK_PROGRAMS_MAX_PAGES)
>>>>>>>>>> +             return ERR_PTR(-E2BIG);
>>>>>>>>>> +
>>>>>>>>>> +     /* ensure early that we can allocate enough memory for the new
>>>>>>>>>> +      * prog_lists */
>>>>>>>>>> +     err = store_landlock_prog(&tmp_prog_set, current_prog_set, prog);
>>>>>>>>>> +     if (err)
>>>>>>>>>> +             return ERR_PTR(err);
>>>>>>>>>> +
>>>>>>>>>> +     /*
>>>>>>>>>> +      * Each task_struct points to an array of prog list pointers.  These
>>>>>>>>>> +      * tables are duplicated when additions are made (which means each
>>>>>>>>>> +      * table needs to be refcounted for the processes using it). When a new
>>>>>>>>>> +      * table is created, all the refcounters on the prog_list are bumped (to
>>>>>>>>>> +      * track each table that references the prog). When a new prog is
>>>>>>>>>> +      * added, it's just prepended to the list for the new table to point
>>>>>>>>>> +      * at.
>>>>>>>>>> +      *
>>>>>>>>>> +      * Manage all the possible errors before this step to not uselessly
>>>>>>>>>> +      * duplicate current_prog_set and avoid a rollback.
>>>>>>>>>> +      */
>>>>>>>>>> +     if (!new_prog_set) {
>>>>>>>>>> +             /*
>>>>>>>>>> +              * If there is no Landlock program set used by the current task,
>>>>>>>>>> +              * then create a new one.
>>>>>>>>>> +              */
>>>>>>>>>> +             new_prog_set = new_landlock_prog_set();
>>>>>>>>>> +             if (IS_ERR(new_prog_set))
>>>>>>>>>> +                     goto put_tmp_lists;
>>>>>>>>>> +     } else if (refcount_read(&current_prog_set->usage) > 1) {
>>>>>>>>>> +             /*
>>>>>>>>>> +              * If the current task is not the sole user of its Landlock
>>>>>>>>>> +              * program set, then duplicate them.
>>>>>>>>>> +              */
>>>>>>>>>> +             new_prog_set = new_landlock_prog_set();
>>>>>>>>>> +             if (IS_ERR(new_prog_set))
>>>>>>>>>> +                     goto put_tmp_lists;
>>>>>>>>>> +             for (i = 0; i < ARRAY_SIZE(new_prog_set->programs); i++) {
>>>>>>>>>> +                     new_prog_set->programs[i] =
>>>>>>>>>> +                             READ_ONCE(current_prog_set->programs[i]);
>>>>>>>>>> +                     if (new_prog_set->programs[i])
>>>>>>>>>> +                             refcount_inc(&new_prog_set->programs[i]->usage);
>>>>>>>>>> +             }
>>>>>>>>>> +
>>>>>>>>>> +             /*
>>>>>>>>>> +              * Landlock program set from the current task will not be freed
>>>>>>>>>> +              * here because the usage is strictly greater than 1. It is
>>>>>>>>>> +              * only prevented to be freed by another task thanks to the
>>>>>>>>>> +              * caller of landlock_prepend_prog() which should be locked if
>>>>>>>>>> +              * needed.
>>>>>>>>>> +              */
>>>>>>>>>> +             landlock_put_prog_set(current_prog_set);
>>>>>>>>>> +     }
>>>>>>>>>> +
>>>>>>>>>> +     /* prepend tmp_prog_set to new_prog_set */
>>>>>>>>>> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++) {
>>>>>>>>>> +             /* get the last new list */
>>>>>>>>>> +             struct landlock_prog_list *last_list =
>>>>>>>>>> +                     tmp_prog_set.programs[i];
>>>>>>>>>> +
>>>>>>>>>> +             if (last_list) {
>>>>>>>>>> +                     while (last_list->prev)
>>>>>>>>>> +                             last_list = last_list->prev;
>>>>>>>>>> +                     /* no need to increment usage (pointer replacement) */
>>>>>>>>>> +                     last_list->prev = new_prog_set->programs[i];
>>>>>>>>>> +                     new_prog_set->programs[i] = tmp_prog_set.programs[i];
>>>>>>>>>> +             }
>>>>>>>>>> +     }
>>>>>>>>>> +     new_prog_set->chain_last = tmp_prog_set.chain_last;
>>>>>>>>>> +     return new_prog_set;
>>>>>>>>>> +
>>>>>>>>>> +put_tmp_lists:
>>>>>>>>>> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++)
>>>>>>>>>> +             put_landlock_prog_list(tmp_prog_set.programs[i]);
>>>>>>>>>> +     return new_prog_set;
>>>>>>>>>> +}
>>>>>>>>>
>>>>>>>>> Nack on the chaining concept.
>>>>>>>>> Please do not reinvent the wheel.
>>>>>>>>> There is an existing mechanism for attaching/detaching/querying multiple
>>>>>>>>> programs attached to cgroup and tracing hooks that are also
>>>>>>>>> efficiently executed via BPF_PROG_RUN_ARRAY.
>>>>>>>>> Please use that instead.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I don't see how that would help.  Suppose you add a filter, then
>>>>>>>> fork(), and then the child adds another filter.  Do you want to
>>>>>>>> duplicate the entire array?  You certainly can't *modify* the array
>>>>>>>> because you'll affect processes that shouldn't be affected.
>>>>>>>>
>>>>>>>> In contrast, doing this through seccomp like the earlier patches
>>>>>>>> seemed just fine to me, and seccomp already had the right logic.
>>>>>>>
>>>>>>> it doesn't look to me like the existing seccomp handling of the fork
>>>>>>> situation can be reused. Here there is an attempt to add a 'chaining'
>>>>>>> concept which is sort of an extension of the existing seccomp style,
>>>>>>> but is somehow heavily done on the bpf side and contradicts cgroup/tracing.
>>>>>>>
>>>>>>
>>>>>> I don't see why the seccomp way can't be used.  I agree with you that
>>>>>> the seccomp *style* shouldn't be used in bpf code like this, but I
>>>>>> think that Landlock programs can and should just live in the existing
>>>>>> seccomp chain.  If the existing seccomp code needs some modification
>>>>>> to make this work, then so be it.
>>>>>
>>>>> +1
>>>>> if that was the case...
>>>>> but that's not my reading of the patch set.
>>>>
>>>> An earlier version of the patch set used the seccomp filter chain.
>>>> Mickaël, what exactly was wrong with that approach other than that the
>>>> seccomp() syscall was awkward for you to use?  You could add a
>>>> seccomp_add_landlock_rule() syscall if you needed to.
>>>
>>> Nothing was wrong with that; this part did not change (see my
>>> next comment).
>>>
>>>>
>>>> As a side comment, why is this an LSM at all, let alone a non-stacking
>>>> LSM?  It would make a lot more sense to me to make Landlock depend on
>>>> having LSMs configured in but to call the landlock hooks directly from
>>>> the security_xyz() hooks.
>>>
>>> See Casey's answer and his patch series: https://lwn.net/Articles/741963/
>>>
>>>>
>>>>>
>>>>>> In other words, the kernel already has two kinds of chaining:
>>>>>> seccomp's and bpf's.  bpf's doesn't work right for this type of usage
>>>>>> across fork(), whereas seccomp's already handles that case correctly.
>>>>>> (In contrast, seccomp's is totally wrong for cgroup-attached filters.)
>>>>>>  So IMO Landlock should use the seccomp core code and call into bpf
>>>>>> for the actual filtering.
>>>>>
>>>>> +1
>>>>> in cgroup we had to invent this new BPF_PROG_RUN_ARRAY mechanism,
>>>>> since cgroup hierarchy can be complicated with bpf progs attached
>>>>> at different levels with different override/multiprog properties,
>>>>> so walking link list and checking all flags at run-time would have
>>>>> been too slow. That's why we added compute_effective_progs().
>>>>
>>>> If we start adding override flags to Landlock, I think we're doing it
>>>> wrong.   With cgroup bpf programs, the whole mess is set up by the
>>>> administrator.  With seccomp, and with Landlock if done correctly, it
>>>> *won't* be set up by the administrator, so the chance that everyone
>>>> gets all the flags right is about zero.  All attached filters should
>>>> run unconditionally.
>>>
>>>
>>> There is a misunderstanding about this chaining mechanism. This should
>>> not be confused with the list of seccomp filters nor the cgroup
>>> hierarchies. Landlock programs can be stacked the same way seccomp's
>>> filters can (cf. struct landlock_prog_set, the "chain_last" field is an
>>> optimization which is not used for this struct handling). This stackable
>>> property did not change from the previous patch series. The chaining
>>> mechanism is for another use case, which does not make sense for seccomp
>>> filters nor other eBPF program types, at least for now, from what I can
>>> tell.
>>>
>>> You may want to take a look at my talk at FOSDEM
>>> (https://landlock.io/talks/2018-02-04_landlock-fosdem.pdf), especially
>>> slides 11 and 12.
>>>
>>> Let me explain my reasoning about this program chaining thing.
>>>
>>> To check if an action on a file is allowed, we first need to identify
>>> this file and match it to the security policy. In a previous
>>> (non-public) patch series, I tried to use one type of eBPF program to
>>> check every kind of access to a file. To be able to identify a file, I
>>> relied on an eBPF map, similar to the current inode map. This map stores
>>> a set of references to file descriptors. I then created a function
>>> bpf_is_file_beneath() to check if the requested file was beneath a file
>>> in the map. This way, no chaining, only one eBPF program type to check
>>> an access to a file... but some issues then emerged. First, this design
>>> creates a side-channel which helps an attacker using such a program to
>>> infer some information not normally available, for example to get a hint
>>> on where a file descriptor (received from a UNIX socket) comes from.
>>> Another issue is that this type of program would be called for each
>>> component of a path. Indeed, when the kernel checks if an access to a
>>> file is allowed, it walks through all of the directories in its path
>>> (checking if the current process is allowed to execute them). That first
>>> attempt led me to rethink the way we could filter an access to a file
>>> *path*.
>>>
>>> To minimize the number of calls to an eBPF program dedicated to
>>> validating an access to a file path, I decided to create three subtypes
>>> of eBPF programs. The FS_WALK type is called when walking through every
>>> directory of a file path (except the last one if it is the target). We
>>> can then restrict this type of program to the minimum set of functions
>>> it is allowed to call and the minimum set of data available from its
>>> context. The first, implicit chaining is for this type of program. To be
>>> able to evaluate a path while being called for all its components, this
>>> program needs to store a state (to remember what was the parent directory
>>> of this path). There is no "previous" field in the subtype for this
>>> program because it is chained with itself, for each directory. This
>>> makes it possible to create a FS_WALK program to evaluate a file
>>> hierarchy, thanks to the inode map which can be used to check if a
>>> directory of this hierarchy is part of an allowed (or denied) list of
>>> directories. This design makes it possible to express a file hierarchy in
>>> a programmatic way, without requiring an eBPF helper to do the job
>>> (unlike my first experiment).
>>>
>>> The explicit chaining is used to tie a path evaluation (with a FS_WALK
>>> program) to an access to the actual file being requested (the last
>>> component of a file path), with a FS_PICK program. It is only at this
>>> time that the kernel checks for the requested action (e.g. read, write,
>>> chdir, append...). To be able to filter such access requests we could have
>>> one call to the same program for every action and let this program check
>>> for which action it was called. However, this design does not allow the
>>> kernel to know if the current action is indeed handled by this program.
>>> Hence, it is not possible to implement a cache mechanism to only call
>>> this program if it knows how to handle this action.
>>>
>>> The approach I took for this FS_PICK type of program is to add to its
>>> subtype which actions it can handle (with the "triggers" bitfield, seen
>>> as ORed actions). This way, the kernel knows if a call to a FS_PICK
>>> program is necessary. If the user wants to enforce a different security
>>> policy according to the action requested on a file, then they need
>>> multiple FS_PICK programs. However, to reduce the number of such
>>> programs, this patch series allows a FS_PICK program to be chained with
>>> another, the same way a FS_WALK is chained with itself. This way, if the
>>> user wants to check if the action is, for example, an "open" and a "read"
>>> and not a "map" and a "read", then they can chain multiple FS_PICK
>>> programs with different trigger actions. The OR check performed by the
>>> kernel is not a limitation then, only a way to know if a call to an eBPF
>>> program is needed.
>>>
>>> The last type of program is FS_GET. This one is called when a process
>>> gets a struct file or changes its working directory. This is the only
>>> program type able (and allowed) to tag a file. This restriction is
>>> important to avoid being subject to resource exhaustion attacks (i.e.
>>> tagging every inode accessible to an attacker, which would allocate too
>>> much kernel memory).
>>>
>>> This design leaves room for improvements, such as a cache of eBPF
>>> contexts (input data, including maps if any) with the result of an eBPF
>>> program. This would help limit the number of calls to an eBPF program the
>>> same way SELinux or other kernel components do to limit costly checks.
>>>
>>> The eBPF maps of progs are useful to call the same type of eBPF
>>> program. They do not fit this use case because we may want multiple
>>> eBPF programs according to the action requested on a kernel object (e.g.
>>> FS_GET). The other reason is that the eBPF program does not know what
>>> will be the next (type of) access check performed by the kernel.
>>>
>>> To say it another way, this chaining mechanism is a way to split a
>>> kernel object evaluation with multiple specialized programs, each of
>>> them being able to deal with data tied to their type. Using a monolithic
>>> eBPF program to check everything does not scale and does not fit with
>>> unprivileged use either.
>>>
>>> As a side note, the cookie value is only an ephemeral value to keep a
>>> state between multiple program calls. It can be used to create a state
>>> machine for an object evaluation.
>>>
>>> I don't see a way to do an efficient and programmatic path evaluation,
>>> with different access checks, with the current eBPF features. Please let
>>> me know if you know how to do it another way.
>>>
>>
>> Andy, Alexei, Daniel, what do you think about this Landlock program
>> chaining and cookie?
>>
> 
> Can you give a small pseudocode real world example that actually needs
> chaining?  The mechanism is quite complicated and I'd like to
> understand how it'll be used.
> 

Here is the interesting part from the example (patch 09/11):

+SEC("maps")
+struct bpf_map_def inode_map = {
+	.type = BPF_MAP_TYPE_INODE,
+	.key_size = sizeof(u32),
+	.value_size = sizeof(u64),
+	.max_entries = 20,
+};
+
+SEC("subtype/landlock1")
+static union bpf_prog_subtype _subtype1 = {
+	.landlock_hook = {
+		.type = LANDLOCK_HOOK_FS_WALK,
+	}
+};
+
+static __always_inline __u64 update_cookie(__u64 cookie, __u8 lookup,
+		void *inode, void *chain, bool freeze)
+{
+	__u64 map_allow = 0;
+
+	if (cookie == 0) {
+		cookie = bpf_inode_get_tag(inode, chain);
+		if (cookie)
+			return cookie;
+		/* only look for the first match in the map, ignore nested
+		 * paths in this example */
+		map_allow = bpf_inode_map_lookup(&inode_map, inode);
+		if (map_allow)
+			cookie = 1 | map_allow;
+	} else {
+		if (cookie & COOKIE_VALUE_FREEZED)
+			return cookie;
+		map_allow = cookie & _MAP_MARK_MASK;
+		cookie &= ~_MAP_MARK_MASK;
+		switch (lookup) {
+		case LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_DOTDOT:
+			cookie--;
+			break;
+		case LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_DOT:
+			break;
+		default:
+			/* ignore _MAP_MARK_MASK overflow in this example */
+			cookie++;
+			break;
+		}
+		if (cookie >= 1)
+			cookie |= map_allow;
+	}
+	/* do not modify the cookie for each fs_pick */
+	if (freeze && cookie)
+		cookie |= COOKIE_VALUE_FREEZED;
+	return cookie;
+}
+
+SEC("landlock1")
+int fs_walk(struct landlock_ctx_fs_walk *ctx)
+{
+	ctx->cookie = update_cookie(ctx->cookie, ctx->inode_lookup,
+			(void *)ctx->inode, (void *)ctx->chain, false);
+	return LANDLOCK_RET_ALLOW;
+}

The program "landlock1" is called for every directory execution (except
the last one if it is the leaf of a path). This makes it possible to
identify a file hierarchy with only a flat (one-dimensional) list of file
descriptors (i.e. inode_map).

Underneath, the Landlock LSM part checks whether there is an associated
path walk (nameidata) with each inode access request. If there is one, then
the cookie associated with the path walk (if any) is made available
through the eBPF program context. This makes it possible to build a state
machine with an eBPF program to "evaluate" a file path (without string
parsing).
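
As a rough user-space illustration of that state machine (not the patch
code), the sketch below replays the depth-counting idea of update_cookie()
over a list of directory names. Everything in it is an assumption made for
the example: the cookie layout (MARK_MASK), the string-based map_lookup()
stand-in for the inode map, and the walk_step(), walk_path() and
is_beneath() helpers. The tagging and freezing parts of the real example
are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cookie layout: the top byte carries the mark read from the
 * map, the remaining bits count how deep we are below the marked directory. */
#define MARK_MASK (UINT64_C(0xff) << 56)

/* Stand-in for the inode map: only the directory named "allowed" is listed. */
static uint64_t map_lookup(const char *name)
{
	return strcmp(name, "allowed") == 0 ? (UINT64_C(1) << 56) : 0;
}

/* One step of the state machine, called once per directory walked. */
static uint64_t walk_step(uint64_t cookie, const char *name)
{
	uint64_t mark, depth;

	if (cookie == 0) {
		/* outside of any listed hierarchy: look for an entry point */
		mark = map_lookup(name);
		return mark ? (1 | mark) : 0;
	}
	mark = cookie & MARK_MASK;
	depth = cookie & ~MARK_MASK;
	/* a non-zero cookie always carries a depth of at least one */
	assert(depth >= 1);
	if (strcmp(name, "..") == 0)
		depth--;
	else if (strcmp(name, ".") != 0)
		depth++;
	/* walking above the listed directory drops the mark */
	return depth >= 1 ? (depth | mark) : 0;
}

/* Run the state machine over every directory of a path. */
static uint64_t walk_path(const char *dirs[], size_t n)
{
	uint64_t cookie = 0;
	size_t i;

	for (i = 0; i < n; i++)
		cookie = walk_step(cookie, dirs[i]);
	return cookie;
}

/* Non-zero when the walk ended inside the listed hierarchy. */
static int is_beneath(uint64_t cookie)
{
	return (cookie & MARK_MASK) != 0;
}
```

With these assumptions, walking "allowed", "sub", ".." ends with a marked
cookie at depth 1, whereas walking "allowed", "sub", "..", "..", "other"
ends with a zero cookie because the walk left the listed directory.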

The goal with this chaining mechanism is to be able to express a complex
kernel object like a file, with multiple runs of one or more eBPF
programs, as a multilayer evaluation. These semantics may only make sense
for the user/developer and their security policy. We must keep in mind
that this object identification should be available to unprivileged
processes. This means that we must be very careful about what kind of
information is available to an eBPF program because it can then leak
to a process (e.g. through a map). With this mechanism, only information
already available to user space is available to the eBPF program.

In this example, the complexity of the path evaluation is in the eBPF
program. We can then keep the kernel code simpler and more generic. This
enables more flexibility for a security policy definition.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
  2018-04-08 22:01                     ` Mickaël Salaün
@ 2018-04-10  4:48                       ` Alexei Starovoitov
  2018-04-11 22:18                         ` Mickaël Salaün
  0 siblings, 1 reply; 55+ messages in thread
From: Alexei Starovoitov @ 2018-04-10  4:48 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Andy Lutomirski, Daniel Borkmann, LKML, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, Casey Schaufler, David Drysdale,
	David S . Miller, Eric W . Biederman, Jann Horn, Jonathan Corbet,
	Michael Kerrisk, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Shuah Khan, Tejun Heo, Thomas Graf,
	Tycho Andersen, Will Drewry, Kernel Hardening, Linux API,
	LSM List, Network Development, Andrew Morton, Al Viro,
	Linux-Fsdevel

On Mon, Apr 09, 2018 at 12:01:59AM +0200, Mickaël Salaün wrote:
> 
> On 04/08/2018 11:06 PM, Andy Lutomirski wrote:
> > On Sun, Apr 8, 2018 at 6:13 AM, Mickaël Salaün <mic@digikod.net> wrote:
> >>
> >> On 02/27/2018 10:48 PM, Mickaël Salaün wrote:
> >>>
> >>> On 27/02/2018 17:39, Andy Lutomirski wrote:
> >>>> On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov
> >>>> <alexei.starovoitov@gmail.com> wrote:
> >>>>> On Tue, Feb 27, 2018 at 05:20:55AM +0000, Andy Lutomirski wrote:
> >>>>>> On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov
> >>>>>> <alexei.starovoitov@gmail.com> wrote:
> >>>>>>> On Tue, Feb 27, 2018 at 04:40:34AM +0000, Andy Lutomirski wrote:
> >>>>>>>> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov
> >>>>>>>> <alexei.starovoitov@gmail.com> wrote:
> >>>>>>>>> On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote:
> >>>>>>>>>> The seccomp(2) syscall can be used by a task to apply a Landlock program
> >>>>>>>>>> to itself. As a seccomp filter, a Landlock program is enforced for the
> >>>>>>>>>> current task and all its future children. A program is immutable and a
> >>>>>>>>>> task can only add new restricting programs to itself, forming a list of
> >>>>>>>>>> programs.
> >>>>>>>>>>
> >>>>>>>>>> A Landlock program is tied to a Landlock hook. If the action on a kernel
> >>>>>>>>>> object is allowed by the other Linux security mechanisms (e.g. DAC,
> >>>>>>>>>> capabilities, other LSM), then a Landlock hook related to this kind of
> >>>>>>>>>> object is triggered. The list of programs for this hook is then
> >>>>>>>>>> evaluated. Each program returns a 32-bit value which can deny the action
> >>>>>>>>>> on a kernel object with a non-zero value. If every program of the list
> >>>>>>>>>> returns zero, then the action on the object is allowed.
> >>>>>>>>>>
> >>>>>>>>>> Multiple Landlock programs can be chained to share a 64-bit value for a
> >>>>>>>>>> call chain (e.g. evaluating multiple elements of a file path).  This
> >>>>>>>>>> chaining is restricted when a process constructs this chain by loading a
> >>>>>>>>>> program, but additional checks are performed when it requests to apply
> >>>>>>>>>> this chain of programs to itself.  The restrictions ensure that it is
> >>>>>>>>>> not possible to call multiple programs in a way that would imply
> >>>>>>>>>> handling multiple shared values (i.e. cookies) for one chain.  For now,
> >>>>>>>>>> only a fs_pick program can be chained to the same type of program,
> >>>>>>>>>> because it may make sense if they have different triggers (cf. next
> >>>>>>>>>> commits).  These restrictions still allow reusing Landlock programs in
> >>>>>>>>>> a safe way (e.g. use the same loaded fs_walk program with multiple
> >>>>>>>>>> chains of fs_pick programs).
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> >>>>>>>>>
> >>>>>>>>> ...
> >>>>>>>>>
> >>>>>>>>>> +struct landlock_prog_set *landlock_prepend_prog(
> >>>>>>>>>> +             struct landlock_prog_set *current_prog_set,
> >>>>>>>>>> +             struct bpf_prog *prog)
> >>>>>>>>>> +{
> >>>>>>>>>> +     struct landlock_prog_set *new_prog_set = current_prog_set;
> >>>>>>>>>> +     unsigned long pages;
> >>>>>>>>>> +     int err;
> >>>>>>>>>> +     size_t i;
> >>>>>>>>>> +     struct landlock_prog_set tmp_prog_set = {};
> >>>>>>>>>> +
> >>>>>>>>>> +     if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK)
> >>>>>>>>>> +             return ERR_PTR(-EINVAL);
> >>>>>>>>>> +
> >>>>>>>>>> +     /* validate memory size allocation */
> >>>>>>>>>> +     pages = prog->pages;
> >>>>>>>>>> +     if (current_prog_set) {
> >>>>>>>>>> +             size_t i;
> >>>>>>>>>> +
> >>>>>>>>>> +             for (i = 0; i < ARRAY_SIZE(current_prog_set->programs); i++) {
> >>>>>>>>>> +                     struct landlock_prog_list *walker_p;
> >>>>>>>>>> +
> >>>>>>>>>> +                     for (walker_p = current_prog_set->programs[i];
> >>>>>>>>>> +                                     walker_p; walker_p = walker_p->prev)
> >>>>>>>>>> +                             pages += walker_p->prog->pages;
> >>>>>>>>>> +             }
> >>>>>>>>>> +             /* count a struct landlock_prog_set if we need to allocate one */
> >>>>>>>>>> +             if (refcount_read(&current_prog_set->usage) != 1)
> >>>>>>>>>> +                     pages += round_up(sizeof(*current_prog_set), PAGE_SIZE)
> >>>>>>>>>> +                             / PAGE_SIZE;
> >>>>>>>>>> +     }
> >>>>>>>>>> +     if (pages > LANDLOCK_PROGRAMS_MAX_PAGES)
> >>>>>>>>>> +             return ERR_PTR(-E2BIG);
> >>>>>>>>>> +
> >>>>>>>>>> +     /* ensure early that we can allocate enough memory for the new
> >>>>>>>>>> +      * prog_lists */
> >>>>>>>>>> +     err = store_landlock_prog(&tmp_prog_set, current_prog_set, prog);
> >>>>>>>>>> +     if (err)
> >>>>>>>>>> +             return ERR_PTR(err);
> >>>>>>>>>> +
> >>>>>>>>>> +     /*
> >>>>>>>>>> +      * Each task_struct points to an array of prog list pointers.  These
> >>>>>>>>>> +      * tables are duplicated when additions are made (which means each
> >>>>>>>>>> +      * table needs to be refcounted for the processes using it). When a new
> >>>>>>>>>> +      * table is created, all the refcounters on the prog_list are bumped (to
> >>>>>>>>>> +      * track each table that references the prog). When a new prog is
> >>>>>>>>>> +      * added, it's just prepended to the list for the new table to point
> >>>>>>>>>> +      * at.
> >>>>>>>>>> +      *
> >>>>>>>>>> +      * Manage all the possible errors before this step to not uselessly
> >>>>>>>>>> +      * duplicate current_prog_set and avoid a rollback.
> >>>>>>>>>> +      */
> >>>>>>>>>> +     if (!new_prog_set) {
> >>>>>>>>>> +             /*
> >>>>>>>>>> +              * If there is no Landlock program set used by the current task,
> >>>>>>>>>> +              * then create a new one.
> >>>>>>>>>> +              */
> >>>>>>>>>> +             new_prog_set = new_landlock_prog_set();
> >>>>>>>>>> +             if (IS_ERR(new_prog_set))
> >>>>>>>>>> +                     goto put_tmp_lists;
> >>>>>>>>>> +     } else if (refcount_read(&current_prog_set->usage) > 1) {
> >>>>>>>>>> +             /*
> >>>>>>>>>> +              * If the current task is not the sole user of its Landlock
> >>>>>>>>>> +              * program set, then duplicate them.
> >>>>>>>>>> +              */
> >>>>>>>>>> +             new_prog_set = new_landlock_prog_set();
> >>>>>>>>>> +             if (IS_ERR(new_prog_set))
> >>>>>>>>>> +                     goto put_tmp_lists;
> >>>>>>>>>> +             for (i = 0; i < ARRAY_SIZE(new_prog_set->programs); i++) {
> >>>>>>>>>> +                     new_prog_set->programs[i] =
> >>>>>>>>>> +                             READ_ONCE(current_prog_set->programs[i]);
> >>>>>>>>>> +                     if (new_prog_set->programs[i])
> >>>>>>>>>> +                             refcount_inc(&new_prog_set->programs[i]->usage);
> >>>>>>>>>> +             }
> >>>>>>>>>> +
> >>>>>>>>>> +             /*
> >>>>>>>>>> +              * Landlock program set from the current task will not be freed
> >>>>>>>>>> +              * here because the usage is strictly greater than 1. It is
> >>>>>>>>>> +              * only prevented to be freed by another task thanks to the
> >>>>>>>>>> +              * caller of landlock_prepend_prog() which should be locked if
> >>>>>>>>>> +              * needed.
> >>>>>>>>>> +              */
> >>>>>>>>>> +             landlock_put_prog_set(current_prog_set);
> >>>>>>>>>> +     }
> >>>>>>>>>> +
> >>>>>>>>>> +     /* prepend tmp_prog_set to new_prog_set */
> >>>>>>>>>> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++) {
> >>>>>>>>>> +             /* get the last new list */
> >>>>>>>>>> +             struct landlock_prog_list *last_list =
> >>>>>>>>>> +                     tmp_prog_set.programs[i];
> >>>>>>>>>> +
> >>>>>>>>>> +             if (last_list) {
> >>>>>>>>>> +                     while (last_list->prev)
> >>>>>>>>>> +                             last_list = last_list->prev;
> >>>>>>>>>> +                     /* no need to increment usage (pointer replacement) */
> >>>>>>>>>> +                     last_list->prev = new_prog_set->programs[i];
> >>>>>>>>>> +                     new_prog_set->programs[i] = tmp_prog_set.programs[i];
> >>>>>>>>>> +             }
> >>>>>>>>>> +     }
> >>>>>>>>>> +     new_prog_set->chain_last = tmp_prog_set.chain_last;
> >>>>>>>>>> +     return new_prog_set;
> >>>>>>>>>> +
> >>>>>>>>>> +put_tmp_lists:
> >>>>>>>>>> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++)
> >>>>>>>>>> +             put_landlock_prog_list(tmp_prog_set.programs[i]);
> >>>>>>>>>> +     return new_prog_set;
> >>>>>>>>>> +}
> >>>>>>>>>
> >>>>>>>>> Nack on the chaining concept.
> >>>>>>>>> Please do not reinvent the wheel.
> >>>>>>>>> There is an existing mechanism for attaching/detaching/querying multiple
> >>>>>>>>> programs attached to cgroup and tracing hooks that are also
> >>>>>>>>> efficiently executed via BPF_PROG_RUN_ARRAY.
> >>>>>>>>> Please use that instead.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> I don't see how that would help.  Suppose you add a filter, then
> >>>>>>>> fork(), and then the child adds another filter.  Do you want to
> >>>>>>>> duplicate the entire array?  You certainly can't *modify* the array
> >>>>>>>> because you'll affect processes that shouldn't be affected.
> >>>>>>>>
> >>>>>>>> In contrast, doing this through seccomp like the earlier patches
> >>>>>>>> seemed just fine to me, and seccomp already had the right logic.
> >>>>>>>
> >>>>>>> It doesn't look to me like the existing seccomp way of managing the fork
> >>>>>>> situation can be reused. Here there is an attempt to add a 'chaining'
> >>>>>>> concept which is sort of an extension of the existing seccomp style,
> >>>>>>> but somehow heavily done on the bpf side, and it contradicts cgroup/tracing.
> >>>>>>>
> >>>>>>
> >>>>>> I don't see why the seccomp way can't be used.  I agree with you that
> >>>>>> the seccomp *style* shouldn't be used in bpf code like this, but I
> >>>>>> think that Landlock programs can and should just live in the existing
> >>>>>> seccomp chain.  If the existing seccomp code needs some modification
> >>>>>> to make this work, then so be it.
> >>>>>
> >>>>> +1
> >>>>> if that was the case...
> >>>>> but that's not my reading of the patch set.
> >>>>
> >>>> An earlier version of the patch set used the seccomp filter chain.
> >>>> Mickaël, what exactly was wrong with that approach other than that the
> >>>> seccomp() syscall was awkward for you to use?  You could add a
> >>>> seccomp_add_landlock_rule() syscall if you needed to.
> >>>
> >>> Nothing was wrong with that; this part did not change (see my
> >>> next comment).
> >>>
> >>>>
> >>>> As a side comment, why is this an LSM at all, let alone a non-stacking
> >>>> LSM?  It would make a lot more sense to me to make Landlock depend on
> >>>> having LSMs configured in but to call the landlock hooks directly from
> >>>> the security_xyz() hooks.
> >>>
> >>> See Casey's answer and his patch series: https://lwn.net/Articles/741963/
> >>>
> >>>>
> >>>>>
> >>>>>> In other words, the kernel already has two kinds of chaining:
> >>>>>> seccomp's and bpf's.  bpf's doesn't work right for this type of usage
> >>>>>> across fork(), whereas seccomp's already handles that case correctly.
> >>>>>> (In contrast, seccomp's is totally wrong for cgroup-attached filters.)
> >>>>>>  So IMO Landlock should use the seccomp core code and call into bpf
> >>>>>> for the actual filtering.
> >>>>>
> >>>>> +1
> >>>>> in cgroup we had to invent this new BPF_PROG_RUN_ARRAY mechanism,
> >>>>> since cgroup hierarchy can be complicated with bpf progs attached
> >>>>> at different levels with different override/multiprog properties,
> >>>>> so walking a linked list and checking all flags at run-time would have
> >>>>> been too slow. That's why we added compute_effective_progs().
> >>>>
> >>>> If we start adding override flags to Landlock, I think we're doing it
> >>>> wrong.   With cgroup bpf programs, the whole mess is set up by the
> >>>> administrator.  With seccomp, and with Landlock if done correctly, it
> >>>> *won't* be set up by the administrator, so the chance that everyone
> >>>> gets all the flags right is about zero.  All attached filters should
> >>>> run unconditionally.
> >>>
> >>>
> >>> There is a misunderstanding about this chaining mechanism. This should
> >>> not be confused with the list of seccomp filters nor the cgroup
> >>> hierarchies. Landlock programs can be stacked the same way seccomp's
> >>> filters can (cf. struct landlock_prog_set, the "chain_last" field is an
> >>> optimization which is not used for this struct handling). This stackable
> >>> property has not changed from the previous patch series. The chaining
> >>> mechanism is for another use case, which does not make sense for seccomp
> >>> filters nor other eBPF program types, at least for now, from what I can
> >>> tell.
> >>>
> >>> You may want to take a look at my talk at FOSDEM
> >>> (https://landlock.io/talks/2018-02-04_landlock-fosdem.pdf), especially
> >>> slides 11 and 12.
> >>>
> >>> Let me explain my reasoning about this program chaining thing.
> >>>
> >>> To check if an action on a file is allowed, we first need to identify
> >>> this file and match it to the security policy. In a previous
> >>> (non-public) patch series, I tried to use one type of eBPF program to
> >>> check every kind of access to a file. To be able to identify a file, I
> >>> relied on an eBPF map, similar to the current inode map. This map stores
> >>> a set of references to file descriptors. I then created a function
> >>> bpf_is_file_beneath() to check if the requested file was beneath a file
> >>> in the map. This way, no chaining, only one eBPF program type to check
> >>> an access to a file... but some issues then emerged. First, this design
> >>> creates a side channel which helps an attacker using such a program to
> >>> infer some information not normally available, for example to get a hint
> >>> on where a file descriptor (received from a UNIX socket) comes from.
> >>> Another issue is that this type of program would be called for each
> >>> component of a path. Indeed, when the kernel checks if an access to a
> >>> file is allowed, it walks through all of the directories in its path
> >>> (checking if the current process is allowed to execute them). That first
> >>> attempt led me to rethink the way we could filter an access to a file
> >>> *path*.
> >>>
> >>> To minimize the number of calls to an eBPF program dedicated to
> >>> validating an access to a file path, I decided to create three subtypes of
> >>> eBPF programs. The FS_WALK type is called when walking through every
> >>> directory of a file path (except the last one if it is the target). We
> >>> can then restrict this type of program to the minimum set of functions
> >>> it is allowed to call and the minimum set of data available from its
> >>> context. The first implicit chaining is for this type of program. To be
> >>> able to evaluate a path while being called for all its components, this
> >>> program needs to store a state (to remember what was the parent directory
> >>> of this path). There is no "previous" field in the subtype for this
> >>> program because it is chained with itself, for each directory. This
> >>> makes it possible to create a FS_WALK program to evaluate a file hierarchy,
> >>> thanks to the inode map which can be used to check if a directory of this
> >>> hierarchy is part of an allowed (or denied) list of directories. This
> >>> design makes it possible to express a file hierarchy in a programmatic way,
> >>> without requiring an eBPF helper to do the job (unlike my first experiment).
> >>>
> >>> The explicit chaining is used to tie a path evaluation (with a FS_WALK
> >>> program) to an access to the actual file being requested (the last
> >>> component of a file path), with a FS_PICK program. It is only at this
> >>> time that the kernel checks for the requested action (e.g. read, write,
> >>> chdir, append...). To be able to filter such access requests we could have
> >>> one call to the same program for every action and let this program check
> >>> for which action it was called. However, this design does not allow the
> >>> kernel to know if the current action is indeed handled by this program.
> >>> Hence, it is not possible to implement a cache mechanism to only call
> >>> this program if it knows how to handle this action.
> >>>
> >>> The approach I took for this FS_PICK type of program is to add to its
> >>> subtype which actions it can handle (with the "triggers" bitfield, seen
> >>> as ORed actions). This way, the kernel knows if a call to a FS_PICK
> >>> program is necessary. If the user wants to enforce a different security
> >>> policy according to the action requested on a file, then they need
> >>> multiple FS_PICK programs. However, to reduce the number of such
> >>> programs, this patch series allows a FS_PICK program to be chained with
> >>> another, the same way a FS_WALK is chained with itself. This way, if the
> >>> user wants to check if the action is, for example, an "open" and a "read"
> >>> and not a "map" and a "read", then they can chain multiple FS_PICK
> >>> programs with different trigger actions. The OR check performed by the
> >>> kernel is not a limitation then, only a way to know if a call to an eBPF
> >>> program is needed.
> >>>
> >>> The last type of program is FS_GET. This one is called when a process
> >>> gets a struct file or changes its working directory. This is the only
> >>> program type able (and allowed) to tag a file. This restriction is
> >>> important to avoid being subject to resource exhaustion attacks (i.e.
> >>> tagging every inode accessible to an attacker, which would allocate too
> >>> much kernel memory).
> >>>
> >>> This design leaves room for improvements, such as a cache of eBPF
> >>> contexts (input data, including maps if any) with the result of an eBPF
> >>> program. This would help limit the number of calls to an eBPF program the
> >>> same way SELinux or other kernel components do to limit costly checks.
> >>>
> >>> The eBPF maps of progs are useful to call the same type of eBPF
> >>> program. They do not fit this use case because we may want multiple
> >>> eBPF programs according to the action requested on a kernel object (e.g.
> >>> FS_GET). The other reason is that the eBPF program does not know what
> >>> will be the next (type of) access check performed by the kernel.
> >>>
> >>> To say it another way, this chaining mechanism is a way to split a
> >>> kernel object evaluation with multiple specialized programs, each of
> >>> them being able to deal with data tied to their type. Using a monolithic
> >>> eBPF program to check everything does not scale and does not fit with
> >>> unprivileged use either.
> >>>
> >>> As a side note, the cookie value is only an ephemeral value to keep a
> >>> state between multiple program calls. It can be used to create a state
> >>> machine for an object evaluation.
> >>>
> >>> I don't see a way to do an efficient and programmatic path evaluation,
> >>> with different access checks, with the current eBPF features. Please let
> >>> me know if you know how to do it another way.
> >>>
> >>
> >> Andy, Alexei, Daniel, what do you think about this Landlock program
> >> chaining and cookie?
> >>
> > 
> > Can you give a small pseudocode real world example that actually needs
> > chaining?  The mechanism is quite complicated and I'd like to
> > understand how it'll be used.
> > 
> 
> Here is the interesting part from the example (patch 09/11):
> 
> +SEC("maps")
> +struct bpf_map_def inode_map = {
> +	.type = BPF_MAP_TYPE_INODE,
> +	.key_size = sizeof(u32),
> +	.value_size = sizeof(u64),
> +	.max_entries = 20,
> +};
> +
> +SEC("subtype/landlock1")
> +static union bpf_prog_subtype _subtype1 = {
> +	.landlock_hook = {
> +		.type = LANDLOCK_HOOK_FS_WALK,
> +	}
> +};
> +
> +static __always_inline __u64 update_cookie(__u64 cookie, __u8 lookup,
> +		void *inode, void *chain, bool freeze)
> +{
> +	__u64 map_allow = 0;
> +
> +	if (cookie == 0) {
> +		cookie = bpf_inode_get_tag(inode, chain);
> +		if (cookie)
> +			return cookie;
> +		/* only look for the first match in the map, ignore nested
> +		 * paths in this example */
> +		map_allow = bpf_inode_map_lookup(&inode_map, inode);
> +		if (map_allow)
> +			cookie = 1 | map_allow;
> +	} else {
> +		if (cookie & COOKIE_VALUE_FREEZED)
> +			return cookie;
> +		map_allow = cookie & _MAP_MARK_MASK;
> +		cookie &= ~_MAP_MARK_MASK;
> +		switch (lookup) {
> +		case LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_DOTDOT:
> +			cookie--;
> +			break;
> +		case LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_DOT:
> +			break;
> +		default:
> +			/* ignore _MAP_MARK_MASK overflow in this example */
> +			cookie++;
> +			break;
> +		}
> +		if (cookie >= 1)
> +			cookie |= map_allow;
> +	}
> +	/* do not modify the cookie for each fs_pick */
> +	if (freeze && cookie)
> +		cookie |= COOKIE_VALUE_FREEZED;
> +	return cookie;
> +}
> +
> +SEC("landlock1")
> +int fs_walk(struct landlock_ctx_fs_walk *ctx)
> +{
> +	ctx->cookie = update_cookie(ctx->cookie, ctx->inode_lookup,
> +			(void *)ctx->inode, (void *)ctx->chain, false);
> +	return LANDLOCK_RET_ALLOW;
> +}
> 
> The program "landlock1" is called for every directory execution (except
> the last one if it is the leaf of a path). This makes it possible to
> identify a file hierarchy with only a flat (one-dimensional) list of file
> descriptors (i.e. inode_map).
> 
> Underneath, the Landlock LSM part checks whether there is an associated
> path walk (nameidata) with each inode access request. If there is one, then
> the cookie associated with the path walk (if any) is made available
> through the eBPF program context. This makes it possible to build a state
> machine with an eBPF program to "evaluate" a file path (without string
> parsing).
> 
> The goal with this chaining mechanism is to be able to express a complex
> kernel object like a file, with multiple runs of one or more eBPF
> programs, as a multilayer evaluation. These semantics may only make sense
> for the user/developer and their security policy. We must keep in mind
> that this object identification should be available to unprivileged
> processes. This means that we must be very careful about what kind of
> information is available to an eBPF program because it can then leak
> to a process (e.g. through a map). With this mechanism, only information
> already available to user space is available to the eBPF program.
> 
> In this example, the complexity of the path evaluation is in the eBPF
> program. We can then keep the kernel code simpler and more generic. This
> enables more flexibility for a security policy definition.

It all sounds correct on paper, but it's a pretty novel
approach and I'm not sure I see all the details in the patch.
When people say "inode" they most of the time mean the inode integer number,
whereas in this patch do you mean a raw pointer to the in-kernel
'struct inode'?
To avoid confusion it should probably be called differently.

If you meant inode as a number, then why inode only?
Where are the superblock, device and mount point?
How can the bpf side compare inodes without this additional info?
How will the bpf side know what inode to compare to?
What if the inode number is reused?
Is this approach an optimization to compare inodes
instead of strings passed into sys_open?

If you meant inode as a pointer, how will the bpf side
know the pointer before the walk begins?
What guarantees that it's not a stale pointer?
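
To make the "inode as a number" concern concrete: seen from user space, an
inode number only identifies a file together with the device it lives on.
The sketch below uses stat(2) to compare two paths on that basis. The names
struct file_id, file_id_from_path() and same_file() are hypothetical and
only serve the illustration; they are not part of the patch series, and the
sketch does not address inode number reuse after a file is deleted.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper type: a file identity as seen from user space. */
struct file_id {
	dev_t dev;	/* device holding the filesystem */
	ino_t ino;	/* inode number, only unique within that device */
};

/* Fill *id for path; returns 0 on success, -1 if stat(2) fails. */
static int file_id_from_path(const char *path, struct file_id *id)
{
	struct stat st;

	assert(path && id);
	if (stat(path, &st) != 0)
		return -1;
	id->dev = st.st_dev;
	id->ino = st.st_ino;
	return 0;
}

/* Returns 1 if both paths name the same file, 0 if not, -1 on error. */
static int same_file(const char *a, const char *b)
{
	struct file_id ia, ib;

	if (file_id_from_path(a, &ia) || file_id_from_path(b, &ib))
		return -1;
	return ia.dev == ib.dev && ia.ino == ib.ino;
}
```

Comparing the inode number alone would treat two unrelated files on
different filesystems as identical whenever their numbers happen to match,
which is why the device is part of the comparison here.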


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
  2018-04-10  4:48                       ` Alexei Starovoitov
@ 2018-04-11 22:18                         ` Mickaël Salaün
  0 siblings, 0 replies; 55+ messages in thread
From: Mickaël Salaün @ 2018-04-11 22:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Daniel Borkmann, LKML, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, Casey Schaufler, David Drysdale,
	David S . Miller, Eric W . Biederman, Jann Horn, Jonathan Corbet,
	Michael Kerrisk, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Shuah Khan, Tejun Heo, Thomas Graf,
	Tycho Andersen, Will Drewry, Kernel Hardening, Linux API,
	LSM List, Network Development, Andrew Morton, Al Viro,
	Linux-Fsdevel


On 04/10/2018 06:48 AM, Alexei Starovoitov wrote:
> On Mon, Apr 09, 2018 at 12:01:59AM +0200, Mickaël Salaün wrote:
>>
>> On 04/08/2018 11:06 PM, Andy Lutomirski wrote:
>>> On Sun, Apr 8, 2018 at 6:13 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>>>
>>>> On 02/27/2018 10:48 PM, Mickaël Salaün wrote:
>>>>>
>>>>> On 27/02/2018 17:39, Andy Lutomirski wrote:
>>>>>> On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov
>>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>>> On Tue, Feb 27, 2018 at 05:20:55AM +0000, Andy Lutomirski wrote:
>>>>>>>> On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov
>>>>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>>>>> On Tue, Feb 27, 2018 at 04:40:34AM +0000, Andy Lutomirski wrote:
>>>>>>>>>> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov
>>>>>>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>>>>>>> On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote:
>>>>>>>>>>>> The seccomp(2) syscall can be used by a task to apply a Landlock program
>>>>>>>>>>>> to itself. Like a seccomp filter, a Landlock program is enforced for the
>>>>>>>>>>>> current task and all its future children. A program is immutable and a
>>>>>>>>>>>> task can only add new restricting programs to itself, forming a list of
>>>>>>>>>>>> programs.
>>>>>>>>>>>>
>>>>>>>>>>>> A Landlock program is tied to a Landlock hook. If the action on a kernel
>>>>>>>>>>>> object is allowed by the other Linux security mechanisms (e.g. DAC,
>>>>>>>>>>>> capabilities, other LSMs), then a Landlock hook related to this kind of
>>>>>>>>>>>> object is triggered. The list of programs for this hook is then
>>>>>>>>>>>> evaluated. Each program returns a 32-bit value which can deny the action
>>>>>>>>>>>> on a kernel object with a non-zero value. If every program in the list
>>>>>>>>>>>> returns zero, then the action on the object is allowed.
>>>>>>>>>>>>
>>>>>>>>>>>> Multiple Landlock programs can be chained to share a 64-bit value for a
>>>>>>>>>>>> call chain (e.g. evaluating multiple elements of a file path).  This
>>>>>>>>>>>> chaining is restricted when a process constructs this chain by loading a
>>>>>>>>>>>> program, but additional checks are performed when it requests to apply
>>>>>>>>>>>> this chain of programs to itself.  The restrictions ensure that it is
>>>>>>>>>>>> not possible to call multiple programs in a way that would require
>>>>>>>>>>>> handling multiple shared values (i.e. cookies) for one chain.  For now,
>>>>>>>>>>>> only a fs_pick program can be chained to the same type of program,
>>>>>>>>>>>> because it may make sense if they have different triggers (cf. next
>>>>>>>>>>>> commits).  These restrictions still allow reusing Landlock programs in
>>>>>>>>>>>> a safe way (e.g. use the same loaded fs_walk program with multiple
>>>>>>>>>>>> chains of fs_pick programs).
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>>> +struct landlock_prog_set *landlock_prepend_prog(
>>>>>>>>>>>> +             struct landlock_prog_set *current_prog_set,
>>>>>>>>>>>> +             struct bpf_prog *prog)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +     struct landlock_prog_set *new_prog_set = current_prog_set;
>>>>>>>>>>>> +     unsigned long pages;
>>>>>>>>>>>> +     int err;
>>>>>>>>>>>> +     size_t i;
>>>>>>>>>>>> +     struct landlock_prog_set tmp_prog_set = {};
>>>>>>>>>>>> +
>>>>>>>>>>>> +     if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK)
>>>>>>>>>>>> +             return ERR_PTR(-EINVAL);
>>>>>>>>>>>> +
>>>>>>>>>>>> +     /* validate memory size allocation */
>>>>>>>>>>>> +     pages = prog->pages;
>>>>>>>>>>>> +     if (current_prog_set) {
>>>>>>>>>>>> +             size_t i;
>>>>>>>>>>>> +
>>>>>>>>>>>> +             for (i = 0; i < ARRAY_SIZE(current_prog_set->programs); i++) {
>>>>>>>>>>>> +                     struct landlock_prog_list *walker_p;
>>>>>>>>>>>> +
>>>>>>>>>>>> +                     for (walker_p = current_prog_set->programs[i];
>>>>>>>>>>>> +                                     walker_p; walker_p = walker_p->prev)
>>>>>>>>>>>> +                             pages += walker_p->prog->pages;
>>>>>>>>>>>> +             }
>>>>>>>>>>>> +             /* count a struct landlock_prog_set if we need to allocate one */
>>>>>>>>>>>> +             if (refcount_read(&current_prog_set->usage) != 1)
>>>>>>>>>>>> +                     pages += round_up(sizeof(*current_prog_set), PAGE_SIZE)
>>>>>>>>>>>> +                             / PAGE_SIZE;
>>>>>>>>>>>> +     }
>>>>>>>>>>>> +     if (pages > LANDLOCK_PROGRAMS_MAX_PAGES)
>>>>>>>>>>>> +             return ERR_PTR(-E2BIG);
>>>>>>>>>>>> +
>>>>>>>>>>>> +     /* ensure early that we can allocate enough memory for the new
>>>>>>>>>>>> +      * prog_lists */
>>>>>>>>>>>> +     err = store_landlock_prog(&tmp_prog_set, current_prog_set, prog);
>>>>>>>>>>>> +     if (err)
>>>>>>>>>>>> +             return ERR_PTR(err);
>>>>>>>>>>>> +
>>>>>>>>>>>> +     /*
>>>>>>>>>>>> +      * Each task_struct points to an array of prog list pointers.  These
>>>>>>>>>>>> +      * tables are duplicated when additions are made (which means each
>>>>>>>>>>>> +      * table needs to be refcounted for the processes using it). When a new
>>>>>>>>>>>> +      * table is created, all the refcounters on the prog_list are bumped (to
>>>>>>>>>>>> +      * track each table that references the prog). When a new prog is
>>>>>>>>>>>> +      * added, it's just prepended to the list for the new table to point
>>>>>>>>>>>> +      * at.
>>>>>>>>>>>> +      *
>>>>>>>>>>>> +      * Manage all the possible errors before this step to not uselessly
>>>>>>>>>>>> +      * duplicate current_prog_set and avoid a rollback.
>>>>>>>>>>>> +      */
>>>>>>>>>>>> +     if (!new_prog_set) {
>>>>>>>>>>>> +             /*
>>>>>>>>>>>> +              * If there is no Landlock program set used by the current task,
>>>>>>>>>>>> +              * then create a new one.
>>>>>>>>>>>> +              */
>>>>>>>>>>>> +             new_prog_set = new_landlock_prog_set();
>>>>>>>>>>>> +             if (IS_ERR(new_prog_set))
>>>>>>>>>>>> +                     goto put_tmp_lists;
>>>>>>>>>>>> +     } else if (refcount_read(&current_prog_set->usage) > 1) {
>>>>>>>>>>>> +             /*
>>>>>>>>>>>> +              * If the current task is not the sole user of its Landlock
>>>>>>>>>>>> +              * program set, then duplicate them.
>>>>>>>>>>>> +              */
>>>>>>>>>>>> +             new_prog_set = new_landlock_prog_set();
>>>>>>>>>>>> +             if (IS_ERR(new_prog_set))
>>>>>>>>>>>> +                     goto put_tmp_lists;
>>>>>>>>>>>> +             for (i = 0; i < ARRAY_SIZE(new_prog_set->programs); i++) {
>>>>>>>>>>>> +                     new_prog_set->programs[i] =
>>>>>>>>>>>> +                             READ_ONCE(current_prog_set->programs[i]);
>>>>>>>>>>>> +                     if (new_prog_set->programs[i])
>>>>>>>>>>>> +                             refcount_inc(&new_prog_set->programs[i]->usage);
>>>>>>>>>>>> +             }
>>>>>>>>>>>> +
>>>>>>>>>>>> +             /*
>>>>>>>>>>>> +              * Landlock program set from the current task will not be freed
>>>>>>>>>>>> +              * here because the usage is strictly greater than 1. It is
>>>>>>>>>>>> +              * only prevented to be freed by another task thanks to the
>>>>>>>>>>>> +              * caller of landlock_prepend_prog() which should be locked if
>>>>>>>>>>>> +              * needed.
>>>>>>>>>>>> +              */
>>>>>>>>>>>> +             landlock_put_prog_set(current_prog_set);
>>>>>>>>>>>> +     }
>>>>>>>>>>>> +
>>>>>>>>>>>> +     /* prepend tmp_prog_set to new_prog_set */
>>>>>>>>>>>> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++) {
>>>>>>>>>>>> +             /* get the last new list */
>>>>>>>>>>>> +             struct landlock_prog_list *last_list =
>>>>>>>>>>>> +                     tmp_prog_set.programs[i];
>>>>>>>>>>>> +
>>>>>>>>>>>> +             if (last_list) {
>>>>>>>>>>>> +                     while (last_list->prev)
>>>>>>>>>>>> +                             last_list = last_list->prev;
>>>>>>>>>>>> +                     /* no need to increment usage (pointer replacement) */
>>>>>>>>>>>> +                     last_list->prev = new_prog_set->programs[i];
>>>>>>>>>>>> +                     new_prog_set->programs[i] = tmp_prog_set.programs[i];
>>>>>>>>>>>> +             }
>>>>>>>>>>>> +     }
>>>>>>>>>>>> +     new_prog_set->chain_last = tmp_prog_set.chain_last;
>>>>>>>>>>>> +     return new_prog_set;
>>>>>>>>>>>> +
>>>>>>>>>>>> +put_tmp_lists:
>>>>>>>>>>>> +     for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++)
>>>>>>>>>>>> +             put_landlock_prog_list(tmp_prog_set.programs[i]);
>>>>>>>>>>>> +     return new_prog_set;
>>>>>>>>>>>> +}
>>>>>>>>>>>
>>>>>>>>>>> Nack on the chaining concept.
>>>>>>>>>>> Please do not reinvent the wheel.
>>>>>>>>>>> There is an existing mechanism for attaching/detaching/querying multiple
>>>>>>>>>>> programs attached to cgroup and tracing hooks that are also
>>>>>>>>>>> efficiently executed via BPF_PROG_RUN_ARRAY.
>>>>>>>>>>> Please use that instead.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I don't see how that would help.  Suppose you add a filter, then
>>>>>>>>>> fork(), and then the child adds another filter.  Do you want to
>>>>>>>>>> duplicate the entire array?  You certainly can't *modify* the array
>>>>>>>>>> because you'll affect processes that shouldn't be affected.
>>>>>>>>>>
>>>>>>>>>> In contrast, doing this through seccomp like the earlier patches
>>>>>>>>>> seemed just fine to me, and seccomp already had the right logic.
>>>>>>>>>
>>>>>>>>> It doesn't look to me like the existing seccomp handling of the fork
>>>>>>>>> situation can be reused. Here there is an attempt to add a 'chaining'
>>>>>>>>> concept which is sort of an extension of the existing seccomp style,
>>>>>>>>> but somehow done heavily on the bpf side, and it contradicts cgroup/tracing.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I don't see why the seccomp way can't be used.  I agree with you that
>>>>>>>> the seccomp *style* shouldn't be used in bpf code like this, but I
>>>>>>>> think that Landlock programs can and should just live in the existing
>>>>>>>> seccomp chain.  If the existing seccomp code needs some modification
>>>>>>>> to make this work, then so be it.
>>>>>>>
>>>>>>> +1
>>>>>>> if that was the case...
>>>>>>> but that's not my reading of the patch set.
>>>>>>
>>>>>> An earlier version of the patch set used the seccomp filter chain.
>>>>>> Mickaël, what exactly was wrong with that approach other than that the
>>>>>> seccomp() syscall was awkward for you to use?  You could add a
>>>>>> seccomp_add_landlock_rule() syscall if you needed to.
>>>>>
>>>>> Nothing was wrong with that; this part did not change (see my
>>>>> next comment).
>>>>>
>>>>>>
>>>>>> As a side comment, why is this an LSM at all, let alone a non-stacking
>>>>>> LSM?  It would make a lot more sense to me to make Landlock depend on
>>>>>> having LSMs configured in but to call the landlock hooks directly from
>>>>>> the security_xyz() hooks.
>>>>>
>>>>> See Casey's answer and his patch series: https://lwn.net/Articles/741963/
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> In other words, the kernel already has two kinds of chaining:
>>>>>>>> seccomp's and bpf's.  bpf's doesn't work right for this type of usage
>>>>>>>> across fork(), whereas seccomp's already handles that case correctly.
>>>>>>>> (In contrast, seccomp's is totally wrong for cgroup-attached filters.)
>>>>>>>>  So IMO Landlock should use the seccomp core code and call into bpf
>>>>>>>> for the actual filtering.
>>>>>>>
>>>>>>> +1
>>>>>>> In cgroup we had to invent this new BPF_PROG_RUN_ARRAY mechanism,
>>>>>>> since a cgroup hierarchy can be complicated, with bpf progs attached
>>>>>>> at different levels with different override/multiprog properties,
>>>>>>> so walking a linked list and checking all flags at run-time would have
>>>>>>> been too slow. That's why we added compute_effective_progs().
>>>>>>
>>>>>> If we start adding override flags to Landlock, I think we're doing it
>>>>>> wrong.   With cgroup bpf programs, the whole mess is set up by the
>>>>>> administrator.  With seccomp, and with Landlock if done correctly, it
>>>>>> *won't* be set up by the administrator, so the chance that everyone
>>>>>> gets all the flags right is about zero.  All attached filters should
>>>>>> run unconditionally.
>>>>>
>>>>>
>>>>> There is a misunderstanding about this chaining mechanism. It should
>>>>> not be confused with the list of seccomp filters nor the cgroup
>>>>> hierarchies. Landlock programs can be stacked the same way seccomp's
>>>>> filters can (cf. struct landlock_prog_set, the "chain_last" field is an
>>>>> optimization which is not used for this struct handling). This stackable
>>>>> property did not change from the previous patch series. The chaining
>>>>> mechanism is for another use case, which does not make sense for seccomp
>>>>> filters nor other eBPF program types, at least for now, from what I can
>>>>> tell.
>>>>>
>>>>> You may want to take a look at my talk at FOSDEM
>>>>> (https://landlock.io/talks/2018-02-04_landlock-fosdem.pdf), especially
>>>>> slides 11 and 12.
>>>>>
>>>>> Let me explain my reasoning about this program chaining thing.
>>>>>
>>>>> To check if an action on a file is allowed, we first need to identify
>>>>> this file and match it to the security policy. In a previous
>>>>> (non-public) patch series, I tried to use one type of eBPF program to
>>>>> check every kind of access to a file. To be able to identify a file, I
>>>>> relied on an eBPF map, similar to the current inode map. This map stores
>>>>> a set of references to file descriptors. I then created a function
>>>>> bpf_is_file_beneath() to check if the requested file was beneath a file
>>>>> in the map. This way, no chaining, only one eBPF program type to check
>>>>> an access to a file... but some issues then emerged. First, this design
>>>>> creates a side channel which helps an attacker using such a program to
>>>>> infer some information not normally available, for example to get a hint
>>>>> on where a file descriptor (received from a UNIX socket) comes from.
>>>>> Another issue is that this type of program would be called for each
>>>>> component of a path. Indeed, when the kernel checks if an access to a
>>>>> file is allowed, it walks through all of the directories in its path
>>>>> (checking if the current process is allowed to execute them). That first
>>>>> attempt led me to rethink the way we could filter an access to a file
>>>>> *path*.
>>>>>
>>>>> To minimize the number of calls to an eBPF program dedicated to
>>>>> validating an access to a file path, I decided to create three subtypes of
>>>>> eBPF programs. The FS_WALK type is called when walking through every
>>>>> directory of a file path (except the last one if it is the target). We
>>>>> can then restrict this type of program to the minimum set of functions
>>>>> it is allowed to call and the minimum set of data available from its
>>>>> context. The first implicit chaining is for this type of program. To be
>>>>> able to evaluate a path while being called for all its components, this
>>>>> program needs to store a state (to remember what was the parent directory
>>>>> of this path). There is no "previous" field in the subtype for this
>>>>> program because it is chained with itself, for each directory. This
>>>>> makes it possible to create a FS_WALK program to evaluate a file hierarchy,
>>>>> thanks to the inode map which can be used to check if a directory of this
>>>>> hierarchy is part of an allowed (or denied) list of directories. This
>>>>> design makes it possible to express a file hierarchy in a programmatic way,
>>>>> without requiring an eBPF helper to do the job (unlike my first experiment).
>>>>>
>>>>> The explicit chaining is used to tie a path evaluation (with a FS_WALK
>>>>> program) to an access to the actual file being requested (the last
>>>>> component of a file path), with a FS_PICK program. It is only at this
>>>>> time that the kernel checks for the requested action (e.g. read, write,
>>>>> chdir, append...). To be able to filter such an access request we can have
>>>>> one call to the same program for every action and let this program check
>>>>> for which action it was called. However, this design does not allow the
>>>>> kernel to know if the current action is indeed handled by this program.
>>>>> Hence, it is not possible to implement a cache mechanism to only call
>>>>> this program if it knows how to handle this action.
>>>>>
>>>>> The approach I took for this FS_PICK type of program is to add to its
>>>>> subtype which actions it can handle (with the "triggers" bitfield, seen
>>>>> as ORed actions). This way, the kernel knows if a call to a FS_PICK
>>>>> program is necessary. If the user wants to enforce a different security
>>>>> policy according to the action requested on a file, then they need
>>>>> multiple FS_PICK programs. However, to reduce the number of such
>>>>> programs, this patch series allows a FS_PICK program to be chained with
>>>>> another, the same way a FS_WALK is chained with itself. This way, if the
>>>>> user wants to check if the action is, for example, an "open" and a "read"
>>>>> and not a "map" and a "read", then they can chain multiple FS_PICK
>>>>> programs with different trigger actions. The OR check performed by the
>>>>> kernel is not a limitation then, only a way to know if a call to an eBPF
>>>>> program is needed.
>>>>>
>>>>> The last type of program is FS_GET. This one is called when a process
>>>>> gets a struct file or changes its working directory. This is the only
>>>>> program type able (and allowed) to tag a file. This restriction is
>>>>> important to avoid being subject to resource exhaustion attacks (i.e.
>>>>> tagging every inode accessible to an attacker, which would allocate too
>>>>> much kernel memory).
>>>>>
>>>>> This design gives room for improvements to create a cache of eBPF
>>>>> context (input data, including maps if any), with the result of an eBPF
>>>>> program. This would help limit the number of calls to an eBPF program the
>>>>> same way SELinux or other kernel components do to limit costly checks.
>>>>>
>>>>> The eBPF maps of progs are useful to call the same type of eBPF
>>>>> program. They do not fit this use case because we may want multiple
>>>>> eBPF programs according to the action requested on a kernel object (e.g.
>>>>> FS_GET). The other reason is that the eBPF program does not know what
>>>>> will be the next (type of) access check performed by the kernel.
>>>>>
>>>>> To say it another way, this chaining mechanism is a way to split a
>>>>> kernel object evaluation with multiple specialized programs, each of
>>>>> them being able to deal with data tied to their type. Using a monolithic
>>>>> eBPF program to check everything does not scale and does not fit with
>>>>> unprivileged use either.
>>>>>
>>>>> As a side note, the cookie value is only an ephemeral value to keep a
>>>>> state between multiple program calls. It can be used to create a state
>>>>> machine for an object evaluation.
>>>>>
>>>>> I don't see a way to do an efficient and programmatic path evaluation,
>>>>> with different access checks, with the current eBPF features. Please let
>>>>> me know if you know how to do it another way.
>>>>>
>>>>
>>>> Andy, Alexei, Daniel, what do you think about this Landlock program
>>>> chaining and cookie?
>>>>
>>>
>>> Can you give a small pseudocode real world example that actually needs
>>> chaining?  The mechanism is quite complicated and I'd like to
>>> understand how it'll be used.
>>>
>>
>> Here is the interesting part from the example (patch 09/11):
>>
>> +SEC("maps")
>> +struct bpf_map_def inode_map = {
>> +	.type = BPF_MAP_TYPE_INODE,
>> +	.key_size = sizeof(u32),
>> +	.value_size = sizeof(u64),
>> +	.max_entries = 20,
>> +};
>> +
>> +SEC("subtype/landlock1")
>> +static union bpf_prog_subtype _subtype1 = {
>> +	.landlock_hook = {
>> +		.type = LANDLOCK_HOOK_FS_WALK,
>> +	}
>> +};
>> +
>> +static __always_inline __u64 update_cookie(__u64 cookie, __u8 lookup,
>> +		void *inode, void *chain, bool freeze)
>> +{
>> +	__u64 map_allow = 0;
>> +
>> +	if (cookie == 0) {
>> +		cookie = bpf_inode_get_tag(inode, chain);
>> +		if (cookie)
>> +			return cookie;
>> +		/* only look for the first match in the map, ignore nested
>> +		 * paths in this example */
>> +		map_allow = bpf_inode_map_lookup(&inode_map, inode);
>> +		if (map_allow)
>> +			cookie = 1 | map_allow;
>> +	} else {
>> +		if (cookie & COOKIE_VALUE_FREEZED)
>> +			return cookie;
>> +		map_allow = cookie & _MAP_MARK_MASK;
>> +		cookie &= ~_MAP_MARK_MASK;
>> +		switch (lookup) {
>> +		case LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_DOTDOT:
>> +			cookie--;
>> +			break;
>> +		case LANDLOCK_CTX_FS_WALK_INODE_LOOKUP_DOT:
>> +			break;
>> +		default:
>> +			/* ignore _MAP_MARK_MASK overflow in this example */
>> +			cookie++;
>> +			break;
>> +		}
>> +		if (cookie >= 1)
>> +			cookie |= map_allow;
>> +	}
>> +	/* do not modify the cookie for each fs_pick */
>> +	if (freeze && cookie)
>> +		cookie |= COOKIE_VALUE_FREEZED;
>> +	return cookie;
>> +}
>> +
>> +SEC("landlock1")
>> +int fs_walk(struct landlock_ctx_fs_walk *ctx)
>> +{
>> +	ctx->cookie = update_cookie(ctx->cookie, ctx->inode_lookup,
>> +			(void *)ctx->inode, (void *)ctx->chain, false);
>> +	return LANDLOCK_RET_ALLOW;
>> +}
>>
>> The program "landlock1" is called for every directory execution (except
>> the last one if it is the leaf of a path). This makes it possible to
>> identify a file hierarchy with only a (one-dimensional) list of file
>> descriptors (i.e. inode_map).
>>
>> Underneath, the Landlock LSM part checks whether there is an associated path
>> walk (nameidata) with each inode access request. If there is one, then
>> the cookie associated with the path walk (if any) is made available
>> through the eBPF program context. This makes it possible to develop a state
>> machine with an eBPF program to "evaluate" a file path (without string
>> parsing).
>>
>> The goal with this chaining mechanism is to be able to express a complex
>> kernel object like a file, with multiple runs of one or more eBPF
>> programs, as a multilayer evaluation. This semantic may only make sense
>> for the user/developer and their security policy. We must keep in mind
>> that this object identification should be available to unprivileged
>> processes. This means that we must be very careful about what kind of
>> information is available to an eBPF program because this can then leak
>> to a process (e.g. through a map). With this mechanism, only information
>> already available to user space is available to the eBPF program.
>>
>> In this example, the complexity of the path evaluation is in the eBPF
>> program. We can then keep the kernel code simpler and more generic. This
>> enables more flexibility for a security policy definition.
> 
> It all sounds correct on paper, but it's a pretty novel
> approach and I'm not sure I see all the details in the patch.
> When people say "inode" they usually mean the integer inode number,
> whereas in this patch do you mean a raw pointer to the in-kernel
> 'struct inode'?
> To avoid confusion it should probably be called something else.

It's indeed a pointer to a "struct inode", not an inode number.

I was thinking about generalizing the BPF_MAP_TYPE_INODE by renaming it
to BPF_MAP_TYPE_FD. This map type could then be used either to identify
a set of inodes (pointers) or other kernel objects identifiable by a
file descriptor. A "subtype" (similar to the BPF prog subtype introduced
in this patch series) may be used to specialize such a map to statically
identify the kind of content it may hold. We could then add more
subtypes to identify sockets, devices, processes, and so on.
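
To make the subtype idea more concrete, here is a minimal user-space
sketch, not code from this series. All the names (fd_obj_kind, fd_map,
fd_map_add) are hypothetical; the only point is that the kind of object a
map may hold is fixed when the map is created and checked on every update.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical kinds of kernel objects a file descriptor may refer to. */
enum fd_obj_kind {
	FD_OBJ_INODE,
	FD_OBJ_SOCKET,
};

#define FD_MAP_MAX_ENTRIES 4

/* Hypothetical generic "FD map": the subtype is fixed at creation time. */
struct fd_map {
	enum fd_obj_kind subtype;
	size_t nr_entries;
	const void *entries[FD_MAP_MAX_ENTRIES];
};

/*
 * Store an object in the map, rejecting objects whose kind does not match
 * the map subtype. Returns 0 on success or a negative errno value.
 */
static int fd_map_add(struct fd_map *map, enum fd_obj_kind kind,
		      const void *obj)
{
	assert(map);
	if (!obj)
		return -EBADF;
	if (kind != map->subtype)
		return -EINVAL;
	if (map->nr_entries >= FD_MAP_MAX_ENTRIES)
		return -E2BIG;
	map->entries[map->nr_entries++] = obj;
	return 0;
}
```

With such a check, a program using an inode-subtyped map would never have
to deal with a socket reference, which is what "statically identify the
kind of content" is meant to convey.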

> 
> If you meant inode as a number, then why the inode only?
> Where are the superblock, device, and mount point?
> How can the bpf side compare inodes without this additional info?
> How will the bpf side know which inode to compare to?
> What if the inode number is reused?

This pointer can be used to check whether a given inode is the same as the
one referred to by a file descriptor (or a file path).
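
As an illustration only, here is a user-space model of the difference
between the two notions of "inode". The struct fake_inode type and its
fs_id field are stand-ins I made up for this sketch; they are not the
kernel's struct inode.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stand-in for the kernel's struct inode (illustration only). */
struct fake_inode {
	unsigned long i_ino;	/* inode number, unique only within one filesystem */
	unsigned int fs_id;	/* hypothetical filesystem identifier */
};

/* Comparing numbers alone confuses inodes from different filesystems. */
static bool same_inode_number(const struct fake_inode *a,
			      const struct fake_inode *b)
{
	assert(a && b);
	return a->i_ino == b->i_ino;
}

/* Comparing addresses identifies one in-memory object, whatever its number. */
static bool same_inode_object(const struct fake_inode *a,
			      const struct fake_inode *b)
{
	return a == b;
}
```

Two objects that share number 2 on two different filesystems are equal for
same_inode_number() but not for same_inode_object(), so the address
comparison needs no extra superblock or device information.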


> Is this approach an optimization to compare inodes
> instead of the strings passed into sys_open?

Comparing paths as strings is less efficient, and it is also very
error-prone. Another advantage of using file descriptors concerns
unprivileged processes: we can be sure that these processes are allowed
to access a file referred to by a file descriptor (an opened file). Indeed,
we check (with security_inode_getattr) that the process is allowed to stat
an opened file. This way, a malicious process can't infer information by
crafting path strings.
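
A user-space analogy of why strings are error-prone, using only open(2)
and fstat(2): the same_file() helper below is a name made up for this
sketch, and it compares the (st_dev, st_ino) pair that user space can see,
whereas the kernel side of Landlock compares struct inode pointers.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Return 1 if both paths resolve to the same file, 0 if they do not, and
 * -1 if one of them cannot be opened or inspected.
 */
static int same_file(const char *path_a, const char *path_b)
{
	struct stat stat_a, stat_b;
	int fd_a, fd_b, ret = -1;

	assert(path_a && path_b);
	fd_a = open(path_a, O_RDONLY);
	if (fd_a < 0)
		return -1;
	fd_b = open(path_b, O_RDONLY);
	if (fd_b < 0)
		goto close_a;
	/* fstat() works on the opened file, so the path strings no longer matter */
	if (fstat(fd_a, &stat_a) == 0 && fstat(fd_b, &stat_b) == 0)
		ret = stat_a.st_dev == stat_b.st_dev &&
			stat_a.st_ino == stat_b.st_ino;
	close(fd_b);
close_a:
	close(fd_a);
	return ret;
}
```

The strings "/" and "/." differ, yet same_file("/", "/.") reports the same
file. A path that the process cannot open yields -1, so nothing about it is
revealed beyond what open(2) already tells the caller.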


> 
> If you meant inode as a pointer, how will the bpf side
> know the pointer before the walk begins?

The BPF map is filled by user space with file descriptors pointing to
opened files. When a path walk begins, the LSM part of Landlock is
notified that a process is requesting access to the first element of
the path (e.g. "/"). This first element may be part of a map or not. The
BPF program can then decide whether this request is legitimate or not.
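
The walk can be modelled in plain user-space C. This is a simplified
sketch under my own assumptions, not the example from patch 09/11: the
names (fake_inode, inode_set, walk_step, walk_is_beneath) are
hypothetical, the cookie only counts the depth below the first matching
directory, and ".." handling is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_inode {
	const char *name;	/* for readability only, never compared */
};

#define INODE_SET_MAX 4

/* Stand-in for the inode map: a flat set of object addresses. */
struct inode_set {
	const struct fake_inode *entries[INODE_SET_MAX];
	size_t nr_entries;
};

static bool inode_set_contains(const struct inode_set *set,
			       const struct fake_inode *inode)
{
	size_t i;

	for (i = 0; i < set->nr_entries; i++)
		if (set->entries[i] == inode)
			return true;
	return false;
}

/*
 * One step of the walk: a zero cookie means "not yet beneath a listed
 * directory"; a non-zero cookie counts the depth below the first match.
 */
static unsigned long walk_step(unsigned long cookie,
			       const struct inode_set *set,
			       const struct fake_inode *inode)
{
	if (cookie)
		return cookie + 1;
	return inode_set_contains(set, inode) ? 1 : 0;
}

/* Run walk_step() on each element of a path, starting from its first one. */
static bool walk_is_beneath(const struct inode_set *set,
			    const struct fake_inode *const *path, size_t len)
{
	unsigned long cookie = 0;
	size_t i;

	assert(set && path);
	for (i = 0; i < len; i++)
		cookie = walk_step(cookie, set, path[i]);
	return cookie != 0;
}
```

With a set containing only the "home" object, the walk "/", "home", "user"
ends with a non-zero cookie, whereas "/", "etc" ends with zero. The
program never needs to know an address in advance: it only asks whether
the object it is handed is in the set.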


> What guarantees that it's not a stale pointer?

When user space updates a map with a new file descriptor, the kernel
checks if this FD is valid. If this is the case, then the inode's usage
counter is incremented and its address is stored in the map.
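
As a rough user-space model of that lifetime rule (the names fake_inode,
fake_map_slot, slot_update and slot_delete are invented for this sketch,
and dropping the reference on delete or replace is my assumption about the
symmetric half of the rule):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct fake_inode {
	unsigned int refcount;	/* number of holders keeping the object alive */
};

struct fake_map_slot {
	struct fake_inode *inode;	/* NULL when the slot is empty */
};

/* Take a reference before storing the address, so it cannot become stale. */
static int slot_update(struct fake_map_slot *slot, struct fake_inode *inode)
{
	assert(slot);
	if (!inode)
		return -EBADF;	/* models an invalid file descriptor */
	inode->refcount++;
	if (slot->inode)
		slot->inode->refcount--;
	slot->inode = inode;
	return 0;
}

/* Drop the map's reference when the entry is removed. */
static void slot_delete(struct fake_map_slot *slot)
{
	assert(slot);
	if (slot->inode)
		slot->inode->refcount--;
	slot->inode = NULL;
}
```

In this model the object stays referenced after user space closes its file
descriptor, because the map still holds its own reference until the entry
is deleted.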




Thread overview: 55+ messages
2018-02-27  0:41 [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing Mickaël Salaün
2018-02-27  0:41 ` [PATCH bpf-next v8 01/11] fs,security: Add a security blob to nameidata Mickaël Salaün
2018-02-27  0:57   ` Al Viro
2018-02-27  1:23     ` Al Viro
2018-03-11 20:14       ` Mickaël Salaün
2018-02-28 16:27   ` kbuild test robot
2018-02-28 16:58   ` kbuild test robot
2018-02-27  0:41 ` [PATCH bpf-next v8 02/11] fs,security: Add a new file access type: MAY_CHROOT Mickaël Salaün
2018-02-27  0:41 ` [PATCH bpf-next v8 03/11] bpf: Add eBPF program subtype and is_valid_subtype() verifier Mickaël Salaün
2018-02-27  0:41 ` [PATCH bpf-next v8 04/11] bpf,landlock: Define an eBPF program type for Landlock hooks Mickaël Salaün
2018-02-27  0:41 ` [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy Mickaël Salaün
2018-02-27  2:08   ` Alexei Starovoitov
2018-02-27  4:40     ` Andy Lutomirski
2018-02-27  4:54       ` Alexei Starovoitov
2018-02-27  5:20         ` Andy Lutomirski
2018-02-27  5:32           ` Alexei Starovoitov
2018-02-27 16:39             ` Andy Lutomirski
2018-02-27 17:30               ` Casey Schaufler
2018-02-27 17:36                 ` Andy Lutomirski
2018-02-27 18:03                   ` Casey Schaufler
2018-02-27 21:48               ` Mickaël Salaün
2018-04-08 13:13                 ` Mickaël Salaün
2018-04-08 21:06                   ` Andy Lutomirski
2018-04-08 22:01                     ` Mickaël Salaün
2018-04-10  4:48                       ` Alexei Starovoitov
2018-04-11 22:18                         ` Mickaël Salaün
2018-02-27  0:41 ` [PATCH bpf-next v8 06/11] bpf,landlock: Add a new map type: inode Mickaël Salaün
2018-02-28 17:35   ` kbuild test robot
2018-02-27  0:41 ` [PATCH bpf-next v8 07/11] landlock: Handle filesystem access control Mickaël Salaün
2018-02-27  0:41 ` [PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions Mickaël Salaün
2018-02-27  4:17   ` Andy Lutomirski
2018-02-27  5:01     ` Andy Lutomirski
2018-02-27 22:14       ` Mickaël Salaün
2018-02-27 23:02         ` Andy Lutomirski
2018-02-27 23:23           ` Andy Lutomirski
2018-02-28  0:00             ` Mickaël Salaün
2018-02-28  0:09               ` Andy Lutomirski
2018-03-06 22:28                 ` Mickaël Salaün
2018-04-01 22:48                   ` Mickaël Salaün
2018-02-27 22:18     ` Mickaël Salaün
2018-02-27  0:41 ` [PATCH bpf-next v8 09/11] bpf: Add a Landlock sandbox example Mickaël Salaün
2018-02-27  0:41 ` [PATCH bpf-next v8 10/11] bpf,landlock: Add tests for Landlock Mickaël Salaün
2018-02-27  0:41 ` [PATCH bpf-next v8 11/11] landlock: Add user and kernel documentation " Mickaël Salaün
2018-02-27  4:36 ` [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing Andy Lutomirski
2018-02-27 22:03   ` Mickaël Salaün
2018-02-27 23:09     ` Andy Lutomirski
2018-03-06 22:25       ` Mickaël Salaün
2018-03-06 22:33         ` Andy Lutomirski
2018-03-06 22:46           ` Tycho Andersen
2018-03-06 23:06             ` Mickaël Salaün
2018-03-07  1:21               ` Andy Lutomirski
2018-03-08 23:51                 ` Mickaël Salaün
2018-03-08 23:53                   ` Andy Lutomirski
2018-04-01 22:04                     ` Mickaël Salaün
2018-04-02  0:39                       ` Tycho Andersen
