* [RFC PATCH v14 00/10] Landlock LSM
@ 2020-02-24 16:02 Mickaël Salaün
  2020-02-24 16:02 ` [RFC PATCH v14 01/10] landlock: Add object and rule management Mickaël Salaün
                   ` (12 more replies)
  0 siblings, 13 replies; 34+ messages in thread
From: Mickaël Salaün @ 2020-02-24 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Al Viro, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Greg Kroah-Hartman, James Morris,
	Jann Horn, Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, kernel-hardening, linux-api, linux-arch,
	linux-doc, linux-fsdevel, linux-kselftest, linux-security-module,
	x86

Hi,

This new version of Landlock is a major revamp of the previous series
[1], hence the RFC tag.  The three main changes are the replacement of
eBPF with a dedicated safe management of access rules, the replacement
of the use of seccomp(2) with a dedicated syscall, and the management of
filesystem access-control (back from the v10).

As discussed in [2], eBPF may be too powerful and dangerous to be put in
the hands of unprivileged and potentially malicious processes, especially
because of side-channel attacks against access-controls or other parts
of the kernel.

Thanks to this new implementation (1540 SLOC), designed from the ground
up to be used by unprivileged processes, this series enables a process to
sandbox itself without requiring CAP_SYS_ADMIN, but only the
no_new_privs constraint (like seccomp).  Not relying on eBPF also
improves performance, especially for stacked security policies, thanks
to mergeable rulesets.
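
To illustrate the no_new_privs requirement mentioned above, here is a
minimal user-space sketch that only assumes the standard prctl(2)
interface.  The helper names are hypothetical and not part of this
series, and the Landlock syscall itself is deliberately not shown.

```c
#include <sys/prctl.h>

/*
 * Hypothetical helper: sets the no_new_privs bit on the calling thread.
 * Returns 0 on success, -1 on error (errno is set by prctl(2)).
 */
int sketch_set_no_new_privs(void)
{
	return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/*
 * Hypothetical helper: returns 1 if no_new_privs is set for the calling
 * thread, 0 if it is not, -1 on error.
 */
int sketch_has_no_new_privs(void)
{
	return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

An unprivileged process would presumably call the first helper once,
before asking the kernel to enforce a ruleset; the bit is inherited
across fork(2) and execve(2) and cannot be cleared afterwards.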

The compiled documentation is available here:
https://landlock.io/linux-doc/landlock-v14/security/landlock/index.html

This series can be applied on top of v5.6-rc3.  This can be tested with
CONFIG_SECURITY_LANDLOCK and CONFIG_SAMPLE_LANDLOCK.  This patch series
can be found in a Git repository here:
https://github.com/landlock-lsm/linux/commits/landlock-v14
I would really appreciate constructive comments on the design and the code.


# Landlock LSM

The goal of Landlock is to make it possible to restrict ambient rights
(e.g. global filesystem access) for a set of processes.  Because Landlock
is a stackable LSM [3], it makes it possible to create safe security
sandboxes as new security layers in addition to the existing system-wide
access-controls. This kind of sandbox is expected to help mitigate the
security impact of bugs or unexpected/malicious behaviors in user-space
applications. Landlock empowers any process, including unprivileged ones,
to securely restrict itself.

Landlock is inspired by seccomp-bpf but instead of filtering syscalls
and their raw arguments, a Landlock rule can restrict the use of kernel
objects like file hierarchies, according to the kernel semantics.
Landlock also takes inspiration from other OS sandbox mechanisms: XNU
Sandbox, FreeBSD Capsicum or OpenBSD Pledge/Unveil.


# Current limitations

## Path walk

Landlock needs to use dentries to identify a file hierarchy, which is
needed for composable and unprivileged access-controls. This means that
path resolution/walking (handled with inode_permission()) is not
supported yet. This gap could be filled by a future extension, first of
the LSM framework. The Landlock userspace ABI can handle such a change
with a new option (e.g. to the struct landlock_ruleset).

## UnionFS

A UnionFS super-block uses a set of upper and lower directories. An
access request to a file in one of these hierarchies triggers a call to
ovl_path_real(), which generates another access request according to the
matching hierarchy. Because such a super-block is not aware of its current
mount point, OverlayFS can't create a dedicated mnt_parent for each of
the mount clones of the upper and lower directories. It is therefore not
currently possible to track the source of such an indirect access request,
nor to identify a unified OverlayFS hierarchy.

## Syscall

Because it is only tested on x86_64, the syscall is only wired up for
this architecture.  The whole x86 family (and probably all the others)
will be supported in the next patch series.


## Memory limits

There is currently no limit on the memory usage.  Any ideas on how to
leverage an existing mechanism (e.g. rlimit)?


# Changes since v13

* Revamp of the LSM: remove the need for eBPF and seccomp(2).
* Implement a full filesystem access-control.
* Take care of the backward compatibility issues, especially for
  these security features.

Previous version:
https://lore.kernel.org/lkml/20191104172146.30797-1-mic@digikod.net/

[1] https://lore.kernel.org/lkml/20191104172146.30797-1-mic@digikod.net/
[2] https://lore.kernel.org/lkml/a6b61f33-82dc-0c1c-7a6c-1926343ef63e@digikod.net/
[3] https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler-ca.com/

Regards,

Mickaël Salaün (10):
  landlock: Add object and rule management
  landlock: Add ruleset and domain management
  landlock: Set up the security framework and manage credentials
  landlock: Add ptrace restrictions
  fs,landlock: Support filesystem access-control
  landlock: Add syscall implementation
  arch: Wire up landlock() syscall
  selftests/landlock: Add initial tests
  samples/landlock: Add a sandbox manager example
  landlock: Add user and kernel documentation

 Documentation/security/index.rst              |   1 +
 Documentation/security/landlock/index.rst     |  18 +
 Documentation/security/landlock/kernel.rst    |  44 ++
 Documentation/security/landlock/user.rst      | 233 +++++++
 MAINTAINERS                                   |  12 +
 arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
 fs/super.c                                    |   2 +
 include/linux/landlock.h                      |  22 +
 include/linux/syscalls.h                      |   3 +
 include/uapi/asm-generic/unistd.h             |   4 +-
 include/uapi/linux/landlock.h                 | 315 +++++++++
 samples/Kconfig                               |   7 +
 samples/Makefile                              |   1 +
 samples/landlock/.gitignore                   |   1 +
 samples/landlock/Makefile                     |  15 +
 samples/landlock/sandboxer.c                  | 226 +++++++
 security/Kconfig                              |  11 +-
 security/Makefile                             |   2 +
 security/landlock/Kconfig                     |  16 +
 security/landlock/Makefile                    |   4 +
 security/landlock/cred.c                      |  47 ++
 security/landlock/cred.h                      |  55 ++
 security/landlock/fs.c                        | 591 +++++++++++++++++
 security/landlock/fs.h                        |  42 ++
 security/landlock/object.c                    | 341 ++++++++++
 security/landlock/object.h                    | 134 ++++
 security/landlock/ptrace.c                    | 118 ++++
 security/landlock/ptrace.h                    |  14 +
 security/landlock/ruleset.c                   | 463 +++++++++++++
 security/landlock/ruleset.h                   | 106 +++
 security/landlock/setup.c                     |  38 ++
 security/landlock/setup.h                     |  20 +
 security/landlock/syscall.c                   | 470 +++++++++++++
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/landlock/.gitignore   |   3 +
 tools/testing/selftests/landlock/Makefile     |  13 +
 tools/testing/selftests/landlock/config       |   4 +
 tools/testing/selftests/landlock/test.h       |  40 ++
 tools/testing/selftests/landlock/test_base.c  |  80 +++
 tools/testing/selftests/landlock/test_fs.c    | 624 ++++++++++++++++++
 .../testing/selftests/landlock/test_ptrace.c  | 293 ++++++++
 41 files changed, 4429 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/security/landlock/index.rst
 create mode 100644 Documentation/security/landlock/kernel.rst
 create mode 100644 Documentation/security/landlock/user.rst
 create mode 100644 include/linux/landlock.h
 create mode 100644 include/uapi/linux/landlock.h
 create mode 100644 samples/landlock/.gitignore
 create mode 100644 samples/landlock/Makefile
 create mode 100644 samples/landlock/sandboxer.c
 create mode 100644 security/landlock/Kconfig
 create mode 100644 security/landlock/Makefile
 create mode 100644 security/landlock/cred.c
 create mode 100644 security/landlock/cred.h
 create mode 100644 security/landlock/fs.c
 create mode 100644 security/landlock/fs.h
 create mode 100644 security/landlock/object.c
 create mode 100644 security/landlock/object.h
 create mode 100644 security/landlock/ptrace.c
 create mode 100644 security/landlock/ptrace.h
 create mode 100644 security/landlock/ruleset.c
 create mode 100644 security/landlock/ruleset.h
 create mode 100644 security/landlock/setup.c
 create mode 100644 security/landlock/setup.h
 create mode 100644 security/landlock/syscall.c
 create mode 100644 tools/testing/selftests/landlock/.gitignore
 create mode 100644 tools/testing/selftests/landlock/Makefile
 create mode 100644 tools/testing/selftests/landlock/config
 create mode 100644 tools/testing/selftests/landlock/test.h
 create mode 100644 tools/testing/selftests/landlock/test_base.c
 create mode 100644 tools/testing/selftests/landlock/test_fs.c
 create mode 100644 tools/testing/selftests/landlock/test_ptrace.c

-- 
2.25.0



* [RFC PATCH v14 01/10] landlock: Add object and rule management
  2020-02-24 16:02 [RFC PATCH v14 00/10] Landlock LSM Mickaël Salaün
@ 2020-02-24 16:02 ` Mickaël Salaün
  2020-02-25 20:49   ` Jann Horn
  2020-02-24 16:02 ` [RFC PATCH v14 02/10] landlock: Add ruleset and domain management Mickaël Salaün
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Mickaël Salaün @ 2020-02-24 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Al Viro, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Greg Kroah-Hartman, James Morris,
	Jann Horn, Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, kernel-hardening, linux-api, linux-arch,
	linux-doc, linux-fsdevel, linux-kselftest, linux-security-module,
	x86

A Landlock object identifies a kernel object (e.g. an inode).
A Landlock rule is a set of access rights allowed on an object.  Rules
are grouped in rulesets that may be tied to a set of processes (i.e.
subjects) to enforce a scoped access-control (i.e. a domain).

Because Landlock's goal is to empower any process (especially
unprivileged ones) to sandbox themselves, we can't rely on a system-wide
object identification such as file extended attributes.  Indeed, we need
innocuous, composable and modular access-controls.

The main challenge with these constraints is to identify kernel objects
only while this identification is useful (i.e. when a security policy
makes use of this object); the identification data should be freed once
no policy is using it.  This ephemeral tagging should not and may not be
written in the filesystem.  We then need to manage the lifetime of a
rule according to the lifetime of its object.  To avoid a global lock,
this implementation makes use of RCU and counters to safely reference
objects.

A following commit uses this generic object management for inodes.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: James Morris <jmorris@namei.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v13:
* New dedicated implementation, removing the need for eBPF.

Previous version:
https://lore.kernel.org/lkml/20190721213116.23476-6-mic@digikod.net/
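
The counter-based referencing described in the commit message relies on
an "increment unless already zero" primitive (refcount_inc_not_zero() in
the patch).  As an illustration only, here is a user-space sketch of
that pattern using C11 atomics.  The sketch_* names are hypothetical,
and the saturation and memory-ordering details of the kernel's
refcount_t are not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's refcount_t. */
struct sketch_ref {
	atomic_uint count;
};

/* Takes a reference only if the counter has not already dropped to zero. */
bool sketch_inc_not_zero(struct sketch_ref *ref)
{
	unsigned int old = atomic_load(&ref->count);

	while (old != 0) {
		/* On failure, @old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ref->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drops a reference; returns true if this was the last one. */
bool sketch_dec_and_test(struct sketch_ref *ref)
{
	unsigned int old = atomic_fetch_sub(&ref->count, 1);

	/* Dropping a reference that was never taken is a caller bug. */
	assert(old != 0);
	return old == 1;
}
```

Once the counter reaches zero, no new reference can be obtained, which
is what lets the last holder release the object without a global lock.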
---
 MAINTAINERS                |  10 ++
 security/Kconfig           |   1 +
 security/Makefile          |   2 +
 security/landlock/Kconfig  |  15 ++
 security/landlock/Makefile |   3 +
 security/landlock/object.c | 339 +++++++++++++++++++++++++++++++++++++
 security/landlock/object.h | 134 +++++++++++++++
 7 files changed, 504 insertions(+)
 create mode 100644 security/landlock/Kconfig
 create mode 100644 security/landlock/Makefile
 create mode 100644 security/landlock/object.c
 create mode 100644 security/landlock/object.h

diff --git a/MAINTAINERS b/MAINTAINERS
index fcd79fc38928..206f85768cd9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9360,6 +9360,16 @@ F:	net/core/skmsg.c
 F:	net/core/sock_map.c
 F:	net/ipv4/tcp_bpf.c
 
+LANDLOCK SECURITY MODULE
+M:	Mickaël Salaün <mic@digikod.net>
+L:	linux-security-module@vger.kernel.org
+W:	https://landlock.io
+T:	git https://github.com/landlock-lsm/linux.git
+S:	Supported
+F:	security/landlock/
+K:	landlock
+K:	LANDLOCK
+
 LANTIQ / INTEL Ethernet drivers
 M:	Hauke Mehrtens <hauke@hauke-m.de>
 L:	netdev@vger.kernel.org
diff --git a/security/Kconfig b/security/Kconfig
index 2a1a2d396228..9d9981394fb0 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -238,6 +238,7 @@ source "security/loadpin/Kconfig"
 source "security/yama/Kconfig"
 source "security/safesetid/Kconfig"
 source "security/lockdown/Kconfig"
+source "security/landlock/Kconfig"
 
 source "security/integrity/Kconfig"
 
diff --git a/security/Makefile b/security/Makefile
index 746438499029..2472ef96d40a 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -12,6 +12,7 @@ subdir-$(CONFIG_SECURITY_YAMA)		+= yama
 subdir-$(CONFIG_SECURITY_LOADPIN)	+= loadpin
 subdir-$(CONFIG_SECURITY_SAFESETID)    += safesetid
 subdir-$(CONFIG_SECURITY_LOCKDOWN_LSM)	+= lockdown
+subdir-$(CONFIG_SECURITY_LANDLOCK)		+= landlock
 
 # always enable default capabilities
 obj-y					+= commoncap.o
@@ -29,6 +30,7 @@ obj-$(CONFIG_SECURITY_YAMA)		+= yama/
 obj-$(CONFIG_SECURITY_LOADPIN)		+= loadpin/
 obj-$(CONFIG_SECURITY_SAFESETID)       += safesetid/
 obj-$(CONFIG_SECURITY_LOCKDOWN_LSM)	+= lockdown/
+obj-$(CONFIG_SECURITY_LANDLOCK)	+= landlock/
 obj-$(CONFIG_CGROUP_DEVICE)		+= device_cgroup.o
 
 # Object integrity file lists
diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
new file mode 100644
index 000000000000..4a321d5b3f67
--- /dev/null
+++ b/security/landlock/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config SECURITY_LANDLOCK
+	bool "Landlock support"
+	depends on SECURITY
+	default n
+	help
+	  This selects Landlock, a safe sandboxing mechanism.  It can restrict
+	  processes on the fly (i.e. enforce an access control policy), which
+	  can complement seccomp-bpf.  The security policy is a set of access
+	  rights tied to an object, which could be a file, a socket or a process.
+
+	  See Documentation/security/landlock/ for further information.
+
+	  If you are unsure how to answer this question, answer N.
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
new file mode 100644
index 000000000000..cb6deefbf4c0
--- /dev/null
+++ b/security/landlock/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
+
+landlock-y := object.o
diff --git a/security/landlock/object.c b/security/landlock/object.c
new file mode 100644
index 000000000000..38fbbb108120
--- /dev/null
+++ b/security/landlock/object.c
@@ -0,0 +1,339 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Object and rule management
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ *
+ * Principles and constraints of the object and rule management:
+ * - Do not leak memory.
+ * - Try as much as possible to free a memory allocation as soon as it is
+ *   unused.
+ * - Do not use global lock.
+ * - Do not charge processes other than the one requesting a Landlock
+ *   operation.
+ */
+
+#include <linux/bug.h>
+#include <linux/compiler.h>
+#include <linux/compiler_types.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/rbtree.h>
+#include <linux/rcupdate.h>
+#include <linux/refcount.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/workqueue.h>
+
+#include "object.h"
+
+struct landlock_object *landlock_create_object(
+		const enum landlock_object_type type, void *underlying_object)
+{
+	struct landlock_object *object;
+
+	if (WARN_ON_ONCE(!underlying_object))
+		return NULL;
+	object = kzalloc(sizeof(*object), GFP_KERNEL);
+	if (!object)
+		return NULL;
+	refcount_set(&object->usage, 1);
+	refcount_set(&object->cleaners, 1);
+	spin_lock_init(&object->lock);
+	INIT_LIST_HEAD(&object->rules);
+	object->type = type;
+	WRITE_ONCE(object->underlying_object, underlying_object);
+	return object;
+}
+
+struct landlock_object *landlock_get_object(struct landlock_object *object)
+	__acquires(object->usage)
+{
+	__acquire(object->usage);
+	/*
+	 * If @object->usage equals 0, then it will be ignored by writers, and
+	 * underlying_object->object may be replaced, but this is not an issue
+	 * for release_object().
+	 */
+	if (object && refcount_inc_not_zero(&object->usage)) {
+		/*
+		 * It should not be possible to get a reference to an object if
+		 * its underlying object is being terminated (e.g. with
+		 * landlock_release_object()), because an object is only
+		 * modifiable through such underlying object.  This is not the
+		 * case with landlock_get_object_cleaner().
+		 */
+		WARN_ON_ONCE(!READ_ONCE(object->underlying_object));
+		return object;
+	}
+	return NULL;
+}
+
+static struct landlock_object *get_object_cleaner(
+		struct landlock_object *object)
+	__acquires(object->cleaners)
+{
+	__acquire(object->cleaners);
+	if (object && refcount_inc_not_zero(&object->cleaners))
+		return object;
+	return NULL;
+}
+
+/*
+ * There are two cases when an object should be freed and the reference to the
+ * underlying object should be put:
+ * - when the last rule tied to this object is removed, which is handled by
+ *   landlock_put_rule() and then release_object();
+ * - when the object is being terminated (e.g. no more reference to an inode),
+ *   which is handled by landlock_put_object().
+ */
+static void put_object_free(struct landlock_object *object)
+	__releases(object->cleaners)
+{
+	__release(object->cleaners);
+	if (!refcount_dec_and_test(&object->cleaners))
+		return;
+	WARN_ON_ONCE(refcount_read(&object->usage));
+	/*
+	 * Ensures a safe use of @object in the RCU block from
+	 * landlock_put_rule().
+	 */
+	kfree_rcu(object, rcu_free);
+}
+
+/*
+ * Destroys a newly created and useless object.
+ */
+void landlock_drop_object(struct landlock_object *object)
+{
+	if (WARN_ON_ONCE(!refcount_dec_and_test(&object->usage)))
+		return;
+	__acquire(object->cleaners);
+	put_object_free(object);
+}
+
+/*
+ * Puts the underlying object (e.g. inode) if it is the first request to
+ * release @object, without calling landlock_put_object().
+ *
+ * Return true if this call effectively marks @object as released, false
+ * otherwise.
+ */
+static bool release_object(struct landlock_object *object)
+	__releases(&object->lock)
+{
+	void *underlying_object;
+
+	lockdep_assert_held(&object->lock);
+
+	underlying_object = xchg(&object->underlying_object, NULL);
+	spin_unlock(&object->lock);
+	might_sleep();
+	if (!underlying_object)
+		return false;
+
+	switch (object->type) {
+	case LANDLOCK_OBJECT_INODE:
+		break;
+	default:
+		WARN_ON_ONCE(1);
+	}
+	return true;
+}
+
+static void put_object_cleaner(struct landlock_object *object)
+	__releases(object->cleaners)
+{
+	/* Let's try an early lockless check. */
+	if (list_empty(&object->rules) &&
+			READ_ONCE(object->underlying_object)) {
+		/*
+		 * Puts @object if there is no rule tied to it and the
+		 * remaining user is the underlying object.  This check is
+		 * atomic because @object->rules and @object->underlying_object
+		 * are protected by @object->lock.
+		 */
+		spin_lock(&object->lock);
+		if (list_empty(&object->rules) &&
+				READ_ONCE(object->underlying_object) &&
+				refcount_dec_if_one(&object->usage)) {
+			/*
+			 * Releases @object, in place of
+			 * landlock_release_object().
+			 *
+			 * @object is already empty, implying that all its
+			 * previous rules are already disabled.
+			 *
+			 * Unbalance the @object->cleaners counter to reflect
+			 * the underlying object release.
+			 */
+			if (!WARN_ON_ONCE(!release_object(object))) {
+				__acquire(object->cleaners);
+				put_object_free(object);
+			}
+		} else {
+			spin_unlock(&object->lock);
+		}
+	}
+	put_object_free(object);
+}
+
+/*
+ * Putting an object is easy when the object is being terminated, but it is
+ * much more tricky when the reason is that there is no more rule tied to this
+ * object.  Indeed, new rules could be added at the same time.
+ */
+void landlock_put_object(struct landlock_object *object)
+	__releases(object->usage)
+{
+	struct landlock_object *object_cleaner;
+
+	__release(object->usage);
+	might_sleep();
+	if (!object)
+		return;
+	/*
+	 * Guards against concurrent termination to be able to terminate
+	 * @object if it is empty and not referenced by another rule-appender
+	 * other than the underlying object.
+	 */
+	object_cleaner = get_object_cleaner(object);
+	if (WARN_ON_ONCE(!object_cleaner)) {
+		__release(object->cleaners);
+		return;
+	}
+	/*
+	 * Decrements @object->usage and if it reaches zero, also decrements
+	 * @object->cleaners.  If both reach zero, then releases and frees
+	 * @object.
+	 */
+	if (refcount_dec_and_test(&object->usage)) {
+		struct landlock_rule *rule_walker, *rule_walker2;
+
+		spin_lock(&object->lock);
+		/*
+		 * Disables all the rules tied to @object when it is forbidden
+		 * to add new rule but still allowed to remove them with
+		 * landlock_put_rule().  This is crucial to be able to safely
+		 * free a rule according to landlock_rule_is_disabled().
+		 */
+		list_for_each_entry_safe(rule_walker, rule_walker2,
+				&object->rules, list)
+			list_del_rcu(&rule_walker->list);
+
+		/*
+		 * Releases @object if it is not already released (e.g. with
+		 * landlock_release_object()).
+		 */
+		release_object(object);
+		/*
+		 * Unbalances the @object->cleaners counter to reflect the
+		 * underlying object release.
+		 */
+		__acquire(object->cleaners);
+		put_object_free(object);
+	}
+	put_object_cleaner(object_cleaner);
+}
+
+void landlock_put_rule(struct landlock_object *object,
+		struct landlock_rule *rule)
+{
+	if (!rule)
+		return;
+	WARN_ON_ONCE(!object);
+	/*
+	 * Guards against a concurrent @object self-destruction with
+	 * landlock_put_object() or put_object_cleaner().
+	 */
+	rcu_read_lock();
+	if (landlock_rule_is_disabled(rule)) {
+		rcu_read_unlock();
+		if (refcount_dec_and_test(&rule->usage))
+			kfree_rcu(rule, rcu_free);
+		return;
+	}
+	if (refcount_dec_and_test(&rule->usage)) {
+		struct landlock_object *safe_object;
+
+		/*
+		 * Now, @rule may still be enabled, or in the process of being
+		 * untied to @object by put_object_cleaner().  However, we know
+		 * that @object will not be freed until rcu_read_unlock() and
+		 * until @object->cleaners reach zero.  Furthermore, we may not
+		 * be the only one willing to free a @rule linked with @object.
+		 * If we succeed to hold @object with get_object_cleaner(), we
+		 * know that until put_object_cleaner(), we can safely use
+		 * @object to remove @rule.
+		 */
+		safe_object = get_object_cleaner(object);
+		rcu_read_unlock();
+		if (!safe_object) {
+			__release(safe_object->cleaners);
+			/*
+			 * We can safely free @rule because it is already
+			 * removed from @object's list.
+			 */
+			WARN_ON_ONCE(!landlock_rule_is_disabled(rule));
+			kfree_rcu(rule, rcu_free);
+		} else {
+			spin_lock(&safe_object->lock);
+			if (!landlock_rule_is_disabled(rule))
+				list_del(&rule->list);
+			spin_unlock(&safe_object->lock);
+			kfree_rcu(rule, rcu_free);
+			put_object_cleaner(safe_object);
+		}
+	} else {
+		rcu_read_unlock();
+	}
+	/*
+	 * put_object_cleaner() might sleep, but it is only reachable if
+	 * !landlock_rule_is_disabled().  Therefore, clean_ref() can not sleep.
+	 */
+	might_sleep();
+}
+
+void landlock_release_object(struct landlock_object __rcu *rcu_object)
+{
+	struct landlock_object *object;
+
+	if (!rcu_object)
+		return;
+	rcu_read_lock();
+	object = get_object_cleaner(rcu_dereference(rcu_object));
+	rcu_read_unlock();
+	if (unlikely(!object)) {
+		__release(object->cleaners);
+		return;
+	}
+	/*
+	 * Makes sure that the underlying object never points to a freed object
+	 * by firstly releasing the object (i.e. NULL the reference to it) to
+	 * be sure no one could get a new reference to it while it is being
+	 * terminated.  Secondly, put the object globally (e.g. for the
+	 * super-block).
+	 *
+	 * This can run concurrently with put_object_cleaner(), which may try
+	 * to release @object as well.
+	 */
+	spin_lock(&object->lock);
+	if (release_object(object)) {
+		/*
+		 * Unbalances the object to reflect the underlying object
+		 * release.
+		 */
+		__acquire(object->usage);
+		landlock_put_object(object);
+	}
+	/*
+	 * If a concurrent thread is adding a new rule, the object will be freed
+	 * at the end of this rule addition, otherwise it will be freed with the
+	 * following put_object_cleaner() or a remaining one.
+	 */
+	put_object_cleaner(object);
+}
diff --git a/security/landlock/object.h b/security/landlock/object.h
new file mode 100644
index 000000000000..15dfc9a75a82
--- /dev/null
+++ b/security/landlock/object.h
@@ -0,0 +1,134 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock LSM - Object and rule management
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#ifndef _SECURITY_LANDLOCK_OBJECT_H
+#define _SECURITY_LANDLOCK_OBJECT_H
+
+#include <linux/compiler_types.h>
+#include <linux/list.h>
+#include <linux/poison.h>
+#include <linux/rcupdate.h>
+#include <linux/refcount.h>
+#include <linux/spinlock.h>
+
+struct landlock_access {
+	/*
+	 * @self: Bitfield of allowed actions on the kernel object.  They are
+	 * relative to the object type (e.g. LANDLOCK_ACTION_FS_READ).
+	 */
+	u32 self;
+	/*
+	 * @beneath: Same as @self, but for the child objects (e.g. a file in a
+	 * directory).
+	 */
+	u32 beneath;
+};
+
+struct landlock_rule {
+	struct landlock_access access;
+	/*
+	 * @list: Linked list with other rules tied to the same object, which
+	 * enable to manage their lifetimes.  This is also used to identify if
+	 * a rule is still valid, thanks to landlock_rule_is_disabled(), which
+	 * is important in the matching process because the original object
+	 * address might have been recycled.
+	 */
+	struct list_head list;
+	union {
+		/*
+		 * @usage: Number of rulesets pointing to this rule.  This
+		 * field is never used by RCU readers.
+		 */
+		refcount_t usage;
+		struct rcu_head rcu_free;
+	};
+};
+
+enum landlock_object_type {
+	LANDLOCK_OBJECT_INODE = 1,
+};
+
+struct landlock_object {
+	/*
+	 * @usage: Main usage counter, used to tie an object to its underlying
+	 * object (i.e. create a lifetime) and potentially add new rules.
+	 */
+	refcount_t usage;
+	/*
+	 * @cleaners: Usage counter used to free a rule from @rules (thanks to
+	 * put_rule()).  Enables to get a reference to this object until it
+	 * is actually freed.  Cf. put_object().
+	 */
+	refcount_t cleaners;
+	union {
+		/*
+		 * The use of this struct is controlled by @usage and
+		 * @cleaners, which makes it safe to union it with @rcu_free.
+		 */
+		struct {
+			/*
+			 * @underlying_object: Used when cleaning up an object
+			 * and to mark an object as tied to its underlying
+			 * kernel structure.  It must then be atomically read
+			 * using READ_ONCE().
+			 *
+			 * The one who clears @underlying_object must:
+			 * 1. clear the object self-reference and
+			 * 2. decrement @usage (and potentially free the
+			 *    object).
+			 *
+			 * Cf. clean_object().
+			 */
+			void *underlying_object;
+			/*
+			 * @type: Only used when cleaning up an object.
+			 */
+			enum landlock_object_type type;
+			spinlock_t lock;
+			/*
+			 * @rules: List of struct landlock_rule linked with
+			 * their "list" field.  This list is only accessed when
+			 * updating the list (to be able to clean up later)
+			 * while holding @lock.
+			 */
+			struct list_head rules;
+		};
+		struct rcu_head rcu_free;
+	};
+};
+
+void landlock_put_rule(struct landlock_object *object,
+		struct landlock_rule *rule);
+
+void landlock_release_object(struct landlock_object __rcu *rcu_object);
+
+struct landlock_object *landlock_create_object(
+		const enum landlock_object_type type, void *underlying_object);
+
+struct landlock_object *landlock_get_object(struct landlock_object *object)
+	__acquires(object->usage);
+
+void landlock_put_object(struct landlock_object *object)
+	__releases(object->usage);
+
+void landlock_drop_object(struct landlock_object *object);
+
+static inline bool landlock_rule_is_disabled(
+		struct landlock_rule *rule)
+{
+	/*
+	 * Disabling (i.e. unlinking) a landlock_rule is a one-way operation.
+	 * It is not possible to re-enable such a rule, then there is no need
+	 * for smp_load_acquire().
+	 *
+	 * LIST_POISON2 is set by list_del() and list_del_rcu().
+	 */
+	return !rule || READ_ONCE(rule->list.prev) == LIST_POISON2;
+}
+
+#endif /* _SECURITY_LANDLOCK_OBJECT_H */
-- 
2.25.0



* [RFC PATCH v14 02/10] landlock: Add ruleset and domain management
  2020-02-24 16:02 [RFC PATCH v14 00/10] Landlock LSM Mickaël Salaün
  2020-02-24 16:02 ` [RFC PATCH v14 01/10] landlock: Add object and rule management Mickaël Salaün
@ 2020-02-24 16:02 ` Mickaël Salaün
  2020-02-24 16:02 ` [RFC PATCH v14 03/10] landlock: Set up the security framework and manage credentials Mickaël Salaün
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Mickaël Salaün @ 2020-02-24 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Al Viro, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Greg Kroah-Hartman, James Morris,
	Jann Horn, Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, kernel-hardening, linux-api, linux-arch,
	linux-doc, linux-fsdevel, linux-kselftest, linux-security-module,
	x86

A Landlock ruleset is mainly a red-black tree with Landlock rules as
nodes.  This enables quick update and lookup to match a requested access
e.g., to a file.  A ruleset is usable through a dedicated file
descriptor (cf. following commit adding the syscall) which enables a
process to build it by adding new rules.

A domain is a ruleset tied to a set of processes.  This group of rules
defines the security policy enforced on these processes and their future
children.  A domain can transition to a new domain which is the merge of
itself with a ruleset provided by the current process.  This merge is
the intersection of all the constraints, which means that a process can
only gain more constraints over time.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: James Morris <jmorris@namei.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v13:
* New implementation, inspired by the previous inode eBPF map, but
  agnostic to the underlying kernel object.

Previous version:
https://lore.kernel.org/lkml/20190721213116.23476-7-mic@digikod.net/
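
To make the "a process can only gain more constraints over time"
property concrete, here is a small sketch.  It assumes that merging two
rules tied to the same object amounts to ANDing their allowed-access
bitmasks; the flag values and the helper name are hypothetical and do
not come from this patch.

```c
#include <stdint.h>

/* Hypothetical access bits, for illustration only. */
#define SKETCH_ACCESS_READ	(1u << 0)
#define SKETCH_ACCESS_WRITE	(1u << 1)

/*
 * Assumed merge semantics: an access stays allowed only if both the
 * current domain and the new ruleset allow it, so the result never
 * contains a bit that @domain_access lacks.
 */
uint32_t sketch_merge_access(uint32_t domain_access, uint32_t ruleset_access)
{
	return domain_access & ruleset_access;
}
```

With these assumptions, merging a read/write domain with a read-only
ruleset yields a read-only domain, and no later merge can bring the
write bit back.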
---
 MAINTAINERS                   |   1 +
 include/uapi/linux/landlock.h | 102 ++++++++
 security/landlock/Makefile    |   2 +-
 security/landlock/ruleset.c   | 460 ++++++++++++++++++++++++++++++++++
 security/landlock/ruleset.h   | 106 ++++++++
 5 files changed, 670 insertions(+), 1 deletion(-)
 create mode 100644 include/uapi/linux/landlock.h
 create mode 100644 security/landlock/ruleset.c
 create mode 100644 security/landlock/ruleset.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 206f85768cd9..937257925e65 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9366,6 +9366,7 @@ L:	linux-security-module@vger.kernel.org
 W:	https://landlock.io
 T:	git https://github.com/landlock-lsm/linux.git
 S:	Supported
+F:	include/uapi/linux/landlock.h
 F:	security/landlock/
 K:	landlock
 K:	LANDLOCK
diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
new file mode 100644
index 000000000000..92760aca3645
--- /dev/null
+++ b/include/uapi/linux/landlock.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Landlock - UAPI headers
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#ifndef _UAPI__LINUX_LANDLOCK_H__
+#define _UAPI__LINUX_LANDLOCK_H__
+
+/**
+ * DOC: fs_access
+ *
+ * A set of actions on kernel objects may be defined by an attribute (e.g.
+ * &struct landlock_attr_path_beneath) and a bitmask of access.
+ *
+ * Filesystem flags
+ * ~~~~~~~~~~~~~~~~
+ *
+ * These flags allow restricting a sandboxed process to a set of actions on
+ * files and directories.  Files or directories opened before the sandboxing
+ * are not subject to these restrictions.
+ *
+ * - %LANDLOCK_ACCESS_FS_READ: Open or map a file with read access.
+ * - %LANDLOCK_ACCESS_FS_READDIR: List the content of a directory.
+ * - %LANDLOCK_ACCESS_FS_GETATTR: Read metadata of a file or a directory.
+ * - %LANDLOCK_ACCESS_FS_WRITE: Write to a file.
+ * - %LANDLOCK_ACCESS_FS_TRUNCATE: Truncate a file.
+ * - %LANDLOCK_ACCESS_FS_LOCK: Lock a file.
+ * - %LANDLOCK_ACCESS_FS_CHMOD: Change DAC permissions on a file or a
+ *   directory.
+ * - %LANDLOCK_ACCESS_FS_CHOWN: Change the owner of a file or a directory.
+ * - %LANDLOCK_ACCESS_FS_CHGRP: Change the group of a file or a directory.
+ * - %LANDLOCK_ACCESS_FS_IOCTL: Send various commands to a special file, cf.
+ *   :manpage:`ioctl(2)`.
+ * - %LANDLOCK_ACCESS_FS_LINK_TO: Link a file into a directory.
+ * - %LANDLOCK_ACCESS_FS_RENAME_FROM: Rename a file or a directory.
+ * - %LANDLOCK_ACCESS_FS_RENAME_TO: Rename a file or a directory.
+ * - %LANDLOCK_ACCESS_FS_RMDIR: Remove an empty directory.
+ * - %LANDLOCK_ACCESS_FS_UNLINK: Remove a file.
+ * - %LANDLOCK_ACCESS_FS_MAKE_CHAR: Create a character device.
+ * - %LANDLOCK_ACCESS_FS_MAKE_DIR: Create a directory.
+ * - %LANDLOCK_ACCESS_FS_MAKE_REG: Create a regular file.
+ * - %LANDLOCK_ACCESS_FS_MAKE_SOCK: Create a UNIX domain socket.
+ * - %LANDLOCK_ACCESS_FS_MAKE_FIFO: Create a named pipe.
+ * - %LANDLOCK_ACCESS_FS_MAKE_BLOCK: Create a block device.
+ * - %LANDLOCK_ACCESS_FS_MAKE_SYM: Create a symbolic link.
+ * - %LANDLOCK_ACCESS_FS_EXECUTE: Execute a file.
+ * - %LANDLOCK_ACCESS_FS_CHROOT: Change the root directory of the current
+ *   process.
+ * - %LANDLOCK_ACCESS_FS_OPEN: Open a file or a directory.  This flag is set
+ *   for any actions (e.g. read, write, execute) requested to open a file or
+ *   directory.
+ * - %LANDLOCK_ACCESS_FS_MAP: Map a file.  This flag is set for any actions
+ *   (e.g. read, write, execute) requested to map a file.
+ *
+ * There is currently no restriction for directory walking e.g.,
+ * :manpage:`chdir(2)`.
+ */
+#define LANDLOCK_ACCESS_FS_READ			(1ULL << 0)
+#define LANDLOCK_ACCESS_FS_READDIR		(1ULL << 1)
+#define LANDLOCK_ACCESS_FS_GETATTR		(1ULL << 2)
+#define LANDLOCK_ACCESS_FS_WRITE		(1ULL << 3)
+#define LANDLOCK_ACCESS_FS_TRUNCATE		(1ULL << 4)
+#define LANDLOCK_ACCESS_FS_LOCK			(1ULL << 5)
+#define LANDLOCK_ACCESS_FS_CHMOD		(1ULL << 6)
+#define LANDLOCK_ACCESS_FS_CHOWN		(1ULL << 7)
+#define LANDLOCK_ACCESS_FS_CHGRP		(1ULL << 8)
+#define LANDLOCK_ACCESS_FS_IOCTL		(1ULL << 9)
+#define LANDLOCK_ACCESS_FS_LINK_TO		(1ULL << 10)
+#define LANDLOCK_ACCESS_FS_RENAME_FROM		(1ULL << 11)
+#define LANDLOCK_ACCESS_FS_RENAME_TO		(1ULL << 12)
+#define LANDLOCK_ACCESS_FS_RMDIR		(1ULL << 13)
+#define LANDLOCK_ACCESS_FS_UNLINK		(1ULL << 14)
+#define LANDLOCK_ACCESS_FS_MAKE_CHAR		(1ULL << 15)
+#define LANDLOCK_ACCESS_FS_MAKE_DIR		(1ULL << 16)
+#define LANDLOCK_ACCESS_FS_MAKE_REG		(1ULL << 17)
+#define LANDLOCK_ACCESS_FS_MAKE_SOCK		(1ULL << 18)
+#define LANDLOCK_ACCESS_FS_MAKE_FIFO		(1ULL << 19)
+#define LANDLOCK_ACCESS_FS_MAKE_BLOCK		(1ULL << 20)
+#define LANDLOCK_ACCESS_FS_MAKE_SYM		(1ULL << 21)
+#define LANDLOCK_ACCESS_FS_EXECUTE		(1ULL << 22)
+#define LANDLOCK_ACCESS_FS_CHROOT		(1ULL << 23)
+#define LANDLOCK_ACCESS_FS_OPEN			(1ULL << 24)
+#define LANDLOCK_ACCESS_FS_MAP			(1ULL << 25)
+
+/*
+ * Potential future access:
+ * - %LANDLOCK_ACCESS_FS_SETATTR
+ * - %LANDLOCK_ACCESS_FS_APPEND
+ * - %LANDLOCK_ACCESS_FS_LINK_FROM
+ * - %LANDLOCK_ACCESS_FS_MOUNT_FROM
+ * - %LANDLOCK_ACCESS_FS_MOUNT_TO
+ * - %LANDLOCK_ACCESS_FS_UNMOUNT
+ * - %LANDLOCK_ACCESS_FS_TRANSFER
+ * - %LANDLOCK_ACCESS_FS_RECEIVE
+ * - %LANDLOCK_ACCESS_FS_CHDIR
+ * - %LANDLOCK_ACCESS_FS_FCNTL
+ */
+
+#endif /* _UAPI__LINUX_LANDLOCK_H__ */
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index cb6deefbf4c0..d846eba445bb 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
-landlock-y := object.o
+landlock-y := object.o ruleset.o
diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
new file mode 100644
index 000000000000..5ec013a4188d
--- /dev/null
+++ b/security/landlock/ruleset.c
@@ -0,0 +1,460 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Ruleset management
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#include <linux/bug.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/rbtree.h>
+#include <linux/rcupdate.h>
+#include <linux/refcount.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/workqueue.h>
+
+#include "object.h"
+#include "ruleset.h"
+
+static struct landlock_ruleset *create_ruleset(void)
+{
+	struct landlock_ruleset *ruleset;
+
+	ruleset = kzalloc(sizeof(*ruleset), GFP_KERNEL);
+	if (!ruleset)
+		return ERR_PTR(-ENOMEM);
+	refcount_set(&ruleset->usage, 1);
+	mutex_init(&ruleset->lock);
+	atomic_set(&ruleset->nb_rules, 0);
+	ruleset->root = RB_ROOT;
+	return ruleset;
+}
+
+struct landlock_ruleset *landlock_create_ruleset(u64 fs_access_mask)
+{
+	struct landlock_ruleset *ruleset;
+
+	/* Safely handles 32-bits conversion. */
+	BUILD_BUG_ON(!__same_type(fs_access_mask, _LANDLOCK_ACCESS_FS_LAST));
+
+	/* Checks content. */
+	if ((fs_access_mask | _LANDLOCK_ACCESS_FS_MASK) !=
+			_LANDLOCK_ACCESS_FS_MASK)
+		return ERR_PTR(-EINVAL);
+	/* Informs about useless ruleset. */
+	if (!fs_access_mask)
+		return ERR_PTR(-ENOMSG);
+	ruleset = create_ruleset();
+	if (!IS_ERR(ruleset))
+		ruleset->fs_access_mask = fs_access_mask;
+	return ruleset;
+}
+
+/*
+ * The underlying kernel object must be held by the caller.
+ */
+static struct landlock_ruleset_elem *create_ruleset_elem(
+		struct landlock_object *object)
+{
+	struct landlock_ruleset_elem *ruleset_elem;
+
+	ruleset_elem = kzalloc(sizeof(*ruleset_elem), GFP_KERNEL);
+	if (!ruleset_elem)
+		return ERR_PTR(-ENOMEM);
+	RB_CLEAR_NODE(&ruleset_elem->node);
+	RCU_INIT_POINTER(ruleset_elem->ref.object, object);
+	return ruleset_elem;
+}
+
+static struct landlock_rule *create_rule(struct landlock_object *object,
+		struct landlock_access *access)
+{
+	struct landlock_rule *new_rule;
+
+	if (WARN_ON_ONCE(!object))
+		return ERR_PTR(-EFAULT);
+	if (WARN_ON_ONCE(!access))
+		return ERR_PTR(-EFAULT);
+	new_rule = kzalloc(sizeof(*new_rule), GFP_KERNEL);
+	if (!new_rule)
+		return ERR_PTR(-ENOMEM);
+	refcount_set(&new_rule->usage, 1);
+	INIT_LIST_HEAD(&new_rule->list);
+	new_rule->access = *access;
+
+	spin_lock(&object->lock);
+	list_add_tail(&new_rule->list, &object->rules);
+	spin_unlock(&object->lock);
+	return new_rule;
+}
+
+/*
+ * An inserted rule can not be removed, only disabled (cf. struct
+ * landlock_ruleset_elem).
+ *
+ * The underlying kernel object must be held by the caller.
+ *
+ * @rule: Allocated struct owned by this function. The caller must hold the
+ * underlying kernel object (e.g., with a FD).
+ */
+int landlock_insert_ruleset_rule(struct landlock_ruleset *ruleset,
+		struct landlock_object *object, struct landlock_access *access,
+		struct landlock_rule *rule)
+{
+	struct rb_node **new;
+	struct rb_node *parent = NULL;
+	struct landlock_ruleset_elem *ruleset_elem;
+	struct landlock_rule *new_rule;
+
+	might_sleep();
+	/* Accesses may be set when creating a new rule. */
+	if (rule) {
+		if (WARN_ON_ONCE(access))
+			return -EINVAL;
+	} else {
+		if (WARN_ON_ONCE(!access))
+			return -EFAULT;
+	}
+
+	lockdep_assert_held(&ruleset->lock);
+	new = &(ruleset->root.rb_node);
+	while (*new) {
+		struct landlock_ruleset_elem *this = rb_entry(*new,
+				struct landlock_ruleset_elem, node);
+		uintptr_t this_object;
+		struct landlock_rule *this_rule;
+		struct landlock_access new_access;
+
+		this_object = (uintptr_t)rcu_access_pointer(this->ref.object);
+		if (this_object != (uintptr_t)object) {
+			parent = *new;
+			if (this_object < (uintptr_t)object)
+				new = &((*new)->rb_right);
+			else
+				new = &((*new)->rb_left);
+			continue;
+		}
+
+		/* Do not increment ruleset->nb_rules. */
+		this_rule = rcu_dereference_protected(this->ref.rule,
+				lockdep_is_held(&ruleset->lock));
+		/*
+		 * Checks if it is a new object with the same address as a
+		 * previously disabled one.  There is no possible race
+		 * condition because an object can not be disabled/deleted
+		 * while being inserted in this tree.
+		 */
+		if (landlock_rule_is_disabled(this_rule)) {
+			if (rule) {
+				refcount_inc(&rule->usage);
+				new_rule = rule;
+			} else {
+				/* Replace the previous rule with a new one. */
+				new_rule = create_rule(object, access);
+				if (IS_ERR(new_rule))
+					return PTR_ERR(new_rule);
+			}
+			rcu_assign_pointer(this->ref.rule, new_rule);
+			landlock_put_rule(object, this_rule);
+			return 0;
+		}
+
+		/* this_rule is potentially enabled. */
+		if (refcount_read(&this_rule->usage) == 1) {
+			if (rule) {
+				/* merge rule: intersection of access rights */
+				this_rule->access.self &= rule->access.self;
+				this_rule->access.beneath &=
+					rule->access.beneath;
+			} else {
+				/* extend rule: union of access rights */
+				this_rule->access.self |= access->self;
+				this_rule->access.beneath |= access->beneath;
+			}
+			return 0;
+		}
+
+		/*
+		 * If this_rule is shared with another ruleset, then create a
+		 * new object rule.
+		 */
+		if (rule) {
+			/* Merging a rule means an intersection of access. */
+			new_access.self = this_rule->access.self &
+				rule->access.self;
+			new_access.beneath = this_rule->access.beneath &
+				rule->access.beneath;
+		} else {
+			/* Extending a rule means a union of access. */
+			new_access.self = this_rule->access.self |
+				access->self;
+			new_access.beneath = this_rule->access.beneath |
+				access->beneath;
+		}
+		new_rule = create_rule(object, &new_access);
+		if (IS_ERR(new_rule))
+			return PTR_ERR(new_rule);
+		rcu_assign_pointer(this->ref.rule, new_rule);
+		landlock_put_rule(object, this_rule);
+		return 0;
+	}
+
+	/* There is no match for @object. */
+	ruleset_elem = create_ruleset_elem(object);
+	if (IS_ERR(ruleset_elem))
+		return PTR_ERR(ruleset_elem);
+	if (rule) {
+		refcount_inc(&rule->usage);
+		new_rule = rule;
+	} else {
+		new_rule = create_rule(object, access);
+		if (IS_ERR(new_rule)) {
+			kfree(ruleset_elem);
+			return PTR_ERR(new_rule);
+		}
+	}
+	RCU_INIT_POINTER(ruleset_elem->ref.rule, new_rule);
+	/*
+	 * Because of the missing RCU context annotation in struct rb_node,
+	 * Sparse emits a warning when encountering rb_link_node_rcu(), but
+	 * this function call is still safe.
+	 */
+	rb_link_node_rcu(&ruleset_elem->node, parent, new);
+	rb_insert_color(&ruleset_elem->node, &ruleset->root);
+	atomic_inc(&ruleset->nb_rules);
+	return 0;
+}
+
+static int merge_ruleset(struct landlock_ruleset *dst,
+		struct landlock_ruleset *src)
+{
+	struct rb_node *node;
+	int err = 0;
+
+	might_sleep();
+	if (!src)
+		return 0;
+	if (WARN_ON_ONCE(!dst))
+		return -EFAULT;
+	if (WARN_ON_ONCE(!dst->hierarchy))
+		return -EINVAL;
+
+	mutex_lock(&dst->lock);
+	mutex_lock_nested(&src->lock, 1);
+	dst->fs_access_mask |= src->fs_access_mask;
+	for (node = rb_first(&src->root); node; node = rb_next(node)) {
+		struct landlock_ruleset_elem *elem = rb_entry(node,
+				struct landlock_ruleset_elem, node);
+		struct landlock_object *object =
+			rcu_dereference_protected(elem->ref.object,
+					lockdep_is_held(&src->lock));
+		struct landlock_rule *rule =
+			rcu_dereference_protected(elem->ref.rule,
+					lockdep_is_held(&src->lock));
+
+		err = landlock_insert_ruleset_rule(dst, object, NULL, rule);
+		if (err)
+			goto out_unlock;
+	}
+
+out_unlock:
+	mutex_unlock(&src->lock);
+	mutex_unlock(&dst->lock);
+	return err;
+}
+
+void landlock_get_ruleset(struct landlock_ruleset *ruleset)
+{
+	if (!ruleset)
+		return;
+	refcount_inc(&ruleset->usage);
+}
+
+static void put_hierarchy(struct landlock_hierarchy *hierarchy)
+{
+	if (hierarchy && refcount_dec_and_test(&hierarchy->usage))
+		kfree(hierarchy);
+}
+
+static void put_ruleset(struct landlock_ruleset *ruleset)
+{
+	struct rb_node *orig;
+
+	might_sleep();
+	for (orig = rb_first(&ruleset->root); orig; orig = rb_next(orig)) {
+		struct landlock_ruleset_elem *freeme;
+		struct landlock_object *object;
+		struct landlock_rule *rule;
+
+		freeme = rb_entry(orig, struct landlock_ruleset_elem, node);
+		object = rcu_dereference_protected(freeme->ref.object,
+				refcount_read(&ruleset->usage) == 0);
+		rule = rcu_dereference_protected(freeme->ref.rule,
+				refcount_read(&ruleset->usage) == 0);
+		landlock_put_rule(object, rule);
+		kfree_rcu(freeme, rcu_free);
+	}
+	put_hierarchy(ruleset->hierarchy);
+	kfree_rcu(ruleset, rcu_free);
+}
+
+void landlock_put_ruleset(struct landlock_ruleset *ruleset)
+{
+	might_sleep();
+	if (ruleset && refcount_dec_and_test(&ruleset->usage))
+		put_ruleset(ruleset);
+}
+
+static void put_ruleset_work(struct work_struct *work)
+{
+	struct landlock_ruleset *ruleset;
+
+	ruleset = container_of(work, struct landlock_ruleset, work_put);
+	/*
+	 * Clean up rcu_free because of previous use through union work_put.
+	 * ruleset->rcu_free.func is already NULLed by __rcu_reclaim().
+	 */
+	ruleset->rcu_free.next = NULL;
+	put_ruleset(ruleset);
+}
+
+void landlock_put_ruleset_enqueue(struct landlock_ruleset *ruleset)
+{
+	if (ruleset && refcount_dec_and_test(&ruleset->usage)) {
+		INIT_WORK(&ruleset->work_put, put_ruleset_work);
+		schedule_work(&ruleset->work_put);
+	}
+}
+
+static bool clean_ref(struct landlock_ref *ref)
+{
+	struct landlock_rule *rule;
+
+	rule = rcu_dereference(ref->rule);
+	if (!rule)
+		return false;
+	if (!landlock_rule_is_disabled(rule))
+		return false;
+	rcu_assign_pointer(ref->rule, NULL);
+	/*
+	 * landlock_put_rule() will not sleep because we already checked
+	 * !landlock_rule_is_disabled(rule).
+	 */
+	landlock_put_rule(rcu_dereference(ref->object), rule);
+	return true;
+}
+
+static void clean_ruleset(struct landlock_ruleset *ruleset)
+{
+	struct rb_node *node;
+
+	if (!ruleset)
+		return;
+	/* We must lock the ruleset to not have a wrong nb_rules counter. */
+	mutex_lock(&ruleset->lock);
+	rcu_read_lock();
+	for (node = rb_first(&ruleset->root); node; node = rb_next(node)) {
+		struct landlock_ruleset_elem *elem = rb_entry(node,
+				struct landlock_ruleset_elem, node);
+
+		if (clean_ref(&elem->ref)) {
+			rb_erase(&elem->node, &ruleset->root);
+			kfree_rcu(elem, rcu_free);
+			atomic_dec(&ruleset->nb_rules);
+		}
+	}
+	rcu_read_unlock();
+	mutex_unlock(&ruleset->lock);
+}
+
+/*
+ * Creates a new ruleset merging @parent and @ruleset, or returns @parent if
+ * @ruleset is empty.  If @parent is empty, returns a duplicate of @ruleset.
+ *
+ * @parent: Must not be modified (i.e. locked or read-only).
+ */
+struct landlock_ruleset *landlock_merge_ruleset(
+		struct landlock_ruleset *parent,
+		struct landlock_ruleset *ruleset)
+{
+	struct landlock_ruleset *new_dom;
+	int err;
+
+	might_sleep();
+	/* Opportunistically put disabled rules. */
+	clean_ruleset(ruleset);
+
+	if (parent && WARN_ON_ONCE(!parent->hierarchy))
+		return ERR_PTR(-EINVAL);
+	if (!ruleset || atomic_read(&ruleset->nb_rules) == 0 ||
+			parent == ruleset) {
+		landlock_get_ruleset(parent);
+		return parent;
+	}
+
+	new_dom = create_ruleset();
+	if (IS_ERR(new_dom))
+		return new_dom;
+	new_dom->hierarchy = kzalloc(sizeof(*new_dom->hierarchy), GFP_KERNEL);
+	if (!new_dom->hierarchy) {
+		landlock_put_ruleset(new_dom);
+		return ERR_PTR(-ENOMEM);
+	}
+	refcount_set(&new_dom->hierarchy->usage, 1);
+
+	if (parent) {
+		new_dom->hierarchy->parent = parent->hierarchy;
+		refcount_inc(&parent->hierarchy->usage);
+		err = merge_ruleset(new_dom, parent);
+		if (err) {
+			landlock_put_ruleset(new_dom);
+			return ERR_PTR(err);
+		}
+	}
+	err = merge_ruleset(new_dom, ruleset);
+	if (err) {
+		landlock_put_ruleset(new_dom);
+		return ERR_PTR(err);
+	}
+	return new_dom;
+}
+
+/*
+ * The return pointer must only be used in a RCU-read block.
+ */
+const struct landlock_access *landlock_find_access(
+		const struct landlock_ruleset *ruleset,
+		const struct landlock_object *object)
+{
+	struct rb_node *node;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	if (!object)
+		return NULL;
+	node = ruleset->root.rb_node;
+	while (node) {
+		struct landlock_ruleset_elem *this = rb_entry(node,
+				struct landlock_ruleset_elem, node);
+		uintptr_t this_object =
+			(uintptr_t)rcu_access_pointer(this->ref.object);
+
+		if (this_object == (uintptr_t)object) {
+			struct landlock_rule *rule;
+
+			rule = rcu_dereference(this->ref.rule);
+			if (!landlock_rule_is_disabled(rule))
+				return &rule->access;
+			return NULL;
+		}
+		if (this_object < (uintptr_t)object)
+			node = node->rb_right;
+		else
+			node = node->rb_left;
+	}
+	return NULL;
+}
diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
new file mode 100644
index 000000000000..afc88dbb8b4b
--- /dev/null
+++ b/security/landlock/ruleset.h
@@ -0,0 +1,106 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock LSM - Ruleset management
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#ifndef _SECURITY_LANDLOCK_RULESET_H
+#define _SECURITY_LANDLOCK_RULESET_H
+
+#include <linux/compiler.h>
+#include <linux/mutex.h>
+#include <linux/poison.h>
+#include <linux/rbtree.h>
+#include <linux/rcupdate.h>
+#include <linux/refcount.h>
+#include <linux/types.h>
+#include <linux/workqueue.h>
+#include <uapi/linux/landlock.h>
+
+#include "object.h"
+
+#define _LANDLOCK_ACCESS_FS_LAST	LANDLOCK_ACCESS_FS_MAP
+#define _LANDLOCK_ACCESS_FS_MASK	((_LANDLOCK_ACCESS_FS_LAST << 1) - 1)
+
+struct landlock_ref {
+	/*
+	 * @object: Identify a kernel object (e.g. an inode).  This is used as
+	 * a key for a ruleset tree (cf. struct landlock_ruleset_elem).  This
+	 * pointer is set once and never modified.  It may point to a deleted
+	 * object and should then be dereferenced with great care, thanks to a
+	 * call to landlock_rule_is_disabled(@rule) from inside an RCU-read
+	 * block, cf. landlock_put_rule().
+	 */
+	struct landlock_object __rcu *object;
+	/*
+	 * @rule: Ties a rule to an object. Set once with an allocated rule,
+	 * but can be NULLed if the rule is disabled.
+	 */
+	struct landlock_rule __rcu *rule;
+};
+
+/*
+ * Red-black tree element used in a landlock_ruleset.
+ */
+struct landlock_ruleset_elem {
+	struct landlock_ref ref;
+	struct rb_node node;
+	struct rcu_head rcu_free;
+};
+
+/*
+ * Enable hierarchy identification even when a parent domain vanishes.  This is
+ * needed for the ptrace protection.
+ */
+struct landlock_hierarchy {
+	struct landlock_hierarchy *parent;
+	refcount_t usage;
+};
+
+/*
+ * Kernel representation of a ruleset.  This data structure must contain
+ * unique entries, be updatable, and quick to match an object.
+ */
+struct landlock_ruleset {
+	/*
+	 * @fs_access_mask: Contains the subset of filesystem actions which are
+	 * restricted by a ruleset.  This is used when merging rulesets and for
+	 * userspace backward compatibility (i.e. future-proof).  Set once and
+	 * never changed for the lifetime of the ruleset.
+	 */
+	u32 fs_access_mask;
+	struct landlock_hierarchy *hierarchy;
+	refcount_t usage;
+	union {
+		struct rcu_head	rcu_free;
+		struct work_struct work_put;
+	};
+	struct mutex lock;
+	atomic_t nb_rules;
+	/*
+	 * @root: Red-black tree containing landlock_ruleset_elem nodes.
+	 */
+	struct rb_root root;
+};
+
+struct landlock_ruleset *landlock_create_ruleset(u64 fs_access_mask);
+
+void landlock_get_ruleset(struct landlock_ruleset *ruleset);
+void landlock_put_ruleset(struct landlock_ruleset *ruleset);
+void landlock_put_ruleset_enqueue(struct landlock_ruleset *ruleset);
+
+int landlock_insert_ruleset_rule(struct landlock_ruleset *ruleset,
+		struct landlock_object *object, struct landlock_access *access,
+		struct landlock_rule *rule);
+
+struct landlock_ruleset *landlock_merge_ruleset(
+		struct landlock_ruleset *domain,
+		struct landlock_ruleset *ruleset);
+
+const struct landlock_access *landlock_find_access(
+		const struct landlock_ruleset *ruleset,
+		const struct landlock_object *object);
+
+#endif /* _SECURITY_LANDLOCK_RULESET_H */
-- 
2.25.0



* [RFC PATCH v14 03/10] landlock: Set up the security framework and manage credentials
  2020-02-24 16:02 [RFC PATCH v14 00/10] Landlock LSM Mickaël Salaün
  2020-02-24 16:02 ` [RFC PATCH v14 01/10] landlock: Add object and rule management Mickaël Salaün
  2020-02-24 16:02 ` [RFC PATCH v14 02/10] landlock: Add ruleset and domain management Mickaël Salaün
@ 2020-02-24 16:02 ` Mickaël Salaün
  2020-02-24 16:02 ` [RFC PATCH v14 04/10] landlock: Add ptrace restrictions Mickaël Salaün
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Mickaël Salaün @ 2020-02-24 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Al Viro, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Greg Kroah-Hartman, James Morris,
	Jann Horn, Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, kernel-hardening, linux-api, linux-arch,
	linux-doc, linux-fsdevel, linux-kselftest, linux-security-module,
	x86

A process's credentials point to a Landlock domain, which is implemented
underneath as a ruleset.  In the following commits, this domain is
used to check and enforce the ptrace and filesystem security policies.
A domain is inherited from a parent to its child the same way a thread
inherits a seccomp policy.
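
The inheritance described above boils down to sharing one
reference-counted domain between credentials.  A minimal userspace
sketch, assuming a plain counter instead of the kernel refcount_t and
using hypothetical sketch_* names throughout:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a domain and for the credential blob. */
struct sketch_domain {
	unsigned int usage;	/* number of credentials sharing the domain */
};

struct sketch_cred {
	struct sketch_domain *domain;	/* NULL when not sandboxed */
};

/* New credentials share the domain of the old ones and take a reference. */
static void sketch_cred_prepare(struct sketch_cred *new,
		const struct sketch_cred *old)
{
	new->domain = old->domain;
	if (new->domain)
		new->domain->usage++;
}

/*
 * Drops the reference held by @cred.  Returns 1 if this was the last
 * reference (the domain could then be released), 0 otherwise.
 */
static int sketch_cred_free(struct sketch_cred *cred)
{
	struct sketch_domain *dom = cred->domain;

	cred->domain = NULL;
	if (!dom)
		return 0;
	assert(dom->usage > 0);
	return --dom->usage == 0;
}
```

The counter here is not atomic, so the sketch is only meaningful for a
single thread; it only shows who holds a reference and when the last one
goes away.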

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: James Morris <jmorris@namei.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v13:
* Totally get rid of the seccomp dependency.
* Only keep credential management and LSM setup.

Previous version:
https://lore.kernel.org/lkml/20191104172146.30797-4-mic@digikod.net/
---
 security/Kconfig           | 10 +++----
 security/landlock/Makefile |  3 ++-
 security/landlock/cred.c   | 47 ++++++++++++++++++++++++++++++++
 security/landlock/cred.h   | 55 ++++++++++++++++++++++++++++++++++++++
 security/landlock/setup.c  | 30 +++++++++++++++++++++
 security/landlock/setup.h  | 18 +++++++++++++
 6 files changed, 157 insertions(+), 6 deletions(-)
 create mode 100644 security/landlock/cred.c
 create mode 100644 security/landlock/cred.h
 create mode 100644 security/landlock/setup.c
 create mode 100644 security/landlock/setup.h

diff --git a/security/Kconfig b/security/Kconfig
index 9d9981394fb0..76547b5c694d 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -278,11 +278,11 @@ endchoice
 
 config LSM
 	string "Ordered list of enabled LSMs"
-	default "lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor" if DEFAULT_SECURITY_SMACK
-	default "lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo" if DEFAULT_SECURITY_APPARMOR
-	default "lockdown,yama,loadpin,safesetid,integrity,tomoyo" if DEFAULT_SECURITY_TOMOYO
-	default "lockdown,yama,loadpin,safesetid,integrity" if DEFAULT_SECURITY_DAC
-	default "lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
+	default "landlock,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor" if DEFAULT_SECURITY_SMACK
+	default "landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo" if DEFAULT_SECURITY_APPARMOR
+	default "landlock,lockdown,yama,loadpin,safesetid,integrity,tomoyo" if DEFAULT_SECURITY_TOMOYO
+	default "landlock,lockdown,yama,loadpin,safesetid,integrity" if DEFAULT_SECURITY_DAC
+	default "landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
 	help
 	  A comma-separated list of LSMs, in initialization order.
 	  Any LSMs left off this list will be ignored. This can be
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index d846eba445bb..041ea242e627 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
-landlock-y := object.o ruleset.o
+landlock-y := setup.o object.o ruleset.o \
+	cred.o
diff --git a/security/landlock/cred.c b/security/landlock/cred.c
new file mode 100644
index 000000000000..69ef93e29a53
--- /dev/null
+++ b/security/landlock/cred.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Credential hooks
+ *
+ * Copyright © 2017-2019 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2019 ANSSI
+ */
+
+#include <linux/cred.h>
+#include <linux/lsm_hooks.h>
+
+#include "cred.h"
+#include "ruleset.h"
+#include "setup.h"
+
+static int hook_cred_prepare(struct cred *new, const struct cred *old,
+		gfp_t gfp)
+{
+	const struct landlock_cred_security *cred_old = landlock_cred(old);
+	struct landlock_cred_security *cred_new = landlock_cred(new);
+	struct landlock_ruleset *dom_old;
+
+	dom_old = cred_old->domain;
+	if (dom_old) {
+		landlock_get_ruleset(dom_old);
+		cred_new->domain = dom_old;
+	} else {
+		cred_new->domain = NULL;
+	}
+	return 0;
+}
+
+static void hook_cred_free(struct cred *cred)
+{
+	landlock_put_ruleset_enqueue(landlock_cred(cred)->domain);
+}
+
+static struct security_hook_list landlock_hooks[] __lsm_ro_after_init = {
+	LSM_HOOK_INIT(cred_prepare, hook_cred_prepare),
+	LSM_HOOK_INIT(cred_free, hook_cred_free),
+};
+
+__init void landlock_add_hooks_cred(void)
+{
+	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
+			LANDLOCK_NAME);
+}
diff --git a/security/landlock/cred.h b/security/landlock/cred.h
new file mode 100644
index 000000000000..1e24682ee27e
--- /dev/null
+++ b/security/landlock/cred.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock LSM - Credential hooks
+ *
+ * Copyright © 2019 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2019 ANSSI
+ */
+
+#ifndef _SECURITY_LANDLOCK_CRED_H
+#define _SECURITY_LANDLOCK_CRED_H
+
+#include <linux/cred.h>
+#include <linux/init.h>
+#include <linux/rcupdate.h>
+
+#include "ruleset.h"
+#include "setup.h"
+
+struct landlock_cred_security {
+	struct landlock_ruleset *domain;
+};
+
+static inline struct landlock_cred_security *landlock_cred(
+		const struct cred *cred)
+{
+	return cred->security + landlock_blob_sizes.lbs_cred;
+}
+
+static inline struct landlock_ruleset *landlock_get_current_domain(void)
+{
+	return landlock_cred(current_cred())->domain;
+}
+
+/*
+ * The caller needs an RCU lock.
+ */
+static inline struct landlock_ruleset *landlock_get_task_domain(
+		struct task_struct *task)
+{
+	return landlock_cred(__task_cred(task))->domain;
+}
+
+static inline bool landlocked(struct task_struct *task)
+{
+	bool has_dom;
+
+	rcu_read_lock();
+	has_dom = !!landlock_get_task_domain(task);
+	rcu_read_unlock();
+	return has_dom;
+}
+
+__init void landlock_add_hooks_cred(void);
+
+#endif /* _SECURITY_LANDLOCK_CRED_H */
diff --git a/security/landlock/setup.c b/security/landlock/setup.c
new file mode 100644
index 000000000000..fca5fa185465
--- /dev/null
+++ b/security/landlock/setup.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Security framework setup
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#include <linux/init.h>
+#include <linux/lsm_hooks.h>
+
+#include "cred.h"
+#include "setup.h"
+
+struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = {
+	.lbs_cred = sizeof(struct landlock_cred_security),
+};
+
+static int __init landlock_init(void)
+{
+	pr_info(LANDLOCK_NAME ": Registering hooks\n");
+	landlock_add_hooks_cred();
+	return 0;
+}
+
+DEFINE_LSM(LANDLOCK_NAME) = {
+	.name = LANDLOCK_NAME,
+	.init = landlock_init,
+	.blobs = &landlock_blob_sizes,
+};
diff --git a/security/landlock/setup.h b/security/landlock/setup.h
new file mode 100644
index 000000000000..52eb8d806376
--- /dev/null
+++ b/security/landlock/setup.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock LSM - Security framework setup
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#ifndef _SECURITY_LANDLOCK_SETUP_H
+#define _SECURITY_LANDLOCK_SETUP_H
+
+#include <linux/lsm_hooks.h>
+
+#define LANDLOCK_NAME "landlock"
+
+extern struct lsm_blob_sizes landlock_blob_sizes;
+
+#endif /* _SECURITY_LANDLOCK_SETUP_H */
-- 
2.25.0



* [RFC PATCH v14 04/10] landlock: Add ptrace restrictions
  2020-02-24 16:02 [RFC PATCH v14 00/10] Landlock LSM Mickaël Salaün
                   ` (2 preceding siblings ...)
  2020-02-24 16:02 ` [RFC PATCH v14 03/10] landlock: Set up the security framework and manage credentials Mickaël Salaün
@ 2020-02-24 16:02 ` Mickaël Salaün
  2020-02-24 16:02 ` [RFC PATCH v14 05/10] fs,landlock: Support filesystem access-control Mickaël Salaün
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Mickaël Salaün @ 2020-02-24 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Al Viro, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Greg Kroah-Hartman, James Morris,
	Jann Horn, Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, kernel-hardening, linux-api, linux-arch,
	linux-doc, linux-fsdevel, linux-kselftest, linux-security-module,
	x86

Using ptrace(2) and related debug features on a target process can lead
to a privilege escalation.  Indeed, ptrace(2) can be used by an attacker
to impersonate another task and to remain undetected while performing
malicious activities.  Thanks to ptrace_may_access(), various parts of
the kernel can check if a tracer is more privileged than a tracee.

A landlocked process has fewer privileges than a non-landlocked process
and must then be subject to additional restrictions when manipulating
processes. To be allowed to use ptrace(2) and related syscalls on a
target process, a landlocked process must have a subset of the target
process' rules (i.e. the tracee must be in a sub-domain of the tracer).
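
The sub-domain check can be sketched in userspace as a walk up the
tracee's chain of parent domains.  The sketch_* names below are
hypothetical and the structure is reduced to a parent pointer; it is an
illustration of the rule stated above, not the hook itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a domain: only the link to its parent. */
struct sketch_hierarchy {
	const struct sketch_hierarchy *parent;
};

/*
 * Returns true if @tracee is in the same domain as @tracer or in one of
 * its sub-domains.  A NULL pointer means "not sandboxed".
 */
static bool sketch_scope_le(const struct sketch_hierarchy *tracer,
		const struct sketch_hierarchy *tracee)
{
	const struct sketch_hierarchy *walker;

	/* A tracer without a domain gets no extra restriction here. */
	if (!tracer)
		return true;
	for (walker = tracee; walker; walker = walker->parent) {
		if (walker == tracer)
			return true;
	}
	/* Unrelated domains, or a sandboxed tracer and a free tracee. */
	return false;
}
```

With a domain D and two children D1 and D2 of D, a tracer in D may trace
tasks in D, D1 or D2, whereas a tracer in D1 may trace neither a task in
D nor one in D2.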

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: James Morris <jmorris@namei.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v13:
* Make the ptrace restriction mandatory, like in the v10.
* Remove the eBPF dependency.

Previous version:
https://lore.kernel.org/lkml/20191104172146.30797-5-mic@digikod.net/
---
 security/landlock/Makefile |   2 +-
 security/landlock/ptrace.c | 118 +++++++++++++++++++++++++++++++++++++
 security/landlock/ptrace.h |  14 +++++
 security/landlock/setup.c  |   2 +
 4 files changed, 135 insertions(+), 1 deletion(-)
 create mode 100644 security/landlock/ptrace.c
 create mode 100644 security/landlock/ptrace.h

diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index 041ea242e627..f1d1eb72fa76 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
 landlock-y := setup.o object.o ruleset.o \
-	cred.o
+	cred.o ptrace.o
diff --git a/security/landlock/ptrace.c b/security/landlock/ptrace.c
new file mode 100644
index 000000000000..6c7326788c46
--- /dev/null
+++ b/security/landlock/ptrace.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Ptrace hooks
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2020 ANSSI
+ */
+
+#include <asm/current.h>
+#include <linux/cred.h>
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/lsm_hooks.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+
+#include "cred.h"
+#include "ptrace.h"
+#include "ruleset.h"
+#include "setup.h"
+
+/**
+ * domain_scope_le - Checks domain ordering for scoped ptrace
+ *
+ * @parent: Parent domain.
+ * @child: Potential child of @parent.
+ *
+ * Checks if the @parent domain is less or equal to (i.e. an ancestor, which
+ * means a subset of) the @child domain.
+ */
+static bool domain_scope_le(const struct landlock_ruleset *parent,
+		const struct landlock_ruleset *child)
+{
+	const struct landlock_hierarchy *walker;
+
+	if (!parent)
+		return true;
+	if (!child)
+		return false;
+	for (walker = child->hierarchy; walker; walker = walker->parent) {
+		if (walker == parent->hierarchy)
+			/* @parent is in the scoped hierarchy of @child. */
+			return true;
+	}
+	/* There is no relationship between @parent and @child. */
+	return false;
+}
+
+static bool task_is_scoped(struct task_struct *parent,
+		struct task_struct *child)
+{
+	bool is_scoped;
+	const struct landlock_ruleset *dom_parent, *dom_child;
+
+	rcu_read_lock();
+	dom_parent = landlock_get_task_domain(parent);
+	dom_child = landlock_get_task_domain(child);
+	is_scoped = domain_scope_le(dom_parent, dom_child);
+	rcu_read_unlock();
+	return is_scoped;
+}
+
+static int task_ptrace(struct task_struct *parent, struct task_struct *child)
+{
+	/* Quick return for non-landlocked tasks. */
+	if (!landlocked(parent))
+		return 0;
+	if (task_is_scoped(parent, child))
+		return 0;
+	return -EPERM;
+}
+
+/**
+ * hook_ptrace_access_check - Determines whether the current process may access
+ *			      another
+ *
+ * @child: Process to be accessed.
+ * @mode: Mode of attachment.
+ *
+ * If the current task has Landlock rules, then the child must have at least
+ * the same rules.  Else denied.
+ *
+ * Determines whether a process may access another, returning 0 if permission
+ * granted, -errno if denied.
+ */
+static int hook_ptrace_access_check(struct task_struct *child,
+		unsigned int mode)
+{
+	return task_ptrace(current, child);
+}
+
+/**
+ * hook_ptrace_traceme - Determines whether another process may trace the
+ *			 current one
+ *
+ * @parent: Task proposed to be the tracer.
+ *
+ * If the parent has Landlock rules, then the current task must have the same
+ * or more rules.  Else denied.
+ *
+ * Determines whether the nominated task is permitted to trace the current
+ * process, returning 0 if permission is granted, -errno if denied.
+ */
+static int hook_ptrace_traceme(struct task_struct *parent)
+{
+	return task_ptrace(parent, current);
+}
+
+static struct security_hook_list landlock_hooks[] __lsm_ro_after_init = {
+	LSM_HOOK_INIT(ptrace_access_check, hook_ptrace_access_check),
+	LSM_HOOK_INIT(ptrace_traceme, hook_ptrace_traceme),
+};
+
+__init void landlock_add_hooks_ptrace(void)
+{
+	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
+			LANDLOCK_NAME);
+}
diff --git a/security/landlock/ptrace.h b/security/landlock/ptrace.h
new file mode 100644
index 000000000000..6740c6a723de
--- /dev/null
+++ b/security/landlock/ptrace.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock LSM - Ptrace hooks
+ *
+ * Copyright © 2017-2019 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2019 ANSSI
+ */
+
+#ifndef _SECURITY_LANDLOCK_PTRACE_H
+#define _SECURITY_LANDLOCK_PTRACE_H
+
+__init void landlock_add_hooks_ptrace(void);
+
+#endif /* _SECURITY_LANDLOCK_PTRACE_H */
diff --git a/security/landlock/setup.c b/security/landlock/setup.c
index fca5fa185465..117afb344da6 100644
--- a/security/landlock/setup.c
+++ b/security/landlock/setup.c
@@ -10,6 +10,7 @@
 #include <linux/lsm_hooks.h>
 
 #include "cred.h"
+#include "ptrace.h"
 #include "setup.h"
 
 struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = {
@@ -20,6 +21,7 @@ static int __init landlock_init(void)
 {
 	pr_info(LANDLOCK_NAME ": Registering hooks\n");
 	landlock_add_hooks_cred();
+	landlock_add_hooks_ptrace();
 	return 0;
 }
 
-- 
2.25.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [RFC PATCH v14 05/10] fs,landlock: Support filesystem access-control
  2020-02-24 16:02 [RFC PATCH v14 00/10] Landlock LSM Mickaël Salaün
                   ` (3 preceding siblings ...)
  2020-02-24 16:02 ` [RFC PATCH v14 04/10] landlock: Add ptrace restrictions Mickaël Salaün
@ 2020-02-24 16:02 ` Mickaël Salaün
  2020-02-26 20:29   ` Jann Horn
  2020-02-24 16:02 ` [RFC PATCH v14 06/10] landlock: Add syscall implementation Mickaël Salaün
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Mickaël Salaün @ 2020-02-24 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Al Viro, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Greg Kroah-Hartman, James Morris,
	Jann Horn, Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, kernel-hardening, linux-api, linux-arch,
	linux-doc, linux-fsdevel, linux-kselftest, linux-security-module,
	x86

Thanks to the Landlock objects and ruleset, it is possible to identify
inodes according to a process' domain.  To enable an unprivileged
process to express a file hierarchy, it first needs to open a directory
(or a file) and pass this file descriptor to the kernel through
landlock(2).  When checking if a file access request is allowed, we walk
from the requested dentry to the real root, following the different
mount layers.  The accesses to each "tagged" inode are collected and
ANDed to create an access to the requested file hierarchy.  This makes
it possible to identify a lot of files without tagging every inode or
modifying the filesystem, while still following the view and
understanding the user has of the filesystem.
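
The walk described above can be sketched in user space as follows.  This
is a simplified model under stated assumptions, not the kernel
implementation: struct node, the ACCESS_* bits and walk_allows() are
hypothetical, and mount-point crossing and the domain's handled-access
mask are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access bits, for illustration only. */
#define ACCESS_READ	(1u << 0)
#define ACCESS_WRITE	(1u << 1)

/*
 * Hypothetical stand-in for a dentry: a node with a link to its parent and
 * an optional rule ("tag") attached by the sandboxing policy.
 */
struct node {
	const struct node *parent;	/* NULL at the real root */
	bool tagged;
	uint32_t self;		/* accesses allowed on this node itself */
	uint32_t beneath;	/* accesses allowed on what lies beneath it */
};

/*
 * Walks from @target up to the root.  Returns true only if at least one
 * tagged node was found and every tagged node visited grants @request.
 */
static bool walk_allows(const struct node *target, uint32_t request)
{
	const struct node *walker;
	bool allow = false;

	for (walker = target; walker; walker = walker->parent) {
		uint32_t granted;

		if (!walker->tagged)
			continue;
		granted = (walker == target) ? walker->self : walker->beneath;
		if ((granted & request) != request)
			return false;	/* One denying rule is enough. */
		allow = true;
	}
	return allow;
}

static void walk_example(void)
{
	const struct node root = { NULL, true,
		ACCESS_READ | ACCESS_WRITE, ACCESS_READ | ACCESS_WRITE };
	const struct node home = { &root, false, 0, 0 };
	const struct node docs = { &home, true, ACCESS_READ, ACCESS_READ };
	const struct node file = { &docs, false, 0, 0 };
	const struct node orphan = { NULL, false, 0, 0 };

	assert(walk_allows(&file, ACCESS_READ));
	assert(!walk_allows(&file, ACCESS_WRITE));	/* denied by docs */
	assert(walk_allows(&home, ACCESS_WRITE));	/* only root applies */
	assert(!walk_allows(&orphan, ACCESS_READ));	/* no rule found */
}
```

Only two nodes carry a rule here, yet every file under them is covered,
which is the point of tagging hierarchies rather than individual inodes.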

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: James Morris <jmorris@namei.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v11:
* Add back, revamp and make a fully working filesystem access-control
  based on paths and inodes.
* Remove the eBPF dependency.

Previous version:
https://lore.kernel.org/lkml/20190721213116.23476-6-mic@digikod.net/
---
 MAINTAINERS                |   1 +
 fs/super.c                 |   2 +
 include/linux/landlock.h   |  22 ++
 security/landlock/Kconfig  |   1 +
 security/landlock/Makefile |   2 +-
 security/landlock/fs.c     | 591 +++++++++++++++++++++++++++++++++++++
 security/landlock/fs.h     |  42 +++
 security/landlock/object.c |   2 +
 security/landlock/setup.c  |   6 +
 security/landlock/setup.h  |   2 +
 10 files changed, 670 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/landlock.h
 create mode 100644 security/landlock/fs.c
 create mode 100644 security/landlock/fs.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 937257925e65..0c8c2c651b96 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9366,6 +9366,7 @@ L:	linux-security-module@vger.kernel.org
 W:	https://landlock.io
 T:	git https://github.com/landlock-lsm/linux.git
 S:	Supported
+F:	include/linux/landlock.h
 F:	include/uapi/linux/landlock.h
 F:	security/landlock/
 K:	landlock
diff --git a/fs/super.c b/fs/super.c
index cd352530eca9..4ad6a64a1706 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -34,6 +34,7 @@
 #include <linux/cleancache.h>
 #include <linux/fscrypt.h>
 #include <linux/fsnotify.h>
+#include <linux/landlock.h>
 #include <linux/lockdep.h>
 #include <linux/user_namespace.h>
 #include <linux/fs_context.h>
@@ -454,6 +455,7 @@ void generic_shutdown_super(struct super_block *sb)
 		evict_inodes(sb);
 		/* only nonzero refcount inodes can have marks */
 		fsnotify_sb_delete(sb);
+		landlock_release_inodes(sb);
 
 		if (sb->s_dio_done_wq) {
 			destroy_workqueue(sb->s_dio_done_wq);
diff --git a/include/linux/landlock.h b/include/linux/landlock.h
new file mode 100644
index 000000000000..0fb16d130b0a
--- /dev/null
+++ b/include/linux/landlock.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Landlock LSM - public kernel headers
+ *
+ * Copyright © 2016-2019 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2019 ANSSI
+ */
+
+#ifndef _LINUX_LANDLOCK_H
+#define _LINUX_LANDLOCK_H
+
+#include <linux/fs.h>
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+extern void landlock_release_inodes(struct super_block *sb);
+#else
+static inline void landlock_release_inodes(struct super_block *sb)
+{
+}
+#endif
+
+#endif /* _LINUX_LANDLOCK_H */
diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
index 4a321d5b3f67..af0593c2a9e5 100644
--- a/security/landlock/Kconfig
+++ b/security/landlock/Kconfig
@@ -3,6 +3,7 @@
 config SECURITY_LANDLOCK
 	bool "Landlock support"
 	depends on SECURITY
+	select SECURITY_PATH
 	default n
 	help
 	  This selects Landlock, a safe sandboxing mechanism.  It enables to
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index f1d1eb72fa76..92e3d80ab8ed 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
 landlock-y := setup.o object.o ruleset.o \
-	cred.o ptrace.o
+	cred.o ptrace.o fs.o
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
new file mode 100644
index 000000000000..7f3bd4fd04bb
--- /dev/null
+++ b/security/landlock/fs.c
@@ -0,0 +1,591 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Filesystem management and hooks
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#include <linux/compiler_types.h>
+#include <linux/dcache.h>
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/landlock.h>
+#include <linux/lsm_hooks.h>
+#include <linux/mman.h>
+#include <linux/mm_types.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+#include <linux/path.h>
+#include <linux/rcupdate.h>
+#include <linux/spinlock.h>
+#include <linux/stat.h>
+#include <linux/types.h>
+#include <linux/uidgid.h>
+#include <linux/workqueue.h>
+#include <uapi/linux/landlock.h>
+
+#include "cred.h"
+#include "fs.h"
+#include "object.h"
+#include "ruleset.h"
+#include "setup.h"
+
+/* Underlying object management */
+
+void landlock_release_inode(struct inode *inode, struct landlock_object *object)
+{
+	/*
+	 * A call to landlock_put_object() or release_object() might sleep, but
+	 * landlock_release_object() can not sleep because it is called with a
+	 * reference to the inode.  However, we can still mark this function as
+	 * such because this should not bother landlock_release_object()
+	 * callers (e.g. landlock_release_inodes()).
+	 */
+	might_sleep();
+	/*
+	 * We must check that no one else replaced the pinned object in the
+	 * window between the reset of @object->underlying_object and now.
+	 */
+	spin_lock(&inode->i_lock);
+	if (rcu_access_pointer(inode_landlock(inode)->object) == object)
+		rcu_assign_pointer(inode_landlock(inode)->object, NULL);
+	spin_unlock(&inode->i_lock);
+	/*
+	 * Because we first NULL the reference to the object, calling iput()
+	 * won't trigger a call to landlock_put_object() (via
+	 * put_underlying_object).
+	 */
+	iput(inode);
+}
+
+/*
+ * Release the inodes used in a security policy.
+ *
+ * It is much cleaner to have a dedicated call in generic_shutdown_super()
+ * than a hacky sb_free_security hook, especially with the locked sb_lock.
+ *
+ * Cf. fsnotify_unmount_inodes()
+ */
+void landlock_release_inodes(struct super_block *sb)
+{
+	struct inode *inode, *next, *iput_inode = NULL;
+
+	if (!landlock_initialized)
+		return;
+
+	spin_lock(&sb->s_inode_list_lock);
+	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
+		spin_lock(&inode->i_lock);
+		if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+		if (!atomic_read(&inode->i_count)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+		__iget(inode);
+		spin_unlock(&inode->i_lock);
+		spin_unlock(&sb->s_inode_list_lock);
+		/*
+		 * We can now actually put the previous inode, which is not
+		 * needed anymore for the loop walk.  Because this inode should
+		 * only be referenced by Landlock for this super block, iput()
+		 * should trigger a call to hook_inode_free_security().
+		 */
+		if (iput_inode)
+			iput(iput_inode);
+
+		landlock_release_object(inode_landlock(inode)->object);
+
+		iput_inode = inode;
+		spin_lock(&sb->s_inode_list_lock);
+	}
+	spin_unlock(&sb->s_inode_list_lock);
+	if (iput_inode)
+		iput(iput_inode);
+}
+
+/* Ruleset management */
+
+static struct landlock_object *get_inode_object(struct inode *inode)
+	__acquires(object->usage)
+{
+	struct landlock_object *object, *new_object;
+
+	/* Let's first try a lockless access. */
+	rcu_read_lock();
+	object = landlock_get_object(rcu_dereference(
+				inode_landlock(inode)->object));
+	rcu_read_unlock();
+	if (object)
+		return object;
+
+	__release(object->usage);
+	/*
+	 * If there is no object tied to @inode, then create a new one (outside
+	 * of a locked block).
+	 */
+	new_object = landlock_create_object(LANDLOCK_OBJECT_INODE, inode);
+
+	spin_lock(&inode->i_lock);
+	object = landlock_get_object(rcu_dereference_protected(
+				inode_landlock(inode)->object,
+				lockdep_is_held(&inode->i_lock)));
+	if (unlikely(object)) {
+		/*
+		 * Do not try to iput(inode) because it is not held yet.
+		 */
+		landlock_drop_object(new_object);
+	} else {
+		__release(object->usage);
+		object = landlock_get_object(new_object);
+		rcu_assign_pointer(inode_landlock(inode)->object, object);
+		/*
+		 * @inode will be released by landlock_release_inodes() on its
+		 * super-block shutdown.
+		 */
+		ihold(inode);
+	}
+	spin_unlock(&inode->i_lock);
+	return object;
+}
+
+/*
+ * @path: Should have been checked by get_path_from_fd().
+ */
+int landlock_append_fs_rule(struct landlock_ruleset *ruleset,
+		struct path *path, u64 access_hierarchy)
+{
+	int err;
+	struct landlock_access access;
+	struct landlock_object *object;
+
+	/*
+	 * Checks that @access_hierarchy matches the @ruleset constraints, but
+	 * allow empty @access_hierarchy i.e., deny @ruleset->fs_access_mask .
+	 */
+	if ((ruleset->fs_access_mask | access_hierarchy) !=
+			ruleset->fs_access_mask)
+		return -EINVAL;
+	/* Transforms relative access rights to absolute ones. */
+	access_hierarchy |= _LANDLOCK_ACCESS_FS_MASK &
+		~ruleset->fs_access_mask;
+	access.self = access_hierarchy;
+	access.beneath = access_hierarchy;
+	object = get_inode_object(d_backing_inode(path->dentry));
+	mutex_lock(&ruleset->lock);
+	err = landlock_insert_ruleset_rule(ruleset, object, &access, NULL);
+	mutex_unlock(&ruleset->lock);
+	/*
+	 * No need to check for an error because landlock_put_object() handles
+	 * empty object and will terminate it if necessary.
+	 */
+	landlock_put_object(object);
+	return err;
+}
+
+/* Access-control management */
+
+static bool check_access_path_continue(
+		const struct landlock_ruleset *domain,
+		const struct path *path, u32 access_request,
+		const bool check_self, bool *allow)
+{
+	const struct landlock_access *access;
+	bool next = true;
+
+	rcu_read_lock();
+	access = landlock_find_access(domain, rcu_dereference(inode_landlock(
+				d_backing_inode(path->dentry))->object));
+	if (access) {
+		next = ((check_self ? access->self : access->beneath) &
+				access_request) == access_request;
+		*allow = next;
+	}
+	rcu_read_unlock();
+	return next;
+}
+
+static int check_access_path(const struct landlock_ruleset *domain,
+		const struct path *path, u32 access_request)
+{
+	bool allow = false;
+	struct path walker_path;
+
+	if (WARN_ON_ONCE(!path))
+		return 0;
+	/* An access request not handled by the domain should be allowed. */
+	access_request &= domain->fs_access_mask;
+	if (access_request == 0)
+		return 0;
+	walker_path = *path;
+	path_get(&walker_path);
+	if (check_access_path_continue(domain, &walker_path, access_request,
+				true, &allow)) {
+		/*
+		 * We need to walk through all the hierarchy to not miss any
+		 * relevant restriction.  This could be optimized with a future
+		 * commit.
+		 */
+		do {
+			struct dentry *parent_dentry;
+
+jump_up:
+			/*
+			 * Does not work with orphaned/private mounts like
+			 * overlayfs layers for now (cf. ovl_path_real() and
+			 * ovl_path_open()).
+			 */
+			if (walker_path.dentry == walker_path.mnt->mnt_root) {
+				if (follow_up(&walker_path))
+					/* Ignores hidden mount points. */
+					goto jump_up;
+				else
+					/* Stops at the real root. */
+					break;
+			}
+			parent_dentry = dget_parent(walker_path.dentry);
+			dput(walker_path.dentry);
+			walker_path.dentry = parent_dentry;
+		} while (check_access_path_continue(domain, &walker_path,
+					access_request, false, &allow));
+	}
+	path_put(&walker_path);
+	return allow ? 0 : -EACCES;
+}
+
+static inline int current_check_access_path(const struct path *path,
+		u32 access_request)
+{
+	struct landlock_ruleset *dom;
+
+	dom = landlock_get_current_domain();
+	if (!dom)
+		return 0;
+	return check_access_path(dom, path, access_request);
+}
+
+/* Super-block hooks */
+
+/*
+ * Because a Landlock security policy is defined according to the filesystem
+ * layout (i.e. the mount namespace), changing it may grant access to files not
+ * previously allowed.
+ *
+ * To make it simple, deny any filesystem layout modification by landlocked
+ * processes.  Non-landlocked processes may still change the namespace of a
+ * landlocked process, but this kind of threat must be handled by a system-wide
+ * access-control security policy.
+ *
+ * This could be lifted in the future if Landlock can safely handle mount
+ * namespace updates requested by a landlocked process.  Indeed, we could
+ * update the current domain (which is currently read-only) by taking into
+ * account the accesses of the source and the destination of a new mount point.
+ * However, it would also require to make all the child domains dynamically
+ * inherit these new constraints.  Anyway, for backward compatibility reasons,
+ * a dedicated user space option would be required (e.g. as a ruleset command
+ * option).
+ */
+static int hook_sb_mount(const char *dev_name, const struct path *path,
+		const char *type, unsigned long flags, void *data)
+{
+	if (!landlock_get_current_domain())
+		return 0;
+	return -EPERM;
+}
+
+static int hook_move_mount(const struct path *from_path,
+		const struct path *to_path)
+{
+	if (!landlock_get_current_domain())
+		return 0;
+	return -EPERM;
+}
+
+/*
+ * Removing a mount point may reveal a previously hidden file hierarchy, which
+ * may then grant access to files, which may have previously been forbidden.
+ */
+static int hook_sb_umount(struct vfsmount *mnt, int flags)
+{
+	if (!landlock_get_current_domain())
+		return 0;
+	return -EPERM;
+}
+
+static int hook_sb_remount(struct super_block *sb, void *mnt_opts)
+{
+	if (!landlock_get_current_domain())
+		return 0;
+	return -EPERM;
+}
+
+/*
+ * pivot_root(2), like mount(2), changes the current mount namespace.  It must
+ * then be forbidden for a landlocked process.
+ *
+ * However, chroot(2) may be allowed because it only changes the relative root
+ * directory of the current process.
+ */
+static int hook_sb_pivotroot(const struct path *old_path,
+		const struct path *new_path)
+{
+	if (!landlock_get_current_domain())
+		return 0;
+	return -EPERM;
+}
+
+/* Path hooks */
+
+static int hook_path_link(struct dentry *old_dentry,
+		const struct path *new_dir, struct dentry *new_dentry)
+{
+	return current_check_access_path(new_dir, LANDLOCK_ACCESS_FS_LINK_TO);
+}
+
+static int hook_path_mkdir(const struct path *dir, struct dentry *dentry,
+		umode_t mode)
+{
+	return current_check_access_path(dir, LANDLOCK_ACCESS_FS_MAKE_DIR);
+}
+
+static inline u32 get_mode_access(umode_t mode)
+{
+	switch (mode & S_IFMT) {
+	case S_IFLNK:
+		return LANDLOCK_ACCESS_FS_MAKE_SYM;
+	case S_IFREG:
+		return LANDLOCK_ACCESS_FS_MAKE_REG;
+	case S_IFDIR:
+		return LANDLOCK_ACCESS_FS_MAKE_DIR;
+	case S_IFCHR:
+		return LANDLOCK_ACCESS_FS_MAKE_CHAR;
+	case S_IFBLK:
+		return LANDLOCK_ACCESS_FS_MAKE_BLOCK;
+	case S_IFIFO:
+		return LANDLOCK_ACCESS_FS_MAKE_FIFO;
+	case S_IFSOCK:
+		return LANDLOCK_ACCESS_FS_MAKE_SOCK;
+	default:
+		WARN_ON_ONCE(1);
+		return 0;
+	}
+}
+
+static int hook_path_mknod(const struct path *dir, struct dentry *dentry,
+		umode_t mode, unsigned int dev)
+{
+	return current_check_access_path(dir, get_mode_access(mode));
+}
+
+static int hook_path_symlink(const struct path *dir, struct dentry *dentry,
+				const char *old_name)
+{
+	return current_check_access_path(dir, LANDLOCK_ACCESS_FS_MAKE_SYM);
+}
+
+static int hook_path_truncate(const struct path *path)
+{
+	return current_check_access_path(path, LANDLOCK_ACCESS_FS_TRUNCATE);
+}
+
+static int hook_path_unlink(const struct path *dir, struct dentry *dentry)
+{
+	return current_check_access_path(dir, LANDLOCK_ACCESS_FS_UNLINK);
+}
+
+static int hook_path_rmdir(const struct path *dir, struct dentry *dentry)
+{
+	return current_check_access_path(dir, LANDLOCK_ACCESS_FS_RMDIR);
+}
+
+static int hook_path_rename(const struct path *old_dir,
+		struct dentry *old_dentry, const struct path *new_dir,
+		struct dentry *new_dentry)
+{
+	struct landlock_ruleset *dom;
+	int err;
+
+	dom = landlock_get_current_domain();
+	if (!dom)
+		return 0;
+	err = check_access_path(dom, old_dir, LANDLOCK_ACCESS_FS_RENAME_FROM);
+	if (err)
+		return err;
+	return check_access_path(dom, new_dir, LANDLOCK_ACCESS_FS_RENAME_TO);
+}
+
+static int hook_path_chmod(const struct path *path, umode_t mode)
+{
+	return current_check_access_path(path, LANDLOCK_ACCESS_FS_CHMOD);
+}
+
+static int hook_path_chown(const struct path *path, kuid_t uid, kgid_t gid)
+{
+	struct landlock_ruleset *dom;
+	int err;
+
+	dom = landlock_get_current_domain();
+	if (!dom)
+		return 0;
+	if (uid_valid(uid)) {
+		err = check_access_path(dom, path, LANDLOCK_ACCESS_FS_CHOWN);
+		if (err)
+			return err;
+	}
+	if (gid_valid(gid)) {
+		err = check_access_path(dom, path, LANDLOCK_ACCESS_FS_CHGRP);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+static int hook_path_chroot(const struct path *path)
+{
+	return current_check_access_path(path, LANDLOCK_ACCESS_FS_CHROOT);
+}
+
+/* Inode hooks */
+
+static int hook_inode_alloc_security(struct inode *inode)
+{
+	inode_landlock(inode)->object = NULL;
+	return 0;
+}
+
+static void hook_inode_free_security(struct inode *inode)
+{
+	WARN_ON_ONCE(rcu_access_pointer(inode_landlock(inode)->object));
+}
+
+static int hook_inode_getattr(const struct path *path)
+{
+	return current_check_access_path(path, LANDLOCK_ACCESS_FS_GETATTR);
+}
+
+/* File hooks */
+
+static inline u32 get_file_access(const struct file *file)
+{
+	u32 access = 0;
+
+	if (file->f_mode & FMODE_READ) {
+		/* A directory can only be opened in read mode. */
+		if (S_ISDIR(file_inode(file)->i_mode))
+			access |= LANDLOCK_ACCESS_FS_READDIR;
+		else
+			access |= LANDLOCK_ACCESS_FS_READ;
+	}
+	/*
+	 * A LANDLOCK_ACCESS_FS_APPEND could be added but we also need to check
+	 * fcntl(2).
+	 */
+	if (file->f_mode & FMODE_WRITE)
+		access |= LANDLOCK_ACCESS_FS_WRITE;
+	/* __FMODE_EXEC is indeed part of f_flags, not f_mode. */
+	if (file->f_flags & __FMODE_EXEC)
+		access |= LANDLOCK_ACCESS_FS_EXECUTE;
+	return access;
+}
+
+static int hook_file_open(struct file *file)
+{
+	if (WARN_ON_ONCE(!file))
+		return 0;
+	if (!file_inode(file))
+		return -ENOENT;
+	return current_check_access_path(&file->f_path,
+			LANDLOCK_ACCESS_FS_OPEN | get_file_access(file));
+}
+
+static inline u32 get_mem_access(unsigned long prot, bool private)
+{
+	u32 access = LANDLOCK_ACCESS_FS_MAP;
+
+	/* Private mappings do not write to files. */
+	if (!private && (prot & PROT_WRITE))
+		access |= LANDLOCK_ACCESS_FS_WRITE;
+	if (prot & PROT_READ)
+		access |= LANDLOCK_ACCESS_FS_READ;
+	if (prot & PROT_EXEC)
+		access |= LANDLOCK_ACCESS_FS_EXECUTE;
+	return access;
+}
+
+static int hook_mmap_file(struct file *file, unsigned long reqprot,
+		unsigned long prot, unsigned long flags)
+{
+	/* @file can be null for anonymous mmap. */
+	if (!file)
+		return 0;
+	return current_check_access_path(&file->f_path,
+			get_mem_access(prot, flags & MAP_PRIVATE));
+}
+
+static int hook_file_mprotect(struct vm_area_struct *vma,
+		unsigned long reqprot, unsigned long prot)
+{
+	if (WARN_ON_ONCE(!vma))
+		return 0;
+	if (!vma->vm_file)
+		return 0;
+	return current_check_access_path(&vma->vm_file->f_path,
+			get_mem_access(prot, !(vma->vm_flags & VM_SHARED)));
+}
+
+static int hook_file_ioctl(struct file *file, unsigned int cmd,
+		unsigned long arg)
+{
+	if (WARN_ON_ONCE(!file))
+		return 0;
+	return current_check_access_path(&file->f_path,
+			LANDLOCK_ACCESS_FS_IOCTL);
+}
+
+static int hook_file_lock(struct file *file, unsigned int cmd)
+{
+	if (WARN_ON_ONCE(!file))
+		return 0;
+	return current_check_access_path(&file->f_path,
+			LANDLOCK_ACCESS_FS_LOCK);
+}
+
+static struct security_hook_list landlock_hooks[] __lsm_ro_after_init = {
+	LSM_HOOK_INIT(sb_mount, hook_sb_mount),
+	LSM_HOOK_INIT(move_mount, hook_move_mount),
+	LSM_HOOK_INIT(sb_umount, hook_sb_umount),
+	LSM_HOOK_INIT(sb_remount, hook_sb_remount),
+	LSM_HOOK_INIT(sb_pivotroot, hook_sb_pivotroot),
+
+	LSM_HOOK_INIT(path_link, hook_path_link),
+	LSM_HOOK_INIT(path_mkdir, hook_path_mkdir),
+	LSM_HOOK_INIT(path_mknod, hook_path_mknod),
+	LSM_HOOK_INIT(path_symlink, hook_path_symlink),
+	LSM_HOOK_INIT(path_truncate, hook_path_truncate),
+	LSM_HOOK_INIT(path_unlink, hook_path_unlink),
+	LSM_HOOK_INIT(path_rmdir, hook_path_rmdir),
+	LSM_HOOK_INIT(path_rename, hook_path_rename),
+	LSM_HOOK_INIT(path_chmod, hook_path_chmod),
+	LSM_HOOK_INIT(path_chown, hook_path_chown),
+	LSM_HOOK_INIT(path_chroot, hook_path_chroot),
+
+	LSM_HOOK_INIT(inode_alloc_security, hook_inode_alloc_security),
+	LSM_HOOK_INIT(inode_free_security, hook_inode_free_security),
+	LSM_HOOK_INIT(inode_getattr, hook_inode_getattr),
+
+	LSM_HOOK_INIT(file_open, hook_file_open),
+	LSM_HOOK_INIT(mmap_file, hook_mmap_file),
+	LSM_HOOK_INIT(file_mprotect, hook_file_mprotect),
+	LSM_HOOK_INIT(file_ioctl, hook_file_ioctl),
+	LSM_HOOK_INIT(file_lock, hook_file_lock),
+};
+
+__init void landlock_add_hooks_fs(void)
+{
+	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
+			LANDLOCK_NAME);
+}
diff --git a/security/landlock/fs.h b/security/landlock/fs.h
new file mode 100644
index 000000000000..5d2ed8a1d4d4
--- /dev/null
+++ b/security/landlock/fs.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock LSM - Filesystem management and hooks
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#ifndef _SECURITY_LANDLOCK_FS_H
+#define _SECURITY_LANDLOCK_FS_H
+
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/rcupdate.h>
+
+#include "ruleset.h"
+#include "setup.h"
+
+struct landlock_inode_security {
+	/*
+	 * We need an allocated object to be able to safely untie a rule from
+	 * an object (i.e. unlink then free a rule), cf. put_rule().  This
+	 * object is guarded by the underlying object's lock.
+	 */
+	struct landlock_object __rcu *object;
+};
+
+static inline struct landlock_inode_security *inode_landlock(
+		const struct inode *inode)
+{
+	return inode->i_security + landlock_blob_sizes.lbs_inode;
+}
+
+__init void landlock_add_hooks_fs(void);
+
+void landlock_release_inode(struct inode *inode,
+		struct landlock_object *object);
+
+int landlock_append_fs_rule(struct landlock_ruleset *ruleset,
+		struct path *path, u64 actions);
+
+#endif /* _SECURITY_LANDLOCK_FS_H */
diff --git a/security/landlock/object.c b/security/landlock/object.c
index 38fbbb108120..2d373f224989 100644
--- a/security/landlock/object.c
+++ b/security/landlock/object.c
@@ -29,6 +29,7 @@
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
 
+#include "fs.h"
 #include "object.h"
 
 struct landlock_object *landlock_create_object(
@@ -138,6 +139,7 @@ static bool release_object(struct landlock_object *object)
 
 	switch (object->type) {
 	case LANDLOCK_OBJECT_INODE:
+		landlock_release_inode(underlying_object, object);
 		break;
 	default:
 		WARN_ON_ONCE(1);
diff --git a/security/landlock/setup.c b/security/landlock/setup.c
index 117afb344da6..93ef2dbe83ae 100644
--- a/security/landlock/setup.c
+++ b/security/landlock/setup.c
@@ -10,11 +10,15 @@
 #include <linux/lsm_hooks.h>
 
 #include "cred.h"
+#include "fs.h"
 #include "ptrace.h"
 #include "setup.h"
 
+bool landlock_initialized __lsm_ro_after_init = false;
+
 struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = {
 	.lbs_cred = sizeof(struct landlock_cred_security),
+	.lbs_inode = sizeof(struct landlock_inode_security),
 };
 
 static int __init landlock_init(void)
@@ -22,6 +26,8 @@ static int __init landlock_init(void)
 	pr_info(LANDLOCK_NAME ": Registering hooks\n");
 	landlock_add_hooks_cred();
 	landlock_add_hooks_ptrace();
+	landlock_add_hooks_fs();
+	landlock_initialized = true;
 	return 0;
 }
 
diff --git a/security/landlock/setup.h b/security/landlock/setup.h
index 52eb8d806376..260fd2068b95 100644
--- a/security/landlock/setup.h
+++ b/security/landlock/setup.h
@@ -13,6 +13,8 @@
 
 #define LANDLOCK_NAME "landlock"
 
+extern bool landlock_initialized;
+
 extern struct lsm_blob_sizes landlock_blob_sizes;
 
 #endif /* _SECURITY_LANDLOCK_SETUP_H */
-- 
2.25.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [RFC PATCH v14 06/10] landlock: Add syscall implementation
  2020-02-24 16:02 [RFC PATCH v14 00/10] Landlock LSM Mickaël Salaün
                   ` (4 preceding siblings ...)
  2020-02-24 16:02 ` [RFC PATCH v14 05/10] fs,landlock: Support filesystem access-control Mickaël Salaün
@ 2020-02-24 16:02 ` Mickaël Salaün
  2020-03-17 16:47   ` Al Viro
  2020-02-24 16:02 ` [RFC PATCH v14 07/10] arch: Wire up landlock() syscall Mickaël Salaün
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Mickaël Salaün @ 2020-02-24 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Al Viro, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Greg Kroah-Hartman, James Morris,
	Jann Horn, Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, kernel-hardening, linux-api, linux-arch,
	linux-doc, linux-fsdevel, linux-kselftest, linux-security-module,
	x86

This syscall, inspired by seccomp(2) and bpf(2), is designed to be
used by unprivileged processes to sandbox themselves.  It has the same
usage restrictions as seccomp(2): no_new_privs check.
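
For reference, this is how an unprivileged thread can set and query
no_new_privs with prctl(2) before sandboxing itself.  It is a minimal
sketch: the helper names are hypothetical, and the check performed by the
new syscall itself is the one in this patch, not shown here.

```c
#include <assert.h>
#include <sys/prctl.h>

/*
 * Sets no_new_privs on the calling thread.  Returns 0 on success, -1 on
 * error (with errno set by prctl(2)).
 */
static int set_no_new_privs(void)
{
	return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Returns 1 if no_new_privs is set, 0 if it is not, -1 on error. */
static int get_no_new_privs(void)
{
	return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}

static void no_new_privs_example(void)
{
	assert(set_no_new_privs() == 0);
	assert(get_no_new_privs() == 1);
}
```

Once set, the flag cannot be cleared and is inherited across fork(2),
clone(2) and execve(2).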

There are currently four commands:
* get_features: Gets the supported features (required for backward
  compatibility and best-effort security).
* create_ruleset: Creates a ruleset and returns its file descriptor.
* add_rule: Adds a rule (e.g. file hierarchy access) to a ruleset,
  identified by the dedicated file descriptor.
* enforce_ruleset: Enforces a ruleset on the current thread (similar to
  seccomp).

See the user and code documentation for more details.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: James Morris <jmorris@namei.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v13:
* New implementation, replacing the dependency on seccomp(2) and bpf(2).
---
 include/linux/syscalls.h      |   3 +
 include/uapi/linux/landlock.h | 213 +++++++++++++++
 security/landlock/Makefile    |   2 +-
 security/landlock/ruleset.c   |   3 +
 security/landlock/syscall.c   | 470 ++++++++++++++++++++++++++++++++++
 5 files changed, 690 insertions(+), 1 deletion(-)
 create mode 100644 security/landlock/syscall.c

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 1815065d52f3..beaadcf4ef77 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1003,6 +1003,9 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
 				       siginfo_t __user *info,
 				       unsigned int flags);
 asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
+asmlinkage long sys_landlock(unsigned int command, unsigned int options,
+			     size_t attr1_size, void __user *attr1_ptr,
+			     size_t attr2_size, void __user *attr2_ptr);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
index 92760aca3645..0b6d3e9f4b37 100644
--- a/include/uapi/linux/landlock.h
+++ b/include/uapi/linux/landlock.h
@@ -9,6 +9,219 @@
 #ifndef _UAPI__LINUX_LANDLOCK_H__
 #define _UAPI__LINUX_LANDLOCK_H__
 
+#include <linux/types.h>
+
+/**
+ * enum landlock_cmd - Landlock commands
+ *
+ * First argument of sys_landlock().
+ */
+enum landlock_cmd {
+	/**
+	 * @LANDLOCK_CMD_GET_FEATURES: Asks the kernel for supported Landlock
+	 * features.  The option argument must contain
+	 * %LANDLOCK_OPT_GET_FEATURES.  This command fills the &struct
+	 * landlock_attr_features provided as first attribute.
+	 */
+	LANDLOCK_CMD_GET_FEATURES = 1,
+	/**
+	 * @LANDLOCK_CMD_CREATE_RULESET: Creates a new ruleset and returns its
+	 * file descriptor on success.  The option argument must contain
+	 * %LANDLOCK_OPT_CREATE_RULESET.  The ruleset is defined by the &struct
+	 * landlock_attr_ruleset provided as first attribute.
+	 */
+	LANDLOCK_CMD_CREATE_RULESET,
+	/**
+	 * @LANDLOCK_CMD_ADD_RULE: Adds a rule to a ruleset.  The option
+	 * argument must contain %LANDLOCK_OPT_ADD_RULE_PATH_BENEATH.  The
+	 * ruleset and the rule are both defined by the &struct
+	 * landlock_attr_path_beneath provided as first attribute.
+	 */
+	LANDLOCK_CMD_ADD_RULE,
+	/**
+	 * @LANDLOCK_CMD_ENFORCE_RULESET: Enforces a ruleset on the current
+	 * process.  The option argument must contain
+	 * %LANDLOCK_OPT_ENFORCE_RULESET.  The ruleset is defined by the
+	 * &struct landlock_attr_enforce provided as first attribute.
+	 */
+	LANDLOCK_CMD_ENFORCE_RULESET,
+};
+
+/**
+ * DOC: options_intro
+ *
+ * These options may be used as second argument of sys_landlock().  Each
+ * command has a dedicated set of options, represented as bitmasks.  For two
+ * different commands, their options may overlap.  Each command has at least
+ * one option defining the used attribute type.  This also guarantees a
+ * usable &struct landlock_attr_features (i.e. filled with bits).
+ */
+
+/**
+ * DOC: options_get_features
+ *
+ * Options for ``LANDLOCK_CMD_GET_FEATURES``
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * - %LANDLOCK_OPT_GET_FEATURES: the attr type is `struct
+ *   landlock_attr_features`.
+ */
+#define LANDLOCK_OPT_GET_FEATURES			(1ULL << 0)
+
+/**
+ * DOC: options_create_ruleset
+ *
+ * Options for ``LANDLOCK_CMD_CREATE_RULESET``
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * - %LANDLOCK_OPT_CREATE_RULESET: the attr type is `struct
+ *   landlock_attr_ruleset`.
+ */
+#define LANDLOCK_OPT_CREATE_RULESET			(1ULL << 0)
+
+/**
+ * DOC: options_add_rule
+ *
+ * Options for ``LANDLOCK_CMD_ADD_RULE``
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * - %LANDLOCK_OPT_ADD_RULE_PATH_BENEATH: the attr type is `struct
+ *   landlock_attr_path_beneath`.
+ */
+#define LANDLOCK_OPT_ADD_RULE_PATH_BENEATH		(1ULL << 0)
+
+/**
+ * DOC: options_enforce_ruleset
+ *
+ * Options for ``LANDLOCK_CMD_ENFORCE_RULESET``
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * - %LANDLOCK_OPT_ENFORCE_RULESET: the attr type is `struct
+ *   landlock_attr_enforce`.
+ */
+#define LANDLOCK_OPT_ENFORCE_RULESET			(1ULL << 0)
+
+/**
+ * struct landlock_attr_features - Receives the supported features
+ *
+ * This struct should be allocated by user space but it will be filled by the
+ * kernel to indicate the subset of Landlock features effectively handled by
+ * the running kernel.  This enables backward compatibility for applications
+ * which are developed on a newer kernel than the one running the application.
+ * This helps avoid hard errors that may entirely disable the use of Landlock
+ * features because some of them may not be supported.  Indeed, because
+ * Landlock is a security feature, even if the kernel doesn't support all the
+ * requested features, user space applications should still use the subset
+ * which is supported by the running kernel.  Indeed, a partial security policy
+ * can still improve the security of the application and better protect the
+ * user (i.e. best-effort approach).  The %LANDLOCK_CMD_GET_FEATURES command
+ * and &struct landlock_attr_features are future-proof because the future
+ * unknown fields requested by user space (i.e. a larger &struct
+ * landlock_attr_features) can still be filled with zeros.
+ *
+ * The Landlock commands will fail if an unsupported option or access is
+ * requested.  By first requesting the supported options and accesses, it is
+ * quite easy for the developer to binary AND these returned bitmasks with the
+ * used options and accesses from the attribute structs (e.g. &struct
+ * landlock_attr_ruleset), and even infer the supported Landlock commands.
+ * Indeed, because each command must support at least one option, the options_*
+ * fields are always filled if the related commands are supported.  The
+ * supported attributes are also discoverable thanks to the size_* fields.  All
+ * this data enables applications to do their best to sandbox
+ * themselves regardless of the running kernel.
+ */
+struct landlock_attr_features {
+	/**
+	 * @options_get_features: Options supported by the
+	 * %LANDLOCK_CMD_GET_FEATURES command. Cf. `Options`_.
+	 */
+	__aligned_u64 options_get_features;
+	/**
+	 * @options_create_ruleset: Options supported by the
+	 * %LANDLOCK_CMD_CREATE_RULESET command. Cf. `Options`_.
+	 */
+	__aligned_u64 options_create_ruleset;
+	/**
+	 * @options_add_rule: Options supported by the %LANDLOCK_CMD_ADD_RULE
+	 * command. Cf. `Options`_.
+	 */
+	__aligned_u64 options_add_rule;
+	/**
+	 * @options_enforce_ruleset: Options supported by the
+	 * %LANDLOCK_CMD_ENFORCE_RULESET command. Cf. `Options`_.
+	 */
+	__aligned_u64 options_enforce_ruleset;
+	/**
+	 * @access_fs: Subset of file system access supported by the running
+	 * kernel, used in &struct landlock_attr_ruleset and &struct
+	 * landlock_attr_path_beneath.  Cf. `Filesystem flags`_.
+	 */
+	__aligned_u64 access_fs;
+	/**
+	 * @size_attr_ruleset: Size of the &struct landlock_attr_ruleset as
+	 * known by the kernel (i.e.  ``sizeof(struct
+	 * landlock_attr_ruleset)``).
+	 */
+	__aligned_u64 size_attr_ruleset;
+	/**
+	 * @size_attr_path_beneath: Size of the &struct
+	 * landlock_attr_path_beneath as known by the kernel (i.e.
+	 * ``sizeof(struct landlock_attr_path_beneath)``).
+	 */
+	__aligned_u64 size_attr_path_beneath;
+};
+
+/**
+ * struct landlock_attr_ruleset - Defines a new ruleset
+ *
+ * Used as first attribute for the %LANDLOCK_CMD_CREATE_RULESET command and
+ * with the %LANDLOCK_OPT_CREATE_RULESET option.
+ */
+struct landlock_attr_ruleset {
+	/**
+	 * @handled_access_fs: Bitmask of actions (cf. `Filesystem flags`_)
+	 * that are handled by this ruleset and should then be forbidden if no
+	 * rule explicitly allows them.  This is needed for backward
+	 * compatibility reasons.  The user space code should check the
+	 * effectively supported actions thanks to %LANDLOCK_CMD_GET_FEATURES
+	 * and &struct landlock_attr_features, and then adjust the arguments of
+	 * the next calls to sys_landlock() accordingly.
+	 */
+	__aligned_u64 handled_access_fs;
+};
+
+/**
+ * struct landlock_attr_path_beneath - Defines a path hierarchy
+ */
+struct landlock_attr_path_beneath {
+	/**
+	 * @ruleset_fd: File descriptor tied to the ruleset which should be
+	 * extended with this new access.
+	 */
+	__aligned_u64 ruleset_fd;
+	/**
+	 * @parent_fd: File descriptor, opened with ``O_PATH``, which
+	 * identifies the parent directory of a file hierarchy, or just a file.
+	 */
+	__aligned_u64 parent_fd;
+	/**
+	 * @allowed_access: Bitmask of allowed actions for this file hierarchy
+	 * (cf. `Filesystem flags`_).
+	 */
+	__aligned_u64 allowed_access;
+};
+
+/**
+ * struct landlock_attr_enforce - Describes the enforcement
+ */
+struct landlock_attr_enforce {
+	/**
+	 * @ruleset_fd: File descriptor tied to the ruleset to merge with the
+	 * current domain.
+	 */
+	__aligned_u64 ruleset_fd;
+};
+
 /**
  * DOC: fs_access
  *
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index 92e3d80ab8ed..4388494779ec 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
-landlock-y := setup.o object.o ruleset.o \
+landlock-y := setup.o syscall.o object.o ruleset.o \
 	cred.o ptrace.o fs.o
diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
index 5ec013a4188d..fab17110804f 100644
--- a/security/landlock/ruleset.c
+++ b/security/landlock/ruleset.c
@@ -17,6 +17,7 @@
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
+#include <uapi/linux/landlock.h>
 
 #include "object.h"
 #include "ruleset.h"
@@ -40,6 +41,8 @@ struct landlock_ruleset *landlock_create_ruleset(u64 fs_access_mask)
 	struct landlock_ruleset *ruleset;
 
 	/* Safely handles 32-bits conversion. */
+	BUILD_BUG_ON(!__same_type(fs_access_mask, ((struct
+		landlock_attr_ruleset *)NULL)->handled_access_fs));
 	BUILD_BUG_ON(!__same_type(fs_access_mask, _LANDLOCK_ACCESS_FS_LAST));
 
 	/* Checks content. */
diff --git a/security/landlock/syscall.c b/security/landlock/syscall.c
new file mode 100644
index 000000000000..da80e3061b5a
--- /dev/null
+++ b/security/landlock/syscall.c
@@ -0,0 +1,470 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - System call and user space interfaces
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#include <asm/current.h>
+#include <linux/anon_inodes.h>
+#include <linux/build_bug.h>
+#include <linux/capability.h>
+#include <linux/dcache.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/landlock.h>
+#include <linux/limits.h>
+#include <linux/path.h>
+#include <linux/rcupdate.h>
+#include <linux/refcount.h>
+#include <linux/sched.h>
+#include <linux/security.h>
+#include <linux/syscalls.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <uapi/linux/landlock.h>
+
+#include "cred.h"
+#include "fs.h"
+#include "ruleset.h"
+#include "setup.h"
+
+/**
+ * copy_struct_if_any_from_user - Safe future-proof argument copying
+ *
+ * Extend copy_struct_from_user() to handle NULL @src, which allows for future
+ * use of @src even if it is not used right now.
+ *
+ * @dst: kernel space pointer or NULL
+ * @ksize: size of the data pointed by @dst
+ * @src: user space pointer or NULL
+ * @usize: size of the data pointed by @src
+ */
+static int copy_struct_if_any_from_user(void *dst, size_t ksize,
+		const void __user *src, size_t usize)
+{
+	int ret;
+
+	if (dst) {
+		if (WARN_ON_ONCE(ksize == 0))
+			return -EFAULT;
+	} else {
+		if (WARN_ON_ONCE(ksize != 0))
+			return -EFAULT;
+	}
+	if (!src) {
+		if (usize != 0)
+			return -EFAULT;
+		if (dst)
+			memset(dst, 0, ksize);
+		return 0;
+	}
+	if (usize == 0)
+		return -ENODATA;
+	if (usize > PAGE_SIZE)
+		return -E2BIG;
+	if (dst)
+		return copy_struct_from_user(dst, ksize, src, usize);
+	ret = check_zeroed_user(src, usize);
+	if (ret <= 0)
+		return ret ?: -E2BIG;
+	return 0;
+}
+
+/* Features */
+
+#define _LANDLOCK_OPT_GET_FEATURES_LAST		LANDLOCK_OPT_GET_FEATURES
+#define _LANDLOCK_OPT_GET_FEATURES_MASK		((_LANDLOCK_OPT_GET_FEATURES_LAST << 1) - 1)
+
+#define _LANDLOCK_OPT_CREATE_RULESET_LAST	LANDLOCK_OPT_CREATE_RULESET
+#define _LANDLOCK_OPT_CREATE_RULESET_MASK	((_LANDLOCK_OPT_CREATE_RULESET_LAST << 1) - 1)
+
+#define _LANDLOCK_OPT_ADD_RULE_LAST		LANDLOCK_OPT_ADD_RULE_PATH_BENEATH
+#define _LANDLOCK_OPT_ADD_RULE_MASK		((_LANDLOCK_OPT_ADD_RULE_LAST << 1) - 1)
+
+#define _LANDLOCK_OPT_ENFORCE_RULESET_LAST	LANDLOCK_OPT_ENFORCE_RULESET
+#define _LANDLOCK_OPT_ENFORCE_RULESET_MASK	((_LANDLOCK_OPT_ENFORCE_RULESET_LAST << 1) - 1)
+
+static int syscall_get_features(size_t attr_size, void __user *attr_ptr)
+{
+	size_t data_size, fill_size;
+	struct landlock_attr_features supported = {
+		.options_get_features = _LANDLOCK_OPT_GET_FEATURES_MASK,
+		.options_create_ruleset = _LANDLOCK_OPT_CREATE_RULESET_MASK,
+		.options_add_rule = _LANDLOCK_OPT_ADD_RULE_MASK,
+		.options_enforce_ruleset = _LANDLOCK_OPT_ENFORCE_RULESET_MASK,
+		.access_fs = _LANDLOCK_ACCESS_FS_MASK,
+		.size_attr_ruleset = sizeof(struct landlock_attr_ruleset),
+		.size_attr_path_beneath = sizeof(struct
+				landlock_attr_path_beneath),
+	};
+
+	if (attr_size == 0)
+		return -ENODATA;
+	if (attr_size > PAGE_SIZE)
+		return -E2BIG;
+	data_size = min(sizeof(supported), attr_size);
+	if (copy_to_user(attr_ptr, &supported, data_size))
+		return -EFAULT;
+	/* Fills the rest with zeros. */
+	fill_size = attr_size - data_size;
+	if (fill_size > 0 && clear_user(attr_ptr + data_size, fill_size))
+		return -EFAULT;
+	return 0;
+}
+
+/* Ruleset handling */
+
+#ifdef CONFIG_PROC_FS
+static void fop_ruleset_show_fdinfo(struct seq_file *m, struct file *filp)
+{
+	const struct landlock_ruleset *ruleset = filp->private_data;
+
+	seq_printf(m, "handled_access_fs:\t%x\n", ruleset->fs_access_mask);
+	seq_printf(m, "nb_rules:\t%d\n", atomic_read(&ruleset->nb_rules));
+}
+#endif
+
+static int fop_ruleset_release(struct inode *inode, struct file *filp)
+{
+	struct landlock_ruleset *ruleset = filp->private_data;
+
+	landlock_put_ruleset(ruleset);
+	return 0;
+}
+
+static ssize_t fop_dummy_read(struct file *filp, char __user *buf, size_t size,
+		loff_t *ppos)
+{
+	/* Dummy handler to enable FMODE_CAN_READ. */
+	return -EINVAL;
+}
+
+static ssize_t fop_dummy_write(struct file *filp, const char __user *buf,
+			       size_t size, loff_t *ppos)
+{
+	/* Dummy handler to enable FMODE_CAN_WRITE. */
+	return -EINVAL;
+}
+
+/*
+ * A ruleset file descriptor makes it possible to build a ruleset by adding
+ * (i.e. writing) rule after rule, without relying on the task's context.  This
+ * reentrant design is also used in a read way to enforce the ruleset on the
+ * current task.
+ */
+static const struct file_operations ruleset_fops = {
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo	= fop_ruleset_show_fdinfo,
+#endif
+	.release	= fop_ruleset_release,
+	.read		= fop_dummy_read,
+	.write		= fop_dummy_write,
+};
+
+static int syscall_create_ruleset(size_t attr_size, void __user *attr_ptr)
+{
+	struct landlock_attr_ruleset attr_ruleset;
+	struct landlock_ruleset *ruleset;
+	int err, ruleset_fd;
+
+	/* Copies raw userspace struct. */
+	err = copy_struct_if_any_from_user(&attr_ruleset, sizeof(attr_ruleset),
+			attr_ptr, attr_size);
+	if (err)
+		return err;
+
+	/* Checks arguments and transforms them to a kernel struct. */
+	ruleset = landlock_create_ruleset(attr_ruleset.handled_access_fs);
+	if (IS_ERR(ruleset))
+		return PTR_ERR(ruleset);
+
+	/* Creates anonymous FD referring to the ruleset, with safe flags. */
+	ruleset_fd = anon_inode_getfd("landlock-ruleset", &ruleset_fops,
+			ruleset, O_RDWR | O_CLOEXEC);
+	if (ruleset_fd < 0)
+		landlock_put_ruleset(ruleset);
+	return ruleset_fd;
+}
+
+/*
+ * Returns an owned ruleset from a FD. It is thus needed to call
+ * landlock_put_ruleset() on the return value.
+ */
+static struct landlock_ruleset *get_ruleset_from_fd(u64 fd, fmode_t mode)
+{
+	struct fd ruleset_f;
+	struct landlock_ruleset *ruleset;
+	int err;
+
+	BUILD_BUG_ON(!__same_type(fd,
+		((struct landlock_attr_path_beneath *)NULL)->ruleset_fd));
+	BUILD_BUG_ON(!__same_type(fd,
+		((struct landlock_attr_enforce *)NULL)->ruleset_fd));
+	/* Checks 32-bits overflow. fdget() checks for INT_MAX/FD. */
+	if (fd > U32_MAX)
+		return ERR_PTR(-EINVAL);
+	ruleset_f = fdget(fd);
+	if (!ruleset_f.file)
+		return ERR_PTR(-EBADF);
+	err = 0;
+	if (ruleset_f.file->f_op != &ruleset_fops)
+		err = -EBADR;
+	else if (!(ruleset_f.file->f_mode & mode))
+		err = -EPERM;
+	if (!err) {
+		ruleset = ruleset_f.file->private_data;
+		landlock_get_ruleset(ruleset);
+	}
+	fdput(ruleset_f);
+	return err ? ERR_PTR(err) : ruleset;
+}
+
+/* Path handling */
+
+static inline bool is_user_mountable(struct dentry *dentry)
+{
+	/*
+	 * Check pseudo-filesystems that will never be mountable (e.g. sockfs,
+	 * pipefs, bdev), cf. fs/libfs.c:init_pseudo().
+	 */
+	return d_is_positive(dentry) &&
+		!IS_PRIVATE(dentry->d_inode) &&
+		!(dentry->d_sb->s_flags & SB_NOUSER);
+}
+
+/*
+ * @path: Must call path_put(@path) after the call if it succeeded.
+ */
+static int get_path_from_fd(u64 fd, struct path *path)
+{
+	struct fd f;
+	int err;
+
+	BUILD_BUG_ON(!__same_type(fd,
+		((struct landlock_attr_path_beneath *)NULL)->parent_fd));
+	/* Checks 32-bits overflow. fdget_raw() checks for INT_MAX/FD. */
+	if (fd > U32_MAX)
+		return -EINVAL;
+	/* Handles O_PATH. */
+	f = fdget_raw(fd);
+	if (!f.file)
+		return -EBADF;
+	/*
+	 * Forbids adding to a ruleset a path which is forbidden to open (by
+	 * Landlock, another LSM, DAC...).  Because the file was opened with
+	 * O_PATH, the file mode has neither FMODE_READ nor FMODE_WRITE.
+	 *
+	 * WARNING: security_file_open() was only called in do_dentry_open()
+	 * until now.  The main difference now is that f_op may be NULL.  This
+	 * field doesn't seem to be dereferenced by any upstream LSM though.
+	 */
+	err = security_file_open(f.file);
+	if (err)
+		goto out_fdput;
+	/*
+	 * Only allows O_PATH FD: enables restricting ambient (FS) accesses
+	 * without requiring to open and risk leaking or misusing a FD.
+	 * Accepts a removed, but still open, directory (S_DEAD).
+	 */
+	if (!(f.file->f_mode & FMODE_PATH) || !f.file->f_path.mnt ||
+			!is_user_mountable(f.file->f_path.dentry)) {
+		err = -EBADR;
+		goto out_fdput;
+	}
+	path->mnt = f.file->f_path.mnt;
+	path->dentry = f.file->f_path.dentry;
+	path_get(path);
+
+out_fdput:
+	fdput(f);
+	return err;
+}
+
+static int syscall_add_rule_path_beneath(size_t attr_size,
+		void __user *attr_ptr)
+{
+	struct landlock_attr_path_beneath attr_path_beneath;
+	struct path path;
+	struct landlock_ruleset *ruleset;
+	int err;
+
+	/* Copies raw userspace struct. */
+	err = copy_struct_if_any_from_user(&attr_path_beneath,
+			sizeof(attr_path_beneath), attr_ptr, attr_size);
+	if (err)
+		return err;
+
+	/* Gets the ruleset. */
+	ruleset = get_ruleset_from_fd(attr_path_beneath.ruleset_fd,
+			FMODE_CAN_WRITE);
+	if (IS_ERR(ruleset))
+		return PTR_ERR(ruleset);
+
+	/* Checks content (fs_access_mask is upgraded to 64-bits). */
+	if ((attr_path_beneath.allowed_access | ruleset->fs_access_mask) !=
+			ruleset->fs_access_mask) {
+		err = -EINVAL;
+		goto out_put_ruleset;
+	}
+
+	err = get_path_from_fd(attr_path_beneath.parent_fd, &path);
+	if (err)
+		goto out_put_ruleset;
+
+	err = landlock_append_fs_rule(ruleset, &path,
+			attr_path_beneath.allowed_access);
+	path_put(&path);
+
+out_put_ruleset:
+	landlock_put_ruleset(ruleset);
+	return err;
+}
+
+/* Enforcement */
+
+static int syscall_enforce_ruleset(size_t attr_size,
+		void __user *attr_ptr)
+{
+	struct landlock_ruleset *new_dom, *ruleset;
+	struct cred *new_cred;
+	struct landlock_cred_security *new_llcred;
+	struct landlock_attr_enforce attr_enforce;
+	int err;
+
+	/*
+	 * Enforcing a Landlock ruleset requires that the task has
+	 * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
+	 * This avoids scenarios where unprivileged tasks can affect the
+	 * behavior of privileged children.  These checks are similar to those
+	 * of seccomp(2), except that an -EPERM may be returned.
+	 */
+	if (!task_no_new_privs(current)) {
+		err = security_capable(current_cred(), current_user_ns(),
+				CAP_SYS_ADMIN, CAP_OPT_NOAUDIT);
+		if (err)
+			return err;
+	}
+
+	/* Copies raw userspace struct. */
+	err = copy_struct_if_any_from_user(&attr_enforce, sizeof(attr_enforce),
+			attr_ptr, attr_size);
+	if (err)
+		return err;
+
+	/* Get the ruleset. */
+	ruleset = get_ruleset_from_fd(attr_enforce.ruleset_fd, FMODE_CAN_READ);
+	if (IS_ERR(ruleset))
+		return PTR_ERR(ruleset);
+	/* Informs about useless ruleset. */
+	if (!atomic_read(&ruleset->nb_rules)) {
+		err = -ENOMSG;
+		goto out_put_ruleset;
+	}
+
+	new_cred = prepare_creds();
+	if (!new_cred) {
+		err = -ENOMEM;
+		goto out_put_ruleset;
+	}
+	new_llcred = landlock_cred(new_cred);
+	/*
+	 * There is no possible race condition while copying and manipulating
+	 * the current credentials because they are dedicated per thread.
+	 */
+	new_dom = landlock_merge_ruleset(new_llcred->domain, ruleset);
+	if (IS_ERR(new_dom)) {
+		err = PTR_ERR(new_dom);
+		goto out_put_creds;
+	}
+	/* Replaces the old (prepared) domain. */
+	landlock_put_ruleset(new_llcred->domain);
+	new_llcred->domain = new_dom;
+
+	landlock_put_ruleset(ruleset);
+	return commit_creds(new_cred);
+
+out_put_creds:
+	abort_creds(new_cred);
+
+out_put_ruleset:
+	landlock_put_ruleset(ruleset);
+	return err;
+}
+
+/**
+ * landlock - System call to enable a process to safely sandbox itself
+ *
+ * @command: Landlock command to perform miscellaneous, but safe, actions. Cf.
+ *           `Commands`_.
+ * @options: Bitmask of options dedicated to one command. Cf. `Options`_.
+ * @attr1_size: First attribute size (i.e. size of the struct).
+ * @attr1_ptr: Pointer to the first attribute. Cf. `Attributes`_.
+ * @attr2_size: Unused for now.
+ * @attr2_ptr: Unused for now.
+ *
+ * The @command and @options arguments enable a seccomp-bpf policy to control
+ * the requested actions.  However, it should be noted that Landlock is
+ * designed from the ground up to enable unprivileged processes to drop
+ * privileges and accesses in a way that cannot harm other processes.  This
+ * syscall and all its arguments should then be allowed for any process, which
+ * will then let applications strengthen the security of the whole system.
+ *
+ * @attr2_size and @attr2_ptr describe a second attribute which could be used
+ * in the future to compose with the first attribute (e.g. a
+ * landlock_attr_path_beneath with a landlock_attr_ioctl).
+ *
+ * The order of return errors begins with ENOPKG (disabled Landlock),
+ * EOPNOTSUPP (unknown command or option) and then EINVAL (invalid attribute).
+ * The other error codes may be specific to each command.
+ */
+SYSCALL_DEFINE6(landlock, unsigned int, command, unsigned int, options,
+		size_t, attr1_size, void __user *, attr1_ptr,
+		size_t, attr2_size, void __user *, attr2_ptr)
+{
+	/*
+	 * Enables user space to identify if Landlock is disabled, thanks to a
+	 * specific error code.
+	 */
+	if (!landlock_initialized)
+		return -ENOPKG;
+
+	switch ((enum landlock_cmd)command) {
+	case LANDLOCK_CMD_GET_FEATURES:
+		if (options == LANDLOCK_OPT_GET_FEATURES) {
+			if (attr2_size || attr2_ptr)
+				return -EINVAL;
+			return syscall_get_features(attr1_size, attr1_ptr);
+		}
+		return -EOPNOTSUPP;
+	case LANDLOCK_CMD_CREATE_RULESET:
+		if (options == LANDLOCK_OPT_CREATE_RULESET) {
+			if (attr2_size || attr2_ptr)
+				return -EINVAL;
+			return syscall_create_ruleset(attr1_size, attr1_ptr);
+		}
+		return -EOPNOTSUPP;
+	case LANDLOCK_CMD_ADD_RULE:
+		/*
+		 * A future extension could add a
+		 * LANDLOCK_OPT_ADD_RULE_PATH_RANGE.
+		 */
+		if (options == LANDLOCK_OPT_ADD_RULE_PATH_BENEATH) {
+			if (attr2_size || attr2_ptr)
+				return -EINVAL;
+			return syscall_add_rule_path_beneath(attr1_size,
+					attr1_ptr);
+		}
+		return -EOPNOTSUPP;
+	case LANDLOCK_CMD_ENFORCE_RULESET:
+		if (options == LANDLOCK_OPT_ENFORCE_RULESET) {
+			if (attr2_size || attr2_ptr)
+				return -EINVAL;
+			return syscall_enforce_ruleset(attr1_size, attr1_ptr);
+		}
+		return -EOPNOTSUPP;
+	}
+	return -EOPNOTSUPP;
+}
-- 
2.25.0



* [RFC PATCH v14 07/10] arch: Wire up landlock() syscall
  2020-02-24 16:02 [RFC PATCH v14 00/10] Landlock LSM Mickaël Salaün
                   ` (5 preceding siblings ...)
  2020-02-24 16:02 ` [RFC PATCH v14 06/10] landlock: Add syscall implementation Mickaël Salaün
@ 2020-02-24 16:02 ` Mickaël Salaün
  2020-02-24 16:02 ` [RFC PATCH v14 08/10] selftests/landlock: Add initial tests Mickaël Salaün
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Mickaël Salaün @ 2020-02-24 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Al Viro, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Greg Kroah-Hartman, James Morris,
	Jann Horn, Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, kernel-hardening, linux-api, linux-arch,
	linux-doc, linux-fsdevel, linux-kselftest, linux-security-module,
	x86

Wire up the landlock() call for x86_64 (for now).

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: James Morris <jmorris@namei.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v13:
* New implementation.
---
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 include/uapi/asm-generic/unistd.h      | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 44d510bc9b78..3e759505c8bf 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -359,6 +359,7 @@
 435	common	clone3			__x64_sys_clone3/ptregs
 437	common	openat2			__x64_sys_openat2
 438	common	pidfd_getfd		__x64_sys_pidfd_getfd
+439	common	landlock		__x64_sys_landlock
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 3a3201e4618e..31d5814ddb13 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -855,9 +855,11 @@ __SYSCALL(__NR_clone3, sys_clone3)
 __SYSCALL(__NR_openat2, sys_openat2)
 #define __NR_pidfd_getfd 438
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
+#define __NR_landlock 439
+__SYSCALL(__NR_landlock, sys_landlock)
 
 #undef __NR_syscalls
-#define __NR_syscalls 439
+#define __NR_syscalls 440
 
 /*
  * 32 bit systems traditionally used different
-- 
2.25.0



* [RFC PATCH v14 08/10] selftests/landlock: Add initial tests
  2020-02-24 16:02 [RFC PATCH v14 00/10] Landlock LSM Mickaël Salaün
                   ` (6 preceding siblings ...)
  2020-02-24 16:02 ` [RFC PATCH v14 07/10] arch: Wire up landlock() syscall Mickaël Salaün
@ 2020-02-24 16:02 ` Mickaël Salaün
  2020-02-24 16:02 ` [RFC PATCH v14 09/10] samples/landlock: Add a sandbox manager example Mickaël Salaün
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Mickaël Salaün @ 2020-02-24 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Al Viro, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Greg Kroah-Hartman, James Morris,
	Jann Horn, Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, kernel-hardening, linux-api, linux-arch,
	linux-doc, linux-fsdevel, linux-kselftest, linux-security-module,
	x86

Test the landlock syscall, ptrace hook semantics and filesystem
access-control.

This is an initial batch; more tests will follow.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: Vincent Dagonneau <vincent.dagonneau@ssi.gouv.fr>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: James Morris <jmorris@namei.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Shuah Khan <shuah@kernel.org>
---

Changes since v13:
* Add back the filesystem tests (from v10) and extend them.
* Add tests for the new syscall.

Previous version:
https://lore.kernel.org/lkml/20191104172146.30797-7-mic@digikod.net/
---
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/landlock/.gitignore   |   3 +
 tools/testing/selftests/landlock/Makefile     |  13 +
 tools/testing/selftests/landlock/config       |   4 +
 tools/testing/selftests/landlock/test.h       |  40 ++
 tools/testing/selftests/landlock/test_base.c  |  80 +++
 tools/testing/selftests/landlock/test_fs.c    | 624 ++++++++++++++++++
 .../testing/selftests/landlock/test_ptrace.c  | 293 ++++++++
 8 files changed, 1058 insertions(+)
 create mode 100644 tools/testing/selftests/landlock/.gitignore
 create mode 100644 tools/testing/selftests/landlock/Makefile
 create mode 100644 tools/testing/selftests/landlock/config
 create mode 100644 tools/testing/selftests/landlock/test.h
 create mode 100644 tools/testing/selftests/landlock/test_base.c
 create mode 100644 tools/testing/selftests/landlock/test_fs.c
 create mode 100644 tools/testing/selftests/landlock/test_ptrace.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 6ec503912bea..5183f269c244 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -24,6 +24,7 @@ TARGETS += ir
 TARGETS += kcmp
 TARGETS += kexec
 TARGETS += kvm
+TARGETS += landlock
 TARGETS += lib
 TARGETS += livepatch
 TARGETS += lkdtm
diff --git a/tools/testing/selftests/landlock/.gitignore b/tools/testing/selftests/landlock/.gitignore
new file mode 100644
index 000000000000..4ee53c733af0
--- /dev/null
+++ b/tools/testing/selftests/landlock/.gitignore
@@ -0,0 +1,3 @@
+/test_base
+/test_fs
+/test_ptrace
diff --git a/tools/testing/selftests/landlock/Makefile b/tools/testing/selftests/landlock/Makefile
new file mode 100644
index 000000000000..c7e26e1251c4
--- /dev/null
+++ b/tools/testing/selftests/landlock/Makefile
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0
+
+test_src := $(wildcard test_*.c)
+
+TEST_GEN_PROGS := $(test_src:.c=)
+
+usr_include := ../../../../usr/include
+
+CFLAGS += -Wall -O2 -I$(usr_include)
+
+include ../lib.mk
+
+$(TEST_GEN_PROGS): ../kselftest_harness.h $(usr_include)/linux/landlock.h
diff --git a/tools/testing/selftests/landlock/config b/tools/testing/selftests/landlock/config
new file mode 100644
index 000000000000..662f72c5a0df
--- /dev/null
+++ b/tools/testing/selftests/landlock/config
@@ -0,0 +1,4 @@
+CONFIG_HEADERS_INSTALL=y
+CONFIG_SECURITY_LANDLOCK=y
+CONFIG_SECURITY_PATH=y
+CONFIG_SECURITY=y
diff --git a/tools/testing/selftests/landlock/test.h b/tools/testing/selftests/landlock/test.h
new file mode 100644
index 000000000000..f9cebd8fc169
--- /dev/null
+++ b/tools/testing/selftests/landlock/test.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Landlock test helpers
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2019-2020 ANSSI
+ */
+
+#include <errno.h>
+#include <sys/syscall.h>
+
+#include "../kselftest_harness.h"
+
+#ifndef landlock
+static inline int landlock(unsigned int command, unsigned int options,
+		size_t attr_size, void *attr_ptr)
+{
+	errno = 0;
+	return syscall(__NR_landlock, command, options, attr_size, attr_ptr, 0,
+			NULL);
+}
+#endif
+
+FIXTURE(ruleset_rw) {
+	struct landlock_attr_ruleset attr_ruleset;
+	int ruleset_fd;
+};
+
+FIXTURE_SETUP(ruleset_rw) {
+	self->attr_ruleset.handled_access_fs = LANDLOCK_ACCESS_FS_READ |
+		LANDLOCK_ACCESS_FS_WRITE;
+	self->ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET,
+			sizeof(self->attr_ruleset), &self->attr_ruleset);
+	ASSERT_LE(0, self->ruleset_fd);
+}
+
+FIXTURE_TEARDOWN(ruleset_rw) {
+	ASSERT_EQ(0, close(self->ruleset_fd));
+}
diff --git a/tools/testing/selftests/landlock/test_base.c b/tools/testing/selftests/landlock/test_base.c
new file mode 100644
index 000000000000..1ac7dbead3b2
--- /dev/null
+++ b/tools/testing/selftests/landlock/test_base.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Landlock tests - common resources
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2019-2020 ANSSI
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/landlock.h>
+#include <sys/prctl.h>
+
+#include "test.h"
+
+#define FDINFO_TEMPLATE "/proc/self/fdinfo/%d"
+#define FDINFO_SIZE 128
+
+#ifndef O_PATH
+#define O_PATH		010000000
+#endif
+
+TEST_F(ruleset_rw, fdinfo)
+{
+	int fdinfo_fd, fdinfo_path_size, fdinfo_buf_size;
+	char fdinfo_path[sizeof(FDINFO_TEMPLATE) + 2];
+	char fdinfo_buf[FDINFO_SIZE];
+
+	fdinfo_path_size = snprintf(fdinfo_path, sizeof(fdinfo_path),
+			FDINFO_TEMPLATE, self->ruleset_fd);
+	ASSERT_LT(fdinfo_path_size, sizeof(fdinfo_path));
+
+	fdinfo_fd = open(fdinfo_path, O_RDONLY | O_CLOEXEC);
+	ASSERT_GE(fdinfo_fd, 0);
+
+	fdinfo_buf_size = read(fdinfo_fd, fdinfo_buf, sizeof(fdinfo_buf));
+	ASSERT_LE(fdinfo_buf_size, sizeof(fdinfo_buf) - 1);
+
+	/*
+	 * fdinfo_buf: pos:        0
+	 * flags:  02000002
+	 * mnt_id: 13
+	 * handled_access_fs:     804000
+	 */
+	EXPECT_EQ(0, close(fdinfo_fd));
+}
+
+TEST(features)
+{
+	struct landlock_attr_features attr_features = {
+		.options_get_features = ~0ULL,
+		.options_create_ruleset = ~0ULL,
+		.options_add_rule = ~0ULL,
+		.options_enforce_ruleset = ~0ULL,
+		.access_fs = ~0ULL,
+		.size_attr_ruleset = ~0ULL,
+		.size_attr_path_beneath = ~0ULL,
+	};
+
+	ASSERT_EQ(0, landlock(LANDLOCK_CMD_GET_FEATURES,
+				LANDLOCK_OPT_GET_FEATURES,
+				sizeof(attr_features), &attr_features));
+	ASSERT_EQ(((LANDLOCK_OPT_GET_FEATURES << 1) - 1),
+			attr_features.options_get_features);
+	ASSERT_EQ(((LANDLOCK_OPT_CREATE_RULESET << 1) - 1),
+			attr_features.options_create_ruleset);
+	ASSERT_EQ(((LANDLOCK_OPT_ADD_RULE_PATH_BENEATH << 1) - 1),
+			attr_features.options_add_rule);
+	ASSERT_EQ(((LANDLOCK_OPT_ENFORCE_RULESET << 1) - 1),
+			attr_features.options_enforce_ruleset);
+	ASSERT_EQ(((LANDLOCK_ACCESS_FS_MAP << 1) - 1),
+			attr_features.access_fs);
+	ASSERT_EQ(sizeof(struct landlock_attr_ruleset),
+		attr_features.size_attr_ruleset);
+	ASSERT_EQ(sizeof(struct landlock_attr_path_beneath),
+		attr_features.size_attr_path_beneath);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/landlock/test_fs.c b/tools/testing/selftests/landlock/test_fs.c
new file mode 100644
index 000000000000..627cb3a71f89
--- /dev/null
+++ b/tools/testing/selftests/landlock/test_fs.c
@@ -0,0 +1,624 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Landlock tests - filesystem
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2020 ANSSI
+ */
+
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <linux/landlock.h>
+#include <sched.h>
+#include <sys/mount.h>
+#include <sys/prctl.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "test.h"
+
+#define TMP_PREFIX "tmp_"
+
+/* Paths (sibling number and depth) */
+const char dir_s0_d1[] = TMP_PREFIX "a0";
+const char dir_s0_d2[] = TMP_PREFIX "a0/b0";
+const char dir_s0_d3[] = TMP_PREFIX "a0/b0/c0";
+const char dir_s1_d1[] = TMP_PREFIX "a1";
+const char dir_s2_d1[] = TMP_PREFIX "a2";
+const char dir_s2_d2[] = TMP_PREFIX "a2/b2";
+
+/* dir_s3_d1 is a tmpfs mount. */
+const char dir_s3_d1[] = TMP_PREFIX "a3";
+const char dir_s3_d2[] = TMP_PREFIX "a3/b3";
+
+/* dir_s4_d2 is a tmpfs mount. */
+const char dir_s4_d1[] = TMP_PREFIX "a4";
+const char dir_s4_d2[] = TMP_PREFIX "a4/b4";
+
+static void cleanup_layout1(void)
+{
+	rmdir(dir_s2_d2);
+	rmdir(dir_s2_d1);
+	rmdir(dir_s1_d1);
+	rmdir(dir_s0_d3);
+	rmdir(dir_s0_d2);
+	rmdir(dir_s0_d1);
+
+	/* dir_s3_d2 may be bind mounted */
+	umount(dir_s3_d2);
+	rmdir(dir_s3_d2);
+	umount(dir_s3_d1);
+	rmdir(dir_s3_d1);
+
+	umount(dir_s4_d2);
+	rmdir(dir_s4_d2);
+	rmdir(dir_s4_d1);
+}
+
+FIXTURE(layout1) {
+};
+
+FIXTURE_SETUP(layout1)
+{
+	cleanup_layout1();
+
+	/* Do not pollute the rest of the system. */
+	ASSERT_NE(-1, unshare(CLONE_NEWNS));
+
+	ASSERT_EQ(0, mkdir(dir_s0_d1, 0700));
+	ASSERT_EQ(0, mkdir(dir_s0_d2, 0700));
+	ASSERT_EQ(0, mkdir(dir_s0_d3, 0700));
+	ASSERT_EQ(0, mkdir(dir_s1_d1, 0700));
+	ASSERT_EQ(0, mkdir(dir_s2_d1, 0700));
+	ASSERT_EQ(0, mkdir(dir_s2_d2, 0700));
+
+	ASSERT_EQ(0, mkdir(dir_s3_d1, 0700));
+	ASSERT_EQ(0, mount("tmp", dir_s3_d1, "tmpfs", 0, NULL));
+	ASSERT_EQ(0, mkdir(dir_s3_d2, 0700));
+
+	ASSERT_EQ(0, mkdir(dir_s4_d1, 0700));
+	ASSERT_EQ(0, mkdir(dir_s4_d2, 0700));
+	ASSERT_EQ(0, mount("tmp", dir_s4_d2, "tmpfs", 0, NULL));
+}
+
+FIXTURE_TEARDOWN(layout1)
+{
+	/*
+	 * cleanup_layout1() would be denied here; use TEST(cleanup) instead.
+	 */
+}
+
+static void test_path_rel(struct __test_metadata *_metadata, int dirfd,
+		const char *path, int ret)
+{
+	int fd;
+	struct stat statbuf;
+
+	/* faccessat() cannot be restricted for now. */
+	ASSERT_EQ(ret, fstatat(dirfd, path, &statbuf, 0)) {
+		TH_LOG("fstatat path \"%s\" returned %s\n", path,
+				strerror(errno));
+	}
+	if (ret) {
+		ASSERT_EQ(EACCES, errno);
+	}
+	fd = openat(dirfd, path, O_DIRECTORY);
+	if (ret) {
+		ASSERT_EQ(-1, fd);
+		ASSERT_EQ(EACCES, errno);
+	} else {
+		ASSERT_NE(-1, fd);
+		EXPECT_EQ(0, close(fd));
+	}
+}
+
+static void test_path(struct __test_metadata *_metadata, const char *path,
+		int ret)
+{
+	test_path_rel(_metadata, AT_FDCWD, path, ret);
+}
+
+TEST_F(layout1, no_restriction)
+{
+	test_path(_metadata, dir_s0_d1, 0);
+	test_path(_metadata, dir_s0_d2, 0);
+	test_path(_metadata, dir_s0_d3, 0);
+	test_path(_metadata, dir_s1_d1, 0);
+	test_path(_metadata, dir_s2_d2, 0);
+}
+
+TEST_F(ruleset_rw, inval)
+{
+	int err;
+	struct landlock_attr_path_beneath path_beneath = {
+		.allowed_access = LANDLOCK_ACCESS_FS_READ |
+			LANDLOCK_ACCESS_FS_WRITE,
+		.parent_fd = -1,
+	};
+	struct landlock_attr_enforce attr_enforce;
+
+	path_beneath.ruleset_fd = self->ruleset_fd;
+	path_beneath.parent_fd = open(dir_s0_d2, O_PATH | O_NOFOLLOW |
+			O_DIRECTORY | O_CLOEXEC);
+	ASSERT_GE(path_beneath.parent_fd, 0);
+	err = landlock(LANDLOCK_CMD_ADD_RULE,
+			LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+			sizeof(path_beneath), &path_beneath);
+	ASSERT_EQ(errno, 0);
+	ASSERT_EQ(err, 0);
+	ASSERT_EQ(0, close(path_beneath.parent_fd));
+
+	/* Tests without O_PATH. */
+	path_beneath.parent_fd = open(dir_s0_d2, O_NOFOLLOW | O_DIRECTORY |
+			O_CLOEXEC);
+	ASSERT_GE(path_beneath.parent_fd, 0);
+	err = landlock(LANDLOCK_CMD_ADD_RULE,
+			LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+			sizeof(path_beneath), &path_beneath);
+	ASSERT_EQ(err, -1);
+	ASSERT_EQ(errno, EBADR);
+	errno = 0;
+	ASSERT_EQ(0, close(path_beneath.parent_fd));
+
+	/* Checks unhandled access. */
+	path_beneath.parent_fd = open(dir_s0_d2, O_PATH | O_NOFOLLOW |
+			O_DIRECTORY | O_CLOEXEC);
+	ASSERT_GE(path_beneath.parent_fd, 0);
+	path_beneath.allowed_access |= LANDLOCK_ACCESS_FS_EXECUTE;
+	err = landlock(LANDLOCK_CMD_ADD_RULE,
+			LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+			sizeof(path_beneath), &path_beneath);
+	ASSERT_EQ(errno, EINVAL);
+	errno = 0;
+	ASSERT_EQ(err, -1);
+	ASSERT_EQ(0, close(path_beneath.parent_fd));
+
+	err = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(errno, 0);
+	ASSERT_EQ(err, 0);
+
+	attr_enforce.ruleset_fd = self->ruleset_fd;
+	err = landlock(LANDLOCK_CMD_ENFORCE_RULESET,
+			LANDLOCK_OPT_ENFORCE_RULESET, sizeof(attr_enforce),
+			&attr_enforce);
+	ASSERT_EQ(errno, 0);
+	ASSERT_EQ(err, 0);
+}
+
+TEST_F(ruleset_rw, nsfs)
+{
+	struct landlock_attr_path_beneath path_beneath = {
+		.allowed_access = LANDLOCK_ACCESS_FS_READ |
+			LANDLOCK_ACCESS_FS_WRITE,
+		.ruleset_fd = self->ruleset_fd,
+	};
+	int err;
+
+	path_beneath.parent_fd = open("/proc/self/ns/mnt", O_PATH | O_NOFOLLOW |
+			O_CLOEXEC);
+	ASSERT_GE(path_beneath.parent_fd, 0);
+	err = landlock(LANDLOCK_CMD_ADD_RULE,
+			LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+			sizeof(path_beneath), &path_beneath);
+	ASSERT_EQ(errno, 0);
+	ASSERT_EQ(err, 0);
+	ASSERT_EQ(0, close(path_beneath.parent_fd));
+}
+
+static void add_path_beneath(struct __test_metadata *_metadata, int ruleset_fd,
+		__u64 allowed_access, const char *path)
+{
+	int err;
+	struct landlock_attr_path_beneath path_beneath = {
+		.ruleset_fd = ruleset_fd,
+		.allowed_access = allowed_access,
+	};
+
+	path_beneath.parent_fd = open(path, O_PATH | O_NOFOLLOW | O_DIRECTORY |
+			O_CLOEXEC);
+	ASSERT_GE(path_beneath.parent_fd, 0) {
+		TH_LOG("Failed to open directory \"%s\": %s\n", path,
+				strerror(errno));
+	}
+	err = landlock(LANDLOCK_CMD_ADD_RULE,
+			LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+			sizeof(path_beneath), &path_beneath);
+	ASSERT_EQ(err, 0) {
+		TH_LOG("Failed to update the ruleset with \"%s\": %s\n",
+				path, strerror(errno));
+	}
+	ASSERT_EQ(errno, 0);
+	ASSERT_EQ(0, close(path_beneath.parent_fd));
+}
+
+static int create_ruleset(struct __test_metadata *_metadata,
+		const char *const dirs[])
+{
+	int ruleset_fd, dirs_len, i;
+	struct landlock_attr_features attr_features;
+	struct landlock_attr_ruleset attr_ruleset = {
+		.handled_access_fs =
+			LANDLOCK_ACCESS_FS_OPEN |
+			LANDLOCK_ACCESS_FS_READ |
+			LANDLOCK_ACCESS_FS_WRITE |
+			LANDLOCK_ACCESS_FS_EXECUTE |
+			LANDLOCK_ACCESS_FS_GETATTR
+	};
+	__u64 allowed_access =
+			LANDLOCK_ACCESS_FS_OPEN |
+			LANDLOCK_ACCESS_FS_READ |
+			LANDLOCK_ACCESS_FS_GETATTR;
+
+	ASSERT_NE(NULL, dirs) {
+		TH_LOG("No directory list\n");
+	}
+	ASSERT_NE(NULL, dirs[0]) {
+		TH_LOG("Empty directory list\n");
+	}
+	/* Gets the number of dir entries. */
+	for (dirs_len = 0; dirs[dirs_len]; dirs_len++);
+
+	ASSERT_EQ(0, landlock(LANDLOCK_CMD_GET_FEATURES,
+				LANDLOCK_OPT_GET_FEATURES,
+				sizeof(attr_features), &attr_features));
+	/* Test only: real applications should use a binary AND instead. */
+	ASSERT_EQ(attr_ruleset.handled_access_fs,
+			attr_ruleset.handled_access_fs &
+			attr_features.access_fs);
+	ASSERT_EQ(allowed_access, allowed_access & attr_features.access_fs);
+	ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET, sizeof(attr_ruleset),
+			&attr_ruleset);
+	ASSERT_GE(ruleset_fd, 0) {
+		TH_LOG("Failed to create a ruleset: %s\n", strerror(errno));
+	}
+
+	for (i = 0; dirs[i]; i++) {
+		add_path_beneath(_metadata, ruleset_fd, allowed_access,
+				dirs[i]);
+	}
+	return ruleset_fd;
+}
+
+static void enforce_ruleset(struct __test_metadata *_metadata, int ruleset_fd)
+{
+	struct landlock_attr_enforce attr_enforce = {
+		.ruleset_fd = ruleset_fd,
+	};
+	int err;
+
+	err = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(errno, 0);
+	ASSERT_EQ(err, 0);
+
+	err = landlock(LANDLOCK_CMD_ENFORCE_RULESET,
+			LANDLOCK_OPT_ENFORCE_RULESET, sizeof(attr_enforce),
+			&attr_enforce);
+	ASSERT_EQ(err, 0) {
+		TH_LOG("Failed to enforce ruleset: %s\n", strerror(errno));
+	}
+	ASSERT_EQ(errno, 0);
+}
+
+TEST_F(layout1, whitelist)
+{
+	int ruleset_fd = create_ruleset(_metadata,
+			(const char *const []){ dir_s0_d2, dir_s1_d1, NULL });
+	ASSERT_NE(-1, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	test_path(_metadata, "/", -1);
+	test_path(_metadata, dir_s0_d1, -1);
+	test_path(_metadata, dir_s0_d2, 0);
+	test_path(_metadata, dir_s0_d3, 0);
+}
+
+TEST_F(layout1, unhandled_access)
+{
+	int ruleset_fd = create_ruleset(_metadata,
+			(const char *const []){ dir_s0_d2, NULL });
+	ASSERT_NE(-1, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/*
+	 * Because the policy does not handle LANDLOCK_ACCESS_FS_CHROOT,
+	 * chroot(2) should be allowed.
+	 */
+	ASSERT_EQ(0, chroot(dir_s0_d1));
+	ASSERT_EQ(0, chroot(dir_s0_d2));
+	ASSERT_EQ(0, chroot(dir_s0_d3));
+}
+
+TEST_F(layout1, ruleset_overlap)
+{
+	struct stat statbuf;
+	int open_fd;
+	int ruleset_fd = create_ruleset(_metadata,
+			(const char *const []){ dir_s1_d1, NULL });
+	ASSERT_NE(-1, ruleset_fd);
+	/* These rules should be ORed together. */
+	add_path_beneath(_metadata, ruleset_fd,
+			LANDLOCK_ACCESS_FS_GETATTR, dir_s0_d2);
+	add_path_beneath(_metadata, ruleset_fd,
+			LANDLOCK_ACCESS_FS_OPEN, dir_s0_d2);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d1, &statbuf, 0));
+	ASSERT_EQ(-1, openat(AT_FDCWD, dir_s0_d1, O_DIRECTORY));
+	ASSERT_EQ(0, fstatat(AT_FDCWD, dir_s0_d2, &statbuf, 0));
+	open_fd = openat(AT_FDCWD, dir_s0_d2, O_DIRECTORY);
+	ASSERT_LE(0, open_fd);
+	EXPECT_EQ(0, close(open_fd));
+	ASSERT_EQ(0, fstatat(AT_FDCWD, dir_s0_d3, &statbuf, 0));
+	open_fd = openat(AT_FDCWD, dir_s0_d3, O_DIRECTORY);
+	ASSERT_LE(0, open_fd);
+	EXPECT_EQ(0, close(open_fd));
+}
+
+TEST_F(layout1, inherit_superset)
+{
+	struct stat statbuf;
+	int ruleset_fd, open_fd;
+
+	ruleset_fd = create_ruleset(_metadata,
+			(const char *const []){ dir_s1_d1, NULL });
+	ASSERT_NE(-1, ruleset_fd);
+	add_path_beneath(_metadata, ruleset_fd,
+			LANDLOCK_ACCESS_FS_OPEN, dir_s0_d2);
+	enforce_ruleset(_metadata, ruleset_fd);
+
+	ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d1, &statbuf, 0));
+	ASSERT_EQ(-1, openat(AT_FDCWD, dir_s0_d1, O_DIRECTORY));
+
+	ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d2, &statbuf, 0));
+	open_fd = openat(AT_FDCWD, dir_s0_d2, O_DIRECTORY);
+	ASSERT_NE(-1, open_fd);
+	ASSERT_EQ(0, close(open_fd));
+
+	ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d3, &statbuf, 0));
+	open_fd = openat(AT_FDCWD, dir_s0_d3, O_DIRECTORY);
+	ASSERT_NE(-1, open_fd);
+	ASSERT_EQ(0, close(open_fd));
+
+	/*
+	 * Test shared rule extension: the following rules should not grant any
+	 * new access, only remove some.  Once enforced, these rules are ANDed
+	 * with the previous ones.
+	 */
+	add_path_beneath(_metadata, ruleset_fd, LANDLOCK_ACCESS_FS_GETATTR,
+			dir_s0_d2);
+	/*
+	 * In ruleset_fd, dir_s0_d2 should now have the LANDLOCK_ACCESS_FS_OPEN
+	 * and LANDLOCK_ACCESS_FS_GETATTR access rights (even if this directory
+	 * is opened a second time).  However, when enforcing this updated
+	 * ruleset, the ruleset tied to the current process will still only
+	 * grant LANDLOCK_ACCESS_FS_OPEN on dir_s0_d2:
+	 * LANDLOCK_ACCESS_FS_GETATTR must not be allowed because it would be a
+	 * privilege escalation.
+	 */
+	enforce_ruleset(_metadata, ruleset_fd);
+
+	/* Same tests and results as above. */
+	ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d1, &statbuf, 0));
+	ASSERT_EQ(-1, openat(AT_FDCWD, dir_s0_d1, O_DIRECTORY));
+
+	/* It is still forbidden to fstat(dir_s0_d2). */
+	ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d2, &statbuf, 0));
+	open_fd = openat(AT_FDCWD, dir_s0_d2, O_DIRECTORY);
+	ASSERT_NE(-1, open_fd);
+	ASSERT_EQ(0, close(open_fd));
+
+	ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d3, &statbuf, 0));
+	open_fd = openat(AT_FDCWD, dir_s0_d3, O_DIRECTORY);
+	ASSERT_NE(-1, open_fd);
+	ASSERT_EQ(0, close(open_fd));
+
+	/*
+	 * Now, dir_s0_d3 gets a new rule tied to it, only allowing
+	 * LANDLOCK_ACCESS_FS_GETATTR.  The kernel internal difference is that
+	 * there was no rule tied to it before.
+	 */
+	add_path_beneath(_metadata, ruleset_fd, LANDLOCK_ACCESS_FS_GETATTR,
+			dir_s0_d3);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/*
+	 * Same tests and results as above, except for open(dir_s0_d3) which is
+	 * now denied because the new rule masks the rule previously inherited
+	 * from dir_s0_d2.
+	 */
+	ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d1, &statbuf, 0));
+	ASSERT_EQ(-1, openat(AT_FDCWD, dir_s0_d1, O_DIRECTORY));
+
+	ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d2, &statbuf, 0));
+	open_fd = openat(AT_FDCWD, dir_s0_d2, O_DIRECTORY);
+	ASSERT_NE(-1, open_fd);
+	ASSERT_EQ(0, close(open_fd));
+
+	/* It is still forbidden to fstat(dir_s0_d3). */
+	ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d3, &statbuf, 0));
+	open_fd = openat(AT_FDCWD, dir_s0_d3, O_DIRECTORY);
+	/* open(dir_s0_d3) is now forbidden. */
+	ASSERT_EQ(-1, open_fd);
+	ASSERT_EQ(EACCES, errno);
+}
+
+TEST_F(layout1, extend_ruleset_with_denied_path)
+{
+	struct landlock_attr_path_beneath path_beneath = {
+		.allowed_access = LANDLOCK_ACCESS_FS_GETATTR,
+	};
+
+	path_beneath.ruleset_fd = create_ruleset(_metadata,
+			(const char *const []){ dir_s0_d2, NULL });
+	ASSERT_NE(-1, path_beneath.ruleset_fd);
+	enforce_ruleset(_metadata, path_beneath.ruleset_fd);
+
+	ASSERT_EQ(-1, open(dir_s0_d1, O_NOFOLLOW | O_DIRECTORY | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+
+	/*
+	 * Tests that we can't create a rule for which we are not allowed to
+	 * open its path.
+	 */
+	path_beneath.parent_fd = open(dir_s0_d1, O_PATH | O_NOFOLLOW
+			| O_DIRECTORY | O_CLOEXEC);
+	ASSERT_GE(path_beneath.parent_fd, 0);
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_ADD_RULE,
+				LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+				sizeof(path_beneath), &path_beneath));
+	ASSERT_EQ(EACCES, errno);
+	ASSERT_EQ(0, close(path_beneath.parent_fd));
+	EXPECT_EQ(0, close(path_beneath.ruleset_fd));
+}
+
+TEST_F(layout1, rule_on_mountpoint)
+{
+	int ruleset_fd = create_ruleset(_metadata,
+			(const char *const []){ dir_s0_d1, dir_s3_d1, NULL });
+	ASSERT_NE(-1, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	test_path(_metadata, dir_s1_d1, -1);
+	test_path(_metadata, dir_s0_d1, 0);
+	test_path(_metadata, dir_s3_d1, 0);
+}
+
+TEST_F(layout1, rule_over_mountpoint)
+{
+	int ruleset_fd = create_ruleset(_metadata,
+			(const char *const []){ dir_s4_d1, dir_s0_d1, NULL });
+	ASSERT_NE(-1, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	test_path(_metadata, dir_s4_d2, 0);
+	test_path(_metadata, dir_s0_d1, 0);
+	test_path(_metadata, dir_s4_d1, 0);
+}
+
+/*
+ * This test verifies that we can apply a Landlock rule on the root (/), which
+ * might require special handling.
+ */
+TEST_F(layout1, rule_over_root)
+{
+	int ruleset_fd = create_ruleset(_metadata,
+		(const char *const []){ "/", NULL });
+	ASSERT_NE(-1, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	test_path(_metadata, "/", 0);
+	test_path(_metadata, dir_s0_d1, 0);
+}
+
+TEST_F(layout1, rule_inside_mount_ns)
+{
+	ASSERT_NE(-1, mount(NULL, "/", NULL, MS_PRIVATE | MS_REC, NULL));
+	ASSERT_NE(-1, syscall(SYS_pivot_root, dir_s3_d1, dir_s3_d2));
+	ASSERT_NE(-1, chdir("/"));
+
+	int ruleset_fd = create_ruleset(_metadata,
+		(const char *const []){ "b3", NULL });
+	ASSERT_NE(-1, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	test_path(_metadata, "b3", 0);
+	test_path(_metadata, "/", -1);
+}
+
+TEST_F(layout1, mount_and_pivot)
+{
+	int ruleset_fd = create_ruleset(_metadata,
+		(const char *const []){ dir_s3_d1, NULL });
+	ASSERT_NE(-1, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	ASSERT_EQ(-1, mount(NULL, "/", NULL, MS_PRIVATE | MS_REC, NULL));
+	ASSERT_EQ(-1, syscall(SYS_pivot_root, dir_s3_d1, dir_s3_d2));
+}
+
+enum relative_access {
+	REL_OPEN,
+	REL_CHDIR,
+	REL_CHROOT,
+};
+
+static void check_access(struct __test_metadata *_metadata,
+		bool enforce, enum relative_access rel)
+{
+	int dirfd;
+	int ruleset_fd = -1;
+
+	if (enforce) {
+		ruleset_fd = create_ruleset(_metadata, (const char *const []){
+				dir_s0_d2, dir_s1_d1, NULL });
+		ASSERT_NE(-1, ruleset_fd);
+		if (rel == REL_CHROOT)
+			ASSERT_NE(-1, chdir(dir_s0_d2));
+		enforce_ruleset(_metadata, ruleset_fd);
+	} else if (rel == REL_CHROOT)
+		ASSERT_NE(-1, chdir(dir_s0_d2));
+	switch (rel) {
+	case REL_OPEN:
+		dirfd = open(dir_s0_d2, O_DIRECTORY);
+		ASSERT_NE(-1, dirfd);
+		break;
+	case REL_CHDIR:
+		ASSERT_NE(-1, chdir(dir_s0_d2));
+		dirfd = AT_FDCWD;
+		break;
+	case REL_CHROOT:
+		ASSERT_NE(-1, chroot(".")) {
+			TH_LOG("Failed to chroot: %s\n", strerror(errno));
+		}
+		dirfd = AT_FDCWD;
+		break;
+	default:
+		ASSERT_TRUE(false);
+		return;
+	}
+
+	test_path_rel(_metadata, dirfd, "..", (rel == REL_CHROOT) ? 0 : -1);
+	test_path_rel(_metadata, dirfd, ".", 0);
+	if (rel != REL_CHROOT) {
+		test_path_rel(_metadata, dirfd, "./c0", 0);
+		test_path_rel(_metadata, dirfd, "../../" TMP_PREFIX "a1", 0);
+		test_path_rel(_metadata, dirfd, "../../" TMP_PREFIX "a2", -1);
+	}
+
+	if (rel == REL_OPEN)
+		EXPECT_EQ(0, close(dirfd));
+	if (enforce)
+		EXPECT_EQ(0, close(ruleset_fd));
+}
+
+TEST_F(layout1, deny_open)
+{
+	check_access(_metadata, true, REL_OPEN);
+}
+
+TEST_F(layout1, deny_chdir)
+{
+	check_access(_metadata, true, REL_CHDIR);
+}
+
+TEST_F(layout1, deny_chroot)
+{
+	check_access(_metadata, true, REL_CHROOT);
+}
+
+TEST(cleanup)
+{
+	cleanup_layout1();
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/landlock/test_ptrace.c b/tools/testing/selftests/landlock/test_ptrace.c
new file mode 100644
index 000000000000..fcdb41e172d1
--- /dev/null
+++ b/tools/testing/selftests/landlock/test_ptrace.c
@@ -0,0 +1,293 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Landlock tests - ptrace
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2019-2020 ANSSI
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/landlock.h>
+#include <signal.h>
+#include <sys/prctl.h>
+#include <sys/ptrace.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "test.h"
+
+static void create_domain(struct __test_metadata *_metadata)
+{
+	int ruleset_fd, err;
+	struct landlock_attr_features attr_features;
+	struct landlock_attr_enforce attr_enforce;
+	struct landlock_attr_ruleset attr_ruleset = {
+		.handled_access_fs = LANDLOCK_ACCESS_FS_READ,
+	};
+	struct landlock_attr_path_beneath path_beneath = {
+		.allowed_access = LANDLOCK_ACCESS_FS_READ,
+	};
+
+	ASSERT_EQ(0, landlock(LANDLOCK_CMD_GET_FEATURES,
+				LANDLOCK_OPT_GET_FEATURES,
+				sizeof(attr_features), &attr_features));
+	/* Test only: real applications should use a binary AND instead. */
+	ASSERT_EQ(attr_ruleset.handled_access_fs,
+			attr_ruleset.handled_access_fs &
+			attr_features.access_fs);
+	ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET, sizeof(attr_ruleset),
+			&attr_ruleset);
+	ASSERT_GE(ruleset_fd, 0) {
+		TH_LOG("Failed to create a ruleset: %s\n", strerror(errno));
+	}
+	path_beneath.ruleset_fd = ruleset_fd;
+	path_beneath.parent_fd = open("/tmp", O_PATH | O_NOFOLLOW | O_DIRECTORY
+			| O_CLOEXEC);
+	ASSERT_GE(path_beneath.parent_fd, 0);
+	err = landlock(LANDLOCK_CMD_ADD_RULE,
+			LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+			sizeof(path_beneath), &path_beneath);
+	ASSERT_EQ(err, 0);
+	ASSERT_EQ(errno, 0);
+	ASSERT_EQ(0, close(path_beneath.parent_fd));
+
+	err = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(errno, 0);
+	ASSERT_EQ(err, 0);
+
+	attr_enforce.ruleset_fd = ruleset_fd;
+	err = landlock(LANDLOCK_CMD_ENFORCE_RULESET,
+			LANDLOCK_OPT_ENFORCE_RULESET, sizeof(attr_enforce),
+			&attr_enforce);
+	ASSERT_EQ(err, 0);
+	ASSERT_EQ(errno, 0);
+
+	ASSERT_EQ(0, close(ruleset_fd));
+}
+
+/* Tests PTRACE_TRACEME and PTRACE_ATTACH for parent and child. */
+static void check_ptrace(struct __test_metadata *_metadata,
+		bool domain_both, bool domain_parent, bool domain_child)
+{
+	pid_t child, parent;
+	int status;
+	int pipe_child[2], pipe_parent[2];
+	char buf_parent;
+
+	parent = getpid();
+	ASSERT_EQ(0, pipe(pipe_child));
+	ASSERT_EQ(0, pipe(pipe_parent));
+	if (domain_both)
+		create_domain(_metadata);
+
+	child = fork();
+	ASSERT_LE(0, child);
+	if (child == 0) {
+		char buf_child;
+
+		EXPECT_EQ(0, close(pipe_parent[1]));
+		EXPECT_EQ(0, close(pipe_child[0]));
+		if (domain_child)
+			create_domain(_metadata);
+
+		/* sync #1 */
+		ASSERT_EQ(1, read(pipe_parent[0], &buf_child, 1)) {
+			TH_LOG("Failed to read() sync #1 from parent");
+		}
+		ASSERT_EQ('.', buf_child);
+
+		/* Tests the parent protection. */
+		ASSERT_EQ(domain_child ? -1 : 0,
+				ptrace(PTRACE_ATTACH, parent, NULL, 0));
+		if (domain_child) {
+			ASSERT_EQ(EPERM, errno);
+		} else {
+			ASSERT_EQ(parent, waitpid(parent, &status, 0));
+			ASSERT_EQ(1, WIFSTOPPED(status));
+			ASSERT_EQ(0, ptrace(PTRACE_DETACH, parent, NULL, 0));
+		}
+
+		/* sync #2 */
+		ASSERT_EQ(1, write(pipe_child[1], ".", 1)) {
+			TH_LOG("Failed to write() sync #2 to parent");
+		}
+
+		/* Tests traceme. */
+		ASSERT_EQ(domain_parent ? -1 : 0, ptrace(PTRACE_TRACEME));
+		if (domain_parent) {
+			ASSERT_EQ(EPERM, errno);
+		} else {
+			ASSERT_EQ(0, raise(SIGSTOP));
+		}
+
+		/* sync #3 */
+		ASSERT_EQ(1, read(pipe_parent[0], &buf_child, 1)) {
+			TH_LOG("Failed to read() sync #3 from parent");
+		}
+		ASSERT_EQ('.', buf_child);
+		_exit(_metadata->passed ? EXIT_SUCCESS : EXIT_FAILURE);
+	}
+
+	EXPECT_EQ(0, close(pipe_child[1]));
+	EXPECT_EQ(0, close(pipe_parent[0]));
+	if (domain_parent)
+		create_domain(_metadata);
+
+	/* sync #1 */
+	ASSERT_EQ(1, write(pipe_parent[1], ".", 1)) {
+		TH_LOG("Failed to write() sync #1 to child");
+	}
+
+	/* Tests the parent protection. */
+	/* sync #2 */
+	ASSERT_EQ(1, read(pipe_child[0], &buf_parent, 1)) {
+		TH_LOG("Failed to read() sync #2 from child");
+	}
+	ASSERT_EQ('.', buf_parent);
+
+	/* Tests traceme. */
+	if (!domain_parent) {
+		ASSERT_EQ(child, waitpid(child, &status, 0));
+		ASSERT_EQ(1, WIFSTOPPED(status));
+		ASSERT_EQ(0, ptrace(PTRACE_DETACH, child, NULL, 0));
+	}
+	/* Tests attach. */
+	ASSERT_EQ(domain_parent ? -1 : 0,
+			ptrace(PTRACE_ATTACH, child, NULL, 0));
+	if (domain_parent) {
+		ASSERT_EQ(EPERM, errno);
+	} else {
+		ASSERT_EQ(child, waitpid(child, &status, 0));
+		ASSERT_EQ(1, WIFSTOPPED(status));
+		ASSERT_EQ(0, ptrace(PTRACE_DETACH, child, NULL, 0));
+	}
+
+	/* sync #3 */
+	ASSERT_EQ(1, write(pipe_parent[1], ".", 1)) {
+		TH_LOG("Failed to write() sync #3 to child");
+	}
+	ASSERT_EQ(child, waitpid(child, &status, 0));
+	if (WIFSIGNALED(status) || WEXITSTATUS(status))
+		_metadata->passed = 0;
+}
+
+/*
+ * Test multiple tracing combinations between a parent process P1 and a child
+ * process P2.
+ *
+ * Yama's scoped ptrace is presumed disabled.  If enabled, this optional
+ * restriction is enforced in addition to any Landlock check, which means that
+ * all P2 requests to trace P1 would be denied.
+ */
+
+/*
+ *        No domain
+ *
+ *   P1-.               P1 -> P2 : allow
+ *       \              P2 -> P1 : allow
+ *        'P2
+ */
+TEST(allow_without_domain) {
+	check_ptrace(_metadata, false, false, false);
+}
+
+/*
+ *        Child domain
+ *
+ *   P1--.              P1 -> P2 : allow
+ *        \             P2 -> P1 : deny
+ *        .'-----.
+ *        |  P2  |
+ *        '------'
+ */
+TEST(allow_with_one_domain) {
+	check_ptrace(_metadata, false, false, true);
+}
+
+/*
+ *        Parent domain
+ * .------.
+ * |  P1  --.           P1 -> P2 : deny
+ * '------'  \          P2 -> P1 : allow
+ *            '
+ *            P2
+ */
+TEST(deny_with_parent_domain) {
+	check_ptrace(_metadata, false, true, false);
+}
+
+/*
+ *        Parent + child domain (siblings)
+ * .------.
+ * |  P1  ---.          P1 -> P2 : deny
+ * '------'   \         P2 -> P1 : deny
+ *         .---'--.
+ *         |  P2  |
+ *         '------'
+ */
+TEST(deny_with_sibling_domain) {
+	check_ptrace(_metadata, false, true, true);
+}
+
+/*
+ *         Same domain (inherited)
+ * .-------------.
+ * | P1----.     |      P1 -> P2 : allow
+ * |        \    |      P2 -> P1 : allow
+ * |         '   |
+ * |         P2  |
+ * '-------------'
+ */
+TEST(allow_sibling_domain) {
+	check_ptrace(_metadata, true, false, false);
+}
+
+/*
+ *         Inherited + child domain
+ * .-----------------.
+ * |  P1----.        |  P1 -> P2 : allow
+ * |         \       |  P2 -> P1 : deny
+ * |        .-'----. |
+ * |        |  P2  | |
+ * |        '------' |
+ * '-----------------'
+ */
+TEST(allow_with_nested_domain) {
+	check_ptrace(_metadata, true, false, true);
+}
+
+/*
+ *         Inherited + parent domain
+ * .-----------------.
+ * |.------.         |  P1 -> P2 : deny
+ * ||  P1  ----.     |  P2 -> P1 : allow
+ * |'------'    \    |
+ * |             '   |
+ * |             P2  |
+ * '-----------------'
+ */
+TEST(deny_with_nested_and_parent_domain) {
+	check_ptrace(_metadata, true, true, false);
+}
+
+/*
+ *         Inherited + parent and child domain (siblings)
+ * .-----------------.
+ * | .------.        |  P1 -> P2 : deny
+ * | |  P1  .        |  P2 -> P1 : deny
+ * | '------'\       |
+ * |          \      |
+ * |        .--'---. |
+ * |        |  P2  | |
+ * |        '------' |
+ * '-----------------'
+ */
+TEST(deny_with_forked_domain) {
+	check_ptrace(_metadata, true, true, true);
+}
+
+TEST_HARNESS_MAIN
-- 
2.25.0


* [RFC PATCH v14 09/10] samples/landlock: Add a sandbox manager example
  2020-02-24 16:02 [RFC PATCH v14 00/10] Landlock LSM Mickaël Salaün
                   ` (7 preceding siblings ...)
  2020-02-24 16:02 ` [RFC PATCH v14 08/10] selftests/landlock: Add initial tests Mickaël Salaün
@ 2020-02-24 16:02 ` Mickaël Salaün
  2020-02-24 16:02 ` [RFC PATCH v14 10/10] landlock: Add user and kernel documentation Mickaël Salaün
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Mickaël Salaün @ 2020-02-24 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Al Viro, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Greg Kroah-Hartman, James Morris,
	Jann Horn, Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, kernel-hardening, linux-api, linux-arch,
	linux-doc, linux-fsdevel, linux-kselftest, linux-security-module,
	x86

Add a basic sandbox tool to launch a command which can only access a
whitelist of file hierarchies in a read-only or read-write way.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: James Morris <jmorris@namei.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v11:
* Add back the filesystem sandbox manager and update it to work with the
  new Landlock syscall.

Previous version:
https://lore.kernel.org/lkml/20190721213116.23476-9-mic@digikod.net/
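
Illustration only, not part of the patch: the sandboxer appears to take
its whitelists from the LL_FS_RO and LL_FS_RW environment variables as
':'-separated path lists.  Below is a minimal sketch of that splitting
step in plain standard C.  split_path_list() is a hypothetical helper
name chosen for this note; unlike parse_path() in sandboxer.c, it does
not use strsep(3) and it reports an allocation failure to the caller.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): splits a ':'-separated list in
 * place.  On success, *paths points to a malloc'ed array of pointers into
 * @list and the number of entries is returned.  Returns 0 with *paths set to
 * NULL when @list is NULL, and -1 when the allocation fails.
 */
static int split_path_list(char *list, char ***paths)
{
	int i, nb = 1;
	char *p;

	*paths = NULL;
	if (!list)
		return 0;

	/* One entry, plus one per separator. */
	for (p = list; *p; p++) {
		if (*p == ':')
			nb++;
	}

	*paths = malloc(nb * sizeof(**paths));
	if (!*paths)
		return -1;

	/* Replaces each separator with a NUL byte and records entry starts. */
	for (i = 0, p = list; i < nb; i++) {
		(*paths)[i] = p;
		p = strchr(p, ':');
		if (!p)
			break;
		*p++ = '\0';
	}
	return nb;
}
```

As with strsep(3), an empty string yields a single empty entry, so a
caller would still have to decide whether to reject empty paths.  The
caller owns the returned array and must free() it; the entries point
into the original buffer and must not be freed individually.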
---
 samples/Kconfig              |   7 ++
 samples/Makefile             |   1 +
 samples/landlock/.gitignore  |   1 +
 samples/landlock/Makefile    |  15 +++
 samples/landlock/sandboxer.c | 226 +++++++++++++++++++++++++++++++++++
 5 files changed, 250 insertions(+)
 create mode 100644 samples/landlock/.gitignore
 create mode 100644 samples/landlock/Makefile
 create mode 100644 samples/landlock/sandboxer.c

diff --git a/samples/Kconfig b/samples/Kconfig
index 9d236c346de5..5ec43a732b10 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -120,6 +120,13 @@ config SAMPLE_HIDRAW
 	bool "hidraw sample"
 	depends on HEADERS_INSTALL
 
+config SAMPLE_LANDLOCK
+	bool "Build Landlock sample code"
+	depends on HEADERS_INSTALL
+	help
+	  Build a simple Landlock sandbox manager able to launch a process
+	  restricted by a user-defined filesystem access-control security policy.
+
 config SAMPLE_PIDFD
 	bool "pidfd sample"
 	depends on HEADERS_INSTALL
diff --git a/samples/Makefile b/samples/Makefile
index f8f847b4f61f..61a2bd216f53 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_SAMPLE_KDB)		+= kdb/
 obj-$(CONFIG_SAMPLE_KFIFO)		+= kfifo/
 obj-$(CONFIG_SAMPLE_KOBJECT)		+= kobject/
 obj-$(CONFIG_SAMPLE_KPROBES)		+= kprobes/
+subdir-$(CONFIG_SAMPLE_LANDLOCK)	+= landlock
 obj-$(CONFIG_SAMPLE_LIVEPATCH)		+= livepatch/
 subdir-$(CONFIG_SAMPLE_PIDFD)		+= pidfd
 obj-$(CONFIG_SAMPLE_QMI_CLIENT)		+= qmi/
diff --git a/samples/landlock/.gitignore b/samples/landlock/.gitignore
new file mode 100644
index 000000000000..f43668b2d318
--- /dev/null
+++ b/samples/landlock/.gitignore
@@ -0,0 +1 @@
+/sandboxer
diff --git a/samples/landlock/Makefile b/samples/landlock/Makefile
new file mode 100644
index 000000000000..9dfb571641ba
--- /dev/null
+++ b/samples/landlock/Makefile
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: BSD-3-Clause
+
+hostprogs-y := sandboxer
+
+always := $(hostprogs-y)
+
+KBUILD_HOSTCFLAGS += -I$(objtree)/usr/include
+
+.PHONY: all clean
+
+all:
+	$(MAKE) -C ../.. samples/landlock/
+
+clean:
+	$(MAKE) -C ../.. M=samples/landlock/ clean
diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c
new file mode 100644
index 000000000000..882c12f71edb
--- /dev/null
+++ b/samples/landlock/sandboxer.c
@@ -0,0 +1,226 @@
+// SPDX-License-Identifier: BSD-3-Clause
+/*
+ * Simple Landlock sandbox manager able to launch a process restricted by a
+ * user-defined filesystem access-control security policy.
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2020 ANSSI
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/landlock.h>
+#include <linux/prctl.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/prctl.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+
+#ifndef landlock
+
+#ifndef __NR_landlock
+#define __NR_landlock 436
+#endif
+
+static inline int landlock(unsigned int command, unsigned int options,
+		size_t attr_size, void *attr_ptr)
+{
+	errno = 0;
+	return syscall(__NR_landlock, command, options, attr_size, attr_ptr, 0,
+			NULL);
+}
+#endif
+
+#define ENV_FS_RO_NAME "LL_FS_RO"
+#define ENV_FS_RW_NAME "LL_FS_RW"
+#define ENV_PATH_TOKEN ":"
+
+static int parse_path(char *env_path, const char ***path_list)
+{
+	int i, path_nb = 0;
+
+	if (env_path) {
+		path_nb++;
+		for (i = 0; env_path[i]; i++) {
+			if (env_path[i] == ENV_PATH_TOKEN[0])
+				path_nb++;
+		}
+	}
+	*path_list = malloc(path_nb * sizeof(**path_list));
+	for (i = 0; i < path_nb; i++)
+		(*path_list)[i] = strsep(&env_path, ENV_PATH_TOKEN);
+
+	return path_nb;
+}
+
+static int populate_ruleset(const struct landlock_attr_features *attr_features,
+		const char *env_var, int ruleset_fd, __u64 allowed_access)
+{
+	int path_nb, i;
+	char *env_path_name;
+	const char **path_list = NULL;
+	struct landlock_attr_path_beneath path_beneath = {
+		.ruleset_fd = ruleset_fd,
+		.allowed_access = allowed_access,
+		.parent_fd = -1,
+	};
+
+	env_path_name = getenv(env_var);
+	if (!env_path_name) {
+		fprintf(stderr, "Missing environment variable %s\n", env_var);
+		return 1;
+	}
+	env_path_name = strdup(env_path_name);
+	unsetenv(env_var);
+	path_nb = parse_path(env_path_name, &path_list);
+	if (path_nb == 1 && path_list[0][0] == '\0') {
+		fprintf(stderr, "Missing path in %s\n", env_var);
+		goto err_free_name;
+	}
+
+	/* follow a best-effort approach */
+	path_beneath.allowed_access &= attr_features->access_fs;
+	for (i = 0; i < path_nb; i++) {
+		path_beneath.parent_fd = open(path_list[i],
+				O_PATH | O_NOFOLLOW | O_CLOEXEC);
+		if (path_beneath.parent_fd < 0) {
+			fprintf(stderr, "Failed to open \"%s\": %s\n",
+					path_list[i],
+					strerror(errno));
+			goto err_free_name;
+		}
+		if (landlock(LANDLOCK_CMD_ADD_RULE,
+					LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+					sizeof(path_beneath), &path_beneath)) {
+			fprintf(stderr, "Failed to update the ruleset with \"%s\": %s\n",
+					path_list[i], strerror(errno));
+			close(path_beneath.parent_fd);
+			goto err_free_name;
+		}
+		close(path_beneath.parent_fd);
+	}
+	free(env_path_name);
+	return 0;
+
+err_free_name:
+	free(env_path_name);
+	return 1;
+}
+
+#define ACCESS_FS_ROUGHLY_READ ( \
+	LANDLOCK_ACCESS_FS_READ | \
+	LANDLOCK_ACCESS_FS_READDIR | \
+	LANDLOCK_ACCESS_FS_GETATTR | \
+	LANDLOCK_ACCESS_FS_EXECUTE | \
+	LANDLOCK_ACCESS_FS_CHROOT)
+
+#define ACCESS_FS_ROUGHLY_WRITE ( \
+	LANDLOCK_ACCESS_FS_WRITE | \
+	LANDLOCK_ACCESS_FS_TRUNCATE | \
+	LANDLOCK_ACCESS_FS_LOCK | \
+	LANDLOCK_ACCESS_FS_CHMOD | \
+	LANDLOCK_ACCESS_FS_CHOWN | \
+	LANDLOCK_ACCESS_FS_CHGRP | \
+	LANDLOCK_ACCESS_FS_IOCTL | \
+	LANDLOCK_ACCESS_FS_LINK_TO | \
+	LANDLOCK_ACCESS_FS_RENAME_FROM | \
+	LANDLOCK_ACCESS_FS_RENAME_TO | \
+	LANDLOCK_ACCESS_FS_RMDIR | \
+	LANDLOCK_ACCESS_FS_UNLINK | \
+	LANDLOCK_ACCESS_FS_MAKE_CHAR | \
+	LANDLOCK_ACCESS_FS_MAKE_DIR | \
+	LANDLOCK_ACCESS_FS_MAKE_REG | \
+	LANDLOCK_ACCESS_FS_MAKE_SOCK | \
+	LANDLOCK_ACCESS_FS_MAKE_FIFO | \
+	LANDLOCK_ACCESS_FS_MAKE_BLOCK | \
+	LANDLOCK_ACCESS_FS_MAKE_SYM)
+
+int main(int argc, char * const argv[], char * const *envp)
+{
+	char *cmd_path;
+	char * const *cmd_argv;
+	int ruleset_fd;
+	struct landlock_attr_features attr_features;
+	struct landlock_attr_ruleset ruleset = {
+		/* handle all the read and write access rights defined above */
+		.handled_access_fs = ACCESS_FS_ROUGHLY_READ |
+			ACCESS_FS_ROUGHLY_WRITE,
+	};
+	struct landlock_attr_enforce attr_enforce = {};
+
+	if (argc < 2) {
+		fprintf(stderr, "usage: %s=\"...\" %s=\"...\" %s <cmd> [args]...\n\n",
+				ENV_FS_RO_NAME, ENV_FS_RW_NAME, argv[0]);
+		fprintf(stderr, "Launch a command in a restricted environment.\n\n");
+		fprintf(stderr, "Environment variables containing paths, each separated by a colon:\n");
+		fprintf(stderr, "* %s: list of paths allowed to be used in a read-only way.\n",
+				ENV_FS_RO_NAME);
+		fprintf(stderr, "* %s: list of paths allowed to be used in a read-write way.\n",
+				ENV_FS_RW_NAME);
+		fprintf(stderr, "\nexample:\n"
+				"%s=\"/bin:/lib:/usr\" "
+				"%s=\"/dev/pts\" "
+				"%s /bin/bash -i\n",
+				ENV_FS_RO_NAME, ENV_FS_RW_NAME, argv[0]);
+		return 1;
+	}
+
+	if (landlock(LANDLOCK_CMD_GET_FEATURES, LANDLOCK_OPT_GET_FEATURES,
+				sizeof(attr_features), &attr_features)) {
+		perror("Failed to probe the Landlock supported features");
+		switch (errno) {
+		case ENOSYS:
+			fprintf(stderr, "Hint: this kernel does not support Landlock.\n");
+			break;
+		case ENOPKG:
+			fprintf(stderr, "Hint: Landlock is currently disabled. It can be enabled in the kernel configuration or at boot with the \"lsm=landlock\" parameter.\n");
+			break;
+		}
+		return 1;
+	}
+	/* follow a best-effort approach */
+	ruleset.handled_access_fs &= attr_features.access_fs;
+	ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset),
+			&ruleset);
+	if (ruleset_fd < 0) {
+		perror("Failed to create a ruleset");
+		return 1;
+	}
+	if (populate_ruleset(&attr_features, ENV_FS_RO_NAME, ruleset_fd,
+				ACCESS_FS_ROUGHLY_READ)) {
+		goto err_close_ruleset;
+	}
+	if (populate_ruleset(&attr_features, ENV_FS_RW_NAME, ruleset_fd,
+				ACCESS_FS_ROUGHLY_READ |
+				ACCESS_FS_ROUGHLY_WRITE)) {
+		goto err_close_ruleset;
+	}
+	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+		perror("Failed to restrict privileges");
+		goto err_close_ruleset;
+	}
+	attr_enforce.ruleset_fd = ruleset_fd;
+	if (landlock(LANDLOCK_CMD_ENFORCE_RULESET,
+				LANDLOCK_OPT_ENFORCE_RULESET,
+				sizeof(attr_enforce), &attr_enforce)) {
+		perror("Failed to enforce ruleset");
+		goto err_close_ruleset;
+	}
+	close(ruleset_fd);
+
+	cmd_path = argv[1];
+	cmd_argv = argv + 1;
+	execve(cmd_path, cmd_argv, envp);
+	fprintf(stderr, "Failed to execute \"%s\"\n", cmd_path);
+	fprintf(stderr, "Hint: access to the binary or its shared libraries may be denied.\n");
+	return 1;
+
+err_close_ruleset:
+	close(ruleset_fd);
+	return 1;
+}
-- 
2.25.0


* [RFC PATCH v14 10/10] landlock: Add user and kernel documentation
  2020-02-24 16:02 [RFC PATCH v14 00/10] Landlock LSM Mickaël Salaün
                   ` (8 preceding siblings ...)
  2020-02-24 16:02 ` [RFC PATCH v14 09/10] samples/landlock: Add a sandbox manager example Mickaël Salaün
@ 2020-02-24 16:02 ` Mickaël Salaün
  2020-02-29 17:23   ` Randy Dunlap
  2020-02-25 18:49 ` [RFC PATCH v14 00/10] Landlock LSM J Freyensee
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Mickaël Salaün @ 2020-02-24 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Al Viro, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Greg Kroah-Hartman, James Morris,
	Jann Horn, Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, kernel-hardening, linux-api, linux-arch,
	linux-doc, linux-fsdevel, linux-kselftest, linux-security-module,
	x86

This documentation can be built with the Sphinx framework.

Another location might be more appropriate, though.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: Vincent Dagonneau <vincent.dagonneau@ssi.gouv.fr>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: James Morris <jmorris@namei.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v13:
* Rewrote the documentation according to the major revamp.

Previous version:
https://lore.kernel.org/lkml/20191104172146.30797-8-mic@digikod.net/
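
Reading aid for the "Ptrace restrictions" section of user.rst below: the rule
that the tracee must be in a sub-domain of the tracer can be pictured as a
walk up a chain of domains.  This is only a userspace sketch under stated
assumptions: struct domain, its parent pointer and tracer_may_trace() are
hypothetical names, and treating "same domain" as allowed is my reading of
"subset", not something this patch states.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each domain remembers the domain it was derived from. */
struct domain {
	const struct domain *parent;	/* NULL for a first-level domain */
};

/*
 * Returns true if @tracee is @tracer's domain or one of its descendants.
 * A NULL domain models an unsandboxed task.
 */
static bool tracer_may_trace(const struct domain *tracer,
		const struct domain *tracee)
{
	const struct domain *walker;

	/* An unsandboxed tracer gets no additional restriction here. */
	if (!tracer)
		return true;
	/* Walk from the tracee's domain up towards the root. */
	for (walker = tracee; walker; walker = walker->parent) {
		if (walker == tracer)
			return true;
	}
	return false;
}
```

With this model, a sandboxed tracer can never trace an unsandboxed task or a
task sandboxed in an unrelated domain, because the walk never reaches the
tracer's domain.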
---
 Documentation/security/index.rst           |   1 +
 Documentation/security/landlock/index.rst  |  18 ++
 Documentation/security/landlock/kernel.rst |  44 ++++
 Documentation/security/landlock/user.rst   | 233 +++++++++++++++++++++
 4 files changed, 296 insertions(+)
 create mode 100644 Documentation/security/landlock/index.rst
 create mode 100644 Documentation/security/landlock/kernel.rst
 create mode 100644 Documentation/security/landlock/user.rst

diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
index fc503dd689a7..4d213e76ddf4 100644
--- a/Documentation/security/index.rst
+++ b/Documentation/security/index.rst
@@ -15,3 +15,4 @@ Security Documentation
    self-protection
    siphash
    tpm/index
+   landlock/index
diff --git a/Documentation/security/landlock/index.rst b/Documentation/security/landlock/index.rst
new file mode 100644
index 000000000000..dbd33b96ce60
--- /dev/null
+++ b/Documentation/security/landlock/index.rst
@@ -0,0 +1,18 @@
+=========================================
+Landlock LSM: unprivileged access control
+=========================================
+
+:Author: Mickaël Salaün
+
+The goal of Landlock is to restrict ambient rights (e.g. global filesystem
+access) for a set of processes.  Because Landlock is a stackable LSM, it makes
+it possible to create safe security sandboxes as new security layers in
+addition to the existing system-wide access-controls.  This kind of sandbox is
+expected to help mitigate the security impact of bugs or unexpected/malicious
+behaviors in user-space applications.  Landlock empowers any process,
+including unprivileged ones, to securely restrict itself.
+
+.. toctree::
+
+    user
+    kernel
diff --git a/Documentation/security/landlock/kernel.rst b/Documentation/security/landlock/kernel.rst
new file mode 100644
index 000000000000..b87769909029
--- /dev/null
+++ b/Documentation/security/landlock/kernel.rst
@@ -0,0 +1,44 @@
+==============================
+Landlock: kernel documentation
+==============================
+
+Landlock's goal is to create scoped access-control (i.e. sandboxing).  To
+harden a whole system, this feature should be available to any process,
+including unprivileged ones.  Because such a process may be compromised or
+backdoored (i.e. untrusted), Landlock's features must be safe to use from the
+point of view of the kernel and of other processes.  Landlock's interface must
+therefore expose a minimal attack surface.
+
+Landlock is designed to be usable by unprivileged processes while following the
+system security policy enforced by other access control mechanisms (e.g. DAC,
+LSM).  Indeed, a Landlock rule shall not interfere with other access-controls
+enforced on the system, only add more restrictions.
+
+Any user can enforce Landlock rulesets on their processes.  They are merged and
+evaluated according to the inherited ones in a way that ensures that only more
+constraints can be added.
+
+
+Guiding principles for safe access controls
+===========================================
+
+* A Landlock rule shall be focused on access control on kernel objects instead
+  of syscall filtering (i.e. syscall arguments), which is the purpose of
+  seccomp-bpf.
+* To avoid multiple kinds of side-channel attacks (e.g. leak of security
+  policies, CPU-based attacks), Landlock rules shall not be able to
+  programmatically communicate with user space.
+* Kernel access checks shall not slow down access requests from unsandboxed
+  processes.
+* Computation related to Landlock operations (e.g. enforce a ruleset) shall
+  only impact the processes requesting them.
+
+
+Landlock rulesets and domains
+=============================
+
+A domain is a read-only ruleset tied to a set of subjects (i.e. tasks).  A
+domain can transition to a new one which is the intersection of the constraints
+from the current and a new ruleset.  The definition of a subject is implicit
+for a task sandboxing itself, which makes the reasoning much easier and helps
+avoid pitfalls.
diff --git a/Documentation/security/landlock/user.rst b/Documentation/security/landlock/user.rst
new file mode 100644
index 000000000000..cbd7f61fca8c
--- /dev/null
+++ b/Documentation/security/landlock/user.rst
@@ -0,0 +1,233 @@
+=================================
+Landlock: userspace documentation
+=================================
+
+Landlock rules
+==============
+
+A Landlock rule describes an action on an object.  An object is currently a
+file hierarchy, and the related filesystem actions are defined in
+`Access rights`_.  A set of rules is aggregated in a ruleset, which can then
+restrict the thread enforcing it, and its future children.
+
+
+Defining and enforcing a security policy
+----------------------------------------
+
+Before defining a security policy, an application should first probe for the
+features supported by the running kernel, which is important to be compatible
+with older kernels.  This can be done thanks to the `landlock` syscall (cf.
+:ref:`syscall`).
+
+.. code-block:: c
+
+    struct landlock_attr_features attr_features;
+
+    if (landlock(LANDLOCK_CMD_GET_FEATURES, LANDLOCK_OPT_GET_FEATURES,
+            sizeof(attr_features), &attr_features)) {
+        perror("Failed to probe the Landlock supported features");
+        return 1;
+    }
+
+Then, we need to create the ruleset that will contain our rules.  For this
+example, the ruleset will contain rules which only allow read actions, while
+write actions will be denied.  The ruleset then needs to handle both of these
+kinds of actions.  For backward compatibility, these actions should be ANDed
+with the supported ones.
+
+.. code-block:: c
+
+    int ruleset_fd;
+    struct landlock_attr_ruleset ruleset = {
+        .handled_access_fs =
+            LANDLOCK_ACCESS_FS_READ |
+            LANDLOCK_ACCESS_FS_READDIR |
+            LANDLOCK_ACCESS_FS_EXECUTE |
+            LANDLOCK_ACCESS_FS_WRITE |
+            LANDLOCK_ACCESS_FS_TRUNCATE |
+            LANDLOCK_ACCESS_FS_CHMOD |
+            LANDLOCK_ACCESS_FS_CHOWN |
+            LANDLOCK_ACCESS_FS_CHGRP |
+            LANDLOCK_ACCESS_FS_LINK_TO |
+            LANDLOCK_ACCESS_FS_RENAME_FROM |
+            LANDLOCK_ACCESS_FS_RENAME_TO |
+            LANDLOCK_ACCESS_FS_RMDIR |
+            LANDLOCK_ACCESS_FS_UNLINK |
+            LANDLOCK_ACCESS_FS_MAKE_CHAR |
+            LANDLOCK_ACCESS_FS_MAKE_DIR |
+            LANDLOCK_ACCESS_FS_MAKE_REG |
+            LANDLOCK_ACCESS_FS_MAKE_SOCK |
+            LANDLOCK_ACCESS_FS_MAKE_FIFO |
+            LANDLOCK_ACCESS_FS_MAKE_BLOCK |
+            LANDLOCK_ACCESS_FS_MAKE_SYM,
+    };
+
+    ruleset.handled_access_fs &= attr_features.access_fs;
+    ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
+                    LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
+    if (ruleset_fd < 0) {
+        perror("Failed to create a ruleset");
+        return 1;
+    }
+
+We can now add a new rule to this ruleset thanks to the returned file
+descriptor referring to this ruleset.  The rule will only allow reading the
+file hierarchy ``/usr``.  Without another rule, write actions would then be
+denied by the ruleset.  To add ``/usr`` to the ruleset, we open it with the
+``O_PATH`` flag and fill the &struct landlock_attr_path_beneath with this file
+descriptor.
+
+.. code-block:: c
+
+    int err;
+    struct landlock_attr_path_beneath path_beneath = {
+        .ruleset_fd = ruleset_fd,
+        .allowed_access =
+            LANDLOCK_ACCESS_FS_READ |
+            LANDLOCK_ACCESS_FS_READDIR |
+            LANDLOCK_ACCESS_FS_EXECUTE,
+    };
+
+    path_beneath.allowed_access &= attr_features.access_fs;
+    path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
+    if (path_beneath.parent_fd < 0) {
+        perror("Failed to open file");
+        close(ruleset_fd);
+        return 1;
+    }
+    err = landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+            sizeof(path_beneath), &path_beneath);
+    close(path_beneath.parent_fd);
+    if (err) {
+        perror("Failed to update ruleset");
+        close(ruleset_fd);
+        return 1;
+    }
+
+We now have a ruleset with one rule allowing read access to ``/usr`` while
+denying all accesses handled by ``ruleset.handled_access_fs`` to everything
+else on the filesystem.  The next step is to restrict the current thread from
+gaining more privileges (e.g. thanks to a SUID binary).
+
+.. code-block:: c
+
+    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+        perror("Failed to restrict privileges");
+        close(ruleset_fd);
+        return 1;
+    }
+
+The current thread is now ready to sandbox itself with the ruleset.
+
+.. code-block:: c
+
+    struct landlock_attr_enforce attr_enforce = {
+        .ruleset_fd = ruleset_fd,
+    };
+
+    if (landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
+            sizeof(attr_enforce), &attr_enforce)) {
+        perror("Failed to enforce ruleset");
+        close(ruleset_fd);
+        return 1;
+    }
+    close(ruleset_fd);
+
+If this last system call succeeds, the current thread is now restricted and
+this policy will be enforced on all its subsequently created children as well.
+Once a thread is landlocked, there is no way to remove its security policy;
+only adding more restrictions is allowed.  These threads are now in a new
+Landlock domain, a merge of their parent's domain (if any) and the new ruleset.
+
+A complete working example can be found in `samples/landlock/sandboxer.c`_.
+
+
+Inheritance
+-----------
+
+Every new thread resulting from a :manpage:`clone(2)` inherits Landlock domain
+restrictions from its parent.  This is similar to the seccomp inheritance (cf.
+:doc:`/userspace-api/seccomp_filter`) or any other LSM dealing with a task's
+:manpage:`credentials(7)`.  For instance, one process's thread may apply
+Landlock rules to itself, but they will not be automatically applied to other
+sibling threads (unlike POSIX thread credential changes, cf.
+:manpage:`nptl(7)`).
+
+
+Ptrace restrictions
+-------------------
+
+A sandboxed process has fewer privileges than a non-sandboxed process and must
+then be subject to additional restrictions when manipulating another process.
+To be allowed to use :manpage:`ptrace(2)` and related syscalls on a target
+process, a sandboxed process should have a subset of the target process's
+rules, which means the tracee must be in a sub-domain of the tracer.
+
+
+.. _syscall:
+
+The `landlock` syscall and its arguments
+========================================
+
+.. kernel-doc:: security/landlock/syscall.c
+    :functions: sys_landlock
+
+Commands
+--------
+
+.. kernel-doc:: include/uapi/linux/landlock.h
+    :functions: landlock_cmd
+
+Options
+-------
+
+.. kernel-doc:: include/uapi/linux/landlock.h
+    :functions: options_intro
+                options_get_features options_create_ruleset
+                options_add_rule options_enforce_ruleset
+
+Attributes
+----------
+
+.. kernel-doc:: include/uapi/linux/landlock.h
+    :functions: landlock_attr_features landlock_attr_ruleset
+                landlock_attr_path_beneath landlock_attr_enforce
+
+Access rights
+-------------
+
+.. kernel-doc:: include/uapi/linux/landlock.h
+    :functions: fs_access
+
+
+Questions and answers
+=====================
+
+What about user space sandbox managers?
+---------------------------------------
+
+Using a user space process to enforce restrictions on kernel resources can lead
+to race conditions or inconsistent evaluations (i.e. `Incorrect mirroring of
+the OS code and state
+<https://www.ndss-symposium.org/ndss2003/traps-and-pitfalls-practical-problems-system-call-interposition-based-security-tools/>`_).
+
+What about namespaces and containers?
+-------------------------------------
+
+Namespaces can help create sandboxes but they are not designed for
+access-control and therefore lack useful features for such a use case (e.g. no
+fine-grained restrictions).  Moreover, their complexity can lead to security
+issues, especially when untrusted processes can manipulate them (cf.
+`Controlling access to user namespaces <https://lwn.net/Articles/673597/>`_).
+
+
+Additional documentation
+========================
+
+See https://landlock.io
+
+
+.. Links
+.. _samples/landlock/sandboxer.c: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/samples/landlock/sandboxer.c
+.. _tools/testing/selftests/landlock/: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/landlock/
+.. _tools/testing/selftests/landlock/test_ptrace.c: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/landlock/test_ptrace.c
-- 
2.25.0


* Re: [RFC PATCH v14 00/10] Landlock LSM
  2020-02-24 16:02 [RFC PATCH v14 00/10] Landlock LSM Mickaël Salaün
                   ` (9 preceding siblings ...)
  2020-02-24 16:02 ` [RFC PATCH v14 10/10] landlock: Add user and kernel documentation Mickaël Salaün
@ 2020-02-25 18:49 ` J Freyensee
  2020-02-26 15:34   ` Mickaël Salaün
       [not found] ` <20200227042002.3032-1-hdanton@sina.com>
  2020-03-09 23:44 ` [RFC PATCH v14 00/10] Landlock LSM Jann Horn
  12 siblings, 1 reply; 34+ messages in thread
From: J Freyensee @ 2020-02-25 18:49 UTC (permalink / raw)
  To: Mickaël Salaün, linux-kernel
  Cc: Al Viro, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Greg Kroah-Hartman, James Morris, Jann Horn, Jonathan Corbet,
	Kees Cook, Michael Kerrisk, Mickaël Salaün,
	Serge E . Hallyn, Shuah Khan, Vincent Dagonneau,
	kernel-hardening, linux-api, linux-arch, linux-doc,
	linux-fsdevel, linux-kselftest, linux-security-module, x86



On 2/24/20 8:02 AM, Mickaël Salaün wrote:

> ## Syscall
>
> Because it is only tested on x86_64, the syscall is only wired up for
> this architecture.  The whole x86 family (and probably all the others)
> will be supported in the next patch series.
General question for you: what is meant by "whole x86 family will be
supported"?  Will 32-bit x86 be supported?

Thanks,
Jay


* Re: [RFC PATCH v14 01/10] landlock: Add object and rule management
  2020-02-24 16:02 ` [RFC PATCH v14 01/10] landlock: Add object and rule management Mickaël Salaün
@ 2020-02-25 20:49   ` Jann Horn
  2020-02-26 15:31     ` Mickaël Salaün
  0 siblings, 1 reply; 34+ messages in thread
From: Jann Horn @ 2020-02-25 20:49 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: kernel list, Al Viro, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Greg Kroah-Hartman, James Morris, Jann Horn,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, Kernel Hardening, Linux API, linux-arch,
	linux-doc, linux-fsdevel, open list:KERNEL SELFTEST FRAMEWORK,
	linux-security-module, the arch/x86 maintainers

On Mon, Feb 24, 2020 at 5:05 PM Mickaël Salaün <mic@digikod.net> wrote:
> A Landlock object enables to identify a kernel object (e.g. an inode).
> A Landlock rule is a set of access rights allowed on an object.  Rules
> are grouped in rulesets that may be tied to a set of processes (i.e.
> subjects) to enforce a scoped access-control (i.e. a domain).
>
> Because Landlock's goal is to empower any process (especially
> unprivileged ones) to sandbox themselves, we can't rely on a system-wide
> object identification such as file extended attributes.  Indeed, we need
> innocuous, composable and modular access-controls.
>
> The main challenge with this constraints is to identify kernel objects
> while this identification is useful (i.e. when a security policy makes
> use of this object).  But this identification data should be freed once
> no policy is using it.  This ephemeral tagging should not and may not be
> written in the filesystem.  We then need to manage the lifetime of a
> rule according to the lifetime of its object.  To avoid a global lock,
> this implementation make use of RCU and counters to safely reference
> objects.
>
> A following commit uses this generic object management for inodes.
[...]
> diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
> new file mode 100644
> index 000000000000..4a321d5b3f67
> --- /dev/null
> +++ b/security/landlock/Kconfig
> @@ -0,0 +1,15 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config SECURITY_LANDLOCK
> +       bool "Landlock support"
> +       depends on SECURITY
> +       default n

(I think "default n" is implicit?)

> +       help
> +         This selects Landlock, a safe sandboxing mechanism.  It enables to
> +         restrict processes on the fly (i.e. enforce an access control policy),
> +         which can complement seccomp-bpf.  The security policy is a set of access
> +         rights tied to an object, which could be a file, a socket or a process.
> +
> +         See Documentation/security/landlock/ for further information.
> +
> +         If you are unsure how to answer this question, answer N.
[...]
> diff --git a/security/landlock/object.c b/security/landlock/object.c
> new file mode 100644
> index 000000000000..38fbbb108120
> --- /dev/null
> +++ b/security/landlock/object.c
> @@ -0,0 +1,339 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Landlock LSM - Object and rule management
> + *
> + * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
> + * Copyright © 2018-2020 ANSSI
> + *
> + * Principles and constraints of the object and rule management:
> + * - Do not leak memory.
> + * - Try as much as possible to free a memory allocation as soon as it is
> + *   unused.
> + * - Do not use global lock.
> + * - Do not charge processes other than the one requesting a Landlock
> + *   operation.
> + */
> +
> +#include <linux/bug.h>
> +#include <linux/compiler.h>
> +#include <linux/compiler_types.h>
> +#include <linux/err.h>
> +#include <linux/errno.h>
> +#include <linux/fs.h>
> +#include <linux/kernel.h>
> +#include <linux/list.h>
> +#include <linux/rbtree.h>
> +#include <linux/rcupdate.h>
> +#include <linux/refcount.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/workqueue.h>
> +
> +#include "object.h"
> +
> +struct landlock_object *landlock_create_object(
> +               const enum landlock_object_type type, void *underlying_object)
> +{
> +       struct landlock_object *object;
> +
> +       if (WARN_ON_ONCE(!underlying_object))
> +               return NULL;
> +       object = kzalloc(sizeof(*object), GFP_KERNEL);
> +       if (!object)
> +               return NULL;
> +       refcount_set(&object->usage, 1);
> +       refcount_set(&object->cleaners, 1);
> +       spin_lock_init(&object->lock);
> +       INIT_LIST_HEAD(&object->rules);
> +       object->type = type;
> +       WRITE_ONCE(object->underlying_object, underlying_object);

`object` is not globally visible at this point, so WRITE_ONCE() is unnecessary.
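
The general pattern behind this remark, as a minimal userspace sketch with C11
atomics standing in for the kernel primitives (struct item, published and both
helpers are hypothetical names, not taken from the patch): until a pointer to
the object has been stored somewhere other threads can load it from, plain
stores to its fields cannot race with any reader, and the ordering is provided
by the publication step itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a freshly allocated object. */
struct item {
	int type;
	void *underlying;
};

/* Shared slot through which other threads can find the item. */
static _Atomic(struct item *) published;

static void item_init_and_publish(struct item *it, int type, void *underlying)
{
	assert(it != NULL);
	/* Not reachable by anyone else yet: plain stores are enough. */
	it->type = type;
	it->underlying = underlying;
	/* Publication point: release orders the stores above before it. */
	atomic_store_explicit(&published, it, memory_order_release);
}

static struct item *item_lookup(void)
{
	/* Pairs with the release store in item_init_and_publish(). */
	return atomic_load_explicit(&published, memory_order_acquire);
}
```

Accesses made after publication are a separate question and may still need
annotated loads and stores.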

> +       return object;
> +}
> +
> +struct landlock_object *landlock_get_object(struct landlock_object *object)
> +       __acquires(object->usage)
> +{
> +       __acquire(object->usage);
> +       /*
> +        * If @object->usage equal 0, then it will be ignored by writers, and
> +        * underlying_object->object may be replaced, but this is not an issue
> +        * for release_object().
> +        */
> +       if (object && refcount_inc_not_zero(&object->usage)) {
> +               /*
> +                * It should not be possible to get a reference to an object if
> +                * its underlying object is being terminated (e.g. with
> +                * landlock_release_object()), because an object is only
> +                * modifiable through such underlying object.  This is not the
> +                * case with landlock_get_object_cleaner().
> +                */
> +               WARN_ON_ONCE(!READ_ONCE(object->underlying_object));
> +               return object;
> +       }
> +       return NULL;
> +}
> +
> +static struct landlock_object *get_object_cleaner(
> +               struct landlock_object *object)
> +       __acquires(object->cleaners)
> +{
> +       __acquire(object->cleaners);
> +       if (object && refcount_inc_not_zero(&object->cleaners))
> +               return object;
> +       return NULL;
> +}

I don't get this whole "cleaners" thing. Can you give a quick
description of why this is necessary, and what benefits it has over a
standard refcounting+RCU scheme? I don't immediately see anything that
requires this.
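
For comparison, a plain refcounting scheme of the kind referred to above could
look like the following userspace sketch.  It assumes C11 atomics in place of
refcount_t, and struct obj, obj_new(), obj_get() and obj_put() are hypothetical
names used only for illustration, not an alternative implementation of this
patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a refcounted kernel object. */
struct obj {
	atomic_uint usage;	/* 0 means "dying": no new reference allowed */
	int payload;
};

static struct obj *obj_new(int payload)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	atomic_init(&o->usage, 1);
	o->payload = payload;
	return o;
}

/* Mimics refcount_inc_not_zero(): takes a reference unless usage is 0. */
static bool obj_get(struct obj *o)
{
	unsigned int old = atomic_load(&o->usage);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->usage, &old, old + 1))
			return true;
	}
	return false;
}

/* Mimics refcount_dec_and_test(): returns true for the last reference. */
static bool obj_put(struct obj *o)
{
	assert(atomic_load(&o->usage) > 0);
	return atomic_fetch_sub(&o->usage, 1) == 1;
}
```

In the kernel, the caller that sees obj_put() return true would typically hand
the object to kfree_rcu(), so that a lookup racing with the last put can still
safely call the inc-not-zero helper on it within an RCU read-side section.  The
sketch leaves the final free() to the caller.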

> +/*
> + * There is two cases when an object should be free and the reference to the
> + * underlying object should be put:
> + * - when the last rule tied to this object is removed, which is handled by
> + *   landlock_put_rule() and then release_object();
> + * - when the object is being terminated (e.g. no more reference to an inode),
> + *   which is handled by landlock_put_object().
> + */
> +static void put_object_free(struct landlock_object *object)
> +       __releases(object->cleaners)
> +{
> +       __release(object->cleaners);
> +       if (!refcount_dec_and_test(&object->cleaners))
> +               return;
> +       WARN_ON_ONCE(refcount_read(&object->usage));
> +       /*
> +        * Ensures a safe use of @object in the RCU block from
> +        * landlock_put_rule().
> +        */
> +       kfree_rcu(object, rcu_free);
> +}
> +
> +/*
> + * Destroys a newly created and useless object.
> + */
> +void landlock_drop_object(struct landlock_object *object)
> +{
> +       if (WARN_ON_ONCE(!refcount_dec_and_test(&object->usage)))
> +               return;
> +       __acquire(object->cleaners);
> +       put_object_free(object);
> +}
> +
> +/*
> + * Puts the underlying object (e.g. inode) if it is the first request to
> + * release @object, without calling landlock_put_object().
> + *
> + * Return true if this call effectively marks @object as released, false
> + * otherwise.
> + */
> +static bool release_object(struct landlock_object *object)
> +       __releases(&object->lock)
> +{
> +       void *underlying_object;
> +
> +       lockdep_assert_held(&object->lock);
> +
> +       underlying_object = xchg(&object->underlying_object, NULL);
> +       spin_unlock(&object->lock);
> +       might_sleep();
> +       if (!underlying_object)
> +               return false;
> +
> +       switch (object->type) {
> +       case LANDLOCK_OBJECT_INODE:
> +               break;
> +       default:
> +               WARN_ON_ONCE(1);
> +       }
> +       return true;
> +}
> +
> +static void put_object_cleaner(struct landlock_object *object)
> +       __releases(object->cleaners)
> +{
> +       /* Let's try an early lockless check. */
> +       if (list_empty(&object->rules) &&
> +                       READ_ONCE(object->underlying_object)) {
> +               /*
> +                * Puts @object if there is no rule tied to it and the
> +                * remaining user is the underlying object.  This check is
> +                * atomic because @object->rules and @object->underlying_object
> +                * are protected by @object->lock.
> +                */
> +               spin_lock(&object->lock);
> +               if (list_empty(&object->rules) &&
> +                               READ_ONCE(object->underlying_object) &&
> +                               refcount_dec_if_one(&object->usage)) {
> +                       /*
> +                        * Releases @object, in place of
> +                        * landlock_release_object().
> +                        *
> +                        * @object is already empty, implying that all its
> +                        * previous rules are already disabled.
> +                        *
> +                        * Unbalance the @object->cleaners counter to reflect
> +                        * the underlying object release.
> +                        */
> +                       if (!WARN_ON_ONCE(!release_object(object))) {
> +                               __acquire(object->cleaners);
> +                               put_object_free(object);
> +                       }
> +               } else {
> +                       spin_unlock(&object->lock);
> +               }
> +       }
> +       put_object_free(object);
> +}
> +
> +/*
> + * Putting an object is easy when the object is being terminated, but it is
> + * much more tricky when the reason is that there is no more rule tied to this
> + * object.  Indeed, new rules could be added at the same time.
> + */
> +void landlock_put_object(struct landlock_object *object)
> +       __releases(object->usage)
> +{
> +       struct landlock_object *object_cleaner;
> +
> +       __release(object->usage);
> +       might_sleep();
> +       if (!object)
> +               return;
> +       /*
> +        * Guards against concurrent termination to be able to terminate
> +        * @object if it is empty and not referenced by another rule-appender
> +        * other than the underlying object.
> +        */
> +       object_cleaner = get_object_cleaner(object);
> +       if (WARN_ON_ONCE(!object_cleaner)) {
> +               __release(object->cleaners);
> +               return;
> +       }
> +       /*
> +        * Decrements @object->usage and if it reach zero, also decrement
> +        * @object->cleaners.  If both reach zero, then release and free
> +        * @object.
> +        */
> +       if (refcount_dec_and_test(&object->usage)) {
> +               struct landlock_rule *rule_walker, *rule_walker2;
> +
> +               spin_lock(&object->lock);
> +               /*
> +                * Disables all the rules tied to @object when it is forbidden
> +                * to add new rule but still allowed to remove them with
> +                * landlock_put_rule().  This is crucial to be able to safely
> +                * free a rule according to landlock_rule_is_disabled().
> +                */
> +               list_for_each_entry_safe(rule_walker, rule_walker2,
> +                               &object->rules, list)
> +                       list_del_rcu(&rule_walker->list);
> +
> +               /*
> +                * Releases @object if it is not already released (e.g. with
> +                * landlock_release_object()).
> +                */
> +               release_object(object);
> +               /*
> +                * Unbalances the @object->cleaners counter to reflect the
> +                * underlying object release.
> +                */
> +               __acquire(object->cleaners);
> +               put_object_free(object);
> +       }
> +       put_object_cleaner(object_cleaner);
> +}
> +
> +void landlock_put_rule(struct landlock_object *object,
> +               struct landlock_rule *rule)
> +{
> +       if (!rule)
> +               return;
> +       WARN_ON_ONCE(!object);
> +       /*
> +        * Guards against a concurrent @object self-destruction with
> +        * landlock_put_object() or put_object_cleaner().
> +        */
> +       rcu_read_lock();
> +       if (landlock_rule_is_disabled(rule)) {
> +               rcu_read_unlock();
> +               if (refcount_dec_and_test(&rule->usage))
> +                       kfree_rcu(rule, rcu_free);
> +               return;
> +       }
> +       if (refcount_dec_and_test(&rule->usage)) {
> +               struct landlock_object *safe_object;
> +
> +               /*
> +                * Now, @rule may still be enabled, or in the process of being
> +                * untied to @object by put_object_cleaner().  However, we know
> +                * that @object will not be freed until rcu_read_unlock() and
> +                * until @object->cleaners reach zero.  Furthermore, we may not
> +                * be the only one willing to free a @rule linked with @object.
> +                * If we succeed to hold @object with get_object_cleaner(), we
> +                * know that until put_object_cleaner(), we can safely use
> +                * @object to remove @rule.
> +                */
> +               safe_object = get_object_cleaner(object);
> +               rcu_read_unlock();
> +               if (!safe_object) {
> +                       __release(safe_object->cleaners);
> +                       /*
> +                        * We can safely free @rule because it is already
> +                        * removed from @object's list.
> +                        */
> +                       WARN_ON_ONCE(!landlock_rule_is_disabled(rule));
> +                       kfree_rcu(rule, rcu_free);
> +               } else {
> +                       spin_lock(&safe_object->lock);
> +                       if (!landlock_rule_is_disabled(rule))
> +                               list_del(&rule->list);
> +                       spin_unlock(&safe_object->lock);
> +                       kfree_rcu(rule, rcu_free);
> +                       put_object_cleaner(safe_object);
> +               }
> +       } else {
> +               rcu_read_unlock();
> +       }
> +       /*
> +        * put_object_cleaner() might sleep, but it is only reachable if
> +        * !landlock_rule_is_disabled().  Therefore, clean_ref() can not sleep.
> +        */
> +       might_sleep();
> +}
> +
> +void landlock_release_object(struct landlock_object __rcu *rcu_object)
> +{
> +       struct landlock_object *object;
> +
> +       if (!rcu_object)
> +               return;
> +       rcu_read_lock();
> +       object = get_object_cleaner(rcu_dereference(rcu_object));

This is not how RCU works. You need the rcu annotation on the access
to the data structure member (or global variable) that's actually
being accessed. A "struct foo __rcu *foo" argument is essentially
always wrong.
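
To illustrate the point, here is a minimal user-space sketch of the
usual pattern, not code from this series. The struct names and
read_object_id() are hypothetical, and a C11 atomic load stands in for
rcu_dereference(); in the kernel the member would be declared
"struct object __rcu *object" and the read would sit between
rcu_read_lock() and rcu_read_unlock(). The shared slot carries the
annotation, and helpers receive a plain pointer.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical object type, only here to have something to point at. */
struct object {
	int id;
};

struct inode_security {
	/*
	 * The shared slot is what readers and updaters race on, so this
	 * member is where the kernel's __rcu annotation would go.
	 */
	_Atomic(struct object *) object;
};

/* Helper taking a plain pointer: the caller already loaded the slot. */
static int object_id(const struct object *obj)
{
	return obj ? obj->id : -1;
}

static int read_object_id(struct inode_security *sec)
{
	struct object *obj;
	int id;

	/* kernel: rcu_read_lock(); */
	/* kernel: obj = rcu_dereference(sec->object); */
	obj = atomic_load_explicit(&sec->object, memory_order_acquire);
	id = object_id(obj);
	/* kernel: rcu_read_unlock(); */
	return id;
}
```

The dereference happens where the member is read, so a function
argument never needs to be marked __rcu.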

> +struct landlock_rule {
> +       struct landlock_access access;
> +       /*
> +        * @list: Linked list with other rules tied to the same object, which
> +        * enable to manage their lifetimes.  This is also used to identify if
> +        * a rule is still valid, thanks to landlock_rule_is_disabled(), which
> +        * is important in the matching process because the original object
> +        * address might have been recycled.
> +        */
> +       struct list_head list;
> +       union {
> +               /*
> +                * @usage: Number of rulesets pointing to this rule.  This
> +                * field is never used by RCU readers.
> +                */
> +               refcount_t usage;
> +               struct rcu_head rcu_free;
> +       };
> +};

An object that is subject to RCU but whose refcount must not be
accessed from RCU context? That seems a bit weird.

> +enum landlock_object_type {
> +       LANDLOCK_OBJECT_INODE = 1,
> +};
> +
> +struct landlock_object {
> +       /*
> +        * @usage: Main usage counter, used to tie an object to it's underlying
> +        * object (i.e. create a lifetime) and potentially add new rules.

I can't really follow this by reading this patch on its own. As one
suggestion to make things at least a bit better, how about documenting
here that `usage` always reaches zero before `cleaners` does?

> +        */
> +       refcount_t usage;
> +       /*
> +        * @cleaners: Usage counter used to free a rule from @rules (thanks to
> +        * put_rule()).  Enables to get a reference to this object until it
> +        * really become freed.  Cf. put_object().

Maybe add: @usage being non-zero counts as one reference to @cleaners.
Once @cleaners has become zero, the object is freed after an RCU grace
period.

> +        */
> +       refcount_t cleaners;
> +       union {
> +               /*
> +                * The use of this struct is controlled by @usage and
> +                * @cleaners, which makes it safe to union it with @rcu_free.
> +                */
[...]
> +               struct rcu_head rcu_free;
> +       };
> +};
[...]
> +static inline bool landlock_rule_is_disabled(
> +               struct landlock_rule *rule)
> +{
> +       /*
> +        * Disabling (i.e. unlinking) a landlock_rule is a one-way operation.
> +        * It is not possible to re-enable such a rule, then there is no need
> +        * for smp_load_acquire().
> +        *
> +        * LIST_POISON2 is set by list_del() and list_del_rcu().
> +        */
> +       return !rule || READ_ONCE(rule->list.prev) == LIST_POISON2;

You're not allowed to do this, the comment above list_del() states:

 * Note: list_empty() on entry does not return true after this, the entry is
 * in an undefined state.

If you want to be able to test whether the element is on a list
afterwards, use stuff like list_del_init().

> +}


* Re: [RFC PATCH v14 01/10] landlock: Add object and rule management
  2020-02-25 20:49   ` Jann Horn
@ 2020-02-26 15:31     ` Mickaël Salaün
  2020-02-26 20:24       ` Jann Horn
  0 siblings, 1 reply; 34+ messages in thread
From: Mickaël Salaün @ 2020-02-26 15:31 UTC (permalink / raw)
  To: Jann Horn
  Cc: kernel list, Al Viro, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Greg Kroah-Hartman, James Morris, Jann Horn,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, Kernel Hardening, Linux API, linux-arch,
	linux-doc, linux-fsdevel, open list:KERNEL SELFTEST FRAMEWORK,
	linux-security-module, the arch/x86 maintainers


On 25/02/2020 21:49, Jann Horn wrote:
> On Mon, Feb 24, 2020 at 5:05 PM Mickaël Salaün <mic@digikod.net> wrote:
>> A Landlock object enables to identify a kernel object (e.g. an inode).
>> A Landlock rule is a set of access rights allowed on an object.  Rules
>> are grouped in rulesets that may be tied to a set of processes (i.e.
>> subjects) to enforce a scoped access-control (i.e. a domain).
>>
>> Because Landlock's goal is to empower any process (especially
>> unprivileged ones) to sandbox themselves, we can't rely on a system-wide
>> object identification such as file extended attributes.  Indeed, we need
>> innocuous, composable and modular access-controls.
>>
>> The main challenge with this constraints is to identify kernel objects
>> while this identification is useful (i.e. when a security policy makes
>> use of this object).  But this identification data should be freed once
>> no policy is using it.  This ephemeral tagging should not and may not be
>> written in the filesystem.  We then need to manage the lifetime of a
>> rule according to the lifetime of its object.  To avoid a global lock,
>> this implementation make use of RCU and counters to safely reference
>> objects.
>>
>> A following commit uses this generic object management for inodes.
> [...]
>> diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
>> new file mode 100644
>> index 000000000000..4a321d5b3f67
>> --- /dev/null
>> +++ b/security/landlock/Kconfig
>> @@ -0,0 +1,15 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +
>> +config SECURITY_LANDLOCK
>> +       bool "Landlock support"
>> +       depends on SECURITY
>> +       default n
> 
> (I think "default n" is implicit?)

It seems that most (all?) Kconfig files are written like this.

> 
>> +       help
>> +         This selects Landlock, a safe sandboxing mechanism.  It enables to
>> +         restrict processes on the fly (i.e. enforce an access control policy),
>> +         which can complement seccomp-bpf.  The security policy is a set of access
>> +         rights tied to an object, which could be a file, a socket or a process.
>> +
>> +         See Documentation/security/landlock/ for further information.
>> +
>> +         If you are unsure how to answer this question, answer N.
> [...]
>> diff --git a/security/landlock/object.c b/security/landlock/object.c
>> new file mode 100644
>> index 000000000000..38fbbb108120
>> --- /dev/null
>> +++ b/security/landlock/object.c
>> @@ -0,0 +1,339 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Landlock LSM - Object and rule management
>> + *
>> + * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
>> + * Copyright © 2018-2020 ANSSI
>> + *
>> + * Principles and constraints of the object and rule management:
>> + * - Do not leak memory.
>> + * - Try as much as possible to free a memory allocation as soon as it is
>> + *   unused.
>> + * - Do not use global lock.
>> + * - Do not charge processes other than the one requesting a Landlock
>> + *   operation.
>> + */
>> +
>> +#include <linux/bug.h>
>> +#include <linux/compiler.h>
>> +#include <linux/compiler_types.h>
>> +#include <linux/err.h>
>> +#include <linux/errno.h>
>> +#include <linux/fs.h>
>> +#include <linux/kernel.h>
>> +#include <linux/list.h>
>> +#include <linux/rbtree.h>
>> +#include <linux/rcupdate.h>
>> +#include <linux/refcount.h>
>> +#include <linux/slab.h>
>> +#include <linux/spinlock.h>
>> +#include <linux/workqueue.h>
>> +
>> +#include "object.h"
>> +
>> +struct landlock_object *landlock_create_object(
>> +               const enum landlock_object_type type, void *underlying_object)
>> +{
>> +       struct landlock_object *object;
>> +
>> +       if (WARN_ON_ONCE(!underlying_object))
>> +               return NULL;
>> +       object = kzalloc(sizeof(*object), GFP_KERNEL);
>> +       if (!object)
>> +               return NULL;
>> +       refcount_set(&object->usage, 1);
>> +       refcount_set(&object->cleaners, 1);
>> +       spin_lock_init(&object->lock);
>> +       INIT_LIST_HEAD(&object->rules);
>> +       object->type = type;
>> +       WRITE_ONCE(object->underlying_object, underlying_object);
> 
> `object` is not globally visible at this point, so WRITE_ONCE() is unnecessary.

Right. It was written like this to have a uniform use of this pointer,
but I'll remove it.

> 
>> +       return object;
>> +}
>> +
>> +struct landlock_object *landlock_get_object(struct landlock_object *object)
>> +       __acquires(object->usage)
>> +{
>> +       __acquire(object->usage);
>> +       /*
>> +        * If @object->usage equal 0, then it will be ignored by writers, and
>> +        * underlying_object->object may be replaced, but this is not an issue
>> +        * for release_object().
>> +        */
>> +       if (object && refcount_inc_not_zero(&object->usage)) {
>> +               /*
>> +                * It should not be possible to get a reference to an object if
>> +                * its underlying object is being terminated (e.g. with
>> +                * landlock_release_object()), because an object is only
>> +                * modifiable through such underlying object.  This is not the
>> +                * case with landlock_get_object_cleaner().
>> +                */
>> +               WARN_ON_ONCE(!READ_ONCE(object->underlying_object));
>> +               return object;
>> +       }
>> +       return NULL;
>> +}
>> +
>> +static struct landlock_object *get_object_cleaner(
>> +               struct landlock_object *object)
>> +       __acquires(object->cleaners)
>> +{
>> +       __acquire(object->cleaners);
>> +       if (object && refcount_inc_not_zero(&object->cleaners))
>> +               return object;
>> +       return NULL;
>> +}
> 
> I don't get this whole "cleaners" thing. Can you give a quick
> description of why this is necessary, and what benefits it has over a
> standard refcounting+RCU scheme? I don't immediately see anything that
> requires this.

This indeed needs more documentation. Here is the comment I'll add to
get_object_cleaner():

This makes it possible to safely get a reference to an object in order
to potentially free it, if it is not already being freed by a concurrent
thread. Indeed, the object's address may still be read and dereferenced
while a concurrent thread is attempting to clean the object. Cf. &struct
landlock_object->usage and &struct landlock_object->cleaners.

See below the explanation about "usage" and "cleaners".

> 
>> +/*
>> + * There is two cases when an object should be free and the reference to the
>> + * underlying object should be put:
>> + * - when the last rule tied to this object is removed, which is handled by
>> + *   landlock_put_rule() and then release_object();
>> + * - when the object is being terminated (e.g. no more reference to an inode),
>> + *   which is handled by landlock_put_object().
>> + */
>> +static void put_object_free(struct landlock_object *object)
>> +       __releases(object->cleaners)
>> +{
>> +       __release(object->cleaners);
>> +       if (!refcount_dec_and_test(&object->cleaners))
>> +               return;
>> +       WARN_ON_ONCE(refcount_read(&object->usage));
>> +       /*
>> +        * Ensures a safe use of @object in the RCU block from
>> +        * landlock_put_rule().
>> +        */
>> +       kfree_rcu(object, rcu_free);
>> +}
>> +
>> +/*
>> + * Destroys a newly created and useless object.
>> + */
>> +void landlock_drop_object(struct landlock_object *object)
>> +{
>> +       if (WARN_ON_ONCE(!refcount_dec_and_test(&object->usage)))
>> +               return;
>> +       __acquire(object->cleaners);
>> +       put_object_free(object);
>> +}
>> +
>> +/*
>> + * Puts the underlying object (e.g. inode) if it is the first request to
>> + * release @object, without calling landlock_put_object().
>> + *
>> + * Return true if this call effectively marks @object as released, false
>> + * otherwise.
>> + */
>> +static bool release_object(struct landlock_object *object)
>> +       __releases(&object->lock)
>> +{
>> +       void *underlying_object;
>> +
>> +       lockdep_assert_held(&object->lock);
>> +
>> +       underlying_object = xchg(&object->underlying_object, NULL);
>> +       spin_unlock(&object->lock);
>> +       might_sleep();
>> +       if (!underlying_object)
>> +               return false;
>> +
>> +       switch (object->type) {
>> +       case LANDLOCK_OBJECT_INODE:
>> +               break;
>> +       default:
>> +               WARN_ON_ONCE(1);
>> +       }
>> +       return true;
>> +}
>> +
>> +static void put_object_cleaner(struct landlock_object *object)
>> +       __releases(object->cleaners)
>> +{
>> +       /* Let's try an early lockless check. */
>> +       if (list_empty(&object->rules) &&
>> +                       READ_ONCE(object->underlying_object)) {
>> +               /*
>> +                * Puts @object if there is no rule tied to it and the
>> +                * remaining user is the underlying object.  This check is
>> +                * atomic because @object->rules and @object->underlying_object
>> +                * are protected by @object->lock.
>> +                */
>> +               spin_lock(&object->lock);
>> +               if (list_empty(&object->rules) &&
>> +                               READ_ONCE(object->underlying_object) &&
>> +                               refcount_dec_if_one(&object->usage)) {
>> +                       /*
>> +                        * Releases @object, in place of
>> +                        * landlock_release_object().
>> +                        *
>> +                        * @object is already empty, implying that all its
>> +                        * previous rules are already disabled.
>> +                        *
>> +                        * Unbalance the @object->cleaners counter to reflect
>> +                        * the underlying object release.
>> +                        */
>> +                       if (!WARN_ON_ONCE(!release_object(object))) {
>> +                               __acquire(object->cleaners);
>> +                               put_object_free(object);
>> +                       }
>> +               } else {
>> +                       spin_unlock(&object->lock);
>> +               }
>> +       }
>> +       put_object_free(object);
>> +}
>> +
>> +/*
>> + * Putting an object is easy when the object is being terminated, but it is
>> + * much more tricky when the reason is that there is no more rule tied to this
>> + * object.  Indeed, new rules could be added at the same time.
>> + */
>> +void landlock_put_object(struct landlock_object *object)
>> +       __releases(object->usage)
>> +{
>> +       struct landlock_object *object_cleaner;
>> +
>> +       __release(object->usage);
>> +       might_sleep();
>> +       if (!object)
>> +               return;
>> +       /*
>> +        * Guards against concurrent termination to be able to terminate
>> +        * @object if it is empty and not referenced by another rule-appender
>> +        * other than the underlying object.
>> +        */
>> +       object_cleaner = get_object_cleaner(object);
>> +       if (WARN_ON_ONCE(!object_cleaner)) {
>> +               __release(object->cleaners);
>> +               return;
>> +       }
>> +       /*
>> +        * Decrements @object->usage and if it reach zero, also decrement
>> +        * @object->cleaners.  If both reach zero, then release and free
>> +        * @object.
>> +        */
>> +       if (refcount_dec_and_test(&object->usage)) {
>> +               struct landlock_rule *rule_walker, *rule_walker2;
>> +
>> +               spin_lock(&object->lock);
>> +               /*
>> +                * Disables all the rules tied to @object when it is forbidden
>> +                * to add new rule but still allowed to remove them with
>> +                * landlock_put_rule().  This is crucial to be able to safely
>> +                * free a rule according to landlock_rule_is_disabled().
>> +                */
>> +               list_for_each_entry_safe(rule_walker, rule_walker2,
>> +                               &object->rules, list)
>> +                       list_del_rcu(&rule_walker->list);
>> +
>> +               /*
>> +                * Releases @object if it is not already released (e.g. with
>> +                * landlock_release_object()).
>> +                */
>> +               release_object(object);
>> +               /*
>> +                * Unbalances the @object->cleaners counter to reflect the
>> +                * underlying object release.
>> +                */
>> +               __acquire(object->cleaners);
>> +               put_object_free(object);
>> +       }
>> +       put_object_cleaner(object_cleaner);
>> +}
>> +
>> +void landlock_put_rule(struct landlock_object *object,
>> +               struct landlock_rule *rule)
>> +{
>> +       if (!rule)
>> +               return;
>> +       WARN_ON_ONCE(!object);
>> +       /*
>> +        * Guards against a concurrent @object self-destruction with
>> +        * landlock_put_object() or put_object_cleaner().
>> +        */
>> +       rcu_read_lock();
>> +       if (landlock_rule_is_disabled(rule)) {
>> +               rcu_read_unlock();
>> +               if (refcount_dec_and_test(&rule->usage))
>> +                       kfree_rcu(rule, rcu_free);
>> +               return;
>> +       }
>> +       if (refcount_dec_and_test(&rule->usage)) {
>> +               struct landlock_object *safe_object;
>> +
>> +               /*
>> +                * Now, @rule may still be enabled, or in the process of being
>> +                * untied to @object by put_object_cleaner().  However, we know
>> +                * that @object will not be freed until rcu_read_unlock() and
>> +                * until @object->cleaners reach zero.  Furthermore, we may not
>> +                * be the only one willing to free a @rule linked with @object.
>> +                * If we succeed to hold @object with get_object_cleaner(), we
>> +                * know that until put_object_cleaner(), we can safely use
>> +                * @object to remove @rule.
>> +                */
>> +               safe_object = get_object_cleaner(object);
>> +               rcu_read_unlock();
>> +               if (!safe_object) {
>> +                       __release(safe_object->cleaners);
>> +                       /*
>> +                        * We can safely free @rule because it is already
>> +                        * removed from @object's list.
>> +                        */
>> +                       WARN_ON_ONCE(!landlock_rule_is_disabled(rule));
>> +                       kfree_rcu(rule, rcu_free);
>> +               } else {
>> +                       spin_lock(&safe_object->lock);
>> +                       if (!landlock_rule_is_disabled(rule))
>> +                               list_del(&rule->list);
>> +                       spin_unlock(&safe_object->lock);
>> +                       kfree_rcu(rule, rcu_free);
>> +                       put_object_cleaner(safe_object);
>> +               }
>> +       } else {
>> +               rcu_read_unlock();
>> +       }
>> +       /*
>> +        * put_object_cleaner() might sleep, but it is only reachable if
>> +        * !landlock_rule_is_disabled().  Therefore, clean_ref() can not sleep.
>> +        */
>> +       might_sleep();
>> +}
>> +
>> +void landlock_release_object(struct landlock_object __rcu *rcu_object)
>> +{
>> +       struct landlock_object *object;
>> +
>> +       if (!rcu_object)
>> +               return;
>> +       rcu_read_lock();
>> +       object = get_object_cleaner(rcu_dereference(rcu_object));
> 
> This is not how RCU works. You need the rcu annotation on the access
> to the data structure member (or global variable) that's actually
> being accessed. A "struct foo __rcu *foo" argument is essentially
> always wrong.

Absolutely! I fixed this with the following patch:

diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index 7f3bd4fd04bb..01a48c75f210 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -98,7 +98,9 @@ void landlock_release_inodes(struct super_block *sb)
 		if (iput_inode)
 			iput(iput_inode);

-		landlock_release_object(inode_landlock(inode)->object);
+		rcu_read_lock();
+		landlock_release_object(rcu_dereference(
+					inode_landlock(inode)->object));

 		iput_inode = inode;
 		spin_lock(&sb->s_inode_list_lock);
diff --git a/security/landlock/object.c b/security/landlock/object.c
index 2d373f224989..a0e65a78068d 100644
--- a/security/landlock/object.c
+++ b/security/landlock/object.c
@@ -300,14 +300,16 @@ void landlock_put_rule(struct landlock_object *object,
 	might_sleep();
 }

-void landlock_release_object(struct landlock_object __rcu *rcu_object)
+void landlock_release_object(struct landlock_object *rcu_object)
+	__releases(RCU)
 {
 	struct landlock_object *object;

-	if (!rcu_object)
+	if (!rcu_object) {
+		rcu_read_unlock();
 		return;
-	rcu_read_lock();
-	object = get_object_cleaner(rcu_dereference(rcu_object));
+	}
+	object = get_object_cleaner(rcu_object);
 	rcu_read_unlock();
 	if (unlikely(!object)) {
 		__release(object->cleaners);
diff --git a/security/landlock/object.h b/security/landlock/object.h
index 15dfc9a75a82..78bfb25d4bcc 100644
--- a/security/landlock/object.h
+++ b/security/landlock/object.h
@@ -12,9 +12,9 @@
 #include <linux/compiler_types.h>
 #include <linux/list.h>
 #include <linux/poison.h>
-#include <linux/rcupdate.h>
 #include <linux/refcount.h>
 #include <linux/spinlock.h>
+#include <linux/types.h>

 struct landlock_access {
 	/*
@@ -105,7 +105,8 @@ struct landlock_object {
 void landlock_put_rule(struct landlock_object *object,
 		struct landlock_rule *rule);

-void landlock_release_object(struct landlock_object __rcu *rcu_object);
+void landlock_release_object(struct landlock_object *object)
+	__releases(RCU);

 struct landlock_object *landlock_create_object(
 		const enum landlock_object_type type, void *underlying_object);


> 
>> +struct landlock_rule {
>> +       struct landlock_access access;
>> +       /*
>> +        * @list: Linked list with other rules tied to the same object, which
>> +        * enable to manage their lifetimes.  This is also used to identify if
>> +        * a rule is still valid, thanks to landlock_rule_is_disabled(), which
>> +        * is important in the matching process because the original object
>> +        * address might have been recycled.
>> +        */
>> +       struct list_head list;
>> +       union {
>> +               /*
>> +                * @usage: Number of rulesets pointing to this rule.  This
>> +                * field is never used by RCU readers.
>> +                */
>> +               refcount_t usage;
>> +               struct rcu_head rcu_free;
>> +       };
>> +};
> 
> An object that is subject to RCU but whose refcount must not be
> > accessed from RCU context? That seems a bit weird.

The fields "access" and "list" are read (in an RCU-read block) by
ruleset.c:landlock_find_access() (cf. patch 2). The "usage" counter is
only used in landlock_insert_ruleset_rule() and landlock_put_rule(), and
in these cases the rule is always owned/held by the caller. I should
say something like "This field must only be used when already holding
the rule."

> 
>> +enum landlock_object_type {
>> +       LANDLOCK_OBJECT_INODE = 1,
>> +};
>> +
>> +struct landlock_object {
>> +       /*
>> +        * @usage: Main usage counter, used to tie an object to it's underlying
>> +        * object (i.e. create a lifetime) and potentially add new rules.
> 
> I can't really follow this by reading this patch on its own. As one
> suggestion to make things at least a bit better, how about documenting
> here that `usage` always reaches zero before `cleaners` does?

What about this?

This counter is used to tie an object to its underlying object (e.g. an
inode) and to modify it (e.g. add or remove a rule). If this counter
reaches zero, the object must not be modified, but it may still be used
from within an RCU-read block. When adding a new rule to an object with
a usage counter of zero, the underlying object must be locked and its
object pointer can then be replaced with a new empty object (while
ignoring the disabled object which is being handled by another thread).
This counter always reaches zero before @cleaners does.


> 
>> +        */
>> +       refcount_t usage;
>> +       /*
>> +        * @cleaners: Usage counter used to free a rule from @rules (thanks to
>> +        * put_rule()).  Enables to get a reference to this object until it
>> +        * really become freed.  Cf. put_object().
> 
> Maybe add: @usage being non-zero counts as one reference to @cleaners.
> Once @cleaners has become zero, the object is freed after an RCU grace
> period.

What about this?

This counter can only reach zero if the @usage counter already reached
zero. Indeed, @usage being non-zero counts as one reference to
@cleaners. Once @cleaners has become zero, the object is freed after an
RCU grace period. This enables concurrent threads to safely get an
object reference to terminate it if there are no more concurrent
cleaners for this object. This mechanism is required to enable
concurrent threads to safely dereference an object from potentially
different pointers (e.g. the underlying object, or a rule tied to this
object), to potentially terminate and free it (i.e. if there are no more
rules tied to it, or if the underlying object is being terminated).
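
As a rough illustration of the invariant described above, here is a
user-space model, not the code from this patch. It uses C11 atomics in
place of refcount_t, the struct and function names are hypothetical,
and a "freed" flag stands in for kfree_rcu(). It only models the two
counters; the rule list, the lock and the RCU grace period are left
out.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct object_model {
	atomic_uint usage;    /* ties the object to its underlying object */
	atomic_uint cleaners; /* keeps the memory alive for cleaners */
	bool freed;           /* stands in for kfree_rcu() */
};

static void object_model_init(struct object_model *obj)
{
	atomic_init(&obj->usage, 1);
	/* The non-zero @usage accounts for this single @cleaners reference. */
	atomic_init(&obj->cleaners, 1);
	obj->freed = false;
}

/* Models get_object_cleaner(): fails once @cleaners has reached zero. */
static bool get_cleaner(struct object_model *obj)
{
	unsigned int old = atomic_load(&obj->cleaners);

	while (old != 0) {
		/* On failure, @old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&obj->cleaners, &old, old + 1))
			return true;
	}
	return false;
}

/* Drops one @cleaners reference; the last one frees the object. */
static void put_cleaner(struct object_model *obj)
{
	if (atomic_fetch_sub(&obj->cleaners, 1) == 1)
		obj->freed = true;
}

/* Drops one @usage reference; the last one gives up @usage's cleaner. */
static void put_usage(struct object_model *obj)
{
	if (atomic_fetch_sub(&obj->usage, 1) == 1)
		put_cleaner(obj);
}
```

In this model, a thread that succeeded with get_cleaner() keeps the
object alive across a concurrent put_usage(), and the object is only
marked freed once that thread calls put_cleaner().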

> 
>> +        */
>> +       refcount_t cleaners;
>> +       union {
>> +               /*
>> +                * The use of this struct is controlled by @usage and
>> +                * @cleaners, which makes it safe to union it with @rcu_free.
>> +                */
> [...]
>> +               struct rcu_head rcu_free;
>> +       };
>> +};
> [...]
>> +static inline bool landlock_rule_is_disabled(
>> +               struct landlock_rule *rule)
>> +{
>> +       /*
>> +        * Disabling (i.e. unlinking) a landlock_rule is a one-way operation.
>> +        * It is not possible to re-enable such a rule, then there is no need
>> +        * for smp_load_acquire().
>> +        *
>> +        * LIST_POISON2 is set by list_del() and list_del_rcu().
>> +        */
>> +       return !rule || READ_ONCE(rule->list.prev) == LIST_POISON2;
> 
> You're not allowed to do this, the comment above list_del() states:
> 
>  * Note: list_empty() on entry does not return true after this, the entry is
>  * in an undefined state.

list_empty() checks READ_ONCE(head->next) == head, but
landlock_rule_is_disabled() checks READ_ONCE(rule->list.prev) ==
LIST_POISON2.
The comment about LIST_POISON2 is right but may be misleading. There is
no use of list_empty() with a landlock_rule->list, only
landlock_object->rules. The only list_del() is in landlock_put_rule()
when there is a guarantee that there is no other reference to it, hence
no possible use of landlock_rule_is_disabled() with this rule. I could
replace it with a call to list_del_rcu() to make it more consistent.

> 
> If you want to be able to test whether the element is on a list
> afterwards, use stuff like list_del_init().

There is no need to re-initialize the list but using list_del_init() and
list_empty() could work too. However, there is no list_del_init_rcu()
helper. Moreover, resetting the list's pointer with LIST_POISON2 might
help to detect bugs.


Thanks for this review!


* Re: [RFC PATCH v14 00/10] Landlock LSM
  2020-02-25 18:49 ` [RFC PATCH v14 00/10] Landlock LSM J Freyensee
@ 2020-02-26 15:34   ` Mickaël Salaün
  0 siblings, 0 replies; 34+ messages in thread
From: Mickaël Salaün @ 2020-02-26 15:34 UTC (permalink / raw)
  To: J Freyensee, linux-kernel
  Cc: Al Viro, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Greg Kroah-Hartman, James Morris, Jann Horn, Jonathan Corbet,
	Kees Cook, Michael Kerrisk, Mickaël Salaün,
	Serge E . Hallyn, Shuah Khan, Vincent Dagonneau,
	kernel-hardening, linux-api, linux-arch, linux-doc,
	linux-fsdevel, linux-kselftest, linux-security-module, x86


On 25/02/2020 19:49, J Freyensee wrote:
> 
> 
> On 2/24/20 8:02 AM, Mickaël Salaün wrote:
> 
>> ## Syscall
>>
>> Because it is only tested on x86_64, the syscall is only wired up for
>> this architecture.  The whole x86 family (and probably all the others)
>> will be supported in the next patch series.
> General question for u.  What is it meant "whole x86 family will be
> supported".  32-bit x86 will be supported?

Yes, I was referring to x86_32, x86_64 and x32, but all architectures
should be supported.


* Re: [RFC PATCH v14 01/10] landlock: Add object and rule management
  2020-02-26 15:31     ` Mickaël Salaün
@ 2020-02-26 20:24       ` Jann Horn
  2020-02-27 16:46         ` Mickaël Salaün
  0 siblings, 1 reply; 34+ messages in thread
From: Jann Horn @ 2020-02-26 20:24 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: kernel list, Al Viro, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Greg Kroah-Hartman, James Morris, Jann Horn,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, Kernel Hardening, Linux API, linux-arch,
	linux-doc, linux-fsdevel, open list:KERNEL SELFTEST FRAMEWORK,
	linux-security-module, the arch/x86 maintainers

On Wed, Feb 26, 2020 at 4:32 PM Mickaël Salaün <mic@digikod.net> wrote:
> On 25/02/2020 21:49, Jann Horn wrote:
> > On Mon, Feb 24, 2020 at 5:05 PM Mickaël Salaün <mic@digikod.net> wrote:
> >> A Landlock object enables to identify a kernel object (e.g. an inode).
> >> A Landlock rule is a set of access rights allowed on an object.  Rules
> >> are grouped in rulesets that may be tied to a set of processes (i.e.
> >> subjects) to enforce a scoped access-control (i.e. a domain).
> >>
> >> Because Landlock's goal is to empower any process (especially
> >> unprivileged ones) to sandbox themselves, we can't rely on a system-wide
> >> object identification such as file extended attributes.  Indeed, we need
> >> innocuous, composable and modular access-controls.
> >>
> >> The main challenge with this constraints is to identify kernel objects
> >> while this identification is useful (i.e. when a security policy makes
> >> use of this object).  But this identification data should be freed once
> >> no policy is using it.  This ephemeral tagging should not and may not be
> >> written in the filesystem.  We then need to manage the lifetime of a
> >> rule according to the lifetime of its object.  To avoid a global lock,
> >> this implementation make use of RCU and counters to safely reference
> >> objects.
> >>
> >> A following commit uses this generic object management for inodes.
[...]
> >> +config SECURITY_LANDLOCK
> >> +       bool "Landlock support"
> >> +       depends on SECURITY
> >> +       default n
> >
> > (I think "default n" is implicit?)
>
> It seems that most (all?) Kconfig are written like this.

See e.g. <https://lore.kernel.org/lkml/c187bb77-e804-93bd-64db-9418be58f191@infradead.org/>.

[...]
> >> +       return object;
> >> +}
> >> +
> >> +struct landlock_object *landlock_get_object(struct landlock_object *object)
> >> +       __acquires(object->usage)
> >> +{
> >> +       __acquire(object->usage);
> >> +       /*
> >> +        * If @object->usage equal 0, then it will be ignored by writers, and
> >> +        * underlying_object->object may be replaced, but this is not an issue
> >> +        * for release_object().
> >> +        */
> >> +       if (object && refcount_inc_not_zero(&object->usage)) {
> >> +               /*
> >> +                * It should not be possible to get a reference to an object if
> >> +                * its underlying object is being terminated (e.g. with
> >> +                * landlock_release_object()), because an object is only
> >> +                * modifiable through such underlying object.  This is not the
> >> +                * case with landlock_get_object_cleaner().
> >> +                */
> >> +               WARN_ON_ONCE(!READ_ONCE(object->underlying_object));
> >> +               return object;
> >> +       }
> >> +       return NULL;
> >> +}
> >> +
> >> +static struct landlock_object *get_object_cleaner(
> >> +               struct landlock_object *object)
> >> +       __acquires(object->cleaners)
> >> +{
> >> +       __acquire(object->cleaners);
> >> +       if (object && refcount_inc_not_zero(&object->cleaners))
> >> +               return object;
> >> +       return NULL;
> >> +}
> >
> > I don't get this whole "cleaners" thing. Can you give a quick
> > description of why this is necessary, and what benefits it has over a
> > standard refcounting+RCU scheme? I don't immediately see anything that
> > requires this.
>
> This indeed needs more documentation here. Here is a comment I'll add to
> get_object_cleaner():
>
> This enables to safely get a reference to an object to potentially free
> it if it is not already being freed by a concurrent thread.

"get a reference to an object to potentially free it" just sounds all
wrong to me. You free an object when you're *dropping* a reference to
it. Your refcounting scheme doesn't fit my mental models of how normal
refcounting works at all...

[...]
> >> +/*
> >> + * Putting an object is easy when the object is being terminated, but it is
> >> + * much more tricky when the reason is that there is no more rule tied to this
> >> + * object.  Indeed, new rules could be added at the same time.
> >> + */
> >> +void landlock_put_object(struct landlock_object *object)
> >> +       __releases(object->usage)
> >> +{
> >> +       struct landlock_object *object_cleaner;
> >> +
> >> +       __release(object->usage);
> >> +       might_sleep();
> >> +       if (!object)
> >> +               return;
> >> +       /*
> >> +        * Guards against concurrent termination to be able to terminate
> >> +        * @object if it is empty and not referenced by another rule-appender
> >> +        * other than the underlying object.
> >> +        */
> >> +       object_cleaner = get_object_cleaner(object);
[...]
> >> +       /*
> >> +        * Decrements @object->usage and if it reach zero, also decrement
> >> +        * @object->cleaners.  If both reach zero, then release and free
> >> +        * @object.
> >> +        */
> >> +       if (refcount_dec_and_test(&object->usage)) {
> >> +               struct landlock_rule *rule_walker, *rule_walker2;
> >> +
> >> +               spin_lock(&object->lock);
> >> +               /*
> >> +                * Disables all the rules tied to @object when it is forbidden
> >> +                * to add new rule but still allowed to remove them with
> >> +                * landlock_put_rule().  This is crucial to be able to safely
> >> +                * free a rule according to landlock_rule_is_disabled().
> >> +                */
> >> +               list_for_each_entry_safe(rule_walker, rule_walker2,
> >> +                               &object->rules, list)
> >> +                       list_del_rcu(&rule_walker->list);

So... rules don't take references on the landlock_objects they use?
Instead, the landlock_object knows which rules use it, and when the
landlock_object goes away, it nukes all the rules associated with
itself?

That seems terrible to me - AFAICS it means that if some random
process decides to install a landlock rule that uses inode X, and then
that process dies together with all its landlock rules, the inode
still stays pinned in kernel memory as long as the superblock is
mounted. In other words, it's a resource leak. (And if I'm not missing
something in patch 5, that applies even if the inode has been
unlinked?)

Can you please refactor your refcounting as follows?

 - A rule takes a reference on each landlock_object it uses.
 - A landlock_object takes a reference on the underlying object (just like now).
 - The underlying object *DOES NOT* take a reference on the
landlock_object (unlike now); the reference from the underlying object
to the landlock_object has weak pointer semantics.
 - When a landlock_object's refcount drops to zero (iow no rules use
it anymore), it is freed.

That might also help get rid of the awkward ->cleaners thing?
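
As a rough illustration of the scheme listed above, here is a
single-threaded userspace sketch. All sk_* names are hypothetical,
plain integers stand in for refcount_t, and locking and RCU are left
out entirely, so this only shows who holds a reference on what.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_object;

/* Stands in for the underlying kernel object (e.g. an inode). */
struct sk_inode {
	unsigned int refcount;
	struct sk_object *object; /* weak: does not keep the object alive */
};

/* Stands in for a landlock_object. */
struct sk_object {
	unsigned int refcount;  /* one reference per rule using the object */
	struct sk_inode *inode; /* strong: pins the underlying object */
	bool freed;             /* stands in for the actual free */
};

struct sk_rule {
	struct sk_object *object;
};

static void sk_object_init(struct sk_object *obj, struct sk_inode *inode)
{
	obj->refcount = 0;
	obj->inode = inode;
	obj->freed = false;
	inode->refcount++;   /* the object takes a reference on the inode */
	inode->object = obj; /* weak back-pointer, no reference taken */
}

static void sk_rule_attach(struct sk_rule *rule, struct sk_object *obj)
{
	obj->refcount++; /* a rule takes a reference on each object it uses */
	rule->object = obj;
}

static void sk_rule_detach(struct sk_rule *rule)
{
	struct sk_object *obj = rule->object;

	assert(obj && obj->refcount > 0);
	rule->object = NULL;
	if (--obj->refcount == 0) {
		/* No rule uses the object anymore: clear the weak pointer,
		 * unpin the inode and free the object. */
		obj->inode->object = NULL;
		obj->inode->refcount--;
		obj->freed = true;
	}
}
```

In this sketch the inode is only pinned while at least one rule uses
the object, which is the property the list above is after.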

> >> +               /*
> >> +                * Releases @object if it is not already released (e.g. with
> >> +                * landlock_release_object()).
> >> +                */
> >> +               release_object(object);
> >> +               /*
> >> +                * Unbalances the @object->cleaners counter to reflect the
> >> +                * underlying object release.
> >> +                */
> >> +               __acquire(object->cleaners);
> >> +               put_object_free(object);
> >> +       }
> >> +       put_object_cleaner(object_cleaner);
> >> +}
[...]
> >> +static inline bool landlock_rule_is_disabled(
> >> +               struct landlock_rule *rule)
> >> +{
> >> +       /*
> >> +        * Disabling (i.e. unlinking) a landlock_rule is a one-way operation.
> >> +        * It is not possible to re-enable such a rule, then there is no need
> >> +        * for smp_load_acquire().
> >> +        *
> >> +        * LIST_POISON2 is set by list_del() and list_del_rcu().
> >> +        */
> >> +       return !rule || READ_ONCE(rule->list.prev) == LIST_POISON2;
> >
> > You're not allowed to do this, the comment above list_del() states:
> >
> >  * Note: list_empty() on entry does not return true after this, the entry is
> >  * in an undefined state.
>
> list_del() checks READ_ONCE(head->next) == head, but
> landlock_rule_is_disabled() checks READ_ONCE(rule->list.prev) ==
> LIST_POISON2.
> The comment about LIST_POISON2 is right but may be misleading. There is
> no use of list_empty() with a landlock_rule->list, only
> landlock_object->rules. The only list_del() is in landlock_put_rule()
> when there is a guarantee that there is no other reference to it, hence
> no possible use of landlock_rule_is_disabled() with this rule. I could
> replace it with a call to list_del_rcu() to make it more consistent.
>
> >
> > If you want to be able to test whether the element is on a list
> > afterwards, use stuff like list_del_init().
>
> There is no need to re-initialize the list but using list_del_init() and
> list_empty() could work too. However, there is no list_del_init_rcu()
> helper. Moreover, resetting the list's pointer with LIST_POISON2 might
> help to detect bugs.

Either way, you are currently using the list_head API in a way that
goes against what the header documents. If you want to rely on
list_del() bringing the object into a specific state, then you can't
leave the comment above list_del() as-is that says that it puts the
object in an undefined state; and this kind of check should probably
be done in a helper in list.h instead of open-coding the check for
LIST_POISON2.
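
For reference, the difference between the two deletion styles can be
modelled in userspace as below. This is a simplified sketch and not the
kernel's list.h: the node_* helpers are made up, and the MODEL_POISON*
values are arbitrary placeholders rather than the kernel's constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct node {
	struct node *next, *prev;
};

/* Arbitrary marker values for this model only. */
#define MODEL_POISON1 ((struct node *)(uintptr_t)0x100)
#define MODEL_POISON2 ((struct node *)(uintptr_t)0x200)

static void node_init(struct node *n)
{
	n->next = n;
	n->prev = n;
}

/* Inserts @n right after @head. */
static void node_add(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void node_unlink(struct node *n)
{
	/* The entry must currently be linked. */
	assert(n->prev->next == n && n->next->prev == n);
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* list_del()-like: the entry is left poisoned, not empty. */
static void node_del(struct node *n)
{
	node_unlink(n);
	n->next = MODEL_POISON1;
	n->prev = MODEL_POISON2;
}

/* list_del_init()-like: the entry is re-initialized to an empty list. */
static void node_del_init(struct node *n)
{
	node_unlink(n);
	node_init(n);
}

/* list_empty()-like check. */
static bool node_empty(const struct node *n)
{
	return n->next == n;
}

/* The kind of open-coded check discussed above. */
static bool node_is_poisoned(const struct node *n)
{
	return n->prev == MODEL_POISON2;
}
```

After node_del() the entry is not "empty" and can only be recognized
through the poison value, whereas after node_del_init() the documented
emptiness check works on the entry itself.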


* Re: [RFC PATCH v14 05/10] fs,landlock: Support filesystem access-control
  2020-02-24 16:02 ` [RFC PATCH v14 05/10] fs,landlock: Support filesystem access-control Mickaël Salaün
@ 2020-02-26 20:29   ` Jann Horn
  2020-02-27 16:50     ` Mickaël Salaün
  0 siblings, 1 reply; 34+ messages in thread
From: Jann Horn @ 2020-02-26 20:29 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: kernel list, Al Viro, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Greg Kroah-Hartman, James Morris, Jann Horn,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, Kernel Hardening, Linux API, linux-arch,
	linux-doc, linux-fsdevel, open list:KERNEL SELFTEST FRAMEWORK,
	linux-security-module, the arch/x86 maintainers

On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün <mic@digikod.net> wrote:
> +static inline u32 get_mem_access(unsigned long prot, bool private)
> +{
> +       u32 access = LANDLOCK_ACCESS_FS_MAP;
> +
> +       /* Private mapping do not write to files. */
> +       if (!private && (prot & PROT_WRITE))
> +               access |= LANDLOCK_ACCESS_FS_WRITE;
> +       if (prot & PROT_READ)
> +               access |= LANDLOCK_ACCESS_FS_READ;
> +       if (prot & PROT_EXEC)
> +               access |= LANDLOCK_ACCESS_FS_EXECUTE;
> +       return access;
> +}

When I do the following, is landlock going to detect that the mmap()
is a read access, or is it incorrectly going to think that it's
neither read nor write?

$ cat write-only.c
#include <fcntl.h>
#include <sys/mman.h>
#include <stdio.h>
int main(void) {
  int fd = open("/etc/passwd", O_RDONLY);
  char *ptr = mmap(NULL, 0x1000, PROT_WRITE, MAP_PRIVATE, fd, 0);
  printf("'%.*s'\n", 4, ptr);
}
$ gcc -o write-only write-only.c -Wall
$ ./write-only
'root'
$


* Re: [RFC PATCH v14 01/10] landlock: Add object and rule management
  2020-02-26 20:24       ` Jann Horn
@ 2020-02-27 16:46         ` Mickaël Salaün
  0 siblings, 0 replies; 34+ messages in thread
From: Mickaël Salaün @ 2020-02-27 16:46 UTC (permalink / raw)
  To: Jann Horn
  Cc: kernel list, Al Viro, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Greg Kroah-Hartman, James Morris, Jann Horn,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, Kernel Hardening, Linux API, linux-arch,
	linux-doc, linux-fsdevel, open list:KERNEL SELFTEST FRAMEWORK,
	linux-security-module, the arch/x86 maintainers


On 26/02/2020 21:24, Jann Horn wrote:
> On Wed, Feb 26, 2020 at 4:32 PM Mickaël Salaün <mic@digikod.net> wrote:
>> On 25/02/2020 21:49, Jann Horn wrote:
>>> On Mon, Feb 24, 2020 at 5:05 PM Mickaël Salaün <mic@digikod.net> wrote:
>>>> A Landlock object enables to identify a kernel object (e.g. an inode).
>>>> A Landlock rule is a set of access rights allowed on an object.  Rules
>>>> are grouped in rulesets that may be tied to a set of processes (i.e.
>>>> subjects) to enforce a scoped access-control (i.e. a domain).
>>>>
>>>> Because Landlock's goal is to empower any process (especially
>>>> unprivileged ones) to sandbox themselves, we can't rely on a system-wide
>>>> object identification such as file extended attributes.  Indeed, we need
>>>> innocuous, composable and modular access-controls.
>>>>
>>>> The main challenge with this constraints is to identify kernel objects
>>>> while this identification is useful (i.e. when a security policy makes
>>>> use of this object).  But this identification data should be freed once
>>>> no policy is using it.  This ephemeral tagging should not and may not be
>>>> written in the filesystem.  We then need to manage the lifetime of a
>>>> rule according to the lifetime of its object.  To avoid a global lock,
>>>> this implementation make use of RCU and counters to safely reference
>>>> objects.
>>>>
>>>> A following commit uses this generic object management for inodes.
> [...]
>>>> +config SECURITY_LANDLOCK
>>>> +       bool "Landlock support"
>>>> +       depends on SECURITY
>>>> +       default n
>>>
>>> (I think "default n" is implicit?)
>>
>> It seems that most (all?) Kconfig are written like this.
> 
> See e.g. <https://lore.kernel.org/lkml/c187bb77-e804-93bd-64db-9418be58f191@infradead.org/>.

Ok, done.

> 
> [...]
>>>> +       return object;
>>>> +}
>>>> +
>>>> +struct landlock_object *landlock_get_object(struct landlock_object *object)
>>>> +       __acquires(object->usage)
>>>> +{
>>>> +       __acquire(object->usage);
>>>> +       /*
>>>> +        * If @object->usage equal 0, then it will be ignored by writers, and
>>>> +        * underlying_object->object may be replaced, but this is not an issue
>>>> +        * for release_object().
>>>> +        */
>>>> +       if (object && refcount_inc_not_zero(&object->usage)) {
>>>> +               /*
>>>> +                * It should not be possible to get a reference to an object if
>>>> +                * its underlying object is being terminated (e.g. with
>>>> +                * landlock_release_object()), because an object is only
>>>> +                * modifiable through such underlying object.  This is not the
>>>> +                * case with landlock_get_object_cleaner().
>>>> +                */
>>>> +               WARN_ON_ONCE(!READ_ONCE(object->underlying_object));
>>>> +               return object;
>>>> +       }
>>>> +       return NULL;
>>>> +}
>>>> +
>>>> +static struct landlock_object *get_object_cleaner(
>>>> +               struct landlock_object *object)
>>>> +       __acquires(object->cleaners)
>>>> +{
>>>> +       __acquire(object->cleaners);
>>>> +       if (object && refcount_inc_not_zero(&object->cleaners))
>>>> +               return object;
>>>> +       return NULL;
>>>> +}
>>>
>>> I don't get this whole "cleaners" thing. Can you give a quick
>>> description of why this is necessary, and what benefits it has over a
>>> standard refcounting+RCU scheme? I don't immediately see anything that
>>> requires this.
>>
>> This indeed needs more documentation here. Here is a comment I'll add to
>> get_object_cleaner():
>>
>> This enables to safely get a reference to an object to potentially free
>> it if it is not already being freed by a concurrent thread.
> 
> "get a reference to an object to potentially free it" just sounds all
> wrong to me. You free an object when you're *dropping* a reference to
> it. Your refcounting scheme doesn't fit my mental models of how normal
> refcounting works at all...

Unfortunately, as I explain below, it is a bit tricky.

> 
> [...]
>>>> +/*
>>>> + * Putting an object is easy when the object is being terminated, but it is
>>>> + * much more tricky when the reason is that there is no more rule tied to this
>>>> + * object.  Indeed, new rules could be added at the same time.
>>>> + */
>>>> +void landlock_put_object(struct landlock_object *object)
>>>> +       __releases(object->usage)
>>>> +{
>>>> +       struct landlock_object *object_cleaner;
>>>> +
>>>> +       __release(object->usage);
>>>> +       might_sleep();
>>>> +       if (!object)
>>>> +               return;
>>>> +       /*
>>>> +        * Guards against concurrent termination to be able to terminate
>>>> +        * @object if it is empty and not referenced by another rule-appender
>>>> +        * other than the underlying object.
>>>> +        */
>>>> +       object_cleaner = get_object_cleaner(object);
> [...]
>>>> +       /*
>>>> +        * Decrements @object->usage and if it reach zero, also decrement
>>>> +        * @object->cleaners.  If both reach zero, then release and free
>>>> +        * @object.
>>>> +        */
>>>> +       if (refcount_dec_and_test(&object->usage)) {
>>>> +               struct landlock_rule *rule_walker, *rule_walker2;
>>>> +
>>>> +               spin_lock(&object->lock);
>>>> +               /*
>>>> +                * Disables all the rules tied to @object when it is forbidden
>>>> +                * to add new rule but still allowed to remove them with
>>>> +                * landlock_put_rule().  This is crucial to be able to safely
>>>> +                * free a rule according to landlock_rule_is_disabled().
>>>> +                */
>>>> +               list_for_each_entry_safe(rule_walker, rule_walker2,
>>>> +                               &object->rules, list)
>>>> +                       list_del_rcu(&rule_walker->list);
> 
> So... rules don't take references on the landlock_objects they use?
> Instead, the landlock_object knows which rules use it, and when the
> landlock_object goes away, it nukes all the rules associated with
> itself?

Right.

> 
> That seems terrible to me - AFAICS it means that if some random
> process decides to install a landlock rule that uses inode X, and then
> that process dies together with all its landlock rules, the inode
> still stays pinned in kernel memory as long as the superblock is
> mounted. In other words, it's a resource leak.

That is not correct. When no process is enforced by a domain/ruleset
anymore, this domain is terminated, which means that every rule linked
to this domain is put away. When the usage counter of a rule reaches
zero, the rule is terminated with landlock_put_rule(), which unlinks the
rule from its object and cleans this object. Cleaning frees the object
if no rule is tied to it anymore, thanks to put_object_cleaner().

When the underlying object is terminated, landlock_release_object() also
decrements the usage counter. However, if a concurrent thread is adding
a new rule, the usage counter stays greater than zero while the new rule
is being added, and only drops to zero at the end of this addition. This
can then unbalance the "cleaners" counter, which finally leads to the
object being freed. This design makes it possible to add rules without
locking (if the object already exists). While this property is
interesting from a performance point of view, the main reason is to
avoid unnecessary locking between processes (especially from different
domains).

> (And if I'm not missing
> something in patch 5, that applies even if the inode has been
> unlinked?)

That is true for now, but only because I haven't yet found the right
spot to call landlock_release_inode(). Indeed, unlinking a file may not
terminate an inode because it can still be open by a process, and
freeing an object when the underlying object is unlinked could be a way
to bypass a check on that object/inode.

Do you know where the best spot is to identify the last userspace
reference (through the filesystem or a file descriptor) to an inode?
Fnotify doesn't seem to check for that.


> 
> Can you please refactor your refcounting as follows?
> 
>  - A rule takes a reference on each landlock_object it uses.
>  - A landlock_object takes a reference on the underlying object (just like now).
>  - The underlying object *DOES NOT* take a reference on the
> landlock_object (unlike now); the reference from the underlying object
> to the landlock_object has weak pointer semantics.

We need to increment the reference counter of the underlying objects
(i.e. inodes) so as not to lose the link with their Landlock object, and
hence the related access-control. For instance, suppose a struct inode
(e.g. a directory) is first tied to a Landlock object/access-control,
and the kernel then decides to free it because the inode is neither open
nor used by any process. When a process later tries to access a file
beneath this directory, there will not be any Landlock object tied to
it, and the requested access might then be forbidden (whereas the
initial policy allowed it).

>  - When a landlock_object's refcount drops to zero (iow no rules use
> it anymore), it is freed.

Before the current design, I used a similar pattern, but this is not
necessary because of the management of the underlying object lifetime.
The list_empty() check is enough, and because we need to handle
concurrent termination, the object's usage counter for the rules seems
unnecessary.

> 
> That might also help get rid of the awkward ->cleaners thing?
> 
>>>> +               /*
>>>> +                * Releases @object if it is not already released (e.g. with
>>>> +                * landlock_release_object()).
>>>> +                */
>>>> +               release_object(object);
>>>> +               /*
>>>> +                * Unbalances the @object->cleaners counter to reflect the
>>>> +                * underlying object release.
>>>> +                */
>>>> +               __acquire(object->cleaners);
>>>> +               put_object_free(object);
>>>> +       }
>>>> +       put_object_cleaner(object_cleaner);
>>>> +}
> [...]
>>>> +static inline bool landlock_rule_is_disabled(
>>>> +               struct landlock_rule *rule)
>>>> +{
>>>> +       /*
>>>> +        * Disabling (i.e. unlinking) a landlock_rule is a one-way operation.
>>>> +        * It is not possible to re-enable such a rule, then there is no need
>>>> +        * for smp_load_acquire().
>>>> +        *
>>>> +        * LIST_POISON2 is set by list_del() and list_del_rcu().
>>>> +        */
>>>> +       return !rule || READ_ONCE(rule->list.prev) == LIST_POISON2;
>>>
>>> You're not allowed to do this, the comment above list_del() states:
>>>
>>>  * Note: list_empty() on entry does not return true after this, the entry is
>>>  * in an undefined state.
>>
>> list_del() checks READ_ONCE(head->next) == head, but
>> landlock_rule_is_disabled() checks READ_ONCE(rule->list.prev) ==
>> LIST_POISON2.
>> The comment about LIST_POISON2 is right but may be misleading. There is
>> no use of list_empty() with a landlock_rule->list, only
>> landlock_object->rules. The only list_del() is in landlock_put_rule()
>> when there is a guarantee that there is no other reference to it, hence
>> no possible use of landlock_rule_is_disabled() with this rule. I could
>> replace it with a call to list_del_rcu() to make it more consistent.
>>
>>>
>>> If you want to be able to test whether the element is on a list
>>> afterwards, use stuff like list_del_init().
>>
>> There is no need to re-initialize the list but using list_del_init() and
>> list_empty() could work too. However, there is no list_del_init_rcu()
>> helper. Moreover, resetting the list's pointer with LIST_POISON2 might
>> help to detect bugs.
> 
> Either way, you are currently using the list_head API in a way that
> goes against what the header documents. If you want to rely on
> list_del() bringing the object into a specific state, then you can't
> leave the comment above list_del() as-is that says that it puts the
> object in an undefined state; and this kind of check should probably
> be done in a helper in list.h instead of open-coding the check for
> LIST_POISON2.

In the case of Landlock, it is illegal to use or recycle a rule which
was untied from its (initial) object. There is no use of
list_empty(&landlock_rule->list), only
landlock_rule_is_disabled(landlock_rule). The LIST_POISON2 might help to
identify such misuse.


* Re: [RFC PATCH v14 05/10] fs,landlock: Support filesystem access-control
  2020-02-26 20:29   ` Jann Horn
@ 2020-02-27 16:50     ` Mickaël Salaün
  2020-02-27 16:51       ` Jann Horn
  0 siblings, 1 reply; 34+ messages in thread
From: Mickaël Salaün @ 2020-02-27 16:50 UTC (permalink / raw)
  To: Jann Horn
  Cc: kernel list, Al Viro, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Greg Kroah-Hartman, James Morris, Jann Horn,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, Kernel Hardening, Linux API, linux-arch,
	linux-doc, linux-fsdevel, open list:KERNEL SELFTEST FRAMEWORK,
	linux-security-module, the arch/x86 maintainers



On 26/02/2020 21:29, Jann Horn wrote:
> On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün <mic@digikod.net> wrote:
>> +static inline u32 get_mem_access(unsigned long prot, bool private)
>> +{
>> +       u32 access = LANDLOCK_ACCESS_FS_MAP;
>> +
>> +       /* Private mapping do not write to files. */
>> +       if (!private && (prot & PROT_WRITE))
>> +               access |= LANDLOCK_ACCESS_FS_WRITE;
>> +       if (prot & PROT_READ)
>> +               access |= LANDLOCK_ACCESS_FS_READ;
>> +       if (prot & PROT_EXEC)
>> +               access |= LANDLOCK_ACCESS_FS_EXECUTE;
>> +       return access;
>> +}
> 
> When I do the following, is landlock going to detect that the mmap()
> is a read access, or is it incorrectly going to think that it's
> neither read nor write?
> 
> $ cat write-only.c
> #include <fcntl.h>
> #include <sys/mman.h>
> #include <stdio.h>
> int main(void) {
>   int fd = open("/etc/passwd", O_RDONLY);
>   char *ptr = mmap(NULL, 0x1000, PROT_WRITE, MAP_PRIVATE, fd, 0);
>   printf("'%.*s'\n", 4, ptr);
> }
> $ gcc -o write-only write-only.c -Wall
> $ ./write-only
> 'root'
> $
> 

Thanks to the "if (!private && (prot & PROT_WRITE))", Landlock allows
this private mmap (as intended) even if there is no write access to this
file, but not with a shared mmap (and a file opened with O_RDWR). I just
added a test for this to be sure.
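
To spell out which access bits the quoted helper requests for each
kind of mapping, here is a userspace copy of its logic. This is only a
sketch: the SK_ACCESS_FS_* bit values are placeholders chosen for this
example (the real LANDLOCK_ACCESS_FS_* values are defined by the
series), and sk_get_mem_access() is a made-up name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/mman.h>

/* Placeholder bit values for illustration only. */
#define SK_ACCESS_FS_READ    (1u << 0)
#define SK_ACCESS_FS_WRITE   (1u << 1)
#define SK_ACCESS_FS_EXECUTE (1u << 2)
#define SK_ACCESS_FS_MAP     (1u << 3)

/* Same decision logic as the quoted get_mem_access(). */
static uint32_t sk_get_mem_access(unsigned long prot, bool private_mapping)
{
	uint32_t access = SK_ACCESS_FS_MAP;

	/* Private mappings do not write to files. */
	if (!private_mapping && (prot & PROT_WRITE))
		access |= SK_ACCESS_FS_WRITE;
	if (prot & PROT_READ)
		access |= SK_ACCESS_FS_READ;
	if (prot & PROT_EXEC)
		access |= SK_ACCESS_FS_EXECUTE;
	return access;
}
```

For the write-only.c case quoted above (PROT_WRITE with MAP_PRIVATE),
this returns only the MAP bit, so neither read nor write access is
requested from the policy for that mapping.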

However, I'm not sure this hook is useful for now. Indeed, the process
still needs to have a file descriptor open with the right accesses.


* Re: [RFC PATCH v14 05/10] fs,landlock: Support filesystem access-control
  2020-02-27 16:50     ` Mickaël Salaün
@ 2020-02-27 16:51       ` Jann Horn
  0 siblings, 0 replies; 34+ messages in thread
From: Jann Horn @ 2020-02-27 16:51 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: kernel list, Al Viro, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Greg Kroah-Hartman, James Morris, Jann Horn,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, Kernel Hardening, Linux API, linux-arch,
	linux-doc, linux-fsdevel, open list:KERNEL SELFTEST FRAMEWORK,
	linux-security-module, the arch/x86 maintainers

On Thu, Feb 27, 2020 at 5:50 PM Mickaël Salaün <mic@digikod.net> wrote:
> On 26/02/2020 21:29, Jann Horn wrote:
> > On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün <mic@digikod.net> wrote:
> >> +static inline u32 get_mem_access(unsigned long prot, bool private)
> >> +{
> >> +       u32 access = LANDLOCK_ACCESS_FS_MAP;
> >> +
> >> +       /* Private mapping do not write to files. */
> >> +       if (!private && (prot & PROT_WRITE))
> >> +               access |= LANDLOCK_ACCESS_FS_WRITE;
> >> +       if (prot & PROT_READ)
> >> +               access |= LANDLOCK_ACCESS_FS_READ;
> >> +       if (prot & PROT_EXEC)
> >> +               access |= LANDLOCK_ACCESS_FS_EXECUTE;
> >> +       return access;
> >> +}
[...]
> However, I'm not sure this hook is useful for now. Indeed, the process
> still need to have a file descriptor open with the right accesses.

Yeah, agreed.


* Re: [RFC PATCH v14 01/10] landlock: Add object and rule management
       [not found] ` <20200227042002.3032-1-hdanton@sina.com>
@ 2020-02-27 17:01   ` Mickaël Salaün
  0 siblings, 0 replies; 34+ messages in thread
From: Mickaël Salaün @ 2020-02-27 17:01 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-kernel, Al Viro, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Greg Kroah-Hartman, James Morris, Jann Horn,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E. Hallyn, Shuah Khan,
	Vincent Dagonneau, kernel-hardening, linux-api, linux-arch,
	linux-doc, linux-fsdevel, linux-kselftest, linux-security-module,
	x86



On 27/02/2020 05:20, Hillf Danton wrote:
> 
> On Mon, 24 Feb 2020 17:02:06 +0100 Mickaël Salaün 
>> A Landlock object enables to identify a kernel object (e.g. an inode).
>> A Landlock rule is a set of access rights allowed on an object.  Rules
>> are grouped in rulesets that may be tied to a set of processes (i.e.
>> subjects) to enforce a scoped access-control (i.e. a domain).
>>
>> Because Landlock's goal is to empower any process (especially
>> unprivileged ones) to sandbox themselves, we can't rely on a system-wide
>> object identification such as file extended attributes.  Indeed, we need
>> innocuous, composable and modular access-controls.
>>
>> The main challenge with this constraints is to identify kernel objects
>> while this identification is useful (i.e. when a security policy makes
>> use of this object).  But this identification data should be freed once
>> no policy is using it.  This ephemeral tagging should not and may not be
>> written in the filesystem.  We then need to manage the lifetime of a
>> rule according to the lifetime of its object.  To avoid a global lock,
>> this implementation make use of RCU and counters to safely reference
>> objects.
>>
>> A following commit uses this generic object management for inodes.
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: James Morris <jmorris@namei.org>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Serge E. Hallyn <serge@hallyn.com>
>> ---
>>
>> Changes since v13:
>> * New dedicated implementation, removing the need for eBPF.
>>
>> Previous version:
>> https://lore.kernel.org/lkml/20190721213116.23476-6-mic@digikod.net/
>> ---
>>  MAINTAINERS                |  10 ++
>>  security/Kconfig           |   1 +
>>  security/Makefile          |   2 +
>>  security/landlock/Kconfig  |  15 ++
>>  security/landlock/Makefile |   3 +
>>  security/landlock/object.c | 339 +++++++++++++++++++++++++++++++++++++
>>  security/landlock/object.h | 134 +++++++++++++++
>>  7 files changed, 504 insertions(+)
>>  create mode 100644 security/landlock/Kconfig
>>  create mode 100644 security/landlock/Makefile
>>  create mode 100644 security/landlock/object.c
>>  create mode 100644 security/landlock/object.h
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index fcd79fc38928..206f85768cd9 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -9360,6 +9360,16 @@ F:	net/core/skmsg.c
>>  F:	net/core/sock_map.c
>>  F:	net/ipv4/tcp_bpf.c
>>  
>> +LANDLOCK SECURITY MODULE
>> +M:	Mickaël Salaün <mic@digikod.net>
>> +L:	linux-security-module@vger.kernel.org
>> +W:	https://landlock.io
>> +T:	git https://github.com/landlock-lsm/linux.git
>> +S:	Supported
>> +F:	security/landlock/
>> +K:	landlock
>> +K:	LANDLOCK
>> +
>>  LANTIQ / INTEL Ethernet drivers
>>  M:	Hauke Mehrtens <hauke@hauke-m.de>
>>  L:	netdev@vger.kernel.org
>> diff --git a/security/Kconfig b/security/Kconfig
>> index 2a1a2d396228..9d9981394fb0 100644
>> --- a/security/Kconfig
>> +++ b/security/Kconfig
>> @@ -238,6 +238,7 @@ source "security/loadpin/Kconfig"
>>  source "security/yama/Kconfig"
>>  source "security/safesetid/Kconfig"
>>  source "security/lockdown/Kconfig"
>> +source "security/landlock/Kconfig"
>>  
>>  source "security/integrity/Kconfig"
>>  
>> diff --git a/security/Makefile b/security/Makefile
>> index 746438499029..2472ef96d40a 100644
>> --- a/security/Makefile
>> +++ b/security/Makefile
>> @@ -12,6 +12,7 @@ subdir-$(CONFIG_SECURITY_YAMA)		+= yama
>>  subdir-$(CONFIG_SECURITY_LOADPIN)	+= loadpin
>>  subdir-$(CONFIG_SECURITY_SAFESETID)    += safesetid
>>  subdir-$(CONFIG_SECURITY_LOCKDOWN_LSM)	+= lockdown
>> +subdir-$(CONFIG_SECURITY_LANDLOCK)		+= landlock
>>  
>>  # always enable default capabilities
>>  obj-y					+= commoncap.o
>> @@ -29,6 +30,7 @@ obj-$(CONFIG_SECURITY_YAMA)		+= yama/
>>  obj-$(CONFIG_SECURITY_LOADPIN)		+= loadpin/
>>  obj-$(CONFIG_SECURITY_SAFESETID)       += safesetid/
>>  obj-$(CONFIG_SECURITY_LOCKDOWN_LSM)	+= lockdown/
>> +obj-$(CONFIG_SECURITY_LANDLOCK)	+= landlock/
>>  obj-$(CONFIG_CGROUP_DEVICE)		+= device_cgroup.o
>>  
>>  # Object integrity file lists
>> diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
>> new file mode 100644
>> index 000000000000..4a321d5b3f67
>> --- /dev/null
>> +++ b/security/landlock/Kconfig
>> @@ -0,0 +1,15 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +
>> +config SECURITY_LANDLOCK
>> +	bool "Landlock support"
>> +	depends on SECURITY
>> +	default n
>> +	help
>> +	  This selects Landlock, a safe sandboxing mechanism.  It enables to
>> +	  restrict processes on the fly (i.e. enforce an access control policy),
>> +	  which can complement seccomp-bpf.  The security policy is a set of access
>> +	  rights tied to an object, which could be a file, a socket or a process.
>> +
>> +	  See Documentation/security/landlock/ for further information.
>> +
>> +	  If you are unsure how to answer this question, answer N.
>> diff --git a/security/landlock/Makefile b/security/landlock/Makefile
>> new file mode 100644
>> index 000000000000..cb6deefbf4c0
>> --- /dev/null
>> +++ b/security/landlock/Makefile
>> @@ -0,0 +1,3 @@
>> +obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
>> +
>> +landlock-y := object.o
>> diff --git a/security/landlock/object.c b/security/landlock/object.c
>> new file mode 100644
>> index 000000000000..38fbbb108120
>> --- /dev/null
>> +++ b/security/landlock/object.c
>> @@ -0,0 +1,339 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Landlock LSM - Object and rule management
>> + *
>> + * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
>> + * Copyright © 2018-2020 ANSSI
>> + *
>> + * Principles and constraints of the object and rule management:
>> + * - Do not leak memory.
>> + * - Try as much as possible to free a memory allocation as soon as it is
>> + *   unused.
>> + * - Do not use global lock.
>> + * - Do not charge processes other than the one requesting a Landlock
>> + *   operation.
>> + */
>> +
>> +#include <linux/bug.h>
>> +#include <linux/compiler.h>
>> +#include <linux/compiler_types.h>
>> +#include <linux/err.h>
>> +#include <linux/errno.h>
>> +#include <linux/fs.h>
>> +#include <linux/kernel.h>
>> +#include <linux/list.h>
>> +#include <linux/rbtree.h>
>> +#include <linux/rcupdate.h>
>> +#include <linux/refcount.h>
>> +#include <linux/slab.h>
>> +#include <linux/spinlock.h>
>> +#include <linux/workqueue.h>
>> +
>> +#include "object.h"
>> +
>> +struct landlock_object *landlock_create_object(
>> +		const enum landlock_object_type type, void *underlying_object)
>> +{
>> +	struct landlock_object *object;
>> +
>> +	if (WARN_ON_ONCE(!underlying_object))
>> +		return NULL;
>> +	object = kzalloc(sizeof(*object), GFP_KERNEL);
>> +	if (!object)
>> +		return NULL;
>> +	refcount_set(&object->usage, 1);
>> +	refcount_set(&object->cleaners, 1);
>> +	spin_lock_init(&object->lock);
>> +	INIT_LIST_HEAD(&object->rules);
>> +	object->type = type;
>> +	WRITE_ONCE(object->underlying_object, underlying_object);
>> +	return object;
>> +}
>> +
>> +struct landlock_object *landlock_get_object(struct landlock_object *object)
>> +	__acquires(object->usage)
>> +{
>> +	__acquire(object->usage);
>> +	/*
>> +	 * If @object->usage equal 0, then it will be ignored by writers, and
>> +	 * underlying_object->object may be replaced, but this is not an issue
>> +	 * for release_object().
>> +	 */
>> +	if (object && refcount_inc_not_zero(&object->usage)) {
>> +		/*
>> +		 * It should not be possible to get a reference to an object if
>> +		 * its underlying object is being terminated (e.g. with
>> +		 * landlock_release_object()), because an object is only
>> +		 * modifiable through such underlying object.  This is not the
>> +		 * case with landlock_get_object_cleaner().
>> +		 */
>> +		WARN_ON_ONCE(!READ_ONCE(object->underlying_object));
>> +		return object;
>> +	}
>> +	return NULL;
>> +}
>> +
>> +static struct landlock_object *get_object_cleaner(
>> +		struct landlock_object *object)
>> +	__acquires(object->cleaners)
>> +{
>> +	__acquire(object->cleaners);
>> +	if (object && refcount_inc_not_zero(&object->cleaners))
>> +		return object;
>> +	return NULL;
>> +}
>> +
>> +/*
>> + * There is two cases when an object should be free and the reference to the
>> + * underlying object should be put:
>> + * - when the last rule tied to this object is removed, which is handled by
>> + *   landlock_put_rule() and then release_object();
>> + * - when the object is being terminated (e.g. no more reference to an inode),
>> + *   which is handled by landlock_put_object().
>> + */
>> +static void put_object_free(struct landlock_object *object)
>> +	__releases(object->cleaners)
>> +{
>> +	__release(object->cleaners);
>> +	if (!refcount_dec_and_test(&object->cleaners))
>> +		return;
>> +	WARN_ON_ONCE(refcount_read(&object->usage));
>> +	/*
>> +	 * Ensures a safe use of @object in the RCU block from
>> +	 * landlock_put_rule().
>> +	 */
>> +	kfree_rcu(object, rcu_free);
>> +}
>> +
>> +/*
>> + * Destroys a newly created and useless object.
>> + */
>> +void landlock_drop_object(struct landlock_object *object)
>> +{
>> +	if (WARN_ON_ONCE(!refcount_dec_and_test(&object->usage)))
>> +		return;
>> +	__acquire(object->cleaners);
>> +	put_object_free(object);
>> +}
>> +
>> +/*
>> + * Puts the underlying object (e.g. inode) if it is the first request to
>> + * release @object, without calling landlock_put_object().
>> + *
>> + * Return true if this call effectively marks @object as released, false
>> + * otherwise.
>> + */
>> +static bool release_object(struct landlock_object *object)
>> +	__releases(&object->lock)
>> +{
>> +	void *underlying_object;
>> +
>> +	lockdep_assert_held(&object->lock);
>> +
>> +	underlying_object = xchg(&object->underlying_object, NULL);
> 
> A one-line comment seems to be needed for the xchg().

OK. This guarantees that the underlying_object (e.g. the inode pointer)
is only used once. I'll add a comment.
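
The "claim once" pattern can be sketched in user space with C11 atomics,
where atomic_exchange() plays the role of the kernel's xchg(). The struct
and function names below are hypothetical and only model the single field
involved.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct landlock_object, reduced to one field. */
struct demo_slot {
	_Atomic(void *) underlying_object;
};

/*
 * Returns the underlying pointer to the first caller only; every later
 * caller gets NULL, so the pointer can be released exactly once.
 */
static void *claim_underlying_object(struct demo_slot *slot)
{
	return atomic_exchange(&slot->underlying_object, NULL);
}
```

Whichever caller receives the non-NULL pointer is the one responsible for
putting it; all others see NULL and back off.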

> 
>> +	spin_unlock(&object->lock);
>> +	might_sleep();
> 
> I have trouble working out what might_sleep() is there for.

Patch 5 adds a call to landlock_release_inode(underlying_object, object)
(LANDLOCK_OBJECT_INODE case), which can sleep, e.g. with a call to iput().

> 
>> +	if (!underlying_object)
>> +		return false;
>> +
>> +	switch (object->type) {
>> +	case LANDLOCK_OBJECT_INODE:
>> +		break;
>> +	default:
>> +		WARN_ON_ONCE(1);
>> +	}
>> +	return true;
>> +}
>> +
>> +static void put_object_cleaner(struct landlock_object *object)
>> +	__releases(object->cleaners)
>> +{
>> +	/* Let's try an early lockless check. */
>> +	if (list_empty(&object->rules) &&
>> +			READ_ONCE(object->underlying_object)) {
>> +		/*
>> +		 * Puts @object if there is no rule tied to it and the
>> +		 * remaining user is the underlying object.  This check is
>> +		 * atomic because @object->rules and @object->underlying_object
>> +		 * are protected by @object->lock.
>> +		 */
>> +		spin_lock(&object->lock);
>> +		if (list_empty(&object->rules) &&
>> +				READ_ONCE(object->underlying_object) &&
>> +				refcount_dec_if_one(&object->usage)) {
>> +			/*
>> +			 * Releases @object, in place of
>> +			 * landlock_release_object().
>> +			 *
>> +			 * @object is already empty, implying that all its
>> +			 * previous rules are already disabled.
>> +			 *
>> +			 * Unbalance the @object->cleaners counter to reflect
>> +			 * the underlying object release.
>> +			 */
>> +			if (!WARN_ON_ONCE(!release_object(object))) {
> 
> Two ! hurt more than help.

Well, it may not look nice but don't you think it is better than a
WARN_ON_ONCE(1) in the if block?
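
To make the two options concrete, here is a user-space sketch. warn_on() is
a hypothetical stand-in for WARN_ON_ONCE() (it counts hits instead of
printing a backtrace, and has no "once" logic), and the release result is
passed in as a plain bool.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space stand-in for WARN_ON_ONCE(): counts hits and
 * returns the tested condition, as the kernel macro does.
 */
static int warn_hits;

static bool warn_on(bool condition)
{
	if (condition)
		warn_hits++;
	return condition;
}

/* Form used in the patch: the warning wraps the failure test. */
static bool release_compact(bool released)
{
	if (!warn_on(!released))
		return true;	/* normal path */
	return false;
}

/* Alternative form: explicit failure path with an unconditional warning. */
static bool release_explicit(bool released)
{
	if (released)
		return true;	/* normal path */
	warn_on(true);
	return false;
}
```

Both forms warn exactly when the release fails and take the normal path
otherwise; the difference is only in how the condition reads.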

>> +				__acquire(object->cleaners);
>> +				put_object_free(object);
> 
> Why put object more than once?

I just replied to Jann about this subject. This is to "unbalance" the
counter so that the object can be freed (if there are no more users). I
explain it here:
https://lore.kernel.org/lkml/67465638-e22c-5d1a-df37-862b31d999a1@digikod.net/
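
As a rough illustration of the idea, the sketch below models only the
@cleaners counter in user space with C11 atomics. All names are
hypothetical, the locking and the @usage counter are left out, and a
boolean stands in for kfree_rcu().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the @cleaners counter only. */
struct demo_counted {
	atomic_int cleaners;	/* starts at 1, held for the underlying object */
	bool freed;		/* stands in for kfree_rcu() */
};

static void demo_init(struct demo_counted *object)
{
	atomic_init(&object->cleaners, 1);
	object->freed = false;
}

/* Takes a temporary cleaner reference; fails once the counter hit zero. */
static bool demo_get_cleaner(struct demo_counted *object)
{
	int old = atomic_load(&object->cleaners);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&object->cleaners, &old,
						 old + 1))
			return true;
	}
	return false;
}

/* Drops one reference and "frees" the object on the last one. */
static void demo_put_free(struct demo_counted *object)
{
	if (atomic_fetch_sub(&object->cleaners, 1) == 1)
		object->freed = true;
}

/*
 * Models put_object_cleaner(): when this caller is the one releasing the
 * underlying object, it also drops the initial reference (the "unbalance"),
 * then drops its own.
 */
static void demo_put_cleaner(struct demo_counted *object, bool released_here)
{
	if (released_here)
		demo_put_free(object);
	demo_put_free(object);
}
```

A cleaner that does not release the underlying object leaves the counter
at its initial value of 1; the cleaner that does release it performs the
extra put, so its own put brings the counter to zero and frees the object.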

> 
>> +			}
>> +		} else {
>> +			spin_unlock(&object->lock);
>> +		}
>> +	}
>> +	put_object_free(object);
>> +}
>> +
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v14 10/10] landlock: Add user and kernel documentation
  2020-02-24 16:02 ` [RFC PATCH v14 10/10] landlock: Add user and kernel documentation Mickaël Salaün
@ 2020-02-29 17:23   ` Randy Dunlap
  2020-03-02 10:03     ` Mickaël Salaün
  0 siblings, 1 reply; 34+ messages in thread
From: Randy Dunlap @ 2020-02-29 17:23 UTC (permalink / raw)
  To: Mickaël Salaün, linux-kernel
  Cc: Al Viro, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Greg Kroah-Hartman, James Morris, Jann Horn, Jonathan Corbet,
	Kees Cook, Michael Kerrisk, Mickaël Salaün,
	Serge E . Hallyn, Shuah Khan, Vincent Dagonneau,
	kernel-hardening, linux-api, linux-arch, linux-doc,
	linux-fsdevel, linux-kselftest, linux-security-module, x86

Hi,
Here are a few corrections for you to consider.


On 2/24/20 8:02 AM, Mickaël Salaün wrote:
> This documentation can be built with the Sphinx framework.
> 
> Another location might be more appropriate, though.
> 
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Reviewed-by: Vincent Dagonneau <vincent.dagonneau@ssi.gouv.fr>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: James Morris <jmorris@namei.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> ---
> 
> Changes since v13:
> * Rewrote the documentation according to the major revamp.
> 
> Previous version:
> https://lore.kernel.org/lkml/20191104172146.30797-8-mic@digikod.net/
> ---
>  Documentation/security/index.rst           |   1 +
>  Documentation/security/landlock/index.rst  |  18 ++
>  Documentation/security/landlock/kernel.rst |  44 ++++
>  Documentation/security/landlock/user.rst   | 233 +++++++++++++++++++++
>  4 files changed, 296 insertions(+)
>  create mode 100644 Documentation/security/landlock/index.rst
>  create mode 100644 Documentation/security/landlock/kernel.rst
>  create mode 100644 Documentation/security/landlock/user.rst
> 
> diff --git a/Documentation/security/landlock/index.rst b/Documentation/security/landlock/index.rst
> new file mode 100644
> index 000000000000..dbd33b96ce60
> --- /dev/null
> +++ b/Documentation/security/landlock/index.rst
> @@ -0,0 +1,18 @@
> +=========================================
> +Landlock LSM: unprivileged access control
> +=========================================
> +
> +:Author: Mickaël Salaün
> +
> +The goal of Landlock is to enable to restrict ambient rights (e.g.  global
> +filesystem access) for a set of processes.  Because Landlock is a stackable
> +LSM, it makes possible to create safe security sandboxes as new security layers
> +in addition to the existing system-wide access-controls. This kind of sandbox
> +is expected to help mitigate the security impact of bugs or
> +unexpected/malicious behaviors in user-space applications. Landlock empower any

                                                                       empowers

> +process, including unprivileged ones, to securely restrict themselves.
> +
> +.. toctree::
> +
> +    user
> +    kernel
> diff --git a/Documentation/security/landlock/kernel.rst b/Documentation/security/landlock/kernel.rst
> new file mode 100644
> index 000000000000..b87769909029
> --- /dev/null
> +++ b/Documentation/security/landlock/kernel.rst
> @@ -0,0 +1,44 @@
> +==============================
> +Landlock: kernel documentation
> +==============================
> +
> +Landlock's goal is to create scoped access-control (i.e. sandboxing).  To
> +harden a whole system, this feature should be available to any process,
> +including unprivileged ones.  Because such process may be compromised or
> +backdoored (i.e. untrusted), Landlock's features must be safe to use from the
> +kernel and other processes point of view.  Landlock's interface must therefore
> +expose a minimal attack surface.
> +
> +Landlock is designed to be usable by unprivileged processes while following the
> +system security policy enforced by other access control mechanisms (e.g. DAC,
> +LSM).  Indeed, a Landlock rule shall not interfere with other access-controls
> +enforced on the system, only add more restrictions.
> +
> +Any user can enforce Landlock rulesets on their processes.  They are merged and
> +evaluated according to the inherited ones in a way that ensure that only more

                                                           ensures

> +constraints can be added.
> +
> +
> +Guiding principles for safe access controls
> +===========================================
> +
> +* A Landlock rule shall be focused on access control on kernel objects instead
> +  of syscall filtering (i.e. syscall arguments), which is the purpose of
> +  seccomp-bpf.
> +* To avoid multiple kind of side-channel attacks (e.g. leak of security

                       kinds

> +  policies, CPU-based attacks), Landlock rules shall not be able to
> +  programmatically communicate with user space.
> +* Kernel access check shall not slow down access request from unsandboxed
> +  processes.
> +* Computation related to Landlock operations (e.g. enforce a ruleset) shall
> +  only impact the processes requesting them.
> +
> +
> +Landlock rulesets and domains
> +=============================
> +
> +A domain is a read-only ruleset tied to a set of subjects (i.e. tasks).  A
> +domain can transition to a new one which is the intersection of the constraints
> +from the current and a new ruleset.  The definition of a subject is implicit
> +for a task sandboxing itself, which makes the reasoning much easier and helps
> +avoid pitfalls.
> diff --git a/Documentation/security/landlock/user.rst b/Documentation/security/landlock/user.rst
> new file mode 100644
> index 000000000000..cbd7f61fca8c
> --- /dev/null
> +++ b/Documentation/security/landlock/user.rst
> @@ -0,0 +1,233 @@
> +=================================
> +Landlock: userspace documentation
> +=================================
> +
> +Landlock rules
> +==============
> +
> +A Landlock rule enables to describe an action on an object.  An object is
> +currently a file hierarchy, and the related filesystem actions are defined in
> +`Access rights`_.  A set of rules are aggregated in a ruleset, which can then

                                     is

> +restricts the thread enforcing it, and its future children.

   restrict

> +
> +
> +Defining and enforcing a security policy
> +----------------------------------------
> +
> +Before defining a security policy, an application should first probe for the
> +features supported by the running kernel, which is important to be compatible
> +with older kernels.  This can be done thanks to the `landlock` syscall (cf.
> +:ref:`syscall`).
> +
> +.. code-block:: c
> +
> +    struct landlock_attr_features attr_features;
> +
> +    if (landlock(LANDLOCK_CMD_GET_FEATURES, LANDLOCK_OPT_GET_FEATURES,
> +            sizeof(attr_features), &attr_features)) {
> +        perror("Failed to probe the Landlock supported features");
> +        return 1;
> +    }
> +
> +Then, we need to create the ruleset that will contains our rules.  For this

                                                 contain

> +example, the ruleset will contains rules which only allow read actions, but

                             contain

> +write actions will be denied.  The ruleset then needs to handle both of these
> +kind of actions.  To have a backward compatibility, these actions should be
> +ANDed with the supported ones.
> +
> +.. code-block:: c
> +
> +    int ruleset_fd;
> +    struct landlock_attr_ruleset ruleset = {
> +        .handled_access_fs =
> +            LANDLOCK_ACCESS_FS_READ |
> +            LANDLOCK_ACCESS_FS_READDIR |
> +            LANDLOCK_ACCESS_FS_EXECUTE |
> +            LANDLOCK_ACCESS_FS_WRITE |
> +            LANDLOCK_ACCESS_FS_TRUNCATE |
> +            LANDLOCK_ACCESS_FS_CHMOD |
> +            LANDLOCK_ACCESS_FS_CHOWN |
> +            LANDLOCK_ACCESS_FS_CHGRP |
> +            LANDLOCK_ACCESS_FS_LINK_TO |
> +            LANDLOCK_ACCESS_FS_RENAME_FROM |
> +            LANDLOCK_ACCESS_FS_RENAME_TO |
> +            LANDLOCK_ACCESS_FS_RMDIR |
> +            LANDLOCK_ACCESS_FS_UNLINK |
> +            LANDLOCK_ACCESS_FS_MAKE_CHAR |
> +            LANDLOCK_ACCESS_FS_MAKE_DIR |
> +            LANDLOCK_ACCESS_FS_MAKE_REG |
> +            LANDLOCK_ACCESS_FS_MAKE_SOCK |
> +            LANDLOCK_ACCESS_FS_MAKE_FIFO |
> +            LANDLOCK_ACCESS_FS_MAKE_BLOCK |
> +            LANDLOCK_ACCESS_FS_MAKE_SYM,
> +    };
> +
> +    ruleset.handled_access_fs &= attr_features.access_fs;
> +    ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
> +                    LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
> +    if (ruleset_fd < 0) {
> +        perror("Failed to create a ruleset");
> +        return 1;
> +    }
> +
> +We can now add a new rule to this ruleset thanks to the returned file
> +descriptor referring to this ruleset.  The rule will only enable to read the
> +file hierarchy ``/usr``.  Without other rule, write actions would then be

                             Without other rules,
or
                             Without another rule,

> +denied by the ruleset.  To add ``/usr`` to the ruleset, we open it with the
> +``O_PATH`` flag and fill the &struct landlock_attr_path_beneath with this file
> +descriptor.
> +
> +.. code-block:: c
> +
> +    int err;
> +    struct landlock_attr_path_beneath path_beneath = {
> +        .ruleset_fd = ruleset_fd,
> +        .allowed_access =
> +            LANDLOCK_ACCESS_FS_READ |
> +            LANDLOCK_ACCESS_FS_READDIR |
> +            LANDLOCK_ACCESS_FS_EXECUTE,
> +    };
> +
> +    path_beneath.allowed_access &= attr_features.access_fs;
> +    path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
> +    if (path_beneath.parent_fd < 0) {
> +        perror("Failed to open file");
> +        close(ruleset_fd);
> +        return 1;
> +    }
> +    err = landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
> +            sizeof(path_beneath), &path_beneath);
> +    close(path_beneath.parent_fd);
> +    if (err) {
> +        perror("Failed to update ruleset");
> +        close(ruleset_fd);
> +        return 1;
> +    }
> +
> +We now have a ruleset with one rule allowing read access to ``/usr`` while
> +denying all accesses featured in ``attr_features.access_fs`` to everything else
> +on the filesystem.  The next step is to restrict the current thread from
> +gaining more privileges (e.g. thanks to a SUID binary).
> +
> +.. code-block:: c
> +
> +    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
> +        perror("Failed to restrict privileges");
> +        close(ruleset_fd);
> +        return 1;
> +    }
> +
> +The current thread is now ready to sandbox itself with the ruleset.
> +
> +.. code-block:: c
> +
> +    struct landlock_attr_enforce attr_enforce = {
> +        .ruleset_fd = ruleset_fd,
> +    };
> +
> +    if (landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
> +            sizeof(attr_enforce), &attr_enforce)) {
> +        perror("Failed to enforce ruleset");
> +        close(ruleset_fd);
> +        return 1;
> +    }
> +    close(ruleset_fd);
> +
> +If this last system call succeeds, the current thread is now restricted and

   If this last landlock system call succeeds,

[because close() is the last system call]

> +this policy will be enforced on all its subsequently created children as well.
> +Once a thread is landlocked, there is no way to remove its security policy,

                                                   preferably:         policy;

> +only adding more restrictions is allowed.  These threads are now in a new
> +Landlock domain, merge of their parent one (if any) with the new ruleset.
> +
> +A full working code can be found in `samples/landlock/sandboxer.c`_.

   Full working code

> +
> +
> +Inheritance
> +-----------
> +
> +Every new thread resulting from a :manpage:`clone(2)` inherits Landlock program
> +restrictions from its parent.  This is similar to the seccomp inheritance (cf.
> +:doc:`/userspace-api/seccomp_filter`) or any other LSM dealing with task's
> +:manpage:`credentials(7)`.  For instance, one process' thread may apply

                                                 process's

> +Landlock rules to itself, but they will not be automatically applied to other
> +sibling threads (unlike POSIX thread credential changes, cf.
> +:manpage:`nptl(7)`).
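
The inheritance model can be observed with the no_new_privs bit that the
walkthrough sets before enforcing a ruleset, since that bit is also
inherited by children. This is a minimal sketch assuming a Linux system;
the helper name is made up for illustration, and the example does not
exercise Landlock itself.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: sets no_new_privs in the caller, forks, and returns
 * 1 if the child observes the bit, 0 if it does not, -1 on error.
 */
static int child_inherits_no_new_privs(void)
{
	pid_t pid;
	int status;

	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -1;
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: report the inherited bit through the exit status. */
		_exit(prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0) == 1 ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Note that setting no_new_privs is irreversible for the calling thread, so
the helper is best run in a throwaway process.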

[snip]

Thanks for the documentation.

-- 
~Randy


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v14 10/10] landlock: Add user and kernel documentation
  2020-02-29 17:23   ` Randy Dunlap
@ 2020-03-02 10:03     ` Mickaël Salaün
  0 siblings, 0 replies; 34+ messages in thread
From: Mickaël Salaün @ 2020-03-02 10:03 UTC (permalink / raw)
  To: Randy Dunlap, linux-kernel
  Cc: Al Viro, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Greg Kroah-Hartman, James Morris, Jann Horn, Jonathan Corbet,
	Kees Cook, Michael Kerrisk, Mickaël Salaün,
	Serge E . Hallyn, Shuah Khan, Vincent Dagonneau,
	kernel-hardening, linux-api, linux-arch, linux-doc,
	linux-fsdevel, linux-kselftest, linux-security-module, x86


On 29/02/2020 18:23, Randy Dunlap wrote:
> Hi,
> Here are a few corrections for you to consider.
> 
> 
> On 2/24/20 8:02 AM, Mickaël Salaün wrote:
>> This documentation can be built with the Sphinx framework.
>>
>> Another location might be more appropriate, though.
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> Reviewed-by: Vincent Dagonneau <vincent.dagonneau@ssi.gouv.fr>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: James Morris <jmorris@namei.org>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Serge E. Hallyn <serge@hallyn.com>
>> ---
>>
>> Changes since v13:
>> * Rewrote the documentation according to the major revamp.
>>
>> Previous version:
>> https://lore.kernel.org/lkml/20191104172146.30797-8-mic@digikod.net/
>> ---
>>  Documentation/security/index.rst           |   1 +
>>  Documentation/security/landlock/index.rst  |  18 ++
>>  Documentation/security/landlock/kernel.rst |  44 ++++
>>  Documentation/security/landlock/user.rst   | 233 +++++++++++++++++++++
>>  4 files changed, 296 insertions(+)
>>  create mode 100644 Documentation/security/landlock/index.rst
>>  create mode 100644 Documentation/security/landlock/kernel.rst
>>  create mode 100644 Documentation/security/landlock/user.rst
>>
>> diff --git a/Documentation/security/landlock/index.rst b/Documentation/security/landlock/index.rst
>> new file mode 100644
>> index 000000000000..dbd33b96ce60
>> --- /dev/null
>> +++ b/Documentation/security/landlock/index.rst
>> @@ -0,0 +1,18 @@
>> +=========================================
>> +Landlock LSM: unprivileged access control
>> +=========================================
>> +
>> +:Author: Mickaël Salaün
>> +
>> +The goal of Landlock is to enable to restrict ambient rights (e.g.  global
>> +filesystem access) for a set of processes.  Because Landlock is a stackable
>> +LSM, it makes possible to create safe security sandboxes as new security layers
>> +in addition to the existing system-wide access-controls. This kind of sandbox
>> +is expected to help mitigate the security impact of bugs or
>> +unexpected/malicious behaviors in user-space applications. Landlock empower any
> 
>                                                                        empowers
> 
>> +process, including unprivileged ones, to securely restrict themselves.
>> +
>> +.. toctree::
>> +
>> +    user
>> +    kernel
>> diff --git a/Documentation/security/landlock/kernel.rst b/Documentation/security/landlock/kernel.rst
>> new file mode 100644
>> index 000000000000..b87769909029
>> --- /dev/null
>> +++ b/Documentation/security/landlock/kernel.rst
>> @@ -0,0 +1,44 @@
>> +==============================
>> +Landlock: kernel documentation
>> +==============================
>> +
>> +Landlock's goal is to create scoped access-control (i.e. sandboxing).  To
>> +harden a whole system, this feature should be available to any process,
>> +including unprivileged ones.  Because such process may be compromised or
>> +backdoored (i.e. untrusted), Landlock's features must be safe to use from the
>> +kernel and other processes point of view.  Landlock's interface must therefore
>> +expose a minimal attack surface.
>> +
>> +Landlock is designed to be usable by unprivileged processes while following the
>> +system security policy enforced by other access control mechanisms (e.g. DAC,
>> +LSM).  Indeed, a Landlock rule shall not interfere with other access-controls
>> +enforced on the system, only add more restrictions.
>> +
>> +Any user can enforce Landlock rulesets on their processes.  They are merged and
>> +evaluated according to the inherited ones in a way that ensure that only more
> 
>                                                            ensures
> 
>> +constraints can be added.
>> +
>> +
>> +Guiding principles for safe access controls
>> +===========================================
>> +
>> +* A Landlock rule shall be focused on access control on kernel objects instead
>> +  of syscall filtering (i.e. syscall arguments), which is the purpose of
>> +  seccomp-bpf.
>> +* To avoid multiple kind of side-channel attacks (e.g. leak of security
> 
>                        kinds
> 
>> +  policies, CPU-based attacks), Landlock rules shall not be able to
>> +  programmatically communicate with user space.
>> +* Kernel access check shall not slow down access request from unsandboxed
>> +  processes.
>> +* Computation related to Landlock operations (e.g. enforce a ruleset) shall
>> +  only impact the processes requesting them.
>> +
>> +
>> +Landlock rulesets and domains
>> +=============================
>> +
>> +A domain is a read-only ruleset tied to a set of subjects (i.e. tasks).  A
>> +domain can transition to a new one which is the intersection of the constraints
>> +from the current and a new ruleset.  The definition of a subject is implicit
>> +for a task sandboxing itself, which makes the reasoning much easier and helps
>> +avoid pitfalls.
>> diff --git a/Documentation/security/landlock/user.rst b/Documentation/security/landlock/user.rst
>> new file mode 100644
>> index 000000000000..cbd7f61fca8c
>> --- /dev/null
>> +++ b/Documentation/security/landlock/user.rst
>> @@ -0,0 +1,233 @@
>> +=================================
>> +Landlock: userspace documentation
>> +=================================
>> +
>> +Landlock rules
>> +==============
>> +
>> +A Landlock rule enables to describe an action on an object.  An object is
>> +currently a file hierarchy, and the related filesystem actions are defined in
>> +`Access rights`_.  A set of rules are aggregated in a ruleset, which can then
> 
>                                      is
> 
>> +restricts the thread enforcing it, and its future children.
> 
>    restrict
> 
>> +
>> +
>> +Defining and enforcing a security policy
>> +----------------------------------------
>> +
>> +Before defining a security policy, an application should first probe for the
>> +features supported by the running kernel, which is important to be compatible
>> +with older kernels.  This can be done thanks to the `landlock` syscall (cf.
>> +:ref:`syscall`).
>> +
>> +.. code-block:: c
>> +
>> +    struct landlock_attr_features attr_features;
>> +
>> +    if (landlock(LANDLOCK_CMD_GET_FEATURES, LANDLOCK_OPT_GET_FEATURES,
>> +            sizeof(attr_features), &attr_features)) {
>> +        perror("Failed to probe the Landlock supported features");
>> +        return 1;
>> +    }
>> +
>> +Then, we need to create the ruleset that will contains our rules.  For this
> 
>                                                  contain
> 
>> +example, the ruleset will contains rules which only allow read actions, but
> 
>                              contain
> 
>> +write actions will be denied.  The ruleset then needs to handle both of these
>> +kind of actions.  To have a backward compatibility, these actions should be
>> +ANDed with the supported ones.
>> +
>> +.. code-block:: c
>> +
>> +    int ruleset_fd;
>> +    struct landlock_attr_ruleset ruleset = {
>> +        .handled_access_fs =
>> +            LANDLOCK_ACCESS_FS_READ |
>> +            LANDLOCK_ACCESS_FS_READDIR |
>> +            LANDLOCK_ACCESS_FS_EXECUTE |
>> +            LANDLOCK_ACCESS_FS_WRITE |
>> +            LANDLOCK_ACCESS_FS_TRUNCATE |
>> +            LANDLOCK_ACCESS_FS_CHMOD |
>> +            LANDLOCK_ACCESS_FS_CHOWN |
>> +            LANDLOCK_ACCESS_FS_CHGRP |
>> +            LANDLOCK_ACCESS_FS_LINK_TO |
>> +            LANDLOCK_ACCESS_FS_RENAME_FROM |
>> +            LANDLOCK_ACCESS_FS_RENAME_TO |
>> +            LANDLOCK_ACCESS_FS_RMDIR |
>> +            LANDLOCK_ACCESS_FS_UNLINK |
>> +            LANDLOCK_ACCESS_FS_MAKE_CHAR |
>> +            LANDLOCK_ACCESS_FS_MAKE_DIR |
>> +            LANDLOCK_ACCESS_FS_MAKE_REG |
>> +            LANDLOCK_ACCESS_FS_MAKE_SOCK |
>> +            LANDLOCK_ACCESS_FS_MAKE_FIFO |
>> +            LANDLOCK_ACCESS_FS_MAKE_BLOCK |
>> +            LANDLOCK_ACCESS_FS_MAKE_SYM,
>> +    };
>> +
>> +    ruleset.handled_access_fs &= attr_features.access_fs;
>> +    ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
>> +                    LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
>> +    if (ruleset_fd < 0) {
>> +        perror("Failed to create a ruleset");
>> +        return 1;
>> +    }
>> +
>> +We can now add a new rule to this ruleset thanks to the returned file
>> +descriptor referring to this ruleset.  The rule will only enable to read the
>> +file hierarchy ``/usr``.  Without other rule, write actions would then be
> 
>                              Without other rules,
> or
>                              Without another rule,
> 
>> +denied by the ruleset.  To add ``/usr`` to the ruleset, we open it with the
>> +``O_PATH`` flag and fill the &struct landlock_attr_path_beneath with this file
>> +descriptor.
>> +
>> +.. code-block:: c
>> +
>> +    int err;
>> +    struct landlock_attr_path_beneath path_beneath = {
>> +        .ruleset_fd = ruleset_fd,
>> +        .allowed_access =
>> +            LANDLOCK_ACCESS_FS_READ |
>> +            LANDLOCK_ACCESS_FS_READDIR |
>> +            LANDLOCK_ACCESS_FS_EXECUTE,
>> +    };
>> +
>> +    path_beneath.allowed_access &= attr_features.access_fs;
>> +    path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
>> +    if (path_beneath.parent_fd < 0) {
>> +        perror("Failed to open file");
>> +        close(ruleset_fd);
>> +        return 1;
>> +    }
>> +    err = landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
>> +            sizeof(path_beneath), &path_beneath);
>> +    close(path_beneath.parent_fd);
>> +    if (err) {
>> +        perror("Failed to update ruleset");
>> +        close(ruleset_fd);
>> +        return 1;
>> +    }
>> +
>> +We now have a ruleset with one rule allowing read access to ``/usr`` while
>> +denying all accesses featured in ``attr_features.access_fs`` to everything else
>> +on the filesystem.  The next step is to restrict the current thread from
>> +gaining more privileges (e.g. thanks to a SUID binary).
>> +
>> +.. code-block:: c
>> +
>> +    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
>> +        perror("Failed to restrict privileges");
>> +        close(ruleset_fd);
>> +        return 1;
>> +    }
>> +
>> +The current thread is now ready to sandbox itself with the ruleset.
>> +
>> +.. code-block:: c
>> +
>> +    struct landlock_attr_enforce attr_enforce = {
>> +        .ruleset_fd = ruleset_fd,
>> +    };
>> +
>> +    if (landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
>> +            sizeof(attr_enforce), &attr_enforce)) {
>> +        perror("Failed to enforce ruleset");
>> +        close(ruleset_fd);
>> +        return 1;
>> +    }
>> +    close(ruleset_fd);
>> +
>> +If this last system call succeeds, the current thread is now restricted and
> 
>    If this last landlock system call succeeds,
> 
> [because close() is the last system call]
> 
>> +this policy will be enforced on all its subsequently created children as well.
>> +Once a thread is landlocked, there is no way to remove its security policy,
> 
>                                                    preferably:         policy;
> 
>> +only adding more restrictions is allowed.  These threads are now in a new
>> +Landlock domain, merge of their parent one (if any) with the new ruleset.
>> +
>> +A full working code can be found in `samples/landlock/sandboxer.c`_.
> 
>    Full working code
> 
>> +
>> +
>> +Inheritance
>> +-----------
>> +
>> +Every new thread resulting from a :manpage:`clone(2)` inherits Landlock program
>> +restrictions from its parent.  This is similar to the seccomp inheritance (cf.
>> +:doc:`/userspace-api/seccomp_filter`) or any other LSM dealing with task's
>> +:manpage:`credentials(7)`.  For instance, one process' thread may apply
> 
>                                                  process's
> 
>> +Landlock rules to itself, but they will not be automatically applied to other
>> +sibling threads (unlike POSIX thread credential changes, cf.
>> +:manpage:`nptl(7)`).
> 
> [snip]
> 
> thanks for the documentation.
> 

Done. Thanks for this attentive review!


* Re: [RFC PATCH v14 00/10] Landlock LSM
  2020-02-24 16:02 [RFC PATCH v14 00/10] Landlock LSM Mickaël Salaün
                   ` (11 preceding siblings ...)
       [not found] ` <20200227042002.3032-1-hdanton@sina.com>
@ 2020-03-09 23:44 ` Jann Horn
  2020-03-11 23:38   ` Mickaël Salaün
  12 siblings, 1 reply; 34+ messages in thread
From: Jann Horn @ 2020-03-09 23:44 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: kernel list, Al Viro, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Greg Kroah-Hartman, James Morris, Jann Horn,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, Kernel Hardening, Linux API, linux-arch,
	linux-doc, linux-fsdevel, open list:KERNEL SELFTEST FRAMEWORK,
	linux-security-module, the arch/x86 maintainers

On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün <mic@digikod.net> wrote:
> This new version of Landlock is a major revamp of the previous series
> [1], hence the RFC tag.  The three main changes are the replacement of
> eBPF with a dedicated safe management of access rules, the replacement
> of the use of seccomp(2) with a dedicated syscall, and the management of
> filesystem access-control (back from the v10).
>
> As discussed in [2], eBPF may be too powerful and dangerous to be put in
> the hand of unprivileged and potentially malicious processes, especially
> because of side-channel attacks against access-controls or other parts
> of the kernel.
>
> Thanks to this new implementation (1540 SLOC), designed from the ground
> to be used by unprivileged processes, this series enables a process to
> sandbox itself without requiring CAP_SYS_ADMIN, but only the
> no_new_privs constraint (like seccomp).  Not relying on eBPF also
> enables to improve performances, especially for stacked security
> policies thanks to mergeable rulesets.
>
> The compiled documentation is available here:
> https://landlock.io/linux-doc/landlock-v14/security/landlock/index.html
>
> This series can be applied on top of v5.6-rc3.  This can be tested with
> CONFIG_SECURITY_LANDLOCK and CONFIG_SAMPLE_LANDLOCK.  This patch series
> can be found in a Git repository here:
> https://github.com/landlock-lsm/linux/commits/landlock-v14
> I would really appreciate constructive comments on the design and the code.

I've looked through the patchset, and I think that it would be
possible to simplify it quite a bit. I have tried to do that (and
compile-tested it, but have not actually tried running it); here's what I
came up with:

https://github.com/thejh/linux/commits/landlock-mod

The three modified patches (patches 1, 2 and 5) are marked with
"[MODIFIED]" in their title. Please take a look - what do you think?
Feel free to integrate my changes into your patches if you think they
make sense.


Apart from simplifying the code, I also found the following issues,
which I have fixed in the modified patches:

put_hierarchy() has to drop a reference on its parent. (However, this
must not recurse, so we have to do it with a loop.)
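
As a rough userspace illustration of that loop (not the code from the
modified patches): struct hierarchy_node, node_new() and node_put() are
hypothetical names, and a plain int stands in for the kernel's refcount_t.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's hierarchy object. */
struct hierarchy_node {
    struct hierarchy_node *parent;
    int usage;  /* plain counter; kernel code would use refcount_t */
};

/* Number of nodes freed so far, only here to make the sketch observable. */
static int freed_nodes;

static struct hierarchy_node *node_new(struct hierarchy_node *parent)
{
    struct hierarchy_node *node = calloc(1, sizeof(*node));

    if (!node)
        return NULL;
    node->usage = 1;
    node->parent = parent;
    if (parent)
        parent->usage++;  /* a child pins its parent */
    return node;
}

/*
 * Drop one reference.  When a node dies, also drop the reference it held
 * on its parent.  Looping instead of recursing keeps stack usage constant
 * however deep the chain is.
 */
static void node_put(struct hierarchy_node *node)
{
    while (node && --node->usage == 0) {
        struct hierarchy_node *parent = node->parent;

        free(node);
        freed_nodes++;
        node = parent;
    }
}
```

Releasing the last reference on a leaf then walks up the chain and frees
each ancestor whose only remaining reference came from that child.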

put_ruleset() is not in an RCU read-side critical section, so as soon
as it calls kfree_rcu(), "freeme" might disappear; but "orig" is in
"freeme", so when the loop tries to find the next element with
rb_next(orig), that can be a UAF.
rbtree_postorder_for_each_entry_safe() exists for dealing with such
issues.
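
The underlying rule (never dereference a node after releasing it) can be
shown with a small userspace sketch. This is only an analogy: struct
tree_node and both helpers are hypothetical, the tree is a plain binary
search tree rather than an rbtree, and free() stands in for kfree_rcu().

```c
#include <stdlib.h>

/* Hypothetical plain binary search tree node (not struct rb_node). */
struct tree_node {
    struct tree_node *left, *right;
    int key;
};

static struct tree_node *tree_insert(struct tree_node *root, int key)
{
    if (!root) {
        root = calloc(1, sizeof(*root));
        if (root)
            root->key = key;
        return root;
    }
    if (key < root->key)
        root->left = tree_insert(root->left, key);
    else
        root->right = tree_insert(root->right, key);
    return root;
}

/* Free every node, children first; returns the number of nodes freed. */
static int tree_free_postorder(struct tree_node *node)
{
    struct tree_node *left, *right;
    int count;

    if (!node)
        return 0;
    /* Read both links while the node is still guaranteed to be valid. */
    left = node->left;
    right = node->right;
    count = tree_free_postorder(left) + tree_free_postorder(right);
    free(node);  /* nothing dereferences node after this point */
    return count + 1;
}
```

A post-order walk visits a node only after both of its subtrees, so no
link of a released node is ever needed to find the next element.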

AFAIK the calls to rb_erase() in clean_ruleset() are not safe if
someone is concurrently accessing the rbtree as an RCU reader, because
concurrent rotations can prevent a lookup from succeeding. The
simplest fix is probably to just make any rbtree that has been
installed on a process immutable, and give up on the cleaning -
arguably the memory wastage that can cause is pretty limited. (By the
way, as a future optimization, we might want to turn the rbtree into a
hashtable when installing it?)

The iput() in landlock_release_inode() looks unsafe - you need to
guarantee that even if the deletion of a ruleset races with
generic_shutdown_super(), every iput() for that superblock finishes
before landlock_release_inodes() returns, even if the iput() is
happening in the context of ruleset deletion. This is why
fsnotify_unmount_inodes() has that wait_var_event() at the end.


Aside from those things, there is also a major correctness issue where
I'm not sure how to solve it properly:

Let's say a process installs a filter on itself like this:

struct landlock_attr_ruleset ruleset = {
  .handled_access_fs = ACCESS_FS_ROUGHLY_WRITE,
};
int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
                          LANDLOCK_OPT_CREATE_RULESET,
                          sizeof(ruleset), &ruleset);
struct landlock_attr_path_beneath path_beneath = {
  .ruleset_fd = ruleset_fd,
  .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
  .parent_fd = open("/tmp/foobar", O_PATH),
};
landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
         sizeof(path_beneath), &path_beneath);
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
         sizeof(attr_enforce), &attr_enforce);

At this point, the process is not supposed to be able to write to
anything outside /tmp/foobar, right? But what happens if the process
does the following next?

struct landlock_attr_ruleset ruleset = {
  .handled_access_fs = ACCESS_FS_ROUGHLY_WRITE,
};
int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
                          LANDLOCK_OPT_CREATE_RULESET,
                          sizeof(ruleset), &ruleset);
struct landlock_attr_path_beneath path_beneath = {
  .ruleset_fd = ruleset_fd,
  .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
  .parent_fd = open("/", O_PATH),
};
landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
         sizeof(path_beneath), &path_beneath);
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
         sizeof(attr_enforce), &attr_enforce);

As far as I can tell from looking at the source, after this, you will
have write access to the entire filesystem again. I think the idea is
that LANDLOCK_CMD_ENFORCE_RULESET should only let you drop privileges,
not increase them, right?

I think the easy way to fix this would be to add a bitmask to each
rule that says from which ruleset it originally comes, and then let
check_access_path() collect these bitmasks from each rule with OR, and
check at the end whether the resulting bitmask is full - if not, at
least one of the rulesets did not permit the access, and it should be
denied.
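
A minimal userspace sketch of that bitmask idea, under assumptions: struct
sketch_rule, SKETCH_ACCESS_WRITE and sketch_check_access() are hypothetical
names, and the rules met during the path walk are passed in as a flat array
instead of being looked up in an rbtree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ACCESS_WRITE (UINT32_C(1) << 0)  /* hypothetical access bit */

/* Hypothetical rule attached to one directory on the walked path. */
struct sketch_rule {
    uint32_t allowed_access;  /* accesses granted by this rule */
    uint32_t ruleset_mask;    /* bit N set: rule originates from ruleset N */
};

/*
 * rules[] holds the rules met while walking from the target up to the
 * root.  OR together the origin masks of all rules that allow the
 * request; the access is granted only if every enforced ruleset
 * contributed at least one such rule.
 */
static bool sketch_check_access(const struct sketch_rule *rules, size_t count,
        uint32_t request, unsigned int nr_rulesets)
{
    uint32_t full = nr_rulesets >= 32 ?
        UINT32_MAX : (UINT32_C(1) << nr_rulesets) - 1;
    uint32_t seen = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        if ((rules[i].allowed_access & request) == request)
            seen |= rules[i].ruleset_mask;
    }
    return (seen & full) == full;
}
```

With the two rulesets from the example above, a write below /tmp/foobar
meets a rule from each ruleset and passes, while a write elsewhere only
meets the rule on "/" from the second ruleset and is denied.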

But maybe it would make more sense to change how the API works
instead, and get rid of the concept of "merging" two rulesets
together? Instead, we could make the API work like this:

 - LANDLOCK_CMD_CREATE_RULESET gives you a file descriptor whose
->private_data contains a pointer to the old ruleset of the process,
as well as a pointer to a new empty ruleset.
 - LANDLOCK_CMD_ADD_RULE fails if the specified rule would not be
permitted by the old ruleset, then adds the rule to the new ruleset
 - LANDLOCK_CMD_ENFORCE_RULESET fails if the old ruleset pointer in
->private_data doesn't match the current ruleset of the process, then
replaces the old ruleset with the new ruleset.

With this, the new ruleset is guaranteed to be a subset of the old
ruleset because each of the new ruleset's rules is permitted by the
old ruleset. (Unless the directory hierarchy rotates, but in that case
the inaccuracy isn't much worse than what would've been possible
through RCU path walk anyway AFAIK.)

What do you think?


* Re: [RFC PATCH v14 00/10] Landlock LSM
  2020-03-09 23:44 ` [RFC PATCH v14 00/10] Landlock LSM Jann Horn
@ 2020-03-11 23:38   ` Mickaël Salaün
  2020-03-17 16:19     ` Jann Horn
  0 siblings, 1 reply; 34+ messages in thread
From: Mickaël Salaün @ 2020-03-11 23:38 UTC (permalink / raw)
  To: Jann Horn
  Cc: kernel list, Al Viro, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Greg Kroah-Hartman, James Morris, Jann Horn,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, Kernel Hardening, Linux API, linux-arch,
	linux-doc, linux-fsdevel, open list:KERNEL SELFTEST FRAMEWORK,
	linux-security-module, the arch/x86 maintainers


On 10/03/2020 00:44, Jann Horn wrote:
> On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün <mic@digikod.net> wrote:
>> This new version of Landlock is a major revamp of the previous series
>> [1], hence the RFC tag.  The three main changes are the replacement of
>> eBPF with a dedicated safe management of access rules, the replacement
>> of the use of seccomp(2) with a dedicated syscall, and the management of
>> filesystem access-control (back from the v10).
>>
>> As discussed in [2], eBPF may be too powerful and dangerous to be put in
>> the hand of unprivileged and potentially malicious processes, especially
>> because of side-channel attacks against access-controls or other parts
>> of the kernel.
>>
>> Thanks to this new implementation (1540 SLOC), designed from the ground
>> to be used by unprivileged processes, this series enables a process to
>> sandbox itself without requiring CAP_SYS_ADMIN, but only the
>> no_new_privs constraint (like seccomp).  Not relying on eBPF also
>> enables to improve performances, especially for stacked security
>> policies thanks to mergeable rulesets.
>>
>> The compiled documentation is available here:
>> https://landlock.io/linux-doc/landlock-v14/security/landlock/index.html
>>
>> This series can be applied on top of v5.6-rc3.  This can be tested with
>> CONFIG_SECURITY_LANDLOCK and CONFIG_SAMPLE_LANDLOCK.  This patch series
>> can be found in a Git repository here:
>> https://github.com/landlock-lsm/linux/commits/landlock-v14
>> I would really appreciate constructive comments on the design and the code.
> 
> I've looked through the patchset, and I think that it would be
> possible to simplify it quite a bit. I have tried to do that (and
> compile-tested it, but have not actually tried running it); here's what I
> came up with:
> 
> https://github.com/thejh/linux/commits/landlock-mod
> 
> The three modified patches (patches 1, 2 and 5) are marked with
> "[MODIFIED]" in their title. Please take a look - what do you think?
> Feel free to integrate my changes into your patches if you think they
> make sense.

Regarding the landlock_release_inodes(), the final wait_var_event() is
indeed needed (as in fsnotify), but why do you use a READ_ONCE() for
landlock_initialized?

I was reluctant to use function pointers but landlock_object_operations
makes a cleaner and more generic interface to manage objects.

Your get_inode_object() is much simpler and easier to understand than
the get_object() and get_cleaner().
The other main change is about the object cross-reference: you entirely
removed it, which means that an object will only be freed when there are
no rules using it. This does not free an object when its underlying
object is being terminated. We now only have to worry about the
termination of the parent of an underlying object (e.g. the super-block
of an inode).

However, I think you forgot to increment object->usage in
create_ruleset_elem(). There is also an unused checked_mask variable in
merge_ruleset().

All this removes optimizations that made the code more difficult to
understand. The performance difference is negligible, and I think that
the memory footprint is fine.
These optimizations (and others) could be discussed later. I'm
integrating most of your changes in the next patch series.

Thank you very much for this review and the code.

> 
> 
> Apart from simplifying the code, I also found the following issues,
> which I have fixed in the modified patches:
> 
> put_hierarchy() has to drop a reference on its parent. (However, this
> must not recurse, so we have to do it with a loop.)

Right, fixed.

> 
> put_ruleset() is not in an RCU read-side critical section, so as soon
> as it calls kfree_rcu(), "freeme" might disappear; but "orig" is in
> "freeme", so when the loop tries to find the next element with
> rb_next(orig), that can be a UAF.
> rbtree_postorder_for_each_entry_safe() exists for dealing with such
> issues.

Good catch.

> 
> AFAIK the calls to rb_erase() in clean_ruleset() are not safe if
> someone is concurrently accessing the rbtree as an RCU reader, because
> concurrent rotations can prevent a lookup from succeeding. The
> simplest fix is probably to just make any rbtree that has been
> installed on a process immutable, and give up on the cleaning -
> arguably the memory wastage that can cause is pretty limited.

Yes, let's go for immutable domains.

> (By the
> way, as a future optimization, we might want to turn the rbtree into a
> hashtable when installing it?)

Definitely. This was a previous (private) implementation I did for
domains, but to simplify the code I reused the same type as a ruleset. A
future evolution of Landlock could add back this optimization.

> 
> The iput() in landlock_release_inode() looks unsafe - you need to
> guarantee that even if the deletion of a ruleset races with
> generic_shutdown_super(), every iput() for that superblock finishes
> before landlock_release_inodes() returns, even if the iput() is
> happening in the context of ruleset deletion. This is why
> fsnotify_unmount_inodes() has that wait_var_event() at the end.

Right, much better with that.

> 
> 
> Aside from those things, there is also a major correctness issue where
> I'm not sure how to solve it properly:
> 
> Let's say a process installs a filter on itself like this:
> 
> struct landlock_attr_ruleset ruleset = { .handled_access_fs =
> ACCESS_FS_ROUGHLY_WRITE};
> int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
> LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
> struct landlock_attr_path_beneath path_beneath = {
>   .ruleset_fd = ruleset_fd,
>   .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
>   .parent_fd = open("/tmp/foobar", O_PATH),
> };
> landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
> sizeof(path_beneath), &path_beneath);
> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
> struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
> landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
> sizeof(attr_enforce), &attr_enforce);
> 
> At this point, the process is not supposed to be able to write to
> anything outside /tmp/foobar, right? But what happens if the process
> does the following next?
> 
> struct landlock_attr_ruleset ruleset = { .handled_access_fs =
> ACCESS_FS_ROUGHLY_WRITE};
> int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
> LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
> struct landlock_attr_path_beneath path_beneath = {
>   .ruleset_fd = ruleset_fd,
>   .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
>   .parent_fd = open("/", O_PATH),
> };
> landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
> sizeof(path_beneath), &path_beneath);
> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
> struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
> landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
> sizeof(attr_enforce), &attr_enforce);
> 
> As far as I can tell from looking at the source, after this, you will
> have write access to the entire filesystem again. I think the idea is
> that LANDLOCK_CMD_ENFORCE_RULESET should only let you drop privileges,
> not increase them, right?

There is an additional check in syscall.c:get_path_from_fd(): it is
forbidden to add a rule with a path which is not accessible (according
to LANDLOCK_ACCESS_FS_OPEN) thanks to a call to security_file_open(),
but this is definitely not perfect.

> 
> I think the easy way to fix this would be to add a bitmask to each
> rule that says from which ruleset it originally comes, and then let
> check_access_path() collect these bitmasks from each rule with OR, and
> check at the end whether the resulting bitmask is full - if not, at
> least one of the rulesets did not permit the access, and it should be
> denied.
> 
> But maybe it would make more sense to change how the API works
> instead, and get rid of the concept of "merging" two rulesets
> together? Instead, we could make the API work like this:
> 
>  - LANDLOCK_CMD_CREATE_RULESET gives you a file descriptor whose
> ->private_data contains a pointer to the old ruleset of the process,
> as well as a pointer to a new empty ruleset.
>  - LANDLOCK_CMD_ADD_RULE fails if the specified rule would not be
> permitted by the old ruleset, then adds the rule to the new ruleset
>  - LANDLOCK_CMD_ENFORCE_RULESET fails if the old ruleset pointer in
> ->private_data doesn't match the current ruleset of the process, then
> replaces the old ruleset with the new ruleset.
> 
> With this, the new ruleset is guaranteed to be a subset of the old
> ruleset because each of the new ruleset's rules is permitted by the
> old ruleset. (Unless the directory hierarchy rotates, but in that case
> the inaccuracy isn't much worse than what would've been possible
> through RCU path walk anyway AFAIK.)
> 
> What do you think?
> 

I would prefer to add the same checks you described at first (with
check_access_path), but only when creating a new ruleset with
merge_ruleset() (which should probably be renamed). This avoids relying
on a parent ruleset/domain until the enforcement, which is the case
anyway.
Unfortunately this doesn't work for some cases with bind mounts. Because
check_access_path() goes through one path, another (bind mounted) path
could be illegitimately allowed.
That makes the problem a bit more complicated. A solution may be to keep
track of the hierarchy of each rule (e.g. with a layer/depth number),
and only allow an access request if at least one rule of each layer allows
this access. In this case we also need to correctly handle the case when
rules from different layers are tied to the same object.

I would like Landlock to have "pure" syscalls, in the sense that a
process A (e.g. a daemon) could prepare a ruleset and send its FD to a
process B which would then be able to use it to sandbox itself. I think
it makes the reasoning clearer not to have a given ruleset (FD) tied to
a domain (i.e. parent ruleset) at first.
Landlock should (as much as possible) return an error if a syscall
argument is invalid, not according to the current access control (which
is not the case currently because of the security_file_open() check).
This means that these additional merge_ruleset() checks should only
affect the new domain/ruleset, but it should not be visible to userspace.
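
For illustration, handing a descriptor from process A to process B is
normally done with SCM_RIGHTS over an AF_UNIX socket. The sketch below
assumes a ruleset FD can be passed like any other file descriptor, which
this series does not demonstrate; send_fd() and recv_fd() are hypothetical
helper names.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send one file descriptor over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
    char dummy = 'F';  /* at least one byte of payload must be sent */
    struct iovec iov = { .iov_base = &dummy, .iov_len = sizeof(dummy) };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;  /* forces correct alignment of buf */
    } ctrl;
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = ctrl.buf,
        .msg_controllen = sizeof(ctrl.buf),
    };
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns it, or -1 on error. */
static int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = sizeof(dummy) };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = ctrl.buf,
        .msg_controllen = sizeof(ctrl.buf),
    };
    struct cmsghdr *cmsg;
    int fd;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

Process B would then hand the received descriptor to the enforce command
itself, so the sandbox applies to B and not to the daemon that built the
ruleset.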

In a future evolution, it may be useful to add a lock/seal command to
deny any additional rule enforcement. However, that may be
counter-productive because it would enable application developers (e.g. for
a shell) to deny the use of Landlock features to the application's child
processes.
But it would be possible anyway with seccomp-bpf…


* Re: [RFC PATCH v14 00/10] Landlock LSM
  2020-03-11 23:38   ` Mickaël Salaün
@ 2020-03-17 16:19     ` Jann Horn
  2020-03-17 17:50       ` Mickaël Salaün
  0 siblings, 1 reply; 34+ messages in thread
From: Jann Horn @ 2020-03-17 16:19 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: kernel list, Al Viro, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Greg Kroah-Hartman, James Morris, Jann Horn,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, Kernel Hardening, Linux API, linux-arch,
	linux-doc, linux-fsdevel, open list:KERNEL SELFTEST FRAMEWORK,
	linux-security-module, the arch/x86 maintainers

On Thu, Mar 12, 2020 at 12:38 AM Mickaël Salaün <mic@digikod.net> wrote:
> On 10/03/2020 00:44, Jann Horn wrote:
> > On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün <mic@digikod.net> wrote:
> >> This new version of Landlock is a major revamp of the previous series
> >> [1], hence the RFC tag.  The three main changes are the replacement of
> >> eBPF with a dedicated safe management of access rules, the replacement
> >> of the use of seccomp(2) with a dedicated syscall, and the management of
> >> filesystem access-control (back from the v10).
> >>
> >> As discussed in [2], eBPF may be too powerful and dangerous to be put in
> >> the hand of unprivileged and potentially malicious processes, especially
> >> because of side-channel attacks against access-controls or other parts
> >> of the kernel.
> >>
> >> Thanks to this new implementation (1540 SLOC), designed from the ground
> >> to be used by unprivileged processes, this series enables a process to
> >> sandbox itself without requiring CAP_SYS_ADMIN, but only the
> >> no_new_privs constraint (like seccomp).  Not relying on eBPF also
> >> enables to improve performances, especially for stacked security
> >> policies thanks to mergeable rulesets.
> >>
> >> The compiled documentation is available here:
> >> https://landlock.io/linux-doc/landlock-v14/security/landlock/index.html
> >>
> >> This series can be applied on top of v5.6-rc3.  This can be tested with
> >> CONFIG_SECURITY_LANDLOCK and CONFIG_SAMPLE_LANDLOCK.  This patch series
> >> can be found in a Git repository here:
> >> https://github.com/landlock-lsm/linux/commits/landlock-v14
> >> I would really appreciate constructive comments on the design and the code.
> >
> > I've looked through the patchset, and I think that it would be
> > possible to simplify it quite a bit. I have tried to do that (and
> > compile-tested it, but have not actually tried running it); here's what I
> > came up with:
> >
> > https://github.com/thejh/linux/commits/landlock-mod
> >
> > The three modified patches (patches 1, 2 and 5) are marked with
> > "[MODIFIED]" in their title. Please take a look - what do you think?
> > Feel free to integrate my changes into your patches if you think they
> > make sense.
>
> Regarding the landlock_release_inodes(), the final wait_var_event() is
> indeed needed (as in fsnotify), but why do you use a READ_ONCE() for
> landlock_initialized?

Ah, good point - that READ_ONCE() should be unnecessary.

> The other main change is about the object cross-reference: you entirely
> removed it, which means that an object will only be freed when there are
> no rules using it. This does not free an object when its underlying
> object is being terminated. We now only have to worry about the
> termination of the parent of an underlying object (e.g. the super-block
> of an inode).
>
> However, I think you forgot to increment object->usage in
> create_ruleset_elem().

Whoops, you're right.

> There is also an unused checked_mask variable in merge_ruleset().

Oh, yeah, oops.

> All this removes optimizations that made the code more difficult to
> understand. The performance difference is negligible, and I think that
> the memory footprint is fine.
> These optimizations (and others) could be discussed later. I'm
> integrating most of your changes in the next patch series.

:)

> > Aside from those things, there is also a major correctness issue where
> > I'm not sure how to solve it properly:
> >
> > Let's say a process installs a filter on itself like this:
> >
> > struct landlock_attr_ruleset ruleset = { .handled_access_fs =
> > ACCESS_FS_ROUGHLY_WRITE};
> > int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
> > LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
> > struct landlock_attr_path_beneath path_beneath = {
> >   .ruleset_fd = ruleset_fd,
> >   .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
> >   .parent_fd = open("/tmp/foobar", O_PATH),
> > };
> > landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
> > sizeof(path_beneath), &path_beneath);
> > prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
> > struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
> > landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
> > sizeof(attr_enforce), &attr_enforce);
> >
> > At this point, the process is not supposed to be able to write to
> > anything outside /tmp/foobar, right? But what happens if the process
> > does the following next?
> >
> > struct landlock_attr_ruleset ruleset = { .handled_access_fs =
> > ACCESS_FS_ROUGHLY_WRITE};
> > int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
> > LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
> > struct landlock_attr_path_beneath path_beneath = {
> >   .ruleset_fd = ruleset_fd,
> >   .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
> >   .parent_fd = open("/", O_PATH),
> > };
> > landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
> > sizeof(path_beneath), &path_beneath);
> > prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
> > struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
> > landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
> > sizeof(attr_enforce), &attr_enforce);
> >
> > As far as I can tell from looking at the source, after this, you will
> > have write access to the entire filesystem again. I think the idea is
> > that LANDLOCK_CMD_ENFORCE_RULESET should only let you drop privileges,
> > not increase them, right?
>
> There is an additional check in syscall.c:get_path_from_fd(): it is
> forbidden to add a rule with a path which is not accessible (according
> to LANDLOCK_ACCESS_FS_OPEN) thanks to a call to security_file_open(),
> but this is definitely not perfect.

Ah, I missed that.

> > I think the easy way to fix this would be to add a bitmask to each
> > rule that says from which ruleset it originally comes, and then let
> > check_access_path() collect these bitmasks from each rule with OR, and
> > check at the end whether the resulting bitmask is full - if not, at
> > least one of the rulesets did not permit the access, and it should be
> > denied.
> >
> > But maybe it would make more sense to change how the API works
> > instead, and get rid of the concept of "merging" two rulesets
> > together? Instead, we could make the API work like this:
> >
> >  - LANDLOCK_CMD_CREATE_RULESET gives you a file descriptor whose
> > ->private_data contains a pointer to the old ruleset of the process,
> > as well as a pointer to a new empty ruleset.
> >  - LANDLOCK_CMD_ADD_RULE fails if the specified rule would not be
> > permitted by the old ruleset, then adds the rule to the new ruleset
> >  - LANDLOCK_CMD_ENFORCE_RULESET fails if the old ruleset pointer in
> > ->private_data doesn't match the current ruleset of the process, then
> > replaces the old ruleset with the new ruleset.
> >
> > With this, the new ruleset is guaranteed to be a subset of the old
> > ruleset because each of the new ruleset's rules is permitted by the
> > old ruleset. (Unless the directory hierarchy rotates, but in that case
> > the inaccuracy isn't much worse than what would've been possible
> > through RCU path walk anyway AFAIK.)
> >
> > What do you think?
> >
>
> I would prefer to add the same checks you described at first (with
> check_access_path), but only when creating a new ruleset with
> merge_ruleset() (which should probably be renamed). This enables not to
> rely on a parent ruleset/domain until the enforcement, which is the case
> anyway.
> Unfortunately this doesn't work for some cases with bind mounts. Because
> check_access_path() goes through one path, another (bind mounted) path
> could be illegitimately allowed.

Hmm... I'm not sure what you mean. At the moment, landlock doesn't
allow any sandboxed process to change the mount hierarchy, right? Can
you give an example where this would go wrong?

> That makes the problem a bit more complicated. A solution may be to keep
> track of the hierarchy of each rule (e.g. with a layer/depth number),
> and only allow an access request if at least one rule of each layer allows
> this access. In this case we also need to correctly handle the case when
> rules from different layers are tied to the same object.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v14 06/10] landlock: Add syscall implementation
  2020-02-24 16:02 ` [RFC PATCH v14 06/10] landlock: Add syscall implementation Mickaël Salaün
@ 2020-03-17 16:47   ` Al Viro
  2020-03-17 17:51     ` Mickaël Salaün
  0 siblings, 1 reply; 34+ messages in thread
From: Al Viro @ 2020-03-17 16:47 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Greg Kroah-Hartman, James Morris, Jann Horn, Jonathan Corbet,
	Kees Cook, Michael Kerrisk, Mickaël Salaün,
	Serge E . Hallyn, Shuah Khan, Vincent Dagonneau,
	kernel-hardening, linux-api, linux-arch, linux-doc,
	linux-fsdevel, linux-kselftest, linux-security-module, x86

On Mon, Feb 24, 2020 at 05:02:11PM +0100, Mickaël Salaün wrote:

> +static int get_path_from_fd(u64 fd, struct path *path)

> +	/*
> +	 * Only allows O_PATH FD: enable to restrict ambient (FS) accesses
> +	 * without requiring to open and risk leaking or misusing a FD.  Accept
> +	 * removed, but still open directory (S_DEAD).
> +	 */
> +	if (!(f.file->f_mode & FMODE_PATH) || !f.file->f_path.mnt ||
					      ^^^^^^^^^^^^^^^^^^^
Could you explain what that one had been about?  The underlined
subexpression is always false; was that supposed to check some
condition and if so, which one?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v14 00/10] Landlock LSM
  2020-03-17 16:19     ` Jann Horn
@ 2020-03-17 17:50       ` Mickaël Salaün
  2020-03-17 19:45         ` Jann Horn
  0 siblings, 1 reply; 34+ messages in thread
From: Mickaël Salaün @ 2020-03-17 17:50 UTC (permalink / raw)
  To: Jann Horn
  Cc: kernel list, Al Viro, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Greg Kroah-Hartman, James Morris, Jann Horn,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, Kernel Hardening, Linux API, linux-arch,
	linux-doc, linux-fsdevel, open list:KERNEL SELFTEST FRAMEWORK,
	linux-security-module, the arch/x86 maintainers


On 17/03/2020 17:19, Jann Horn wrote:
> On Thu, Mar 12, 2020 at 12:38 AM Mickaël Salaün <mic@digikod.net> wrote:
>> On 10/03/2020 00:44, Jann Horn wrote:
>>> On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün <mic@digikod.net> wrote:

[...]

>>> Aside from those things, there is also a major correctness issue where
>>> I'm not sure how to solve it properly:
>>>
>>> Let's say a process installs a filter on itself like this:
>>>
>>> struct landlock_attr_ruleset ruleset = { .handled_access_fs =
>>> ACCESS_FS_ROUGHLY_WRITE};
>>> int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
>>> LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
>>> struct landlock_attr_path_beneath path_beneath = {
>>>   .ruleset_fd = ruleset_fd,
>>>   .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
>>>   .parent_fd = open("/tmp/foobar", O_PATH),
>>> };
>>> landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
>>> sizeof(path_beneath), &path_beneath);
>>> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
>>> struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
>>> landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
>>> sizeof(attr_enforce), &attr_enforce);
>>>
>>> At this point, the process is not supposed to be able to write to
>>> anything outside /tmp/foobar, right? But what happens if the process
>>> does the following next?
>>>
>>> struct landlock_attr_ruleset ruleset = { .handled_access_fs =
>>> ACCESS_FS_ROUGHLY_WRITE};
>>> int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
>>> LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
>>> struct landlock_attr_path_beneath path_beneath = {
>>>   .ruleset_fd = ruleset_fd,
>>>   .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
>>>   .parent_fd = open("/", O_PATH),
>>> };
>>> landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
>>> sizeof(path_beneath), &path_beneath);
>>> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
>>> struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
>>> landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
>>> sizeof(attr_enforce), &attr_enforce);
>>>
>>> As far as I can tell from looking at the source, after this, you will
>>> have write access to the entire filesystem again. I think the idea is
>>> that LANDLOCK_CMD_ENFORCE_RULESET should only let you drop privileges,
>>> not increase them, right?
>>
>> There is an additional check in syscall.c:get_path_from_fd(): it is
>> forbidden to add a rule with a path which is not accessible (according
>> to LANDLOCK_ACCESS_FS_OPEN) thanks to a call to security_file_open(),
>> but this is definitely not perfect.
> 
> Ah, I missed that.
> 
>>> I think the easy way to fix this would be to add a bitmask to each
>>> rule that says from which ruleset it originally comes, and then let
>>> check_access_path() collect these bitmasks from each rule with OR, and
>>> check at the end whether the resulting bitmask is full - if not, at
>>> least one of the rulesets did not permit the access, and it should be
>>> denied.
>>>
>>> But maybe it would make more sense to change how the API works
>>> instead, and get rid of the concept of "merging" two rulesets
>>> together? Instead, we could make the API work like this:
>>>
>>>  - LANDLOCK_CMD_CREATE_RULESET gives you a file descriptor whose
>>> ->private_data contains a pointer to the old ruleset of the process,
>>> as well as a pointer to a new empty ruleset.
>>>  - LANDLOCK_CMD_ADD_RULE fails if the specified rule would not be
>>> permitted by the old ruleset, then adds the rule to the new ruleset
>>>  - LANDLOCK_CMD_ENFORCE_RULESET fails if the old ruleset pointer in
>>> ->private_data doesn't match the current ruleset of the process, then
>>> replaces the old ruleset with the new ruleset.
>>>
>>> With this, the new ruleset is guaranteed to be a subset of the old
>>> ruleset because each of the new ruleset's rules is permitted by the
>>> old ruleset. (Unless the directory hierarchy rotates, but in that case
>>> the inaccuracy isn't much worse than what would've been possible
>>> through RCU path walk anyway AFAIK.)
>>>
>>> What do you think?
>>>
>>
>> I would prefer to add the same checks you described at first (with
>> check_access_path), but only when creating a new ruleset with
>> merge_ruleset() (which should probably be renamed). This enables not to
>> rely on a parent ruleset/domain until the enforcement, which is the case
>> anyway.
>> Unfortunately this doesn't work for some cases with bind mounts. Because
>> check_access_path() goes through one path, another (bind mounted) path
>> could be illegitimately allowed.
> 
> Hmm... I'm not sure what you mean. At the moment, landlock doesn't
> allow any sandboxed process to change the mount hierarchy, right? Can
> you give an example where this would go wrong?

Indeed, a Landlocked process must not be able to change its mount
namespace layout. However, bind mounts may already exist.
Let's say a process sandboxes itself to only access /a in a read-write
way. Then, this process (or one of its children) adds a new restriction
on /a/b to only be able to read this hierarchy. The check at insertion
time would allow this because this access right is a subset of the
access right allowed with the parent directory. However, if /a/b is bind
mounted somewhere else, let's say in /private/b, then the second
enforcement just gave new access rights to this hierarchy too. This is
why it seems risky to rely on a check about the legitimacy of a new
access right when adding it to a ruleset or when enforcing it.


> 
>> That makes the problem a bit more complicated. A solution may be to keep
>> track of the hierarchy of each rule (e.g. with a layer/depth number),
>> and only allow an access request if at least one rule of each layer allows
>> this access. In this case we also need to correctly handle the case when
>> rules from different layers are tied to the same object.
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v14 06/10] landlock: Add syscall implementation
  2020-03-17 16:47   ` Al Viro
@ 2020-03-17 17:51     ` Mickaël Salaün
  0 siblings, 0 replies; 34+ messages in thread
From: Mickaël Salaün @ 2020-03-17 17:51 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-kernel, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Greg Kroah-Hartman, James Morris, Jann Horn, Jonathan Corbet,
	Kees Cook, Michael Kerrisk, Mickaël Salaün,
	Serge E . Hallyn, Shuah Khan, Vincent Dagonneau,
	kernel-hardening, linux-api, linux-arch, linux-doc,
	linux-fsdevel, linux-kselftest, linux-security-module, x86


On 17/03/2020 17:47, Al Viro wrote:
> On Mon, Feb 24, 2020 at 05:02:11PM +0100, Mickaël Salaün wrote:
> 
>> +static int get_path_from_fd(u64 fd, struct path *path)
> 
>> +	/*
>> +	 * Only allows O_PATH FD: enable to restrict ambient (FS) accesses
>> +	 * without requiring to open and risk leaking or misusing a FD.  Accept
>> +	 * removed, but still open directory (S_DEAD).
>> +	 */
>> +	if (!(f.file->f_mode & FMODE_PATH) || !f.file->f_path.mnt ||
> 					      ^^^^^^^^^^^^^^^^^^^
> Could you explain what that one had been about?  The underlined
> subexpression is always false; was that supposed to check some
> condition and if so, which one?
> 

This was just to be sure that the next assignment "path->mnt =
f.file->f_path.mnt;" always creates a valid path. If this is always
true, I will remove it.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v14 00/10] Landlock LSM
  2020-03-17 17:50       ` Mickaël Salaün
@ 2020-03-17 19:45         ` Jann Horn
  2020-03-18 12:06           ` Mickaël Salaün
  0 siblings, 1 reply; 34+ messages in thread
From: Jann Horn @ 2020-03-17 19:45 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: kernel list, Al Viro, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Greg Kroah-Hartman, James Morris, Jann Horn,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, Kernel Hardening, Linux API, linux-arch,
	linux-doc, linux-fsdevel, open list:KERNEL SELFTEST FRAMEWORK,
	linux-security-module, the arch/x86 maintainers

On Tue, Mar 17, 2020 at 6:50 PM Mickaël Salaün <mic@digikod.net> wrote:
> On 17/03/2020 17:19, Jann Horn wrote:
> > On Thu, Mar 12, 2020 at 12:38 AM Mickaël Salaün <mic@digikod.net> wrote:
> >> On 10/03/2020 00:44, Jann Horn wrote:
> >>> On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün <mic@digikod.net> wrote:
>
> [...]
>
> >>> Aside from those things, there is also a major correctness issue where
> >>> I'm not sure how to solve it properly:
> >>>
> >>> Let's say a process installs a filter on itself like this:
> >>>
> >>> struct landlock_attr_ruleset ruleset = { .handled_access_fs =
> >>> ACCESS_FS_ROUGHLY_WRITE};
> >>> int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
> >>> LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
> >>> struct landlock_attr_path_beneath path_beneath = {
> >>>   .ruleset_fd = ruleset_fd,
> >>>   .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
> >>>   .parent_fd = open("/tmp/foobar", O_PATH),
> >>> };
> >>> landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
> >>> sizeof(path_beneath), &path_beneath);
> >>> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
> >>> struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
> >>> landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
> >>> sizeof(attr_enforce), &attr_enforce);
> >>>
> >>> At this point, the process is not supposed to be able to write to
> >>> anything outside /tmp/foobar, right? But what happens if the process
> >>> does the following next?
> >>>
> >>> struct landlock_attr_ruleset ruleset = { .handled_access_fs =
> >>> ACCESS_FS_ROUGHLY_WRITE};
> >>> int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
> >>> LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
> >>> struct landlock_attr_path_beneath path_beneath = {
> >>>   .ruleset_fd = ruleset_fd,
> >>>   .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
> >>>   .parent_fd = open("/", O_PATH),
> >>> };
> >>> landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
> >>> sizeof(path_beneath), &path_beneath);
> >>> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
> >>> struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
> >>> landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
> >>> sizeof(attr_enforce), &attr_enforce);
> >>>
> >>> As far as I can tell from looking at the source, after this, you will
> >>> have write access to the entire filesystem again. I think the idea is
> >>> that LANDLOCK_CMD_ENFORCE_RULESET should only let you drop privileges,
> >>> not increase them, right?
> >>
> >> There is an additional check in syscall.c:get_path_from_fd(): it is
> >> forbidden to add a rule with a path which is not accessible (according
> >> to LANDLOCK_ACCESS_FS_OPEN) thanks to a call to security_file_open(),
> >> but this is definitely not perfect.
> >
> > Ah, I missed that.
> >
> >>> I think the easy way to fix this would be to add a bitmask to each
> >>> rule that says from which ruleset it originally comes, and then let
> >>> check_access_path() collect these bitmasks from each rule with OR, and
> >>> check at the end whether the resulting bitmask is full - if not, at
> >>> least one of the rulesets did not permit the access, and it should be
> >>> denied.
> >>>
> >>> But maybe it would make more sense to change how the API works
> >>> instead, and get rid of the concept of "merging" two rulesets
> >>> together? Instead, we could make the API work like this:
> >>>
> >>>  - LANDLOCK_CMD_CREATE_RULESET gives you a file descriptor whose
> >>> ->private_data contains a pointer to the old ruleset of the process,
> >>> as well as a pointer to a new empty ruleset.
> >>>  - LANDLOCK_CMD_ADD_RULE fails if the specified rule would not be
> >>> permitted by the old ruleset, then adds the rule to the new ruleset
> >>>  - LANDLOCK_CMD_ENFORCE_RULESET fails if the old ruleset pointer in
> >>> ->private_data doesn't match the current ruleset of the process, then
> >>> replaces the old ruleset with the new ruleset.
> >>>
> >>> With this, the new ruleset is guaranteed to be a subset of the old
> >>> ruleset because each of the new ruleset's rules is permitted by the
> >>> old ruleset. (Unless the directory hierarchy rotates, but in that case
> >>> the inaccuracy isn't much worse than what would've been possible
> >>> through RCU path walk anyway AFAIK.)
> >>>
> >>> What do you think?
> >>>
> >>
> >> I would prefer to add the same checks you described at first (with
> >> check_access_path), but only when creating a new ruleset with
> >> merge_ruleset() (which should probably be renamed). This enables not to
> >> rely on a parent ruleset/domain until the enforcement, which is the case
> >> anyway.
> >> Unfortunately this doesn't work for some cases with bind mounts. Because
> >> check_access_path() goes through one path, another (bind mounted) path
> >> could be illegitimately allowed.
> >
> > Hmm... I'm not sure what you mean. At the moment, landlock doesn't
> > allow any sandboxed process to change the mount hierarchy, right? Can
> > you give an example where this would go wrong?
>
> Indeed, a Landlocked process must not be able to change its mount
> namespace layout. However, bind mounts may already exist.
> Let's say a process sandboxes itself to only access /a in a read-write
> way.

So, first policy:

/a RW

> Then, this process (or one of its children) adds a new restriction
> on /a/b to only be able to read this hierarchy.

You mean with the second policy looking like this?

/a RW
/a/b R

Then the resulting policy would be:

/a RW policy_bitmask=0x00000003 (bits 0 and 1 set)
/a/b R policy_bitmask=0x00000002 (bit 1 set)
required_bits=0x00000003 (bits 0 and 1 set)

> The check at insertion
> time would allow this because this access right is a subset of the
> access right allowed with the parent directory. However, if /a/b is bind
> mounted somewhere else, let's say in /private/b, then the second
> enforcement just gave new access rights to this hierarchy too.

But with the solution I proposed, landlock's path walk would see
something like this when accessing a file at /private/b/foo:
/private/b/foo <no rules>
  policies seen until now: 0x00000000
/private/b <access: R, policy_bitmask=0x00000002>
  policies seen until now: 0x00000002
/private <no rules>
  policies seen until now: 0x00000002
/ <no rules>
  policies seen until now: 0x00000002

It wouldn't encounter any rule from the first policy, so the OR of the
seen policy bitmasks would be 0x00000002, which is not the required
value 0x00000003, and so the access would be denied.
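
For readers following the thread, the walk above can be sketched in
user-space C. This is only an illustration under stated assumptions, not
code from the patch series: struct rule, merged_rules, required_bits and
collect_policy_bits() are hypothetical names, inodes are stood in for by
short strings, and a rule is assumed to contribute its policy bits only
when it grants the requested access.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACCESS_READ  0x1u	/* hypothetical access bits */
#define ACCESS_WRITE 0x2u

/* Hypothetical merged rule: rights plus the policies that contain it. */
struct rule {
	const char *inode;		/* stand-in for the tagged inode */
	uint32_t access;
	uint32_t policy_bitmask;	/* bit N set: present in policy N */
};

/* The merged policy from the example: /a RW (0x3) and /a/b R (0x2). */
static const struct rule merged_rules[] = {
	{ "a", ACCESS_READ | ACCESS_WRITE, 0x00000003 },
	{ "b", ACCESS_READ,                0x00000002 },
};
static const uint32_t required_bits = 0x00000003;

/* Two ways to reach the same inode "b", listed from the file up to "/". */
static const char *const via_a[]       = { "foo", "b", "a", "root" };
static const char *const via_private[] = { "foo", "b", "private", "root" };

/*
 * OR together the policy bits of every rule met on the walk that grants
 * the request, then require that every enforced policy was seen.
 */
static bool collect_policy_bits(const char *const *walk, size_t len,
				uint32_t request)
{
	uint32_t seen = 0;

	for (size_t i = 0; i < len; i++) {
		for (size_t j = 0;
		     j < sizeof(merged_rules) / sizeof(merged_rules[0]); j++) {
			const struct rule *r = &merged_rules[j];

			if (strcmp(r->inode, walk[i]) == 0 &&
			    (r->access & request) == request)
				seen |= r->policy_bitmask;
		}
	}
	return seen == required_bits;
}
```

With these tables, a read through /a/b/foo collects 0x3 and is granted,
while the same inode reached through /private/b only collects 0x2 and is
denied.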

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v14 00/10] Landlock LSM
  2020-03-17 19:45         ` Jann Horn
@ 2020-03-18 12:06           ` Mickaël Salaün
  2020-03-18 23:33             ` Jann Horn
  0 siblings, 1 reply; 34+ messages in thread
From: Mickaël Salaün @ 2020-03-18 12:06 UTC (permalink / raw)
  To: Jann Horn
  Cc: kernel list, Al Viro, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Greg Kroah-Hartman, James Morris, Jann Horn,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, Kernel Hardening, Linux API, linux-arch,
	linux-doc, linux-fsdevel, open list:KERNEL SELFTEST FRAMEWORK,
	linux-security-module, the arch/x86 maintainers


On 17/03/2020 20:45, Jann Horn wrote:
> On Tue, Mar 17, 2020 at 6:50 PM Mickaël Salaün <mic@digikod.net> wrote:
>> On 17/03/2020 17:19, Jann Horn wrote:
>>> On Thu, Mar 12, 2020 at 12:38 AM Mickaël Salaün <mic@digikod.net> wrote:
>>>> On 10/03/2020 00:44, Jann Horn wrote:
>>>>> On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün <mic@digikod.net> wrote:
>>
>> [...]
>>
>>>>> Aside from those things, there is also a major correctness issue where
>>>>> I'm not sure how to solve it properly:
>>>>>
>>>>> Let's say a process installs a filter on itself like this:
>>>>>
>>>>> struct landlock_attr_ruleset ruleset = { .handled_access_fs =
>>>>> ACCESS_FS_ROUGHLY_WRITE};
>>>>> int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
>>>>> LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
>>>>> struct landlock_attr_path_beneath path_beneath = {
>>>>>   .ruleset_fd = ruleset_fd,
>>>>>   .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
>>>>>   .parent_fd = open("/tmp/foobar", O_PATH),
>>>>> };
>>>>> landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
>>>>> sizeof(path_beneath), &path_beneath);
>>>>> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
>>>>> struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
>>>>> landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
>>>>> sizeof(attr_enforce), &attr_enforce);
>>>>>
>>>>> At this point, the process is not supposed to be able to write to
>>>>> anything outside /tmp/foobar, right? But what happens if the process
>>>>> does the following next?
>>>>>
>>>>> struct landlock_attr_ruleset ruleset = { .handled_access_fs =
>>>>> ACCESS_FS_ROUGHLY_WRITE};
>>>>> int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
>>>>> LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
>>>>> struct landlock_attr_path_beneath path_beneath = {
>>>>>   .ruleset_fd = ruleset_fd,
>>>>>   .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
>>>>>   .parent_fd = open("/", O_PATH),
>>>>> };
>>>>> landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
>>>>> sizeof(path_beneath), &path_beneath);
>>>>> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
>>>>> struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
>>>>> landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
>>>>> sizeof(attr_enforce), &attr_enforce);
>>>>>
>>>>> As far as I can tell from looking at the source, after this, you will
>>>>> have write access to the entire filesystem again. I think the idea is
>>>>> that LANDLOCK_CMD_ENFORCE_RULESET should only let you drop privileges,
>>>>> not increase them, right?
>>>>
>>>> There is an additional check in syscall.c:get_path_from_fd(): it is
>>>> forbidden to add a rule with a path which is not accessible (according
>>>> to LANDLOCK_ACCESS_FS_OPEN) thanks to a call to security_file_open(),
>>>> but this is definitely not perfect.
>>>
>>> Ah, I missed that.
>>>
>>>>> I think the easy way to fix this would be to add a bitmask to each
>>>>> rule that says from which ruleset it originally comes, and then let
>>>>> check_access_path() collect these bitmasks from each rule with OR, and
>>>>> check at the end whether the resulting bitmask is full - if not, at
>>>>> least one of the rulesets did not permit the access, and it should be
>>>>> denied.
>>>>>
>>>>> But maybe it would make more sense to change how the API works
>>>>> instead, and get rid of the concept of "merging" two rulesets
>>>>> together? Instead, we could make the API work like this:
>>>>>
>>>>>  - LANDLOCK_CMD_CREATE_RULESET gives you a file descriptor whose
>>>>> ->private_data contains a pointer to the old ruleset of the process,
>>>>> as well as a pointer to a new empty ruleset.
>>>>>  - LANDLOCK_CMD_ADD_RULE fails if the specified rule would not be
>>>>> permitted by the old ruleset, then adds the rule to the new ruleset
>>>>>  - LANDLOCK_CMD_ENFORCE_RULESET fails if the old ruleset pointer in
>>>>> ->private_data doesn't match the current ruleset of the process, then
>>>>> replaces the old ruleset with the new ruleset.
>>>>>
>>>>> With this, the new ruleset is guaranteed to be a subset of the old
>>>>> ruleset because each of the new ruleset's rules is permitted by the
>>>>> old ruleset. (Unless the directory hierarchy rotates, but in that case
>>>>> the inaccuracy isn't much worse than what would've been possible
>>>>> through RCU path walk anyway AFAIK.)
>>>>>
>>>>> What do you think?
>>>>>
>>>>
>>>> I would prefer to add the same checks you described at first (with
>>>> check_access_path), but only when creating a new ruleset with
>>>> merge_ruleset() (which should probably be renamed). This enables not to
>>>> rely on a parent ruleset/domain until the enforcement, which is the case
>>>> anyway.
>>>> Unfortunately this doesn't work for some cases with bind mounts. Because
>>>> check_access_path() goes through one path, another (bind mounted) path
>>>> could be illegitimately allowed.
>>>
>>> Hmm... I'm not sure what you mean. At the moment, landlock doesn't
>>> allow any sandboxed process to change the mount hierarchy, right? Can
>>> you give an example where this would go wrong?
>>
>> Indeed, a Landlocked process must not be able to change its mount
>> namespace layout. However, bind mounts may already exist.
>> Let's say a process sandboxes itself to only access /a in a read-write
>> way.
> 
> So, first policy:
> 
> /a RW
> 
>> Then, this process (or one of its children) adds a new restriction
>> on /a/b to only be able to read this hierarchy.
> 
> You mean with the second policy looking like this?

Right.

> 
> /a RW
> /a/b R
> 
> Then the resulting policy would be:
> 
> /a RW policy_bitmask=0x00000003 (bits 0 and 1 set)
> /a/b R policy_bitmask=0x00000002 (bit 1 set)
> required_bits=0x00000003 (bits 0 and 1 set)
> 
>> The check at insertion
>> time would allow this because this access right is a subset of the
>> access right allowed with the parent directory. However, if /a/b is bind
>> mounted somewhere else, let's say in /private/b, then the second
>> enforcement just gave new access rights to this hierarchy too.
> 
> But with the solution I proposed, landlock's path walk would see
> something like this when accessing a file at /private/b/foo:
> /private/b/foo <no rules>
>   policies seen until now: 0x00000000
> /private/b <access: R, policy_bitmask=0x00000002>
>   policies seen until now: 0x00000002
> /private <no rules>
>   policies seen until now: 0x00000002
> / <no rules>
>   policies seen until now: 0x00000002
> 
> It wouldn't encounter any rule from the first policy, so the OR of the
> seen policy bitmasks would be 0x00000002, which is not the required
> value 0x00000003, and so the access would be denied.

As I understand your proposal, we need to build the required_bits
when adding a rule or enforcing/merging a ruleset with a domain. The
issue is that a rule only refers to a struct inode, not a struct path.
For your proposal to work, we would need to walk through the file
path when adding a rule to a ruleset, which means that we need to depend
on the current view of the process (i.e. its mount namespace), and its
Landlock domain. If the required_bits field is set when the ruleset is
merged with the domain, it is no longer possible to walk through the
corresponding initial file path, which makes the enforcement step too
late to check for such consistency. The important point is that a
ruleset/domain doesn't have a notion of file hierarchy; a ruleset is
only a set of tagged inodes.

I'm not sure I got your proposal right, though. When and how would
you generate the required_bits?

Here is my updated proposal: add a layer level and a depth to each
rule (once enforced/merged with a domain), and a top layer level for a
domain. When enforcing a ruleset (i.e. merging a ruleset into the
current domain), the layer level of a new rule would be the incremented
top layer level. If there is no rule (from this domain) tied to the same
inode, then the depth of the new rule is 1. However, if there is already
a rule tied to the same inode and if this rule's layer level is the
previous top layer level, then the depth and the layer level are both
incremented and the rule is updated with the new access rights (bitwise
AND).

The policy looks like this:
domain top_layer=2
/a RW policy_bitmask=0x00000003 layer=1 depth=1
/a/b R policy_bitmask=0x00000002 layer=2 depth=1

The path walk access check walks through all inodes and starts with a
layer counter equal to the top layer of the current domain. For each
encountered inode tied to a rule, the access rights are checked and a
new check ensures that the layer of the matching rule is the same as the
counter (this may be a merged ruleset containing rules pertaining to the
same hierarchy, which is fine) or equal to the decremented counter (i.e.
the path walk just reached the underlying layer). If the path walk
encounters a rule with a layer strictly less than the counter minus one,
there is a hole in the layers, which means that the ruleset
hierarchy/subset does not match, and the access must be denied.

When accessing a file at /private/b/foo for a read access:
/private/b/foo <no rules>
  allowed_access=unknown layer_counter=2
/private/b <access: R, policy_bitmask=0x00000002, layer=2, depth=1>
  allowed_access=allowed layer_counter=2
/private <no rules>
  allowed_access=allowed layer_counter=2
/ <no rules>
  allowed_access=allowed layer_counter=2

Because the layer_counter didn't reach 1, the access request is then denied.
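
The layered walk can be sketched in user-space C as follows. This is one
possible reading that reproduces the examples in this message, not code
from the series, and several points are assumptions: all names
(struct rule, struct domain, walk_allows(), example_domain) are
hypothetical, inodes are stood in for by short strings, a rule of layer L
and depth D is taken to cover layers L-D+1 to L, rules that do not grant
the request are skipped, and the counter tracks the lowest layer matched
so far (starting one above the top layer) rather than the exact counter
shown in the trace.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACCESS_READ  0x1u	/* hypothetical access bits */
#define ACCESS_WRITE 0x2u

struct rule {
	const char *inode;	/* stand-in for the tagged inode */
	uint32_t access;
	uint32_t layer;		/* topmost layer the rule belongs to */
	uint32_t depth;		/* contiguous layers it spans (<= layer) */
};

struct domain {
	uint32_t top_layer;
	const struct rule *rules;
	size_t num_rules;
};

/* Domain from the example: /a RW (layer 1), /a/b R (layer 2). */
static const struct rule example_rules[] = {
	{ "a", ACCESS_READ | ACCESS_WRITE, 1, 1 },
	{ "b", ACCESS_READ,                2, 1 },
};
static const struct domain example_domain = { 2, example_rules, 2 };

/* The same inode "b" reached through /a/b and through /private/b. */
static const char *const via_a[]       = { "foo", "b", "a", "root" };
static const char *const via_private[] = { "foo", "b", "private", "root" };

static const struct rule *find_rule(const struct domain *dom,
				    const char *inode)
{
	for (size_t i = 0; i < dom->num_rules; i++)
		if (strcmp(dom->rules[i].inode, inode) == 0)
			return &dom->rules[i];
	return NULL;
}

/* walk[] lists the inodes from the accessed file up to the root. */
static bool walk_allows(const struct domain *dom, const char *const *walk,
			size_t len, uint32_t request)
{
	uint32_t reached = dom->top_layer + 1;	/* lowest layer matched */

	/* The walk can stop as soon as layer 1 has been reached. */
	for (size_t i = 0; i < len && reached > 1; i++) {
		const struct rule *r = find_rule(dom, walk[i]);
		uint32_t lowest;

		if (!r || (r->access & request) != request)
			continue;
		if (r->layer + 1 < reached)
			return false;	/* hole between layers: deny */
		lowest = r->layer - r->depth + 1;
		if (lowest < reached)
			reached = lowest;
	}
	return reached == 1;
}
```

Under these assumptions, a read of /a/b/foo matches layer 2 and then
layer 1 and is granted, while a read of /private/b/foo stops at layer 2
and is denied, as in the trace above.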

This proposal makes it possible not to rely on a parent ruleset at
first, only when enforcing/merging a ruleset with a domain. This also
solves the issue with multiple inherited/nested rules on the same inode
(in which case the depth just grows). Moreover, this makes it possible
to safely stop the path walk as soon as we reach layer 1.

Here is a more complex example. A process sandboxes itself with a first rule:
domain top_layer=1
/a RW policy_bitmask=0x00000003 layer=1 depth=1

Then the sandboxed process enforces this second (useless) ruleset on itself:
/a/b RW policy_bitmask=0x00000003

The resulting domain is then:
domain top_layer=2
/a RW policy_bitmask=0x00000003 layer=1 depth=1
/a/b RW policy_bitmask=0x00000003 layer=2 depth=1

Then the sandboxed process enforces this third ruleset on itself (which
effectively reduces its access):
/a/b R policy_bitmask=0x00000002

The resulting domain is then:
domain top_layer=3
/a RW policy_bitmask=0x00000003 layer=1 depth=1
/a/b R policy_bitmask=0x00000002 layer=3 depth=2

At this point, only /a/b is accessible, and only for reading. The access
rights on /a are ignored (but still inherited).

Then the sandboxed process enforces this fourth ruleset on itself:
/c R policy_bitmask=0x00000002

The resulting domain is then:
domain top_layer=4
/a RW policy_bitmask=0x00000003 layer=1 depth=1
/a/b R policy_bitmask=0x00000002 layer=3 depth=2
/c R policy_bitmask=0x00000002 layer=4 depth=1

Now, every read or write access request will be denied.

Then the sandboxed process enforces this fifth ruleset on itself:
/a R policy_bitmask=0x00000002

Because the existing rule on /a is not in the layer directly underneath,
the resulting domain is unchanged (except for top_layer, which may be
incremented anyway). Of course, we must check that top_layer does not
overflow; if it would, an error must be returned to inform userspace
that the ruleset can't be enforced.
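
The merge step used in the five enforcements above can be sketched in
user-space C. This is a minimal illustration under stated assumptions,
not the series' implementation: struct rule, struct domain,
enforce_rule() and replay_example() are hypothetical names, each ruleset
holds a single rule, inodes are stood in for by strings, the domain is a
small fixed array, and the -E2BIG/-ENOSPC error codes are arbitrary
choices.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACCESS_READ  0x1u	/* hypothetical access bits */
#define ACCESS_WRITE 0x2u
#define MAX_RULES    16

struct rule {
	const char *inode;	/* stand-in for the tagged inode */
	uint32_t access;
	uint32_t layer;
	uint32_t depth;
};

struct domain {
	uint32_t top_layer;
	size_t num_rules;
	struct rule rules[MAX_RULES];
};

static struct rule *find_rule(struct domain *dom, const char *inode)
{
	for (size_t i = 0; i < dom->num_rules; i++)
		if (strcmp(dom->rules[i].inode, inode) == 0)
			return &dom->rules[i];
	return NULL;
}

/* Merge a one-rule ruleset into the domain as a new top layer. */
static int enforce_rule(struct domain *dom, const char *inode,
			uint32_t access)
{
	uint32_t prev_top = dom->top_layer;
	struct rule *old = find_rule(dom, inode);

	if (prev_top == UINT32_MAX)
		return -E2BIG;	/* top_layer would overflow */
	if (!old && dom->num_rules == MAX_RULES)
		return -ENOSPC;	/* limit of this sketch only */

	dom->top_layer = prev_top + 1;
	if (!old) {
		/* No rule on this inode yet: new rule with depth 1. */
		dom->rules[dom->num_rules++] =
			(struct rule){ inode, access, dom->top_layer, 1 };
	} else if (old->layer == prev_top) {
		/* Contiguous with the previous top layer: narrow it. */
		old->access &= access;
		old->layer = dom->top_layer;
		old->depth++;
	}
	/* Otherwise the old rule is not contiguous and stays unchanged. */
	return 0;
}

/* Replay the five enforcements of the example on an empty domain. */
static const struct domain *replay_example(void)
{
	static struct domain dom;

	memset(&dom, 0, sizeof(dom));
	enforce_rule(&dom, "a",   ACCESS_READ | ACCESS_WRITE);
	enforce_rule(&dom, "a/b", ACCESS_READ | ACCESS_WRITE);
	enforce_rule(&dom, "a/b", ACCESS_READ);
	enforce_rule(&dom, "c",   ACCESS_READ);
	enforce_rule(&dom, "a",   ACCESS_READ);
	return &dom;
}
```

Replaying the example this way ends with /a at layer 1 depth 1, /a/b
read-only at layer 3 depth 2, /c at layer 4 depth 1, and top_layer
incremented to 5 by the fifth, ignored ruleset.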

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v14 00/10] Landlock LSM
  2020-03-18 12:06           ` Mickaël Salaün
@ 2020-03-18 23:33             ` Jann Horn
  2020-03-19 16:58               ` Mickaël Salaün
  0 siblings, 1 reply; 34+ messages in thread
From: Jann Horn @ 2020-03-18 23:33 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: kernel list, Al Viro, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Greg Kroah-Hartman, James Morris, Jann Horn,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, Kernel Hardening, Linux API, linux-arch,
	linux-doc, linux-fsdevel, open list:KERNEL SELFTEST FRAMEWORK,
	linux-security-module, the arch/x86 maintainers

On Wed, Mar 18, 2020 at 1:06 PM Mickaël Salaün <mic@digikod.net> wrote:
> On 17/03/2020 20:45, Jann Horn wrote:
> > On Tue, Mar 17, 2020 at 6:50 PM Mickaël Salaün <mic@digikod.net> wrote:
> >> On 17/03/2020 17:19, Jann Horn wrote:
> >>> On Thu, Mar 12, 2020 at 12:38 AM Mickaël Salaün <mic@digikod.net> wrote:
> >>>> On 10/03/2020 00:44, Jann Horn wrote:
> >>>>> On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün <mic@digikod.net> wrote:
> >>
> >> [...]
> >>
> >>>>> Aside from those things, there is also a major correctness issue where
> >>>>> I'm not sure how to solve it properly:
> >>>>>
> >>>>> Let's say a process installs a filter on itself like this:
> >>>>>
> >>>>> struct landlock_attr_ruleset ruleset = { .handled_access_fs =
> >>>>> ACCESS_FS_ROUGHLY_WRITE};
> >>>>> int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
> >>>>> LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
> >>>>> struct landlock_attr_path_beneath path_beneath = {
> >>>>>   .ruleset_fd = ruleset_fd,
> >>>>>   .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
> >>>>>   .parent_fd = open("/tmp/foobar", O_PATH),
> >>>>> };
> >>>>> landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
> >>>>> sizeof(path_beneath), &path_beneath);
> >>>>> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
> >>>>> struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
> >>>>> landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
> >>>>> sizeof(attr_enforce), &attr_enforce);
> >>>>>
> >>>>> At this point, the process is not supposed to be able to write to
> >>>>> anything outside /tmp/foobar, right? But what happens if the process
> >>>>> does the following next?
> >>>>>
> >>>>> struct landlock_attr_ruleset ruleset = { .handled_access_fs =
> >>>>> ACCESS_FS_ROUGHLY_WRITE};
> >>>>> int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
> >>>>> LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
> >>>>> struct landlock_attr_path_beneath path_beneath = {
> >>>>>   .ruleset_fd = ruleset_fd,
> >>>>>   .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
> >>>>>   .parent_fd = open("/", O_PATH),
> >>>>> };
> >>>>> landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
> >>>>> sizeof(path_beneath), &path_beneath);
> >>>>> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
> >>>>> struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
> >>>>> landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
> >>>>> sizeof(attr_enforce), &attr_enforce);
> >>>>>
> >>>>> As far as I can tell from looking at the source, after this, you will
> >>>>> have write access to the entire filesystem again. I think the idea is
> >>>>> that LANDLOCK_CMD_ENFORCE_RULESET should only let you drop privileges,
> >>>>> not increase them, right?
> >>>>
> >>>> There is an additional check in syscall.c:get_path_from_fd(): it is
> >>>> forbidden to add a rule with a path which is not accessible (according
> >>>> to LANDLOCK_ACCESS_FS_OPEN) thanks to a call to security_file_open(),
> >>>> but this is definitely not perfect.
> >>>
> >>> Ah, I missed that.
> >>>
> >>>>> I think the easy way to fix this would be to add a bitmask to each
> >>>>> rule that says from which ruleset it originally comes, and then let
> >>>>> check_access_path() collect these bitmasks from each rule with OR, and
> >>>>> check at the end whether the resulting bitmask is full - if not, at
> >>>>> least one of the rulesets did not permit the access, and it should be
> >>>>> denied.
> >>>>>
> >>>>> But maybe it would make more sense to change how the API works
> >>>>> instead, and get rid of the concept of "merging" two rulesets
> >>>>> together? Instead, we could make the API work like this:
> >>>>>
> >>>>>  - LANDLOCK_CMD_CREATE_RULESET gives you a file descriptor whose
> >>>>> ->private_data contains a pointer to the old ruleset of the process,
> >>>>> as well as a pointer to a new empty ruleset.
> >>>>>  - LANDLOCK_CMD_ADD_RULE fails if the specified rule would not be
> >>>>> permitted by the old ruleset, then adds the rule to the new ruleset
> >>>>>  - LANDLOCK_CMD_ENFORCE_RULESET fails if the old ruleset pointer in
> >>>>> ->private_data doesn't match the current ruleset of the process, then
> >>>>> replaces the old ruleset with the new ruleset.
> >>>>>
> >>>>> With this, the new ruleset is guaranteed to be a subset of the old
> >>>>> ruleset because each of the new ruleset's rules is permitted by the
> >>>>> old ruleset. (Unless the directory hierarchy rotates, but in that case
> >>>>> the inaccuracy isn't much worse than what would've been possible
> >>>>> through RCU path walk anyway AFAIK.)
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>>
> >>>> I would prefer to add the same checks you described at first (with
> >>>> check_access_path), but only when creating a new ruleset with
> >>>> merge_ruleset() (which should probably be renamed). This enables not to
> >>>> rely on a parent ruleset/domain until the enforcement, which is the case
> >>>> anyway.
> >>>> Unfortunately this doesn't work for some cases with bind mounts. Because
> >>>> check_access_path() goes through one path, another (bind mounted) path
> >>>> could be illegitimately allowed.
> >>>
> >>> Hmm... I'm not sure what you mean. At the moment, landlock doesn't
> >>> allow any sandboxed process to change the mount hierarchy, right? Can
> >>> you give an example where this would go wrong?
> >>
> >> Indeed, a Landlocked process must not be able to change its mount
> >> namespace layout. However, bind mounts may already exist.
> >> Let's say a process sandboxes itself to only access /a in a read-write
> >> way.
> >
> > So, first policy:
> >
> > /a RW
> >
> >> Then, this process (or one of its children) adds a new restriction
> >> on /a/b to only be able to read this hierarchy.
> >
> > You mean with the second policy looking like this?
>
> Right.
>
> >
> > /a RW
> > /a/b R
> >
> > Then the resulting policy would be:
> >
> > /a RW policy_bitmask=0x00000003 (bits 0 and 1 set)
> > /a/b R policy_bitmask=0x00000002 (bit 1 set)
> > required_bits=0x00000003 (bits 0 and 1 set)
> >
> >> The check at insertion
> >> time would allow this because this access right is a subset of the
> >> access right allowed with the parent directory. However, if /a/b is bind
> >> mounted somewhere else, let's say in /private/b, then the second
> >> enforcement just gave new access rights to this hierarchy too.
> >
> > But with the solution I proposed, landlock's path walk would see
> > something like this when accessing a file at /private/b/foo:
> > /private/b/foo <no rules>
> >   policies seen until now: 0x00000000
> > /private/b <access: R, policy_bitmask=0x00000002>
> >   policies seen until now: 0x00000002
> > /private <no rules>
> >   policies seen until now: 0x00000002
> > / <no rules>
> >   policies seen until now: 0x00000002
> >
> > It wouldn't encounter any rule from the first policy, so the OR of the
> > seen policy bitmasks would be 0x00000002, which is not the required
> > value 0x00000003, and so the access would be denied.
> As I understand your proposition, we need to build the required_bits
> when adding a rule or enforcing/merging a ruleset with a domain. The
> issue is that a rule only refers to a struct inode, not a struct path.
> For your proposition to work, we would need to walk through the file
> path when adding a rule to a ruleset, which means that we need to depend
> on the current view of the process (i.e. its mount namespace), and its
> Landlock domain.

I don't see why that is necessary. Why would we have to walk the file
path when adding a rule?

> If the required_bits field is set when the ruleset is
> merged with the domain, it is not possible anymore to walk through the
> corresponding initial file path, which makes the enforcement step too
> late to check for such consistency. The important point is that a
> ruleset/domain doesn't have a notion of file hierarchy, a ruleset is
> only a set of tagged inodes.
>
> I'm not sure I got your proposition right, though. When and how would
> you generate the required_bits?

Using your terminology:
A domain is a collection of N layers, which are assigned indices 0..N-1.
For each possible access type, a domain has a bitmask containing N
bits that stores which layers control that access type. (Basically a
per-layer version of fs_access_mask.)
To validate an access, you start by ORing together the bitmasks for
the requested access types; that gives you the required_bits mask,
which lists all layers that want to control the access.
Then you set seen_policy_bits=0, then do the
check_access_path_continue() loop while keeping track of which layers
you've seen with "seen_policy_bits |= access->contributing_policies",
or something like that.
And in the end, you check that seen_policy_bits is a superset of
required_bits - something like `((~seen_policy_bits) & required_bits) ==
0`.

AFAICS to create a new domain from a bunch of layers, you wouldn't
have to do any path walking.
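
The check described above can be sketched in a few lines of C. This is
only an illustration under stated assumptions: the struct, the function
names and the SKETCH_* access bits are hypothetical, the path walk is
modelled as an array of per-inode rule pointers (NULL where an inode has
no rule), and a rule only contributes its layer bits when it grants the
whole requested access. The two demo functions replay the /a, /a/b and
/private/b example from earlier in the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_R 0x1u
#define SKETCH_W 0x2u

/* Hypothetical flattened rule attached to one inode. */
struct sketch_rule {
	uint32_t allowed_access;        /* SKETCH_R and/or SKETCH_W */
	uint64_t contributing_policies; /* bit i set: layer i holds this rule */
};

/*
 * walk[] lists the rule found on each inode from the accessed file up to
 * the root. required_bits names the layers that control the request.
 */
static bool sketch_check_access(const struct sketch_rule *const *walk,
				size_t len, uint32_t request,
				uint64_t required_bits)
{
	uint64_t seen_policy_bits = 0;

	for (size_t i = 0; i < len; i++) {
		const struct sketch_rule *rule = walk[i];

		if (rule && (rule->allowed_access & request) == request)
			seen_policy_bits |= rule->contributing_policies;
	}
	/* Parenthesized: == binds tighter than & in C. */
	return (~seen_policy_bits & required_bits) == 0;
}

/* Read access to /a/b/foo: the walk meets /a/b, then /a. */
static bool sketch_demo_direct(void)
{
	const struct sketch_rule a = { SKETCH_R | SKETCH_W, 0x3 };
	const struct sketch_rule b = { SKETCH_R, 0x2 };
	const struct sketch_rule *const walk[] = { NULL, &b, &a, NULL };

	return sketch_check_access(walk, 4, SKETCH_R, 0x3);
}

/* Read access to /private/b/foo: only the bind-mounted /a/b inode is met. */
static bool sketch_demo_bind_mount(void)
{
	const struct sketch_rule b = { SKETCH_R, 0x2 };
	const struct sketch_rule *const walk[] = { NULL, &b, NULL, NULL };

	return sketch_check_access(walk, 4, SKETCH_R, 0x3);
}
```

With these inputs the direct walk collects bits 0 and 1 and is allowed,
while the bind-mount walk only collects bit 1 and is denied.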

> Here is my updated proposition: add a layer level and a depth to each
> rule (once enforced/merged with a domain), and a top layer level for a
> domain. When enforcing a ruleset (i.e. merging a ruleset into the
> current domain), the layer level of a new rule would be the incremented
> top layer level.
> If there is no rule (from this domain) tied to the same
> inode, then the depth of the new rule is 1. However, if there is already
> a rule tied to the same inode and if this rule's layer level is the
> previous top layer level, then the depth and the layer level are both
> incremented and the rule is updated with the new access rights (boolean
> AND).
>
> The policy looks like this:
> domain top_layer=2
> /a RW policy_bitmask=0x00000003 layer=1 depth=1
> /a/b R policy_bitmask=0x00000002 layer=2 depth=1
>
> The path walk access check walks through all inodes and starts with a
> layer counter equal to the top layer of the current domain. For each
> encountered inode tied to a rule, the access rights are checked and a
> new check ensures that the layer of the matching rule is the same as the
> counter (this may be a merged ruleset containing rules pertaining to the
> same hierarchy, which is fine) or equal to the decremented counter (i.e.
> the path walk just reached the underlying layer). If the path walk
> encounters a rule with a layer strictly less than the counter minus one,
> there is a hole in the layers, which means that the ruleset
> hierarchy/subset does not match, and the access must be denied.
>
> When accessing a file at /private/b/foo for a read access:
> /private/b/foo <no rules>
>   allowed_access=unknown layer_counter=2
> /private/b <access: R, policy_bitmask=0x00000002, layer=2, depth=1>
>   allowed_access=allowed layer_counter=2
> /private <no rules>
>   allowed_access=allowed layer_counter=2
> / <no rules>
>   allowed_access=allowed layer_counter=2
>
> Because the layer_counter didn't reach 1, the access request is then denied.
>
> This proposition enables not to rely on a parent ruleset at first, only
> when enforcing/merging a ruleset with a domain. This also solves the
> issue with multiple inherited/nested rules on the same inode (in which
> case the depth just grows). Moreover, this enables to safely stop the
> path walk as soon as we reach the layer 1.

(FWIW, you could do the same optimization with the seen_policy_bits approach.)

I guess the difference between your proposal and mine is that in my
proposal, the following would work, in effect permitting W access to
/foo/bar/baz (and nothing else)?

first ruleset:
  /foo W
second ruleset:
  /foo/bar/baz W
third ruleset:
  /foo/bar W

whereas in your proposal, IIUC it wouldn't be valid for a new ruleset
to whitelist a superset of what was whitelisted in a previous ruleset?


* Re: [RFC PATCH v14 00/10] Landlock LSM
  2020-03-18 23:33             ` Jann Horn
@ 2020-03-19 16:58               ` Mickaël Salaün
  2020-03-19 21:17                 ` Jann Horn
  0 siblings, 1 reply; 34+ messages in thread
From: Mickaël Salaün @ 2020-03-19 16:58 UTC (permalink / raw)
  To: Jann Horn
  Cc: kernel list, Al Viro, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Greg Kroah-Hartman, James Morris, Jann Horn,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, Kernel Hardening, Linux API, linux-arch,
	linux-doc, linux-fsdevel, open list:KERNEL SELFTEST FRAMEWORK,
	linux-security-module, the arch/x86 maintainers


On 19/03/2020 00:33, Jann Horn wrote:
> On Wed, Mar 18, 2020 at 1:06 PM Mickaël Salaün <mic@digikod.net> wrote:
>> On 17/03/2020 20:45, Jann Horn wrote:
>>> On Tue, Mar 17, 2020 at 6:50 PM Mickaël Salaün <mic@digikod.net> wrote:
>>>> On 17/03/2020 17:19, Jann Horn wrote:
>>>>> On Thu, Mar 12, 2020 at 12:38 AM Mickaël Salaün <mic@digikod.net> wrote:
>>>>>> On 10/03/2020 00:44, Jann Horn wrote:
>>>>>>> On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün <mic@digikod.net> wrote:
>>>>
>>>> [...]
>>>>
>>>>>>> Aside from those things, there is also a major correctness issue where
>>>>>>> I'm not sure how to solve it properly:
>>>>>>>
>>>>>>> Let's say a process installs a filter on itself like this:
>>>>>>>
>>>>>>> struct landlock_attr_ruleset ruleset = { .handled_access_fs =
>>>>>>> ACCESS_FS_ROUGHLY_WRITE};
>>>>>>> int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
>>>>>>> LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
>>>>>>> struct landlock_attr_path_beneath path_beneath = {
>>>>>>>   .ruleset_fd = ruleset_fd,
>>>>>>>   .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
>>>>>>>   .parent_fd = open("/tmp/foobar", O_PATH),
>>>>>>> };
>>>>>>> landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
>>>>>>> sizeof(path_beneath), &path_beneath);
>>>>>>> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
>>>>>>> struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
>>>>>>> landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
>>>>>>> sizeof(attr_enforce), &attr_enforce);
>>>>>>>
>>>>>>> At this point, the process is not supposed to be able to write to
>>>>>>> anything outside /tmp/foobar, right? But what happens if the process
>>>>>>> does the following next?
>>>>>>>
>>>>>>> struct landlock_attr_ruleset ruleset = { .handled_access_fs =
>>>>>>> ACCESS_FS_ROUGHLY_WRITE};
>>>>>>> int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
>>>>>>> LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
>>>>>>> struct landlock_attr_path_beneath path_beneath = {
>>>>>>>   .ruleset_fd = ruleset_fd,
>>>>>>>   .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
>>>>>>>   .parent_fd = open("/", O_PATH),
>>>>>>> };
>>>>>>> landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
>>>>>>> sizeof(path_beneath), &path_beneath);
>>>>>>> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
>>>>>>> struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
>>>>>>> landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
>>>>>>> sizeof(attr_enforce), &attr_enforce);
>>>>>>>
>>>>>>> As far as I can tell from looking at the source, after this, you will
>>>>>>> have write access to the entire filesystem again. I think the idea is
>>>>>>> that LANDLOCK_CMD_ENFORCE_RULESET should only let you drop privileges,
>>>>>>> not increase them, right?
>>>>>>
>>>>>> There is an additional check in syscall.c:get_path_from_fd(): it is
>>>>>> forbidden to add a rule with a path which is not accessible (according
>>>>>> to LANDLOCK_ACCESS_FS_OPEN) thanks to a call to security_file_open(),
>>>>>> but this is definitely not perfect.
>>>>>
>>>>> Ah, I missed that.
>>>>>
>>>>>>> I think the easy way to fix this would be to add a bitmask to each
>>>>>>> rule that says from which ruleset it originally comes, and then let
>>>>>>> check_access_path() collect these bitmasks from each rule with OR, and
>>>>>>> check at the end whether the resulting bitmask is full - if not, at
>>>>>>> least one of the rulesets did not permit the access, and it should be
>>>>>>> denied.
>>>>>>>
>>>>>>> But maybe it would make more sense to change how the API works
>>>>>>> instead, and get rid of the concept of "merging" two rulesets
>>>>>>> together? Instead, we could make the API work like this:
>>>>>>>
>>>>>>>  - LANDLOCK_CMD_CREATE_RULESET gives you a file descriptor whose
>>>>>>> ->private_data contains a pointer to the old ruleset of the process,
>>>>>>> as well as a pointer to a new empty ruleset.
>>>>>>>  - LANDLOCK_CMD_ADD_RULE fails if the specified rule would not be
>>>>>>> permitted by the old ruleset, then adds the rule to the new ruleset
>>>>>>>  - LANDLOCK_CMD_ENFORCE_RULESET fails if the old ruleset pointer in
>>>>>>> ->private_data doesn't match the current ruleset of the process, then
>>>>>>> replaces the old ruleset with the new ruleset.
>>>>>>>
>>>>>>> With this, the new ruleset is guaranteed to be a subset of the old
>>>>>>> ruleset because each of the new ruleset's rules is permitted by the
>>>>>>> old ruleset. (Unless the directory hierarchy rotates, but in that case
>>>>>>> the inaccuracy isn't much worse than what would've been possible
>>>>>>> through RCU path walk anyway AFAIK.)
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>
>>>>>> I would prefer to add the same checks you described at first (with
>>>>>> check_access_path), but only when creating a new ruleset with
>>>>>> merge_ruleset() (which should probably be renamed). This enables not to
>>>>>> rely on a parent ruleset/domain until the enforcement, which is the case
>>>>>> anyway.
>>>>>> Unfortunately this doesn't work for some cases with bind mounts. Because
>>>>>> check_access_path() goes through one path, another (bind mounted) path
>>>>>> could be illegitimately allowed.
>>>>>
>>>>> Hmm... I'm not sure what you mean. At the moment, landlock doesn't
>>>>> allow any sandboxed process to change the mount hierarchy, right? Can
>>>>> you give an example where this would go wrong?
>>>>
>>>> Indeed, a Landlocked process must not be able to change its mount
>>>> namespace layout. However, bind mounts may already exist.
>>>> Let's say a process sandboxes itself to only access /a in a read-write
>>>> way.
>>>
>>> So, first policy:
>>>
>>> /a RW
>>>
>>>> Then, this process (or one of its children) adds a new restriction
>>>> on /a/b to only be able to read this hierarchy.
>>>
>>> You mean with the second policy looking like this?
>>
>> Right.
>>
>>>
>>> /a RW
>>> /a/b R
>>>
>>> Then the resulting policy would be:
>>>
>>> /a RW policy_bitmask=0x00000003 (bits 0 and 1 set)
>>> /a/b R policy_bitmask=0x00000002 (bit 1 set)
>>> required_bits=0x00000003 (bits 0 and 1 set)
>>>
>>>> The check at insertion
>>>> time would allow this because this access right is a subset of the
>>>> access right allowed with the parent directory. However, if /a/b is bind
>>>> mounted somewhere else, let's say in /private/b, then the second
>>>> enforcement just gave new access rights to this hierarchy too.
>>>
>>> But with the solution I proposed, landlock's path walk would see
>>> something like this when accessing a file at /private/b/foo:
>>> /private/b/foo <no rules>
>>>   policies seen until now: 0x00000000
>>> /private/b <access: R, policy_bitmask=0x00000002>
>>>   policies seen until now: 0x00000002
>>> /private <no rules>
>>>   policies seen until now: 0x00000002
>>> / <no rules>
>>>   policies seen until now: 0x00000002
>>>
>>> It wouldn't encounter any rule from the first policy, so the OR of the
>>> seen policy bitmasks would be 0x00000002, which is not the required
>>> value 0x00000003, and so the access would be denied.
>> As I understand your proposition, we need to build the required_bits
>> when adding a rule or enforcing/merging a ruleset with a domain. The
>> issue is that a rule only refers to a struct inode, not a struct path.
>> For your proposition to work, we would need to walk through the file
>> path when adding a rule to a ruleset, which means that we need to depend
>> on the current view of the process (i.e. its mount namespace), and its
>> Landlock domain.
> 
> I don't see why that is necessary. Why would we have to walk the file
> path when adding a rule?
> 
>> If the required_bits field is set when the ruleset is
>> merged with the domain, it is not possible anymore to walk through the
>> corresponding initial file path, which makes the enforcement step too
>> late to check for such consistency. The important point is that a
>> ruleset/domain doesn't have a notion of file hierarchy, a ruleset is
>> only a set of tagged inodes.
>>
>> I'm not sure I got your proposition right, though. When and how would
>> you generate the required_bits?
> 
> Using your terminology:
> A domain is a collection of N layers, which are assigned indices 0..N-1.
> For each possible access type, a domain has a bitmask containing N
> bits that stores which layers control that access type. (Basically a
> per-layer version of fs_access_mask.)

OK, so there is a bit for each layer, which means that you get a limit
of, let's say, 64 layers? Knowing that each layer can be created by a
standalone application, potentially nested in a bunch of layers, this
seems artificially limiting.

> To validate an access, you start by ORing together the bitmasks for
> the requested access types; that gives you the required_bits mask,
> which lists all layers that want to control the access.
> Then you set seen_policy_bits=0, then do the
> check_access_path_continue() loop while keeping track of which layers
> you've seen with "seen_policy_bits |= access->contributing_policies",
> or something like that.
> And in the end, you check that seen_policy_bits is a superset of
> required_bits - something like `((~seen_policy_bits) & required_bits) ==
> 0`.
> 
> AFAICS to create a new domain from a bunch of layers, you wouldn't
> have to do any path walking.

Right, I misunderstood your previous email.

> 
>> Here is my updated proposition: add a layer level and a depth to each
>> rule (once enforced/merged with a domain), and a top layer level for a
>> domain. When enforcing a ruleset (i.e. merging a ruleset into the
>> current domain), the layer level of a new rule would be the incremented
>> top layer level.
>> If there is no rule (from this domain) tied to the same
>> inode, then the depth of the new rule is 1. However, if there is already
>> a rule tied to the same inode and if this rule's layer level is the
>> previous top layer level, then the depth and the layer level are both
>> incremented and the rule is updated with the new access rights (boolean
>> AND).
>>
>> The policy looks like this:
>> domain top_layer=2
>> /a RW policy_bitmask=0x00000003 layer=1 depth=1
>> /a/b R policy_bitmask=0x00000002 layer=2 depth=1
>>
>> The path walk access check walks through all inodes and starts with a
>> layer counter equal to the top layer of the current domain. For each
>> encountered inode tied to a rule, the access rights are checked and a
>> new check ensures that the layer of the matching rule is the same as the
>> counter (this may be a merged ruleset containing rules pertaining to the
>> same hierarchy, which is fine) or equal to the decremented counter (i.e.
>> the path walk just reached the underlying layer). If the path walk
>> encounters a rule with a layer strictly less than the counter minus one,
>> there is a hole in the layers, which means that the ruleset
>> hierarchy/subset does not match, and the access must be denied.
>>
>> When accessing a file at /private/b/foo for a read access:
>> /private/b/foo <no rules>
>>   allowed_access=unknown layer_counter=2
>> /private/b <access: R, policy_bitmask=0x00000002, layer=2, depth=1>
>>   allowed_access=allowed layer_counter=2
>> /private <no rules>
>>   allowed_access=allowed layer_counter=2
>> / <no rules>
>>   allowed_access=allowed layer_counter=2
>>
>> Because the layer_counter didn't reach 1, the access request is then denied.
>>
>> This proposition enables not to rely on a parent ruleset at first, only
>> when enforcing/merging a ruleset with a domain. This also solves the
>> issue with multiple inherited/nested rules on the same inode (in which
>> case the depth just grows). Moreover, this enables to safely stop the
>> path walk as soon as we reach the layer 1.
> 
> (FWIW, you could do the same optimization with the seen_policy_bits approach.)
> 
> I guess the difference between your proposal and mine is that in my
> proposal, the following would work, in effect permitting W access to
> /foo/bar/baz (and nothing else)?
> 
> first ruleset:
>   /foo W
> second ruleset:
>   /foo/bar/baz W
> third ruleset:
>   /foo/bar W
> 
> whereas in your proposal, IIUC it wouldn't be valid for a new ruleset
> to whitelist a superset of what was whitelisted in a previous ruleset?
> 

This behavior seems dangerous because a process which sandboxes itself to
only access /foo/bar W can bypass the restrictions from one of its
parent domains (i.e. only access /foo/bar/baz W). Indeed, each layer is
(most of the time) a different and standalone security policy.

To sum up, the bitmask approach has no notion of layer ordering. It is
therefore not possible to check that a rule comes from a domain which is
the direct ancestor of a child's domain. I want each policy/layer to be
really nested, in the sense that a process sandboxing itself can only add
more restrictions to itself with regard to its parent domain (and the
whole hierarchy). This is a similar approach to seccomp-bpf (with chained
filters), except that there is almost no overhead to nest several
policies/layers together because they are flattened. The layer level and
depth approach makes it possible to implement this.
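
As an illustration of the layer level and depth check, here is a minimal
C sketch. It is one possible reading of the proposal, not the patch
itself: the struct and function names are hypothetical, the path walk is
modelled as an array of per-inode rule pointers (NULL for no rule), a
rule with layer L and depth d is assumed to cover layers L-d+1..L, a rule
that does not grant the request ends the walk with a denial, and the
first rule met is required to belong to the top layer. That last
assumption is chosen so that the four-layer example earlier in the thread
denies every access.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LAYERED_R 0x1u
#define LAYERED_W 0x2u

/* Hypothetical flattened rule attached to one inode. */
struct layered_rule {
	uint32_t allowed_access;
	uint32_t layer; /* highest layer that contributed to the rule */
	uint32_t depth; /* contiguous layers merged into it (>= 1) */
};

static bool layered_check_access(const struct layered_rule *const *walk,
				 size_t len, uint32_t request,
				 uint32_t top_layer)
{
	/* Lowest layer covered so far; top_layer + 1 means none yet. */
	uint32_t lowest = top_layer + 1;

	if (top_layer == 0)
		return true; /* no domain enforced */

	for (size_t i = 0; i < len; i++) {
		const struct layered_rule *rule = walk[i];
		uint32_t rule_lowest;

		if (!rule)
			continue;
		if ((rule->allowed_access & request) != request)
			return false;
		/* Neither the current layer nor the next one down: a hole. */
		if (rule->layer + 1 < lowest)
			return false;
		rule_lowest = rule->layer - rule->depth + 1;
		if (rule_lowest < lowest)
			lowest = rule_lowest;
		if (lowest <= 1)
			return true; /* layer 1 reached: stop the walk */
	}
	return false; /* layer 1 was never reached */
}

/* Read access to /a/b/foo in the two-layer domain. */
static bool layered_demo_direct(void)
{
	const struct layered_rule a = { LAYERED_R | LAYERED_W, 1, 1 };
	const struct layered_rule b = { LAYERED_R, 2, 1 };
	const struct layered_rule *const walk[] = { NULL, &b, &a, NULL };

	return layered_check_access(walk, 4, LAYERED_R, 2);
}

/* Read access to /private/b/foo: only the layer 2 rule is met. */
static bool layered_demo_bind_mount(void)
{
	const struct layered_rule b = { LAYERED_R, 2, 1 };
	const struct layered_rule *const walk[] = { NULL, &b, NULL, NULL };

	return layered_check_access(walk, 4, LAYERED_R, 2);
}
```

The early return when layer 1 is reached corresponds to stopping the path
walk as soon as the bottom layer has been seen.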


* Re: [RFC PATCH v14 00/10] Landlock LSM
  2020-03-19 16:58               ` Mickaël Salaün
@ 2020-03-19 21:17                 ` Jann Horn
  0 siblings, 0 replies; 34+ messages in thread
From: Jann Horn @ 2020-03-19 21:17 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: kernel list, Al Viro, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Greg Kroah-Hartman, James Morris, Jann Horn,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Serge E . Hallyn, Shuah Khan,
	Vincent Dagonneau, Kernel Hardening, Linux API, linux-arch,
	linux-doc, linux-fsdevel, open list:KERNEL SELFTEST FRAMEWORK,
	linux-security-module, the arch/x86 maintainers

On Thu, Mar 19, 2020 at 5:58 PM Mickaël Salaün <mic@digikod.net> wrote:
> On 19/03/2020 00:33, Jann Horn wrote:
> > On Wed, Mar 18, 2020 at 1:06 PM Mickaël Salaün <mic@digikod.net> wrote:
[...]
> >> As I understand your proposition, we need to build the required_bits
> >> when adding a rule or enforcing/merging a ruleset with a domain. The
> >> issue is that a rule only refers to a struct inode, not a struct path.
> >> For your proposition to work, we would need to walk through the file
> >> path when adding a rule to a ruleset, which means that we need to depend
> >> on the current view of the process (i.e. its mount namespace), and its
> >> Landlock domain.
> >
> > I don't see why that is necessary. Why would we have to walk the file
> > path when adding a rule?
> >
> >> If the required_bits field is set when the ruleset is
> >> merged with the domain, it is not possible anymore to walk through the
> >> corresponding initial file path, which makes the enforcement step too
> >> late to check for such consistency. The important point is that a
> >> ruleset/domain doesn't have a notion of file hierarchy, a ruleset is
> >> only a set of tagged inodes.
> >>
> >> I'm not sure I got your proposition right, though. When and how would
> >> you generate the required_bits?
> >
> > Using your terminology:
> > A domain is a collection of N layers, which are assigned indices 0..N-1.
> > For each possible access type, a domain has a bitmask containing N
> > bits that stores which layers control that access type. (Basically a
> > per-layer version of fs_access_mask.)
>
> OK, so there is a bit for each layer, which means that you get a limit
> of, let's say, 64 layers? Knowing that each layer can be created by a
> standalone application, potentially nested in a bunch of layers, this
> seems artificially limiting.

Yes, that is a downside of my approach.

> > To validate an access, you start by ORing together the bitmasks for
> > the requested access types; that gives you the required_bits mask,
> > which lists all layers that want to control the access.
> > Then you set seen_policy_bits=0, then do the
> > check_access_path_continue() loop while keeping track of which layers
> > you've seen with "seen_policy_bits |= access->contributing_policies",
> > or something like that.
> > And in the end, you check that seen_policy_bits is a superset of
> > required_bits - something like `((~seen_policy_bits) & required_bits) ==
> > 0`.
> >
> > AFAICS to create a new domain from a bunch of layers, you wouldn't
> > have to do any path walking.
>
> Right, I misunderstood your previous email.
>
> >
> >> Here is my updated proposition: add a layer level and a depth to each
> >> rule (once enforced/merged with a domain), and a top layer level for a
> >> domain. When enforcing a ruleset (i.e. merging a ruleset into the
> >> current domain), the layer level of a new rule would be the incremented
> >> top layer level.
> >> If there is no rule (from this domain) tied to the same
> >> inode, then the depth of the new rule is 1. However, if there is already
> >> a rule tied to the same inode and if this rule's layer level is the
> >> previous top layer level, then the depth and the layer level are both
> >> incremented and the rule is updated with the new access rights (boolean
> >> AND).
> >>
> >> The policy looks like this:
> >> domain top_layer=2
> >> /a RW policy_bitmask=0x00000003 layer=1 depth=1
> >> /a/b R policy_bitmask=0x00000002 layer=2 depth=1
> >>
> >> The path walk access check walks through all inodes and starts with a
> >> layer counter equal to the top layer of the current domain. For each
> >> encountered inode tied to a rule, the access rights are checked and a
> >> new check ensures that the layer of the matching rule is the same as the
> >> counter (this may be a merged ruleset containing rules pertaining to the
> >> same hierarchy, which is fine) or equal to the decremented counter (i.e.
> >> the path walk just reached the underlying layer). If the path walk
> >> encounters a rule with a layer strictly less than the counter minus one,
> >> there is a hole in the layers, which means that the ruleset
> >> hierarchy/subset does not match, and the access must be denied.
> >>
> >> When accessing a file at /private/b/foo for a read access:
> >> /private/b/foo <no rules>
> >>   allowed_access=unknown layer_counter=2
> >> /private/b <access: R, policy_bitmask=0x00000002, layer=2, depth=1>
> >>   allowed_access=allowed layer_counter=2
> >> /private <no rules>
> >>   allowed_access=allowed layer_counter=2
> >> / <no rules>
> >>   allowed_access=allowed layer_counter=2
> >>
> >> Because the layer_counter didn't reach 1, the access request is then denied.
> >>
> >> This proposition enables not to rely on a parent ruleset at first, only
> >> when enforcing/merging a ruleset with a domain. This also solves the
> >> issue with multiple inherited/nested rules on the same inode (in which
> >> case the depth just grows). Moreover, this enables to safely stop the
> >> path walk as soon as we reach the layer 1.
> >
> > (FWIW, you could do the same optimization with the seen_policy_bits approach.)
> >
> > I guess the difference between your proposal and mine is that in my
> > proposal, the following would work, in effect permitting W access to
> > /foo/bar/baz (and nothing else)?
> >
> > first ruleset:
> >   /foo W
> > second ruleset:
> >   /foo/bar/baz W
> > third ruleset:
> >   /foo/bar W
> >
> > whereas in your proposal, IIUC it wouldn't be valid for a new ruleset
> > to whitelist a superset of what was whitelisted in a previous ruleset?
> >
>
> This behavior seems dangerous because a process which sandboxes itself to
> only access /foo/bar W can bypass the restrictions from one of its
> parent domains (i.e. only access /foo/bar/baz W). Indeed, each layer is
> (most of the time) a different and standalone security policy.

It isn't actually bypassing the restriction: you still can't access
files like /foo/bar/blah, because a path walk from there doesn't
encounter any rules from the second ruleset.
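
To make that concrete, here is a small C sketch of the three-ruleset
example under the bitmask approach. The names are hypothetical, the three
rulesets are assigned layer bits 0, 1 and 2 in the order given, and since
every rule grants W the per-access filtering is left out; each array
entry is simply the layer bitmask found on one inode of the walk (0 for
an inode without a rule).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LAYER_FIRST  (UINT64_C(1) << 0) /* first ruleset:  /foo W */
#define LAYER_SECOND (UINT64_C(1) << 1) /* second ruleset: /foo/bar/baz W */
#define LAYER_THIRD  (UINT64_C(1) << 2) /* third ruleset:  /foo/bar W */
#define LAYERS_ALL   (LAYER_FIRST | LAYER_SECOND | LAYER_THIRD)

/* OR the layer bits met on the walk and compare with the required set. */
static bool walk_is_allowed(const uint64_t *layers_per_inode, size_t len,
			    uint64_t required_bits)
{
	uint64_t seen_policy_bits = 0;

	for (size_t i = 0; i < len; i++)
		seen_policy_bits |= layers_per_inode[i];
	return (~seen_policy_bits & required_bits) == 0;
}

/* Write to /foo/bar/baz/x: walk is x, /foo/bar/baz, /foo/bar, /foo, /. */
static bool demo_write_under_baz(void)
{
	const uint64_t walk[] = {
		0, LAYER_SECOND, LAYER_THIRD, LAYER_FIRST, 0,
	};

	return walk_is_allowed(walk, 5, LAYERS_ALL);
}

/* Write to /foo/bar/blah: walk is blah, /foo/bar, /foo, /. */
static bool demo_write_blah(void)
{
	const uint64_t walk[] = { 0, LAYER_THIRD, LAYER_FIRST, 0 };

	return walk_is_allowed(walk, 4, LAYERS_ALL);
}
```

The walk from /foo/bar/blah never sets the second layer's bit, so the
final comparison fails and the write is denied, whereas the walk from
below /foo/bar/baz collects all three bits.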

> To sum up, the bitmask approach has no notion of layer ordering. It is
> therefore not possible to check that a rule comes from a domain which is
> the direct ancestor of a child's domain. I want each policy/layer to be
> really nested, in the sense that a process sandboxing itself can only add
> more restrictions to itself with regard to its parent domain (and the
> whole hierarchy). This is a similar approach to seccomp-bpf (with chained
> filters), except that there is almost no overhead to nest several
> policies/layers together because they are flattened. The layer level and
> depth approach makes it possible to implement this.

Thread overview: 34+ messages
2020-02-24 16:02 [RFC PATCH v14 00/10] Landlock LSM Mickaël Salaün
2020-02-24 16:02 ` [RFC PATCH v14 01/10] landlock: Add object and rule management Mickaël Salaün
2020-02-25 20:49   ` Jann Horn
2020-02-26 15:31     ` Mickaël Salaün
2020-02-26 20:24       ` Jann Horn
2020-02-27 16:46         ` Mickaël Salaün
2020-02-24 16:02 ` [RFC PATCH v14 02/10] landlock: Add ruleset and domain management Mickaël Salaün
2020-02-24 16:02 ` [RFC PATCH v14 03/10] landlock: Set up the security framework and manage credentials Mickaël Salaün
2020-02-24 16:02 ` [RFC PATCH v14 04/10] landlock: Add ptrace restrictions Mickaël Salaün
2020-02-24 16:02 ` [RFC PATCH v14 05/10] fs,landlock: Support filesystem access-control Mickaël Salaün
2020-02-26 20:29   ` Jann Horn
2020-02-27 16:50     ` Mickaël Salaün
2020-02-27 16:51       ` Jann Horn
2020-02-24 16:02 ` [RFC PATCH v14 06/10] landlock: Add syscall implementation Mickaël Salaün
2020-03-17 16:47   ` Al Viro
2020-03-17 17:51     ` Mickaël Salaün
2020-02-24 16:02 ` [RFC PATCH v14 07/10] arch: Wire up landlock() syscall Mickaël Salaün
2020-02-24 16:02 ` [RFC PATCH v14 08/10] selftests/landlock: Add initial tests Mickaël Salaün
2020-02-24 16:02 ` [RFC PATCH v14 09/10] samples/landlock: Add a sandbox manager example Mickaël Salaün
2020-02-24 16:02 ` [RFC PATCH v14 10/10] landlock: Add user and kernel documentation Mickaël Salaün
2020-02-29 17:23   ` Randy Dunlap
2020-03-02 10:03     ` Mickaël Salaün
2020-02-25 18:49 ` [RFC PATCH v14 00/10] Landlock LSM J Freyensee
2020-02-26 15:34   ` Mickaël Salaün
     [not found] ` <20200227042002.3032-1-hdanton@sina.com>
2020-02-27 17:01   ` [RFC PATCH v14 01/10] landlock: Add object and rule management Mickaël Salaün
2020-03-09 23:44 ` [RFC PATCH v14 00/10] Landlock LSM Jann Horn
2020-03-11 23:38   ` Mickaël Salaün
2020-03-17 16:19     ` Jann Horn
2020-03-17 17:50       ` Mickaël Salaün
2020-03-17 19:45         ` Jann Horn
2020-03-18 12:06           ` Mickaël Salaün
2020-03-18 23:33             ` Jann Horn
2020-03-19 16:58               ` Mickaël Salaün
2020-03-19 21:17                 ` Jann Horn
