bpf.vger.kernel.org archive mirror
* [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
@ 2023-04-12  4:32 Andrii Nakryiko
  2023-04-12  4:32 ` [PATCH bpf-next 1/8] bpf: move unprivileged checks into map_create() and bpf_prog_load() Andrii Nakryiko
                   ` (8 more replies)
  0 siblings, 9 replies; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-12  4:32 UTC (permalink / raw)
  To: bpf, ast, daniel, kpsingh, keescook, paul
  Cc: linux-security-module, Andrii Nakryiko

Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
are meant to allow highly-granular LSM-based control over the usage of the BPF
subsystem. Specifically, they control the creation of BPF maps and BTF data
objects, which are fundamental building blocks of any modern BPF application.

These new hooks are able to override default kernel-side CAP_BPF-based (and
sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
implement LSM policies that granularly enforce more restrictions on
a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
capabilities), but also, importantly, to *bypass kernel-side
enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
cases. The decision about trust for a particular process is delegated to
the custom LSM policy implementation. Such a setup makes it possible to
implement safe and highly-granular trust-based unprivileged BPF map creation,
which is a first step and a prerequisite towards implementing a full-fledged
trusted unprivileged BPF application workflow. A similar approach seems to be
implemented by some other existing LSM hooks, e.g., vm_enough_memory().

Such LSM hook semantics make it possible to have a safer-by-default policy of
not giving applications any of the CAP_BPF/CAP_PERFMON/CAP_NET_ADMIN
capabilities normally required to use the BPF subsystem in the kernel.
Instead, all the BPF processes could be left completely unprivileged, and only
allowlisted exceptions for trusted and verified production use cases would be
granted permission to work with the bpf() syscall, as if those applications
had root-like capabilities.

This patch set implements and demonstrates the overall approach starting with
BPF map and BTF object creation, the first two steps in the lifetime of a
typical BPF application. The next step would be to make similar changes for
the BPF_PROG_LOAD command to allow BPF program loading and verification. This
will be implemented in a follow-up patch set that follows the same approach.

Patches #1-#3 are refactorings that allow adding the new LSM hook in one
centralized place. Patch #4 adds and implements the LSM hook for the
BPF_MAP_CREATE command. Patch #5 adds tests that validate that the LSM hook
works as expected: we implement a trivial BPF LSM policy allowing unprivileged
BPF map creation for the test_progs process only. Patch #6 drops an
unnecessary CAP_BPF restriction for the BPF_MAP_FREEZE command, which seems to
have slipped through the cracks during the refactoring that removed extra
capability restrictions for commands that accept FDs of BPF objects. Patch #7
adds the bpf_btf_load_security LSM hook to control BTF object load, and
patch #8 adds extra tests for that hook.

Andrii Nakryiko (8):
  bpf: move unprivileged checks into map_create() and bpf_prog_load()
  bpf: inline map creation logic in map_create() function
  bpf: centralize permissions checks for all BPF map types
  bpf, lsm: implement bpf_map_create_security LSM hook
  selftests/bpf: validate new bpf_map_create_security LSM hook
  bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command
  bpf, lsm: implement bpf_btf_load_security LSM hook
  selftests/bpf: enhance lsm_map_create test with BTF LSM control

 include/linux/lsm_hook_defs.h                 |   2 +
 include/linux/lsm_hooks.h                     |  25 +++
 include/linux/security.h                      |  12 +
 kernel/bpf/bloom_filter.c                     |   3 -
 kernel/bpf/bpf_local_storage.c                |   3 -
 kernel/bpf/bpf_lsm.c                          |   2 +
 kernel/bpf/bpf_struct_ops.c                   |   3 -
 kernel/bpf/cpumap.c                           |   4 -
 kernel/bpf/devmap.c                           |   3 -
 kernel/bpf/hashtab.c                          |   6 -
 kernel/bpf/lpm_trie.c                         |   3 -
 kernel/bpf/queue_stack_maps.c                 |   4 -
 kernel/bpf/reuseport_array.c                  |   3 -
 kernel/bpf/stackmap.c                         |   3 -
 kernel/bpf/syscall.c                          | 177 ++++++++++-----
 net/core/sock_map.c                           |   4 -
 net/xdp/xskmap.c                              |   4 -
 security/security.c                           |   8 +
 .../selftests/bpf/prog_tests/lsm_map_create.c | 208 ++++++++++++++++++
 .../bpf/prog_tests/unpriv_bpf_disabled.c      |   6 +-
 tools/testing/selftests/bpf/progs/just_maps.c |  56 +++++
 .../selftests/bpf/progs/lsm_map_create.c      |  47 ++++
 tools/testing/selftests/bpf/test_progs.h      |   6 +
 23 files changed, 494 insertions(+), 98 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/lsm_map_create.c
 create mode 100644 tools/testing/selftests/bpf/progs/just_maps.c
 create mode 100644 tools/testing/selftests/bpf/progs/lsm_map_create.c

-- 
2.34.1



* [PATCH bpf-next 1/8] bpf: move unprivileged checks into map_create() and bpf_prog_load()
  2023-04-12  4:32 [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks Andrii Nakryiko
@ 2023-04-12  4:32 ` Andrii Nakryiko
  2023-04-12 17:49   ` Kees Cook
  2023-04-12  4:32 ` [PATCH bpf-next 2/8] bpf: inline map creation logic in map_create() function Andrii Nakryiko
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-12  4:32 UTC (permalink / raw)
  To: bpf, ast, daniel, kpsingh, keescook, paul
  Cc: linux-security-module, Andrii Nakryiko

Make each bpf() syscall command a bit more self-contained, making it
easier to further enhance it. We move sysctl_unprivileged_bpf_disabled
handling down to map_create() and bpf_prog_load(), two special commands
in this regard.
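
As an illustration only (not part of the patch), the relocation can be
modelled in userspace to show that the gating outcome is unchanged. The
names below (model_cmd, gate_in_dispatcher, gate_in_handler) and the boolean
parameters are hypothetical stand-ins for the bpf() command, bpf_capable(),
and sysctl_unprivileged_bpf_disabled. The sketch ignores the ordering of this
check relative to other checks in the handlers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command IDs; only the first two are gated by the sysctl. */
enum model_cmd {
	MODEL_MAP_CREATE,
	MODEL_PROG_LOAD,
	MODEL_OTHER_CMD,
	MODEL_CMD_COUNT,
};

/* Old shape: a single check at the top of the syscall dispatcher. */
static int gate_in_dispatcher(enum model_cmd cmd, bool has_cap_bpf,
			      bool unpriv_disabled)
{
	bool capable = has_cap_bpf || !unpriv_disabled;

	if (!capable && (cmd == MODEL_MAP_CREATE || cmd == MODEL_PROG_LOAD))
		return -EPERM;
	return 0;
}

/* New shape: the same condition, evaluated inside the two handlers. */
static int gate_in_handler(enum model_cmd cmd, bool has_cap_bpf,
			   bool unpriv_disabled)
{
	if (cmd != MODEL_MAP_CREATE && cmd != MODEL_PROG_LOAD)
		return 0; /* other handlers carry no sysctl check */
	if (!has_cap_bpf && unpriv_disabled)
		return -EPERM;
	return 0;
}

/* Exhaustively compare both shapes over every input combination. */
static void assert_gates_agree(void)
{
	for (int cmd = 0; cmd < MODEL_CMD_COUNT; cmd++)
		for (int cap = 0; cap <= 1; cap++)
			for (int dis = 0; dis <= 1; dis++)
				assert(gate_in_dispatcher((enum model_cmd)cmd, cap, dis) ==
				       gate_in_handler((enum model_cmd)cmd, cap, dis));
}
```

Calling assert_gates_agree() walks all twelve input combinations; the two
shapes differ only in where the check lives, not in what it decides.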

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/syscall.c | 37 ++++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6d575505f89c..c1d268025985 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1130,6 +1130,17 @@ static int map_create(union bpf_attr *attr)
 	int f_flags;
 	int err;
 
+	/* Intent here is for unprivileged_bpf_disabled to block key object
+	 * creation commands for unprivileged users; other actions depend
+	 * of fd availability and access to bpffs, so are dependent on
+	 * object creation success.  Capabilities are later verified for
+	 * operations such as load and map create, so even with unprivileged
+	 * BPF disabled, capability checks are still carried out for these
+	 * and other operations.
+	 */
+	if (!bpf_capable() && sysctl_unprivileged_bpf_disabled)
+		return -EPERM;
+
 	err = CHECK_ATTR(BPF_MAP_CREATE);
 	if (err)
 		return -EINVAL;
@@ -2512,6 +2523,17 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	char license[128];
 	bool is_gpl;
 
+	/* Intent here is for unprivileged_bpf_disabled to block key object
+	 * creation commands for unprivileged users; other actions depend
+	 * of fd availability and access to bpffs, so are dependent on
+	 * object creation success.  Capabilities are later verified for
+	 * operations such as load and map create, so even with unprivileged
+	 * BPF disabled, capability checks are still carried out for these
+	 * and other operations.
+	 */
+	if (!bpf_capable() && sysctl_unprivileged_bpf_disabled)
+		return -EPERM;
+
 	if (CHECK_ATTR(BPF_PROG_LOAD))
 		return -EINVAL;
 
@@ -5008,23 +5030,8 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
 static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 {
 	union bpf_attr attr;
-	bool capable;
 	int err;
 
-	capable = bpf_capable() || !sysctl_unprivileged_bpf_disabled;
-
-	/* Intent here is for unprivileged_bpf_disabled to block key object
-	 * creation commands for unprivileged users; other actions depend
-	 * of fd availability and access to bpffs, so are dependent on
-	 * object creation success.  Capabilities are later verified for
-	 * operations such as load and map create, so even with unprivileged
-	 * BPF disabled, capability checks are still carried out for these
-	 * and other operations.
-	 */
-	if (!capable &&
-	    (cmd == BPF_MAP_CREATE || cmd == BPF_PROG_LOAD))
-		return -EPERM;
-
 	err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
 	if (err)
 		return err;
-- 
2.34.1



* [PATCH bpf-next 2/8] bpf: inline map creation logic in map_create() function
  2023-04-12  4:32 [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks Andrii Nakryiko
  2023-04-12  4:32 ` [PATCH bpf-next 1/8] bpf: move unprivileged checks into map_create() and bpf_prog_load() Andrii Nakryiko
@ 2023-04-12  4:32 ` Andrii Nakryiko
  2023-04-12 17:53   ` Kees Cook
  2023-04-12  4:32 ` [PATCH bpf-next 3/8] bpf: centralize permissions checks for all BPF map types Andrii Nakryiko
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-12  4:32 UTC (permalink / raw)
  To: bpf, ast, daniel, kpsingh, keescook, paul
  Cc: linux-security-module, Andrii Nakryiko

Keep all the relevant generic sanity checks, permission checks, and
creation and initialization logic in one linear piece of code. Currently,
the helper function that handles memory allocation and partial
initialization is split out and sits about 1000 lines higher in the
file, hurting readability.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/syscall.c | 54 ++++++++++++++++++--------------------------
 1 file changed, 22 insertions(+), 32 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c1d268025985..a090737f98ea 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -108,37 +108,6 @@ const struct bpf_map_ops bpf_map_offload_ops = {
 	.map_mem_usage = bpf_map_offload_map_mem_usage,
 };
 
-static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
-{
-	const struct bpf_map_ops *ops;
-	u32 type = attr->map_type;
-	struct bpf_map *map;
-	int err;
-
-	if (type >= ARRAY_SIZE(bpf_map_types))
-		return ERR_PTR(-EINVAL);
-	type = array_index_nospec(type, ARRAY_SIZE(bpf_map_types));
-	ops = bpf_map_types[type];
-	if (!ops)
-		return ERR_PTR(-EINVAL);
-
-	if (ops->map_alloc_check) {
-		err = ops->map_alloc_check(attr);
-		if (err)
-			return ERR_PTR(err);
-	}
-	if (attr->map_ifindex)
-		ops = &bpf_map_offload_ops;
-	if (!ops->map_mem_usage)
-		return ERR_PTR(-EINVAL);
-	map = ops->map_alloc(attr);
-	if (IS_ERR(map))
-		return map;
-	map->ops = ops;
-	map->map_type = type;
-	return map;
-}
-
 static void bpf_map_write_active_inc(struct bpf_map *map)
 {
 	atomic64_inc(&map->writecnt);
@@ -1124,7 +1093,9 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
 {
+	const struct bpf_map_ops *ops;
 	int numa_node = bpf_map_attr_numa_node(attr);
+	u32 map_type = attr->map_type;
 	struct btf_field_offs *foffs;
 	struct bpf_map *map;
 	int f_flags;
@@ -1167,9 +1138,28 @@ static int map_create(union bpf_attr *attr)
 		return -EINVAL;
 
 	/* find map type and init map: hashtable vs rbtree vs bloom vs ... */
-	map = find_and_alloc_map(attr);
+	map_type = attr->map_type;
+	if (map_type >= ARRAY_SIZE(bpf_map_types))
+		return -EINVAL;
+	map_type = array_index_nospec(map_type, ARRAY_SIZE(bpf_map_types));
+	ops = bpf_map_types[map_type];
+	if (!ops)
+		return -EINVAL;
+
+	if (ops->map_alloc_check) {
+		err = ops->map_alloc_check(attr);
+		if (err)
+			return err;
+	}
+	if (attr->map_ifindex)
+		ops = &bpf_map_offload_ops;
+	if (!ops->map_mem_usage)
+		return -EINVAL;
+	map = ops->map_alloc(attr);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
+	map->ops = ops;
+	map->map_type = map_type;
 
 	err = bpf_obj_name_cpy(map->name, attr->map_name,
 			       sizeof(attr->map_name));
-- 
2.34.1



* [PATCH bpf-next 3/8] bpf: centralize permissions checks for all BPF map types
  2023-04-12  4:32 [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks Andrii Nakryiko
  2023-04-12  4:32 ` [PATCH bpf-next 1/8] bpf: move unprivileged checks into map_create() and bpf_prog_load() Andrii Nakryiko
  2023-04-12  4:32 ` [PATCH bpf-next 2/8] bpf: inline map creation logic in map_create() function Andrii Nakryiko
@ 2023-04-12  4:32 ` Andrii Nakryiko
  2023-04-12 18:01   ` Kees Cook
  2023-04-12  4:32 ` [PATCH bpf-next 4/8] bpf, lsm: implement bpf_map_create_security LSM hook Andrii Nakryiko
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-12  4:32 UTC (permalink / raw)
  To: bpf, ast, daniel, kpsingh, keescook, paul
  Cc: linux-security-module, Andrii Nakryiko

This allows making more centralized decisions later on, and generally
makes it very explicit which maps are privileged and which are not.
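
As a rough userspace sketch of the idea (not the kernel code), a single
switch can classify map types by the capability they require. The enum
model_map_type is a hypothetical, abbreviated stand-in for a handful of
BPF_MAP_TYPE_* values, and the two boolean parameters stand in for
bpf_capable() and capable(CAP_NET_ADMIN); the grouping follows the switch
added to map_create() in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, abbreviated stand-ins for a few BPF_MAP_TYPE_* values. */
enum model_map_type {
	MODEL_MAP_HASH,
	MODEL_MAP_ARRAY,
	MODEL_MAP_LRU_HASH,
	MODEL_MAP_LPM_TRIE,
	MODEL_MAP_DEVMAP,
	MODEL_MAP_SOCKMAP,
	MODEL_MAP_TYPE_COUNT, /* also used below as an "unknown" type */
};

/* One central place that says which capability each map type needs. */
static int model_map_type_perm_check(int map_type, bool has_cap_bpf,
				     bool has_cap_net_admin)
{
	switch (map_type) {
	case MODEL_MAP_LRU_HASH:
	case MODEL_MAP_LPM_TRIE:
		/* privileged: needs the bpf_capable() equivalent */
		return has_cap_bpf ? 0 : -EPERM;
	case MODEL_MAP_DEVMAP:
	case MODEL_MAP_SOCKMAP:
		/* networking maps: need the CAP_NET_ADMIN equivalent */
		return has_cap_net_admin ? 0 : -EPERM;
	case MODEL_MAP_HASH:
	case MODEL_MAP_ARRAY:
		/* unprivileged */
		return 0;
	default:
		/* types missing from the list are rejected, not allowed */
		return -EPERM;
	}
}

/* Small self-check covering one type from each group. */
static void model_perm_self_check(void)
{
	assert(model_map_type_perm_check(MODEL_MAP_ARRAY, false, false) == 0);
	assert(model_map_type_perm_check(MODEL_MAP_LPM_TRIE, false, true) == -EPERM);
	assert(model_map_type_perm_check(MODEL_MAP_DEVMAP, true, false) == -EPERM);
	assert(model_map_type_perm_check(MODEL_MAP_TYPE_COUNT, true, true) == -EPERM);
}
```

The default branch is the design point worth noting: a map type that is not
listed explicitly fails closed, so a newly added type has to be classified
before it can be created.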

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/bloom_filter.c                     |  3 -
 kernel/bpf/bpf_local_storage.c                |  3 -
 kernel/bpf/bpf_struct_ops.c                   |  3 -
 kernel/bpf/cpumap.c                           |  4 --
 kernel/bpf/devmap.c                           |  3 -
 kernel/bpf/hashtab.c                          |  6 --
 kernel/bpf/lpm_trie.c                         |  3 -
 kernel/bpf/queue_stack_maps.c                 |  4 --
 kernel/bpf/reuseport_array.c                  |  3 -
 kernel/bpf/stackmap.c                         |  3 -
 kernel/bpf/syscall.c                          | 70 ++++++++++++++++---
 net/core/sock_map.c                           |  4 --
 net/xdp/xskmap.c                              |  4 --
 .../bpf/prog_tests/unpriv_bpf_disabled.c      |  6 +-
 14 files changed, 64 insertions(+), 55 deletions(-)

diff --git a/kernel/bpf/bloom_filter.c b/kernel/bpf/bloom_filter.c
index 540331b610a9..addf3dd57b59 100644
--- a/kernel/bpf/bloom_filter.c
+++ b/kernel/bpf/bloom_filter.c
@@ -86,9 +86,6 @@ static struct bpf_map *bloom_map_alloc(union bpf_attr *attr)
 	int numa_node = bpf_map_attr_numa_node(attr);
 	struct bpf_bloom_filter *bloom;
 
-	if (!bpf_capable())
-		return ERR_PTR(-EPERM);
-
 	if (attr->key_size != 0 || attr->value_size == 0 ||
 	    attr->max_entries == 0 ||
 	    attr->map_flags & ~BLOOM_CREATE_FLAG_MASK ||
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index dab2ff4c99d9..2bb35b1c3740 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -720,9 +720,6 @@ int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
 	    !attr->btf_key_type_id || !attr->btf_value_type_id)
 		return -EINVAL;
 
-	if (!bpf_capable())
-		return -EPERM;
-
 	if (attr->value_size > BPF_LOCAL_STORAGE_MAX_VALUE_SIZE)
 		return -E2BIG;
 
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index d3f0a4825fa6..116a0ce378ec 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -655,9 +655,6 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
 	const struct btf_type *t, *vt;
 	struct bpf_map *map;
 
-	if (!bpf_capable())
-		return ERR_PTR(-EPERM);
-
 	st_ops = bpf_struct_ops_find_value(attr->btf_vmlinux_value_type_id);
 	if (!st_ops)
 		return ERR_PTR(-ENOTSUPP);
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 8ec18faa74ac..8a33e8747a0e 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -28,7 +28,6 @@
 #include <linux/sched.h>
 #include <linux/workqueue.h>
 #include <linux/kthread.h>
-#include <linux/capability.h>
 #include <trace/events/xdp.h>
 #include <linux/btf_ids.h>
 
@@ -89,9 +88,6 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	u32 value_size = attr->value_size;
 	struct bpf_cpu_map *cmap;
 
-	if (!bpf_capable())
-		return ERR_PTR(-EPERM);
-
 	/* check sanity of attributes */
 	if (attr->max_entries == 0 || attr->key_size != 4 ||
 	    (value_size != offsetofend(struct bpf_cpumap_val, qsize) &&
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 802692fa3905..49cc0b5671c6 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -160,9 +160,6 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 	struct bpf_dtab *dtab;
 	int err;
 
-	if (!capable(CAP_NET_ADMIN))
-		return ERR_PTR(-EPERM);
-
 	dtab = bpf_map_area_alloc(sizeof(*dtab), NUMA_NO_NODE);
 	if (!dtab)
 		return ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 00c253b84bf5..c69db80fc947 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -422,12 +422,6 @@ static int htab_map_alloc_check(union bpf_attr *attr)
 	BUILD_BUG_ON(offsetof(struct htab_elem, fnode.next) !=
 		     offsetof(struct htab_elem, hash_node.pprev));
 
-	if (lru && !bpf_capable())
-		/* LRU implementation is much complicated than other
-		 * maps.  Hence, limit to CAP_BPF.
-		 */
-		return -EPERM;
-
 	if (zero_seed && !capable(CAP_SYS_ADMIN))
 		/* Guard against local DoS, and discourage production use. */
 		return -EPERM;
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index e0d3ddf2037a..17c7e7782a1f 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -544,9 +544,6 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
 {
 	struct lpm_trie *trie;
 
-	if (!bpf_capable())
-		return ERR_PTR(-EPERM);
-
 	/* check sanity of attributes */
 	if (attr->max_entries == 0 ||
 	    !(attr->map_flags & BPF_F_NO_PREALLOC) ||
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index 601609164ef3..8d2ddcb7566b 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -7,7 +7,6 @@
 #include <linux/bpf.h>
 #include <linux/list.h>
 #include <linux/slab.h>
-#include <linux/capability.h>
 #include <linux/btf_ids.h>
 #include "percpu_freelist.h"
 
@@ -46,9 +45,6 @@ static bool queue_stack_map_is_full(struct bpf_queue_stack *qs)
 /* Called from syscall */
 static int queue_stack_map_alloc_check(union bpf_attr *attr)
 {
-	if (!bpf_capable())
-		return -EPERM;
-
 	/* check sanity of attributes */
 	if (attr->max_entries == 0 || attr->key_size != 0 ||
 	    attr->value_size == 0 ||
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index cbf2d8d784b8..4b4f9670f1a9 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -151,9 +151,6 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
 	int numa_node = bpf_map_attr_numa_node(attr);
 	struct reuseport_array *array;
 
-	if (!bpf_capable())
-		return ERR_PTR(-EPERM);
-
 	/* allocate all map elements and zero-initialize them */
 	array = bpf_map_area_alloc(struct_size(array, ptrs, attr->max_entries), numa_node);
 	if (!array)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index b25fce425b2c..458bb80b14d5 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -74,9 +74,6 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 	u64 cost, n_buckets;
 	int err;
 
-	if (!bpf_capable())
-		return ERR_PTR(-EPERM);
-
 	if (attr->map_flags & ~STACK_CREATE_FLAG_MASK)
 		return ERR_PTR(-EINVAL);
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a090737f98ea..cbea4999e92f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1101,17 +1101,6 @@ static int map_create(union bpf_attr *attr)
 	int f_flags;
 	int err;
 
-	/* Intent here is for unprivileged_bpf_disabled to block key object
-	 * creation commands for unprivileged users; other actions depend
-	 * of fd availability and access to bpffs, so are dependent on
-	 * object creation success.  Capabilities are later verified for
-	 * operations such as load and map create, so even with unprivileged
-	 * BPF disabled, capability checks are still carried out for these
-	 * and other operations.
-	 */
-	if (!bpf_capable() && sysctl_unprivileged_bpf_disabled)
-		return -EPERM;
-
 	err = CHECK_ATTR(BPF_MAP_CREATE);
 	if (err)
 		return -EINVAL;
@@ -1155,6 +1144,65 @@ static int map_create(union bpf_attr *attr)
 		ops = &bpf_map_offload_ops;
 	if (!ops->map_mem_usage)
 		return -EINVAL;
+
+	/* Intent here is for unprivileged_bpf_disabled to block key object
+	 * creation commands for unprivileged users; other actions depend
+	 * of fd availability and access to bpffs, so are dependent on
+	 * object creation success.  Capabilities are later verified for
+	 * operations such as load and map create, so even with unprivileged
+	 * BPF disabled, capability checks are still carried out for these
+	 * and other operations.
+	 */
+	if (!bpf_capable() && sysctl_unprivileged_bpf_disabled)
+		return -EPERM;
+
+	/* check privileged map type permissions */
+	switch (map_type) {
+	case BPF_MAP_TYPE_SK_STORAGE:
+	case BPF_MAP_TYPE_INODE_STORAGE:
+	case BPF_MAP_TYPE_TASK_STORAGE:
+	case BPF_MAP_TYPE_CGRP_STORAGE:
+	case BPF_MAP_TYPE_BLOOM_FILTER:
+	case BPF_MAP_TYPE_LPM_TRIE:
+	case BPF_MAP_TYPE_REUSEPORT_SOCKARRAY:
+	case BPF_MAP_TYPE_STACK_TRACE:
+	case BPF_MAP_TYPE_QUEUE:
+	case BPF_MAP_TYPE_STACK:
+	case BPF_MAP_TYPE_LRU_HASH:
+	case BPF_MAP_TYPE_LRU_PERCPU_HASH:
+	case BPF_MAP_TYPE_STRUCT_OPS:
+	case BPF_MAP_TYPE_CPUMAP:
+		if (!bpf_capable())
+			return -EPERM;
+		break;
+	case BPF_MAP_TYPE_SOCKMAP:
+	case BPF_MAP_TYPE_SOCKHASH:
+	case BPF_MAP_TYPE_DEVMAP:
+	case BPF_MAP_TYPE_DEVMAP_HASH:
+	case BPF_MAP_TYPE_XSKMAP:
+		if (!capable(CAP_NET_ADMIN))
+			return -EPERM;
+		break;
+	case BPF_MAP_TYPE_ARRAY:
+	case BPF_MAP_TYPE_PERCPU_ARRAY:
+	case BPF_MAP_TYPE_PROG_ARRAY:
+	case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
+	case BPF_MAP_TYPE_CGROUP_ARRAY:
+	case BPF_MAP_TYPE_ARRAY_OF_MAPS:
+	case BPF_MAP_TYPE_HASH:
+	case BPF_MAP_TYPE_PERCPU_HASH:
+	case BPF_MAP_TYPE_HASH_OF_MAPS:
+	case BPF_MAP_TYPE_RINGBUF:
+	case BPF_MAP_TYPE_USER_RINGBUF:
+	case BPF_MAP_TYPE_CGROUP_STORAGE:
+	case BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE:
+		/* unprivileged */
+		break;
+	default:
+		WARN(1, "unsupported map type %d", map_type);
+		return -EPERM;
+	}
+
 	map = ops->map_alloc(attr);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 7c189c2e2fbf..4b67bb5e7f9c 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -32,8 +32,6 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 {
 	struct bpf_stab *stab;
 
-	if (!capable(CAP_NET_ADMIN))
-		return ERR_PTR(-EPERM);
 	if (attr->max_entries == 0 ||
 	    attr->key_size    != 4 ||
 	    (attr->value_size != sizeof(u32) &&
@@ -1085,8 +1083,6 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 	struct bpf_shtab *htab;
 	int i, err;
 
-	if (!capable(CAP_NET_ADMIN))
-		return ERR_PTR(-EPERM);
 	if (attr->max_entries == 0 ||
 	    attr->key_size    == 0 ||
 	    (attr->value_size != sizeof(u32) &&
diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c
index 2c1427074a3b..e1c526f97ce3 100644
--- a/net/xdp/xskmap.c
+++ b/net/xdp/xskmap.c
@@ -5,7 +5,6 @@
 
 #include <linux/bpf.h>
 #include <linux/filter.h>
-#include <linux/capability.h>
 #include <net/xdp_sock.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
@@ -68,9 +67,6 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 	int numa_node;
 	u64 size;
 
-	if (!capable(CAP_NET_ADMIN))
-		return ERR_PTR(-EPERM);
-
 	if (attr->max_entries == 0 || attr->key_size != 4 ||
 	    attr->value_size != 4 ||
 	    attr->map_flags & ~(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY))
diff --git a/tools/testing/selftests/bpf/prog_tests/unpriv_bpf_disabled.c b/tools/testing/selftests/bpf/prog_tests/unpriv_bpf_disabled.c
index 8383a99f610f..0adf8d9475cb 100644
--- a/tools/testing/selftests/bpf/prog_tests/unpriv_bpf_disabled.c
+++ b/tools/testing/selftests/bpf/prog_tests/unpriv_bpf_disabled.c
@@ -171,7 +171,11 @@ static void test_unpriv_bpf_disabled_negative(struct test_unpriv_bpf_disabled *s
 				prog_insns, prog_insn_cnt, &load_opts),
 		  -EPERM, "prog_load_fails");
 
-	for (i = BPF_MAP_TYPE_HASH; i <= BPF_MAP_TYPE_BLOOM_FILTER; i++)
+	/* some map types require particular correct parameters which could be
+	 * sanity-checked before enforcing -EPERM, so only validate that
+	 * the simple ARRAY and HASH maps are failing with -EPERM
+	 */
+	for (i = BPF_MAP_TYPE_HASH; i <= BPF_MAP_TYPE_ARRAY; i++)
 		ASSERT_EQ(bpf_map_create(i, NULL, sizeof(int), sizeof(int), 1, NULL),
 			  -EPERM, "map_create_fails");
 
-- 
2.34.1



* [PATCH bpf-next 4/8] bpf, lsm: implement bpf_map_create_security LSM hook
  2023-04-12  4:32 [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2023-04-12  4:32 ` [PATCH bpf-next 3/8] bpf: centralize permissions checks for all BPF map types Andrii Nakryiko
@ 2023-04-12  4:32 ` Andrii Nakryiko
  2023-04-12 18:20   ` Kees Cook
  2023-04-12  4:32 ` [PATCH bpf-next 5/8] selftests/bpf: validate new " Andrii Nakryiko
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-12  4:32 UTC (permalink / raw)
  To: bpf, ast, daniel, kpsingh, keescook, paul
  Cc: linux-security-module, Andrii Nakryiko

Add new LSM hook, bpf_map_create_security, that allows custom LSM
security policies to control BPF map creation permissions granularly
and precisely.

This new LSM hook makes it possible to implement an LSM policy that enforces
more granular and restrictive decisions about which processes can create
which BPF maps, by rejecting BPF map creation based on the passed-in
bpf_attr attributes. It also makes it possible to bypass the CAP_BPF and
CAP_NET_ADMIN restrictions, normally enforced by the kernel, for
applications that the LSM policy deems trusted. Trustworthiness
determination of the process/user/cgroup/etc is left up to the custom LSM
hook implementation and will depend on the particular production setup of
each individual use case.

If the LSM policy wants to rely on default kernel logic, it can return
0 to delegate back to the kernel. If it returns a positive (>0) return code,
the kernel will bypass its normal checks. A negative return code rejects
the map creation. This way it's possible to perform
a delegation of trust (specifically for BPF map creation) from
a privileged custom LSM policy implementation to an unprivileged user
process that is verified and trusted by that custom LSM policy.
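
The three-way return convention can be sketched as a small userspace model.
This is an illustration under stated assumptions, not the kernel
implementation: model_map_create_decision and its parameters are hypothetical
names, lsm_ret stands in for the result of security_bpf_map_create(), and
kernel_check_ret stands in for the combined outcome (0 or -EPERM) of the
default sysctl and capability checks.

```c
#include <assert.h>
#include <errno.h>

/*
 * Combine the hook result with the kernel's default checks:
 *   lsm_ret < 0  -> reject with that error, kernel checks never consulted
 *   lsm_ret > 0  -> allow, kernel checks are skipped
 *   lsm_ret == 0 -> delegate to the kernel's default checks
 */
static int model_map_create_decision(int lsm_ret, int kernel_check_ret)
{
	if (lsm_ret < 0)
		return lsm_ret;
	if (lsm_ret > 0)
		return 0;
	return kernel_check_ret;
}

/* Walk the three outcomes for a process that lacks capabilities. */
static void model_decision_self_check(void)
{
	/* policy has no opinion: default checks reject */
	assert(model_map_create_decision(0, -EPERM) == -EPERM);
	/* policy trusts the process: default checks are bypassed */
	assert(model_map_create_decision(1, -EPERM) == 0);
	/* policy rejects: even a fully capable process is denied */
	assert(model_map_create_decision(-EACCES, 0) == -EACCES);
}
```

Note that in this model a negative result wins even when the default checks
would have passed, which is what lets a policy be stricter than the
capability checks as well as more permissive.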

Such a model allows a flexible and secure-by-default approach where user
processes that need to use BPF features (BPF map creation, in this case)
are left unprivileged with no CAP_BPF, CAP_NET_ADMIN, CAP_PERFMON, etc.
capabilities, but specific exceptions are implemented (usually in
a centralized, server fleet-wide fashion) for trusted
processes/containers/users, allowing them to manipulate BPF facilities,
as long as they are allowed and known a priori.

This patch implements the first required part for full-fledged BPF usage:
map creation. The other one, BPF program load, will be addressed in
follow-up patches.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/lsm_hook_defs.h |  1 +
 include/linux/lsm_hooks.h     | 12 ++++++++++++
 include/linux/security.h      |  6 ++++++
 kernel/bpf/bpf_lsm.c          |  1 +
 kernel/bpf/syscall.c          | 19 ++++++++++++++++---
 security/security.c           |  4 ++++
 6 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 094b76dc7164..b4fe9ed7021a 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -396,6 +396,7 @@ LSM_HOOK(void, LSM_RET_VOID, audit_rule_free, void *lsmrule)
 LSM_HOOK(int, 0, bpf, int cmd, union bpf_attr *attr, unsigned int size)
 LSM_HOOK(int, 0, bpf_map, struct bpf_map *map, fmode_t fmode)
 LSM_HOOK(int, 0, bpf_prog, struct bpf_prog *prog)
+LSM_HOOK(int, 0, bpf_map_create_security, const union bpf_attr *attr)
 LSM_HOOK(int, 0, bpf_map_alloc_security, struct bpf_map *map)
 LSM_HOOK(void, LSM_RET_VOID, bpf_map_free_security, struct bpf_map *map)
 LSM_HOOK(int, 0, bpf_prog_alloc_security, struct bpf_prog_aux *aux)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 6e156d2acffc..42bf7c0aa4d8 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1598,6 +1598,18 @@
  *	@prog: bpf prog that userspace want to use.
  *	Return 0 if permission is granted.
  *
+ * @bpf_map_create_security:
+ *	Do a check to determine permission to create requested BPF map.
+ *	Implementation can override kernel capabilities checks according to
+ *	the rules below:
+ *	  - 0 should be returned to delegate permission checks to other
+ *	    installed LSM callbacks and/or hard-wired kernel logic, which
+ *	    would enforce CAP_BPF/CAP_NET_ADMIN capabilities;
+ *	  - reject BPF map creation by returning -EPERM or any other
+ *	    negative error code;
+ *	  - allow BPF map creation, overriding kernel checks, by returning
+ *	    a positive result.
+ *
  * @bpf_map_alloc_security:
  *	Initialize the security field inside bpf map.
  *	Return 0 on success, error on failure.
diff --git a/include/linux/security.h b/include/linux/security.h
index 5984d0d550b4..e5374fe92ef6 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -2023,6 +2023,7 @@ struct bpf_prog_aux;
 extern int security_bpf(int cmd, union bpf_attr *attr, unsigned int size);
 extern int security_bpf_map(struct bpf_map *map, fmode_t fmode);
 extern int security_bpf_prog(struct bpf_prog *prog);
+extern int security_bpf_map_create(const union bpf_attr *attr);
 extern int security_bpf_map_alloc(struct bpf_map *map);
 extern void security_bpf_map_free(struct bpf_map *map);
 extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux);
@@ -2044,6 +2045,11 @@ static inline int security_bpf_prog(struct bpf_prog *prog)
 	return 0;
 }
 
+static inline int security_bpf_map_create(const union bpf_attr *attr)
+{
+	return 0;
+}
+
 static inline int security_bpf_map_alloc(struct bpf_map *map)
 {
 	return 0;
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index e14c822f8911..931d4dda5dac 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -260,6 +260,7 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 BTF_SET_START(sleepable_lsm_hooks)
 BTF_ID(func, bpf_lsm_bpf)
 BTF_ID(func, bpf_lsm_bpf_map)
+BTF_ID(func, bpf_lsm_bpf_map_create_security)
 BTF_ID(func, bpf_lsm_bpf_map_alloc_security)
 BTF_ID(func, bpf_lsm_bpf_map_free_security)
 BTF_ID(func, bpf_lsm_bpf_prog)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index cbea4999e92f..7d1165814efc 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -980,7 +980,7 @@ int map_check_no_btf(const struct bpf_map *map,
 }
 
 static int map_check_btf(struct bpf_map *map, const struct btf *btf,
-			 u32 btf_key_id, u32 btf_value_id)
+			 u32 btf_key_id, u32 btf_value_id, bool priv_checked)
 {
 	const struct btf_type *key_type, *value_type;
 	u32 key_size, value_size;
@@ -1008,7 +1008,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	if (!IS_ERR_OR_NULL(map->record)) {
 		int i;
 
-		if (!bpf_capable()) {
+		if (!priv_checked && !bpf_capable()) {
 			ret = -EPERM;
 			goto free_map_tab;
 		}
@@ -1097,10 +1097,12 @@ static int map_create(union bpf_attr *attr)
 	int numa_node = bpf_map_attr_numa_node(attr);
 	u32 map_type = attr->map_type;
 	struct btf_field_offs *foffs;
+	bool priv_checked = false;
 	struct bpf_map *map;
 	int f_flags;
 	int err;
 
+	/* sanity checks */
 	err = CHECK_ATTR(BPF_MAP_CREATE);
 	if (err)
 		return -EINVAL;
@@ -1145,6 +1147,15 @@ static int map_create(union bpf_attr *attr)
 	if (!ops->map_mem_usage)
 		return -EINVAL;
 
+	/* security checks */
+	err = security_bpf_map_create(attr);
+	if (err < 0)
+		return err;
+	if (err > 0) {
+		priv_checked = true;
+		goto skip_priv_checks;
+	}
+
 	/* Intent here is for unprivileged_bpf_disabled to block key object
 	 * creation commands for unprivileged users; other actions depend
 	 * of fd availability and access to bpffs, so are dependent on
@@ -1203,6 +1214,8 @@ static int map_create(union bpf_attr *attr)
 		return -EPERM;
 	}
 
+skip_priv_checks:
+	/* create and init map */
 	map = ops->map_alloc(attr);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
@@ -1243,7 +1256,7 @@ static int map_create(union bpf_attr *attr)
 
 		if (attr->btf_value_type_id) {
 			err = map_check_btf(map, btf, attr->btf_key_type_id,
-					    attr->btf_value_type_id);
+					    attr->btf_value_type_id, priv_checked);
 			if (err)
 				goto free_map;
 		}
diff --git a/security/security.c b/security/security.c
index cf6cc576736f..f9b885680966 100644
--- a/security/security.c
+++ b/security/security.c
@@ -2682,6 +2682,10 @@ int security_bpf_prog(struct bpf_prog *prog)
 {
 	return call_int_hook(bpf_prog, 0, prog);
 }
+int security_bpf_map_create(const union bpf_attr *attr)
+{
+	return call_int_hook(bpf_map_create_security, 0, attr);
+}
 int security_bpf_map_alloc(struct bpf_map *map)
 {
 	return call_int_hook(bpf_map_alloc_security, 0, map);
-- 
2.34.1


* [PATCH bpf-next 5/8] selftests/bpf: validate new bpf_map_create_security LSM hook
  2023-04-12  4:32 [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2023-04-12  4:32 ` [PATCH bpf-next 4/8] bpf, lsm: implement bpf_map_create_security LSM hook Andrii Nakryiko
@ 2023-04-12  4:32 ` Andrii Nakryiko
  2023-04-12 18:23   ` Kees Cook
  2023-04-12  4:32 ` [PATCH bpf-next 6/8] bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command Andrii Nakryiko
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-12  4:32 UTC (permalink / raw)
  To: bpf, ast, daniel, kpsingh, keescook, paul
  Cc: linux-security-module, Andrii Nakryiko

Add a selftest that goes over every known map type and validates that
a combination of privileged/unprivileged modes and allow/reject/pass-through
LSM policy decisions behave as expected.
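
The matrix exercised by the test can be summarized with a small userspace
model. This is only a sketch: expected_probe_result() and enum lsm_decision
are hypothetical names that exist neither in the kernel nor in the selftest.
It also assumes the unprivileged_bpf_disabled sysctl permits unprivileged map
creation, and it ignores maps that additionally need custom BTF.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the 'decision' variable of the BPF LSM program. */
enum lsm_decision { LSM_REJECT = -1, LSM_PASS = 0, LSM_ALLOW = 1 };

/*
 * Returns 1 if map creation is expected to succeed and 0 if it is expected
 * to fail, mirroring the values the test compares against.
 */
static int expected_probe_result(bool has_caps, bool map_needs_caps,
				 enum lsm_decision decision)
{
	if (decision < 0)
		return 0;	/* rejection wins, even with all capabilities */
	if (decision > 0)
		return 1;	/* allow skips the capability checks */
	/* pass-through: the usual capability checks decide */
	return (has_caps || !map_needs_caps) ? 1 : 0;
}
```

For instance, an unprivileged caller probing a capability-gated map type is
expected to fail under pass-through, succeed under allow, and fail under
reject.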

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/prog_tests/lsm_map_create.c | 143 ++++++++++++++++++
 .../selftests/bpf/progs/lsm_map_create.c      |  32 ++++
 tools/testing/selftests/bpf/test_progs.h      |   6 +
 3 files changed, 181 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/lsm_map_create.c
 create mode 100644 tools/testing/selftests/bpf/progs/lsm_map_create.c

diff --git a/tools/testing/selftests/bpf/prog_tests/lsm_map_create.c b/tools/testing/selftests/bpf/prog_tests/lsm_map_create.c
new file mode 100644
index 000000000000..fee78b0448c3
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/lsm_map_create.c
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+#include "linux/bpf.h"
+#include <test_progs.h>
+#include <bpf/btf.h>
+#include "cap_helpers.h"
+#include "lsm_map_create.skel.h"
+
+static int drop_priv_caps(__u64 *old_caps)
+{
+	return cap_disable_effective((1ULL << CAP_BPF) |
+				     (1ULL << CAP_PERFMON) |
+				     (1ULL << CAP_NET_ADMIN) |
+				     (1ULL << CAP_SYS_ADMIN), old_caps);
+}
+
+static int restore_priv_caps(__u64 old_caps)
+{
+	return cap_enable_effective(old_caps, NULL);
+}
+
+void test_lsm_map_create(void)
+{
+	struct btf *btf = NULL;
+	struct lsm_map_create *skel = NULL;
+	const struct btf_type *t;
+	const struct btf_enum *e;
+	int i, n, id, err, ret;
+
+	skel = lsm_map_create__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+		return;
+
+	skel->bss->my_tid = syscall(SYS_gettid);
+	skel->bss->decision = 0;
+
+	err = lsm_map_create__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto cleanup;
+
+	btf = btf__parse("/sys/kernel/btf/vmlinux", NULL);
+	if (!ASSERT_OK_PTR(btf, "btf_parse"))
+		goto cleanup;
+
+	/* find enum bpf_map_type and enumerate each value */
+	id = btf__find_by_name_kind(btf, "bpf_map_type", BTF_KIND_ENUM);
+	if (!ASSERT_GT(id, 0, "bpf_map_type_id"))
+		goto cleanup;
+
+	t = btf__type_by_id(btf, id);
+	e = btf_enum(t);
+	n = btf_vlen(t);
+	for (i = 0; i < n; e++, i++) {
+		enum bpf_map_type map_type = (enum bpf_map_type)e->val;
+		const char *map_type_name;
+		__u64 orig_caps;
+		bool is_map_priv;
+		bool needs_btf;
+
+		if (map_type == BPF_MAP_TYPE_UNSPEC)
+			continue;
+
+		/* this will show which map type we are working with in verbose log */
+		map_type_name = btf__str_by_offset(btf, e->name_off);
+		ASSERT_OK_PTR(map_type_name, map_type_name);
+
+		switch (map_type) {
+		case BPF_MAP_TYPE_ARRAY:
+		case BPF_MAP_TYPE_PERCPU_ARRAY:
+		case BPF_MAP_TYPE_PROG_ARRAY:
+		case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
+		case BPF_MAP_TYPE_CGROUP_ARRAY:
+		case BPF_MAP_TYPE_ARRAY_OF_MAPS:
+		case BPF_MAP_TYPE_HASH:
+		case BPF_MAP_TYPE_PERCPU_HASH:
+		case BPF_MAP_TYPE_HASH_OF_MAPS:
+		case BPF_MAP_TYPE_RINGBUF:
+		case BPF_MAP_TYPE_USER_RINGBUF:
+		case BPF_MAP_TYPE_CGROUP_STORAGE:
+		case BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE:
+			is_map_priv = false;
+			needs_btf = false;
+			break;
+		case BPF_MAP_TYPE_SK_STORAGE:
+		case BPF_MAP_TYPE_INODE_STORAGE:
+		case BPF_MAP_TYPE_TASK_STORAGE:
+		case BPF_MAP_TYPE_CGRP_STORAGE:
+			is_map_priv = true;
+			needs_btf = true;
+			break;
+		default:
+			is_map_priv = true;
+			needs_btf = false;
+		}
+
+		/* make sure we delegate to kernel for final decision */
+		skel->bss->decision = 0;
+
+		/* we are normally under sudo, so all maps should succeed */
+		ret = libbpf_probe_bpf_map_type(map_type, NULL);
+		ASSERT_EQ(ret, 1, "default_priv_mode");
+
+		/* local storage needs custom BTF to be loaded, which we
+		 * currently can't do once we drop privileges, so skip few
+		 * checks for such maps
+		 */
+		if (needs_btf)
+			goto skip_if_needs_btf;
+
+		/* now let's drop privileges, and check that unpriv maps are
+		 * still possible to create
+		 */
+		if (!ASSERT_OK(drop_priv_caps(&orig_caps), "drop_caps"))
+			goto cleanup;
+
+		ret = libbpf_probe_bpf_map_type(map_type, NULL);
+		ASSERT_EQ(ret, is_map_priv ? 0 : 1,  "default_unpriv_mode");
+
+		/* allow any map creation for our thread */
+		skel->bss->decision = 1;
+		ret = libbpf_probe_bpf_map_type(map_type, NULL);
+		ASSERT_EQ(ret, 1, "lsm_allow_unpriv_mode");
+
+		/* reject any map creation for our thread */
+		skel->bss->decision = -1;
+		ret = libbpf_probe_bpf_map_type(map_type, NULL);
+		ASSERT_EQ(ret, 0, "lsm_reject_unpriv_mode");
+
+		/* restore privileges, but keep reject LSM policy */
+		if (!ASSERT_OK(restore_priv_caps(orig_caps), "restore_caps"))
+			goto cleanup;
+
+skip_if_needs_btf:
+		/* even with all caps map create will fail */
+		skel->bss->decision = -1;
+		ret = libbpf_probe_bpf_map_type(map_type, NULL);
+		ASSERT_EQ(ret, 0, "lsm_reject_priv_mode");
+	}
+
+cleanup:
+	btf__free(btf);
+	lsm_map_create__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/lsm_map_create.c b/tools/testing/selftests/bpf/progs/lsm_map_create.c
new file mode 100644
index 000000000000..093825c68459
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/lsm_map_create.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <errno.h>
+
+char _license[] SEC("license") = "GPL";
+
+int my_tid;
+/* LSM enforcement:
+ *   - 0, delegate to kernel;
+ *   - 1, allow;
+ *   - -1, reject.
+ */
+int decision;
+
+SEC("lsm/bpf_map_create_security")
+int BPF_PROG(allow_unpriv_maps, union bpf_attr *attr)
+{
+	if (!my_tid || (u32)bpf_get_current_pid_tgid() != my_tid)
+		return 0; /* keep processing LSM hooks */
+
+	if (decision == 0)
+		return 0;
+
+	if (decision > 0)
+		return 1; /* allow */
+
+	return -EPERM;
+}
diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
index 10ba43250668..12f9c6652d40 100644
--- a/tools/testing/selftests/bpf/test_progs.h
+++ b/tools/testing/selftests/bpf/test_progs.h
@@ -23,6 +23,7 @@ typedef __u16 __sum16;
 #include <linux/perf_event.h>
 #include <linux/socket.h>
 #include <linux/unistd.h>
+#include <sys/syscall.h>
 
 #include <sys/ioctl.h>
 #include <sys/wait.h>
@@ -176,6 +177,11 @@ void test__skip(void);
 void test__fail(void);
 int test__join_cgroup(const char *path);
 
+static inline int gettid(void)
+{
+	return syscall(SYS_gettid);
+}
+
 #define PRINT_FAIL(format...)                                                  \
 	({                                                                     \
 		test__fail();                                                  \
-- 
2.34.1


* [PATCH bpf-next 6/8] bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command
  2023-04-12  4:32 [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2023-04-12  4:32 ` [PATCH bpf-next 5/8] selftests/bpf: validate new " Andrii Nakryiko
@ 2023-04-12  4:32 ` Andrii Nakryiko
  2023-04-12 18:24   ` Kees Cook
  2023-04-12  4:32 ` [PATCH bpf-next 7/8] bpf, lsm: implement bpf_btf_load_security LSM hook Andrii Nakryiko
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-12  4:32 UTC (permalink / raw)
  To: bpf, ast, daniel, kpsingh, keescook, paul
  Cc: linux-security-module, Andrii Nakryiko

It seems the extra bpf_capable() check in the BPF_MAP_FREEZE handler was
unintentionally left behind when we switched to a model in which all BPF map
operations are allowed regardless of CAP_BPF (or any other capability),
as long as the process got hold of a BPF map FD somehow.

This patch replaces the bpf_capable() check in the BPF_MAP_FREEZE handler
with a write access check, given that freezing a map conceptually modifies
it: the map becomes unmodifiable for subsequent updates.
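
As a rough userspace illustration of the new rule (not the kernel code), the
permission to freeze now depends only on whether the FD grants write access.
The MODEL_FMODE_* constants and model_map_freeze_check() are hypothetical,
and the flag values are placeholders rather than the kernel's FMODE_* values.

```c
#include <errno.h>

/* Placeholder permission bits; the kernel's real FMODE_* values differ. */
#define MODEL_FMODE_CAN_READ	0x1u
#define MODEL_FMODE_CAN_WRITE	0x2u

/*
 * Freezing counts as a modification, so the caller needs write access
 * through the map FD. Capabilities are no longer consulted at all.
 */
static int model_map_freeze_check(unsigned int fd_perms)
{
	if (!(fd_perms & MODEL_FMODE_CAN_WRITE))
		return -EPERM;
	return 0;
}
```

Under this model, a read-only FD yields -EPERM regardless of the caller's
capabilities, while a writable FD lets the freeze proceed.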

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/syscall.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7d1165814efc..42d8473237ab 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2001,6 +2001,11 @@ static int map_freeze(const union bpf_attr *attr)
 		return -ENOTSUPP;
 	}
 
+	if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
+		err = -EPERM;
+		goto err_put;
+	}
+
 	mutex_lock(&map->freeze_mutex);
 	if (bpf_map_write_active(map)) {
 		err = -EBUSY;
@@ -2010,10 +2015,6 @@ static int map_freeze(const union bpf_attr *attr)
 		err = -EBUSY;
 		goto err_put;
 	}
-	if (!bpf_capable()) {
-		err = -EPERM;
-		goto err_put;
-	}
 
 	WRITE_ONCE(map->frozen, true);
 err_put:
-- 
2.34.1


* [PATCH bpf-next 7/8] bpf, lsm: implement bpf_btf_load_security LSM hook
  2023-04-12  4:32 [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks Andrii Nakryiko
                   ` (5 preceding siblings ...)
  2023-04-12  4:32 ` [PATCH bpf-next 6/8] bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command Andrii Nakryiko
@ 2023-04-12  4:32 ` Andrii Nakryiko
  2023-04-12 16:52   ` Paul Moore
  2023-04-12  4:33 ` [PATCH bpf-next 8/8] selftests/bpf: enhance lsm_map_create test with BTF LSM control Andrii Nakryiko
  2023-04-12 16:49 ` [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks Paul Moore
  8 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-12  4:32 UTC (permalink / raw)
  To: bpf, ast, daniel, kpsingh, keescook, paul
  Cc: linux-security-module, Andrii Nakryiko

Add a new LSM hook, bpf_btf_load_security, that allows custom LSM security
policies to control BTF data loading permissions (BPF_BTF_LOAD command
of bpf() syscall) granularly and precisely.

This complements the bpf_map_create_security LSM hook added earlier and
follows the same semantics: 0 means perform the standard kernel
capabilities-based checks, a negative error rejects the BTF object load,
while a positive value skips the CAP_BPF check and allows BTF data object
creation.
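
The three-way convention described above can be sketched as a plain userspace
function. This is a model under stated assumptions, not the kernel
implementation: model_btf_load_gate() is a hypothetical name, hook_ret stands
in for the hook's return value, and has_cap_bpf stands in for the result of
bpf_capable().

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Returns 0 if BTF object creation may proceed, or a negative errno.
 *   hook_ret < 0  : policy rejected the load, propagate its error
 *   hook_ret == 0 : no policy opinion, fall back to the CAP_BPF check
 *   hook_ret > 0  : policy allowed the load, skip the CAP_BPF check
 */
static int model_btf_load_gate(int hook_ret, bool has_cap_bpf)
{
	if (hook_ret < 0)
		return hook_ret;
	if (hook_ret == 0 && !has_cap_bpf)
		return -EPERM;
	return 0;
}
```

Note that a negative return is passed through unchanged, so a policy may
report an error code other than -EPERM.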

With this hook, together with bpf_map_create_security, we can now also allow
a trusted unprivileged process to create BPF maps that require BTF, which
we take advantage of in the next patch to improve the coverage of the added
BPF selftest.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/lsm_hook_defs.h |  1 +
 include/linux/lsm_hooks.h     | 13 +++++++++++++
 include/linux/security.h      |  6 ++++++
 kernel/bpf/bpf_lsm.c          |  1 +
 kernel/bpf/syscall.c          | 10 ++++++++++
 security/security.c           |  4 ++++
 6 files changed, 35 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index b4fe9ed7021a..92cb0f95b970 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -396,6 +396,7 @@ LSM_HOOK(void, LSM_RET_VOID, audit_rule_free, void *lsmrule)
 LSM_HOOK(int, 0, bpf, int cmd, union bpf_attr *attr, unsigned int size)
 LSM_HOOK(int, 0, bpf_map, struct bpf_map *map, fmode_t fmode)
 LSM_HOOK(int, 0, bpf_prog, struct bpf_prog *prog)
+LSM_HOOK(int, 0, bpf_btf_load_security, const union bpf_attr *attr)
 LSM_HOOK(int, 0, bpf_map_create_security, const union bpf_attr *attr)
 LSM_HOOK(int, 0, bpf_map_alloc_security, struct bpf_map *map)
 LSM_HOOK(void, LSM_RET_VOID, bpf_map_free_security, struct bpf_map *map)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 42bf7c0aa4d8..cde96b5e15e2 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1598,6 +1598,19 @@
  *	@prog: bpf prog that userspace want to use.
  *	Return 0 if permission is granted.
  *
+ * @bpf_btf_load_security:
+ *	Do a check to determine permission to create BTF data object
+ *	(BPF_BTF_LOAD command of bpf() syscall).
+ *	Implementation can override kernel capabilities checks according to
+ *	the rules below:
+ *	  - 0 should be returned to delegate permission checks to other
+ *	    installed LSM callbacks and/or hard-wired kernel logic, which
+ *	    would enforce CAP_BPF capability;
+ *	  - reject BTF data object creation by returning -EPERM or any other
+ *	    negative error code;
+ *	  - allow BTF data object creation, overriding kernel checks, by
+ *	    returning a positive result.
+ *
  * @bpf_map_create_security:
  *	Do a check to determine permission to create requested BPF map.
  *	Implementation can override kernel capabilities checks according to
diff --git a/include/linux/security.h b/include/linux/security.h
index e5374fe92ef6..f3ee1800392d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -2023,6 +2023,7 @@ struct bpf_prog_aux;
 extern int security_bpf(int cmd, union bpf_attr *attr, unsigned int size);
 extern int security_bpf_map(struct bpf_map *map, fmode_t fmode);
 extern int security_bpf_prog(struct bpf_prog *prog);
+extern int security_bpf_btf_load(const union bpf_attr *attr);
 extern int security_bpf_map_create(const union bpf_attr *attr);
 extern int security_bpf_map_alloc(struct bpf_map *map);
 extern void security_bpf_map_free(struct bpf_map *map);
@@ -2045,6 +2046,11 @@ static inline int security_bpf_prog(struct bpf_prog *prog)
 	return 0;
 }
 
+static inline int security_bpf_btf_load(const union bpf_attr *attr)
+{
+	return 0;
+}
+
 static inline int security_bpf_map_create(const union bpf_attr *attr)
 {
 	return 0;
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 931d4dda5dac..53c39a18fd2c 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -260,6 +260,7 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 BTF_SET_START(sleepable_lsm_hooks)
 BTF_ID(func, bpf_lsm_bpf)
 BTF_ID(func, bpf_lsm_bpf_map)
+BTF_ID(func, bpf_lsm_bpf_btf_load_security)
 BTF_ID(func, bpf_lsm_bpf_map_create_security)
 BTF_ID(func, bpf_lsm_bpf_map_alloc_security)
 BTF_ID(func, bpf_lsm_bpf_map_free_security)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 42d8473237ab..bbf70bddc770 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4449,12 +4449,22 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
 
 static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
 {
+	int err;
+
 	if (CHECK_ATTR(BPF_BTF_LOAD))
 		return -EINVAL;
 
+	/* security checks */
+	err = security_bpf_btf_load(attr);
+	if (err < 0)
+		return err;
+	if (err > 0)
+		goto skip_priv_checks;
+
 	if (!bpf_capable())
 		return -EPERM;
 
+skip_priv_checks:
 	return btf_new_fd(attr, uattr, uattr_size);
 }
 
diff --git a/security/security.c b/security/security.c
index f9b885680966..8869802ef5f5 100644
--- a/security/security.c
+++ b/security/security.c
@@ -2682,6 +2682,10 @@ int security_bpf_prog(struct bpf_prog *prog)
 {
 	return call_int_hook(bpf_prog, 0, prog);
 }
+int security_bpf_btf_load(const union bpf_attr *attr)
+{
+	return call_int_hook(bpf_btf_load_security, 0, attr);
+}
 int security_bpf_map_create(const union bpf_attr *attr)
 {
 	return call_int_hook(bpf_map_create_security, 0, attr);
-- 
2.34.1


* [PATCH bpf-next 8/8] selftests/bpf: enhance lsm_map_create test with BTF LSM control
  2023-04-12  4:32 [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks Andrii Nakryiko
                   ` (6 preceding siblings ...)
  2023-04-12  4:32 ` [PATCH bpf-next 7/8] bpf, lsm: implement bpf_btf_load_security LSM hook Andrii Nakryiko
@ 2023-04-12  4:33 ` Andrii Nakryiko
  2023-04-12 16:49 ` [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks Paul Moore
  8 siblings, 0 replies; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-12  4:33 UTC (permalink / raw)
  To: bpf, ast, daniel, kpsingh, keescook, paul
  Cc: linux-security-module, Andrii Nakryiko

Adjust and augment the lsm_map_create selftest with the bpf_btf_load_security
LSM hook and validate that creation of BPF maps that require custom BTF
succeeds even without privileges, as long as the LSM policy allows both
BPF map and BTF object creation.

Further, add another subtest that uses libbpf's BPF skeleton to create
a bunch of maps declaratively. We also add a read-only global variable to
validate that the BPF_MAP_FREEZE command follows LSM policy as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/prog_tests/lsm_map_create.c | 89 ++++++++++++++++---
 tools/testing/selftests/bpf/progs/just_maps.c | 56 ++++++++++++
 .../selftests/bpf/progs/lsm_map_create.c      | 15 ++++
 3 files changed, 148 insertions(+), 12 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/just_maps.c

diff --git a/tools/testing/selftests/bpf/prog_tests/lsm_map_create.c b/tools/testing/selftests/bpf/prog_tests/lsm_map_create.c
index fee78b0448c3..497268d6febd 100644
--- a/tools/testing/selftests/bpf/prog_tests/lsm_map_create.c
+++ b/tools/testing/selftests/bpf/prog_tests/lsm_map_create.c
@@ -5,6 +5,7 @@
 #include <bpf/btf.h>
 #include "cap_helpers.h"
 #include "lsm_map_create.skel.h"
+#include "just_maps.skel.h"
 
 static int drop_priv_caps(__u64 *old_caps)
 {
@@ -19,7 +20,7 @@ static int restore_priv_caps(__u64 old_caps)
 	return cap_enable_effective(old_caps, NULL);
 }
 
-void test_lsm_map_create(void)
+static void subtest_map_create_probes(void)
 {
 	struct btf *btf = NULL;
 	struct lsm_map_create *skel = NULL;
@@ -59,6 +60,7 @@ void test_lsm_map_create(void)
 
 		if (map_type == BPF_MAP_TYPE_UNSPEC)
 			continue;
+		map_type = BPF_MAP_TYPE_SK_STORAGE;
 
 		/* this will show which map type we are working with in verbose log */
 		map_type_name = btf__str_by_offset(btf, e->name_off);
@@ -100,13 +102,6 @@ void test_lsm_map_create(void)
 		ret = libbpf_probe_bpf_map_type(map_type, NULL);
 		ASSERT_EQ(ret, 1, "default_priv_mode");
 
-		/* local storage needs custom BTF to be loaded, which we
-		 * currently can't do once we drop privileges, so skip few
-		 * checks for such maps
-		 */
-		if (needs_btf)
-			goto skip_if_needs_btf;
-
 		/* now let's drop privileges, and check that unpriv maps are
 		 * still possible to create
 		 */
@@ -114,7 +109,11 @@ void test_lsm_map_create(void)
 			goto cleanup;
 
 		ret = libbpf_probe_bpf_map_type(map_type, NULL);
-		ASSERT_EQ(ret, is_map_priv ? 0 : 1,  "default_unpriv_mode");
+		/* maps that require custom BTF will fail with -EPERM */
+		if (needs_btf)
+			ASSERT_EQ(ret, -EPERM, "default_unpriv_mode");
+		else
+			ASSERT_EQ(ret, is_map_priv ? 0 : 1,  "default_unpriv_mode");
 
 		/* allow any map creation for our thread */
 		skel->bss->decision = 1;
@@ -124,20 +123,86 @@ void test_lsm_map_create(void)
 		/* reject any map creation for our thread */
 		skel->bss->decision = -1;
 		ret = libbpf_probe_bpf_map_type(map_type, NULL);
-		ASSERT_EQ(ret, 0, "lsm_reject_unpriv_mode");
+		/* maps that require custom BTF will fail with -EPERM */
+		if (needs_btf)
+			ASSERT_EQ(ret, -EPERM, "lsm_reject_unpriv_mode");
+		else
+			ASSERT_EQ(ret, 0, "lsm_reject_unpriv_mode");
 
 		/* restore privileges, but keep reject LSM policy */
 		if (!ASSERT_OK(restore_priv_caps(orig_caps), "restore_caps"))
 			goto cleanup;
 
-skip_if_needs_btf:
 		/* even with all caps map create will fail */
 		skel->bss->decision = -1;
 		ret = libbpf_probe_bpf_map_type(map_type, NULL);
-		ASSERT_EQ(ret, 0, "lsm_reject_priv_mode");
+		if (needs_btf)
+			ASSERT_EQ(ret, -EPERM, "lsm_reject_priv_mode");
+		else
+			ASSERT_EQ(ret, 0, "lsm_reject_priv_mode");
 	}
 
 cleanup:
 	btf__free(btf);
 	lsm_map_create__destroy(skel);
 }
+
+static void subtest_map_create_obj(void)
+{
+	struct lsm_map_create *skel = NULL;
+	struct just_maps *maps_skel = NULL;
+	struct bpf_map_info map_info;
+	__u32 map_info_sz = sizeof(map_info);
+	__u64 orig_caps;
+	int err, map_fd;
+
+	skel = lsm_map_create__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+		return;
+
+	skel->bss->my_tid = syscall(SYS_gettid);
+	skel->bss->decision = 0;
+
+	err = lsm_map_create__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto cleanup;
+
+	/* now let's drop privileges, and check that unpriv maps are
+	 * still possible to create and they do have BTF associated with them
+	 */
+	if (!ASSERT_OK(drop_priv_caps(&orig_caps), "drop_caps"))
+		goto cleanup;
+
+	/* allow unprivileged BPF map and BTF obj creation */
+	skel->bss->decision = 1;
+
+	maps_skel = just_maps__open_and_load();
+	if (!ASSERT_OK_PTR(maps_skel, "maps_skel_open_and_load"))
+		goto restore_caps;
+
+	ASSERT_GT(bpf_object__btf_fd(maps_skel->obj), 0, "maps_btf_fd");
+
+	/* check that the SK_STORAGE map has BTF info */
+	map_fd = bpf_map__fd(maps_skel->maps.sk_msg_netns_cookies);
+	memset(&map_info, 0, map_info_sz);
+	err = bpf_map_get_info_by_fd(map_fd, &map_info, &map_info_sz);
+	ASSERT_OK(err, "get_map_info_by_fd");
+
+	ASSERT_GT(map_info.btf_id, 0, "map_btf_id");
+	ASSERT_GT(map_info.btf_key_type_id, 0, "map_btf_key_type_id");
+	ASSERT_GT(map_info.btf_value_type_id, 0, "map_btf_value_type_id");
+
+restore_caps:
+	ASSERT_OK(restore_priv_caps(orig_caps), "restore_caps");
+cleanup:
+	just_maps__destroy(maps_skel);
+	lsm_map_create__destroy(skel);
+}
+
+void test_lsm_map_create(void)
+{
+	if (test__start_subtest("map_create_probes"))
+		subtest_map_create_probes();
+	if (test__start_subtest("map_create_obj"))
+		subtest_map_create_obj();
+}
diff --git a/tools/testing/selftests/bpf/progs/just_maps.c b/tools/testing/selftests/bpf/progs/just_maps.c
new file mode 100644
index 000000000000..9073a51da705
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/just_maps.c
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+struct array_map {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, int);
+	__type(value, int);
+} array SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
+	__uint(max_entries, 1);
+	__type(key, int);
+	__type(value, int);
+	__array(values, struct array_map);
+} outer_arr SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH_OF_MAPS);
+	__uint(max_entries, 5);
+	__type(key, int);
+	__array(values, struct array_map);
+} outer_hash SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY);
+	__uint(max_entries, 1);
+	__type(key, int);
+	__type(value, int);
+} sockarr SEC(".maps");
+
+struct hmap_elem {
+	volatile int cnt;
+	struct bpf_spin_lock lock;
+	int test_padding;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 1);
+	__type(key, int);
+	__type(value, struct hmap_elem);
+} hmap SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, int);
+} sk_msg_netns_cookies SEC(".maps");
+
+/* .rodata to test BPF_MAP_FREEZE as well */
+const volatile int some_read_only_variable = 123;
diff --git a/tools/testing/selftests/bpf/progs/lsm_map_create.c b/tools/testing/selftests/bpf/progs/lsm_map_create.c
index 093825c68459..f3c8465c1ed0 100644
--- a/tools/testing/selftests/bpf/progs/lsm_map_create.c
+++ b/tools/testing/selftests/bpf/progs/lsm_map_create.c
@@ -30,3 +30,18 @@ int BPF_PROG(allow_unpriv_maps, union bpf_attr *attr)
 
 	return -EPERM;
 }
+
+SEC("lsm/bpf_btf_load_security")
+int BPF_PROG(allow_unpriv_btf, union bpf_attr *attr)
+{
+	if (!my_tid || (u32)bpf_get_current_pid_tgid() != my_tid)
+		return 0; /* keep processing LSM hooks */
+
+	if (decision == 0)
+		return 0;
+
+	if (decision > 0)
+		return 1; /* allow */
+
+	return -EPERM;
+}
-- 
2.34.1


* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-12  4:32 [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks Andrii Nakryiko
                   ` (7 preceding siblings ...)
  2023-04-12  4:33 ` [PATCH bpf-next 8/8] selftests/bpf: enhance lsm_map_create test with BTF LSM control Andrii Nakryiko
@ 2023-04-12 16:49 ` Paul Moore
  2023-04-12 17:47   ` Kees Cook
  8 siblings, 1 reply; 52+ messages in thread
From: Paul Moore @ 2023-04-12 16:49 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, kpsingh, keescook, linux-security-module

On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> are meant to allow highly-granular LSM-based control over the usage of BPF
> subsytem. Specifically, to control the creation of BPF maps and BTF data
> objects, which are fundamental building blocks of any modern BPF application.
>
> These new hooks are able to override default kernel-side CAP_BPF-based (and
> sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> implement LSM policies that could granularly enforce more restrictions on
> a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> capabilities), but also, importantly, allow to *bypass kernel-side
> enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> cases.

One of the hallmarks of the LSM has always been that it is
non-authoritative: it cannot unilaterally grant access, it can only
restrict what would have been otherwise permitted on a traditional
Linux system.  Put another way, a LSM should not undermine the Linux
discretionary access controls, e.g. capabilities.

If there is a problem with the eBPF capability-based access controls,
that problem needs to be addressed in how the core eBPF code
implements its capability checks, not by modifying the LSM mechanism
to bypass these checks.
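
To make the distinction concrete, here is a small userspace sketch contrasting
the two models. Both function names are hypothetical, and lsm_ret follows the
convention proposed in this series (negative rejects, 0 means no opinion,
positive allows); in the restrictive model a positive value is simply treated
as "no veto".

```c
#include <stdbool.h>

/* Restrictive-only model: the LSM can veto, but it can never grant. */
static bool allowed_restrictive(bool caps_ok, int lsm_ret)
{
	return caps_ok && lsm_ret >= 0;
}

/* Override model from the cover letter: a positive result skips caps. */
static bool allowed_with_override(bool caps_ok, int lsm_ret)
{
	if (lsm_ret < 0)
		return false;
	return lsm_ret > 0 || caps_ok;
}
```

The two functions disagree in exactly one case: the capability check fails
and the LSM returns a positive value. That case is what the objection above
is about.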

-- 
paul-moore.com

* Re: [PATCH bpf-next 7/8] bpf, lsm: implement bpf_btf_load_security LSM hook
  2023-04-12  4:32 ` [PATCH bpf-next 7/8] bpf, lsm: implement bpf_btf_load_security LSM hook Andrii Nakryiko
@ 2023-04-12 16:52   ` Paul Moore
  2023-04-13  1:43     ` Andrii Nakryiko
  0 siblings, 1 reply; 52+ messages in thread
From: Paul Moore @ 2023-04-12 16:52 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, kpsingh, keescook, linux-security-module

On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> Add new LSM hook, bpf_btf_load_security, that allows custom LSM security
> policies controlling BTF data loading permissions (BPF_BTF_LOAD command
> of bpf() syscall) granularly and precisely.
>
> This complements bpf_map_create_security LSM hook added earlier and
> follow the same semantics: 0 means perform standard kernel capabilities-based
> checks, negative error rejects BTF object load, while positive one skips
> CAP_BPF check and allows BTF data object creation.
>
> With this hook, together with bpf_map_create_security, we now can also allow
> trusted unprivileged process to create BPF maps that require BTF, which
> we take advantaged in the next patch to improve the coverage of added
> BPF selftest.
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  include/linux/lsm_hook_defs.h |  1 +
>  include/linux/lsm_hooks.h     | 13 +++++++++++++
>  include/linux/security.h      |  6 ++++++
>  kernel/bpf/bpf_lsm.c          |  1 +
>  kernel/bpf/syscall.c          | 10 ++++++++++
>  security/security.c           |  4 ++++
>  6 files changed, 35 insertions(+)

...

> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 42d8473237ab..bbf70bddc770 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -4449,12 +4449,22 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
>
>  static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
>  {
> +       int err;
> +
>         if (CHECK_ATTR(BPF_BTF_LOAD))
>                 return -EINVAL;
>
> +       /* security checks */
> +       err = security_bpf_btf_load(attr);
> +       if (err < 0)
> +               return err;
> +       if (err > 0)
> +               goto skip_priv_checks;
> +
>         if (!bpf_capable())
>                 return -EPERM;
>
> +skip_priv_checks:
>         return btf_new_fd(attr, uattr, uattr_size);
>  }

Beyond the objection I brought up in the patchset cover letter, I
believe the work of the security_bpf_btf_load() hook presented here
could be done by the existing security_bpf() LSM hook.  If you believe
that not to be the case, please let me know.

-- 
paul-moore.com

* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-12 16:49 ` [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks Paul Moore
@ 2023-04-12 17:47   ` Kees Cook
  2023-04-12 18:06     ` Paul Moore
  2023-04-14 20:23     ` Dr. Greg
  0 siblings, 2 replies; 52+ messages in thread
From: Kees Cook @ 2023-04-12 17:47 UTC (permalink / raw)
  To: Paul Moore
  Cc: Andrii Nakryiko, bpf, ast, daniel, kpsingh, linux-security-module

On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> > are meant to allow highly-granular LSM-based control over the usage of BPF
> > subsytem. Specifically, to control the creation of BPF maps and BTF data
> > objects, which are fundamental building blocks of any modern BPF application.
> >
> > These new hooks are able to override default kernel-side CAP_BPF-based (and
> > sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> > implement LSM policies that could granularly enforce more restrictions on
> > a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> > capabilities), but also, importantly, allow to *bypass kernel-side
> > enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> > cases.
> 
> One of the hallmarks of the LSM has always been that it is
> non-authoritative: it cannot unilaterally grant access, it can only
> restrict what would have been otherwise permitted on a traditional
> Linux system.  Put another way, a LSM should not undermine the Linux
> discretionary access controls, e.g. capabilities.
> 
> If there is a problem with the eBPF capability-based access controls,
> that problem needs to be addressed in how the core eBPF code
> implements its capability checks, not by modifying the LSM mechanism
> to bypass these checks.

I think semantics matter here. I wouldn't view this as _bypassing_
capability enforcement: it's just more fine-grained access control.

For example, in many places we have things like:

	if (!some_check(...) && !capable(...))
		return -EPERM;

I would expect this to be similar logic. An operation can succeed if the
access control requirement is met. The mismatch we have throughout the
kernel is that capability checks aren't strictly done by LSM hooks. And
this series conceptually, I think, doesn't violate that -- it's changing
the logic of the capability checks, not the LSM (i.e. there are no LSM
hooks yet here).

The reason CAP_BPF was created was because there was nothing else that
would be fine-grained enough at the time.

-- 
Kees Cook


* Re: [PATCH bpf-next 1/8] bpf: move unprivileged checks into map_create() and bpf_prog_load()
  2023-04-12  4:32 ` [PATCH bpf-next 1/8] bpf: move unprivileged checks into map_create() and bpf_prog_load() Andrii Nakryiko
@ 2023-04-12 17:49   ` Kees Cook
  2023-04-13  0:22     ` Andrii Nakryiko
  0 siblings, 1 reply; 52+ messages in thread
From: Kees Cook @ 2023-04-12 17:49 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, kpsingh, paul, linux-security-module

On Tue, Apr 11, 2023 at 09:32:53PM -0700, Andrii Nakryiko wrote:
> Make each bpf() syscall command a bit more self-contained, making it
> easier to further enhance it. We move sysctl_unprivileged_bpf_disabled
> handling down to map_create() and bpf_prog_load(), two special commands
> in this regard.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  kernel/bpf/syscall.c | 37 ++++++++++++++++++++++---------------
>  1 file changed, 22 insertions(+), 15 deletions(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 6d575505f89c..c1d268025985 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1130,6 +1130,17 @@ static int map_create(union bpf_attr *attr)
>  	int f_flags;
>  	int err;
>  
> +	/* Intent here is for unprivileged_bpf_disabled to block key object
> +	 * creation commands for unprivileged users; other actions depend
> +	 * of fd availability and access to bpffs, so are dependent on
> +	 * object creation success.  Capabilities are later verified for
> +	 * operations such as load and map create, so even with unprivileged
> +	 * BPF disabled, capability checks are still carried out for these
> +	 * and other operations.
> +	 */
> +	if (!bpf_capable() && sysctl_unprivileged_bpf_disabled)
> +		return -EPERM;

This appears to be a problem in the original code, but capability checks
should be last, so that audit doesn't see a capability as having been
used when it wasn't. i.e. if bpf_capable() passes, but
sysctl_unprivileged_bpf_disabled isn't true, it'll look like a
capability got used, and the flag gets set. Not a big deal at the end of
the day, but the preferred ordering should be:

	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
		...
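
To make the ordering point concrete, here is a minimal userspace sketch.
Everything in it is a hypothetical stand-in: bpf_capable() is modelled as
a function that bumps a counter (in place of the audit-visible side
effect of a real capability check), and the two globals replace the
sysctl and the task's capability set. It only illustrates how
short-circuit evaluation decides whether the capability is consulted.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel symbols discussed above. */
static bool sysctl_unprivileged_bpf_disabled;
static bool task_has_cap_bpf;
static int cap_checks_recorded;	/* models the audit-visible side effect */

static bool bpf_capable(void)
{
	cap_checks_recorded++;
	return task_has_cap_bpf;
}

/* Ordering as posted: the capability is consulted on every call. */
static int check_cap_first(void)
{
	if (!bpf_capable() && sysctl_unprivileged_bpf_disabled)
		return -EPERM;
	return 0;
}

/* Suggested ordering: the capability is consulted only when the
 * sysctl actually requires it.
 */
static int check_sysctl_first(void)
{
	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
		return -EPERM;
	return 0;
}
```

Both functions return the same result for every input; they differ only
in whether bpf_capable() runs when the sysctl is off.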

> +
>  	err = CHECK_ATTR(BPF_MAP_CREATE);
>  	if (err)
>  		return -EINVAL;
> @@ -2512,6 +2523,17 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
>  	char license[128];
>  	bool is_gpl;
>  
> +	/* Intent here is for unprivileged_bpf_disabled to block key object
> +	 * creation commands for unprivileged users; other actions depend
> +	 * of fd availability and access to bpffs, so are dependent on
> +	 * object creation success.  Capabilities are later verified for
> +	 * operations such as load and map create, so even with unprivileged
> +	 * BPF disabled, capability checks are still carried out for these
> +	 * and other operations.
> +	 */
> +	if (!bpf_capable() && sysctl_unprivileged_bpf_disabled)
> +		return -EPERM;
> +
>  	if (CHECK_ATTR(BPF_PROG_LOAD))
>  		return -EINVAL;
>  
> @@ -5008,23 +5030,8 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
>  static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
>  {
>  	union bpf_attr attr;
> -	bool capable;
>  	int err;
>  
> -	capable = bpf_capable() || !sysctl_unprivileged_bpf_disabled;
> -
> -	/* Intent here is for unprivileged_bpf_disabled to block key object
> -	 * creation commands for unprivileged users; other actions depend
> -	 * of fd availability and access to bpffs, so are dependent on
> -	 * object creation success.  Capabilities are later verified for
> -	 * operations such as load and map create, so even with unprivileged
> -	 * BPF disabled, capability checks are still carried out for these
> -	 * and other operations.
> -	 */
> -	if (!capable &&
> -	    (cmd == BPF_MAP_CREATE || cmd == BPF_PROG_LOAD))
> -		return -EPERM;
> -
>  	err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
>  	if (err)
>  		return err;
> -- 
> 2.34.1
> 

-- 
Kees Cook


* Re: [PATCH bpf-next 2/8] bpf: inline map creation logic in map_create() function
  2023-04-12  4:32 ` [PATCH bpf-next 2/8] bpf: inline map creation logic in map_create() function Andrii Nakryiko
@ 2023-04-12 17:53   ` Kees Cook
  2023-04-13  0:22     ` Andrii Nakryiko
  0 siblings, 1 reply; 52+ messages in thread
From: Kees Cook @ 2023-04-12 17:53 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, kpsingh, paul, linux-security-module

On Tue, Apr 11, 2023 at 09:32:54PM -0700, Andrii Nakryiko wrote:
> Keep all the relevant generic sanity checks, permission checks, and
> creation and initialization logic in one linear piece of code. Currently
> helper function that handles memory allocation and partial
> initialization is split apart and is about 1000 lines higher in the
> file, hurting readability.

At first glance, this seems like a step in the wrong direction: having a
single-purpose function pulled out of a larger one seems like a good
thing for stuff like unit testing, etc. Unless there's a reason later in
the series for this inlining (which should be called out in the
changelog here), I would say if it is only readability, just move the
function down 1000 lines but leave it a separate function.

-Kees

> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  kernel/bpf/syscall.c | 54 ++++++++++++++++++--------------------------
>  1 file changed, 22 insertions(+), 32 deletions(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index c1d268025985..a090737f98ea 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -108,37 +108,6 @@ const struct bpf_map_ops bpf_map_offload_ops = {
>  	.map_mem_usage = bpf_map_offload_map_mem_usage,
>  };
>  
> -static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
> -{
> -	const struct bpf_map_ops *ops;
> -	u32 type = attr->map_type;
> -	struct bpf_map *map;
> -	int err;
> -
> -	if (type >= ARRAY_SIZE(bpf_map_types))
> -		return ERR_PTR(-EINVAL);
> -	type = array_index_nospec(type, ARRAY_SIZE(bpf_map_types));
> -	ops = bpf_map_types[type];
> -	if (!ops)
> -		return ERR_PTR(-EINVAL);
> -
> -	if (ops->map_alloc_check) {
> -		err = ops->map_alloc_check(attr);
> -		if (err)
> -			return ERR_PTR(err);
> -	}
> -	if (attr->map_ifindex)
> -		ops = &bpf_map_offload_ops;
> -	if (!ops->map_mem_usage)
> -		return ERR_PTR(-EINVAL);
> -	map = ops->map_alloc(attr);
> -	if (IS_ERR(map))
> -		return map;
> -	map->ops = ops;
> -	map->map_type = type;
> -	return map;
> -}
> -
>  static void bpf_map_write_active_inc(struct bpf_map *map)
>  {
>  	atomic64_inc(&map->writecnt);
> @@ -1124,7 +1093,9 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
>  /* called via syscall */
>  static int map_create(union bpf_attr *attr)
>  {
> +	const struct bpf_map_ops *ops;
>  	int numa_node = bpf_map_attr_numa_node(attr);
> +	u32 map_type = attr->map_type;
>  	struct btf_field_offs *foffs;
>  	struct bpf_map *map;
>  	int f_flags;
> @@ -1167,9 +1138,28 @@ static int map_create(union bpf_attr *attr)
>  		return -EINVAL;
>  
>  	/* find map type and init map: hashtable vs rbtree vs bloom vs ... */
> -	map = find_and_alloc_map(attr);
> +	map_type = attr->map_type;
> +	if (map_type >= ARRAY_SIZE(bpf_map_types))
> +		return -EINVAL;
> +	map_type = array_index_nospec(map_type, ARRAY_SIZE(bpf_map_types));
> +	ops = bpf_map_types[map_type];
> +	if (!ops)
> +		return -EINVAL;
> +
> +	if (ops->map_alloc_check) {
> +		err = ops->map_alloc_check(attr);
> +		if (err)
> +			return err;
> +	}
> +	if (attr->map_ifindex)
> +		ops = &bpf_map_offload_ops;
> +	if (!ops->map_mem_usage)
> +		return -EINVAL;
> +	map = ops->map_alloc(attr);
>  	if (IS_ERR(map))
>  		return PTR_ERR(map);
> +	map->ops = ops;
> +	map->map_type = map_type;
>  
>  	err = bpf_obj_name_cpy(map->name, attr->map_name,
>  			       sizeof(attr->map_name));
> -- 
> 2.34.1
> 

-- 
Kees Cook


* Re: [PATCH bpf-next 3/8] bpf: centralize permissions checks for all BPF map types
  2023-04-12  4:32 ` [PATCH bpf-next 3/8] bpf: centralize permissions checks for all BPF map types Andrii Nakryiko
@ 2023-04-12 18:01   ` Kees Cook
  2023-04-13  0:23     ` Andrii Nakryiko
  0 siblings, 1 reply; 52+ messages in thread
From: Kees Cook @ 2023-04-12 18:01 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, kpsingh, paul, linux-security-module

On Tue, Apr 11, 2023 at 09:32:55PM -0700, Andrii Nakryiko wrote:
> This allows to do more centralized decisions later on, and generally
> makes it very explicit which maps are privileged and which are not.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> [...]
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 00c253b84bf5..c69db80fc947 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -422,12 +422,6 @@ static int htab_map_alloc_check(union bpf_attr *attr)
>  	BUILD_BUG_ON(offsetof(struct htab_elem, fnode.next) !=
>  		     offsetof(struct htab_elem, hash_node.pprev));
>  
> -	if (lru && !bpf_capable())
> -		/* LRU implementation is much complicated than other
> -		 * maps.  Hence, limit to CAP_BPF.
> -		 */
> -		return -EPERM;
> -

The LRU part of this check gets lost, doesn't it? More specifically,
doesn't this make the security check for htab_map_alloc_check() more
strict than before? (If that's okay, please mention the logical change
in the commit log.)

> [...]
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index a090737f98ea..cbea4999e92f 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1101,17 +1101,6 @@ static int map_create(union bpf_attr *attr)
>  	int f_flags;
>  	int err;
>  
> -	/* Intent here is for unprivileged_bpf_disabled to block key object
> -	 * creation commands for unprivileged users; other actions depend
> -	 * of fd availability and access to bpffs, so are dependent on
> -	 * object creation success.  Capabilities are later verified for
> -	 * operations such as load and map create, so even with unprivileged
> -	 * BPF disabled, capability checks are still carried out for these
> -	 * and other operations.
> -	 */
> -	if (!bpf_capable() && sysctl_unprivileged_bpf_disabled)
> -		return -EPERM;
> -

Given that this was already performing a centralized capability check,
why were the individual functions doing checks before too?

(I'm wondering if the individual functions remain the better place to do
this checking?)

>  	err = CHECK_ATTR(BPF_MAP_CREATE);
>  	if (err)
>  		return -EINVAL;
> @@ -1155,6 +1144,65 @@ static int map_create(union bpf_attr *attr)
>  		ops = &bpf_map_offload_ops;
>  	if (!ops->map_mem_usage)
>  		return -EINVAL;
> +
> +	/* Intent here is for unprivileged_bpf_disabled to block key object
> +	 * creation commands for unprivileged users; other actions depend
> +	 * of fd availability and access to bpffs, so are dependent on
> +	 * object creation success.  Capabilities are later verified for
> +	 * operations such as load and map create, so even with unprivileged
> +	 * BPF disabled, capability checks are still carried out for these
> +	 * and other operations.
> +	 */
> +	if (!bpf_capable() && sysctl_unprivileged_bpf_disabled)
> +		return -EPERM;
> +
> +	/* check privileged map type permissions */
> +	switch (map_type) {
> +	case BPF_MAP_TYPE_SK_STORAGE:
> +	case BPF_MAP_TYPE_INODE_STORAGE:
> +	case BPF_MAP_TYPE_TASK_STORAGE:
> +	case BPF_MAP_TYPE_CGRP_STORAGE:
> +	case BPF_MAP_TYPE_BLOOM_FILTER:
> +	case BPF_MAP_TYPE_LPM_TRIE:
> +	case BPF_MAP_TYPE_REUSEPORT_SOCKARRAY:
> +	case BPF_MAP_TYPE_STACK_TRACE:
> +	case BPF_MAP_TYPE_QUEUE:
> +	case BPF_MAP_TYPE_STACK:
> +	case BPF_MAP_TYPE_LRU_HASH:
> +	case BPF_MAP_TYPE_LRU_PERCPU_HASH:
> +	case BPF_MAP_TYPE_STRUCT_OPS:
> +	case BPF_MAP_TYPE_CPUMAP:
> +		if (!bpf_capable())
> +			return -EPERM;
> +		break;
> +	case BPF_MAP_TYPE_SOCKMAP:
> +	case BPF_MAP_TYPE_SOCKHASH:
> +	case BPF_MAP_TYPE_DEVMAP:
> +	case BPF_MAP_TYPE_DEVMAP_HASH:
> +	case BPF_MAP_TYPE_XSKMAP:
> +		if (!capable(CAP_NET_ADMIN))
> +			return -EPERM;
> +		break;
> +	case BPF_MAP_TYPE_ARRAY:
> +	case BPF_MAP_TYPE_PERCPU_ARRAY:
> +	case BPF_MAP_TYPE_PROG_ARRAY:
> +	case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
> +	case BPF_MAP_TYPE_CGROUP_ARRAY:
> +	case BPF_MAP_TYPE_ARRAY_OF_MAPS:
> +	case BPF_MAP_TYPE_HASH:
> +	case BPF_MAP_TYPE_PERCPU_HASH:
> +	case BPF_MAP_TYPE_HASH_OF_MAPS:
> +	case BPF_MAP_TYPE_RINGBUF:
> +	case BPF_MAP_TYPE_USER_RINGBUF:
> +	case BPF_MAP_TYPE_CGROUP_STORAGE:
> +	case BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE:
> +		/* unprivileged */
> +		break;
> +	default:
> +		WARN(1, "unsupported map type %d", map_type);
> +		return -EPERM;

Thank you for making sure this fails safe! :)
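
As a small illustration of the fail-safe shape of that switch, a
userspace sketch follows. The enum and function names are made up for
the example and are not the kernel's; the point is only that any value
without an explicit case is denied rather than silently allowed.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical map kinds; MAP_KIND_FUTURE models a type added later
 * without a matching case in the permission switch.
 */
enum map_kind {
	MAP_KIND_ARRAY,
	MAP_KIND_LRU_HASH,
	MAP_KIND_SOCKMAP,
	MAP_KIND_FUTURE,
};

static int map_kind_permitted(enum map_kind kind, bool cap_bpf,
			      bool cap_net_admin)
{
	switch (kind) {
	case MAP_KIND_LRU_HASH:
		return cap_bpf ? 0 : -EPERM;
	case MAP_KIND_SOCKMAP:
		return cap_net_admin ? 0 : -EPERM;
	case MAP_KIND_ARRAY:
		/* unprivileged */
		return 0;
	default:
		/* unknown kind: deny, even for a fully privileged caller */
		return -EPERM;
	}
}
```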

> +	}
> +
>  	map = ops->map_alloc(attr);
>  	if (IS_ERR(map))
>  		return PTR_ERR(map);
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 7c189c2e2fbf..4b67bb5e7f9c 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -32,8 +32,6 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
>  {
>  	struct bpf_stab *stab;
>  
> -	if (!capable(CAP_NET_ADMIN))
> -		return ERR_PTR(-EPERM);
>  	if (attr->max_entries == 0 ||
>  	    attr->key_size    != 4 ||
>  	    (attr->value_size != sizeof(u32) &&
> @@ -1085,8 +1083,6 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
>  	struct bpf_shtab *htab;
>  	int i, err;
>  
> -	if (!capable(CAP_NET_ADMIN))
> -		return ERR_PTR(-EPERM);
>  	if (attr->max_entries == 0 ||
>  	    attr->key_size    == 0 ||
>  	    (attr->value_size != sizeof(u32) &&
> diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c
> index 2c1427074a3b..e1c526f97ce3 100644
> --- a/net/xdp/xskmap.c
> +++ b/net/xdp/xskmap.c
> @@ -5,7 +5,6 @@
>  
>  #include <linux/bpf.h>
>  #include <linux/filter.h>
> -#include <linux/capability.h>
>  #include <net/xdp_sock.h>
>  #include <linux/slab.h>
>  #include <linux/sched.h>
> @@ -68,9 +67,6 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
>  	int numa_node;
>  	u64 size;
>  
> -	if (!capable(CAP_NET_ADMIN))
> -		return ERR_PTR(-EPERM);
> -
>  	if (attr->max_entries == 0 || attr->key_size != 4 ||
>  	    attr->value_size != 4 ||
>  	    attr->map_flags & ~(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY))
> diff --git a/tools/testing/selftests/bpf/prog_tests/unpriv_bpf_disabled.c b/tools/testing/selftests/bpf/prog_tests/unpriv_bpf_disabled.c
> index 8383a99f610f..0adf8d9475cb 100644
> --- a/tools/testing/selftests/bpf/prog_tests/unpriv_bpf_disabled.c
> +++ b/tools/testing/selftests/bpf/prog_tests/unpriv_bpf_disabled.c
> @@ -171,7 +171,11 @@ static void test_unpriv_bpf_disabled_negative(struct test_unpriv_bpf_disabled *s
>  				prog_insns, prog_insn_cnt, &load_opts),
>  		  -EPERM, "prog_load_fails");
>  
> -	for (i = BPF_MAP_TYPE_HASH; i <= BPF_MAP_TYPE_BLOOM_FILTER; i++)
> +	/* some map types require particular correct parameters which could be
> +	 * sanity-checked before enforcing -EPERM, so only validate that
> +	 * the simple ARRAY and HASH maps are failing with -EPERM
> +	 */
> +	for (i = BPF_MAP_TYPE_HASH; i <= BPF_MAP_TYPE_ARRAY; i++)
>  		ASSERT_EQ(bpf_map_create(i, NULL, sizeof(int), sizeof(int), 1, NULL),
>  			  -EPERM, "map_create_fails");
>  
> -- 
> 2.34.1
> 

-- 
Kees Cook


* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-12 17:47   ` Kees Cook
@ 2023-04-12 18:06     ` Paul Moore
  2023-04-12 18:28       ` Kees Cook
  2023-04-12 18:38       ` Casey Schaufler
  2023-04-14 20:23     ` Dr. Greg
  1 sibling, 2 replies; 52+ messages in thread
From: Paul Moore @ 2023-04-12 18:06 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrii Nakryiko, bpf, ast, daniel, kpsingh, linux-security-module

On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> > > are meant to allow highly-granular LSM-based control over the usage of BPF
> > > subsytem. Specifically, to control the creation of BPF maps and BTF data
> > > objects, which are fundamental building blocks of any modern BPF application.
> > >
> > > These new hooks are able to override default kernel-side CAP_BPF-based (and
> > > sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> > > implement LSM policies that could granularly enforce more restrictions on
> > > a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> > > capabilities), but also, importantly, allow to *bypass kernel-side
> > > enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> > > cases.
> >
> > One of the hallmarks of the LSM has always been that it is
> > non-authoritative: it cannot unilaterally grant access, it can only
> > restrict what would have been otherwise permitted on a traditional
> > Linux system.  Put another way, a LSM should not undermine the Linux
> > discretionary access controls, e.g. capabilities.
> >
> > If there is a problem with the eBPF capability-based access controls,
> > that problem needs to be addressed in how the core eBPF code
> > implements its capability checks, not by modifying the LSM mechanism
> > to bypass these checks.
>
> I think semantics matter here. I wouldn't view this as _bypassing_
> capability enforcement: it's just more fine-grained access control.
>
> For example, in many places we have things like:
>
>         if (!some_check(...) && !capable(...))
>                 return -EPERM;
>
> I would expect this is a similar logic. An operation can succeed if the
> access control requirement is met. The mismatch we have through-out the
> kernel is that capability checks aren't strictly done by LSM hooks. And
> this series conceptually, I think, doesn't violate that -- it's changing
> the logic of the capability checks, not the LSM (i.e. there no LSM hooks
> yet here).

Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
when it returns a positive value "bypasses kernel checks".  The patch
isn't based on either Linus' tree or the LSM tree, I'm guessing it is
based on a eBPF tree, so I can't say with 100% certainty that it is
bypassing a capability check, but the description claims that to be
the case.

Regardless of how you want to spin this, I'm not supportive of a LSM
hook which allows a LSM to bypass a capability check.  A LSM hook can
be used to provide additional access control restrictions beyond a
capability check, but a LSM hook should never be allowed to overrule
an access denial due to a capability check.
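
The distinction being drawn can be sketched in userspace as two decision
functions. This is a simplified model under stated assumptions, not
kernel code: hook_ret stands for whatever an LSM hook returned, has_cap
for the outcome of the capability check, and both function names are
invented for the example.

```c
#include <errno.h>
#include <stdbool.h>

/* Restrictive-only model: the hook may deny, but a non-negative
 * return never replaces the capability check.
 */
static int decide_restrictive(int hook_ret, bool has_cap)
{
	if (hook_ret < 0)
		return hook_ret;
	return has_cap ? 0 : -EPERM;
}

/* Model of the semantics described in the series: a positive hook
 * return skips the capability check entirely.
 */
static int decide_proposed(int hook_ret, bool has_cap)
{
	if (hook_ret < 0)
		return hook_ret;
	if (hook_ret > 0)
		return 0;
	return has_cap ? 0 : -EPERM;
}
```

The two models agree whenever the hook returns zero or a negative
value; they diverge only for a positive return from a caller that lacks
the capability, which is the case under dispute.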

> The reason CAP_BPF was created was because there was nothing else that
> would be fine-grained enough at the time.

The LSM layer predates CAP_BPF, and one could make a very solid
argument that one of the reasons LSMs exist is to provide
supplementary controls due to capability-based access controls being a
poor fit for many modern use cases.

--
paul-moore.com


* Re: [PATCH bpf-next 4/8] bpf, lsm: implement bpf_map_create_security LSM hook
  2023-04-12  4:32 ` [PATCH bpf-next 4/8] bpf, lsm: implement bpf_map_create_security LSM hook Andrii Nakryiko
@ 2023-04-12 18:20   ` Kees Cook
  2023-04-13  0:23     ` Andrii Nakryiko
  0 siblings, 1 reply; 52+ messages in thread
From: Kees Cook @ 2023-04-12 18:20 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, kpsingh, paul, linux-security-module

On Tue, Apr 11, 2023 at 09:32:56PM -0700, Andrii Nakryiko wrote:
> Add new LSM hook, bpf_map_create_security, that allows custom LSM
> security policies controlling BPF map creation permissions granularly
> and precisely.

Naming nit-pick: the hook name doesn't need the "_security" suffix, if I'm
reading this correctly. The LSM hooks with that are really around the
allocation/initialization of LSM-specific memory (i.e. attach
LSM-specific allocation to an inode, etc).

The hook looks like it's "only" policy, so it can just be called
"bpf_map_create".

> This new LSM hook allows to implement both LSM policy that could enforce
> more granular and restrictive decisions about which processes can create
> which BPF maps, by rejecting BPF map creation based on passed in
> bpf_attr attributes. But also it allows to bypass CAP_BPF and
> CAP_NET_ADMIN restrictions, normally enforced by kernel, for
> applications that LSM policy deems trusted. Trustworthiness
> determination of the process/user/cgroup/etc is left up to custom LSM
> hook implementation and will depend on particular production setup of
> each individual use case.

As Paul mentioned, we need to give a careful examination of the access
control logic here. BPF does not deal with POSIX or DAC rules, so I think
there isn't a problem being flexible here, but it would be nice to find
a way to make this be "default reject" via capabilities that doesn't
differ much from the way things happen normally in the LSM (so that it
can be successfully reasoned about without needing to consider
BPF-specific "special cases").

> If LSM policy wants to rely on default kernel logic, it can return
> 0 to delegate back to kernel. If it returns >0 return code,
> kernel will bypass its normal checks. This way it's possible to perform
> a delegation of trust (specifically for BPF map creation) from
> privileged LSM custom policy implementation to unprivileged user
> process, verifier and trusted by custom LSM policy.

At the least, I think the language of "bypass" is going to cause a lot
of friction. :) We need to make sure this fails safe -- if there is no
loaded policy, capable() needs to remain the back-stop.

> Such model allows flexible and secure-by-default approach where user
> processes that need to use BPF features (BPF map creation, in this case)
> are left unprivileged with no CAP_BPF, CAP_NET_ADMIN, CAP_PERFMON, etc.
> capabilities, but specific exceptions are implemented (usually in
> a centralized server fleet-wide fashion) for trusted
> processes/containers/users, allowing them to manipulate BPF facilities,
> as long as they are allowed and known apriori.

if (!unprivileged_allowed(...) && !capable(...))
	return -EPERM;

and unprivileged_allowed() is looking at the sysctl and LSM policy.
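
One way that shape could look, as a userspace sketch only. How the
sysctl and the LSM policy would actually be combined is not spelled out
above, so the rule used here (unprivileged use is allowed if the sysctl
permits it, or if a loaded policy vouches for the caller) is an
assumption, and every name below is a hypothetical stand-in. With no
policy loaded and the sysctl set, the capability check remains the
back-stop.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's interfaces. */
static bool sysctl_unprivileged_bpf_disabled;
static bool task_has_cap_bpf;

/* Optional policy callback; NULL models "no LSM policy loaded". */
static bool (*loaded_policy)(void);

static bool capable_bpf(void)
{
	return task_has_cap_bpf;
}

/* Assumed combination rule: sysctl permits it, or policy vouches. */
static bool unprivileged_allowed(void)
{
	if (!sysctl_unprivileged_bpf_disabled)
		return true;
	return loaded_policy != NULL && loaded_policy();
}

static int map_create_permitted(void)
{
	if (!unprivileged_allowed() && !capable_bpf())
		return -EPERM;
	return 0;
}

/* Example policy that trusts every caller; for illustration only. */
static bool trusting_policy(void)
{
	return true;
}
```

Written this way there is a single default-reject condition and no
separate "skip" path to reason about.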

> 
> This patch implements first required part for full-fledged BPF usage:
> map creation. The other one, BPF program load, will be addressed in
> follow up patches.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  include/linux/lsm_hook_defs.h |  1 +
>  include/linux/lsm_hooks.h     | 12 ++++++++++++
>  include/linux/security.h      |  6 ++++++
>  kernel/bpf/bpf_lsm.c          |  1 +
>  kernel/bpf/syscall.c          | 19 ++++++++++++++++---
>  security/security.c           |  4 ++++
>  6 files changed, 40 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
> index 094b76dc7164..b4fe9ed7021a 100644
> --- a/include/linux/lsm_hook_defs.h
> +++ b/include/linux/lsm_hook_defs.h
> @@ -396,6 +396,7 @@ LSM_HOOK(void, LSM_RET_VOID, audit_rule_free, void *lsmrule)
>  LSM_HOOK(int, 0, bpf, int cmd, union bpf_attr *attr, unsigned int size)
>  LSM_HOOK(int, 0, bpf_map, struct bpf_map *map, fmode_t fmode)
>  LSM_HOOK(int, 0, bpf_prog, struct bpf_prog *prog)
> +LSM_HOOK(int, 0, bpf_map_create_security, const union bpf_attr *attr)
>  LSM_HOOK(int, 0, bpf_map_alloc_security, struct bpf_map *map)
>  LSM_HOOK(void, LSM_RET_VOID, bpf_map_free_security, struct bpf_map *map)
>  LSM_HOOK(int, 0, bpf_prog_alloc_security, struct bpf_prog_aux *aux)
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index 6e156d2acffc..42bf7c0aa4d8 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -1598,6 +1598,18 @@
>   *	@prog: bpf prog that userspace want to use.
>   *	Return 0 if permission is granted.
>   *
> + * @bpf_map_create_security:
> + *	Do a check to determine permission to create requested BPF map.
> + *	Implementation can override kernel capabilities checks according to
> + *	the rules below:
> + *	  - 0 should be returned to delegate permission checks to other
> + *	    installed LSM callbacks and/or hard-wired kernel logic, which
> + *	    would enforce CAP_BPF/CAP_NET_ADMIN capabilities;
> + *	  - reject BPF map creation by returning -EPERM or any other
> + *	    negative error code;
> + *	  - allow BPF map creation, overriding kernel checks, by returning
> + *	    a positive result.
> + *
>   * @bpf_map_alloc_security:
>   *	Initialize the security field inside bpf map.
>   *	Return 0 on success, error on failure.
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 5984d0d550b4..e5374fe92ef6 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -2023,6 +2023,7 @@ struct bpf_prog_aux;
>  extern int security_bpf(int cmd, union bpf_attr *attr, unsigned int size);
>  extern int security_bpf_map(struct bpf_map *map, fmode_t fmode);
>  extern int security_bpf_prog(struct bpf_prog *prog);
> +extern int security_bpf_map_create(const union bpf_attr *attr);
>  extern int security_bpf_map_alloc(struct bpf_map *map);
>  extern void security_bpf_map_free(struct bpf_map *map);
>  extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux);
> @@ -2044,6 +2045,11 @@ static inline int security_bpf_prog(struct bpf_prog *prog)
>  	return 0;
>  }
>  
> +static inline int security_bpf_map_create(const union bpf_attr *attr)
> +{
> +	return 0;
> +}

I would expect this to be something like:

	return sysctl_unprivileged_bpf_disabled ? -EPERM : 0;

> +
>  static inline int security_bpf_map_alloc(struct bpf_map *map)
>  {
>  	return 0;
> diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> index e14c822f8911..931d4dda5dac 100644
> --- a/kernel/bpf/bpf_lsm.c
> +++ b/kernel/bpf/bpf_lsm.c
> @@ -260,6 +260,7 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  BTF_SET_START(sleepable_lsm_hooks)
>  BTF_ID(func, bpf_lsm_bpf)
>  BTF_ID(func, bpf_lsm_bpf_map)
> +BTF_ID(func, bpf_lsm_bpf_map_create_security)
>  BTF_ID(func, bpf_lsm_bpf_map_alloc_security)
>  BTF_ID(func, bpf_lsm_bpf_map_free_security)
>  BTF_ID(func, bpf_lsm_bpf_prog)
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index cbea4999e92f..7d1165814efc 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -980,7 +980,7 @@ int map_check_no_btf(const struct bpf_map *map,
>  }
>  
>  static int map_check_btf(struct bpf_map *map, const struct btf *btf,
> -			 u32 btf_key_id, u32 btf_value_id)
> +			 u32 btf_key_id, u32 btf_value_id, bool priv_checked)
>  {
>  	const struct btf_type *key_type, *value_type;
>  	u32 key_size, value_size;
> @@ -1008,7 +1008,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
>  	if (!IS_ERR_OR_NULL(map->record)) {
>  		int i;
>  
> -		if (!bpf_capable()) {
> +		if (!priv_checked && !bpf_capable()) {
>  			ret = -EPERM;
>  			goto free_map_tab;
>  		}
> @@ -1097,10 +1097,12 @@ static int map_create(union bpf_attr *attr)
>  	int numa_node = bpf_map_attr_numa_node(attr);
>  	u32 map_type = attr->map_type;
>  	struct btf_field_offs *foffs;
> +	bool priv_checked = false;
>  	struct bpf_map *map;
>  	int f_flags;
>  	int err;
>  
> +	/* sanity checks */
>  	err = CHECK_ATTR(BPF_MAP_CREATE);
>  	if (err)
>  		return -EINVAL;
> @@ -1145,6 +1147,15 @@ static int map_create(union bpf_attr *attr)
>  	if (!ops->map_mem_usage)
>  		return -EINVAL;
>  
> +	/* security checks */
> +	err = security_bpf_map_create(attr);
> +	if (err < 0)
> +		return err;
> +	if (err > 0) {
> +		priv_checked = true;
> +		goto skip_priv_checks;
> +	}

I think we can refactor this to avoid the concept of "skipping" checks.

Also, I think passing "priv_checked" is kind of confusing -- I feel like
access control should either be centralized or in each individual
function. Why is there a need to split this up?

> +
>  	/* Intent here is for unprivileged_bpf_disabled to block key object
>  	 * creation commands for unprivileged users; other actions depend
>  	 * of fd availability and access to bpffs, so are dependent on
> @@ -1203,6 +1214,8 @@ static int map_create(union bpf_attr *attr)
>  		return -EPERM;
>  	}
>  
> +skip_priv_checks:
> +	/* create and init map */
>  	map = ops->map_alloc(attr);
>  	if (IS_ERR(map))
>  		return PTR_ERR(map);
> @@ -1243,7 +1256,7 @@ static int map_create(union bpf_attr *attr)
>  
>  		if (attr->btf_value_type_id) {
>  			err = map_check_btf(map, btf, attr->btf_key_type_id,
> -					    attr->btf_value_type_id);
> +					    attr->btf_value_type_id, priv_checked);
>  			if (err)
>  				goto free_map;
>  		}
> diff --git a/security/security.c b/security/security.c
> index cf6cc576736f..f9b885680966 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -2682,6 +2682,10 @@ int security_bpf_prog(struct bpf_prog *prog)
>  {
>  	return call_int_hook(bpf_prog, 0, prog);
>  }
> +int security_bpf_map_create(const union bpf_attr *attr)
> +{
> +	return call_int_hook(bpf_map_create_security, 0, attr);

And the default return value here wouldn't be 0, but:

	sysctl_unprivileged_bpf_disabled ?  -EPERM : 0

> +}
>  int security_bpf_map_alloc(struct bpf_map *map)
>  {
>  	return call_int_hook(bpf_map_alloc_security, 0, map);
> -- 
> 2.34.1
> 

-- 
Kees Cook


* Re: [PATCH bpf-next 5/8] selftests/bpf: validate new bpf_map_create_security LSM hook
  2023-04-12  4:32 ` [PATCH bpf-next 5/8] selftests/bpf: validate new " Andrii Nakryiko
@ 2023-04-12 18:23   ` Kees Cook
  2023-04-13  0:23     ` Andrii Nakryiko
  0 siblings, 1 reply; 52+ messages in thread
From: Kees Cook @ 2023-04-12 18:23 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, kpsingh, paul, linux-security-module

On Tue, Apr 11, 2023 at 09:32:57PM -0700, Andrii Nakryiko wrote:
> Add selftests that goes over every known map type and validates that
> a combination of privileged/unprivileged modes and allow/reject/pass-through
> LSM policy decisions behave as expected.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  .../selftests/bpf/prog_tests/lsm_map_create.c | 143 ++++++++++++++++++
>  .../selftests/bpf/progs/lsm_map_create.c      |  32 ++++
>  tools/testing/selftests/bpf/test_progs.h      |   6 +
>  3 files changed, 181 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/lsm_map_create.c
>  create mode 100644 tools/testing/selftests/bpf/progs/lsm_map_create.c
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/lsm_map_create.c b/tools/testing/selftests/bpf/prog_tests/lsm_map_create.c
> new file mode 100644
> index 000000000000..fee78b0448c3
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/lsm_map_create.c
> @@ -0,0 +1,143 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
> +#include "linux/bpf.h"
> +#include <test_progs.h>
> +#include <bpf/btf.h>
> +#include "cap_helpers.h"
> +#include "lsm_map_create.skel.h"
> +
> +static int drop_priv_caps(__u64 *old_caps)
> +{
> +	return cap_disable_effective((1ULL << CAP_BPF) |
> +				     (1ULL << CAP_PERFMON) |
> +				     (1ULL << CAP_NET_ADMIN) |
> +				     (1ULL << CAP_SYS_ADMIN), old_caps);
> +}
> +
> +static int restore_priv_caps(__u64 old_caps)
> +{
> +	return cap_enable_effective(old_caps, NULL);
> +}
> +
> +void test_lsm_map_create(void)
> +{
> +	struct btf *btf = NULL;
> +	struct lsm_map_create *skel = NULL;
> +	const struct btf_type *t;
> +	const struct btf_enum *e;
> +	int i, n, id, err, ret;
> +
> +	skel = lsm_map_create__open_and_load();
> +	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
> +		return;
> +
> +	skel->bss->my_tid = syscall(SYS_gettid);
> +	skel->bss->decision = 0;
> +
> +	err = lsm_map_create__attach(skel);
> +	if (!ASSERT_OK(err, "skel_attach"))
> +		goto cleanup;
> +
> +	btf = btf__parse("/sys/kernel/btf/vmlinux", NULL);
> +	if (!ASSERT_OK_PTR(btf, "btf_parse"))
> +		goto cleanup;
> +
> +	/* find enum bpf_map_type and enumerate each value */
> +	id = btf__find_by_name_kind(btf, "bpf_map_type", BTF_KIND_ENUM);
> +	if (!ASSERT_GT(id, 0, "bpf_map_type_id"))
> +		goto cleanup;
> +
> +	t = btf__type_by_id(btf, id);
> +	e = btf_enum(t);
> +	n = btf_vlen(t);
> +	for (i = 0; i < n; e++, i++) {
> +		enum bpf_map_type map_type = (enum bpf_map_type)e->val;
> +		const char *map_type_name;
> +		__u64 orig_caps;
> +		bool is_map_priv;
> +		bool needs_btf;
> +
> +		if (map_type == BPF_MAP_TYPE_UNSPEC)
> +			continue;
> +
> +		/* this will show which map type we are working with in verbose log */
> +		map_type_name = btf__str_by_offset(btf, e->name_off);
> +		ASSERT_OK_PTR(map_type_name, map_type_name);
> +
> +		switch (map_type) {
> +		case BPF_MAP_TYPE_ARRAY:
> +		case BPF_MAP_TYPE_PERCPU_ARRAY:
> +		case BPF_MAP_TYPE_PROG_ARRAY:
> +		case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
> +		case BPF_MAP_TYPE_CGROUP_ARRAY:
> +		case BPF_MAP_TYPE_ARRAY_OF_MAPS:
> +		case BPF_MAP_TYPE_HASH:
> +		case BPF_MAP_TYPE_PERCPU_HASH:
> +		case BPF_MAP_TYPE_HASH_OF_MAPS:
> +		case BPF_MAP_TYPE_RINGBUF:
> +		case BPF_MAP_TYPE_USER_RINGBUF:
> +		case BPF_MAP_TYPE_CGROUP_STORAGE:
> +		case BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE:
> +			is_map_priv = false;
> +			needs_btf = false;
> +			break;
> +		case BPF_MAP_TYPE_SK_STORAGE:
> +		case BPF_MAP_TYPE_INODE_STORAGE:
> +		case BPF_MAP_TYPE_TASK_STORAGE:
> +		case BPF_MAP_TYPE_CGRP_STORAGE:
> +			is_map_priv = true;
> +			needs_btf = true;
> +			break;
> +		default:
> +			is_map_priv = true;
> +			needs_btf = false;
> +		}
> +
> +		/* make sure we delegate to kernel for final decision */
> +		skel->bss->decision = 0;
> +
> +		/* we are normally under sudo, so all maps should succeed */
> +		ret = libbpf_probe_bpf_map_type(map_type, NULL);
> +		ASSERT_EQ(ret, 1, "default_priv_mode");
> +
> +		/* local storage needs custom BTF to be loaded, which we
> +		 * currently can't do once we drop privileges, so skip few
> +		 * checks for such maps
> +		 */
> +		if (needs_btf)
> +			goto skip_if_needs_btf;
> +
> +		/* now let's drop privileges, and check that unpriv maps are
> +		 * still possible to create
> +		 */
> +		if (!ASSERT_OK(drop_priv_caps(&orig_caps), "drop_caps"))
> +			goto cleanup;
> +
> +		ret = libbpf_probe_bpf_map_type(map_type, NULL);
> +		ASSERT_EQ(ret, is_map_priv ? 0 : 1,  "default_unpriv_mode");
> +
> +		/* allow any map creation for our thread */
> +		skel->bss->decision = 1;
> +		ret = libbpf_probe_bpf_map_type(map_type, NULL);
> +		ASSERT_EQ(ret, 1, "lsm_allow_unpriv_mode");
> +
> +		/* reject any map creation for our thread */
> +		skel->bss->decision = -1;
> +		ret = libbpf_probe_bpf_map_type(map_type, NULL);
> +		ASSERT_EQ(ret, 0, "lsm_reject_unpriv_mode");
> +
> +		/* restore privileges, but keep reject LSM policy */
> +		if (!ASSERT_OK(restore_priv_caps(orig_caps), "restore_caps"))
> +			goto cleanup;
> +
> +skip_if_needs_btf:
> +		/* even with all caps map create will fail */
> +		skel->bss->decision = -1;
> +		ret = libbpf_probe_bpf_map_type(map_type, NULL);
> +		ASSERT_EQ(ret, 0, "lsm_reject_priv_mode");
> +	}
> +
> +cleanup:
> +	btf__free(btf);
> +	lsm_map_create__destroy(skel);
> +}

This test looks good! One meta-comment about testing would be: are you
sure each needs to be ASSERT instead of EXPECT? (i.e. should forward
progress through this test always be aborted when a check fails?)
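
To make the distinction in the question concrete, here is a minimal userspace sketch. Everything in it (expect_eq(), run_checks(), the failures counter) is a hypothetical stand-in, not the selftest macros themselves: an EXPECT-style check records the failure and lets the run continue, while an ASSERT-style run stops at the first failed check.

```c
#include <stdbool.h>

/* Hypothetical failure counter; stands in for a test framework's bookkeeping. */
static int failures;

/* EXPECT-style check: record the failure, report it, and let the caller go on. */
static bool expect_eq(int got, int want)
{
	if (got != want) {
		failures++;
		return false;
	}
	return true;
}

/*
 * Run n checks and return how many were actually executed.
 * stop_on_failure == true models ASSERT-style behavior (abort on first failure),
 * stop_on_failure == false models EXPECT-style behavior (keep going).
 */
static int run_checks(const int *got, const int *want, int n, bool stop_on_failure)
{
	int ran = 0;

	for (int i = 0; i < n; i++) {
		ran++;
		if (!expect_eq(got[i], want[i]) && stop_on_failure)
			break;
	}
	return ran;
}
```

With one mismatch in the second position, the stop-on-failure run executes two checks and sees one failure; the keep-going run executes all of them and reports every mismatch.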

> diff --git a/tools/testing/selftests/bpf/progs/lsm_map_create.c b/tools/testing/selftests/bpf/progs/lsm_map_create.c
> new file mode 100644
> index 000000000000..093825c68459
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/lsm_map_create.c
> @@ -0,0 +1,32 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
> +
> +#include "vmlinux.h"
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +#include <errno.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +int my_tid;
> +/* LSM enforcement:
> + *   - 0, delegate to kernel;
> + *   - 1, allow;
> + *   - -1, reject.
> + */
> +int decision;
> +
> +SEC("lsm/bpf_map_create_security")
> +int BPF_PROG(allow_unpriv_maps, union bpf_attr *attr)
> +{
> +	if (!my_tid || (u32)bpf_get_current_pid_tgid() != my_tid)
> +		return 0; /* keep processing LSM hooks */
> +
> +	if (decision == 0)
> +		return 0;
> +
> +	if (decision > 0)
> +		return 1; /* allow */
> +
> +	return -EPERM;
> +}
> diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
> index 10ba43250668..12f9c6652d40 100644
> --- a/tools/testing/selftests/bpf/test_progs.h
> +++ b/tools/testing/selftests/bpf/test_progs.h
> @@ -23,6 +23,7 @@ typedef __u16 __sum16;
>  #include <linux/perf_event.h>
>  #include <linux/socket.h>
>  #include <linux/unistd.h>
> +#include <sys/syscall.h>
>  
>  #include <sys/ioctl.h>
>  #include <sys/wait.h>
> @@ -176,6 +177,11 @@ void test__skip(void);
>  void test__fail(void);
>  int test__join_cgroup(const char *path);
>  
> +static inline int gettid(void)
> +{
> +	return syscall(SYS_gettid);
> +}
> +
>  #define PRINT_FAIL(format...)                                                  \
>  	({                                                                     \
>  		test__fail();                                                  \
> -- 
> 2.34.1
> 

-- 
Kees Cook

* Re: [PATCH bpf-next 6/8] bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command
  2023-04-12  4:32 ` [PATCH bpf-next 6/8] bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command Andrii Nakryiko
@ 2023-04-12 18:24   ` Kees Cook
  2023-04-13  0:17     ` Andrii Nakryiko
  0 siblings, 1 reply; 52+ messages in thread
From: Kees Cook @ 2023-04-12 18:24 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, kpsingh, paul, linux-security-module

On Tue, Apr 11, 2023 at 09:32:58PM -0700, Andrii Nakryiko wrote:
> Seems like that extra bpf_capable() check in BPF_MAP_FREEZE handler was
> unintentionally left when we switched to a model that all BPF map
> operations should be allowed regardless of CAP_BPF (or any other
> capabilities), as long as process got BPF map FD somehow.
> 
> This patch replaces bpf_capable() check in BPF_MAP_FREEZE handler with
> writeable access check, given conceptually freezing the map is modifying
> it: map becomes unmodifiable for subsequent updates.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Is this patch stand-alone? It seems like this could be taken separately,
or at least just be the first patch in the series?
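
For reference, the behavior change described in the commit message can be modeled in a few lines of userspace C. This is only a sketch: struct toy_map, MODE_CAN_WRITE and toy_map_freeze() are hypothetical names, and treating the second -EBUSY in the diff context as the "already frozen" case is an assumption based on the surrounding lines.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a write-permission bit on the map FD. */
#define MODE_CAN_WRITE 0x1u

/* Hypothetical, heavily simplified map state. */
struct toy_map {
	bool frozen;
	int active_writers;
};

/*
 * Freezing modifies the map, so it is gated on write access to the FD
 * rather than on a capability held by the calling process.
 */
static int toy_map_freeze(struct toy_map *map, unsigned int fd_mode)
{
	if (!(fd_mode & MODE_CAN_WRITE))
		return -EPERM;
	if (map->active_writers > 0)
		return -EBUSY;
	if (map->frozen)	/* assumed meaning of the second -EBUSY */
		return -EBUSY;

	map->frozen = true;
	return 0;
}
```

The point of the sketch is that a read-only handle gets -EPERM regardless of who holds it, and a writable handle can freeze the map without any capability check.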

> ---
>  kernel/bpf/syscall.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 7d1165814efc..42d8473237ab 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2001,6 +2001,11 @@ static int map_freeze(const union bpf_attr *attr)
>  		return -ENOTSUPP;
>  	}
>  
> +	if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
> +		err = -EPERM;
> +		goto err_put;
> +	}
> +
>  	mutex_lock(&map->freeze_mutex);
>  	if (bpf_map_write_active(map)) {
>  		err = -EBUSY;
> @@ -2010,10 +2015,6 @@ static int map_freeze(const union bpf_attr *attr)
>  		err = -EBUSY;
>  		goto err_put;
>  	}
> -	if (!bpf_capable()) {
> -		err = -EPERM;
> -		goto err_put;
> -	}
>  
>  	WRITE_ONCE(map->frozen, true);
>  err_put:
> -- 
> 2.34.1
> 

-- 
Kees Cook

* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-12 18:06     ` Paul Moore
@ 2023-04-12 18:28       ` Kees Cook
  2023-04-12 19:06         ` Paul Moore
  2023-04-12 18:38       ` Casey Schaufler
  1 sibling, 1 reply; 52+ messages in thread
From: Kees Cook @ 2023-04-12 18:28 UTC (permalink / raw)
  To: Paul Moore
  Cc: Andrii Nakryiko, bpf, ast, daniel, kpsingh, linux-security-module

On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > >
> > > > Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> > > > are meant to allow highly-granular LSM-based control over the usage of BPF
> > > > > subsystem. Specifically, to control the creation of BPF maps and BTF data
> > > > objects, which are fundamental building blocks of any modern BPF application.
> > > >
> > > > These new hooks are able to override default kernel-side CAP_BPF-based (and
> > > > sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> > > > implement LSM policies that could granularly enforce more restrictions on
> > > > a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> > > > capabilities), but also, importantly, allow to *bypass kernel-side
> > > > enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> > > > cases.
> > >
> > > One of the hallmarks of the LSM has always been that it is
> > > non-authoritative: it cannot unilaterally grant access, it can only
> > > restrict what would have been otherwise permitted on a traditional
> > > Linux system.  Put another way, a LSM should not undermine the Linux
> > > discretionary access controls, e.g. capabilities.
> > >
> > > If there is a problem with the eBPF capability-based access controls,
> > > that problem needs to be addressed in how the core eBPF code
> > > implements its capability checks, not by modifying the LSM mechanism
> > > to bypass these checks.
> >
> > I think semantics matter here. I wouldn't view this as _bypassing_
> > capability enforcement: it's just more fine-grained access control.
> >
> > For example, in many places we have things like:
> >
> >         if (!some_check(...) && !capable(...))
> >                 return -EPERM;
> >
> > I would expect this is a similar logic. An operation can succeed if the
> > access control requirement is met. The mismatch we have throughout the
> > kernel is that capability checks aren't strictly done by LSM hooks. And
> > this series conceptually, I think, doesn't violate that -- it's changing
> > the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> > yet here).
> 
> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> when it returns a positive value "bypasses kernel checks".  The patch
> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> based on a eBPF tree, so I can't say with 100% certainty that it is
> bypassing a capability check, but the description claims that to be
> the case.
> 
> Regardless of how you want to spin this, I'm not supportive of a LSM
> hook which allows a LSM to bypass a capability check.  A LSM hook can
> be used to provide additional access control restrictions beyond a
> capability check, but a LSM hook should never be allowed to overrule
> an access denial due to a capability check.
> 
> > The reason CAP_BPF was created was because there was nothing else that
> > would be fine-grained enough at the time.
> 
> The LSM layer predates CAP_BPF, and one could make a very solid
> argument that one of the reasons LSMs exist is to provide
> supplementary controls due to capability-based access controls being a
> poor fit for many modern use cases.

I generally agree with what you say, but we DO have this code pattern:

         if (!some_check(...) && !capable(...))
                 return -EPERM;

It looks to me like this series can be refactored to do the same. I
wouldn't consider that to be a "bypass", but I would agree the current
series looks too much like "bypass", and makes reasoning about the
effect of the LSM hooks too "special". :)
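
As a self-contained illustration of the pattern above, here is a userspace sketch. struct request and its fields are hypothetical; some_check() and capable() only borrow the names from the snippet and are not the kernel functions. The operation is denied only when both the object-specific check and the capability check fail.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request: both fields are illustrative only. */
struct request {
	bool owner_matches;	/* outcome of some object-specific check */
	bool has_capability;	/* whether the caller holds the capability */
};

static bool some_check(const struct request *r)
{
	return r->owner_matches;
}

static bool capable(const struct request *r)
{
	return r->has_capability;
}

/* Denied only if neither the specific check nor the capability allows it. */
static int do_operation(const struct request *r)
{
	if (!some_check(r) && !capable(r))
		return -EPERM;
	return 0;
}
```

Note that capable() is evaluated only when some_check() fails, because of short-circuit evaluation.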

-- 
Kees Cook

* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-12 18:06     ` Paul Moore
  2023-04-12 18:28       ` Kees Cook
@ 2023-04-12 18:38       ` Casey Schaufler
  1 sibling, 0 replies; 52+ messages in thread
From: Casey Schaufler @ 2023-04-12 18:38 UTC (permalink / raw)
  To: Paul Moore, Kees Cook
  Cc: Andrii Nakryiko, bpf, ast, daniel, kpsingh,
	linux-security-module, Casey Schaufler

On 4/12/2023 11:06 AM, Paul Moore wrote:
> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>>>> Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
>>>> are meant to allow highly-granular LSM-based control over the usage of BPF
>>>> subsystem. Specifically, to control the creation of BPF maps and BTF data
>>>> objects, which are fundamental building blocks of any modern BPF application.
>>>>
>>>> These new hooks are able to override default kernel-side CAP_BPF-based (and
>>>> sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
>>>> implement LSM policies that could granularly enforce more restrictions on
>>>> a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
>>>> capabilities), but also, importantly, allow to *bypass kernel-side
>>>> enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
>>>> cases.
>>> One of the hallmarks of the LSM has always been that it is
>>> non-authoritative: it cannot unilaterally grant access, it can only
>>> restrict what would have been otherwise permitted on a traditional
>>> Linux system.  Put another way, a LSM should not undermine the Linux
>>> discretionary access controls, e.g. capabilities.
>>>
>>> If there is a problem with the eBPF capability-based access controls,
>>> that problem needs to be addressed in how the core eBPF code
>>> implements its capability checks, not by modifying the LSM mechanism
>>> to bypass these checks.

Agreed. A lot of thought went into this. The LSM mechanism would be
vastly different if the hooks were authoritative instead of restrictive.

>> I think semantics matter here. I wouldn't view this as _bypassing_
>> capability enforcement: it's just more fine-grained access control.
>>
>> For example, in many places we have things like:
>>
>>         if (!some_check(...) && !capable(...))
>>                 return -EPERM;
>>
>> I would expect this is a similar logic. An operation can succeed if the
>> access control requirement is met. The mismatch we have throughout the
>> kernel is that capability checks aren't strictly done by LSM hooks. And
>> this series conceptually, I think, doesn't violate that -- it's changing
>> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
>> yet here).
> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> when it returns a positive value "bypasses kernel checks".  The patch
> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> based on a eBPF tree, so I can't say with 100% certainty that it is
> bypassing a capability check, but the description claims that to be
> the case.
>
> Regardless of how you want to spin this, I'm not supportive of a LSM
> hook which allows a LSM to bypass a capability check.  A LSM hook can
> be used to provide additional access control restrictions beyond a
> capability check, but a LSM hook should never be allowed to overrule
> an access denial due to a capability check.
>
>> The reason CAP_BPF was created was because there was nothing else that
>> would be fine-grained enough at the time.

There's nothing stopping you from having a fine grained mechanism that
further restricts a process with CAP_BPF. SELinux implements many checks
that can, policy willing, restrict a process with a capability from doing
what the capability permits.

> The LSM layer predates CAP_BPF, and one could make a very solid
> argument that one of the reasons LSMs exist is to provide
> supplementary controls due to capability-based access controls being a
> poor fit for many modern use cases.
>
> --
> paul-moore.com

* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-12 18:28       ` Kees Cook
@ 2023-04-12 19:06         ` Paul Moore
  2023-04-13  1:43           ` Andrii Nakryiko
  0 siblings, 1 reply; 52+ messages in thread
From: Paul Moore @ 2023-04-12 19:06 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrii Nakryiko, bpf, ast, daniel, kpsingh, linux-security-module

On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> > On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> > > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > > >
> > > > > Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> > > > > are meant to allow highly-granular LSM-based control over the usage of BPF
> > > > > > subsystem. Specifically, to control the creation of BPF maps and BTF data
> > > > > objects, which are fundamental building blocks of any modern BPF application.
> > > > >
> > > > > These new hooks are able to override default kernel-side CAP_BPF-based (and
> > > > > sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> > > > > implement LSM policies that could granularly enforce more restrictions on
> > > > > a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> > > > > capabilities), but also, importantly, allow to *bypass kernel-side
> > > > > enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> > > > > cases.
> > > >
> > > > One of the hallmarks of the LSM has always been that it is
> > > > non-authoritative: it cannot unilaterally grant access, it can only
> > > > restrict what would have been otherwise permitted on a traditional
> > > > Linux system.  Put another way, a LSM should not undermine the Linux
> > > > discretionary access controls, e.g. capabilities.
> > > >
> > > > If there is a problem with the eBPF capability-based access controls,
> > > > that problem needs to be addressed in how the core eBPF code
> > > > implements its capability checks, not by modifying the LSM mechanism
> > > > to bypass these checks.
> > >
> > > I think semantics matter here. I wouldn't view this as _bypassing_
> > > capability enforcement: it's just more fine-grained access control.
> > >
> > > For example, in many places we have things like:
> > >
> > >         if (!some_check(...) && !capable(...))
> > >                 return -EPERM;
> > >
> > > I would expect this is a similar logic. An operation can succeed if the
> > > access control requirement is met. The mismatch we have throughout the
> > > kernel is that capability checks aren't strictly done by LSM hooks. And
> > > this series conceptually, I think, doesn't violate that -- it's changing
> > > the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> > > yet here).
> >
> > Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> > when it returns a positive value "bypasses kernel checks".  The patch
> > isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> > based on a eBPF tree, so I can't say with 100% certainty that it is
> > bypassing a capability check, but the description claims that to be
> > the case.
> >
> > Regardless of how you want to spin this, I'm not supportive of a LSM
> > hook which allows a LSM to bypass a capability check.  A LSM hook can
> > be used to provide additional access control restrictions beyond a
> > capability check, but a LSM hook should never be allowed to overrule
> > an access denial due to a capability check.
> >
> > > The reason CAP_BPF was created was because there was nothing else that
> > > would be fine-grained enough at the time.
> >
> > The LSM layer predates CAP_BPF, and one could make a very solid
> > argument that one of the reasons LSMs exist is to provide
> > supplementary controls due to capability-based access controls being a
> > poor fit for many modern use cases.
>
> I generally agree with what you say, but we DO have this code pattern:
>
>          if (!some_check(...) && !capable(...))
>                  return -EPERM;

I think we need to make this more concrete; we don't have a pattern in
the upstream kernel where 'some_check(...)' is a LSM hook, right?
Simply because there is another kernel access control mechanism which
allows a capability check to be skipped doesn't mean I want to allow a
LSM hook to be used to skip a capability check.
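
To pin down the two semantics being argued about, here is a small userspace model. Both function names and the has_cap/hook_ret parameters are hypothetical; the second function follows the 0 / positive / negative convention described for the new hook in this thread, not any merged kernel code.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Restrictive model: the hook can only narrow access. A missing
 * capability is final; the hook's return value cannot undo it.
 */
static int restrictive_decision(bool has_cap, int hook_ret)
{
	if (!has_cap)
		return -EPERM;
	return hook_ret < 0 ? hook_ret : 0;
}

/*
 * Override model, as the series describes its hook:
 *   hook_ret < 0  reject,
 *   hook_ret > 0  allow and skip the capability check,
 *   hook_ret == 0 fall back to the capability check.
 */
static int override_decision(bool has_cap, int hook_ret)
{
	if (hook_ret < 0)
		return hook_ret;
	if (hook_ret > 0)
		return 0;
	return has_cap ? 0 : -EPERM;
}
```

The two models differ in exactly one case: no capability and a positive hook return. The restrictive model still denies; the override model grants.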

> It looks to me like this series can be refactored to do the same. I
> wouldn't consider that to be a "bypass", but I would agree the current
> series looks too much like "bypass", and makes reasoning about the
> effect of the LSM hooks too "special". :)

-- 
paul-moore.com

* Re: [PATCH bpf-next 6/8] bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command
  2023-04-12 18:24   ` Kees Cook
@ 2023-04-13  0:17     ` Andrii Nakryiko
  0 siblings, 0 replies; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-13  0:17 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrii Nakryiko, bpf, ast, daniel, kpsingh, paul, linux-security-module

On Wed, Apr 12, 2023 at 11:24 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Tue, Apr 11, 2023 at 09:32:58PM -0700, Andrii Nakryiko wrote:
> > Seems like that extra bpf_capable() check in BPF_MAP_FREEZE handler was
> > unintentionally left when we switched to a model that all BPF map
> > operations should be allowed regardless of CAP_BPF (or any other
> > capabilities), as long as process got BPF map FD somehow.
> >
> > This patch replaces bpf_capable() check in BPF_MAP_FREEZE handler with
> > writeable access check, given conceptually freezing the map is modifying
> > it: map becomes unmodifiable for subsequent updates.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> Is this patch stand-alone? It seems like this could be taken separately,
> or at least just be the first patch in the series?
>

yep, I'll send it separately, good point

> > ---
> >  kernel/bpf/syscall.c | 9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 7d1165814efc..42d8473237ab 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -2001,6 +2001,11 @@ static int map_freeze(const union bpf_attr *attr)
> >               return -ENOTSUPP;
> >       }
> >
> > +     if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
> > +             err = -EPERM;
> > +             goto err_put;
> > +     }
> > +
> >       mutex_lock(&map->freeze_mutex);
> >       if (bpf_map_write_active(map)) {
> >               err = -EBUSY;
> > @@ -2010,10 +2015,6 @@ static int map_freeze(const union bpf_attr *attr)
> >               err = -EBUSY;
> >               goto err_put;
> >       }
> > -     if (!bpf_capable()) {
> > -             err = -EPERM;
> > -             goto err_put;
> > -     }
> >
> >       WRITE_ONCE(map->frozen, true);
> >  err_put:
> > --
> > 2.34.1
> >
>
> --
> Kees Cook

* Re: [PATCH bpf-next 1/8] bpf: move unprivileged checks into map_create() and bpf_prog_load()
  2023-04-12 17:49   ` Kees Cook
@ 2023-04-13  0:22     ` Andrii Nakryiko
  0 siblings, 0 replies; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-13  0:22 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrii Nakryiko, bpf, ast, daniel, kpsingh, paul, linux-security-module

On Wed, Apr 12, 2023 at 10:49 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Tue, Apr 11, 2023 at 09:32:53PM -0700, Andrii Nakryiko wrote:
> > Make each bpf() syscall command a bit more self-contained, making it
> > easier to further enhance it. We move sysctl_unprivileged_bpf_disabled
> > handling down to map_create() and bpf_prog_load(), two special commands
> > in this regard.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  kernel/bpf/syscall.c | 37 ++++++++++++++++++++++---------------
> >  1 file changed, 22 insertions(+), 15 deletions(-)
> >
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 6d575505f89c..c1d268025985 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -1130,6 +1130,17 @@ static int map_create(union bpf_attr *attr)
> >       int f_flags;
> >       int err;
> >
> > +     /* Intent here is for unprivileged_bpf_disabled to block key object
> > +      * creation commands for unprivileged users; other actions depend
> > +      * of fd availability and access to bpffs, so are dependent on
> > +      * object creation success.  Capabilities are later verified for
> > +      * operations such as load and map create, so even with unprivileged
> > +      * BPF disabled, capability checks are still carried out for these
> > +      * and other operations.
> > +      */
> > +     if (!bpf_capable() && sysctl_unprivileged_bpf_disabled)
> > +             return -EPERM;
>
> This appears to be a problem in the original code, but capability checks
> should be last, so that audit doesn't see a capability as having been
> used when it wasn't. i.e. if bpf_capable() passes, but
> sysctl_unprivileged_bpf_disabled isn't true, it'll look like a
> capability got used, and the flag gets set. Not a big deal at the end of
> the day, but the preferred ordering should be:
>
>         if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
>                 ...
>

makes sense, I'll change the order
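
A quick userspace sketch of why the order matters, under the assumption that the capability check has an observable side effect (here a counter standing in for audit bookkeeping). All names below are hypothetical stand-ins, not the kernel symbols:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state. */
static bool unprivileged_bpf_disabled;	/* models sysctl_unprivileged_bpf_disabled */
static bool task_has_cap_bpf;		/* models the task holding CAP_BPF */
static int cap_checks_recorded;		/* models audit seeing a capability check */

/* Models bpf_capable(): performing the check is itself observable. */
static bool model_bpf_capable(void)
{
	cap_checks_recorded++;
	return task_has_cap_bpf;
}

/* Ordering in the patch as posted: the capability is always consulted. */
static int gate_cap_first(void)
{
	if (!model_bpf_capable() && unprivileged_bpf_disabled)
		return -1;	/* stands in for -EPERM */
	return 0;
}

/* Suggested ordering: the capability is consulted only if the sysctl is set. */
static int gate_sysctl_first(void)
{
	if (unprivileged_bpf_disabled && !model_bpf_capable())
		return -1;	/* stands in for -EPERM */
	return 0;
}
```

Both variants return the same result for every input; they differ only in whether the capability check runs when the sysctl is off.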



> > +
> >       err = CHECK_ATTR(BPF_MAP_CREATE);
> >       if (err)
> >               return -EINVAL;
> > @@ -2512,6 +2523,17 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
> >       char license[128];
> >       bool is_gpl;
> >
> > +     /* Intent here is for unprivileged_bpf_disabled to block key object
> > +      * creation commands for unprivileged users; other actions depend
> > +      * of fd availability and access to bpffs, so are dependent on
> > +      * object creation success.  Capabilities are later verified for
> > +      * operations such as load and map create, so even with unprivileged
> > +      * BPF disabled, capability checks are still carried out for these
> > +      * and other operations.
> > +      */
> > +     if (!bpf_capable() && sysctl_unprivileged_bpf_disabled)
> > +             return -EPERM;
> > +
> >       if (CHECK_ATTR(BPF_PROG_LOAD))
> >               return -EINVAL;
> >
> > @@ -5008,23 +5030,8 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
> >  static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
> >  {
> >       union bpf_attr attr;
> > -     bool capable;
> >       int err;
> >
> > -     capable = bpf_capable() || !sysctl_unprivileged_bpf_disabled;
> > -
> > -     /* Intent here is for unprivileged_bpf_disabled to block key object
> > -      * creation commands for unprivileged users; other actions depend
> > -      * of fd availability and access to bpffs, so are dependent on
> > -      * object creation success.  Capabilities are later verified for
> > -      * operations such as load and map create, so even with unprivileged
> > -      * BPF disabled, capability checks are still carried out for these
> > -      * and other operations.
> > -      */
> > -     if (!capable &&
> > -         (cmd == BPF_MAP_CREATE || cmd == BPF_PROG_LOAD))
> > -             return -EPERM;
> > -
> >       err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
> >       if (err)
> >               return err;
> > --
> > 2.34.1
> >
>
> --
> Kees Cook

* Re: [PATCH bpf-next 2/8] bpf: inline map creation logic in map_create() function
  2023-04-12 17:53   ` Kees Cook
@ 2023-04-13  0:22     ` Andrii Nakryiko
  0 siblings, 0 replies; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-13  0:22 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrii Nakryiko, bpf, ast, daniel, kpsingh, paul, linux-security-module

On Wed, Apr 12, 2023 at 10:53 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Tue, Apr 11, 2023 at 09:32:54PM -0700, Andrii Nakryiko wrote:
> > Keep all the relevant generic sanity checks, permission checks, and
> > creation and initialization logic in one linear piece of code. Currently
> > helper function that handles memory allocation and partial
> > initialization is split apart and is about 1000 lines higher in the
> > file, hurting readability.
>
> At first glance, this seems like a step in the wrong direction: having a
> single-purpose function pulled out of a larger one seems like a good
> thing for stuff like unit testing, etc. Unless there's a reason later in
> the series for this inlining (which should be called out in the
> changelog here), I would say if it is only readability, just move the
> function down 1000 lines but leave it a separate function.

Oh, I should probably clarify this in the commit message. This
function isn't really single-purpose. It performs some sanity
checking and then allocates and (partially) initializes the BPF map
itself. "Inlining" it makes it possible to perform these sanity
checks first, then do capabilities/security checks, and only if both
pass, allocate and initialize the map. The next patch centralizes
all the spread-out capabilities checks from per-map custom callbacks
into the same function, right before performing map allocation and
initialization, but after validation of parameters.

So yeah, I do take advantage of this in the next patch, because LSM
hook gets validated bpf_uattr. I'll call this out more clearly. It's
definitely not just moving code around for no good reason.
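
Roughly, the ordering I'm after looks like the sketch below. This is a simplified userspace model, not the actual map_create() code: struct toy_attr, struct toy_map, NUM_MAP_TYPES, the permitted flag and the allocations counter are all made up for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified request and object types. */
struct toy_attr {
	unsigned int map_type;
	unsigned int max_entries;
};

struct toy_map {
	unsigned int map_type;
	unsigned int max_entries;
};

#define NUM_MAP_TYPES 4u	/* hypothetical; type 0 is "unspecified" */

static int allocations;		/* counts how often step 3 was reached */

static int create_map(const struct toy_attr *attr, bool permitted,
		      struct toy_map **out)
{
	struct toy_map *map;

	/* 1. generic sanity checks on the request */
	if (attr->map_type == 0 || attr->map_type >= NUM_MAP_TYPES ||
	    attr->max_entries == 0)
		return -EINVAL;

	/* 2. permission checks, run against already validated attributes */
	if (!permitted)
		return -EPERM;

	/* 3. only now allocate and initialize the object */
	map = calloc(1, sizeof(*map));
	if (!map)
		return -ENOMEM;
	allocations++;
	map->map_type = attr->map_type;
	map->max_entries = attr->max_entries;
	*out = map;
	return 0;
}
```

A malformed request gets -EINVAL before any permission logic runs, and a denied request never reaches the allocator.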


>
> -Kees
>
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  kernel/bpf/syscall.c | 54 ++++++++++++++++++--------------------------
> >  1 file changed, 22 insertions(+), 32 deletions(-)
> >
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index c1d268025985..a090737f98ea 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -108,37 +108,6 @@ const struct bpf_map_ops bpf_map_offload_ops = {
> >       .map_mem_usage = bpf_map_offload_map_mem_usage,
> >  };
> >
> > -static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
> > -{
> > -     const struct bpf_map_ops *ops;
> > -     u32 type = attr->map_type;
> > -     struct bpf_map *map;
> > -     int err;
> > -
> > -     if (type >= ARRAY_SIZE(bpf_map_types))
> > -             return ERR_PTR(-EINVAL);
> > -     type = array_index_nospec(type, ARRAY_SIZE(bpf_map_types));
> > -     ops = bpf_map_types[type];
> > -     if (!ops)
> > -             return ERR_PTR(-EINVAL);
> > -
> > -     if (ops->map_alloc_check) {
> > -             err = ops->map_alloc_check(attr);
> > -             if (err)
> > -                     return ERR_PTR(err);
> > -     }
> > -     if (attr->map_ifindex)
> > -             ops = &bpf_map_offload_ops;
> > -     if (!ops->map_mem_usage)
> > -             return ERR_PTR(-EINVAL);
> > -     map = ops->map_alloc(attr);
> > -     if (IS_ERR(map))
> > -             return map;
> > -     map->ops = ops;
> > -     map->map_type = type;
> > -     return map;
> > -}
> > -
> >  static void bpf_map_write_active_inc(struct bpf_map *map)
> >  {
> >       atomic64_inc(&map->writecnt);
> > @@ -1124,7 +1093,9 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
> >  /* called via syscall */
> >  static int map_create(union bpf_attr *attr)
> >  {
> > +     const struct bpf_map_ops *ops;
> >       int numa_node = bpf_map_attr_numa_node(attr);
> > +     u32 map_type = attr->map_type;
> >       struct btf_field_offs *foffs;
> >       struct bpf_map *map;
> >       int f_flags;
> > @@ -1167,9 +1138,28 @@ static int map_create(union bpf_attr *attr)
> >               return -EINVAL;
> >
> >       /* find map type and init map: hashtable vs rbtree vs bloom vs ... */
> > -     map = find_and_alloc_map(attr);
> > +     map_type = attr->map_type;
> > +     if (map_type >= ARRAY_SIZE(bpf_map_types))
> > +             return -EINVAL;
> > +     map_type = array_index_nospec(map_type, ARRAY_SIZE(bpf_map_types));
> > +     ops = bpf_map_types[map_type];
> > +     if (!ops)
> > +             return -EINVAL;
> > +
> > +     if (ops->map_alloc_check) {
> > +             err = ops->map_alloc_check(attr);
> > +             if (err)
> > +                     return err;
> > +     }
> > +     if (attr->map_ifindex)
> > +             ops = &bpf_map_offload_ops;
> > +     if (!ops->map_mem_usage)
> > +             return -EINVAL;
> > +     map = ops->map_alloc(attr);
> >       if (IS_ERR(map))
> >               return PTR_ERR(map);
> > +     map->ops = ops;
> > +     map->map_type = map_type;
> >
> >       err = bpf_obj_name_cpy(map->name, attr->map_name,
> >                              sizeof(attr->map_name));
> > --
> > 2.34.1
> >
>
> --
> Kees Cook

* Re: [PATCH bpf-next 3/8] bpf: centralize permissions checks for all BPF map types
  2023-04-12 18:01   ` Kees Cook
@ 2023-04-13  0:23     ` Andrii Nakryiko
  0 siblings, 0 replies; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-13  0:23 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrii Nakryiko, bpf, ast, daniel, kpsingh, paul, linux-security-module

On Wed, Apr 12, 2023 at 11:01 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Tue, Apr 11, 2023 at 09:32:55PM -0700, Andrii Nakryiko wrote:
> > This allows to do more centralized decisions later on, and generally
> > makes it very explicit which maps are privileged and which are not.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > [...]
> > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > index 00c253b84bf5..c69db80fc947 100644
> > --- a/kernel/bpf/hashtab.c
> > +++ b/kernel/bpf/hashtab.c
> > @@ -422,12 +422,6 @@ static int htab_map_alloc_check(union bpf_attr *attr)
> >       BUILD_BUG_ON(offsetof(struct htab_elem, fnode.next) !=
> >                    offsetof(struct htab_elem, hash_node.pprev));
> >
> > -     if (lru && !bpf_capable())
> > -             /* LRU implementation is much complicated than other
> > -              * maps.  Hence, limit to CAP_BPF.
> > -              */
> > -             return -EPERM;
> > -
>
> The LRU part of this check gets lost, doesn't it? More specifically,
> doesn't this make the security check for htab_map_alloc_check() more
> strict than before? (If that's okay, please mention the logical change
> in the commit log.)

The patch diff doesn't make this very obvious, unfortunately, but the
lru variable is defined as

        bool lru = (attr->map_type == BPF_MAP_TYPE_LRU_HASH ||
                    attr->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH);

And below I'm adding an explicit big switch where BPF_MAP_TYPE_LRU_HASH
and BPF_MAP_TYPE_LRU_PERCPU_HASH do a bpf_capable() check, while non-LRU
hashes (like BPF_MAP_TYPE_HASH and BPF_MAP_TYPE_PERCPU_HASH) do not.
So I think the semantics were preserved.
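
As a sanity check of that claim, here is a small userspace sketch that
puts the old per-map shape next to the new centralized shape for the
hash family only. The enum and both functions are hypothetical
simplifications (the capability is modeled as a plain bool), not the
kernel code:

```c
#include <stdbool.h>

/* Hypothetical subset of map types, for illustration only. */
enum sketch_map_type {
	SKETCH_HASH,
	SKETCH_PERCPU_HASH,
	SKETCH_LRU_HASH,
	SKETCH_LRU_PERCPU_HASH,
};

/* Old shape: the per-map callback derives "lru" and checks capability. */
static bool old_htab_allowed(enum sketch_map_type type, bool capable)
{
	bool lru = (type == SKETCH_LRU_HASH ||
		    type == SKETCH_LRU_PERCPU_HASH);

	if (lru && !capable)
		return false;
	return true;
}

/* New shape: one central switch lists privileged types explicitly. */
static bool new_central_allowed(enum sketch_map_type type, bool capable)
{
	switch (type) {
	case SKETCH_LRU_HASH:
	case SKETCH_LRU_PERCPU_HASH:
		return capable;
	case SKETCH_HASH:
	case SKETCH_PERCPU_HASH:
		return true; /* unprivileged */
	default:
		return false; /* unknown types fail safe */
	}
}
```

For the four listed types both functions agree for either value of
capable; the only behavioral difference is the fail-safe default for
types the switch does not know about.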


>
> > [...]
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index a090737f98ea..cbea4999e92f 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -1101,17 +1101,6 @@ static int map_create(union bpf_attr *attr)
> >       int f_flags;
> >       int err;
> >
> > -     /* Intent here is for unprivileged_bpf_disabled to block key object
> > -      * creation commands for unprivileged users; other actions depend
> > -      * of fd availability and access to bpffs, so are dependent on
> > -      * object creation success.  Capabilities are later verified for
> > -      * operations such as load and map create, so even with unprivileged
> > -      * BPF disabled, capability checks are still carried out for these
> > -      * and other operations.
> > -      */
> > -     if (!bpf_capable() && sysctl_unprivileged_bpf_disabled)
> > -             return -EPERM;
> > -
>
> Given that this was already performing a centralized capability check,
> why were the individual functions doing checks before too?
>
> (I'm wondering if the individual functions remain the better place to do
> this checking?)

This sysctl_unprivileged_bpf_disabled was added much later to tighten
up security across any type of map/program. Just keep in mind that
sysctl_unprivileged_bpf_disabled is not mandatory, so some distros
might choose not to restrict unprivileged map creation yet.

So I think a centralized check makes more sense. And as you noticed
below, it allows us to easily be more strict by default (if we forget
to add a bpf_capable() check for a new map type).

>
> >       err = CHECK_ATTR(BPF_MAP_CREATE);
> >       if (err)
> >               return -EINVAL;
> > @@ -1155,6 +1144,65 @@ static int map_create(union bpf_attr *attr)
> >               ops = &bpf_map_offload_ops;
> >       if (!ops->map_mem_usage)
> >               return -EINVAL;
> > +
> > +     /* Intent here is for unprivileged_bpf_disabled to block key object
> > +      * creation commands for unprivileged users; other actions depend
> > +      * of fd availability and access to bpffs, so are dependent on
> > +      * object creation success.  Capabilities are later verified for
> > +      * operations such as load and map create, so even with unprivileged
> > +      * BPF disabled, capability checks are still carried out for these
> > +      * and other operations.
> > +      */
> > +     if (!bpf_capable() && sysctl_unprivileged_bpf_disabled)
> > +             return -EPERM;
> > +
> > +     /* check privileged map type permissions */
> > +     switch (map_type) {
> > +     case BPF_MAP_TYPE_SK_STORAGE:
> > +     case BPF_MAP_TYPE_INODE_STORAGE:
> > +     case BPF_MAP_TYPE_TASK_STORAGE:
> > +     case BPF_MAP_TYPE_CGRP_STORAGE:
> > +     case BPF_MAP_TYPE_BLOOM_FILTER:
> > +     case BPF_MAP_TYPE_LPM_TRIE:
> > +     case BPF_MAP_TYPE_REUSEPORT_SOCKARRAY:
> > +     case BPF_MAP_TYPE_STACK_TRACE:
> > +     case BPF_MAP_TYPE_QUEUE:
> > +     case BPF_MAP_TYPE_STACK:
> > +     case BPF_MAP_TYPE_LRU_HASH:
> > +     case BPF_MAP_TYPE_LRU_PERCPU_HASH:
> > +     case BPF_MAP_TYPE_STRUCT_OPS:
> > +     case BPF_MAP_TYPE_CPUMAP:
> > +             if (!bpf_capable())
> > +                     return -EPERM;
> > +             break;
> > +     case BPF_MAP_TYPE_SOCKMAP:
> > +     case BPF_MAP_TYPE_SOCKHASH:
> > +     case BPF_MAP_TYPE_DEVMAP:
> > +     case BPF_MAP_TYPE_DEVMAP_HASH:
> > +     case BPF_MAP_TYPE_XSKMAP:
> > +             if (!capable(CAP_NET_ADMIN))
> > +                     return -EPERM;
> > +             break;
> > +     case BPF_MAP_TYPE_ARRAY:
> > +     case BPF_MAP_TYPE_PERCPU_ARRAY:
> > +     case BPF_MAP_TYPE_PROG_ARRAY:
> > +     case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
> > +     case BPF_MAP_TYPE_CGROUP_ARRAY:
> > +     case BPF_MAP_TYPE_ARRAY_OF_MAPS:
> > +     case BPF_MAP_TYPE_HASH:
> > +     case BPF_MAP_TYPE_PERCPU_HASH:
> > +     case BPF_MAP_TYPE_HASH_OF_MAPS:
> > +     case BPF_MAP_TYPE_RINGBUF:
> > +     case BPF_MAP_TYPE_USER_RINGBUF:
> > +     case BPF_MAP_TYPE_CGROUP_STORAGE:
> > +     case BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE:
> > +             /* unprivileged */
> > +             break;
> > +     default:
> > +             WARN(1, "unsupported map type %d", map_type);
> > +             return -EPERM;
>
> Thank you for making sure this fails safe! :)

Sure :)


>
> > +     }
> > +
> >       map = ops->map_alloc(attr);
> >       if (IS_ERR(map))
> >               return PTR_ERR(map);
> > diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> > index 7c189c2e2fbf..4b67bb5e7f9c 100644
> > --- a/net/core/sock_map.c
> > +++ b/net/core/sock_map.c
> > @@ -32,8 +32,6 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
> >  {
> >       struct bpf_stab *stab;
> >
> > -     if (!capable(CAP_NET_ADMIN))
> > -             return ERR_PTR(-EPERM);
> >       if (attr->max_entries == 0 ||
> >           attr->key_size    != 4 ||
> >           (attr->value_size != sizeof(u32) &&
> > @@ -1085,8 +1083,6 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
> >       struct bpf_shtab *htab;
> >       int i, err;
> >
> > -     if (!capable(CAP_NET_ADMIN))
> > -             return ERR_PTR(-EPERM);
> >       if (attr->max_entries == 0 ||
> >           attr->key_size    == 0 ||
> >           (attr->value_size != sizeof(u32) &&
> > diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c
> > index 2c1427074a3b..e1c526f97ce3 100644
> > --- a/net/xdp/xskmap.c
> > +++ b/net/xdp/xskmap.c
> > @@ -5,7 +5,6 @@
> >
> >  #include <linux/bpf.h>
> >  #include <linux/filter.h>
> > -#include <linux/capability.h>
> >  #include <net/xdp_sock.h>
> >  #include <linux/slab.h>
> >  #include <linux/sched.h>
> > @@ -68,9 +67,6 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
> >       int numa_node;
> >       u64 size;
> >
> > -     if (!capable(CAP_NET_ADMIN))
> > -             return ERR_PTR(-EPERM);
> > -
> >       if (attr->max_entries == 0 || attr->key_size != 4 ||
> >           attr->value_size != 4 ||
> >           attr->map_flags & ~(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY))
> > diff --git a/tools/testing/selftests/bpf/prog_tests/unpriv_bpf_disabled.c b/tools/testing/selftests/bpf/prog_tests/unpriv_bpf_disabled.c
> > index 8383a99f610f..0adf8d9475cb 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/unpriv_bpf_disabled.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/unpriv_bpf_disabled.c
> > @@ -171,7 +171,11 @@ static void test_unpriv_bpf_disabled_negative(struct test_unpriv_bpf_disabled *s
> >                               prog_insns, prog_insn_cnt, &load_opts),
> >                 -EPERM, "prog_load_fails");
> >
> > -     for (i = BPF_MAP_TYPE_HASH; i <= BPF_MAP_TYPE_BLOOM_FILTER; i++)
> > +     /* some map types require particular correct parameters which could be
> > +      * sanity-checked before enforcing -EPERM, so only validate that
> > +      * the simple ARRAY and HASH maps are failing with -EPERM
> > +      */
> > +     for (i = BPF_MAP_TYPE_HASH; i <= BPF_MAP_TYPE_ARRAY; i++)
> >               ASSERT_EQ(bpf_map_create(i, NULL, sizeof(int), sizeof(int), 1, NULL),
> >                         -EPERM, "map_create_fails");
> >
> > --
> > 2.34.1
> >
>
> --
> Kees Cook

* Re: [PATCH bpf-next 4/8] bpf, lsm: implement bpf_map_create_security LSM hook
  2023-04-12 18:20   ` Kees Cook
@ 2023-04-13  0:23     ` Andrii Nakryiko
  0 siblings, 0 replies; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-13  0:23 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrii Nakryiko, bpf, ast, daniel, kpsingh, paul, linux-security-module

On Wed, Apr 12, 2023 at 11:20 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Tue, Apr 11, 2023 at 09:32:56PM -0700, Andrii Nakryiko wrote:
> > Add new LSM hook, bpf_map_create_security, that allows custom LSM
> > security policies controlling BPF map creation permissions granularly
> > and precisely.
>
> Naming nit-pick: the hook name doesn't need the "_security" suffix, if I'm
> reading this correctly. The LSM hooks with that are really around the
> allocation/initialization of LSM-specific memory (i.e. attach
> LSM-specific allocation to an inode, etc).

Ah, I didn't know about this convention. I thought that _security is
the preferred way, as we have a bunch more BPF-related hooks with this
naming pattern (bpf_map_free_security, bpf_prog_free_security).

>
> The hook looks like it's "only" policy, so it can just be called
> "bpf_map_create".

Yep, I'll drop the _security suffix, no need to add to the confusion.


>
> > This new LSM hook allows to implement both LSM policy that could enforce
> > more granular and restrictive decisions about which processes can create
> > which BPF maps, by rejecting BPF map creation based on passed in
> > bpf_attr attributes. But also it allows to bypass CAP_BPF and
> > CAP_NET_ADMIN restrictions, normally enforced by kernel, for
> > applications that LSM policy deems trusted. Trustworthiness
> > determination of the process/user/cgroup/etc is left up to custom LSM
> > hook implementation and will depend on particular production setup of
> > each individual use case.
>
> As Paul mentioned, we need to give a careful examination of the access
> control logic here. BPF does not deal with POSIX or DAC rules, so I think
> there isn't a problem being flexible here, but it would be nice to find
> a way to make this be "default reject" via capabilities that doesn't
> differ much from the way things happen normally in the LSM (so that it
> can be successfully reasoned about without need to consider BPF-specific
> "special cases").

It is definitely not my intent to create unnecessary special casing
here. I think it does "default reject" (modulo cases when one doesn't
require extra permissions already), see below, but if I missed
anything, please do point it out.

>
> > If LSM policy wants to rely on default kernel logic, it can return
> > 0 to delegate back to kernel. If it returns >0 return code,
> > kernel will bypass its normal checks. This way it's possible to perform
> > a delegation of trust (specifically for BPF map creation) from
> > privileged LSM custom policy implementation to unprivileged user
> > process, verified and trusted by custom LSM policy.
>
> At the least, I think the language of "bypass" is going to cause a lot
> of friction. :) We need to make sure this fails safe -- if there is no
> loaded policy, capable() needs to remain the back-stop.

I was under the impression that that's how it works already. These
hooks use the call_int_hook() helper and specify 0 as the default. If
no LSM hook is installed, the returned result will stay 0, which will
fall through to the normal kernel checks we have.

When you say "make sure this fails safe", do you mean I should just
double-check these semantics, or is there some extra code and checks
that I should add to make sure this works as expected?
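
A simplified userspace model of that fall-through behavior is sketched
here. It assumes the hook chain starts from the default value and stops
at the first hook returning non-zero; sketch_call_int_hook(),
sketch_check() and the three toy hooks are hypothetical names, and the
capability is modeled as a plain int:

```c
#include <errno.h>
#include <stddef.h>

typedef int (*sketch_hook_fn)(void);

/* Start from the default and stop at the first non-zero decision. */
static int sketch_call_int_hook(const sketch_hook_fn *hooks, size_t n,
				int default_rc)
{
	int rc = default_rc;
	size_t i;

	for (i = 0; i < n; i++) {
		rc = hooks[i]();
		if (rc != 0)
			break;
	}
	return rc;
}

/* Toy hooks covering the three outcomes described in the patch. */
static int hook_delegate(void) { return 0; }
static int hook_allow(void) { return 1; }
static int hook_reject(void) { return -EPERM; }

/* Caller side: <0 rejects, >0 overrides, 0 falls back to capabilities. */
static int sketch_check(const sketch_hook_fn *hooks, size_t n, int capable)
{
	int rc = sketch_call_int_hook(hooks, n, 0);

	if (rc < 0)
		return rc;
	if (rc > 0)
		return 0;
	return capable ? 0 : -EPERM;
}
```

With no hooks installed the result stays 0 in this model, so the
capability check remains the back-stop.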


>
> > Such model allows flexible and secure-by-default approach where user
> > processes that need to use BPF features (BPF map creation, in this case)
> > are left unprivileged with no CAP_BPF, CAP_NET_ADMIN, CAP_PERFMON, etc.
> > capabilities, but specific exceptions are implemented (usually in
> > a centralized server fleet-wide fashion) for trusted
> > processes/containers/users, allowing them to manipulate BPF facilities,
> > as long as they are allowed and known apriori.
>
> if (!unprivileged_allowed(...) && !capable(...))
>         return -EPERM;
>
> and unprivileged_allowed() is looking at the sysctl and LSM policy.
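
One possible reading of the shape suggested above, as a userspace
sketch. This is only an assumption about how the sysctl and the LSM
decision could be combined; struct sketch_ctx and both functions are
hypothetical, and per-map-type capability requirements are left out:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical inputs to the decision; not kernel state. */
struct sketch_ctx {
	bool unpriv_bpf_disabled; /* models the sysctl */
	int lsm_decision;         /* <0 reject, 0 no opinion, >0 trusted */
	bool has_cap;             /* models capable() */
};

/* Unprivileged use is fine if policy trusts us or the sysctl allows it. */
static bool sketch_unprivileged_allowed(const struct sketch_ctx *ctx)
{
	if (ctx->lsm_decision > 0)
		return true;
	return !ctx->unpriv_bpf_disabled;
}

static int sketch_may_create(const struct sketch_ctx *ctx)
{
	/* An explicit LSM rejection always wins. */
	if (ctx->lsm_decision < 0)
		return -EPERM;

	/* The recognizable "neither allowed nor capable" shape. */
	if (!sketch_unprivileged_allowed(ctx) && !ctx->has_cap)
		return -EPERM;
	return 0;
}
```

The point of the shape is that there is no jump over the checks: the
policy result is just one more input to a single condition.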

Makes sense, I'll refactor all this to have this more recognizable
"shape" to make the intent clearer, thanks.

>
> >
> > This patch implements first required part for full-fledged BPF usage:
> > map creation. The other one, BPF program load, will be addressed in
> > follow up patches.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  include/linux/lsm_hook_defs.h |  1 +
> >  include/linux/lsm_hooks.h     | 12 ++++++++++++
> >  include/linux/security.h      |  6 ++++++
> >  kernel/bpf/bpf_lsm.c          |  1 +
> >  kernel/bpf/syscall.c          | 19 ++++++++++++++++---
> >  security/security.c           |  4 ++++
> >  6 files changed, 40 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
> > index 094b76dc7164..b4fe9ed7021a 100644
> > --- a/include/linux/lsm_hook_defs.h
> > +++ b/include/linux/lsm_hook_defs.h
> > @@ -396,6 +396,7 @@ LSM_HOOK(void, LSM_RET_VOID, audit_rule_free, void *lsmrule)
> >  LSM_HOOK(int, 0, bpf, int cmd, union bpf_attr *attr, unsigned int size)
> >  LSM_HOOK(int, 0, bpf_map, struct bpf_map *map, fmode_t fmode)
> >  LSM_HOOK(int, 0, bpf_prog, struct bpf_prog *prog)
> > +LSM_HOOK(int, 0, bpf_map_create_security, const union bpf_attr *attr)
> >  LSM_HOOK(int, 0, bpf_map_alloc_security, struct bpf_map *map)
> >  LSM_HOOK(void, LSM_RET_VOID, bpf_map_free_security, struct bpf_map *map)
> >  LSM_HOOK(int, 0, bpf_prog_alloc_security, struct bpf_prog_aux *aux)
> > diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> > index 6e156d2acffc..42bf7c0aa4d8 100644
> > --- a/include/linux/lsm_hooks.h
> > +++ b/include/linux/lsm_hooks.h
> > @@ -1598,6 +1598,18 @@
> >   *   @prog: bpf prog that userspace want to use.
> >   *   Return 0 if permission is granted.
> >   *
> > + * @bpf_map_create_security:
> > + *   Do a check to determine permission to create requested BPF map.
> > + *   Implementation can override kernel capabilities checks according to
> > + *   the rules below:
> > + *     - 0 should be returned to delegate permission checks to other
> > + *       installed LSM callbacks and/or hard-wired kernel logic, which
> > + *       would enforce CAP_BPF/CAP_NET_ADMIN capabilities;
> > + *     - reject BPF map creation by returning -EPERM or any other
> > + *       negative error code;
> > + *     - allow BPF map creation, overriding kernel checks, by returning
> > + *       a positive result.
> > + *
> >   * @bpf_map_alloc_security:
> >   *   Initialize the security field inside bpf map.
> >   *   Return 0 on success, error on failure.
> > diff --git a/include/linux/security.h b/include/linux/security.h
> > index 5984d0d550b4..e5374fe92ef6 100644
> > --- a/include/linux/security.h
> > +++ b/include/linux/security.h
> > @@ -2023,6 +2023,7 @@ struct bpf_prog_aux;
> >  extern int security_bpf(int cmd, union bpf_attr *attr, unsigned int size);
> >  extern int security_bpf_map(struct bpf_map *map, fmode_t fmode);
> >  extern int security_bpf_prog(struct bpf_prog *prog);
> > +extern int security_bpf_map_create(const union bpf_attr *attr);
> >  extern int security_bpf_map_alloc(struct bpf_map *map);
> >  extern void security_bpf_map_free(struct bpf_map *map);
> >  extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux);
> > @@ -2044,6 +2045,11 @@ static inline int security_bpf_prog(struct bpf_prog *prog)
> >       return 0;
> >  }
> >
> > +static inline int security_bpf_map_create(const union bpf_attr *attr)
> > +{
> > +     return 0;
> > +}
>
> I would expect this to be something like:
>
>         return sysctl_unprivileged_bpf_disabled ? -EPERM : 0;

So I'd need to duplicate this check in two places: the default
security_bpf_map_create() stub used if !CONFIG_SECURITY &&
!CONFIG_BPF_SYSCALL and the actual one when both are defined. Do you
think it's worth it to duplicate this check instead of having it
checked (or skipped if the LSM hook allows it) explicitly in
map_create() in kernel/bpf/syscall.c?

I personally find it harder to keep track of overall logic if it's
spread like this, as sysctl_unprivileged_bpf_disabled is not really
LSM-related.

>
> > +
> >  static inline int security_bpf_map_alloc(struct bpf_map *map)
> >  {
> >       return 0;
> > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > index e14c822f8911..931d4dda5dac 100644
> > --- a/kernel/bpf/bpf_lsm.c
> > +++ b/kernel/bpf/bpf_lsm.c
> > @@ -260,6 +260,7 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >  BTF_SET_START(sleepable_lsm_hooks)
> >  BTF_ID(func, bpf_lsm_bpf)
> >  BTF_ID(func, bpf_lsm_bpf_map)
> > +BTF_ID(func, bpf_lsm_bpf_map_create_security)
> >  BTF_ID(func, bpf_lsm_bpf_map_alloc_security)
> >  BTF_ID(func, bpf_lsm_bpf_map_free_security)
> >  BTF_ID(func, bpf_lsm_bpf_prog)
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index cbea4999e92f..7d1165814efc 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -980,7 +980,7 @@ int map_check_no_btf(const struct bpf_map *map,
> >  }
> >
> >  static int map_check_btf(struct bpf_map *map, const struct btf *btf,
> > -                      u32 btf_key_id, u32 btf_value_id)
> > +                      u32 btf_key_id, u32 btf_value_id, bool priv_checked)
> >  {
> >       const struct btf_type *key_type, *value_type;
> >       u32 key_size, value_size;
> > @@ -1008,7 +1008,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
> >       if (!IS_ERR_OR_NULL(map->record)) {
> >               int i;
> >
> > -             if (!bpf_capable()) {
> > +             if (!priv_checked && !bpf_capable()) {
> >                       ret = -EPERM;
> >                       goto free_map_tab;
> >               }
> > @@ -1097,10 +1097,12 @@ static int map_create(union bpf_attr *attr)
> >       int numa_node = bpf_map_attr_numa_node(attr);
> >       u32 map_type = attr->map_type;
> >       struct btf_field_offs *foffs;
> > +     bool priv_checked = false;
> >       struct bpf_map *map;
> >       int f_flags;
> >       int err;
> >
> > +     /* sanity checks */
> >       err = CHECK_ATTR(BPF_MAP_CREATE);
> >       if (err)
> >               return -EINVAL;
> > @@ -1145,6 +1147,15 @@ static int map_create(union bpf_attr *attr)
> >       if (!ops->map_mem_usage)
> >               return -EINVAL;
> >
> > +     /* security checks */
> > +     err = security_bpf_map_create(attr);
> > +     if (err < 0)
> > +             return err;
> > +     if (err > 0) {
> > +             priv_checked = true;
> > +             goto skip_priv_checks;
> > +     }
>
> I think we can refactor this to avoid the concept of "skipping" checks.

Yep, makes sense, will do

>
> Also, I think passing "priv_checked" is kind of confusing -- I feel like
> access control should either be centralized or in each individual
> function. Why is there a need to split this up?

Yeah, I hate this bit. There is this extra bpf_capable() check much
later on, done only if a particular BPF map happens to have custom
user-defined extra features (like a spin lock and similar).
There is no way to know this upfront without doing lots of preparatory
work. So there has to be something to let that later code say that
it's ok to use this advanced feature.

I'm actually thinking of having a bool flag on struct bpf_map itself to
record whether the BPF map is considered to be "privileged", so that
other advanced features like that won't have to do a bpf_capable()
check; they will just check this recorded bool. We have a
similar approach for BPF programs, where we remember during
verification whether a BPF program had CAP_BPF, which influences which
features are allowed for it.

This will be a bit cleaner, I think.
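
A rough userspace sketch of that idea follows. The privileged field,
struct sketch_map and both functions are hypothetical; the real field
name and where it would be set are not decided in this thread:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct bpf_map with the proposed flag. */
struct sketch_map {
	bool privileged;    /* recorded once, at creation time */
	bool has_spin_lock; /* example of an "advanced" feature */
};

/* Record the privilege decision made during map creation. */
static void sketch_map_init(struct sketch_map *map, bool capable,
			    int lsm_decision)
{
	map->privileged = capable || lsm_decision > 0;
}

/* Later code consults the recorded flag instead of re-checking caps. */
static int sketch_check_btf_features(const struct sketch_map *map)
{
	if (map->has_spin_lock && !map->privileged)
		return -EPERM;
	return 0;
}
```

With this shape nothing like priv_checked has to be threaded through
function arguments; the later check just reads the flag from the map.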

>
> > +
> >       /* Intent here is for unprivileged_bpf_disabled to block key object
> >        * creation commands for unprivileged users; other actions depend
> >        * of fd availability and access to bpffs, so are dependent on
> > @@ -1203,6 +1214,8 @@ static int map_create(union bpf_attr *attr)
> >               return -EPERM;
> >       }
> >
> > +skip_priv_checks:
> > +     /* create and init map */
> >       map = ops->map_alloc(attr);
> >       if (IS_ERR(map))
> >               return PTR_ERR(map);
> > @@ -1243,7 +1256,7 @@ static int map_create(union bpf_attr *attr)
> >
> >               if (attr->btf_value_type_id) {
> >                       err = map_check_btf(map, btf, attr->btf_key_type_id,
> > -                                         attr->btf_value_type_id);
> > +                                         attr->btf_value_type_id, priv_checked);
> >                       if (err)
> >                               goto free_map;
> >               }
> > diff --git a/security/security.c b/security/security.c
> > index cf6cc576736f..f9b885680966 100644
> > --- a/security/security.c
> > +++ b/security/security.c
> > @@ -2682,6 +2682,10 @@ int security_bpf_prog(struct bpf_prog *prog)
> >  {
> >       return call_int_hook(bpf_prog, 0, prog);
> >  }
> > +int security_bpf_map_create(const union bpf_attr *attr)
> > +{
> > +     return call_int_hook(bpf_map_create_security, 0, attr);
>
> And the default return value here wouldn't be 0, but:
>
>         sysctl_unprivileged_bpf_disabled ?  -EPERM : 0

Replied above; I do find that hiding sysctl_unprivileged_bpf_disabled
handling so deep makes following the overall flow more confusing.


>
> > +}
> >  int security_bpf_map_alloc(struct bpf_map *map)
> >  {
> >       return call_int_hook(bpf_map_alloc_security, 0, map);
> > --
> > 2.34.1
> >
>
> --
> Kees Cook

* Re: [PATCH bpf-next 5/8] selftests/bpf: validate new bpf_map_create_security LSM hook
  2023-04-12 18:23   ` Kees Cook
@ 2023-04-13  0:23     ` Andrii Nakryiko
  0 siblings, 0 replies; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-13  0:23 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrii Nakryiko, bpf, ast, daniel, kpsingh, paul, linux-security-module

On Wed, Apr 12, 2023 at 11:23 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Tue, Apr 11, 2023 at 09:32:57PM -0700, Andrii Nakryiko wrote:
> > Add selftests that goes over every known map type and validates that
> > a combination of privileged/unprivileged modes and allow/reject/pass-through
> > LSM policy decisions behave as expected.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  .../selftests/bpf/prog_tests/lsm_map_create.c | 143 ++++++++++++++++++
> >  .../selftests/bpf/progs/lsm_map_create.c      |  32 ++++
> >  tools/testing/selftests/bpf/test_progs.h      |   6 +
> >  3 files changed, 181 insertions(+)
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/lsm_map_create.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/lsm_map_create.c
> >

[...]

> > +             ret = libbpf_probe_bpf_map_type(map_type, NULL);
> > +             ASSERT_EQ(ret, is_map_priv ? 0 : 1,  "default_unpriv_mode");
> > +
> > +             /* allow any map creation for our thread */
> > +             skel->bss->decision = 1;
> > +             ret = libbpf_probe_bpf_map_type(map_type, NULL);
> > +             ASSERT_EQ(ret, 1, "lsm_allow_unpriv_mode");
> > +
> > +             /* reject any map creation for our thread */
> > +             skel->bss->decision = -1;
> > +             ret = libbpf_probe_bpf_map_type(map_type, NULL);
> > +             ASSERT_EQ(ret, 0, "lsm_reject_unpriv_mode");
> > +
> > +             /* restore privileges, but keep reject LSM policy */
> > +             if (!ASSERT_OK(restore_priv_caps(orig_caps), "restore_caps"))
> > +                     goto cleanup;
> > +
> > +skip_if_needs_btf:
> > +             /* even with all caps map create will fail */
> > +             skel->bss->decision = -1;
> > +             ret = libbpf_probe_bpf_map_type(map_type, NULL);
> > +             ASSERT_EQ(ret, 0, "lsm_reject_priv_mode");
> > +     }
> > +
> > +cleanup:
> > +     btf__free(btf);
> > +     lsm_map_create__destroy(skel);
> > +}
>
> This test looks good! One meta-comment about testing would be: are you
> sure each needs to be ASSERT instead of EXPECT? (i.e. should forward
> progress through this test always be aborted when a check fails?)
>

These are our custom BPF selftests ASSERTs; they don't really do
assert() and panic, they really are just a check (so I'm guessing they
have the EXPECT semantics you are referring to). And if a check doesn't
pass, we just set a flag notifying our own custom test runner that the
test failed and proceed.

So in short, it already behaves like you would want with EXPECT. We
just don't use kselftest's ASSERTs and EXPECTs.
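
To make the difference from assert() concrete, here is a minimal
userspace sketch of a non-fatal check. sketch_assert_eq() and
sketch_fail_cnt are hypothetical and only mimic the described behavior;
they are not the selftests' actual macros:

```c
#include <stdbool.h>
#include <stdio.h>

/* Failure counter that a test runner would inspect afterwards. */
static int sketch_fail_cnt;

/* Non-fatal check: records the failure and reports the result. */
static bool sketch_assert_eq(long actual, long expected, const char *name)
{
	bool ok = actual == expected;

	if (!ok) {
		sketch_fail_cnt++;
		fprintf(stderr, "%s: actual %ld != expected %ld\n",
			name, actual, expected);
	}
	return ok;
}

/* Returns how many steps were reached despite a failing check. */
static int sketch_run_checks(void)
{
	int reached = 0;

	sketch_assert_eq(1, 2, "first_check"); /* fails, does not abort */
	reached++;

	/* A caller can still opt into bailing out early if it wants to. */
	if (!sketch_assert_eq(3, 3, "second_check"))
		return reached;
	reached++;

	return reached;
}
```

The first check fails but execution continues, which is the EXPECT-like
behavior; the early exit is an explicit choice by the caller, as with
the "goto cleanup" in the quoted test.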


> > diff --git a/tools/testing/selftests/bpf/progs/lsm_map_create.c b/tools/testing/selftests/bpf/progs/lsm_map_create.c
> > new file mode 100644
> > index 000000000000..093825c68459
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/lsm_map_create.c
> > @@ -0,0 +1,32 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
> > +
> > +#include "vmlinux.h"
> > +#include <bpf/bpf_helpers.h>
> > +#include <bpf/bpf_tracing.h>
> > +#include <errno.h>
> > +
> > +char _license[] SEC("license") = "GPL";
> > +
> > +int my_tid;
> > +/* LSM enforcement:
> > + *   - 0, delegate to kernel;
> > + *   - 1, allow;
> > + *   - -1, reject.
> > + */
> > +int decision;
> > +
> > +SEC("lsm/bpf_map_create_security")
> > +int BPF_PROG(allow_unpriv_maps, union bpf_attr *attr)
> > +{
> > +     if (!my_tid || (u32)bpf_get_current_pid_tgid() != my_tid)
> > +             return 0; /* keep processing LSM hooks */
> > +
> > +     if (decision == 0)
> > +             return 0;
> > +
> > +     if (decision > 0)
> > +             return 1; /* allow */
> > +
> > +     return -EPERM;
> > +}
> > diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
> > index 10ba43250668..12f9c6652d40 100644
> > --- a/tools/testing/selftests/bpf/test_progs.h
> > +++ b/tools/testing/selftests/bpf/test_progs.h
> > @@ -23,6 +23,7 @@ typedef __u16 __sum16;
> >  #include <linux/perf_event.h>
> >  #include <linux/socket.h>
> >  #include <linux/unistd.h>
> > +#include <sys/syscall.h>
> >
> >  #include <sys/ioctl.h>
> >  #include <sys/wait.h>
> > @@ -176,6 +177,11 @@ void test__skip(void);
> >  void test__fail(void);
> >  int test__join_cgroup(const char *path);
> >
> > +static inline int gettid(void)
> > +{
> > +     return syscall(SYS_gettid);
> > +}
> > +
> >  #define PRINT_FAIL(format...)                                                  \
> >       ({                                                                     \
> >               test__fail();                                                  \
> > --
> > 2.34.1
> >
>
> --
> Kees Cook

* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-12 19:06         ` Paul Moore
@ 2023-04-13  1:43           ` Andrii Nakryiko
  2023-04-13  2:56             ` Paul Moore
  2023-04-13 16:27             ` Casey Schaufler
  0 siblings, 2 replies; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-13  1:43 UTC (permalink / raw)
  To: Paul Moore
  Cc: Kees Cook, Andrii Nakryiko, bpf, ast, daniel, kpsingh,
	linux-security-module

On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> > On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> > > On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> > > > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > > > >
> > > > > > Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> > > > > > are meant to allow highly-granular LSM-based control over the usage of BPF
> > > > > > subsytem. Specifically, to control the creation of BPF maps and BTF data
> > > > > > objects, which are fundamental building blocks of any modern BPF application.
> > > > > >
> > > > > > These new hooks are able to override default kernel-side CAP_BPF-based (and
> > > > > > sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> > > > > > implement LSM policies that could granularly enforce more restrictions on
> > > > > > a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> > > > > > capabilities), but also, importantly, allow to *bypass kernel-side
> > > > > > enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> > > > > > cases.
> > > > >
> > > > > One of the hallmarks of the LSM has always been that it is
> > > > > non-authoritative: it cannot unilaterally grant access, it can only
> > > > > restrict what would have been otherwise permitted on a traditional
> > > > > Linux system.  Put another way, a LSM should not undermine the Linux
> > > > > discretionary access controls, e.g. capabilities.
> > > > >
> > > > > If there is a problem with the eBPF capability-based access controls,
> > > > > that problem needs to be addressed in how the core eBPF code
> > > > > implements its capability checks, not by modifying the LSM mechanism
> > > > > to bypass these checks.
> > > >
> > > > I think semantics matter here. I wouldn't view this as _bypassing_
> > > > capability enforcement: it's just more fine-grained access control.

Exactly. One of the motivations for this work was the need to move
some production use cases, which need extra privileges only so that
they can use BPF, into a more restrictive environment. Granting
CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN to all such use cases that need them
for BPF usage is too coarse-grained. These caps would allow those
applications way more than just BPF usage. So the idea here is
finer-grained control of BPF-specific operations, granting *effective*
CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN caps dynamically based on custom
production logic that would validate the use case.

This *is* an attempt to achieve a more secure production approach.

> > > >
> > > > For example, in many places we have things like:
> > > >
> > > >         if (!some_check(...) && !capable(...))
> > > >                 return -EPERM;
> > > >
> > > > I would expect this is a similar logic. An operation can succeed if the
> > > > access control requirement is met. The mismatch we have through-out the
> > > > kernel is that capability checks aren't strictly done by LSM hooks. And
> > > > this series conceptually, I think, doesn't violate that -- it's changing
> > > > the logic of the capability checks, not the LSM (i.e. there no LSM hooks
> > > > yet here).
> > >
> > > Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> > > when it returns a positive value "bypasses kernel checks".  The patch
> > > isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> > > based on a eBPF tree, so I can't say with 100% certainty that it is
> > > bypassing a capability check, but the description claims that to be
> > > the case.
> > >
> > > Regardless of how you want to spin this, I'm not supportive of a LSM
> > > hook which allows a LSM to bypass a capability check.  A LSM hook can
> > > be used to provide additional access control restrictions beyond a
> > > capability check, but a LSM hook should never be allowed to overrule
> > > an access denial due to a capability check.
> > >
> > > > The reason CAP_BPF was created was because there was nothing else that
> > > > would be fine-grained enough at the time.
> > >
> > > The LSM layer predates CAP_BPF, and one could make a very solid
> > > argument that one of the reasons LSMs exist is to provide
> > > supplementary controls due to capability-based access controls being a
> > > poor fit for many modern use cases.
> >
> > I generally agree with what you say, but we DO have this code pattern:
> >
> >          if (!some_check(...) && !capable(...))
> >                  return -EPERM;
>
> I think we need to make this more concrete; we don't have a pattern in
> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> Simply because there is another kernel access control mechanism which
> allows a capability check to be skipped doesn't mean I want to allow a
> LSM hook to be used to skip a capability check.

This work is an attempt to tighten the security of production systems
by making it possible to drop overly coarse-grained and permissive
capabilities (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which
inevitably allow more than production use cases are meant to be able
to do) and then grant specific BPF operations on specific BPF
programs/maps based on custom LSM security policy, which validates
application trustworthiness using custom production-specific logic.

Isn't this goal in line with the LSM's mission to enhance system security?

>
> > It looks to me like this series can be refactored to do the same. I
> > wouldn't consider that to be a "bypass", but I would agree the current
> > series looks too much like "bypass", and makes reasoning about the
> > effect of the LSM hooks too "special". :)

Sorry, I didn't realize that the current code layout is making things
more confusing. I'll address feedback to make the intent a bit
clearer.

>
> --
> paul-moore.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH bpf-next 7/8] bpf, lsm: implement bpf_btf_load_security LSM hook
  2023-04-12 16:52   ` Paul Moore
@ 2023-04-13  1:43     ` Andrii Nakryiko
  2023-04-13  2:47       ` Paul Moore
  0 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-13  1:43 UTC (permalink / raw)
  To: Paul Moore
  Cc: Andrii Nakryiko, bpf, ast, daniel, kpsingh, keescook,
	linux-security-module

On Wed, Apr 12, 2023 at 9:53 AM Paul Moore <paul@paul-moore.com> wrote:
>
> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > Add new LSM hook, bpf_btf_load_security, that allows custom LSM security
> > policies controlling BTF data loading permissions (BPF_BTF_LOAD command
> > of bpf() syscall) granularly and precisely.
> >
> > This complements bpf_map_create_security LSM hook added earlier and
> > follow the same semantics: 0 means perform standard kernel capabilities-based
> > checks, negative error rejects BTF object load, while positive one skips
> > CAP_BPF check and allows BTF data object creation.
> >
> > With this hook, together with bpf_map_create_security, we now can also allow
> > trusted unprivileged process to create BPF maps that require BTF, which
> > we take advantaged in the next patch to improve the coverage of added
> > BPF selftest.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  include/linux/lsm_hook_defs.h |  1 +
> >  include/linux/lsm_hooks.h     | 13 +++++++++++++
> >  include/linux/security.h      |  6 ++++++
> >  kernel/bpf/bpf_lsm.c          |  1 +
> >  kernel/bpf/syscall.c          | 10 ++++++++++
> >  security/security.c           |  4 ++++
> >  6 files changed, 35 insertions(+)
>
> ...
>
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 42d8473237ab..bbf70bddc770 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -4449,12 +4449,22 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
> >
> >  static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
> >  {
> > +       int err;
> > +
> >         if (CHECK_ATTR(BPF_BTF_LOAD))
> >                 return -EINVAL;
> >
> > +       /* security checks */
> > +       err = security_bpf_btf_load(attr);
> > +       if (err < 0)
> > +               return err;
> > +       if (err > 0)
> > +               goto skip_priv_checks;
> > +
> >         if (!bpf_capable())
> >                 return -EPERM;
> >
> > +skip_priv_checks:
> >         return btf_new_fd(attr, uattr, uattr_size);
> >  }
>
> Beyond the objection I brought up in the patchset cover letter, I
> believe the work of the security_bpf_btf_load() hook presented here
> could be done by the existing security_bpf() LSM hook.  If you believe
> that not to be the case, please let me know.

security_bpf() could prevent BTF object loading only, but
security_bpf_btf_load() can *also* allow *trusted* (according to LSM
policy) unprivileged process to proceed. So it doesn't seem like they
are interchangeable.
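
A minimal userspace sketch of the two return-value conventions being
contrasted here, modeled on the bpf_btf_load() hunk quoted above. The
names restrictive_gate(), tristate_gate(), and gate_self_check(), and
the has_cap_bpf flag standing in for bpf_capable(), are hypothetical
and used only for illustration; this is not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Restrictive-only convention: a negative hook result denies the
 * operation, anything else falls through to the capability check.
 */
int restrictive_gate(int hook_rc, bool has_cap_bpf)
{
    if (hook_rc < 0)
        return hook_rc;     /* LSM denied */
    if (!has_cap_bpf)
        return -EPERM;      /* capability check always applies */
    return 0;
}

/*
 * Tri-state convention from the patch: <0 denies, 0 defers to the
 * capability check, >0 skips the capability check entirely.
 */
int tristate_gate(int hook_rc, bool has_cap_bpf)
{
    if (hook_rc < 0)
        return hook_rc;     /* LSM denied */
    if (hook_rc > 0)
        return 0;           /* corresponds to "goto skip_priv_checks" */
    if (!has_cap_bpf)
        return -EPERM;
    return 0;
}

/* Exercises the one case where the two conventions disagree. */
void gate_self_check(void)
{
    /* Unprivileged caller, hook returns a positive "trusted" verdict. */
    assert(restrictive_gate(1, false) == -EPERM);
    assert(tristate_gate(1, false) == 0);

    /* Both conventions agree when the hook defers or denies. */
    assert(restrictive_gate(0, true) == 0);
    assert(tristate_gate(0, true) == 0);
    assert(restrictive_gate(-EACCES, true) == -EACCES);
    assert(tristate_gate(-EACCES, true) == -EACCES);
}
```

The only input on which the two functions differ is a positive hook
result for a caller without the capability, which is exactly the case
under discussion.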


>
> --
> paul-moore.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH bpf-next 7/8] bpf, lsm: implement bpf_btf_load_security LSM hook
  2023-04-13  1:43     ` Andrii Nakryiko
@ 2023-04-13  2:47       ` Paul Moore
  0 siblings, 0 replies; 52+ messages in thread
From: Paul Moore @ 2023-04-13  2:47 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, ast, daniel, kpsingh, keescook,
	linux-security-module

On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Wed, Apr 12, 2023 at 9:53 AM Paul Moore <paul@paul-moore.com> wrote:
> > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > Add new LSM hook, bpf_btf_load_security, that allows custom LSM security
> > > policies controlling BTF data loading permissions (BPF_BTF_LOAD command
> > > of bpf() syscall) granularly and precisely.
> > >
> > > This complements bpf_map_create_security LSM hook added earlier and
> > > follow the same semantics: 0 means perform standard kernel capabilities-based
> > > checks, negative error rejects BTF object load, while positive one skips
> > > CAP_BPF check and allows BTF data object creation.
> > >
> > > With this hook, together with bpf_map_create_security, we now can also allow
> > > trusted unprivileged process to create BPF maps that require BTF, which
> > > we take advantaged in the next patch to improve the coverage of added
> > > BPF selftest.
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > >  include/linux/lsm_hook_defs.h |  1 +
> > >  include/linux/lsm_hooks.h     | 13 +++++++++++++
> > >  include/linux/security.h      |  6 ++++++
> > >  kernel/bpf/bpf_lsm.c          |  1 +
> > >  kernel/bpf/syscall.c          | 10 ++++++++++
> > >  security/security.c           |  4 ++++
> > >  6 files changed, 35 insertions(+)
> >
> > ...
> >
> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > index 42d8473237ab..bbf70bddc770 100644
> > > --- a/kernel/bpf/syscall.c
> > > +++ b/kernel/bpf/syscall.c
> > > @@ -4449,12 +4449,22 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
> > >
> > >  static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
> > >  {
> > > +       int err;
> > > +
> > >         if (CHECK_ATTR(BPF_BTF_LOAD))
> > >                 return -EINVAL;
> > >
> > > +       /* security checks */
> > > +       err = security_bpf_btf_load(attr);
> > > +       if (err < 0)
> > > +               return err;
> > > +       if (err > 0)
> > > +               goto skip_priv_checks;
> > > +
> > >         if (!bpf_capable())
> > >                 return -EPERM;
> > >
> > > +skip_priv_checks:
> > >         return btf_new_fd(attr, uattr, uattr_size);
> > >  }
> >
> > Beyond the objection I brought up in the patchset cover letter, I
> > believe the work of the security_bpf_btf_load() hook presented here
> > could be done by the existing security_bpf() LSM hook.  If you believe
> > that not to be the case, please let me know.
>
> security_bpf() could prevent BTF object loading only, but
> security_bpf_btf_load() can *also* allow *trusted* (according to LSM
> policy) unprivileged process to proceed. So it doesn't seem like they
> are interchangeable.

As discussed in the cover letter thread, I'm opposed to using a LSM
hook to skip/bypass/circumvent/etc. existing capability checks.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-13  1:43           ` Andrii Nakryiko
@ 2023-04-13  2:56             ` Paul Moore
  2023-04-13  5:16               ` Andrii Nakryiko
  2023-04-13 16:27             ` Casey Schaufler
  1 sibling, 1 reply; 52+ messages in thread
From: Paul Moore @ 2023-04-13  2:56 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Kees Cook, Andrii Nakryiko, bpf, ast, daniel, kpsingh,
	linux-security-module

On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> > On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> > > On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> > > > On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > > > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:

...

> > > > > For example, in many places we have things like:
> > > > >
> > > > >         if (!some_check(...) && !capable(...))
> > > > >                 return -EPERM;
> > > > >
> > > > > I would expect this is a similar logic. An operation can succeed if the
> > > > > access control requirement is met. The mismatch we have through-out the
> > > > > kernel is that capability checks aren't strictly done by LSM hooks. And
> > > > > this series conceptually, I think, doesn't violate that -- it's changing
> > > > > the logic of the capability checks, not the LSM (i.e. there no LSM hooks
> > > > > yet here).
> > > >
> > > > Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> > > > when it returns a positive value "bypasses kernel checks".  The patch
> > > > isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> > > > based on a eBPF tree, so I can't say with 100% certainty that it is
> > > > bypassing a capability check, but the description claims that to be
> > > > the case.
> > > >
> > > > Regardless of how you want to spin this, I'm not supportive of a LSM
> > > > hook which allows a LSM to bypass a capability check.  A LSM hook can
> > > > be used to provide additional access control restrictions beyond a
> > > > capability check, but a LSM hook should never be allowed to overrule
> > > > an access denial due to a capability check.
> > > >
> > > > > The reason CAP_BPF was created was because there was nothing else that
> > > > > would be fine-grained enough at the time.
> > > >
> > > > The LSM layer predates CAP_BPF, and one could make a very solid
> > > > argument that one of the reasons LSMs exist is to provide
> > > > supplementary controls due to capability-based access controls being a
> > > > poor fit for many modern use cases.
> > >
> > > I generally agree with what you say, but we DO have this code pattern:
> > >
> > >          if (!some_check(...) && !capable(...))
> > >                  return -EPERM;
> >
> > I think we need to make this more concrete; we don't have a pattern in
> > the upstream kernel where 'some_check(...)' is a LSM hook, right?
> > Simply because there is another kernel access control mechanism which
> > allows a capability check to be skipped doesn't mean I want to allow a
> > LSM hook to be used to skip a capability check.
>
> This work is an attempt to tighten the security of production systems
> by allowing to drop too coarse-grained and permissive capabilities
> (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitable allow more
> than production use cases are meant to be able to do) and then grant
> specific BPF operations on specific BPF programs/maps based on custom
> LSM security policy, which validates application trustworthiness using
> custom production-specific logic.

There are ways to leverage the LSMs to apply finer grained access
control on top of the relatively coarse capabilities that do not
require circumventing those capability controls.  One grants the
capabilities, just as one would do today, and then leverages the
security functionality of a LSM to further restrict specific users,
applications, etc. with a level of granularity beyond that offered by
the capability controls.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-13  2:56             ` Paul Moore
@ 2023-04-13  5:16               ` Andrii Nakryiko
  2023-04-13 15:11                 ` Paul Moore
                                   ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-13  5:16 UTC (permalink / raw)
  To: Paul Moore
  Cc: Kees Cook, Andrii Nakryiko, bpf, ast, daniel, kpsingh,
	linux-security-module

On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> > > On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> > > > On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> > > > > On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > > > > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> ...
>
> > > > > > For example, in many places we have things like:
> > > > > >
> > > > > >         if (!some_check(...) && !capable(...))
> > > > > >                 return -EPERM;
> > > > > >
> > > > > > I would expect this is a similar logic. An operation can succeed if the
> > > > > > access control requirement is met. The mismatch we have through-out the
> > > > > > kernel is that capability checks aren't strictly done by LSM hooks. And
> > > > > > this series conceptually, I think, doesn't violate that -- it's changing
> > > > > > the logic of the capability checks, not the LSM (i.e. there no LSM hooks
> > > > > > yet here).
> > > > >
> > > > > Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> > > > > when it returns a positive value "bypasses kernel checks".  The patch
> > > > > isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> > > > > based on a eBPF tree, so I can't say with 100% certainty that it is
> > > > > bypassing a capability check, but the description claims that to be
> > > > > the case.
> > > > >
> > > > > Regardless of how you want to spin this, I'm not supportive of a LSM
> > > > > hook which allows a LSM to bypass a capability check.  A LSM hook can
> > > > > be used to provide additional access control restrictions beyond a
> > > > > capability check, but a LSM hook should never be allowed to overrule
> > > > > an access denial due to a capability check.
> > > > >
> > > > > > The reason CAP_BPF was created was because there was nothing else that
> > > > > > would be fine-grained enough at the time.
> > > > >
> > > > > The LSM layer predates CAP_BPF, and one could make a very solid
> > > > > argument that one of the reasons LSMs exist is to provide
> > > > > supplementary controls due to capability-based access controls being a
> > > > > poor fit for many modern use cases.
> > > >
> > > > I generally agree with what you say, but we DO have this code pattern:
> > > >
> > > >          if (!some_check(...) && !capable(...))
> > > >                  return -EPERM;
> > >
> > > I think we need to make this more concrete; we don't have a pattern in
> > > the upstream kernel where 'some_check(...)' is a LSM hook, right?
> > > Simply because there is another kernel access control mechanism which
> > > allows a capability check to be skipped doesn't mean I want to allow a
> > > LSM hook to be used to skip a capability check.
> >
> > This work is an attempt to tighten the security of production systems
> > by allowing to drop too coarse-grained and permissive capabilities
> > (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitable allow more
> > than production use cases are meant to be able to do) and then grant
> > specific BPF operations on specific BPF programs/maps based on custom
> > LSM security policy, which validates application trustworthiness using
> > custom production-specific logic.
>
> There are ways to leverage the LSMs to apply finer grained access
> control on top of the relatively coarse capabilities that do not
> require circumventing those capability controls.  One grants the
> capabilities, just as one would do today, and then leverages the
> security functionality of a LSM to further restrict specific users,
> applications, etc. with a level of granularity beyond that offered by
> the capability controls.

Please help me understand something. What you and Casey are proposing,
when taken to the logical extreme, is to grant root permissions to all
processes and then use LSM to restrict specific actions, do I
understand correctly? This strikes me as a less secure and more
error-prone way of doing things. If there is some problem with
installing LSM policy, it could go unnoticed for a really long time,
while the system would be way more vulnerable. Why do you prefer such
an approach instead of going with no extra permissions by default, but
allowing custom LSM policy to grant a few exceptions for known and
trusted use cases?

By the way, even the above proposal of yours doesn't work for
production use cases when user namespaces are involved, as far as I
understand. We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for
containers running inside user namespaces, as CAP_BPF in a non-init
namespace is not enough for the bpf() syscall to allow creating BPF
maps or loading BPF programs (bpf() doesn't do ns_capable(), it's only
using capable()). What solution would you suggest for such production
setups?

Also, in a previous email you said:

> Simply because there is another kernel access control mechanism which
> allows a capability check to be skipped doesn't mean I want to allow a
> LSM hook to be used to skip a capability check.

I understand your stated position, but can you please help me
understand the reasoning behind it? What would be wrong with some LSM
hooks granting effective capabilities? How would that change anything
about LSM design? As far as I can see, I'm not doing anything crazy
with my LSM hook implementation. It's reusing the standard
call_int_hook() mechanism very straightforwardly with a default result
of 0. The caller then just interprets 0, <0, and >0 results
accordingly. Is that abusing the LSM mechanism itself somehow?
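
A minimal userspace sketch of the hook walk referred to above,
assuming the walk starts from a default result and stops at the first
hook that returns non-zero. The names hook_fn, call_hooks(),
hook_defer(), hook_deny(), hook_allow(), and hook_walk_self_check()
are hypothetical; the kernel macro operates on per-hook lists rather
than on an array of function pointers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*hook_fn)(void);

/* Walk the hooks in registration order; the first non-zero result wins. */
int call_hooks(const hook_fn *hooks, size_t n, int default_rc)
{
    int rc = default_rc;

    for (size_t i = 0; i < n; i++) {
        rc = hooks[i]();
        if (rc != 0)
            break;
    }
    return rc;
}

int hook_defer(void) { return 0; }       /* no opinion */
int hook_deny(void)  { return -EPERM; }  /* reject the operation */
int hook_allow(void) { return 1; }       /* positive "trusted" verdict */

void hook_walk_self_check(void)
{
    const hook_fn defer_then_deny[] = { hook_defer, hook_deny };
    const hook_fn allow_then_deny[] = { hook_allow, hook_deny };

    /* No hooks registered: the default result is returned. */
    assert(call_hooks(NULL, 0, 0) == 0);
    /* A deferring hook lets a later hook deny. */
    assert(call_hooks(defer_then_deny, 2, 0) == -EPERM);
    /* A positive result ends the walk before the later hook runs. */
    assert(call_hooks(allow_then_deny, 2, 0) == 1);
}
```

Under this assumption, a positive result from one module ends the walk
before any later module is consulted, so ordering matters once more
than one module implements the hook.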

Does the above also mean that you'd be fine if we just don't plug into
the LSM subsystem at all and instead come up with some ad-hoc solution
to allow effectively the same policies? This sounds detrimental both
to LSM and BPF subsystems, so I hope we can talk this through before
finalizing decisions.

Lastly, you mentioned before:

> > > I think we need to make this more concrete; we don't have a pattern in
> > > the upstream kernel where 'some_check(...)' is a LSM hook, right?

Unfortunately I don't have enough familiarity with all LSM hooks, so I
can't confirm or disprove the above statement. But earlier someone
brought to my attention the case of security_vm_enough_memory_mm(),
which seems to be granting effectively CAP_SYS_ADMIN for the purposes
of memory accounting. Am I missing something subtle there or does it
grant effective caps indeed?




>
> --
> paul-moore.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-13  5:16               ` Andrii Nakryiko
@ 2023-04-13 15:11                 ` Paul Moore
  2023-04-17 23:29                   ` Andrii Nakryiko
  2023-04-13 16:54                 ` Casey Schaufler
  2023-04-13 19:03                 ` Jonathan Corbet
  2 siblings, 1 reply; 52+ messages in thread
From: Paul Moore @ 2023-04-13 15:11 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Kees Cook, Andrii Nakryiko, bpf, ast, daniel, kpsingh,
	linux-security-module

On Thu, Apr 13, 2023 at 1:16 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@paul-moore.com> wrote:
> > On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > > On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> > > > > > On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > > > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > > > > > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > ...
> >
> > > > > > > For example, in many places we have things like:
> > > > > > >
> > > > > > >         if (!some_check(...) && !capable(...))
> > > > > > >                 return -EPERM;
> > > > > > >
> > > > > > > I would expect this is a similar logic. An operation can succeed if the
> > > > > > > access control requirement is met. The mismatch we have through-out the
> > > > > > > kernel is that capability checks aren't strictly done by LSM hooks. And
> > > > > > > this series conceptually, I think, doesn't violate that -- it's changing
> > > > > > > the logic of the capability checks, not the LSM (i.e. there no LSM hooks
> > > > > > > yet here).
> > > > > >
> > > > > > Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> > > > > > when it returns a positive value "bypasses kernel checks".  The patch
> > > > > > isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> > > > > > based on a eBPF tree, so I can't say with 100% certainty that it is
> > > > > > bypassing a capability check, but the description claims that to be
> > > > > > the case.
> > > > > >
> > > > > > Regardless of how you want to spin this, I'm not supportive of a LSM
> > > > > > hook which allows a LSM to bypass a capability check.  A LSM hook can
> > > > > > be used to provide additional access control restrictions beyond a
> > > > > > capability check, but a LSM hook should never be allowed to overrule
> > > > > > an access denial due to a capability check.
> > > > > >
> > > > > > > The reason CAP_BPF was created was because there was nothing else that
> > > > > > > would be fine-grained enough at the time.
> > > > > >
> > > > > > The LSM layer predates CAP_BPF, and one could make a very solid
> > > > > > argument that one of the reasons LSMs exist is to provide
> > > > > > supplementary controls due to capability-based access controls being a
> > > > > > poor fit for many modern use cases.
> > > > >
> > > > > I generally agree with what you say, but we DO have this code pattern:
> > > > >
> > > > >          if (!some_check(...) && !capable(...))
> > > > >                  return -EPERM;
> > > >
> > > > I think we need to make this more concrete; we don't have a pattern in
> > > > the upstream kernel where 'some_check(...)' is a LSM hook, right?
> > > > Simply because there is another kernel access control mechanism which
> > > > allows a capability check to be skipped doesn't mean I want to allow a
> > > > LSM hook to be used to skip a capability check.
> > >
> > > This work is an attempt to tighten the security of production systems
> > > by allowing to drop too coarse-grained and permissive capabilities
> > > (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitable allow more
> > > than production use cases are meant to be able to do) and then grant
> > > specific BPF operations on specific BPF programs/maps based on custom
> > > LSM security policy, which validates application trustworthiness using
> > > custom production-specific logic.
> >
> > There are ways to leverage the LSMs to apply finer grained access
> > control on top of the relatively coarse capabilities that do not
> > require circumventing those capability controls.  One grants the
> > capabilities, just as one would do today, and then leverages the
> > security functionality of a LSM to further restrict specific users,
> > applications, etc. with a level of granularity beyond that offered by
> > the capability controls.
>
> Please help me understand something. What you and Casey are proposing,
> when taken to the logical extreme, is to grant to all processes root
> permissions and then use LSM to restrict specific actions, do I
> understand correctly? This strikes me as a less secure and more
> error-prone way of doing things.

When taken to the "logical extreme" most concepts end up sounding a
bit absurd, but that was the point, wasn't it?

Here is a fun story which seems relevant ... in the early days of
SELinux, one of the community devs set up a system with a SELinux
policy which restricted all privileged operations from the root user,
put the system on a publicly accessible network, posted the root
password for all to see, and invited the public to log in to the system
and attempt to exercise root privilege (it's been well over 10 years
at this point so the details are a bit fuzzy).  Granted, there were
some hiccups in the beginning, mostly due to the crude state of policy
development/analysis at the time, but after a few policy revisions the
system held up quite well.

On the more practical side of things, there are several use cases
which require, by way of legal or contractual requirements, that full
root/admin privileges are decomposed into separate roles: security
admin, audit admin, backup admin, etc.  These users satisfy these
requirements by using LSMs, such as SELinux, to restrict the
administrative capabilities based on the SELinux user/role/domain.

> By the way, even the above proposal of yours doesn't work for
> production use cases when user namespaces are involved, as far as I
> understand. We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for
> containers running inside user namespaces, as CAP_BPF in non-init
> namespace is not enough for bpf() syscall to allow loading BPF maps or
> BPF program ...

Once again, the LSM has always been intended to be a restrictive
mechanism, not a privilege granting mechanism.  If an operation is not
possible without the LSM layer enabled, it should not be possible with
the LSM layer enabled.  The LSM is not a mechanism to circumvent other
access control mechanisms in the kernel.

> Also, in previous email you said:
>
> > Simply because there is another kernel access control mechanism which
> > allows a capability check to be skipped doesn't mean I want to allow a
> > LSM hook to be used to skip a capability check.
>
> I understand your stated position, but can you please help me
> understand the reasoning behind it?

Keeping the LSM as a restrictive access control mechanism helps ensure
some level of sanity and consistency across different Linux
installations.  If a certain operation requires CAP_SYS_ADMIN on one
Linux system, it should require CAP_SYS_ADMIN on another Linux system.
Granted, a LSM running on one system might impose additional
constraints on that operation, but the CAP_SYS_ADMIN requirement still
applies.

There is also an issue of safety in knowing that enabling a LSM will
not degrade the access controls on a system by potentially granting
operations that were previously denied.
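
The safety property described above can be sketched as an exhaustive
check over a toy model. The names restrictive_allows(),
authoritative_allows(), and the two never_widens_*() helpers are
hypothetical and exist only for illustration; lsm_rc uses the <0 / 0 /
>0 convention from the patch under discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Restrictive LSM: access needs the capability AND the LSM's consent. */
bool restrictive_allows(bool cap_ok, bool lsm_ok)
{
    return cap_ok && lsm_ok;
}

/* Authoritative variant: lsm_rc < 0 denies, 0 defers, > 0 grants. */
bool authoritative_allows(bool cap_ok, int lsm_rc)
{
    if (lsm_rc < 0)
        return false;
    if (lsm_rc > 0)
        return true;
    return cap_ok;
}

/* True if no LSM verdict can allow what the capability check denies. */
bool never_widens_restrictive(void)
{
    for (int lsm_ok = 0; lsm_ok <= 1; lsm_ok++)
        if (restrictive_allows(false, lsm_ok))
            return false;
    return true;
}

bool never_widens_authoritative(void)
{
    for (int lsm_rc = -1; lsm_rc <= 1; lsm_rc++)
        if (authoritative_allows(false, lsm_rc))
            return false;
    return true;
}

void widening_self_check(void)
{
    assert(never_widens_restrictive());
    assert(!never_widens_authoritative());
}
```

In this model the restrictive form can never allow an operation that
the capability check alone would deny, while the tri-state form can,
whenever the hook returns a positive value.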

> Does the above also mean that you'd be fine if we just don't plug into
> the LSM subsystem at all and instead come up with some ad-hoc solution
> to allow effectively the same policies? This sounds detrimental both
> to LSM and BPF subsystems, so I hope we can talk this through before
> finalizing decisions.

Based on your patches and our discussion, it seems to me that the
problem you are trying to resolve is related more to the
capability-based access controls in eBPF, and possibly other
kernel subsystems, and not any LSM-based restrictions.  I'm happy to
work with you on a solution involving the LSM, but please understand
that I'm not going to support a solution which changes a core
philosophy of the LSM layer.

> Lastly, you mentioned before:
>
> > > > I think we need to make this more concrete; we don't have a pattern in
> > > > the upstream kernel where 'some_check(...)' is a LSM hook, right?
>
> Unfortunately I don't have enough familiarity with all LSM hooks, so I
> can't confirm or disprove the above statement. But earlier someone
> brought to my attention the case of security_vm_enough_memory_mm(),
> which seems to be granting effectively CAP_SYS_ADMIN for the purposes
> of memory accounting. Am I missing something subtle there or does it
> grant effective caps indeed?

Some of the comments around that hook can be misleading, but if you
look at the actual code it starts to make more sense.

First, look at the LSM-disabled case and you'll see that the
security_vm_enough_memory_mm() hook ends up looking like this:

int security_vm_enough_memory_mm(...)
{
  return __vm_enough_memory(mm, pages, cap_vm_enough_memory(mm, pages));
}

... which basically calls into the core capability code to check for
CAP_SYS_ADMIN, passing the result onto __vm_enough_memory.

If we then look at the LSM-enabled case, things are a little more
complicated, but it looks something like this:

int security_vm_enough_memory_mm(...)
{
  int cap_admin = 1;
  int rc;

  for_each_lsm_hook(...) {
    rc = lsm_hook(...);
    if (rc <= 0) {
      cap_admin = 0;
      break;
    }
  }

  return __vm_enough_memory(mm, pages, cap_admin);
}

... which as the comment says, "If all of the modules agree that it
should be set it will. If any module thinks it should not be set it
won't.".  However, if we look at which LSMs define vm_enough_memory()
hooks we see just two: the capability LSM, and SELinux.  The
capability LSM[1] just uses cap_vm_enough_memory() so that's
straightforward, and the SELinux hook is selinux_vm_enough_memory(),
which simply checks the loaded SELinux policy to see if the current
task has permission to exercise the CAP_SYS_ADMIN capability.  SELinux
can't grant CAP_SYS_ADMIN beyond what the capability code permits, it
only restricts its use.  Put another way, if the capability code does
not allow CAP_SYS_ADMIN in a call to security_vm_enough_memory() then
CAP_SYS_ADMIN will not be granted regardless of what the other LSMs
may decide.
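
To make the voting logic concrete, here is a small user-space model of
it. This is a sketch, not kernel code: the model_* names and the
single-int hook signature are invented for illustration, and it assumes
the capability hook is always one of the entries in the list, as in the
description above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for one LSM's vm_enough_memory callback.
 * A positive return votes "treat the caller as having CAP_SYS_ADMIN";
 * zero or a negative value votes against.
 */
typedef int (*vm_hook_fn)(int task_has_cap_sys_admin);

/* Models the capability LSM: votes yes only if the task holds the cap. */
static int model_cap_hook(int task_has_cap_sys_admin)
{
	return task_has_cap_sys_admin ? 1 : 0;
}

/* Models an additional LSM whose policy permits use of the capability. */
static int model_lsm_permits(int task_has_cap_sys_admin)
{
	(void)task_has_cap_sys_admin;
	return 1;
}

/* Models an additional LSM whose policy forbids use of the capability. */
static int model_lsm_forbids(int task_has_cap_sys_admin)
{
	(void)task_has_cap_sys_admin;
	return 0;
}

/*
 * Same shape as the loop above: start from 1 and clear the flag on the
 * first non-positive vote. The caller is expected to include the
 * capability hook in the array.
 */
static int model_cap_admin(const vm_hook_fn *hooks, size_t nhooks,
			   int task_has_cap_sys_admin)
{
	int cap_admin = 1;

	assert(hooks != NULL && nhooks > 0);
	for (size_t i = 0; i < nhooks; i++) {
		if (hooks[i](task_has_cap_sys_admin) <= 0) {
			cap_admin = 0;
			break;
		}
	}
	return cap_admin;
}
```

With { model_cap_hook, model_lsm_permits } a task lacking the
capability still ends up with cap_admin == 0; the extra hook can only
turn a 1 into a 0, never the reverse.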

I do agree that the security_vm_enough_memory() hook is structured a
bit differently than most of the other LSM hooks, but it still
operates with the same philosophy: a LSM should only be allowed to
restrict access, a LSM should never be allowed to grant access that
would otherwise be denied by the traditional Linux access controls.

Hopefully that explanation makes sense, but if things are still a bit
fuzzy I would encourage you to go look at the code, I'm sure it will
make sense once you spend a few minutes figuring out how it works.

[1] There is a long and sorta bizarre history with the capability LSM,
but just understand it is a bit "special" in many ways, and those
"special" behaviors are intentional.

--
paul-moore.com

* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-13  1:43           ` Andrii Nakryiko
  2023-04-13  2:56             ` Paul Moore
@ 2023-04-13 16:27             ` Casey Schaufler
  2023-04-17 23:31               ` Andrii Nakryiko
  1 sibling, 1 reply; 52+ messages in thread
From: Casey Schaufler @ 2023-04-13 16:27 UTC (permalink / raw)
  To: Andrii Nakryiko, Paul Moore
  Cc: Kees Cook, Andrii Nakryiko, bpf, ast, daniel, kpsingh,
	linux-security-module, Casey Schaufler

On 4/12/2023 6:43 PM, Andrii Nakryiko wrote:
> On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
>> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
>>> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
>>>> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
>>>>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
>>>>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>>>>>>> Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
>>>>>>> are meant to allow highly-granular LSM-based control over the usage of BPF
>>>>>>> subsystem. Specifically, to control the creation of BPF maps and BTF data
>>>>>>> objects, which are fundamental building blocks of any modern BPF application.
>>>>>>>
>>>>>>> These new hooks are able to override default kernel-side CAP_BPF-based (and
>>>>>>> sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
>>>>>>> implement LSM policies that could granularly enforce more restrictions on
>>>>>>> a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
>>>>>>> capabilities), but also, importantly, allow to *bypass kernel-side
>>>>>>> enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
>>>>>>> cases.
>>>>>> One of the hallmarks of the LSM has always been that it is
>>>>>> non-authoritative: it cannot unilaterally grant access, it can only
>>>>>> restrict what would have been otherwise permitted on a traditional
>>>>>> Linux system.  Put another way, a LSM should not undermine the Linux
>>>>>> discretionary access controls, e.g. capabilities.
>>>>>>
>>>>>> If there is a problem with the eBPF capability-based access controls,
>>>>>> that problem needs to be addressed in how the core eBPF code
>>>>>> implements its capability checks, not by modifying the LSM mechanism
>>>>>> to bypass these checks.
>>>>> I think semantics matter here. I wouldn't view this as _bypassing_
>>>>> capability enforcement: it's just more fine-grained access control.
> Exactly. One of the motivations for this work was the need to move
> some production use cases that need extra privileges only so
> that they can use BPF into a more restrictive environment. Granting
> CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN to all such use cases that need them
> for BPF usage is too coarse-grained. These caps would allow those
> applications way more than just BPF usage. So the idea here is
> finer-grained control of BPF-specific operations, granting *effective*
> CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN caps dynamically based on custom
> production logic that would validate the use case.

That's an authoritative model which is in direct conflict with the
design and implementation of both capabilities and LSM.

>
> This *is* an attempt to achieve a more secure production approach.
>
>>>>> For example, in many places we have things like:
>>>>>
>>>>>         if (!some_check(...) && !capable(...))
>>>>>                 return -EPERM;
>>>>>
>>>>> I would expect this is a similar logic. An operation can succeed if the
>>>>> access control requirement is met. The mismatch we have through-out the
>>>>> kernel is that capability checks aren't strictly done by LSM hooks. And
>>>>> this series conceptually, I think, doesn't violate that -- it's changing
>>>>> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
>>>>> yet here).
>>>> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
>>>> when it returns a positive value "bypasses kernel checks".  The patch
>>>> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
>>>> based on an eBPF tree, so I can't say with 100% certainty that it is
>>>> bypassing a capability check, but the description claims that to be
>>>> the case.
>>>>
>>>> Regardless of how you want to spin this, I'm not supportive of a LSM
>>>> hook which allows a LSM to bypass a capability check.  A LSM hook can
>>>> be used to provide additional access control restrictions beyond a
>>>> capability check, but a LSM hook should never be allowed to overrule
>>>> an access denial due to a capability check.
>>>>
>>>>> The reason CAP_BPF was created was because there was nothing else that
>>>>> would be fine-grained enough at the time.
>>>> The LSM layer predates CAP_BPF, and one could make a very solid
>>>> argument that one of the reasons LSMs exist is to provide
>>>> supplementary controls due to capability-based access controls being a
>>>> poor fit for many modern use cases.
>>> I generally agree with what you say, but we DO have this code pattern:
>>>
>>>          if (!some_check(...) && !capable(...))
>>>                  return -EPERM;
>> I think we need to make this more concrete; we don't have a pattern in
>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
>> Simply because there is another kernel access control mechanism which
>> allows a capability check to be skipped doesn't mean I want to allow a
>> LSM hook to be used to skip a capability check.
> This work is an attempt to tighten the security of production systems
> by allowing to drop too coarse-grained and permissive capabilities
> (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> than production use cases are meant to be able to do)

The BPF developers are in complete control of what CAP_BPF controls.
You can easily address the granularity issue by adding additional restrictions
on processes that have CAP_BPF. That is the intended use of LSM.
The whole point of having multiple capabilities is so that you can
grant just those that are required by the system security policy, and
do so safely. That leads to differences of opinion regarding the definition
of the system security policy. BPF chose to set itself up as an element
of security policy (you need CAP_BPF) rather than define elements such that
existing capabilities (CAP_FOWNER, CAP_KILL, CAP_MAC_OVERRIDE, ...) would
control.

>  and then grant
> specific BPF operations on specific BPF programs/maps based on custom
> LSM security policy,

This is backwards. The correct implementation is to require CAP_BPF and
further restrict BPF operations based on a custom LSM security policy.
That's how LSM is designed.
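
As a rough user-space sketch of that ordering (not the actual bpf()
code path; model_map_create(), model_policy_hook() and the numeric map
types are made up for illustration), the capability check comes first
and the hook can only add a denial on top of it:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical LSM callback: 0 to allow, negative errno to deny. */
typedef int (*model_map_hook_fn)(int map_type);

/* Hypothetical policy: only map types 1 and 2 may be created. */
static int model_policy_hook(int map_type)
{
	return (map_type == 1 || map_type == 2) ? 0 : -EPERM;
}

/*
 * Restrictive composition: a missing capability is final, and the hook
 * is consulted only for callers that already passed that check.
 */
static int model_map_create(bool has_cap_bpf, model_map_hook_fn hook,
			    int map_type)
{
	assert(map_type >= 0);

	if (!has_cap_bpf)
		return -EPERM;
	if (hook != NULL)
		return hook(map_type);
	return 0;
}
```

Note that there is no return value the hook could use to rescue a
caller that failed the capability check; it is never reached in that
case.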

>  which validates application trustworthiness using
> custom production-specific logic.
>
> Isn't this goal in line with LSM's mission to enhance system security?

We're not arguing the goal, we're discussing the implementation.

>>> It looks to me like this series can be refactored to do the same. I
>>> wouldn't consider that to be a "bypass", but I would agree the current
>>> series looks too much like "bypass", and makes reasoning about the
>>> effect of the LSM hooks too "special". :)
> Sorry, I didn't realize that the current code layout is making things
> more confusing. I'll address feedback to make the intent a bit
> clearer.
>
>> --
>> paul-moore.com

* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-13  5:16               ` Andrii Nakryiko
  2023-04-13 15:11                 ` Paul Moore
@ 2023-04-13 16:54                 ` Casey Schaufler
  2023-04-17 23:31                   ` Andrii Nakryiko
  2023-04-13 19:03                 ` Jonathan Corbet
  2 siblings, 1 reply; 52+ messages in thread
From: Casey Schaufler @ 2023-04-13 16:54 UTC (permalink / raw)
  To: Andrii Nakryiko, Paul Moore
  Cc: Kees Cook, Andrii Nakryiko, bpf, ast, daniel, kpsingh,
	linux-security-module, Casey Schaufler

On 4/12/2023 10:16 PM, Andrii Nakryiko wrote:
> On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@paul-moore.com> wrote:
>> On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
>> <andrii.nakryiko@gmail.com> wrote:
>>> On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
>>>> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
>>>>> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
>>>>>> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
>>>>>>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
>>>>>>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>> ...
>>
>>>>>>> For example, in many places we have things like:
>>>>>>>
>>>>>>>         if (!some_check(...) && !capable(...))
>>>>>>>                 return -EPERM;
>>>>>>>
>>>>>>> I would expect this is a similar logic. An operation can succeed if the
>>>>>>> access control requirement is met. The mismatch we have through-out the
>>>>>>> kernel is that capability checks aren't strictly done by LSM hooks. And
>>>>>>> this series conceptually, I think, doesn't violate that -- it's changing
>>>>>>> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
>>>>>>> yet here).
>>>>>> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
>>>>>> when it returns a positive value "bypasses kernel checks".  The patch
>>>>>> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
>>>>>> based on an eBPF tree, so I can't say with 100% certainty that it is
>>>>>> bypassing a capability check, but the description claims that to be
>>>>>> the case.
>>>>>>
>>>>>> Regardless of how you want to spin this, I'm not supportive of a LSM
>>>>>> hook which allows a LSM to bypass a capability check.  A LSM hook can
>>>>>> be used to provide additional access control restrictions beyond a
>>>>>> capability check, but a LSM hook should never be allowed to overrule
>>>>>> an access denial due to a capability check.
>>>>>>
>>>>>>> The reason CAP_BPF was created was because there was nothing else that
>>>>>>> would be fine-grained enough at the time.
>>>>>> The LSM layer predates CAP_BPF, and one could make a very solid
>>>>>> argument that one of the reasons LSMs exist is to provide
>>>>>> supplementary controls due to capability-based access controls being a
>>>>>> poor fit for many modern use cases.
>>>>> I generally agree with what you say, but we DO have this code pattern:
>>>>>
>>>>>          if (!some_check(...) && !capable(...))
>>>>>                  return -EPERM;
>>>> I think we need to make this more concrete; we don't have a pattern in
>>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
>>>> Simply because there is another kernel access control mechanism which
>>>> allows a capability check to be skipped doesn't mean I want to allow a
>>>> LSM hook to be used to skip a capability check.
>>> This work is an attempt to tighten the security of production systems
>>> by allowing to drop too coarse-grained and permissive capabilities
>>> (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
>>> than production use cases are meant to be able to do) and then grant
>>> specific BPF operations on specific BPF programs/maps based on custom
>>> LSM security policy, which validates application trustworthiness using
>>> custom production-specific logic.
>> There are ways to leverage the LSMs to apply finer grained access
>> control on top of the relatively coarse capabilities that do not
>> require circumventing those capability controls.  One grants the
>> capabilities, just as one would do today, and then leverages the
>> security functionality of a LSM to further restrict specific users,
>> applications, etc. with a level of granularity beyond that offered by
>> the capability controls.
> Please help me understand something. What you and Casey are proposing,
> when taken to the logical extreme, is to grant to all processes root
> permissions and then use LSM to restrict specific actions, do I
> understand correctly?

No. You grant a process the capabilities it needs (CAP_BPF, CAP_WHATEVER)
and only those capabilities. If you want additional restrictions you include
an LSM that implements those restrictions. If you want finer control over
the operations controlled by CAP_BPF you include an LSM that implements
those controls.

>  This strikes me as a less secure and more
> error-prone way of doing things. If there is some problem with
> installing LSM policy,

LSMs are not required to have loadable or dynamic policies. That's
up to the developer.

>  it could go unnoticed for a really long time,
> while the system would be way more vulnerable.

There is no way Paul or I are going to solve the mis-configured system
problem.

>  Why do you prefer such
> an approach instead of going with no extra permissions by default, but
> allowing custom LSM policy to grant few exceptions for known and
> trusted use cases?

Because that's not how capabilities work. Capabilities are independent
of other controls. If you want to propose a change to how capabilities
work, you need to propose that to the capability maintainer.

Because that's not how LSMs work. LSMs implement additional restrictions
to the existing policy. The restrictive vs. authoritative debate was closed
long ago. It's a fundamental property of how LSMs work.

> By the way, even the above proposal of yours doesn't work for
> production use cases when user namespaces are involved, as far as I
> understand. We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for
> containers running inside user namespaces, as CAP_BPF in non-init
> namespace is not enough for bpf() syscall to allow loading BPF maps or
> BPF program (bpf() doesn't do ns_capable(), it's only using
> capable()). What solution would you suggest for such production
> setups?

If user namespaces don't work the way you'd like, you should take that
up with the namespace maintainers. Or, since this appears to be an issue
with BPF not being namespace aware, fix BPF's use of capable() and ns_capable().
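
For readers unfamiliar with the distinction, a simplified user-space
model of capable() vs. ns_capable() follows. It is only a sketch under
stated assumptions: the model_* types and functions are invented here,
and the model ignores the rule that gives the owner of a child user
namespace capabilities inside it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CAP_BPF 39	/* value of CAP_BPF in <linux/capability.h> */

struct model_user_ns {
	const struct model_user_ns *parent;	/* NULL for the initial ns */
};

struct model_task {
	const struct model_user_ns *ns;	/* ns the task's credentials live in */
	uint64_t effective;		/* bitmask of effective capabilities */
};

static const struct model_user_ns model_init_user_ns = { NULL };

/*
 * Walk from the target namespace toward the root. A capability held in
 * the task's own namespace counts for that namespace and its descendants.
 */
static bool model_ns_capable(const struct model_task *t,
			     const struct model_user_ns *target, int cap)
{
	assert(cap >= 0 && cap < 64);
	for (const struct model_user_ns *ns = target; ns != NULL; ns = ns->parent) {
		if (ns == t->ns)
			return (t->effective >> cap) & 1;
	}
	return false;
}

/* capable() always checks against the initial user namespace. */
static bool model_capable(const struct model_task *t, int cap)
{
	return model_ns_capable(t, &model_init_user_ns, cap);
}
```

In this model a task holding CAP_BPF only inside a container's user
namespace passes model_ns_capable() for that namespace but fails
model_capable(), which is the situation described in the quoted text.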

> Also, in previous email you said:
>
>> Simply because there is another kernel access control mechanism which
>> allows a capability check to be skipped doesn't mean I want to allow a
>> LSM hook to be used to skip a capability check.
> I understand your stated position, but can you please help me
> understand the reasoning behind it? What would be wrong with some LSM
> hooks granting effective capabilities?

You keep asking the question and ignoring the answer. See above.

>  How would that change anything
> about LSM design? As far as I can see, I'm not doing anything crazy
> with my LSM hook implementation.

You keep asking the question and ignoring the answer. See above.


>  It's reusing the standard
> call_int_hook() mechanism very straightforwardly with a default result
> of 0. And then just interprets 0, <0, and >0 results accordingly. Is
> that abusing the LSM mechanism itself somehow?
>
> Does the above also mean that you'd be fine if we just don't plug into
> the LSM subsystem at all and instead come up with some ad-hoc solution
> to allow effectively the same policies?

No, because you would be breaking the capability system in that case.

There is an example of a feature that does just what you're suggesting.
POSIX ACLs aren't an LSM because they don't just add restrictions, they
change the semantics of the file mode bits. Look at that implementation
before you seriously consider going that route.

>  This sounds detrimental both
> to LSM and BPF subsystems, so I hope we can talk this through before
> finalizing decisions.
>
> Lastly, you mentioned before:
>
>>>> I think we need to make this more concrete; we don't have a pattern in
>>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> Unfortunately I don't have enough familiarity with all LSM hooks, so I
> can't confirm or disprove the above statement. But earlier someone
> brought to my attention the case of security_vm_enough_memory_mm(),
> which seems to be granting effectively CAP_SYS_ADMIN for the purposes
> of memory accounting. Am I missing something subtle there or does it
> grant effective caps indeed?
>
>
>
>
>> --
>> paul-moore.com

* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-13  5:16               ` Andrii Nakryiko
  2023-04-13 15:11                 ` Paul Moore
  2023-04-13 16:54                 ` Casey Schaufler
@ 2023-04-13 19:03                 ` Jonathan Corbet
  2023-04-17 23:28                   ` Andrii Nakryiko
  2 siblings, 1 reply; 52+ messages in thread
From: Jonathan Corbet @ 2023-04-13 19:03 UTC (permalink / raw)
  To: Andrii Nakryiko, Paul Moore
  Cc: Kees Cook, Andrii Nakryiko, bpf, ast, daniel, kpsingh,
	linux-security-module

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> Why do you prefer such
> an approach instead of going with no extra permissions by default, but
> allowing custom LSM policy to grant few exceptions for known and
> trusted use cases?

Should you be curious, you can find some of the history of the "no
authoritative hooks" policy at:

  https://lwn.net/2001/1108/kernel.php3

It was fairly heatedly discussed at the time.

jon

* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-12 17:47   ` Kees Cook
  2023-04-12 18:06     ` Paul Moore
@ 2023-04-14 20:23     ` Dr. Greg
  2023-04-17 23:31       ` Andrii Nakryiko
  1 sibling, 1 reply; 52+ messages in thread
From: Dr. Greg @ 2023-04-14 20:23 UTC (permalink / raw)
  To: Kees Cook
  Cc: Paul Moore, Andrii Nakryiko, bpf, ast, daniel, kpsingh,
	linux-security-module

On Wed, Apr 12, 2023 at 10:47:13AM -0700, Kees Cook wrote:

Hi, I hope the week is ending well for everyone.

> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> > > are meant to allow highly-granular LSM-based control over the usage of BPF
> > > subsystem. Specifically, to control the creation of BPF maps and BTF data
> > > objects, which are fundamental building blocks of any modern BPF application.
> > >
> > > These new hooks are able to override default kernel-side CAP_BPF-based (and
> > > sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> > > implement LSM policies that could granularly enforce more restrictions on
> > > a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> > > capabilities), but also, importantly, allow to *bypass kernel-side
> > > enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> > > cases.
> > 
> > One of the hallmarks of the LSM has always been that it is
> > non-authoritative: it cannot unilaterally grant access, it can only
> > restrict what would have been otherwise permitted on a traditional
> > Linux system.  Put another way, a LSM should not undermine the Linux
> > discretionary access controls, e.g. capabilities.
> > 
> > If there is a problem with the eBPF capability-based access controls,
> > that problem needs to be addressed in how the core eBPF code
> > implements its capability checks, not by modifying the LSM mechanism
> > to bypass these checks.

> I think semantics matter here. I wouldn't view this as _bypassing_
> capability enforcement: it's just more fine-grained access control.
> 
> For example, in many places we have things like:
> 
> 	if (!some_check(...) && !capable(...))
> 		return -EPERM;
> 
> I would expect this is a similar logic. An operation can succeed if the
> access control requirement is met. The mismatch we have through-out the
> kernel is that capability checks aren't strictly done by LSM hooks. And
> this series conceptually, I think, doesn't violate that -- it's changing
> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> yet here).
> 
> The reason CAP_BPF was created was because there was nothing else that
> would be fine-grained enough at the time.

This was one of the issues, among others, that the TSEM LSM we are
working to upstream was designed to address, and it may be an avenue
forward.

TSEM, being narratival rather than deontologically based, provides a
framework for security permissions that are based on a
characterization of the event itself.  So the permissions are as
variable as the contents of whatever BPF related information is passed
to the bpf* LSM hooks [1].

Currently, the tsem_bpf_* hooks are generically modeled.  We would
certainly entertain any discussion or suggestions as to what elements
of the structures passed to the hooks would be useful with respect
to establishing security policies useful and appropriate to the BPF
community.

We don't want to get in the middle of the restrictive
vs. authoritative debate, but it would seem that the jury is
conclusively in on that issue and LSM hooks are not going to be
allowed to dismiss, or modify, any other security controls.

Hopefully the BPF ABI isn't tied to CAP_BPF as that would seem to make
it problematic to make controls more granular.

> Kees Cook

Have a good weekend.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity

[1]: Plus, developers don't need to write security policies; you test
your application in order to get the desired controls for a workload.

* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-13 19:03                 ` Jonathan Corbet
@ 2023-04-17 23:28                   ` Andrii Nakryiko
  0 siblings, 0 replies; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-17 23:28 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Paul Moore, Kees Cook, Andrii Nakryiko, bpf, ast, daniel,
	kpsingh, linux-security-module

On Thu, Apr 13, 2023 at 12:03 PM Jonathan Corbet <corbet@lwn.net> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > Why do you prefer such
> > an approach instead of going with no extra permissions by default, but
> > allowing custom LSM policy to grant few exceptions for known and
> > trusted use cases?
>
> Should you be curious, you can find some of the history of the "no
> authoritative hooks" policy at:
>
>   https://lwn.net/2001/1108/kernel.php3
>
> It was fairly heatedly discussed at the time.
>

Thanks, Jonathan! Yes, it was very useful to get a bit of context.


> jon

* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-13 15:11                 ` Paul Moore
@ 2023-04-17 23:29                   ` Andrii Nakryiko
  2023-04-18  0:47                     ` Casey Schaufler
  2023-04-18 14:21                     ` Paul Moore
  0 siblings, 2 replies; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-17 23:29 UTC (permalink / raw)
  To: Paul Moore
  Cc: Kees Cook, Andrii Nakryiko, bpf, ast, daniel, kpsingh,
	linux-security-module

On Thu, Apr 13, 2023 at 8:11 AM Paul Moore <paul@paul-moore.com> wrote:
>
> On Thu, Apr 13, 2023 at 1:16 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@paul-moore.com> wrote:
> > > On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > > On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > > On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > > On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> > > > > > > On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > > > > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > > > > > > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > ...
> > >
> > > > > > > > For example, in many places we have things like:
> > > > > > > >
> > > > > > > >         if (!some_check(...) && !capable(...))
> > > > > > > >                 return -EPERM;
> > > > > > > >
> > > > > > > > I would expect this is a similar logic. An operation can succeed if the
> > > > > > > > access control requirement is met. The mismatch we have through-out the
> > > > > > > > kernel is that capability checks aren't strictly done by LSM hooks. And
> > > > > > > > this series conceptually, I think, doesn't violate that -- it's changing
> > > > > > > > > the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> > > > > > > > yet here).
> > > > > > >
> > > > > > > Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> > > > > > > when it returns a positive value "bypasses kernel checks".  The patch
> > > > > > > isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> > > > > > > > based on an eBPF tree, so I can't say with 100% certainty that it is
> > > > > > > bypassing a capability check, but the description claims that to be
> > > > > > > the case.
> > > > > > >
> > > > > > > Regardless of how you want to spin this, I'm not supportive of a LSM
> > > > > > > hook which allows a LSM to bypass a capability check.  A LSM hook can
> > > > > > > be used to provide additional access control restrictions beyond a
> > > > > > > capability check, but a LSM hook should never be allowed to overrule
> > > > > > > an access denial due to a capability check.
> > > > > > >
> > > > > > > > The reason CAP_BPF was created was because there was nothing else that
> > > > > > > > would be fine-grained enough at the time.
> > > > > > >
> > > > > > > The LSM layer predates CAP_BPF, and one could make a very solid
> > > > > > > argument that one of the reasons LSMs exist is to provide
> > > > > > > supplementary controls due to capability-based access controls being a
> > > > > > > poor fit for many modern use cases.
> > > > > >
> > > > > > I generally agree with what you say, but we DO have this code pattern:
> > > > > >
> > > > > >          if (!some_check(...) && !capable(...))
> > > > > >                  return -EPERM;
> > > > >
> > > > > I think we need to make this more concrete; we don't have a pattern in
> > > > > the upstream kernel where 'some_check(...)' is a LSM hook, right?
> > > > > Simply because there is another kernel access control mechanism which
> > > > > allows a capability check to be skipped doesn't mean I want to allow a
> > > > > LSM hook to be used to skip a capability check.
> > > >
> > > > This work is an attempt to tighten the security of production systems
> > > > by allowing to drop too coarse-grained and permissive capabilities
> > > > > (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> > > > than production use cases are meant to be able to do) and then grant
> > > > specific BPF operations on specific BPF programs/maps based on custom
> > > > LSM security policy, which validates application trustworthiness using
> > > > custom production-specific logic.
> > >
> > > There are ways to leverage the LSMs to apply finer grained access
> > > control on top of the relatively coarse capabilities that do not
> > > require circumventing those capability controls.  One grants the
> > > capabilities, just as one would do today, and then leverages the
> > > security functionality of a LSM to further restrict specific users,
> > > applications, etc. with a level of granularity beyond that offered by
> > > the capability controls.
> >
> > Please help me understand something. What you and Casey are proposing,
> > when taken to the logical extreme, is to grant to all processes root
> > permissions and then use LSM to restrict specific actions, do I
> > understand correctly? This strikes me as a less secure and more
> > error-prone way of doing things.
>
> When taken to the "logical extreme" most concepts end up sounding a
> bit absurd, but that was the point, wasn't it?

Wasn't my intent to make it sound absurd, sorry. The way I see it, for
the sake of example, let's say CAP_BPF allows 20 different operations
(each with its own security_xxx hook). And let's say in production I
want to only allow 3 of them. Sure, technically it should be possible
to deny access at 17 hooks and let it through in just those 3. But if
someone adds a 21st and I forget to add a 21st restriction, that would be
bad (but very probable with such an approach).

So my point is that for situations like this, dropping CAP_BPF, but
allowing only 3 hooks to proceed seems a safer approach, because if we
add 21st hook, it will safely be denied without CAP_BPF *by default*.
That's what I tried to point out.
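
The difference between the two defaults can be sketched with a toy model in
plain userspace C (not kernel code). The operation numbering, the contents of
both lists, and all function names below are made up for illustration; only
the "20 operations, 3 allowed, a 21st added later" scenario comes from the
text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Restrictive model: the process holds CAP_BPF and the policy lists
 * the operations to deny. Anything not listed is allowed. */
bool restrictive_allows(const int *denied, size_t n, int op)
{
	for (size_t i = 0; i < n; i++)
		if (denied[i] == op)
			return false;
	return true;
}

/* Allowlist model: the process has no CAP_BPF and the policy lists
 * the operations to allow. Anything not listed is denied. */
bool allowlist_allows(const int *allowed, size_t n, int op)
{
	for (size_t i = 0; i < n; i++)
		if (allowed[i] == op)
			return true;
	return false;
}

/* Both policies were written when only operations 0..19 existed. */
void example_new_operation(void)
{
	const int allowed[] = { 0, 1, 2 };
	int denied[17];
	const int new_op = 20; /* the hypothetical "21st" operation */

	for (int i = 0; i < 17; i++)
		denied[i] = i + 3; /* deny operations 3..19 */

	/* The stale deny list lets the new operation through ... */
	assert(restrictive_allows(denied, 17, new_op));
	/* ... while the stale allow list rejects it by default. */
	assert(!allowlist_allows(allowed, 3, new_op));
}
```

Both policies agree on the original 20 operations; they only diverge on an
operation that neither policy author knew about.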

But even if we ignore this "safe by default when a new hook is added"
behavior, when taking user namespaces into account, the restrictive
LSM approach just doesn't seem to work at all for something like
CAP_BPF. CAP_BPF cannot be "namespaced", just like, say, CAP_SYS_TIME,
because we cannot ensure that a given BPF program won't access kernel
state "belonging" to another process (as one example).

Now, thanks to Jonathan, I get that there was a heated discussion 20
years ago about authoritative vs restrictive LSMs. But if I read a
summary at that time ([0]), authoritative hooks were not out of the
question *in principle*. Surely, "walk before we can run" makes sense,
but it's been a while since then.

  [0] https://lwn.net/2001/1108/a/no-auth-hooks.php3


>
> Here is a fun story which seems relevant ... in the early days of
> SELinux, one of the community devs set up a system with a SELinux
> policy which restricted all privileged operations from the root user,
> put the system on a publicly accessible network, posted the root
> password for all to see, and invited the public to login to the system
> and attempt to exercise root privilege (it's been well over 10 years
> at this point so the details are a bit fuzzy).  Granted, there were
> some hiccups in the beginning, mostly due to the crude state of policy
> development/analysis at the time, but after a few policy revisions the
> system held up quite well.

Honest question out of curiosity: was the intent to demonstrate that
with LSM one can completely restrict root? Or that root was actually
allowed to do something useful? Because I can see how rejecting
everything would be rather simple, but actually pretty useless in
practice. Restricting only part of the power of the root, while still
allowing it to do something useful in production seems like a much
harder (but way more valuable) endeavor. Not saying it's impossible,
but see my example about missing 21st new CAP_BPF functionality.

>
> On the more practical side of things, there are several use cases
> which require, by way of legal or contractual requirements, that full
> root/admin privileges are decomposed into separate roles: security
> admin, audit admin, backup admin, etc.  These users satisfy these
> requirements by using LSMs, such as SELinux, to restrict the
> administrative capabilities based on the SELinux user/role/domain.
>
> > By the way, even the above proposal of yours doesn't work for
> > production use cases when user namespaces are involved, as far as I
> > understand. We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for
> > containers running inside user namespaces, as CAP_BPF in non-init
> > namespace is not enough for bpf() syscall to allow loading BPF maps or
> > BPF program ...
>
> Once again, the LSM has always been intended to be a restrictive mechanism,
> not a privilege granting mechanism.  If an operation is not possible

Not according to [0] above:

  > It is our belief that these changes do not belong in the initial version of
  > LSM (especially given our limited charter and original goals), and should
  > be proposed as incremental refinements after LSM has been initially
  > accepted.
  > ...
  > It is our belief that the current LSM
  > will provide a meaningful improvement in the security infrastructure of the
  > Linux kernel, and that there is plenty of room for future expansion of LSM
  > in subsequent phases.

I don't see "always intended to be a restrictive mechanism" there.

> without the LSM layer enabled, it should not be possible with the LSM
> layer enabled.  The LSM is not a mechanism to circumvent other access
> control mechanisms in the kernel.

I understand, but it's not like we are proposing to go and bypass all
kinds of random kernel security mechanisms. These are targeted hooks,
developed by the BPF community for the BPF subsystem to allow trusted
unprivileged production use cases. Yes, we currently rely on checking
CAP_BPF to grant more dangerous/advanced features, but it's because we
can't just allow any unprivileged process to do this. But what we
really want is to answer the question "can we trust this process to
use this advanced functionality", and if there is no specific LSM
policy that cares one way (allow) or the other (disallow), fallback to
CAP_BPF enforcement.

So it's not bypassing kernel checks, but rather augmenting them with
more flexible and customizable mechanisms, while still falling back to
CAP_BPF if the user didn't install any custom LSM policy.
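
A minimal sketch of that decision order, written as plain userspace C rather
than kernel code. The names bpf_op_permitted, lsm_verdict and has_cap_bpf are
hypothetical; the only part taken from this thread is how the hook result is
interpreted: a negative value denies, a positive value allows, and zero falls
back to the CAP_BPF check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the proposed decision order.
 *   lsm_verdict < 0: the LSM policy denies the operation
 *   lsm_verdict > 0: the LSM policy trusts the caller
 *   lsm_verdict == 0: no policy has an opinion
 * has_cap_bpf stands in for the result of capable(CAP_BPF).
 * Returns 0 if the operation may proceed, or a negative errno.
 */
int bpf_op_permitted(int lsm_verdict, bool has_cap_bpf)
{
	if (lsm_verdict < 0)
		return lsm_verdict;       /* explicit deny, error is propagated */
	if (lsm_verdict > 0)
		return 0;                 /* explicit allow, capability not consulted */
	return has_cap_bpf ? 0 : -EPERM;  /* no opinion: fall back to CAP_BPF */
}

void example_decisions(void)
{
	assert(bpf_op_permitted(-EACCES, true) == -EACCES); /* deny wins over CAP_BPF */
	assert(bpf_op_permitted(1, false) == 0);            /* trusted, no CAP_BPF */
	assert(bpf_op_permitted(0, false) == -EPERM);       /* no policy, no CAP_BPF */
	assert(bpf_op_permitted(0, true) == 0);             /* no policy, CAP_BPF held */
}
```

The second case is the one under dispute in this thread: it is the only path
on which an LSM verdict lets an operation through that the capability check
alone would have refused.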

>
> > Also, in previous email you said:
> >
> > > Simply because there is another kernel access control mechanism which
> > > allows a capability check to be skipped doesn't mean I want to allow a
> > > LSM hook to be used to skip a capability check.
> >
> > I understand your stated position, but can you please help me
> > understand the reasoning behind it?
>
> Keeping the LSM as a restrictive access control mechanism helps ensure
> some level of sanity and consistency across different Linux
> installations.  If a certain operation requires CAP_SYS_ADMIN on one
> Linux system, it should require CAP_SYS_ADMIN on another Linux system.
> Granted, a LSM running on one system might impose additional
> constraints on that operation, but the CAP_SYS_ADMIN requirement still
> applies.
>
> There is also an issue of safety in knowing that enabling a LSM will
> not degrade the access controls on a system by potentially granting
> operations that were previously denied.
>
> > Does the above also mean that you'd be fine if we just don't plug into
> > the LSM subsystem at all and instead come up with some ad-hoc solution
> > to allow effectively the same policies? This sounds detrimental both
> > to LSM and BPF subsystems, so I hope we can talk this through before
> > finalizing decisions.
>
> Based on your patches and our discussion, it seems to me that the
> problem you are trying to resolve is related more to the
> capability-based access controls in the eBPF, and possibly other
> kernel subsystems, and not any LSM-based restrictions.  I'm happy to
> work with you on a solution involving the LSM, but please understand
> that I'm not going to support a solution which changes a core
> philosophy of the LSM layer.

Great, I'd really appreciate help and suggestions on how to solve the
following problem.

We have a BPF subsystem that allows loading BPF programs. Those BPF
programs cannot be contained within a particular namespace, simply
because of their system-wide tracing nature (they can safely read kernel
and user memory and we can't restrict whether that memory belongs to a
particular namespace), so it's like CAP_SYS_TIME, just with a much
broader API surface.

The other piece of the puzzle is user namespaces. We do want to run
applications inside user namespaces, but allow them to use BPF
programs. As far as I can tell, there is no way to grant real CAP_BPF
that will be recognized by capable(CAP_BPF) (not ns_capable, see above
about system-wide nature of BPF). If there is, please help me
understand how. All my local experiments failed, and looking at
cap_capable() implementation it is not intended to even check the
initial namespace's capability if the process is running in the user
namespace.
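
To make that observation concrete, here is a simplified userspace model of
the namespace walk as I read it in cap_capable(). It is a sketch, not the
kernel function: the model_* types and names are invented for illustration,
the real code carries more detail, and MODEL_CAP_BPF assumes the value 39
from the uapi capability header.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CAP_BPF 39 /* assumed to match CAP_BPF */

struct model_user_ns {
	const struct model_user_ns *parent; /* NULL for the initial namespace */
	int level;                          /* 0 for the initial namespace */
	unsigned int owner;                 /* uid that created the namespace */
};

struct model_cred {
	const struct model_user_ns *user_ns; /* namespace the process lives in */
	unsigned int euid;
	uint64_t cap_effective;              /* bitmask of effective capabilities */
};

/* Returns 0 if cred has cap with respect to target, -EPERM otherwise. */
int model_cap_capable(const struct model_cred *cred,
		      const struct model_user_ns *target, int cap)
{
	assert(cap >= 0 && cap < 64);

	for (const struct model_user_ns *ns = target; ns; ns = ns->parent) {
		/* Same namespace: the effective set decides. */
		if (ns == cred->user_ns)
			return (cred->cap_effective & (UINT64_C(1) << cap)) ? 0 : -EPERM;
		/* Target is not a descendant of the process's namespace. */
		if (ns->level <= cred->user_ns->level)
			return -EPERM;
		/* The owner of a child namespace has all caps in it. */
		if (ns->parent == cred->user_ns && ns->owner == cred->euid)
			return 0;
	}
	return -EPERM;
}
```

In this model, capable(CAP_BPF) corresponds to a call with the initial
namespace as the target, while ns_capable() corresponds to a call with the
process's own namespace. For a process inside a child user namespace the first
form hits the level comparison and returns -EPERM before the effective set is
ever looked at, which matches the failed experiments described above.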


So, given that a) we can't make CAP_BPF namespace-aware and b) we
can't grant real CAP_BPF to processes in user namespace, how could we
allow user namespaced applications to do useful work with BPF?

>
> > Lastly, you mentioned before:
> >
> > > > > I think we need to make this more concrete; we don't have a pattern in
> > > > > the upstream kernel where 'some_check(...)' is a LSM hook, right?
> >
> > Unfortunately I don't have enough familiarity with all LSM hooks, so I
> > can't confirm or disprove the above statement. But earlier someone
> > brought to my attention the case of security_vm_enough_memory_mm(),
> > which seems to be granting effectively CAP_SYS_ADMIN for the purposes
> > of memory accounting. Am I missing something subtle there or does it
> > grant effective caps indeed?
>
> Some of the comments around that hook can be misleading, but if you
> look at the actual code it starts to make more sense.
>

[...]

>
> I do agree that the security_vm_enough_memory() hook is structured a
> bit differently than most of the other LSM hooks, but it still
> operates with the same philosophy: a LSM should only be allowed to
> restrict access, a LSM should never be allowed to grant access that
> would otherwise be denied by the traditional Linux access controls.
>
> Hopefully that explanation makes sense, but if things are still a bit
> fuzzy I would encourage you to go look at the code, I'm sure it will
> make sense once you spend a few minutes figuring out how it works.
>

Yep, thanks a lot, it's way more clear after grokking the relevant pieces
of the LSM code you pointed out and the LSM infrastructure in general.
The "capabilities" LSM is non-negotiable, so it effectively always
restricts a small subset of hooks, including vm_enough_memory and
capable.

Still, the problem stands. How do we marry BPF and user
namespaces? I'd really appreciate suggestions. Thank you!


> [1] There is a long and sorta bizarre history with the capability LSM,
> but just understand it is a bit "special" in many ways, and those
> "special" behaviors are intentional.
>
> --
> paul-moore.com


* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-13 16:27             ` Casey Schaufler
@ 2023-04-17 23:31               ` Andrii Nakryiko
  2023-04-17 23:53                 ` Casey Schaufler
  0 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-17 23:31 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Paul Moore, Kees Cook, Andrii Nakryiko, bpf, ast, daniel,
	kpsingh, linux-security-module

On Thu, Apr 13, 2023 at 9:27 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 4/12/2023 6:43 PM, Andrii Nakryiko wrote:
> > On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> >> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> >>> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> >>>> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> >>>>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> >>>>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >>>>>>> Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> >>>>>>> are meant to allow highly-granular LSM-based control over the usage of BPF
> >>>>>>> subsystem. Specifically, to control the creation of BPF maps and BTF data
> >>>>>>> objects, which are fundamental building blocks of any modern BPF application.
> >>>>>>>
> >>>>>>> These new hooks are able to override default kernel-side CAP_BPF-based (and
> >>>>>>> sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> >>>>>>> implement LSM policies that could granularly enforce more restrictions on
> >>>>>>> a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> >>>>>>> capabilities), but also, importantly, allow to *bypass kernel-side
> >>>>>>> enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> >>>>>>> cases.
> >>>>>> One of the hallmarks of the LSM has always been that it is
> >>>>>> non-authoritative: it cannot unilaterally grant access, it can only
> >>>>>> restrict what would have been otherwise permitted on a traditional
> >>>>>> Linux system.  Put another way, a LSM should not undermine the Linux
> >>>>>> discretionary access controls, e.g. capabilities.
> >>>>>>
> >>>>>> If there is a problem with the eBPF capability-based access controls,
> >>>>>> that problem needs to be addressed in how the core eBPF code
> >>>>>> implements its capability checks, not by modifying the LSM mechanism
> >>>>>> to bypass these checks.
> >>>>> I think semantics matter here. I wouldn't view this as _bypassing_
> >>>>> capability enforcement: it's just more fine-grained access control.
> > Exactly. One of the motivations for this work was the need to move
> > some production use cases that are only needing extra privileges so
> > that they can use BPF into a more restrictive environment. Granting
> > CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN to all such use cases that need them
> > for BPF usage is too coarse grained. These caps would allow those
> > applications way more than just BPF usage. So the idea here is
> > finer-grained control of BPF-specific operations, granting *effective*
> > CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN caps dynamically based on custom
> > production logic that would validate the use case.
>
> That's an authoritative model which is in direct conflict with the
> design and implementation of both capabilities and LSM.
>
> >
> > This *is* an attempt to achieve a more secure production approach.
> >
> >>>>> For example, in many places we have things like:
> >>>>>
> >>>>>         if (!some_check(...) && !capable(...))
> >>>>>                 return -EPERM;
> >>>>>
> >>>>> I would expect this is a similar logic. An operation can succeed if the
> >>>>> access control requirement is met. The mismatch we have through-out the
> >>>>> kernel is that capability checks aren't strictly done by LSM hooks. And
> >>>>> this series conceptually, I think, doesn't violate that -- it's changing
> >>>>> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> >>>>> yet here).
> >>>> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> >>>> when it returns a positive value "bypasses kernel checks".  The patch
> >>>> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> >>>> based on a eBPF tree, so I can't say with 100% certainty that it is
> >>>> bypassing a capability check, but the description claims that to be
> >>>> the case.
> >>>>
> >>>> Regardless of how you want to spin this, I'm not supportive of a LSM
> >>>> hook which allows a LSM to bypass a capability check.  A LSM hook can
> >>>> be used to provide additional access control restrictions beyond a
> >>>> capability check, but a LSM hook should never be allowed to overrule
> >>>> an access denial due to a capability check.
> >>>>
> >>>>> The reason CAP_BPF was created was because there was nothing else that
> >>>>> would be fine-grained enough at the time.
> >>>> The LSM layer predates CAP_BPF, and one could make a very solid
> >>>> argument that one of the reasons LSMs exist is to provide
> >>>> supplementary controls due to capability-based access controls being a
> >>>> poor fit for many modern use cases.
> >>> I generally agree with what you say, but we DO have this code pattern:
> >>>
> >>>          if (!some_check(...) && !capable(...))
> >>>                  return -EPERM;
> >> I think we need to make this more concrete; we don't have a pattern in
> >> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> >> Simply because there is another kernel access control mechanism which
> >> allows a capability check to be skipped doesn't mean I want to allow a
> >> LSM hook to be used to skip a capability check.
> > This work is an attempt to tighten the security of production systems
> > by allowing to drop too coarse-grained and permissive capabilities
> > (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> > than production use cases are meant to be able to do)
>
> The BPF developers are in complete control of what CAP_BPF controls.
> You can easily address the granularity issue by adding additional restrictions
> on processes that have CAP_BPF. That is the intended use of LSM.
> The whole point of having multiple capabilities is so that you can
> grant just those that are required by the system security policy, and
> do so safely. That leads to differences of opinion regarding the definition
> of the system security policy. BPF chose to set itself up as an element
> of security policy (you need CAP_BPF) rather than define elements such that
> existing capabilities (CAP_FOWNER, CAP_KILL, CAP_MAC_OVERRIDE, ...) would
> control.

Please see my reply to Paul, where I explain CAP_BPF's system-wide
nature and problem with user namespaces. I don't think the problem is
in the granularity of CAP_BPF, it's more the "non-namespaceable"
nature of the BPF subsystem in general.

>
> >  and then grant
> > specific BPF operations on specific BPF programs/maps based on custom
> > LSM security policy,
>
> This is backwards. The correct implementation is to require CAP_BPF and
> further restrict BPF operations based on a custom LSM security policy.
> That's how LSM is designed.

Please see my reply to Paul, we can't grant real CAP_BPF for
applications in user namespace (unless there is some trick that I
don't know, so please do point it out). Let's converge the discussion
in that email thread branch to not discuss the same topic multiple
times.


>
> >  which validates application trustworthiness using
> > custom production-specific logic.
> >
> > Isn't this goal in line with LSMs mission to enhance system security?
>
> We're not arguing the goal, we're discussing the implementation.
>
> >>> It looks to me like this series can be refactored to do the same. I
> >>> wouldn't consider that to be a "bypass", but I would agree the current
> >>> series looks too much like "bypass", and makes reasoning about the
> >>> effect of the LSM hooks too "special". :)
> > Sorry, I didn't realize that the current code layout is making things
> > more confusing. I'll address feedback to make the intent a bit
> > clearer.
> >
> >> --
> >> paul-moore.com


* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-13 16:54                 ` Casey Schaufler
@ 2023-04-17 23:31                   ` Andrii Nakryiko
  0 siblings, 0 replies; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-17 23:31 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Paul Moore, Kees Cook, Andrii Nakryiko, bpf, ast, daniel,
	kpsingh, linux-security-module

On Thu, Apr 13, 2023 at 9:54 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 4/12/2023 10:16 PM, Andrii Nakryiko wrote:
> > On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@paul-moore.com> wrote:
> >> On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
> >> <andrii.nakryiko@gmail.com> wrote:
> >>> On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> >>>> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> >>>>> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> >>>>>> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> >>>>>>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> >>>>>>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >> ...
> >>
> >>>>>>> For example, in many places we have things like:
> >>>>>>>
> >>>>>>>         if (!some_check(...) && !capable(...))
> >>>>>>>                 return -EPERM;
> >>>>>>>
> >>>>>>> I would expect this is a similar logic. An operation can succeed if the
> >>>>>>> access control requirement is met. The mismatch we have through-out the
> >>>>>>> kernel is that capability checks aren't strictly done by LSM hooks. And
> >>>>>>> this series conceptually, I think, doesn't violate that -- it's changing
> >>>>>>> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> >>>>>>> yet here).
> >>>>>> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> >>>>>> when it returns a positive value "bypasses kernel checks".  The patch
> >>>>>> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> >>>>>> based on a eBPF tree, so I can't say with 100% certainty that it is
> >>>>>> bypassing a capability check, but the description claims that to be
> >>>>>> the case.
> >>>>>>
> >>>>>> Regardless of how you want to spin this, I'm not supportive of a LSM
> >>>>>> hook which allows a LSM to bypass a capability check.  A LSM hook can
> >>>>>> be used to provide additional access control restrictions beyond a
> >>>>>> capability check, but a LSM hook should never be allowed to overrule
> >>>>>> an access denial due to a capability check.
> >>>>>>
> >>>>>>> The reason CAP_BPF was created was because there was nothing else that
> >>>>>>> would be fine-grained enough at the time.
> >>>>>> The LSM layer predates CAP_BPF, and one could make a very solid
> >>>>>> argument that one of the reasons LSMs exist is to provide
> >>>>>> supplementary controls due to capability-based access controls being a
> >>>>>> poor fit for many modern use cases.
> >>>>> I generally agree with what you say, but we DO have this code pattern:
> >>>>>
> >>>>>          if (!some_check(...) && !capable(...))
> >>>>>                  return -EPERM;
> >>>> I think we need to make this more concrete; we don't have a pattern in
> >>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> >>>> Simply because there is another kernel access control mechanism which
> >>>> allows a capability check to be skipped doesn't mean I want to allow a
> >>>> LSM hook to be used to skip a capability check.
> >>> This work is an attempt to tighten the security of production systems
> >>> by allowing to drop too coarse-grained and permissive capabilities
> >>> (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> >>> than production use cases are meant to be able to do) and then grant
> >>> specific BPF operations on specific BPF programs/maps based on custom
> >>> LSM security policy, which validates application trustworthiness using
> >>> custom production-specific logic.
> >> There are ways to leverage the LSMs to apply finer grained access
> >> control on top of the relatively coarse capabilities that do not
> >> require circumventing those capability controls.  One grants the
> >> capabilities, just as one would do today, and then leverages the
> >> security functionality of a LSM to further restrict specific users,
> >> applications, etc. with a level of granularity beyond that offered by
> >> the capability controls.
> > Please help me understand something. What you and Casey are proposing,
> > when taken to the logical extreme, is to grant to all processes root
> > permissions and then use LSM to restrict specific actions, do I
> > understand correctly?
>
> No. You grant a process the capabilities it needs (CAP_BPF, CAP_WHATEVER)
> and only those capabilities. If you want additional restrictions you include
> an LSM that implements those restrictions. If you want finer control over
> the operations controlled by CAP_BPF you include an LSM that implements
> those controls.
>

See previous replies. We can't grant CAP_BPF, even if we wanted to, if
the process is in a user namespace.

> >  This strikes me as a less secure and more
> > error-prone way of doing things. If there is some problem with
> > installing LSM policy,
>
> LSMs are not required to have loadable or dynamic policies. That's
> up to the developer.
>

Sure, but having a more dynamic policy is a very attractive feature
and one of the reasons for people to use BPF LSM. So it might not be
required, but it's something that people are using in practice, so if
we can make all this less error-prone, that would be better for
everyone.

> >  it could go unnoticed for a really long time,
> > while the system would be way more vulnerable.
>
> There is no way Paul or I are going to solve the mis-configured system
> problem.
>

Please see my example about (hypothetical) 21st added hook that is
very easy to miss, because the kernel is big and there are tons of
people doing development, and so it's no wonder that users might miss
a new hook they are supposed to restrict.

But again, even with all that said, granting CAP_BPF is impossible for
user namespaced applications.

> >  Why do you prefer such
> > an approach instead of going with no extra permissions by default, but
> > allowing custom LSM policy to grant few exceptions for known and
> > trusted use cases?
>
> Because that's not how capabilities work. Capabilities are independent
> of other controls. If you want to propose a change to how capabilities
> work, you need to propose that to the capability maintainer.
>
> Because that's not how LSMs work. LSMs implement additional restrictions
> to the existing policy. The restrictive vs. authoritative debate was closed
> long ago. It's a fundamental property of how LSMs work.

There doesn't seem to be anything fundamentally and technically
preventing LSM hooks to say "yep, looks good, no need to fallback to
CAP_BPF checks due to lack of other signal". [0] also explicitly said
that authoritative hooks can be a next step, rather than rejecting them
outright.

  [0] https://lwn.net/2001/1108/a/no-auth-hooks.php3


>
> > By the way, even the above proposal of yours doesn't work for
> > production use cases when user namespaces are involved, as far as I
> > understand. We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for
> > containers running inside user namespaces, as CAP_BPF in non-init
> > namespace is not enough for bpf() syscall to allow loading BPF maps or
> > BPF program (bpf() doesn't do ns_capable(), it's only using
> > capable()). What solution would you suggest for such production
> > setups?
>
> If user namespaces don't work the way you'd like, you should take that
> up with the namespace maintainers. Or, since this appears to be an issue
> with BPF not being namespace aware, fix BPF's use of capable() and ns_capable().

Can't be fixed on the BPF side, unfortunately. Don't know enough about
namespaces to tell if it's a bug or feature that root CAP_BPF can't be
checked from inside userns. So yep, I should perhaps ask.

>
> > Also, in previous email you said:
> >
> >> Simply because there is another kernel access control mechanism which
> >> allows a capability check to be skipped doesn't mean I want to allow a
> >> LSM hook to be used to skip a capability check.
> > I understand your stated position, but can you please help me
> > understand the reasoning behind it? What would be wrong with some LSM
> > hooks granting effective capabilities?
>
> You keep asking the question and ignoring the answer. See above.
>
> >  How would that change anything
> > about LSM design? As far as I can see, I'm not doing anything crazy
> > with my LSM hook implementation.
>
> You keep asking the question and ignoring the answer. See above.
>
>
> >  It's reusing the standard
> > call_int_hook() mechanism very straightforwardly with a default result
> > of 0. And then just interprets 0, <0, and >0 results accordingly. Is
> > that abusing the LSM mechanism itself somehow?
> >
> > Does the above also mean that you'd be fine if we just don't plug into
> > the LSM subsystem at all and instead come up with some ad-hoc solution
> > to allow effectively the same policies?
>
> No, because you would be breaking the capability system in that case.
>
> There is an example of a feature that does just what you're suggesting.
> POSIX ACLs aren't an LSM because they don't just add restrictions, they
> change the semantics of the file mode bits. Look at that implementation
> before you seriously consider going that route.

Are you referring to posix_acl_permission() and fs/posix_acl.c? I'll
take a look, not familiar. Thanks for the suggestion!

I'd still prefer to avoid building a new access control system just
for BPF, of course. But let me take a look at the code and see what
you are referring to.

>
> >  This sounds detrimental both
> > to LSM and BPF subsystems, so I hope we can talk this through before
> > finalizing decisions.
> >
> > Lastly, you mentioned before:
> >
> >>>> I think we need to make this more concrete; we don't have a pattern in
> >>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> > Unfortunately I don't have enough familiarity with all LSM hooks, so I
> > can't confirm or disprove the above statement. But earlier someone
> > brought to my attention the case of security_vm_enough_memory_mm(),
> > which seems to be granting effectively CAP_SYS_ADMIN for the purposes
> > of memory accounting. Am I missing something subtle there or does it
> > grant effective caps indeed?
> >
> >
> >
> >
> >> --
> >> paul-moore.com


* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-14 20:23     ` Dr. Greg
@ 2023-04-17 23:31       ` Andrii Nakryiko
  2023-04-19 10:53         ` Dr. Greg
  0 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-17 23:31 UTC (permalink / raw)
  To: Dr. Greg
  Cc: Kees Cook, Paul Moore, Andrii Nakryiko, bpf, ast, daniel,
	kpsingh, linux-security-module

On Fri, Apr 14, 2023 at 1:24 PM Dr. Greg <greg@enjellic.com> wrote:
>
> On Wed, Apr 12, 2023 at 10:47:13AM -0700, Kees Cook wrote:
>
> Hi, I hope the week is ending well for everyone.
>
> > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > >
> > > > Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> > > > are meant to allow highly-granular LSM-based control over the usage of BPF
> > > > subsystem. Specifically, to control the creation of BPF maps and BTF data
> > > > objects, which are fundamental building blocks of any modern BPF application.
> > > >
> > > > These new hooks are able to override default kernel-side CAP_BPF-based (and
> > > > sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> > > > implement LSM policies that could granularly enforce more restrictions on
> > > > a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> > > > capabilities), but also, importantly, allow to *bypass kernel-side
> > > > enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> > > > cases.
> > >
> > > One of the hallmarks of the LSM has always been that it is
> > > non-authoritative: it cannot unilaterally grant access, it can only
> > > restrict what would have been otherwise permitted on a traditional
> > > Linux system.  Put another way, a LSM should not undermine the Linux
> > > discretionary access controls, e.g. capabilities.
> > >
> > > If there is a problem with the eBPF capability-based access controls,
> > > that problem needs to be addressed in how the core eBPF code
> > > implements its capability checks, not by modifying the LSM mechanism
> > > to bypass these checks.
>
> > I think semantics matter here. I wouldn't view this as _bypassing_
> > capability enforcement: it's just more fine-grained access control.
> >
> > For example, in many places we have things like:
> >
> >       if (!some_check(...) && !capable(...))
> >               return -EPERM;
> >
> > I would expect this is a similar logic. An operation can succeed if the
> > access control requirement is met. The mismatch we have through-out the
> > kernel is that capability checks aren't strictly done by LSM hooks. And
> > this series conceptually, I think, doesn't violate that -- it's changing
> > the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> > yet here).
> >
> > The reason CAP_BPF was created was because there was nothing else that
> > would be fine-grained enough at the time.
>
> This was one of the issues, among others, that the TSEM LSM we are
> working to upstream, was designed to address and may be an avenue
> forward.
>
> TSEM, being narratival rather than deontologically based, provides a
> framework for security permissions that are based on a
> characterization of the event itself.  So the permissions are as
> variable as the contents of whatever BPF related information is passed
> to the bpf* LSM hooks [1].
>
> Currently, the tsem_bpf_* hooks are generically modeled.  We would
> certainly entertain any discussion or suggestions as to what elements
> of the structures passed to the hooks would be useful with respect
> to establishing security policies useful and appropriate to the BPF
> community.

Could you please provide some links to get a bit more context and
information? I'd like to understand at least "narratival rather than
deontologically based" part of this.

>
> We don't want to get in the middle of the restrictive
> vs. authoritative debate, but it would seem that the jury is
> conclusively in on that issue and LSM hooks are not going to be
> allowed to dismiss, or modify, any other security controls.
>
> Hopefully the BPF ABI isn't tied to CAP_BPF as that would seem to make
> it problematic to make controls more granular.
>
> > Kees Cook
>
> Have a good weekend.
>
> As always,
> Dr. Greg
>
> The Quixote Project - Flailing at the Travails of Cybersecurity
>
> [1]: Plus developers don't need to write security policies, you test
> your application in order to get the desired controls for a workload.


* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-17 23:31               ` Andrii Nakryiko
@ 2023-04-17 23:53                 ` Casey Schaufler
  2023-04-18  0:28                   ` Andrii Nakryiko
  0 siblings, 1 reply; 52+ messages in thread
From: Casey Schaufler @ 2023-04-17 23:53 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Paul Moore, Kees Cook, Andrii Nakryiko, bpf, ast, daniel,
	kpsingh, linux-security-module, Casey Schaufler

On 4/17/2023 4:31 PM, Andrii Nakryiko wrote:
> On Thu, Apr 13, 2023 at 9:27 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>> On 4/12/2023 6:43 PM, Andrii Nakryiko wrote:
>>> On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
>>>> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
>>>>> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
>>>>>> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
>>>>>>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
>>>>>>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>>>>>>>>> Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
>>>>>>>>> are meant to allow highly-granular LSM-based control over the usage of BPF
>>>>>>>>> subsystem. Specifically, to control the creation of BPF maps and BTF data
>>>>>>>>> objects, which are fundamental building blocks of any modern BPF application.
>>>>>>>>>
>>>>>>>>> These new hooks are able to override default kernel-side CAP_BPF-based (and
>>>>>>>>> sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
>>>>>>>>> implement LSM policies that could granularly enforce more restrictions on
>>>>>>>>> a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
>>>>>>>>> capabilities), but also, importantly, allow to *bypass kernel-side
>>>>>>>>> enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
>>>>>>>>> cases.
>>>>>>>> One of the hallmarks of the LSM has always been that it is
>>>>>>>> non-authoritative: it cannot unilaterally grant access, it can only
>>>>>>>> restrict what would have been otherwise permitted on a traditional
>>>>>>>> Linux system.  Put another way, a LSM should not undermine the Linux
>>>>>>>> discretionary access controls, e.g. capabilities.
>>>>>>>>
>>>>>>>> If there is a problem with the eBPF capability-based access controls,
>>>>>>>> that problem needs to be addressed in how the core eBPF code
>>>>>>>> implements its capability checks, not by modifying the LSM mechanism
>>>>>>>> to bypass these checks.
>>>>>>> I think semantics matter here. I wouldn't view this as _bypassing_
>>>>>>> capability enforcement: it's just more fine-grained access control.
>>> Exactly. One of the motivations for this work was the need to move
>>> some production use cases that are only needing extra privileges so
>>> that they can use BPF into a more restrictive environment. Granting
>>> CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN to all such use cases that need them
>>> for BPF usage is too coarse grained. These caps would allow those
>>> applications way more than just BPF usage. So the idea here is
>>> finer-grained control of BPF-specific operations, granting *effective*
>>> CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN caps dynamically based on custom
>>> production logic that would validate the use case.
>> That's an authoritative model which is in direct conflict with the
>> design and implementation of both capabilities and LSM.
>>
>>> This *is* an attempt to achieve a more secure production approach.
>>>
>>>>>>> For example, in many places we have things like:
>>>>>>>
>>>>>>>         if (!some_check(...) && !capable(...))
>>>>>>>                 return -EPERM;
>>>>>>>
>>>>>>> I would expect this is a similar logic. An operation can succeed if the
>>>>>>> access control requirement is met. The mismatch we have throughout the
>>>>>>> kernel is that capability checks aren't strictly done by LSM hooks. And
>>>>>>> this series conceptually, I think, doesn't violate that -- it's changing
>>>>>>> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
>>>>>>> yet here).
>>>>>> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
>>>>>> when it returns a positive value "bypasses kernel checks".  The patch
>>>>>> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
>>>>>> based on a eBPF tree, so I can't say with 100% certainty that it is
>>>>>> bypassing a capability check, but the description claims that to be
>>>>>> the case.
>>>>>>
>>>>>> Regardless of how you want to spin this, I'm not supportive of a LSM
>>>>>> hook which allows a LSM to bypass a capability check.  A LSM hook can
>>>>>> be used to provide additional access control restrictions beyond a
>>>>>> capability check, but a LSM hook should never be allowed to overrule
>>>>>> an access denial due to a capability check.
>>>>>>
>>>>>>> The reason CAP_BPF was created was because there was nothing else that
>>>>>>> would be fine-grained enough at the time.
>>>>>> The LSM layer predates CAP_BPF, and one could make a very solid
>>>>>> argument that one of the reasons LSMs exist is to provide
>>>>>> supplementary controls due to capability-based access controls being a
>>>>>> poor fit for many modern use cases.
>>>>> I generally agree with what you say, but we DO have this code pattern:
>>>>>
>>>>>          if (!some_check(...) && !capable(...))
>>>>>                  return -EPERM;
>>>> I think we need to make this more concrete; we don't have a pattern in
>>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
>>>> Simply because there is another kernel access control mechanism which
>>>> allows a capability check to be skipped doesn't mean I want to allow a
>>>> LSM hook to be used to skip a capability check.
>>> This work is an attempt to tighten the security of production systems
>>> by allowing to drop too coarse-grained and permissive capabilities
>>> (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
>>> than production use cases are meant to be able to do)
>> The BPF developers are in complete control of what CAP_BPF controls.
>> You can easily address the granularity issue by adding additional restrictions
>> on processes that have CAP_BPF. That is the intended use of LSM.
>> The whole point of having multiple capabilities is so that you can
>> grant just those that are required by the system security policy, and
>> do so safely. That leads to differences of opinion regarding the definition
>> of the system security policy. BPF chose to set itself up as an element
>> of security policy (you need CAP_BPF) rather than define elements such that
>> existing capabilities (CAP_FOWNER, CAP_KILL, CAP_MAC_OVERRIDE, ...) would
>> control.
> Please see my reply to Paul, where I explain CAP_BPF's system-wide
> nature and problem with user namespaces. I don't think the problem is
> in the granularity of CAP_BPF, it's more of a "non-namespaceable"
> nature of the BPF subsystem in general.

Paul is approaching this from a different angle. Your response to Paul
does not address the issue I have raised.

>>>  and then grant
>>> specific BPF operations on specific BPF programs/maps based on custom
>>> LSM security policy,
>> This is backwards. The correct implementation is to require CAP_BPF and
>> further restrict BPF operations based on a custom LSM security policy.
>> That's how LSM is designed.
> Please see my reply to Paul, we can't grant real CAP_BPF for
> applications in user namespace (unless there is some trick that I
> don't know, so please do point it out). Let's converge the discussion
> in that email thread branch to not discuss the same topic multiple
> times.

I saw your reply to Paul. Paul's points are not my points. If they were,
I wouldn't have taken my or your time to present them.

>>>  which validates application trustworthiness using
>>> custom production-specific logic.
>>>
>>> Isn't this goal in line with LSM's mission to enhance system security?
>> We're not arguing the goal, we're discussing the implementation.
>>
>>>>> It looks to me like this series can be refactored to do the same. I
>>>>> wouldn't consider that to be a "bypass", but I would agree the current
>>>>> series looks too much like "bypass", and makes reasoning about the
>>>>> effect of the LSM hooks too "special". :)
>>> Sorry, I didn't realize that the current code layout is making things
>>> more confusing. I'll address feedback to make the intent a bit
>>> clearer.
>>>
>>>> --
>>>> paul-moore.com


* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-17 23:53                 ` Casey Schaufler
@ 2023-04-18  0:28                   ` Andrii Nakryiko
  2023-04-18  0:52                     ` Casey Schaufler
  0 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-18  0:28 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Paul Moore, Kees Cook, Andrii Nakryiko, bpf, ast, daniel,
	kpsingh, linux-security-module

On Mon, Apr 17, 2023 at 4:53 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 4/17/2023 4:31 PM, Andrii Nakryiko wrote:
> > On Thu, Apr 13, 2023 at 9:27 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >> On 4/12/2023 6:43 PM, Andrii Nakryiko wrote:
> >>> On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> >>>> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> >>>>> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> >>>>>> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> >>>>>>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> >>>>>>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >>>>>>>>> Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> >>>>>>>>> are meant to allow highly-granular LSM-based control over the usage of BPF
> >>>>>>>>> subsystem. Specifically, to control the creation of BPF maps and BTF data
> >>>>>>>>> objects, which are fundamental building blocks of any modern BPF application.
> >>>>>>>>>
> >>>>>>>>> These new hooks are able to override default kernel-side CAP_BPF-based (and
> >>>>>>>>> sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> >>>>>>>>> implement LSM policies that could granularly enforce more restrictions on
> >>>>>>>>> a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> >>>>>>>>> capabilities), but also, importantly, allow to *bypass kernel-side
> >>>>>>>>> enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> >>>>>>>>> cases.
> >>>>>>>> One of the hallmarks of the LSM has always been that it is
> >>>>>>>> non-authoritative: it cannot unilaterally grant access, it can only
> >>>>>>>> restrict what would have been otherwise permitted on a traditional
> >>>>>>>> Linux system.  Put another way, a LSM should not undermine the Linux
> >>>>>>>> discretionary access controls, e.g. capabilities.
> >>>>>>>>
> >>>>>>>> If there is a problem with the eBPF capability-based access controls,
> >>>>>>>> that problem needs to be addressed in how the core eBPF code
> >>>>>>>> implements its capability checks, not by modifying the LSM mechanism
> >>>>>>>> to bypass these checks.
> >>>>>>> I think semantics matter here. I wouldn't view this as _bypassing_
> >>>>>>> capability enforcement: it's just more fine-grained access control.
> >>> Exactly. One of the motivations for this work was the need to move
> >>> some production use cases that are only needing extra privileges so
> >>> that they can use BPF into a more restrictive environment. Granting
> >>> CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN to all such use cases that need them
> >>> for BPF usage is too coarse grained. These caps would allow those
> >>> applications way more than just BPF usage. So the idea here is
> >>> finer-grained control of BPF-specific operations, granting *effective*
> >>> CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN caps dynamically based on custom
> >>> production logic that would validate the use case.
> >> That's an authoritative model which is in direct conflict with the
> >> design and implementation of both capabilities and LSM.
> >>
> >>> This *is* an attempt to achieve a more secure production approach.
> >>>
> >>>>>>> For example, in many places we have things like:
> >>>>>>>
> >>>>>>>         if (!some_check(...) && !capable(...))
> >>>>>>>                 return -EPERM;
> >>>>>>>
> >>>>>>> I would expect this is a similar logic. An operation can succeed if the
> >>>>>>> access control requirement is met. The mismatch we have throughout the
> >>>>>>> kernel is that capability checks aren't strictly done by LSM hooks. And
> >>>>>>> this series conceptually, I think, doesn't violate that -- it's changing
> >>>>>>> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> >>>>>>> yet here).
> >>>>>> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> >>>>>> when it returns a positive value "bypasses kernel checks".  The patch
> >>>>>> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> >>>>>> based on a eBPF tree, so I can't say with 100% certainty that it is
> >>>>>> bypassing a capability check, but the description claims that to be
> >>>>>> the case.
> >>>>>>
> >>>>>> Regardless of how you want to spin this, I'm not supportive of a LSM
> >>>>>> hook which allows a LSM to bypass a capability check.  A LSM hook can
> >>>>>> be used to provide additional access control restrictions beyond a
> >>>>>> capability check, but a LSM hook should never be allowed to overrule
> >>>>>> an access denial due to a capability check.
> >>>>>>
> >>>>>>> The reason CAP_BPF was created was because there was nothing else that
> >>>>>>> would be fine-grained enough at the time.
> >>>>>> The LSM layer predates CAP_BPF, and one could make a very solid
> >>>>>> argument that one of the reasons LSMs exist is to provide
> >>>>>> supplementary controls due to capability-based access controls being a
> >>>>>> poor fit for many modern use cases.
> >>>>> I generally agree with what you say, but we DO have this code pattern:
> >>>>>
> >>>>>          if (!some_check(...) && !capable(...))
> >>>>>                  return -EPERM;
> >>>> I think we need to make this more concrete; we don't have a pattern in
> >>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> >>>> Simply because there is another kernel access control mechanism which
> >>>> allows a capability check to be skipped doesn't mean I want to allow a
> >>>> LSM hook to be used to skip a capability check.
> >>> This work is an attempt to tighten the security of production systems
> >>> by allowing to drop too coarse-grained and permissive capabilities
> >>> (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> >>> than production use cases are meant to be able to do)
> >> The BPF developers are in complete control of what CAP_BPF controls.
> >> You can easily address the granularity issue by adding additional restrictions
> >> on processes that have CAP_BPF. That is the intended use of LSM.
> >> The whole point of having multiple capabilities is so that you can
> >> grant just those that are required by the system security policy, and
> >> do so safely. That leads to differences of opinion regarding the definition
> >> of the system security policy. BPF chose to set itself up as an element
> >> of security policy (you need CAP_BPF) rather than define elements such that
> >> existing capabilities (CAP_FOWNER, CAP_KILL, CAP_MAC_OVERRIDE, ...) would
> >> control.
> > Please see my reply to Paul, where I explain CAP_BPF's system-wide
> > nature and problem with user namespaces. I don't think the problem is
> > in the granularity of CAP_BPF, it's more of a "non-namespaceable"
> > nature of the BPF subsystem in general.
>
> Paul is approaching this from a different angle. Your response to Paul
> does not address the issue I have raised.

I see, I definitely missed this. Re-reading your reply, I still am not
clear on what you are proposing, tbh. Can you please elaborate what
you have in mind?

>
> >>>  and then grant
> >>> specific BPF operations on specific BPF programs/maps based on custom
> >>> LSM security policy,
> >> This is backwards. The correct implementation is to require CAP_BPF and
> >> further restrict BPF operations based on a custom LSM security policy.
> >> That's how LSM is designed.
> > Please see my reply to Paul, we can't grant real CAP_BPF for
> > applications in user namespace (unless there is some trick that I
> > don't know, so please do point it out). Let's converge the discussion
> > in that email thread branch to not discuss the same topic multiple
> > times.
>
> I saw your reply to Paul. Paul's points are not my points. If they were,
> I wouldn't have taken my or your time to present them.

Sure, sorry about that. What do you have in mind then?

>
> >>>  which validates application trustworthiness using
> >>> custom production-specific logic.
> >>>
> >>> Isn't this goal in line with LSM's mission to enhance system security?
> >> We're not arguing the goal, we're discussing the implementation.
> >>
> >>>>> It looks to me like this series can be refactored to do the same. I
> >>>>> wouldn't consider that to be a "bypass", but I would agree the current
> >>>>> series looks too much like "bypass", and makes reasoning about the
> >>>>> effect of the LSM hooks too "special". :)
> >>> Sorry, I didn't realize that the current code layout is making things
> >>> more confusing. I'll address feedback to make the intent a bit
> >>> clearer.
> >>>
> >>>> --
> >>>> paul-moore.com


* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-17 23:29                   ` Andrii Nakryiko
@ 2023-04-18  0:47                     ` Casey Schaufler
  2023-04-21  0:00                       ` Andrii Nakryiko
  2023-04-18 14:21                     ` Paul Moore
  1 sibling, 1 reply; 52+ messages in thread
From: Casey Schaufler @ 2023-04-18  0:47 UTC (permalink / raw)
  To: Andrii Nakryiko, Paul Moore
  Cc: Kees Cook, Andrii Nakryiko, bpf, ast, daniel, kpsingh,
	linux-security-module, Casey Schaufler

On 4/17/2023 4:29 PM, Andrii Nakryiko wrote:
> On Thu, Apr 13, 2023 at 8:11 AM Paul Moore <paul@paul-moore.com> wrote:
>> On Thu, Apr 13, 2023 at 1:16 AM Andrii Nakryiko
>> <andrii.nakryiko@gmail.com> wrote:
>>> On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@paul-moore.com> wrote:
>>>> On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
>>>> <andrii.nakryiko@gmail.com> wrote:
>>>>> On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
>>>>>> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
>>>>>>> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
>>>>>>>> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
>>>>>>>>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
>>>>>>>>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>>>> ...
>>>>
>>>>>>>>> For example, in many places we have things like:
>>>>>>>>>
>>>>>>>>>         if (!some_check(...) && !capable(...))
>>>>>>>>>                 return -EPERM;
>>>>>>>>>
>>>>>>>>> I would expect this is a similar logic. An operation can succeed if the
>>>>>>>>> access control requirement is met. The mismatch we have throughout the
>>>>>>>>> kernel is that capability checks aren't strictly done by LSM hooks. And
>>>>>>>>> this series conceptually, I think, doesn't violate that -- it's changing
>>>>>>>>> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
>>>>>>>>> yet here).
>>>>>>>> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
>>>>>>>> when it returns a positive value "bypasses kernel checks".  The patch
>>>>>>>> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
>>>>>>>> based on a eBPF tree, so I can't say with 100% certainty that it is
>>>>>>>> bypassing a capability check, but the description claims that to be
>>>>>>>> the case.
>>>>>>>>
>>>>>>>> Regardless of how you want to spin this, I'm not supportive of a LSM
>>>>>>>> hook which allows a LSM to bypass a capability check.  A LSM hook can
>>>>>>>> be used to provide additional access control restrictions beyond a
>>>>>>>> capability check, but a LSM hook should never be allowed to overrule
>>>>>>>> an access denial due to a capability check.
>>>>>>>>
>>>>>>>>> The reason CAP_BPF was created was because there was nothing else that
>>>>>>>>> would be fine-grained enough at the time.
>>>>>>>> The LSM layer predates CAP_BPF, and one could make a very solid
>>>>>>>> argument that one of the reasons LSMs exist is to provide
>>>>>>>> supplementary controls due to capability-based access controls being a
>>>>>>>> poor fit for many modern use cases.
>>>>>>> I generally agree with what you say, but we DO have this code pattern:
>>>>>>>
>>>>>>>          if (!some_check(...) && !capable(...))
>>>>>>>                  return -EPERM;
>>>>>> I think we need to make this more concrete; we don't have a pattern in
>>>>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
>>>>>> Simply because there is another kernel access control mechanism which
>>>>>> allows a capability check to be skipped doesn't mean I want to allow a
>>>>>> LSM hook to be used to skip a capability check.
>>>>> This work is an attempt to tighten the security of production systems
>>>>> by allowing to drop too coarse-grained and permissive capabilities
>>>>> (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
>>>>> than production use cases are meant to be able to do) and then grant
>>>>> specific BPF operations on specific BPF programs/maps based on custom
>>>>> LSM security policy, which validates application trustworthiness using
>>>>> custom production-specific logic.
>>>> There are ways to leverage the LSMs to apply finer grained access
>>>> control on top of the relatively coarse capabilities that do not
>>>> require circumventing those capability controls.  One grants the
>>>> capabilities, just as one would do today, and then leverages the
>>>> security functionality of a LSM to further restrict specific users,
>>>> applications, etc. with a level of granularity beyond that offered by
>>>> the capability controls.
>>> Please help me understand something. What you and Casey are proposing,
>>> when taken to the logical extreme, is to grant to all processes root
>>> permissions and then use LSM to restrict specific actions, do I
>>> understand correctly? This strikes me as a less secure and more
>>> error-prone way of doing things.
>> When taken to the "logical extreme" most concepts end up sounding a
>> bit absurd, but that was the point, wasn't it?
> Wasn't my intent to make it sound absurd, sorry. The way I see it, for
> the sake of example, let's say CAP_BPF allows 20 different operations
> (each with its own security_xxx hook). And let's say in production I
> want to only allow 3 of them. Sure, technically it should be possible
> to deny access at 17 hooks and let it through in just those 3. But if
> someone adds 21st and I forget to add 21st restriction, that would be
> bad (but very probable with such an approach).

That would be a flaw in the implementation of the 21st, not a problem
with the capabilities or LSM model. For the LSM model to be sufficiently
flexible it cannot be required to prevent or detect coding errors.

> So my point is that for situations like this, dropping CAP_BPF, but
> allowing only 3 hooks to proceed seems a safer approach, because if we
> add 21st hook, it will safely be denied without CAP_BPF *by default*.
> That's what I tried to point out.

When you're creating security-relevant or enforcing mechanisms there has
to be a level of expectation regarding the care with which they're
developed. My expectation is that the 21st hook won't go in without
adequate review.
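
To make the 20-versus-21 example above concrete, here is a small
userspace sketch in C of the two models being compared. It is only an
illustration: the operation numbering, NUM_OPS, and both policy helpers
are hypothetical and do not come from the patch series or from any
kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical numbering: ops 0..19 existed when the policy was
 * written, op 20 is the newly added "21st" operation. */
#define NUM_OPS 21

/* Hypothetical stale policy: ops 0..2 are wanted, ops 3..19 are not.
 * Op 20 is mentioned by neither helper because it did not exist yet. */
static bool policy_wants(int op)  { return op >= 0 && op <= 2; }
static bool policy_denies(int op) { return op >= 3 && op <= 19; }

/* Restrictive model: the capability is granted and the policy can
 * only take access away. */
static bool restrictive_allows(bool has_cap, int op)
{
	assert(op >= 0 && op < NUM_OPS);
	if (!has_cap)
		return false;		/* capability check is final */
	return !policy_denies(op);	/* unknown ops pass through */
}

/* Allowlist model: the capability is dropped and the policy names the
 * operations it is willing to permit. */
static bool allowlist_allows(bool has_cap, int op)
{
	assert(op >= 0 && op < NUM_OPS);
	return has_cap || policy_wants(op);	/* unknown ops are refused */
}
```

With the stale policy, restrictive_allows(true, 20) is true while
allowlist_allows(false, 20) is false. That is the default-deny property
argued for in the quoted text; the reply above is that op 20 should not
have been merged without a matching policy review in the first place.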

> But even if we ignore this "safe by default when a new hook is added"
> behavior, when taking user namespaces into account, the restrictive
> LSM approach just doesn't seem to work at all for something like
> CAP_BPF. CAP_BPF cannot be "namespaced", just like, say, CAP_SYS_TIME,
> because we cannot ensure that a given BPF program won't access kernel
> state "belonging" to another process (as one example).

Time namespaces have been proposed. I would be surprised if there aren't
people working on BPF namespaces somewhere. There's a difference between
"can't" and "haven't been".
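
For readers unfamiliar with the capable() versus ns_capable()
distinction behind the "cannot be namespaced" remark, the following is
a simplified userspace model of the user namespace walk, loosely
following cap_capable() in the kernel. The struct layouts and the
model_* function names are stand-ins invented for this sketch; only the
CAP_BPF number (39) is taken from the uapi headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CAP_BPF 39

/* Stand-in for the kernel's user namespace; not the real layout. */
struct user_ns {
	const struct user_ns *parent;	/* NULL for the initial namespace */
	unsigned int owner_uid;		/* uid that created the namespace */
	int level;			/* 0 for the initial namespace */
};

/* Stand-in for the task credentials; not the real layout. */
struct cred {
	const struct user_ns *user_ns;
	unsigned int euid;
	unsigned long long cap_effective;
};

/* Does "cred" hold "cap" with respect to the namespace "target"? */
static bool model_ns_capable(const struct cred *cred,
			     const struct user_ns *target, int cap)
{
	const struct user_ns *ns = target;

	assert(cap >= 0 && cap < 64);
	for (;;) {
		/* Same namespace: consult the effective set. */
		if (ns == cred->user_ns)
			return (cred->cap_effective >> cap) & 1ULL;
		/* Target is not a descendant of the task's namespace. */
		if (ns->level <= cred->user_ns->level)
			return false;
		/* The owner of a child namespace holds all caps in it. */
		if (ns->parent == cred->user_ns &&
		    ns->owner_uid == cred->euid)
			return true;
		ns = ns->parent;
	}
}

/* capable() is modeled as ns_capable() against the initial namespace. */
static bool model_capable(const struct cred *cred,
			  const struct user_ns *init_ns, int cap)
{
	return model_ns_capable(cred, init_ns, cap);
}
```

In this model a task that holds every capability inside a child user
namespace still fails model_capable(..., CAP_BPF), because the check
targets the initial namespace and the walk stops as soon as the target
is not a descendant of the task's own namespace.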

> Now, thanks to Jonathan, I get that there was a heated discussion 20
> years ago about authoritative vs restrictive LSMs. But if I read a
> summary at that time ([0]), authoritative hooks were not out of the
> question *in principle*. Surely, "walk before we can run" makes sense,
> but it's been a while ago.

Certainly. The SGI comment was mine, by the way. I wanted authoritative
hooks for cases like POSIX ACLs and systems without root. While I would
have liked the decision to go the other way, there's no way I would endorse
a hybrid, where some hooks are restrictive and others authoritative.

>   [0] https://lwn.net/2001/1108/a/no-auth-hooks.php3
>
>
>> Here is a fun story which seems relevant ... in the early days of
>> SELinux, one of the community devs set up a system with a SELinux
>> policy which restricted all privileged operations from the root user,
>> put the system on a publicly accessible network, posted the root
>> password for all to see, and invited the public to login to the system
>> and attempt to exercise root privilege (it's been well over 10 years
>> at this point so the details are a bit fuzzy).  Granted, there were
>> some hiccups in the beginning, mostly due to the crude state of policy
>> development/analysis at the time, but after a few policy revisions the
>> system held up quite well.
> Honest question out of curiosity: was the intent to demonstrate that
> with LSM one can completely restrict root? Or that root was actually
> allowed to do something useful? Because I can see how rejecting
> everything would be rather simple, but actually pretty useless in
> practice. Restricting only part of the power of the root, while still
> allowing it to do something useful in production seems like a much
> harder (but way more valuable) endeavor. Not saying it's impossible,
> but see my example about missing 21st new CAP_BPF functionality.

Capabilities are sufficient to implement a rootless system. It's been done.
Someone will point out that CAP_SYS_ADMIN is effectively root, and there's
some truth to that.

>> On the more practical side of things, there are several use cases
>> which require, by way of legal or contractual requirements, that full
>> root/admin privileges are decomposed into separate roles: security
>> admin, audit admin, backup admin, etc.  These users satisfy these
>> requirements by using LSMs, such as SELinux, to restrict the
>> administrative capabilities based on the SELinux user/role/domain.
>>
>>> By the way, even the above proposal of yours doesn't work for
>>> production use cases when user namespaces are involved, as far as I
>>> understand. We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for
>>> containers running inside user namespaces, as CAP_BPF in non-init
>>> namespace is not enough for bpf() syscall to allow loading BPF maps or
>>> BPF program ...
>> Once again, the LSM has always intended to be a restrictive mechanism,
>> not a privilege granting mechanism.  If an operation is not possible
> Not according to [0] above:
>
>   > It is our belief that these changes do not belong in the initial version of
>   > LSM (especially given our limited charter and original goals), and should
>   > be proposed as incremental refinements after LSM has been initially
>   > accepted.
>   > ...
>   > It is our belief that the current LSM
>   > will provide a meaningful improvement in the security infrastructure of the
>   > Linux kernel, and that there is plenty of room for future expansion of LSM
>   > in subsequent phases.
>
> I don't see "always intended to be a restrictive mechanism" there.

Having been on the other side of the argument, the system that was accepted
was in fact "always intended to be a restrictive mechanism". The quote above
is a "never say never" statement.

>> without the LSM layer enabled, it should not be possible with the LSM
>> layer enabled.  The LSM is not a mechanism to circumvent other access
>> control mechanisms in the kernel.
> I understand, but it's not like we are proposing to go and bypass all
> kinds of random kernel security mechanisms. These are targeted hooks,
> developed by the BPF community for the BPF subsystem to allow trusted
> unprivileged production use cases. Yes, we currently rely on checking
> CAP_BPF to grant more dangerous/advanced features, but it's because we
> can't just allow any unprivileged process to do this. But what we
> really want is to answer the question "can we trust this process to
> use this advanced functionality", and if there is no specific LSM
> policy that cares one way (allow) or the other (disallow), fallback to
> CAP_BPF enforcement.
>
> So it's not bypassing kernel checks, but rather augmenting them with
> more flexible and customizable mechanisms, while still falling back to
> CAP_BPF if the user didn't install any custom LSM policy.

That would make CAP_BPF behave differently from all other capabilities.
Capabilities are hard enough to use correctly as it is. If each capability
defined its own semantics they would be completely unusable.
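
The difference in semantics can be sketched as two decision functions.
This is a userspace model under stated assumptions, not code from the
series: the enum hook_verdict values and both function names are made
up for illustration, with a positive verdict standing in for the
"bypass" return value described for security_bpf_map_create() earlier
in the thread.

```c
#include <stdbool.h>

/* Hypothetical tri-state result of an LSM hook. */
enum hook_verdict {
	HOOK_DENY = -1,		/* policy refuses the operation */
	HOOK_NO_OPINION = 0,	/* no policy cares either way */
	HOOK_ALLOW = 1,		/* policy vouches for the caller */
};

/* Semantics as proposed in the series: a positive verdict replaces
 * the capability check, and "no opinion" falls back to CAP_BPF. */
static bool proposed_decision(enum hook_verdict v, bool has_cap_bpf)
{
	if (v < 0)
		return false;
	if (v > 0)
		return true;
	return has_cap_bpf;
}

/* Conventional restrictive semantics: the capability is always
 * required and the hook can only add a denial on top of it. */
static bool restrictive_decision(enum hook_verdict v, bool has_cap_bpf)
{
	return has_cap_bpf && v >= 0;
}
```

The two functions differ in exactly one case: a positive verdict for a
caller without CAP_BPF. That single case is what makes the capability
behave differently from the others.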

>>> Also, in previous email you said:
>>>
>>>> Simply because there is another kernel access control mechanism which
>>>> allows a capability check to be skipped doesn't mean I want to allow a
>>>> LSM hook to be used to skip a capability check.
>>> I understand your stated position, but can you please help me
>>> understand the reasoning behind it?
>> Keeping the LSM as a restrictive access control mechanism helps ensure
>> some level of sanity and consistency across different Linux
>> installations.  If a certain operation requires CAP_SYS_ADMIN on one
>> Linux system, it should require CAP_SYS_ADMIN on another Linux system.
>> Granted, a LSM running on one system might impose additional
>> constraints on that operation, but the CAP_SYS_ADMIN requirement still
>> applies.
>>
>> There is also an issue of safety in knowing that enabling a LSM will
>> not degrade the access controls on a system by potentially granting
>> operations that were previously denied.
>>
>>> Does the above also mean that you'd be fine if we just don't plug into
>>> the LSM subsystem at all and instead come up with some ad-hoc solution
>>> to allow effectively the same policies? This sounds detrimental both
>>> to LSM and BPF subsystems, so I hope we can talk this through before
>>> finalizing decisions.
>> Based on your patches and our discussion, it seems to me that the
>> problem you are trying to resolve is related more to the
>> capability-based access controls in the eBPF, and possibly other
>> kernel subsystems, and not any LSM-based restrictions.  I'm happy to
>> work with you on a solution involving the LSM, but please understand
>> that I'm not going to support a solution which changes a core
>> philosophy of the LSM layer.
> Great, I'd really appreciate help and suggestions on how to solve the
> following problem.
>
> We have a BPF subsystem that allows loading BPF programs. Those BPF
> programs cannot be contained within a particular namespace just by its
> system-wide tracing nature (it can safely read kernel and user memory
> and we can't restrict whether that memory belongs to a particular
> namespace), so it's like CAP_SYS_TIME, just with much broader API
> surface.

This doesn't sound like a problem, it sounds like BPF is explicitly
designed to prevent interference by namespaces. But in some cases you
now want to limit it by namespaces.

It appears that the desired uses of BPF are no longer compatible with
its original security model. That's unfortunate, and likely to require
a significant change to the implementation of BPF.

>
> The other piece of a puzzle is user namespaces. We do want to run
> applications inside user namespaces, but allow them to use BPF
> programs. As far as I can tell, there is no way to grant real CAP_BPF
> that will be recognized by capable(CAP_BPF) (not ns_capable, see above
> about system-wide nature of BPF). If there is, please help me
> understand how. All my local experiments failed, and looking at
> cap_capable() implementation it is not intended to even check the
> initial namespace's capability if the process is running in the user
> namespace.
>
>
> So, given that a) we can't make CAP_BPF namespace-aware and b) we
> can't grant real CAP_BPF to processes in user namespace, how could we
> allow user namespaced applications to do useful work with BPF?
>
>>> Lastly, you mentioned before:
>>>
>>>>>> I think we need to make this more concrete; we don't have a pattern in
>>>>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
>>> Unfortunately I don't have enough familiarity with all LSM hooks, so I
>>> can't confirm or disprove the above statement. But earlier someone
>>> brought to my attention the case of security_vm_enough_memory_mm(),
>>> which seems to be granting effectively CAP_SYS_ADMIN for the purposes
>>> of memory accounting. Am I missing something subtle there or does it
>>> grant effective caps indeed?
>> Some of the comments around that hook can be misleading, but if you
>> look at the actual code it starts to make more sense.
>>
> [...]
>
>> I do agree that the security_vm_enough_memory() hook is structured a
>> bit differently than most of the other LSM hooks, but it still
>> operates with the same philosophy: a LSM should only be allowed to
>> restrict access, a LSM should never be allowed to grant access that
>> would otherwise be denied by the traditional Linux access controls.
>>
>> Hopefully that explanation makes sense, but if things are still a bit
>> fuzzy I would encourage you to go look at the code, I'm sure it will
>> make sense once you spend a few minutes figuring out how it works.
>>
> Yep, thanks a lot, it's way more clear after grokking relevant pieces
> of the LSM code you pointed out and LSM infrastructure in general.
> "capabilities" LSM is non-negotiable, so it effectively always
> restricts a small subset of hooks, including vm_enough_memory and
> capable.
>
> Still, the problem stands. How do we marry BPF and user
> namespaces? I'd really appreciate suggestions. Thank you!
>
>
>> [1] There is a long and sorta bizarre history with the capability LSM,
>> but just understand it is a bit "special" in many ways, and those
>> "special" behaviors are intentional.
>>
>> --
>> paul-moore.com


* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-18  0:28                   ` Andrii Nakryiko
@ 2023-04-18  0:52                     ` Casey Schaufler
  0 siblings, 0 replies; 52+ messages in thread
From: Casey Schaufler @ 2023-04-18  0:52 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Paul Moore, Kees Cook, Andrii Nakryiko, bpf, ast, daniel,
	kpsingh, linux-security-module, Casey Schaufler

On 4/17/2023 5:28 PM, Andrii Nakryiko wrote:
> On Mon, Apr 17, 2023 at 4:53 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>> ...
>>
>> The BPF developers are in complete control of what CAP_BPF controls.
>> You can easily address the granularity issue by adding additional restrictions
>> on processes that have CAP_BPF. That is the intended use of LSM.
>> The whole point of having multiple capabilities is so that you can
>> grant just those that are required by the system security policy, and
>> do so safely. That leads to differences of opinion regarding the definition
>> of the system security policy. BPF chose to set itself up as an element
>> of security policy (you need CAP_BPF) rather than define elements such that
>> existing capabilities (CAP_FOWNER, CAP_KILL, CAP_MAC_OVERRIDE, ...) would
>> control.
>>> Please see my reply to Paul, where I explain CAP_BPF's system-wide
>>> nature and problem with user namespaces. I don't think the problem is
>>> in the granularity of CAP_BPF, it's more of a "non-namespaceable"
>>> nature of the BPF subsystem in general.
>> Paul is approaching this from a different angle. Your response to Paul
>> does not address the issue I have raised.
> I see, I definitely missed this. Re-reading your reply, I still am not
> clear on what you are proposing, tbh. Can you please elaborate what
> you have in mind?

As requested, I've moved over to the "other" thread.



* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-17 23:29                   ` Andrii Nakryiko
  2023-04-18  0:47                     ` Casey Schaufler
@ 2023-04-18 14:21                     ` Paul Moore
  2023-04-21  0:00                       ` Andrii Nakryiko
  1 sibling, 1 reply; 52+ messages in thread
From: Paul Moore @ 2023-04-18 14:21 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Kees Cook, Andrii Nakryiko, bpf, ast, daniel, kpsingh,
	linux-security-module

On Mon, Apr 17, 2023 at 7:29 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Thu, Apr 13, 2023 at 8:11 AM Paul Moore <paul@paul-moore.com> wrote:
> > On Thu, Apr 13, 2023 at 1:16 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > > On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > > > On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > > > On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> > > > > > > > On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > > > > > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > > > > > > > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > >
> > > > ...
> > > >
> > > > > > > > > For example, in many places we have things like:
> > > > > > > > >
> > > > > > > > >         if (!some_check(...) && !capable(...))
> > > > > > > > >                 return -EPERM;
> > > > > > > > >
> > > > > > > > > I would expect this is a similar logic. An operation can succeed if the
> > > > > > > > > > access control requirement is met. The mismatch we have throughout the
> > > > > > > > > > kernel is that capability checks aren't strictly done by LSM hooks. And
> > > > > > > > > > this series conceptually, I think, doesn't violate that -- it's changing
> > > > > > > > > > the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> > > > > > > > > yet here).
> > > > > > > >
> > > > > > > > Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> > > > > > > > when it returns a positive value "bypasses kernel checks".  The patch
> > > > > > > > isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> > > > > > > > based on a eBPF tree, so I can't say with 100% certainty that it is
> > > > > > > > bypassing a capability check, but the description claims that to be
> > > > > > > > the case.
> > > > > > > >
> > > > > > > > Regardless of how you want to spin this, I'm not supportive of a LSM
> > > > > > > > hook which allows a LSM to bypass a capability check.  A LSM hook can
> > > > > > > > be used to provide additional access control restrictions beyond a
> > > > > > > > capability check, but a LSM hook should never be allowed to overrule
> > > > > > > > an access denial due to a capability check.
> > > > > > > >
> > > > > > > > > The reason CAP_BPF was created was because there was nothing else that
> > > > > > > > > would be fine-grained enough at the time.
> > > > > > > >
> > > > > > > > The LSM layer predates CAP_BPF, and one could make a very solid
> > > > > > > > argument that one of the reasons LSMs exist is to provide
> > > > > > > > supplementary controls due to capability-based access controls being a
> > > > > > > > poor fit for many modern use cases.
> > > > > > >
> > > > > > > I generally agree with what you say, but we DO have this code pattern:
> > > > > > >
> > > > > > >          if (!some_check(...) && !capable(...))
> > > > > > >                  return -EPERM;
> > > > > >
> > > > > > I think we need to make this more concrete; we don't have a pattern in
> > > > > > the upstream kernel where 'some_check(...)' is a LSM hook, right?
> > > > > > Simply because there is another kernel access control mechanism which
> > > > > > allows a capability check to be skipped doesn't mean I want to allow a
> > > > > > LSM hook to be used to skip a capability check.
> > > > >
> > > > > This work is an attempt to tighten the security of production systems
> > > > > by allowing to drop too coarse-grained and permissive capabilities
> > > > > > (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> > > > > than production use cases are meant to be able to do) and then grant
> > > > > specific BPF operations on specific BPF programs/maps based on custom
> > > > > LSM security policy, which validates application trustworthiness using
> > > > > custom production-specific logic.
> > > >
> > > > There are ways to leverage the LSMs to apply finer grained access
> > > > control on top of the relatively coarse capabilities that do not
> > > > require circumventing those capability controls.  One grants the
> > > > capabilities, just as one would do today, and then leverages the
> > > > security functionality of a LSM to further restrict specific users,
> > > > applications, etc. with a level of granularity beyond that offered by
> > > > the capability controls.
> > >
> > > Please help me understand something. What you and Casey are proposing,
> > > when taken to the logical extreme, is to grant to all processes root
> > > permissions and then use LSM to restrict specific actions, do I
> > > understand correctly? This strikes me as a less secure and more
> > > error-prone way of doing things.
> >
> > When taken to the "logical extreme" most concepts end up sounding a
> > bit absurd, but that was the point, wasn't it?
>
> Wasn't my intent to make it sound absurd, sorry. The way I see it, for
> the sake of example, let's say CAP_BPF allows 20 different operations
> (each with its own security_xxx hook). And let's say in production I
> want to only allow 3 of them. Sure, technically it should be possible
> to deny access at 17 hooks and let it through in just those 3. But if
> someone adds a 21st and I forget to add a 21st restriction, that would be
> bad (but very probable with such an approach).

Welcome to the challenges of maintaining access controls within the
Linux Kernel, LSM or otherwise.  As we all know, the Linux Kernel
moves forward at a staggering pace sometimes, and it is not uncommon
for new features/subsystems to be added without consulting all of the
different folks who worry about access controls.  In many cases it can
be a simple misunderstanding, but in some cases it's a willful
rejection of a particular form of access control, the LSM being a
prime example.  Thankfully in almost all of those cases we have been
moderately successful in retrofitting the necessary access controls,
sometimes they are not as good/capable/granular/etc. as we would like
because of design limitations, but such is life.

I say this not because I believe this is a valid argument for
authoritative LSM hooks, I say this simply to acknowledge that this
*is* a problem.

> So my point is that for situations like this, dropping CAP_BPF, but
> allowing only 3 hooks to proceed seems a safer approach, because if we
> add 21st hook, it will safely be denied without CAP_BPF *by default*.
> That's what I tried to point out.

I believe I understand your point, I just disagree with you on
accepting authoritative LSM hooks in the upstream Linux Kernel; I
believe it would be a *big* mistake to move away from the restrictive
LSM hook philosophy at this point in time.

> But even if we ignore this "safe by default when a new hook is added"
> behavior, when taking user namespaces into account, the restrictive
> LSM approach just doesn't seem to work at all for something like
> CAP_BPF. CAP_BPF cannot be "namespaced", just like, say, CAP_SYS_TIME,
> because we cannot ensure that a given BPF program won't access kernel
> state "belonging" to another process (as one example).

Once again, the root of this problem lies in the capabilities and/or
namespace mechanisms, not the LSM; if you want to fix this properly
you should be looking at how eBPF leverages capabilities for access
control.  Changing the very core behavior of the LSM layer in order to
work around an issue with another access control mechanism is a
non-starter.  I can't say this enough.

> Now, thanks to Jonathan, I get that there was a heated discussion 20
> years ago about authoritative vs restrictive LSMs. But if I read a
> summary at that time ([0]), authoritative hooks were not out of the
> question *in principle*. Surely, "walk before we can run" makes sense,
> but that was a while ago.

... and once again, the restrictive approach has proven to work
reasonably well over the past ~20 years, why would we abandon that
simply to work around a problem with a different access control
mechanism.  Don't break the LSM layer to fix something else.

> > Here is a fun story which seems relevant ... in the early days of
> > SELinux, one of the community devs set up a system with a SELinux
> > policy which restricted all privileged operations from the root user,
> > put the system on a publicly accessible network, posted the root
> > password for all to see, and invited the public to login to the system
> > and attempt to exercise root privilege (it's been well over 10 years
> > at this point so the details are a bit fuzzy).  Granted, there were
> > some hiccups in the beginning, mostly due to the crude state of policy
> > development/analysis at the time, but after a few policy revisions the
> > system held up quite well.
>
> Honest question out of curiosity: was the intent to demonstrate that
> with LSM one can completely restrict root? Or that root was actually
> allowed to do something useful?

The intent was to show that it is possible to restrict
capability-based access controls with the LSM layer; it was the best
example of the "logical extreme" carried out in the real world that I
could think of when writing my response.

> > On the more practical side of things, there are several use cases
> > which require, by way of legal or contractual requirements, that full
> > root/admin privileges are decomposed into separate roles: security
> > admin, audit admin, backup admin, etc.  These users satisfy these
> > requirements by using LSMs, such as SELinux, to restrict the
> > administrative capabilities based on the SELinux user/role/domain.
> >
> > > By the way, even the above proposal of yours doesn't work for
> > > production use cases when user namespaces are involved, as far as I
> > > understand. We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for
> > > containers running inside user namespaces, as CAP_BPF in non-init
> > > namespace is not enough for bpf() syscall to allow loading BPF maps or
> > > BPF program ...
> >
> > Once again, the LSM has always been intended to be a restrictive mechanism,
> > not a privilege granting mechanism.  If an operation is not possible
>
> Not according to [0] above:

When one considers what has been present in Linus' tree, then yes.
The idea of authoritative LSM hooks has been rejected for ~20 years
and I've seen nothing in this thread to make me believe that we should
change that now, and for this use case.

> > Based on your patches and our discussion, it seems to me that the
> > problem you are trying to resolve is related more to the
> > capability-based access controls in the eBPF, and possibly other
> > kernel subsystems, and not any LSM-based restrictions.  I'm happy to
> > work with you on a solution involving the LSM, but please understand
> > that I'm not going to support a solution which changes a core
> > philosophy of the LSM layer.
>
> Great, I'd really appreciate help and suggestions on how to solve the
> following problem.
>
> We have a BPF subsystem that allows loading BPF programs. Those BPF
> programs cannot be contained within a particular namespace just by its
> system-wide tracing nature (it can safely read kernel and user memory
> and we can't restrict whether that memory belongs to a particular
> namespace), so it's like CAP_SYS_TIME, just with much broader API
> surface.
>
> The other piece of a puzzle is user namespaces. We do want to run
> applications inside user namespaces, but allow them to use BPF
> programs. As far as I can tell, there is no way to grant real CAP_BPF
> that will be recognized by capable(CAP_BPF) (not ns_capable, see above
> about system-wide nature of BPF). If there is, please help me
> understand how. All my local experiments failed, and looking at
> cap_capable() implementation it is not intended to even check the
> initial namespace's capability if the process is running in the user
> namespace.
>
> So, given that a) we can't make CAP_BPF namespace-aware and b) we
> can't grant real CAP_BPF to processes in user namespace, how could we
> allow user namespaced applications to do useful work with BPF?

I would start by talking with the user namespace folks.  I may be
misunderstanding the problem as you've described it, but it seems like
the core issue is how capabilities, specifically CAP_BPF, are handled
in user namespaces.  To be honest, I'm not sure how much luck you'll
have there, but you stand a better chance in changing how capabilities
are handled across user namespaces than you do in getting an
authoritative LSM hook merged.

Regardless, my offer still stands, if you have a solution which sticks
to a restrictive LSM model, I'm happy to work with you further to sort
out the details and try to make that work.  I don't have any great
ideas there at the moment, but there are plenty of smart people on
this mailing list and others who might have something clever in mind.

-- 
paul-moore.com


* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-17 23:31       ` Andrii Nakryiko
@ 2023-04-19 10:53         ` Dr. Greg
  0 siblings, 0 replies; 52+ messages in thread
From: Dr. Greg @ 2023-04-19 10:53 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Kees Cook, Paul Moore, Andrii Nakryiko, bpf, ast, daniel,
	kpsingh, linux-security-module

On Mon, Apr 17, 2023 at 04:31:31PM -0700, Andrii Nakryiko wrote:

Hi, I hope the week is going well for everyone.

> On Fri, Apr 14, 2023 at 1:24 PM Dr. Greg <greg@enjellic.com> wrote:
> >
> > On Wed, Apr 12, 2023 at 10:47:13AM -0700, Kees Cook wrote:
> >
> > Hi, I hope the week is ending well for everyone.
> >
> > > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > > >
> > > > > Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> > > > > are meant to allow highly-granular LSM-based control over the usage of BPF
> > > > > subsystem. Specifically, to control the creation of BPF maps and BTF data
> > > > > objects, which are fundamental building blocks of any modern BPF application.
> > > > >
> > > > > These new hooks are able to override default kernel-side CAP_BPF-based (and
> > > > > sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> > > > > implement LSM policies that could granularly enforce more restrictions on
> > > > > a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> > > > > capabilities), but also, importantly, allow to *bypass kernel-side
> > > > > enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> > > > > cases.
> > > >
> > > > One of the hallmarks of the LSM has always been that it is
> > > > non-authoritative: it cannot unilaterally grant access, it can only
> > > > restrict what would have been otherwise permitted on a traditional
> > > > Linux system.  Put another way, a LSM should not undermine the Linux
> > > > discretionary access controls, e.g. capabilities.
> > > >
> > > > If there is a problem with the eBPF capability-based access controls,
> > > > that problem needs to be addressed in how the core eBPF code
> > > > implements its capability checks, not by modifying the LSM mechanism
> > > > to bypass these checks.
> >
> > > I think semantics matter here. I wouldn't view this as _bypassing_
> > > capability enforcement: it's just more fine-grained access control.
> > >
> > > For example, in many places we have things like:
> > >
> > >       if (!some_check(...) && !capable(...))
> > >               return -EPERM;
> > >
> > > I would expect this is a similar logic. An operation can succeed if the
> > > access control requirement is met. The mismatch we have throughout the
> > > kernel is that capability checks aren't strictly done by LSM hooks. And
> > > this series conceptually, I think, doesn't violate that -- it's changing
> > > the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> > > yet here).
> > >
> > > The reason CAP_BPF was created was because there was nothing else that
> > > would be fine-grained enough at the time.

> > This was one of the issues, among others, that the TSEM LSM we are
> > working to upstream, was designed to address and may be an avenue
> > forward.
> >
> > TSEM, being narratival rather than deontologically based, provides a
> > framework for security permissions that are based on a
> > characterization of the event itself.  So the permissions are as
> > variable as the contents of whatever BPF related information is passed
> > to the bpf* LSM hooks [1].
> >
> > Currently, the tsem_bpf_* hooks are generically modeled.  We would
> > certainly entertain any discussion or suggestions as to what elements
> > of the structures passed to the hooks would be useful with respect
> > to establishing security policies useful and appropriate to the BPF
> > community.

> Could you please provide some links to get a bit more context and
> information? I'd like to understand at least "narratival rather than
> deontologically based" part of this.

We don't have much in the way of links, hopefully some simple prose
will be helpful.

'Narratival vs deontological' contrasts the logic philosophy that is
being used in the design of a security architecture.

Deontological implies that the security architecture is 'rules' based.
A concept embraced by the classic mandatory access control
architectures such as SELinux.

Narratival, the logic predicate embraced by TSEM, implies that the
security architecture is events based and is constructed from a
narration of a known good workload by unit testing.

At the risk of indulging in further philosophical wonkiness, the two
bodies of logic arise from the contrasting philosophies espoused by
Immanuel Kant and Georg Wilhelm Friedrich Hegel.  It is somewhat less
precise, but a security architecture that is rules based would be
considered 'Kantian' motivated while an events based architecture
would be considered 'Hegelian' inspired.

So, departing from epistemology, what does all of this mean with
respect to security?

In a policy based architecture, the security decision is a product of
the rules, in the case of SELinux a rather complex corpus, that have
been established to regulate the interaction of a role, subject and
object label.

In an events based architecture, the security decision is a product of
the characteristics of the event.  From a granularity perspective,
which seems to be an issue in this BPF/BTF discussion, the granularity
of the security decision can be as variable as any of the characteristics
that are used to describe the LSM event at the operating system level.

In TSEM, the characteristics of the event are used to generate a unique
numeric coefficient specific to the event.  The TSEM documentation
discusses the functional generation of these coefficients.

In the case of the three bpf LSM hooks that are in 6.5, this would be
any of the characteristics embodied in the following variables.

bpf command
bpf_attributes
bpf_map
fmode_t
bpf_prog

With respect to your problem at hand; Paul Moore suggested elsewhere
in this thread that there were smart people hanging around on the list
that might be able to comment on the challenge of CAP_BPF lacking
granularity and being unavailable in a user namespace.

I can't claim to be very smart, but I did hook up the big screen TV
at our lake place in west-central Minnesota and it worked the first
time, so here goes some thoughts.

I can't claim a great deal of experience with BPF, but I'm assuming
that any of the characteristics above, or that would be passed to the
proposed BPF LSM hooks, would embody sufficient information about a
BPF program to fully characterize it from a security perspective.

I'm also assuming that the BPF implementation in the Linux kernel is
now sufficiently featureful for a BPF program to assist in making a
security decision by analyzing any of the attributes passed to an LSM
hook for a subsequent and subordinate BPF program.

We currently don't have support in TSEM for connecting a BPF program
to an in kernel Trusted Modeling Agent (TMA), but it is on our radar
screen, desperately seeking attention cycles.  With such hypothetical
support in place, I would propose gating the ability to attach a BPF
program to a TMA with CAP_BPF.  Said program would then assume the
role of assisting the TMA in generating the security coefficients for
subsequent BPF related security events in the modeling namespace.

At that point, the security behavior of subsequent BPF programs will
be under the control of the security model being run by the TMA
assigned to that security namespace.  It can be as granular and
restrictive as any security characteristics that would be described as
being relevant to BPF.

From a security perspective, you don't write any security policy, you
unit test the BPF application and the trust orchestrator generates the
security model that would be subsequently enforced.

With this model, you don't override any existing security controls and
the LSM implementation remains purely restrictive.  CAP_BPF regulates
whether the BPF infrastructure can be accessed and BPF itself becomes
responsible for defining the permissible security behavior of any
subordinate BPF applications.

There are undoubtedly considerations needed in the BPF implementation
to support this model but I haven't had time to look at those
particulars.

There is further discussion of the concepts involved in the 18+ page
documentation file that was included in the V0 release of TSEM.  Here
is the lore link for the original series:

https://lore.kernel.org/linux-security-module/20230204050954.11583-1-greg@enjellic.com/#t

The V1 release, currently being finalized, is a significantly enhanced
implementation but the architectural and security concepts discussed
are all still relevant, if there is a desire to dig into this further.

With respect to the thinking and writings of Kant and Hegel, Wikipedia
is your friend.... :-)

To conclude in a big picture context, if it hasn't already jumped out
at people: while TSEM operates practically from a narratival design
perspective, it is designed to do so by applying either deterministic
or machine learning models to the characterization and enforcement of
the security behavior of a platform.

The reason we have a somewhat intense interest in BPF is that HIDS
based machine learning models need to do characteristic screening in
order to be properly trained for anomaly detection.  BPF is a pathway
to achieving this with a single kernel based trusted modeling agent
implementation.

Now, back to figuring out how to hook up the stereo/hifi.

Have a good remainder of the week.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity


* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-18  0:47                     ` Casey Schaufler
@ 2023-04-21  0:00                       ` Andrii Nakryiko
  0 siblings, 0 replies; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-21  0:00 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Paul Moore, Kees Cook, Andrii Nakryiko, bpf, ast, daniel,
	kpsingh, linux-security-module

On Mon, Apr 17, 2023 at 5:48 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 4/17/2023 4:29 PM, Andrii Nakryiko wrote:
> > On Thu, Apr 13, 2023 at 8:11 AM Paul Moore <paul@paul-moore.com> wrote:
> >> On Thu, Apr 13, 2023 at 1:16 AM Andrii Nakryiko
> >> <andrii.nakryiko@gmail.com> wrote:
> >>> On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@paul-moore.com> wrote:
> >>>> On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
> >>>> <andrii.nakryiko@gmail.com> wrote:
> >>>>> On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> >>>>>> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> >>>>>>> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> >>>>>>>> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> >>>>>>>>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> >>>>>>>>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >>>> ...
> >>>>
> >>>>>>>>> For example, in many places we have things like:
> >>>>>>>>>
> >>>>>>>>>         if (!some_check(...) && !capable(...))
> >>>>>>>>>                 return -EPERM;
> >>>>>>>>>
> >>>>>>>>> I would expect this is a similar logic. An operation can succeed if the
> >>>>>>>>> access control requirement is met. The mismatch we have throughout the
> >>>>>>>>> kernel is that capability checks aren't strictly done by LSM hooks. And
> >>>>>>>>> this series conceptually, I think, doesn't violate that -- it's changing
> >>>>>>>>> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> >>>>>>>>> yet here).
> >>>>>>>> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> >>>>>>>> when it returns a positive value "bypasses kernel checks".  The patch
> >>>>>>>> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> >>>>>>>> based on a eBPF tree, so I can't say with 100% certainty that it is
> >>>>>>>> bypassing a capability check, but the description claims that to be
> >>>>>>>> the case.
> >>>>>>>>
> >>>>>>>> Regardless of how you want to spin this, I'm not supportive of a LSM
> >>>>>>>> hook which allows a LSM to bypass a capability check.  A LSM hook can
> >>>>>>>> be used to provide additional access control restrictions beyond a
> >>>>>>>> capability check, but a LSM hook should never be allowed to overrule
> >>>>>>>> an access denial due to a capability check.
> >>>>>>>>
> >>>>>>>>> The reason CAP_BPF was created was because there was nothing else that
> >>>>>>>>> would be fine-grained enough at the time.
> >>>>>>>> The LSM layer predates CAP_BPF, and one could make a very solid
> >>>>>>>> argument that one of the reasons LSMs exist is to provide
> >>>>>>>> supplementary controls due to capability-based access controls being a
> >>>>>>>> poor fit for many modern use cases.
> >>>>>>> I generally agree with what you say, but we DO have this code pattern:
> >>>>>>>
> >>>>>>>          if (!some_check(...) && !capable(...))
> >>>>>>>                  return -EPERM;
> >>>>>> I think we need to make this more concrete; we don't have a pattern in
> >>>>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> >>>>>> Simply because there is another kernel access control mechanism which
> >>>>>> allows a capability check to be skipped doesn't mean I want to allow a
> >>>>>> LSM hook to be used to skip a capability check.
> >>>>> This work is an attempt to tighten the security of production systems
> >>>>> by allowing to drop too coarse-grained and permissive capabilities
> >>>>> (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> >>>>> than production use cases are meant to be able to do) and then grant
> >>>>> specific BPF operations on specific BPF programs/maps based on custom
> >>>>> LSM security policy, which validates application trustworthiness using
> >>>>> custom production-specific logic.
> >>>> There are ways to leverage the LSMs to apply finer grained access
> >>>> control on top of the relatively coarse capabilities that do not
> >>>> require circumventing those capability controls.  One grants the
> >>>> capabilities, just as one would do today, and then leverages the
> >>>> security functionality of a LSM to further restrict specific users,
> >>>> applications, etc. with a level of granularity beyond that offered by
> >>>> the capability controls.
> >>> Please help me understand something. What you and Casey are proposing,
> >>> when taken to the logical extreme, is to grant to all processes root
> >>> permissions and then use LSM to restrict specific actions, do I
> >>> understand correctly? This strikes me as a less secure and more
> >>> error-prone way of doing things.
> >> When taken to the "logical extreme" most concepts end up sounding a
> >> bit absurd, but that was the point, wasn't it?
> > Wasn't my intent to make it sound absurd, sorry. The way I see it, for
> > the sake of example, let's say CAP_BPF allows 20 different operations
> > (each with its own security_xxx hook). And let's say in production I
> > want to only allow 3 of them. Sure, technically it should be possible
> > to deny access at 17 hooks and let it through in just those 3. But if
> > someone adds a 21st and I forget to add a 21st restriction, that would be
> > bad (but very probable with such an approach).
>
> That would be a flaw in the implementation of the 21st, not a problem
> with the capabilities or LSM model. For the LSM model to be sufficiently
> flexible it cannot be required to prevent or detect coding errors.
>
> > So my point is that for situations like this, dropping CAP_BPF, but
> > allowing only 3 hooks to proceed seems a safer approach, because if we
> > add 21st hook, it will safely be denied without CAP_BPF *by default*.
> > That's what I tried to point out.
>
> When you're creating security relevant or enforcing mechanisms there has
> to be a level of expectation regarding the care with which they're
> developed. My expectation is that the 21st hook won't go in without
> adequate review.
>

That's not how it works with BPF LSM, but there is no point in arguing
about this. I agree that LSM shouldn't be prevented from adding new
hooks just because of some particular LSM implementation.

> > But even if we ignore this "safe by default when a new hook is added"
> > behavior, when taking user namespaces into account, the restrictive
> > LSM approach just doesn't seem to work at all for something like
> > CAP_BPF. CAP_BPF cannot be "namespaced", just like, say, CAP_SYS_TIME,
> > because we cannot ensure that a given BPF program won't access kernel
> > state "belonging" to another process (as one example).
>
> Time namespaces have been proposed. I would be surprised if there aren't
> people working on BPF namespaces somewhere. There's a difference between
> "can't" and "haven't been".
>

It really is "can't" for BPF, as it allows tracing of kernel internals.

> > Now, thanks to Jonathan, I get that there was a heated discussion 20
> > years ago about authoritative vs restrictive LSMs. But if I read a
> > summary at that time ([0]), authoritative hooks were not out of the
> > question *in principle*. Surely, "walk before we can run" makes sense,
> > but it's been a while ago.
>
> Certainly. The SGI comment was mine, by the way. I wanted authoritative
> hooks for cases like POSIX ACLs and systems without root. While I would
> have liked the decision to go the other way, there's no way I would endorse
> a hybrid, where some hooks are restrictive and others authoritative.
>

Yep, saw your comments as well. Can't say I get what would be wrong
with having authoritative hooks together with restrictive ones, but oh
well.


> >   [0] https://lwn.net/2001/1108/a/no-auth-hooks.php3
> >
> >
> >> Here is a fun story which seems relevant ... in the early days of
> >> SELinux, one of the community devs setup up a system with a SELinux
> >> policy which restricted all privileged operations from the root user,
> >> put the system on a publicly accessible network, posted the root
> >> password for all to see, and invited the public to login to the system
> >> and attempt to exercise root privilege (it's been well over 10 years
> >> at this point so the details are a bit fuzzy).  Granted, there were
> >> some hiccups in the beginning, mostly due to the crude state of policy
> >> development/analysis at the time, but after a few policy revisions the
> >> system held up quite well.
> > Honest question out of curiosity: was the intent to demonstrate that
> > with LSM one can completely restrict root? Or that root was actually
> > allowed to do something useful? Because I can see how rejecting
> > everything would be rather simple, but actually pretty useless in
> > practice. Restricting only part of the power of the root, while still
> > allowing it to do something useful in production seems like a much
> > harder (but way more valuable) endeavor. Not saying it's impossible,
> > but see my example about missing 21st new CAP_BPF functionality.
>
> Capabilities are sufficient to implement a rootless system. It's been done.
> Someone will point out that CAP_SYS_ADMIN is effectively root, and there's
> some truth to that.
>
> >> On the more practical side of things, there are several use cases
> >> which require, by way of legal or contractual requirements, that full
> >> root/admin privileges are decomposed into separate roles: security
> >> admin, audit admin, backup admin, etc.  These users satisfy these
> >> requirements by using LSMs, such as SELinux, to restrict the
> >> administrative capabilities based on the SELinux user/role/domain.
> >>
> >>> By the way, even the above proposal of yours doesn't work for
> >>> production use cases when user namespaces are involved, as far as I
> >>> understand. We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for
> >>> containers running inside user namespaces, as CAP_BPF in non-init
> >>> namespace is not enough for bpf() syscall to allow loading BPF maps or
> >>> BPF program ...
> >> Once again, the LSM has always intended to be a restrictive mechanism,
> >> not a privilege granting mechanism.  If an operation is not possible
> > Not according to [0] above:
> >
> >   > It is our belief that these changes do not belong in the initial version of
> >   > LSM (especially given our limited charter and original goals), and should
> >   > be proposed as incremental refinements after LSM has been initially
> >   > accepted.
> >   > ...
> >   > It is our belief that the current LSM
> >   > will provide a meaningful improvement in the security infrastructure of the
> >   > Linux kernel, and that there is plenty of room for future expansion of LSM
> >   > in subsequent phases.
> >
> > I don't see "always intended to be a restrictive mechanism" there.
>
> Having been on the other side of the argument, the system that was accepted
> was in fact "always intended to be a restrictive mechanism". The quote above
> is a "never say never" statement.
>
> >> without the LSM layer enabled, it should not be possible with the LSM
> >> layer enabled.  The LSM is not a mechanism to circumvent other access
> >> control mechanisms in the kernel.
> > I understand, but it's not like we are proposing to go and bypass all
> > kinds of random kernel security mechanisms. These are targeted hooks,
> > developed by the BPF community for the BPF subsystem to allow trusted
> > unprivileged production use cases. Yes, we currently rely on checking
> > CAP_BPF to grant more dangerous/advanced features, but it's because we
> > can't just allow any unprivileged process to do this. But what we
> > really want is to answer the question "can we trust this process to
> > use this advanced functionality", and if there is no specific LSM
> > policy that cares one way (allow) or the other (disallow), fallback to
> > CAP_BPF enforcement.
> >
> > So it's not bypassing kernel checks, but rather augmenting them with
> > more flexible and customizable mechanisms, while still falling back to
> > CAP_BPF if the user didn't install any custom LSM policy.
>
> That would make CAP_BPF behave differently from all other capabilities.
> Capabilities are hard enough to use correctly as it is. If each capability
> defined its own semantics they would be completely unusable.
>
> >>> Also, in previous email you said:
> >>>
> >>>> Simply because there is another kernel access control mechanism which
> >>>> allows a capability check to be skipped doesn't mean I want to allow a
> >>>> LSM hook to be used to skip a capability check.
> >>> I understand your stated position, but can you please help me
> >>> understand the reasoning behind it?
> >> Keeping the LSM as a restrictive access control mechanism helps ensure
> >> some level of sanity and consistency across different Linux
> >> installations.  If a certain operation requires CAP_SYS_ADMIN on one
> >> Linux system, it should require CAP_SYS_ADMIN on another Linux system.
> >> Granted, a LSM running on one system might impose additional
> >> constraints on that operation, but the CAP_SYS_ADMIN requirement still
> >> applies.
> >>
> >> There is also an issue of safety in knowing that enabling a LSM will
> >> not degrade the access controls on a system by potentially granting
> >> operations that were previously denied.
> >>
> >>> Does the above also mean that you'd be fine if we just don't plug into
> >>> the LSM subsystem at all and instead come up with some ad-hoc solution
> >>> to allow effectively the same policies? This sounds detrimental both
> >>> to LSM and BPF subsystems, so I hope we can talk this through before
> >>> finalizing decisions.
> >> Based on your patches and our discussion, it seems to me that the
> >> problem you are trying to resolve is related more to the
> >> capability-based access controls in the eBPF, and possibly other
> >> kernel subsystems, and not any LSM-based restrictions.  I'm happy to
> >> work with you on a solution involving the LSM, but please understand
> >> that I'm not going to support a solution which changes a core
> >> philosophy of the LSM layer.
> > Great, I'd really appreciate help and suggestions on how to solve the
> > following problem.
> >
> > We have a BPF subsystem that allows loading BPF programs. Those BPF
> > programs cannot be contained within a particular namespace just by its
> > system-wide tracing nature (it can safely read kernel and user memory
> > and we can't restrict whether that memory belongs to a particular
> > namespace), so it's like CAP_SYS_TIME, just with much broader API
> > surface.
>
> This doesn't sound like a problem, it sounds like BPF is explicitly
> designed to prevent interference by namespaces. But in some cases you
> now want to limit it by namespaces.
>
> It appears that the desired uses of BPF are no longer compatible with
> its original security model. That's unfortunate, and likely to require
> a significant change to the implementation of BPF.
>

I have some new ideas, so hopefully not as significant. While I still
think that authoritative LSM hooks would be great, I'll stop arguing.
I'll get back with a different proposal that would allow BPF usage
within user namespaces. We still will want LSM hooks for fine-grained
control, but I think we'll be able to make them restrictive-only.
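
To make "restrictive-only" concrete, here is a rough user-space sketch of
how such a hook would compose with the capability check. This is not kernel
code: map_create_restrictive(), lsm_hook_fn and allow_only_type_1() are
hypothetical names made up for illustration, and the bool simply stands in
for whatever capability check the kernel performs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy callback: 0 = no objection, negative errno = deny. */
typedef int (*lsm_hook_fn)(int map_type);

/*
 * Restrictive-only composition: the capability check decides first, and
 * the hook can only turn an "allow" into a "deny", never the reverse.
 */
static int map_create_restrictive(bool has_cap_bpf, lsm_hook_fn hook, int map_type)
{
	int err;

	if (!has_cap_bpf)
		return -EPERM;	/* no hook verdict can override this */

	err = hook ? hook(map_type) : 0;
	if (err < 0)
		return err;	/* policy narrowed what the capability allowed */

	return 0;		/* a positive verdict grants nothing extra */
}

/* Made-up example policy: only map type 1 is permitted. */
static int allow_only_type_1(int map_type)
{
	return map_type == 1 ? 0 : -EACCES;
}

/* Exercises the three interesting cases. */
void demo_restrictive(void)
{
	assert(map_create_restrictive(false, allow_only_type_1, 1) == -EPERM);
	assert(map_create_restrictive(true, allow_only_type_1, 1) == 0);
	assert(map_create_restrictive(true, allow_only_type_1, 2) == -EACCES);
}
```

The only point of the sketch is the ordering: a caller without the
capability is rejected before the policy is even consulted, so installing a
policy can never widen access.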

> >
> > The other piece of a puzzle is user namespaces. We do want to run
> > applications inside user namespaces, but allow them to use BPF
> > programs. As far as I can tell, there is no way to grant real CAP_BPF
> > that will be recognized by capable(CAP_BPF) (not ns_capable, see above
> > about system-wide nature of BPF). If there is, please help me
> > understand how. All my local experiments failed, and looking at
> > cap_capable() implementation it is not intended to even check the
> > initial namespace's capability if the process is running in the user
> > namespace.
> >
> >
> > So, given that a) we can't make CAP_BPF namespace-aware and b) we
> > can't grant real CAP_BPF to processes in user namespace, how could we
> > allow user namespaced applications to do useful work with BPF?
> >
> >>> Lastly, you mentioned before:
> >>>
> >>>>>> I think we need to make this more concrete; we don't have a pattern in
> >>>>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> >>> Unfortunately I don't have enough familiarity with all LSM hooks, so I
> >>> can't confirm or disprove the above statement. But earlier someone
> >>> brought to my attention the case of security_vm_enough_memory_mm(),
> >>> which seems to be granting effectively CAP_SYS_ADMIN for the purposes
> >>> of memory accounting. Am I missing something subtle there or does it
> >>> grant effective caps indeed?
> >> Some of the comments around that hook can be misleading, but if you
> >> look at the actual code it starts to make more sense.
> >>
> > [...]
> >
> >> I do agree that the security_vm_enough_memory() hook is structured a
> >> bit differently than most of the other LSM hooks, but it still
> >> operates with the same philosophy: a LSM should only be allowed to
> >> restrict access, a LSM should never be allowed to grant access that
> >> would otherwise be denied by the traditional Linux access controls.
> >>
> >> Hopefully that explanation makes sense, but if things are still a bit
> >> fuzzy I would encourage you to go look at the code, I'm sure it will
> >> make sense once you spend a few minutes figuring out how it works.
> >>
> > Yep, thanks a lot, it's way more clear after grokking relevant pieces
> > of LSM the code you pointed out and LSM infrastructure in general.
> > "capabilities" LSM is non-negotiable, so it effectively always
> > restricts a small subset of hooks, including vm_enough_memory and
> > capable.
> >
> > Still, the problem still stands. How do we marry BPF and user
> > namespaces? I'd really appreciate suggestions. Thank you!
> >
> >
> >> [1] There is a long and sorta bizarre history with the capability LSM,
> >> but just understand it is a bit "special" in many ways, and those
> >> "special" behaviors are intentional.
> >>
> >> --
> >> paul-moore.com


* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-18 14:21                     ` Paul Moore
@ 2023-04-21  0:00                       ` Andrii Nakryiko
  2023-04-21 18:57                         ` Kees Cook
  0 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-04-21  0:00 UTC (permalink / raw)
  To: Paul Moore
  Cc: Kees Cook, Andrii Nakryiko, bpf, ast, daniel, kpsingh,
	linux-security-module

On Tue, Apr 18, 2023 at 7:21 AM Paul Moore <paul@paul-moore.com> wrote:
>
> On Mon, Apr 17, 2023 at 7:29 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > On Thu, Apr 13, 2023 at 8:11 AM Paul Moore <paul@paul-moore.com> wrote:
> > > On Thu, Apr 13, 2023 at 1:16 AM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > > On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > > On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
> > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > > On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > > > > On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > > > > On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> > > > > > > > > On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > > > > > > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > > > > > > > > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > > >
> > > > > ...
> > > > >
> > > > > > > > > > For example, in many places we have things like:
> > > > > > > > > >
> > > > > > > > > >         if (!some_check(...) && !capable(...))
> > > > > > > > > >                 return -EPERM;
> > > > > > > > > >
> > > > > > > > > > I would expect this is a similar logic. An operation can succeed if the
> > > > > > > > > > access control requirement is met. The mismatch we have through-out the
> > > > > > > > > > kernel is that capability checks aren't strictly done by LSM hooks. And
> > > > > > > > > > this series conceptually, I think, doesn't violate that -- it's changing
> > > > > > > > > > the logic of the capability checks, not the LSM (i.e. there no LSM hooks
> > > > > > > > > > yet here).
> > > > > > > > >
> > > > > > > > > Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> > > > > > > > > when it returns a positive value "bypasses kernel checks".  The patch
> > > > > > > > > isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> > > > > > > > > based on a eBPF tree, so I can't say with 100% certainty that it is
> > > > > > > > > bypassing a capability check, but the description claims that to be
> > > > > > > > > the case.
> > > > > > > > >
> > > > > > > > > Regardless of how you want to spin this, I'm not supportive of a LSM
> > > > > > > > > hook which allows a LSM to bypass a capability check.  A LSM hook can
> > > > > > > > > be used to provide additional access control restrictions beyond a
> > > > > > > > > capability check, but a LSM hook should never be allowed to overrule
> > > > > > > > > an access denial due to a capability check.
> > > > > > > > >
> > > > > > > > > > The reason CAP_BPF was created was because there was nothing else that
> > > > > > > > > > would be fine-grained enough at the time.
> > > > > > > > >
> > > > > > > > > The LSM layer predates CAP_BPF, and one could make a very solid
> > > > > > > > > argument that one of the reasons LSMs exist is to provide
> > > > > > > > > supplementary controls due to capability-based access controls being a
> > > > > > > > > poor fit for many modern use cases.
> > > > > > > >
> > > > > > > > I generally agree with what you say, but we DO have this code pattern:
> > > > > > > >
> > > > > > > >          if (!some_check(...) && !capable(...))
> > > > > > > >                  return -EPERM;
> > > > > > >
> > > > > > > I think we need to make this more concrete; we don't have a pattern in
> > > > > > > the upstream kernel where 'some_check(...)' is a LSM hook, right?
> > > > > > > Simply because there is another kernel access control mechanism which
> > > > > > > allows a capability check to be skipped doesn't mean I want to allow a
> > > > > > > LSM hook to be used to skip a capability check.
> > > > > >
> > > > > > This work is an attempt to tighten the security of production systems
> > > > > > by allowing to drop too coarse-grained and permissive capabilities
> > > > > > > (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> > > > > > than production use cases are meant to be able to do) and then grant
> > > > > > specific BPF operations on specific BPF programs/maps based on custom
> > > > > > LSM security policy, which validates application trustworthiness using
> > > > > > custom production-specific logic.
> > > > >
> > > > > There are ways to leverage the LSMs to apply finer grained access
> > > > > control on top of the relatively coarse capabilities that do not
> > > > > require circumventing those capability controls.  One grants the
> > > > > capabilities, just as one would do today, and then leverages the
> > > > > security functionality of a LSM to further restrict specific users,
> > > > > applications, etc. with a level of granularity beyond that offered by
> > > > > the capability controls.
> > > >
> > > > Please help me understand something. What you and Casey are proposing,
> > > > when taken to the logical extreme, is to grant to all processes root
> > > > permissions and then use LSM to restrict specific actions, do I
> > > > understand correctly? This strikes me as a less secure and more
> > > > error-prone way of doing things.
> > >
> > > When taken to the "logical extreme" most concepts end up sounding a
> > > bit absurd, but that was the point, wasn't it?
> >
> > Wasn't my intent to make it sound absurd, sorry. The way I see it, for
> > the sake of example, let's say CAP_BPF allows 20 different operations
> > (each with its own security_xxx hook). And let's say in production I
> > want to only allow 3 of them. Sure, technically it should be possible
> > to deny access at 17 hooks and let it through in just those 3. But if
> > someone adds a 21st and I forget to add a 21st restriction, that would be
> > bad (but very probable with such an approach).
>
> Welcome to the challenges of maintaining access controls within the
> Linux Kernel, LSM or otherwise.  As we all know, the Linux Kernel
> moves forward at a staggering pace sometimes, and it is not uncommon
> for new features/subsystems to be added without consulting all of the
> different folks who worry about access controls.  In many cases it can
> be a simple misunderstanding, but in some cases it's a willful
> rejection of a particular form of access control, the LSM being a
> prime example.  Thankfully in almost all of those cases we have been
> moderately successful in retrofitting the necessary access controls,
> sometimes they are not as good/capable/granular/etc. as we would like
> because of design limitations, but such is life.
>
> I say this not because I believe this is a valid argument for
> authoritative LSM hooks, I say this simply to acknowledge that this
> *is* a problem.
>

Ack, thanks.

> > So my point is that for situations like this, dropping CAP_BPF, but
> > allowing only 3 hooks to proceed seems a safer approach, because if we
> > add 21st hook, it will safely be denied without CAP_BPF *by default*.
> > That's what I tried to point out.
>
> I believe I understand your point, I just disagree with you on
> accepting authoritative LSM hooks in the upstream Linux Kernel; I
> believe it would be a *big* mistake to move away from the restrictive
> LSM hook philosophy at this point in time.

Ok, understood. While unfortunate, I'll stop pushing for authoritative LSMs.

>
> > But even if we ignore this "safe by default when a new hook is added"
> > behavior, when taking user namespaces into account, the restrictive
> > LSM approach just doesn't seem to work at all for something like
> > CAP_BPF. CAP_BPF cannot be "namespaced", just like, say, CAP_SYS_TIME,
> > because we cannot ensure that a given BPF program won't access kernel
> > state "belonging" to another process (as one example).
>
> Once again, the root of this problem lies in the capabilities and/or
> namespace mechanisms, not the LSM; if you want to fix this properly
> you should be looking at how eBPF leverages capabilities for access
> control.  Changing the very core behavior of the LSM layer in order to
> work around an issue with another access control mechanism is a
> non-starter.  I can't say this enough.

Alright. I now do have an alternative approach in mind that will only
use restrictive LSMs and will still allow BPF usage within user
namespaces.

>
> > Now, thanks to Jonathan, I get that there was a heated discussion 20
> > years ago about authoritative vs restrictive LSMs. But if I read a
> > summary at that time ([0]), authoritative hooks were not out of the
> > question *in principle*. Surely, "walk before we can run" makes sense,
> > but it's been a while ago.
>
> ... and once again, the restrictive approach has proven to work
> reasonably well over the past ~20 years, why would we abandon that
> simply to work around a problem with a different access control
> mechanism.  Don't break the LSM layer to fix something else.

No breakage was introduced; let's call things by their proper names.
Sure, the new hooks were authoritative, but they don't really break
anything, right? I understand that they go against your
restrictive-only LSM philosophy, but that's not breakage in any proper
sense of the word. All existing hooks continue to work, and the new
hooks would work properly as well. I'm not saying this to try to
convince you, but let's not misrepresent what I tried to do in this
patch set.
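
For reference, the semantics under discussion can be modeled in a few lines
of user-space C. This is only a toy model of the behavior described in this
thread (positive verdict bypasses the capability check, zero falls back to
it, negative denies); map_create_authoritative() is a made-up name and not
the code from the patches.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Toy model of the three-way verdict described in this thread:
 *   verdict < 0  -> policy denies, regardless of capabilities
 *   verdict > 0  -> policy vouches for the caller, capability check skipped
 *   verdict == 0 -> policy has no opinion, fall back to the capability check
 */
static int map_create_authoritative(bool has_cap_bpf, int verdict)
{
	if (verdict < 0)
		return verdict;
	if (verdict > 0)
		return 0;
	return has_cap_bpf ? 0 : -EPERM;
}

/* With no policy opinion (verdict 0) the outcome is the plain capability check. */
void demo_authoritative(void)
{
	assert(map_create_authoritative(true, 0) == 0);
	assert(map_create_authoritative(false, 0) == -EPERM);
	assert(map_create_authoritative(false, 1) == 0);
	assert(map_create_authoritative(true, -EACCES) == -EACCES);
}
```

With a zero verdict the result is exactly what the capability check alone
would give, which is the sense in which existing setups keep working. The
third assertion is the contested part: access granted without the
capability.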

>
> > > Here is a fun story which seems relevant ... in the early days of
> > > SELinux, one of the community devs setup up a system with a SELinux
> > > policy which restricted all privileged operations from the root user,
> > > put the system on a publicly accessible network, posted the root
> > > password for all to see, and invited the public to login to the system
> > > and attempt to exercise root privilege (it's been well over 10 years
> > > at this point so the details are a bit fuzzy).  Granted, there were
> > > some hiccups in the beginning, mostly due to the crude state of policy
> > > development/analysis at the time, but after a few policy revisions the
> > > system held up quite well.
> >
> > Honest question out of curiosity: was the intent to demonstrate that
> > with LSM one can completely restrict root? Or that root was actually
> > allowed to do something useful?
>
> The intent was to show that it is possible to restrict
> capability-based access controls with the LSM layer; it was the best
> example of the "logical extreme" carried out in the real world that I
> could think of when writing my response.
>
> > > On the more practical side of things, there are several use cases
> > > which require, by way of legal or contractual requirements, that full
> > > root/admin privileges are decomposed into separate roles: security
> > > admin, audit admin, backup admin, etc.  These users satisfy these
> > > requirements by using LSMs, such as SELinux, to restrict the
> > > administrative capabilities based on the SELinux user/role/domain.
> > >
> > > > By the way, even the above proposal of yours doesn't work for
> > > > production use cases when user namespaces are involved, as far as I
> > > > understand. We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for
> > > > containers running inside user namespaces, as CAP_BPF in non-init
> > > > namespace is not enough for bpf() syscall to allow loading BPF maps or
> > > > BPF program ...
> > >
> > > Once again, the LSM has always intended to be a restrictive mechanism,
> > > not a privilege granting mechanism.  If an operation is not possible
> >
> > Not according to [0] above:
>
> When one considers what has been present in Linus' tree, then yes.
> The idea of authoritative LSM hooks has been rejected for ~20 years
> and I've seen nothing in this thread to make me believe that we should
> change that now, and for this use case.

Ack.

>
> > > Based on your patches and our discussion, it seems to me that the
> > > problem you are trying to resolve is related more to the
> > > capability-based access controls in the eBPF, and possibly other
> > > kernel subsystems, and not any LSM-based restrictions.  I'm happy to
> > > work with you on a solution involving the LSM, but please understand
> > > that I'm not going to support a solution which changes a core
> > > philosophy of the LSM layer.
> >
> > Great, I'd really appreciate help and suggestions on how to solve the
> > following problem.
> >
> > We have a BPF subsystem that allows loading BPF programs. Those BPF
> > programs cannot be contained within a particular namespace just by its
> > system-wide tracing nature (it can safely read kernel and user memory
> > and we can't restrict whether that memory belongs to a particular
> > namespace), so it's like CAP_SYS_TIME, just with much broader API
> > surface.
> >
> > The other piece of a puzzle is user namespaces. We do want to run
> > applications inside user namespaces, but allow them to use BPF
> > programs. As far as I can tell, there is no way to grant real CAP_BPF
> > that will be recognized by capable(CAP_BPF) (not ns_capable, see above
> > about system-wide nature of BPF). If there is, please help me
> > understand how. All my local experiments failed, and looking at
> > cap_capable() implementation it is not intended to even check the
> > initial namespace's capability if the process is running in the user
> > namespace.
> >
> > So, given that a) we can't make CAP_BPF namespace-aware and b) we
> > can't grant real CAP_BPF to processes in user namespace, how could we
> > allow user namespaced applications to do useful work with BPF?
>
> I would start by talking with the user namespace folks.  I may be
> misunderstanding the problem as you've described it, but it seems like
> the core issue is how capabilities, specifically CAP_BPF, are handled
> in user namespaces.  To be honest, I'm not sure how much luck you'll
> have there, but you stand a better chance in changing how capabilities
> are handled across user namespaces than you do in getting an
> authoritative LSM hook merged.
>

You made it very clear, yes.

> Regardless, my offer still stands, if you have a solution which sticks
> to a restrictive LSM model, I'm happy to work with you further to sort
> out the details and try to make that work.  I don't have any great
> ideas there at the moment, but there are plenty of smart people on
> this mailing list and others who might have something clever in mind.

I do have a solution in mind. Stay tuned.

>
> --
> paul-moore.com


* Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
  2023-04-21  0:00                       ` Andrii Nakryiko
@ 2023-04-21 18:57                         ` Kees Cook
  0 siblings, 0 replies; 52+ messages in thread
From: Kees Cook @ 2023-04-21 18:57 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Paul Moore, Andrii Nakryiko, bpf, ast, daniel, kpsingh,
	linux-security-module

On Thu, Apr 20, 2023 at 05:00:55PM -0700, Andrii Nakryiko wrote:
> Alright. I now do have an alternative approach in mind that will only
> use restrictive LSMs and will still allow BPF usage within user
> namespaces.

It seems the problem in the existing kernel is that bpf_capable() is
rather inflexible. In only one place is sysctl_unprivileged_bpf_disabled
checked (outside the unprivileged_ebpf_enabled() checks in CPU errata
fixes).

Should CAP_BPF be per-namespace?

-- 
Kees Cook

end of thread, other threads:[~2023-04-21 18:57 UTC | newest]

Thread overview: 52+ messages
2023-04-12  4:32 [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks Andrii Nakryiko
2023-04-12  4:32 ` [PATCH bpf-next 1/8] bpf: move unprivileged checks into map_create() and bpf_prog_load() Andrii Nakryiko
2023-04-12 17:49   ` Kees Cook
2023-04-13  0:22     ` Andrii Nakryiko
2023-04-12  4:32 ` [PATCH bpf-next 2/8] bpf: inline map creation logic in map_create() function Andrii Nakryiko
2023-04-12 17:53   ` Kees Cook
2023-04-13  0:22     ` Andrii Nakryiko
2023-04-12  4:32 ` [PATCH bpf-next 3/8] bpf: centralize permissions checks for all BPF map types Andrii Nakryiko
2023-04-12 18:01   ` Kees Cook
2023-04-13  0:23     ` Andrii Nakryiko
2023-04-12  4:32 ` [PATCH bpf-next 4/8] bpf, lsm: implement bpf_map_create_security LSM hook Andrii Nakryiko
2023-04-12 18:20   ` Kees Cook
2023-04-13  0:23     ` Andrii Nakryiko
2023-04-12  4:32 ` [PATCH bpf-next 5/8] selftests/bpf: validate new " Andrii Nakryiko
2023-04-12 18:23   ` Kees Cook
2023-04-13  0:23     ` Andrii Nakryiko
2023-04-12  4:32 ` [PATCH bpf-next 6/8] bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command Andrii Nakryiko
2023-04-12 18:24   ` Kees Cook
2023-04-13  0:17     ` Andrii Nakryiko
2023-04-12  4:32 ` [PATCH bpf-next 7/8] bpf, lsm: implement bpf_btf_load_security LSM hook Andrii Nakryiko
2023-04-12 16:52   ` Paul Moore
2023-04-13  1:43     ` Andrii Nakryiko
2023-04-13  2:47       ` Paul Moore
2023-04-12  4:33 ` [PATCH bpf-next 8/8] selftests/bpf: enhance lsm_map_create test with BTF LSM control Andrii Nakryiko
2023-04-12 16:49 ` [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks Paul Moore
2023-04-12 17:47   ` Kees Cook
2023-04-12 18:06     ` Paul Moore
2023-04-12 18:28       ` Kees Cook
2023-04-12 19:06         ` Paul Moore
2023-04-13  1:43           ` Andrii Nakryiko
2023-04-13  2:56             ` Paul Moore
2023-04-13  5:16               ` Andrii Nakryiko
2023-04-13 15:11                 ` Paul Moore
2023-04-17 23:29                   ` Andrii Nakryiko
2023-04-18  0:47                     ` Casey Schaufler
2023-04-21  0:00                       ` Andrii Nakryiko
2023-04-18 14:21                     ` Paul Moore
2023-04-21  0:00                       ` Andrii Nakryiko
2023-04-21 18:57                         ` Kees Cook
2023-04-13 16:54                 ` Casey Schaufler
2023-04-17 23:31                   ` Andrii Nakryiko
2023-04-13 19:03                 ` Jonathan Corbet
2023-04-17 23:28                   ` Andrii Nakryiko
2023-04-13 16:27             ` Casey Schaufler
2023-04-17 23:31               ` Andrii Nakryiko
2023-04-17 23:53                 ` Casey Schaufler
2023-04-18  0:28                   ` Andrii Nakryiko
2023-04-18  0:52                     ` Casey Schaufler
2023-04-12 18:38       ` Casey Schaufler
2023-04-14 20:23     ` Dr. Greg
2023-04-17 23:31       ` Andrii Nakryiko
2023-04-19 10:53         ` Dr. Greg
