* [PATCH bpf-next 00/10] Centralize BPF permission checks
@ 2023-05-02 23:06 Andrii Nakryiko
  2023-05-02 23:06 ` [PATCH bpf-next 01/10] bpf: move unprivileged checks into map_create() and bpf_prog_load() Andrii Nakryiko
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-02 23:06 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

This patch set refactors BPF subsystem permission checks for BPF maps and
programs, localizes them in one place, and ensures that all parts of the BPF
ecosystem (the BPF verifier and JITs, and their supporting infra) use the
recorded effective capabilities, stored in the respective bpf_map or bpf_prog
structs, for further decision making.

This allows for more explicit and centralized handling of BPF-related
capabilities and simplifies further evolution of the BPF permission model,
to be proposed and discussed in follow-up patch sets.
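
To illustrate the end state this series works towards, roughly (simplified
sketch; the exact fields are introduced in the later patches):

        /* before: capability is re-checked at every use site */
        if (!bpf_capable())
                return NULL;

        /* after: checked once at load time, recorded in prog->aux, consulted later */
        if (!prog->aux->bpf_capable)
                return NULL;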

Andrii Nakryiko (10):
  bpf: move unprivileged checks into map_create() and bpf_prog_load()
  bpf: inline map creation logic in map_create() function
  bpf: centralize permissions checks for all BPF map types
  bpf: remember if bpf_map was unprivileged and use that consistently
  bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command
  bpf: keep BPF_PROG_LOAD permission checks clear of validations
  bpf: record effective capabilities at BPF prog load time
  bpf: use recorded BPF prog effective caps when fetching helper protos
  bpf: use recorded bpf_capable flag in JIT code
  bpf: consistently use program's recorded capabilities in BPF verifier

 arch/arm/net/bpf_jit_32.c                     |   2 +-
 arch/arm64/net/bpf_jit_comp.c                 |   2 +-
 arch/loongarch/net/bpf_jit.c                  |   2 +-
 arch/mips/net/bpf_jit_comp.c                  |   2 +-
 arch/powerpc/net/bpf_jit_comp.c               |   2 +-
 arch/riscv/net/bpf_jit_core.c                 |   3 +-
 arch/s390/net/bpf_jit_comp.c                  |   3 +-
 arch/sparc/net/bpf_jit_comp_64.c              |   2 +-
 arch/x86/net/bpf_jit_comp.c                   |   3 +-
 arch/x86/net/bpf_jit_comp32.c                 |   2 +-
 drivers/media/rc/bpf-lirc.c                   |   2 +-
 include/linux/bpf.h                           |  32 ++-
 include/linux/filter.h                        |   8 +-
 kernel/bpf/arraymap.c                         |  59 +++--
 kernel/bpf/bloom_filter.c                     |   3 -
 kernel/bpf/bpf_local_storage.c                |   3 -
 kernel/bpf/bpf_struct_ops.c                   |   3 -
 kernel/bpf/cgroup.c                           |   6 +-
 kernel/bpf/core.c                             |  22 +-
 kernel/bpf/cpumap.c                           |   4 -
 kernel/bpf/devmap.c                           |   3 -
 kernel/bpf/hashtab.c                          |   6 -
 kernel/bpf/helpers.c                          |   6 +-
 kernel/bpf/lpm_trie.c                         |   3 -
 kernel/bpf/map_in_map.c                       |   3 +-
 kernel/bpf/queue_stack_maps.c                 |   4 -
 kernel/bpf/reuseport_array.c                  |   3 -
 kernel/bpf/stackmap.c                         |   3 -
 kernel/bpf/syscall.c                          | 218 ++++++++++++------
 kernel/bpf/trampoline.c                       |   2 +-
 kernel/bpf/verifier.c                         |  23 +-
 kernel/trace/bpf_trace.c                      |   2 +-
 net/core/filter.c                             |  36 +--
 net/core/sock_map.c                           |   4 -
 net/ipv4/bpf_tcp_ca.c                         |   2 +-
 net/netfilter/nf_bpf_link.c                   |   2 +-
 net/xdp/xskmap.c                              |   4 -
 .../bpf/prog_tests/unpriv_bpf_disabled.c      |   6 +-
 38 files changed, 280 insertions(+), 215 deletions(-)

-- 
2.34.1


* [PATCH bpf-next 01/10] bpf: move unprivileged checks into map_create() and bpf_prog_load()
  2023-05-02 23:06 [PATCH bpf-next 00/10] Centralize BPF permission checks Andrii Nakryiko
@ 2023-05-02 23:06 ` Andrii Nakryiko
  2023-05-03 18:28   ` Stanislav Fomichev
  2023-05-02 23:06 ` [PATCH bpf-next 02/10] bpf: inline map creation logic in map_create() function Andrii Nakryiko
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-02 23:06 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Make each bpf() syscall command a bit more self-contained, making it
easier to further enhance it. We move sysctl_unprivileged_bpf_disabled
handling down to map_create() and bpf_prog_load(), two special commands
in this regard.

Also swap the order of checks, calling bpf_capable() only if
sysctl_unprivileged_bpf_disabled is true, avoiding unnecessary audit
messages.
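
Roughly, the ordering change is (simplified from the hunks below):

        /* before: bpf_capable() (and its audit side effect) ran unconditionally */
        capable = bpf_capable() || !sysctl_unprivileged_bpf_disabled;

        /* after: bpf_capable() is consulted only when the sysctl is actually set */
        if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
                return -EPERM;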

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/syscall.c | 37 ++++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 14f39c1e573e..d5009fafe0f4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1132,6 +1132,17 @@ static int map_create(union bpf_attr *attr)
 	int f_flags;
 	int err;
 
+	/* Intent here is for unprivileged_bpf_disabled to block key object
+	 * creation commands for unprivileged users; other actions depend
+	 * of fd availability and access to bpffs, so are dependent on
+	 * object creation success.  Capabilities are later verified for
+	 * operations such as load and map create, so even with unprivileged
+	 * BPF disabled, capability checks are still carried out for these
+	 * and other operations.
+	 */
+	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
+		return -EPERM;
+
 	err = CHECK_ATTR(BPF_MAP_CREATE);
 	if (err)
 		return -EINVAL;
@@ -2504,6 +2515,17 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	char license[128];
 	bool is_gpl;
 
+	/* Intent here is for unprivileged_bpf_disabled to block key object
+	 * creation commands for unprivileged users; other actions depend
+	 * of fd availability and access to bpffs, so are dependent on
+	 * object creation success.  Capabilities are later verified for
+	 * operations such as load and map create, so even with unprivileged
+	 * BPF disabled, capability checks are still carried out for these
+	 * and other operations.
+	 */
+	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
+		return -EPERM;
+
 	if (CHECK_ATTR(BPF_PROG_LOAD))
 		return -EINVAL;
 
@@ -5004,23 +5026,8 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
 static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 {
 	union bpf_attr attr;
-	bool capable;
 	int err;
 
-	capable = bpf_capable() || !sysctl_unprivileged_bpf_disabled;
-
-	/* Intent here is for unprivileged_bpf_disabled to block key object
-	 * creation commands for unprivileged users; other actions depend
-	 * of fd availability and access to bpffs, so are dependent on
-	 * object creation success.  Capabilities are later verified for
-	 * operations such as load and map create, so even with unprivileged
-	 * BPF disabled, capability checks are still carried out for these
-	 * and other operations.
-	 */
-	if (!capable &&
-	    (cmd == BPF_MAP_CREATE || cmd == BPF_PROG_LOAD))
-		return -EPERM;
-
 	err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
 	if (err)
 		return err;
-- 
2.34.1


* [PATCH bpf-next 02/10] bpf: inline map creation logic in map_create() function
  2023-05-02 23:06 [PATCH bpf-next 00/10] Centralize BPF permission checks Andrii Nakryiko
  2023-05-02 23:06 ` [PATCH bpf-next 01/10] bpf: move unprivileged checks into map_create() and bpf_prog_load() Andrii Nakryiko
@ 2023-05-02 23:06 ` Andrii Nakryiko
  2023-05-02 23:06 ` [PATCH bpf-next 03/10] bpf: centralize permissions checks for all BPF map types Andrii Nakryiko
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-02 23:06 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Currently find_and_alloc_map() performs two separate functions: some
argument sanity checking and partial map creation workflow handling.
Neither of those functions is self-sufficient; both are augmented by
further checks and initialization logic in the caller (the map_create()
function). So unify all the sanity checks, permission checks, and
creation and initialization logic in one linear piece of code in
map_create() instead. This also makes it easier to further enhance
permission checks and keep them located in one place.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/syscall.c | 54 ++++++++++++++++++--------------------------
 1 file changed, 22 insertions(+), 32 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d5009fafe0f4..3ab6c0dc241a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -109,37 +109,6 @@ const struct bpf_map_ops bpf_map_offload_ops = {
 	.map_mem_usage = bpf_map_offload_map_mem_usage,
 };
 
-static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
-{
-	const struct bpf_map_ops *ops;
-	u32 type = attr->map_type;
-	struct bpf_map *map;
-	int err;
-
-	if (type >= ARRAY_SIZE(bpf_map_types))
-		return ERR_PTR(-EINVAL);
-	type = array_index_nospec(type, ARRAY_SIZE(bpf_map_types));
-	ops = bpf_map_types[type];
-	if (!ops)
-		return ERR_PTR(-EINVAL);
-
-	if (ops->map_alloc_check) {
-		err = ops->map_alloc_check(attr);
-		if (err)
-			return ERR_PTR(err);
-	}
-	if (attr->map_ifindex)
-		ops = &bpf_map_offload_ops;
-	if (!ops->map_mem_usage)
-		return ERR_PTR(-EINVAL);
-	map = ops->map_alloc(attr);
-	if (IS_ERR(map))
-		return map;
-	map->ops = ops;
-	map->map_type = type;
-	return map;
-}
-
 static void bpf_map_write_active_inc(struct bpf_map *map)
 {
 	atomic64_inc(&map->writecnt);
@@ -1127,7 +1096,9 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
 {
+	const struct bpf_map_ops *ops;
 	int numa_node = bpf_map_attr_numa_node(attr);
+	u32 map_type = attr->map_type;
 	struct bpf_map *map;
 	int f_flags;
 	int err;
@@ -1169,9 +1140,28 @@ static int map_create(union bpf_attr *attr)
 		return -EINVAL;
 
 	/* find map type and init map: hashtable vs rbtree vs bloom vs ... */
-	map = find_and_alloc_map(attr);
+	map_type = attr->map_type;
+	if (map_type >= ARRAY_SIZE(bpf_map_types))
+		return -EINVAL;
+	map_type = array_index_nospec(map_type, ARRAY_SIZE(bpf_map_types));
+	ops = bpf_map_types[map_type];
+	if (!ops)
+		return -EINVAL;
+
+	if (ops->map_alloc_check) {
+		err = ops->map_alloc_check(attr);
+		if (err)
+			return err;
+	}
+	if (attr->map_ifindex)
+		ops = &bpf_map_offload_ops;
+	if (!ops->map_mem_usage)
+		return -EINVAL;
+	map = ops->map_alloc(attr);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
+	map->ops = ops;
+	map->map_type = map_type;
 
 	err = bpf_obj_name_cpy(map->name, attr->map_name,
 			       sizeof(attr->map_name));
-- 
2.34.1


* [PATCH bpf-next 03/10] bpf: centralize permissions checks for all BPF map types
  2023-05-02 23:06 [PATCH bpf-next 00/10] Centralize BPF permission checks Andrii Nakryiko
  2023-05-02 23:06 ` [PATCH bpf-next 01/10] bpf: move unprivileged checks into map_create() and bpf_prog_load() Andrii Nakryiko
  2023-05-02 23:06 ` [PATCH bpf-next 02/10] bpf: inline map creation logic in map_create() function Andrii Nakryiko
@ 2023-05-02 23:06 ` Andrii Nakryiko
  2023-05-02 23:06 ` [PATCH bpf-next 04/10] bpf: remember if bpf_map was unprivileged and use that consistently Andrii Nakryiko
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-02 23:06 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

This allows for more centralized decisions later on, and generally
makes it very explicit which maps are privileged and which are not
(e.g., LRU_HASH and LRU_PERCPU_HASH, which are privileged HASH variants,
as opposed to unprivileged HASH and PERCPU_HASH; now this is explicit
and easy to verify).
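
From a user-space point of view the intent is, e.g. (hypothetical libbpf
snippet, not part of this patch, assuming unprivileged_bpf_disabled is off
and the caller has no CAP_BPF):

        /* plain HASH remains unprivileged */
        fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, 4, 4, 16, NULL);     /* ok */

        /* LRU_HASH is a privileged HASH variant */
        fd = bpf_map_create(BPF_MAP_TYPE_LRU_HASH, NULL, 4, 4, 16, NULL); /* -EPERM */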

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/bloom_filter.c                     |  3 -
 kernel/bpf/bpf_local_storage.c                |  3 -
 kernel/bpf/bpf_struct_ops.c                   |  3 -
 kernel/bpf/cpumap.c                           |  4 --
 kernel/bpf/devmap.c                           |  3 -
 kernel/bpf/hashtab.c                          |  6 --
 kernel/bpf/lpm_trie.c                         |  3 -
 kernel/bpf/queue_stack_maps.c                 |  4 --
 kernel/bpf/reuseport_array.c                  |  3 -
 kernel/bpf/stackmap.c                         |  3 -
 kernel/bpf/syscall.c                          | 70 ++++++++++++++++---
 net/core/sock_map.c                           |  4 --
 net/xdp/xskmap.c                              |  4 --
 .../bpf/prog_tests/unpriv_bpf_disabled.c      |  6 +-
 14 files changed, 64 insertions(+), 55 deletions(-)

diff --git a/kernel/bpf/bloom_filter.c b/kernel/bpf/bloom_filter.c
index 540331b610a9..addf3dd57b59 100644
--- a/kernel/bpf/bloom_filter.c
+++ b/kernel/bpf/bloom_filter.c
@@ -86,9 +86,6 @@ static struct bpf_map *bloom_map_alloc(union bpf_attr *attr)
 	int numa_node = bpf_map_attr_numa_node(attr);
 	struct bpf_bloom_filter *bloom;
 
-	if (!bpf_capable())
-		return ERR_PTR(-EPERM);
-
 	if (attr->key_size != 0 || attr->value_size == 0 ||
 	    attr->max_entries == 0 ||
 	    attr->map_flags & ~BLOOM_CREATE_FLAG_MASK ||
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 47d9948d768f..b5149cfce7d4 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -723,9 +723,6 @@ int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
 	    !attr->btf_key_type_id || !attr->btf_value_type_id)
 		return -EINVAL;
 
-	if (!bpf_capable())
-		return -EPERM;
-
 	if (attr->value_size > BPF_LOCAL_STORAGE_MAX_VALUE_SIZE)
 		return -E2BIG;
 
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index d3f0a4825fa6..116a0ce378ec 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -655,9 +655,6 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
 	const struct btf_type *t, *vt;
 	struct bpf_map *map;
 
-	if (!bpf_capable())
-		return ERR_PTR(-EPERM);
-
 	st_ops = bpf_struct_ops_find_value(attr->btf_vmlinux_value_type_id);
 	if (!st_ops)
 		return ERR_PTR(-ENOTSUPP);
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 8ec18faa74ac..8a33e8747a0e 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -28,7 +28,6 @@
 #include <linux/sched.h>
 #include <linux/workqueue.h>
 #include <linux/kthread.h>
-#include <linux/capability.h>
 #include <trace/events/xdp.h>
 #include <linux/btf_ids.h>
 
@@ -89,9 +88,6 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	u32 value_size = attr->value_size;
 	struct bpf_cpu_map *cmap;
 
-	if (!bpf_capable())
-		return ERR_PTR(-EPERM);
-
 	/* check sanity of attributes */
 	if (attr->max_entries == 0 || attr->key_size != 4 ||
 	    (value_size != offsetofend(struct bpf_cpumap_val, qsize) &&
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 802692fa3905..49cc0b5671c6 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -160,9 +160,6 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 	struct bpf_dtab *dtab;
 	int err;
 
-	if (!capable(CAP_NET_ADMIN))
-		return ERR_PTR(-EPERM);
-
 	dtab = bpf_map_area_alloc(sizeof(*dtab), NUMA_NO_NODE);
 	if (!dtab)
 		return ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 00c253b84bf5..c69db80fc947 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -422,12 +422,6 @@ static int htab_map_alloc_check(union bpf_attr *attr)
 	BUILD_BUG_ON(offsetof(struct htab_elem, fnode.next) !=
 		     offsetof(struct htab_elem, hash_node.pprev));
 
-	if (lru && !bpf_capable())
-		/* LRU implementation is much complicated than other
-		 * maps.  Hence, limit to CAP_BPF.
-		 */
-		return -EPERM;
-
 	if (zero_seed && !capable(CAP_SYS_ADMIN))
 		/* Guard against local DoS, and discourage production use. */
 		return -EPERM;
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index e0d3ddf2037a..17c7e7782a1f 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -544,9 +544,6 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
 {
 	struct lpm_trie *trie;
 
-	if (!bpf_capable())
-		return ERR_PTR(-EPERM);
-
 	/* check sanity of attributes */
 	if (attr->max_entries == 0 ||
 	    !(attr->map_flags & BPF_F_NO_PREALLOC) ||
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index 601609164ef3..8d2ddcb7566b 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -7,7 +7,6 @@
 #include <linux/bpf.h>
 #include <linux/list.h>
 #include <linux/slab.h>
-#include <linux/capability.h>
 #include <linux/btf_ids.h>
 #include "percpu_freelist.h"
 
@@ -46,9 +45,6 @@ static bool queue_stack_map_is_full(struct bpf_queue_stack *qs)
 /* Called from syscall */
 static int queue_stack_map_alloc_check(union bpf_attr *attr)
 {
-	if (!bpf_capable())
-		return -EPERM;
-
 	/* check sanity of attributes */
 	if (attr->max_entries == 0 || attr->key_size != 0 ||
 	    attr->value_size == 0 ||
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index cbf2d8d784b8..4b4f9670f1a9 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -151,9 +151,6 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
 	int numa_node = bpf_map_attr_numa_node(attr);
 	struct reuseport_array *array;
 
-	if (!bpf_capable())
-		return ERR_PTR(-EPERM);
-
 	/* allocate all map elements and zero-initialize them */
 	array = bpf_map_area_alloc(struct_size(array, ptrs, attr->max_entries), numa_node);
 	if (!array)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index b25fce425b2c..458bb80b14d5 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -74,9 +74,6 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 	u64 cost, n_buckets;
 	int err;
 
-	if (!bpf_capable())
-		return ERR_PTR(-EPERM);
-
 	if (attr->map_flags & ~STACK_CREATE_FLAG_MASK)
 		return ERR_PTR(-EINVAL);
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 3ab6c0dc241a..92127eaee467 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1103,17 +1103,6 @@ static int map_create(union bpf_attr *attr)
 	int f_flags;
 	int err;
 
-	/* Intent here is for unprivileged_bpf_disabled to block key object
-	 * creation commands for unprivileged users; other actions depend
-	 * of fd availability and access to bpffs, so are dependent on
-	 * object creation success.  Capabilities are later verified for
-	 * operations such as load and map create, so even with unprivileged
-	 * BPF disabled, capability checks are still carried out for these
-	 * and other operations.
-	 */
-	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
-		return -EPERM;
-
 	err = CHECK_ATTR(BPF_MAP_CREATE);
 	if (err)
 		return -EINVAL;
@@ -1157,6 +1146,65 @@ static int map_create(union bpf_attr *attr)
 		ops = &bpf_map_offload_ops;
 	if (!ops->map_mem_usage)
 		return -EINVAL;
+
+	/* Intent here is for unprivileged_bpf_disabled to block key object
+	 * creation commands for unprivileged users; other actions depend
+	 * of fd availability and access to bpffs, so are dependent on
+	 * object creation success.  Capabilities are later verified for
+	 * operations such as load and map create, so even with unprivileged
+	 * BPF disabled, capability checks are still carried out for these
+	 * and other operations.
+	 */
+	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
+		return -EPERM;
+
+	/* check privileged map type permissions */
+	switch (map_type) {
+	case BPF_MAP_TYPE_SK_STORAGE:
+	case BPF_MAP_TYPE_INODE_STORAGE:
+	case BPF_MAP_TYPE_TASK_STORAGE:
+	case BPF_MAP_TYPE_CGRP_STORAGE:
+	case BPF_MAP_TYPE_BLOOM_FILTER:
+	case BPF_MAP_TYPE_LPM_TRIE:
+	case BPF_MAP_TYPE_REUSEPORT_SOCKARRAY:
+	case BPF_MAP_TYPE_STACK_TRACE:
+	case BPF_MAP_TYPE_QUEUE:
+	case BPF_MAP_TYPE_STACK:
+	case BPF_MAP_TYPE_LRU_HASH:
+	case BPF_MAP_TYPE_LRU_PERCPU_HASH:
+	case BPF_MAP_TYPE_STRUCT_OPS:
+	case BPF_MAP_TYPE_CPUMAP:
+		if (!bpf_capable())
+			return -EPERM;
+		break;
+	case BPF_MAP_TYPE_SOCKMAP:
+	case BPF_MAP_TYPE_SOCKHASH:
+	case BPF_MAP_TYPE_DEVMAP:
+	case BPF_MAP_TYPE_DEVMAP_HASH:
+	case BPF_MAP_TYPE_XSKMAP:
+		if (!capable(CAP_NET_ADMIN))
+			return -EPERM;
+		break;
+	case BPF_MAP_TYPE_ARRAY:
+	case BPF_MAP_TYPE_PERCPU_ARRAY:
+	case BPF_MAP_TYPE_PROG_ARRAY:
+	case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
+	case BPF_MAP_TYPE_CGROUP_ARRAY:
+	case BPF_MAP_TYPE_ARRAY_OF_MAPS:
+	case BPF_MAP_TYPE_HASH:
+	case BPF_MAP_TYPE_PERCPU_HASH:
+	case BPF_MAP_TYPE_HASH_OF_MAPS:
+	case BPF_MAP_TYPE_RINGBUF:
+	case BPF_MAP_TYPE_USER_RINGBUF:
+	case BPF_MAP_TYPE_CGROUP_STORAGE:
+	case BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE:
+		/* unprivileged */
+		break;
+	default:
+		WARN(1, "unsupported map type %d", map_type);
+		return -EPERM;
+	}
+
 	map = ops->map_alloc(attr);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 7c189c2e2fbf..4b67bb5e7f9c 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -32,8 +32,6 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 {
 	struct bpf_stab *stab;
 
-	if (!capable(CAP_NET_ADMIN))
-		return ERR_PTR(-EPERM);
 	if (attr->max_entries == 0 ||
 	    attr->key_size    != 4 ||
 	    (attr->value_size != sizeof(u32) &&
@@ -1085,8 +1083,6 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 	struct bpf_shtab *htab;
 	int i, err;
 
-	if (!capable(CAP_NET_ADMIN))
-		return ERR_PTR(-EPERM);
 	if (attr->max_entries == 0 ||
 	    attr->key_size    == 0 ||
 	    (attr->value_size != sizeof(u32) &&
diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c
index 2c1427074a3b..e1c526f97ce3 100644
--- a/net/xdp/xskmap.c
+++ b/net/xdp/xskmap.c
@@ -5,7 +5,6 @@
 
 #include <linux/bpf.h>
 #include <linux/filter.h>
-#include <linux/capability.h>
 #include <net/xdp_sock.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
@@ -68,9 +67,6 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 	int numa_node;
 	u64 size;
 
-	if (!capable(CAP_NET_ADMIN))
-		return ERR_PTR(-EPERM);
-
 	if (attr->max_entries == 0 || attr->key_size != 4 ||
 	    attr->value_size != 4 ||
 	    attr->map_flags & ~(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY))
diff --git a/tools/testing/selftests/bpf/prog_tests/unpriv_bpf_disabled.c b/tools/testing/selftests/bpf/prog_tests/unpriv_bpf_disabled.c
index 8383a99f610f..0adf8d9475cb 100644
--- a/tools/testing/selftests/bpf/prog_tests/unpriv_bpf_disabled.c
+++ b/tools/testing/selftests/bpf/prog_tests/unpriv_bpf_disabled.c
@@ -171,7 +171,11 @@ static void test_unpriv_bpf_disabled_negative(struct test_unpriv_bpf_disabled *s
 				prog_insns, prog_insn_cnt, &load_opts),
 		  -EPERM, "prog_load_fails");
 
-	for (i = BPF_MAP_TYPE_HASH; i <= BPF_MAP_TYPE_BLOOM_FILTER; i++)
+	/* some map types require particular correct parameters which could be
+	 * sanity-checked before enforcing -EPERM, so only validate that
+	 * the simple ARRAY and HASH maps are failing with -EPERM
+	 */
+	for (i = BPF_MAP_TYPE_HASH; i <= BPF_MAP_TYPE_ARRAY; i++)
 		ASSERT_EQ(bpf_map_create(i, NULL, sizeof(int), sizeof(int), 1, NULL),
 			  -EPERM, "map_create_fails");
 
-- 
2.34.1


* [PATCH bpf-next 04/10] bpf: remember if bpf_map was unprivileged and use that consistently
  2023-05-02 23:06 [PATCH bpf-next 00/10] Centralize BPF permission checks Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2023-05-02 23:06 ` [PATCH bpf-next 03/10] bpf: centralize permissions checks for all BPF map types Andrii Nakryiko
@ 2023-05-02 23:06 ` Andrii Nakryiko
  2023-05-04 20:05   ` Alexei Starovoitov
  2023-05-02 23:06 ` [PATCH bpf-next 05/10] bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command Andrii Nakryiko
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-02 23:06 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

While some maps are allowed to be created with no extra privileges (like
CAP_BPF and/or CAP_NET_ADMIN), some parts of BPF verifier logic do care
about whether a given map was created with privileges.

One such place is the Spectre v1 mitigations in ARRAY maps. Another is
the extra restrictions on maps that have special embedded BTF-defined
fields (like spin locks, etc.).

So record whether a map was privileged or not at creation time and
don't recheck bpf_capable() anymore.

Handling the Spectre v1 mitigation in ARRAY-like maps required a bit of
code acrobatics: extracting the size adjustment into a separate
bpf_array_adjust_for_spec_v1() helper and calling it explicitly for
ARRAY, PERCPU_ARRAY, PROG_ARRAY, CGROUP_ARRAY, PERF_EVENT_ARRAY and
ARRAY_OF_MAPS to adjust the passed-in bpf_attr, because these adjustments
have to happen before map creation. An alternative would be to extend the
map_ops->map_alloc() callback with an unprivileged flag, which seemed
excessive, as no other map cares, but it could be done if preferred.

But once the map->unpriv flag is recorded, handling BTF-defined fields was
a breeze, dropping the bpf_capable() check buried deeper in map_check_btf()
validation logic.

One extra bit that required consideration was remembering the unprivileged
bit when dealing with map-in-maps and taking it into account when
checking map metadata compatibility. Given the unprivileged bit is
important for correctness, it should be taken into account just like key
and value sizes.
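
As a concrete example of the sizing adjustment (sketch matching the helper
below): an unprivileged ARRAY map requesting max_entries = 5 ends up with

        index_mask = array_index_mask(5);    /* = 7, next power of two minus 1 */
        attr->max_entries = index_mask + 1;  /* = 8, rounded up before map_alloc() */

so a speculated index AND-ed with index_mask always stays within the
allocated range.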

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h     |  4 ++-
 kernel/bpf/arraymap.c   | 59 ++++++++++++++++++++++++-----------------
 kernel/bpf/map_in_map.c |  3 ++-
 kernel/bpf/syscall.c    | 25 +++++++++++++++--
 kernel/bpf/verifier.c   |  6 ++---
 5 files changed, 65 insertions(+), 32 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 456f33b9d205..479657bb113e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -273,7 +273,7 @@ struct bpf_map {
 		bool jited;
 		bool xdp_has_frags;
 	} owner;
-	bool bypass_spec_v1;
+	bool unpriv;
 	bool frozen; /* write-once; write-protected by freeze_mutex */
 };
 
@@ -2058,6 +2058,8 @@ static inline bool bpf_bypass_spec_v1(void)
 	return perfmon_capable();
 }
 
+int bpf_array_adjust_for_spec_v1(union bpf_attr *attr);
+
 static inline bool bpf_bypass_spec_v4(void)
 {
 	return perfmon_capable();
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 2058e89b5ddd..a51d22a3afd1 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -77,18 +77,9 @@ int array_map_alloc_check(union bpf_attr *attr)
 	return 0;
 }
 
-static struct bpf_map *array_map_alloc(union bpf_attr *attr)
+static u32 array_index_mask(u32 max_entries)
 {
-	bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
-	int numa_node = bpf_map_attr_numa_node(attr);
-	u32 elem_size, index_mask, max_entries;
-	bool bypass_spec_v1 = bpf_bypass_spec_v1();
-	u64 array_size, mask64;
-	struct bpf_array *array;
-
-	elem_size = round_up(attr->value_size, 8);
-
-	max_entries = attr->max_entries;
+	u64 mask64;
 
 	/* On 32 bit archs roundup_pow_of_two() with max_entries that has
 	 * upper most bit set in u32 space is undefined behavior due to
@@ -98,17 +89,38 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	mask64 = 1ULL << mask64;
 	mask64 -= 1;
 
-	index_mask = mask64;
-	if (!bypass_spec_v1) {
-		/* round up array size to nearest power of 2,
-		 * since cpu will speculate within index_mask limits
-		 */
-		max_entries = index_mask + 1;
-		/* Check for overflows. */
-		if (max_entries < attr->max_entries)
-			return ERR_PTR(-E2BIG);
-	}
+	return (u32)mask64;
+}
+
+int bpf_array_adjust_for_spec_v1(union bpf_attr *attr)
+{
+	u32 max_entries, index_mask;
+
+	/* round up array size to nearest power of 2,
+	 * since cpu will speculate within index_mask limits
+	 */
+	index_mask = array_index_mask(attr->max_entries);
+	max_entries = index_mask + 1;
+	/* Check for overflows. */
+	if (max_entries < attr->max_entries)
+		return -E2BIG;
+
+	attr->max_entries = max_entries;
+	return 0;
+}
 
+static struct bpf_map *array_map_alloc(union bpf_attr *attr)
+{
+	bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
+	int numa_node = bpf_map_attr_numa_node(attr);
+	u32 elem_size, index_mask, max_entries;
+	u64 array_size;
+	struct bpf_array *array;
+
+	elem_size = round_up(attr->value_size, 8);
+
+	max_entries = attr->max_entries;
+	index_mask = array_index_mask(max_entries);
 	array_size = sizeof(*array);
 	if (percpu) {
 		array_size += (u64) max_entries * sizeof(void *);
@@ -140,7 +152,6 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	if (!array)
 		return ERR_PTR(-ENOMEM);
 	array->index_mask = index_mask;
-	array->map.bypass_spec_v1 = bypass_spec_v1;
 
 	/* copy mandatory map attributes */
 	bpf_map_init_from_attr(&array->map, attr);
@@ -216,7 +227,7 @@ static int array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
 
 	*insn++ = BPF_ALU64_IMM(BPF_ADD, map_ptr, offsetof(struct bpf_array, value));
 	*insn++ = BPF_LDX_MEM(BPF_W, ret, index, 0);
-	if (!map->bypass_spec_v1) {
+	if (map->unpriv) {
 		*insn++ = BPF_JMP_IMM(BPF_JGE, ret, map->max_entries, 4);
 		*insn++ = BPF_ALU32_IMM(BPF_AND, ret, array->index_mask);
 	} else {
@@ -1373,7 +1384,7 @@ static int array_of_map_gen_lookup(struct bpf_map *map,
 
 	*insn++ = BPF_ALU64_IMM(BPF_ADD, map_ptr, offsetof(struct bpf_array, value));
 	*insn++ = BPF_LDX_MEM(BPF_W, ret, index, 0);
-	if (!map->bypass_spec_v1) {
+	if (map->unpriv) {
 		*insn++ = BPF_JMP_IMM(BPF_JGE, ret, map->max_entries, 6);
 		*insn++ = BPF_ALU32_IMM(BPF_AND, ret, array->index_mask);
 	} else {
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index 2c5c64c2a53b..21cb4be92097 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -41,6 +41,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 		goto put;
 	}
 
+	inner_map_meta->unpriv = inner_map->unpriv;
 	inner_map_meta->map_type = inner_map->map_type;
 	inner_map_meta->key_size = inner_map->key_size;
 	inner_map_meta->value_size = inner_map->value_size;
@@ -69,7 +70,6 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 	/* Misc members not needed in bpf_map_meta_equal() check. */
 	inner_map_meta->ops = inner_map->ops;
 	if (inner_map->ops == &array_map_ops) {
-		inner_map_meta->bypass_spec_v1 = inner_map->bypass_spec_v1;
 		container_of(inner_map_meta, struct bpf_array, map)->index_mask =
 		     container_of(inner_map, struct bpf_array, map)->index_mask;
 	}
@@ -98,6 +98,7 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
 		meta0->key_size == meta1->key_size &&
 		meta0->value_size == meta1->value_size &&
 		meta0->map_flags == meta1->map_flags &&
+		meta0->unpriv == meta1->unpriv &&
 		btf_record_equal(meta0->record, meta1->record);
 }
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 92127eaee467..ffc61a764fe5 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1010,7 +1010,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	if (!IS_ERR_OR_NULL(map->record)) {
 		int i;
 
-		if (!bpf_capable()) {
+		if (map->unpriv) {
 			ret = -EPERM;
 			goto free_map_tab;
 		}
@@ -1100,6 +1100,7 @@ static int map_create(union bpf_attr *attr)
 	int numa_node = bpf_map_attr_numa_node(attr);
 	u32 map_type = attr->map_type;
 	struct bpf_map *map;
+	bool unpriv;
 	int f_flags;
 	int err;
 
@@ -1176,6 +1177,7 @@ static int map_create(union bpf_attr *attr)
 	case BPF_MAP_TYPE_CPUMAP:
 		if (!bpf_capable())
 			return -EPERM;
+		unpriv = false;
 		break;
 	case BPF_MAP_TYPE_SOCKMAP:
 	case BPF_MAP_TYPE_SOCKHASH:
@@ -1184,6 +1186,7 @@ static int map_create(union bpf_attr *attr)
 	case BPF_MAP_TYPE_XSKMAP:
 		if (!capable(CAP_NET_ADMIN))
 			return -EPERM;
+		unpriv = false;
 		break;
 	case BPF_MAP_TYPE_ARRAY:
 	case BPF_MAP_TYPE_PERCPU_ARRAY:
@@ -1198,18 +1201,36 @@ static int map_create(union bpf_attr *attr)
 	case BPF_MAP_TYPE_USER_RINGBUF:
 	case BPF_MAP_TYPE_CGROUP_STORAGE:
 	case BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE:
-		/* unprivileged */
+		/* unprivileged is OK, but we still record if we had CAP_BPF */
+		unpriv = !bpf_capable();
 		break;
 	default:
 		WARN(1, "unsupported map type %d", map_type);
 		return -EPERM;
 	}
 
+	/* ARRAY-like maps have special sizing provisions for mitigating Spectre v1 */
+	if (unpriv) {
+		switch (map_type) {
+		case BPF_MAP_TYPE_ARRAY:
+		case BPF_MAP_TYPE_PERCPU_ARRAY:
+		case BPF_MAP_TYPE_PROG_ARRAY:
+		case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
+		case BPF_MAP_TYPE_CGROUP_ARRAY:
+		case BPF_MAP_TYPE_ARRAY_OF_MAPS:
+			err = bpf_array_adjust_for_spec_v1(attr);
+			if (err)
+				return err;
+			break;
+		}
+	}
+
 	map = ops->map_alloc(attr);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
 	map->ops = ops;
 	map->map_type = map_type;
+	map->unpriv = unpriv;
 
 	err = bpf_obj_name_cpy(map->name, attr->map_name,
 			       sizeof(attr->map_name));
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ff4a8ab99f08..481aaf189183 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8731,11 +8731,9 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
 	}
 
 	if (!BPF_MAP_PTR(aux->map_ptr_state))
-		bpf_map_ptr_store(aux, meta->map_ptr,
-				  !meta->map_ptr->bypass_spec_v1);
+		bpf_map_ptr_store(aux, meta->map_ptr, meta->map_ptr->unpriv);
 	else if (BPF_MAP_PTR(aux->map_ptr_state) != meta->map_ptr)
-		bpf_map_ptr_store(aux, BPF_MAP_PTR_POISON,
-				  !meta->map_ptr->bypass_spec_v1);
+		bpf_map_ptr_store(aux, BPF_MAP_PTR_POISON, meta->map_ptr->unpriv);
 	return 0;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH bpf-next 05/10] bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command
  2023-05-02 23:06 [PATCH bpf-next 00/10] Centralize BPF permission checks Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2023-05-02 23:06 ` [PATCH bpf-next 04/10] bpf: remember if bpf_map was unprivileged and use that consistently Andrii Nakryiko
@ 2023-05-02 23:06 ` Andrii Nakryiko
  2023-05-02 23:06 ` [PATCH bpf-next 06/10] bpf: keep BPF_PROG_LOAD permission checks clear of validations Andrii Nakryiko
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-02 23:06 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

It seems that the extra bpf_capable() check in the BPF_MAP_FREEZE handler
was unintentionally left in when we switched to a model where all BPF map
operations are allowed regardless of CAP_BPF (or any other capability),
as long as the process got a BPF map FD somehow.

This patch replaces the bpf_capable() check in the BPF_MAP_FREEZE handler
with a writeable access check, given that conceptually freezing the map
modifies it: the map becomes unmodifiable for subsequent updates.
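
A hypothetical user-space view of the change (sketch; rdonly_fd and rdwr_fd
stand for map FDs obtained without and with write access, e.g. via read-only
vs. writable bpffs pins):

        err = bpf_map_freeze(rdonly_fd); /* now -EPERM: no write access to the map */
        err = bpf_map_freeze(rdwr_fd);   /* succeeds, no bpf_capable() required */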

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/syscall.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ffc61a764fe5..2e5ca52c45c4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2001,6 +2001,11 @@ static int map_freeze(const union bpf_attr *attr)
 		return -ENOTSUPP;
 	}
 
+	if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
+		err = -EPERM;
+		goto err_put;
+	}
+
 	mutex_lock(&map->freeze_mutex);
 	if (bpf_map_write_active(map)) {
 		err = -EBUSY;
@@ -2010,10 +2015,6 @@ static int map_freeze(const union bpf_attr *attr)
 		err = -EBUSY;
 		goto err_put;
 	}
-	if (!bpf_capable()) {
-		err = -EPERM;
-		goto err_put;
-	}
 
 	WRITE_ONCE(map->frozen, true);
 err_put:
-- 
2.34.1


* [PATCH bpf-next 06/10] bpf: keep BPF_PROG_LOAD permission checks clear of validations
  2023-05-02 23:06 [PATCH bpf-next 00/10] Centralize BPF permission checks Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2023-05-02 23:06 ` [PATCH bpf-next 05/10] bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command Andrii Nakryiko
@ 2023-05-02 23:06 ` Andrii Nakryiko
  2023-05-04 20:12   ` Alexei Starovoitov
  2023-05-02 23:06 ` [PATCH bpf-next 07/10] bpf: record effective capabilities at BPF prog load time Andrii Nakryiko
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-02 23:06 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Move flags validation and license checks out of the permission checks.
They were intermingled, which makes subsequent changes harder. Clean this
up: perform straightforward flag validation upfront, and fetch and check
the license later, right where we use it. Also consolidate the capability
checks in one block, right after basic attribute sanity checks.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/syscall.c | 44 +++++++++++++++++++++-----------------------
 1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 2e5ca52c45c4..d960eb476754 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2573,18 +2573,6 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	struct btf *attach_btf = NULL;
 	int err;
 	char license[128];
-	bool is_gpl;
-
-	/* Intent here is for unprivileged_bpf_disabled to block key object
-	 * creation commands for unprivileged users; other actions depend
-	 * of fd availability and access to bpffs, so are dependent on
-	 * object creation success.  Capabilities are later verified for
-	 * operations such as load and map create, so even with unprivileged
-	 * BPF disabled, capability checks are still carried out for these
-	 * and other operations.
-	 */
-	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
-		return -EPERM;
 
 	if (CHECK_ATTR(BPF_PROG_LOAD))
 		return -EINVAL;
@@ -2598,21 +2586,22 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 				 BPF_F_XDP_DEV_BOUND_ONLY))
 		return -EINVAL;
 
+	/* Intent here is for unprivileged_bpf_disabled to block key object
+	 * creation commands for unprivileged users; other actions depend
+	 * of fd availability and access to bpffs, so are dependent on
+	 * object creation success.  Capabilities are later verified for
+	 * operations such as load and map create, so even with unprivileged
+	 * BPF disabled, capability checks are still carried out for these
+	 * and other operations.
+	 */
+	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
+		return -EPERM;
+
 	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
 	    (attr->prog_flags & BPF_F_ANY_ALIGNMENT) &&
 	    !bpf_capable())
 		return -EPERM;
 
-	/* copy eBPF program license from user space */
-	if (strncpy_from_bpfptr(license,
-				make_bpfptr(attr->license, uattr.is_kernel),
-				sizeof(license) - 1) < 0)
-		return -EFAULT;
-	license[sizeof(license) - 1] = 0;
-
-	/* eBPF programs must be GPL compatible to use GPL-ed functions */
-	is_gpl = license_is_gpl_compatible(license);
-
 	if (attr->insn_cnt == 0 ||
 	    attr->insn_cnt > (bpf_capable() ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
 		return -E2BIG;
@@ -2700,7 +2689,16 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	prog->jited = 0;
 
 	atomic64_set(&prog->aux->refcnt, 1);
-	prog->gpl_compatible = is_gpl ? 1 : 0;
+
+	/* copy eBPF program license from user space */
+	if (strncpy_from_bpfptr(license,
+				make_bpfptr(attr->license, uattr.is_kernel),
+				sizeof(license) - 1) < 0)
+		return -EFAULT;
+	license[sizeof(license) - 1] = 0;
+
+	/* eBPF programs must be GPL compatible to use GPL-ed functions */
+	prog->gpl_compatible = license_is_gpl_compatible(license) ? 1 : 0;
 
 	if (bpf_prog_is_dev_bound(prog->aux)) {
 		err = bpf_prog_dev_bound_init(prog, attr);
-- 
2.34.1


* [PATCH bpf-next 07/10] bpf: record effective capabilities at BPF prog load time
  2023-05-02 23:06 [PATCH bpf-next 00/10] Centralize BPF permission checks Andrii Nakryiko
                   ` (5 preceding siblings ...)
  2023-05-02 23:06 ` [PATCH bpf-next 06/10] bpf: keep BPF_PROG_LOAD permission checks clear of validations Andrii Nakryiko
@ 2023-05-02 23:06 ` Andrii Nakryiko
  2023-05-02 23:06 ` [PATCH bpf-next 08/10] bpf: use recorded BPF prog effective caps when fetching helper protos Andrii Nakryiko
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-02 23:06 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Determine and remember effective capabilities of the process at the
BPF_PROG_LOAD command time. We store those in prog->aux, so we can check
and use them throughout various places (like BPF verifier) that perform
various checks and enforce various extra conditions on BPF program based
on its effective capabilities. Currently we have explicit calls to
bpf_capable(), perfmon_capable() and capable() spread throughout the
code base, but we'll be refactoring this over the next few patches to
consistently use the prog->aux->{bpf,perfmon}_capable fields when making
decisions. This ensures we have a consistent set of
restrictions checked and recorded in one consistent place.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h  |  5 +++++
 kernel/bpf/syscall.c | 23 +++++++++++++++++------
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 479657bb113e..44ccfea0f73b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1370,6 +1370,11 @@ struct bpf_prog_aux {
 	u32 ctx_arg_info_size;
 	u32 max_rdonly_access;
 	u32 max_rdwr_access;
+
+	/* effective capabilities that were given to BPF program at load time */
+	bool bpf_capable;
+	bool perfmon_capable;
+
 	struct btf *attach_btf;
 	const struct bpf_ctx_arg_aux *ctx_arg_info;
 	struct mutex dst_mutex; /* protects dst_* pointers below, *after* prog becomes visible */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d960eb476754..839f69d75297 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2568,6 +2568,7 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
 
 static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 {
+	bool bpf_cap, net_admin_cap, sys_admin_cap, perfmon_cap;
 	enum bpf_prog_type type = attr->prog_type;
 	struct bpf_prog *prog, *dst_prog = NULL;
 	struct btf *attach_btf = NULL;
@@ -2586,6 +2587,12 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 				 BPF_F_XDP_DEV_BOUND_ONLY))
 		return -EINVAL;
 
+	/* remember set of effective capabilities to be used throughout */
+	bpf_cap = bpf_capable();
+	perfmon_cap = perfmon_capable();
+	net_admin_cap = capable(CAP_NET_ADMIN);
+	sys_admin_cap = capable(CAP_SYS_ADMIN);
+
 	/* Intent here is for unprivileged_bpf_disabled to block key object
 	 * creation commands for unprivileged users; other actions depend
 	 * of fd availability and access to bpffs, so are dependent on
@@ -2594,25 +2601,25 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	 * BPF disabled, capability checks are still carried out for these
 	 * and other operations.
 	 */
-	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
+	if (sysctl_unprivileged_bpf_disabled && !bpf_cap)
 		return -EPERM;
 
 	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
 	    (attr->prog_flags & BPF_F_ANY_ALIGNMENT) &&
-	    !bpf_capable())
+	    !bpf_cap)
 		return -EPERM;
 
 	if (attr->insn_cnt == 0 ||
-	    attr->insn_cnt > (bpf_capable() ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
+	    attr->insn_cnt > (bpf_cap ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
 		return -E2BIG;
 	if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
 	    type != BPF_PROG_TYPE_CGROUP_SKB &&
-	    !bpf_capable())
+	    !bpf_cap)
 		return -EPERM;
 
-	if (is_net_admin_prog_type(type) && !capable(CAP_NET_ADMIN) && !capable(CAP_SYS_ADMIN))
+	if (is_net_admin_prog_type(type) && !net_admin_cap && !sys_admin_cap)
 		return -EPERM;
-	if (is_perfmon_prog_type(type) && !perfmon_capable())
+	if (is_perfmon_prog_type(type) && !perfmon_cap)
 		return -EPERM;
 
 	/* attach_prog_fd/attach_btf_obj_fd can specify fd of either bpf_prog
@@ -2672,6 +2679,10 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
 	prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS;
 
+	/* remember effective capabilities prog was given */
+	prog->aux->bpf_capable = bpf_cap;
+	prog->aux->perfmon_capable = perfmon_cap;
+
 	err = security_bpf_prog_alloc(prog->aux);
 	if (err)
 		goto free_prog;
-- 
2.34.1


* [PATCH bpf-next 08/10] bpf: use recorded BPF prog effective caps when fetching helper protos
  2023-05-02 23:06 [PATCH bpf-next 00/10] Centralize BPF permission checks Andrii Nakryiko
                   ` (6 preceding siblings ...)
  2023-05-02 23:06 ` [PATCH bpf-next 07/10] bpf: record effective capabilities at BPF prog load time Andrii Nakryiko
@ 2023-05-02 23:06 ` Andrii Nakryiko
  2023-05-02 23:06 ` [PATCH bpf-next 09/10] bpf: use recorded bpf_capable flag in JIT code Andrii Nakryiko
  2023-05-02 23:06 ` [PATCH bpf-next 10/10] bpf: consistently use program's recorded capabilities in BPF verifier Andrii Nakryiko
  9 siblings, 0 replies; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-02 23:06 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Instead of performing extra bpf_capable() and perfmon_capable() calls
inside the bpf_base_func_proto() function (and other similar ones) to
determine the eligibility of a given BPF helper for a given program, use
the effective capabilities previously recorded for the program during
BPF_PROG_LOAD command handling.

This prevents any potential races and makes
prog->aux->{bpf,perfmon}_capable the source of truth for decisions on
what BPF program is allowed to use.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 drivers/media/rc/bpf-lirc.c |  2 +-
 include/linux/bpf.h         |  5 +++--
 kernel/bpf/cgroup.c         |  6 +++---
 kernel/bpf/helpers.c        |  6 +++---
 kernel/bpf/syscall.c        |  4 ++--
 kernel/trace/bpf_trace.c    |  2 +-
 net/core/filter.c           | 32 ++++++++++++++++----------------
 net/ipv4/bpf_tcp_ca.c       |  2 +-
 net/netfilter/nf_bpf_link.c |  2 +-
 9 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
index fe17c7f98e81..f580326f7a96 100644
--- a/drivers/media/rc/bpf-lirc.c
+++ b/drivers/media/rc/bpf-lirc.c
@@ -110,7 +110,7 @@ lirc_mode2_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_get_prandom_u32:
 		return &bpf_get_prandom_u32_proto;
 	case BPF_FUNC_trace_printk:
-		if (perfmon_capable())
+		if (prog->aux->perfmon_capable)
 			return bpf_get_trace_printk_proto();
 		fallthrough;
 	default:
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 44ccfea0f73b..a5832a69f24e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2328,7 +2328,8 @@ int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *pr
 struct bpf_prog *bpf_prog_by_id(u32 id);
 struct bpf_link *bpf_link_by_id(u32 id);
 
-const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
+const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id,
+						 const struct bpf_prog *prog);
 void bpf_task_storage_free(struct task_struct *task);
 void bpf_cgrp_storage_free(struct cgroup *cgroup);
 bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
@@ -2567,7 +2568,7 @@ static inline int btf_struct_access(struct bpf_verifier_log *log,
 }
 
 static inline const struct bpf_func_proto *
-bpf_base_func_proto(enum bpf_func_id func_id)
+bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	return NULL;
 }
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index a06e118a9be5..097fc69acfe9 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1615,7 +1615,7 @@ cgroup_dev_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -2158,7 +2158,7 @@ sysctl_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -2315,7 +2315,7 @@ cg_sockopt_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index bb6b4637ebf2..020e03f625b5 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1663,7 +1663,7 @@ const struct bpf_func_proto bpf_probe_read_kernel_str_proto __weak;
 const struct bpf_func_proto bpf_task_pt_regs_proto __weak;
 
 const struct bpf_func_proto *
-bpf_base_func_proto(enum bpf_func_id func_id)
+bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_map_lookup_elem:
@@ -1714,7 +1714,7 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		break;
 	}
 
-	if (!bpf_capable())
+	if (!prog->aux->bpf_capable)
 		return NULL;
 
 	switch (func_id) {
@@ -1772,7 +1772,7 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		break;
 	}
 
-	if (!perfmon_capable())
+	if (!prog->aux->perfmon_capable)
 		return NULL;
 
 	switch (func_id) {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 839f69d75297..6dac69c6ec26 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5331,7 +5331,7 @@ static const struct bpf_func_proto bpf_sys_bpf_proto = {
 const struct bpf_func_proto * __weak
 tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
-	return bpf_base_func_proto(func_id);
+	return bpf_base_func_proto(func_id, prog);
 }
 
 BPF_CALL_1(bpf_sys_close, u32, fd)
@@ -5381,7 +5381,7 @@ syscall_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_sys_bpf:
-		return !perfmon_capable() ? NULL : &bpf_sys_bpf_proto;
+		return !prog->aux->perfmon_capable ? NULL : &bpf_sys_bpf_proto;
 	case BPF_FUNC_btf_find_by_name_kind:
 		return &bpf_btf_find_by_name_kind_proto;
 	case BPF_FUNC_sys_close:
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 8deb22a99abe..f79606347e4f 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1511,7 +1511,7 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_trace_vprintk:
 		return bpf_get_trace_vprintk_proto();
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
diff --git a/net/core/filter.c b/net/core/filter.c
index d9ce04ca22ce..8e282797e658 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -83,7 +83,7 @@
 #include <net/netfilter/nf_conntrack_bpf.h>
 
 static const struct bpf_func_proto *
-bpf_sk_base_func_proto(enum bpf_func_id func_id);
+bpf_sk_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
 
 int copy_bpf_fprog_from_user(struct sock_fprog *dst, sockptr_t src, int len)
 {
@@ -7724,7 +7724,7 @@ sock_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -7807,7 +7807,7 @@ sock_addr_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 			return NULL;
 		}
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -7826,7 +7826,7 @@ sk_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_skb_event_output_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8013,7 +8013,7 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 #endif
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8072,7 +8072,7 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 #endif
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 
 #if IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES)
@@ -8133,7 +8133,7 @@ sock_ops_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_tcp_sock_proto;
 #endif /* CONFIG_INET */
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8175,7 +8175,7 @@ sk_msg_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_cgroup_classid_curr_proto;
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8219,7 +8219,7 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_skc_lookup_tcp_proto;
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8230,7 +8230,7 @@ flow_dissector_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skb_load_bytes:
 		return &bpf_flow_dissector_load_bytes_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8257,7 +8257,7 @@ lwt_out_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skb_under_cgroup:
 		return &bpf_skb_under_cgroup_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -11088,7 +11088,7 @@ sk_reuseport_func_proto(enum bpf_func_id func_id,
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -11270,7 +11270,7 @@ sk_lookup_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_sk_release:
 		return &bpf_sk_release_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -11604,7 +11604,7 @@ const struct bpf_func_proto bpf_sock_from_file_proto = {
 };
 
 static const struct bpf_func_proto *
-bpf_sk_base_func_proto(enum bpf_func_id func_id)
+bpf_sk_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	const struct bpf_func_proto *func;
 
@@ -11633,10 +11633,10 @@ bpf_sk_base_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 
-	if (!perfmon_capable())
+	if (!prog->aux->perfmon_capable)
 		return NULL;
 
 	return func;
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 4406d796cc2f..0a3a60e7c282 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -193,7 +193,7 @@ bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id,
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
diff --git a/net/netfilter/nf_bpf_link.c b/net/netfilter/nf_bpf_link.c
index c36da56d756f..d7786ea9c01a 100644
--- a/net/netfilter/nf_bpf_link.c
+++ b/net/netfilter/nf_bpf_link.c
@@ -219,7 +219,7 @@ static bool nf_is_valid_access(int off, int size, enum bpf_access_type type,
 static const struct bpf_func_proto *
 bpf_nf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
-	return bpf_base_func_proto(func_id);
+	return bpf_base_func_proto(func_id, prog);
 }
 
 const struct bpf_verifier_ops netfilter_verifier_ops = {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH bpf-next 09/10] bpf: use recorded bpf_capable flag in JIT code
  2023-05-02 23:06 [PATCH bpf-next 00/10] Centralize BPF permission checks Andrii Nakryiko
                   ` (7 preceding siblings ...)
  2023-05-02 23:06 ` [PATCH bpf-next 08/10] bpf: use recorded BPF prog effective caps when fetching helper protos Andrii Nakryiko
@ 2023-05-02 23:06 ` Andrii Nakryiko
  2023-05-04 22:09   ` Alexei Starovoitov
  2023-05-02 23:06 ` [PATCH bpf-next 10/10] bpf: consistenly use program's recorded capabilities in BPF verifier Andrii Nakryiko
  9 siblings, 1 reply; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-02 23:06 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Use the recorded bpf_capable flag of a BPF program in
bpf_jit_charge_modmem(), if possible. If no BPF program is available,
fall back to the system-level bpf_capable() check. This is currently
the case for the BPF trampoline update code, which might not be
associated with a particular instance of a BPF program.

All the other users of bpf_jit_charge_modmem() work within the context
of a specific BPF program, so they can trivially use
prog->aux->bpf_capable.
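
To make the two call-site patterns concrete, here is a condensed view
of what the diff below does (not new code, just the shape of it):

        /* JIT allocation paths: a specific program is at hand */
        if (bpf_jit_charge_modmem(size, prog))
                return NULL;

        /* trampoline image allocation: no program context, so pass NULL
         * and let bpf_jit_charge_modmem() fall back to bpf_capable()
         */
        err = bpf_jit_charge_modmem(PAGE_SIZE, NULL);
        if (err)
                goto out_free_im;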

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 arch/arm/net/bpf_jit_32.c        |  2 +-
 arch/arm64/net/bpf_jit_comp.c    |  2 +-
 arch/loongarch/net/bpf_jit.c     |  2 +-
 arch/mips/net/bpf_jit_comp.c     |  2 +-
 arch/powerpc/net/bpf_jit_comp.c  |  2 +-
 arch/riscv/net/bpf_jit_core.c    |  3 ++-
 arch/s390/net/bpf_jit_comp.c     |  3 ++-
 arch/sparc/net/bpf_jit_comp_64.c |  2 +-
 arch/x86/net/bpf_jit_comp.c      |  3 ++-
 arch/x86/net/bpf_jit_comp32.c    |  2 +-
 include/linux/bpf.h              |  2 +-
 include/linux/filter.h           |  6 ++++--
 kernel/bpf/core.c                | 20 +++++++++++---------
 kernel/bpf/trampoline.c          |  2 +-
 14 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 6a1c9fca5260..82dae8e4d6b1 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -1963,7 +1963,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	image_size = sizeof(u32) * ctx.idx;
 
 	/* Now we know the size of the structure to make */
-	header = bpf_jit_binary_alloc(image_size, &image_ptr,
+	header = bpf_jit_binary_alloc(prog, image_size, &image_ptr,
 				      sizeof(u32), jit_fill_hole);
 	/* Not able to allocate memory for the structure then
 	 * we must fall back to the interpretation
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index b26da8efa616..5c60d9922a03 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1533,7 +1533,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	/* also allocate space for plt target */
 	extable_offset = round_up(prog_size + PLT_TARGET_SIZE, extable_align);
 	image_size = extable_offset + extable_size;
-	header = bpf_jit_binary_alloc(image_size, &image_ptr,
+	header = bpf_jit_binary_alloc(prog, image_size, &image_ptr,
 				      sizeof(u32), jit_fill_hole);
 	if (header == NULL) {
 		prog = orig_prog;
diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
index db9342b2d0e6..c803581e87db 100644
--- a/arch/loongarch/net/bpf_jit.c
+++ b/arch/loongarch/net/bpf_jit.c
@@ -1172,7 +1172,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	prog_size = sizeof(u32) * ctx.idx;
 	image_size = prog_size + extable_size;
 	/* Now we know the size of the structure to make */
-	header = bpf_jit_binary_alloc(image_size, &image_ptr,
+	header = bpf_jit_binary_alloc(prog, image_size, &image_ptr,
 				      sizeof(u32), jit_fill_hole);
 	if (header == NULL) {
 		prog = orig_prog;
diff --git a/arch/mips/net/bpf_jit_comp.c b/arch/mips/net/bpf_jit_comp.c
index a40d926b6513..365849adc238 100644
--- a/arch/mips/net/bpf_jit_comp.c
+++ b/arch/mips/net/bpf_jit_comp.c
@@ -985,7 +985,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 
 	/* Now we know the size of the structure to make */
 	image_size = sizeof(u32) * ctx.jit_index;
-	header = bpf_jit_binary_alloc(image_size, &image_ptr,
+	header = bpf_jit_binary_alloc(prog, image_size, &image_ptr,
 				      sizeof(u32), jit_fill_hole);
 	/*
 	 * Not able to allocate memory for the structure then
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index e93aefcfb83f..fb7732fe2e7e 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -154,7 +154,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	proglen = cgctx.idx * 4;
 	alloclen = proglen + FUNCTION_DESCR_SIZE + fixup_len + extable_len;
 
-	bpf_hdr = bpf_jit_binary_alloc(alloclen, &image, 4, bpf_jit_fill_ill_insns);
+	bpf_hdr = bpf_jit_binary_alloc(fp, alloclen, &image, 4, bpf_jit_fill_ill_insns);
 	if (!bpf_hdr) {
 		fp = org_fp;
 		goto out_addrs;
diff --git a/arch/riscv/net/bpf_jit_core.c b/arch/riscv/net/bpf_jit_core.c
index 737baf8715da..570567e02dc7 100644
--- a/arch/riscv/net/bpf_jit_core.c
+++ b/arch/riscv/net/bpf_jit_core.c
@@ -109,7 +109,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 			prog_size = sizeof(*ctx->insns) * ctx->ninsns;
 
 			jit_data->header =
-				bpf_jit_binary_alloc(prog_size + extable_size,
+				bpf_jit_binary_alloc(prog,
+						     prog_size + extable_size,
 						     &jit_data->image,
 						     sizeof(u32),
 						     bpf_fill_ill_insns);
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index f95d7e401b96..11627295e695 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -1878,7 +1878,8 @@ static struct bpf_binary_header *bpf_jit_alloc(struct bpf_jit *jit,
 			    __alignof__(struct exception_table_entry));
 	extable_size = fp->aux->num_exentries *
 		sizeof(struct exception_table_entry);
-	header = bpf_jit_binary_alloc(code_size + extable_size, &jit->prg_buf,
+	header = bpf_jit_binary_alloc(fp,
+				      code_size + extable_size, &jit->prg_buf,
 				      8, jit_fill_hole);
 	if (!header)
 		return NULL;
diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c
index fa0759bfe498..bff669698e86 100644
--- a/arch/sparc/net/bpf_jit_comp_64.c
+++ b/arch/sparc/net/bpf_jit_comp_64.c
@@ -1567,7 +1567,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 
 	/* Now we know the actual image size. */
 	image_size = sizeof(u32) * ctx.idx;
-	header = bpf_jit_binary_alloc(image_size, &image_ptr,
+	header = bpf_jit_binary_alloc(prog, image_size, &image_ptr,
 				      sizeof(u32), jit_fill_hole);
 	if (header == NULL) {
 		prog = orig_prog;
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 1056bbf55b17..593c9daad167 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -2556,7 +2556,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 				sizeof(struct exception_table_entry);
 
 			/* allocate module memory for x86 insns and extable */
-			header = bpf_jit_binary_pack_alloc(roundup(proglen, align) + extable_size,
+			header = bpf_jit_binary_pack_alloc(prog,
+							   roundup(proglen, align) + extable_size,
 							   &image, align, &rw_header, &rw_image,
 							   jit_fill_hole);
 			if (!header) {
diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
index 429a89c5468b..e59ff8935b12 100644
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -2586,7 +2586,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 			break;
 		}
 		if (proglen == oldproglen) {
-			header = bpf_jit_binary_alloc(proglen, &image,
+			header = bpf_jit_binary_alloc(prog, proglen, &image,
 						      1, jit_fill_hole);
 			if (!header) {
 				prog = orig_prog;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a5832a69f24e..785b720358f5 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1277,7 +1277,7 @@ void bpf_image_ksym_add(void *data, struct bpf_ksym *ksym);
 void bpf_image_ksym_del(struct bpf_ksym *ksym);
 void bpf_ksym_add(struct bpf_ksym *ksym);
 void bpf_ksym_del(struct bpf_ksym *ksym);
-int bpf_jit_charge_modmem(u32 size);
+int bpf_jit_charge_modmem(u32 size, const struct bpf_prog *prog);
 void bpf_jit_uncharge_modmem(u32 size);
 bool bpf_prog_has_trampoline(const struct bpf_prog *prog);
 #else
diff --git a/include/linux/filter.h b/include/linux/filter.h
index bbce89937fde..9c207d9848e9 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1026,7 +1026,8 @@ typedef void (*bpf_jit_fill_hole_t)(void *area, unsigned int size);
 void bpf_jit_fill_hole_with_zero(void *area, unsigned int size);
 
 struct bpf_binary_header *
-bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
+bpf_jit_binary_alloc(const struct bpf_prog *prog,
+		     unsigned int proglen, u8 **image_ptr,
 		     unsigned int alignment,
 		     bpf_jit_fill_hole_t bpf_fill_ill_insns);
 void bpf_jit_binary_free(struct bpf_binary_header *hdr);
@@ -1047,7 +1048,8 @@ static inline bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
 }
 
 struct bpf_binary_header *
-bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **ro_image,
+bpf_jit_binary_pack_alloc(const struct bpf_prog *prog,
+			  unsigned int proglen, u8 **ro_image,
 			  unsigned int alignment,
 			  struct bpf_binary_header **rw_hdr,
 			  u8 **rw_image,
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 7421487422d4..4d057d39c286 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -978,13 +978,13 @@ static int __init bpf_jit_charge_init(void)
 }
 pure_initcall(bpf_jit_charge_init);
 
-int bpf_jit_charge_modmem(u32 size)
+int bpf_jit_charge_modmem(u32 size, const struct bpf_prog *prog)
 {
 	if (atomic_long_add_return(size, &bpf_jit_current) > READ_ONCE(bpf_jit_limit)) {
-		if (!bpf_capable()) {
-			atomic_long_sub(size, &bpf_jit_current);
-			return -EPERM;
-		}
+		if (prog ? prog->aux->bpf_capable : bpf_capable())
+			return 0;
+		atomic_long_sub(size, &bpf_jit_current);
+		return -EPERM;
 	}
 
 	return 0;
@@ -1006,7 +1006,8 @@ void __weak bpf_jit_free_exec(void *addr)
 }
 
 struct bpf_binary_header *
-bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
+bpf_jit_binary_alloc(const struct bpf_prog *prog,
+		     unsigned int proglen, u8 **image_ptr,
 		     unsigned int alignment,
 		     bpf_jit_fill_hole_t bpf_fill_ill_insns)
 {
@@ -1022,7 +1023,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
 	 */
 	size = round_up(proglen + sizeof(*hdr) + 128, PAGE_SIZE);
 
-	if (bpf_jit_charge_modmem(size))
+	if (bpf_jit_charge_modmem(size, prog))
 		return NULL;
 	hdr = bpf_jit_alloc_exec(size);
 	if (!hdr) {
@@ -1061,7 +1062,8 @@ void bpf_jit_binary_free(struct bpf_binary_header *hdr)
  * the JITed program to the RO memory.
  */
 struct bpf_binary_header *
-bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **image_ptr,
+bpf_jit_binary_pack_alloc(const struct bpf_prog *prog,
+			  unsigned int proglen, u8 **image_ptr,
 			  unsigned int alignment,
 			  struct bpf_binary_header **rw_header,
 			  u8 **rw_image,
@@ -1076,7 +1078,7 @@ bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **image_ptr,
 	/* add 16 bytes for a random section of illegal instructions */
 	size = round_up(proglen + sizeof(*ro_header) + 16, BPF_PROG_CHUNK_SIZE);
 
-	if (bpf_jit_charge_modmem(size))
+	if (bpf_jit_charge_modmem(size, prog))
 		return NULL;
 	ro_header = bpf_prog_pack_alloc(size, bpf_fill_ill_insns);
 	if (!ro_header) {
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index ac021bc43a66..b464807f4b62 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -355,7 +355,7 @@ static struct bpf_tramp_image *bpf_tramp_image_alloc(u64 key, u32 idx)
 	if (!im)
 		goto out;
 
-	err = bpf_jit_charge_modmem(PAGE_SIZE);
+	err = bpf_jit_charge_modmem(PAGE_SIZE, NULL);
 	if (err)
 		goto out_free_im;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH bpf-next 10/10] bpf: consistenly use program's recorded capabilities in BPF verifier
  2023-05-02 23:06 [PATCH bpf-next 00/10] Centralize BPF permission checks Andrii Nakryiko
                   ` (8 preceding siblings ...)
  2023-05-02 23:06 ` [PATCH bpf-next 09/10] bpf: use recorded bpf_capable flag in JIT code Andrii Nakryiko
@ 2023-05-02 23:06 ` Andrii Nakryiko
  2023-05-04 22:20   ` Alexei Starovoitov
  2023-05-11 16:21   ` Alexei Starovoitov
  9 siblings, 2 replies; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-02 23:06 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Remove the remaining direct queries to perfmon_capable() and
bpf_capable() in BPF verifier logic and instead use the
prog->aux->{bpf,perfmon}_capable values. This leaves a single place
where permissions are checked and granted for any given BPF program;
after that, the BPF subsystem consistently uses the recorded decision.
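
Condensed, the shape of the change is (full hunks below): the helpers
take the program and read its recorded capabilities, and bpf_check()
seeds the verifier env from the program rather than from the loading
task.

static inline bool bpf_allow_ptr_leaks(const struct bpf_prog *prog)
{
        return prog->aux->perfmon_capable;
}

        /* in bpf_check() */
        env->allow_ptr_leaks = bpf_allow_ptr_leaks(*prog);
        env->bpf_capable = is_priv = (*prog)->aux->bpf_capable;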

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h    | 16 ++++++++--------
 include/linux/filter.h |  2 +-
 kernel/bpf/core.c      |  2 +-
 kernel/bpf/verifier.c  | 17 ++++++++++-------
 net/core/filter.c      |  4 ++--
 5 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 785b720358f5..8a2af67039fc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2048,26 +2048,26 @@ bpf_map_alloc_percpu(const struct bpf_map *map, size_t size, size_t align,
 
 extern int sysctl_unprivileged_bpf_disabled;
 
-static inline bool bpf_allow_ptr_leaks(void)
+static inline bool bpf_allow_ptr_leaks(const struct bpf_prog *prog)
 {
-	return perfmon_capable();
+	return prog->aux->perfmon_capable;
 }
 
-static inline bool bpf_allow_uninit_stack(void)
+static inline bool bpf_allow_uninit_stack(const struct bpf_prog *prog)
 {
-	return perfmon_capable();
+	return prog->aux->perfmon_capable;
 }
 
-static inline bool bpf_bypass_spec_v1(void)
+static inline bool bpf_bypass_spec_v1(const struct bpf_prog *prog)
 {
-	return perfmon_capable();
+	return prog->aux->perfmon_capable;
 }
 
 int bpf_array_adjust_for_spec_v1(union bpf_attr *attr);
 
-static inline bool bpf_bypass_spec_v4(void)
+static inline bool bpf_bypass_spec_v4(const struct bpf_prog *prog)
 {
-	return perfmon_capable();
+	return prog->aux->perfmon_capable;
 }
 
 int bpf_map_new_fd(struct bpf_map *map, int flags);
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 9c207d9848e9..95ce7c4ab28d 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1112,7 +1112,7 @@ static inline bool bpf_jit_blinding_enabled(struct bpf_prog *prog)
 		return false;
 	if (!bpf_jit_harden)
 		return false;
-	if (bpf_jit_harden == 1 && bpf_capable())
+	if (bpf_jit_harden == 1 && prog->aux->bpf_capable)
 		return false;
 
 	return true;
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 4d057d39c286..c0d60da7e0e0 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -661,7 +661,7 @@ static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
 void bpf_prog_kallsyms_add(struct bpf_prog *fp)
 {
 	if (!bpf_prog_kallsyms_candidate(fp) ||
-	    !bpf_capable())
+	    !fp->aux->bpf_capable)
 		return;
 
 	bpf_prog_ksym_set_addr(fp);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 481aaf189183..7be0abc196db 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -17194,6 +17194,10 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 		func[i]->aux->poke_tab = prog->aux->poke_tab;
 		func[i]->aux->size_poke_tab = prog->aux->size_poke_tab;
 
+		/* inherit main prog's effective capabilities */
+		func[i]->aux->bpf_capable = prog->aux->bpf_capable;
+		func[i]->aux->perfmon_capable = prog->aux->perfmon_capable;
+
 		for (j = 0; j < prog->aux->size_poke_tab; j++) {
 			struct bpf_jit_poke_descriptor *poke;
 
@@ -18878,7 +18882,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	env->prog = *prog;
 	env->ops = bpf_verifier_ops[env->prog->type];
 	env->fd_array = make_bpfptr(attr->fd_array, uattr.is_kernel);
-	is_priv = bpf_capable();
+
+	env->allow_ptr_leaks = bpf_allow_ptr_leaks(*prog);
+	env->allow_uninit_stack = bpf_allow_uninit_stack(*prog);
+	env->bypass_spec_v1 = bpf_bypass_spec_v1(*prog);
+	env->bypass_spec_v4 = bpf_bypass_spec_v4(*prog);
+	env->bpf_capable = is_priv = (*prog)->aux->bpf_capable;
 
 	bpf_get_btf_vmlinux();
 
@@ -18910,12 +18919,6 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	if (attr->prog_flags & BPF_F_ANY_ALIGNMENT)
 		env->strict_alignment = false;
 
-	env->allow_ptr_leaks = bpf_allow_ptr_leaks();
-	env->allow_uninit_stack = bpf_allow_uninit_stack();
-	env->bypass_spec_v1 = bpf_bypass_spec_v1();
-	env->bypass_spec_v4 = bpf_bypass_spec_v4();
-	env->bpf_capable = bpf_capable();
-
 	if (is_priv)
 		env->test_state_freq = attr->prog_flags & BPF_F_TEST_STATE_FREQ;
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 8e282797e658..8a7e86230f2a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8432,7 +8432,7 @@ static bool cg_skb_is_valid_access(int off, int size,
 		return false;
 	case bpf_ctx_range(struct __sk_buff, data):
 	case bpf_ctx_range(struct __sk_buff, data_end):
-		if (!bpf_capable())
+		if (!prog->aux->bpf_capable)
 			return false;
 		break;
 	}
@@ -8444,7 +8444,7 @@ static bool cg_skb_is_valid_access(int off, int size,
 		case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
 			break;
 		case bpf_ctx_range(struct __sk_buff, tstamp):
-			if (!bpf_capable())
+			if (!prog->aux->bpf_capable)
 				return false;
 			break;
 		default:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 01/10] bpf: move unprivileged checks into map_create() and bpf_prog_load()
  2023-05-02 23:06 ` [PATCH bpf-next 01/10] bpf: move unprivileged checks into map_create() and bpf_prog_load() Andrii Nakryiko
@ 2023-05-03 18:28   ` Stanislav Fomichev
  2023-05-03 19:04     ` Andrii Nakryiko
  0 siblings, 1 reply; 30+ messages in thread
From: Stanislav Fomichev @ 2023-05-03 18:28 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, martin.lau, kernel-team

On 05/02, Andrii Nakryiko wrote:
> Make each bpf() syscall command a bit more self-contained, making it
> easier to further enhance it. We move sysctl_unprivileged_bpf_disabled
> handling down to map_create() and bpf_prog_load(), two special commands
> in this regard.
> 
> Also swap the order of checks, calling bpf_capable() only if
> sysctl_unprivileged_bpf_disabled is true, avoiding unnecessary audit
> messages.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  kernel/bpf/syscall.c | 37 ++++++++++++++++++++++---------------
>  1 file changed, 22 insertions(+), 15 deletions(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 14f39c1e573e..d5009fafe0f4 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1132,6 +1132,17 @@ static int map_create(union bpf_attr *attr)
>  	int f_flags;
>  	int err;

[..]

> +	/* Intent here is for unprivileged_bpf_disabled to block key object
> +	 * creation commands for unprivileged users; other actions depend
> +	 * of fd availability and access to bpffs, so are dependent on
> +	 * object creation success.  Capabilities are later verified for
> +	 * operations such as load and map create, so even with unprivileged
> +	 * BPF disabled, capability checks are still carried out for these
> +	 * and other operations.
> +	 */
> +	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> +		return -EPERM;
> +

Does it make sense to have something like unpriv_bpf_capable() to avoid
the copy-paste?
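
For reference, a minimal sketch of what such a helper could look like
(the name is only the suggestion above, not code from the series, and
the exact semantics would need to match the comment being discussed):

static inline bool unpriv_bpf_capable(void)
{
        return !sysctl_unprivileged_bpf_disabled || bpf_capable();
}

        /* then both map_create() and bpf_prog_load() would do: */
        if (!unpriv_bpf_capable())
                return -EPERM;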

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 01/10] bpf: move unprivileged checks into map_create() and bpf_prog_load()
  2023-05-03 18:28   ` Stanislav Fomichev
@ 2023-05-03 19:04     ` Andrii Nakryiko
  2023-05-03 22:33       ` Stanislav Fomichev
  0 siblings, 1 reply; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-03 19:04 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Wed, May 3, 2023 at 11:28 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> On 05/02, Andrii Nakryiko wrote:
> > Make each bpf() syscall command a bit more self-contained, making it
> > easier to further enhance it. We move sysctl_unprivileged_bpf_disabled
> > handling down to map_create() and bpf_prog_load(), two special commands
> > in this regard.
> >
> > Also swap the order of checks, calling bpf_capable() only if
> > sysctl_unprivileged_bpf_disabled is true, avoiding unnecessary audit
> > messages.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  kernel/bpf/syscall.c | 37 ++++++++++++++++++++++---------------
> >  1 file changed, 22 insertions(+), 15 deletions(-)
> >
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 14f39c1e573e..d5009fafe0f4 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -1132,6 +1132,17 @@ static int map_create(union bpf_attr *attr)
> >       int f_flags;
> >       int err;
>
> [..]
>
> > +     /* Intent here is for unprivileged_bpf_disabled to block key object
> > +      * creation commands for unprivileged users; other actions depend
> > +      * of fd availability and access to bpffs, so are dependent on
> > +      * object creation success.  Capabilities are later verified for
> > +      * operations such as load and map create, so even with unprivileged
> > +      * BPF disabled, capability checks are still carried out for these
> > +      * and other operations.
> > +      */
> > +     if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> > +             return -EPERM;
> > +
>
> Does it make sense to have something like unpriv_bpf_capable() to avoid
> the copy-paste?

It's a simple if condition used in exactly two places, so I had
preference to keep permissions checks grouped together in the same
block of code in respective MAP_CREATE and PROG_LOAD handlers. I can
factor it out into a function, but I see little value in that, tbh.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 01/10] bpf: move unprivileged checks into map_create() and bpf_prog_load()
  2023-05-03 19:04     ` Andrii Nakryiko
@ 2023-05-03 22:33       ` Stanislav Fomichev
  2023-05-04 18:52         ` Andrii Nakryiko
  0 siblings, 1 reply; 30+ messages in thread
From: Stanislav Fomichev @ 2023-05-03 22:33 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Wed, May 3, 2023 at 12:04 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, May 3, 2023 at 11:28 AM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > On 05/02, Andrii Nakryiko wrote:
> > > Make each bpf() syscall command a bit more self-contained, making it
> > > easier to further enhance it. We move sysctl_unprivileged_bpf_disabled
> > > handling down to map_create() and bpf_prog_load(), two special commands
> > > in this regard.
> > >
> > > Also swap the order of checks, calling bpf_capable() only if
> > > sysctl_unprivileged_bpf_disabled is true, avoiding unnecessary audit
> > > messages.
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > >  kernel/bpf/syscall.c | 37 ++++++++++++++++++++++---------------
> > >  1 file changed, 22 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > index 14f39c1e573e..d5009fafe0f4 100644
> > > --- a/kernel/bpf/syscall.c
> > > +++ b/kernel/bpf/syscall.c
> > > @@ -1132,6 +1132,17 @@ static int map_create(union bpf_attr *attr)
> > >       int f_flags;
> > >       int err;
> >
> > [..]
> >
> > > +     /* Intent here is for unprivileged_bpf_disabled to block key object
> > > +      * creation commands for unprivileged users; other actions depend
> > > +      * of fd availability and access to bpffs, so are dependent on
> > > +      * object creation success.  Capabilities are later verified for
> > > +      * operations such as load and map create, so even with unprivileged
> > > +      * BPF disabled, capability checks are still carried out for these
> > > +      * and other operations.
> > > +      */
> > > +     if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> > > +             return -EPERM;
> > > +
> >
> > Does it make sense to have something like unpriv_bpf_capable() to avoid
> > the copy-paste?
>
> It's a simple if condition used in exactly two places, so I had
> preference to keep permissions checks grouped together in the same
> block of code in respective MAP_CREATE and PROG_LOAD handlers. I can
> factor it out into a function, but I see little value in that, tbh.

Ack, up to you. I'm more concerned about the comment tbh, not the check itself.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 01/10] bpf: move unprivileged checks into map_create() and bpf_prog_load()
  2023-05-03 22:33       ` Stanislav Fomichev
@ 2023-05-04 18:52         ` Andrii Nakryiko
  0 siblings, 0 replies; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-04 18:52 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Wed, May 3, 2023 at 3:33 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> On Wed, May 3, 2023 at 12:04 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, May 3, 2023 at 11:28 AM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > On 05/02, Andrii Nakryiko wrote:
> > > > Make each bpf() syscall command a bit more self-contained, making it
> > > > easier to further enhance it. We move sysctl_unprivileged_bpf_disabled
> > > > handling down to map_create() and bpf_prog_load(), two special commands
> > > > in this regard.
> > > >
> > > > Also swap the order of checks, calling bpf_capable() only if
> > > > sysctl_unprivileged_bpf_disabled is true, avoiding unnecessary audit
> > > > messages.
> > > >
> > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > ---
> > > >  kernel/bpf/syscall.c | 37 ++++++++++++++++++++++---------------
> > > >  1 file changed, 22 insertions(+), 15 deletions(-)
> > > >
> > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > index 14f39c1e573e..d5009fafe0f4 100644
> > > > --- a/kernel/bpf/syscall.c
> > > > +++ b/kernel/bpf/syscall.c
> > > > @@ -1132,6 +1132,17 @@ static int map_create(union bpf_attr *attr)
> > > >       int f_flags;
> > > >       int err;
> > >
> > > [..]
> > >
> > > > +     /* Intent here is for unprivileged_bpf_disabled to block key object
> > > > +      * creation commands for unprivileged users; other actions depend
> > > > +      * of fd availability and access to bpffs, so are dependent on
> > > > +      * object creation success.  Capabilities are later verified for
> > > > +      * operations such as load and map create, so even with unprivileged
> > > > +      * BPF disabled, capability checks are still carried out for these
> > > > +      * and other operations.
> > > > +      */
> > > > +     if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> > > > +             return -EPERM;
> > > > +
> > >
> > > Does it make sense to have something like unpriv_bpf_capable() to avoid
> > > the copy-paste?
> >
> > It's a simple if condition used in exactly two places, so I had
> > preference to keep permissions checks grouped together in the same
> > block of code in respective MAP_CREATE and PROG_LOAD handlers. I can
> > factor it out into a function, but I see little value in that, tbh.
>
> Ack, up to you. I'm more concerned about the comment tbh, not the check itself.

I'll trim those comments down and keep relevant map *or* prog parts.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 04/10] bpf: remember if bpf_map was unprivileged and use that consistently
  2023-05-02 23:06 ` [PATCH bpf-next 04/10] bpf: remember if bpf_map was unprivileged and use that consistently Andrii Nakryiko
@ 2023-05-04 20:05   ` Alexei Starovoitov
  2023-05-04 22:51     ` Andrii Nakryiko
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2023-05-04 20:05 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, martin.lau, kernel-team

On Tue, May 02, 2023 at 04:06:13PM -0700, Andrii Nakryiko wrote:
>  }
>  
> -static struct bpf_map *array_map_alloc(union bpf_attr *attr)
> +static u32 array_index_mask(u32 max_entries)
>  {
> -	bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
> -	int numa_node = bpf_map_attr_numa_node(attr);
> -	u32 elem_size, index_mask, max_entries;
> -	bool bypass_spec_v1 = bpf_bypass_spec_v1();

static inline bool bpf_bypass_spec_v1(void)
{
        return perfmon_capable();
}

> +		/* unprivileged is OK, but we still record if we had CAP_BPF */
> +		unpriv = !bpf_capable();

map->unpriv flag makes sense as !CAP_BPF,
but it's not equivalent to bpf_bypass_spec_v1.

>  		break;
>  	default:
>  		WARN(1, "unsupported map type %d", map_type);
>  		return -EPERM;
>  	}
>  
> +	/* ARRAY-like maps have special sizing provisions for mitigating Spectre v1 */
> +	if (unpriv) {
> +		switch (map_type) {
> +		case BPF_MAP_TYPE_ARRAY:
> +		case BPF_MAP_TYPE_PERCPU_ARRAY:
> +		case BPF_MAP_TYPE_PROG_ARRAY:
> +		case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
> +		case BPF_MAP_TYPE_CGROUP_ARRAY:
> +		case BPF_MAP_TYPE_ARRAY_OF_MAPS:
> +			err = bpf_array_adjust_for_spec_v1(attr);
> +			if (err)
> +				return err;
> +			break;
> +		}
> +	}
> +
>  	map = ops->map_alloc(attr);
>  	if (IS_ERR(map))
>  		return PTR_ERR(map);
>  	map->ops = ops;
>  	map->map_type = map_type;
> +	map->unpriv = unpriv;
>  
>  	err = bpf_obj_name_cpy(map->name, attr->map_name,
>  			       sizeof(attr->map_name));
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index ff4a8ab99f08..481aaf189183 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -8731,11 +8731,9 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
>  	}
>  
>  	if (!BPF_MAP_PTR(aux->map_ptr_state))
> -		bpf_map_ptr_store(aux, meta->map_ptr,
> -				  !meta->map_ptr->bypass_spec_v1);
> +		bpf_map_ptr_store(aux, meta->map_ptr, meta->map_ptr->unpriv);
>  	else if (BPF_MAP_PTR(aux->map_ptr_state) != meta->map_ptr)
> -		bpf_map_ptr_store(aux, BPF_MAP_PTR_POISON,
> -				  !meta->map_ptr->bypass_spec_v1);
> +		bpf_map_ptr_store(aux, BPF_MAP_PTR_POISON, meta->map_ptr->unpriv);
>  	return 0;
>  }
>  
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 06/10] bpf: keep BPF_PROG_LOAD permission checks clear of validations
  2023-05-02 23:06 ` [PATCH bpf-next 06/10] bpf: keep BPF_PROG_LOAD permission checks clear of validations Andrii Nakryiko
@ 2023-05-04 20:12   ` Alexei Starovoitov
  2023-05-04 22:51     ` Andrii Nakryiko
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2023-05-04 20:12 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, martin.lau, kernel-team

On Tue, May 02, 2023 at 04:06:15PM -0700, Andrii Nakryiko wrote:
> Move out flags validation and license checks out of the permission
> checks. They were intermingled, which makes subsequent changes harder.
> Clean this up: perform straightforward flag validation upfront, and
> fetch and check license later, right where we use it. Also consolidate
> capabilities check in one block, right after basic attribute sanity
> checks.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  kernel/bpf/syscall.c | 44 +++++++++++++++++++++-----------------------
>  1 file changed, 21 insertions(+), 23 deletions(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 2e5ca52c45c4..d960eb476754 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2573,18 +2573,6 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
>  	struct btf *attach_btf = NULL;
>  	int err;
>  	char license[128];
> -	bool is_gpl;
> -
> -	/* Intent here is for unprivileged_bpf_disabled to block key object
> -	 * creation commands for unprivileged users; other actions depend
> -	 * of fd availability and access to bpffs, so are dependent on
> -	 * object creation success.  Capabilities are later verified for
> -	 * operations such as load and map create, so even with unprivileged
> -	 * BPF disabled, capability checks are still carried out for these
> -	 * and other operations.
> -	 */
> -	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> -		return -EPERM;
>  
>  	if (CHECK_ATTR(BPF_PROG_LOAD))
>  		return -EINVAL;
> @@ -2598,21 +2586,22 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
>  				 BPF_F_XDP_DEV_BOUND_ONLY))
>  		return -EINVAL;
>  
> +	/* Intent here is for unprivileged_bpf_disabled to block key object
> +	 * creation commands for unprivileged users; other actions depend
> +	 * of fd availability and access to bpffs, so are dependent on
> +	 * object creation success.  Capabilities are later verified for
> +	 * operations such as load and map create, so even with unprivileged
> +	 * BPF disabled, capability checks are still carried out for these
> +	 * and other operations.
> +	 */
> +	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> +		return -EPERM;
> +

Since that part was done in patch 1. Move it into right spot in patch 1 right away?

>  	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
>  	    (attr->prog_flags & BPF_F_ANY_ALIGNMENT) &&
>  	    !bpf_capable())
>  		return -EPERM;
>  
> -	/* copy eBPF program license from user space */
> -	if (strncpy_from_bpfptr(license,
> -				make_bpfptr(attr->license, uattr.is_kernel),
> -				sizeof(license) - 1) < 0)
> -		return -EFAULT;
> -	license[sizeof(license) - 1] = 0;
> -
> -	/* eBPF programs must be GPL compatible to use GPL-ed functions */
> -	is_gpl = license_is_gpl_compatible(license);
> -
>  	if (attr->insn_cnt == 0 ||
>  	    attr->insn_cnt > (bpf_capable() ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
>  		return -E2BIG;
> @@ -2700,7 +2689,16 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
>  	prog->jited = 0;
>  
>  	atomic64_set(&prog->aux->refcnt, 1);
> -	prog->gpl_compatible = is_gpl ? 1 : 0;
> +
> +	/* copy eBPF program license from user space */
> +	if (strncpy_from_bpfptr(license,
> +				make_bpfptr(attr->license, uattr.is_kernel),
> +				sizeof(license) - 1) < 0)
> +		return -EFAULT;

This 'return' is incorrect. It misses lots of cleanup.
Should probably be 'goto free_prog_sec', but pls double check. 
We don't have tests to check error handling. Just lucky code review.

> +	license[sizeof(license) - 1] = 0;
> +
> +	/* eBPF programs must be GPL compatible to use GPL-ed functions */
> +	prog->gpl_compatible = license_is_gpl_compatible(license) ? 1 : 0;
>  
>  	if (bpf_prog_is_dev_bound(prog->aux)) {
>  		err = bpf_prog_dev_bound_init(prog, attr);
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 09/10] bpf: use recorded bpf_capable flag in JIT code
  2023-05-02 23:06 ` [PATCH bpf-next 09/10] bpf: use recorded bpf_capable flag in JIT code Andrii Nakryiko
@ 2023-05-04 22:09   ` Alexei Starovoitov
  2023-05-04 22:51     ` Andrii Nakryiko
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2023-05-04 22:09 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, martin.lau, kernel-team

On Tue, May 02, 2023 at 04:06:18PM -0700, Andrii Nakryiko wrote:
>  
> -int bpf_jit_charge_modmem(u32 size)
> +int bpf_jit_charge_modmem(u32 size, const struct bpf_prog *prog)
>  {
>  	if (atomic_long_add_return(size, &bpf_jit_current) > READ_ONCE(bpf_jit_limit)) {
> -		if (!bpf_capable()) {
> -			atomic_long_sub(size, &bpf_jit_current);
> -			return -EPERM;
> -		}
> +		if (prog ? prog->aux->bpf_capable : bpf_capable())
> +			return 0;

I would drop this patch.
It still has to fall back to bpf_capable for trampolines and
its 'help' to cap_bpf is minimal. That limit on all practical systems is huge.
It won't have any effect for your future follow ups for cap_bpf in containers.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 10/10] bpf: consistenly use program's recorded capabilities in BPF verifier
  2023-05-02 23:06 ` [PATCH bpf-next 10/10] bpf: consistenly use program's recorded capabilities in BPF verifier Andrii Nakryiko
@ 2023-05-04 22:20   ` Alexei Starovoitov
  2023-05-04 22:51     ` Andrii Nakryiko
  2023-05-11 16:21   ` Alexei Starovoitov
  1 sibling, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2023-05-04 22:20 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, martin.lau, kernel-team

On Tue, May 02, 2023 at 04:06:19PM -0700, Andrii Nakryiko wrote:
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 4d057d39c286..c0d60da7e0e0 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -661,7 +661,7 @@ static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
>  void bpf_prog_kallsyms_add(struct bpf_prog *fp)
>  {
>  	if (!bpf_prog_kallsyms_candidate(fp) ||
> -	    !bpf_capable())
> +	    !fp->aux->bpf_capable)
>  		return;

Looking at this bit made me worry about classic bpf.
bpf_prog_alloc_no_stats() zeros all fields include aux->bpf_capable.
And loading of classic progs doesn't go through bpf_check().
So fp->aux->bpf_capable will stay 'false' even when root loads cBPF.
It doesn't matter here, since bpf_prog_kallsyms_candidate() will return false
for cBPF.

Maybe we should init aux->bpf_capable in bpf_prog_alloc_no_stats()
to stay consistent between cBPF and eBPF ?
It probably has no effect, but anyone looking at crash dumps with drgn
will have a consistent view of aux->bpf_capable field.
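
Something like this minimal sketch, assuming it would sit next to the
existing aux setup in bpf_prog_alloc_no_stats() (exact placement is
just an assumption):

        /* record the loader's capability for cBPF and eBPF alike */
        fp->aux->bpf_capable = bpf_capable();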

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 04/10] bpf: remember if bpf_map was unprivileged and use that consistently
  2023-05-04 20:05   ` Alexei Starovoitov
@ 2023-05-04 22:51     ` Andrii Nakryiko
  2023-05-04 22:54       ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-04 22:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Thu, May 4, 2023 at 1:05 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, May 02, 2023 at 04:06:13PM -0700, Andrii Nakryiko wrote:
> >  }
> >
> > -static struct bpf_map *array_map_alloc(union bpf_attr *attr)
> > +static u32 array_index_mask(u32 max_entries)
> >  {
> > -     bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
> > -     int numa_node = bpf_map_attr_numa_node(attr);
> > -     u32 elem_size, index_mask, max_entries;
> > -     bool bypass_spec_v1 = bpf_bypass_spec_v1();
>
> static inline bool bpf_bypass_spec_v1(void)
> {
>         return perfmon_capable();
> }
>
> > +             /* unprivileged is OK, but we still record if we had CAP_BPF */
> > +             unpriv = !bpf_capable();
>
> map->unpriv flag makes sense as !CAP_BPF,
> but it's not equivalent to bpf_bypass_spec_v1.
>

argh, right, it's perfmon_capable() :(

what do you propose? do bpf_capable and perfmon_capable fields for
each map separately? or keep unpriv and add perfmon_capable
separately? or any better ideas?..


> >               break;
> >       default:
> >               WARN(1, "unsupported map type %d", map_type);
> >               return -EPERM;
> >       }
> >
> > +     /* ARRAY-like maps have special sizing provisions for mitigating Spectre v1 */
> > +     if (unpriv) {
> > +             switch (map_type) {
> > +             case BPF_MAP_TYPE_ARRAY:
> > +             case BPF_MAP_TYPE_PERCPU_ARRAY:
> > +             case BPF_MAP_TYPE_PROG_ARRAY:
> > +             case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
> > +             case BPF_MAP_TYPE_CGROUP_ARRAY:
> > +             case BPF_MAP_TYPE_ARRAY_OF_MAPS:
> > +                     err = bpf_array_adjust_for_spec_v1(attr);
> > +                     if (err)
> > +                             return err;
> > +                     break;
> > +             }
> > +     }
> > +
> >       map = ops->map_alloc(attr);
> >       if (IS_ERR(map))
> >               return PTR_ERR(map);
> >       map->ops = ops;
> >       map->map_type = map_type;
> > +     map->unpriv = unpriv;
> >
> >       err = bpf_obj_name_cpy(map->name, attr->map_name,
> >                              sizeof(attr->map_name));
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index ff4a8ab99f08..481aaf189183 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -8731,11 +8731,9 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
> >       }
> >
> >       if (!BPF_MAP_PTR(aux->map_ptr_state))
> > -             bpf_map_ptr_store(aux, meta->map_ptr,
> > -                               !meta->map_ptr->bypass_spec_v1);
> > +             bpf_map_ptr_store(aux, meta->map_ptr, meta->map_ptr->unpriv);
> >       else if (BPF_MAP_PTR(aux->map_ptr_state) != meta->map_ptr)
> > -             bpf_map_ptr_store(aux, BPF_MAP_PTR_POISON,
> > -                               !meta->map_ptr->bypass_spec_v1);
> > +             bpf_map_ptr_store(aux, BPF_MAP_PTR_POISON, meta->map_ptr->unpriv);
> >       return 0;
> >  }
> >
> > --
> > 2.34.1
> >

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 06/10] bpf: keep BPF_PROG_LOAD permission checks clear of validations
  2023-05-04 20:12   ` Alexei Starovoitov
@ 2023-05-04 22:51     ` Andrii Nakryiko
  0 siblings, 0 replies; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-04 22:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Thu, May 4, 2023 at 1:13 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, May 02, 2023 at 04:06:15PM -0700, Andrii Nakryiko wrote:
> > Move out flags validation and license checks out of the permission
> > checks. They were intermingled, which makes subsequent changes harder.
> > Clean this up: perform straightforward flag validation upfront, and
> > fetch and check license later, right where we use it. Also consolidate
> > capabilities check in one block, right after basic attribute sanity
> > checks.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  kernel/bpf/syscall.c | 44 +++++++++++++++++++++-----------------------
> >  1 file changed, 21 insertions(+), 23 deletions(-)
> >
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 2e5ca52c45c4..d960eb476754 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -2573,18 +2573,6 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
> >       struct btf *attach_btf = NULL;
> >       int err;
> >       char license[128];
> > -     bool is_gpl;
> > -
> > -     /* Intent here is for unprivileged_bpf_disabled to block key object
> > -      * creation commands for unprivileged users; other actions depend
> > -      * of fd availability and access to bpffs, so are dependent on
> > -      * object creation success.  Capabilities are later verified for
> > -      * operations such as load and map create, so even with unprivileged
> > -      * BPF disabled, capability checks are still carried out for these
> > -      * and other operations.
> > -      */
> > -     if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> > -             return -EPERM;
> >
> >       if (CHECK_ATTR(BPF_PROG_LOAD))
> >               return -EINVAL;
> > @@ -2598,21 +2586,22 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
> >                                BPF_F_XDP_DEV_BOUND_ONLY))
> >               return -EINVAL;
> >
> > +     /* Intent here is for unprivileged_bpf_disabled to block key object
> > +      * creation commands for unprivileged users; other actions depend
> > +      * of fd availability and access to bpffs, so are dependent on
> > +      * object creation success.  Capabilities are later verified for
> > +      * operations such as load and map create, so even with unprivileged
> > +      * BPF disabled, capability checks are still carried out for these
> > +      * and other operations.
> > +      */
> > +     if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> > +             return -EPERM;
> > +
>
> Since that part was done in patch 1. Move it into right spot in patch 1 right away?

fair enough, will do it right away in patch 1

>
> >       if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
> >           (attr->prog_flags & BPF_F_ANY_ALIGNMENT) &&
> >           !bpf_capable())
> >               return -EPERM;
> >
> > -     /* copy eBPF program license from user space */
> > -     if (strncpy_from_bpfptr(license,
> > -                             make_bpfptr(attr->license, uattr.is_kernel),
> > -                             sizeof(license) - 1) < 0)
> > -             return -EFAULT;
> > -     license[sizeof(license) - 1] = 0;
> > -
> > -     /* eBPF programs must be GPL compatible to use GPL-ed functions */
> > -     is_gpl = license_is_gpl_compatible(license);
> > -
> >       if (attr->insn_cnt == 0 ||
> >           attr->insn_cnt > (bpf_capable() ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
> >               return -E2BIG;
> > @@ -2700,7 +2689,16 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
> >       prog->jited = 0;
> >
> >       atomic64_set(&prog->aux->refcnt, 1);
> > -     prog->gpl_compatible = is_gpl ? 1 : 0;
> > +
> > +     /* copy eBPF program license from user space */
> > +     if (strncpy_from_bpfptr(license,
> > +                             make_bpfptr(attr->license, uattr.is_kernel),
> > +                             sizeof(license) - 1) < 0)
> > +             return -EFAULT;
>
> This 'return' is incorrect. It misses lots of cleanup.
> Should probably be 'goto free_prog_sec', but pls double check.
> We don't have tests to check error handling. Just lucky code review.
>

yep, lucky indeed. My bad, I should have adjusted it to `goto free_prog_sec;`.


> > +     license[sizeof(license) - 1] = 0;
> > +
> > +     /* eBPF programs must be GPL compatible to use GPL-ed functions */
> > +     prog->gpl_compatible = license_is_gpl_compatible(license) ? 1 : 0;
> >
> >       if (bpf_prog_is_dev_bound(prog->aux)) {
> >               err = bpf_prog_dev_bound_init(prog, attr);
> > --
> > 2.34.1
> >

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 09/10] bpf: use recorded bpf_capable flag in JIT code
  2023-05-04 22:09   ` Alexei Starovoitov
@ 2023-05-04 22:51     ` Andrii Nakryiko
  0 siblings, 0 replies; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-04 22:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Thu, May 4, 2023 at 3:09 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, May 02, 2023 at 04:06:18PM -0700, Andrii Nakryiko wrote:
> >
> > -int bpf_jit_charge_modmem(u32 size)
> > +int bpf_jit_charge_modmem(u32 size, const struct bpf_prog *prog)
> >  {
> >       if (atomic_long_add_return(size, &bpf_jit_current) > READ_ONCE(bpf_jit_limit)) {
> > -             if (!bpf_capable()) {
> > -                     atomic_long_sub(size, &bpf_jit_current);
> > -                     return -EPERM;
> > -             }
> > +             if (prog ? prog->aux->bpf_capable : bpf_capable())
> > +                     return 0;
>
> I would drop this patch.
> It still has to fall back to bpf_capable for trampolines and
> its 'help' to cap_bpf is minimal. That limit on all practical systems is huge.
> It won't have any effect for your future follow ups for cap_bpf in containers.

fair enough, will drop

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 10/10] bpf: consistenly use program's recorded capabilities in BPF verifier
  2023-05-04 22:20   ` Alexei Starovoitov
@ 2023-05-04 22:51     ` Andrii Nakryiko
  2023-05-05 19:08       ` Andrii Nakryiko
  0 siblings, 1 reply; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-04 22:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Thu, May 4, 2023 at 3:20 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, May 02, 2023 at 04:06:19PM -0700, Andrii Nakryiko wrote:
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 4d057d39c286..c0d60da7e0e0 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -661,7 +661,7 @@ static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
> >  void bpf_prog_kallsyms_add(struct bpf_prog *fp)
> >  {
> >       if (!bpf_prog_kallsyms_candidate(fp) ||
> > -         !bpf_capable())
> > +         !fp->aux->bpf_capable)
> >               return;
>
> Looking at this bit made me worry about classic bpf.
> bpf_prog_alloc_no_stats() zeros all fields include aux->bpf_capable.
> And loading of classic progs doesn't go through bpf_check().
> So fp->aux->bpf_capable will stay 'false' even when root loads cBPF.
> It doesn't matter here, since bpf_prog_kallsyms_candidate() will return false
> for cBPF.
>
> Maybe we should init aux->bpf_capable in bpf_prog_alloc_no_stats()
> to stay consistent between cBPF and eBPF ?
> It probably has no effect, but anyone looking at crash dumps with drgn
> will have a consistent view of aux->bpf_capable field.

classic BPF predates my involvement with BPF, so I didn't even think
about that. I'll check and make sure we do initialize aux->bpf_capable
for that

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 04/10] bpf: remember if bpf_map was unprivileged and use that consistently
  2023-05-04 22:51     ` Andrii Nakryiko
@ 2023-05-04 22:54       ` Alexei Starovoitov
  2023-05-04 23:06         ` Andrii Nakryiko
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2023-05-04 22:54 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Thu, May 04, 2023 at 03:51:16PM -0700, Andrii Nakryiko wrote:
> On Thu, May 4, 2023 at 1:05 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, May 02, 2023 at 04:06:13PM -0700, Andrii Nakryiko wrote:
> > >  }
> > >
> > > -static struct bpf_map *array_map_alloc(union bpf_attr *attr)
> > > +static u32 array_index_mask(u32 max_entries)
> > >  {
> > > -     bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
> > > -     int numa_node = bpf_map_attr_numa_node(attr);
> > > -     u32 elem_size, index_mask, max_entries;
> > > -     bool bypass_spec_v1 = bpf_bypass_spec_v1();
> >
> > static inline bool bpf_bypass_spec_v1(void)
> > {
> >         return perfmon_capable();
> > }
> >
> > > +             /* unprivileged is OK, but we still record if we had CAP_BPF */
> > > +             unpriv = !bpf_capable();
> >
> > map->unpriv flag makes sense as !CAP_BPF,
> > but it's not equivalent to bpf_bypass_spec_v1.
> >
> 
> argh, right, it's perfmon_capable() :(
> 
> what do you propose? do bpf_capable and perfmon_capable fields for
> each map separately? or keep unpriv and add perfmon_capable
> separately? or any better ideas?..

Instead of map->unpriv I'd add map->bpf_capable and map->perfmon_capable
just like we'll be doing to progs.
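
Roughly this shape, mirroring the prog-side fields (a sketch only;
exact placement in struct bpf_map and the comments are assumptions):

        /* in struct bpf_map */
        bool bpf_capable;       /* creator had CAP_BPF / CAP_SYS_ADMIN */
        bool perfmon_capable;   /* creator had CAP_PERFMON / CAP_SYS_ADMIN */

        /* recorded once in map_create(): */
        map->bpf_capable = bpf_capable();
        map->perfmon_capable = perfmon_capable();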

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 04/10] bpf: remember if bpf_map was unprivileged and use that consistently
  2023-05-04 22:54       ` Alexei Starovoitov
@ 2023-05-04 23:06         ` Andrii Nakryiko
  0 siblings, 0 replies; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-04 23:06 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Thu, May 4, 2023 at 3:55 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, May 04, 2023 at 03:51:16PM -0700, Andrii Nakryiko wrote:
> > On Thu, May 4, 2023 at 1:05 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Tue, May 02, 2023 at 04:06:13PM -0700, Andrii Nakryiko wrote:
> > > >  }
> > > >
> > > > -static struct bpf_map *array_map_alloc(union bpf_attr *attr)
> > > > +static u32 array_index_mask(u32 max_entries)
> > > >  {
> > > > -     bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
> > > > -     int numa_node = bpf_map_attr_numa_node(attr);
> > > > -     u32 elem_size, index_mask, max_entries;
> > > > -     bool bypass_spec_v1 = bpf_bypass_spec_v1();
> > >
> > > static inline bool bpf_bypass_spec_v1(void)
> > > {
> > >         return perfmon_capable();
> > > }
> > >
> > > > +             /* unprivileged is OK, but we still record if we had CAP_BPF */
> > > > +             unpriv = !bpf_capable();
> > >
> > > map->unpriv flag makes sense as !CAP_BPF,
> > > but it's not equivalent to bpf_bypass_spec_v1.
> > >
> >
> > argh, right, it's perfmon_capable() :(
> >
> > what do you propose? do bpf_capable and perfmon_capable fields for
> > each map separately? or keep unpriv and add perfmon_capable
> > separately? or any better ideas?..
>
> Instead of map->unpriv I'd add map->bpf_capable and map->perfmon_capable
> just like we'll be doing to progs.

ok, sounds good!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 10/10] bpf: consistenly use program's recorded capabilities in BPF verifier
  2023-05-04 22:51     ` Andrii Nakryiko
@ 2023-05-05 19:08       ` Andrii Nakryiko
  2023-05-05 19:55         ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-05 19:08 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Thu, May 4, 2023 at 3:51 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, May 4, 2023 at 3:20 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, May 02, 2023 at 04:06:19PM -0700, Andrii Nakryiko wrote:
> > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > index 4d057d39c286..c0d60da7e0e0 100644
> > > --- a/kernel/bpf/core.c
> > > +++ b/kernel/bpf/core.c
> > > @@ -661,7 +661,7 @@ static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
> > >  void bpf_prog_kallsyms_add(struct bpf_prog *fp)
> > >  {
> > >       if (!bpf_prog_kallsyms_candidate(fp) ||
> > > -         !bpf_capable())
> > > +         !fp->aux->bpf_capable)
> > >               return;
> >
> > Looking at this bit made me worry about classic bpf.
> > bpf_prog_alloc_no_stats() zeros all fields include aux->bpf_capable.
> > And loading of classic progs doesn't go through bpf_check().
> > So fp->aux->bpf_capable will stay 'false' even when root loads cBPF.
> > It doesn't matter here, since bpf_prog_kallsyms_candidate() will return false
> > for cBPF.
> >
> > Maybe we should init aux->bpf_capable in bpf_prog_alloc_no_stats()
> > to stay consistent between cBPF and eBPF ?

I'd like to avoid doing this deep inside bpf_prog_alloc_no_stats() or
bpf_prog_alloc() because this decision about privileges will be later
based on some other factors besides CAP_BPF. So hard-coding
bpf_capable() here defeats the purpose.

I did look at classic BPF, there are three places in net/core/filter.c
where we allocated struct bpf_prog from  struct sock_fprog/struct
sock_fprog_kern: bpf_prog_create, bpf_prog_create_from_user,
__get_filter.

Would it be ok if I just hard-coded `prog->aux->bpf_capable =
bpf_capable();` (and same for perfmon) in those three functions? Or
leave them always false? Because taking a step back a bit, do we want
to allow privileged classic BPF programs at all? Maybe it's actually
good that we force those cBPF programs to be unprivileged?
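
For concreteness, a minimal sketch of that hard-coding in one of the
three spots (bpf_prog_create(); the other two would mirror it, and
whether we want it at all is exactly the question above):

        /* sketch only: record effective caps for cBPF-derived progs too */
        fp->aux->bpf_capable = bpf_capable();
        fp->aux->perfmon_capable = perfmon_capable();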

Right now it doesn't even matter, I think, because these privileges
are only checked during verification. But assuming that in the future
we might want to check that at runtime from helpers/kfuncs, probably
best to have a strategy here.

Tl;dr, I don't know what's the best approach to take, but just calling
into bpf_capable() deeply inside bpf_prog_alloc/bpf_prog_alloc_stats
makes subsequent work to implement trusted unpriv harder. Thoughts?

> > It probably has no effect, but anyone looking at crash dumps with drgn
> > will have a consistent view of aux->bpf_capable field.
>
> classic BPF predates my involvement with BPF, so I didn't even think
> about that. I'll check and make sure we do initialize aux->bpf_capable
> for that


* Re: [PATCH bpf-next 10/10] bpf: consistenly use program's recorded capabilities in BPF verifier
  2023-05-05 19:08       ` Andrii Nakryiko
@ 2023-05-05 19:55         ` Alexei Starovoitov
  0 siblings, 0 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2023-05-05 19:55 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Fri, May 05, 2023 at 12:08:21PM -0700, Andrii Nakryiko wrote:
> On Thu, May 4, 2023 at 3:51 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, May 4, 2023 at 3:20 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Tue, May 02, 2023 at 04:06:19PM -0700, Andrii Nakryiko wrote:
> > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > > index 4d057d39c286..c0d60da7e0e0 100644
> > > > --- a/kernel/bpf/core.c
> > > > +++ b/kernel/bpf/core.c
> > > > @@ -661,7 +661,7 @@ static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
> > > >  void bpf_prog_kallsyms_add(struct bpf_prog *fp)
> > > >  {
> > > >       if (!bpf_prog_kallsyms_candidate(fp) ||
> > > > -         !bpf_capable())
> > > > +         !fp->aux->bpf_capable)
> > > >               return;
> > >
> > > Looking at this bit made me worry about classic bpf.
> > > > bpf_prog_alloc_no_stats() zeros all fields, including aux->bpf_capable.
> > > And loading of classic progs doesn't go through bpf_check().
> > > So fp->aux->bpf_capable will stay 'false' even when root loads cBPF.
> > > It doesn't matter here, since bpf_prog_kallsyms_candidate() will return false
> > > for cBPF.
> > >
> > > Maybe we should init aux->bpf_capable in bpf_prog_alloc_no_stats()
> > > to stay consistent between cBPF and eBPF ?
> 
> I'd like to avoid doing this deep inside bpf_prog_alloc_no_stats() or
> bpf_prog_alloc(), because this decision about privileges will later be
> based on other factors besides CAP_BPF. So hard-coding
> bpf_capable() there defeats the purpose.
> 
> I did look at classic BPF; there are three places in net/core/filter.c
> where we allocate struct bpf_prog from struct sock_fprog/struct
> sock_fprog_kern: bpf_prog_create, bpf_prog_create_from_user, and
> __get_filter.
> 
> Would it be ok if I just hard-coded `prog->aux->bpf_capable =
> bpf_capable();` (and the same for perfmon) in those three functions? Or
> should we leave them always false? Taking a step back a bit, do we want
> to allow privileged classic BPF programs at all? Maybe it's actually
> good that we force those cBPF programs to be unprivileged?

Indeed. cBPF is always allowed for unpriv.
In that sense the value of prog->aux->bpf_capable for cBPF is meaningless.
Let's keep the patch as-is then.


* Re: [PATCH bpf-next 10/10] bpf: consistenly use program's recorded capabilities in BPF verifier
  2023-05-02 23:06 ` [PATCH bpf-next 10/10] bpf: consistenly use program's recorded capabilities in BPF verifier Andrii Nakryiko
  2023-05-04 22:20   ` Alexei Starovoitov
@ 2023-05-11 16:21   ` Alexei Starovoitov
  2023-05-15 16:42     ` Andrii Nakryiko
  1 sibling, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2023-05-11 16:21 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Kernel Team

On Tue, May 2, 2023 at 4:09 PM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> @@ -18878,7 +18882,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
>         env->prog = *prog;
>         env->ops = bpf_verifier_ops[env->prog->type];
>         env->fd_array = make_bpfptr(attr->fd_array, uattr.is_kernel);
> -       is_priv = bpf_capable();
> +
> +       env->allow_ptr_leaks = bpf_allow_ptr_leaks(*prog);
> +       env->allow_uninit_stack = bpf_allow_uninit_stack(*prog);
> +       env->bypass_spec_v1 = bpf_bypass_spec_v1(*prog);
> +       env->bypass_spec_v4 = bpf_bypass_spec_v4(*prog);
> +       env->bpf_capable = is_priv = (*prog)->aux->bpf_capable;

Just remembered that moving all CAP* checks early
(before they are actually needed)
might be problematic.
See
https://lore.kernel.org/all/20230511142535.732324-10-cgzones@googlemail.com/

This patch set is reducing the number of cap* checks, which is
a good thing from an audit pov, but it calls them early, before the cap
is actually needed, and that part is misleading for audit.
I'm afraid we cannot do one big switch for all map types after bpf_capable.
The bpf_capable check for maps needs to be done on demand.
For progs we should do it on demand too.
socket_filter and cg_skb should proceed without cap* checks.
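
To make "on demand" concrete, a rough sketch of what the prog side
could look like in bpf_prog_load() (illustrative only; the exact set of
prog types and the placement of the check would follow the existing
logic there):

	/* sketch: only ask for CAP_BPF once we know the prog type
	 * actually requires it, so the capability check (and any
	 * audit/LSM activity behind it) happens on demand
	 */
	switch (attr->prog_type) {
	case BPF_PROG_TYPE_SOCKET_FILTER:
	case BPF_PROG_TYPE_CGROUP_SKB:
		/* loadable by unprivileged users */
		break;
	default:
		if (!bpf_capable())
			return -EPERM;
		break;
	}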


* Re: [PATCH bpf-next 10/10] bpf: consistenly use program's recorded capabilities in BPF verifier
  2023-05-11 16:21   ` Alexei Starovoitov
@ 2023-05-15 16:42     ` Andrii Nakryiko
  2023-05-15 18:38       ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Andrii Nakryiko @ 2023-05-15 16:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team

On Thu, May 11, 2023 at 9:21 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, May 2, 2023 at 4:09 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > @@ -18878,7 +18882,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
> >         env->prog = *prog;
> >         env->ops = bpf_verifier_ops[env->prog->type];
> >         env->fd_array = make_bpfptr(attr->fd_array, uattr.is_kernel);
> > -       is_priv = bpf_capable();
> > +
> > +       env->allow_ptr_leaks = bpf_allow_ptr_leaks(*prog);
> > +       env->allow_uninit_stack = bpf_allow_uninit_stack(*prog);
> > +       env->bypass_spec_v1 = bpf_bypass_spec_v1(*prog);
> > +       env->bypass_spec_v4 = bpf_bypass_spec_v4(*prog);
> > +       env->bpf_capable = is_priv = (*prog)->aux->bpf_capable;
>
> Just remembered that moving all CAP* checks early
> (before they are actually needed)
> might be problematic.
> See
> https://lore.kernel.org/all/20230511142535.732324-10-cgzones@googlemail.com/
>
> This patch set is reducing the number of cap* checks, which is
> a good thing from an audit pov, but it calls them early, before the cap
> is actually needed, and that part is misleading for audit.
> I'm afraid we cannot do one big switch for all map types after bpf_capable.
> The bpf_capable check for maps needs to be done on demand.
> For progs we should do it on demand too.
> socket_filter and cg_skb should proceed without cap* checks.

Ok, fair enough. With the BPF token approach this shouldn't be a big deal anyway.

Does patch #5 ("bpf: drop unnecessary bpf_capable() check in
BPF_MAP_FREEZE command") look good? I think it makes sense to land it
either way, given that no other map-related operation is privileged
once the user has the map FD. I'll send it separately.
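
For reference, that patch boils down to dropping the extra capability
check from the BPF_MAP_FREEZE handler in kernel/bpf/syscall.c, roughly
(surrounding lines abbreviated, so treat this as a sketch rather than
the exact hunk):

 	mutex_lock(&map->freeze_mutex);
 	...
-	if (!bpf_capable()) {
-		err = -EPERM;
-		goto err_put;
-	}
 	WRITE_ONCE(map->frozen, true);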


* Re: [PATCH bpf-next 10/10] bpf: consistenly use program's recorded capabilities in BPF verifier
  2023-05-15 16:42     ` Andrii Nakryiko
@ 2023-05-15 18:38       ` Alexei Starovoitov
  0 siblings, 0 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2023-05-15 18:38 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team

On Mon, May 15, 2023 at 9:42 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, May 11, 2023 at 9:21 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, May 2, 2023 at 4:09 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > @@ -18878,7 +18882,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
> > >         env->prog = *prog;
> > >         env->ops = bpf_verifier_ops[env->prog->type];
> > >         env->fd_array = make_bpfptr(attr->fd_array, uattr.is_kernel);
> > > -       is_priv = bpf_capable();
> > > +
> > > +       env->allow_ptr_leaks = bpf_allow_ptr_leaks(*prog);
> > > +       env->allow_uninit_stack = bpf_allow_uninit_stack(*prog);
> > > +       env->bypass_spec_v1 = bpf_bypass_spec_v1(*prog);
> > > +       env->bypass_spec_v4 = bpf_bypass_spec_v4(*prog);
> > > +       env->bpf_capable = is_priv = (*prog)->aux->bpf_capable;
> >
> > Just remembered that moving all CAP* checks early
> > (before they are actually needed)
> > might be problematic.
> > See
> > https://lore.kernel.org/all/20230511142535.732324-10-cgzones@googlemail.com/
> >
> > This patch set is reducing the number of cap* checks, which is
> > a good thing from an audit pov, but it calls them early, before the cap
> > is actually needed, and that part is misleading for audit.
> > I'm afraid we cannot do one big switch for all map types after bpf_capable.
> > The bpf_capable check for maps needs to be done on demand.
> > For progs we should do it on demand too.
> > socket_filter and cg_skb should proceed without cap* checks.
>
> Ok, fair enough. With the BPF token approach this shouldn't be a big deal anyway.
>
> Does patch #5 ("bpf: drop unnecessary bpf_capable() check in
> BPF_MAP_FREEZE command") look good? I think it makes sense to land it
> either way, given that no other map-related operation is privileged
> once the user has the map FD. I'll send it separately.

Yeah, I think the whole cleanup makes sense; we just need to make
sure we do on-demand cap checks. I hope the patches won't change much.


end of thread

Thread overview: 30+ messages
2023-05-02 23:06 [PATCH bpf-next 00/10] Centralize BPF permission checks Andrii Nakryiko
2023-05-02 23:06 ` [PATCH bpf-next 01/10] bpf: move unprivileged checks into map_create() and bpf_prog_load() Andrii Nakryiko
2023-05-03 18:28   ` Stanislav Fomichev
2023-05-03 19:04     ` Andrii Nakryiko
2023-05-03 22:33       ` Stanislav Fomichev
2023-05-04 18:52         ` Andrii Nakryiko
2023-05-02 23:06 ` [PATCH bpf-next 02/10] bpf: inline map creation logic in map_create() function Andrii Nakryiko
2023-05-02 23:06 ` [PATCH bpf-next 03/10] bpf: centralize permissions checks for all BPF map types Andrii Nakryiko
2023-05-02 23:06 ` [PATCH bpf-next 04/10] bpf: remember if bpf_map was unprivileged and use that consistently Andrii Nakryiko
2023-05-04 20:05   ` Alexei Starovoitov
2023-05-04 22:51     ` Andrii Nakryiko
2023-05-04 22:54       ` Alexei Starovoitov
2023-05-04 23:06         ` Andrii Nakryiko
2023-05-02 23:06 ` [PATCH bpf-next 05/10] bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command Andrii Nakryiko
2023-05-02 23:06 ` [PATCH bpf-next 06/10] bpf: keep BPF_PROG_LOAD permission checks clear of validations Andrii Nakryiko
2023-05-04 20:12   ` Alexei Starovoitov
2023-05-04 22:51     ` Andrii Nakryiko
2023-05-02 23:06 ` [PATCH bpf-next 07/10] bpf: record effective capabilities at BPF prog load time Andrii Nakryiko
2023-05-02 23:06 ` [PATCH bpf-next 08/10] bpf: use recorded BPF prog effective caps when fetching helper protos Andrii Nakryiko
2023-05-02 23:06 ` [PATCH bpf-next 09/10] bpf: use recorded bpf_capable flag in JIT code Andrii Nakryiko
2023-05-04 22:09   ` Alexei Starovoitov
2023-05-04 22:51     ` Andrii Nakryiko
2023-05-02 23:06 ` [PATCH bpf-next 10/10] bpf: consistenly use program's recorded capabilities in BPF verifier Andrii Nakryiko
2023-05-04 22:20   ` Alexei Starovoitov
2023-05-04 22:51     ` Andrii Nakryiko
2023-05-05 19:08       ` Andrii Nakryiko
2023-05-05 19:55         ` Alexei Starovoitov
2023-05-11 16:21   ` Alexei Starovoitov
2023-05-15 16:42     ` Andrii Nakryiko
2023-05-15 18:38       ` Alexei Starovoitov
