* [PATCH v5 bpf-next 0/3] Introduce CAP_BPF
@ 2020-05-08 21:53 Alexei Starovoitov
  2020-05-08 21:53 ` [PATCH v5 bpf-next 1/3] bpf, capability: " Alexei Starovoitov
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2020-05-08 21:53 UTC (permalink / raw)
  To: davem
  Cc: daniel, netdev, bpf, kernel-team, linux-security-module, acme,
	jamorris, jannh, kpsingh

From: Alexei Starovoitov <ast@kernel.org>

v4->v5:

Split the BPF operations that are allowed under CAP_SYS_ADMIN into a combination
of CAP_BPF, CAP_PERFMON and CAP_NET_ADMIN, and keep some of them under CAP_SYS_ADMIN.

The user process has to have
- CAP_BPF and CAP_PERFMON to load tracing programs.
- CAP_BPF and CAP_NET_ADMIN to load networking programs.
(or CAP_SYS_ADMIN for backward compatibility).
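
For readers: a process can check which of these capabilities it actually
holds before deciding which program types it may load. A minimal sketch
using the raw capget(2) syscall (no libcap dependency; the helper name and
the CAP_BPF fallback define are only for illustration):

```c
#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>

#ifndef CAP_BPF
#define CAP_BPF 39	/* added by this patchset */
#endif

/* Return 1 if 'cap' is in the current thread's effective set, 0 if not,
 * -1 on error.  Uses raw capget(2); the v3 format carries the capability
 * bits in two 32-bit words, so CAP_BPF (39) lives in data[1], bit 7.
 */
static int has_effective_cap(int cap)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0,	/* current thread */
	};
	struct __user_cap_data_struct data[2];

	memset(data, 0, sizeof(data));
	if (syscall(SYS_capget, &hdr, data))
		return -1;
	return !!(data[cap >> 5].effective & (1u << (cap & 31)));
}
```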

CAP_BPF achieves three main goals:
1. provides isolation to user space processes that drop CAP_SYS_ADMIN and switch to CAP_BPF.
   More on this below. This is the major difference vs v4 set back from Sep 2019.
2. makes networking BPF progs more secure, since CAP_BPF + CAP_NET_ADMIN
   prevents pointer leaks and arbitrary kernel memory access.
3. enables fuzzers to exercise all of the verifier logic, eventually finding
   bugs and making the BPF infrastructure more secure. Currently fuzzers run
   unprivileged; they will be able to run with CAP_BPF.

The patchset is a long-overdue follow-up from the last Plumbers conference.
Compared to what was discussed at LPC, the CAP_* checks at attach time are gone.
For tracing progs the CAP_SYS_ADMIN check was done at load time only; there was
no check at attach time. For networking and cgroup progs CAP_SYS_ADMIN was
required at load time and CAP_NET_ADMIN at attach time, but there are several
ways to bypass CAP_NET_ADMIN:
- if a networking prog uses tail_call, writing its FD into a prog_array will
  effectively attach it, yet bpf_map_update_elem is an unprivileged operation.
- an freplace prog loaded with CAP_SYS_ADMIN can replace a networking prog.

Consolidating all CAP checks at load time makes the security model similar to
the open() syscall: once the user has an FD, they can do everything with it;
read/write/poll don't re-check permissions. In the same way, once the
bpf_prog_load command returns an FD, the user can do everything with it
(including attaching, detaching, and bpf_test_run).
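
The open()-style model can be demonstrated with plain files: permission is
checked when the FD is created, and later operations on that FD do not
re-check it. A small userspace analogy (not part of the patch; the temp
file path is arbitrary):

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Permission is checked at open() time only: chmod'ing the file to 0000
 * afterwards does not revoke an already-open FD.  Returns 1 if the FD
 * kept working, 0 or -1 otherwise.
 */
static int fd_survives_revocation(const char *path)
{
	char c;
	int ok, fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	if (write(fd, "x", 1) != 1 || chmod(path, 0) != 0) {
		close(fd);
		unlink(path);
		return -1;
	}
	lseek(fd, 0, SEEK_SET);
	ok = (read(fd, &c, 1) == 1 && c == 'x');
	close(fd);
	unlink(path);
	return ok;
}
```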

An important design decision is to allow the ID->FD transition for
CAP_SYS_ADMIN only. This means that user processes can run with CAP_BPF and
CAP_NET_ADMIN and will not be able to affect each other unless they pass FDs
via scm_rights or via pinning in bpffs.
ID->FD is a mechanism for human override and introspection.
An admin can do 'sudo bpftool prog ...'. It's possible to enforce via LSM that
only the bpftool binary does the bpf syscall with CAP_SYS_ADMIN while the rest
of the user space processes do the bpf syscall with CAP_BPF, isolating the bpf
objects (progs, maps, links) owned by such processes from each other.

Another significant change from LPC is that the verifier checks are split into
allow_ptr_leaks and bpf_capable flags. allow_ptr_leaks disables the Spectre
defenses and allows pointer manipulations, while bpf_capable enables all modern
verifier features: bpf-to-bpf calls, BTF, bounded loops, indirect stack
access, dead code elimination, etc. All the goodness.
These flags are initialized as:
  env->allow_ptr_leaks = perfmon_capable();
  env->bpf_capable = bpf_capable();
That allows networking progs with CAP_BPF + CAP_NET_ADMIN to enjoy modern
verifier features while being more secure.

Some networking progs may need CAP_BPF + CAP_NET_ADMIN + CAP_PERFMON,
since subtracting pointers (like skb->data_end - skb->data) is a pointer leak,
but the verifier may get smarter in the future.
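
For illustration, the verifier-friendly way for a networking prog to stay
within packet bounds is to compare the data pointers rather than subtract
them into a scalar. A userspace C simulation of that pattern (the helper
name and packet layout are assumptions, not kernel code):

```c
#include <stddef.h>
#include <stdint.h>

/* Simulate the bounds-check pattern networking progs use on
 * skb->data / skb->data_end: compare the pointers instead of subtracting
 * them, so no pointer-derived scalar is ever produced.
 * Reads a little-endian u16 at 'off'; returns 0 on success, -1 if the
 * access would run past data_end.
 */
static int parse_u16(const uint8_t *data, const uint8_t *data_end,
		     size_t off, uint16_t *out)
{
	/* equivalent of the 'data + off + sizeof(u16) > data_end' check */
	if (data + off + sizeof(uint16_t) > data_end)
		return -1;
	*out = (uint16_t)(data[off] | (data[off + 1] << 8));
	return 0;
}
```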

Please see patches for more details.

Alexei Starovoitov (3):
  bpf, capability: Introduce CAP_BPF
  bpf: implement CAP_BPF
  selftests/bpf: use CAP_BPF and CAP_PERFMON in tests

 drivers/media/rc/bpf-lirc.c                   |  2 +-
 include/linux/bpf_verifier.h                  |  1 +
 include/linux/capability.h                    |  5 ++
 include/uapi/linux/capability.h               | 34 +++++++-
 kernel/bpf/arraymap.c                         |  2 +-
 kernel/bpf/bpf_struct_ops.c                   |  2 +-
 kernel/bpf/core.c                             |  4 +-
 kernel/bpf/cpumap.c                           |  2 +-
 kernel/bpf/hashtab.c                          |  4 +-
 kernel/bpf/helpers.c                          |  4 +-
 kernel/bpf/lpm_trie.c                         |  2 +-
 kernel/bpf/queue_stack_maps.c                 |  2 +-
 kernel/bpf/reuseport_array.c                  |  2 +-
 kernel/bpf/stackmap.c                         |  2 +-
 kernel/bpf/syscall.c                          | 87 ++++++++++++++-----
 kernel/bpf/verifier.c                         | 24 ++---
 kernel/trace/bpf_trace.c                      |  3 +
 net/core/bpf_sk_storage.c                     |  4 +-
 net/core/filter.c                             |  4 +-
 security/selinux/include/classmap.h           |  4 +-
 tools/testing/selftests/bpf/test_verifier.c   | 44 ++++++++--
 tools/testing/selftests/bpf/verifier/calls.c  | 16 ++--
 .../selftests/bpf/verifier/dead_code.c        | 10 +--
 23 files changed, 191 insertions(+), 73 deletions(-)

-- 
2.23.0



* [PATCH v5 bpf-next 1/3] bpf, capability: Introduce CAP_BPF
  2020-05-08 21:53 [PATCH v5 bpf-next 0/3] Introduce CAP_BPF Alexei Starovoitov
@ 2020-05-08 21:53 ` Alexei Starovoitov
  2020-05-08 21:53 ` [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF Alexei Starovoitov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2020-05-08 21:53 UTC (permalink / raw)
  To: davem
  Cc: daniel, netdev, bpf, kernel-team, linux-security-module, acme,
	jamorris, jannh, kpsingh

From: Alexei Starovoitov <ast@kernel.org>

Split the BPF operations that are allowed under CAP_SYS_ADMIN into a
combination of CAP_BPF, CAP_PERFMON and CAP_NET_ADMIN.
For backward compatibility include them in CAP_SYS_ADMIN as well.

The end result provides a simple safety model for applications that use BPF:
- to load tracing program types
  BPF_PROG_TYPE_{KPROBE, TRACEPOINT, PERF_EVENT, RAW_TRACEPOINT, etc}
  use CAP_BPF and CAP_PERFMON
- to load networking program types
  BPF_PROG_TYPE_{SCHED_CLS, XDP, CGROUP_SKB, SK_SKB, etc}
  use CAP_BPF and CAP_NET_ADMIN

There are a few exceptions to this rule:
- bpf_trace_printk() is allowed in networking programs, but since it uses
  the tracing mechanism, a networking program that calls this helper
  additionally needs CAP_PERFMON.
- BPF_F_ZERO_SEED flag for hash/lru map is allowed under CAP_SYS_ADMIN only
  to discourage production use.
- BPF HW offload is allowed under CAP_SYS_ADMIN.
- bpf_probe_write_user() is allowed under CAP_SYS_ADMIN only.

CAPs are not checked at attach/detach time with two exceptions:
- loading BPF_PROG_TYPE_CGROUP_SKB is allowed for unprivileged users,
  hence CAP_NET_ADMIN is required at attach time.
- flow_dissector detach doesn't take a prog FD,
  hence CAP_NET_ADMIN is required at detach time.

CAP_SYS_ADMIN is required to iterate BPF objects (progs, maps, links) via the
GET_NEXT_ID command and to convert them to file descriptors via the
GET_FD_BY_ID command.
This restriction guarantees that multiple tasks with CAP_BPF are not able to
affect each other. That leads to clean isolation of tasks. For example:
task A with CAP_BPF and CAP_NET_ADMIN loads and attaches a firewall via bpf_link.
task B with the same capabilities cannot detach that firewall unless
task A explicitly passes the link FD to task B via scm_rights or bpffs.
CAP_SYS_ADMIN can still detach/unload everything.

Two networking user apps with CAP_SYS_ADMIN and CAP_NET_ADMIN can
accidentally mess with each other's programs and maps.
Two networking user apps with CAP_NET_ADMIN and CAP_BPF cannot affect each other.

CAP_NET_ADMIN + CAP_BPF allows networking programs to access only packet data.
Such networking progs cannot access arbitrary kernel memory or leak pointers.

The bpftool, bpftrace and bcc tool binaries should NOT be installed with
CAP_BPF and CAP_PERFMON file capabilities, since unprivileged users would then
be able to read kernel secrets. Users who themselves hold these two
capabilities will still be able to use these tracing tools.

CAP_PERFMON is the least secure, since it allows kprobes and kernel memory access.
CAP_NET_ADMIN can stop network traffic via iproute2.
CAP_BPF is the safest from a security point of view and harmless on its own.

Having CAP_BPF and/or CAP_NET_ADMIN is not enough to write into an arbitrary
map when that map is used by a firewall-like bpf prog.
CAP_BPF allows many bpf prog_load commands in parallel. The verifier
may consume a large amount of memory and significantly slow down the system.

Existing unprivileged BPF operations are not affected.
In particular, unprivileged users are allowed to load socket_filter and cg_skb
program types and to create array, hash, prog_array, and map-in-map map types.
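
For example, creating one of those unprivileged map types needs nothing more
than the raw bpf(2) syscall. A hedged sketch (whether it actually succeeds
still depends on the kernel.unprivileged_bpf_disabled sysctl and any seccomp
policy in effect):

```c
#include <linux/bpf.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>

/* Try to create a plain BPF_MAP_TYPE_ARRAY map with the raw bpf(2)
 * syscall.  Returns the map FD on success, -1 with errno set otherwise
 * (e.g. EPERM when unprivileged bpf is disabled on this system).
 */
static int try_create_array_map(void)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_type = BPF_MAP_TYPE_ARRAY;
	attr.key_size = 4;	/* array maps use a u32 index as key */
	attr.value_size = 8;
	attr.max_entries = 1;

	return syscall(SYS_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}
```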

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/capability.h          |  5 +++++
 include/uapi/linux/capability.h     | 34 ++++++++++++++++++++++++++++-
 security/selinux/include/classmap.h |  4 ++--
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index 027d7e4a853b..b4345b38a6be 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -256,6 +256,11 @@ static inline bool perfmon_capable(void)
 	return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
 }
 
+static inline bool bpf_capable(void)
+{
+	return capable(CAP_BPF) || capable(CAP_SYS_ADMIN);
+}
+
 /* audit system wants to get cap info from files as well */
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
 
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index e58c9636741b..c7372180a0a9 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -274,6 +274,7 @@ struct vfs_ns_cap_data {
    arbitrary SCSI commands */
 /* Allow setting encryption key on loopback filesystem */
 /* Allow setting zone reclaim policy */
+/* Allow everything under CAP_BPF and CAP_PERFMON for backward compatibility */
 
 #define CAP_SYS_ADMIN        21
 
@@ -374,7 +375,38 @@ struct vfs_ns_cap_data {
 
 #define CAP_PERFMON		38
 
-#define CAP_LAST_CAP         CAP_PERFMON
+/*
+ * CAP_BPF allows the following BPF operations:
+ * - Creating all types of BPF maps
+ * - Advanced verifier features
+ *   - Indirect variable access
+ *   - Bounded loops
+ *   - BPF to BPF function calls
+ *   - Scalar precision tracking
+ *   - Larger complexity limits
+ *   - Dead code elimination
+ *   - And potentially other features
+ * - Loading BPF Type Format (BTF) data
+ * - Retrieve xlated and JITed code of BPF programs
+ * - Use bpf_spin_lock() helper
+ *
+ * CAP_PERFMON relaxes the verifier checks further:
+ * - BPF progs can make use of pointer-to-integer conversions
+ * - speculation attack hardening measures are bypassed
+ * - bpf_probe_read to read arbitrary kernel memory is allowed
+ * - bpf_trace_printk to print kernel memory is allowed
+ *
+ * CAP_SYS_ADMIN is required to use bpf_probe_write_user.
+ *
+ * CAP_SYS_ADMIN is required to iterate system wide loaded
+ * programs, maps, links, BTFs and convert their IDs to file descriptors.
+ *
+ * CAP_PERFMON and CAP_BPF are required to load tracing programs.
+ * CAP_NET_ADMIN and CAP_BPF are required to load networking programs.
+ */
+#define CAP_BPF			39
+
+#define CAP_LAST_CAP         CAP_BPF
 
 #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
 
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index d233ab3f1533..98e1513b608a 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -27,9 +27,9 @@
 	    "audit_control", "setfcap"
 
 #define COMMON_CAP2_PERMS  "mac_override", "mac_admin", "syslog", \
-		"wake_alarm", "block_suspend", "audit_read", "perfmon"
+		"wake_alarm", "block_suspend", "audit_read", "perfmon", "bpf"
 
-#if CAP_LAST_CAP > CAP_PERFMON
+#if CAP_LAST_CAP > CAP_BPF
 #error New capability defined, please update COMMON_CAP2_PERMS.
 #endif
 
-- 
2.23.0



* [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF
  2020-05-08 21:53 [PATCH v5 bpf-next 0/3] Introduce CAP_BPF Alexei Starovoitov
  2020-05-08 21:53 ` [PATCH v5 bpf-next 1/3] bpf, capability: " Alexei Starovoitov
@ 2020-05-08 21:53 ` Alexei Starovoitov
  2020-05-12  0:12   ` sdf
                     ` (3 more replies)
  2020-05-08 21:53 ` [PATCH v5 bpf-next 3/3] selftests/bpf: use CAP_BPF and CAP_PERFMON in tests Alexei Starovoitov
  2020-05-08 22:45 ` [PATCH v5 bpf-next 0/3] Introduce CAP_BPF Casey Schaufler
  3 siblings, 4 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2020-05-08 21:53 UTC (permalink / raw)
  To: davem
  Cc: daniel, netdev, bpf, kernel-team, linux-security-module, acme,
	jamorris, jannh, kpsingh

From: Alexei Starovoitov <ast@kernel.org>

Implement permissions as stated in uapi/linux/capability.h
In order to do that, the verifier allow_ptr_leaks flag is split
into allow_ptr_leaks and bpf_capable flags, and they are set as:
  env->allow_ptr_leaks = perfmon_capable();
  env->bpf_capable = bpf_capable();

bpf_capable enables bounded loops, variable stack access and other verifier features.
allow_ptr_leaks enables ptr leaks, ptr conversions, subtraction of pointers, etc.
It also disables side channel mitigations.

That means that a networking BPF program loaded with CAP_BPF + CAP_NET_ADMIN
will have speculative checks done by the verifier and other Spectre
mitigations applied. Such a networking BPF program will not be able to leak
kernel pointers.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 drivers/media/rc/bpf-lirc.c   |  2 +-
 include/linux/bpf_verifier.h  |  1 +
 kernel/bpf/arraymap.c         |  2 +-
 kernel/bpf/bpf_struct_ops.c   |  2 +-
 kernel/bpf/core.c             |  4 +-
 kernel/bpf/cpumap.c           |  2 +-
 kernel/bpf/hashtab.c          |  4 +-
 kernel/bpf/helpers.c          |  4 +-
 kernel/bpf/lpm_trie.c         |  2 +-
 kernel/bpf/queue_stack_maps.c |  2 +-
 kernel/bpf/reuseport_array.c  |  2 +-
 kernel/bpf/stackmap.c         |  2 +-
 kernel/bpf/syscall.c          | 87 ++++++++++++++++++++++++++---------
 kernel/bpf/verifier.c         | 24 +++++-----
 kernel/trace/bpf_trace.c      |  3 ++
 net/core/bpf_sk_storage.c     |  4 +-
 net/core/filter.c             |  4 +-
 17 files changed, 102 insertions(+), 49 deletions(-)

diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
index 069c42f22a8c..5bb144435c16 100644
--- a/drivers/media/rc/bpf-lirc.c
+++ b/drivers/media/rc/bpf-lirc.c
@@ -110,7 +110,7 @@ lirc_mode2_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_get_prandom_u32:
 		return &bpf_get_prandom_u32_proto;
 	case BPF_FUNC_trace_printk:
-		if (capable(CAP_SYS_ADMIN))
+		if (perfmon_capable())
 			return bpf_get_trace_printk_proto();
 		/* fall through */
 	default:
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 6abd5a778fcd..c32a7880fa62 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -375,6 +375,7 @@ struct bpf_verifier_env {
 	u32 used_map_cnt;		/* number of used maps */
 	u32 id_gen;			/* used to generate unique reg IDs */
 	bool allow_ptr_leaks;
+	bool bpf_capable;
 	bool seen_direct_write;
 	struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
 	const struct bpf_line_info *prev_linfo;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 95d77770353c..264a9254dc39 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -77,7 +77,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
 	int ret, numa_node = bpf_map_attr_numa_node(attr);
 	u32 elem_size, index_mask, max_entries;
-	bool unpriv = !capable(CAP_SYS_ADMIN);
+	bool unpriv = !bpf_capable();
 	u64 cost, array_size, mask64;
 	struct bpf_map_memory mem;
 	struct bpf_array *array;
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 26cb51f2db72..c6b0decaa46a 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -557,7 +557,7 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
 	struct bpf_map *map;
 	int err;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!bpf_capable())
 		return ERR_PTR(-EPERM);
 
 	st_ops = bpf_struct_ops_find_value(attr->btf_vmlinux_value_type_id);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 6aa11de67315..8f421dd0c4cf 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -646,7 +646,7 @@ static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
 void bpf_prog_kallsyms_add(struct bpf_prog *fp)
 {
 	if (!bpf_prog_kallsyms_candidate(fp) ||
-	    !capable(CAP_SYS_ADMIN))
+	    !bpf_capable())
 		return;
 
 	bpf_prog_ksym_set_addr(fp);
@@ -824,7 +824,7 @@ static int bpf_jit_charge_modmem(u32 pages)
 {
 	if (atomic_long_add_return(pages, &bpf_jit_current) >
 	    (bpf_jit_limit >> PAGE_SHIFT)) {
-		if (!capable(CAP_SYS_ADMIN)) {
+		if (!bpf_capable()) {
 			atomic_long_sub(pages, &bpf_jit_current);
 			return -EPERM;
 		}
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 3fe0b006d2d2..be8fdceefe55 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -85,7 +85,7 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	u64 cost;
 	int ret;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!bpf_capable())
 		return ERR_PTR(-EPERM);
 
 	/* check sanity of attributes */
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index d541c8486c95..b4b288a3c3c9 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -359,9 +359,9 @@ static int htab_map_alloc_check(union bpf_attr *attr)
 	BUILD_BUG_ON(offsetof(struct htab_elem, fnode.next) !=
 		     offsetof(struct htab_elem, hash_node.pprev));
 
-	if (lru && !capable(CAP_SYS_ADMIN))
+	if (lru && !bpf_capable())
 		/* LRU implementation is much complicated than other
-		 * maps.  Hence, limit to CAP_SYS_ADMIN for now.
+		 * maps.  Hence, limit to CAP_BPF.
 		 */
 		return -EPERM;
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 5c0290e0696e..886949fdcece 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -633,7 +633,7 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		break;
 	}
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!bpf_capable())
 		return NULL;
 
 	switch (func_id) {
@@ -642,6 +642,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_spin_unlock:
 		return &bpf_spin_unlock_proto;
 	case BPF_FUNC_trace_printk:
+		if (!perfmon_capable())
+			return NULL;
 		return bpf_get_trace_printk_proto();
 	case BPF_FUNC_jiffies64:
 		return &bpf_jiffies64_proto;
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 65c236cf341e..c8cc4e4cf98d 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -543,7 +543,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
 	u64 cost = sizeof(*trie), cost_per_node;
 	int ret;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!bpf_capable())
 		return ERR_PTR(-EPERM);
 
 	/* check sanity of attributes */
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index f697647ceb54..24d244daceff 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -45,7 +45,7 @@ static bool queue_stack_map_is_full(struct bpf_queue_stack *qs)
 /* Called from syscall */
 static int queue_stack_map_alloc_check(union bpf_attr *attr)
 {
-	if (!capable(CAP_SYS_ADMIN))
+	if (!bpf_capable())
 		return -EPERM;
 
 	/* check sanity of attributes */
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 01badd3eda7a..21cde24386db 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -154,7 +154,7 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
 	struct bpf_map_memory mem;
 	u64 array_size;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!bpf_capable())
 		return ERR_PTR(-EPERM);
 
 	array_size = sizeof(*array);
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index db76339fe358..7b8381ce40a0 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -93,7 +93,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 	u64 cost, n_buckets;
 	int err;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!bpf_capable())
 		return ERR_PTR(-EPERM);
 
 	if (attr->map_flags & ~STACK_CREATE_FLAG_MASK)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index bb1ab7da6103..5b8782c29cf9 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1534,7 +1534,7 @@ static int map_freeze(const union bpf_attr *attr)
 		err = -EBUSY;
 		goto err_put;
 	}
-	if (!capable(CAP_SYS_ADMIN)) {
+	if (!bpf_capable()) {
 		err = -EPERM;
 		goto err_put;
 	}
@@ -2009,6 +2009,53 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
 	}
 }
 
+static bool is_net_admin_prog_type(enum bpf_prog_type prog_type)
+{
+	switch (prog_type) {
+	case BPF_PROG_TYPE_SCHED_CLS:
+	case BPF_PROG_TYPE_SCHED_ACT:
+	case BPF_PROG_TYPE_XDP:
+	case BPF_PROG_TYPE_LWT_IN:
+	case BPF_PROG_TYPE_LWT_OUT:
+	case BPF_PROG_TYPE_LWT_XMIT:
+	case BPF_PROG_TYPE_LWT_SEG6LOCAL:
+	case BPF_PROG_TYPE_SK_SKB:
+	case BPF_PROG_TYPE_SK_MSG:
+	case BPF_PROG_TYPE_SK_REUSEPORT:
+	case BPF_PROG_TYPE_LIRC_MODE2:
+	case BPF_PROG_TYPE_FLOW_DISSECTOR:
+	case BPF_PROG_TYPE_CGROUP_DEVICE:
+	case BPF_PROG_TYPE_CGROUP_SOCK:
+	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
+	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
+	case BPF_PROG_TYPE_CGROUP_SYSCTL:
+	case BPF_PROG_TYPE_SOCK_OPS:
+	case BPF_PROG_TYPE_EXT: /* extends any prog */
+		return true;
+	case BPF_PROG_TYPE_CGROUP_SKB: /* always unpriv */
+	default:
+		return false;
+	}
+}
+
+static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
+{
+	switch (prog_type) {
+	case BPF_PROG_TYPE_KPROBE:
+	case BPF_PROG_TYPE_TRACEPOINT:
+	case BPF_PROG_TYPE_PERF_EVENT:
+	case BPF_PROG_TYPE_RAW_TRACEPOINT:
+	case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
+	case BPF_PROG_TYPE_TRACING:
+	case BPF_PROG_TYPE_LSM:
+	case BPF_PROG_TYPE_STRUCT_OPS: /* has access to struct sock */
+	case BPF_PROG_TYPE_EXT: /* extends any prog */
+		return true;
+	default:
+		return false;
+	}
+}
+
 /* last field in 'union bpf_attr' used by this command */
 #define	BPF_PROG_LOAD_LAST_FIELD attach_prog_fd
 
@@ -2031,7 +2078,7 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 
 	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
 	    (attr->prog_flags & BPF_F_ANY_ALIGNMENT) &&
-	    !capable(CAP_SYS_ADMIN))
+	    !bpf_capable())
 		return -EPERM;
 
 	/* copy eBPF program license from user space */
@@ -2044,11 +2091,16 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 	is_gpl = license_is_gpl_compatible(license);
 
 	if (attr->insn_cnt == 0 ||
-	    attr->insn_cnt > (capable(CAP_SYS_ADMIN) ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
+	    attr->insn_cnt > (bpf_capable() ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
 		return -E2BIG;
 	if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
 	    type != BPF_PROG_TYPE_CGROUP_SKB &&
-	    !capable(CAP_SYS_ADMIN))
+	    !bpf_capable())
+		return -EPERM;
+
+	if (is_net_admin_prog_type(type) && !capable(CAP_NET_ADMIN))
+		return -EPERM;
+	if (is_perfmon_prog_type(type) && !perfmon_capable())
 		return -EPERM;
 
 	bpf_prog_load_fixup_attach_type(attr);
@@ -2682,6 +2734,11 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
 	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
 		return attach_type == prog->expected_attach_type ? 0 : -EINVAL;
 	case BPF_PROG_TYPE_CGROUP_SKB:
+		if (!capable(CAP_NET_ADMIN))
+			/* cg-skb progs can be loaded by unpriv user.
+			 * check permissions at attach time.
+			 */
+			return -EPERM;
 		return prog->enforce_expected_attach_type &&
 			prog->expected_attach_type != attach_type ?
 			-EINVAL : 0;
@@ -2745,9 +2802,6 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	struct bpf_prog *prog;
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
-		return -EPERM;
-
 	if (CHECK_ATTR(BPF_PROG_ATTACH))
 		return -EINVAL;
 
@@ -2802,9 +2856,6 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 {
 	enum bpf_prog_type ptype;
 
-	if (!capable(CAP_NET_ADMIN))
-		return -EPERM;
-
 	if (CHECK_ATTR(BPF_PROG_DETACH))
 		return -EINVAL;
 
@@ -2817,6 +2868,8 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	case BPF_PROG_TYPE_LIRC_MODE2:
 		return lirc_prog_detach(attr);
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
+		if (!capable(CAP_NET_ADMIN))
+			return -EPERM;
 		return skb_flow_dissector_bpf_prog_detach(attr);
 	case BPF_PROG_TYPE_CGROUP_DEVICE:
 	case BPF_PROG_TYPE_CGROUP_SKB:
@@ -2880,8 +2933,6 @@ static int bpf_prog_test_run(const union bpf_attr *attr,
 	struct bpf_prog *prog;
 	int ret = -ENOTSUPP;
 
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
 	if (CHECK_ATTR(BPF_PROG_TEST_RUN))
 		return -EINVAL;
 
@@ -3163,7 +3214,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
 	info.run_time_ns = stats.nsecs;
 	info.run_cnt = stats.cnt;
 
-	if (!capable(CAP_SYS_ADMIN)) {
+	if (!bpf_capable()) {
 		info.jited_prog_len = 0;
 		info.xlated_prog_len = 0;
 		info.nr_jited_ksyms = 0;
@@ -3522,7 +3573,7 @@ static int bpf_btf_load(const union bpf_attr *attr)
 	if (CHECK_ATTR(BPF_BTF_LOAD))
 		return -EINVAL;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!bpf_capable())
 		return -EPERM;
 
 	return btf_new_fd(attr);
@@ -3736,9 +3787,6 @@ static int link_create(union bpf_attr *attr)
 	struct bpf_prog *prog;
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
-		return -EPERM;
-
 	if (CHECK_ATTR(BPF_LINK_CREATE))
 		return -EINVAL;
 
@@ -3784,9 +3832,6 @@ static int link_update(union bpf_attr *attr)
 	u32 flags;
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
-		return -EPERM;
-
 	if (CHECK_ATTR(BPF_LINK_UPDATE))
 		return -EINVAL;
 
@@ -3932,7 +3977,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	union bpf_attr attr;
 	int err;
 
-	if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
+	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
 		return -EPERM;
 
 	err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 70ad009577f8..a6893746cd87 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1293,7 +1293,7 @@ static void __mark_reg_unknown(const struct bpf_verifier_env *env,
 	reg->type = SCALAR_VALUE;
 	reg->var_off = tnum_unknown;
 	reg->frameno = 0;
-	reg->precise = env->subprog_cnt > 1 || !env->allow_ptr_leaks;
+	reg->precise = env->subprog_cnt > 1 || !env->bpf_capable;
 	__mark_reg_unbounded(reg);
 }
 
@@ -1425,8 +1425,9 @@ static int check_subprogs(struct bpf_verifier_env *env)
 			continue;
 		if (insn[i].src_reg != BPF_PSEUDO_CALL)
 			continue;
-		if (!env->allow_ptr_leaks) {
-			verbose(env, "function calls to other bpf functions are allowed for root only\n");
+		if (!env->bpf_capable) {
+			verbose(env,
+				"function calls to other bpf functions are allowed for CAP_BPF and CAP_SYS_ADMIN\n");
 			return -EPERM;
 		}
 		ret = add_subprog(env, i + insn[i].imm + 1);
@@ -1960,7 +1961,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno,
 	bool new_marks = false;
 	int i, err;
 
-	if (!env->allow_ptr_leaks)
+	if (!env->bpf_capable)
 		/* backtracking is root only for now */
 		return 0;
 
@@ -2208,7 +2209,7 @@ static int check_stack_write(struct bpf_verifier_env *env,
 		reg = &cur->regs[value_regno];
 
 	if (reg && size == BPF_REG_SIZE && register_is_const(reg) &&
-	    !register_is_null(reg) && env->allow_ptr_leaks) {
+	    !register_is_null(reg) && env->bpf_capable) {
 		if (dst_reg != BPF_REG_FP) {
 			/* The backtracking logic can only recognize explicit
 			 * stack slot address like [fp - 8]. Other spill of
@@ -3428,7 +3429,7 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
 		 * Spectre masking for stack ALU.
 		 * See also retrieve_ptr_limit().
 		 */
-		if (!env->allow_ptr_leaks) {
+		if (!env->bpf_capable) {
 			char tn_buf[48];
 
 			tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
@@ -7229,7 +7230,7 @@ static int push_insn(int t, int w, int e, struct bpf_verifier_env *env,
 		insn_stack[env->cfg.cur_stack++] = w;
 		return 1;
 	} else if ((insn_state[w] & 0xF0) == DISCOVERED) {
-		if (loop_ok && env->allow_ptr_leaks)
+		if (loop_ok && env->bpf_capable)
 			return 0;
 		verbose_linfo(env, t, "%d: ", t);
 		verbose_linfo(env, w, "%d: ", w);
@@ -8338,7 +8339,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 	if (env->max_states_per_insn < states_cnt)
 		env->max_states_per_insn = states_cnt;
 
-	if (!env->allow_ptr_leaks && states_cnt > BPF_COMPLEXITY_LIMIT_STATES)
+	if (!env->bpf_capable && states_cnt > BPF_COMPLEXITY_LIMIT_STATES)
 		return push_jmp_history(env, cur);
 
 	if (!add_new_state)
@@ -9998,7 +9999,7 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 			insn->code = BPF_JMP | BPF_TAIL_CALL;
 
 			aux = &env->insn_aux_data[i + delta];
-			if (env->allow_ptr_leaks && !expect_blinding &&
+			if (env->bpf_capable && !expect_blinding &&
 			    prog->jit_requested &&
 			    !bpf_map_key_poisoned(aux) &&
 			    !bpf_map_ptr_poisoned(aux) &&
@@ -10725,7 +10726,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 		env->insn_aux_data[i].orig_idx = i;
 	env->prog = *prog;
 	env->ops = bpf_verifier_ops[env->prog->type];
-	is_priv = capable(CAP_SYS_ADMIN);
+	is_priv = bpf_capable();
 
 	if (!btf_vmlinux && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
 		mutex_lock(&bpf_verifier_lock);
@@ -10766,7 +10767,8 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 	if (attr->prog_flags & BPF_F_ANY_ALIGNMENT)
 		env->strict_alignment = false;
 
-	env->allow_ptr_leaks = is_priv;
+	env->allow_ptr_leaks = perfmon_capable();
+	env->bpf_capable = bpf_capable();
 
 	if (is_priv)
 		env->test_state_freq = attr->prog_flags & BPF_F_TEST_STATE_FREQ;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index e875c95d3ced..58c51837fa18 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -315,6 +315,9 @@ static const struct bpf_func_proto bpf_probe_write_user_proto = {
 
 static const struct bpf_func_proto *bpf_get_probe_write_proto(void)
 {
+	if (!capable(CAP_SYS_ADMIN))
+		return NULL;
+
 	pr_warn_ratelimited("%s[%d] is installing a program with bpf_probe_write_user helper that may corrupt user memory!",
 			    current->comm, task_pid_nr(current));
 
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 756b63b6f7b3..d2c4d16dadba 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -625,7 +625,7 @@ static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr)
 	    !attr->btf_key_type_id || !attr->btf_value_type_id)
 		return -EINVAL;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!bpf_capable())
 		return -EPERM;
 
 	if (attr->value_size > MAX_VALUE_SIZE)
@@ -978,7 +978,7 @@ bpf_sk_storage_diag_alloc(const struct nlattr *nla_stgs)
 	/* bpf_sk_storage_map is currently limited to CAP_SYS_ADMIN as
 	 * the map_alloc_check() side also does.
 	 */
-	if (!capable(CAP_SYS_ADMIN))
+	if (!bpf_capable())
 		return ERR_PTR(-EPERM);
 
 	nla_for_each_nested(nla, nla_stgs, rem) {
diff --git a/net/core/filter.c b/net/core/filter.c
index dfaf5df13722..121a7ad404a8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6626,7 +6626,7 @@ static bool cg_skb_is_valid_access(int off, int size,
 		return false;
 	case bpf_ctx_range(struct __sk_buff, data):
 	case bpf_ctx_range(struct __sk_buff, data_end):
-		if (!capable(CAP_SYS_ADMIN))
+		if (!bpf_capable())
 			return false;
 		break;
 	}
@@ -6638,7 +6638,7 @@ static bool cg_skb_is_valid_access(int off, int size,
 		case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
 			break;
 		case bpf_ctx_range(struct __sk_buff, tstamp):
-			if (!capable(CAP_SYS_ADMIN))
+			if (!bpf_capable())
 				return false;
 			break;
 		default:
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 bpf-next 3/3] selftests/bpf: use CAP_BPF and CAP_PERFMON in tests
  2020-05-08 21:53 [PATCH v5 bpf-next 0/3] Introduce CAP_BPF Alexei Starovoitov
  2020-05-08 21:53 ` [PATCH v5 bpf-next 1/3] bpf, capability: " Alexei Starovoitov
  2020-05-08 21:53 ` [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF Alexei Starovoitov
@ 2020-05-08 21:53 ` Alexei Starovoitov
  2020-05-08 22:45 ` [PATCH v5 bpf-next 0/3] Introduce CAP_BPF Casey Schaufler
  3 siblings, 0 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2020-05-08 21:53 UTC (permalink / raw)
  To: davem
  Cc: daniel, netdev, bpf, kernel-team, linux-security-module, acme,
	jamorris, jannh, kpsingh

From: Alexei Starovoitov <ast@kernel.org>

Make all test_verifier tests exercise CAP_BPF and CAP_PERFMON

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/testing/selftests/bpf/test_verifier.c   | 44 +++++++++++++++----
 tools/testing/selftests/bpf/verifier/calls.c  | 16 +++----
 .../selftests/bpf/verifier/dead_code.c        | 10 ++---
 3 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 21a1ce219c1c..78a6bae56ea6 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -818,10 +818,18 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
 	}
 }
 
+struct libcap {
+	struct __user_cap_header_struct hdr;
+	struct __user_cap_data_struct data[2];
+};
+
 static int set_admin(bool admin)
 {
 	cap_t caps;
-	const cap_value_t cap_val = CAP_SYS_ADMIN;
+	/* need CAP_BPF, CAP_NET_ADMIN, CAP_PERFMON to load progs */
+	const cap_value_t cap_net_admin = CAP_NET_ADMIN;
+	const cap_value_t cap_sys_admin = CAP_SYS_ADMIN;
+	struct libcap *cap;
 	int ret = -1;
 
 	caps = cap_get_proc();
@@ -829,11 +837,26 @@ static int set_admin(bool admin)
 		perror("cap_get_proc");
 		return -1;
 	}
-	if (cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap_val,
+	cap = (struct libcap *)caps;
+	if (cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap_sys_admin, CAP_CLEAR)) {
+		perror("cap_set_flag clear admin");
+		goto out;
+	}
+	if (cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap_net_admin,
 				admin ? CAP_SET : CAP_CLEAR)) {
-		perror("cap_set_flag");
+		perror("cap_set_flag set_or_clear net");
 		goto out;
 	}
+	/* libcap is likely old and simply ignores CAP_BPF and CAP_PERFMON,
+	 * so update effective bits manually
+	 */
+	if (admin) {
+		cap->data[1].effective |= 1 << (38 /* CAP_PERFMON */ - 32);
+		cap->data[1].effective |= 1 << (39 /* CAP_BPF */ - 32);
+	} else {
+		cap->data[1].effective &= ~(1 << (38 - 32));
+		cap->data[1].effective &= ~(1 << (39 - 32));
+	}
 	if (cap_set_proc(caps)) {
 		perror("cap_set_proc");
 		goto out;
@@ -1067,9 +1090,11 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
 
 static bool is_admin(void)
 {
+	cap_flag_value_t net_priv = CAP_CLEAR;
+	bool perfmon_priv = false;
+	bool bpf_priv = false;
+	struct libcap *cap;
 	cap_t caps;
-	cap_flag_value_t sysadmin = CAP_CLEAR;
-	const cap_value_t cap_val = CAP_SYS_ADMIN;
 
 #ifdef CAP_IS_SUPPORTED
 	if (!CAP_IS_SUPPORTED(CAP_SETFCAP)) {
@@ -1082,11 +1107,14 @@ static bool is_admin(void)
 		perror("cap_get_proc");
 		return false;
 	}
-	if (cap_get_flag(caps, cap_val, CAP_EFFECTIVE, &sysadmin))
-		perror("cap_get_flag");
+	cap = (struct libcap *)caps;
+	bpf_priv = cap->data[1].effective & (1 << (39/* CAP_BPF */ - 32));
+	perfmon_priv = cap->data[1].effective & (1 << (38/* CAP_PERFMON */ - 32));
+	if (cap_get_flag(caps, CAP_NET_ADMIN, CAP_EFFECTIVE, &net_priv))
+		perror("cap_get_flag NET");
 	if (cap_free(caps))
 		perror("cap_free");
-	return (sysadmin == CAP_SET);
+	return bpf_priv && perfmon_priv && net_priv == CAP_SET;
 }
 
 static void get_unpriv_disabled()
diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c
index 2d752c4f8d9d..7629a0cebb9b 100644
--- a/tools/testing/selftests/bpf/verifier/calls.c
+++ b/tools/testing/selftests/bpf/verifier/calls.c
@@ -19,7 +19,7 @@
 	BPF_MOV64_IMM(BPF_REG_0, 2),
 	BPF_EXIT_INSN(),
 	},
-	.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
+	.errstr_unpriv = "function calls to other bpf functions are allowed for",
 	.result_unpriv = REJECT,
 	.result = ACCEPT,
 	.retval = 1,
@@ -315,7 +315,7 @@
 	BPF_MOV64_REG(BPF_REG_0, BPF_REG_1),
 	BPF_EXIT_INSN(),
 	},
-	.errstr_unpriv = "allowed for root only",
+	.errstr_unpriv = "allowed for",
 	.result_unpriv = REJECT,
 	.result = ACCEPT,
 	.retval = POINTER_VALUE,
@@ -346,7 +346,7 @@
 	BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_2),
 	BPF_EXIT_INSN(),
 	},
-	.errstr_unpriv = "allowed for root only",
+	.errstr_unpriv = "allowed for",
 	.result_unpriv = REJECT,
 	.result = ACCEPT,
 	.retval = TEST_DATA_LEN + TEST_DATA_LEN - ETH_HLEN - ETH_HLEN,
@@ -397,7 +397,7 @@
 	BPF_MOV64_IMM(BPF_REG_0, 1),
 	BPF_EXIT_INSN(),
 	},
-	.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
+	.errstr_unpriv = "function calls to other bpf functions are allowed for",
 	.fixup_map_hash_48b = { 3 },
 	.result_unpriv = REJECT,
 	.result = ACCEPT,
@@ -1064,7 +1064,7 @@
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-	.errstr_unpriv = "allowed for root only",
+	.errstr_unpriv = "allowed for",
 	.result_unpriv = REJECT,
 	.errstr = "R0 !read_ok",
 	.result = REJECT,
@@ -1977,7 +1977,7 @@
 	BPF_EXIT_INSN(),
 	},
 	.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
-	.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
+	.errstr_unpriv = "function calls to other bpf functions are allowed for",
 	.result_unpriv = REJECT,
 	.result = ACCEPT,
 },
@@ -2003,7 +2003,7 @@
 	BPF_EXIT_INSN(),
 	},
 	.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
-	.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
+	.errstr_unpriv = "function calls to other bpf functions are allowed for",
 	.errstr = "!read_ok",
 	.result = REJECT,
 },
@@ -2028,7 +2028,7 @@
 	BPF_EXIT_INSN(),
 	},
 	.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
-	.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
+	.errstr_unpriv = "function calls to other bpf functions are allowed for",
 	.errstr = "!read_ok",
 	.result = REJECT,
 },
diff --git a/tools/testing/selftests/bpf/verifier/dead_code.c b/tools/testing/selftests/bpf/verifier/dead_code.c
index 50a8a63be4ac..5cf361d8eb1c 100644
--- a/tools/testing/selftests/bpf/verifier/dead_code.c
+++ b/tools/testing/selftests/bpf/verifier/dead_code.c
@@ -85,7 +85,7 @@
 	BPF_MOV64_IMM(BPF_REG_0, 12),
 	BPF_EXIT_INSN(),
 	},
-	.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
+	.errstr_unpriv = "function calls to other bpf functions are allowed for",
 	.result_unpriv = REJECT,
 	.result = ACCEPT,
 	.retval = 7,
@@ -103,7 +103,7 @@
 	BPF_MOV64_IMM(BPF_REG_0, 12),
 	BPF_EXIT_INSN(),
 	},
-	.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
+	.errstr_unpriv = "function calls to other bpf functions are allowed for",
 	.result_unpriv = REJECT,
 	.result = ACCEPT,
 	.retval = 7,
@@ -121,7 +121,7 @@
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, -5),
 	BPF_EXIT_INSN(),
 	},
-	.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
+	.errstr_unpriv = "function calls to other bpf functions are allowed for",
 	.result_unpriv = REJECT,
 	.result = ACCEPT,
 	.retval = 7,
@@ -137,7 +137,7 @@
 	BPF_MOV64_REG(BPF_REG_0, BPF_REG_1),
 	BPF_EXIT_INSN(),
 	},
-	.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
+	.errstr_unpriv = "function calls to other bpf functions are allowed for",
 	.result_unpriv = REJECT,
 	.result = ACCEPT,
 	.retval = 2,
@@ -152,7 +152,7 @@
 	BPF_MOV64_REG(BPF_REG_0, BPF_REG_1),
 	BPF_EXIT_INSN(),
 	},
-	.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
+	.errstr_unpriv = "function calls to other bpf functions are allowed for",
 	.result_unpriv = REJECT,
 	.result = ACCEPT,
 	.retval = 2,
-- 
2.23.0



* Re: [PATCH v5 bpf-next 0/3] Introduce CAP_BPF
  2020-05-08 21:53 [PATCH v5 bpf-next 0/3] Introduce CAP_BPF Alexei Starovoitov
                   ` (2 preceding siblings ...)
  2020-05-08 21:53 ` [PATCH v5 bpf-next 3/3] selftests/bpf: use CAP_BPF and CAP_PERFMON in tests Alexei Starovoitov
@ 2020-05-08 22:45 ` Casey Schaufler
  2020-05-08 23:00   ` Alexei Starovoitov
  3 siblings, 1 reply; 21+ messages in thread
From: Casey Schaufler @ 2020-05-08 22:45 UTC (permalink / raw)
  To: Alexei Starovoitov, davem
  Cc: daniel, netdev, bpf, kernel-team, linux-security-module, acme,
	jamorris, jannh, kpsingh, Casey Schaufler

On 5/8/2020 2:53 PM, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
>
> v4->v5:
>
> Split BPF operations that are allowed under CAP_SYS_ADMIN into combination of
> CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN and keep some of them under CAP_SYS_ADMIN.
>
> The user process has to have
> - CAP_BPF and CAP_PERFMON to load tracing programs.
> - CAP_BPF and CAP_NET_ADMIN to load networking programs.
> (or CAP_SYS_ADMIN for backward compatibility).

Is there a case where CAP_BPF is useful in the absence of other capabilities?
I generally object to new capabilities in cases where existing capabilities
are already required.

>
> CAP_BPF solves three main goals:
> 1. provides isolation to user space processes that drop CAP_SYS_ADMIN and switch to CAP_BPF.
>    More on this below. This is the major difference vs v4 set back from Sep 2019.
> 2. makes networking BPF progs more secure, since CAP_BPF + CAP_NET_ADMIN
>    prevents pointer leaks and arbitrary kernel memory access.
> 3. enables fuzzers to exercise all of the verifier logic. Eventually finding bugs
>    and making BPF infra more secure. Currently fuzzers run in unpriv.
>    They will be able to run with CAP_BPF.
>
> The patchset is long overdue follow-up from the last plumbers conference.
> Comparing to what was discussed at LPC the CAP* checks at attach time are gone.
> For tracing progs the CAP_SYS_ADMIN check was done at load time only. There was
> no check at attach time. For networking and cgroup progs CAP_SYS_ADMIN was
> required at load time and CAP_NET_ADMIN at attach time, but there are several
> ways to bypass CAP_NET_ADMIN:
> - if networking prog is using tail_call writing FD into prog_array will
>   effectively attach it, but bpf_map_update_elem is an unprivileged operation.
> - freplace prog with CAP_SYS_ADMIN can replace networking prog
>
> Consolidating all CAP checks at load time makes security model similar to
> open() syscall. Once the user got an FD it can do everything with it.
> read/write/poll don't check permissions. The same way when bpf_prog_load
> command returns an FD the user can do everything (including attaching,
> detaching, and bpf_test_run).
>
> The important design decision is to allow ID->FD transition for
> CAP_SYS_ADMIN only. What it means that user processes can run
> with CAP_BPF and CAP_NET_ADMIN and they will not be able to affect each
> other unless they pass FDs via scm_rights or via pinning in bpffs.
> ID->FD is a mechanism for human override and introspection.
> An admin can do 'sudo bpftool prog ...'. It's possible to enforce via LSM that
> only bpftool binary does bpf syscall with CAP_SYS_ADMIN and the rest of user
> space processes do bpf syscall with CAP_BPF isolating bpf objects (progs, maps,
> links) that are owned by such processes from each other.
>
> Another significant change from LPC is that the verifier checks are split into
> allow_ptr_leaks and bpf_capable flags. The allow_ptr_leaks disables spectre
> defense and allows pointer manipulations while bpf_capable enables all modern
> verifier features like bpf-to-bpf calls, BTF, bounded loops, indirect stack
> access, dead code elimination, etc. All the goodness.
> These flags are initialized as:
>   env->allow_ptr_leaks = perfmon_capable();
>   env->bpf_capable = bpf_capable();
> That allows networking progs with CAP_BPF + CAP_NET_ADMIN enjoy modern
> verifier features while being more secure.
>
> Some networking progs may need CAP_BPF + CAP_NET_ADMIN + CAP_PERFMON,
> since subtracting pointers (like skb->data_end - skb->data) is a pointer leak,
> but the verifier may get smarter in the future.
>
> Please see patches for more details.
>
> Alexei Starovoitov (3):
>   bpf, capability: Introduce CAP_BPF
>   bpf: implement CAP_BPF
>   selftests/bpf: use CAP_BPF and CAP_PERFMON in tests
>
>  drivers/media/rc/bpf-lirc.c                   |  2 +-
>  include/linux/bpf_verifier.h                  |  1 +
>  include/linux/capability.h                    |  5 ++
>  include/uapi/linux/capability.h               | 34 +++++++-
>  kernel/bpf/arraymap.c                         |  2 +-
>  kernel/bpf/bpf_struct_ops.c                   |  2 +-
>  kernel/bpf/core.c                             |  4 +-
>  kernel/bpf/cpumap.c                           |  2 +-
>  kernel/bpf/hashtab.c                          |  4 +-
>  kernel/bpf/helpers.c                          |  4 +-
>  kernel/bpf/lpm_trie.c                         |  2 +-
>  kernel/bpf/queue_stack_maps.c                 |  2 +-
>  kernel/bpf/reuseport_array.c                  |  2 +-
>  kernel/bpf/stackmap.c                         |  2 +-
>  kernel/bpf/syscall.c                          | 87 ++++++++++++++-----
>  kernel/bpf/verifier.c                         | 24 ++---
>  kernel/trace/bpf_trace.c                      |  3 +
>  net/core/bpf_sk_storage.c                     |  4 +-
>  net/core/filter.c                             |  4 +-
>  security/selinux/include/classmap.h           |  4 +-
>  tools/testing/selftests/bpf/test_verifier.c   | 44 ++++++++--
>  tools/testing/selftests/bpf/verifier/calls.c  | 16 ++--
>  .../selftests/bpf/verifier/dead_code.c        | 10 +--
>  23 files changed, 191 insertions(+), 73 deletions(-)
>


* Re: [PATCH v5 bpf-next 0/3] Introduce CAP_BPF
  2020-05-08 22:45 ` [PATCH v5 bpf-next 0/3] Introduce CAP_BPF Casey Schaufler
@ 2020-05-08 23:00   ` Alexei Starovoitov
  0 siblings, 0 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2020-05-08 23:00 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: davem, daniel, netdev, bpf, kernel-team, linux-security-module,
	acme, jamorris, jannh, kpsingh

On Fri, May 08, 2020 at 03:45:36PM -0700, Casey Schaufler wrote:
> On 5/8/2020 2:53 PM, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > v4->v5:
> >
> > Split BPF operations that are allowed under CAP_SYS_ADMIN into combination of
> > CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN and keep some of them under CAP_SYS_ADMIN.
> >
> > The user process has to have
> > - CAP_BPF and CAP_PERFMON to load tracing programs.
> > - CAP_BPF and CAP_NET_ADMIN to load networking programs.
> > (or CAP_SYS_ADMIN for backward compatibility).
> 
> Is there a case where CAP_BPF is useful in the absence of other capabilities?
> I generally object to new capabilities in cases where existing capabilities
> are already required.

You mean beyond what is written about CAP_BPF in include/uapi/linux/capability.h in patch 1?
There are prog types that are neither tracing nor networking.
LIRC2 and cgroup-device, for example, are neither, but they were put under CAP_SYS_ADMIN + CAP_NET_ADMIN
because there was no CAP_BPF. This patch keeps them under CAP_BPF + CAP_NET_ADMIN for now.
Maybe that can be relaxed later. For sure future prog types won't have to deal with
such a binary decision.


* Re: [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF
  2020-05-08 21:53 ` [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF Alexei Starovoitov
@ 2020-05-12  0:12   ` sdf
  2020-05-12  2:36     ` Alexei Starovoitov
  2020-05-12 14:35   ` Daniel Borkmann
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: sdf @ 2020-05-12  0:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, netdev, bpf, kernel-team, linux-security-module,
	acme, jamorris, jannh, kpsingh

On 05/08, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
[..]
> @@ -3932,7 +3977,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr  
> __user *, uattr, unsigned int, siz
>   	union bpf_attr attr;
>   	int err;

> -	if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
> +	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
>   		return -EPERM;
This is awesome, thanks for reviving the effort!

One question I have about this particular snippet:
Does it make sense to drop bpf_capable checks for the operations
that work on a provided fd?

The use-case I have in mind is as follows:
* privileged (CAP_BPF) process loads the programs/maps and pins
   them at some known location
* unprivileged process opens up those pins and does the following:
   * prepares the maps (and will later on read them)
   * does SO_ATTACH_BPF/SO_ATTACH_REUSEPORT_EBPF which afaik don't
     require any capabilities

This essentially pushes some of the permission checks into the fs layer. So
whoever has a file descriptor (via unix sock or open) can do BPF operations
on the object that represents it.

Thoughts? Am I missing something important?


* Re: [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF
  2020-05-12  0:12   ` sdf
@ 2020-05-12  2:36     ` Alexei Starovoitov
  2020-05-12 12:50       ` Jordan Glover
  2020-05-12 15:54       ` sdf
  0 siblings, 2 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2020-05-12  2:36 UTC (permalink / raw)
  To: sdf
  Cc: davem, daniel, netdev, bpf, kernel-team, linux-security-module,
	acme, jamorris, jannh, kpsingh

On Mon, May 11, 2020 at 05:12:10PM -0700, sdf@google.com wrote:
> On 05/08, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@kernel.org>
> [..]
> > @@ -3932,7 +3977,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr
> > __user *, uattr, unsigned int, siz
> >   	union bpf_attr attr;
> >   	int err;
> 
> > -	if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
> > +	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> >   		return -EPERM;
> This is awesome, thanks for reviving the effort!
> 
> One question I have about this particular snippet:
> Does it make sense to drop bpf_capable checks for the operations
> that work on a provided fd?

The above snippet is for the case when the sysctl switches unpriv off.
It was a big hammer and it stays a big hammer.
I certainly would like to improve the situation, but I suspect
the folks who turn that sysctl knob on are simply paranoid about bpf
and no amount of reasoning would turn them around.

> The use-case I have in mind is as follows:
> * privileged (CAP_BPF) process loads the programs/maps and pins
>   them at some known location
> * unprivileged process opens up those pins and does the following:
>   * prepares the maps (and will later on read them)
>   * does SO_ATTACH_BPF/SO_ATTACH_REUSEPORT_EBPF which afaik don't
>     require any capabilities
> 
> This essentially pushes some of the permission checks into a fs layer. So
> whoever has a file descriptor (via unix sock or open) can do BPF operations
> on the object that represents it.

cap_bpf doesn't change things in that regard.
Two cases here:
sysctl_unprivileged_bpf_disabled==0:
  Unpriv can load socket_filter prog type and unpriv can attach it
  via SO_ATTACH_BPF/SO_ATTACH_REUSEPORT_EBPF.
sysctl_unprivileged_bpf_disabled==1:
  cap_sys_admin can load socket_filter and unpriv can attach it.

With addition of cap_bpf in the second case cap_bpf process can
load socket_filter too.
It doesn't mean that permissions are pushed into the fs layer.
I'm not sure that relaxing sysctl_unprivileged_bpf_disabled
will be well received.
Are you proposing to selectively allow certain bpf syscall commands
even when sysctl_unprivileged_bpf_disabled==1?
Like allow unpriv to do BPF_OBJ_GET to get an fd from bpffs?
And allow unpriv to do map_update?
It makes complete sense to me, but I'd like to argue about that
independently from this cap_bpf set.
We can relax that sysctl later.


* Re: [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF
  2020-05-12  2:36     ` Alexei Starovoitov
@ 2020-05-12 12:50       ` Jordan Glover
  2020-05-12 15:46         ` Alexei Starovoitov
  2020-05-12 15:54       ` sdf
  1 sibling, 1 reply; 21+ messages in thread
From: Jordan Glover @ 2020-05-12 12:50 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: sdf, davem, daniel, netdev, bpf, kernel-team,
	linux-security-module, acme, jamorris, jannh, kpsingh

On Tuesday, May 12, 2020 2:36 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Mon, May 11, 2020 at 05:12:10PM -0700, sdf@google.com wrote:
>
> > On 05/08, Alexei Starovoitov wrote:
> >
> > > From: Alexei Starovoitov <ast@kernel.org>
> > > [..]
> > > @@ -3932,7 +3977,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr
> > > __user *, uattr, unsigned int, siz
> > >   	union bpf_attr attr;
> > >   	int err;
> >
> > > -	if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
> > > +	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> > >   	return -EPERM;
> > This is awesome, thanks for reviving the effort!
> > >
> >
> > One question I have about this particular snippet:
> > Does it make sense to drop bpf_capable checks for the operations
> > that work on a provided fd?
>
> Above snippet is for the case when sysctl switches unpriv off.
> It was a big hammer and stays big hammer.
> I certainly would like to improve the situation, but I suspect
> the folks who turn that sysctl knob on are simply paranoid about bpf
> and no amount of reasoning would turn them around.
>

Without CAP_BPF, the sysctl was the only option to keep you safe from the flow
of bpf vulns. You didn't have to be paranoid about that.

Jordan


* Re: [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF
  2020-05-08 21:53 ` [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF Alexei Starovoitov
  2020-05-12  0:12   ` sdf
@ 2020-05-12 14:35   ` Daniel Borkmann
  2020-05-12 18:25     ` Alexei Starovoitov
  2020-05-12 15:05   ` Daniel Borkmann
  2020-05-12 20:27   ` Daniel Borkmann
  3 siblings, 1 reply; 21+ messages in thread
From: Daniel Borkmann @ 2020-05-12 14:35 UTC (permalink / raw)
  To: Alexei Starovoitov, davem
  Cc: netdev, bpf, kernel-team, linux-security-module, acme, jamorris,
	jannh, kpsingh

On 5/8/20 11:53 PM, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> Implement permissions as stated in uapi/linux/capability.h
> In order to do that the verifier allow_ptr_leaks flag is split
> into allow_ptr_leaks and bpf_capable flags and they are set as:
>    env->allow_ptr_leaks = perfmon_capable();
>    env->bpf_capable = bpf_capable();
> 
> bpf_capable enables bounded loops, variable stack access and other verifier features.
> allow_ptr_leaks enable ptr leaks, ptr conversions, subtraction of pointers, etc.
> It also disables side channel mitigations.
> 
> That means that the networking BPF program loaded with CAP_BPF + CAP_NET_ADMIN will
> have speculative checks done by the verifier and other spectre mitigation applied.
> Such networking BPF program will not be able to leak kernel pointers.

I don't quite follow this part in the code below yet, see my comments.

> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
[...]
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 6abd5a778fcd..c32a7880fa62 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -375,6 +375,7 @@ struct bpf_verifier_env {
>   	u32 used_map_cnt;		/* number of used maps */
>   	u32 id_gen;			/* used to generate unique reg IDs */
>   	bool allow_ptr_leaks;
> +	bool bpf_capable;
>   	bool seen_direct_write;
>   	struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
>   	const struct bpf_line_info *prev_linfo;
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index 95d77770353c..264a9254dc39 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -77,7 +77,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
>   	bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
>   	int ret, numa_node = bpf_map_attr_numa_node(attr);
>   	u32 elem_size, index_mask, max_entries;
> -	bool unpriv = !capable(CAP_SYS_ADMIN);
> +	bool unpriv = !bpf_capable();

So here progs loaded with CAP_BPF will have spectre mitigations bypassed, which
is the opposite of the statement above, no?

>   	u64 cost, array_size, mask64;
>   	struct bpf_map_memory mem;
>   	struct bpf_array *array;
[...]
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 6aa11de67315..8f421dd0c4cf 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -646,7 +646,7 @@ static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
>   void bpf_prog_kallsyms_add(struct bpf_prog *fp)
>   {
>   	if (!bpf_prog_kallsyms_candidate(fp) ||
> -	    !capable(CAP_SYS_ADMIN))
> +	    !bpf_capable())
>   		return;
>   
>   	bpf_prog_ksym_set_addr(fp);
> @@ -824,7 +824,7 @@ static int bpf_jit_charge_modmem(u32 pages)
>   {
>   	if (atomic_long_add_return(pages, &bpf_jit_current) >
>   	    (bpf_jit_limit >> PAGE_SHIFT)) {
> -		if (!capable(CAP_SYS_ADMIN)) {
> +		if (!bpf_capable()) {

Should there still be an upper charge on module mem for !CAP_SYS_ADMIN?

>   			atomic_long_sub(pages, &bpf_jit_current);
>   			return -EPERM;
>   		}
[...]
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 70ad009577f8..a6893746cd87 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
[...]
> @@ -3428,7 +3429,7 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
>   		 * Spectre masking for stack ALU.
>   		 * See also retrieve_ptr_limit().
>   		 */
> -		if (!env->allow_ptr_leaks) {
> +		if (!env->bpf_capable) {

This needs to stay on env->allow_ptr_leaks, the can_skip_alu_sanitation() does
check on env->allow_ptr_leaks as well, otherwise this breaks spectre mitigation
when masking alu.

>   			char tn_buf[48];
>   
>   			tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
> @@ -7229,7 +7230,7 @@ static int push_insn(int t, int w, int e, struct bpf_verifier_env *env,
>   		insn_stack[env->cfg.cur_stack++] = w;
>   		return 1;
>   	} else if ((insn_state[w] & 0xF0) == DISCOVERED) {
> -		if (loop_ok && env->allow_ptr_leaks)
> +		if (loop_ok && env->bpf_capable)
>   			return 0;
>   		verbose_linfo(env, t, "%d: ", t);
>   		verbose_linfo(env, w, "%d: ", w);
> @@ -8338,7 +8339,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
>   	if (env->max_states_per_insn < states_cnt)
>   		env->max_states_per_insn = states_cnt;
>   
> -	if (!env->allow_ptr_leaks && states_cnt > BPF_COMPLEXITY_LIMIT_STATES)
> +	if (!env->bpf_capable && states_cnt > BPF_COMPLEXITY_LIMIT_STATES)
>   		return push_jmp_history(env, cur);
>   
>   	if (!add_new_state)
> @@ -9998,7 +9999,7 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
>   			insn->code = BPF_JMP | BPF_TAIL_CALL;
>   
>   			aux = &env->insn_aux_data[i + delta];
> -			if (env->allow_ptr_leaks && !expect_blinding &&
> +			if (env->bpf_capable && !expect_blinding &&
>   			    prog->jit_requested &&
>   			    !bpf_map_key_poisoned(aux) &&
>   			    !bpf_map_ptr_poisoned(aux) &&
> @@ -10725,7 +10726,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
>   		env->insn_aux_data[i].orig_idx = i;
>   	env->prog = *prog;
>   	env->ops = bpf_verifier_ops[env->prog->type];
> -	is_priv = capable(CAP_SYS_ADMIN);
> +	is_priv = bpf_capable();
>   
>   	if (!btf_vmlinux && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
>   		mutex_lock(&bpf_verifier_lock);
> @@ -10766,7 +10767,8 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
>   	if (attr->prog_flags & BPF_F_ANY_ALIGNMENT)
>   		env->strict_alignment = false;
>   
> -	env->allow_ptr_leaks = is_priv;
> +	env->allow_ptr_leaks = perfmon_capable();
> +	env->bpf_capable = bpf_capable();
>   


* Re: [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF
  2020-05-08 21:53 ` [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF Alexei Starovoitov
  2020-05-12  0:12   ` sdf
  2020-05-12 14:35   ` Daniel Borkmann
@ 2020-05-12 15:05   ` Daniel Borkmann
  2020-05-12 18:29     ` Alexei Starovoitov
  2020-05-12 20:27   ` Daniel Borkmann
  3 siblings, 1 reply; 21+ messages in thread
From: Daniel Borkmann @ 2020-05-12 15:05 UTC (permalink / raw)
  To: Alexei Starovoitov, davem
  Cc: netdev, bpf, kernel-team, linux-security-module, acme, jamorris,
	jannh, kpsingh

On 5/8/20 11:53 PM, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> Implement permissions as stated in uapi/linux/capability.h
> In order to do that the verifier allow_ptr_leaks flag is split
> into allow_ptr_leaks and bpf_capable flags and they are set as:
>    env->allow_ptr_leaks = perfmon_capable();
>    env->bpf_capable = bpf_capable();

[...]
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 70ad009577f8..a6893746cd87 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1293,7 +1293,7 @@ static void __mark_reg_unknown(const struct bpf_verifier_env *env,
>   	reg->type = SCALAR_VALUE;
>   	reg->var_off = tnum_unknown;
>   	reg->frameno = 0;
> -	reg->precise = env->subprog_cnt > 1 || !env->allow_ptr_leaks;
> +	reg->precise = env->subprog_cnt > 1 || !env->bpf_capable;
>   	__mark_reg_unbounded(reg);
>   }
>   
> @@ -1425,8 +1425,9 @@ static int check_subprogs(struct bpf_verifier_env *env)
>   			continue;
>   		if (insn[i].src_reg != BPF_PSEUDO_CALL)
>   			continue;
> -		if (!env->allow_ptr_leaks) {
> -			verbose(env, "function calls to other bpf functions are allowed for root only\n");
> +		if (!env->bpf_capable) {
> +			verbose(env,
> +				"function calls to other bpf functions are allowed for CAP_BPF and CAP_SYS_ADMIN\n");
>   			return -EPERM;
>   		}
>   		ret = add_subprog(env, i + insn[i].imm + 1);
> @@ -1960,7 +1961,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno,
>   	bool new_marks = false;
>   	int i, err;
>   
> -	if (!env->allow_ptr_leaks)
> +	if (!env->bpf_capable)
>   		/* backtracking is root only for now */
>   		return 0;
>   
> @@ -2208,7 +2209,7 @@ static int check_stack_write(struct bpf_verifier_env *env,
>   		reg = &cur->regs[value_regno];
>   
>   	if (reg && size == BPF_REG_SIZE && register_is_const(reg) &&
> -	    !register_is_null(reg) && env->allow_ptr_leaks) {
> +	    !register_is_null(reg) && env->bpf_capable) {
>   		if (dst_reg != BPF_REG_FP) {
>   			/* The backtracking logic can only recognize explicit
>   			 * stack slot address like [fp - 8]. Other spill of
> @@ -3428,7 +3429,7 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
>   		 * Spectre masking for stack ALU.
>   		 * See also retrieve_ptr_limit().
>   		 */
> -		if (!env->allow_ptr_leaks) {
> +		if (!env->bpf_capable) {
>   			char tn_buf[48];
>   
>   			tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
> @@ -7229,7 +7230,7 @@ static int push_insn(int t, int w, int e, struct bpf_verifier_env *env,
>   		insn_stack[env->cfg.cur_stack++] = w;
>   		return 1;
>   	} else if ((insn_state[w] & 0xF0) == DISCOVERED) {
> -		if (loop_ok && env->allow_ptr_leaks)
> +		if (loop_ok && env->bpf_capable)
>   			return 0;
>   		verbose_linfo(env, t, "%d: ", t);
>   		verbose_linfo(env, w, "%d: ", w);
> @@ -8338,7 +8339,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
>   	if (env->max_states_per_insn < states_cnt)
>   		env->max_states_per_insn = states_cnt;
>   
> -	if (!env->allow_ptr_leaks && states_cnt > BPF_COMPLEXITY_LIMIT_STATES)
> +	if (!env->bpf_capable && states_cnt > BPF_COMPLEXITY_LIMIT_STATES)
>   		return push_jmp_history(env, cur);
>   
>   	if (!add_new_state)
> @@ -9998,7 +9999,7 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
>   			insn->code = BPF_JMP | BPF_TAIL_CALL;
>   
>   			aux = &env->insn_aux_data[i + delta];
> -			if (env->allow_ptr_leaks && !expect_blinding &&
> +			if (env->bpf_capable && !expect_blinding &&
>   			    prog->jit_requested &&
>   			    !bpf_map_key_poisoned(aux) &&
>   			    !bpf_map_ptr_poisoned(aux) &&
> @@ -10725,7 +10726,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
>   		env->insn_aux_data[i].orig_idx = i;
>   	env->prog = *prog;
>   	env->ops = bpf_verifier_ops[env->prog->type];
> -	is_priv = capable(CAP_SYS_ADMIN);
> +	is_priv = bpf_capable();
>   
>   	if (!btf_vmlinux && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
>   		mutex_lock(&bpf_verifier_lock);
> @@ -10766,7 +10767,8 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
>   	if (attr->prog_flags & BPF_F_ANY_ALIGNMENT)
>   		env->strict_alignment = false;
>   
> -	env->allow_ptr_leaks = is_priv;
> +	env->allow_ptr_leaks = perfmon_capable();
> +	env->bpf_capable = bpf_capable();

Probably more of a detail, but it feels weird to tie perfmon_capable() into the BPF
core and use it in various places there. I would rather make this a proper bpf_*
prefixed helper and add a more descriptive name (what does it have to do with perf
or monitoring directly?). For example, all the main functionality could be under
`bpf_base_capable()` and everything with potential to leak pointers or mem to user
space as `bpf_leak_capable()`. Then inside include/linux/capability.h this can still
resolve under the hood to something like:

static inline bool bpf_base_capable(void)
{
	return capable(CAP_BPF) || capable(CAP_SYS_ADMIN);
}

static inline bool bpf_leak_capable(void)
{
	return perfmon_capable();
}

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF
  2020-05-12 12:50       ` Jordan Glover
@ 2020-05-12 15:46         ` Alexei Starovoitov
  0 siblings, 0 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2020-05-12 15:46 UTC (permalink / raw)
  To: Jordan Glover
  Cc: sdf, davem, daniel, netdev, bpf, kernel-team,
	linux-security-module, acme, jamorris, jannh, kpsingh

On Tue, May 12, 2020 at 12:50:05PM +0000, Jordan Glover wrote:
> On Tuesday, May 12, 2020 2:36 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > On Mon, May 11, 2020 at 05:12:10PM -0700, sdf@google.com wrote:
> >
> > > On 05/08, Alexei Starovoitov wrote:
> > >
> > > > From: Alexei Starovoitov ast@kernel.org
> > > > [..]
> > > > @@ -3932,7 +3977,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr
> > > > __user *, uattr, unsigned int, siz
> > > > union bpf_attr attr;
> > > > int err;
> > >
> > > > -	if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
> > > > +	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> > > >   		return -EPERM;
> > > This is awesome, thanks for reviving the effort!
> > >
> > > One question I have about this particular snippet:
> > > Does it make sense to drop bpf_capable checks for the operations
> > > that work on a provided fd?
> >
> > Above snippet is for the case when sysctl switches unpriv off.
> > It was a big hammer and stays big hammer.
> > I certainly would like to improve the situation, but I suspect
> > the folks who turn that sysctl knob on are simply paranoid about bpf
> > and no amount of reasoning would turn them around.
> >
> 
> Without CAP_BPF, the sysctl was the only option to keep you safe from the flow
> of bpf vulns. You didn't have to be paranoid about that.

In the year 2020 there were three verifier bugs that could have been exploited
through unpriv. All three were found by the new kBdysch fuzzer. In 2019 there was
nothing. Not because people didn't try, but because the syzbot fuzzer reached its
limit. This cap_bpf will help fuzzers find a new set of bugs.

The pace of bpf development is accelerating, so there will be more bugs found
and introduced in the verifier. Folks that run the very latest kernel are
taking that risk along with the risk associated with other new kernel features.
Yet other features don't have sysctls to disable them.


* Re: [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF
  2020-05-12  2:36     ` Alexei Starovoitov
  2020-05-12 12:50       ` Jordan Glover
@ 2020-05-12 15:54       ` sdf
  2020-05-12 18:39         ` Alexei Starovoitov
  1 sibling, 1 reply; 21+ messages in thread
From: sdf @ 2020-05-12 15:54 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, netdev, bpf, kernel-team, linux-security-module,
	acme, jamorris, jannh, kpsingh

On 05/11, Alexei Starovoitov wrote:
> On Mon, May 11, 2020 at 05:12:10PM -0700, sdf@google.com wrote:
> > On 05/08, Alexei Starovoitov wrote:
> > > From: Alexei Starovoitov <ast@kernel.org>
> > [..]
> > > @@ -3932,7 +3977,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr
> > > __user *, uattr, unsigned int, siz
> > >   	union bpf_attr attr;
> > >   	int err;
> >
> > > -	if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
> > > +	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> > >   		return -EPERM;
> > This is awesome, thanks for reviving the effort!
> >
> > One question I have about this particular snippet:
> > Does it make sense to drop bpf_capable checks for the operations
> > that work on a provided fd?

> Above snippet is for the case when sysctl switches unpriv off.
> It was a big hammer and stays big hammer.
> I certainly would like to improve the situation, but I suspect
> the folks who turn that sysctl knob on are simply paranoid about bpf
> and no amount of reasoning would turn them around.
Yeah, and we do use it unfortunately :-( I suppose we still would
like to keep it that way for a while, but maybe start relaxing
some operations a bit.

> > The use-case I have in mind is as follows:
> > * privileged (CAP_BPF) process loads the programs/maps and pins
> >   them at some known location
> > * unprivileged process opens up those pins and does the following:
> >   * prepares the maps (and will later on read them)
> >   * does SO_ATTACH_BPF/SO_ATTACH_REUSEPORT_EBPF which afaik don't
> >     require any capabilities
> >
> > This essentially pushes some of the permission checks into a fs layer. So
> > whoever has a file descriptor (via unix sock or open) can do BPF operations
> > on the object that represents it.

> cap_bpf doesn't change things in that regard.
> Two cases here:
> sysctl_unprivileged_bpf_disabled==0:
>    Unpriv can load socket_filter prog type and unpriv can attach it
>    via SO_ATTACH_BPF/SO_ATTACH_REUSEPORT_EBPF.
> sysctl_unprivileged_bpf_disabled==1:
>    cap_sys_admin can load socket_filter and unpriv can attach it.
Sorry, I wasn't clear enough, I was talking about the unpriv_bpf_disabled=1
case.

> With addition of cap_bpf in the second case cap_bpf process can
> load socket_filter too.
> It doesn't mean that permissions are pushed into fs layer.
> I'm not sure that relaxing of sysctl_unprivileged_bpf_disabled
> will be well received.
> Are you proposing to selectively allow certain bpf syscall commands
> even when sysctl_unprivileged_bpf_disabled==1 ?
> Like allow unpriv to do BPF_OBJ_GET to get an fd from bpffs ?
> And allow unpriv to do map_update ?
Yes, that's the gist of what I'm proposing. Allow the operations that
work on an fd even with unpriv_bpf_disabled=1. The assumption is that
obtaining the fd already requires a privileged operation on its own and
should give enough protection.

> It makes complete sense to me, but I'd like to argue about that
> independently from this cap_bpf set.
> We can relax that sysctl later.
Ack, thanks, let me bring it up again later, when we get to the cap_bpf
state.


* Re: [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF
  2020-05-12 14:35   ` Daniel Borkmann
@ 2020-05-12 18:25     ` Alexei Starovoitov
  2020-05-12 20:07       ` Daniel Borkmann
  0 siblings, 1 reply; 21+ messages in thread
From: Alexei Starovoitov @ 2020-05-12 18:25 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: davem, netdev, bpf, kernel-team, linux-security-module, acme,
	jamorris, jannh, kpsingh

On Tue, May 12, 2020 at 04:35:41PM +0200, Daniel Borkmann wrote:
> On 5/8/20 11:53 PM, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@kernel.org>
> > 
> > Implement permissions as stated in uapi/linux/capability.h
> > In order to do that the verifier allow_ptr_leaks flag is split
> > into allow_ptr_leaks and bpf_capable flags and they are set as:
> >    env->allow_ptr_leaks = perfmon_capable();
> >    env->bpf_capable = bpf_capable();
> > 
> > bpf_capable enables bounded loops, variable stack access and other verifier features.
> > allow_ptr_leaks enable ptr leaks, ptr conversions, subtraction of pointers, etc.
> > It also disables side channel mitigations.
> > 
> > That means that the networking BPF program loaded with CAP_BPF + CAP_NET_ADMIN will
> > have speculative checks done by the verifier and other spectre mitigation applied.
> > Such networking BPF program will not be able to leak kernel pointers.
> 
> I don't quite follow this part in the code below yet, see my comments.
> 
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> [...]
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 6abd5a778fcd..c32a7880fa62 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -375,6 +375,7 @@ struct bpf_verifier_env {
> >   	u32 used_map_cnt;		/* number of used maps */
> >   	u32 id_gen;			/* used to generate unique reg IDs */
> >   	bool allow_ptr_leaks;
> > +	bool bpf_capable;
> >   	bool seen_direct_write;
> >   	struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
> >   	const struct bpf_line_info *prev_linfo;
> > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> > index 95d77770353c..264a9254dc39 100644
> > --- a/kernel/bpf/arraymap.c
> > +++ b/kernel/bpf/arraymap.c
> > @@ -77,7 +77,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
> >   	bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
> >   	int ret, numa_node = bpf_map_attr_numa_node(attr);
> >   	u32 elem_size, index_mask, max_entries;
> > -	bool unpriv = !capable(CAP_SYS_ADMIN);
> > +	bool unpriv = !bpf_capable();
> 
> So here progs loaded with CAP_BPF will have spectre mitigations bypassed which
> is the opposite of above statement, no?

right. good catch, but now I'm not sure it was such a good call to toss
spectre into cap_perfmon. It probably should be disabled under cap_bpf.

> >   	u64 cost, array_size, mask64;
> >   	struct bpf_map_memory mem;
> >   	struct bpf_array *array;
> [...]
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 6aa11de67315..8f421dd0c4cf 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -646,7 +646,7 @@ static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
> >   void bpf_prog_kallsyms_add(struct bpf_prog *fp)
> >   {
> >   	if (!bpf_prog_kallsyms_candidate(fp) ||
> > -	    !capable(CAP_SYS_ADMIN))
> > +	    !bpf_capable())
> >   		return;
> >   	bpf_prog_ksym_set_addr(fp);
> > @@ -824,7 +824,7 @@ static int bpf_jit_charge_modmem(u32 pages)
> >   {
> >   	if (atomic_long_add_return(pages, &bpf_jit_current) >
> >   	    (bpf_jit_limit >> PAGE_SHIFT)) {
> > -		if (!capable(CAP_SYS_ADMIN)) {
> > +		if (!bpf_capable()) {
> 
> Should there still be an upper charge on module mem for !CAP_SYS_ADMIN?

hmm. cap_bpf is a subset of cap_sys_admin. I don't see a reason
to keep requiring cap_sys_admin here.

> 
> >   			atomic_long_sub(pages, &bpf_jit_current);
> >   			return -EPERM;
> >   		}
> [...]
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 70ad009577f8..a6893746cd87 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> [...]
> > @@ -3428,7 +3429,7 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
> >   		 * Spectre masking for stack ALU.
> >   		 * See also retrieve_ptr_limit().
> >   		 */
> > -		if (!env->allow_ptr_leaks) {
> > +		if (!env->bpf_capable) {
> 
> This needs to stay on env->allow_ptr_leaks, the can_skip_alu_sanitation() does
> check on env->allow_ptr_leaks as well, otherwise this breaks spectre mitgation
> when masking alu.

The patch kept it in can_skip_alu_sanitation(), but I missed it here.
I don't really recall the details of the discussion around
commit 088ec26d9c2d ("bpf: Reject indirect var_off stack access in unpriv mode")

So, thinking it over, this bit will effectively disable variable
stack access, which is one of the main usability features.
So for v6 I'm thinking to put the spectre bypass into cap_bpf.
allow_ptr_leaks will mean only what the name says: pointer leaks only.
cap_bpf should not be given to user processes that want to become root
via spectre side channels.
I think it's a usability trade-off for cap_bpf.
Without indirect var_off access under cap_bpf too many networking progs will be
forced to use cap_bpf+cap_net_admin+cap_perfmon only to pass the verifier,
while they don't really care about reading arbitrary memory via cap_perfmon.


* Re: [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF
  2020-05-12 15:05   ` Daniel Borkmann
@ 2020-05-12 18:29     ` Alexei Starovoitov
  2020-05-12 20:09       ` Daniel Borkmann
  0 siblings, 1 reply; 21+ messages in thread
From: Alexei Starovoitov @ 2020-05-12 18:29 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: davem, netdev, bpf, kernel-team, linux-security-module, acme,
	jamorris, jannh, kpsingh

On Tue, May 12, 2020 at 05:05:12PM +0200, Daniel Borkmann wrote:
> > -	env->allow_ptr_leaks = is_priv;
> > +	env->allow_ptr_leaks = perfmon_capable();
> > +	env->bpf_capable = bpf_capable();
> 
> Probably more of a detail, but it feels weird to tie perfmon_capable() into the BPF
> core and use it in various places there. I would rather make this a proper bpf_*
> prefixed helper and add a more descriptive name (what does it have to do with perf
> or monitoring directly?). For example, all the main functionality could be under
> `bpf_base_capable()` and everything with potential to leak pointers or mem to user
> space as `bpf_leak_capable()`. Then inside include/linux/capability.h this can still
> resolve under the hood to something like:
> 
> static inline bool bpf_base_capable(void)
> {
> 	return capable(CAP_BPF) || capable(CAP_SYS_ADMIN);
> }

I don't like the 'base' in the name, since 'base' implies a common subset,
but that's not the case. Also 'base' implies that something else is additive,
but that's not the case either. The real base is unpriv; cap_bpf adds to it.
So bpf_capable() in capability.h is the most appropriate.
It also matches perfmon_capable() and the other *_capable() helpers.

> static inline bool bpf_leak_capable(void)
> {
> 	return perfmon_capable();
> }

This is ok, but not in capability.h. I can put it into bpf_verifier.h


* Re: [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF
  2020-05-12 15:54       ` sdf
@ 2020-05-12 18:39         ` Alexei Starovoitov
  0 siblings, 0 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2020-05-12 18:39 UTC (permalink / raw)
  To: sdf
  Cc: davem, daniel, netdev, bpf, kernel-team, linux-security-module,
	acme, jamorris, jannh, kpsingh

On Tue, May 12, 2020 at 08:54:11AM -0700, sdf@google.com wrote:
> On 05/11, Alexei Starovoitov wrote:
> > On Mon, May 11, 2020 at 05:12:10PM -0700, sdf@google.com wrote:
> > > On 05/08, Alexei Starovoitov wrote:
> > > > From: Alexei Starovoitov <ast@kernel.org>
> > > [..]
> > > > @@ -3932,7 +3977,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr
> > > > __user *, uattr, unsigned int, siz
> > > >   	union bpf_attr attr;
> > > >   	int err;
> > >
> > > > -	if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
> > > > +	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> > > >   		return -EPERM;
> > > This is awesome, thanks for reviving the effort!
> > >
> > > One question I have about this particular snippet:
> > > Does it make sense to drop bpf_capable checks for the operations
> > > that work on a provided fd?
> 
> > Above snippet is for the case when sysctl switches unpriv off.
> > It was a big hammer and stays big hammer.
> > I certainly would like to improve the situation, but I suspect
> > the folks who turn that sysctl knob on are simply paranoid about bpf
> > and no amount of reasoning would turn them around.
> Yeah, and we do use it unfortunately :-( I suppose we still would
> like to keep it that way for a while, but maybe start relaxing
> some operations a bit.

I suspect that was done a couple of years ago when spectre was just discovered and
the verifier wasn't doing speculative analysis.
If your security folks still insist on that sysctl, they either didn't
follow bpf development or are paranoid.
If the former is the case I would love to openly discuss all the advances in
the verification logic to prevent side channels.
The verifier is doing an amazing job finding bad assembly code.
There is no other tool that is similarly capable.
All compilers and static analyzers are nowhere close to the level
of sophistication that the verifier has in detecting bad speculation.

> > > The use-case I have in mind is as follows:
> > > * privileged (CAP_BPF) process loads the programs/maps and pins
> > >   them at some known location
> > > * unprivileged process opens up those pins and does the following:
> > >   * prepares the maps (and will later on read them)
> > >   * does SO_ATTACH_BPF/SO_ATTACH_REUSEPORT_EBPF which afaik don't
> > >     require any capabilities
> > >
> > > This essentially pushes some of the permission checks into a fs layer. So
> > > whoever has a file descriptor (via unix sock or open) can do BPF operations
> > > on the object that represents it.
> 
> > cap_bpf doesn't change things in that regard.
> > Two cases here:
> > sysctl_unprivileged_bpf_disabled==0:
> >    Unpriv can load socket_filter prog type and unpriv can attach it
> >    via SO_ATTACH_BPF/SO_ATTACH_REUSEPORT_EBPF.
> > sysctl_unprivileged_bpf_disabled==1:
> >    cap_sys_admin can load socket_filter and unpriv can attach it.
> Sorry, I wasn't clear enough, I was talking about unpriv_bpf_disabled=1
> case.
> 
> > With addition of cap_bpf in the second case cap_bpf process can
> > load socket_filter too.
> > It doesn't mean that permissions are pushed into fs layer.
> > I'm not sure that relaxing of sysctl_unprivileged_bpf_disabled
> > will be well received.
> > Are you proposing to selectively allow certain bpf syscall commands
> > even when sysctl_unprivileged_bpf_disabled==1 ?
> > Like allow unpriv to do BPF_OBJ_GET to get an fd from bpffs ?
> > And allow unpriv to do map_update ?
> Yes, that's the gist of what I'm proposing. Allow the operations that
> work on an fd even with unpriv_bpf_disabled=1. The assumption is that
> obtaining the fd already requires a privileged operation on its own and
> should give enough protection.

I agree.

> 
> > It makes complete sense to me, but I'd like to argue about that
> > independently from this cap_bpf set.
> > We can relax that sysctl later.
> Ack, thanks, let me bring it up again later, when we get to the cap_bpf
> state.

Thanks for the feedback.
Just to make sure we're on the same page let me clarify one more thing.
The state of cap_bpf in this patch set is not the final state of bpf
security in general. We have been stuck on the cap_bpf proposal since September.
The bpf community lost many months of what could have been gradual
improvements in bpf safety and security.
This cap_bpf is a way to get us unstuck. There will be many more
security related patches that improve safety, security and usability.


* Re: [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF
  2020-05-12 18:25     ` Alexei Starovoitov
@ 2020-05-12 20:07       ` Daniel Borkmann
  2020-05-12 22:56         ` Alexei Starovoitov
  0 siblings, 1 reply; 21+ messages in thread
From: Daniel Borkmann @ 2020-05-12 20:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, netdev, bpf, kernel-team, linux-security-module, acme,
	jamorris, jannh, kpsingh

On 5/12/20 8:25 PM, Alexei Starovoitov wrote:
> On Tue, May 12, 2020 at 04:35:41PM +0200, Daniel Borkmann wrote:
>> On 5/8/20 11:53 PM, Alexei Starovoitov wrote:
>>> From: Alexei Starovoitov <ast@kernel.org>
>>>
>>> Implement permissions as stated in uapi/linux/capability.h
>>> In order to do that the verifier allow_ptr_leaks flag is split
>>> into allow_ptr_leaks and bpf_capable flags and they are set as:
>>>     env->allow_ptr_leaks = perfmon_capable();
>>>     env->bpf_capable = bpf_capable();
>>>
>>> bpf_capable enables bounded loops, variable stack access and other verifier features.
>>> allow_ptr_leaks enable ptr leaks, ptr conversions, subtraction of pointers, etc.
>>> It also disables side channel mitigations.
>>>
>>> That means that the networking BPF program loaded with CAP_BPF + CAP_NET_ADMIN will
>>> have speculative checks done by the verifier and other spectre mitigation applied.
>>> Such networking BPF program will not be able to leak kernel pointers.
>>
>> I don't quite follow this part in the code below yet, see my comments.
>>
>>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>> [...]
>>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>>> index 6abd5a778fcd..c32a7880fa62 100644
>>> --- a/include/linux/bpf_verifier.h
>>> +++ b/include/linux/bpf_verifier.h
>>> @@ -375,6 +375,7 @@ struct bpf_verifier_env {
>>>    	u32 used_map_cnt;		/* number of used maps */
>>>    	u32 id_gen;			/* used to generate unique reg IDs */
>>>    	bool allow_ptr_leaks;
>>> +	bool bpf_capable;
>>>    	bool seen_direct_write;
>>>    	struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
>>>    	const struct bpf_line_info *prev_linfo;
>>> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
>>> index 95d77770353c..264a9254dc39 100644
>>> --- a/kernel/bpf/arraymap.c
>>> +++ b/kernel/bpf/arraymap.c
>>> @@ -77,7 +77,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
>>>    	bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
>>>    	int ret, numa_node = bpf_map_attr_numa_node(attr);
>>>    	u32 elem_size, index_mask, max_entries;
>>> -	bool unpriv = !capable(CAP_SYS_ADMIN);
>>> +	bool unpriv = !bpf_capable();
>>
>> So here progs loaded with CAP_BPF will have spectre mitigations bypassed which
>> is the opposite of above statement, no?
> 
> right. good catch, but now I'm not sure it was such a good call to toss
> spectre into cap_perfmon. It probably should be disabled under cap_bpf.

Right. :( Too bad CAP_*s are not more fine-grained today to allow more descriptive
policy. I would presume granting the CAP_BPF + CAP_PERFMON combination is not
always desired either, so it probably makes sense to leave it out with a clear
description in patch 1/3 for CAP_BPF about the implications.

>>>    	u64 cost, array_size, mask64;
>>>    	struct bpf_map_memory mem;
>>>    	struct bpf_array *array;
>> [...]
>>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>>> index 6aa11de67315..8f421dd0c4cf 100644
>>> --- a/kernel/bpf/core.c
>>> +++ b/kernel/bpf/core.c
>>> @@ -646,7 +646,7 @@ static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
>>>    void bpf_prog_kallsyms_add(struct bpf_prog *fp)
>>>    {
>>>    	if (!bpf_prog_kallsyms_candidate(fp) ||
>>> -	    !capable(CAP_SYS_ADMIN))
>>> +	    !bpf_capable())
>>>    		return;
>>>    	bpf_prog_ksym_set_addr(fp);
>>> @@ -824,7 +824,7 @@ static int bpf_jit_charge_modmem(u32 pages)
>>>    {
>>>    	if (atomic_long_add_return(pages, &bpf_jit_current) >
>>>    	    (bpf_jit_limit >> PAGE_SHIFT)) {
>>> -		if (!capable(CAP_SYS_ADMIN)) {
>>> +		if (!bpf_capable()) {
>>
>> Should there still be an upper charge on module mem for !CAP_SYS_ADMIN?
> 
> hmm. cap_bpf is a subset of cap_sys_admin. I don't see a reason
> to keep requiring cap_sys_admin here.

It should probably be described in the CAP_BPF comment as well since this
is different compared to plain unpriv.

>>>    			atomic_long_sub(pages, &bpf_jit_current);
>>>    			return -EPERM;
>>>    		}
>> [...]
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index 70ad009577f8..a6893746cd87 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>> [...]
>>> @@ -3428,7 +3429,7 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
>>>    		 * Spectre masking for stack ALU.
>>>    		 * See also retrieve_ptr_limit().
>>>    		 */
>>> -		if (!env->allow_ptr_leaks) {
>>> +		if (!env->bpf_capable) {
>>
>> This needs to stay on env->allow_ptr_leaks, the can_skip_alu_sanitation() does
>> check on env->allow_ptr_leaks as well, otherwise this breaks spectre mitgation
>> when masking alu.
> 
> The patch kept it in can_skip_alu_sanitation(), but I missed it here.
> Don't really recall the details of discussion around
> commit 088ec26d9c2d ("bpf: Reject indirect var_off stack access in unpriv mode")
> 
> So thinking all over this bit will effectively disable variable
> stack access which is one of main usability features.

The reason is that we otherwise cannot derive a fixed limit for the masking
in order to avoid oob access under speculation.

> So for v6 I'm thinking to put spectre bypass into cap_bpf.
> allow_ptr_leak will mean only what the name says: pointer leaks only.
> cap_bpf should not be given to user processes that want to become root
> via spectre side channels.

Yeah, I think it needs to be made crystal clear that from a security point of
view CAP_BPF is, for BPF purposes, effectively very close to CAP_SYS_ADMIN
minus the remaining non-BPF stuff in there, so it should not be handed out
loosely.

> I think it's a usability trade-off for cap_bpf.
> Without indirect var under cap_bpf too many networking progs will be forced to use
> cap_bpf+net_net_admin+cap_perfmon only to pass the verifier
> while they don't really care about reading arbitrary memory via cap_perfmon.

If I recall correctly, at least for Cilium programs the var access restriction
was not an issue - we don't use/need them in our code today, but it might differ
on your side, for example. This brings us back to the point that while CAP_BPF
solves the issue of not having to hand out the even wider CAP_SYS_ADMIN, it's
still not the end of the tunnel either and we'll see a need for something more
fine-grained coming next.

Thanks,
Daniel


* Re: [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF
  2020-05-12 18:29     ` Alexei Starovoitov
@ 2020-05-12 20:09       ` Daniel Borkmann
  0 siblings, 0 replies; 21+ messages in thread
From: Daniel Borkmann @ 2020-05-12 20:09 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, netdev, bpf, kernel-team, linux-security-module, acme,
	jamorris, jannh, kpsingh

On 5/12/20 8:29 PM, Alexei Starovoitov wrote:
> On Tue, May 12, 2020 at 05:05:12PM +0200, Daniel Borkmann wrote:
>>> -	env->allow_ptr_leaks = is_priv;
>>> +	env->allow_ptr_leaks = perfmon_capable();
>>> +	env->bpf_capable = bpf_capable();
>>
>> Probably more of a detail, but it feels weird to tie perfmon_capable() into the BPF
>> core and use it in various places there. I would rather make this a proper bpf_*
>> prefixed helper and add a more descriptive name (what does it have to do with perf
>> or monitoring directly?). For example, all the main functionality could be under
>> `bpf_base_capable()` and everything with potential to leak pointers or mem to user
>> space as `bpf_leak_capable()`. Then inside include/linux/capability.h this can still
>> resolve under the hood to something like:
>>
>> static inline bool bpf_base_capable(void)
>> {
>> 	return capable(CAP_BPF) || capable(CAP_SYS_ADMIN);
>> }
> 
> I don't like the 'base' in the name, since 'base' implies common subset,
> but it's not the case. Also 'base' implies that something else is additive,
> but it's not the case either. The real base is unpriv. cap_bpf adds to it.
> So bpf_capable() in capability.h is the most appropriate.
> It also matches perfmon_capable() and other *_capable()

That's okay with me, naming is usually hardest. :)

>> static inline bool bpf_leak_capable(void)
>> {
>> 	return perfmon_capable();
>> }
> 
> This is ok, but not in capability.h. I can put it into bpf_verifier.h

Makes sense.


* Re: [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF
  2020-05-08 21:53 ` [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF Alexei Starovoitov
                     ` (2 preceding siblings ...)
  2020-05-12 15:05   ` Daniel Borkmann
@ 2020-05-12 20:27   ` Daniel Borkmann
  2020-05-12 23:01     ` Alexei Starovoitov
  3 siblings, 1 reply; 21+ messages in thread
From: Daniel Borkmann @ 2020-05-12 20:27 UTC (permalink / raw)
  To: Alexei Starovoitov, davem
  Cc: netdev, bpf, kernel-team, linux-security-module, acme, jamorris,
	jannh, kpsingh

On 5/8/20 11:53 PM, Alexei Starovoitov wrote:
[...]
> @@ -2880,8 +2933,6 @@ static int bpf_prog_test_run(const union bpf_attr *attr,
>   	struct bpf_prog *prog;
>   	int ret = -ENOTSUPP;
>   
> -	if (!capable(CAP_SYS_ADMIN))
> -		return -EPERM;

Should the above be under bpf_capable() as well, or is the intention to really let
(fully) unpriv users run sk_filter test progs here? I would assume only progs
that have previously been loaded under bpf_capable() should suffice, so no need
to lower the bar for now, no?

>   	if (CHECK_ATTR(BPF_PROG_TEST_RUN))
>   		return -EINVAL;
>   
> @@ -3163,7 +3214,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>   	info.run_time_ns = stats.nsecs;
>   	info.run_cnt = stats.cnt;
>   
> -	if (!capable(CAP_SYS_ADMIN)) {
> +	if (!bpf_capable()) {

Given the JIT dump, this also exposes addresses when bpf_dump_raw_ok() passes.
I presume that's okay, but it should probably be documented given that
CAP_SYS_ADMIN isn't required anymore?

>   		info.jited_prog_len = 0;
>   		info.xlated_prog_len = 0;
>   		info.nr_jited_ksyms = 0;
> @@ -3522,7 +3573,7 @@ static int bpf_btf_load(const union bpf_attr *attr)
>   	if (CHECK_ATTR(BPF_BTF_LOAD))
>   		return -EINVAL;
>   
> -	if (!capable(CAP_SYS_ADMIN))
> +	if (!bpf_capable())
>   		return -EPERM;
>   
>   	return btf_new_fd(attr);
> @@ -3736,9 +3787,6 @@ static int link_create(union bpf_attr *attr)
>   	struct bpf_prog *prog;
>   	int ret;
>   
> -	if (!capable(CAP_NET_ADMIN))
> -		return -EPERM;
> -
>   	if (CHECK_ATTR(BPF_LINK_CREATE))
>   		return -EINVAL;
>   
> @@ -3784,9 +3832,6 @@ static int link_update(union bpf_attr *attr)
>   	u32 flags;
>   	int ret;
>   
> -	if (!capable(CAP_NET_ADMIN))
> -		return -EPERM;
> -
>   	if (CHECK_ATTR(BPF_LINK_UPDATE))
>   		return -EINVAL;
>   


* Re: [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF
  2020-05-12 20:07       ` Daniel Borkmann
@ 2020-05-12 22:56         ` Alexei Starovoitov
  0 siblings, 0 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2020-05-12 22:56 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: davem, netdev, bpf, kernel-team, linux-security-module, acme,
	jamorris, jannh, kpsingh

On Tue, May 12, 2020 at 10:07:08PM +0200, Daniel Borkmann wrote:
> On 5/12/20 8:25 PM, Alexei Starovoitov wrote:
> > On Tue, May 12, 2020 at 04:35:41PM +0200, Daniel Borkmann wrote:
> > > On 5/8/20 11:53 PM, Alexei Starovoitov wrote:
> > > > From: Alexei Starovoitov <ast@kernel.org>
> > > > 
> > > > Implement permissions as stated in uapi/linux/capability.h
> > > > In order to do that the verifier allow_ptr_leaks flag is split
> > > > into allow_ptr_leaks and bpf_capable flags and they are set as:
> > > >     env->allow_ptr_leaks = perfmon_capable();
> > > >     env->bpf_capable = bpf_capable();
> > > > 
> > > > bpf_capable enables bounded loops, variable stack access and other verifier features.
> > > > allow_ptr_leaks enable ptr leaks, ptr conversions, subtraction of pointers, etc.
> > > > It also disables side channel mitigations.
> > > > 
> > > > That means that the networking BPF program loaded with CAP_BPF + CAP_NET_ADMIN will
> > > > have speculative checks done by the verifier and other spectre mitigation applied.
> > > > Such networking BPF program will not be able to leak kernel pointers.
> > > 
> > > I don't quite follow this part in the code below yet, see my comments.
> > > 
> > > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > [...]
> > > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > > index 6abd5a778fcd..c32a7880fa62 100644
> > > > --- a/include/linux/bpf_verifier.h
> > > > +++ b/include/linux/bpf_verifier.h
> > > > @@ -375,6 +375,7 @@ struct bpf_verifier_env {
> > > >    	u32 used_map_cnt;		/* number of used maps */
> > > >    	u32 id_gen;			/* used to generate unique reg IDs */
> > > >    	bool allow_ptr_leaks;
> > > > +	bool bpf_capable;
> > > >    	bool seen_direct_write;
> > > >    	struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
> > > >    	const struct bpf_line_info *prev_linfo;
> > > > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> > > > index 95d77770353c..264a9254dc39 100644
> > > > --- a/kernel/bpf/arraymap.c
> > > > +++ b/kernel/bpf/arraymap.c
> > > > @@ -77,7 +77,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
> > > >    	bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
> > > >    	int ret, numa_node = bpf_map_attr_numa_node(attr);
> > > >    	u32 elem_size, index_mask, max_entries;
> > > > -	bool unpriv = !capable(CAP_SYS_ADMIN);
> > > > +	bool unpriv = !bpf_capable();
> > > 
> > > So here progs loaded with CAP_BPF will have spectre mitigations bypassed which
> > > is the opposite of above statement, no?
> > 
> > right. good catch, but now I'm not sure it was such a good call to toss
> > spectre into cap_perfmon. It probably should be disabled under cap_bpf.
> 
> Right. :( Too bad CAP_*s are not more fine-grained today for more descriptive
> policy. I would presume granting both CAP_BPF + CAP_PERFMON combination is not
> always desired either, so probably makes sense to leave it out with a clear
> description in patch 1/3 for CAP_BPF about the implications.

My reasoning to bypass spectre mitigation under cap_perfmon was
that cap_perfmon allows reading kernel memory, and side channels do
the same thing, just in a much more complicated way.
Whereas cap_bpf is about enabling advanced bpf features in the verifier
and other places that could be security sensitive only because of lack
of exposure to unpriv in the past; by itself cap_bpf doesn't hold
any known security issues. I think it makes sense to stick to that.
It sucks that indirect var access will be disallowed in the verifier
without cap_perfmon, but to make extensions easier in the future I'll add
env->bypass_spec_v1 and env->bypass_spec_v4 internal verifier knobs that will
deal with these two cases.
In the next revision I'll init them with perfmon_capable() and eventually
relax it to:
env->bypass_spec_v1 = perfmon_capable() || spectre_v1_mitigation == SPECTRE_V1_MITIGATION_NONE;
env->bypass_spec_v4 = perfmon_capable() || ssb_mode == SPEC_STORE_BYPASS_NONE;

The latter conditions are x86-specific and I don't want to mess with x86 bits
at this moment or argue about generalizing this for other archs.
Just doing env->bypass_spec_v1 = perfmon_capable(); is good enough for now
and can be improved as above in a follow-up.

> > > >    	u64 cost, array_size, mask64;
> > > >    	struct bpf_map_memory mem;
> > > >    	struct bpf_array *array;
> > > [...]
> > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > > index 6aa11de67315..8f421dd0c4cf 100644
> > > > --- a/kernel/bpf/core.c
> > > > +++ b/kernel/bpf/core.c
> > > > @@ -646,7 +646,7 @@ static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
> > > >    void bpf_prog_kallsyms_add(struct bpf_prog *fp)
> > > >    {
> > > >    	if (!bpf_prog_kallsyms_candidate(fp) ||
> > > > -	    !capable(CAP_SYS_ADMIN))
> > > > +	    !bpf_capable())
> > > >    		return;
> > > >    	bpf_prog_ksym_set_addr(fp);
> > > > @@ -824,7 +824,7 @@ static int bpf_jit_charge_modmem(u32 pages)
> > > >    {
> > > >    	if (atomic_long_add_return(pages, &bpf_jit_current) >
> > > >    	    (bpf_jit_limit >> PAGE_SHIFT)) {
> > > > -		if (!capable(CAP_SYS_ADMIN)) {
> > > > +		if (!bpf_capable()) {
> > > 
> > > Should there still be an upper charge on module mem for !CAP_SYS_ADMIN?
> > 
> > hmm. cap_bpf is a subset of cap_sys_admin. I don't see a reason
> > to keep requiring cap_sys_admin here.
> 
> It should probably be described in the CAP_BPF comment as well since this
> is different compared to plain unpriv.

I'm going to keep it under CAP_SYS_ADMIN for now. Just to avoid arguing.

> > > >    			atomic_long_sub(pages, &bpf_jit_current);
> > > >    			return -EPERM;
> > > >    		}
> > > [...]
> > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > index 70ad009577f8..a6893746cd87 100644
> > > > --- a/kernel/bpf/verifier.c
> > > > +++ b/kernel/bpf/verifier.c
> > > [...]
> > > > @@ -3428,7 +3429,7 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
> > > >    		 * Spectre masking for stack ALU.
> > > >    		 * See also retrieve_ptr_limit().
> > > >    		 */
> > > > -		if (!env->allow_ptr_leaks) {
> > > > +		if (!env->bpf_capable) {
> > > 
> > > This needs to stay on env->allow_ptr_leaks, the can_skip_alu_sanitation() does
> > > check on env->allow_ptr_leaks as well, otherwise this breaks spectre mitgation
> > > when masking alu.
> > 
> > The patch kept it in can_skip_alu_sanitation(), but I missed it here.
> > Don't really recall the details of discussion around
> > commit 088ec26d9c2d ("bpf: Reject indirect var_off stack access in unpriv mode")
> > 
> > So thinking all over this bit will effectively disable variable
> > stack access which is one of main usability features.
> 
> The reason is that we otherwise cannot derive a fixed limit for the masking
> in order to avoid oob access under speculation.

I guess we can revisit this and come up with better handling of this case
in the verifier.
In the next revision the above will be "if (!env->bypass_spec_v1)".


* Re: [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF
  2020-05-12 20:27   ` Daniel Borkmann
@ 2020-05-12 23:01     ` Alexei Starovoitov
  0 siblings, 0 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2020-05-12 23:01 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: davem, netdev, bpf, kernel-team, linux-security-module, acme,
	jamorris, jannh, kpsingh

On Tue, May 12, 2020 at 10:27:33PM +0200, Daniel Borkmann wrote:
> On 5/8/20 11:53 PM, Alexei Starovoitov wrote:
> [...]
> > @@ -2880,8 +2933,6 @@ static int bpf_prog_test_run(const union bpf_attr *attr,
> >   	struct bpf_prog *prog;
> >   	int ret = -ENOTSUPP;
> > -	if (!capable(CAP_SYS_ADMIN))
> > -		return -EPERM;
> 
> Should above be under bpf_capable() as well or is the intention to really let
> (fully) unpriv users run sk_filter test progs here? I would assume only progs
> that have prior been loaded under bpf_capable() should suffice, so no need to
> lower the bar for now, no?

Unpriv can load sk_filter and attach it to a socket, then send data through
the socket to trigger execution.
bpf_prog_test_run does the same prog execution without creating a socket.
What is the concern?

> >   	if (CHECK_ATTR(BPF_PROG_TEST_RUN))
> >   		return -EINVAL;
> > @@ -3163,7 +3214,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
> >   	info.run_time_ns = stats.nsecs;
> >   	info.run_cnt = stats.cnt;
> > -	if (!capable(CAP_SYS_ADMIN)) {
> > +	if (!bpf_capable()) {
> 
> Given the JIT dump this also exposes addresses when bpf_dump_raw_ok() passes.
> I presume okay, but should probably be documented given CAP_SYS_ADMIN isn't
> required anymore?

Exactly. dump_raw_ok() is there. I'm not even sure why this cap_sys_admin
check is there. It looks like it can be completely removed, but I didn't
want to go that far in this set.


end of thread, other threads:[~2020-05-12 23:01 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-08 21:53 [PATCH v5 bpf-next 0/3] Introduce CAP_BPF Alexei Starovoitov
2020-05-08 21:53 ` [PATCH v5 bpf-next 1/3] bpf, capability: " Alexei Starovoitov
2020-05-08 21:53 ` [PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF Alexei Starovoitov
2020-05-12  0:12   ` sdf
2020-05-12  2:36     ` Alexei Starovoitov
2020-05-12 12:50       ` Jordan Glover
2020-05-12 15:46         ` Alexei Starovoitov
2020-05-12 15:54       ` sdf
2020-05-12 18:39         ` Alexei Starovoitov
2020-05-12 14:35   ` Daniel Borkmann
2020-05-12 18:25     ` Alexei Starovoitov
2020-05-12 20:07       ` Daniel Borkmann
2020-05-12 22:56         ` Alexei Starovoitov
2020-05-12 15:05   ` Daniel Borkmann
2020-05-12 18:29     ` Alexei Starovoitov
2020-05-12 20:09       ` Daniel Borkmann
2020-05-12 20:27   ` Daniel Borkmann
2020-05-12 23:01     ` Alexei Starovoitov
2020-05-08 21:53 ` [PATCH v5 bpf-next 3/3] selftests/bpf: use CAP_BPF and CAP_PERFMON in tests Alexei Starovoitov
2020-05-08 22:45 ` [PATCH v5 bpf-next 0/3] Introduce CAP_BPF Casey Schaufler
2020-05-08 23:00   ` Alexei Starovoitov
