* [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation
@ 2023-11-03 19:05 Andrii Nakryiko
  2023-11-03 19:05 ` [PATCH v9 bpf-next 01/17] bpf: align CAP_NET_ADMIN checks with bpf_capable() approach Andrii Nakryiko
                   ` (16 more replies)
  0 siblings, 17 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-03 19:05 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

This patch set introduces the ability to delegate a subset of BPF subsystem
functionality from a privileged system-wide daemon (e.g., systemd or any other
container manager) through special mount options for userns-bound BPF FS to
a *trusted* unprivileged application. Trust is the key here. This
functionality is not about allowing unconditional unprivileged BPF usage.
Establishing trust, though, is completely up to the discretion of the
respective privileged application that would create and mount a BPF FS
instance with delegation enabled, as different production setups can and do
achieve it through a combination of different means (signing, LSM, code
reviews, etc), and it's undesirable and infeasible for the kernel to enforce
any particular way of validating the trustworthiness of a particular process.

The main motivation for this work is a desire to enable containerized BPF
applications to be used together with user namespaces. This is currently
impossible, as CAP_BPF, required for BPF subsystem usage, cannot be namespaced
or sandboxed, as a general rule. E.g., tracing BPF programs, thanks to BPF
helpers like bpf_probe_read_kernel() and bpf_probe_read_user(), can safely
read arbitrary memory, and it's impossible to ensure that they only read
memory of processes belonging to any given namespace. This means that it's
impossible to have a mechanically verifiable namespace-aware CAP_BPF
capability, and as such another mechanism to allow safe usage of BPF
functionality is necessary. BPF FS delegation mount options and the BPF token
derived from such a BPF FS instance are such a mechanism. The kernel makes no
assumption about what "trusted" constitutes in any particular case, and it's
up to specific privileged applications and their surrounding infrastructure to
decide that. What the kernel provides is a set of APIs to set up and mount
special BPF FS instances and derive BPF tokens from them. BPF FS and BPF token
are both bound to their owning userns and are in this way constrained inside
the intended container. Users can then pass the BPF token FD to privileged
bpf() syscall commands, like BPF map creation and BPF program loading, to
perform such operations without holding init userns privileges.
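
To make the intended flow more concrete, below is a minimal sketch of the
unprivileged side, assuming the UAPI additions from this series
(BPF_TOKEN_CREATE, token_create.*, map_token_fd) are present in linux/bpf.h;
the BPF FS mount path is purely illustrative and error handling is omitted:

  #include <fcntl.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, unsigned int size)
  {
      return syscall(__NR_bpf, cmd, attr, size);
  }

  int token_example(void)
  {
      union bpf_attr attr;
      int token_fd, map_fd;

      /* derive a BPF token from a delegation-enabled BPF FS instance */
      memset(&attr, 0, sizeof(attr));
      attr.token_create.bpffs_path_fd = AT_FDCWD;
      attr.token_create.bpffs_pathname = (__u64)(unsigned long)"/mnt/bpffs";
      token_fd = sys_bpf(BPF_TOKEN_CREATE, &attr, sizeof(attr));

      /* pass the token FD to a privileged bpf() command, e.g., map creation */
      memset(&attr, 0, sizeof(attr));
      attr.map_type = BPF_MAP_TYPE_ARRAY;
      attr.key_size = 4;
      attr.value_size = 8;
      attr.max_entries = 1;
      attr.map_token_fd = token_fd;
      map_fd = sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));

      return map_fd;
  }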

This version incorporates feedback and suggestions ([3]) received on v3 of
this patch set: instead of allowing BPF tokens to be created directly under
capable(CAP_SYS_ADMIN), we enhance BPF FS to accept a few new delegation
mount options. If these options are used and BPF FS itself is properly
created, set up, and mounted inside the user namespaced container, a user
application is able to derive a BPF token object from the BPF FS instance
and pass that token to the bpf() syscall. As explained in patch #2, BPF token
itself doesn't grant access to BPF functionality, but instead allows the
kernel to do namespaced capability checks (ns_capable() vs capable()) for
CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, and CAP_SYS_ADMIN, as applicable. So it
forms one half of the puzzle and allows container managers and sysadmins to
have safe and flexible configuration options: determining which containers
get delegation of BPF functionality through BPF FS, and then which
applications within such containers are allowed to perform bpf() commands,
based on namespaced capabilities.

A previous attempt at addressing this very same problem ([0]) utilized an
authoritative LSM approach, but was conclusively rejected by upstream LSM
maintainers. The BPF token concept doesn't change anything about the LSM
approach, but can be combined with LSM hooks for very fine-grained security
policy. Some ideas about making BPF token more convenient to use with LSM (in
particular custom BPF LSM programs) were briefly described in a recent
LSF/MM/BPF 2023 presentation ([1]), e.g., the ability to specify user-provided
data (context), which in combination with BPF LSM would allow implementing
very dynamic and fine-grained custom security policies on top of BPF token. In
the interest of minimizing API surface area and discussions, this was
relegated to follow-up patches, as it's not essential to the fundamental
concept of a delegatable BPF token.

It should be noted that BPF token is conceptually quite similar to the idea of
a /dev/bpf device file, proposed by Song a while ago ([2]). The biggest
difference is the idea of using a virtual anon_inode file to hold a BPF token
and allowing multiple independent instances of them, each (potentially) with
its own set of restrictions. And also, crucially, the BPF token approach does
not use any special stateful task-scoped flags. Instead, the bpf() syscall
accepts a token_fd parameter explicitly for each relevant BPF command. This
addresses the main concerns brought up during the /dev/bpf discussion, and
fits better with the overall BPF subsystem design.

This patch set adds a basic minimum of functionality to make the BPF token
idea useful and to discuss API and functionality. Currently only low-level
libbpf APIs support creating and passing BPF token around, which is enough to
test kernel functionality, but for the most part is not sufficient for
real-world applications, which typically use high-level libbpf APIs based on
the `struct bpf_object` type. This was done with the intent to limit the size
of the patch set and concentrate on mostly kernel-side changes. All the
necessary plumbing for libbpf will be sent as a separate follow-up patch set
once kernel support makes it upstream.

Another part that should happen once kernel-side BPF token support is
established is a set of conventions between applications (e.g., systemd),
tools (e.g., bpftool), and libraries (e.g., libbpf) on exposing delegatable
BPF FS instance(s) at well-defined locations to allow applications to take
advantage of this in an automatic fashion without explicit code changes on the
BPF application's side. But I'd like to postpone this discussion until after
the BPF token concept lands.

  [0] https://lore.kernel.org/bpf/20230412043300.360803-1-andrii@kernel.org/
  [1] http://vger.kernel.org/bpfconf2023_material/Trusted_unprivileged_BPF_LSFMM2023.pdf
  [2] https://lore.kernel.org/bpf/20190627201923.2589391-2-songliubraving@fb.com/
  [3] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/

v8->v9:
  - fix issue in selftests due to sys/mount.h header (Jiri);
  - fix warning in doc comments in LSM hooks (kernel test robot);
v7->v8:
  - add bpf_token_allow_cmd and bpf_token_capable hooks (Paul);
  - inline bpf_token_alloc() into bpf_token_create() to prevent accidental
    divergence with security_bpf_token_create() hook (Paul);
v6->v7:
  - separate patches to refactor bpf_prog_alloc/bpf_map_alloc LSM hooks, as
    discussed with Paul, and now they also accept struct bpf_token;
  - added bpf_token_create/bpf_token_free to allow LSMs (SELinux,
    specifically) to set up security LSM blob (Paul);
  - last patch also wires bpf_security_struct setup by SELinux, similar to how
    it's done for BPF map/prog, though I'm not sure if that's enough, so worst
    case it's easy to drop this patch if a more full-fledged SELinux
    implementation is done separately;
  - small fixes for issues caught by code reviews (Jiri, Hou);
  - fix for test_maps test that doesn't use LIBBPF_OPTS() macro (CI);
v5->v6:
  - fix possible use of uninitialized variable in selftests (CI);
  - don't use anon_inode, instead create one from BPF FS instance (Christian);
  - don't store bpf_token inside struct bpf_map, instead pass it explicitly to
    map_check_btf(). We do store bpf_token inside prog->aux, because it's used
    during verification and even can be checked during attach time for some
    program types;
  - LSM hooks are left intact pending the conclusion of discussion with Paul
    Moore; I'd prefer to do LSM-related changes as a follow up patch set
    anyways;
v4->v5:
  - add pre-patch unifying CAP_NET_ADMIN handling inside kernel/bpf/syscall.c
    (Paul Moore);
  - fix build warnings and errors in selftests and kernel, detected by CI and
    kernel test robot;
v3->v4:
  - add delegation mount options to BPF FS;
  - BPF token is derived from the instance of BPF FS and associates itself
    with BPF FS' owning userns;
  - BPF token doesn't grant BPF functionality directly, it just turns
    capable() checks into ns_capable() checks within BPF FS' owning userns;
  - BPF token cannot be pinned;
v2->v3:
  - make BPF_TOKEN_CREATE pin created BPF token in BPF FS, and disallow
    BPF_OBJ_PIN for BPF token;
v1->v2:
  - fix build failures on Kconfig with CONFIG_BPF_SYSCALL unset;
  - drop BPF_F_TOKEN_UNKNOWN_* flags and simplify UAPI (Stanislav).

Andrii Nakryiko (17):
  bpf: align CAP_NET_ADMIN checks with bpf_capable() approach
  bpf: add BPF token delegation mount options to BPF FS
  bpf: introduce BPF token object
  bpf: add BPF token support to BPF_MAP_CREATE command
  bpf: add BPF token support to BPF_BTF_LOAD command
  bpf: add BPF token support to BPF_PROG_LOAD command
  bpf: take into account BPF token when fetching helper protos
  bpf: consistently use BPF token throughout BPF verifier logic
  bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooks
  bpf,lsm: refactor bpf_map_alloc/bpf_map_free LSM hooks
  bpf,lsm: add BPF token LSM hooks
  libbpf: add bpf_token_create() API
  libbpf: add BPF token support to bpf_map_create() API
  libbpf: add BPF token support to bpf_btf_load() API
  libbpf: add BPF token support to bpf_prog_load() API
  selftests/bpf: add BPF token-enabled tests
  bpf,selinux: allocate bpf_security_struct per BPF token

 drivers/media/rc/bpf-lirc.c                   |   2 +-
 include/linux/bpf.h                           |  83 ++-
 include/linux/filter.h                        |   2 +-
 include/linux/lsm_hook_defs.h                 |  15 +-
 include/linux/security.h                      |  43 +-
 include/uapi/linux/bpf.h                      |  44 ++
 kernel/bpf/Makefile                           |   2 +-
 kernel/bpf/arraymap.c                         |   2 +-
 kernel/bpf/bpf_lsm.c                          |  15 +-
 kernel/bpf/cgroup.c                           |   6 +-
 kernel/bpf/core.c                             |   3 +-
 kernel/bpf/helpers.c                          |   6 +-
 kernel/bpf/inode.c                            |  98 ++-
 kernel/bpf/syscall.c                          | 215 ++++--
 kernel/bpf/token.c                            | 247 +++++++
 kernel/bpf/verifier.c                         |  13 +-
 kernel/trace/bpf_trace.c                      |   2 +-
 net/core/filter.c                             |  36 +-
 net/ipv4/bpf_tcp_ca.c                         |   2 +-
 net/netfilter/nf_bpf_link.c                   |   2 +-
 security/security.c                           | 101 ++-
 security/selinux/hooks.c                      |  47 +-
 tools/include/uapi/linux/bpf.h                |  44 ++
 tools/lib/bpf/bpf.c                           |  30 +-
 tools/lib/bpf/bpf.h                           |  39 +-
 tools/lib/bpf/libbpf.map                      |   1 +
 .../selftests/bpf/prog_tests/libbpf_probes.c  |   4 +
 .../selftests/bpf/prog_tests/libbpf_str.c     |   6 +
 .../testing/selftests/bpf/prog_tests/token.c  | 622 ++++++++++++++++++
 29 files changed, 1565 insertions(+), 167 deletions(-)
 create mode 100644 kernel/bpf/token.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/token.c

-- 
2.34.1



* [PATCH v9 bpf-next 01/17] bpf: align CAP_NET_ADMIN checks with bpf_capable() approach
  2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
@ 2023-11-03 19:05 ` Andrii Nakryiko
  2023-11-05  6:33   ` Yafang Shao
  2023-11-03 19:05 ` [PATCH v9 bpf-next 02/17] bpf: add BPF token delegation mount options to BPF FS Andrii Nakryiko
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-03 19:05 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

Within the BPF syscall handling code, CAP_NET_ADMIN checks stand out a bit
compared to CAP_BPF and CAP_PERFMON checks. For the latter, CAP_BPF or
CAP_PERFMON are checked first, but if they are not set, CAP_SYS_ADMIN
takes over and grants whatever part of the BPF syscall is required.

Similar checks that involve CAP_NET_ADMIN are not so consistent. One out
of four uses does follow the CAP_BPF/CAP_PERFMON model: during
BPF_PROG_LOAD, if the type of BPF program is "network-related", either
CAP_NET_ADMIN or CAP_SYS_ADMIN is required to proceed.

But in three other cases CAP_NET_ADMIN is required even if CAP_SYS_ADMIN
is set:
  - when creating DEVMAP/XSKMAP/CPU_MAP maps;
  - when attaching CGROUP_SKB programs;
  - when handling the BPF_PROG_QUERY command.

This patch changes the latter three cases to follow the BPF_PROG_LOAD
model, that is, allowing to proceed under either CAP_NET_ADMIN or
CAP_SYS_ADMIN.

This also makes it cleaner in subsequent BPF token patches to switch
wholesale to a generic bpf_token_capable(int cap) check that always
falls back to CAP_SYS_ADMIN if the requested capability is missing.

Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/syscall.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0ed286b8a0f0..ad4d8e433ccc 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1096,6 +1096,11 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	return ret;
 }
 
+static bool bpf_net_capable(void)
+{
+	return capable(CAP_NET_ADMIN) || capable(CAP_SYS_ADMIN);
+}
+
 #define BPF_MAP_CREATE_LAST_FIELD map_extra
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
@@ -1199,7 +1204,7 @@ static int map_create(union bpf_attr *attr)
 	case BPF_MAP_TYPE_DEVMAP:
 	case BPF_MAP_TYPE_DEVMAP_HASH:
 	case BPF_MAP_TYPE_XSKMAP:
-		if (!capable(CAP_NET_ADMIN))
+		if (!bpf_net_capable())
 			return -EPERM;
 		break;
 	default:
@@ -2599,7 +2604,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	    !bpf_capable())
 		return -EPERM;
 
-	if (is_net_admin_prog_type(type) && !capable(CAP_NET_ADMIN) && !capable(CAP_SYS_ADMIN))
+	if (is_net_admin_prog_type(type) && !bpf_net_capable())
 		return -EPERM;
 	if (is_perfmon_prog_type(type) && !perfmon_capable())
 		return -EPERM;
@@ -3751,7 +3756,7 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
 	case BPF_PROG_TYPE_SK_LOOKUP:
 		return attach_type == prog->expected_attach_type ? 0 : -EINVAL;
 	case BPF_PROG_TYPE_CGROUP_SKB:
-		if (!capable(CAP_NET_ADMIN))
+		if (!bpf_net_capable())
 			/* cg-skb progs can be loaded by unpriv user.
 			 * check permissions at attach time.
 			 */
@@ -3954,7 +3959,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 static int bpf_prog_query(const union bpf_attr *attr,
 			  union bpf_attr __user *uattr)
 {
-	if (!capable(CAP_NET_ADMIN))
+	if (!bpf_net_capable())
 		return -EPERM;
 	if (CHECK_ATTR(BPF_PROG_QUERY))
 		return -EINVAL;
-- 
2.34.1



* [PATCH v9 bpf-next 02/17] bpf: add BPF token delegation mount options to BPF FS
  2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
  2023-11-03 19:05 ` [PATCH v9 bpf-next 01/17] bpf: align CAP_NET_ADMIN checks with bpf_capable() approach Andrii Nakryiko
@ 2023-11-03 19:05 ` Andrii Nakryiko
  2023-11-08 13:51   ` Christian Brauner
  2023-11-03 19:05 ` [PATCH v9 bpf-next 03/17] bpf: introduce BPF token object Andrii Nakryiko
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-03 19:05 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

Add a few new mount options to BPF FS that allow specifying that a given
BPF FS instance allows creation of a BPF token (added in the next patch),
and what sort of operations are allowed under the BPF token. As such, we
get 4 new mount options, each a bit mask:
  - `delegate_cmds` specifies which bpf() syscall commands are allowed
    with a BPF token derived from this BPF FS instance;
  - if the BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
    a set of allowable BPF map types that could be created with a BPF token;
  - if the BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
    a set of allowable BPF program types that could be loaded with a BPF
    token;
  - if the BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
    a set of allowable BPF program attach types that could be loaded with
    a BPF token; delegate_progs and delegate_attachs are meant to be used
    together, as the full BPF program type is, in general, determined
    through both program type and program attach type.

Currently, these mount options accept the following forms of values:
  - a special value "any", which enables all possible values of a given
    bit set;
  - a numeric value (decimal or hexadecimal, determined by the kernel
    automatically) that specifies a bit mask value directly;
  - all the values for a given mount option are combined if specified
    multiple times. E.g., `mount -t bpf nodev /path/to/mount -o
    delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3
    mask.

Ideally, a more convenient (for humans) symbolic form derived from the
corresponding UAPI enums would be accepted (e.g., `-o
delegate_progs=kprobe|tracepoint`) and I intend to implement this, but it
requires a bunch of UAPI header churn, so I postponed it until this
feature lands upstream, or at least until there is a definite consensus
that this feature is acceptable and is going to make it, just to minimize
the amount of wasted effort and not increase the amount of non-essential
code to be reviewed.

An attentive reader will notice that BPF FS is now marked as
FS_USERNS_MOUNT, which theoretically makes it mountable inside a non-init
user namespace as long as the process has sufficient *namespaced*
capabilities within that user namespace. But in reality we still
restrict BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in
init userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added
to allow creating a BPF FS context object (i.e., fsopen("bpf")) from
inside an unprivileged process inside a non-init userns, to capture that
userns as the owning userns. It will still be required to pass this
context object back to a privileged process to instantiate and mount it.

This manipulation is important, because capturing a non-init userns as
the owning userns of a BPF FS instance (super block) allows that userns
to be used to constrain the BPF token to it later on (see the next
patch). So creating BPF FS with delegation inside an unprivileged userns
will restrict derived BPF token objects to only "work" inside that
intended userns, scoping them to the intended "container".
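
For illustration only (not part of the patch), a condensed sketch of the
privileged half of this sequence using the new mount API; the syscall wrapper
is hand-rolled since libc may not provide one, and the fs_fd is assumed to
have been created by the unprivileged process inside the container and handed
over via SCM_RIGHTS:

  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/mount.h>

  static int sys_fsconfig(int fs_fd, unsigned int cmd, const char *key,
                          const char *value, int aux)
  {
      return syscall(__NR_fsconfig, fs_fd, cmd, key, value, aux);
  }

  /* fs_fd comes from fsopen("bpf", 0), called by the unprivileged process
   * inside its userns so that this userns becomes the super block's owner
   */
  int setup_delegated_bpffs(int fs_fd)
  {
      /* privileged (init-userns CAP_SYS_ADMIN) side: set delegation
       * options, then instantiate the super block
       */
      sys_fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_cmds", "any", 0);
      sys_fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_maps", "any", 0);
      sys_fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_progs", "any", 0);
      sys_fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_attachs", "any", 0);
      sys_fsconfig(fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);

      /* detached mount FD; hand it back to the container, which can attach
       * it with move_mount() and derive a BPF token from it
       */
      return syscall(__NR_fsmount, fs_fd, 0, 0);
  }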

There is a set of selftests at the end of the patch set that simulates
this sequence of steps and validates that everything works as intended.
But careful review is requested to make sure there are no missed gaps in
the implementation and testing.

All this is based on suggestions and discussions with Christian Brauner
([0]), to the best of my ability to follow all the implications.

  [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h | 10 ++++++
 kernel/bpf/inode.c  | 88 +++++++++++++++++++++++++++++++++++++++------
 2 files changed, 88 insertions(+), 10 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b4825d3cdb29..df50a7bf1a77 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1562,6 +1562,16 @@ struct bpf_link_primer {
 	u32 id;
 };
 
+struct bpf_mount_opts {
+	umode_t mode;
+
+	/* BPF token-related delegation options */
+	u64 delegate_cmds;
+	u64 delegate_maps;
+	u64 delegate_progs;
+	u64 delegate_attachs;
+};
+
 struct bpf_struct_ops_value;
 struct btf_member;
 
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 1aafb2ff2e95..e49e93bc65e3 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -20,6 +20,7 @@
 #include <linux/filter.h>
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
+#include <linux/kstrtox.h>
 #include "preload/bpf_preload.h"
 
 enum bpf_type {
@@ -599,10 +600,31 @@ EXPORT_SYMBOL(bpf_prog_get_type_path);
  */
 static int bpf_show_options(struct seq_file *m, struct dentry *root)
 {
+	struct bpf_mount_opts *opts = root->d_sb->s_fs_info;
 	umode_t mode = d_inode(root)->i_mode & S_IALLUGO & ~S_ISVTX;
 
 	if (mode != S_IRWXUGO)
 		seq_printf(m, ",mode=%o", mode);
+
+	if (opts->delegate_cmds == ~0ULL)
+		seq_printf(m, ",delegate_cmds=any");
+	else if (opts->delegate_cmds)
+		seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
+
+	if (opts->delegate_maps == ~0ULL)
+		seq_printf(m, ",delegate_maps=any");
+	else if (opts->delegate_maps)
+		seq_printf(m, ",delegate_maps=0x%llx", opts->delegate_maps);
+
+	if (opts->delegate_progs == ~0ULL)
+		seq_printf(m, ",delegate_progs=any");
+	else if (opts->delegate_progs)
+		seq_printf(m, ",delegate_progs=0x%llx", opts->delegate_progs);
+
+	if (opts->delegate_attachs == ~0ULL)
+		seq_printf(m, ",delegate_attachs=any");
+	else if (opts->delegate_attachs)
+		seq_printf(m, ",delegate_attachs=0x%llx", opts->delegate_attachs);
 	return 0;
 }
 
@@ -626,22 +648,27 @@ static const struct super_operations bpf_super_ops = {
 
 enum {
 	OPT_MODE,
+	OPT_DELEGATE_CMDS,
+	OPT_DELEGATE_MAPS,
+	OPT_DELEGATE_PROGS,
+	OPT_DELEGATE_ATTACHS,
 };
 
 static const struct fs_parameter_spec bpf_fs_parameters[] = {
 	fsparam_u32oct	("mode",			OPT_MODE),
+	fsparam_string	("delegate_cmds",		OPT_DELEGATE_CMDS),
+	fsparam_string	("delegate_maps",		OPT_DELEGATE_MAPS),
+	fsparam_string	("delegate_progs",		OPT_DELEGATE_PROGS),
+	fsparam_string	("delegate_attachs",		OPT_DELEGATE_ATTACHS),
 	{}
 };
 
-struct bpf_mount_opts {
-	umode_t mode;
-};
-
 static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
 {
-	struct bpf_mount_opts *opts = fc->fs_private;
+	struct bpf_mount_opts *opts = fc->s_fs_info;
 	struct fs_parse_result result;
-	int opt;
+	int opt, err;
+	u64 msk;
 
 	opt = fs_parse(fc, bpf_fs_parameters, param, &result);
 	if (opt < 0) {
@@ -665,6 +692,25 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
 	case OPT_MODE:
 		opts->mode = result.uint_32 & S_IALLUGO;
 		break;
+	case OPT_DELEGATE_CMDS:
+	case OPT_DELEGATE_MAPS:
+	case OPT_DELEGATE_PROGS:
+	case OPT_DELEGATE_ATTACHS:
+		if (strcmp(param->string, "any") == 0) {
+			msk = ~0ULL;
+		} else {
+			err = kstrtou64(param->string, 0, &msk);
+			if (err)
+				return err;
+		}
+		switch (opt) {
+		case OPT_DELEGATE_CMDS: opts->delegate_cmds |= msk; break;
+		case OPT_DELEGATE_MAPS: opts->delegate_maps |= msk; break;
+		case OPT_DELEGATE_PROGS: opts->delegate_progs |= msk; break;
+		case OPT_DELEGATE_ATTACHS: opts->delegate_attachs |= msk; break;
+		default: return -EINVAL;
+		}
+		break;
 	}
 
 	return 0;
@@ -739,10 +785,14 @@ static int populate_bpffs(struct dentry *parent)
 static int bpf_fill_super(struct super_block *sb, struct fs_context *fc)
 {
 	static const struct tree_descr bpf_rfiles[] = { { "" } };
-	struct bpf_mount_opts *opts = fc->fs_private;
+	struct bpf_mount_opts *opts = sb->s_fs_info;
 	struct inode *inode;
 	int ret;
 
+	/* Mounting an instance of BPF FS requires privileges */
+	if (fc->user_ns != &init_user_ns && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	ret = simple_fill_super(sb, BPF_FS_MAGIC, bpf_rfiles);
 	if (ret)
 		return ret;
@@ -764,7 +814,10 @@ static int bpf_get_tree(struct fs_context *fc)
 
 static void bpf_free_fc(struct fs_context *fc)
 {
-	kfree(fc->fs_private);
+	struct bpf_mount_opts *opts = fc->s_fs_info;
+
+	if (opts)
+		kfree(opts);
 }
 
 static const struct fs_context_operations bpf_context_ops = {
@@ -786,17 +839,32 @@ static int bpf_init_fs_context(struct fs_context *fc)
 
 	opts->mode = S_IRWXUGO;
 
-	fc->fs_private = opts;
+	/* start out with no BPF token delegation enabled */
+	opts->delegate_cmds = 0;
+	opts->delegate_maps = 0;
+	opts->delegate_progs = 0;
+	opts->delegate_attachs = 0;
+
+	fc->s_fs_info = opts;
 	fc->ops = &bpf_context_ops;
 	return 0;
 }
 
+static void bpf_kill_super(struct super_block *sb)
+{
+	struct bpf_mount_opts *opts = sb->s_fs_info;
+
+	kill_litter_super(sb);
+	kfree(opts);
+}
+
 static struct file_system_type bpf_fs_type = {
 	.owner		= THIS_MODULE,
 	.name		= "bpf",
 	.init_fs_context = bpf_init_fs_context,
 	.parameters	= bpf_fs_parameters,
-	.kill_sb	= kill_litter_super,
+	.kill_sb	= bpf_kill_super,
+	.fs_flags	= FS_USERNS_MOUNT,
 };
 
 static int __init bpf_init(void)
-- 
2.34.1



* [PATCH v9 bpf-next 03/17] bpf: introduce BPF token object
  2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
  2023-11-03 19:05 ` [PATCH v9 bpf-next 01/17] bpf: align CAP_NET_ADMIN checks with bpf_capable() approach Andrii Nakryiko
  2023-11-03 19:05 ` [PATCH v9 bpf-next 02/17] bpf: add BPF token delegation mount options to BPF FS Andrii Nakryiko
@ 2023-11-03 19:05 ` Andrii Nakryiko
  2023-11-08 14:28   ` Christian Brauner
  2023-11-03 19:05 ` [PATCH v9 bpf-next 04/17] bpf: add BPF token support to BPF_MAP_CREATE command Andrii Nakryiko
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-03 19:05 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

Add a new kind of BPF kernel object, BPF token. BPF token is meant to
allow delegating privileged BPF functionality, like loading a BPF
program or creating a BPF map, from a privileged process to a *trusted*
unprivileged process, all while having a good amount of control over which
privileged operations could be performed using the provided BPF token.

This is achieved through mounting BPF FS instance with extra delegation
mount options, which determine what operations are delegatable, and also
constraining it to the owning user namespace (as mentioned in the
previous patch).

BPF token itself is just a derivative of BPF FS and can be created
through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts
a path specification (using the usual fd + string path combo) to a BPF
FS mount. Currently, BPF token "inherits" the delegated command, map type,
prog type, and attach type bit sets from BPF FS as is. In the future,
having a BPF token as a separate object with its own FD, we can allow
further restricting the BPF token's allowable set of things, either at
creation time or after the fact, allowing the process to guard itself
further from, e.g., unintentionally trying to load an undesired kind of
BPF program. But for now we keep things simple and just copy bit sets as is.

When a BPF token is created from a BPF FS mount, we take a reference to
the BPF super block's owning user namespace, and then use that namespace
for checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
capabilities that are normally only checked against the init userns
(using capable()), but now we check them using ns_capable() instead (if
a BPF token is provided). See bpf_token_capable() for details.

Such a setup means that a BPF token in itself is not sufficient to grant
BPF functionality. The user namespaced process has to *also* have the
necessary combination of capabilities inside that user namespace. So
while previously CAP_BPF was useless when granted within a user
namespace, now it gains a meaning and allows container managers and
sysadmins to have flexible control over which processes can and need to
use BPF functionality within the user namespace (i.e., a container in
practice). And BPF FS delegation mount options and derived BPF tokens
serve as a per-container "flag" to grant the overall ability to use bpf()
(plus further restrict which parts of the bpf() syscall are treated as
namespaced).

Note also that the BPF_TOKEN_CREATE command itself requires
ns_capable(CAP_BPF) within the BPF FS owning user namespace, rounding
out the ns_capable() story of BPF token.

The alternatives to creating a BPF token object were:
  a) not having any extra object and just passing a BPF FS path to each
     relevant bpf() command. This seems suboptimal as it's racy (the mount
     under the same path might change in between checking it and using it
     for a bpf() command). It is also less flexible if we'd like to further
     restrict ourselves compared to all the delegated functionality
     allowed on BPF FS.
  b) using a non-bpf() interface, e.g., ioctl(), but otherwise also creating
     a dedicated FD that would represent token-like functionality. This
     doesn't seem superior to having a proper bpf() command, so
     BPF_TOKEN_CREATE was chosen.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h            |  41 +++++++
 include/uapi/linux/bpf.h       |  39 +++++++
 kernel/bpf/Makefile            |   2 +-
 kernel/bpf/inode.c             |  17 ++-
 kernel/bpf/syscall.c           |  17 +++
 kernel/bpf/token.c             | 197 +++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  39 +++++++
 7 files changed, 342 insertions(+), 10 deletions(-)
 create mode 100644 kernel/bpf/token.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index df50a7bf1a77..5ab418c78876 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -51,6 +51,10 @@ struct module;
 struct bpf_func_state;
 struct ftrace_ops;
 struct cgroup;
+struct bpf_token;
+struct user_namespace;
+struct super_block;
+struct inode;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -1572,6 +1576,13 @@ struct bpf_mount_opts {
 	u64 delegate_attachs;
 };
 
+struct bpf_token {
+	struct work_struct work;
+	atomic64_t refcnt;
+	struct user_namespace *userns;
+	u64 allowed_cmds;
+};
+
 struct bpf_struct_ops_value;
 struct btf_member;
 
@@ -2029,6 +2040,7 @@ static inline void bpf_enable_instrumentation(void)
 	migrate_enable();
 }
 
+extern const struct super_operations bpf_super_ops;
 extern const struct file_operations bpf_map_fops;
 extern const struct file_operations bpf_prog_fops;
 extern const struct file_operations bpf_iter_fops;
@@ -2163,6 +2175,8 @@ static inline void bpf_map_dec_elem_count(struct bpf_map *map)
 
 extern int sysctl_unprivileged_bpf_disabled;
 
+bool bpf_token_capable(const struct bpf_token *token, int cap);
+
 static inline bool bpf_allow_ptr_leaks(void)
 {
 	return perfmon_capable();
@@ -2197,8 +2211,17 @@ int bpf_link_new_fd(struct bpf_link *link);
 struct bpf_link *bpf_link_get_from_fd(u32 ufd);
 struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
 
+void bpf_token_inc(struct bpf_token *token);
+void bpf_token_put(struct bpf_token *token);
+int bpf_token_create(union bpf_attr *attr);
+struct bpf_token *bpf_token_get_from_fd(u32 ufd);
+
+bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
+
 int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
 int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
+struct inode *bpf_get_inode(struct super_block *sb, const struct inode *dir,
+			    umode_t mode);
 
 #define BPF_ITER_FUNC_PREFIX "bpf_iter_"
 #define DEFINE_BPF_ITER_FUNC(target, args...)			\
@@ -2561,6 +2584,24 @@ static inline int bpf_obj_get_user(const char __user *pathname, int flags)
 	return -EOPNOTSUPP;
 }
 
+static inline bool bpf_token_capable(const struct bpf_token *token, int cap)
+{
+	return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
+}
+
+static inline void bpf_token_inc(struct bpf_token *token)
+{
+}
+
+static inline void bpf_token_put(struct bpf_token *token)
+{
+}
+
+static inline struct bpf_token *bpf_token_get_from_fd(u32 ufd)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
 static inline void __dev_flush(void)
 {
 }
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0f6cdf52b1da..f75f289f1749 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -847,6 +847,37 @@ union bpf_iter_link_info {
  *		Returns zero on success. On error, -1 is returned and *errno*
  *		is set appropriately.
  *
+ * BPF_TOKEN_CREATE
+ *	Description
+ *		Create BPF token with embedded information about what
+ *		BPF-related functionality it allows:
+ *		- a set of allowed bpf() syscall commands;
+ *		- a set of allowed BPF map types to be created with
+ *		BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed;
+ *		- a set of allowed BPF program types and BPF program attach
+ *		types to be loaded with BPF_PROG_LOAD command, if
+ *		BPF_PROG_LOAD itself is allowed.
+ *
+ *		BPF token is created (derived) from an instance of BPF FS,
+ *		assuming it has necessary delegation mount options specified.
+ *		BPF FS mount is specified with openat()-style path FD + string.
+ *		This BPF token can be passed as an extra parameter to various
+ *		bpf() syscall commands to grant BPF subsystem functionality to
+ *		unprivileged processes.
+ *
+ *		When created, BPF token is "associated" with the owning
+ *		user namespace of BPF FS instance (super block) that it was
+ *		derived from, and subsequent BPF operations performed with
+ *		BPF token would be performing capabilities checks (i.e.,
+ *		CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within
+ *		that user namespace. Without BPF token, such capabilities
+ *		have to be granted in init user namespace, making bpf()
+ *		syscall incompatible with user namespace, for the most part.
+ *
+ *	Return
+ *		A new file descriptor (a nonnegative integer), or -1 if an
+ *		error occurred (in which case, *errno* is set appropriately).
+ *
  * NOTES
  *	eBPF objects (maps and programs) can be shared between processes.
  *
@@ -901,6 +932,8 @@ enum bpf_cmd {
 	BPF_ITER_CREATE,
 	BPF_LINK_DETACH,
 	BPF_PROG_BIND_MAP,
+	BPF_TOKEN_CREATE,
+	__MAX_BPF_CMD,
 };
 
 enum bpf_map_type {
@@ -1709,6 +1742,12 @@ union bpf_attr {
 		__u32		flags;		/* extra flags */
 	} prog_bind_map;
 
+	struct { /* struct used by BPF_TOKEN_CREATE command */
+		__u32		flags;
+		__u32		bpffs_path_fd;
+		__aligned_u64	bpffs_pathname;
+	} token_create;
+
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index f526b7573e97..4ce95acfcaa7 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -6,7 +6,7 @@ cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse
 endif
 CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
 
-obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o
+obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o token.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index e49e93bc65e3..3418a0edf7b4 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -99,9 +99,9 @@ static const struct inode_operations bpf_prog_iops = { };
 static const struct inode_operations bpf_map_iops  = { };
 static const struct inode_operations bpf_link_iops  = { };
 
-static struct inode *bpf_get_inode(struct super_block *sb,
-				   const struct inode *dir,
-				   umode_t mode)
+struct inode *bpf_get_inode(struct super_block *sb,
+			    const struct inode *dir,
+			    umode_t mode)
 {
 	struct inode *inode;
 
@@ -602,11 +602,13 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 {
 	struct bpf_mount_opts *opts = root->d_sb->s_fs_info;
 	umode_t mode = d_inode(root)->i_mode & S_IALLUGO & ~S_ISVTX;
+	u64 mask;
 
 	if (mode != S_IRWXUGO)
 		seq_printf(m, ",mode=%o", mode);
 
-	if (opts->delegate_cmds == ~0ULL)
+	mask = (1ULL << __MAX_BPF_CMD) - 1;
+	if ((opts->delegate_cmds & mask) == mask)
 		seq_printf(m, ",delegate_cmds=any");
 	else if (opts->delegate_cmds)
 		seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
@@ -639,7 +641,7 @@ static void bpf_free_inode(struct inode *inode)
 	free_inode_nonrcu(inode);
 }
 
-static const struct super_operations bpf_super_ops = {
+const struct super_operations bpf_super_ops = {
 	.statfs		= simple_statfs,
 	.drop_inode	= generic_delete_inode,
 	.show_options	= bpf_show_options,
@@ -814,10 +816,7 @@ static int bpf_get_tree(struct fs_context *fc)
 
 static void bpf_free_fc(struct fs_context *fc)
 {
-	struct bpf_mount_opts *opts = fc->s_fs_info;
-
-	if (opts)
-		kfree(opts);
+	kfree(fc->s_fs_info);
 }
 
 static const struct fs_context_operations bpf_context_ops = {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ad4d8e433ccc..0565e46509a6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5346,6 +5346,20 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
 	return ret;
 }
 
+#define BPF_TOKEN_CREATE_LAST_FIELD token_create.bpffs_pathname
+
+static int token_create(union bpf_attr *attr)
+{
+	if (CHECK_ATTR(BPF_TOKEN_CREATE))
+		return -EINVAL;
+
+	/* no flags are supported yet */
+	if (attr->token_create.flags)
+		return -EINVAL;
+
+	return bpf_token_create(attr);
+}
+
 static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 {
 	union bpf_attr attr;
@@ -5479,6 +5493,9 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 	case BPF_PROG_BIND_MAP:
 		err = bpf_prog_bind_map(&attr);
 		break;
+	case BPF_TOKEN_CREATE:
+		err = token_create(&attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
new file mode 100644
index 000000000000..32fa395f829b
--- /dev/null
+++ b/kernel/bpf/token.c
@@ -0,0 +1,197 @@
+#include <linux/bpf.h>
+#include <linux/vmalloc.h>
+#include <linux/fdtable.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/idr.h>
+#include <linux/namei.h>
+#include <linux/user_namespace.h>
+
+bool bpf_token_capable(const struct bpf_token *token, int cap)
+{
+	/* BPF token allows ns_capable() level of capabilities */
+	if (token) {
+		if (ns_capable(token->userns, cap))
+			return true;
+		if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
+			return true;
+	}
+	/* otherwise fallback to capable() checks */
+	return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
+}
+
+void bpf_token_inc(struct bpf_token *token)
+{
+	atomic64_inc(&token->refcnt);
+}
+
+static void bpf_token_free(struct bpf_token *token)
+{
+	put_user_ns(token->userns);
+	kvfree(token);
+}
+
+static void bpf_token_put_deferred(struct work_struct *work)
+{
+	struct bpf_token *token = container_of(work, struct bpf_token, work);
+
+	bpf_token_free(token);
+}
+
+void bpf_token_put(struct bpf_token *token)
+{
+	if (!token)
+		return;
+
+	if (!atomic64_dec_and_test(&token->refcnt))
+		return;
+
+	INIT_WORK(&token->work, bpf_token_put_deferred);
+	schedule_work(&token->work);
+}
+
+static int bpf_token_release(struct inode *inode, struct file *filp)
+{
+	struct bpf_token *token = filp->private_data;
+
+	bpf_token_put(token);
+	return 0;
+}
+
+static void bpf_token_show_fdinfo(struct seq_file *m, struct file *filp)
+{
+	struct bpf_token *token = filp->private_data;
+	u64 mask;
+
+	BUILD_BUG_ON(__MAX_BPF_CMD >= 64);
+	mask = (1ULL << __MAX_BPF_CMD) - 1;
+	if ((token->allowed_cmds & mask) == mask)
+		seq_printf(m, "allowed_cmds:\tany\n");
+	else
+		seq_printf(m, "allowed_cmds:\t0x%llx\n", token->allowed_cmds);
+}
+
+#define BPF_TOKEN_INODE_NAME "bpf-token"
+
+static const struct inode_operations bpf_token_iops = { };
+
+static const struct file_operations bpf_token_fops = {
+	.release	= bpf_token_release,
+	.show_fdinfo	= bpf_token_show_fdinfo,
+};
+
+int bpf_token_create(union bpf_attr *attr)
+{
+	struct bpf_mount_opts *mnt_opts;
+	struct bpf_token *token = NULL;
+	struct user_namespace *userns;
+	struct inode *inode;
+	struct file *file;
+	struct path path;
+	umode_t mode;
+	int err, fd;
+
+	err = user_path_at(attr->token_create.bpffs_path_fd,
+			   u64_to_user_ptr(attr->token_create.bpffs_pathname),
+			   LOOKUP_FOLLOW | LOOKUP_EMPTY, &path);
+	if (err)
+		return err;
+
+	if (path.mnt->mnt_root != path.dentry) {
+		err = -EINVAL;
+		goto out_path;
+	}
+	if (path.mnt->mnt_sb->s_op != &bpf_super_ops) {
+		err = -EINVAL;
+		goto out_path;
+	}
+	err = path_permission(&path, MAY_ACCESS);
+	if (err)
+		goto out_path;
+
+	userns = path.dentry->d_sb->s_user_ns;
+	if (!ns_capable(userns, CAP_BPF)) {
+		err = -EPERM;
+		goto out_path;
+	}
+
+	mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
+	inode = bpf_get_inode(path.mnt->mnt_sb, NULL, mode);
+	if (IS_ERR(inode)) {
+		err = PTR_ERR(inode);
+		goto out_path;
+	}
+
+	inode->i_op = &bpf_token_iops;
+	inode->i_fop = &bpf_token_fops;
+	clear_nlink(inode); /* make sure it is unlinked */
+
+	file = alloc_file_pseudo(inode, path.mnt, BPF_TOKEN_INODE_NAME, O_RDWR, &bpf_token_fops);
+	if (IS_ERR(file)) {
+		iput(inode);
+		err = PTR_ERR(file);
+		goto out_path;
+	}
+
+	token = kvzalloc(sizeof(*token), GFP_USER);
+	if (!token) {
+		err = -ENOMEM;
+		goto out_file;
+	}
+
+	atomic64_set(&token->refcnt, 1);
+
+	/* remember bpffs owning userns for future ns_capable() checks */
+	token->userns = get_user_ns(userns);
+
+	mnt_opts = path.dentry->d_sb->s_fs_info;
+	token->allowed_cmds = mnt_opts->delegate_cmds;
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0) {
+		err = fd;
+		goto out_token;
+	}
+
+	file->private_data = token;
+	fd_install(fd, file);
+
+	path_put(&path);
+	return fd;
+
+out_token:
+	bpf_token_free(token);
+out_file:
+	fput(file);
+out_path:
+	path_put(&path);
+	return err;
+}
+
+struct bpf_token *bpf_token_get_from_fd(u32 ufd)
+{
+	struct fd f = fdget(ufd);
+	struct bpf_token *token;
+
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+	if (f.file->f_op != &bpf_token_fops) {
+		fdput(f);
+		return ERR_PTR(-EINVAL);
+	}
+
+	token = f.file->private_data;
+	bpf_token_inc(token);
+	fdput(f);
+
+	return token;
+}
+
+bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
+{
+	if (!token)
+		return false;
+
+	return token->allowed_cmds & (1ULL << cmd);
+}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 0f6cdf52b1da..f75f289f1749 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -847,6 +847,37 @@ union bpf_iter_link_info {
  *		Returns zero on success. On error, -1 is returned and *errno*
  *		is set appropriately.
  *
+ * BPF_TOKEN_CREATE
+ *	Description
+ *		Create BPF token with embedded information about what
+ *		BPF-related functionality it allows:
+ *		- a set of allowed bpf() syscall commands;
+ *		- a set of allowed BPF map types to be created with
+ *		BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed;
+ *		- a set of allowed BPF program types and BPF program attach
+ *		types to be loaded with BPF_PROG_LOAD command, if
+ *		BPF_PROG_LOAD itself is allowed.
+ *
+ *		BPF token is created (derived) from an instance of BPF FS,
+ *		assuming it has necessary delegation mount options specified.
+ *		BPF FS mount is specified with openat()-style path FD + string.
+ *		This BPF token can be passed as an extra parameter to various
+ *		bpf() syscall commands to grant BPF subsystem functionality to
+ *		unprivileged processes.
+ *
+ *		When created, BPF token is "associated" with the owning
+ *		user namespace of BPF FS instance (super block) that it was
+ *		derived from, and subsequent BPF operations performed with
+ *		BPF token would be performing capabilities checks (i.e.,
+ *		CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within
+ *		that user namespace. Without BPF token, such capabilities
+ *		have to be granted in init user namespace, making bpf()
+ *		syscall incompatible with user namespace, for the most part.
+ *
+ *	Return
+ *		A new file descriptor (a nonnegative integer), or -1 if an
+ *		error occurred (in which case, *errno* is set appropriately).
+ *
  * NOTES
  *	eBPF objects (maps and programs) can be shared between processes.
  *
@@ -901,6 +932,8 @@ enum bpf_cmd {
 	BPF_ITER_CREATE,
 	BPF_LINK_DETACH,
 	BPF_PROG_BIND_MAP,
+	BPF_TOKEN_CREATE,
+	__MAX_BPF_CMD,
 };
 
 enum bpf_map_type {
@@ -1709,6 +1742,12 @@ union bpf_attr {
 		__u32		flags;		/* extra flags */
 	} prog_bind_map;
 
+	struct { /* struct used by BPF_TOKEN_CREATE command */
+		__u32		flags;
+		__u32		bpffs_path_fd;
+		__aligned_u64	bpffs_pathname;
+	} token_create;
+
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
-- 
2.34.1



* [PATCH v9 bpf-next 04/17] bpf: add BPF token support to BPF_MAP_CREATE command
  2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2023-11-03 19:05 ` [PATCH v9 bpf-next 03/17] bpf: introduce BPF token object Andrii Nakryiko
@ 2023-11-03 19:05 ` Andrii Nakryiko
  2023-11-03 19:05 ` [PATCH v9 bpf-next 05/17] bpf: add BPF token support to BPF_BTF_LOAD command Andrii Nakryiko
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-03 19:05 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

Allow providing token_fd for the BPF_MAP_CREATE command to allow controlled
BPF map creation from an unprivileged process through a delegated BPF token.

Wire through a set of allowed BPF map types to BPF token, derived from
BPF FS at BPF token creation time. This, in combination with allowed_cmds,
allows creating a narrowly-focused BPF token (controlled by a privileged
agent) with a restrictive set of BPF map types that an application can
attempt to create.
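
As a purely illustrative note (not part of the patch), both the delegate_maps
mount option and the token's allowed_maps mask are indexed by enum
bpf_map_type values, so a mask permitting only, say, hash and array maps
could be computed as:

  /* illustrative only: bit positions follow enum bpf_map_type */
  __u64 delegate_maps = (1ULL << BPF_MAP_TYPE_HASH) |
                        (1ULL << BPF_MAP_TYPE_ARRAY);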

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h                           |  2 +
 include/uapi/linux/bpf.h                      |  2 +
 kernel/bpf/inode.c                            |  3 +-
 kernel/bpf/syscall.c                          | 52 ++++++++++++++-----
 kernel/bpf/token.c                            | 16 ++++++
 tools/include/uapi/linux/bpf.h                |  2 +
 .../selftests/bpf/prog_tests/libbpf_probes.c  |  2 +
 .../selftests/bpf/prog_tests/libbpf_str.c     |  3 ++
 8 files changed, 67 insertions(+), 15 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5ab418c78876..0d8d88c1939c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1581,6 +1581,7 @@ struct bpf_token {
 	atomic64_t refcnt;
 	struct user_namespace *userns;
 	u64 allowed_cmds;
+	u64 allowed_maps;
 };
 
 struct bpf_struct_ops_value;
@@ -2217,6 +2218,7 @@ int bpf_token_create(union bpf_attr *attr);
 struct bpf_token *bpf_token_get_from_fd(u32 ufd);
 
 bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
+bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type);
 
 int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
 int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f75f289f1749..0f413e4d4771 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -984,6 +984,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_BLOOM_FILTER,
 	BPF_MAP_TYPE_USER_RINGBUF,
 	BPF_MAP_TYPE_CGRP_STORAGE,
+	__MAX_BPF_MAP_TYPE
 };
 
 /* Note that tracing related programs such as
@@ -1431,6 +1432,7 @@ union bpf_attr {
 		 * to using 5 hash functions).
 		 */
 		__u64	map_extra;
+		__u32	map_token_fd;
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 3418a0edf7b4..9c4e30cd4f2e 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -613,7 +613,8 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 	else if (opts->delegate_cmds)
 		seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
 
-	if (opts->delegate_maps == ~0ULL)
+	mask = (1ULL << __MAX_BPF_MAP_TYPE) - 1;
+	if ((opts->delegate_maps & mask) == mask)
 		seq_printf(m, ",delegate_maps=any");
 	else if (opts->delegate_maps)
 		seq_printf(m, ",delegate_maps=0x%llx", opts->delegate_maps);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0565e46509a6..99085f2a558e 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -984,8 +984,8 @@ int map_check_no_btf(const struct bpf_map *map,
 	return -ENOTSUPP;
 }
 
-static int map_check_btf(struct bpf_map *map, const struct btf *btf,
-			 u32 btf_key_id, u32 btf_value_id)
+static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
+			 const struct btf *btf, u32 btf_key_id, u32 btf_value_id)
 {
 	const struct btf_type *key_type, *value_type;
 	u32 key_size, value_size;
@@ -1013,7 +1013,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	if (!IS_ERR_OR_NULL(map->record)) {
 		int i;
 
-		if (!bpf_capable()) {
+		if (!bpf_token_capable(token, CAP_BPF)) {
 			ret = -EPERM;
 			goto free_map_tab;
 		}
@@ -1101,11 +1101,12 @@ static bool bpf_net_capable(void)
 	return capable(CAP_NET_ADMIN) || capable(CAP_SYS_ADMIN);
 }
 
-#define BPF_MAP_CREATE_LAST_FIELD map_extra
+#define BPF_MAP_CREATE_LAST_FIELD map_token_fd
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
 {
 	const struct bpf_map_ops *ops;
+	struct bpf_token *token = NULL;
 	int numa_node = bpf_map_attr_numa_node(attr);
 	u32 map_type = attr->map_type;
 	struct bpf_map *map;
@@ -1156,14 +1157,32 @@ static int map_create(union bpf_attr *attr)
 	if (!ops->map_mem_usage)
 		return -EINVAL;
 
+	if (attr->map_token_fd) {
+		token = bpf_token_get_from_fd(attr->map_token_fd);
+		if (IS_ERR(token))
+			return PTR_ERR(token);
+
+		/* if current token doesn't grant map creation permissions,
+		 * then we can't use this token, so ignore it and rely on
+		 * system-wide capabilities checks
+		 */
+		if (!bpf_token_allow_cmd(token, BPF_MAP_CREATE) ||
+		    !bpf_token_allow_map_type(token, attr->map_type)) {
+			bpf_token_put(token);
+			token = NULL;
+		}
+	}
+
+	err = -EPERM;
+
 	/* Intent here is for unprivileged_bpf_disabled to block BPF map
 	 * creation for unprivileged users; other actions depend
 	 * on fd availability and access to bpffs, so are dependent on
 	 * object creation success. Even with unprivileged BPF disabled,
 	 * capability checks are still carried out.
 	 */
-	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
-		return -EPERM;
+	if (sysctl_unprivileged_bpf_disabled && !bpf_token_capable(token, CAP_BPF))
+		goto put_token;
 
 	/* check privileged map type permissions */
 	switch (map_type) {
@@ -1196,25 +1215,27 @@ static int map_create(union bpf_attr *attr)
 	case BPF_MAP_TYPE_LRU_PERCPU_HASH:
 	case BPF_MAP_TYPE_STRUCT_OPS:
 	case BPF_MAP_TYPE_CPUMAP:
-		if (!bpf_capable())
-			return -EPERM;
+		if (!bpf_token_capable(token, CAP_BPF))
+			goto put_token;
 		break;
 	case BPF_MAP_TYPE_SOCKMAP:
 	case BPF_MAP_TYPE_SOCKHASH:
 	case BPF_MAP_TYPE_DEVMAP:
 	case BPF_MAP_TYPE_DEVMAP_HASH:
 	case BPF_MAP_TYPE_XSKMAP:
-		if (!bpf_net_capable())
-			return -EPERM;
+		if (!bpf_token_capable(token, CAP_NET_ADMIN))
+			goto put_token;
 		break;
 	default:
 		WARN(1, "unsupported map type %d", map_type);
-		return -EPERM;
+		goto put_token;
 	}
 
 	map = ops->map_alloc(attr);
-	if (IS_ERR(map))
-		return PTR_ERR(map);
+	if (IS_ERR(map)) {
+		err = PTR_ERR(map);
+		goto put_token;
+	}
 	map->ops = ops;
 	map->map_type = map_type;
 
@@ -1251,7 +1272,7 @@ static int map_create(union bpf_attr *attr)
 		map->btf = btf;
 
 		if (attr->btf_value_type_id) {
-			err = map_check_btf(map, btf, attr->btf_key_type_id,
+			err = map_check_btf(map, token, btf, attr->btf_key_type_id,
 					    attr->btf_value_type_id);
 			if (err)
 				goto free_map;
@@ -1272,6 +1293,7 @@ static int map_create(union bpf_attr *attr)
 		goto free_map_sec;
 
 	bpf_map_save_memcg(map);
+	bpf_token_put(token);
 
 	err = bpf_map_new_fd(map, f_flags);
 	if (err < 0) {
@@ -1292,6 +1314,8 @@ static int map_create(union bpf_attr *attr)
 free_map:
 	btf_put(map->btf);
 	map->ops->map_free(map);
+put_token:
+	bpf_token_put(token);
 	return err;
 }
 
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
index 32fa395f829b..c6eb25fcc308 100644
--- a/kernel/bpf/token.c
+++ b/kernel/bpf/token.c
@@ -70,6 +70,13 @@ static void bpf_token_show_fdinfo(struct seq_file *m, struct file *filp)
 		seq_printf(m, "allowed_cmds:\tany\n");
 	else
 		seq_printf(m, "allowed_cmds:\t0x%llx\n", token->allowed_cmds);
+
+	BUILD_BUG_ON(__MAX_BPF_MAP_TYPE >= 64);
+	mask = (1ULL << __MAX_BPF_MAP_TYPE) - 1;
+	if ((token->allowed_maps & mask) == mask)
+		seq_printf(m, "allowed_maps:\tany\n");
+	else
+		seq_printf(m, "allowed_maps:\t0x%llx\n", token->allowed_maps);
 }
 
 #define BPF_TOKEN_INODE_NAME "bpf-token"
@@ -147,6 +154,7 @@ int bpf_token_create(union bpf_attr *attr)
 
 	mnt_opts = path.dentry->d_sb->s_fs_info;
 	token->allowed_cmds = mnt_opts->delegate_cmds;
+	token->allowed_maps = mnt_opts->delegate_maps;
 
 	fd = get_unused_fd_flags(O_CLOEXEC);
 	if (fd < 0) {
@@ -195,3 +203,11 @@ bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
 
 	return token->allowed_cmds & (1ULL << cmd);
 }
+
+bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type)
+{
+	if (!token || type >= __MAX_BPF_MAP_TYPE)
+		return false;
+
+	return token->allowed_maps & (1ULL << type);
+}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f75f289f1749..0f413e4d4771 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -984,6 +984,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_BLOOM_FILTER,
 	BPF_MAP_TYPE_USER_RINGBUF,
 	BPF_MAP_TYPE_CGRP_STORAGE,
+	__MAX_BPF_MAP_TYPE
 };
 
 /* Note that tracing related programs such as
@@ -1431,6 +1432,7 @@ union bpf_attr {
 		 * to using 5 hash functions).
 		 */
 		__u64	map_extra;
+		__u32	map_token_fd;
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
diff --git a/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c b/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
index 9f766ddd946a..573249a2814d 100644
--- a/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
+++ b/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
@@ -68,6 +68,8 @@ void test_libbpf_probe_map_types(void)
 
 		if (map_type == BPF_MAP_TYPE_UNSPEC)
 			continue;
+		if (strcmp(map_type_name, "__MAX_BPF_MAP_TYPE") == 0)
+			continue;
 
 		if (!test__start_subtest(map_type_name))
 			continue;
diff --git a/tools/testing/selftests/bpf/prog_tests/libbpf_str.c b/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
index c440ea3311ed..2a0633f43c73 100644
--- a/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
+++ b/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
@@ -132,6 +132,9 @@ static void test_libbpf_bpf_map_type_str(void)
 		const char *map_type_str;
 		char buf[256];
 
+		if (map_type == __MAX_BPF_MAP_TYPE)
+			continue;
+
 		map_type_name = btf__str_by_offset(btf, e->name_off);
 		map_type_str = libbpf_bpf_map_type_str(map_type);
 		ASSERT_OK_PTR(map_type_str, map_type_name);
-- 
2.34.1



* [PATCH v9 bpf-next 05/17] bpf: add BPF token support to BPF_BTF_LOAD command
  2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2023-11-03 19:05 ` [PATCH v9 bpf-next 04/17] bpf: add BPF token support to BPF_MAP_CREATE command Andrii Nakryiko
@ 2023-11-03 19:05 ` Andrii Nakryiko
  2023-11-03 19:05 ` [PATCH v9 bpf-next 06/17] bpf: add BPF token support to BPF_PROG_LOAD command Andrii Nakryiko
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-03 19:05 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

Accept a BPF token FD in the BPF_BTF_LOAD command to allow BTF data
loading through a delegated BPF token. BTF loading is a pretty
straightforward operation, so as long as the BPF token is created with
allowed_cmds granting the BPF_BTF_LOAD command, the kernel proceeds to
parsing BTF data and creating a BTF object.
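
For illustration only (not part of this patch), a delegated BTF load
from userspace could look like the hedged sketch below. It assumes UAPI
headers updated with this series and a token_fd obtained earlier via
BPF_TOKEN_CREATE from a delegated BPF FS mount; btf_data/btf_size are
assumed to be prepared by the caller.

  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  /* Hedged sketch: load raw BTF data through a delegated BPF token. */
  static int btf_load_with_token(const void *btf_data, __u32 btf_size,
                                 int token_fd)
  {
          union bpf_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.btf = (__u64)(unsigned long)btf_data;
          attr.btf_size = btf_size;
          attr.btf_token_fd = token_fd; /* new field added by this patch */

          return syscall(__NR_bpf, BPF_BTF_LOAD, &attr, sizeof(attr));
  }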

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/uapi/linux/bpf.h       |  1 +
 kernel/bpf/syscall.c           | 20 ++++++++++++++++++--
 tools/include/uapi/linux/bpf.h |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0f413e4d4771..022c2ef4d0eb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1614,6 +1614,7 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		btf_log_true_size;
+		__u32		btf_token_fd;
 	};
 
 	struct {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 99085f2a558e..6158ee3f61bb 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4751,15 +4751,31 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
 	return err;
 }
 
-#define BPF_BTF_LOAD_LAST_FIELD btf_log_true_size
+#define BPF_BTF_LOAD_LAST_FIELD btf_token_fd
 
 static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
 {
+	struct bpf_token *token = NULL;
+
 	if (CHECK_ATTR(BPF_BTF_LOAD))
 		return -EINVAL;
 
-	if (!bpf_capable())
+	if (attr->btf_token_fd) {
+		token = bpf_token_get_from_fd(attr->btf_token_fd);
+		if (IS_ERR(token))
+			return PTR_ERR(token);
+		if (!bpf_token_allow_cmd(token, BPF_BTF_LOAD)) {
+			bpf_token_put(token);
+			token = NULL;
+		}
+	}
+
+	if (!bpf_token_capable(token, CAP_BPF)) {
+		bpf_token_put(token);
 		return -EPERM;
+	}
+
+	bpf_token_put(token);
 
 	return btf_new_fd(attr, uattr, uattr_size);
 }
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 0f413e4d4771..022c2ef4d0eb 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1614,6 +1614,7 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		btf_log_true_size;
+		__u32		btf_token_fd;
 	};
 
 	struct {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v9 bpf-next 06/17] bpf: add BPF token support to BPF_PROG_LOAD command
  2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2023-11-03 19:05 ` [PATCH v9 bpf-next 05/17] bpf: add BPF token support to BPF_BTF_LOAD command Andrii Nakryiko
@ 2023-11-03 19:05 ` Andrii Nakryiko
  2023-11-03 19:05 ` [PATCH v9 bpf-next 07/17] bpf: take into account BPF token when fetching helper protos Andrii Nakryiko
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-03 19:05 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

Add basic BPF token support to BPF_PROG_LOAD. Wire through a set of
allowed BPF program types and attach types, derived from the BPF FS
instance at BPF token creation time. Then make sure we perform
bpf_token_capable() checks everywhere it's relevant.
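
For illustration only (not part of this patch), a minimal delegated
program load could look like the hedged sketch below. It assumes UAPI
headers updated with this series and a token_fd derived from a BPF FS
instance whose delegate_cmds/delegate_progs/delegate_attachs options
cover this program type.

  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  /* Hedged sketch: load a trivial "return 0" socket filter via a token. */
  static int prog_load_with_token(int token_fd)
  {
          const struct bpf_insn insns[] = {
                  /* r0 = 0; exit */
                  { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0 },
                  { .code = BPF_JMP | BPF_EXIT },
          };
          union bpf_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
          attr.insns = (__u64)(unsigned long)insns;
          attr.insn_cnt = 2;
          attr.license = (__u64)(unsigned long)"GPL";
          attr.prog_token_fd = token_fd; /* new field added by this patch */

          return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
  }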

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h                           |  6 ++
 include/uapi/linux/bpf.h                      |  2 +
 kernel/bpf/core.c                             |  1 +
 kernel/bpf/inode.c                            |  6 +-
 kernel/bpf/syscall.c                          | 87 ++++++++++++++-----
 kernel/bpf/token.c                            | 27 ++++++
 tools/include/uapi/linux/bpf.h                |  2 +
 .../selftests/bpf/prog_tests/libbpf_probes.c  |  2 +
 .../selftests/bpf/prog_tests/libbpf_str.c     |  3 +
 9 files changed, 110 insertions(+), 26 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0d8d88c1939c..048d115db3e8 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1442,6 +1442,7 @@ struct bpf_prog_aux {
 #ifdef CONFIG_SECURITY
 	void *security;
 #endif
+	struct bpf_token *token;
 	struct bpf_prog_offload *offload;
 	struct btf *btf;
 	struct bpf_func_info *func_info;
@@ -1582,6 +1583,8 @@ struct bpf_token {
 	struct user_namespace *userns;
 	u64 allowed_cmds;
 	u64 allowed_maps;
+	u64 allowed_progs;
+	u64 allowed_attachs;
 };
 
 struct bpf_struct_ops_value;
@@ -2219,6 +2222,9 @@ struct bpf_token *bpf_token_get_from_fd(u32 ufd);
 
 bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
 bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type);
+bool bpf_token_allow_prog_type(const struct bpf_token *token,
+			       enum bpf_prog_type prog_type,
+			       enum bpf_attach_type attach_type);
 
 int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
 int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 022c2ef4d0eb..97b451e9d7e6 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1029,6 +1029,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
 	BPF_PROG_TYPE_NETFILTER,
+	__MAX_BPF_PROG_TYPE
 };
 
 enum bpf_attach_type {
@@ -1502,6 +1503,7 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		log_true_size;
+		__u32		prog_token_fd;
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 08626b519ce2..fc8de25b7948 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2747,6 +2747,7 @@ void bpf_prog_free(struct bpf_prog *fp)
 
 	if (aux->dst_prog)
 		bpf_prog_put(aux->dst_prog);
+	bpf_token_put(aux->token);
 	INIT_WORK(&aux->work, bpf_prog_free_deferred);
 	schedule_work(&aux->work);
 }
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 9c4e30cd4f2e..8c6182488325 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -619,12 +619,14 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 	else if (opts->delegate_maps)
 		seq_printf(m, ",delegate_maps=0x%llx", opts->delegate_maps);
 
-	if (opts->delegate_progs == ~0ULL)
+	mask = (1ULL << __MAX_BPF_PROG_TYPE) - 1;
+	if ((opts->delegate_progs & mask) == mask)
 		seq_printf(m, ",delegate_progs=any");
 	else if (opts->delegate_progs)
 		seq_printf(m, ",delegate_progs=0x%llx", opts->delegate_progs);
 
-	if (opts->delegate_attachs == ~0ULL)
+	mask = (1ULL << __MAX_BPF_ATTACH_TYPE) - 1;
+	if ((opts->delegate_attachs & mask) == mask)
 		seq_printf(m, ",delegate_attachs=any");
 	else if (opts->delegate_attachs)
 		seq_printf(m, ",delegate_attachs=0x%llx", opts->delegate_attachs);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6158ee3f61bb..fcb867bbc517 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2583,13 +2583,15 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
 }
 
 /* last field in 'union bpf_attr' used by this command */
-#define	BPF_PROG_LOAD_LAST_FIELD log_true_size
+#define BPF_PROG_LOAD_LAST_FIELD prog_token_fd
 
 static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 {
 	enum bpf_prog_type type = attr->prog_type;
 	struct bpf_prog *prog, *dst_prog = NULL;
 	struct btf *attach_btf = NULL;
+	struct bpf_token *token = NULL;
+	bool bpf_cap;
 	int err;
 	char license[128];
 
@@ -2605,10 +2607,31 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 				 BPF_F_XDP_DEV_BOUND_ONLY))
 		return -EINVAL;
 
+	bpf_prog_load_fixup_attach_type(attr);
+
+	if (attr->prog_token_fd) {
+		token = bpf_token_get_from_fd(attr->prog_token_fd);
+		if (IS_ERR(token))
+			return PTR_ERR(token);
+		/* if current token doesn't grant prog loading permissions,
+		 * then we can't use this token, so ignore it and rely on
+		 * system-wide capabilities checks
+		 */
+		if (!bpf_token_allow_cmd(token, BPF_PROG_LOAD) ||
+		    !bpf_token_allow_prog_type(token, attr->prog_type,
+					       attr->expected_attach_type)) {
+			bpf_token_put(token);
+			token = NULL;
+		}
+	}
+
+	bpf_cap = bpf_token_capable(token, CAP_BPF);
+	err = -EPERM;
+
 	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
 	    (attr->prog_flags & BPF_F_ANY_ALIGNMENT) &&
-	    !bpf_capable())
-		return -EPERM;
+	    !bpf_cap)
+		goto put_token;
 
 	/* Intent here is for unprivileged_bpf_disabled to block BPF program
 	 * creation for unprivileged users; other actions depend
@@ -2617,21 +2640,23 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	 * capability checks are still carried out for these
 	 * and other operations.
 	 */
-	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
-		return -EPERM;
+	if (sysctl_unprivileged_bpf_disabled && !bpf_cap)
+		goto put_token;
 
 	if (attr->insn_cnt == 0 ||
-	    attr->insn_cnt > (bpf_capable() ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
-		return -E2BIG;
+	    attr->insn_cnt > (bpf_cap ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS)) {
+		err = -E2BIG;
+		goto put_token;
+	}
 	if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
 	    type != BPF_PROG_TYPE_CGROUP_SKB &&
-	    !bpf_capable())
-		return -EPERM;
+	    !bpf_cap)
+		goto put_token;
 
-	if (is_net_admin_prog_type(type) && !bpf_net_capable())
-		return -EPERM;
-	if (is_perfmon_prog_type(type) && !perfmon_capable())
-		return -EPERM;
+	if (is_net_admin_prog_type(type) && !bpf_token_capable(token, CAP_NET_ADMIN))
+		goto put_token;
+	if (is_perfmon_prog_type(type) && !bpf_token_capable(token, CAP_PERFMON))
+		goto put_token;
 
 	/* attach_prog_fd/attach_btf_obj_fd can specify fd of either bpf_prog
 	 * or btf, we need to check which one it is
@@ -2641,27 +2666,33 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 		if (IS_ERR(dst_prog)) {
 			dst_prog = NULL;
 			attach_btf = btf_get_by_fd(attr->attach_btf_obj_fd);
-			if (IS_ERR(attach_btf))
-				return -EINVAL;
+			if (IS_ERR(attach_btf)) {
+				err = -EINVAL;
+				goto put_token;
+			}
 			if (!btf_is_kernel(attach_btf)) {
 				/* attaching through specifying bpf_prog's BTF
 				 * objects directly might be supported eventually
 				 */
 				btf_put(attach_btf);
-				return -ENOTSUPP;
+				err = -ENOTSUPP;
+				goto put_token;
 			}
 		}
 	} else if (attr->attach_btf_id) {
 		/* fall back to vmlinux BTF, if BTF type ID is specified */
 		attach_btf = bpf_get_btf_vmlinux();
-		if (IS_ERR(attach_btf))
-			return PTR_ERR(attach_btf);
-		if (!attach_btf)
-			return -EINVAL;
+		if (IS_ERR(attach_btf)) {
+			err = PTR_ERR(attach_btf);
+			goto put_token;
+		}
+		if (!attach_btf) {
+			err = -EINVAL;
+			goto put_token;
+		}
 		btf_get(attach_btf);
 	}
 
-	bpf_prog_load_fixup_attach_type(attr);
 	if (bpf_prog_load_check_attach(type, attr->expected_attach_type,
 				       attach_btf, attr->attach_btf_id,
 				       dst_prog)) {
@@ -2669,7 +2700,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 			bpf_prog_put(dst_prog);
 		if (attach_btf)
 			btf_put(attach_btf);
-		return -EINVAL;
+		err = -EINVAL;
+		goto put_token;
 	}
 
 	/* plain bpf_prog allocation */
@@ -2679,7 +2711,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 			bpf_prog_put(dst_prog);
 		if (attach_btf)
 			btf_put(attach_btf);
-		return -ENOMEM;
+		err = -EINVAL;
+		goto put_token;
 	}
 
 	prog->expected_attach_type = attr->expected_attach_type;
@@ -2690,6 +2723,10 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
 	prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS;
 
+	/* move token into prog->aux, reuse taken refcnt */
+	prog->aux->token = token;
+	token = NULL;
+
 	err = security_bpf_prog_alloc(prog->aux);
 	if (err)
 		goto free_prog;
@@ -2791,6 +2828,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	if (prog->aux->attach_btf)
 		btf_put(prog->aux->attach_btf);
 	bpf_prog_free(prog);
+put_token:
+	bpf_token_put(token);
 	return err;
 }
 
@@ -3780,7 +3819,7 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
 	case BPF_PROG_TYPE_SK_LOOKUP:
 		return attach_type == prog->expected_attach_type ? 0 : -EINVAL;
 	case BPF_PROG_TYPE_CGROUP_SKB:
-		if (!bpf_net_capable())
+		if (!bpf_token_capable(prog->aux->token, CAP_NET_ADMIN))
 			/* cg-skb progs can be loaded by unpriv user.
 			 * check permissions at attach time.
 			 */
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
index c6eb25fcc308..35e6f55c2a41 100644
--- a/kernel/bpf/token.c
+++ b/kernel/bpf/token.c
@@ -77,6 +77,20 @@ static void bpf_token_show_fdinfo(struct seq_file *m, struct file *filp)
 		seq_printf(m, "allowed_maps:\tany\n");
 	else
 		seq_printf(m, "allowed_maps:\t0x%llx\n", token->allowed_maps);
+
+	BUILD_BUG_ON(__MAX_BPF_PROG_TYPE >= 64);
+	mask = (1ULL << __MAX_BPF_PROG_TYPE) - 1;
+	if ((token->allowed_progs & mask) == mask)
+		seq_printf(m, "allowed_progs:\tany\n");
+	else
+		seq_printf(m, "allowed_progs:\t0x%llx\n", token->allowed_progs);
+
+	BUILD_BUG_ON(__MAX_BPF_ATTACH_TYPE >= 64);
+	mask = (1ULL << __MAX_BPF_ATTACH_TYPE) - 1;
+	if ((token->allowed_attachs & mask) == mask)
+		seq_printf(m, "allowed_attachs:\tany\n");
+	else
+		seq_printf(m, "allowed_attachs:\t0x%llx\n", token->allowed_attachs);
 }
 
 #define BPF_TOKEN_INODE_NAME "bpf-token"
@@ -155,6 +169,8 @@ int bpf_token_create(union bpf_attr *attr)
 	mnt_opts = path.dentry->d_sb->s_fs_info;
 	token->allowed_cmds = mnt_opts->delegate_cmds;
 	token->allowed_maps = mnt_opts->delegate_maps;
+	token->allowed_progs = mnt_opts->delegate_progs;
+	token->allowed_attachs = mnt_opts->delegate_attachs;
 
 	fd = get_unused_fd_flags(O_CLOEXEC);
 	if (fd < 0) {
@@ -211,3 +227,14 @@ bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type t
 
 	return token->allowed_maps & (1ULL << type);
 }
+
+bool bpf_token_allow_prog_type(const struct bpf_token *token,
+			       enum bpf_prog_type prog_type,
+			       enum bpf_attach_type attach_type)
+{
+	if (!token || prog_type >= __MAX_BPF_PROG_TYPE || attach_type >= __MAX_BPF_ATTACH_TYPE)
+		return false;
+
+	return (token->allowed_progs & (1ULL << prog_type)) &&
+	       (token->allowed_attachs & (1ULL << attach_type));
+}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 022c2ef4d0eb..97b451e9d7e6 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1029,6 +1029,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
 	BPF_PROG_TYPE_NETFILTER,
+	__MAX_BPF_PROG_TYPE
 };
 
 enum bpf_attach_type {
@@ -1502,6 +1503,7 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		log_true_size;
+		__u32		prog_token_fd;
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c b/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
index 573249a2814d..4ed46ed58a7b 100644
--- a/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
+++ b/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
@@ -30,6 +30,8 @@ void test_libbpf_probe_prog_types(void)
 
 		if (prog_type == BPF_PROG_TYPE_UNSPEC)
 			continue;
+		if (strcmp(prog_type_name, "__MAX_BPF_PROG_TYPE") == 0)
+			continue;
 
 		if (!test__start_subtest(prog_type_name))
 			continue;
diff --git a/tools/testing/selftests/bpf/prog_tests/libbpf_str.c b/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
index 2a0633f43c73..384bc1f7a65e 100644
--- a/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
+++ b/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
@@ -189,6 +189,9 @@ static void test_libbpf_bpf_prog_type_str(void)
 		const char *prog_type_str;
 		char buf[256];
 
+		if (prog_type == __MAX_BPF_PROG_TYPE)
+			continue;
+
 		prog_type_name = btf__str_by_offset(btf, e->name_off);
 		prog_type_str = libbpf_bpf_prog_type_str(prog_type);
 		ASSERT_OK_PTR(prog_type_str, prog_type_name);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v9 bpf-next 07/17] bpf: take into account BPF token when fetching helper protos
  2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (5 preceding siblings ...)
  2023-11-03 19:05 ` [PATCH v9 bpf-next 06/17] bpf: add BPF token support to BPF_PROG_LOAD command Andrii Nakryiko
@ 2023-11-03 19:05 ` Andrii Nakryiko
  2023-11-03 19:05 ` [PATCH v9 bpf-next 08/17] bpf: consistently use BPF token throughout BPF verifier logic Andrii Nakryiko
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-03 19:05 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

Instead of performing unconditional system-wide bpf_capable() and
perfmon_capable() calls inside the bpf_base_func_proto() function (and
other similar ones) to determine the eligibility of a given BPF helper
for a given program, use the BPF token recorded during BPF_PROG_LOAD
command handling to inform the decision.
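
The resulting pattern in the per-program-type *_func_proto() callbacks
is roughly the following; this is an illustrative sketch that mirrors
the hunks below rather than an additional change:

  static const struct bpf_func_proto *
  example_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
  {
          switch (func_id) {
          case BPF_FUNC_trace_printk:
                  /* privileged helper: requires CAP_PERFMON, possibly
                   * delegated through prog->aux->token
                   */
                  if (bpf_token_capable(prog->aux->token, CAP_PERFMON))
                          return bpf_get_trace_printk_proto();
                  return NULL;
          default:
                  /* common helpers: fall back to the shared resolver,
                   * which now also takes prog to reach its token
                   */
                  return bpf_base_func_proto(func_id, prog);
          }
  }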

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 drivers/media/rc/bpf-lirc.c |  2 +-
 include/linux/bpf.h         |  5 +++--
 kernel/bpf/cgroup.c         |  6 +++---
 kernel/bpf/helpers.c        |  6 +++---
 kernel/bpf/syscall.c        |  5 +++--
 kernel/trace/bpf_trace.c    |  2 +-
 net/core/filter.c           | 32 ++++++++++++++++----------------
 net/ipv4/bpf_tcp_ca.c       |  2 +-
 net/netfilter/nf_bpf_link.c |  2 +-
 9 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
index fe17c7f98e81..6d07693c6b9f 100644
--- a/drivers/media/rc/bpf-lirc.c
+++ b/drivers/media/rc/bpf-lirc.c
@@ -110,7 +110,7 @@ lirc_mode2_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_get_prandom_u32:
 		return &bpf_get_prandom_u32_proto;
 	case BPF_FUNC_trace_printk:
-		if (perfmon_capable())
+		if (bpf_token_capable(prog->aux->token, CAP_PERFMON))
 			return bpf_get_trace_printk_proto();
 		fallthrough;
 	default:
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 048d115db3e8..fae4f780fd82 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2473,7 +2473,8 @@ const char *btf_find_decl_tag_value(const struct btf *btf, const struct btf_type
 struct bpf_prog *bpf_prog_by_id(u32 id);
 struct bpf_link *bpf_link_by_id(u32 id);
 
-const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
+const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id,
+						 const struct bpf_prog *prog);
 void bpf_task_storage_free(struct task_struct *task);
 void bpf_cgrp_storage_free(struct cgroup *cgroup);
 bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
@@ -2733,7 +2734,7 @@ static inline int btf_struct_access(struct bpf_verifier_log *log,
 }
 
 static inline const struct bpf_func_proto *
-bpf_base_func_proto(enum bpf_func_id func_id)
+bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	return NULL;
 }
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 491d20038cbe..98e0e3835b28 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1630,7 +1630,7 @@ cgroup_dev_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -2191,7 +2191,7 @@ sysctl_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -2348,7 +2348,7 @@ cg_sockopt_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index e46ac288a108..6d81ee27f972 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1669,7 +1669,7 @@ const struct bpf_func_proto bpf_probe_read_kernel_str_proto __weak;
 const struct bpf_func_proto bpf_task_pt_regs_proto __weak;
 
 const struct bpf_func_proto *
-bpf_base_func_proto(enum bpf_func_id func_id)
+bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_map_lookup_elem:
@@ -1720,7 +1720,7 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		break;
 	}
 
-	if (!bpf_capable())
+	if (!bpf_token_capable(prog->aux->token, CAP_BPF))
 		return NULL;
 
 	switch (func_id) {
@@ -1778,7 +1778,7 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		break;
 	}
 
-	if (!perfmon_capable())
+	if (!bpf_token_capable(prog->aux->token, CAP_PERFMON))
 		return NULL;
 
 	switch (func_id) {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index fcb867bbc517..a48d3d5f83ea 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5681,7 +5681,7 @@ static const struct bpf_func_proto bpf_sys_bpf_proto = {
 const struct bpf_func_proto * __weak
 tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
-	return bpf_base_func_proto(func_id);
+	return bpf_base_func_proto(func_id, prog);
 }
 
 BPF_CALL_1(bpf_sys_close, u32, fd)
@@ -5731,7 +5731,8 @@ syscall_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_sys_bpf:
-		return !perfmon_capable() ? NULL : &bpf_sys_bpf_proto;
+		return !bpf_token_capable(prog->aux->token, CAP_PERFMON)
+		       ? NULL : &bpf_sys_bpf_proto;
 	case BPF_FUNC_btf_find_by_name_kind:
 		return &bpf_btf_find_by_name_kind_proto;
 	case BPF_FUNC_sys_close:
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index df697c74d519..ef22a2e870b3 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1557,7 +1557,7 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_trace_vprintk:
 		return bpf_get_trace_vprintk_proto();
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 21d75108c2e9..cd2ef9fb4f1e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -86,7 +86,7 @@
 #include "dev.h"
 
 static const struct bpf_func_proto *
-bpf_sk_base_func_proto(enum bpf_func_id func_id);
+bpf_sk_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
 
 int copy_bpf_fprog_from_user(struct sock_fprog *dst, sockptr_t src, int len)
 {
@@ -7839,7 +7839,7 @@ sock_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -7932,7 +7932,7 @@ sock_addr_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 			return NULL;
 		}
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -7951,7 +7951,7 @@ sk_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_skb_event_output_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8138,7 +8138,7 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 #endif
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8197,7 +8197,7 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 #endif
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 
 #if IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES)
@@ -8258,7 +8258,7 @@ sock_ops_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_tcp_sock_proto;
 #endif /* CONFIG_INET */
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8300,7 +8300,7 @@ sk_msg_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_cgroup_classid_curr_proto;
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8344,7 +8344,7 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_skc_lookup_tcp_proto;
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8355,7 +8355,7 @@ flow_dissector_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skb_load_bytes:
 		return &bpf_flow_dissector_load_bytes_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8382,7 +8382,7 @@ lwt_out_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skb_under_cgroup:
 		return &bpf_skb_under_cgroup_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -11213,7 +11213,7 @@ sk_reuseport_func_proto(enum bpf_func_id func_id,
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -11395,7 +11395,7 @@ sk_lookup_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_sk_release:
 		return &bpf_sk_release_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -11729,7 +11729,7 @@ const struct bpf_func_proto bpf_sock_from_file_proto = {
 };
 
 static const struct bpf_func_proto *
-bpf_sk_base_func_proto(enum bpf_func_id func_id)
+bpf_sk_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	const struct bpf_func_proto *func;
 
@@ -11758,10 +11758,10 @@ bpf_sk_base_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 
-	if (!perfmon_capable())
+	if (!bpf_token_capable(prog->aux->token, CAP_PERFMON))
 		return NULL;
 
 	return func;
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 39dcccf0f174..c7bbd8f3c708 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -191,7 +191,7 @@ bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id,
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
diff --git a/net/netfilter/nf_bpf_link.c b/net/netfilter/nf_bpf_link.c
index e502ec00b2fe..1969facac91c 100644
--- a/net/netfilter/nf_bpf_link.c
+++ b/net/netfilter/nf_bpf_link.c
@@ -314,7 +314,7 @@ static bool nf_is_valid_access(int off, int size, enum bpf_access_type type,
 static const struct bpf_func_proto *
 bpf_nf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
-	return bpf_base_func_proto(func_id);
+	return bpf_base_func_proto(func_id, prog);
 }
 
 const struct bpf_verifier_ops netfilter_verifier_ops = {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v9 bpf-next 08/17] bpf: consistently use BPF token throughout BPF verifier logic
  2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (6 preceding siblings ...)
  2023-11-03 19:05 ` [PATCH v9 bpf-next 07/17] bpf: take into account BPF token when fetching helper protos Andrii Nakryiko
@ 2023-11-03 19:05 ` Andrii Nakryiko
  2023-11-03 19:05 ` [PATCH v9 bpf-next 09/17] bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooks Andrii Nakryiko
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-03 19:05 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

Remove the remaining direct queries to perfmon_capable() and
bpf_capable() in the BPF verifier logic and instead use the BPF token
(if available) to make decisions about privileges.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h    | 16 ++++++++--------
 include/linux/filter.h |  2 +-
 kernel/bpf/arraymap.c  |  2 +-
 kernel/bpf/core.c      |  2 +-
 kernel/bpf/verifier.c  | 13 ++++++-------
 net/core/filter.c      |  4 ++--
 6 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index fae4f780fd82..975aa7790d51 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2181,24 +2181,24 @@ extern int sysctl_unprivileged_bpf_disabled;
 
 bool bpf_token_capable(const struct bpf_token *token, int cap);
 
-static inline bool bpf_allow_ptr_leaks(void)
+static inline bool bpf_allow_ptr_leaks(const struct bpf_token *token)
 {
-	return perfmon_capable();
+	return bpf_token_capable(token, CAP_PERFMON);
 }
 
-static inline bool bpf_allow_uninit_stack(void)
+static inline bool bpf_allow_uninit_stack(const struct bpf_token *token)
 {
-	return perfmon_capable();
+	return bpf_token_capable(token, CAP_PERFMON);
 }
 
-static inline bool bpf_bypass_spec_v1(void)
+static inline bool bpf_bypass_spec_v1(const struct bpf_token *token)
 {
-	return cpu_mitigations_off() || perfmon_capable();
+	return cpu_mitigations_off() || bpf_token_capable(token, CAP_PERFMON);
 }
 
-static inline bool bpf_bypass_spec_v4(void)
+static inline bool bpf_bypass_spec_v4(const struct bpf_token *token)
 {
-	return cpu_mitigations_off() || perfmon_capable();
+	return cpu_mitigations_off() || bpf_token_capable(token, CAP_PERFMON);
 }
 
 int bpf_map_new_fd(struct bpf_map *map, int flags);
diff --git a/include/linux/filter.h b/include/linux/filter.h
index a4953fafc8cb..14354605ad26 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1139,7 +1139,7 @@ static inline bool bpf_jit_blinding_enabled(struct bpf_prog *prog)
 		return false;
 	if (!bpf_jit_harden)
 		return false;
-	if (bpf_jit_harden == 1 && bpf_capable())
+	if (bpf_jit_harden == 1 && bpf_token_capable(prog->aux->token, CAP_BPF))
 		return false;
 
 	return true;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 2058e89b5ddd..f0c64df6b6ff 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -82,7 +82,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
 	int numa_node = bpf_map_attr_numa_node(attr);
 	u32 elem_size, index_mask, max_entries;
-	bool bypass_spec_v1 = bpf_bypass_spec_v1();
+	bool bypass_spec_v1 = bpf_bypass_spec_v1(NULL);
 	u64 array_size, mask64;
 	struct bpf_array *array;
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index fc8de25b7948..ce307440fa8d 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -675,7 +675,7 @@ static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
 void bpf_prog_kallsyms_add(struct bpf_prog *fp)
 {
 	if (!bpf_prog_kallsyms_candidate(fp) ||
-	    !bpf_capable())
+	    !bpf_token_capable(fp->aux->token, CAP_BPF))
 		return;
 
 	bpf_prog_ksym_set_addr(fp);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2197385d91dc..b70782dffb3e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -20655,7 +20655,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	env->prog = *prog;
 	env->ops = bpf_verifier_ops[env->prog->type];
 	env->fd_array = make_bpfptr(attr->fd_array, uattr.is_kernel);
-	is_priv = bpf_capable();
+
+	env->allow_ptr_leaks = bpf_allow_ptr_leaks(env->prog->aux->token);
+	env->allow_uninit_stack = bpf_allow_uninit_stack(env->prog->aux->token);
+	env->bypass_spec_v1 = bpf_bypass_spec_v1(env->prog->aux->token);
+	env->bypass_spec_v4 = bpf_bypass_spec_v4(env->prog->aux->token);
+	env->bpf_capable = is_priv = bpf_token_capable(env->prog->aux->token, CAP_BPF);
 
 	bpf_get_btf_vmlinux();
 
@@ -20687,12 +20692,6 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	if (attr->prog_flags & BPF_F_ANY_ALIGNMENT)
 		env->strict_alignment = false;
 
-	env->allow_ptr_leaks = bpf_allow_ptr_leaks();
-	env->allow_uninit_stack = bpf_allow_uninit_stack();
-	env->bypass_spec_v1 = bpf_bypass_spec_v1();
-	env->bypass_spec_v4 = bpf_bypass_spec_v4();
-	env->bpf_capable = bpf_capable();
-
 	if (is_priv)
 		env->test_state_freq = attr->prog_flags & BPF_F_TEST_STATE_FREQ;
 
diff --git a/net/core/filter.c b/net/core/filter.c
index cd2ef9fb4f1e..a3de4a7d3733 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8557,7 +8557,7 @@ static bool cg_skb_is_valid_access(int off, int size,
 		return false;
 	case bpf_ctx_range(struct __sk_buff, data):
 	case bpf_ctx_range(struct __sk_buff, data_end):
-		if (!bpf_capable())
+		if (!bpf_token_capable(prog->aux->token, CAP_BPF))
 			return false;
 		break;
 	}
@@ -8569,7 +8569,7 @@ static bool cg_skb_is_valid_access(int off, int size,
 		case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
 			break;
 		case bpf_ctx_range(struct __sk_buff, tstamp):
-			if (!bpf_capable())
+			if (!bpf_token_capable(prog->aux->token, CAP_BPF))
 				return false;
 			break;
 		default:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v9 bpf-next 09/17] bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooks
  2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (7 preceding siblings ...)
  2023-11-03 19:05 ` [PATCH v9 bpf-next 08/17] bpf: consistently use BPF token throughout BPF verifier logic Andrii Nakryiko
@ 2023-11-03 19:05 ` Andrii Nakryiko
  2023-11-06  5:01   ` [PATCH v9 9/17] " Paul Moore
  2023-11-03 19:05 ` [PATCH v9 bpf-next 10/17] bpf,lsm: refactor bpf_map_alloc/bpf_map_free " Andrii Nakryiko
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-03 19:05 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

Based on upstream discussion ([0]), rework the existing
bpf_prog_alloc_security LSM hook. Rename it to bpf_prog_load and,
instead of passing bpf_prog_aux, pass the bpf_prog pointer to the full
BPF program struct. Also pass the bpf_attr union with all the
user-provided arguments for the BPF_PROG_LOAD command. This gives LSMs
as much information as we can reasonably provide.

The hook is also BPF token-aware now, and an optional bpf_token struct
is passed as the third argument. The bpf_prog_load LSM hook is called
after a bunch of sanity checks have been performed and bpf_prog and
bpf_prog_aux have been allocated and filled out, but right before the
full-fledged BPF verification step.

The bpf_prog_free LSM hook now accepts a struct bpf_prog argument, for
consistency. SELinux code is adjusted to the new names, types, and
signatures.

Note, given that the bpf_prog_load (previously bpf_prog_alloc) hook can
be used by some LSMs to allocate an extra security blob, but also by
other LSMs to reject BPF program loading, we need to make sure that the
bpf_prog_free LSM hook is called after the bpf_prog_load/bpf_prog_alloc
one *even* if the hook itself returned an error. If we don't do that,
we run the risk of leaking memory. This seems to be possible today when
combining SELinux and BPF LSM, as one example, depending on their
relative ordering.

Also, for the BPF LSM setup, add bpf_prog_load and bpf_prog_free to the
sleepable LSM hooks list, as they are both executed in sleepable
context. Also drop the bpf_prog_load hook from the untrusted list, as
there is no longer the refcount issue (or anything else) that
originally forced us to add it to the untrusted list in c0c852dd1876
("bpf: Do not mark certain LSM hook arguments as trusted"). We now
trigger this hook much later, and it should not be an issue anymore.

  [0] https://lore.kernel.org/bpf/9fe88aef7deabbe87d3fc38c4aea3c69.paul@paul-moore.com/
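
To make the new hook shape concrete, a minimal BPF LSM program
attaching to it could look like the hedged sketch below. It is not part
of this patch, it assumes a vmlinux.h generated from a kernel with this
series applied, and the deny-token-less-loads policy is purely an
example.

  #include "vmlinux.h"
  #include <errno.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char LICENSE[] SEC("license") = "GPL";

  /* Example policy: only permit BPF program loads that go through
   * a delegated BPF token.
   */
  SEC("lsm/bpf_prog_load")
  int BPF_PROG(example_prog_load, struct bpf_prog *prog,
               union bpf_attr *attr, struct bpf_token *token)
  {
          return token ? 0 : -EPERM;
  }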

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/lsm_hook_defs.h |  5 +++--
 include/linux/security.h      | 12 +++++++-----
 kernel/bpf/bpf_lsm.c          |  5 +++--
 kernel/bpf/syscall.c          | 25 +++++++++++++------------
 security/security.c           | 25 +++++++++++++++----------
 security/selinux/hooks.c      | 15 ++++++++-------
 6 files changed, 49 insertions(+), 38 deletions(-)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 99b8176c3738..085989e04209 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -400,8 +400,9 @@ LSM_HOOK(int, 0, bpf_map, struct bpf_map *map, fmode_t fmode)
 LSM_HOOK(int, 0, bpf_prog, struct bpf_prog *prog)
 LSM_HOOK(int, 0, bpf_map_alloc_security, struct bpf_map *map)
 LSM_HOOK(void, LSM_RET_VOID, bpf_map_free_security, struct bpf_map *map)
-LSM_HOOK(int, 0, bpf_prog_alloc_security, struct bpf_prog_aux *aux)
-LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free_security, struct bpf_prog_aux *aux)
+LSM_HOOK(int, 0, bpf_prog_load, struct bpf_prog *prog, union bpf_attr *attr,
+	 struct bpf_token *token)
+LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free, struct bpf_prog *prog)
 #endif /* CONFIG_BPF_SYSCALL */
 
 LSM_HOOK(int, 0, locked_down, enum lockdown_reason what)
diff --git a/include/linux/security.h b/include/linux/security.h
index 1d1df326c881..65467eef6678 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -2020,15 +2020,16 @@ static inline void securityfs_remove(struct dentry *dentry)
 union bpf_attr;
 struct bpf_map;
 struct bpf_prog;
-struct bpf_prog_aux;
+struct bpf_token;
 #ifdef CONFIG_SECURITY
 extern int security_bpf(int cmd, union bpf_attr *attr, unsigned int size);
 extern int security_bpf_map(struct bpf_map *map, fmode_t fmode);
 extern int security_bpf_prog(struct bpf_prog *prog);
 extern int security_bpf_map_alloc(struct bpf_map *map);
 extern void security_bpf_map_free(struct bpf_map *map);
-extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux);
-extern void security_bpf_prog_free(struct bpf_prog_aux *aux);
+extern int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
+				  struct bpf_token *token);
+extern void security_bpf_prog_free(struct bpf_prog *prog);
 #else
 static inline int security_bpf(int cmd, union bpf_attr *attr,
 					     unsigned int size)
@@ -2054,12 +2055,13 @@ static inline int security_bpf_map_alloc(struct bpf_map *map)
 static inline void security_bpf_map_free(struct bpf_map *map)
 { }
 
-static inline int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
+static inline int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
+					 struct bpf_token *token)
 {
 	return 0;
 }
 
-static inline void security_bpf_prog_free(struct bpf_prog_aux *aux)
+static inline void security_bpf_prog_free(struct bpf_prog *prog)
 { }
 #endif /* CONFIG_SECURITY */
 #endif /* CONFIG_BPF_SYSCALL */
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index e14c822f8911..3e956f6302f3 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -263,6 +263,8 @@ BTF_ID(func, bpf_lsm_bpf_map)
 BTF_ID(func, bpf_lsm_bpf_map_alloc_security)
 BTF_ID(func, bpf_lsm_bpf_map_free_security)
 BTF_ID(func, bpf_lsm_bpf_prog)
+BTF_ID(func, bpf_lsm_bpf_prog_load)
+BTF_ID(func, bpf_lsm_bpf_prog_free)
 BTF_ID(func, bpf_lsm_bprm_check_security)
 BTF_ID(func, bpf_lsm_bprm_committed_creds)
 BTF_ID(func, bpf_lsm_bprm_committing_creds)
@@ -346,8 +348,7 @@ BTF_SET_END(sleepable_lsm_hooks)
 
 BTF_SET_START(untrusted_lsm_hooks)
 BTF_ID(func, bpf_lsm_bpf_map_free_security)
-BTF_ID(func, bpf_lsm_bpf_prog_alloc_security)
-BTF_ID(func, bpf_lsm_bpf_prog_free_security)
+BTF_ID(func, bpf_lsm_bpf_prog_free)
 BTF_ID(func, bpf_lsm_file_alloc_security)
 BTF_ID(func, bpf_lsm_file_free_security)
 #ifdef CONFIG_SECURITY_NETWORK
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a48d3d5f83ea..a20befd62281 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2137,7 +2137,7 @@ static void __bpf_prog_put_rcu(struct rcu_head *rcu)
 	kvfree(aux->func_info);
 	kfree(aux->func_info_aux);
 	free_uid(aux->user);
-	security_bpf_prog_free(aux);
+	security_bpf_prog_free(aux->prog);
 	bpf_prog_free(aux->prog);
 }
 
@@ -2727,10 +2727,6 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	prog->aux->token = token;
 	token = NULL;
 
-	err = security_bpf_prog_alloc(prog->aux);
-	if (err)
-		goto free_prog;
-
 	prog->aux->user = get_current_user();
 	prog->len = attr->insn_cnt;
 
@@ -2738,12 +2734,12 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	if (copy_from_bpfptr(prog->insns,
 			     make_bpfptr(attr->insns, uattr.is_kernel),
 			     bpf_prog_insn_size(prog)) != 0)
-		goto free_prog_sec;
+		goto free_prog;
 	/* copy eBPF program license from user space */
 	if (strncpy_from_bpfptr(license,
 				make_bpfptr(attr->license, uattr.is_kernel),
 				sizeof(license) - 1) < 0)
-		goto free_prog_sec;
+		goto free_prog;
 	license[sizeof(license) - 1] = 0;
 
 	/* eBPF programs must be GPL compatible to use GPL-ed functions */
@@ -2757,25 +2753,29 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	if (bpf_prog_is_dev_bound(prog->aux)) {
 		err = bpf_prog_dev_bound_init(prog, attr);
 		if (err)
-			goto free_prog_sec;
+			goto free_prog;
 	}
 
 	if (type == BPF_PROG_TYPE_EXT && dst_prog &&
 	    bpf_prog_is_dev_bound(dst_prog->aux)) {
 		err = bpf_prog_dev_bound_inherit(prog, dst_prog);
 		if (err)
-			goto free_prog_sec;
+			goto free_prog;
 	}
 
 	/* find program type: socket_filter vs tracing_filter */
 	err = find_prog_type(type, prog);
 	if (err < 0)
-		goto free_prog_sec;
+		goto free_prog;
 
 	prog->aux->load_time = ktime_get_boottime_ns();
 	err = bpf_obj_name_cpy(prog->aux->name, attr->prog_name,
 			       sizeof(attr->prog_name));
 	if (err < 0)
+		goto free_prog;
+
+	err = security_bpf_prog_load(prog, attr, token);
+	if (err)
 		goto free_prog_sec;
 
 	/* run eBPF verifier */
@@ -2821,10 +2821,11 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	 */
 	__bpf_prog_put_noref(prog, prog->aux->real_func_cnt);
 	return err;
+
 free_prog_sec:
-	free_uid(prog->aux->user);
-	security_bpf_prog_free(prog->aux);
+	security_bpf_prog_free(prog);
 free_prog:
+	free_uid(prog->aux->user);
 	if (prog->aux->attach_btf)
 		btf_put(prog->aux->attach_btf);
 	bpf_prog_free(prog);
diff --git a/security/security.c b/security/security.c
index dcb3e7014f9b..5773d446210e 100644
--- a/security/security.c
+++ b/security/security.c
@@ -5180,16 +5180,21 @@ int security_bpf_map_alloc(struct bpf_map *map)
 }
 
 /**
- * security_bpf_prog_alloc() - Allocate a bpf program LSM blob
- * @aux: bpf program aux info struct
+ * security_bpf_prog_load() - Check if loading of BPF program is allowed
+ * @prog: BPF program object
+ * @attr: BPF syscall attributes used to create BPF program
+ * @token: BPF token used to grant user access to BPF subsystem
  *
- * Initialize the security field inside bpf program.
+ * Do a check when the kernel allocates BPF program object and is about to
+ * pass it to BPF verifier for additional correctness checks. This is also the
+ * point where LSM blob is allocated for LSMs that need them.
  *
  * Return: Returns 0 on success, error on failure.
  */
-int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
+int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
+			   struct bpf_token *token)
 {
-	return call_int_hook(bpf_prog_alloc_security, 0, aux);
+	return call_int_hook(bpf_prog_load, 0, prog, attr, token);
 }
 
 /**
@@ -5204,14 +5209,14 @@ void security_bpf_map_free(struct bpf_map *map)
 }
 
 /**
- * security_bpf_prog_free() - Free a bpf program's LSM blob
- * @aux: bpf program aux info struct
+ * security_bpf_prog_free() - Free a BPF program's LSM blob
+ * @prog: BPF program struct
  *
- * Clean up the security information stored inside bpf prog.
+ * Clean up the security information stored inside BPF program.
  */
-void security_bpf_prog_free(struct bpf_prog_aux *aux)
+void security_bpf_prog_free(struct bpf_prog *prog)
 {
-	call_void_hook(bpf_prog_free_security, aux);
+	call_void_hook(bpf_prog_free, prog);
 }
 #endif /* CONFIG_BPF_SYSCALL */
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index feda711c6b7b..eabee39e983c 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6805,7 +6805,8 @@ static void selinux_bpf_map_free(struct bpf_map *map)
 	kfree(bpfsec);
 }
 
-static int selinux_bpf_prog_alloc(struct bpf_prog_aux *aux)
+static int selinux_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
+				 struct bpf_token *token)
 {
 	struct bpf_security_struct *bpfsec;
 
@@ -6814,16 +6815,16 @@ static int selinux_bpf_prog_alloc(struct bpf_prog_aux *aux)
 		return -ENOMEM;
 
 	bpfsec->sid = current_sid();
-	aux->security = bpfsec;
+	prog->aux->security = bpfsec;
 
 	return 0;
 }
 
-static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
+static void selinux_bpf_prog_free(struct bpf_prog *prog)
 {
-	struct bpf_security_struct *bpfsec = aux->security;
+	struct bpf_security_struct *bpfsec = prog->aux->security;
 
-	aux->security = NULL;
+	prog->aux->security = NULL;
 	kfree(bpfsec);
 }
 #endif
@@ -7180,7 +7181,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(bpf_map, selinux_bpf_map),
 	LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
 	LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
-	LSM_HOOK_INIT(bpf_prog_free_security, selinux_bpf_prog_free),
+	LSM_HOOK_INIT(bpf_prog_free, selinux_bpf_prog_free),
 #endif
 
 #ifdef CONFIG_PERF_EVENTS
@@ -7238,7 +7239,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
 #endif
 #ifdef CONFIG_BPF_SYSCALL
 	LSM_HOOK_INIT(bpf_map_alloc_security, selinux_bpf_map_alloc),
-	LSM_HOOK_INIT(bpf_prog_alloc_security, selinux_bpf_prog_alloc),
+	LSM_HOOK_INIT(bpf_prog_load, selinux_bpf_prog_load),
 #endif
 #ifdef CONFIG_PERF_EVENTS
 	LSM_HOOK_INIT(perf_event_alloc, selinux_perf_event_alloc),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v9 bpf-next 10/17] bpf,lsm: refactor bpf_map_alloc/bpf_map_free LSM hooks
  2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (8 preceding siblings ...)
  2023-11-03 19:05 ` [PATCH v9 bpf-next 09/17] bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooks Andrii Nakryiko
@ 2023-11-03 19:05 ` Andrii Nakryiko
  2023-11-06  5:01   ` [PATCH v9 " Paul Moore
  2023-11-03 19:05 ` [PATCH v9 bpf-next 11/17] bpf,lsm: add BPF token " Andrii Nakryiko
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-03 19:05 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

Similarly to the bpf_prog_alloc LSM hook, rename and extend the
bpf_map_alloc hook into bpf_map_create, taking not just struct bpf_map,
but also the bpf_attr union and bpf_token, to give LSMs fuller context.

Unlike bpf_prog_alloc, there is no need to move the hook around, as it
currently fires right before allocating the BPF map ID and FD, which
seems to be a sweet spot.

But like the bpf_prog_alloc/bpf_prog_free combo, make sure that the
bpf_map_free LSM hook is called even if the bpf_map_create hook
returned an error: if a few LSMs are combined together, it could be
that one LSM successfully allocated a security blob for its needs,
while a subsequent LSM rejected the BPF map creation. The former LSM
would still need to free up its LSM blob, so we need to ensure
security_bpf_map_free() is called regardless of the outcome.
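
For illustration (not part of this patch), a BPF LSM program using the
new hook signature could look like the hedged sketch below; it assumes
a vmlinux.h from a kernel with this series applied, and the policy
shown is an arbitrary example.

  #include "vmlinux.h"
  #include <errno.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char LICENSE[] SEC("license") = "GPL";

  /* Example policy: refuse creation of queue maps unless the request
   * goes through a delegated BPF token; attr and token come straight
   * from the new hook signature.
   */
  SEC("lsm/bpf_map_create")
  int BPF_PROG(example_map_create, struct bpf_map *map,
               union bpf_attr *attr, struct bpf_token *token)
  {
          if (!token && attr->map_type == BPF_MAP_TYPE_QUEUE)
                  return -EPERM;
          return 0;
  }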

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/lsm_hook_defs.h |  5 +++--
 include/linux/security.h      |  6 ++++--
 kernel/bpf/bpf_lsm.c          |  6 +++---
 kernel/bpf/syscall.c          |  4 ++--
 security/security.c           | 16 ++++++++++------
 security/selinux/hooks.c      |  7 ++++---
 6 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 085989e04209..795d3860c302 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -398,8 +398,9 @@ LSM_HOOK(void, LSM_RET_VOID, audit_rule_free, void *lsmrule)
 LSM_HOOK(int, 0, bpf, int cmd, union bpf_attr *attr, unsigned int size)
 LSM_HOOK(int, 0, bpf_map, struct bpf_map *map, fmode_t fmode)
 LSM_HOOK(int, 0, bpf_prog, struct bpf_prog *prog)
-LSM_HOOK(int, 0, bpf_map_alloc_security, struct bpf_map *map)
-LSM_HOOK(void, LSM_RET_VOID, bpf_map_free_security, struct bpf_map *map)
+LSM_HOOK(int, 0, bpf_map_create, struct bpf_map *map, union bpf_attr *attr,
+	 struct bpf_token *token)
+LSM_HOOK(void, LSM_RET_VOID, bpf_map_free, struct bpf_map *map)
 LSM_HOOK(int, 0, bpf_prog_load, struct bpf_prog *prog, union bpf_attr *attr,
 	 struct bpf_token *token)
 LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free, struct bpf_prog *prog)
diff --git a/include/linux/security.h b/include/linux/security.h
index 65467eef6678..08fd777cbe94 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -2025,7 +2025,8 @@ struct bpf_token;
 extern int security_bpf(int cmd, union bpf_attr *attr, unsigned int size);
 extern int security_bpf_map(struct bpf_map *map, fmode_t fmode);
 extern int security_bpf_prog(struct bpf_prog *prog);
-extern int security_bpf_map_alloc(struct bpf_map *map);
+extern int security_bpf_map_create(struct bpf_map *map, union bpf_attr *attr,
+				   struct bpf_token *token);
 extern void security_bpf_map_free(struct bpf_map *map);
 extern int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
 				  struct bpf_token *token);
@@ -2047,7 +2048,8 @@ static inline int security_bpf_prog(struct bpf_prog *prog)
 	return 0;
 }
 
-static inline int security_bpf_map_alloc(struct bpf_map *map)
+static inline int security_bpf_map_create(struct bpf_map *map, union bpf_attr *attr,
+					  struct bpf_token *token)
 {
 	return 0;
 }
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 3e956f6302f3..9e4e615f11eb 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -260,8 +260,8 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 BTF_SET_START(sleepable_lsm_hooks)
 BTF_ID(func, bpf_lsm_bpf)
 BTF_ID(func, bpf_lsm_bpf_map)
-BTF_ID(func, bpf_lsm_bpf_map_alloc_security)
-BTF_ID(func, bpf_lsm_bpf_map_free_security)
+BTF_ID(func, bpf_lsm_bpf_map_create)
+BTF_ID(func, bpf_lsm_bpf_map_free)
 BTF_ID(func, bpf_lsm_bpf_prog)
 BTF_ID(func, bpf_lsm_bpf_prog_load)
 BTF_ID(func, bpf_lsm_bpf_prog_free)
@@ -347,7 +347,7 @@ BTF_ID(func, bpf_lsm_userns_create)
 BTF_SET_END(sleepable_lsm_hooks)
 
 BTF_SET_START(untrusted_lsm_hooks)
-BTF_ID(func, bpf_lsm_bpf_map_free_security)
+BTF_ID(func, bpf_lsm_bpf_map_free)
 BTF_ID(func, bpf_lsm_bpf_prog_free)
 BTF_ID(func, bpf_lsm_file_alloc_security)
 BTF_ID(func, bpf_lsm_file_free_security)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a20befd62281..32e0d3feee4a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1284,9 +1284,9 @@ static int map_create(union bpf_attr *attr)
 			attr->btf_vmlinux_value_type_id;
 	}
 
-	err = security_bpf_map_alloc(map);
+	err = security_bpf_map_create(map, attr, token);
 	if (err)
-		goto free_map;
+		goto free_map_sec;
 
 	err = bpf_map_alloc_id(map);
 	if (err)
diff --git a/security/security.c b/security/security.c
index 5773d446210e..745edcd7b04b 100644
--- a/security/security.c
+++ b/security/security.c
@@ -5167,16 +5167,20 @@ int security_bpf_prog(struct bpf_prog *prog)
 }
 
 /**
- * security_bpf_map_alloc() - Allocate a bpf map LSM blob
- * @map: bpf map
+ * security_bpf_map_create() - Check if BPF map creation is allowed
+ * @map: BPF map object
+ * @attr: BPF syscall attributes used to create BPF map
+ * @token: BPF token used to grant user access
  *
- * Initialize the security field inside bpf map.
+ * Do a check when the kernel creates a new BPF map. This is also the
+ * point where LSM blob is allocated for LSMs that need them.
  *
  * Return: Returns 0 on success, error on failure.
  */
-int security_bpf_map_alloc(struct bpf_map *map)
+int security_bpf_map_create(struct bpf_map *map, union bpf_attr *attr,
+			    struct bpf_token *token)
 {
-	return call_int_hook(bpf_map_alloc_security, 0, map);
+	return call_int_hook(bpf_map_create, 0, map, attr, token);
 }
 
 /**
@@ -5205,7 +5209,7 @@ int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
  */
 void security_bpf_map_free(struct bpf_map *map)
 {
-	call_void_hook(bpf_map_free_security, map);
+	call_void_hook(bpf_map_free, map);
 }
 
 /**
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index eabee39e983c..002351ab67b7 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6783,7 +6783,8 @@ static int selinux_bpf_prog(struct bpf_prog *prog)
 			    BPF__PROG_RUN, NULL);
 }
 
-static int selinux_bpf_map_alloc(struct bpf_map *map)
+static int selinux_bpf_map_create(struct bpf_map *map, union bpf_attr *attr,
+				  struct bpf_token *token)
 {
 	struct bpf_security_struct *bpfsec;
 
@@ -7180,7 +7181,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(bpf, selinux_bpf),
 	LSM_HOOK_INIT(bpf_map, selinux_bpf_map),
 	LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
-	LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
+	LSM_HOOK_INIT(bpf_map_free, selinux_bpf_map_free),
 	LSM_HOOK_INIT(bpf_prog_free, selinux_bpf_prog_free),
 #endif
 
@@ -7238,7 +7239,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(audit_rule_init, selinux_audit_rule_init),
 #endif
 #ifdef CONFIG_BPF_SYSCALL
-	LSM_HOOK_INIT(bpf_map_alloc_security, selinux_bpf_map_alloc),
+	LSM_HOOK_INIT(bpf_map_create, selinux_bpf_map_create),
 	LSM_HOOK_INIT(bpf_prog_load, selinux_bpf_prog_load),
 #endif
 #ifdef CONFIG_PERF_EVENTS
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v9 bpf-next 11/17] bpf,lsm: add BPF token LSM hooks
  2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (9 preceding siblings ...)
  2023-11-03 19:05 ` [PATCH v9 bpf-next 10/17] bpf,lsm: refactor bpf_map_alloc/bpf_map_free " Andrii Nakryiko
@ 2023-11-03 19:05 ` Andrii Nakryiko
  2023-11-04  0:36   ` kernel test robot
  2023-11-06  5:01   ` [PATCH v9 " Paul Moore
  2023-11-03 19:05 ` [PATCH v9 bpf-next 12/17] libbpf: add bpf_token_create() API Andrii Nakryiko
                   ` (5 subsequent siblings)
  16 siblings, 2 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-03 19:05 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

Wire up bpf_token_create and bpf_token_free LSM hooks, which allow
allocating an LSM security blob (we add a `void *security` field to
struct bpf_token for that), but also controlling who can instantiate
a BPF token. This follows the existing pattern for BPF map and BPF prog.

Also add security_bpf_token_allow_cmd() and security_bpf_token_capable()
LSM hooks that allow an LSM implementation to control and negate (if
necessary) a BPF token's delegation of a specific bpf_cmd and
capability, respectively.
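
For illustration (not part of this patch), an LSM can use these hooks
to narrow what a token may delegate. The hedged sketch below is a BPF
LSM example assuming a vmlinux.h from a kernel with this series
applied; the policy is arbitrary.

  #include "vmlinux.h"
  #include <errno.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  /* from include/uapi/linux/capability.h; redefined to avoid mixing
   * userspace capability headers with vmlinux.h
   */
  #define CAP_SYS_ADMIN 21

  char LICENSE[] SEC("license") = "GPL";

  /* Example policy: tokens may stand in for CAP_BPF, CAP_PERFMON and
   * CAP_NET_ADMIN checks, but never for CAP_SYS_ADMIN.
   */
  SEC("lsm/bpf_token_capable")
  int BPF_PROG(example_token_capable, struct bpf_token *token, int cap)
  {
          return cap == CAP_SYS_ADMIN ? -EPERM : 0;
  }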

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h           |  3 ++
 include/linux/lsm_hook_defs.h |  5 +++
 include/linux/security.h      | 25 +++++++++++++++
 kernel/bpf/bpf_lsm.c          |  4 +++
 kernel/bpf/token.c            | 13 ++++++--
 security/security.c           | 60 +++++++++++++++++++++++++++++++++++
 6 files changed, 107 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 975aa7790d51..8344c050e615 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1585,6 +1585,9 @@ struct bpf_token {
 	u64 allowed_maps;
 	u64 allowed_progs;
 	u64 allowed_attachs;
+#ifdef CONFIG_SECURITY
+	void *security;
+#endif
 };
 
 struct bpf_struct_ops_value;
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 795d3860c302..da4c4ef94140 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -404,6 +404,11 @@ LSM_HOOK(void, LSM_RET_VOID, bpf_map_free, struct bpf_map *map)
 LSM_HOOK(int, 0, bpf_prog_load, struct bpf_prog *prog, union bpf_attr *attr,
 	 struct bpf_token *token)
 LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free, struct bpf_prog *prog)
+LSM_HOOK(int, 0, bpf_token_create, struct bpf_token *token, union bpf_attr *attr,
+	 struct path *path)
+LSM_HOOK(void, LSM_RET_VOID, bpf_token_free, struct bpf_token *token)
+LSM_HOOK(int, 0, bpf_token_allow_cmd, const struct bpf_token *token, enum bpf_cmd cmd)
+LSM_HOOK(int, 0, bpf_token_capable, const struct bpf_token *token, int cap)
 #endif /* CONFIG_BPF_SYSCALL */
 
 LSM_HOOK(int, 0, locked_down, enum lockdown_reason what)
diff --git a/include/linux/security.h b/include/linux/security.h
index 08fd777cbe94..1d6edbf45d1c 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -60,6 +60,7 @@ struct fs_parameter;
 enum fs_value_type;
 struct watch;
 struct watch_notification;
+enum bpf_cmd;
 
 /* Default (no) options for the capable function */
 #define CAP_OPT_NONE 0x0
@@ -2031,6 +2032,11 @@ extern void security_bpf_map_free(struct bpf_map *map);
 extern int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
 				  struct bpf_token *token);
 extern void security_bpf_prog_free(struct bpf_prog *prog);
+extern int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
+				     struct path *path);
+extern void security_bpf_token_free(struct bpf_token *token);
+extern int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
+extern int security_bpf_token_capable(const struct bpf_token *token, int cap);
 #else
 static inline int security_bpf(int cmd, union bpf_attr *attr,
 					     unsigned int size)
@@ -2065,6 +2071,25 @@ static inline int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *
 
 static inline void security_bpf_prog_free(struct bpf_prog *prog)
 { }
+
+static inline int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
+				     struct path *path)
+{
+	return 0;
+}
+
+static inline void security_bpf_token_free(struct bpf_token *token)
+{ }
+
+static inline int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
+{
+	return 0;
+}
+
+static inline int security_bpf_token_capable(const struct bpf_token *token, int cap)
+{
+	return 0;
+}
 #endif /* CONFIG_SECURITY */
 #endif /* CONFIG_BPF_SYSCALL */
 
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 9e4e615f11eb..2b491e07485d 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -265,6 +265,10 @@ BTF_ID(func, bpf_lsm_bpf_map_free)
 BTF_ID(func, bpf_lsm_bpf_prog)
 BTF_ID(func, bpf_lsm_bpf_prog_load)
 BTF_ID(func, bpf_lsm_bpf_prog_free)
+BTF_ID(func, bpf_lsm_bpf_token_create)
+BTF_ID(func, bpf_lsm_bpf_token_free)
+BTF_ID(func, bpf_lsm_bpf_token_allow_cmd)
+BTF_ID(func, bpf_lsm_bpf_token_capable)
 BTF_ID(func, bpf_lsm_bprm_check_security)
 BTF_ID(func, bpf_lsm_bprm_committed_creds)
 BTF_ID(func, bpf_lsm_bprm_committing_creds)
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
index 35e6f55c2a41..5d04da54faea 100644
--- a/kernel/bpf/token.c
+++ b/kernel/bpf/token.c
@@ -7,11 +7,12 @@
 #include <linux/idr.h>
 #include <linux/namei.h>
 #include <linux/user_namespace.h>
+#include <linux/security.h>
 
 bool bpf_token_capable(const struct bpf_token *token, int cap)
 {
 	/* BPF token allows ns_capable() level of capabilities */
-	if (token) {
+	if (token && security_bpf_token_capable(token, cap) == 0) {
 		if (ns_capable(token->userns, cap))
 			return true;
 		if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
@@ -28,6 +29,7 @@ void bpf_token_inc(struct bpf_token *token)
 
 static void bpf_token_free(struct bpf_token *token)
 {
+	security_bpf_token_free(token);
 	put_user_ns(token->userns);
 	kvfree(token);
 }
@@ -172,6 +174,10 @@ int bpf_token_create(union bpf_attr *attr)
 	token->allowed_progs = mnt_opts->delegate_progs;
 	token->allowed_attachs = mnt_opts->delegate_attachs;
 
+	err = security_bpf_token_create(token, attr, &path);
+	if (err)
+		goto out_token;
+
 	fd = get_unused_fd_flags(O_CLOEXEC);
 	if (fd < 0) {
 		err = fd;
@@ -216,8 +222,9 @@ bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
 {
 	if (!token)
 		return false;
-
-	return token->allowed_cmds & (1ULL << cmd);
+	if (!(token->allowed_cmds & (1ULL << cmd)))
+		return false;
+	return security_bpf_token_allow_cmd(token, cmd) == 0;
 }
 
 bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type)
diff --git a/security/security.c b/security/security.c
index 745edcd7b04b..abcee81b9cea 100644
--- a/security/security.c
+++ b/security/security.c
@@ -5201,6 +5201,55 @@ int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
 	return call_int_hook(bpf_prog_load, 0, prog, attr, token);
 }
 
+/**
+ * security_bpf_token_create() - Check if creating a BPF token is allowed
+ * @token: BPF token object
+ * @attr: BPF syscall attributes used to create BPF token
+ * @path: path pointing to BPF FS mount point from which BPF token is created
+ *
+ * Do a check when the kernel instantiates a new BPF token object from BPF FS
+ * instance. This is also the point where LSM blob can be allocated for LSMs.
+ *
+ * Return: Returns 0 on success, error on failure.
+ */
+int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
+			      struct path *path)
+{
+	return call_int_hook(bpf_token_create, 0, token, attr, path);
+}
+
+/**
+ * security_bpf_token_allow_cmd() - Check if BPF token is allowed to delegate
+ * requested BPF syscall command
+ * @token: BPF token object
+ * @cmd: BPF syscall command requested to be delegated by BPF token
+ *
+ * Do a check when the kernel decides whether provided BPF token should allow
+ * delegation of requested BPF syscall command.
+ *
+ * Return: Returns 0 on success, error on failure.
+ */
+int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
+{
+	return call_int_hook(bpf_token_allow_cmd, 0, token, cmd);
+}
+
+/**
+ * security_bpf_token_capable() - Check if BPF token is allowed to delegate
+ * requested BPF-related capability
+ * @token: BPF token object
+ * @cap: capabilities requested to be delegated by BPF token
+ *
+ * Do a check when the kernel decides whether provided BPF token should allow
+ * delegation of requested BPF-related capabilities.
+ *
+ * Return: Returns 0 on success, error on failure.
+ */
+int security_bpf_token_capable(const struct bpf_token *token, int cap)
+{
+	return call_int_hook(bpf_token_capable, 0, token, cap);
+}
+
 /**
  * security_bpf_map_free() - Free a bpf map's LSM blob
  * @map: bpf map
@@ -5222,6 +5271,17 @@ void security_bpf_prog_free(struct bpf_prog *prog)
 {
 	call_void_hook(bpf_prog_free, prog);
 }
+
+/**
+ * security_bpf_token_free() - Free a BPF token's LSM blob
+ * @token: BPF token struct
+ *
+ * Clean up the security information stored inside BPF token.
+ */
+void security_bpf_token_free(struct bpf_token *token)
+{
+	call_void_hook(bpf_token_free, token);
+}
 #endif /* CONFIG_BPF_SYSCALL */
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v9 bpf-next 12/17] libbpf: add bpf_token_create() API
  2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (10 preceding siblings ...)
  2023-11-03 19:05 ` [PATCH v9 bpf-next 11/17] bpf,lsm: add BPF token " Andrii Nakryiko
@ 2023-11-03 19:05 ` Andrii Nakryiko
  2023-11-03 19:05 ` [PATCH v9 bpf-next 13/17] libbpf: add BPF token support to bpf_map_create() API Andrii Nakryiko
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-03 19:05 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

Add low-level wrapper API for BPF_TOKEN_CREATE command in bpf() syscall.
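
A minimal usage sketch (the "/sys/fs/bpf" mount point is just an assumption;
any BPF FS instance set up with delegate_* mount options will do):

/* sketch; needs <fcntl.h>, <unistd.h>, <errno.h> and <bpf/bpf.h> */
static int get_token_fd(void)
{
	int bpffs_fd, token_fd;

	bpffs_fd = open("/sys/fs/bpf", O_PATH);
	if (bpffs_fd < 0)
		return -errno;

	/* empty pathname + mount FD mirrors the selftest usage later in
	 * the series
	 */
	token_fd = bpf_token_create(bpffs_fd, "", NULL);
	close(bpffs_fd);
	return token_fd; /* FD > 0 on success, negative error otherwise */
}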

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c      | 19 +++++++++++++++++++
 tools/lib/bpf/bpf.h      | 28 ++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.map |  1 +
 3 files changed, 48 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 9dc9625651dc..6064fe90af19 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -1287,3 +1287,22 @@ int bpf_prog_bind_map(int prog_fd, int map_fd,
 	ret = sys_bpf(BPF_PROG_BIND_MAP, &attr, attr_sz);
 	return libbpf_err_errno(ret);
 }
+
+int bpf_token_create(int bpffs_path_fd, const char *bpffs_pathname,
+		     struct bpf_token_create_opts *opts)
+{
+	const size_t attr_sz = offsetofend(union bpf_attr, token_create);
+	union bpf_attr attr;
+	int fd;
+
+	if (!OPTS_VALID(opts, bpf_token_create_opts))
+		return libbpf_err(-EINVAL);
+
+	memset(&attr, 0, attr_sz);
+	attr.token_create.bpffs_path_fd = bpffs_path_fd;
+	attr.token_create.bpffs_pathname = ptr_to_u64(bpffs_pathname);
+	attr.token_create.flags = OPTS_GET(opts, flags, 0);
+
+	fd = sys_bpf_fd(BPF_TOKEN_CREATE, &attr, attr_sz);
+	return libbpf_err_errno(fd);
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index d0f53772bdc0..0c558550a3f2 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -640,6 +640,34 @@ struct bpf_test_run_opts {
 LIBBPF_API int bpf_prog_test_run_opts(int prog_fd,
 				      struct bpf_test_run_opts *opts);
 
+struct bpf_token_create_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+	__u32 flags;
+	size_t :0;
+};
+#define bpf_token_create_opts__last_field flags
+
+/**
+ * @brief **bpf_token_create()** creates a new instance of BPF token derived
+ * from specified BPF FS mount point.
+ *
+ * BPF token created with this API can be passed to bpf() syscall for
+ * commands like BPF_PROG_LOAD, BPF_MAP_CREATE, etc.
+ *
+ * @param bpffs_path_fd O_PATH FD (see man 2 openat() for semantics) specifying,
+ * in combination with *bpffs_pathname*, BPF FS mount from which to derive
+ * a BPF token instance.
+ * @param bpffs_pathname absolute or relative path specifying, in combination
+ * with *bpffs_path_fd*, BPF FS mount from which to derive a BPF token
+ * instance.
+ * @param opts optional BPF token creation options, can be NULL
+ *
+ * @return BPF token FD > 0, on success; negative error code, otherwise (errno
+ * is also set to the error code)
+ */
+LIBBPF_API int bpf_token_create(int bpffs_path_fd, const char *bpffs_pathname,
+				struct bpf_token_create_opts *opts);
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index b52dc28dc8af..8224738d2492 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -401,6 +401,7 @@ LIBBPF_1.3.0 {
 		bpf_program__attach_netkit;
 		bpf_program__attach_tcx;
 		bpf_program__attach_uprobe_multi;
+		bpf_token_create;
 		ring__avail_data_size;
 		ring__consume;
 		ring__consumer_pos;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v9 bpf-next 13/17] libbpf: add BPF token support to bpf_map_create() API
  2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (11 preceding siblings ...)
  2023-11-03 19:05 ` [PATCH v9 bpf-next 12/17] libbpf: add bpf_token_create() API Andrii Nakryiko
@ 2023-11-03 19:05 ` Andrii Nakryiko
  2023-11-03 19:05 ` [PATCH v9 bpf-next 14/17] libbpf: add BPF token support to bpf_btf_load() API Andrii Nakryiko
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-03 19:05 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

Add ability to provide token_fd for BPF_MAP_CREATE command through
bpf_map_create() API.
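
For illustration, a sketch of the new option in use (token_fd is assumed to
come from a prior bpf_token_create() call; map type and sizes are arbitrary):

/* sketch; mirrors the selftest usage later in the series */
static int create_map_with_token(int token_fd)
{
	LIBBPF_OPTS(bpf_map_create_opts, opts,
		.token_fd = token_fd,	/* 0 keeps the old "no token" behavior */
	);

	/* map type, key/value sizes and max_entries are arbitrary here */
	return bpf_map_create(BPF_MAP_TYPE_HASH, "token_map",
			      sizeof(int), sizeof(long), 128, &opts);
}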

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c | 4 +++-
 tools/lib/bpf/bpf.h | 5 ++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 6064fe90af19..91d45af3e9cd 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -169,7 +169,7 @@ int bpf_map_create(enum bpf_map_type map_type,
 		   __u32 max_entries,
 		   const struct bpf_map_create_opts *opts)
 {
-	const size_t attr_sz = offsetofend(union bpf_attr, map_extra);
+	const size_t attr_sz = offsetofend(union bpf_attr, map_token_fd);
 	union bpf_attr attr;
 	int fd;
 
@@ -198,6 +198,8 @@ int bpf_map_create(enum bpf_map_type map_type,
 	attr.numa_node = OPTS_GET(opts, numa_node, 0);
 	attr.map_ifindex = OPTS_GET(opts, map_ifindex, 0);
 
+	attr.map_token_fd = OPTS_GET(opts, token_fd, 0);
+
 	fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz);
 	return libbpf_err_errno(fd);
 }
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 0c558550a3f2..26018aab839c 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -51,8 +51,11 @@ struct bpf_map_create_opts {
 
 	__u32 numa_node;
 	__u32 map_ifindex;
+
+	__u32 token_fd;
+	size_t :0;
 };
-#define bpf_map_create_opts__last_field map_ifindex
+#define bpf_map_create_opts__last_field token_fd
 
 LIBBPF_API int bpf_map_create(enum bpf_map_type map_type,
 			      const char *map_name,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v9 bpf-next 14/17] libbpf: add BPF token support to bpf_btf_load() API
  2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (12 preceding siblings ...)
  2023-11-03 19:05 ` [PATCH v9 bpf-next 13/17] libbpf: add BPF token support to bpf_map_create() API Andrii Nakryiko
@ 2023-11-03 19:05 ` Andrii Nakryiko
  2023-11-03 19:05 ` [PATCH v9 bpf-next 15/17] libbpf: add BPF token support to bpf_prog_load() API Andrii Nakryiko
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-03 19:05 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

Allow the user to specify token_fd for the bpf_btf_load() API that wraps
the kernel's BPF_BTF_LOAD command. This allows loading BTF from an
unprivileged process as long as it has a BPF token allowing the
BPF_BTF_LOAD command, which can be created and delegated by a privileged
process.
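
A short sketch of the new option in use (BTF data preparation, e.g. via
btf__raw_data(), is elided; token_fd comes from bpf_token_create()):

/* sketch only */
static int load_btf_with_token(const void *raw_btf_data, __u32 raw_btf_size,
			       int token_fd)
{
	LIBBPF_OPTS(bpf_btf_load_opts, opts, .token_fd = token_fd);

	/* a negative return (e.g. -EPERM) typically means the token doesn't
	 * delegate BPF_BTF_LOAD or namespaced CAP_BPF is missing
	 */
	return bpf_btf_load(raw_btf_data, raw_btf_size, &opts);
}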

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c | 4 +++-
 tools/lib/bpf/bpf.h | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 91d45af3e9cd..4b15d20347ac 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -1184,7 +1184,7 @@ int bpf_raw_tracepoint_open(const char *name, int prog_fd)
 
 int bpf_btf_load(const void *btf_data, size_t btf_size, struct bpf_btf_load_opts *opts)
 {
-	const size_t attr_sz = offsetofend(union bpf_attr, btf_log_true_size);
+	const size_t attr_sz = offsetofend(union bpf_attr, btf_token_fd);
 	union bpf_attr attr;
 	char *log_buf;
 	size_t log_size;
@@ -1209,6 +1209,8 @@ int bpf_btf_load(const void *btf_data, size_t btf_size, struct bpf_btf_load_opts
 
 	attr.btf = ptr_to_u64(btf_data);
 	attr.btf_size = btf_size;
+	attr.btf_token_fd = OPTS_GET(opts, token_fd, 0);
+
 	/* log_level == 0 and log_buf != NULL means "try loading without
 	 * log_buf, but retry with log_buf and log_level=1 on error", which is
 	 * consistent across low-level and high-level BTF and program loading
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 26018aab839c..323bc04d8082 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -133,9 +133,10 @@ struct bpf_btf_load_opts {
 	 * If kernel doesn't support this feature, log_size is left unchanged.
 	 */
 	__u32 log_true_size;
+	__u32 token_fd;
 	size_t :0;
 };
-#define bpf_btf_load_opts__last_field log_true_size
+#define bpf_btf_load_opts__last_field token_fd
 
 LIBBPF_API int bpf_btf_load(const void *btf_data, size_t btf_size,
 			    struct bpf_btf_load_opts *opts);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v9 bpf-next 15/17] libbpf: add BPF token support to bpf_prog_load() API
  2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (13 preceding siblings ...)
  2023-11-03 19:05 ` [PATCH v9 bpf-next 14/17] libbpf: add BPF token support to bpf_btf_load() API Andrii Nakryiko
@ 2023-11-03 19:05 ` Andrii Nakryiko
  2023-11-03 19:05 ` [PATCH v9 bpf-next 16/17] selftests/bpf: add BPF token-enabled tests Andrii Nakryiko
  2023-11-03 19:05 ` [PATCH v9 bpf-next 17/17] bpf,selinux: allocate bpf_security_struct per BPF token Andrii Nakryiko
  16 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-03 19:05 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

Wire through token_fd into bpf_prog_load().
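
Sketch of the resulting usage, mirroring the selftest added later in the
series (insns/insn_cnt and token_fd are assumed to be prepared already):

/* sketch only */
static int load_prog_with_token(const struct bpf_insn *insns, size_t insn_cnt,
				int token_fd)
{
	LIBBPF_OPTS(bpf_prog_load_opts, opts,
		.token_fd = token_fd,
		.expected_attach_type = BPF_XDP, /* checked against delegate_attachs */
	);

	return bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
			     insns, insn_cnt, &opts);
}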

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c | 3 ++-
 tools/lib/bpf/bpf.h | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 4b15d20347ac..7d84b3711d62 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -234,7 +234,7 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
 		  const struct bpf_insn *insns, size_t insn_cnt,
 		  struct bpf_prog_load_opts *opts)
 {
-	const size_t attr_sz = offsetofend(union bpf_attr, log_true_size);
+	const size_t attr_sz = offsetofend(union bpf_attr, prog_token_fd);
 	void *finfo = NULL, *linfo = NULL;
 	const char *func_info, *line_info;
 	__u32 log_size, log_level, attach_prog_fd, attach_btf_obj_fd;
@@ -263,6 +263,7 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
 	attr.prog_flags = OPTS_GET(opts, prog_flags, 0);
 	attr.prog_ifindex = OPTS_GET(opts, prog_ifindex, 0);
 	attr.kern_version = OPTS_GET(opts, kern_version, 0);
+	attr.prog_token_fd = OPTS_GET(opts, token_fd, 0);
 
 	if (prog_name && kernel_supports(NULL, FEAT_PROG_NAME))
 		libbpf_strlcpy(attr.prog_name, prog_name, sizeof(attr.prog_name));
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 323bc04d8082..3ea11eba4a04 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -105,9 +105,10 @@ struct bpf_prog_load_opts {
 	 * If kernel doesn't support this feature, log_size is left unchanged.
 	 */
 	__u32 log_true_size;
+	__u32 token_fd;
 	size_t :0;
 };
-#define bpf_prog_load_opts__last_field log_true_size
+#define bpf_prog_load_opts__last_field token_fd
 
 LIBBPF_API int bpf_prog_load(enum bpf_prog_type prog_type,
 			     const char *prog_name, const char *license,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v9 bpf-next 16/17] selftests/bpf: add BPF token-enabled tests
  2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (14 preceding siblings ...)
  2023-11-03 19:05 ` [PATCH v9 bpf-next 15/17] libbpf: add BPF token support to bpf_prog_load() API Andrii Nakryiko
@ 2023-11-03 19:05 ` Andrii Nakryiko
  2023-11-03 19:05 ` [PATCH v9 bpf-next 17/17] bpf,selinux: allocate bpf_security_struct per BPF token Andrii Nakryiko
  16 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-03 19:05 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

Add a selftest that attempts to conceptually replicate intended BPF
token use cases inside user namespaced container.

A child process is forked and put into its own userns and mountns. The
child creates a BPF FS context object and sets it up as desired. This
ensures the child's userns is captured as the owning userns for this
instance of BPF FS.

This context is passed to the privileged parent process through a Unix
socket, where the parent instantiates and mounts it as a detached mount.
The mount FD is then passed back to the child to be used for BPF token
creation, which allows otherwise privileged BPF operations to succeed
inside the userns.

We validate that all the token-enabled privileged commands (BPF_BTF_LOAD,
BPF_MAP_CREATE, and BPF_PROG_LOAD) work as intended. They should only
succeed inside the userns if a) a BPF token is provided with the proper
allowed sets of commands and types; and b) namespaced CAP_BPF and other
required capabilities are set. Lacking either a) or b) should lead to
-EPERM failures.
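
For readers skimming the thread, the privileged/unprivileged split boils
down to roughly the following sequence (heavily simplified;
sys_fsopen()/sys_fsconfig()/sys_fsmount() are the test's own raw syscall
wrappers, and error handling plus the SCM_RIGHTS FD passing are omitted):

/* child, inside its own userns + mountns */
fs_fd = sys_fsopen("bpf", 0);
sys_fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_cmds", "any", 0);
/* ... fs_fd is passed to the privileged parent over the socket ... */

/* privileged parent */
sys_fsconfig(fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
mnt_fd = sys_fsmount(fs_fd, 0, 0);
/* ... mnt_fd is passed back to the child ... */

/* child, now without any init-userns privileges */
token_fd = bpf_token_create(mnt_fd, "", NULL);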

Based on suggested workflow by Christian Brauner ([0]).

  [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../testing/selftests/bpf/prog_tests/token.c  | 622 ++++++++++++++++++
 1 file changed, 622 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/token.c

diff --git a/tools/testing/selftests/bpf/prog_tests/token.c b/tools/testing/selftests/bpf/prog_tests/token.c
new file mode 100644
index 000000000000..8cb71aba9d9a
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/token.c
@@ -0,0 +1,622 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+#define _GNU_SOURCE
+#include <test_progs.h>
+#include <bpf/btf.h>
+#include "cap_helpers.h"
+#include <fcntl.h>
+#include <sched.h>
+#include <signal.h>
+#include <unistd.h>
+#include <linux/filter.h>
+#include <linux/unistd.h>
+#include <linux/mount.h>
+#include <sys/socket.h>
+#include <sys/syscall.h>
+#include <sys/un.h>
+
+static inline int sys_mount(const char *dev_name, const char *dir_name,
+			    const char *type, unsigned long flags,
+			    const void *data)
+{
+	return syscall(__NR_mount, dev_name, dir_name, type, flags, data);
+}
+
+static inline int sys_fsopen(const char *fsname, unsigned flags)
+{
+	return syscall(__NR_fsopen, fsname, flags);
+}
+
+static inline int sys_fsconfig(int fs_fd, unsigned cmd, const char *key, const void *val, int aux)
+{
+	return syscall(__NR_fsconfig, fs_fd, cmd, key, val, aux);
+}
+
+static inline int sys_fsmount(int fs_fd, unsigned flags, unsigned ms_flags)
+{
+	return syscall(__NR_fsmount, fs_fd, flags, ms_flags);
+}
+
+static int drop_priv_caps(__u64 *old_caps)
+{
+	return cap_disable_effective((1ULL << CAP_BPF) |
+				     (1ULL << CAP_PERFMON) |
+				     (1ULL << CAP_NET_ADMIN) |
+				     (1ULL << CAP_SYS_ADMIN), old_caps);
+}
+
+static int restore_priv_caps(__u64 old_caps)
+{
+	return cap_enable_effective(old_caps, NULL);
+}
+
+static int set_delegate_mask(int fs_fd, const char *key, __u64 mask)
+{
+	char buf[32];
+	int err;
+
+	snprintf(buf, sizeof(buf), "0x%llx", (unsigned long long)mask);
+	err = sys_fsconfig(fs_fd, FSCONFIG_SET_STRING, key,
+			   mask == ~0ULL ? "any" : buf, 0);
+	if (err < 0)
+		err = -errno;
+	return err;
+}
+
+#define zclose(fd) do { if (fd >= 0) close(fd); fd = -1; } while (0)
+
+struct bpffs_opts {
+	__u64 cmds;
+	__u64 maps;
+	__u64 progs;
+	__u64 attachs;
+};
+
+static int setup_bpffs_fd(struct bpffs_opts *opts)
+{
+	int fs_fd = -1, err;
+
+	/* create VFS context */
+	fs_fd = sys_fsopen("bpf", 0);
+	if (!ASSERT_GE(fs_fd, 0, "fs_fd"))
+		goto cleanup;
+
+	/* set up token delegation mount options */
+	err = set_delegate_mask(fs_fd, "delegate_cmds", opts->cmds);
+	if (!ASSERT_OK(err, "fs_cfg_cmds"))
+		goto cleanup;
+	err = set_delegate_mask(fs_fd, "delegate_maps", opts->maps);
+	if (!ASSERT_OK(err, "fs_cfg_maps"))
+		goto cleanup;
+	err = set_delegate_mask(fs_fd, "delegate_progs", opts->progs);
+	if (!ASSERT_OK(err, "fs_cfg_progs"))
+		goto cleanup;
+	err = set_delegate_mask(fs_fd, "delegate_attachs", opts->attachs);
+	if (!ASSERT_OK(err, "fs_cfg_attachs"))
+		goto cleanup;
+
+	return fs_fd;
+cleanup:
+	zclose(fs_fd);
+	return -1;
+}
+
+static int materialize_bpffs_fd(int fs_fd)
+{
+	int mnt_fd, err;
+
+	/* instantiate FS object */
+	err = sys_fsconfig(fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+	if (err < 0)
+		return -errno;
+
+	/* create O_PATH fd for detached mount */
+	mnt_fd = sys_fsmount(fs_fd, 0, 0);
+	if (mnt_fd < 0)
+		return -errno;
+
+	return mnt_fd;
+}
+
+/* send FD over Unix domain (AF_UNIX) socket */
+static int sendfd(int sockfd, int fd)
+{
+	struct msghdr msg = {};
+	struct cmsghdr *cmsg;
+	int fds[1] = { fd }, err;
+	char iobuf[1];
+	struct iovec io = {
+		.iov_base = iobuf,
+		.iov_len = sizeof(iobuf),
+	};
+	union {
+		char buf[CMSG_SPACE(sizeof(fds))];
+		struct cmsghdr align;
+	} u;
+
+	msg.msg_iov = &io;
+	msg.msg_iovlen = 1;
+	msg.msg_control = u.buf;
+	msg.msg_controllen = sizeof(u.buf);
+	cmsg = CMSG_FIRSTHDR(&msg);
+	cmsg->cmsg_level = SOL_SOCKET;
+	cmsg->cmsg_type = SCM_RIGHTS;
+	cmsg->cmsg_len = CMSG_LEN(sizeof(fds));
+	memcpy(CMSG_DATA(cmsg), fds, sizeof(fds));
+
+	err = sendmsg(sockfd, &msg, 0);
+	if (err < 0)
+		err = -errno;
+	if (!ASSERT_EQ(err, 1, "sendmsg"))
+		return -EINVAL;
+
+	return 0;
+}
+
+/* receive FD over Unix domain (AF_UNIX) socket */
+static int recvfd(int sockfd, int *fd)
+{
+	struct msghdr msg = {};
+	struct cmsghdr *cmsg;
+	int fds[1], err;
+	char iobuf[1];
+	struct iovec io = {
+		.iov_base = iobuf,
+		.iov_len = sizeof(iobuf),
+	};
+	union {
+		char buf[CMSG_SPACE(sizeof(fds))];
+		struct cmsghdr align;
+	} u;
+
+	msg.msg_iov = &io;
+	msg.msg_iovlen = 1;
+	msg.msg_control = u.buf;
+	msg.msg_controllen = sizeof(u.buf);
+
+	err = recvmsg(sockfd, &msg, 0);
+	if (err < 0)
+		err = -errno;
+	if (!ASSERT_EQ(err, 1, "recvmsg"))
+		return -EINVAL;
+
+	cmsg = CMSG_FIRSTHDR(&msg);
+	if (!ASSERT_OK_PTR(cmsg, "cmsg_null") ||
+	    !ASSERT_EQ(cmsg->cmsg_len, CMSG_LEN(sizeof(fds)), "cmsg_len") ||
+	    !ASSERT_EQ(cmsg->cmsg_level, SOL_SOCKET, "cmsg_level") ||
+	    !ASSERT_EQ(cmsg->cmsg_type, SCM_RIGHTS, "cmsg_type"))
+		return -EINVAL;
+
+	memcpy(fds, CMSG_DATA(cmsg), sizeof(fds));
+	*fd = fds[0];
+
+	return 0;
+}
+
+static ssize_t write_nointr(int fd, const void *buf, size_t count)
+{
+	ssize_t ret;
+
+	do {
+		ret = write(fd, buf, count);
+	} while (ret < 0 && errno == EINTR);
+
+	return ret;
+}
+
+static int write_file(const char *path, const void *buf, size_t count)
+{
+	int fd;
+	ssize_t ret;
+
+	fd = open(path, O_WRONLY | O_CLOEXEC | O_NOCTTY | O_NOFOLLOW);
+	if (fd < 0)
+		return -1;
+
+	ret = write_nointr(fd, buf, count);
+	close(fd);
+	if (ret < 0 || (size_t)ret != count)
+		return -1;
+
+	return 0;
+}
+
+static int create_and_enter_userns(void)
+{
+	uid_t uid;
+	gid_t gid;
+	char map[100];
+
+	uid = getuid();
+	gid = getgid();
+
+	if (unshare(CLONE_NEWUSER))
+		return -1;
+
+	if (write_file("/proc/self/setgroups", "deny", sizeof("deny") - 1) &&
+	    errno != ENOENT)
+		return -1;
+
+	snprintf(map, sizeof(map), "0 %d 1", uid);
+	if (write_file("/proc/self/uid_map", map, strlen(map)))
+		return -1;
+
+
+	snprintf(map, sizeof(map), "0 %d 1", gid);
+	if (write_file("/proc/self/gid_map", map, strlen(map)))
+		return -1;
+
+	if (setgid(0))
+		return -1;
+
+	if (setuid(0))
+		return -1;
+
+	return 0;
+}
+
+typedef int (*child_callback_fn)(int);
+
+static void child(int sock_fd, struct bpffs_opts *bpffs_opts, child_callback_fn callback)
+{
+	LIBBPF_OPTS(bpf_map_create_opts, map_opts);
+	int mnt_fd = -1, fs_fd = -1, err = 0;
+
+	/* setup userns with root mappings */
+	err = create_and_enter_userns();
+	if (!ASSERT_OK(err, "create_and_enter_userns"))
+		goto cleanup;
+
+	/* setup mountns to allow creating BPF FS (fsopen("bpf")) from unpriv process */
+	err = unshare(CLONE_NEWNS);
+	if (!ASSERT_OK(err, "create_mountns"))
+		goto cleanup;
+
+	err = sys_mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0);
+	if (!ASSERT_OK(err, "remount_root"))
+		goto cleanup;
+
+	fs_fd = setup_bpffs_fd(bpffs_opts);
+	if (!ASSERT_GE(fs_fd, 0, "setup_bpffs")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* pass BPF FS context object to parent */
+	err = sendfd(sock_fd, fs_fd);
+	if (!ASSERT_OK(err, "send_fs_fd"))
+		goto cleanup;
+
+	/* avoid mucking around with mount namespaces and mounting at
+	 * well-known path, just get detach-mounted BPF FS fd back from parent
+	 */
+	err = recvfd(sock_fd, &mnt_fd);
+	if (!ASSERT_OK(err, "recv_mnt_fd"))
+		goto cleanup;
+
+	/* do custom test logic with customly set up BPF FS instance */
+	err = callback(mnt_fd);
+	if (!ASSERT_OK(err, "test_callback"))
+		goto cleanup;
+
+	err = 0;
+cleanup:
+	zclose(sock_fd);
+	zclose(mnt_fd);
+
+	exit(-err);
+}
+
+static int wait_for_pid(pid_t pid)
+{
+	int status, ret;
+
+again:
+	ret = waitpid(pid, &status, 0);
+	if (ret == -1) {
+		if (errno == EINTR)
+			goto again;
+
+		return -1;
+	}
+
+	if (!WIFEXITED(status))
+		return -1;
+
+	return WEXITSTATUS(status);
+}
+
+static void parent(int child_pid, int sock_fd)
+{
+	int fs_fd = -1, mnt_fd = -1, err;
+
+	err = recvfd(sock_fd, &fs_fd);
+	if (!ASSERT_OK(err, "recv_bpffs_fd"))
+		goto cleanup;
+
+	mnt_fd = materialize_bpffs_fd(fs_fd);
+	if (!ASSERT_GE(mnt_fd, 0, "materialize_bpffs_fd")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+	zclose(fs_fd);
+
+	/* pass BPF FS context object to parent */
+	err = sendfd(sock_fd, mnt_fd);
+	if (!ASSERT_OK(err, "send_mnt_fd"))
+		goto cleanup;
+	zclose(mnt_fd);
+
+	err = wait_for_pid(child_pid);
+	ASSERT_OK(err, "waitpid_child");
+
+cleanup:
+	zclose(sock_fd);
+	zclose(fs_fd);
+	zclose(mnt_fd);
+
+	if (child_pid > 0)
+		(void)kill(child_pid, SIGKILL);
+}
+
+static void subtest_userns(struct bpffs_opts *bpffs_opts, child_callback_fn cb)
+{
+	int sock_fds[2] = { -1, -1 };
+	int child_pid = 0, err;
+
+	err = socketpair(AF_UNIX, SOCK_STREAM, 0, sock_fds);
+	if (!ASSERT_OK(err, "socketpair"))
+		goto cleanup;
+
+	child_pid = fork();
+	if (!ASSERT_GE(child_pid, 0, "fork"))
+		goto cleanup;
+
+	if (child_pid == 0) {
+		zclose(sock_fds[0]);
+		return child(sock_fds[1], bpffs_opts, cb);
+
+	} else {
+		zclose(sock_fds[1]);
+		return parent(child_pid, sock_fds[0]);
+	}
+
+cleanup:
+	zclose(sock_fds[0]);
+	zclose(sock_fds[1]);
+	if (child_pid > 0)
+		(void)kill(child_pid, SIGKILL);
+}
+
+static int userns_map_create(int mnt_fd)
+{
+	LIBBPF_OPTS(bpf_map_create_opts, map_opts);
+	int err, token_fd = -1, map_fd = -1;
+	__u64 old_caps = 0;
+
+	/* create BPF token from BPF FS mount */
+	token_fd = bpf_token_create(mnt_fd, "", NULL);
+	if (!ASSERT_GT(token_fd, 0, "token_create")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* while inside non-init userns, we need both a BPF token *and*
+	 * CAP_BPF inside current userns to create privileged map; let's test
+	 * that neither BPF token alone nor namespaced CAP_BPF is sufficient
+	 */
+	err = drop_priv_caps(&old_caps);
+	if (!ASSERT_OK(err, "drop_caps"))
+		goto cleanup;
+
+	/* no token, no CAP_BPF -> fail */
+	map_opts.token_fd = 0;
+	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "wo_token_wo_bpf", 0, 8, 1, &map_opts);
+	if (!ASSERT_LT(map_fd, 0, "stack_map_wo_token_wo_cap_bpf_should_fail")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* token without CAP_BPF -> fail */
+	map_opts.token_fd = token_fd;
+	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "w_token_wo_bpf", 0, 8, 1, &map_opts);
+	if (!ASSERT_LT(map_fd, 0, "stack_map_w_token_wo_cap_bpf_should_fail")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* get back effective local CAP_BPF (and CAP_SYS_ADMIN) */
+	err = restore_priv_caps(old_caps);
+	if (!ASSERT_OK(err, "restore_caps"))
+		goto cleanup;
+
+	/* CAP_BPF without token -> fail */
+	map_opts.token_fd = 0;
+	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "wo_token_w_bpf", 0, 8, 1, &map_opts);
+	if (!ASSERT_LT(map_fd, 0, "stack_map_wo_token_w_cap_bpf_should_fail")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* finally, namespaced CAP_BPF + token -> success */
+	map_opts.token_fd = token_fd;
+	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "w_token_w_bpf", 0, 8, 1, &map_opts);
+	if (!ASSERT_GT(map_fd, 0, "stack_map_w_token_w_cap_bpf")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+cleanup:
+	zclose(token_fd);
+	zclose(map_fd);
+	return err;
+}
+
+static int userns_btf_load(int mnt_fd)
+{
+	LIBBPF_OPTS(bpf_btf_load_opts, btf_opts);
+	int err, token_fd = -1, btf_fd = -1;
+	const void *raw_btf_data;
+	struct btf *btf = NULL;
+	__u32 raw_btf_size;
+	__u64 old_caps = 0;
+
+	/* create BPF token from BPF FS mount */
+	token_fd = bpf_token_create(mnt_fd, "", NULL);
+	if (!ASSERT_GT(token_fd, 0, "token_create")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* while inside non-init userns, we need both a BPF token *and*
+	 * CAP_BPF inside current userns to load BTF; let's test
+	 * that neither BPF token alone nor namespaced CAP_BPF is sufficient
+	 */
+	err = drop_priv_caps(&old_caps);
+	if (!ASSERT_OK(err, "drop_caps"))
+		goto cleanup;
+
+	/* setup a trivial BTF data to load to the kernel */
+	btf = btf__new_empty();
+	if (!ASSERT_OK_PTR(btf, "empty_btf"))
+		goto cleanup;
+
+	ASSERT_GT(btf__add_int(btf, "int", 4, 0), 0, "int_type");
+
+	raw_btf_data = btf__raw_data(btf, &raw_btf_size);
+	if (!ASSERT_OK_PTR(raw_btf_data, "raw_btf_data"))
+		goto cleanup;
+
+	/* no token + no CAP_BPF -> failure */
+	btf_opts.token_fd = 0;
+	btf_fd = bpf_btf_load(raw_btf_data, raw_btf_size, &btf_opts);
+	if (!ASSERT_LT(btf_fd, 0, "no_token_no_cap_should_fail"))
+		goto cleanup;
+
+	/* token + no CAP_BPF -> failure */
+	btf_opts.token_fd = token_fd;
+	btf_fd = bpf_btf_load(raw_btf_data, raw_btf_size, &btf_opts);
+	if (!ASSERT_LT(btf_fd, 0, "token_no_cap_should_fail"))
+		goto cleanup;
+
+	/* get back effective local CAP_BPF (and CAP_SYS_ADMIN) */
+	err = restore_priv_caps(old_caps);
+	if (!ASSERT_OK(err, "restore_caps"))
+		goto cleanup;
+
+	/* token + CAP_BPF -> success */
+	btf_opts.token_fd = token_fd;
+	btf_fd = bpf_btf_load(raw_btf_data, raw_btf_size, &btf_opts);
+	if (!ASSERT_GT(btf_fd, 0, "token_and_cap_success"))
+		goto cleanup;
+
+	err = 0;
+cleanup:
+	btf__free(btf);
+	zclose(btf_fd);
+	zclose(token_fd);
+	return err;
+}
+
+static int userns_prog_load(int mnt_fd)
+{
+	LIBBPF_OPTS(bpf_prog_load_opts, prog_opts);
+	int err, token_fd = -1, prog_fd = -1;
+	struct bpf_insn insns[] = {
+		/* bpf_jiffies64() requires CAP_BPF */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_jiffies64),
+		/* bpf_get_current_task() requires CAP_PERFMON */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_current_task),
+		/* r0 = 0; exit; */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	size_t insn_cnt = ARRAY_SIZE(insns);
+	__u64 old_caps = 0;
+
+	/* create BPF token from BPF FS mount */
+	token_fd = bpf_token_create(mnt_fd, "", NULL);
+	if (!ASSERT_GT(token_fd, 0, "token_create")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* validate we can successfully load BPF program with token; this
+	 * being XDP program (CAP_NET_ADMIN) using bpf_jiffies64() (CAP_BPF)
+	 * and bpf_get_current_task() (CAP_PERFMON) helpers validates we have
+	 * BPF token wired properly in a bunch of places in the kernel
+	 */
+	prog_opts.token_fd = token_fd;
+	prog_opts.expected_attach_type = BPF_XDP;
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
+				insns, insn_cnt, &prog_opts);
+	if (!ASSERT_GT(prog_fd, 0, "prog_fd")) {
+		err = -EPERM;
+		goto cleanup;
+	}
+
+	/* no token + caps -> failure */
+	prog_opts.token_fd = 0;
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
+				insns, insn_cnt, &prog_opts);
+	if (!ASSERT_EQ(prog_fd, -EPERM, "prog_fd_eperm")) {
+		err = -EPERM;
+		goto cleanup;
+	}
+
+	err = drop_priv_caps(&old_caps);
+	if (!ASSERT_OK(err, "drop_caps"))
+		goto cleanup;
+
+	/* no caps + token -> failure */
+	prog_opts.token_fd = token_fd;
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
+				insns, insn_cnt, &prog_opts);
+	if (!ASSERT_EQ(prog_fd, -EPERM, "prog_fd_eperm")) {
+		err = -EPERM;
+		goto cleanup;
+	}
+
+	/* no caps + no token -> definitely a failure */
+	prog_opts.token_fd = 0;
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
+				insns, insn_cnt, &prog_opts);
+	if (!ASSERT_EQ(prog_fd, -EPERM, "prog_fd_eperm")) {
+		err = -EPERM;
+		goto cleanup;
+	}
+
+	err = 0;
+cleanup:
+	zclose(prog_fd);
+	zclose(token_fd);
+	return err;
+}
+
+void test_token(void)
+{
+	if (test__start_subtest("map_token")) {
+		struct bpffs_opts opts = {
+			.cmds = 1ULL << BPF_MAP_CREATE,
+			.maps = 1ULL << BPF_MAP_TYPE_STACK,
+		};
+
+		subtest_userns(&opts, userns_map_create);
+	}
+	if (test__start_subtest("btf_token")) {
+		struct bpffs_opts opts = {
+			.cmds = 1ULL << BPF_BTF_LOAD,
+		};
+
+		subtest_userns(&opts, userns_btf_load);
+	}
+	if (test__start_subtest("prog_token")) {
+		struct bpffs_opts opts = {
+			.cmds = 1ULL << BPF_PROG_LOAD,
+			.progs = 1ULL << BPF_PROG_TYPE_XDP,
+			.attachs = 1ULL << BPF_XDP,
+		};
+
+		subtest_userns(&opts, userns_prog_load);
+	}
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v9 bpf-next 17/17] bpf,selinux: allocate bpf_security_struct per BPF token
  2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (15 preceding siblings ...)
  2023-11-03 19:05 ` [PATCH v9 bpf-next 16/17] selftests/bpf: add BPF token-enabled tests Andrii Nakryiko
@ 2023-11-03 19:05 ` Andrii Nakryiko
  2023-11-06  5:01   ` [PATCH v9 " Paul Moore
  16 siblings, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-03 19:05 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

Utilize the newly added bpf_token_create/bpf_token_free LSM hooks to
allocate a struct bpf_security_struct for each BPF token object in
SELinux. This follows the same pattern used for BPF prog and map objects.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 security/selinux/hooks.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 002351ab67b7..1501e95366a1 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6828,6 +6828,29 @@ static void selinux_bpf_prog_free(struct bpf_prog *prog)
 	prog->aux->security = NULL;
 	kfree(bpfsec);
 }
+
+static int selinux_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
+				    struct path *path)
+{
+	struct bpf_security_struct *bpfsec;
+
+	bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
+	if (!bpfsec)
+		return -ENOMEM;
+
+	bpfsec->sid = current_sid();
+	token->security = bpfsec;
+
+	return 0;
+}
+
+static void selinux_bpf_token_free(struct bpf_token *token)
+{
+	struct bpf_security_struct *bpfsec = token->security;
+
+	token->security = NULL;
+	kfree(bpfsec);
+}
 #endif
 
 struct lsm_blob_sizes selinux_blob_sizes __ro_after_init = {
@@ -7183,6 +7206,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
 	LSM_HOOK_INIT(bpf_map_free, selinux_bpf_map_free),
 	LSM_HOOK_INIT(bpf_prog_free, selinux_bpf_prog_free),
+	LSM_HOOK_INIT(bpf_token_free, selinux_bpf_token_free),
 #endif
 
 #ifdef CONFIG_PERF_EVENTS
@@ -7241,6 +7265,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
 #ifdef CONFIG_BPF_SYSCALL
 	LSM_HOOK_INIT(bpf_map_create, selinux_bpf_map_create),
 	LSM_HOOK_INIT(bpf_prog_load, selinux_bpf_prog_load),
+	LSM_HOOK_INIT(bpf_token_create, selinux_bpf_token_create),
 #endif
 #ifdef CONFIG_PERF_EVENTS
 	LSM_HOOK_INIT(perf_event_alloc, selinux_perf_event_alloc),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 bpf-next 11/17] bpf,lsm: add BPF token LSM hooks
  2023-11-03 19:05 ` [PATCH v9 bpf-next 11/17] bpf,lsm: add BPF token " Andrii Nakryiko
@ 2023-11-04  0:36   ` kernel test robot
  2023-11-04  3:20     ` Andrii Nakryiko
  2023-11-06  5:01   ` [PATCH v9 " Paul Moore
  1 sibling, 1 reply; 37+ messages in thread
From: kernel test robot @ 2023-11-04  0:36 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, paul, brauner
  Cc: oe-kbuild-all, linux-fsdevel, linux-security-module, keescook,
	kernel-team, sargun

Hi Andrii,

kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Andrii-Nakryiko/bpf-align-CAP_NET_ADMIN-checks-with-bpf_capable-approach/20231104-031714
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20231103190523.6353-12-andrii%40kernel.org
patch subject: [PATCH v9 bpf-next 11/17] bpf,lsm: add BPF token LSM hooks
config: m68k-defconfig (https://download.01.org/0day-ci/archive/20231104/202311040829.XrnpSV8z-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231104/202311040829.XrnpSV8z-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311040829.XrnpSV8z-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/net/scm.h:8,
                    from include/linux/netlink.h:9,
                    from include/uapi/linux/neighbour.h:6,
                    from include/linux/netdevice.h:45,
                    from include/net/sock.h:46,
                    from include/linux/tcp.h:19,
                    from include/linux/ipv6.h:95,
                    from include/net/ipv6.h:12,
                    from include/linux/sunrpc/addr.h:14,
                    from fs/nfsd/nfsd.h:22,
                    from fs/nfsd/state.h:42,
                    from fs/nfsd/xdr4.h:40,
                    from fs/nfsd/trace.h:17,
                    from fs/nfsd/trace.c:4:
>> include/linux/security.h:2084:92: error: parameter 2 ('cmd') has incomplete type
    2084 | static inline int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
         |                                                                               ~~~~~~~~~~~~~^~~
>> include/linux/security.h:2084:19: error: function declaration isn't a prototype [-Werror=strict-prototypes]
    2084 | static inline int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
         |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors
--
   In file included from include/net/scm.h:8,
                    from include/linux/netlink.h:9,
                    from include/uapi/linux/neighbour.h:6,
                    from include/linux/netdevice.h:45,
                    from include/net/sock.h:46,
                    from include/linux/tcp.h:19,
                    from include/linux/ipv6.h:95,
                    from include/net/ipv6.h:12,
                    from include/linux/sunrpc/addr.h:14,
                    from fs/nfsd/nfsd.h:22,
                    from fs/nfsd/export.c:21:
>> include/linux/security.h:2084:92: error: parameter 2 ('cmd') has incomplete type
    2084 | static inline int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
         |                                                                               ~~~~~~~~~~~~~^~~
>> include/linux/security.h:2084:19: error: function declaration isn't a prototype [-Werror=strict-prototypes]
    2084 | static inline int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
         |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfsd/export.c: In function 'exp_rootfh':
   fs/nfsd/export.c:1017:34: warning: variable 'inode' set but not used [-Wunused-but-set-variable]
    1017 |         struct inode            *inode;
         |                                  ^~~~~
   cc1: some warnings being treated as errors
--
   In file included from include/net/scm.h:8,
                    from include/linux/netlink.h:9,
                    from include/uapi/linux/neighbour.h:6,
                    from include/linux/netdevice.h:45,
                    from include/net/sock.h:46,
                    from include/linux/tcp.h:19,
                    from include/linux/ipv6.h:95,
                    from include/net/ipv6.h:12,
                    from include/linux/sunrpc/addr.h:14,
                    from fs/nfsd/nfsd.h:22,
                    from fs/nfsd/state.h:42,
                    from fs/nfsd/xdr4.h:40,
                    from fs/nfsd/trace.h:17,
                    from fs/nfsd/trace.c:4:
>> include/linux/security.h:2084:92: error: parameter 2 ('cmd') has incomplete type
    2084 | static inline int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
         |                                                                               ~~~~~~~~~~~~~^~~
>> include/linux/security.h:2084:19: error: function declaration isn't a prototype [-Werror=strict-prototypes]
    2084 | static inline int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
         |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
   In file included from fs/nfsd/trace.h:1958:
   include/trace/define_trace.h:95:42: fatal error: ./trace.h: No such file or directory
      95 | #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
         |                                          ^
   cc1: some warnings being treated as errors
   compilation terminated.


vim +2084 include/linux/security.h

  2083	
> 2084	static inline int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
  2085	{
  2086		return 0;
  2087	}
  2088	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 bpf-next 11/17] bpf,lsm: add BPF token LSM hooks
  2023-11-04  0:36   ` kernel test robot
@ 2023-11-04  3:20     ` Andrii Nakryiko
  0 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-04  3:20 UTC (permalink / raw)
  To: kernel test robot
  Cc: Andrii Nakryiko, bpf, netdev, paul, brauner, oe-kbuild-all,
	linux-fsdevel, linux-security-module, keescook, kernel-team,
	sargun

On Fri, Nov 3, 2023 at 5:38 PM kernel test robot <lkp@intel.com> wrote:
>
> Hi Andrii,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on bpf-next/master]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Andrii-Nakryiko/bpf-align-CAP_NET_ADMIN-checks-with-bpf_capable-approach/20231104-031714
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> patch link:    https://lore.kernel.org/r/20231103190523.6353-12-andrii%40kernel.org
> patch subject: [PATCH v9 bpf-next 11/17] bpf,lsm: add BPF token LSM hooks
> config: m68k-defconfig (https://download.01.org/0day-ci/archive/20231104/202311040829.XrnpSV8z-lkp@intel.com/config)
> compiler: m68k-linux-gcc (GCC) 13.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231104/202311040829.XrnpSV8z-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202311040829.XrnpSV8z-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
>    In file included from include/net/scm.h:8,
>                     from include/linux/netlink.h:9,
>                     from include/uapi/linux/neighbour.h:6,
>                     from include/linux/netdevice.h:45,
>                     from include/net/sock.h:46,
>                     from include/linux/tcp.h:19,
>                     from include/linux/ipv6.h:95,
>                     from include/net/ipv6.h:12,
>                     from include/linux/sunrpc/addr.h:14,
>                     from fs/nfsd/nfsd.h:22,
>                     from fs/nfsd/state.h:42,
>                     from fs/nfsd/xdr4.h:40,
>                     from fs/nfsd/trace.h:17,
>                     from fs/nfsd/trace.c:4:
> >> include/linux/security.h:2084:92: error: parameter 2 ('cmd') has incomplete type
>     2084 | static inline int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
>          |                                                                               ~~~~~~~~~~~~~^~~
> >> include/linux/security.h:2084:19: error: function declaration isn't a prototype [-Werror=strict-prototypes]
>     2084 | static inline int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
>          |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    cc1: some warnings being treated as errors

Ok, so apparently enum forward declaration doesn't work with static
inline functions.

Would it be ok to just #include <linux/bpf.h> in this file?

$ git diff
diff --git a/include/linux/security.h b/include/linux/security.h
index 1d6edbf45d1c..cfe6176824c2 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -32,6 +32,7 @@
 #include <linux/string.h>
 #include <linux/mm.h>
 #include <linux/sockptr.h>
+#include <linux/bpf.h>

 struct linux_binprm;
 struct cred;
@@ -60,7 +61,6 @@ struct fs_parameter;
 enum fs_value_type;
 struct watch;
 struct watch_notification;
-enum bpf_cmd;

 /* Default (no) options for the capable function */
 #define CAP_OPT_NONE 0x0


If not, then I guess another alternative would be to pass `int cmd`
instead of `enum bpf_cmd cmd`, but that doesn't seem like the best
solution, tbh.

Paul, any preferences?

> --
>    In file included from include/net/scm.h:8,
>                     from include/linux/netlink.h:9,
>                     from include/uapi/linux/neighbour.h:6,
>                     from include/linux/netdevice.h:45,
>                     from include/net/sock.h:46,
>                     from include/linux/tcp.h:19,
>                     from include/linux/ipv6.h:95,
>                     from include/net/ipv6.h:12,
>                     from include/linux/sunrpc/addr.h:14,
>                     from fs/nfsd/nfsd.h:22,
>                     from fs/nfsd/export.c:21:
> >> include/linux/security.h:2084:92: error: parameter 2 ('cmd') has incomplete type
>     2084 | static inline int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
>          |                                                                               ~~~~~~~~~~~~~^~~
> >> include/linux/security.h:2084:19: error: function declaration isn't a prototype [-Werror=strict-prototypes]
>     2084 | static inline int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
>          |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    fs/nfsd/export.c: In function 'exp_rootfh':
>    fs/nfsd/export.c:1017:34: warning: variable 'inode' set but not used [-Wunused-but-set-variable]
>     1017 |         struct inode            *inode;
>          |                                  ^~~~~
>    cc1: some warnings being treated as errors
> --
>    In file included from include/net/scm.h:8,
>                     from include/linux/netlink.h:9,
>                     from include/uapi/linux/neighbour.h:6,
>                     from include/linux/netdevice.h:45,
>                     from include/net/sock.h:46,
>                     from include/linux/tcp.h:19,
>                     from include/linux/ipv6.h:95,
>                     from include/net/ipv6.h:12,
>                     from include/linux/sunrpc/addr.h:14,
>                     from fs/nfsd/nfsd.h:22,
>                     from fs/nfsd/state.h:42,
>                     from fs/nfsd/xdr4.h:40,
>                     from fs/nfsd/trace.h:17,
>                     from fs/nfsd/trace.c:4:
> >> include/linux/security.h:2084:92: error: parameter 2 ('cmd') has incomplete type
>     2084 | static inline int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
>          |                                                                               ~~~~~~~~~~~~~^~~
> >> include/linux/security.h:2084:19: error: function declaration isn't a prototype [-Werror=strict-prototypes]
>     2084 | static inline int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
>          |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    In file included from fs/nfsd/trace.h:1958:
>    include/trace/define_trace.h:95:42: fatal error: ./trace.h: No such file or directory
>       95 | #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
>          |                                          ^
>    cc1: some warnings being treated as errors
>    compilation terminated.
>
>
> vim +2084 include/linux/security.h
>
>   2083
> > 2084  static inline int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
>   2085  {
>   2086          return 0;
>   2087  }
>   2088
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 bpf-next 01/17] bpf: align CAP_NET_ADMIN checks with bpf_capable() approach
  2023-11-03 19:05 ` [PATCH v9 bpf-next 01/17] bpf: align CAP_NET_ADMIN checks with bpf_capable() approach Andrii Nakryiko
@ 2023-11-05  6:33   ` Yafang Shao
  0 siblings, 0 replies; 37+ messages in thread
From: Yafang Shao @ 2023-11-05  6:33 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, paul, brauner, linux-fsdevel, linux-security-module,
	keescook, kernel-team, sargun

On Sat, Nov 4, 2023 at 3:05 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> Within BPF syscall handling code CAP_NET_ADMIN checks stand out a bit
> compared to CAP_BPF and CAP_PERFMON checks. For the latter, CAP_BPF or
> CAP_PERFMON are checked first, but if they are not set, CAP_SYS_ADMIN
> takes over and grants whatever part of BPF syscall is required.
>
> Similar kind of checks that involve CAP_NET_ADMIN are not so consistent.
> One out of four uses does follow CAP_BPF/CAP_PERFMON model: during
> BPF_PROG_LOAD, if the type of BPF program is "network-related" either
> CAP_NET_ADMIN or CAP_SYS_ADMIN is required to proceed.
>
> But in three other cases CAP_NET_ADMIN is required even if CAP_SYS_ADMIN
> is set:
>   - when creating DEVMAP/XDKMAP/CPU_MAP maps;
>   - when attaching CGROUP_SKB programs;
>   - when handling BPF_PROG_QUERY command.
>
> This patch is changing the latter three cases to follow BPF_PROG_LOAD
> model, that is allowing to proceed under either CAP_NET_ADMIN or
> CAP_SYS_ADMIN.
>
> This also makes it cleaner in subsequent BPF token patches to switch
> wholesomely to a generic bpf_token_capable(int cap) check, that always
> falls back to CAP_SYS_ADMIN if requested capability is missing.
>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Yafang Shao <laoar.shao@gmail.com>

> ---
>  kernel/bpf/syscall.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 0ed286b8a0f0..ad4d8e433ccc 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1096,6 +1096,11 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
>         return ret;
>  }
>
> +static bool bpf_net_capable(void)
> +{
> +       return capable(CAP_NET_ADMIN) || capable(CAP_SYS_ADMIN);
> +}
> +
>  #define BPF_MAP_CREATE_LAST_FIELD map_extra
>  /* called via syscall */
>  static int map_create(union bpf_attr *attr)
> @@ -1199,7 +1204,7 @@ static int map_create(union bpf_attr *attr)
>         case BPF_MAP_TYPE_DEVMAP:
>         case BPF_MAP_TYPE_DEVMAP_HASH:
>         case BPF_MAP_TYPE_XSKMAP:
> -               if (!capable(CAP_NET_ADMIN))
> +               if (!bpf_net_capable())
>                         return -EPERM;
>                 break;
>         default:
> @@ -2599,7 +2604,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
>             !bpf_capable())
>                 return -EPERM;
>
> -       if (is_net_admin_prog_type(type) && !capable(CAP_NET_ADMIN) && !capable(CAP_SYS_ADMIN))
> +       if (is_net_admin_prog_type(type) && !bpf_net_capable())
>                 return -EPERM;
>         if (is_perfmon_prog_type(type) && !perfmon_capable())
>                 return -EPERM;
> @@ -3751,7 +3756,7 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
>         case BPF_PROG_TYPE_SK_LOOKUP:
>                 return attach_type == prog->expected_attach_type ? 0 : -EINVAL;
>         case BPF_PROG_TYPE_CGROUP_SKB:
> -               if (!capable(CAP_NET_ADMIN))
> +               if (!bpf_net_capable())
>                         /* cg-skb progs can be loaded by unpriv user.
>                          * check permissions at attach time.
>                          */
> @@ -3954,7 +3959,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
>  static int bpf_prog_query(const union bpf_attr *attr,
>                           union bpf_attr __user *uattr)
>  {
> -       if (!capable(CAP_NET_ADMIN))
> +       if (!bpf_net_capable())
>                 return -EPERM;
>         if (CHECK_ATTR(BPF_PROG_QUERY))
>                 return -EINVAL;
> --
> 2.34.1
>
>


-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 9/17] bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooks
  2023-11-03 19:05 ` [PATCH v9 bpf-next 09/17] bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooks Andrii Nakryiko
@ 2023-11-06  5:01   ` Paul Moore
  2023-11-06 19:03     ` Andrii Nakryiko
  0 siblings, 1 reply; 37+ messages in thread
From: Paul Moore @ 2023-11-06  5:01 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

On Nov  3, 2023 Andrii Nakryiko <andrii@kernel.org> wrote:
> 
> Based on upstream discussion ([0]), rework the existing
> bpf_prog_alloc_security LSM hook. Rename it to bpf_prog_load and, instead
> of passing bpf_prog_aux, pass a proper bpf_prog pointer for the full BPF
> program struct. Also, pass the bpf_attr union with all the user-provided
> arguments for the BPF_PROG_LOAD command.  This gives LSMs as much
> information as we can reasonably provide.
> 
> The hook is also BPF token-aware now, and optional bpf_token struct is
> passed as a third argument. bpf_prog_load LSM hook is called after
> a bunch of sanity checks were performed, bpf_prog and bpf_prog_aux were
> allocated and filled out, but right before performing full-fledged BPF
> verification step.
> 
> bpf_prog_free LSM hook is now accepting struct bpf_prog argument, for
> consistency. SELinux code is adjusted to all new names, types, and
> signatures.
> 
> Note, given that bpf_prog_load (previously bpf_prog_alloc) hook can be
> used by some LSMs to allocate extra security blob, but also by other
> LSMs to reject BPF program loading, we need to make sure that
> bpf_prog_free LSM hook is called after bpf_prog_load/bpf_prog_alloc one
> *even* if the hook itself returned error. If we don't do that, we run
> the risk of leaking memory. This seems to be possible today when
> combining SELinux and BPF LSM, as one example, depending on their
> relative ordering.
> 
> Also, for BPF LSM setup, add bpf_prog_load and bpf_prog_free to
> the sleepable LSM hooks list, as they are both executed in sleepable
> context. Also drop the bpf_prog_load hook from the untrusted list, as
> there is no issue with refcounting or anything else anymore that
> originally forced us to add it to the untrusted list in c0c852dd1876
> ("bpf: Do not mark certain LSM hook arguments as trusted"). We now
> trigger this hook much later and it should not be an issue anymore.

See my comment below, but it isn't clear to me if this means it is okay
to have `BTF_ID(func, bpf_lsm_bpf_prog_free)` called twice.  It probably
would be a good idea to get KP, the BPF LSM maintainer, to review this
change as well to make sure this looks good to him.

>   [0] https://lore.kernel.org/bpf/9fe88aef7deabbe87d3fc38c4aea3c69.paul@paul-moore.com/
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  include/linux/lsm_hook_defs.h |  5 +++--
>  include/linux/security.h      | 12 +++++++-----
>  kernel/bpf/bpf_lsm.c          |  5 +++--
>  kernel/bpf/syscall.c          | 25 +++++++++++++------------
>  security/security.c           | 25 +++++++++++++++----------
>  security/selinux/hooks.c      | 15 ++++++++-------
>  6 files changed, 49 insertions(+), 38 deletions(-)

...

> diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> index e14c822f8911..3e956f6302f3 100644
> --- a/kernel/bpf/bpf_lsm.c
> +++ b/kernel/bpf/bpf_lsm.c
> @@ -263,6 +263,8 @@ BTF_ID(func, bpf_lsm_bpf_map)
>  BTF_ID(func, bpf_lsm_bpf_map_alloc_security)
>  BTF_ID(func, bpf_lsm_bpf_map_free_security)
>  BTF_ID(func, bpf_lsm_bpf_prog)
> +BTF_ID(func, bpf_lsm_bpf_prog_load)
> +BTF_ID(func, bpf_lsm_bpf_prog_free)
>  BTF_ID(func, bpf_lsm_bprm_check_security)
>  BTF_ID(func, bpf_lsm_bprm_committed_creds)
>  BTF_ID(func, bpf_lsm_bprm_committing_creds)
> @@ -346,8 +348,7 @@ BTF_SET_END(sleepable_lsm_hooks)
>  
>  BTF_SET_START(untrusted_lsm_hooks)
>  BTF_ID(func, bpf_lsm_bpf_map_free_security)
> -BTF_ID(func, bpf_lsm_bpf_prog_alloc_security)
> -BTF_ID(func, bpf_lsm_bpf_prog_free_security)
> +BTF_ID(func, bpf_lsm_bpf_prog_free)
>  BTF_ID(func, bpf_lsm_file_alloc_security)
>  BTF_ID(func, bpf_lsm_file_free_security)
>  #ifdef CONFIG_SECURITY_NETWORK

It looks like you're calling the BTF_ID() macro on bpf_lsm_bpf_prog_free
twice?  I would have expected only one macro call for each of
bpf_prog_load and bpf_prog_free; is that a bad assumption?

> diff --git a/security/security.c b/security/security.c
> index dcb3e7014f9b..5773d446210e 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -5180,16 +5180,21 @@ int security_bpf_map_alloc(struct bpf_map *map)
>  }
>  
>  /**
> - * security_bpf_prog_alloc() - Allocate a bpf program LSM blob
> - * @aux: bpf program aux info struct
> + * security_bpf_prog_load() - Check if loading of BPF program is allowed
> + * @prog: BPF program object
> + * @attr: BPF syscall attributes used to create BPF program
> + * @token: BPF token used to grant user access to BPF subsystem
>   *
> - * Initialize the security field inside bpf program.
> + * Do a check when the kernel allocates BPF program object and is about to
> + * pass it to BPF verifier for additional correctness checks. This is also the
> + * point where LSM blob is allocated for LSMs that need them.

This is pretty nitpicky, but I'm guessing you may need to make another
revision to this patchset; if you do, please drop the BPF verifier remark
from the comment above.

Example: "Perform an access control check when the kernel loads a BPF
program and allocates the associated BPF program object.  This hook is
also responsible for allocating any required LSM state for the BPF
program."

>   * Return: Returns 0 on success, error on failure.
>   */
> -int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
> +int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
> +			   struct bpf_token *token)
>  {
> -	return call_int_hook(bpf_prog_alloc_security, 0, aux);
> +	return call_int_hook(bpf_prog_load, 0, prog, attr, token);
>  }

--
paul-moore.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 10/17] bpf,lsm: refactor bpf_map_alloc/bpf_map_free LSM hooks
  2023-11-03 19:05 ` [PATCH v9 bpf-next 10/17] bpf,lsm: refactor bpf_map_alloc/bpf_map_free " Andrii Nakryiko
@ 2023-11-06  5:01   ` Paul Moore
  0 siblings, 0 replies; 37+ messages in thread
From: Paul Moore @ 2023-11-06  5:01 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

On Nov  3, 2023 Andrii Nakryiko <andrii@kernel.org> wrote:
> 
> Similarly to bpf_prog_alloc LSM hook, rename and extend bpf_map_alloc
> hook into bpf_map_create, taking not just struct bpf_map, but also
> bpf_attr and bpf_token, to give a fuller context to LSMs.
> 
> Unlike bpf_prog_alloc, there is no need to move the hook around, as it
> currently is firing right before allocating BPF map ID and FD, which
> seems to be a sweet spot.
> 
> But, like the bpf_prog_alloc/bpf_prog_free combo, make sure that the
> bpf_map_free LSM hook is called even if the bpf_map_create hook returned
> an error: if a few LSMs are combined together, it could be that one LSM
> successfully allocated a security blob for its needs while a subsequent
> LSM rejected BPF map creation. The former LSM would still need to free
> up its LSM blob, so we need to ensure security_bpf_map_free() is called
> regardless of the outcome.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  include/linux/lsm_hook_defs.h |  5 +++--
>  include/linux/security.h      |  6 ++++--
>  kernel/bpf/bpf_lsm.c          |  6 +++---
>  kernel/bpf/syscall.c          |  4 ++--
>  security/security.c           | 16 ++++++++++------
>  security/selinux/hooks.c      |  7 ++++---
>  6 files changed, 26 insertions(+), 18 deletions(-)

Acked-by: Paul Moore <paul@paul-moore.com>

--
paul-moore.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 11/17] bpf,lsm: add BPF token LSM hooks
  2023-11-03 19:05 ` [PATCH v9 bpf-next 11/17] bpf,lsm: add BPF token " Andrii Nakryiko
  2023-11-04  0:36   ` kernel test robot
@ 2023-11-06  5:01   ` Paul Moore
  2023-11-06 19:17     ` Andrii Nakryiko
  1 sibling, 1 reply; 37+ messages in thread
From: Paul Moore @ 2023-11-06  5:01 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

On Nov  3, 2023 Andrii Nakryiko <andrii@kernel.org> wrote:
> 
> Wire up bpf_token_create and bpf_token_free LSM hooks, which allow LSMs
> to allocate a security blob (we add a `void *security` field to struct
> bpf_token for that), but also to control who can instantiate a BPF
> token. This follows the existing pattern for BPF map and BPF prog.
>
> Also add security_bpf_token_allow_cmd() and security_bpf_token_capable()
> LSM hooks that allow an LSM implementation to control and, if necessary,
> negate a BPF token's delegation of a specific bpf_cmd and capability,
> respectively.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  include/linux/bpf.h           |  3 ++
>  include/linux/lsm_hook_defs.h |  5 +++
>  include/linux/security.h      | 25 +++++++++++++++
>  kernel/bpf/bpf_lsm.c          |  4 +++
>  kernel/bpf/token.c            | 13 ++++++--
>  security/security.c           | 60 +++++++++++++++++++++++++++++++++++
>  6 files changed, 107 insertions(+), 3 deletions(-)

...

> diff --git a/include/linux/security.h b/include/linux/security.h
> index 08fd777cbe94..1d6edbf45d1c 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -60,6 +60,7 @@ struct fs_parameter;
>  enum fs_value_type;
>  struct watch;
>  struct watch_notification;
> +enum bpf_cmd;

Yes, I think it's fine to include bpf.h in security.h instead of the
forward declaration.

>  /* Default (no) options for the capable function */
>  #define CAP_OPT_NONE 0x0
> @@ -2031,6 +2032,11 @@ extern void security_bpf_map_free(struct bpf_map *map);
>  extern int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
>  				  struct bpf_token *token);
>  extern void security_bpf_prog_free(struct bpf_prog *prog);
> +extern int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
> +				     struct path *path);
> +extern void security_bpf_token_free(struct bpf_token *token);
> +extern int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
> +extern int security_bpf_token_capable(const struct bpf_token *token, int cap);
>  #else
>  static inline int security_bpf(int cmd, union bpf_attr *attr,
>  					     unsigned int size)
> @@ -2065,6 +2071,25 @@ static inline int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *
>  
>  static inline void security_bpf_prog_free(struct bpf_prog *prog)
>  { }
> +
> +static inline int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
> +				     struct path *path)
> +{
> +	return 0;
> +}
> +
> +static inline void security_bpf_token_free(struct bpf_token *token)
> +{ }
> +
> +static inline int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> +{
> +	return 0;
> +}
> +
> +static inline int security_bpf_token_capable(const struct bpf_token *token, int cap)
> +{
> +	return 0;
> +}

Another nitpick, but I would prefer security_bpf_token_allow_cmd() be
renamed to security_bpf_token_cmd(), both to shorten the name and to
better fit convention.  I realize the caller is named
bpf_token_allow_cmd(), but I'd still rather see the LSM hook with the
shorter name.
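
In other words, keeping the rest of the signature as is, the prototype
would become just (only the name changes, sketch of the rename only):

  extern int security_bpf_token_cmd(const struct bpf_token *token,
                                    enum bpf_cmd cmd);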

> diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> index 35e6f55c2a41..5d04da54faea 100644
> --- a/kernel/bpf/token.c
> +++ b/kernel/bpf/token.c
> @@ -7,11 +7,12 @@
>  #include <linux/idr.h>
>  #include <linux/namei.h>
>  #include <linux/user_namespace.h>
> +#include <linux/security.h>
>  
>  bool bpf_token_capable(const struct bpf_token *token, int cap)
>  {
>  	/* BPF token allows ns_capable() level of capabilities */
> -	if (token) {
> +	if (token && security_bpf_token_capable(token, cap) == 0) {
>  		if (ns_capable(token->userns, cap))
>  			return true;
>  		if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))

We typically perform the capability-based access controls prior to the
LSM controls, meaning that if we want the token controls to work in a
similar way we should do something like this:

  bool bpf_token_capable(...)
  {
    if (token) {
      if (ns_capable(token, cap) ||
          (cap != ADMIN && ns_capable(token, ADMIN)))
        return security_bpf_token_capable(token, cap);
    }
    return capable(cap) || (cap != ADMIN && capable(...))
  }
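
(For completeness, the same ordering spelled out in C, with the
int-returning LSM hook folded back into the bool result; a rough,
untested sketch only:)

  bool bpf_token_capable(const struct bpf_token *token, int cap)
  {
    if (token) {
      /* token grants ns_capable()-level caps, with LSM veto last */
      if (ns_capable(token->userns, cap) ||
          (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN)))
        return security_bpf_token_capable(token, cap) == 0;
    }
    /* no token (or token caps missing): fall back to init-userns caps */
    return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
  }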

> @@ -28,6 +29,7 @@ void bpf_token_inc(struct bpf_token *token)
>  
>  static void bpf_token_free(struct bpf_token *token)
>  {
> +	security_bpf_token_free(token);
>  	put_user_ns(token->userns);
>  	kvfree(token);
>  }
> @@ -172,6 +174,10 @@ int bpf_token_create(union bpf_attr *attr)
>  	token->allowed_progs = mnt_opts->delegate_progs;
>  	token->allowed_attachs = mnt_opts->delegate_attachs;
>  
> +	err = security_bpf_token_create(token, attr, &path);
> +	if (err)
> +		goto out_token;
> +
>  	fd = get_unused_fd_flags(O_CLOEXEC);
>  	if (fd < 0) {
>  		err = fd;
> @@ -216,8 +222,9 @@ bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
>  {
>  	if (!token)
>  		return false;
> -
> -	return token->allowed_cmds & (1ULL << cmd);
> +	if (!(token->allowed_cmds & (1ULL << cmd)))
> +		return false;
> +	return security_bpf_token_allow_cmd(token, cmd) == 0;

I'm not sure how much it really matters, but someone might prefer
the '!!' approach/style over '== 0'.

>  }
>  
>  bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type)

--
paul-moore.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 17/17] bpf,selinux: allocate bpf_security_struct per BPF token
  2023-11-03 19:05 ` [PATCH v9 bpf-next 17/17] bpf,selinux: allocate bpf_security_struct per BPF token Andrii Nakryiko
@ 2023-11-06  5:01   ` Paul Moore
  2023-11-06 19:18     ` Andrii Nakryiko
  0 siblings, 1 reply; 37+ messages in thread
From: Paul Moore @ 2023-11-06  5:01 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, brauner
  Cc: linux-fsdevel, linux-security-module, keescook, kernel-team, sargun

On Nov  3, 2023 Andrii Nakryiko <andrii@kernel.org> wrote:
> 
> Utilize the newly added bpf_token_create/bpf_token_free LSM hooks to
> allocate a struct bpf_security_struct for each BPF token object in
> SELinux. This just follows the same pattern used for BPF prog and map.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  security/selinux/hooks.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)

Thanks Andrii, we'll need some additional code to fully enable the
BPF tokens on a SELinux system but I can help provide that if you'd
like.  Although I might not be able to get to that until after the
merge window closes.

> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 002351ab67b7..1501e95366a1 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -6828,6 +6828,29 @@ static void selinux_bpf_prog_free(struct bpf_prog *prog)
>  	prog->aux->security = NULL;
>  	kfree(bpfsec);
>  }
> +
> +static int selinux_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
> +				    struct path *path)
> +{
> +	struct bpf_security_struct *bpfsec;
> +
> +	bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
> +	if (!bpfsec)
> +		return -ENOMEM;
> +
> +	bpfsec->sid = current_sid();
> +	token->security = bpfsec;
> +
> +	return 0;
> +}
> +
> +static void selinux_bpf_token_free(struct bpf_token *token)
> +{
> +	struct bpf_security_struct *bpfsec = token->security;
> +
> +	token->security = NULL;
> +	kfree(bpfsec);
> +}
>  #endif
>  
>  struct lsm_blob_sizes selinux_blob_sizes __ro_after_init = {
> @@ -7183,6 +7206,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
>  	LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
>  	LSM_HOOK_INIT(bpf_map_free, selinux_bpf_map_free),
>  	LSM_HOOK_INIT(bpf_prog_free, selinux_bpf_prog_free),
> +	LSM_HOOK_INIT(bpf_token_free, selinux_bpf_token_free),
>  #endif
>  
>  #ifdef CONFIG_PERF_EVENTS
> @@ -7241,6 +7265,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
>  #ifdef CONFIG_BPF_SYSCALL
>  	LSM_HOOK_INIT(bpf_map_create, selinux_bpf_map_create),
>  	LSM_HOOK_INIT(bpf_prog_load, selinux_bpf_prog_load),
> +	LSM_HOOK_INIT(bpf_token_create, selinux_bpf_token_create),
>  #endif
>  #ifdef CONFIG_PERF_EVENTS
>  	LSM_HOOK_INIT(perf_event_alloc, selinux_perf_event_alloc),
> -- 
> 2.34.1

--
paul-moore.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 9/17] bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooks
  2023-11-06  5:01   ` [PATCH v9 9/17] " Paul Moore
@ 2023-11-06 19:03     ` Andrii Nakryiko
  2023-11-06 22:29       ` Paul Moore
  0 siblings, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-06 19:03 UTC (permalink / raw)
  To: Paul Moore
  Cc: Andrii Nakryiko, bpf, netdev, brauner, linux-fsdevel,
	linux-security-module, keescook, kernel-team, sargun

On Sun, Nov 5, 2023 at 9:01 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Nov  3, 2023 Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > Based on upstream discussion ([0]), rework existing
> > bpf_prog_alloc_security LSM hook. Rename it to bpf_prog_load and instead
> > of passing bpf_prog_aux, pass proper bpf_prog pointer for a full BPF
> > program struct. Also, we pass bpf_attr union with all the user-provided
> > arguments for BPF_PROG_LOAD command.  This will give LSMs as much
> > information as we can basically provide.
> >
> > The hook is also BPF token-aware now, and optional bpf_token struct is
> > passed as a third argument. bpf_prog_load LSM hook is called after
> > a bunch of sanity checks were performed, bpf_prog and bpf_prog_aux were
> > allocated and filled out, but right before performing full-fledged BPF
> > verification step.
> >
> > bpf_prog_free LSM hook is now accepting struct bpf_prog argument, for
> > consistency. SELinux code is adjusted to all new names, types, and
> > signatures.
> >
> > Note, given that bpf_prog_load (previously bpf_prog_alloc) hook can be
> > used by some LSMs to allocate extra security blob, but also by other
> > LSMs to reject BPF program loading, we need to make sure that
> > bpf_prog_free LSM hook is called after bpf_prog_load/bpf_prog_alloc one
> > *even* if the hook itself returned error. If we don't do that, we run
> > the risk of leaking memory. This seems to be possible today when
> > combining SELinux and BPF LSM, as one example, depending on their
> > relative ordering.
> >
> > Also, for BPF LSM setup, add bpf_prog_load and bpf_prog_free to
> > sleepable LSM hooks list, as they are both executed in sleepable
> > context. Also drop bpf_prog_load hook from untrusted, as there is no
> > issue with refcount or anything else anymore, that originally forced us
> > to add it to untrusted list in c0c852dd1876 ("bpf: Do not mark certain LSM
> > hook arguments as trusted"). We now trigger this hook much later and it
> > should not be an issue anymore.
>
> See my comment below, but it isn't clear to me if this means it is okay
> to have `BTF_ID(func, bpf_lsm_bpf_prog_free)` called twice.  It probably
> would be a good idea to get KP, BPF LSM maintainer, to review this change
> as well to make sure this looks good to him.
>
> >   [0] https://lore.kernel.org/bpf/9fe88aef7deabbe87d3fc38c4aea3c69.paul@paul-moore.com/
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  include/linux/lsm_hook_defs.h |  5 +++--
> >  include/linux/security.h      | 12 +++++++-----
> >  kernel/bpf/bpf_lsm.c          |  5 +++--
> >  kernel/bpf/syscall.c          | 25 +++++++++++++------------
> >  security/security.c           | 25 +++++++++++++++----------
> >  security/selinux/hooks.c      | 15 ++++++++-------
> >  6 files changed, 49 insertions(+), 38 deletions(-)
>
> ...
>
> > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > index e14c822f8911..3e956f6302f3 100644
> > --- a/kernel/bpf/bpf_lsm.c
> > +++ b/kernel/bpf/bpf_lsm.c
> > @@ -263,6 +263,8 @@ BTF_ID(func, bpf_lsm_bpf_map)
> >  BTF_ID(func, bpf_lsm_bpf_map_alloc_security)
> >  BTF_ID(func, bpf_lsm_bpf_map_free_security)
> >  BTF_ID(func, bpf_lsm_bpf_prog)
> > +BTF_ID(func, bpf_lsm_bpf_prog_load)
> > +BTF_ID(func, bpf_lsm_bpf_prog_free)
> >  BTF_ID(func, bpf_lsm_bprm_check_security)
> >  BTF_ID(func, bpf_lsm_bprm_committed_creds)
> >  BTF_ID(func, bpf_lsm_bprm_committing_creds)
> > @@ -346,8 +348,7 @@ BTF_SET_END(sleepable_lsm_hooks)
> >
> >  BTF_SET_START(untrusted_lsm_hooks)
> >  BTF_ID(func, bpf_lsm_bpf_map_free_security)
> > -BTF_ID(func, bpf_lsm_bpf_prog_alloc_security)
> > -BTF_ID(func, bpf_lsm_bpf_prog_free_security)
> > +BTF_ID(func, bpf_lsm_bpf_prog_free)
> >  BTF_ID(func, bpf_lsm_file_alloc_security)
> >  BTF_ID(func, bpf_lsm_file_free_security)
> >  #ifdef CONFIG_SECURITY_NETWORK
>
> It looks like you're calling the BTF_ID() macro on bpf_lsm_bpf_prog_free
> twice?  I would have expected a only one macro call for each bpf_prog_load
> and bpf_prog_free, is that a bad assuption?
>

Yeah, there is no problem having multiple BTF_ID() invocations for the
same function. The BTF_ID() macro (conceptually) emits a relocation that
instructs the resolve_btfids tool to put the actual BTF ID of the
specified function into a designated 4-byte slot.

In this case, we have two separate lists: sleepable_lsm_hooks and
untrusted_lsm_hooks, so we need two separate BTF_ID() entries for the
same function. It's expected to be duplicated.
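
To illustrate, heavily trimmed, the two sets after this patch look
roughly like this:

  BTF_SET_START(sleepable_lsm_hooks)
  ...
  BTF_ID(func, bpf_lsm_bpf_prog_load)
  BTF_ID(func, bpf_lsm_bpf_prog_free)
  ...
  BTF_SET_END(sleepable_lsm_hooks)

  BTF_SET_START(untrusted_lsm_hooks)
  BTF_ID(func, bpf_lsm_bpf_map_free_security)
  BTF_ID(func, bpf_lsm_bpf_prog_free)
  ...
  BTF_SET_END(untrusted_lsm_hooks)

resolve_btfids fills in a separate BTF ID slot for each BTF_ID() entry,
so the same hook simply shows up once per set it belongs to.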

> > diff --git a/security/security.c b/security/security.c
> > index dcb3e7014f9b..5773d446210e 100644
> > --- a/security/security.c
> > +++ b/security/security.c
> > @@ -5180,16 +5180,21 @@ int security_bpf_map_alloc(struct bpf_map *map)
> >  }
> >
> >  /**
> > - * security_bpf_prog_alloc() - Allocate a bpf program LSM blob
> > - * @aux: bpf program aux info struct
> > + * security_bpf_prog_load() - Check if loading of BPF program is allowed
> > + * @prog: BPF program object
> > + * @attr: BPF syscall attributes used to create BPF program
> > + * @token: BPF token used to grant user access to BPF subsystem
> >   *
> > - * Initialize the security field inside bpf program.
> > + * Do a check when the kernel allocates BPF program object and is about to
> > + * pass it to BPF verifier for additional correctness checks. This is also the
> > + * point where LSM blob is allocated for LSMs that need them.
>
> This is pretty nitpicky, but I'm guessing you may need to make another
> revision to this patchset, if you do please drop the BPF verifier remark
> from the comment above.
>
> Example: "Perform an access control check when the kernel loads a BPF
> program and allocates the associated BPF program object.  This hook is
> also responsibile for allocating any required LSM state for the BPF
> program."

Done, no problem.

>
> >   * Return: Returns 0 on success, error on failure.
> >   */
> > -int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
> > +int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
> > +                        struct bpf_token *token)
> >  {
> > -     return call_int_hook(bpf_prog_alloc_security, 0, aux);
> > +     return call_int_hook(bpf_prog_load, 0, prog, attr, token);
> >  }
>
> --
> paul-moore.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 11/17] bpf,lsm: add BPF token LSM hooks
  2023-11-06  5:01   ` [PATCH v9 " Paul Moore
@ 2023-11-06 19:17     ` Andrii Nakryiko
  2023-11-06 22:46       ` Paul Moore
  0 siblings, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-06 19:17 UTC (permalink / raw)
  To: Paul Moore
  Cc: Andrii Nakryiko, bpf, netdev, brauner, linux-fsdevel,
	linux-security-module, keescook, kernel-team, sargun

On Sun, Nov 5, 2023 at 9:01 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Nov  3, 2023 Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > Wire up bpf_token_create and bpf_token_free LSM hooks, which allow to
> > allocate LSM security blob (we add `void *security` field to struct
> > bpf_token for that), but also control who can instantiate BPF token.
> > This follows existing pattern for BPF map and BPF prog.
> >
> > Also add security_bpf_token_allow_cmd() and security_bpf_token_capable()
> > LSM hooks that allow LSM implementation to control and negate (if
> > necessary) BPF token's delegation of a specific bpf_cmd and capability,
> > respectively.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  include/linux/bpf.h           |  3 ++
> >  include/linux/lsm_hook_defs.h |  5 +++
> >  include/linux/security.h      | 25 +++++++++++++++
> >  kernel/bpf/bpf_lsm.c          |  4 +++
> >  kernel/bpf/token.c            | 13 ++++++--
> >  security/security.c           | 60 +++++++++++++++++++++++++++++++++++
> >  6 files changed, 107 insertions(+), 3 deletions(-)
>
> ...
>
> > diff --git a/include/linux/security.h b/include/linux/security.h
> > index 08fd777cbe94..1d6edbf45d1c 100644
> > --- a/include/linux/security.h
> > +++ b/include/linux/security.h
> > @@ -60,6 +60,7 @@ struct fs_parameter;
> >  enum fs_value_type;
> >  struct watch;
> >  struct watch_notification;
> > +enum bpf_cmd;
>
> Yes, I think it's fine to include bpf.h in security.h instead of the
> forward declaration.
>
> >  /* Default (no) options for the capable function */
> >  #define CAP_OPT_NONE 0x0
> > @@ -2031,6 +2032,11 @@ extern void security_bpf_map_free(struct bpf_map *map);
> >  extern int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
> >                                 struct bpf_token *token);
> >  extern void security_bpf_prog_free(struct bpf_prog *prog);
> > +extern int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
> > +                                  struct path *path);
> > +extern void security_bpf_token_free(struct bpf_token *token);
> > +extern int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
> > +extern int security_bpf_token_capable(const struct bpf_token *token, int cap);
> >  #else
> >  static inline int security_bpf(int cmd, union bpf_attr *attr,
> >                                            unsigned int size)
> > @@ -2065,6 +2071,25 @@ static inline int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *
> >
> >  static inline void security_bpf_prog_free(struct bpf_prog *prog)
> >  { }
> > +
> > +static inline int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
> > +                                  struct path *path)
> > +{
> > +     return 0;
> > +}
> > +
> > +static inline void security_bpf_token_free(struct bpf_token *token)
> > +{ }
> > +
> > +static inline int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> > +{
> > +     return 0;
> > +}
> > +
> > +static inline int security_bpf_token_capable(const struct bpf_token *token, int cap)
> > +{
> > +     return 0;
> > +}
>
> Another nitpick, but I would prefer to shorten
> security_bpf_token_allow_cmd() renamed to security_bpf_token_cmd() both
> to shorten the name and to better fit convention.  I realize the caller
> is named bpf_token_allow_cmd() but I'd still rather see the LSM hook
> with the shorter name.

Makes sense, renamed to security_bpf_token_cmd() and updated hook name as well

>
> > diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> > index 35e6f55c2a41..5d04da54faea 100644
> > --- a/kernel/bpf/token.c
> > +++ b/kernel/bpf/token.c
> > @@ -7,11 +7,12 @@
> >  #include <linux/idr.h>
> >  #include <linux/namei.h>
> >  #include <linux/user_namespace.h>
> > +#include <linux/security.h>
> >
> >  bool bpf_token_capable(const struct bpf_token *token, int cap)
> >  {
> >       /* BPF token allows ns_capable() level of capabilities */
> > -     if (token) {
> > +     if (token && security_bpf_token_capable(token, cap) == 0) {
> >               if (ns_capable(token->userns, cap))
> >                       return true;
> >               if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
>
> We typically perform the capability based access controls prior to the
> LSM controls, meaning if we want to the token controls to work in a
> similar way we should do something like this:
>
>   bool bpf_token_capable(...)
>   {
>     if (token) {
>       if (ns_capable(token, cap) ||
>           (cap != ADMIN && ns_capable(token, ADMIN)))
>         return security_bpf_token_capable(token, cap);
>     }
>     return capable(cap) || (cap != ADMIN && capable(...))
>   }

yep, makes sense, I changed it as you suggested above

>
> > @@ -28,6 +29,7 @@ void bpf_token_inc(struct bpf_token *token)
> >
> >  static void bpf_token_free(struct bpf_token *token)
> >  {
> > +     security_bpf_token_free(token);
> >       put_user_ns(token->userns);
> >       kvfree(token);
> >  }
> > @@ -172,6 +174,10 @@ int bpf_token_create(union bpf_attr *attr)
> >       token->allowed_progs = mnt_opts->delegate_progs;
> >       token->allowed_attachs = mnt_opts->delegate_attachs;
> >
> > +     err = security_bpf_token_create(token, attr, &path);
> > +     if (err)
> > +             goto out_token;
> > +
> >       fd = get_unused_fd_flags(O_CLOEXEC);
> >       if (fd < 0) {
> >               err = fd;
> > @@ -216,8 +222,9 @@ bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> >  {
> >       if (!token)
> >               return false;
> > -
> > -     return token->allowed_cmds & (1ULL << cmd);
> > +     if (!(token->allowed_cmds & (1ULL << cmd)))
> > +             return false;
> > +     return security_bpf_token_allow_cmd(token, cmd) == 0;
>
> I'm not sure how much it really matters, but someone might prefer
> the '!!' approach/style over '== 0'.

it would have to be !security_bpf_token_cmd(), right? And that single
negation is just very confusing when dealing with an int-returning
function. I find it much easier to make sure the logic is correct when
we have explicit `== 0`.

Like, when I see `return !security_bpf_token_cmd(...);`, my immediate
read of that is "return whether bpf_token_cmd is not allowed" or
something along those lines, giving me a huge pause... I have the same
relationship with strcmp(), btw, while people seem totally fine with
`!strcmp()` (which to me also reads backwards).

Anyways, unless you really feel strongly, I'd keep == 0 here and above
for security_bpf_token_capable(), just because it's a conversion of an
int-returning function's result into a bool result.

>
> >  }
> >
> >  bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type)
>
> --
> paul-moore.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 17/17] bpf,selinux: allocate bpf_security_struct per BPF token
  2023-11-06  5:01   ` [PATCH v9 " Paul Moore
@ 2023-11-06 19:18     ` Andrii Nakryiko
  0 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-06 19:18 UTC (permalink / raw)
  To: Paul Moore
  Cc: Andrii Nakryiko, bpf, netdev, brauner, linux-fsdevel,
	linux-security-module, keescook, kernel-team, sargun

On Sun, Nov 5, 2023 at 9:01 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Nov  3, 2023 Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > Utilize newly added bpf_token_create/bpf_token_free LSM hooks to
> > allocate struct bpf_security_struct for each BPF token object in
> > SELinux. This just follows similar pattern for BPF prog and map.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  security/selinux/hooks.c | 25 +++++++++++++++++++++++++
> >  1 file changed, 25 insertions(+)
>
> Thanks Andrii, we'll need some additional code to fully enable the
> BPF tokens on a SELinux system but I can help provide that if you'd
> like.  Although I might not be able to get to that until after the
> merge window closes.

Yep, I'd appreciate your help with the SELinux side. Until after the
merge window is fine, yes.

Thanks for reviewing the patch set!

>
> > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > index 002351ab67b7..1501e95366a1 100644
> > --- a/security/selinux/hooks.c
> > +++ b/security/selinux/hooks.c
> > @@ -6828,6 +6828,29 @@ static void selinux_bpf_prog_free(struct bpf_prog *prog)
> >       prog->aux->security = NULL;
> >       kfree(bpfsec);
> >  }
> > +
> > +static int selinux_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
> > +                                 struct path *path)
> > +{
> > +     struct bpf_security_struct *bpfsec;
> > +
> > +     bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
> > +     if (!bpfsec)
> > +             return -ENOMEM;
> > +
> > +     bpfsec->sid = current_sid();
> > +     token->security = bpfsec;
> > +
> > +     return 0;
> > +}
> > +
> > +static void selinux_bpf_token_free(struct bpf_token *token)
> > +{
> > +     struct bpf_security_struct *bpfsec = token->security;
> > +
> > +     token->security = NULL;
> > +     kfree(bpfsec);
> > +}
> >  #endif
> >
> >  struct lsm_blob_sizes selinux_blob_sizes __ro_after_init = {
> > @@ -7183,6 +7206,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
> >       LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
> >       LSM_HOOK_INIT(bpf_map_free, selinux_bpf_map_free),
> >       LSM_HOOK_INIT(bpf_prog_free, selinux_bpf_prog_free),
> > +     LSM_HOOK_INIT(bpf_token_free, selinux_bpf_token_free),
> >  #endif
> >
> >  #ifdef CONFIG_PERF_EVENTS
> > @@ -7241,6 +7265,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
> >  #ifdef CONFIG_BPF_SYSCALL
> >       LSM_HOOK_INIT(bpf_map_create, selinux_bpf_map_create),
> >       LSM_HOOK_INIT(bpf_prog_load, selinux_bpf_prog_load),
> > +     LSM_HOOK_INIT(bpf_token_create, selinux_bpf_token_create),
> >  #endif
> >  #ifdef CONFIG_PERF_EVENTS
> >       LSM_HOOK_INIT(perf_event_alloc, selinux_perf_event_alloc),
> > --
> > 2.34.1
>
> --
> paul-moore.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 9/17] bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooks
  2023-11-06 19:03     ` Andrii Nakryiko
@ 2023-11-06 22:29       ` Paul Moore
  0 siblings, 0 replies; 37+ messages in thread
From: Paul Moore @ 2023-11-06 22:29 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, netdev, brauner, linux-fsdevel,
	linux-security-module, keescook, kernel-team, sargun

On Mon, Nov 6, 2023 at 2:03 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Sun, Nov 5, 2023 at 9:01 PM Paul Moore <paul@paul-moore.com> wrote:
> > On Nov  3, 2023 Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > Based on upstream discussion ([0]), rework existing
> > > bpf_prog_alloc_security LSM hook. Rename it to bpf_prog_load and instead
> > > of passing bpf_prog_aux, pass proper bpf_prog pointer for a full BPF
> > > program struct. Also, we pass bpf_attr union with all the user-provided
> > > arguments for BPF_PROG_LOAD command.  This will give LSMs as much
> > > information as we can basically provide.
> > >
> > > The hook is also BPF token-aware now, and optional bpf_token struct is
> > > passed as a third argument. bpf_prog_load LSM hook is called after
> > > a bunch of sanity checks were performed, bpf_prog and bpf_prog_aux were
> > > allocated and filled out, but right before performing full-fledged BPF
> > > verification step.
> > >
> > > bpf_prog_free LSM hook is now accepting struct bpf_prog argument, for
> > > consistency. SELinux code is adjusted to all new names, types, and
> > > signatures.
> > >
> > > Note, given that bpf_prog_load (previously bpf_prog_alloc) hook can be
> > > used by some LSMs to allocate extra security blob, but also by other
> > > LSMs to reject BPF program loading, we need to make sure that
> > > bpf_prog_free LSM hook is called after bpf_prog_load/bpf_prog_alloc one
> > > *even* if the hook itself returned error. If we don't do that, we run
> > > the risk of leaking memory. This seems to be possible today when
> > > combining SELinux and BPF LSM, as one example, depending on their
> > > relative ordering.
> > >
> > > Also, for BPF LSM setup, add bpf_prog_load and bpf_prog_free to
> > > sleepable LSM hooks list, as they are both executed in sleepable
> > > context. Also drop bpf_prog_load hook from untrusted, as there is no
> > > issue with refcount or anything else anymore, that originally forced us
> > > to add it to untrusted list in c0c852dd1876 ("bpf: Do not mark certain LSM
> > > hook arguments as trusted"). We now trigger this hook much later and it
> > > should not be an issue anymore.
> >
> > See my comment below, but it isn't clear to me if this means it is okay
> > to have `BTF_ID(func, bpf_lsm_bpf_prog_free)` called twice.  It probably
> > would be a good idea to get KP, BPF LSM maintainer, to review this change
> > as well to make sure this looks good to him.
> >
> > >   [0] https://lore.kernel.org/bpf/9fe88aef7deabbe87d3fc38c4aea3c69.paul@paul-moore.com/
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > >  include/linux/lsm_hook_defs.h |  5 +++--
> > >  include/linux/security.h      | 12 +++++++-----
> > >  kernel/bpf/bpf_lsm.c          |  5 +++--
> > >  kernel/bpf/syscall.c          | 25 +++++++++++++------------
> > >  security/security.c           | 25 +++++++++++++++----------
> > >  security/selinux/hooks.c      | 15 ++++++++-------
> > >  6 files changed, 49 insertions(+), 38 deletions(-)
> >
> > ...
> >
> > > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > > index e14c822f8911..3e956f6302f3 100644
> > > --- a/kernel/bpf/bpf_lsm.c
> > > +++ b/kernel/bpf/bpf_lsm.c
> > > @@ -263,6 +263,8 @@ BTF_ID(func, bpf_lsm_bpf_map)
> > >  BTF_ID(func, bpf_lsm_bpf_map_alloc_security)
> > >  BTF_ID(func, bpf_lsm_bpf_map_free_security)
> > >  BTF_ID(func, bpf_lsm_bpf_prog)
> > > +BTF_ID(func, bpf_lsm_bpf_prog_load)
> > > +BTF_ID(func, bpf_lsm_bpf_prog_free)
> > >  BTF_ID(func, bpf_lsm_bprm_check_security)
> > >  BTF_ID(func, bpf_lsm_bprm_committed_creds)
> > >  BTF_ID(func, bpf_lsm_bprm_committing_creds)
> > > @@ -346,8 +348,7 @@ BTF_SET_END(sleepable_lsm_hooks)
> > >
> > >  BTF_SET_START(untrusted_lsm_hooks)
> > >  BTF_ID(func, bpf_lsm_bpf_map_free_security)
> > > -BTF_ID(func, bpf_lsm_bpf_prog_alloc_security)
> > > -BTF_ID(func, bpf_lsm_bpf_prog_free_security)
> > > +BTF_ID(func, bpf_lsm_bpf_prog_free)
> > >  BTF_ID(func, bpf_lsm_file_alloc_security)
> > >  BTF_ID(func, bpf_lsm_file_free_security)
> > >  #ifdef CONFIG_SECURITY_NETWORK
> >
> > It looks like you're calling the BTF_ID() macro on bpf_lsm_bpf_prog_free
> > twice?  I would have expected a only one macro call for each bpf_prog_load
> > and bpf_prog_free, is that a bad assuption?
> >
>
> Yeah, there is no problem having multiple BTF_ID() invocations for the
> same function. BTF_ID() macro (conceptually) emits a relocation that
> will instruct resolve_btfids tool to put an actual BTF ID for the
> specified function in a designated 4-byte slot.
>
> In this case, we have two separate lists: sleepable_lsm_hooks and
> untrusted_lsm_hooks, so we need two separate BTF_ID() entries for the
> same function. It's expected to be duplicated.

Okay, thanks for the explanation.  It jumped out as a deviation from
the current code when I was looking at the changes and I was worried
that it was a typo, but it sounds like it's expected and the proper
thing to do.

> > > diff --git a/security/security.c b/security/security.c
> > > index dcb3e7014f9b..5773d446210e 100644
> > > --- a/security/security.c
> > > +++ b/security/security.c
> > > @@ -5180,16 +5180,21 @@ int security_bpf_map_alloc(struct bpf_map *map)
> > >  }
> > >
> > >  /**
> > > - * security_bpf_prog_alloc() - Allocate a bpf program LSM blob
> > > - * @aux: bpf program aux info struct
> > > + * security_bpf_prog_load() - Check if loading of BPF program is allowed
> > > + * @prog: BPF program object
> > > + * @attr: BPF syscall attributes used to create BPF program
> > > + * @token: BPF token used to grant user access to BPF subsystem
> > >   *
> > > - * Initialize the security field inside bpf program.
> > > + * Do a check when the kernel allocates BPF program object and is about to
> > > + * pass it to BPF verifier for additional correctness checks. This is also the
> > > + * point where LSM blob is allocated for LSMs that need them.
> >
> > This is pretty nitpicky, but I'm guessing you may need to make another
> > revision to this patchset, if you do please drop the BPF verifier remark
> > from the comment above.
> >
> > Example: "Perform an access control check when the kernel loads a BPF
> > program and allocates the associated BPF program object.  This hook is
> > also responsibile for allocating any required LSM state for the BPF
> > program."
>
> Done, no problem.

With the comment change above, and the clarification of the BTF_ID()
calls, feel free to add my ACK.

Acked-by: Paul Moore <paul@paul-moore.com>

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 11/17] bpf,lsm: add BPF token LSM hooks
  2023-11-06 19:17     ` Andrii Nakryiko
@ 2023-11-06 22:46       ` Paul Moore
  0 siblings, 0 replies; 37+ messages in thread
From: Paul Moore @ 2023-11-06 22:46 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, netdev, brauner, linux-fsdevel,
	linux-security-module, keescook, kernel-team, sargun

On Mon, Nov 6, 2023 at 2:17 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Sun, Nov 5, 2023 at 9:01 PM Paul Moore <paul@paul-moore.com> wrote:
> > On Nov  3, 2023 Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > Wire up bpf_token_create and bpf_token_free LSM hooks, which allow to
> > > allocate LSM security blob (we add `void *security` field to struct
> > > bpf_token for that), but also control who can instantiate BPF token.
> > > This follows existing pattern for BPF map and BPF prog.
> > >
> > > Also add security_bpf_token_allow_cmd() and security_bpf_token_capable()
> > > LSM hooks that allow LSM implementation to control and negate (if
> > > necessary) BPF token's delegation of a specific bpf_cmd and capability,
> > > respectively.
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > >  include/linux/bpf.h           |  3 ++
> > >  include/linux/lsm_hook_defs.h |  5 +++
> > >  include/linux/security.h      | 25 +++++++++++++++
> > >  kernel/bpf/bpf_lsm.c          |  4 +++
> > >  kernel/bpf/token.c            | 13 ++++++--
> > >  security/security.c           | 60 +++++++++++++++++++++++++++++++++++
> > >  6 files changed, 107 insertions(+), 3 deletions(-)
> >
> > ...
> >
> > > diff --git a/include/linux/security.h b/include/linux/security.h
> > > index 08fd777cbe94..1d6edbf45d1c 100644
> > > --- a/include/linux/security.h
> > > +++ b/include/linux/security.h
> > > @@ -60,6 +60,7 @@ struct fs_parameter;
> > >  enum fs_value_type;
> > >  struct watch;
> > >  struct watch_notification;
> > > +enum bpf_cmd;
> >
> > Yes, I think it's fine to include bpf.h in security.h instead of the
> > forward declaration.
> >
> > >  /* Default (no) options for the capable function */
> > >  #define CAP_OPT_NONE 0x0
> > > @@ -2031,6 +2032,11 @@ extern void security_bpf_map_free(struct bpf_map *map);
> > >  extern int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
> > >                                 struct bpf_token *token);
> > >  extern void security_bpf_prog_free(struct bpf_prog *prog);
> > > +extern int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
> > > +                                  struct path *path);
> > > +extern void security_bpf_token_free(struct bpf_token *token);
> > > +extern int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
> > > +extern int security_bpf_token_capable(const struct bpf_token *token, int cap);
> > >  #else
> > >  static inline int security_bpf(int cmd, union bpf_attr *attr,
> > >                                            unsigned int size)
> > > @@ -2065,6 +2071,25 @@ static inline int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *
> > >
> > >  static inline void security_bpf_prog_free(struct bpf_prog *prog)
> > >  { }
> > > +
> > > +static inline int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
> > > +                                  struct path *path)
> > > +{
> > > +     return 0;
> > > +}
> > > +
> > > +static inline void security_bpf_token_free(struct bpf_token *token)
> > > +{ }
> > > +
> > > +static inline int security_bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> > > +{
> > > +     return 0;
> > > +}
> > > +
> > > +static inline int security_bpf_token_capable(const struct bpf_token *token, int cap)
> > > +{
> > > +     return 0;
> > > +}
> >
> > Another nitpick, but I would prefer to shorten
> > security_bpf_token_allow_cmd() renamed to security_bpf_token_cmd() both
> > to shorten the name and to better fit convention.  I realize the caller
> > is named bpf_token_allow_cmd() but I'd still rather see the LSM hook
> > with the shorter name.
>
> Makes sense, renamed to security_bpf_token_cmd() and updated hook name as well

Thanks.

> > > diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> > > index 35e6f55c2a41..5d04da54faea 100644
> > > --- a/kernel/bpf/token.c
> > > +++ b/kernel/bpf/token.c
> > > @@ -7,11 +7,12 @@
> > >  #include <linux/idr.h>
> > >  #include <linux/namei.h>
> > >  #include <linux/user_namespace.h>
> > > +#include <linux/security.h>
> > >
> > >  bool bpf_token_capable(const struct bpf_token *token, int cap)
> > >  {
> > >       /* BPF token allows ns_capable() level of capabilities */
> > > -     if (token) {
> > > +     if (token && security_bpf_token_capable(token, cap) == 0) {
> > >               if (ns_capable(token->userns, cap))
> > >                       return true;
> > >               if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> >
> > We typically perform the capability based access controls prior to the
> > LSM controls, meaning if we want to the token controls to work in a
> > similar way we should do something like this:
> >
> >   bool bpf_token_capable(...)
> >   {
> >     if (token) {
> >       if (ns_capable(token, cap) ||
> >           (cap != ADMIN && ns_capable(token, ADMIN)))
> >         return security_bpf_token_capable(token, cap);
> >     }
> >     return capable(cap) || (cap != ADMIN && capable(...))
> >   }
>
> yep, makes sense, I changed it as you suggested above

Thanks again.

> > > @@ -28,6 +29,7 @@ void bpf_token_inc(struct bpf_token *token)
> > >
> > >  static void bpf_token_free(struct bpf_token *token)
> > >  {
> > > +     security_bpf_token_free(token);
> > >       put_user_ns(token->userns);
> > >       kvfree(token);
> > >  }
> > > @@ -172,6 +174,10 @@ int bpf_token_create(union bpf_attr *attr)
> > >       token->allowed_progs = mnt_opts->delegate_progs;
> > >       token->allowed_attachs = mnt_opts->delegate_attachs;
> > >
> > > +     err = security_bpf_token_create(token, attr, &path);
> > > +     if (err)
> > > +             goto out_token;
> > > +
> > >       fd = get_unused_fd_flags(O_CLOEXEC);
> > >       if (fd < 0) {
> > >               err = fd;
> > > @@ -216,8 +222,9 @@ bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> > >  {
> > >       if (!token)
> > >               return false;
> > > -
> > > -     return token->allowed_cmds & (1ULL << cmd);
> > > +     if (!(token->allowed_cmds & (1ULL << cmd)))
> > > +             return false;
> > > +     return security_bpf_token_allow_cmd(token, cmd) == 0;
> >
> > I'm not sure how much it really matters, but someone might prefer
> > the '!!' approach/style over '== 0'.
>
> it would have to be !security_bpf_token_cmd(), right?

Yeah :P

In most, although definitely not all, kernel functions, a return value
of 0 is considered the positive/success case, with non-zero values being
some sort of failure.  I must have defaulted to that logic here, but you
are correct that just a single negation would be needed.

> And that single
> negation is just very confusing when dealing with int-returning
> function. I find it much easier to make sure the logic is correct when
> we have explicit `== 0`.

That's fine, it's something I've seen mentioned over the years and
thought I might offer it as a comment.  I can read either approach
just fine :)

Anyway, with the other changes mentioned above, e.g. naming and
permission ordering, feel free to add my ACK.

Acked-by: Paul Moore <paul@paul-moore.com>

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 bpf-next 02/17] bpf: add BPF token delegation mount options to BPF FS
  2023-11-03 19:05 ` [PATCH v9 bpf-next 02/17] bpf: add BPF token delegation mount options to BPF FS Andrii Nakryiko
@ 2023-11-08 13:51   ` Christian Brauner
  2023-11-08 21:09     ` Andrii Nakryiko
  0 siblings, 1 reply; 37+ messages in thread
From: Christian Brauner @ 2023-11-08 13:51 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, paul, linux-fsdevel, linux-security-module,
	keescook, kernel-team, sargun

On Fri, Nov 03, 2023 at 12:05:08PM -0700, Andrii Nakryiko wrote:
> Add a few new mount options to BPF FS that allow specifying that a given
> BPF FS instance allows creation of BPF tokens (added in the next patch),
> and what sort of operations are allowed under a BPF token. As such, we
> get 4 new mount options, each a bit mask:
>   - `delegate_cmds` allows specifying which bpf() syscall commands are
>     allowed with a BPF token derived from this BPF FS instance;
>   - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
>     a set of allowable BPF map types that could be created with BPF token;
>   - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
>     a set of allowable BPF program types that could be loaded with BPF token;
>   - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
>     a set of allowable BPF program attach types that could be loaded with
>     BPF token; delegate_progs and delegate_attachs are meant to be used
>     together, as the full BPF program type is, in general, determined by
>     both the program type and the program attach type.
> 
> Currently, these mount options accept the following forms of values:
>   - a special value "any", that enables all possible values of a given
>   bit set;
>   - numeric value (decimal or hexadecimal, determined by kernel
>   automatically) that specifies a bit mask value directly;
>   - all the values for a given mount option are combined, if specified
>   multiple times. E.g., `mount -t bpf nodev /path/to/mount -o
>   delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3
>   mask.
> 
> Ideally, a more convenient (for humans) symbolic form derived from the
> corresponding UAPI enums would be accepted (e.g., `-o
> delegate_progs=kprobe|tracepoint`), and I intend to implement this, but
> it requires a bunch of UAPI header churn, so I postponed it until this
> feature lands upstream or at least there is a definite consensus that
> this feature is acceptable and is going to make it, just to minimize the
> amount of wasted effort and not increase the amount of non-essential
> code to be reviewed.
> 
> An attentive reader will notice that BPF FS is now marked as
> FS_USERNS_MOUNT, which theoretically makes it mountable inside a non-init
> user namespace as long as the process has sufficient *namespaced*
> capabilities within that user namespace. But in reality we still
> restrict BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in
> init userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added
> to allow creating a BPF FS context object (i.e., fsopen("bpf")) from
> inside an unprivileged process in a non-init userns, to capture that
> userns as the owning userns. It will still be required to pass this
> context object back to a privileged process to instantiate and mount it.
> 
> This manipulation is important, because capturing a non-init userns as
> the owning userns of a BPF FS instance (super block) allows using that
> userns to constrain the BPF token to that userns later on (see next
> patch). So creating BPF FS with delegation inside an unprivileged userns
> will restrict derived BPF token objects to only "work" inside that
> intended userns, making them scoped to the intended "container".
> 
> There is a set of selftests at the end of the patch set that simulates
> this sequence of steps and validates that everything works as intended.
> But careful review is requested to make sure there are no missed gaps in
> the implementation and testing.
> 
> All this is based on suggestions and discussions with Christian Brauner
> ([0]), to the best of my ability to follow all the implications.

"who will not be held responsible for any CVE future or present as he's
 not sure whether bpf token is a good idea in general"

I'm not opposing it because it's really not my subsystem. But it'd be
nice if you also added a disclaimer that I'm not endorsing this. :)

A comment below.

> 
>   [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  include/linux/bpf.h | 10 ++++++
>  kernel/bpf/inode.c  | 88 +++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 88 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index b4825d3cdb29..df50a7bf1a77 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1562,6 +1562,16 @@ struct bpf_link_primer {
>  	u32 id;
>  };
>  
> +struct bpf_mount_opts {
> +	umode_t mode;
> +
> +	/* BPF token-related delegation options */
> +	u64 delegate_cmds;
> +	u64 delegate_maps;
> +	u64 delegate_progs;
> +	u64 delegate_attachs;
> +};
> +
>  struct bpf_struct_ops_value;
>  struct btf_member;
>  
> diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
> index 1aafb2ff2e95..e49e93bc65e3 100644
> --- a/kernel/bpf/inode.c
> +++ b/kernel/bpf/inode.c
> @@ -20,6 +20,7 @@
>  #include <linux/filter.h>
>  #include <linux/bpf.h>
>  #include <linux/bpf_trace.h>
> +#include <linux/kstrtox.h>
>  #include "preload/bpf_preload.h"
>  
>  enum bpf_type {
> @@ -599,10 +600,31 @@ EXPORT_SYMBOL(bpf_prog_get_type_path);
>   */
>  static int bpf_show_options(struct seq_file *m, struct dentry *root)
>  {
> +	struct bpf_mount_opts *opts = root->d_sb->s_fs_info;
>  	umode_t mode = d_inode(root)->i_mode & S_IALLUGO & ~S_ISVTX;
>  
>  	if (mode != S_IRWXUGO)
>  		seq_printf(m, ",mode=%o", mode);
> +
> +	if (opts->delegate_cmds == ~0ULL)
> +		seq_printf(m, ",delegate_cmds=any");
> +	else if (opts->delegate_cmds)
> +		seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
> +
> +	if (opts->delegate_maps == ~0ULL)
> +		seq_printf(m, ",delegate_maps=any");
> +	else if (opts->delegate_maps)
> +		seq_printf(m, ",delegate_maps=0x%llx", opts->delegate_maps);
> +
> +	if (opts->delegate_progs == ~0ULL)
> +		seq_printf(m, ",delegate_progs=any");
> +	else if (opts->delegate_progs)
> +		seq_printf(m, ",delegate_progs=0x%llx", opts->delegate_progs);
> +
> +	if (opts->delegate_attachs == ~0ULL)
> +		seq_printf(m, ",delegate_attachs=any");
> +	else if (opts->delegate_attachs)
> +		seq_printf(m, ",delegate_attachs=0x%llx", opts->delegate_attachs);
>  	return 0;
>  }
>  
> @@ -626,22 +648,27 @@ static const struct super_operations bpf_super_ops = {
>  
>  enum {
>  	OPT_MODE,
> +	OPT_DELEGATE_CMDS,
> +	OPT_DELEGATE_MAPS,
> +	OPT_DELEGATE_PROGS,
> +	OPT_DELEGATE_ATTACHS,
>  };
>  
>  static const struct fs_parameter_spec bpf_fs_parameters[] = {
>  	fsparam_u32oct	("mode",			OPT_MODE),
> +	fsparam_string	("delegate_cmds",		OPT_DELEGATE_CMDS),
> +	fsparam_string	("delegate_maps",		OPT_DELEGATE_MAPS),
> +	fsparam_string	("delegate_progs",		OPT_DELEGATE_PROGS),
> +	fsparam_string	("delegate_attachs",		OPT_DELEGATE_ATTACHS),
>  	{}
>  };
>  
> -struct bpf_mount_opts {
> -	umode_t mode;
> -};
> -
>  static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
>  {
> -	struct bpf_mount_opts *opts = fc->fs_private;
> +	struct bpf_mount_opts *opts = fc->s_fs_info;
>  	struct fs_parse_result result;
> -	int opt;
> +	int opt, err;
> +	u64 msk;
>  
>  	opt = fs_parse(fc, bpf_fs_parameters, param, &result);
>  	if (opt < 0) {
> @@ -665,6 +692,25 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
>  	case OPT_MODE:
>  		opts->mode = result.uint_32 & S_IALLUGO;
>  		break;
> +	case OPT_DELEGATE_CMDS:
> +	case OPT_DELEGATE_MAPS:
> +	case OPT_DELEGATE_PROGS:
> +	case OPT_DELEGATE_ATTACHS:
> +		if (strcmp(param->string, "any") == 0) {
> +			msk = ~0ULL;
> +		} else {
> +			err = kstrtou64(param->string, 0, &msk);
> +			if (err)
> +				return err;
> +		}
> +		switch (opt) {
> +		case OPT_DELEGATE_CMDS: opts->delegate_cmds |= msk; break;
> +		case OPT_DELEGATE_MAPS: opts->delegate_maps |= msk; break;
> +		case OPT_DELEGATE_PROGS: opts->delegate_progs |= msk; break;
> +		case OPT_DELEGATE_ATTACHS: opts->delegate_attachs |= msk; break;
> +		default: return -EINVAL;
> +		}
> +		break;
>  	}

So just to repeat that this will allow a container to set its own
delegation options:

        # unprivileged container

        fd_fs = fsopen();
        fsconfig(fd_fs, FSCONFIG_BLA_BLA, "give-me-all-the-delegation");

        # Now hand off that fd_fs to a privileged process

        fsconfig(fd_fs, FSCONFIG_CMD_CREATE, ...)

This means the container manager can't be part of your threat model
because you need to trust it to set delegation options.

But if the container manager is part of your threat model then you can
never trust an fd_fs handed to you because the container manager might
have enabled arbitrary delegation privileges.

There's ways around this:

(1) kernel: Account for this in the kernel and require privileges when
    setting delegation options.
(2) userspace: A trusted helper that allocates an fs_context fd in
    the target user namespace, then sets delegation options and creates
    superblock.

(1) Is more restrictive but also more secure. (2) is less restrictive
but requires more care from userspace.
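
For (2), the privileged side can be fairly small. A rough, untested
sketch, assuming the unprivileged child already did
fd_fs = fsopen("bpf", 0) inside its userns and passed fd_fs over a unix
socket, and that the libc headers are new enough for the mount API
syscall numbers:

#include <unistd.h>
#include <sys/syscall.h>
#include <linux/mount.h>

/* privileged helper: fd_fs was fsopen()'ed by the unprivileged child in
 * its own user namespace, so the superblock created here is owned by
 * that userns
 */
static int bpffs_delegate_and_mount(int fd_fs)
{
        /* delegation options are set only by the trusted, privileged side */
        if (syscall(SYS_fsconfig, fd_fs, FSCONFIG_SET_STRING,
                    "delegate_cmds", "any", 0) < 0)
                return -1;
        if (syscall(SYS_fsconfig, fd_fs, FSCONFIG_CMD_CREATE, NULL, NULL, 0) < 0)
                return -1;
        /* detached mount fd; attach it or hand it back to the container */
        return syscall(SYS_fsmount, fd_fs, 0, 0);
}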

Either way I would probably consider writing a document detailing
various delegation scenarios and possible pitfalls and implications
before advertising it.

If you choose (2) then you also need to be aware that the security of
this also hinges on bpffs not allowing to reconfigure parameters once it
has been mounted. Otherwise an unprivileged container can change
delegation options.

I would recommend that you either add a dummy bpf_reconfigure() method
with a comment in it or you add a comment on top of bpf_context_ops.
Something like:

/*
 * Unprivileged mounts of bpffs are owned by the user namespace they are
 * mounted in. That means unprivileged users can change vfs mount
 * options (ro<->rw, nosuid, etc.).
 *
 * They currently cannot change bpffs specific mount options such as
 * delegation settings. If that is ever implemented it is necessary to
 * require privileges in the initial namespace. Otherwise unprivileged
 * users can change delegation options to whatever they want.
 */
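
Concretely, the dummy variant could be as small as this (sketch only,
on top of the ops already in this patch):

/*
 * Deliberately a no-op: bpffs-specific options (delegate_*) are not
 * applied on reconfigure. If that is ever implemented it must require
 * privileges in the initial user namespace, otherwise an unprivileged
 * container could rewrite its own delegation options.
 */
static int bpf_reconfigure(struct fs_context *fc)
{
        return 0;
}

static const struct fs_context_operations bpf_context_ops = {
        .free           = bpf_free_fc,
        .parse_param    = bpf_parse_param,
        .get_tree       = bpf_get_tree,
        .reconfigure    = bpf_reconfigure,
};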

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 bpf-next 03/17] bpf: introduce BPF token object
  2023-11-03 19:05 ` [PATCH v9 bpf-next 03/17] bpf: introduce BPF token object Andrii Nakryiko
@ 2023-11-08 14:28   ` Christian Brauner
  2023-11-08 21:09     ` Andrii Nakryiko
  0 siblings, 1 reply; 37+ messages in thread
From: Christian Brauner @ 2023-11-08 14:28 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, paul, linux-fsdevel, linux-security-module,
	keescook, kernel-team, sargun

On Fri, Nov 03, 2023 at 12:05:09PM -0700, Andrii Nakryiko wrote:
> Add a new kind of BPF kernel object, BPF token. BPF token is meant to
> allow delegating privileged BPF functionality, like loading a BPF
> program or creating a BPF map, from privileged process to a *trusted*
> unprivileged process, all while having a good amount of control over which
> privileged operations could be performed using provided BPF token.
> 
> This is achieved through mounting BPF FS instance with extra delegation
> mount options, which determine what operations are delegatable, and also
> constraining it to the owning user namespace (as mentioned in the
> previous patch).
> 
> BPF token itself is just a derivative from BPF FS and can be created
> through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts
> a path specification (using the usual fd + string path combo) to a BPF
> FS mount. Currently, BPF token "inherits" delegated command, map types,
> prog type, and attach type bit sets from BPF FS as is. In the future,
> having a BPF token as a separate object with its own FD, we can allow
> to further restrict BPF token's allowable set of things either at the creation
> time or after the fact, allowing the process to guard itself further
> from, e.g., unintentionally trying to load undesired kind of BPF
> programs. But for now we keep things simple and just copy bit sets as is.
> 
> When BPF token is created from BPF FS mount, we take reference to the
> BPF super block's owning user namespace, and then use that namespace for
> checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
> capabilities that are normally only checked against init userns (using
> capable()), but now we check them using ns_capable() instead (if BPF
> token is provided). See bpf_token_capable() for details.
> 
> Such setup means that BPF token in itself is not sufficient to grant BPF
> functionality. User namespaced process has to *also* have necessary
> combination of capabilities inside that user namespace. So while
> previously CAP_BPF was useless when granted within user namespace, now
> it gains a meaning and allows container managers and sys admins to have
> a flexible control over which processes can and need to use BPF
> functionality within the user namespace (i.e., container in practice).
> And BPF FS delegation mount options and derived BPF tokens serve as
> a per-container "flag" to grant overall ability to use bpf() (plus further
> restrict on which parts of bpf() syscalls are treated as namespaced).
> 
> Note also, BPF_TOKEN_CREATE command itself requires ns_capable(CAP_BPF)
> within the BPF FS owning user namespace, rounding out the ns_capable()
> story of BPF token.
> 
> The alternative to creating BPF token object was:
>   a) not having any extra object and just passing BPF FS path to each
>      relevant bpf() command. This seems suboptimal as it's racy (mount
>      under the same path might change in between checking it and using it
>      for bpf() command). And also less flexible if we'd like to further

I don't understand "mount under the same path might change in between
checking it and using it for bpf() command".

Just require userspace to open() the bpffs instance and pass that fd to
bpf() just as you're doing right now. If that is racy then the current
implementation is even more so because it is passing:

bpffs_path_fd
bpffs_pathname

and then performs a lookup. More on that below.

I want to point out that most of this code here is unnecessary if you
use the bpffs fd itself as a token. But that's your decision. I'm just
saying that I'm not sure the critique that it's racy is valid.
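
To make that concrete, a container-side caller would do roughly the
following with the v9 uapi as posted (untested sketch; assumes a
<linux/bpf.h> that already carries the BPF_TOKEN_CREATE additions from
this series):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

static int bpf_token_create_from(const char *bpffs_mnt)
{
        union bpf_attr attr;
        int mnt_fd, ret;

        /* pin the bpffs instance once; no further path lookups needed */
        mnt_fd = open(bpffs_mnt, O_PATH | O_CLOEXEC);
        if (mnt_fd < 0)
                return -1;

        memset(&attr, 0, sizeof(attr));
        attr.token_create.bpffs_path_fd = mnt_fd;
        /* empty path: the fd itself names the mount (LOOKUP_EMPTY) */
        attr.token_create.bpffs_pathname = (__u64)(unsigned long)"";

        ret = syscall(__NR_bpf, BPF_TOKEN_CREATE, &attr, sizeof(attr));
        close(mnt_fd);
        return ret;
}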

>      restrict ourselves compared to all the delegated functionality
>      allowed on BPF FS.
>   b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
>      a dedicated FD that would represent a token-like functionality. This
>      doesn't seem superior to having a proper bpf() command, so
>      BPF_TOKEN_CREATE was chosen.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  include/linux/bpf.h            |  41 +++++++
>  include/uapi/linux/bpf.h       |  39 +++++++
>  kernel/bpf/Makefile            |   2 +-
>  kernel/bpf/inode.c             |  17 ++-
>  kernel/bpf/syscall.c           |  17 +++
>  kernel/bpf/token.c             | 197 +++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  39 +++++++
>  7 files changed, 342 insertions(+), 10 deletions(-)
>  create mode 100644 kernel/bpf/token.c
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index df50a7bf1a77..5ab418c78876 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -51,6 +51,10 @@ struct module;
>  struct bpf_func_state;
>  struct ftrace_ops;
>  struct cgroup;
> +struct bpf_token;
> +struct user_namespace;
> +struct super_block;
> +struct inode;
>  
>  extern struct idr btf_idr;
>  extern spinlock_t btf_idr_lock;
> @@ -1572,6 +1576,13 @@ struct bpf_mount_opts {
>  	u64 delegate_attachs;
>  };
>  
> +struct bpf_token {
> +	struct work_struct work;
> +	atomic64_t refcnt;
> +	struct user_namespace *userns;
> +	u64 allowed_cmds;
> +};
> +
>  struct bpf_struct_ops_value;
>  struct btf_member;
>  
> @@ -2029,6 +2040,7 @@ static inline void bpf_enable_instrumentation(void)
>  	migrate_enable();
>  }
>  
> +extern const struct super_operations bpf_super_ops;
>  extern const struct file_operations bpf_map_fops;
>  extern const struct file_operations bpf_prog_fops;
>  extern const struct file_operations bpf_iter_fops;
> @@ -2163,6 +2175,8 @@ static inline void bpf_map_dec_elem_count(struct bpf_map *map)
>  
>  extern int sysctl_unprivileged_bpf_disabled;
>  
> +bool bpf_token_capable(const struct bpf_token *token, int cap);
> +
>  static inline bool bpf_allow_ptr_leaks(void)
>  {
>  	return perfmon_capable();
> @@ -2197,8 +2211,17 @@ int bpf_link_new_fd(struct bpf_link *link);
>  struct bpf_link *bpf_link_get_from_fd(u32 ufd);
>  struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
>  
> +void bpf_token_inc(struct bpf_token *token);
> +void bpf_token_put(struct bpf_token *token);
> +int bpf_token_create(union bpf_attr *attr);
> +struct bpf_token *bpf_token_get_from_fd(u32 ufd);
> +
> +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
> +
>  int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
>  int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
> +struct inode *bpf_get_inode(struct super_block *sb, const struct inode *dir,
> +			    umode_t mode);
>  
>  #define BPF_ITER_FUNC_PREFIX "bpf_iter_"
>  #define DEFINE_BPF_ITER_FUNC(target, args...)			\
> @@ -2561,6 +2584,24 @@ static inline int bpf_obj_get_user(const char __user *pathname, int flags)
>  	return -EOPNOTSUPP;
>  }
>  
> +static inline bool bpf_token_capable(const struct bpf_token *token, int cap)
> +{
> +	return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> +}
> +
> +static inline void bpf_token_inc(struct bpf_token *token)
> +{
> +}
> +
> +static inline void bpf_token_put(struct bpf_token *token)
> +{
> +}
> +
> +static inline struct bpf_token *bpf_token_get_from_fd(u32 ufd)
> +{
> +	return ERR_PTR(-EOPNOTSUPP);
> +}
> +
>  static inline void __dev_flush(void)
>  {
>  }
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 0f6cdf52b1da..f75f289f1749 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -847,6 +847,37 @@ union bpf_iter_link_info {
>   *		Returns zero on success. On error, -1 is returned and *errno*
>   *		is set appropriately.
>   *
> + * BPF_TOKEN_CREATE
> + *	Description
> + *		Create BPF token with embedded information about what
> + *		BPF-related functionality it allows:
> + *		- a set of allowed bpf() syscall commands;
> + *		- a set of allowed BPF map types to be created with
> + *		BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed;
> + *		- a set of allowed BPF program types and BPF program attach
> + *		types to be loaded with BPF_PROG_LOAD command, if
> + *		BPF_PROG_LOAD itself is allowed.
> + *
> + *		BPF token is created (derived) from an instance of BPF FS,
> + *		assuming it has necessary delegation mount options specified.
> + *		BPF FS mount is specified with openat()-style path FD + string.
> + *		This BPF token can be passed as an extra parameter to various
> + *		bpf() syscall commands to grant BPF subsystem functionality to
> + *		unprivileged processes.
> + *
> + *		When created, BPF token is "associated" with the owning
> + *		user namespace of BPF FS instance (super block) that it was
> + *		derived from, and subsequent BPF operations performed with
> + *		BPF token would be performing capabilities checks (i.e.,
> + *		CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within
> + *		that user namespace. Without BPF token, such capabilities
> + *		have to be granted in init user namespace, making bpf()
> + *		syscall incompatible with user namespace, for the most part.
> + *
> + *	Return
> + *		A new file descriptor (a nonnegative integer), or -1 if an
> + *		error occurred (in which case, *errno* is set appropriately).
> + *
>   * NOTES
>   *	eBPF objects (maps and programs) can be shared between processes.
>   *
> @@ -901,6 +932,8 @@ enum bpf_cmd {
>  	BPF_ITER_CREATE,
>  	BPF_LINK_DETACH,
>  	BPF_PROG_BIND_MAP,
> +	BPF_TOKEN_CREATE,
> +	__MAX_BPF_CMD,
>  };
>  
>  enum bpf_map_type {
> @@ -1709,6 +1742,12 @@ union bpf_attr {
>  		__u32		flags;		/* extra flags */
>  	} prog_bind_map;
>  
> +	struct { /* struct used by BPF_TOKEN_CREATE command */
> +		__u32		flags;
> +		__u32		bpffs_path_fd;
> +		__aligned_u64	bpffs_pathname;
> +	} token_create;
> +
>  } __attribute__((aligned(8)));
>  
>  /* The description below is an attempt at providing documentation to eBPF
> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> index f526b7573e97..4ce95acfcaa7 100644
> --- a/kernel/bpf/Makefile
> +++ b/kernel/bpf/Makefile
> @@ -6,7 +6,7 @@ cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse
>  endif
>  CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
>  
> -obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o
> +obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o token.o
>  obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
>  obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
>  obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
> diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
> index e49e93bc65e3..3418a0edf7b4 100644
> --- a/kernel/bpf/inode.c
> +++ b/kernel/bpf/inode.c
> @@ -99,9 +99,9 @@ static const struct inode_operations bpf_prog_iops = { };
>  static const struct inode_operations bpf_map_iops  = { };
>  static const struct inode_operations bpf_link_iops  = { };
>  
> -static struct inode *bpf_get_inode(struct super_block *sb,
> -				   const struct inode *dir,
> -				   umode_t mode)
> +struct inode *bpf_get_inode(struct super_block *sb,
> +			    const struct inode *dir,
> +			    umode_t mode)
>  {
>  	struct inode *inode;
>  
> @@ -602,11 +602,13 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
>  {
>  	struct bpf_mount_opts *opts = root->d_sb->s_fs_info;
>  	umode_t mode = d_inode(root)->i_mode & S_IALLUGO & ~S_ISVTX;
> +	u64 mask;
>  
>  	if (mode != S_IRWXUGO)
>  		seq_printf(m, ",mode=%o", mode);
>  
> -	if (opts->delegate_cmds == ~0ULL)
> +	mask = (1ULL << __MAX_BPF_CMD) - 1;
> +	if ((opts->delegate_cmds & mask) == mask)
>  		seq_printf(m, ",delegate_cmds=any");
>  	else if (opts->delegate_cmds)
>  		seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
> @@ -639,7 +641,7 @@ static void bpf_free_inode(struct inode *inode)
>  	free_inode_nonrcu(inode);
>  }
>  
> -static const struct super_operations bpf_super_ops = {
> +const struct super_operations bpf_super_ops = {
>  	.statfs		= simple_statfs,
>  	.drop_inode	= generic_delete_inode,
>  	.show_options	= bpf_show_options,
> @@ -814,10 +816,7 @@ static int bpf_get_tree(struct fs_context *fc)
>  
>  static void bpf_free_fc(struct fs_context *fc)
>  {
> -	struct bpf_mount_opts *opts = fc->s_fs_info;
> -
> -	if (opts)
> -		kfree(opts);
> +	kfree(fc->s_fs_info);
>  }
>  
>  static const struct fs_context_operations bpf_context_ops = {
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index ad4d8e433ccc..0565e46509a6 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -5346,6 +5346,20 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
>  	return ret;
>  }
>  
> +#define BPF_TOKEN_CREATE_LAST_FIELD token_create.bpffs_pathname
> +
> +static int token_create(union bpf_attr *attr)
> +{
> +	if (CHECK_ATTR(BPF_TOKEN_CREATE))
> +		return -EINVAL;
> +
> +	/* no flags are supported yet */
> +	if (attr->token_create.flags)
> +		return -EINVAL;
> +
> +	return bpf_token_create(attr);
> +}
> +
>  static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
>  {
>  	union bpf_attr attr;
> @@ -5479,6 +5493,9 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
>  	case BPF_PROG_BIND_MAP:
>  		err = bpf_prog_bind_map(&attr);
>  		break;
> +	case BPF_TOKEN_CREATE:
> +		err = token_create(&attr);
> +		break;
>  	default:
>  		err = -EINVAL;
>  		break;
> diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> new file mode 100644
> index 000000000000..32fa395f829b
> --- /dev/null
> +++ b/kernel/bpf/token.c
> @@ -0,0 +1,197 @@
> +#include <linux/bpf.h>
> +#include <linux/vmalloc.h>
> +#include <linux/fdtable.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/kernel.h>
> +#include <linux/idr.h>
> +#include <linux/namei.h>
> +#include <linux/user_namespace.h>
> +
> +bool bpf_token_capable(const struct bpf_token *token, int cap)
> +{
> +	/* BPF token allows ns_capable() level of capabilities */
> +	if (token) {
> +		if (ns_capable(token->userns, cap))
> +			return true;
> +		if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> +			return true;
> +	}
> +	/* otherwise fallback to capable() checks */
> +	return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> +}
> +
> +void bpf_token_inc(struct bpf_token *token)
> +{
> +	atomic64_inc(&token->refcnt);
> +}
> +
> +static void bpf_token_free(struct bpf_token *token)
> +{
> +	put_user_ns(token->userns);
> +	kvfree(token);
> +}
> +
> +static void bpf_token_put_deferred(struct work_struct *work)
> +{
> +	struct bpf_token *token = container_of(work, struct bpf_token, work);
> +
> +	bpf_token_free(token);
> +}
> +
> +void bpf_token_put(struct bpf_token *token)
> +{
> +	if (!token)
> +		return;
> +
> +	if (!atomic64_dec_and_test(&token->refcnt))
> +		return;
> +
> +	INIT_WORK(&token->work, bpf_token_put_deferred);
> +	schedule_work(&token->work);
> +}
> +
> +static int bpf_token_release(struct inode *inode, struct file *filp)
> +{
> +	struct bpf_token *token = filp->private_data;
> +
> +	bpf_token_put(token);
> +	return 0;
> +}
> +
> +static void bpf_token_show_fdinfo(struct seq_file *m, struct file *filp)
> +{
> +	struct bpf_token *token = filp->private_data;
> +	u64 mask;
> +
> +	BUILD_BUG_ON(__MAX_BPF_CMD >= 64);
> +	mask = (1ULL << __MAX_BPF_CMD) - 1;
> +	if ((token->allowed_cmds & mask) == mask)
> +		seq_printf(m, "allowed_cmds:\tany\n");
> +	else
> +		seq_printf(m, "allowed_cmds:\t0x%llx\n", token->allowed_cmds);
> +}
> +
> +#define BPF_TOKEN_INODE_NAME "bpf-token"
> +
> +static const struct inode_operations bpf_token_iops = { };
> +
> +static const struct file_operations bpf_token_fops = {
> +	.release	= bpf_token_release,
> +	.show_fdinfo	= bpf_token_show_fdinfo,
> +};
> +
> +int bpf_token_create(union bpf_attr *attr)
> +{
> +	struct bpf_mount_opts *mnt_opts;
> +	struct bpf_token *token = NULL;
> +	struct user_namespace *userns;
> +	struct inode *inode;
> +	struct file *file;
> +	struct path path;
> +	umode_t mode;
> +	int err, fd;
> +
> +	err = user_path_at(attr->token_create.bpffs_path_fd,
> +			   u64_to_user_ptr(attr->token_create.bpffs_pathname),
> +			   LOOKUP_FOLLOW | LOOKUP_EMPTY, &path);

Do you really need bpffs_path_fd and bpffs_pathname?
This seems unnecessary as you're forcing a lookup that's best done in
userspace through regular open() APIs. So I would just make this:

struct { /* struct used by BPF_TOKEN_CREATE command */
	__u32		flags;
	__u32		bpffs_path_fd;
} token_create;

In bpf_token_create() you can then just do:

        struct fd f;
        struct path path;

        f = fdget(attr->token_create.bpffs_path_fd);
        if (!f.file)
                return -EBADF;

        path = f.file->f_path;
        path_get(&path);
        fdput(f);

> +	if (err)
> +		return err;
> +
> +	if (path.mnt->mnt_root != path.dentry) {
> +		err = -EINVAL;
> +		goto out_path;
> +	}
> +	if (path.mnt->mnt_sb->s_op != &bpf_super_ops) {
> +		err = -EINVAL;
> +		goto out_path;
> +	}
> +	err = path_permission(&path, MAY_ACCESS);
> +	if (err)
> +		goto out_path;
> +
> +	userns = path.dentry->d_sb->s_user_ns;
> +	if (!ns_capable(userns, CAP_BPF)) {
> +		err = -EPERM;
> +		goto out_path;
> +	}
> +
> +	mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
> +	inode = bpf_get_inode(path.mnt->mnt_sb, NULL, mode);
> +	if (IS_ERR(inode)) {
> +		err = PTR_ERR(inode);
> +		goto out_path;
> +	}
> +
> +	inode->i_op = &bpf_token_iops;
> +	inode->i_fop = &bpf_token_fops;
> +	clear_nlink(inode); /* make sure it is unlinked */
> +
> +	file = alloc_file_pseudo(inode, path.mnt, BPF_TOKEN_INODE_NAME, O_RDWR, &bpf_token_fops);
> +	if (IS_ERR(file)) {
> +		iput(inode);
> +		err = PTR_ERR(file);
> +		goto out_path;
> +	}
> +
> +	token = kvzalloc(sizeof(*token), GFP_USER);
> +	if (!token) {
> +		err = -ENOMEM;
> +		goto out_file;
> +	}
> +
> +	atomic64_set(&token->refcnt, 1);
> +
> +	/* remember bpffs owning userns for future ns_capable() checks */
> +	token->userns = get_user_ns(userns);
> +
> +	mnt_opts = path.dentry->d_sb->s_fs_info;
> +	token->allowed_cmds = mnt_opts->delegate_cmds;
> +
> +	fd = get_unused_fd_flags(O_CLOEXEC);
> +	if (fd < 0) {
> +		err = fd;
> +		goto out_token;
> +	}
> +
> +	file->private_data = token;
> +	fd_install(fd, file);
> +
> +	path_put(&path);
> +	return fd;
> +
> +out_token:
> +	bpf_token_free(token);
> +out_file:
> +	fput(file);
> +out_path:
> +	path_put(&path);
> +	return err;
> +}
> +
> +struct bpf_token *bpf_token_get_from_fd(u32 ufd)
> +{
> +	struct fd f = fdget(ufd);
> +	struct bpf_token *token;
> +
> +	if (!f.file)
> +		return ERR_PTR(-EBADF);
> +	if (f.file->f_op != &bpf_token_fops) {
> +		fdput(f);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	token = f.file->private_data;
> +	bpf_token_inc(token);
> +	fdput(f);
> +
> +	return token;
> +}
> +
> +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> +{
> +	if (!token)
> +		return false;
> +
> +	return token->allowed_cmds & (1ULL << cmd);
> +}
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 0f6cdf52b1da..f75f289f1749 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -847,6 +847,37 @@ union bpf_iter_link_info {
>   *		Returns zero on success. On error, -1 is returned and *errno*
>   *		is set appropriately.
>   *
> + * BPF_TOKEN_CREATE
> + *	Description
> + *		Create BPF token with embedded information about what
> + *		BPF-related functionality it allows:
> + *		- a set of allowed bpf() syscall commands;
> + *		- a set of allowed BPF map types to be created with
> + *		BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed;
> + *		- a set of allowed BPF program types and BPF program attach
> + *		types to be loaded with BPF_PROG_LOAD command, if
> + *		BPF_PROG_LOAD itself is allowed.
> + *
> + *		BPF token is created (derived) from an instance of BPF FS,
> + *		assuming it has necessary delegation mount options specified.
> + *		BPF FS mount is specified with openat()-style path FD + string.
> + *		This BPF token can be passed as an extra parameter to various
> + *		bpf() syscall commands to grant BPF subsystem functionality to
> + *		unprivileged processes.
> + *
> + *		When created, BPF token is "associated" with the owning
> + *		user namespace of BPF FS instance (super block) that it was
> + *		derived from, and subsequent BPF operations performed with
> + *		BPF token would be performing capabilities checks (i.e.,
> + *		CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within
> + *		that user namespace. Without BPF token, such capabilities
> + *		have to be granted in init user namespace, making bpf()
> + *		syscall incompatible with user namespace, for the most part.
> + *
> + *	Return
> + *		A new file descriptor (a nonnegative integer), or -1 if an
> + *		error occurred (in which case, *errno* is set appropriately).
> + *
>   * NOTES
>   *	eBPF objects (maps and programs) can be shared between processes.
>   *
> @@ -901,6 +932,8 @@ enum bpf_cmd {
>  	BPF_ITER_CREATE,
>  	BPF_LINK_DETACH,
>  	BPF_PROG_BIND_MAP,
> +	BPF_TOKEN_CREATE,
> +	__MAX_BPF_CMD,
>  };
>  
>  enum bpf_map_type {
> @@ -1709,6 +1742,12 @@ union bpf_attr {
>  		__u32		flags;		/* extra flags */
>  	} prog_bind_map;
>  
> +	struct { /* struct used by BPF_TOKEN_CREATE command */
> +		__u32		flags;
> +		__u32		bpffs_path_fd;
> +		__aligned_u64	bpffs_pathname;
> +	} token_create;
> +
>  } __attribute__((aligned(8)));
>  
>  /* The description below is an attempt at providing documentation to eBPF
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 bpf-next 02/17] bpf: add BPF token delegation mount options to BPF FS
  2023-11-08 13:51   ` Christian Brauner
@ 2023-11-08 21:09     ` Andrii Nakryiko
  2023-11-09  8:47       ` Christian Brauner
  0 siblings, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-08 21:09 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, keescook, kernel-team, sargun

On Wed, Nov 8, 2023 at 5:51 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Fri, Nov 03, 2023 at 12:05:08PM -0700, Andrii Nakryiko wrote:
> > Add a few new mount options to BPF FS that allow specifying that a given
> > BPF FS instance allows creation of BPF token (added in the next patch),
> > and what sort of operations are allowed under BPF token. As such, we get
> > 4 new mount options, each is a bit mask
> >   - `delegate_cmds` allow to specify which bpf() syscall commands are
> >     allowed with BPF token derived from this BPF FS instance;
> >   - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
> >     a set of allowable BPF map types that could be created with BPF token;
> >   - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
> >     a set of allowable BPF program types that could be loaded with BPF token;
> >   - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
> >     a set of allowable BPF program attach types that could be loaded with
> >     BPF token; delegate_progs and delegate_attachs are meant to be used
> >     together, as full BPF program type is, in general, determined
> >     through both program type and program attach type.
> >
> > Currently, these mount options accept the following forms of values:
> >   - a special value "any", that enables all possible values of a given
> >   bit set;
> >   - numeric value (decimal or hexadecimal, determined by kernel
> >   automatically) that specifies a bit mask value directly;
> >   - all the values for a given mount option are combined, if specified
> >   multiple times. E.g., `mount -t bpf nodev /path/to/mount -o
> >   delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3
> >   mask.
> >
> > Ideally, more convenient (for humans) symbolic form derived from
> > corresponding UAPI enums would be accepted (e.g., `-o
> > delegate_progs=kprobe|tracepoint`) and I intend to implement this, but
> > it requires a bunch of UAPI header churn, so I postponed it until this
> > feature lands upstream or at least there is a definite consensus that
> > this feature is acceptable and is going to make it, just to minimize
> > amount of wasted effort and not increase amount of non-essential code to
> > be reviewed.
> >
> > Attentive reader will notice that BPF FS is now marked as
> > FS_USERNS_MOUNT, which theoretically makes it mountable inside non-init
> > user namespace as long as the process has sufficient *namespaced*
> > capabilities within that user namespace. But in reality we still
> > restrict BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in
> > init userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added
> > to allow creating BPF FS context object (i.e., fsopen("bpf")) from
> > inside unprivileged process inside non-init userns, to capture that
> > userns as the owning userns. It will still be required to pass this
> > context object back to privileged process to instantiate and mount it.
> >
> > This manipulation is important, because capturing non-init userns as the
> > owning userns of BPF FS instance (super block) allows to use that userns
> > to constrain BPF token to that userns later on (see next patch). So
> > creating BPF FS with delegation inside unprivileged userns will restrict
> > derived BPF token objects to only "work" inside that intended userns,
> > making it scoped to an intended "container".
> >
> > There is a set of selftests at the end of the patch set that simulates
> > this sequence of steps and validates that everything works as intended.
> > But careful review is requested to make sure there are no missed gaps in
> > the implementation and testing.
> >
> > All this is based on suggestions and discussions with Christian Brauner
> > ([0]), to the best of my ability to follow all the implications.
>
> "who will not be held responsible for any CVE future or present as he's
>  not sure whether bpf token is a good idea in general"
>
> I'm not opposing it because it's really not my subsystem. But it'd be
> nice if you also added a disclaimer that I'm not endorsing this. :)
>

Sure, I'll clarify. I still appreciate your reviewing everything and
pointing out all the gotchas (like the reconfiguration and other
stuff), thanks!

> A comment below.
>
> >
> >   [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  include/linux/bpf.h | 10 ++++++
> >  kernel/bpf/inode.c  | 88 +++++++++++++++++++++++++++++++++++++++------
> >  2 files changed, 88 insertions(+), 10 deletions(-)
> >

[...]

> >       opt = fs_parse(fc, bpf_fs_parameters, param, &result);
> >       if (opt < 0) {
> > @@ -665,6 +692,25 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
> >       case OPT_MODE:
> >               opts->mode = result.uint_32 & S_IALLUGO;
> >               break;
> > +     case OPT_DELEGATE_CMDS:
> > +     case OPT_DELEGATE_MAPS:
> > +     case OPT_DELEGATE_PROGS:
> > +     case OPT_DELEGATE_ATTACHS:
> > +             if (strcmp(param->string, "any") == 0) {
> > +                     msk = ~0ULL;
> > +             } else {
> > +                     err = kstrtou64(param->string, 0, &msk);
> > +                     if (err)
> > +                             return err;
> > +             }
> > +             switch (opt) {
> > +             case OPT_DELEGATE_CMDS: opts->delegate_cmds |= msk; break;
> > +             case OPT_DELEGATE_MAPS: opts->delegate_maps |= msk; break;
> > +             case OPT_DELEGATE_PROGS: opts->delegate_progs |= msk; break;
> > +             case OPT_DELEGATE_ATTACHS: opts->delegate_attachs |= msk; break;
> > +             default: return -EINVAL;
> > +             }
> > +             break;
> >       }
>
> So just to repeat that this will allow a container to set its own
> delegation options:
>
>         # unprivileged container
>
>         fd_fs = fsopen();
>         fsconfig(fd_fs, FSCONFIG_BLA_BLA, "give-me-all-the-delegation");
>
>         # Now hand off that fd_fs to a privileged process
>
>         fsconfig(fd_fs, FSCONFIG_CMD_CREATE, ...)
>
> This means the container manager can't be part of your threat model
> because you need to trust it to set delegation options.
>
> But if the container manager is part of your threat model then you can
> never trust an fd_fs handed to you because the container manager might
> have enabled arbitrary delegation privileges.
>
> There's ways around this:
>
> (1) kernel: Account for this in the kernel and require privileges when
>     setting delegation options.

What sort of privilege would that be? We are in an unprivileged user
namespace, so that would have to be some ns_capable() checks or
something? I can add ns_capable(CAP_BPF), but what else did you have
in mind?

I think even if we say that privileged parent does FSCONFIG_SET_STRING
and unprivileged child just does sys_fsopen("bpf", 0) and nothing
more, we still can't be sure that child won't race with parent and set
FSCONFIG_SET_STRING at the same time. Because they both have access to
the same fs_fd.

> (2) userspace: A trusted helper that allocates an fs_context fd in
>     the target user namespace, then sets delegation options and creates
>     superblock.
>
> (1) Is more restrictive but also more secure. (2) is less restrictive
> but requires more care from userspace.
>
> Either way I would probably consider writing a document detailing
> various delegation scenarios and possible pitfalls and implications
> before advertising it.
>
> If you choose (2) then you also need to be aware that the security of
> this also hinges on bpffs not allowing to reconfigure parameters once it
> has been mounted. Otherwise an unprivileged container can change
> delegation options.
>
> I would recommend that you either add a dummy bpf_reconfigure() method
> with a comment in it or you add a comment on top of bpf_context_ops.
> Something like:
>
> /*
>  * Unprivileged mounts of bpffs are owned by the user namespace they are
>  * mounted in. That means unprivileged users can change vfs mount
>  * options (ro<->rw, nosuid, etc.).
>  *
>  * They currently cannot change bpffs specific mount options such as
>  * delegation settings. If that is ever implemented it is necessary to
> >  * require privileges in the initial namespace. Otherwise unprivileged
>  * users can change delegation options to whatever they want.
>  */

Yep, I will add a custom callback. I think we can allow reconfiguring
towards less permissive delegation subset, but I'll need to look at
the specifics to see if we can support that easily.
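
Something along these lines is what I have in mind. Very rough and
untested, and it assumes the freshly parsed options land in
fc->s_fs_info while the live ones stay in the superblock's s_fs_info (a
real version would also have to distinguish "option not given" from an
explicit 0):

static int bpf_reconfigure(struct fs_context *fc)
{
        struct bpf_mount_opts *want = fc->s_fs_info;
        struct bpf_mount_opts *have = fc->root->d_sb->s_fs_info;

        /* fully privileged callers may reconfigure freely */
        if (capable(CAP_SYS_ADMIN))
                goto apply;

        /* everyone else may only drop delegation bits, never add them */
        if ((want->delegate_cmds & ~have->delegate_cmds) ||
            (want->delegate_maps & ~have->delegate_maps) ||
            (want->delegate_progs & ~have->delegate_progs) ||
            (want->delegate_attachs & ~have->delegate_attachs))
                return -EPERM;
apply:
        have->delegate_cmds = want->delegate_cmds;
        have->delegate_maps = want->delegate_maps;
        have->delegate_progs = want->delegate_progs;
        have->delegate_attachs = want->delegate_attachs;
        return 0;
}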

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 bpf-next 03/17] bpf: introduce BPF token object
  2023-11-08 14:28   ` Christian Brauner
@ 2023-11-08 21:09     ` Andrii Nakryiko
  0 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-08 21:09 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, keescook, kernel-team, sargun

On Wed, Nov 8, 2023 at 6:28 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Fri, Nov 03, 2023 at 12:05:09PM -0700, Andrii Nakryiko wrote:
> > Add a new kind of BPF kernel object, BPF token. BPF token is meant to
> > allow delegating privileged BPF functionality, like loading a BPF
> > program or creating a BPF map, from privileged process to a *trusted*
> > unprivileged process, all while having a good amount of control over which
> > privileged operations could be performed using provided BPF token.
> >
> > This is achieved through mounting BPF FS instance with extra delegation
> > mount options, which determine what operations are delegatable, and also
> > constraining it to the owning user namespace (as mentioned in the
> > previous patch).
> >
> > BPF token itself is just a derivative from BPF FS and can be created
> > through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts
> > a path specification (using the usual fd + string path combo) to a BPF
> > FS mount. Currently, BPF token "inherits" delegated command, map types,
> > prog type, and attach type bit sets from BPF FS as is. In the future,
> > having a BPF token as a separate object with its own FD, we can allow
> > to further restrict BPF token's allowable set of things either at the creation
> > time or after the fact, allowing the process to guard itself further
> > from, e.g., unintentionally trying to load undesired kind of BPF
> > programs. But for now we keep things simple and just copy bit sets as is.
> >
> > When BPF token is created from BPF FS mount, we take reference to the
> > BPF super block's owning user namespace, and then use that namespace for
> > checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
> > capabilities that are normally only checked against init userns (using
> > capable()), but now we check them using ns_capable() instead (if BPF
> > token is provided). See bpf_token_capable() for details.
> >
> > Such setup means that BPF token in itself is not sufficient to grant BPF
> > functionality. User namespaced process has to *also* have necessary
> > combination of capabilities inside that user namespace. So while
> > previously CAP_BPF was useless when granted within user namespace, now
> > it gains a meaning and allows container managers and sys admins to have
> > a flexible control over which processes can and need to use BPF
> > functionality within the user namespace (i.e., container in practice).
> > And BPF FS delegation mount options and derived BPF tokens serve as
> > a per-container "flag" to grant overall ability to use bpf() (plus further
> > restrict on which parts of bpf() syscalls are treated as namespaced).
> >
> > Note also, BPF_TOKEN_CREATE command itself requires ns_capable(CAP_BPF)
> > within the BPF FS owning user namespace, rounding out the ns_capable()
> > story of BPF token.
> >
> > The alternative to creating BPF token object was:
> >   a) not having any extra object and just passing BPF FS path to each
> >      relevant bpf() command. This seems suboptimal as it's racy (mount
> >      under the same path might change in between checking it and using it
> >      for bpf() command). And also less flexible if we'd like to further
>
> I don't understand "mount under the same path might change in between
> checking it and using it for bpf() command".
>
> Just require userspace to open() the bpffs instance and pass that fd to
> bpf() just as you're doing right now. If that is racy then the current
> implementation is even more so because it is passing:
>
> bpffs_path_fd
> bpffs_pathname
>
> and then performs a lookup. More on that below.

Yes, this is a result of my initial confusion with how O_PATH-based
open() works. You are right that it's not racy, I'll update the
message.

>
> I want to point out that most of this code here is unnecessary if you
> use the bpffs fd itself as a token. But that's your decision. I'm just
> saying that I'm not sure the critique that it's racy is valid.

Ack.

>
> >      restrict ourselves compared to all the delegated functionality
> >      allowed on BPF FS.
> >   b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
> >      a dedicated FD that would represent a token-like functionality. This
> >      doesn't seem superior to having a proper bpf() command, so
> >      BPF_TOKEN_CREATE was chosen.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  include/linux/bpf.h            |  41 +++++++
> >  include/uapi/linux/bpf.h       |  39 +++++++
> >  kernel/bpf/Makefile            |   2 +-
> >  kernel/bpf/inode.c             |  17 ++-
> >  kernel/bpf/syscall.c           |  17 +++
> >  kernel/bpf/token.c             | 197 +++++++++++++++++++++++++++++++++
> >  tools/include/uapi/linux/bpf.h |  39 +++++++
> >  7 files changed, 342 insertions(+), 10 deletions(-)
> >  create mode 100644 kernel/bpf/token.c
> >

[...]

> > +
> > +#define BPF_TOKEN_INODE_NAME "bpf-token"
> > +
> > +static const struct inode_operations bpf_token_iops = { };
> > +
> > +static const struct file_operations bpf_token_fops = {
> > +     .release        = bpf_token_release,
> > +     .show_fdinfo    = bpf_token_show_fdinfo,
> > +};
> > +
> > +int bpf_token_create(union bpf_attr *attr)
> > +{
> > +     struct bpf_mount_opts *mnt_opts;
> > +     struct bpf_token *token = NULL;
> > +     struct user_namespace *userns;
> > +     struct inode *inode;
> > +     struct file *file;
> > +     struct path path;
> > +     umode_t mode;
> > +     int err, fd;
> > +
> > +     err = user_path_at(attr->token_create.bpffs_path_fd,
> > +                        u64_to_user_ptr(attr->token_create.bpffs_pathname),
> > +                        LOOKUP_FOLLOW | LOOKUP_EMPTY, &path);
>
> Do you really need bpffs_path_fd and bpffs_pathname?
> This seems unnecessary as you're forcing a lookup that's best done in
> userspace through regular open() APIs. So I would just make this:
>
> struct { /* struct used by BPF_TOKEN_CREATE command */
>         __u32           flags;
>         __u32           bpffs_path_fd;
> } token_create;
>
> In bpf_token_create() you can then just do:
>
>         struct fd f;
>         struct path path;
>
>         f = fdget(attr->token_create.bpffs_path_fd);
>         if (!f.file)
>                 return -EBADF;
>
>         path = f.file->f_path;
>         path_get(&path);
>         fdput(f);
>

Yes, you are right. I'll simplify this part, thanks.


> > +     if (err)
> > +             return err;
> > +
> > +     if (path.mnt->mnt_root != path.dentry) {
> > +             err = -EINVAL;
> > +             goto out_path;
> > +     }
> > +     if (path.mnt->mnt_sb->s_op != &bpf_super_ops) {
> > +             err = -EINVAL;
> > +             goto out_path;
> > +     }

[...]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 bpf-next 02/17] bpf: add BPF token delegation mount options to BPF FS
  2023-11-08 21:09     ` Andrii Nakryiko
@ 2023-11-09  8:47       ` Christian Brauner
  2023-11-09 17:09         ` Andrii Nakryiko
  0 siblings, 1 reply; 37+ messages in thread
From: Christian Brauner @ 2023-11-09  8:47 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, keescook, kernel-team, sargun

On Wed, Nov 08, 2023 at 01:09:27PM -0800, Andrii Nakryiko wrote:
> On Wed, Nov 8, 2023 at 5:51 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Fri, Nov 03, 2023 at 12:05:08PM -0700, Andrii Nakryiko wrote:
> > > Add a few new mount options to BPF FS that allow specifying that a given
> > > BPF FS instance allows creation of BPF token (added in the next patch),
> > > and what sort of operations are allowed under BPF token. As such, we get
> > > 4 new mount options, each is a bit mask
> > >   - `delegate_cmds` allow to specify which bpf() syscall commands are
> > >     allowed with BPF token derived from this BPF FS instance;
> > >   - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
> > >     a set of allowable BPF map types that could be created with BPF token;
> > >   - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
> > >     a set of allowable BPF program types that could be loaded with BPF token;
> > >   - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
> > >     a set of allowable BPF program attach types that could be loaded with
> > >     BPF token; delegate_progs and delegate_attachs are meant to be used
> > >     together, as full BPF program type is, in general, determined
> > >     through both program type and program attach type.
> > >
> > > Currently, these mount options accept the following forms of values:
> > >   - a special value "any", that enables all possible values of a given
> > >   bit set;
> > >   - numeric value (decimal or hexadecimal, determined by kernel
> > >   automatically) that specifies a bit mask value directly;
> > >   - all the values for a given mount option are combined, if specified
> > >   multiple times. E.g., `mount -t bpf nodev /path/to/mount -o
> > >   delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3
> > >   mask.
> > >
> > > Ideally, more convenient (for humans) symbolic form derived from
> > > corresponding UAPI enums would be accepted (e.g., `-o
> > > delegate_progs=kprobe|tracepoint`) and I intend to implement this, but
> > > it requires a bunch of UAPI header churn, so I postponed it until this
> > > feature lands upstream or at least there is a definite consensus that
> > > this feature is acceptable and is going to make it, just to minimize
> > > amount of wasted effort and not increase amount of non-essential code to
> > > be reviewed.
> > >
> > > Attentive reader will notice that BPF FS is now marked as
> > > FS_USERNS_MOUNT, which theoretically makes it mountable inside non-init
> > > user namespace as long as the process has sufficient *namespaced*
> > > capabilities within that user namespace. But in reality we still
> > > restrict BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in
> > > init userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added
> > > to allow creating BPF FS context object (i.e., fsopen("bpf")) from
> > > inside unprivileged process inside non-init userns, to capture that
> > > userns as the owning userns. It will still be required to pass this
> > > context object back to privileged process to instantiate and mount it.
> > >
> > > This manipulation is important, because capturing non-init userns as the
> > > owning userns of BPF FS instance (super block) allows to use that userns
> > > to constrain BPF token to that userns later on (see next patch). So
> > > creating BPF FS with delegation inside unprivileged userns will restrict
> > > derived BPF token objects to only "work" inside that intended userns,
> > > making it scoped to an intended "container".
> > >
> > > There is a set of selftests at the end of the patch set that simulates
> > > this sequence of steps and validates that everything works as intended.
> > > But careful review is requested to make sure there are no missed gaps in
> > > the implementation and testing.
> > >
> > > All this is based on suggestions and discussions with Christian Brauner
> > > ([0]), to the best of my ability to follow all the implications.
> >
> > "who will not be held responsible for any CVE future or present as he's
> >  not sure whether bpf token is a good idea in general"
> >
> > I'm not opposing it because it's really not my subsystem. But it'd be
> > nice if you also added a disclaimer that I'm not endorsing this. :)
> >
> 
> Sure, I'll clarify. I still appreciate your reviewing everything and
> pointing out all the gotchas (like the reconfiguration and other
> stuff), thanks!
> 
> > A comment below.
> >
> > >
> > >   [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > >  include/linux/bpf.h | 10 ++++++
> > >  kernel/bpf/inode.c  | 88 +++++++++++++++++++++++++++++++++++++++------
> > >  2 files changed, 88 insertions(+), 10 deletions(-)
> > >
> 
> [...]
> 
> > >       opt = fs_parse(fc, bpf_fs_parameters, param, &result);
> > >       if (opt < 0) {
> > > @@ -665,6 +692,25 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
> > >       case OPT_MODE:
> > >               opts->mode = result.uint_32 & S_IALLUGO;
> > >               break;
> > > +     case OPT_DELEGATE_CMDS:
> > > +     case OPT_DELEGATE_MAPS:
> > > +     case OPT_DELEGATE_PROGS:
> > > +     case OPT_DELEGATE_ATTACHS:
> > > +             if (strcmp(param->string, "any") == 0) {
> > > +                     msk = ~0ULL;
> > > +             } else {
> > > +                     err = kstrtou64(param->string, 0, &msk);
> > > +                     if (err)
> > > +                             return err;
> > > +             }
> > > +             switch (opt) {
> > > +             case OPT_DELEGATE_CMDS: opts->delegate_cmds |= msk; break;
> > > +             case OPT_DELEGATE_MAPS: opts->delegate_maps |= msk; break;
> > > +             case OPT_DELEGATE_PROGS: opts->delegate_progs |= msk; break;
> > > +             case OPT_DELEGATE_ATTACHS: opts->delegate_attachs |= msk; break;
> > > +             default: return -EINVAL;
> > > +             }
> > > +             break;
> > >       }
> >
> > So just to repeat that this will allow a container to set its own
> > delegation options:
> >
> >         # unprivileged container
> >
> >         fd_fs = fsopen();
> >         fsconfig(fd_fs, FSCONFIG_BLA_BLA, "give-me-all-the-delegation");
> >
> >         # Now hand off that fd_fs to a privileged process
> >
> >         fsconfig(fd_fs, FSCONFIG_CMD_CREATE, ...)
> >
> > This means the container manager can't be part of your threat model
> > because you need to trust it to set delegation options.
> >
> > But if the container manager is part of your threat model then you can
> > never trust an fd_fs handed to you because the container manager might
> > have enabled arbitrary delegation privileges.
> >
> > There's ways around this:
> >
> > (1) kernel: Account for this in the kernel and require privileges when
> >     setting delegation options.
> 
> What sort of privilege would that be? We are in an unprivileged user
> namespace, so that would have to be some ns_capable() checks or
> something? I can add ns_capable(CAP_BPF), but what else did you have
> in mind?

You would require privileges in the initial namespace aka capable()
checks similar to what you require for superblock creation.

> 
> I think even if we say that privileged parent does FSCONFIG_SET_STRING
> and unprivileged child just does sys_fsopen("bpf", 0) and nothing
> more, we still can't be sure that child won't race with parent and set
> FSCONFIG_SET_STRING at the same time. Because they both have access to
> the same fs_fd.

Unless you require privileges as outlined above to set delegation
options in which case an unprivileged container cannot change delegation
options at all.
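
In bpf_parse_param() that would boil down to something like this
(sketch, mirroring the option parsing already in the patch):

        case OPT_DELEGATE_CMDS:
        case OPT_DELEGATE_MAPS:
        case OPT_DELEGATE_PROGS:
        case OPT_DELEGATE_ATTACHS:
                /* delegation bits are root-only, like superblock creation */
                if (!capable(CAP_SYS_ADMIN))
                        return -EPERM;
                if (strcmp(param->string, "any") == 0) {
                        msk = ~0ULL;
                } else {
                        err = kstrtou64(param->string, 0, &msk);
                        if (err)
                                return err;
                }
                switch (opt) {
                case OPT_DELEGATE_CMDS: opts->delegate_cmds |= msk; break;
                case OPT_DELEGATE_MAPS: opts->delegate_maps |= msk; break;
                case OPT_DELEGATE_PROGS: opts->delegate_progs |= msk; break;
                case OPT_DELEGATE_ATTACHS: opts->delegate_attachs |= msk; break;
                default: return -EINVAL;
                }
                break;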

> 
> > (2) userspace: A trusted helper that allocates an fs_context fd in
> >     the target user namespace, then sets delegation options and creates
> >     superblock.
> >
> > (1) Is more restrictive but also more secure. (2) is less restrictive
> > but requires more care from userspace.
> >
> > Either way I would probably consider writing a document detailing
> > various delegation scenarios and possible pitfalls and implications
> > before advertising it.
> >
> > If you choose (2) then you also need to be aware that the security of
> > this also hinges on bpffs not allowing to reconfigure parameters once it
> > has been mounted. Otherwise an unprivileged container can change
> > delegation options.
> >
> > I would recommend that you either add a dummy bpf_reconfigure() method
> > with a comment in it or you add a comment on top of bpf_context_ops.
> > Something like:
> >
> > /*
> >  * Unprivileged mounts of bpffs are owned by the user namespace they are
> >  * mounted in. That means unprivileged users can change vfs mount
> >  * options (ro<->rw, nosuid, etc.).
> >  *
> >  * They currently cannot change bpffs specific mount options such as
> >  * delegation settings. If that is ever implemented it is necessary to
> > >  * require privileges in the initial namespace. Otherwise unprivileged
> >  * users can change delegation options to whatever they want.
> >  */
> 
> Yep, I will add a custom callback. I think we can allow reconfiguring
> towards less permissive delegation subset, but I'll need to look at
> the specifics to see if we can support that easily.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 bpf-next 02/17] bpf: add BPF token delegation mount options to BPF FS
  2023-11-09  8:47       ` Christian Brauner
@ 2023-11-09 17:09         ` Andrii Nakryiko
  2023-11-09 22:29           ` Andrii Nakryiko
  0 siblings, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 17:09 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, keescook, kernel-team, sargun

On Thu, Nov 9, 2023 at 12:48 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Wed, Nov 08, 2023 at 01:09:27PM -0800, Andrii Nakryiko wrote:
> > On Wed, Nov 8, 2023 at 5:51 AM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > On Fri, Nov 03, 2023 at 12:05:08PM -0700, Andrii Nakryiko wrote:
> > > > Add a few new mount options to BPF FS that allow specifying that a given
> > > > BPF FS instance allows creation of BPF token (added in the next patch),
> > > > and what sort of operations are allowed under BPF token. As such, we get
> > > > 4 new mount options, each is a bit mask
> > > >   - `delegate_cmds` allow to specify which bpf() syscall commands are
> > > >     allowed with BPF token derived from this BPF FS instance;
> > > >   - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
> > > >     a set of allowable BPF map types that could be created with BPF token;
> > > >   - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
> > > >     a set of allowable BPF program types that could be loaded with BPF token;
> > > >   - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
> > > >     a set of allowable BPF program attach types that could be loaded with
> > > >     BPF token; delegate_progs and delegate_attachs are meant to be used
> > > >     together, as full BPF program type is, in general, determined
> > > >     through both program type and program attach type.
> > > >
> > > > Currently, these mount options accept the following forms of values:
> > > >   - a special value "any", that enables all possible values of a given
> > > >   bit set;
> > > >   - numeric value (decimal or hexadecimal, determined by kernel
> > > >   automatically) that specifies a bit mask value directly;
> > > >   - all the values for a given mount option are combined, if specified
> > > >   multiple times. E.g., `mount -t bpf nodev /path/to/mount -o
> > > >   delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3
> > > >   mask.
> > > >
> > > > Ideally, more convenient (for humans) symbolic form derived from
> > > > corresponding UAPI enums would be accepted (e.g., `-o
> > > > delegate_progs=kprobe|tracepoint`) and I intend to implement this, but
> > > > it requires a bunch of UAPI header churn, so I postponed it until this
> > > > feature lands upstream or at least there is a definite consensus that
> > > > this feature is acceptable and is going to make it, just to minimize
> > > > amount of wasted effort and not increase amount of non-essential code to
> > > > be reviewed.
> > > >
> > > > Attentive reader will notice that BPF FS is now marked as
> > > > FS_USERNS_MOUNT, which theoretically makes it mountable inside non-init
> > > > user namespace as long as the process has sufficient *namespaced*
> > > > capabilities within that user namespace. But in reality we still
> > > > restrict BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in
> > > > init userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added
> > > > to allow creating BPF FS context object (i.e., fsopen("bpf")) from
> > > > inside unprivileged process inside non-init userns, to capture that
> > > > userns as the owning userns. It will still be required to pass this
> > > > context object back to privileged process to instantiate and mount it.
> > > >
> > > > This manipulation is important, because capturing non-init userns as the
> > > > owning userns of BPF FS instance (super block) allows to use that userns
> > > > to constrain BPF token to that userns later on (see next patch). So
> > > > creating BPF FS with delegation inside unprivileged userns will restrict
> > > > derived BPF token objects to only "work" inside that intended userns,
> > > > making it scoped to an intended "container".
> > > >
> > > > There is a set of selftests at the end of the patch set that simulates
> > > > this sequence of steps and validates that everything works as intended.
> > > > But careful review is requested to make sure there are no missed gaps in
> > > > the implementation and testing.
> > > >
> > > > All this is based on suggestions and discussions with Christian Brauner
> > > > ([0]), to the best of my ability to follow all the implications.
> > >
> > > "who will not be held responsible for any CVE future or present as he's
> > >  not sure whether bpf token is a good idea in general"
> > >
> > > I'm not opposing it because it's really not my subsystem. But it'd be
> > > nice if you also added a disclaimer that I'm not endorsing this. :)
> > >
> >
> > Sure, I'll clarify. I still appreciate your reviewing everything and
> > pointing out all the gotchas (like the reconfiguration and other
> > stuff), thanks!
> >
> > > A comment below.
> > >
> > > >
> > > >   [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/
> > > >
> > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > ---
> > > >  include/linux/bpf.h | 10 ++++++
> > > >  kernel/bpf/inode.c  | 88 +++++++++++++++++++++++++++++++++++++++------
> > > >  2 files changed, 88 insertions(+), 10 deletions(-)
> > > >
> >
> > [...]
> >
> > > >       opt = fs_parse(fc, bpf_fs_parameters, param, &result);
> > > >       if (opt < 0) {
> > > > @@ -665,6 +692,25 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
> > > >       case OPT_MODE:
> > > >               opts->mode = result.uint_32 & S_IALLUGO;
> > > >               break;
> > > > +     case OPT_DELEGATE_CMDS:
> > > > +     case OPT_DELEGATE_MAPS:
> > > > +     case OPT_DELEGATE_PROGS:
> > > > +     case OPT_DELEGATE_ATTACHS:
> > > > +             if (strcmp(param->string, "any") == 0) {
> > > > +                     msk = ~0ULL;
> > > > +             } else {
> > > > +                     err = kstrtou64(param->string, 0, &msk);
> > > > +                     if (err)
> > > > +                             return err;
> > > > +             }
> > > > +             switch (opt) {
> > > > +             case OPT_DELEGATE_CMDS: opts->delegate_cmds |= msk; break;
> > > > +             case OPT_DELEGATE_MAPS: opts->delegate_maps |= msk; break;
> > > > +             case OPT_DELEGATE_PROGS: opts->delegate_progs |= msk; break;
> > > > +             case OPT_DELEGATE_ATTACHS: opts->delegate_attachs |= msk; break;
> > > > +             default: return -EINVAL;
> > > > +             }
> > > > +             break;
> > > >       }
> > >
> > > So just to repeat that this will allow a container to set its own
> > > delegation options:
> > >
> > >         # unprivileged container
> > >
> > >         fd_fs = fsopen();
> > >         fsconfig(fd_fs, FSCONFIG_BLA_BLA, "give-me-all-the-delegation");
> > >
> > >         # Now hand off that fd_fs to a privileged process
> > >
> > >         fsconfig(fd_fs, FSCONFIG_CREATE_CMD, ...)
> > >
> > > This means the container manager can't be part of your threat model
> > > because you need to trust it to set delegation options.
> > >
> > > But if the container manager is part of your threat model then you can
> > > never trust an fd_fs handed to you because the container manager might
> > > have enabled arbitrary delegation privileges.
> > >
> > > There are ways around this:
> > >
> > > (1) kernel: Account for this in the kernel and require privileges when
> > >     setting delegation options.
> >
> > What sort of privilege would that be? We are in an unprivileged user
> > namespace, so that would have to be some ns_capable() checks or
> > something? I can add ns_capable(CAP_BPF), but what else did you have
> > in mind?
>
> You would require privileges in the initial namespace aka capable()
> checks similar to what you require for superblock creation.

ok, I was just wondering if I'm missing something non-obvious.
capable(CAP_SYS_ADMIN) makes sense and doesn't really hurt the intended
use case. The privileged parent will set these config values and then
do FSCONFIG_CREATE_CMD.
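
To make that flow concrete, here is a rough userspace sketch (my
assumption of how this would be used, not code from this series; the
finalize_delegated_bpffs() name, the delegate_* values, and the paths
are made up for illustration):

        #include <errno.h>
        #include <sys/syscall.h>
        #include <unistd.h>
        #include <linux/mount.h>  /* FSCONFIG_SET_STRING, FSCONFIG_CMD_CREATE */

        /* The unprivileged child, inside the container's userns, does
         * fs_fd = fsopen("bpf", 0) to capture its userns as the owning
         * userns, then passes fs_fd to the privileged parent via
         * SCM_RIGHTS. The parent, capable(CAP_SYS_ADMIN) in init userns,
         * sets delegation options and instantiates the superblock:
         */
        static int finalize_delegated_bpffs(int fs_fd)
        {
                if (syscall(__NR_fsconfig, fs_fd, FSCONFIG_SET_STRING,
                            "delegate_cmds", "any", 0))
                        return -errno;
                if (syscall(__NR_fsconfig, fs_fd, FSCONFIG_SET_STRING,
                            "delegate_maps", "0x1", 0))
                        return -errno;
                if (syscall(__NR_fsconfig, fs_fd, FSCONFIG_CMD_CREATE,
                            NULL, NULL, 0))
                        return -errno;
                /* returns a mount object FD; hand it back to the container,
                 * which can attach it with move_mount(mnt_fd, "", AT_FDCWD,
                 * <path>, MOVE_MOUNT_F_EMPTY_PATH) and derive BPF tokens
                 * from the mounted instance
                 */
                return syscall(__NR_fsmount, fs_fd, 0, 0);
        }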

For reconfiguration I'll enforce the same capable(CAP_SYS_ADMIN)
checks, unless the unprivileged user drops permissions to more
restrictive ones (but I haven't had a chance to look at the exact
callback API, so we'll see if that's easy to support).

Thanks for the feedback!

>
> >
> > I think even if we say that the privileged parent does
> > FSCONFIG_SET_STRING and the unprivileged child just does
> > sys_fsopen("bpf", 0) and nothing more, we still can't be sure that the
> > child won't race with the parent and set FSCONFIG_SET_STRING at the
> > same time, because they both have access to the same fs_fd.
>
> Unless you require privileges as outlined above to set delegation
> options in which case an unprivileged container cannot change delegation
> options at all.

Yep, makes sense, that's what I'm going to do.

>
> >
> > > (2) userspace: A trusted helper that allocates an fs_context fd in
> > >     the target user namespace, then sets delegation options and creates
> > >     superblock.
> > >
> > > (1) Is more restrictive but also more secure. (2) is less restrictive
> > > but requires more care from userspace.
> > >
> > > Either way I would probably consider writing a document detailing
> > > various delegation scenarios and possible pitfalls and implications
> > > before advertising it.
> > >
> > > If you choose (2) then you also need to be aware that the security of
> > > this also hinges on bpffs not allowing reconfiguration of parameters once it
> > > has been mounted. Otherwise an unprivileged container can change
> > > delegation options.
> > >
> > > I would recommend that you either add a dummy bpf_reconfigure() method
> > > with a comment in it or you add a comment on top of bpf_context_ops.
> > > Something like:
> > >
> > > /*
> > >  * Unprivileged mounts of bpffs are owned by the user namespace they are
> > >  * mounted in. That means unprivileged users can change vfs mount
> > >  * options (ro<->rw, nosuid, etc.).
> > >  *
> > >  * They currently cannot change bpffs specific mount options such as
> > >  * delegation settings. If that is ever implemented it is necessary to
> > >  * require privileges in the initial namespace. Otherwise unprivileged
> > >  * users can change delegation options to whatever they want.
> > >  */
> >
> > Yep, I will add a custom callback. I think we can allow reconfiguring
> > towards less permissive delegation subset, but I'll need to look at
> > the specifics to see if we can support that easily.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 bpf-next 02/17] bpf: add BPF token delegation mount options to BPF FS
  2023-11-09 17:09         ` Andrii Nakryiko
@ 2023-11-09 22:29           ` Andrii Nakryiko
  0 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 22:29 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, keescook, kernel-team, sargun

On Thu, Nov 9, 2023 at 9:09 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Nov 9, 2023 at 12:48 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Wed, Nov 08, 2023 at 01:09:27PM -0800, Andrii Nakryiko wrote:
> > > On Wed, Nov 8, 2023 at 5:51 AM Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > > On Fri, Nov 03, 2023 at 12:05:08PM -0700, Andrii Nakryiko wrote:
> > > > > Add a few new mount options to BPF FS that allow specifying that
> > > > > a given BPF FS instance allows creation of a BPF token (added in the
> > > > > next patch), and what sort of operations are allowed under the BPF
> > > > > token. As such, we get 4 new mount options, each of which is a bit mask:
> > > > >   - `delegate_cmds` allows specifying which bpf() syscall commands
> > > > >     are allowed with a BPF token derived from this BPF FS instance;
> > > > >   - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
> > > > >     a set of allowable BPF map types that could be created with BPF token;
> > > > >   - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
> > > > >     a set of allowable BPF program types that could be loaded with BPF token;
> > > > >   - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
> > > > >     a set of allowable BPF program attach types that could be loaded with
> > > > >     BPF token; delegate_progs and delegate_attachs are meant to be used
> > > > >     together, as full BPF program type is, in general, determined
> > > > >     through both program type and program attach type.
> > > > >
> > > > > Currently, these mount options accept the following forms of values:
> > > > >   - a special value "any", that enables all possible values of a given
> > > > >   bit set;
> > > > >   - numeric value (decimal or hexadecimal, determined by kernel
> > > > >   automatically) that specifies a bit mask value directly;
> > > > >   - all the values for a given mount option are combined, if specified
> > > > >   multiple times. E.g., `mount -t bpf nodev /path/to/mount -o
> > > > >   delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3
> > > > >   mask.
> > > > >
> > > > > Ideally, a more convenient (for humans) symbolic form derived from
> > > > > the corresponding UAPI enums would be accepted (e.g., `-o
> > > > > delegate_progs=kprobe|tracepoint`), and I intend to implement this,
> > > > > but it requires a bunch of UAPI header churn, so I postponed it until
> > > > > this feature lands upstream, or at least until there is a definite
> > > > > consensus that this feature is acceptable and is going to make it,
> > > > > just to minimize the amount of wasted effort and not increase the
> > > > > amount of non-essential code to be reviewed.
> > > > >
> > > > > An attentive reader will notice that BPF FS is now marked as
> > > > > FS_USERNS_MOUNT, which theoretically makes it mountable inside a
> > > > > non-init user namespace as long as the process has sufficient
> > > > > *namespaced* capabilities within that user namespace. But in reality
> > > > > we still restrict BPF FS to be mountable only by processes with
> > > > > CAP_SYS_ADMIN *in init userns* (extra check in bpf_fill_super()).
> > > > > FS_USERNS_MOUNT is added to allow creating a BPF FS context object
> > > > > (i.e., fsopen("bpf")) from inside an unprivileged process in a
> > > > > non-init userns, to capture that userns as the owning userns. It will
> > > > > still be required to pass this context object back to a privileged
> > > > > process to instantiate and mount it.
> > > > >
> > > > > This manipulation is important, because capturing a non-init userns
> > > > > as the owning userns of a BPF FS instance (super block) allows using
> > > > > that userns to constrain the BPF token to that userns later on (see
> > > > > next patch). So creating a BPF FS with delegation inside an
> > > > > unprivileged userns will restrict derived BPF token objects to only
> > > > > "work" inside that userns, making them scoped to the intended
> > > > > "container".
> > > > >
> > > > > There is a set of selftests at the end of the patch set that simulates
> > > > > this sequence of steps and validates that everything works as intended.
> > > > > But careful review is requested to make sure there are no missed gaps in
> > > > > the implementation and testing.
> > > > >
> > > > > All this is based on suggestions and discussions with Christian Brauner
> > > > > ([0]), to the best of my ability to follow all the implications.
> > > >
> > > > "who will not be held responsible for any CVE future or present as he's
> > > >  not sure whether bpf token is a good idea in general"
> > > >
> > > > I'm not opposing it because it's really not my subsystem. But it'd be
> > > > nice if you also added a disclaimer that I'm not endorsing this. :)
> > > >
> > >
> > > Sure, I'll clarify. I still appreciate your reviewing everything and
> > > pointing out all the gotchas (like the reconfiguration and other
> > > stuff), thanks!
> > >
> > > > A comment below.
> > > >
> > > > >
> > > > >   [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/
> > > > >
> > > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > > ---
> > > > >  include/linux/bpf.h | 10 ++++++
> > > > >  kernel/bpf/inode.c  | 88 +++++++++++++++++++++++++++++++++++++++------
> > > > >  2 files changed, 88 insertions(+), 10 deletions(-)
> > > > >
> > >
> > > [...]
> > >
> > > > >       opt = fs_parse(fc, bpf_fs_parameters, param, &result);
> > > > >       if (opt < 0) {
> > > > > @@ -665,6 +692,25 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
> > > > >       case OPT_MODE:
> > > > >               opts->mode = result.uint_32 & S_IALLUGO;
> > > > >               break;
> > > > > +     case OPT_DELEGATE_CMDS:
> > > > > +     case OPT_DELEGATE_MAPS:
> > > > > +     case OPT_DELEGATE_PROGS:
> > > > > +     case OPT_DELEGATE_ATTACHS:
> > > > > +             if (strcmp(param->string, "any") == 0) {
> > > > > +                     msk = ~0ULL;
> > > > > +             } else {
> > > > > +                     err = kstrtou64(param->string, 0, &msk);
> > > > > +                     if (err)
> > > > > +                             return err;
> > > > > +             }
> > > > > +             switch (opt) {
> > > > > +             case OPT_DELEGATE_CMDS: opts->delegate_cmds |= msk; break;
> > > > > +             case OPT_DELEGATE_MAPS: opts->delegate_maps |= msk; break;
> > > > > +             case OPT_DELEGATE_PROGS: opts->delegate_progs |= msk; break;
> > > > > +             case OPT_DELEGATE_ATTACHS: opts->delegate_attachs |= msk; break;
> > > > > +             default: return -EINVAL;
> > > > > +             }
> > > > > +             break;
> > > > >       }
> > > >
> > > > So just to repeat that this will allow a container to set its own
> > > > delegation options:
> > > >
> > > >         # unprivileged container
> > > >
> > > >         fd_fs = fsopen();
> > > >         fsconfig(fd_fs, FSCONFIG_BLA_BLA, "give-me-all-the-delegation");
> > > >
> > > >         # Now hand off that fd_fs to a privileged process
> > > >
> > > >         fsconfig(fd_fs, FSCONFIG_CREATE_CMD, ...)
> > > >
> > > > This means the container manager can't be part of your threat model
> > > > because you need to trust it to set delegation options.
> > > >
> > > > But if the container manager is part of your threat model then you can
> > > > never trust an fd_fs handed to you because the container manager might
> > > > have enabled arbitrary delegation privileges.
> > > >
> > > > There are ways around this:
> > > >
> > > > (1) kernel: Account for this in the kernel and require privileges when
> > > >     setting delegation options.
> > >
> > > What sort of privilege would that be? We are in an unprivileged user
> > > namespace, so that would have to be some ns_capable() checks or
> > > something? I can add ns_capable(CAP_BPF), but what else did you have
> > > in mind?
> >
> > You would require privileges in the initial namespace aka capable()
> > checks similar to what you require for superblock creation.
>
> ok, I was just wondering if I'm missing something non-obvious.
> capable(CAP_SYS_ADMIN) makes sense and doesn't really hurt the intended
> use case. The privileged parent will set these config values and then
> do FSCONFIG_CREATE_CMD.
>
> For reconfiguration I'll enforce the same capable(CAP_SYS_ADMIN)
> checks, unless the unprivileged user drops permissions to more
> restrictive ones (but I haven't had a chance to look at the exact
> callback API, so we'll see if that's easy to support).

Ok, so I played with this a bit. It seems that if I require
capable(CAP_SYS_ADMIN) in fsconfig() to set delegation options, I
don't have to do anything special about reconfiguration. Any
FSCONFIG_SET_xxx command for a delegation option will just fail, and
so reconfiguration is harmless. I'm going to go with that and keep it
simple.
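
For reference, a minimal sketch of what that check could look like in
bpf_parse_param(), mirroring the hunk quoted above (the exact placement
and the -EPERM choice are just my current thinking, not necessarily
what the next revision will do):

        case OPT_DELEGATE_CMDS:
        case OPT_DELEGATE_MAPS:
        case OPT_DELEGATE_PROGS:
        case OPT_DELEGATE_ATTACHS:
                /* delegation options can only be set by a process that is
                 * privileged in the initial user namespace; any later
                 * FSCONFIG_SET_xxx attempt from inside an unprivileged
                 * container fails here, so reconfiguration needs no extra
                 * handling
                 */
                if (!capable(CAP_SYS_ADMIN))
                        return -EPERM;
                if (strcmp(param->string, "any") == 0) {
                        msk = ~0ULL;
                } else {
                        err = kstrtou64(param->string, 0, &msk);
                        if (err)
                                return err;
                }
                switch (opt) {
                case OPT_DELEGATE_CMDS: opts->delegate_cmds |= msk; break;
                case OPT_DELEGATE_MAPS: opts->delegate_maps |= msk; break;
                case OPT_DELEGATE_PROGS: opts->delegate_progs |= msk; break;
                case OPT_DELEGATE_ATTACHS: opts->delegate_attachs |= msk; break;
                default: return -EINVAL;
                }
                break;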

>
> Thanks for the feedback!
>
> >
> > >
> > > I think even if we say that the privileged parent does
> > > FSCONFIG_SET_STRING and the unprivileged child just does
> > > sys_fsopen("bpf", 0) and nothing more, we still can't be sure that the
> > > child won't race with the parent and set FSCONFIG_SET_STRING at the
> > > same time, because they both have access to the same fs_fd.
> >
> > Unless you require privileges as outlined above to set delegation
> > options in which case an unprivileged container cannot change delegation
> > options at all.
>
> Yep, makes sense, that's what I'm going to do.
>
> >
> > >
> > > > (2) userspace: A trusted helper that allocates an fs_context fd in
> > > >     the target user namespace, then sets delegation options and creates
> > > >     superblock.
> > > >
> > > > (1) Is more restrictive but also more secure. (2) is less restrictive
> > > > but requires more care from userspace.
> > > >
> > > > Either way I would probably consider writing a document detailing
> > > > various delegation scenarios and possible pitfalls and implications
> > > > before advertising it.
> > > >
> > > > If you choose (2) then you also need to be aware that the security of
> > > > this also hinges on bpffs not allowing reconfiguration of parameters once it
> > > > has been mounted. Otherwise an unprivileged container can change
> > > > delegation options.
> > > >
> > > > I would recommend that you either add a dummy bpf_reconfigure() method
> > > > with a comment in it or you add a comment on top of bpf_context_ops.
> > > > Something like:
> > > >
> > > > /*
> > > >  * Unprivileged mounts of bpffs are owned by the user namespace they are
> > > >  * mounted in. That means unprivileged users can change vfs mount
> > > >  * options (ro<->rw, nosuid, etc.).
> > > >  *
> > > >  * They currently cannot change bpffs specific mount options such as
> > > >  * delegation settings. If that is ever implemented it is necessary to
> > > >  * require privileges in the initial namespace. Otherwise unprivileged
> > > >  * users can change delegation options to whatever they want.
> > > >  */
> > >
> > > Yep, I will add a custom callback. I think we can allow reconfiguring
> > > towards less permissive delegation subset, but I'll need to look at
> > > the specifics to see if we can support that easily.

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2023-11-09 22:30 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 01/17] bpf: align CAP_NET_ADMIN checks with bpf_capable() approach Andrii Nakryiko
2023-11-05  6:33   ` Yafang Shao
2023-11-03 19:05 ` [PATCH v9 bpf-next 02/17] bpf: add BPF token delegation mount options to BPF FS Andrii Nakryiko
2023-11-08 13:51   ` Christian Brauner
2023-11-08 21:09     ` Andrii Nakryiko
2023-11-09  8:47       ` Christian Brauner
2023-11-09 17:09         ` Andrii Nakryiko
2023-11-09 22:29           ` Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 03/17] bpf: introduce BPF token object Andrii Nakryiko
2023-11-08 14:28   ` Christian Brauner
2023-11-08 21:09     ` Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 04/17] bpf: add BPF token support to BPF_MAP_CREATE command Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 05/17] bpf: add BPF token support to BPF_BTF_LOAD command Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 06/17] bpf: add BPF token support to BPF_PROG_LOAD command Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 07/17] bpf: take into account BPF token when fetching helper protos Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 08/17] bpf: consistently use BPF token throughout BPF verifier logic Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 09/17] bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooks Andrii Nakryiko
2023-11-06  5:01   ` [PATCH v9 9/17] " Paul Moore
2023-11-06 19:03     ` Andrii Nakryiko
2023-11-06 22:29       ` Paul Moore
2023-11-03 19:05 ` [PATCH v9 bpf-next 10/17] bpf,lsm: refactor bpf_map_alloc/bpf_map_free " Andrii Nakryiko
2023-11-06  5:01   ` [PATCH v9 " Paul Moore
2023-11-03 19:05 ` [PATCH v9 bpf-next 11/17] bpf,lsm: add BPF token " Andrii Nakryiko
2023-11-04  0:36   ` kernel test robot
2023-11-04  3:20     ` Andrii Nakryiko
2023-11-06  5:01   ` [PATCH v9 " Paul Moore
2023-11-06 19:17     ` Andrii Nakryiko
2023-11-06 22:46       ` Paul Moore
2023-11-03 19:05 ` [PATCH v9 bpf-next 12/17] libbpf: add bpf_token_create() API Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 13/17] libbpf: add BPF token support to bpf_map_create() API Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 14/17] libbpf: add BPF token support to bpf_btf_load() API Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 15/17] libbpf: add BPF token support to bpf_prog_load() API Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 16/17] selftests/bpf: add BPF token-enabled tests Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 17/17] bpf,selinux: allocate bpf_security_struct per BPF token Andrii Nakryiko
2023-11-06  5:01   ` [PATCH v9 " Paul Moore
2023-11-06 19:18     ` Andrii Nakryiko
