All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 bpf-next 00/12] BPF token and BPF FS-based delegation
@ 2023-09-12 21:28 Andrii Nakryiko
  2023-09-12 21:28 ` [PATCH v4 bpf-next 01/12] bpf: add BPF token delegation mount options to BPF FS Andrii Nakryiko
                   ` (11 more replies)
  0 siblings, 12 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2023-09-12 21:28 UTC (permalink / raw)
  To: bpf
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

This patch set introduces an ability to delegate a subset of BPF subsystem
functionality from privileged system-wide daemon (e.g., systemd or any other
container manager) through special mount options for userns-bound BPF FS to
a *trusted* unprivileged application. Trust is the key here. This
functionality is not about allowing unconditional unprivileged BPF usage.
Establishing trust, though, is completely up to the discretion of respective
privileged application that would create and mount a BPF FS instance with
delegation enabled, as different production setups can and do achieve it
through a combination of different means (signing, LSM, code reviews, etc),
and it's undesirable and infeasible for kernel to enforce any particular way
of validating trustworthiness of particular process.

The main motivation for this work is a desire to enable containerized BPF
applications to be used together with user namespaces. This is currently
impossible, as CAP_BPF, required for BPF subsystem usage, cannot be namespaced
or sandboxed, as a general rule. E.g., tracing BPF programs, thanks to BPF
helpers like bpf_probe_read_kernel() and bpf_probe_read_user() can safely read
arbitrary memory, and it's impossible to ensure that they only read memory of
processes belonging to any given namespace. This means that it's impossible to
have a mechanically verifiable namespace-aware CAP_BPF capability, and as such
another mechanism to allow safe usage of BPF functionality is necessary.BPF FS
delegation mount options and BPF token derived from such BPF FS instance is
such a mechanism. Kernel makes no assumption about what "trusted" constitutes
in any particular case, and it's up to specific privileged applications and
their surrounding infrastructure to decide that. What kernel provides is a set
of APIs to setup and mount special BPF FS instanecs and derive BPF tokens from
it. BPF FS and BPF token are both bound to its owning userns and in such a way
are constrained inside intended container. Users can then pass BPF token FD to
privileged bpf() syscall commands, like BPF map creation and BPF program
loading, to perform such operations without having init userns privileged.

This v4 incorporates feedback and suggestions ([3]) received on v3 of this
patch set, and instead of allowing to create BPF tokens directly assuming
capable(CAP_SYS_ADMIN), we instead enhance BPF FS to accepts a few new
delegation mount options. If these options are used and BPF FS itself is
properly created, set up, and mounted inside the user namespaced container,
user application is able to derive a BPF token object from BPF FS instance,
and pass that token to bpf() syscall. As explained in patch #2, BPF token
itself doesn't grant access to BPF functionality, but instead allows kernel to
do namespaced capabilities checks (ns_capable() vs capable()) for CAP_BPF,
CAP_PERFMON, CAP_NET_ADMIN, and CAP_SYS_ADMIN, as applicable. So it forms one
half of a puzzle and allows container managers and sys admins to have safe and
flexible configuration options: determining which containers get delegation of
BPF functionality through BPF FS, and then which applications within such
containers are allowed to perform bpf() commands, based on namespaces
capabilities.

Previous attempt at addressing this very same problem ([0]) attempted to
utilize authoritative LSM approach, but was conclusively rejected by upstream
LSM maintainers. BPF token concept is not changing anything about LSM
approach, but can be combined with LSM hooks for very fine-grained security
policy. Some ideas about making BPF token more convenient to use with LSM (in
particular custom BPF LSM programs) was briefly described in recent LSF/MM/BPF
2023 presentation ([1]). E.g., an ability to specify user-provided data
(context), which in combination with BPF LSM would allow implementing a very
dynamic and fine-granular custom security policies on top of BPF token. In the
interest of minimizing API surface area and discussions this was relegated to
follow up patches, as it's not essential to the fundamental concept of
delegatable BPF token.

It should be noted that BPF token is conceptually quite similar to the idea of
/dev/bpf device file, proposed by Song a while ago ([2]). The biggest
difference is the idea of using virtual anon_inode file to hold BPF token and
allowing multiple independent instances of them, each (potentially) with its
own set of restrictions. And also, crucially, BPF token approach is not using
any special stateful task-scoped flags. Instead, bpf() syscall accepts
token_fd parameters explicitly for each relevant BPF command. This addresses
main concerns brought up during the /dev/bpf discussion, and fits better with
overall BPF subsystem design.

This patch set adds a basic minimum of functionality to make BPF token idea
useful and to discuss API and functionality. Currently only low-level libbpf
APIs support creating and passing BPF token around, allowing to test kernel
functionality, but for the most part is not sufficient for real-world
applications, which typically use high-level libbpf APIs based on `struct
bpf_object` type. This was done with the intent to limit the size of patch set
and concentrate on mostly kernel-side changes. All the necessary plumbing for
libbpf will be sent as a separate follow up patch set kernel support makes it
upstream.

Another part that should happen once kernel-side BPF token is established, is
a set of conventions between applications (e.g., systemd), tools (e.g.,
bpftool), and libraries (e.g., libbpf) on exposing delegatable BPF FS
instance(s) at well-defined locations to allow applications take advantage of
this in automatic fashion without explicit code changes on BPF application's
side. But I'd like to postpone this discussion to after BPF token concept
lands.

  [0] https://lore.kernel.org/bpf/20230412043300.360803-1-andrii@kernel.org/
  [1] http://vger.kernel.org/bpfconf2023_material/Trusted_unprivileged_BPF_LSFMM2023.pdf
  [2] https://lore.kernel.org/bpf/20190627201923.2589391-2-songliubraving@fb.com/
  [3] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/

v3->v4;
  - add delegation mount options to BPF FS;
  - BPF token is derived from the instance of BPF FS and associates itself
    with BPF FS' owning userns;
  - BPF token doesn't grant BPF functionality directly, it just turns
    capable() checks into ns_capable() checks within BPF FS' owning user;
  - BPF token cannot be pinned;
v2->v3:
  - make BPF_TOKEN_CREATE pin created BPF token in BPF FS, and disallow
    BPF_OBJ_PIN for BPF token;
v1->v2:
  - fix build failures on Kconfig with CONFIG_BPF_SYSCALL unset;
  - drop BPF_F_TOKEN_UNKNOWN_* flags and simplify UAPI (Stanislav).


Andrii Nakryiko (12):
  bpf: add BPF token delegation mount options to BPF FS
  bpf: introduce BPF token object
  bpf: add BPF token support to BPF_MAP_CREATE command
  bpf: add BPF token support to BPF_BTF_LOAD command
  bpf: add BPF token support to BPF_PROG_LOAD command
  bpf: take into account BPF token when fetching helper protos
  bpf: consistenly use BPF token throughout BPF verifier logic
  libbpf: add bpf_token_create() API
  libbpf: add BPF token support to bpf_map_create() API
  libbpf: add BPF token support to bpf_btf_load() API
  libbpf: add BPF token support to bpf_prog_load() API
  selftests/bpf: add BPF token-enabled tests

 drivers/media/rc/bpf-lirc.c                   |   2 +-
 include/linux/bpf.h                           |  78 ++-
 include/linux/filter.h                        |   2 +-
 include/uapi/linux/bpf.h                      |  44 ++
 kernel/bpf/Makefile                           |   2 +-
 kernel/bpf/arraymap.c                         |   2 +-
 kernel/bpf/cgroup.c                           |   6 +-
 kernel/bpf/core.c                             |   3 +-
 kernel/bpf/helpers.c                          |   6 +-
 kernel/bpf/inode.c                            |  93 ++-
 kernel/bpf/syscall.c                          | 183 ++++--
 kernel/bpf/token.c                            | 229 +++++++
 kernel/bpf/verifier.c                         |  13 +-
 kernel/trace/bpf_trace.c                      |   2 +-
 net/core/filter.c                             |  36 +-
 net/ipv4/bpf_tcp_ca.c                         |   2 +-
 net/netfilter/nf_bpf_link.c                   |   2 +-
 tools/include/uapi/linux/bpf.h                |  44 ++
 tools/lib/bpf/bpf.c                           |  30 +-
 tools/lib/bpf/bpf.h                           |  39 +-
 tools/lib/bpf/libbpf.map                      |   1 +
 .../selftests/bpf/prog_tests/libbpf_probes.c  |   4 +
 .../selftests/bpf/prog_tests/libbpf_str.c     |   6 +
 .../testing/selftests/bpf/prog_tests/token.c  | 621 ++++++++++++++++++
 24 files changed, 1346 insertions(+), 104 deletions(-)
 create mode 100644 kernel/bpf/token.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/token.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 bpf-next 01/12] bpf: add BPF token delegation mount options to BPF FS
  2023-09-12 21:28 [PATCH v4 bpf-next 00/12] BPF token and BPF FS-based delegation Andrii Nakryiko
@ 2023-09-12 21:28 ` Andrii Nakryiko
  2023-09-12 21:28 ` [PATCH v4 bpf-next 02/12] bpf: introduce BPF token object Andrii Nakryiko
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2023-09-12 21:28 UTC (permalink / raw)
  To: bpf
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Add few new mount options to BPF FS that allow to specify that a given
BPF FS instance allows creation of BPF token (added in the next patch),
and what sort of operations are allowed under BPF token. As such, we get
4 new mount options, each is a bit mask
  - `delegate_cmds` allow to specify which bpf() syscall commands are
    allowed with BPF token derived from this BPF FS instance;
  - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
    a set of allowable BPF map types that could be created with BPF token;
  - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
    a set of allowable BPF program types that could be loaded with BPF token;
  - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
    a set of allowable BPF program attach types that could be loaded with
    BPF token; delegate_progs and delegate_attachs are meant to be used
    together, as full BPF program type is, in general, determined
    through both program type and program attach type.

Currently, these mount options accept the following forms of values:
  - a special value "any", that enables all possible values of a given
  bit set;
  - numeric value (decimal or hexadecimal, determined by kernel
  automatically) that specifies a bit mask value directly;
  - all the values for a given mount option are combined, if specified
  multiple times. E.g., `mount -t bpf nodev /path/to/mount -o
  delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3
  mask.

Ideally, more convenient (for humans) symbolic form derived from
corresponding UAPI enums would be accepted (e.g., `-o
delegate_progs=kprobe|tracepoint`) and I intend to implement this, but
it requires a bunch of UAPI header churn, so I postponed it until this
feature lands upstream or at least there is a definite consensus that
this feature is acceptable and is going to make it, just to minimize
amount of wasted effort and not increase amount of non-essential code to
be reviewed.

Attentive reader will notice that BPF FS is now marked as
FS_USERNS_MOUNT, which theoretically makes it mountable inside non-init
user namespace as long as the process has sufficient *namespaced*
capabilities within that user namespace. But in reality we still
restrict BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in
init userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added
to allow creating BPF FS context object (i.e., fsopen("bpf")) from
inside unprivileged process inside non-init userns, to capture that
userns as the owning userns. It will still be required to pass this
context object back to privileged process to instantiate and mount it.

This manipulation is important, because capturing non-init userns as the
owning userns of BPF FS instance (super block) allows to use that userns
to constraint BPF token to that userns later on (see next patch). So
creating BPF FS with delegation inside unprivileged userns will restrict
derived BPF token objects to only "work" inside that intended userns,
making it scoped to a intended "container".

There is a set of selftests at the end of the patch set that simulates
this sequence of steps and validates that everything works as intended.
But careful review is requested to make sure there are no missed gaps in
the implementation and testing.

All this is based on suggestions and discussions with Christian Brauner
([0]), to the best of my ability to follow all the implications.

  [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h | 10 ++++++
 kernel/bpf/inode.c  | 88 +++++++++++++++++++++++++++++++++++++++------
 2 files changed, 88 insertions(+), 10 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b9e573159432..e9a3ab390844 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1558,6 +1558,16 @@ struct bpf_link_primer {
 	u32 id;
 };
 
+struct bpf_mount_opts {
+	umode_t mode;
+
+	/* BPF token-related delegation options */
+	u64 delegate_cmds;
+	u64 delegate_maps;
+	u64 delegate_progs;
+	u64 delegate_attachs;
+};
+
 struct bpf_struct_ops_value;
 struct btf_member;
 
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 99d0625b6c82..8f66b57d3546 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -20,6 +20,7 @@
 #include <linux/filter.h>
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
+#include <linux/kstrtox.h>
 #include "preload/bpf_preload.h"
 
 enum bpf_type {
@@ -600,10 +601,31 @@ EXPORT_SYMBOL(bpf_prog_get_type_path);
  */
 static int bpf_show_options(struct seq_file *m, struct dentry *root)
 {
+	struct bpf_mount_opts *opts = root->d_sb->s_fs_info;
 	umode_t mode = d_inode(root)->i_mode & S_IALLUGO & ~S_ISVTX;
 
 	if (mode != S_IRWXUGO)
 		seq_printf(m, ",mode=%o", mode);
+
+	if (opts->delegate_cmds == ~0ULL)
+		seq_printf(m, ",delegate_cmds=any");
+	else if (opts->delegate_cmds)
+		seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
+
+	if (opts->delegate_maps == ~0ULL)
+		seq_printf(m, ",delegate_maps=any");
+	else if (opts->delegate_maps)
+		seq_printf(m, ",delegate_maps=0x%llx", opts->delegate_maps);
+
+	if (opts->delegate_progs == ~0ULL)
+		seq_printf(m, ",delegate_progs=any");
+	else if (opts->delegate_progs)
+		seq_printf(m, ",delegate_progs=0x%llx", opts->delegate_progs);
+
+	if (opts->delegate_attachs == ~0ULL)
+		seq_printf(m, ",delegate_attachs=any");
+	else if (opts->delegate_attachs)
+		seq_printf(m, ",delegate_attachs=0x%llx", opts->delegate_attachs);
 	return 0;
 }
 
@@ -627,22 +649,27 @@ static const struct super_operations bpf_super_ops = {
 
 enum {
 	OPT_MODE,
+	OPT_DELEGATE_CMDS,
+	OPT_DELEGATE_MAPS,
+	OPT_DELEGATE_PROGS,
+	OPT_DELEGATE_ATTACHS,
 };
 
 static const struct fs_parameter_spec bpf_fs_parameters[] = {
 	fsparam_u32oct	("mode",			OPT_MODE),
+	fsparam_string	("delegate_cmds",		OPT_DELEGATE_CMDS),
+	fsparam_string	("delegate_maps",		OPT_DELEGATE_MAPS),
+	fsparam_string	("delegate_progs",		OPT_DELEGATE_PROGS),
+	fsparam_string	("delegate_attachs",		OPT_DELEGATE_ATTACHS),
 	{}
 };
 
-struct bpf_mount_opts {
-	umode_t mode;
-};
-
 static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
 {
-	struct bpf_mount_opts *opts = fc->fs_private;
+	struct bpf_mount_opts *opts = fc->s_fs_info;
 	struct fs_parse_result result;
-	int opt;
+	int opt, err;
+	u64 msk;
 
 	opt = fs_parse(fc, bpf_fs_parameters, param, &result);
 	if (opt < 0) {
@@ -666,6 +693,25 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
 	case OPT_MODE:
 		opts->mode = result.uint_32 & S_IALLUGO;
 		break;
+	case OPT_DELEGATE_CMDS:
+	case OPT_DELEGATE_MAPS:
+	case OPT_DELEGATE_PROGS:
+	case OPT_DELEGATE_ATTACHS:
+		if (strcmp(param->string, "any") == 0) {
+			msk = ~0ULL;
+		} else {
+			err = kstrtou64(param->string, 0, &msk);
+			if (err)
+				return err;
+		}
+		switch (opt) {
+		case OPT_DELEGATE_CMDS: opts->delegate_cmds |= msk; break;
+		case OPT_DELEGATE_MAPS: opts->delegate_maps |= msk; break;
+		case OPT_DELEGATE_PROGS: opts->delegate_progs |= msk; break;
+		case OPT_DELEGATE_ATTACHS: opts->delegate_attachs |= msk; break;
+		default: return -EINVAL;
+		}
+		break;
 	}
 
 	return 0;
@@ -740,10 +786,14 @@ static int populate_bpffs(struct dentry *parent)
 static int bpf_fill_super(struct super_block *sb, struct fs_context *fc)
 {
 	static const struct tree_descr bpf_rfiles[] = { { "" } };
-	struct bpf_mount_opts *opts = fc->fs_private;
+	struct bpf_mount_opts *opts = sb->s_fs_info;
 	struct inode *inode;
 	int ret;
 
+	/* Delegating an instance of BPF FS requires privileges */
+	if (fc->user_ns != &init_user_ns && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	ret = simple_fill_super(sb, BPF_FS_MAGIC, bpf_rfiles);
 	if (ret)
 		return ret;
@@ -765,7 +815,10 @@ static int bpf_get_tree(struct fs_context *fc)
 
 static void bpf_free_fc(struct fs_context *fc)
 {
-	kfree(fc->fs_private);
+	struct bpf_mount_opts *opts = fc->s_fs_info;
+
+	if (opts)
+		kfree(opts);
 }
 
 static const struct fs_context_operations bpf_context_ops = {
@@ -787,17 +840,32 @@ static int bpf_init_fs_context(struct fs_context *fc)
 
 	opts->mode = S_IRWXUGO;
 
-	fc->fs_private = opts;
+	/* start out with no BPF token delegation enabled */
+	opts->delegate_cmds = 0;
+	opts->delegate_maps = 0;
+	opts->delegate_progs = 0;
+	opts->delegate_attachs = 0;
+
+	fc->s_fs_info = opts;
 	fc->ops = &bpf_context_ops;
 	return 0;
 }
 
+static void bpf_kill_super(struct super_block *sb)
+{
+	struct bpf_mount_opts *opts = sb->s_fs_info;
+
+	kill_litter_super(sb);
+	kfree(opts);
+}
+
 static struct file_system_type bpf_fs_type = {
 	.owner		= THIS_MODULE,
 	.name		= "bpf",
 	.init_fs_context = bpf_init_fs_context,
 	.parameters	= bpf_fs_parameters,
-	.kill_sb	= kill_litter_super,
+	.kill_sb	= bpf_kill_super,
+	.fs_flags	= FS_USERNS_MOUNT,
 };
 
 static int __init bpf_init(void)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 bpf-next 02/12] bpf: introduce BPF token object
  2023-09-12 21:28 [PATCH v4 bpf-next 00/12] BPF token and BPF FS-based delegation Andrii Nakryiko
  2023-09-12 21:28 ` [PATCH v4 bpf-next 01/12] bpf: add BPF token delegation mount options to BPF FS Andrii Nakryiko
@ 2023-09-12 21:28 ` Andrii Nakryiko
  2023-09-12 21:46   ` Andrii Nakryiko
  2023-09-13 21:46   ` [PATCH v4 2/12] " Paul Moore
  2023-09-12 21:28 ` [PATCH v4 bpf-next 03/12] bpf: add BPF token support to BPF_MAP_CREATE command Andrii Nakryiko
                   ` (9 subsequent siblings)
  11 siblings, 2 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2023-09-12 21:28 UTC (permalink / raw)
  To: bpf
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Add new kind of BPF kernel object, BPF token. BPF token is meant to
allow delegating privileged BPF functionality, like loading a BPF
program or creating a BPF map, from privileged process to a *trusted*
unprivileged process, all while have a good amount of control over which
privileged operations could be performed using provided BPF token.

This is achieved through mounting BPF FS instance with extra delegation
mount options, which determine what operations are delegatable, and also
constraining it to the owning user namespace (as mentioned in the
previous patch).

BPF token itself is just a derivative from BPF FS and can be created
through a new bpf() syscall command, BPF_TOKEN_CREAT, which accepts
a path specification (using the usual fd + string path combo) to a BPF
FS mount. Currently, BPF token "inherits" delegated command, map types,
prog type, and attach type bit sets from BPF FS as is. In the future,
having an BPF token as a separate object with its own FD, we can allow
to further restrict BPF token's allowable set of things either at the creation
time or after the fact, allowing the process to guard itself further
from, e.g., unintentionally trying to load undesired kind of BPF
programs. But for now we keep things simple and just copy bit sets as is.

When BPF token is created from BPF FS mount, we take reference to the
BPF super block's owning user namespace, and then use that namespace for
checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
capabilities that are normally only checked against init userns (using
capable()), but now we check them using ns_capable() instead (if BPF
token is provided). See bpf_token_capable() for details.

Such setup means that BPF token in itself is not sufficient to grant BPF
functionality. User namespaced process has to *also* have necessary
combination of capabilities inside that user namespace. So while
previously CAP_BPF was useless when granted within user namespace, now
it gains a meaning and allows container managers and sys admins to have
a flexible control over which processes can and need to use BPF
functionality within the user namespace (i.e., container in practice).
And BPF FS delegation mount options and derived BPF tokens serve as
a per-container "flag" to grant overall ability to use bpf() (plus further
restrict on which parts of bpf() syscalls are treated as namespaced).

The alternative to creating BPF token object was:
  a) not having any extra object and just pasing BPF FS path to each
     relevant bpf() command. This seems suboptimal as it's racy (mount
     under the same path might change in between checking it and using it
     for bpf() command). And also less flexible if we'd like to further
     restrict ourselves compared to all the delegated functionality
     allowed on BPF FS.
  b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
     a dedicated FD that would represent a token-like functionality. This
     doesn't seem superior to having a proper bpf() command, so
     BPF_TOKEN_CREATE was chosen.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h            |  36 +++++++
 include/uapi/linux/bpf.h       |  39 +++++++
 kernel/bpf/Makefile            |   2 +-
 kernel/bpf/inode.c             |   4 +-
 kernel/bpf/syscall.c           |  17 +++
 kernel/bpf/token.c             | 189 +++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  39 +++++++
 7 files changed, 324 insertions(+), 2 deletions(-)
 create mode 100644 kernel/bpf/token.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e9a3ab390844..6abd2b96e096 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -51,6 +51,8 @@ struct module;
 struct bpf_func_state;
 struct ftrace_ops;
 struct cgroup;
+struct bpf_token;
+struct user_namespace;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -1568,6 +1570,13 @@ struct bpf_mount_opts {
 	u64 delegate_attachs;
 };
 
+struct bpf_token {
+	struct work_struct work;
+	atomic64_t refcnt;
+	struct user_namespace *userns;
+	u64 allowed_cmds;
+};
+
 struct bpf_struct_ops_value;
 struct btf_member;
 
@@ -2192,6 +2201,15 @@ int bpf_link_new_fd(struct bpf_link *link);
 struct bpf_link *bpf_link_get_from_fd(u32 ufd);
 struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
 
+void bpf_token_inc(struct bpf_token *token);
+void bpf_token_put(struct bpf_token *token);
+int bpf_token_create(union bpf_attr *attr);
+int bpf_token_new_fd(struct bpf_token *token);
+struct bpf_token *bpf_token_get_from_fd(u32 ufd);
+
+bool bpf_token_capable(const struct bpf_token *token, int cap);
+bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
+
 int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
 int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
 
@@ -2551,6 +2569,24 @@ static inline int bpf_obj_get_user(const char __user *pathname, int flags)
 	return -EOPNOTSUPP;
 }
 
+static inline void bpf_token_inc(struct bpf_token *token)
+{
+}
+
+static inline void bpf_token_put(struct bpf_token *token)
+{
+}
+
+static inline int bpf_token_new_fd(struct bpf_token *token)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline struct bpf_token *bpf_token_get_from_fd(u32 ufd)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
 static inline void __dev_flush(void)
 {
 }
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 73b155e52204..36e98c6f8944 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -847,6 +847,37 @@ union bpf_iter_link_info {
  *		Returns zero on success. On error, -1 is returned and *errno*
  *		is set appropriately.
  *
+ * BPF_TOKEN_CREATE
+ *	Description
+ *		Create BPF token with embedded information about what
+ *		BPF-related functionality it allows:
+ *		- a set of allowed bpf() syscall commands;
+ *		- a set of allowed BPF map types to be created with
+ *		BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed;
+ *		- a set of allowed BPF program types and BPF program attach
+ *		types to be loaded with BPF_PROG_LOAD command, if
+ *		BPF_PROG_LOAD itself is allowed.
+ *
+ *		BPF token is created (derived) from an instance of BPF FS,
+ *		assuming it has necessary delegation mount options specified.
+ *		BPF FS mount is specified with openat()-style path FD + string.
+ *		This BPF token can be passed as an extra parameter to various
+ *		bpf() syscall commands to grant BPF subsystem functionality to
+ *		unprivileged processes.
+ *
+ *		When created, BPF token is "associated" with the owning
+ *		user namespace of BPF FS instance (super block) that it was
+ *		derived from, and subsequent BPF operations performed with
+ *		BPF token would be performing capabilities checks (i.e.,
+ *		CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within
+ *		that user namespace. Without BPF token, such capabilities
+ *		have to be granted in init user namespace, making bpf()
+ *		syscall incompatible with user namespace, for the most part.
+ *
+ *	Return
+ *		A new file descriptor (a nonnegative integer), or -1 if an
+ *		error occurred (in which case, *errno* is set appropriately).
+ *
  * NOTES
  *	eBPF objects (maps and programs) can be shared between processes.
  *
@@ -901,6 +932,8 @@ enum bpf_cmd {
 	BPF_ITER_CREATE,
 	BPF_LINK_DETACH,
 	BPF_PROG_BIND_MAP,
+	BPF_TOKEN_CREATE,
+	__MAX_BPF_CMD,
 };
 
 enum bpf_map_type {
@@ -1694,6 +1727,12 @@ union bpf_attr {
 		__u32		flags;		/* extra flags */
 	} prog_bind_map;
 
+	struct { /* struct used by BPF_TOKEN_CREATE command */
+		__u32		flags;
+		__u32		bpffs_path_fd;
+		__u64		bpffs_pathname;
+	} token_create;
+
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index f526b7573e97..4ce95acfcaa7 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -6,7 +6,7 @@ cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse
 endif
 CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
 
-obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o
+obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o token.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 8f66b57d3546..82f11fbffd3e 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -603,11 +603,13 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 {
 	struct bpf_mount_opts *opts = root->d_sb->s_fs_info;
 	umode_t mode = d_inode(root)->i_mode & S_IALLUGO & ~S_ISVTX;
+	u64 mask;
 
 	if (mode != S_IRWXUGO)
 		seq_printf(m, ",mode=%o", mode);
 
-	if (opts->delegate_cmds == ~0ULL)
+	mask = (1ULL << __MAX_BPF_CMD) - 1;
+	if ((opts->delegate_cmds & mask) == mask)
 		seq_printf(m, ",delegate_cmds=any");
 	else if (opts->delegate_cmds)
 		seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6a692f3bea15..4fae678c1f48 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5297,6 +5297,20 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
 	return ret;
 }
 
+#define BPF_TOKEN_CREATE_LAST_FIELD token_create.bpffs_pathname
+
+static int token_create(union bpf_attr *attr)
+{
+	if (CHECK_ATTR(BPF_TOKEN_CREATE))
+		return -EINVAL;
+
+	/* no flags are supported yet */
+	if (attr->token_create.flags)
+		return -EINVAL;
+
+	return bpf_token_create(attr);
+}
+
 static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 {
 	union bpf_attr attr;
@@ -5430,6 +5444,9 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 	case BPF_PROG_BIND_MAP:
 		err = bpf_prog_bind_map(&attr);
 		break;
+	case BPF_TOKEN_CREATE:
+		err = token_create(&attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
new file mode 100644
index 000000000000..f6ea3eddbee6
--- /dev/null
+++ b/kernel/bpf/token.c
@@ -0,0 +1,189 @@
+#include <linux/bpf.h>
+#include <linux/vmalloc.h>
+#include <linux/anon_inodes.h>
+#include <linux/fdtable.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/idr.h>
+#include <linux/namei.h>
+#include <linux/user_namespace.h>
+
+bool bpf_token_capable(const struct bpf_token *token, int cap)
+{
+	/* BPF token allows ns_capable() level of capabilities */
+	if (token) {
+		if (ns_capable(token->userns, cap))
+			return true;
+		if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
+			return true;
+	}
+	/* otherwise fallback to capable() checks */
+	return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
+}
+
+void bpf_token_inc(struct bpf_token *token)
+{
+	atomic64_inc(&token->refcnt);
+}
+
+static void bpf_token_free(struct bpf_token *token)
+{
+	put_user_ns(token->userns);
+	kvfree(token);
+}
+
+static void bpf_token_put_deferred(struct work_struct *work)
+{
+	struct bpf_token *token = container_of(work, struct bpf_token, work);
+
+	bpf_token_free(token);
+}
+
+void bpf_token_put(struct bpf_token *token)
+{
+	if (!token)
+		return;
+
+	if (!atomic64_dec_and_test(&token->refcnt))
+		return;
+
+	INIT_WORK(&token->work, bpf_token_put_deferred);
+	schedule_work(&token->work);
+}
+
+static int bpf_token_release(struct inode *inode, struct file *filp)
+{
+	struct bpf_token *token = filp->private_data;
+
+	bpf_token_put(token);
+	return 0;
+}
+
+static ssize_t bpf_dummy_read(struct file *filp, char __user *buf, size_t siz,
+			      loff_t *ppos)
+{
+	/* We need this handler such that alloc_file() enables
+	 * f_mode with FMODE_CAN_READ.
+	 */
+	return -EINVAL;
+}
+
+static ssize_t bpf_dummy_write(struct file *filp, const char __user *buf,
+			       size_t siz, loff_t *ppos)
+{
+	/* We need this handler such that alloc_file() enables
+	 * f_mode with FMODE_CAN_WRITE.
+	 */
+	return -EINVAL;
+}
+
+static void bpf_token_show_fdinfo(struct seq_file *m, struct file *filp)
+{
+	struct bpf_token *token = filp->private_data;
+	u64 mask;
+
+	mask = (1ULL << __MAX_BPF_CMD) - 1;
+	if ((token->allowed_cmds & mask) == mask)
+		seq_printf(m, "allowed_cmds:\tany\n");
+	else
+		seq_printf(m, "allowed_cmds:\t0x%llx\n", token->allowed_cmds);
+}
+
+static const struct file_operations bpf_token_fops = {
+	.release	= bpf_token_release,
+	.read		= bpf_dummy_read,
+	.write		= bpf_dummy_write,
+	.show_fdinfo	= bpf_token_show_fdinfo,
+};
+
+static struct bpf_token *bpf_token_alloc(void)
+{
+	struct bpf_token *token;
+
+	token = kvzalloc(sizeof(*token), GFP_USER);
+	if (!token)
+		return NULL;
+
+	atomic64_set(&token->refcnt, 1);
+
+	return token;
+}
+
+int bpf_token_create(union bpf_attr *attr)
+{
+	struct path path;
+	struct bpf_mount_opts *mnt_opts;
+	struct bpf_token *token;
+	int ret;
+
+	ret = user_path_at(attr->token_create.bpffs_path_fd,
+			   u64_to_user_ptr(attr->token_create.bpffs_pathname),
+			   LOOKUP_FOLLOW | LOOKUP_EMPTY, &path);
+	if (ret)
+		return ret;
+
+	if (path.mnt->mnt_root != path.dentry) {
+		ret = -EINVAL;
+		goto out;
+	}
+	ret = path_permission(&path, MAY_ACCESS);
+	if (ret)
+		goto out;
+
+	token = bpf_token_alloc();
+	if (!token) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* remember bpffs owning userns for future ns_capable() checks */
+	token->userns = get_user_ns(path.dentry->d_sb->s_user_ns);
+
+	mnt_opts = path.dentry->d_sb->s_fs_info;
+	token->allowed_cmds = mnt_opts->delegate_cmds;
+
+	ret = bpf_token_new_fd(token);
+	if (ret < 0)
+		bpf_token_free(token);
+out:
+	path_put(&path);
+	return ret;
+}
+
+#define BPF_TOKEN_INODE_NAME "bpf-token"
+
+/* Alloc anon_inode and FD for prepared token.
+ * Returns fd >= 0 on success; negative error, otherwise.
+ */
+int bpf_token_new_fd(struct bpf_token *token)
+{
+	return anon_inode_getfd(BPF_TOKEN_INODE_NAME, &bpf_token_fops, token, O_CLOEXEC);
+}
+
+struct bpf_token *bpf_token_get_from_fd(u32 ufd)
+{
+	struct fd f = fdget(ufd);
+	struct bpf_token *token;
+
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+	if (f.file->f_op != &bpf_token_fops) {
+		fdput(f);
+		return ERR_PTR(-EINVAL);
+	}
+
+	token = f.file->private_data;
+	bpf_token_inc(token);
+	fdput(f);
+
+	return token;
+}
+
+bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
+{
+	if (!token)
+		return false;
+
+	return token->allowed_cmds & (1ULL << cmd);
+}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 73b155e52204..36e98c6f8944 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -847,6 +847,37 @@ union bpf_iter_link_info {
  *		Returns zero on success. On error, -1 is returned and *errno*
  *		is set appropriately.
  *
+ * BPF_TOKEN_CREATE
+ *	Description
+ *		Create BPF token with embedded information about what
+ *		BPF-related functionality it allows:
+ *		- a set of allowed bpf() syscall commands;
+ *		- a set of allowed BPF map types to be created with
+ *		BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed;
+ *		- a set of allowed BPF program types and BPF program attach
+ *		types to be loaded with BPF_PROG_LOAD command, if
+ *		BPF_PROG_LOAD itself is allowed.
+ *
+ *		BPF token is created (derived) from an instance of BPF FS,
+ *		assuming it has necessary delegation mount options specified.
+ *		BPF FS mount is specified with openat()-style path FD + string.
+ *		This BPF token can be passed as an extra parameter to various
+ *		bpf() syscall commands to grant BPF subsystem functionality to
+ *		unprivileged processes.
+ *
+ *		When created, BPF token is "associated" with the owning
+ *		user namespace of BPF FS instance (super block) that it was
+ *		derived from, and subsequent BPF operations performed with
+ *		BPF token would be performing capabilities checks (i.e.,
+ *		CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within
+ *		that user namespace. Without BPF token, such capabilities
+ *		have to be granted in init user namespace, making bpf()
+ *		syscall incompatible with user namespace, for the most part.
+ *
+ *	Return
+ *		A new file descriptor (a nonnegative integer), or -1 if an
+ *		error occurred (in which case, *errno* is set appropriately).
+ *
  * NOTES
  *	eBPF objects (maps and programs) can be shared between processes.
  *
@@ -901,6 +932,8 @@ enum bpf_cmd {
 	BPF_ITER_CREATE,
 	BPF_LINK_DETACH,
 	BPF_PROG_BIND_MAP,
+	BPF_TOKEN_CREATE,
+	__MAX_BPF_CMD,
 };
 
 enum bpf_map_type {
@@ -1694,6 +1727,12 @@ union bpf_attr {
 		__u32		flags;		/* extra flags */
 	} prog_bind_map;
 
+	struct { /* struct used by BPF_TOKEN_CREATE command */
+		__u32		flags;
+		__u32		bpffs_path_fd;
+		__u64		bpffs_pathname;
+	} token_create;
+
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 bpf-next 03/12] bpf: add BPF token support to BPF_MAP_CREATE command
  2023-09-12 21:28 [PATCH v4 bpf-next 00/12] BPF token and BPF FS-based delegation Andrii Nakryiko
  2023-09-12 21:28 ` [PATCH v4 bpf-next 01/12] bpf: add BPF token delegation mount options to BPF FS Andrii Nakryiko
  2023-09-12 21:28 ` [PATCH v4 bpf-next 02/12] bpf: introduce BPF token object Andrii Nakryiko
@ 2023-09-12 21:28 ` Andrii Nakryiko
  2023-09-12 21:28 ` [PATCH v4 bpf-next 04/12] bpf: add BPF token support to BPF_BTF_LOAD command Andrii Nakryiko
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2023-09-12 21:28 UTC (permalink / raw)
  To: bpf
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Allow providing token_fd for BPF_MAP_CREATE command to allow controlled
BPF map creation from unprivileged process through delegated BPF token.

Wire through a set of allowed BPF map types to BPF token, derived from
BPF FS at BPF token creation time. This, in combination with allowed_cmds
allows to create a narrowly-focused BPF token (controlled by privileged
agent) with a restrictive set of BPF maps that application can attempt
to create.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h                           |  3 ++
 include/uapi/linux/bpf.h                      |  2 +
 kernel/bpf/inode.c                            |  3 +-
 kernel/bpf/syscall.c                          | 54 +++++++++++++++----
 kernel/bpf/token.c                            | 15 ++++++
 tools/include/uapi/linux/bpf.h                |  2 +
 .../selftests/bpf/prog_tests/libbpf_probes.c  |  2 +
 .../selftests/bpf/prog_tests/libbpf_str.c     |  3 ++
 8 files changed, 72 insertions(+), 12 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6abd2b96e096..730c218e6a63 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -265,6 +265,7 @@ struct bpf_map {
 	u32 btf_value_type_id;
 	u32 btf_vmlinux_value_type_id;
 	struct btf *btf;
+	struct bpf_token *token;
 #ifdef CONFIG_MEMCG_KMEM
 	struct obj_cgroup *objcg;
 #endif
@@ -1575,6 +1576,7 @@ struct bpf_token {
 	atomic64_t refcnt;
 	struct user_namespace *userns;
 	u64 allowed_cmds;
+	u64 allowed_maps;
 };
 
 struct bpf_struct_ops_value;
@@ -2209,6 +2211,7 @@ struct bpf_token *bpf_token_get_from_fd(u32 ufd);
 
 bool bpf_token_capable(const struct bpf_token *token, int cap);
 bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
+bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type);
 
 int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
 int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 36e98c6f8944..9c399454712e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -984,6 +984,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_BLOOM_FILTER,
 	BPF_MAP_TYPE_USER_RINGBUF,
 	BPF_MAP_TYPE_CGRP_STORAGE,
+	__MAX_BPF_MAP_TYPE
 };
 
 /* Note that tracing related programs such as
@@ -1423,6 +1424,7 @@ union bpf_attr {
 		 * to using 5 hash functions).
 		 */
 		__u64	map_extra;
+		__u32	map_token_fd;
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 82f11fbffd3e..6e8b4e2bda97 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -614,7 +614,8 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 	else if (opts->delegate_cmds)
 		seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
 
-	if (opts->delegate_maps == ~0ULL)
+	mask = (1ULL << __MAX_BPF_MAP_TYPE) - 1;
+	if ((opts->delegate_maps & mask) == mask)
 		seq_printf(m, ",delegate_maps=any");
 	else if (opts->delegate_maps)
 		seq_printf(m, ",delegate_maps=0x%llx", opts->delegate_maps);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4fae678c1f48..fbd8c82e5ec6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -695,6 +695,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
 {
 	struct bpf_map *map = container_of(work, struct bpf_map, work);
 	struct btf_record *rec = map->record;
+	struct bpf_token *token = map->token;
 
 	security_bpf_map_free(map);
 	bpf_map_release_memcg(map);
@@ -710,6 +711,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
 	 * template bpf_map struct used during verification.
 	 */
 	btf_record_free(rec);
+	bpf_token_put(token);
 }
 
 static void bpf_map_put_uref(struct bpf_map *map)
@@ -1014,7 +1016,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	if (!IS_ERR_OR_NULL(map->record)) {
 		int i;
 
-		if (!bpf_capable()) {
+		if (!bpf_token_capable(map->token, CAP_BPF)) {
 			ret = -EPERM;
 			goto free_map_tab;
 		}
@@ -1097,11 +1099,12 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	return ret;
 }
 
-#define BPF_MAP_CREATE_LAST_FIELD map_extra
+#define BPF_MAP_CREATE_LAST_FIELD map_token_fd
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
 {
 	const struct bpf_map_ops *ops;
+	struct bpf_token *token = NULL;
 	int numa_node = bpf_map_attr_numa_node(attr);
 	u32 map_type = attr->map_type;
 	struct bpf_map *map;
@@ -1152,14 +1155,32 @@ static int map_create(union bpf_attr *attr)
 	if (!ops->map_mem_usage)
 		return -EINVAL;
 
+	if (attr->map_token_fd) {
+		token = bpf_token_get_from_fd(attr->map_token_fd);
+		if (IS_ERR(token))
+			return PTR_ERR(token);
+
+		/* if current token doesn't grant map creation permissions,
+		 * then we can't use this token, so ignore it and rely on
+		 * system-wide capabilities checks
+		 */
+		if (!bpf_token_allow_cmd(token, BPF_MAP_CREATE) ||
+		    !bpf_token_allow_map_type(token, attr->map_type)) {
+			bpf_token_put(token);
+			token = NULL;
+		}
+	}
+
+	err = -EPERM;
+
 	/* Intent here is for unprivileged_bpf_disabled to block BPF map
 	 * creation for unprivileged users; other actions depend
 	 * on fd availability and access to bpffs, so are dependent on
 	 * object creation success. Even with unprivileged BPF disabled,
 	 * capability checks are still carried out.
 	 */
-	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
-		return -EPERM;
+	if (sysctl_unprivileged_bpf_disabled && !bpf_token_capable(token, CAP_BPF))
+		goto put_token;
 
 	/* check privileged map type permissions */
 	switch (map_type) {
@@ -1192,28 +1213,36 @@ static int map_create(union bpf_attr *attr)
 	case BPF_MAP_TYPE_LRU_PERCPU_HASH:
 	case BPF_MAP_TYPE_STRUCT_OPS:
 	case BPF_MAP_TYPE_CPUMAP:
-		if (!bpf_capable())
-			return -EPERM;
+		if (!bpf_token_capable(token, CAP_BPF))
+			goto put_token;
 		break;
 	case BPF_MAP_TYPE_SOCKMAP:
 	case BPF_MAP_TYPE_SOCKHASH:
 	case BPF_MAP_TYPE_DEVMAP:
 	case BPF_MAP_TYPE_DEVMAP_HASH:
 	case BPF_MAP_TYPE_XSKMAP:
-		if (!capable(CAP_NET_ADMIN))
-			return -EPERM;
+		if (!bpf_token_capable(token, CAP_NET_ADMIN))
+			goto put_token;
 		break;
 	default:
 		WARN(1, "unsupported map type %d", map_type);
-		return -EPERM;
+		goto put_token;
 	}
 
 	map = ops->map_alloc(attr);
-	if (IS_ERR(map))
-		return PTR_ERR(map);
+	if (IS_ERR(map)) {
+		err = PTR_ERR(map);
+		goto put_token;
+	}
 	map->ops = ops;
 	map->map_type = map_type;
 
+	if (token) {
+		/* move token reference into map->token, reuse our refcnt */
+		map->token = token;
+		token = NULL;
+	}
+
 	err = bpf_obj_name_cpy(map->name, attr->map_name,
 			       sizeof(attr->map_name));
 	if (err < 0)
@@ -1286,8 +1315,11 @@ static int map_create(union bpf_attr *attr)
 free_map_sec:
 	security_bpf_map_free(map);
 free_map:
+	bpf_token_put(map->token);
 	btf_put(map->btf);
 	map->ops->map_free(map);
+put_token:
+	bpf_token_put(token);
 	return err;
 }
 
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
index f6ea3eddbee6..bcc170fcf341 100644
--- a/kernel/bpf/token.c
+++ b/kernel/bpf/token.c
@@ -88,6 +88,12 @@ static void bpf_token_show_fdinfo(struct seq_file *m, struct file *filp)
 		seq_printf(m, "allowed_cmds:\tany\n");
 	else
 		seq_printf(m, "allowed_cmds:\t0x%llx\n", token->allowed_cmds);
+
+	mask = (1ULL << __MAX_BPF_MAP_TYPE) - 1;
+	if ((token->allowed_maps & mask) == mask)
+		seq_printf(m, "allowed_maps:\tany\n");
+	else
+		seq_printf(m, "allowed_maps:\t0x%llx\n", token->allowed_maps);
 }
 
 static const struct file_operations bpf_token_fops = {
@@ -142,6 +148,7 @@ int bpf_token_create(union bpf_attr *attr)
 
 	mnt_opts = path.dentry->d_sb->s_fs_info;
 	token->allowed_cmds = mnt_opts->delegate_cmds;
+	token->allowed_maps = mnt_opts->delegate_maps;
 
 	ret = bpf_token_new_fd(token);
 	if (ret < 0)
@@ -187,3 +194,11 @@ bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
 
 	return token->allowed_cmds & (1ULL << cmd);
 }
+
+bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type)
+{
+	if (!token || type >= __MAX_BPF_MAP_TYPE)
+		return false;
+
+	return token->allowed_maps & (1ULL << type);
+}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 36e98c6f8944..9c399454712e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -984,6 +984,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_BLOOM_FILTER,
 	BPF_MAP_TYPE_USER_RINGBUF,
 	BPF_MAP_TYPE_CGRP_STORAGE,
+	__MAX_BPF_MAP_TYPE
 };
 
 /* Note that tracing related programs such as
@@ -1423,6 +1424,7 @@ union bpf_attr {
 		 * to using 5 hash functions).
 		 */
 		__u64	map_extra;
+		__u32	map_token_fd;
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
diff --git a/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c b/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
index 9f766ddd946a..573249a2814d 100644
--- a/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
+++ b/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
@@ -68,6 +68,8 @@ void test_libbpf_probe_map_types(void)
 
 		if (map_type == BPF_MAP_TYPE_UNSPEC)
 			continue;
+		if (strcmp(map_type_name, "__MAX_BPF_MAP_TYPE") == 0)
+			continue;
 
 		if (!test__start_subtest(map_type_name))
 			continue;
diff --git a/tools/testing/selftests/bpf/prog_tests/libbpf_str.c b/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
index c440ea3311ed..2a0633f43c73 100644
--- a/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
+++ b/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
@@ -132,6 +132,9 @@ static void test_libbpf_bpf_map_type_str(void)
 		const char *map_type_str;
 		char buf[256];
 
+		if (map_type == __MAX_BPF_MAP_TYPE)
+			continue;
+
 		map_type_name = btf__str_by_offset(btf, e->name_off);
 		map_type_str = libbpf_bpf_map_type_str(map_type);
 		ASSERT_OK_PTR(map_type_str, map_type_name);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 bpf-next 04/12] bpf: add BPF token support to BPF_BTF_LOAD command
  2023-09-12 21:28 [PATCH v4 bpf-next 00/12] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2023-09-12 21:28 ` [PATCH v4 bpf-next 03/12] bpf: add BPF token support to BPF_MAP_CREATE command Andrii Nakryiko
@ 2023-09-12 21:28 ` Andrii Nakryiko
  2023-09-12 21:28 ` [PATCH v4 bpf-next 05/12] bpf: add BPF token support to BPF_PROG_LOAD command Andrii Nakryiko
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2023-09-12 21:28 UTC (permalink / raw)
  To: bpf
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Accept BPF token FD in BPF_BTF_LOAD command to allow BTF data loading
through delegated BPF token. BTF loading is a pretty straightforward
operation, so as long as BPF token is created with allow_cmds granting
BPF_BTF_LOAD command, kernel proceeds to parsing BTF data and creating
BTF object.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/uapi/linux/bpf.h       |  1 +
 kernel/bpf/syscall.c           | 20 ++++++++++++++++++--
 tools/include/uapi/linux/bpf.h |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9c399454712e..1527d861f408 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1606,6 +1606,7 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		btf_log_true_size;
+		__u32		btf_token_fd;
 	};
 
 	struct {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index fbd8c82e5ec6..0d394b471ee0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4714,15 +4714,31 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
 	return err;
 }
 
-#define BPF_BTF_LOAD_LAST_FIELD btf_log_true_size
+#define BPF_BTF_LOAD_LAST_FIELD btf_token_fd
 
 static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
 {
+	struct bpf_token *token = NULL;
+
 	if (CHECK_ATTR(BPF_BTF_LOAD))
 		return -EINVAL;
 
-	if (!bpf_capable())
+	if (attr->btf_token_fd) {
+		token = bpf_token_get_from_fd(attr->btf_token_fd);
+		if (IS_ERR(token))
+			return PTR_ERR(token);
+		if (!bpf_token_allow_cmd(token, BPF_BTF_LOAD)) {
+			bpf_token_put(token);
+			token = NULL;
+		}
+	}
+
+	if (!bpf_token_capable(token, CAP_BPF)) {
+		bpf_token_put(token);
 		return -EPERM;
+	}
+
+	bpf_token_put(token);
 
 	return btf_new_fd(attr, uattr, uattr_size);
 }
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9c399454712e..1527d861f408 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1606,6 +1606,7 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		btf_log_true_size;
+		__u32		btf_token_fd;
 	};
 
 	struct {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 bpf-next 05/12] bpf: add BPF token support to BPF_PROG_LOAD command
  2023-09-12 21:28 [PATCH v4 bpf-next 00/12] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2023-09-12 21:28 ` [PATCH v4 bpf-next 04/12] bpf: add BPF token support to BPF_BTF_LOAD command Andrii Nakryiko
@ 2023-09-12 21:28 ` Andrii Nakryiko
  2023-09-12 21:29 ` [PATCH v4 bpf-next 06/12] bpf: take into account BPF token when fetching helper protos Andrii Nakryiko
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2023-09-12 21:28 UTC (permalink / raw)
  To: bpf
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Add basic support of BPF token to BPF_PROG_LOAD. Wire through a set of
allowed BPF program types and attach types, derived from BPF FS at BPF
token creation time. Then make sure we perform bpf_token_capable()
checks everywhere where it's relevant.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h                           |  6 ++
 include/uapi/linux/bpf.h                      |  2 +
 kernel/bpf/core.c                             |  1 +
 kernel/bpf/inode.c                            |  6 +-
 kernel/bpf/syscall.c                          | 87 ++++++++++++++-----
 kernel/bpf/token.c                            | 25 ++++++
 tools/include/uapi/linux/bpf.h                |  2 +
 .../selftests/bpf/prog_tests/libbpf_probes.c  |  2 +
 .../selftests/bpf/prog_tests/libbpf_str.c     |  3 +
 9 files changed, 108 insertions(+), 26 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 730c218e6a63..4eb055bcf65f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1437,6 +1437,7 @@ struct bpf_prog_aux {
 #ifdef CONFIG_SECURITY
 	void *security;
 #endif
+	struct bpf_token *token;
 	struct bpf_prog_offload *offload;
 	struct btf *btf;
 	struct bpf_func_info *func_info;
@@ -1577,6 +1578,8 @@ struct bpf_token {
 	struct user_namespace *userns;
 	u64 allowed_cmds;
 	u64 allowed_maps;
+	u64 allowed_progs;
+	u64 allowed_attachs;
 };
 
 struct bpf_struct_ops_value;
@@ -2212,6 +2215,9 @@ struct bpf_token *bpf_token_get_from_fd(u32 ufd);
 bool bpf_token_capable(const struct bpf_token *token, int cap);
 bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
 bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type);
+bool bpf_token_allow_prog_type(const struct bpf_token *token,
+			       enum bpf_prog_type prog_type,
+			       enum bpf_attach_type attach_type);
 
 int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
 int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1527d861f408..2fec43a56170 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1029,6 +1029,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
 	BPF_PROG_TYPE_NETFILTER,
+	__MAX_BPF_PROG_TYPE
 };
 
 enum bpf_attach_type {
@@ -1494,6 +1495,7 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		log_true_size;
+		__u32		prog_token_fd;
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 95599df82ee4..531d18a59121 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2743,6 +2743,7 @@ void bpf_prog_free(struct bpf_prog *fp)
 
 	if (aux->dst_prog)
 		bpf_prog_put(aux->dst_prog);
+	bpf_token_put(aux->token);
 	INIT_WORK(&aux->work, bpf_prog_free_deferred);
 	schedule_work(&aux->work);
 }
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 6e8b4e2bda97..fa39797f7076 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -620,12 +620,14 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 	else if (opts->delegate_maps)
 		seq_printf(m, ",delegate_maps=0x%llx", opts->delegate_maps);
 
-	if (opts->delegate_progs == ~0ULL)
+	mask = (1ULL << __MAX_BPF_PROG_TYPE) - 1;
+	if ((opts->delegate_progs & mask) == mask)
 		seq_printf(m, ",delegate_progs=any");
 	else if (opts->delegate_progs)
 		seq_printf(m, ",delegate_progs=0x%llx", opts->delegate_progs);
 
-	if (opts->delegate_attachs == ~0ULL)
+	mask = (1ULL << __MAX_BPF_ATTACH_TYPE) - 1;
+	if ((opts->delegate_attachs & mask) == mask)
 		seq_printf(m, ",delegate_attachs=any");
 	else if (opts->delegate_attachs)
 		seq_printf(m, ",delegate_attachs=0x%llx", opts->delegate_attachs);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0d394b471ee0..dafefc73c620 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2582,13 +2582,15 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
 }
 
 /* last field in 'union bpf_attr' used by this command */
-#define	BPF_PROG_LOAD_LAST_FIELD log_true_size
+#define BPF_PROG_LOAD_LAST_FIELD prog_token_fd
 
 static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 {
 	enum bpf_prog_type type = attr->prog_type;
 	struct bpf_prog *prog, *dst_prog = NULL;
 	struct btf *attach_btf = NULL;
+	struct bpf_token *token = NULL;
+	bool bpf_cap;
 	int err;
 	char license[128];
 
@@ -2604,10 +2606,31 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 				 BPF_F_XDP_DEV_BOUND_ONLY))
 		return -EINVAL;
 
+	bpf_prog_load_fixup_attach_type(attr);
+
+	if (attr->prog_token_fd) {
+		token = bpf_token_get_from_fd(attr->prog_token_fd);
+		if (IS_ERR(token))
+			return PTR_ERR(token);
+		/* if current token doesn't grant prog loading permissions,
+		 * then we can't use this token, so ignore it and rely on
+		 * system-wide capabilities checks
+		 */
+		if (!bpf_token_allow_cmd(token, BPF_PROG_LOAD) ||
+		    !bpf_token_allow_prog_type(token, attr->prog_type,
+					       attr->expected_attach_type)) {
+			bpf_token_put(token);
+			token = NULL;
+		}
+	}
+
+	bpf_cap = bpf_token_capable(token, CAP_BPF);
+	err = -EPERM;
+
 	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
 	    (attr->prog_flags & BPF_F_ANY_ALIGNMENT) &&
-	    !bpf_capable())
-		return -EPERM;
+	    !bpf_cap)
+		goto put_token;
 
 	/* Intent here is for unprivileged_bpf_disabled to block BPF program
 	 * creation for unprivileged users; other actions depend
@@ -2616,21 +2639,23 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	 * capability checks are still carried out for these
 	 * and other operations.
 	 */
-	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
-		return -EPERM;
+	if (sysctl_unprivileged_bpf_disabled && !bpf_cap)
+		goto put_token;
 
 	if (attr->insn_cnt == 0 ||
-	    attr->insn_cnt > (bpf_capable() ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
-		return -E2BIG;
+	    attr->insn_cnt > (bpf_cap ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS)) {
+		err = -E2BIG;
+		goto put_token;
+	}
 	if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
 	    type != BPF_PROG_TYPE_CGROUP_SKB &&
-	    !bpf_capable())
-		return -EPERM;
+	    !bpf_cap)
+		goto put_token;
 
-	if (is_net_admin_prog_type(type) && !capable(CAP_NET_ADMIN) && !capable(CAP_SYS_ADMIN))
-		return -EPERM;
-	if (is_perfmon_prog_type(type) && !perfmon_capable())
-		return -EPERM;
+	if (is_net_admin_prog_type(type) && !bpf_token_capable(token, CAP_NET_ADMIN))
+		goto put_token;
+	if (is_perfmon_prog_type(type) && !bpf_token_capable(token, CAP_PERFMON))
+		goto put_token;
 
 	/* attach_prog_fd/attach_btf_obj_fd can specify fd of either bpf_prog
 	 * or btf, we need to check which one it is
@@ -2640,27 +2665,33 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 		if (IS_ERR(dst_prog)) {
 			dst_prog = NULL;
 			attach_btf = btf_get_by_fd(attr->attach_btf_obj_fd);
-			if (IS_ERR(attach_btf))
-				return -EINVAL;
+			if (IS_ERR(attach_btf)) {
+				err = -EINVAL;
+				goto put_token;
+			}
 			if (!btf_is_kernel(attach_btf)) {
 				/* attaching through specifying bpf_prog's BTF
 				 * objects directly might be supported eventually
 				 */
 				btf_put(attach_btf);
-				return -ENOTSUPP;
+				err = -ENOTSUPP;
+				goto put_token;
 			}
 		}
 	} else if (attr->attach_btf_id) {
 		/* fall back to vmlinux BTF, if BTF type ID is specified */
 		attach_btf = bpf_get_btf_vmlinux();
-		if (IS_ERR(attach_btf))
-			return PTR_ERR(attach_btf);
-		if (!attach_btf)
-			return -EINVAL;
+		if (IS_ERR(attach_btf)) {
+			err = PTR_ERR(attach_btf);
+			goto put_token;
+		}
+		if (!attach_btf) {
+			err = -EINVAL;
+			goto put_token;
+		}
 		btf_get(attach_btf);
 	}
 
-	bpf_prog_load_fixup_attach_type(attr);
 	if (bpf_prog_load_check_attach(type, attr->expected_attach_type,
 				       attach_btf, attr->attach_btf_id,
 				       dst_prog)) {
@@ -2668,7 +2699,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 			bpf_prog_put(dst_prog);
 		if (attach_btf)
 			btf_put(attach_btf);
-		return -EINVAL;
+		err = -EINVAL;
+		goto put_token;
 	}
 
 	/* plain bpf_prog allocation */
@@ -2678,7 +2710,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 			bpf_prog_put(dst_prog);
 		if (attach_btf)
 			btf_put(attach_btf);
-		return -ENOMEM;
+		err = -EINVAL;
+		goto put_token;
 	}
 
 	prog->expected_attach_type = attr->expected_attach_type;
@@ -2689,6 +2722,10 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
 	prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS;
 
+	/* move token into prog->aux, reuse taken refcnt */
+	prog->aux->token = token;
+	token = NULL;
+
 	err = security_bpf_prog_alloc(prog->aux);
 	if (err)
 		goto free_prog;
@@ -2790,6 +2827,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	if (prog->aux->attach_btf)
 		btf_put(prog->aux->attach_btf);
 	bpf_prog_free(prog);
+put_token:
+	bpf_token_put(token);
 	return err;
 }
 
@@ -3770,7 +3809,7 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
 	case BPF_PROG_TYPE_SK_LOOKUP:
 		return attach_type == prog->expected_attach_type ? 0 : -EINVAL;
 	case BPF_PROG_TYPE_CGROUP_SKB:
-		if (!capable(CAP_NET_ADMIN))
+		if (!bpf_token_capable(prog->aux->token, CAP_NET_ADMIN))
 			/* cg-skb progs can be loaded by unpriv user.
 			 * check permissions at attach time.
 			 */
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
index bcc170fcf341..b28589e8875e 100644
--- a/kernel/bpf/token.c
+++ b/kernel/bpf/token.c
@@ -94,6 +94,18 @@ static void bpf_token_show_fdinfo(struct seq_file *m, struct file *filp)
 		seq_printf(m, "allowed_maps:\tany\n");
 	else
 		seq_printf(m, "allowed_maps:\t0x%llx\n", token->allowed_maps);
+
+	mask = (1ULL << __MAX_BPF_PROG_TYPE) - 1;
+	if ((token->allowed_progs & mask) == mask)
+		seq_printf(m, "allowed_progs:\tany\n");
+	else
+		seq_printf(m, "allowed_progs:\t0x%llx\n", token->allowed_progs);
+
+	mask = (1ULL << __MAX_BPF_ATTACH_TYPE) - 1;
+	if ((token->allowed_attachs & mask) == mask)
+		seq_printf(m, "allowed_attachs:\tany\n");
+	else
+		seq_printf(m, "allowed_attachs:\t0x%llx\n", token->allowed_attachs);
 }
 
 static const struct file_operations bpf_token_fops = {
@@ -149,6 +161,8 @@ int bpf_token_create(union bpf_attr *attr)
 	mnt_opts = path.dentry->d_sb->s_fs_info;
 	token->allowed_cmds = mnt_opts->delegate_cmds;
 	token->allowed_maps = mnt_opts->delegate_maps;
+	token->allowed_progs = mnt_opts->delegate_progs;
+	token->allowed_attachs = mnt_opts->delegate_attachs;
 
 	ret = bpf_token_new_fd(token);
 	if (ret < 0)
@@ -202,3 +216,14 @@ bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type t
 
 	return token->allowed_maps & (1ULL << type);
 }
+
+bool bpf_token_allow_prog_type(const struct bpf_token *token,
+			       enum bpf_prog_type prog_type,
+			       enum bpf_attach_type attach_type)
+{
+	if (!token || prog_type >= __MAX_BPF_PROG_TYPE || attach_type >= __MAX_BPF_ATTACH_TYPE)
+		return false;
+
+	return (token->allowed_progs & (1ULL << prog_type)) &&
+	       (token->allowed_attachs & (1ULL << attach_type));
+}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 1527d861f408..2fec43a56170 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1029,6 +1029,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
 	BPF_PROG_TYPE_NETFILTER,
+	__MAX_BPF_PROG_TYPE
 };
 
 enum bpf_attach_type {
@@ -1494,6 +1495,7 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		log_true_size;
+		__u32		prog_token_fd;
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c b/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
index 573249a2814d..4ed46ed58a7b 100644
--- a/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
+++ b/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
@@ -30,6 +30,8 @@ void test_libbpf_probe_prog_types(void)
 
 		if (prog_type == BPF_PROG_TYPE_UNSPEC)
 			continue;
+		if (strcmp(prog_type_name, "__MAX_BPF_PROG_TYPE") == 0)
+			continue;
 
 		if (!test__start_subtest(prog_type_name))
 			continue;
diff --git a/tools/testing/selftests/bpf/prog_tests/libbpf_str.c b/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
index 2a0633f43c73..384bc1f7a65e 100644
--- a/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
+++ b/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
@@ -189,6 +189,9 @@ static void test_libbpf_bpf_prog_type_str(void)
 		const char *prog_type_str;
 		char buf[256];
 
+		if (prog_type == __MAX_BPF_PROG_TYPE)
+			continue;
+
 		prog_type_name = btf__str_by_offset(btf, e->name_off);
 		prog_type_str = libbpf_bpf_prog_type_str(prog_type);
 		ASSERT_OK_PTR(prog_type_str, prog_type_name);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 bpf-next 06/12] bpf: take into account BPF token when fetching helper protos
  2023-09-12 21:28 [PATCH v4 bpf-next 00/12] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2023-09-12 21:28 ` [PATCH v4 bpf-next 05/12] bpf: add BPF token support to BPF_PROG_LOAD command Andrii Nakryiko
@ 2023-09-12 21:29 ` Andrii Nakryiko
  2023-09-13  9:45   ` kernel test robot
  2023-09-13 18:41   ` kernel test robot
  2023-09-12 21:29 ` [PATCH v4 bpf-next 07/12] bpf: consistenly use BPF token throughout BPF verifier logic Andrii Nakryiko
                   ` (5 subsequent siblings)
  11 siblings, 2 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2023-09-12 21:29 UTC (permalink / raw)
  To: bpf
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Instead of performing unconditional system-wide bpf_capable() and
perfmon_capable() calls inside bpf_base_func_proto() function (and other
similar ones) to determine eligibility of a given BPF helper for a given
program, use previously recorded BPF token during BPF_PROG_LOAD command
handling to inform the decision.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 drivers/media/rc/bpf-lirc.c |  2 +-
 include/linux/bpf.h         |  5 +++--
 kernel/bpf/cgroup.c         |  6 +++---
 kernel/bpf/helpers.c        |  6 +++---
 kernel/bpf/syscall.c        |  5 +++--
 kernel/trace/bpf_trace.c    |  2 +-
 net/core/filter.c           | 32 ++++++++++++++++----------------
 net/ipv4/bpf_tcp_ca.c       |  2 +-
 net/netfilter/nf_bpf_link.c |  2 +-
 9 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
index fe17c7f98e81..6d07693c6b9f 100644
--- a/drivers/media/rc/bpf-lirc.c
+++ b/drivers/media/rc/bpf-lirc.c
@@ -110,7 +110,7 @@ lirc_mode2_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_get_prandom_u32:
 		return &bpf_get_prandom_u32_proto;
 	case BPF_FUNC_trace_printk:
-		if (perfmon_capable())
+		if (bpf_token_capable(prog->aux->token, CAP_PERFMON))
 			return bpf_get_trace_printk_proto();
 		fallthrough;
 	default:
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4eb055bcf65f..ddeb27be83ca 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2462,7 +2462,8 @@ int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *pr
 struct bpf_prog *bpf_prog_by_id(u32 id);
 struct bpf_link *bpf_link_by_id(u32 id);
 
-const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
+const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id,
+						 const struct bpf_prog *prog);
 void bpf_task_storage_free(struct task_struct *task);
 void bpf_cgrp_storage_free(struct cgroup *cgroup);
 bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
@@ -2719,7 +2720,7 @@ static inline int btf_struct_access(struct bpf_verifier_log *log,
 }
 
 static inline const struct bpf_func_proto *
-bpf_base_func_proto(enum bpf_func_id func_id)
+bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	return NULL;
 }
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 5b2741aa0d9b..39d6cfb6f304 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1615,7 +1615,7 @@ cgroup_dev_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -2173,7 +2173,7 @@ sysctl_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -2330,7 +2330,7 @@ cg_sockopt_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index b0a9834f1051..858d5e16ce9b 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1665,7 +1665,7 @@ const struct bpf_func_proto bpf_probe_read_kernel_str_proto __weak;
 const struct bpf_func_proto bpf_task_pt_regs_proto __weak;
 
 const struct bpf_func_proto *
-bpf_base_func_proto(enum bpf_func_id func_id)
+bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_map_lookup_elem:
@@ -1716,7 +1716,7 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		break;
 	}
 
-	if (!bpf_capable())
+	if (!bpf_token_capable(prog->aux->token, CAP_BPF))
 		return NULL;
 
 	switch (func_id) {
@@ -1774,7 +1774,7 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		break;
 	}
 
-	if (!perfmon_capable())
+	if (!bpf_token_capable(prog->aux->token, CAP_PERFMON))
 		return NULL;
 
 	switch (func_id) {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index dafefc73c620..ee486fa4a233 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5640,7 +5640,7 @@ static const struct bpf_func_proto bpf_sys_bpf_proto = {
 const struct bpf_func_proto * __weak
 tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
-	return bpf_base_func_proto(func_id);
+	return bpf_base_func_proto(func_id, prog);
 }
 
 BPF_CALL_1(bpf_sys_close, u32, fd)
@@ -5690,7 +5690,8 @@ syscall_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_sys_bpf:
-		return !perfmon_capable() ? NULL : &bpf_sys_bpf_proto;
+		return !bpf_token_capable(prog->aux->token, CAP_PERFMON)
+		       ? NULL : &bpf_sys_bpf_proto;
 	case BPF_FUNC_btf_find_by_name_kind:
 		return &bpf_btf_find_by_name_kind_proto;
 	case BPF_FUNC_sys_close:
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index a7264b2c17ad..c57139e63f48 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1554,7 +1554,7 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_trace_vprintk:
 		return bpf_get_trace_vprintk_proto();
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
diff --git a/net/core/filter.c b/net/core/filter.c
index a094694899c9..6f0aa4095543 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -83,7 +83,7 @@
 #include <net/netfilter/nf_conntrack_bpf.h>
 
 static const struct bpf_func_proto *
-bpf_sk_base_func_proto(enum bpf_func_id func_id);
+bpf_sk_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
 
 int copy_bpf_fprog_from_user(struct sock_fprog *dst, sockptr_t src, int len)
 {
@@ -7806,7 +7806,7 @@ sock_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -7889,7 +7889,7 @@ sock_addr_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 			return NULL;
 		}
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -7908,7 +7908,7 @@ sk_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_skb_event_output_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8095,7 +8095,7 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 #endif
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8154,7 +8154,7 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 #endif
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 
 #if IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES)
@@ -8215,7 +8215,7 @@ sock_ops_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_tcp_sock_proto;
 #endif /* CONFIG_INET */
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8257,7 +8257,7 @@ sk_msg_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_cgroup_classid_curr_proto;
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8301,7 +8301,7 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_skc_lookup_tcp_proto;
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8312,7 +8312,7 @@ flow_dissector_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skb_load_bytes:
 		return &bpf_flow_dissector_load_bytes_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8339,7 +8339,7 @@ lwt_out_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skb_under_cgroup:
 		return &bpf_skb_under_cgroup_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -11170,7 +11170,7 @@ sk_reuseport_func_proto(enum bpf_func_id func_id,
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -11352,7 +11352,7 @@ sk_lookup_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_sk_release:
 		return &bpf_sk_release_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -11686,7 +11686,7 @@ const struct bpf_func_proto bpf_sock_from_file_proto = {
 };
 
 static const struct bpf_func_proto *
-bpf_sk_base_func_proto(enum bpf_func_id func_id)
+bpf_sk_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	const struct bpf_func_proto *func;
 
@@ -11715,10 +11715,10 @@ bpf_sk_base_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 
-	if (!perfmon_capable())
+	if (!bpf_token_capable(prog->aux->token, CAP_PERFMON))
 		return NULL;
 
 	return func;
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 39dcccf0f174..c7bbd8f3c708 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -191,7 +191,7 @@ bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id,
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
diff --git a/net/netfilter/nf_bpf_link.c b/net/netfilter/nf_bpf_link.c
index e502ec00b2fe..1969facac91c 100644
--- a/net/netfilter/nf_bpf_link.c
+++ b/net/netfilter/nf_bpf_link.c
@@ -314,7 +314,7 @@ static bool nf_is_valid_access(int off, int size, enum bpf_access_type type,
 static const struct bpf_func_proto *
 bpf_nf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
-	return bpf_base_func_proto(func_id);
+	return bpf_base_func_proto(func_id, prog);
 }
 
 const struct bpf_verifier_ops netfilter_verifier_ops = {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 bpf-next 07/12] bpf: consistenly use BPF token throughout BPF verifier logic
  2023-09-12 21:28 [PATCH v4 bpf-next 00/12] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (5 preceding siblings ...)
  2023-09-12 21:29 ` [PATCH v4 bpf-next 06/12] bpf: take into account BPF token when fetching helper protos Andrii Nakryiko
@ 2023-09-12 21:29 ` Andrii Nakryiko
  2023-09-13 22:15   ` kernel test robot
  2023-09-12 21:29 ` [PATCH v4 bpf-next 08/12] libbpf: add bpf_token_create() API Andrii Nakryiko
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Andrii Nakryiko @ 2023-09-12 21:29 UTC (permalink / raw)
  To: bpf
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Remove remaining direct queries to perfmon_capable() and bpf_capable()
in BPF verifier logic and instead use BPF token (if available) to make
decisions about privileges.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h    | 18 ++++++++++--------
 include/linux/filter.h |  2 +-
 kernel/bpf/arraymap.c  |  2 +-
 kernel/bpf/core.c      |  2 +-
 kernel/bpf/verifier.c  | 13 ++++++-------
 net/core/filter.c      |  4 ++--
 6 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ddeb27be83ca..0421f9b11520 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2172,24 +2172,26 @@ static inline void bpf_map_dec_elem_count(struct bpf_map *map)
 
 extern int sysctl_unprivileged_bpf_disabled;
 
-static inline bool bpf_allow_ptr_leaks(void)
+bool bpf_token_capable(const struct bpf_token *token, int cap);
+
+static inline bool bpf_allow_ptr_leaks(const struct bpf_token *token)
 {
-	return perfmon_capable();
+	return bpf_token_capable(token, CAP_PERFMON);
 }
 
-static inline bool bpf_allow_uninit_stack(void)
+static inline bool bpf_allow_uninit_stack(const struct bpf_token *token)
 {
-	return perfmon_capable();
+	return bpf_token_capable(token, CAP_PERFMON);
 }
 
-static inline bool bpf_bypass_spec_v1(void)
+static inline bool bpf_bypass_spec_v1(const struct bpf_token *token)
 {
-	return perfmon_capable();
+	return bpf_token_capable(token, CAP_PERFMON);
 }
 
-static inline bool bpf_bypass_spec_v4(void)
+static inline bool bpf_bypass_spec_v4(const struct bpf_token *token)
 {
-	return perfmon_capable();
+	return bpf_token_capable(token, CAP_PERFMON);
 }
 
 int bpf_map_new_fd(struct bpf_map *map, int flags);
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 761af6b3cf2b..90851fc987e1 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1101,7 +1101,7 @@ static inline bool bpf_jit_blinding_enabled(struct bpf_prog *prog)
 		return false;
 	if (!bpf_jit_harden)
 		return false;
-	if (bpf_jit_harden == 1 && bpf_capable())
+	if (bpf_jit_harden == 1 && bpf_token_capable(prog->aux->token, CAP_BPF))
 		return false;
 
 	return true;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 2058e89b5ddd..f0c64df6b6ff 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -82,7 +82,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
 	int numa_node = bpf_map_attr_numa_node(attr);
 	u32 elem_size, index_mask, max_entries;
-	bool bypass_spec_v1 = bpf_bypass_spec_v1();
+	bool bypass_spec_v1 = bpf_bypass_spec_v1(NULL);
 	u64 array_size, mask64;
 	struct bpf_array *array;
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 531d18a59121..11f6346ac09d 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -671,7 +671,7 @@ static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
 void bpf_prog_kallsyms_add(struct bpf_prog *fp)
 {
 	if (!bpf_prog_kallsyms_candidate(fp) ||
-	    !bpf_capable())
+	    !bpf_token_capable(fp->aux->token, CAP_BPF))
 		return;
 
 	bpf_prog_ksym_set_addr(fp);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 18e673c0ac15..593cb0bbb6ae 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19824,7 +19824,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	env->prog = *prog;
 	env->ops = bpf_verifier_ops[env->prog->type];
 	env->fd_array = make_bpfptr(attr->fd_array, uattr.is_kernel);
-	is_priv = bpf_capable();
+
+	env->allow_ptr_leaks = bpf_allow_ptr_leaks(env->prog->aux->token);
+	env->allow_uninit_stack = bpf_allow_uninit_stack(env->prog->aux->token);
+	env->bypass_spec_v1 = bpf_bypass_spec_v1(env->prog->aux->token);
+	env->bypass_spec_v4 = bpf_bypass_spec_v4(env->prog->aux->token);
+	env->bpf_capable = is_priv = bpf_token_capable(env->prog->aux->token, CAP_BPF);
 
 	bpf_get_btf_vmlinux();
 
@@ -19856,12 +19861,6 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	if (attr->prog_flags & BPF_F_ANY_ALIGNMENT)
 		env->strict_alignment = false;
 
-	env->allow_ptr_leaks = bpf_allow_ptr_leaks();
-	env->allow_uninit_stack = bpf_allow_uninit_stack();
-	env->bypass_spec_v1 = bpf_bypass_spec_v1();
-	env->bypass_spec_v4 = bpf_bypass_spec_v4();
-	env->bpf_capable = bpf_capable();
-
 	if (is_priv)
 		env->test_state_freq = attr->prog_flags & BPF_F_TEST_STATE_FREQ;
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 6f0aa4095543..b4f4041541c3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8514,7 +8514,7 @@ static bool cg_skb_is_valid_access(int off, int size,
 		return false;
 	case bpf_ctx_range(struct __sk_buff, data):
 	case bpf_ctx_range(struct __sk_buff, data_end):
-		if (!bpf_capable())
+		if (!bpf_token_capable(prog->aux->token, CAP_BPF))
 			return false;
 		break;
 	}
@@ -8526,7 +8526,7 @@ static bool cg_skb_is_valid_access(int off, int size,
 		case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
 			break;
 		case bpf_ctx_range(struct __sk_buff, tstamp):
-			if (!bpf_capable())
+			if (!bpf_token_capable(prog->aux->token, CAP_BPF))
 				return false;
 			break;
 		default:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 bpf-next 08/12] libbpf: add bpf_token_create() API
  2023-09-12 21:28 [PATCH v4 bpf-next 00/12] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (6 preceding siblings ...)
  2023-09-12 21:29 ` [PATCH v4 bpf-next 07/12] bpf: consistenly use BPF token throughout BPF verifier logic Andrii Nakryiko
@ 2023-09-12 21:29 ` Andrii Nakryiko
  2023-09-12 21:42   ` Andrii Nakryiko
  2023-09-12 21:29 ` [PATCH v4 bpf-next 09/12] libbpf: add BPF token support to bpf_map_create() API Andrii Nakryiko
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Andrii Nakryiko @ 2023-09-12 21:29 UTC (permalink / raw)
  To: bpf
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Add low-level wrapper API for BPF_TOKEN_CREATE command in bpf() syscall.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c      | 19 +++++++++++++++++++
 tools/lib/bpf/bpf.h      | 29 +++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.map |  1 +
 3 files changed, 49 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index b0f1913763a3..593ff9ea120d 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -1271,3 +1271,22 @@ int bpf_prog_bind_map(int prog_fd, int map_fd,
 	ret = sys_bpf(BPF_PROG_BIND_MAP, &attr, attr_sz);
 	return libbpf_err_errno(ret);
 }
+
+int bpf_token_create(int bpffs_path_fd, const char *bpffs_pathname,
+		     struct bpf_token_create_opts *opts)
+{
+	const size_t attr_sz = offsetofend(union bpf_attr, token_create);
+	union bpf_attr attr;
+	int fd;
+
+	if (!OPTS_VALID(opts, bpf_token_create_opts))
+		return libbpf_err(-EINVAL);
+
+	memset(&attr, 0, attr_sz);
+	attr.token_create.bpffs_path_fd = bpffs_path_fd;
+	attr.token_create.bpffs_pathname = ptr_to_u64(bpffs_pathname);
+	attr.token_create.flags = OPTS_GET(opts, flags, 0);
+
+	fd = sys_bpf_fd(BPF_TOKEN_CREATE, &attr, attr_sz);
+	return libbpf_err_errno(fd);
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 74c2887cfd24..16d5c257066c 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -635,6 +635,35 @@ struct bpf_test_run_opts {
 LIBBPF_API int bpf_prog_test_run_opts(int prog_fd,
 				      struct bpf_test_run_opts *opts);
 
+struct bpf_token_create_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+	__u32 flags;
+	size_t :0;
+};
+#define bpf_token_create_opts__last_field flags
+
+/**
+ * @brief **bpf_token_create()** creates a new instance of BPF token, pinning
+ * it at the specified location in BPF FS.
+ *
+ * BPF token created and pinned with this API can be subsequently opened using
+ * bpf_obj_get() API to obtain FD that can be passed to bpf() syscall for
+ * commands like BPF_PROG_LOAD, BPF_MAP_CREATE, etc.
+ *
+ * @param pin_path_fd O_PATH FD (see man 2 openat() for semantics) specifying,
+ * in combination with *pin_pathname*, target location in BPF FS at which to
+ * create and pin BPF token.
+ * @param pin_pathname absolute or relative path specifying, in combination
+ * with *pin_path_fd*, specifying in combination with *pin_path_fd*, target
+ * location in BPF FS at which to create and pin BPF token.
+ * @param opts optional BPF token creation options, can be NULL
+ *
+ * @return 0, on success; negative error code, otherwise (errno is also set to
+ * the error code)
+ */
+LIBBPF_API int bpf_token_create(int bpffs_path_fd, const char *bpffs_pathname,
+				struct bpf_token_create_opts *opts);
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 57712321490f..c45c28a5e14c 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -400,4 +400,5 @@ LIBBPF_1.3.0 {
 		bpf_program__attach_netfilter;
 		bpf_program__attach_tcx;
 		bpf_program__attach_uprobe_multi;
+		bpf_token_create;
 } LIBBPF_1.2.0;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 bpf-next 09/12] libbpf: add BPF token support to bpf_map_create() API
  2023-09-12 21:28 [PATCH v4 bpf-next 00/12] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (7 preceding siblings ...)
  2023-09-12 21:29 ` [PATCH v4 bpf-next 08/12] libbpf: add bpf_token_create() API Andrii Nakryiko
@ 2023-09-12 21:29 ` Andrii Nakryiko
  2023-09-12 21:29 ` [PATCH v4 bpf-next 10/12] libbpf: add BPF token support to bpf_btf_load() API Andrii Nakryiko
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2023-09-12 21:29 UTC (permalink / raw)
  To: bpf
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Add ability to provide token_fd for BPF_MAP_CREATE command through
bpf_map_create() API.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c | 4 +++-
 tools/lib/bpf/bpf.h | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 593ff9ea120d..f9ee7608a96a 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -169,7 +169,7 @@ int bpf_map_create(enum bpf_map_type map_type,
 		   __u32 max_entries,
 		   const struct bpf_map_create_opts *opts)
 {
-	const size_t attr_sz = offsetofend(union bpf_attr, map_extra);
+	const size_t attr_sz = offsetofend(union bpf_attr, map_token_fd);
 	union bpf_attr attr;
 	int fd;
 
@@ -198,6 +198,8 @@ int bpf_map_create(enum bpf_map_type map_type,
 	attr.numa_node = OPTS_GET(opts, numa_node, 0);
 	attr.map_ifindex = OPTS_GET(opts, map_ifindex, 0);
 
+	attr.map_token_fd = OPTS_GET(opts, token_fd, 0);
+
 	fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz);
 	return libbpf_err_errno(fd);
 }
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 16d5c257066c..415ecebd41aa 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -51,8 +51,10 @@ struct bpf_map_create_opts {
 
 	__u32 numa_node;
 	__u32 map_ifindex;
+
+	__u32 token_fd;
 };
-#define bpf_map_create_opts__last_field map_ifindex
+#define bpf_map_create_opts__last_field token_fd
 
 LIBBPF_API int bpf_map_create(enum bpf_map_type map_type,
 			      const char *map_name,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 bpf-next 10/12] libbpf: add BPF token support to bpf_btf_load() API
  2023-09-12 21:28 [PATCH v4 bpf-next 00/12] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (8 preceding siblings ...)
  2023-09-12 21:29 ` [PATCH v4 bpf-next 09/12] libbpf: add BPF token support to bpf_map_create() API Andrii Nakryiko
@ 2023-09-12 21:29 ` Andrii Nakryiko
  2023-09-12 21:29 ` [PATCH v4 bpf-next 11/12] libbpf: add BPF token support to bpf_prog_load() API Andrii Nakryiko
  2023-09-12 21:29 ` [PATCH v4 bpf-next 12/12] selftests/bpf: add BPF token-enabled tests Andrii Nakryiko
  11 siblings, 0 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2023-09-12 21:29 UTC (permalink / raw)
  To: bpf
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Allow user to specify token_fd for bpf_btf_load() API that wraps
kernel's BPF_BTF_LOAD command. This allows loading BTF from unprivileged
process as long as it has BPF token allowing BPF_BTF_LOAD command, which
can be created and delegated by privileged process.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c | 4 +++-
 tools/lib/bpf/bpf.h | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index f9ee7608a96a..4547ae1037af 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -1168,7 +1168,7 @@ int bpf_raw_tracepoint_open(const char *name, int prog_fd)
 
 int bpf_btf_load(const void *btf_data, size_t btf_size, struct bpf_btf_load_opts *opts)
 {
-	const size_t attr_sz = offsetofend(union bpf_attr, btf_log_true_size);
+	const size_t attr_sz = offsetofend(union bpf_attr, btf_token_fd);
 	union bpf_attr attr;
 	char *log_buf;
 	size_t log_size;
@@ -1193,6 +1193,8 @@ int bpf_btf_load(const void *btf_data, size_t btf_size, struct bpf_btf_load_opts
 
 	attr.btf = ptr_to_u64(btf_data);
 	attr.btf_size = btf_size;
+	attr.btf_token_fd = OPTS_GET(opts, token_fd, 0);
+
 	/* log_level == 0 and log_buf != NULL means "try loading without
 	 * log_buf, but retry with log_buf and log_level=1 on error", which is
 	 * consistent across low-level and high-level BTF and program loading
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 415ecebd41aa..20351bfba533 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -132,9 +132,10 @@ struct bpf_btf_load_opts {
 	 * If kernel doesn't support this feature, log_size is left unchanged.
 	 */
 	__u32 log_true_size;
+	__u32 token_fd;
 	size_t :0;
 };
-#define bpf_btf_load_opts__last_field log_true_size
+#define bpf_btf_load_opts__last_field token_fd
 
 LIBBPF_API int bpf_btf_load(const void *btf_data, size_t btf_size,
 			    struct bpf_btf_load_opts *opts);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 bpf-next 11/12] libbpf: add BPF token support to bpf_prog_load() API
  2023-09-12 21:28 [PATCH v4 bpf-next 00/12] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (9 preceding siblings ...)
  2023-09-12 21:29 ` [PATCH v4 bpf-next 10/12] libbpf: add BPF token support to bpf_btf_load() API Andrii Nakryiko
@ 2023-09-12 21:29 ` Andrii Nakryiko
  2023-09-12 21:29 ` [PATCH v4 bpf-next 12/12] selftests/bpf: add BPF token-enabled tests Andrii Nakryiko
  11 siblings, 0 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2023-09-12 21:29 UTC (permalink / raw)
  To: bpf
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Wire through token_fd into bpf_prog_load().

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c | 3 ++-
 tools/lib/bpf/bpf.h | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 4547ae1037af..5a238831b4ff 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -234,7 +234,7 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
 		  const struct bpf_insn *insns, size_t insn_cnt,
 		  struct bpf_prog_load_opts *opts)
 {
-	const size_t attr_sz = offsetofend(union bpf_attr, log_true_size);
+	const size_t attr_sz = offsetofend(union bpf_attr, prog_token_fd);
 	void *finfo = NULL, *linfo = NULL;
 	const char *func_info, *line_info;
 	__u32 log_size, log_level, attach_prog_fd, attach_btf_obj_fd;
@@ -263,6 +263,7 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
 	attr.prog_flags = OPTS_GET(opts, prog_flags, 0);
 	attr.prog_ifindex = OPTS_GET(opts, prog_ifindex, 0);
 	attr.kern_version = OPTS_GET(opts, kern_version, 0);
+	attr.prog_token_fd = OPTS_GET(opts, token_fd, 0);
 
 	if (prog_name && kernel_supports(NULL, FEAT_PROG_NAME))
 		libbpf_strlcpy(attr.prog_name, prog_name, sizeof(attr.prog_name));
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 20351bfba533..d082e3cfa070 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -104,9 +104,10 @@ struct bpf_prog_load_opts {
 	 * If kernel doesn't support this feature, log_size is left unchanged.
 	 */
 	__u32 log_true_size;
+	__u32 token_fd;
 	size_t :0;
 };
-#define bpf_prog_load_opts__last_field log_true_size
+#define bpf_prog_load_opts__last_field token_fd
 
 LIBBPF_API int bpf_prog_load(enum bpf_prog_type prog_type,
 			     const char *prog_name, const char *license,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 bpf-next 12/12] selftests/bpf: add BPF token-enabled tests
  2023-09-12 21:28 [PATCH v4 bpf-next 00/12] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (10 preceding siblings ...)
  2023-09-12 21:29 ` [PATCH v4 bpf-next 11/12] libbpf: add BPF token support to bpf_prog_load() API Andrii Nakryiko
@ 2023-09-12 21:29 ` Andrii Nakryiko
  11 siblings, 0 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2023-09-12 21:29 UTC (permalink / raw)
  To: bpf
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Add a selftest that attempts to conceptually replicate intended BPF
token use cases inside user namespaced container.

Child process is forked. It is then put into its own userns and mountns.
Child creates BPF FS context object and sets it up as desired. This
ensures child userns is captures as owning userns for this instance of
BPF FS.

This context is passed back to privileged parent process through Unix
socket, where parent creates and mounts it as a detached mount. This
mount FD is passed back to the child to be used for BPF token creation,
which allows otherwise privileged BPF operations to succeed inside
userns.

We validate that all of token-enabled privileged commands (BPF_BTF_LOAD,
BPF_MAP_CREATE, and BPF_PROG_LOAD) work as intended. They should only
succeed inside the userns if a) BPF token is provided with proper
allowed sets of commands and types; and b) namespaces CAP_BPF and other
privileges are set. Lacking a) or b) should lead to -EPERM failures.

Based on suggested workflow by Christian Brauner ([0]).

  [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../testing/selftests/bpf/prog_tests/token.c  | 621 ++++++++++++++++++
 1 file changed, 621 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/token.c

diff --git a/tools/testing/selftests/bpf/prog_tests/token.c b/tools/testing/selftests/bpf/prog_tests/token.c
new file mode 100644
index 000000000000..c95d0e41e563
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/token.c
@@ -0,0 +1,621 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+#define _GNU_SOURCE
+#include <test_progs.h>
+#include <bpf/btf.h>
+#include "cap_helpers.h"
+#include <fcntl.h>
+#include <sched.h>
+#include <signal.h>
+#include <unistd.h>
+#include <linux/filter.h>
+#include <linux/unistd.h>
+#include <sys/mount.h>
+#include <sys/socket.h>
+#include <sys/syscall.h>
+#include <sys/un.h>
+
+/* copied from include/uapi/linux/mount.h, as including it conflicts with
+ * sys/mount.h include
+ */
+enum fsconfig_command {
+	FSCONFIG_SET_FLAG       = 0,    /* Set parameter, supplying no value */
+	FSCONFIG_SET_STRING     = 1,    /* Set parameter, supplying a string value */
+	FSCONFIG_SET_BINARY     = 2,    /* Set parameter, supplying a binary blob value */
+	FSCONFIG_SET_PATH       = 3,    /* Set parameter, supplying an object by path */
+	FSCONFIG_SET_PATH_EMPTY = 4,    /* Set parameter, supplying an object by (empty) path */
+	FSCONFIG_SET_FD         = 5,    /* Set parameter, supplying an object by fd */
+	FSCONFIG_CMD_CREATE     = 6,    /* Invoke superblock creation */
+	FSCONFIG_CMD_RECONFIGURE = 7,   /* Invoke superblock reconfiguration */
+};
+
+static inline int sys_fsopen(const char *fsname, unsigned flags)
+{
+	return syscall(__NR_fsopen, fsname, flags);
+}
+
+static inline int sys_fsconfig(int fs_fd, unsigned cmd, const char *key, const void *val, int aux)
+{
+	return syscall(__NR_fsconfig, fs_fd, cmd, key, val, aux);
+}
+
+static inline int sys_fsmount(int fs_fd, unsigned flags, unsigned ms_flags)
+{
+	return syscall(__NR_fsmount, fs_fd, flags, ms_flags);
+}
+
+static int drop_priv_caps(__u64 *old_caps)
+{
+	return cap_disable_effective((1ULL << CAP_BPF) |
+				     (1ULL << CAP_PERFMON) |
+				     (1ULL << CAP_NET_ADMIN) |
+				     (1ULL << CAP_SYS_ADMIN), old_caps);
+}
+
+static int restore_priv_caps(__u64 old_caps)
+{
+	return cap_enable_effective(old_caps, NULL);
+}
+
+static int set_delegate_mask(int fs_fd, const char *key, __u64 mask)
+{
+	char buf[32];
+	int err;
+
+	snprintf(buf, sizeof(buf), "0x%llx", (unsigned long long)mask);
+	err = sys_fsconfig(fs_fd, FSCONFIG_SET_STRING, key,
+			   mask == ~0ULL ? "any" : buf, 0);
+	if (err < 0)
+		err = -errno;
+	return err;
+}
+
+#define zclose(fd) do { if (fd >= 0) close(fd); fd = -1; } while (0)
+
+struct bpffs_opts {
+	__u64 cmds;
+	__u64 maps;
+	__u64 progs;
+	__u64 attachs;
+};
+
+static int setup_bpffs_fd(struct bpffs_opts *opts)
+{
+	int fs_fd = -1, err;
+
+	/* create VFS context */
+	fs_fd = sys_fsopen("bpf", 0);
+	if (!ASSERT_GE(fs_fd, 0, "fs_fd"))
+		goto cleanup;
+
+	/* set up token delegation mount options */
+	err = set_delegate_mask(fs_fd, "delegate_cmds", opts->cmds);
+	if (!ASSERT_OK(err, "fs_cfg_cmds"))
+		goto cleanup;
+	err = set_delegate_mask(fs_fd, "delegate_maps", opts->maps);
+	if (!ASSERT_OK(err, "fs_cfg_maps"))
+		goto cleanup;
+	err = set_delegate_mask(fs_fd, "delegate_progs", opts->progs);
+	if (!ASSERT_OK(err, "fs_cfg_progs"))
+		goto cleanup;
+	err = set_delegate_mask(fs_fd, "delegate_attachs", opts->attachs);
+	if (!ASSERT_OK(err, "fs_cfg_attachs"))
+		goto cleanup;
+
+	return fs_fd;
+cleanup:
+	zclose(fs_fd);
+	return -1;
+}
+
+static int materialize_bpffs_fd(int fs_fd)
+{
+	int mnt_fd, err;
+
+	/* instantiate FS object */
+	err = sys_fsconfig(fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+	if (err < 0)
+		return -errno;
+
+	/* create O_PATH fd for detached mount */
+	mnt_fd = sys_fsmount(fs_fd, 0, 0);
+	if (err < 0)
+		return -errno;
+
+	return mnt_fd;
+}
+
+/* send FD over Unix domain (AF_UNIX) socket */
+static int sendfd(int sockfd, int fd)
+{
+	struct msghdr msg = {};
+	struct cmsghdr *cmsg;
+	int fds[1] = { fd }, err;
+	char iobuf[1];
+	struct iovec io = {
+		.iov_base = iobuf,
+		.iov_len = sizeof(iobuf),
+	};
+	union {
+		char buf[CMSG_SPACE(sizeof(fds))];
+		struct cmsghdr align;
+	} u;
+
+	msg.msg_iov = &io;
+	msg.msg_iovlen = 1;
+	msg.msg_control = u.buf;
+	msg.msg_controllen = sizeof(u.buf);
+	cmsg = CMSG_FIRSTHDR(&msg);
+	cmsg->cmsg_level = SOL_SOCKET;
+	cmsg->cmsg_type = SCM_RIGHTS;
+	cmsg->cmsg_len = CMSG_LEN(sizeof(fds));
+	memcpy(CMSG_DATA(cmsg), fds, sizeof(fds));
+
+	err = sendmsg(sockfd, &msg, 0);
+	if (err < 0)
+		err = -errno;
+	if (!ASSERT_EQ(err, 1, "sendmsg"))
+		return -EINVAL;
+
+	return 0;
+}
+
+/* receive FD over Unix domain (AF_UNIX) socket */
+static int recvfd(int sockfd, int *fd)
+{
+	struct msghdr msg = {};
+	struct cmsghdr *cmsg;
+	int fds[1], err;
+	char iobuf[1];
+	struct iovec io = {
+		.iov_base = iobuf,
+		.iov_len = sizeof(iobuf),
+	};
+	union {
+		char buf[CMSG_SPACE(sizeof(fds))];
+		struct cmsghdr align;
+	} u;
+
+	msg.msg_iov = &io;
+	msg.msg_iovlen = 1;
+	msg.msg_control = u.buf;
+	msg.msg_controllen = sizeof(u.buf);
+
+	err = recvmsg(sockfd, &msg, 0);
+	if (err < 0)
+		err = -errno;
+	if (!ASSERT_EQ(err, 1, "recvmsg"))
+		return -EINVAL;
+
+	cmsg = CMSG_FIRSTHDR(&msg);
+	if (!ASSERT_OK_PTR(cmsg, "cmsg_null") ||
+	    !ASSERT_EQ(cmsg->cmsg_len, CMSG_LEN(sizeof(fds)), "cmsg_len") ||
+	    !ASSERT_EQ(cmsg->cmsg_level, SOL_SOCKET, "cmsg_level") ||
+	    !ASSERT_EQ(cmsg->cmsg_type, SCM_RIGHTS, "cmsg_type"))
+		return -EINVAL;
+
+	memcpy(fds, CMSG_DATA(cmsg), sizeof(fds));
+	*fd = fds[0];
+
+	return 0;
+}
+
+static ssize_t write_nointr(int fd, const void *buf, size_t count)
+{
+	ssize_t ret;
+
+	do {
+		ret = write(fd, buf, count);
+	} while (ret < 0 && errno == EINTR);
+
+	return ret;
+}
+
+static int write_file(const char *path, const void *buf, size_t count)
+{
+	int fd;
+	ssize_t ret;
+
+	fd = open(path, O_WRONLY | O_CLOEXEC | O_NOCTTY | O_NOFOLLOW);
+	if (fd < 0)
+		return -1;
+
+	ret = write_nointr(fd, buf, count);
+	close(fd);
+	if (ret < 0 || (size_t)ret != count)
+		return -1;
+
+	return 0;
+}
+
+static int create_and_enter_userns(void)
+{
+	uid_t uid;
+	gid_t gid;
+	char map[100];
+
+	uid = getuid();
+	gid = getgid();
+
+	if (unshare(CLONE_NEWUSER))
+		return -1;
+
+	if (write_file("/proc/self/setgroups", "deny", sizeof("deny") - 1) &&
+	    errno != ENOENT)
+		return -1;
+
+	snprintf(map, sizeof(map), "0 %d 1", uid);
+	if (write_file("/proc/self/uid_map", map, strlen(map)))
+		return -1;
+
+
+	snprintf(map, sizeof(map), "0 %d 1", gid);
+	if (write_file("/proc/self/gid_map", map, strlen(map)))
+		return -1;
+
+	if (setgid(0))
+		return -1;
+
+	if (setuid(0))
+		return -1;
+
+	return 0;
+}
+
+typedef int (*child_callback_fn)(int);
+
+static void child(int sock_fd, struct bpffs_opts *bpffs_opts, child_callback_fn callback)
+{
+	LIBBPF_OPTS(bpf_map_create_opts, map_opts);
+	int mnt_fd = -1, fs_fd = -1, err = 0;
+
+	/* setup userns with root mappings */
+	err = create_and_enter_userns();
+	if (!ASSERT_OK(err, "create_and_enter_userns"))
+		goto cleanup;
+
+	/* setup mountns to allow creating BPF FS (fsopen("bpf")) from unpriv process */
+	err = unshare(CLONE_NEWNS);
+	if (!ASSERT_OK(err, "create_mountns"))
+		goto cleanup;
+
+	err = mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0);
+	if (!ASSERT_OK(err, "remount_root"))
+		goto cleanup;
+
+	fs_fd = setup_bpffs_fd(bpffs_opts);
+	if (!ASSERT_GE(fs_fd, 0, "setup_bpffs")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* pass BPF FS context object to parent */
+	err = sendfd(sock_fd, fs_fd);
+	if (!ASSERT_OK(err, "send_fs_fd"))
+		goto cleanup;
+
+	/* avoid mucking around with mount namespaces and mounting at
+	 * well-known path, just get detach-mounted BPF FS fd back from parent
+	 */
+	err = recvfd(sock_fd, &mnt_fd);
+	if (!ASSERT_OK(err, "recv_mnt_fd"))
+		goto cleanup;
+
+	/* do custom test logic with customly set up BPF FS instance */
+	err = callback(mnt_fd);
+	if (!ASSERT_OK(err, "test_callback"))
+		goto cleanup;
+
+	err = 0;
+cleanup:
+	zclose(sock_fd);
+	zclose(mnt_fd);
+
+	exit(-err);
+}
+
+static int wait_for_pid(pid_t pid)
+{
+	int status, ret;
+
+again:
+	ret = waitpid(pid, &status, 0);
+	if (ret == -1) {
+		if (errno == EINTR)
+			goto again;
+
+		return -1;
+	}
+
+	if (!WIFEXITED(status))
+		return -1;
+
+	return WEXITSTATUS(status);
+}
+
+static void parent(int child_pid, int sock_fd)
+{
+	int fs_fd = -1, mnt_fd = -1, err;
+
+	err = recvfd(sock_fd, &fs_fd);
+	if (!ASSERT_OK(err, "recv_bpffs_fd"))
+		goto cleanup;
+
+	mnt_fd = materialize_bpffs_fd(fs_fd);
+	if (!ASSERT_GE(mnt_fd, 0, "materialize_bpffs_fd")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+	zclose(fs_fd);
+
+	/* pass BPF FS context object to parent */
+	err = sendfd(sock_fd, mnt_fd);
+	if (!ASSERT_OK(err, "send_mnt_fd"))
+		goto cleanup;
+	zclose(mnt_fd);
+
+	err = wait_for_pid(child_pid);
+	ASSERT_OK(err, "waitpid_child");
+
+cleanup:
+	zclose(sock_fd);
+	zclose(fs_fd);
+	zclose(mnt_fd);
+
+	if (child_pid > 0)
+		(void)kill(child_pid, SIGKILL);
+}
+
+static void subtest_userns(struct bpffs_opts *bpffs_opts, child_callback_fn cb)
+{
+	int sock_fds[2] = { -1, -1 };
+	int child_pid, err;
+
+	err = socketpair(AF_UNIX, SOCK_STREAM, 0, sock_fds);
+	if (!ASSERT_OK(err, "socketpair"))
+		goto cleanup;
+
+	child_pid = fork();
+	if (!ASSERT_GE(child_pid, 0, "fork"))
+		goto cleanup;
+
+	if (child_pid == 0) {
+		zclose(sock_fds[0]);
+		return child(sock_fds[1], bpffs_opts, cb);
+
+	} else {
+		zclose(sock_fds[1]);
+		return parent(child_pid, sock_fds[0]);
+	}
+
+cleanup:
+	zclose(sock_fds[0]);
+	zclose(sock_fds[1]);
+	if (child_pid > 0)
+		(void)kill(child_pid, SIGKILL);
+}
+
+static int userns_map_create(int mnt_fd)
+{
+	LIBBPF_OPTS(bpf_map_create_opts, map_opts);
+	int err, token_fd = -1, map_fd = -1;
+	__u64 old_caps = 0;
+
+	/* create BPF token from BPF FS mount */
+	token_fd = bpf_token_create(mnt_fd, "", NULL);
+	if (!ASSERT_GT(token_fd, 0, "token_create")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* while inside non-init userns, we need both a BPF token *and*
+	 * CAP_BPF inside current userns to create privileged map; let's test
+	 * that neither BPF token alone nor namespaced CAP_BPF is sufficient
+	 */
+	err = drop_priv_caps(&old_caps);
+	if (!ASSERT_OK(err, "drop_caps"))
+		goto cleanup;
+
+	/* no token, no CAP_BPF -> fail */
+	map_opts.token_fd = 0;
+	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "wo_token_wo_bpf", 0, 8, 1, &map_opts);
+	if (!ASSERT_LT(map_fd, 0, "stack_map_wo_token_wo_cap_bpf_should_fail")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* token without CAP_BPF -> fail */
+	map_opts.token_fd = token_fd;
+	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "w_token_wo_bpf", 0, 8, 1, &map_opts);
+	if (!ASSERT_LT(map_fd, 0, "stack_map_w_token_wo_cap_bpf_should_fail")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* get back effective local CAP_BPF (and CAP_SYS_ADMIN) */
+	err = restore_priv_caps(old_caps);
+	if (!ASSERT_OK(err, "restore_caps"))
+		goto cleanup;
+
+	/* CAP_BPF without token -> fail */
+	map_opts.token_fd = 0;
+	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "wo_token_w_bpf", 0, 8, 1, &map_opts);
+	if (!ASSERT_LT(map_fd, 0, "stack_map_wo_token_w_cap_bpf_should_fail")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* finally, namespaced CAP_BPF + token -> success */
+	map_opts.token_fd = token_fd;
+	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "w_token_w_bpf", 0, 8, 1, &map_opts);
+	if (!ASSERT_GT(map_fd, 0, "stack_map_w_token_w_cap_bpf")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+cleanup:
+	zclose(token_fd);
+	zclose(map_fd);
+	return err;
+}
+
+static int userns_btf_load(int mnt_fd)
+{
+	LIBBPF_OPTS(bpf_btf_load_opts, btf_opts);
+	int err, token_fd = -1, btf_fd = -1;
+	const void *raw_btf_data;
+	struct btf *btf = NULL;
+	__u32 raw_btf_size;
+	__u64 old_caps = 0;
+
+	/* create BPF token from BPF FS mount */
+	token_fd = bpf_token_create(mnt_fd, "", NULL);
+	if (!ASSERT_GT(token_fd, 0, "token_create")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* while inside non-init userns, we need both a BPF token *and*
+	 * CAP_BPF inside current userns to create privileged map; let's test
+	 * that neither BPF token alone nor namespaced CAP_BPF is sufficient
+	 */
+	err = drop_priv_caps(&old_caps);
+	if (!ASSERT_OK(err, "drop_caps"))
+		goto cleanup;
+
+	/* setup a trivial BTF data to load to the kernel */
+	btf = btf__new_empty();
+	if (!ASSERT_OK_PTR(btf, "empty_btf"))
+		goto cleanup;
+
+	ASSERT_GT(btf__add_int(btf, "int", 4, 0), 0, "int_type");
+
+	raw_btf_data = btf__raw_data(btf, &raw_btf_size);
+	if (!ASSERT_OK_PTR(raw_btf_data, "raw_btf_data"))
+		goto cleanup;
+
+	/* no token + no CAP_BPF -> failure */
+	btf_opts.token_fd = 0;
+	btf_fd = bpf_btf_load(raw_btf_data, raw_btf_size, &btf_opts);
+	if (!ASSERT_LT(btf_fd, 0, "no_token_no_cap_should_fail"))
+		goto cleanup;
+
+	/* token + no CAP_BPF -> failure */
+	btf_opts.token_fd = token_fd;
+	btf_fd = bpf_btf_load(raw_btf_data, raw_btf_size, &btf_opts);
+	if (!ASSERT_LT(btf_fd, 0, "token_no_cap_should_fail"))
+		goto cleanup;
+
+	/* get back effective local CAP_BPF (and CAP_SYS_ADMIN) */
+	err = restore_priv_caps(old_caps);
+	if (!ASSERT_OK(err, "restore_caps"))
+		goto cleanup;
+
+	/* token + CAP_BPF -> success */
+	btf_opts.token_fd = token_fd;
+	btf_fd = bpf_btf_load(raw_btf_data, raw_btf_size, &btf_opts);
+	if (!ASSERT_GT(btf_fd, 0, "token_and_cap_success"))
+		goto cleanup;
+
+	err = 0;
+cleanup:
+	btf__free(btf);
+	zclose(btf_fd);
+	zclose(token_fd);
+	return err;
+}
+
+static int userns_prog_load(int mnt_fd)
+{
+	LIBBPF_OPTS(bpf_prog_load_opts, prog_opts);
+	int err, token_fd = -1, prog_fd = -1;
+	struct bpf_insn insns[] = {
+		/* bpf_jiffies64() requires CAP_BPF */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_jiffies64),
+		/* bpf_get_current_task() requires CAP_PERFMON */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_current_task),
+		/* r0 = 0; exit; */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	size_t insn_cnt = ARRAY_SIZE(insns);
+	__u64 old_caps = 0;
+
+	/* create BPF token from BPF FS mount */
+	token_fd = bpf_token_create(mnt_fd, "", NULL);
+	if (!ASSERT_GT(token_fd, 0, "token_create")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* validate we can successfully load BPF program with token; this
+	 * being XDP program (CAP_NET_ADMIN) using bpf_jiffies64() (CAP_BPF)
+	 * and bpf_get_current_task() (CAP_PERFMON) helpers validates we have
+	 * BPF token wired properly in a bunch of places in the kernel
+	 */
+	prog_opts.token_fd = token_fd;
+	prog_opts.expected_attach_type = BPF_XDP;
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
+				insns, insn_cnt, &prog_opts);
+	if (!ASSERT_GT(prog_fd, 0, "prog_fd"))
+		goto cleanup;
+
+	/* no token + caps -> failure */
+	prog_opts.token_fd = 0;
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
+				insns, insn_cnt, &prog_opts);
+	if (!ASSERT_EQ(prog_fd, -EPERM, "prog_fd_eperm"))
+		goto cleanup;
+
+	err = drop_priv_caps(&old_caps);
+	if (!ASSERT_OK(err, "drop_caps"))
+		goto cleanup;
+
+	/* no caps + token -> failure */
+	prog_opts.token_fd = token_fd;
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
+				insns, insn_cnt, &prog_opts);
+	if (!ASSERT_EQ(prog_fd, -EPERM, "prog_fd_eperm"))
+		goto cleanup;
+
+	/* no caps + no token -> definitely a failure */
+	prog_opts.token_fd = 0;
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
+				insns, insn_cnt, &prog_opts);
+	if (!ASSERT_EQ(prog_fd, -EPERM, "prog_fd_eperm"))
+		goto cleanup;
+
+	err = 0;
+cleanup:
+	zclose(prog_fd);
+	zclose(token_fd);
+	return err;
+}
+
+void test_token(void)
+{
+	if (test__start_subtest("map_token")) {
+		struct bpffs_opts opts = {
+			.cmds = 1ULL << BPF_MAP_CREATE,
+			.maps = 1ULL << BPF_MAP_TYPE_STACK,
+		};
+
+		subtest_userns(&opts, userns_map_create);
+	}
+	if (test__start_subtest("btf_token")) {
+		struct bpffs_opts opts = {
+			.cmds = 1ULL << BPF_BTF_LOAD,
+		};
+
+		subtest_userns(&opts, userns_btf_load);
+	}
+	if (test__start_subtest("prog_token")) {
+		struct bpffs_opts opts = {
+			.cmds = 1ULL << BPF_PROG_LOAD,
+			.progs = 1ULL << BPF_PROG_TYPE_XDP,
+			.attachs = 1ULL << BPF_XDP,
+		};
+
+		subtest_userns(&opts, userns_prog_load);
+	}
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 bpf-next 08/12] libbpf: add bpf_token_create() API
  2023-09-12 21:29 ` [PATCH v4 bpf-next 08/12] libbpf: add bpf_token_create() API Andrii Nakryiko
@ 2023-09-12 21:42   ` Andrii Nakryiko
  0 siblings, 0 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2023-09-12 21:42 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, linux-fsdevel, linux-security-module, keescook, brauner,
	lennart, kernel-team, sargun

On Tue, Sep 12, 2023 at 2:30 PM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> Add low-level wrapper API for BPF_TOKEN_CREATE command in bpf() syscall.
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  tools/lib/bpf/bpf.c      | 19 +++++++++++++++++++
>  tools/lib/bpf/bpf.h      | 29 +++++++++++++++++++++++++++++
>  tools/lib/bpf/libbpf.map |  1 +
>  3 files changed, 49 insertions(+)
>
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index b0f1913763a3..593ff9ea120d 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -1271,3 +1271,22 @@ int bpf_prog_bind_map(int prog_fd, int map_fd,
>         ret = sys_bpf(BPF_PROG_BIND_MAP, &attr, attr_sz);
>         return libbpf_err_errno(ret);
>  }
> +
> +int bpf_token_create(int bpffs_path_fd, const char *bpffs_pathname,
> +                    struct bpf_token_create_opts *opts)
> +{
> +       const size_t attr_sz = offsetofend(union bpf_attr, token_create);
> +       union bpf_attr attr;
> +       int fd;
> +
> +       if (!OPTS_VALID(opts, bpf_token_create_opts))
> +               return libbpf_err(-EINVAL);
> +
> +       memset(&attr, 0, attr_sz);
> +       attr.token_create.bpffs_path_fd = bpffs_path_fd;
> +       attr.token_create.bpffs_pathname = ptr_to_u64(bpffs_pathname);
> +       attr.token_create.flags = OPTS_GET(opts, flags, 0);
> +
> +       fd = sys_bpf_fd(BPF_TOKEN_CREATE, &attr, attr_sz);
> +       return libbpf_err_errno(fd);
> +}
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index 74c2887cfd24..16d5c257066c 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -635,6 +635,35 @@ struct bpf_test_run_opts {
>  LIBBPF_API int bpf_prog_test_run_opts(int prog_fd,
>                                       struct bpf_test_run_opts *opts);
>
> +struct bpf_token_create_opts {
> +       size_t sz; /* size of this struct for forward/backward compatibility */
> +       __u32 flags;
> +       size_t :0;
> +};
> +#define bpf_token_create_opts__last_field flags
> +
> +/**
> + * @brief **bpf_token_create()** creates a new instance of BPF token, pinning
> + * it at the specified location in BPF FS.
> + *
> + * BPF token created and pinned with this API can be subsequently opened using
> + * bpf_obj_get() API to obtain FD that can be passed to bpf() syscall for
> + * commands like BPF_PROG_LOAD, BPF_MAP_CREATE, etc.
> + *
> + * @param pin_path_fd O_PATH FD (see man 2 openat() for semantics) specifying,
> + * in combination with *pin_pathname*, target location in BPF FS at which to
> + * create and pin BPF token.
> + * @param pin_pathname absolute or relative path specifying, in combination
> + * with *pin_path_fd*, specifying in combination with *pin_path_fd*, target
> + * location in BPF FS at which to create and pin BPF token.
> + * @param opts optional BPF token creation options, can be NULL
> + *

this description is obviously outdated (there is no pinning involved
anymore) and I just realized after sending patches out, I'll fix it
for next revision


> + * @return 0, on success; negative error code, otherwise (errno is also set to
> + * the error code)
> + */
> +LIBBPF_API int bpf_token_create(int bpffs_path_fd, const char *bpffs_pathname,
> +                               struct bpf_token_create_opts *opts);
> +
>  #ifdef __cplusplus
>  } /* extern "C" */
>  #endif
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 57712321490f..c45c28a5e14c 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -400,4 +400,5 @@ LIBBPF_1.3.0 {
>                 bpf_program__attach_netfilter;
>                 bpf_program__attach_tcx;
>                 bpf_program__attach_uprobe_multi;
> +               bpf_token_create;
>  } LIBBPF_1.2.0;
> --
> 2.34.1
>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 bpf-next 02/12] bpf: introduce BPF token object
  2023-09-12 21:28 ` [PATCH v4 bpf-next 02/12] bpf: introduce BPF token object Andrii Nakryiko
@ 2023-09-12 21:46   ` Andrii Nakryiko
  2023-09-13 21:46   ` [PATCH v4 2/12] " Paul Moore
  1 sibling, 0 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2023-09-12 21:46 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, linux-fsdevel, linux-security-module, keescook, brauner,
	lennart, kernel-team, sargun

On Tue, Sep 12, 2023 at 2:30 PM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> Add new kind of BPF kernel object, BPF token. BPF token is meant to
> allow delegating privileged BPF functionality, like loading a BPF
> program or creating a BPF map, from privileged process to a *trusted*
> unprivileged process, all while have a good amount of control over which
> privileged operations could be performed using provided BPF token.
>
> This is achieved through mounting BPF FS instance with extra delegation
> mount options, which determine what operations are delegatable, and also
> constraining it to the owning user namespace (as mentioned in the
> previous patch).
>
> BPF token itself is just a derivative from BPF FS and can be created
> through a new bpf() syscall command, BPF_TOKEN_CREAT, which accepts
> a path specification (using the usual fd + string path combo) to a BPF
> FS mount. Currently, BPF token "inherits" delegated command, map types,
> prog type, and attach type bit sets from BPF FS as is. In the future,
> having an BPF token as a separate object with its own FD, we can allow
> to further restrict BPF token's allowable set of things either at the creation
> time or after the fact, allowing the process to guard itself further
> from, e.g., unintentionally trying to load undesired kind of BPF
> programs. But for now we keep things simple and just copy bit sets as is.
>
> When BPF token is created from BPF FS mount, we take reference to the
> BPF super block's owning user namespace, and then use that namespace for
> checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
> capabilities that are normally only checked against init userns (using
> capable()), but now we check them using ns_capable() instead (if BPF
> token is provided). See bpf_token_capable() for details.
>
> Such setup means that BPF token in itself is not sufficient to grant BPF
> functionality. User namespaced process has to *also* have necessary
> combination of capabilities inside that user namespace. So while
> previously CAP_BPF was useless when granted within user namespace, now
> it gains a meaning and allows container managers and sys admins to have
> a flexible control over which processes can and need to use BPF
> functionality within the user namespace (i.e., container in practice).
> And BPF FS delegation mount options and derived BPF tokens serve as
> a per-container "flag" to grant overall ability to use bpf() (plus further
> restrict on which parts of bpf() syscalls are treated as namespaced).
>
> The alternative to creating BPF token object was:
>   a) not having any extra object and just pasing BPF FS path to each
>      relevant bpf() command. This seems suboptimal as it's racy (mount
>      under the same path might change in between checking it and using it
>      for bpf() command). And also less flexible if we'd like to further
>      restrict ourselves compared to all the delegated functionality
>      allowed on BPF FS.
>   b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
>      a dedicated FD that would represent a token-like functionality. This
>      doesn't seem superior to having a proper bpf() command, so
>      BPF_TOKEN_CREATE was chosen.
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  include/linux/bpf.h            |  36 +++++++
>  include/uapi/linux/bpf.h       |  39 +++++++
>  kernel/bpf/Makefile            |   2 +-
>  kernel/bpf/inode.c             |   4 +-
>  kernel/bpf/syscall.c           |  17 +++
>  kernel/bpf/token.c             | 189 +++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  39 +++++++
>  7 files changed, 324 insertions(+), 2 deletions(-)
>  create mode 100644 kernel/bpf/token.c
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index e9a3ab390844..6abd2b96e096 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -51,6 +51,8 @@ struct module;
>  struct bpf_func_state;
>  struct ftrace_ops;
>  struct cgroup;
> +struct bpf_token;
> +struct user_namespace;
>
>  extern struct idr btf_idr;
>  extern spinlock_t btf_idr_lock;
> @@ -1568,6 +1570,13 @@ struct bpf_mount_opts {
>         u64 delegate_attachs;
>  };
>
> +struct bpf_token {
> +       struct work_struct work;
> +       atomic64_t refcnt;
> +       struct user_namespace *userns;
> +       u64 allowed_cmds;
> +};
> +
>  struct bpf_struct_ops_value;
>  struct btf_member;
>
> @@ -2192,6 +2201,15 @@ int bpf_link_new_fd(struct bpf_link *link);
>  struct bpf_link *bpf_link_get_from_fd(u32 ufd);
>  struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
>
> +void bpf_token_inc(struct bpf_token *token);
> +void bpf_token_put(struct bpf_token *token);
> +int bpf_token_create(union bpf_attr *attr);
> +int bpf_token_new_fd(struct bpf_token *token);
> +struct bpf_token *bpf_token_get_from_fd(u32 ufd);
> +
> +bool bpf_token_capable(const struct bpf_token *token, int cap);
> +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
> +
>  int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
>  int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
>
> @@ -2551,6 +2569,24 @@ static inline int bpf_obj_get_user(const char __user *pathname, int flags)
>         return -EOPNOTSUPP;
>  }
>
> +static inline void bpf_token_inc(struct bpf_token *token)
> +{
> +}
> +
> +static inline void bpf_token_put(struct bpf_token *token)
> +{
> +}
> +
> +static inline int bpf_token_new_fd(struct bpf_token *token)
> +{
> +       return -EOPNOTSUPP;
> +}
> +
> +static inline struct bpf_token *bpf_token_get_from_fd(u32 ufd)
> +{
> +       return ERR_PTR(-EOPNOTSUPP);
> +}
> +
>  static inline void __dev_flush(void)
>  {
>  }
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 73b155e52204..36e98c6f8944 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -847,6 +847,37 @@ union bpf_iter_link_info {
>   *             Returns zero on success. On error, -1 is returned and *errno*
>   *             is set appropriately.
>   *
> + * BPF_TOKEN_CREATE
> + *     Description
> + *             Create BPF token with embedded information about what
> + *             BPF-related functionality it allows:
> + *             - a set of allowed bpf() syscall commands;
> + *             - a set of allowed BPF map types to be created with
> + *             BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed;
> + *             - a set of allowed BPF program types and BPF program attach
> + *             types to be loaded with BPF_PROG_LOAD command, if
> + *             BPF_PROG_LOAD itself is allowed.
> + *
> + *             BPF token is created (derived) from an instance of BPF FS,
> + *             assuming it has necessary delegation mount options specified.
> + *             BPF FS mount is specified with openat()-style path FD + string.
> + *             This BPF token can be passed as an extra parameter to various
> + *             bpf() syscall commands to grant BPF subsystem functionality to
> + *             unprivileged processes.
> + *
> + *             When created, BPF token is "associated" with the owning
> + *             user namespace of BPF FS instance (super block) that it was
> + *             derived from, and subsequent BPF operations performed with
> + *             BPF token would be performing capabilities checks (i.e.,
> + *             CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within
> + *             that user namespace. Without BPF token, such capabilities
> + *             have to be granted in init user namespace, making bpf()
> + *             syscall incompatible with user namespace, for the most part.
> + *
> + *     Return
> + *             A new file descriptor (a nonnegative integer), or -1 if an
> + *             error occurred (in which case, *errno* is set appropriately).
> + *
>   * NOTES
>   *     eBPF objects (maps and programs) can be shared between processes.
>   *
> @@ -901,6 +932,8 @@ enum bpf_cmd {
>         BPF_ITER_CREATE,
>         BPF_LINK_DETACH,
>         BPF_PROG_BIND_MAP,
> +       BPF_TOKEN_CREATE,
> +       __MAX_BPF_CMD,
>  };
>
>  enum bpf_map_type {
> @@ -1694,6 +1727,12 @@ union bpf_attr {
>                 __u32           flags;          /* extra flags */
>         } prog_bind_map;
>
> +       struct { /* struct used by BPF_TOKEN_CREATE command */
> +               __u32           flags;
> +               __u32           bpffs_path_fd;
> +               __u64           bpffs_pathname;
> +       } token_create;
> +
>  } __attribute__((aligned(8)));
>
>  /* The description below is an attempt at providing documentation to eBPF
> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> index f526b7573e97..4ce95acfcaa7 100644
> --- a/kernel/bpf/Makefile
> +++ b/kernel/bpf/Makefile
> @@ -6,7 +6,7 @@ cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse
>  endif
>  CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
>
> -obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o
> +obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o token.o
>  obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
>  obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
>  obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
> diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
> index 8f66b57d3546..82f11fbffd3e 100644
> --- a/kernel/bpf/inode.c
> +++ b/kernel/bpf/inode.c
> @@ -603,11 +603,13 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
>  {
>         struct bpf_mount_opts *opts = root->d_sb->s_fs_info;
>         umode_t mode = d_inode(root)->i_mode & S_IALLUGO & ~S_ISVTX;
> +       u64 mask;
>
>         if (mode != S_IRWXUGO)
>                 seq_printf(m, ",mode=%o", mode);
>
> -       if (opts->delegate_cmds == ~0ULL)
> +       mask = (1ULL << __MAX_BPF_CMD) - 1;
> +       if ((opts->delegate_cmds & mask) == mask)
>                 seq_printf(m, ",delegate_cmds=any");
>         else if (opts->delegate_cmds)
>                 seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 6a692f3bea15..4fae678c1f48 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -5297,6 +5297,20 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
>         return ret;
>  }
>
> +#define BPF_TOKEN_CREATE_LAST_FIELD token_create.bpffs_pathname
> +
> +static int token_create(union bpf_attr *attr)
> +{
> +       if (CHECK_ATTR(BPF_TOKEN_CREATE))
> +               return -EINVAL;
> +
> +       /* no flags are supported yet */
> +       if (attr->token_create.flags)
> +               return -EINVAL;

A question to people looking at this: should BPF_TOKEN_CREATE be
guarded with ns_capable(CAP_BPF) or it's fine to rely on FS
permissions alone for this? It can't be capable(CAP_BPF), obviously,
but having it guarded by ns_capable(CAP_BPF) would make it impossible
to even construct a BPF token without having namespaced CAP_BPF inside
the container.

> +
> +       return bpf_token_create(attr);
> +}
> +
>  static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
>  {
>         union bpf_attr attr;
> @@ -5430,6 +5444,9 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
>         case BPF_PROG_BIND_MAP:
>                 err = bpf_prog_bind_map(&attr);
>                 break;
> +       case BPF_TOKEN_CREATE:
> +               err = token_create(&attr);
> +               break;
>         default:
>                 err = -EINVAL;
>                 break;
> diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> new file mode 100644
> index 000000000000..f6ea3eddbee6
> --- /dev/null
> +++ b/kernel/bpf/token.c
> @@ -0,0 +1,189 @@
> +#include <linux/bpf.h>
> +#include <linux/vmalloc.h>
> +#include <linux/anon_inodes.h>
> +#include <linux/fdtable.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/kernel.h>
> +#include <linux/idr.h>
> +#include <linux/namei.h>
> +#include <linux/user_namespace.h>
> +
> +bool bpf_token_capable(const struct bpf_token *token, int cap)
> +{
> +       /* BPF token allows ns_capable() level of capabilities */
> +       if (token) {
> +               if (ns_capable(token->userns, cap))
> +                       return true;
> +               if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> +                       return true;
> +       }
> +       /* otherwise fallback to capable() checks */
> +       return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> +}
> +
> +void bpf_token_inc(struct bpf_token *token)
> +{
> +       atomic64_inc(&token->refcnt);
> +}
> +
> +static void bpf_token_free(struct bpf_token *token)
> +{
> +       put_user_ns(token->userns);
> +       kvfree(token);
> +}
> +
> +static void bpf_token_put_deferred(struct work_struct *work)
> +{
> +       struct bpf_token *token = container_of(work, struct bpf_token, work);
> +
> +       bpf_token_free(token);
> +}
> +
> +void bpf_token_put(struct bpf_token *token)
> +{
> +       if (!token)
> +               return;
> +
> +       if (!atomic64_dec_and_test(&token->refcnt))
> +               return;
> +
> +       INIT_WORK(&token->work, bpf_token_put_deferred);
> +       schedule_work(&token->work);
> +}
> +
> +static int bpf_token_release(struct inode *inode, struct file *filp)
> +{
> +       struct bpf_token *token = filp->private_data;
> +
> +       bpf_token_put(token);
> +       return 0;
> +}
> +
> +static ssize_t bpf_dummy_read(struct file *filp, char __user *buf, size_t siz,
> +                             loff_t *ppos)
> +{
> +       /* We need this handler such that alloc_file() enables
> +        * f_mode with FMODE_CAN_READ.
> +        */
> +       return -EINVAL;
> +}
> +
> +static ssize_t bpf_dummy_write(struct file *filp, const char __user *buf,
> +                              size_t siz, loff_t *ppos)
> +{
> +       /* We need this handler such that alloc_file() enables
> +        * f_mode with FMODE_CAN_WRITE.
> +        */
> +       return -EINVAL;
> +}
> +
> +static void bpf_token_show_fdinfo(struct seq_file *m, struct file *filp)
> +{
> +       struct bpf_token *token = filp->private_data;
> +       u64 mask;
> +
> +       mask = (1ULL << __MAX_BPF_CMD) - 1;
> +       if ((token->allowed_cmds & mask) == mask)
> +               seq_printf(m, "allowed_cmds:\tany\n");
> +       else
> +               seq_printf(m, "allowed_cmds:\t0x%llx\n", token->allowed_cmds);
> +}
> +
> +static const struct file_operations bpf_token_fops = {
> +       .release        = bpf_token_release,
> +       .read           = bpf_dummy_read,
> +       .write          = bpf_dummy_write,
> +       .show_fdinfo    = bpf_token_show_fdinfo,
> +};
> +
> +static struct bpf_token *bpf_token_alloc(void)
> +{
> +       struct bpf_token *token;
> +
> +       token = kvzalloc(sizeof(*token), GFP_USER);
> +       if (!token)
> +               return NULL;
> +
> +       atomic64_set(&token->refcnt, 1);
> +
> +       return token;
> +}
> +
> +int bpf_token_create(union bpf_attr *attr)
> +{
> +       struct path path;
> +       struct bpf_mount_opts *mnt_opts;
> +       struct bpf_token *token;
> +       int ret;
> +
> +       ret = user_path_at(attr->token_create.bpffs_path_fd,
> +                          u64_to_user_ptr(attr->token_create.bpffs_pathname),
> +                          LOOKUP_FOLLOW | LOOKUP_EMPTY, &path);
> +       if (ret)
> +               return ret;
> +
> +       if (path.mnt->mnt_root != path.dentry) {
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +       ret = path_permission(&path, MAY_ACCESS);
> +       if (ret)
> +               goto out;
> +
> +       token = bpf_token_alloc();
> +       if (!token) {
> +               ret = -ENOMEM;
> +               goto out;
> +       }
> +
> +       /* remember bpffs owning userns for future ns_capable() checks */
> +       token->userns = get_user_ns(path.dentry->d_sb->s_user_ns);
> +
> +       mnt_opts = path.dentry->d_sb->s_fs_info;
> +       token->allowed_cmds = mnt_opts->delegate_cmds;
> +
> +       ret = bpf_token_new_fd(token);
> +       if (ret < 0)
> +               bpf_token_free(token);
> +out:
> +       path_put(&path);
> +       return ret;
> +}
> +
> +#define BPF_TOKEN_INODE_NAME "bpf-token"
> +
> +/* Alloc anon_inode and FD for prepared token.
> + * Returns fd >= 0 on success; negative error, otherwise.
> + */
> +int bpf_token_new_fd(struct bpf_token *token)
> +{
> +       return anon_inode_getfd(BPF_TOKEN_INODE_NAME, &bpf_token_fops, token, O_CLOEXEC);
> +}
> +
> +struct bpf_token *bpf_token_get_from_fd(u32 ufd)
> +{
> +       struct fd f = fdget(ufd);
> +       struct bpf_token *token;
> +
> +       if (!f.file)
> +               return ERR_PTR(-EBADF);
> +       if (f.file->f_op != &bpf_token_fops) {
> +               fdput(f);
> +               return ERR_PTR(-EINVAL);
> +       }
> +
> +       token = f.file->private_data;
> +       bpf_token_inc(token);
> +       fdput(f);
> +
> +       return token;
> +}
> +
> +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> +{
> +       if (!token)
> +               return false;
> +
> +       return token->allowed_cmds & (1ULL << cmd);
> +}
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 73b155e52204..36e98c6f8944 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -847,6 +847,37 @@ union bpf_iter_link_info {
>   *             Returns zero on success. On error, -1 is returned and *errno*
>   *             is set appropriately.
>   *
> + * BPF_TOKEN_CREATE
> + *     Description
> + *             Create BPF token with embedded information about what
> + *             BPF-related functionality it allows:
> + *             - a set of allowed bpf() syscall commands;
> + *             - a set of allowed BPF map types to be created with
> + *             BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed;
> + *             - a set of allowed BPF program types and BPF program attach
> + *             types to be loaded with BPF_PROG_LOAD command, if
> + *             BPF_PROG_LOAD itself is allowed.
> + *
> + *             BPF token is created (derived) from an instance of BPF FS,
> + *             assuming it has necessary delegation mount options specified.
> + *             BPF FS mount is specified with openat()-style path FD + string.
> + *             This BPF token can be passed as an extra parameter to various
> + *             bpf() syscall commands to grant BPF subsystem functionality to
> + *             unprivileged processes.
> + *
> + *             When created, BPF token is "associated" with the owning
> + *             user namespace of BPF FS instance (super block) that it was
> + *             derived from, and subsequent BPF operations performed with
> + *             BPF token would be performing capabilities checks (i.e.,
> + *             CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within
> + *             that user namespace. Without BPF token, such capabilities
> + *             have to be granted in init user namespace, making bpf()
> + *             syscall incompatible with user namespace, for the most part.
> + *
> + *     Return
> + *             A new file descriptor (a nonnegative integer), or -1 if an
> + *             error occurred (in which case, *errno* is set appropriately).
> + *
>   * NOTES
>   *     eBPF objects (maps and programs) can be shared between processes.
>   *
> @@ -901,6 +932,8 @@ enum bpf_cmd {
>         BPF_ITER_CREATE,
>         BPF_LINK_DETACH,
>         BPF_PROG_BIND_MAP,
> +       BPF_TOKEN_CREATE,
> +       __MAX_BPF_CMD,
>  };
>
>  enum bpf_map_type {
> @@ -1694,6 +1727,12 @@ union bpf_attr {
>                 __u32           flags;          /* extra flags */
>         } prog_bind_map;
>
> +       struct { /* struct used by BPF_TOKEN_CREATE command */
> +               __u32           flags;
> +               __u32           bpffs_path_fd;
> +               __u64           bpffs_pathname;
> +       } token_create;
> +
>  } __attribute__((aligned(8)));
>
>  /* The description below is an attempt at providing documentation to eBPF
> --
> 2.34.1
>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 bpf-next 06/12] bpf: take into account BPF token when fetching helper protos
  2023-09-12 21:29 ` [PATCH v4 bpf-next 06/12] bpf: take into account BPF token when fetching helper protos Andrii Nakryiko
@ 2023-09-13  9:45   ` kernel test robot
  2023-09-13 18:41   ` kernel test robot
  1 sibling, 0 replies; 28+ messages in thread
From: kernel test robot @ 2023-09-13  9:45 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf
  Cc: llvm, oe-kbuild-all, linux-fsdevel, linux-security-module,
	keescook, brauner, lennart, kernel-team, sargun

Hi Andrii,

kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Andrii-Nakryiko/bpf-add-BPF-token-delegation-mount-options-to-BPF-FS/20230913-053240
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230912212906.3975866-7-andrii%40kernel.org
patch subject: [PATCH v4 bpf-next 06/12] bpf: take into account BPF token when fetching helper protos
config: um-allyesconfig (https://download.01.org/0day-ci/archive/20230913/202309131744.fLl0eeCO-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project.git f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230913/202309131744.fLl0eeCO-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309131744.fLl0eeCO-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from net/core/filter.c:21:
   In file included from include/linux/bpf_verifier.h:7:
   In file included from include/linux/bpf.h:31:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:26:
   In file included from include/linux/kernel_stat.h:9:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/um/include/asm/hardirq.h:5:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
                                                     ^
   In file included from net/core/filter.c:21:
   In file included from include/linux/bpf_verifier.h:7:
   In file included from include/linux/bpf.h:31:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:26:
   In file included from include/linux/kernel_stat.h:9:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/um/include/asm/hardirq.h:5:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
                                                     ^
   In file included from net/core/filter.c:21:
   In file included from include/linux/bpf_verifier.h:7:
   In file included from include/linux/bpf.h:31:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:26:
   In file included from include/linux/kernel_stat.h:9:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/um/include/asm/hardirq.h:5:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:692:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsb(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:700:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsw(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:708:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsl(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:717:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesb(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:726:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesw(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:735:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesl(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
>> net/core/filter.c:11721:7: error: implicit declaration of function 'bpf_token_capable' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
           if (!bpf_token_capable(prog->aux->token, CAP_PERFMON))
                ^
   12 warnings and 1 error generated.


vim +/bpf_token_capable +11721 net/core/filter.c

 11687	
 11688	static const struct bpf_func_proto *
 11689	bpf_sk_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 11690	{
 11691		const struct bpf_func_proto *func;
 11692	
 11693		switch (func_id) {
 11694		case BPF_FUNC_skc_to_tcp6_sock:
 11695			func = &bpf_skc_to_tcp6_sock_proto;
 11696			break;
 11697		case BPF_FUNC_skc_to_tcp_sock:
 11698			func = &bpf_skc_to_tcp_sock_proto;
 11699			break;
 11700		case BPF_FUNC_skc_to_tcp_timewait_sock:
 11701			func = &bpf_skc_to_tcp_timewait_sock_proto;
 11702			break;
 11703		case BPF_FUNC_skc_to_tcp_request_sock:
 11704			func = &bpf_skc_to_tcp_request_sock_proto;
 11705			break;
 11706		case BPF_FUNC_skc_to_udp6_sock:
 11707			func = &bpf_skc_to_udp6_sock_proto;
 11708			break;
 11709		case BPF_FUNC_skc_to_unix_sock:
 11710			func = &bpf_skc_to_unix_sock_proto;
 11711			break;
 11712		case BPF_FUNC_skc_to_mptcp_sock:
 11713			func = &bpf_skc_to_mptcp_sock_proto;
 11714			break;
 11715		case BPF_FUNC_ktime_get_coarse_ns:
 11716			return &bpf_ktime_get_coarse_ns_proto;
 11717		default:
 11718			return bpf_base_func_proto(func_id, prog);
 11719		}
 11720	
 11721		if (!bpf_token_capable(prog->aux->token, CAP_PERFMON))
 11722			return NULL;
 11723	
 11724		return func;
 11725	}
 11726	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 bpf-next 06/12] bpf: take into account BPF token when fetching helper protos
  2023-09-12 21:29 ` [PATCH v4 bpf-next 06/12] bpf: take into account BPF token when fetching helper protos Andrii Nakryiko
  2023-09-13  9:45   ` kernel test robot
@ 2023-09-13 18:41   ` kernel test robot
  1 sibling, 0 replies; 28+ messages in thread
From: kernel test robot @ 2023-09-13 18:41 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf
  Cc: llvm, oe-kbuild-all, linux-fsdevel, linux-security-module,
	keescook, brauner, lennart, kernel-team, sargun

Hi Andrii,

kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Andrii-Nakryiko/bpf-add-BPF-token-delegation-mount-options-to-BPF-FS/20230913-053240
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230912212906.3975866-7-andrii%40kernel.org
patch subject: [PATCH v4 bpf-next 06/12] bpf: take into account BPF token when fetching helper protos
config: i386-randconfig-r015-20230913 (https://download.01.org/0day-ci/archive/20230914/202309140202.lwVDn4bK-lkp@intel.com/config)
compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230914/202309140202.lwVDn4bK-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309140202.lwVDn4bK-lkp@intel.com/

All errors (new ones prefixed by >>):

>> net/core/filter.c:11721:7: error: call to undeclared function 'bpf_token_capable'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           if (!bpf_token_capable(prog->aux->token, CAP_PERFMON))
                ^
   1 error generated.


vim +/bpf_token_capable +11721 net/core/filter.c

 11687	
 11688	static const struct bpf_func_proto *
 11689	bpf_sk_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 11690	{
 11691		const struct bpf_func_proto *func;
 11692	
 11693		switch (func_id) {
 11694		case BPF_FUNC_skc_to_tcp6_sock:
 11695			func = &bpf_skc_to_tcp6_sock_proto;
 11696			break;
 11697		case BPF_FUNC_skc_to_tcp_sock:
 11698			func = &bpf_skc_to_tcp_sock_proto;
 11699			break;
 11700		case BPF_FUNC_skc_to_tcp_timewait_sock:
 11701			func = &bpf_skc_to_tcp_timewait_sock_proto;
 11702			break;
 11703		case BPF_FUNC_skc_to_tcp_request_sock:
 11704			func = &bpf_skc_to_tcp_request_sock_proto;
 11705			break;
 11706		case BPF_FUNC_skc_to_udp6_sock:
 11707			func = &bpf_skc_to_udp6_sock_proto;
 11708			break;
 11709		case BPF_FUNC_skc_to_unix_sock:
 11710			func = &bpf_skc_to_unix_sock_proto;
 11711			break;
 11712		case BPF_FUNC_skc_to_mptcp_sock:
 11713			func = &bpf_skc_to_mptcp_sock_proto;
 11714			break;
 11715		case BPF_FUNC_ktime_get_coarse_ns:
 11716			return &bpf_ktime_get_coarse_ns_proto;
 11717		default:
 11718			return bpf_base_func_proto(func_id, prog);
 11719		}
 11720	
 11721		if (!bpf_token_capable(prog->aux->token, CAP_PERFMON))
 11722			return NULL;
 11723	
 11724		return func;
 11725	}
 11726	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 2/12] bpf: introduce BPF token object
  2023-09-12 21:28 ` [PATCH v4 bpf-next 02/12] bpf: introduce BPF token object Andrii Nakryiko
  2023-09-12 21:46   ` Andrii Nakryiko
@ 2023-09-13 21:46   ` Paul Moore
  2023-09-14 17:31     ` Andrii Nakryiko
  1 sibling, 1 reply; 28+ messages in thread
From: Paul Moore @ 2023-09-13 21:46 UTC (permalink / raw)
  To: Andrii Nakryiko, Andrii Nakryiko
  Cc: bpf, linux-fsdevel, linux-security-module, keescook, brauner,
	lennart, kernel-team, sargun

On Sep 12, 2023 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> Add new kind of BPF kernel object, BPF token. BPF token is meant to
> allow delegating privileged BPF functionality, like loading a BPF
> program or creating a BPF map, from privileged process to a *trusted*
> unprivileged process, all while have a good amount of control over which
> privileged operations could be performed using provided BPF token.
> 
> This is achieved through mounting BPF FS instance with extra delegation
> mount options, which determine what operations are delegatable, and also
> constraining it to the owning user namespace (as mentioned in the
> previous patch).
> 
> BPF token itself is just a derivative from BPF FS and can be created
> through a new bpf() syscall command, BPF_TOKEN_CREAT, which accepts
> a path specification (using the usual fd + string path combo) to a BPF
> FS mount. Currently, BPF token "inherits" delegated command, map types,
> prog type, and attach type bit sets from BPF FS as is. In the future,
> having an BPF token as a separate object with its own FD, we can allow
> to further restrict BPF token's allowable set of things either at the creation
> time or after the fact, allowing the process to guard itself further
> from, e.g., unintentionally trying to load undesired kind of BPF
> programs. But for now we keep things simple and just copy bit sets as is.
> 
> When BPF token is created from BPF FS mount, we take reference to the
> BPF super block's owning user namespace, and then use that namespace for
> checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
> capabilities that are normally only checked against init userns (using
> capable()), but now we check them using ns_capable() instead (if BPF
> token is provided). See bpf_token_capable() for details.
> 
> Such setup means that BPF token in itself is not sufficient to grant BPF
> functionality. User namespaced process has to *also* have necessary
> combination of capabilities inside that user namespace. So while
> previously CAP_BPF was useless when granted within user namespace, now
> it gains a meaning and allows container managers and sys admins to have
> a flexible control over which processes can and need to use BPF
> functionality within the user namespace (i.e., container in practice).
> And BPF FS delegation mount options and derived BPF tokens serve as
> a per-container "flag" to grant overall ability to use bpf() (plus further
> restrict on which parts of bpf() syscalls are treated as namespaced).
> 
> The alternative to creating BPF token object was:
>   a) not having any extra object and just pasing BPF FS path to each
>      relevant bpf() command. This seems suboptimal as it's racy (mount
>      under the same path might change in between checking it and using it
>      for bpf() command). And also less flexible if we'd like to further
>      restrict ourselves compared to all the delegated functionality
>      allowed on BPF FS.
>   b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
>      a dedicated FD that would represent a token-like functionality. This
>      doesn't seem superior to having a proper bpf() command, so
>      BPF_TOKEN_CREATE was chosen.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  include/linux/bpf.h            |  36 +++++++
>  include/uapi/linux/bpf.h       |  39 +++++++
>  kernel/bpf/Makefile            |   2 +-
>  kernel/bpf/inode.c             |   4 +-
>  kernel/bpf/syscall.c           |  17 +++
>  kernel/bpf/token.c             | 189 +++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  39 +++++++
>  7 files changed, 324 insertions(+), 2 deletions(-)
>  create mode 100644 kernel/bpf/token.c

...

> diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> new file mode 100644
> index 000000000000..f6ea3eddbee6
> --- /dev/null
> +++ b/kernel/bpf/token.c
> @@ -0,0 +1,189 @@
> +#include <linux/bpf.h>
> +#include <linux/vmalloc.h>
> +#include <linux/anon_inodes.h>
> +#include <linux/fdtable.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/kernel.h>
> +#include <linux/idr.h>
> +#include <linux/namei.h>
> +#include <linux/user_namespace.h>
> +
> +bool bpf_token_capable(const struct bpf_token *token, int cap)
> +{
> +	/* BPF token allows ns_capable() level of capabilities */
> +	if (token) {
> +		if (ns_capable(token->userns, cap))
> +			return true;
> +		if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> +			return true;
> +	}
> +	/* otherwise fallback to capable() checks */
> +	return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> +}

While the above looks to be equivalent to the bpf_capable() function it
replaces, for callers checking CAP_BPF and CAP_SYS_ADMIN, I'm looking
quickly at patch 3/12 and this is also being used to replace a
capable(CAP_NET_ADMIN) call which results in a change in behavior.
The current code which performs a capable(CAP_NET_ADMIN) check cannot
be satisfied by CAP_SYS_ADMIN, but this patchset using
bpf_token_capable(token, CAP_NET_ADMIN) can be satisfied by either
CAP_NET_ADMIN or CAP_SYS_ADMIN.

It seems that while bpf_token_capable() can be used as a replacement
for bpf_capable(), it is not currently a suitable replacement for a
generic capable() call.  Perhaps this is intentional, but I didn't see
it mentioned in the commit description, or in the comments, and I
wanted to make sure it wasn't an oversight.

> +void bpf_token_inc(struct bpf_token *token)
> +{
> +	atomic64_inc(&token->refcnt);
> +}
> +
> +static void bpf_token_free(struct bpf_token *token)
> +{
> +	put_user_ns(token->userns);
> +	kvfree(token);
> +}
> +
> +static void bpf_token_put_deferred(struct work_struct *work)
> +{
> +	struct bpf_token *token = container_of(work, struct bpf_token, work);
> +
> +	bpf_token_free(token);
> +}
> +
> +void bpf_token_put(struct bpf_token *token)
> +{
> +	if (!token)
> +		return;
> +
> +	if (!atomic64_dec_and_test(&token->refcnt))
> +		return;
> +
> +	INIT_WORK(&token->work, bpf_token_put_deferred);
> +	schedule_work(&token->work);
> +}
> +
> +static int bpf_token_release(struct inode *inode, struct file *filp)
> +{
> +	struct bpf_token *token = filp->private_data;
> +
> +	bpf_token_put(token);
> +	return 0;
> +}
> +
> +static ssize_t bpf_dummy_read(struct file *filp, char __user *buf, size_t siz,
> +			      loff_t *ppos)
> +{
> +	/* We need this handler such that alloc_file() enables
> +	 * f_mode with FMODE_CAN_READ.
> +	 */
> +	return -EINVAL;
> +}
> +
> +static ssize_t bpf_dummy_write(struct file *filp, const char __user *buf,
> +			       size_t siz, loff_t *ppos)
> +{
> +	/* We need this handler such that alloc_file() enables
> +	 * f_mode with FMODE_CAN_WRITE.
> +	 */
> +	return -EINVAL;
> +}
> +
> +static void bpf_token_show_fdinfo(struct seq_file *m, struct file *filp)
> +{
> +	struct bpf_token *token = filp->private_data;
> +	u64 mask;
> +
> +	mask = (1ULL << __MAX_BPF_CMD) - 1;
> +	if ((token->allowed_cmds & mask) == mask)
> +		seq_printf(m, "allowed_cmds:\tany\n");
> +	else
> +		seq_printf(m, "allowed_cmds:\t0x%llx\n", token->allowed_cmds);
> +}
> +
> +static const struct file_operations bpf_token_fops = {
> +	.release	= bpf_token_release,
> +	.read		= bpf_dummy_read,
> +	.write		= bpf_dummy_write,
> +	.show_fdinfo	= bpf_token_show_fdinfo,
> +};
> +
> +static struct bpf_token *bpf_token_alloc(void)
> +{
> +	struct bpf_token *token;
> +
> +	token = kvzalloc(sizeof(*token), GFP_USER);
> +	if (!token)
> +		return NULL;
> +
> +	atomic64_set(&token->refcnt, 1);
> +
> +	return token;
> +}
> +
> +int bpf_token_create(union bpf_attr *attr)
> +{
> +	struct path path;
> +	struct bpf_mount_opts *mnt_opts;
> +	struct bpf_token *token;
> +	int ret;
> +
> +	ret = user_path_at(attr->token_create.bpffs_path_fd,
> +			   u64_to_user_ptr(attr->token_create.bpffs_pathname),
> +			   LOOKUP_FOLLOW | LOOKUP_EMPTY, &path);
> +	if (ret)
> +		return ret;
> +
> +	if (path.mnt->mnt_root != path.dentry) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +	ret = path_permission(&path, MAY_ACCESS);
> +	if (ret)
> +		goto out;
> +
> +	token = bpf_token_alloc();
> +	if (!token) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	/* remember bpffs owning userns for future ns_capable() checks */
> +	token->userns = get_user_ns(path.dentry->d_sb->s_user_ns);
> +
> +	mnt_opts = path.dentry->d_sb->s_fs_info;
> +	token->allowed_cmds = mnt_opts->delegate_cmds;
> +
> +	ret = bpf_token_new_fd(token);
> +	if (ret < 0)
> +		bpf_token_free(token);
> +out:
> +	path_put(&path);
> +	return ret;
> +}
> +
> +#define BPF_TOKEN_INODE_NAME "bpf-token"
> +
> +/* Alloc anon_inode and FD for prepared token.
> + * Returns fd >= 0 on success; negative error, otherwise.
> + */
> +int bpf_token_new_fd(struct bpf_token *token)
> +{
> +	return anon_inode_getfd(BPF_TOKEN_INODE_NAME, &bpf_token_fops, token, O_CLOEXEC);
> +}
> +
> +struct bpf_token *bpf_token_get_from_fd(u32 ufd)
> +{
> +	struct fd f = fdget(ufd);
> +	struct bpf_token *token;
> +
> +	if (!f.file)
> +		return ERR_PTR(-EBADF);
> +	if (f.file->f_op != &bpf_token_fops) {
> +		fdput(f);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	token = f.file->private_data;
> +	bpf_token_inc(token);
> +	fdput(f);
> +
> +	return token;
> +}
> +
> +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> +{
> +	if (!token)
> +		return false;
> +
> +	return token->allowed_cmds & (1ULL << cmd);
> +}

I mentioned this a while back, likely in the other threads where this
token-based approach was only being discussed in general terms, but I
think we want to have a LSM hook at the point of initial token
delegation for this and a hook when the token is used.  My initial
thinking is that we should be able to address the former with a hook
in bpf_fill_super() and the latter either in bpf_token_get_from_fd()
or bpf_token_allow_XXX(); bpf_token_get_from_fd() would be simpler,
but it doesn't allow for much in the way of granularity.  Inserting the
LSM hooks in bpf_token_allow_XXX() would also allow the BPF code to fall
gracefully fallback to the system-wide checks if the LSM denied the
requested access whereas an access denial in bpf_token_get_from_fd()
denial would cause the operation to error out.

--
paul-moore.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 bpf-next 07/12] bpf: consistenly use BPF token throughout BPF verifier logic
  2023-09-12 21:29 ` [PATCH v4 bpf-next 07/12] bpf: consistenly use BPF token throughout BPF verifier logic Andrii Nakryiko
@ 2023-09-13 22:15   ` kernel test robot
  0 siblings, 0 replies; 28+ messages in thread
From: kernel test robot @ 2023-09-13 22:15 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf
  Cc: oe-kbuild-all, linux-fsdevel, linux-security-module, keescook,
	brauner, lennart, kernel-team, sargun

Hi Andrii,

kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Andrii-Nakryiko/bpf-add-BPF-token-delegation-mount-options-to-BPF-FS/20230913-053240
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230912212906.3975866-8-andrii%40kernel.org
patch subject: [PATCH v4 bpf-next 07/12] bpf: consistenly use BPF token throughout BPF verifier logic
config: x86_64-randconfig-074-20230914 (https://download.01.org/0day-ci/archive/20230914/202309140537.jHmBqMd6-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230914/202309140537.jHmBqMd6-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309140537.jHmBqMd6-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/net/sock_reuseport.h:5,
                    from include/net/tcp.h:35,
                    from include/linux/netfilter_ipv6.h:11,
                    from include/uapi/linux/netfilter_ipv6/ip6_tables.h:22,
                    from include/linux/netfilter_ipv6/ip6_tables.h:23,
                    from net/ipv6/netfilter/ip6table_filter.c:11:
   include/linux/filter.h: In function 'bpf_jit_blinding_enabled':
>> include/linux/filter.h:1104:36: error: implicit declaration of function 'bpf_token_capable'; did you mean 'bpf_token_put'? [-Werror=implicit-function-declaration]
    1104 |         if (bpf_jit_harden == 1 && bpf_token_capable(prog->aux->token, CAP_BPF))
         |                                    ^~~~~~~~~~~~~~~~~
         |                                    bpf_token_put
   cc1: some warnings being treated as errors
--
   In file included from include/net/sock_reuseport.h:5,
                    from include/net/tcp.h:35,
                    from include/linux/netfilter_ipv6.h:11,
                    from net/ipv6/netfilter/nf_reject_ipv6.c:12:
   include/linux/filter.h: In function 'bpf_jit_blinding_enabled':
>> include/linux/filter.h:1104:36: error: implicit declaration of function 'bpf_token_capable'; did you mean 'bpf_token_put'? [-Werror=implicit-function-declaration]
    1104 |         if (bpf_jit_harden == 1 && bpf_token_capable(prog->aux->token, CAP_BPF))
         |                                    ^~~~~~~~~~~~~~~~~
         |                                    bpf_token_put
   net/ipv6/netfilter/nf_reject_ipv6.c: In function 'nf_send_reset6':
   net/ipv6/netfilter/nf_reject_ipv6.c:287:25: warning: variable 'ip6h' set but not used [-Wunused-but-set-variable]
     287 |         struct ipv6hdr *ip6h;
         |                         ^~~~
   cc1: some warnings being treated as errors


vim +1104 include/linux/filter.h

  1091	
  1092	static inline bool bpf_jit_blinding_enabled(struct bpf_prog *prog)
  1093	{
  1094		/* These are the prerequisites, should someone ever have the
  1095		 * idea to call blinding outside of them, we make sure to
  1096		 * bail out.
  1097		 */
  1098		if (!bpf_jit_is_ebpf())
  1099			return false;
  1100		if (!prog->jit_requested)
  1101			return false;
  1102		if (!bpf_jit_harden)
  1103			return false;
> 1104		if (bpf_jit_harden == 1 && bpf_token_capable(prog->aux->token, CAP_BPF))
  1105			return false;
  1106	
  1107		return true;
  1108	}
  1109	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 2/12] bpf: introduce BPF token object
  2023-09-13 21:46   ` [PATCH v4 2/12] " Paul Moore
@ 2023-09-14 17:31     ` Andrii Nakryiko
  2023-09-15  0:55       ` Paul Moore
  0 siblings, 1 reply; 28+ messages in thread
From: Andrii Nakryiko @ 2023-09-14 17:31 UTC (permalink / raw)
  To: Paul Moore
  Cc: Andrii Nakryiko, bpf, linux-fsdevel, linux-security-module,
	keescook, brauner, lennart, kernel-team, sargun

On Wed, Sep 13, 2023 at 2:46 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Sep 12, 2023 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > Add new kind of BPF kernel object, BPF token. BPF token is meant to
> > allow delegating privileged BPF functionality, like loading a BPF
> > program or creating a BPF map, from privileged process to a *trusted*
> > unprivileged process, all while have a good amount of control over which
> > privileged operations could be performed using provided BPF token.
> >
> > This is achieved through mounting BPF FS instance with extra delegation
> > mount options, which determine what operations are delegatable, and also
> > constraining it to the owning user namespace (as mentioned in the
> > previous patch).
> >
> > BPF token itself is just a derivative from BPF FS and can be created
> > through a new bpf() syscall command, BPF_TOKEN_CREAT, which accepts
> > a path specification (using the usual fd + string path combo) to a BPF
> > FS mount. Currently, BPF token "inherits" delegated command, map types,
> > prog type, and attach type bit sets from BPF FS as is. In the future,
> > having an BPF token as a separate object with its own FD, we can allow
> > to further restrict BPF token's allowable set of things either at the creation
> > time or after the fact, allowing the process to guard itself further
> > from, e.g., unintentionally trying to load undesired kind of BPF
> > programs. But for now we keep things simple and just copy bit sets as is.
> >
> > When BPF token is created from BPF FS mount, we take reference to the
> > BPF super block's owning user namespace, and then use that namespace for
> > checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
> > capabilities that are normally only checked against init userns (using
> > capable()), but now we check them using ns_capable() instead (if BPF
> > token is provided). See bpf_token_capable() for details.
> >
> > Such setup means that BPF token in itself is not sufficient to grant BPF
> > functionality. User namespaced process has to *also* have necessary
> > combination of capabilities inside that user namespace. So while
> > previously CAP_BPF was useless when granted within user namespace, now
> > it gains a meaning and allows container managers and sys admins to have
> > a flexible control over which processes can and need to use BPF
> > functionality within the user namespace (i.e., container in practice).
> > And BPF FS delegation mount options and derived BPF tokens serve as
> > a per-container "flag" to grant overall ability to use bpf() (plus further
> > restrict on which parts of bpf() syscalls are treated as namespaced).
> >
> > The alternative to creating BPF token object was:
> >   a) not having any extra object and just pasing BPF FS path to each
> >      relevant bpf() command. This seems suboptimal as it's racy (mount
> >      under the same path might change in between checking it and using it
> >      for bpf() command). And also less flexible if we'd like to further
> >      restrict ourselves compared to all the delegated functionality
> >      allowed on BPF FS.
> >   b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
> >      a dedicated FD that would represent a token-like functionality. This
> >      doesn't seem superior to having a proper bpf() command, so
> >      BPF_TOKEN_CREATE was chosen.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  include/linux/bpf.h            |  36 +++++++
> >  include/uapi/linux/bpf.h       |  39 +++++++
> >  kernel/bpf/Makefile            |   2 +-
> >  kernel/bpf/inode.c             |   4 +-
> >  kernel/bpf/syscall.c           |  17 +++
> >  kernel/bpf/token.c             | 189 +++++++++++++++++++++++++++++++++
> >  tools/include/uapi/linux/bpf.h |  39 +++++++
> >  7 files changed, 324 insertions(+), 2 deletions(-)
> >  create mode 100644 kernel/bpf/token.c
>
> ...
>
> > diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> > new file mode 100644
> > index 000000000000..f6ea3eddbee6
> > --- /dev/null
> > +++ b/kernel/bpf/token.c
> > @@ -0,0 +1,189 @@
> > +#include <linux/bpf.h>
> > +#include <linux/vmalloc.h>
> > +#include <linux/anon_inodes.h>
> > +#include <linux/fdtable.h>
> > +#include <linux/file.h>
> > +#include <linux/fs.h>
> > +#include <linux/kernel.h>
> > +#include <linux/idr.h>
> > +#include <linux/namei.h>
> > +#include <linux/user_namespace.h>
> > +
> > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > +{
> > +     /* BPF token allows ns_capable() level of capabilities */
> > +     if (token) {
> > +             if (ns_capable(token->userns, cap))
> > +                     return true;
> > +             if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > +                     return true;
> > +     }
> > +     /* otherwise fallback to capable() checks */
> > +     return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > +}
>
> While the above looks to be equivalent to the bpf_capable() function it
> replaces, for callers checking CAP_BPF and CAP_SYS_ADMIN, I'm looking
> quickly at patch 3/12 and this is also being used to replace a
> capable(CAP_NET_ADMIN) call which results in a change in behavior.
> The current code which performs a capable(CAP_NET_ADMIN) check cannot
> be satisfied by CAP_SYS_ADMIN, but this patchset using
> bpf_token_capable(token, CAP_NET_ADMIN) can be satisfied by either
> CAP_NET_ADMIN or CAP_SYS_ADMIN.
>
> It seems that while bpf_token_capable() can be used as a replacement
> for bpf_capable(), it is not currently a suitable replacement for a
> generic capable() call.  Perhaps this is intentional, but I didn't see
> it mentioned in the commit description, or in the comments, and I
> wanted to make sure it wasn't an oversight.

You are right. It is an intentional attempt to unify all such checks.
If you look at bpf_prog_load(), we have this:

if (is_net_admin_prog_type(type) && !capable(CAP_NET_ADMIN) &&
!capable(CAP_SYS_ADMIN))
    return -EPERM;

So seeing that, I realized that we did have an intent to always use
CAP_SYS_ADMIN as a "fallback" cap, even for CAP_NET_ADMIN when it
comes to using network-enabled BPF programs. So I decided that
unifying all this makes sense.

I'll add a comment mentioning this, I should have been more explicit
from the get go.

>
> > +void bpf_token_inc(struct bpf_token *token)
> > +{
> > +     atomic64_inc(&token->refcnt);
> > +}
> > +

[...]

> > +#define BPF_TOKEN_INODE_NAME "bpf-token"
> > +
> > +/* Alloc anon_inode and FD for prepared token.
> > + * Returns fd >= 0 on success; negative error, otherwise.
> > + */
> > +int bpf_token_new_fd(struct bpf_token *token)
> > +{
> > +     return anon_inode_getfd(BPF_TOKEN_INODE_NAME, &bpf_token_fops, token, O_CLOEXEC);
> > +}
> > +
> > +struct bpf_token *bpf_token_get_from_fd(u32 ufd)
> > +{
> > +     struct fd f = fdget(ufd);
> > +     struct bpf_token *token;
> > +
> > +     if (!f.file)
> > +             return ERR_PTR(-EBADF);
> > +     if (f.file->f_op != &bpf_token_fops) {
> > +             fdput(f);
> > +             return ERR_PTR(-EINVAL);
> > +     }
> > +
> > +     token = f.file->private_data;
> > +     bpf_token_inc(token);
> > +     fdput(f);
> > +
> > +     return token;
> > +}
> > +
> > +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> > +{
> > +     if (!token)
> > +             return false;
> > +
> > +     return token->allowed_cmds & (1ULL << cmd);
> > +}
>
> I mentioned this a while back, likely in the other threads where this
> token-based approach was only being discussed in general terms, but I
> think we want to have a LSM hook at the point of initial token
> delegation for this and a hook when the token is used.  My initial
> thinking is that we should be able to address the former with a hook
> in bpf_fill_super() and the latter either in bpf_token_get_from_fd()
> or bpf_token_allow_XXX(); bpf_token_get_from_fd() would be simpler,
> but it doesn't allow for much in the way of granularity.  Inserting the
> LSM hooks in bpf_token_allow_XXX() would also allow the BPF code to fall
> gracefully fallback to the system-wide checks if the LSM denied the
> requested access whereas an access denial in bpf_token_get_from_fd()
> denial would cause the operation to error out.

I think the bpf_fill_super() LSM hook makes sense, but I thought
someone mentioned that we already have some generic LSM hook for
validating mounts? If we don't, I can certainly add one for BPF FS
specifically.

As for the bpf_token_allow_xxx(). This feels a bit too specific and
narrow-focused. What if we later add yet another dimension for BPF FS
and token? Do we need to introduce yet another LSM for each such case?
But also see bpf_prog_load(). There are two checks, allow_prog_type
and allow_attach_type, which are really only meaningful in
combination. And yet you'd have to have two separate LSM hooks for
that.

So I feel like the better approach is less mechanistically
concentrating on BPF token operations themselves, but rather on more
semantically meaningful operations that are token-enabled. E.g.,
protect BPF program loading, BPF map creation, BTF loading, etc. And
we do have such LSM hooks right now, though they might not be the most
convenient. So perhaps the right move is to add new ones that would
provide a bit more context (e.g., we can pass in the BPF token that
was used for the operation, attributes with which map/prog was
created, etc). Low-level token LSMs seem hard to use cohesively in
practice, though.

WDYT?

>
> --
> paul-moore.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 2/12] bpf: introduce BPF token object
  2023-09-14 17:31     ` Andrii Nakryiko
@ 2023-09-15  0:55       ` Paul Moore
  2023-09-15 20:59         ` Andrii Nakryiko
  0 siblings, 1 reply; 28+ messages in thread
From: Paul Moore @ 2023-09-15  0:55 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, linux-fsdevel, linux-security-module,
	keescook, brauner, lennart, kernel-team, sargun

On Thu, Sep 14, 2023 at 1:31 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Wed, Sep 13, 2023 at 2:46 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > On Sep 12, 2023 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > >
> > > Add new kind of BPF kernel object, BPF token. BPF token is meant to
> > > allow delegating privileged BPF functionality, like loading a BPF
> > > program or creating a BPF map, from privileged process to a *trusted*
> > > unprivileged process, all while have a good amount of control over which
> > > privileged operations could be performed using provided BPF token.
> > >
> > > This is achieved through mounting BPF FS instance with extra delegation
> > > mount options, which determine what operations are delegatable, and also
> > > constraining it to the owning user namespace (as mentioned in the
> > > previous patch).
> > >
> > > BPF token itself is just a derivative from BPF FS and can be created
> > > through a new bpf() syscall command, BPF_TOKEN_CREAT, which accepts
> > > a path specification (using the usual fd + string path combo) to a BPF
> > > FS mount. Currently, BPF token "inherits" delegated command, map types,
> > > prog type, and attach type bit sets from BPF FS as is. In the future,
> > > having an BPF token as a separate object with its own FD, we can allow
> > > to further restrict BPF token's allowable set of things either at the creation
> > > time or after the fact, allowing the process to guard itself further
> > > from, e.g., unintentionally trying to load undesired kind of BPF
> > > programs. But for now we keep things simple and just copy bit sets as is.
> > >
> > > When BPF token is created from BPF FS mount, we take reference to the
> > > BPF super block's owning user namespace, and then use that namespace for
> > > checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
> > > capabilities that are normally only checked against init userns (using
> > > capable()), but now we check them using ns_capable() instead (if BPF
> > > token is provided). See bpf_token_capable() for details.
> > >
> > > Such setup means that BPF token in itself is not sufficient to grant BPF
> > > functionality. User namespaced process has to *also* have necessary
> > > combination of capabilities inside that user namespace. So while
> > > previously CAP_BPF was useless when granted within user namespace, now
> > > it gains a meaning and allows container managers and sys admins to have
> > > a flexible control over which processes can and need to use BPF
> > > functionality within the user namespace (i.e., container in practice).
> > > And BPF FS delegation mount options and derived BPF tokens serve as
> > > a per-container "flag" to grant overall ability to use bpf() (plus further
> > > restrict on which parts of bpf() syscalls are treated as namespaced).
> > >
> > > The alternative to creating BPF token object was:
> > >   a) not having any extra object and just pasing BPF FS path to each
> > >      relevant bpf() command. This seems suboptimal as it's racy (mount
> > >      under the same path might change in between checking it and using it
> > >      for bpf() command). And also less flexible if we'd like to further
> > >      restrict ourselves compared to all the delegated functionality
> > >      allowed on BPF FS.
> > >   b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
> > >      a dedicated FD that would represent a token-like functionality. This
> > >      doesn't seem superior to having a proper bpf() command, so
> > >      BPF_TOKEN_CREATE was chosen.
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > >  include/linux/bpf.h            |  36 +++++++
> > >  include/uapi/linux/bpf.h       |  39 +++++++
> > >  kernel/bpf/Makefile            |   2 +-
> > >  kernel/bpf/inode.c             |   4 +-
> > >  kernel/bpf/syscall.c           |  17 +++
> > >  kernel/bpf/token.c             | 189 +++++++++++++++++++++++++++++++++
> > >  tools/include/uapi/linux/bpf.h |  39 +++++++
> > >  7 files changed, 324 insertions(+), 2 deletions(-)
> > >  create mode 100644 kernel/bpf/token.c
> >
> > ...
> >
> > > diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> > > new file mode 100644
> > > index 000000000000..f6ea3eddbee6
> > > --- /dev/null
> > > +++ b/kernel/bpf/token.c
> > > @@ -0,0 +1,189 @@
> > > +#include <linux/bpf.h>
> > > +#include <linux/vmalloc.h>
> > > +#include <linux/anon_inodes.h>
> > > +#include <linux/fdtable.h>
> > > +#include <linux/file.h>
> > > +#include <linux/fs.h>
> > > +#include <linux/kernel.h>
> > > +#include <linux/idr.h>
> > > +#include <linux/namei.h>
> > > +#include <linux/user_namespace.h>
> > > +
> > > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > +{
> > > +     /* BPF token allows ns_capable() level of capabilities */
> > > +     if (token) {
> > > +             if (ns_capable(token->userns, cap))
> > > +                     return true;
> > > +             if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > > +                     return true;
> > > +     }
> > > +     /* otherwise fallback to capable() checks */
> > > +     return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > +}
> >
> > While the above looks to be equivalent to the bpf_capable() function it
> > replaces, for callers checking CAP_BPF and CAP_SYS_ADMIN, I'm looking
> > quickly at patch 3/12 and this is also being used to replace a
> > capable(CAP_NET_ADMIN) call which results in a change in behavior.
> > The current code which performs a capable(CAP_NET_ADMIN) check cannot
> > be satisfied by CAP_SYS_ADMIN, but this patchset using
> > bpf_token_capable(token, CAP_NET_ADMIN) can be satisfied by either
> > CAP_NET_ADMIN or CAP_SYS_ADMIN.
> >
> > It seems that while bpf_token_capable() can be used as a replacement
> > for bpf_capable(), it is not currently a suitable replacement for a
> > generic capable() call.  Perhaps this is intentional, but I didn't see
> > it mentioned in the commit description, or in the comments, and I
> > wanted to make sure it wasn't an oversight.
>
> You are right. It is an intentional attempt to unify all such checks.
> If you look at bpf_prog_load(), we have this:
>
> if (is_net_admin_prog_type(type) && !capable(CAP_NET_ADMIN) &&
> !capable(CAP_SYS_ADMIN))
>     return -EPERM;
>
> So seeing that, I realized that we did have an intent to always use
> CAP_SYS_ADMIN as a "fallback" cap, even for CAP_NET_ADMIN when it
> comes to using network-enabled BPF programs. So I decided that
> unifying all this makes sense.
>
> I'll add a comment mentioning this, I should have been more explicit
> from the get go.

Thanks for the clarification.  I'm not to worried about checking
CAP_SYS_ADMIN as a fallback, but I always get a little twitchy when I
see capability changes in the code without any mention.

A mention in the commit description is good, and you could also draft
up a standalone patch that adds the CAP_SYS_ADMIN fallback to the
current in-tree code.  That would be a good way to really highlight
the capability changes and deal with any issues that might arise
(review, odd corner cases?, etc.) prior to the BPF capability
delegation patcheset we are discussing here.

> > > +#define BPF_TOKEN_INODE_NAME "bpf-token"
> > > +
> > > +/* Alloc anon_inode and FD for prepared token.
> > > + * Returns fd >= 0 on success; negative error, otherwise.
> > > + */
> > > +int bpf_token_new_fd(struct bpf_token *token)
> > > +{
> > > +     return anon_inode_getfd(BPF_TOKEN_INODE_NAME, &bpf_token_fops, token, O_CLOEXEC);
> > > +}
> > > +
> > > +struct bpf_token *bpf_token_get_from_fd(u32 ufd)
> > > +{
> > > +     struct fd f = fdget(ufd);
> > > +     struct bpf_token *token;
> > > +
> > > +     if (!f.file)
> > > +             return ERR_PTR(-EBADF);
> > > +     if (f.file->f_op != &bpf_token_fops) {
> > > +             fdput(f);
> > > +             return ERR_PTR(-EINVAL);
> > > +     }
> > > +
> > > +     token = f.file->private_data;
> > > +     bpf_token_inc(token);
> > > +     fdput(f);
> > > +
> > > +     return token;
> > > +}
> > > +
> > > +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> > > +{
> > > +     if (!token)
> > > +             return false;
> > > +
> > > +     return token->allowed_cmds & (1ULL << cmd);
> > > +}
> >
> > I mentioned this a while back, likely in the other threads where this
> > token-based approach was only being discussed in general terms, but I
> > think we want to have a LSM hook at the point of initial token
> > delegation for this and a hook when the token is used.  My initial
> > thinking is that we should be able to address the former with a hook
> > in bpf_fill_super() and the latter either in bpf_token_get_from_fd()
> > or bpf_token_allow_XXX(); bpf_token_get_from_fd() would be simpler,
> > but it doesn't allow for much in the way of granularity.  Inserting the
> > LSM hooks in bpf_token_allow_XXX() would also allow the BPF code to fall
> > gracefully fallback to the system-wide checks if the LSM denied the
> > requested access whereas an access denial in bpf_token_get_from_fd()
> > denial would cause the operation to error out.
>
> I think the bpf_fill_super() LSM hook makes sense, but I thought
> someone mentioned that we already have some generic LSM hook for
> validating mounts? If we don't, I can certainly add one for BPF FS
> specifically.

We do have security_sb_mount(), but that is a generic mount operation
access control and not well suited for controlling the mount-based
capability delegation that you are proposing here.  However, if you or
someone else has a clever way to make security_sb_mount() work for
this purpose I would be very happy to review that code.

> As for the bpf_token_allow_xxx(). This feels a bit too specific and
> narrow-focused. What if we later add yet another dimension for BPF FS
> and token? Do we need to introduce yet another LSM for each such case?

[I'm assuming you meant new LSM *hook*]

Possibly.  There are also some other issues which I've been thinking
about along these lines, specifically the fact that the
capability/command delegation happens after the existing
security_bpf() hook is called which makes things rather awkward from a
LSM perspective: the LSM would first need to allow the process access
to the desired BPF op using it's current LSM specific security
attributes (e.g. SELinux security domain, etc.) and then later
consider the op in the context of the delegated access control rights
(if the LSM decides to support those hooks).

I suspect that if we want to make this practical we would need to
either move some of the token code up into __sys_bpf() so we could
have a better interaction with security_bpf(), or we need to consider
moving the security_bpf() call into the op specific functions.  I'm
still thinking on this (lots of reviews to get through this week), but
I'm hoping there is a better way because I'm not sure I like either
option very much.

> But also see bpf_prog_load(). There are two checks, allow_prog_type
> and allow_attach_type, which are really only meaningful in
> combination. And yet you'd have to have two separate LSM hooks for
> that.
>
> So I feel like the better approach is less mechanistically
> concentrating on BPF token operations themselves, but rather on more
> semantically meaningful operations that are token-enabled. E.g.,
> protect BPF program loading, BPF map creation, BTF loading, etc. And
> we do have such LSM hooks right now, though they might not be the most
> convenient. So perhaps the right move is to add new ones that would
> provide a bit more context (e.g., we can pass in the BPF token that
> was used for the operation, attributes with which map/prog was
> created, etc). Low-level token LSMs seem hard to use cohesively in
> practice, though.

Can you elaborate a bit more?  It's hard to judge the comments above
without some more specifics about hook location, parameters, etc.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 2/12] bpf: introduce BPF token object
  2023-09-15  0:55       ` Paul Moore
@ 2023-09-15 20:59         ` Andrii Nakryiko
  2023-09-21 22:18           ` Paul Moore
  0 siblings, 1 reply; 28+ messages in thread
From: Andrii Nakryiko @ 2023-09-15 20:59 UTC (permalink / raw)
  To: Paul Moore
  Cc: Andrii Nakryiko, bpf, linux-fsdevel, linux-security-module,
	keescook, brauner, lennart, kernel-team, sargun

On Thu, Sep 14, 2023 at 5:55 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Thu, Sep 14, 2023 at 1:31 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > On Wed, Sep 13, 2023 at 2:46 PM Paul Moore <paul@paul-moore.com> wrote:
> > >
> > > On Sep 12, 2023 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > Add new kind of BPF kernel object, BPF token. BPF token is meant to
> > > > allow delegating privileged BPF functionality, like loading a BPF
> > > > program or creating a BPF map, from privileged process to a *trusted*
> > > > unprivileged process, all while have a good amount of control over which
> > > > privileged operations could be performed using provided BPF token.
> > > >
> > > > This is achieved through mounting BPF FS instance with extra delegation
> > > > mount options, which determine what operations are delegatable, and also
> > > > constraining it to the owning user namespace (as mentioned in the
> > > > previous patch).
> > > >
> > > > BPF token itself is just a derivative from BPF FS and can be created
> > > > through a new bpf() syscall command, BPF_TOKEN_CREAT, which accepts
> > > > a path specification (using the usual fd + string path combo) to a BPF
> > > > FS mount. Currently, BPF token "inherits" delegated command, map types,
> > > > prog type, and attach type bit sets from BPF FS as is. In the future,
> > > > having an BPF token as a separate object with its own FD, we can allow
> > > > to further restrict BPF token's allowable set of things either at the creation
> > > > time or after the fact, allowing the process to guard itself further
> > > > from, e.g., unintentionally trying to load undesired kind of BPF
> > > > programs. But for now we keep things simple and just copy bit sets as is.
> > > >
> > > > When BPF token is created from BPF FS mount, we take reference to the
> > > > BPF super block's owning user namespace, and then use that namespace for
> > > > checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
> > > > capabilities that are normally only checked against init userns (using
> > > > capable()), but now we check them using ns_capable() instead (if BPF
> > > > token is provided). See bpf_token_capable() for details.
> > > >
> > > > Such setup means that BPF token in itself is not sufficient to grant BPF
> > > > functionality. User namespaced process has to *also* have necessary
> > > > combination of capabilities inside that user namespace. So while
> > > > previously CAP_BPF was useless when granted within user namespace, now
> > > > it gains a meaning and allows container managers and sys admins to have
> > > > a flexible control over which processes can and need to use BPF
> > > > functionality within the user namespace (i.e., container in practice).
> > > > And BPF FS delegation mount options and derived BPF tokens serve as
> > > > a per-container "flag" to grant overall ability to use bpf() (plus further
> > > > restrict on which parts of bpf() syscalls are treated as namespaced).
> > > >
> > > > The alternative to creating BPF token object was:
> > > >   a) not having any extra object and just pasing BPF FS path to each
> > > >      relevant bpf() command. This seems suboptimal as it's racy (mount
> > > >      under the same path might change in between checking it and using it
> > > >      for bpf() command). And also less flexible if we'd like to further
> > > >      restrict ourselves compared to all the delegated functionality
> > > >      allowed on BPF FS.
> > > >   b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
> > > >      a dedicated FD that would represent a token-like functionality. This
> > > >      doesn't seem superior to having a proper bpf() command, so
> > > >      BPF_TOKEN_CREATE was chosen.
> > > >
> > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > ---
> > > >  include/linux/bpf.h            |  36 +++++++
> > > >  include/uapi/linux/bpf.h       |  39 +++++++
> > > >  kernel/bpf/Makefile            |   2 +-
> > > >  kernel/bpf/inode.c             |   4 +-
> > > >  kernel/bpf/syscall.c           |  17 +++
> > > >  kernel/bpf/token.c             | 189 +++++++++++++++++++++++++++++++++
> > > >  tools/include/uapi/linux/bpf.h |  39 +++++++
> > > >  7 files changed, 324 insertions(+), 2 deletions(-)
> > > >  create mode 100644 kernel/bpf/token.c
> > >
> > > ...
> > >
> > > > diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> > > > new file mode 100644
> > > > index 000000000000..f6ea3eddbee6
> > > > --- /dev/null
> > > > +++ b/kernel/bpf/token.c
> > > > @@ -0,0 +1,189 @@
> > > > +#include <linux/bpf.h>
> > > > +#include <linux/vmalloc.h>
> > > > +#include <linux/anon_inodes.h>
> > > > +#include <linux/fdtable.h>
> > > > +#include <linux/file.h>
> > > > +#include <linux/fs.h>
> > > > +#include <linux/kernel.h>
> > > > +#include <linux/idr.h>
> > > > +#include <linux/namei.h>
> > > > +#include <linux/user_namespace.h>
> > > > +
> > > > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > +{
> > > > +     /* BPF token allows ns_capable() level of capabilities */
> > > > +     if (token) {
> > > > +             if (ns_capable(token->userns, cap))
> > > > +                     return true;
> > > > +             if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > > > +                     return true;
> > > > +     }
> > > > +     /* otherwise fallback to capable() checks */
> > > > +     return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > > +}
> > >
> > > While the above looks to be equivalent to the bpf_capable() function it
> > > replaces, for callers checking CAP_BPF and CAP_SYS_ADMIN, I'm looking
> > > quickly at patch 3/12 and this is also being used to replace a
> > > capable(CAP_NET_ADMIN) call which results in a change in behavior.
> > > The current code which performs a capable(CAP_NET_ADMIN) check cannot
> > > be satisfied by CAP_SYS_ADMIN, but this patchset using
> > > bpf_token_capable(token, CAP_NET_ADMIN) can be satisfied by either
> > > CAP_NET_ADMIN or CAP_SYS_ADMIN.
> > >
> > > It seems that while bpf_token_capable() can be used as a replacement
> > > for bpf_capable(), it is not currently a suitable replacement for a
> > > generic capable() call.  Perhaps this is intentional, but I didn't see
> > > it mentioned in the commit description, or in the comments, and I
> > > wanted to make sure it wasn't an oversight.
> >
> > You are right. It is an intentional attempt to unify all such checks.
> > If you look at bpf_prog_load(), we have this:
> >
> > if (is_net_admin_prog_type(type) && !capable(CAP_NET_ADMIN) &&
> > !capable(CAP_SYS_ADMIN))
> >     return -EPERM;
> >
> > So seeing that, I realized that we did have an intent to always use
> > CAP_SYS_ADMIN as a "fallback" cap, even for CAP_NET_ADMIN when it
> > comes to using network-enabled BPF programs. So I decided that
> > unifying all this makes sense.
> >
> > I'll add a comment mentioning this, I should have been more explicit
> > from the get go.
>
> Thanks for the clarification.  I'm not to worried about checking
> CAP_SYS_ADMIN as a fallback, but I always get a little twitchy when I
> see capability changes in the code without any mention.
>
> A mention in the commit description is good, and you could also draft
> up a standalone patch that adds the CAP_SYS_ADMIN fallback to the
> current in-tree code.  That would be a good way to really highlight
> the capability changes and deal with any issues that might arise
> (review, odd corner cases?, etc.) prior to the BPF capability
> delegation patcheset we are discussing here.

Sure, sounds good, I'll add this as a pre-patch for next revision.

>
> > > > +#define BPF_TOKEN_INODE_NAME "bpf-token"
> > > > +
> > > > +/* Alloc anon_inode and FD for prepared token.
> > > > + * Returns fd >= 0 on success; negative error, otherwise.
> > > > + */
> > > > +int bpf_token_new_fd(struct bpf_token *token)
> > > > +{
> > > > +     return anon_inode_getfd(BPF_TOKEN_INODE_NAME, &bpf_token_fops, token, O_CLOEXEC);
> > > > +}
> > > > +
> > > > +struct bpf_token *bpf_token_get_from_fd(u32 ufd)
> > > > +{
> > > > +     struct fd f = fdget(ufd);
> > > > +     struct bpf_token *token;
> > > > +
> > > > +     if (!f.file)
> > > > +             return ERR_PTR(-EBADF);
> > > > +     if (f.file->f_op != &bpf_token_fops) {
> > > > +             fdput(f);
> > > > +             return ERR_PTR(-EINVAL);
> > > > +     }
> > > > +
> > > > +     token = f.file->private_data;
> > > > +     bpf_token_inc(token);
> > > > +     fdput(f);
> > > > +
> > > > +     return token;
> > > > +}
> > > > +
> > > > +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> > > > +{
> > > > +     if (!token)
> > > > +             return false;
> > > > +
> > > > +     return token->allowed_cmds & (1ULL << cmd);
> > > > +}
> > >
> > > I mentioned this a while back, likely in the other threads where this
> > > token-based approach was only being discussed in general terms, but I
> > > think we want to have a LSM hook at the point of initial token
> > > delegation for this and a hook when the token is used.  My initial
> > > thinking is that we should be able to address the former with a hook
> > > in bpf_fill_super() and the latter either in bpf_token_get_from_fd()
> > > or bpf_token_allow_XXX(); bpf_token_get_from_fd() would be simpler,
> > > but it doesn't allow for much in the way of granularity.  Inserting the
> > > LSM hooks in bpf_token_allow_XXX() would also allow the BPF code to fall
> > > gracefully fallback to the system-wide checks if the LSM denied the
> > > requested access whereas an access denial in bpf_token_get_from_fd()
> > > denial would cause the operation to error out.
> >
> > I think the bpf_fill_super() LSM hook makes sense, but I thought
> > someone mentioned that we already have some generic LSM hook for
> > validating mounts? If we don't, I can certainly add one for BPF FS
> > specifically.
>
> We do have security_sb_mount(), but that is a generic mount operation
> access control and not well suited for controlling the mount-based
> capability delegation that you are proposing here.  However, if you or
> someone else has a clever way to make security_sb_mount() work for
> this purpose I would be very happy to review that code.

To be honest, I'm a bit out of my depth here, as I don't know the
mounting parts well. Perhaps someone from VFS side can advise. But
regardless, I have no problem adding a new LSM hook as well, ideally
not very BPF-specific. If you have a specific form of it in mind, I'd
be curious to see it and implement it.

>
> > As for the bpf_token_allow_xxx(). This feels a bit too specific and
> > narrow-focused. What if we later add yet another dimension for BPF FS
> > and token? Do we need to introduce yet another LSM for each such case?
>
> [I'm assuming you meant new LSM *hook*]

yep, of course, sorry about using terminology sloppily

>
> Possibly.  There are also some other issues which I've been thinking
> about along these lines, specifically the fact that the
> capability/command delegation happens after the existing
> security_bpf() hook is called which makes things rather awkward from a
> LSM perspective: the LSM would first need to allow the process access
> to the desired BPF op using it's current LSM specific security
> attributes (e.g. SELinux security domain, etc.) and then later
> consider the op in the context of the delegated access control rights
> (if the LSM decides to support those hooks).
>
> I suspect that if we want to make this practical we would need to
> either move some of the token code up into __sys_bpf() so we could
> have a better interaction with security_bpf(), or we need to consider
> moving the security_bpf() call into the op specific functions.  I'm
> still thinking on this (lots of reviews to get through this week), but
> I'm hoping there is a better way because I'm not sure I like either
> option very much.

Yes, security_bpf() is happening extremely early and is lacking a lot
of context. I'm not sure if moving it around is a good idea as it
basically changes its semantics. But adding a new set of coherent LSM
hooks per each appropriate BPF operation with good context to make
decisions sounds like a good improvement. E.g., for BPF_PROG_LOAD, we
can have LSM hook after struct bpf_prog is allocated, bpf_token is
available, attributes are sanity checked. All that together is a very
useful and powerful context that can be used both by more fixed LSM
policies (like SELinux), and very dynamic user-defined BPF LSM
programs.

But I'd like to keep all that outside of the BPF token feature itself,
as it's already pretty hard to get a consensus just on those bits, so
complicating this with simultaneously designing a new set of LSM hooks
is something that we should avoid. Let's keep discussing this, but not
block that on BPF token.

>
> > But also see bpf_prog_load(). There are two checks, allow_prog_type
> > and allow_attach_type, which are really only meaningful in
> > combination. And yet you'd have to have two separate LSM hooks for
> > that.
> >
> > So I feel like the better approach is less mechanistically
> > concentrating on BPF token operations themselves, but rather on more
> > semantically meaningful operations that are token-enabled. E.g.,
> > protect BPF program loading, BPF map creation, BTF loading, etc. And
> > we do have such LSM hooks right now, though they might not be the most
> > convenient. So perhaps the right move is to add new ones that would
> > provide a bit more context (e.g., we can pass in the BPF token that
> > was used for the operation, attributes with which map/prog was
> > created, etc). Low-level token LSMs seem hard to use cohesively in
> > practice, though.
>
> Can you elaborate a bit more?  It's hard to judge the comments above
> without some more specifics about hook location, parameters, etc.

So something like my above proposal for a new LSM hook in
BPF_PROG_LOAD command. Just right before passing bpf_prog to BPF
verifier, we can have

err = security_bpf_prog_load(prog, attr, token)
if (err)
    return -EPERM;

Program, attributes, and token give a lot of inputs for security
policy logic to make a decision about allowing that specific BPF
program to be verified and loaded or not. I know how this could be
used from BPF LSM side, but I assume that SELinux and others can take
advantage of that provided additional context as well.

Similarly we can have a BPF_MAP_CREATE-specific LSM hook with context
relevant to creating a BPF map. And so on.


>
> --
> paul-moore.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 2/12] bpf: introduce BPF token object
  2023-09-15 20:59         ` Andrii Nakryiko
@ 2023-09-21 22:18           ` Paul Moore
  2023-09-22  9:27             ` Paul Moore
  2023-09-22 22:35             ` Andrii Nakryiko
  0 siblings, 2 replies; 28+ messages in thread
From: Paul Moore @ 2023-09-21 22:18 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, linux-fsdevel, linux-security-module,
	keescook, brauner, lennart, kernel-team, sargun, selinux

On Fri, Sep 15, 2023 at 4:59 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Thu, Sep 14, 2023 at 5:55 PM Paul Moore <paul@paul-moore.com> wrote:
> > On Thu, Sep 14, 2023 at 1:31 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > > On Wed, Sep 13, 2023 at 2:46 PM Paul Moore <paul@paul-moore.com> wrote:
> > > >
> > > > On Sep 12, 2023 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > > > >
> > > > > Add new kind of BPF kernel object, BPF token. BPF token is meant to
> > > > > allow delegating privileged BPF functionality, like loading a BPF
> > > > > program or creating a BPF map, from privileged process to a *trusted*
> > > > > unprivileged process, all while have a good amount of control over which
> > > > > privileged operations could be performed using provided BPF token.
> > > > >
> > > > > This is achieved through mounting BPF FS instance with extra delegation
> > > > > mount options, which determine what operations are delegatable, and also
> > > > > constraining it to the owning user namespace (as mentioned in the
> > > > > previous patch).
> > > > >
> > > > > BPF token itself is just a derivative from BPF FS and can be created
> > > > > through a new bpf() syscall command, BPF_TOKEN_CREAT, which accepts
> > > > > a path specification (using the usual fd + string path combo) to a BPF
> > > > > FS mount. Currently, BPF token "inherits" delegated command, map types,
> > > > > prog type, and attach type bit sets from BPF FS as is. In the future,
> > > > > having an BPF token as a separate object with its own FD, we can allow
> > > > > to further restrict BPF token's allowable set of things either at the creation
> > > > > time or after the fact, allowing the process to guard itself further
> > > > > from, e.g., unintentionally trying to load undesired kind of BPF
> > > > > programs. But for now we keep things simple and just copy bit sets as is.
> > > > >
> > > > > When BPF token is created from BPF FS mount, we take reference to the
> > > > > BPF super block's owning user namespace, and then use that namespace for
> > > > > checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
> > > > > capabilities that are normally only checked against init userns (using
> > > > > capable()), but now we check them using ns_capable() instead (if BPF
> > > > > token is provided). See bpf_token_capable() for details.
> > > > >
> > > > > Such setup means that BPF token in itself is not sufficient to grant BPF
> > > > > functionality. User namespaced process has to *also* have necessary
> > > > > combination of capabilities inside that user namespace. So while
> > > > > previously CAP_BPF was useless when granted within user namespace, now
> > > > > it gains a meaning and allows container managers and sys admins to have
> > > > > a flexible control over which processes can and need to use BPF
> > > > > functionality within the user namespace (i.e., container in practice).
> > > > > And BPF FS delegation mount options and derived BPF tokens serve as
> > > > > a per-container "flag" to grant overall ability to use bpf() (plus further
> > > > > restrict on which parts of bpf() syscalls are treated as namespaced).
> > > > >
> > > > > The alternative to creating BPF token object was:
> > > > >   a) not having any extra object and just pasing BPF FS path to each
> > > > >      relevant bpf() command. This seems suboptimal as it's racy (mount
> > > > >      under the same path might change in between checking it and using it
> > > > >      for bpf() command). And also less flexible if we'd like to further
> > > > >      restrict ourselves compared to all the delegated functionality
> > > > >      allowed on BPF FS.
> > > > >   b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
> > > > >      a dedicated FD that would represent a token-like functionality. This
> > > > >      doesn't seem superior to having a proper bpf() command, so
> > > > >      BPF_TOKEN_CREATE was chosen.
> > > > >
> > > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > > ---
> > > > >  include/linux/bpf.h            |  36 +++++++
> > > > >  include/uapi/linux/bpf.h       |  39 +++++++
> > > > >  kernel/bpf/Makefile            |   2 +-
> > > > >  kernel/bpf/inode.c             |   4 +-
> > > > >  kernel/bpf/syscall.c           |  17 +++
> > > > >  kernel/bpf/token.c             | 189 +++++++++++++++++++++++++++++++++
> > > > >  tools/include/uapi/linux/bpf.h |  39 +++++++
> > > > >  7 files changed, 324 insertions(+), 2 deletions(-)
> > > > >  create mode 100644 kernel/bpf/token.c
> > > >
> > > > ...
> > > >
> > > > > diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> > > > > new file mode 100644
> > > > > index 000000000000..f6ea3eddbee6
> > > > > --- /dev/null
> > > > > +++ b/kernel/bpf/token.c
> > > > > @@ -0,0 +1,189 @@
> > > > > +#include <linux/bpf.h>
> > > > > +#include <linux/vmalloc.h>
> > > > > +#include <linux/anon_inodes.h>
> > > > > +#include <linux/fdtable.h>
> > > > > +#include <linux/file.h>
> > > > > +#include <linux/fs.h>
> > > > > +#include <linux/kernel.h>
> > > > > +#include <linux/idr.h>
> > > > > +#include <linux/namei.h>
> > > > > +#include <linux/user_namespace.h>
> > > > > +
> > > > > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > > +{
> > > > > +     /* BPF token allows ns_capable() level of capabilities */
> > > > > +     if (token) {
> > > > > +             if (ns_capable(token->userns, cap))
> > > > > +                     return true;
> > > > > +             if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > > > > +                     return true;
> > > > > +     }
> > > > > +     /* otherwise fallback to capable() checks */
> > > > > +     return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > > > +}
> > > >
> > > > While the above looks to be equivalent to the bpf_capable() function it
> > > > replaces, for callers checking CAP_BPF and CAP_SYS_ADMIN, I'm looking
> > > > quickly at patch 3/12 and this is also being used to replace a
> > > > capable(CAP_NET_ADMIN) call which results in a change in behavior.
> > > > The current code which performs a capable(CAP_NET_ADMIN) check cannot
> > > > be satisfied by CAP_SYS_ADMIN, but this patchset using
> > > > bpf_token_capable(token, CAP_NET_ADMIN) can be satisfied by either
> > > > CAP_NET_ADMIN or CAP_SYS_ADMIN.
> > > >
> > > > It seems that while bpf_token_capable() can be used as a replacement
> > > > for bpf_capable(), it is not currently a suitable replacement for a
> > > > generic capable() call.  Perhaps this is intentional, but I didn't see
> > > > it mentioned in the commit description, or in the comments, and I
> > > > wanted to make sure it wasn't an oversight.
> > >
> > > You are right. It is an intentional attempt to unify all such checks.
> > > If you look at bpf_prog_load(), we have this:
> > >
> > > if (is_net_admin_prog_type(type) && !capable(CAP_NET_ADMIN) &&
> > > !capable(CAP_SYS_ADMIN))
> > >     return -EPERM;
> > >
> > > So seeing that, I realized that we did have an intent to always use
> > > CAP_SYS_ADMIN as a "fallback" cap, even for CAP_NET_ADMIN when it
> > > comes to using network-enabled BPF programs. So I decided that
> > > unifying all this makes sense.
> > >
> > > I'll add a comment mentioning this, I should have been more explicit
> > > from the get go.
> >
> > Thanks for the clarification.  I'm not to worried about checking
> > CAP_SYS_ADMIN as a fallback, but I always get a little twitchy when I
> > see capability changes in the code without any mention.
> >
> > A mention in the commit description is good, and you could also draft
> > up a standalone patch that adds the CAP_SYS_ADMIN fallback to the
> > current in-tree code.  That would be a good way to really highlight
> > the capability changes and deal with any issues that might arise
> > (review, odd corner cases?, etc.) prior to the BPF capability
> > delegation patcheset we are discussing here.
>
> Sure, sounds good, I'll add this as a pre-patch for next revision.

My apologies on the delay, I've been traveling this week and haven't
had the time to dig back into this.

I do see that you've posted another revision of this patchset with the
capability pre-patch, thanks for doing that.

> > > > > +#define BPF_TOKEN_INODE_NAME "bpf-token"
> > > > > +
> > > > > +/* Alloc anon_inode and FD for prepared token.
> > > > > + * Returns fd >= 0 on success; negative error, otherwise.
> > > > > + */
> > > > > +int bpf_token_new_fd(struct bpf_token *token)
> > > > > +{
> > > > > +     return anon_inode_getfd(BPF_TOKEN_INODE_NAME, &bpf_token_fops, token, O_CLOEXEC);
> > > > > +}
> > > > > +
> > > > > +struct bpf_token *bpf_token_get_from_fd(u32 ufd)
> > > > > +{
> > > > > +     struct fd f = fdget(ufd);
> > > > > +     struct bpf_token *token;
> > > > > +
> > > > > +     if (!f.file)
> > > > > +             return ERR_PTR(-EBADF);
> > > > > +     if (f.file->f_op != &bpf_token_fops) {
> > > > > +             fdput(f);
> > > > > +             return ERR_PTR(-EINVAL);
> > > > > +     }
> > > > > +
> > > > > +     token = f.file->private_data;
> > > > > +     bpf_token_inc(token);
> > > > > +     fdput(f);
> > > > > +
> > > > > +     return token;
> > > > > +}
> > > > > +
> > > > > +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> > > > > +{
> > > > > +     if (!token)
> > > > > +             return false;
> > > > > +
> > > > > +     return token->allowed_cmds & (1ULL << cmd);
> > > > > +}
> > > >
> > > > I mentioned this a while back, likely in the other threads where this
> > > > token-based approach was only being discussed in general terms, but I
> > > > think we want to have a LSM hook at the point of initial token
> > > > delegation for this and a hook when the token is used.  My initial
> > > > thinking is that we should be able to address the former with a hook
> > > > in bpf_fill_super() and the latter either in bpf_token_get_from_fd()
> > > > or bpf_token_allow_XXX(); bpf_token_get_from_fd() would be simpler,
> > > > but it doesn't allow for much in the way of granularity.  Inserting the
> > > > LSM hooks in bpf_token_allow_XXX() would also allow the BPF code to fall
> > > > gracefully fallback to the system-wide checks if the LSM denied the
> > > > requested access whereas an access denial in bpf_token_get_from_fd()
> > > > denial would cause the operation to error out.
> > >
> > > I think the bpf_fill_super() LSM hook makes sense, but I thought
> > > someone mentioned that we already have some generic LSM hook for
> > > validating mounts? If we don't, I can certainly add one for BPF FS
> > > specifically.
> >
> > We do have security_sb_mount(), but that is a generic mount operation
> > access control and not well suited for controlling the mount-based
> > capability delegation that you are proposing here.  However, if you or
> > someone else has a clever way to make security_sb_mount() work for
> > this purpose I would be very happy to review that code.
>
> To be honest, I'm a bit out of my depth here, as I don't know the
> mounting parts well. Perhaps someone from VFS side can advise. But
> regardless, I have no problem adding a new LSM hook as well, ideally
> not very BPF-specific. If you have a specific form of it in mind, I'd
> be curious to see it and implement it.

I agree that there can be benefits to generalized LSM hooks, but in
this hook I think it may need to be BPF specific simply because the
hook would be dealing with the specific concept of delegating BPF
permissions.

I haven't taken the time to write up any hook patches yet as I wanted
to discuss it with you and the others on the To/CC line, but it seems
like we are roughly on the same page, at least with the initial
delegation hook, so I can put something together if you aren't
comfortable working on this (more on this below) ...

> > > As for the bpf_token_allow_xxx(). This feels a bit too specific and
> > > narrow-focused. What if we later add yet another dimension for BPF FS
> > > and token? Do we need to introduce yet another LSM for each such case?
> >
> > [I'm assuming you meant new LSM *hook*]
>
> yep, of course, sorry about using terminology sloppily
>
> >
> > Possibly.  There are also some other issues which I've been thinking
> > about along these lines, specifically the fact that the
> > capability/command delegation happens after the existing
> > security_bpf() hook is called which makes things rather awkward from a
> > LSM perspective: the LSM would first need to allow the process access
> > to the desired BPF op using it's current LSM specific security
> > attributes (e.g. SELinux security domain, etc.) and then later
> > consider the op in the context of the delegated access control rights
> > (if the LSM decides to support those hooks).
> >
> > I suspect that if we want to make this practical we would need to
> > either move some of the token code up into __sys_bpf() so we could
> > have a better interaction with security_bpf(), or we need to consider
> > moving the security_bpf() call into the op specific functions.  I'm
> > still thinking on this (lots of reviews to get through this week), but
> > I'm hoping there is a better way because I'm not sure I like either
> > option very much.
>
> Yes, security_bpf() is happening extremely early and is lacking a lot
> of context. I'm not sure if moving it around is a good idea as it
> basically changes its semantics.

There are a couple of things that make this not quite as scary as it
may seem.  The first is that currently only SELinux implements a
security_bpf() hook and the implementation is rather simplistic in
terms of what information it requires to perform the existing access
controls; decomposing the single security_bpf() call site into
multiple op specific calls, perhaps with some op specific hooks,
should be doable without causing major semantic changes.  The second
thing is that we could augment the existing security_bpf() hook and
call site with a new LSM hook(s) that are called from the op specific
call sites; this would allow those LSMs that desire the current
semantics to use the existing security_bpf() hook and those that wish
to use the new semantics could implement the new hook(s).  This is
very similar to the pathname-based and inode-based hooks in the VFS
layer, some LSMs choose to implement pathname-based security and use
one set of hooks, while others implement a label-based security
mechanism and use a different set of hooks.

> But adding a new set of coherent LSM
> hooks per each appropriate BPF operation with good context to make
> decisions sounds like a good improvement. E.g., for BPF_PROG_LOAD, we
> can have LSM hook after struct bpf_prog is allocated, bpf_token is
> available, attributes are sanity checked. All that together is a very
> useful and powerful context that can be used both by more fixed LSM
> policies (like SELinux), and very dynamic user-defined BPF LSM
> programs.

This is where it is my turn to mention that I'm getting a bit out of
my depth, but I'm hopeful that the two of us can keep each other from
drowning :)

Typically the LSM hook call sites end up being in the same general
area as the capability checks, usually just after (we want the normal
Linux discretionary access controls to always come first for the sake
of consistency).  Sticking with that approach it looks like we would
end up with a LSM call in bpf_prog_load() right after bpf_capable()
call, the only gotcha with that is the bpf_prog struct isn't populated
yet, but how important is that when we have the bpf_attr info (honest
question, I don't know the answer to this)?

Ignoring the bpf_prog struct, do you think something like this would
work for a hook call site (please forgive the pseudo code)?

  int bpf_prog_load(...)
  {
         ...
     bpf_cap = bpf_token_capable(token, CAP_BPF);
     err = security_bpf_token(BPF_PROG_LOAD, attr, uattr_size, token);
     if (err)
       return err;
    ...
  }

Assuming this type of hook configuration, and an empty/passthrough
security_bpf() hook, a LSM would first see the various
capable()/ns_capable() checks present in bpf_token_capable() followed
by a BPF op check, complete with token, in the security_bpf_token()
hook.  Further assuming that we convert the bpf_token_new_fd() to use
anon_inode_getfd_secure() instead of anon_inode_getfd() and the
security_bpf_token() could still access the token fd via the bpf_attr
struct I think we could do something like this for the SELinux case
(more rough pseudo code):

  int selinux_bpf_token(...)
  {
    ssid = current_sid();
    if (token) {
      /* this could be simplified with better integration
       * in bpf_token_get_from_fd() */
      fd = fdget(attr->prog_token_fd);
      inode = file_inode(fd.file);
      isec = selinux_inode(inode);
      tsid = isec->sid;
      fdput(fd);
    } else
      tsid = ssid;
    switch(cmd) {
    ...
    case BPF_PROG_LOAD:
      rc = avc_has_perm(ssid, tsid, SECCLAS_BPF, BPF__PROG_LOAD);
      break;
    default:
      rc = 0;
    }
    return rc;
  }

This would preserve the current behaviour when a token was not present:

 allow @current @current : bpf { prog_load }

... but this would change to the following if a token was present:

 allow @current @DELEGATED_ANON_INODE : bpf { prog_load }

That seems reasonable to me, but I've CC'd the SELinux list on this so
others can sanity check the above :)

> But I'd like to keep all that outside of the BPF token feature itself,
> as it's already pretty hard to get a consensus just on those bits, so
> complicating this with simultaneously designing a new set of LSM hooks
> is something that we should avoid. Let's keep discussing this, but not
> block that on BPF token.

The unfortunate aspect of disconnecting new functionality from the
associated access controls is that it introduces a gap where the new
functionality is not secured in a manner that users expect.  There are
billions of systems/users that rely on LSM-based access controls for a
large part of their security story, and I think we are doing them a
disservice by not including the LSM controls with new security
significant features.

We (the LSM folks) are happy to work with you to get this sorted out,
and I would hope my comments in this thread (as well as prior
iterations) and the rough design above is a good faith indicator of
that.

> > > But also see bpf_prog_load(). There are two checks, allow_prog_type
> > > and allow_attach_type, which are really only meaningful in
> > > combination. And yet you'd have to have two separate LSM hooks for
> > > that.
> > >
> > > So I feel like the better approach is less mechanistically
> > > concentrating on BPF token operations themselves, but rather on more
> > > semantically meaningful operations that are token-enabled. E.g.,
> > > protect BPF program loading, BPF map creation, BTF loading, etc. And
> > > we do have such LSM hooks right now, though they might not be the most
> > > convenient. So perhaps the right move is to add new ones that would
> > > provide a bit more context (e.g., we can pass in the BPF token that
> > > was used for the operation, attributes with which map/prog was
> > > created, etc). Low-level token LSMs seem hard to use cohesively in
> > > practice, though.
> >
> > Can you elaborate a bit more?  It's hard to judge the comments above
> > without some more specifics about hook location, parameters, etc.
>
> So something like my above proposal for a new LSM hook in
> BPF_PROG_LOAD command. Just right before passing bpf_prog to BPF
> verifier, we can have
>
> err = security_bpf_prog_load(prog, attr, token)
> if (err)
>     return -EPERM;
>
> Program, attributes, and token give a lot of inputs for security
> policy logic to make a decision about allowing that specific BPF
> program to be verified and loaded or not. I know how this could be
> used from BPF LSM side, but I assume that SELinux and others can take
> advantage of that provided additional context as well.

If you think a populated bpf_prog struct is important for BPF LSM
programs then I have no problem with that hook placement.  It's a lot
later in the process than we might normally want to place the hook,
but we can still safely error out here so that should be okay.

From a LSM perspective I think we can make either work, I think the
big question is which would you rather have in the BPF code: the
security_bpf_prog_load() hook you've suggested here or the
security_bpf_token() hook I suggested above?

> Similarly we can have a BPF_MAP_CREATE-specific LSM hook with context
> relevant to creating a BPF map. And so on.

Of course.  I've been operating under the assumption that whatever we
do for one op we should be able to apply the same idea to the others
that need it.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 2/12] bpf: introduce BPF token object
  2023-09-21 22:18           ` Paul Moore
@ 2023-09-22  9:27             ` Paul Moore
  2023-09-22 22:35             ` Andrii Nakryiko
  1 sibling, 0 replies; 28+ messages in thread
From: Paul Moore @ 2023-09-22  9:27 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, linux-fsdevel, linux-security-module,
	keescook, brauner, lennart, kernel-team, sargun, selinux

On Thu, Sep 21, 2023 at 6:18 PM Paul Moore <paul@paul-moore.com> wrote:
>

...

> Typically the LSM hook call sites end up being in the same general
> area as the capability checks, usually just after (we want the normal
> Linux discretionary access controls to always come first for the sake
> of consistency).  Sticking with that approach it looks like we would
> end up with a LSM call in bpf_prog_load() right after bpf_capable()
> call, the only gotcha with that is the bpf_prog struct isn't populated
> yet, but how important is that when we have the bpf_attr info (honest
> question, I don't know the answer to this)?
>
> Ignoring the bpf_prog struct, do you think something like this would
> work for a hook call site (please forgive the pseudo code)?
>
>   int bpf_prog_load(...)
>   {
>          ...
>      bpf_cap = bpf_token_capable(token, CAP_BPF);
>      err = security_bpf_token(BPF_PROG_LOAD, attr, uattr_size, token);
>      if (err)
>        return err;
>     ...
>   }
>
> Assuming this type of hook configuration, and an empty/passthrough
> security_bpf() hook, a LSM would first see the various
> capable()/ns_capable() checks present in bpf_token_capable() followed
> by a BPF op check, complete with token, in the security_bpf_token()
> hook.  Further assuming that we convert the bpf_token_new_fd() to use
> anon_inode_getfd_secure() instead of anon_inode_getfd() and the
> security_bpf_token() could still access the token fd via the bpf_attr
> struct I think we could do something like this for the SELinux case
> (more rough pseudo code):
>
>   int selinux_bpf_token(...)
>   {
>     ssid = current_sid();
>     if (token) {
>       /* this could be simplified with better integration
>        * in bpf_token_get_from_fd() */
>       fd = fdget(attr->prog_token_fd);
>       inode = file_inode(fd.file);
>       isec = selinux_inode(inode);
>       tsid = isec->sid;
>       fdput(fd);
>     } else
>       tsid = ssid;
>     switch(cmd) {
>     ...
>     case BPF_PROG_LOAD:
>       rc = avc_has_perm(ssid, tsid, SECCLAS_BPF, BPF__PROG_LOAD);
>       break;
>     default:
>       rc = 0;
>     }
>     return rc;
>   }
>
> This would preserve the current behaviour when a token was not present:
>
>  allow @current @current : bpf { prog_load }
>
> ... but this would change to the following if a token was present:
>
>  allow @current @DELEGATED_ANON_INODE : bpf { prog_load }
>
> That seems reasonable to me, but I've CC'd the SELinux list on this so
> others can sanity check the above :)

I thought it might be helpful to add a bit more background on my
thinking for the SELinux folks, especially since the object label used
in the example above is a bit unusual.  As a reminder, the object
label in the delegated case is not the current domain as it is now for
standard BPF program loads, it is the label of the BPF delegation
token (anonymous inode) that is created by the process/orchestrator
that manages the namespace and explicitly enabled the BPF privilege
delegation.  The BPF token can be labeled using the existing anonymous
inode transition rules.

First off I decided to reuse the existing permission so as not to
break current policies.  We can always separate the PROG_LOAD
permission into a standard and delegated permission if desired, but I
believe we would need to gate that with a policy capability and
preserve some form of access control for the legacy PROG_LOAD-only
case.

Preserving the PROG_LOAD permission does present a challenge with
respect to differentiating the delegated program load from a normal
program load while ensuring that existing policies continue to work
and delegated operations require explicit policy adjustments.
Changing the object label in the delegated case was the only approach
I could think of that would satisfy all of these constraints, but I'm
open to other ideas, tweaks, etc. and I would love to get some other
opinions on this.

Thoughts?

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 2/12] bpf: introduce BPF token object
  2023-09-21 22:18           ` Paul Moore
  2023-09-22  9:27             ` Paul Moore
@ 2023-09-22 22:35             ` Andrii Nakryiko
  2023-09-26 21:32               ` Paul Moore
  2023-10-10 21:19               ` Paul Moore
  1 sibling, 2 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2023-09-22 22:35 UTC (permalink / raw)
  To: Paul Moore
  Cc: Andrii Nakryiko, bpf, linux-fsdevel, linux-security-module,
	keescook, brauner, lennart, kernel-team, sargun, selinux

On Thu, Sep 21, 2023 at 3:18 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Fri, Sep 15, 2023 at 4:59 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > On Thu, Sep 14, 2023 at 5:55 PM Paul Moore <paul@paul-moore.com> wrote:
> > > On Thu, Sep 14, 2023 at 1:31 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > > On Wed, Sep 13, 2023 at 2:46 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > >
> > > > > On Sep 12, 2023 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > > > > >
> > > > > > Add new kind of BPF kernel object, BPF token. BPF token is meant to
> > > > > > allow delegating privileged BPF functionality, like loading a BPF
> > > > > > program or creating a BPF map, from privileged process to a *trusted*
> > > > > > unprivileged process, all while have a good amount of control over which
> > > > > > privileged operations could be performed using provided BPF token.
> > > > > >
> > > > > > This is achieved through mounting BPF FS instance with extra delegation
> > > > > > mount options, which determine what operations are delegatable, and also
> > > > > > constraining it to the owning user namespace (as mentioned in the
> > > > > > previous patch).
> > > > > >
> > > > > > BPF token itself is just a derivative from BPF FS and can be created
> > > > > > through a new bpf() syscall command, BPF_TOKEN_CREAT, which accepts
> > > > > > a path specification (using the usual fd + string path combo) to a BPF
> > > > > > FS mount. Currently, BPF token "inherits" delegated command, map types,
> > > > > > prog type, and attach type bit sets from BPF FS as is. In the future,
> > > > > > having an BPF token as a separate object with its own FD, we can allow
> > > > > > to further restrict BPF token's allowable set of things either at the creation
> > > > > > time or after the fact, allowing the process to guard itself further
> > > > > > from, e.g., unintentionally trying to load undesired kind of BPF
> > > > > > programs. But for now we keep things simple and just copy bit sets as is.
> > > > > >
> > > > > > When BPF token is created from BPF FS mount, we take reference to the
> > > > > > BPF super block's owning user namespace, and then use that namespace for
> > > > > > checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
> > > > > > capabilities that are normally only checked against init userns (using
> > > > > > capable()), but now we check them using ns_capable() instead (if BPF
> > > > > > token is provided). See bpf_token_capable() for details.
> > > > > >
> > > > > > Such setup means that BPF token in itself is not sufficient to grant BPF
> > > > > > functionality. User namespaced process has to *also* have necessary
> > > > > > combination of capabilities inside that user namespace. So while
> > > > > > previously CAP_BPF was useless when granted within user namespace, now
> > > > > > it gains a meaning and allows container managers and sys admins to have
> > > > > > a flexible control over which processes can and need to use BPF
> > > > > > functionality within the user namespace (i.e., container in practice).
> > > > > > And BPF FS delegation mount options and derived BPF tokens serve as
> > > > > > a per-container "flag" to grant overall ability to use bpf() (plus further
> > > > > > restrict on which parts of bpf() syscalls are treated as namespaced).
> > > > > >
> > > > > > The alternative to creating BPF token object was:
> > > > > >   a) not having any extra object and just pasing BPF FS path to each
> > > > > >      relevant bpf() command. This seems suboptimal as it's racy (mount
> > > > > >      under the same path might change in between checking it and using it
> > > > > >      for bpf() command). And also less flexible if we'd like to further
> > > > > >      restrict ourselves compared to all the delegated functionality
> > > > > >      allowed on BPF FS.
> > > > > >   b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
> > > > > >      a dedicated FD that would represent a token-like functionality. This
> > > > > >      doesn't seem superior to having a proper bpf() command, so
> > > > > >      BPF_TOKEN_CREATE was chosen.
> > > > > >
> > > > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > > > ---
> > > > > >  include/linux/bpf.h            |  36 +++++++
> > > > > >  include/uapi/linux/bpf.h       |  39 +++++++
> > > > > >  kernel/bpf/Makefile            |   2 +-
> > > > > >  kernel/bpf/inode.c             |   4 +-
> > > > > >  kernel/bpf/syscall.c           |  17 +++
> > > > > >  kernel/bpf/token.c             | 189 +++++++++++++++++++++++++++++++++
> > > > > >  tools/include/uapi/linux/bpf.h |  39 +++++++
> > > > > >  7 files changed, 324 insertions(+), 2 deletions(-)
> > > > > >  create mode 100644 kernel/bpf/token.c
> > > > >
> > > > > ...
> > > > >
> > > > > > diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> > > > > > new file mode 100644
> > > > > > index 000000000000..f6ea3eddbee6
> > > > > > --- /dev/null
> > > > > > +++ b/kernel/bpf/token.c
> > > > > > @@ -0,0 +1,189 @@
> > > > > > +#include <linux/bpf.h>
> > > > > > +#include <linux/vmalloc.h>
> > > > > > +#include <linux/anon_inodes.h>
> > > > > > +#include <linux/fdtable.h>
> > > > > > +#include <linux/file.h>
> > > > > > +#include <linux/fs.h>
> > > > > > +#include <linux/kernel.h>
> > > > > > +#include <linux/idr.h>
> > > > > > +#include <linux/namei.h>
> > > > > > +#include <linux/user_namespace.h>
> > > > > > +
> > > > > > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > > > +{
> > > > > > +     /* BPF token allows ns_capable() level of capabilities */
> > > > > > +     if (token) {
> > > > > > +             if (ns_capable(token->userns, cap))
> > > > > > +                     return true;
> > > > > > +             if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > > > > > +                     return true;
> > > > > > +     }
> > > > > > +     /* otherwise fallback to capable() checks */
> > > > > > +     return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > > > > +}
> > > > >
> > > > > While the above looks to be equivalent to the bpf_capable() function it
> > > > > replaces, for callers checking CAP_BPF and CAP_SYS_ADMIN, I'm looking
> > > > > quickly at patch 3/12 and this is also being used to replace a
> > > > > capable(CAP_NET_ADMIN) call which results in a change in behavior.
> > > > > The current code which performs a capable(CAP_NET_ADMIN) check cannot
> > > > > be satisfied by CAP_SYS_ADMIN, but this patchset using
> > > > > bpf_token_capable(token, CAP_NET_ADMIN) can be satisfied by either
> > > > > CAP_NET_ADMIN or CAP_SYS_ADMIN.
> > > > >
> > > > > It seems that while bpf_token_capable() can be used as a replacement
> > > > > for bpf_capable(), it is not currently a suitable replacement for a
> > > > > generic capable() call.  Perhaps this is intentional, but I didn't see
> > > > > it mentioned in the commit description, or in the comments, and I
> > > > > wanted to make sure it wasn't an oversight.
> > > >
> > > > You are right. It is an intentional attempt to unify all such checks.
> > > > If you look at bpf_prog_load(), we have this:
> > > >
> > > > if (is_net_admin_prog_type(type) && !capable(CAP_NET_ADMIN) &&
> > > > !capable(CAP_SYS_ADMIN))
> > > >     return -EPERM;
> > > >
> > > > So seeing that, I realized that we did have an intent to always use
> > > > CAP_SYS_ADMIN as a "fallback" cap, even for CAP_NET_ADMIN when it
> > > > comes to using network-enabled BPF programs. So I decided that
> > > > unifying all this makes sense.
> > > >
> > > > I'll add a comment mentioning this, I should have been more explicit
> > > > from the get go.
> > >
> > > Thanks for the clarification.  I'm not to worried about checking
> > > CAP_SYS_ADMIN as a fallback, but I always get a little twitchy when I
> > > see capability changes in the code without any mention.
> > >
> > > A mention in the commit description is good, and you could also draft
> > > up a standalone patch that adds the CAP_SYS_ADMIN fallback to the
> > > current in-tree code.  That would be a good way to really highlight
> > > the capability changes and deal with any issues that might arise
> > > (review, odd corner cases?, etc.) prior to the BPF capability
> > > delegation patcheset we are discussing here.
> >
> > Sure, sounds good, I'll add this as a pre-patch for next revision.
>
> My apologies on the delay, I've been traveling this week and haven't
> had the time to dig back into this.
>

No worries, lots of conferences are happening right now, so I expected
people to be unavailable.

> I do see that you've posted another revision of this patchset with the
> capability pre-patch, thanks for doing that.

Yep, hopefully networking folks won't be opposed and we can streamline
all that a bit.

>
> > > > > > +#define BPF_TOKEN_INODE_NAME "bpf-token"
> > > > > > +
> > > > > > +/* Alloc anon_inode and FD for prepared token.
> > > > > > + * Returns fd >= 0 on success; negative error, otherwise.
> > > > > > + */
> > > > > > +int bpf_token_new_fd(struct bpf_token *token)
> > > > > > +{
> > > > > > +     return anon_inode_getfd(BPF_TOKEN_INODE_NAME, &bpf_token_fops, token, O_CLOEXEC);
> > > > > > +}
> > > > > > +
> > > > > > +struct bpf_token *bpf_token_get_from_fd(u32 ufd)
> > > > > > +{
> > > > > > +     struct fd f = fdget(ufd);
> > > > > > +     struct bpf_token *token;
> > > > > > +
> > > > > > +     if (!f.file)
> > > > > > +             return ERR_PTR(-EBADF);
> > > > > > +     if (f.file->f_op != &bpf_token_fops) {
> > > > > > +             fdput(f);
> > > > > > +             return ERR_PTR(-EINVAL);
> > > > > > +     }
> > > > > > +
> > > > > > +     token = f.file->private_data;
> > > > > > +     bpf_token_inc(token);
> > > > > > +     fdput(f);
> > > > > > +
> > > > > > +     return token;
> > > > > > +}
> > > > > > +
> > > > > > +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> > > > > > +{
> > > > > > +     if (!token)
> > > > > > +             return false;
> > > > > > +
> > > > > > +     return token->allowed_cmds & (1ULL << cmd);
> > > > > > +}
> > > > >
> > > > > I mentioned this a while back, likely in the other threads where this
> > > > > token-based approach was only being discussed in general terms, but I
> > > > > think we want to have a LSM hook at the point of initial token
> > > > > delegation for this and a hook when the token is used.  My initial
> > > > > thinking is that we should be able to address the former with a hook
> > > > > in bpf_fill_super() and the latter either in bpf_token_get_from_fd()
> > > > > or bpf_token_allow_XXX(); bpf_token_get_from_fd() would be simpler,
> > > > > but it doesn't allow for much in the way of granularity.  Inserting the
> > > > > LSM hooks in bpf_token_allow_XXX() would also allow the BPF code to fall
> > > > > gracefully fallback to the system-wide checks if the LSM denied the
> > > > > requested access whereas an access denial in bpf_token_get_from_fd()
> > > > > denial would cause the operation to error out.
> > > >
> > > > I think the bpf_fill_super() LSM hook makes sense, but I thought
> > > > someone mentioned that we already have some generic LSM hook for
> > > > validating mounts? If we don't, I can certainly add one for BPF FS
> > > > specifically.
> > >
> > > We do have security_sb_mount(), but that is a generic mount operation
> > > access control and not well suited for controlling the mount-based
> > > capability delegation that you are proposing here.  However, if you or
> > > someone else has a clever way to make security_sb_mount() work for
> > > this purpose I would be very happy to review that code.
> >
> > To be honest, I'm a bit out of my depth here, as I don't know the
> > mounting parts well. Perhaps someone from VFS side can advise. But
> > regardless, I have no problem adding a new LSM hook as well, ideally
> > not very BPF-specific. If you have a specific form of it in mind, I'd
> > be curious to see it and implement it.
>
> I agree that there can be benefits to generalized LSM hooks, but in
> this hook I think it may need to be BPF specific simply because the
> hook would be dealing with the specific concept of delegating BPF
> permissions.

Sure. As an alternative, if this is about controlling BPF delegation,
instead of doing mount-time checks and LSM hook, perhaps we can add a
new LSM hook to BPF_CREATE_TOKEN, just like we have ones for
BPF_MAP_CREATE and BPF_PROG_LOAD. That will enable controlling
delegation more directly when it is actually attempted to be used.

>
> I haven't taken the time to write up any hook patches yet as I wanted
> to discuss it with you and the others on the To/CC line, but it seems
> like we are roughly on the same page, at least with the initial
> delegation hook, so I can put something together if you aren't
> comfortable working on this (more on this below) ...

I'd appreciate the help from the SELinux side specifically, yes. I'm
absolutely OK to add a few new LSM hooks, though.

>
> > > > As for the bpf_token_allow_xxx(). This feels a bit too specific and
> > > > narrow-focused. What if we later add yet another dimension for BPF FS
> > > > and token? Do we need to introduce yet another LSM for each such case?
> > >
> > > [I'm assuming you meant new LSM *hook*]
> >
> > yep, of course, sorry about using terminology sloppily
> >
> > >
> > > Possibly.  There are also some other issues which I've been thinking
> > > about along these lines, specifically the fact that the
> > > capability/command delegation happens after the existing
> > > security_bpf() hook is called which makes things rather awkward from a
> > > LSM perspective: the LSM would first need to allow the process access
> > > to the desired BPF op using it's current LSM specific security
> > > attributes (e.g. SELinux security domain, etc.) and then later
> > > consider the op in the context of the delegated access control rights
> > > (if the LSM decides to support those hooks).
> > >
> > > I suspect that if we want to make this practical we would need to
> > > either move some of the token code up into __sys_bpf() so we could
> > > have a better interaction with security_bpf(), or we need to consider
> > > moving the security_bpf() call into the op specific functions.  I'm
> > > still thinking on this (lots of reviews to get through this week), but
> > > I'm hoping there is a better way because I'm not sure I like either
> > > option very much.
> >
> > Yes, security_bpf() is happening extremely early and is lacking a lot
> > of context. I'm not sure if moving it around is a good idea as it
> > basically changes its semantics.
>
> There are a couple of things that make this not quite as scary as it
> may seem.  The first is that currently only SELinux implements a
> security_bpf() hook and the implementation is rather simplistic in
> terms of what information it requires to perform the existing access
> controls; decomposing the single security_bpf() call site into
> multiple op specific calls, perhaps with some op specific hooks,
> should be doable without causing major semantic changes.  The second
> thing is that we could augment the existing security_bpf() hook and
> call site with a new LSM hook(s) that are called from the op specific
> call sites; this would allow those LSMs that desire the current
> semantics to use the existing security_bpf() hook and those that wish
> to use the new semantics could implement the new hook(s).  This is
> very similar to the pathname-based and inode-based hooks in the VFS
> layer, some LSMs choose to implement pathname-based security and use
> one set of hooks, while others implement a label-based security
> mechanism and use a different set of hooks.
>

Agreed. I think new LSM hooks that are operation-specific make a lot
of sense. I'd probably not touch existing security_bpf(), it's an
early-entry LSM hook for anything bpf() syscall-specific. This might
be very useful in some cases, probably.

> > But adding a new set of coherent LSM
> > hooks per each appropriate BPF operation with good context to make
> > decisions sounds like a good improvement. E.g., for BPF_PROG_LOAD, we
> > can have LSM hook after struct bpf_prog is allocated, bpf_token is
> > available, attributes are sanity checked. All that together is a very
> > useful and powerful context that can be used both by more fixed LSM
> > policies (like SELinux), and very dynamic user-defined BPF LSM
> > programs.
>
> This is where it is my turn to mention that I'm getting a bit out of
> my depth, but I'm hopeful that the two of us can keep each other from
> drowning :)
>
> Typically the LSM hook call sites end up being in the same general
> area as the capability checks, usually just after (we want the normal
> Linux discretionary access controls to always come first for the sake
> of consistency).  Sticking with that approach it looks like we would
> end up with a LSM call in bpf_prog_load() right after bpf_capable()
> call, the only gotcha with that is the bpf_prog struct isn't populated
> yet, but how important is that when we have the bpf_attr info (honest
> question, I don't know the answer to this)?

Ok, so I agree in general about having LSM hooks close to capability
checks, but at least specifically for BPF_PROG_CREATE, it won't work.
This bpf_capable() check you mention. This is just one check. If you
look into bpf_prog_load() in kernel/bpf/syscall.c, you'll see that we
can also check CAP_PERFMON, CAP_NET_ADMIN, and CAP_SYS_ADMIN, in
addition to CAP_BPF, based on various aspects (like program type +
subtype). So for such a complex BPF_PROG_CREATE operation I think we
should deviate a bit and place LSM in a logical place that would
enable doing LSM enforcement with lots of relevant information, but
before doing anything dangerous or expensive.

For BPF_PROG_LOAD that place seems to be right before bpf_check(),
which is BPF verification. By that time we did a bunch of different
sanity checks, resolved various things (like found another bpf_program
to attach to, if requested), copied user-provided strings (e.g.,
license), etc, etc. That's tons of good stuff to be used for either
audit or enforcement.

This also answers your question about why bpf_attr isn't enough.
bpf_attr has various FDs, data pointers (program instructions),
strings, etc, etc. All of that might be a) inconvenient to fetch (at
least from BPF LSM) and/or b) racy (e.g., FD can get changed between
the check and actual usage). So while bpf_attr is useful as an input,
ideally we'd augment it with bpf_prog and bpf_token.

Right now we have `security_bpf_prog_alloc(prog->aux);`, which is
almost in the ideal place, but provides prog->aux instead of program
itself (not sure why), and doesn't provide bpf_attr and bpf_token.

So I'm thinking that maybe we get rid of bpf_prog_alloc() in favor of
new security_bpf_prog_load(prog, &attr, token)?

>
> Ignoring the bpf_prog struct, do you think something like this would
> work for a hook call site (please forgive the pseudo code)?
>
>   int bpf_prog_load(...)
>   {
>          ...
>      bpf_cap = bpf_token_capable(token, CAP_BPF);
>      err = security_bpf_token(BPF_PROG_LOAD, attr, uattr_size, token);
>      if (err)
>        return err;
>     ...
>   }
>

See above, I think this should be program-centric, not token-centric.


> Assuming this type of hook configuration, and an empty/passthrough
> security_bpf() hook, a LSM would first see the various
> capable()/ns_capable() checks present in bpf_token_capable() followed
> by a BPF op check, complete with token, in the security_bpf_token()
> hook.  Further assuming that we convert the bpf_token_new_fd() to use
> anon_inode_getfd_secure() instead of anon_inode_getfd() and the
> security_bpf_token() could still access the token fd via the bpf_attr

wouldn't this be a race to read FD from bpf_attr for LSM, and then
separately read it again in bpf_prog_load()? That seems like TOCTOU to
me? As I mentioned above, I think bpf_token should be provided as an
argument after it was "extracted" from FD in one place.

> struct I think we could do something like this for the SELinux case
> (more rough pseudo code):
>
>   int selinux_bpf_token(...)
>   {
>     ssid = current_sid();
>     if (token) {
>       /* this could be simplified with better integration
>        * in bpf_token_get_from_fd() */
>       fd = fdget(attr->prog_token_fd);
>       inode = file_inode(fd.file);
>       isec = selinux_inode(inode);
>       tsid = isec->sid;
>       fdput(fd);
>     } else
>       tsid = ssid;
>     switch(cmd) {
>     ...
>     case BPF_PROG_LOAD:
>       rc = avc_has_perm(ssid, tsid, SECCLAS_BPF, BPF__PROG_LOAD);
>       break;
>     default:
>       rc = 0;
>     }
>     return rc;
>   }
>
> This would preserve the current behaviour when a token was not present:
>
>  allow @current @current : bpf { prog_load }
>
> ... but this would change to the following if a token was present:
>
>  allow @current @DELEGATED_ANON_INODE : bpf { prog_load }
>
> That seems reasonable to me, but I've CC'd the SELinux list on this so
> others can sanity check the above :)

doesn't seem like using anon_inode_getfd_secure() should be a big deal

>
> > But I'd like to keep all that outside of the BPF token feature itself,
> > as it's already pretty hard to get a consensus just on those bits, so
> > complicating this with simultaneously designing a new set of LSM hooks
> > is something that we should avoid. Let's keep discussing this, but not
> > block that on BPF token.
>
> The unfortunate aspect of disconnecting new functionality from the
> associated access controls is that it introduces a gap where the new
> functionality is not secured in a manner that users expect.  There are
> billions of systems/users that rely on LSM-based access controls for a
> large part of their security story, and I think we are doing them a
> disservice by not including the LSM controls with new security
> significant features.
>
> We (the LSM folks) are happy to work with you to get this sorted out,
> and I would hope my comments in this thread (as well as prior
> iterations) and the rough design above is a good faith indicator of
> that.

I'd be happy to collaborate on designing proper LSM hooks around all
this (which is what we are doing right now, I believe). I'm just
trying to think pragmatically how this should all work logistically.
This patch set gets the BPF token concept into the kernel. But there
is more work to do in libbpf and other supporting infrastructure to
make proper use of it. So I'm just trying to avoid going too broad
with this patch set.

But if you'd be ok to converge on the design of BPF token-enabled LSM
hooks for bpf() syscall here, I'm happy to implement them. I'm not
feeling confident enough to do SELinux work on top, though, so that's
where I'd appreciate help. If LSM folks would be willing to add
SELinux interface on top of LSM hooks, we'd be able to parallelize
this work with me finishing libbpf and user-space parts, while you or
someone else finalizes the SELinux details.

How does that sound to you?

>
> > > > But also see bpf_prog_load(). There are two checks, allow_prog_type
> > > > and allow_attach_type, which are really only meaningful in
> > > > combination. And yet you'd have to have two separate LSM hooks for
> > > > that.
> > > >
> > > > So I feel like the better approach is less mechanistically
> > > > concentrating on BPF token operations themselves, but rather on more
> > > > semantically meaningful operations that are token-enabled. E.g.,
> > > > protect BPF program loading, BPF map creation, BTF loading, etc. And
> > > > we do have such LSM hooks right now, though they might not be the most
> > > > convenient. So perhaps the right move is to add new ones that would
> > > > provide a bit more context (e.g., we can pass in the BPF token that
> > > > was used for the operation, attributes with which map/prog was
> > > > created, etc). Low-level token LSMs seem hard to use cohesively in
> > > > practice, though.
> > >
> > > Can you elaborate a bit more?  It's hard to judge the comments above
> > > without some more specifics about hook location, parameters, etc.
> >
> > So something like my above proposal for a new LSM hook in
> > BPF_PROG_LOAD command. Just right before passing bpf_prog to BPF
> > verifier, we can have
> >
> > err = security_bpf_prog_load(prog, attr, token)
> > if (err)
> >     return -EPERM;
> >
> > Program, attributes, and token give a lot of inputs for security
> > policy logic to make a decision about allowing that specific BPF
> > program to be verified and loaded or not. I know how this could be
> > used from BPF LSM side, but I assume that SELinux and others can take
> > advantage of that provided additional context as well.
>
> If you think a populated bpf_prog struct is important for BPF LSM
> programs then I have no problem with that hook placement.  It's a lot
> later in the process than we might normally want to place the hook,
> but we can still safely error out here so that should be okay.
>
> From a LSM perspective I think we can make either work, I think the
> big question is which would you rather have in the BPF code: the
> security_bpf_prog_load() hook you've suggested here or the
> security_bpf_token() hook I suggested above?

I think security_bpf_prog_load() makes more sense, as I tried to lay
out above. But you know, it's not set in stone yet, so let's try to
converge. I tried to elaborate on why security_bpf_prog_load() and
then separately security_bpf_map_create(), etc make most sense. For
BPF LSM, having pointer to struct bpf_prog* and struct bpf_token* is a
huge benefit compared to trying to somehow get them out of union
bpf_attr. Same for map creation or any other BPF object that bpf()
syscall operates on.

>
> > Similarly we can have a BPF_MAP_CREATE-specific LSM hook with context
> > relevant to creating a BPF map. And so on.
>
> Of course.  I've been operating under the assumption that whatever we
> do for one op we should be able to apply the same idea to the others
> that need it.
>

Awesome, yep.

> --
> paul-moore.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 2/12] bpf: introduce BPF token object
  2023-09-22 22:35             ` Andrii Nakryiko
@ 2023-09-26 21:32               ` Paul Moore
  2023-10-10 21:19               ` Paul Moore
  1 sibling, 0 replies; 28+ messages in thread
From: Paul Moore @ 2023-09-26 21:32 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, linux-fsdevel, linux-security-module,
	keescook, brauner, lennart, kernel-team, sargun, selinux

On Fri, Sep 22, 2023 at 6:35 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> No worries, lots of conferences are happening right now, so I expected
> people to be unavailable.

Just a quick note to let you know that my network access is still
limited, but I appreciate the understanding and the detail in your
reply; I'll get you a proper response next week.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 2/12] bpf: introduce BPF token object
  2023-09-22 22:35             ` Andrii Nakryiko
  2023-09-26 21:32               ` Paul Moore
@ 2023-10-10 21:19               ` Paul Moore
  2023-10-12  0:32                 ` Andrii Nakryiko
  1 sibling, 1 reply; 28+ messages in thread
From: Paul Moore @ 2023-10-10 21:19 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, linux-fsdevel, linux-security-module,
	keescook, brauner, lennart, kernel-team, sargun, selinux

On Fri, Sep 22, 2023 at 6:35 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Thu, Sep 21, 2023 at 3:18 PM Paul Moore <paul@paul-moore.com> wrote:
> > On Fri, Sep 15, 2023 at 4:59 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > > On Thu, Sep 14, 2023 at 5:55 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > On Thu, Sep 14, 2023 at 1:31 PM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > On Wed, Sep 13, 2023 at 2:46 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > > >
> > > > > > On Sep 12, 2023 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

...

> > > > > > I mentioned this a while back, likely in the other threads where this
> > > > > > token-based approach was only being discussed in general terms, but I
> > > > > > think we want to have a LSM hook at the point of initial token
> > > > > > delegation for this and a hook when the token is used.  My initial
> > > > > > thinking is that we should be able to address the former with a hook
> > > > > > in bpf_fill_super() and the latter either in bpf_token_get_from_fd()
> > > > > > or bpf_token_allow_XXX(); bpf_token_get_from_fd() would be simpler,
> > > > > > but it doesn't allow for much in the way of granularity.  Inserting the
> > > > > > LSM hooks in bpf_token_allow_XXX() would also allow the BPF code to fall
> > > > > > gracefully fallback to the system-wide checks if the LSM denied the
> > > > > > requested access whereas an access denial in bpf_token_get_from_fd()
> > > > > > denial would cause the operation to error out.
> > > > >
> > > > > I think the bpf_fill_super() LSM hook makes sense, but I thought
> > > > > someone mentioned that we already have some generic LSM hook for
> > > > > validating mounts? If we don't, I can certainly add one for BPF FS
> > > > > specifically.
> > > >
> > > > We do have security_sb_mount(), but that is a generic mount operation
> > > > access control and not well suited for controlling the mount-based
> > > > capability delegation that you are proposing here.  However, if you or
> > > > someone else has a clever way to make security_sb_mount() work for
> > > > this purpose I would be very happy to review that code.
> > >
> > > To be honest, I'm a bit out of my depth here, as I don't know the
> > > mounting parts well. Perhaps someone from VFS side can advise. But
> > > regardless, I have no problem adding a new LSM hook as well, ideally
> > > not very BPF-specific. If you have a specific form of it in mind, I'd
> > > be curious to see it and implement it.
> >
> > I agree that there can be benefits to generalized LSM hooks, but in
> > this hook I think it may need to be BPF specific simply because the
> > hook would be dealing with the specific concept of delegating BPF
> > permissions.
>
> Sure. As an alternative, if this is about controlling BPF delegation,
> instead of doing mount-time checks and LSM hook, perhaps we can add a
> new LSM hook to BPF_CREATE_TOKEN, just like we have ones for
> BPF_MAP_CREATE and BPF_PROG_LOAD. That will enable controlling
> delegation more directly when it is actually attempted to be used.

I'm also going to reply to the v6 patchset, but I thought there were
some important points in this thread that were worth responding to
here so that it would have the context of our previous discussion.

So yes, from an LSM perspective we are concerned with who grants the
delegation (creates the token) and who leverages that token to do
work.  When this patchset was still using anonymous inodes, marking
and controlling token creation was relatively easy as we have existing
hooks/control-points for anonymous inodes which take into account the
anonymous inode class/type, e.g. bpffs.  Now that this patchset is
using a regular bpffs inode we may need to do some additional work so
that we can mark the bpffs token inode as a "token" so that we can
later distinguish it from an ordinary bpffs inode; it might also serve
as a convenient place to control creation of the token, but as you
have already mentioned we could also control this from the existing
security_bpf(BPF_CREATE_TOKEN, ...) hook at the top of __sys_bpf().

Anyway, more on this in the v6 patchset.

> > I haven't taken the time to write up any hook patches yet as I wanted
> > to discuss it with you and the others on the To/CC line, but it seems
> > like we are roughly on the same page, at least with the initial
> > delegation hook, so I can put something together if you aren't
> > comfortable working on this (more on this below) ...
>
> I'd appreciate the help from the SELinux side specifically, yes. I'm
> absolutely OK to add a few new LSM hooks, though.

I just want to say again that I'm very happy we can work together to
make sure everything is covered :)

> > > > > As for the bpf_token_allow_xxx(). This feels a bit too specific and
> > > > > narrow-focused. What if we later add yet another dimension for BPF FS
> > > > > and token? Do we need to introduce yet another LSM for each such case?
> > > >
> > > > [I'm assuming you meant new LSM *hook*]
> > >
> > > yep, of course, sorry about using terminology sloppily
> > >
> > > > Possibly.  There are also some other issues which I've been thinking
> > > > about along these lines, specifically the fact that the
> > > > capability/command delegation happens after the existing
> > > > security_bpf() hook is called which makes things rather awkward from a
> > > > LSM perspective: the LSM would first need to allow the process access
> > > > to the desired BPF op using it's current LSM specific security
> > > > attributes (e.g. SELinux security domain, etc.) and then later
> > > > consider the op in the context of the delegated access control rights
> > > > (if the LSM decides to support those hooks).
> > > >
> > > > I suspect that if we want to make this practical we would need to
> > > > either move some of the token code up into __sys_bpf() so we could
> > > > have a better interaction with security_bpf(), or we need to consider
> > > > moving the security_bpf() call into the op specific functions.  I'm
> > > > still thinking on this (lots of reviews to get through this week), but
> > > > I'm hoping there is a better way because I'm not sure I like either
> > > > option very much.
> > >
> > > Yes, security_bpf() is happening extremely early and is lacking a lot
> > > of context. I'm not sure if moving it around is a good idea as it
> > > basically changes its semantics.
> >
> > There are a couple of things that make this not quite as scary as it
> > may seem.  The first is that currently only SELinux implements a
> > security_bpf() hook and the implementation is rather simplistic in
> > terms of what information it requires to perform the existing access
> > controls; decomposing the single security_bpf() call site into
> > multiple op specific calls, perhaps with some op specific hooks,
> > should be doable without causing major semantic changes.  The second
> > thing is that we could augment the existing security_bpf() hook and
> > call site with a new LSM hook(s) that are called from the op specific
> > call sites; this would allow those LSMs that desire the current
> > semantics to use the existing security_bpf() hook and those that wish
> > to use the new semantics could implement the new hook(s).  This is
> > very similar to the pathname-based and inode-based hooks in the VFS
> > layer, some LSMs choose to implement pathname-based security and use
> > one set of hooks, while others implement a label-based security
> > mechanism and use a different set of hooks.
>
> Agreed. I think new LSM hooks that are operation-specific make a lot
> of sense. I'd probably not touch existing security_bpf(), it's an
> early-entry LSM hook for anything bpf() syscall-specific. This might
> be very useful in some cases, probably.
>
> > > But adding a new set of coherent LSM
> > > hooks per each appropriate BPF operation with good context to make
> > > decisions sounds like a good improvement. E.g., for BPF_PROG_LOAD, we
> > > can have LSM hook after struct bpf_prog is allocated, bpf_token is
> > > available, attributes are sanity checked. All that together is a very
> > > useful and powerful context that can be used both by more fixed LSM
> > > policies (like SELinux), and very dynamic user-defined BPF LSM
> > > programs.
> >
> > This is where it is my turn to mention that I'm getting a bit out of
> > my depth, but I'm hopeful that the two of us can keep each other from
> > drowning :)
> >
> > Typically the LSM hook call sites end up being in the same general
> > area as the capability checks, usually just after (we want the normal
> > Linux discretionary access controls to always come first for the sake
> > of consistency).  Sticking with that approach it looks like we would
> > end up with a LSM call in bpf_prog_load() right after bpf_capable()
> > call, the only gotcha with that is the bpf_prog struct isn't populated
> > yet, but how important is that when we have the bpf_attr info (honest
> > question, I don't know the answer to this)?
>
> Ok, so I agree in general about having LSM hooks close to capability
> checks, but at least specifically for BPF_PROG_CREATE, it won't work.
> This bpf_capable() check you mention. This is just one check. If you
> look into bpf_prog_load() in kernel/bpf/syscall.c, you'll see that we
> can also check CAP_PERFMON, CAP_NET_ADMIN, and CAP_SYS_ADMIN, in
> addition to CAP_BPF, based on various aspects (like program type +
> subtype).

That's a fair point.

> So for such a complex BPF_PROG_CREATE operation I think we
> should deviate a bit and place LSM in a logical place that would
> enable doing LSM enforcement with lots of relevant information, but
> before doing anything dangerous or expensive.
>
> For BPF_PROG_LOAD that place seems to be right before bpf_check(),
> which is BPF verification ...

> ... Right now we have `security_bpf_prog_alloc(prog->aux);`, which is
> almost in the ideal place, but provides prog->aux instead of program
> itself (not sure why), and doesn't provide bpf_attr and bpf_token.
>
> So I'm thinking that maybe we get rid of bpf_prog_alloc() in favor of
> new security_bpf_prog_load(prog, &attr, token)?

That sounds reasonable.  We'll need to make sure we update the docs
for that LSM hook to indicate that it performs both allocation of the
LSM's BPF program state (it's current behavior), as well as access
control for BPF program loads both with and without delegation.

I think those are the big points worth wrapping up here in this
thread, I'll move the rest over to the v6 patchset.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 2/12] bpf: introduce BPF token object
  2023-10-10 21:19               ` Paul Moore
@ 2023-10-12  0:32                 ` Andrii Nakryiko
  0 siblings, 0 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2023-10-12  0:32 UTC (permalink / raw)
  To: Paul Moore
  Cc: Andrii Nakryiko, bpf, linux-fsdevel, linux-security-module,
	keescook, brauner, lennart, kernel-team, sargun, selinux

On Tue, Oct 10, 2023 at 2:20 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Fri, Sep 22, 2023 at 6:35 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > On Thu, Sep 21, 2023 at 3:18 PM Paul Moore <paul@paul-moore.com> wrote:
> > > On Fri, Sep 15, 2023 at 4:59 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > > On Thu, Sep 14, 2023 at 5:55 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > > On Thu, Sep 14, 2023 at 1:31 PM Andrii Nakryiko
> > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > > On Wed, Sep 13, 2023 at 2:46 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > > > >
> > > > > > > On Sep 12, 2023 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> ...
>
> > > > > > > I mentioned this a while back, likely in the other threads where this
> > > > > > > token-based approach was only being discussed in general terms, but I
> > > > > > > think we want to have a LSM hook at the point of initial token
> > > > > > > delegation for this and a hook when the token is used.  My initial
> > > > > > > thinking is that we should be able to address the former with a hook
> > > > > > > in bpf_fill_super() and the latter either in bpf_token_get_from_fd()
> > > > > > > or bpf_token_allow_XXX(); bpf_token_get_from_fd() would be simpler,
> > > > > > > but it doesn't allow for much in the way of granularity.  Inserting the
> > > > > > > LSM hooks in bpf_token_allow_XXX() would also allow the BPF code to fall
> > > > > > > gracefully fallback to the system-wide checks if the LSM denied the
> > > > > > > requested access whereas an access denial in bpf_token_get_from_fd()
> > > > > > > denial would cause the operation to error out.
> > > > > >
> > > > > > I think the bpf_fill_super() LSM hook makes sense, but I thought
> > > > > > someone mentioned that we already have some generic LSM hook for
> > > > > > validating mounts? If we don't, I can certainly add one for BPF FS
> > > > > > specifically.
> > > > >
> > > > > We do have security_sb_mount(), but that is a generic mount operation
> > > > > access control and not well suited for controlling the mount-based
> > > > > capability delegation that you are proposing here.  However, if you or
> > > > > someone else has a clever way to make security_sb_mount() work for
> > > > > this purpose I would be very happy to review that code.
> > > >
> > > > To be honest, I'm a bit out of my depth here, as I don't know the
> > > > mounting parts well. Perhaps someone from VFS side can advise. But
> > > > regardless, I have no problem adding a new LSM hook as well, ideally
> > > > not very BPF-specific. If you have a specific form of it in mind, I'd
> > > > be curious to see it and implement it.
> > >
> > > I agree that there can be benefits to generalized LSM hooks, but in
> > > this hook I think it may need to be BPF specific simply because the
> > > hook would be dealing with the specific concept of delegating BPF
> > > permissions.
> >
> > Sure. As an alternative, if this is about controlling BPF delegation,
> > instead of doing mount-time checks and LSM hook, perhaps we can add a
> > new LSM hook to BPF_CREATE_TOKEN, just like we have ones for
> > BPF_MAP_CREATE and BPF_PROG_LOAD. That will enable controlling
> > delegation more directly when it is actually attempted to be used.
>
> I'm also going to reply to the v6 patchset, but I thought there were
> some important points in this thread that were worth responding to
> here so that it would have the context of our previous discussion.
>
> So yes, from an LSM perspective we are concerned with who grants the
> delegation (creates the token) and who leverages that token to do
> work.  When this patchset was still using anonymous inodes, marking
> and controlling token creation was relatively easy as we have existing
> hooks/control-points for anonymous inodes which take into account the
> anonymous inode class/type, e.g. bpffs.  Now that this patchset is
> using a regular bpffs inode we may need to do some additional work so
> that we can mark the bpffs token inode as a "token" so that we can
> later distinguish it from an ordinary bpffs inode; it might also serve
> as a convenient place to control creation of the token, but as you
> have already mentioned we could also control this from the existing
> security_bpf(BPF_CREATE_TOKEN, ...) hook at the top of __sys_bpf().
>
> Anyway, more on this in the v6 patchset.
>
> > > I haven't taken the time to write up any hook patches yet as I wanted
> > > to discuss it with you and the others on the To/CC line, but it seems
> > > like we are roughly on the same page, at least with the initial
> > > delegation hook, so I can put something together if you aren't
> > > comfortable working on this (more on this below) ...
> >
> > I'd appreciate the help from the SELinux side specifically, yes. I'm
> > absolutely OK to add a few new LSM hooks, though.
>
> I just want to say again that I'm very happy we can work together to
> make sure everything is covered :)

Likewise! I ran out of time today to finish all the requested changes,
but hopefully I will be able to post a new version tomorrow with all
the feedback applied.

>
> > > > > > As for the bpf_token_allow_xxx(). This feels a bit too specific and
> > > > > > narrow-focused. What if we later add yet another dimension for BPF FS
> > > > > > and token? Do we need to introduce yet another LSM for each such case?
> > > > >
> > > > > [I'm assuming you meant new LSM *hook*]
> > > >
> > > > yep, of course, sorry about using terminology sloppily
> > > >
> > > > > Possibly.  There are also some other issues which I've been thinking
> > > > > about along these lines, specifically the fact that the
> > > > > capability/command delegation happens after the existing
> > > > > security_bpf() hook is called which makes things rather awkward from a
> > > > > LSM perspective: the LSM would first need to allow the process access
> > > > > to the desired BPF op using it's current LSM specific security
> > > > > attributes (e.g. SELinux security domain, etc.) and then later
> > > > > consider the op in the context of the delegated access control rights
> > > > > (if the LSM decides to support those hooks).
> > > > >
> > > > > I suspect that if we want to make this practical we would need to
> > > > > either move some of the token code up into __sys_bpf() so we could
> > > > > have a better interaction with security_bpf(), or we need to consider
> > > > > moving the security_bpf() call into the op specific functions.  I'm
> > > > > still thinking on this (lots of reviews to get through this week), but
> > > > > I'm hoping there is a better way because I'm not sure I like either
> > > > > option very much.
> > > >
> > > > Yes, security_bpf() is happening extremely early and is lacking a lot
> > > > of context. I'm not sure if moving it around is a good idea as it
> > > > basically changes its semantics.
> > >
> > > There are a couple of things that make this not quite as scary as it
> > > may seem.  The first is that currently only SELinux implements a
> > > security_bpf() hook and the implementation is rather simplistic in
> > > terms of what information it requires to perform the existing access
> > > controls; decomposing the single security_bpf() call site into
> > > multiple op specific calls, perhaps with some op specific hooks,
> > > should be doable without causing major semantic changes.  The second
> > > thing is that we could augment the existing security_bpf() hook and
> > > call site with a new LSM hook(s) that are called from the op specific
> > > call sites; this would allow those LSMs that desire the current
> > > semantics to use the existing security_bpf() hook and those that wish
> > > to use the new semantics could implement the new hook(s).  This is
> > > very similar to the pathname-based and inode-based hooks in the VFS
> > > layer, some LSMs choose to implement pathname-based security and use
> > > one set of hooks, while others implement a label-based security
> > > mechanism and use a different set of hooks.
> >
> > Agreed. I think new LSM hooks that are operation-specific make a lot
> > of sense. I'd probably not touch existing security_bpf(), it's an
> > early-entry LSM hook for anything bpf() syscall-specific. This might
> > be very useful in some cases, probably.
> >
> > > > But adding a new set of coherent LSM
> > > > hooks per each appropriate BPF operation with good context to make
> > > > decisions sounds like a good improvement. E.g., for BPF_PROG_LOAD, we
> > > > can have LSM hook after struct bpf_prog is allocated, bpf_token is
> > > > available, attributes are sanity checked. All that together is a very
> > > > useful and powerful context that can be used both by more fixed LSM
> > > > policies (like SELinux), and very dynamic user-defined BPF LSM
> > > > programs.
> > >
> > > This is where it is my turn to mention that I'm getting a bit out of
> > > my depth, but I'm hopeful that the two of us can keep each other from
> > > drowning :)
> > >
> > > Typically the LSM hook call sites end up being in the same general
> > > area as the capability checks, usually just after (we want the normal
> > > Linux discretionary access controls to always come first for the sake
> > > of consistency).  Sticking with that approach it looks like we would
> > > end up with a LSM call in bpf_prog_load() right after bpf_capable()
> > > call, the only gotcha with that is the bpf_prog struct isn't populated
> > > yet, but how important is that when we have the bpf_attr info (honest
> > > question, I don't know the answer to this)?
> >
> > Ok, so I agree in general about having LSM hooks close to capability
> > checks, but at least specifically for BPF_PROG_CREATE, it won't work.
> > This bpf_capable() check you mention. This is just one check. If you
> > look into bpf_prog_load() in kernel/bpf/syscall.c, you'll see that we
> > can also check CAP_PERFMON, CAP_NET_ADMIN, and CAP_SYS_ADMIN, in
> > addition to CAP_BPF, based on various aspects (like program type +
> > subtype).
>
> That's a fair point.
>
> > So for such a complex BPF_PROG_CREATE operation I think we
> > should deviate a bit and place LSM in a logical place that would
> > enable doing LSM enforcement with lots of relevant information, but
> > before doing anything dangerous or expensive.
> >
> > For BPF_PROG_LOAD that place seems to be right before bpf_check(),
> > which is BPF verification ...
>
> > ... Right now we have `security_bpf_prog_alloc(prog->aux);`, which is
> > almost in the ideal place, but provides prog->aux instead of program
> > itself (not sure why), and doesn't provide bpf_attr and bpf_token.
> >
> > So I'm thinking that maybe we get rid of bpf_prog_alloc() in favor of
> > new security_bpf_prog_load(prog, &attr, token)?
>
> That sounds reasonable.  We'll need to make sure we update the docs
> for that LSM hook to indicate that it performs both allocation of the
> LSM's BPF program state (it's current behavior), as well as access
> control for BPF program loads both with and without delegation.

Heh, I asked about this distinction between "allocation of LSM state"
and "access control" in another email thread, but based on this it
seems like it's purely convention based and it's OK to do both with
the same hook, right? If that's the case, then I think we should
combine what you proposed as two hooks, security_bpf_token_alloc() and
security_bpf_token_create(), into one hook to minimize mental
overhead. But if I'm missing something, please point it out on the
patch.

>
> I think those are the big points worth wrapping up here in this
> thread, I'll move the rest over to the v6 patchset.

It all makes sense and I intend to add all those hooks and do
refactoring of existing map/prog ones. I will put that into a separate
patch in the series for a more focused review.



>
> --
> paul-moore.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2023-10-12  0:33 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-12 21:28 [PATCH v4 bpf-next 00/12] BPF token and BPF FS-based delegation Andrii Nakryiko
2023-09-12 21:28 ` [PATCH v4 bpf-next 01/12] bpf: add BPF token delegation mount options to BPF FS Andrii Nakryiko
2023-09-12 21:28 ` [PATCH v4 bpf-next 02/12] bpf: introduce BPF token object Andrii Nakryiko
2023-09-12 21:46   ` Andrii Nakryiko
2023-09-13 21:46   ` [PATCH v4 2/12] " Paul Moore
2023-09-14 17:31     ` Andrii Nakryiko
2023-09-15  0:55       ` Paul Moore
2023-09-15 20:59         ` Andrii Nakryiko
2023-09-21 22:18           ` Paul Moore
2023-09-22  9:27             ` Paul Moore
2023-09-22 22:35             ` Andrii Nakryiko
2023-09-26 21:32               ` Paul Moore
2023-10-10 21:19               ` Paul Moore
2023-10-12  0:32                 ` Andrii Nakryiko
2023-09-12 21:28 ` [PATCH v4 bpf-next 03/12] bpf: add BPF token support to BPF_MAP_CREATE command Andrii Nakryiko
2023-09-12 21:28 ` [PATCH v4 bpf-next 04/12] bpf: add BPF token support to BPF_BTF_LOAD command Andrii Nakryiko
2023-09-12 21:28 ` [PATCH v4 bpf-next 05/12] bpf: add BPF token support to BPF_PROG_LOAD command Andrii Nakryiko
2023-09-12 21:29 ` [PATCH v4 bpf-next 06/12] bpf: take into account BPF token when fetching helper protos Andrii Nakryiko
2023-09-13  9:45   ` kernel test robot
2023-09-13 18:41   ` kernel test robot
2023-09-12 21:29 ` [PATCH v4 bpf-next 07/12] bpf: consistenly use BPF token throughout BPF verifier logic Andrii Nakryiko
2023-09-13 22:15   ` kernel test robot
2023-09-12 21:29 ` [PATCH v4 bpf-next 08/12] libbpf: add bpf_token_create() API Andrii Nakryiko
2023-09-12 21:42   ` Andrii Nakryiko
2023-09-12 21:29 ` [PATCH v4 bpf-next 09/12] libbpf: add BPF token support to bpf_map_create() API Andrii Nakryiko
2023-09-12 21:29 ` [PATCH v4 bpf-next 10/12] libbpf: add BPF token support to bpf_btf_load() API Andrii Nakryiko
2023-09-12 21:29 ` [PATCH v4 bpf-next 11/12] libbpf: add BPF token support to bpf_prog_load() API Andrii Nakryiko
2023-09-12 21:29 ` [PATCH v4 bpf-next 12/12] selftests/bpf: add BPF token-enabled tests Andrii Nakryiko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.