netdev.vger.kernel.org archive mirror
* [PATCH bpf-next v9 00/10] bpf: cgroup_sock lsm flavor
@ 2022-06-10 16:57 Stanislav Fomichev
  2022-06-10 16:57 ` [PATCH bpf-next v9 01/10] bpf: add bpf_func_t and trampoline helpers Stanislav Fomichev
                   ` (9 more replies)
  0 siblings, 10 replies; 31+ messages in thread
From: Stanislav Fomichev @ 2022-06-10 16:57 UTC (permalink / raw)
  To: netdev, bpf
  Cc: ast, daniel, andrii, Stanislav Fomichev, kafai, kpsingh, jakub

This series implements a new LSM flavor for attaching per-cgroup
programs to existing LSM hooks. The cgroup is taken from 'current',
unless the first argument of the hook is 'struct socket', in which
case the cgroup association is taken from the socket. The attachment
looks like a regular per-cgroup attachment: we add a new BPF_LSM_CGROUP
attach type which, together with attach_btf_id, signals per-cgroup LSM.
Behind the scenes, we allocate a trampoline shim program and
attach it to the LSM hook. This program looks up the cgroup from
current/socket and runs the cgroup's effective prog array. The rest
of the per-cgroup BPF machinery stays the same: hierarchy, local
storage, retval conventions (return 1 == success).
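
For illustration, here is a minimal sketch of what the BPF side of such
an attachment might look like. The SEC("lsm_cgroup/...") convention and
the retval semantics come from later patches in this series; the hook
and program names are just an example:

  // SPDX-License-Identifier: GPL-2.0
  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  /* Attached with BPF_LSM_CGROUP; the lsm hook (socket_bind) is
   * signaled via attach_btf_id. At run time the shim program resolves
   * the cgroup from the socket argument and runs the effective array.
   */
  SEC("lsm_cgroup/socket_bind")
  int BPF_PROG(allow_bind, struct socket *sock, struct sockaddr *address,
               int addrlen)
  {
          return 1; /* 1 == success/allow, per the cgroup retval convention */
  }

  char _license[] SEC("license") = "GPL";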

Current limitations:
* haven't considered sleepable bpf; can be extended later on
* not sure the verifier does the right thing with null checks;
  see the latest selftest for details
* a total of 10 (global) per-cgroup LSM attach points

Cc: ast@kernel.org
Cc: daniel@iogearbox.net
Cc: kafai@fb.com
Cc: kpsingh@kernel.org
Cc: jakub@cloudflare.com

v9:
The major change since the last version is the switch to bpf_setsockopt
for changing the socket state instead of letting the progs poke at the
socket directly. This, in turn, highlights the challenge that we need
to care about whether the socket is locked or not when we call
bpf_setsockopt. (With my original example selftest, the hooks run early
enough in the init phase for this not to matter.)

For now, I've added two btf id lists:
* hooks where we know the socket is locked and it's safe to call bpf_setsockopt
* hooks where we know the socket is _not_ locked, but the hook works on
  a socket that's not yet exposed to userspace, so it should be safe
  (for this mode, a special new set of bpf_{s,g}etsockopt helpers
   is added; they don't have the sock_owned_by_me check)
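
For reference, such lists would presumably be expressed with the usual
BTF ID set macros. The set names and members below are illustrative
only; the actual definitions live in the corresponding patch of this
series:

  /* hooks where the socket is known to be locked */
  BTF_SET_START(bpf_lsm_locked_sockopt_hooks)
  BTF_ID(func, bpf_lsm_socket_sock_rcv_skb)
  BTF_SET_END(bpf_lsm_locked_sockopt_hooks)

  /* hooks where the socket is unlocked but not yet visible to userspace */
  BTF_SET_START(bpf_lsm_unlocked_sockopt_hooks)
  BTF_ID(func, bpf_lsm_socket_post_create)
  BTF_SET_END(bpf_lsm_unlocked_sockopt_hooks)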

Going forward, for the rest of the hooks, this might be a good motivation
to expand lsm cgroup to support sleepable bpf and allow the callers to
lock/unlock sockets, or to add a new bpf_setsockopt variant that does the
locking.

- ifdef around cleanup in cgroup_bpf_release
- Andrii: a few nits in libbpf patches
- Martin: remove unused btf_id_set_index
- Martin: bring back refcnt for cgroup_atype
- Martin: make __cgroup_bpf_query a bit more readable
- Martin: expose dst_prog->aux->attach_btf as attach_btf_obj_id as well
- Martin: reorg check_return_code path for BPF_LSM_CGROUP
- Martin: return directly from check_helper_call (instead of goto err)
- Martin: add note to new warning in check_return_code, print only for void hooks
- Martin: remove confusing shim reuse
- Martin: use bpf_{s,g}etsockopt instead of poking into socket data
- Martin: use CONFIG_CGROUP_BPF in bpf_prog_alloc_no_stats/bpf_prog_free_deferred

v8:
- CI: fix compile issue
- CI: fix broken bpf_cookie
- Yonghong: remove __bpf_trampoline_unlink_prog comment
- Yonghong: move cgroup_atype around to fill the gap
- Yonghong: make bpf_lsm_find_cgroup_shim void
- Yonghong: rename regs to args
- Yonghong: remove if(current) check
- Martin: move refcnt into bpf_link
- Martin: move shim management to bpf_link ops
- Martin: use cgroup_atype for shim only
- Martin: go back to arrays for managing cgroup_atype(s)
- Martin: export bpf_obj_id(aux->attach_btf)
- Andrii: reorder SEC_DEF("lsm_cgroup+")
- Andrii: OPTS_SET instead of OPTS_HAS
- Andrii: rename attach_btf_func_id
- Andrii: move into 1.0 map

v7:
- there were a lot of comments last time; hope I didn't forget anything.
  Some of the bigger ones:
  - Martin: use/extend BTF_SOCK_TYPE_SOCKET
  - Martin: expose bpf_set_retval
  - Martin: reject 'return 0' at the verifier for 'void' hooks
  - Martin: prog_query returns all BPF_LSM_CGROUP, prog_info
    returns attach_btf_func_id
  - Andrii: split libbpf changes
  - Andrii: add field access test to test_progs, not test_verifier (still
    using asm though)
- things that I haven't addressed, stating them here explicitly; let
  me know if some of these are still problematic:
  1. Andrii: exposing only a link-based api: the changes to support
     non-link-based attachment seem minimal, a couple of lines, so it
     seems worth having?
  2. Alexei: applying cgroup_atype for all cgroup hooks, not only
     cgroup lsm: this looks a bit harder to apply everywhere than I
     originally thought; with lsm cgroup, we have a shim_prog pointer
     where we store cgroup_atype; for non-lsm programs, we don't have a
     trace program in which to store it, so we still need some kind
     of global table to map from a "static" hook to a "dynamic" slot.
     So I'm dropping this "can be easily extended" clause from the
     description for now. I have converted this whole machinery
     to an RCU-managed list to remove synchronize_rcu().
- also note that I had to introduce a new bpf_shim_tramp_link and
  moved the refcnt there; we need something to manage the new bpf_tramp_link

v6:
- remove active count & stats for shim program (Martin KaFai Lau)
- remove NULL/error check for btf_vmlinux (Martin)
- don't check cgroup_atype in bpf_cgroup_lsm_shim_release (Martin)
- use old_prog (instead of passed one) in __cgroup_bpf_detach (Martin)
- make sure attach_btf_id is the same in __cgroup_bpf_replace (Martin)
- enable cgroup local storage and test it (Martin)
- properly implement prog query and add bpftool & tests (Martin)
- prohibit non-shared cgroup storage mode for BPF_LSM_CGROUP (Martin)

v5:
- __cgroup_bpf_run_lsm_socket remove NULL sock/sk checks (Martin KaFai Lau)
- __cgroup_bpf_run_lsm_{socket,current} s/prog/shim_prog/ (Martin)
- make sure bpf_lsm_find_cgroup_shim works for hooks without args (Martin)
- __cgroup_bpf_attach make sure attach_btf_id is the same when replacing (Martin)
- call bpf_cgroup_lsm_shim_release only for LSM_CGROUP (Martin)
- drop BPF_LSM_CGROUP from bpf_attach_type_to_tramp (Martin)
- drop jited check from cgroup_shim_find (Martin)
- new patch to convert cgroup_bpf to hlist_node (Jakub Sitnicki)
- new shim flavor for 'struct sock' + list of exceptions (Martin)

v4:
- fix build when jit is on but syscall is off

v3:
- add BPF_LSM_CGROUP to bpftool
- use simple int instead of refcnt_t (to avoid use-after-free
  false positive)

v2:
- addressed build bot failures

Stanislav Fomichev (10):
  bpf: add bpf_func_t and trampoline helpers
  bpf: convert cgroup_bpf.progs to hlist
  bpf: per-cgroup lsm flavor
  bpf: minimize number of allocated lsm slots per program
  bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP
  bpf: expose bpf_{g,s}etsockopt to lsm cgroup
  libbpf: add lsm_cgroup_sock type
  libbpf: implement bpf_prog_query_opts
  bpftool: implement cgroup tree for BPF_LSM_CGROUP
  selftests/bpf: lsm_cgroup functional test

 arch/x86/net/bpf_jit_comp.c                   |  24 +-
 include/linux/bpf-cgroup-defs.h               |  13 +-
 include/linux/bpf-cgroup.h                    |   9 +-
 include/linux/bpf.h                           |  39 +-
 include/linux/bpf_lsm.h                       |   7 +
 include/linux/btf_ids.h                       |   3 +-
 include/uapi/linux/bpf.h                      |   4 +
 kernel/bpf/bpf_lsm.c                          |  83 ++++
 kernel/bpf/btf.c                              |   1 +
 kernel/bpf/cgroup.c                           | 360 ++++++++++++++----
 kernel/bpf/core.c                             |   9 +
 kernel/bpf/syscall.c                          |  18 +-
 kernel/bpf/trampoline.c                       | 262 +++++++++++--
 kernel/bpf/verifier.c                         |  32 ++
 net/core/filter.c                             |  60 ++-
 tools/bpf/bpftool/cgroup.c                    |  80 ++--
 tools/include/linux/btf_ids.h                 |   4 +-
 tools/include/uapi/linux/bpf.h                |   4 +
 tools/lib/bpf/bpf.c                           |  38 +-
 tools/lib/bpf/bpf.h                           |  15 +
 tools/lib/bpf/libbpf.c                        |   3 +
 tools/lib/bpf/libbpf.map                      |   1 +
 .../selftests/bpf/prog_tests/lsm_cgroup.c     | 277 ++++++++++++++
 .../selftests/bpf/progs/bpf_tracing_net.h     |   1 +
 .../testing/selftests/bpf/progs/lsm_cgroup.c  | 180 +++++++++
 25 files changed, 1369 insertions(+), 158 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/lsm_cgroup.c
 create mode 100644 tools/testing/selftests/bpf/progs/lsm_cgroup.c

-- 
2.36.1.476.g0c4daa206d-goog



* [PATCH bpf-next v9 01/10] bpf: add bpf_func_t and trampoline helpers
  2022-06-10 16:57 [PATCH bpf-next v9 00/10] bpf: cgroup_sock lsm flavor Stanislav Fomichev
@ 2022-06-10 16:57 ` Stanislav Fomichev
  2022-06-16 19:53   ` Martin KaFai Lau
  2022-06-10 16:57 ` [PATCH bpf-next v9 02/10] bpf: convert cgroup_bpf.progs to hlist Stanislav Fomichev
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Stanislav Fomichev @ 2022-06-10 16:57 UTC (permalink / raw)
  To: netdev, bpf; +Cc: ast, daniel, andrii, Stanislav Fomichev

I'll be adding lsm cgroup specific helpers that need to grab the
trampoline mutex themselves, so split the link/unlink helpers into
locking wrappers and lock-free __ variants.

No functional changes.
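
For context, a rough sketch of how the lock-free variants end up being
used by the cgroup shim code later in this series (error handling
elided):

  /* caller manages tr->mutex and calls the __ variant directly */
  mutex_lock(&tr->mutex);
  err = __bpf_trampoline_link_prog(&shim_link->link, tr);
  mutex_unlock(&tr->mutex);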

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf.h     | 11 ++++---
 kernel/bpf/trampoline.c | 63 +++++++++++++++++++++--------------------
 2 files changed, 38 insertions(+), 36 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8e6092d0ea95..b0af68b6297b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -53,6 +53,8 @@ typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
 typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
 					struct bpf_iter_aux_info *aux);
 typedef void (*bpf_iter_fini_seq_priv_t)(void *private_data);
+typedef unsigned int (*bpf_func_t)(const void *,
+				   const struct bpf_insn *);
 struct bpf_iter_seq_info {
 	const struct seq_operations *seq_ops;
 	bpf_iter_init_seq_priv_t init_seq_private;
@@ -863,8 +865,7 @@ struct bpf_dispatcher {
 static __always_inline __nocfi unsigned int bpf_dispatcher_nop_func(
 	const void *ctx,
 	const struct bpf_insn *insnsi,
-	unsigned int (*bpf_func)(const void *,
-				 const struct bpf_insn *))
+	bpf_func_t bpf_func)
 {
 	return bpf_func(ctx, insnsi);
 }
@@ -893,8 +894,7 @@ int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs);
 	noinline __nocfi unsigned int bpf_dispatcher_##name##_func(	\
 		const void *ctx,					\
 		const struct bpf_insn *insnsi,				\
-		unsigned int (*bpf_func)(const void *,			\
-					 const struct bpf_insn *))	\
+		bpf_func_t bpf_func)					\
 	{								\
 		return bpf_func(ctx, insnsi);				\
 	}								\
@@ -905,8 +905,7 @@ int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs);
 	unsigned int bpf_dispatcher_##name##_func(			\
 		const void *ctx,					\
 		const struct bpf_insn *insnsi,				\
-		unsigned int (*bpf_func)(const void *,			\
-					 const struct bpf_insn *));	\
+		bpf_func_t bpf_func);					\
 	extern struct bpf_dispatcher bpf_dispatcher_##name;
 #define BPF_DISPATCHER_FUNC(name) bpf_dispatcher_##name##_func
 #define BPF_DISPATCHER_PTR(name) (&bpf_dispatcher_##name)
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 93c7675f0c9e..5466e15be61f 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -410,7 +410,7 @@ static enum bpf_tramp_prog_type bpf_attach_type_to_tramp(struct bpf_prog *prog)
 	}
 }
 
-int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr)
+static int __bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr)
 {
 	enum bpf_tramp_prog_type kind;
 	struct bpf_tramp_link *link_exiting;
@@ -418,44 +418,33 @@ int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline
 	int cnt = 0, i;
 
 	kind = bpf_attach_type_to_tramp(link->link.prog);
-	mutex_lock(&tr->mutex);
-	if (tr->extension_prog) {
+	if (tr->extension_prog)
 		/* cannot attach fentry/fexit if extension prog is attached.
 		 * cannot overwrite extension prog either.
 		 */
-		err = -EBUSY;
-		goto out;
-	}
+		return -EBUSY;
 
 	for (i = 0; i < BPF_TRAMP_MAX; i++)
 		cnt += tr->progs_cnt[i];
 
 	if (kind == BPF_TRAMP_REPLACE) {
 		/* Cannot attach extension if fentry/fexit are in use. */
-		if (cnt) {
-			err = -EBUSY;
-			goto out;
-		}
+		if (cnt)
+			return -EBUSY;
 		tr->extension_prog = link->link.prog;
-		err = bpf_arch_text_poke(tr->func.addr, BPF_MOD_JUMP, NULL,
-					 link->link.prog->bpf_func);
-		goto out;
-	}
-	if (cnt >= BPF_MAX_TRAMP_LINKS) {
-		err = -E2BIG;
-		goto out;
+		return bpf_arch_text_poke(tr->func.addr, BPF_MOD_JUMP, NULL,
+					  link->link.prog->bpf_func);
 	}
-	if (!hlist_unhashed(&link->tramp_hlist)) {
+	if (cnt >= BPF_MAX_TRAMP_LINKS)
+		return -E2BIG;
+	if (!hlist_unhashed(&link->tramp_hlist))
 		/* prog already linked */
-		err = -EBUSY;
-		goto out;
-	}
+		return -EBUSY;
 	hlist_for_each_entry(link_exiting, &tr->progs_hlist[kind], tramp_hlist) {
 		if (link_exiting->link.prog != link->link.prog)
 			continue;
 		/* prog already linked */
-		err = -EBUSY;
-		goto out;
+		return -EBUSY;
 	}
 
 	hlist_add_head(&link->tramp_hlist, &tr->progs_hlist[kind]);
@@ -465,30 +454,44 @@ int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline
 		hlist_del_init(&link->tramp_hlist);
 		tr->progs_cnt[kind]--;
 	}
-out:
+	return err;
+}
+
+int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr)
+{
+	int err;
+
+	mutex_lock(&tr->mutex);
+	err = __bpf_trampoline_link_prog(link, tr);
 	mutex_unlock(&tr->mutex);
 	return err;
 }
 
-/* bpf_trampoline_unlink_prog() should never fail. */
-int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr)
+static int __bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr)
 {
 	enum bpf_tramp_prog_type kind;
 	int err;
 
 	kind = bpf_attach_type_to_tramp(link->link.prog);
-	mutex_lock(&tr->mutex);
 	if (kind == BPF_TRAMP_REPLACE) {
 		WARN_ON_ONCE(!tr->extension_prog);
 		err = bpf_arch_text_poke(tr->func.addr, BPF_MOD_JUMP,
 					 tr->extension_prog->bpf_func, NULL);
 		tr->extension_prog = NULL;
-		goto out;
+		return err;
 	}
 	hlist_del_init(&link->tramp_hlist);
 	tr->progs_cnt[kind]--;
-	err = bpf_trampoline_update(tr);
-out:
+	return bpf_trampoline_update(tr);
+}
+
+/* bpf_trampoline_unlink_prog() should never fail. */
+int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr)
+{
+	int err;
+
+	mutex_lock(&tr->mutex);
+	err = __bpf_trampoline_unlink_prog(link, tr);
 	mutex_unlock(&tr->mutex);
 	return err;
 }
-- 
2.36.1.476.g0c4daa206d-goog



* [PATCH bpf-next v9 02/10] bpf: convert cgroup_bpf.progs to hlist
  2022-06-10 16:57 [PATCH bpf-next v9 00/10] bpf: cgroup_sock lsm flavor Stanislav Fomichev
  2022-06-10 16:57 ` [PATCH bpf-next v9 01/10] bpf: add bpf_func_t and trampoline helpers Stanislav Fomichev
@ 2022-06-10 16:57 ` Stanislav Fomichev
  2022-06-16 19:59   ` Martin KaFai Lau
  2022-06-10 16:57 ` [PATCH bpf-next v9 03/10] bpf: per-cgroup lsm flavor Stanislav Fomichev
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Stanislav Fomichev @ 2022-06-10 16:57 UTC (permalink / raw)
  To: netdev, bpf; +Cc: ast, daniel, andrii, Stanislav Fomichev, Jakub Sitnicki

This lets us reclaim some space to be used by new cgroup lsm slots.

Before:
struct cgroup_bpf {
	struct bpf_prog_array *    effective[23];        /*     0   184 */
	/* --- cacheline 2 boundary (128 bytes) was 56 bytes ago --- */
	struct list_head           progs[23];            /*   184   368 */
	/* --- cacheline 8 boundary (512 bytes) was 40 bytes ago --- */
	u32                        flags[23];            /*   552    92 */

	/* XXX 4 bytes hole, try to pack */

	/* --- cacheline 10 boundary (640 bytes) was 8 bytes ago --- */
	struct list_head           storages;             /*   648    16 */
	struct bpf_prog_array *    inactive;             /*   664     8 */
	struct percpu_ref          refcnt;               /*   672    16 */
	struct work_struct         release_work;         /*   688    32 */

	/* size: 720, cachelines: 12, members: 7 */
	/* sum members: 716, holes: 1, sum holes: 4 */
	/* last cacheline: 16 bytes */
};

After:
struct cgroup_bpf {
	struct bpf_prog_array *    effective[23];        /*     0   184 */
	/* --- cacheline 2 boundary (128 bytes) was 56 bytes ago --- */
	struct hlist_head          progs[23];            /*   184   184 */
	/* --- cacheline 5 boundary (320 bytes) was 48 bytes ago --- */
	u8                         flags[23];            /*   368    23 */

	/* XXX 1 byte hole, try to pack */

	/* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
	struct list_head           storages;             /*   392    16 */
	struct bpf_prog_array *    inactive;             /*   408     8 */
	struct percpu_ref          refcnt;               /*   416    16 */
	struct work_struct         release_work;         /*   432    72 */

	/* size: 504, cachelines: 8, members: 7 */
	/* sum members: 503, holes: 1, sum holes: 1 */
	/* last cacheline: 56 bytes */
};

Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf-cgroup-defs.h |  4 +-
 include/linux/bpf-cgroup.h      |  2 +-
 kernel/bpf/cgroup.c             | 76 +++++++++++++++++++--------------
 3 files changed, 47 insertions(+), 35 deletions(-)

diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
index 695d1224a71b..5d268e76d8e6 100644
--- a/include/linux/bpf-cgroup-defs.h
+++ b/include/linux/bpf-cgroup-defs.h
@@ -47,8 +47,8 @@ struct cgroup_bpf {
 	 * have either zero or one element
 	 * when BPF_F_ALLOW_MULTI the list can have up to BPF_CGROUP_MAX_PROGS
 	 */
-	struct list_head progs[MAX_CGROUP_BPF_ATTACH_TYPE];
-	u32 flags[MAX_CGROUP_BPF_ATTACH_TYPE];
+	struct hlist_head progs[MAX_CGROUP_BPF_ATTACH_TYPE];
+	u8 flags[MAX_CGROUP_BPF_ATTACH_TYPE];
 
 	/* list of cgroup shared storages */
 	struct list_head storages;
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 669d96d074ad..6673acfbf2ef 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -95,7 +95,7 @@ struct bpf_cgroup_link {
 };
 
 struct bpf_prog_list {
-	struct list_head node;
+	struct hlist_node node;
 	struct bpf_prog *prog;
 	struct bpf_cgroup_link *link;
 	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE];
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 7a394f7c205c..4adb4f3ecb7f 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -157,11 +157,12 @@ static void cgroup_bpf_release(struct work_struct *work)
 	mutex_lock(&cgroup_mutex);
 
 	for (atype = 0; atype < ARRAY_SIZE(cgrp->bpf.progs); atype++) {
-		struct list_head *progs = &cgrp->bpf.progs[atype];
-		struct bpf_prog_list *pl, *pltmp;
+		struct hlist_head *progs = &cgrp->bpf.progs[atype];
+		struct bpf_prog_list *pl;
+		struct hlist_node *pltmp;
 
-		list_for_each_entry_safe(pl, pltmp, progs, node) {
-			list_del(&pl->node);
+		hlist_for_each_entry_safe(pl, pltmp, progs, node) {
+			hlist_del(&pl->node);
 			if (pl->prog)
 				bpf_prog_put(pl->prog);
 			if (pl->link)
@@ -217,12 +218,12 @@ static struct bpf_prog *prog_list_prog(struct bpf_prog_list *pl)
 /* count number of elements in the list.
  * it's slow but the list cannot be long
  */
-static u32 prog_list_length(struct list_head *head)
+static u32 prog_list_length(struct hlist_head *head)
 {
 	struct bpf_prog_list *pl;
 	u32 cnt = 0;
 
-	list_for_each_entry(pl, head, node) {
+	hlist_for_each_entry(pl, head, node) {
 		if (!prog_list_prog(pl))
 			continue;
 		cnt++;
@@ -291,7 +292,7 @@ static int compute_effective_progs(struct cgroup *cgrp,
 		if (cnt > 0 && !(p->bpf.flags[atype] & BPF_F_ALLOW_MULTI))
 			continue;
 
-		list_for_each_entry(pl, &p->bpf.progs[atype], node) {
+		hlist_for_each_entry(pl, &p->bpf.progs[atype], node) {
 			if (!prog_list_prog(pl))
 				continue;
 
@@ -342,7 +343,7 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
 		cgroup_bpf_get(p);
 
 	for (i = 0; i < NR; i++)
-		INIT_LIST_HEAD(&cgrp->bpf.progs[i]);
+		INIT_HLIST_HEAD(&cgrp->bpf.progs[i]);
 
 	INIT_LIST_HEAD(&cgrp->bpf.storages);
 
@@ -418,7 +419,7 @@ static int update_effective_progs(struct cgroup *cgrp,
 
 #define BPF_CGROUP_MAX_PROGS 64
 
-static struct bpf_prog_list *find_attach_entry(struct list_head *progs,
+static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs,
 					       struct bpf_prog *prog,
 					       struct bpf_cgroup_link *link,
 					       struct bpf_prog *replace_prog,
@@ -428,12 +429,12 @@ static struct bpf_prog_list *find_attach_entry(struct list_head *progs,
 
 	/* single-attach case */
 	if (!allow_multi) {
-		if (list_empty(progs))
+		if (hlist_empty(progs))
 			return NULL;
-		return list_first_entry(progs, typeof(*pl), node);
+		return hlist_entry(progs->first, typeof(*pl), node);
 	}
 
-	list_for_each_entry(pl, progs, node) {
+	hlist_for_each_entry(pl, progs, node) {
 		if (prog && pl->prog == prog && prog != replace_prog)
 			/* disallow attaching the same prog twice */
 			return ERR_PTR(-EINVAL);
@@ -444,7 +445,7 @@ static struct bpf_prog_list *find_attach_entry(struct list_head *progs,
 
 	/* direct prog multi-attach w/ replacement case */
 	if (replace_prog) {
-		list_for_each_entry(pl, progs, node) {
+		hlist_for_each_entry(pl, progs, node) {
 			if (pl->prog == replace_prog)
 				/* a match found */
 				return pl;
@@ -480,7 +481,7 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
 	struct bpf_cgroup_storage *new_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
 	enum cgroup_bpf_attach_type atype;
 	struct bpf_prog_list *pl;
-	struct list_head *progs;
+	struct hlist_head *progs;
 	int err;
 
 	if (((flags & BPF_F_ALLOW_OVERRIDE) && (flags & BPF_F_ALLOW_MULTI)) ||
@@ -503,7 +504,7 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
 	if (!hierarchy_allows_attach(cgrp, atype))
 		return -EPERM;
 
-	if (!list_empty(progs) && cgrp->bpf.flags[atype] != saved_flags)
+	if (!hlist_empty(progs) && cgrp->bpf.flags[atype] != saved_flags)
 		/* Disallow attaching non-overridable on top
 		 * of existing overridable in this cgroup.
 		 * Disallow attaching multi-prog if overridable or none
@@ -525,12 +526,22 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
 	if (pl) {
 		old_prog = pl->prog;
 	} else {
+		struct hlist_node *last = NULL;
+
 		pl = kmalloc(sizeof(*pl), GFP_KERNEL);
 		if (!pl) {
 			bpf_cgroup_storages_free(new_storage);
 			return -ENOMEM;
 		}
-		list_add_tail(&pl->node, progs);
+		if (hlist_empty(progs))
+			hlist_add_head(&pl->node, progs);
+		else
+			hlist_for_each(last, progs) {
+				if (last->next)
+					continue;
+				hlist_add_behind(&pl->node, last);
+				break;
+			}
 	}
 
 	pl->prog = prog;
@@ -556,7 +567,7 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
 	}
 	bpf_cgroup_storages_free(new_storage);
 	if (!old_prog) {
-		list_del(&pl->node);
+		hlist_del(&pl->node);
 		kfree(pl);
 	}
 	return err;
@@ -587,7 +598,7 @@ static void replace_effective_prog(struct cgroup *cgrp,
 	struct cgroup_subsys_state *css;
 	struct bpf_prog_array *progs;
 	struct bpf_prog_list *pl;
-	struct list_head *head;
+	struct hlist_head *head;
 	struct cgroup *cg;
 	int pos;
 
@@ -603,7 +614,7 @@ static void replace_effective_prog(struct cgroup *cgrp,
 				continue;
 
 			head = &cg->bpf.progs[atype];
-			list_for_each_entry(pl, head, node) {
+			hlist_for_each_entry(pl, head, node) {
 				if (!prog_list_prog(pl))
 					continue;
 				if (pl->link == link)
@@ -637,7 +648,7 @@ static int __cgroup_bpf_replace(struct cgroup *cgrp,
 	enum cgroup_bpf_attach_type atype;
 	struct bpf_prog *old_prog;
 	struct bpf_prog_list *pl;
-	struct list_head *progs;
+	struct hlist_head *progs;
 	bool found = false;
 
 	atype = to_cgroup_bpf_attach_type(link->type);
@@ -649,7 +660,7 @@ static int __cgroup_bpf_replace(struct cgroup *cgrp,
 	if (link->link.prog->type != new_prog->type)
 		return -EINVAL;
 
-	list_for_each_entry(pl, progs, node) {
+	hlist_for_each_entry(pl, progs, node) {
 		if (pl->link == link) {
 			found = true;
 			break;
@@ -688,7 +699,7 @@ static int cgroup_bpf_replace(struct bpf_link *link, struct bpf_prog *new_prog,
 	return ret;
 }
 
-static struct bpf_prog_list *find_detach_entry(struct list_head *progs,
+static struct bpf_prog_list *find_detach_entry(struct hlist_head *progs,
 					       struct bpf_prog *prog,
 					       struct bpf_cgroup_link *link,
 					       bool allow_multi)
@@ -696,14 +707,14 @@ static struct bpf_prog_list *find_detach_entry(struct list_head *progs,
 	struct bpf_prog_list *pl;
 
 	if (!allow_multi) {
-		if (list_empty(progs))
+		if (hlist_empty(progs))
 			/* report error when trying to detach and nothing is attached */
 			return ERR_PTR(-ENOENT);
 
 		/* to maintain backward compatibility NONE and OVERRIDE cgroups
 		 * allow detaching with invalid FD (prog==NULL) in legacy mode
 		 */
-		return list_first_entry(progs, typeof(*pl), node);
+		return hlist_entry(progs->first, typeof(*pl), node);
 	}
 
 	if (!prog && !link)
@@ -713,7 +724,7 @@ static struct bpf_prog_list *find_detach_entry(struct list_head *progs,
 		return ERR_PTR(-EINVAL);
 
 	/* find the prog or link and detach it */
-	list_for_each_entry(pl, progs, node) {
+	hlist_for_each_entry(pl, progs, node) {
 		if (pl->prog == prog && pl->link == link)
 			return pl;
 	}
@@ -737,7 +748,7 @@ static void purge_effective_progs(struct cgroup *cgrp, struct bpf_prog *prog,
 	struct cgroup_subsys_state *css;
 	struct bpf_prog_array *progs;
 	struct bpf_prog_list *pl;
-	struct list_head *head;
+	struct hlist_head *head;
 	struct cgroup *cg;
 	int pos;
 
@@ -754,7 +765,7 @@ static void purge_effective_progs(struct cgroup *cgrp, struct bpf_prog *prog,
 				continue;
 
 			head = &cg->bpf.progs[atype];
-			list_for_each_entry(pl, head, node) {
+			hlist_for_each_entry(pl, head, node) {
 				if (!prog_list_prog(pl))
 					continue;
 				if (pl->prog == prog && pl->link == link)
@@ -791,7 +802,7 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 	enum cgroup_bpf_attach_type atype;
 	struct bpf_prog *old_prog;
 	struct bpf_prog_list *pl;
-	struct list_head *progs;
+	struct hlist_head *progs;
 	u32 flags;
 
 	atype = to_cgroup_bpf_attach_type(type);
@@ -822,9 +833,10 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 	}
 
 	/* now can actually delete it from this cgroup list */
-	list_del(&pl->node);
+	hlist_del(&pl->node);
+
 	kfree(pl);
-	if (list_empty(progs))
+	if (hlist_empty(progs))
 		/* last program was detached, reset flags to zero */
 		cgrp->bpf.flags[atype] = 0;
 	if (old_prog)
@@ -852,7 +864,7 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 	enum bpf_attach_type type = attr->query.attach_type;
 	enum cgroup_bpf_attach_type atype;
 	struct bpf_prog_array *effective;
-	struct list_head *progs;
+	struct hlist_head *progs;
 	struct bpf_prog *prog;
 	int cnt, ret = 0, i;
 	u32 flags;
@@ -891,7 +903,7 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		u32 id;
 
 		i = 0;
-		list_for_each_entry(pl, progs, node) {
+		hlist_for_each_entry(pl, progs, node) {
 			prog = prog_list_prog(pl);
 			id = prog->aux->id;
 			if (copy_to_user(prog_ids + i, &id, sizeof(id)))
-- 
2.36.1.476.g0c4daa206d-goog



* [PATCH bpf-next v9 03/10] bpf: per-cgroup lsm flavor
  2022-06-10 16:57 [PATCH bpf-next v9 00/10] bpf: cgroup_sock lsm flavor Stanislav Fomichev
  2022-06-10 16:57 ` [PATCH bpf-next v9 01/10] bpf: add bpf_func_t and trampoline helpers Stanislav Fomichev
  2022-06-10 16:57 ` [PATCH bpf-next v9 02/10] bpf: convert cgroup_bpf.progs to hlist Stanislav Fomichev
@ 2022-06-10 16:57 ` Stanislav Fomichev
  2022-06-16 22:25   ` Martin KaFai Lau
  2022-06-10 16:57 ` [PATCH bpf-next v9 04/10] bpf: minimize number of allocated lsm slots per program Stanislav Fomichev
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Stanislav Fomichev @ 2022-06-10 16:57 UTC (permalink / raw)
  To: netdev, bpf; +Cc: ast, daniel, andrii, Stanislav Fomichev

Allow attaching to lsm hooks in the cgroup context.

Attaching to per-cgroup LSM works exactly like attaching
to other per-cgroup hooks. A new BPF_LSM_CGROUP attach type is added
to trigger the new mode; the actual lsm hook we attach to is
signaled via the existing attach_btf_id.

For the hooks that have 'struct socket' or 'struct sock' as their first
argument, we use the cgroup associated with that socket. For the rest,
we use the 'current' cgroup (this is all on the default hierarchy == v2 only).
Note that for some hooks that work on 'struct sock' we still
take the cgroup from 'current' because some of them work on a socket
that hasn't been properly initialized yet.

Behind the scenes, we allocate a shim program that is attached
to the trampoline and runs the cgroup's effective BPF program array.
This shim has some rudimentary ref counting and can be shared
between several programs attaching to the same per-cgroup lsm hook.

Note that this patch bloats cgroup size because we add 211
cgroup_bpf_attach_type(s) for simplicity's sake. This will be
addressed in a subsequent patch.

Also note that we only add the non-sleepable flavor for now. To enable
sleepable use-cases, bpf_prog_run_array_cg has to grab trace rcu,
shim programs have to be freed via trace rcu, cgroup_bpf.effective
should also be trace-rcu-managed, plus maybe some other changes that
I'm not aware of.
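
From the userspace side, attachment is expected to look like any other
cgroup bpf link. A hedged sketch (the libbpf conveniences come later in
the series; prog_fd is assumed to be a loaded BPF_PROG_TYPE_LSM program
with expected_attach_type BPF_LSM_CGROUP and attach_btf_id already set):

  #include <fcntl.h>
  #include <bpf/bpf.h>

  static int attach_lsm_cgroup(int prog_fd, const char *cgroup_path)
  {
          int cgroup_fd = open(cgroup_path, O_RDONLY);

          if (cgroup_fd < 0)
                  return -1;
          /* link target is a cgroup fd, as for other per-cgroup types */
          return bpf_link_create(prog_fd, cgroup_fd, BPF_LSM_CGROUP, NULL);
  }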

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 arch/x86/net/bpf_jit_comp.c     |  24 ++--
 include/linux/bpf-cgroup-defs.h |   8 ++
 include/linux/bpf-cgroup.h      |   7 ++
 include/linux/bpf.h             |  24 ++++
 include/linux/bpf_lsm.h         |  13 +++
 include/linux/btf_ids.h         |   3 +-
 include/uapi/linux/bpf.h        |   1 +
 kernel/bpf/bpf_lsm.c            |  48 ++++++++
 kernel/bpf/btf.c                |  11 ++
 kernel/bpf/cgroup.c             | 139 ++++++++++++++++++++--
 kernel/bpf/core.c               |   2 +
 kernel/bpf/syscall.c            |  10 ++
 kernel/bpf/trampoline.c         | 198 ++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c           |  32 ++++++
 tools/include/linux/btf_ids.h   |   4 +-
 tools/include/uapi/linux/bpf.h  |   1 +
 16 files changed, 504 insertions(+), 21 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index f298b18a9a3d..54a814fa41e0 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1770,6 +1770,10 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
 			   struct bpf_tramp_link *l, int stack_size,
 			   int run_ctx_off, bool save_ret)
 {
+	void (*exit)(struct bpf_prog *prog, u64 start,
+		     struct bpf_tramp_run_ctx *run_ctx) = __bpf_prog_exit;
+	u64 (*enter)(struct bpf_prog *prog,
+		     struct bpf_tramp_run_ctx *run_ctx) = __bpf_prog_enter;
 	u8 *prog = *pprog;
 	u8 *jmp_insn;
 	int ctx_cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie);
@@ -1788,15 +1792,21 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
 	 */
 	emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_1, -run_ctx_off + ctx_cookie_off);
 
+	if (p->aux->sleepable) {
+		enter = __bpf_prog_enter_sleepable;
+		exit = __bpf_prog_exit_sleepable;
+	} else if (p->expected_attach_type == BPF_LSM_CGROUP) {
+		enter = __bpf_prog_enter_lsm_cgroup;
+		exit = __bpf_prog_exit_lsm_cgroup;
+	}
+
 	/* arg1: mov rdi, progs[i] */
 	emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32, (u32) (long) p);
 	/* arg2: lea rsi, [rbp - ctx_cookie_off] */
 	EMIT4(0x48, 0x8D, 0x75, -run_ctx_off);
 
-	if (emit_call(&prog,
-		      p->aux->sleepable ? __bpf_prog_enter_sleepable :
-		      __bpf_prog_enter, prog))
-			return -EINVAL;
+	if (emit_call(&prog, enter, prog))
+		return -EINVAL;
 	/* remember prog start time returned by __bpf_prog_enter */
 	emit_mov_reg(&prog, true, BPF_REG_6, BPF_REG_0);
 
@@ -1840,10 +1850,8 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
 	emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
 	/* arg3: lea rdx, [rbp - run_ctx_off] */
 	EMIT4(0x48, 0x8D, 0x55, -run_ctx_off);
-	if (emit_call(&prog,
-		      p->aux->sleepable ? __bpf_prog_exit_sleepable :
-		      __bpf_prog_exit, prog))
-			return -EINVAL;
+	if (emit_call(&prog, exit, prog))
+		return -EINVAL;
 
 	*pprog = prog;
 	return 0;
diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
index 5d268e76d8e6..b99f8c3e37ea 100644
--- a/include/linux/bpf-cgroup-defs.h
+++ b/include/linux/bpf-cgroup-defs.h
@@ -10,6 +10,12 @@
 
 struct bpf_prog_array;
 
+#ifdef CONFIG_BPF_LSM
+#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
+#else
+#define CGROUP_LSM_NUM 0
+#endif
+
 enum cgroup_bpf_attach_type {
 	CGROUP_BPF_ATTACH_TYPE_INVALID = -1,
 	CGROUP_INET_INGRESS = 0,
@@ -35,6 +41,8 @@ enum cgroup_bpf_attach_type {
 	CGROUP_INET4_GETSOCKNAME,
 	CGROUP_INET6_GETSOCKNAME,
 	CGROUP_INET_SOCK_RELEASE,
+	CGROUP_LSM_START,
+	CGROUP_LSM_END = CGROUP_LSM_START + CGROUP_LSM_NUM - 1,
 	MAX_CGROUP_BPF_ATTACH_TYPE
 };
 
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 6673acfbf2ef..2bd1b5f8de9b 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -23,6 +23,13 @@ struct ctl_table;
 struct ctl_table_header;
 struct task_struct;
 
+unsigned int __cgroup_bpf_run_lsm_sock(const void *ctx,
+				       const struct bpf_insn *insn);
+unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
+					 const struct bpf_insn *insn);
+unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
+					  const struct bpf_insn *insn);
+
 #ifdef CONFIG_CGROUP_BPF
 
 #define CGROUP_ATYPE(type) \
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b0af68b6297b..4dceb86229f6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -778,6 +778,10 @@ void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start, struct bpf_tramp_
 u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx);
 void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start,
 				       struct bpf_tramp_run_ctx *run_ctx);
+u64 notrace __bpf_prog_enter_lsm_cgroup(struct bpf_prog *prog,
+					struct bpf_tramp_run_ctx *run_ctx);
+void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start,
+					struct bpf_tramp_run_ctx *run_ctx);
 void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr);
 void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr);
 
@@ -1044,6 +1048,7 @@ struct bpf_prog_aux {
 	struct user_struct *user;
 	u64 load_time; /* ns since boottime */
 	u32 verified_insns;
+	int cgroup_atype; /* enum cgroup_bpf_attach_type */
 	struct bpf_map *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
 	char name[BPF_OBJ_NAME_LEN];
 #ifdef CONFIG_SECURITY
@@ -1117,6 +1122,11 @@ struct bpf_tramp_link {
 	u64 cookie;
 };
 
+struct bpf_shim_tramp_link {
+	struct bpf_tramp_link link;
+	struct bpf_trampoline *trampoline;
+};
+
 struct bpf_tracing_link {
 	struct bpf_tramp_link link;
 	enum bpf_attach_type attach_type;
@@ -1195,6 +1205,9 @@ struct bpf_dummy_ops {
 int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
 			    union bpf_attr __user *uattr);
 #endif
+int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
+				    int cgroup_atype);
+void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog);
 #else
 static inline const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id)
 {
@@ -1218,6 +1231,14 @@ static inline int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map,
 {
 	return -EINVAL;
 }
+static inline int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
+						  int cgroup_atype)
+{
+	return -EOPNOTSUPP;
+}
+static inline void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog)
+{
+}
 #endif
 
 struct bpf_array {
@@ -2267,6 +2288,8 @@ extern const struct bpf_func_proto bpf_loop_proto;
 extern const struct bpf_func_proto bpf_strncmp_proto;
 extern const struct bpf_func_proto bpf_copy_from_user_task_proto;
 extern const struct bpf_func_proto bpf_kptr_xchg_proto;
+extern const struct bpf_func_proto bpf_set_retval_proto;
+extern const struct bpf_func_proto bpf_get_retval_proto;
 
 const struct bpf_func_proto *tracing_prog_func_proto(
   enum bpf_func_id func_id, const struct bpf_prog *prog);
@@ -2384,6 +2407,7 @@ int bpf_arch_text_invalidate(void *dst, size_t len);
 
 struct btf_id_set;
 bool btf_id_set_contains(const struct btf_id_set *set, u32 id);
+int btf_id_set_index(const struct btf_id_set *set, u32 id);
 
 #define MAX_BPRINTF_VARARGS		12
 
diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
index 479c101546ad..61787a5f6af9 100644
--- a/include/linux/bpf_lsm.h
+++ b/include/linux/bpf_lsm.h
@@ -42,6 +42,9 @@ extern const struct bpf_func_proto bpf_inode_storage_get_proto;
 extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
 void bpf_inode_storage_free(struct inode *inode);
 
+int bpf_lsm_hook_idx(u32 btf_id);
+void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func);
+
 #else /* !CONFIG_BPF_LSM */
 
 static inline bool bpf_lsm_is_sleepable_hook(u32 btf_id)
@@ -65,6 +68,16 @@ static inline void bpf_inode_storage_free(struct inode *inode)
 {
 }
 
+static inline void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
+					   bpf_func_t *bpf_func)
+{
+}
+
+static inline int bpf_lsm_hook_idx(u32 btf_id)
+{
+	return -EINVAL;
+}
+
 #endif /* CONFIG_BPF_LSM */
 
 #endif /* _LINUX_BPF_LSM_H */
diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h
index 335a19092368..252a4befeab1 100644
--- a/include/linux/btf_ids.h
+++ b/include/linux/btf_ids.h
@@ -179,7 +179,8 @@ extern struct btf_id_set name;
 	BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP, udp_sock)			\
 	BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP6, udp6_sock)			\
 	BTF_SOCK_TYPE(BTF_SOCK_TYPE_UNIX, unix_sock)			\
-	BTF_SOCK_TYPE(BTF_SOCK_TYPE_MPTCP, mptcp_sock)
+	BTF_SOCK_TYPE(BTF_SOCK_TYPE_MPTCP, mptcp_sock)			\
+	BTF_SOCK_TYPE(BTF_SOCK_TYPE_SOCKET, socket)
 
 enum {
 #define BTF_SOCK_TYPE(name, str) name,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f4009dbdf62d..fa64b0b612fd 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -998,6 +998,7 @@ enum bpf_attach_type {
 	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
 	BPF_PERF_EVENT,
 	BPF_TRACE_KPROBE_MULTI,
+	BPF_LSM_CGROUP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index c1351df9f7ee..0f72020bfdcf 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -16,6 +16,7 @@
 #include <linux/bpf_local_storage.h>
 #include <linux/btf_ids.h>
 #include <linux/ima.h>
+#include <linux/bpf-cgroup.h>
 
 /* For every LSM hook that allows attachment of BPF programs, declare a nop
  * function where a BPF program can be attached.
@@ -35,6 +36,44 @@ BTF_SET_START(bpf_lsm_hooks)
 #undef LSM_HOOK
 BTF_SET_END(bpf_lsm_hooks)
 
+/* List of LSM hooks that should operate on 'current' cgroup regardless
+ * of function signature.
+ */
+BTF_SET_START(bpf_lsm_current_hooks)
+/* operate on freshly allocated sk without any cgroup association */
+BTF_ID(func, bpf_lsm_sk_alloc_security)
+BTF_ID(func, bpf_lsm_sk_free_security)
+BTF_SET_END(bpf_lsm_current_hooks)
+
+void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
+			     bpf_func_t *bpf_func)
+{
+	const struct btf_param *args;
+
+	if (btf_type_vlen(prog->aux->attach_func_proto) < 1 ||
+	    btf_id_set_contains(&bpf_lsm_current_hooks,
+				prog->aux->attach_btf_id)) {
+		*bpf_func = __cgroup_bpf_run_lsm_current;
+		return;
+	}
+
+	args = btf_params(prog->aux->attach_func_proto);
+
+#ifdef CONFIG_NET
+	if (args[0].type == btf_sock_ids[BTF_SOCK_TYPE_SOCKET])
+		*bpf_func = __cgroup_bpf_run_lsm_socket;
+	else if (args[0].type == btf_sock_ids[BTF_SOCK_TYPE_SOCK])
+		*bpf_func = __cgroup_bpf_run_lsm_sock;
+	else
+#endif
+		*bpf_func = __cgroup_bpf_run_lsm_current;
+}
+
+int bpf_lsm_hook_idx(u32 btf_id)
+{
+	return btf_id_set_index(&bpf_lsm_hooks, btf_id);
+}
+
 int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
 			const struct bpf_prog *prog)
 {
@@ -158,6 +197,15 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return prog->aux->sleepable ? &bpf_ima_file_hash_proto : NULL;
 	case BPF_FUNC_get_attach_cookie:
 		return bpf_prog_has_trampoline(prog) ? &bpf_get_attach_cookie_proto : NULL;
+	case BPF_FUNC_get_local_storage:
+		return prog->expected_attach_type == BPF_LSM_CGROUP ?
+			&bpf_get_local_storage_proto : NULL;
+	case BPF_FUNC_set_retval:
+		return prog->expected_attach_type == BPF_LSM_CGROUP ?
+			&bpf_set_retval_proto : NULL;
+	case BPF_FUNC_get_retval:
+		return prog->expected_attach_type == BPF_LSM_CGROUP ?
+			&bpf_get_retval_proto : NULL;
 	default:
 		return tracing_prog_func_proto(func_id, prog);
 	}
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 6c0d8480e15c..ec070d7f2f7a 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -5363,6 +5363,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 
 	if (arg == nr_args) {
 		switch (prog->expected_attach_type) {
+		case BPF_LSM_CGROUP:
 		case BPF_LSM_MAC:
 		case BPF_TRACE_FEXIT:
 			/* When LSM programs are attached to void LSM hooks
@@ -6841,6 +6842,16 @@ static int btf_id_cmp_func(const void *a, const void *b)
 	return *pa - *pb;
 }
 
+int btf_id_set_index(const struct btf_id_set *set, u32 id)
+{
+	const u32 *p;
+
+	p = bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func);
+	if (!p)
+		return -1;
+	return p - set->ids;
+}
+
 bool btf_id_set_contains(const struct btf_id_set *set, u32 id)
 {
 	return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL;
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 4adb4f3ecb7f..b0314889a409 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -14,6 +14,8 @@
 #include <linux/string.h>
 #include <linux/bpf.h>
 #include <linux/bpf-cgroup.h>
+#include <linux/bpf_lsm.h>
+#include <linux/bpf_verifier.h>
 #include <net/sock.h>
 #include <net/bpf_sk_storage.h>
 
@@ -61,6 +63,88 @@ bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
 	return run_ctx.retval;
 }
 
+unsigned int __cgroup_bpf_run_lsm_sock(const void *ctx,
+				       const struct bpf_insn *insn)
+{
+	const struct bpf_prog *shim_prog;
+	struct sock *sk;
+	struct cgroup *cgrp;
+	int ret = 0;
+	u64 *args;
+
+	args = (u64 *)ctx;
+	sk = (void *)(unsigned long)args[0];
+	/*shim_prog = container_of(insn, struct bpf_prog, insnsi);*/
+	shim_prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
+
+	cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
+	if (likely(cgrp))
+		ret = bpf_prog_run_array_cg(&cgrp->bpf,
+					    shim_prog->aux->cgroup_atype,
+					    ctx, bpf_prog_run, 0, NULL);
+	return ret;
+}
+
+unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
+					 const struct bpf_insn *insn)
+{
+	const struct bpf_prog *shim_prog;
+	struct socket *sock;
+	struct cgroup *cgrp;
+	int ret = 0;
+	u64 *args;
+
+	args = (u64 *)ctx;
+	sock = (void *)(unsigned long)args[0];
+	/*shim_prog = container_of(insn, struct bpf_prog, insnsi);*/
+	shim_prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
+
+	cgrp = sock_cgroup_ptr(&sock->sk->sk_cgrp_data);
+	if (likely(cgrp))
+		ret = bpf_prog_run_array_cg(&cgrp->bpf,
+					    shim_prog->aux->cgroup_atype,
+					    ctx, bpf_prog_run, 0, NULL);
+	return ret;
+}
+
+unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
+					  const struct bpf_insn *insn)
+{
+	const struct bpf_prog *shim_prog;
+	struct cgroup *cgrp;
+	int ret = 0;
+
+	/*shim_prog = container_of(insn, struct bpf_prog, insnsi);*/
+	shim_prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
+
+	rcu_read_lock();
+	cgrp = task_dfl_cgroup(current);
+	if (likely(cgrp))
+		ret = bpf_prog_run_array_cg(&cgrp->bpf,
+					    shim_prog->aux->cgroup_atype,
+					    ctx, bpf_prog_run, 0, NULL);
+	rcu_read_unlock();
+	return ret;
+}
+
+#ifdef CONFIG_BPF_LSM
+static enum cgroup_bpf_attach_type
+bpf_cgroup_atype_find(enum bpf_attach_type attach_type, u32 attach_btf_id)
+{
+	if (attach_type != BPF_LSM_CGROUP)
+		return to_cgroup_bpf_attach_type(attach_type);
+	return CGROUP_LSM_START + bpf_lsm_hook_idx(attach_btf_id);
+}
+#else
+static enum cgroup_bpf_attach_type
+bpf_cgroup_atype_find(enum bpf_attach_type attach_type, u32 attach_btf_id)
+{
+	if (attach_type != BPF_LSM_CGROUP)
+		return to_cgroup_bpf_attach_type(attach_type);
+	return -EOPNOTSUPP;
+}
+#endif /* CONFIG_BPF_LSM */
+
 void cgroup_bpf_offline(struct cgroup *cgrp)
 {
 	cgroup_get(cgrp);
@@ -163,10 +247,20 @@ static void cgroup_bpf_release(struct work_struct *work)
 
 		hlist_for_each_entry_safe(pl, pltmp, progs, node) {
 			hlist_del(&pl->node);
-			if (pl->prog)
+			if (pl->prog) {
+#ifdef CONFIG_BPF_LSM
+				if (pl->prog->expected_attach_type == BPF_LSM_CGROUP)
+					bpf_trampoline_unlink_cgroup_shim(pl->prog);
+#endif
 				bpf_prog_put(pl->prog);
-			if (pl->link)
+			}
+			if (pl->link) {
+#ifdef CONFIG_BPF_LSM
+				if (pl->link->link.prog->expected_attach_type == BPF_LSM_CGROUP)
+					bpf_trampoline_unlink_cgroup_shim(pl->link->link.prog);
+#endif
 				bpf_cgroup_link_auto_detach(pl->link);
+			}
 			kfree(pl);
 			static_branch_dec(&cgroup_bpf_enabled_key[atype]);
 		}
@@ -479,6 +573,7 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
 	struct bpf_prog *old_prog = NULL;
 	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
 	struct bpf_cgroup_storage *new_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
+	struct bpf_prog *new_prog = prog ? : link->link.prog;
 	enum cgroup_bpf_attach_type atype;
 	struct bpf_prog_list *pl;
 	struct hlist_head *progs;
@@ -495,7 +590,7 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
 		/* replace_prog implies BPF_F_REPLACE, and vice versa */
 		return -EINVAL;
 
-	atype = to_cgroup_bpf_attach_type(type);
+	atype = bpf_cgroup_atype_find(type, new_prog->aux->attach_btf_id);
 	if (atype < 0)
 		return -EINVAL;
 
@@ -549,17 +644,30 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
 	bpf_cgroup_storages_assign(pl->storage, storage);
 	cgrp->bpf.flags[atype] = saved_flags;
 
+	if (type == BPF_LSM_CGROUP) {
+		err = bpf_trampoline_link_cgroup_shim(new_prog, atype);
+		if (err)
+			goto cleanup;
+	}
+
 	err = update_effective_progs(cgrp, atype);
 	if (err)
-		goto cleanup;
+		goto cleanup_trampoline;
 
-	if (old_prog)
+	if (old_prog) {
+		if (type == BPF_LSM_CGROUP)
+			bpf_trampoline_unlink_cgroup_shim(old_prog);
 		bpf_prog_put(old_prog);
-	else
+	} else {
 		static_branch_inc(&cgroup_bpf_enabled_key[atype]);
+	}
 	bpf_cgroup_storages_link(new_storage, cgrp, type);
 	return 0;
 
+cleanup_trampoline:
+	if (type == BPF_LSM_CGROUP)
+		bpf_trampoline_unlink_cgroup_shim(new_prog);
+
 cleanup:
 	if (old_prog) {
 		pl->prog = old_prog;
@@ -651,7 +759,7 @@ static int __cgroup_bpf_replace(struct cgroup *cgrp,
 	struct hlist_head *progs;
 	bool found = false;
 
-	atype = to_cgroup_bpf_attach_type(link->type);
+	atype = bpf_cgroup_atype_find(link->type, new_prog->aux->attach_btf_id);
 	if (atype < 0)
 		return -EINVAL;
 
@@ -803,9 +911,15 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 	struct bpf_prog *old_prog;
 	struct bpf_prog_list *pl;
 	struct hlist_head *progs;
+	u32 attach_btf_id = 0;
 	u32 flags;
 
-	atype = to_cgroup_bpf_attach_type(type);
+	if (prog)
+		attach_btf_id = prog->aux->attach_btf_id;
+	if (link)
+		attach_btf_id = link->link.prog->aux->attach_btf_id;
+
+	atype = bpf_cgroup_atype_find(type, attach_btf_id);
 	if (atype < 0)
 		return -EINVAL;
 
@@ -839,8 +953,11 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 	if (hlist_empty(progs))
 		/* last program was detached, reset flags to zero */
 		cgrp->bpf.flags[atype] = 0;
-	if (old_prog)
+	if (old_prog) {
+		if (type == BPF_LSM_CGROUP)
+			bpf_trampoline_unlink_cgroup_shim(old_prog);
 		bpf_prog_put(old_prog);
+	}
 	static_branch_dec(&cgroup_bpf_enabled_key[atype]);
 	return 0;
 }
@@ -1343,7 +1460,7 @@ BPF_CALL_0(bpf_get_retval)
 	return ctx->retval;
 }
 
-static const struct bpf_func_proto bpf_get_retval_proto = {
+const struct bpf_func_proto bpf_get_retval_proto = {
 	.func		= bpf_get_retval,
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
@@ -1358,7 +1475,7 @@ BPF_CALL_1(bpf_set_retval, int, retval)
 	return 0;
 }
 
-static const struct bpf_func_proto bpf_set_retval_proto = {
+const struct bpf_func_proto bpf_set_retval_proto = {
 	.func		= bpf_set_retval,
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index e78cc5eea4a5..8d171eb0ed0d 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2651,6 +2651,8 @@ const struct bpf_func_proto bpf_get_local_storage_proto __weak;
 const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto __weak;
 const struct bpf_func_proto bpf_snprintf_btf_proto __weak;
 const struct bpf_func_proto bpf_seq_printf_btf_proto __weak;
+const struct bpf_func_proto bpf_set_retval_proto __weak;
+const struct bpf_func_proto bpf_get_retval_proto __weak;
 
 const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
 {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index aeb31137b2ed..a237be4f8bb3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3416,6 +3416,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
 		return BPF_PROG_TYPE_SK_LOOKUP;
 	case BPF_XDP:
 		return BPF_PROG_TYPE_XDP;
+	case BPF_LSM_CGROUP:
+		return BPF_PROG_TYPE_LSM;
 	default:
 		return BPF_PROG_TYPE_UNSPEC;
 	}
@@ -3469,6 +3471,11 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
 	case BPF_PROG_TYPE_CGROUP_SYSCTL:
 	case BPF_PROG_TYPE_SOCK_OPS:
+	case BPF_PROG_TYPE_LSM:
+		if (ptype == BPF_PROG_TYPE_LSM &&
+		    prog->expected_attach_type != BPF_LSM_CGROUP)
+			return -EINVAL;
+
 		ret = cgroup_bpf_prog_attach(attr, ptype, prog);
 		break;
 	default:
@@ -3506,6 +3513,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
 	case BPF_PROG_TYPE_CGROUP_SYSCTL:
 	case BPF_PROG_TYPE_SOCK_OPS:
+	case BPF_PROG_TYPE_LSM:
 		return cgroup_bpf_prog_detach(attr, ptype);
 	default:
 		return -EINVAL;
@@ -4540,6 +4548,8 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
 			ret = bpf_raw_tp_link_attach(prog, NULL);
 		else if (prog->expected_attach_type == BPF_TRACE_ITER)
 			ret = bpf_iter_link_attach(attr, uattr, prog);
+		else if (prog->expected_attach_type == BPF_LSM_CGROUP)
+			ret = cgroup_bpf_link_attach(attr, prog);
 		else
 			ret = bpf_tracing_prog_attach(prog,
 						      attr->link_create.target_fd,
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 5466e15be61f..023239a10e7c 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -11,6 +11,8 @@
 #include <linux/rcupdate_wait.h>
 #include <linux/module.h>
 #include <linux/static_call.h>
+#include <linux/bpf_verifier.h>
+#include <linux/bpf_lsm.h>
 
 /* dummy _ops. The verifier will operate on target program's ops. */
 const struct bpf_verifier_ops bpf_extension_verifier_ops = {
@@ -496,6 +498,177 @@ int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampolin
 	return err;
 }
 
+#if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL)
+static void bpf_shim_tramp_link_release(struct bpf_link *link)
+{
+	struct bpf_shim_tramp_link *shim_link =
+		container_of(link, struct bpf_shim_tramp_link, link.link);
+
+	/* paired with 'shim_link->trampoline = tr' in bpf_trampoline_link_cgroup_shim */
+	if (!shim_link->trampoline)
+		return;
+
+	WARN_ON_ONCE(bpf_trampoline_unlink_prog(&shim_link->link, shim_link->trampoline));
+	bpf_trampoline_put(shim_link->trampoline);
+}
+
+static void bpf_shim_tramp_link_dealloc(struct bpf_link *link)
+{
+	struct bpf_shim_tramp_link *shim_link =
+		container_of(link, struct bpf_shim_tramp_link, link.link);
+
+	kfree(shim_link);
+}
+
+static const struct bpf_link_ops bpf_shim_tramp_link_lops = {
+	.release = bpf_shim_tramp_link_release,
+	.dealloc = bpf_shim_tramp_link_dealloc,
+};
+
+static struct bpf_shim_tramp_link *cgroup_shim_alloc(const struct bpf_prog *prog,
+						     bpf_func_t bpf_func,
+						     int cgroup_atype)
+{
+	struct bpf_shim_tramp_link *shim_link = NULL;
+	struct bpf_prog *p;
+
+	shim_link = kzalloc(sizeof(*shim_link), GFP_USER);
+	if (!shim_link)
+		return NULL;
+
+	p = bpf_prog_alloc(1, 0);
+	if (!p) {
+		kfree(shim_link);
+		return NULL;
+	}
+
+	p->jited = false;
+	p->bpf_func = bpf_func;
+
+	p->aux->cgroup_atype = cgroup_atype;
+	p->aux->attach_func_proto = prog->aux->attach_func_proto;
+	p->aux->attach_btf_id = prog->aux->attach_btf_id;
+	p->aux->attach_btf = prog->aux->attach_btf;
+	btf_get(p->aux->attach_btf);
+	p->type = BPF_PROG_TYPE_LSM;
+	p->expected_attach_type = BPF_LSM_MAC;
+	bpf_prog_inc(p);
+	bpf_link_init(&shim_link->link.link, BPF_LINK_TYPE_UNSPEC,
+		      &bpf_shim_tramp_link_lops, p);
+
+	return shim_link;
+}
+
+static struct bpf_shim_tramp_link *cgroup_shim_find(struct bpf_trampoline *tr,
+						    bpf_func_t bpf_func)
+{
+	struct bpf_tramp_link *link;
+	int kind;
+
+	for (kind = 0; kind < BPF_TRAMP_MAX; kind++) {
+		hlist_for_each_entry(link, &tr->progs_hlist[kind], tramp_hlist) {
+			struct bpf_prog *p = link->link.prog;
+
+			if (p->bpf_func == bpf_func)
+				return container_of(link, struct bpf_shim_tramp_link, link);
+		}
+	}
+
+	return NULL;
+}
+
+int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
+				    int cgroup_atype)
+{
+	struct bpf_shim_tramp_link *shim_link = NULL;
+	struct bpf_attach_target_info tgt_info = {};
+	struct bpf_trampoline *tr;
+	bpf_func_t bpf_func;
+	u64 key;
+	int err;
+
+	err = bpf_check_attach_target(NULL, prog, NULL,
+				      prog->aux->attach_btf_id,
+				      &tgt_info);
+	if (err)
+		return err;
+
+	key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf,
+					 prog->aux->attach_btf_id);
+
+	bpf_lsm_find_cgroup_shim(prog, &bpf_func);
+	tr = bpf_trampoline_get(key, &tgt_info);
+	if (!tr)
+		return  -ENOMEM;
+
+	mutex_lock(&tr->mutex);
+
+	shim_link = cgroup_shim_find(tr, bpf_func);
+	if (shim_link) {
+		/* Reusing existing shim attached by the other program. */
+		bpf_link_inc(&shim_link->link.link);
+
+		mutex_unlock(&tr->mutex);
+		bpf_trampoline_put(tr); /* bpf_trampoline_get above */
+		return 0;
+	}
+
+	/* Allocate and install new shim. */
+
+	shim_link = cgroup_shim_alloc(prog, bpf_func, cgroup_atype);
+	if (!shim_link) {
+		err = -ENOMEM;
+		goto err;
+	}
+
+	err = __bpf_trampoline_link_prog(&shim_link->link, tr);
+	if (err)
+		goto err;
+
+	shim_link->trampoline = tr;
+	/* note, we're still holding tr refcnt from above */
+
+	mutex_unlock(&tr->mutex);
+
+	return 0;
+err:
+	mutex_unlock(&tr->mutex);
+
+	if (shim_link)
+		bpf_link_put(&shim_link->link.link);
+
+	/* have to release tr while _not_ holding its mutex */
+	bpf_trampoline_put(tr); /* bpf_trampoline_get above */
+
+	return err;
+}
+
+void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog)
+{
+	struct bpf_shim_tramp_link *shim_link = NULL;
+	struct bpf_trampoline *tr;
+	bpf_func_t bpf_func;
+	u64 key;
+
+	key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf,
+					 prog->aux->attach_btf_id);
+
+	bpf_lsm_find_cgroup_shim(prog, &bpf_func);
+	tr = bpf_trampoline_lookup(key);
+	if (!tr)
+		return;
+
+	mutex_lock(&tr->mutex);
+	shim_link = cgroup_shim_find(tr, bpf_func);
+	mutex_unlock(&tr->mutex);
+
+	if (shim_link)
+		bpf_link_put(&shim_link->link.link);
+
+	bpf_trampoline_put(tr); /* bpf_trampoline_lookup above */
+}
+#endif
+
 struct bpf_trampoline *bpf_trampoline_get(u64 key,
 					  struct bpf_attach_target_info *tgt_info)
 {
@@ -628,6 +801,31 @@ void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start, struct bpf_tramp_
 	rcu_read_unlock();
 }
 
+u64 notrace __bpf_prog_enter_lsm_cgroup(struct bpf_prog *prog,
+					struct bpf_tramp_run_ctx *run_ctx)
+	__acquires(RCU)
+{
+	/* Runtime stats are exported via actual BPF_LSM_CGROUP
+	 * programs, not the shims.
+	 */
+	rcu_read_lock();
+	migrate_disable();
+
+	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
+
+	return NO_START_TIME;
+}
+
+void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start,
+					struct bpf_tramp_run_ctx *run_ctx)
+	__releases(RCU)
+{
+	bpf_reset_run_ctx(run_ctx->saved_run_ctx);
+
+	migrate_enable();
+	rcu_read_unlock();
+}
+
 u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx)
 {
 	rcu_read_lock_trace();
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2d2872682278..975145d6bbee 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7264,6 +7264,18 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 				reg_type_str(env, regs[BPF_REG_1].type));
 			return -EACCES;
 		}
+		break;
+	case BPF_FUNC_set_retval:
+		if (env->prog->expected_attach_type == BPF_LSM_CGROUP) {
+			if (!env->prog->aux->attach_func_proto->type) {
+				/* Make sure programs that attach to void
+				 * hooks don't try to modify return value.
+				 */
+				verbose(env, "BPF_LSM_CGROUP that attach to void LSM hooks can't modify return value!\n");
+				return -EINVAL;
+			}
+		}
+		break;
 	}
 
 	if (err)
@@ -10474,6 +10486,22 @@ static int check_return_code(struct bpf_verifier_env *env)
 	case BPF_PROG_TYPE_SK_LOOKUP:
 		range = tnum_range(SK_DROP, SK_PASS);
 		break;
+
+	case BPF_PROG_TYPE_LSM:
+		if (env->prog->expected_attach_type != BPF_LSM_CGROUP) {
+			/* Regular BPF_PROG_TYPE_LSM programs can return
+			 * any value.
+			 */
+			return 0;
+		}
+		if (!env->prog->aux->attach_func_proto->type) {
+			/* Make sure programs that attach to void
+			 * hooks don't try to modify return value.
+			 */
+			range = tnum_range(1, 1);
+		}
+		break;
+
 	case BPF_PROG_TYPE_EXT:
 		/* freplace program can return anything as its return value
 		 * depends on the to-be-replaced kernel func or bpf program.
@@ -10490,6 +10518,9 @@ static int check_return_code(struct bpf_verifier_env *env)
 
 	if (!tnum_in(range, reg->var_off)) {
 		verbose_invalid_scalar(env, reg, &range, "program exit", "R0");
+		if (prog->expected_attach_type == BPF_LSM_CGROUP &&
+		    !prog->aux->attach_func_proto->type)
+			verbose(env, "Note, BPF_LSM_CGROUP that attach to void LSM hooks can't modify return value!\n");
 		return -EINVAL;
 	}
 
@@ -14713,6 +14744,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 		fallthrough;
 	case BPF_MODIFY_RETURN:
 	case BPF_LSM_MAC:
+	case BPF_LSM_CGROUP:
 	case BPF_TRACE_FENTRY:
 	case BPF_TRACE_FEXIT:
 		if (!btf_type_is_func(t)) {
diff --git a/tools/include/linux/btf_ids.h b/tools/include/linux/btf_ids.h
index 57890b357f85..2345b502b439 100644
--- a/tools/include/linux/btf_ids.h
+++ b/tools/include/linux/btf_ids.h
@@ -172,7 +172,9 @@ extern struct btf_id_set name;
 	BTF_SOCK_TYPE(BTF_SOCK_TYPE_TCP_TW, tcp_timewait_sock)		\
 	BTF_SOCK_TYPE(BTF_SOCK_TYPE_TCP6, tcp6_sock)			\
 	BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP, udp_sock)			\
-	BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP6, udp6_sock)
+	BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP6, udp6_sock)			\
+	BTF_SOCK_TYPE(BTF_SOCK_TYPE_UNIX, unix_sock)			\
+	BTF_SOCK_TYPE(BTF_SOCK_TYPE_SOCKET, socket)
 
 enum {
 #define BTF_SOCK_TYPE(name, str) name,
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f4009dbdf62d..fa64b0b612fd 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -998,6 +998,7 @@ enum bpf_attach_type {
 	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
 	BPF_PERF_EVENT,
 	BPF_TRACE_KPROBE_MULTI,
+	BPF_LSM_CGROUP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
-- 
2.36.1.476.g0c4daa206d-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH bpf-next v9 04/10] bpf: minimize number of allocated lsm slots per program
  2022-06-10 16:57 [PATCH bpf-next v9 00/10] bpf: cgroup_sock lsm flavor Stanislav Fomichev
                   ` (2 preceding siblings ...)
  2022-06-10 16:57 ` [PATCH bpf-next v9 03/10] bpf: per-cgroup lsm flavor Stanislav Fomichev
@ 2022-06-10 16:57 ` Stanislav Fomichev
  2022-06-11 16:53   ` kernel test robot
  2022-06-17  0:43   ` Martin KaFai Lau
  2022-06-10 16:57 ` [PATCH bpf-next v9 05/10] bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP Stanislav Fomichev
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 31+ messages in thread
From: Stanislav Fomichev @ 2022-06-10 16:57 UTC (permalink / raw)
  To: netdev, bpf; +Cc: ast, daniel, andrii, Stanislav Fomichev

The previous patch adds a 1:1 mapping between all 211 LSM hooks
and the bpf_cgroup program array. Instead of reserving a slot per
possible hook, reserve 10 slots per cgroup for lsm programs.
Those slots are dynamically allocated on demand and reclaimed.

struct cgroup_bpf {
	struct bpf_prog_array *    effective[33];        /*     0   264 */
	/* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
	struct hlist_head          progs[33];            /*   264   264 */
	/* --- cacheline 8 boundary (512 bytes) was 16 bytes ago --- */
	u8                         flags[33];            /*   528    33 */

	/* XXX 7 bytes hole, try to pack */

	struct list_head           storages;             /*   568    16 */
	/* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
	struct bpf_prog_array *    inactive;             /*   584     8 */
	struct percpu_ref          refcnt;               /*   592    16 */
	struct work_struct         release_work;         /*   608    72 */

	/* size: 680, cachelines: 11, members: 7 */
	/* sum members: 673, holes: 1, sum holes: 7 */
	/* last cacheline: 40 bytes */
};

Move 'ifdef CONFIG_CGROUP_BPF' to expose CGROUP_BPF_ATTACH_TYPE_INVALID
to non-cgroup (core) parts.
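
To make the scheme concrete, here is a stand-alone model of the slot
logic (the constant and the -E2BIG fallback mirror the patch below; the
attach_btf_id value and everything around main() are made up for
illustration, this is not kernel code):

#include <errno.h>
#include <stdio.h>

#define CGROUP_LSM_NUM 10

struct cgroup_lsm_atype {
	unsigned int attach_btf_id;	/* 0 == slot is free */
	int refcnt;
};

static struct cgroup_lsm_atype slots[CGROUP_LSM_NUM];

/* Reuse the slot already holding this attach_btf_id, else take the
 * first free one, else fail: at most 10 distinct LSM hooks can have
 * cgroup programs attached at the same time.
 */
static int atype_find(unsigned int attach_btf_id)
{
	int i;

	for (i = 0; i < CGROUP_LSM_NUM; i++)
		if (slots[i].attach_btf_id == attach_btf_id)
			return i;
	for (i = 0; i < CGROUP_LSM_NUM; i++)
		if (slots[i].attach_btf_id == 0)
			return i;
	return -E2BIG;
}

/* Each attachment takes a reference; the last detach frees the slot. */
static void atype_get(unsigned int attach_btf_id, int i)
{
	slots[i].attach_btf_id = attach_btf_id;
	slots[i].refcnt++;
}

static void atype_put(int i)
{
	if (--slots[i].refcnt <= 0)
		slots[i].attach_btf_id = 0;
}

int main(void)
{
	int i = atype_find(1234);	/* 1234: made-up attach_btf_id */

	if (i < 0) {
		printf("no free slot: %d\n", i);
		return 1;
	}
	atype_get(1234, i);
	printf("attach_btf_id 1234 -> slot %d\n", i);
	atype_put(i);
	return 0;
}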

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf-cgroup-defs.h |  3 +-
 include/linux/bpf.h             |  4 ++-
 include/linux/bpf_lsm.h         |  6 ----
 kernel/bpf/bpf_lsm.c            |  5 ---
 kernel/bpf/btf.c                | 10 ------
 kernel/bpf/cgroup.c             | 54 ++++++++++++++++++++++++++++++++-
 kernel/bpf/core.c               |  7 +++++
 kernel/bpf/trampoline.c         |  1 +
 8 files changed, 66 insertions(+), 24 deletions(-)

diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
index b99f8c3e37ea..7b121bd780eb 100644
--- a/include/linux/bpf-cgroup-defs.h
+++ b/include/linux/bpf-cgroup-defs.h
@@ -11,7 +11,8 @@
 struct bpf_prog_array;
 
 #ifdef CONFIG_BPF_LSM
-#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
+/* Maximum number of concurrently attachable per-cgroup LSM hooks. */
+#define CGROUP_LSM_NUM 10
 #else
 #define CGROUP_LSM_NUM 0
 #endif
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4dceb86229f6..503f28fa66d2 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2407,7 +2407,6 @@ int bpf_arch_text_invalidate(void *dst, size_t len);
 
 struct btf_id_set;
 bool btf_id_set_contains(const struct btf_id_set *set, u32 id);
-int btf_id_set_index(const struct btf_id_set *set, u32 id);
 
 #define MAX_BPRINTF_VARARGS		12
 
@@ -2444,4 +2443,7 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
 void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
 int bpf_dynptr_check_size(u32 size);
 
+void bpf_cgroup_atype_put(int cgroup_atype);
+void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
+
 #endif /* _LINUX_BPF_H */
diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
index 61787a5f6af9..4bcf76a9bb06 100644
--- a/include/linux/bpf_lsm.h
+++ b/include/linux/bpf_lsm.h
@@ -42,7 +42,6 @@ extern const struct bpf_func_proto bpf_inode_storage_get_proto;
 extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
 void bpf_inode_storage_free(struct inode *inode);
 
-int bpf_lsm_hook_idx(u32 btf_id);
 void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func);
 
 #else /* !CONFIG_BPF_LSM */
@@ -73,11 +72,6 @@ static inline void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
 {
 }
 
-static inline int bpf_lsm_hook_idx(u32 btf_id)
-{
-	return -EINVAL;
-}
-
 #endif /* CONFIG_BPF_LSM */
 
 #endif /* _LINUX_BPF_LSM_H */
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 0f72020bfdcf..83aa431dd52e 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -69,11 +69,6 @@ void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
 		*bpf_func = __cgroup_bpf_run_lsm_current;
 }
 
-int bpf_lsm_hook_idx(u32 btf_id)
-{
-	return btf_id_set_index(&bpf_lsm_hooks, btf_id);
-}
-
 int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
 			const struct bpf_prog *prog)
 {
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index ec070d7f2f7a..ed8099e7550a 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6842,16 +6842,6 @@ static int btf_id_cmp_func(const void *a, const void *b)
 	return *pa - *pb;
 }
 
-int btf_id_set_index(const struct btf_id_set *set, u32 id)
-{
-	const u32 *p;
-
-	p = bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func);
-	if (!p)
-		return -1;
-	return p - set->ids;
-}
-
 bool btf_id_set_contains(const struct btf_id_set *set, u32 id)
 {
 	return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL;
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index b0314889a409..ba402d50e130 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -128,12 +128,56 @@ unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
 }
 
 #ifdef CONFIG_BPF_LSM
+struct cgroup_lsm_atype {
+	u32 attach_btf_id;
+	int refcnt;
+};
+
+static struct cgroup_lsm_atype cgroup_lsm_atype[CGROUP_LSM_NUM];
+
 static enum cgroup_bpf_attach_type
 bpf_cgroup_atype_find(enum bpf_attach_type attach_type, u32 attach_btf_id)
 {
+	int i;
+
+	lockdep_assert_held(&cgroup_mutex);
+
 	if (attach_type != BPF_LSM_CGROUP)
 		return to_cgroup_bpf_attach_type(attach_type);
-	return CGROUP_LSM_START + bpf_lsm_hook_idx(attach_btf_id);
+
+	for (i = 0; i < ARRAY_SIZE(cgroup_lsm_atype); i++)
+		if (cgroup_lsm_atype[i].attach_btf_id == attach_btf_id)
+			return CGROUP_LSM_START + i;
+
+	for (i = 0; i < ARRAY_SIZE(cgroup_lsm_atype); i++)
+		if (cgroup_lsm_atype[i].attach_btf_id == 0)
+			return CGROUP_LSM_START + i;
+
+	return -E2BIG;
+
+}
+
+void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype)
+{
+	int i = cgroup_atype - CGROUP_LSM_START;
+
+	lockdep_assert_held(&cgroup_mutex);
+
+	WARN_ON_ONCE(cgroup_lsm_atype[i].attach_btf_id &&
+		     cgroup_lsm_atype[i].attach_btf_id != attach_btf_id);
+
+	cgroup_lsm_atype[i].attach_btf_id = attach_btf_id;
+	cgroup_lsm_atype[i].refcnt++;
+}
+
+void bpf_cgroup_atype_put(int cgroup_atype)
+{
+	int i = cgroup_atype - CGROUP_LSM_START;
+
+	mutex_lock(&cgroup_mutex);
+	if (--cgroup_lsm_atype[i].refcnt <= 0)
+		cgroup_lsm_atype[i].attach_btf_id = 0;
+	mutex_unlock(&cgroup_mutex);
 }
 #else
 static enum cgroup_bpf_attach_type
@@ -143,6 +187,14 @@ bpf_cgroup_atype_find(enum bpf_attach_type attach_type, u32 attach_btf_id)
 		return to_cgroup_bpf_attach_type(attach_type);
 	return -EOPNOTSUPP;
 }
+
+void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype)
+{
+}
+
+void bpf_cgroup_atype_put(int cgroup_atype)
+{
+}
 #endif /* CONFIG_BPF_LSM */
 
 void cgroup_bpf_offline(struct cgroup *cgrp)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 8d171eb0ed0d..0699098dc6bc 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -107,6 +107,9 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
 	fp->aux->prog = fp;
 	fp->jit_requested = ebpf_jit_enabled();
 	fp->blinding_requested = bpf_jit_blinding_enabled(fp);
+#ifdef CONFIG_CGROUP_BPF
+	aux->cgroup_atype = CGROUP_BPF_ATTACH_TYPE_INVALID;
+#endif
 
 	INIT_LIST_HEAD_RCU(&fp->aux->ksym.lnode);
 	mutex_init(&fp->aux->used_maps_mutex);
@@ -2554,6 +2557,10 @@ static void bpf_prog_free_deferred(struct work_struct *work)
 	aux = container_of(work, struct bpf_prog_aux, work);
 #ifdef CONFIG_BPF_SYSCALL
 	bpf_free_kfunc_btf_tab(aux->kfunc_btf_tab);
+#endif
+#ifdef CONFIG_CGROUP_BPF
+	if (aux->cgroup_atype != CGROUP_BPF_ATTACH_TYPE_INVALID)
+		bpf_cgroup_atype_put(aux->cgroup_atype);
 #endif
 	bpf_free_used_maps(aux);
 	bpf_free_used_btfs(aux);
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 023239a10e7c..e849dd243624 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -555,6 +555,7 @@ static struct bpf_shim_tramp_link *cgroup_shim_alloc(const struct bpf_prog *prog
 	bpf_prog_inc(p);
 	bpf_link_init(&shim_link->link.link, BPF_LINK_TYPE_UNSPEC,
 		      &bpf_shim_tramp_link_lops, p);
+	bpf_cgroup_atype_get(p->aux->attach_btf_id, cgroup_atype);
 
 	return shim_link;
 }
-- 
2.36.1.476.g0c4daa206d-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH bpf-next v9 05/10] bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP
  2022-06-10 16:57 [PATCH bpf-next v9 00/10] bpf: cgroup_sock lsm flavor Stanislav Fomichev
                   ` (3 preceding siblings ...)
  2022-06-10 16:57 ` [PATCH bpf-next v9 04/10] bpf: minimize number of allocated lsm slots per program Stanislav Fomichev
@ 2022-06-10 16:57 ` Stanislav Fomichev
  2022-06-17  0:58   ` Martin KaFai Lau
  2022-06-10 16:57 ` [PATCH bpf-next v9 06/10] bpf: expose bpf_{g,s}etsockopt to lsm cgroup Stanislav Fomichev
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Stanislav Fomichev @ 2022-06-10 16:57 UTC (permalink / raw)
  To: netdev, bpf; +Cc: ast, daniel, andrii, Stanislav Fomichev

We have two options:
1. Treat all BPF_LSM_CGROUP the same, regardless of attach_btf_id
2. Treat BPF_LSM_CGROUP+attach_btf_id as a separate hook point

I was doing (2) in the original patch, but switching to (1) here:

* bpf_prog_query returns all attached BPF_LSM_CGROUP programs
regardless of attach_btf_id
* attach_btf_id is exported via bpf_prog_info
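
For a consumer, the intended flow is to get program ids from
BPF_PROG_QUERY and then resolve the hook via the new bpf_prog_info
fields. A minimal userspace sketch (error handling trimmed; the
function name is made up):

#include <unistd.h>
#include <stdio.h>
#include <bpf/bpf.h>

static void print_lsm_cgroup_hook(__u32 prog_id)
{
	struct bpf_prog_info info = {};
	__u32 len = sizeof(info);
	int fd;

	fd = bpf_prog_get_fd_by_id(prog_id);
	if (fd < 0)
		return;
	if (!bpf_obj_get_info_by_fd(fd, &info, &len))
		printf("prog %u: attach_btf_obj_id=%u attach_btf_id=%u\n",
		       prog_id, info.attach_btf_obj_id, info.attach_btf_id);
	close(fd);
}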

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/uapi/linux/bpf.h |  3 ++
 kernel/bpf/cgroup.c      | 91 +++++++++++++++++++++++++++-------------
 kernel/bpf/syscall.c     |  8 +++-
 3 files changed, 73 insertions(+), 29 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index fa64b0b612fd..4271ef3c2afb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1432,6 +1432,7 @@ union bpf_attr {
 		__u32		attach_flags;
 		__aligned_u64	prog_ids;
 		__u32		prog_cnt;
+		__aligned_u64	prog_attach_flags; /* output: per-program attach_flags */
 	} query;
 
 	struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
@@ -5996,6 +5997,8 @@ struct bpf_prog_info {
 	__u64 run_cnt;
 	__u64 recursion_misses;
 	__u32 verified_insns;
+	__u32 attach_btf_obj_id;
+	__u32 attach_btf_id;
 } __attribute__((aligned(8)));
 
 struct bpf_map_info {
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index ba402d50e130..c869317479ec 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1029,57 +1029,92 @@ static int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 			      union bpf_attr __user *uattr)
 {
+	__u32 __user *prog_attach_flags = u64_to_user_ptr(attr->query.prog_attach_flags);
 	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
 	enum bpf_attach_type type = attr->query.attach_type;
+	enum cgroup_bpf_attach_type from_atype, to_atype;
 	enum cgroup_bpf_attach_type atype;
 	struct bpf_prog_array *effective;
 	struct hlist_head *progs;
 	struct bpf_prog *prog;
 	int cnt, ret = 0, i;
+	int total_cnt = 0;
 	u32 flags;
 
-	atype = to_cgroup_bpf_attach_type(type);
-	if (atype < 0)
-		return -EINVAL;
+	if (type == BPF_LSM_CGROUP) {
+		if (attr->query.prog_cnt && prog_ids && !prog_attach_flags)
+			return -EINVAL;
 
-	progs = &cgrp->bpf.progs[atype];
-	flags = cgrp->bpf.flags[atype];
+		from_atype = CGROUP_LSM_START;
+		to_atype = CGROUP_LSM_END;
+		flags = 0;
+	} else {
+		from_atype = to_cgroup_bpf_attach_type(type);
+		if (from_atype < 0)
+			return -EINVAL;
+		to_atype = from_atype;
+		flags = cgrp->bpf.flags[from_atype];
+	}
 
-	effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
-					      lockdep_is_held(&cgroup_mutex));
+	for (atype = from_atype; atype <= to_atype; atype++) {
+		progs = &cgrp->bpf.progs[atype];
 
-	if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE)
-		cnt = bpf_prog_array_length(effective);
-	else
-		cnt = prog_list_length(progs);
+		if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE) {
+			effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
+							      lockdep_is_held(&cgroup_mutex));
+			total_cnt += bpf_prog_array_length(effective);
+		} else {
+			total_cnt += prog_list_length(progs);
+		}
+	}
 
 	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
 		return -EFAULT;
-	if (copy_to_user(&uattr->query.prog_cnt, &cnt, sizeof(cnt)))
+	if (copy_to_user(&uattr->query.prog_cnt, &total_cnt, sizeof(total_cnt)))
 		return -EFAULT;
-	if (attr->query.prog_cnt == 0 || !prog_ids || !cnt)
+	if (attr->query.prog_cnt == 0 || !prog_ids || !total_cnt)
 		/* return early if user requested only program count + flags */
 		return 0;
-	if (attr->query.prog_cnt < cnt) {
-		cnt = attr->query.prog_cnt;
+
+	if (attr->query.prog_cnt < total_cnt) {
+		total_cnt = attr->query.prog_cnt;
 		ret = -ENOSPC;
 	}
 
-	if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE) {
-		return bpf_prog_array_copy_to_user(effective, prog_ids, cnt);
-	} else {
-		struct bpf_prog_list *pl;
-		u32 id;
+	for (atype = from_atype; atype <= to_atype && total_cnt; atype++) {
+		progs = &cgrp->bpf.progs[atype];
+		flags = cgrp->bpf.flags[atype];
 
-		i = 0;
-		hlist_for_each_entry(pl, progs, node) {
-			prog = prog_list_prog(pl);
-			id = prog->aux->id;
-			if (copy_to_user(prog_ids + i, &id, sizeof(id)))
-				return -EFAULT;
-			if (++i == cnt)
-				break;
+		if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE) {
+			effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
+							      lockdep_is_held(&cgroup_mutex));
+			cnt = min_t(int, bpf_prog_array_length(effective), total_cnt);
+			ret = bpf_prog_array_copy_to_user(effective, prog_ids, cnt);
+		} else {
+			struct bpf_prog_list *pl;
+			u32 id;
+
+			cnt = min_t(int, prog_list_length(progs), total_cnt);
+			i = 0;
+			hlist_for_each_entry(pl, progs, node) {
+				prog = prog_list_prog(pl);
+				id = prog->aux->id;
+				if (copy_to_user(prog_ids + i, &id, sizeof(id)))
+					return -EFAULT;
+				if (++i == cnt)
+					break;
+			}
 		}
+
+		if (prog_attach_flags) {
+			for (i = 0; i < cnt; i++)
+				if (copy_to_user(prog_attach_flags + i, &flags, sizeof(flags)))
+					return -EFAULT;
+			prog_attach_flags += cnt;
+		}
+
+		prog_ids += cnt;
+		total_cnt -= cnt;
 	}
 	return ret;
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a237be4f8bb3..b826247de971 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3520,7 +3520,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	}
 }
 
-#define BPF_PROG_QUERY_LAST_FIELD query.prog_cnt
+#define BPF_PROG_QUERY_LAST_FIELD query.prog_attach_flags
 
 static int bpf_prog_query(const union bpf_attr *attr,
 			  union bpf_attr __user *uattr)
@@ -3556,6 +3556,7 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	case BPF_CGROUP_SYSCTL:
 	case BPF_CGROUP_GETSOCKOPT:
 	case BPF_CGROUP_SETSOCKOPT:
+	case BPF_LSM_CGROUP:
 		return cgroup_bpf_prog_query(attr, uattr);
 	case BPF_LIRC_MODE2:
 		return lirc_prog_query(attr, uattr);
@@ -4066,6 +4067,11 @@ static int bpf_prog_get_info_by_fd(struct file *file,
 
 	if (prog->aux->btf)
 		info.btf_id = btf_obj_id(prog->aux->btf);
+	info.attach_btf_id = prog->aux->attach_btf_id;
+	if (prog->aux->attach_btf)
+		info.attach_btf_obj_id = btf_obj_id(prog->aux->attach_btf);
+	else if (prog->aux->dst_prog)
+		info.attach_btf_obj_id = btf_obj_id(prog->aux->dst_prog->aux->attach_btf);
 
 	ulen = info.nr_func_info;
 	info.nr_func_info = prog->aux->func_info_cnt;
-- 
2.36.1.476.g0c4daa206d-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH bpf-next v9 06/10] bpf: expose bpf_{g,s}etsockopt to lsm cgroup
  2022-06-10 16:57 [PATCH bpf-next v9 00/10] bpf: cgroup_sock lsm flavor Stanislav Fomichev
                   ` (4 preceding siblings ...)
  2022-06-10 16:57 ` [PATCH bpf-next v9 05/10] bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP Stanislav Fomichev
@ 2022-06-10 16:57 ` Stanislav Fomichev
  2022-06-17  5:42   ` Martin KaFai Lau
  2022-06-10 16:58 ` [PATCH bpf-next v9 07/10] libbpf: add lsm_cgoup_sock type Stanislav Fomichev
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Stanislav Fomichev @ 2022-06-10 16:57 UTC (permalink / raw)
  To: netdev, bpf; +Cc: ast, daniel, andrii, Stanislav Fomichev

I don't see how to make it nice without introducing btf id lists
for the hooks where these helpers are allowed. Some LSM hooks
work on locked sockets, some trigger early and
don't grab any locks, so have two lists for now:

1. LSM hooks which trigger under socket lock - minority of the hooks,
   but ideal case for us, we can expose existing BTF-based helpers
2. LSM hooks which trigger without socket lock, but they trigger
   early in the socket creation path where it should be safe to
   do setsockopt without any locks
3. The rest are prohibited. I'm thinking that this use-case might
   be a good gateway to sleeping lsm cgroup hooks in the future.
   We can either expose lock/unlock operations (and add tracking
   to the verifier) or have another set of bpf_setsockopt
   wrappers that grab the locks and might sleep.
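
For illustration, a minimal lsm_cgroup program using one of the
"unlocked" hooks might look like the sketch below (the priority value
is arbitrary; the full version lives in the selftest later in the
series):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#define SOL_SOCKET	1
#define SO_PRIORITY	12

char _license[] SEC("license") = "GPL";

SEC("lsm_cgroup/socket_post_create")
int BPF_PROG(set_default_prio, struct socket *sock, int family,
	     int type, int protocol, int kern)
{
	struct sock *sk = sock->sk;
	int prio = 1;

	if (!sk)
		return 1; /* allow */

	/* socket_post_create runs without the socket lock, but before
	 * the socket is visible to userspace, so bpf_setsockopt() here
	 * resolves to the new unlocked helper variant.
	 */
	if (bpf_setsockopt(sk, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio)))
		return 0; /* reject the socket */

	return 1; /* allow */
}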

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf.h  |  2 ++
 kernel/bpf/bpf_lsm.c | 40 +++++++++++++++++++++++++++++
 net/core/filter.c    | 60 ++++++++++++++++++++++++++++++++++++++------
 3 files changed, 95 insertions(+), 7 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 503f28fa66d2..c0a269269882 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2282,6 +2282,8 @@ extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
 extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
 extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
 extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
+extern const struct bpf_func_proto bpf_unlocked_sk_setsockopt_proto;
+extern const struct bpf_func_proto bpf_unlocked_sk_getsockopt_proto;
 extern const struct bpf_func_proto bpf_kallsyms_lookup_name_proto;
 extern const struct bpf_func_proto bpf_find_vma_proto;
 extern const struct bpf_func_proto bpf_loop_proto;
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 83aa431dd52e..52b6e3067986 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -45,6 +45,26 @@ BTF_ID(func, bpf_lsm_sk_alloc_security)
 BTF_ID(func, bpf_lsm_sk_free_security)
 BTF_SET_END(bpf_lsm_current_hooks)
 
+/* List of LSM hooks that trigger while the socket is properly locked.
+ */
+BTF_SET_START(bpf_lsm_locked_sockopt_hooks)
+BTF_ID(func, bpf_lsm_socket_sock_rcv_skb)
+BTF_ID(func, bpf_lsm_sk_clone_security)
+BTF_ID(func, bpf_lsm_sock_graft)
+BTF_ID(func, bpf_lsm_inet_csk_clone)
+BTF_ID(func, bpf_lsm_inet_conn_established)
+BTF_ID(func, bpf_lsm_sctp_bind_connect)
+BTF_SET_END(bpf_lsm_locked_sockopt_hooks)
+
+/* List of LSM hooks that trigger while the socket is _not_ locked,
+ * but it's ok to call bpf_{g,s}etsockopt because the socket is still
+ * in the early init phase.
+ */
+BTF_SET_START(bpf_lsm_unlocked_sockopt_hooks)
+BTF_ID(func, bpf_lsm_socket_post_create)
+BTF_ID(func, bpf_lsm_socket_socketpair)
+BTF_SET_END(bpf_lsm_unlocked_sockopt_hooks)
+
 void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
 			     bpf_func_t *bpf_func)
 {
@@ -201,6 +221,26 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_get_retval:
 		return prog->expected_attach_type == BPF_LSM_CGROUP ?
 			&bpf_get_retval_proto : NULL;
+	case BPF_FUNC_setsockopt:
+		if (prog->expected_attach_type != BPF_LSM_CGROUP)
+			return NULL;
+		if (btf_id_set_contains(&bpf_lsm_locked_sockopt_hooks,
+					prog->aux->attach_btf_id))
+			return &bpf_sk_setsockopt_proto;
+		if (btf_id_set_contains(&bpf_lsm_unlocked_sockopt_hooks,
+					prog->aux->attach_btf_id))
+			return &bpf_unlocked_sk_setsockopt_proto;
+		return NULL;
+	case BPF_FUNC_getsockopt:
+		if (prog->expected_attach_type != BPF_LSM_CGROUP)
+			return NULL;
+		if (btf_id_set_contains(&bpf_lsm_locked_sockopt_hooks,
+					prog->aux->attach_btf_id))
+			return &bpf_sk_getsockopt_proto;
+		if (btf_id_set_contains(&bpf_lsm_unlocked_sockopt_hooks,
+					prog->aux->attach_btf_id))
+			return &bpf_unlocked_sk_getsockopt_proto;
+		return NULL;
 	default:
 		return tracing_prog_func_proto(func_id, prog);
 	}
diff --git a/net/core/filter.c b/net/core/filter.c
index 5af58eb48587..57cc22e3750e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5012,8 +5012,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
 	.arg1_type      = ARG_PTR_TO_CTX,
 };
 
-static int _bpf_setsockopt(struct sock *sk, int level, int optname,
-			   char *optval, int optlen)
+static int __bpf_setsockopt(struct sock *sk, int level, int optname,
+			    char *optval, int optlen)
 {
 	char devname[IFNAMSIZ];
 	int val, valbool;
@@ -5024,8 +5024,6 @@ static int _bpf_setsockopt(struct sock *sk, int level, int optname,
 	if (!sk_fullsock(sk))
 		return -EINVAL;
 
-	sock_owned_by_me(sk);
-
 	if (level == SOL_SOCKET) {
 		if (optlen != sizeof(int) && optname != SO_BINDTODEVICE)
 			return -EINVAL;
@@ -5258,14 +5256,20 @@ static int _bpf_setsockopt(struct sock *sk, int level, int optname,
 	return ret;
 }
 
-static int _bpf_getsockopt(struct sock *sk, int level, int optname,
+static int _bpf_setsockopt(struct sock *sk, int level, int optname,
 			   char *optval, int optlen)
+{
+	if (sk_fullsock(sk))
+		sock_owned_by_me(sk);
+	return __bpf_setsockopt(sk, level, optname, optval, optlen);
+}
+
+static int __bpf_getsockopt(struct sock *sk, int level, int optname,
+			    char *optval, int optlen)
 {
 	if (!sk_fullsock(sk))
 		goto err_clear;
 
-	sock_owned_by_me(sk);
-
 	if (level == SOL_SOCKET) {
 		if (optlen != sizeof(int))
 			goto err_clear;
@@ -5360,6 +5364,14 @@ static int _bpf_getsockopt(struct sock *sk, int level, int optname,
 	return -EINVAL;
 }
 
+static int _bpf_getsockopt(struct sock *sk, int level, int optname,
+			   char *optval, int optlen)
+{
+	if (sk_fullsock(sk))
+		sock_owned_by_me(sk);
+	return __bpf_getsockopt(sk, level, optname, optval, optlen);
+}
+
 BPF_CALL_5(bpf_sk_setsockopt, struct sock *, sk, int, level,
 	   int, optname, char *, optval, int, optlen)
 {
@@ -5400,6 +5412,40 @@ const struct bpf_func_proto bpf_sk_getsockopt_proto = {
 	.arg5_type	= ARG_CONST_SIZE,
 };
 
+BPF_CALL_5(bpf_unlocked_sk_setsockopt, struct sock *, sk, int, level,
+	   int, optname, char *, optval, int, optlen)
+{
+	return __bpf_setsockopt(sk, level, optname, optval, optlen);
+}
+
+const struct bpf_func_proto bpf_unlocked_sk_setsockopt_proto = {
+	.func		= bpf_unlocked_sk_setsockopt,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_MEM | MEM_RDONLY,
+	.arg5_type	= ARG_CONST_SIZE,
+};
+
+BPF_CALL_5(bpf_unlocked_sk_getsockopt, struct sock *, sk, int, level,
+	   int, optname, char *, optval, int, optlen)
+{
+	return __bpf_getsockopt(sk, level, optname, optval, optlen);
+}
+
+const struct bpf_func_proto bpf_unlocked_sk_getsockopt_proto = {
+	.func		= bpf_unlocked_sk_getsockopt,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg5_type	= ARG_CONST_SIZE,
+};
+
 BPF_CALL_5(bpf_sock_addr_setsockopt, struct bpf_sock_addr_kern *, ctx,
 	   int, level, int, optname, char *, optval, int, optlen)
 {
-- 
2.36.1.476.g0c4daa206d-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH bpf-next v9 07/10] libbpf: add lsm_cgoup_sock type
  2022-06-10 16:57 [PATCH bpf-next v9 00/10] bpf: cgroup_sock lsm flavor Stanislav Fomichev
                   ` (5 preceding siblings ...)
  2022-06-10 16:57 ` [PATCH bpf-next v9 06/10] bpf: expose bpf_{g,s}etsockopt to lsm cgroup Stanislav Fomichev
@ 2022-06-10 16:58 ` Stanislav Fomichev
  2022-06-10 16:58 ` [PATCH bpf-next v9 08/10] libbpf: implement bpf_prog_query_opts Stanislav Fomichev
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Stanislav Fomichev @ 2022-06-10 16:58 UTC (permalink / raw)
  To: netdev, bpf; +Cc: ast, daniel, andrii, Stanislav Fomichev

lsm_cgroup/ is the prefix for BPF_LSM_CGROUP.
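
A program for, e.g., the socket_bind hook can then be declared as in
the minimal sketch below; attaching it still goes through a cgroup fd
(bpf_link_create()/bpf_prog_attach() with BPF_LSM_CGROUP), as the
selftest later in the series does:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

SEC("lsm_cgroup/socket_bind")
int BPF_PROG(socket_bind, struct socket *sock, struct sockaddr *address,
	     int addrlen)
{
	return 1; /* allow */
}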

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/lib/bpf/libbpf.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 0781fae58a06..cb1720de1b65 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -107,6 +107,7 @@ static const char * const attach_type_name[] = {
 	[BPF_TRACE_FEXIT]		= "trace_fexit",
 	[BPF_MODIFY_RETURN]		= "modify_return",
 	[BPF_LSM_MAC]			= "lsm_mac",
+	[BPF_LSM_CGROUP]		= "lsm_cgroup",
 	[BPF_SK_LOOKUP]			= "sk_lookup",
 	[BPF_TRACE_ITER]		= "trace_iter",
 	[BPF_XDP_DEVMAP]		= "xdp_devmap",
@@ -9201,6 +9202,7 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("freplace+",		EXT, 0, SEC_ATTACH_BTF, attach_trace),
 	SEC_DEF("lsm+",			LSM, BPF_LSM_MAC, SEC_ATTACH_BTF, attach_lsm),
 	SEC_DEF("lsm.s+",		LSM, BPF_LSM_MAC, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_lsm),
+	SEC_DEF("lsm_cgroup+",		LSM, BPF_LSM_CGROUP, SEC_ATTACH_BTF),
 	SEC_DEF("iter+",		TRACING, BPF_TRACE_ITER, SEC_ATTACH_BTF, attach_iter),
 	SEC_DEF("iter.s+",		TRACING, BPF_TRACE_ITER, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_iter),
 	SEC_DEF("syscall",		SYSCALL, 0, SEC_SLEEPABLE),
@@ -9654,6 +9656,7 @@ void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type,
 		*kind = BTF_KIND_TYPEDEF;
 		break;
 	case BPF_LSM_MAC:
+	case BPF_LSM_CGROUP:
 		*prefix = BTF_LSM_PREFIX;
 		*kind = BTF_KIND_FUNC;
 		break;
-- 
2.36.1.476.g0c4daa206d-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH bpf-next v9 08/10] libbpf: implement bpf_prog_query_opts
  2022-06-10 16:57 [PATCH bpf-next v9 00/10] bpf: cgroup_sock lsm flavor Stanislav Fomichev
                   ` (6 preceding siblings ...)
  2022-06-10 16:58 ` [PATCH bpf-next v9 07/10] libbpf: add lsm_cgoup_sock type Stanislav Fomichev
@ 2022-06-10 16:58 ` Stanislav Fomichev
  2022-06-10 16:58 ` [PATCH bpf-next v9 09/10] bpftool: implement cgroup tree for BPF_LSM_CGROUP Stanislav Fomichev
  2022-06-10 16:58 ` [PATCH bpf-next v9 10/10] selftests/bpf: lsm_cgroup functional test Stanislav Fomichev
  9 siblings, 0 replies; 31+ messages in thread
From: Stanislav Fomichev @ 2022-06-10 16:58 UTC (permalink / raw)
  To: netdev, bpf; +Cc: ast, daniel, andrii, Stanislav Fomichev

Implement bpf_prog_query_opts as a more extensible version of
bpf_prog_query. Expose the new prog_attach_flags and attach_btf_id as
well:

* prog_attach_flags is a per-program attach_flags value; relevant only
  for lsm cgroup programs which might have different attach_flags
  per attach_btf_id
* attach_btf_id (together with attach_btf_obj_id) is a new field
  exposed via bpf_prog_info which specifies the real btf function id
  for lsm cgroup attachments
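
A minimal usage sketch (cgroup_fd and the fixed array size are
placeholders; error handling trimmed):

#include <stdio.h>
#include <bpf/bpf.h>

static void dump_lsm_cgroup_progs(int cgroup_fd)
{
	__u32 prog_ids[16] = {}, prog_flags[16] = {};
	LIBBPF_OPTS(bpf_prog_query_opts, opts,
		.prog_ids = prog_ids,
		.prog_attach_flags = prog_flags,
		.prog_cnt = 16,
	);

	if (bpf_prog_query_opts(cgroup_fd, BPF_LSM_CGROUP, &opts))
		return;

	for (__u32 i = 0; i < opts.prog_cnt; i++)
		printf("prog id %u, attach_flags %#x\n",
		       prog_ids[i], prog_flags[i]);
}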

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/include/uapi/linux/bpf.h |  3 +++
 tools/lib/bpf/bpf.c            | 38 +++++++++++++++++++++++++++-------
 tools/lib/bpf/bpf.h            | 15 ++++++++++++++
 tools/lib/bpf/libbpf.map       |  1 +
 4 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index fa64b0b612fd..4271ef3c2afb 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1432,6 +1432,7 @@ union bpf_attr {
 		__u32		attach_flags;
 		__aligned_u64	prog_ids;
 		__u32		prog_cnt;
+		__aligned_u64	prog_attach_flags; /* output: per-program attach_flags */
 	} query;
 
 	struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
@@ -5996,6 +5997,8 @@ struct bpf_prog_info {
 	__u64 run_cnt;
 	__u64 recursion_misses;
 	__u32 verified_insns;
+	__u32 attach_btf_obj_id;
+	__u32 attach_btf_id;
 } __attribute__((aligned(8)));
 
 struct bpf_map_info {
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 240186aac8e6..accc97cf9928 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -888,24 +888,48 @@ int bpf_iter_create(int link_fd)
 	return libbpf_err_errno(fd);
 }
 
-int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags,
-		   __u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt)
+int bpf_prog_query_opts(int target_fd,
+			enum bpf_attach_type type,
+			struct bpf_prog_query_opts *opts)
 {
 	union bpf_attr attr;
 	int ret;
 
+	if (!OPTS_VALID(opts, bpf_prog_query_opts))
+		return libbpf_err(-EINVAL);
+
 	memset(&attr, 0, sizeof(attr));
+
 	attr.query.target_fd	= target_fd;
 	attr.query.attach_type	= type;
-	attr.query.query_flags	= query_flags;
-	attr.query.prog_cnt	= *prog_cnt;
-	attr.query.prog_ids	= ptr_to_u64(prog_ids);
+	attr.query.query_flags	= OPTS_GET(opts, query_flags, 0);
+	attr.query.prog_cnt	= OPTS_GET(opts, prog_cnt, 0);
+	attr.query.prog_ids	= ptr_to_u64(OPTS_GET(opts, prog_ids, NULL));
+	attr.query.prog_attach_flags = ptr_to_u64(OPTS_GET(opts, prog_attach_flags, NULL));
 
 	ret = sys_bpf(BPF_PROG_QUERY, &attr, sizeof(attr));
 
+	OPTS_SET(opts, attach_flags, attr.query.attach_flags);
+	OPTS_SET(opts, prog_cnt, attr.query.prog_cnt);
+
+	return libbpf_err_errno(ret);
+}
+
+int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags,
+		   __u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt)
+{
+	LIBBPF_OPTS(bpf_prog_query_opts, opts);
+	int ret;
+
+	opts.query_flags = query_flags;
+	opts.prog_ids = prog_ids;
+	opts.prog_cnt = *prog_cnt;
+
+	ret = bpf_prog_query_opts(target_fd, type, &opts);
+
 	if (attach_flags)
-		*attach_flags = attr.query.attach_flags;
-	*prog_cnt = attr.query.prog_cnt;
+		*attach_flags = opts.attach_flags;
+	*prog_cnt = opts.prog_cnt;
 
 	return libbpf_err_errno(ret);
 }
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index cabc03703e29..e8f70ce6b537 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -442,9 +442,24 @@ LIBBPF_API int bpf_map_get_fd_by_id(__u32 id);
 LIBBPF_API int bpf_btf_get_fd_by_id(__u32 id);
 LIBBPF_API int bpf_link_get_fd_by_id(__u32 id);
 LIBBPF_API int bpf_obj_get_info_by_fd(int bpf_fd, void *info, __u32 *info_len);
+
+struct bpf_prog_query_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+	__u32 query_flags;
+	__u32 attach_flags; /* output argument */
+	__u32 *prog_ids;
+	__u32 prog_cnt; /* input+output argument */
+	__u32 *prog_attach_flags;
+};
+#define bpf_prog_query_opts__last_field prog_attach_flags
+
+LIBBPF_API int bpf_prog_query_opts(int target_fd,
+				   enum bpf_attach_type type,
+				   struct bpf_prog_query_opts *opts);
 LIBBPF_API int bpf_prog_query(int target_fd, enum bpf_attach_type type,
 			      __u32 query_flags, __u32 *attach_flags,
 			      __u32 *prog_ids, __u32 *prog_cnt);
+
 LIBBPF_API int bpf_raw_tracepoint_open(const char *name, int prog_fd);
 LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf,
 				 __u32 *buf_len, __u32 *prog_id, __u32 *fd_type,
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 116a2a8ee7c2..03c69cb821b3 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -462,6 +462,7 @@ LIBBPF_0.8.0 {
 
 LIBBPF_1.0.0 {
 	global:
+		bpf_prog_query_opts;
 		btf__add_enum64;
 		btf__add_enum64_value;
 		libbpf_bpf_attach_type_str;
-- 
2.36.1.476.g0c4daa206d-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH bpf-next v9 09/10] bpftool: implement cgroup tree for BPF_LSM_CGROUP
  2022-06-10 16:57 [PATCH bpf-next v9 00/10] bpf: cgroup_sock lsm flavor Stanislav Fomichev
                   ` (7 preceding siblings ...)
  2022-06-10 16:58 ` [PATCH bpf-next v9 08/10] libbpf: implement bpf_prog_query_opts Stanislav Fomichev
@ 2022-06-10 16:58 ` Stanislav Fomichev
  2022-06-13 12:07   ` Quentin Monnet
  2022-06-17  5:58   ` Martin KaFai Lau
  2022-06-10 16:58 ` [PATCH bpf-next v9 10/10] selftests/bpf: lsm_cgroup functional test Stanislav Fomichev
  9 siblings, 2 replies; 31+ messages in thread
From: Stanislav Fomichev @ 2022-06-10 16:58 UTC (permalink / raw)
  To: netdev, bpf; +Cc: ast, daniel, andrii, Stanislav Fomichev

$ bpftool --nomount prog loadall $KDIR/tools/testing/selftests/bpf/lsm_cgroup.o /sys/fs/bpf/x
$ bpftool cgroup attach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_alloc
$ bpftool cgroup attach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_bind
$ bpftool cgroup attach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_clone
$ bpftool cgroup attach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_post_create
$ bpftool cgroup tree
CgroupPath
ID       AttachType      AttachFlags     Name
/sys/fs/cgroup
6        lsm_cgroup                      socket_post_create bpf_lsm_socket_post_create
8        lsm_cgroup                      socket_bind     bpf_lsm_socket_bind
10       lsm_cgroup                      socket_alloc    bpf_lsm_sk_alloc_security
11       lsm_cgroup                      socket_clone    bpf_lsm_inet_csk_clone

$ bpftool cgroup detach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_post_create
$ bpftool cgroup tree
CgroupPath
ID       AttachType      AttachFlags     Name
/sys/fs/cgroup
8        lsm_cgroup                      socket_bind     bpf_lsm_socket_bind
10       lsm_cgroup                      socket_alloc    bpf_lsm_sk_alloc_security
11       lsm_cgroup                      socket_clone    bpf_lsm_inet_csk_clone

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/bpf/bpftool/cgroup.c | 80 +++++++++++++++++++++++++++-----------
 1 file changed, 58 insertions(+), 22 deletions(-)

diff --git a/tools/bpf/bpftool/cgroup.c b/tools/bpf/bpftool/cgroup.c
index 42421fe47a58..6e55f583a62f 100644
--- a/tools/bpf/bpftool/cgroup.c
+++ b/tools/bpf/bpftool/cgroup.c
@@ -15,6 +15,7 @@
 #include <unistd.h>
 
 #include <bpf/bpf.h>
+#include <bpf/btf.h>
 
 #include "main.h"
 
@@ -36,6 +37,7 @@
 	"                        cgroup_inet_sock_release }"
 
 static unsigned int query_flags;
+static struct btf *btf_vmlinux;
 
 static enum bpf_attach_type parse_attach_type(const char *str)
 {
@@ -69,6 +71,7 @@ static int show_bpf_prog(int id, enum bpf_attach_type attach_type,
 			 int level)
 {
 	char prog_name[MAX_PROG_FULL_NAME];
+	const char *attach_btf_name = NULL;
 	struct bpf_prog_info info = {};
 	const char *attach_type_str;
 	__u32 info_len = sizeof(info);
@@ -84,6 +87,19 @@ static int show_bpf_prog(int id, enum bpf_attach_type attach_type,
 	}
 
 	attach_type_str = libbpf_bpf_attach_type_str(attach_type);
+
+	if (btf_vmlinux &&
+	    info.attach_btf_id < btf__type_cnt(btf_vmlinux)) {
+		/* Note, we ignore info.attach_btf_obj_id for now. There
+		 * is no good way to resolve btf_id to vmlinux
+		 * or module btf.
+		 */
+		const struct btf_type *t = btf__type_by_id(btf_vmlinux,
+							   info.attach_btf_id);
+		attach_btf_name = btf__name_by_offset(btf_vmlinux,
+						      t->name_off);
+	}
+
 	get_prog_full_name(&info, prog_fd, prog_name, sizeof(prog_name));
 	if (json_output) {
 		jsonw_start_object(json_wtr);
@@ -95,6 +111,10 @@ static int show_bpf_prog(int id, enum bpf_attach_type attach_type,
 		jsonw_string_field(json_wtr, "attach_flags",
 				   attach_flags_str);
 		jsonw_string_field(json_wtr, "name", prog_name);
+		if (attach_btf_name)
+			jsonw_string_field(json_wtr, "attach_btf_name", attach_btf_name);
+		jsonw_uint_field(json_wtr, "attach_btf_obj_id", info.attach_btf_obj_id);
+		jsonw_uint_field(json_wtr, "attach_btf_id", info.attach_btf_id);
 		jsonw_end_object(json_wtr);
 	} else {
 		printf("%s%-8u ", level ? "    " : "", info.id);
@@ -102,7 +122,13 @@ static int show_bpf_prog(int id, enum bpf_attach_type attach_type,
 			printf("%-15s", attach_type_str);
 		else
 			printf("type %-10u", attach_type);
-		printf(" %-15s %-15s\n", attach_flags_str, prog_name);
+		printf(" %-15s %-15s", attach_flags_str, prog_name);
+		if (attach_btf_name)
+			printf(" %-15s", attach_btf_name);
+		else if (info.attach_btf_id)
+			printf(" attach_btf_obj_id=%d attach_btf_id=%d",
+			       info.attach_btf_obj_id, info.attach_btf_id);
+		printf("\n");
 	}
 
 	close(prog_fd);
@@ -144,40 +170,49 @@ static int cgroup_has_attached_progs(int cgroup_fd)
 static int show_attached_bpf_progs(int cgroup_fd, enum bpf_attach_type type,
 				   int level)
 {
+	LIBBPF_OPTS(bpf_prog_query_opts, p);
+	__u32 prog_attach_flags[1024] = {0};
 	const char *attach_flags_str;
 	__u32 prog_ids[1024] = {0};
-	__u32 prog_cnt, iter;
-	__u32 attach_flags;
 	char buf[32];
+	__u32 iter;
 	int ret;
 
-	prog_cnt = ARRAY_SIZE(prog_ids);
-	ret = bpf_prog_query(cgroup_fd, type, query_flags, &attach_flags,
-			     prog_ids, &prog_cnt);
+	p.query_flags = query_flags;
+	p.prog_cnt = ARRAY_SIZE(prog_ids);
+	p.prog_ids = prog_ids;
+	p.prog_attach_flags = prog_attach_flags;
+
+	ret = bpf_prog_query_opts(cgroup_fd, type, &p);
 	if (ret)
 		return ret;
 
-	if (prog_cnt == 0)
+	if (p.prog_cnt == 0)
 		return 0;
 
-	switch (attach_flags) {
-	case BPF_F_ALLOW_MULTI:
-		attach_flags_str = "multi";
-		break;
-	case BPF_F_ALLOW_OVERRIDE:
-		attach_flags_str = "override";
-		break;
-	case 0:
-		attach_flags_str = "";
-		break;
-	default:
-		snprintf(buf, sizeof(buf), "unknown(%x)", attach_flags);
-		attach_flags_str = buf;
-	}
+	for (iter = 0; iter < p.prog_cnt; iter++) {
+		__u32 attach_flags;
+
+		attach_flags = prog_attach_flags[iter] ?: p.attach_flags;
+
+		switch (attach_flags) {
+		case BPF_F_ALLOW_MULTI:
+			attach_flags_str = "multi";
+			break;
+		case BPF_F_ALLOW_OVERRIDE:
+			attach_flags_str = "override";
+			break;
+		case 0:
+			attach_flags_str = "";
+			break;
+		default:
+			snprintf(buf, sizeof(buf), "unknown(%x)", attach_flags);
+			attach_flags_str = buf;
+		}
 
-	for (iter = 0; iter < prog_cnt; iter++)
 		show_bpf_prog(prog_ids[iter], type,
 			      attach_flags_str, level);
+	}
 
 	return 0;
 }
@@ -542,5 +577,6 @@ static const struct cmd cmds[] = {
 
 int do_cgroup(int argc, char **argv)
 {
+	btf_vmlinux = libbpf_find_kernel_btf();
 	return cmd_select(cmds, argc, argv, do_help);
 }
-- 
2.36.1.476.g0c4daa206d-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH bpf-next v9 10/10] selftests/bpf: lsm_cgroup functional test
  2022-06-10 16:57 [PATCH bpf-next v9 00/10] bpf: cgroup_sock lsm flavor Stanislav Fomichev
                   ` (8 preceding siblings ...)
  2022-06-10 16:58 ` [PATCH bpf-next v9 09/10] bpftool: implement cgroup tree for BPF_LSM_CGROUP Stanislav Fomichev
@ 2022-06-10 16:58 ` Stanislav Fomichev
  9 siblings, 0 replies; 31+ messages in thread
From: Stanislav Fomichev @ 2022-06-10 16:58 UTC (permalink / raw)
  To: netdev, bpf; +Cc: ast, daniel, andrii, Stanislav Fomichev

Functional test that exercises the following:

1. apply default sk_priority policy
2. permit TX-only AF_PACKET socket
3. cgroup attach/detach/replace
4. reusing trampoline shim

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../selftests/bpf/prog_tests/lsm_cgroup.c     | 277 ++++++++++++++++++
 .../selftests/bpf/progs/bpf_tracing_net.h     |   1 +
 .../testing/selftests/bpf/progs/lsm_cgroup.c  | 180 ++++++++++++
 3 files changed, 458 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/lsm_cgroup.c
 create mode 100644 tools/testing/selftests/bpf/progs/lsm_cgroup.c

diff --git a/tools/testing/selftests/bpf/prog_tests/lsm_cgroup.c b/tools/testing/selftests/bpf/prog_tests/lsm_cgroup.c
new file mode 100644
index 000000000000..a96057ec7dd4
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/lsm_cgroup.c
@@ -0,0 +1,277 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <test_progs.h>
+#include <bpf/btf.h>
+
+#include "lsm_cgroup.skel.h"
+#include "cgroup_helpers.h"
+#include "network_helpers.h"
+
+static __u32 query_prog_cnt(int cgroup_fd, const char *attach_func)
+{
+	LIBBPF_OPTS(bpf_prog_query_opts, p);
+	static struct btf *btf;
+	int cnt = 0;
+	int i;
+
+	ASSERT_OK(bpf_prog_query_opts(cgroup_fd, BPF_LSM_CGROUP, &p), "prog_query");
+
+	if (!attach_func)
+		return p.prog_cnt;
+
+	/* When attach_func is provided, count the number of progs that
+	 * attach to the given symbol.
+	 */
+
+	if (!btf)
+		btf = btf__load_vmlinux_btf();
+	if (!ASSERT_OK(libbpf_get_error(btf), "btf_vmlinux"))
+		return -1;
+
+	p.prog_ids = malloc(sizeof(u32) * p.prog_cnt);
+	p.prog_attach_flags = malloc(sizeof(u32) * p.prog_cnt);
+	ASSERT_OK(bpf_prog_query_opts(cgroup_fd, BPF_LSM_CGROUP, &p), "prog_query");
+
+	for (i = 0; i < p.prog_cnt; i++) {
+		struct bpf_prog_info info = {};
+		__u32 info_len = sizeof(info);
+		int fd;
+
+		fd = bpf_prog_get_fd_by_id(p.prog_ids[i]);
+		ASSERT_GE(fd, 0, "prog_get_fd_by_id");
+		ASSERT_OK(bpf_obj_get_info_by_fd(fd, &info, &info_len), "prog_info_by_fd");
+		close(fd);
+
+		if (info.attach_btf_id ==
+		    btf__find_by_name_kind(btf, attach_func, BTF_KIND_FUNC))
+			cnt++;
+	}
+
+	return cnt;
+}
+
+static void test_lsm_cgroup_functional(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_prog_attach_opts, attach_opts);
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
+	int cgroup_fd, cgroup_fd2, err, fd, prio;
+	int listen_fd, client_fd, accepted_fd;
+	struct lsm_cgroup *skel = NULL;
+	int post_create_prog_fd2 = -1;
+	int post_create_prog_fd = -1;
+	int bind_link_fd2 = -1;
+	int bind_prog_fd2 = -1;
+	int alloc_prog_fd = -1;
+	int bind_prog_fd = -1;
+	int bind_link_fd = -1;
+	int clone_prog_fd = -1;
+	socklen_t socklen;
+
+	cgroup_fd = test__join_cgroup("/sock_policy");
+	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
+		goto close_skel;
+
+	cgroup_fd2 = create_and_get_cgroup("/sock_policy2");
+	if (!ASSERT_GE(cgroup_fd2, 0, "create second cgroup"))
+		goto close_skel;
+
+	skel = lsm_cgroup__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_and_load"))
+		goto close_cgroup;
+
+	post_create_prog_fd = bpf_program__fd(skel->progs.socket_post_create);
+	post_create_prog_fd2 = bpf_program__fd(skel->progs.socket_post_create2);
+	bind_prog_fd = bpf_program__fd(skel->progs.socket_bind);
+	bind_prog_fd2 = bpf_program__fd(skel->progs.socket_bind2);
+	alloc_prog_fd = bpf_program__fd(skel->progs.socket_alloc);
+	clone_prog_fd = bpf_program__fd(skel->progs.socket_clone);
+
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, "bpf_lsm_sk_alloc_security"), 0, "prog count");
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, NULL), 0, "total prog count");
+	err = bpf_prog_attach(alloc_prog_fd, cgroup_fd, BPF_LSM_CGROUP, 0);
+	if (!ASSERT_OK(err, "attach alloc_prog_fd"))
+		goto detach_cgroup;
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, "bpf_lsm_sk_alloc_security"), 1, "prog count");
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, NULL), 1, "total prog count");
+
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, "bpf_lsm_inet_csk_clone"), 0, "prog count");
+	err = bpf_prog_attach(clone_prog_fd, cgroup_fd, BPF_LSM_CGROUP, 0);
+	if (!ASSERT_OK(err, "attach clone_prog_fd"))
+		goto detach_cgroup;
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, "bpf_lsm_inet_csk_clone"), 1, "prog count");
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, NULL), 2, "total prog count");
+
+	/* Make sure replacing works. */
+
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, "bpf_lsm_socket_post_create"), 0, "prog count");
+	err = bpf_prog_attach(post_create_prog_fd, cgroup_fd,
+			      BPF_LSM_CGROUP, 0);
+	if (!ASSERT_OK(err, "attach post_create_prog_fd"))
+		goto close_cgroup;
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, "bpf_lsm_socket_post_create"), 1, "prog count");
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, NULL), 3, "total prog count");
+
+	attach_opts.replace_prog_fd = post_create_prog_fd;
+	err = bpf_prog_attach_opts(post_create_prog_fd2, cgroup_fd,
+				   BPF_LSM_CGROUP, &attach_opts);
+	if (!ASSERT_OK(err, "prog replace post_create_prog_fd"))
+		goto detach_cgroup;
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, "bpf_lsm_socket_post_create"), 1, "prog count");
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, NULL), 3, "total prog count");
+
+	/* Try the same attach/replace via link API. */
+
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, "bpf_lsm_socket_bind"), 0, "prog count");
+	bind_link_fd = bpf_link_create(bind_prog_fd, cgroup_fd,
+				       BPF_LSM_CGROUP, NULL);
+	if (!ASSERT_GE(bind_link_fd, 0, "link create bind_prog_fd"))
+		goto detach_cgroup;
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, "bpf_lsm_socket_bind"), 1, "prog count");
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, NULL), 4, "total prog count");
+
+	update_opts.old_prog_fd = bind_prog_fd;
+	update_opts.flags = BPF_F_REPLACE;
+
+	err = bpf_link_update(bind_link_fd, bind_prog_fd2, &update_opts);
+	if (!ASSERT_OK(err, "link update bind_prog_fd"))
+		goto detach_cgroup;
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, "bpf_lsm_socket_bind"), 1, "prog count");
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, NULL), 4, "total prog count");
+
+	/* Attach another instance of bind program to another cgroup.
+	 * This should trigger the reuse of the trampoline shim (two
+	 * programs attaching to the same btf_id).
+	 */
+
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, "bpf_lsm_socket_bind"), 1, "prog count");
+	ASSERT_EQ(query_prog_cnt(cgroup_fd2, "bpf_lsm_socket_bind"), 0, "prog count");
+	bind_link_fd2 = bpf_link_create(bind_prog_fd2, cgroup_fd2,
+					BPF_LSM_CGROUP, NULL);
+	if (!ASSERT_GE(bind_link_fd2, 0, "link create bind_prog_fd2"))
+		goto detach_cgroup;
+	ASSERT_EQ(query_prog_cnt(cgroup_fd2, "bpf_lsm_socket_bind"), 1, "prog count");
+	ASSERT_EQ(query_prog_cnt(cgroup_fd, NULL), 4, "total prog count");
+	ASSERT_EQ(query_prog_cnt(cgroup_fd2, NULL), 1, "total prog count");
+
+	/* AF_UNIX is prohibited. */
+
+	fd = socket(AF_UNIX, SOCK_STREAM, 0);
+	ASSERT_LT(fd, 0, "socket(AF_UNIX)");
+
+	/* AF_INET6 gets default policy (sk_priority). */
+
+	fd = socket(AF_INET6, SOCK_STREAM, 0);
+	if (!ASSERT_GE(fd, 0, "socket(SOCK_STREAM)"))
+		goto detach_cgroup;
+
+	prio = 0;
+	socklen = sizeof(prio);
+	ASSERT_GE(getsockopt(fd, SOL_SOCKET, SO_PRIORITY, &prio, &socklen), 0,
+		  "getsockopt");
+	ASSERT_EQ(prio, 123, "sk_priority");
+
+	close(fd);
+
+	/* TX-only AF_PACKET is allowed. */
+
+	ASSERT_LT(socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)), 0,
+		  "socket(AF_PACKET, ..., ETH_P_ALL)");
+
+	fd = socket(AF_PACKET, SOCK_RAW, 0);
+	ASSERT_GE(fd, 0, "socket(AF_PACKET, ..., 0)");
+
+	/* TX-only AF_PACKET can not be rebound. */
+
+	struct sockaddr_ll sa = {
+		.sll_family = AF_PACKET,
+		.sll_protocol = htons(ETH_P_ALL),
+	};
+	ASSERT_LT(bind(fd, (struct sockaddr *)&sa, sizeof(sa)), 0,
+		  "bind(ETH_P_ALL)");
+
+	close(fd);
+
+	/* Trigger passive open. */
+
+	listen_fd = start_server(AF_INET6, SOCK_STREAM, "::1", 0, 0);
+	ASSERT_GE(listen_fd, 0, "start_server");
+	client_fd = connect_to_fd(listen_fd, 0);
+	ASSERT_GE(client_fd, 0, "connect_to_fd");
+	accepted_fd = accept(listen_fd, NULL, NULL);
+	ASSERT_GE(accepted_fd, 0, "accept");
+
+	prio = 0;
+	socklen = sizeof(prio);
+	ASSERT_GE(getsockopt(accepted_fd, SOL_SOCKET, SO_PRIORITY, &prio, &socklen), 0,
+		  "getsockopt");
+	ASSERT_EQ(prio, 234, "sk_priority");
+
+	/* These are replaced and never called. */
+	ASSERT_EQ(skel->bss->called_socket_post_create, 0, "called_create");
+	ASSERT_EQ(skel->bss->called_socket_bind, 0, "called_bind");
+
+	/* AF_INET6+SOCK_STREAM
+	 * AF_PACKET+SOCK_RAW
+	 * listen_fd
+	 * client_fd
+	 * accepted_fd
+	 */
+	ASSERT_EQ(skel->bss->called_socket_post_create2, 5, "called_create2");
+
+	/* start_server
+	 * bind(ETH_P_ALL)
+	 */
+	ASSERT_EQ(skel->bss->called_socket_bind2, 2, "called_bind2");
+	/* Single accept(). */
+	ASSERT_EQ(skel->bss->called_socket_clone, 1, "called_clone");
+
+	/* AF_UNIX+SOCK_STREAM (failed)
+	 * AF_INET6+SOCK_STREAM
+	 * AF_PACKET+SOCK_RAW (failed)
+	 * AF_PACKET+SOCK_RAW
+	 * listen_fd
+	 * client_fd
+	 * accepted_fd
+	 */
+	ASSERT_EQ(skel->bss->called_socket_alloc, 7, "called_alloc");
+
+	/* Make sure other cgroup doesn't trigger the programs. */
+
+	if (!ASSERT_OK(join_cgroup(""), "join root cgroup"))
+		goto detach_cgroup;
+
+	fd = socket(AF_INET6, SOCK_STREAM, 0);
+	if (!ASSERT_GE(fd, 0, "socket(SOCK_STREAM)"))
+		goto detach_cgroup;
+
+	prio = 0;
+	socklen = sizeof(prio);
+	ASSERT_GE(getsockopt(fd, SOL_SOCKET, SO_PRIORITY, &prio, &socklen), 0,
+		  "getsockopt");
+	ASSERT_EQ(prio, 0, "sk_priority");
+
+	close(fd);
+
+detach_cgroup:
+	ASSERT_GE(bpf_prog_detach2(post_create_prog_fd2, cgroup_fd,
+				   BPF_LSM_CGROUP), 0, "detach_create");
+	close(bind_link_fd);
+	/* Don't close bind_link_fd2, exercise cgroup release cleanup. */
+	ASSERT_GE(bpf_prog_detach2(alloc_prog_fd, cgroup_fd,
+				   BPF_LSM_CGROUP), 0, "detach_alloc");
+	ASSERT_GE(bpf_prog_detach2(clone_prog_fd, cgroup_fd,
+				   BPF_LSM_CGROUP), 0, "detach_clone");
+
+close_cgroup:
+	close(cgroup_fd);
+close_skel:
+	lsm_cgroup__destroy(skel);
+}
+
+void test_lsm_cgroup(void)
+{
+	if (test__start_subtest("functional"))
+		test_lsm_cgroup_functional();
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
index 1c1289ba5fc5..98dd2c4815f0 100644
--- a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
+++ b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
@@ -8,6 +8,7 @@
 #define SOL_SOCKET		1
 #define SO_SNDBUF		7
 #define __SO_ACCEPTCON		(1 << 16)
+#define SO_PRIORITY		12
 
 #define SOL_TCP			6
 #define TCP_CONGESTION		13
diff --git a/tools/testing/selftests/bpf/progs/lsm_cgroup.c b/tools/testing/selftests/bpf/progs/lsm_cgroup.c
new file mode 100644
index 000000000000..89f3b1e961a8
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/lsm_cgroup.c
@@ -0,0 +1,180 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include "bpf_tracing_net.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+#ifndef AF_PACKET
+#define AF_PACKET 17
+#endif
+
+#ifndef AF_UNIX
+#define AF_UNIX 1
+#endif
+
+#ifndef EPERM
+#define EPERM 1
+#endif
+
+struct {
+	__uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
+	__type(key, __u64);
+	__type(value, __u64);
+} cgroup_storage SEC(".maps");
+
+int called_socket_post_create;
+int called_socket_post_create2;
+int called_socket_bind;
+int called_socket_bind2;
+int called_socket_alloc;
+int called_socket_clone;
+
+static __always_inline int test_local_storage(void)
+{
+	__u64 *val;
+
+	val = bpf_get_local_storage(&cgroup_storage, 0);
+	if (!val)
+		return 0;
+	*val += 1;
+
+	return 1;
+}
+
+static __always_inline int real_create(struct socket *sock, int family,
+				       int protocol)
+{
+	struct sock *sk;
+	int prio = 123;
+
+	/* Reject non-tx-only AF_PACKET. */
+	if (family == AF_PACKET && protocol != 0)
+		return 0; /* EPERM */
+
+	sk = sock->sk;
+	if (!sk)
+		return 1;
+
+	/* The rest of the sockets get default policy. */
+	if (bpf_setsockopt(sk, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio)))
+		return 0; /* EPERM */
+
+	/* Make sure bpf_getsockopt is allowed and works. */
+	prio = 0;
+	if (bpf_getsockopt(sk, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio)))
+		return 0; /* EPERM */
+	if (prio != 123)
+		return 0; /* EPERM */
+
+	/* Can access cgroup local storage. */
+	if (!test_local_storage())
+		return 0; /* EPERM */
+
+	return 1;
+}
+
+/* __cgroup_bpf_run_lsm_socket */
+SEC("lsm_cgroup/socket_post_create")
+int BPF_PROG(socket_post_create, struct socket *sock, int family,
+	     int type, int protocol, int kern)
+{
+	called_socket_post_create++;
+	return real_create(sock, family, protocol);
+}
+
+/* __cgroup_bpf_run_lsm_socket */
+SEC("lsm_cgroup/socket_post_create")
+int BPF_PROG(socket_post_create2, struct socket *sock, int family,
+	     int type, int protocol, int kern)
+{
+	called_socket_post_create2++;
+	return real_create(sock, family, protocol);
+}
+
+static __always_inline int real_bind(struct socket *sock,
+				     struct sockaddr *address,
+				     int addrlen)
+{
+	struct sockaddr_ll sa = {};
+
+	if (sock->sk->__sk_common.skc_family != AF_PACKET)
+		return 1;
+
+	if (sock->sk->sk_kern_sock)
+		return 1;
+
+	bpf_probe_read_kernel(&sa, sizeof(sa), address);
+	if (sa.sll_protocol)
+		return 0; /* EPERM */
+
+	/* Can access cgroup local storage. */
+	if (!test_local_storage())
+		return 0; /* EPERM */
+
+	return 1;
+}
+
+/* __cgroup_bpf_run_lsm_socket */
+SEC("lsm_cgroup/socket_bind")
+int BPF_PROG(socket_bind, struct socket *sock, struct sockaddr *address,
+	     int addrlen)
+{
+	called_socket_bind++;
+	return real_bind(sock, address, addrlen);
+}
+
+/* __cgroup_bpf_run_lsm_socket */
+SEC("lsm_cgroup/socket_bind")
+int BPF_PROG(socket_bind2, struct socket *sock, struct sockaddr *address,
+	     int addrlen)
+{
+	called_socket_bind2++;
+	return real_bind(sock, address, addrlen);
+}
+
+/* __cgroup_bpf_run_lsm_current (via bpf_lsm_current_hooks) */
+SEC("lsm_cgroup/sk_alloc_security")
+int BPF_PROG(socket_alloc, struct sock *sk, int family, gfp_t priority)
+{
+	called_socket_alloc++;
+	if (family == AF_UNIX)
+		return 0; /* EPERM */
+
+	/* Can access cgroup local storage. */
+	if (!test_local_storage())
+		return 0; /* EPERM */
+
+	return 1;
+}
+
+/* __cgroup_bpf_run_lsm_sock */
+SEC("lsm_cgroup/inet_csk_clone")
+int BPF_PROG(socket_clone, struct sock *newsk, const struct request_sock *req)
+{
+	int prio = 234;
+
+	called_socket_clone++;
+
+	if (!newsk)
+		return 1;
+
+	/* Accepted request sockets get a different priority. */
+	if (bpf_setsockopt(newsk, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio)))
+		return 0; /* EPERM */
+
+	/* Make sure bpf_getsockopt is allowed and works. */
+	prio = 0;
+	if (bpf_getsockopt(newsk, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio)))
+		return 0; /* EPERM */
+	if (prio != 234)
+		return 0; /* EPERM */
+
+	/* Can access cgroup local storage. */
+	if (!test_local_storage())
+		return 0; /* EPERM */
+
+	return 1;
+}
-- 
2.36.1.476.g0c4daa206d-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH bpf-next v9 04/10] bpf: minimize number of allocated lsm slots per program
  2022-06-10 16:57 ` [PATCH bpf-next v9 04/10] bpf: minimize number of allocated lsm slots per program Stanislav Fomichev
@ 2022-06-11 16:53   ` kernel test robot
  2022-06-17  0:43   ` Martin KaFai Lau
  1 sibling, 0 replies; 31+ messages in thread
From: kernel test robot @ 2022-06-11 16:53 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev, bpf
  Cc: llvm, kbuild-all, ast, daniel, andrii, Stanislav Fomichev

Hi Stanislav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Stanislav-Fomichev/bpf-cgroup_sock-lsm-flavor/20220611-010241
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: i386-randconfig-a015 (https://download.01.org/0day-ci/archive/20220612/202206120010.PClasXCc-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project ff4abe755279a3a47cc416ef80dbc900d9a98a19)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/ca53950e781503c3d62454a19b4d0395dbd79dd7
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Stanislav-Fomichev/bpf-cgroup_sock-lsm-flavor/20220611-010241
        git checkout ca53950e781503c3d62454a19b4d0395dbd79dd7
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> ld.lld: error: undefined symbol: bpf_cgroup_atype_get
   >>> referenced by trampoline.c:558 (kernel/bpf/trampoline.c:558)
   >>>               bpf/trampoline.o:(bpf_trampoline_link_cgroup_shim) in archive kernel/built-in.a

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH bpf-next v9 09/10] bpftool: implement cgroup tree for BPF_LSM_CGROUP
  2022-06-10 16:58 ` [PATCH bpf-next v9 09/10] bpftool: implement cgroup tree for BPF_LSM_CGROUP Stanislav Fomichev
@ 2022-06-13 12:07   ` Quentin Monnet
  2022-06-13 15:53     ` Stanislav Fomichev
  2022-06-17  5:58   ` Martin KaFai Lau
  1 sibling, 1 reply; 31+ messages in thread
From: Quentin Monnet @ 2022-06-13 12:07 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev, bpf; +Cc: ast, daniel, andrii

2022-06-10 09:58 UTC-0700 ~ Stanislav Fomichev <sdf@google.com>
> $ bpftool --nomount prog loadall $KDIR/tools/testing/selftests/bpf/lsm_cgroup.o /sys/fs/bpf/x
> $ bpftool cgroup attach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_alloc
> $ bpftool cgroup attach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_bind
> $ bpftool cgroup attach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_clone
> $ bpftool cgroup attach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_post_create
> $ bpftool cgroup tree
> CgroupPath
> ID       AttachType      AttachFlags     Name
> /sys/fs/cgroup
> 6        lsm_cgroup                      socket_post_create bpf_lsm_socket_post_create
> 8        lsm_cgroup                      socket_bind     bpf_lsm_socket_bind
> 10       lsm_cgroup                      socket_alloc    bpf_lsm_sk_alloc_security
> 11       lsm_cgroup                      socket_clone    bpf_lsm_inet_csk_clone
> 
> $ bpftool cgroup detach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_post_create
> $ bpftool cgroup tree
> CgroupPath
> ID       AttachType      AttachFlags     Name
> /sys/fs/cgroup
> 8        lsm_cgroup                      socket_bind     bpf_lsm_socket_bind
> 10       lsm_cgroup                      socket_alloc    bpf_lsm_sk_alloc_security
> 11       lsm_cgroup                      socket_clone    bpf_lsm_inet_csk_clone
> 
> Signed-off-by: Stanislav Fomichev <sdf@google.com>

The changes for bpftool look good to me, thanks!
Reviewed-by: Quentin Monnet <quentin@isovalent.com>

> ---
>  tools/bpf/bpftool/cgroup.c | 80 +++++++++++++++++++++++++++-----------
>  1 file changed, 58 insertions(+), 22 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/cgroup.c b/tools/bpf/bpftool/cgroup.c
> index 42421fe47a58..6e55f583a62f 100644
> --- a/tools/bpf/bpftool/cgroup.c
> +++ b/tools/bpf/bpftool/cgroup.c

> @@ -542,5 +577,6 @@ static const struct cmd cmds[] = {
>  
>  int do_cgroup(int argc, char **argv)
>  {
> +	btf_vmlinux = libbpf_find_kernel_btf();
>  	return cmd_select(cmds, argc, argv, do_help);
>  }

This is not required for all "bpftool cgroup" operations (attach/detach
don't need it I think), but should be inexpensive, right?

Quentin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH bpf-next v9 09/10] bpftool: implement cgroup tree for BPF_LSM_CGROUP
  2022-06-13 12:07   ` Quentin Monnet
@ 2022-06-13 15:53     ` Stanislav Fomichev
  0 siblings, 0 replies; 31+ messages in thread
From: Stanislav Fomichev @ 2022-06-13 15:53 UTC (permalink / raw)
  To: Quentin Monnet; +Cc: netdev, bpf, ast, daniel, andrii

On Mon, Jun 13, 2022 at 5:08 AM Quentin Monnet <quentin@isovalent.com> wrote:
>
> 2022-06-10 09:58 UTC-0700 ~ Stanislav Fomichev <sdf@google.com>
> > $ bpftool --nomount prog loadall $KDIR/tools/testing/selftests/bpf/lsm_cgroup.o /sys/fs/bpf/x
> > $ bpftool cgroup attach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_alloc
> > $ bpftool cgroup attach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_bind
> > $ bpftool cgroup attach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_clone
> > $ bpftool cgroup attach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_post_create
> > $ bpftool cgroup tree
> > CgroupPath
> > ID       AttachType      AttachFlags     Name
> > /sys/fs/cgroup
> > 6        lsm_cgroup                      socket_post_create bpf_lsm_socket_post_create
> > 8        lsm_cgroup                      socket_bind     bpf_lsm_socket_bind
> > 10       lsm_cgroup                      socket_alloc    bpf_lsm_sk_alloc_security
> > 11       lsm_cgroup                      socket_clone    bpf_lsm_inet_csk_clone
> >
> > $ bpftool cgroup detach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_post_create
> > $ bpftool cgroup tree
> > CgroupPath
> > ID       AttachType      AttachFlags     Name
> > /sys/fs/cgroup
> > 8        lsm_cgroup                      socket_bind     bpf_lsm_socket_bind
> > 10       lsm_cgroup                      socket_alloc    bpf_lsm_sk_alloc_security
> > 11       lsm_cgroup                      socket_clone    bpf_lsm_inet_csk_clone
> >
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
>
> The changes for bpftool look good to me, thanks!
> Reviewed-by: Quentin Monnet <quentin@isovalent.com>

Thank you for the review!

> > ---
> >  tools/bpf/bpftool/cgroup.c | 80 +++++++++++++++++++++++++++-----------
> >  1 file changed, 58 insertions(+), 22 deletions(-)
> >
> > diff --git a/tools/bpf/bpftool/cgroup.c b/tools/bpf/bpftool/cgroup.c
> > index 42421fe47a58..6e55f583a62f 100644
> > --- a/tools/bpf/bpftool/cgroup.c
> > +++ b/tools/bpf/bpftool/cgroup.c
>
> > @@ -542,5 +577,6 @@ static const struct cmd cmds[] = {
> >
> >  int do_cgroup(int argc, char **argv)
> >  {
> > +     btf_vmlinux = libbpf_find_kernel_btf();
> >       return cmd_select(cmds, argc, argv, do_help);
> >  }
>
> This is not required for all "bpftool cgroup" operations (attach/detach
> don't need it I think), but should be inexpensive, right?

Good point, let's move it close to the place where we use it
regardless of whether it's expensive or not.
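
Something along these lines might be enough (just a sketch; assuming
show_bpf_prog() stays the only consumer of btf_vmlinux):

	if (!btf_vmlinux)
		btf_vmlinux = libbpf_find_kernel_btf();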

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH bpf-next v9 01/10] bpf: add bpf_func_t and trampoline helpers
  2022-06-10 16:57 ` [PATCH bpf-next v9 01/10] bpf: add bpf_func_t and trampoline helpers Stanislav Fomichev
@ 2022-06-16 19:53   ` Martin KaFai Lau
  0 siblings, 0 replies; 31+ messages in thread
From: Martin KaFai Lau @ 2022-06-16 19:53 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii

On Fri, Jun 10, 2022 at 09:57:54AM -0700, Stanislav Fomichev wrote:
> I'll be adding lsm cgroup specific helpers that grab
> trampoline mutex.
> 
> No functional changes.
> 
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Martin KaFai Lau <kafai@fb.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH bpf-next v9 02/10] bpf: convert cgroup_bpf.progs to hlist
  2022-06-10 16:57 ` [PATCH bpf-next v9 02/10] bpf: convert cgroup_bpf.progs to hlist Stanislav Fomichev
@ 2022-06-16 19:59   ` Martin KaFai Lau
  0 siblings, 0 replies; 31+ messages in thread
From: Martin KaFai Lau @ 2022-06-16 19:59 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii, Jakub Sitnicki

On Fri, Jun 10, 2022 at 09:57:55AM -0700, Stanislav Fomichev wrote:
> This lets us reclaim some space to be used by new cgroup lsm slots.
> 
> Before:
> struct cgroup_bpf {
> 	struct bpf_prog_array *    effective[23];        /*     0   184 */
> 	/* --- cacheline 2 boundary (128 bytes) was 56 bytes ago --- */
> 	struct list_head           progs[23];            /*   184   368 */
> 	/* --- cacheline 8 boundary (512 bytes) was 40 bytes ago --- */
> 	u32                        flags[23];            /*   552    92 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	/* --- cacheline 10 boundary (640 bytes) was 8 bytes ago --- */
> 	struct list_head           storages;             /*   648    16 */
> 	struct bpf_prog_array *    inactive;             /*   664     8 */
> 	struct percpu_ref          refcnt;               /*   672    16 */
> 	struct work_struct         release_work;         /*   688    32 */
> 
> 	/* size: 720, cachelines: 12, members: 7 */
> 	/* sum members: 716, holes: 1, sum holes: 4 */
> 	/* last cacheline: 16 bytes */
> };
> 
> After:
> struct cgroup_bpf {
> 	struct bpf_prog_array *    effective[23];        /*     0   184 */
> 	/* --- cacheline 2 boundary (128 bytes) was 56 bytes ago --- */
> 	struct hlist_head          progs[23];            /*   184   184 */
> 	/* --- cacheline 5 boundary (320 bytes) was 48 bytes ago --- */
> 	u8                         flags[23];            /*   368    23 */
> 
> 	/* XXX 1 byte hole, try to pack */
> 
> 	/* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
> 	struct list_head           storages;             /*   392    16 */
> 	struct bpf_prog_array *    inactive;             /*   408     8 */
> 	struct percpu_ref          refcnt;               /*   416    16 */
> 	struct work_struct         release_work;         /*   432    72 */
> 
> 	/* size: 504, cachelines: 8, members: 7 */
> 	/* sum members: 503, holes: 1, sum holes: 1 */
> 	/* last cacheline: 56 bytes */
> };
> 
> Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Martin KaFai Lau <kafai@fb.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH bpf-next v9 03/10] bpf: per-cgroup lsm flavor
  2022-06-10 16:57 ` [PATCH bpf-next v9 03/10] bpf: per-cgroup lsm flavor Stanislav Fomichev
@ 2022-06-16 22:25   ` Martin KaFai Lau
  2022-06-17 18:28     ` Stanislav Fomichev
  0 siblings, 1 reply; 31+ messages in thread
From: Martin KaFai Lau @ 2022-06-16 22:25 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii

On Fri, Jun 10, 2022 at 09:57:56AM -0700, Stanislav Fomichev wrote:
> Allow attaching to lsm hooks in the cgroup context.
> 
> Attaching to per-cgroup LSM works exactly like attaching
> to other per-cgroup hooks. New BPF_LSM_CGROUP is added
> to trigger new mode; the actual lsm hook we attach to is
> signaled via existing attach_btf_id.
> 
> For the hooks that have 'struct socket' or 'struct sock' as its first
> argument, we use the cgroup associated with that socket. For the rest,
> we use 'current' cgroup (this is all on default hierarchy == v2 only).
> Note that for some hooks that work on 'struct sock' we still
> take the cgroup from 'current' because some of them work on the socket
> that hasn't been properly initialized yet.
> 
> Behind the scenes, we allocate a shim program that is attached
> to the trampoline and runs cgroup effective BPF programs array.
> This shim has some rudimentary ref counting and can be shared
> between several programs attaching to the same per-cgroup lsm hook.
nit. The 'per-cgroup' may be read as each cgroup has its own shim.
It will be useful to rephrase it a little.

> 
> Note that this patch bloats cgroup size because we add 211
> cgroup_bpf_attach_type(s) for simplicity sake. This will be
> addressed in the subsequent patch.
> 
> Also note that we only add non-sleepable flavor for now. To enable
> sleepable use-cases, bpf_prog_run_array_cg has to grab trace rcu,
> shim programs have to be freed via trace rcu, cgroup_bpf.effective
> should be also trace-rcu-managed + maybe some other changes that
> I'm not aware of.
> 
> Signed-off-by: Stanislav Fomichev <sdf@google.com>

[ ... ]

> @@ -1840,10 +1850,8 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
>  	emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
>  	/* arg3: lea rdx, [rbp - run_ctx_off] */
>  	EMIT4(0x48, 0x8D, 0x55, -run_ctx_off);
> -	if (emit_call(&prog,
> -		      p->aux->sleepable ? __bpf_prog_exit_sleepable :
> -		      __bpf_prog_exit, prog))
> -			return -EINVAL;
> +	if (emit_call(&prog, exit, prog))
> +		return -EINVAL;
>  
>  	*pprog = prog;
>  	return 0;
> diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> index 5d268e76d8e6..b99f8c3e37ea 100644
> --- a/include/linux/bpf-cgroup-defs.h
> +++ b/include/linux/bpf-cgroup-defs.h
> @@ -10,6 +10,12 @@
>  
>  struct bpf_prog_array;
>  
> +#ifdef CONFIG_BPF_LSM
> +#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
> +#else
> +#define CGROUP_LSM_NUM 0
> +#endif
> +
>  enum cgroup_bpf_attach_type {
>  	CGROUP_BPF_ATTACH_TYPE_INVALID = -1,
>  	CGROUP_INET_INGRESS = 0,
> @@ -35,6 +41,8 @@ enum cgroup_bpf_attach_type {
>  	CGROUP_INET4_GETSOCKNAME,
>  	CGROUP_INET6_GETSOCKNAME,
>  	CGROUP_INET_SOCK_RELEASE,
> +	CGROUP_LSM_START,
> +	CGROUP_LSM_END = CGROUP_LSM_START + CGROUP_LSM_NUM - 1,
Likely a dumb question, just in case, presumably everything should be fine when
CGROUP_LSM_NUM is 0 ?

>  	MAX_CGROUP_BPF_ATTACH_TYPE
>  };
>  

[ ... ]

> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 4adb4f3ecb7f..b0314889a409 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -14,6 +14,8 @@
>  #include <linux/string.h>
>  #include <linux/bpf.h>
>  #include <linux/bpf-cgroup.h>
> +#include <linux/bpf_lsm.h>
> +#include <linux/bpf_verifier.h>
>  #include <net/sock.h>
>  #include <net/bpf_sk_storage.h>
>  
> @@ -61,6 +63,88 @@ bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
>  	return run_ctx.retval;
>  }
>  
> +unsigned int __cgroup_bpf_run_lsm_sock(const void *ctx,
> +				       const struct bpf_insn *insn)
> +{
> +	const struct bpf_prog *shim_prog;
> +	struct sock *sk;
> +	struct cgroup *cgrp;
> +	int ret = 0;
> +	u64 *args;
> +
> +	args = (u64 *)ctx;
> +	sk = (void *)(unsigned long)args[0];
> +	/*shim_prog = container_of(insn, struct bpf_prog, insnsi);*/
> +	shim_prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> +
> +	cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> +	if (likely(cgrp))
> +		ret = bpf_prog_run_array_cg(&cgrp->bpf,
> +					    shim_prog->aux->cgroup_atype,
> +					    ctx, bpf_prog_run, 0, NULL);
> +	return ret;
> +}
> +
> +unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
> +					 const struct bpf_insn *insn)
> +{
> +	const struct bpf_prog *shim_prog;
> +	struct socket *sock;
> +	struct cgroup *cgrp;
> +	int ret = 0;
> +	u64 *args;
> +
> +	args = (u64 *)ctx;
> +	sock = (void *)(unsigned long)args[0];
> +	/*shim_prog = container_of(insn, struct bpf_prog, insnsi);*/
> +	shim_prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> +
> +	cgrp = sock_cgroup_ptr(&sock->sk->sk_cgrp_data);
> +	if (likely(cgrp))
> +		ret = bpf_prog_run_array_cg(&cgrp->bpf,
> +					    shim_prog->aux->cgroup_atype,
> +					    ctx, bpf_prog_run, 0, NULL);
> +	return ret;
> +}
> +
> +unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
> +					  const struct bpf_insn *insn)
> +{
> +	const struct bpf_prog *shim_prog;
> +	struct cgroup *cgrp;
> +	int ret = 0;
> +
> +	/*shim_prog = container_of(insn, struct bpf_prog, insnsi);*/
> +	shim_prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> +
> +	rcu_read_lock();
nit. Not needed.  Maybe add a comment that it has already been acquired
in __bpf_prog_enter_lsm_cgroup().  For sleepable in the future,
rcu_read_lock() cannot be done here either.

> +	cgrp = task_dfl_cgroup(current);
> +	if (likely(cgrp))
> +		ret = bpf_prog_run_array_cg(&cgrp->bpf,
> +					    shim_prog->aux->cgroup_atype,
> +					    ctx, bpf_prog_run, 0, NULL);
> +	rcu_read_unlock();
> +	return ret;
> +}
> +
> +#ifdef CONFIG_BPF_LSM
> +static enum cgroup_bpf_attach_type
> +bpf_cgroup_atype_find(enum bpf_attach_type attach_type, u32 attach_btf_id)
> +{
> +	if (attach_type != BPF_LSM_CGROUP)
> +		return to_cgroup_bpf_attach_type(attach_type);
> +	return CGROUP_LSM_START + bpf_lsm_hook_idx(attach_btf_id);
> +}
> +#else
> +static enum cgroup_bpf_attach_type
> +bpf_cgroup_atype_find(enum bpf_attach_type attach_type, u32 attach_btf_id)
> +{
> +	if (attach_type != BPF_LSM_CGROUP)
> +		return to_cgroup_bpf_attach_type(attach_type);
> +	return -EOPNOTSUPP;
> +}
> +#endif /* CONFIG_BPF_LSM */
> +
>  void cgroup_bpf_offline(struct cgroup *cgrp)
>  {
>  	cgroup_get(cgrp);
> @@ -163,10 +247,20 @@ static void cgroup_bpf_release(struct work_struct *work)
>  
>  		hlist_for_each_entry_safe(pl, pltmp, progs, node) {
>  			hlist_del(&pl->node);
> -			if (pl->prog)
> +			if (pl->prog) {
> +#ifdef CONFIG_BPF_LSM
This should not be needed as it is not needed in __cgroup_bpf_attach() below.

> +				if (pl->prog->expected_attach_type == BPF_LSM_CGROUP)
> +					bpf_trampoline_unlink_cgroup_shim(pl->prog);
> +#endif
>  				bpf_prog_put(pl->prog);
> -			if (pl->link)
> +			}
> +			if (pl->link) {
> +#ifdef CONFIG_BPF_LSM
Same here.

> +				if (pl->link->link.prog->expected_attach_type == BPF_LSM_CGROUP)
> +					bpf_trampoline_unlink_cgroup_shim(pl->link->link.prog);
> +#endif
>  				bpf_cgroup_link_auto_detach(pl->link);
> +			}
>  			kfree(pl);
>  			static_branch_dec(&cgroup_bpf_enabled_key[atype]);
>  		}
> @@ -479,6 +573,7 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>  	struct bpf_prog *old_prog = NULL;
>  	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
>  	struct bpf_cgroup_storage *new_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
> +	struct bpf_prog *new_prog = prog ? : link->link.prog;
>  	enum cgroup_bpf_attach_type atype;
>  	struct bpf_prog_list *pl;
>  	struct hlist_head *progs;
> @@ -495,7 +590,7 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>  		/* replace_prog implies BPF_F_REPLACE, and vice versa */
>  		return -EINVAL;
>  
> -	atype = to_cgroup_bpf_attach_type(type);
> +	atype = bpf_cgroup_atype_find(type, new_prog->aux->attach_btf_id);
>  	if (atype < 0)
>  		return -EINVAL;
>  
> @@ -549,17 +644,30 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>  	bpf_cgroup_storages_assign(pl->storage, storage);
>  	cgrp->bpf.flags[atype] = saved_flags;
>  
> +	if (type == BPF_LSM_CGROUP) {
> +		err = bpf_trampoline_link_cgroup_shim(new_prog, atype);
> +		if (err)
> +			goto cleanup;
> +	}
> +
>  	err = update_effective_progs(cgrp, atype);
>  	if (err)
> -		goto cleanup;
> +		goto cleanup_trampoline;
>  
> -	if (old_prog)
> +	if (old_prog) {
> +		if (type == BPF_LSM_CGROUP)
> +			bpf_trampoline_unlink_cgroup_shim(old_prog);
>  		bpf_prog_put(old_prog);
> -	else
> +	} else {
>  		static_branch_inc(&cgroup_bpf_enabled_key[atype]);
> +	}
>  	bpf_cgroup_storages_link(new_storage, cgrp, type);
>  	return 0;
>  
> +cleanup_trampoline:
> +	if (type == BPF_LSM_CGROUP)
> +		bpf_trampoline_unlink_cgroup_shim(new_prog);
> +
>  cleanup:
>  	if (old_prog) {
>  		pl->prog = old_prog;
> @@ -651,7 +759,7 @@ static int __cgroup_bpf_replace(struct cgroup *cgrp,
>  	struct hlist_head *progs;
>  	bool found = false;
>  
> -	atype = to_cgroup_bpf_attach_type(link->type);
> +	atype = bpf_cgroup_atype_find(link->type, new_prog->aux->attach_btf_id);
>  	if (atype < 0)
>  		return -EINVAL;
>  
> @@ -803,9 +911,15 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
>  	struct bpf_prog *old_prog;
>  	struct bpf_prog_list *pl;
>  	struct hlist_head *progs;
> +	u32 attach_btf_id = 0;
>  	u32 flags;
>  
> -	atype = to_cgroup_bpf_attach_type(type);
> +	if (prog)
> +		attach_btf_id = prog->aux->attach_btf_id;
> +	if (link)
> +		attach_btf_id = link->link.prog->aux->attach_btf_id;
> +
> +	atype = bpf_cgroup_atype_find(type, attach_btf_id);
>  	if (atype < 0)
>  		return -EINVAL;
>  
> @@ -839,8 +953,11 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
>  	if (hlist_empty(progs))
>  		/* last program was detached, reset flags to zero */
>  		cgrp->bpf.flags[atype] = 0;
> -	if (old_prog)
> +	if (old_prog) {
> +		if (type == BPF_LSM_CGROUP)
> +			bpf_trampoline_unlink_cgroup_shim(old_prog);
I think the same bpf_trampoline_unlink_cgroup_shim() needs to be done
in bpf_cgroup_link_release()?  It should be done just after
WARN_ON(__cgroup_bpf_detach()).
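
i.e. something like this (sketch only):

	static void bpf_cgroup_link_release(struct bpf_link *link)
	{
		...
		WARN_ON(__cgroup_bpf_detach(cg_link->cgroup, NULL, cg_link,
					    cg_link->type));
		if (link->prog->expected_attach_type == BPF_LSM_CGROUP)
			bpf_trampoline_unlink_cgroup_shim(link->prog);
		...
	}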

[ ... ]

> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index aeb31137b2ed..a237be4f8bb3 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3416,6 +3416,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
>  		return BPF_PROG_TYPE_SK_LOOKUP;
>  	case BPF_XDP:
>  		return BPF_PROG_TYPE_XDP;
> +	case BPF_LSM_CGROUP:
> +		return BPF_PROG_TYPE_LSM;
>  	default:
>  		return BPF_PROG_TYPE_UNSPEC;
>  	}
> @@ -3469,6 +3471,11 @@ static int bpf_prog_attach(const union bpf_attr *attr)
>  	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
>  	case BPF_PROG_TYPE_CGROUP_SYSCTL:
>  	case BPF_PROG_TYPE_SOCK_OPS:
> +	case BPF_PROG_TYPE_LSM:
> +		if (ptype == BPF_PROG_TYPE_LSM &&
> +		    prog->expected_attach_type != BPF_LSM_CGROUP)
Check this in bpf_prog_attach_check_attach_type() where
other expected_attach_type are enforced.

> +void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog)
> +{
> +	struct bpf_shim_tramp_link *shim_link = NULL;
> +	struct bpf_trampoline *tr;
> +	bpf_func_t bpf_func;
> +	u64 key;
> +
> +	key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf,
> +					 prog->aux->attach_btf_id);
> +
> +	bpf_lsm_find_cgroup_shim(prog, &bpf_func);
> +	tr = bpf_trampoline_lookup(key);
> +	if (!tr)
nit. !tr should not be possible?  If not, maybe add a WARN_ON_ONCE()
or remove this check.

> +		return;
> +
> +	mutex_lock(&tr->mutex);
> +	shim_link = cgroup_shim_find(tr, bpf_func);
> +	mutex_unlock(&tr->mutex);
> +
> +	if (shim_link)
> +		bpf_link_put(&shim_link->link.link);
> +
> +	bpf_trampoline_put(tr); /* bpf_trampoline_lookup above */
> +}
> +#endif

[ ... ]

> diff --git a/tools/include/linux/btf_ids.h b/tools/include/linux/btf_ids.h
> index 57890b357f85..2345b502b439 100644
> --- a/tools/include/linux/btf_ids.h
> +++ b/tools/include/linux/btf_ids.h
> @@ -172,7 +172,9 @@ extern struct btf_id_set name;
>  	BTF_SOCK_TYPE(BTF_SOCK_TYPE_TCP_TW, tcp_timewait_sock)		\
>  	BTF_SOCK_TYPE(BTF_SOCK_TYPE_TCP6, tcp6_sock)			\
>  	BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP, udp_sock)			\
> -	BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP6, udp6_sock)
> +	BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP6, udp6_sock)			\
> +	BTF_SOCK_TYPE(BTF_SOCK_TYPE_UNIX, unix_sock)			\
The existing tools' btf_ids.h looks more outdated than
the kernel's btf_ids.h.  unix_sock is missing, which is added back here.
mptcp_sock is also missing but not added.  I assume the latter test
needs unix_sock here?

Others lgtm.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH bpf-next v9 04/10] bpf: minimize number of allocated lsm slots per program
  2022-06-10 16:57 ` [PATCH bpf-next v9 04/10] bpf: minimize number of allocated lsm slots per program Stanislav Fomichev
  2022-06-11 16:53   ` kernel test robot
@ 2022-06-17  0:43   ` Martin KaFai Lau
  2022-06-17 18:28     ` Stanislav Fomichev
  1 sibling, 1 reply; 31+ messages in thread
From: Martin KaFai Lau @ 2022-06-17  0:43 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii

On Fri, Jun 10, 2022 at 09:57:57AM -0700, Stanislav Fomichev wrote:
> Previous patch adds 1:1 mapping between all 211 LSM hooks
> and bpf_cgroup program array. Instead of reserving a slot per
> possible hook, reserve 10 slots per cgroup for lsm programs.
> Those slots are dynamically allocated on demand and reclaimed.
> 
> struct cgroup_bpf {
> 	struct bpf_prog_array *    effective[33];        /*     0   264 */
> 	/* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
> 	struct hlist_head          progs[33];            /*   264   264 */
> 	/* --- cacheline 8 boundary (512 bytes) was 16 bytes ago --- */
> 	u8                         flags[33];            /*   528    33 */
> 
> 	/* XXX 7 bytes hole, try to pack */
> 
> 	struct list_head           storages;             /*   568    16 */
> 	/* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
> 	struct bpf_prog_array *    inactive;             /*   584     8 */
> 	struct percpu_ref          refcnt;               /*   592    16 */
> 	struct work_struct         release_work;         /*   608    72 */
> 
> 	/* size: 680, cachelines: 11, members: 7 */
> 	/* sum members: 673, holes: 1, sum holes: 7 */
> 	/* last cacheline: 40 bytes */
> };
> 
> Move 'ifdef CONFIG_CGROUP_BPF' to expose CGROUP_BPF_ATTACH_TYPE_INVALID
> to non-cgroup (core) parts.
hmm... don't see this change in bpf-cgroup-defs.h in this patch.

> 
> diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> index b99f8c3e37ea..7b121bd780eb 100644
> --- a/include/linux/bpf-cgroup-defs.h
> +++ b/include/linux/bpf-cgroup-defs.h
> @@ -11,7 +11,8 @@
>  struct bpf_prog_array;
>  
>  #ifdef CONFIG_BPF_LSM
> -#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
> +/* Maximum number of concurrently attachable per-cgroup LSM hooks. */
> +#define CGROUP_LSM_NUM 10
>  #else
>  #define CGROUP_LSM_NUM 0
>  #endif
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 4dceb86229f6..503f28fa66d2 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2407,7 +2407,6 @@ int bpf_arch_text_invalidate(void *dst, size_t len);
>  
>  struct btf_id_set;
>  bool btf_id_set_contains(const struct btf_id_set *set, u32 id);
> -int btf_id_set_index(const struct btf_id_set *set, u32 id);
>  
>  #define MAX_BPRINTF_VARARGS		12
>  
> @@ -2444,4 +2443,7 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
>  void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
>  int bpf_dynptr_check_size(u32 size);
>  
> +void bpf_cgroup_atype_put(int cgroup_atype);
> +void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
> +
>  #endif /* _LINUX_BPF_H */

[ ... ]

> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index b0314889a409..ba402d50e130 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -128,12 +128,56 @@ unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
>  }
>  
>  #ifdef CONFIG_BPF_LSM
> +struct cgroup_lsm_atype {
> +	u32 attach_btf_id;
> +	int refcnt;
> +};
> +
> +static struct cgroup_lsm_atype cgroup_lsm_atype[CGROUP_LSM_NUM];
> +
>  static enum cgroup_bpf_attach_type
>  bpf_cgroup_atype_find(enum bpf_attach_type attach_type, u32 attach_btf_id)
>  {
> +	int i;
> +
> +	lockdep_assert_held(&cgroup_mutex);
> +
>  	if (attach_type != BPF_LSM_CGROUP)
>  		return to_cgroup_bpf_attach_type(attach_type);
> -	return CGROUP_LSM_START + bpf_lsm_hook_idx(attach_btf_id);
> +
> +	for (i = 0; i < ARRAY_SIZE(cgroup_lsm_atype); i++)
> +		if (cgroup_lsm_atype[i].attach_btf_id == attach_btf_id)
> +			return CGROUP_LSM_START + i;
> +
> +	for (i = 0; i < ARRAY_SIZE(cgroup_lsm_atype); i++)
> +		if (cgroup_lsm_atype[i].attach_btf_id == 0)
> +			return CGROUP_LSM_START + i;
> +
> +	return -E2BIG;
> +
> +}
> +
> +void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype)
> +{
> +	int i = cgroup_atype - CGROUP_LSM_START;
> +
> +	lockdep_assert_held(&cgroup_mutex);
> +
> +	WARN_ON_ONCE(cgroup_lsm_atype[i].attach_btf_id &&
> +		     cgroup_lsm_atype[i].attach_btf_id != attach_btf_id);
> +
> +	cgroup_lsm_atype[i].attach_btf_id = attach_btf_id;
> +	cgroup_lsm_atype[i].refcnt++;
> +}
> +
> +void bpf_cgroup_atype_put(int cgroup_atype)
> +{
> +	int i = cgroup_atype - CGROUP_LSM_START;
> +
> +	mutex_lock(&cgroup_mutex);
> +	if (--cgroup_lsm_atype[i].refcnt <= 0)
> +		cgroup_lsm_atype[i].attach_btf_id = 0;
> +	mutex_unlock(&cgroup_mutex);
>  }
>  #else
>  static enum cgroup_bpf_attach_type
> @@ -143,6 +187,14 @@ bpf_cgroup_atype_find(enum bpf_attach_type attach_type, u32 attach_btf_id)
>  		return to_cgroup_bpf_attach_type(attach_type);
>  	return -EOPNOTSUPP;
>  }
> +
> +void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype)
> +{
> +}
> +
> +void bpf_cgroup_atype_put(int cgroup_atype)
> +{
> +}
From the test bot report, these two empty functions may need
to be inlined in a .h, or else the caller needs a CONFIG_CGROUP_BPF check
before calling bpf_cgroup_atype_get().  bpf-cgroup.h may be a better place
than bpf.h for the inlines, but I'm not sure how easy it is to include it from
trampoline.c or core.c.  Whichever way makes more sense.  Either .h is fine.
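
Something like this, for example (rough sketch, untested):

	#ifdef CONFIG_CGROUP_BPF
	void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
	void bpf_cgroup_atype_put(int cgroup_atype);
	#else
	static inline void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype) {}
	static inline void bpf_cgroup_atype_put(int cgroup_atype) {}
	#endif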

Others lgtm.

>  #endif /* CONFIG_BPF_LSM */
>  
>  void cgroup_bpf_offline(struct cgroup *cgrp)
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 8d171eb0ed0d..0699098dc6bc 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -107,6 +107,9 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
>  	fp->aux->prog = fp;
>  	fp->jit_requested = ebpf_jit_enabled();
>  	fp->blinding_requested = bpf_jit_blinding_enabled(fp);
> +#ifdef CONFIG_CGROUP_BPF
> +	aux->cgroup_atype = CGROUP_BPF_ATTACH_TYPE_INVALID;
> +#endif
>  
>  	INIT_LIST_HEAD_RCU(&fp->aux->ksym.lnode);
>  	mutex_init(&fp->aux->used_maps_mutex);
> @@ -2554,6 +2557,10 @@ static void bpf_prog_free_deferred(struct work_struct *work)
>  	aux = container_of(work, struct bpf_prog_aux, work);
>  #ifdef CONFIG_BPF_SYSCALL
>  	bpf_free_kfunc_btf_tab(aux->kfunc_btf_tab);
> +#endif
> +#ifdef CONFIG_CGROUP_BPF
> +	if (aux->cgroup_atype != CGROUP_BPF_ATTACH_TYPE_INVALID)
> +		bpf_cgroup_atype_put(aux->cgroup_atype);
>  #endif
>  	bpf_free_used_maps(aux);
>  	bpf_free_used_btfs(aux);
> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> index 023239a10e7c..e849dd243624 100644
> --- a/kernel/bpf/trampoline.c
> +++ b/kernel/bpf/trampoline.c
> @@ -555,6 +555,7 @@ static struct bpf_shim_tramp_link *cgroup_shim_alloc(const struct bpf_prog *prog
>  	bpf_prog_inc(p);
>  	bpf_link_init(&shim_link->link.link, BPF_LINK_TYPE_UNSPEC,
>  		      &bpf_shim_tramp_link_lops, p);
> +	bpf_cgroup_atype_get(p->aux->attach_btf_id, cgroup_atype);
>  
>  	return shim_link;
>  }
> -- 
> 2.36.1.476.g0c4daa206d-goog
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH bpf-next v9 05/10] bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP
  2022-06-10 16:57 ` [PATCH bpf-next v9 05/10] bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP Stanislav Fomichev
@ 2022-06-17  0:58   ` Martin KaFai Lau
  2022-06-17 18:28     ` Stanislav Fomichev
  0 siblings, 1 reply; 31+ messages in thread
From: Martin KaFai Lau @ 2022-06-17  0:58 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii

On Fri, Jun 10, 2022 at 09:57:58AM -0700, Stanislav Fomichev wrote:
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index ba402d50e130..c869317479ec 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -1029,57 +1029,92 @@ static int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
>  static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>  			      union bpf_attr __user *uattr)
>  {
> +	__u32 __user *prog_attach_flags = u64_to_user_ptr(attr->query.prog_attach_flags);
>  	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
>  	enum bpf_attach_type type = attr->query.attach_type;
> +	enum cgroup_bpf_attach_type from_atype, to_atype;
>  	enum cgroup_bpf_attach_type atype;
>  	struct bpf_prog_array *effective;
>  	struct hlist_head *progs;
>  	struct bpf_prog *prog;
>  	int cnt, ret = 0, i;
> +	int total_cnt = 0;
>  	u32 flags;
>  
> -	atype = to_cgroup_bpf_attach_type(type);
> -	if (atype < 0)
> -		return -EINVAL;
> +	if (type == BPF_LSM_CGROUP) {
> +		if (attr->query.prog_cnt && prog_ids && !prog_attach_flags)
> +			return -EINVAL;
>  
> -	progs = &cgrp->bpf.progs[atype];
> -	flags = cgrp->bpf.flags[atype];
> +		from_atype = CGROUP_LSM_START;
> +		to_atype = CGROUP_LSM_END;
> +		flags = 0;
> +	} else {
> +		from_atype = to_cgroup_bpf_attach_type(type);
> +		if (from_atype < 0)
> +			return -EINVAL;
> +		to_atype = from_atype;
> +		flags = cgrp->bpf.flags[from_atype];
> +	}
>  
> -	effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
> -					      lockdep_is_held(&cgroup_mutex));
> +	for (atype = from_atype; atype <= to_atype; atype++) {
> +		progs = &cgrp->bpf.progs[atype];
nit. Move the 'progs = ...' into the 'else {}' case below.

>  
> -	if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE)
> -		cnt = bpf_prog_array_length(effective);
> -	else
> -		cnt = prog_list_length(progs);
> +		if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE) {
> +			effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
> +							      lockdep_is_held(&cgroup_mutex));
> +			total_cnt += bpf_prog_array_length(effective);
> +		} else {
> +			total_cnt += prog_list_length(progs);
> +		}
> +	}
>  
>  	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
>  		return -EFAULT;
> -	if (copy_to_user(&uattr->query.prog_cnt, &cnt, sizeof(cnt)))
> +	if (copy_to_user(&uattr->query.prog_cnt, &total_cnt, sizeof(total_cnt)))
>  		return -EFAULT;
> -	if (attr->query.prog_cnt == 0 || !prog_ids || !cnt)
> +	if (attr->query.prog_cnt == 0 || !prog_ids || !total_cnt)
>  		/* return early if user requested only program count + flags */
>  		return 0;
> -	if (attr->query.prog_cnt < cnt) {
> -		cnt = attr->query.prog_cnt;
> +
> +	if (attr->query.prog_cnt < total_cnt) {
> +		total_cnt = attr->query.prog_cnt;
>  		ret = -ENOSPC;
>  	}
>  
> -	if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE) {
> -		return bpf_prog_array_copy_to_user(effective, prog_ids, cnt);
> -	} else {
> -		struct bpf_prog_list *pl;
> -		u32 id;
> +	for (atype = from_atype; atype <= to_atype && total_cnt; atype++) {
> +		progs = &cgrp->bpf.progs[atype];
same here.

> +		flags = cgrp->bpf.flags[atype];
and the 'flags = ...' can be moved to 'if (prog_attach_flags) {}'
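
i.e. roughly (based on the hunk further down):

		if (prog_attach_flags) {
			flags = cgrp->bpf.flags[atype];
			for (i = 0; i < cnt; i++)
				if (copy_to_user(prog_attach_flags + i, &flags, sizeof(flags)))
					return -EFAULT;
			prog_attach_flags += cnt;
		}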

Others lgtm.

Reviewed-by: Martin KaFai Lau <kafai@fb.com>

>  
> -		i = 0;
> -		hlist_for_each_entry(pl, progs, node) {
> -			prog = prog_list_prog(pl);
> -			id = prog->aux->id;
> -			if (copy_to_user(prog_ids + i, &id, sizeof(id)))
> -				return -EFAULT;
> -			if (++i == cnt)
> -				break;
> +		if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE) {
> +			effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
> +							      lockdep_is_held(&cgroup_mutex));
> +			cnt = min_t(int, bpf_prog_array_length(effective), total_cnt);
> +			ret = bpf_prog_array_copy_to_user(effective, prog_ids, cnt);
> +		} else {
> +			struct bpf_prog_list *pl;
> +			u32 id;
> +
> +			cnt = min_t(int, prog_list_length(progs), total_cnt);
> +			i = 0;
> +			hlist_for_each_entry(pl, progs, node) {
> +				prog = prog_list_prog(pl);
> +				id = prog->aux->id;
> +				if (copy_to_user(prog_ids + i, &id, sizeof(id)))
> +					return -EFAULT;
> +				if (++i == cnt)
> +					break;
> +			}
>  		}
> +
> +		if (prog_attach_flags) {
> +			for (i = 0; i < cnt; i++)
> +				if (copy_to_user(prog_attach_flags + i, &flags, sizeof(flags)))
> +					return -EFAULT;
> +			prog_attach_flags += cnt;
> +		}
> +
> +		prog_ids += cnt;
> +		total_cnt -= cnt;
>  	}
>  	return ret;
>  }

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH bpf-next v9 06/10] bpf: expose bpf_{g,s}etsockopt to lsm cgroup
  2022-06-10 16:57 ` [PATCH bpf-next v9 06/10] bpf: expose bpf_{g,s}etsockopt to lsm cgroup Stanislav Fomichev
@ 2022-06-17  5:42   ` Martin KaFai Lau
  2022-06-17 18:28     ` Stanislav Fomichev
  0 siblings, 1 reply; 31+ messages in thread
From: Martin KaFai Lau @ 2022-06-17  5:42 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii

On Fri, Jun 10, 2022 at 09:57:59AM -0700, Stanislav Fomichev wrote:
> I don't see how to make it nice without introducing btf id lists
> for the hooks where these helpers are allowed. Some LSM hooks
> work on the locked sockets, some are triggering early and
> don't grab any locks, so have two lists for now:
> 
> 1. LSM hooks which trigger under socket lock - minority of the hooks,
>    but ideal case for us, we can expose existing BTF-based helpers
> 2. LSM hooks which trigger without socket lock, but they trigger
>    early in the socket creation path where it should be safe to
>    do setsockopt without any locks
> 3. The rest are prohibited. I'm thinking that this use-case might
>    be a good gateway to sleeping lsm cgroup hooks in the future.
>    We can either expose lock/unlock operations (and add tracking
>    to the verifier) or have another set of bpf_setsockopt
>    wrapper that grab the locks and might sleep.
Another possibility is to acquire/release the sk lock in
__bpf_prog_{enter,exit}_lsm_cgroup().  However, it will unnecessarily
acquire it even when the prog is not doing any get/setsockopt.
It could probably do some checking to avoid the lock...etc. :/

sleepable bpf-prog is a cleaner way out.  From a quick look,
cgroup_storage is not safe for sleepable bpf-prog.
All the other BPF_MAP_TYPE_{SK,INODE,TASK}_STORAGE maps are already
safe once their common infra in bpf_local_storage.c was made
sleepable-safe.

> 
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  include/linux/bpf.h  |  2 ++
>  kernel/bpf/bpf_lsm.c | 40 +++++++++++++++++++++++++++++
>  net/core/filter.c    | 60 ++++++++++++++++++++++++++++++++++++++------
>  3 files changed, 95 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 503f28fa66d2..c0a269269882 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2282,6 +2282,8 @@ extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
>  extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
>  extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
>  extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
> +extern const struct bpf_func_proto bpf_unlocked_sk_setsockopt_proto;
> +extern const struct bpf_func_proto bpf_unlocked_sk_getsockopt_proto;
>  extern const struct bpf_func_proto bpf_kallsyms_lookup_name_proto;
>  extern const struct bpf_func_proto bpf_find_vma_proto;
>  extern const struct bpf_func_proto bpf_loop_proto;
> diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> index 83aa431dd52e..52b6e3067986 100644
> --- a/kernel/bpf/bpf_lsm.c
> +++ b/kernel/bpf/bpf_lsm.c
> @@ -45,6 +45,26 @@ BTF_ID(func, bpf_lsm_sk_alloc_security)
>  BTF_ID(func, bpf_lsm_sk_free_security)
>  BTF_SET_END(bpf_lsm_current_hooks)
>  
> +/* List of LSM hooks that trigger while the socket is properly locked.
> + */
> +BTF_SET_START(bpf_lsm_locked_sockopt_hooks)
> +BTF_ID(func, bpf_lsm_socket_sock_rcv_skb)
> +BTF_ID(func, bpf_lsm_sk_clone_security)
From looking at how security_sk_clone() is used in sock_copy(),
it has two sk args: one is the listen sk and one is the clone.
I think neither of them is locked.

The bpf_lsm_inet_csk_clone below should be enough to
do setsockopt in the new clone?

> +BTF_ID(func, bpf_lsm_sock_graft)
> +BTF_ID(func, bpf_lsm_inet_csk_clone)
> +BTF_ID(func, bpf_lsm_inet_conn_established)
> +BTF_ID(func, bpf_lsm_sctp_bind_connect)
I didn't look at this one, so I can't comment.
Do you have a use case?

> +BTF_SET_END(bpf_lsm_locked_sockopt_hooks)
> +
> +/* List of LSM hooks that trigger while the socket is _not_ locked,
> + * but it's ok to call bpf_{g,s}etsockopt because the socket is still
> + * in the early init phase.
> + */
> +BTF_SET_START(bpf_lsm_unlocked_sockopt_hooks)
> +BTF_ID(func, bpf_lsm_socket_post_create)
> +BTF_ID(func, bpf_lsm_socket_socketpair)
> +BTF_SET_END(bpf_lsm_unlocked_sockopt_hooks)
> +

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH bpf-next v9 09/10] bpftool: implement cgroup tree for BPF_LSM_CGROUP
  2022-06-10 16:58 ` [PATCH bpf-next v9 09/10] bpftool: implement cgroup tree for BPF_LSM_CGROUP Stanislav Fomichev
  2022-06-13 12:07   ` Quentin Monnet
@ 2022-06-17  5:58   ` Martin KaFai Lau
  2022-06-17 18:28     ` Stanislav Fomichev
  1 sibling, 1 reply; 31+ messages in thread
From: Martin KaFai Lau @ 2022-06-17  5:58 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii

On Fri, Jun 10, 2022 at 09:58:02AM -0700, Stanislav Fomichev wrote:
> @@ -84,6 +87,19 @@ static int show_bpf_prog(int id, enum bpf_attach_type attach_type,
>  	}
>  
>  	attach_type_str = libbpf_bpf_attach_type_str(attach_type);
> +
> +	if (btf_vmlinux &&
> +	    info.attach_btf_id < btf__type_cnt(btf_vmlinux)) {
> +		/* Note, we ignore info.attach_btf_obj_id for now. There
> +		 * is no good way to resolve btf_id to vmlinux
> +		 * or module btf.
> +		 */
One usage could be using it to get the btf_info via bpf_obj_get_info_by_fd.
Check for kernel_btf == true and the name being "vmlinux".
To be future proof, it can print an <unknown> func for anything
other than the "vmlinux" btf.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH bpf-next v9 03/10] bpf: per-cgroup lsm flavor
  2022-06-16 22:25   ` Martin KaFai Lau
@ 2022-06-17 18:28     ` Stanislav Fomichev
  2022-06-17 22:25       ` Martin KaFai Lau
  0 siblings, 1 reply; 31+ messages in thread
From: Stanislav Fomichev @ 2022-06-17 18:28 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, bpf, ast, daniel, andrii

On Thu, Jun 16, 2022 at 3:25 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Fri, Jun 10, 2022 at 09:57:56AM -0700, Stanislav Fomichev wrote:
> > Allow attaching to lsm hooks in the cgroup context.
> >
> > Attaching to per-cgroup LSM works exactly like attaching
> > to other per-cgroup hooks. New BPF_LSM_CGROUP is added
> > to trigger new mode; the actual lsm hook we attach to is
> > signaled via existing attach_btf_id.
> >
> > For the hooks that have 'struct socket' or 'struct sock' as its first
> > argument, we use the cgroup associated with that socket. For the rest,
> > we use 'current' cgroup (this is all on default hierarchy == v2 only).
> > Note that for some hooks that work on 'struct sock' we still
> > take the cgroup from 'current' because some of them work on the socket
> > that hasn't been properly initialized yet.
> >
> > Behind the scenes, we allocate a shim program that is attached
> > to the trampoline and runs cgroup effective BPF programs array.
> > This shim has some rudimentary ref counting and can be shared
> > between several programs attaching to the same per-cgroup lsm hook.
> nit. The 'per-cgroup' may be read as each cgroup has its own shim.
> It will be useful to rephrase it a little.

I'll put the following, LMK if still not clear.

This shim has some rudimentary ref counting and can be shared
between several programs attaching to the same lsm hook from
different cgroups.

> >
> > Note that this patch bloats cgroup size because we add 211
> > cgroup_bpf_attach_type(s) for simplicity sake. This will be
> > addressed in the subsequent patch.
> >
> > Also note that we only add non-sleepable flavor for now. To enable
> > sleepable use-cases, bpf_prog_run_array_cg has to grab trace rcu,
> > shim programs have to be freed via trace rcu, cgroup_bpf.effective
> > should be also trace-rcu-managed + maybe some other changes that
> > I'm not aware of.
> >
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
>
> [ ... ]
>
> > @@ -1840,10 +1850,8 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
> >       emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
> >       /* arg3: lea rdx, [rbp - run_ctx_off] */
> >       EMIT4(0x48, 0x8D, 0x55, -run_ctx_off);
> > -     if (emit_call(&prog,
> > -                   p->aux->sleepable ? __bpf_prog_exit_sleepable :
> > -                   __bpf_prog_exit, prog))
> > -                     return -EINVAL;
> > +     if (emit_call(&prog, exit, prog))
> > +             return -EINVAL;
> >
> >       *pprog = prog;
> >       return 0;
> > diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> > index 5d268e76d8e6..b99f8c3e37ea 100644
> > --- a/include/linux/bpf-cgroup-defs.h
> > +++ b/include/linux/bpf-cgroup-defs.h
> > @@ -10,6 +10,12 @@
> >
> >  struct bpf_prog_array;
> >
> > +#ifdef CONFIG_BPF_LSM
> > +#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
> > +#else
> > +#define CGROUP_LSM_NUM 0
> > +#endif
> > +
> >  enum cgroup_bpf_attach_type {
> >       CGROUP_BPF_ATTACH_TYPE_INVALID = -1,
> >       CGROUP_INET_INGRESS = 0,
> > @@ -35,6 +41,8 @@ enum cgroup_bpf_attach_type {
> >       CGROUP_INET4_GETSOCKNAME,
> >       CGROUP_INET6_GETSOCKNAME,
> >       CGROUP_INET_SOCK_RELEASE,
> > +     CGROUP_LSM_START,
> > +     CGROUP_LSM_END = CGROUP_LSM_START + CGROUP_LSM_NUM - 1,
> Likely a dumb question, just in case, presumably everything should be fine when
> CGROUP_LSM_NUM is 0 ?

My thinking was:

Let's say CGROUP_INET_SOCK_RELEASE is 10. CGROUP_LSM_START should be
11. And CGROUP_LSM_END should be 10 again. This makes
MAX_CGROUP_BPF_ATTACH_TYPE 11 and doesn't waste any slots when
CONFIG_BPF_LSM=n.

The places where it might lead to problems are the loops like (or any
other range checks):

for (i = CGROUP_LSM_START; i <= CGROUP_LSM_END; i++)

There was one issue in cgroup_bpf_release which we've replaced with
'pl->prog->expected_attach_type == BPF_LSM_CGROUP' and I think the
other one in __cgroup_bpf_query should be fine.

> >       MAX_CGROUP_BPF_ATTACH_TYPE
> >  };
> >
>
> [ ... ]
>
> > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > index 4adb4f3ecb7f..b0314889a409 100644
> > --- a/kernel/bpf/cgroup.c
> > +++ b/kernel/bpf/cgroup.c
> > @@ -14,6 +14,8 @@
> >  #include <linux/string.h>
> >  #include <linux/bpf.h>
> >  #include <linux/bpf-cgroup.h>
> > +#include <linux/bpf_lsm.h>
> > +#include <linux/bpf_verifier.h>
> >  #include <net/sock.h>
> >  #include <net/bpf_sk_storage.h>
> >
> > @@ -61,6 +63,88 @@ bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
> >       return run_ctx.retval;
> >  }
> >
> > +unsigned int __cgroup_bpf_run_lsm_sock(const void *ctx,
> > +                                    const struct bpf_insn *insn)
> > +{
> > +     const struct bpf_prog *shim_prog;
> > +     struct sock *sk;
> > +     struct cgroup *cgrp;
> > +     int ret = 0;
> > +     u64 *args;
> > +
> > +     args = (u64 *)ctx;
> > +     sk = (void *)(unsigned long)args[0];
> > +     /*shim_prog = container_of(insn, struct bpf_prog, insnsi);*/
> > +     shim_prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> > +
> > +     cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> > +     if (likely(cgrp))
> > +             ret = bpf_prog_run_array_cg(&cgrp->bpf,
> > +                                         shim_prog->aux->cgroup_atype,
> > +                                         ctx, bpf_prog_run, 0, NULL);
> > +     return ret;
> > +}
> > +
> > +unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
> > +                                      const struct bpf_insn *insn)
> > +{
> > +     const struct bpf_prog *shim_prog;
> > +     struct socket *sock;
> > +     struct cgroup *cgrp;
> > +     int ret = 0;
> > +     u64 *args;
> > +
> > +     args = (u64 *)ctx;
> > +     sock = (void *)(unsigned long)args[0];
> > +     /*shim_prog = container_of(insn, struct bpf_prog, insnsi);*/
> > +     shim_prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> > +
> > +     cgrp = sock_cgroup_ptr(&sock->sk->sk_cgrp_data);
> > +     if (likely(cgrp))
> > +             ret = bpf_prog_run_array_cg(&cgrp->bpf,
> > +                                         shim_prog->aux->cgroup_atype,
> > +                                         ctx, bpf_prog_run, 0, NULL);
> > +     return ret;
> > +}
> > +
> > +unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
> > +                                       const struct bpf_insn *insn)
> > +{
> > +     const struct bpf_prog *shim_prog;
> > +     struct cgroup *cgrp;
> > +     int ret = 0;
> > +
> > +     /*shim_prog = container_of(insn, struct bpf_prog, insnsi);*/
> > +     shim_prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> > +
> > +     rcu_read_lock();
> nit. Not needed.  Maybe add a comment that it has already been acquired
> in __bpf_prog_enter_lsm_cgroup().  For sleepable in the future,
> rcu_read_lock() cannot be done here either.

True, will add a comment instead, thanks!

> > +     cgrp = task_dfl_cgroup(current);
> > +     if (likely(cgrp))
> > +             ret = bpf_prog_run_array_cg(&cgrp->bpf,
> > +                                         shim_prog->aux->cgroup_atype,
> > +                                         ctx, bpf_prog_run, 0, NULL);
> > +     rcu_read_unlock();
> > +     return ret;
> > +}
> > +
> > +#ifdef CONFIG_BPF_LSM
> > +static enum cgroup_bpf_attach_type
> > +bpf_cgroup_atype_find(enum bpf_attach_type attach_type, u32 attach_btf_id)
> > +{
> > +     if (attach_type != BPF_LSM_CGROUP)
> > +             return to_cgroup_bpf_attach_type(attach_type);
> > +     return CGROUP_LSM_START + bpf_lsm_hook_idx(attach_btf_id);
> > +}
> > +#else
> > +static enum cgroup_bpf_attach_type
> > +bpf_cgroup_atype_find(enum bpf_attach_type attach_type, u32 attach_btf_id)
> > +{
> > +     if (attach_type != BPF_LSM_CGROUP)
> > +             return to_cgroup_bpf_attach_type(attach_type);
> > +     return -EOPNOTSUPP;
> > +}
> > +#endif /* CONFIG_BPF_LSM */
> > +
> >  void cgroup_bpf_offline(struct cgroup *cgrp)
> >  {
> >       cgroup_get(cgrp);
> > @@ -163,10 +247,20 @@ static void cgroup_bpf_release(struct work_struct *work)
> >
> >               hlist_for_each_entry_safe(pl, pltmp, progs, node) {
> >                       hlist_del(&pl->node);
> > -                     if (pl->prog)
> > +                     if (pl->prog) {
> > +#ifdef CONFIG_BPF_LSM
> This should not be needed as it is not needed in __cgroup_bpf_attach() below.

Oh yes, as I mentioned above, it's been here because I used to do
'if atype >= START && atype <= END' and there was a compiler warning for
the CONFIG_BPF_LSM=n case.

> > +                             if (pl->prog->expected_attach_type == BPF_LSM_CGROUP)
> > +                                     bpf_trampoline_unlink_cgroup_shim(pl->prog);
> > +#endif
> >                               bpf_prog_put(pl->prog);
> > -                     if (pl->link)
> > +                     }
> > +                     if (pl->link) {
> > +#ifdef CONFIG_BPF_LSM
> Same here.
>
> > +                             if (pl->link->link.prog->expected_attach_type == BPF_LSM_CGROUP)
> > +                                     bpf_trampoline_unlink_cgroup_shim(pl->link->link.prog);
> > +#endif
> >                               bpf_cgroup_link_auto_detach(pl->link);
> > +                     }
> >                       kfree(pl);
> >                       static_branch_dec(&cgroup_bpf_enabled_key[atype]);
> >               }
> > @@ -479,6 +573,7 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
> >       struct bpf_prog *old_prog = NULL;
> >       struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
> >       struct bpf_cgroup_storage *new_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
> > +     struct bpf_prog *new_prog = prog ? : link->link.prog;
> >       enum cgroup_bpf_attach_type atype;
> >       struct bpf_prog_list *pl;
> >       struct hlist_head *progs;
> > @@ -495,7 +590,7 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
> >               /* replace_prog implies BPF_F_REPLACE, and vice versa */
> >               return -EINVAL;
> >
> > -     atype = to_cgroup_bpf_attach_type(type);
> > +     atype = bpf_cgroup_atype_find(type, new_prog->aux->attach_btf_id);
> >       if (atype < 0)
> >               return -EINVAL;
> >
> > @@ -549,17 +644,30 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
> >       bpf_cgroup_storages_assign(pl->storage, storage);
> >       cgrp->bpf.flags[atype] = saved_flags;
> >
> > +     if (type == BPF_LSM_CGROUP) {
> > +             err = bpf_trampoline_link_cgroup_shim(new_prog, atype);
> > +             if (err)
> > +                     goto cleanup;
> > +     }
> > +
> >       err = update_effective_progs(cgrp, atype);
> >       if (err)
> > -             goto cleanup;
> > +             goto cleanup_trampoline;
> >
> > -     if (old_prog)
> > +     if (old_prog) {
> > +             if (type == BPF_LSM_CGROUP)
> > +                     bpf_trampoline_unlink_cgroup_shim(old_prog);
> >               bpf_prog_put(old_prog);
> > -     else
> > +     } else {
> >               static_branch_inc(&cgroup_bpf_enabled_key[atype]);
> > +     }
> >       bpf_cgroup_storages_link(new_storage, cgrp, type);
> >       return 0;
> >
> > +cleanup_trampoline:
> > +     if (type == BPF_LSM_CGROUP)
> > +             bpf_trampoline_unlink_cgroup_shim(new_prog);
> > +
> >  cleanup:
> >       if (old_prog) {
> >               pl->prog = old_prog;
> > @@ -651,7 +759,7 @@ static int __cgroup_bpf_replace(struct cgroup *cgrp,
> >       struct hlist_head *progs;
> >       bool found = false;
> >
> > -     atype = to_cgroup_bpf_attach_type(link->type);
> > +     atype = bpf_cgroup_atype_find(link->type, new_prog->aux->attach_btf_id);
> >       if (atype < 0)
> >               return -EINVAL;
> >
> > @@ -803,9 +911,15 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
> >       struct bpf_prog *old_prog;
> >       struct bpf_prog_list *pl;
> >       struct hlist_head *progs;
> > +     u32 attach_btf_id = 0;
> >       u32 flags;
> >
> > -     atype = to_cgroup_bpf_attach_type(type);
> > +     if (prog)
> > +             attach_btf_id = prog->aux->attach_btf_id;
> > +     if (link)
> > +             attach_btf_id = link->link.prog->aux->attach_btf_id;
> > +
> > +     atype = bpf_cgroup_atype_find(type, attach_btf_id);
> >       if (atype < 0)
> >               return -EINVAL;
> >
> > @@ -839,8 +953,11 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
> >       if (hlist_empty(progs))
> >               /* last program was detached, reset flags to zero */
> >               cgrp->bpf.flags[atype] = 0;
> > -     if (old_prog)
> > +     if (old_prog) {
> > +             if (type == BPF_LSM_CGROUP)
> > +                     bpf_trampoline_unlink_cgroup_shim(old_prog);
> I think the same bpf_trampoline_unlink_cgroup_shim() needs to be done
> in bpf_cgroup_link_release()?  It should be done just after
> WARN_ON(__cgroup_bpf_detach()).

Oooh, I missed that, I thought that old_prog would have the pointer to
the old program even if it's a link :-(

Do you mind if I handle it in __cgroup_bpf_detach as well? Or do you
think it's cleaner to do in bpf_cgroup_link_release?

if (old_prog) {
  ...
} else if (link) {
  if (type == BPF_LSM_CGROUP)
    bpf_trampoline_unlink_cgroup_shim(link->link.prog);
}

> [ ... ]
>
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index aeb31137b2ed..a237be4f8bb3 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -3416,6 +3416,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
> >               return BPF_PROG_TYPE_SK_LOOKUP;
> >       case BPF_XDP:
> >               return BPF_PROG_TYPE_XDP;
> > +     case BPF_LSM_CGROUP:
> > +             return BPF_PROG_TYPE_LSM;
> >       default:
> >               return BPF_PROG_TYPE_UNSPEC;
> >       }
> > @@ -3469,6 +3471,11 @@ static int bpf_prog_attach(const union bpf_attr *attr)
> >       case BPF_PROG_TYPE_CGROUP_SOCKOPT:
> >       case BPF_PROG_TYPE_CGROUP_SYSCTL:
> >       case BPF_PROG_TYPE_SOCK_OPS:
> > +     case BPF_PROG_TYPE_LSM:
> > +             if (ptype == BPF_PROG_TYPE_LSM &&
> > +                 prog->expected_attach_type != BPF_LSM_CGROUP)
> Check this in bpf_prog_attach_check_attach_type() where
> other expected_attach_type are enforced.

It was there initially but I moved it here because
bpf_prog_attach_check_attach_type() is called from link_create() as
well where the range of acceptable expected_attach_type(s) is larger.

> > +void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog)
> > +{
> > +     struct bpf_shim_tramp_link *shim_link = NULL;
> > +     struct bpf_trampoline *tr;
> > +     bpf_func_t bpf_func;
> > +     u64 key;
> > +
> > +     key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf,
> > +                                      prog->aux->attach_btf_id);
> > +
> > +     bpf_lsm_find_cgroup_shim(prog, &bpf_func);
> > +     tr = bpf_trampoline_lookup(key);
> > +     if (!tr)
> nit. !tr should not be possible?  If not, may be a WARN_ON_ONCE()
> or remove this check.

Let's have WARN_ON_ONCE, you never know..

> > +             return;
> > +
> > +     mutex_lock(&tr->mutex);
> > +     shim_link = cgroup_shim_find(tr, bpf_func);
> > +     mutex_unlock(&tr->mutex);
> > +
> > +     if (shim_link)
> > +             bpf_link_put(&shim_link->link.link);
> > +
> > +     bpf_trampoline_put(tr); /* bpf_trampoline_lookup above */
> > +}
> > +#endif
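
(In other words, the lookup above would become something like this in the
next revision -- just a sketch:)

	tr = bpf_trampoline_lookup(key);
	if (WARN_ON_ONCE(!tr))
		return;
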
>
> [ ... ]
>
> > diff --git a/tools/include/linux/btf_ids.h b/tools/include/linux/btf_ids.h
> > index 57890b357f85..2345b502b439 100644
> > --- a/tools/include/linux/btf_ids.h
> > +++ b/tools/include/linux/btf_ids.h
> > @@ -172,7 +172,9 @@ extern struct btf_id_set name;
> >       BTF_SOCK_TYPE(BTF_SOCK_TYPE_TCP_TW, tcp_timewait_sock)          \
> >       BTF_SOCK_TYPE(BTF_SOCK_TYPE_TCP6, tcp6_sock)                    \
> >       BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP, udp_sock)                      \
> > -     BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP6, udp6_sock)
> > +     BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP6, udp6_sock)                    \
> > +     BTF_SOCK_TYPE(BTF_SOCK_TYPE_UNIX, unix_sock)                    \
> The existing tools's btf_ids.h looks more outdated from
> the kernel's btf_ids.h.  unix_sock is missing which is added back here.
> mptcp_sock is missing also but not added.  I assume the latter test
> needs unix_sock here ?

I haven't added mptcp_sock because it was added recently.

I don't think we really need to do the changes to tools/btf_ids.h, but
it still might be worth trying to keep them in sync?

> Others lgtm.

Thank you, again, for the review!


* Re: [PATCH bpf-next v9 04/10] bpf: minimize number of allocated lsm slots per program
  2022-06-17  0:43   ` Martin KaFai Lau
@ 2022-06-17 18:28     ` Stanislav Fomichev
  2022-06-17 22:27       ` Martin KaFai Lau
  0 siblings, 1 reply; 31+ messages in thread
From: Stanislav Fomichev @ 2022-06-17 18:28 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, bpf, ast, daniel, andrii

On Thu, Jun 16, 2022 at 5:43 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Fri, Jun 10, 2022 at 09:57:57AM -0700, Stanislav Fomichev wrote:
> > Previous patch adds 1:1 mapping between all 211 LSM hooks
> > and bpf_cgroup program array. Instead of reserving a slot per
> > possible hook, reserve 10 slots per cgroup for lsm programs.
> > Those slots are dynamically allocated on demand and reclaimed.
> >
> > struct cgroup_bpf {
> >       struct bpf_prog_array *    effective[33];        /*     0   264 */
> >       /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
> >       struct hlist_head          progs[33];            /*   264   264 */
> >       /* --- cacheline 8 boundary (512 bytes) was 16 bytes ago --- */
> >       u8                         flags[33];            /*   528    33 */
> >
> >       /* XXX 7 bytes hole, try to pack */
> >
> >       struct list_head           storages;             /*   568    16 */
> >       /* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
> >       struct bpf_prog_array *    inactive;             /*   584     8 */
> >       struct percpu_ref          refcnt;               /*   592    16 */
> >       struct work_struct         release_work;         /*   608    72 */
> >
> >       /* size: 680, cachelines: 11, members: 7 */
> >       /* sum members: 673, holes: 1, sum holes: 7 */
> >       /* last cacheline: 40 bytes */
> > };
> >
> > Move 'ifdef CONFIG_CGROUP_BPF' to expose CGROUP_BPF_ATTACH_TYPE_INVALID
> > to non-cgroup (core) parts.
> hmm... don't see this change in bpf-cgroup-defs.h in this patch.

Sorry for the confusion, that's a leftover from my previous attempt to
make it work for the CONFIG_CGROUP_BPF=n case; will remove.

> >
> > diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> > index b99f8c3e37ea..7b121bd780eb 100644
> > --- a/include/linux/bpf-cgroup-defs.h
> > +++ b/include/linux/bpf-cgroup-defs.h
> > @@ -11,7 +11,8 @@
> >  struct bpf_prog_array;
> >
> >  #ifdef CONFIG_BPF_LSM
> > -#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
> > +/* Maximum number of concurrently attachable per-cgroup LSM hooks. */
> > +#define CGROUP_LSM_NUM 10
> >  #else
> >  #define CGROUP_LSM_NUM 0
> >  #endif
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 4dceb86229f6..503f28fa66d2 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -2407,7 +2407,6 @@ int bpf_arch_text_invalidate(void *dst, size_t len);
> >
> >  struct btf_id_set;
> >  bool btf_id_set_contains(const struct btf_id_set *set, u32 id);
> > -int btf_id_set_index(const struct btf_id_set *set, u32 id);
> >
> >  #define MAX_BPRINTF_VARARGS          12
> >
> > @@ -2444,4 +2443,7 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
> >  void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
> >  int bpf_dynptr_check_size(u32 size);
> >
> > +void bpf_cgroup_atype_put(int cgroup_atype);
> > +void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
> > +
> >  #endif /* _LINUX_BPF_H */
>
> [ ... ]
>
> > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > index b0314889a409..ba402d50e130 100644
> > --- a/kernel/bpf/cgroup.c
> > +++ b/kernel/bpf/cgroup.c
> > @@ -128,12 +128,56 @@ unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
> >  }
> >
> >  #ifdef CONFIG_BPF_LSM
> > +struct cgroup_lsm_atype {
> > +     u32 attach_btf_id;
> > +     int refcnt;
> > +};
> > +
> > +static struct cgroup_lsm_atype cgroup_lsm_atype[CGROUP_LSM_NUM];
> > +
> >  static enum cgroup_bpf_attach_type
> >  bpf_cgroup_atype_find(enum bpf_attach_type attach_type, u32 attach_btf_id)
> >  {
> > +     int i;
> > +
> > +     lockdep_assert_held(&cgroup_mutex);
> > +
> >       if (attach_type != BPF_LSM_CGROUP)
> >               return to_cgroup_bpf_attach_type(attach_type);
> > -     return CGROUP_LSM_START + bpf_lsm_hook_idx(attach_btf_id);
> > +
> > +     for (i = 0; i < ARRAY_SIZE(cgroup_lsm_atype); i++)
> > +             if (cgroup_lsm_atype[i].attach_btf_id == attach_btf_id)
> > +                     return CGROUP_LSM_START + i;
> > +
> > +     for (i = 0; i < ARRAY_SIZE(cgroup_lsm_atype); i++)
> > +             if (cgroup_lsm_atype[i].attach_btf_id == 0)
> > +                     return CGROUP_LSM_START + i;
> > +
> > +     return -E2BIG;
> > +
> > +}
> > +
> > +void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype)
> > +{
> > +     int i = cgroup_atype - CGROUP_LSM_START;
> > +
> > +     lockdep_assert_held(&cgroup_mutex);
> > +
> > +     WARN_ON_ONCE(cgroup_lsm_atype[i].attach_btf_id &&
> > +                  cgroup_lsm_atype[i].attach_btf_id != attach_btf_id);
> > +
> > +     cgroup_lsm_atype[i].attach_btf_id = attach_btf_id;
> > +     cgroup_lsm_atype[i].refcnt++;
> > +}
> > +
> > +void bpf_cgroup_atype_put(int cgroup_atype)
> > +{
> > +     int i = cgroup_atype - CGROUP_LSM_START;
> > +
> > +     mutex_lock(&cgroup_mutex);
> > +     if (--cgroup_lsm_atype[i].refcnt <= 0)
> > +             cgroup_lsm_atype[i].attach_btf_id = 0;
> > +     mutex_unlock(&cgroup_mutex);
> >  }
> >  #else
> >  static enum cgroup_bpf_attach_type
> > @@ -143,6 +187,14 @@ bpf_cgroup_atype_find(enum bpf_attach_type attach_type, u32 attach_btf_id)
> >               return to_cgroup_bpf_attach_type(attach_type);
> >       return -EOPNOTSUPP;
> >  }
> > +
> > +void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype)
> > +{
> > +}
> > +
> > +void bpf_cgroup_atype_put(int cgroup_atype)
> > +{
> > +}
> From the test bot report, these two empty functions may need
> to be inlined in a .h or else the caller needs to have a CONFIG_CGROUP_BPF
> before calling bpf_cgroup_atype_get().  The bpf-cgroup.h may be a better place
> than bpf.h for the inlines but not sure if it is easy to be included in
> trampoline.c or core.c.  Whatever way makes more sense.  Either .h is fine.

Yeah, I already moved them into headers after the complaints from the
bot. Thanks for double checking!

Let's keep them in bpf.h?

kernel/bpf/core.c:2563:17: error: implicit declaration of function
‘bpf_cgroup_atype_put’ [-Werror=implicit-function-declaration]
 2563 |                 bpf_cgroup_atype_put(aux->cgroup_atype);
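
Concretely, I'm thinking of something like this in bpf.h (just a sketch;
whether CONFIG_CGROUP_BPF is the right guard is my assumption):

#ifdef CONFIG_CGROUP_BPF
void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
void bpf_cgroup_atype_put(int cgroup_atype);
#else
static inline void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype) {}
static inline void bpf_cgroup_atype_put(int cgroup_atype) {}
#endif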


> Others lgtm.
>
> >  #endif /* CONFIG_BPF_LSM */
> >
> >  void cgroup_bpf_offline(struct cgroup *cgrp)
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 8d171eb0ed0d..0699098dc6bc 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -107,6 +107,9 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
> >       fp->aux->prog = fp;
> >       fp->jit_requested = ebpf_jit_enabled();
> >       fp->blinding_requested = bpf_jit_blinding_enabled(fp);
> > +#ifdef CONFIG_CGROUP_BPF
> > +     aux->cgroup_atype = CGROUP_BPF_ATTACH_TYPE_INVALID;
> > +#endif
> >
> >       INIT_LIST_HEAD_RCU(&fp->aux->ksym.lnode);
> >       mutex_init(&fp->aux->used_maps_mutex);
> > @@ -2554,6 +2557,10 @@ static void bpf_prog_free_deferred(struct work_struct *work)
> >       aux = container_of(work, struct bpf_prog_aux, work);
> >  #ifdef CONFIG_BPF_SYSCALL
> >       bpf_free_kfunc_btf_tab(aux->kfunc_btf_tab);
> > +#endif
> > +#ifdef CONFIG_CGROUP_BPF
> > +     if (aux->cgroup_atype != CGROUP_BPF_ATTACH_TYPE_INVALID)
> > +             bpf_cgroup_atype_put(aux->cgroup_atype);
> >  #endif
> >       bpf_free_used_maps(aux);
> >       bpf_free_used_btfs(aux);
> > diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> > index 023239a10e7c..e849dd243624 100644
> > --- a/kernel/bpf/trampoline.c
> > +++ b/kernel/bpf/trampoline.c
> > @@ -555,6 +555,7 @@ static struct bpf_shim_tramp_link *cgroup_shim_alloc(const struct bpf_prog *prog
> >       bpf_prog_inc(p);
> >       bpf_link_init(&shim_link->link.link, BPF_LINK_TYPE_UNSPEC,
> >                     &bpf_shim_tramp_link_lops, p);
> > +     bpf_cgroup_atype_get(p->aux->attach_btf_id, cgroup_atype);
> >
> >       return shim_link;
> >  }
> > --
> > 2.36.1.476.g0c4daa206d-goog
> >


* Re: [PATCH bpf-next v9 05/10] bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP
  2022-06-17  0:58   ` Martin KaFai Lau
@ 2022-06-17 18:28     ` Stanislav Fomichev
  2022-06-17 22:29       ` Martin KaFai Lau
  0 siblings, 1 reply; 31+ messages in thread
From: Stanislav Fomichev @ 2022-06-17 18:28 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, bpf, ast, daniel, andrii

On Thu, Jun 16, 2022 at 5:58 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Fri, Jun 10, 2022 at 09:57:58AM -0700, Stanislav Fomichev wrote:
> > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > index ba402d50e130..c869317479ec 100644
> > --- a/kernel/bpf/cgroup.c
> > +++ b/kernel/bpf/cgroup.c
> > @@ -1029,57 +1029,92 @@ static int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
> >  static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
> >                             union bpf_attr __user *uattr)
> >  {
> > +     __u32 __user *prog_attach_flags = u64_to_user_ptr(attr->query.prog_attach_flags);
> >       __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
> >       enum bpf_attach_type type = attr->query.attach_type;
> > +     enum cgroup_bpf_attach_type from_atype, to_atype;
> >       enum cgroup_bpf_attach_type atype;
> >       struct bpf_prog_array *effective;
> >       struct hlist_head *progs;
> >       struct bpf_prog *prog;
> >       int cnt, ret = 0, i;
> > +     int total_cnt = 0;
> >       u32 flags;
> >
> > -     atype = to_cgroup_bpf_attach_type(type);
> > -     if (atype < 0)
> > -             return -EINVAL;
> > +     if (type == BPF_LSM_CGROUP) {
> > +             if (attr->query.prog_cnt && prog_ids && !prog_attach_flags)
> > +                     return -EINVAL;
> >
> > -     progs = &cgrp->bpf.progs[atype];
> > -     flags = cgrp->bpf.flags[atype];
> > +             from_atype = CGROUP_LSM_START;
> > +             to_atype = CGROUP_LSM_END;
> > +             flags = 0;
> > +     } else {
> > +             from_atype = to_cgroup_bpf_attach_type(type);
> > +             if (from_atype < 0)
> > +                     return -EINVAL;
> > +             to_atype = from_atype;
> > +             flags = cgrp->bpf.flags[from_atype];
> > +     }
> >
> > -     effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
> > -                                           lockdep_is_held(&cgroup_mutex));
> > +     for (atype = from_atype; atype <= to_atype; atype++) {
> > +             progs = &cgrp->bpf.progs[atype];
> nit. Move the 'progs = ...' into the 'else {}' case below.
>
> >
> > -     if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE)
> > -             cnt = bpf_prog_array_length(effective);
> > -     else
> > -             cnt = prog_list_length(progs);
> > +             if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE) {
> > +                     effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
> > +                                                           lockdep_is_held(&cgroup_mutex));
> > +                     total_cnt += bpf_prog_array_length(effective);
> > +             } else {
> > +                     total_cnt += prog_list_length(progs);
> > +             }
> > +     }
> >
> >       if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
> >               return -EFAULT;
> > -     if (copy_to_user(&uattr->query.prog_cnt, &cnt, sizeof(cnt)))
> > +     if (copy_to_user(&uattr->query.prog_cnt, &total_cnt, sizeof(total_cnt)))
> >               return -EFAULT;
> > -     if (attr->query.prog_cnt == 0 || !prog_ids || !cnt)
> > +     if (attr->query.prog_cnt == 0 || !prog_ids || !total_cnt)
> >               /* return early if user requested only program count + flags */
> >               return 0;
> > -     if (attr->query.prog_cnt < cnt) {
> > -             cnt = attr->query.prog_cnt;
> > +
> > +     if (attr->query.prog_cnt < total_cnt) {
> > +             total_cnt = attr->query.prog_cnt;
> >               ret = -ENOSPC;
> >       }
> >
> > -     if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE) {
> > -             return bpf_prog_array_copy_to_user(effective, prog_ids, cnt);
> > -     } else {
> > -             struct bpf_prog_list *pl;
> > -             u32 id;
> > +     for (atype = from_atype; atype <= to_atype && total_cnt; atype++) {
> > +             progs = &cgrp->bpf.progs[atype];
> same here.
>
> > +             flags = cgrp->bpf.flags[atype];
> and the 'flags = ...' can be moved to 'if (prog_attach_flags) {}'
>
> Others lgtm.
>
> Reviewed-by: Martin KaFai Lau <kafai@fb.com>

Everything makes sense, will do, thanks!

Maybe we should also move "struct hlist_head *progs;" closer to the
places where we use them? Same for "struct bpf_prog *prog;" which
seems to be used only in one place.
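
Something along these lines for the counting loop (just a sketch of what I
mean, on top of your nits):

	for (atype = from_atype; atype <= to_atype; atype++) {
		if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE) {
			effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
							      lockdep_is_held(&cgroup_mutex));
			total_cnt += bpf_prog_array_length(effective);
		} else {
			struct hlist_head *progs = &cgrp->bpf.progs[atype];

			total_cnt += prog_list_length(progs);
		}
	}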


* Re: [PATCH bpf-next v9 06/10] bpf: expose bpf_{g,s}etsockopt to lsm cgroup
  2022-06-17  5:42   ` Martin KaFai Lau
@ 2022-06-17 18:28     ` Stanislav Fomichev
  2022-06-17 23:07       ` Martin KaFai Lau
  0 siblings, 1 reply; 31+ messages in thread
From: Stanislav Fomichev @ 2022-06-17 18:28 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, bpf, ast, daniel, andrii

On Thu, Jun 16, 2022 at 10:42 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Fri, Jun 10, 2022 at 09:57:59AM -0700, Stanislav Fomichev wrote:
> > I don't see how to make it nice without introducing btf id lists
> > for the hooks where these helpers are allowed. Some LSM hooks
> > work on the locked sockets, some are triggering early and
> > don't grab any locks, so have two lists for now:
> >
> > 1. LSM hooks which trigger under socket lock - minority of the hooks,
> >    but ideal case for us, we can expose existing BTF-based helpers
> > 2. LSM hooks which trigger without socket lock, but they trigger
> >    early in the socket creation path where it should be safe to
> >    do setsockopt without any locks
> > 3. The rest are prohibited. I'm thinking that this use-case might
> >    be a good gateway to sleeping lsm cgroup hooks in the future.
> >    We can either expose lock/unlock operations (and add tracking
> >    to the verifier) or have another set of bpf_setsockopt
> >    wrapper that grab the locks and might sleep.
> Another possibility is to acquire/release the sk lock in
> __bpf_prog_{enter,exit}_lsm_cgroup().  However, it will unnecessarily
> acquire it even the prog is not doing any get/setsockopt.
> It probably can make some checking to avoid the lock...etc. :/
>
> sleepable bpf-prog is a cleaner way out.  From a quick look,
> cgroup_storage is not safe for sleepable bpf-prog.

Is it because it's using non-trace-flavor of rcu?

> All other BPF_MAP_TYPE_{SK,INODE,TASK}_STORAGE is already
> safe once their common infra in bpf_local_storage.c was made
> sleepable-safe.

That might be another argument in favor of replacing the internal
implementation for cgroup_storage with the generic framework we use
for sk/inode/task.

> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  include/linux/bpf.h  |  2 ++
> >  kernel/bpf/bpf_lsm.c | 40 +++++++++++++++++++++++++++++
> >  net/core/filter.c    | 60 ++++++++++++++++++++++++++++++++++++++------
> >  3 files changed, 95 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 503f28fa66d2..c0a269269882 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -2282,6 +2282,8 @@ extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
> >  extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
> >  extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
> >  extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
> > +extern const struct bpf_func_proto bpf_unlocked_sk_setsockopt_proto;
> > +extern const struct bpf_func_proto bpf_unlocked_sk_getsockopt_proto;
> >  extern const struct bpf_func_proto bpf_kallsyms_lookup_name_proto;
> >  extern const struct bpf_func_proto bpf_find_vma_proto;
> >  extern const struct bpf_func_proto bpf_loop_proto;
> > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > index 83aa431dd52e..52b6e3067986 100644
> > --- a/kernel/bpf/bpf_lsm.c
> > +++ b/kernel/bpf/bpf_lsm.c
> > @@ -45,6 +45,26 @@ BTF_ID(func, bpf_lsm_sk_alloc_security)
> >  BTF_ID(func, bpf_lsm_sk_free_security)
> >  BTF_SET_END(bpf_lsm_current_hooks)
> >
> > +/* List of LSM hooks that trigger while the socket is properly locked.
> > + */
> > +BTF_SET_START(bpf_lsm_locked_sockopt_hooks)
> > +BTF_ID(func, bpf_lsm_socket_sock_rcv_skb)
> > +BTF_ID(func, bpf_lsm_sk_clone_security)
> From looking how security_sk_clone() is used at sock_copy(),
> it has two sk args, one is listen sk and one is the clone.
> I think both of them are not locked.
>
> The bpf_lsm_inet_csk_clone below should be enough to
> do setsockopt in the new clone?

Hm, good point, let me drop this one.

I wonder if long term, instead of those lists, we can annotate the
arguments with __locked or __unlocked (the way we do with __user
pointers)? That might be more scalable and we can let sleepable bpf
deal with __unlocked cases. Just thinking out loud...

> > +BTF_ID(func, bpf_lsm_sock_graft)
> > +BTF_ID(func, bpf_lsm_inet_csk_clone)
> > +BTF_ID(func, bpf_lsm_inet_conn_established)
> > +BTF_ID(func, bpf_lsm_sctp_bind_connect)
> I didn't look at this one, so I can't comment.
> Do you have a use case?

No, let's drop it as well. I didn't want those lists to contain only the
cases I want, otherwise it doesn't feel generic. But sctp seems dead
anyway. (I'll sketch the resulting locked set at the end of this reply.)


> > +BTF_SET_END(bpf_lsm_locked_sockopt_hooks)
> > +
> > +/* List of LSM hooks that trigger while the socket is _not_ locked,
> > + * but it's ok to call bpf_{g,s}etsockopt because the socket is still
> > + * in the early init phase.
> > + */
> > +BTF_SET_START(bpf_lsm_unlocked_sockopt_hooks)
> > +BTF_ID(func, bpf_lsm_socket_post_create)
> > +BTF_ID(func, bpf_lsm_socket_socketpair)
> > +BTF_SET_END(bpf_lsm_unlocked_sockopt_hooks)
> > +
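
With bpf_lsm_sk_clone_security and bpf_lsm_sctp_bind_connect dropped as
discussed above, the locked set would end up as (sketch for the next
revision):

BTF_SET_START(bpf_lsm_locked_sockopt_hooks)
BTF_ID(func, bpf_lsm_socket_sock_rcv_skb)
BTF_ID(func, bpf_lsm_sock_graft)
BTF_ID(func, bpf_lsm_inet_csk_clone)
BTF_ID(func, bpf_lsm_inet_conn_established)
BTF_SET_END(bpf_lsm_locked_sockopt_hooks)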


* Re: [PATCH bpf-next v9 09/10] bpftool: implement cgroup tree for BPF_LSM_CGROUP
  2022-06-17  5:58   ` Martin KaFai Lau
@ 2022-06-17 18:28     ` Stanislav Fomichev
  0 siblings, 0 replies; 31+ messages in thread
From: Stanislav Fomichev @ 2022-06-17 18:28 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, bpf, ast, daniel, andrii

On Thu, Jun 16, 2022 at 10:58 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Fri, Jun 10, 2022 at 09:58:02AM -0700, Stanislav Fomichev wrote:
> > @@ -84,6 +87,19 @@ static int show_bpf_prog(int id, enum bpf_attach_type attach_type,
> >       }
> >
> >       attach_type_str = libbpf_bpf_attach_type_str(attach_type);
> > +
> > +     if (btf_vmlinux &&
> > +         info.attach_btf_id < btf__type_cnt(btf_vmlinux)) {
> > +             /* Note, we ignore info.attach_btf_obj_id for now. There
> > +              * is no good way to resolve btf_id to vmlinux
> > +              * or module btf.
> > +              */
> One usage could be using it to get the btf_info by bpf_obj_get_info_by_fd.
> Check for the kernel_btf == true and name is "vmlinux".
> To be future proof, it can print <unknown> func for anything
> other than "vmlinux" btf.

Thanks for the suggestion! I'll try to do it and cache btf_vmlinux_id
if it's "vmlinux". What we really need is a way to obtain vmlinux
btf_id and compare it against info.attach_btf_obj_id.
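
Roughly what I have in mind (a sketch; the is_vmlinux flag and the name
buffer size are made up, error handling trimmed):

	struct bpf_btf_info btf_info = {};
	__u32 len = sizeof(btf_info);
	char name[16] = {};
	bool is_vmlinux = false;
	int fd;

	fd = bpf_btf_get_fd_by_id(info.attach_btf_obj_id);
	if (fd >= 0) {
		btf_info.name = (__u64)(unsigned long)name;
		btf_info.name_len = sizeof(name);
		if (!bpf_obj_get_info_by_fd(fd, &btf_info, &len) &&
		    btf_info.kernel_btf && !strcmp(name, "vmlinux"))
			is_vmlinux = true;
		close(fd);
	}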


* Re: [PATCH bpf-next v9 03/10] bpf: per-cgroup lsm flavor
  2022-06-17 18:28     ` Stanislav Fomichev
@ 2022-06-17 22:25       ` Martin KaFai Lau
  0 siblings, 0 replies; 31+ messages in thread
From: Martin KaFai Lau @ 2022-06-17 22:25 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii

On Fri, Jun 17, 2022 at 11:28:15AM -0700, Stanislav Fomichev wrote:
> On Thu, Jun 16, 2022 at 3:25 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Fri, Jun 10, 2022 at 09:57:56AM -0700, Stanislav Fomichev wrote:
> > > Allow attaching to lsm hooks in the cgroup context.
> > >
> > > Attaching to per-cgroup LSM works exactly like attaching
> > > to other per-cgroup hooks. New BPF_LSM_CGROUP is added
> > > to trigger new mode; the actual lsm hook we attach to is
> > > signaled via existing attach_btf_id.
> > >
> > > For the hooks that have 'struct socket' or 'struct sock' as its first
> > > argument, we use the cgroup associated with that socket. For the rest,
> > > we use 'current' cgroup (this is all on default hierarchy == v2 only).
> > > Note that for some hooks that work on 'struct sock' we still
> > > take the cgroup from 'current' because some of them work on the socket
> > > that hasn't been properly initialized yet.
> > >
> > > Behind the scenes, we allocate a shim program that is attached
> > > to the trampoline and runs cgroup effective BPF programs array.
> > > This shim has some rudimentary ref counting and can be shared
> > > between several programs attaching to the same per-cgroup lsm hook.
> > nit. The 'per-cgroup' may be read as each cgroup has its own shim.
> > It will be useful to rephrase it a little.
> 
> I'll put the following, LMK if still not clear.
> 
> This shim has some rudimentary ref counting and can be shared
> between several programs attaching to the same lsm hook from
> different cgroups.
SG.

> > > @@ -839,8 +953,11 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
> > >       if (hlist_empty(progs))
> > >               /* last program was detached, reset flags to zero */
> > >               cgrp->bpf.flags[atype] = 0;
> > > -     if (old_prog)
> > > +     if (old_prog) {
> > > +             if (type == BPF_LSM_CGROUP)
> > > +                     bpf_trampoline_unlink_cgroup_shim(old_prog);
> > I think the same bpf_trampoline_unlink_cgroup_shim() needs to be done
> > in bpf_cgroup_link_release()?  It should be done just after
> > WARN_ON(__cgroup_bpf_detach()).
> 
> Oooh, I missed that, I thought that old_prog would have the pointer to
> the old program even if it's a link :-(
> 
> Do you mind if I handle it in __cgroup_bpf_detach as well? Or do you
> think it's cleaner to do in bpf_cgroup_link_release?
> 
> if (old_prog) {
>   ...
> } else if (link) {
>   if (type == BPF_LSM_CGROUP)
>     bpf_trampoline_unlink_cgroup_shim(link->link.prog);
> }
I think this will be similar to the earlier revision.

I was thinking of doing it in bpf_cgroup_link_release() for the link case,
where I know the bpf_prog_put() will be handled by bpf_link_free(),
as other links' release/detach functions do.  Otherwise, the first
impression when reading the code is: why is there no bpf_prog_put() in
__cgroup_bpf_detach() for the link case?
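
i.e. something like this in bpf_cgroup_link_release() (just a sketch):

	WARN_ON(__cgroup_bpf_detach(cg_link->cgroup, NULL, cg_link,
				    cg_link->type));
	if (cg_link->type == BPF_LSM_CGROUP)
		bpf_trampoline_unlink_cgroup_shim(cg_link->link.prog);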

No strong opinion here.  Either way is fine.  It's mostly a personal impression
from reading the code on a first pass; reading it closely, one should be
able to figure it out either way.

> 
> > [ ... ]
> >
> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > index aeb31137b2ed..a237be4f8bb3 100644
> > > --- a/kernel/bpf/syscall.c
> > > +++ b/kernel/bpf/syscall.c
> > > @@ -3416,6 +3416,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
> > >               return BPF_PROG_TYPE_SK_LOOKUP;
> > >       case BPF_XDP:
> > >               return BPF_PROG_TYPE_XDP;
> > > +     case BPF_LSM_CGROUP:
> > > +             return BPF_PROG_TYPE_LSM;
> > >       default:
> > >               return BPF_PROG_TYPE_UNSPEC;
> > >       }
> > > @@ -3469,6 +3471,11 @@ static int bpf_prog_attach(const union bpf_attr *attr)
> > >       case BPF_PROG_TYPE_CGROUP_SOCKOPT:
> > >       case BPF_PROG_TYPE_CGROUP_SYSCTL:
> > >       case BPF_PROG_TYPE_SOCK_OPS:
> > > +     case BPF_PROG_TYPE_LSM:
> > > +             if (ptype == BPF_PROG_TYPE_LSM &&
> > > +                 prog->expected_attach_type != BPF_LSM_CGROUP)
> > Check this in bpf_prog_attach_check_attach_type() where
> > other expected_attach_type are enforced.
> 
> It was there initially but I moved it here because
> bpf_prog_attach_check_attach_type() is called from link_create() as
> well where the range of acceptable expected_attach_type(s) is larger.
Ah. ic.  Thanks for the explanation.

> > > diff --git a/tools/include/linux/btf_ids.h b/tools/include/linux/btf_ids.h
> > > index 57890b357f85..2345b502b439 100644
> > > --- a/tools/include/linux/btf_ids.h
> > > +++ b/tools/include/linux/btf_ids.h
> > > @@ -172,7 +172,9 @@ extern struct btf_id_set name;
> > >       BTF_SOCK_TYPE(BTF_SOCK_TYPE_TCP_TW, tcp_timewait_sock)          \
> > >       BTF_SOCK_TYPE(BTF_SOCK_TYPE_TCP6, tcp6_sock)                    \
> > >       BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP, udp_sock)                      \
> > > -     BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP6, udp6_sock)
> > > +     BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP6, udp6_sock)                    \
> > > +     BTF_SOCK_TYPE(BTF_SOCK_TYPE_UNIX, unix_sock)                    \
> > The existing tools's btf_ids.h looks more outdated from
> > the kernel's btf_ids.h.  unix_sock is missing which is added back here.
> > mptcp_sock is missing also but not added.  I assume the latter test
> > needs unix_sock here ?
> 
> I haven't added mptcp_sock because it was added recently.
> 
> I don't think we really need to do the changes to tools/btf_ids.h, but
> it still might be worth trying to keep them in sync?
Yeah.  Other than this list, it seems other parts of this file are out of sync
as well.  Maybe just add an individual patch in this series that does a
copy-and-paste update of the whole file.


* Re: [PATCH bpf-next v9 04/10] bpf: minimize number of allocated lsm slots per program
  2022-06-17 18:28     ` Stanislav Fomichev
@ 2022-06-17 22:27       ` Martin KaFai Lau
  0 siblings, 0 replies; 31+ messages in thread
From: Martin KaFai Lau @ 2022-06-17 22:27 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii

On Fri, Jun 17, 2022 at 11:28:18AM -0700, Stanislav Fomichev wrote:
> > > +void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype)
> > > +{
> > > +}
> > > +
> > > +void bpf_cgroup_atype_put(int cgroup_atype)
> > > +{
> > > +}
> > From the test bot report, these two empty functions may need
> > to be inlined in a .h or else the caller needs to have a CONFIG_CGROUP_BPF
> > before calling bpf_cgroup_atype_get().  The bpf-cgroup.h may be a better place
> > than bpf.h for the inlines but not sure if it is easy to be included in
> > trampoline.c or core.c.  Whatever way makes more sense.  Either .h is fine.
> 
> Yeah, I already moved them into headers after the complaints from the
> bot. Thanks for double checking!
> 
> Let's keep them in bpf.h?
ok.


* Re: [PATCH bpf-next v9 05/10] bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP
  2022-06-17 18:28     ` Stanislav Fomichev
@ 2022-06-17 22:29       ` Martin KaFai Lau
  0 siblings, 0 replies; 31+ messages in thread
From: Martin KaFai Lau @ 2022-06-17 22:29 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii

On Fri, Jun 17, 2022 at 11:28:21AM -0700, Stanislav Fomichev wrote:
> On Thu, Jun 16, 2022 at 5:58 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Fri, Jun 10, 2022 at 09:57:58AM -0700, Stanislav Fomichev wrote:
> > > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > > index ba402d50e130..c869317479ec 100644
> > > --- a/kernel/bpf/cgroup.c
> > > +++ b/kernel/bpf/cgroup.c
> > > @@ -1029,57 +1029,92 @@ static int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
> > >  static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
> > >                             union bpf_attr __user *uattr)
> > >  {
> > > +     __u32 __user *prog_attach_flags = u64_to_user_ptr(attr->query.prog_attach_flags);
> > >       __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
> > >       enum bpf_attach_type type = attr->query.attach_type;
> > > +     enum cgroup_bpf_attach_type from_atype, to_atype;
> > >       enum cgroup_bpf_attach_type atype;
> > >       struct bpf_prog_array *effective;
> > >       struct hlist_head *progs;
> > >       struct bpf_prog *prog;
> > >       int cnt, ret = 0, i;
> > > +     int total_cnt = 0;
> > >       u32 flags;
> > >
> > > -     atype = to_cgroup_bpf_attach_type(type);
> > > -     if (atype < 0)
> > > -             return -EINVAL;
> > > +     if (type == BPF_LSM_CGROUP) {
> > > +             if (attr->query.prog_cnt && prog_ids && !prog_attach_flags)
> > > +                     return -EINVAL;
> > >
> > > -     progs = &cgrp->bpf.progs[atype];
> > > -     flags = cgrp->bpf.flags[atype];
> > > +             from_atype = CGROUP_LSM_START;
> > > +             to_atype = CGROUP_LSM_END;
> > > +             flags = 0;
> > > +     } else {
> > > +             from_atype = to_cgroup_bpf_attach_type(type);
> > > +             if (from_atype < 0)
> > > +                     return -EINVAL;
> > > +             to_atype = from_atype;
> > > +             flags = cgrp->bpf.flags[from_atype];
> > > +     }
> > >
> > > -     effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
> > > -                                           lockdep_is_held(&cgroup_mutex));
> > > +     for (atype = from_atype; atype <= to_atype; atype++) {
> > > +             progs = &cgrp->bpf.progs[atype];
> > nit. Move the 'progs = ...' into the 'else {}' case below.
> >
> > >
> > > -     if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE)
> > > -             cnt = bpf_prog_array_length(effective);
> > > -     else
> > > -             cnt = prog_list_length(progs);
> > > +             if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE) {
> > > +                     effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
> > > +                                                           lockdep_is_held(&cgroup_mutex));
> > > +                     total_cnt += bpf_prog_array_length(effective);
> > > +             } else {
> > > +                     total_cnt += prog_list_length(progs);
> > > +             }
> > > +     }
> > >
> > >       if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
> > >               return -EFAULT;
> > > -     if (copy_to_user(&uattr->query.prog_cnt, &cnt, sizeof(cnt)))
> > > +     if (copy_to_user(&uattr->query.prog_cnt, &total_cnt, sizeof(total_cnt)))
> > >               return -EFAULT;
> > > -     if (attr->query.prog_cnt == 0 || !prog_ids || !cnt)
> > > +     if (attr->query.prog_cnt == 0 || !prog_ids || !total_cnt)
> > >               /* return early if user requested only program count + flags */
> > >               return 0;
> > > -     if (attr->query.prog_cnt < cnt) {
> > > -             cnt = attr->query.prog_cnt;
> > > +
> > > +     if (attr->query.prog_cnt < total_cnt) {
> > > +             total_cnt = attr->query.prog_cnt;
> > >               ret = -ENOSPC;
> > >       }
> > >
> > > -     if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE) {
> > > -             return bpf_prog_array_copy_to_user(effective, prog_ids, cnt);
> > > -     } else {
> > > -             struct bpf_prog_list *pl;
> > > -             u32 id;
> > > +     for (atype = from_atype; atype <= to_atype && total_cnt; atype++) {
> > > +             progs = &cgrp->bpf.progs[atype];
> > same here.
> >
> > > +             flags = cgrp->bpf.flags[atype];
> > and the 'flags = ...' can be moved to 'if (prog_attach_flags) {}'
> >
> > Others lgtm.
> >
> > Reviewed-by: Martin KaFai Lau <kafai@fb.com>
> 
> Everything makes sense, will do, thanks!
> 
> Maybe we should also move "struct hlist_head *progs;" closer to the
> places where we use them? Same for "struct bpf_prog *prog;" which
> seems to be used only in one place.
Yep. SG. Thanks!


* Re: [PATCH bpf-next v9 06/10] bpf: expose bpf_{g,s}etsockopt to lsm cgroup
  2022-06-17 18:28     ` Stanislav Fomichev
@ 2022-06-17 23:07       ` Martin KaFai Lau
  2022-06-21 17:51         ` Stanislav Fomichev
  0 siblings, 1 reply; 31+ messages in thread
From: Martin KaFai Lau @ 2022-06-17 23:07 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii, Yonghong Song

On Fri, Jun 17, 2022 at 11:28:24AM -0700, Stanislav Fomichev wrote:
> On Thu, Jun 16, 2022 at 10:42 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Fri, Jun 10, 2022 at 09:57:59AM -0700, Stanislav Fomichev wrote:
> > > I don't see how to make it nice without introducing btf id lists
> > > for the hooks where these helpers are allowed. Some LSM hooks
> > > work on the locked sockets, some are triggering early and
> > > don't grab any locks, so have two lists for now:
> > >
> > > 1. LSM hooks which trigger under socket lock - minority of the hooks,
> > >    but ideal case for us, we can expose existing BTF-based helpers
> > > 2. LSM hooks which trigger without socket lock, but they trigger
> > >    early in the socket creation path where it should be safe to
> > >    do setsockopt without any locks
> > > 3. The rest are prohibited. I'm thinking that this use-case might
> > >    be a good gateway to sleeping lsm cgroup hooks in the future.
> > >    We can either expose lock/unlock operations (and add tracking
> > >    to the verifier) or have another set of bpf_setsockopt
> > >    wrapper that grab the locks and might sleep.
> > Another possibility is to acquire/release the sk lock in
> > __bpf_prog_{enter,exit}_lsm_cgroup().  However, it will unnecessarily
> > acquire it even the prog is not doing any get/setsockopt.
> > It probably can make some checking to avoid the lock...etc. :/
> >
> > sleepable bpf-prog is a cleaner way out.  From a quick look,
> > cgroup_storage is not safe for sleepable bpf-prog.
> 
> Is it because it's using non-trace-flavor of rcu?
Right, and commit 0fe4b381a59e ("bpf: Allow bpf_local_storage to be used by sleepable programs")
is to make it work for both flavors.

> 
> > All other BPF_MAP_TYPE_{SK,INODE,TASK}_STORAGE is already
> > safe once their common infra in bpf_local_storage.c was made
> > sleepable-safe.
> 
> That might be another argument in favor of replacing the internal
> implementation for cgroup_storage with the generic framework we use
> for sk/inode/task.
It could be a new map type to support sk/inode/task style of local storage.

I am seeing use cases that the bpf prog is not a cgroup-bpf prog
and it has a hold of the cgroup pointer.  It ends up creating a bpf hashmap with
the cg_id as the key.  For example,
https://lore.kernel.org/bpf/20220610194435.2268290-9-yosryahmed@google.com/
It will be useful to support this use case for cgroup as sk/inode/task
storage does.  A quick thought is it needs another map_type because
of different helper interface, e.g. the bpf prog can create and
delete a sk/inode/task storage.
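
(Just to illustrate the hashmap pattern I mean, with made-up names:)

struct cg_val {
	__u64 cnt;
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 10240);
	__type(key, __u64);		/* cgroup id */
	__type(value, struct cg_val);
} cg_map SEC(".maps");

with the prog doing bpf_map_lookup_elem(&cg_map, &cg_id) instead of using a
dedicated cgroup local storage helper.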

> 
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > >  include/linux/bpf.h  |  2 ++
> > >  kernel/bpf/bpf_lsm.c | 40 +++++++++++++++++++++++++++++
> > >  net/core/filter.c    | 60 ++++++++++++++++++++++++++++++++++++++------
> > >  3 files changed, 95 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index 503f28fa66d2..c0a269269882 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -2282,6 +2282,8 @@ extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
> > >  extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
> > >  extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
> > >  extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
> > > +extern const struct bpf_func_proto bpf_unlocked_sk_setsockopt_proto;
> > > +extern const struct bpf_func_proto bpf_unlocked_sk_getsockopt_proto;
> > >  extern const struct bpf_func_proto bpf_kallsyms_lookup_name_proto;
> > >  extern const struct bpf_func_proto bpf_find_vma_proto;
> > >  extern const struct bpf_func_proto bpf_loop_proto;
> > > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > > index 83aa431dd52e..52b6e3067986 100644
> > > --- a/kernel/bpf/bpf_lsm.c
> > > +++ b/kernel/bpf/bpf_lsm.c
> > > @@ -45,6 +45,26 @@ BTF_ID(func, bpf_lsm_sk_alloc_security)
> > >  BTF_ID(func, bpf_lsm_sk_free_security)
> > >  BTF_SET_END(bpf_lsm_current_hooks)
> > >
> > > +/* List of LSM hooks that trigger while the socket is properly locked.
> > > + */
> > > +BTF_SET_START(bpf_lsm_locked_sockopt_hooks)
> > > +BTF_ID(func, bpf_lsm_socket_sock_rcv_skb)
> > > +BTF_ID(func, bpf_lsm_sk_clone_security)
> > From looking how security_sk_clone() is used at sock_copy(),
> > it has two sk args, one is listen sk and one is the clone.
> > I think both of them are not locked.
> >
> > The bpf_lsm_inet_csk_clone below should be enough to
> > do setsockopt in the new clone?
> 
> Hm, good point, let me drop this one.
> 
> I wonder if long term, instead of those lists, we can annotate the
> arguments with __locked or __unlocked (the way we do with __user
> pointers)? That might be more scalable and we can let sleepable bpf
> deal with __unlocked cases. Just thinking out loud...
I think the btf_tag may help here. Cc: Yonghong.


* Re: [PATCH bpf-next v9 06/10] bpf: expose bpf_{g,s}etsockopt to lsm cgroup
  2022-06-17 23:07       ` Martin KaFai Lau
@ 2022-06-21 17:51         ` Stanislav Fomichev
  0 siblings, 0 replies; 31+ messages in thread
From: Stanislav Fomichev @ 2022-06-21 17:51 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, bpf, ast, daniel, andrii, Yonghong Song

On Fri, Jun 17, 2022 at 4:08 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Fri, Jun 17, 2022 at 11:28:24AM -0700, Stanislav Fomichev wrote:
> > On Thu, Jun 16, 2022 at 10:42 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > On Fri, Jun 10, 2022 at 09:57:59AM -0700, Stanislav Fomichev wrote:
> > > > I don't see how to make it nice without introducing btf id lists
> > > > for the hooks where these helpers are allowed. Some LSM hooks
> > > > work on the locked sockets, some are triggering early and
> > > > don't grab any locks, so have two lists for now:
> > > >
> > > > 1. LSM hooks which trigger under socket lock - minority of the hooks,
> > > >    but ideal case for us, we can expose existing BTF-based helpers
> > > > 2. LSM hooks which trigger without socket lock, but they trigger
> > > >    early in the socket creation path where it should be safe to
> > > >    do setsockopt without any locks
> > > > 3. The rest are prohibited. I'm thinking that this use-case might
> > > >    be a good gateway to sleeping lsm cgroup hooks in the future.
> > > >    We can either expose lock/unlock operations (and add tracking
> > > >    to the verifier) or have another set of bpf_setsockopt
> > > >    wrapper that grab the locks and might sleep.
> > > Another possibility is to acquire/release the sk lock in
> > > __bpf_prog_{enter,exit}_lsm_cgroup().  However, it will unnecessarily
> > > acquire it even the prog is not doing any get/setsockopt.
> > > It probably can make some checking to avoid the lock...etc. :/
> > >
> > > sleepable bpf-prog is a cleaner way out.  From a quick look,
> > > cgroup_storage is not safe for sleepable bpf-prog.
> >
> > Is it because it's using non-trace-flavor of rcu?
> Right, and commit 0fe4b381a59e ("bpf: Allow bpf_local_storage to be used by sleepable programs")
> is to make it work for both flavors.
>
> >
> > > All other BPF_MAP_TYPE_{SK,INODE,TASK}_STORAGE is already
> > > safe once their common infra in bpf_local_storage.c was made
> > > sleepable-safe.
> >
> > That might be another argument in favor of replacing the internal
> > implementation for cgroup_storage with the generic framework we use
> > for sk/inode/task.
> It could be a new map type to support sk/inode/task style of local storage.
>
> I am seeing use cases that the bpf prog is not a cgroup-bpf prog
> and it has a hold of the cgroup pointer.  It ends up creating a bpf hashmap with
> the cg_id as the key.  For example,
> https://lore.kernel.org/bpf/20220610194435.2268290-9-yosryahmed@google.com/
> It will be useful to support this use case for cgroup as sk/inode/task
> storage does.  A quick thought is it needs another map_type because
> of different helper interface, e.g. the bpf prog can create and
> delete a sk/inode/task storage.

Good point. We've also discussed that new map type internally with
Yosry. And for me the biggest issue with a new map type was finding some
major differentiating factor from the existing one. Making it work with
non-cgroup progs might be it. Another, as you mention, is the ability
to remove the value. Having special treatment for
bpf_get_local_storage (in terms of always assuming non-null return
value) might be problematic for the internal conversion to the common
storage framework :-(

> > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > ---
> > > >  include/linux/bpf.h  |  2 ++
> > > >  kernel/bpf/bpf_lsm.c | 40 +++++++++++++++++++++++++++++
> > > >  net/core/filter.c    | 60 ++++++++++++++++++++++++++++++++++++++------
> > > >  3 files changed, 95 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index 503f28fa66d2..c0a269269882 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -2282,6 +2282,8 @@ extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
> > > >  extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
> > > >  extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
> > > >  extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
> > > > +extern const struct bpf_func_proto bpf_unlocked_sk_setsockopt_proto;
> > > > +extern const struct bpf_func_proto bpf_unlocked_sk_getsockopt_proto;
> > > >  extern const struct bpf_func_proto bpf_kallsyms_lookup_name_proto;
> > > >  extern const struct bpf_func_proto bpf_find_vma_proto;
> > > >  extern const struct bpf_func_proto bpf_loop_proto;
> > > > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > > > index 83aa431dd52e..52b6e3067986 100644
> > > > --- a/kernel/bpf/bpf_lsm.c
> > > > +++ b/kernel/bpf/bpf_lsm.c
> > > > @@ -45,6 +45,26 @@ BTF_ID(func, bpf_lsm_sk_alloc_security)
> > > >  BTF_ID(func, bpf_lsm_sk_free_security)
> > > >  BTF_SET_END(bpf_lsm_current_hooks)
> > > >
> > > > +/* List of LSM hooks that trigger while the socket is properly locked.
> > > > + */
> > > > +BTF_SET_START(bpf_lsm_locked_sockopt_hooks)
> > > > +BTF_ID(func, bpf_lsm_socket_sock_rcv_skb)
> > > > +BTF_ID(func, bpf_lsm_sk_clone_security)
> > > From looking how security_sk_clone() is used at sock_copy(),
> > > it has two sk args, one is listen sk and one is the clone.
> > > I think both of them are not locked.
> > >
> > > The bpf_lsm_inet_csk_clone below should be enough to
> > > do setsockopt in the new clone?
> >
> > Hm, good point, let me drop this one.
> >
> > I wonder if long term, instead of those lists, we can annotate the
> > arguments with __locked or __unlocked (the way we do with __user
> > pointers)? That might be more scalable and we can let sleepable bpf
> > deal with __unlocked cases. Just thinking out loud...
> I think the btf_tag may help here. Cc: Yonghong.

Exactly. I haven't looked closely, but that seems like the right thing
to leverage. Thx!


Thread overview: 31+ messages
2022-06-10 16:57 [PATCH bpf-next v9 00/10] bpf: cgroup_sock lsm flavor Stanislav Fomichev
2022-06-10 16:57 ` [PATCH bpf-next v9 01/10] bpf: add bpf_func_t and trampoline helpers Stanislav Fomichev
2022-06-16 19:53   ` Martin KaFai Lau
2022-06-10 16:57 ` [PATCH bpf-next v9 02/10] bpf: convert cgroup_bpf.progs to hlist Stanislav Fomichev
2022-06-16 19:59   ` Martin KaFai Lau
2022-06-10 16:57 ` [PATCH bpf-next v9 03/10] bpf: per-cgroup lsm flavor Stanislav Fomichev
2022-06-16 22:25   ` Martin KaFai Lau
2022-06-17 18:28     ` Stanislav Fomichev
2022-06-17 22:25       ` Martin KaFai Lau
2022-06-10 16:57 ` [PATCH bpf-next v9 04/10] bpf: minimize number of allocated lsm slots per program Stanislav Fomichev
2022-06-11 16:53   ` kernel test robot
2022-06-17  0:43   ` Martin KaFai Lau
2022-06-17 18:28     ` Stanislav Fomichev
2022-06-17 22:27       ` Martin KaFai Lau
2022-06-10 16:57 ` [PATCH bpf-next v9 05/10] bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP Stanislav Fomichev
2022-06-17  0:58   ` Martin KaFai Lau
2022-06-17 18:28     ` Stanislav Fomichev
2022-06-17 22:29       ` Martin KaFai Lau
2022-06-10 16:57 ` [PATCH bpf-next v9 06/10] bpf: expose bpf_{g,s}etsockopt to lsm cgroup Stanislav Fomichev
2022-06-17  5:42   ` Martin KaFai Lau
2022-06-17 18:28     ` Stanislav Fomichev
2022-06-17 23:07       ` Martin KaFai Lau
2022-06-21 17:51         ` Stanislav Fomichev
2022-06-10 16:58 ` [PATCH bpf-next v9 07/10] libbpf: add lsm_cgoup_sock type Stanislav Fomichev
2022-06-10 16:58 ` [PATCH bpf-next v9 08/10] libbpf: implement bpf_prog_query_opts Stanislav Fomichev
2022-06-10 16:58 ` [PATCH bpf-next v9 09/10] bpftool: implement cgroup tree for BPF_LSM_CGROUP Stanislav Fomichev
2022-06-13 12:07   ` Quentin Monnet
2022-06-13 15:53     ` Stanislav Fomichev
2022-06-17  5:58   ` Martin KaFai Lau
2022-06-17 18:28     ` Stanislav Fomichev
2022-06-10 16:58 ` [PATCH bpf-next v9 10/10] selftests/bpf: lsm_cgroup functional test Stanislav Fomichev
