bpf.vger.kernel.org archive mirror
* [PATCH bpf-next v3 0/7] bpf: cgroup_sock lsm flavor
@ 2022-04-07 22:31 Stanislav Fomichev
  2022-04-07 22:31 ` [PATCH bpf-next v3 1/7] bpf: add bpf_func_t and trampoline helpers Stanislav Fomichev
                   ` (6 more replies)
  0 siblings, 7 replies; 33+ messages in thread
From: Stanislav Fomichev @ 2022-04-07 22:31 UTC (permalink / raw)
  To: netdev, bpf; +Cc: ast, daniel, andrii, Stanislav Fomichev, kafai, kpsingh

This series implements a new lsm flavor for attaching per-cgroup programs to
existing lsm hooks. The cgroup is taken out of 'current', unless
the first argument of the hook is 'struct socket'; in that case,
the cgroup association is taken out of the socket. The attachment
looks like a regular per-cgroup attachment: we add a new BPF_LSM_CGROUP
attach type which, together with attach_btf_id, signals per-cgroup lsm.
Behind the scenes, we allocate a trampoline shim program and
attach it to the lsm hook. This program looks up the cgroup from current/socket
and runs the cgroup's effective prog array. The rest of the per-cgroup BPF
stays the same: hierarchy, local storage, retval conventions
(return 1 == success).
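
For illustration, a minimal program using this flavor could look roughly like
the sketch below (the hook, section prefix and program name follow the
selftest added later in the series; returning 1 allows the operation,
returning 0 rejects it):

  // SPDX-License-Identifier: GPL-2.0
  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char _license[] SEC("license") = "GPL";

  /* Reject AF_UNIX socket allocation for tasks in the attached cgroup. */
  SEC("lsm_cgroup/sk_alloc_security")
  int BPF_PROG(deny_unix_alloc, struct sock *sk, int family, gfp_t priority)
  {
  	return family == 1 /* AF_UNIX */ ? 0 : 1;
  }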

Current limitations:
* haven't considered sleepable bpf; can be extended later on
* not sure the verifier does the right thing with null checks;
  see latest selftest for details
* total of 10 (global) per-cgroup LSM attach points; this bloats
  bpf_cgroup a bit

Cc: ast@kernel.org
Cc: daniel@iogearbox.net
Cc: kafai@fb.com
Cc: kpsingh@kernel.org

v3:
- add BPF_LSM_CGROUP to bpftool
- use a simple int instead of refcount_t (to avoid a use-after-free
  false positive)

v2:
- addressed build bot failures

Stanislav Fomichev (7):
  bpf: add bpf_func_t and trampoline helpers
  bpf: per-cgroup lsm flavor
  bpf: minimize number of allocated lsm slots per program
  bpf: allow writing to a subset of sock fields from lsm progtype
  libbpf: add lsm_cgroup_sock type
  selftests/bpf: lsm_cgroup functional test
  selftests/bpf: verify lsm_cgroup struct sock access

 include/linux/bpf-cgroup-defs.h               |   8 +
 include/linux/bpf.h                           |  24 +-
 include/linux/bpf_lsm.h                       |   8 +
 include/uapi/linux/bpf.h                      |   1 +
 kernel/bpf/bpf_lsm.c                          | 147 ++++++++++++
 kernel/bpf/btf.c                              |  11 +
 kernel/bpf/cgroup.c                           | 210 ++++++++++++++++--
 kernel/bpf/syscall.c                          |  10 +
 kernel/bpf/trampoline.c                       | 205 ++++++++++++++---
 kernel/bpf/verifier.c                         |   4 +-
 tools/bpf/bpftool/common.c                    |   1 +
 tools/include/uapi/linux/bpf.h                |   1 +
 tools/lib/bpf/libbpf.c                        |   2 +
 .../selftests/bpf/prog_tests/lsm_cgroup.c     | 158 +++++++++++++
 .../testing/selftests/bpf/progs/lsm_cgroup.c  |  94 ++++++++
 tools/testing/selftests/bpf/test_verifier.c   |  54 ++++-
 .../selftests/bpf/verifier/lsm_cgroup.c       |  34 +++
 17 files changed, 914 insertions(+), 58 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/lsm_cgroup.c
 create mode 100644 tools/testing/selftests/bpf/progs/lsm_cgroup.c
 create mode 100644 tools/testing/selftests/bpf/verifier/lsm_cgroup.c

-- 
2.35.1.1178.g4f1659d476-goog


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH bpf-next v3 1/7] bpf: add bpf_func_t and trampoline helpers
  2022-04-07 22:31 [PATCH bpf-next v3 0/7] bpf: cgroup_sock lsm flavor Stanislav Fomichev
@ 2022-04-07 22:31 ` Stanislav Fomichev
  2022-04-07 22:31 ` [PATCH bpf-next v3 2/7] bpf: per-cgroup lsm flavor Stanislav Fomichev
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Stanislav Fomichev @ 2022-04-07 22:31 UTC (permalink / raw)
  To: netdev, bpf; +Cc: ast, daniel, andrii, Stanislav Fomichev

I'll be adding lsm cgroup specific helpers that need to grab the
trampoline mutex, so factor out the mutex-protected parts of
bpf_trampoline_{link,unlink}_prog() into __bpf_trampoline_{link,unlink}_prog()
and add a bpf_func_t typedef for the program run callback.

No functional changes.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf.h     | 11 ++++---
 kernel/bpf/trampoline.c | 63 +++++++++++++++++++++++------------------
 2 files changed, 40 insertions(+), 34 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bdb5298735ce..487aba40ce52 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -52,6 +52,8 @@ typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
 typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
 					struct bpf_iter_aux_info *aux);
 typedef void (*bpf_iter_fini_seq_priv_t)(void *private_data);
+typedef unsigned int (*bpf_func_t)(const void *,
+				   const struct bpf_insn *);
 struct bpf_iter_seq_info {
 	const struct seq_operations *seq_ops;
 	bpf_iter_init_seq_priv_t init_seq_private;
@@ -798,8 +800,7 @@ struct bpf_dispatcher {
 static __always_inline __nocfi unsigned int bpf_dispatcher_nop_func(
 	const void *ctx,
 	const struct bpf_insn *insnsi,
-	unsigned int (*bpf_func)(const void *,
-				 const struct bpf_insn *))
+	bpf_func_t bpf_func)
 {
 	return bpf_func(ctx, insnsi);
 }
@@ -827,8 +828,7 @@ int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs);
 	noinline __nocfi unsigned int bpf_dispatcher_##name##_func(	\
 		const void *ctx,					\
 		const struct bpf_insn *insnsi,				\
-		unsigned int (*bpf_func)(const void *,			\
-					 const struct bpf_insn *))	\
+		bpf_func_t bpf_func)					\
 	{								\
 		return bpf_func(ctx, insnsi);				\
 	}								\
@@ -839,8 +839,7 @@ int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs);
 	unsigned int bpf_dispatcher_##name##_func(			\
 		const void *ctx,					\
 		const struct bpf_insn *insnsi,				\
-		unsigned int (*bpf_func)(const void *,			\
-					 const struct bpf_insn *));	\
+		bpf_func_t bpf_func);					\
 	extern struct bpf_dispatcher bpf_dispatcher_##name;
 #define BPF_DISPATCHER_FUNC(name) bpf_dispatcher_##name##_func
 #define BPF_DISPATCHER_PTR(name) (&bpf_dispatcher_##name)
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index ada97751ae1b..0c4fd194e801 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -407,42 +407,34 @@ static enum bpf_tramp_prog_type bpf_attach_type_to_tramp(struct bpf_prog *prog)
 	}
 }
 
-int bpf_trampoline_link_prog(struct bpf_prog *prog, struct bpf_trampoline *tr)
+static int __bpf_trampoline_link_prog(struct bpf_prog *prog,
+				      struct bpf_trampoline *tr)
 {
 	enum bpf_tramp_prog_type kind;
 	int err = 0;
 	int cnt;
 
 	kind = bpf_attach_type_to_tramp(prog);
-	mutex_lock(&tr->mutex);
-	if (tr->extension_prog) {
+	if (tr->extension_prog)
 		/* cannot attach fentry/fexit if extension prog is attached.
 		 * cannot overwrite extension prog either.
 		 */
-		err = -EBUSY;
-		goto out;
-	}
+		return -EBUSY;
+
 	cnt = tr->progs_cnt[BPF_TRAMP_FENTRY] + tr->progs_cnt[BPF_TRAMP_FEXIT];
 	if (kind == BPF_TRAMP_REPLACE) {
 		/* Cannot attach extension if fentry/fexit are in use. */
-		if (cnt) {
-			err = -EBUSY;
-			goto out;
-		}
+		if (cnt)
+			return -EBUSY;
 		tr->extension_prog = prog;
-		err = bpf_arch_text_poke(tr->func.addr, BPF_MOD_JUMP, NULL,
-					 prog->bpf_func);
-		goto out;
-	}
-	if (cnt >= BPF_MAX_TRAMP_PROGS) {
-		err = -E2BIG;
-		goto out;
+		return bpf_arch_text_poke(tr->func.addr, BPF_MOD_JUMP, NULL,
+					  prog->bpf_func);
 	}
-	if (!hlist_unhashed(&prog->aux->tramp_hlist)) {
+	if (cnt >= BPF_MAX_TRAMP_PROGS)
+		return -E2BIG;
+	if (!hlist_unhashed(&prog->aux->tramp_hlist))
 		/* prog already linked */
-		err = -EBUSY;
-		goto out;
-	}
+		return -EBUSY;
 	hlist_add_head(&prog->aux->tramp_hlist, &tr->progs_hlist[kind]);
 	tr->progs_cnt[kind]++;
 	err = bpf_trampoline_update(tr);
@@ -450,30 +442,45 @@ int bpf_trampoline_link_prog(struct bpf_prog *prog, struct bpf_trampoline *tr)
 		hlist_del_init(&prog->aux->tramp_hlist);
 		tr->progs_cnt[kind]--;
 	}
-out:
+	return err;
+}
+
+int bpf_trampoline_link_prog(struct bpf_prog *prog, struct bpf_trampoline *tr)
+{
+	int err;
+
+	mutex_lock(&tr->mutex);
+	err = __bpf_trampoline_link_prog(prog, tr);
 	mutex_unlock(&tr->mutex);
 	return err;
 }
 
-/* bpf_trampoline_unlink_prog() should never fail. */
-int bpf_trampoline_unlink_prog(struct bpf_prog *prog, struct bpf_trampoline *tr)
+static int __bpf_trampoline_unlink_prog(struct bpf_prog *prog,
+					struct bpf_trampoline *tr)
 {
 	enum bpf_tramp_prog_type kind;
 	int err;
 
 	kind = bpf_attach_type_to_tramp(prog);
-	mutex_lock(&tr->mutex);
 	if (kind == BPF_TRAMP_REPLACE) {
 		WARN_ON_ONCE(!tr->extension_prog);
 		err = bpf_arch_text_poke(tr->func.addr, BPF_MOD_JUMP,
 					 tr->extension_prog->bpf_func, NULL);
 		tr->extension_prog = NULL;
-		goto out;
+		return err;
 	}
 	hlist_del_init(&prog->aux->tramp_hlist);
 	tr->progs_cnt[kind]--;
-	err = bpf_trampoline_update(tr);
-out:
+	return bpf_trampoline_update(tr);
+}
+
+/* bpf_trampoline_unlink_prog() should never fail. */
+int bpf_trampoline_unlink_prog(struct bpf_prog *prog, struct bpf_trampoline *tr)
+{
+	int err;
+
+	mutex_lock(&tr->mutex);
+	err = __bpf_trampoline_unlink_prog(prog, tr);
 	mutex_unlock(&tr->mutex);
 	return err;
 }
-- 
2.35.1.1178.g4f1659d476-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next v3 2/7] bpf: per-cgroup lsm flavor
  2022-04-07 22:31 [PATCH bpf-next v3 0/7] bpf: cgroup_sock lsm flavor Stanislav Fomichev
  2022-04-07 22:31 ` [PATCH bpf-next v3 1/7] bpf: add bpf_func_t and trampoline helpers Stanislav Fomichev
@ 2022-04-07 22:31 ` Stanislav Fomichev
  2022-04-08 14:20   ` kernel test robot
                     ` (3 more replies)
  2022-04-07 22:31 ` [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program Stanislav Fomichev
                   ` (4 subsequent siblings)
  6 siblings, 4 replies; 33+ messages in thread
From: Stanislav Fomichev @ 2022-04-07 22:31 UTC (permalink / raw)
  To: netdev, bpf; +Cc: ast, daniel, andrii, Stanislav Fomichev

Allow attaching to lsm hooks in the cgroup context.

Attaching to per-cgroup LSM hooks works exactly like attaching
to other per-cgroup hooks. A new BPF_LSM_CGROUP attach type is added
to trigger the new mode; the actual lsm hook we attach to is
signaled via the existing attach_btf_id.

For the hooks that have 'struct socket' as their first argument,
we use the cgroup associated with that socket. For the rest,
we use the cgroup of 'current' (this is all on the default hierarchy == v2 only).
Note that for the hooks that work on 'struct sock' we still
take the cgroup from 'current' because, most of the time,
the 'sock' argument is not properly initialized yet.

Behind the scenes, we allocate a shim program that is attached
to the trampoline and runs the cgroup's effective BPF program array.
This shim has some rudimentary ref counting and can be shared
between several programs attaching to the same per-cgroup lsm hook.

Note that this patch bloats the cgroup size because we add 211
cgroup_bpf_attach_type(s) for simplicity's sake. This will be
addressed in the next patch.
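
Attaching such a program from userspace then looks like any other per-cgroup
attachment; a rough sketch (mirroring the selftest later in the series, with
hypothetical fd variable names) is:

  /* prog_fd: a loaded BPF_PROG_TYPE_LSM program with
   * expected_attach_type == BPF_LSM_CGROUP and attach_btf_id pointing
   * at the desired lsm hook; cgroup_fd: the target cgroup.
   */
  err = bpf_prog_attach(prog_fd, cgroup_fd, BPF_LSM_CGROUP, 0);

  /* or, keeping a link around: */
  link_fd = bpf_link_create(prog_fd, cgroup_fd, BPF_LSM_CGROUP, NULL);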

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf-cgroup-defs.h |   6 ++
 include/linux/bpf.h             |  13 +++
 include/linux/bpf_lsm.h         |  14 ++++
 include/uapi/linux/bpf.h        |   1 +
 kernel/bpf/bpf_lsm.c            |  92 ++++++++++++++++++++
 kernel/bpf/btf.c                |  11 +++
 kernel/bpf/cgroup.c             | 116 ++++++++++++++++++++++---
 kernel/bpf/syscall.c            |  10 +++
 kernel/bpf/trampoline.c         | 144 ++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c           |   1 +
 tools/include/uapi/linux/bpf.h  |   1 +
 11 files changed, 397 insertions(+), 12 deletions(-)

diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
index 695d1224a71b..6c661b4df9fa 100644
--- a/include/linux/bpf-cgroup-defs.h
+++ b/include/linux/bpf-cgroup-defs.h
@@ -10,6 +10,8 @@
 
 struct bpf_prog_array;
 
+#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
+
 enum cgroup_bpf_attach_type {
 	CGROUP_BPF_ATTACH_TYPE_INVALID = -1,
 	CGROUP_INET_INGRESS = 0,
@@ -35,6 +37,10 @@ enum cgroup_bpf_attach_type {
 	CGROUP_INET4_GETSOCKNAME,
 	CGROUP_INET6_GETSOCKNAME,
 	CGROUP_INET_SOCK_RELEASE,
+#ifdef CONFIG_BPF_LSM
+	CGROUP_LSM_START,
+	CGROUP_LSM_END = CGROUP_LSM_START + CGROUP_LSM_NUM - 1,
+#endif
 	MAX_CGROUP_BPF_ATTACH_TYPE
 };
 
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 487aba40ce52..17bbe2f7b2be 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -807,6 +807,9 @@ static __always_inline __nocfi unsigned int bpf_dispatcher_nop_func(
 #ifdef CONFIG_BPF_JIT
 int bpf_trampoline_link_prog(struct bpf_prog *prog, struct bpf_trampoline *tr);
 int bpf_trampoline_unlink_prog(struct bpf_prog *prog, struct bpf_trampoline *tr);
+int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
+				    struct bpf_attach_target_info *tgt_info);
+void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog);
 struct bpf_trampoline *bpf_trampoline_get(u64 key,
 					  struct bpf_attach_target_info *tgt_info);
 void bpf_trampoline_put(struct bpf_trampoline *tr);
@@ -865,6 +868,14 @@ static inline int bpf_trampoline_unlink_prog(struct bpf_prog *prog,
 {
 	return -ENOTSUPP;
 }
+static inline int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
+						  struct bpf_attach_target_info *tgt_info)
+{
+	return -EOPNOTSUPP;
+}
+static inline void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog)
+{
+}
 static inline struct bpf_trampoline *bpf_trampoline_get(u64 key,
 							struct bpf_attach_target_info *tgt_info)
 {
@@ -980,6 +991,7 @@ struct bpf_prog_aux {
 	u64 load_time; /* ns since boottime */
 	u32 verified_insns;
 	struct bpf_map *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
+	int cgroup_atype; /* enum cgroup_bpf_attach_type */
 	char name[BPF_OBJ_NAME_LEN];
 #ifdef CONFIG_SECURITY
 	void *security;
@@ -2383,6 +2395,7 @@ void *bpf_arch_text_copy(void *dst, void *src, size_t len);
 
 struct btf_id_set;
 bool btf_id_set_contains(const struct btf_id_set *set, u32 id);
+int btf_id_set_index(const struct btf_id_set *set, u32 id);
 
 #define MAX_BPRINTF_VARARGS		12
 
diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
index 479c101546ad..7f0e59f5f9be 100644
--- a/include/linux/bpf_lsm.h
+++ b/include/linux/bpf_lsm.h
@@ -42,6 +42,9 @@ extern const struct bpf_func_proto bpf_inode_storage_get_proto;
 extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
 void bpf_inode_storage_free(struct inode *inode);
 
+int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func);
+int bpf_lsm_hook_idx(u32 btf_id);
+
 #else /* !CONFIG_BPF_LSM */
 
 static inline bool bpf_lsm_is_sleepable_hook(u32 btf_id)
@@ -65,6 +68,17 @@ static inline void bpf_inode_storage_free(struct inode *inode)
 {
 }
 
+static inline int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
+					   bpf_func_t *bpf_func)
+{
+	return -ENOENT;
+}
+
+static inline int bpf_lsm_hook_idx(u32 btf_id)
+{
+	return -EINVAL;
+}
+
 #endif /* CONFIG_BPF_LSM */
 
 #endif /* _LINUX_BPF_LSM_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d14b10b85e51..bbe48a2dd852 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -998,6 +998,7 @@ enum bpf_attach_type {
 	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
 	BPF_PERF_EVENT,
 	BPF_TRACE_KPROBE_MULTI,
+	BPF_LSM_CGROUP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 064eccba641d..eca258ba71d8 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -35,6 +35,98 @@ BTF_SET_START(bpf_lsm_hooks)
 #undef LSM_HOOK
 BTF_SET_END(bpf_lsm_hooks)
 
+static unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
+						const struct bpf_insn *insn)
+{
+	const struct bpf_prog *prog;
+	struct socket *sock;
+	struct cgroup *cgrp;
+	struct sock *sk;
+	int ret = 0;
+	u64 *regs;
+
+	regs = (u64 *)ctx;
+	sock = (void *)(unsigned long)regs[BPF_REG_0];
+	/*prog = container_of(insn, struct bpf_prog, insnsi);*/
+	prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
+
+	if (unlikely(!sock))
+		return 0;
+
+	sk = sock->sk;
+	if (unlikely(!sk))
+		return 0;
+
+	cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
+	if (likely(cgrp))
+		ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
+					    ctx, bpf_prog_run, 0);
+	return ret;
+}
+
+static unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
+						 const struct bpf_insn *insn)
+{
+	const struct bpf_prog *prog;
+	struct cgroup *cgrp;
+	int ret = 0;
+
+	if (unlikely(!current))
+		return 0;
+
+	/*prog = container_of(insn, struct bpf_prog, insnsi);*/
+	prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
+
+	rcu_read_lock();
+	cgrp = task_dfl_cgroup(current);
+	if (likely(cgrp))
+		ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
+					    ctx, bpf_prog_run, 0);
+	rcu_read_unlock();
+	return ret;
+}
+
+int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
+			     bpf_func_t *bpf_func)
+{
+	const struct btf_type *first_arg_type;
+	const struct btf_type *sock_type;
+	const struct btf *btf_vmlinux;
+	const struct btf_param *args;
+	s32 type_id;
+
+	if (!prog->aux->attach_func_proto ||
+	    !btf_type_is_func_proto(prog->aux->attach_func_proto))
+		return -EINVAL;
+
+	if (btf_type_vlen(prog->aux->attach_func_proto) < 1)
+		return -EINVAL;
+
+	args = (const struct btf_param *)(prog->aux->attach_func_proto + 1);
+
+	btf_vmlinux = bpf_get_btf_vmlinux();
+	if (!btf_vmlinux)
+		return -EINVAL;
+
+	type_id = btf_find_by_name_kind(btf_vmlinux, "socket", BTF_KIND_STRUCT);
+	if (type_id < 0)
+		return -EINVAL;
+	sock_type = btf_type_by_id(btf_vmlinux, type_id);
+
+	first_arg_type = btf_type_resolve_ptr(btf_vmlinux, args[0].type, NULL);
+	if (first_arg_type == sock_type)
+		*bpf_func = __cgroup_bpf_run_lsm_socket;
+	else
+		*bpf_func = __cgroup_bpf_run_lsm_current;
+
+	return 0;
+}
+
+int bpf_lsm_hook_idx(u32 btf_id)
+{
+	return btf_id_set_index(&bpf_lsm_hooks, btf_id);
+}
+
 int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
 			const struct bpf_prog *prog)
 {
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 0918a39279f6..4199de31f49c 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -4971,6 +4971,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 
 	if (arg == nr_args) {
 		switch (prog->expected_attach_type) {
+		case BPF_LSM_CGROUP:
 		case BPF_LSM_MAC:
 		case BPF_TRACE_FEXIT:
 			/* When LSM programs are attached to void LSM hooks
@@ -6396,6 +6397,16 @@ static int btf_id_cmp_func(const void *a, const void *b)
 	return *pa - *pb;
 }
 
+int btf_id_set_index(const struct btf_id_set *set, u32 id)
+{
+	const u32 *p;
+
+	p = bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func);
+	if (!p)
+		return -1;
+	return p - set->ids;
+}
+
 bool btf_id_set_contains(const struct btf_id_set *set, u32 id)
 {
 	return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL;
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 128028efda64..8c77703954f7 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -14,6 +14,9 @@
 #include <linux/string.h>
 #include <linux/bpf.h>
 #include <linux/bpf-cgroup.h>
+#include <linux/btf_ids.h>
+#include <linux/bpf_lsm.h>
+#include <linux/bpf_verifier.h>
 #include <net/sock.h>
 #include <net/bpf_sk_storage.h>
 
@@ -22,6 +25,18 @@
 DEFINE_STATIC_KEY_ARRAY_FALSE(cgroup_bpf_enabled_key, MAX_CGROUP_BPF_ATTACH_TYPE);
 EXPORT_SYMBOL(cgroup_bpf_enabled_key);
 
+#ifdef CONFIG_BPF_LSM
+static enum cgroup_bpf_attach_type bpf_lsm_attach_type_get(u32 attach_btf_id)
+{
+	return CGROUP_LSM_START + bpf_lsm_hook_idx(attach_btf_id);
+}
+#else
+static enum cgroup_bpf_attach_type bpf_lsm_attach_type_get(u32 attach_btf_id)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
 void cgroup_bpf_offline(struct cgroup *cgrp)
 {
 	cgroup_get(cgrp);
@@ -89,6 +104,14 @@ static void bpf_cgroup_storages_link(struct bpf_cgroup_storage *storages[],
 		bpf_cgroup_storage_link(storages[stype], cgrp, attach_type);
 }
 
+static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[])
+{
+	enum bpf_cgroup_storage_type stype;
+
+	for_each_cgroup_storage_type(stype)
+		bpf_cgroup_storage_unlink(storages[stype]);
+}
+
 /* Called when bpf_cgroup_link is auto-detached from dying cgroup.
  * It drops cgroup and bpf_prog refcounts, and marks bpf_link as defunct. It
  * doesn't free link memory, which will eventually be done by bpf_link's
@@ -100,6 +123,15 @@ static void bpf_cgroup_link_auto_detach(struct bpf_cgroup_link *link)
 	link->cgroup = NULL;
 }
 
+static void bpf_cgroup_lsm_shim_release(struct bpf_prog *prog,
+					enum cgroup_bpf_attach_type atype)
+{
+	if (!prog || atype != prog->aux->cgroup_atype)
+		return;
+
+	bpf_trampoline_unlink_cgroup_shim(prog);
+}
+
 /**
  * cgroup_bpf_release() - put references of all bpf programs and
  *                        release all cgroup bpf data
@@ -123,10 +155,16 @@ static void cgroup_bpf_release(struct work_struct *work)
 
 		list_for_each_entry_safe(pl, pltmp, progs, node) {
 			list_del(&pl->node);
-			if (pl->prog)
+			if (pl->prog) {
+				bpf_cgroup_lsm_shim_release(pl->prog,
+							    atype);
 				bpf_prog_put(pl->prog);
-			if (pl->link)
+			}
+			if (pl->link) {
+				bpf_cgroup_lsm_shim_release(pl->link->link.prog,
+							    atype);
 				bpf_cgroup_link_auto_detach(pl->link);
+			}
 			kfree(pl);
 			static_branch_dec(&cgroup_bpf_enabled_key[atype]);
 		}
@@ -439,6 +477,7 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
 	struct bpf_prog *old_prog = NULL;
 	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
 	struct bpf_cgroup_storage *new_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
+	struct bpf_attach_target_info tgt_info = {};
 	enum cgroup_bpf_attach_type atype;
 	struct bpf_prog_list *pl;
 	struct list_head *progs;
@@ -455,9 +494,31 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
 		/* replace_prog implies BPF_F_REPLACE, and vice versa */
 		return -EINVAL;
 
-	atype = to_cgroup_bpf_attach_type(type);
-	if (atype < 0)
-		return -EINVAL;
+	if (type == BPF_LSM_CGROUP) {
+		struct bpf_prog *p = prog ? : link->link.prog;
+
+		if (replace_prog) {
+			/* Reusing shim from the original program.
+			 */
+			atype = replace_prog->aux->cgroup_atype;
+		} else {
+			err = bpf_check_attach_target(NULL, p, NULL,
+						      p->aux->attach_btf_id,
+						      &tgt_info);
+			if (err)
+				return -EINVAL;
+
+			atype = bpf_lsm_attach_type_get(p->aux->attach_btf_id);
+			if (atype < 0)
+				return atype;
+		}
+
+		p->aux->cgroup_atype = atype;
+	} else {
+		atype = to_cgroup_bpf_attach_type(type);
+		if (atype < 0)
+			return -EINVAL;
+	}
 
 	progs = &cgrp->bpf.progs[atype];
 
@@ -503,13 +564,27 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
 	if (err)
 		goto cleanup;
 
+	bpf_cgroup_storages_link(new_storage, cgrp, type);
+
+	if (type == BPF_LSM_CGROUP && !old_prog) {
+		struct bpf_prog *p = prog ? : link->link.prog;
+		int err;
+
+		err = bpf_trampoline_link_cgroup_shim(p, &tgt_info);
+		if (err)
+			goto cleanup_trampoline;
+	}
+
 	if (old_prog)
 		bpf_prog_put(old_prog);
 	else
 		static_branch_inc(&cgroup_bpf_enabled_key[atype]);
-	bpf_cgroup_storages_link(new_storage, cgrp, type);
+
 	return 0;
 
+cleanup_trampoline:
+	bpf_cgroup_storages_unlink(new_storage);
+
 cleanup:
 	if (old_prog) {
 		pl->prog = old_prog;
@@ -601,9 +676,13 @@ static int __cgroup_bpf_replace(struct cgroup *cgrp,
 	struct list_head *progs;
 	bool found = false;
 
-	atype = to_cgroup_bpf_attach_type(link->type);
-	if (atype < 0)
-		return -EINVAL;
+	if (link->type == BPF_LSM_CGROUP) {
+		atype = link->link.prog->aux->cgroup_atype;
+	} else {
+		atype = to_cgroup_bpf_attach_type(link->type);
+		if (atype < 0)
+			return -EINVAL;
+	}
 
 	progs = &cgrp->bpf.progs[atype];
 
@@ -619,6 +698,9 @@ static int __cgroup_bpf_replace(struct cgroup *cgrp,
 	if (!found)
 		return -ENOENT;
 
+	if (link->type == BPF_LSM_CGROUP)
+		new_prog->aux->cgroup_atype = atype;
+
 	old_prog = xchg(&link->link.prog, new_prog);
 	replace_effective_prog(cgrp, atype, link);
 	bpf_prog_put(old_prog);
@@ -702,9 +784,15 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 	u32 flags;
 	int err;
 
-	atype = to_cgroup_bpf_attach_type(type);
-	if (atype < 0)
-		return -EINVAL;
+	if (type == BPF_LSM_CGROUP) {
+		struct bpf_prog *p = prog ? : link->link.prog;
+
+		atype = p->aux->cgroup_atype;
+	} else {
+		atype = to_cgroup_bpf_attach_type(type);
+		if (atype < 0)
+			return -EINVAL;
+	}
 
 	progs = &cgrp->bpf.progs[atype];
 	flags = cgrp->bpf.flags[atype];
@@ -726,6 +814,10 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 	if (err)
 		goto cleanup;
 
+	if (type == BPF_LSM_CGROUP)
+		bpf_cgroup_lsm_shim_release(prog ? : link->link.prog,
+					    atype);
+
 	/* now can actually delete it from this cgroup list */
 	list_del(&pl->node);
 	kfree(pl);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index cdaa1152436a..351166cea25c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3139,6 +3139,11 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
 		return prog->enforce_expected_attach_type &&
 			prog->expected_attach_type != attach_type ?
 			-EINVAL : 0;
+	case BPF_PROG_TYPE_LSM:
+		if (prog->expected_attach_type != BPF_LSM_CGROUP)
+			return -EINVAL;
+		return 0;
+
 	default:
 		return 0;
 	}
@@ -3194,6 +3199,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
 		return BPF_PROG_TYPE_SK_LOOKUP;
 	case BPF_XDP:
 		return BPF_PROG_TYPE_XDP;
+	case BPF_LSM_CGROUP:
+		return BPF_PROG_TYPE_LSM;
 	default:
 		return BPF_PROG_TYPE_UNSPEC;
 	}
@@ -3247,6 +3254,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
 	case BPF_PROG_TYPE_CGROUP_SYSCTL:
 	case BPF_PROG_TYPE_SOCK_OPS:
+	case BPF_PROG_TYPE_LSM:
 		ret = cgroup_bpf_prog_attach(attr, ptype, prog);
 		break;
 	default:
@@ -3284,6 +3292,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
 	case BPF_PROG_TYPE_CGROUP_SYSCTL:
 	case BPF_PROG_TYPE_SOCK_OPS:
+	case BPF_PROG_TYPE_LSM:
 		return cgroup_bpf_prog_detach(attr, ptype);
 	default:
 		return -EINVAL;
@@ -4317,6 +4326,7 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
 	case BPF_PROG_TYPE_CGROUP_DEVICE:
 	case BPF_PROG_TYPE_CGROUP_SYSCTL:
 	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
+	case BPF_PROG_TYPE_LSM:
 		ret = cgroup_bpf_link_attach(attr, prog);
 		break;
 	case BPF_PROG_TYPE_TRACING:
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 0c4fd194e801..fca1dea786c7 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -11,6 +11,8 @@
 #include <linux/rcupdate_wait.h>
 #include <linux/module.h>
 #include <linux/static_call.h>
+#include <linux/bpf_verifier.h>
+#include <linux/bpf_lsm.h>
 
 /* dummy _ops. The verifier will operate on target program's ops. */
 const struct bpf_verifier_ops bpf_extension_verifier_ops = {
@@ -394,6 +396,7 @@ static enum bpf_tramp_prog_type bpf_attach_type_to_tramp(struct bpf_prog *prog)
 		return BPF_TRAMP_MODIFY_RETURN;
 	case BPF_TRACE_FEXIT:
 		return BPF_TRAMP_FEXIT;
+	case BPF_LSM_CGROUP:
 	case BPF_LSM_MAC:
 		if (!prog->aux->attach_func_proto->type)
 			/* The function returns void, we cannot modify its
@@ -485,6 +488,147 @@ int bpf_trampoline_unlink_prog(struct bpf_prog *prog, struct bpf_trampoline *tr)
 	return err;
 }
 
+static struct bpf_prog *cgroup_shim_alloc(const struct bpf_prog *prog,
+					  bpf_func_t bpf_func)
+{
+	struct bpf_prog *p;
+
+	p = bpf_prog_alloc(1, 0);
+	if (!p)
+		return NULL;
+
+	p->jited = false;
+	p->bpf_func = bpf_func;
+
+	p->aux->cgroup_atype = prog->aux->cgroup_atype;
+	p->aux->attach_func_proto = prog->aux->attach_func_proto;
+	p->aux->attach_btf_id = prog->aux->attach_btf_id;
+	p->aux->attach_btf = prog->aux->attach_btf;
+	btf_get(p->aux->attach_btf);
+	p->type = BPF_PROG_TYPE_LSM;
+	p->expected_attach_type = BPF_LSM_MAC;
+	bpf_prog_inc(p);
+
+	return p;
+}
+
+static struct bpf_prog *cgroup_shim_find(struct bpf_trampoline *tr,
+					 bpf_func_t bpf_func)
+{
+	const struct bpf_prog_aux *aux;
+	int kind;
+
+	for (kind = 0; kind < BPF_TRAMP_MAX; kind++) {
+		hlist_for_each_entry(aux, &tr->progs_hlist[kind], tramp_hlist) {
+			struct bpf_prog *p = aux->prog;
+
+			if (!p->jited && p->bpf_func == bpf_func)
+				return p;
+		}
+	}
+
+	return NULL;
+}
+
+int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
+				    struct bpf_attach_target_info *tgt_info)
+{
+	struct bpf_prog *shim_prog = NULL;
+	struct bpf_trampoline *tr;
+	bpf_func_t bpf_func;
+	u64 key;
+	int err;
+
+	key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf,
+					 prog->aux->attach_btf_id);
+
+	err = bpf_lsm_find_cgroup_shim(prog, &bpf_func);
+	if (err)
+		return err;
+
+	tr = bpf_trampoline_get(key, tgt_info);
+	if (!tr)
+		return  -ENOMEM;
+
+	mutex_lock(&tr->mutex);
+
+	shim_prog = cgroup_shim_find(tr, bpf_func);
+	if (shim_prog) {
+		/* Reusing existing shim attached by the other program.
+		 */
+		bpf_prog_inc(shim_prog);
+		mutex_unlock(&tr->mutex);
+		return 0;
+	}
+
+	/* Allocate and install new shim.
+	 */
+
+	shim_prog = cgroup_shim_alloc(prog, bpf_func);
+	if (!shim_prog) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = __bpf_trampoline_link_prog(shim_prog, tr);
+	if (err)
+		goto out;
+
+	mutex_unlock(&tr->mutex);
+
+	return 0;
+out:
+	if (shim_prog)
+		bpf_prog_put(shim_prog);
+
+	mutex_unlock(&tr->mutex);
+	return err;
+}
+
+void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog)
+{
+	struct bpf_prog *shim_prog;
+	struct bpf_trampoline *tr;
+	bpf_func_t bpf_func;
+	u64 key;
+	int err;
+
+	key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf,
+					 prog->aux->attach_btf_id);
+
+	err = bpf_lsm_find_cgroup_shim(prog, &bpf_func);
+	if (err)
+		return;
+
+	tr = bpf_trampoline_lookup(key);
+	if (!tr)
+		return;
+
+	mutex_lock(&tr->mutex);
+
+	shim_prog = cgroup_shim_find(tr, bpf_func);
+	if (shim_prog) {
+		/* We use shim_prog refcnt for tracking whether to
+		 * remove the shim program from the trampoline.
+		 * Trampoline's mutex is held while refcnt is
+		 * added/subtracted so we don't need to care about
+		 * potential races.
+		 */
+
+		if (atomic64_read(&shim_prog->aux->refcnt) == 1)
+			WARN_ON_ONCE(__bpf_trampoline_unlink_prog(shim_prog, tr));
+
+		bpf_prog_put(shim_prog);
+	}
+
+	mutex_unlock(&tr->mutex);
+
+	bpf_trampoline_put(tr); /* bpf_trampoline_lookup */
+
+	if (shim_prog)
+		bpf_trampoline_put(tr);
+}
+
 struct bpf_trampoline *bpf_trampoline_get(u64 key,
 					  struct bpf_attach_target_info *tgt_info)
 {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9c1a02b82ecd..cc84954846d7 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -14197,6 +14197,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 		fallthrough;
 	case BPF_MODIFY_RETURN:
 	case BPF_LSM_MAC:
+	case BPF_LSM_CGROUP:
 	case BPF_TRACE_FENTRY:
 	case BPF_TRACE_FEXIT:
 		if (!btf_type_is_func(t)) {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d14b10b85e51..bbe48a2dd852 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -998,6 +998,7 @@ enum bpf_attach_type {
 	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
 	BPF_PERF_EVENT,
 	BPF_TRACE_KPROBE_MULTI,
+	BPF_LSM_CGROUP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
-- 
2.35.1.1178.g4f1659d476-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program
  2022-04-07 22:31 [PATCH bpf-next v3 0/7] bpf: cgroup_sock lsm flavor Stanislav Fomichev
  2022-04-07 22:31 ` [PATCH bpf-next v3 1/7] bpf: add bpf_func_t and trampoline helpers Stanislav Fomichev
  2022-04-07 22:31 ` [PATCH bpf-next v3 2/7] bpf: per-cgroup lsm flavor Stanislav Fomichev
@ 2022-04-07 22:31 ` Stanislav Fomichev
  2022-04-08 22:56   ` Martin KaFai Lau
  2022-04-07 22:31 ` [PATCH bpf-next v3 4/7] bpf: allow writing to a subset of sock fields from lsm progtype Stanislav Fomichev
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Stanislav Fomichev @ 2022-04-07 22:31 UTC (permalink / raw)
  To: netdev, bpf; +Cc: ast, daniel, andrii, Stanislav Fomichev

The previous patch adds a 1:1 mapping between all 211 LSM hooks
and the bpf_cgroup program array. Instead of reserving a slot per
possible hook, reserve 10 slots per cgroup for lsm programs.
Those slots are dynamically allocated on demand and reclaimed;
attaching to more than 10 distinct hooks at the same time fails
with -E2BIG. This still adds some bloat to the cgroup and brings
us back to roughly pre-cgroup_bpf_attach_type times.

It should be possible to eventually extend this idea to all hooks
and shrink the overall effective programs array if the memory
consumption turns out to be unacceptable.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf-cgroup-defs.h |  4 +-
 include/linux/bpf_lsm.h         |  6 ---
 kernel/bpf/bpf_lsm.c            |  9 ++--
 kernel/bpf/cgroup.c             | 96 ++++++++++++++++++++++++++++-----
 4 files changed, 90 insertions(+), 25 deletions(-)

diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
index 6c661b4df9fa..d42516e86b3a 100644
--- a/include/linux/bpf-cgroup-defs.h
+++ b/include/linux/bpf-cgroup-defs.h
@@ -10,7 +10,9 @@
 
 struct bpf_prog_array;
 
-#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
+/* Maximum number of concurrently attachable per-cgroup LSM hooks.
+ */
+#define CGROUP_LSM_NUM 10
 
 enum cgroup_bpf_attach_type {
 	CGROUP_BPF_ATTACH_TYPE_INVALID = -1,
diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
index 7f0e59f5f9be..613de44aa429 100644
--- a/include/linux/bpf_lsm.h
+++ b/include/linux/bpf_lsm.h
@@ -43,7 +43,6 @@ extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
 void bpf_inode_storage_free(struct inode *inode);
 
 int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func);
-int bpf_lsm_hook_idx(u32 btf_id);
 
 #else /* !CONFIG_BPF_LSM */
 
@@ -74,11 +73,6 @@ static inline int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
 	return -ENOENT;
 }
 
-static inline int bpf_lsm_hook_idx(u32 btf_id)
-{
-	return -EINVAL;
-}
-
 #endif /* CONFIG_BPF_LSM */
 
 #endif /* _LINUX_BPF_LSM_H */
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index eca258ba71d8..8b948ec9ab73 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -57,10 +57,12 @@ static unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
 	if (unlikely(!sk))
 		return 0;
 
+	rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
 	cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
 	if (likely(cgrp))
 		ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
 					    ctx, bpf_prog_run, 0);
+	rcu_read_unlock();
 	return ret;
 }
 
@@ -77,7 +79,7 @@ static unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
 	/*prog = container_of(insn, struct bpf_prog, insnsi);*/
 	prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
 
-	rcu_read_lock();
+	rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
 	cgrp = task_dfl_cgroup(current);
 	if (likely(cgrp))
 		ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
@@ -122,11 +124,6 @@ int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
 	return 0;
 }
 
-int bpf_lsm_hook_idx(u32 btf_id)
-{
-	return btf_id_set_index(&bpf_lsm_hooks, btf_id);
-}
-
 int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
 			const struct bpf_prog *prog)
 {
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 8c77703954f7..fca95e021e7e 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -26,15 +26,68 @@ DEFINE_STATIC_KEY_ARRAY_FALSE(cgroup_bpf_enabled_key, MAX_CGROUP_BPF_ATTACH_TYPE
 EXPORT_SYMBOL(cgroup_bpf_enabled_key);
 
 #ifdef CONFIG_BPF_LSM
+/* Readers are protected by rcu+synchronize_rcu.
+ * Writers are protected by cgroup_mutex.
+ */
+int cgroup_lsm_atype_usecnt[CGROUP_LSM_NUM];
+u32 cgroup_lsm_atype_btf_id[CGROUP_LSM_NUM];
+
 static enum cgroup_bpf_attach_type bpf_lsm_attach_type_get(u32 attach_btf_id)
 {
-	return CGROUP_LSM_START + bpf_lsm_hook_idx(attach_btf_id);
+	int i;
+
+	WARN_ON_ONCE(!mutex_is_locked(&cgroup_mutex));
+
+	for (i = 0; i < ARRAY_SIZE(cgroup_lsm_atype_btf_id); i++) {
+		if (cgroup_lsm_atype_btf_id[i] != attach_btf_id)
+			continue;
+
+		cgroup_lsm_atype_usecnt[i]++;
+		return CGROUP_LSM_START + i;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(cgroup_lsm_atype_usecnt); i++) {
+		if (cgroup_lsm_atype_usecnt[i] != 0)
+			continue;
+
+		cgroup_lsm_atype_btf_id[i] = attach_btf_id;
+		cgroup_lsm_atype_usecnt[i] = 1;
+		return CGROUP_LSM_START + i;
+	}
+
+	return -E2BIG;
+}
+
+static void bpf_lsm_attach_type_put(u32 attach_btf_id)
+{
+	int i;
+
+	WARN_ON_ONCE(!mutex_is_locked(&cgroup_mutex));
+
+	for (i = 0; i < ARRAY_SIZE(cgroup_lsm_atype_btf_id); i++) {
+		if (cgroup_lsm_atype_btf_id[i] != attach_btf_id)
+			continue;
+
+		if (--cgroup_lsm_atype_usecnt[i] <= 0) {
+			/* Wait for any existing users to finish.
+			 */
+			synchronize_rcu();
+			WARN_ON_ONCE(cgroup_lsm_atype_usecnt[i] < 0);
+		}
+		return;
+	}
+
+	WARN_ON_ONCE(1);
 }
 #else
 static enum cgroup_bpf_attach_type bpf_lsm_attach_type_get(u32 attach_btf_id)
 {
 	return -EOPNOTSUPP;
 }
+
+static void bpf_lsm_attach_type_put(u32 attach_btf_id)
+{
+}
 #endif
 
 void cgroup_bpf_offline(struct cgroup *cgrp)
@@ -130,6 +183,7 @@ static void bpf_cgroup_lsm_shim_release(struct bpf_prog *prog,
 		return;
 
 	bpf_trampoline_unlink_cgroup_shim(prog);
+	bpf_lsm_attach_type_put(prog->aux->attach_btf_id);
 }
 
 /**
@@ -522,27 +576,37 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
 
 	progs = &cgrp->bpf.progs[atype];
 
-	if (!hierarchy_allows_attach(cgrp, atype))
-		return -EPERM;
+	if (!hierarchy_allows_attach(cgrp, atype)) {
+		err = -EPERM;
+		goto cleanup_attach_type;
+	}
 
-	if (!list_empty(progs) && cgrp->bpf.flags[atype] != saved_flags)
+	if (!list_empty(progs) && cgrp->bpf.flags[atype] != saved_flags) {
 		/* Disallow attaching non-overridable on top
 		 * of existing overridable in this cgroup.
 		 * Disallow attaching multi-prog if overridable or none
 		 */
-		return -EPERM;
+		err = -EPERM;
+		goto cleanup_attach_type;
+	}
 
-	if (prog_list_length(progs) >= BPF_CGROUP_MAX_PROGS)
-		return -E2BIG;
+	if (prog_list_length(progs) >= BPF_CGROUP_MAX_PROGS) {
+		err = -E2BIG;
+		goto cleanup_attach_type;
+	}
 
 	pl = find_attach_entry(progs, prog, link, replace_prog,
 			       flags & BPF_F_ALLOW_MULTI);
-	if (IS_ERR(pl))
-		return PTR_ERR(pl);
+	if (IS_ERR(pl)) {
+		err = PTR_ERR(pl);
+		goto cleanup_attach_type;
+	}
 
 	if (bpf_cgroup_storages_alloc(storage, new_storage, type,
-				      prog ? : link->link.prog, cgrp))
-		return -ENOMEM;
+				      prog ? : link->link.prog, cgrp)) {
+		err = -ENOMEM;
+		goto cleanup_attach_type;
+	}
 
 	if (pl) {
 		old_prog = pl->prog;
@@ -550,7 +614,8 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
 		pl = kmalloc(sizeof(*pl), GFP_KERNEL);
 		if (!pl) {
 			bpf_cgroup_storages_free(new_storage);
-			return -ENOMEM;
+			err = -ENOMEM;
+			goto cleanup_attach_type;
 		}
 		list_add_tail(&pl->node, progs);
 	}
@@ -595,6 +660,13 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
 		list_del(&pl->node);
 		kfree(pl);
 	}
+
+cleanup_attach_type:
+	if (type == BPF_LSM_CGROUP) {
+		struct bpf_prog *p = prog ? : link->link.prog;
+
+		bpf_lsm_attach_type_put(p->aux->attach_btf_id);
+	}
 	return err;
 }
 
-- 
2.35.1.1178.g4f1659d476-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next v3 4/7] bpf: allow writing to a subset of sock fields from lsm progtype
  2022-04-07 22:31 [PATCH bpf-next v3 0/7] bpf: cgroup_sock lsm flavor Stanislav Fomichev
                   ` (2 preceding siblings ...)
  2022-04-07 22:31 ` [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program Stanislav Fomichev
@ 2022-04-07 22:31 ` Stanislav Fomichev
  2022-04-07 22:31 ` [PATCH bpf-next v3 5/7] libbpf: add lsm_cgroup_sock type Stanislav Fomichev
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Stanislav Fomichev @ 2022-04-07 22:31 UTC (permalink / raw)
  To: netdev, bpf; +Cc: ast, daniel, andrii, Stanislav Fomichev

For now, allow writing only to the obvious ones: sk_priority and sk_mark.
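
A rough sketch of what this enables from a BPF_LSM_CGROUP program (this is
essentially what the selftest later in the series does; the hook and the
value are illustrative, and the usual vmlinux.h/bpf_helpers includes are
omitted):

  SEC("lsm_cgroup/socket_post_create")
  int BPF_PROG(set_default_prio, struct socket *sock, int family, int type,
  	     int protocol, int kern)
  {
  	struct sock *sk = sock->sk;

  	if (!sk)
  		return 1;

  	sk->sk_priority = 123;	/* direct write through the BTF pointer */
  	return 1;
  }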

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 kernel/bpf/bpf_lsm.c  | 58 +++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c |  3 ++-
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 8b948ec9ab73..cc13da18d8b3 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -332,7 +332,65 @@ bool bpf_lsm_is_sleepable_hook(u32 btf_id)
 const struct bpf_prog_ops lsm_prog_ops = {
 };
 
+static int lsm_btf_struct_access(struct bpf_verifier_log *log,
+					const struct btf *btf,
+					const struct btf_type *t, int off,
+					int size, enum bpf_access_type atype,
+					u32 *next_btf_id,
+					enum bpf_type_flag *flag)
+{
+	const struct btf_type *sock_type;
+	struct btf *btf_vmlinux;
+	s32 type_id;
+	size_t end;
+
+	if (atype == BPF_READ)
+		return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
+					 flag);
+
+	btf_vmlinux = bpf_get_btf_vmlinux();
+	if (!btf_vmlinux) {
+		bpf_log(log, "no vmlinux btf\n");
+		return -EOPNOTSUPP;
+	}
+
+	type_id = btf_find_by_name_kind(btf_vmlinux, "sock", BTF_KIND_STRUCT);
+	if (type_id < 0) {
+		bpf_log(log, "'struct sock' not found in vmlinux btf\n");
+		return -EINVAL;
+	}
+
+	sock_type = btf_type_by_id(btf_vmlinux, type_id);
+
+	if (t != sock_type) {
+		bpf_log(log, "only 'struct sock' writes are supported\n");
+		return -EACCES;
+	}
+
+	switch (off) {
+	case bpf_ctx_range(struct sock, sk_priority):
+		end = offsetofend(struct sock, sk_priority);
+		break;
+	case bpf_ctx_range(struct sock, sk_mark):
+		end = offsetofend(struct sock, sk_mark);
+		break;
+	default:
+		bpf_log(log, "no write support to 'struct sock' at off %d\n", off);
+		return -EACCES;
+	}
+
+	if (off + size > end) {
+		bpf_log(log,
+			"write access at off %d with size %d beyond the member of 'struct sock' ended at %zu\n",
+			off, size, end);
+		return -EACCES;
+	}
+
+	return NOT_INIT;
+}
+
 const struct bpf_verifier_ops lsm_verifier_ops = {
 	.get_func_proto = bpf_lsm_func_proto,
 	.is_valid_access = btf_ctx_access,
+	.btf_struct_access = lsm_btf_struct_access,
 };
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index cc84954846d7..0d6d5be30a36 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12872,7 +12872,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 				insn->code = BPF_LDX | BPF_PROBE_MEM |
 					BPF_SIZE((insn)->code);
 				env->prog->aux->num_exentries++;
-			} else if (resolve_prog_type(env->prog) != BPF_PROG_TYPE_STRUCT_OPS) {
+			} else if (resolve_prog_type(env->prog) != BPF_PROG_TYPE_STRUCT_OPS &&
+				   resolve_prog_type(env->prog) != BPF_PROG_TYPE_LSM) {
 				verbose(env, "Writes through BTF pointers are not allowed\n");
 				return -EINVAL;
 			}
-- 
2.35.1.1178.g4f1659d476-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next v3 5/7] libbpf: add lsm_cgroup_sock type
  2022-04-07 22:31 [PATCH bpf-next v3 0/7] bpf: cgroup_sock lsm flavor Stanislav Fomichev
                   ` (3 preceding siblings ...)
  2022-04-07 22:31 ` [PATCH bpf-next v3 4/7] bpf: allow writing to a subset of sock fields from lsm progtype Stanislav Fomichev
@ 2022-04-07 22:31 ` Stanislav Fomichev
  2022-04-07 22:31 ` [PATCH bpf-next v3 6/7] selftests/bpf: lsm_cgroup functional test Stanislav Fomichev
  2022-04-07 22:31 ` [PATCH bpf-next v3 7/7] selftests/bpf: verify lsm_cgroup struct sock access Stanislav Fomichev
  6 siblings, 0 replies; 33+ messages in thread
From: Stanislav Fomichev @ 2022-04-07 22:31 UTC (permalink / raw)
  To: netdev, bpf; +Cc: ast, daniel, andrii, Stanislav Fomichev

Add the lsm_cgroup/ section prefix for BPF_LSM_CGROUP programs.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/bpf/bpftool/common.c | 1 +
 tools/lib/bpf/libbpf.c     | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 0c1e06cf50b9..2b3bf6fa413a 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -67,6 +67,7 @@ const char * const attach_type_name[__MAX_BPF_ATTACH_TYPE] = {
 	[BPF_TRACE_FEXIT]		= "fexit",
 	[BPF_MODIFY_RETURN]		= "mod_ret",
 	[BPF_LSM_MAC]			= "lsm_mac",
+	[BPF_LSM_CGROUP]		= "lsm_cgroup",
 	[BPF_SK_LOOKUP]			= "sk_lookup",
 	[BPF_TRACE_ITER]		= "trace_iter",
 	[BPF_XDP_DEVMAP]		= "xdp_devmap",
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 016ecdd1c3e1..789726df5fe8 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8691,6 +8691,7 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("freplace/",		EXT, 0, SEC_ATTACH_BTF, attach_trace),
 	SEC_DEF("lsm/",			LSM, BPF_LSM_MAC, SEC_ATTACH_BTF, attach_lsm),
 	SEC_DEF("lsm.s/",		LSM, BPF_LSM_MAC, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_lsm),
+	SEC_DEF("lsm_cgroup/",	LSM, BPF_LSM_CGROUP, SEC_ATTACH_BTF),
 	SEC_DEF("iter/",		TRACING, BPF_TRACE_ITER, SEC_ATTACH_BTF, attach_iter),
 	SEC_DEF("iter.s/",		TRACING, BPF_TRACE_ITER, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_iter),
 	SEC_DEF("syscall",		SYSCALL, 0, SEC_SLEEPABLE),
@@ -9112,6 +9113,7 @@ void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type,
 		*kind = BTF_KIND_TYPEDEF;
 		break;
 	case BPF_LSM_MAC:
+	case BPF_LSM_CGROUP:
 		*prefix = BTF_LSM_PREFIX;
 		*kind = BTF_KIND_FUNC;
 		break;
-- 
2.35.1.1178.g4f1659d476-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next v3 6/7] selftests/bpf: lsm_cgroup functional test
  2022-04-07 22:31 [PATCH bpf-next v3 0/7] bpf: cgroup_sock lsm flavor Stanislav Fomichev
                   ` (4 preceding siblings ...)
  2022-04-07 22:31 ` [PATCH bpf-next v3 5/7] libbpf: add lsm_cgroup_sock type Stanislav Fomichev
@ 2022-04-07 22:31 ` Stanislav Fomichev
  2022-04-07 22:31 ` [PATCH bpf-next v3 7/7] selftests/bpf: verify lsm_cgroup struct sock access Stanislav Fomichev
  6 siblings, 0 replies; 33+ messages in thread
From: Stanislav Fomichev @ 2022-04-07 22:31 UTC (permalink / raw)
  To: netdev, bpf; +Cc: ast, daniel, andrii, Stanislav Fomichev

Functional test that exercises the following:

1. apply default sk_priority policy
2. permit TX-only AF_PACKET socket
3. cgroup attach/detach/replace
4. reuse of the trampoline shim

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../selftests/bpf/prog_tests/lsm_cgroup.c     | 158 ++++++++++++++++++
 .../testing/selftests/bpf/progs/lsm_cgroup.c  |  94 +++++++++++
 2 files changed, 252 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/lsm_cgroup.c
 create mode 100644 tools/testing/selftests/bpf/progs/lsm_cgroup.c

diff --git a/tools/testing/selftests/bpf/prog_tests/lsm_cgroup.c b/tools/testing/selftests/bpf/prog_tests/lsm_cgroup.c
new file mode 100644
index 000000000000..e786b63d81e2
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/lsm_cgroup.c
@@ -0,0 +1,158 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <test_progs.h>
+
+#include "lsm_cgroup.skel.h"
+#include "cgroup_helpers.h"
+
+void test_lsm_cgroup(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_prog_attach_opts, attach_opts);
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
+	int cgroup_fd, cgroup_fd2, err, fd, prio;
+	struct lsm_cgroup *skel = NULL;
+	int post_create_prog_fd2 = -1;
+	int post_create_prog_fd = -1;
+	int bind_link_fd2 = -1;
+	int bind_prog_fd2 = -1;
+	int alloc_prog_fd = -1;
+	int bind_prog_fd = -1;
+	int bind_link_fd = -1;
+	socklen_t socklen;
+
+	cgroup_fd = test__join_cgroup("/sock_policy");
+	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
+		goto close_skel;
+
+	cgroup_fd2 = create_and_get_cgroup("/sock_policy2");
+	if (!ASSERT_GE(cgroup_fd2, 0, "create second cgroup"))
+		goto close_skel;
+
+	skel = lsm_cgroup__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_and_load"))
+		goto close_cgroup;
+
+	post_create_prog_fd = bpf_program__fd(skel->progs.socket_post_create);
+	post_create_prog_fd2 = bpf_program__fd(skel->progs.socket_post_create2);
+	bind_prog_fd = bpf_program__fd(skel->progs.socket_bind);
+	bind_prog_fd2 = bpf_program__fd(skel->progs.socket_bind2);
+	alloc_prog_fd = bpf_program__fd(skel->progs.socket_alloc);
+
+	err = bpf_prog_attach(alloc_prog_fd, cgroup_fd, BPF_LSM_CGROUP, 0);
+	if (!ASSERT_OK(err, "attach alloc_prog_fd"))
+		goto detach_cgroup;
+
+	/* Make sure replacing works.
+	 */
+
+	err = bpf_prog_attach(post_create_prog_fd, cgroup_fd,
+			      BPF_LSM_CGROUP, 0);
+	if (!ASSERT_OK(err, "attach post_create_prog_fd"))
+		goto close_cgroup;
+
+	attach_opts.replace_prog_fd = post_create_prog_fd;
+	err = bpf_prog_attach_opts(post_create_prog_fd2, cgroup_fd,
+				   BPF_LSM_CGROUP, &attach_opts);
+	if (!ASSERT_OK(err, "prog replace post_create_prog_fd"))
+		goto detach_cgroup;
+
+	/* Try the same attach/replace via link API.
+	 */
+
+	bind_link_fd = bpf_link_create(bind_prog_fd, cgroup_fd,
+				       BPF_LSM_CGROUP, NULL);
+	if (!ASSERT_GE(bind_link_fd, 0, "link create bind_prog_fd"))
+		goto detach_cgroup;
+
+	update_opts.old_prog_fd = bind_prog_fd;
+	update_opts.flags = BPF_F_REPLACE;
+
+	err = bpf_link_update(bind_link_fd, bind_prog_fd2, &update_opts);
+	if (!ASSERT_OK(err, "link update bind_prog_fd"))
+		goto detach_cgroup;
+
+	/* Attach another instance of bind program to another cgroup.
+	 * This should trigger the reuse of the trampoline shim (two
+	 * programs attaching to the same btf_id).
+	 */
+
+	bind_link_fd2 = bpf_link_create(bind_prog_fd2, cgroup_fd2,
+					BPF_LSM_CGROUP, NULL);
+	if (!ASSERT_GE(bind_link_fd2, 0, "link create bind_prog_fd2"))
+		goto detach_cgroup;
+
+	/* AF_UNIX is prohibited.
+	 */
+
+	fd = socket(AF_UNIX, SOCK_STREAM, 0);
+	ASSERT_LT(fd, 0, "socket(AF_UNIX)");
+
+	/* AF_INET6 gets default policy (sk_priority).
+	 */
+
+	fd = socket(AF_INET6, SOCK_STREAM, 0);
+	if (!ASSERT_GE(fd, 0, "socket(SOCK_STREAM)"))
+		goto detach_cgroup;
+
+	prio = 0;
+	socklen = sizeof(prio);
+	ASSERT_GE(getsockopt(fd, SOL_SOCKET, SO_PRIORITY, &prio, &socklen), 0,
+		  "getsockopt");
+	ASSERT_EQ(prio, 123, "sk_priority");
+
+	close(fd);
+
+	/* TX-only AF_PACKET is allowed.
+	 */
+
+	ASSERT_LT(socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)), 0,
+		  "socket(AF_PACKET, ..., ETH_P_ALL)");
+
+	fd = socket(AF_PACKET, SOCK_RAW, 0);
+	ASSERT_GE(fd, 0, "socket(AF_PACKET, ..., 0)");
+
+	/* TX-only AF_PACKET can not be rebound.
+	 */
+
+	struct sockaddr_ll sa = {
+		.sll_family = AF_PACKET,
+		.sll_protocol = htons(ETH_P_ALL),
+	};
+	ASSERT_LT(bind(fd, (struct sockaddr *)&sa, sizeof(sa)), 0,
+		  "bind(ETH_P_ALL)");
+
+	close(fd);
+
+	/* Make sure other cgroup doesn't trigger the programs.
+	 */
+
+	if (!ASSERT_OK(join_cgroup(""), "join root cgroup"))
+		goto detach_cgroup;
+
+	fd = socket(AF_INET6, SOCK_STREAM, 0);
+	if (!ASSERT_GE(fd, 0, "socket(SOCK_STREAM)"))
+		goto detach_cgroup;
+
+	prio = 0;
+	socklen = sizeof(prio);
+	ASSERT_GE(getsockopt(fd, SOL_SOCKET, SO_PRIORITY, &prio, &socklen), 0,
+		  "getsockopt");
+	ASSERT_EQ(prio, 0, "sk_priority");
+
+	close(fd);
+
+detach_cgroup:
+	ASSERT_GE(bpf_prog_detach2(post_create_prog_fd2, cgroup_fd,
+				   BPF_LSM_CGROUP), 0, "detach_create");
+	close(bind_link_fd);
+	/* Don't close bind_link_fd2, exercise cgroup release cleanup. */
+	ASSERT_GE(bpf_prog_detach2(alloc_prog_fd, cgroup_fd,
+				   BPF_LSM_CGROUP), 0, "detach_alloc");
+
+close_cgroup:
+	close(cgroup_fd);
+close_skel:
+	lsm_cgroup__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/lsm_cgroup.c b/tools/testing/selftests/bpf/progs/lsm_cgroup.c
new file mode 100644
index 000000000000..fd3b2daa26aa
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/lsm_cgroup.c
@@ -0,0 +1,94 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+#ifndef AF_PACKET
+#define AF_PACKET 17
+#endif
+
+#ifndef AF_UNIX
+#define AF_UNIX 1
+#endif
+
+#ifndef EPERM
+#define EPERM 1
+#endif
+
+static __always_inline int real_create(struct socket *sock, int family,
+				       int protocol)
+{
+	struct sock *sk;
+
+	/* Reject non-tx-only AF_PACKET.
+	 */
+	if (family == AF_PACKET && protocol != 0)
+		return 0; /* EPERM */
+
+	sk = sock->sk;
+	if (!sk)
+		return 1;
+
+	/* The rest of the sockets get default policy.
+	 */
+	sk->sk_priority = 123;
+	return 1;
+}
+
+SEC("lsm_cgroup/socket_post_create")
+int BPF_PROG(socket_post_create, struct socket *sock, int family,
+	     int type, int protocol, int kern)
+{
+	return real_create(sock, family, protocol);
+}
+
+SEC("lsm_cgroup/socket_post_create")
+int BPF_PROG(socket_post_create2, struct socket *sock, int family,
+	     int type, int protocol, int kern)
+{
+	return real_create(sock, family, protocol);
+}
+
+static __always_inline int real_bind(struct socket *sock,
+				     struct sockaddr *address,
+				     int addrlen)
+{
+	struct sockaddr_ll sa = {};
+
+	if (sock->sk->__sk_common.skc_family != AF_PACKET)
+		return 1;
+
+	if (sock->sk->sk_kern_sock)
+		return 1;
+
+	bpf_probe_read_kernel(&sa, sizeof(sa), address);
+	if (sa.sll_protocol)
+		return 0; /* EPERM */
+
+	return 1;
+}
+
+SEC("lsm_cgroup/socket_bind")
+int BPF_PROG(socket_bind, struct socket *sock, struct sockaddr *address,
+	     int addrlen)
+{
+	return real_bind(sock, address, addrlen);
+}
+
+SEC("lsm_cgroup/socket_bind")
+int BPF_PROG(socket_bind2, struct socket *sock, struct sockaddr *address,
+	     int addrlen)
+{
+	return real_bind(sock, address, addrlen);
+}
+
+SEC("lsm_cgroup/sk_alloc_security")
+int BPF_PROG(socket_alloc, struct sock *sk, int family, gfp_t priority)
+{
+	if (family == AF_UNIX)
+		return 0; /* EPERM */
+	return 1;
+}
-- 
2.35.1.1178.g4f1659d476-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next v3 7/7] selftests/bpf: verify lsm_cgroup struct sock access
  2022-04-07 22:31 [PATCH bpf-next v3 0/7] bpf: cgroup_sock lsm flavor Stanislav Fomichev
                   ` (5 preceding siblings ...)
  2022-04-07 22:31 ` [PATCH bpf-next v3 6/7] selftests/bpf: lsm_cgroup functional test Stanislav Fomichev
@ 2022-04-07 22:31 ` Stanislav Fomichev
  6 siblings, 0 replies; 33+ messages in thread
From: Stanislav Fomichev @ 2022-04-07 22:31 UTC (permalink / raw)
  To: netdev, bpf; +Cc: ast, daniel, andrii, Stanislav Fomichev

sk_priority & sk_mark are writable; the rest is read-only.

Add new ldx_offset fixups to look up the offset of a struct field.
Allow using test.kfunc regardless of prog_type.

One interesting thing here is that the verifier doesn't
really force me to add NULL checks anywhere :-/

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/testing/selftests/bpf/test_verifier.c   | 54 ++++++++++++++++++-
 .../selftests/bpf/verifier/lsm_cgroup.c       | 34 ++++++++++++
 2 files changed, 87 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/verifier/lsm_cgroup.c

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index a2cd236c32eb..d6bc55c54aaa 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -75,6 +75,12 @@ struct kfunc_btf_id_pair {
 	int insn_idx;
 };
 
+struct ldx_offset {
+	const char *strct;
+	const char *field;
+	int insn_idx;
+};
+
 struct bpf_test {
 	const char *descr;
 	struct bpf_insn	insns[MAX_INSNS];
@@ -102,6 +108,7 @@ struct bpf_test {
 	int fixup_map_ringbuf[MAX_FIXUPS];
 	int fixup_map_timer[MAX_FIXUPS];
 	struct kfunc_btf_id_pair fixup_kfunc_btf_id[MAX_FIXUPS];
+	struct ldx_offset fixup_ldx[MAX_FIXUPS];
 	/* Expected verifier log output for result REJECT or VERBOSE_ACCEPT.
 	 * Can be a tab-separated sequence of expected strings. An empty string
 	 * means no log verification.
@@ -755,6 +762,7 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
 	int *fixup_map_ringbuf = test->fixup_map_ringbuf;
 	int *fixup_map_timer = test->fixup_map_timer;
 	struct kfunc_btf_id_pair *fixup_kfunc_btf_id = test->fixup_kfunc_btf_id;
+	struct ldx_offset *fixup_ldx = test->fixup_ldx;
 
 	if (test->fill_helper) {
 		test->fill_insns = calloc(MAX_TEST_INSNS, sizeof(struct bpf_insn));
@@ -967,6 +975,50 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
 			fixup_kfunc_btf_id++;
 		} while (fixup_kfunc_btf_id->kfunc);
 	}
+
+	if (fixup_ldx->strct) {
+		const struct btf_member *memb;
+		const struct btf_type *tp;
+		const char *name;
+		struct btf *btf;
+		int btf_id;
+		int off;
+		int i;
+
+		btf = btf__load_vmlinux_btf();
+
+		do {
+			off = -1;
+			if (!btf)
+				goto next_ldx;
+
+			btf_id = btf__find_by_name_kind(btf,
+							fixup_ldx->strct,
+							BTF_KIND_STRUCT);
+			if (btf_id < 0)
+				goto next_ldx;
+
+			tp = btf__type_by_id(btf, btf_id);
+			memb = btf_members(tp);
+
+			for (i = 0; i < btf_vlen(tp); i++) {
+				name = btf__name_by_offset(btf,
+							   memb->name_off);
+				if (strcmp(fixup_ldx->field, name) == 0) {
+					off = memb->offset / 8;
+					break;
+				}
+				memb++;
+			}
+
+next_ldx:
+			prog[fixup_ldx->insn_idx].off = off;
+			fixup_ldx++;
+
+		} while (fixup_ldx->strct);
+
+		btf__free(btf);
+	}
 }
 
 struct libcap {
@@ -1131,7 +1183,7 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
 		opts.log_level = 4;
 	opts.prog_flags = pflags;
 
-	if (prog_type == BPF_PROG_TYPE_TRACING && test->kfunc) {
+	if (test->kfunc) {
 		int attach_btf_id;
 
 		attach_btf_id = libbpf_find_vmlinux_btf_id(test->kfunc,
diff --git a/tools/testing/selftests/bpf/verifier/lsm_cgroup.c b/tools/testing/selftests/bpf/verifier/lsm_cgroup.c
new file mode 100644
index 000000000000..af0efe783511
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/lsm_cgroup.c
@@ -0,0 +1,34 @@
+#define SK_WRITABLE_FIELD(tp, field, size, res) \
+{ \
+	.descr = field, \
+	.insns = { \
+		/* r1 = *(u64 *)(r1 + 0) */ \
+		BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, 0), \
+		/* r1 = *(u64 *)(r1 + offsetof(struct socket, sk)) */ \
+		BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, 0), \
+		/* r2 = *(u64 *)(r1 + offsetof(struct sock, <field>)) */ \
+		BPF_LDX_MEM(size, BPF_REG_2, BPF_REG_1, 0), \
+		/* *(u64 *)(r1 + offsetof(struct sock, <field>)) = r2 */ \
+		BPF_STX_MEM(size, BPF_REG_1, BPF_REG_2, 0), \
+		BPF_MOV64_IMM(BPF_REG_0, 1), \
+		BPF_EXIT_INSN(), \
+	}, \
+	.result = res, \
+	.errstr = res ? "no write support to 'struct sock' at off" : "", \
+	.prog_type = BPF_PROG_TYPE_LSM, \
+	.expected_attach_type = BPF_LSM_CGROUP, \
+	.kfunc = "socket_post_create", \
+	.fixup_ldx = { \
+		{ "socket", "sk", 1 }, \
+		{ tp, field, 2 }, \
+		{ tp, field, 3 }, \
+	}, \
+}
+
+SK_WRITABLE_FIELD("sock_common", "skc_family", BPF_H, REJECT),
+SK_WRITABLE_FIELD("sock", "sk_sndtimeo", BPF_DW, REJECT),
+SK_WRITABLE_FIELD("sock", "sk_priority", BPF_W, ACCEPT),
+SK_WRITABLE_FIELD("sock", "sk_mark", BPF_W, ACCEPT),
+SK_WRITABLE_FIELD("sock", "sk_pacing_rate", BPF_DW, REJECT),
+
+#undef SK_WRITABLE_FIELD
-- 
2.35.1.1178.g4f1659d476-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 2/7] bpf: per-cgroup lsm flavor
  2022-04-07 22:31 ` [PATCH bpf-next v3 2/7] bpf: per-cgroup lsm flavor Stanislav Fomichev
@ 2022-04-08 14:20   ` kernel test robot
  2022-04-08 15:53   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2022-04-08 14:20 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev, bpf
  Cc: kbuild-all, ast, daniel, andrii, Stanislav Fomichev

Hi Stanislav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Stanislav-Fomichev/bpf-cgroup_sock-lsm-flavor/20220408-063705
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: arm64-randconfig-r011-20220408 (https://download.01.org/0day-ci/archive/20220408/202204082214.0EUCPtwa-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/3c3f15b5422ca616e2585d699c47aa4e7b7dcf1d
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Stanislav-Fomichev/bpf-cgroup_sock-lsm-flavor/20220408-063705
        git checkout 3c3f15b5422ca616e2585d699c47aa4e7b7dcf1d
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=arm64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   aarch64-linux-ld: Unexpected GOT/PLT entries detected!
   aarch64-linux-ld: Unexpected run-time procedure linkages detected!
   aarch64-linux-ld: ID map text too big or misaligned
   aarch64-linux-ld: kernel/bpf/trampoline.o: in function `bpf_trampoline_compute_key':
   include/linux/bpf_verifier.h:540: undefined reference to `btf_obj_id'
>> aarch64-linux-ld: include/linux/bpf_verifier.h:540: undefined reference to `btf_obj_id'

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 2/7] bpf: per-cgroup lsm flavor
  2022-04-07 22:31 ` [PATCH bpf-next v3 2/7] bpf: per-cgroup lsm flavor Stanislav Fomichev
  2022-04-08 14:20   ` kernel test robot
@ 2022-04-08 15:53   ` kernel test robot
  2022-04-08 16:42     ` Martin KaFai Lau
  2022-04-08 22:12   ` Martin KaFai Lau
  2022-04-11  8:26   ` Dan Carpenter
  3 siblings, 1 reply; 33+ messages in thread
From: kernel test robot @ 2022-04-08 15:53 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev, bpf
  Cc: llvm, kbuild-all, ast, daniel, andrii, Stanislav Fomichev

Hi Stanislav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Stanislav-Fomichev/bpf-cgroup_sock-lsm-flavor/20220408-063705
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: x86_64-randconfig-a005 (https://download.01.org/0day-ci/archive/20220408/202204082305.Qs2g5Dzf-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project c29a51b3a257908aebc01cd7c4655665db317d66)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/3c3f15b5422ca616e2585d699c47aa4e7b7dcf1d
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Stanislav-Fomichev/bpf-cgroup_sock-lsm-flavor/20220408-063705
        git checkout 3c3f15b5422ca616e2585d699c47aa4e7b7dcf1d
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> ld.lld: error: undefined symbol: btf_obj_id
   >>> referenced by trampoline.c
   >>>               bpf/trampoline.o:(bpf_trampoline_link_cgroup_shim) in archive kernel/built-in.a
   >>> referenced by trampoline.c
   >>>               bpf/trampoline.o:(bpf_trampoline_unlink_cgroup_shim) in archive kernel/built-in.a

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 2/7] bpf: per-cgroup lsm flavor
  2022-04-08 15:53   ` kernel test robot
@ 2022-04-08 16:42     ` Martin KaFai Lau
  0 siblings, 0 replies; 33+ messages in thread
From: Martin KaFai Lau @ 2022-04-08 16:42 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev, kernel test robot, bpf, llvm, kbuild-all, ast, daniel, andrii

On Fri, Apr 08, 2022 at 11:53:47PM +0800, kernel test robot wrote:
> Hi Stanislav,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on bpf-next/master]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Stanislav-Fomichev/bpf-cgroup_sock-lsm-flavor/20220408-063705
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> config: x86_64-randconfig-a005 (https://download.01.org/0day-ci/archive/20220408/202204082305.Qs2g5Dzf-lkp@intel.com/config)
> compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project c29a51b3a257908aebc01cd7c4655665db317d66)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # https://github.com/intel-lab-lkp/linux/commit/3c3f15b5422ca616e2585d699c47aa4e7b7dcf1d
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review Stanislav-Fomichev/bpf-cgroup_sock-lsm-flavor/20220408-063705
>         git checkout 3c3f15b5422ca616e2585d699c47aa4e7b7dcf1d
>         # save the config file to linux build tree
>         mkdir build_dir
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All errors (new ones prefixed by >>):
> 
> >> ld.lld: error: undefined symbol: btf_obj_id
>    >>> referenced by trampoline.c
>    >>>               bpf/trampoline.o:(bpf_trampoline_link_cgroup_shim) in archive kernel/built-in.a
>    >>> referenced by trampoline.c
>    >>>               bpf/trampoline.o:(bpf_trampoline_unlink_cgroup_shim) in archive kernel/built-in.a
It is probably because obj-$(CONFIG_BPF_JIT) += trampoline.o
while obj-$(CONFIG_BPF_SYSCALL) += btf.o.
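
If the Makefile split is kept as-is, one way to make the reference
resolvable at link time (just a sketch; the actual respin may well fix
the Makefile dependency instead) would be a !CONFIG_BPF_SYSCALL
fallback next to the btf_obj_id() declaration:

#ifdef CONFIG_BPF_SYSCALL
u32 btf_obj_id(const struct btf *btf);
#else
/* sketch: stub so CONFIG_BPF_JIT=y && CONFIG_BPF_SYSCALL=n still links */
static inline u32 btf_obj_id(const struct btf *btf)
{
	return 0;
}
#endif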

Good catch but seems minor and should not affect the review.
Please hold off the respin a little first so that the review
can continue on this revision.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 2/7] bpf: per-cgroup lsm flavor
  2022-04-07 22:31 ` [PATCH bpf-next v3 2/7] bpf: per-cgroup lsm flavor Stanislav Fomichev
  2022-04-08 14:20   ` kernel test robot
  2022-04-08 15:53   ` kernel test robot
@ 2022-04-08 22:12   ` Martin KaFai Lau
  2022-04-11 19:07     ` Stanislav Fomichev
  2022-04-11  8:26   ` Dan Carpenter
  3 siblings, 1 reply; 33+ messages in thread
From: Martin KaFai Lau @ 2022-04-08 22:12 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii

On Thu, Apr 07, 2022 at 03:31:07PM -0700, Stanislav Fomichev wrote:
> diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> index 064eccba641d..eca258ba71d8 100644
> --- a/kernel/bpf/bpf_lsm.c
> +++ b/kernel/bpf/bpf_lsm.c
> @@ -35,6 +35,98 @@ BTF_SET_START(bpf_lsm_hooks)
>  #undef LSM_HOOK
>  BTF_SET_END(bpf_lsm_hooks)
>  
> +static unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
> +						const struct bpf_insn *insn)
> +{
> +	const struct bpf_prog *prog;
> +	struct socket *sock;
> +	struct cgroup *cgrp;
> +	struct sock *sk;
> +	int ret = 0;
> +	u64 *regs;
> +
> +	regs = (u64 *)ctx;
> +	sock = (void *)(unsigned long)regs[BPF_REG_0];
> +	/*prog = container_of(insn, struct bpf_prog, insnsi);*/
> +	prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
nit. Rename prog to shim_prog.

> +
> +	if (unlikely(!sock))
Is it possible in the lsm hooks?  Can these hooks
be rejected at the load time instead?

> +		return 0;
> +
> +	sk = sock->sk;
> +	if (unlikely(!sk))
Same here.

> +		return 0;
> +
> +	cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> +	if (likely(cgrp))
> +		ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> +					    ctx, bpf_prog_run, 0);
> +	return ret;
> +}
> +
> +static unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
> +						 const struct bpf_insn *insn)
> +{
> +	const struct bpf_prog *prog;
> +	struct cgroup *cgrp;
> +	int ret = 0;
> +
> +	if (unlikely(!current))
> +		return 0;
> +
> +	/*prog = container_of(insn, struct bpf_prog, insnsi);*/
> +	prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
nit. shim_prog here also.

> +
> +	rcu_read_lock();
> +	cgrp = task_dfl_cgroup(current);
> +	if (likely(cgrp))
> +		ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> +					    ctx, bpf_prog_run, 0);
> +	rcu_read_unlock();
> +	return ret;
> +}
> +
> +int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
> +			     bpf_func_t *bpf_func)
> +{
> +	const struct btf_type *first_arg_type;
> +	const struct btf_type *sock_type;
> +	const struct btf *btf_vmlinux;
> +	const struct btf_param *args;
> +	s32 type_id;
> +
> +	if (!prog->aux->attach_func_proto ||
> +	    !btf_type_is_func_proto(prog->aux->attach_func_proto))
Are these cases possible at attach time, or have they already been
rejected at load time?  If it is the latter, these tests can be
removed.

> +		return -EINVAL;
> +
> +	if (btf_type_vlen(prog->aux->attach_func_proto) < 1)
Is it consistent with the existing BPF_LSM_MAC,
or is there something special about BPF_LSM_CGROUP that
means it cannot support this func?

> +		return -EINVAL;
> +
> +	args = (const struct btf_param *)(prog->aux->attach_func_proto + 1);
nit.
	args = btf_params(prog->aux->attach_func_proto);

> +
> +	btf_vmlinux = bpf_get_btf_vmlinux();
> +	if (!btf_vmlinux)
> +		return -EINVAL;
> +
> +	type_id = btf_find_by_name_kind(btf_vmlinux, "socket", BTF_KIND_STRUCT);
> +	if (type_id < 0)
> +		return -EINVAL;
> +	sock_type = btf_type_by_id(btf_vmlinux, type_id);
> +
> +	first_arg_type = btf_type_resolve_ptr(btf_vmlinux, args[0].type, NULL);
> +	if (first_arg_type == sock_type)
> +		*bpf_func = __cgroup_bpf_run_lsm_socket;
> +	else
> +		*bpf_func = __cgroup_bpf_run_lsm_current;
> +
> +	return 0;
> +}
> +
> +int bpf_lsm_hook_idx(u32 btf_id)
> +{
> +	return btf_id_set_index(&bpf_lsm_hooks, btf_id);
> +}
> +
>  int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
>  			const struct bpf_prog *prog)
>  {
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 0918a39279f6..4199de31f49c 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -4971,6 +4971,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>  
>  	if (arg == nr_args) {
>  		switch (prog->expected_attach_type) {
> +		case BPF_LSM_CGROUP:
>  		case BPF_LSM_MAC:
>  		case BPF_TRACE_FEXIT:
>  			/* When LSM programs are attached to void LSM hooks
> @@ -6396,6 +6397,16 @@ static int btf_id_cmp_func(const void *a, const void *b)
>  	return *pa - *pb;
>  }
>  
> +int btf_id_set_index(const struct btf_id_set *set, u32 id)
> +{
> +	const u32 *p;
> +
> +	p = bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func);
> +	if (!p)
> +		return -1;
> +	return p - set->ids;
> +}
> +
>  bool btf_id_set_contains(const struct btf_id_set *set, u32 id)
>  {
>  	return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL;
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 128028efda64..8c77703954f7 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -14,6 +14,9 @@
>  #include <linux/string.h>
>  #include <linux/bpf.h>
>  #include <linux/bpf-cgroup.h>
> +#include <linux/btf_ids.h>
> +#include <linux/bpf_lsm.h>
> +#include <linux/bpf_verifier.h>
>  #include <net/sock.h>
>  #include <net/bpf_sk_storage.h>
>  
> @@ -22,6 +25,18 @@
>  DEFINE_STATIC_KEY_ARRAY_FALSE(cgroup_bpf_enabled_key, MAX_CGROUP_BPF_ATTACH_TYPE);
>  EXPORT_SYMBOL(cgroup_bpf_enabled_key);
>  
> +#ifdef CONFIG_BPF_LSM
> +static enum cgroup_bpf_attach_type bpf_lsm_attach_type_get(u32 attach_btf_id)
> +{
> +	return CGROUP_LSM_START + bpf_lsm_hook_idx(attach_btf_id);
> +}
> +#else
> +static enum cgroup_bpf_attach_type bpf_lsm_attach_type_get(u32 attach_btf_id)
> +{
> +	return -EOPNOTSUPP;
> +}
> +#endif
> +
>  void cgroup_bpf_offline(struct cgroup *cgrp)
>  {
>  	cgroup_get(cgrp);
> @@ -89,6 +104,14 @@ static void bpf_cgroup_storages_link(struct bpf_cgroup_storage *storages[],
>  		bpf_cgroup_storage_link(storages[stype], cgrp, attach_type);
>  }
>  
> +static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[])
> +{
> +	enum bpf_cgroup_storage_type stype;
> +
> +	for_each_cgroup_storage_type(stype)
> +		bpf_cgroup_storage_unlink(storages[stype]);
> +}
> +
>  /* Called when bpf_cgroup_link is auto-detached from dying cgroup.
>   * It drops cgroup and bpf_prog refcounts, and marks bpf_link as defunct. It
>   * doesn't free link memory, which will eventually be done by bpf_link's
> @@ -100,6 +123,15 @@ static void bpf_cgroup_link_auto_detach(struct bpf_cgroup_link *link)
>  	link->cgroup = NULL;
>  }
>  
> +static void bpf_cgroup_lsm_shim_release(struct bpf_prog *prog,
> +					enum cgroup_bpf_attach_type atype)
> +{
> +	if (!prog || atype != prog->aux->cgroup_atype)
prog cannot be NULL here, no?

The 'atype != prog->aux->cgroup_atype' check also looks suspicious, considering
prog->aux->cgroup_atype is only initialized (and meaningful) for BPF_LSM_CGROUP.
I suspect incorrectly passing this test will crash in
bpf_trampoline_unlink_cgroup_shim() below. More on this later.

> +		return;
> +
> +	bpf_trampoline_unlink_cgroup_shim(prog);
> +}
> +
>  /**
>   * cgroup_bpf_release() - put references of all bpf programs and
>   *                        release all cgroup bpf data
> @@ -123,10 +155,16 @@ static void cgroup_bpf_release(struct work_struct *work)
Copying some missing loop context here:

	for (atype = 0; atype < ARRAY_SIZE(cgrp->bpf.progs); atype++) {
		struct list_head *progs = &cgrp->bpf.progs[atype];
		struct bpf_prog_list *pl, *pltmp;
				  
>  
>  		list_for_each_entry_safe(pl, pltmp, progs, node) {
>  			list_del(&pl->node);
> -			if (pl->prog)
> +			if (pl->prog) {
> +				bpf_cgroup_lsm_shim_release(pl->prog,
> +							    atype);
atype could be 0 (CGROUP_INET_INGRESS) here.  bpf_cgroup_lsm_shim_release()
above will go ahead with bpf_trampoline_unlink_cgroup_shim().
It will break some of the assumptions.  e.g. prog->aux->attach_btf is NULL
for CGROUP_INET_INGRESS.

Instead, only call bpf_cgroup_lsm_shim_release() for BPF_LSM_CGROUP ?

If the above observation is sane, I wonder whether the existing test_progs
would have uncovered it, or maybe the existing tests always detach
themselves cleanly before cleaning up the cgroup, which avoided this case.
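
Something along these lines maybe (untested sketch; checking the prog's
expected_attach_type is just one way to tell the BPF_LSM_CGROUP entries
apart):

			if (pl->prog) {
				/* only BPF_LSM_CGROUP progs have a shim to release */
				if (pl->prog->expected_attach_type == BPF_LSM_CGROUP)
					bpf_cgroup_lsm_shim_release(pl->prog,
								    atype);
				bpf_prog_put(pl->prog);
			}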

>  				bpf_prog_put(pl->prog);
> -			if (pl->link)
> +			}
> +			if (pl->link) {
> +				bpf_cgroup_lsm_shim_release(pl->link->link.prog,
> +							    atype);
>  				bpf_cgroup_link_auto_detach(pl->link);
> +			}
>  			kfree(pl);
>  			static_branch_dec(&cgroup_bpf_enabled_key[atype]);
>  		}
> @@ -439,6 +477,7 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>  	struct bpf_prog *old_prog = NULL;
>  	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
>  	struct bpf_cgroup_storage *new_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
> +	struct bpf_attach_target_info tgt_info = {};
>  	enum cgroup_bpf_attach_type atype;
>  	struct bpf_prog_list *pl;
>  	struct list_head *progs;
> @@ -455,9 +494,31 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>  		/* replace_prog implies BPF_F_REPLACE, and vice versa */
>  		return -EINVAL;
>  
> -	atype = to_cgroup_bpf_attach_type(type);
> -	if (atype < 0)
> -		return -EINVAL;
> +	if (type == BPF_LSM_CGROUP) {
> +		struct bpf_prog *p = prog ? : link->link.prog;
> +
> +		if (replace_prog) {
> +			/* Reusing shim from the original program.
> +			 */
> +			atype = replace_prog->aux->cgroup_atype;
> +		} else {
> +			err = bpf_check_attach_target(NULL, p, NULL,
> +						      p->aux->attach_btf_id,
> +						      &tgt_info);
> +			if (err)
> +				return -EINVAL;
> +
> +			atype = bpf_lsm_attach_type_get(p->aux->attach_btf_id);
> +			if (atype < 0)
> +				return atype;
> +		}
> +
> +		p->aux->cgroup_atype = atype;
hmm.... not sure about this assignment for the replace_prog case.
In particular, can the attaching prog's cgroup_atype be decided
by the replace_prog's cgroup_atype?  Were there checks
earlier to ensure the replace_prog and the attaching prog have
the same attach_btf_id?

> +	} else {
> +		atype = to_cgroup_bpf_attach_type(type);
> +		if (atype < 0)
> +			return -EINVAL;
> +	}
>  
>  	progs = &cgrp->bpf.progs[atype];
>  
> @@ -503,13 +564,27 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>  	if (err)
>  		goto cleanup;
>  
> +	bpf_cgroup_storages_link(new_storage, cgrp, type);
> +
> +	if (type == BPF_LSM_CGROUP && !old_prog) {
> +		struct bpf_prog *p = prog ? : link->link.prog;
> +		int err;
> +
> +		err = bpf_trampoline_link_cgroup_shim(p, &tgt_info);
> +		if (err)
> +			goto cleanup_trampoline;
> +	}
> +
>  	if (old_prog)
>  		bpf_prog_put(old_prog);
>  	else
>  		static_branch_inc(&cgroup_bpf_enabled_key[atype]);
> -	bpf_cgroup_storages_link(new_storage, cgrp, type);
> +
>  	return 0;
>  
> +cleanup_trampoline:
> +	bpf_cgroup_storages_unlink(new_storage);
> +
>  cleanup:
>  	if (old_prog) {
>  		pl->prog = old_prog;
> @@ -601,9 +676,13 @@ static int __cgroup_bpf_replace(struct cgroup *cgrp,
>  	struct list_head *progs;
>  	bool found = false;
>  
> -	atype = to_cgroup_bpf_attach_type(link->type);
> -	if (atype < 0)
> -		return -EINVAL;
> +	if (link->type == BPF_LSM_CGROUP) {
> +		atype = link->link.prog->aux->cgroup_atype;
> +	} else {
> +		atype = to_cgroup_bpf_attach_type(link->type);
> +		if (atype < 0)
> +			return -EINVAL;
> +	}
>  
>  	progs = &cgrp->bpf.progs[atype];
>  
> @@ -619,6 +698,9 @@ static int __cgroup_bpf_replace(struct cgroup *cgrp,
>  	if (!found)
>  		return -ENOENT;
>  
> +	if (link->type == BPF_LSM_CGROUP)
> +		new_prog->aux->cgroup_atype = atype;
> +
>  	old_prog = xchg(&link->link.prog, new_prog);
>  	replace_effective_prog(cgrp, atype, link);
>  	bpf_prog_put(old_prog);
> @@ -702,9 +784,15 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
>  	u32 flags;
>  	int err;
>  
> -	atype = to_cgroup_bpf_attach_type(type);
> -	if (atype < 0)
> -		return -EINVAL;
> +	if (type == BPF_LSM_CGROUP) {
> +		struct bpf_prog *p = prog ? : link->link.prog;
> +
> +		atype = p->aux->cgroup_atype;
> +	} else {
> +		atype = to_cgroup_bpf_attach_type(type);
> +		if (atype < 0)
> +			return -EINVAL;
> +	}
>  
>  	progs = &cgrp->bpf.progs[atype];
>  	flags = cgrp->bpf.flags[atype];
> @@ -726,6 +814,10 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
>  	if (err)
>  		goto cleanup;
>  
> +	if (type == BPF_LSM_CGROUP)
> +		bpf_cgroup_lsm_shim_release(prog ? : link->link.prog,
> +					    atype);
> +
>  	/* now can actually delete it from this cgroup list */
>  	list_del(&pl->node);
>  	kfree(pl);

[ ... ]

> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> index 0c4fd194e801..fca1dea786c7 100644
> --- a/kernel/bpf/trampoline.c
> +++ b/kernel/bpf/trampoline.c
> @@ -11,6 +11,8 @@
>  #include <linux/rcupdate_wait.h>
>  #include <linux/module.h>
>  #include <linux/static_call.h>
> +#include <linux/bpf_verifier.h>
> +#include <linux/bpf_lsm.h>
>  
>  /* dummy _ops. The verifier will operate on target program's ops. */
>  const struct bpf_verifier_ops bpf_extension_verifier_ops = {
> @@ -394,6 +396,7 @@ static enum bpf_tramp_prog_type bpf_attach_type_to_tramp(struct bpf_prog *prog)
>  		return BPF_TRAMP_MODIFY_RETURN;
>  	case BPF_TRACE_FEXIT:
>  		return BPF_TRAMP_FEXIT;
> +	case BPF_LSM_CGROUP:
Considering BPF_LSM_CGROUP is added here and the 'prog' for the
case concerning here is the shim_prog ... (more below)

>  	case BPF_LSM_MAC:
>  		if (!prog->aux->attach_func_proto->type)
>  			/* The function returns void, we cannot modify its
> @@ -485,6 +488,147 @@ int bpf_trampoline_unlink_prog(struct bpf_prog *prog, struct bpf_trampoline *tr)
>  	return err;
>  }
>  
> +static struct bpf_prog *cgroup_shim_alloc(const struct bpf_prog *prog,
> +					  bpf_func_t bpf_func)
> +{
> +	struct bpf_prog *p;
> +
> +	p = bpf_prog_alloc(1, 0);
> +	if (!p)
> +		return NULL;
> +
> +	p->jited = false;
> +	p->bpf_func = bpf_func;
> +
> +	p->aux->cgroup_atype = prog->aux->cgroup_atype;
> +	p->aux->attach_func_proto = prog->aux->attach_func_proto;
> +	p->aux->attach_btf_id = prog->aux->attach_btf_id;
> +	p->aux->attach_btf = prog->aux->attach_btf;
> +	btf_get(p->aux->attach_btf);
> +	p->type = BPF_PROG_TYPE_LSM;
> +	p->expected_attach_type = BPF_LSM_MAC;
... should this be BPF_LSM_CGROUP instead ?

or the above 'case BPF_LSM_CGROUP:' addition is not needed ?

> +	bpf_prog_inc(p);
> +
> +	return p;
> +}
> +
> +static struct bpf_prog *cgroup_shim_find(struct bpf_trampoline *tr,
> +					 bpf_func_t bpf_func)
> +{
> +	const struct bpf_prog_aux *aux;
> +	int kind;
> +
> +	for (kind = 0; kind < BPF_TRAMP_MAX; kind++) {
Can bpf_attach_type_to_tramp() be used here instead of
looping over all kinds?

> +		hlist_for_each_entry(aux, &tr->progs_hlist[kind], tramp_hlist) {
> +			struct bpf_prog *p = aux->prog;
> +
> +			if (!p->jited && p->bpf_func == bpf_func)
Is the "!p->jited" test needed ?

> +				return p;
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
> +				    struct bpf_attach_target_info *tgt_info)
> +{
> +	struct bpf_prog *shim_prog = NULL;
> +	struct bpf_trampoline *tr;
> +	bpf_func_t bpf_func;
> +	u64 key;
> +	int err;
> +
> +	key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf,
> +					 prog->aux->attach_btf_id);
> +
> +	err = bpf_lsm_find_cgroup_shim(prog, &bpf_func);
> +	if (err)
> +		return err;
> +
> +	tr = bpf_trampoline_get(key, tgt_info);
> +	if (!tr)
> +		return  -ENOMEM;
> +
> +	mutex_lock(&tr->mutex);
> +
> +	shim_prog = cgroup_shim_find(tr, bpf_func);
> +	if (shim_prog) {
> +		/* Reusing existing shim attached by the other program.
> +		 */
The shim_prog is reused by >1 BPF_LSM_CGROUP progs and
shim_prog is also hidden from userspace (no id), so it may be worth
bringing this up:

In __bpf_prog_enter(), aside from some bpf stats of the shim_prog
becoming useless (a very minor thing), it also checks
shim_prog->active and bumps the misses counter.  Now the misses counter
is no longer visible to users.  Since it is actually running the cgroup prog,
maybe there is no need for the active check?

> +		bpf_prog_inc(shim_prog);
> +		mutex_unlock(&tr->mutex);
> +		return 0;
> +	}
> +
> +	/* Allocate and install new shim.
> +	 */
> +
> +	shim_prog = cgroup_shim_alloc(prog, bpf_func);
> +	if (!shim_prog) {
> +		err = -ENOMEM;
> +		goto out;
> +	}
> +
> +	err = __bpf_trampoline_link_prog(shim_prog, tr);
> +	if (err)
> +		goto out;
> +
> +	mutex_unlock(&tr->mutex);
> +
> +	return 0;
> +out:
> +	if (shim_prog)
> +		bpf_prog_put(shim_prog);
> +
> +	mutex_unlock(&tr->mutex);
> +	return err;
> +}
> +
> +void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog)
> +{
> +	struct bpf_prog *shim_prog;
> +	struct bpf_trampoline *tr;
> +	bpf_func_t bpf_func;
> +	u64 key;
> +	int err;
> +
> +	key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf,
> +					 prog->aux->attach_btf_id);
> +
> +	err = bpf_lsm_find_cgroup_shim(prog, &bpf_func);
> +	if (err)
> +		return;
> +
> +	tr = bpf_trampoline_lookup(key);
> +	if (!tr)
> +		return;
> +
> +	mutex_lock(&tr->mutex);
> +
> +	shim_prog = cgroup_shim_find(tr, bpf_func);
> +	if (shim_prog) {
> +		/* We use shim_prog refcnt for tracking whether to
> +		 * remove the shim program from the trampoline.
> +		 * Trampoline's mutex is held while refcnt is
> +		 * added/subtracted so we don't need to care about
> +		 * potential races.
> +		 */
> +
> +		if (atomic64_read(&shim_prog->aux->refcnt) == 1)
> +			WARN_ON_ONCE(__bpf_trampoline_unlink_prog(shim_prog, tr));
> +
> +		bpf_prog_put(shim_prog);
> +	}
> +
> +	mutex_unlock(&tr->mutex);
> +
> +	bpf_trampoline_put(tr); /* bpf_trampoline_lookup */
> +
> +	if (shim_prog)
> +		bpf_trampoline_put(tr);
> +}
> +

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program
  2022-04-07 22:31 ` [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program Stanislav Fomichev
@ 2022-04-08 22:56   ` Martin KaFai Lau
  2022-04-09 17:04     ` Jakub Sitnicki
  2022-04-11 18:46     ` Stanislav Fomichev
  0 siblings, 2 replies; 33+ messages in thread
From: Martin KaFai Lau @ 2022-04-08 22:56 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii

On Thu, Apr 07, 2022 at 03:31:08PM -0700, Stanislav Fomichev wrote:
> Previous patch adds 1:1 mapping between all 211 LSM hooks
> and bpf_cgroup program array. Instead of reserving a slot per
> possible hook, reserve 10 slots per cgroup for lsm programs.
> Those slots are dynamically allocated on demand and reclaimed.
> This still adds some bloat to the cgroup and brings us back to
> roughly pre-cgroup_bpf_attach_type times.
> 
> It should be possible to eventually extend this idea to all hooks if
> the memory consumption is unacceptable and shrink overall effective
> programs array.
> 
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  include/linux/bpf-cgroup-defs.h |  4 +-
>  include/linux/bpf_lsm.h         |  6 ---
>  kernel/bpf/bpf_lsm.c            |  9 ++--
>  kernel/bpf/cgroup.c             | 96 ++++++++++++++++++++++++++++-----
>  4 files changed, 90 insertions(+), 25 deletions(-)
> 
> diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> index 6c661b4df9fa..d42516e86b3a 100644
> --- a/include/linux/bpf-cgroup-defs.h
> +++ b/include/linux/bpf-cgroup-defs.h
> @@ -10,7 +10,9 @@
>  
>  struct bpf_prog_array;
>  
> -#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
> +/* Maximum number of concurrently attachable per-cgroup LSM hooks.
> + */
> +#define CGROUP_LSM_NUM 10
hmm...only 10 different lsm hooks (or 10 different attach_btf_ids) can
have BPF_LSM_CGROUP programs attached.  This feels quite limited but having
a static 211 (and potentially growing in the future) is not good either.
I currently do not have a better idea also. :/

Have you thought about other dynamic schemes or they would be too slow ?

>  enum cgroup_bpf_attach_type {
>  	CGROUP_BPF_ATTACH_TYPE_INVALID = -1,
> diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
> index 7f0e59f5f9be..613de44aa429 100644
> --- a/include/linux/bpf_lsm.h
> +++ b/include/linux/bpf_lsm.h
> @@ -43,7 +43,6 @@ extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
>  void bpf_inode_storage_free(struct inode *inode);
>  
>  int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func);
> -int bpf_lsm_hook_idx(u32 btf_id);
>  
>  #else /* !CONFIG_BPF_LSM */
>  
> @@ -74,11 +73,6 @@ static inline int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
>  	return -ENOENT;
>  }
>  
> -static inline int bpf_lsm_hook_idx(u32 btf_id)
> -{
> -	return -EINVAL;
> -}
> -
>  #endif /* CONFIG_BPF_LSM */
>  
>  #endif /* _LINUX_BPF_LSM_H */
> diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> index eca258ba71d8..8b948ec9ab73 100644
> --- a/kernel/bpf/bpf_lsm.c
> +++ b/kernel/bpf/bpf_lsm.c
> @@ -57,10 +57,12 @@ static unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
>  	if (unlikely(!sk))
>  		return 0;
>  
> +	rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
>  	cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
>  	if (likely(cgrp))
>  		ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
>  					    ctx, bpf_prog_run, 0);
> +	rcu_read_unlock();
>  	return ret;
>  }
>  
> @@ -77,7 +79,7 @@ static unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
>  	/*prog = container_of(insn, struct bpf_prog, insnsi);*/
>  	prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
>  
> -	rcu_read_lock();
> +	rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
I think this is also needed for task_dfl_cgroup().  If yes,
it will be a good idea to adjust the comment if it ends up
using the 'CGROUP_LSM_NUM 10' scheme.

While at rcu_read_lock(), have you thought about what major things are
needed to make BPF_LSM_CGROUP sleepable ?

The cgroup local storage could be one that requires changes, but it seems
the cgroup local storage is not available to BPF_LSM_CGROUP in this change set.
The current use case doesn't need it?

>  	cgrp = task_dfl_cgroup(current);
>  	if (likely(cgrp))
>  		ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> @@ -122,11 +124,6 @@ int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
>  	return 0;
>  }
>  
> -int bpf_lsm_hook_idx(u32 btf_id)
> -{
> -	return btf_id_set_index(&bpf_lsm_hooks, btf_id);
> -}
> -
>  int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
>  			const struct bpf_prog *prog)
>  {

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program
  2022-04-08 22:56   ` Martin KaFai Lau
@ 2022-04-09 17:04     ` Jakub Sitnicki
  2022-04-11 18:44       ` Stanislav Fomichev
  2022-04-12  1:19       ` Martin KaFai Lau
  2022-04-11 18:46     ` Stanislav Fomichev
  1 sibling, 2 replies; 33+ messages in thread
From: Jakub Sitnicki @ 2022-04-09 17:04 UTC (permalink / raw)
  To: Stanislav Fomichev, Martin KaFai Lau; +Cc: netdev, bpf, ast, daniel, andrii

On Fri, Apr 08, 2022 at 03:56 PM -07, Martin KaFai Lau wrote:
> On Thu, Apr 07, 2022 at 03:31:08PM -0700, Stanislav Fomichev wrote:
>> Previous patch adds 1:1 mapping between all 211 LSM hooks
>> and bpf_cgroup program array. Instead of reserving a slot per
>> possible hook, reserve 10 slots per cgroup for lsm programs.
>> Those slots are dynamically allocated on demand and reclaimed.
>> This still adds some bloat to the cgroup and brings us back to
>> roughly pre-cgroup_bpf_attach_type times.
>> 
>> It should be possible to eventually extend this idea to all hooks if
>> the memory consumption is unacceptable and shrink overall effective
>> programs array.
>> 
>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
>> ---
>>  include/linux/bpf-cgroup-defs.h |  4 +-
>>  include/linux/bpf_lsm.h         |  6 ---
>>  kernel/bpf/bpf_lsm.c            |  9 ++--
>>  kernel/bpf/cgroup.c             | 96 ++++++++++++++++++++++++++++-----
>>  4 files changed, 90 insertions(+), 25 deletions(-)
>> 
>> diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
>> index 6c661b4df9fa..d42516e86b3a 100644
>> --- a/include/linux/bpf-cgroup-defs.h
>> +++ b/include/linux/bpf-cgroup-defs.h
>> @@ -10,7 +10,9 @@
>>  
>>  struct bpf_prog_array;
>>  
>> -#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
>> +/* Maximum number of concurrently attachable per-cgroup LSM hooks.
>> + */
>> +#define CGROUP_LSM_NUM 10
> hmm...only 10 different lsm hooks (or 10 different attach_btf_ids) can
> have BPF_LSM_CGROUP programs attached.  This feels quite limited but having
> a static 211 (and potentially growing in the future) is not good either.
> I currently do not have a better idea also. :/
>
> Have you thought about other dynamic schemes or they would be too slow ?

As long as we're talking ideas - how about a 2-level lookup?

L1: 0..255 -> { 0..31, -1 }, where -1 is an inactive cgroup_bpf_attach_type
L2: 0..31 -> struct bpf_prog_array * for cgroup->bpf.effective[],
             struct hlist_head [^1]  for cgroup->bpf.progs[],
             u32                     for cgroup->bpf.flags[],

This way we could have 32 distinct _active_ attachment types for each
cgroup instance, to be shared among regular cgroup attach types and BPF
LSM attach types.

It is 9 extra slots in comparison to today, so if anyone has cgroups
that make use of all available attach types at the same time, we don't
break their setup.

The L1 lookup table would still have a few slots left for new cgroup [^2] or LSM
hooks:

  256 - 23 (cgroup attach types) - 211 (LSM hooks) = 22

Memory bloat:

 +256 B - L1 lookup table
 + 72 B - extra effective[] slots
 + 72 B - extra progs[] slots
 + 36 B - extra flags[] slots
 -184 B - savings from switching to hlist_head
 ------
 +252 B per cgroup instance

Total cgroup_bpf{} size change - 720 B -> 968 B.
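
In struct terms it could look roughly like this (just a sketch; the
field names and the "unused" sentinel are made up for illustration):

#define CGROUP_BPF_MAX_ACTIVE	32
#define CGROUP_BPF_SLOT_UNUSED	0xff

struct cgroup_bpf {
	/* L1: maps an attach type / LSM hook index (0..255) to an
	 * active slot (0..31), or CGROUP_BPF_SLOT_UNUSED.
	 */
	u8 atype_to_slot[256];

	/* L2: per-slot state, shared between the regular cgroup
	 * attach types and the BPF_LSM_CGROUP hooks.
	 */
	struct bpf_prog_array __rcu *effective[CGROUP_BPF_MAX_ACTIVE];
	struct hlist_head progs[CGROUP_BPF_MAX_ACTIVE];
	u32 flags[CGROUP_BPF_MAX_ACTIVE];

	/* ... the rest of cgroup_bpf stays as it is today ... */
};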

WDYT?

[^1] It looks like we can easily switch from cgroup->bpf.progs[] from
     list_head to hlist_head and save some bytes!

     We only access the list tail in __cgroup_bpf_attach(). We can
     either iterate over the list and eat the cost there or push the new
     prog onto the front.

     I think we treat cgroup->bpf.progs[] everywhere like an unordered
     set. Except for __cgroup_bpf_query, where the user might notice the
     order change in the BPF_PROG_QUERY dump.
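
     For example (sketch; assumes pl->node becomes an hlist_node and
     progs points at the per-atype hlist_head), the attach path could
     prepend instead of appending:

	pl = kmalloc(sizeof(*pl), GFP_KERNEL);
	if (!pl) {
		bpf_cgroup_storages_free(new_storage);
		return -ENOMEM;
	}
	/* progs[] is treated as an unordered set, so prepending is fine */
	hlist_add_head(&pl->node, progs);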

[^2] Unrelated, but we would like to propose a
     CGROUP_INET[46]_POST_CONNECT hook in the near future to make it
     easier to bind UDP sockets to 4-tuple without creating conflicts:

     https://github.com/cloudflare/cloudflare-blog/tree/master/2022-02-connectx/ebpf_connect4
 
[...]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 2/7] bpf: per-cgroup lsm flavor
  2022-04-07 22:31 ` [PATCH bpf-next v3 2/7] bpf: per-cgroup lsm flavor Stanislav Fomichev
                     ` (2 preceding siblings ...)
  2022-04-08 22:12   ` Martin KaFai Lau
@ 2022-04-11  8:26   ` Dan Carpenter
  3 siblings, 0 replies; 33+ messages in thread
From: Dan Carpenter @ 2022-04-11  8:26 UTC (permalink / raw)
  To: kbuild, Stanislav Fomichev, netdev, bpf
  Cc: lkp, kbuild-all, ast, daniel, andrii, Stanislav Fomichev

Hi Stanislav,

url:    https://github.com/intel-lab-lkp/linux/commits/Stanislav-Fomichev/bpf-cgroup_sock-lsm-flavor/20220408-063705
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: openrisc-randconfig-m031-20220408 (https://download.01.org/0day-ci/archive/20220409/202204090535.gy7lTeMG-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 11.2.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
kernel/bpf/cgroup.c:575 __cgroup_bpf_attach() warn: missing error code 'err'

vim +/err +575 kernel/bpf/cgroup.c

588e5d8766486e He Fengqing        2021-10-29  471  static int __cgroup_bpf_attach(struct cgroup *cgrp,
af6eea57437a83 Andrii Nakryiko    2020-03-29  472  			       struct bpf_prog *prog, struct bpf_prog *replace_prog,
af6eea57437a83 Andrii Nakryiko    2020-03-29  473  			       struct bpf_cgroup_link *link,
324bda9e6c5add Alexei Starovoitov 2017-10-02  474  			       enum bpf_attach_type type, u32 flags)
3007098494bec6 Daniel Mack        2016-11-23  475  {
7dd68b3279f179 Andrey Ignatov     2019-12-18  476  	u32 saved_flags = (flags & (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI));
324bda9e6c5add Alexei Starovoitov 2017-10-02  477  	struct bpf_prog *old_prog = NULL;
62039c30c19dca Andrii Nakryiko    2020-03-09  478  	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
7d9c3427894fe7 YiFei Zhu          2020-07-23  479  	struct bpf_cgroup_storage *new_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  480  	struct bpf_attach_target_info tgt_info = {};
6fc88c354f3af8 Dave Marchevsky    2021-08-19  481  	enum cgroup_bpf_attach_type atype;
af6eea57437a83 Andrii Nakryiko    2020-03-29  482  	struct bpf_prog_list *pl;
6fc88c354f3af8 Dave Marchevsky    2021-08-19  483  	struct list_head *progs;
324bda9e6c5add Alexei Starovoitov 2017-10-02  484  	int err;
324bda9e6c5add Alexei Starovoitov 2017-10-02  485  
7dd68b3279f179 Andrey Ignatov     2019-12-18  486  	if (((flags & BPF_F_ALLOW_OVERRIDE) && (flags & BPF_F_ALLOW_MULTI)) ||
7dd68b3279f179 Andrey Ignatov     2019-12-18  487  	    ((flags & BPF_F_REPLACE) && !(flags & BPF_F_ALLOW_MULTI)))
324bda9e6c5add Alexei Starovoitov 2017-10-02  488  		/* invalid combination */
324bda9e6c5add Alexei Starovoitov 2017-10-02  489  		return -EINVAL;
af6eea57437a83 Andrii Nakryiko    2020-03-29  490  	if (link && (prog || replace_prog))
af6eea57437a83 Andrii Nakryiko    2020-03-29  491  		/* only either link or prog/replace_prog can be specified */
af6eea57437a83 Andrii Nakryiko    2020-03-29  492  		return -EINVAL;
af6eea57437a83 Andrii Nakryiko    2020-03-29  493  	if (!!replace_prog != !!(flags & BPF_F_REPLACE))
af6eea57437a83 Andrii Nakryiko    2020-03-29  494  		/* replace_prog implies BPF_F_REPLACE, and vice versa */
af6eea57437a83 Andrii Nakryiko    2020-03-29  495  		return -EINVAL;
324bda9e6c5add Alexei Starovoitov 2017-10-02  496  
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  497  	if (type == BPF_LSM_CGROUP) {
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  498  		struct bpf_prog *p = prog ? : link->link.prog;
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  499  
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  500  		if (replace_prog) {
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  501  			/* Reusing shim from the original program.
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  502  			 */
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  503  			atype = replace_prog->aux->cgroup_atype;
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  504  		} else {
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  505  			err = bpf_check_attach_target(NULL, p, NULL,
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  506  						      p->aux->attach_btf_id,
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  507  						      &tgt_info);
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  508  			if (err)
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  509  				return -EINVAL;
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  510  
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  511  			atype = bpf_lsm_attach_type_get(p->aux->attach_btf_id);
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  512  			if (atype < 0)
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  513  				return atype;
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  514  		}
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  515  
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  516  		p->aux->cgroup_atype = atype;
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  517  	} else {
6fc88c354f3af8 Dave Marchevsky    2021-08-19  518  		atype = to_cgroup_bpf_attach_type(type);
6fc88c354f3af8 Dave Marchevsky    2021-08-19  519  		if (atype < 0)
6fc88c354f3af8 Dave Marchevsky    2021-08-19  520  			return -EINVAL;
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  521  	}
6fc88c354f3af8 Dave Marchevsky    2021-08-19  522  
6fc88c354f3af8 Dave Marchevsky    2021-08-19  523  	progs = &cgrp->bpf.progs[atype];
6fc88c354f3af8 Dave Marchevsky    2021-08-19  524  
6fc88c354f3af8 Dave Marchevsky    2021-08-19  525  	if (!hierarchy_allows_attach(cgrp, atype))
7f677633379b4a Alexei Starovoitov 2017-02-10  526  		return -EPERM;
7f677633379b4a Alexei Starovoitov 2017-02-10  527  
6fc88c354f3af8 Dave Marchevsky    2021-08-19  528  	if (!list_empty(progs) && cgrp->bpf.flags[atype] != saved_flags)
324bda9e6c5add Alexei Starovoitov 2017-10-02  529  		/* Disallow attaching non-overridable on top
324bda9e6c5add Alexei Starovoitov 2017-10-02  530  		 * of existing overridable in this cgroup.
324bda9e6c5add Alexei Starovoitov 2017-10-02  531  		 * Disallow attaching multi-prog if overridable or none
7f677633379b4a Alexei Starovoitov 2017-02-10  532  		 */
7f677633379b4a Alexei Starovoitov 2017-02-10  533  		return -EPERM;
7f677633379b4a Alexei Starovoitov 2017-02-10  534  
324bda9e6c5add Alexei Starovoitov 2017-10-02  535  	if (prog_list_length(progs) >= BPF_CGROUP_MAX_PROGS)
324bda9e6c5add Alexei Starovoitov 2017-10-02  536  		return -E2BIG;
324bda9e6c5add Alexei Starovoitov 2017-10-02  537  
af6eea57437a83 Andrii Nakryiko    2020-03-29  538  	pl = find_attach_entry(progs, prog, link, replace_prog,
af6eea57437a83 Andrii Nakryiko    2020-03-29  539  			       flags & BPF_F_ALLOW_MULTI);
af6eea57437a83 Andrii Nakryiko    2020-03-29  540  	if (IS_ERR(pl))
af6eea57437a83 Andrii Nakryiko    2020-03-29  541  		return PTR_ERR(pl);
324bda9e6c5add Alexei Starovoitov 2017-10-02  542  
7d9c3427894fe7 YiFei Zhu          2020-07-23  543  	if (bpf_cgroup_storages_alloc(storage, new_storage, type,
7d9c3427894fe7 YiFei Zhu          2020-07-23  544  				      prog ? : link->link.prog, cgrp))
324bda9e6c5add Alexei Starovoitov 2017-10-02  545  		return -ENOMEM;
d7bf2c10af0531 Roman Gushchin     2018-08-02  546  
af6eea57437a83 Andrii Nakryiko    2020-03-29  547  	if (pl) {
1020c1f24a946e Andrey Ignatov     2019-12-18  548  		old_prog = pl->prog;
324bda9e6c5add Alexei Starovoitov 2017-10-02  549  	} else {
324bda9e6c5add Alexei Starovoitov 2017-10-02  550  		pl = kmalloc(sizeof(*pl), GFP_KERNEL);
d7bf2c10af0531 Roman Gushchin     2018-08-02  551  		if (!pl) {
7d9c3427894fe7 YiFei Zhu          2020-07-23  552  			bpf_cgroup_storages_free(new_storage);
324bda9e6c5add Alexei Starovoitov 2017-10-02  553  			return -ENOMEM;
d7bf2c10af0531 Roman Gushchin     2018-08-02  554  		}
324bda9e6c5add Alexei Starovoitov 2017-10-02  555  		list_add_tail(&pl->node, progs);
324bda9e6c5add Alexei Starovoitov 2017-10-02  556  	}
1020c1f24a946e Andrey Ignatov     2019-12-18  557  
324bda9e6c5add Alexei Starovoitov 2017-10-02  558  	pl->prog = prog;
af6eea57437a83 Andrii Nakryiko    2020-03-29  559  	pl->link = link;
00c4eddf7ee5cb Andrii Nakryiko    2020-03-24  560  	bpf_cgroup_storages_assign(pl->storage, storage);
6fc88c354f3af8 Dave Marchevsky    2021-08-19  561  	cgrp->bpf.flags[atype] = saved_flags;
324bda9e6c5add Alexei Starovoitov 2017-10-02  562  
6fc88c354f3af8 Dave Marchevsky    2021-08-19  563  	err = update_effective_progs(cgrp, atype);
324bda9e6c5add Alexei Starovoitov 2017-10-02  564  	if (err)
324bda9e6c5add Alexei Starovoitov 2017-10-02  565  		goto cleanup;
324bda9e6c5add Alexei Starovoitov 2017-10-02  566  
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  567  	bpf_cgroup_storages_link(new_storage, cgrp, type);
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  568  
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  569  	if (type == BPF_LSM_CGROUP && !old_prog) {
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  570  		struct bpf_prog *p = prog ? : link->link.prog;
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  571  		int err;

This "err" shadows an earlier declaration

3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  572  
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  573  		err = bpf_trampoline_link_cgroup_shim(p, &tgt_info);
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  574  		if (err)
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07 @575  			goto cleanup_trampoline;

and leads to a missing error code bug.
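
A minimal fix (just a sketch; the respin may restructure this block
anyway) would be to drop the inner declaration and reuse the
function-scope err so the error actually reaches the cleanup path:

	if (type == BPF_LSM_CGROUP && !old_prog) {
		struct bpf_prog *p = prog ? : link->link.prog;

		/* no local 'int err' here, reuse the outer one */
		err = bpf_trampoline_link_cgroup_shim(p, &tgt_info);
		if (err)
			goto cleanup_trampoline;
	}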

3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  576  	}
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  577  
af6eea57437a83 Andrii Nakryiko    2020-03-29  578  	if (old_prog)
324bda9e6c5add Alexei Starovoitov 2017-10-02  579  		bpf_prog_put(old_prog);
af6eea57437a83 Andrii Nakryiko    2020-03-29  580  	else
6fc88c354f3af8 Dave Marchevsky    2021-08-19  581  		static_branch_inc(&cgroup_bpf_enabled_key[atype]);
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  582  
324bda9e6c5add Alexei Starovoitov 2017-10-02  583  	return 0;
324bda9e6c5add Alexei Starovoitov 2017-10-02  584  
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  585  cleanup_trampoline:
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  586  	bpf_cgroup_storages_unlink(new_storage);
3c3f15b5422ca6 Stanislav Fomichev 2022-04-07  587  
324bda9e6c5add Alexei Starovoitov 2017-10-02  588  cleanup:
af6eea57437a83 Andrii Nakryiko    2020-03-29  589  	if (old_prog) {
324bda9e6c5add Alexei Starovoitov 2017-10-02  590  		pl->prog = old_prog;
af6eea57437a83 Andrii Nakryiko    2020-03-29  591  		pl->link = NULL;
8bad74f9840f87 Roman Gushchin     2018-09-28  592  	}
7d9c3427894fe7 YiFei Zhu          2020-07-23  593  	bpf_cgroup_storages_free(new_storage);
af6eea57437a83 Andrii Nakryiko    2020-03-29  594  	if (!old_prog) {
324bda9e6c5add Alexei Starovoitov 2017-10-02  595  		list_del(&pl->node);
324bda9e6c5add Alexei Starovoitov 2017-10-02  596  		kfree(pl);
324bda9e6c5add Alexei Starovoitov 2017-10-02  597  	}
324bda9e6c5add Alexei Starovoitov 2017-10-02  598  	return err;
324bda9e6c5add Alexei Starovoitov 2017-10-02  599  }

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program
  2022-04-09 17:04     ` Jakub Sitnicki
@ 2022-04-11 18:44       ` Stanislav Fomichev
  2022-04-15 17:39         ` Jakub Sitnicki
  2022-04-12  1:19       ` Martin KaFai Lau
  1 sibling, 1 reply; 33+ messages in thread
From: Stanislav Fomichev @ 2022-04-11 18:44 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: Martin KaFai Lau, netdev, bpf, ast, daniel, andrii

On Sat, Apr 9, 2022 at 11:10 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Fri, Apr 08, 2022 at 03:56 PM -07, Martin KaFai Lau wrote:
> > On Thu, Apr 07, 2022 at 03:31:08PM -0700, Stanislav Fomichev wrote:
> >> Previous patch adds 1:1 mapping between all 211 LSM hooks
> >> and bpf_cgroup program array. Instead of reserving a slot per
> >> possible hook, reserve 10 slots per cgroup for lsm programs.
> >> Those slots are dynamically allocated on demand and reclaimed.
> >> This still adds some bloat to the cgroup and brings us back to
> >> roughly pre-cgroup_bpf_attach_type times.
> >>
> >> It should be possible to eventually extend this idea to all hooks if
> >> the memory consumption is unacceptable and shrink overall effective
> >> programs array.
> >>
> >> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> >> ---
> >>  include/linux/bpf-cgroup-defs.h |  4 +-
> >>  include/linux/bpf_lsm.h         |  6 ---
> >>  kernel/bpf/bpf_lsm.c            |  9 ++--
> >>  kernel/bpf/cgroup.c             | 96 ++++++++++++++++++++++++++++-----
> >>  4 files changed, 90 insertions(+), 25 deletions(-)
> >>
> >> diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> >> index 6c661b4df9fa..d42516e86b3a 100644
> >> --- a/include/linux/bpf-cgroup-defs.h
> >> +++ b/include/linux/bpf-cgroup-defs.h
> >> @@ -10,7 +10,9 @@
> >>
> >>  struct bpf_prog_array;
> >>
> >> -#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
> >> +/* Maximum number of concurrently attachable per-cgroup LSM hooks.
> >> + */
> >> +#define CGROUP_LSM_NUM 10
> > hmm...only 10 different lsm hooks (or 10 different attach_btf_ids) can
> > have BPF_LSM_CGROUP programs attached.  This feels quite limited but having
> > a static 211 (and potentially growing in the future) is not good either.
> > I currently do not have a better idea also. :/
> >
> > Have you thought about other dynamic schemes or they would be too slow ?
>
> As long as we're talking ideas - how about a 2-level lookup?
>
> L1: 0..255 -> { 0..31, -1 }, where -1 is an inactive cgroup_bpf_attach_type
> L2: 0..31 -> struct bpf_prog_array * for cgroup->bpf.effective[],
>              struct hlist_head [^1]  for cgroup->bpf.progs[],
>              u32                     for cgroup->bpf.flags[],
>
> This way we could have 32 distinct _active_ attachment types for each
> cgroup instance, to be shared among regular cgroup attach types and BPF
> LSM attach types.
>
> It is 9 extra slots in comparison to today, so if anyone has cgroups
> that make use of all available attach types at the same time, we don't
> break their setup.
>
> The L1 lookup table would still have a few slots left for new cgroup [^2] or LSM
> hooks:
>
>   256 - 23 (cgroup attach types) - 211 (LSM hooks) = 22
>
> Memory bloat:
>
>  +256 B - L1 lookup table
>  + 72 B - extra effective[] slots
>  + 72 B - extra progs[] slots
>  + 36 B - extra flags[] slots
>  -184 B - savings from switching to hlist_head
>  ------
>  +252 B per cgroup instance
>
> Total cgroup_bpf{} size change - 720 B -> 968 B.
>
> WDYT?

Sounds workable, thanks! Let me try and see how it goes. I guess we
don't even have to increase the size of the effective array with this
mode; having 23 unique slots per cgroup seems like a good start? So
the cgroup_bpf{} growth would be +256B for L1 (technically, we only need 5
bits per entry, so it can shrink to 160B) and -185B for hlist_head.

> [^1] It looks like we can easily switch from cgroup->bpf.progs[] from
>      list_head to hlist_head and save some bytes!
>
>      We only access the list tail in __cgroup_bpf_attach(). We can
>      either iterate over the list and eat the cost there or push the new
>      prog onto the front.
>
>      I think we treat cgroup->bpf.progs[] everywhere like an unordered
>      set. Except for __cgroup_bpf_query, where the user might notice the
>      order change in the BPF_PROG_QUERY dump.


[...]

> [^2] Unrelated, but we would like to propose a
>      CGROUP_INET[46]_POST_CONNECT hook in the near future to make it
>      easier to bind UDP sockets to 4-tuple without creating conflicts:
>
>      https://github.com/cloudflare/cloudflare-blog/tree/master/2022-02-connectx/ebpf_connect4

Do you think those new lsm hooks can be used instead? If not, what's missing?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program
  2022-04-08 22:56   ` Martin KaFai Lau
  2022-04-09 17:04     ` Jakub Sitnicki
@ 2022-04-11 18:46     ` Stanislav Fomichev
  2022-04-12  1:36       ` Martin KaFai Lau
  1 sibling, 1 reply; 33+ messages in thread
From: Stanislav Fomichev @ 2022-04-11 18:46 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, bpf, ast, daniel, andrii

On Fri, Apr 8, 2022 at 3:57 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Thu, Apr 07, 2022 at 03:31:08PM -0700, Stanislav Fomichev wrote:
> > Previous patch adds 1:1 mapping between all 211 LSM hooks
> > and bpf_cgroup program array. Instead of reserving a slot per
> > possible hook, reserve 10 slots per cgroup for lsm programs.
> > Those slots are dynamically allocated on demand and reclaimed.
> > This still adds some bloat to the cgroup and brings us back to
> > roughly pre-cgroup_bpf_attach_type times.
> >
> > It should be possible to eventually extend this idea to all hooks if
> > the memory consumption is unacceptable and shrink overall effective
> > programs array.
> >
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  include/linux/bpf-cgroup-defs.h |  4 +-
> >  include/linux/bpf_lsm.h         |  6 ---
> >  kernel/bpf/bpf_lsm.c            |  9 ++--
> >  kernel/bpf/cgroup.c             | 96 ++++++++++++++++++++++++++++-----
> >  4 files changed, 90 insertions(+), 25 deletions(-)
> >
> > diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> > index 6c661b4df9fa..d42516e86b3a 100644
> > --- a/include/linux/bpf-cgroup-defs.h
> > +++ b/include/linux/bpf-cgroup-defs.h
> > @@ -10,7 +10,9 @@
> >
> >  struct bpf_prog_array;
> >
> > -#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
> > +/* Maximum number of concurrently attachable per-cgroup LSM hooks.
> > + */
> > +#define CGROUP_LSM_NUM 10
> hmm...only 10 different lsm hooks (or 10 different attach_btf_ids) can
> have BPF_LSM_CGROUP programs attached.  This feels quite limited but having
> a static 211 (and potentially growing in the future) is not good either.
> I currently do not have a better idea also. :/
>
> Have you thought about other dynamic schemes or they would be too slow ?
>
> >  enum cgroup_bpf_attach_type {
> >       CGROUP_BPF_ATTACH_TYPE_INVALID = -1,
> > diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
> > index 7f0e59f5f9be..613de44aa429 100644
> > --- a/include/linux/bpf_lsm.h
> > +++ b/include/linux/bpf_lsm.h
> > @@ -43,7 +43,6 @@ extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
> >  void bpf_inode_storage_free(struct inode *inode);
> >
> >  int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func);
> > -int bpf_lsm_hook_idx(u32 btf_id);
> >
> >  #else /* !CONFIG_BPF_LSM */
> >
> > @@ -74,11 +73,6 @@ static inline int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
> >       return -ENOENT;
> >  }
> >
> > -static inline int bpf_lsm_hook_idx(u32 btf_id)
> > -{
> > -     return -EINVAL;
> > -}
> > -
> >  #endif /* CONFIG_BPF_LSM */
> >
> >  #endif /* _LINUX_BPF_LSM_H */
> > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > index eca258ba71d8..8b948ec9ab73 100644
> > --- a/kernel/bpf/bpf_lsm.c
> > +++ b/kernel/bpf/bpf_lsm.c
> > @@ -57,10 +57,12 @@ static unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
> >       if (unlikely(!sk))
> >               return 0;
> >
> > +     rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
> >       cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> >       if (likely(cgrp))
> >               ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> >                                           ctx, bpf_prog_run, 0);
> > +     rcu_read_unlock();
> >       return ret;
> >  }
> >
> > @@ -77,7 +79,7 @@ static unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
> >       /*prog = container_of(insn, struct bpf_prog, insnsi);*/
> >       prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> >
> > -     rcu_read_lock();
> > +     rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
> I think this is also needed for task_dfl_cgroup().  If yes,
> will be a good idea to adjust the comment if it ends up
> using the 'CGROUP_LSM_NUM 10' scheme.
>
> While at rcu_read_lock(), have you thought about what major things are
> needed to make BPF_LSM_CGROUP sleepable ?
>
> The cgroup local storage could be one that requires changes, but it seems
> the cgroup local storage is not available to BPF_LSM_CGROUP in this change set.
> The current use case doesn't need it?

No, I haven't thought about sleepable at all yet :-( But seems like
having that rcu lock here might be problematic if we want to sleep? In
this case, Jakub's suggestion seems better.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 2/7] bpf: per-cgroup lsm flavor
  2022-04-08 22:12   ` Martin KaFai Lau
@ 2022-04-11 19:07     ` Stanislav Fomichev
  2022-04-12  1:04       ` Martin KaFai Lau
  0 siblings, 1 reply; 33+ messages in thread
From: Stanislav Fomichev @ 2022-04-11 19:07 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, bpf, ast, daniel, andrii

On Fri, Apr 8, 2022 at 3:13 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Thu, Apr 07, 2022 at 03:31:07PM -0700, Stanislav Fomichev wrote:
> > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > index 064eccba641d..eca258ba71d8 100644
> > --- a/kernel/bpf/bpf_lsm.c
> > +++ b/kernel/bpf/bpf_lsm.c
> > @@ -35,6 +35,98 @@ BTF_SET_START(bpf_lsm_hooks)
> >  #undef LSM_HOOK
> >  BTF_SET_END(bpf_lsm_hooks)
> >
> > +static unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
> > +                                             const struct bpf_insn *insn)
> > +{
> > +     const struct bpf_prog *prog;
> > +     struct socket *sock;
> > +     struct cgroup *cgrp;
> > +     struct sock *sk;
> > +     int ret = 0;
> > +     u64 *regs;
> > +
> > +     regs = (u64 *)ctx;
> > +     sock = (void *)(unsigned long)regs[BPF_REG_0];
> > +     /*prog = container_of(insn, struct bpf_prog, insnsi);*/
> > +     prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> nit. Rename prog to shim_prog.
>
> > +
> > +     if (unlikely(!sock))
> Is it possible in the lsm hooks?  Can these hooks
> be rejected at the load time instead?

Doesn't seem like it can be null, at least from the quick review that
I had; I'll take a deeper look.
I guess in general I wanted to be more defensive here because there
are 200+ hooks, the new ones might arrive, and it's better to have the
check?

> > +             return 0;
> > +
> > +     sk = sock->sk;
> > +     if (unlikely(!sk))
> Same here.
>
> > +             return 0;
> > +
> > +     cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> > +     if (likely(cgrp))
> > +             ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> > +                                         ctx, bpf_prog_run, 0);
> > +     return ret;
> > +}
> > +
> > +static unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
> > +                                              const struct bpf_insn *insn)
> > +{
> > +     const struct bpf_prog *prog;
> > +     struct cgroup *cgrp;
> > +     int ret = 0;
> > +
> > +     if (unlikely(!current))
> > +             return 0;
> > +
> > +     /*prog = container_of(insn, struct bpf_prog, insnsi);*/
> > +     prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> nit. shim_prog here also.
>
> > +
> > +     rcu_read_lock();
> > +     cgrp = task_dfl_cgroup(current);
> > +     if (likely(cgrp))
> > +             ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> > +                                         ctx, bpf_prog_run, 0);
> > +     rcu_read_unlock();
> > +     return ret;
> > +}
> > +
> > +int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
> > +                          bpf_func_t *bpf_func)
> > +{
> > +     const struct btf_type *first_arg_type;
> > +     const struct btf_type *sock_type;
> > +     const struct btf *btf_vmlinux;
> > +     const struct btf_param *args;
> > +     s32 type_id;
> > +
> > +     if (!prog->aux->attach_func_proto ||
> > +         !btf_type_is_func_proto(prog->aux->attach_func_proto))
> Are these cases possible at the attaching time or they have already been
> rejected at the load time?  If it is the latter, these tests can be
> removed.

I think you're right, should be rejected at loading time, I'll check.

> > +             return -EINVAL;
> > +
> > +     if (btf_type_vlen(prog->aux->attach_func_proto) < 1)
> Is it consistent with the existing BPF_LSM_MAC?
> or is there something special about BPF_LSM_CGROUP that
> it cannot support this func ?

Looks like there is an LSM hook that doesn't take any arguments, so
yeah, it's inconsistent; I'll have to fix that, thanks!
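
One way I could handle it, just as a sketch (assuming hooks without any
arguments should simply fall back to the 'current' based runner; the final
check may look different):

	/* in bpf_lsm_find_cgroup_shim(), before looking at args[0] */
	if (!btf_type_vlen(prog->aux->attach_func_proto)) {
		*bpf_func = __cgroup_bpf_run_lsm_current;
		return 0;
	}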

> > +             return -EINVAL;
> > +
> > +     args = (const struct btf_param *)(prog->aux->attach_func_proto + 1);
> nit.
>         args = btf_params(prog->aux->attach_func_proto);
>
> > +
> > +     btf_vmlinux = bpf_get_btf_vmlinux();
> > +     if (!btf_vmlinux)
> > +             return -EINVAL;
> > +
> > +     type_id = btf_find_by_name_kind(btf_vmlinux, "socket", BTF_KIND_STRUCT);
> > +     if (type_id < 0)
> > +             return -EINVAL;
> > +     sock_type = btf_type_by_id(btf_vmlinux, type_id);
> > +
> > +     first_arg_type = btf_type_resolve_ptr(btf_vmlinux, args[0].type, NULL);
> > +     if (first_arg_type == sock_type)
> > +             *bpf_func = __cgroup_bpf_run_lsm_socket;
> > +     else
> > +             *bpf_func = __cgroup_bpf_run_lsm_current;
> > +
> > +     return 0;
> > +}
> > +
> > +int bpf_lsm_hook_idx(u32 btf_id)
> > +{
> > +     return btf_id_set_index(&bpf_lsm_hooks, btf_id);
> > +}
> > +
> >  int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
> >                       const struct bpf_prog *prog)
> >  {
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 0918a39279f6..4199de31f49c 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -4971,6 +4971,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> >
> >       if (arg == nr_args) {
> >               switch (prog->expected_attach_type) {
> > +             case BPF_LSM_CGROUP:
> >               case BPF_LSM_MAC:
> >               case BPF_TRACE_FEXIT:
> >                       /* When LSM programs are attached to void LSM hooks
> > @@ -6396,6 +6397,16 @@ static int btf_id_cmp_func(const void *a, const void *b)
> >       return *pa - *pb;
> >  }
> >
> > +int btf_id_set_index(const struct btf_id_set *set, u32 id)
> > +{
> > +     const u32 *p;
> > +
> > +     p = bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func);
> > +     if (!p)
> > +             return -1;
> > +     return p - set->ids;
> > +}
> > +
> >  bool btf_id_set_contains(const struct btf_id_set *set, u32 id)
> >  {
> >       return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL;
> > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > index 128028efda64..8c77703954f7 100644
> > --- a/kernel/bpf/cgroup.c
> > +++ b/kernel/bpf/cgroup.c
> > @@ -14,6 +14,9 @@
> >  #include <linux/string.h>
> >  #include <linux/bpf.h>
> >  #include <linux/bpf-cgroup.h>
> > +#include <linux/btf_ids.h>
> > +#include <linux/bpf_lsm.h>
> > +#include <linux/bpf_verifier.h>
> >  #include <net/sock.h>
> >  #include <net/bpf_sk_storage.h>
> >
> > @@ -22,6 +25,18 @@
> >  DEFINE_STATIC_KEY_ARRAY_FALSE(cgroup_bpf_enabled_key, MAX_CGROUP_BPF_ATTACH_TYPE);
> >  EXPORT_SYMBOL(cgroup_bpf_enabled_key);
> >
> > +#ifdef CONFIG_BPF_LSM
> > +static enum cgroup_bpf_attach_type bpf_lsm_attach_type_get(u32 attach_btf_id)
> > +{
> > +     return CGROUP_LSM_START + bpf_lsm_hook_idx(attach_btf_id);
> > +}
> > +#else
> > +static enum cgroup_bpf_attach_type bpf_lsm_attach_type_get(u32 attach_btf_id)
> > +{
> > +     return -EOPNOTSUPP;
> > +}
> > +#endif
> > +
> >  void cgroup_bpf_offline(struct cgroup *cgrp)
> >  {
> >       cgroup_get(cgrp);
> > @@ -89,6 +104,14 @@ static void bpf_cgroup_storages_link(struct bpf_cgroup_storage *storages[],
> >               bpf_cgroup_storage_link(storages[stype], cgrp, attach_type);
> >  }
> >
> > +static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[])
> > +{
> > +     enum bpf_cgroup_storage_type stype;
> > +
> > +     for_each_cgroup_storage_type(stype)
> > +             bpf_cgroup_storage_unlink(storages[stype]);
> > +}
> > +
> >  /* Called when bpf_cgroup_link is auto-detached from dying cgroup.
> >   * It drops cgroup and bpf_prog refcounts, and marks bpf_link as defunct. It
> >   * doesn't free link memory, which will eventually be done by bpf_link's
> > @@ -100,6 +123,15 @@ static void bpf_cgroup_link_auto_detach(struct bpf_cgroup_link *link)
> >       link->cgroup = NULL;
> >  }
> >
> > +static void bpf_cgroup_lsm_shim_release(struct bpf_prog *prog,
> > +                                     enum cgroup_bpf_attach_type atype)
> > +{
> > +     if (!prog || atype != prog->aux->cgroup_atype)
> prog cannot be NULL here, no?
>
> The 'atype != prog->aux->cgroup_atype' looks suspicious also considering
> prog->aux->cgroup_atype is only initialized (and meaningful) for BPF_LSM_CGROUP.
> I suspect incorrectly passing this test will crash in the below
> bpf_trampoline_unlink_cgroup_shim(). More on this later.
>
> > +             return;
> > +
> > +     bpf_trampoline_unlink_cgroup_shim(prog);
> > +}
> > +
> >  /**
> >   * cgroup_bpf_release() - put references of all bpf programs and
> >   *                        release all cgroup bpf data
> > @@ -123,10 +155,16 @@ static void cgroup_bpf_release(struct work_struct *work)
> Copying some missing loop context here:
>
>         for (atype = 0; atype < ARRAY_SIZE(cgrp->bpf.progs); atype++) {
>                 struct list_head *progs = &cgrp->bpf.progs[atype];
>                 struct bpf_prog_list *pl, *pltmp;
>
> >
> >               list_for_each_entry_safe(pl, pltmp, progs, node) {
> >                       list_del(&pl->node);
> > -                     if (pl->prog)
> > +                     if (pl->prog) {
> > +                             bpf_cgroup_lsm_shim_release(pl->prog,
> > +                                                         atype);
> atype could be 0 (CGROUP_INET_INGRESS) here.  bpf_cgroup_lsm_shim_release()
> above will go ahead with bpf_trampoline_unlink_cgroup_shim().
> It will break some of the assumptions.  e.g. prog->aux->attach_btf is NULL
> for CGROUP_INET_INGRESS.
>
> Instead, only call bpf_cgroup_lsm_shim_release() for BPF_LSM_CGROUP ?
>
> If the above observation is sane, I wonder if the existing test_progs
> have uncovered it or may be the existing tests just always detach
> cleanly itself before cleaning the cgroup which then avoided this case.

Might be what's happening here:

https://github.com/kernel-patches/bpf/runs/5876983908?check_suite_focus=true

Although, I'm not sure why it's z15 only. Good point on filtering by
BPF_LSM_CGROUP, will do.
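
Roughly what I have in mind for the filtering, as a sketch of the
cgroup_bpf_release() loop (keying off expected_attach_type is an
assumption; the exact condition may end up looking different):

		list_for_each_entry_safe(pl, pltmp, progs, node) {
			list_del(&pl->node);
			if (pl->prog) {
				/* only BPF_LSM_CGROUP progs have a shim to release */
				if (pl->prog->expected_attach_type == BPF_LSM_CGROUP)
					bpf_cgroup_lsm_shim_release(pl->prog, atype);
				bpf_prog_put(pl->prog);
			}
			if (pl->link) {
				if (pl->link->link.prog->expected_attach_type == BPF_LSM_CGROUP)
					bpf_cgroup_lsm_shim_release(pl->link->link.prog, atype);
				bpf_cgroup_link_auto_detach(pl->link);
			}
			kfree(pl);
			static_branch_dec(&cgroup_bpf_enabled_key[atype]);
		}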

> >                               bpf_prog_put(pl->prog);
> > -                     if (pl->link)
> > +                     }
> > +                     if (pl->link) {
> > +                             bpf_cgroup_lsm_shim_release(pl->link->link.prog,
> > +                                                         atype);
> >                               bpf_cgroup_link_auto_detach(pl->link);
> > +                     }
> >                       kfree(pl);
> >                       static_branch_dec(&cgroup_bpf_enabled_key[atype]);
> >               }
> > @@ -439,6 +477,7 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
> >       struct bpf_prog *old_prog = NULL;
> >       struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
> >       struct bpf_cgroup_storage *new_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
> > +     struct bpf_attach_target_info tgt_info = {};
> >       enum cgroup_bpf_attach_type atype;
> >       struct bpf_prog_list *pl;
> >       struct list_head *progs;
> > @@ -455,9 +494,31 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
> >               /* replace_prog implies BPF_F_REPLACE, and vice versa */
> >               return -EINVAL;
> >
> > -     atype = to_cgroup_bpf_attach_type(type);
> > -     if (atype < 0)
> > -             return -EINVAL;
> > +     if (type == BPF_LSM_CGROUP) {
> > +             struct bpf_prog *p = prog ? : link->link.prog;
> > +
> > +             if (replace_prog) {
> > +                     /* Reusing shim from the original program.
> > +                      */
> > +                     atype = replace_prog->aux->cgroup_atype;
> > +             } else {
> > +                     err = bpf_check_attach_target(NULL, p, NULL,
> > +                                                   p->aux->attach_btf_id,
> > +                                                   &tgt_info);
> > +                     if (err)
> > +                             return -EINVAL;
> > +
> > +                     atype = bpf_lsm_attach_type_get(p->aux->attach_btf_id);
> > +                     if (atype < 0)
> > +                             return atype;
> > +             }
> > +
> > +             p->aux->cgroup_atype = atype;
> hmm.... not sure about this assignment for the replace_prog case.
> In particular, the attaching prog's cgroup_atype can be decided
> by the replace_prog's cgroup_atype?  Was there some checks
> before to ensure the replace_prog and the attaching prog have
> the same attach_btf_id?

I was assuming that yes, there should be some checks to confirm we are
replacing the prog with the same type. Will verify.

> > +     } else {
> > +             atype = to_cgroup_bpf_attach_type(type);
> > +             if (atype < 0)
> > +                     return -EINVAL;
> > +     }
> >
> >       progs = &cgrp->bpf.progs[atype];
> >
> > @@ -503,13 +564,27 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
> >       if (err)
> >               goto cleanup;
> >
> > +     bpf_cgroup_storages_link(new_storage, cgrp, type);
> > +
> > +     if (type == BPF_LSM_CGROUP && !old_prog) {
> > +             struct bpf_prog *p = prog ? : link->link.prog;
> > +             int err;
> > +
> > +             err = bpf_trampoline_link_cgroup_shim(p, &tgt_info);
> > +             if (err)
> > +                     goto cleanup_trampoline;
> > +     }
> > +
> >       if (old_prog)
> >               bpf_prog_put(old_prog);
> >       else
> >               static_branch_inc(&cgroup_bpf_enabled_key[atype]);
> > -     bpf_cgroup_storages_link(new_storage, cgrp, type);
> > +
> >       return 0;
> >
> > +cleanup_trampoline:
> > +     bpf_cgroup_storages_unlink(new_storage);
> > +
> >  cleanup:
> >       if (old_prog) {
> >               pl->prog = old_prog;
> > @@ -601,9 +676,13 @@ static int __cgroup_bpf_replace(struct cgroup *cgrp,
> >       struct list_head *progs;
> >       bool found = false;
> >
> > -     atype = to_cgroup_bpf_attach_type(link->type);
> > -     if (atype < 0)
> > -             return -EINVAL;
> > +     if (link->type == BPF_LSM_CGROUP) {
> > +             atype = link->link.prog->aux->cgroup_atype;
> > +     } else {
> > +             atype = to_cgroup_bpf_attach_type(link->type);
> > +             if (atype < 0)
> > +                     return -EINVAL;
> > +     }
> >
> >       progs = &cgrp->bpf.progs[atype];
> >
> > @@ -619,6 +698,9 @@ static int __cgroup_bpf_replace(struct cgroup *cgrp,
> >       if (!found)
> >               return -ENOENT;
> >
> > +     if (link->type == BPF_LSM_CGROUP)
> > +             new_prog->aux->cgroup_atype = atype;
> > +
> >       old_prog = xchg(&link->link.prog, new_prog);
> >       replace_effective_prog(cgrp, atype, link);
> >       bpf_prog_put(old_prog);
> > @@ -702,9 +784,15 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
> >       u32 flags;
> >       int err;
> >
> > -     atype = to_cgroup_bpf_attach_type(type);
> > -     if (atype < 0)
> > -             return -EINVAL;
> > +     if (type == BPF_LSM_CGROUP) {
> > +             struct bpf_prog *p = prog ? : link->link.prog;
> > +
> > +             atype = p->aux->cgroup_atype;
> > +     } else {
> > +             atype = to_cgroup_bpf_attach_type(type);
> > +             if (atype < 0)
> > +                     return -EINVAL;
> > +     }
> >
> >       progs = &cgrp->bpf.progs[atype];
> >       flags = cgrp->bpf.flags[atype];
> > @@ -726,6 +814,10 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
> >       if (err)
> >               goto cleanup;
> >
> > +     if (type == BPF_LSM_CGROUP)
> > +             bpf_cgroup_lsm_shim_release(prog ? : link->link.prog,
> > +                                         atype);
> > +
> >       /* now can actually delete it from this cgroup list */
> >       list_del(&pl->node);
> >       kfree(pl);
>
> [ ... ]
>
> > diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> > index 0c4fd194e801..fca1dea786c7 100644
> > --- a/kernel/bpf/trampoline.c
> > +++ b/kernel/bpf/trampoline.c
> > @@ -11,6 +11,8 @@
> >  #include <linux/rcupdate_wait.h>
> >  #include <linux/module.h>
> >  #include <linux/static_call.h>
> > +#include <linux/bpf_verifier.h>
> > +#include <linux/bpf_lsm.h>
> >
> >  /* dummy _ops. The verifier will operate on target program's ops. */
> >  const struct bpf_verifier_ops bpf_extension_verifier_ops = {
> > @@ -394,6 +396,7 @@ static enum bpf_tramp_prog_type bpf_attach_type_to_tramp(struct bpf_prog *prog)
> >               return BPF_TRAMP_MODIFY_RETURN;
> >       case BPF_TRACE_FEXIT:
> >               return BPF_TRAMP_FEXIT;
> > +     case BPF_LSM_CGROUP:
> Considering BPF_LSM_CGROUP is added here and the 'prog' for the
> case concerning here is the shim_prog ... (more below)
>
> >       case BPF_LSM_MAC:
> >               if (!prog->aux->attach_func_proto->type)
> >                       /* The function returns void, we cannot modify its
> > @@ -485,6 +488,147 @@ int bpf_trampoline_unlink_prog(struct bpf_prog *prog, struct bpf_trampoline *tr)
> >       return err;
> >  }
> >
> > +static struct bpf_prog *cgroup_shim_alloc(const struct bpf_prog *prog,
> > +                                       bpf_func_t bpf_func)
> > +{
> > +     struct bpf_prog *p;
> > +
> > +     p = bpf_prog_alloc(1, 0);
> > +     if (!p)
> > +             return NULL;
> > +
> > +     p->jited = false;
> > +     p->bpf_func = bpf_func;
> > +
> > +     p->aux->cgroup_atype = prog->aux->cgroup_atype;
> > +     p->aux->attach_func_proto = prog->aux->attach_func_proto;
> > +     p->aux->attach_btf_id = prog->aux->attach_btf_id;
> > +     p->aux->attach_btf = prog->aux->attach_btf;
> > +     btf_get(p->aux->attach_btf);
> > +     p->type = BPF_PROG_TYPE_LSM;
> > +     p->expected_attach_type = BPF_LSM_MAC;
> ... should this be BPF_LSM_CGROUP instead ?
>
> or the above 'case BPF_LSM_CGROUP:' addition is not needed ?

Yeah, not needed, will remove.

> > +     bpf_prog_inc(p);
> > +
> > +     return p;
> > +}
> > +
> > +static struct bpf_prog *cgroup_shim_find(struct bpf_trampoline *tr,
> > +                                      bpf_func_t bpf_func)
> > +{
> > +     const struct bpf_prog_aux *aux;
> > +     int kind;
> > +
> > +     for (kind = 0; kind < BPF_TRAMP_MAX; kind++) {
> Can bpf_attach_type_to_tramp() be used here instead of
> looping all ?

Seems like it needs a bpf_prog as an argument, so it's easier to loop?

> > +             hlist_for_each_entry(aux, &tr->progs_hlist[kind], tramp_hlist) {
> > +                     struct bpf_prog *p = aux->prog;
> > +
> > +                     if (!p->jited && p->bpf_func == bpf_func)
> Is the "!p->jited" test needed ?

Not really, will drop.

> > +                             return p;
> > +             }
> > +     }
> > +
> > +     return NULL;
> > +}
> > +
> > +int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
> > +                                 struct bpf_attach_target_info *tgt_info)
> > +{
> > +     struct bpf_prog *shim_prog = NULL;
> > +     struct bpf_trampoline *tr;
> > +     bpf_func_t bpf_func;
> > +     u64 key;
> > +     int err;
> > +
> > +     key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf,
> > +                                      prog->aux->attach_btf_id);
> > +
> > +     err = bpf_lsm_find_cgroup_shim(prog, &bpf_func);
> > +     if (err)
> > +             return err;
> > +
> > +     tr = bpf_trampoline_get(key, tgt_info);
> > +     if (!tr)
> > +             return  -ENOMEM;
> > +
> > +     mutex_lock(&tr->mutex);
> > +
> > +     shim_prog = cgroup_shim_find(tr, bpf_func);
> > +     if (shim_prog) {
> > +             /* Reusing existing shim attached by the other program.
> > +              */
> The shim_prog is reused by >1 BPF_LSM_CGROUP progs and
> shim_prog is hidden from userspace also (no id), so it may be worth
> bringing this up:
>
> In __bpf_prog_enter(), other than the fact that some bpf stats of the
> shim_prog become useless (a very minor thing), it is also checking
> shim_prog->active and bumping the misses counter.  Now, the misses counter
> is no longer visible to users.  Since it is actually running the cgroup prog,
> maybe there is no need for the active check ?

Agree that the active counter will probably be taken care of when the
actual program runs; but I'm not sure it's worth the effort of trying to
remove it here?
Regarding "no longer visible to users": that's a good point. Should I
actually add those shim progs to the prog_idr? Or just hide it as
"internal implementation detail"?

Thank you for the review!

> > +             bpf_prog_inc(shim_prog);
> > +             mutex_unlock(&tr->mutex);
> > +             return 0;
> > +     }
> > +
> > +     /* Allocate and install new shim.
> > +      */
> > +
> > +     shim_prog = cgroup_shim_alloc(prog, bpf_func);
> > +     if (!shim_prog) {
> > +             err = -ENOMEM;
> > +             goto out;
> > +     }
> > +
> > +     err = __bpf_trampoline_link_prog(shim_prog, tr);
> > +     if (err)
> > +             goto out;
> > +
> > +     mutex_unlock(&tr->mutex);
> > +
> > +     return 0;
> > +out:
> > +     if (shim_prog)
> > +             bpf_prog_put(shim_prog);
> > +
> > +     mutex_unlock(&tr->mutex);
> > +     return err;
> > +}
> > +
> > +void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog)
> > +{
> > +     struct bpf_prog *shim_prog;
> > +     struct bpf_trampoline *tr;
> > +     bpf_func_t bpf_func;
> > +     u64 key;
> > +     int err;
> > +
> > +     key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf,
> > +                                      prog->aux->attach_btf_id);
> > +
> > +     err = bpf_lsm_find_cgroup_shim(prog, &bpf_func);
> > +     if (err)
> > +             return;
> > +
> > +     tr = bpf_trampoline_lookup(key);
> > +     if (!tr)
> > +             return;
> > +
> > +     mutex_lock(&tr->mutex);
> > +
> > +     shim_prog = cgroup_shim_find(tr, bpf_func);
> > +     if (shim_prog) {
> > +             /* We use shim_prog refcnt for tracking whether to
> > +              * remove the shim program from the trampoline.
> > +              * Trampoline's mutex is held while refcnt is
> > +              * added/subtracted so we don't need to care about
> > +              * potential races.
> > +              */
> > +
> > +             if (atomic64_read(&shim_prog->aux->refcnt) == 1)
> > +                     WARN_ON_ONCE(__bpf_trampoline_unlink_prog(shim_prog, tr));
> > +
> > +             bpf_prog_put(shim_prog);
> > +     }
> > +
> > +     mutex_unlock(&tr->mutex);
> > +
> > +     bpf_trampoline_put(tr); /* bpf_trampoline_lookup */
> > +
> > +     if (shim_prog)
> > +             bpf_trampoline_put(tr);
> > +}
> > +

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 2/7] bpf: per-cgroup lsm flavor
  2022-04-11 19:07     ` Stanislav Fomichev
@ 2022-04-12  1:04       ` Martin KaFai Lau
  2022-04-12 16:42         ` Stanislav Fomichev
  0 siblings, 1 reply; 33+ messages in thread
From: Martin KaFai Lau @ 2022-04-12  1:04 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii

On Mon, Apr 11, 2022 at 12:07:20PM -0700, Stanislav Fomichev wrote:
> On Fri, Apr 8, 2022 at 3:13 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Thu, Apr 07, 2022 at 03:31:07PM -0700, Stanislav Fomichev wrote:
> > > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > > index 064eccba641d..eca258ba71d8 100644
> > > --- a/kernel/bpf/bpf_lsm.c
> > > +++ b/kernel/bpf/bpf_lsm.c
> > > @@ -35,6 +35,98 @@ BTF_SET_START(bpf_lsm_hooks)
> > >  #undef LSM_HOOK
> > >  BTF_SET_END(bpf_lsm_hooks)
> > >
> > > +static unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
> > > +                                             const struct bpf_insn *insn)
> > > +{
> > > +     const struct bpf_prog *prog;
> > > +     struct socket *sock;
> > > +     struct cgroup *cgrp;
> > > +     struct sock *sk;
> > > +     int ret = 0;
> > > +     u64 *regs;
> > > +
> > > +     regs = (u64 *)ctx;
> > > +     sock = (void *)(unsigned long)regs[BPF_REG_0];
> > > +     /*prog = container_of(insn, struct bpf_prog, insnsi);*/
> > > +     prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> > nit. Rename prog to shim_prog.
> >
> > > +
> > > +     if (unlikely(!sock))
> > Is it possible in the lsm hooks?  Can these hooks
> > be rejected at the load time instead?
> 
> Doesn't seem like it can be null, at least from the quick review that
> I had; I'll take a deeper look.
> I guess in general I wanted to be more defensive here because there
> are 200+ hooks, the new ones might arrive, and it's better to have the
> check?
not too worried about an extra runtime check for now.
Instead, I have a concern that it will be a usage surprise when a successfully
attached bpf program is then always silently ignored.

Another question, for example, the inet_conn_request lsm_hook:
LSM_HOOK(int, 0, inet_conn_request, const struct sock *sk, struct sk_buff *skb,
         struct request_sock *req)

'struct sock *sk' is the first argument, so it will use current's cgroup.
inet_conn_request() likely runs in a softirq though, and then it will be
incorrect.  This runs-in-softirq case may not be limited to hooks that
take an sk/sock argument either; not sure.

> > > +             return 0;
> > > +
> > > +     sk = sock->sk;
> > > +     if (unlikely(!sk))
> > Same here.
> >
> > > +             return 0;
> > > +
> > > +     cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> > > +     if (likely(cgrp))
Unrelated, but while we are talking about extra checks:

I think the shim_prog already acts as a higher level (per attach_btf_id)
knob, but do you think it may still be worth doing a bpf_empty_prog_array
check here in case a cgroup may not have any prog to run ?

> > > +             ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> > > +                                         ctx, bpf_prog_run, 0);

[ ... ]

> > > @@ -100,6 +123,15 @@ static void bpf_cgroup_link_auto_detach(struct bpf_cgroup_link *link)
> > >       link->cgroup = NULL;
> > >  }
> > >
> > > +static void bpf_cgroup_lsm_shim_release(struct bpf_prog *prog,
> > > +                                     enum cgroup_bpf_attach_type atype)
> > > +{
> > > +     if (!prog || atype != prog->aux->cgroup_atype)
> > prog cannot be NULL here, no?
> >
> > The 'atype != prog->aux->cgroup_atype' looks suspicious also considering
> > prog->aux->cgroup_atype is only initialized (and meaningful) for BPF_LSM_CGROUP.
> > I suspect incorrectly passing this test will crash in the below
> > bpf_trampoline_unlink_cgroup_shim(). More on this later.
> >
> > > +             return;
> > > +
> > > +     bpf_trampoline_unlink_cgroup_shim(prog);
> > > +}
> > > +
> > >  /**
> > >   * cgroup_bpf_release() - put references of all bpf programs and
> > >   *                        release all cgroup bpf data
> > > @@ -123,10 +155,16 @@ static void cgroup_bpf_release(struct work_struct *work)
> > Copying some missing loop context here:
> >
> >         for (atype = 0; atype < ARRAY_SIZE(cgrp->bpf.progs); atype++) {
> >                 struct list_head *progs = &cgrp->bpf.progs[atype];
> >                 struct bpf_prog_list *pl, *pltmp;
> >
> > >
> > >               list_for_each_entry_safe(pl, pltmp, progs, node) {
> > >                       list_del(&pl->node);
> > > -                     if (pl->prog)
> > > +                     if (pl->prog) {
> > > +                             bpf_cgroup_lsm_shim_release(pl->prog,
> > > +                                                         atype);
> > atype could be 0 (CGROUP_INET_INGRESS) here.  bpf_cgroup_lsm_shim_release()
> > above will go ahead with bpf_trampoline_unlink_cgroup_shim().
> > It will break some of the assumptions.  e.g. prog->aux->attach_btf is NULL
> > for CGROUP_INET_INGRESS.
> >
> > Instead, only call bpf_cgroup_lsm_shim_release() for BPF_LSM_CGROUP ?
> >
> > If the above observation is sane, I wonder if the existing test_progs
> > have uncovered it or may be the existing tests just always detach
> > cleanly itself before cleaning the cgroup which then avoided this case.
> 
> Might be what's happening here:
> 
> https://github.com/kernel-patches/bpf/runs/5876983908?check_suite_focus=true
hmm.... this one looks different.  I am thinking the oops should happen
in bpf_obj_id(), which is not inlined.  It didn't ring any bells after a
quick look, so yeah, let's fix the known issue first.

> 
> Although, I'm not sure why it's z15 only. Good point on filtering by
> BPF_LSM_CGROUP, will do.
> 
> > >                               bpf_prog_put(pl->prog);
> > > -                     if (pl->link)
> > > +                     }
> > > +                     if (pl->link) {
> > > +                             bpf_cgroup_lsm_shim_release(pl->link->link.prog,
> > > +                                                         atype);
> > >                               bpf_cgroup_link_auto_detach(pl->link);
> > > +                     }
> > >                       kfree(pl);
> > >                       static_branch_dec(&cgroup_bpf_enabled_key[atype]);
> > >               }

[ ... ]

> > > +int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
> > > +                                 struct bpf_attach_target_info *tgt_info)
> > > +{
> > > +     struct bpf_prog *shim_prog = NULL;
> > > +     struct bpf_trampoline *tr;
> > > +     bpf_func_t bpf_func;
> > > +     u64 key;
> > > +     int err;
> > > +
> > > +     key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf,
> > > +                                      prog->aux->attach_btf_id);
> > > +
> > > +     err = bpf_lsm_find_cgroup_shim(prog, &bpf_func);
> > > +     if (err)
> > > +             return err;
> > > +
> > > +     tr = bpf_trampoline_get(key, tgt_info);
> > > +     if (!tr)
> > > +             return  -ENOMEM;
> > > +
> > > +     mutex_lock(&tr->mutex);
> > > +
> > > +     shim_prog = cgroup_shim_find(tr, bpf_func);
> > > +     if (shim_prog) {
> > > +             /* Reusing existing shim attached by the other program.
> > > +              */
> > The shim_prog is reused by >1 BPF_LSM_CGROUP progs and
> > shim_prog is hidden from the userspace also (no id), so it may worth
> > to bring this up:
> >
> > In __bpf_prog_enter(), other than some bpf stats of the shim_prog
> > will become useless which is a very minor thing, it is also checking
> > shim_prog->active and bump the misses counter.  Now, the misses counter
> > is no longer visible to users.  Since it is actually running the cgroup prog,
> > may be there is no need for the active check ?
> 
> Agree that the active counter will probably be taken care of when the
> actual program runs;
iirc, the BPF_PROG_RUN_ARRAY_CG does not need the active counter.

> but I'm not sure it's worth the effort of trying to
> remove it here?
I was thinking if the active counter got triggered and missed calling the
BPF_LSM_CGROUP, then there is no way to tell this case got hit without
exposing the stats of the shim_prog and it could be a pretty hard
problem to chase.  It probably won't be an issue for non-sleepable now
if the rcu_read_lock() maps to preempt_disable().  Not sure about the
future sleepable case.

I am thinking to avoid doing all the active count and stats count
in __bpf_prog_enter() and __bpf_prog_exit() for BPF_LSM_CGROUP.  afaik,
only the rcu_read_lock and rcu_read_unlock are useful to protect
the shim_prog itself.  May be a __bpf_nostats_enter() and
__bpf_nostats_exit().
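
Something like this is what I am picturing, just as a sketch (the function
names come from the suggestion above; mirroring the __bpf_prog_enter/exit
signatures and keeping only rcu_read_lock() + migrate_disable() is an
assumption):

u64 notrace __bpf_nostats_enter(struct bpf_prog *prog)
{
	/* Only protect the shim_prog itself; skip the active counter and
	 * the stats/misses accounting that __bpf_prog_enter() does, since
	 * the real accounting happens on the cgroup progs that
	 * BPF_PROG_RUN_ARRAY_CG() runs.
	 */
	rcu_read_lock();
	migrate_disable();
	return 0;
}

void notrace __bpf_nostats_exit(struct bpf_prog *prog, u64 start)
{
	migrate_enable();
	rcu_read_unlock();
}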

> Regarding "no longer visible to users": that's a good point. Should I
> actually add those shim progs to the prog_idr? Or just hide it as
> "internal implementation detail"?
Then no need to expose the shim_progs to the idr.

~~~~
[ btw, while thinking about the shim_prog, I also think there is no need
  for one shim_prog for each attach_btf_id (which is essentially
  prog->aux->cgroup_atype).  The static prog->aux->cgroup_atype could be
  passed on the stack when preparing the trampoline.
  Just an idea, not suggesting it must be done now.  This can be
  optimized later since it does not affect the API. ]
  

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program
  2022-04-09 17:04     ` Jakub Sitnicki
  2022-04-11 18:44       ` Stanislav Fomichev
@ 2022-04-12  1:19       ` Martin KaFai Lau
  2022-04-12 16:42         ` Stanislav Fomichev
  1 sibling, 1 reply; 33+ messages in thread
From: Martin KaFai Lau @ 2022-04-12  1:19 UTC (permalink / raw)
  To: Jakub Sitnicki, Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii

On Sat, Apr 09, 2022 at 07:04:05PM +0200, Jakub Sitnicki wrote:
> >> diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> >> index 6c661b4df9fa..d42516e86b3a 100644
> >> --- a/include/linux/bpf-cgroup-defs.h
> >> +++ b/include/linux/bpf-cgroup-defs.h
> >> @@ -10,7 +10,9 @@
> >>  
> >>  struct bpf_prog_array;
> >>  
> >> -#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
> >> +/* Maximum number of concurrently attachable per-cgroup LSM hooks.
> >> + */
> >> +#define CGROUP_LSM_NUM 10
> > hmm...only 10 different lsm hooks (or 10 different attach_btf_ids) can
> > have BPF_LSM_CGROUP programs attached.  This feels quite limited but having
> > a static 211 (and potentially growing in the future) is not good either.
> > I currently do not have a better idea also. :/
> >
> > Have you thought about other dynamic schemes or they would be too slow ?
> 
> As long as we're talking ideas - how about a 2-level lookup?
> 
> L1: 0..255 -> { 0..31, -1 }, where -1 is an inactive cgroup_bpf_attach_type
> L2: 0..31 -> struct bpf_prog_array * for cgroup->bpf.effective[],
>              struct hlist_head [^1]  for cgroup->bpf.progs[],
>              u32                     for cgroup->bpf.flags[],
> 
> This way we could have 32 distinct _active_ attachment types for each
> cgroup instance, to be shared among regular cgroup attach types and BPF
> LSM attach types.
> 
> It is 9 extra slots in comparison to today, so if anyone has cgroups
> that make use of all available attach types at the same time, we don't
> break their setup.
> 
> The L1 lookup table would still have a few slots for new cgroup [^2] or
> LSM hooks:
> 
>   256 - 23 (cgroup attach types) - 211 (LSM hooks) = 22
> 
> Memory bloat:
> 
>  +256 B - L1 lookup table
Does L1 need to be per cgroup ?

or different cgroups usually have a very different active(/effective) set ?

>  + 72 B - extra effective[] slots
>  + 72 B - extra progs[] slots
>  + 36 B - extra flags[] slots
>  -184 B - savings from switching to hlist_head
>  ------
>  +252 B per cgroup instance
> 
> Total cgroup_bpf{} size change - 720 B -> 968 B.
> 
> WDYT?
> 
> [^1] It looks like we can easily switch cgroup->bpf.progs[] from
>      list_head to hlist_head and save some bytes!
> 
>      We only access the list tail in __cgroup_bpf_attach(). We can
>      either iterate over the list and eat the cost there or push the new
>      prog onto the front.
> 
>      I think we treat cgroup->bpf.progs[] everywhere like an unordered
>      set. Except for __cgroup_bpf_query, where the user might notice the
>      order change in the BPF_PROG_QUERY dump.
> 
> [^2] Unrelated, but we would like to propose a
>      CGROUP_INET[46]_POST_CONNECT hook in the near future to make it
>      easier to bind UDP sockets to 4-tuple without creating conflicts:
> 
>      https://github.com/cloudflare/cloudflare-blog/tree/master/2022-02-connectx/ebpf_connect4
>  
> [...]
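
To make sure I am reading the 2-level lookup right, it would be roughly
like this (purely a sketch; the 256/32 sizes, field and helper names are
illustrative and not from the patch):

struct cgroup_bpf {
	/* L1: maps the "global" attach type space (regular cgroup attach
	 * types followed by the LSM hook index) to a per-cgroup slot,
	 * or -1 if nothing is attached for that type on this cgroup.
	 */
	s8 l1[256];

	/* L2: the actual per-cgroup state, indexed by the slot from l1. */
	struct bpf_prog_array __rcu *effective[32];
	struct hlist_head progs[32];
	u32 flags[32];
	/* ... */
};

static int cgroup_bpf_slot(const struct cgroup_bpf *bpf, unsigned int global_atype)
{
	if (global_atype >= ARRAY_SIZE(bpf->l1))
		return -1;
	return bpf->l1[global_atype]; /* -1 means inactive */
}

The extra indirection is one array load per attach/query/run path, so the
cost seems to be mostly the per-cgroup memory rather than runtime.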

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program
  2022-04-11 18:46     ` Stanislav Fomichev
@ 2022-04-12  1:36       ` Martin KaFai Lau
  2022-04-12 16:42         ` Stanislav Fomichev
  0 siblings, 1 reply; 33+ messages in thread
From: Martin KaFai Lau @ 2022-04-12  1:36 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii

On Mon, Apr 11, 2022 at 11:46:20AM -0700, Stanislav Fomichev wrote:
> On Fri, Apr 8, 2022 at 3:57 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Thu, Apr 07, 2022 at 03:31:08PM -0700, Stanislav Fomichev wrote:
> > > Previous patch adds 1:1 mapping between all 211 LSM hooks
> > > and bpf_cgroup program array. Instead of reserving a slot per
> > > possible hook, reserve 10 slots per cgroup for lsm programs.
> > > Those slots are dynamically allocated on demand and reclaimed.
> > > This still adds some bloat to the cgroup and brings us back to
> > > roughly pre-cgroup_bpf_attach_type times.
> > >
> > > It should be possible to eventually extend this idea to all hooks if
> > > the memory consumption is unacceptable and shrink overall effective
> > > programs array.
> > >
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > >  include/linux/bpf-cgroup-defs.h |  4 +-
> > >  include/linux/bpf_lsm.h         |  6 ---
> > >  kernel/bpf/bpf_lsm.c            |  9 ++--
> > >  kernel/bpf/cgroup.c             | 96 ++++++++++++++++++++++++++++-----
> > >  4 files changed, 90 insertions(+), 25 deletions(-)
> > >
> > > diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> > > index 6c661b4df9fa..d42516e86b3a 100644
> > > --- a/include/linux/bpf-cgroup-defs.h
> > > +++ b/include/linux/bpf-cgroup-defs.h
> > > @@ -10,7 +10,9 @@
> > >
> > >  struct bpf_prog_array;
> > >
> > > -#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
> > > +/* Maximum number of concurrently attachable per-cgroup LSM hooks.
> > > + */
> > > +#define CGROUP_LSM_NUM 10
> > hmm...only 10 different lsm hooks (or 10 different attach_btf_ids) can
> > have BPF_LSM_CGROUP programs attached.  This feels quite limited but having
> > a static 211 (and potentially growing in the future) is not good either.
> > I currently do not have a better idea also. :/
> >
> > Have you thought about other dynamic schemes or they would be too slow ?
> >
> > >  enum cgroup_bpf_attach_type {
> > >       CGROUP_BPF_ATTACH_TYPE_INVALID = -1,
> > > diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
> > > index 7f0e59f5f9be..613de44aa429 100644
> > > --- a/include/linux/bpf_lsm.h
> > > +++ b/include/linux/bpf_lsm.h
> > > @@ -43,7 +43,6 @@ extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
> > >  void bpf_inode_storage_free(struct inode *inode);
> > >
> > >  int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func);
> > > -int bpf_lsm_hook_idx(u32 btf_id);
> > >
> > >  #else /* !CONFIG_BPF_LSM */
> > >
> > > @@ -74,11 +73,6 @@ static inline int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
> > >       return -ENOENT;
> > >  }
> > >
> > > -static inline int bpf_lsm_hook_idx(u32 btf_id)
> > > -{
> > > -     return -EINVAL;
> > > -}
> > > -
> > >  #endif /* CONFIG_BPF_LSM */
> > >
> > >  #endif /* _LINUX_BPF_LSM_H */
> > > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > > index eca258ba71d8..8b948ec9ab73 100644
> > > --- a/kernel/bpf/bpf_lsm.c
> > > +++ b/kernel/bpf/bpf_lsm.c
> > > @@ -57,10 +57,12 @@ static unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
> > >       if (unlikely(!sk))
> > >               return 0;
> > >
> > > +     rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
> > >       cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> > >       if (likely(cgrp))
> > >               ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> > >                                           ctx, bpf_prog_run, 0);
> > > +     rcu_read_unlock();
> > >       return ret;
> > >  }
> > >
> > > @@ -77,7 +79,7 @@ static unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
> > >       /*prog = container_of(insn, struct bpf_prog, insnsi);*/
> > >       prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> > >
> > > -     rcu_read_lock();
> > > +     rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
> > I think this is also needed for task_dfl_cgroup().  If yes,
> > will be a good idea to adjust the comment if it ends up
> > using the 'CGROUP_LSM_NUM 10' scheme.
> >
> > While at rcu_read_lock(), have you thought about what major things are
> > needed to make BPF_LSM_CGROUP sleepable ?
> >
> > The cgroup local storage could be one that requires changes, but it seems
> > the cgroup local storage is not available to BPF_LSM_CGROUP in this change set.
> > The current use case doesn't need it?
> 
> No, I haven't thought about sleepable at all yet :-( But seems like
> having that rcu lock here might be problematic if we want to sleep? In
> this case, Jakub's suggestion seems better.
The new rcu_read_lock() here seems fine after some thought.

I was looking at the helpers in cgroup_base_func_proto() to get a sense
of sleepable support.  Only bpf_get_local_storage caught my eye for now,
because it uses call_rcu to free the storage.  That is the major one I
can think of for now that would need to change for sleepable.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 2/7] bpf: per-cgroup lsm flavor
  2022-04-12  1:04       ` Martin KaFai Lau
@ 2022-04-12 16:42         ` Stanislav Fomichev
  0 siblings, 0 replies; 33+ messages in thread
From: Stanislav Fomichev @ 2022-04-12 16:42 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, bpf, ast, daniel, andrii

> On Mon, Apr 11, 2022 at 6:04 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Mon, Apr 11, 2022 at 12:07:20PM -0700, Stanislav Fomichev wrote:
> > On Fri, Apr 8, 2022 at 3:13 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > On Thu, Apr 07, 2022 at 03:31:07PM -0700, Stanislav Fomichev wrote:
> > > > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > > > index 064eccba641d..eca258ba71d8 100644
> > > > --- a/kernel/bpf/bpf_lsm.c
> > > > +++ b/kernel/bpf/bpf_lsm.c
> > > > @@ -35,6 +35,98 @@ BTF_SET_START(bpf_lsm_hooks)
> > > >  #undef LSM_HOOK
> > > >  BTF_SET_END(bpf_lsm_hooks)
> > > >
> > > > +static unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
> > > > +                                             const struct bpf_insn *insn)
> > > > +{
> > > > +     const struct bpf_prog *prog;
> > > > +     struct socket *sock;
> > > > +     struct cgroup *cgrp;
> > > > +     struct sock *sk;
> > > > +     int ret = 0;
> > > > +     u64 *regs;
> > > > +
> > > > +     regs = (u64 *)ctx;
> > > > +     sock = (void *)(unsigned long)regs[BPF_REG_0];
> > > > +     /*prog = container_of(insn, struct bpf_prog, insnsi);*/
> > > > +     prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> > > nit. Rename prog to shim_prog.
> > >
> > > > +
> > > > +     if (unlikely(!sock))
> > > Is it possible in the lsm hooks?  Can these hooks
> > > be rejected at the load time instead?
> >
> > Doesn't seem like it can be null, at least from the quick review that
> > I had; I'll take a deeper look.
> > I guess in general I wanted to be more defensive here because there
> > are 200+ hooks, the new ones might arrive, and it's better to have the
> > check?
> not too worried about an extra runtime check for now.
> Instead, I have a concern that it will be a usage surprise when a successfully
> attached bpf program is then always silently ignored.
>
> Another question, for example, the inet_conn_request lsm_hook:
> LSM_HOOK(int, 0, inet_conn_request, const struct sock *sk, struct sk_buff *skb,
>          struct request_sock *req)
>
> 'struct sock *sk' is the first argument, so it will use current's cgroup.
> inet_conn_request() likely runs in a softirq though, and then it will be
> incorrect.  This runs-in-softirq case may not be limited to hooks that
> take an sk/sock argument either; not sure.

For now, I decided not to treat the 'struct sock' cases as 'socket'
because of cases like sk_alloc_security where the 'struct sock' is not
fully initialized. Looks like treating them as 'current' is also not 100%
foolproof, so I guess we'd still have to have some special
cases/exceptions. Let me bring back that 'struct sock' handler and add
some BTF set to treat the problematic hooks (the non-inet_conn_request
ones, like sk_alloc_security) as exceptions for now.
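
Rough sketch of what I mean by the exception set (the hook names below are
just examples of hooks where the sk argument can't be used, not a final
list):

/* LSM hooks that take a socket/sock argument but where that socket can't
 * be used to pick the cgroup (e.g. not fully initialized yet); fall back
 * to current's cgroup for these.
 */
BTF_SET_START(bpf_lsm_current_hooks)
BTF_ID(func, bpf_lsm_sk_alloc_security)
BTF_ID(func, bpf_lsm_sk_free_security)
BTF_SET_END(bpf_lsm_current_hooks)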

> > > > +             return 0;
> > > > +
> > > > +     sk = sock->sk;
> > > > +     if (unlikely(!sk))
> > > Same here.
> > >
> > > > +             return 0;
> > > > +
> > > > +     cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> > > > +     if (likely(cgrp))
> Unrelated, but while we are talking about extra checks:
>
> I think the shim_prog already acts as a higher level (per attach_btf_id)
> knob, but do you think it may still be worth doing a bpf_empty_prog_array
> check here in case a cgroup may not have any prog to run ?

Oh yeah, good idea, let me add those cgroup_bpf_sock_enabled.
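
For the socket runner, something like this sketch (shim_prog here is the
renamed 'prog' from the current patch; it assumes cgroup_bpf_sock_enabled(),
which already compares the effective array against the shared empty array
for the existing cgroup hooks, can simply be reused with the shim's atype):

	sk = sock->sk;
	if (unlikely(!sk))
		return 0;

	if (likely(cgroup_bpf_sock_enabled(sk, shim_prog->aux->cgroup_atype))) {
		cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
		ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[shim_prog->aux->cgroup_atype],
					    ctx, bpf_prog_run, 0);
	}
	return ret;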

> > > > +             ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> > > > +                                         ctx, bpf_prog_run, 0);
>
> [ ... ]
>
> > > > @@ -100,6 +123,15 @@ static void bpf_cgroup_link_auto_detach(struct bpf_cgroup_link *link)
> > > >       link->cgroup = NULL;
> > > >  }
> > > >
> > > > +static void bpf_cgroup_lsm_shim_release(struct bpf_prog *prog,
> > > > +                                     enum cgroup_bpf_attach_type atype)
> > > > +{
> > > > +     if (!prog || atype != prog->aux->cgroup_atype)
> > > prog cannot be NULL here, no?
> > >
> > > The 'atype != prog->aux->cgroup_atype' looks suspicious also considering
> > > prog->aux->cgroup_atype is only initialized (and meaningful) for BPF_LSM_CGROUP.
> > > I suspect incorrectly passing this test will crash in the below
> > > bpf_trampoline_unlink_cgroup_shim(). More on this later.
> > >
> > > > +             return;
> > > > +
> > > > +     bpf_trampoline_unlink_cgroup_shim(prog);
> > > > +}
> > > > +
> > > >  /**
> > > >   * cgroup_bpf_release() - put references of all bpf programs and
> > > >   *                        release all cgroup bpf data
> > > > @@ -123,10 +155,16 @@ static void cgroup_bpf_release(struct work_struct *work)
> > > Copying some missing loop context here:
> > >
> > >         for (atype = 0; atype < ARRAY_SIZE(cgrp->bpf.progs); atype++) {
> > >                 struct list_head *progs = &cgrp->bpf.progs[atype];
> > >                 struct bpf_prog_list *pl, *pltmp;
> > >
> > > >
> > > >               list_for_each_entry_safe(pl, pltmp, progs, node) {
> > > >                       list_del(&pl->node);
> > > > -                     if (pl->prog)
> > > > +                     if (pl->prog) {
> > > > +                             bpf_cgroup_lsm_shim_release(pl->prog,
> > > > +                                                         atype);
> > > atype could be 0 (CGROUP_INET_INGRESS) here.  bpf_cgroup_lsm_shim_release()
> > > above will go ahead with bpf_trampoline_unlink_cgroup_shim().
> > > It will break some of the assumptions.  e.g. prog->aux->attach_btf is NULL
> > > for CGROUP_INET_INGRESS.
> > >
> > > Instead, only call bpf_cgroup_lsm_shim_release() for BPF_LSM_CGROUP ?
> > >
> > > If the above observation is sane, I wonder if the existing test_progs
> > > have uncovered it or may be the existing tests just always detach
> > > cleanly itself before cleaning the cgroup which then avoided this case.
> >
> > Might be what's happening here:
> >
> > https://github.com/kernel-patches/bpf/runs/5876983908?check_suite_focus=true
> hmm.... this one looks different.  I am thinking the oops should happen
> in bpf_obj_id(), which is not inlined.  It didn't ring any bells after a
> quick look, so yeah, let's fix the known issue first.
>
> >
> > Although, I'm not sure why it's z15 only. Good point on filtering by
> > BPF_LSM_CGROUP, will do.
> >
> > > >                               bpf_prog_put(pl->prog);
> > > > -                     if (pl->link)
> > > > +                     }
> > > > +                     if (pl->link) {
> > > > +                             bpf_cgroup_lsm_shim_release(pl->link->link.prog,
> > > > +                                                         atype);
> > > >                               bpf_cgroup_link_auto_detach(pl->link);
> > > > +                     }
> > > >                       kfree(pl);
> > > >                       static_branch_dec(&cgroup_bpf_enabled_key[atype]);
> > > >               }
>
> [ ... ]
>
> > > > +int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
> > > > +                                 struct bpf_attach_target_info *tgt_info)
> > > > +{
> > > > +     struct bpf_prog *shim_prog = NULL;
> > > > +     struct bpf_trampoline *tr;
> > > > +     bpf_func_t bpf_func;
> > > > +     u64 key;
> > > > +     int err;
> > > > +
> > > > +     key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf,
> > > > +                                      prog->aux->attach_btf_id);
> > > > +
> > > > +     err = bpf_lsm_find_cgroup_shim(prog, &bpf_func);
> > > > +     if (err)
> > > > +             return err;
> > > > +
> > > > +     tr = bpf_trampoline_get(key, tgt_info);
> > > > +     if (!tr)
> > > > +             return  -ENOMEM;
> > > > +
> > > > +     mutex_lock(&tr->mutex);
> > > > +
> > > > +     shim_prog = cgroup_shim_find(tr, bpf_func);
> > > > +     if (shim_prog) {
> > > > +             /* Reusing existing shim attached by the other program.
> > > > +              */
> > > The shim_prog is reused by >1 BPF_LSM_CGROUP progs and
> > > shim_prog is hidden from the userspace also (no id), so it may worth
> > > to bring this up:
> > >
> > > In __bpf_prog_enter(), other than some bpf stats of the shim_prog
> > > will become useless which is a very minor thing, it is also checking
> > > shim_prog->active and bump the misses counter.  Now, the misses counter
> > > is no longer visible to users.  Since it is actually running the cgroup prog,
> > > may be there is no need for the active check ?
> >
> > Agree that the active counter will probably be taken care of when the
> > actual program runs;
> iirc, the BPF_PROG_RUN_ARRAY_CG does not need the active counter.
>
> > but I'm not sure it's worth the effort of trying to
> > remove it here?
> I was thinking if the active counter got triggered and missed calling the
> BPF_LSM_CGROUP, then there is no way to tell this case got hit without
> exposing the stats of the shim_prog and it could be a pretty hard
> problem to chase.  It probably won't be an issue for non-sleepable now
> if the rcu_read_lock() maps to preempt_disable().  Not sure about the
> future sleepable case.
>
> I am thinking to avoid doing all the active count and stats count
> in __bpf_prog_enter() and __bpf_prog_exit() for BPF_LSM_CGROUP.  afaik,
> only the rcu_read_lock and rcu_read_unlock are useful to protect
> the shim_prog itself.  May be a __bpf_nostats_enter() and
> __bpf_nostats_exit().

SG, let me try to skip that for BPF_LSM_CGROUP case.

> > Regarding "no longer visible to users": that's a good point. Should I
> > actually add those shim progs to the prog_idr? Or just hide it as
> > "internal implementation detail"?
> Then no need to expose the shim_progs to the idr.
>
> ~~~~
> [ btw, while thinking about the shim_prog, I also think there is no need
>   for one shim_prog for each attach_btf_id (which is essentially
>   prog->aux->cgroup_atype).  The static prog->aux->cgroup_atype could be
>   passed on the stack when preparing the trampoline.
>   Just an idea, not suggesting it must be done now.  This can be
>   optimized later since it does not affect the API. ]

Ack, I guess in theory, there needs to be only two "global"
shim_progs, one for 'struct socket' and another for 'current' (or
more, for other types). I went with allocating an instance per
trampoline to avoid having that global state. Working under tr->mutex
simplifies things a bit imo, but, as you said, we can optimize here if
needed.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program
  2022-04-12  1:36       ` Martin KaFai Lau
@ 2022-04-12 16:42         ` Stanislav Fomichev
  2022-04-12 18:13           ` Martin KaFai Lau
  0 siblings, 1 reply; 33+ messages in thread
From: Stanislav Fomichev @ 2022-04-12 16:42 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, bpf, ast, daniel, andrii

On Mon, Apr 11, 2022 at 6:36 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Mon, Apr 11, 2022 at 11:46:20AM -0700, Stanislav Fomichev wrote:
> > On Fri, Apr 8, 2022 at 3:57 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > On Thu, Apr 07, 2022 at 03:31:08PM -0700, Stanislav Fomichev wrote:
> > > > Previous patch adds 1:1 mapping between all 211 LSM hooks
> > > > and bpf_cgroup program array. Instead of reserving a slot per
> > > > possible hook, reserve 10 slots per cgroup for lsm programs.
> > > > Those slots are dynamically allocated on demand and reclaimed.
> > > > This still adds some bloat to the cgroup and brings us back to
> > > > roughly pre-cgroup_bpf_attach_type times.
> > > >
> > > > It should be possible to eventually extend this idea to all hooks if
> > > > the memory consumption is unacceptable and shrink overall effective
> > > > programs array.
> > > >
> > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > ---
> > > >  include/linux/bpf-cgroup-defs.h |  4 +-
> > > >  include/linux/bpf_lsm.h         |  6 ---
> > > >  kernel/bpf/bpf_lsm.c            |  9 ++--
> > > >  kernel/bpf/cgroup.c             | 96 ++++++++++++++++++++++++++++-----
> > > >  4 files changed, 90 insertions(+), 25 deletions(-)
> > > >
> > > > diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> > > > index 6c661b4df9fa..d42516e86b3a 100644
> > > > --- a/include/linux/bpf-cgroup-defs.h
> > > > +++ b/include/linux/bpf-cgroup-defs.h
> > > > @@ -10,7 +10,9 @@
> > > >
> > > >  struct bpf_prog_array;
> > > >
> > > > -#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
> > > > +/* Maximum number of concurrently attachable per-cgroup LSM hooks.
> > > > + */
> > > > +#define CGROUP_LSM_NUM 10
> > > hmm...only 10 different lsm hooks (or 10 different attach_btf_ids) can
> > > have BPF_LSM_CGROUP programs attached.  This feels quite limited but having
> > > a static 211 (and potentially growing in the future) is not good either.
> > > I currently do not have a better idea also. :/
> > >
> > > Have you thought about other dynamic schemes or they would be too slow ?
> > >
> > > >  enum cgroup_bpf_attach_type {
> > > >       CGROUP_BPF_ATTACH_TYPE_INVALID = -1,
> > > > diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
> > > > index 7f0e59f5f9be..613de44aa429 100644
> > > > --- a/include/linux/bpf_lsm.h
> > > > +++ b/include/linux/bpf_lsm.h
> > > > @@ -43,7 +43,6 @@ extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
> > > >  void bpf_inode_storage_free(struct inode *inode);
> > > >
> > > >  int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func);
> > > > -int bpf_lsm_hook_idx(u32 btf_id);
> > > >
> > > >  #else /* !CONFIG_BPF_LSM */
> > > >
> > > > @@ -74,11 +73,6 @@ static inline int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
> > > >       return -ENOENT;
> > > >  }
> > > >
> > > > -static inline int bpf_lsm_hook_idx(u32 btf_id)
> > > > -{
> > > > -     return -EINVAL;
> > > > -}
> > > > -
> > > >  #endif /* CONFIG_BPF_LSM */
> > > >
> > > >  #endif /* _LINUX_BPF_LSM_H */
> > > > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > > > index eca258ba71d8..8b948ec9ab73 100644
> > > > --- a/kernel/bpf/bpf_lsm.c
> > > > +++ b/kernel/bpf/bpf_lsm.c
> > > > @@ -57,10 +57,12 @@ static unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
> > > >       if (unlikely(!sk))
> > > >               return 0;
> > > >
> > > > +     rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
> > > >       cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> > > >       if (likely(cgrp))
> > > >               ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> > > >                                           ctx, bpf_prog_run, 0);
> > > > +     rcu_read_unlock();
> > > >       return ret;
> > > >  }
> > > >
> > > > @@ -77,7 +79,7 @@ static unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
> > > >       /*prog = container_of(insn, struct bpf_prog, insnsi);*/
> > > >       prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> > > >
> > > > -     rcu_read_lock();
> > > > +     rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
> > > I think this is also needed for task_dfl_cgroup().  If yes,
> > > will be a good idea to adjust the comment if it ends up
> > > using the 'CGROUP_LSM_NUM 10' scheme.
> > >
> > > While at rcu_read_lock(), have you thought about what major things are
> > > needed to make BPF_LSM_CGROUP sleepable ?
> > >
> > > The cgroup local storage could be one that require changes but it seems
> > > the cgroup local storage is not available to BPF_LSM_GROUP in this change set.
> > > The current use case doesn't need it?
> >
> > No, I haven't thought about sleepable at all yet :-( But seems like
> > having that rcu lock here might be problematic if we want to sleep? In
> > this case, Jakub's suggestion seems better.
> The new rcu_read_lock() here seems fine after some thoughts.
>
> I was looking at the helpers in cgroup_base_func_proto() to get a sense
> on sleepable support.  Only the bpf_get_local_storage caught my eyes for
> now because it uses a call_rcu to free the storage.  That will be the
> major one to change for sleepable that I can think of for now.

That rcu_read_lock would have to be switched over to rcu_read_lock_trace in
the sleepable case, I'm assuming? Are we allowed to sleep while holding
rcu_read_lock_trace?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program
  2022-04-12  1:19       ` Martin KaFai Lau
@ 2022-04-12 16:42         ` Stanislav Fomichev
  2022-04-12 17:40           ` Martin KaFai Lau
  0 siblings, 1 reply; 33+ messages in thread
From: Stanislav Fomichev @ 2022-04-12 16:42 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: Jakub Sitnicki, netdev, bpf, ast, daniel, andrii

On Mon, Apr 11, 2022 at 6:19 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Sat, Apr 09, 2022 at 07:04:05PM +0200, Jakub Sitnicki wrote:
> > >> diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> > >> index 6c661b4df9fa..d42516e86b3a 100644
> > >> --- a/include/linux/bpf-cgroup-defs.h
> > >> +++ b/include/linux/bpf-cgroup-defs.h
> > >> @@ -10,7 +10,9 @@
> > >>
> > >>  struct bpf_prog_array;
> > >>
> > >> -#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
> > >> +/* Maximum number of concurrently attachable per-cgroup LSM hooks.
> > >> + */
> > >> +#define CGROUP_LSM_NUM 10
> > > hmm...only 10 different lsm hooks (or 10 different attach_btf_ids) can
> > > have BPF_LSM_CGROUP programs attached.  This feels quite limited but having
> > > a static 211 (and potentially growing in the future) is not good either.
> > > I currently do not have a better idea also. :/
> > >
> > > Have you thought about other dynamic schemes or they would be too slow ?
> >
> > As long as we're talking ideas - how about a 2-level lookup?
> >
> > L1: 0..255 -> { 0..31, -1 }, where -1 is inactive cgroup_bp_attach_type
> > L2: 0..31 -> struct bpf_prog_array * for cgroup->bpf.effective[],
> >              struct hlist_head [^1]  for cgroup->bpf.progs[],
> >              u32                     for cgroup->bpf.flags[],
> >
> > This way we could have 32 distinct _active_ attachment types for each
> > cgroup instance, to be shared among regular cgroup attach types and BPF
> > LSM attach types.
> >
> > It is 9 extra slots in comparison to today, so if anyone has cgroups
> > that make use of all available attach types at the same time, we don't
> > break their setup.
> >
> > The L1 lookup table would still a few slots for new cgroup [^2] or LSM
> > hooks:
> >
> >   256 - 23 (cgroup attach types) - 211 (LSM hooks) = 22
> >
> > Memory bloat:
> >
> >  +256 B - L1 lookup table
> Does L1 need to be per cgroup ?
>
> or different cgroups usually have a very different active(/effective) set ?

I'm assuming the suggestion is to have it per cgroup. Otherwise, if it's
global, it's close to whatever I'm proposing in the original patch. As I
mentioned in the commit message, in theory, all cgroup_bpf can be managed
the way I propose to manage 10 lsm slots if we get to the point where
10 slots is not enough.

I've played with this scheme a bit and it looks a bit complicated :-( Since it's
per cgroup, we have to be careful to preserve this mapping during
cgroup_bpf_inherit.
I'll see what I can do, but I'll most likely revert to my initial
version for now (I'll also include the list_head->hlist_head conversion
patch, very nice idea).
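
For reference, this is roughly the shape I was experimenting with
(untested sketch, all names are made up and not from the actual patch):

/* Untested sketch of the 2-level idea, hypothetical names.
 * L1 maps a stable global attach type (cgroup attach type or LSM
 * hook index) to a small per-cgroup slot, -1 if nothing is attached.
 * L2 is the existing effective/progs/flags storage, shrunk to 32
 * entries shared between regular cgroup and LSM attach types.
 */
#define CGROUP_BPF_L1_SIZE	256
#define CGROUP_BPF_L2_SIZE	32

struct cgroup_bpf_2level {
	/* L1: global attach type -> per-cgroup slot, -1 == inactive */
	s8 slot[CGROUP_BPF_L1_SIZE];

	/* L2: indexed by the slot returned from L1 */
	struct bpf_prog_array __rcu *effective[CGROUP_BPF_L2_SIZE];
	struct hlist_head progs[CGROUP_BPF_L2_SIZE];
	u32 flags[CGROUP_BPF_L2_SIZE];
};

static struct bpf_prog_array __rcu *
cgroup_bpf_effective(struct cgroup_bpf_2level *bpf, unsigned int global_atype)
{
	s8 slot = READ_ONCE(bpf->slot[global_atype]);

	if (slot < 0)
		return NULL;	/* nothing attached for this hook */
	return bpf->effective[slot];
}

The annoying part is keeping slot[] consistent across the hierarchy in
cgroup_bpf_inherit, which is where it got complicated for me.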



> >  + 72 B - extra effective[] slots
> >  + 72 B - extra progs[] slots
> >  + 36 B - extra flags[] slots
> >  -184 B - savings from switching to hlist_head
> >  ------
> >  +252 B per cgroup instance
> >
> > Total cgroup_bpf{} size change - 720 B -> 968 B.
> >
> > WDYT?
> >
> > [^1] It looks like we can easily switch from cgroup->bpf.progs[] from
> >      list_head to hlist_head and save some bytes!
> >
> >      We only access the list tail in __cgroup_bpf_attach(). We can
> >      either iterate over the list and eat the cost there or push the new
> >      prog onto the front.
> >
> >      I think we treat cgroup->bpf.progs[] everywhere like an unordered
> >      set. Except for __cgroup_bpf_query, where the user might notice the
> >      order change in the BPF_PROG_QUERY dump.
> >
> > [^2] Unrelated, but we would like to propose a
> >      CGROUP_INET[46]_POST_CONNECT hook in the near future to make it
> >      easier to bind UDP sockets to 4-tuple without creating conflicts:
> >
> >      https://github.com/cloudflare/cloudflare-blog/tree/master/2022-02-connectx/ebpf_connect4
> >
> > [...]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program
  2022-04-12 16:42         ` Stanislav Fomichev
@ 2022-04-12 17:40           ` Martin KaFai Lau
  0 siblings, 0 replies; 33+ messages in thread
From: Martin KaFai Lau @ 2022-04-12 17:40 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: Jakub Sitnicki, netdev, bpf, ast, daniel, andrii

On Tue, Apr 12, 2022 at 09:42:41AM -0700, Stanislav Fomichev wrote:
> On Mon, Apr 11, 2022 at 6:19 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Sat, Apr 09, 2022 at 07:04:05PM +0200, Jakub Sitnicki wrote:
> > > >> diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> > > >> index 6c661b4df9fa..d42516e86b3a 100644
> > > >> --- a/include/linux/bpf-cgroup-defs.h
> > > >> +++ b/include/linux/bpf-cgroup-defs.h
> > > >> @@ -10,7 +10,9 @@
> > > >>
> > > >>  struct bpf_prog_array;
> > > >>
> > > >> -#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
> > > >> +/* Maximum number of concurrently attachable per-cgroup LSM hooks.
> > > >> + */
> > > >> +#define CGROUP_LSM_NUM 10
> > > > hmm...only 10 different lsm hooks (or 10 different attach_btf_ids) can
> > > > have BPF_LSM_CGROUP programs attached.  This feels quite limited but having
> > > > a static 211 (and potentially growing in the future) is not good either.
> > > > I currently do not have a better idea also. :/
> > > >
> > > > Have you thought about other dynamic schemes or they would be too slow ?
> > >
> > > As long as we're talking ideas - how about a 2-level lookup?
> > >
> > > L1: 0..255 -> { 0..31, -1 }, where -1 is inactive cgroup_bp_attach_type
> > > L2: 0..31 -> struct bpf_prog_array * for cgroup->bpf.effective[],
> > >              struct hlist_head [^1]  for cgroup->bpf.progs[],
> > >              u32                     for cgroup->bpf.flags[],
> > >
> > > This way we could have 32 distinct _active_ attachment types for each
> > > cgroup instance, to be shared among regular cgroup attach types and BPF
> > > LSM attach types.
> > >
> > > It is 9 extra slots in comparison to today, so if anyone has cgroups
> > > that make use of all available attach types at the same time, we don't
> > > break their setup.
> > >
> > > The L1 lookup table would still a few slots for new cgroup [^2] or LSM
> > > hooks:
> > >
> > >   256 - 23 (cgroup attach types) - 211 (LSM hooks) = 22
> > >
> > > Memory bloat:
> > >
> > >  +256 B - L1 lookup table
> > Does L1 need to be per cgroup ?
> >
> > or different cgroups usually have a very different active(/effective) set ?
> 
> I'm assuming the suggestion is to have it per cgroup. Otherwise, if it's
> global, it's close to whatever I'm proposing in the original patch. As I
> mentioned in the commit message, in theory, all cgroup_bpf can be managed
> the way I propose to manage 10 lsm slots if we get to the point where
> 10 slots is not enough.
Ah, indeed. The global one will be similar to the original patch.  I was
thinking of only using the space saved from the list_head->hlist_head
conversion to get a larger progs[] instead of spending it on an L1 lookup table.

Also, I think u8 should be enough for the flags[].
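
i.e. roughly this kind of change (hand-written diff, untested, field
names from memory):

-	struct list_head progs[MAX_CGROUP_BPF_ATTACH_TYPE];
-	u32 flags[MAX_CGROUP_BPF_ATTACH_TYPE];
+	struct hlist_head progs[MAX_CGROUP_BPF_ATTACH_TYPE];
+	u8 flags[MAX_CGROUP_BPF_ATTACH_TYPE];

That is list_head (16 bytes on 64-bit) + u32 (4) going down to
hlist_head (8) + u8 (1) per attach type, which could pay for a few more
progs[]/effective[] entries instead of the L1 table.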

> I've played with this mode a bit and it looks a bit complicated :-( Since it's
> per cgroup, we have to be careful to preserve this mapping during
> cgroup_bpf_inherit.
> I'll see what I can do, but I'll most likely revert to my initial
> version for now (I'll also include list_head->hlist_head conversion
> patch, very nice idea).
sgtm.

> 
> 
> 
> > >  + 72 B - extra effective[] slots
> > >  + 72 B - extra progs[] slots
> > >  + 36 B - extra flags[] slots
> > >  -184 B - savings from switching to hlist_head
> > >  ------
> > >  +252 B per cgroup instance
> > >
> > > Total cgroup_bpf{} size change - 720 B -> 968 B.
> > >
> > > WDYT?
> > >
> > > [^1] It looks like we can easily switch from cgroup->bpf.progs[] from
> > >      list_head to hlist_head and save some bytes!
> > >
> > >      We only access the list tail in __cgroup_bpf_attach(). We can
> > >      either iterate over the list and eat the cost there or push the new
> > >      prog onto the front.
> > >
> > >      I think we treat cgroup->bpf.progs[] everywhere like an unordered
> > >      set. Except for __cgroup_bpf_query, where the user might notice the
> > >      order change in the BPF_PROG_QUERY dump.
> > >
> > > [^2] Unrelated, but we would like to propose a
> > >      CGROUP_INET[46]_POST_CONNECT hook in the near future to make it
> > >      easier to bind UDP sockets to 4-tuple without creating conflicts:
> > >
> > >      https://github.com/cloudflare/cloudflare-blog/tree/master/2022-02-connectx/ebpf_connect4
> > >
> > > [...]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program
  2022-04-12 16:42         ` Stanislav Fomichev
@ 2022-04-12 18:13           ` Martin KaFai Lau
  2022-04-12 19:01             ` Stanislav Fomichev
  0 siblings, 1 reply; 33+ messages in thread
From: Martin KaFai Lau @ 2022-04-12 18:13 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii

On Tue, Apr 12, 2022 at 09:42:40AM -0700, Stanislav Fomichev wrote:
> On Mon, Apr 11, 2022 at 6:36 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Mon, Apr 11, 2022 at 11:46:20AM -0700, Stanislav Fomichev wrote:
> > > On Fri, Apr 8, 2022 at 3:57 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > >
> > > > On Thu, Apr 07, 2022 at 03:31:08PM -0700, Stanislav Fomichev wrote:
> > > > > Previous patch adds 1:1 mapping between all 211 LSM hooks
> > > > > and bpf_cgroup program array. Instead of reserving a slot per
> > > > > possible hook, reserve 10 slots per cgroup for lsm programs.
> > > > > Those slots are dynamically allocated on demand and reclaimed.
> > > > > This still adds some bloat to the cgroup and brings us back to
> > > > > roughly pre-cgroup_bpf_attach_type times.
> > > > >
> > > > > It should be possible to eventually extend this idea to all hooks if
> > > > > the memory consumption is unacceptable and shrink overall effective
> > > > > programs array.
> > > > >
> > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > > ---
> > > > >  include/linux/bpf-cgroup-defs.h |  4 +-
> > > > >  include/linux/bpf_lsm.h         |  6 ---
> > > > >  kernel/bpf/bpf_lsm.c            |  9 ++--
> > > > >  kernel/bpf/cgroup.c             | 96 ++++++++++++++++++++++++++++-----
> > > > >  4 files changed, 90 insertions(+), 25 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> > > > > index 6c661b4df9fa..d42516e86b3a 100644
> > > > > --- a/include/linux/bpf-cgroup-defs.h
> > > > > +++ b/include/linux/bpf-cgroup-defs.h
> > > > > @@ -10,7 +10,9 @@
> > > > >
> > > > >  struct bpf_prog_array;
> > > > >
> > > > > -#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
> > > > > +/* Maximum number of concurrently attachable per-cgroup LSM hooks.
> > > > > + */
> > > > > +#define CGROUP_LSM_NUM 10
> > > > hmm...only 10 different lsm hooks (or 10 different attach_btf_ids) can
> > > > have BPF_LSM_CGROUP programs attached.  This feels quite limited but having
> > > > a static 211 (and potentially growing in the future) is not good either.
> > > > I currently do not have a better idea also. :/
> > > >
> > > > Have you thought about other dynamic schemes or they would be too slow ?
> > > >
> > > > >  enum cgroup_bpf_attach_type {
> > > > >       CGROUP_BPF_ATTACH_TYPE_INVALID = -1,
> > > > > diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
> > > > > index 7f0e59f5f9be..613de44aa429 100644
> > > > > --- a/include/linux/bpf_lsm.h
> > > > > +++ b/include/linux/bpf_lsm.h
> > > > > @@ -43,7 +43,6 @@ extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
> > > > >  void bpf_inode_storage_free(struct inode *inode);
> > > > >
> > > > >  int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func);
> > > > > -int bpf_lsm_hook_idx(u32 btf_id);
> > > > >
> > > > >  #else /* !CONFIG_BPF_LSM */
> > > > >
> > > > > @@ -74,11 +73,6 @@ static inline int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
> > > > >       return -ENOENT;
> > > > >  }
> > > > >
> > > > > -static inline int bpf_lsm_hook_idx(u32 btf_id)
> > > > > -{
> > > > > -     return -EINVAL;
> > > > > -}
> > > > > -
> > > > >  #endif /* CONFIG_BPF_LSM */
> > > > >
> > > > >  #endif /* _LINUX_BPF_LSM_H */
> > > > > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > > > > index eca258ba71d8..8b948ec9ab73 100644
> > > > > --- a/kernel/bpf/bpf_lsm.c
> > > > > +++ b/kernel/bpf/bpf_lsm.c
> > > > > @@ -57,10 +57,12 @@ static unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
> > > > >       if (unlikely(!sk))
> > > > >               return 0;
> > > > >
> > > > > +     rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
> > > > >       cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> > > > >       if (likely(cgrp))
> > > > >               ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> > > > >                                           ctx, bpf_prog_run, 0);
> > > > > +     rcu_read_unlock();
> > > > >       return ret;
> > > > >  }
> > > > >
> > > > > @@ -77,7 +79,7 @@ static unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
> > > > >       /*prog = container_of(insn, struct bpf_prog, insnsi);*/
> > > > >       prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> > > > >
> > > > > -     rcu_read_lock();
> > > > > +     rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
> > > > I think this is also needed for task_dfl_cgroup().  If yes,
> > > > will be a good idea to adjust the comment if it ends up
> > > > using the 'CGROUP_LSM_NUM 10' scheme.
> > > >
> > > > While at rcu_read_lock(), have you thought about what major things are
> > > > needed to make BPF_LSM_CGROUP sleepable ?
> > > >
> > > > The cgroup local storage could be one that require changes but it seems
> > > > the cgroup local storage is not available to BPF_LSM_GROUP in this change set.
> > > > The current use case doesn't need it?
> > >
> > > No, I haven't thought about sleepable at all yet :-( But seems like
> > > having that rcu lock here might be problematic if we want to sleep? In
> > > this case, Jakub's suggestion seems better.
> > The new rcu_read_lock() here seems fine after some thoughts.
> >
> > I was looking at the helpers in cgroup_base_func_proto() to get a sense
> > on sleepable support.  Only the bpf_get_local_storage caught my eyes for
> > now because it uses a call_rcu to free the storage.  That will be the
> > major one to change for sleepable that I can think of for now.
> 
> That rcu_read_lock should be switched over to rcu_read_lock_trace in
> the sleepable case I'm assuming? Are we allowed to sleep while holding
> rcu_read_lock_trace?
Ah. right, suddenly forgot the obvious in between emails :(

In that sense, may as well remove the rcu_read_lock() here and let
the trampoline decide which one (rcu_read_lock or rcu_read_lock_trace)
to call before calling the shim_prog.  The __bpf_prog_enter(_sleepable) will
call the right rcu_read_lock(_trace) depending on whether the prog is sleepable or not.
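
The enter side of the trampoline already looks roughly like this
(paraphrasing kernel/bpf/trampoline.c from memory, recursion/stats
handling elided):

u64 notrace __bpf_prog_enter(struct bpf_prog *prog)
{
	rcu_read_lock();
	migrate_disable();
	/* recursion check, stats, etc. elided */
	return bpf_prog_start_time();
}

u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog)
{
	rcu_read_lock_trace();
	migrate_disable();
	might_fault();
	/* recursion check, stats, etc. elided */
	return bpf_prog_start_time();
}

So the shim itself would not need to know which rcu flavor it is
running under.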

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program
  2022-04-12 18:13           ` Martin KaFai Lau
@ 2022-04-12 19:01             ` Stanislav Fomichev
  2022-04-12 20:19               ` Martin KaFai Lau
  0 siblings, 1 reply; 33+ messages in thread
From: Stanislav Fomichev @ 2022-04-12 19:01 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, bpf, ast, daniel, andrii

On Tue, Apr 12, 2022 at 11:13 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Tue, Apr 12, 2022 at 09:42:40AM -0700, Stanislav Fomichev wrote:
> > On Mon, Apr 11, 2022 at 6:36 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > On Mon, Apr 11, 2022 at 11:46:20AM -0700, Stanislav Fomichev wrote:
> > > > On Fri, Apr 8, 2022 at 3:57 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > >
> > > > > On Thu, Apr 07, 2022 at 03:31:08PM -0700, Stanislav Fomichev wrote:
> > > > > > Previous patch adds 1:1 mapping between all 211 LSM hooks
> > > > > > and bpf_cgroup program array. Instead of reserving a slot per
> > > > > > possible hook, reserve 10 slots per cgroup for lsm programs.
> > > > > > Those slots are dynamically allocated on demand and reclaimed.
> > > > > > This still adds some bloat to the cgroup and brings us back to
> > > > > > roughly pre-cgroup_bpf_attach_type times.
> > > > > >
> > > > > > It should be possible to eventually extend this idea to all hooks if
> > > > > > the memory consumption is unacceptable and shrink overall effective
> > > > > > programs array.
> > > > > >
> > > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > > > ---
> > > > > >  include/linux/bpf-cgroup-defs.h |  4 +-
> > > > > >  include/linux/bpf_lsm.h         |  6 ---
> > > > > >  kernel/bpf/bpf_lsm.c            |  9 ++--
> > > > > >  kernel/bpf/cgroup.c             | 96 ++++++++++++++++++++++++++++-----
> > > > > >  4 files changed, 90 insertions(+), 25 deletions(-)
> > > > > >
> > > > > > diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> > > > > > index 6c661b4df9fa..d42516e86b3a 100644
> > > > > > --- a/include/linux/bpf-cgroup-defs.h
> > > > > > +++ b/include/linux/bpf-cgroup-defs.h
> > > > > > @@ -10,7 +10,9 @@
> > > > > >
> > > > > >  struct bpf_prog_array;
> > > > > >
> > > > > > -#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
> > > > > > +/* Maximum number of concurrently attachable per-cgroup LSM hooks.
> > > > > > + */
> > > > > > +#define CGROUP_LSM_NUM 10
> > > > > hmm...only 10 different lsm hooks (or 10 different attach_btf_ids) can
> > > > > have BPF_LSM_CGROUP programs attached.  This feels quite limited but having
> > > > > a static 211 (and potentially growing in the future) is not good either.
> > > > > I currently do not have a better idea also. :/
> > > > >
> > > > > Have you thought about other dynamic schemes or they would be too slow ?
> > > > >
> > > > > >  enum cgroup_bpf_attach_type {
> > > > > >       CGROUP_BPF_ATTACH_TYPE_INVALID = -1,
> > > > > > diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
> > > > > > index 7f0e59f5f9be..613de44aa429 100644
> > > > > > --- a/include/linux/bpf_lsm.h
> > > > > > +++ b/include/linux/bpf_lsm.h
> > > > > > @@ -43,7 +43,6 @@ extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
> > > > > >  void bpf_inode_storage_free(struct inode *inode);
> > > > > >
> > > > > >  int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func);
> > > > > > -int bpf_lsm_hook_idx(u32 btf_id);
> > > > > >
> > > > > >  #else /* !CONFIG_BPF_LSM */
> > > > > >
> > > > > > @@ -74,11 +73,6 @@ static inline int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
> > > > > >       return -ENOENT;
> > > > > >  }
> > > > > >
> > > > > > -static inline int bpf_lsm_hook_idx(u32 btf_id)
> > > > > > -{
> > > > > > -     return -EINVAL;
> > > > > > -}
> > > > > > -
> > > > > >  #endif /* CONFIG_BPF_LSM */
> > > > > >
> > > > > >  #endif /* _LINUX_BPF_LSM_H */
> > > > > > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > > > > > index eca258ba71d8..8b948ec9ab73 100644
> > > > > > --- a/kernel/bpf/bpf_lsm.c
> > > > > > +++ b/kernel/bpf/bpf_lsm.c
> > > > > > @@ -57,10 +57,12 @@ static unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
> > > > > >       if (unlikely(!sk))
> > > > > >               return 0;
> > > > > >
> > > > > > +     rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
> > > > > >       cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> > > > > >       if (likely(cgrp))
> > > > > >               ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> > > > > >                                           ctx, bpf_prog_run, 0);
> > > > > > +     rcu_read_unlock();
> > > > > >       return ret;
> > > > > >  }
> > > > > >
> > > > > > @@ -77,7 +79,7 @@ static unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
> > > > > >       /*prog = container_of(insn, struct bpf_prog, insnsi);*/
> > > > > >       prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> > > > > >
> > > > > > -     rcu_read_lock();
> > > > > > +     rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
> > > > > I think this is also needed for task_dfl_cgroup().  If yes,
> > > > > will be a good idea to adjust the comment if it ends up
> > > > > using the 'CGROUP_LSM_NUM 10' scheme.
> > > > >
> > > > > While at rcu_read_lock(), have you thought about what major things are
> > > > > needed to make BPF_LSM_CGROUP sleepable ?
> > > > >
> > > > > The cgroup local storage could be one that require changes but it seems
> > > > > the cgroup local storage is not available to BPF_LSM_GROUP in this change set.
> > > > > The current use case doesn't need it?
> > > >
> > > > No, I haven't thought about sleepable at all yet :-( But seems like
> > > > having that rcu lock here might be problematic if we want to sleep? In
> > > > this case, Jakub's suggestion seems better.
> > > The new rcu_read_lock() here seems fine after some thoughts.
> > >
> > > I was looking at the helpers in cgroup_base_func_proto() to get a sense
> > > on sleepable support.  Only the bpf_get_local_storage caught my eyes for
> > > now because it uses a call_rcu to free the storage.  That will be the
> > > major one to change for sleepable that I can think of for now.
> >
> > That rcu_read_lock should be switched over to rcu_read_lock_trace in
> > the sleepable case I'm assuming? Are we allowed to sleep while holding
> > rcu_read_lock_trace?
> Ah. right, suddenly forgot the obvious in between emails :(
>
> In that sense, may as well remove the rcu_read_lock() here and let
> the trampoline to decide which one (rcu_read_lock or rcu_read_lock_trace)
> to call before calling the shim_prog.  The __bpf_prog_enter(_sleepable) will
> call the right rcu_read_lock(_trace) based on the prog is sleepable or not.

Removing rcu_read_lock in __cgroup_bpf_run_lsm_current might be
problematic because we also want to guarantee current's cgroup doesn't
go away. I'm assuming things like task migrating to a new cgroup and
the old one being freed can happen while we are trying to get cgroup's
effective array.

I guess BPF_PROG_RUN_ARRAY_CG will also need some work before
sleepable can happen (it calls rcu_read_lock unconditionally).

Also, it doesn't seem like BPF_PROG_RUN_ARRAY_CG's rcu usage is correct.
It receives an __rcu array_rcu pointer, takes the rcu read lock and then
does the deref. I'm assuming that array_rcu can be freed before we even
get to BPF_PROG_RUN_ARRAY_CG's rcu_read_lock? (so having rcu_read_lock
around BPF_PROG_RUN_ARRAY_CG makes sense)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program
  2022-04-12 19:01             ` Stanislav Fomichev
@ 2022-04-12 20:19               ` Martin KaFai Lau
  2022-04-12 20:36                 ` Stanislav Fomichev
  0 siblings, 1 reply; 33+ messages in thread
From: Martin KaFai Lau @ 2022-04-12 20:19 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii

On Tue, Apr 12, 2022 at 12:01:41PM -0700, Stanislav Fomichev wrote:
> On Tue, Apr 12, 2022 at 11:13 AM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Tue, Apr 12, 2022 at 09:42:40AM -0700, Stanislav Fomichev wrote:
> > > On Mon, Apr 11, 2022 at 6:36 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > >
> > > > On Mon, Apr 11, 2022 at 11:46:20AM -0700, Stanislav Fomichev wrote:
> > > > > On Fri, Apr 8, 2022 at 3:57 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > >
> > > > > > On Thu, Apr 07, 2022 at 03:31:08PM -0700, Stanislav Fomichev wrote:
> > > > > > > Previous patch adds 1:1 mapping between all 211 LSM hooks
> > > > > > > and bpf_cgroup program array. Instead of reserving a slot per
> > > > > > > possible hook, reserve 10 slots per cgroup for lsm programs.
> > > > > > > Those slots are dynamically allocated on demand and reclaimed.
> > > > > > > This still adds some bloat to the cgroup and brings us back to
> > > > > > > roughly pre-cgroup_bpf_attach_type times.
> > > > > > >
> > > > > > > It should be possible to eventually extend this idea to all hooks if
> > > > > > > the memory consumption is unacceptable and shrink overall effective
> > > > > > > programs array.
> > > > > > >
> > > > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > > > > ---
> > > > > > >  include/linux/bpf-cgroup-defs.h |  4 +-
> > > > > > >  include/linux/bpf_lsm.h         |  6 ---
> > > > > > >  kernel/bpf/bpf_lsm.c            |  9 ++--
> > > > > > >  kernel/bpf/cgroup.c             | 96 ++++++++++++++++++++++++++++-----
> > > > > > >  4 files changed, 90 insertions(+), 25 deletions(-)
> > > > > > >
> > > > > > > diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> > > > > > > index 6c661b4df9fa..d42516e86b3a 100644
> > > > > > > --- a/include/linux/bpf-cgroup-defs.h
> > > > > > > +++ b/include/linux/bpf-cgroup-defs.h
> > > > > > > @@ -10,7 +10,9 @@
> > > > > > >
> > > > > > >  struct bpf_prog_array;
> > > > > > >
> > > > > > > -#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
> > > > > > > +/* Maximum number of concurrently attachable per-cgroup LSM hooks.
> > > > > > > + */
> > > > > > > +#define CGROUP_LSM_NUM 10
> > > > > > hmm...only 10 different lsm hooks (or 10 different attach_btf_ids) can
> > > > > > have BPF_LSM_CGROUP programs attached.  This feels quite limited but having
> > > > > > a static 211 (and potentially growing in the future) is not good either.
> > > > > > I currently do not have a better idea also. :/
> > > > > >
> > > > > > Have you thought about other dynamic schemes or they would be too slow ?
> > > > > >
> > > > > > >  enum cgroup_bpf_attach_type {
> > > > > > >       CGROUP_BPF_ATTACH_TYPE_INVALID = -1,
> > > > > > > diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
> > > > > > > index 7f0e59f5f9be..613de44aa429 100644
> > > > > > > --- a/include/linux/bpf_lsm.h
> > > > > > > +++ b/include/linux/bpf_lsm.h
> > > > > > > @@ -43,7 +43,6 @@ extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
> > > > > > >  void bpf_inode_storage_free(struct inode *inode);
> > > > > > >
> > > > > > >  int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func);
> > > > > > > -int bpf_lsm_hook_idx(u32 btf_id);
> > > > > > >
> > > > > > >  #else /* !CONFIG_BPF_LSM */
> > > > > > >
> > > > > > > @@ -74,11 +73,6 @@ static inline int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
> > > > > > >       return -ENOENT;
> > > > > > >  }
> > > > > > >
> > > > > > > -static inline int bpf_lsm_hook_idx(u32 btf_id)
> > > > > > > -{
> > > > > > > -     return -EINVAL;
> > > > > > > -}
> > > > > > > -
> > > > > > >  #endif /* CONFIG_BPF_LSM */
> > > > > > >
> > > > > > >  #endif /* _LINUX_BPF_LSM_H */
> > > > > > > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > > > > > > index eca258ba71d8..8b948ec9ab73 100644
> > > > > > > --- a/kernel/bpf/bpf_lsm.c
> > > > > > > +++ b/kernel/bpf/bpf_lsm.c
> > > > > > > @@ -57,10 +57,12 @@ static unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
> > > > > > >       if (unlikely(!sk))
> > > > > > >               return 0;
> > > > > > >
> > > > > > > +     rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
> > > > > > >       cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> > > > > > >       if (likely(cgrp))
> > > > > > >               ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> > > > > > >                                           ctx, bpf_prog_run, 0);
> > > > > > > +     rcu_read_unlock();
> > > > > > >       return ret;
> > > > > > >  }
> > > > > > >
> > > > > > > @@ -77,7 +79,7 @@ static unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
> > > > > > >       /*prog = container_of(insn, struct bpf_prog, insnsi);*/
> > > > > > >       prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> > > > > > >
> > > > > > > -     rcu_read_lock();
> > > > > > > +     rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
> > > > > > I think this is also needed for task_dfl_cgroup().  If yes,
> > > > > > will be a good idea to adjust the comment if it ends up
> > > > > > using the 'CGROUP_LSM_NUM 10' scheme.
> > > > > >
> > > > > > While at rcu_read_lock(), have you thought about what major things are
> > > > > > needed to make BPF_LSM_CGROUP sleepable ?
> > > > > >
> > > > > > The cgroup local storage could be one that require changes but it seems
> > > > > > the cgroup local storage is not available to BPF_LSM_GROUP in this change set.
> > > > > > The current use case doesn't need it?
> > > > >
> > > > > No, I haven't thought about sleepable at all yet :-( But seems like
> > > > > having that rcu lock here might be problematic if we want to sleep? In
> > > > > this case, Jakub's suggestion seems better.
> > > > The new rcu_read_lock() here seems fine after some thoughts.
> > > >
> > > > I was looking at the helpers in cgroup_base_func_proto() to get a sense
> > > > on sleepable support.  Only the bpf_get_local_storage caught my eyes for
> > > > now because it uses a call_rcu to free the storage.  That will be the
> > > > major one to change for sleepable that I can think of for now.
> > >
> > > That rcu_read_lock should be switched over to rcu_read_lock_trace in
> > > the sleepable case I'm assuming? Are we allowed to sleep while holding
> > > rcu_read_lock_trace?
> > Ah. right, suddenly forgot the obvious in between emails :(
> >
> > In that sense, may as well remove the rcu_read_lock() here and let
> > the trampoline to decide which one (rcu_read_lock or rcu_read_lock_trace)
> > to call before calling the shim_prog.  The __bpf_prog_enter(_sleepable) will
> > call the right rcu_read_lock(_trace) based on the prog is sleepable or not.
> 
> Removing rcu_read_lock in __cgroup_bpf_run_lsm_current might be
> problematic because we also want to guarantee current's cgroup doesn't
> go away. I'm assuming things like task migrating to a new cgroup and
> the old one being freed can happen while we are trying to get cgroup's
> effective array.
Right, the sleepable one may need a short rcu_read_lock only up to
the point where the cgrp->bpf.effective[...] is obtained.
call_rcu_tasks_trace() is then needed to free the bpf_prog_array.

The future sleepable one may be better off with a different shim func,
not sure.  rcu_read_lock() can be added back later if reusing the
same shim func ends up being cleaner.
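
Something like this for a future sleepable shim (very rough sketch,
completely untested; run_effective_progs() is a made-up placeholder for
the existing run-array loop):

static unsigned int __cgroup_bpf_run_lsm_current_sleepable(const void *ctx,
							    const struct bpf_insn *insn)
{
	const struct bpf_prog *shim_prog;
	struct bpf_prog_array *array;
	struct cgroup *cgrp;
	unsigned int ret = 0;

	shim_prog = (const struct bpf_prog *)((void *)insn -
					      offsetof(struct bpf_prog, insnsi));

	/* short rcu_read_lock, only up to the point where the
	 * effective[] pointer is obtained
	 */
	rcu_read_lock();
	cgrp = task_dfl_cgroup(current);
	array = rcu_dereference(cgrp->bpf.effective[shim_prog->aux->cgroup_atype]);
	rcu_read_unlock();

	/* The trampoline holds rcu_read_lock_trace() around the sleepable
	 * prog, so this is only safe if the array is freed via
	 * call_rcu_tasks_trace() (and the cgroup itself is pinned some
	 * other way, e.g. a percpu ref).
	 */
	if (array)
		ret = run_effective_progs(array, ctx);
	return ret;
}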

> I guess BPF_PROG_RUN_ARRAY_CG will also need some work before
> sleepable can happen (it calls rcu_read_lock unconditionally).
Yep.  I think so.

> 
> Also, it doesn't seem like BPF_PROG_RUN_ARRAY_CG rcu usage is correct.
> It receives __rcu array_rcu, takes rcu read lock and does deref. I'm
> assuming that array_rcu can be free'd before we even get to
> BPF_PROG_RUN_ARRAY_CG's rcu_read_lock? (so having rcu_read_lock around
> BPF_PROG_RUN_ARRAY_CG makes sense)
BPF_PROG_RUN_ARRAY_CG is __always_inline though.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program
  2022-04-12 20:19               ` Martin KaFai Lau
@ 2022-04-12 20:36                 ` Stanislav Fomichev
  2022-04-12 22:13                   ` Martin KaFai Lau
  0 siblings, 1 reply; 33+ messages in thread
From: Stanislav Fomichev @ 2022-04-12 20:36 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, bpf, ast, daniel, andrii

On Tue, Apr 12, 2022 at 1:19 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Tue, Apr 12, 2022 at 12:01:41PM -0700, Stanislav Fomichev wrote:
> > On Tue, Apr 12, 2022 at 11:13 AM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > On Tue, Apr 12, 2022 at 09:42:40AM -0700, Stanislav Fomichev wrote:
> > > > On Mon, Apr 11, 2022 at 6:36 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > >
> > > > > On Mon, Apr 11, 2022 at 11:46:20AM -0700, Stanislav Fomichev wrote:
> > > > > > On Fri, Apr 8, 2022 at 3:57 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > > >
> > > > > > > On Thu, Apr 07, 2022 at 03:31:08PM -0700, Stanislav Fomichev wrote:
> > > > > > > > Previous patch adds 1:1 mapping between all 211 LSM hooks
> > > > > > > > and bpf_cgroup program array. Instead of reserving a slot per
> > > > > > > > possible hook, reserve 10 slots per cgroup for lsm programs.
> > > > > > > > Those slots are dynamically allocated on demand and reclaimed.
> > > > > > > > This still adds some bloat to the cgroup and brings us back to
> > > > > > > > roughly pre-cgroup_bpf_attach_type times.
> > > > > > > >
> > > > > > > > It should be possible to eventually extend this idea to all hooks if
> > > > > > > > the memory consumption is unacceptable and shrink overall effective
> > > > > > > > programs array.
> > > > > > > >
> > > > > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > > > > > ---
> > > > > > > >  include/linux/bpf-cgroup-defs.h |  4 +-
> > > > > > > >  include/linux/bpf_lsm.h         |  6 ---
> > > > > > > >  kernel/bpf/bpf_lsm.c            |  9 ++--
> > > > > > > >  kernel/bpf/cgroup.c             | 96 ++++++++++++++++++++++++++++-----
> > > > > > > >  4 files changed, 90 insertions(+), 25 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> > > > > > > > index 6c661b4df9fa..d42516e86b3a 100644
> > > > > > > > --- a/include/linux/bpf-cgroup-defs.h
> > > > > > > > +++ b/include/linux/bpf-cgroup-defs.h
> > > > > > > > @@ -10,7 +10,9 @@
> > > > > > > >
> > > > > > > >  struct bpf_prog_array;
> > > > > > > >
> > > > > > > > -#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
> > > > > > > > +/* Maximum number of concurrently attachable per-cgroup LSM hooks.
> > > > > > > > + */
> > > > > > > > +#define CGROUP_LSM_NUM 10
> > > > > > > hmm...only 10 different lsm hooks (or 10 different attach_btf_ids) can
> > > > > > > have BPF_LSM_CGROUP programs attached.  This feels quite limited but having
> > > > > > > a static 211 (and potentially growing in the future) is not good either.
> > > > > > > I currently do not have a better idea also. :/
> > > > > > >
> > > > > > > Have you thought about other dynamic schemes or they would be too slow ?
> > > > > > >
> > > > > > > >  enum cgroup_bpf_attach_type {
> > > > > > > >       CGROUP_BPF_ATTACH_TYPE_INVALID = -1,
> > > > > > > > diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
> > > > > > > > index 7f0e59f5f9be..613de44aa429 100644
> > > > > > > > --- a/include/linux/bpf_lsm.h
> > > > > > > > +++ b/include/linux/bpf_lsm.h
> > > > > > > > @@ -43,7 +43,6 @@ extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
> > > > > > > >  void bpf_inode_storage_free(struct inode *inode);
> > > > > > > >
> > > > > > > >  int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func);
> > > > > > > > -int bpf_lsm_hook_idx(u32 btf_id);
> > > > > > > >
> > > > > > > >  #else /* !CONFIG_BPF_LSM */
> > > > > > > >
> > > > > > > > @@ -74,11 +73,6 @@ static inline int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
> > > > > > > >       return -ENOENT;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > -static inline int bpf_lsm_hook_idx(u32 btf_id)
> > > > > > > > -{
> > > > > > > > -     return -EINVAL;
> > > > > > > > -}
> > > > > > > > -
> > > > > > > >  #endif /* CONFIG_BPF_LSM */
> > > > > > > >
> > > > > > > >  #endif /* _LINUX_BPF_LSM_H */
> > > > > > > > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > > > > > > > index eca258ba71d8..8b948ec9ab73 100644
> > > > > > > > --- a/kernel/bpf/bpf_lsm.c
> > > > > > > > +++ b/kernel/bpf/bpf_lsm.c
> > > > > > > > @@ -57,10 +57,12 @@ static unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
> > > > > > > >       if (unlikely(!sk))
> > > > > > > >               return 0;
> > > > > > > >
> > > > > > > > +     rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
> > > > > > > >       cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> > > > > > > >       if (likely(cgrp))
> > > > > > > >               ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> > > > > > > >                                           ctx, bpf_prog_run, 0);
> > > > > > > > +     rcu_read_unlock();
> > > > > > > >       return ret;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > @@ -77,7 +79,7 @@ static unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
> > > > > > > >       /*prog = container_of(insn, struct bpf_prog, insnsi);*/
> > > > > > > >       prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> > > > > > > >
> > > > > > > > -     rcu_read_lock();
> > > > > > > > +     rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
> > > > > > > I think this is also needed for task_dfl_cgroup().  If yes,
> > > > > > > will be a good idea to adjust the comment if it ends up
> > > > > > > using the 'CGROUP_LSM_NUM 10' scheme.
> > > > > > >
> > > > > > > While at rcu_read_lock(), have you thought about what major things are
> > > > > > > needed to make BPF_LSM_CGROUP sleepable ?
> > > > > > >
> > > > > > > The cgroup local storage could be one that require changes but it seems
> > > > > > > the cgroup local storage is not available to BPF_LSM_GROUP in this change set.
> > > > > > > The current use case doesn't need it?
> > > > > >
> > > > > > No, I haven't thought about sleepable at all yet :-( But seems like
> > > > > > having that rcu lock here might be problematic if we want to sleep? In
> > > > > > this case, Jakub's suggestion seems better.
> > > > > The new rcu_read_lock() here seems fine after some thoughts.
> > > > >
> > > > > I was looking at the helpers in cgroup_base_func_proto() to get a sense
> > > > > on sleepable support.  Only the bpf_get_local_storage caught my eyes for
> > > > > now because it uses a call_rcu to free the storage.  That will be the
> > > > > major one to change for sleepable that I can think of for now.
> > > >
> > > > That rcu_read_lock should be switched over to rcu_read_lock_trace in
> > > > the sleepable case I'm assuming? Are we allowed to sleep while holding
> > > > rcu_read_lock_trace?
> > > Ah. right, suddenly forgot the obvious in between emails :(
> > >
> > > In that sense, may as well remove the rcu_read_lock() here and let
> > > the trampoline to decide which one (rcu_read_lock or rcu_read_lock_trace)
> > > to call before calling the shim_prog.  The __bpf_prog_enter(_sleepable) will
> > > call the right rcu_read_lock(_trace) based on the prog is sleepable or not.
> >
> > Removing rcu_read_lock in __cgroup_bpf_run_lsm_current might be
> > problematic because we also want to guarantee current's cgroup doesn't
> > go away. I'm assuming things like task migrating to a new cgroup and
> > the old one being freed can happen while we are trying to get cgroup's
> > effective array.
> Right, sleepable one may need a short rcu_read_lock only upto
> a point that the cgrp->bpf.effective[...] is obtained.
> call_rcu_tasks_trace() is then needed to free the bpf_prog_array.
>
> The future sleepable one may be better off to have a different shim func,
> not sure.  rcu_read_lock() can be added back later if it ends up reusing
> the same shim func is cleaner.

In this case I'll probably have the rcu_read_lock cover both the cgroup
lookup and bpf_lsm_attach_type_get for the current shim.

> > I guess BPF_PROG_RUN_ARRAY_CG will also need some work before
> > sleepable can happen (it calls rcu_read_lock unconditionally).
> Yep.  I think so.
>
> >
> > Also, it doesn't seem like BPF_PROG_RUN_ARRAY_CG rcu usage is correct.
> > It receives __rcu array_rcu, takes rcu read lock and does deref. I'm
> > assuming that array_rcu can be free'd before we even get to
> > BPF_PROG_RUN_ARRAY_CG's rcu_read_lock? (so having rcu_read_lock around
> > BPF_PROG_RUN_ARRAY_CG makes sense)
> BPF_PROG_RUN_ARRAY_CG is __always_inline though.

Does it help? This should still expand to the following, right?

array_rcu = cgrp->bpf.effective[atype];

/* theoretically, array_rcu can be freed here? */

rcu_read_lock();
array = rcu_dereference(array_rcu);
...

Feels like the callers of BPF_PROG_RUN_ARRAY_CG really have to care
about rcu locking, not the BPF_PROG_RUN_ARRAY_CG itself.
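
IOW, something along these lines (rough fragment, untested;
bpf_prog_run_array_cg() here is a made-up variant that takes a plain,
already-dereferenced struct bpf_prog_array pointer):

rcu_read_lock();
cgrp = task_dfl_cgroup(current);
/* deref happens while the caller holds the rcu read lock */
array = rcu_dereference(cgrp->bpf.effective[prog->aux->cgroup_atype]);
ret = bpf_prog_run_array_cg(array, ctx, bpf_prog_run, 0);
rcu_read_unlock();

Then the helper itself doesn't have to guess what protects the array.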

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program
  2022-04-12 20:36                 ` Stanislav Fomichev
@ 2022-04-12 22:13                   ` Martin KaFai Lau
  2022-04-12 22:42                     ` Stanislav Fomichev
  0 siblings, 1 reply; 33+ messages in thread
From: Martin KaFai Lau @ 2022-04-12 22:13 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii

On Tue, Apr 12, 2022 at 01:36:45PM -0700, Stanislav Fomichev wrote:
> On Tue, Apr 12, 2022 at 1:19 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Tue, Apr 12, 2022 at 12:01:41PM -0700, Stanislav Fomichev wrote:
> > > On Tue, Apr 12, 2022 at 11:13 AM Martin KaFai Lau <kafai@fb.com> wrote:
> > > >
> > > > On Tue, Apr 12, 2022 at 09:42:40AM -0700, Stanislav Fomichev wrote:
> > > > > On Mon, Apr 11, 2022 at 6:36 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > >
> > > > > > On Mon, Apr 11, 2022 at 11:46:20AM -0700, Stanislav Fomichev wrote:
> > > > > > > On Fri, Apr 8, 2022 at 3:57 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Apr 07, 2022 at 03:31:08PM -0700, Stanislav Fomichev wrote:
> > > > > > > > > Previous patch adds 1:1 mapping between all 211 LSM hooks
> > > > > > > > > and bpf_cgroup program array. Instead of reserving a slot per
> > > > > > > > > possible hook, reserve 10 slots per cgroup for lsm programs.
> > > > > > > > > Those slots are dynamically allocated on demand and reclaimed.
> > > > > > > > > This still adds some bloat to the cgroup and brings us back to
> > > > > > > > > roughly pre-cgroup_bpf_attach_type times.
> > > > > > > > >
> > > > > > > > > It should be possible to eventually extend this idea to all hooks if
> > > > > > > > > the memory consumption is unacceptable and shrink overall effective
> > > > > > > > > programs array.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > > > > > > ---
> > > > > > > > >  include/linux/bpf-cgroup-defs.h |  4 +-
> > > > > > > > >  include/linux/bpf_lsm.h         |  6 ---
> > > > > > > > >  kernel/bpf/bpf_lsm.c            |  9 ++--
> > > > > > > > >  kernel/bpf/cgroup.c             | 96 ++++++++++++++++++++++++++++-----
> > > > > > > > >  4 files changed, 90 insertions(+), 25 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> > > > > > > > > index 6c661b4df9fa..d42516e86b3a 100644
> > > > > > > > > --- a/include/linux/bpf-cgroup-defs.h
> > > > > > > > > +++ b/include/linux/bpf-cgroup-defs.h
> > > > > > > > > @@ -10,7 +10,9 @@
> > > > > > > > >
> > > > > > > > >  struct bpf_prog_array;
> > > > > > > > >
> > > > > > > > > -#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
> > > > > > > > > +/* Maximum number of concurrently attachable per-cgroup LSM hooks.
> > > > > > > > > + */
> > > > > > > > > +#define CGROUP_LSM_NUM 10
> > > > > > > > hmm...only 10 different lsm hooks (or 10 different attach_btf_ids) can
> > > > > > > > have BPF_LSM_CGROUP programs attached.  This feels quite limited but having
> > > > > > > > a static 211 (and potentially growing in the future) is not good either.
> > > > > > > > I currently do not have a better idea also. :/
> > > > > > > >
> > > > > > > > Have you thought about other dynamic schemes or they would be too slow ?
> > > > > > > >
> > > > > > > > >  enum cgroup_bpf_attach_type {
> > > > > > > > >       CGROUP_BPF_ATTACH_TYPE_INVALID = -1,
> > > > > > > > > diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
> > > > > > > > > index 7f0e59f5f9be..613de44aa429 100644
> > > > > > > > > --- a/include/linux/bpf_lsm.h
> > > > > > > > > +++ b/include/linux/bpf_lsm.h
> > > > > > > > > @@ -43,7 +43,6 @@ extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
> > > > > > > > >  void bpf_inode_storage_free(struct inode *inode);
> > > > > > > > >
> > > > > > > > >  int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func);
> > > > > > > > > -int bpf_lsm_hook_idx(u32 btf_id);
> > > > > > > > >
> > > > > > > > >  #else /* !CONFIG_BPF_LSM */
> > > > > > > > >
> > > > > > > > > @@ -74,11 +73,6 @@ static inline int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
> > > > > > > > >       return -ENOENT;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > -static inline int bpf_lsm_hook_idx(u32 btf_id)
> > > > > > > > > -{
> > > > > > > > > -     return -EINVAL;
> > > > > > > > > -}
> > > > > > > > > -
> > > > > > > > >  #endif /* CONFIG_BPF_LSM */
> > > > > > > > >
> > > > > > > > >  #endif /* _LINUX_BPF_LSM_H */
> > > > > > > > > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > > > > > > > > index eca258ba71d8..8b948ec9ab73 100644
> > > > > > > > > --- a/kernel/bpf/bpf_lsm.c
> > > > > > > > > +++ b/kernel/bpf/bpf_lsm.c
> > > > > > > > > @@ -57,10 +57,12 @@ static unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
> > > > > > > > >       if (unlikely(!sk))
> > > > > > > > >               return 0;
> > > > > > > > >
> > > > > > > > > +     rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
> > > > > > > > >       cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> > > > > > > > >       if (likely(cgrp))
> > > > > > > > >               ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> > > > > > > > >                                           ctx, bpf_prog_run, 0);
> > > > > > > > > +     rcu_read_unlock();
> > > > > > > > >       return ret;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > @@ -77,7 +79,7 @@ static unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
> > > > > > > > >       /*prog = container_of(insn, struct bpf_prog, insnsi);*/
> > > > > > > > >       prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> > > > > > > > >
> > > > > > > > > -     rcu_read_lock();
> > > > > > > > > +     rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
> > > > > > > > I think this is also needed for task_dfl_cgroup().  If yes,
> > > > > > > > will be a good idea to adjust the comment if it ends up
> > > > > > > > using the 'CGROUP_LSM_NUM 10' scheme.
> > > > > > > >
> > > > > > > > While at rcu_read_lock(), have you thought about what major things are
> > > > > > > > needed to make BPF_LSM_CGROUP sleepable ?
> > > > > > > >
> > > > > > > > The cgroup local storage could be one that require changes but it seems
> > > > > > > > the cgroup local storage is not available to BPF_LSM_GROUP in this change set.
> > > > > > > > The current use case doesn't need it?
> > > > > > >
> > > > > > > No, I haven't thought about sleepable at all yet :-( But seems like
> > > > > > > having that rcu lock here might be problematic if we want to sleep? In
> > > > > > > this case, Jakub's suggestion seems better.
> > > > > > The new rcu_read_lock() here seems fine after some thoughts.
> > > > > >
> > > > > > I was looking at the helpers in cgroup_base_func_proto() to get a sense
> > > > > > on sleepable support.  Only the bpf_get_local_storage caught my eyes for
> > > > > > now because it uses a call_rcu to free the storage.  That will be the
> > > > > > major one to change for sleepable that I can think of for now.
> > > > >
> > > > > That rcu_read_lock should be switched over to rcu_read_lock_trace in
> > > > > the sleepable case I'm assuming? Are we allowed to sleep while holding
> > > > > rcu_read_lock_trace?
> > > > Ah. right, suddenly forgot the obvious in between emails :(
> > > >
> > > > In that sense, may as well remove the rcu_read_lock() here and let
> > > > the trampoline to decide which one (rcu_read_lock or rcu_read_lock_trace)
> > > > to call before calling the shim_prog.  The __bpf_prog_enter(_sleepable) will
> > > > call the right rcu_read_lock(_trace) based on the prog is sleepable or not.
> > >
> > > Removing rcu_read_lock in __cgroup_bpf_run_lsm_current might be
> > > problematic because we also want to guarantee current's cgroup doesn't
> > > go away. I'm assuming things like task migrating to a new cgroup and
> > > the old one being freed can happen while we are trying to get cgroup's
> > > effective array.
> > Right, sleepable one may need a short rcu_read_lock only upto
> > a point that the cgrp->bpf.effective[...] is obtained.
> > call_rcu_tasks_trace() is then needed to free the bpf_prog_array.
> >
> > The future sleepable one may be better off to have a different shim func,
> > not sure.  rcu_read_lock() can be added back later if it ends up reusing
> > the same shim func is cleaner.
> 
> In this case I'll probably have rcu_read_lock for
> cgroup+bpf_lsm_attach_type_get for the current shim.
yeah, depending on the rcu grace period to free up cgroup_lsm_atype_btf_id
should be fine.  It just needs to wait for another grace period for sleepable
in the future.

Also, it just came to my mind that if we want sleepable and non-sleepable
progs to be in the same cgrp->bpf.effective[] array, it may need more
thought on when to do the rcu_read_lock() and rcu_read_lock_trace().

> > > I guess BPF_PROG_RUN_ARRAY_CG will also need some work before
> > > sleepable can happen (it calls rcu_read_lock unconditionally).
> > Yep.  I think so.
> >
> > >
> > > Also, it doesn't seem like BPF_PROG_RUN_ARRAY_CG rcu usage is correct.
> > > It receives __rcu array_rcu, takes rcu read lock and does deref. I'm
> > > assuming that array_rcu can be free'd before we even get to
> > > BPF_PROG_RUN_ARRAY_CG's rcu_read_lock? (so having rcu_read_lock around
> > > BPF_PROG_RUN_ARRAY_CG makes sense)
> > BPF_PROG_RUN_ARRAY_CG is __always_inline though.
> 
> Does it help? This should still expand to the following, right?
> 
> array_rcu = cgrp->bpf.effective[atype];
I think you are right:

86   	 	   ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
0xffffffff812534bb <+155>:	mov    -0x10(%rbx),%rdx
0xffffffff812534bf <+159>:	movl   $0x0,-0x38(%rbp)
0xffffffff812534c6 <+166>:	movslq 0x300(%rdx),%rdx
0xffffffff812534cd <+173>:	mov    0x500(%rax,%rdx,8),%rbx

[ ... ]

1375	array = rcu_dereference(array_rcu);
0xffffffff8125350d <+237>:	callq  0xffffffff81145a50 <rcu_read_lock_held>
0xffffffff81253512 <+242>:	test   %eax,%eax
0xffffffff81253514 <+244>:	je     0xffffffff812537a7 <__cgroup_bpf_run_lsm_current+903>

[ ... ]

1376        item = &array->items[0];
0xffffffff8125351a <+250>:    lea    -0x40(%rbp),%rdx
0xffffffff8125351e <+254>:    mov    %gs:0x1af40,%rax
0xffffffff81253527 <+263>:    lea    0x10(%rbx),%r12

[ ... ]

1378        while ((prog = READ_ONCE(item->prog))) {
0xffffffff81253541 <+289>:    test   %rbx,%rbx
0xffffffff81253544 <+292>:    je     0xffffffff81253596 <__cgroup_bpf_run_lsm_current+374>


Do you know if a macro can work as expected?
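
i.e. whether plain textual substitution would be enough to move the
load inside the read section, roughly (illustration only, not a real
patch; __run_array() stands in for the existing items loop):

/* With a macro, array_rcu is pasted in textually, so the
 * cgrp->bpf.effective[atype] load would only happen at the
 * rcu_dereference() below, inside the rcu read section.
 */
#define BPF_PROG_RUN_ARRAY_CG(array_rcu, ctx, func, ret_flags)	({	\
	u32 __ret;							\
	rcu_read_lock();						\
	__ret = __run_array(rcu_dereference(array_rcu), ctx, func,	\
			    ret_flags);					\
	rcu_read_unlock();						\
	__ret;								\
})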


> /* theoretically, array_rcu can be freed here? */
> 
> rcu_read_lock();
> array = rcu_dereference(array_rcu);
> ...
> 
> Feels like the callers of BPF_PROG_RUN_ARRAY_CG really have to care
> about rcu locking, not the BPF_PROG_RUN_ARRAY_CG itself.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program
  2022-04-12 22:13                   ` Martin KaFai Lau
@ 2022-04-12 22:42                     ` Stanislav Fomichev
  0 siblings, 0 replies; 33+ messages in thread
From: Stanislav Fomichev @ 2022-04-12 22:42 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, bpf, ast, daniel, andrii

On Tue, Apr 12, 2022 at 3:13 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Tue, Apr 12, 2022 at 01:36:45PM -0700, Stanislav Fomichev wrote:
> > On Tue, Apr 12, 2022 at 1:19 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > On Tue, Apr 12, 2022 at 12:01:41PM -0700, Stanislav Fomichev wrote:
> > > > On Tue, Apr 12, 2022 at 11:13 AM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > >
> > > > > On Tue, Apr 12, 2022 at 09:42:40AM -0700, Stanislav Fomichev wrote:
> > > > > > On Mon, Apr 11, 2022 at 6:36 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > > >
> > > > > > > On Mon, Apr 11, 2022 at 11:46:20AM -0700, Stanislav Fomichev wrote:
> > > > > > > > On Fri, Apr 8, 2022 at 3:57 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Apr 07, 2022 at 03:31:08PM -0700, Stanislav Fomichev wrote:
> > > > > > > > > > Previous patch adds 1:1 mapping between all 211 LSM hooks
> > > > > > > > > > and bpf_cgroup program array. Instead of reserving a slot per
> > > > > > > > > > possible hook, reserve 10 slots per cgroup for lsm programs.
> > > > > > > > > > Those slots are dynamically allocated on demand and reclaimed.
> > > > > > > > > > This still adds some bloat to the cgroup and brings us back to
> > > > > > > > > > roughly pre-cgroup_bpf_attach_type times.
> > > > > > > > > >
> > > > > > > > > > It should be possible to eventually extend this idea to all hooks if
> > > > > > > > > > the memory consumption is unacceptable and shrink overall effective
> > > > > > > > > > programs array.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > > > > > > > ---
> > > > > > > > > >  include/linux/bpf-cgroup-defs.h |  4 +-
> > > > > > > > > >  include/linux/bpf_lsm.h         |  6 ---
> > > > > > > > > >  kernel/bpf/bpf_lsm.c            |  9 ++--
> > > > > > > > > >  kernel/bpf/cgroup.c             | 96 ++++++++++++++++++++++++++++-----
> > > > > > > > > >  4 files changed, 90 insertions(+), 25 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> > > > > > > > > > index 6c661b4df9fa..d42516e86b3a 100644
> > > > > > > > > > --- a/include/linux/bpf-cgroup-defs.h
> > > > > > > > > > +++ b/include/linux/bpf-cgroup-defs.h
> > > > > > > > > > @@ -10,7 +10,9 @@
> > > > > > > > > >
> > > > > > > > > >  struct bpf_prog_array;
> > > > > > > > > >
> > > > > > > > > > -#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */
> > > > > > > > > > +/* Maximum number of concurrently attachable per-cgroup LSM hooks.
> > > > > > > > > > + */
> > > > > > > > > > +#define CGROUP_LSM_NUM 10
> > > > > > > > > hmm...only 10 different lsm hooks (or 10 different attach_btf_ids) can
> > > > > > > > > have BPF_LSM_CGROUP programs attached.  This feels quite limited but having
> > > > > > > > > a static 211 (and potentially growing in the future) is not good either.
> > > > > > > > > I currently do not have a better idea either. :/
> > > > > > > > >
> > > > > > > > > Have you thought about other dynamic schemes, or would they be too slow?
> > > > > > > > >
> > > > > > > > > >  enum cgroup_bpf_attach_type {
> > > > > > > > > >       CGROUP_BPF_ATTACH_TYPE_INVALID = -1,
> > > > > > > > > > diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
> > > > > > > > > > index 7f0e59f5f9be..613de44aa429 100644
> > > > > > > > > > --- a/include/linux/bpf_lsm.h
> > > > > > > > > > +++ b/include/linux/bpf_lsm.h
> > > > > > > > > > @@ -43,7 +43,6 @@ extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
> > > > > > > > > >  void bpf_inode_storage_free(struct inode *inode);
> > > > > > > > > >
> > > > > > > > > >  int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func);
> > > > > > > > > > -int bpf_lsm_hook_idx(u32 btf_id);
> > > > > > > > > >
> > > > > > > > > >  #else /* !CONFIG_BPF_LSM */
> > > > > > > > > >
> > > > > > > > > > @@ -74,11 +73,6 @@ static inline int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
> > > > > > > > > >       return -ENOENT;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > -static inline int bpf_lsm_hook_idx(u32 btf_id)
> > > > > > > > > > -{
> > > > > > > > > > -     return -EINVAL;
> > > > > > > > > > -}
> > > > > > > > > > -
> > > > > > > > > >  #endif /* CONFIG_BPF_LSM */
> > > > > > > > > >
> > > > > > > > > >  #endif /* _LINUX_BPF_LSM_H */
> > > > > > > > > > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > > > > > > > > > index eca258ba71d8..8b948ec9ab73 100644
> > > > > > > > > > --- a/kernel/bpf/bpf_lsm.c
> > > > > > > > > > +++ b/kernel/bpf/bpf_lsm.c
> > > > > > > > > > @@ -57,10 +57,12 @@ static unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
> > > > > > > > > >       if (unlikely(!sk))
> > > > > > > > > >               return 0;
> > > > > > > > > >
> > > > > > > > > > +     rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
> > > > > > > > > >       cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> > > > > > > > > >       if (likely(cgrp))
> > > > > > > > > >               ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> > > > > > > > > >                                           ctx, bpf_prog_run, 0);
> > > > > > > > > > +     rcu_read_unlock();
> > > > > > > > > >       return ret;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > @@ -77,7 +79,7 @@ static unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
> > > > > > > > > >       /*prog = container_of(insn, struct bpf_prog, insnsi);*/
> > > > > > > > > >       prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> > > > > > > > > >
> > > > > > > > > > -     rcu_read_lock();
> > > > > > > > > > +     rcu_read_lock(); /* See bpf_lsm_attach_type_get(). */
> > > > > > > > > I think this is also needed for task_dfl_cgroup().  If yes,
> > > > > > > > > it will be a good idea to adjust the comment if it ends up
> > > > > > > > > using the 'CGROUP_LSM_NUM 10' scheme.
> > > > > > > > >
> > > > > > > > > While at rcu_read_lock(), have you thought about what major things are
> > > > > > > > > needed to make BPF_LSM_CGROUP sleepable ?
> > > > > > > > >
> > > > > > > > > The cgroup local storage could be one that requires changes, but it seems
> > > > > > > > > the cgroup local storage is not available to BPF_LSM_CGROUP in this change set.
> > > > > > > > > The current use case doesn't need it?
> > > > > > > >
> > > > > > > > No, I haven't thought about sleepable at all yet :-( But seems like
> > > > > > > > having that rcu lock here might be problematic if we want to sleep? In
> > > > > > > > this case, Jakub's suggestion seems better.
> > > > > > > The new rcu_read_lock() here seems fine after some thoughts.
> > > > > > >
> > > > > > > I was looking at the helpers in cgroup_base_func_proto() to get a sense
> > > > > > > on sleepable support.  Only the bpf_get_local_storage caught my eyes for
> > > > > > > now because it uses a call_rcu to free the storage.  That will be the
> > > > > > > major one to change for sleepable that I can think of for now.
> > > > > >
> > > > > > That rcu_read_lock should be switched over to rcu_read_lock_trace in
> > > > > > the sleepable case I'm assuming? Are we allowed to sleep while holding
> > > > > > rcu_read_lock_trace?
> > > > > Ah. right, suddenly forgot the obvious in between emails :(
> > > > >
> > > > > In that sense, we may as well remove the rcu_read_lock() here and let
> > > > > the trampoline decide which one (rcu_read_lock or rcu_read_lock_trace)
> > > > > to call before calling the shim_prog.  The __bpf_prog_enter(_sleepable) will
> > > > > call the right rcu_read_lock(_trace) based on whether the prog is sleepable or not.
> > > >
> > > > Removing rcu_read_lock in __cgroup_bpf_run_lsm_current might be
> > > > problematic because we also want to guarantee current's cgroup doesn't
> > > > go away. I'm assuming things like a task migrating to a new cgroup and
> > > > the old one being freed can happen while we are trying to get the
> > > > cgroup's effective array.
> > > Right, the sleepable one may need a short rcu_read_lock only up to
> > > the point where the cgrp->bpf.effective[...] is obtained.
> > > call_rcu_tasks_trace() is then needed to free the bpf_prog_array.
> > >
> > > The future sleepable one may be better off with a different shim func,
> > > not sure.  rcu_read_lock() can be added back later if it turns out that
> > > reusing the same shim func is cleaner.
> >
> > In this case I'll probably have rcu_read_lock for
> > cgroup+bpf_lsm_attach_type_get for the current shim.
> yeah, depending on an rcu grace period to free up cgroup_lsm_atype_btf_id
> should be fine.  It just needs to wait for another grace period for sleepable
> in the future.
>
> Also, it just came to my mind: if we want sleepable and non-sleepable progs
> to be in the same cgrp->bpf.effective[] array, it may need more
> thought on when to do rcu_read_lock() vs rcu_read_lock_trace().

Ack. I'll try to put these details into a commit message so once we
get to the sleepable support we won't have to do these investigations
again.
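
For posterity, a rough sketch of what the sleepable flavor could look
like, based purely on the discussion above (nothing here is part of
this series; in the real code the trampoline's
__bpf_prog_enter_sleepable() would take rcu_read_lock_trace() and
migrate_disable() around each program, which is folded into the shim
here for readability):

/* Sketch only: short plain-RCU section just to look up the effective
 * array; tasks-trace RCU held across the program invocations.  Assumes
 * the bpf_prog_array is freed via call_rcu_tasks_trace() so it stays
 * valid after rcu_read_unlock().
 */
static unsigned int __cgroup_bpf_run_lsm_current_sleepable(const void *ctx,
                                                            const struct bpf_prog *prog)
{
        const struct bpf_prog_array_item *item;
        const struct bpf_prog_array *array;
        const struct bpf_prog *p;
        struct cgroup *cgrp;
        unsigned int ret = 1;

        rcu_read_lock_trace();

        /* Plain RCU only long enough to make sure current's cgroup and
         * the effective[] pointer don't go away under us.
         */
        rcu_read_lock();
        cgrp = task_dfl_cgroup(current);
        array = rcu_dereference(cgrp->bpf.effective[prog->aux->cgroup_atype]);
        rcu_read_unlock();

        for (item = &array->items[0]; (p = READ_ONCE(item->prog)); item++)
                ret &= bpf_prog_run(p, ctx);

        rcu_read_unlock_trace();
        return ret;
}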

> > > > I guess BPF_PROG_RUN_ARRAY_CG will also need some work before
> > > > sleepable can happen (it calls rcu_read_lock unconditionally).
> > > Yep.  I think so.
> > >
> > > >
> > > > Also, it doesn't seem like BPF_PROG_RUN_ARRAY_CG's rcu usage is correct.
> > > > It receives the __rcu array_rcu, takes the rcu read lock and does the deref. I'm
> > > > assuming that array_rcu can be freed before we even get to
> > > > BPF_PROG_RUN_ARRAY_CG's rcu_read_lock? (so having rcu_read_lock around
> > > > BPF_PROG_RUN_ARRAY_CG makes sense)
> > > BPF_PROG_RUN_ARRAY_CG is __always_inline though.
> >
> > Does it help? This should still expand to the following, right?
> >
> > array_rcu = cgrp->bpf.effective[atype];
> I think you are right:
>
> 86                 ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> 0xffffffff812534bb <+155>:      mov    -0x10(%rbx),%rdx
> 0xffffffff812534bf <+159>:      movl   $0x0,-0x38(%rbp)
> 0xffffffff812534c6 <+166>:      movslq 0x300(%rdx),%rdx
> 0xffffffff812534cd <+173>:      mov    0x500(%rax,%rdx,8),%rbx
>
> [ ... ]
>
> 1375    array = rcu_dereference(array_rcu);
> 0xffffffff8125350d <+237>:      callq  0xffffffff81145a50 <rcu_read_lock_held>
> 0xffffffff81253512 <+242>:      test   %eax,%eax
> 0xffffffff81253514 <+244>:      je     0xffffffff812537a7 <__cgroup_bpf_run_lsm_current+903>
>
> [ ... ]
>
> 1376        item = &array->items[0];
> 0xffffffff8125351a <+250>:    lea    -0x40(%rbp),%rdx
> 0xffffffff8125351e <+254>:    mov    %gs:0x1af40,%rax
> 0xffffffff81253527 <+263>:    lea    0x10(%rbx),%r12
>
> [ ... ]
>
> 1378        while ((prog = READ_ONCE(item->prog))) {
> 0xffffffff81253541 <+289>:    test   %rbx,%rbx
> 0xffffffff81253544 <+292>:    je     0xffffffff81253596 <__cgroup_bpf_run_lsm_current+374>
>
>
> Do you know if a macro can work as expected ?
>
>
> > /* theoretically, array_rcu can be freed here? */
> >
> > rcu_read_lock();
> > array = rcu_dereference(array_rcu);
> > ...
> >
> > Feels like the callers of BPF_PROG_RUN_ARRAY_CG really have to care
> > about rcu locking, not the BPF_PROG_RUN_ARRAY_CG itself.

Oh, right, they've been broken since we converted from a define to an
inline function. With the define it should've been working correctly.

I can move those rcu_read_lock calls to the callers of
BPF_PROG_RUN_ARRAY_CG_FLAGS/BPF_PROG_RUN_ARRAY_CG/BPF_PROG_RUN_ARRAY.
Doesn't seem like going back to the defines is the way to go.
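
To spell out why the old #define behaved differently: with a macro, the
cgrp->bpf.effective[...] argument is substituted textually into the
macro body, so the __rcu pointer is only read after the macro's own
rcu_read_lock(); with an (even __always_inline) function, C evaluates
the argument at the call site, before the body runs, which is exactly
what the disassembly above shows.  Roughly what the caller-side fix
would look like (illustrative sketch, not the actual patch):

static unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
                                                 const struct bpf_insn *insn)
{
        const struct bpf_prog *prog;
        struct cgroup *cgrp;
        unsigned int ret = 0;

        prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));

        /* Hold RCU in the caller, around both the read of the __rcu
         * effective[] pointer and the array walk; BPF_PROG_RUN_ARRAY_CG
         * itself would then stop taking rcu_read_lock() internally.
         */
        rcu_read_lock();
        cgrp = task_dfl_cgroup(current);
        ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
                                    ctx, bpf_prog_run, 0);
        rcu_read_unlock();
        return ret;
}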

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program
  2022-04-11 18:44       ` Stanislav Fomichev
@ 2022-04-15 17:39         ` Jakub Sitnicki
  2022-04-15 18:46           ` Stanislav Fomichev
  0 siblings, 1 reply; 33+ messages in thread
From: Jakub Sitnicki @ 2022-04-15 17:39 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: Martin KaFai Lau, netdev, bpf, ast, daniel, andrii

On Mon, Apr 11, 2022 at 11:44 AM -07, Stanislav Fomichev wrote:
> On Sat, Apr 9, 2022 at 11:10 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:

[...]

>> [^1] It looks like we can easily switch cgroup->bpf.progs[] from
>>      list_head to hlist_head and save some bytes!
>>
>>      We only access the list tail in __cgroup_bpf_attach(). We can
>>      either iterate over the list and eat the cost there or push the new
>>      prog onto the front.
>>
>>      I think we treat cgroup->bpf.progs[] everywhere like an unordered
>>      set. Except for __cgroup_bpf_query, where the user might notice the
>>      order change in the BPF_PROG_QUERY dump.
>
>
> [...]
>
>> [^2] Unrelated, but we would like to propose a
>>      CGROUP_INET[46]_POST_CONNECT hook in the near future to make it
>>      easier to bind UDP sockets to 4-tuple without creating conflicts:
>>
>>      https://github.com/cloudflare/cloudflare-blog/tree/master/2022-02-connectx/ebpf_connect4
>
> Do you think those new lsm hooks can be used instead? If not, what's missing?

Same as for CGROUP_INET hooks, there is no post-connect() LSM hook.

Why are we looking for a post-connect hook?

Having a pre- and a post-connect hook would allow us to turn the whole
connect() syscall into a critical section with synchronization done in
BPF - lock on pre-connect, unlock on post-connect.

Why do we want to serialize connect() calls?

To check for a 4-tuple conflict with an existing unicast UDP socket, in
which case we want to fail connect().

That said, ideally we would rather have a mechanism like
IP_BIND_ADDRESS_NO_PORT, but for UDP, and one that allows selecting both
a local IP and port.

We're hoping to put together an RFC sometime this quarter.
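
For context, this is roughly how IP_BIND_ADDRESS_NO_PORT is used on the
TCP side today (minimal userspace sketch; the hypothetical UDP
counterpart that also lets the caller pick the port is exactly the part
that is missing):

#include <netinet/in.h>
#include <sys/socket.h>

/* Pin the source IP at bind() time but let the kernel defer source port
 * selection until connect(), when the full 4-tuple is known.
 * src->sin_port is expected to be 0.
 */
int connect_from(int fd, const struct sockaddr_in *src,
                 const struct sockaddr_in *dst)
{
        int one = 1;

        if (setsockopt(fd, IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, &one, sizeof(one)))
                return -1;
        if (bind(fd, (const struct sockaddr *)src, sizeof(*src)))
                return -1;
        return connect(fd, (const struct sockaddr *)dst, sizeof(*dst));
}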

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program
  2022-04-15 17:39         ` Jakub Sitnicki
@ 2022-04-15 18:46           ` Stanislav Fomichev
  0 siblings, 0 replies; 33+ messages in thread
From: Stanislav Fomichev @ 2022-04-15 18:46 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: Martin KaFai Lau, netdev, bpf, ast, daniel, andrii

On Fri, Apr 15, 2022 at 10:49 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Mon, Apr 11, 2022 at 11:44 AM -07, Stanislav Fomichev wrote:
> > On Sat, Apr 9, 2022 at 11:10 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> [...]
>
> >> [^1] It looks like we can easily switch cgroup->bpf.progs[] from
> >>      list_head to hlist_head and save some bytes!
> >>
> >>      We only access the list tail in __cgroup_bpf_attach(). We can
> >>      either iterate over the list and eat the cost there or push the new
> >>      prog onto the front.
> >>
> >>      I think we treat cgroup->bpf.progs[] everywhere like an unordered
> >>      set. Except for __cgroup_bpf_query, where the user might notice the
> >>      order change in the BPF_PROG_QUERY dump.
> >
> >
> > [...]
> >
> >> [^2] Unrelated, but we would like to propose a
> >>      CGROUP_INET[46]_POST_CONNECT hook in the near future to make it
> >>      easier to bind UDP sockets to 4-tuple without creating conflicts:
> >>
> >>      https://github.com/cloudflare/cloudflare-blog/tree/master/2022-02-connectx/ebpf_connect4
> >
> > Do you think those new lsm hooks can be used instead? If not, what's missing?
>
> Same as for CGROUP_INET hooks, there is no post-connect() LSM hook.

There is inet_conn_established, but it looks like it triggers only for
TCP. SELinux is the only user, so I'm assuming we should be able to
extend it as needed?

I'm not sure how far we can go with adding custom hooks :-( So moving
to fentry/lsm seems like the way to go. Maybe we should follow up with
a per-cgroup fentry as well :-D
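
For illustration, hooking it with the existing (non-cgroup) LSM flavor
would look roughly like the sketch below; since the hook returns void,
a program attached there can only observe, it cannot reject anything:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

/* Fires when a TCP connection reaches ESTABLISHED; the return value is
 * ignored because security_inet_conn_established() returns void.
 */
SEC("lsm/inet_conn_established")
int BPF_PROG(conn_established, struct sock *sk, struct sk_buff *skb)
{
        bpf_printk("established, family %d", sk->__sk_common.skc_family);
        return 0;
}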


> Why are we looking for a post-connect hook?
>
> Having a pre- and a post-connect hook would allow us to turn the whole
> connect() syscall into a critical section with synchronization done in
> BPF - lock on pre-connect, unlock on post-connect.
>
> Why do we want to serialize connect() calls?
>
> To check for a 4-tuple conflict with an existing unicast UDP socket, in
> which case we want to fail connect().
>
> That said, ideally we would rather have a mechanism like
> IP_BIND_ADDRESS_NO_PORT, but for UDP, and one that allows selecting both
> a local IP and port.
>
> We're hoping to put together an RFC sometime this quarter.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2022-04-15 18:46 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-07 22:31 [PATCH bpf-next v3 0/7] bpf: cgroup_sock lsm flavor Stanislav Fomichev
2022-04-07 22:31 ` [PATCH bpf-next v3 1/7] bpf: add bpf_func_t and trampoline helpers Stanislav Fomichev
2022-04-07 22:31 ` [PATCH bpf-next v3 2/7] bpf: per-cgroup lsm flavor Stanislav Fomichev
2022-04-08 14:20   ` kernel test robot
2022-04-08 15:53   ` kernel test robot
2022-04-08 16:42     ` Martin KaFai Lau
2022-04-08 22:12   ` Martin KaFai Lau
2022-04-11 19:07     ` Stanislav Fomichev
2022-04-12  1:04       ` Martin KaFai Lau
2022-04-12 16:42         ` Stanislav Fomichev
2022-04-11  8:26   ` Dan Carpenter
2022-04-07 22:31 ` [PATCH bpf-next v3 3/7] bpf: minimize number of allocated lsm slots per program Stanislav Fomichev
2022-04-08 22:56   ` Martin KaFai Lau
2022-04-09 17:04     ` Jakub Sitnicki
2022-04-11 18:44       ` Stanislav Fomichev
2022-04-15 17:39         ` Jakub Sitnicki
2022-04-15 18:46           ` Stanislav Fomichev
2022-04-12  1:19       ` Martin KaFai Lau
2022-04-12 16:42         ` Stanislav Fomichev
2022-04-12 17:40           ` Martin KaFai Lau
2022-04-11 18:46     ` Stanislav Fomichev
2022-04-12  1:36       ` Martin KaFai Lau
2022-04-12 16:42         ` Stanislav Fomichev
2022-04-12 18:13           ` Martin KaFai Lau
2022-04-12 19:01             ` Stanislav Fomichev
2022-04-12 20:19               ` Martin KaFai Lau
2022-04-12 20:36                 ` Stanislav Fomichev
2022-04-12 22:13                   ` Martin KaFai Lau
2022-04-12 22:42                     ` Stanislav Fomichev
2022-04-07 22:31 ` [PATCH bpf-next v3 4/7] bpf: allow writing to a subset of sock fields from lsm progtype Stanislav Fomichev
2022-04-07 22:31 ` [PATCH bpf-next v3 5/7] libbpf: add lsm_cgoup_sock type Stanislav Fomichev
2022-04-07 22:31 ` [PATCH bpf-next v3 6/7] selftests/bpf: lsm_cgroup functional test Stanislav Fomichev
2022-04-07 22:31 ` [PATCH bpf-next v3 7/7] selftests/bpf: verify lsm_cgroup struct sock access Stanislav Fomichev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).