* [PATCH v2 bpf-next 00/14] BPF perf link and user-provided context value
@ 2021-07-26 16:11 Andrii Nakryiko
  2021-07-26 16:11 ` [PATCH v2 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function Andrii Nakryiko
                   ` (13 more replies)
  0 siblings, 14 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-26 16:11 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

This patch set implements the ability for users to specify a custom black-box
u64 value for each BPF program attachment, which is then available to the BPF
program at runtime. This feature is critically missing for cases when some sort
of generic processing needs to be done by common BPF program logic (or
even exactly the same BPF program) across multiple BPF hooks (e.g., many
uniformly handled kprobes) and it's important to be able to distinguish
between the hooks at runtime (e.g., for additional configuration lookup).

Currently, something like that can only be achieved through:
  - code generation and BPF program cloning, which is very complicated and
    unmaintainable;
  - on-the-fly C code generation and runtime compilation, which is what BCC
    uses and which makes this pretty simple. The big downsides are a very
    heavy-weight Clang/LLVM dependency and inefficient memory usage (due to
    many BPF program clones and the compilation process itself);
  - in some cases (kprobes and sometimes uprobes) it's possible to do a function
    IP lookup to get function-specific configuration. This doesn't work for
    all cases (e.g., when attaching uprobes to shared libraries) and incurs
    higher runtime overhead and additional programming complexity due to
    BPF_MAP_TYPE_HASH lookups. Up until recently, before the bpf_get_func_ip()
    BPF helper was added, it was also very complicated and unstable (API-wise)
    to get the traced function's IP from fentry/fexit and kretprobe programs.

With libbpf and BPF CO-RE, runtime compilation is not an option, so to be able
to build generic tracing tooling simply and efficiently, the ability to provide
an additional user context value for each *attachment* (as opposed to each BPF
program) is extremely important. Two immediate users of this functionality are
going to be the libbpf-based USDT library (currently in development) and
retsnoop ([0]), but I'm sure more applications will come once users get this
feature in their kernels.

To achieve the above, all perf_event-based BPF hooks are made available
through a new BPF_LINK_TYPE_PERF_EVENT BPF link, which allows the common
LINK_CREATE command to be used for program attachment and generally brings
perf_event-based attachments into the common BPF link infrastructure.

With that, LINK_CREATE gains the ability to pass through a user_ctx value at
link creation (BPF program attachment) time. A bpf_get_user_ctx() BPF helper is
added to allow fetching this value at runtime from the BPF program side. user_ctx
is either stored on struct perf_event itself and fetched from the BPF program
context, or is passed through the ambient BPF run context added in
c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current").
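
For illustration only (this is not part of the patches), here is a minimal
BPF-side sketch of how a program could consume this value. It assumes the
bpf_get_user_ctx() helper declaration generated from the UAPI headers updated
later in this series; the kprobe target is arbitrary:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  char LICENSE[] SEC("license") = "GPL";

  SEC("kprobe/do_sys_openat2")
  int handle_kprobe(void *ctx)
  {
  	/* The same program can be attached to many kprobes; the user_ctx
  	 * value set at link creation time identifies which attachment fired.
  	 */
  	__u64 cookie = bpf_get_user_ctx(ctx);

  	bpf_printk("attachment %llu fired", cookie);
  	return 0;
  }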

On the libbpf side of things, the BPF perf link is utilized whenever it is
supported by the kernel, instead of using the PERF_EVENT_IOC_SET_BPF ioctl on
the perf_event FD. All the tracing attach APIs are extended with OPTS, and
user_ctx is passed through the corresponding opts structs.
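
As a rough sketch of what that could look like from the application side
(reusing the program from the sketch above; the opts struct and field names
below are illustrative guesses based on the patch titles, not the exact API
added in patches 10-11):

  /* hypothetical names; see patches 10 and 11 for the real opts structs */
  DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, opts, .user_ctx = 42);
  struct bpf_link *link;

  link = bpf_program__attach_kprobe_opts(skel->progs.handle_kprobe,
  					 "do_sys_openat2", &opts);
  if (libbpf_get_error(link))
  	/* handle attach error */;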

The last part of the patch set adds a few selftests utilizing the new APIs.

There are also a few refactorings along the way to make things cleaner and
easier to work with, both in the kernel (BPF_PROG_RUN and BPF_PROG_RUN_ARRAY)
and throughout libbpf and the selftests.

Follow-up patches will extend user_ctx to fentry/fexit programs.

  [0] https://github.com/anakryiko/retsnoop

Cc: Peter Zijlstra <peterz@infradead.org> # for perf_event changes

v1->v2:
  - fix build failures on non-x86 arches by gating on CONFIG_PERF_EVENTS.

Andrii Nakryiko (14):
  bpf: refactor BPF_PROG_RUN into a function
  bpf: refactor BPF_PROG_RUN_ARRAY family of macros into functions
  bpf: refactor perf_event_set_bpf_prog() to use struct bpf_prog input
  bpf: implement minimal BPF perf link
  bpf: allow to specify user-provided context value for BPF perf links
  bpf: add bpf_get_user_ctx() BPF helper to access user_ctx value
  libbpf: re-build libbpf.so when libbpf.map changes
  libbpf: remove unused bpf_link's destroy operation, but add dealloc
  libbpf: use BPF perf link when supported by kernel
  libbpf: add user_ctx support to bpf_link_create() API
  libbpf: add user_ctx to perf_event, kprobe, uprobe, and tp attach APIs
  selftests/bpf: test low-level perf BPF link API
  selftests/bpf: extract uprobe-related helpers into trace_helpers.{c,h}
  selftests/bpf: add user_ctx selftests for high-level APIs

 drivers/media/rc/bpf-lirc.c                   |   4 +-
 include/linux/bpf.h                           | 206 ++++++++------
 include/linux/bpf_types.h                     |   3 +
 include/linux/filter.h                        |  63 +++--
 include/linux/perf_event.h                    |   1 +
 include/linux/trace_events.h                  |   7 +-
 include/uapi/linux/bpf.h                      |  25 ++
 kernel/bpf/cgroup.c                           |  32 +--
 kernel/bpf/core.c                             |  29 +-
 kernel/bpf/syscall.c                          | 105 +++++++-
 kernel/events/core.c                          |  72 ++---
 kernel/trace/bpf_trace.c                      |  45 +++-
 tools/include/uapi/linux/bpf.h                |  25 ++
 tools/lib/bpf/Makefile                        |  10 +-
 tools/lib/bpf/bpf.c                           |  32 ++-
 tools/lib/bpf/bpf.h                           |   8 +-
 tools/lib/bpf/libbpf.c                        | 196 +++++++++++---
 tools/lib/bpf/libbpf.h                        |  71 ++++-
 tools/lib/bpf/libbpf.map                      |   3 +
 tools/lib/bpf/libbpf_internal.h               |  32 ++-
 .../selftests/bpf/prog_tests/attach_probe.c   |  61 +----
 .../selftests/bpf/prog_tests/perf_link.c      |  89 ++++++
 .../selftests/bpf/prog_tests/user_ctx.c       | 254 ++++++++++++++++++
 .../selftests/bpf/progs/test_perf_link.c      |  16 ++
 .../selftests/bpf/progs/test_user_ctx.c       |  85 ++++++
 tools/testing/selftests/bpf/trace_helpers.c   |  66 +++++
 tools/testing/selftests/bpf/trace_helpers.h   |   3 +
 27 files changed, 1230 insertions(+), 313 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_link.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/user_ctx.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_perf_link.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_user_ctx.c

-- 
2.30.2



* [PATCH v2 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function
  2021-07-26 16:11 [PATCH v2 bpf-next 00/14] BPF perf link and user-provided context value Andrii Nakryiko
@ 2021-07-26 16:11 ` Andrii Nakryiko
  2021-07-29 16:49   ` Yonghong Song
  2021-07-26 16:11 ` [PATCH v2 bpf-next 02/14] bpf: refactor BPF_PROG_RUN_ARRAY family of macros into functions Andrii Nakryiko
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-26 16:11 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Turn BPF_PROG_RUN into a proper always-inlined function. No functional or
performance changes are intended, but it makes it much easier to understand
how BPF programs actually get executed and which types and callbacks are
expected. Also, the extra () around input parameters can be dropped, as well
as the `__` variable prefixes that were there to avoid naming collisions,
which makes the code simpler to read and write.

This refactoring also highlighted one possible issue. BPF_PROG_RUN is both
a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning
BPF_PROG_RUN into a function would cause a naming conflict compilation error.
So rename BPF_PROG_RUN to the lower-case bpf_prog_run(), similar to
bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. To avoid unnecessary code
churn across the many networking call sites using BPF_PROG_RUN, #define
BPF_PROG_RUN as an alias to bpf_prog_run.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
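For illustration (not part of the diff below): existing call sites keep
compiling unchanged through the compatibility alias, while new code can call
the inline function directly; both forms must still run with migration
disabled.

  ret = BPF_PROG_RUN(prog, ctx);   /* still compiles, now an alias */
  ret = bpf_prog_run(prog, ctx);   /* the new, preferred spelling  */
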
 include/linux/filter.h | 58 +++++++++++++++++++++++++++---------------
 1 file changed, 37 insertions(+), 21 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index ba36989f711a..e59c97c72233 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -585,25 +585,41 @@ struct sk_filter {
 
 DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
 
-#define __BPF_PROG_RUN(prog, ctx, dfunc)	({			\
-	u32 __ret;							\
-	cant_migrate();							\
-	if (static_branch_unlikely(&bpf_stats_enabled_key)) {		\
-		struct bpf_prog_stats *__stats;				\
-		u64 __start = sched_clock();				\
-		__ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func);	\
-		__stats = this_cpu_ptr(prog->stats);			\
-		u64_stats_update_begin(&__stats->syncp);		\
-		__stats->cnt++;						\
-		__stats->nsecs += sched_clock() - __start;		\
-		u64_stats_update_end(&__stats->syncp);			\
-	} else {							\
-		__ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func);	\
-	}								\
-	__ret; })
-
-#define BPF_PROG_RUN(prog, ctx)						\
-	__BPF_PROG_RUN(prog, ctx, bpf_dispatcher_nop_func)
+typedef unsigned int (*bpf_dispatcher_fn)(const void *ctx,
+					  const struct bpf_insn *insnsi,
+					  unsigned int (*bpf_func)(const void *,
+								   const struct bpf_insn *));
+
+static __always_inline u32 __bpf_prog_run(const struct bpf_prog *prog,
+					  const void *ctx,
+					  bpf_dispatcher_fn dfunc)
+{
+	u32 ret;
+
+	cant_migrate();
+	if (static_branch_unlikely(&bpf_stats_enabled_key)) {
+		struct bpf_prog_stats *stats;
+		u64 start = sched_clock();
+
+		ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
+		stats = this_cpu_ptr(prog->stats);
+		u64_stats_update_begin(&stats->syncp);
+		stats->cnt++;
+		stats->nsecs += sched_clock() - start;
+		u64_stats_update_end(&stats->syncp);
+	} else {
+		ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
+	}
+	return ret;
+}
+
+static __always_inline u32 bpf_prog_run(const struct bpf_prog *prog, const void *ctx)
+{
+	return __bpf_prog_run(prog, ctx, bpf_dispatcher_nop_func);
+}
+
+/* avoids name conflict with BPF_PROG_RUN enum defined in uapi/linux/bpf.h */
+#define BPF_PROG_RUN bpf_prog_run
 
 /*
  * Use in preemptible and therefore migratable context to make sure that
@@ -622,7 +638,7 @@ static inline u32 bpf_prog_run_pin_on_cpu(const struct bpf_prog *prog,
 	u32 ret;
 
 	migrate_disable();
-	ret = __BPF_PROG_RUN(prog, ctx, bpf_dispatcher_nop_func);
+	ret = __bpf_prog_run(prog, ctx, bpf_dispatcher_nop_func);
 	migrate_enable();
 	return ret;
 }
@@ -768,7 +784,7 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 	 * under local_bh_disable(), which provides the needed RCU protection
 	 * for accessing map entries.
 	 */
-	return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+	return __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
 }
 
 void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
-- 
2.30.2



* [PATCH v2 bpf-next 02/14] bpf: refactor BPF_PROG_RUN_ARRAY family of macros into functions
  2021-07-26 16:11 [PATCH v2 bpf-next 00/14] BPF perf link and user-provided context value Andrii Nakryiko
  2021-07-26 16:11 ` [PATCH v2 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function Andrii Nakryiko
@ 2021-07-26 16:11 ` Andrii Nakryiko
  2021-07-29 17:04   ` Yonghong Song
  2021-07-26 16:12 ` [PATCH v2 bpf-next 03/14] bpf: refactor perf_event_set_bpf_prog() to use struct bpf_prog input Andrii Nakryiko
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-26 16:11 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Similar to BPF_PROG_RUN, turn the BPF_PROG_RUN_ARRAY family of macros into
proper functions, with all the same readability and maintainability benefits.
Making them into functions required shuffling the
bpf_set_run_ctx()/bpf_reset_run_ctx() helpers around. Also, explicitly
specifying the type of the BPF prog run callback required adjusting
__bpf_prog_run_save_cb() to accept const void *, cast internally to
const struct sk_buff.

Further, split out a cgroup-specific BPF_PROG_RUN_ARRAY_CG and
BPF_PROG_RUN_ARRAY_CG_FLAGS from the more generic BPF_PROG_RUN_ARRAY due to
the differences in bpf_run_ctx used for those two different use cases.

I think BPF_PROG_RUN_ARRAY_CG would benefit from further refactoring to accept
struct cgroup and enum bpf_attach_type instead of bpf_prog_array, fetching
cgrp->bpf.effective[type] and RCU-dereferencing it internally. But that would
require including include/linux/cgroup-defs.h, which I wasn't sure was ok with
everyone.

The remaining generic BPF_PROG_RUN_ARRAY function will be extended to pass
through the user-provided context value later in this series.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h      | 187 +++++++++++++++++++++++----------------
 include/linux/filter.h   |   5 +-
 kernel/bpf/cgroup.c      |  32 +++----
 kernel/trace/bpf_trace.c |   2 +-
 4 files changed, 132 insertions(+), 94 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c8cc09013210..9c44b56b698f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1146,67 +1146,124 @@ struct bpf_run_ctx {};
 
 struct bpf_cg_run_ctx {
 	struct bpf_run_ctx run_ctx;
-	struct bpf_prog_array_item *prog_item;
+	const struct bpf_prog_array_item *prog_item;
 };
 
+#ifdef CONFIG_BPF_SYSCALL
+static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
+{
+	struct bpf_run_ctx *old_ctx;
+
+	old_ctx = current->bpf_ctx;
+	current->bpf_ctx = new_ctx;
+	return old_ctx;
+}
+
+static inline void bpf_reset_run_ctx(struct bpf_run_ctx *old_ctx)
+{
+	current->bpf_ctx = old_ctx;
+}
+#else /* CONFIG_BPF_SYSCALL */
+static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
+{
+	return NULL;
+}
+
+static inline void bpf_reset_run_ctx(struct bpf_run_ctx *old_ctx)
+{
+}
+#endif /* CONFIG_BPF_SYSCALL */
+
+
 /* BPF program asks to bypass CAP_NET_BIND_SERVICE in bind. */
 #define BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE			(1 << 0)
 /* BPF program asks to set CN on the packet. */
 #define BPF_RET_SET_CN						(1 << 0)
 
-#define BPF_PROG_RUN_ARRAY_FLAGS(array, ctx, func, ret_flags)		\
-	({								\
-		struct bpf_prog_array_item *_item;			\
-		struct bpf_prog *_prog;					\
-		struct bpf_prog_array *_array;				\
-		struct bpf_run_ctx *old_run_ctx;			\
-		struct bpf_cg_run_ctx run_ctx;				\
-		u32 _ret = 1;						\
-		u32 func_ret;						\
-		migrate_disable();					\
-		rcu_read_lock();					\
-		_array = rcu_dereference(array);			\
-		_item = &_array->items[0];				\
-		old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);	\
-		while ((_prog = READ_ONCE(_item->prog))) {		\
-			run_ctx.prog_item = _item;			\
-			func_ret = func(_prog, ctx);			\
-			_ret &= (func_ret & 1);				\
-			*(ret_flags) |= (func_ret >> 1);		\
-			_item++;					\
-		}							\
-		bpf_reset_run_ctx(old_run_ctx);				\
-		rcu_read_unlock();					\
-		migrate_enable();					\
-		_ret;							\
-	 })
-
-#define __BPF_PROG_RUN_ARRAY(array, ctx, func, check_non_null, set_cg_storage)	\
-	({						\
-		struct bpf_prog_array_item *_item;	\
-		struct bpf_prog *_prog;			\
-		struct bpf_prog_array *_array;		\
-		struct bpf_run_ctx *old_run_ctx;	\
-		struct bpf_cg_run_ctx run_ctx;		\
-		u32 _ret = 1;				\
-		migrate_disable();			\
-		rcu_read_lock();			\
-		_array = rcu_dereference(array);	\
-		if (unlikely(check_non_null && !_array))\
-			goto _out;			\
-		_item = &_array->items[0];		\
-		old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);\
-		while ((_prog = READ_ONCE(_item->prog))) {	\
-			run_ctx.prog_item = _item;	\
-			_ret &= func(_prog, ctx);	\
-			_item++;			\
-		}					\
-		bpf_reset_run_ctx(old_run_ctx);		\
-_out:							\
-		rcu_read_unlock();			\
-		migrate_enable();			\
-		_ret;					\
-	 })
+typedef u32 (*bpf_prog_run_fn)(const struct bpf_prog *prog, const void *ctx);
+
+static __always_inline u32
+BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu,
+			    const void *ctx, bpf_prog_run_fn run_prog,
+			    u32 *ret_flags)
+{
+	const struct bpf_prog_array_item *item;
+	const struct bpf_prog *prog;
+	const struct bpf_prog_array *array;
+	struct bpf_run_ctx *old_run_ctx;
+	struct bpf_cg_run_ctx run_ctx;
+	u32 ret = 1;
+	u32 func_ret;
+
+	migrate_disable();
+	rcu_read_lock();
+	array = rcu_dereference(array_rcu);
+	item = &array->items[0];
+	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
+	while ((prog = READ_ONCE(item->prog))) {
+		run_ctx.prog_item = item;
+		func_ret = run_prog(prog, ctx);
+		ret &= (func_ret & 1);
+		*(ret_flags) |= (func_ret >> 1);
+		item++;
+	}
+	bpf_reset_run_ctx(old_run_ctx);
+	rcu_read_unlock();
+	migrate_enable();
+	return ret;
+}
+
+static __always_inline u32
+BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu,
+		      const void *ctx, bpf_prog_run_fn run_prog)
+{
+	const struct bpf_prog_array_item *item;
+	const struct bpf_prog *prog;
+	const struct bpf_prog_array *array;
+	struct bpf_run_ctx *old_run_ctx;
+	struct bpf_cg_run_ctx run_ctx;
+	u32 ret = 1;
+
+	migrate_disable();
+	rcu_read_lock();
+	array = rcu_dereference(array_rcu);
+	item = &array->items[0];
+	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
+	while ((prog = READ_ONCE(item->prog))) {
+		run_ctx.prog_item = item;
+		ret &= run_prog(prog, ctx);
+		item++;
+	}
+	bpf_reset_run_ctx(old_run_ctx);
+	rcu_read_unlock();
+	migrate_enable();
+	return ret;
+}
+
+static __always_inline u32
+BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
+		   const void *ctx, bpf_prog_run_fn run_prog)
+{
+	const struct bpf_prog_array_item *item;
+	const struct bpf_prog *prog;
+	const struct bpf_prog_array *array;
+	u32 ret = 1;
+
+	migrate_disable();
+	rcu_read_lock();
+	array = rcu_dereference(array_rcu);
+	if (unlikely(!array))
+		goto out;
+	item = &array->items[0];
+	while ((prog = READ_ONCE(item->prog))) {
+		ret &= run_prog(prog, ctx);
+		item++;
+	}
+out:
+	rcu_read_unlock();
+	migrate_enable();
+	return ret;
+}
 
 /* To be used by __cgroup_bpf_run_filter_skb for EGRESS BPF progs
  * so BPF programs can request cwr for TCP packets.
@@ -1235,7 +1292,7 @@ _out:							\
 		u32 _flags = 0;				\
 		bool _cn;				\
 		u32 _ret;				\
-		_ret = BPF_PROG_RUN_ARRAY_FLAGS(array, ctx, func, &_flags); \
+		_ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, &_flags); \
 		_cn = _flags & BPF_RET_SET_CN;		\
 		if (_ret)				\
 			_ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS);	\
@@ -1244,12 +1301,6 @@ _out:							\
 		_ret;					\
 	})
 
-#define BPF_PROG_RUN_ARRAY(array, ctx, func)		\
-	__BPF_PROG_RUN_ARRAY(array, ctx, func, false, true)
-
-#define BPF_PROG_RUN_ARRAY_CHECK(array, ctx, func)	\
-	__BPF_PROG_RUN_ARRAY(array, ctx, func, true, false)
-
 #ifdef CONFIG_BPF_SYSCALL
 DECLARE_PER_CPU(int, bpf_prog_active);
 extern struct mutex bpf_stats_enabled_mutex;
@@ -1284,20 +1335,6 @@ static inline void bpf_enable_instrumentation(void)
 	migrate_enable();
 }
 
-static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
-{
-	struct bpf_run_ctx *old_ctx;
-
-	old_ctx = current->bpf_ctx;
-	current->bpf_ctx = new_ctx;
-	return old_ctx;
-}
-
-static inline void bpf_reset_run_ctx(struct bpf_run_ctx *old_ctx)
-{
-	current->bpf_ctx = old_ctx;
-}
-
 extern const struct file_operations bpf_map_fops;
 extern const struct file_operations bpf_prog_fops;
 extern const struct file_operations bpf_iter_fops;
diff --git a/include/linux/filter.h b/include/linux/filter.h
index e59c97c72233..c38a872265c6 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -711,7 +711,7 @@ static inline void bpf_restore_data_end(
 	cb->data_end = saved_data_end;
 }
 
-static inline u8 *bpf_skb_cb(struct sk_buff *skb)
+static inline u8 *bpf_skb_cb(const struct sk_buff *skb)
 {
 	/* eBPF programs may read/write skb->cb[] area to transfer meta
 	 * data between tail calls. Since this also needs to work with
@@ -732,8 +732,9 @@ static inline u8 *bpf_skb_cb(struct sk_buff *skb)
 
 /* Must be invoked with migration disabled */
 static inline u32 __bpf_prog_run_save_cb(const struct bpf_prog *prog,
-					 struct sk_buff *skb)
+					 const void *ctx)
 {
+	const struct sk_buff *skb = ctx;
 	u8 *cb_data = bpf_skb_cb(skb);
 	u8 cb_saved[BPF_SKB_CB_LEN];
 	u32 res;
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index b567ca46555c..dd2c0d4ae4f8 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1012,8 +1012,8 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
 		ret = BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(
 			cgrp->bpf.effective[type], skb, __bpf_prog_run_save_cb);
 	} else {
-		ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], skb,
-					  __bpf_prog_run_save_cb);
+		ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[type], skb,
+					    __bpf_prog_run_save_cb);
 		ret = (ret == 1 ? 0 : -EPERM);
 	}
 	bpf_restore_data_end(skb, saved_data_end);
@@ -1043,7 +1043,7 @@ int __cgroup_bpf_run_filter_sk(struct sock *sk,
 	struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
 	int ret;
 
-	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], sk, BPF_PROG_RUN);
+	ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[type], sk, BPF_PROG_RUN);
 	return ret == 1 ? 0 : -EPERM;
 }
 EXPORT_SYMBOL(__cgroup_bpf_run_filter_sk);
@@ -1090,8 +1090,8 @@ int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
 	}
 
 	cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
-	ret = BPF_PROG_RUN_ARRAY_FLAGS(cgrp->bpf.effective[type], &ctx,
-				       BPF_PROG_RUN, flags);
+	ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(cgrp->bpf.effective[type], &ctx,
+				          BPF_PROG_RUN, flags);
 
 	return ret == 1 ? 0 : -EPERM;
 }
@@ -1120,8 +1120,8 @@ int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
 	struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
 	int ret;
 
-	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], sock_ops,
-				 BPF_PROG_RUN);
+	ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[type], sock_ops,
+				    BPF_PROG_RUN);
 	return ret == 1 ? 0 : -EPERM;
 }
 EXPORT_SYMBOL(__cgroup_bpf_run_filter_sock_ops);
@@ -1139,8 +1139,8 @@ int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
 
 	rcu_read_lock();
 	cgrp = task_dfl_cgroup(current);
-	allow = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], &ctx,
-				   BPF_PROG_RUN);
+	allow = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[type], &ctx,
+				      BPF_PROG_RUN);
 	rcu_read_unlock();
 
 	return !allow;
@@ -1271,7 +1271,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 
 	rcu_read_lock();
 	cgrp = task_dfl_cgroup(current);
-	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], &ctx, BPF_PROG_RUN);
+	ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[type], &ctx, BPF_PROG_RUN);
 	rcu_read_unlock();
 
 	kfree(ctx.cur_val);
@@ -1385,8 +1385,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 	}
 
 	lock_sock(sk);
-	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[BPF_CGROUP_SETSOCKOPT],
-				 &ctx, BPF_PROG_RUN);
+	ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[BPF_CGROUP_SETSOCKOPT],
+				    &ctx, BPF_PROG_RUN);
 	release_sock(sk);
 
 	if (!ret) {
@@ -1495,8 +1495,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 	}
 
 	lock_sock(sk);
-	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[BPF_CGROUP_GETSOCKOPT],
-				 &ctx, BPF_PROG_RUN);
+	ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[BPF_CGROUP_GETSOCKOPT],
+				    &ctx, BPF_PROG_RUN);
 	release_sock(sk);
 
 	if (!ret) {
@@ -1556,8 +1556,8 @@ int __cgroup_bpf_run_filter_getsockopt_kern(struct sock *sk, int level,
 	 * be called if that data shouldn't be "exported".
 	 */
 
-	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[BPF_CGROUP_GETSOCKOPT],
-				 &ctx, BPF_PROG_RUN);
+	ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[BPF_CGROUP_GETSOCKOPT],
+				    &ctx, BPF_PROG_RUN);
 	if (!ret)
 		return -EPERM;
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index c5e0b6a64091..b427eac10780 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -124,7 +124,7 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 	 * out of events when it was updated in between this and the
 	 * rcu_dereference() which is accepted risk.
 	 */
-	ret = BPF_PROG_RUN_ARRAY_CHECK(call->prog_array, ctx, BPF_PROG_RUN);
+	ret = BPF_PROG_RUN_ARRAY(call->prog_array, ctx, bpf_prog_run);
 
  out:
 	__this_cpu_dec(bpf_prog_active);
-- 
2.30.2



* [PATCH v2 bpf-next 03/14] bpf: refactor perf_event_set_bpf_prog() to use struct bpf_prog input
  2021-07-26 16:11 [PATCH v2 bpf-next 00/14] BPF perf link and user-provided context value Andrii Nakryiko
  2021-07-26 16:11 ` [PATCH v2 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function Andrii Nakryiko
  2021-07-26 16:11 ` [PATCH v2 bpf-next 02/14] bpf: refactor BPF_PROG_RUN_ARRAY family of macros into functions Andrii Nakryiko
@ 2021-07-26 16:12 ` Andrii Nakryiko
  2021-07-27  8:48   ` Peter Zijlstra
  2021-07-29 17:09   ` Yonghong Song
  2021-07-26 16:12 ` [PATCH v2 bpf-next 04/14] bpf: implement minimal BPF perf link Andrii Nakryiko
                   ` (10 subsequent siblings)
  13 siblings, 2 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-26 16:12 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Make the internal perf_event_set_bpf_prog() take a struct bpf_prog pointer as
its input argument, which makes it easier to re-use for other internal
purposes (coming up for BPF links in the next patch). A BPF program FD is not
as convenient, and in some cases it's not available at all. So switch to
struct bpf_prog, move the refcounting out, and let the caller do
bpf_prog_put() in case of an error. This follows the approach of most other
internal BPF functions.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/events/core.c | 61 ++++++++++++++++++++------------------------
 1 file changed, 28 insertions(+), 33 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 464917096e73..bf4689403498 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5574,7 +5574,7 @@ static inline int perf_fget_light(int fd, struct fd *p)
 static int perf_event_set_output(struct perf_event *event,
 				 struct perf_event *output_event);
 static int perf_event_set_filter(struct perf_event *event, void __user *arg);
-static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd);
+static int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
 static int perf_copy_attr(struct perf_event_attr __user *uattr,
 			  struct perf_event_attr *attr);
 
@@ -5637,7 +5637,22 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 		return perf_event_set_filter(event, (void __user *)arg);
 
 	case PERF_EVENT_IOC_SET_BPF:
-		return perf_event_set_bpf_prog(event, arg);
+	{
+		struct bpf_prog *prog;
+		int err;
+
+		prog = bpf_prog_get(arg);
+		if (IS_ERR(prog))
+			return PTR_ERR(prog);
+
+		err = perf_event_set_bpf_prog(event, prog);
+		if (err) {
+			bpf_prog_put(prog);
+			return err;
+		}
+
+		return 0;
+	}
 
 	case PERF_EVENT_IOC_PAUSE_OUTPUT: {
 		struct perf_buffer *rb;
@@ -9923,10 +9938,8 @@ static void bpf_overflow_handler(struct perf_event *event,
 	event->orig_overflow_handler(event, data, regs);
 }
 
-static int perf_event_set_bpf_handler(struct perf_event *event, u32 prog_fd)
+static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog *prog)
 {
-	struct bpf_prog *prog;
-
 	if (event->overflow_handler_context)
 		/* hw breakpoint or kernel counter */
 		return -EINVAL;
@@ -9934,9 +9947,8 @@ static int perf_event_set_bpf_handler(struct perf_event *event, u32 prog_fd)
 	if (event->prog)
 		return -EEXIST;
 
-	prog = bpf_prog_get_type(prog_fd, BPF_PROG_TYPE_PERF_EVENT);
-	if (IS_ERR(prog))
-		return PTR_ERR(prog);
+	if (prog->type != BPF_PROG_TYPE_PERF_EVENT)
+		return -EINVAL;
 
 	if (event->attr.precise_ip &&
 	    prog->call_get_stack &&
@@ -9952,7 +9964,6 @@ static int perf_event_set_bpf_handler(struct perf_event *event, u32 prog_fd)
 		 * attached to perf_sample_data, do not allow attaching BPF
 		 * program that calls bpf_get_[stack|stackid].
 		 */
-		bpf_prog_put(prog);
 		return -EPROTO;
 	}
 
@@ -9974,7 +9985,7 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
 	bpf_prog_put(prog);
 }
 #else
-static int perf_event_set_bpf_handler(struct perf_event *event, u32 prog_fd)
+static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog *prog)
 {
 	return -EOPNOTSUPP;
 }
@@ -10002,14 +10013,12 @@ static inline bool perf_event_is_tracing(struct perf_event *event)
 	return false;
 }
 
-static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
+static int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
 {
 	bool is_kprobe, is_tracepoint, is_syscall_tp;
-	struct bpf_prog *prog;
-	int ret;
 
 	if (!perf_event_is_tracing(event))
-		return perf_event_set_bpf_handler(event, prog_fd);
+		return perf_event_set_bpf_handler(event, prog);
 
 	is_kprobe = event->tp_event->flags & TRACE_EVENT_FL_UKPROBE;
 	is_tracepoint = event->tp_event->flags & TRACE_EVENT_FL_TRACEPOINT;
@@ -10018,38 +10027,24 @@ static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
 		/* bpf programs can only be attached to u/kprobe or tracepoint */
 		return -EINVAL;
 
-	prog = bpf_prog_get(prog_fd);
-	if (IS_ERR(prog))
-		return PTR_ERR(prog);
-
 	if ((is_kprobe && prog->type != BPF_PROG_TYPE_KPROBE) ||
 	    (is_tracepoint && prog->type != BPF_PROG_TYPE_TRACEPOINT) ||
-	    (is_syscall_tp && prog->type != BPF_PROG_TYPE_TRACEPOINT)) {
-		/* valid fd, but invalid bpf program type */
-		bpf_prog_put(prog);
+	    (is_syscall_tp && prog->type != BPF_PROG_TYPE_TRACEPOINT))
 		return -EINVAL;
-	}
 
 	/* Kprobe override only works for kprobes, not uprobes. */
 	if (prog->kprobe_override &&
-	    !(event->tp_event->flags & TRACE_EVENT_FL_KPROBE)) {
-		bpf_prog_put(prog);
+	    !(event->tp_event->flags & TRACE_EVENT_FL_KPROBE))
 		return -EINVAL;
-	}
 
 	if (is_tracepoint || is_syscall_tp) {
 		int off = trace_event_get_offsets(event->tp_event);
 
-		if (prog->aux->max_ctx_offset > off) {
-			bpf_prog_put(prog);
+		if (prog->aux->max_ctx_offset > off)
 			return -EACCES;
-		}
 	}
 
-	ret = perf_event_attach_bpf_prog(event, prog);
-	if (ret)
-		bpf_prog_put(prog);
-	return ret;
+	return perf_event_attach_bpf_prog(event, prog);
 }
 
 static void perf_event_free_bpf_prog(struct perf_event *event)
@@ -10071,7 +10066,7 @@ static void perf_event_free_filter(struct perf_event *event)
 {
 }
 
-static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
+static int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
 {
 	return -ENOENT;
 }
-- 
2.30.2



* [PATCH v2 bpf-next 04/14] bpf: implement minimal BPF perf link
  2021-07-26 16:11 [PATCH v2 bpf-next 00/14] BPF perf link and user-provided context value Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2021-07-26 16:12 ` [PATCH v2 bpf-next 03/14] bpf: refactor perf_event_set_bpf_prog() to use struct bpf_prog input Andrii Nakryiko
@ 2021-07-26 16:12 ` Andrii Nakryiko
  2021-07-27  9:04   ` Peter Zijlstra
                     ` (3 more replies)
  2021-07-26 16:12 ` [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links Andrii Nakryiko
                   ` (9 subsequent siblings)
  13 siblings, 4 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-26 16:12 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Introduce a new type of BPF link - the BPF perf link. This brings
perf_event-based BPF program attachments (perf_event, tracepoints, kprobes,
and uprobes) into the common BPF link infrastructure, making it possible to
list all active perf_event-based attachments, auto-detach the BPF program from
the perf_event when the link's FD is closed, and get the generic BPF link
fdinfo/get_info functionality.

The BPF_LINK_CREATE command expects the perf_event's FD as target_fd. No extra
flags are currently supported.

Force-detaching and atomic BPF program updates are not yet implemented, but
with perf_event-based BPF links we now have a common framework for this
without the need to extend the ioctl()-based perf_event interface.

One interesting consideration is the new bpf_attach_type value that the
BPF_LINK_CREATE command expects. Generally, there is either a 1-to-1 mapping
from bpf_attach_type to bpf_prog_type, or a many-to-1 mapping from a subset of
bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
define a single BPF_PERF_EVENT attach type for all of them and to adjust
link_create()'s logic for checking the correspondence between attach type and
program type.

The alternative would be to define three new attach types (e.g., BPF_KPROBE,
BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like overkill: it would
needlessly proliferate bpf_attach_type enum values, and BPF_KPROBE would cause
naming conflicts with the BPF_KPROBE() macro defined by libbpf.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
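(Illustrative userspace sketch, not part of the patch: with the new attach
type, a loaded kprobe/tracepoint/perf_event program can be attached to an
existing perf_event FD via LINK_CREATE; closing the returned link FD
auto-detaches it. Assumes the UAPI headers updated by this patch.)

  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  static int attach_perf_link(int prog_fd, int perf_event_fd)
  {
  	union bpf_attr attr;

  	memset(&attr, 0, sizeof(attr));
  	attr.link_create.prog_fd = prog_fd;
  	attr.link_create.target_fd = perf_event_fd;	/* perf_event FD as target */
  	attr.link_create.attach_type = BPF_PERF_EVENT;	/* new attach type */

  	/* returns a BPF link FD on success */
  	return syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
  }
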
 include/linux/bpf_types.h      |   3 +
 include/linux/trace_events.h   |   3 +
 include/uapi/linux/bpf.h       |   2 +
 kernel/bpf/syscall.c           | 105 ++++++++++++++++++++++++++++++---
 kernel/events/core.c           |  10 ++--
 tools/include/uapi/linux/bpf.h |   2 +
 6 files changed, 112 insertions(+), 13 deletions(-)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index a9db1eae6796..0a1ada7f174d 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -135,3 +135,6 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
 #ifdef CONFIG_NET
 BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
 #endif
+#ifdef CONFIG_PERF_EVENTS
+BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
+#endif
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index ad413b382a3c..8ac92560d3a3 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -803,6 +803,9 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
 void perf_trace_buf_update(void *record, u16 type);
 void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
 
+int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
+void perf_event_free_bpf_prog(struct perf_event *event);
+
 void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
 void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
 void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2db6925e04f4..00b1267ab4f0 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -993,6 +993,7 @@ enum bpf_attach_type {
 	BPF_SK_SKB_VERDICT,
 	BPF_SK_REUSEPORT_SELECT,
 	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
+	BPF_PERF_EVENT,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -1006,6 +1007,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_ITER = 4,
 	BPF_LINK_TYPE_NETNS = 5,
 	BPF_LINK_TYPE_XDP = 6,
+	BPF_LINK_TYPE_PERF_EVENT = 7,
 
 	MAX_BPF_LINK_TYPE,
 };
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 9a2068e39d23..80c03bedd6e6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2906,6 +2906,79 @@ static const struct bpf_link_ops bpf_raw_tp_link_lops = {
 	.fill_link_info = bpf_raw_tp_link_fill_link_info,
 };
 
+#ifdef CONFIG_PERF_EVENTS
+struct bpf_perf_link {
+	struct bpf_link link;
+	struct file *perf_file;
+};
+
+static void bpf_perf_link_release(struct bpf_link *link)
+{
+	struct bpf_perf_link *perf_link = container_of(link, struct bpf_perf_link, link);
+	struct perf_event *event = perf_link->perf_file->private_data;
+
+	perf_event_free_bpf_prog(event);
+	fput(perf_link->perf_file);
+}
+
+static void bpf_perf_link_dealloc(struct bpf_link *link)
+{
+	struct bpf_perf_link *perf_link = container_of(link, struct bpf_perf_link, link);
+
+	kfree(perf_link);
+}
+
+static const struct bpf_link_ops bpf_perf_link_lops = {
+	.release = bpf_perf_link_release,
+	.dealloc = bpf_perf_link_dealloc,
+};
+
+static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	struct bpf_link_primer link_primer;
+	struct bpf_perf_link *link;
+	struct perf_event *event;
+	struct file *perf_file;
+	int err;
+
+	if (attr->link_create.flags)
+		return -EINVAL;
+
+	perf_file = perf_event_get(attr->link_create.target_fd);
+	if (IS_ERR(perf_file))
+		return PTR_ERR(perf_file);
+
+	link = kzalloc(sizeof(*link), GFP_USER);
+	if (!link) {
+		err = -ENOMEM;
+		goto out_put_file;
+	}
+	bpf_link_init(&link->link, BPF_LINK_TYPE_PERF_EVENT, &bpf_perf_link_lops, prog);
+	link->perf_file = perf_file;
+
+	err = bpf_link_prime(&link->link, &link_primer);
+	if (err) {
+		kfree(link);
+		goto out_put_file;
+	}
+
+	event = perf_file->private_data;
+	err = perf_event_set_bpf_prog(event, prog);
+	if (err) {
+		bpf_link_cleanup(&link_primer);
+		goto out_put_file;
+	}
+	/* perf_event_set_bpf_prog() doesn't take its own refcnt on prog */
+	bpf_prog_inc(prog);
+
+	return bpf_link_settle(&link_primer);
+
+out_put_file:
+	fput(perf_file);
+	return err;
+}
+#endif /* CONFIG_PERF_EVENTS */
+
 #define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.prog_fd
 
 static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
@@ -4147,15 +4220,26 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
 	if (ret)
 		goto out;
 
-	if (prog->type == BPF_PROG_TYPE_EXT) {
+	switch (prog->type) {
+	case BPF_PROG_TYPE_EXT:
 		ret = tracing_bpf_link_attach(attr, uattr, prog);
 		goto out;
-	}
-
-	ptype = attach_type_to_prog_type(attr->link_create.attach_type);
-	if (ptype == BPF_PROG_TYPE_UNSPEC || ptype != prog->type) {
-		ret = -EINVAL;
-		goto out;
+	case BPF_PROG_TYPE_PERF_EVENT:
+	case BPF_PROG_TYPE_KPROBE:
+	case BPF_PROG_TYPE_TRACEPOINT:
+		if (attr->link_create.attach_type != BPF_PERF_EVENT) {
+			ret = -EINVAL;
+			goto out;
+		}
+		ptype = prog->type;
+		break;
+	default:
+		ptype = attach_type_to_prog_type(attr->link_create.attach_type);
+		if (ptype == BPF_PROG_TYPE_UNSPEC || ptype != prog->type) {
+			ret = -EINVAL;
+			goto out;
+		}
+		break;
 	}
 
 	switch (ptype) {
@@ -4179,6 +4263,13 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
 	case BPF_PROG_TYPE_XDP:
 		ret = bpf_xdp_link_attach(attr, prog);
 		break;
+#endif
+#ifdef CONFIG_PERF_EVENTS
+	case BPF_PROG_TYPE_PERF_EVENT:
+	case BPF_PROG_TYPE_TRACEPOINT:
+	case BPF_PROG_TYPE_KPROBE:
+		ret = bpf_perf_link_attach(attr, prog);
+		break;
 #endif
 	default:
 		ret = -EINVAL;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index bf4689403498..b125943599ce 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4697,7 +4697,6 @@ find_get_context(struct pmu *pmu, struct task_struct *task,
 }
 
 static void perf_event_free_filter(struct perf_event *event);
-static void perf_event_free_bpf_prog(struct perf_event *event);
 
 static void free_event_rcu(struct rcu_head *head)
 {
@@ -5574,7 +5573,6 @@ static inline int perf_fget_light(int fd, struct fd *p)
 static int perf_event_set_output(struct perf_event *event,
 				 struct perf_event *output_event);
 static int perf_event_set_filter(struct perf_event *event, void __user *arg);
-static int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
 static int perf_copy_attr(struct perf_event_attr __user *uattr,
 			  struct perf_event_attr *attr);
 
@@ -10013,7 +10011,7 @@ static inline bool perf_event_is_tracing(struct perf_event *event)
 	return false;
 }
 
-static int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
+int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
 {
 	bool is_kprobe, is_tracepoint, is_syscall_tp;
 
@@ -10047,7 +10045,7 @@ static int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *pr
 	return perf_event_attach_bpf_prog(event, prog);
 }
 
-static void perf_event_free_bpf_prog(struct perf_event *event)
+void perf_event_free_bpf_prog(struct perf_event *event)
 {
 	if (!perf_event_is_tracing(event)) {
 		perf_event_free_bpf_handler(event);
@@ -10066,12 +10064,12 @@ static void perf_event_free_filter(struct perf_event *event)
 {
 }
 
-static int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
+int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
 {
 	return -ENOENT;
 }
 
-static void perf_event_free_bpf_prog(struct perf_event *event)
+void perf_event_free_bpf_prog(struct perf_event *event)
 {
 }
 #endif /* CONFIG_EVENT_TRACING */
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 2db6925e04f4..00b1267ab4f0 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -993,6 +993,7 @@ enum bpf_attach_type {
 	BPF_SK_SKB_VERDICT,
 	BPF_SK_REUSEPORT_SELECT,
 	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
+	BPF_PERF_EVENT,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -1006,6 +1007,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_ITER = 4,
 	BPF_LINK_TYPE_NETNS = 5,
 	BPF_LINK_TYPE_XDP = 6,
+	BPF_LINK_TYPE_PERF_EVENT = 7,
 
 	MAX_BPF_LINK_TYPE,
 };
-- 
2.30.2



* [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links
  2021-07-26 16:11 [PATCH v2 bpf-next 00/14] BPF perf link and user-provided context value Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2021-07-26 16:12 ` [PATCH v2 bpf-next 04/14] bpf: implement minimal BPF perf link Andrii Nakryiko
@ 2021-07-26 16:12 ` Andrii Nakryiko
  2021-07-27  9:11   ` Peter Zijlstra
  2021-07-29 18:00   ` Yonghong Song
  2021-07-26 16:12 ` [PATCH v2 bpf-next 06/14] bpf: add bpf_get_user_ctx() BPF helper to access user_ctx value Andrii Nakryiko
                   ` (8 subsequent siblings)
  13 siblings, 2 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-26 16:12 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Add the ability for users to specify a custom u64 value when creating a BPF
link for perf_event-backed BPF programs (kprobe/uprobe, perf_event,
tracepoint).

This is useful for cases when the same BPF program is used for attaching to
and processing invocations of different tracepoints/kprobes/uprobes in a
generic fashion, while still distinguishing each attachment from the others
(e.g., the BPF program can look up additional information associated with a
specific kernel function without having to rely on function IP lookups). This
allows new use cases to be implemented simply and efficiently that previously
were possible only through code generation (and thus multiple instances of an
almost identical BPF program) or runtime compilation on target hosts
(BCC-style, which is even more expensive resource-wise). For uprobes it is in
some cases not even possible to know the function IP beforehand (e.g., when
attaching to a shared library without PID filtering, in which case the
library's base load address is not known).

This is done by storing a u64 user_ctx in struct bpf_prog_array_item,
corresponding to each attached and run BPF program. Given that cgroup BPF
programs already use two 8-byte pointers for their needs, and that cgroup BPF
programs don't (yet?) support user_ctx, reuse that space through a union of
cgroup_storage and the new user_ctx field.

Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
This is set by BPF_PROG_RUN_ARRAY, used by the kprobe/uprobe/tracepoint BPF
program execution code, which luckily is now also split from
BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
giving access to the user context value from inside a BPF program. Generic
perf_event BPF programs will access this value from the perf_event itself
through the passed-in BPF program context.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
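(Illustrative only, extending the userspace sketch from the previous patch:
the new perf_event.user_ctx field is filled in at link creation time,
everything else stays the same.)

  memset(&attr, 0, sizeof(attr));
  attr.link_create.prog_fd = prog_fd;
  attr.link_create.target_fd = perf_event_fd;
  attr.link_create.attach_type = BPF_PERF_EVENT;
  attr.link_create.perf_event.user_ctx = 0x1234;	/* new in this patch */

  link_fd = syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
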
 drivers/media/rc/bpf-lirc.c    |  4 ++--
 include/linux/bpf.h            | 16 +++++++++++++++-
 include/linux/perf_event.h     |  1 +
 include/linux/trace_events.h   |  6 +++---
 include/uapi/linux/bpf.h       |  7 +++++++
 kernel/bpf/core.c              | 29 ++++++++++++++++++-----------
 kernel/bpf/syscall.c           |  2 +-
 kernel/events/core.c           | 21 ++++++++++++++-------
 kernel/trace/bpf_trace.c       |  8 +++++---
 tools/include/uapi/linux/bpf.h |  7 +++++++
 10 files changed, 73 insertions(+), 28 deletions(-)

diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
index afae0afe3f81..7490494273e4 100644
--- a/drivers/media/rc/bpf-lirc.c
+++ b/drivers/media/rc/bpf-lirc.c
@@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog)
 		goto unlock;
 	}
 
-	ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
+	ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array);
 	if (ret < 0)
 		goto unlock;
 
@@ -193,7 +193,7 @@ static int lirc_bpf_detach(struct rc_dev *rcdev, struct bpf_prog *prog)
 	}
 
 	old_array = lirc_rcu_dereference(raw->progs);
-	ret = bpf_prog_array_copy(old_array, prog, NULL, &new_array);
+	ret = bpf_prog_array_copy(old_array, prog, NULL, 0, &new_array);
 	/*
 	 * Do not use bpf_prog_array_delete_safe() as we would end up
 	 * with a dummy entry in the array, and the we would free the
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9c44b56b698f..74b35faf0b73 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1114,7 +1114,10 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
  */
 struct bpf_prog_array_item {
 	struct bpf_prog *prog;
-	struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
+	union {
+		struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
+		u64 user_ctx;
+	};
 };
 
 struct bpf_prog_array {
@@ -1140,6 +1143,7 @@ int bpf_prog_array_copy_info(struct bpf_prog_array *array,
 int bpf_prog_array_copy(struct bpf_prog_array *old_array,
 			struct bpf_prog *exclude_prog,
 			struct bpf_prog *include_prog,
+			u64 include_user_ctx,
 			struct bpf_prog_array **new_array);
 
 struct bpf_run_ctx {};
@@ -1149,6 +1153,11 @@ struct bpf_cg_run_ctx {
 	const struct bpf_prog_array_item *prog_item;
 };
 
+struct bpf_trace_run_ctx {
+	struct bpf_run_ctx run_ctx;
+	u64 user_ctx;
+};
+
 #ifdef CONFIG_BPF_SYSCALL
 static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
 {
@@ -1247,6 +1256,8 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
 	const struct bpf_prog_array_item *item;
 	const struct bpf_prog *prog;
 	const struct bpf_prog_array *array;
+	struct bpf_run_ctx *old_run_ctx;
+	struct bpf_trace_run_ctx run_ctx;
 	u32 ret = 1;
 
 	migrate_disable();
@@ -1254,11 +1265,14 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
 	array = rcu_dereference(array_rcu);
 	if (unlikely(!array))
 		goto out;
+	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 	item = &array->items[0];
 	while ((prog = READ_ONCE(item->prog))) {
+		run_ctx.user_ctx = item->user_ctx;
 		ret &= run_prog(prog, ctx);
 		item++;
 	}
+	bpf_reset_run_ctx(old_run_ctx);
 out:
 	rcu_read_unlock();
 	migrate_enable();
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2d510ad750ed..97ab46802800 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -762,6 +762,7 @@ struct perf_event {
 #ifdef CONFIG_BPF_SYSCALL
 	perf_overflow_handler_t		orig_overflow_handler;
 	struct bpf_prog			*prog;
+	u64				user_ctx;
 #endif
 
 #ifdef CONFIG_EVENT_TRACING
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8ac92560d3a3..4543852f1480 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -675,7 +675,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
 
 #ifdef CONFIG_BPF_EVENTS
 unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
-int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
+int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 user_ctx);
 void perf_event_detach_bpf_prog(struct perf_event *event);
 int perf_event_query_prog_array(struct perf_event *event, void __user *info);
 int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
@@ -692,7 +692,7 @@ static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *c
 }
 
 static inline int
-perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
+perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 user_ctx)
 {
 	return -EOPNOTSUPP;
 }
@@ -803,7 +803,7 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
 void perf_trace_buf_update(void *record, u16 type);
 void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
 
-int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
+int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 user_ctx);
 void perf_event_free_bpf_prog(struct perf_event *event);
 
 void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 00b1267ab4f0..bc1fd54a8f58 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1448,6 +1448,13 @@ union bpf_attr {
 				__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
 				__u32		iter_info_len;	/* iter_info length */
 			};
+			struct {
+				/* black box user-provided value passed through
+				 * to BPF program at the execution time and
+				 * accessible through bpf_get_user_ctx() BPF helper
+				 */
+				__u64		user_ctx;
+			} perf_event;
 		};
 	} link_create;
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 9b1577498373..7e4c8bf3e8d1 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2097,13 +2097,13 @@ int bpf_prog_array_update_at(struct bpf_prog_array *array, int index,
 int bpf_prog_array_copy(struct bpf_prog_array *old_array,
 			struct bpf_prog *exclude_prog,
 			struct bpf_prog *include_prog,
+			u64 include_user_ctx,
 			struct bpf_prog_array **new_array)
 {
 	int new_prog_cnt, carry_prog_cnt = 0;
-	struct bpf_prog_array_item *existing;
+	struct bpf_prog_array_item *existing, *new;
 	struct bpf_prog_array *array;
 	bool found_exclude = false;
-	int new_prog_idx = 0;
 
 	/* Figure out how many existing progs we need to carry over to
 	 * the new array.
@@ -2140,20 +2140,27 @@ int bpf_prog_array_copy(struct bpf_prog_array *old_array,
 	array = bpf_prog_array_alloc(new_prog_cnt + 1, GFP_KERNEL);
 	if (!array)
 		return -ENOMEM;
+	new = array->items;
 
 	/* Fill in the new prog array */
 	if (carry_prog_cnt) {
 		existing = old_array->items;
-		for (; existing->prog; existing++)
-			if (existing->prog != exclude_prog &&
-			    existing->prog != &dummy_bpf_prog.prog) {
-				array->items[new_prog_idx++].prog =
-					existing->prog;
-			}
+		for (; existing->prog; existing++) {
+			if (existing->prog == exclude_prog ||
+			    existing->prog == &dummy_bpf_prog.prog)
+				continue;
+
+			new->prog = existing->prog;
+			new->user_ctx = existing->user_ctx;
+			new++;
+		}
 	}
-	if (include_prog)
-		array->items[new_prog_idx++].prog = include_prog;
-	array->items[new_prog_idx].prog = NULL;
+	if (include_prog) {
+		new->prog = include_prog;
+		new->user_ctx = include_user_ctx;
+		new++;
+	}
+	new->prog = NULL;
 	*new_array = array;
 	return 0;
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 80c03bedd6e6..67f82d053935 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2963,7 +2963,7 @@ static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *pro
 	}
 
 	event = perf_file->private_data;
-	err = perf_event_set_bpf_prog(event, prog);
+	err = perf_event_set_bpf_prog(event, prog, attr->link_create.perf_event.user_ctx);
 	if (err) {
 		bpf_link_cleanup(&link_primer);
 		goto out_put_file;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b125943599ce..3dcdf58290eb 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5643,7 +5643,7 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
 
-		err = perf_event_set_bpf_prog(event, prog);
+		err = perf_event_set_bpf_prog(event, prog, 0);
 		if (err) {
 			bpf_prog_put(prog);
 			return err;
@@ -9936,7 +9936,9 @@ static void bpf_overflow_handler(struct perf_event *event,
 	event->orig_overflow_handler(event, data, regs);
 }
 
-static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog *prog)
+static int perf_event_set_bpf_handler(struct perf_event *event,
+				      struct bpf_prog *prog,
+				      u64 user_ctx)
 {
 	if (event->overflow_handler_context)
 		/* hw breakpoint or kernel counter */
@@ -9966,6 +9968,7 @@ static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog
 	}
 
 	event->prog = prog;
+	event->user_ctx = user_ctx;
 	event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
 	WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
 	return 0;
@@ -9983,7 +9986,9 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
 	bpf_prog_put(prog);
 }
 #else
-static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog *prog)
+static int perf_event_set_bpf_handler(struct perf_event *event,
+				      struct bpf_prog *prog,
+				      u64 user_ctx)
 {
 	return -EOPNOTSUPP;
 }
@@ -10011,12 +10016,13 @@ static inline bool perf_event_is_tracing(struct perf_event *event)
 	return false;
 }
 
-int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
+int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog,
+			    u64 user_ctx)
 {
 	bool is_kprobe, is_tracepoint, is_syscall_tp;
 
 	if (!perf_event_is_tracing(event))
-		return perf_event_set_bpf_handler(event, prog);
+		return perf_event_set_bpf_handler(event, prog, user_ctx);
 
 	is_kprobe = event->tp_event->flags & TRACE_EVENT_FL_UKPROBE;
 	is_tracepoint = event->tp_event->flags & TRACE_EVENT_FL_TRACEPOINT;
@@ -10042,7 +10048,7 @@ int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
 			return -EACCES;
 	}
 
-	return perf_event_attach_bpf_prog(event, prog);
+	return perf_event_attach_bpf_prog(event, prog, user_ctx);
 }
 
 void perf_event_free_bpf_prog(struct perf_event *event)
@@ -10064,7 +10070,8 @@ static void perf_event_free_filter(struct perf_event *event)
 {
 }
 
-int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
+int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog,
+			    u64 user_ctx)
 {
 	return -ENOENT;
 }
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index b427eac10780..c9cf6a0d0fb3 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1674,7 +1674,8 @@ static DEFINE_MUTEX(bpf_event_mutex);
 #define BPF_TRACE_MAX_PROGS 64
 
 int perf_event_attach_bpf_prog(struct perf_event *event,
-			       struct bpf_prog *prog)
+			       struct bpf_prog *prog,
+			       u64 user_ctx)
 {
 	struct bpf_prog_array *old_array;
 	struct bpf_prog_array *new_array;
@@ -1701,12 +1702,13 @@ int perf_event_attach_bpf_prog(struct perf_event *event,
 		goto unlock;
 	}
 
-	ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
+	ret = bpf_prog_array_copy(old_array, NULL, prog, user_ctx, &new_array);
 	if (ret < 0)
 		goto unlock;
 
 	/* set the new array to event->tp_event and set event->prog */
 	event->prog = prog;
+	event->user_ctx = user_ctx;
 	rcu_assign_pointer(event->tp_event->prog_array, new_array);
 	bpf_prog_array_free(old_array);
 
@@ -1727,7 +1729,7 @@ void perf_event_detach_bpf_prog(struct perf_event *event)
 		goto unlock;
 
 	old_array = bpf_event_rcu_dereference(event->tp_event->prog_array);
-	ret = bpf_prog_array_copy(old_array, event->prog, NULL, &new_array);
+	ret = bpf_prog_array_copy(old_array, event->prog, NULL, 0, &new_array);
 	if (ret == -ENOENT)
 		goto unlock;
 	if (ret < 0) {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 00b1267ab4f0..bc1fd54a8f58 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1448,6 +1448,13 @@ union bpf_attr {
 				__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
 				__u32		iter_info_len;	/* iter_info length */
 			};
+			struct {
+				/* black box user-provided value passed through
+				 * to BPF program at the execution time and
+				 * accessible through bpf_get_user_ctx() BPF helper
+				 */
+				__u64		user_ctx;
+			} perf_event;
 		};
 	} link_create;
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 bpf-next 06/14] bpf: add bpf_get_user_ctx() BPF helper to access user_ctx value
  2021-07-26 16:11 [PATCH v2 bpf-next 00/14] BPF perf link and user-provided context value Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2021-07-26 16:12 ` [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links Andrii Nakryiko
@ 2021-07-26 16:12 ` Andrii Nakryiko
  2021-07-29 18:17   ` Yonghong Song
  2021-07-26 16:12 ` [PATCH v2 bpf-next 07/14] libbpf: re-build libbpf.so when libbpf.map changes Andrii Nakryiko
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-26 16:12 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Add a new BPF helper, bpf_get_user_ctx(), which can be used by BPF programs to
access the user_ctx value specified during BPF program attachment (BPF link
creation) time.

Currently, all perf_event-backed BPF program types support the
bpf_get_user_ctx() helper. Follow-up patches will add support for fentry/fexit
programs as well.

While at it, mark bpf_tracing_func_proto() as static to make it obvious that
it's only used from within kernel/trace/bpf_trace.c.
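
For illustration, a minimal BPF-side usage could look like the sketch below
(the kprobe target is just an example; it assumes bpf_helper_defs.h has been
regenerated so the new helper declaration is available):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  SEC("kprobe/__x64_sys_nanosleep")
  int handle_kprobe(struct pt_regs *ctx)
  {
      /* 0 if no user_ctx was specified at attach time */
      __u64 cookie = bpf_get_user_ctx(ctx);

      bpf_printk("user_ctx=%llu", cookie);
      return 0;
  }

  char _license[] SEC("license") = "GPL";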

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h            |  3 ---
 include/uapi/linux/bpf.h       | 16 ++++++++++++++++
 kernel/trace/bpf_trace.c       | 35 +++++++++++++++++++++++++++++++++-
 tools/include/uapi/linux/bpf.h | 16 ++++++++++++++++
 4 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 74b35faf0b73..94ebedc1e13a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2110,9 +2110,6 @@ extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
 extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
 extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
 
-const struct bpf_func_proto *bpf_tracing_func_proto(
-	enum bpf_func_id func_id, const struct bpf_prog *prog);
-
 const struct bpf_func_proto *tracing_prog_func_proto(
   enum bpf_func_id func_id, const struct bpf_prog *prog);
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index bc1fd54a8f58..96afeced3467 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4856,6 +4856,21 @@ union bpf_attr {
  * 		Get address of the traced function (for tracing and kprobe programs).
  * 	Return
  * 		Address of the traced function.
+ *
+ * u64 bpf_get_user_ctx(void *ctx)
+ * 	Description
+ * 		Get user_ctx value provided (optionally) during the program
+ * 		attachment. It might be different for each individual
+ * 		attachment, even if BPF program itself is the same.
+ * 		Expects BPF program context *ctx* as a first argument.
+ *
+ * 		Supported for the following program types:
+ *			- kprobe/uprobe;
+ *			- tracepoint;
+ *			- perf_event.
+ * 	Return
+ *		Value specified by user at BPF link creation/attachment time
+ *		or 0, if it was not specified.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5032,6 +5047,7 @@ union bpf_attr {
 	FN(timer_start),		\
 	FN(timer_cancel),		\
 	FN(get_func_ip),		\
+	FN(get_user_ctx),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index c9cf6a0d0fb3..b14978b3f6fb 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -975,7 +975,34 @@ static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe = {
 	.arg1_type	= ARG_PTR_TO_CTX,
 };
 
-const struct bpf_func_proto *
+BPF_CALL_1(bpf_get_user_ctx_trace, void *, ctx)
+{
+	struct bpf_trace_run_ctx *run_ctx;
+
+	run_ctx = container_of(current->bpf_ctx, struct bpf_trace_run_ctx, run_ctx);
+	return run_ctx->user_ctx;
+}
+
+static const struct bpf_func_proto bpf_get_user_ctx_proto_trace = {
+	.func		= bpf_get_user_ctx_trace,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+};
+
+BPF_CALL_1(bpf_get_user_ctx_pe, struct bpf_perf_event_data_kern *, ctx)
+{
+	return ctx->event->user_ctx;
+}
+
+static const struct bpf_func_proto bpf_get_user_ctx_proto_pe = {
+	.func		= bpf_get_user_ctx_pe,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+};
+
+static const struct bpf_func_proto *
 bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
@@ -1108,6 +1135,8 @@ kprobe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 #endif
 	case BPF_FUNC_get_func_ip:
 		return &bpf_get_func_ip_proto_kprobe;
+	case BPF_FUNC_get_user_ctx:
+		return &bpf_get_user_ctx_proto_trace;
 	default:
 		return bpf_tracing_func_proto(func_id, prog);
 	}
@@ -1218,6 +1247,8 @@ tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_stackid_proto_tp;
 	case BPF_FUNC_get_stack:
 		return &bpf_get_stack_proto_tp;
+	case BPF_FUNC_get_user_ctx:
+		return &bpf_get_user_ctx_proto_trace;
 	default:
 		return bpf_tracing_func_proto(func_id, prog);
 	}
@@ -1325,6 +1356,8 @@ pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_perf_prog_read_value_proto;
 	case BPF_FUNC_read_branch_records:
 		return &bpf_read_branch_records_proto;
+	case BPF_FUNC_get_user_ctx:
+		return &bpf_get_user_ctx_proto_pe;
 	default:
 		return bpf_tracing_func_proto(func_id, prog);
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index bc1fd54a8f58..96afeced3467 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4856,6 +4856,21 @@ union bpf_attr {
  * 		Get address of the traced function (for tracing and kprobe programs).
  * 	Return
  * 		Address of the traced function.
+ *
+ * u64 bpf_get_user_ctx(void *ctx)
+ * 	Description
+ * 		Get user_ctx value provided (optionally) during the program
+ * 		attachment. It might be different for each individual
+ * 		attachment, even if BPF program itself is the same.
+ * 		Expects BPF program context *ctx* as a first argument.
+ *
+ * 		Supported for the following program types:
+ *			- kprobe/uprobe;
+ *			- tracepoint;
+ *			- perf_event.
+ * 	Return
+ *		Value specified by user at BPF link creation/attachment time
+ *		or 0, if it was not specified.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5032,6 +5047,7 @@ union bpf_attr {
 	FN(timer_start),		\
 	FN(timer_cancel),		\
 	FN(get_func_ip),		\
+	FN(get_user_ctx),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 bpf-next 07/14] libbpf: re-build libbpf.so when libbpf.map changes
  2021-07-26 16:11 [PATCH v2 bpf-next 00/14] BPF perf link and user-provided context value Andrii Nakryiko
                   ` (5 preceding siblings ...)
  2021-07-26 16:12 ` [PATCH v2 bpf-next 06/14] bpf: add bpf_get_user_ctx() BPF helper to access user_ctx value Andrii Nakryiko
@ 2021-07-26 16:12 ` Andrii Nakryiko
  2021-07-26 16:12 ` [PATCH v2 bpf-next 08/14] libbpf: remove unused bpf_link's destroy operation, but add dealloc Andrii Nakryiko
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-26 16:12 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Ensure libbpf.so is re-built whenever libbpf.map is modified. Without this,
changes to libbpf.map are not detected and a versioned symbols mismatch error
keeps being reported until `make clean && make` is used, which is a suboptimal
developer experience.

Fixes: 306b267cb3c4 ("libbpf: Verify versioned symbols")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/Makefile | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index ec14aa725bb0..74c3b73a5fbe 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -4,8 +4,9 @@
 RM ?= rm
 srctree = $(abs_srctree)
 
+VERSION_SCRIPT := libbpf.map
 LIBBPF_VERSION := $(shell \
-	grep -oE '^LIBBPF_([0-9.]+)' libbpf.map | \
+	grep -oE '^LIBBPF_([0-9.]+)' $(VERSION_SCRIPT) | \
 	sort -rV | head -n1 | cut -d'_' -f2)
 LIBBPF_MAJOR_VERSION := $(firstword $(subst ., ,$(LIBBPF_VERSION)))
 
@@ -110,7 +111,6 @@ SHARED_OBJDIR	:= $(OUTPUT)sharedobjs/
 STATIC_OBJDIR	:= $(OUTPUT)staticobjs/
 BPF_IN_SHARED	:= $(SHARED_OBJDIR)libbpf-in.o
 BPF_IN_STATIC	:= $(STATIC_OBJDIR)libbpf-in.o
-VERSION_SCRIPT	:= libbpf.map
 BPF_HELPER_DEFS	:= $(OUTPUT)bpf_helper_defs.h
 
 LIB_TARGET	:= $(addprefix $(OUTPUT),$(LIB_TARGET))
@@ -163,10 +163,10 @@ $(BPF_HELPER_DEFS): $(srctree)/tools/include/uapi/linux/bpf.h
 
 $(OUTPUT)libbpf.so: $(OUTPUT)libbpf.so.$(LIBBPF_VERSION)
 
-$(OUTPUT)libbpf.so.$(LIBBPF_VERSION): $(BPF_IN_SHARED)
+$(OUTPUT)libbpf.so.$(LIBBPF_VERSION): $(BPF_IN_SHARED) $(VERSION_SCRIPT)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) \
 		--shared -Wl,-soname,libbpf.so.$(LIBBPF_MAJOR_VERSION) \
-		-Wl,--version-script=$(VERSION_SCRIPT) $^ -lelf -lz -o $@
+		-Wl,--version-script=$(VERSION_SCRIPT) $< -lelf -lz -o $@
 	@ln -sf $(@F) $(OUTPUT)libbpf.so
 	@ln -sf $(@F) $(OUTPUT)libbpf.so.$(LIBBPF_MAJOR_VERSION)
 
@@ -181,7 +181,7 @@ $(OUTPUT)libbpf.pc:
 
 check: check_abi
 
-check_abi: $(OUTPUT)libbpf.so
+check_abi: $(OUTPUT)libbpf.so $(VERSION_SCRIPT)
 	@if [ "$(GLOBAL_SYM_COUNT)" != "$(VERSIONED_SYM_COUNT)" ]; then	 \
 		echo "Warning: Num of global symbols in $(BPF_IN_SHARED)"	 \
 		     "($(GLOBAL_SYM_COUNT)) does NOT match with num of"	 \
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 bpf-next 08/14] libbpf: remove unused bpf_link's destroy operation, but add dealloc
  2021-07-26 16:11 [PATCH v2 bpf-next 00/14] BPF perf link and user-provided context value Andrii Nakryiko
                   ` (6 preceding siblings ...)
  2021-07-26 16:12 ` [PATCH v2 bpf-next 07/14] libbpf: re-build libbpf.so when libbpf.map changes Andrii Nakryiko
@ 2021-07-26 16:12 ` Andrii Nakryiko
  2021-07-26 16:12 ` [PATCH v2 bpf-next 09/14] libbpf: use BPF perf link when supported by kernel Andrii Nakryiko
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-26 16:12 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra, Rafael David Tinoco

bpf_link->destroy() isn't used by any code, so remove it. Instead, add the
ability to override the deallocation procedure, with the default being a plain
free(link). This is necessary for cases where we want to "subclass" struct
bpf_link to keep extra information, as is the case in the next patch adding
struct bpf_link_perf.
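
The intended pattern, as used by the next patch for perf links, is roughly:
embed struct bpf_link in a larger struct and free the outer struct from
dealloc, e.g.:

  struct bpf_link_perf {
      struct bpf_link link;
      int perf_event_fd;    /* extra per-link state */
  };

  static void bpf_link_perf_dealloc(struct bpf_link *link)
  {
      struct bpf_link_perf *perf_link =
          container_of(link, struct bpf_link_perf, link);

      free(perf_link);
  }

  /* at allocation time: link->link.dealloc = &bpf_link_perf_dealloc; */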

Cc: Rafael David Tinoco <rafaeldtinoco@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/libbpf.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index a53ca29b44ab..f944342c0152 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10070,7 +10070,7 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
 
 struct bpf_link {
 	int (*detach)(struct bpf_link *link);
-	int (*destroy)(struct bpf_link *link);
+	void (*dealloc)(struct bpf_link *link);
 	char *pin_path;		/* NULL, if not pinned */
 	int fd;			/* hook FD, -1 if not applicable */
 	bool disconnected;
@@ -10109,11 +10109,12 @@ int bpf_link__destroy(struct bpf_link *link)
 
 	if (!link->disconnected && link->detach)
 		err = link->detach(link);
-	if (link->destroy)
-		link->destroy(link);
 	if (link->pin_path)
 		free(link->pin_path);
-	free(link);
+	if (link->dealloc)
+		link->dealloc(link);
+	else
+		free(link);
 
 	return libbpf_err(err);
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 bpf-next 09/14] libbpf: use BPF perf link when supported by kernel
  2021-07-26 16:11 [PATCH v2 bpf-next 00/14] BPF perf link and user-provided context value Andrii Nakryiko
                   ` (7 preceding siblings ...)
  2021-07-26 16:12 ` [PATCH v2 bpf-next 08/14] libbpf: remove unused bpf_link's destroy operation, but add dealloc Andrii Nakryiko
@ 2021-07-26 16:12 ` Andrii Nakryiko
  2021-07-26 16:12 ` [PATCH v2 bpf-next 10/14] libbpf: add user_ctx support to bpf_link_create() API Andrii Nakryiko
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-26 16:12 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra, Rafael David Tinoco

Detect kernel support for BPF perf link and prefer it when attaching to
perf_event, tracepoint, or kprobe/uprobe. The underlying perf_event FD will be
kept open until the BPF link is destroyed, at which point both the perf_event
FD and the BPF link FD will be closed.

This preserves the current behavior in which the perf_event FD stays open for
the duration of bpf_link's lifetime and the user is able to "disconnect"
bpf_link from the underlying FD (with bpf_link__disconnect()), so that
bpf_link__destroy() doesn't close the underlying perf_event FD. When BPF perf
link is used, disconnect will keep both perf_event and bpf_link FDs open, so
it will be up to the (advanced) user to close them. This approach is
demonstrated in the user_ctx.c selftests, added in this patch set.
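
A rough sketch of that advanced flow, mirroring the pe_subtest in the new
selftest (error handling omitted):

  #include <unistd.h>
  #include <bpf/libbpf.h>

  /* with BPF perf link, both perf_event and BPF link FDs stay open
   * after disconnect and are owned by the caller
   */
  static void detach_keep_perf_event(struct bpf_link *link, int pfd)
  {
      bpf_link__disconnect(link);    /* destroy() won't detach or close FDs */
      close(bpf_link__fd(link));     /* close BPF link FD explicitly */
      bpf_link__destroy(link);       /* frees struct bpf_link memory only */
      /* pfd (perf_event FD) remains open, e.g. for re-attaching */
  }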

Cc: Rafael David Tinoco <rafaeldtinoco@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/libbpf.c | 111 +++++++++++++++++++++++++++++++++--------
 1 file changed, 90 insertions(+), 21 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index f944342c0152..682e7aa8f90b 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -193,6 +193,8 @@ enum kern_feature_id {
 	FEAT_MODULE_BTF,
 	/* BTF_KIND_FLOAT support */
 	FEAT_BTF_FLOAT,
+	/* BPF perf link support */
+	FEAT_PERF_LINK,
 	__FEAT_CNT,
 };
 
@@ -4342,6 +4344,37 @@ static int probe_module_btf(void)
 	return !err;
 }
 
+static int probe_perf_link(void)
+{
+	struct bpf_load_program_attr attr;
+	struct bpf_insn insns[] = {
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	int prog_fd, link_fd, err;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.prog_type = BPF_PROG_TYPE_TRACEPOINT;
+	attr.insns = insns;
+	attr.insns_cnt = ARRAY_SIZE(insns);
+	attr.license = "GPL";
+	prog_fd = bpf_load_program_xattr(&attr, NULL, 0);
+	if (prog_fd < 0)
+		return -errno;
+
+	/* use invalid perf_event FD to get EBADF, if link is supported;
+	 * otherwise EINVAL should be returned
+	 */
+	link_fd = bpf_link_create(prog_fd, -1, BPF_PERF_EVENT, NULL);
+	err = -errno; /* close() can clobber errno */
+
+	if (link_fd >= 0)
+		close(link_fd);
+	close(prog_fd);
+
+	return link_fd < 0 && err == -EBADF;
+}
+
 enum kern_feature_result {
 	FEAT_UNKNOWN = 0,
 	FEAT_SUPPORTED = 1,
@@ -4392,6 +4425,9 @@ static struct kern_feature_desc {
 	[FEAT_BTF_FLOAT] = {
 		"BTF_KIND_FLOAT support", probe_kern_btf_float,
 	},
+	[FEAT_PERF_LINK] = {
+		"BPF perf link support", probe_perf_link,
+	},
 };
 
 static bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id)
@@ -10211,23 +10247,38 @@ int bpf_link__unpin(struct bpf_link *link)
 	return 0;
 }
 
-static int bpf_link__detach_perf_event(struct bpf_link *link)
+struct bpf_link_perf {
+	struct bpf_link link;
+	int perf_event_fd;
+};
+
+static int bpf_link_perf_detach(struct bpf_link *link)
 {
-	int err;
+	struct bpf_link_perf *perf_link = container_of(link, struct bpf_link_perf, link);
+	int err = 0;
 
-	err = ioctl(link->fd, PERF_EVENT_IOC_DISABLE, 0);
-	if (err)
+	if (ioctl(perf_link->perf_event_fd, PERF_EVENT_IOC_DISABLE, 0) < 0)
 		err = -errno;
 
+	if (perf_link->perf_event_fd != link->fd)
+		close(perf_link->perf_event_fd);
 	close(link->fd);
+
 	return libbpf_err(err);
 }
 
+static void bpf_link_perf_dealloc(struct bpf_link *link)
+{
+	struct bpf_link_perf *perf_link = container_of(link, struct bpf_link_perf, link);
+
+	free(perf_link);
+}
+
 struct bpf_link *bpf_program__attach_perf_event(struct bpf_program *prog, int pfd)
 {
 	char errmsg[STRERR_BUFSIZE];
-	struct bpf_link *link;
-	int prog_fd, err;
+	struct bpf_link_perf *link;
+	int prog_fd, link_fd = -1, err;
 
 	if (pfd < 0) {
 		pr_warn("prog '%s': invalid perf event FD %d\n",
@@ -10244,27 +10295,45 @@ struct bpf_link *bpf_program__attach_perf_event(struct bpf_program *prog, int pf
 	link = calloc(1, sizeof(*link));
 	if (!link)
 		return libbpf_err_ptr(-ENOMEM);
-	link->detach = &bpf_link__detach_perf_event;
-	link->fd = pfd;
+	link->link.detach = &bpf_link_perf_detach;
+	link->link.dealloc = &bpf_link_perf_dealloc;
+	link->perf_event_fd = pfd;
 
-	if (ioctl(pfd, PERF_EVENT_IOC_SET_BPF, prog_fd) < 0) {
-		err = -errno;
-		free(link);
-		pr_warn("prog '%s': failed to attach to pfd %d: %s\n",
-			prog->name, pfd, libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
-		if (err == -EPROTO)
-			pr_warn("prog '%s': try add PERF_SAMPLE_CALLCHAIN to or remove exclude_callchain_[kernel|user] from pfd %d\n",
-				prog->name, pfd);
-		return libbpf_err_ptr(err);
+	if (kernel_supports(prog->obj, FEAT_PERF_LINK)) {
+		link_fd = bpf_link_create(prog_fd, pfd, BPF_PERF_EVENT, NULL);
+		if (link_fd < 0) {
+			err = -errno;
+			pr_warn("prog '%s': failed to create BPF link for perf_event FD %d: %d (%s)\n",
+				prog->name, pfd,
+				err, libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
+			goto err_out;
+		}
+		link->link.fd = link_fd;
+	} else {
+		if (ioctl(pfd, PERF_EVENT_IOC_SET_BPF, prog_fd) < 0) {
+			err = -errno;
+			pr_warn("prog '%s': failed to attach to perf_event FD %d: %s\n",
+				prog->name, pfd, libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
+			if (err == -EPROTO)
+				pr_warn("prog '%s': try add PERF_SAMPLE_CALLCHAIN to or remove exclude_callchain_[kernel|user] from pfd %d\n",
+					prog->name, pfd);
+			goto err_out;
+		}
+		link->link.fd = pfd;
 	}
 	if (ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
 		err = -errno;
-		free(link);
-		pr_warn("prog '%s': failed to enable pfd %d: %s\n",
+		pr_warn("prog '%s': failed to enable perf_event FD %d: %s\n",
 			prog->name, pfd, libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
-		return libbpf_err_ptr(err);
+		goto err_out;
 	}
-	return link;
+
+	return &link->link;
+err_out:
+	if (link_fd >= 0)
+		close(link_fd);
+	free(link);
+	return libbpf_err_ptr(err);
 }
 
 /*
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 bpf-next 10/14] libbpf: add user_ctx support to bpf_link_create() API
  2021-07-26 16:11 [PATCH v2 bpf-next 00/14] BPF perf link and user-provided context value Andrii Nakryiko
                   ` (8 preceding siblings ...)
  2021-07-26 16:12 ` [PATCH v2 bpf-next 09/14] libbpf: use BPF perf link when supported by kernel Andrii Nakryiko
@ 2021-07-26 16:12 ` Andrii Nakryiko
  2021-07-26 16:12 ` [PATCH v2 bpf-next 11/14] libbpf: add user_ctx to perf_event, kprobe, uprobe, and tp attach APIs Andrii Nakryiko
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-26 16:12 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Add the ability to specify a user_ctx value when creating a BPF perf link with
the bpf_link_create() low-level API.

Given that the BPF_LINK_CREATE command is growing and keeps getting new fields
that are specific to the type of BPF link, extend the libbpf side of the
bpf_link_create() API and the corresponding OPTS struct to accommodate such
changes. Add extra checks to prevent using incompatible/unexpected
combinations of fields.
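
For example, a user_ctx value can now be passed like this (prog_fd is a loaded
perf_event-compatible BPF program FD, pfd is a perf_event FD, and the value is
illustrative):

  #include <bpf/bpf.h>

  static int attach_with_user_ctx(int prog_fd, int pfd)
  {
      DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts,
          .perf_event.user_ctx = 0x1234);

      /* returns BPF link FD on success, negative error on failure */
      return bpf_link_create(prog_fd, pfd, BPF_PERF_EVENT, &opts);
  }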

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c             | 32 +++++++++++++++++++++++++-------
 tools/lib/bpf/bpf.h             |  8 +++++++-
 tools/lib/bpf/libbpf_internal.h | 32 ++++++++++++++++++++++----------
 3 files changed, 54 insertions(+), 18 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 86dcac44f32f..8dcbee80ced7 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -684,8 +684,13 @@ int bpf_link_create(int prog_fd, int target_fd,
 	iter_info_len = OPTS_GET(opts, iter_info_len, 0);
 	target_btf_id = OPTS_GET(opts, target_btf_id, 0);
 
-	if (iter_info_len && target_btf_id)
-		return libbpf_err(-EINVAL);
+	/* validate we don't have unexpected combinations of non-zero fields */
+	if (iter_info_len || target_btf_id) {
+		if (iter_info_len && target_btf_id)
+			return libbpf_err(-EINVAL);
+		if (!OPTS_ZEROED(opts, target_btf_id))
+			return libbpf_err(-EINVAL);
+	}
 
 	memset(&attr, 0, sizeof(attr));
 	attr.link_create.prog_fd = prog_fd;
@@ -693,14 +698,27 @@ int bpf_link_create(int prog_fd, int target_fd,
 	attr.link_create.attach_type = attach_type;
 	attr.link_create.flags = OPTS_GET(opts, flags, 0);
 
-	if (iter_info_len) {
-		attr.link_create.iter_info =
-			ptr_to_u64(OPTS_GET(opts, iter_info, (void *)0));
-		attr.link_create.iter_info_len = iter_info_len;
-	} else if (target_btf_id) {
+	if (target_btf_id) {
 		attr.link_create.target_btf_id = target_btf_id;
+		goto proceed;
 	}
 
+	switch (attach_type) {
+	case BPF_TRACE_ITER:
+		attr.link_create.iter_info = ptr_to_u64(OPTS_GET(opts, iter_info, (void *)0));
+		attr.link_create.iter_info_len = iter_info_len;
+		break;
+	case BPF_PERF_EVENT:
+		attr.link_create.perf_event.user_ctx = OPTS_GET(opts, perf_event.user_ctx, 0);
+		if (!OPTS_ZEROED(opts, perf_event))
+			return libbpf_err(-EINVAL);
+		break;
+	default:
+		if (!OPTS_ZEROED(opts, flags))
+			return libbpf_err(-EINVAL);
+		break;
+	}
+proceed:
 	fd = sys_bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
 	return libbpf_err_errno(fd);
 }
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 4f758f8f50cd..49d5d08c3832 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -177,8 +177,14 @@ struct bpf_link_create_opts {
 	union bpf_iter_link_info *iter_info;
 	__u32 iter_info_len;
 	__u32 target_btf_id;
+	union {
+		struct {
+			__u64 user_ctx;
+		} perf_event;
+	};
+	size_t :0;
 };
-#define bpf_link_create_opts__last_field target_btf_id
+#define bpf_link_create_opts__last_field perf_event
 
 LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,
 			       enum bpf_attach_type attach_type,
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index 016ca7cb4f8a..e371188c8f87 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -195,6 +195,17 @@ void *libbpf_add_mem(void **data, size_t *cap_cnt, size_t elem_sz,
 		     size_t cur_cnt, size_t max_cnt, size_t add_cnt);
 int libbpf_ensure_mem(void **data, size_t *cap_cnt, size_t elem_sz, size_t need_cnt);
 
+static inline bool libbpf_is_mem_zeroed(const char *p, ssize_t len)
+{
+	while (len > 0) {
+		if (*p)
+			return false;
+		p++;
+		len--;
+	}
+	return true;
+}
+
 static inline bool libbpf_validate_opts(const char *opts,
 					size_t opts_sz, size_t user_sz,
 					const char *type_name)
@@ -203,16 +214,9 @@ static inline bool libbpf_validate_opts(const char *opts,
 		pr_warn("%s size (%zu) is too small\n", type_name, user_sz);
 		return false;
 	}
-	if (user_sz > opts_sz) {
-		size_t i;
-
-		for (i = opts_sz; i < user_sz; i++) {
-			if (opts[i]) {
-				pr_warn("%s has non-zero extra bytes\n",
-					type_name);
-				return false;
-			}
-		}
+	if (!libbpf_is_mem_zeroed(opts + opts_sz, (ssize_t)user_sz - opts_sz)) {
+		pr_warn("%s has non-zero extra bytes\n", type_name);
+		return false;
 	}
 	return true;
 }
@@ -232,6 +236,14 @@ static inline bool libbpf_validate_opts(const char *opts,
 			(opts)->field = value;	\
 	} while (0)
 
+#define OPTS_ZEROED(opts, last_nonzero_field)				      \
+({									      \
+	ssize_t __off = offsetofend(typeof(*(opts)), last_nonzero_field);     \
+	!(opts) || libbpf_is_mem_zeroed((const void *)opts + __off,	      \
+					(opts)->sz - __off);		      \
+})
+
+
 int parse_cpu_mask_str(const char *s, bool **mask, int *mask_sz);
 int parse_cpu_mask_file(const char *fcpu, bool **mask, int *mask_sz);
 int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 bpf-next 11/14] libbpf: add user_ctx to perf_event, kprobe, uprobe, and tp attach APIs
  2021-07-26 16:11 [PATCH v2 bpf-next 00/14] BPF perf link and user-provided context value Andrii Nakryiko
                   ` (9 preceding siblings ...)
  2021-07-26 16:12 ` [PATCH v2 bpf-next 10/14] libbpf: add user_ctx support to bpf_link_create() API Andrii Nakryiko
@ 2021-07-26 16:12 ` Andrii Nakryiko
  2021-07-30  1:11   ` Rafael David Tinoco
  2021-07-26 16:12 ` [PATCH v2 bpf-next 12/14] selftests/bpf: test low-level perf BPF link API Andrii Nakryiko
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-26 16:12 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra, Rafael David Tinoco

Wire through user_ctx for all attach APIs that use perf_event_open under the
hood:
  - for kprobes, extend the existing bpf_kprobe_opts with a user_ctx field;
  - for perf_event, uprobe, and tracepoint APIs, add their _opts variants and
    pass user_ctx through opts.

For kernels that don't support BPF_LINK_CREATE for perf_events (and thus don't
support user_ctx either), return an error and log a warning for the user.
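
For example, the same program can be attached twice with different user_ctx
values (adapted from the selftests added later in this series; the traced
function name is illustrative and error handling is omitted):

  #include <unistd.h>
  #include <bpf/libbpf.h>

  static void kprobe_user_ctx_demo(struct bpf_program *prog)
  {
      DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, opts, .retprobe = false);
      struct bpf_link *link1, *link2;

      opts.user_ctx = 0x1;
      link1 = bpf_program__attach_kprobe_opts(prog, "__x64_sys_nanosleep", &opts);

      opts.user_ctx = 0x2;
      link2 = bpf_program__attach_kprobe_opts(prog, "__x64_sys_nanosleep", &opts);

      usleep(1); /* trigger; BPF side sees 0x1 or 0x2 via bpf_get_user_ctx() */

      bpf_link__destroy(link2);
      bpf_link__destroy(link1);
  }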

Cc: Rafael David Tinoco <rafaeldtinoco@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/libbpf.c   | 78 +++++++++++++++++++++++++++++++++-------
 tools/lib/bpf/libbpf.h   | 71 +++++++++++++++++++++++++++++-------
 tools/lib/bpf/libbpf.map |  3 ++
 3 files changed, 127 insertions(+), 25 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 682e7aa8f90b..5836d3627ba6 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10274,12 +10274,16 @@ static void bpf_link_perf_dealloc(struct bpf_link *link)
 	free(perf_link);
 }
 
-struct bpf_link *bpf_program__attach_perf_event(struct bpf_program *prog, int pfd)
+struct bpf_link *bpf_program__attach_perf_event_opts(struct bpf_program *prog, int pfd,
+						     const struct bpf_perf_event_opts *opts)
 {
 	char errmsg[STRERR_BUFSIZE];
 	struct bpf_link_perf *link;
 	int prog_fd, link_fd = -1, err;
 
+	if (!OPTS_VALID(opts, bpf_perf_event_opts))
+		return libbpf_err_ptr(-EINVAL);
+
 	if (pfd < 0) {
 		pr_warn("prog '%s': invalid perf event FD %d\n",
 			prog->name, pfd);
@@ -10300,7 +10304,10 @@ struct bpf_link *bpf_program__attach_perf_event(struct bpf_program *prog, int pf
 	link->perf_event_fd = pfd;
 
 	if (kernel_supports(prog->obj, FEAT_PERF_LINK)) {
-		link_fd = bpf_link_create(prog_fd, pfd, BPF_PERF_EVENT, NULL);
+		DECLARE_LIBBPF_OPTS(bpf_link_create_opts, link_opts,
+			.perf_event.user_ctx = OPTS_GET(opts, user_ctx, 0));
+
+		link_fd = bpf_link_create(prog_fd, pfd, BPF_PERF_EVENT, &link_opts);
 		if (link_fd < 0) {
 			err = -errno;
 			pr_warn("prog '%s': failed to create BPF link for perf_event FD %d: %d (%s)\n",
@@ -10310,6 +10317,12 @@ struct bpf_link *bpf_program__attach_perf_event(struct bpf_program *prog, int pf
 		}
 		link->link.fd = link_fd;
 	} else {
+		if (OPTS_GET(opts, user_ctx, 0)) {
+			pr_warn("prog '%s': user context value is not supported\n", prog->name);
+			err = -EOPNOTSUPP;
+			goto err_out;
+		}
+
 		if (ioctl(pfd, PERF_EVENT_IOC_SET_BPF, prog_fd) < 0) {
 			err = -errno;
 			pr_warn("prog '%s': failed to attach to perf_event FD %d: %s\n",
@@ -10336,6 +10349,11 @@ struct bpf_link *bpf_program__attach_perf_event(struct bpf_program *prog, int pf
 	return libbpf_err_ptr(err);
 }
 
+struct bpf_link *bpf_program__attach_perf_event(struct bpf_program *prog, int pfd)
+{
+	return bpf_program__attach_perf_event_opts(prog, pfd, NULL);
+}
+
 /*
  * this function is expected to parse integer in the range of [0, 2^31-1] from
  * given file using scanf format string fmt. If actual parsed value is
@@ -10444,8 +10462,9 @@ static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
 struct bpf_link *
 bpf_program__attach_kprobe_opts(struct bpf_program *prog,
 				const char *func_name,
-				struct bpf_kprobe_opts *opts)
+				const struct bpf_kprobe_opts *opts)
 {
+	DECLARE_LIBBPF_OPTS(bpf_perf_event_opts, pe_opts);
 	char errmsg[STRERR_BUFSIZE];
 	struct bpf_link *link;
 	unsigned long offset;
@@ -10457,6 +10476,7 @@ bpf_program__attach_kprobe_opts(struct bpf_program *prog,
 
 	retprobe = OPTS_GET(opts, retprobe, false);
 	offset = OPTS_GET(opts, offset, 0);
+	pe_opts.user_ctx = OPTS_GET(opts, user_ctx, 0);
 
 	pfd = perf_event_open_probe(false /* uprobe */, retprobe, func_name,
 				    offset, -1 /* pid */);
@@ -10466,7 +10486,7 @@ bpf_program__attach_kprobe_opts(struct bpf_program *prog,
 			libbpf_strerror_r(pfd, errmsg, sizeof(errmsg)));
 		return libbpf_err_ptr(pfd);
 	}
-	link = bpf_program__attach_perf_event(prog, pfd);
+	link = bpf_program__attach_perf_event_opts(prog, pfd, &pe_opts);
 	err = libbpf_get_error(link);
 	if (err) {
 		close(pfd);
@@ -10521,14 +10541,22 @@ static struct bpf_link *attach_kprobe(const struct bpf_sec_def *sec,
 	return link;
 }
 
-struct bpf_link *bpf_program__attach_uprobe(struct bpf_program *prog,
-					    bool retprobe, pid_t pid,
-					    const char *binary_path,
-					    size_t func_offset)
+LIBBPF_API struct bpf_link *
+bpf_program__attach_uprobe_opts(struct bpf_program *prog, pid_t pid,
+				const char *binary_path, size_t func_offset,
+				const struct bpf_uprobe_opts *opts)
 {
+	DECLARE_LIBBPF_OPTS(bpf_perf_event_opts, pe_opts);
 	char errmsg[STRERR_BUFSIZE];
 	struct bpf_link *link;
 	int pfd, err;
+	bool retprobe;
+
+	if (!OPTS_VALID(opts, bpf_uprobe_opts))
+		return libbpf_err_ptr(-EINVAL);
+
+	retprobe = OPTS_GET(opts, retprobe, false);
+	pe_opts.user_ctx = OPTS_GET(opts, user_ctx, 0);
 
 	pfd = perf_event_open_probe(true /* uprobe */, retprobe,
 				    binary_path, func_offset, pid);
@@ -10539,7 +10567,7 @@ struct bpf_link *bpf_program__attach_uprobe(struct bpf_program *prog,
 			libbpf_strerror_r(pfd, errmsg, sizeof(errmsg)));
 		return libbpf_err_ptr(pfd);
 	}
-	link = bpf_program__attach_perf_event(prog, pfd);
+	link = bpf_program__attach_perf_event_opts(prog, pfd, &pe_opts);
 	err = libbpf_get_error(link);
 	if (err) {
 		close(pfd);
@@ -10552,6 +10580,16 @@ struct bpf_link *bpf_program__attach_uprobe(struct bpf_program *prog,
 	return link;
 }
 
+struct bpf_link *bpf_program__attach_uprobe(struct bpf_program *prog,
+					    bool retprobe, pid_t pid,
+					    const char *binary_path,
+					    size_t func_offset)
+{
+	DECLARE_LIBBPF_OPTS(bpf_uprobe_opts, opts, .retprobe = retprobe);
+
+	return bpf_program__attach_uprobe_opts(prog, pid, binary_path, func_offset, &opts);
+}
+
 static int determine_tracepoint_id(const char *tp_category,
 				   const char *tp_name)
 {
@@ -10602,14 +10640,21 @@ static int perf_event_open_tracepoint(const char *tp_category,
 	return pfd;
 }
 
-struct bpf_link *bpf_program__attach_tracepoint(struct bpf_program *prog,
-						const char *tp_category,
-						const char *tp_name)
+struct bpf_link *bpf_program__attach_tracepoint_opts(struct bpf_program *prog,
+						     const char *tp_category,
+						     const char *tp_name,
+						     const struct bpf_tracepoint_opts *opts)
 {
+	DECLARE_LIBBPF_OPTS(bpf_perf_event_opts, pe_opts);
 	char errmsg[STRERR_BUFSIZE];
 	struct bpf_link *link;
 	int pfd, err;
 
+	if (!OPTS_VALID(opts, bpf_tracepoint_opts))
+		return libbpf_err_ptr(-EINVAL);
+
+	pe_opts.user_ctx = OPTS_GET(opts, user_ctx, 0);
+
 	pfd = perf_event_open_tracepoint(tp_category, tp_name);
 	if (pfd < 0) {
 		pr_warn("prog '%s': failed to create tracepoint '%s/%s' perf event: %s\n",
@@ -10617,7 +10662,7 @@ struct bpf_link *bpf_program__attach_tracepoint(struct bpf_program *prog,
 			libbpf_strerror_r(pfd, errmsg, sizeof(errmsg)));
 		return libbpf_err_ptr(pfd);
 	}
-	link = bpf_program__attach_perf_event(prog, pfd);
+	link = bpf_program__attach_perf_event_opts(prog, pfd, &pe_opts);
 	err = libbpf_get_error(link);
 	if (err) {
 		close(pfd);
@@ -10629,6 +10674,13 @@ struct bpf_link *bpf_program__attach_tracepoint(struct bpf_program *prog,
 	return link;
 }
 
+struct bpf_link *bpf_program__attach_tracepoint(struct bpf_program *prog,
+						const char *tp_category,
+						const char *tp_name)
+{
+	return bpf_program__attach_tracepoint_opts(prog, tp_category, tp_name, NULL);
+}
+
 static struct bpf_link *attach_tp(const struct bpf_sec_def *sec,
 				  struct bpf_program *prog)
 {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 1271d99bb7aa..85d336bcb510 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -104,17 +104,6 @@ struct bpf_object_open_opts {
 };
 #define bpf_object_open_opts__last_field btf_custom_path
 
-struct bpf_kprobe_opts {
-	/* size of this struct, for forward/backward compatiblity */
-	size_t sz;
-	/* function's offset to install kprobe to */
-	unsigned long offset;
-	/* kprobe is return probe */
-	bool retprobe;
-	size_t :0;
-};
-#define bpf_kprobe_opts__last_field retprobe
-
 LIBBPF_API struct bpf_object *bpf_object__open(const char *path);
 LIBBPF_API struct bpf_object *
 bpf_object__open_file(const char *path, const struct bpf_object_open_opts *opts);
@@ -255,24 +244,82 @@ LIBBPF_API int bpf_link__destroy(struct bpf_link *link);
 
 LIBBPF_API struct bpf_link *
 bpf_program__attach(struct bpf_program *prog);
+
+struct bpf_perf_event_opts {
+	/* size of this struct, for forward/backward compatiblity */
+	size_t sz;
+	/* custom user-provided value fetchable through bpf_get_user_ctx() */
+	__u64 user_ctx;
+};
+#define bpf_perf_event_opts__last_field user_ctx
+
 LIBBPF_API struct bpf_link *
 bpf_program__attach_perf_event(struct bpf_program *prog, int pfd);
+
+LIBBPF_API struct bpf_link *
+bpf_program__attach_perf_event_opts(struct bpf_program *prog, int pfd,
+				    const struct bpf_perf_event_opts *opts);
+
+struct bpf_kprobe_opts {
+	/* size of this struct, for forward/backward compatiblity */
+	size_t sz;
+	/* custom user-provided value fetchable through bpf_get_user_ctx() */
+	__u64 user_ctx;
+	/* function's offset to install kprobe to */
+	unsigned long offset;
+	/* kprobe is return probe */
+	bool retprobe;
+	size_t :0;
+};
+#define bpf_kprobe_opts__last_field retprobe
+
 LIBBPF_API struct bpf_link *
 bpf_program__attach_kprobe(struct bpf_program *prog, bool retprobe,
 			   const char *func_name);
 LIBBPF_API struct bpf_link *
 bpf_program__attach_kprobe_opts(struct bpf_program *prog,
                                 const char *func_name,
-                                struct bpf_kprobe_opts *opts);
+                                const struct bpf_kprobe_opts *opts);
+
+struct bpf_uprobe_opts {
+	/* size of this struct, for forward/backward compatiblity */
+	size_t sz;
+	/* custom user-provided value fetchable through bpf_get_user_ctx() */
+	__u64 user_ctx;
+	/* uprobe is return probe, invoked at function return time */
+	bool retprobe;
+	size_t :0;
+};
+#define bpf_uprobe_opts__last_field retprobe
+
 LIBBPF_API struct bpf_link *
 bpf_program__attach_uprobe(struct bpf_program *prog, bool retprobe,
 			   pid_t pid, const char *binary_path,
 			   size_t func_offset);
+LIBBPF_API struct bpf_link *
+bpf_program__attach_uprobe_opts(struct bpf_program *prog, pid_t pid,
+				const char *binary_path, size_t func_offset,
+				const struct bpf_uprobe_opts *opts);
+
+struct bpf_tracepoint_opts {
+	/* size of this struct, for forward/backward compatiblity */
+	size_t sz;
+	/* custom user-provided value fetchable through bpf_get_user_ctx() */
+	__u64 user_ctx;
+};
+#define bpf_tracepoint_opts__last_field user_ctx
+
 LIBBPF_API struct bpf_link *
 bpf_program__attach_tracepoint(struct bpf_program *prog,
 			       const char *tp_category,
 			       const char *tp_name);
 LIBBPF_API struct bpf_link *
+bpf_program__attach_tracepoint_opts(struct bpf_program *prog,
+				    const char *tp_category,
+				    const char *tp_name,
+				    const struct bpf_tracepoint_opts *opts);
+
+LIBBPF_API struct bpf_link *
 bpf_program__attach_raw_tracepoint(struct bpf_program *prog,
 				   const char *tp_name);
 LIBBPF_API struct bpf_link *
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index c240d488eb5e..a156f012e23d 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -374,6 +374,9 @@ LIBBPF_0.5.0 {
 		bpf_map__pin_path;
 		bpf_map_lookup_and_delete_elem_flags;
 		bpf_program__attach_kprobe_opts;
+		bpf_program__attach_perf_event_opts;
+		bpf_program__attach_tracepoint_opts;
+		bpf_program__attach_uprobe_opts;
 		bpf_object__gen_loader;
 		btf_dump__dump_type_data;
 		libbpf_set_strict_mode;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 bpf-next 12/14] selftests/bpf: test low-level perf BPF link API
  2021-07-26 16:11 [PATCH v2 bpf-next 00/14] BPF perf link and user-provided context value Andrii Nakryiko
                   ` (10 preceding siblings ...)
  2021-07-26 16:12 ` [PATCH v2 bpf-next 11/14] libbpf: add user_ctx to perf_event, kprobe, uprobe, and tp attach APIs Andrii Nakryiko
@ 2021-07-26 16:12 ` Andrii Nakryiko
  2021-07-26 16:12 ` [PATCH v2 bpf-next 13/14] selftests/bpf: extract uprobe-related helpers into trace_helpers.{c,h} Andrii Nakryiko
  2021-07-26 16:12 ` [PATCH v2 bpf-next 14/14] selftests/bpf: add user_ctx selftests for high-level APIs Andrii Nakryiko
  13 siblings, 0 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-26 16:12 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Add tests utilizing the low-level bpf_link_create() API to create a perf BPF link.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/prog_tests/perf_link.c      | 89 +++++++++++++++++++
 .../selftests/bpf/progs/test_perf_link.c      | 16 ++++
 2 files changed, 105 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_link.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_perf_link.c

diff --git a/tools/testing/selftests/bpf/prog_tests/perf_link.c b/tools/testing/selftests/bpf/prog_tests/perf_link.c
new file mode 100644
index 000000000000..b1abd0c46607
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/perf_link.c
@@ -0,0 +1,89 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#define _GNU_SOURCE
+#include <pthread.h>
+#include <sched.h>
+#include <test_progs.h>
+#include "test_perf_link.skel.h"
+
+static void burn_cpu(void)
+{
+	volatile int j = 0;
+	cpu_set_t cpu_set;
+	int i, err;
+
+	/* generate some branches on cpu 0 */
+	CPU_ZERO(&cpu_set);
+	CPU_SET(0, &cpu_set);
+	err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set);
+	ASSERT_OK(err, "set_thread_affinity");
+
+	/* spin the loop for a while (random high number) */
+	for (i = 0; i < 1000000; ++i)
+		++j;
+}
+
+void test_perf_link(void)
+{
+	struct test_perf_link *skel = NULL;
+	struct perf_event_attr attr;
+	int pfd = -1, link_fd = -1, err;
+	int run_cnt_before, run_cnt_after;
+	struct bpf_link_info info;
+	__u32 info_len = sizeof(info);
+
+	/* create perf event */
+	memset(&attr, 0, sizeof(attr));
+	attr.size = sizeof(attr);
+	attr.type = PERF_TYPE_SOFTWARE;
+	attr.config = PERF_COUNT_SW_CPU_CLOCK;
+	attr.freq = 1;
+	attr.sample_freq = 4000;
+	pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC);
+	if (!ASSERT_GE(pfd, 0, "perf_fd"))
+		goto cleanup;
+
+	skel = test_perf_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	link_fd = bpf_link_create(bpf_program__fd(skel->progs.handler), pfd,
+				  BPF_PERF_EVENT, NULL);
+	if (!ASSERT_GE(link_fd, 0, "link_fd"))
+		goto cleanup;
+
+	memset(&info, 0, sizeof(info));
+	err = bpf_obj_get_info_by_fd(link_fd, &info, &info_len);
+	if (!ASSERT_OK(err, "link_get_info"))
+		goto cleanup;
+
+	ASSERT_EQ(info.type, BPF_LINK_TYPE_PERF_EVENT, "link_type");
+	ASSERT_GT(info.id, 0, "link_id");
+	ASSERT_GT(info.prog_id, 0, "link_prog_id");
+
+	/* ensure we get at least one perf_event prog execution */
+	burn_cpu();
+	ASSERT_GT(skel->bss->run_cnt, 0, "run_cnt");
+
+	/* perf_event is still active, but we close link and BPF program
+	 * shouldn't be executed anymore
+	 */
+	close(link_fd);
+	link_fd = -1;
+
+	/* make sure there are no stragglers */
+	kern_sync_rcu();
+
+	run_cnt_before = skel->bss->run_cnt;
+	burn_cpu();
+	run_cnt_after = skel->bss->run_cnt;
+
+	ASSERT_EQ(run_cnt_before, run_cnt_after, "run_cnt_before_after");
+
+cleanup:
+	if (link_fd >= 0)
+		close(link_fd);
+	if (pfd >= 0)
+		close(pfd);
+	test_perf_link__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_perf_link.c b/tools/testing/selftests/bpf/progs/test_perf_link.c
new file mode 100644
index 000000000000..c1db9fd98d0c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_perf_link.c
@@ -0,0 +1,16 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+int run_cnt = 0;
+
+SEC("perf_event")
+int handler(struct pt_regs *ctx)
+{
+	__sync_fetch_and_add(&run_cnt, 1);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 bpf-next 13/14] selftests/bpf: extract uprobe-related helpers into trace_helpers.{c,h}
  2021-07-26 16:11 [PATCH v2 bpf-next 00/14] BPF perf link and user-provided context value Andrii Nakryiko
                   ` (11 preceding siblings ...)
  2021-07-26 16:12 ` [PATCH v2 bpf-next 12/14] selftests/bpf: test low-level perf BPF link API Andrii Nakryiko
@ 2021-07-26 16:12 ` Andrii Nakryiko
  2021-07-26 16:12 ` [PATCH v2 bpf-next 14/14] selftests/bpf: add user_ctx selftests for high-level APIs Andrii Nakryiko
  13 siblings, 0 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-26 16:12 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Extract two helpers used for working with uprobes into trace_helpers.{c,h} so
they can be re-used across multiple uprobe-using selftests. Also rename
get_offset() to the more appropriate get_uprobe_offset().

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/prog_tests/attach_probe.c   | 61 +----------------
 tools/testing/selftests/bpf/trace_helpers.c   | 66 +++++++++++++++++++
 tools/testing/selftests/bpf/trace_helpers.h   |  3 +
 3 files changed, 70 insertions(+), 60 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/attach_probe.c b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
index ec11e20d2b92..e40b41c44f8b 100644
--- a/tools/testing/selftests/bpf/prog_tests/attach_probe.c
+++ b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
@@ -2,65 +2,6 @@
 #include <test_progs.h>
 #include "test_attach_probe.skel.h"
 
-#if defined(__powerpc64__) && defined(_CALL_ELF) && _CALL_ELF == 2
-
-#define OP_RT_RA_MASK   0xffff0000UL
-#define LIS_R2          0x3c400000UL
-#define ADDIS_R2_R12    0x3c4c0000UL
-#define ADDI_R2_R2      0x38420000UL
-
-static ssize_t get_offset(ssize_t addr, ssize_t base)
-{
-	u32 *insn = (u32 *) addr;
-
-	/*
-	 * A PPC64 ABIv2 function may have a local and a global entry
-	 * point. We need to use the local entry point when patching
-	 * functions, so identify and step over the global entry point
-	 * sequence.
-	 *
-	 * The global entry point sequence is always of the form:
-	 *
-	 * addis r2,r12,XXXX
-	 * addi  r2,r2,XXXX
-	 *
-	 * A linker optimisation may convert the addis to lis:
-	 *
-	 * lis   r2,XXXX
-	 * addi  r2,r2,XXXX
-	 */
-	if ((((*insn & OP_RT_RA_MASK) == ADDIS_R2_R12) ||
-	     ((*insn & OP_RT_RA_MASK) == LIS_R2)) &&
-	    ((*(insn + 1) & OP_RT_RA_MASK) == ADDI_R2_R2))
-		return (ssize_t)(insn + 2) - base;
-	else
-		return addr - base;
-}
-#else
-#define get_offset(addr, base) (addr - base)
-#endif
-
-ssize_t get_base_addr() {
-	size_t start, offset;
-	char buf[256];
-	FILE *f;
-
-	f = fopen("/proc/self/maps", "r");
-	if (!f)
-		return -errno;
-
-	while (fscanf(f, "%zx-%*x %s %zx %*[^\n]\n",
-		      &start, buf, &offset) == 3) {
-		if (strcmp(buf, "r-xp") == 0) {
-			fclose(f);
-			return start - offset;
-		}
-	}
-
-	fclose(f);
-	return -EINVAL;
-}
-
 void test_attach_probe(void)
 {
 	int duration = 0;
@@ -74,7 +15,7 @@ void test_attach_probe(void)
 	if (CHECK(base_addr < 0, "get_base_addr",
 		  "failed to find base addr: %zd", base_addr))
 		return;
-	uprobe_offset = get_offset((size_t)&get_base_addr, base_addr);
+	uprobe_offset = get_uprobe_offset(&get_base_addr, base_addr);
 
 	skel = test_attach_probe__open_and_load();
 	if (CHECK(!skel, "skel_open", "failed to open skeleton\n"))
diff --git a/tools/testing/selftests/bpf/trace_helpers.c b/tools/testing/selftests/bpf/trace_helpers.c
index 1bbd1d9830c8..381dafce1d8f 100644
--- a/tools/testing/selftests/bpf/trace_helpers.c
+++ b/tools/testing/selftests/bpf/trace_helpers.c
@@ -136,3 +136,69 @@ void read_trace_pipe(void)
 		}
 	}
 }
+
+#if defined(__powerpc64__) && defined(_CALL_ELF) && _CALL_ELF == 2
+
+#define OP_RT_RA_MASK   0xffff0000UL
+#define LIS_R2          0x3c400000UL
+#define ADDIS_R2_R12    0x3c4c0000UL
+#define ADDI_R2_R2      0x38420000UL
+
+ssize_t get_uprobe_offset(const void *addr, ssize_t base)
+{
+	u32 *insn = (u32 *)(uintptr_t)addr;
+
+	/*
+	 * A PPC64 ABIv2 function may have a local and a global entry
+	 * point. We need to use the local entry point when patching
+	 * functions, so identify and step over the global entry point
+	 * sequence.
+	 *
+	 * The global entry point sequence is always of the form:
+	 *
+	 * addis r2,r12,XXXX
+	 * addi  r2,r2,XXXX
+	 *
+	 * A linker optimisation may convert the addis to lis:
+	 *
+	 * lis   r2,XXXX
+	 * addi  r2,r2,XXXX
+	 */
+	if ((((*insn & OP_RT_RA_MASK) == ADDIS_R2_R12) ||
+	     ((*insn & OP_RT_RA_MASK) == LIS_R2)) &&
+	    ((*(insn + 1) & OP_RT_RA_MASK) == ADDI_R2_R2))
+		return (ssize_t)(insn + 2) - base;
+	else
+		return (uintptr_t)addr - base;
+}
+
+#else
+
+ssize_t get_uprobe_offset(const void *addr, ssize_t base)
+{
+	return (uintptr_t)addr - base;
+}
+
+#endif
+
+ssize_t get_base_addr(void)
+{
+	size_t start, offset;
+	char buf[256];
+	FILE *f;
+
+	f = fopen("/proc/self/maps", "r");
+	if (!f)
+		return -errno;
+
+	while (fscanf(f, "%zx-%*x %s %zx %*[^\n]\n",
+		      &start, buf, &offset) == 3) {
+		if (strcmp(buf, "r-xp") == 0) {
+			fclose(f);
+			return start - offset;
+		}
+	}
+
+	fclose(f);
+	return -EINVAL;
+}
diff --git a/tools/testing/selftests/bpf/trace_helpers.h b/tools/testing/selftests/bpf/trace_helpers.h
index f62fdef9e589..3d9435b3dd3b 100644
--- a/tools/testing/selftests/bpf/trace_helpers.h
+++ b/tools/testing/selftests/bpf/trace_helpers.h
@@ -18,4 +18,7 @@ int kallsyms_find(const char *sym, unsigned long long *addr);
 
 void read_trace_pipe(void);
 
+ssize_t get_uprobe_offset(const void *addr, ssize_t base);
+ssize_t get_base_addr(void);
+
 #endif
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 bpf-next 14/14] selftests/bpf: add user_ctx selftests for high-level APIs
  2021-07-26 16:11 [PATCH v2 bpf-next 00/14] BPF perf link and user-provided context value Andrii Nakryiko
                   ` (12 preceding siblings ...)
  2021-07-26 16:12 ` [PATCH v2 bpf-next 13/14] selftests/bpf: extract uprobe-related helpers into trace_helpers.{c,h} Andrii Nakryiko
@ 2021-07-26 16:12 ` Andrii Nakryiko
  13 siblings, 0 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-26 16:12 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Add a selftest with a few subtests testing proper user_ctx usage.

The kprobe and uprobe subtests are pretty straightforward and just validate
that the same BPF program, attached with different user_ctx values, is
triggered with those respective values.

The tracepoint subtest is a bit more interesting, as it is the only
perf_event-based BPF hook that internally shares a bpf_prog_array between
multiple perf_events. This means that the same BPF program can't be attached
to the same tracepoint multiple times, so we have three identical copies of
it. This arrangement allows testing bpf_prog_array_copy()'s handling of
bpf_prog_array list manipulation when programs are attached and detached. The
test validates that user_ctx isn't mixed up or lost during such list
manipulations.

The perf_event subtest validates that two BPF links can be created against the
same perf_event (though not at the same time, as only one BPF program can be
attached to the perf_event itself), and that a different user_ctx value can be
specified for each.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/prog_tests/user_ctx.c       | 254 ++++++++++++++++++
 .../selftests/bpf/progs/test_user_ctx.c       |  85 ++++++
 2 files changed, 339 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/user_ctx.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_user_ctx.c

diff --git a/tools/testing/selftests/bpf/prog_tests/user_ctx.c b/tools/testing/selftests/bpf/prog_tests/user_ctx.c
new file mode 100644
index 000000000000..86374c8666dd
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/user_ctx.c
@@ -0,0 +1,254 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#define _GNU_SOURCE
+#include <pthread.h>
+#include <sched.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+#include <test_progs.h>
+#include "test_user_ctx.skel.h"
+
+static void kprobe_subtest(struct test_user_ctx *skel)
+{
+	DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, opts);
+	struct bpf_link *link1 = NULL, *link2 = NULL;
+	struct bpf_link *retlink1 = NULL, *retlink2 = NULL;
+
+	/* attach two kprobes */
+	opts.user_ctx = 0x1;
+	opts.retprobe = false;
+	link1 = bpf_program__attach_kprobe_opts(skel->progs.handle_kprobe,
+						 SYS_NANOSLEEP_KPROBE_NAME, &opts);
+	if (!ASSERT_OK_PTR(link1, "link1"))
+		goto cleanup;
+
+	opts.user_ctx = 0x2;
+	opts.retprobe = false;
+	link2 = bpf_program__attach_kprobe_opts(skel->progs.handle_kprobe,
+						 SYS_NANOSLEEP_KPROBE_NAME, &opts);
+	if (!ASSERT_OK_PTR(link2, "link2"))
+		goto cleanup;
+
+	/* attach two kretprobes */
+	opts.user_ctx = 0x10;
+	opts.retprobe = true;
+	retlink1 = bpf_program__attach_kprobe_opts(skel->progs.handle_kretprobe,
+						    SYS_NANOSLEEP_KPROBE_NAME, &opts);
+	if (!ASSERT_OK_PTR(retlink1, "retlink1"))
+		goto cleanup;
+
+	opts.user_ctx = 0x20;
+	opts.retprobe = true;
+	retlink2 = bpf_program__attach_kprobe_opts(skel->progs.handle_kretprobe,
+						    SYS_NANOSLEEP_KPROBE_NAME, &opts);
+	if (!ASSERT_OK_PTR(retlink2, "retlink2"))
+		goto cleanup;
+
+	/* trigger kprobe && kretprobe */
+	usleep(1);
+
+	ASSERT_EQ(skel->bss->kprobe_res, 0x1 | 0x2, "kprobe_res");
+	ASSERT_EQ(skel->bss->kretprobe_res, 0x10 | 0x20, "kretprobe_res");
+
+cleanup:
+	bpf_link__destroy(link1);
+	bpf_link__destroy(link2);
+	bpf_link__destroy(retlink1);
+	bpf_link__destroy(retlink2);
+}
+
+static void uprobe_subtest(struct test_user_ctx *skel)
+{
+	DECLARE_LIBBPF_OPTS(bpf_uprobe_opts, opts);
+	struct bpf_link *link1 = NULL, *link2 = NULL;
+	struct bpf_link *retlink1 = NULL, *retlink2 = NULL;
+	size_t uprobe_offset;
+	ssize_t base_addr;
+
+	base_addr = get_base_addr();
+	uprobe_offset = get_uprobe_offset(&get_base_addr, base_addr);
+
+	/* attach two uprobes */
+	opts.user_ctx = 0x100;
+	opts.retprobe = false;
+	link1 = bpf_program__attach_uprobe_opts(skel->progs.handle_uprobe, 0 /* self pid */,
+						"/proc/self/exe", uprobe_offset, &opts);
+	if (!ASSERT_OK_PTR(link1, "link1"))
+		goto cleanup;
+
+	opts.user_ctx = 0x200;
+	opts.retprobe = false;
+	link2 = bpf_program__attach_uprobe_opts(skel->progs.handle_uprobe, -1 /* any pid */,
+						"/proc/self/exe", uprobe_offset, &opts);
+	if (!ASSERT_OK_PTR(link2, "link2"))
+		goto cleanup;
+
+	/* attach two uretprobes */
+	opts.user_ctx = 0x1000;
+	opts.retprobe = true;
+	retlink1 = bpf_program__attach_uprobe_opts(skel->progs.handle_uretprobe, -1 /* any pid */,
+						   "/proc/self/exe", uprobe_offset, &opts);
+	if (!ASSERT_OK_PTR(retlink1, "retlink1"))
+		goto cleanup;
+
+	opts.user_ctx = 0x2000;
+	opts.retprobe = true;
+	retlink2 = bpf_program__attach_uprobe_opts(skel->progs.handle_uretprobe, 0 /* self pid */,
+						   "/proc/self/exe", uprobe_offset, &opts);
+	if (!ASSERT_OK_PTR(retlink2, "retlink2"))
+		goto cleanup;
+
+	/* trigger uprobe && uretprobe */
+	get_base_addr();
+
+	ASSERT_EQ(skel->bss->uprobe_res, 0x100 | 0x200, "uprobe_res");
+	ASSERT_EQ(skel->bss->uretprobe_res, 0x1000 | 0x2000, "uretprobe_res");
+
+cleanup:
+	bpf_link__destroy(link1);
+	bpf_link__destroy(link2);
+	bpf_link__destroy(retlink1);
+	bpf_link__destroy(retlink2);
+}
+
+static void tp_subtest(struct test_user_ctx *skel)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tracepoint_opts, opts);
+	struct bpf_link *link1 = NULL, *link2 = NULL, *link3 = NULL;
+
+	/* attach first tp prog */
+	opts.user_ctx = 0x10000;
+	link1 = bpf_program__attach_tracepoint_opts(skel->progs.handle_tp1,
+						    "syscalls", "sys_enter_nanosleep", &opts);
+	if (!ASSERT_OK_PTR(link1, "link1"))
+		goto cleanup;
+
+	/* attach second tp prog */
+	opts.user_ctx = 0x20000;
+	link2 = bpf_program__attach_tracepoint_opts(skel->progs.handle_tp2,
+						    "syscalls", "sys_enter_nanosleep", &opts);
+	if (!ASSERT_OK_PTR(link2, "link2"))
+		goto cleanup;
+
+	/* trigger tracepoints */
+	usleep(1);
+
+	ASSERT_EQ(skel->bss->tp_res, 0x10000 | 0x20000, "tp_res1");
+
+	/* now we detach first prog and will attach third one, which causes
+	 * two internal calls to bpf_prog_array_copy(), shuffling
+	 * bpf_prog_array_items around. We test here that we don't lose track
+	 * of associated user_ctxs.
+	 */
+	bpf_link__destroy(link1);
+	link1 = NULL;
+	kern_sync_rcu();
+	skel->bss->tp_res = 0;
+
+	/* attach third tp prog */
+	opts.user_ctx = 0x40000;
+	link3 = bpf_program__attach_tracepoint_opts(skel->progs.handle_tp3,
+						    "syscalls", "sys_enter_nanosleep", &opts);
+	if (!ASSERT_OK_PTR(link3, "link3"))
+		goto cleanup;
+
+	/* trigger tracepoints */
+	usleep(1);
+
+	ASSERT_EQ(skel->bss->tp_res, 0x20000 | 0x40000, "tp_res2");
+
+cleanup:
+	bpf_link__destroy(link1);
+	bpf_link__destroy(link2);
+	bpf_link__destroy(link3);
+}
+
+static void burn_cpu(void)
+{
+	volatile int j = 0;
+	cpu_set_t cpu_set;
+	int i, err;
+
+	/* generate some branches on cpu 0 */
+	CPU_ZERO(&cpu_set);
+	CPU_SET(0, &cpu_set);
+	err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set);
+	ASSERT_OK(err, "set_thread_affinity");
+
+	/* spin the loop for a while (random high number) */
+	for (i = 0; i < 1000000; ++i)
+		++j;
+}
+
+static void pe_subtest(struct test_user_ctx *skel)
+{
+	DECLARE_LIBBPF_OPTS(bpf_perf_event_opts, opts);
+	struct bpf_link *link = NULL;
+	struct perf_event_attr attr;
+	int pfd = -1;
+
+	/* create perf event */
+	memset(&attr, 0, sizeof(attr));
+	attr.size = sizeof(attr);
+	attr.type = PERF_TYPE_SOFTWARE;
+	attr.config = PERF_COUNT_SW_CPU_CLOCK;
+	attr.freq = 1;
+	attr.sample_freq = 4000;
+	pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC);
+	if (!ASSERT_GE(pfd, 0, "perf_fd"))
+		goto cleanup;
+
+	opts.user_ctx = 0x100000;
+	link = bpf_program__attach_perf_event_opts(skel->progs.handle_pe, pfd, &opts);
+	if (!ASSERT_OK_PTR(link, "link1"))
+		goto cleanup;
+
+	burn_cpu(); /* trigger BPF prog */
+
+	ASSERT_EQ(skel->bss->pe_res, 0x100000, "pe_res1");
+
+	/* prevent bpf_link__destroy() closing pfd itself */
+	bpf_link__disconnect(link);
+	/* close BPF link's FD explicitly */
+	close(bpf_link__fd(link));
+	/* free up memory used by struct bpf_link */
+	bpf_link__destroy(link);
+	link = NULL;
+	kern_sync_rcu();
+	skel->bss->pe_res = 0;
+
+	opts.user_ctx = 0x200000;
+	link = bpf_program__attach_perf_event_opts(skel->progs.handle_pe, pfd, &opts);
+	if (!ASSERT_OK_PTR(link, "link2"))
+		goto cleanup;
+
+	burn_cpu(); /* trigger BPF prog */
+
+	ASSERT_EQ(skel->bss->pe_res, 0x200000, "pe_res2");
+
+cleanup:
+	close(pfd);
+	bpf_link__destroy(link);
+}
+
+void test_user_ctx(void)
+{
+	struct test_user_ctx *skel;
+
+	skel = test_user_ctx__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	skel->bss->my_tid = syscall(SYS_gettid);
+
+	if (test__start_subtest("kprobe"))
+		kprobe_subtest(skel);
+	if (test__start_subtest("uprobe"))
+		uprobe_subtest(skel);
+	if (test__start_subtest("tracepoint"))
+		tp_subtest(skel);
+	if (test__start_subtest("perf_event"))
+		pe_subtest(skel);
+
+	test_user_ctx__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_user_ctx.c b/tools/testing/selftests/bpf/progs/test_user_ctx.c
new file mode 100644
index 000000000000..e641bb5e1774
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_user_ctx.c
@@ -0,0 +1,85 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+int my_tid;
+
+int kprobe_res;
+int kprobe_multi_res;
+int kretprobe_res;
+int uprobe_res;
+int uretprobe_res;
+int tp_res;
+int pe_res;
+
+static void update(void *ctx, int *res)
+{
+	if (my_tid != (u32)bpf_get_current_pid_tgid())
+		return;
+
+	*res |= bpf_get_user_ctx(ctx);
+}
+
+SEC("kprobe/sys_nanosleep")
+int handle_kprobe(struct pt_regs *ctx)
+{
+	update(ctx, &kprobe_res);
+	return 0;
+}
+
+SEC("kretprobe/sys_nanosleep")
+int handle_kretprobe(struct pt_regs *ctx)
+{
+	update(ctx, &kretprobe_res);
+	return 0;
+}
+
+SEC("uprobe/trigger_func")
+int handle_uprobe(struct pt_regs *ctx)
+{
+	update(ctx, &uprobe_res);
+	return 0;
+}
+
+SEC("uretprobe/trigger_func")
+int handle_uretprobe(struct pt_regs *ctx)
+{
+	update(ctx, &uretprobe_res);
+	return 0;
+}
+
+/* bpf_prog_array, used by the kernel internally to keep track of BPF
+ * programs attached to a given BPF hook (e.g., a tracepoint), doesn't allow
+ * the same BPF program to be attached multiple times. So have three identical
+ * copies ready to attach to the same tracepoint.
+ */
+SEC("tp/syscalls/sys_enter_nanosleep")
+int handle_tp1(struct pt_regs *ctx)
+{
+	update(ctx, &tp_res);
+	return 0;
+}
+SEC("tp/syscalls/sys_enter_nanosleep")
+int handle_tp2(struct pt_regs *ctx)
+{
+	update(ctx, &tp_res);
+	return 0;
+}
+SEC("tp/syscalls/sys_enter_nanosleep")
+int handle_tp3(void *ctx)
+{
+	update(ctx, &tp_res);
+	return 1;
+}
+
+SEC("perf_event")
+int handle_pe(struct pt_regs *ctx)
+{
+	update(ctx, &pe_res);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 03/14] bpf: refactor perf_event_set_bpf_prog() to use struct bpf_prog input
  2021-07-26 16:12 ` [PATCH v2 bpf-next 03/14] bpf: refactor perf_event_set_bpf_prog() to use struct bpf_prog input Andrii Nakryiko
@ 2021-07-27  8:48   ` Peter Zijlstra
  2021-07-29 17:09   ` Yonghong Song
  1 sibling, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2021-07-27  8:48 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, kernel-team

On Mon, Jul 26, 2021 at 09:12:00AM -0700, Andrii Nakryiko wrote:
> Make internal perf_event_set_bpf_prog() use struct bpf_prog pointer as an
> input argument, which makes it easier to re-use for other internal uses
> (coming up for BPF link in the next patch). BPF program FD is not as
> convenient and in some cases it's not available. So switch to struct bpf_prog,
> move out refcounting outside and let caller do bpf_prog_put() in case of an
> error. This follows the approach of most of the other BPF internal functions.
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 04/14] bpf: implement minimal BPF perf link
  2021-07-26 16:12 ` [PATCH v2 bpf-next 04/14] bpf: implement minimal BPF perf link Andrii Nakryiko
@ 2021-07-27  9:04   ` Peter Zijlstra
  2021-07-30  4:23     ` Andrii Nakryiko
  2021-07-27  9:12   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2021-07-27  9:04 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, kernel-team

On Mon, Jul 26, 2021 at 09:12:01AM -0700, Andrii Nakryiko wrote:
> Introduce a new type of BPF link - BPF perf link. This brings perf_event-based
> BPF program attachments (perf_event, tracepoints, kprobes, and uprobes) into
> the common BPF link infrastructure, allowing to list all active perf_event
> based attachments, auto-detaching BPF program from perf_event when link's FD
> is closed, get generic BPF link fdinfo/get_info functionality.
> 
> BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
> are currently supported.
> 
> Force-detaching and atomic BPF program updates are not yet implemented, but
> with perf_event-based BPF links we now have common framework for this without
> the need to extend ioctl()-based perf_event interface.
> 
> One interesting consideration is a new value for bpf_attach_type, which
> BPF_LINK_CREATE command expects. Generally, it's either 1-to-1 mapping from
> bpf_attach_type to bpf_prog_type, or many-to-1 mapping from a subset of
> bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
> BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
> program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
> mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
> define a single BPF_PERF_EVENT attach type for all of them and adjust
> link_create()'s logic for checking correspondence between attach type and
> program type.
> 
> The alternative would be to define three new attach types (e.g., BPF_KPROBE,
> BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like unnecessary overkill
> and BPF_KPROBE will cause naming conflicts with BPF_KPROBE() macro, defined by
> libbpf. I chose to not do this to avoid unnecessary proliferation of
> bpf_attach_type enum values and not have to deal with naming conflicts.
> 

So I have no idea what all that means... I don't speak BPF. That said,
the patch doesn't look terrible.

One little question below, but otherwise:

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

> +static void bpf_perf_link_release(struct bpf_link *link)
> +{
> +	struct bpf_perf_link *perf_link = container_of(link, struct bpf_perf_link, link);
> +	struct perf_event *event = perf_link->perf_file->private_data;
> +
> +	perf_event_free_bpf_prog(event);
> +	fput(perf_link->perf_file);
> +}

> +static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> +{
> +	struct bpf_link_primer link_primer;
> +	struct bpf_perf_link *link;
> +	struct perf_event *event;
> +	struct file *perf_file;
> +	int err;
> +
> +	if (attr->link_create.flags)
> +		return -EINVAL;
> +
> +	perf_file = perf_event_get(attr->link_create.target_fd);
> +	if (IS_ERR(perf_file))
> +		return PTR_ERR(perf_file);
> +
> +	link = kzalloc(sizeof(*link), GFP_USER);
> +	if (!link) {
> +		err = -ENOMEM;
> +		goto out_put_file;
> +	}
> +	bpf_link_init(&link->link, BPF_LINK_TYPE_PERF_EVENT, &bpf_perf_link_lops, prog);
> +	link->perf_file = perf_file;
> +
> +	err = bpf_link_prime(&link->link, &link_primer);
> +	if (err) {
> +		kfree(link);
> +		goto out_put_file;
> +	}
> +
> +	event = perf_file->private_data;
> +	err = perf_event_set_bpf_prog(event, prog);
> +	if (err) {
> +		bpf_link_cleanup(&link_primer);
> +		goto out_put_file;
> +	}
> +	/* perf_event_set_bpf_prog() doesn't take its own refcnt on prog */

Is that otherwise expected? AFAICT the previous users of that function
were guaranteed the existence of the BPF program. But AFAICT there is
nothing that prevents perf_event_*_bpf_prog() from doing the additional
refcounting if that is more convenient.

> +	bpf_prog_inc(prog);
> +
> +	return bpf_link_settle(&link_primer);
> +
> +out_put_file:
> +	fput(perf_file);
> +	return err;
> +}

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links
  2021-07-26 16:12 ` [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links Andrii Nakryiko
@ 2021-07-27  9:11   ` Peter Zijlstra
  2021-07-27 21:09     ` Andrii Nakryiko
  2021-07-29 18:00   ` Yonghong Song
  1 sibling, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2021-07-27  9:11 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, kernel-team

On Mon, Jul 26, 2021 at 09:12:02AM -0700, Andrii Nakryiko wrote:
> Add ability for users to specify custom u64 value when creating BPF link for
> perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints).

If I read this right, the value is dependent on the link, not the
program. In which case:

> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 2d510ad750ed..97ab46802800 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -762,6 +762,7 @@ struct perf_event {
>  #ifdef CONFIG_BPF_SYSCALL
>  	perf_overflow_handler_t		orig_overflow_handler;
>  	struct bpf_prog			*prog;
> +	u64				user_ctx;
>  #endif
>  
>  #ifdef CONFIG_EVENT_TRACING
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 8ac92560d3a3..4543852f1480 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -675,7 +675,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
>  
>  #ifdef CONFIG_BPF_EVENTS
>  unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
> -int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
> +int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 user_ctx);

This API would be misleading, because it is about setting the program.

>  void perf_event_detach_bpf_prog(struct perf_event *event);
>  int perf_event_query_prog_array(struct perf_event *event, void __user *info);
>  int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);

> @@ -9966,6 +9968,7 @@ static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog
>  	}
>  
>  	event->prog = prog;
> +	event->user_ctx = user_ctx;
>  	event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
>  	WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
>  	return 0;

Also, the name @user_ctx is a bit confusing. Would something like
@bpf_cookie or somesuch not be a better name?

Combined would it not make more sense to add something like:

extern int perf_event_set_bpf_cookie(struct perf_event *event, u64 cookie);



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 04/14] bpf: implement minimal BPF perf link
  2021-07-26 16:12 ` [PATCH v2 bpf-next 04/14] bpf: implement minimal BPF perf link Andrii Nakryiko
  2021-07-27  9:04   ` Peter Zijlstra
@ 2021-07-27  9:12   ` Peter Zijlstra
  2021-07-27 20:56     ` Andrii Nakryiko
  2021-07-27 15:40   ` Jiri Olsa
  2021-07-29 17:35   ` Yonghong Song
  3 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2021-07-27  9:12 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, kernel-team

On Mon, Jul 26, 2021 at 09:12:01AM -0700, Andrii Nakryiko wrote:
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index ad413b382a3c..8ac92560d3a3 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -803,6 +803,9 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
>  void perf_trace_buf_update(void *record, u16 type);
>  void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
>  
> +int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
> +void perf_event_free_bpf_prog(struct perf_event *event);
> +
>  void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
>  void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
>  void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,

Oh, I just noticed, is this the right header to put these in? Should
this not go into include/linux/perf_event.h ?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 04/14] bpf: implement minimal BPF perf link
  2021-07-26 16:12 ` [PATCH v2 bpf-next 04/14] bpf: implement minimal BPF perf link Andrii Nakryiko
  2021-07-27  9:04   ` Peter Zijlstra
  2021-07-27  9:12   ` Peter Zijlstra
@ 2021-07-27 15:40   ` Jiri Olsa
  2021-07-27 20:56     ` Andrii Nakryiko
  2021-07-29 17:35   ` Yonghong Song
  3 siblings, 1 reply; 43+ messages in thread
From: Jiri Olsa @ 2021-07-27 15:40 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, kernel-team, Peter Zijlstra

On Mon, Jul 26, 2021 at 09:12:01AM -0700, Andrii Nakryiko wrote:
> Introduce a new type of BPF link - BPF perf link. This brings perf_event-based
> BPF program attachments (perf_event, tracepoints, kprobes, and uprobes) into
> the common BPF link infrastructure, allowing to list all active perf_event
> based attachments, auto-detaching BPF program from perf_event when link's FD
> is closed, get generic BPF link fdinfo/get_info functionality.
> 
> BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
> are currently supported.
> 
> Force-detaching and atomic BPF program updates are not yet implemented, but
> with perf_event-based BPF links we now have common framework for this without
> the need to extend ioctl()-based perf_event interface.
> 
> One interesting consideration is a new value for bpf_attach_type, which
> BPF_LINK_CREATE command expects. Generally, it's either 1-to-1 mapping from
> bpf_attach_type to bpf_prog_type, or many-to-1 mapping from a subset of
> bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
> BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
> program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
> mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
> define a single BPF_PERF_EVENT attach type for all of them and adjust
> link_create()'s logic for checking correspondence between attach type and
> program type.
> 
> The alternative would be to define three new attach types (e.g., BPF_KPROBE,
> BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like unnecessary overkill
> and BPF_KPROBE will cause naming conflicts with BPF_KPROBE() macro, defined by
> libbpf. I chose to not do this to avoid unnecessary proliferation of
> bpf_attach_type enum values and not have to deal with naming conflicts.
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  include/linux/bpf_types.h      |   3 +
>  include/linux/trace_events.h   |   3 +
>  include/uapi/linux/bpf.h       |   2 +
>  kernel/bpf/syscall.c           | 105 ++++++++++++++++++++++++++++++---
>  kernel/events/core.c           |  10 ++--
>  tools/include/uapi/linux/bpf.h |   2 +
>  6 files changed, 112 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> index a9db1eae6796..0a1ada7f174d 100644
> --- a/include/linux/bpf_types.h
> +++ b/include/linux/bpf_types.h
> @@ -135,3 +135,6 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
>  #ifdef CONFIG_NET
>  BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
>  #endif
> +#ifdef CONFIG_PERF_EVENTS
> +BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
> +#endif
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index ad413b382a3c..8ac92560d3a3 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -803,6 +803,9 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
>  void perf_trace_buf_update(void *record, u16 type);
>  void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
>  
> +int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
> +void perf_event_free_bpf_prog(struct perf_event *event);
> +
>  void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
>  void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
>  void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 2db6925e04f4..00b1267ab4f0 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -993,6 +993,7 @@ enum bpf_attach_type {
>  	BPF_SK_SKB_VERDICT,
>  	BPF_SK_REUSEPORT_SELECT,
>  	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
> +	BPF_PERF_EVENT,
>  	__MAX_BPF_ATTACH_TYPE
>  };
>  
> @@ -1006,6 +1007,7 @@ enum bpf_link_type {
>  	BPF_LINK_TYPE_ITER = 4,
>  	BPF_LINK_TYPE_NETNS = 5,
>  	BPF_LINK_TYPE_XDP = 6,
> +	BPF_LINK_TYPE_PERF_EVENT = 6,

hi, should be 7

jirka


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 04/14] bpf: implement minimal BPF perf link
  2021-07-27  9:12   ` Peter Zijlstra
@ 2021-07-27 20:56     ` Andrii Nakryiko
  0 siblings, 0 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-27 20:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team

On Tue, Jul 27, 2021 at 2:15 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Jul 26, 2021 at 09:12:01AM -0700, Andrii Nakryiko wrote:
> > diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> > index ad413b382a3c..8ac92560d3a3 100644
> > --- a/include/linux/trace_events.h
> > +++ b/include/linux/trace_events.h
> > @@ -803,6 +803,9 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
> >  void perf_trace_buf_update(void *record, u16 type);
> >  void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
> >
> > +int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
> > +void perf_event_free_bpf_prog(struct perf_event *event);
> > +
> >  void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
> >  void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
> >  void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
>
> Oh, I just noticed, is this the right header to put these in? Should
> this not go into include/linux/perf_event.h ?

Not that I care much, but this one has perf_event_attach_bpf_prog()
and perf_event_detach_bpf_prog(), so it felt appropriate to put it
here. perf_event.h only seems to have perf_event_bpf_event() for
BPF-related stuff (which seems to be just notification events, not
really BPF functionality per se). But let me know if you prefer to add
these new ones to perf_event.h.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 04/14] bpf: implement minimal BPF perf link
  2021-07-27 15:40   ` Jiri Olsa
@ 2021-07-27 20:56     ` Andrii Nakryiko
  0 siblings, 0 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-27 20:56 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Zijlstra

On Tue, Jul 27, 2021 at 8:40 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Mon, Jul 26, 2021 at 09:12:01AM -0700, Andrii Nakryiko wrote:
> > Introduce a new type of BPF link - BPF perf link. This brings perf_event-based
> > BPF program attachments (perf_event, tracepoints, kprobes, and uprobes) into
> > the common BPF link infrastructure, allowing to list all active perf_event
> > based attachments, auto-detaching BPF program from perf_event when link's FD
> > is closed, get generic BPF link fdinfo/get_info functionality.
> >
> > BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
> > are currently supported.
> >
> > Force-detaching and atomic BPF program updates are not yet implemented, but
> > with perf_event-based BPF links we now have common framework for this without
> > the need to extend ioctl()-based perf_event interface.
> >
> > One interesting consideration is a new value for bpf_attach_type, which
> > BPF_LINK_CREATE command expects. Generally, it's either 1-to-1 mapping from
> > bpf_attach_type to bpf_prog_type, or many-to-1 mapping from a subset of
> > bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
> > BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
> > program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
> > mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
> > define a single BPF_PERF_EVENT attach type for all of them and adjust
> > link_create()'s logic for checking correspondence between attach type and
> > program type.
> >
> > The alternative would be to define three new attach types (e.g., BPF_KPROBE,
> > BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like unnecessary overkill
> > and BPF_KPROBE will cause naming conflicts with BPF_KPROBE() macro, defined by
> > libbpf. I chose to not do this to avoid unnecessary proliferation of
> > bpf_attach_type enum values and not have to deal with naming conflicts.
> >
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  include/linux/bpf_types.h      |   3 +
> >  include/linux/trace_events.h   |   3 +
> >  include/uapi/linux/bpf.h       |   2 +
> >  kernel/bpf/syscall.c           | 105 ++++++++++++++++++++++++++++++---
> >  kernel/events/core.c           |  10 ++--
> >  tools/include/uapi/linux/bpf.h |   2 +
> >  6 files changed, 112 insertions(+), 13 deletions(-)
> >
> > diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> > index a9db1eae6796..0a1ada7f174d 100644
> > --- a/include/linux/bpf_types.h
> > +++ b/include/linux/bpf_types.h
> > @@ -135,3 +135,6 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
> >  #ifdef CONFIG_NET
> >  BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
> >  #endif
> > +#ifdef CONFIG_PERF_EVENTS
> > +BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
> > +#endif
> > diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> > index ad413b382a3c..8ac92560d3a3 100644
> > --- a/include/linux/trace_events.h
> > +++ b/include/linux/trace_events.h
> > @@ -803,6 +803,9 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
> >  void perf_trace_buf_update(void *record, u16 type);
> >  void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
> >
> > +int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
> > +void perf_event_free_bpf_prog(struct perf_event *event);
> > +
> >  void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
> >  void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
> >  void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 2db6925e04f4..00b1267ab4f0 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -993,6 +993,7 @@ enum bpf_attach_type {
> >       BPF_SK_SKB_VERDICT,
> >       BPF_SK_REUSEPORT_SELECT,
> >       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
> > +     BPF_PERF_EVENT,
> >       __MAX_BPF_ATTACH_TYPE
> >  };
> >
> > @@ -1006,6 +1007,7 @@ enum bpf_link_type {
> >       BPF_LINK_TYPE_ITER = 4,
> >       BPF_LINK_TYPE_NETNS = 5,
> >       BPF_LINK_TYPE_XDP = 6,
> > +     BPF_LINK_TYPE_PERF_EVENT = 6,
>
> hi, should be 7
>

doh! Eagle eyes! Will fix, thanks :)

> jirka
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links
  2021-07-27  9:11   ` Peter Zijlstra
@ 2021-07-27 21:09     ` Andrii Nakryiko
  2021-07-28  8:58       ` Peter Zijlstra
  0 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-27 21:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team

On Tue, Jul 27, 2021 at 2:14 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Jul 26, 2021 at 09:12:02AM -0700, Andrii Nakryiko wrote:
> > Add ability for users to specify custom u64 value when creating BPF link for
> > perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints).
>
> If I read this right, the value is dependent on the link, not the
> program. In which case:

You can see it both ways. BPF link in this (and at least a few other
cases) is just this invisible orchestrator of BPF program
attachment/detachment. The underlying perf_event subsystem doesn't
know about the existence of the BPF link at all. In the end, it's
actually struct bpf_prog that is added to perf_event or into tp's
bpf_prog_array list, and this user-provided value (bpf cookie per
below) is associated with that particular attachment. So when we call
trace_call_bpf() from tracepoint or kprobe/uprobe, there is no BPF
link anywhere, it's just a list of bpf_prog_array_items, with bpf_prog
pointer and associated user value. Note, exactly the same bpf_prog can
be attached to another perf_event with a completely different cookie
and that's expected and is fine.

So in short, perf_event just needs to know about attaching/detaching a
bpf_prog pointer (and this cookie); it doesn't need to know about
bpf_link. Everything is handled the same regardless of whether bpf_link is
used to attach or ioctl(PERF_EVENT_IOC_SET_BPF).
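
Roughly, each such attachment record ends up looking like this (a sketch of
what this patch does, not the verbatim diff):

	struct bpf_prog_array_item {
		struct bpf_prog *prog;
		union {
			/* used by cgroup BPF progs */
			struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
			/* per-attachment value for perf_event-based progs */
			u64 user_ctx;
		};
	};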

>
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index 2d510ad750ed..97ab46802800 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -762,6 +762,7 @@ struct perf_event {
> >  #ifdef CONFIG_BPF_SYSCALL
> >       perf_overflow_handler_t         orig_overflow_handler;
> >       struct bpf_prog                 *prog;
> > +     u64                             user_ctx;
> >  #endif
> >
> >  #ifdef CONFIG_EVENT_TRACING
> > diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> > index 8ac92560d3a3..4543852f1480 100644
> > --- a/include/linux/trace_events.h
> > +++ b/include/linux/trace_events.h
> > @@ -675,7 +675,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
> >
> >  #ifdef CONFIG_BPF_EVENTS
> >  unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
> > -int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
> > +int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 user_ctx);
>
> This API would be misleading, because it is about setting the program.

Answered above, here perf_event just provides a low-level internal API
for attaching a bpf_prog with an associated value. BPF link is a
higher-level invisible concept as far as perf_event is concerned.

>
> >  void perf_event_detach_bpf_prog(struct perf_event *event);
> >  int perf_event_query_prog_array(struct perf_event *event, void __user *info);
> >  int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
>
> > @@ -9966,6 +9968,7 @@ static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog
> >       }
> >
> >       event->prog = prog;
> > +     event->user_ctx = user_ctx;
> >       event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
> >       WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
> >       return 0;
>
> Also, the name @user_ctx is a bit confusing. Would something like
> @bpf_cookie or somesuch not be a better name?

I struggled to come up with a good name; user_ctx was the best I could
do. But I do like bpf_cookie for this, thank you! I'll switch the
terminology in the next revision.

>
> Combined would it not make more sense to add something like:
>
> extern int perf_event_set_bpf_cookie(struct perf_event *event, u64 cookie);

Passing that user_ctx along with the bpf_prog makes it clear that they go
together and user_ctx is immutable once set. I don't actually plan to
allow updating this cookie value.
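
I.e., the value keeps traveling together with the program through the
existing call (just restating the prototype from the diff above, not a new
interface):

	int perf_event_attach_bpf_prog(struct perf_event *event,
				       struct bpf_prog *prog, u64 user_ctx);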

>
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links
  2021-07-27 21:09     ` Andrii Nakryiko
@ 2021-07-28  8:58       ` Peter Zijlstra
  0 siblings, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2021-07-28  8:58 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team

On Tue, Jul 27, 2021 at 02:09:08PM -0700, Andrii Nakryiko wrote:
> On Tue, Jul 27, 2021 at 2:14 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Mon, Jul 26, 2021 at 09:12:02AM -0700, Andrii Nakryiko wrote:
> > > Add ability for users to specify custom u64 value when creating BPF link for
> > > perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints).
> >
> > If I read this right, the value is dependent on the link, not the
> > program. In which case:
> 
> You can see it both ways. BPF link in this (and at least few other
> cases) is just this invisible orchestrator of BPF program
> attachment/detachment. The underlying perf_event subsystem doesn't
> know about the existence of the BPF link at all. In the end, it's
> actually struct bpf_prog that is added to perf_event or into tp's
> bpf_prog_array list, and this user-provided value (bpf cookie per
> below) is associated with that particular attachment. So when we call
> trace_call_bpf() from tracepoint or kprobe/uprobe, there is no BPF
> link anywhere, it's just a list of bpf_prog_array_items, with bpf_prog
> pointer and associated user value. Note, exactly the same bpf_prog can
> be attached to another perf_event with a completely different cookie
> and that's expected and is fine.
> 
> So in short, perf_event just needs to know about attaching/detaching
> bpf_prog pointer (and this cookie), it doesn't need to know about
> bpf_link. Everything is handled the same regardless if bpf_link is
> used to attach or ioctl(PERF_EVENT_IOC_SET_BPF).

OK, fair enough I suppose.

> > > @@ -9966,6 +9968,7 @@ static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog
> > >       }
> > >
> > >       event->prog = prog;
> > > +     event->user_ctx = user_ctx;
> > >       event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
> > >       WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
> > >       return 0;
> >
> > Also, the name @user_ctx is a bit confusing. Would something like
> > @bpf_cookie or somesuch not be a better name?
> 
> I struggled to come up with a good name, user_ctx was the best I could
> do. But I do like bpf_cookie for this, thank you! I'll switch the
> terminology in the next revision.

y/w :-)


Thanks!

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function
  2021-07-26 16:11 ` [PATCH v2 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function Andrii Nakryiko
@ 2021-07-29 16:49   ` Yonghong Song
  2021-07-30  4:05     ` Andrii Nakryiko
  0 siblings, 1 reply; 43+ messages in thread
From: Yonghong Song @ 2021-07-29 16:49 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel; +Cc: kernel-team, Peter Zijlstra



On 7/26/21 9:11 AM, Andrii Nakryiko wrote:
> Turn BPF_PROG_RUN into a proper always inlined function. No functional and
> performance changes are intended, but it makes it much easier to understand
> what's going on with how BPF programs actually get executed. It's more
> obvious what types and callbacks are expected. Also extra () around input
> parameters can be dropped, as well as `__` variable prefixes intended to avoid
> naming collisions, which makes the code simpler to read and write.
> 
> This refactoring also highlighted one possible issue. BPF_PROG_RUN is both
> a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning
> BPF_PROG_RUN into a function causes naming conflict compilation error. So
> rename BPF_PROG_RUN into lower-case bpf_prog_run(), similar to
> bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. To avoid unnecessary code
> churn across many networking calls to BPF_PROG_RUN, #define BPF_PROG_RUN as an
> alias to bpf_prog_run.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>   include/linux/filter.h | 58 +++++++++++++++++++++++++++---------------
>   1 file changed, 37 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index ba36989f711a..e59c97c72233 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -585,25 +585,41 @@ struct sk_filter {
>   
>   DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
>   
> -#define __BPF_PROG_RUN(prog, ctx, dfunc)	({			\
> -	u32 __ret;							\
> -	cant_migrate();							\
> -	if (static_branch_unlikely(&bpf_stats_enabled_key)) {		\
> -		struct bpf_prog_stats *__stats;				\
> -		u64 __start = sched_clock();				\
> -		__ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func);	\
> -		__stats = this_cpu_ptr(prog->stats);			\
> -		u64_stats_update_begin(&__stats->syncp);		\
> -		__stats->cnt++;						\
> -		__stats->nsecs += sched_clock() - __start;		\
> -		u64_stats_update_end(&__stats->syncp);			\
> -	} else {							\
> -		__ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func);	\
> -	}								\
> -	__ret; })
> -
> -#define BPF_PROG_RUN(prog, ctx)						\
> -	__BPF_PROG_RUN(prog, ctx, bpf_dispatcher_nop_func)
> +typedef unsigned int (*bpf_dispatcher_fn)(const void *ctx,
> +					  const struct bpf_insn *insnsi,
> +					  unsigned int (*bpf_func)(const void *,
> +								   const struct bpf_insn *));
> +
> +static __always_inline u32 __bpf_prog_run(const struct bpf_prog *prog,
> +					  const void *ctx,
> +					  bpf_dispatcher_fn dfunc)
> +{
> +	u32 ret;
> +
> +	cant_migrate();
> +	if (static_branch_unlikely(&bpf_stats_enabled_key)) {
> +		struct bpf_prog_stats *stats;
> +		u64 start = sched_clock();
> +
> +		ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
> +		stats = this_cpu_ptr(prog->stats);
> +		u64_stats_update_begin(&stats->syncp);
> +		stats->cnt++;
> +		stats->nsecs += sched_clock() - start;
> +		u64_stats_update_end(&stats->syncp);
> +	} else {
> +		ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
> +	}
> +	return ret;
> +}
> +
> +static __always_inline u32 bpf_prog_run(const struct bpf_prog *prog, const void *ctx)
> +{
> +	return __bpf_prog_run(prog, ctx, bpf_dispatcher_nop_func);
> +}
> +
> > +/* avoids name conflict with BPF_PROG_RUN enum defined in uapi/linux/bpf.h */
> +#define BPF_PROG_RUN bpf_prog_run
>   
>   /*
>    * Use in preemptible and therefore migratable context to make sure that
> @@ -622,7 +638,7 @@ static inline u32 bpf_prog_run_pin_on_cpu(const struct bpf_prog *prog,
>   	u32 ret;
>   
>   	migrate_disable();
> -	ret = __BPF_PROG_RUN(prog, ctx, bpf_dispatcher_nop_func);
> +	ret = __bpf_prog_run(prog, ctx, bpf_dispatcher_nop_func);

This can be replaced with bpf_prog_run(prog, ctx).

>   	migrate_enable();
>   	return ret;
>   }
> @@ -768,7 +784,7 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
>   	 * under local_bh_disable(), which provides the needed RCU protection
>   	 * for accessing map entries.
>   	 */
> -	return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
> +	return __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
>   }
>   
>   void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 02/14] bpf: refactor BPF_PROG_RUN_ARRAY family of macros into functions
  2021-07-26 16:11 ` [PATCH v2 bpf-next 02/14] bpf: refactor BPF_PROG_RUN_ARRAY family of macros into functions Andrii Nakryiko
@ 2021-07-29 17:04   ` Yonghong Song
  0 siblings, 0 replies; 43+ messages in thread
From: Yonghong Song @ 2021-07-29 17:04 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel; +Cc: kernel-team, Peter Zijlstra



On 7/26/21 9:11 AM, Andrii Nakryiko wrote:
> Similar to BPF_PROG_RUN, turn BPF_PROG_RUN_ARRAY macros into proper functions
> with all the same readability and maintainability benefits. Making them into
> functions required shuffling around bpf_set_run_ctx/bpf_reset_run_ctx
> functions. Also, explicitly specifying the type of the BPF prog run callback
> required adjusting __bpf_prog_run_save_cb() to accept const void *, casted
> internally to const struct sk_buff.
> 
> Further, split out a cgroup-specific BPF_PROG_RUN_ARRAY_CG and
> BPF_PROG_RUN_ARRAY_CG_FLAGS from the more generic BPF_PROG_RUN_ARRAY due to
> the differences in bpf_run_ctx used for those two different use cases.
> 
> I think BPF_PROG_RUN_ARRAY_CG would benefit from further refactoring to accept
> struct cgroup and enum bpf_attach_type instead of bpf_prog_array, fetching
> cgrp->bpf.effective[type] and RCU-dereferencing it internally. But that
> required including include/linux/cgroup-defs.h, which I wasn't sure is ok with
> everyone.
> 
> The remaining generic BPF_PROG_RUN_ARRAY function will be extended to
> pass-through user-provided context value in the next patch.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Yonghong Song <yhs@fb.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 03/14] bpf: refactor perf_event_set_bpf_prog() to use struct bpf_prog input
  2021-07-26 16:12 ` [PATCH v2 bpf-next 03/14] bpf: refactor perf_event_set_bpf_prog() to use struct bpf_prog input Andrii Nakryiko
  2021-07-27  8:48   ` Peter Zijlstra
@ 2021-07-29 17:09   ` Yonghong Song
  1 sibling, 0 replies; 43+ messages in thread
From: Yonghong Song @ 2021-07-29 17:09 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel; +Cc: kernel-team, Peter Zijlstra



On 7/26/21 9:12 AM, Andrii Nakryiko wrote:
> Make internal perf_event_set_bpf_prog() use struct bpf_prog pointer as an
> input argument, which makes it easier to re-use for other internal uses
> (coming up for BPF link in the next patch). BPF program FD is not as
> convenient and in some cases it's not available. So switch to struct bpf_prog,
> move out refcounting outside and let caller do bpf_prog_put() in case of an
> error. This follows the approach of most of the other BPF internal functions.
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Yonghong Song <yhs@fb.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 04/14] bpf: implement minimal BPF perf link
  2021-07-26 16:12 ` [PATCH v2 bpf-next 04/14] bpf: implement minimal BPF perf link Andrii Nakryiko
                     ` (2 preceding siblings ...)
  2021-07-27 15:40   ` Jiri Olsa
@ 2021-07-29 17:35   ` Yonghong Song
  2021-07-30  4:16     ` Andrii Nakryiko
  3 siblings, 1 reply; 43+ messages in thread
From: Yonghong Song @ 2021-07-29 17:35 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel; +Cc: kernel-team, Peter Zijlstra



On 7/26/21 9:12 AM, Andrii Nakryiko wrote:
> Introduce a new type of BPF link - BPF perf link. This brings perf_event-based
> BPF program attachments (perf_event, tracepoints, kprobes, and uprobes) into
> the common BPF link infrastructure, allowing to list all active perf_event
> based attachments, auto-detaching BPF program from perf_event when link's FD
> is closed, get generic BPF link fdinfo/get_info functionality.
> 
> BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
> are currently supported.
> 
> Force-detaching and atomic BPF program updates are not yet implemented, but
> with perf_event-based BPF links we now have common framework for this without
> the need to extend ioctl()-based perf_event interface.
> 
> One interesting consideration is a new value for bpf_attach_type, which
> BPF_LINK_CREATE command expects. Generally, it's either 1-to-1 mapping from
> bpf_attach_type to bpf_prog_type, or many-to-1 mapping from a subset of
> bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
> BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
> program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
> mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
> define a single BPF_PERF_EVENT attach type for all of them and adjust
> link_create()'s logic for checking correspondence between attach type and
> program type.
> 
> The alternative would be to define three new attach types (e.g., BPF_KPROBE,
> BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like unnecessary overkill
> and BPF_KPROBE will cause naming conflicts with BPF_KPROBE() macro, defined by
> libbpf. I chose to not do this to avoid unnecessary proliferation of
> bpf_attach_type enum values and not have to deal with naming conflicts.
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>   include/linux/bpf_types.h      |   3 +
>   include/linux/trace_events.h   |   3 +
>   include/uapi/linux/bpf.h       |   2 +
>   kernel/bpf/syscall.c           | 105 ++++++++++++++++++++++++++++++---
>   kernel/events/core.c           |  10 ++--
>   tools/include/uapi/linux/bpf.h |   2 +
>   6 files changed, 112 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> index a9db1eae6796..0a1ada7f174d 100644
> --- a/include/linux/bpf_types.h
> +++ b/include/linux/bpf_types.h
> @@ -135,3 +135,6 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
>   #ifdef CONFIG_NET
>   BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
>   #endif
> +#ifdef CONFIG_PERF_EVENTS
> +BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
> +#endif
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index ad413b382a3c..8ac92560d3a3 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -803,6 +803,9 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
>   void perf_trace_buf_update(void *record, u16 type);
>   void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
>   
> +int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
> +void perf_event_free_bpf_prog(struct perf_event *event);
> +
>   void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
>   void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
>   void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 2db6925e04f4..00b1267ab4f0 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -993,6 +993,7 @@ enum bpf_attach_type {
>   	BPF_SK_SKB_VERDICT,
>   	BPF_SK_REUSEPORT_SELECT,
>   	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
> +	BPF_PERF_EVENT,
>   	__MAX_BPF_ATTACH_TYPE
>   };
>   
> @@ -1006,6 +1007,7 @@ enum bpf_link_type {
>   	BPF_LINK_TYPE_ITER = 4,
>   	BPF_LINK_TYPE_NETNS = 5,
>   	BPF_LINK_TYPE_XDP = 6,
> +	BPF_LINK_TYPE_PERF_EVENT = 6,

As Jiri has pointed out, BPF_LINK_TYPE_PERF_EVENT = 7.

>   
>   	MAX_BPF_LINK_TYPE,
>   };
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 9a2068e39d23..80c03bedd6e6 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2906,6 +2906,79 @@ static const struct bpf_link_ops bpf_raw_tp_link_lops = {
>   	.fill_link_info = bpf_raw_tp_link_fill_link_info,
>   };
>   
> +#ifdef CONFIG_PERF_EVENTS
> +struct bpf_perf_link {
> +	struct bpf_link link;
> +	struct file *perf_file;
> +};
> +
> +static void bpf_perf_link_release(struct bpf_link *link)
> +{
> +	struct bpf_perf_link *perf_link = container_of(link, struct bpf_perf_link, link);
> +	struct perf_event *event = perf_link->perf_file->private_data;
> +
> +	perf_event_free_bpf_prog(event);
> +	fput(perf_link->perf_file);
> +}
> +
> +static void bpf_perf_link_dealloc(struct bpf_link *link)
> +{
> +	struct bpf_perf_link *perf_link = container_of(link, struct bpf_perf_link, link);
> +
> +	kfree(perf_link);
> +}
> +
> +static const struct bpf_link_ops bpf_perf_link_lops = {
> +	.release = bpf_perf_link_release,
> +	.dealloc = bpf_perf_link_dealloc,
> +};
> +
> +static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> +{
> +	struct bpf_link_primer link_primer;
> +	struct bpf_perf_link *link;
> +	struct perf_event *event;
> +	struct file *perf_file;
> +	int err;
> +
> +	if (attr->link_create.flags)
> +		return -EINVAL;
> +
> +	perf_file = perf_event_get(attr->link_create.target_fd);
> +	if (IS_ERR(perf_file))
> +		return PTR_ERR(perf_file);
> +
> +	link = kzalloc(sizeof(*link), GFP_USER);

add __GFP_NOWARN flag?

> +	if (!link) {
> +		err = -ENOMEM;
> +		goto out_put_file;
> +	}
> +	bpf_link_init(&link->link, BPF_LINK_TYPE_PERF_EVENT, &bpf_perf_link_lops, prog);
> +	link->perf_file = perf_file;
> +
> +	err = bpf_link_prime(&link->link, &link_primer);
> +	if (err) {
> +		kfree(link);
> +		goto out_put_file;
> +	}
> +
> +	event = perf_file->private_data;
> +	err = perf_event_set_bpf_prog(event, prog);
> +	if (err) {
> +		bpf_link_cleanup(&link_primer);

Do you need kfree(link) here?

> +		goto out_put_file;
> +	}
> +	/* perf_event_set_bpf_prog() doesn't take its own refcnt on prog */
> +	bpf_prog_inc(prog);
> +
> +	return bpf_link_settle(&link_primer);
> +
> +out_put_file:
> +	fput(perf_file);
> +	return err;
> +}
> +#endif /* CONFIG_PERF_EVENTS */
> +
>   #define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.prog_fd
>   
[...]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links
  2021-07-26 16:12 ` [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links Andrii Nakryiko
  2021-07-27  9:11   ` Peter Zijlstra
@ 2021-07-29 18:00   ` Yonghong Song
  2021-07-30  4:31     ` Andrii Nakryiko
  1 sibling, 1 reply; 43+ messages in thread
From: Yonghong Song @ 2021-07-29 18:00 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel; +Cc: kernel-team, Peter Zijlstra



On 7/26/21 9:12 AM, Andrii Nakryiko wrote:
> Add ability for users to specify custom u64 value when creating BPF link for
> perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints).
> 
> This is useful for cases when the same BPF program is used for attaching and
> processing invocation of different tracepoints/kprobes/uprobes in a generic
> fashion, but such that each invocation is distinguished from each other (e.g.,
> BPF program can look up additional information associated with a specific
> kernel function without having to rely on function IP lookups). This enables
> new use cases to be implemented simply and efficiently that previously were
> possible only through code generation (and thus multiple instances of almost
> identical BPF program) or compilation at runtime (BCC-style) on target hosts
> (even more expensive resource-wise). For uprobes it is not even possible in
> some cases to know function IP before hand (e.g., when attaching to shared
> library without PID filtering, in which case base load address is not known
> for a library).
> 
> This is done by storing u64 user_ctx in struct bpf_prog_array_item,
> corresponding to each attached and run BPF program. Given cgroup BPF programs
> already use 2 8-byte pointers for their needs and cgroup BPF programs don't
> have (yet?) support for user_ctx, reuse that space through union of
> cgroup_storage and new user_ctx field.
> 
> Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
> This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
> program execution code, which luckily is now also split from
> BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
> giving access to this user context value from inside a BPF program. Generic
> perf_event BPF programs will access this value from perf_event itself through
> passed in BPF program context.
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>   drivers/media/rc/bpf-lirc.c    |  4 ++--
>   include/linux/bpf.h            | 16 +++++++++++++++-
>   include/linux/perf_event.h     |  1 +
>   include/linux/trace_events.h   |  6 +++---
>   include/uapi/linux/bpf.h       |  7 +++++++
>   kernel/bpf/core.c              | 29 ++++++++++++++++++-----------
>   kernel/bpf/syscall.c           |  2 +-
>   kernel/events/core.c           | 21 ++++++++++++++-------
>   kernel/trace/bpf_trace.c       |  8 +++++---
>   tools/include/uapi/linux/bpf.h |  7 +++++++
>   10 files changed, 73 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
> index afae0afe3f81..7490494273e4 100644
> --- a/drivers/media/rc/bpf-lirc.c
> +++ b/drivers/media/rc/bpf-lirc.c
> @@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog)
>   		goto unlock;
>   	}
>   
> -	ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
> +	ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array);
>   	if (ret < 0)
>   		goto unlock;
>   
[...]
>   void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 00b1267ab4f0..bc1fd54a8f58 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1448,6 +1448,13 @@ union bpf_attr {
>   				__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
>   				__u32		iter_info_len;	/* iter_info length */
>   			};
> +			struct {
> +				/* black box user-provided value passed through
> +				 * to BPF program at the execution time and
> +				 * accessible through bpf_get_user_ctx() BPF helper
> +				 */
> +				__u64		user_ctx;
> +			} perf_event;

Is it possible to fold this field into the previous union?

                union {
                        __u32           target_btf_id;  /* btf_id of target to attach to */
                        struct {
                                __aligned_u64   iter_info;      /* extra bpf_iter_link_info */
                                __u32           iter_info_len;  /* iter_info length */
                        };
                };
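
I.e., something along these lines, just to illustrate the idea (naming and
placement up to you):

                union {
                        __u32           target_btf_id;  /* btf_id of target to attach to */
                        struct {
                                __aligned_u64   iter_info;      /* extra bpf_iter_link_info */
                                __u32           iter_info_len;  /* iter_info length */
                        };
                        struct {
                                __u64           user_ctx;       /* passed through to BPF prog */
                        } perf_event;
                };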


>   		};
>   	} link_create;
>   
[...]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 06/14] bpf: add bpf_get_user_ctx() BPF helper to access user_ctx value
  2021-07-26 16:12 ` [PATCH v2 bpf-next 06/14] bpf: add bpf_get_user_ctx() BPF helper to access user_ctx value Andrii Nakryiko
@ 2021-07-29 18:17   ` Yonghong Song
  2021-07-30  4:49     ` Andrii Nakryiko
  0 siblings, 1 reply; 43+ messages in thread
From: Yonghong Song @ 2021-07-29 18:17 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel; +Cc: kernel-team, Peter Zijlstra



On 7/26/21 9:12 AM, Andrii Nakryiko wrote:
> Add new BPF helper, bpf_get_user_ctx(), which can be used by BPF programs to
> get access to the user_ctx value, specified during BPF program attachment (BPF
> link creation) time.
> 
> Currently all perf_event-backed BPF program types support bpf_get_user_ctx()
> helper. Follow-up patches will add support for fentry/fexit programs as well.
> 
> While at it, mark bpf_tracing_func_proto() as static to make it obvious that
> it's only used from within the kernel/trace/bpf_trace.c.
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>   include/linux/bpf.h            |  3 ---
>   include/uapi/linux/bpf.h       | 16 ++++++++++++++++
>   kernel/trace/bpf_trace.c       | 35 +++++++++++++++++++++++++++++++++-
>   tools/include/uapi/linux/bpf.h | 16 ++++++++++++++++
>   4 files changed, 66 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 74b35faf0b73..94ebedc1e13a 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2110,9 +2110,6 @@ extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
>   extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
>   extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
>   
> -const struct bpf_func_proto *bpf_tracing_func_proto(
> -	enum bpf_func_id func_id, const struct bpf_prog *prog);
> -
>   const struct bpf_func_proto *tracing_prog_func_proto(
>     enum bpf_func_id func_id, const struct bpf_prog *prog);
>   
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index bc1fd54a8f58..96afeced3467 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -4856,6 +4856,21 @@ union bpf_attr {
>    * 		Get address of the traced function (for tracing and kprobe programs).
>    * 	Return
>    * 		Address of the traced function.
> + *
> + * u64 bpf_get_user_ctx(void *ctx)
> + * 	Description
> + * 		Get user_ctx value provided (optionally) during the program
> + * 		attachment. It might be different for each individual
> + * 		attachment, even if BPF program itself is the same.
> + * 		Expects BPF program context *ctx* as a first argument.
> + *
> + * 		Supported for the following program types:
> + *			- kprobe/uprobe;
> + *			- tracepoint;
> + *			- perf_event.

I think it is possible that in the future we may need to support more
program types with user_ctx, and not just a u64 but a value larger than
64 bits. Should we make this helper extensible, like
     long bpf_get_user_ctx(void *ctx, void *user_ctx, u32 user_ctx_len)

The return value would be 0 on success and a negative value on error.
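
E.g., from the BPF program side it would then be used roughly like this
(sketch only):

	__u64 user_ctx = 0;

	if (bpf_get_user_ctx(ctx, &user_ctx, sizeof(user_ctx)) < 0)
		return 0;
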
What do you think?

> + * 	Return
> + *		Value specified by user at BPF link creation/attachment time
> + *		or 0, if it was not specified.
>    */
>   #define __BPF_FUNC_MAPPER(FN)		\
>   	FN(unspec),			\
> @@ -5032,6 +5047,7 @@ union bpf_attr {
>   	FN(timer_start),		\
>   	FN(timer_cancel),		\
>   	FN(get_func_ip),		\
> +	FN(get_user_ctx),		\
>   	/* */
>   
>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index c9cf6a0d0fb3..b14978b3f6fb 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -975,7 +975,34 @@ static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe = {
>   	.arg1_type	= ARG_PTR_TO_CTX,
>   };
>   
> -const struct bpf_func_proto *
> +BPF_CALL_1(bpf_get_user_ctx_trace, void *, ctx)
> +{
> +	struct bpf_trace_run_ctx *run_ctx;
> +
> +	run_ctx = container_of(current->bpf_ctx, struct bpf_trace_run_ctx, run_ctx);
> +	return run_ctx->user_ctx;
> +}
> +
> +static const struct bpf_func_proto bpf_get_user_ctx_proto_trace = {
> +	.func		= bpf_get_user_ctx_trace,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_CTX,
> +};
> +
[...]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 11/14] libbpf: add user_ctx to perf_event, kprobe, uprobe, and tp attach APIs
  2021-07-26 16:12 ` [PATCH v2 bpf-next 11/14] libbpf: add user_ctx to perf_event, kprobe, uprobe, and tp attach APIs Andrii Nakryiko
@ 2021-07-30  1:11   ` Rafael David Tinoco
  0 siblings, 0 replies; 43+ messages in thread
From: Rafael David Tinoco @ 2021-07-30  1:11 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf

On Mon, Jul 26, 2021, at 13:12, Andrii Nakryiko wrote:
> Wire through user_ctx for all attach APIs that use perf_event_open under the
> hood:
>   - for kprobes, extend existing bpf_kprobe_opts with user_ctx field;
>   - for perf_event, uprobe, and tracepoint APIs, add their _opts variants and
>     pass user_ctx through opts.
> 
> For kernel that don't support BPF_LINK_CREATE for perf_events, and thus
> user_ctx is not supported either, return error and log warning for user.
> 
> Cc: Rafael David Tinoco <rafaeldtinoco@gmail.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

I think this one is fuzzy in v2. Checking them now for my purposes. Thanks for CC'ing.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function
  2021-07-29 16:49   ` Yonghong Song
@ 2021-07-30  4:05     ` Andrii Nakryiko
  0 siblings, 0 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  4:05 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Zijlstra

On Thu, Jul 29, 2021 at 9:50 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/26/21 9:11 AM, Andrii Nakryiko wrote:
> > Turn BPF_PROG_RUN into a proper always inlined function. No functional and
> > performance changes are intended, but it makes it much easier to understand
> > what's going on with how BPF programs actually get executed. It's more
> > obvious what types and callbacks are expected. Also extra () around input
> > parameters can be dropped, as well as `__` variable prefixes intended to avoid
> > naming collisions, which makes the code simpler to read and write.
> >
> > This refactoring also highlighted one possible issue. BPF_PROG_RUN is both
> > a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning
> > BPF_PROG_RUN into a function causes naming conflict compilation error. So
> > rename BPF_PROG_RUN into lower-case bpf_prog_run(), similar to
> > bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. To avoid unnecessary code
> > churn across many networking calls to BPF_PROG_RUN, #define BPF_PROG_RUN as an
> > alias to bpf_prog_run.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >   include/linux/filter.h | 58 +++++++++++++++++++++++++++---------------
> >   1 file changed, 37 insertions(+), 21 deletions(-)
> >
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index ba36989f711a..e59c97c72233 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -585,25 +585,41 @@ struct sk_filter {
> >
> >   DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
> >
> > -#define __BPF_PROG_RUN(prog, ctx, dfunc)     ({                      \
> > -     u32 __ret;                                                      \
> > -     cant_migrate();                                                 \
> > -     if (static_branch_unlikely(&bpf_stats_enabled_key)) {           \
> > -             struct bpf_prog_stats *__stats;                         \
> > -             u64 __start = sched_clock();                            \
> > -             __ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func);   \
> > -             __stats = this_cpu_ptr(prog->stats);                    \
> > -             u64_stats_update_begin(&__stats->syncp);                \
> > -             __stats->cnt++;                                         \
> > -             __stats->nsecs += sched_clock() - __start;              \
> > -             u64_stats_update_end(&__stats->syncp);                  \
> > -     } else {                                                        \
> > -             __ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func);   \
> > -     }                                                               \
> > -     __ret; })
> > -
> > -#define BPF_PROG_RUN(prog, ctx)                                              \
> > -     __BPF_PROG_RUN(prog, ctx, bpf_dispatcher_nop_func)
> > +typedef unsigned int (*bpf_dispatcher_fn)(const void *ctx,
> > +                                       const struct bpf_insn *insnsi,
> > +                                       unsigned int (*bpf_func)(const void *,
> > +                                                                const struct bpf_insn *));
> > +
> > +static __always_inline u32 __bpf_prog_run(const struct bpf_prog *prog,
> > +                                       const void *ctx,
> > +                                       bpf_dispatcher_fn dfunc)
> > +{
> > +     u32 ret;
> > +
> > +     cant_migrate();
> > +     if (static_branch_unlikely(&bpf_stats_enabled_key)) {
> > +             struct bpf_prog_stats *stats;
> > +             u64 start = sched_clock();
> > +
> > +             ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
> > +             stats = this_cpu_ptr(prog->stats);
> > +             u64_stats_update_begin(&stats->syncp);
> > +             stats->cnt++;
> > +             stats->nsecs += sched_clock() - start;
> > +             u64_stats_update_end(&stats->syncp);
> > +     } else {
> > +             ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
> > +     }
> > +     return ret;
> > +}
> > +
> > +static __always_inline u32 bpf_prog_run(const struct bpf_prog *prog, const void *ctx)
> > +{
> > +     return __bpf_prog_run(prog, ctx, bpf_dispatcher_nop_func);
> > +}
> > +
> > +/* avoids name conflict with BPF_PROG_RUN enum defined in uapi/linux/bpf.h */
> > +#define BPF_PROG_RUN bpf_prog_run
> >
> >   /*
> >    * Use in preemptible and therefore migratable context to make sure that
> > @@ -622,7 +638,7 @@ static inline u32 bpf_prog_run_pin_on_cpu(const struct bpf_prog *prog,
> >       u32 ret;
> >
> >       migrate_disable();
> > -     ret = __BPF_PROG_RUN(prog, ctx, bpf_dispatcher_nop_func);
> > +     ret = __bpf_prog_run(prog, ctx, bpf_dispatcher_nop_func);
>
> This can be replaced with bpf_prog_run(prog, ctx).
>

ok, sure

> >       migrate_enable();
> >       return ret;
> >   }
> > @@ -768,7 +784,7 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
> >        * under local_bh_disable(), which provides the needed RCU protection
> >        * for accessing map entries.
> >        */
> > -     return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
> > +     return __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
> >   }
> >
> >   void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
> >

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 04/14] bpf: implement minimal BPF perf link
  2021-07-29 17:35   ` Yonghong Song
@ 2021-07-30  4:16     ` Andrii Nakryiko
  2021-07-30  5:42       ` Yonghong Song
  0 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  4:16 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Zijlstra

On Thu, Jul 29, 2021 at 10:36 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/26/21 9:12 AM, Andrii Nakryiko wrote:
> > Introduce a new type of BPF link - BPF perf link. This brings perf_event-based
> > BPF program attachments (perf_event, tracepoints, kprobes, and uprobes) into
> > the common BPF link infrastructure, allowing to list all active perf_event
> > based attachments, auto-detaching BPF program from perf_event when link's FD
> > is closed, get generic BPF link fdinfo/get_info functionality.
> >
> > BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
> > are currently supported.
> >
> > Force-detaching and atomic BPF program updates are not yet implemented, but
> > with perf_event-based BPF links we now have common framework for this without
> > the need to extend ioctl()-based perf_event interface.
> >
> > One interesting consideration is a new value for bpf_attach_type, which
> > BPF_LINK_CREATE command expects. Generally, it's either 1-to-1 mapping from
> > bpf_attach_type to bpf_prog_type, or many-to-1 mapping from a subset of
> > bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
> > BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
> > program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
> > mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
> > define a single BPF_PERF_EVENT attach type for all of them and adjust
> > link_create()'s logic for checking correspondence between attach type and
> > program type.
> >
> > The alternative would be to define three new attach types (e.g., BPF_KPROBE,
> > BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like unnecessary overkill
> > and BPF_KPROBE will cause naming conflicts with BPF_KPROBE() macro, defined by
> > libbpf. I chose to not do this to avoid unnecessary proliferation of
> > bpf_attach_type enum values and not have to deal with naming conflicts.
> >
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >   include/linux/bpf_types.h      |   3 +
> >   include/linux/trace_events.h   |   3 +
> >   include/uapi/linux/bpf.h       |   2 +
> >   kernel/bpf/syscall.c           | 105 ++++++++++++++++++++++++++++++---
> >   kernel/events/core.c           |  10 ++--
> >   tools/include/uapi/linux/bpf.h |   2 +
> >   6 files changed, 112 insertions(+), 13 deletions(-)
> >
> > diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> > index a9db1eae6796..0a1ada7f174d 100644
> > --- a/include/linux/bpf_types.h
> > +++ b/include/linux/bpf_types.h
> > @@ -135,3 +135,6 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
> >   #ifdef CONFIG_NET
> >   BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
> >   #endif
> > +#ifdef CONFIG_PERF_EVENTS
> > +BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
> > +#endif
> > diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> > index ad413b382a3c..8ac92560d3a3 100644
> > --- a/include/linux/trace_events.h
> > +++ b/include/linux/trace_events.h
> > @@ -803,6 +803,9 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
> >   void perf_trace_buf_update(void *record, u16 type);
> >   void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
> >
> > +int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
> > +void perf_event_free_bpf_prog(struct perf_event *event);
> > +
> >   void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
> >   void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
> >   void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 2db6925e04f4..00b1267ab4f0 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -993,6 +993,7 @@ enum bpf_attach_type {
> >       BPF_SK_SKB_VERDICT,
> >       BPF_SK_REUSEPORT_SELECT,
> >       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
> > +     BPF_PERF_EVENT,
> >       __MAX_BPF_ATTACH_TYPE
> >   };
> >
> > @@ -1006,6 +1007,7 @@ enum bpf_link_type {
> >       BPF_LINK_TYPE_ITER = 4,
> >       BPF_LINK_TYPE_NETNS = 5,
> >       BPF_LINK_TYPE_XDP = 6,
> > +     BPF_LINK_TYPE_PERF_EVENT = 6,
>
> As Jiri has pointed out, BPF_LINK_TYPE_PERF_EVENT = 7.

yep, fixed

>
> >
> >       MAX_BPF_LINK_TYPE,
> >   };
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 9a2068e39d23..80c03bedd6e6 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -2906,6 +2906,79 @@ static const struct bpf_link_ops bpf_raw_tp_link_lops = {
> >       .fill_link_info = bpf_raw_tp_link_fill_link_info,
> >   };
> >
> > +#ifdef CONFIG_PERF_EVENTS
> > +struct bpf_perf_link {
> > +     struct bpf_link link;
> > +     struct file *perf_file;
> > +};
> > +
> > +static void bpf_perf_link_release(struct bpf_link *link)
> > +{
> > +     struct bpf_perf_link *perf_link = container_of(link, struct bpf_perf_link, link);
> > +     struct perf_event *event = perf_link->perf_file->private_data;
> > +
> > +     perf_event_free_bpf_prog(event);
> > +     fput(perf_link->perf_file);
> > +}
> > +
> > +static void bpf_perf_link_dealloc(struct bpf_link *link)
> > +{
> > +     struct bpf_perf_link *perf_link = container_of(link, struct bpf_perf_link, link);
> > +
> > +     kfree(perf_link);
> > +}
> > +
> > +static const struct bpf_link_ops bpf_perf_link_lops = {
> > +     .release = bpf_perf_link_release,
> > +     .dealloc = bpf_perf_link_dealloc,
> > +};
> > +
> > +static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> > +{
> > +     struct bpf_link_primer link_primer;
> > +     struct bpf_perf_link *link;
> > +     struct perf_event *event;
> > +     struct file *perf_file;
> > +     int err;
> > +
> > +     if (attr->link_create.flags)
> > +             return -EINVAL;
> > +
> > +     perf_file = perf_event_get(attr->link_create.target_fd);
> > +     if (IS_ERR(perf_file))
> > +             return PTR_ERR(perf_file);
> > +
> > +     link = kzalloc(sizeof(*link), GFP_USER);
>
> add __GFP_NOWARN flag?

I looked at a few other bpf_link allocation sites in this file; they don't
use the NOWARN flag. I think the idea with the NOWARN flag is to avoid
memory allocation warnings when the amount of allocated memory depends on
a user-specified parameter (like the size of a map value). In this case
it's just a single fixed-size kernel object, so while users can create
lots of them, each one is fixed in size. It's similar to any other
kernel object (e.g., struct file). So I think it's good as is.
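For illustration, the distinction looks roughly like this (a sketch, not
actual patch code; value_size stands for any user-controlled size):

    /* fixed-size kernel object: size doesn't depend on user input */
    link = kzalloc(sizeof(*link), GFP_USER);

    /* user-controlled size (e.g., map value): __GFP_NOWARN avoids allocation
     * failure warnings that user space could trigger at will
     */
    val = kzalloc(attr->value_size, GFP_USER | __GFP_NOWARN);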

>
> > +     if (!link) {
> > +             err = -ENOMEM;
> > +             goto out_put_file;
> > +     }
> > +     bpf_link_init(&link->link, BPF_LINK_TYPE_PERF_EVENT, &bpf_perf_link_lops, prog);
> > +     link->perf_file = perf_file;
> > +
> > +     err = bpf_link_prime(&link->link, &link_primer);
> > +     if (err) {
> > +             kfree(link);
> > +             goto out_put_file;
> > +     }
> > +
> > +     event = perf_file->private_data;
> > +     err = perf_event_set_bpf_prog(event, prog);
> > +     if (err) {
> > +             bpf_link_cleanup(&link_primer);
>
> Do you need kfree(link) here?

bpf_link_cleanup() will call kfree() in a deferred fashion. This is because
bpf_link_prime() allocates an anon_inode file internally, so the link needs
to be freed carefully, and that's what bpf_link_cleanup() is for.
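Spelling out the overall pattern (a simplified sketch, not the literal patch
code; the attach step is a placeholder):

    err = bpf_link_prime(&link->link, &link_primer);
    if (err) {
            kfree(link);                    /* no anon file yet, plain kfree is fine */
            goto out_put_file;
    }

    err = do_actual_attach();               /* here: perf_event_set_bpf_prog() */
    if (err) {
            bpf_link_cleanup(&link_primer); /* anon_inode file already exists; its
                                             * release path ends up in ->dealloc(),
                                             * which does the kfree(link) deferred
                                             */
            goto out_put_file;
    }

    return bpf_link_settle(&link_primer);   /* expose the link FD to user space */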

>
> > +             goto out_put_file;
> > +     }
> > +     /* perf_event_set_bpf_prog() doesn't take its own refcnt on prog */
> > +     bpf_prog_inc(prog);
> > +
> > +     return bpf_link_settle(&link_primer);
> > +
> > +out_put_file:
> > +     fput(perf_file);
> > +     return err;
> > +}
> > +#endif /* CONFIG_PERF_EVENTS */
> > +
> >   #define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.prog_fd
> >
> [...]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 04/14] bpf: implement minimal BPF perf link
  2021-07-27  9:04   ` Peter Zijlstra
@ 2021-07-30  4:23     ` Andrii Nakryiko
  0 siblings, 0 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  4:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team

On Tue, Jul 27, 2021 at 2:04 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Jul 26, 2021 at 09:12:01AM -0700, Andrii Nakryiko wrote:
> > Introduce a new type of BPF link - BPF perf link. This brings perf_event-based
> > BPF program attachments (perf_event, tracepoints, kprobes, and uprobes) into
> > the common BPF link infrastructure, allowing to list all active perf_event
> > based attachments, auto-detaching BPF program from perf_event when link's FD
> > is closed, get generic BPF link fdinfo/get_info functionality.
> >
> > BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
> > are currently supported.
> >
> > Force-detaching and atomic BPF program updates are not yet implemented, but
> > with perf_event-based BPF links we now have common framework for this without
> > the need to extend ioctl()-based perf_event interface.
> >
> > One interesting consideration is a new value for bpf_attach_type, which
> > BPF_LINK_CREATE command expects. Generally, it's either 1-to-1 mapping from
> > bpf_attach_type to bpf_prog_type, or many-to-1 mapping from a subset of
> > bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
> > BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
> > program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
> > mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
> > define a single BPF_PERF_EVENT attach type for all of them and adjust
> > link_create()'s logic for checking correspondence between attach type and
> > program type.
> >
> > The alternative would be to define three new attach types (e.g., BPF_KPROBE,
> > BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like unnecessary overkill
> > and BPF_KPROBE will cause naming conflicts with BPF_KPROBE() macro, defined by
> > libbpf. I chose to not do this to avoid unnecessary proliferation of
> > bpf_attach_type enum values and not have to deal with naming conflicts.
> >
>
> So I have no idea what all that means... I don't speak BPF. That said,
> the patch doesn't look terrible.
>
> One little question below, but otherwise:
>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>
> > +static void bpf_perf_link_release(struct bpf_link *link)
> > +{
> > +     struct bpf_perf_link *perf_link = container_of(link, struct bpf_perf_link, link);
> > +     struct perf_event *event = perf_link->perf_file->private_data;
> > +
> > +     perf_event_free_bpf_prog(event);
> > +     fput(perf_link->perf_file);
> > +}
>
> > +static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> > +{
> > +     struct bpf_link_primer link_primer;
> > +     struct bpf_perf_link *link;
> > +     struct perf_event *event;
> > +     struct file *perf_file;
> > +     int err;
> > +
> > +     if (attr->link_create.flags)
> > +             return -EINVAL;
> > +
> > +     perf_file = perf_event_get(attr->link_create.target_fd);
> > +     if (IS_ERR(perf_file))
> > +             return PTR_ERR(perf_file);
> > +
> > +     link = kzalloc(sizeof(*link), GFP_USER);
> > +     if (!link) {
> > +             err = -ENOMEM;
> > +             goto out_put_file;
> > +     }
> > +     bpf_link_init(&link->link, BPF_LINK_TYPE_PERF_EVENT, &bpf_perf_link_lops, prog);
> > +     link->perf_file = perf_file;
> > +
> > +     err = bpf_link_prime(&link->link, &link_primer);
> > +     if (err) {
> > +             kfree(link);
> > +             goto out_put_file;
> > +     }
> > +
> > +     event = perf_file->private_data;
> > +     err = perf_event_set_bpf_prog(event, prog);
> > +     if (err) {
> > +             bpf_link_cleanup(&link_primer);
> > +             goto out_put_file;
> > +     }
> > +     /* perf_event_set_bpf_prog() doesn't take its own refcnt on prog */
>
> Is that otherwise expected? AFAICT the previous users of that function
> were guaranteed the existence of the BPF program. But afaict there is
> nothing that prevents perf_event_*_bpf_prog() from doing the additional
> refcounting if that is more convenient.

Sorry, I missed this on my last pass. Yes, it's expected. The general
convention we use in BPF when passing a bpf_prog (and bpf_map and other
objects like that) is that the caller already holds an incremented
refcnt before calling the callee. If the callee succeeds, that refcnt is
"transferred" to the callee (so the callee doesn't increment it and the
caller doesn't put it). If the callee errors out, the caller decrements
the refcnt after any necessary cleanup, and the callee does nothing.
While asymmetrical, in practice this results in simple and
straightforward error-handling logic.

In this case bpf_perf_link_attach() assumes one refcnt from its
caller, but if everything is OK and perf_event_set_bpf_prog()
succeeds, we need to keep two refcnts: one for the bpf_link and one for
perf_event_set_bpf_prog() internally. So we just bump the refcnt one
extra time. I intentionally removed bpf_prog_put() from
perf_event_set_bpf_prog() in the previous patch to make error handling
uniform with the rest of the code and simpler overall.
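A simplified sketch of that convention (illustrative only, do_attach() is a
placeholder):

    prog = bpf_prog_get(prog_fd);   /* caller now owns one refcnt */
    err = do_attach(attr, prog);    /* callee doesn't touch the refcnt */
    if (err)
            bpf_prog_put(prog);     /* on error the caller drops its refcnt */
    /* on success the refcnt is "transferred": the attachment owns it now,
     * so the caller must not put it; a callee that needs an extra internal
     * reference (like here) calls bpf_prog_inc() itself
     */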

>
> > +     bpf_prog_inc(prog);
> > +
> > +     return bpf_link_settle(&link_primer);
> > +
> > +out_put_file:
> > +     fput(perf_file);
> > +     return err;
> > +}

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links
  2021-07-29 18:00   ` Yonghong Song
@ 2021-07-30  4:31     ` Andrii Nakryiko
  2021-07-30  5:49       ` Yonghong Song
  0 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  4:31 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Zijlstra

On Thu, Jul 29, 2021 at 11:00 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/26/21 9:12 AM, Andrii Nakryiko wrote:
> > Add ability for users to specify custom u64 value when creating BPF link for
> > perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints).
> >
> > This is useful for cases when the same BPF program is used for attaching and
> > processing invocation of different tracepoints/kprobes/uprobes in a generic
> > fashion, but such that each invocation is distinguished from each other (e.g.,
> > BPF program can look up additional information associated with a specific
> > kernel function without having to rely on function IP lookups). This enables
> > new use cases to be implemented simply and efficiently that previously were
> > possible only through code generation (and thus multiple instances of almost
> > identical BPF program) or compilation at runtime (BCC-style) on target hosts
> > (even more expensive resource-wise). For uprobes it is not even possible in
> > some cases to know function IP before hand (e.g., when attaching to shared
> > library without PID filtering, in which case base load address is not known
> > for a library).
> >
> > This is done by storing u64 user_ctx in struct bpf_prog_array_item,
> > corresponding to each attached and run BPF program. Given cgroup BPF programs
> > already use 2 8-byte pointers for their needs and cgroup BPF programs don't
> > have (yet?) support for user_ctx, reuse that space through union of
> > cgroup_storage and new user_ctx field.
> >
> > Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
> > This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
> > program execution code, which luckily is now also split from
> > BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
> > giving access to this user context value from inside a BPF program. Generic
> > perf_event BPF programs will access this value from perf_event itself through
> > passed in BPF program context.
> >
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >   drivers/media/rc/bpf-lirc.c    |  4 ++--
> >   include/linux/bpf.h            | 16 +++++++++++++++-
> >   include/linux/perf_event.h     |  1 +
> >   include/linux/trace_events.h   |  6 +++---
> >   include/uapi/linux/bpf.h       |  7 +++++++
> >   kernel/bpf/core.c              | 29 ++++++++++++++++++-----------
> >   kernel/bpf/syscall.c           |  2 +-
> >   kernel/events/core.c           | 21 ++++++++++++++-------
> >   kernel/trace/bpf_trace.c       |  8 +++++---
> >   tools/include/uapi/linux/bpf.h |  7 +++++++
> >   10 files changed, 73 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
> > index afae0afe3f81..7490494273e4 100644
> > --- a/drivers/media/rc/bpf-lirc.c
> > +++ b/drivers/media/rc/bpf-lirc.c
> > @@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog)
> >               goto unlock;
> >       }
> >
> > -     ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
> > +     ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array);
> >       if (ret < 0)
> >               goto unlock;
> >
> [...]
> >   void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 00b1267ab4f0..bc1fd54a8f58 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -1448,6 +1448,13 @@ union bpf_attr {
> >                               __aligned_u64   iter_info;      /* extra bpf_iter_link_info */
> >                               __u32           iter_info_len;  /* iter_info length */
> >                       };
> > +                     struct {
> > +                             /* black box user-provided value passed through
> > +                              * to BPF program at the execution time and
> > +                              * accessible through bpf_get_user_ctx() BPF helper
> > +                              */
> > +                             __u64           user_ctx;
> > +                     } perf_event;
>
> Is it possible to fold this field into previous union?
>
>                  union {
>                          __u32           target_btf_id;  /* btf_id of target to attach to */
>                          struct {
>                                  __aligned_u64   iter_info;      /* extra bpf_iter_link_info */
>                                  __u32           iter_info_len;  /* iter_info length */
>                          };
>                  };
>
>

I didn't want to do that, because different types of BPF links will
accept this user_ctx (or now bpf_cookie), and then we'd end up with
different locations for that field for different types of links.

For example, when/if we add this user_ctx to BPF iterator programs,
having __u64 user_ctx in the same anonymous union will make it overlap
with iter_info, which is a problem. So I want to have link
type-specific sections in the LINK_CREATE command section, to allow the
same field name at different locations.

I actually think that we should put iter_info/iter_info_len into a
named field, like this (also added user_ctx for bpf_iter link as a
demonstration):

struct {
    __aligned_u64 info;
    __u32         info_len;
    __aligned_u64 user_ctx;  /* see how it's at a different offset than perf_event.user_ctx */
} iter;
struct {
    __u64         user_ctx;
} perf_event;

(of course keeping the already existing fields in an anonymous struct for
backwards compatibility)

I decided not to do that in this patch set, though, to avoid distracting
from the main goal. But I think we should avoid this shared field
"namespace" across different link types going forward.


> >               };
> >       } link_create;
> >
> [...]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 06/14] bpf: add bpf_get_user_ctx() BPF helper to access user_ctx value
  2021-07-29 18:17   ` Yonghong Song
@ 2021-07-30  4:49     ` Andrii Nakryiko
  2021-07-30  5:53       ` Yonghong Song
  0 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  4:49 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Zijlstra

On Thu, Jul 29, 2021 at 11:17 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/26/21 9:12 AM, Andrii Nakryiko wrote:
> > Add new BPF helper, bpf_get_user_ctx(), which can be used by BPF programs to
> > get access to the user_ctx value, specified during BPF program attachment (BPF
> > link creation) time.
> >
> > Currently all perf_event-backed BPF program types support bpf_get_user_ctx()
> > helper. Follow-up patches will add support for fentry/fexit programs as well.
> >
> > While at it, mark bpf_tracing_func_proto() as static to make it obvious that
> > it's only used from within the kernel/trace/bpf_trace.c.
> >
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >   include/linux/bpf.h            |  3 ---
> >   include/uapi/linux/bpf.h       | 16 ++++++++++++++++
> >   kernel/trace/bpf_trace.c       | 35 +++++++++++++++++++++++++++++++++-
> >   tools/include/uapi/linux/bpf.h | 16 ++++++++++++++++
> >   4 files changed, 66 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 74b35faf0b73..94ebedc1e13a 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -2110,9 +2110,6 @@ extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
> >   extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
> >   extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
> >
> > -const struct bpf_func_proto *bpf_tracing_func_proto(
> > -     enum bpf_func_id func_id, const struct bpf_prog *prog);
> > -
> >   const struct bpf_func_proto *tracing_prog_func_proto(
> >     enum bpf_func_id func_id, const struct bpf_prog *prog);
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index bc1fd54a8f58..96afeced3467 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -4856,6 +4856,21 @@ union bpf_attr {
> >    *          Get address of the traced function (for tracing and kprobe programs).
> >    *  Return
> >    *          Address of the traced function.
> > + *
> > + * u64 bpf_get_user_ctx(void *ctx)
> > + *   Description
> > + *           Get user_ctx value provided (optionally) during the program
> > + *           attachment. It might be different for each individual
> > + *           attachment, even if BPF program itself is the same.
> > + *           Expects BPF program context *ctx* as a first argument.
> > + *
> > + *           Supported for the following program types:
> > + *                   - kprobe/uprobe;
> > + *                   - tracepoint;
> > + *                   - perf_event.
>
> I think it is possible that in the future we may need to support more
> program types with user_ctx, and not just a u64 but a more-than-64-bit value.
> Should we maybe make this helper extensible, like
>      long bpf_get_user_ctx(void *ctx, void *user_ctx, u32 user_ctx_len)
>
> The return value would be 0 on success and negative on error.
> What do you think?

I explicitly wanted to keep this user_ctx/bpf_cookie at a small fixed
size. __u64 is perfect because it's small enough not to require
dynamic memory allocation, but big enough to store any kind of index
into an array *or* a user-space pointer. So if a user needs more storage
than 8 bytes, they can have a bigger array where user_ctx/bpf_cookie is
just an integer index, or some sort of key into a hashmap, whichever is
more convenient.

So I'd like to keep it lean and simple. It is already powerful enough
to support any scenario, IMO.
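
For example, on the BPF program side this could look roughly like the sketch
below (assuming the helper name from this patch set; struct my_cfg, cfg_map,
and the probed function are made up for illustration):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    struct my_cfg {
            __u64 flags;
    };

    struct {
            __uint(type, BPF_MAP_TYPE_HASH);
            __uint(max_entries, 1024);
            __type(key, __u64);
            __type(value, struct my_cfg);
    } cfg_map SEC(".maps");

    SEC("kprobe/some_kernel_func")
    int handler(struct pt_regs *ctx)
    {
            __u64 cookie = bpf_get_user_ctx(ctx);
            struct my_cfg *cfg = bpf_map_lookup_elem(&cfg_map, &cookie);

            if (!cfg)
                    return 0;
            /* per-attachment configuration is available through cfg */
            return 0;
    }

    char LICENSE[] SEC("license") = "GPL";

The same program can then be attached many times, with each attachment
supplying its own cookie value at LINK_CREATE time.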

>
> > + *   Return
> > + *           Value specified by user at BPF link creation/attachment time
> > + *           or 0, if it was not specified.
> >    */
> >   #define __BPF_FUNC_MAPPER(FN)               \
> >       FN(unspec),                     \
> > @@ -5032,6 +5047,7 @@ union bpf_attr {
> >       FN(timer_start),                \
> >       FN(timer_cancel),               \
> >       FN(get_func_ip),                \
> > +     FN(get_user_ctx),               \
> >       /* */
> >
> >   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index c9cf6a0d0fb3..b14978b3f6fb 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -975,7 +975,34 @@ static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe = {
> >       .arg1_type      = ARG_PTR_TO_CTX,
> >   };
> >
> > -const struct bpf_func_proto *
> > +BPF_CALL_1(bpf_get_user_ctx_trace, void *, ctx)
> > +{
> > +     struct bpf_trace_run_ctx *run_ctx;
> > +
> > +     run_ctx = container_of(current->bpf_ctx, struct bpf_trace_run_ctx, run_ctx);
> > +     return run_ctx->user_ctx;
> > +}
> > +
> > +static const struct bpf_func_proto bpf_get_user_ctx_proto_trace = {
> > +     .func           = bpf_get_user_ctx_trace,
> > +     .gpl_only       = false,
> > +     .ret_type       = RET_INTEGER,
> > +     .arg1_type      = ARG_PTR_TO_CTX,
> > +};
> > +
> [...]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 04/14] bpf: implement minimal BPF perf link
  2021-07-30  4:16     ` Andrii Nakryiko
@ 2021-07-30  5:42       ` Yonghong Song
  0 siblings, 0 replies; 43+ messages in thread
From: Yonghong Song @ 2021-07-30  5:42 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Zijlstra



On 7/29/21 9:16 PM, Andrii Nakryiko wrote:
> On Thu, Jul 29, 2021 at 10:36 AM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 7/26/21 9:12 AM, Andrii Nakryiko wrote:
>>> Introduce a new type of BPF link - BPF perf link. This brings perf_event-based
>>> BPF program attachments (perf_event, tracepoints, kprobes, and uprobes) into
>>> the common BPF link infrastructure, allowing to list all active perf_event
>>> based attachments, auto-detaching BPF program from perf_event when link's FD
>>> is closed, get generic BPF link fdinfo/get_info functionality.
>>>
>>> BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
>>> are currently supported.
>>>
>>> Force-detaching and atomic BPF program updates are not yet implemented, but
>>> with perf_event-based BPF links we now have common framework for this without
>>> the need to extend ioctl()-based perf_event interface.
>>>
>>> One interesting consideration is a new value for bpf_attach_type, which
>>> BPF_LINK_CREATE command expects. Generally, it's either 1-to-1 mapping from
>>> bpf_attach_type to bpf_prog_type, or many-to-1 mapping from a subset of
>>> bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
>>> BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
>>> program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
>>> mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
>>> define a single BPF_PERF_EVENT attach type for all of them and adjust
>>> link_create()'s logic for checking correspondence between attach type and
>>> program type.
>>>
>>> The alternative would be to define three new attach types (e.g., BPF_KPROBE,
>>> BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like unnecessary overkill
>>> and BPF_KPROBE will cause naming conflicts with BPF_KPROBE() macro, defined by
>>> libbpf. I chose to not do this to avoid unnecessary proliferation of
>>> bpf_attach_type enum values and not have to deal with naming conflicts.
>>>
>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>>> ---
>>>    include/linux/bpf_types.h      |   3 +
>>>    include/linux/trace_events.h   |   3 +
>>>    include/uapi/linux/bpf.h       |   2 +
>>>    kernel/bpf/syscall.c           | 105 ++++++++++++++++++++++++++++++---
>>>    kernel/events/core.c           |  10 ++--
>>>    tools/include/uapi/linux/bpf.h |   2 +
>>>    6 files changed, 112 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
>>> index a9db1eae6796..0a1ada7f174d 100644
>>> --- a/include/linux/bpf_types.h
>>> +++ b/include/linux/bpf_types.h
>>> @@ -135,3 +135,6 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
>>>    #ifdef CONFIG_NET
>>>    BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
>>>    #endif
>>> +#ifdef CONFIG_PERF_EVENTS
>>> +BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
>>> +#endif
>>> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
>>> index ad413b382a3c..8ac92560d3a3 100644
>>> --- a/include/linux/trace_events.h
>>> +++ b/include/linux/trace_events.h
>>> @@ -803,6 +803,9 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
>>>    void perf_trace_buf_update(void *record, u16 type);
>>>    void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
>>>
>>> +int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
>>> +void perf_event_free_bpf_prog(struct perf_event *event);
>>> +
>>>    void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
>>>    void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
>>>    void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 2db6925e04f4..00b1267ab4f0 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -993,6 +993,7 @@ enum bpf_attach_type {
>>>        BPF_SK_SKB_VERDICT,
>>>        BPF_SK_REUSEPORT_SELECT,
>>>        BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
>>> +     BPF_PERF_EVENT,
>>>        __MAX_BPF_ATTACH_TYPE
>>>    };
>>>
>>> @@ -1006,6 +1007,7 @@ enum bpf_link_type {
>>>        BPF_LINK_TYPE_ITER = 4,
>>>        BPF_LINK_TYPE_NETNS = 5,
>>>        BPF_LINK_TYPE_XDP = 6,
>>> +     BPF_LINK_TYPE_PERF_EVENT = 6,
>>
>> As Jiri has pointed out, BPF_LINK_TYPE_PERF_EVENT = 7.
> 
> yep, fixed
> 
>>
>>>
>>>        MAX_BPF_LINK_TYPE,
>>>    };
>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>>> index 9a2068e39d23..80c03bedd6e6 100644
>>> --- a/kernel/bpf/syscall.c
>>> +++ b/kernel/bpf/syscall.c
>>> @@ -2906,6 +2906,79 @@ static const struct bpf_link_ops bpf_raw_tp_link_lops = {
>>>        .fill_link_info = bpf_raw_tp_link_fill_link_info,
>>>    };
>>>
>>> +#ifdef CONFIG_PERF_EVENTS
>>> +struct bpf_perf_link {
>>> +     struct bpf_link link;
>>> +     struct file *perf_file;
>>> +};
>>> +
>>> +static void bpf_perf_link_release(struct bpf_link *link)
>>> +{
>>> +     struct bpf_perf_link *perf_link = container_of(link, struct bpf_perf_link, link);
>>> +     struct perf_event *event = perf_link->perf_file->private_data;
>>> +
>>> +     perf_event_free_bpf_prog(event);
>>> +     fput(perf_link->perf_file);
>>> +}
>>> +
>>> +static void bpf_perf_link_dealloc(struct bpf_link *link)
>>> +{
>>> +     struct bpf_perf_link *perf_link = container_of(link, struct bpf_perf_link, link);
>>> +
>>> +     kfree(perf_link);
>>> +}
>>> +
>>> +static const struct bpf_link_ops bpf_perf_link_lops = {
>>> +     .release = bpf_perf_link_release,
>>> +     .dealloc = bpf_perf_link_dealloc,
>>> +};
>>> +
>>> +static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>>> +{
>>> +     struct bpf_link_primer link_primer;
>>> +     struct bpf_perf_link *link;
>>> +     struct perf_event *event;
>>> +     struct file *perf_file;
>>> +     int err;
>>> +
>>> +     if (attr->link_create.flags)
>>> +             return -EINVAL;
>>> +
>>> +     perf_file = perf_event_get(attr->link_create.target_fd);
>>> +     if (IS_ERR(perf_file))
>>> +             return PTR_ERR(perf_file);
>>> +
>>> +     link = kzalloc(sizeof(*link), GFP_USER);
>>
>> add __GFP_NOWARN flag?
> 
> I looked at few other bpf_link_alloc places in this file, they don't
> use NOWARN flag. I think the idea with NOWARN flag is to avoid memory
> alloc warnings when amount of allocated memory depends on
> user-specified parameter (like the size of the map value). In this
> case it's just a single fixed-size kernel object, so while users can
> create lots of them, each is fixed in size. It's similar as any other
> kernel object (e.g., struct file). So I think it's good as is.

That is fine. This is a really small struct, so it's unlikely we'd have issues.

> 
>>
>>> +     if (!link) {
>>> +             err = -ENOMEM;
>>> +             goto out_put_file;
>>> +     }
>>> +     bpf_link_init(&link->link, BPF_LINK_TYPE_PERF_EVENT, &bpf_perf_link_lops, prog);
>>> +     link->perf_file = perf_file;
>>> +
>>> +     err = bpf_link_prime(&link->link, &link_primer);
>>> +     if (err) {
>>> +             kfree(link);
>>> +             goto out_put_file;
>>> +     }
>>> +
>>> +     event = perf_file->private_data;
>>> +     err = perf_event_set_bpf_prog(event, prog);
>>> +     if (err) {
>>> +             bpf_link_cleanup(&link_primer);
>>
>> Do you need kfree(link) here?
> 
> bpf_link_cleanup() will call kfree() in deferred fashion. This is due
> to bpf_link_prime() allocating anon_inode file internally, so it needs
> to be freed carefully and that's what bpf_link_cleanup() is for.

Looking at the code again, I was able to figure it out. Indeed,
kfree(link) is called through file->release().

> 
>>
>>> +             goto out_put_file;
>>> +     }
>>> +     /* perf_event_set_bpf_prog() doesn't take its own refcnt on prog */
>>> +     bpf_prog_inc(prog);
>>> +
>>> +     return bpf_link_settle(&link_primer);
>>> +
>>> +out_put_file:
>>> +     fput(perf_file);
>>> +     return err;
>>> +}
>>> +#endif /* CONFIG_PERF_EVENTS */
>>> +
>>>    #define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.prog_fd
>>>
>> [...]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links
  2021-07-30  4:31     ` Andrii Nakryiko
@ 2021-07-30  5:49       ` Yonghong Song
  2021-07-30 17:48         ` Andrii Nakryiko
  0 siblings, 1 reply; 43+ messages in thread
From: Yonghong Song @ 2021-07-30  5:49 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Zijlstra



On 7/29/21 9:31 PM, Andrii Nakryiko wrote:
> On Thu, Jul 29, 2021 at 11:00 AM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 7/26/21 9:12 AM, Andrii Nakryiko wrote:
>>> Add ability for users to specify custom u64 value when creating BPF link for
>>> perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints).
>>>
>>> This is useful for cases when the same BPF program is used for attaching and
>>> processing invocation of different tracepoints/kprobes/uprobes in a generic
>>> fashion, but such that each invocation is distinguished from each other (e.g.,
>>> BPF program can look up additional information associated with a specific
>>> kernel function without having to rely on function IP lookups). This enables
>>> new use cases to be implemented simply and efficiently that previously were
>>> possible only through code generation (and thus multiple instances of almost
>>> identical BPF program) or compilation at runtime (BCC-style) on target hosts
>>> (even more expensive resource-wise). For uprobes it is not even possible in
>>> some cases to know function IP before hand (e.g., when attaching to shared
>>> library without PID filtering, in which case base load address is not known
>>> for a library).
>>>
>>> This is done by storing u64 user_ctx in struct bpf_prog_array_item,
>>> corresponding to each attached and run BPF program. Given cgroup BPF programs
>>> already use 2 8-byte pointers for their needs and cgroup BPF programs don't
>>> have (yet?) support for user_ctx, reuse that space through union of
>>> cgroup_storage and new user_ctx field.
>>>
>>> Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
>>> This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
>>> program execution code, which luckily is now also split from
>>> BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
>>> giving access to this user context value from inside a BPF program. Generic
>>> perf_event BPF programs will access this value from perf_event itself through
>>> passed in BPF program context.
>>>
>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>>> ---
>>>    drivers/media/rc/bpf-lirc.c    |  4 ++--
>>>    include/linux/bpf.h            | 16 +++++++++++++++-
>>>    include/linux/perf_event.h     |  1 +
>>>    include/linux/trace_events.h   |  6 +++---
>>>    include/uapi/linux/bpf.h       |  7 +++++++
>>>    kernel/bpf/core.c              | 29 ++++++++++++++++++-----------
>>>    kernel/bpf/syscall.c           |  2 +-
>>>    kernel/events/core.c           | 21 ++++++++++++++-------
>>>    kernel/trace/bpf_trace.c       |  8 +++++---
>>>    tools/include/uapi/linux/bpf.h |  7 +++++++
>>>    10 files changed, 73 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
>>> index afae0afe3f81..7490494273e4 100644
>>> --- a/drivers/media/rc/bpf-lirc.c
>>> +++ b/drivers/media/rc/bpf-lirc.c
>>> @@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog)
>>>                goto unlock;
>>>        }
>>>
>>> -     ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
>>> +     ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array);
>>>        if (ret < 0)
>>>                goto unlock;
>>>
>> [...]
>>>    void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 00b1267ab4f0..bc1fd54a8f58 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -1448,6 +1448,13 @@ union bpf_attr {
>>>                                __aligned_u64   iter_info;      /* extra bpf_iter_link_info */
>>>                                __u32           iter_info_len;  /* iter_info length */
>>>                        };
>>> +                     struct {
>>> +                             /* black box user-provided value passed through
>>> +                              * to BPF program at the execution time and
>>> +                              * accessible through bpf_get_user_ctx() BPF helper
>>> +                              */
>>> +                             __u64           user_ctx;
>>> +                     } perf_event;
>>
>> Is it possible to fold this field into previous union?
>>
>>                   union {
>>                           __u32           target_btf_id;  /* btf_id of
>> target to attach to */
>>                           struct {
>>                                   __aligned_u64   iter_info;      /*
>> extra bpf_iter_link_info */
>>                                   __u32           iter_info_len;  /*
>> iter_info length */
>>                           };
>>                   };
>>
>>
> 
> I didn't want to do it, because different types of BPF links will
> accept this user_ctx (or now bpf_cookie). And then we'll have to have
> different locations of that field for different types of links.
> 
> For example, when/if we add this user_ctx to BPF iterator programs,
> having __u64 user_ctx in the same anonymous union will make it overlap
> with iter_info, which is a problem. So I want to have a link
> type-specific sections in LINK_CREATE command section, to allow the
> same field name at different locations.
> 
> I actually think that we should put iter_info/iter_info_len into a
> named field, like this (also added user_ctx for bpf_iter link as a
> demonstration):
> 
> struct {
>      __aligned_u64 info;
>      __u32         info_len;
>      __aligned_u64 user_ctx;  /* see how it's at a different offset
> than perf_event.user_ctx */
> } iter;
> struct {
>      __u64         user_ctx;
> } perf_event;
> 
> (of course keeping already existing fields in anonymous struct for
> backwards compatibility)

Okay, then, since user_ctx may be used by many link types, how
about just the plain field "user_ctx" without the struct perf_event
wrapper? Something like

__u64	user_ctx;

instead of

struct {
	__u64	user_ctx;
} perf_event;

> 
> I decided to not do that in this patch set, though, to not distract
> from the main goal. But I think we should avoid this shared field
> "namespace" across different link types going forward.
> 
> 
>>>                };
>>>        } link_create;
>>>
>> [...]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 06/14] bpf: add bpf_get_user_ctx() BPF helper to access user_ctx value
  2021-07-30  4:49     ` Andrii Nakryiko
@ 2021-07-30  5:53       ` Yonghong Song
  0 siblings, 0 replies; 43+ messages in thread
From: Yonghong Song @ 2021-07-30  5:53 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Zijlstra



On 7/29/21 9:49 PM, Andrii Nakryiko wrote:
> On Thu, Jul 29, 2021 at 11:17 AM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 7/26/21 9:12 AM, Andrii Nakryiko wrote:
>>> Add new BPF helper, bpf_get_user_ctx(), which can be used by BPF programs to
>>> get access to the user_ctx value, specified during BPF program attachment (BPF
>>> link creation) time.
>>>
>>> Currently all perf_event-backed BPF program types support bpf_get_user_ctx()
>>> helper. Follow-up patches will add support for fentry/fexit programs as well.
>>>
>>> While at it, mark bpf_tracing_func_proto() as static to make it obvious that
>>> it's only used from within the kernel/trace/bpf_trace.c.
>>>
>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>>> ---
>>>    include/linux/bpf.h            |  3 ---
>>>    include/uapi/linux/bpf.h       | 16 ++++++++++++++++
>>>    kernel/trace/bpf_trace.c       | 35 +++++++++++++++++++++++++++++++++-
>>>    tools/include/uapi/linux/bpf.h | 16 ++++++++++++++++
>>>    4 files changed, 66 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>> index 74b35faf0b73..94ebedc1e13a 100644
>>> --- a/include/linux/bpf.h
>>> +++ b/include/linux/bpf.h
>>> @@ -2110,9 +2110,6 @@ extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
>>>    extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
>>>    extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
>>>
>>> -const struct bpf_func_proto *bpf_tracing_func_proto(
>>> -     enum bpf_func_id func_id, const struct bpf_prog *prog);
>>> -
>>>    const struct bpf_func_proto *tracing_prog_func_proto(
>>>      enum bpf_func_id func_id, const struct bpf_prog *prog);
>>>
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index bc1fd54a8f58..96afeced3467 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -4856,6 +4856,21 @@ union bpf_attr {
>>>     *          Get address of the traced function (for tracing and kprobe programs).
>>>     *  Return
>>>     *          Address of the traced function.
>>> + *
>>> + * u64 bpf_get_user_ctx(void *ctx)
>>> + *   Description
>>> + *           Get user_ctx value provided (optionally) during the program
>>> + *           attachment. It might be different for each individual
>>> + *           attachment, even if BPF program itself is the same.
>>> + *           Expects BPF program context *ctx* as a first argument.
>>> + *
>>> + *           Supported for the following program types:
>>> + *                   - kprobe/uprobe;
>>> + *                   - tracepoint;
>>> + *                   - perf_event.
>>
>> I think it is possible that in the future we may need to support more
>> program types with user_ctx, and not just a u64 but a more-than-64-bit value.
>> Should we maybe make this helper extensible, like
>>       long bpf_get_user_ctx(void *ctx, void *user_ctx, u32 user_ctx_len)
>>
>> The return value would be 0 on success and negative on error.
>> What do you think?
> 
> I explicitly wanted to keep this user_ctx/bpf_cookie to a small fixed
> size. __u64 is perfect because it's small enough to not require
> dynamic memory allocation, but big enough to store any kind of index
> into an array *or* user-space pointer. So if user needs more storage
> than 8 bytes, they will be able to have a bigger array where
> user_ctx/bpf_cookie is just an integer index or some sort of key into
> hashmap, whichever is more convenient.

Okay, returning an index into a map is a good idea. This way, a u64
return value is indeed enough.

> 
> So I'd like to keep it lean and simple. It is already powerful enough
> to support any scenario, IMO.
> 
>>
>>> + *   Return
>>> + *           Value specified by user at BPF link creation/attachment time
>>> + *           or 0, if it was not specified.
>>>     */
>>>    #define __BPF_FUNC_MAPPER(FN)               \
>>>        FN(unspec),                     \
>>> @@ -5032,6 +5047,7 @@ union bpf_attr {
>>>        FN(timer_start),                \
>>>        FN(timer_cancel),               \
>>>        FN(get_func_ip),                \
>>> +     FN(get_user_ctx),               \
>>>        /* */
>>>
>>>    /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>>> index c9cf6a0d0fb3..b14978b3f6fb 100644
>>> --- a/kernel/trace/bpf_trace.c
>>> +++ b/kernel/trace/bpf_trace.c
>>> @@ -975,7 +975,34 @@ static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe = {
>>>        .arg1_type      = ARG_PTR_TO_CTX,
>>>    };
>>>
>>> -const struct bpf_func_proto *
>>> +BPF_CALL_1(bpf_get_user_ctx_trace, void *, ctx)
>>> +{
>>> +     struct bpf_trace_run_ctx *run_ctx;
>>> +
>>> +     run_ctx = container_of(current->bpf_ctx, struct bpf_trace_run_ctx, run_ctx);
>>> +     return run_ctx->user_ctx;
>>> +}
>>> +
>>> +static const struct bpf_func_proto bpf_get_user_ctx_proto_trace = {
>>> +     .func           = bpf_get_user_ctx_trace,
>>> +     .gpl_only       = false,
>>> +     .ret_type       = RET_INTEGER,
>>> +     .arg1_type      = ARG_PTR_TO_CTX,
>>> +};
>>> +
>> [...]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links
  2021-07-30  5:49       ` Yonghong Song
@ 2021-07-30 17:48         ` Andrii Nakryiko
  2021-07-30 21:34           ` Yonghong Song
  0 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-30 17:48 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Zijlstra

On Thu, Jul 29, 2021 at 10:49 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/29/21 9:31 PM, Andrii Nakryiko wrote:
> > On Thu, Jul 29, 2021 at 11:00 AM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 7/26/21 9:12 AM, Andrii Nakryiko wrote:
> >>> Add ability for users to specify custom u64 value when creating BPF link for
> >>> perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints).
> >>>
> >>> This is useful for cases when the same BPF program is used for attaching and
> >>> processing invocation of different tracepoints/kprobes/uprobes in a generic
> >>> fashion, but such that each invocation is distinguished from each other (e.g.,
> >>> BPF program can look up additional information associated with a specific
> >>> kernel function without having to rely on function IP lookups). This enables
> >>> new use cases to be implemented simply and efficiently that previously were
> >>> possible only through code generation (and thus multiple instances of almost
> >>> identical BPF program) or compilation at runtime (BCC-style) on target hosts
> >>> (even more expensive resource-wise). For uprobes it is not even possible in
> >>> some cases to know function IP before hand (e.g., when attaching to shared
> >>> library without PID filtering, in which case base load address is not known
> >>> for a library).
> >>>
> >>> This is done by storing u64 user_ctx in struct bpf_prog_array_item,
> >>> corresponding to each attached and run BPF program. Given cgroup BPF programs
> >>> already use 2 8-byte pointers for their needs and cgroup BPF programs don't
> >>> have (yet?) support for user_ctx, reuse that space through union of
> >>> cgroup_storage and new user_ctx field.
> >>>
> >>> Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
> >>> This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
> >>> program execution code, which luckily is now also split from
> >>> BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
> >>> giving access to this user context value from inside a BPF program. Generic
> >>> perf_event BPF programs will access this value from perf_event itself through
> >>> passed in BPF program context.
> >>>
> >>> Cc: Peter Zijlstra <peterz@infradead.org>
> >>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> >>> ---
> >>>    drivers/media/rc/bpf-lirc.c    |  4 ++--
> >>>    include/linux/bpf.h            | 16 +++++++++++++++-
> >>>    include/linux/perf_event.h     |  1 +
> >>>    include/linux/trace_events.h   |  6 +++---
> >>>    include/uapi/linux/bpf.h       |  7 +++++++
> >>>    kernel/bpf/core.c              | 29 ++++++++++++++++++-----------
> >>>    kernel/bpf/syscall.c           |  2 +-
> >>>    kernel/events/core.c           | 21 ++++++++++++++-------
> >>>    kernel/trace/bpf_trace.c       |  8 +++++---
> >>>    tools/include/uapi/linux/bpf.h |  7 +++++++
> >>>    10 files changed, 73 insertions(+), 28 deletions(-)
> >>>
> >>> diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
> >>> index afae0afe3f81..7490494273e4 100644
> >>> --- a/drivers/media/rc/bpf-lirc.c
> >>> +++ b/drivers/media/rc/bpf-lirc.c
> >>> @@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog)
> >>>                goto unlock;
> >>>        }
> >>>
> >>> -     ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
> >>> +     ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array);
> >>>        if (ret < 0)
> >>>                goto unlock;
> >>>
> >> [...]
> >>>    void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
> >>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >>> index 00b1267ab4f0..bc1fd54a8f58 100644
> >>> --- a/include/uapi/linux/bpf.h
> >>> +++ b/include/uapi/linux/bpf.h
> >>> @@ -1448,6 +1448,13 @@ union bpf_attr {
> >>>                                __aligned_u64   iter_info;      /* extra bpf_iter_link_info */
> >>>                                __u32           iter_info_len;  /* iter_info length */
> >>>                        };
> >>> +                     struct {
> >>> +                             /* black box user-provided value passed through
> >>> +                              * to BPF program at the execution time and
> >>> +                              * accessible through bpf_get_user_ctx() BPF helper
> >>> +                              */
> >>> +                             __u64           user_ctx;
> >>> +                     } perf_event;
> >>
> >> Is it possible to fold this field into previous union?
> >>
> >>                   union {
> >>                           __u32           target_btf_id;  /* btf_id of
> >> target to attach to */
> >>                           struct {
> >>                                   __aligned_u64   iter_info;      /*
> >> extra bpf_iter_link_info */
> >>                                   __u32           iter_info_len;  /*
> >> iter_info length */
> >>                           };
> >>                   };
> >>
> >>
> >
> > I didn't want to do it, because different types of BPF links will
> > accept this user_ctx (or now bpf_cookie). And then we'll have to have
> > different locations of that field for different types of links.
> >
> > For example, when/if we add this user_ctx to BPF iterator programs,
> > having __u64 user_ctx in the same anonymous union will make it overlap
> > with iter_info, which is a problem. So I want to have a link
> > type-specific sections in LINK_CREATE command section, to allow the
> > same field name at different locations.
> >
> > I actually think that we should put iter_info/iter_info_len into a
> > named field, like this (also added user_ctx for bpf_iter link as a
> > demonstration):
> >
> > struct {
> >      __aligned_u64 info;
> >      __u32         info_len;
> >      __aligned_u64 user_ctx;  /* see how it's at a different offset
> > than perf_event.user_ctx */
> > } iter;
> > struct {
> >      __u64         user_ctx;
> > } perf_event;
> >
> > (of course keeping already existing fields in anonymous struct for
> > backwards compatibility)
>
> Okay, then since user_ctx may be used by many link types. How
> about just with the field "user_ctx" without struct perf_event.

I'd love to do it because it is indeed a generic and common field, like
target_fd. But I'm not sure what you are proposing below. Where
exactly does that user_ctx (now called bpf_cookie) go in your example?
I see a few possible options that preserve ABI backwards
compatibility. Let's see if you and everyone else like any of those
better. I'll use the full LINK_CREATE sub-struct definition from
bpf_attr to make it clear, and to demonstrate how this can be extended
to bpf_iter in the future; please note this part, as it is an
important aspect.

1. Full backwards compatibility and per-link type sections (my current
approach):

        struct { /* struct used by BPF_LINK_CREATE command */
                __u32           prog_fd;
                union {
                        __u32           target_fd;
                        __u32           target_ifindex;
                };
                __u32           attach_type;
                __u32           flags;
                union {
                        __u32           target_btf_id;
                        struct {
                                __aligned_u64   iter_info;
                                __u32           iter_info_len;
                        };
                        struct {
                                __u64           bpf_cookie;
                        } perf_event;
                        struct {
                                __aligned_u64   info;
                                __u32           info_len;
                                __aligned_u64   bpf_cookie;
                        } iter;
                };
        } link_create;
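
To make the intended usage concrete (just a sketch, assuming the field
names above and the BPF_PERF_EVENT attach type from this series land
as-is; error handling and includes omitted):

        union bpf_attr attr;
        int link_fd;

        memset(&attr, 0, sizeof(attr));
        attr.link_create.prog_fd = prog_fd;            /* loaded BPF program */
        attr.link_create.target_fd = perf_event_fd;    /* perf_event FD backing the kprobe */
        attr.link_create.attach_type = BPF_PERF_EVENT;
        attr.link_create.perf_event.bpf_cookie = 0x1234; /* opaque per-attachment value */

        link_fd = syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));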

The good property here is that we can easily keep extending link
type-specific sections with extra fields where needed. For common
stuff like bpf_cookie it's suboptimal because we'll need to duplicate
the field definition in each struct inside that union, but I think
that's fine. From the end-user's point of view, they will know which
type of link they are creating, so the use will be straightforward.
This is why I went with this approach. But let's consider alternatives.

2. Non-backwards-compatible layout, with an extra flag to specify that
the new field layout is used.

        struct { /* struct used by BPF_LINK_CREATE command */
                __u32           prog_fd;
                union {
                        __u32           target_fd;
                        __u32           target_ifindex;
                };
                __u32           attach_type;
                __u32           flags; /* this will start supporting
some new flag like BPF_F_LINK_CREATE_NEW */
                __u64           bpf_cookie; /* common field now */
                union { /* this part is effectively deprecated now */
                        __u32           target_btf_id;
                        struct {
                                __aligned_u64   iter_info;
                                __u32           iter_info_len;
                        };
                        struct { /* this is the new layout, but needs
BPF_F_LINK_CREATE_NEW, at least for ext/ and bpf_iter/ programs */
                            __u64       bpf_cookie;
                            union {
                                struct {
                                    __u32     target_btf_id;
                                } ext;
                                struct {
                                    __aligned_u64 info;
                                    __u32         info_len;
                                } iter;
                            };
                        };
                };
        } link_create;

This makes bpf_cookie a common field, but at least for EXT (freplace/)
and ITER (bpf_iter/) links we need an extra flag to indicate that we
are not using iter_info/iter_info_len/target_btf_id. bpf_iter will
then use iter.info and iter.info_len, and can use plain bpf_cookie.

IMO, this is way too confusing and a maintainability nightmare.

I'm trying to guess what you are proposing; I can read it two ways,
but let me know if I missed something.

3. Just add bpf_cookie field before link type-specific section.

        struct { /* struct used by BPF_LINK_CREATE command */
                __u32           prog_fd;
                union {
                        __u32           target_fd;
                        __u32           target_ifindex;
                };
                __u32           attach_type;
                __u32           flags;
                __u64           bpf_cookie;  // <<<<<<<<<< HERE
                union {
                        __u32           target_btf_id;
                        struct {
                                __aligned_u64   iter_info;
                                __u32           iter_info_len;
                        };
                };
        } link_create;

This looks really nice and would be great, but it changes the offsets
of target_btf_id/iter_info/iter_info_len, so it's a no-go. The only
way to rectify this is what proposal #2 above does with an extra flag.
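
To spell out the breakage (my arithmetic based on the layout above,
not something from the patch):

        #include <stddef.h>      /* offsetof */
        #include <linux/bpf.h>

        /* today: prog_fd(4) + target_fd/target_ifindex(4) + attach_type(4) + flags(4) */
        _Static_assert(offsetof(union bpf_attr, link_create.iter_info) == 16, "");
        /* a __u64 bpf_cookie inserted before the union would push iter_info
         * (and target_btf_id/iter_info_len) to offset 24, breaking any
         * existing binary that fills those fields
         */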

4. Add bpf_cookie after link-type specific part:

        struct { /* struct used by BPF_LINK_CREATE command */
                __u32           prog_fd;
                union {
                        __u32           target_fd;
                        __u32           target_ifindex;
                };
                __u32           attach_type;
                __u32           flags;
                union {
                        __u32           target_btf_id;
                        struct {
                                __aligned_u64   iter_info;
                                __u32           iter_info_len;
                        };
                };
                __u64           bpf_cookie; // <<<<<<<<<<<<<<<<<< HERE
        } link_create;

This could work. But we are wasting 16 bytes currently used for
target_btf_id/iter_info/iter_info_len. If we later need to do
something link type-specific, we can add it to the existing union if
we need <= 16 bytes, otherwise we'll need to start another union after
bpf_cookie, splitting this into two link type-specific sections.

Overall, this might work, especially assuming we won't need to extend
the iter-specific portions. But I really hate that we didn't use named
structs inside that union (i.e., ext.target_btf_id and
iter.info/iter.info_len), and I'd like to rectify that in follow-up
patches with named structs duplicating the existing field layout, but
with proper naming. Splitting this LINK_CREATE part of bpf_attr into
two unions would make that hard and awkward in the future.

So, thoughts? Did you have something else in mind that I missed?


> Sometime like
>
> __u64   user_ctx;
>
> instead of
>
> struct {
>         __u64   user_ctx;
> } perf_event;
>
> >
> > I decided to not do that in this patch set, though, to not distract
> > from the main goal. But I think we should avoid this shared field
> > "namespace" across different link types going forward.
> >
> >
> >>>                };
> >>>        } link_create;
> >>>
> >> [...]


* Re: [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links
  2021-07-30 17:48         ` Andrii Nakryiko
@ 2021-07-30 21:34           ` Yonghong Song
  2021-07-30 22:06             ` Andrii Nakryiko
  0 siblings, 1 reply; 43+ messages in thread
From: Yonghong Song @ 2021-07-30 21:34 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Zijlstra



On 7/30/21 10:48 AM, Andrii Nakryiko wrote:
> On Thu, Jul 29, 2021 at 10:49 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 7/29/21 9:31 PM, Andrii Nakryiko wrote:
>>> On Thu, Jul 29, 2021 at 11:00 AM Yonghong Song <yhs@fb.com> wrote:
>>>>
>>>>
>>>>
>>>> On 7/26/21 9:12 AM, Andrii Nakryiko wrote:
>>>>> Add ability for users to specify custom u64 value when creating BPF link for
>>>>> perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints).
>>>>>
>>>>> This is useful for cases when the same BPF program is used for attaching and
>>>>> processing invocation of different tracepoints/kprobes/uprobes in a generic
>>>>> fashion, but such that each invocation is distinguished from each other (e.g.,
>>>>> BPF program can look up additional information associated with a specific
>>>>> kernel function without having to rely on function IP lookups). This enables
>>>>> new use cases to be implemented simply and efficiently that previously were
>>>>> possible only through code generation (and thus multiple instances of almost
>>>>> identical BPF program) or compilation at runtime (BCC-style) on target hosts
>>>>> (even more expensive resource-wise). For uprobes it is not even possible in
>>>>> some cases to know function IP before hand (e.g., when attaching to shared
>>>>> library without PID filtering, in which case base load address is not known
>>>>> for a library).
>>>>>
>>>>> This is done by storing u64 user_ctx in struct bpf_prog_array_item,
>>>>> corresponding to each attached and run BPF program. Given cgroup BPF programs
>>>>> already use 2 8-byte pointers for their needs and cgroup BPF programs don't
>>>>> have (yet?) support for user_ctx, reuse that space through union of
>>>>> cgroup_storage and new user_ctx field.
>>>>>
>>>>> Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
>>>>> This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
>>>>> program execution code, which luckily is now also split from
>>>>> BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
>>>>> giving access to this user context value from inside a BPF program. Generic
>>>>> perf_event BPF programs will access this value from perf_event itself through
>>>>> passed in BPF program context.
>>>>>
>>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>>>>> ---
>>>>>     drivers/media/rc/bpf-lirc.c    |  4 ++--
>>>>>     include/linux/bpf.h            | 16 +++++++++++++++-
>>>>>     include/linux/perf_event.h     |  1 +
>>>>>     include/linux/trace_events.h   |  6 +++---
>>>>>     include/uapi/linux/bpf.h       |  7 +++++++
>>>>>     kernel/bpf/core.c              | 29 ++++++++++++++++++-----------
>>>>>     kernel/bpf/syscall.c           |  2 +-
>>>>>     kernel/events/core.c           | 21 ++++++++++++++-------
>>>>>     kernel/trace/bpf_trace.c       |  8 +++++---
>>>>>     tools/include/uapi/linux/bpf.h |  7 +++++++
>>>>>     10 files changed, 73 insertions(+), 28 deletions(-)
>>>>>
>>>>> diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
>>>>> index afae0afe3f81..7490494273e4 100644
>>>>> --- a/drivers/media/rc/bpf-lirc.c
>>>>> +++ b/drivers/media/rc/bpf-lirc.c
>>>>> @@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog)
>>>>>                 goto unlock;
>>>>>         }
>>>>>
>>>>> -     ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
>>>>> +     ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array);
>>>>>         if (ret < 0)
>>>>>                 goto unlock;
>>>>>
>>>> [...]
>>>>>     void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
>>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>>> index 00b1267ab4f0..bc1fd54a8f58 100644
>>>>> --- a/include/uapi/linux/bpf.h
>>>>> +++ b/include/uapi/linux/bpf.h
>>>>> @@ -1448,6 +1448,13 @@ union bpf_attr {
>>>>>                                 __aligned_u64   iter_info;      /* extra bpf_iter_link_info */
>>>>>                                 __u32           iter_info_len;  /* iter_info length */
>>>>>                         };
>>>>> +                     struct {
>>>>> +                             /* black box user-provided value passed through
>>>>> +                              * to BPF program at the execution time and
>>>>> +                              * accessible through bpf_get_user_ctx() BPF helper
>>>>> +                              */
>>>>> +                             __u64           user_ctx;
>>>>> +                     } perf_event;
>>>>
>>>> Is it possible to fold this field into previous union?
>>>>
>>>>                    union {
>>>>                            __u32           target_btf_id;  /* btf_id of
>>>> target to attach to */
>>>>                            struct {
>>>>                                    __aligned_u64   iter_info;      /*
>>>> extra bpf_iter_link_info */
>>>>                                    __u32           iter_info_len;  /*
>>>> iter_info length */
>>>>                            };
>>>>                    };
>>>>
>>>>
>>>
>>> I didn't want to do it, because different types of BPF links will
>>> accept this user_ctx (or now bpf_cookie). And then we'll have to have
>>> different locations of that field for different types of links.
>>>
>>> For example, when/if we add this user_ctx to BPF iterator programs,
>>> having __u64 user_ctx in the same anonymous union will make it overlap
>>> with iter_info, which is a problem. So I want to have a link
>>> type-specific sections in LINK_CREATE command section, to allow the
>>> same field name at different locations.
>>>
>>> I actually think that we should put iter_info/iter_info_len into a
>>> named field, like this (also added user_ctx for bpf_iter link as a
>>> demonstration):
>>>
>>> struct {
>>>       __aligned_u64 info;
>>>       __u32         info_len;
>>>       __aligned_u64 user_ctx;  /* see how it's at a different offset
>>> than perf_event.user_ctx */
>>> } iter;
>>> struct {
>>>       __u64         user_ctx;
>>> } perf_event;
>>>
>>> (of course keeping already existing fields in anonymous struct for
>>> backwards compatibility)
>>
>> Okay, then since user_ctx may be used by many link types. How
>> about just with the field "user_ctx" without struct perf_event.
> 
> I'd love to do it because it is indeed generic and common field, like
> target_fd. But I'm not sure what you are proposing below. Where
> exactly that user_ctx (now called bpf_cookie) goes in your example? I
> see few possible options that allow preserving ABI backwards
> compatibility. Let's see if you and everyone else likes any of those
> better. I'll use the full LINK_CREATE sub-struct definition from
> bpf_attr to make it clear. And to demonstrate how this can be extended
> to bpf_iter in the future, please note this part as this is an
> important aspect.
> 
> 1. Full backwards compatibility and per-link type sections (my current
> approach):
> 
>          struct { /* struct used by BPF_LINK_CREATE command */
>                  __u32           prog_fd;
>                  union {
>                          __u32           target_fd;
>                          __u32           target_ifindex;
>                  };
>                  __u32           attach_type;
>                  __u32           flags;
>                  union {
>                          __u32           target_btf_id;
>                          struct {
>                                  __aligned_u64   iter_info;
>                                  __u32           iter_info_len;
>                          };
>                          struct {
>                                  __u64           bpf_cookie;
>                          } perf_event;
>                          struct {
>                                  __aligned_u64   info;
>                                  __u32           info_len;
>                                  __aligned_u64   bpf_cookie;
>                          } iter;
>                 };
>          } link_create;
> 
> The good property here is that we can keep easily extending link
> type-specific sections with extra fields where needed. For common
> stuff like bpf_cookie it's suboptimal because we'll need to duplicate
> field definition in each struct inside that union, but I think that's
> fine. From end-user point of view, they will know which type of link
> they are creating, so the use will be straightforward. This is why I
> went with this approach. But let's consider alternatives.
> 
> 2. Non-backwards compatible layout but extra flag to specify that new
> field layout is used.
> 
>          struct { /* struct used by BPF_LINK_CREATE command */
>                  __u32           prog_fd;
>                  union {
>                          __u32           target_fd;
>                          __u32           target_ifindex;
>                  };
>                  __u32           attach_type;
>                  __u32           flags; /* this will start supporting
> some new flag like BPF_F_LINK_CREATE_NEW */
>                  __u64           bpf_cookie; /* common field now */
>                  union { /* this parts is effectively deprecated now */
>                          __u32           target_btf_id;
>                          struct {
>                                  __aligned_u64   iter_info;
>                                  __u32           iter_info_len;
>                          };
>                          struct { /* this is new layout, but needs
> BPF_F_LINK_CREATE_NEW, at least for ext/ and bpf_iter/ programs */
>                              __u64       bpf_cookie;
>                              union {
>                                  struct {
>                                      __u32     target_btf_id;
>                                  } ext;
>                                  struct {
>                                      __aligned_u64 info;
>                                      __u32         info_len;
>                                  } iter;
>                              }
>                          }
>                  };
>          } link_create;
> 
> This makes bpf_cookie a common field, but at least for EXT (freplace/)
> and ITER (bpf_iter/) links we need to specify extra flag to specify
> that we are not using iter_info/iter_info_len/target_btf_id. bpf_iter
> then will use iter.info and iter.info_len, and can use plain
> bpf_cookie.
> 
> IMO, this is way too confusing and a maintainability nightmare.
> 
> I'm trying to guess what you are proposing, I can read it two ways,
> but let me know if I missed something.
> 
> 3. Just add bpf_cookie field before link type-specific section.
> 
>          struct { /* struct used by BPF_LINK_CREATE command */
>                  __u32           prog_fd;
>                  union {
>                          __u32           target_fd;
>                          __u32           target_ifindex;
>                  };
>                  __u32           attach_type;
>                  __u32           flags;
>                  __u64           bpf_cookie;  // <<<<<<<<<< HERE
>                  union {
>                          __u32           target_btf_id;
>                          struct {
>                                  __aligned_u64   iter_info;
>                                  __u32           iter_info_len;
>                          };
>                  };
>          } link_create;
> 
> This looks really nice and would be great, but that changes offsets
> for target_btf_id/iter_info/iter_info_len, so a no go. The only way to
> rectify this is what proposal #2 above does with an extra flag.
> 
> 4. Add bpf_cookie after link-type specific part:
> 
>          struct { /* struct used by BPF_LINK_CREATE command */
>                  __u32           prog_fd;
>                  union {
>                          __u32           target_fd;
>                          __u32           target_ifindex;
>                  };
>                  __u32           attach_type;
>                  __u32           flags;
>                  union {
>                          __u32           target_btf_id;
>                          struct {
>                                  __aligned_u64   iter_info;
>                                  __u32           iter_info_len;
>                          };
>                          struct {
>                  };
>                  __u64           bpf_cookie; // <<<<<<<<<<<<<<<<<< HERE
>          } link_create;
> 
> This could work. But we are wasting 16 bytes currently used for
> target_btf_id/iter_info/iter_info_len. If we later need to do
> something link type-specific, we can add it to the existing union if
> we need <= 16 bytes, otherwise we'll need to start another union after
> bpf_cookie, splitting this into two link type-specific sections.
> 
> Overall, this might work, especially assuming we won't need to extend
> iter-specific portions. But I really hate that we didn't do named
> structs inside that union (i.e., ext.target_btf_id and
> iter.info/iter.info_len) and I'd like to rectify that in the follow up
> patches with named structs duplicating existing field layout, but with
> proper naming. But splitting this LINK_CREATE bpf_attr part into two
> unions would make it hard and awkward in the future.
> 
> So, thoughts? Did you have something else in mind that I missed?

What I proposed is your option 4. Yes, in the future if there is
something we want to add to bpf iter, we can add it to iter_info, so
it should not be an issue. Any other new link_type may utilize the same
union with
    struct {
       __aligned_u64  new_type_info;
       __u32          new_type_info_len;
    };
and this will put the extensibility into new_type_info.
I know this may be a bit of a hassle, but it should work.
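
For example (purely hypothetical, names made up), a future "foo" link
type could define its own info struct and pass it through that
pointer, the same way bpf_iter_link_info is passed today:

    struct bpf_foo_link_info {
            __u64   bpf_cookie;
            __u32   some_foo_param;
            /* new fields can be appended here over time, with
             * new_type_info_len telling the kernel how much the
             * caller filled in
             */
    };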

Your option 1 should work too; it is what I proposed in the beginning,
to put things into that union, and we can feel free to add bpf_cookie
for each individual link type. This is actually cleaner.

> 
> 
>> Sometime like
>>
>> __u64   user_ctx;
>>
>> instead of
>>
>> struct {
>>          __u64   user_ctx;
>> } perf_event;
>>
>>>
>>> I decided to not do that in this patch set, though, to not distract
>>> from the main goal. But I think we should avoid this shared field
>>> "namespace" across different link types going forward.
>>>
>>>
>>>>>                 };
>>>>>         } link_create;
>>>>>
>>>> [...]


* Re: [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links
  2021-07-30 21:34           ` Yonghong Song
@ 2021-07-30 22:06             ` Andrii Nakryiko
  2021-07-30 22:28               ` Yonghong Song
  0 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2021-07-30 22:06 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Zijlstra

On Fri, Jul 30, 2021 at 2:34 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/30/21 10:48 AM, Andrii Nakryiko wrote:
> > On Thu, Jul 29, 2021 at 10:49 PM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 7/29/21 9:31 PM, Andrii Nakryiko wrote:
> >>> On Thu, Jul 29, 2021 at 11:00 AM Yonghong Song <yhs@fb.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 7/26/21 9:12 AM, Andrii Nakryiko wrote:
> >>>>> Add ability for users to specify custom u64 value when creating BPF link for
> >>>>> perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints).
> >>>>>
> >>>>> This is useful for cases when the same BPF program is used for attaching and
> >>>>> processing invocation of different tracepoints/kprobes/uprobes in a generic
> >>>>> fashion, but such that each invocation is distinguished from each other (e.g.,
> >>>>> BPF program can look up additional information associated with a specific
> >>>>> kernel function without having to rely on function IP lookups). This enables
> >>>>> new use cases to be implemented simply and efficiently that previously were
> >>>>> possible only through code generation (and thus multiple instances of almost
> >>>>> identical BPF program) or compilation at runtime (BCC-style) on target hosts
> >>>>> (even more expensive resource-wise). For uprobes it is not even possible in
> >>>>> some cases to know function IP before hand (e.g., when attaching to shared
> >>>>> library without PID filtering, in which case base load address is not known
> >>>>> for a library).
> >>>>>
> >>>>> This is done by storing u64 user_ctx in struct bpf_prog_array_item,
> >>>>> corresponding to each attached and run BPF program. Given cgroup BPF programs
> >>>>> already use 2 8-byte pointers for their needs and cgroup BPF programs don't
> >>>>> have (yet?) support for user_ctx, reuse that space through union of
> >>>>> cgroup_storage and new user_ctx field.
> >>>>>
> >>>>> Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
> >>>>> This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
> >>>>> program execution code, which luckily is now also split from
> >>>>> BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
> >>>>> giving access to this user context value from inside a BPF program. Generic
> >>>>> perf_event BPF programs will access this value from perf_event itself through
> >>>>> passed in BPF program context.
> >>>>>
> >>>>> Cc: Peter Zijlstra <peterz@infradead.org>
> >>>>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> >>>>> ---
> >>>>>     drivers/media/rc/bpf-lirc.c    |  4 ++--
> >>>>>     include/linux/bpf.h            | 16 +++++++++++++++-
> >>>>>     include/linux/perf_event.h     |  1 +
> >>>>>     include/linux/trace_events.h   |  6 +++---
> >>>>>     include/uapi/linux/bpf.h       |  7 +++++++
> >>>>>     kernel/bpf/core.c              | 29 ++++++++++++++++++-----------
> >>>>>     kernel/bpf/syscall.c           |  2 +-
> >>>>>     kernel/events/core.c           | 21 ++++++++++++++-------
> >>>>>     kernel/trace/bpf_trace.c       |  8 +++++---
> >>>>>     tools/include/uapi/linux/bpf.h |  7 +++++++
> >>>>>     10 files changed, 73 insertions(+), 28 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
> >>>>> index afae0afe3f81..7490494273e4 100644
> >>>>> --- a/drivers/media/rc/bpf-lirc.c
> >>>>> +++ b/drivers/media/rc/bpf-lirc.c
> >>>>> @@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog)
> >>>>>                 goto unlock;
> >>>>>         }
> >>>>>
> >>>>> -     ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
> >>>>> +     ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array);
> >>>>>         if (ret < 0)
> >>>>>                 goto unlock;
> >>>>>
> >>>> [...]
> >>>>>     void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
> >>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >>>>> index 00b1267ab4f0..bc1fd54a8f58 100644
> >>>>> --- a/include/uapi/linux/bpf.h
> >>>>> +++ b/include/uapi/linux/bpf.h
> >>>>> @@ -1448,6 +1448,13 @@ union bpf_attr {
> >>>>>                                 __aligned_u64   iter_info;      /* extra bpf_iter_link_info */
> >>>>>                                 __u32           iter_info_len;  /* iter_info length */
> >>>>>                         };
> >>>>> +                     struct {
> >>>>> +                             /* black box user-provided value passed through
> >>>>> +                              * to BPF program at the execution time and
> >>>>> +                              * accessible through bpf_get_user_ctx() BPF helper
> >>>>> +                              */
> >>>>> +                             __u64           user_ctx;
> >>>>> +                     } perf_event;
> >>>>
> >>>> Is it possible to fold this field into previous union?
> >>>>
> >>>>                    union {
> >>>>                            __u32           target_btf_id;  /* btf_id of
> >>>> target to attach to */
> >>>>                            struct {
> >>>>                                    __aligned_u64   iter_info;      /*
> >>>> extra bpf_iter_link_info */
> >>>>                                    __u32           iter_info_len;  /*
> >>>> iter_info length */
> >>>>                            };
> >>>>                    };
> >>>>
> >>>>
> >>>
> >>> I didn't want to do it, because different types of BPF links will
> >>> accept this user_ctx (or now bpf_cookie). And then we'll have to have
> >>> different locations of that field for different types of links.
> >>>
> >>> For example, when/if we add this user_ctx to BPF iterator programs,
> >>> having __u64 user_ctx in the same anonymous union will make it overlap
> >>> with iter_info, which is a problem. So I want to have a link
> >>> type-specific sections in LINK_CREATE command section, to allow the
> >>> same field name at different locations.
> >>>
> >>> I actually think that we should put iter_info/iter_info_len into a
> >>> named field, like this (also added user_ctx for bpf_iter link as a
> >>> demonstration):
> >>>
> >>> struct {
> >>>       __aligned_u64 info;
> >>>       __u32         info_len;
> >>>       __aligned_u64 user_ctx;  /* see how it's at a different offset
> >>> than perf_event.user_ctx */
> >>> } iter;
> >>> struct {
> >>>       __u64         user_ctx;
> >>> } perf_event;
> >>>
> >>> (of course keeping already existing fields in anonymous struct for
> >>> backwards compatibility)
> >>
> >> Okay, then since user_ctx may be used by many link types. How
> >> about just with the field "user_ctx" without struct perf_event.
> >
> > I'd love to do it because it is indeed generic and common field, like
> > target_fd. But I'm not sure what you are proposing below. Where
> > exactly that user_ctx (now called bpf_cookie) goes in your example? I
> > see few possible options that allow preserving ABI backwards
> > compatibility. Let's see if you and everyone else likes any of those
> > better. I'll use the full LINK_CREATE sub-struct definition from
> > bpf_attr to make it clear. And to demonstrate how this can be extended
> > to bpf_iter in the future, please note this part as this is an
> > important aspect.
> >
> > 1. Full backwards compatibility and per-link type sections (my current
> > approach):
> >
> >          struct { /* struct used by BPF_LINK_CREATE command */
> >                  __u32           prog_fd;
> >                  union {
> >                          __u32           target_fd;
> >                          __u32           target_ifindex;
> >                  };
> >                  __u32           attach_type;
> >                  __u32           flags;
> >                  union {
> >                          __u32           target_btf_id;
> >                          struct {
> >                                  __aligned_u64   iter_info;
> >                                  __u32           iter_info_len;
> >                          };
> >                          struct {
> >                                  __u64           bpf_cookie;
> >                          } perf_event;
> >                          struct {
> >                                  __aligned_u64   info;
> >                                  __u32           info_len;
> >                                  __aligned_u64   bpf_cookie;
> >                          } iter;
> >                 };
> >          } link_create;
> >
> > The good property here is that we can keep easily extending link
> > type-specific sections with extra fields where needed. For common
> > stuff like bpf_cookie it's suboptimal because we'll need to duplicate
> > field definition in each struct inside that union, but I think that's
> > fine. From end-user point of view, they will know which type of link
> > they are creating, so the use will be straightforward. This is why I
> > went with this approach. But let's consider alternatives.
> >
> > 2. Non-backwards compatible layout but extra flag to specify that new
> > field layout is used.
> >
> >          struct { /* struct used by BPF_LINK_CREATE command */
> >                  __u32           prog_fd;
> >                  union {
> >                          __u32           target_fd;
> >                          __u32           target_ifindex;
> >                  };
> >                  __u32           attach_type;
> >                  __u32           flags; /* this will start supporting
> > some new flag like BPF_F_LINK_CREATE_NEW */
> >                  __u64           bpf_cookie; /* common field now */
> >                  union { /* this parts is effectively deprecated now */
> >                          __u32           target_btf_id;
> >                          struct {
> >                                  __aligned_u64   iter_info;
> >                                  __u32           iter_info_len;
> >                          };
> >                          struct { /* this is new layout, but needs
> > BPF_F_LINK_CREATE_NEW, at least for ext/ and bpf_iter/ programs */
> >                              __u64       bpf_cookie;
> >                              union {
> >                                  struct {
> >                                      __u32     target_btf_id;
> >                                  } ext;
> >                                  struct {
> >                                      __aligned_u64 info;
> >                                      __u32         info_len;
> >                                  } iter;
> >                              }
> >                          }
> >                  };
> >          } link_create;
> >
> > This makes bpf_cookie a common field, but at least for EXT (freplace/)
> > and ITER (bpf_iter/) links we need to specify extra flag to specify
> > that we are not using iter_info/iter_info_len/target_btf_id. bpf_iter
> > then will use iter.info and iter.info_len, and can use plain
> > bpf_cookie.
> >
> > IMO, this is way too confusing and a maintainability nightmare.
> >
> > I'm trying to guess what you are proposing, I can read it two ways,
> > but let me know if I missed something.
> >
> > 3. Just add bpf_cookie field before link type-specific section.
> >
> >          struct { /* struct used by BPF_LINK_CREATE command */
> >                  __u32           prog_fd;
> >                  union {
> >                          __u32           target_fd;
> >                          __u32           target_ifindex;
> >                  };
> >                  __u32           attach_type;
> >                  __u32           flags;
> >                  __u64           bpf_cookie;  // <<<<<<<<<< HERE
> >                  union {
> >                          __u32           target_btf_id;
> >                          struct {
> >                                  __aligned_u64   iter_info;
> >                                  __u32           iter_info_len;
> >                          };
> >                  };
> >          } link_create;
> >
> > This looks really nice and would be great, but that changes offsets
> > for target_btf_id/iter_info/iter_info_len, so a no go. The only way to
> > rectify this is what proposal #2 above does with an extra flag.
> >
> > 4. Add bpf_cookie after link-type specific part:
> >
> >          struct { /* struct used by BPF_LINK_CREATE command */
> >                  __u32           prog_fd;
> >                  union {
> >                          __u32           target_fd;
> >                          __u32           target_ifindex;
> >                  };
> >                  __u32           attach_type;
> >                  __u32           flags;
> >                  union {
> >                          __u32           target_btf_id;
> >                          struct {
> >                                  __aligned_u64   iter_info;
> >                                  __u32           iter_info_len;
> >                          };
> >                          struct {
> >                  };
> >                  __u64           bpf_cookie; // <<<<<<<<<<<<<<<<<< HERE
> >          } link_create;
> >
> > This could work. But we are wasting 16 bytes currently used for
> > target_btf_id/iter_info/iter_info_len. If we later need to do
> > something link type-specific, we can add it to the existing union if
> > we need <= 16 bytes, otherwise we'll need to start another union after
> > bpf_cookie, splitting this into two link type-specific sections.
> >
> > Overall, this might work, especially assuming we won't need to extend
> > iter-specific portions. But I really hate that we didn't do named
> > structs inside that union (i.e., ext.target_btf_id and
> > iter.info/iter.info_len) and I'd like to rectify that in the follow up
> > patches with named structs duplicating existing field layout, but with
> > proper naming. But splitting this LINK_CREATE bpf_attr part into two
> > unions would make it hard and awkward in the future.
> >
> > So, thoughts? Did you have something else in mind that I missed?
>
> What I proposed is your option 4. Yes, in the future if there is there
> are something we want to add to bpf iter, we can add to iter_info, so
> it should not be an issue. Any other new link_type may utilized the same
> union with
>     struct {
>        __aligned_u64  new_type_info;
>        __u32          new_type_info_len;
>     };
> and this will put extensibility into new_type_info.
> I know this may be a little bit hassle but it should work.
>

I see what you mean. With this extra pointer we shouldn't need more
than 16 bytes per link type. That's an unnecessary complication for a
lot of the simpler types of links, unfortunately, though it's
definitely an option.

We could also do approach #4 but leave 16-32 bytes before bpf_cookie
for the union, so that it's much less likely that we'll run out of
space there (roughly as in the sketch below). Not very clean either,
so I don't know.
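
Something like this (not proposing it, just illustrating the idea; the
reserved size is arbitrary):

        union {
                __u32           target_btf_id;
                struct {
                        __aligned_u64   iter_info;
                        __u32           iter_info_len;
                };
                __u8            __reserved[32]; /* room for future link type-specific fields */
        };
        __u64           bpf_cookie;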

I'll keep it here for discussion for now; let's see if anyone has
strong preferences and opinions.

> Your option 1 should work too, which is what I proposed in the beginning
> to put into the union and we can feel free to add bpf_cookie for each
> individual link type. This is actually cleaner.

Oh, you did? I must have misunderstood then. If you like approach #1,
then that's what I'm doing right now, so let's keep it as is and see
if anyone else has preferences.

>
> >
> >
> >> Sometime like
> >>
> >> __u64   user_ctx;
> >>
> >> instead of
> >>
> >> struct {
> >>          __u64   user_ctx;
> >> } perf_event;
> >>
> >>>
> >>> I decided to not do that in this patch set, though, to not distract
> >>> from the main goal. But I think we should avoid this shared field
> >>> "namespace" across different link types going forward.
> >>>
> >>>
> >>>>>                 };
> >>>>>         } link_create;
> >>>>>
> >>>> [...]


* Re: [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links
  2021-07-30 22:06             ` Andrii Nakryiko
@ 2021-07-30 22:28               ` Yonghong Song
  0 siblings, 0 replies; 43+ messages in thread
From: Yonghong Song @ 2021-07-30 22:28 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Zijlstra



On 7/30/21 3:06 PM, Andrii Nakryiko wrote:
> On Fri, Jul 30, 2021 at 2:34 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 7/30/21 10:48 AM, Andrii Nakryiko wrote:
>>> On Thu, Jul 29, 2021 at 10:49 PM Yonghong Song <yhs@fb.com> wrote:
>>>>
>>>>
>>>>
>>>> On 7/29/21 9:31 PM, Andrii Nakryiko wrote:
>>>>> On Thu, Jul 29, 2021 at 11:00 AM Yonghong Song <yhs@fb.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 7/26/21 9:12 AM, Andrii Nakryiko wrote:
>>>>>>> Add ability for users to specify custom u64 value when creating BPF link for
>>>>>>> perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints).
>>>>>>>
>>>>>>> This is useful for cases when the same BPF program is used for attaching and
>>>>>>> processing invocation of different tracepoints/kprobes/uprobes in a generic
>>>>>>> fashion, but such that each invocation is distinguished from each other (e.g.,
>>>>>>> BPF program can look up additional information associated with a specific
>>>>>>> kernel function without having to rely on function IP lookups). This enables
>>>>>>> new use cases to be implemented simply and efficiently that previously were
>>>>>>> possible only through code generation (and thus multiple instances of almost
>>>>>>> identical BPF program) or compilation at runtime (BCC-style) on target hosts
>>>>>>> (even more expensive resource-wise). For uprobes it is not even possible in
>>>>>>> some cases to know function IP before hand (e.g., when attaching to shared
>>>>>>> library without PID filtering, in which case base load address is not known
>>>>>>> for a library).
>>>>>>>
>>>>>>> This is done by storing u64 user_ctx in struct bpf_prog_array_item,
>>>>>>> corresponding to each attached and run BPF program. Given cgroup BPF programs
>>>>>>> already use 2 8-byte pointers for their needs and cgroup BPF programs don't
>>>>>>> have (yet?) support for user_ctx, reuse that space through union of
>>>>>>> cgroup_storage and new user_ctx field.
>>>>>>>
>>>>>>> Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
>>>>>>> This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
>>>>>>> program execution code, which luckily is now also split from
>>>>>>> BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
>>>>>>> giving access to this user context value from inside a BPF program. Generic
>>>>>>> perf_event BPF programs will access this value from perf_event itself through
>>>>>>> passed in BPF program context.
>>>>>>>
>>>>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>>>>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>>>>>>> ---
>>>>>>>      drivers/media/rc/bpf-lirc.c    |  4 ++--
>>>>>>>      include/linux/bpf.h            | 16 +++++++++++++++-
>>>>>>>      include/linux/perf_event.h     |  1 +
>>>>>>>      include/linux/trace_events.h   |  6 +++---
>>>>>>>      include/uapi/linux/bpf.h       |  7 +++++++
>>>>>>>      kernel/bpf/core.c              | 29 ++++++++++++++++++-----------
>>>>>>>      kernel/bpf/syscall.c           |  2 +-
>>>>>>>      kernel/events/core.c           | 21 ++++++++++++++-------
>>>>>>>      kernel/trace/bpf_trace.c       |  8 +++++---
>>>>>>>      tools/include/uapi/linux/bpf.h |  7 +++++++
>>>>>>>      10 files changed, 73 insertions(+), 28 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
>>>>>>> index afae0afe3f81..7490494273e4 100644
>>>>>>> --- a/drivers/media/rc/bpf-lirc.c
>>>>>>> +++ b/drivers/media/rc/bpf-lirc.c
>>>>>>> @@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog)
>>>>>>>                  goto unlock;
>>>>>>>          }
>>>>>>>
>>>>>>> -     ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
>>>>>>> +     ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array);
>>>>>>>          if (ret < 0)
>>>>>>>                  goto unlock;
>>>>>>>
>>>>>> [...]
>>>>>>>      void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
>>>>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>>>>> index 00b1267ab4f0..bc1fd54a8f58 100644
>>>>>>> --- a/include/uapi/linux/bpf.h
>>>>>>> +++ b/include/uapi/linux/bpf.h
>>>>>>> @@ -1448,6 +1448,13 @@ union bpf_attr {
>>>>>>>                                  __aligned_u64   iter_info;      /* extra bpf_iter_link_info */
>>>>>>>                                  __u32           iter_info_len;  /* iter_info length */
>>>>>>>                          };
>>>>>>> +                     struct {
>>>>>>> +                             /* black box user-provided value passed through
>>>>>>> +                              * to BPF program at the execution time and
>>>>>>> +                              * accessible through bpf_get_user_ctx() BPF helper
>>>>>>> +                              */
>>>>>>> +                             __u64           user_ctx;
>>>>>>> +                     } perf_event;
>>>>>>
>>>>>> Is it possible to fold this field into previous union?
>>>>>>
>>>>>>                     union {
>>>>>>                             __u32           target_btf_id;  /* btf_id of
>>>>>> target to attach to */
>>>>>>                             struct {
>>>>>>                                     __aligned_u64   iter_info;      /*
>>>>>> extra bpf_iter_link_info */
>>>>>>                                     __u32           iter_info_len;  /*
>>>>>> iter_info length */
>>>>>>                             };
>>>>>>                     };
>>>>>>
>>>>>>
>>>>>
>>>>> I didn't want to do it, because different types of BPF links will
>>>>> accept this user_ctx (or now bpf_cookie). And then we'll have to have
>>>>> different locations of that field for different types of links.
>>>>>
>>>>> For example, when/if we add this user_ctx to BPF iterator programs,
>>>>> having __u64 user_ctx in the same anonymous union will make it overlap
>>>>> with iter_info, which is a problem. So I want to have a link
>>>>> type-specific sections in LINK_CREATE command section, to allow the
>>>>> same field name at different locations.
>>>>>
>>>>> I actually think that we should put iter_info/iter_info_len into a
>>>>> named field, like this (also added user_ctx for bpf_iter link as a
>>>>> demonstration):
>>>>>
>>>>> struct {
>>>>>        __aligned_u64 info;
>>>>>        __u32         info_len;
>>>>>        __aligned_u64 user_ctx;  /* see how it's at a different offset
>>>>> than perf_event.user_ctx */
>>>>> } iter;
>>>>> struct {
>>>>>        __u64         user_ctx;
>>>>> } perf_event;
>>>>>
>>>>> (of course keeping already existing fields in anonymous struct for
>>>>> backwards compatibility)
>>>>
>>>> Okay, then since user_ctx may be used by many link types. How
>>>> about just with the field "user_ctx" without struct perf_event.
>>>
>>> I'd love to do it because it is indeed generic and common field, like
>>> target_fd. But I'm not sure what you are proposing below. Where
>>> exactly that user_ctx (now called bpf_cookie) goes in your example? I
>>> see few possible options that allow preserving ABI backwards
>>> compatibility. Let's see if you and everyone else likes any of those
>>> better. I'll use the full LINK_CREATE sub-struct definition from
>>> bpf_attr to make it clear. And to demonstrate how this can be extended
>>> to bpf_iter in the future, please note this part as this is an
>>> important aspect.
>>>
>>> 1. Full backwards compatibility and per-link type sections (my current
>>> approach):
>>>
>>>           struct { /* struct used by BPF_LINK_CREATE command */
>>>                   __u32           prog_fd;
>>>                   union {
>>>                           __u32           target_fd;
>>>                           __u32           target_ifindex;
>>>                   };
>>>                   __u32           attach_type;
>>>                   __u32           flags;
>>>                   union {
>>>                           __u32           target_btf_id;
>>>                           struct {
>>>                                   __aligned_u64   iter_info;
>>>                                   __u32           iter_info_len;
>>>                           };
>>>                           struct {
>>>                                   __u64           bpf_cookie;
>>>                           } perf_event;
>>>                           struct {
>>>                                   __aligned_u64   info;
>>>                                   __u32           info_len;
>>>                                   __aligned_u64   bpf_cookie;
>>>                           } iter;
>>>                  };
>>>           } link_create;
>>>
>>> The good property here is that we can keep easily extending link
>>> type-specific sections with extra fields where needed. For common
>>> stuff like bpf_cookie it's suboptimal because we'll need to duplicate
>>> field definition in each struct inside that union, but I think that's
>>> fine. From end-user point of view, they will know which type of link
>>> they are creating, so the use will be straightforward. This is why I
>>> went with this approach. But let's consider alternatives.
>>>
>>> 2. Non-backwards compatible layout but extra flag to specify that new
>>> field layout is used.
>>>
>>>           struct { /* struct used by BPF_LINK_CREATE command */
>>>                   __u32           prog_fd;
>>>                   union {
>>>                           __u32           target_fd;
>>>                           __u32           target_ifindex;
>>>                   };
>>>                   __u32           attach_type;
>>>                   __u32           flags; /* this will start supporting
>>> some new flag like BPF_F_LINK_CREATE_NEW */
>>>                   __u64           bpf_cookie; /* common field now */
>>>                   union { /* this parts is effectively deprecated now */
>>>                           __u32           target_btf_id;
>>>                           struct {
>>>                                   __aligned_u64   iter_info;
>>>                                   __u32           iter_info_len;
>>>                           };
>>>                           struct { /* this is new layout, but needs
>>> BPF_F_LINK_CREATE_NEW, at least for ext/ and bpf_iter/ programs */
>>>                               __u64       bpf_cookie;
>>>                               union {
>>>                                   struct {
>>>                                       __u32     target_btf_id;
>>>                                   } ext;
>>>                                   struct {
>>>                                       __aligned_u64 info;
>>>                                       __u32         info_len;
>>>                                   } iter;
>>>                               }
>>>                           }
>>>                   };
>>>           } link_create;
>>>
>>> This makes bpf_cookie a common field, but at least for EXT (freplace/)
>>> and ITER (bpf_iter/) links we need to specify extra flag to specify
>>> that we are not using iter_info/iter_info_len/target_btf_id. bpf_iter
>>> then will use iter.info and iter.info_len, and can use plain
>>> bpf_cookie.
>>>
>>> IMO, this is way too confusing and a maintainability nightmare.
>>>
>>> I'm trying to guess what you are proposing, I can read it two ways,
>>> but let me know if I missed something.
>>>
>>> 3. Just add bpf_cookie field before link type-specific section.
>>>
>>>           struct { /* struct used by BPF_LINK_CREATE command */
>>>                   __u32           prog_fd;
>>>                   union {
>>>                           __u32           target_fd;
>>>                           __u32           target_ifindex;
>>>                   };
>>>                   __u32           attach_type;
>>>                   __u32           flags;
>>>                   __u64           bpf_cookie;  // <<<<<<<<<< HERE
>>>                   union {
>>>                           __u32           target_btf_id;
>>>                           struct {
>>>                                   __aligned_u64   iter_info;
>>>                                   __u32           iter_info_len;
>>>                           };
>>>                   };
>>>           } link_create;
>>>
>>> This looks really nice and would be great, but that changes offsets
>>> for target_btf_id/iter_info/iter_info_len, so a no go. The only way to
>>> rectify this is what proposal #2 above does with an extra flag.
>>>
>>> 4. Add bpf_cookie after link-type specific part:
>>>
>>>           struct { /* struct used by BPF_LINK_CREATE command */
>>>                   __u32           prog_fd;
>>>                   union {
>>>                           __u32           target_fd;
>>>                           __u32           target_ifindex;
>>>                   };
>>>                   __u32           attach_type;
>>>                   __u32           flags;
>>>                   union {
>>>                           __u32           target_btf_id;
>>>                           struct {
>>>                                   __aligned_u64   iter_info;
>>>                                   __u32           iter_info_len;
>>>                           };
>>>                           struct {
>>>                   };
>>>                   __u64           bpf_cookie; // <<<<<<<<<<<<<<<<<< HERE
>>>           } link_create;
>>>
>>> This could work. But we are wasting 16 bytes currently used for
>>> target_btf_id/iter_info/iter_info_len. If we later need to do
>>> something link type-specific, we can add it to the existing union if
>>> we need <= 16 bytes, otherwise we'll need to start another union after
>>> bpf_cookie, splitting this into two link type-specific sections.
>>>
>>> Overall, this might work, especially assuming we won't need to extend
>>> iter-specific portions. But I really hate that we didn't do named
>>> structs inside that union (i.e., ext.target_btf_id and
>>> iter.info/iter.info_len) and I'd like to rectify that in the follow up
>>> patches with named structs duplicating existing field layout, but with
>>> proper naming. But splitting this LINK_CREATE bpf_attr part into two
>>> unions would make it hard and awkward in the future.
>>>
>>> So, thoughts? Did you have something else in mind that I missed?
>>
>> What I proposed is your option 4. Yes, in the future if there is there
>> are something we want to add to bpf iter, we can add to iter_info, so
>> it should not be an issue. Any other new link_type may utilized the same
>> union with
>>      struct {
>>         __aligned_u64  new_type_info;
>>         __u32          new_type_info_len;
>>      };
>> and this will put extensibility into new_type_info.
>> I know this may be a little bit hassle but it should work.
>>
> 
> I see what you mean. With this extra pointer we shouldn't need more
> than 16 bytes per link type. That's unnecessary complication for a lot
> of simpler types of links, unfortunately, though definitely an option.
> 
> We could have also done approach #4 but maybe leave 16-32 bytes before
> bpf_cookie for the union, so that it's much less likely that we'll run
> out of space there. Not very clean either, so I don't know.
> 
> I'll keep it here for discussion for now, let's see if anyone has
> strong preferences and opinions.
> 
>> Your option 1 should work too, which is what I proposed in the beginning
>> to put into the union and we can feel free to add bpf_cookie for each
>> individual link type. This is actually cleaner.
> 
> Oh, you did? I must have misunderstood then. If you like approach #1,
> then it's what I'm doing right now, so let's keep it as is and let's
> see if anyone else has preferences.

Just checked old emails. It is actually my misunderstanding.
I probably mismatched "{" and "}" and thought you had placed
bpf_cookie outside the union, which is why I made the suggestion.
So never mind, we are on the same page :-)

> 
>>
>>>
>>>
>>>> Sometime like
>>>>
>>>> __u64   user_ctx;
>>>>
>>>> instead of
>>>>
>>>> struct {
>>>>           __u64   user_ctx;
>>>> } perf_event;
>>>>
>>>>>
>>>>> I decided to not do that in this patch set, though, to not distract
>>>>> from the main goal. But I think we should avoid this shared field
>>>>> "namespace" across different link types going forward.
>>>>>
>>>>>
>>>>>>>                  };
>>>>>>>          } link_create;
>>>>>>>
>>>>>> [...]


end of thread

Thread overview: 43+ messages
2021-07-26 16:11 [PATCH v2 bpf-next 00/14] BPF perf link and user-provided context value Andrii Nakryiko
2021-07-26 16:11 ` [PATCH v2 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function Andrii Nakryiko
2021-07-29 16:49   ` Yonghong Song
2021-07-30  4:05     ` Andrii Nakryiko
2021-07-26 16:11 ` [PATCH v2 bpf-next 02/14] bpf: refactor BPF_PROG_RUN_ARRAY family of macros into functions Andrii Nakryiko
2021-07-29 17:04   ` Yonghong Song
2021-07-26 16:12 ` [PATCH v2 bpf-next 03/14] bpf: refactor perf_event_set_bpf_prog() to use struct bpf_prog input Andrii Nakryiko
2021-07-27  8:48   ` Peter Zijlstra
2021-07-29 17:09   ` Yonghong Song
2021-07-26 16:12 ` [PATCH v2 bpf-next 04/14] bpf: implement minimal BPF perf link Andrii Nakryiko
2021-07-27  9:04   ` Peter Zijlstra
2021-07-30  4:23     ` Andrii Nakryiko
2021-07-27  9:12   ` Peter Zijlstra
2021-07-27 20:56     ` Andrii Nakryiko
2021-07-27 15:40   ` Jiri Olsa
2021-07-27 20:56     ` Andrii Nakryiko
2021-07-29 17:35   ` Yonghong Song
2021-07-30  4:16     ` Andrii Nakryiko
2021-07-30  5:42       ` Yonghong Song
2021-07-26 16:12 ` [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links Andrii Nakryiko
2021-07-27  9:11   ` Peter Zijlstra
2021-07-27 21:09     ` Andrii Nakryiko
2021-07-28  8:58       ` Peter Zijlstra
2021-07-29 18:00   ` Yonghong Song
2021-07-30  4:31     ` Andrii Nakryiko
2021-07-30  5:49       ` Yonghong Song
2021-07-30 17:48         ` Andrii Nakryiko
2021-07-30 21:34           ` Yonghong Song
2021-07-30 22:06             ` Andrii Nakryiko
2021-07-30 22:28               ` Yonghong Song
2021-07-26 16:12 ` [PATCH v2 bpf-next 06/14] bpf: add bpf_get_user_ctx() BPF helper to access user_ctx value Andrii Nakryiko
2021-07-29 18:17   ` Yonghong Song
2021-07-30  4:49     ` Andrii Nakryiko
2021-07-30  5:53       ` Yonghong Song
2021-07-26 16:12 ` [PATCH v2 bpf-next 07/14] libbpf: re-build libbpf.so when libbpf.map changes Andrii Nakryiko
2021-07-26 16:12 ` [PATCH v2 bpf-next 08/14] libbpf: remove unused bpf_link's destroy operation, but add dealloc Andrii Nakryiko
2021-07-26 16:12 ` [PATCH v2 bpf-next 09/14] libbpf: use BPF perf link when supported by kernel Andrii Nakryiko
2021-07-26 16:12 ` [PATCH v2 bpf-next 10/14] libbpf: add user_ctx support to bpf_link_create() API Andrii Nakryiko
2021-07-26 16:12 ` [PATCH v2 bpf-next 11/14] libbpf: add user_ctx to perf_event, kprobe, uprobe, and tp attach APIs Andrii Nakryiko
2021-07-30  1:11   ` Rafael David Tinoco
2021-07-26 16:12 ` [PATCH v2 bpf-next 12/14] selftests/bpf: test low-level perf BPF link API Andrii Nakryiko
2021-07-26 16:12 ` [PATCH v2 bpf-next 13/14] selftests/bpf: extract uprobe-related helpers into trace_helpers.{c,h} Andrii Nakryiko
2021-07-26 16:12 ` [PATCH v2 bpf-next 14/14] selftests/bpf: add user_ctx selftests for high-level APIs Andrii Nakryiko
