* [PATCH v3 bpf-next 00/14] BPF perf link and user-provided bpf_cookie
@ 2021-07-30  5:33 Andrii Nakryiko
  2021-07-30  5:34 ` [PATCH v3 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function Andrii Nakryiko
                   ` (14 more replies)
  0 siblings, 15 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  5:33 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

This patch set implements the ability for users to specify a custom black box
u64 value, bpf_cookie, for each BPF program attachment, which is available to
the BPF program at runtime. This feature is critically missing for cases when
some sort of generic processing needs to be done by common BPF program logic
(or even exactly the same BPF program) across multiple BPF hooks (e.g., many
uniformly handled kprobes) and it's important to be able to distinguish
between the BPF hooks at runtime (e.g., for an additional configuration lookup).

Currently, something like this can only be achieved through:
  - code generation and BPF program cloning, which is very complicated and
    unmaintainable;
  - on-the-fly C code generation and runtime compilation, which is what BCC
    uses and which makes this pretty simple for the user. The big downsides
    are a very heavy-weight Clang/LLVM dependency and inefficient memory
    usage (due to many BPF program clones and the compilation process itself);
  - in some cases (kprobes and sometimes uprobes) it's possible to do a
    function IP lookup to get function-specific configuration. This doesn't
    work for all cases (e.g., when attaching uprobes to shared libraries) and
    has higher runtime overhead and additional programming complexity due to
    BPF_MAP_TYPE_HASH lookups. Up until recently, before the bpf_get_func_ip()
    BPF helper was added, it was also very complicated and unstable (API-wise)
    to get the traced function's IP from fentry/fexit and kretprobe.
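The IP-lookup workaround in the last bullet can be sketched in plain
userspace C; the config struct, addresses, and the linear scan below are
hypothetical stand-ins for a BPF program's per-function hashmap lookup:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-function config, keyed by traced function IP. */
struct func_cfg {
	uint64_t ip;
	int verbosity;
};

/* Stand-in for a BPF_MAP_TYPE_HASH populated from userspace. */
static const struct func_cfg cfg_map[] = {
	{ 0xffffffff81000100ULL, 1 },
	{ 0xffffffff81000200ULL, 2 },
};

/* A linear scan stands in for the hashmap lookup a real BPF program would
 * issue; either way, every program invocation pays a lookup on the hot path. */
static const struct func_cfg *lookup_cfg(uint64_t func_ip)
{
	for (size_t i = 0; i < sizeof(cfg_map) / sizeof(cfg_map[0]); i++)
		if (cfg_map[i].ip == func_ip)
			return &cfg_map[i];
	return NULL;
}
```

A per-attachment bpf_cookie removes both the map maintenance and the
per-invocation lookup that this pattern requires.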

With libbpf and BPF CO-RE, runtime compilation is not an option, so to be able
to build generic tracing tooling simply and efficiently, the ability to provide
an additional bpf_cookie value for each *attachment* (as opposed to each BPF
program) is extremely important. Two immediate users of this functionality are
going to be a libbpf-based USDT library (currently in development) and retsnoop
([0]), but I'm sure more applications will come once users get this feature in
their kernels.

To achieve the above, all perf_event-based BPF hooks are made available
through a new BPF_LINK_TYPE_PERF_EVENT BPF link, which allows using the common
LINK_CREATE command for program attachment and generally brings
perf_event-based attachments into the common BPF link infrastructure.

With that, LINK_CREATE gets the ability to pass through a bpf_cookie value at
link creation (BPF program attachment) time. A bpf_get_attach_cookie() BPF
helper is added to allow fetching this value at runtime from the BPF program
side. The BPF cookie is either stored on struct perf_event itself and fetched
from the BPF program context, or is passed through the ambient BPF run
context, added in c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored
in current").
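The runtime side of this can be sketched in plain C, with stubs standing in
for the bpf_get_attach_cookie() helper and the ambient run context; all stub
bodies and cookie values below are illustrative assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the ambient BPF run context stored on `current`
 * (c7603cfa04e7); the kernel stashes the per-attachment cookie there
 * before invoking the program. */
struct run_ctx {
	uint64_t bpf_cookie;
};

static struct run_ctx *cur_bpf_ctx;

/* Stub for the bpf_get_attach_cookie() helper; the real helper takes the
 * BPF program's context pointer. */
static uint64_t bpf_get_attach_cookie(void *ctx)
{
	(void)ctx;
	return cur_bpf_ctx ? cur_bpf_ctx->bpf_cookie : 0;
}

/* One shared program body, specialized per attachment by its cookie. */
static int shared_prog(void *ctx)
{
	switch (bpf_get_attach_cookie(ctx)) {
	case 1: return 100;	/* e.g., the attachment to kprobe A */
	case 2: return 200;	/* e.g., the attachment to kprobe B */
	default: return 0;
	}
}
```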

On the libbpf side of things, the BPF perf link is utilized, whenever it is
supported by the kernel, instead of the PERF_EVENT_IOC_SET_BPF ioctl on the
perf_event FD. All the tracing attach APIs are extended with OPTS, and
bpf_cookie is passed through the corresponding opts structs.
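The shape of such an OPTS-based attach call can be sketched with stub types;
the sz/bpf_cookie fields mirror the series, but the names and bodies below
are illustrative assumptions, not libbpf's actual implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stub opts struct; sz mimics libbpf's size-based backward/forward
 * compatibility scheme for OPTS structs. */
struct kprobe_opts {
	size_t sz;
	uint64_t bpf_cookie;
	int retprobe;
};

struct link {
	uint64_t cookie;
};

/* Stub attach: the real libbpf would issue LINK_CREATE when the kernel
 * supports BPF perf links (falling back to PERF_EVENT_IOC_SET_BPF on older
 * kernels); here we just record the cookie the caller passed. */
static struct link attach_kprobe(const char *func, const struct kprobe_opts *opts)
{
	struct link l = { .cookie = opts ? opts->bpf_cookie : 0 };

	(void)func;
	return l;
}
```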

The last part of the patch set adds a few self-tests utilizing the new APIs.

There are also a few refactorings along the way to make things cleaner and
easier to work with, both in the kernel (BPF_PROG_RUN and BPF_PROG_RUN_ARRAY)
and throughout libbpf and the selftests.

Follow-up patches will extend bpf_cookie to fentry/fexit programs.

  [0] https://github.com/anakryiko/retsnoop

Cc: Peter Zijlstra <peterz@infradead.org> # for perf_event changes

v2->v3:
  - user_ctx -> bpf_cookie, bpf_get_user_ctx -> bpf_get_attach_cookie (Peter);
  - fix BPF_LINK_TYPE_PERF_EVENT value (Jiri);
  - use bpf_prog_run() from bpf_prog_run_pin_on_cpu() (Yonghong);

v1->v2:
  - fix build failures on non-x86 arches by gating on CONFIG_PERF_EVENTS.

Andrii Nakryiko (14):
  bpf: refactor BPF_PROG_RUN into a function
  bpf: refactor BPF_PROG_RUN_ARRAY family of macros into functions
  bpf: refactor perf_event_set_bpf_prog() to use struct bpf_prog input
  bpf: implement minimal BPF perf link
  bpf: allow to specify user-provided bpf_cookie for BPF perf links
  bpf: add bpf_get_attach_cookie() BPF helper to access bpf_cookie value
  libbpf: re-build libbpf.so when libbpf.map changes
  libbpf: remove unused bpf_link's destroy operation, but add dealloc
  libbpf: use BPF perf link when supported by kernel
  libbpf: add bpf_cookie support to bpf_link_create() API
  libbpf: add bpf_cookie to perf_event, kprobe, uprobe, and tp attach
    APIs
  selftests/bpf: test low-level perf BPF link API
  selftests/bpf: extract uprobe-related helpers into trace_helpers.{c,h}
  selftests/bpf: add bpf_cookie selftests for high-level APIs

 drivers/media/rc/bpf-lirc.c                   |   4 +-
 include/linux/bpf.h                           | 206 ++++++++------
 include/linux/bpf_types.h                     |   3 +
 include/linux/filter.h                        |  63 +++--
 include/linux/perf_event.h                    |   1 +
 include/linux/trace_events.h                  |   7 +-
 include/uapi/linux/bpf.h                      |  25 ++
 kernel/bpf/cgroup.c                           |  32 +--
 kernel/bpf/core.c                             |  29 +-
 kernel/bpf/syscall.c                          | 105 +++++++-
 kernel/events/core.c                          |  72 ++---
 kernel/trace/bpf_trace.c                      |  45 +++-
 tools/include/uapi/linux/bpf.h                |  25 ++
 tools/lib/bpf/Makefile                        |  10 +-
 tools/lib/bpf/bpf.c                           |  32 ++-
 tools/lib/bpf/bpf.h                           |   8 +-
 tools/lib/bpf/libbpf.c                        | 196 +++++++++++---
 tools/lib/bpf/libbpf.h                        |  71 ++++-
 tools/lib/bpf/libbpf.map                      |   3 +
 tools/lib/bpf/libbpf_internal.h               |  32 ++-
 .../selftests/bpf/prog_tests/attach_probe.c   |  61 +----
 .../selftests/bpf/prog_tests/bpf_cookie.c     | 254 ++++++++++++++++++
 .../selftests/bpf/prog_tests/perf_link.c      |  89 ++++++
 .../selftests/bpf/progs/test_bpf_cookie.c     |  85 ++++++
 .../selftests/bpf/progs/test_perf_link.c      |  16 ++
 tools/testing/selftests/bpf/trace_helpers.c   |  66 +++++
 tools/testing/selftests/bpf/trace_helpers.h   |   3 +
 27 files changed, 1230 insertions(+), 313 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_cookie.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_link.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_bpf_cookie.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_perf_link.c

-- 
2.30.2


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v3 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function
  2021-07-30  5:33 [PATCH v3 bpf-next 00/14] BPF perf link and user-provided bpf_cookie Andrii Nakryiko
@ 2021-07-30  5:34 ` Andrii Nakryiko
  2021-07-30 17:19   ` Yonghong Song
  2021-08-09 22:43   ` Daniel Borkmann
  2021-07-30  5:34 ` [PATCH v3 bpf-next 02/14] bpf: refactor BPF_PROG_RUN_ARRAY family of macros into functions Andrii Nakryiko
                   ` (13 subsequent siblings)
  14 siblings, 2 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  5:34 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Turn BPF_PROG_RUN into a proper always-inlined function. No functional or
performance changes are intended, but it makes it much easier to understand
how BPF programs actually get executed. It's more obvious what types and
callbacks are expected. Also, the extra () around input parameters can be
dropped, as well as the `__` variable prefixes intended to avoid naming
collisions, which makes the code simpler to read and write.

This refactoring also highlighted one possible issue. BPF_PROG_RUN is both
a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning
BPF_PROG_RUN into a function causes a naming conflict compilation error. So
rename BPF_PROG_RUN to the lower-case bpf_prog_run(), similar to
bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. To avoid unnecessary code
churn across many networking calls to BPF_PROG_RUN, #define BPF_PROG_RUN as an
alias to bpf_prog_run.
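Stripped of kernel specifics, the macro-to-function refactor pattern can be
reproduced in plain C; the prog layout below is a simplified stand-in, not the
kernel's struct bpf_prog:

```c
#include <assert.h>
#include <stddef.h>

struct insn {
	int op;
};

struct prog {
	struct insn *insnsi;
	unsigned int (*bpf_func)(const void *ctx, const struct insn *insnsi);
};

/* An explicit callback type replaces the macro's untyped `dfunc` argument,
 * making the expected signature obvious at the call site. */
typedef unsigned int (*dispatcher_fn)(const void *ctx,
				      const struct insn *insnsi,
				      unsigned int (*bpf_func)(const void *,
							       const struct insn *));

static unsigned int nop_dispatcher(const void *ctx, const struct insn *insnsi,
				   unsigned int (*bpf_func)(const void *,
							    const struct insn *))
{
	return bpf_func(ctx, insnsi);
}

static inline unsigned int run_prog(const struct prog *prog, const void *ctx,
				    dispatcher_fn dfunc)
{
	/* No `__`-prefixed locals needed: function scope already prevents
	 * the name collisions the macro had to defend against. */
	return dfunc(ctx, prog->insnsi, prog->bpf_func);
}

/* Example program body: returns the ctx value plus one. */
static unsigned int add_one(const void *ctx, const struct insn *insnsi)
{
	(void)insnsi;
	return *(const unsigned int *)ctx + 1;
}
```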

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/filter.h | 58 +++++++++++++++++++++++++++---------------
 1 file changed, 37 insertions(+), 21 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index ba36989f711a..18518e321ce4 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -585,25 +585,41 @@ struct sk_filter {
 
 DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
 
-#define __BPF_PROG_RUN(prog, ctx, dfunc)	({			\
-	u32 __ret;							\
-	cant_migrate();							\
-	if (static_branch_unlikely(&bpf_stats_enabled_key)) {		\
-		struct bpf_prog_stats *__stats;				\
-		u64 __start = sched_clock();				\
-		__ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func);	\
-		__stats = this_cpu_ptr(prog->stats);			\
-		u64_stats_update_begin(&__stats->syncp);		\
-		__stats->cnt++;						\
-		__stats->nsecs += sched_clock() - __start;		\
-		u64_stats_update_end(&__stats->syncp);			\
-	} else {							\
-		__ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func);	\
-	}								\
-	__ret; })
-
-#define BPF_PROG_RUN(prog, ctx)						\
-	__BPF_PROG_RUN(prog, ctx, bpf_dispatcher_nop_func)
+typedef unsigned int (*bpf_dispatcher_fn)(const void *ctx,
+					  const struct bpf_insn *insnsi,
+					  unsigned int (*bpf_func)(const void *,
+								   const struct bpf_insn *));
+
+static __always_inline u32 __bpf_prog_run(const struct bpf_prog *prog,
+					  const void *ctx,
+					  bpf_dispatcher_fn dfunc)
+{
+	u32 ret;
+
+	cant_migrate();
+	if (static_branch_unlikely(&bpf_stats_enabled_key)) {
+		struct bpf_prog_stats *stats;
+		u64 start = sched_clock();
+
+		ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
+		stats = this_cpu_ptr(prog->stats);
+		u64_stats_update_begin(&stats->syncp);
+		stats->cnt++;
+		stats->nsecs += sched_clock() - start;
+		u64_stats_update_end(&stats->syncp);
+	} else {
+		ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
+	}
+	return ret;
+}
+
+static __always_inline u32 bpf_prog_run(const struct bpf_prog *prog, const void *ctx)
+{
+	return __bpf_prog_run(prog, ctx, bpf_dispatcher_nop_func);
+}
+
+/* avoids name conflict with BPF_PROG_RUN enum defined in uapi/linux/bpf.h */
+#define BPF_PROG_RUN bpf_prog_run
 
 /*
  * Use in preemptible and therefore migratable context to make sure that
@@ -622,7 +638,7 @@ static inline u32 bpf_prog_run_pin_on_cpu(const struct bpf_prog *prog,
 	u32 ret;
 
 	migrate_disable();
-	ret = __BPF_PROG_RUN(prog, ctx, bpf_dispatcher_nop_func);
+	ret = bpf_prog_run(prog, ctx);
 	migrate_enable();
 	return ret;
 }
@@ -768,7 +784,7 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 	 * under local_bh_disable(), which provides the needed RCU protection
 	 * for accessing map entries.
 	 */
-	return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+	return __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
 }
 
 void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
-- 
2.30.2



* [PATCH v3 bpf-next 02/14] bpf: refactor BPF_PROG_RUN_ARRAY family of macros into functions
  2021-07-30  5:33 [PATCH v3 bpf-next 00/14] BPF perf link and user-provided bpf_cookie Andrii Nakryiko
  2021-07-30  5:34 ` [PATCH v3 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function Andrii Nakryiko
@ 2021-07-30  5:34 ` Andrii Nakryiko
  2021-08-09 23:00   ` Daniel Borkmann
  2021-07-30  5:34 ` [PATCH v3 bpf-next 03/14] bpf: refactor perf_event_set_bpf_prog() to use struct bpf_prog input Andrii Nakryiko
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  5:34 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra, Yonghong Song

Similar to BPF_PROG_RUN, turn the BPF_PROG_RUN_ARRAY macros into proper
functions with all the same readability and maintainability benefits. Making
them into functions required shuffling around the
bpf_set_run_ctx/bpf_reset_run_ctx functions. Also, explicitly specifying the
type of the BPF prog run callback required adjusting __bpf_prog_run_save_cb()
to accept const void *, cast internally to const struct sk_buff *.

Further, split out a cgroup-specific BPF_PROG_RUN_ARRAY_CG and
BPF_PROG_RUN_ARRAY_CG_FLAGS from the more generic BPF_PROG_RUN_ARRAY due to
the differences in bpf_run_ctx used for those two different use cases.

I think BPF_PROG_RUN_ARRAY_CG would benefit from further refactoring to accept
struct cgroup and enum bpf_attach_type instead of bpf_prog_array, fetching
cgrp->bpf.effective[type] and RCU-dereferencing it internally. But that would
require including include/linux/cgroup-defs.h, which I wasn't sure would be ok
with everyone.

The remaining generic BPF_PROG_RUN_ARRAY function will be extended to pass
through a user-provided context value in the next patch.
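The shared return-value convention of these runners (AND the low bit of each
program's return into the verdict and, in the _FLAGS variant, OR the remaining
bits into *ret_flags) can be sketched with a stub prog array:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*run_fn)(const void *ctx);

/* NULL-terminated stand-in for struct bpf_prog_array; the RCU dereference,
 * migrate_disable(), and run-ctx setup of the real runners are omitted. */
static uint32_t run_array_flags(run_fn *progs, const void *ctx,
				uint32_t *ret_flags)
{
	uint32_t ret = 1, func_ret;

	for (size_t i = 0; progs[i]; i++) {
		func_ret = progs[i](ctx);
		ret &= func_ret & 1;		/* verdict: all progs must allow */
		*ret_flags |= func_ret >> 1;	/* remaining bits become flags */
	}
	return ret;
}

/* Example programs: allow, with and without an extra flag bit set. */
static uint32_t allow_with_flag(const void *ctx) { (void)ctx; return 1 | (1 << 1); }
static uint32_t plain_allow(const void *ctx)     { (void)ctx; return 1; }
```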

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h      | 187 +++++++++++++++++++++++----------------
 include/linux/filter.h   |   5 +-
 kernel/bpf/cgroup.c      |  32 +++----
 kernel/trace/bpf_trace.c |   2 +-
 4 files changed, 132 insertions(+), 94 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c8cc09013210..9c44b56b698f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1146,67 +1146,124 @@ struct bpf_run_ctx {};
 
 struct bpf_cg_run_ctx {
 	struct bpf_run_ctx run_ctx;
-	struct bpf_prog_array_item *prog_item;
+	const struct bpf_prog_array_item *prog_item;
 };
 
+#ifdef CONFIG_BPF_SYSCALL
+static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
+{
+	struct bpf_run_ctx *old_ctx;
+
+	old_ctx = current->bpf_ctx;
+	current->bpf_ctx = new_ctx;
+	return old_ctx;
+}
+
+static inline void bpf_reset_run_ctx(struct bpf_run_ctx *old_ctx)
+{
+	current->bpf_ctx = old_ctx;
+}
+#else /* CONFIG_BPF_SYSCALL */
+static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
+{
+	return NULL;
+}
+
+static inline void bpf_reset_run_ctx(struct bpf_run_ctx *old_ctx)
+{
+}
+#endif /* CONFIG_BPF_SYSCALL */
+
+
 /* BPF program asks to bypass CAP_NET_BIND_SERVICE in bind. */
 #define BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE			(1 << 0)
 /* BPF program asks to set CN on the packet. */
 #define BPF_RET_SET_CN						(1 << 0)
 
-#define BPF_PROG_RUN_ARRAY_FLAGS(array, ctx, func, ret_flags)		\
-	({								\
-		struct bpf_prog_array_item *_item;			\
-		struct bpf_prog *_prog;					\
-		struct bpf_prog_array *_array;				\
-		struct bpf_run_ctx *old_run_ctx;			\
-		struct bpf_cg_run_ctx run_ctx;				\
-		u32 _ret = 1;						\
-		u32 func_ret;						\
-		migrate_disable();					\
-		rcu_read_lock();					\
-		_array = rcu_dereference(array);			\
-		_item = &_array->items[0];				\
-		old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);	\
-		while ((_prog = READ_ONCE(_item->prog))) {		\
-			run_ctx.prog_item = _item;			\
-			func_ret = func(_prog, ctx);			\
-			_ret &= (func_ret & 1);				\
-			*(ret_flags) |= (func_ret >> 1);		\
-			_item++;					\
-		}							\
-		bpf_reset_run_ctx(old_run_ctx);				\
-		rcu_read_unlock();					\
-		migrate_enable();					\
-		_ret;							\
-	 })
-
-#define __BPF_PROG_RUN_ARRAY(array, ctx, func, check_non_null, set_cg_storage)	\
-	({						\
-		struct bpf_prog_array_item *_item;	\
-		struct bpf_prog *_prog;			\
-		struct bpf_prog_array *_array;		\
-		struct bpf_run_ctx *old_run_ctx;	\
-		struct bpf_cg_run_ctx run_ctx;		\
-		u32 _ret = 1;				\
-		migrate_disable();			\
-		rcu_read_lock();			\
-		_array = rcu_dereference(array);	\
-		if (unlikely(check_non_null && !_array))\
-			goto _out;			\
-		_item = &_array->items[0];		\
-		old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);\
-		while ((_prog = READ_ONCE(_item->prog))) {	\
-			run_ctx.prog_item = _item;	\
-			_ret &= func(_prog, ctx);	\
-			_item++;			\
-		}					\
-		bpf_reset_run_ctx(old_run_ctx);		\
-_out:							\
-		rcu_read_unlock();			\
-		migrate_enable();			\
-		_ret;					\
-	 })
+typedef u32 (*bpf_prog_run_fn)(const struct bpf_prog *prog, const void *ctx);
+
+static __always_inline u32
+BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu,
+			    const void *ctx, bpf_prog_run_fn run_prog,
+			    u32 *ret_flags)
+{
+	const struct bpf_prog_array_item *item;
+	const struct bpf_prog *prog;
+	const struct bpf_prog_array *array;
+	struct bpf_run_ctx *old_run_ctx;
+	struct bpf_cg_run_ctx run_ctx;
+	u32 ret = 1;
+	u32 func_ret;
+
+	migrate_disable();
+	rcu_read_lock();
+	array = rcu_dereference(array_rcu);
+	item = &array->items[0];
+	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
+	while ((prog = READ_ONCE(item->prog))) {
+		run_ctx.prog_item = item;
+		func_ret = run_prog(prog, ctx);
+		ret &= (func_ret & 1);
+		*(ret_flags) |= (func_ret >> 1);
+		item++;
+	}
+	bpf_reset_run_ctx(old_run_ctx);
+	rcu_read_unlock();
+	migrate_enable();
+	return ret;
+}
+
+static __always_inline u32
+BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu,
+		      const void *ctx, bpf_prog_run_fn run_prog)
+{
+	const struct bpf_prog_array_item *item;
+	const struct bpf_prog *prog;
+	const struct bpf_prog_array *array;
+	struct bpf_run_ctx *old_run_ctx;
+	struct bpf_cg_run_ctx run_ctx;
+	u32 ret = 1;
+
+	migrate_disable();
+	rcu_read_lock();
+	array = rcu_dereference(array_rcu);
+	item = &array->items[0];
+	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
+	while ((prog = READ_ONCE(item->prog))) {
+		run_ctx.prog_item = item;
+		ret &= run_prog(prog, ctx);
+		item++;
+	}
+	bpf_reset_run_ctx(old_run_ctx);
+	rcu_read_unlock();
+	migrate_enable();
+	return ret;
+}
+
+static __always_inline u32
+BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
+		   const void *ctx, bpf_prog_run_fn run_prog)
+{
+	const struct bpf_prog_array_item *item;
+	const struct bpf_prog *prog;
+	const struct bpf_prog_array *array;
+	u32 ret = 1;
+
+	migrate_disable();
+	rcu_read_lock();
+	array = rcu_dereference(array_rcu);
+	if (unlikely(!array))
+		goto out;
+	item = &array->items[0];
+	while ((prog = READ_ONCE(item->prog))) {
+		ret &= run_prog(prog, ctx);
+		item++;
+	}
+out:
+	rcu_read_unlock();
+	migrate_enable();
+	return ret;
+}
 
 /* To be used by __cgroup_bpf_run_filter_skb for EGRESS BPF progs
  * so BPF programs can request cwr for TCP packets.
@@ -1235,7 +1292,7 @@ _out:							\
 		u32 _flags = 0;				\
 		bool _cn;				\
 		u32 _ret;				\
-		_ret = BPF_PROG_RUN_ARRAY_FLAGS(array, ctx, func, &_flags); \
+		_ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, &_flags); \
 		_cn = _flags & BPF_RET_SET_CN;		\
 		if (_ret)				\
 			_ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS);	\
@@ -1244,12 +1301,6 @@ _out:							\
 		_ret;					\
 	})
 
-#define BPF_PROG_RUN_ARRAY(array, ctx, func)		\
-	__BPF_PROG_RUN_ARRAY(array, ctx, func, false, true)
-
-#define BPF_PROG_RUN_ARRAY_CHECK(array, ctx, func)	\
-	__BPF_PROG_RUN_ARRAY(array, ctx, func, true, false)
-
 #ifdef CONFIG_BPF_SYSCALL
 DECLARE_PER_CPU(int, bpf_prog_active);
 extern struct mutex bpf_stats_enabled_mutex;
@@ -1284,20 +1335,6 @@ static inline void bpf_enable_instrumentation(void)
 	migrate_enable();
 }
 
-static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
-{
-	struct bpf_run_ctx *old_ctx;
-
-	old_ctx = current->bpf_ctx;
-	current->bpf_ctx = new_ctx;
-	return old_ctx;
-}
-
-static inline void bpf_reset_run_ctx(struct bpf_run_ctx *old_ctx)
-{
-	current->bpf_ctx = old_ctx;
-}
-
 extern const struct file_operations bpf_map_fops;
 extern const struct file_operations bpf_prog_fops;
 extern const struct file_operations bpf_iter_fops;
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 18518e321ce4..830b98999a68 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -711,7 +711,7 @@ static inline void bpf_restore_data_end(
 	cb->data_end = saved_data_end;
 }
 
-static inline u8 *bpf_skb_cb(struct sk_buff *skb)
+static inline u8 *bpf_skb_cb(const struct sk_buff *skb)
 {
 	/* eBPF programs may read/write skb->cb[] area to transfer meta
 	 * data between tail calls. Since this also needs to work with
@@ -732,8 +732,9 @@ static inline u8 *bpf_skb_cb(struct sk_buff *skb)
 
 /* Must be invoked with migration disabled */
 static inline u32 __bpf_prog_run_save_cb(const struct bpf_prog *prog,
-					 struct sk_buff *skb)
+					 const void *ctx)
 {
+	const struct sk_buff *skb = ctx;
 	u8 *cb_data = bpf_skb_cb(skb);
 	u8 cb_saved[BPF_SKB_CB_LEN];
 	u32 res;
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index b567ca46555c..dd2c0d4ae4f8 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1012,8 +1012,8 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
 		ret = BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(
 			cgrp->bpf.effective[type], skb, __bpf_prog_run_save_cb);
 	} else {
-		ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], skb,
-					  __bpf_prog_run_save_cb);
+		ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[type], skb,
+					    __bpf_prog_run_save_cb);
 		ret = (ret == 1 ? 0 : -EPERM);
 	}
 	bpf_restore_data_end(skb, saved_data_end);
@@ -1043,7 +1043,7 @@ int __cgroup_bpf_run_filter_sk(struct sock *sk,
 	struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
 	int ret;
 
-	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], sk, BPF_PROG_RUN);
+	ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[type], sk, BPF_PROG_RUN);
 	return ret == 1 ? 0 : -EPERM;
 }
 EXPORT_SYMBOL(__cgroup_bpf_run_filter_sk);
@@ -1090,8 +1090,8 @@ int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
 	}
 
 	cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
-	ret = BPF_PROG_RUN_ARRAY_FLAGS(cgrp->bpf.effective[type], &ctx,
-				       BPF_PROG_RUN, flags);
+	ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(cgrp->bpf.effective[type], &ctx,
+				          BPF_PROG_RUN, flags);
 
 	return ret == 1 ? 0 : -EPERM;
 }
@@ -1120,8 +1120,8 @@ int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
 	struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
 	int ret;
 
-	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], sock_ops,
-				 BPF_PROG_RUN);
+	ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[type], sock_ops,
+				    BPF_PROG_RUN);
 	return ret == 1 ? 0 : -EPERM;
 }
 EXPORT_SYMBOL(__cgroup_bpf_run_filter_sock_ops);
@@ -1139,8 +1139,8 @@ int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
 
 	rcu_read_lock();
 	cgrp = task_dfl_cgroup(current);
-	allow = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], &ctx,
-				   BPF_PROG_RUN);
+	allow = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[type], &ctx,
+				      BPF_PROG_RUN);
 	rcu_read_unlock();
 
 	return !allow;
@@ -1271,7 +1271,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 
 	rcu_read_lock();
 	cgrp = task_dfl_cgroup(current);
-	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], &ctx, BPF_PROG_RUN);
+	ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[type], &ctx, BPF_PROG_RUN);
 	rcu_read_unlock();
 
 	kfree(ctx.cur_val);
@@ -1385,8 +1385,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 	}
 
 	lock_sock(sk);
-	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[BPF_CGROUP_SETSOCKOPT],
-				 &ctx, BPF_PROG_RUN);
+	ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[BPF_CGROUP_SETSOCKOPT],
+				    &ctx, BPF_PROG_RUN);
 	release_sock(sk);
 
 	if (!ret) {
@@ -1495,8 +1495,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 	}
 
 	lock_sock(sk);
-	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[BPF_CGROUP_GETSOCKOPT],
-				 &ctx, BPF_PROG_RUN);
+	ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[BPF_CGROUP_GETSOCKOPT],
+				    &ctx, BPF_PROG_RUN);
 	release_sock(sk);
 
 	if (!ret) {
@@ -1556,8 +1556,8 @@ int __cgroup_bpf_run_filter_getsockopt_kern(struct sock *sk, int level,
 	 * be called if that data shouldn't be "exported".
 	 */
 
-	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[BPF_CGROUP_GETSOCKOPT],
-				 &ctx, BPF_PROG_RUN);
+	ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[BPF_CGROUP_GETSOCKOPT],
+				    &ctx, BPF_PROG_RUN);
 	if (!ret)
 		return -EPERM;
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index c5e0b6a64091..b427eac10780 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -124,7 +124,7 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 	 * out of events when it was updated in between this and the
 	 * rcu_dereference() which is accepted risk.
 	 */
-	ret = BPF_PROG_RUN_ARRAY_CHECK(call->prog_array, ctx, BPF_PROG_RUN);
+	ret = BPF_PROG_RUN_ARRAY(call->prog_array, ctx, bpf_prog_run);
 
  out:
 	__this_cpu_dec(bpf_prog_active);
-- 
2.30.2



* [PATCH v3 bpf-next 03/14] bpf: refactor perf_event_set_bpf_prog() to use struct bpf_prog input
  2021-07-30  5:33 [PATCH v3 bpf-next 00/14] BPF perf link and user-provided bpf_cookie Andrii Nakryiko
  2021-07-30  5:34 ` [PATCH v3 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function Andrii Nakryiko
  2021-07-30  5:34 ` [PATCH v3 bpf-next 02/14] bpf: refactor BPF_PROG_RUN_ARRAY family of macros into functions Andrii Nakryiko
@ 2021-07-30  5:34 ` Andrii Nakryiko
  2021-07-30  5:34 ` [PATCH v3 bpf-next 04/14] bpf: implement minimal BPF perf link Andrii Nakryiko
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  5:34 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra, Yonghong Song

Make the internal perf_event_set_bpf_prog() use a struct bpf_prog pointer as
its input argument, which makes it easier to re-use for other internal
purposes (coming up for BPF link in the next patch). A BPF program FD is not
as convenient, and in some cases it's not available. So switch to struct
bpf_prog, move refcounting out, and let the caller do bpf_prog_put() in case
of an error. This follows the approach of most of the other internal BPF
functions.
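The ownership convention being adopted (the caller resolves the FD and, on
error, the caller rather than the callee drops the reference) can be sketched
with a toy refcount; all names here are illustrative stand-ins:

```c
#include <assert.h>

struct toy_prog {
	int refcnt;
	int type;
};

#define TYPE_PERF_EVENT 1

static void prog_put(struct toy_prog *p)
{
	p->refcnt--;
}

/* Callee validates but never drops the reference it was handed. */
static int set_bpf_prog(struct toy_prog *prog)
{
	if (prog->type != TYPE_PERF_EVENT)
		return -1;	/* -EINVAL in the kernel */
	return 0;
}

/* Caller owns the reference: on error it, not the callee, does the put.
 * This keeps error handling in one place instead of sprinkling
 * bpf_prog_put() across every failure path in the callee. */
static int ioctl_set_bpf(struct toy_prog *prog)
{
	int err = set_bpf_prog(prog);

	if (err)
		prog_put(prog);
	return err;
}
```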

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/events/core.c | 61 ++++++++++++++++++++------------------------
 1 file changed, 28 insertions(+), 33 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 464917096e73..bf4689403498 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5574,7 +5574,7 @@ static inline int perf_fget_light(int fd, struct fd *p)
 static int perf_event_set_output(struct perf_event *event,
 				 struct perf_event *output_event);
 static int perf_event_set_filter(struct perf_event *event, void __user *arg);
-static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd);
+static int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
 static int perf_copy_attr(struct perf_event_attr __user *uattr,
 			  struct perf_event_attr *attr);
 
@@ -5637,7 +5637,22 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 		return perf_event_set_filter(event, (void __user *)arg);
 
 	case PERF_EVENT_IOC_SET_BPF:
-		return perf_event_set_bpf_prog(event, arg);
+	{
+		struct bpf_prog *prog;
+		int err;
+
+		prog = bpf_prog_get(arg);
+		if (IS_ERR(prog))
+			return PTR_ERR(prog);
+
+		err = perf_event_set_bpf_prog(event, prog);
+		if (err) {
+			bpf_prog_put(prog);
+			return err;
+		}
+
+		return 0;
+	}
 
 	case PERF_EVENT_IOC_PAUSE_OUTPUT: {
 		struct perf_buffer *rb;
@@ -9923,10 +9938,8 @@ static void bpf_overflow_handler(struct perf_event *event,
 	event->orig_overflow_handler(event, data, regs);
 }
 
-static int perf_event_set_bpf_handler(struct perf_event *event, u32 prog_fd)
+static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog *prog)
 {
-	struct bpf_prog *prog;
-
 	if (event->overflow_handler_context)
 		/* hw breakpoint or kernel counter */
 		return -EINVAL;
@@ -9934,9 +9947,8 @@ static int perf_event_set_bpf_handler(struct perf_event *event, u32 prog_fd)
 	if (event->prog)
 		return -EEXIST;
 
-	prog = bpf_prog_get_type(prog_fd, BPF_PROG_TYPE_PERF_EVENT);
-	if (IS_ERR(prog))
-		return PTR_ERR(prog);
+	if (prog->type != BPF_PROG_TYPE_PERF_EVENT)
+		return -EINVAL;
 
 	if (event->attr.precise_ip &&
 	    prog->call_get_stack &&
@@ -9952,7 +9964,6 @@ static int perf_event_set_bpf_handler(struct perf_event *event, u32 prog_fd)
 		 * attached to perf_sample_data, do not allow attaching BPF
 		 * program that calls bpf_get_[stack|stackid].
 		 */
-		bpf_prog_put(prog);
 		return -EPROTO;
 	}
 
@@ -9974,7 +9985,7 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
 	bpf_prog_put(prog);
 }
 #else
-static int perf_event_set_bpf_handler(struct perf_event *event, u32 prog_fd)
+static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog *prog)
 {
 	return -EOPNOTSUPP;
 }
@@ -10002,14 +10013,12 @@ static inline bool perf_event_is_tracing(struct perf_event *event)
 	return false;
 }
 
-static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
+static int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
 {
 	bool is_kprobe, is_tracepoint, is_syscall_tp;
-	struct bpf_prog *prog;
-	int ret;
 
 	if (!perf_event_is_tracing(event))
-		return perf_event_set_bpf_handler(event, prog_fd);
+		return perf_event_set_bpf_handler(event, prog);
 
 	is_kprobe = event->tp_event->flags & TRACE_EVENT_FL_UKPROBE;
 	is_tracepoint = event->tp_event->flags & TRACE_EVENT_FL_TRACEPOINT;
@@ -10018,38 +10027,24 @@ static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
 		/* bpf programs can only be attached to u/kprobe or tracepoint */
 		return -EINVAL;
 
-	prog = bpf_prog_get(prog_fd);
-	if (IS_ERR(prog))
-		return PTR_ERR(prog);
-
 	if ((is_kprobe && prog->type != BPF_PROG_TYPE_KPROBE) ||
 	    (is_tracepoint && prog->type != BPF_PROG_TYPE_TRACEPOINT) ||
-	    (is_syscall_tp && prog->type != BPF_PROG_TYPE_TRACEPOINT)) {
-		/* valid fd, but invalid bpf program type */
-		bpf_prog_put(prog);
+	    (is_syscall_tp && prog->type != BPF_PROG_TYPE_TRACEPOINT))
 		return -EINVAL;
-	}
 
 	/* Kprobe override only works for kprobes, not uprobes. */
 	if (prog->kprobe_override &&
-	    !(event->tp_event->flags & TRACE_EVENT_FL_KPROBE)) {
-		bpf_prog_put(prog);
+	    !(event->tp_event->flags & TRACE_EVENT_FL_KPROBE))
 		return -EINVAL;
-	}
 
 	if (is_tracepoint || is_syscall_tp) {
 		int off = trace_event_get_offsets(event->tp_event);
 
-		if (prog->aux->max_ctx_offset > off) {
-			bpf_prog_put(prog);
+		if (prog->aux->max_ctx_offset > off)
 			return -EACCES;
-		}
 	}
 
-	ret = perf_event_attach_bpf_prog(event, prog);
-	if (ret)
-		bpf_prog_put(prog);
-	return ret;
+	return perf_event_attach_bpf_prog(event, prog);
 }
 
 static void perf_event_free_bpf_prog(struct perf_event *event)
@@ -10071,7 +10066,7 @@ static void perf_event_free_filter(struct perf_event *event)
 {
 }
 
-static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
+static int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
 {
 	return -ENOENT;
 }
-- 
2.30.2



* [PATCH v3 bpf-next 04/14] bpf: implement minimal BPF perf link
  2021-07-30  5:33 [PATCH v3 bpf-next 00/14] BPF perf link and user-provided bpf_cookie Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2021-07-30  5:34 ` [PATCH v3 bpf-next 03/14] bpf: refactor perf_event_set_bpf_prog() to use struct bpf_prog input Andrii Nakryiko
@ 2021-07-30  5:34 ` Andrii Nakryiko
  2021-07-30 17:24   ` Yonghong Song
  2021-07-30  5:34 ` [PATCH v3 bpf-next 05/14] bpf: allow to specify user-provided bpf_cookie for BPF perf links Andrii Nakryiko
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  5:34 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Introduce a new type of BPF link - BPF perf link. This brings perf_event-based
BPF program attachments (perf_event, tracepoints, kprobes, and uprobes) into
the common BPF link infrastructure: all active perf_event-based attachments
can now be listed, the BPF program is auto-detached from the perf_event when
the link's FD is closed, and generic BPF link fdinfo/get_info functionality
is available.

BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
are currently supported.

Force-detaching and atomic BPF program updates are not yet implemented, but
with perf_event-based BPF links we now have a common framework for this
without needing to extend the ioctl()-based perf_event interface.

One interesting consideration is the new value for bpf_attach_type, which the
BPF_LINK_CREATE command expects. Generally, there is either a 1-to-1 mapping
from bpf_attach_type to bpf_prog_type, or a many-to-1 mapping from a subset
of bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, three different program
types (KPROBE, TRACEPOINT, PERF_EVENT) use the same perf_event-based
mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
define a single BPF_PERF_EVENT attach type for all of them and to adjust
link_create()'s logic to check the correspondence between attach type and
program type.

The alternative would be to define three new attach types (e.g., BPF_KPROBE,
BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like overkill: it would
needlessly proliferate bpf_attach_type enum values, and BPF_KPROBE would
conflict with the BPF_KPROBE() macro defined by libbpf.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf_types.h      |   3 +
 include/linux/trace_events.h   |   3 +
 include/uapi/linux/bpf.h       |   2 +
 kernel/bpf/syscall.c           | 105 ++++++++++++++++++++++++++++++---
 kernel/events/core.c           |  10 ++--
 tools/include/uapi/linux/bpf.h |   2 +
 6 files changed, 112 insertions(+), 13 deletions(-)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index a9db1eae6796..0a1ada7f174d 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -135,3 +135,6 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
 #ifdef CONFIG_NET
 BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
 #endif
+#ifdef CONFIG_PERF_EVENTS
+BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
+#endif
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index ad413b382a3c..8ac92560d3a3 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -803,6 +803,9 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
 void perf_trace_buf_update(void *record, u16 type);
 void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
 
+int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
+void perf_event_free_bpf_prog(struct perf_event *event);
+
 void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
 void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
 void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2db6925e04f4..94fe8329b28f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -993,6 +993,7 @@ enum bpf_attach_type {
 	BPF_SK_SKB_VERDICT,
 	BPF_SK_REUSEPORT_SELECT,
 	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
+	BPF_PERF_EVENT,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -1006,6 +1007,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_ITER = 4,
 	BPF_LINK_TYPE_NETNS = 5,
 	BPF_LINK_TYPE_XDP = 6,
+	BPF_LINK_TYPE_PERF_EVENT = 7,
 
 	MAX_BPF_LINK_TYPE,
 };
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 9a2068e39d23..80c03bedd6e6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2906,6 +2906,79 @@ static const struct bpf_link_ops bpf_raw_tp_link_lops = {
 	.fill_link_info = bpf_raw_tp_link_fill_link_info,
 };
 
+#ifdef CONFIG_PERF_EVENTS
+struct bpf_perf_link {
+	struct bpf_link link;
+	struct file *perf_file;
+};
+
+static void bpf_perf_link_release(struct bpf_link *link)
+{
+	struct bpf_perf_link *perf_link = container_of(link, struct bpf_perf_link, link);
+	struct perf_event *event = perf_link->perf_file->private_data;
+
+	perf_event_free_bpf_prog(event);
+	fput(perf_link->perf_file);
+}
+
+static void bpf_perf_link_dealloc(struct bpf_link *link)
+{
+	struct bpf_perf_link *perf_link = container_of(link, struct bpf_perf_link, link);
+
+	kfree(perf_link);
+}
+
+static const struct bpf_link_ops bpf_perf_link_lops = {
+	.release = bpf_perf_link_release,
+	.dealloc = bpf_perf_link_dealloc,
+};
+
+static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	struct bpf_link_primer link_primer;
+	struct bpf_perf_link *link;
+	struct perf_event *event;
+	struct file *perf_file;
+	int err;
+
+	if (attr->link_create.flags)
+		return -EINVAL;
+
+	perf_file = perf_event_get(attr->link_create.target_fd);
+	if (IS_ERR(perf_file))
+		return PTR_ERR(perf_file);
+
+	link = kzalloc(sizeof(*link), GFP_USER);
+	if (!link) {
+		err = -ENOMEM;
+		goto out_put_file;
+	}
+	bpf_link_init(&link->link, BPF_LINK_TYPE_PERF_EVENT, &bpf_perf_link_lops, prog);
+	link->perf_file = perf_file;
+
+	err = bpf_link_prime(&link->link, &link_primer);
+	if (err) {
+		kfree(link);
+		goto out_put_file;
+	}
+
+	event = perf_file->private_data;
+	err = perf_event_set_bpf_prog(event, prog);
+	if (err) {
+		bpf_link_cleanup(&link_primer);
+		goto out_put_file;
+	}
+	/* perf_event_set_bpf_prog() doesn't take its own refcnt on prog */
+	bpf_prog_inc(prog);
+
+	return bpf_link_settle(&link_primer);
+
+out_put_file:
+	fput(perf_file);
+	return err;
+}
+#endif /* CONFIG_PERF_EVENTS */
+
 #define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.prog_fd
 
 static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
@@ -4147,15 +4220,26 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
 	if (ret)
 		goto out;
 
-	if (prog->type == BPF_PROG_TYPE_EXT) {
+	switch (prog->type) {
+	case BPF_PROG_TYPE_EXT:
 		ret = tracing_bpf_link_attach(attr, uattr, prog);
 		goto out;
-	}
-
-	ptype = attach_type_to_prog_type(attr->link_create.attach_type);
-	if (ptype == BPF_PROG_TYPE_UNSPEC || ptype != prog->type) {
-		ret = -EINVAL;
-		goto out;
+	case BPF_PROG_TYPE_PERF_EVENT:
+	case BPF_PROG_TYPE_KPROBE:
+	case BPF_PROG_TYPE_TRACEPOINT:
+		if (attr->link_create.attach_type != BPF_PERF_EVENT) {
+			ret = -EINVAL;
+			goto out;
+		}
+		ptype = prog->type;
+		break;
+	default:
+		ptype = attach_type_to_prog_type(attr->link_create.attach_type);
+		if (ptype == BPF_PROG_TYPE_UNSPEC || ptype != prog->type) {
+			ret = -EINVAL;
+			goto out;
+		}
+		break;
 	}
 
 	switch (ptype) {
@@ -4179,6 +4263,13 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
 	case BPF_PROG_TYPE_XDP:
 		ret = bpf_xdp_link_attach(attr, prog);
 		break;
+#endif
+#ifdef CONFIG_PERF_EVENTS
+	case BPF_PROG_TYPE_PERF_EVENT:
+	case BPF_PROG_TYPE_TRACEPOINT:
+	case BPF_PROG_TYPE_KPROBE:
+		ret = bpf_perf_link_attach(attr, prog);
+		break;
 #endif
 	default:
 		ret = -EINVAL;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index bf4689403498..b125943599ce 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4697,7 +4697,6 @@ find_get_context(struct pmu *pmu, struct task_struct *task,
 }
 
 static void perf_event_free_filter(struct perf_event *event);
-static void perf_event_free_bpf_prog(struct perf_event *event);
 
 static void free_event_rcu(struct rcu_head *head)
 {
@@ -5574,7 +5573,6 @@ static inline int perf_fget_light(int fd, struct fd *p)
 static int perf_event_set_output(struct perf_event *event,
 				 struct perf_event *output_event);
 static int perf_event_set_filter(struct perf_event *event, void __user *arg);
-static int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
 static int perf_copy_attr(struct perf_event_attr __user *uattr,
 			  struct perf_event_attr *attr);
 
@@ -10013,7 +10011,7 @@ static inline bool perf_event_is_tracing(struct perf_event *event)
 	return false;
 }
 
-static int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
+int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
 {
 	bool is_kprobe, is_tracepoint, is_syscall_tp;
 
@@ -10047,7 +10045,7 @@ static int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *pr
 	return perf_event_attach_bpf_prog(event, prog);
 }
 
-static void perf_event_free_bpf_prog(struct perf_event *event)
+void perf_event_free_bpf_prog(struct perf_event *event)
 {
 	if (!perf_event_is_tracing(event)) {
 		perf_event_free_bpf_handler(event);
@@ -10066,12 +10064,12 @@ static void perf_event_free_filter(struct perf_event *event)
 {
 }
 
-static int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
+int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
 {
 	return -ENOENT;
 }
 
-static void perf_event_free_bpf_prog(struct perf_event *event)
+void perf_event_free_bpf_prog(struct perf_event *event)
 {
 }
 #endif /* CONFIG_EVENT_TRACING */
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 2db6925e04f4..94fe8329b28f 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -993,6 +993,7 @@ enum bpf_attach_type {
 	BPF_SK_SKB_VERDICT,
 	BPF_SK_REUSEPORT_SELECT,
 	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
+	BPF_PERF_EVENT,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -1006,6 +1007,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_ITER = 4,
 	BPF_LINK_TYPE_NETNS = 5,
 	BPF_LINK_TYPE_XDP = 6,
+	BPF_LINK_TYPE_PERF_EVENT = 7,
 
 	MAX_BPF_LINK_TYPE,
 };
-- 
2.30.2



* [PATCH v3 bpf-next 05/14] bpf: allow to specify user-provided bpf_cookie for BPF perf links
  2021-07-30  5:33 [PATCH v3 bpf-next 00/14] BPF perf link and user-provided bpf_cookie Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2021-07-30  5:34 ` [PATCH v3 bpf-next 04/14] bpf: implement minimal BPF perf link Andrii Nakryiko
@ 2021-07-30  5:34 ` Andrii Nakryiko
  2021-07-30 22:02   ` Yonghong Song
  2021-08-09 23:30   ` Daniel Borkmann
  2021-07-30  5:34 ` [PATCH v3 bpf-next 06/14] bpf: add bpf_get_attach_cookie() BPF helper to access bpf_cookie value Andrii Nakryiko
                   ` (9 subsequent siblings)
  14 siblings, 2 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  5:34 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Add ability for users to specify custom u64 value (bpf_cookie) when creating
BPF link for perf_event-backed BPF programs (kprobe/uprobe, perf_event,
tracepoints).

This is useful when the same BPF program is used to attach to and process
invocations of different tracepoints/kprobes/uprobes in a generic fashion,
while still distinguishing each invocation from the others (e.g., the BPF
program can look up additional information associated with a specific kernel
function without relying on function IP lookups). This allows new use cases
to be implemented simply and efficiently that previously were possible only
through code generation (and thus multiple instances of an almost identical
BPF program) or runtime compilation on target hosts (BCC-style, which is
even more expensive resource-wise). For uprobes it is sometimes not even
possible to know the function IP beforehand (e.g., when attaching to a
shared library without PID filtering, in which case the library's base load
address is not known).

This is done by storing the u64 bpf_cookie in struct bpf_prog_array_item,
corresponding to each attached and run BPF program. Given that cgroup BPF
programs already use two 8-byte pointers for their needs and don't (yet?)
support bpf_cookie, reuse that space through a union of cgroup_storage and
the new bpf_cookie field.

Make it available to kprobe/tracepoint BPF programs through
bpf_trace_run_ctx. It is set by BPF_PROG_RUN_ARRAY, used by the
kprobe/uprobe/tracepoint BPF program execution code, which luckily is now
split from BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new
BPF helper giving access to the user-provided cookie value from inside a BPF
program. Generic perf_event BPF programs will access the value from the
perf_event itself through the passed-in BPF program context.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 drivers/media/rc/bpf-lirc.c    |  4 ++--
 include/linux/bpf.h            | 16 +++++++++++++++-
 include/linux/perf_event.h     |  1 +
 include/linux/trace_events.h   |  6 +++---
 include/uapi/linux/bpf.h       |  7 +++++++
 kernel/bpf/core.c              | 29 ++++++++++++++++++-----------
 kernel/bpf/syscall.c           |  2 +-
 kernel/events/core.c           | 21 ++++++++++++++-------
 kernel/trace/bpf_trace.c       |  8 +++++---
 tools/include/uapi/linux/bpf.h |  7 +++++++
 10 files changed, 73 insertions(+), 28 deletions(-)

diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
index afae0afe3f81..7490494273e4 100644
--- a/drivers/media/rc/bpf-lirc.c
+++ b/drivers/media/rc/bpf-lirc.c
@@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog)
 		goto unlock;
 	}
 
-	ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
+	ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array);
 	if (ret < 0)
 		goto unlock;
 
@@ -193,7 +193,7 @@ static int lirc_bpf_detach(struct rc_dev *rcdev, struct bpf_prog *prog)
 	}
 
 	old_array = lirc_rcu_dereference(raw->progs);
-	ret = bpf_prog_array_copy(old_array, prog, NULL, &new_array);
+	ret = bpf_prog_array_copy(old_array, prog, NULL, 0, &new_array);
 	/*
 	 * Do not use bpf_prog_array_delete_safe() as we would end up
 	 * with a dummy entry in the array, and the we would free the
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9c44b56b698f..7da3826eeae5 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1114,7 +1114,10 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
  */
 struct bpf_prog_array_item {
 	struct bpf_prog *prog;
-	struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
+	union {
+		struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
+		u64 bpf_cookie;
+	};
 };
 
 struct bpf_prog_array {
@@ -1140,6 +1143,7 @@ int bpf_prog_array_copy_info(struct bpf_prog_array *array,
 int bpf_prog_array_copy(struct bpf_prog_array *old_array,
 			struct bpf_prog *exclude_prog,
 			struct bpf_prog *include_prog,
+			u64 bpf_cookie,
 			struct bpf_prog_array **new_array);
 
 struct bpf_run_ctx {};
@@ -1149,6 +1153,11 @@ struct bpf_cg_run_ctx {
 	const struct bpf_prog_array_item *prog_item;
 };
 
+struct bpf_trace_run_ctx {
+	struct bpf_run_ctx run_ctx;
+	u64 bpf_cookie;
+};
+
 #ifdef CONFIG_BPF_SYSCALL
 static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
 {
@@ -1247,6 +1256,8 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
 	const struct bpf_prog_array_item *item;
 	const struct bpf_prog *prog;
 	const struct bpf_prog_array *array;
+	struct bpf_run_ctx *old_run_ctx;
+	struct bpf_trace_run_ctx run_ctx;
 	u32 ret = 1;
 
 	migrate_disable();
@@ -1254,11 +1265,14 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
 	array = rcu_dereference(array_rcu);
 	if (unlikely(!array))
 		goto out;
+	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 	item = &array->items[0];
 	while ((prog = READ_ONCE(item->prog))) {
+		run_ctx.bpf_cookie = item->bpf_cookie;
 		ret &= run_prog(prog, ctx);
 		item++;
 	}
+	bpf_reset_run_ctx(old_run_ctx);
 out:
 	rcu_read_unlock();
 	migrate_enable();
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2d510ad750ed..fe156a8170aa 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -762,6 +762,7 @@ struct perf_event {
 #ifdef CONFIG_BPF_SYSCALL
 	perf_overflow_handler_t		orig_overflow_handler;
 	struct bpf_prog			*prog;
+	u64				bpf_cookie;
 #endif
 
 #ifdef CONFIG_EVENT_TRACING
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8ac92560d3a3..8e0631a4b046 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -675,7 +675,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
 
 #ifdef CONFIG_BPF_EVENTS
 unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
-int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
+int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
 void perf_event_detach_bpf_prog(struct perf_event *event);
 int perf_event_query_prog_array(struct perf_event *event, void __user *info);
 int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
@@ -692,7 +692,7 @@ static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *c
 }
 
 static inline int
-perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
+perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie)
 {
 	return -EOPNOTSUPP;
 }
@@ -803,7 +803,7 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
 void perf_trace_buf_update(void *record, u16 type);
 void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
 
-int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
+int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
 void perf_event_free_bpf_prog(struct perf_event *event);
 
 void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 94fe8329b28f..63ee482d50e1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1448,6 +1448,13 @@ union bpf_attr {
 				__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
 				__u32		iter_info_len;	/* iter_info length */
 			};
+			struct {
+				/* black box user-provided value passed through
+				 * to BPF program at the execution time and
+				 * accessible through bpf_get_attach_cookie() BPF helper
+				 */
+				__u64		bpf_cookie;
+			} perf_event;
 		};
 	} link_create;
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 9b1577498373..6f2dd5f2d4ad 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2097,13 +2097,13 @@ int bpf_prog_array_update_at(struct bpf_prog_array *array, int index,
 int bpf_prog_array_copy(struct bpf_prog_array *old_array,
 			struct bpf_prog *exclude_prog,
 			struct bpf_prog *include_prog,
+			u64 bpf_cookie,
 			struct bpf_prog_array **new_array)
 {
 	int new_prog_cnt, carry_prog_cnt = 0;
-	struct bpf_prog_array_item *existing;
+	struct bpf_prog_array_item *existing, *new;
 	struct bpf_prog_array *array;
 	bool found_exclude = false;
-	int new_prog_idx = 0;
 
 	/* Figure out how many existing progs we need to carry over to
 	 * the new array.
@@ -2140,20 +2140,27 @@ int bpf_prog_array_copy(struct bpf_prog_array *old_array,
 	array = bpf_prog_array_alloc(new_prog_cnt + 1, GFP_KERNEL);
 	if (!array)
 		return -ENOMEM;
+	new = array->items;
 
 	/* Fill in the new prog array */
 	if (carry_prog_cnt) {
 		existing = old_array->items;
-		for (; existing->prog; existing++)
-			if (existing->prog != exclude_prog &&
-			    existing->prog != &dummy_bpf_prog.prog) {
-				array->items[new_prog_idx++].prog =
-					existing->prog;
-			}
+		for (; existing->prog; existing++) {
+			if (existing->prog == exclude_prog ||
+			    existing->prog == &dummy_bpf_prog.prog)
+				continue;
+
+			new->prog = existing->prog;
+			new->bpf_cookie = existing->bpf_cookie;
+			new++;
+		}
 	}
-	if (include_prog)
-		array->items[new_prog_idx++].prog = include_prog;
-	array->items[new_prog_idx].prog = NULL;
+	if (include_prog) {
+		new->prog = include_prog;
+		new->bpf_cookie = bpf_cookie;
+		new++;
+	}
+	new->prog = NULL;
 	*new_array = array;
 	return 0;
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 80c03bedd6e6..7420e1334ab2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2963,7 +2963,7 @@ static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *pro
 	}
 
 	event = perf_file->private_data;
-	err = perf_event_set_bpf_prog(event, prog);
+	err = perf_event_set_bpf_prog(event, prog, attr->link_create.perf_event.bpf_cookie);
 	if (err) {
 		bpf_link_cleanup(&link_primer);
 		goto out_put_file;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b125943599ce..5d1b6fca1910 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5643,7 +5643,7 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
 
-		err = perf_event_set_bpf_prog(event, prog);
+		err = perf_event_set_bpf_prog(event, prog, 0);
 		if (err) {
 			bpf_prog_put(prog);
 			return err;
@@ -9936,7 +9936,9 @@ static void bpf_overflow_handler(struct perf_event *event,
 	event->orig_overflow_handler(event, data, regs);
 }
 
-static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog *prog)
+static int perf_event_set_bpf_handler(struct perf_event *event,
+				      struct bpf_prog *prog,
+				      u64 bpf_cookie)
 {
 	if (event->overflow_handler_context)
 		/* hw breakpoint or kernel counter */
@@ -9966,6 +9968,7 @@ static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog
 	}
 
 	event->prog = prog;
+	event->bpf_cookie = bpf_cookie;
 	event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
 	WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
 	return 0;
@@ -9983,7 +9986,9 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
 	bpf_prog_put(prog);
 }
 #else
-static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog *prog)
+static int perf_event_set_bpf_handler(struct perf_event *event,
+				      struct bpf_prog *prog,
+				      u64 bpf_cookie)
 {
 	return -EOPNOTSUPP;
 }
@@ -10011,12 +10016,13 @@ static inline bool perf_event_is_tracing(struct perf_event *event)
 	return false;
 }
 
-int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
+int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog,
+			    u64 bpf_cookie)
 {
 	bool is_kprobe, is_tracepoint, is_syscall_tp;
 
 	if (!perf_event_is_tracing(event))
-		return perf_event_set_bpf_handler(event, prog);
+		return perf_event_set_bpf_handler(event, prog, bpf_cookie);
 
 	is_kprobe = event->tp_event->flags & TRACE_EVENT_FL_UKPROBE;
 	is_tracepoint = event->tp_event->flags & TRACE_EVENT_FL_TRACEPOINT;
@@ -10042,7 +10048,7 @@ int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
 			return -EACCES;
 	}
 
-	return perf_event_attach_bpf_prog(event, prog);
+	return perf_event_attach_bpf_prog(event, prog, bpf_cookie);
 }
 
 void perf_event_free_bpf_prog(struct perf_event *event)
@@ -10064,7 +10070,8 @@ static void perf_event_free_filter(struct perf_event *event)
 {
 }
 
-int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
+int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog,
+			    u64 bpf_cookie)
 {
 	return -ENOENT;
 }
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index b427eac10780..b6c30aa0173c 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1674,7 +1674,8 @@ static DEFINE_MUTEX(bpf_event_mutex);
 #define BPF_TRACE_MAX_PROGS 64
 
 int perf_event_attach_bpf_prog(struct perf_event *event,
-			       struct bpf_prog *prog)
+			       struct bpf_prog *prog,
+			       u64 bpf_cookie)
 {
 	struct bpf_prog_array *old_array;
 	struct bpf_prog_array *new_array;
@@ -1701,12 +1702,13 @@ int perf_event_attach_bpf_prog(struct perf_event *event,
 		goto unlock;
 	}
 
-	ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
+	ret = bpf_prog_array_copy(old_array, NULL, prog, bpf_cookie, &new_array);
 	if (ret < 0)
 		goto unlock;
 
 	/* set the new array to event->tp_event and set event->prog */
 	event->prog = prog;
+	event->bpf_cookie = bpf_cookie;
 	rcu_assign_pointer(event->tp_event->prog_array, new_array);
 	bpf_prog_array_free(old_array);
 
@@ -1727,7 +1729,7 @@ void perf_event_detach_bpf_prog(struct perf_event *event)
 		goto unlock;
 
 	old_array = bpf_event_rcu_dereference(event->tp_event->prog_array);
-	ret = bpf_prog_array_copy(old_array, event->prog, NULL, &new_array);
+	ret = bpf_prog_array_copy(old_array, event->prog, NULL, 0, &new_array);
 	if (ret == -ENOENT)
 		goto unlock;
 	if (ret < 0) {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 94fe8329b28f..63ee482d50e1 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1448,6 +1448,13 @@ union bpf_attr {
 				__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
 				__u32		iter_info_len;	/* iter_info length */
 			};
+			struct {
+				/* black box user-provided value passed through
+				 * to BPF program at the execution time and
+				 * accessible through bpf_get_attach_cookie() BPF helper
+				 */
+				__u64		bpf_cookie;
+			} perf_event;
 		};
 	} link_create;
 
-- 
2.30.2



* [PATCH v3 bpf-next 06/14] bpf: add bpf_get_attach_cookie() BPF helper to access bpf_cookie value
  2021-07-30  5:33 [PATCH v3 bpf-next 00/14] BPF perf link and user-provided bpf_cookie Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2021-07-30  5:34 ` [PATCH v3 bpf-next 05/14] bpf: allow to specify user-provided bpf_cookie for BPF perf links Andrii Nakryiko
@ 2021-07-30  5:34 ` Andrii Nakryiko
  2021-07-30 22:13   ` Yonghong Song
  2021-07-30  5:34 ` [PATCH v3 bpf-next 07/14] libbpf: re-build libbpf.so when libbpf.map changes Andrii Nakryiko
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  5:34 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Add new BPF helper, bpf_get_attach_cookie(), which can be used by BPF programs
to get access to a user-provided bpf_cookie value, specified during BPF
program attachment (BPF link creation) time.

Naming is hard, though. With the concept being named "BPF cookie", I've
considered calling the helper:
  - bpf_get_cookie() -- seems too unspecific and easily mistaken for socket
    cookie;
  - bpf_get_bpf_cookie() -- too much tautology;
  - bpf_get_link_cookie() -- would be ok, but while we create a BPF link to
    attach a BPF program to a BPF hook, it's still an "attachment", and the
    bpf_cookie is associated with the BPF program's attachment to a hook,
    not with the BPF link itself. Technically, we could support bpf_cookie
    with old-style cgroup programs. So I ultimately rejected it in favor of
    bpf_get_attach_cookie().

Currently all perf_event-backed BPF program types support
bpf_get_attach_cookie() helper. Follow-up patches will add support for
fentry/fexit programs as well.
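For illustration, a kprobe program consuming the cookie might look like the
sketch below. This is not part of the patch; it assumes libbpf's
bpf_helpers.h and a kernel carrying this series, and the map/config details
are hypothetical:

```c
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("kprobe")
int handle_kprobe(struct pt_regs *ctx)
{
	/* cookie supplied at attach time via
	 * link_create.perf_event.bpf_cookie; 0 if none was given */
	__u64 cookie = bpf_get_attach_cookie(ctx);

	/* e.g., use the cookie as a key into a per-attachment config map
	 * instead of doing function IP lookups */
	bpf_printk("attachment cookie: %llu", cookie);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```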

While at it, mark bpf_tracing_func_proto() as static to make it obvious that
it's only used from within kernel/trace/bpf_trace.c.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h            |  3 ---
 include/uapi/linux/bpf.h       | 16 ++++++++++++++++
 kernel/trace/bpf_trace.c       | 35 +++++++++++++++++++++++++++++++++-
 tools/include/uapi/linux/bpf.h | 16 ++++++++++++++++
 4 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7da3826eeae5..e04ae3b94112 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2110,9 +2110,6 @@ extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
 extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
 extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
 
-const struct bpf_func_proto *bpf_tracing_func_proto(
-	enum bpf_func_id func_id, const struct bpf_prog *prog);
-
 const struct bpf_func_proto *tracing_prog_func_proto(
   enum bpf_func_id func_id, const struct bpf_prog *prog);
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 63ee482d50e1..c4f7892edb2b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4856,6 +4856,21 @@ union bpf_attr {
  * 		Get address of the traced function (for tracing and kprobe programs).
  * 	Return
  * 		Address of the traced function.
+ *
+ * u64 bpf_get_attach_cookie(void *ctx)
+ * 	Description
+ * 		Get bpf_cookie value provided (optionally) during the program
+ * 		attachment. It might be different for each individual
+ * 		attachment, even if BPF program itself is the same.
+ * 		Expects BPF program context *ctx* as a first argument.
+ *
+ * 		Supported for the following program types:
+ *			- kprobe/uprobe;
+ *			- tracepoint;
+ *			- perf_event.
+ * 	Return
+ *		Value specified by user at BPF link creation/attachment time
+ *		or 0, if it was not specified.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5032,6 +5047,7 @@ union bpf_attr {
 	FN(timer_start),		\
 	FN(timer_cancel),		\
 	FN(get_func_ip),		\
+	FN(get_attach_cookie),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index b6c30aa0173c..afbc8be9653f 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -975,7 +975,34 @@ static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe = {
 	.arg1_type	= ARG_PTR_TO_CTX,
 };
 
-const struct bpf_func_proto *
+BPF_CALL_1(bpf_get_attach_cookie_trace, void *, ctx)
+{
+	struct bpf_trace_run_ctx *run_ctx;
+
+	run_ctx = container_of(current->bpf_ctx, struct bpf_trace_run_ctx, run_ctx);
+	return run_ctx->bpf_cookie;
+}
+
+static const struct bpf_func_proto bpf_get_attach_cookie_proto_trace = {
+	.func		= bpf_get_attach_cookie_trace,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+};
+
+BPF_CALL_1(bpf_get_attach_cookie_pe, struct bpf_perf_event_data_kern *, ctx)
+{
+	return ctx->event->bpf_cookie;
+}
+
+static const struct bpf_func_proto bpf_get_attach_cookie_proto_pe = {
+	.func		= bpf_get_attach_cookie_pe,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+};
+
+static const struct bpf_func_proto *
 bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
@@ -1108,6 +1135,8 @@ kprobe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 #endif
 	case BPF_FUNC_get_func_ip:
 		return &bpf_get_func_ip_proto_kprobe;
+	case BPF_FUNC_get_attach_cookie:
+		return &bpf_get_attach_cookie_proto_trace;
 	default:
 		return bpf_tracing_func_proto(func_id, prog);
 	}
@@ -1218,6 +1247,8 @@ tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_stackid_proto_tp;
 	case BPF_FUNC_get_stack:
 		return &bpf_get_stack_proto_tp;
+	case BPF_FUNC_get_attach_cookie:
+		return &bpf_get_attach_cookie_proto_trace;
 	default:
 		return bpf_tracing_func_proto(func_id, prog);
 	}
@@ -1325,6 +1356,8 @@ pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_perf_prog_read_value_proto;
 	case BPF_FUNC_read_branch_records:
 		return &bpf_read_branch_records_proto;
+	case BPF_FUNC_get_attach_cookie:
+		return &bpf_get_attach_cookie_proto_pe;
 	default:
 		return bpf_tracing_func_proto(func_id, prog);
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 63ee482d50e1..c4f7892edb2b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4856,6 +4856,21 @@ union bpf_attr {
  * 		Get address of the traced function (for tracing and kprobe programs).
  * 	Return
  * 		Address of the traced function.
+ *
+ * u64 bpf_get_attach_cookie(void *ctx)
+ * 	Description
+ * 		Get bpf_cookie value provided (optionally) during the program
+ * 		attachment. It might be different for each individual
+ * 		attachment, even if BPF program itself is the same.
+ * 		Expects BPF program context *ctx* as a first argument.
+ *
+ * 		Supported for the following program types:
+ *			- kprobe/uprobe;
+ *			- tracepoint;
+ *			- perf_event.
+ * 	Return
+ *		Value specified by user at BPF link creation/attachment time
+ *		or 0, if it was not specified.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5032,6 +5047,7 @@ union bpf_attr {
 	FN(timer_start),		\
 	FN(timer_cancel),		\
 	FN(get_func_ip),		\
+	FN(get_attach_cookie),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.30.2



* [PATCH v3 bpf-next 07/14] libbpf: re-build libbpf.so when libbpf.map changes
  2021-07-30  5:33 [PATCH v3 bpf-next 00/14] BPF perf link and user-provided bpf_cookie Andrii Nakryiko
                   ` (5 preceding siblings ...)
  2021-07-30  5:34 ` [PATCH v3 bpf-next 06/14] bpf: add bpf_get_attach_cookie() BPF helper to access bpf_cookie value Andrii Nakryiko
@ 2021-07-30  5:34 ` Andrii Nakryiko
  2021-07-30 22:31   ` Yonghong Song
  2021-07-30  5:34 ` [PATCH v3 bpf-next 08/14] libbpf: remove unused bpf_link's destroy operation, but add dealloc Andrii Nakryiko
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  5:34 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Ensure libbpf.so is re-built whenever libbpf.map is modified. Without this,
changes to libbpf.map are not detected and a versioned symbols mismatch
error is reported until `make clean && make` is used, which is a suboptimal
developer experience.
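
The fix relies on a standard make idiom: list the version script as a
prerequisite so that touching it triggers relinking, but link with `$<`
(first prerequisite only) instead of `$^` so the script is not fed to the
linker as an input object. A minimal standalone sketch of the same idiom
(hypothetical file names, not the libbpf tree):

```shell
# 'out' depends on both in.txt and dep.txt, but the recipe uses only $<
# (the first prerequisite), so dep.txt influences rebuilds without being
# consumed by the command -- the same pattern as adding $(VERSION_SCRIPT).
tmp=$(mktemp -d)
cd "$tmp"
printf 'out: in.txt dep.txt\n\tcat $< > out\n' > Makefile
echo payload > in.txt
echo v1 > dep.txt
make            # builds out from in.txt only
make            # nothing to do: out is up to date
sleep 1
touch dep.txt   # touching the extra prerequisite forces a rebuild
make
cat out
```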

Fixes: 306b267cb3c4 ("libbpf: Verify versioned symbols")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/Makefile | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index ec14aa725bb0..74c3b73a5fbe 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -4,8 +4,9 @@
 RM ?= rm
 srctree = $(abs_srctree)
 
+VERSION_SCRIPT := libbpf.map
 LIBBPF_VERSION := $(shell \
-	grep -oE '^LIBBPF_([0-9.]+)' libbpf.map | \
+	grep -oE '^LIBBPF_([0-9.]+)' $(VERSION_SCRIPT) | \
 	sort -rV | head -n1 | cut -d'_' -f2)
 LIBBPF_MAJOR_VERSION := $(firstword $(subst ., ,$(LIBBPF_VERSION)))
 
@@ -110,7 +111,6 @@ SHARED_OBJDIR	:= $(OUTPUT)sharedobjs/
 STATIC_OBJDIR	:= $(OUTPUT)staticobjs/
 BPF_IN_SHARED	:= $(SHARED_OBJDIR)libbpf-in.o
 BPF_IN_STATIC	:= $(STATIC_OBJDIR)libbpf-in.o
-VERSION_SCRIPT	:= libbpf.map
 BPF_HELPER_DEFS	:= $(OUTPUT)bpf_helper_defs.h
 
 LIB_TARGET	:= $(addprefix $(OUTPUT),$(LIB_TARGET))
@@ -163,10 +163,10 @@ $(BPF_HELPER_DEFS): $(srctree)/tools/include/uapi/linux/bpf.h
 
 $(OUTPUT)libbpf.so: $(OUTPUT)libbpf.so.$(LIBBPF_VERSION)
 
-$(OUTPUT)libbpf.so.$(LIBBPF_VERSION): $(BPF_IN_SHARED)
+$(OUTPUT)libbpf.so.$(LIBBPF_VERSION): $(BPF_IN_SHARED) $(VERSION_SCRIPT)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) \
 		--shared -Wl,-soname,libbpf.so.$(LIBBPF_MAJOR_VERSION) \
-		-Wl,--version-script=$(VERSION_SCRIPT) $^ -lelf -lz -o $@
+		-Wl,--version-script=$(VERSION_SCRIPT) $< -lelf -lz -o $@
 	@ln -sf $(@F) $(OUTPUT)libbpf.so
 	@ln -sf $(@F) $(OUTPUT)libbpf.so.$(LIBBPF_MAJOR_VERSION)
 
@@ -181,7 +181,7 @@ $(OUTPUT)libbpf.pc:
 
 check: check_abi
 
-check_abi: $(OUTPUT)libbpf.so
+check_abi: $(OUTPUT)libbpf.so $(VERSION_SCRIPT)
 	@if [ "$(GLOBAL_SYM_COUNT)" != "$(VERSIONED_SYM_COUNT)" ]; then	 \
 		echo "Warning: Num of global symbols in $(BPF_IN_SHARED)"	 \
 		     "($(GLOBAL_SYM_COUNT)) does NOT match with num of"	 \
-- 
2.30.2



* [PATCH v3 bpf-next 08/14] libbpf: remove unused bpf_link's destroy operation, but add dealloc
  2021-07-30  5:33 [PATCH v3 bpf-next 00/14] BPF perf link and user-provided bpf_cookie Andrii Nakryiko
                   ` (6 preceding siblings ...)
  2021-07-30  5:34 ` [PATCH v3 bpf-next 07/14] libbpf: re-build libbpf.so when libbpf.map changes Andrii Nakryiko
@ 2021-07-30  5:34 ` Andrii Nakryiko
  2021-07-30  5:34 ` [PATCH v3 bpf-next 09/14] libbpf: use BPF perf link when supported by kernel Andrii Nakryiko
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  5:34 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra, Rafael David Tinoco

bpf_link->destroy() isn't used by any code, so remove it. Instead, add the
ability to override the deallocation procedure, with the default being a
plain free(link). This is necessary for cases when we want to "subclass"
struct bpf_link to keep extra information, as is the case in the next patch
adding struct bpf_link_perf.

Cc: Rafael David Tinoco <rafaeldtinoco@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/libbpf.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 313883179919..9654b2569ed0 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8808,7 +8808,7 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
 
 struct bpf_link {
 	int (*detach)(struct bpf_link *link);
-	int (*destroy)(struct bpf_link *link);
+	void (*dealloc)(struct bpf_link *link);
 	char *pin_path;		/* NULL, if not pinned */
 	int fd;			/* hook FD, -1 if not applicable */
 	bool disconnected;
@@ -8847,11 +8847,12 @@ int bpf_link__destroy(struct bpf_link *link)
 
 	if (!link->disconnected && link->detach)
 		err = link->detach(link);
-	if (link->destroy)
-		link->destroy(link);
 	if (link->pin_path)
 		free(link->pin_path);
-	free(link);
+	if (link->dealloc)
+		link->dealloc(link);
+	else
+		free(link);
 
 	return libbpf_err(err);
 }
-- 
2.30.2



* [PATCH v3 bpf-next 09/14] libbpf: use BPF perf link when supported by kernel
  2021-07-30  5:33 [PATCH v3 bpf-next 00/14] BPF perf link and user-provided bpf_cookie Andrii Nakryiko
                   ` (7 preceding siblings ...)
  2021-07-30  5:34 ` [PATCH v3 bpf-next 08/14] libbpf: remove unused bpf_link's destroy operation, but add dealloc Andrii Nakryiko
@ 2021-07-30  5:34 ` Andrii Nakryiko
  2021-07-30  5:34 ` [PATCH v3 bpf-next 10/14] libbpf: add bpf_cookie support to bpf_link_create() API Andrii Nakryiko
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  5:34 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra, Rafael David Tinoco

Detect kernel support for BPF perf link and prefer it when attaching to
perf_event, tracepoint, and kprobe/uprobe. The underlying perf_event FD
will be kept open until the BPF link is destroyed, at which point both the
perf_event FD and the BPF link FD will be closed.

This preserves the current behavior in which the perf_event FD stays open
for the duration of bpf_link's lifetime and the user is able to
"disconnect" bpf_link from the underlying FD (with bpf_link__disconnect()),
so that bpf_link__destroy() doesn't close the underlying perf_event FD.
When a BPF perf link is used, disconnect will keep both the perf_event and
bpf_link FDs open, so it will be up to the (advanced) user to close them.
This approach is demonstrated in the bpf_cookie.c selftests, added in this
patch set.

Cc: Rafael David Tinoco <rafaeldtinoco@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/libbpf.c | 111 +++++++++++++++++++++++++++++++++--------
 1 file changed, 90 insertions(+), 21 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 9654b2569ed0..d2230454e2d9 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -193,6 +193,8 @@ enum kern_feature_id {
 	FEAT_MODULE_BTF,
 	/* BTF_KIND_FLOAT support */
 	FEAT_BTF_FLOAT,
+	/* BPF perf link support */
+	FEAT_PERF_LINK,
 	__FEAT_CNT,
 };
 
@@ -4337,6 +4339,37 @@ static int probe_module_btf(void)
 	return !err;
 }
 
+static int probe_perf_link(void)
+{
+	struct bpf_load_program_attr attr;
+	struct bpf_insn insns[] = {
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	int prog_fd, link_fd, err;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.prog_type = BPF_PROG_TYPE_TRACEPOINT;
+	attr.insns = insns;
+	attr.insns_cnt = ARRAY_SIZE(insns);
+	attr.license = "GPL";
+	prog_fd = bpf_load_program_xattr(&attr, NULL, 0);
+	if (prog_fd < 0)
+		return -errno;
+
+	/* use invalid perf_event FD to get EBADF, if link is supported;
+	 * otherwise EINVAL should be returned
+	 */
+	link_fd = bpf_link_create(prog_fd, -1, BPF_PERF_EVENT, NULL);
+	err = -errno; /* close() can clobber errno */
+
+	if (link_fd >= 0)
+		close(link_fd);
+	close(prog_fd);
+
+	return link_fd < 0 && err == -EBADF;
+}
+
 enum kern_feature_result {
 	FEAT_UNKNOWN = 0,
 	FEAT_SUPPORTED = 1,
@@ -4387,6 +4420,9 @@ static struct kern_feature_desc {
 	[FEAT_BTF_FLOAT] = {
 		"BTF_KIND_FLOAT support", probe_kern_btf_float,
 	},
+	[FEAT_PERF_LINK] = {
+		"BPF perf link support", probe_perf_link,
+	},
 };
 
 static bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id)
@@ -8949,23 +8985,38 @@ int bpf_link__unpin(struct bpf_link *link)
 	return 0;
 }
 
-static int bpf_link__detach_perf_event(struct bpf_link *link)
+struct bpf_link_perf {
+	struct bpf_link link;
+	int perf_event_fd;
+};
+
+static int bpf_link_perf_detach(struct bpf_link *link)
 {
-	int err;
+	struct bpf_link_perf *perf_link = container_of(link, struct bpf_link_perf, link);
+	int err = 0;
 
-	err = ioctl(link->fd, PERF_EVENT_IOC_DISABLE, 0);
-	if (err)
+	if (ioctl(perf_link->perf_event_fd, PERF_EVENT_IOC_DISABLE, 0) < 0)
 		err = -errno;
 
+	if (perf_link->perf_event_fd != link->fd)
+		close(perf_link->perf_event_fd);
 	close(link->fd);
+
 	return libbpf_err(err);
 }
 
+static void bpf_link_perf_dealloc(struct bpf_link *link)
+{
+	struct bpf_link_perf *perf_link = container_of(link, struct bpf_link_perf, link);
+
+	free(perf_link);
+}
+
 struct bpf_link *bpf_program__attach_perf_event(struct bpf_program *prog, int pfd)
 {
 	char errmsg[STRERR_BUFSIZE];
-	struct bpf_link *link;
-	int prog_fd, err;
+	struct bpf_link_perf *link;
+	int prog_fd, link_fd = -1, err;
 
 	if (pfd < 0) {
 		pr_warn("prog '%s': invalid perf event FD %d\n",
@@ -8982,27 +9033,45 @@ struct bpf_link *bpf_program__attach_perf_event(struct bpf_program *prog, int pf
 	link = calloc(1, sizeof(*link));
 	if (!link)
 		return libbpf_err_ptr(-ENOMEM);
-	link->detach = &bpf_link__detach_perf_event;
-	link->fd = pfd;
+	link->link.detach = &bpf_link_perf_detach;
+	link->link.dealloc = &bpf_link_perf_dealloc;
+	link->perf_event_fd = pfd;
 
-	if (ioctl(pfd, PERF_EVENT_IOC_SET_BPF, prog_fd) < 0) {
-		err = -errno;
-		free(link);
-		pr_warn("prog '%s': failed to attach to pfd %d: %s\n",
-			prog->name, pfd, libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
-		if (err == -EPROTO)
-			pr_warn("prog '%s': try add PERF_SAMPLE_CALLCHAIN to or remove exclude_callchain_[kernel|user] from pfd %d\n",
-				prog->name, pfd);
-		return libbpf_err_ptr(err);
+	if (kernel_supports(prog->obj, FEAT_PERF_LINK)) {
+		link_fd = bpf_link_create(prog_fd, pfd, BPF_PERF_EVENT, NULL);
+		if (link_fd < 0) {
+			err = -errno;
+			pr_warn("prog '%s': failed to create BPF link for perf_event FD %d: %d (%s)\n",
+				prog->name, pfd,
+				err, libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
+			goto err_out;
+		}
+		link->link.fd = link_fd;
+	} else {
+		if (ioctl(pfd, PERF_EVENT_IOC_SET_BPF, prog_fd) < 0) {
+			err = -errno;
+			pr_warn("prog '%s': failed to attach to perf_event FD %d: %s\n",
+				prog->name, pfd, libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
+			if (err == -EPROTO)
+				pr_warn("prog '%s': try add PERF_SAMPLE_CALLCHAIN to or remove exclude_callchain_[kernel|user] from pfd %d\n",
+					prog->name, pfd);
+			goto err_out;
+		}
+		link->link.fd = pfd;
 	}
 	if (ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
 		err = -errno;
-		free(link);
-		pr_warn("prog '%s': failed to enable pfd %d: %s\n",
+		pr_warn("prog '%s': failed to enable perf_event FD %d: %s\n",
 			prog->name, pfd, libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
-		return libbpf_err_ptr(err);
+		goto err_out;
 	}
-	return link;
+
+	return &link->link;
+err_out:
+	if (link_fd >= 0)
+		close(link_fd);
+	free(link);
+	return libbpf_err_ptr(err);
 }
 
 /*
-- 
2.30.2



* [PATCH v3 bpf-next 10/14] libbpf: add bpf_cookie support to bpf_link_create() API
  2021-07-30  5:33 [PATCH v3 bpf-next 00/14] BPF perf link and user-provided bpf_cookie Andrii Nakryiko
                   ` (8 preceding siblings ...)
  2021-07-30  5:34 ` [PATCH v3 bpf-next 09/14] libbpf: use BPF perf link when supported by kernel Andrii Nakryiko
@ 2021-07-30  5:34 ` Andrii Nakryiko
  2021-07-30  5:34 ` [PATCH v3 bpf-next 11/14] libbpf: add bpf_cookie to perf_event, kprobe, uprobe, and tp attach APIs Andrii Nakryiko
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  5:34 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Add the ability to specify a bpf_cookie value when creating a BPF perf link
with the low-level bpf_link_create() API.

Given the BPF_LINK_CREATE command is growing and keeps getting new fields
that are specific to the type of BPF link, extend the libbpf side of the
bpf_link_create() API and the corresponding OPTS struct to accommodate such
changes. Add extra checks to prevent using incompatible/unexpected
combinations of fields.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c             | 32 +++++++++++++++++++++++++-------
 tools/lib/bpf/bpf.h             |  8 +++++++-
 tools/lib/bpf/libbpf_internal.h | 32 ++++++++++++++++++++++----------
 3 files changed, 54 insertions(+), 18 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 86dcac44f32f..2401fad090c5 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -684,8 +684,13 @@ int bpf_link_create(int prog_fd, int target_fd,
 	iter_info_len = OPTS_GET(opts, iter_info_len, 0);
 	target_btf_id = OPTS_GET(opts, target_btf_id, 0);
 
-	if (iter_info_len && target_btf_id)
-		return libbpf_err(-EINVAL);
+	/* validate we don't have unexpected combinations of non-zero fields */
+	if (iter_info_len || target_btf_id) {
+		if (iter_info_len && target_btf_id)
+			return libbpf_err(-EINVAL);
+		if (!OPTS_ZEROED(opts, target_btf_id))
+			return libbpf_err(-EINVAL);
+	}
 
 	memset(&attr, 0, sizeof(attr));
 	attr.link_create.prog_fd = prog_fd;
@@ -693,14 +698,27 @@ int bpf_link_create(int prog_fd, int target_fd,
 	attr.link_create.attach_type = attach_type;
 	attr.link_create.flags = OPTS_GET(opts, flags, 0);
 
-	if (iter_info_len) {
-		attr.link_create.iter_info =
-			ptr_to_u64(OPTS_GET(opts, iter_info, (void *)0));
-		attr.link_create.iter_info_len = iter_info_len;
-	} else if (target_btf_id) {
+	if (target_btf_id) {
 		attr.link_create.target_btf_id = target_btf_id;
+		goto proceed;
 	}
 
+	switch (attach_type) {
+	case BPF_TRACE_ITER:
+		attr.link_create.iter_info = ptr_to_u64(OPTS_GET(opts, iter_info, (void *)0));
+		attr.link_create.iter_info_len = iter_info_len;
+		break;
+	case BPF_PERF_EVENT:
+		attr.link_create.perf_event.bpf_cookie = OPTS_GET(opts, perf_event.bpf_cookie, 0);
+		if (!OPTS_ZEROED(opts, perf_event))
+			return libbpf_err(-EINVAL);
+		break;
+	default:
+		if (!OPTS_ZEROED(opts, flags))
+			return libbpf_err(-EINVAL);
+		break;
+	}
+proceed:
 	fd = sys_bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
 	return libbpf_err_errno(fd);
 }
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 4f758f8f50cd..6fffb3cdf39b 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -177,8 +177,14 @@ struct bpf_link_create_opts {
 	union bpf_iter_link_info *iter_info;
 	__u32 iter_info_len;
 	__u32 target_btf_id;
+	union {
+		struct {
+			__u64 bpf_cookie;
+		} perf_event;
+	};
+	size_t :0;
 };
-#define bpf_link_create_opts__last_field target_btf_id
+#define bpf_link_create_opts__last_field perf_event
 
 LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,
 			       enum bpf_attach_type attach_type,
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index f7b691d5f9eb..533b0211f40a 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -196,6 +196,17 @@ void *libbpf_add_mem(void **data, size_t *cap_cnt, size_t elem_sz,
 		     size_t cur_cnt, size_t max_cnt, size_t add_cnt);
 int libbpf_ensure_mem(void **data, size_t *cap_cnt, size_t elem_sz, size_t need_cnt);
 
+static inline bool libbpf_is_mem_zeroed(const char *p, ssize_t len)
+{
+	while (len > 0) {
+		if (*p)
+			return false;
+		p++;
+		len--;
+	}
+	return true;
+}
+
 static inline bool libbpf_validate_opts(const char *opts,
 					size_t opts_sz, size_t user_sz,
 					const char *type_name)
@@ -204,16 +215,9 @@ static inline bool libbpf_validate_opts(const char *opts,
 		pr_warn("%s size (%zu) is too small\n", type_name, user_sz);
 		return false;
 	}
-	if (user_sz > opts_sz) {
-		size_t i;
-
-		for (i = opts_sz; i < user_sz; i++) {
-			if (opts[i]) {
-				pr_warn("%s has non-zero extra bytes\n",
-					type_name);
-				return false;
-			}
-		}
+	if (!libbpf_is_mem_zeroed(opts + opts_sz, (ssize_t)user_sz - opts_sz)) {
+		pr_warn("%s has non-zero extra bytes\n", type_name);
+		return false;
 	}
 	return true;
 }
@@ -233,6 +237,14 @@ static inline bool libbpf_validate_opts(const char *opts,
 			(opts)->field = value;	\
 	} while (0)
 
+#define OPTS_ZEROED(opts, last_nonzero_field)				      \
+({									      \
+	ssize_t __off = offsetofend(typeof(*(opts)), last_nonzero_field);     \
+	!(opts) || libbpf_is_mem_zeroed((const void *)opts + __off,	      \
+					(opts)->sz - __off);		      \
+})
+
+
 int parse_cpu_mask_str(const char *s, bool **mask, int *mask_sz);
 int parse_cpu_mask_file(const char *fcpu, bool **mask, int *mask_sz);
 int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
-- 
2.30.2



* [PATCH v3 bpf-next 11/14] libbpf: add bpf_cookie to perf_event, kprobe, uprobe, and tp attach APIs
  2021-07-30  5:33 [PATCH v3 bpf-next 00/14] BPF perf link and user-provided bpf_cookie Andrii Nakryiko
                   ` (9 preceding siblings ...)
  2021-07-30  5:34 ` [PATCH v3 bpf-next 10/14] libbpf: add bpf_cookie support to bpf_link_create() API Andrii Nakryiko
@ 2021-07-30  5:34 ` Andrii Nakryiko
  2021-07-30  5:34 ` [PATCH v3 bpf-next 12/14] selftests/bpf: test low-level perf BPF link API Andrii Nakryiko
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  5:34 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra, Rafael David Tinoco

Wire through bpf_cookie for all attach APIs that use perf_event_open under
the hood:
  - for kprobes, extend the existing bpf_kprobe_opts with a bpf_cookie field;
  - for perf_event, uprobe, and tracepoint APIs, add their _opts variants and
    pass bpf_cookie through opts.

For kernels that don't support BPF_LINK_CREATE for perf_events, and thus
don't support bpf_cookie either, return an error and log a warning for the
user.

Cc: Rafael David Tinoco <rafaeldtinoco@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/libbpf.c   | 78 +++++++++++++++++++++++++++++++++-------
 tools/lib/bpf/libbpf.h   | 71 +++++++++++++++++++++++++++++-------
 tools/lib/bpf/libbpf.map |  3 ++
 3 files changed, 127 insertions(+), 25 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index d2230454e2d9..b8455641c9d5 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -9012,12 +9012,16 @@ static void bpf_link_perf_dealloc(struct bpf_link *link)
 	free(perf_link);
 }
 
-struct bpf_link *bpf_program__attach_perf_event(struct bpf_program *prog, int pfd)
+struct bpf_link *bpf_program__attach_perf_event_opts(struct bpf_program *prog, int pfd,
+						     const struct bpf_perf_event_opts *opts)
 {
 	char errmsg[STRERR_BUFSIZE];
 	struct bpf_link_perf *link;
 	int prog_fd, link_fd = -1, err;
 
+	if (!OPTS_VALID(opts, bpf_perf_event_opts))
+		return libbpf_err_ptr(-EINVAL);
+
 	if (pfd < 0) {
 		pr_warn("prog '%s': invalid perf event FD %d\n",
 			prog->name, pfd);
@@ -9038,7 +9042,10 @@ struct bpf_link *bpf_program__attach_perf_event(struct bpf_program *prog, int pf
 	link->perf_event_fd = pfd;
 
 	if (kernel_supports(prog->obj, FEAT_PERF_LINK)) {
-		link_fd = bpf_link_create(prog_fd, pfd, BPF_PERF_EVENT, NULL);
+		DECLARE_LIBBPF_OPTS(bpf_link_create_opts, link_opts,
+			.perf_event.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0));
+
+		link_fd = bpf_link_create(prog_fd, pfd, BPF_PERF_EVENT, &link_opts);
 		if (link_fd < 0) {
 			err = -errno;
 			pr_warn("prog '%s': failed to create BPF link for perf_event FD %d: %d (%s)\n",
@@ -9048,6 +9055,12 @@ struct bpf_link *bpf_program__attach_perf_event(struct bpf_program *prog, int pf
 		}
 		link->link.fd = link_fd;
 	} else {
+		if (OPTS_GET(opts, bpf_cookie, 0)) {
+			pr_warn("prog '%s': user context value is not supported\n", prog->name);
+			err = -EOPNOTSUPP;
+			goto err_out;
+		}
+
 		if (ioctl(pfd, PERF_EVENT_IOC_SET_BPF, prog_fd) < 0) {
 			err = -errno;
 			pr_warn("prog '%s': failed to attach to perf_event FD %d: %s\n",
@@ -9074,6 +9087,11 @@ struct bpf_link *bpf_program__attach_perf_event(struct bpf_program *prog, int pf
 	return libbpf_err_ptr(err);
 }
 
+struct bpf_link *bpf_program__attach_perf_event(struct bpf_program *prog, int pfd)
+{
+	return bpf_program__attach_perf_event_opts(prog, pfd, NULL);
+}
+
 /*
  * this function is expected to parse integer in the range of [0, 2^31-1] from
  * given file using scanf format string fmt. If actual parsed value is
@@ -9182,8 +9200,9 @@ static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
 struct bpf_link *
 bpf_program__attach_kprobe_opts(struct bpf_program *prog,
 				const char *func_name,
-				struct bpf_kprobe_opts *opts)
+				const struct bpf_kprobe_opts *opts)
 {
+	DECLARE_LIBBPF_OPTS(bpf_perf_event_opts, pe_opts);
 	char errmsg[STRERR_BUFSIZE];
 	struct bpf_link *link;
 	unsigned long offset;
@@ -9195,6 +9214,7 @@ bpf_program__attach_kprobe_opts(struct bpf_program *prog,
 
 	retprobe = OPTS_GET(opts, retprobe, false);
 	offset = OPTS_GET(opts, offset, 0);
+	pe_opts.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0);
 
 	pfd = perf_event_open_probe(false /* uprobe */, retprobe, func_name,
 				    offset, -1 /* pid */);
@@ -9204,7 +9224,7 @@ bpf_program__attach_kprobe_opts(struct bpf_program *prog,
 			libbpf_strerror_r(pfd, errmsg, sizeof(errmsg)));
 		return libbpf_err_ptr(pfd);
 	}
-	link = bpf_program__attach_perf_event(prog, pfd);
+	link = bpf_program__attach_perf_event_opts(prog, pfd, &pe_opts);
 	err = libbpf_get_error(link);
 	if (err) {
 		close(pfd);
@@ -9259,14 +9279,22 @@ static struct bpf_link *attach_kprobe(const struct bpf_sec_def *sec,
 	return link;
 }
 
-struct bpf_link *bpf_program__attach_uprobe(struct bpf_program *prog,
-					    bool retprobe, pid_t pid,
-					    const char *binary_path,
-					    size_t func_offset)
+LIBBPF_API struct bpf_link *
+bpf_program__attach_uprobe_opts(struct bpf_program *prog, pid_t pid,
+				const char *binary_path, size_t func_offset,
+				const struct bpf_uprobe_opts *opts)
 {
+	DECLARE_LIBBPF_OPTS(bpf_perf_event_opts, pe_opts);
 	char errmsg[STRERR_BUFSIZE];
 	struct bpf_link *link;
 	int pfd, err;
+	bool retprobe;
+
+	if (!OPTS_VALID(opts, bpf_uprobe_opts))
+		return libbpf_err_ptr(-EINVAL);
+
+	retprobe = OPTS_GET(opts, retprobe, false);
+	pe_opts.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0);
 
 	pfd = perf_event_open_probe(true /* uprobe */, retprobe,
 				    binary_path, func_offset, pid);
@@ -9277,7 +9305,7 @@ struct bpf_link *bpf_program__attach_uprobe(struct bpf_program *prog,
 			libbpf_strerror_r(pfd, errmsg, sizeof(errmsg)));
 		return libbpf_err_ptr(pfd);
 	}
-	link = bpf_program__attach_perf_event(prog, pfd);
+	link = bpf_program__attach_perf_event_opts(prog, pfd, &pe_opts);
 	err = libbpf_get_error(link);
 	if (err) {
 		close(pfd);
@@ -9290,6 +9318,16 @@ struct bpf_link *bpf_program__attach_uprobe(struct bpf_program *prog,
 	return link;
 }
 
+struct bpf_link *bpf_program__attach_uprobe(struct bpf_program *prog,
+					    bool retprobe, pid_t pid,
+					    const char *binary_path,
+					    size_t func_offset)
+{
+	DECLARE_LIBBPF_OPTS(bpf_uprobe_opts, opts, .retprobe = retprobe);
+
+	return bpf_program__attach_uprobe_opts(prog, pid, binary_path, func_offset, &opts);
+}
+
 static int determine_tracepoint_id(const char *tp_category,
 				   const char *tp_name)
 {
@@ -9340,14 +9378,21 @@ static int perf_event_open_tracepoint(const char *tp_category,
 	return pfd;
 }
 
-struct bpf_link *bpf_program__attach_tracepoint(struct bpf_program *prog,
-						const char *tp_category,
-						const char *tp_name)
+struct bpf_link *bpf_program__attach_tracepoint_opts(struct bpf_program *prog,
+						     const char *tp_category,
+						     const char *tp_name,
+						     const struct bpf_tracepoint_opts *opts)
 {
+	DECLARE_LIBBPF_OPTS(bpf_perf_event_opts, pe_opts);
 	char errmsg[STRERR_BUFSIZE];
 	struct bpf_link *link;
 	int pfd, err;
 
+	if (!OPTS_VALID(opts, bpf_tracepoint_opts))
+		return libbpf_err_ptr(-EINVAL);
+
+	pe_opts.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0);
+
 	pfd = perf_event_open_tracepoint(tp_category, tp_name);
 	if (pfd < 0) {
 		pr_warn("prog '%s': failed to create tracepoint '%s/%s' perf event: %s\n",
@@ -9355,7 +9400,7 @@ struct bpf_link *bpf_program__attach_tracepoint(struct bpf_program *prog,
 			libbpf_strerror_r(pfd, errmsg, sizeof(errmsg)));
 		return libbpf_err_ptr(pfd);
 	}
-	link = bpf_program__attach_perf_event(prog, pfd);
+	link = bpf_program__attach_perf_event_opts(prog, pfd, &pe_opts);
 	err = libbpf_get_error(link);
 	if (err) {
 		close(pfd);
@@ -9367,6 +9412,13 @@ struct bpf_link *bpf_program__attach_tracepoint(struct bpf_program *prog,
 	return link;
 }
 
+struct bpf_link *bpf_program__attach_tracepoint(struct bpf_program *prog,
+						const char *tp_category,
+						const char *tp_name)
+{
+	return bpf_program__attach_tracepoint_opts(prog, tp_category, tp_name, NULL);
+}
+
 static struct bpf_link *attach_tp(const struct bpf_sec_def *sec,
 				  struct bpf_program *prog)
 {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 1271d99bb7aa..1f4a67285365 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -104,17 +104,6 @@ struct bpf_object_open_opts {
 };
 #define bpf_object_open_opts__last_field btf_custom_path
 
-struct bpf_kprobe_opts {
-	/* size of this struct, for forward/backward compatiblity */
-	size_t sz;
-	/* function's offset to install kprobe to */
-	unsigned long offset;
-	/* kprobe is return probe */
-	bool retprobe;
-	size_t :0;
-};
-#define bpf_kprobe_opts__last_field retprobe
-
 LIBBPF_API struct bpf_object *bpf_object__open(const char *path);
 LIBBPF_API struct bpf_object *
 bpf_object__open_file(const char *path, const struct bpf_object_open_opts *opts);
@@ -255,24 +244,82 @@ LIBBPF_API int bpf_link__destroy(struct bpf_link *link);
 
 LIBBPF_API struct bpf_link *
 bpf_program__attach(struct bpf_program *prog);
+
+struct bpf_perf_event_opts {
+	/* size of this struct, for forward/backward compatiblity */
+	size_t sz;
+	/* custom user-provided value fetchable through bpf_get_attach_cookie() */
+	__u64 bpf_cookie;
+};
+#define bpf_perf_event_opts__last_field bpf_cookie
+
 LIBBPF_API struct bpf_link *
 bpf_program__attach_perf_event(struct bpf_program *prog, int pfd);
+
+LIBBPF_API struct bpf_link *
+bpf_program__attach_perf_event_opts(struct bpf_program *prog, int pfd,
+				    const struct bpf_perf_event_opts *opts);
+
+struct bpf_kprobe_opts {
+	/* size of this struct, for forward/backward compatiblity */
+	size_t sz;
+	/* custom user-provided value fetchable through bpf_get_attach_cookie() */
+	__u64 bpf_cookie;
+	/* function's offset to install kprobe to */
+	unsigned long offset;
+	/* kprobe is return probe */
+	bool retprobe;
+	size_t :0;
+};
+#define bpf_kprobe_opts__last_field retprobe
+
 LIBBPF_API struct bpf_link *
 bpf_program__attach_kprobe(struct bpf_program *prog, bool retprobe,
 			   const char *func_name);
 LIBBPF_API struct bpf_link *
 bpf_program__attach_kprobe_opts(struct bpf_program *prog,
                                 const char *func_name,
-                                struct bpf_kprobe_opts *opts);
+                                const struct bpf_kprobe_opts *opts);
+
+struct bpf_uprobe_opts {
+	/* size of this struct, for forward/backward compatiblity */
+	size_t sz;
+	/* custom user-provided value fetchable through bpf_get_attach_cookie() */
+	__u64 bpf_cookie;
+	/* uprobe is return probe, invoked at function return time */
+	bool retprobe;
+	size_t :0;
+};
+#define bpf_uprobe_opts__last_field retprobe
+
 LIBBPF_API struct bpf_link *
 bpf_program__attach_uprobe(struct bpf_program *prog, bool retprobe,
 			   pid_t pid, const char *binary_path,
 			   size_t func_offset);
+LIBBPF_API struct bpf_link *
+bpf_program__attach_uprobe_opts(struct bpf_program *prog, pid_t pid,
+				const char *binary_path, size_t func_offset,
+				const struct bpf_uprobe_opts *opts);
+
+struct bpf_tracepoint_opts {
+	/* size of this struct, for forward/backward compatiblity */
+	size_t sz;
+	/* custom user-provided value fetchable through bpf_get_attach_cookie() */
+	__u64 bpf_cookie;
+};
+#define bpf_tracepoint_opts__last_field bpf_cookie
+
 LIBBPF_API struct bpf_link *
 bpf_program__attach_tracepoint(struct bpf_program *prog,
 			       const char *tp_category,
 			       const char *tp_name);
 LIBBPF_API struct bpf_link *
+bpf_program__attach_tracepoint_opts(struct bpf_program *prog,
+				    const char *tp_category,
+				    const char *tp_name,
+				    const struct bpf_tracepoint_opts *opts);
+
+LIBBPF_API struct bpf_link *
 bpf_program__attach_raw_tracepoint(struct bpf_program *prog,
 				   const char *tp_name);
 LIBBPF_API struct bpf_link *
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 5aca3686ca5e..090aca3452bc 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -374,6 +374,9 @@ LIBBPF_0.5.0 {
 		bpf_map__pin_path;
 		bpf_map_lookup_and_delete_elem_flags;
 		bpf_program__attach_kprobe_opts;
+		bpf_program__attach_perf_event_opts;
+		bpf_program__attach_tracepoint_opts;
+		bpf_program__attach_uprobe_opts;
 		bpf_object__gen_loader;
 		btf__load_from_kernel_by_id;
 		btf__load_from_kernel_by_id_split;
-- 
2.30.2



* [PATCH v3 bpf-next 12/14] selftests/bpf: test low-level perf BPF link API
  2021-07-30  5:33 [PATCH v3 bpf-next 00/14] BPF perf link and user-provided bpf_cookie Andrii Nakryiko
                   ` (10 preceding siblings ...)
  2021-07-30  5:34 ` [PATCH v3 bpf-next 11/14] libbpf: add bpf_cookie to perf_event, kprobe, uprobe, and tp attach APIs Andrii Nakryiko
@ 2021-07-30  5:34 ` Andrii Nakryiko
  2021-07-30  5:34 ` [PATCH v3 bpf-next 13/14] selftests/bpf: extract uprobe-related helpers into trace_helpers.{c,h} Andrii Nakryiko
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  5:34 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Add tests utilizing the low-level bpf_link_create() API to create a perf BPF link.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/prog_tests/perf_link.c      | 89 +++++++++++++++++++
 .../selftests/bpf/progs/test_perf_link.c      | 16 ++++
 2 files changed, 105 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_link.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_perf_link.c

diff --git a/tools/testing/selftests/bpf/prog_tests/perf_link.c b/tools/testing/selftests/bpf/prog_tests/perf_link.c
new file mode 100644
index 000000000000..b1abd0c46607
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/perf_link.c
@@ -0,0 +1,89 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#define _GNU_SOURCE
+#include <pthread.h>
+#include <sched.h>
+#include <test_progs.h>
+#include "test_perf_link.skel.h"
+
+static void burn_cpu(void)
+{
+	volatile int j = 0;
+	cpu_set_t cpu_set;
+	int i, err;
+
+	/* generate some branches on cpu 0 */
+	CPU_ZERO(&cpu_set);
+	CPU_SET(0, &cpu_set);
+	err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set);
+	ASSERT_OK(err, "set_thread_affinity");
+
+	/* spin the loop for a while (random high number) */
+	for (i = 0; i < 1000000; ++i)
+		++j;
+}
+
+void test_perf_link(void)
+{
+	struct test_perf_link *skel = NULL;
+	struct perf_event_attr attr;
+	int pfd = -1, link_fd = -1, err;
+	int run_cnt_before, run_cnt_after;
+	struct bpf_link_info info;
+	__u32 info_len = sizeof(info);
+
+	/* create perf event */
+	memset(&attr, 0, sizeof(attr));
+	attr.size = sizeof(attr);
+	attr.type = PERF_TYPE_SOFTWARE;
+	attr.config = PERF_COUNT_SW_CPU_CLOCK;
+	attr.freq = 1;
+	attr.sample_freq = 4000;
+	pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC);
+	if (!ASSERT_GE(pfd, 0, "perf_fd"))
+		goto cleanup;
+
+	skel = test_perf_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	link_fd = bpf_link_create(bpf_program__fd(skel->progs.handler), pfd,
+				  BPF_PERF_EVENT, NULL);
+	if (!ASSERT_GE(link_fd, 0, "link_fd"))
+		goto cleanup;
+
+	memset(&info, 0, sizeof(info));
+	err = bpf_obj_get_info_by_fd(link_fd, &info, &info_len);
+	if (!ASSERT_OK(err, "link_get_info"))
+		goto cleanup;
+
+	ASSERT_EQ(info.type, BPF_LINK_TYPE_PERF_EVENT, "link_type");
+	ASSERT_GT(info.id, 0, "link_id");
+	ASSERT_GT(info.prog_id, 0, "link_prog_id");
+
+	/* ensure we get at least one perf_event prog execution */
+	burn_cpu();
+	ASSERT_GT(skel->bss->run_cnt, 0, "run_cnt");
+
+	/* perf_event is still active, but after we close the link the BPF
+	 * program shouldn't be executed anymore
+	 */
+	close(link_fd);
+	link_fd = -1;
+
+	/* make sure there are no stragglers */
+	kern_sync_rcu();
+
+	run_cnt_before = skel->bss->run_cnt;
+	burn_cpu();
+	run_cnt_after = skel->bss->run_cnt;
+
+	ASSERT_EQ(run_cnt_before, run_cnt_after, "run_cnt_before_after");
+
+cleanup:
+	if (link_fd >= 0)
+		close(link_fd);
+	if (pfd >= 0)
+		close(pfd);
+	test_perf_link__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_perf_link.c b/tools/testing/selftests/bpf/progs/test_perf_link.c
new file mode 100644
index 000000000000..c1db9fd98d0c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_perf_link.c
@@ -0,0 +1,16 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+int run_cnt = 0;
+
+SEC("perf_event")
+int handler(struct pt_regs *ctx)
+{
+	__sync_fetch_and_add(&run_cnt, 1);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.30.2



* [PATCH v3 bpf-next 13/14] selftests/bpf: extract uprobe-related helpers into trace_helpers.{c,h}
  2021-07-30  5:33 [PATCH v3 bpf-next 00/14] BPF perf link and user-provided bpf_cookie Andrii Nakryiko
                   ` (11 preceding siblings ...)
  2021-07-30  5:34 ` [PATCH v3 bpf-next 12/14] selftests/bpf: test low-level perf BPF link API Andrii Nakryiko
@ 2021-07-30  5:34 ` Andrii Nakryiko
  2021-07-30  5:34 ` [PATCH v3 bpf-next 14/14] selftests/bpf: add bpf_cookie selftests for high-level APIs Andrii Nakryiko
  2021-08-01  5:11 ` [PATCH bpf-next v4] libbpf: introduce legacy kprobe events support Rafael David Tinoco
  14 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  5:34 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Extract two helpers used for working with uprobes into trace_helpers.{c,h} to
be re-used across multiple uprobe-using selftests. Also rename get_offset()
to the more appropriate get_uprobe_offset().

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/prog_tests/attach_probe.c   | 61 +----------------
 tools/testing/selftests/bpf/trace_helpers.c   | 66 +++++++++++++++++++
 tools/testing/selftests/bpf/trace_helpers.h   |  3 +
 3 files changed, 70 insertions(+), 60 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/attach_probe.c b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
index ec11e20d2b92..e40b41c44f8b 100644
--- a/tools/testing/selftests/bpf/prog_tests/attach_probe.c
+++ b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
@@ -2,65 +2,6 @@
 #include <test_progs.h>
 #include "test_attach_probe.skel.h"
 
-#if defined(__powerpc64__) && defined(_CALL_ELF) && _CALL_ELF == 2
-
-#define OP_RT_RA_MASK   0xffff0000UL
-#define LIS_R2          0x3c400000UL
-#define ADDIS_R2_R12    0x3c4c0000UL
-#define ADDI_R2_R2      0x38420000UL
-
-static ssize_t get_offset(ssize_t addr, ssize_t base)
-{
-	u32 *insn = (u32 *) addr;
-
-	/*
-	 * A PPC64 ABIv2 function may have a local and a global entry
-	 * point. We need to use the local entry point when patching
-	 * functions, so identify and step over the global entry point
-	 * sequence.
-	 *
-	 * The global entry point sequence is always of the form:
-	 *
-	 * addis r2,r12,XXXX
-	 * addi  r2,r2,XXXX
-	 *
-	 * A linker optimisation may convert the addis to lis:
-	 *
-	 * lis   r2,XXXX
-	 * addi  r2,r2,XXXX
-	 */
-	if ((((*insn & OP_RT_RA_MASK) == ADDIS_R2_R12) ||
-	     ((*insn & OP_RT_RA_MASK) == LIS_R2)) &&
-	    ((*(insn + 1) & OP_RT_RA_MASK) == ADDI_R2_R2))
-		return (ssize_t)(insn + 2) - base;
-	else
-		return addr - base;
-}
-#else
-#define get_offset(addr, base) (addr - base)
-#endif
-
-ssize_t get_base_addr() {
-	size_t start, offset;
-	char buf[256];
-	FILE *f;
-
-	f = fopen("/proc/self/maps", "r");
-	if (!f)
-		return -errno;
-
-	while (fscanf(f, "%zx-%*x %s %zx %*[^\n]\n",
-		      &start, buf, &offset) == 3) {
-		if (strcmp(buf, "r-xp") == 0) {
-			fclose(f);
-			return start - offset;
-		}
-	}
-
-	fclose(f);
-	return -EINVAL;
-}
-
 void test_attach_probe(void)
 {
 	int duration = 0;
@@ -74,7 +15,7 @@ void test_attach_probe(void)
 	if (CHECK(base_addr < 0, "get_base_addr",
 		  "failed to find base addr: %zd", base_addr))
 		return;
-	uprobe_offset = get_offset((size_t)&get_base_addr, base_addr);
+	uprobe_offset = get_uprobe_offset(&get_base_addr, base_addr);
 
 	skel = test_attach_probe__open_and_load();
 	if (CHECK(!skel, "skel_open", "failed to open skeleton\n"))
diff --git a/tools/testing/selftests/bpf/trace_helpers.c b/tools/testing/selftests/bpf/trace_helpers.c
index 1bbd1d9830c8..381dafce1d8f 100644
--- a/tools/testing/selftests/bpf/trace_helpers.c
+++ b/tools/testing/selftests/bpf/trace_helpers.c
@@ -136,3 +136,69 @@ void read_trace_pipe(void)
 		}
 	}
 }
+
+#if defined(__powerpc64__) && defined(_CALL_ELF) && _CALL_ELF == 2
+
+#define OP_RT_RA_MASK   0xffff0000UL
+#define LIS_R2          0x3c400000UL
+#define ADDIS_R2_R12    0x3c4c0000UL
+#define ADDI_R2_R2      0x38420000UL
+
+ssize_t get_uprobe_offset(const void *addr, ssize_t base)
+{
+	u32 *insn = (u32 *)(uintptr_t)addr;
+
+	/*
+	 * A PPC64 ABIv2 function may have a local and a global entry
+	 * point. We need to use the local entry point when patching
+	 * functions, so identify and step over the global entry point
+	 * sequence.
+	 *
+	 * The global entry point sequence is always of the form:
+	 *
+	 * addis r2,r12,XXXX
+	 * addi  r2,r2,XXXX
+	 *
+	 * A linker optimisation may convert the addis to lis:
+	 *
+	 * lis   r2,XXXX
+	 * addi  r2,r2,XXXX
+	 */
+	if ((((*insn & OP_RT_RA_MASK) == ADDIS_R2_R12) ||
+	     ((*insn & OP_RT_RA_MASK) == LIS_R2)) &&
+	    ((*(insn + 1) & OP_RT_RA_MASK) == ADDI_R2_R2))
+		return (ssize_t)(insn + 2) - base;
+	else
+		return (uintptr_t)addr - base;
+}
+
+#else
+
+ssize_t get_uprobe_offset(const void *addr, ssize_t base)
+{
+	return (uintptr_t)addr - base;
+}
+
+#endif
+
+ssize_t get_base_addr(void)
+{
+	size_t start, offset;
+	char buf[256];
+	FILE *f;
+
+	f = fopen("/proc/self/maps", "r");
+	if (!f)
+		return -errno;
+
+	while (fscanf(f, "%zx-%*x %s %zx %*[^\n]\n",
+		      &start, buf, &offset) == 3) {
+		if (strcmp(buf, "r-xp") == 0) {
+			fclose(f);
+			return start - offset;
+		}
+	}
+
+	fclose(f);
+	return -EINVAL;
+}
diff --git a/tools/testing/selftests/bpf/trace_helpers.h b/tools/testing/selftests/bpf/trace_helpers.h
index f62fdef9e589..3d9435b3dd3b 100644
--- a/tools/testing/selftests/bpf/trace_helpers.h
+++ b/tools/testing/selftests/bpf/trace_helpers.h
@@ -18,4 +18,7 @@ int kallsyms_find(const char *sym, unsigned long long *addr);
 
 void read_trace_pipe(void);
 
+ssize_t get_uprobe_offset(const void *addr, ssize_t base);
+ssize_t get_base_addr(void);
+
 #endif
-- 
2.30.2



* [PATCH v3 bpf-next 14/14] selftests/bpf: add bpf_cookie selftests for high-level APIs
  2021-07-30  5:33 [PATCH v3 bpf-next 00/14] BPF perf link and user-provided bpf_cookie Andrii Nakryiko
                   ` (12 preceding siblings ...)
  2021-07-30  5:34 ` [PATCH v3 bpf-next 13/14] selftests/bpf: extract uprobe-related helpers into trace_helpers.{c,h} Andrii Nakryiko
@ 2021-07-30  5:34 ` Andrii Nakryiko
  2021-08-01  5:11 ` [PATCH bpf-next v4] libbpf: introduce legacy kprobe events support Rafael David Tinoco
  14 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2021-07-30  5:34 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Peter Zijlstra

Add a selftest with a few subtests testing proper bpf_cookie usage.

Kprobe and uprobe subtests are pretty straightforward and just validate that
the same BPF program, attached with different bpf_cookie values, is triggered
with those respective values.

Tracepoint subtest is a bit more interesting, as it is the only
perf_event-based BPF hook that shares bpf_prog_array between multiple
perf_events internally. This means that the same BPF program can't be attached
to the same tracepoint multiple times. So we use three identical copies of the
program. This arrangement allows testing bpf_prog_array_copy()'s handling of
bpf_prog_array list manipulation logic when programs are attached and detached.
The test
validates that bpf_cookie isn't mixed up and isn't lost during such list
manipulations.

The perf_event subtest validates that two BPF links can be created against the
same perf_event (though not at the same time, as only one BPF program can be
attached to the perf_event itself), and that a different bpf_cookie value can
be specified for each.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/prog_tests/bpf_cookie.c     | 254 ++++++++++++++++++
 .../selftests/bpf/progs/test_bpf_cookie.c     |  85 ++++++
 2 files changed, 339 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_cookie.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_bpf_cookie.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c b/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c
new file mode 100644
index 000000000000..5eea3c3a40fe
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c
@@ -0,0 +1,254 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#define _GNU_SOURCE
+#include <pthread.h>
+#include <sched.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+#include <test_progs.h>
+#include "test_bpf_cookie.skel.h"
+
+static void kprobe_subtest(struct test_bpf_cookie *skel)
+{
+	DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, opts);
+	struct bpf_link *link1 = NULL, *link2 = NULL;
+	struct bpf_link *retlink1 = NULL, *retlink2 = NULL;
+
+	/* attach two kprobes */
+	opts.bpf_cookie = 0x1;
+	opts.retprobe = false;
+	link1 = bpf_program__attach_kprobe_opts(skel->progs.handle_kprobe,
+						 SYS_NANOSLEEP_KPROBE_NAME, &opts);
+	if (!ASSERT_OK_PTR(link1, "link1"))
+		goto cleanup;
+
+	opts.bpf_cookie = 0x2;
+	opts.retprobe = false;
+	link2 = bpf_program__attach_kprobe_opts(skel->progs.handle_kprobe,
+						 SYS_NANOSLEEP_KPROBE_NAME, &opts);
+	if (!ASSERT_OK_PTR(link2, "link2"))
+		goto cleanup;
+
+	/* attach two kretprobes */
+	opts.bpf_cookie = 0x10;
+	opts.retprobe = true;
+	retlink1 = bpf_program__attach_kprobe_opts(skel->progs.handle_kretprobe,
+						    SYS_NANOSLEEP_KPROBE_NAME, &opts);
+	if (!ASSERT_OK_PTR(retlink1, "retlink1"))
+		goto cleanup;
+
+	opts.bpf_cookie = 0x20;
+	opts.retprobe = true;
+	retlink2 = bpf_program__attach_kprobe_opts(skel->progs.handle_kretprobe,
+						    SYS_NANOSLEEP_KPROBE_NAME, &opts);
+	if (!ASSERT_OK_PTR(retlink2, "retlink2"))
+		goto cleanup;
+
+	/* trigger kprobe && kretprobe */
+	usleep(1);
+
+	ASSERT_EQ(skel->bss->kprobe_res, 0x1 | 0x2, "kprobe_res");
+	ASSERT_EQ(skel->bss->kretprobe_res, 0x10 | 0x20, "kretprobe_res");
+
+cleanup:
+	bpf_link__destroy(link1);
+	bpf_link__destroy(link2);
+	bpf_link__destroy(retlink1);
+	bpf_link__destroy(retlink2);
+}
+
+static void uprobe_subtest(struct test_bpf_cookie *skel)
+{
+	DECLARE_LIBBPF_OPTS(bpf_uprobe_opts, opts);
+	struct bpf_link *link1 = NULL, *link2 = NULL;
+	struct bpf_link *retlink1 = NULL, *retlink2 = NULL;
+	size_t uprobe_offset;
+	ssize_t base_addr;
+
+	base_addr = get_base_addr();
+	uprobe_offset = get_uprobe_offset(&get_base_addr, base_addr);
+
+	/* attach two uprobes */
+	opts.bpf_cookie = 0x100;
+	opts.retprobe = false;
+	link1 = bpf_program__attach_uprobe_opts(skel->progs.handle_uprobe, 0 /* self pid */,
+						"/proc/self/exe", uprobe_offset, &opts);
+	if (!ASSERT_OK_PTR(link1, "link1"))
+		goto cleanup;
+
+	opts.bpf_cookie = 0x200;
+	opts.retprobe = false;
+	link2 = bpf_program__attach_uprobe_opts(skel->progs.handle_uprobe, -1 /* any pid */,
+						"/proc/self/exe", uprobe_offset, &opts);
+	if (!ASSERT_OK_PTR(link2, "link2"))
+		goto cleanup;
+
+	/* attach two uretprobes */
+	opts.bpf_cookie = 0x1000;
+	opts.retprobe = true;
+	retlink1 = bpf_program__attach_uprobe_opts(skel->progs.handle_uretprobe, -1 /* any pid */,
+						   "/proc/self/exe", uprobe_offset, &opts);
+	if (!ASSERT_OK_PTR(retlink1, "retlink1"))
+		goto cleanup;
+
+	opts.bpf_cookie = 0x2000;
+	opts.retprobe = true;
+	retlink2 = bpf_program__attach_uprobe_opts(skel->progs.handle_uretprobe, 0 /* self pid */,
+						   "/proc/self/exe", uprobe_offset, &opts);
+	if (!ASSERT_OK_PTR(retlink2, "retlink2"))
+		goto cleanup;
+
+	/* trigger uprobe && uretprobe */
+	get_base_addr();
+
+	ASSERT_EQ(skel->bss->uprobe_res, 0x100 | 0x200, "uprobe_res");
+	ASSERT_EQ(skel->bss->uretprobe_res, 0x1000 | 0x2000, "uretprobe_res");
+
+cleanup:
+	bpf_link__destroy(link1);
+	bpf_link__destroy(link2);
+	bpf_link__destroy(retlink1);
+	bpf_link__destroy(retlink2);
+}
+
+static void tp_subtest(struct test_bpf_cookie *skel)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tracepoint_opts, opts);
+	struct bpf_link *link1 = NULL, *link2 = NULL, *link3 = NULL;
+
+	/* attach first tp prog */
+	opts.bpf_cookie = 0x10000;
+	link1 = bpf_program__attach_tracepoint_opts(skel->progs.handle_tp1,
+						    "syscalls", "sys_enter_nanosleep", &opts);
+	if (!ASSERT_OK_PTR(link1, "link1"))
+		goto cleanup;
+
+	/* attach second tp prog */
+	opts.bpf_cookie = 0x20000;
+	link2 = bpf_program__attach_tracepoint_opts(skel->progs.handle_tp2,
+						    "syscalls", "sys_enter_nanosleep", &opts);
+	if (!ASSERT_OK_PTR(link2, "link2"))
+		goto cleanup;
+
+	/* trigger tracepoints */
+	usleep(1);
+
+	ASSERT_EQ(skel->bss->tp_res, 0x10000 | 0x20000, "tp_res1");
+
+	/* now we detach the first prog and attach a third one, which causes
+	 * two internal calls to bpf_prog_array_copy(), shuffling
+	 * bpf_prog_array_items around. We test here that we don't lose track
+	 * of associated bpf_cookies.
+	 */
+	bpf_link__destroy(link1);
+	link1 = NULL;
+	kern_sync_rcu();
+	skel->bss->tp_res = 0;
+
+	/* attach third tp prog */
+	opts.bpf_cookie = 0x40000;
+	link3 = bpf_program__attach_tracepoint_opts(skel->progs.handle_tp3,
+						    "syscalls", "sys_enter_nanosleep", &opts);
+	if (!ASSERT_OK_PTR(link3, "link3"))
+		goto cleanup;
+
+	/* trigger tracepoints */
+	usleep(1);
+
+	ASSERT_EQ(skel->bss->tp_res, 0x20000 | 0x40000, "tp_res2");
+
+cleanup:
+	bpf_link__destroy(link1);
+	bpf_link__destroy(link2);
+	bpf_link__destroy(link3);
+}
+
+static void burn_cpu(void)
+{
+	volatile int j = 0;
+	cpu_set_t cpu_set;
+	int i, err;
+
+	/* generate some branches on cpu 0 */
+	CPU_ZERO(&cpu_set);
+	CPU_SET(0, &cpu_set);
+	err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set);
+	ASSERT_OK(err, "set_thread_affinity");
+
+	/* spin the loop for a while (random high number) */
+	for (i = 0; i < 1000000; ++i)
+		++j;
+}
+
+static void pe_subtest(struct test_bpf_cookie *skel)
+{
+	DECLARE_LIBBPF_OPTS(bpf_perf_event_opts, opts);
+	struct bpf_link *link = NULL;
+	struct perf_event_attr attr;
+	int pfd = -1;
+
+	/* create perf event */
+	memset(&attr, 0, sizeof(attr));
+	attr.size = sizeof(attr);
+	attr.type = PERF_TYPE_SOFTWARE;
+	attr.config = PERF_COUNT_SW_CPU_CLOCK;
+	attr.freq = 1;
+	attr.sample_freq = 4000;
+	pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC);
+	if (!ASSERT_GE(pfd, 0, "perf_fd"))
+		goto cleanup;
+
+	opts.bpf_cookie = 0x100000;
+	link = bpf_program__attach_perf_event_opts(skel->progs.handle_pe, pfd, &opts);
+	if (!ASSERT_OK_PTR(link, "link1"))
+		goto cleanup;
+
+	burn_cpu(); /* trigger BPF prog */
+
+	ASSERT_EQ(skel->bss->pe_res, 0x100000, "pe_res1");
+
+	/* prevent bpf_link__destroy() closing pfd itself */
+	bpf_link__disconnect(link);
+	/* close BPF link's FD explicitly */
+	close(bpf_link__fd(link));
+	/* free up memory used by struct bpf_link */
+	bpf_link__destroy(link);
+	link = NULL;
+	kern_sync_rcu();
+	skel->bss->pe_res = 0;
+
+	opts.bpf_cookie = 0x200000;
+	link = bpf_program__attach_perf_event_opts(skel->progs.handle_pe, pfd, &opts);
+	if (!ASSERT_OK_PTR(link, "link2"))
+		goto cleanup;
+
+	burn_cpu(); /* trigger BPF prog */
+
+	ASSERT_EQ(skel->bss->pe_res, 0x200000, "pe_res2");
+
+cleanup:
+	close(pfd);
+	bpf_link__destroy(link);
+}
+
+void test_bpf_cookie(void)
+{
+	struct test_bpf_cookie *skel;
+
+	skel = test_bpf_cookie__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	skel->bss->my_tid = syscall(SYS_gettid);
+
+	if (test__start_subtest("kprobe"))
+		kprobe_subtest(skel);
+	if (test__start_subtest("uprobe"))
+		uprobe_subtest(skel);
+	if (test__start_subtest("tracepoint"))
+		tp_subtest(skel);
+	if (test__start_subtest("perf_event"))
+		pe_subtest(skel);
+
+	test_bpf_cookie__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_bpf_cookie.c b/tools/testing/selftests/bpf/progs/test_bpf_cookie.c
new file mode 100644
index 000000000000..2d3a7710e2ce
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_bpf_cookie.c
@@ -0,0 +1,85 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+int my_tid;
+
+int kprobe_res;
+int kprobe_multi_res;
+int kretprobe_res;
+int uprobe_res;
+int uretprobe_res;
+int tp_res;
+int pe_res;
+
+static void update(void *ctx, int *res)
+{
+	if (my_tid != (u32)bpf_get_current_pid_tgid())
+		return;
+
+	*res |= bpf_get_attach_cookie(ctx);
+}
+
+SEC("kprobe/sys_nanosleep")
+int handle_kprobe(struct pt_regs *ctx)
+{
+	update(ctx, &kprobe_res);
+	return 0;
+}
+
+SEC("kretprobe/sys_nanosleep")
+int handle_kretprobe(struct pt_regs *ctx)
+{
+	update(ctx, &kretprobe_res);
+	return 0;
+}
+
+SEC("uprobe/trigger_func")
+int handle_uprobe(struct pt_regs *ctx)
+{
+	update(ctx, &uprobe_res);
+	return 0;
+}
+
+SEC("uretprobe/trigger_func")
+int handle_uretprobe(struct pt_regs *ctx)
+{
+	update(ctx, &uretprobe_res);
+	return 0;
+}
+
+/* bpf_prog_array, used by the kernel internally to keep track of BPF programs
+ * attached to a given BPF hook (e.g., for tracepoints), doesn't allow the same
+ * BPF program to be attached multiple times. So have three identical copies
+ * ready to attach to the same tracepoint.
+ */
+SEC("tp/syscalls/sys_enter_nanosleep")
+int handle_tp1(struct pt_regs *ctx)
+{
+	update(ctx, &tp_res);
+	return 0;
+}
+SEC("tp/syscalls/sys_enter_nanosleep")
+int handle_tp2(struct pt_regs *ctx)
+{
+	update(ctx, &tp_res);
+	return 0;
+}
+SEC("tp/syscalls/sys_enter_nanosleep")
+int handle_tp3(void *ctx)
+{
+	update(ctx, &tp_res);
+	return 1;
+}
+
+SEC("perf_event")
+int handle_pe(struct pt_regs *ctx)
+{
+	update(ctx, &pe_res);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.30.2



* Re: [PATCH v3 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function
  2021-07-30  5:34 ` [PATCH v3 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function Andrii Nakryiko
@ 2021-07-30 17:19   ` Yonghong Song
  2021-08-09 22:43   ` Daniel Borkmann
  1 sibling, 0 replies; 40+ messages in thread
From: Yonghong Song @ 2021-07-30 17:19 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel; +Cc: kernel-team, Peter Zijlstra



On 7/29/21 10:34 PM, Andrii Nakryiko wrote:
> Turn BPF_PROG_RUN into a proper always inlined function. No functional and
> performance changes are intended, but it makes it much easier to understand
> what's going on with how BPF programs actually get executed. It's more
> obvious what types and callbacks are expected. Also extra () around input
> parameters can be dropped, as well as `__` variable prefixes intended to avoid
> naming collisions, which makes the code simpler to read and write.
> 
> This refactoring also highlighted one possible issue. BPF_PROG_RUN is both
> a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning
> BPF_PROG_RUN into a function causes naming conflict compilation error. So
> rename BPF_PROG_RUN into lower-case bpf_prog_run(), similar to
> bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. To avoid unnecessary code
> churn across many networking calls to BPF_PROG_RUN, #define BPF_PROG_RUN as an
> alias to bpf_prog_run.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Yonghong Song <yhs@fb.com>


* Re: [PATCH v3 bpf-next 04/14] bpf: implement minimal BPF perf link
  2021-07-30  5:34 ` [PATCH v3 bpf-next 04/14] bpf: implement minimal BPF perf link Andrii Nakryiko
@ 2021-07-30 17:24   ` Yonghong Song
  0 siblings, 0 replies; 40+ messages in thread
From: Yonghong Song @ 2021-07-30 17:24 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel; +Cc: kernel-team, Peter Zijlstra



On 7/29/21 10:34 PM, Andrii Nakryiko wrote:
> Introduce a new type of BPF link - BPF perf link. This brings perf_event-based
> BPF program attachments (perf_event, tracepoints, kprobes, and uprobes) into
> the common BPF link infrastructure, allowing users to list all active
> perf_event-based attachments, auto-detach the BPF program from a perf_event
> when the link's FD is closed, and get generic BPF link fdinfo/get_info
> functionality.
> 
> BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
> are currently supported.
> 
> Force-detaching and atomic BPF program updates are not yet implemented, but
> with perf_event-based BPF links we now have common framework for this without
> the need to extend ioctl()-based perf_event interface.
> 
> One interesting consideration is a new value for bpf_attach_type, which
> BPF_LINK_CREATE command expects. Generally, it's either a 1-to-1 mapping from
> bpf_attach_type to bpf_prog_type, or a many-to-1 mapping from a subset of
> bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
> BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
> program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
> mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
> define a single BPF_PERF_EVENT attach type for all of them and adjust
> link_create()'s logic for checking correspondence between attach type and
> program type.
> 
> The alternative would be to define three new attach types (e.g., BPF_KPROBE,
> BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like unnecessary overkill
> and BPF_KPROBE will cause naming conflicts with BPF_KPROBE() macro, defined by
> libbpf. I chose to not do this to avoid unnecessary proliferation of
> bpf_attach_type enum values and not have to deal with naming conflicts.
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Yonghong Song <yhs@fb.com>


* Re: [PATCH v3 bpf-next 05/14] bpf: allow to specify user-provided bpf_cookie for BPF perf links
  2021-07-30  5:34 ` [PATCH v3 bpf-next 05/14] bpf: allow to specify user-provided bpf_cookie for BPF perf links Andrii Nakryiko
@ 2021-07-30 22:02   ` Yonghong Song
  2021-08-09 23:30   ` Daniel Borkmann
  1 sibling, 0 replies; 40+ messages in thread
From: Yonghong Song @ 2021-07-30 22:02 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel; +Cc: kernel-team, Peter Zijlstra



On 7/29/21 10:34 PM, Andrii Nakryiko wrote:
> Add ability for users to specify custom u64 value (bpf_cookie) when creating
> BPF link for perf_event-backed BPF programs (kprobe/uprobe, perf_event,
> tracepoints).
> 
> This is useful for cases when the same BPF program is used for attaching and
> processing invocation of different tracepoints/kprobes/uprobes in a generic
> fashion, but such that each invocation is distinguished from each other (e.g.,
> BPF program can look up additional information associated with a specific
> kernel function without having to rely on function IP lookups). This enables
> new use cases to be implemented simply and efficiently, where previously they
> were possible only through code generation (and thus multiple instances of an
> almost identical BPF program) or compilation at runtime (BCC-style) on target
> hosts (even more expensive resource-wise). For uprobes it is not even possible
> in some cases to know the function IP beforehand (e.g., when attaching to a shared
> library without PID filtering, in which case base load address is not known
> for a library).
> 
> This is done by storing u64 bpf_cookie in struct bpf_prog_array_item,
> corresponding to each attached and run BPF program. Given cgroup BPF programs
> already use two 8-byte pointers for their needs and cgroup BPF programs don't
> have (yet?) support for bpf_cookie, reuse that space through union of
> cgroup_storage and new bpf_cookie field.
> 
> Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
> This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
> program execution code, which luckily is now also split from
> BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
> giving access to this user-provided cookie value from inside a BPF program.
> Generic perf_event BPF programs will access this value from perf_event itself
> through passed in BPF program context.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Yonghong Song <yhs@fb.com>


* Re: [PATCH v3 bpf-next 06/14] bpf: add bpf_get_attach_cookie() BPF helper to access bpf_cookie value
  2021-07-30  5:34 ` [PATCH v3 bpf-next 06/14] bpf: add bpf_get_attach_cookie() BPF helper to access bpf_cookie value Andrii Nakryiko
@ 2021-07-30 22:13   ` Yonghong Song
  0 siblings, 0 replies; 40+ messages in thread
From: Yonghong Song @ 2021-07-30 22:13 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel; +Cc: kernel-team, Peter Zijlstra



On 7/29/21 10:34 PM, Andrii Nakryiko wrote:
> Add new BPF helper, bpf_get_attach_cookie(), which can be used by BPF programs
> to get access to a user-provided bpf_cookie value, specified during BPF
> program attachment (BPF link creation) time.
> 
> Naming is hard, though. With the concept being named "BPF cookie", I've
> considered calling the helper:
>    - bpf_get_cookie() -- seems too unspecific and easily mistaken with socket
>      cookie;
>    - bpf_get_bpf_cookie() -- too much tautology;
>    - bpf_get_link_cookie() -- would be ok, but while we create a BPF link to
>      attach BPF program to BPF hook, it's still an "attachment" and the
>      bpf_cookie is associated with BPF program attachment to a hook, not a BPF
>      link itself. Technically, we could support bpf_cookie with old-style
>      cgroup programs. So I ultimately rejected it in favor of
>      bpf_get_attach_cookie().
> 
> Currently all perf_event-backed BPF program types support
> bpf_get_attach_cookie() helper. Follow-up patches will add support for
> fentry/fexit programs as well.
> 
> While at it, mark bpf_tracing_func_proto() as static to make it obvious that
> it's only used from within the kernel/trace/bpf_trace.c.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Yonghong Song <yhs@fb.com>


* Re: [PATCH v3 bpf-next 07/14] libbpf: re-build libbpf.so when libbpf.map changes
  2021-07-30  5:34 ` [PATCH v3 bpf-next 07/14] libbpf: re-build libbpf.so when libbpf.map changes Andrii Nakryiko
@ 2021-07-30 22:31   ` Yonghong Song
  0 siblings, 0 replies; 40+ messages in thread
From: Yonghong Song @ 2021-07-30 22:31 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel; +Cc: kernel-team, Peter Zijlstra



On 7/29/21 10:34 PM, Andrii Nakryiko wrote:
> Ensure libbpf.so is re-built whenever libbpf.map is modified.  Without this,
> changes to libbpf.map are not detected and versioned symbols mismatch error
> will be reported until `make clean && make` is used, which is a suboptimal
> developer experience.
> 
> Fixes: 306b267cb3c4 ("libbpf: Verify versioned symbols")
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Yonghong Song <yhs@fb.com>


* [PATCH bpf-next v4] libbpf: introduce legacy kprobe events support
  2021-07-30  5:33 [PATCH v3 bpf-next 00/14] BPF perf link and user-provided bpf_cookie Andrii Nakryiko
                   ` (13 preceding siblings ...)
  2021-07-30  5:34 ` [PATCH v3 bpf-next 14/14] selftests/bpf: add bpf_cookie selftests for high-level APIs Andrii Nakryiko
@ 2021-08-01  5:11 ` Rafael David Tinoco
  2021-08-06 22:29   ` Andrii Nakryiko
  14 siblings, 1 reply; 40+ messages in thread
From: Rafael David Tinoco @ 2021-08-01  5:11 UTC (permalink / raw)
  To: bpf; +Cc: rafaeldtinoco, andrii.nakryiko

Allow kprobe tracepoint event creation through the legacy text-based
interface, as kprobe dynamic PMU support, which is used by default, was
only added in v4.17.

After commit "bpf: implement minimal BPF perf link", extra per-link
information can be accessed through container_of() on struct bpf_link.
This allows the legacy tracing event name, and whether the probe is a
retprobe, to be kept together with the link rather than outside the
bpf_link structure, which would not be optimal.

This enables CO-RE support on older kernels.

Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Rafael David Tinoco <rafaeldtinoco@gmail.com>
---
 tools/lib/bpf/libbpf.c | 127 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 125 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index e1b7b2b6618c..40037340a3e7 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8985,9 +8985,55 @@ int bpf_link__unpin(struct bpf_link *link)
 	return 0;
 }
 
+static int poke_kprobe_events(bool add, const char *name, bool retprobe, uint64_t offset) {
+	int fd, ret = 0;
+	pid_t p = getpid();
+	char cmd[192] = {}, probename[128] = {}, probefunc[128] = {};
+	const char *file = "/sys/kernel/debug/tracing/kprobe_events";
+
+	if (retprobe)
+		snprintf(probename, sizeof(probename), "kretprobes/%s_libbpf_%u", name, p);
+	else
+		snprintf(probename, sizeof(probename), "kprobes/%s_libbpf_%u", name, p);
+
+	if (offset)
+		snprintf(probefunc, sizeof(probefunc), "%s+%lu", name, offset);
+
+	if (add) {
+		snprintf(cmd, sizeof(cmd), "%c:%s %s",
+			 retprobe ? 'r' : 'p',
+			 probename,
+			 offset ? probefunc : name);
+	} else {
+		snprintf(cmd, sizeof(cmd), "-:%s", probename);
+	}
+
+	fd = open(file, O_WRONLY | O_APPEND, 0);
+	if (fd < 0)
+		return -errno;
+	ret = write(fd, cmd, strlen(cmd));
+	if (ret < 0)
+		ret = -errno;
+	close(fd);
+
+	return ret;
+}
+
+static inline int add_kprobe_event_legacy(const char *name, bool retprobe, uint64_t offset)
+{
+	return poke_kprobe_events(true, name, retprobe, offset);
+}
+
+static inline int remove_kprobe_event_legacy(const char *name, bool retprobe)
+{
+	return poke_kprobe_events(false, name, retprobe, 0);
+}
+
 struct bpf_link_perf {
 	struct bpf_link link;
 	int perf_event_fd;
+	char *legacy_name;
+	bool is_retprobe;
 };
 
 static int bpf_link_perf_detach(struct bpf_link *link)
@@ -9002,6 +9048,10 @@ static int bpf_link_perf_detach(struct bpf_link *link)
 		close(perf_link->perf_event_fd);
 	close(link->fd);
 
+	/* legacy kprobe needs to be removed after perf event fd closure */
+	if (perf_link->legacy_name)
+		remove_kprobe_event_legacy(perf_link->legacy_name, perf_link->is_retprobe);
+
 	return libbpf_err(err);
 }
 
@@ -9009,6 +9059,9 @@ static void bpf_link_perf_dealloc(struct bpf_link *link)
 {
 	struct bpf_link_perf *perf_link = container_of(link, struct bpf_link_perf, link);
 
+	if (perf_link->legacy_name)
+		free(perf_link->legacy_name);
+
 	free(perf_link);
 }
 
@@ -9122,6 +9175,26 @@ static int parse_uint_from_file(const char *file, const char *fmt)
 	return ret;
 }
 
+static bool determine_kprobe_legacy(void)
+{
+	const char *file = "/sys/bus/event_source/devices/kprobe/type";
+
+	return access(file, 0) == 0 ? false : true;
+}
+
+static int determine_kprobe_perf_type_legacy(const char *func_name, bool is_retprobe)
+{
+	char file[192];
+
+	const char *fname = "/sys/kernel/debug/tracing/events/%s/%s_libbpf_%d/id";
+
+	snprintf(file, sizeof(file), fname,
+		 is_retprobe ? "kretprobes" : "kprobes",
+		 func_name, getpid());
+
+	return parse_uint_from_file(file, "%d\n");
+}
+
 static int determine_kprobe_perf_type(void)
 {
 	const char *file = "/sys/bus/event_source/devices/kprobe/type";
@@ -9197,6 +9270,41 @@ static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
 	return pfd;
 }
 
+static int perf_event_kprobe_open_legacy(bool retprobe, const char *name, uint64_t offset, int pid)
+{
+	struct perf_event_attr attr = {};
+	char errmsg[STRERR_BUFSIZE];
+	int type, pfd, err;
+
+	err = add_kprobe_event_legacy(name, retprobe, offset);
+	if (err < 0) {
+		pr_warn("failed to add legacy kprobe event: %s\n",
+			libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
+		return err;
+	}
+	type = determine_kprobe_perf_type_legacy(name, retprobe);
+	if (type < 0) {
+		pr_warn("failed to determine legacy kprobe event id: %s\n",
+			libbpf_strerror_r(type, errmsg, sizeof(errmsg)));
+		return type;
+	}
+	attr.size = sizeof(attr);
+	attr.config = type;
+	attr.type = PERF_TYPE_TRACEPOINT;
+
+	pfd = syscall(__NR_perf_event_open, &attr,
+		      pid < 0 ? -1 : pid, /* pid */
+		      pid == -1 ? 0 : -1, /* cpu */
+		      -1 /* group_fd */,  PERF_FLAG_FD_CLOEXEC);
+	if (pfd < 0) {
+		err = -errno;
+		pr_warn("legacy kprobe perf_event_open() failed: %s\n",
+			libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
+		return err;
+	}
+	return pfd;
+}
+
 struct bpf_link *
 bpf_program__attach_kprobe_opts(struct bpf_program *prog,
 				const char *func_name,
@@ -9208,6 +9316,7 @@ bpf_program__attach_kprobe_opts(struct bpf_program *prog,
 	unsigned long offset;
 	bool retprobe;
 	int pfd, err;
+	bool legacy;
 
 	if (!OPTS_VALID(opts, bpf_kprobe_opts))
 		return libbpf_err_ptr(-EINVAL);
@@ -9216,8 +9325,16 @@ bpf_program__attach_kprobe_opts(struct bpf_program *prog,
 	offset = OPTS_GET(opts, offset, 0);
 	pe_opts.user_ctx = OPTS_GET(opts, user_ctx, 0);
 
-	pfd = perf_event_open_probe(false /* uprobe */, retprobe, func_name,
-				    offset, -1 /* pid */);
+	legacy = determine_kprobe_legacy();
+	if (!legacy) {
+		pfd = perf_event_open_probe(false /* uprobe */,
+					    retprobe, func_name,
+					    offset, -1 /* pid */);
+	} else {
+		pfd = perf_event_kprobe_open_legacy(retprobe, func_name,
+						    0 /* offset */,
+						   -1 /* pid */);
+	}
 	if (pfd < 0) {
 		pr_warn("prog '%s': failed to create %s '%s' perf event: %s\n",
 			prog->name, retprobe ? "kretprobe" : "kprobe", func_name,
@@ -9233,6 +9350,12 @@ bpf_program__attach_kprobe_opts(struct bpf_program *prog,
 			libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
 		return libbpf_err_ptr(err);
 	}
+	if (legacy) {
+		struct bpf_link_perf *perf_link = container_of(link, struct bpf_link_perf, link);
+		perf_link->legacy_name = strdup(func_name);
+		perf_link->is_retprobe = retprobe;
+	}
+
 	return link;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH bpf-next v4] libbpf: introduce legacy kprobe events support
  2021-08-01  5:11 ` [PATCH bpf-next v4] libbpf: introduce legacy kprobe events support Rafael David Tinoco
@ 2021-08-06 22:29   ` Andrii Nakryiko
  2021-09-12  6:48     ` [PATCH bpf-next v5] " Rafael David Tinoco
  0 siblings, 1 reply; 40+ messages in thread
From: Andrii Nakryiko @ 2021-08-06 22:29 UTC (permalink / raw)
  To: Rafael David Tinoco; +Cc: bpf

On Sat, Jul 31, 2021 at 10:11 PM Rafael David Tinoco
<rafaeldtinoco@gmail.com> wrote:
>
> Allow kprobe tracepoint events creation through legacy interface, as the
> kprobe dynamic PMUs support, used by default, was only created in v4.17.
>
> After commit "bpf: implement minimal BPF perf link", it was allowed that
> some extra - to the link - information is accessed through container_of
> struct bpf_link. This allows the tracing perf event legacy name, and
> information whether it is a retprobe, to be saved outside bpf_link
> structure, which would not be optimal.
>
> This enables CO.RE support for older kernels.

nit: it's CO-RE (or Compile Once - Run Everywhere), let's keep
consistent spelling.

>
> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> Signed-off-by: Rafael David Tinoco <rafaeldtinoco@gmail.com>
> ---

Looks good overall, modulo various nits and skipped error handling.
Please wait for bpf_perf_link changes to get applied and submit the
next revision based on it. Thanks!


>  tools/lib/bpf/libbpf.c | 127 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 125 insertions(+), 2 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index e1b7b2b6618c..40037340a3e7 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -8985,9 +8985,55 @@ int bpf_link__unpin(struct bpf_link *link)
>         return 0;
>  }
>
> +static int poke_kprobe_events(bool add, const char *name, bool retprobe, uint64_t offset) {

'{' should be on separate line

Please use scripts/checkpatch.pl on your changes, there are a bunch of
stylistic problems in your code, which you would have caught yourself
if you ran this script.

> +       int fd, ret = 0;
> +       pid_t p = getpid();
> +       char cmd[192] = {}, probename[128] = {}, probefunc[128] = {};
> +       const char *file = "/sys/kernel/debug/tracing/kprobe_events";
> +
> +       if (retprobe)
> +               snprintf(probename, sizeof(probename), "kretprobes/%s_libbpf_%u", name, p);
> +       else
> +               snprintf(probename, sizeof(probename), "kprobes/%s_libbpf_%u", name, p);
> +
> +       if (offset)
> +               snprintf(probefunc, sizeof(probefunc), "%s+%lu", name, offset);
> +
> +       if (add) {
> +               snprintf(cmd, sizeof(cmd), "%c:%s %s",
> +                        retprobe ? 'r' : 'p',
> +                        probename,
> +                        offset ? probefunc : name);
> +       } else {
> +               snprintf(cmd, sizeof(cmd), "-:%s", probename);
> +       }
> +
> +       fd = open(file, O_WRONLY | O_APPEND, 0);
> +       if (fd < 0)
> +               return -errno;
> +       ret = write(fd, cmd, strlen(cmd));
> +       if (ret < 0)
> +               ret = -errno;
> +       close(fd);
> +
> +       return ret;
> +}
> +
> +static inline int add_kprobe_event_legacy(const char *name, bool retprobe, uint64_t offset)
> +{
> +       return poke_kprobe_events(true, name, retprobe, offset);
> +}
> +
> +static inline int remove_kprobe_event_legacy(const char *name, bool retprobe)
> +{
> +       return poke_kprobe_events(false, name, retprobe, 0);
> +}
> +
>  struct bpf_link_perf {
>         struct bpf_link link;
>         int perf_event_fd;
> +       char *legacy_name;

My initial reaction was to have a still different struct specifically
for legacy kprobe (e.g., struct bpf_link_kprobe_legacy or something
like that). But we'll need to do similar changes to uprobes as well,
so now I think it's ok to have this as an optional extra field on
perf_event-based link. If that causes problems we can always change
that later.

To make it clearer, can you name it "legacy_probe_name" and leave a
short comment mentioning that this is used to remember the original
identifier we used to create the legacy kprobe?

> +       bool is_retprobe;
>  };
>
>  static int bpf_link_perf_detach(struct bpf_link *link)
> @@ -9002,6 +9048,10 @@ static int bpf_link_perf_detach(struct bpf_link *link)
>                 close(perf_link->perf_event_fd);
>         close(link->fd);
>
> +       /* legacy kprobe needs to be removed after perf event fd closure */
> +       if (perf_link->legacy_name)
> +               remove_kprobe_event_legacy(perf_link->legacy_name, perf_link->is_retprobe);

check and propagate error?

> +
>         return libbpf_err(err);
>  }
>
> @@ -9009,6 +9059,9 @@ static void bpf_link_perf_dealloc(struct bpf_link *link)
>  {
>         struct bpf_link_perf *perf_link = container_of(link, struct bpf_link_perf, link);
>
> +       if (perf_link->legacy_name)
> +               free(perf_link->legacy_name);

free() handles NULL properly, no need for if

> +
>         free(perf_link);
>  }
>
> @@ -9122,6 +9175,26 @@ static int parse_uint_from_file(const char *file, const char *fmt)
>         return ret;
>  }
>
> +static bool determine_kprobe_legacy(void)
> +{
> +       const char *file = "/sys/bus/event_source/devices/kprobe/type";
> +
> +       return access(file, 0) == 0 ? false : true;
> +}
> +
> +static int determine_kprobe_perf_type_legacy(const char *func_name, bool is_retprobe)
> +{
> +       char file[192];
> +

extra empty line

> +       const char *fname = "/sys/kernel/debug/tracing/events/%s/%s_libbpf_%d/id";
> +
> +       snprintf(file, sizeof(file), fname,
> +                is_retprobe ? "kretprobes" : "kprobes",
> +                func_name, getpid());
> +
> +       return parse_uint_from_file(file, "%d\n");
> +}
> +
>  static int determine_kprobe_perf_type(void)
>  {
>         const char *file = "/sys/bus/event_source/devices/kprobe/type";
> @@ -9197,6 +9270,41 @@ static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
>         return pfd;
>  }
>
> +static int perf_event_kprobe_open_legacy(bool retprobe, const char *name, uint64_t offset, int pid)
> +{
> +       struct perf_event_attr attr = {};
> +       char errmsg[STRERR_BUFSIZE];
> +       int type, pfd, err;
> +
> +       err = add_kprobe_event_legacy(name, retprobe, offset);
> +       if (err < 0) {
> +               pr_warn("failed to add legacy kprobe event: %s\n",
> +                       libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
> +               return err;
> +       }
> +       type = determine_kprobe_perf_type_legacy(name, retprobe);
> +       if (type < 0) {
> +               pr_warn("failed to determine legacy kprobe event id: %s\n",
> +                       libbpf_strerror_r(type, errmsg, sizeof(errmsg)));
> +               return type;
> +       }
> +       attr.size = sizeof(attr);
> +       attr.config = type;
> +       attr.type = PERF_TYPE_TRACEPOINT;
> +
> +       pfd = syscall(__NR_perf_event_open, &attr,
> +                     pid < 0 ? -1 : pid, /* pid */
> +                     pid == -1 ? 0 : -1, /* cpu */
> +                     -1 /* group_fd */,  PERF_FLAG_FD_CLOEXEC);
> +       if (pfd < 0) {
> +               err = -errno;
> +               pr_warn("legacy kprobe perf_event_open() failed: %s\n",
> +                       libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
> +               return err;
> +       }
> +       return pfd;
> +}
> +
>  struct bpf_link *
>  bpf_program__attach_kprobe_opts(struct bpf_program *prog,
>                                 const char *func_name,
> @@ -9208,6 +9316,7 @@ bpf_program__attach_kprobe_opts(struct bpf_program *prog,
>         unsigned long offset;
>         bool retprobe;
>         int pfd, err;
> +       bool legacy;
>
>         if (!OPTS_VALID(opts, bpf_kprobe_opts))
>                 return libbpf_err_ptr(-EINVAL);
> @@ -9216,8 +9325,16 @@ bpf_program__attach_kprobe_opts(struct bpf_program *prog,
>         offset = OPTS_GET(opts, offset, 0);
>         pe_opts.user_ctx = OPTS_GET(opts, user_ctx, 0);
>
> -       pfd = perf_event_open_probe(false /* uprobe */, retprobe, func_name,
> -                                   offset, -1 /* pid */);
> +       legacy = determine_kprobe_legacy();
> +       if (!legacy) {
> +               pfd = perf_event_open_probe(false /* uprobe */,
> +                                           retprobe, func_name,
> +                                           offset, -1 /* pid */);
> +       } else {
> +               pfd = perf_event_kprobe_open_legacy(retprobe, func_name,
> +                                                   0 /* offset */,
> +                                                  -1 /* pid */);
> +       }
>         if (pfd < 0) {
>                 pr_warn("prog '%s': failed to create %s '%s' perf event: %s\n",
>                         prog->name, retprobe ? "kretprobe" : "kprobe", func_name,
> @@ -9233,6 +9350,12 @@ bpf_program__attach_kprobe_opts(struct bpf_program *prog,
>                         libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
>                 return libbpf_err_ptr(err);
>         }
> +       if (legacy) {
> +               struct bpf_link_perf *perf_link = container_of(link, struct bpf_link_perf, link);

empty line between variable declaration and statements

> +               perf_link->legacy_name = strdup(func_name);

check NULL, could be ENOMEM. Probably easier to allocate this at the
beginning, before we do the heavy lifting of perf_event_open(); clean-up
will be simpler.

> +               perf_link->is_retprobe = retprobe;
> +       }
> +
>         return link;
>  }
>
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function
  2021-07-30  5:34 ` [PATCH v3 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function Andrii Nakryiko
  2021-07-30 17:19   ` Yonghong Song
@ 2021-08-09 22:43   ` Daniel Borkmann
  2021-08-10  0:28     ` Andrii Nakryiko
  1 sibling, 1 reply; 40+ messages in thread
From: Daniel Borkmann @ 2021-08-09 22:43 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast; +Cc: kernel-team, Peter Zijlstra

On 7/30/21 7:34 AM, Andrii Nakryiko wrote:
> Turn BPF_PROG_RUN into a proper always inlined function. No functional and
> performance changes are intended, but it makes it much easier to understand
> what's going on with how BPF programs actually get executed. It's more
> obvious what types and callbacks are expected. Also extra () around input
> parameters can be dropped, as well as `__` variable prefixes intended to avoid
> naming collisions, which makes the code simpler to read and write.
> 
> This refactoring also highlighted one possible issue. BPF_PROG_RUN is both
> a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning
> BPF_PROG_RUN into a function causes naming conflict compilation error. So
> rename BPF_PROG_RUN into lower-case bpf_prog_run(), similar to
> bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. To avoid unnecessary code
> churn across many networking calls to BPF_PROG_RUN, #define BPF_PROG_RUN as an
> alias to bpf_prog_run.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Change itself looks good, small nit below:

> ---
>   include/linux/filter.h | 58 +++++++++++++++++++++++++++---------------
>   1 file changed, 37 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index ba36989f711a..18518e321ce4 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -585,25 +585,41 @@ struct sk_filter {
>   
>   DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
>   
> -#define __BPF_PROG_RUN(prog, ctx, dfunc)	({			\
> -	u32 __ret;							\
> -	cant_migrate();							\
> -	if (static_branch_unlikely(&bpf_stats_enabled_key)) {		\
> -		struct bpf_prog_stats *__stats;				\
> -		u64 __start = sched_clock();				\
> -		__ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func);	\
> -		__stats = this_cpu_ptr(prog->stats);			\
> -		u64_stats_update_begin(&__stats->syncp);		\
> -		__stats->cnt++;						\
> -		__stats->nsecs += sched_clock() - __start;		\
> -		u64_stats_update_end(&__stats->syncp);			\
> -	} else {							\
> -		__ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func);	\
> -	}								\
> -	__ret; })
> -
> -#define BPF_PROG_RUN(prog, ctx)						\
> -	__BPF_PROG_RUN(prog, ctx, bpf_dispatcher_nop_func)
> +typedef unsigned int (*bpf_dispatcher_fn)(const void *ctx,
> +					  const struct bpf_insn *insnsi,
> +					  unsigned int (*bpf_func)(const void *,
> +								   const struct bpf_insn *));
> +
> +static __always_inline u32 __bpf_prog_run(const struct bpf_prog *prog,
> +					  const void *ctx,
> +					  bpf_dispatcher_fn dfunc)
> +{
> +	u32 ret;
> +
> +	cant_migrate();
> +	if (static_branch_unlikely(&bpf_stats_enabled_key)) {
> +		struct bpf_prog_stats *stats;
> +		u64 start = sched_clock();
> +
> +		ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
> +		stats = this_cpu_ptr(prog->stats);
> +		u64_stats_update_begin(&stats->syncp);
> +		stats->cnt++;
> +		stats->nsecs += sched_clock() - start;
> +		u64_stats_update_end(&stats->syncp);
> +	} else {
> +		ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
> +	}
> +	return ret;
> +}
> +
> +static __always_inline u32 bpf_prog_run(const struct bpf_prog *prog, const void *ctx)
> +{
> +	return __bpf_prog_run(prog, ctx, bpf_dispatcher_nop_func);
> +}
> +
> +/* avoids name conflict with BPF_PROG_RUN enum definedi uapi/linux/bpf.h */

(definedi)

> +#define BPF_PROG_RUN bpf_prog_run

Given the unfortunate conflict in BPF_PROG_RUN, can't we just switch all BPF_PROG_RUN
call sites over to bpf_prog_run altogether and bite the bullet once to remove it from
the tree? (Same as the other macro names in the next patch.) There are a number of
instances, but still few enough that it should be doable.

>   /*
>    * Use in preemptible and therefore migratable context to make sure that
> @@ -622,7 +638,7 @@ static inline u32 bpf_prog_run_pin_on_cpu(const struct bpf_prog *prog,
>   	u32 ret;
>   
>   	migrate_disable();
> -	ret = __BPF_PROG_RUN(prog, ctx, bpf_dispatcher_nop_func);
> +	ret = bpf_prog_run(prog, ctx);
>   	migrate_enable();
>   	return ret;
>   }
> @@ -768,7 +784,7 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
>   	 * under local_bh_disable(), which provides the needed RCU protection
>   	 * for accessing map entries.
>   	 */
> -	return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
> +	return __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
>   }
>   
>   void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 bpf-next 02/14] bpf: refactor BPF_PROG_RUN_ARRAY family of macros into functions
  2021-07-30  5:34 ` [PATCH v3 bpf-next 02/14] bpf: refactor BPF_PROG_RUN_ARRAY family of macros into functions Andrii Nakryiko
@ 2021-08-09 23:00   ` Daniel Borkmann
  2021-08-10  0:36     ` Andrii Nakryiko
  0 siblings, 1 reply; 40+ messages in thread
From: Daniel Borkmann @ 2021-08-09 23:00 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast; +Cc: kernel-team, Peter Zijlstra, Yonghong Song

On 7/30/21 7:34 AM, Andrii Nakryiko wrote:
> Similar to BPF_PROG_RUN, turn BPF_PROG_RUN_ARRAY macros into proper functions
> with all the same readability and maintainability benefits. Making them into
> functions required shuffling around bpf_set_run_ctx/bpf_reset_run_ctx
> functions. Also, explicitly specifying the type of the BPF prog run callback
> required adjusting __bpf_prog_run_save_cb() to accept const void *, casted
> internally to const struct sk_buff.
> 
> Further, split out a cgroup-specific BPF_PROG_RUN_ARRAY_CG and
> BPF_PROG_RUN_ARRAY_CG_FLAGS from the more generic BPF_PROG_RUN_ARRAY due to
> the differences in bpf_run_ctx used for those two different use cases.
> 
> I think BPF_PROG_RUN_ARRAY_CG would benefit from further refactoring to accept
> struct cgroup and enum bpf_attach_type instead of bpf_prog_array, fetching
> cgrp->bpf.effective[type] and RCU-dereferencing it internally. But that
> required including include/linux/cgroup-defs.h, which I wasn't sure is ok with
> everyone.
> 
> The remaining generic BPF_PROG_RUN_ARRAY function will be extended to
> pass-through user-provided context value in the next patch.
> 
> Acked-by: Yonghong Song <yhs@fb.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>   include/linux/bpf.h      | 187 +++++++++++++++++++++++----------------
>   include/linux/filter.h   |   5 +-
>   kernel/bpf/cgroup.c      |  32 +++----
>   kernel/trace/bpf_trace.c |   2 +-
>   4 files changed, 132 insertions(+), 94 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index c8cc09013210..9c44b56b698f 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1146,67 +1146,124 @@ struct bpf_run_ctx {};
>   
>   struct bpf_cg_run_ctx {
>   	struct bpf_run_ctx run_ctx;
> -	struct bpf_prog_array_item *prog_item;
> +	const struct bpf_prog_array_item *prog_item;
>   };
>   
> +#ifdef CONFIG_BPF_SYSCALL
> +static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
> +{
> +	struct bpf_run_ctx *old_ctx;
> +
> +	old_ctx = current->bpf_ctx;
> +	current->bpf_ctx = new_ctx;
> +	return old_ctx;
> +}
> +
> +static inline void bpf_reset_run_ctx(struct bpf_run_ctx *old_ctx)
> +{
> +	current->bpf_ctx = old_ctx;
> +}
> +#else /* CONFIG_BPF_SYSCALL */
> +static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
> +{
> +	return NULL;
> +}
> +
> +static inline void bpf_reset_run_ctx(struct bpf_run_ctx *old_ctx)
> +{
> +}
> +#endif /* CONFIG_BPF_SYSCALL */

nit, but either is fine..:

static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
{
	struct bpf_run_ctx *old_ctx = NULL;

#ifdef CONFIG_BPF_SYSCALL
	old_ctx = current->bpf_ctx;
	current->bpf_ctx = new_ctx;
#endif
	return old_ctx;
}

static inline void bpf_reset_run_ctx(struct bpf_run_ctx *old_ctx)
{
#ifdef CONFIG_BPF_SYSCALL
	current->bpf_ctx = old_ctx;
#endif
}

>   /* BPF program asks to bypass CAP_NET_BIND_SERVICE in bind. */
>   #define BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE			(1 << 0)
>   /* BPF program asks to set CN on the packet. */
>   #define BPF_RET_SET_CN						(1 << 0)
>   
> -#define BPF_PROG_RUN_ARRAY_FLAGS(array, ctx, func, ret_flags)		\
> -	({								\
> -		struct bpf_prog_array_item *_item;			\
> -		struct bpf_prog *_prog;					\
> -		struct bpf_prog_array *_array;				\
> -		struct bpf_run_ctx *old_run_ctx;			\
> -		struct bpf_cg_run_ctx run_ctx;				\
> -		u32 _ret = 1;						\
> -		u32 func_ret;						\
> -		migrate_disable();					\
> -		rcu_read_lock();					\
> -		_array = rcu_dereference(array);			\
> -		_item = &_array->items[0];				\
> -		old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);	\
> -		while ((_prog = READ_ONCE(_item->prog))) {		\
> -			run_ctx.prog_item = _item;			\
> -			func_ret = func(_prog, ctx);			\
> -			_ret &= (func_ret & 1);				\
> -			*(ret_flags) |= (func_ret >> 1);		\
> -			_item++;					\
> -		}							\
> -		bpf_reset_run_ctx(old_run_ctx);				\
> -		rcu_read_unlock();					\
> -		migrate_enable();					\
> -		_ret;							\
> -	 })
> -
> -#define __BPF_PROG_RUN_ARRAY(array, ctx, func, check_non_null, set_cg_storage)	\
> -	({						\
> -		struct bpf_prog_array_item *_item;	\
> -		struct bpf_prog *_prog;			\
> -		struct bpf_prog_array *_array;		\
> -		struct bpf_run_ctx *old_run_ctx;	\
> -		struct bpf_cg_run_ctx run_ctx;		\
> -		u32 _ret = 1;				\
> -		migrate_disable();			\
> -		rcu_read_lock();			\
> -		_array = rcu_dereference(array);	\
> -		if (unlikely(check_non_null && !_array))\
> -			goto _out;			\
> -		_item = &_array->items[0];		\
> -		old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);\
> -		while ((_prog = READ_ONCE(_item->prog))) {	\
> -			run_ctx.prog_item = _item;	\
> -			_ret &= func(_prog, ctx);	\
> -			_item++;			\
> -		}					\
> -		bpf_reset_run_ctx(old_run_ctx);		\
> -_out:							\
> -		rcu_read_unlock();			\
> -		migrate_enable();			\
> -		_ret;					\
> -	 })
> +typedef u32 (*bpf_prog_run_fn)(const struct bpf_prog *prog, const void *ctx);
> +
> +static __always_inline u32
> +BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu,
> +			    const void *ctx, bpf_prog_run_fn run_prog,
> +			    u32 *ret_flags)
> +{
> +	const struct bpf_prog_array_item *item;
> +	const struct bpf_prog *prog;
> +	const struct bpf_prog_array *array;
> +	struct bpf_run_ctx *old_run_ctx;
> +	struct bpf_cg_run_ctx run_ctx;
> +	u32 ret = 1;
> +	u32 func_ret;
> +
> +	migrate_disable();
> +	rcu_read_lock();
> +	array = rcu_dereference(array_rcu);
> +	item = &array->items[0];
> +	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> +	while ((prog = READ_ONCE(item->prog))) {
> +		run_ctx.prog_item = item;
> +		func_ret = run_prog(prog, ctx);
> +		ret &= (func_ret & 1);
> +		*(ret_flags) |= (func_ret >> 1);
> +		item++;
> +	}
> +	bpf_reset_run_ctx(old_run_ctx);
> +	rcu_read_unlock();
> +	migrate_enable();
> +	return ret;
> +}
> +
> +static __always_inline u32
> +BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu,
> +		      const void *ctx, bpf_prog_run_fn run_prog)
> +{
> +	const struct bpf_prog_array_item *item;
> +	const struct bpf_prog *prog;
> +	const struct bpf_prog_array *array;
> +	struct bpf_run_ctx *old_run_ctx;
> +	struct bpf_cg_run_ctx run_ctx;
> +	u32 ret = 1;
> +
> +	migrate_disable();
> +	rcu_read_lock();
> +	array = rcu_dereference(array_rcu);
> +	item = &array->items[0];
> +	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> +	while ((prog = READ_ONCE(item->prog))) {
> +		run_ctx.prog_item = item;
> +		ret &= run_prog(prog, ctx);
> +		item++;
> +	}
> +	bpf_reset_run_ctx(old_run_ctx);
> +	rcu_read_unlock();
> +	migrate_enable();
> +	return ret;
> +}
> +
> +static __always_inline u32
> +BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
> +		   const void *ctx, bpf_prog_run_fn run_prog)
> +{
> +	const struct bpf_prog_array_item *item;
> +	const struct bpf_prog *prog;
> +	const struct bpf_prog_array *array;
> +	u32 ret = 1;
> +
> +	migrate_disable();
> +	rcu_read_lock();
> +	array = rcu_dereference(array_rcu);
> +	if (unlikely(!array))
> +		goto out;
> +	item = &array->items[0];
> +	while ((prog = READ_ONCE(item->prog))) {
> +		ret &= run_prog(prog, ctx);
> +		item++;
> +	}
> +out:
> +	rcu_read_unlock();
> +	migrate_enable();
> +	return ret;
> +}

Is there any way we could consolidate the above somewhat further and have things
optimized out at compilation time, e.g. when const args are null/non-null? :/

>   /* To be used by __cgroup_bpf_run_filter_skb for EGRESS BPF progs
>    * so BPF programs can request cwr for TCP packets.
> @@ -1235,7 +1292,7 @@ _out:							\
>   		u32 _flags = 0;				\
[...]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 bpf-next 05/14] bpf: allow to specify user-provided bpf_cookie for BPF perf links
  2021-07-30  5:34 ` [PATCH v3 bpf-next 05/14] bpf: allow to specify user-provided bpf_cookie for BPF perf links Andrii Nakryiko
  2021-07-30 22:02   ` Yonghong Song
@ 2021-08-09 23:30   ` Daniel Borkmann
  2021-08-10  0:52     ` Andrii Nakryiko
  1 sibling, 1 reply; 40+ messages in thread
From: Daniel Borkmann @ 2021-08-09 23:30 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast; +Cc: kernel-team, Peter Zijlstra

On 7/30/21 7:34 AM, Andrii Nakryiko wrote:
> Add ability for users to specify custom u64 value (bpf_cookie) when creating
> BPF link for perf_event-backed BPF programs (kprobe/uprobe, perf_event,
> tracepoints).
> 
> This is useful for cases when the same BPF program is used for attaching and
> processing invocation of different tracepoints/kprobes/uprobes in a generic
> fashion, but such that each invocation is distinguished from each other (e.g.,
> BPF program can look up additional information associated with a specific
> kernel function without having to rely on function IP lookups). This enables
> new use cases to be implemented simply and efficiently that previously were
> possible only through code generation (and thus multiple instances of almost
> identical BPF program) or compilation at runtime (BCC-style) on target hosts
> (even more expensive resource-wise). For uprobes it is not even possible in
> some cases to know function IP before hand (e.g., when attaching to shared
> library without PID filtering, in which case base load address is not known
> for a library).
> 
> This is done by storing u64 bpf_cookie in struct bpf_prog_array_item,
> corresponding to each attached and run BPF program. Given cgroup BPF programs
> already use two 8-byte pointers for their needs and cgroup BPF programs don't
> have (yet?) support for bpf_cookie, reuse that space through union of
> cgroup_storage and new bpf_cookie field.
> 
> Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
> This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
> program execution code, which luckily is now also split from
> BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
> giving access to this user-provided cookie value from inside a BPF program.
> Generic perf_event BPF programs will access this value from perf_event itself
> through passed in BPF program context.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
[...]
>   
> +struct bpf_trace_run_ctx {
> +	struct bpf_run_ctx run_ctx;
> +	u64 bpf_cookie;
> +};
> +
>   #ifdef CONFIG_BPF_SYSCALL
>   static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
>   {
> @@ -1247,6 +1256,8 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
>   	const struct bpf_prog_array_item *item;
>   	const struct bpf_prog *prog;
>   	const struct bpf_prog_array *array;
> +	struct bpf_run_ctx *old_run_ctx;
> +	struct bpf_trace_run_ctx run_ctx;
>   	u32 ret = 1;
>   
>   	migrate_disable();
> @@ -1254,11 +1265,14 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
>   	array = rcu_dereference(array_rcu);
>   	if (unlikely(!array))
>   		goto out;
> +	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
>   	item = &array->items[0];
>   	while ((prog = READ_ONCE(item->prog))) {
> +		run_ctx.bpf_cookie = item->bpf_cookie;
>   		ret &= run_prog(prog, ctx);
>   		item++;
>   	}
> +	bpf_reset_run_ctx(old_run_ctx);
>   out:
>   	rcu_read_unlock();
>   	migrate_enable();
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 2d510ad750ed..fe156a8170aa 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -762,6 +762,7 @@ struct perf_event {
>   #ifdef CONFIG_BPF_SYSCALL
>   	perf_overflow_handler_t		orig_overflow_handler;
>   	struct bpf_prog			*prog;
> +	u64				bpf_cookie;
>   #endif
>   
>   #ifdef CONFIG_EVENT_TRACING
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 8ac92560d3a3..8e0631a4b046 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -675,7 +675,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
>   
>   #ifdef CONFIG_BPF_EVENTS
>   unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
> -int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
> +int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
>   void perf_event_detach_bpf_prog(struct perf_event *event);
>   int perf_event_query_prog_array(struct perf_event *event, void __user *info);
>   int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
> @@ -692,7 +692,7 @@ static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *c
>   }
>   
>   static inline int
> -perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
> +perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie)
>   {
>   	return -EOPNOTSUPP;
>   }
> @@ -803,7 +803,7 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
>   void perf_trace_buf_update(void *record, u16 type);
>   void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
>   
> -int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
> +int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
>   void perf_event_free_bpf_prog(struct perf_event *event);
>   
>   void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 94fe8329b28f..63ee482d50e1 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1448,6 +1448,13 @@ union bpf_attr {
>   				__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
>   				__u32		iter_info_len;	/* iter_info length */
>   			};
> +			struct {
> +				/* black box user-provided value passed through
> +				 * to BPF program at the execution time and
> +				 * accessible through bpf_get_attach_cookie() BPF helper
> +				 */
> +				__u64		bpf_cookie;

From an API PoV, should we just name this link_id to avoid confusion around gen_cookie_next()
users? Do we expect other link types to implement similar mechanism? I'd think probably yes
if the prog would be common and e.g. do htab lookups based on that opaque value.

Is the 8b chosen given function IP fits, or is there a different rationale size-wise? Should
this be of dynamic size to be more future proof, e.g. hidden map like in prog's global sections
that libbpf sets up / prepopulates internally, but tied to link object instead?

> +			} perf_event;
>   		};
>   	} link_create;
>   
Thanks,
Daniel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function
  2021-08-09 22:43   ` Daniel Borkmann
@ 2021-08-10  0:28     ` Andrii Nakryiko
  0 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2021-08-10  0:28 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Kernel Team, Peter Zijlstra

On Mon, Aug 9, 2021 at 3:43 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 7/30/21 7:34 AM, Andrii Nakryiko wrote:
> > Turn BPF_PROG_RUN into a proper always inlined function. No functional and
> > performance changes are intended, but it makes it much easier to understand
> > what's going on with how BPF programs actually get executed. It's more
> > obvious what types and callbacks are expected. Also extra () around input
> > parameters can be dropped, as well as `__` variable prefixes intended to avoid
> > naming collisions, which makes the code simpler to read and write.
> >
> > This refactoring also highlighted one possible issue. BPF_PROG_RUN is both
> > a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning
> > BPF_PROG_RUN into a function causes naming conflict compilation error. So
> > rename BPF_PROG_RUN into lower-case bpf_prog_run(), similar to
> > bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. To avoid unnecessary code
> > churn across many networking calls to BPF_PROG_RUN, #define BPF_PROG_RUN as an
> > alias to bpf_prog_run.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> Change itself looks good, small nit below:
>
> > ---
> >   include/linux/filter.h | 58 +++++++++++++++++++++++++++---------------
> >   1 file changed, 37 insertions(+), 21 deletions(-)
> >
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index ba36989f711a..18518e321ce4 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -585,25 +585,41 @@ struct sk_filter {
> >
> >   DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
> >
> > -#define __BPF_PROG_RUN(prog, ctx, dfunc)     ({                      \
> > -     u32 __ret;                                                      \
> > -     cant_migrate();                                                 \
> > -     if (static_branch_unlikely(&bpf_stats_enabled_key)) {           \
> > -             struct bpf_prog_stats *__stats;                         \
> > -             u64 __start = sched_clock();                            \
> > -             __ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func);   \
> > -             __stats = this_cpu_ptr(prog->stats);                    \
> > -             u64_stats_update_begin(&__stats->syncp);                \
> > -             __stats->cnt++;                                         \
> > -             __stats->nsecs += sched_clock() - __start;              \
> > -             u64_stats_update_end(&__stats->syncp);                  \
> > -     } else {                                                        \
> > -             __ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func);   \
> > -     }                                                               \
> > -     __ret; })
> > -
> > -#define BPF_PROG_RUN(prog, ctx)                                              \
> > -     __BPF_PROG_RUN(prog, ctx, bpf_dispatcher_nop_func)
> > +typedef unsigned int (*bpf_dispatcher_fn)(const void *ctx,
> > +                                       const struct bpf_insn *insnsi,
> > +                                       unsigned int (*bpf_func)(const void *,
> > +                                                                const struct bpf_insn *));
> > +
> > +static __always_inline u32 __bpf_prog_run(const struct bpf_prog *prog,
> > +                                       const void *ctx,
> > +                                       bpf_dispatcher_fn dfunc)
> > +{
> > +     u32 ret;
> > +
> > +     cant_migrate();
> > +     if (static_branch_unlikely(&bpf_stats_enabled_key)) {
> > +             struct bpf_prog_stats *stats;
> > +             u64 start = sched_clock();
> > +
> > +             ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
> > +             stats = this_cpu_ptr(prog->stats);
> > +             u64_stats_update_begin(&stats->syncp);
> > +             stats->cnt++;
> > +             stats->nsecs += sched_clock() - start;
> > +             u64_stats_update_end(&stats->syncp);
> > +     } else {
> > +             ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
> > +     }
> > +     return ret;
> > +}
> > +
> > +static __always_inline u32 bpf_prog_run(const struct bpf_prog *prog, const void *ctx)
> > +{
> > +     return __bpf_prog_run(prog, ctx, bpf_dispatcher_nop_func);
> > +}
> > +
> > +/* avoids name conflict with BPF_PROG_RUN enum definedi uapi/linux/bpf.h */
>
> (definedi)
>

oops, will fix

> > +#define BPF_PROG_RUN bpf_prog_run
>
> Given the unfortunate conflict in BPF_PROG_RUN, can't we just switch everything over to
> bpf_prog_run altogether and bite the bullet once to remove it from the tree? (Same as the
> other macro names in next patch.) There are a number of instances, but still to the extent
> that it should be doable.

Yeah, absolutely. I wasn't sure if you'd hate the renaming noise. I'll
get rid of BPF_PROG_RUN macro in the next revision.

>
> >   /*
> >    * Use in preemptible and therefore migratable context to make sure that
> > @@ -622,7 +638,7 @@ static inline u32 bpf_prog_run_pin_on_cpu(const struct bpf_prog *prog,
> >       u32 ret;
> >
> >       migrate_disable();
> > -     ret = __BPF_PROG_RUN(prog, ctx, bpf_dispatcher_nop_func);
> > +     ret = bpf_prog_run(prog, ctx);
> >       migrate_enable();
> >       return ret;
> >   }
> > @@ -768,7 +784,7 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
> >        * under local_bh_disable(), which provides the needed RCU protection
> >        * for accessing map entries.
> >        */
> > -     return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
> > +     return __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
> >   }
> >
> >   void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
> >
>


* Re: [PATCH v3 bpf-next 02/14] bpf: refactor BPF_PROG_RUN_ARRAY family of macros into functions
  2021-08-09 23:00   ` Daniel Borkmann
@ 2021-08-10  0:36     ` Andrii Nakryiko
  0 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2021-08-10  0:36 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Kernel Team,
	Peter Zijlstra, Yonghong Song

On Mon, Aug 9, 2021 at 4:00 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 7/30/21 7:34 AM, Andrii Nakryiko wrote:
> > Similar to BPF_PROG_RUN, turn BPF_PROG_RUN_ARRAY macros into proper functions
> > with all the same readability and maintainability benefits. Making them into
> > functions required shuffling around bpf_set_run_ctx/bpf_reset_run_ctx
> > functions. Also, explicitly specifying the type of the BPF prog run callback
> > required adjusting __bpf_prog_run_save_cb() to accept const void *, casted
> > internally to const struct sk_buff.
> >
> > Further, split out a cgroup-specific BPF_PROG_RUN_ARRAY_CG and
> > BPF_PROG_RUN_ARRAY_CG_FLAGS from the more generic BPF_PROG_RUN_ARRAY due to
> > the differences in bpf_run_ctx used for those two different use cases.
> >
> > I think BPF_PROG_RUN_ARRAY_CG would benefit from further refactoring to accept
> > struct cgroup and enum bpf_attach_type instead of bpf_prog_array, fetching
> > cgrp->bpf.effective[type] and RCU-dereferencing it internally. But that
> > required including include/linux/cgroup-defs.h, which I wasn't sure is ok with
> > everyone.
> >
> > The remaining generic BPF_PROG_RUN_ARRAY function will be extended to
> > pass-through user-provided context value in the next patch.
> >
> > Acked-by: Yonghong Song <yhs@fb.com>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >   include/linux/bpf.h      | 187 +++++++++++++++++++++++----------------
> >   include/linux/filter.h   |   5 +-
> >   kernel/bpf/cgroup.c      |  32 +++----
> >   kernel/trace/bpf_trace.c |   2 +-
> >   4 files changed, 132 insertions(+), 94 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index c8cc09013210..9c44b56b698f 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1146,67 +1146,124 @@ struct bpf_run_ctx {};
> >
> >   struct bpf_cg_run_ctx {
> >       struct bpf_run_ctx run_ctx;
> > -     struct bpf_prog_array_item *prog_item;
> > +     const struct bpf_prog_array_item *prog_item;
> >   };
> >
> > +#ifdef CONFIG_BPF_SYSCALL
> > +static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
> > +{
> > +     struct bpf_run_ctx *old_ctx;
> > +
> > +     old_ctx = current->bpf_ctx;
> > +     current->bpf_ctx = new_ctx;
> > +     return old_ctx;
> > +}
> > +
> > +static inline void bpf_reset_run_ctx(struct bpf_run_ctx *old_ctx)
> > +{
> > +     current->bpf_ctx = old_ctx;
> > +}
> > +#else /* CONFIG_BPF_SYSCALL */
> > +static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
> > +{
> > +     return NULL;
> > +}
> > +
> > +static inline void bpf_reset_run_ctx(struct bpf_run_ctx *old_ctx)
> > +{
> > +}
> > +#endif /* CONFIG_BPF_SYSCALL */
>
> nit, but either is fine..:
>
> static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
> {
>         struct bpf_run_ctx *old_ctx = NULL;
>
> #ifdef CONFIG_BPF_SYSCALL
>         old_ctx = current->bpf_ctx;
>         current->bpf_ctx = new_ctx;
> #endif
>         return old_ctx;
> }
>
> static inline void bpf_reset_run_ctx(struct bpf_run_ctx *old_ctx)
> {
> #ifdef CONFIG_BPF_SYSCALL
>         current->bpf_ctx = old_ctx;
> #endif
> }

sure, I don't mind

>
> >   /* BPF program asks to bypass CAP_NET_BIND_SERVICE in bind. */
> >   #define BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE                        (1 << 0)
> >   /* BPF program asks to set CN on the packet. */
> >   #define BPF_RET_SET_CN                                              (1 << 0)
> >
> > -#define BPF_PROG_RUN_ARRAY_FLAGS(array, ctx, func, ret_flags)                \
> > -     ({                                                              \
> > -             struct bpf_prog_array_item *_item;                      \
> > -             struct bpf_prog *_prog;                                 \
> > -             struct bpf_prog_array *_array;                          \
> > -             struct bpf_run_ctx *old_run_ctx;                        \
> > -             struct bpf_cg_run_ctx run_ctx;                          \
> > -             u32 _ret = 1;                                           \
> > -             u32 func_ret;                                           \
> > -             migrate_disable();                                      \
> > -             rcu_read_lock();                                        \
> > -             _array = rcu_dereference(array);                        \
> > -             _item = &_array->items[0];                              \
> > -             old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);        \
> > -             while ((_prog = READ_ONCE(_item->prog))) {              \
> > -                     run_ctx.prog_item = _item;                      \
> > -                     func_ret = func(_prog, ctx);                    \
> > -                     _ret &= (func_ret & 1);                         \
> > -                     *(ret_flags) |= (func_ret >> 1);                \
> > -                     _item++;                                        \
> > -             }                                                       \
> > -             bpf_reset_run_ctx(old_run_ctx);                         \
> > -             rcu_read_unlock();                                      \
> > -             migrate_enable();                                       \
> > -             _ret;                                                   \
> > -      })
> > -
> > -#define __BPF_PROG_RUN_ARRAY(array, ctx, func, check_non_null, set_cg_storage)       \
> > -     ({                                              \
> > -             struct bpf_prog_array_item *_item;      \
> > -             struct bpf_prog *_prog;                 \
> > -             struct bpf_prog_array *_array;          \
> > -             struct bpf_run_ctx *old_run_ctx;        \
> > -             struct bpf_cg_run_ctx run_ctx;          \
> > -             u32 _ret = 1;                           \
> > -             migrate_disable();                      \
> > -             rcu_read_lock();                        \
> > -             _array = rcu_dereference(array);        \
> > -             if (unlikely(check_non_null && !_array))\
> > -                     goto _out;                      \
> > -             _item = &_array->items[0];              \
> > -             old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);\
> > -             while ((_prog = READ_ONCE(_item->prog))) {      \
> > -                     run_ctx.prog_item = _item;      \
> > -                     _ret &= func(_prog, ctx);       \
> > -                     _item++;                        \
> > -             }                                       \
> > -             bpf_reset_run_ctx(old_run_ctx);         \
> > -_out:                                                        \
> > -             rcu_read_unlock();                      \
> > -             migrate_enable();                       \
> > -             _ret;                                   \
> > -      })
> > +typedef u32 (*bpf_prog_run_fn)(const struct bpf_prog *prog, const void *ctx);
> > +
> > +static __always_inline u32
> > +BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu,
> > +                         const void *ctx, bpf_prog_run_fn run_prog,
> > +                         u32 *ret_flags)
> > +{
> > +     const struct bpf_prog_array_item *item;
> > +     const struct bpf_prog *prog;
> > +     const struct bpf_prog_array *array;
> > +     struct bpf_run_ctx *old_run_ctx;
> > +     struct bpf_cg_run_ctx run_ctx;
> > +     u32 ret = 1;
> > +     u32 func_ret;
> > +
> > +     migrate_disable();
> > +     rcu_read_lock();
> > +     array = rcu_dereference(array_rcu);
> > +     item = &array->items[0];
> > +     old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > +     while ((prog = READ_ONCE(item->prog))) {
> > +             run_ctx.prog_item = item;
> > +             func_ret = run_prog(prog, ctx);
> > +             ret &= (func_ret & 1);
> > +             *(ret_flags) |= (func_ret >> 1);
> > +             item++;
> > +     }
> > +     bpf_reset_run_ctx(old_run_ctx);
> > +     rcu_read_unlock();
> > +     migrate_enable();
> > +     return ret;
> > +}
> > +
> > +static __always_inline u32
> > +BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu,
> > +                   const void *ctx, bpf_prog_run_fn run_prog)
> > +{
> > +     const struct bpf_prog_array_item *item;
> > +     const struct bpf_prog *prog;
> > +     const struct bpf_prog_array *array;
> > +     struct bpf_run_ctx *old_run_ctx;
> > +     struct bpf_cg_run_ctx run_ctx;
> > +     u32 ret = 1;
> > +
> > +     migrate_disable();
> > +     rcu_read_lock();
> > +     array = rcu_dereference(array_rcu);
> > +     item = &array->items[0];
> > +     old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > +     while ((prog = READ_ONCE(item->prog))) {
> > +             run_ctx.prog_item = item;
> > +             ret &= run_prog(prog, ctx);
> > +             item++;
> > +     }
> > +     bpf_reset_run_ctx(old_run_ctx);
> > +     rcu_read_unlock();
> > +     migrate_enable();
> > +     return ret;
> > +}
> > +
> > +static __always_inline u32
> > +BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
> > +                const void *ctx, bpf_prog_run_fn run_prog)
> > +{
> > +     const struct bpf_prog_array_item *item;
> > +     const struct bpf_prog *prog;
> > +     const struct bpf_prog_array *array;
> > +     u32 ret = 1;
> > +
> > +     migrate_disable();
> > +     rcu_read_lock();
> > +     array = rcu_dereference(array_rcu);
> > +     if (unlikely(!array))
> > +             goto out;
> > +     item = &array->items[0];
> > +     while ((prog = READ_ONCE(item->prog))) {
> > +             ret &= run_prog(prog, ctx);
> > +             item++;
> > +     }
> > +out:
> > +     rcu_read_unlock();
> > +     migrate_enable();
> > +     return ret;
> > +}
>
> Is there any way we could consolidate the above somewhat further and have things
> optimized out at compilation time, e.g. when const args are null/non-null? :/

Do you mean like passing "bool check_for_null" as an extra argument and then do

if (check_for_null && unlikely(!array))
    goto out;

?

I feel like that actually makes a bigger mess and just unnecessarily
obfuscates code, while saving just a few lines of straightforward
code. We didn't even do that for BPF_PROG_RUN_ARRAY_FLAGS vs
__BPF_PROG_RUN_ARRAY before, even though they differ by only about two
lines of code.

But also in one of the next patches I'm adding perf-specific
bpf_perf_run_ctx (different from bpf_cg_run_ctx), so it will make
sharing the logic within these BPF_PROG_RUN_ARRAY*  even harder and
more convoluted.

Or were you talking about some other aspect here?

>
> >   /* To be used by __cgroup_bpf_run_filter_skb for EGRESS BPF progs
> >    * so BPF programs can request cwr for TCP packets.
> > @@ -1235,7 +1292,7 @@ _out:                                                   \
> >               u32 _flags = 0;                         \
> [...]


* Re: [PATCH v3 bpf-next 05/14] bpf: allow to specify user-provided bpf_cookie for BPF perf links
  2021-08-09 23:30   ` Daniel Borkmann
@ 2021-08-10  0:52     ` Andrii Nakryiko
  0 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2021-08-10  0:52 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Kernel Team, Peter Zijlstra

On Mon, Aug 9, 2021 at 4:30 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 7/30/21 7:34 AM, Andrii Nakryiko wrote:
> > Add ability for users to specify custom u64 value (bpf_cookie) when creating
> > BPF link for perf_event-backed BPF programs (kprobe/uprobe, perf_event,
> > tracepoints).
> >
> > This is useful for cases when the same BPF program is used for attaching and
> > processing invocation of different tracepoints/kprobes/uprobes in a generic
> > fashion, but such that each invocation is distinguished from each other (e.g.,
> > BPF program can look up additional information associated with a specific
> > kernel function without having to rely on function IP lookups). This enables
> > new use cases to be implemented simply and efficiently that previously were
> > possible only through code generation (and thus multiple instances of almost
> > identical BPF program) or compilation at runtime (BCC-style) on target hosts
> > (even more expensive resource-wise). For uprobes it is not even possible in
> > some cases to know function IP beforehand (e.g., when attaching to shared
> > library without PID filtering, in which case base load address is not known
> > for a library).
> >
> > This is done by storing u64 bpf_cookie in struct bpf_prog_array_item,
> > corresponding to each attached and run BPF program. Given cgroup BPF programs
> > already use two 8-byte pointers for their needs and cgroup BPF programs don't
> > have (yet?) support for bpf_cookie, reuse that space through union of
> > cgroup_storage and new bpf_cookie field.
> >
> > Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
> > This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
> > program execution code, which luckily is now also split from
> > BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
> > giving access to this user-provided cookie value from inside a BPF program.
> > Generic perf_event BPF programs will access this value from perf_event itself
> > through passed in BPF program context.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> [...]
> >
> > +struct bpf_trace_run_ctx {
> > +     struct bpf_run_ctx run_ctx;
> > +     u64 bpf_cookie;
> > +};
> > +
> >   #ifdef CONFIG_BPF_SYSCALL
> >   static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
> >   {
> > @@ -1247,6 +1256,8 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
> >       const struct bpf_prog_array_item *item;
> >       const struct bpf_prog *prog;
> >       const struct bpf_prog_array *array;
> > +     struct bpf_run_ctx *old_run_ctx;
> > +     struct bpf_trace_run_ctx run_ctx;
> >       u32 ret = 1;
> >
> >       migrate_disable();
> > @@ -1254,11 +1265,14 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
> >       array = rcu_dereference(array_rcu);
> >       if (unlikely(!array))
> >               goto out;
> > +     old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> >       item = &array->items[0];
> >       while ((prog = READ_ONCE(item->prog))) {
> > +             run_ctx.bpf_cookie = item->bpf_cookie;
> >               ret &= run_prog(prog, ctx);
> >               item++;
> >       }
> > +     bpf_reset_run_ctx(old_run_ctx);
> >   out:
> >       rcu_read_unlock();
> >       migrate_enable();
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index 2d510ad750ed..fe156a8170aa 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -762,6 +762,7 @@ struct perf_event {
> >   #ifdef CONFIG_BPF_SYSCALL
> >       perf_overflow_handler_t         orig_overflow_handler;
> >       struct bpf_prog                 *prog;
> > +     u64                             bpf_cookie;
> >   #endif
> >
> >   #ifdef CONFIG_EVENT_TRACING
> > diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> > index 8ac92560d3a3..8e0631a4b046 100644
> > --- a/include/linux/trace_events.h
> > +++ b/include/linux/trace_events.h
> > @@ -675,7 +675,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
> >
> >   #ifdef CONFIG_BPF_EVENTS
> >   unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
> > -int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
> > +int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
> >   void perf_event_detach_bpf_prog(struct perf_event *event);
> >   int perf_event_query_prog_array(struct perf_event *event, void __user *info);
> >   int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
> > @@ -692,7 +692,7 @@ static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *c
> >   }
> >
> >   static inline int
> > -perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
> > +perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie)
> >   {
> >       return -EOPNOTSUPP;
> >   }
> > @@ -803,7 +803,7 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
> >   void perf_trace_buf_update(void *record, u16 type);
> >   void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
> >
> > -int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
> > +int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
> >   void perf_event_free_bpf_prog(struct perf_event *event);
> >
> >   void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 94fe8329b28f..63ee482d50e1 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -1448,6 +1448,13 @@ union bpf_attr {
> >                               __aligned_u64   iter_info;      /* extra bpf_iter_link_info */
> >                               __u32           iter_info_len;  /* iter_info length */
> >                       };
> > +                     struct {
> > +                             /* black box user-provided value passed through
> > +                              * to BPF program at the execution time and
> > +                              * accessible through bpf_get_attach_cookie() BPF helper
> > +                              */
> > +                             __u64           bpf_cookie;
>
> From an API PoV, should we just name this link_id to avoid confusion around gen_cookie_next()

I'd like to avoid the "id" connotations, as it doesn't have to be
ID-like at all. It can be duplicated across multiple
links/attachments, it could be random, it could be sequential, it
could be hash, whatever. I started originally with "user value" name
for the concept, switching to "user context" before submitting v1. And
then Peter proposed "cookie" terminology and it clicked, because
that's how custom user-provided value is often called when passed back
to user-provided callback. I agree about possible confusion with
socket and netns cookie, but ultimately there are only so many
different words we can use :) I think link_id itself is super
confusing as well, as there is BPF link ID (similar to prog ID and map
ID), which means a completely different thing, yet applies to the same
BPF link object.

> users? Do we expect other link types to implement similar mechanism? I'd think probably yes
> if the prog would be common and e.g. do htab lookups based on that opaque value.

Yes, you are right. I intend to add it at least to fentry/fexit
programs, but ultimately any type of program and its attachment might
benefit (e.g., BPF iterator is a pretty good candidate as well). In
practice, I'd try to use array to pass extra info, but hashmap is also
an option, of course.

>
> Is the 8b chosen given function IP fits, or is there a different rationale size-wise? Should
> this be of dynamic size to be more future proof, e.g. hidden map like in prog's global sections
> that libbpf sets up / prepopulates internally, but tied to link object instead?

See my previous reply to Yonghong about this ([0]). 4 bytes probably
would be workable in most cases, but felt like unnecessary economy,
given we are still going to use full 8 bytes in prog_array_item and
the likes. But I also explicitly don't want an arbitrary sized
"storage", that's not necessary. Users are free to handle extra
storage on their own and use this cookie value as an index/key into
that extra storage. So far I always used an ARRAY map for this, it's
actually very easy for user-space to do this efficiently, based on a
specific problem it is solving. I feel like adding libbpf conventions
with some hidden map just adds lots of complexity without adding much
value. Sometimes it has to be dynamically-sized BPF_MAP_TYPE_ARRAY,
but often it's enough to have a global variable (struct my_config
configs[MAX_CONFIG_CNT]), and in some other cases we don't even need a
storage at all.


  [0] https://lore.kernel.org/bpf/CAEf4BzY2fVJN5CEdWDDNkWQ9En4N6Rynnnzj7hTnWG65BqdusQ@mail.gmail.com/
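To make the array-index pattern above concrete, a minimal BPF-side sketch might look like the following (it relies on the bpf_get_attach_cookie() helper introduced in this series; the config struct, array, and probe names are hypothetical illustrations, not part of the patches):

```c
/* BPF side (sketch): user space attaches this one program many times,
 * passing a small index as bpf_cookie for each attachment; the program
 * uses the cookie to find its per-attachment configuration. */
#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>

#define MAX_CONFIG_CNT 64

struct my_config {
	__u64 flags;	/* prepopulated by user space before load */
};

struct my_config configs[MAX_CONFIG_CNT];	/* global data, no hashmap lookup */

SEC("kprobe/do_sys_open")
int handle_kprobe(struct pt_regs *ctx)
{
	__u64 idx = bpf_get_attach_cookie(ctx);	/* u64 set at attach time */

	if (idx >= MAX_CONFIG_CNT)
		return 0;
	if (configs[idx].flags & 1)
		bpf_printk("probe %llu fired", idx);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

The cookie itself stays a plain 8-byte value; any larger per-attachment state lives in the program's own globals or maps, indexed by it.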


>
> > +                     } perf_event;
> >               };
> >       } link_create;
> >
> Thanks,
> Daniel


* [PATCH bpf-next v5] libbpf: introduce legacy kprobe events support
  2021-08-06 22:29   ` Andrii Nakryiko
@ 2021-09-12  6:48     ` Rafael David Tinoco
  2021-09-12  6:52       ` Rafael David Tinoco
  2021-09-14  4:44       ` Andrii Nakryiko
  0 siblings, 2 replies; 40+ messages in thread
From: Rafael David Tinoco @ 2021-09-12  6:48 UTC (permalink / raw)
  To: bpf; +Cc: andrii.nakryiko, rafaeldtinoco

Allow kprobe tracing event creation through the legacy text-based
interface, since the kprobe dynamic PMU support used by default was only
introduced in v4.17.

After commit "bpf: implement minimal BPF perf link", struct bpf_link can
be embedded in a larger per-link structure and recovered via
container_of(). This lets the legacy tracing event name, and whether it
is a retprobe, be stored with the link; otherwise they would have to be
kept outside the bpf_link structure, which would not be optimal.

This enables CO-RE support for older kernels.

Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Rafael David Tinoco <rafaeldtinoco@gmail.com>
---
 tools/lib/bpf/libbpf.c | 141 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 135 insertions(+), 6 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 88d8825fc6f6..780a45e54572 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8987,9 +8987,57 @@ int bpf_link__unpin(struct bpf_link *link)
 	return 0;
 }
 
+static int poke_kprobe_events(bool add, const char *name, bool retprobe, uint64_t offset)
+{
+	int fd, ret = 0;
+	pid_t p = getpid();
+	char cmd[192] = {}, probename[128] = {}, probefunc[128] = {};
+	const char *file = "/sys/kernel/debug/tracing/kprobe_events";
+
+	if (retprobe)
+		snprintf(probename, sizeof(probename), "kretprobes/%s_libbpf_%u", name, p);
+	else
+		snprintf(probename, sizeof(probename), "kprobes/%s_libbpf_%u", name, p);
+
+	if (offset)
+		snprintf(probefunc, sizeof(probefunc), "%s+%lu", name, offset);
+
+	if (add) {
+		snprintf(cmd, sizeof(cmd), "%c:%s %s",
+			 retprobe ? 'r' : 'p',
+			 probename,
+			 offset ? probefunc : name);
+	} else {
+		snprintf(cmd, sizeof(cmd), "-:%s", probename);
+	}
+
+	fd = open(file, O_WRONLY | O_APPEND, 0);
+	if (!fd)
+		return -errno;
+	ret = write(fd, cmd, strlen(cmd));
+	if (ret < 0)
+		ret = -errno;
+	close(fd);
+
+	return ret;
+}
+
+static inline int add_kprobe_event_legacy(const char *name, bool retprobe, uint64_t offset)
+{
+	return poke_kprobe_events(true, name, retprobe, offset);
+}
+
+static inline int remove_kprobe_event_legacy(const char *name, bool retprobe)
+{
+	return poke_kprobe_events(false, name, retprobe, 0);
+}
+
 struct bpf_link_perf {
 	struct bpf_link link;
 	int perf_event_fd;
+	/* legacy kprobe support: keep track of probe identifier and type */
+	char *legacy_probe_name;
+	bool legacy_is_retprobe;
 };
 
 static int bpf_link_perf_detach(struct bpf_link *link)
@@ -8997,20 +9045,26 @@ static int bpf_link_perf_detach(struct bpf_link *link)
 	struct bpf_link_perf *perf_link = container_of(link, struct bpf_link_perf, link);
 	int err = 0;
 
-	if (ioctl(perf_link->perf_event_fd, PERF_EVENT_IOC_DISABLE, 0) < 0)
-		err = -errno;
+	ioctl(perf_link->perf_event_fd, PERF_EVENT_IOC_DISABLE, 0);
 
 	if (perf_link->perf_event_fd != link->fd)
 		close(perf_link->perf_event_fd);
 	close(link->fd);
 
-	return libbpf_err(err);
+	/* legacy kprobe needs to be removed after perf event fd closure */
+	if (perf_link->legacy_probe_name) {
+		err = remove_kprobe_event_legacy(perf_link->legacy_probe_name,
+						 perf_link->legacy_is_retprobe);
+	}
+
+	return err;
 }
 
 static void bpf_link_perf_dealloc(struct bpf_link *link)
 {
 	struct bpf_link_perf *perf_link = container_of(link, struct bpf_link_perf, link);
 
+	free(perf_link->legacy_probe_name);
 	free(perf_link);
 }
 
@@ -9124,6 +9178,25 @@ static int parse_uint_from_file(const char *file, const char *fmt)
 	return ret;
 }
 
+static bool determine_kprobe_legacy(void)
+{
+	const char *file = "/sys/bus/event_source/devices/kprobe/type";
+
+	return access(file, 0) == 0 ? false : true;
+}
+
+static int determine_kprobe_perf_type_legacy(const char *func_name, bool is_retprobe)
+{
+	char file[192];
+	const char *fname = "/sys/kernel/debug/tracing/events/%s/%s_libbpf_%d/id";
+
+	snprintf(file, sizeof(file), fname,
+		 is_retprobe ? "kretprobes" : "kprobes",
+		 func_name, getpid());
+
+	return parse_uint_from_file(file, "%d\n");
+}
+
 static int determine_kprobe_perf_type(void)
 {
 	const char *file = "/sys/bus/event_source/devices/kprobe/type";
@@ -9206,6 +9279,41 @@ static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
 	return pfd;
 }
 
+static int perf_event_kprobe_open_legacy(bool retprobe, const char *name, uint64_t offset, int pid)
+{
+	struct perf_event_attr attr = {};
+	char errmsg[STRERR_BUFSIZE];
+	int type, pfd, err;
+
+	err = add_kprobe_event_legacy(name, retprobe, offset);
+	if (err < 0) {
+		pr_warn("failed to add legacy kprobe event: %s\n",
+			libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
+		return err;
+	}
+	type = determine_kprobe_perf_type_legacy(name, retprobe);
+	if (type < 0) {
+		pr_warn("failed to determine legacy kprobe event id: %s\n",
+			libbpf_strerror_r(type, errmsg, sizeof(errmsg)));
+		return type;
+	}
+	attr.size = sizeof(attr);
+	attr.config = type;
+	attr.type = PERF_TYPE_TRACEPOINT;
+
+	pfd = syscall(__NR_perf_event_open, &attr,
+		      pid < 0 ? -1 : pid, /* pid */
+		      pid == -1 ? 0 : -1, /* cpu */
+		      -1 /* group_fd */,  PERF_FLAG_FD_CLOEXEC);
+	if (pfd < 0) {
+		err = -errno;
+		pr_warn("legacy kprobe perf_event_open() failed: %s\n",
+			libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
+		return err;
+	}
+	return pfd;
+}
+
 struct bpf_link *
 bpf_program__attach_kprobe_opts(struct bpf_program *prog,
 				const char *func_name,
@@ -9215,8 +9323,9 @@ bpf_program__attach_kprobe_opts(struct bpf_program *prog,
 	char errmsg[STRERR_BUFSIZE];
 	struct bpf_link *link;
 	unsigned long offset;
-	bool retprobe;
+	bool retprobe, legacy;
 	int pfd, err;
+	char *legacy_probe = NULL;
 
 	if (!OPTS_VALID(opts, bpf_kprobe_opts))
 		return libbpf_err_ptr(-EINVAL);
@@ -9225,8 +9334,21 @@ bpf_program__attach_kprobe_opts(struct bpf_program *prog,
 	offset = OPTS_GET(opts, offset, 0);
 	pe_opts.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0);
 
-	pfd = perf_event_open_probe(false /* uprobe */, retprobe, func_name,
-				    offset, -1 /* pid */, 0 /* ref_ctr_off */);
+	legacy = determine_kprobe_legacy();
+	if (!legacy) {
+		pfd = perf_event_open_probe(false /* uprobe */,
+					    retprobe, func_name,
+					    offset, -1 /* pid */,
+					    0 /* ref_ctr_off */);
+	} else {
+		legacy_probe = strdup(func_name);
+		err = libbpf_get_error(legacy_probe);
+		if (err)
+			return libbpf_err_ptr(err);
+		pfd = perf_event_kprobe_open_legacy(retprobe, func_name,
+						   0 /* offset */,
+						   -1 /* pid */);
+	}
 	if (pfd < 0) {
 		pr_warn("prog '%s': failed to create %s '%s' perf event: %s\n",
 			prog->name, retprobe ? "kretprobe" : "kprobe", func_name,
@@ -9242,6 +9364,13 @@ bpf_program__attach_kprobe_opts(struct bpf_program *prog,
 			libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
 		return libbpf_err_ptr(err);
 	}
+	if (legacy) {
+		struct bpf_link_perf *perf_link = container_of(link, struct bpf_link_perf, link);
+
+		perf_link->legacy_probe_name = legacy_probe;
+		perf_link->legacy_is_retprobe = retprobe;
+	}
+
 	return link;
 }
 
-- 
2.30.2



* Re: [PATCH bpf-next v5] libbpf: introduce legacy kprobe events support
  2021-09-12  6:48     ` [PATCH bpf-next v5] " Rafael David Tinoco
@ 2021-09-12  6:52       ` Rafael David Tinoco
  2021-09-14  4:44       ` Andrii Nakryiko
  1 sibling, 0 replies; 40+ messages in thread
From: Rafael David Tinoco @ 2021-09-12  6:52 UTC (permalink / raw)
  To: bpf; +Cc: andrii.nakryiko

> Allow kprobe tracepoint events creation through legacy interface, as the
> kprobe dynamic PMUs support, used by default, was only created in v4.17.
> 
> After commit "bpf: implement minimal BPF perf link", it was allowed that
> some extra - to the link - information is accessed through container_of
> struct bpf_link. This allows the tracing perf event legacy name, and
> information whether it is a retprobe, to be saved outside bpf_link
> structure, which would not be optimal.
> 
> This enables CO-RE support for older kernels.
> 
> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> Signed-off-by: Rafael David Tinoco <rafaeldtinoco@gmail.com>

Tested using: https://github.com/rafaeldtinoco/portablebpf

Single execution:

# cat kernel/debug/tracing/kprobe_events
p:kprobes/tcp_connect_libbpf_20166 tcp_connect
r4:kretprobes/tcp_connect_libbpf_20166 tcp_connect

Simultaneous execution:

# cat kernel/debug/tracing/kprobe_events
p:kprobes/tcp_connect_libbpf_20166 tcp_connect
r4:kretprobes/tcp_connect_libbpf_20166 tcp_connect
p:kprobes/tcp_connect_libbpf_20177 tcp_connect
r4:kretprobes/tcp_connect_libbpf_20177 tcp_connect

kprobe_events was cleared after execution.


* Re: [PATCH bpf-next v5] libbpf: introduce legacy kprobe events support
  2021-09-12  6:48     ` [PATCH bpf-next v5] " Rafael David Tinoco
  2021-09-12  6:52       ` Rafael David Tinoco
@ 2021-09-14  4:44       ` Andrii Nakryiko
  2021-09-14  5:02         ` Rafael David Tinoco
  1 sibling, 1 reply; 40+ messages in thread
From: Andrii Nakryiko @ 2021-09-14  4:44 UTC (permalink / raw)
  To: Rafael David Tinoco; +Cc: bpf

On Sat, Sep 11, 2021 at 11:48 PM Rafael David Tinoco
<rafaeldtinoco@gmail.com> wrote:
>
> Allow kprobe tracepoint events creation through legacy interface, as the
> kprobe dynamic PMUs support, used by default, was only created in v4.17.
>
> After commit "bpf: implement minimal BPF perf link", it was allowed that
> some extra - to the link - information is accessed through container_of
> struct bpf_link. This allows the tracing perf event legacy name, and
> information whether it is a retprobe, to be saved outside bpf_link
> structure, which would not be optimal.
>
> This enables CO-RE support for older kernels.
>
> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> Signed-off-by: Rafael David Tinoco <rafaeldtinoco@gmail.com>
> ---

I've adjusted the commit message a bit (this has nothing to do with
CO-RE per se, so I dropped that, for example). Also see my comments
below, I've applied all that to your patch while applying, please
check them out.

>  tools/lib/bpf/libbpf.c | 141 +++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 135 insertions(+), 6 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 88d8825fc6f6..780a45e54572 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -8987,9 +8987,57 @@ int bpf_link__unpin(struct bpf_link *link)
>         return 0;
>  }
>
> +static int poke_kprobe_events(bool add, const char *name, bool retprobe, uint64_t offset)
> +{
> +       int fd, ret = 0;
> +       pid_t p = getpid();
> +       char cmd[192] = {}, probename[128] = {}, probefunc[128] = {};

cmd and probename don't need initialization, and for probefunc it's
cheaper to just do probefunc[0] = '\0' separately to avoid zeroing out
all 128 characters

> +       const char *file = "/sys/kernel/debug/tracing/kprobe_events";
> +
> +       if (retprobe)
> +               snprintf(probename, sizeof(probename), "kretprobes/%s_libbpf_%u", name, p);
> +       else
> +               snprintf(probename, sizeof(probename), "kprobes/%s_libbpf_%u", name, p);
> +
> +       if (offset)
> +               snprintf(probefunc, sizeof(probefunc), "%s+%lu", name, offset);

(size_t)offset and %zu is necessary to avoid compilation warnings on
some architectures (e.g., mips, I think)

> +
> +       if (add) {
> +               snprintf(cmd, sizeof(cmd), "%c:%s %s",
> +                        retprobe ? 'r' : 'p',
> +                        probename,
> +                        offset ? probefunc : name);
> +       } else {
> +               snprintf(cmd, sizeof(cmd), "-:%s", probename);
> +       }
> +
> +       fd = open(file, O_WRONLY | O_APPEND, 0);
> +       if (!fd)
> +               return -errno;
> +       ret = write(fd, cmd, strlen(cmd));
> +       if (ret < 0)
> +               ret = -errno;
> +       close(fd);
> +
> +       return ret;
> +}
> +
> +static inline int add_kprobe_event_legacy(const char *name, bool retprobe, uint64_t offset)
> +{
> +       return poke_kprobe_events(true, name, retprobe, offset);
> +}
> +
> +static inline int remove_kprobe_event_legacy(const char *name, bool retprobe)
> +{
> +       return poke_kprobe_events(false, name, retprobe, 0);
> +}
> +
>  struct bpf_link_perf {
>         struct bpf_link link;
>         int perf_event_fd;
> +       /* legacy kprobe support: keep track of probe identifier and type */
> +       char *legacy_probe_name;
> +       bool legacy_is_retprobe;
>  };
>
>  static int bpf_link_perf_detach(struct bpf_link *link)
> @@ -8997,20 +9045,26 @@ static int bpf_link_perf_detach(struct bpf_link *link)
>         struct bpf_link_perf *perf_link = container_of(link, struct bpf_link_perf, link);
>         int err = 0;
>
> -       if (ioctl(perf_link->perf_event_fd, PERF_EVENT_IOC_DISABLE, 0) < 0)
> -               err = -errno;
> +       ioctl(perf_link->perf_event_fd, PERF_EVENT_IOC_DISABLE, 0);

what's the reason for dropping the error check? I kept it, but please
let me know if there is any reason to drop it

>
>         if (perf_link->perf_event_fd != link->fd)
>                 close(perf_link->perf_event_fd);
>         close(link->fd);
>
> -       return libbpf_err(err);
> +       /* legacy kprobe needs to be removed after perf event fd closure */
> +       if (perf_link->legacy_probe_name) {
> +               err = remove_kprobe_event_legacy(perf_link->legacy_probe_name,
> +                                                perf_link->legacy_is_retprobe);
> +       }
> +
> +       return err;
>  }
>
>  static void bpf_link_perf_dealloc(struct bpf_link *link)
>  {
>         struct bpf_link_perf *perf_link = container_of(link, struct bpf_link_perf, link);
>
> +       free(perf_link->legacy_probe_name);
>         free(perf_link);
>  }
>
> @@ -9124,6 +9178,25 @@ static int parse_uint_from_file(const char *file, const char *fmt)
>         return ret;
>  }
>
> +static bool determine_kprobe_legacy(void)
> +{
> +       const char *file = "/sys/bus/event_source/devices/kprobe/type";
> +
> +       return access(file, 0) == 0 ? false : true;

this is almost the same as what determine_kprobe_perf_type() does, so
we can just reuse it

> +}
> +
> +static int determine_kprobe_perf_type_legacy(const char *func_name, bool is_retprobe)
> +{
> +       char file[192];
> +       const char *fname = "/sys/kernel/debug/tracing/events/%s/%s_libbpf_%d/id";
> +
> +       snprintf(file, sizeof(file), fname,
> +                is_retprobe ? "kretprobes" : "kprobes",
> +                func_name, getpid());
> +
> +       return parse_uint_from_file(file, "%d\n");
> +}
> +
>  static int determine_kprobe_perf_type(void)
>  {
>         const char *file = "/sys/bus/event_source/devices/kprobe/type";
> @@ -9206,6 +9279,41 @@ static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
>         return pfd;
>  }
>
> +static int perf_event_kprobe_open_legacy(bool retprobe, const char *name, uint64_t offset, int pid)
> +{
> +       struct perf_event_attr attr = {};
> +       char errmsg[STRERR_BUFSIZE];
> +       int type, pfd, err;
> +
> +       err = add_kprobe_event_legacy(name, retprobe, offset);
> +       if (err < 0) {
> +               pr_warn("failed to add legacy kprobe event: %s\n",
> +                       libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
> +               return err;
> +       }
> +       type = determine_kprobe_perf_type_legacy(name, retprobe);
> +       if (type < 0) {
> +               pr_warn("failed to determine legacy kprobe event id: %s\n",
> +                       libbpf_strerror_r(type, errmsg, sizeof(errmsg)));
> +               return type;
> +       }
> +       attr.size = sizeof(attr);
> +       attr.config = type;
> +       attr.type = PERF_TYPE_TRACEPOINT;
> +
> +       pfd = syscall(__NR_perf_event_open, &attr,
> +                     pid < 0 ? -1 : pid, /* pid */
> +                     pid == -1 ? 0 : -1, /* cpu */
> +                     -1 /* group_fd */,  PERF_FLAG_FD_CLOEXEC);
> +       if (pfd < 0) {
> +               err = -errno;
> +               pr_warn("legacy kprobe perf_event_open() failed: %s\n",
> +                       libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
> +               return err;
> +       }
> +       return pfd;
> +}
> +
>  struct bpf_link *
>  bpf_program__attach_kprobe_opts(struct bpf_program *prog,
>                                 const char *func_name,
> @@ -9215,8 +9323,9 @@ bpf_program__attach_kprobe_opts(struct bpf_program *prog,
>         char errmsg[STRERR_BUFSIZE];
>         struct bpf_link *link;
>         unsigned long offset;
> -       bool retprobe;
> +       bool retprobe, legacy;
>         int pfd, err;
> +       char *legacy_probe = NULL;
>
>         if (!OPTS_VALID(opts, bpf_kprobe_opts))
>                 return libbpf_err_ptr(-EINVAL);
> @@ -9225,8 +9334,21 @@ bpf_program__attach_kprobe_opts(struct bpf_program *prog,
>         offset = OPTS_GET(opts, offset, 0);
>         pe_opts.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0);
>
> -       pfd = perf_event_open_probe(false /* uprobe */, retprobe, func_name,
> -                                   offset, -1 /* pid */, 0 /* ref_ctr_off */);
> +       legacy = determine_kprobe_legacy();
> +       if (!legacy) {
> +               pfd = perf_event_open_probe(false /* uprobe */,
> +                                           retprobe, func_name,
> +                                           offset, -1 /* pid */,
> +                                           0 /* ref_ctr_off */);
> +       } else {
> +               legacy_probe = strdup(func_name);
> +               err = libbpf_get_error(legacy_probe);
> +               if (err)
> +                       return libbpf_err_ptr(err);

let's not use libbpf_get_error() to check libc errors. Should be just
if (!legacy_probe)


> +               pfd = perf_event_kprobe_open_legacy(retprobe, func_name,
> +                                                  0 /* offset */,

gotta use the actual offset

> +                                                  -1 /* pid */);
> +       }
>         if (pfd < 0) {
>                 pr_warn("prog '%s': failed to create %s '%s' perf event: %s\n",
>                         prog->name, retprobe ? "kretprobe" : "kprobe", func_name,
> @@ -9242,6 +9364,13 @@ bpf_program__attach_kprobe_opts(struct bpf_program *prog,
>                         libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
>                 return libbpf_err_ptr(err);
>         }
> +       if (legacy) {
> +               struct bpf_link_perf *perf_link = container_of(link, struct bpf_link_perf, link);
> +
> +               perf_link->legacy_probe_name = legacy_probe;
> +               perf_link->legacy_is_retprobe = retprobe;
> +       }
> +
>         return link;
>  }
>
> --
> 2.30.2
>


* Re: [PATCH bpf-next v5] libbpf: introduce legacy kprobe events support
  2021-09-14  4:44       ` Andrii Nakryiko
@ 2021-09-14  5:02         ` Rafael David Tinoco
  2021-09-14  5:21           ` Andrii Nakryiko
  0 siblings, 1 reply; 40+ messages in thread
From: Rafael David Tinoco @ 2021-09-14  5:02 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf


>> Allow kprobe tracepoint events creation through legacy interface, as the
>> kprobe dynamic PMUs support, used by default, was only created in v4.17.
>> 
>> After commit "bpf: implement minimal BPF perf link", it was allowed that
>> some extra - to the link - information is accessed through container_of
>> struct bpf_link. This allows the tracing perf event legacy name, and
>> information whether it is a retprobe, to be saved outside bpf_link
>> structure, which would not be optimal.
>> 
>> This enables CO-RE support for older kernels.
>> 
>> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
>> Signed-off-by: Rafael David Tinoco <rafaeldtinoco@gmail.com>
>> ---
> 
> I've adjusted the commit message a bit (this has nothing to do with
> CO-RE per se, so I dropped that, for example). Also see my comments
> below, I've applied all that to your patch while applying, please
> check them out.

Thanks. I'm assuming you don't need a v6 based on your adjustment comments, let me know if you do please.

...

>> 
>> -       if (ioctl(perf_link->perf_event_fd, PERF_EVENT_IOC_DISABLE, 0) < 0)
>> -               err = -errno;
>> +       ioctl(perf_link->perf_event_fd, PERF_EVENT_IOC_DISABLE, 0);
> 
> what's the reason for dropping the error check? I kept it, but please
> let me know if there is any reason to drop it

From: _perf_ioctl() -> case PERF_EVENT_IOC_DISABLE: func = _perf_event_disable;

_perf_ioctl() will always return 0 and func is void (*func)(struct perf_event *).



* Re: [PATCH bpf-next v5] libbpf: introduce legacy kprobe events support
  2021-09-14  5:02         ` Rafael David Tinoco
@ 2021-09-14  5:21           ` Andrii Nakryiko
  2021-09-14 14:27             ` sunyucong
  0 siblings, 1 reply; 40+ messages in thread
From: Andrii Nakryiko @ 2021-09-14  5:21 UTC (permalink / raw)
  To: Rafael David Tinoco; +Cc: bpf

On Mon, Sep 13, 2021 at 10:03 PM Rafael David Tinoco
<rafaeldtinoco@gmail.com> wrote:
>
>
> >> Allow kprobe tracepoint events creation through legacy interface, as the
> >> kprobe dynamic PMUs support, used by default, was only created in v4.17.
> >>
> >> After commit "bpf: implement minimal BPF perf link", it was allowed that
> >> some extra - to the link - information is accessed through container_of
> >> struct bpf_link. This allows the tracing perf event legacy name, and
> >> information whether it is a retprobe, to be saved outside bpf_link
> >> structure, which would not be optimal.
> >>
> >> This enables CO-RE support for older kernels.
> >>
> >> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> >> Signed-off-by: Rafael David Tinoco <rafaeldtinoco@gmail.com>
> >> ---
> >
> > I've adjusted the commit message a bit (this has nothing to do with
> > CO-RE per se, so I dropped that, for example). Also see my comments
> > below, I've applied all that to your patch while applying, please
> > check them out.
>
> Thanks. I'm assuming you don't need a v6 based on your adjustment comments, let me know if you do please.
>

Nope, I've applied it to bpf-next.

> ...
>
> >>
> >> -       if (ioctl(perf_link->perf_event_fd, PERF_EVENT_IOC_DISABLE, 0) < 0)
> >> -               err = -errno;
> >> +       ioctl(perf_link->perf_event_fd, PERF_EVENT_IOC_DISABLE, 0);
> >
> > what's the reason for dropping the error check? I kept it, but please
> > let me know if there is any reason to drop it
>
> From: _perf_ioctl() -> case PERF_EVENT_IOC_DISABLE: func = _perf_event_disable;
>
> _perf_ioctl() will always return 0 and func is void (*func)(struct perf_event *).
>

And what about all the future kernels? This is an unnecessary
assumption that it will always succeed.


* Re: [PATCH bpf-next v5] libbpf: introduce legacy kprobe events support
  2021-09-14  5:21           ` Andrii Nakryiko
@ 2021-09-14 14:27             ` sunyucong
  2021-09-14 15:42               ` Rafael David Tinoco
  0 siblings, 1 reply; 40+ messages in thread
From: sunyucong @ 2021-09-14 14:27 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: Rafael David Tinoco, bpf

> +static int poke_kprobe_events(bool add, const char *name, bool retprobe, uint64_t offset)
> +{
> +       int fd, ret = 0;

This patch introduced a warning/error in CI

libbpf.c: In function ‘poke_kprobe_events’:
648 libbpf.c:9063:37: error: ‘%s’ directive output may be truncated
writing up to 127 bytes into a region of size between 62 and 189
[-Werror=format-truncation=]
649 9063 | snprintf(cmd, sizeof(cmd), "%c:%s %s",
650 | ^~
651 In file included from /usr/include/stdio.h:867,
652 from libbpf.c:17:
653 /usr/include/x86_64-linux-gnu/bits/stdio2.h:67:10: note:
‘__builtin___snprintf_chk’ output between 4 and 258 bytes into a
destination of size 192
654 67 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
655 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
656 68 | __bos (__s), __fmt, __va_arg_pack ());
657 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


* Re: [PATCH bpf-next v5] libbpf: introduce legacy kprobe events support
  2021-09-14 14:27             ` sunyucong
@ 2021-09-14 15:42               ` Rafael David Tinoco
  2021-09-14 20:34                 ` [PATCH bpf-next] libbpf: fix build error introduced by legacy kprobe feature Rafael David Tinoco
  2021-09-14 21:35                 ` [PATCH bpf-next v2] " Rafael David Tinoco
  0 siblings, 2 replies; 40+ messages in thread
From: Rafael David Tinoco @ 2021-09-14 15:42 UTC (permalink / raw)
  To: sunyucong; +Cc: Andrii Nakryiko, bpf


> On 14 Sep 2021, at 11:27 AM, sunyucong@gmail.com wrote:
> 
>> +static int poke_kprobe_events(bool add, const char *name, bool retprobe, uint64_t offset)
>> +{
>> +       int fd, ret = 0;
> 
> This patch introduced a warning/error in CI
> 
> libbpf.c: In function ‘poke_kprobe_events’:
> 648 libbpf.c:9063:37: error: ‘%s’ directive output may be truncated
> writing up to 127 bytes into a region of size between 62 and 189
> [-Werror=format-truncation=]
> 649 9063 | snprintf(cmd, sizeof(cmd), "%c:%s %s",
> 650 | ^~
> 651 In file included from /usr/include/stdio.h:867,
> 652 from libbpf.c:17:
> 653 /usr/include/x86_64-linux-gnu/bits/stdio2.h:67:10: note:
> ‘__builtin___snprintf_chk’ output between 4 and 258 bytes into a
> destination of size 192
> 654 67 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
> 655 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 656 68 | __bos (__s), __fmt, __va_arg_pack ());
> 657 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

That happened because:

+	char cmd[192] = {}, probename[128] = {}, probefunc[128] = {};

from the original patch was changed to drop the initializers, without zeroing those buffers to '\0' some other way.



* [PATCH bpf-next] libbpf: fix build error introduced by legacy kprobe feature
  2021-09-14 15:42               ` Rafael David Tinoco
@ 2021-09-14 20:34                 ` Rafael David Tinoco
  2021-09-14 21:04                   ` Alexei Starovoitov
  2021-09-14 21:35                 ` [PATCH bpf-next v2] " Rafael David Tinoco
  1 sibling, 1 reply; 40+ messages in thread
From: Rafael David Tinoco @ 2021-09-14 20:34 UTC (permalink / raw)
  To: rafaeldtinoco; +Cc: andrii.nakryiko, bpf, sunyucong

Fix commit 467b3225553a ("libbpf: Introduce legacy kprobe events
support") build issue under FORTIFY_SOURCE.

Reported-by: sunyucong@gmail.com
Cc: andrii.nakryiko@gmail.com
Signed-off-by: Rafael David Tinoco <rafaeldtinoco@gmail.com>
---
 tools/lib/bpf/libbpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 6ecfdc1fa7ba..b45eab3d30cd 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8997,7 +8997,7 @@ static int poke_kprobe_events(bool add, const char *name, bool retprobe, uint64_
 {
 	int fd, ret = 0;
 	pid_t p = getpid();
-	char cmd[192], probename[128], probefunc[128];
+	char cmd[192] = "\0", probename[128] = "\0", probefunc[128] = "\0";
 	const char *file = "/sys/kernel/debug/tracing/kprobe_events";
 
 	if (retprobe)
-- 
2.30.2



* Re: [PATCH bpf-next] libbpf: fix build error introduced by legacy kprobe feature
  2021-09-14 20:34                 ` [PATCH bpf-next] libbpf: fix build error introduced by legacy kprobe feature Rafael David Tinoco
@ 2021-09-14 21:04                   ` Alexei Starovoitov
  0 siblings, 0 replies; 40+ messages in thread
From: Alexei Starovoitov @ 2021-09-14 21:04 UTC (permalink / raw)
  To: Rafael David Tinoco; +Cc: Andrii Nakryiko, bpf, sunyucong

On Tue, Sep 14, 2021 at 1:36 PM Rafael David Tinoco
<rafaeldtinoco@gmail.com> wrote:
>
> Fix commit 467b3225553a ("libbpf: Introduce legacy kprobe events
> support") build issue under FORTIFY_SOURCE.
>
> Reported-by: sunyucong@gmail.com
> Cc: andrii.nakryiko@gmail.com
> Signed-off-by: Rafael David Tinoco <rafaeldtinoco@gmail.com>
> ---
>  tools/lib/bpf/libbpf.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 6ecfdc1fa7ba..b45eab3d30cd 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -8997,7 +8997,7 @@ static int poke_kprobe_events(bool add, const char *name, bool retprobe, uint64_
>  {
>         int fd, ret = 0;
>         pid_t p = getpid();
> -       char cmd[192], probename[128], probefunc[128];
> +       char cmd[192] = "\0", probename[128] = "\0", probefunc[128] = "\0";

It doesn't look like that solves the issue:
https://github.com/kernel-patches/bpf/runs/3603448190


* [PATCH bpf-next v2] libbpf: fix build error introduced by legacy kprobe feature
  2021-09-14 15:42               ` Rafael David Tinoco
  2021-09-14 20:34                 ` [PATCH bpf-next] libbpf: fix build error introduced by legacy kprobe feature Rafael David Tinoco
@ 2021-09-14 21:35                 ` Rafael David Tinoco
  2021-09-14 21:39                   ` Rafael David Tinoco
  1 sibling, 1 reply; 40+ messages in thread
From: Rafael David Tinoco @ 2021-09-14 21:35 UTC (permalink / raw)
  To: rafaeldtinoco; +Cc: andrii.nakryiko, bpf, sunyucong, alexei.starovoitov

Fix commit 467b3225553a ("libbpf: Introduce legacy kprobe events
support") build issue under FORTIFY_SOURCE.

Reported-by: sunyucong@gmail.com
Cc: andrii.nakryiko@gmail.com
Signed-off-by: Rafael David Tinoco <rafaeldtinoco@gmail.com>
---
 tools/lib/bpf/libbpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 6ecfdc1fa7ba..d962796024c8 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8997,7 +8997,7 @@ static int poke_kprobe_events(bool add, const char *name, bool retprobe, uint64_
 {
 	int fd, ret = 0;
 	pid_t p = getpid();
-	char cmd[192], probename[128], probefunc[128];
+	char cmd[288] = "\0", probename[128] = "\0", probefunc[128] = "\0";
 	const char *file = "/sys/kernel/debug/tracing/kprobe_events";
 
 	if (retprobe)
-- 
2.30.2



* Re: [PATCH bpf-next v2] libbpf: fix build error introduced by legacy kprobe feature
  2021-09-14 21:35                 ` [PATCH bpf-next v2] " Rafael David Tinoco
@ 2021-09-14 21:39                   ` Rafael David Tinoco
  2021-09-14 21:48                     ` Andrii Nakryiko
  0 siblings, 1 reply; 40+ messages in thread
From: Rafael David Tinoco @ 2021-09-14 21:39 UTC (permalink / raw)
  To: Rafael David Tinoco; +Cc: Andrii Nakryiko, bpf, sunyucong, alexei.starovoitov


> -	char cmd[192], probename[128], probefunc[128];
> +	char cmd[288] = "\0", probename[128] = "\0", probefunc[128] = "\0";
> 	const char *file = "/sys/kernel/debug/tracing/kprobe_events";

I had gcc-10 with:

libbpf.c: In function ‘poke_kprobe_events’:
libbpf.c:9012:37: error: ‘%s’ directive output may be truncated writing up to 127 bytes into a region of size between 62 and 189 [-Werror=format-truncation=]
 9012 |   snprintf(cmd, sizeof(cmd), "%c:%s %s",
      |                                     ^~
In file included from /usr/include/stdio.h:866,
                 from libbpf.c:17:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:71:10: note: ‘__builtin___snprintf_chk’ output between 4 and 258 bytes into a destination of size 192
   71 |   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   72 |        __glibc_objsize (__s), __fmt,
      |        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   73 |        __va_arg_pack ());
      |        ~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

locally AND it fixed the issue for me but Alexei reported:

https://github.com/kernel-patches/bpf/runs/3603448190

which still shows a worst-case output of 258 bytes. I raised the cmd size to 288.

Let's see if that fixes the issue.



* Re: [PATCH bpf-next v2] libbpf: fix build error introduced by legacy kprobe feature
  2021-09-14 21:39                   ` Rafael David Tinoco
@ 2021-09-14 21:48                     ` Andrii Nakryiko
  0 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2021-09-14 21:48 UTC (permalink / raw)
  To: Rafael David Tinoco; +Cc: bpf, Yucong Sun, Alexei Starovoitov

On Tue, Sep 14, 2021 at 2:39 PM Rafael David Tinoco
<rafaeldtinoco@gmail.com> wrote:
>
>
> > -     char cmd[192], probename[128], probefunc[128];
> > +     char cmd[288] = "\0", probename[128] = "\0", probefunc[128] = "\0";
> >       const char *file = "/sys/kernel/debug/tracing/kprobe_events";
>
> I had gcc-10 failing with:
>
> libbpf.c: In function ‘poke_kprobe_events’:
> libbpf.c:9012:37: error: ‘%s’ directive output may be truncated writing up to 127 bytes into a region of size between 62 and 189 [-Werror=format-truncation=]
>  9012 |   snprintf(cmd, sizeof(cmd), "%c:%s %s",
>       |                                     ^~
> In file included from /usr/include/stdio.h:866,
>                  from libbpf.c:17:
> /usr/include/x86_64-linux-gnu/bits/stdio2.h:71:10: note: ‘__builtin___snprintf_chk’ output between 4 and 258 bytes into a destination of size 192
>    71 |   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
>       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    72 |        __glibc_objsize (__s), __fmt,
>       |        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    73 |        __va_arg_pack ());
>       |        ~~~~~~~~~~~~~~~~~
> cc1: all warnings being treated as errors
>
> locally, and the fix worked for me, but Alexei reported:
>
> https://github.com/kernel-patches/bpf/runs/3603448190
>
> which shows a possible truncation of up to 258 bytes, so I raised the cmd size to 288.

Right, if offset != 0, GCC is able to deduce that sizeof(probename) +
sizeof(probefunc) add up to 256 (plus a few more characters in the
format string) and that this can be truncated in cmd. It completely
ignores this possibility when offset == 0 and we use the name as is.
Either way, I bumped the cmd size to 256 and force-pushed. It should
build fine now. Sorry for the delay.

>
> Let's see if that fixes the issue.
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2021-09-14 21:48 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-30  5:33 [PATCH v3 bpf-next 00/14] BPF perf link and user-provided bpf_cookie Andrii Nakryiko
2021-07-30  5:34 ` [PATCH v3 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function Andrii Nakryiko
2021-07-30 17:19   ` Yonghong Song
2021-08-09 22:43   ` Daniel Borkmann
2021-08-10  0:28     ` Andrii Nakryiko
2021-07-30  5:34 ` [PATCH v3 bpf-next 02/14] bpf: refactor BPF_PROG_RUN_ARRAY family of macros into functions Andrii Nakryiko
2021-08-09 23:00   ` Daniel Borkmann
2021-08-10  0:36     ` Andrii Nakryiko
2021-07-30  5:34 ` [PATCH v3 bpf-next 03/14] bpf: refactor perf_event_set_bpf_prog() to use struct bpf_prog input Andrii Nakryiko
2021-07-30  5:34 ` [PATCH v3 bpf-next 04/14] bpf: implement minimal BPF perf link Andrii Nakryiko
2021-07-30 17:24   ` Yonghong Song
2021-07-30  5:34 ` [PATCH v3 bpf-next 05/14] bpf: allow to specify user-provided bpf_cookie for BPF perf links Andrii Nakryiko
2021-07-30 22:02   ` Yonghong Song
2021-08-09 23:30   ` Daniel Borkmann
2021-08-10  0:52     ` Andrii Nakryiko
2021-07-30  5:34 ` [PATCH v3 bpf-next 06/14] bpf: add bpf_get_attach_cookie() BPF helper to access bpf_cookie value Andrii Nakryiko
2021-07-30 22:13   ` Yonghong Song
2021-07-30  5:34 ` [PATCH v3 bpf-next 07/14] libbpf: re-build libbpf.so when libbpf.map changes Andrii Nakryiko
2021-07-30 22:31   ` Yonghong Song
2021-07-30  5:34 ` [PATCH v3 bpf-next 08/14] libbpf: remove unused bpf_link's destroy operation, but add dealloc Andrii Nakryiko
2021-07-30  5:34 ` [PATCH v3 bpf-next 09/14] libbpf: use BPF perf link when supported by kernel Andrii Nakryiko
2021-07-30  5:34 ` [PATCH v3 bpf-next 10/14] libbpf: add bpf_cookie support to bpf_link_create() API Andrii Nakryiko
2021-07-30  5:34 ` [PATCH v3 bpf-next 11/14] libbpf: add bpf_cookie to perf_event, kprobe, uprobe, and tp attach APIs Andrii Nakryiko
2021-07-30  5:34 ` [PATCH v3 bpf-next 12/14] selftests/bpf: test low-level perf BPF link API Andrii Nakryiko
2021-07-30  5:34 ` [PATCH v3 bpf-next 13/14] selftests/bpf: extract uprobe-related helpers into trace_helpers.{c,h} Andrii Nakryiko
2021-07-30  5:34 ` [PATCH v3 bpf-next 14/14] selftests/bpf: add bpf_cookie selftests for high-level APIs Andrii Nakryiko
2021-08-01  5:11 ` [PATCH bpf-next v4] libbpf: introduce legacy kprobe events support Rafael David Tinoco
2021-08-06 22:29   ` Andrii Nakryiko
2021-09-12  6:48     ` [PATCH bpf-next v5] " Rafael David Tinoco
2021-09-12  6:52       ` Rafael David Tinoco
2021-09-14  4:44       ` Andrii Nakryiko
2021-09-14  5:02         ` Rafael David Tinoco
2021-09-14  5:21           ` Andrii Nakryiko
2021-09-14 14:27             ` sunyucong
2021-09-14 15:42               ` Rafael David Tinoco
2021-09-14 20:34                 ` [PATCH bpf-next] libbpf: fix build error introduced by legacy kprobe feature Rafael David Tinoco
2021-09-14 21:04                   ` Alexei Starovoitov
2021-09-14 21:35                 ` [PATCH bpf-next v2] " Rafael David Tinoco
2021-09-14 21:39                   ` Rafael David Tinoco
2021-09-14 21:48                     ` Andrii Nakryiko
